Austin Group Defect Tracker

ID 0000872
Category [1003.1(2013)/Issue7+TC1] Base Definitions and Headers
Severity Editorial
Type Clarification Requested
Date Submitted 2014-08-27 16:16
Last Update 2019-06-10 08:54
Reporter nsz View Status public  
Assigned To
Priority normal Resolution Accepted As Marked  
Status Closed  
Name Szabolcs Nagy
Organization musl libc
User Reference
Section 9.3.5 RE Bracket Expression
Page Number 184
Line Number 5968-5970
Interp Status ---
Final Accepted Text Note: 0002415
Summary 0000872: REG_ICASE regex matching and negated bracket expr
Description In chapter 9 the case insensitive matching of negated (^) bracket
expressions is inconsistent with historical practice.

(1) Case insensitive matching according to section 9.2 "Regular
Expression General Requirements":

 "when each character in the string is matched against the pattern, not
 only the character, but also its case counterpart (if any), shall be
 matched."

(2) Rule 3. in 9.3.5 "RE Bracket Expression":

 "A non-matching list expression begins with a <circumflex> ( '^' ), and
 specifies a list that shall match any single-character collating
 element except for the expressions represented in the list after the
 leading <circumflex>."


Taken together, these two rules mean that [^a] should match both 'a'
and 'A' with REG_ICASE: by (1), both 'a' and 'A' are tried against the
bracket expression when matching either of them, and by (2) 'A' does
match [^a].

On historical implementations, [^a] matches neither 'a' nor 'A' with
REG_ICASE.
Desired Action change

 "A non-matching list expression begins with a <circumflex> ( '^' ), and
 specifies a list that shall match any single-character collating element
 except for the expressions represented in the list after the leading
 <circumflex>."

to

 "A non-matching list expression begins with a <circumflex> ( '^' ), and
 specifies a list that shall match any single-character collating element
 except for the ones that match the expressions represented in the list
 after the leading <circumflex>. Matching the expressions in the list is
 done without regard to the case when the regular expression is matched
 case-insensitively."
Tags tc2-2008
Attached Files

- Relationships
related to 0000938 Closed Collation issues in XBD (changes for TC2)

-  Notes
(0002396)
nsz (reporter)
2014-09-23 18:52

I noticed that the regcomp rationale says:

  The REG_ICASE flag supports the operations taken by the grep -i
  option and the historical implementations of ex and vi. Including
  this flag will make it easier for application code to be written
  that does the same thing as these utilities.

None of the original grep -i, ex, or vi (with :set ignorecase)
follows the current POSIX definition of REG_ICASE (they don't
match [^a] against a or A).
(0002413)
shware_systems (reporter)
2014-10-09 14:22

With the negation, they aren't supposed to match; they're supposed to report a match for 'b' or 'B', etc. If anything, implementations that report a match for '[^a]' tested against 'A' with REG_ICASE specified were probably written without REG_ICASE support and never updated.

The desired action does emphasize to implementers that REG_ICASE needs to be accounted for in evaluating '^', but it is nominally superfluous in my opinion.

In Section 9.2, I think it is less ambiguously expressed by changing:

 "when each character in the string is matched against the pattern, not
 only the character, but also its case counterpart (if any), shall be
 matched."

to:

 "when a character in the string is tested against the pattern, not
 only the character, but also its case counterparts (if any), shall be
 tested, and a match occurs if one of them fit the test criteria."

This emphasizes that match status is determined after the relevant testing, not presumed true and possibly negated, as it can be read now. Note that 'counterpart' is pluralized as preliminary groundwork for the changes required to adequately support Unicode's extra casing classifications. I am not adding more, as that belongs in a separate report as an Issue 8 matter, but I feel this doesn't change the intent of that section for Issue 7.
(0002415)
rhansen (manager)
2014-10-09 15:53

On page 184 lines 5968-5970 (XBD 9.3.5 RE Bracket Expression), change:
A non-matching list expression begins with a <circumflex> ('^'), and specifies a list that shall match any single-character collating element except for the expressions represented in the list after the leading <circumflex>.

to:
A non-matching list expression begins with a <circumflex> ('^'), and the matching behavior shall be the logical inverse of the corresponding matching list expression (the same bracket expression but without the leading <circumflex>).

- Issue History
Date Modified Username Field Change
2014-08-27 16:16 nsz New Issue
2014-08-27 16:16 nsz Name => Szabolcs Nagy
2014-08-27 16:16 nsz Organization => musl libc
2014-08-27 16:16 nsz Section => 9.3.5 RE Bracket Expression
2014-08-27 16:16 nsz Page Number => -
2014-08-27 16:16 nsz Line Number => -
2014-09-23 18:52 nsz Note Added: 0002396
2014-10-09 14:22 shware_systems Note Added: 0002413
2014-10-09 15:27 rhansen Page Number - => 184
2014-10-09 15:27 rhansen Line Number - => 5968-5970
2014-10-09 15:27 rhansen Interp Status => ---
2014-10-09 15:53 rhansen Note Added: 0002415
2014-10-09 15:56 rhansen Final Accepted Text => Note: 0002415
2014-10-09 15:56 rhansen Status New => Resolved
2014-10-09 15:56 rhansen Resolution Open => Accepted As Marked
2014-10-09 15:57 rhansen Tag Attached: tc2-2008
2015-06-04 16:25 eblake Relationship added related to 0000938
2019-06-10 08:54 agadmin Status Resolved => Closed

