Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details
ID 0000948
Category [1003.1(2013)/Issue7+TC1] Base Definitions and Headers
Severity Objection
Type Error
Date Submitted 2015-05-11 15:38
Last Update 2016-02-04 16:33
Reporter geoffclare View Status public  
Assigned To
Priority normal Resolution Accepted  
Status Resolved  
Name Geoff Clare
Organization The Open Group
User Reference
Section 7.3.2, 9.3.5
Page Number 147, 150, 184
Line Number 4393, 4503, 5963, and more
Interp Status ---
Final Accepted Text
Summary 0000948: Collation issues in XBD (changes for Issue 8)
Description A discussion on the mailing list identified some issues related to
collation for locales that do not define a collation sequence with
a total ordering of all characters. It is proposed that these issues
are addressed in Issue 8 by requiring implementation-provided locales
that do not have an '@' modifier in their name to define a collation
sequence that has a total ordering of all characters (thus reducing
the problem to "special" locales and user-defined locales), and by
modifying the requirements for regular expressions and affected
utilities so that they cope better with such locales. As an
intermediate step, it is proposed that the new requirements slated
for Issue 8 are recommended (or at least allowed) in TC2.
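
As a rough illustration of what "a total ordering of all characters" means here, the following sketch checks a toy collation table for distinct characters that share identical weights. The tables, weights, and function names are hypothetical and are not taken from any real locale definition.

```python
from typing import Dict, List, Tuple

# Hypothetical toy collation tables: character -> tuple of weights.
# These are illustrative only and do not come from any real locale.
PARTIAL: Dict[str, Tuple[int, ...]] = {
    "a": (1,),
    "A": (1,),   # same weights as "a": the two collate equally
    "b": (2,),
}
TOTAL: Dict[str, Tuple[int, ...]] = {
    "a": (1, 1),
    "A": (1, 2),  # a second-level weight breaks the tie
    "b": (2, 1),
}


def colliding_characters(table: Dict[str, Tuple[int, ...]]) -> List[List[str]]:
    """Return groups of distinct characters that share identical weights."""
    by_weight: Dict[Tuple[int, ...], List[str]] = {}
    for ch, weights in table.items():
        by_weight.setdefault(weights, []).append(ch)
    return [sorted(group) for group in by_weight.values() if len(group) > 1]


def is_total_ordering(table: Dict[str, Tuple[int, ...]]) -> bool:
    """True when no two distinct characters collate equally."""
    return not colliding_characters(table)
```

Under this simplified model, PARTIAL corresponds to the kind of locale the proposal would confine to "special" and user-defined locales, while TOTAL satisfies the proposed requirement.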

The necessary changes will be split across four Mantis bugs, targeting
XBD TC2, XCU TC2, XBD Issue 8, and XCU Issue 8. This bug contains the
changes proposed for XBD in Issue 8.
Desired Action After applying the bug 0000938 changes at each of the following locations, make further changes to the new text as noted below.

On Page: 147 Line: 4393 Section: 7.3.2 LC_COLLATE

In the new paragraph after the numbered list, change from:

All implementation-provided locales (either preinstalled or provided as locale definitions which can be installed later) should define ...

to:

All implementation-provided locales (either preinstalled or provided as locale definitions which can be installed later) shall define ...

and delete the first of the new small-font notes:

<small>Note: a future version of this standard may require these locales to define a collation sequence that has a total ordering of all characters (by changing "should" to "shall").</small>

On Page: 150 Line: 4503 Section: 7.3.2.4 Collation Order

In the new paragraph, change from:

Weights should be assigned such that the collation sequence ...

to:

Weights shall be assigned such that the collation sequence ...

and delete the small-font note:

<small>Note: a future version of this standard may require a total ordering of all characters for implementation-provided locales that do not have an '@' modifier in the locale name. See [xref to 7.3.2].</small>

On Page: 150 Line: 4517 Section: 7.3.2.4 Collation Order

In the updated text, change from:

If the collation order has only one weight level, these characters should be assigned unique primary weights, equal to the relative order of their character in the character collation sequence, but may be assigned the same primary weight.

to:

If the collation order has only one weight level, these characters shall be assigned unique primary weights, equal to the relative order of their character in the character collation sequence.

and delete the small-font note:

<small>Note: a future version of this standard may require these characters to be assigned unique primary weights if the collation order has only one weight level.</small>
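
A minimal sketch of the single-level case described above, assuming that "relative order" can be modeled as a 1-based position in the collation sequence. The sequence and the function name are hypothetical.

```python
from typing import Dict, List


def assign_primary_weights(collation_sequence: List[str]) -> Dict[str, int]:
    """Map each character to its 1-based position in the sequence."""
    return {ch: pos for pos, ch in enumerate(collation_sequence, start=1)}


# Hypothetical single-level collation sequence, for illustration only.
sequence = ["A", "a", "B", "b"]
weights = assign_primary_weights(sequence)
```

Because every position is distinct, no two characters receive the same primary weight, which is the property the "shall" wording makes mandatory.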

On Page: 184 Line: 5963 Section: 9.3.5 RE Bracket Expression

In the updated list item 2, change from:

An ordinary character in the list should only match that character, but may match any single character that collates equally with that character; for example, "[abc]" is an RE that should only match one of the characters 'a', 'b', or 'c'.

to:

An ordinary character in the list shall only match that character; for example, "[abc]" is an RE that only matches one of the characters 'a', 'b', or 'c'.

and delete the small-font note:

<small>Note: a future version of this standard may require that an ordinary character in the list only matches that character.</small>

On Page: 184 Line: 5970 Section: 9.3.5 RE Bracket Expression

In the updated list item 3, change from:

For example, if the RE "[abc]" only matches 'a', 'b', or 'c', then "[^abc]" is an RE that matches any character except 'a', 'b', or 'c'.

to:

For example, since the RE "[abc]" only matches 'a', 'b', or 'c', it follows that "[^abc]" is an RE that matches any character except 'a', 'b', or 'c'.
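
The difference between the two wordings for list items 2 and 3 can be sketched with a toy matcher. This is not a regular expression engine; the weight table and function names are hypothetical, and 'a' and 'A' are assumed to collate equally purely for illustration.

```python
# Hypothetical collation weights in which 'a' and 'A' collate equally.
WEIGHTS = {"a": 1, "A": 1, "b": 2, "c": 3, "d": 4}


def match_exact(ch: str, listed: str) -> bool:
    """Proposed rule: a listed ordinary character matches only itself."""
    return ch in listed


def match_by_collation(ch: str, listed: str) -> bool:
    """Behavior the 'may match' wording allowed: equal collation also matches."""
    listed_weights = {WEIGHTS[c] for c in listed}
    return WEIGHTS.get(ch) in listed_weights


def match_negated(ch: str, listed: str) -> bool:
    """Non-matching list: the complement of the exact matching list."""
    return not match_exact(ch, listed)
```

With the exact rule, "[abc]" rejects 'A' and "[^abc]" accepts it, so the non-matching list is always the complement of the matching list regardless of the locale's weights.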


Cross-volume changes to XRAT ...

On Page: 3490 Line: 117820 Section: A.7.3.2 LC_COLLATE

In the new paragraph, change from:

This standard recommends (by the use of "should" in the normative text) that ...

to:

This standard requires that ...
Tags issue8, UTF-8_Locale
Attached Files

- Relationships
related to 0000938 Closed Collation issues in XBD (changes for TC2)
related to 0000963 Closed Collation issues in XCU (changes for TC2)
related to 0001070 Resolved Collation issues in XCU (changes for Issue 8)

-  Notes
(0002697)
eblake (manager)
2015-06-04 18:02

Is this proposed wording still accurate, in light of 0000872 documenting how REG_ICASE affects range expressions?
 An ordinary character in the list shall only match that character; for example, "[abc]" is an RE that only matches one of the characters 'a', 'b', or 'c'.
(0002698)
shware_systems (reporter)
2015-06-04 18:38

It nominally is, as 'upper' and 'lower' determination is independent of collation order, and bug 872 relates to equality testing only, not ordering. Still, inserting a clarification such as 'when REG_ICASE has not been specified' somewhere in there wouldn't hurt, imo.
(0002699)
shware_systems (reporter)
2015-06-04 20:58
edited on: 2015-06-04 21:00

Where the wording changes introduce a possible ambiguity is with the strxfrm() interface. The C standard just states the interface shall refer to the LC_COLLATE category, but is not explicit about when items are copied from a source string verbatim and when a transformed substitute must be stored, so the change from should to shall may break some existing implementations.

If the collation weightings are set up so that two elements can still compare as equal after COLL_WEIGHTS_MAX weights have been examined, even though their binary values differ, it is not specified which element has primacy for storage so that strcmp() is deterministic. Using case-insensitive collation of letters as an example: does the lower case or upper case version of a character get copied or always stored, or is it the first member of the given weight class pulled from the LC_COLLATE category that gets stored, which may be upper case where the input was lower case?

I can see the latter being the intent, but some implementations may prefer a particular case as a last determining factor, to match the expectations of various standards the transformed strings are routinely used with. Maybe I'm off, but I don't see the language precluding such a preference being implemented.
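
For readers less familiar with the interface, here is a small sketch of the relationship the note relies on: comparing strxfrm() outputs bytewise is expected to agree with strcoll() on the original strings. It uses Python's locale module as a stand-in for the C interfaces and the "C" locale so that it runs anywhere; the helper name and word list are hypothetical. It does not answer the question of what a given implementation stores for elements that collate equally.

```python
import functools
import locale
from typing import List, Tuple


def sort_both_ways(words: List[str], locale_name: str = "C") -> Tuple[List[str], List[str]]:
    """Sort words with strcoll and with strxfrm keys under one LC_COLLATE setting."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))
        by_strxfrm = sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the caller's setting
    return by_strcoll, by_strxfrm


by_strcoll, by_strxfrm = sort_both_ways(["b", "A", "a", "B"])
```

Passing a different locale name would exercise a locale with multi-level weights, but which names are installed is system-dependent.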

(0002700)
geoffclare (manager)
2015-06-05 09:38

(Response to Note: 0002697)
I don't see any problem with the new wording as regards REG_ICASE. The way XBD chapter 9 is structured is that the details in 9.3 and 9.4 describe the normal case-sensitive matching process and the variation needed for case-insensitive matching is covered by this statement in 9.2:

"When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched."
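
The quoted 9.2 rule can be sketched as a thin layer over exact matching: each input character is tried as itself and, if case is being ignored, as its case counterpart. This is a toy model with hypothetical function names, and it uses Python's Unicode case mapping rather than the locale's LC_CTYPE category.

```python
from typing import Optional


def case_counterpart(ch: str) -> Optional[str]:
    """Return the other-case form of ch, or None if it has no single-character one."""
    swapped = ch.swapcase()
    return swapped if swapped != ch and len(swapped) == 1 else None


def match_list(ch: str, listed: str, ignore_case: bool = False) -> bool:
    """Match ch against a bracket list; optionally also try its case counterpart."""
    if ch in listed:
        return True
    if ignore_case:
        other = case_counterpart(ch)
        return other is not None and other in listed
    return False
```

In this model the bracket-expression rule itself stays strictly case-sensitive, and case-insensitive behavior comes entirely from the extra attempt with the counterpart character.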

- Issue History
Date Modified Username Field Change
2015-05-11 15:38 geoffclare New Issue
2015-05-11 15:38 geoffclare Name => Geoff Clare
2015-05-11 15:38 geoffclare Organization => The Open Group
2015-05-11 15:38 geoffclare Section => 7.3.2, 9.3.5
2015-05-11 15:38 geoffclare Page Number => 147, 150, 184
2015-05-11 15:38 geoffclare Line Number => 4393, 4503, 5963, and more
2015-05-11 15:38 geoffclare Interp Status => ---
2015-05-11 15:39 geoffclare Relationship added related to 0000938
2015-06-04 18:02 eblake Note Added: 0002697
2015-06-04 18:38 shware_systems Note Added: 0002698
2015-06-04 20:58 shware_systems Note Added: 0002699
2015-06-04 21:00 shware_systems Note Edited: 0002699
2015-06-05 09:38 geoffclare Note Added: 0002700
2015-07-30 17:08 rhansen Tag Attached: issue8
2015-07-30 17:08 rhansen Tag Attached: UTF-8_Locale
2016-02-04 16:18 nick Relationship added related to 0000963
2016-02-04 16:33 Don Cragun Status New => Resolved
2016-02-04 16:33 Don Cragun Resolution Open => Accepted
2016-02-04 16:33 Don Cragun Desired Action Updated
2016-08-25 11:12 geoffclare Relationship added related to 0001070

