|ID||Category||Severity||Type||Date Submitted||Last Update|
|0000322||[1003.1(2004)/Issue 6] Shell and Utilities||Objection||Omission||2010-09-18 18:01||2010-09-18 18:20|
|Reporter||Don Cragun||View Status||public|
|Final Accepted Text|
|Summary||0000322: Defect in XCU File Format Notation|
The printf utility in XCU and the printf library interface in XSH (and C99) have great similarity, but are not completely aligned.
In particular, C99 allows additional escape sequences in the format string (which in C99 is just an array of characters).
C99 defines a character constant as including an escape sequence:
escape-sequence:
    simple-escape-sequence
    octal-escape-sequence
    hexadecimal-escape-sequence
    universal-character-name
The file format notation (used by the printf utility) has the first two of these (the simple escape sequence is slightly different, though not in a way that is detectable, in that it does not explicitly require \' \" and \? processing, but the shell for
The file format notation does not require hexadecimal escape sequences or universal character names.
Hexadecimal escapes in particular are commonly implemented as extensions (e.g. coreutils), and frequently used when patching binary files. Indeed, coreutils also implements universal-character-names.
(draft 3.4 page XCU 3038 line 100761)
change:

In addition to the escape sequences shown in XBD Chapter 5 (on page 119) (\\, \a, \b, \f, \n, \r, \t, \v), "\ddd", where ddd is a one, two, or three-digit octal number, shall be written as a byte with the numeric value specified by the octal number.

to:

In addition to the escape sequences shown in XBD Chapter 5 (on page 119) (\\, \a, \b, \f, \n, \r, \t, \v), the following escape sequences shall also be recognized:

+ "\ddd", where ddd is a one, two, or three-digit octal number, shall be written as a byte with the numeric value specified by the octal number.
+ "\xdd", where dd is a one or two digit hexadecimal number, shall be written as a byte with the numeric value specified by the hexadecimal number.
+ "\udddd", where dddd is a four character hexadecimal number, shall be written as the character whose four-digit short identifier (as specified by ISO/IEC 10646) is dddd, except that a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), or one in the range D800 through DFFF inclusive need not be supported.
+ "\Udddddddd", where dddddddd is an eight character hexadecimal number, shall be written as the character whose eight-digit short identifier (as specified by ISO/IEC 10646) is dddddddd, except that a character whose short identifier is less than 000000A0 other than 00000024 ($), 00000040 (@), or 00000060 (`), or one in the range 0000D800 through 0000DFFF inclusive need not be supported.

NOTE: The disallowed characters are the characters in the basic character set and the code positions reserved by ISO/IEC 10646 for control characters, the character DELETE, and the S-zone (reserved for use by UTF-16).

=======

Before line 100778 (d3.4), insert the following:

+ "\xdd", where dd is a one or two digit hexadecimal number that shall be converted to a byte with the numeric value specified by the hexadecimal number.
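The byte-valued escapes in the proposed wording can be sketched in a few lines of Python. This is only an illustration of the proposed (and ultimately rejected) text, not a conforming printf implementation. The names `expand_escapes` and `ucn_must_be_supported` are hypothetical, and two behaviors are assumptions of the sketch rather than anything the proposal states: octal values above 0377 are masked to eight bits, and unrecognized escapes are passed through unchanged.

```python
# Sketch of the proposed escape handling; names and edge-case behavior are assumptions.
SIMPLE = {"\\": 0x5C, "a": 0x07, "b": 0x08, "f": 0x0C,
          "n": 0x0A, "r": 0x0D, "t": 0x09, "v": 0x0B}
OCT = "01234567"
HEX = "0123456789abcdefABCDEF"


def expand_escapes(fmt: str) -> bytes:
    """Expand simple escapes, \\ddd (1-3 octal digits) and \\xdd (1-2 hex digits)."""
    out = bytearray()
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch != "\\" or i + 1 == len(fmt):
            out += ch.encode()          # ordinary character or trailing backslash
            i += 1
            continue
        nxt = fmt[i + 1]
        if nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 2
        elif nxt in OCT:
            j = i + 1
            while j < len(fmt) and j < i + 4 and fmt[j] in OCT:
                j += 1                  # at most three octal digits
            out.append(int(fmt[i + 1:j], 8) & 0xFF)  # masking is an assumption
            i = j
        elif nxt == "x" and i + 2 < len(fmt) and fmt[i + 2] in HEX:
            j = i + 2
            while j < len(fmt) and j < i + 4 and fmt[j] in HEX:
                j += 1                  # at most two hex digits
            out.append(int(fmt[i + 2:j], 16))
            i = j
        else:
            out += ch.encode()          # assumption: unknown escapes pass through
            i += 1
    return bytes(out)


def ucn_must_be_supported(cp: int) -> bool:
    """True if the proposal would require \\u or \\U support for code point cp."""
    if 0xD800 <= cp <= 0xDFFF:          # surrogate range, need not be supported
        return False
    return cp >= 0xA0 or cp in (0x24, 0x40, 0x60)
```

Under this sketch, `expand_escapes(r"\x41BC")` stops after two hexadecimal digits, so the trailing `BC` is treated as ordinary text.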
|Tags||No tags attached.|
Don Cragun (manager)
This item was sent down the interpretations track as AI-211, but
encountered significant objections. Hence this item has been rejected.
The former disposition of the bug is enclosed here:
ERN 177(->interps track) AI-211
Send down the interps track, the std is silent on the issue
Take the changes in the Desired Action and then also update the
APPLICATION USAGE section (line 100839 in d3.4, top of page 3040)
with the following text:
The treatment of hexadecimal escapes in the format operand differs from
the way hexadecimal character constants are recognized in a C string
literal used as the format argument for the printf() family of functions.
In the printf utility they are limited to two hexadecimal digits because
otherwise there would be no consistent way to detect the end of the
constant. In C string literals, hexadecimal escape sequences are only
terminated by a non-hex-digit character or the end of the string literal;
concatenation of adjacent string literals can be used to terminate an
escape sequence and follow it with a hexadecimal character to be written.
In the shell, concatenation occurs before the printf utility has a chance
to parse the end of the hexadecimal constant.
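The difference between the two termination rules described above can be shown with a small Python sketch. The helper `hex_digits_consumed` is hypothetical and only counts digits; it models the scanning rule, not a real C compiler or printf utility.

```python
HEX = "0123456789abcdefABCDEF"


def hex_digits_consumed(text: str, start: int, limit=None) -> int:
    """Count hex digits from `start`; limit=None models the greedy C literal rule."""
    n = 0
    while (start + n < len(text) and text[start + n] in HEX
           and (limit is None or n < limit)):
        n += 1
    return n


fmt = r"\x41BC"                                  # digits begin at index 2
bounded = hex_digits_consumed(fmt, 2, limit=2)   # two-digit rule from the proposal
greedy = hex_digits_consumed(fmt, 2)             # C rule: stop at first non-hex character

print(fmt[2:2 + bounded], fmt[2:2 + greedy])
```

With the two-digit limit the escape ends after `41` and `BC` remains ordinary text. Under the greedy rule all four digits belong to the escape, giving a value that does not fit in one byte, which is why C programmers split the literal (for example `"\x41" "BC"`) to end the escape early.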
Don Cragun (manager)
edited on: 2010-09-20 22:08
As noted in Note: 0000556, an attempt to approve the
proposed resolution as an interpretation encountered significant objections.
The discussion concerning this proposal can be found in austin-group-l e-mail
sequence numbers: 11363, 11373-11374, 11502, 11509-11510, 11512-11514,
11516, and 11519-11528.
See 0000249 for a request to add $'...' to the shell as proposed by some in the
|2010-09-18 18:01||Don Cragun||New Issue|
|2010-09-18 18:01||Don Cragun||Status||New => Under Review|
|2010-09-18 18:01||Don Cragun||Assigned To||=> ajosey|
|2010-09-18 18:01||Don Cragun||Name||=> Nick Stoughton|
|2010-09-18 18:01||Don Cragun||Organization||=> USENIX|
|2010-09-18 18:01||Don Cragun||User Reference||=> nms-hex-string|
|2010-09-18 18:01||Don Cragun||Section||=> printf|
|2010-09-18 18:01||Don Cragun||Page Number||=> 743|
|2010-09-18 18:01||Don Cragun||Line Number||=> 28941-28944|
|2010-09-18 18:01||Don Cragun||Interp Status||=> ---|
|2010-09-18 18:08||Don Cragun||Note Added: 0000556|
|2010-09-18 18:08||Don Cragun||Resolution||Open => Rejected|
|2010-09-18 18:08||Don Cragun||Desired Action Updated|
|2010-09-18 18:12||Don Cragun||Relationship added||related to 0000249|
|2010-09-18 18:20||Don Cragun||Note Added: 0000557|
|2010-09-18 18:20||Don Cragun||Status||Under Review => Closed|
|2010-09-20 22:08||Don Cragun||Note Edited: 0000557|