|Viewing Issue Simple Details|
|ID||Category||Severity||Type||Date Submitted||Last Update|
|0001008||[1003.1(2013)/Issue7+TC1] System Interfaces||Objection||Clarification Requested||2015-11-16 22:24||2019-10-21 13:42|
|Priority||normal||Resolution||Accepted As Marked|
|Section||Vol.2, System Interfaces, iconv|
|Line Number||37302 ff.|
|Final Accepted Text||Note: 0003326|
|Summary||0001008: 1. clarify iconv(3) reset usage; 2. truly support Unicode character input|
For the (iconv_t, NULL, NULL, &OBUF, &OBUF_LEN) usage case, POSIX says:
When iconv( ) is called in this way, [...]
[it] shall place, into the output buffer, the byte sequence to change the output buffer to its initial shift state.
POSIX states at a different place (Vol. 1, Base Definitions, 6.4.1 State-Dependent Character Encodings, 2.; p. 133, l. 3830 ff.)
A utility that divides, truncates, or extracts substrings from statefully encoded data shall produce output that contains locking shifts at the beginning or end of the resulting data, if appropriate, to retain correct state information.
Effectively, a string must be "atomic" with regard to its locking state, otherwise it could not be used by itself. Therefore a reset sequence has to be placed if "normal US-ASCII" text is about to follow (e.g., after an RFC 2047 encoded word in an e-mail header that uses a stateful encoding).
I wonder whether placing this reset sequence shouldn't be a mandatory task before iconv_close(), since only then
$ cat file1 file2 > file3
would work according to the above wording if file1 was created from strings that were converted via iconv(); POSIX doesn't say that a newline character resets the locking shift state.
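To illustrate the concatenation concern, here is a minimal sketch in C. It assumes ISO-2022-JP as the stateful encoding (the issue text does not name one), where ESC ( B selects ASCII, the initial state. The function name ends_in_initial_state is hypothetical, and the scanner is deliberately simplified: it treats every other three-byte escape sequence as leaving the initial state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the ISO-2022-JP byte string s[0..n)
 * ends in the initial (ASCII) shift state, 0 otherwise.
 * Simplification: only ESC ( B is recognised as "back to initial state". */
int ends_in_initial_state(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    int initial = 1;              /* every string starts in the initial state */
    size_t i = 0;
    while (i < n) {
        if (s[i] == 0x1B && n - i >= 3) {
            /* Three-byte escape sequence: ESC, intermediate, final. */
            initial = (s[i + 1] == '(' && s[i + 2] == 'B');
            i += 3;
        } else {
            i++;                  /* ordinary data byte, state unchanged */
        }
    }
    return initial;
}
```

If file1 ends with a string for which this check returns 0, the ASCII bytes of file2 that follow it in file3 would be read in the wrong shift state.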
Making the (iconv_t, NULL, NULL, &OBUF, &OBUF_LEN) case mandatory before iconv_close() would enable character set conversion to reliably detect and compose ISO 10646 / Unicode constructs like decomposed character sequences and even grapheme cluster boundaries.
These "techniques" are basic concepts of Unicode and their understanding may be mandatory in order to be able to perform a correct input charset to output charset conversion.
Examples have been given on the mailing list; one of them is reproduced at:
 http://austingroupbugs.net/view.php?id=249#c2923
It should be noted that "hacks" exist today to work around the absence of the envisaged new requirement; e.g., many iconv implementations ship with a special "UTF-8-MAC" character set that, I think, does nothing but support decomposed characters. It seems Apple has chosen not to honour the Unicode standard completely and leaves some character ranges undecomposed because of internal compatibility problems. Apart from that, it is decomposed Unicode.
 https://developer.apple.com/library/mac/qa/qa1173/_index.html
1. It should be clarified whether it is necessary to explicitly place a reset sequence after input processing is complete, before iconv_close(). Since iconv() doesn't know that the end of the input has been reached, it could otherwise not ensure that the resulting data is valid according to the POSIX specification.
2. If the above is true and the clarification is applied in the envisaged way, POSIX should enhance the iconv description so that it not only talks about state-dependent encodings but also considers Unicode / ISO 10646 text processing requirements. Composing characters for the output character set may be possible only after applying Unicode composition and grapheme cluster boundary detection to the input data, which may require holding back output until text processing detects a true boundary that can be emitted to the output character set.
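The "hold back output until a true boundary is detected" idea in item 2 can be sketched as follows. This is a rough illustration only, not how any iconv implementation works: the function names are hypothetical, and only the Combining Diacritical Marks block (U+0300 to U+036F) is treated as "attaches to the previous character". Real grapheme cluster boundary detection is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplification: only U+0300..U+036F counts as a combining mark here. */
static int is_combining_mark(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Hypothetical helper: given n code points seen so far, return how many
 * leading code points can be emitted now. The last base character and any
 * marks following it are held back, because more marks may still arrive. */
size_t emittable_prefix(const uint32_t *cps, size_t n)
{
    assert(cps != NULL || n == 0);
    size_t i = n;
    while (i > 0 && is_combining_mark(cps[i - 1]))
        i--;                      /* skip trailing combining marks */
    if (i > 0)
        i--;                      /* hold back the base character as well */
    return i;
}
```

Under this scheme the held-back tail can only be released once the converter is told that the input has ended, which is what the (iconv_t, NULL, NULL, &OBUF, &OBUF_LEN) call would signal.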
1. Excuse me please, this should have been submitted against Issue7+TC1; I searched before I started editing, and obviously I forgot to switch the form back. I don't know how this could be changed except by opening another issue.
2. I also think, apart from the above, that
37302 For state-dependent encodings, the conversion descriptor cd is placed into its initial shift state by
37303 a call for which inbuf is a null pointer, or for which inbuf points to a null pointer. When iconv( ) is
37304 called in this way, and if outbuf is not a null pointer or a pointer to a null pointer, and outbytesleft
37305 points to a positive value, iconv( ) shall place, into the output buffer, the byte sequence to change
37306 the output buffer to its initial shift state.
should be changed to
For state-dependent encodings, the conversion descriptor cd is placed into its initial shift state by a call for which inbuf is a null pointer, or for which inbuf points to a null pointer. When iconv( ) is called in this way, and if outbuf is not a null pointer or a pointer to a null pointer, and outbytesleft points to a positive value, iconv( ) shall place, into the output buffer, the byte sequence to change the output buffer to its initial shift state, if the former state of the conversion descriptor cd mandates so.
edited on: 2016-08-11 15:22
Add to application usage as a new paragraph after P1110, L37364:
It is the responsibility of the application to ensure that, if the output codeset has a locking-shift encoding, the output buffer is returned to its initial shift state when conversion is completed. This can be accomplished by calling iconv() with inbuf as a null pointer, or with inbuf pointing to a null pointer, before calling iconv_close(). Since the standard does not provide a way to query whether a codeset has a locking-shift encoding, it is recommended that applications always call iconv() in this way before calling iconv_close().
|Note: 0003326 was edited during the 2016-08-11 teleconference to add the last sentence.|
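The application usage text in Note 0003326 can be sketched in C as below. This is a minimal sketch under stated assumptions: the function name convert_with_reset is hypothetical, the whole input is converted in one call into a caller-supplied buffer, and any error (including E2BIG for a buffer that is too small) is reported as failure without retrying.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from codeset
 * `from` to codeset `to`, then return the output to its initial shift state
 * before closing the descriptor. Returns the number of bytes written to
 * `out`, or (size_t)-1 on failure (errno is preserved). */
size_t convert_with_reset(const char *to, const char *from,
                          const char *in, char *out, size_t outsize)
{
    assert(to && from && in && out);
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;       /* iconv() takes char **; input is not modified */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1) {
        /* Reset call: inbuf is a null pointer, so any byte sequence needed
         * to return to the initial shift state is written to the output. */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    }

    int saved = errno;
    iconv_close(cd);
    errno = saved;
    return rc == (size_t)-1 ? (size_t)-1 : outsize - outleft;
}
```

For a stateless output codeset the reset call is expected to write nothing, so issuing it unconditionally is harmless. For a locking-shift codeset such as ISO-2022-JP, where the implementation supports it, this is the call in which the final return-to-initial-state sequence would be expected to appear.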
|2015-11-16 22:24||steffen||New Issue|
|2015-11-16 22:24||steffen||Status||New => Under Review|
|2015-11-16 22:24||steffen||Assigned To||=> ajosey|
|2015-11-16 22:24||steffen||Name||=> steffen|
|2015-11-16 22:24||steffen||Section||=> Vol.2, System Interfaces, iconv|
|2015-11-16 22:24||steffen||Page Number||=> 1109|
|2015-11-16 22:24||steffen||Line Number||=> 37302 ff.|
|2015-11-17 16:59||steffen||Note Added: 0002966|
|2015-11-17 17:03||geoffclare||Project||1003.1(2008)/Issue 7 => 1003.1(2013)/Issue7+TC1|
|2016-08-04 16:35||geoffclare||Note Added: 0003326|
|2016-08-04 16:35||geoffclare||Interp Status||=> ---|
|2016-08-04 16:35||geoffclare||Final Accepted Text||=> Note: 0003326|
|2016-08-04 16:35||geoffclare||Status||Under Review => Resolved|
|2016-08-04 16:35||geoffclare||Resolution||Open => Accepted As Marked|
|2016-08-04 16:36||geoffclare||Tag Attached: tc3-2008|
|2016-08-11 15:22||geoffclare||Note Edited: 0003326|
|2016-08-11 15:23||geoffclare||Note Added: 0003335|
|2019-10-21 13:42||geoffclare||Status||Resolved => Applied|