0001202: printf %.Nb with \c in arg (more than N chars into arg) behaviour unclear

ID	Project	Category	View Status	Date Submitted	Last Update

0001202	1003.1(2016/18)/Issue7+TC2	Shell and Utilities	public	2018-08-29 19:26	2024-06-11 09:08

Reporter	kre	Assigned To
Priority	normal	Severity	Objection	Type	Clarification Requested
Status	Closed	Resolution	Accepted As Marked

Name	Robert Elz
Organization
User Reference
Section	XCU 4 -- printf
Page Number	3113
Line Number	104118 - 104120, 104123 - 104126
Interp Status	Approved
Final Accepted Text	0001202:0004341


Summary	0001202: printf %.Nb with \c in arg (more than N chars into arg) behaviour unclear
Description	When a %b format conversion specification is used with printf(1) and the associated string arg contains a \c escape sequence, the character preceding the \c is the last one printed (printf exits after printing that character). The specification says this in other words, but that is the effect, and it is clear. What is not clear is whether the \c from the arg string needs to actually be "written" for it to have this effect.

Example:

printf '%.2bX' 'a string\c'

Does this print "a X", that is, the first 2 characters of the arg followed by the remainder of the format (here the literal X), because the \c was not used? Or does the presence of the \c in the arg string cause processing to stop (even though the \c was not "printed"), in which case "a " should be printed?

I only really have the builtin printfs in shells to test (and not all shells use a builtin for it), so there may be more interpretations than I have seen, but:

Most shells' builtin printfs (and NetBSD's /usr/bin/printf) seem to adopt the second approach.

bosh (the only one I have actually seen do so) picks the first.

I think both are reasonable interpretations, as the standard is not currently clear.

The version of the FreeBSD shell I have to test ignores the precision with %b format conversions, but that looks to have been fixed earlier this month, and from reading their sources, the behaviour now appears to be the "most shells" style.

ksh93 (or at least the version I tested) is just plain weird. It allows precisions on %b formats ('printf "%.2b" "abcd"' prints "ab" as it should), but if there is a \c in the arg string, the precision seems to be ignored ('printf "%.2b" "abcd\c"' prints "abcd"). That can't be anything but a bug.
The text can almost be read to imply the "most shells" behaviour, from the words (line 104123) "Bytes from the converted string shall be written", which could be interpreted to mean that the arg string should have its escape sequences converted to actual characters first, and then written as if the %b were %s. If that is done (which is what I think most shells actually do), the \c will be seen during the conversion step and will cause further processing to end. But there is no particular reason to read the text that way. The conversion of the arg string and the output of the bytes could easily proceed in parallel, with bytes from the arg string that are not to be written never being converted (converting them is really just a waste of time, certainly for all the escapes except \c).
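The two readings in the Description can be sketched in C. This is a minimal, hypothetical model, not code taken from any shell: the names convert_then_truncate, convert_on_the_fly, simulate and struct b_result are invented here, and only the \c and \\ escapes are recognised. The first function models the "most shells" behaviour (convert everything, then apply the precision), the second the bosh behaviour (stop converting once the precision is reached).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Outcome of processing one %b operand (hypothetical model). */
struct b_result {
    size_t len;  /* bytes of converted output to write */
    int stop;    /* nonzero if a \c ended all further processing */
};

/* "Most shells": convert the whole operand first, so a \c is seen wherever
   it is, then keep at most prec bytes. out must hold strlen(arg) bytes. */
static struct b_result convert_then_truncate(const char *arg, size_t prec,
                                             char *out)
{
    struct b_result r = {0, 0};
    size_t n = 0;
    for (const char *p = arg; *p != '\0'; p++) {
        if (p[0] == '\\' && p[1] == 'c') {
            r.stop = 1;              /* \c seen during conversion */
            break;
        }
        if (p[0] == '\\' && p[1] == '\\')
            p++;                     /* \\ becomes a single backslash */
        out[n++] = *p;
    }
    r.len = n < prec ? n : prec;     /* precision applied afterwards */
    return r;
}

/* bosh: convert on the fly and stop once prec bytes have been produced,
   so a \c beyond that point is never examined. */
static struct b_result convert_on_the_fly(const char *arg, size_t prec,
                                          char *out)
{
    struct b_result r = {0, 0};
    for (const char *p = arg; *p != '\0' && r.len < prec; p++) {
        if (p[0] == '\\' && p[1] == 'c') {
            r.stop = 1;
            break;
        }
        if (p[0] == '\\' && p[1] == '\\')
            p++;
        out[r.len++] = *p;
    }
    return r;
}

/* Simulate printf '%.<prec>bX' arg into buf (strlen(arg) + 2 bytes): the
   literal X from the format is appended only if no \c took effect. */
static void simulate(struct b_result (*convert)(const char *, size_t, char *),
                     const char *arg, size_t prec, char *buf)
{
    assert(convert != NULL && arg != NULL && buf != NULL);
    struct b_result r = convert(arg, prec, buf);
    size_t n = r.len;
    if (!r.stop)
        buf[n++] = 'X';
    buf[n] = '\0';
}
```

With these, simulating printf '%.2bX' 'a string\c' yields "a " under the first model and "a X" under the second, matching the two outcomes asked about above.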
Desired Action	Add a new sentence after the sentence which concludes on line 104120:

"If the \c is not reached during the processing of the string operand (because a precision argument limits the number printed) it is unspecified whether the effects of the \c just indicated apply, or whether the \c is simply ignored."

though I would actually expect that to be rewritten as something that means much the same, but is better written! Alternatively, if the bosh builtin printf happens to be the only printf which behaves the way it does, the added sentence could say:

"Note that the presence of a \c in the operand string shall have this effect even if output from the converted operand string terminates, due to a precision argument, before the \c is reached."
Tags	tc3-2008

joerg 2018-08-30 12:46 reporter bugnote:0004092	Some shells convert the %b argument to expand the escape sequences and later let the resulting string be processed by printf(3). This does not work, since printf() cannot forward NUL characters, but as a side effect the \c ending is seen while doing the conversion. If you want to permit coded NUL bytes inside the string parameter, you need to process things completely without help from printf(). If at the same time you want to honor %.#b, the natural behavior is that of bosh, since the \c is not seen when string processing is aborted because of the %.# modifier.
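joerg's point about NUL bytes (and the counted-string remark in shware_systems' reply) can be illustrated with a minimal sketch. The helper expand_octal is hypothetical and handles only the \0ddd escape; everything else is copied unchanged. The returned byte count, not a terminating NUL, marks the end of the converted string, so a NUL-terminated route such as a "%s" conversion stops early while a counted write does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Expand \0ddd (zero to three octal digits) in a %b operand; all other bytes
   are copied unchanged, which is a simplification. Returns the number of
   converted bytes. out must hold strlen(arg) + 1 bytes; it is also
   NUL-terminated for convenience, but may contain embedded NULs, so the
   count is authoritative. */
static size_t expand_octal(const char *arg, char *out)
{
    assert(arg != NULL && out != NULL);
    size_t n = 0;
    const char *p = arg;
    while (*p != '\0') {
        if (p[0] == '\\' && p[1] == '0') {
            unsigned v = 0;
            int digits = 0;
            p += 2;                          /* skip the "\0" prefix */
            while (digits < 3 && *p >= '0' && *p <= '7') {
                v = v * 8 + (unsigned)(*p - '0');
                p++;
                digits++;
            }
            out[n++] = (char)(unsigned char)v;
        } else {
            out[n++] = *p++;
        }
    }
    out[n] = '\0';
    return n;
}

/* Number of bytes a NUL-terminated path such as printf("%s", buf) emits. */
static size_t bytes_via_percent_s(const char *buf)
{
    return (size_t)snprintf(NULL, 0, "%s", buf);
}
```

For the operand a\0b, expand_octal stores three bytes ('a', NUL, 'b'), but a "%s" conversion of the same buffer accounts for only one of them; fwrite(buf, 1, n, stdout) would write all three.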

shware_systems 2018-08-30 14:43 reporter bugnote:0004093	I think the controlling language here is at line 104121: "The interpretation of a <backslash> followed by any other sequence of characters is unspecified. Bytes from the converted string shall be written until the end of the string or the number of bytes indicated by the precision specification is reached. If the precision is omitted, it shall be taken to be infinite, so all bytes up to the end of the converted string shall be written." which indicates the entire argument shall be processed before applying the precision specifier, with \c taking effect as described for the second option. This leaves open the possibility of other sequences being used as extensions that may convert to one or more final characters subject to the precision, or being treated as a syntax error that aborts without outputting anything. Because of the possibility of embedded <NUL> characters, the conversion has to be treated as a counted string, not a NUL-delimited one, to satisfy the last sentence quoted, since it does not include an explicit 'or a NUL character is encountered'. As an example, a platform might define \u to expand to the current GMT time with microsecond precision, which would be at least 8 characters (HH:MM:SS) replacing the \u. On another platform it might output "Syntax Error" on the next line, without 'a X' or 'a ' at all.

kre 2018-08-30 16:28 reporter bugnote:0004095	Re note 4093. There is a reply on the mailing list to most of that, which is not relevant to the current issue. However, this part: "which indicates the entire argument shall be processed before applying the precision specifier" is incorrect. It indicates nothing of the kind, though, as the Description of the issue says, it is easy to read it that way. All it means is that it is converted bytes from the operand string that are written, not that they must all be converted before any of the string is written.

shware_systems 2018-08-30 18:13 reporter bugnote:0004096	Yes, it does say that... Your interpretation would be phrased "bytes from the argument string, as converted, shall be written as if by putc()", or similarly, to require that piecemeal effect, but it says to take them from the ~wholly~ converted string, not while converting. It is only an indication because the 'wholly' is implied, but I don't see the first interpretation as the intent. In other words, the string has to be fully formed before you can take bytes from it; otherwise it is just a character sequence that may contain invalid code points, as bad escapes or according to the charmap of the current locale. Granted, doing it piecemeal uses less memory, so an implementer might want to read it that first way, but in other parts of the standard it is explicit where this is permitted or required, using phrases like "as if by putc()".

kre 2018-08-30 18:42 reporter bugnote:0004097 Last edited: 2018-09-02 01:50	Re note 4096 ... We could continue the "yes it does", "no it doesn't" debate for ages. That will accomplish nothing. First, I don't have an interpretation; that is the point. The text does not even consider the issue, which is no surprise: for emulating Sys V echo, which is the only reason %b exists, a precision (or field width) makes no sense at all. I agree that if "\c only works when consumed" were the intent, the text would need to be clearer to specify that. But as you need to infer a non-existent "wholly" (which would not be the correct way to fix it, but never mind) to reach the other conclusion, I think we actually agree that the text as it stands is not clear, and needs to be corrected. We can debate what the solution should be (that should happen on the list, not here in notes), but it is plainly obvious that a fix is needed.

joerg 2019-03-18 17:05 reporter bugnote:0004334 Last edited: 2019-03-21 14:56	It seems that all implementations that honor the \c in the argument string first convert the whole argument string into a new allocated string. This way, they know about the \c. The problem with this implementation is that it limits the possible length of the string, as another copy of it is needed. bosh interprets the argument string and keeps a new copy only of the result of the conversion. It thus does not see a \c that appears after "precision" number of characters have been sent to the output.

kre 2019-03-19 00:42 reporter bugnote:0004337	Re 0001202:0004334 Implementation details are not really relevant, but it certainly is not necessary to "convert the whole argument string into a new allocated string", and the NetBSD implementation (which does detect \c even if that "character" is not logically part of the output) does not work that way. But even if it did, it would not (could not rationally) limit the length of the string, except perhaps in the most constrained of implementations, as all the args to printf(1) must appear on the command line, which means their total length is limited by ARG_MAX, which is typically much, much smaller than the amount of memory available in the data or stack space. Making a copy of the string, if that is what an implementation decided to do, would not be an issue to worry about (nor would the performance cost; %b is a very little used conversion operator). The reason the string is scanned before anything is output, and the \c is seen even if it would not be output, is that we need to discover how many characters will eventually be output before outputting any of them, so that the leading padding (as in %20b when the string produces only 8 chars, needing 12 leading spaces) can be calculated. Of course, we could stop processing the string when the precision is reached (%20.6b with the same arg string would not need to look at the final 2 chars that would be produced by the %b conversion), but performance isn't really an issue, and that just complicates the implementation. So we detect, and process, a \c even if it is in one of those last two positions. So this is not really an implementation issue: however one decides to implement it, \c could be processed either way with respect to its magic function. We can treat a \c in the %b arg as "stop all processing" just by its presence in the arg, or we can do that only if that "character" would be output.
Since %b is (I believe) 100% a POSIX invention, it could be specified either way, but wasn't (either way). It should be. No guidance can be had from its inspiration, as SysIII (or SysV) echo had no field width or precision args; it simply always wrote the entire string (up to a \c if one existed). So there's no help there. It also makes no real practical difference, as (unit tests excepted) no-one actually uses field width or precision parameters with %b; it only gets used to make printf an alternative to Sys III (or V) echo. So what should be done is to decide on the most rational specification and simply define it that way. If we need to change our implementation to match that, we will (and I'm sure others will too); such a change would not bother anyone. But we won't change unless the standard says we should.
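The padding argument in kre's note can be sketched as follows. This assumes a hypothetical format_b that, like the implementation described, converts the whole operand before writing anything, so the output length (and any \c) is known before the leading spaces are emitted. It handles only right-justification (no '-' flag), only the \c and \\ escapes, and uses an arbitrary 256-byte conversion buffer; SIZE_MAX stands in for "no precision".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX: pass as prec when no precision is given */
#include <string.h>

/* Format %<width>.<prec>b into out (NUL-terminated). Returns nonzero if a
   \c was seen anywhere in the operand, even beyond the precision.
   out must hold at least width + strlen(arg) + 1 bytes. */
static int format_b(const char *arg, size_t width, size_t prec, char *out)
{
    assert(arg != NULL && out != NULL);
    char conv[256];                  /* arbitrary limit for this sketch */
    size_t n = 0;
    int stop = 0;

    /* Pass 1: convert the whole operand so its length is known. */
    for (const char *p = arg; *p != '\0' && n < sizeof conv; p++) {
        if (p[0] == '\\' && p[1] == 'c') {
            stop = 1;
            break;
        }
        if (p[0] == '\\' && p[1] == '\\')
            p++;                     /* \\ becomes a single backslash */
        conv[n++] = *p;
    }
    if (n > prec)
        n = prec;                    /* precision limits the bytes kept */

    /* Pass 2: emit width - n leading spaces, then the kept bytes. */
    size_t pad = width > n ? width - n : 0;
    memset(out, ' ', pad);
    memcpy(out + pad, conv, n);
    out[pad + n] = '\0';
    return stop;
}
```

With the operand abcdefgh, a width of 20 gives 12 leading spaces, and adding a precision of 6 gives 14 spaces followed by abcdef. Because the operand is scanned in full first, a \c past the precision is still reported, which corresponds to the behaviour the note describes for NetBSD.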

kre 2019-03-19 01:06 reporter bugnote:0004338	Incidentally, one other rational approach to solving this problem would be to simply forbid use of the field width and precision parameters with %b, that is, make it unspecified what happens if either of those (or the left-justification flag) is given. None of those are needed for %b's purpose. That would mean changing (deleting) the text that now exists:

"Bytes from the converted string shall be written until the end of the string or the number of bytes indicated by the precision specification is reached. If the precision is omitted, it shall be taken to be infinite, so all bytes up to the end of the converted string shall be written."

and replacing it with something like:

"Bytes from the converted string shall be written until the end of the string, ignoring any field width or precision specifiers."

or perhaps:

"Bytes from the converted string shall be written until the end of the string. The effect of any field width or precision specifier is unspecified."

Either of those would vastly simplify the implementation.

Another approach would be to handle another missing piece of the specification. Nothing says what the "alternate output format" flag does to 'b' conversions. Its effect on all the other conversions is specified in XBD 5 (for some it is "undefined"), but 'b' does not exist there, only in printf(1), and the spec of printf(1) ignores the # flag completely. (The other flags are handled by XBD 5, which specifies the conversion types they apply to, and implicitly makes them no-ops for the others: '+', for example, only applies to "signed conversions", but there's nothing wrong with %+u or %+s; the '+' simply does nothing.) But '#' is not like that. I doubt there is any good reason, but we could make use of that, and have one of %b and %#b detect and process a \c anywhere it appears in the arg string to the conversion, and the other ignore a \c in that arg string if it would be dropped because of a precision specification.

joerg 2019-03-21 14:58 reporter bugnote:0004340 Last edited: 2019-03-21 14:58	Re 0001202:0004338 There typically is no ARG_MAX limit for printf, since it usually is a builtin command. For this reason, it matters how many characters need to be copied to a newly allocated location.

geoffclare 2019-03-21 15:32 manager bugnote:0004341	Interpretation response
------------------------
The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
When the standard states "bytes from the converted string shall be written" it is not clear whether the string has to be completely converted or can be converted on the fly.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 3113 line 104120 section printf, append to the \c description:

If a precision is specified and the argument contains a '\c' after the point at which the number of bytes indicated by the precision specification have been written, it is unspecified whether the '\c' takes effect.
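The accepted text can be made concrete with a small predicate that reports whether a given %.<prec>b operand falls into the case it leaves unspecified. This is a hypothetical helper (b_backslash_c_unspecified is an invented name), for example for a test suite that wants to avoid depending on either behaviour. It assumes "after the point" includes a \c that immediately follows the last byte the precision allows, and it recognises only \c and \\, counting any other escape as one byte per input byte.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero if a \c in arg is reached only after prec converted bytes have
   been produced, i.e. the case where it is unspecified whether it takes
   effect. Zero if there is no \c or it is reached before that point. */
static int b_backslash_c_unspecified(const char *arg, size_t prec)
{
    assert(arg != NULL);
    size_t n = 0;                    /* converted bytes produced so far */
    for (const char *p = arg; *p != '\0'; p++) {
        if (p[0] == '\\' && p[1] == 'c')
            return n >= prec;        /* \c at or past the precision limit */
        if (p[0] == '\\' && p[1] == '\\')
            p++;                     /* \\ yields a single byte */
        n++;
    }
    return 0;                        /* no \c at all */
}
```

For example, 'a string\c' with a precision of 2 is in the unspecified case, while 'a\c' with a precision of 2 is not, because the \c is reached after only one byte has been written.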

agadmin 2019-03-21 15:48 administrator bugnote:0004344	Interpretation Proposed: 21st March 2019

agadmin 2019-05-09 09:26 administrator bugnote:0004388	Interpretation approved: 9 May 2019

Date Modified	Username	Field	Change
2018-08-29 19:26	kre	New Issue
2018-08-29 19:26	kre	Name	=> Robert Elz
2018-08-29 19:26	kre	Section	=> XCU 4 -- printf
2018-08-29 19:26	kre	Page Number	=> 3113
2018-08-29 19:26	kre	Line Number	=> 104118 - 104120, 104123 - 104126
2018-08-30 12:46	joerg	Note Added: 0004092
2018-08-30 14:43	shware_systems	Note Added: 0004093
2018-08-30 16:28	kre	Note Added: 0004095
2018-08-30 18:13	shware_systems	Note Added: 0004096
2018-08-30 18:42	kre	Note Added: 0004097
2018-09-02 01:50	kre	Note Edited: 0004097
2019-03-18 17:05	joerg	Note Added: 0004334
2019-03-18 17:35	joerg	Note Edited: 0004334
2019-03-19 00:42	kre	Note Added: 0004337
2019-03-19 01:06	kre	Note Added: 0004338
2019-03-21 14:56	joerg	Note Edited: 0004334
2019-03-21 14:58	joerg	Note Added: 0004340
2019-03-21 14:58	joerg	Note Edited: 0004340
2019-03-21 15:32	geoffclare	Note Added: 0004341
2019-03-21 15:33	geoffclare	Interp Status	=> Pending
2019-03-21 15:33	geoffclare	Final Accepted Text	=> 0001202:0004341
2019-03-21 15:33	geoffclare	Status	New => Interpretation Required
2019-03-21 15:33	geoffclare	Resolution	Open => Accepted As Marked
2019-03-21 15:33	geoffclare	Tag Attached: tc3-2008
2019-03-21 15:48	agadmin	Interp Status	Pending => Proposed
2019-03-21 15:48	agadmin	Note Added: 0004344
2019-05-09 09:26	agadmin	Interp Status	Proposed => Approved
2019-05-09 09:26	agadmin	Note Added: 0004388
2019-11-12 15:19	geoffclare	Status	Interpretation Required => Applied
2024-06-11 09:08	agadmin	Status	Applied => Closed
