ID: 0000243
Category: [1003.1(2008)/Issue 7] Shell and Utilities
Severity: Objection
Type: Enhancement Request
Date Submitted: 2010-04-29 19:23
Last Update: 2024-06-11 08:53
Reporter: dwheeler
View Status: public
Assigned To: ajosey
Priority: normal
Resolution: Accepted As Marked
Status: Closed
Name: David A. Wheeler
Organization: IDA
User Reference: (none)
Section: find
Page Number: 2740
Line Number: 89194
Interp Status: ---
Final Accepted Text: Note: 0006110
Summary: 0000243: Add -print0 to "find"

Description:
The POSIX specification and common implementations permit nearly all bytes to appear in pathnames, yet it is surprisingly difficult to process such pathnames portably and correctly. This is one of the more common reasons for security vulnerabilities (see CERT's "Secure Coding" item MSC09-C, CWE 78, CWE 73, and CWE 116, and the 2009 CWE/SANS Top 25 Most Dangerous Programming Errors). For more details about this problem, see:

http://www.dwheeler.com/essays/filenames-in-shell.html
http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

The find command's "-exec...+" was intended to fix this, but it is simply inadequate: it is only practical for trivial commands. It also fails to acknowledge a very common construct, find ... -print0 | xargs -0, which is technically not portable (it is not in the specification) but is in wide use.

The 2008 specification notes that "Other implementations have added other ways to get around this problem, notably a -print0 primary that wrote filenames with a null byte terminator. This was considered here, but not adopted. Using a null terminator meant that any utility that was going to process find's -print0 output had to add a new option to parse the null terminators it would now be reading."

I believe that this decision must be revisited. While it is true that adding null-terminator support means that other extensions are necessary, the POSIX -exec...+ construct is simply inadequate to support robust filename processing. Complex commands are ridiculously unreadable when placed there, for example, and xargs supports other capabilities (such as limiting the number of parameters) that find does not duplicate. Nor should find duplicate xargs; the beauty of POSIX is that different tools can each be good at one job. POSIX should either completely forbid characters such as newline in filenames, or be extended to adequately support such filenames.

The current situation is that it is too hard to *correctly* process filenames, leading to a number of security vulnerabilities. Expecting users and developers to use complicated constructs to handle filenames is unreasonable and dangerous; they should be given a safer, easy-to-use set of constructs for this common case.
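To make the problem concrete, here is a small sketch (not part of the original report) that creates a single file whose name contains a <newline> and counts how many records find produces with -print versus -print0. It assumes mktemp -d is available and that find already supports -print0 (as GNU and BSD find do); the function name demo_newline_ambiguity is hypothetical.

```shell
#!/bin/sh
# Hypothetical demo: one pathname containing <newline> vs. line-based output.
# Assumes mktemp -d and a find(1) that already supports -print0.
demo_newline_ambiguity() {
    dir=$(mktemp -d) || return 1
    # Create exactly ONE regular file whose name embeds a <newline>.
    touch "$dir/a
b"
    # -print: newline-terminated output, so the single name looks like 2 lines.
    lines=$(find "$dir" -type f -print | wc -l)
    # -print0: keep only the null-byte terminators and count them; 1 record.
    records=$(find "$dir" -type f -print0 | tr -dc '\000' | wc -c)
    rm -rf "$dir"
    # $((...)) strips any padding that some wc implementations emit.
    echo "print: $((lines)) lines, print0: $((records)) records"
}

demo_newline_ambiguity
```

A line-oriented consumer of the -print output would therefore see two bogus pathnames where only one file exists.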
Desired Action:
After line 89195, add:

    -print0
        The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output, followed by a null byte.

In lines 89387-89401, delete the now-obsolete text "Other implementations... reading."

In line 89285, append: "(but note that pathnames may include newlines, so you cannot be sure that each line is actually a different pathname)"

In the "STDOUT" section, after line 89257, state:

    The -print0 primary shall cause the current pathnames to be written to standard output, with each pathname terminated by a null byte. The format shall be:

        "%s", <path>

    followed by a null byte for each <path>.

Note that this change is a prerequisite for several other proposals that are necessary to make "find" useful and secure for ALL pathnames permitted by POSIX.
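The proposed STDOUT format ("%s" followed by a null byte) can already be produced with the -exec ... {} + primary that POSIX specifies today, which is a handy fallback where -print0 is missing. The sketch below checks that the two produce byte-identical output for a tree with awkward names; it assumes mktemp -d, cmp, and a find that has -print0 for the comparison, and the helper names print0_fallback and compare_print0 are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: emulate "find DIR -print0" using only -exec ... {} +.
# printf reuses its format for every argument, so each pathname is written
# as "%s" followed by a null byte, the format proposed for -print0.
print0_fallback() {
    find "$1" -exec printf '%s\0' {} +
}

# Compare the fallback with a native -print0 on names containing a space
# and a <newline>. Output files live outside the tree being searched.
compare_print0() {
    tree=$(mktemp -d) || return 1
    out=$(mktemp -d) || return 1
    mkdir "$tree/sub"
    touch "$tree/sub/plain" "$tree/with space" "$tree/with
newline"
    find "$tree" -print0 > "$out/native"
    print0_fallback "$tree" > "$out/fallback"
    if cmp -s "$out/native" "$out/fallback"; then
        echo identical
    else
        echo different
    fi
    rm -rf "$tree" "$out"
}

compare_print0
```

As the report argues, this works but pushes the processing into the -exec argument list, which quickly becomes unreadable for anything beyond a single command.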
Tags: issue8
Notes

(0000882) Don Cragun (manager) 2011-07-06 23:54
The current plan is to add a set of byte values (based on single-byte characters in the C Locale) that will not be allowed in newly created filenames, using 0000251 as the bug to make the changes. If consensus is reached on a resolution for bug 251, the plan is to reject and close bugs 243, 244, and 245. These three bugs will remain open until bug 251 is resolved.
(0001020) dwheeler (reporter) 2011-11-16 18:22
On further reflection, I recommend that bugs 243, 244, and 245 be accepted, regardless of the resolution of bug 251. Adding these capabilities will make it easier to implement portable applications. Most POSIX systems today permit filenames that include anything except NUL (including newline). Even if a future version of POSIX forbids such filenames, there's no guarantee that implementations will move quickly to implement this change. In addition, most application developers will want to develop software that works correctly on both older and newer systems. Technically, older POSIX systems need not implement the features of bugs 243, 244, and 245, but those features are very widely implemented. Adding these capabilities will make many programs, and various widely recommended and used constructs, POSIX-compliant.
(0006091) geoffclare (manager) 2022-12-08 15:39 edited on: 2022-12-09 11:21
It is looking like the group might decide to add find -print0 and related xargs and read features (for reasons I won't go into here). To minimise the delay to draft 3 should this be decided, here are some suggested wording changes. Page and line numbers are for Issue 8 draft 2.1.

On page 2763 line 91806 section find (OPERANDS), change:
    -print
        The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output.
to:
    -print
        The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output, followed by a <newline>.
    -print0
        The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output, followed by a null byte.

On page 2765 line 91869 section find (STDOUT), change:
    current pathnames to be written
to:
    current pathname to be written

After page 2765 line 91871 section find (STDOUT), add:
    The -print0 primary shall cause the current pathname to be written to standard output, followed by a null byte.

On page 2766 line 91911 section find (EXAMPLES), after:
    They both write out the entire directory hierarchy from the current directory.
append:
    With this output format, if any pathnames include <newline> characters, it is not possible to tell where each pathname begins and ends. This problem can be avoided by omitting such pathnames:

        LC_ALL=POSIX find . -name $'*\n*' -prune -o -print

    or by using a sentinel in the pathname that find would never otherwise produce, such as:

        find .//. -print

    or by using -print0 instead of -print and processing the output with a utility that can accept null-terminated pathnames as input, such as xargs with the -0 option or read with -d "", for example:

        find . -print0 | while IFS= read -rd "" file
        do
            # process "$file"
        done

    It should be noted that using find with -print0 is less safe than using find with -exec because if find -print0 is terminated after it has written a partial pathname, the partial pathname will be processed as if it was a complete pathname.

On page 2769 line 92033-92037 section find (RATIONALE), delete:
    Other implementations [...] it would now be reading.

On page 3106 line 105084 section read (SYNOPSIS), change:
    read [-r] var...
to:
    read [-r] [-d delim] var...

On page 3106 line 105088 section read (DESCRIPTION), change:
    By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields.
to:
    By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of either <newline> or the logical line delimiter specified with the -d delim option (if it is used and delim is not <newline>); it is unspecified which. If this excepted character follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and the excepted character shall be removed before splitting the input into fields.
On page 3106 line 105097 section read (DESCRIPTION), change:
    The terminating <newline> (if any) shall be removed from the input
to:
    The terminating logical line delimiter (if any) shall be removed from the input

On page 3106 line 105118 section read (OPTIONS), change:
    The following option is supported:
to:
    The following options shall be supported:

On page 3107 line 105125 section read (STDIN), change:
    The standard input shall be a text file.
to:
    If the -d delim option is not specified, or if it is specified and delim is <newline>, the standard input shall be a text file, except that it can contain lines longer than {LINE_MAX}.

After page 3108 line 105167 section read (APPLICATION USAGE), add two new paragraphs:
    The -d delim option enables reading up to an arbitrary single-byte delimiter. When delim is the null string, the delimiter is the null byte and this allows read to be used to process null-terminated lists of pathnames (as produced by the find -print0 primary), with correct handling of pathnames that contain <newline> characters. Note that in order to specify the null string as the delimiter, -d and delim need to be specified as two separate arguments.

    Implementations differ in their handling of <backslash> for line continuation when -d delim is specified (and delim is not <newline>); some treat <backslash>delim (or <backslash><NUL> if delim is the null string) as a line continuation, whereas others still treat <backslash><newline> as a line continuation. Consequently, portable applications need to specify -r whenever they specify -d delim (and delim is not <newline>).
On page 3108 line 105186 section read (RATIONALE), change:
    Although the standard input is required to be a text file
to:
    Although the standard input is required to be a text file (without the {LINE_MAX} limit) when the logical line delimiter is <newline>

On page 3365 line 114578 section xargs (SYNOPSIS), change:
    [-E eofstr]
to:
    [-E eofstr|-0]

On page 3365 line 114593 section xargs (DESCRIPTION), change:
    The application shall ensure that arguments in the standard input are separated by unquoted <blank> characters, unescaped <blank> characters, or <newline> characters. A string of zero or more non-double-quote ('"') characters and non-<newline> characters can be quoted by enclosing them in double-quotes. A string of zero or more non-<apostrophe> ('\'') characters and non-<newline> characters can be quoted by enclosing them in <apostrophe> characters. Any unquoted character can be escaped by preceding it with a <backslash>. The utility named by utility shall be executed one or more times until the end-of-file is reached or the logical end-of file string is found. The results are unspecified if the utility named by utility attempts to read from its standard input.
to:
    If the -0 option is not specified, the application shall ensure that arguments in the standard input are separated by unquoted <blank> characters, unescaped <blank> characters, or <newline> characters, and quoting characters shall be interpreted as follows:

On page 3365 line 114612 section xargs (OPTIONS -E), change:
    If -E is not specified
to:
    If neither -E nor -0 is specified

On page 3365 line 114617 section xargs (OPTIONS -I), change:
    Insert mode: utility is executed for each logical line from standard input. Arguments in the standard input shall be separated only by unescaped <newline> characters, not by <blank> characters. Any unquoted unescaped <blank> characters at the beginning of each line shall be ignored.
to:
    Insert mode: invoke utility for each argument from standard input. If -0 is not specified, arguments in the standard input shall be separated only by unescaped <newline> characters, not by <blank> characters, and any unquoted unescaped <blank> characters at the beginning of each line shall be ignored.

On page 3366 line 114625 section xargs (OPTIONS -L), change:
    The utility shall be executed for each non-empty number lines of arguments from standard input. The last invocation of utility shall be with fewer lines of arguments if fewer than number remain. A line is considered to end with the first <newline> unless the last character of the line is an unescaped <blank>; a trailing unescaped <blank> signals continuation to the next non-empty line, inclusive.
to:
    Invoke utility for each set of number arguments from standard input. The last invocation of utility shall be with fewer arguments if fewer than number remain. If the -0 option is not specified, each line in the standard input shall be treated as containing one argument except that empty lines shall be ignored and a line ending with a trailing unescaped <blank> shall signal continuation to the next non-empty line, inclusive; such continuation shall result in removal of all trailing unescaped <blank> characters and all <newline> characters that immediately follow them from the argument.

On page 3366 line 114644 section xargs (OPTIONS -s), change:
    The total number of lines exceeds that specified by the -L option.
to:
    The total number of arguments exceeds that specified by the -L option.

After page 3366 line 114655 section xargs (OPTIONS), add:
    -0
        Use a null byte as the input argument delimiter and do not treat any other input bytes as special.
    If the mutually exclusive -0 and -E eofstr options are both specified, the behavior is unspecified, except that if eofstr is the null string the behavior shall be the same as if -0 was specified without -E eofstr.

On page 3367 line 114664 section xargs (STDIN), change:
    The standard input shall be a text file. The results are unspecified if an end-of-file condition is detected immediately following an escaped <newline>.
to:
    If the -0 option is not specified, the standard input shall be a text file and the results are unspecified if an end-of-file condition is detected immediately following an escaped <newline>.

On page 3368 line 114722 section xargs (APPLICATION USAGE), change:
    Note that since input is parsed as lines, ...
to:
    Note that since (if -0 is not specified) input is parsed as lines, ...

On page 3368 line 114726 section xargs (APPLICATION USAGE), change:
    This can be solved by ...
to:
    This can be solved by using the -print0 primary of find together with the xargs -0 option, or by ...
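The sentinel approach in the EXAMPLES text works because a pathname component cannot contain '/', so find never writes "//" except in the ".//." prefix: any output line that does not begin with ".//." must continue a name that contained a <newline>. A minimal POSIX sh sketch that reassembles such output into null-terminated pathnames could look like this; the function names from_sentinel and demo_sentinel are hypothetical, and mktemp -d is assumed.

```shell
#!/bin/sh
# Hypothetical helper: read "find .//. -print" output on stdin and write each
# reassembled pathname followed by a null byte, mapping ".//." back to ".".
from_sentinel() {
    have=0
    while IFS= read -r line; do
        case $line in
            .//.*)
                # Sentinel found: a new pathname starts; flush the previous one.
                if [ "$have" -eq 1 ]; then printf '%s\0' "$cur"; fi
                cur=.${line#.//.}
                have=1 ;;
            *)
                # No sentinel: this line continues a name with an embedded <newline>.
                cur=$cur'
'$line ;;
        esac
    done
    if [ "$have" -eq 1 ]; then printf '%s\0' "$cur"; fi
}

demo_sentinel() {
    dir=$(mktemp -d) || return 1
    touch "$dir/one" "$dir/two
lines"
    # Show each recovered name on one line, drawing its <newline> as "|".
    (cd "$dir" && find .//. -type f -print) |
        from_sentinel | tr '\n\000' '|\n' | LC_ALL=C sort
    rm -rf "$dir"
}

demo_sentinel
```

A name ending in <newline> also survives: the following empty line is treated as a continuation and re-appended.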
(0006092) stephane (reporter) 2022-12-08 16:21 edited on: 2022-12-08 16:23
> find . ! -name \*'$\n'\* -print

Should be:

    LC_ALL=C find . -name $'*\n*' -prune -o -print

> LC_ALL=POSIX read -d "" -r file

Should be:

    IFS= read -rd '' file

I don't know of any shell where LC_ALL=POSIX will make a difference. The -r and IFS= are needed in all of them, though. Even in yash, the only shell that does care about proper text encoding:

    $ printf 'a\200b\n' | { LC_ALL=C IFS= read r a; printf '<%s>\n' "$a"; }
    read: cannot read input: Invalid or incomplete multibyte or wide character
    <>
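The $'...' quoting in the corrected command is itself new in Issue 8. In shells without it, the same pattern can be built from a variable holding a literal <newline>. A minimal sketch, keeping LC_ALL=C as suggested so that bytes which are not valid characters in the user's locale can still be matched by *; the function names list_without_newlines and demo_prune are hypothetical, and mktemp -d is assumed.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under DIR, skipping (and not
# descending into) any file or directory whose name contains a <newline>.
# A variable holding a literal <newline> replaces $'\n' for older shells.
list_without_newlines() {
    nl='
'
    LC_ALL=C find "$1" -name "*$nl*" -prune -o -type f -print
}

demo_prune() {
    dir=$(mktemp -d) || return 1
    touch "$dir/ok" "$dir/bad
name"
    mkdir "$dir/sub
dir"
    touch "$dir/sub
dir/hidden"
    # Run relative to $dir so the surviving name is stable: only ./ok.
    (cd "$dir" && list_without_newlines .)
    rm -rf "$dir"
}

demo_prune
```

Because -prune stops the descent, files inside a directory whose own name contains a <newline> are omitted as well.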
(0006093) stephane (reporter) 2022-12-08 16:32 edited on: 2022-12-08 17:02
One of the issues with find -print0 | xargs -0 cmd, and one that can make it less safe than find -exec cmd {} +, is that if find is killed for some reason, or more generally if xargs' input is truncated, you may end up passing the wrong path to cmd, because current xargs implementations that support -0 don't require the last record to be terminated by a null byte. For instance:

    LC_ALL=C find /var/tmp -name '*.tmp' -type d -prune -print0 | xargs -r0 rm -rf

could end up running rm -rf /var if find gets killed (for example, because it exceeded some resource limit) just after it has output a block that happened to end on /var or /var/.

I don't know if we can do anything about that, as mandating the terminating null byte would likely break some existing applications. (That -r should also be added, IMO.)
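The failure mode described above is easy to reproduce without deleting anything. This sketch feeds xargs a stream whose last record was cut off before its null byte and uses printf as a harmless stand-in for rm -rf. It assumes an xargs that already accepts -0 and -r (GNU xargs is assumed here); truncated_input is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical stand-in for find -print0 output that was cut off mid-record:
# the first pathname is complete, but a later one was truncated to "/var"
# and its terminating null byte never arrived.
truncated_input() {
    printf '%s\0' /var/tmp/a.tmp
    printf '%s' /var
}

# Instead of "xargs -r0 rm -rf", just show which arguments would be passed.
truncated_input | xargs -r0 printf '<%s>\n'
```

With GNU xargs, at least, both </var/tmp/a.tmp> and </var> are printed: the unterminated trailing bytes become the final argument.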
(0006094) geoffclare (manager) 2022-12-09 10:50
I have edited Note: 0006091 to address points raised in Note: 0006092 and Note: 0006093. The changes made were to change the example find -name and read -d "" commands along the lines suggested, to add a note there about the safety of find -print0, and to update the addition to read APPLICATION USAGE (at line 105167) to insert "If IFS is not set to the null string" in the last sentence.

The behaviour of yash seen in Note: 0006092 is likely not yash's fault: it is probably calling a non-conforming library function to do a multi-byte to wide character conversion.
(0006095) stephane (reporter) 2022-12-09 12:09
To clarify my previous comment, I find that LC_ALL=C or LC_ALL=POSIX is not needed in the specific case of IFS= read -rd '' var, but that's not necessarily the case if $IFS is not empty, or -r is not supplied, or for other values of delimiters (even single-byte ones).

I find ksh93u+m (one of the ksh93 forks with read -d '' support; I've not tested others) and zsh quite buggy; I'm busy raising bug reports ATM. It may be worth specifying that IFS= read -rd '' var should be able to read arbitrary byte values into a variable.

About yash, I think it's rather, or also, that yash doesn't support changing the locale's charmap midway through a script (within a shell invocation). For a shell that always works on characters, that's hardly surprising. If, from within a UTF-8 locale, printf '\200' | LC_ALL=C read var worked, where 0x80 is not a defined character in most C locales, what would a subsequent printf %s "$var" output when the charmap is back to UTF-8? The UTF-8 encoding of some undefined character? And there's the reverse problem if calling LC_ALL=C.UTF-8 read from within a locale where the charmap has fewer characters.

It's true, though, that on my system, printf '\200' | LC_ALL=C yash -c 'read var' fails because mbrtowc() fails with EILSEQ, which is not allowed by POSIX. In any case, yash can only be used with text data encoded in the charmap of the locale that was in effect when yash was invoked. In the C locale, on GNU systems at least (where wchar_t uses the Unicode code point), it can only deal with ASCII.
(0006100) geoffclare (manager) 2023-01-09 16:20 edited on: 2023-01-12 09:55
Page and line numbers are for Issue 8 draft 2.1.

On page 2763 line 91806 section find (OPERANDS), change:
    -print
        The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output.
to:
    -print
        The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output, followed by a <newline>.
    -print0
        The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output, followed by a null byte.

On page 2765 line 91869 section find (STDOUT), change:
    current pathnames to be written
to:
    current pathname to be written

After page 2765 line 91871 section find (STDOUT), add:
    The -print0 primary shall cause the current pathname to be written to standard output, followed by a null byte.

On page 2766 line 91911 section find (EXAMPLES), after:
    They both write out the entire directory hierarchy from the current directory.
append:
    With this output format, if any pathnames include <newline> characters, it is not possible to tell where each pathname begins and ends. This problem can be avoided by omitting such pathnames:

        LC_ALL=POSIX find . -name $'*\n*' -prune -o -print

    or by using a sentinel in the pathname that find would never otherwise produce, such as:

        find .//. -print

    or by using -print0 instead of -print and processing the output with a utility that can accept null-terminated pathnames as input, such as xargs with the -0 option or read with -d "", for example:

        find . -print0 | while IFS= read -rd "" file
        do
            # process "$file"
        done

    It should be noted that using find with -print0 to pipe input to xargs -0 is less safe than using find with -exec because if find -print0 is terminated after it has written a partial pathname, the partial pathname will be processed as if it was a complete pathname.

On page 2769 line 92033-92037 section find (RATIONALE), delete:
    Other implementations [...] it would now be reading.
On page 3106 line 105084 section read (SYNOPSIS), change:
    read [-r] var...
to:
    read [-r] [-d delim] var...

On page 3106 line 105088 section read (DESCRIPTION), change:
    By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields.
to:
    By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of either <newline> or the logical line delimiter specified with the -d delim option (if it is used and delim is not <newline>); it is unspecified which. If this excepted character follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and the excepted character shall be removed before splitting the input into fields.

On page 3106 line 105097 section read (DESCRIPTION), change:
    The terminating <newline> (if any) shall be removed from the input
to:
    The terminating logical line delimiter (if any) shall be removed from the input

After page 3106 line 105115 section read (DESCRIPTION), add:
    If end-of-file is detected before a terminating logical line delimiter is encountered, the variables specified by the var operands shall be set as described above and the exit status shall be 1.
On page 3106 line 105118 section read (OPTIONS), change:
    The following option is supported:
to:
    The following options shall be supported:

On page 3107 line 105125 section read (STDIN), change:
    The standard input shall be a text file.
to:
    If the -d delim option is not specified, or if it is specified and delim consists of one single-byte character, the standard input shall contain zero or more characters and shall not contain any null bytes.

After page 3108 line 105167 section read (APPLICATION USAGE), add two new paragraphs:
    The -d delim option enables reading up to an arbitrary single-byte delimiter. When delim is the null string, the delimiter is the null byte and this allows read to be used to process null-terminated lists of pathnames (as produced by the find -print0 primary), with correct handling of pathnames that contain <newline> characters. Note that in order to specify the null string as the delimiter, -d and delim need to be specified as two separate arguments.

    Implementations differ in their handling of <backslash> for line continuation when -d delim is specified (and delim is not <newline>); some treat <backslash>delim (or <backslash><NUL> if delim is the null string) as a line continuation, whereas others still treat <backslash><newline> as a line continuation. Consequently, portable applications need to specify -r whenever they specify -d delim (and delim is not <newline>).

On page 3108 line 105186 section read (RATIONALE), change:
    Although the standard input is required to be a text file, and therefore will always end with a <newline> (unless it is an empty file), the processing of continuation lines when the -r option is not used can result in the input not ending with a <newline>. This occurs if the last line of the input file ends with a <backslash> <newline>. It is for this reason that "if any" is used in "The terminating <newline> (if any) shall be removed from the input" in the description. It is not a relaxation of the requirement for standard input to be a text file.
to:
    Earlier versions of this standard required the standard input to be a text file, and therefore the results were undefined if the input was not empty and end-of-file was detected before a <newline> character was encountered. However, all of the most popular shell implementations have been found to have consistent behavior in this case, and so the behavior is now specified and the requirement for standard input to be a text file has been relaxed to allow non-empty input that does not end with a <newline>.

On page 3365 line 114578 section xargs (SYNOPSIS), change:
    [-E eofstr]
to:
    [-E eofstr|-0]

On page 3365 line 114593 section xargs (DESCRIPTION), change:
    The application shall ensure that arguments in the standard input are separated by unquoted <blank> characters, unescaped <blank> characters, or <newline> characters. A string of zero or more non-double-quote ('"') characters and non-<newline> characters can be quoted by enclosing them in double-quotes. A string of zero or more non-<apostrophe> ('\'') characters and non-<newline> characters can be quoted by enclosing them in <apostrophe> characters. Any unquoted character can be escaped by preceding it with a <backslash>. The utility named by utility shall be executed one or more times until the end-of-file is reached or the logical end-of file string is found. The results are unspecified if the utility named by utility attempts to read from its standard input.
to:
    If the -0 option is not specified, the application shall ensure that arguments in the standard input are separated by unquoted <blank> characters, unescaped <blank> characters, or <newline> characters, and quoting characters shall be interpreted as follows:

On page 3365 line 114612 section xargs (OPTIONS -E), change:
    If -E is not specified
to:
    If neither -E nor -0 is specified

On page 3365 line 114617 section xargs (OPTIONS -I), change:
    Insert mode: utility is executed for each logical line from standard input. Arguments in the standard input shall be separated only by unescaped <newline> characters, not by <blank> characters. Any unquoted unescaped <blank> characters at the beginning of each line shall be ignored.
to:
    Insert mode: invoke utility for each argument from standard input. If -0 is not specified, arguments in the standard input shall be separated only by unescaped <newline> characters, not by <blank> characters, and any unquoted unescaped <blank> characters at the beginning of each line shall be ignored.

On page 3366 line 114625 section xargs (OPTIONS -L), change:
    The utility shall be executed for each non-empty number lines of arguments from standard input. The last invocation of utility shall be with fewer lines of arguments if fewer than number remain. A line is considered to end with the first <newline> unless the last character of the line is an unescaped <blank>; a trailing unescaped <blank> signals continuation to the next non-empty line, inclusive.
to:
    Invoke utility for each set of number arguments from standard input. The last invocation of utility shall be with fewer arguments if fewer than number remain. If the -0 option is not specified, each line in the standard input shall be treated as containing one argument except that empty lines shall be ignored and a line ending with a trailing unescaped <blank> shall signal continuation to the next non-empty line, inclusive; such continuation shall result in removal of all trailing unescaped <blank> characters and all <newline> characters that immediately follow them from the argument.

On page 3366 line 114644 section xargs (OPTIONS -s), change:
    The total number of lines exceeds that specified by the -L option.
to:
    The total number of arguments exceeds that specified by the -L option.

After page 3366 line 114655 section xargs (OPTIONS), add:
    -0
        Use a null byte as the input argument delimiter and do not treat any other input bytes as special.
    If the mutually exclusive -0 and -E eofstr options are both specified, the behavior is unspecified, except that if eofstr is the null string the behavior shall be the same as if -0 was specified without -E eofstr.

On page 3367 line 114664 section xargs (STDIN), change:
    The standard input shall be a text file. The results are unspecified if an end-of-file condition is detected immediately following an escaped <newline>.
to:
    If the -0 option is not specified, the standard input shall be a text file and the results are unspecified if an end-of-file condition is detected immediately following an escaped <newline>.

On page 3368 line 114722 section xargs (APPLICATION USAGE), change:
    Note that since input is parsed as lines, ...
to:
    Note that since (if -0 is not specified) input is parsed as lines, ...

On page 3368 line 114726 section xargs (APPLICATION USAGE), change:
    This can be solved by ...
to:
    This can be solved by using the -print0 primary of find together with the xargs -0 option, or by ...
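To illustrate the revised -I wording: without -0, each input line becomes one argument (blanks inside a line are kept, but a <newline> always ends the argument); with -0, only the null byte delimits, so an embedded <newline> survives. This sketch assumes an xargs that already implements -0 (GNU xargs, for example), and its input deliberately avoids quotes and backslashes, which are still special without -0.

```shell
#!/bin/sh
# Line mode: "x y" stays one argument (blanks do not delimit under -I),
# but the <newline> inside "mul<newline>ti" splits it into two invocations.
printf 'x y\nmul\nti\n' | xargs -I{} printf '<%s>\n' {}

# -0 mode: only null bytes delimit, so "mul<newline>ti" is one argument.
printf 'x y\0mul\nti\0' | xargs -0 -I{} printf '<%s>\n' {}
```

The first pipeline runs printf three times, the second only twice.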
(0006105) geoffclare (manager) 2023-01-10 10:08
Reopening because, as discussed on the mailing list, the xargs DESCRIPTION text is not quite right.
(0006106) geoffclare (manager) 2023-01-10 10:32
Revised proposal for the xargs DESCRIPTION change:
    If the -0 option is not specified, the application shall ensure that arguments in the standard input are delimited by unquoted <blank> characters, unescaped <blank> characters, or <newline> characters, and quoting characters shall be interpreted as follows:

The use of "separated" in the text for the -I option should also change to "delimited".

In addition, at the end of the find EXAMPLES addition, this text:
    the partial pathname will be processed as if it was a complete pathname.
should say "may" instead of "will".
(0006107) geoffclare (manager) 2023-01-10 14:46 edited on: 2023-01-10 15:55
Another point raised on the mailing list is that xargs -0 is typically used with -r, so it would make sense to add -r as well. Here are some suggested additional changes for that...

In the find EXAMPLES change, this text:
    to pipe input to xargs -0
should instead be:
    to pipe input to xargs -r0

In the last paragraph of the xargs DESCRIPTION change, this sentence:
    The utility named by utility shall be executed one or more times until the end-of-file is reached or the logical end-of file string is found.
should instead be these two:
    The utility named by utility shall be executed zero or more times until the end-of-file is reached or the logical end-of file string is found. If no arguments are supplied on standard input, the utility named by utility shall be executed zero times if the -r option is specified and shall be executed exactly once if the -r option is not specified.

Extra changes to add...

On page 3365 line 114578 section xargs (SYNOPSIS), change:
    [-ptx]
to:
    [-prtx]

After page 3366 line 114639 section xargs (OPTIONS), add:
    -r
        Do not execute the utility named by utility if no arguments are supplied on standard input.

On page 3368 line 114707 section xargs (EXIT STATUS), change:
    All invocations of utility returned exit status zero.
to:
    Successful completion.
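A short sketch of why -r pairs naturally with -0, assuming an xargs that follows the proposed wording (GNU xargs is assumed here): without -r the utility still runs once on empty input, which is how find ... -print0 | xargs -0 ls -l ends up listing the current directory when nothing matched. The last line also shows why the EXIT STATUS text needed rewording: with -r and no input there are no invocations at all, yet xargs should still report success.

```shell
#!/bin/sh
# Without -r, the utility runs exactly once even when stdin is empty.
printf '' | xargs echo ran-without-r

# With -r, the utility runs zero times and nothing is printed.
printf '' | xargs -r echo ran-with-r

# No invocation of "false" happens, so xargs itself exits with status 0.
printf '' | xargs -r false
echo "status: $?"
```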
(0006108) dwheeler (reporter) 2023-01-10 16:00
First: My thanks to everyone for reconsidering and moving toward acceptance of this proposal! These changes will make it a little easier to write secure portable software.

It's a fair point that trailing data without a terminating \0 could suggest partial data & thus perhaps should be ignored. However, while the current text *allows* addressing this, it doesn't *encourage* addressing this, so I don't think it encourages safe implementations. I have a minor suggestion: use IETF-like language to clarify this, to encourage "better" behavior. That is, change this:

> If the standard input is not empty and does not end with a null byte, it is unspecified whether the trailing non-null bytes are ignored or are used as the last argument passed to utility.

Into this:

> If the standard input is not empty and does not end with a null byte, an implementation should ignore the trailing non-null bytes (as this can signal incomplete data) but may use them as the last argument passed to utility.

Thanks!
(0006110) geoffclare (manager) 2023-01-12 09:56 edited on: 2023-01-12 16:22 |
The following is a copy of Note: 0006100 with the updates suggested in the subsequent notes applied. Page and line numbers are for Issue 8 draft 2.1.

On page 2763 line 91806 section find (OPERANDS), change:
    -print
        The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output.
to:
    -print
        The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output, followed by a <newline>.
    -print0
        The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output, followed by a null byte.

On page 2765 line 91869 section find (STDOUT), change:
    current pathnames to be written
to:
    current pathname to be written

After page 2765 line 91871 section find (STDOUT), add:
    The -print0 primary shall cause the current pathname to be written to standard output, followed by a null byte.

On page 2766 line 91911 section find (EXAMPLES), after:
    They both write out the entire directory hierarchy from the current directory.
append:
    With this output format, if any pathnames include <newline> characters, it is not possible to tell where each pathname begins and ends. This problem can be avoided by omitting such pathnames:

        LC_ALL=POSIX find . -name $'*\n*' -prune -o -print

    or by using a sentinel in the pathname that find would never otherwise produce, such as:

        find .//. -print

    or by using -print0 instead of -print and processing the output with a utility that can accept null-terminated pathnames as input, such as xargs with the -0 option or read with -d "", for example:

        find . -print0 | while IFS= read -rd "" file
        do
            # process "$file"
        done

    It should be noted that using find with -print0 to pipe input to xargs -r0 is less safe than using find with -exec because if find -print0 is terminated after it has written a partial pathname, the partial pathname may be processed as if it was a complete pathname.
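The three remedies in the EXAMPLES text can be compared by counting pathnames in a scratch tree that holds one file whose name contains a <newline>. This sketch assumes bash (for `$'...'` quoting), `mktemp -d`, and find/tr implementations that accept `-print0` and `'\0'`; the directory and file names are hypothetical. `-type f` is added so that only the two files are counted.

```shell
#!/usr/bin/env bash
# Scratch tree: one ordinary file and one whose name has a <newline>.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
: > "$dir/plain"
: > "$dir/two"$'\n'"lines"

# Plain -print: counting lines miscounts, because the embedded
# <newline> makes one pathname look like two.
by_print=$(cd "$dir" && find . -type f -print | wc -l)

# Omit such pathnames: the <newline> file is pruned, only "plain" is written.
by_prune=$(cd "$dir" && LC_ALL=POSIX find . -name $'*\n*' -prune -o -type f -print | wc -l)

# Sentinel: every pathname starts with ".//.", and a filename component
# can never contain "/", so only genuine pathname starts match.
by_sentinel=$(cd "$dir" && find .//. -type f -print | grep -c '^\.//\.')

# -print0: count null terminators instead of lines.
by_print0=$(cd "$dir" && find . -type f -print0 | tr -cd '\0' | wc -c)

echo "print=$((by_print)) prune=$((by_prune)) sentinel=$((by_sentinel)) print0=$((by_print0))"
# prints print=3 prune=1 sentinel=2 print0=2
```

Only the sentinel and -print0 approaches account for every pathname; the prune approach is correct too, but only because it deliberately drops the awkward names.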
On page 2769 line 92033-92037 section find (RATIONALE), delete:
    Other implementations [...] it would now be reading.

On page 3106 line 105084 section read (SYNOPSIS), change:
    read [-r] var...
to:
    read [-r] [-d delim] var...

On page 3106 line 105088 section read (DESCRIPTION), change:
    By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields.
to:
    By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of either <newline> or the logical line delimiter specified with the -d delim option (if it is used and delim is not <newline>); it is unspecified which. If this excepted character follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and the excepted character shall be removed before splitting the input into fields.

On page 3106 line 105097 section read (DESCRIPTION), change:
    The terminating <newline> (if any) shall be removed from the input
to:
    The terminating logical line delimiter (if any) shall be removed from the input

After page 3106 line 105115 section read (DESCRIPTION), add:
    If end-of-file is detected before a terminating logical line delimiter is encountered, the variables specified by the var operands shall be set as described above and the exit status shall be 1.
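The new read DESCRIPTION text can be observed in a shell that already implements `read -d`; bash is assumed here. `read_fields` is a hypothetical helper that reads colon-terminated fields and reports read's exit status for each call: the last, unterminated field is still assigned even though read returns 1. The final lines contrast `-r` with its absence, where the <backslash> acts as an escape and is removed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ":"-terminated fields from stdin and print
# "status=field" for each call to read.
read_fields() {
  local field status
  while :; do
    IFS= read -r -d : field
    status=$?
    printf '%s=%s\n' "$status" "$field"
    # Non-zero status: end-of-file came before another ":".
    [ "$status" -eq 0 ] || break
  done
}

printf 'alpha:beta:gamma' | read_fields
# prints:
# 0=alpha
# 0=beta
# 1=gamma

# Without -r, <backslash> escapes the next character and is removed.
with_r=$(printf 'a\\b:' | { IFS= read -r -d : v; printf '%s' "$v"; })
without_r=$(printf 'a\\b:' | { IFS= read -d : v; printf '%s' "$v"; })
echo "with -r: $with_r  without -r: $without_r"
# prints with -r: a\b  without -r: ab
```

Because line continuation with a non-<newline> delim varies between shells, the sketch passes `-r` whenever it relies on exact field contents.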
On page 3106 line 105118 section read (OPTIONS), change:
    The following option is supported:
to:
    The following options shall be supported:

On page 3107 line 105125 section read (STDIN), change:
    The standard input shall be a text file.
to:
    If the -d delim option is not specified, or if it is specified and delim consists of one single-byte character, the standard input shall contain zero or more characters and shall not contain any null bytes.

After page 3108 line 105167 section read (APPLICATION USAGE), add two new paragraphs:
    The -d delim option enables reading up to an arbitrary single-byte delimiter. When delim is the null string, the delimiter is the null byte and this allows read to be used to process null-terminated lists of pathnames (as produced by the find -print0 primary), with correct handling of pathnames that contain <newline> characters. Note that in order to specify the null string as the delimiter, -d and delim need to be specified as two separate arguments.

    Implementations differ in their handling of <backslash> for line continuation when -d delim is specified (and delim is not <newline>); some treat <backslash>delim (or <backslash><NUL> if delim is the null string) as a line continuation, whereas others still treat <backslash><newline> as a line continuation. Consequently, portable applications need to specify -r whenever they specify -d delim (and delim is not <newline>).

On page 3108 line 105186 section read (RATIONALE), change:
    Although the standard input is required to be a text file, and therefore will always end with a <newline> (unless it is an empty file), the processing of continuation lines when the -r option is not used can result in the input not ending with a <newline>. This occurs if the last line of the input file ends with a <backslash> <newline>. It is for this reason that "if any" is used in "The terminating <newline> (if any) shall be removed from the input" in the description.
    It is not a relaxation of the requirement for standard input to be a text file.
to:
    Earlier versions of this standard required the standard input to be a text file, and therefore the results were undefined if the input was not empty and end-of-file was detected before a <newline> character was encountered. However, all of the most popular shell implementations have been found to have consistent behavior in this case, and so the behavior is now specified and the requirement for standard input to be a text file has been relaxed to allow non-empty input that does not end with a <newline>.

On page 3365 line 114578 section xargs (SYNOPSIS), change:
    [-ptx]
to:
    [-prtx]

On page 3365 line 114578 section xargs (SYNOPSIS), change:
    [-E eofstr]
to:
    [-E eofstr|-0]

On page 3365 line 114593 section xargs (DESCRIPTION), change:
    The application shall ensure that arguments in the standard input are separated by unquoted <blank> characters, unescaped <blank> characters, or <newline> characters. A string of zero or more non-double-quote ('"') characters and non-<newline> characters can be quoted by enclosing them in double-quotes. A string of zero or more non-<apostrophe> ('\'') characters and non-<newline> characters can be quoted by enclosing them in <apostrophe> characters. Any unquoted character can be escaped by preceding it with a <backslash>. The utility named by utility shall be executed one or more times until the end-of-file is reached or the logical end-of file string is found.
    The results are unspecified if the utility named by utility attempts to read from its standard input.
to:
    If the -0 option is not specified, the application shall ensure that arguments in the standard input are delimited by unquoted <blank> characters, unescaped <blank> characters, or <newline> characters, and quoting characters shall be interpreted as follows:

On page 3365 line 114612 section xargs (OPTIONS -E), change:
    If -E is not specified
to:
    If neither -E nor -0 is specified

On page 3365 line 114617 section xargs (OPTIONS -I), change:
    Insert mode: utility is executed for each logical line from standard input. Arguments in the standard input shall be separated only by unescaped <newline> characters, not by <blank> characters. Any unquoted unescaped <blank> characters at the beginning of each line shall be ignored.
to:
    Insert mode: invoke utility for each argument from standard input. If -0 is not specified, arguments in the standard input shall be delimited only by unescaped <newline> characters, not by <blank> characters, and any unquoted unescaped <blank> characters at the beginning of each line shall be ignored.

On page 3366 line 114625 section xargs (OPTIONS -L), change:
    The utility shall be executed for each non-empty number lines of arguments from standard input. The last invocation of utility shall be with fewer lines of arguments if fewer than number remain. A line is considered to end with the first <newline> unless the last character of the line is an unescaped <blank>; a trailing unescaped <blank> signals continuation to the next non-empty line, inclusive.
to:
    Invoke utility for each set of number arguments from standard input. The last invocation of utility shall be with fewer arguments if fewer than number remain.
    If the -0 option is not specified, each line in the standard input shall be treated as containing one argument except that empty lines shall be ignored and a line ending with a trailing unescaped <blank> shall signal continuation to the next non-empty line, inclusive; such continuation shall result in removal of all trailing unescaped <blank> characters and all <newline> characters that immediately follow them from the argument.

After page 3366 line 114639 section xargs (OPTIONS), add:
    -r
        Do not execute the utility named by utility if no arguments are supplied on standard input.

On page 3366 line 114644 section xargs (OPTIONS -s), change:
    The total number of lines exceeds that specified by the -L option.
to:
    The total number of arguments exceeds that specified by the -L option.

After page 3366 line 114655 section xargs (OPTIONS), add:
    -0
        Use a null byte as the input argument delimiter and do not treat any other input bytes as special.
        If the mutually exclusive -0 and -E eofstr options are both specified, the behavior is unspecified, except that if eofstr is the null string the behavior shall be the same as if -0 was specified without -E eofstr.

On page 3367 line 114664 section xargs (STDIN), change:
    The standard input shall be a text file. The results are unspecified if an end-of-file condition is detected immediately following an escaped <newline>.
to:
    If the -0 option is not specified, the standard input shall be a text file and the results are unspecified if an end-of-file condition is detected immediately following an escaped <newline>.

On page 3368 line 114707 section xargs (EXIT STATUS), change:
    All invocations of utility returned exit status zero.
to:
    Successful completion.

On page 3368 line 114722 section xargs (APPLICATION USAGE), change:
    Note that since input is parsed as lines, ...
to:
    Note that since input is parsed as lines (if -0 is not specified), ...
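The xargs changes above (the -0 delimiter, with no other byte treated as special, and -I and -L counting arguments rather than lines when -0 is given) can be compared side by side. This sketch assumes GNU xargs and an external printf that reuses its format for surplus arguments, as POSIX printf does; the sample strings are arbitrary.

```shell
#!/usr/bin/env bash
# Default parsing: <blank>s split arguments and <backslash> escapes
# the next character (so "back\slash" becomes "backslash").
default=$(printf '%s\n' 'my file' 'back\slash' | xargs printf '[%s]')

# -0: only the null byte delimits; blanks, quotes and backslashes
# are passed through unchanged.
nul=$(printf '%s\0' 'my file' 'back\slash' "it's" | xargs -0 printf '[%s]')

# -0 with -I: one whole argument is substituted per invocation.
insert=$(printf '%s\0' 'a b' 'c' | xargs -0 -I{} printf '<%s>' {})

# -0 with -L 2: two arguments per invocation; the last run gets the rest.
grouped=$(printf '%s\0' 'a' 'b' 'c' | xargs -0 -L 2 echo)

printf '%s\n' "$default" "$nul" "$insert" "$grouped"
# prints:
# [my][file][backslash]
# [my file][back\slash][it's]
# <a b><c>
# a b
# c
```

Note that `"it's"` is deliberately left out of the default-parsing line: without -0, the unmatched <apostrophe> would be treated as the start of a quoted string.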
On page 3368 line 114726 section xargs (APPLICATION USAGE), change:
    This can be solved by ...
to:
    This can be solved by using the -print0 primary of find together with the xargs -0 option, or by ...

On page 3370 line 114830 section xargs (FUTURE DIRECTIONS), change "None" to:
    A future version of this standard may require that, when the -0 option is specified, if the standard input is not empty and does not end with a null byte, xargs ignores the trailing non-null bytes. |
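The APPLICATION USAGE problem can be reproduced with a file name containing a <space>: plain -print piped to xargs splits it, while -print0 with xargs -0, like -exec ... +, passes it as one argument. The scratch directory and file names are hypothetical, and the sketch assumes the directory created by `mktemp -d` itself contains no blanks.

```shell
#!/usr/bin/env bash
# Scratch tree with one file name that contains a <space>.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
: > "$dir/my file"
: > "$dir/plain"

# Each sh invocation prints how many arguments it received.
via_print=$(find "$dir" -type f -print | xargs sh -c 'echo "$#"' sh)
via_print0=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)
via_exec=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} +)

echo "print|xargs: $via_print  print0|xargs -0: $via_print0  -exec +: $via_exec"
# prints print|xargs: 3  print0|xargs -0: 2  -exec +: 2
```

Both correct forms see two files. As the find EXAMPLES text notes, -exec ... + still has the edge that no pipe is involved, so a find that is killed part-way cannot hand a truncated pathname to the utility.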
Issue History | |||
Date Modified | Username | Field | Change |
2010-04-29 19:23 | dwheeler | New Issue | |
2010-04-29 19:23 | dwheeler | Status | New => Under Review |
2010-04-29 19:23 | dwheeler | Assigned To | => ajosey |
2010-04-29 19:23 | dwheeler | Name | => David A. Wheeler |
2010-04-29 19:23 | dwheeler | Organization | => IDA |
2010-04-29 19:23 | dwheeler | Section | => find |
2010-04-29 19:23 | dwheeler | Page Number | => 2740 |
2010-04-29 19:23 | dwheeler | Line Number | => 89194 |
2011-07-06 23:42 | Don Cragun | Relationship added | related to 0000244 |
2011-07-06 23:42 | Don Cragun | Relationship added | related to 0000245 |
2011-07-06 23:54 | Don Cragun | Note Added: 0000882 | |
2011-11-16 18:22 | dwheeler | Note Added: 0001020 | |
2015-03-12 16:15 | Don Cragun | Relationship added | has duplicate 0000903 |
2022-12-08 15:39 | geoffclare | Note Added: 0006091 | |
2022-12-08 15:40 | geoffclare | Note Edited: 0006091 | |
2022-12-08 16:21 | stephane | Note Added: 0006092 | |
2022-12-08 16:23 | stephane | Note Edited: 0006092 | |
2022-12-08 16:32 | stephane | Note Added: 0006093 | |
2022-12-08 17:02 | stephane | Note Edited: 0006093 | |
2022-12-09 10:22 | geoffclare | Note Edited: 0006091 | |
2022-12-09 10:30 | geoffclare | Note Edited: 0006091 | |
2022-12-09 10:44 | geoffclare | Note Edited: 0006091 | |
2022-12-09 10:50 | geoffclare | Note Added: 0006094 | |
2022-12-09 11:21 | geoffclare | Note Edited: 0006091 | |
2022-12-09 12:09 | stephane | Note Added: 0006095 | |
2023-01-09 16:13 | Don Cragun | Relationship replaced | has duplicate 0000244 |
2023-01-09 16:17 | Don Cragun | Relationship replaced | has duplicate 0000245 |
2023-01-09 16:20 | geoffclare | Note Added: 0006100 | |
2023-01-09 16:23 | geoffclare | Note Edited: 0006100 | |
2023-01-09 16:24 | geoffclare | Note Edited: 0006100 | |
2023-01-09 16:26 | geoffclare | Interp Status | => --- |
2023-01-09 16:26 | geoffclare | Final Accepted Text | => Note: 0006100 |
2023-01-09 16:26 | geoffclare | Status | Under Review => Resolved |
2023-01-09 16:26 | geoffclare | Resolution | Open => Accepted As Marked |
2023-01-09 16:26 | geoffclare | Tag Attached: issue8 | |
2023-01-09 17:07 | geoffclare | Note Edited: 0006100 | |
2023-01-10 10:08 | geoffclare | Note Added: 0006105 | |
2023-01-10 10:08 | geoffclare | Status | Resolved => Under Review |
2023-01-10 10:08 | geoffclare | Resolution | Accepted As Marked => Reopened |
2023-01-10 10:32 | geoffclare | Note Added: 0006106 | |
2023-01-10 14:46 | geoffclare | Note Added: 0006107 | |
2023-01-10 14:50 | geoffclare | Note Edited: 0006107 | |
2023-01-10 15:55 | geoffclare | Note Edited: 0006107 | |
2023-01-10 16:00 | dwheeler | Note Added: 0006108 | |
2023-01-10 16:00 | dwheeler | Note Added: 0006109 | |
2023-01-10 16:50 | dwheeler | Note Deleted: 0006109 | |
2023-01-12 09:55 | geoffclare | Note Edited: 0006100 | |
2023-01-12 09:56 | geoffclare | Note Added: 0006110 | |
2023-01-12 16:22 | geoffclare | Note Edited: 0006110 | |
2023-01-12 16:25 | geoffclare | Final Accepted Text | Note: 0006100 => Note: 0006110 |
2023-01-12 16:25 | geoffclare | Status | Under Review => Resolved |
2023-01-12 16:25 | geoffclare | Resolution | Reopened => Accepted As Marked |
2023-01-12 17:34 | dwheeler | Issue Monitored: dwheeler | |
2023-01-17 12:11 | geoffclare | Status | Resolved => Applied |
2023-08-22 06:28 | Don Cragun | Relationship added | related to 0000251 |
2024-06-11 08:53 | agadmin | Status | Applied => Closed |
2024-10-17 09:12 | geoffclare | Relationship added | related to 0001861 |
Mantis 1.1.6 Copyright © 2000 - 2008 Mantis Group |