View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001915 | 1003.1(2016/18)/Issue7+TC2 | Shell and Utilities | public | 2025-03-17 19:17 | 2025-05-09 08:27 |
Reporter | steffen | Assigned To | |||
Priority | normal | Severity | Editorial | Type | Clarification Requested |
Status | Interpretation Required | Resolution | Accepted As Marked | ||
Name | steffen | ||||
Organization | |||||
User Reference | |||||
Section | 2.5.2 | ||||
Page Number | 2479 | ||||
Line Number | 80382 | ||||
Interp Status | Proposed | ||||
Final Accepted Text | 0001915:0007160 | ||||
Summary | 0001915: clarification of 2.6.5 field splitting of 2.5.2 special parameter $* | ||||
Description | I was implementing a shell expression parser. It was impossible to generate $* splitting compatible with bash, NetBSD sh and NetBSD ksh. The standard defines (p. 2479, lines 80382 ff.)
So in an example
<code>
a() {
	echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
	echo $#,'*'="$*"/$*,
}
set -- '' 'a' ''
for f in ' ' '' : ': ' ' :'; do
	IFS=$f ; echo "$*"$* $*; a "$*"$* $*;unset IFS
done
</code>
my parser was on par with the mentioned shells except for
<code>
--- .1 2025-03-15 23:38:31.359307576 +0100
+++ .2 2025-03-15 23:38:32.715974215 +0100
@@ -6,10 +6,10 @@ a a a$
 3,*=aaa/a a a,$
 :a: a a$
 4,1=:a:/ a ,2=a/a,3=/,4=a$
-4,*=:a::a::a/ a a a,$
+4,*=:a::a::a/ a a a,$
 :a: a a$
 4,1=:a:/ a ,2=a/a,3=/,4=a$
-4,*=:a::a::a/ a a a,$
+4,*=:a::a::a/ a a a,$
 a a a$
 3,1= a / a ,2=a/a,3=a/a,4=$
 3,*= a a a/ a a a,$
</code>
After that many months I did not give up and wrote to kre@ and on the bash-bug list:
<code>
By the very meaning of this [POSIX words] the fields are split
individually, *first*. This is exactly what i do. Hence

  echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
  -> 4,1=:a:/ a ,2=a/a,3=/,4=a

becomes

  :a: -> '' + a
  a -> a
  '' -> discarded (but remembered as it separates fields)
  a -> a

becomes, with IFS=:, when actually creating the argument

  :a:a::a

becomes the actual argument

  < a a a>
</code>
Long story short (initial typo corrected):
<code>
+	/* In order to be compatible with bash, NetBSD sh and NetBSD ksh, at minimum, we need to
+	 * deviate from POSIX standardized behaviour, and field split the quoted variant instead!
+	 * This applies to $@ as well as $* */
+	if(*spcp->spc_ifs != '\0' && !su_cs_is_space(*spcp->spc_ifs)){
+		cp = n_var_vlook(n_star, TRU1);
+		goto jfs_split;
+	}
+
+	/* In all other cases individually field split the expanded parameters */
</code>
I.e., the mentioned shells use the *quoted* variant of $* to perform the expansion in the mentioned case. This seems to have been the case for multiple decades, if not forever. | ||||
Desired Action | Please clarify whether POSIX *really* meant what it says in *all* cases, or whether the text is an omission that arose when existing application behavior was taken over into the first version of the standard. Or, whether the above special case for a non-(IFS-)whitespace byte in IFS[0] is a regular, desired implementation detail. (Maybe reorder the Mantis layout so that section etc. are at the top again?) | ||||
Tags | tc1-2024 |
|
|
Is the test script output supposed to be consistent across theoretically conforming shells?
Because here I got 5 different outputs. At least AT&T ksh gives the exact same output as busybox sh (ash derivative), dash, and yash.

    $ printf ',l\nq\n' | ed test.sh
    178
    a() {$
    \techo \$#,1="\$1"/\$1,2="\$2"/\$2,3="\$3"/\$3,4="\$4"$
    \techo \$#,'*'="\$*"/\$*,$
    }$
    set -- '' 'a' ''$
    for f in ' ' '' : ': ' ' :'; do$
    \tIFS=\$f ; echo "\$*"\$* \$*; a "\$*"\$* \$*;unset IFS$
    done$
    $ env -i POSIXLY_CORRECT=1 sh -c 'for i in sh posh ksh mksh lksh loksh dash bash yash "busybox sh"; do qfile -v `command -v ${i% *}` ; $i test.sh >|"test_${i}.txt" 2>&1; done'
    app-alternatives/sh-0: /bin/sh -> lksh
    app-shells/posh-0.14.1: /bin/posh
    app-shells/ksh-1.0.8: /bin/ksh
    app-shells/mksh-59c: /bin/mksh
    app-shells/mksh-59c: /bin/lksh
    app-shells/loksh-7.6: /bin/loksh
    app-shells/dash-0.5.12-r1: /bin/dash
    app-shells/bash-5.2_p37: /bin/bash
    app-shells/yash-2.57: /bin/yash
    sys-apps/busybox-1.36.1-r3: /bin/busybox
    $ sha1sum test.sh *.txt | sort
    031cf59fcbcfc7eede60e35c9ede332bbb962f35  test_posh.txt
    7edcc231165d9b3ca2f875501c02e4f8eff73e6b  test_loksh.txt
    88acd3756f017d1f74d5bb62cfa9a0f0e72a08ee  test_bash.txt
    a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_busybox sh.txt
    a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_dash.txt
    a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_ksh.txt
    a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_yash.txt
    c7f2494ab0f12315513dfc2a70ab24d693dd6592  test_lksh.txt
    c7f2494ab0f12315513dfc2a70ab24d693dd6592  test_mksh.txt
    c7f2494ab0f12315513dfc2a70ab24d693dd6592  test_sh.txt
    ce30bb2a9eafa825dcd67be9a60e49529f091166  test.sh
    $ diff -u test_posh.txt test_loksh.txt
    --- test_posh.txt 2025-03-18 12:16:56.797930657 +0100
    +++ test_loksh.txt 2025-03-18 12:16:56.822930552 +0100
    @@ -1,9 +1,9 @@
     a a a
     3,1= a / a ,2=a/a,3=a/a,4=
     3,*= a a a/ a a a,
    -a a a
    -2,1=a a /a a ,2= a / a ,3=/,4=
    -2,*=a a a /a a a ,
    +a a a
    +3,1=a/a,2=a/a,3=a/a,4=
    +3,*=aaa/a a a,
     :a: a a
     3,1=:a:/ a ,2=a/a,3=a/a,4=
     3,*=:a::a:a/ a a a,
    $ diff -u test_posh.txt test_bash.txt
    --- test_posh.txt 2025-03-18 12:16:56.797930657 +0100
    +++ test_bash.txt 2025-03-18 12:16:56.837930489 +0100
    @@ -1,15 +1,15 @@
     a a a
     3,1= a / a ,2=a/a,3=a/a,4=
     3,*= a a a/ a a a,
    -a a a
    -2,1=a a /a a ,2= a / a ,3=/,4=
    -2,*=a a a /a a a ,
    -:a: a a
    -3,1=:a:/ a ,2=a/a,3=a/a,4=
    -3,*=:a::a:a/ a a a,
    -:a: a a
    -3,1=:a:/ a ,2=a/a,3=a/a,4=
    -3,*=:a::a:a/ a a a,
    +a a a
    +3,1=a/a,2=a/a,3=a/a,4=
    +3,*=aaa/a a a,
    +:a: a a
    +4,1=:a:/ a ,2=a/a,3=/,4=a
    +4,*=:a::a::a/ a a a,
    +:a: a a
    +4,1=:a:/ a ,2=a/a,3=/,4=a
    +4,*=:a::a::a/ a a a,
     a a a
     3,1= a / a ,2=a/a,3=a/a,4=
     3,*= a a a/ a a a,
    $ diff -u test_bash.txt test_dash.txt
    --- test_bash.txt 2025-03-18 12:16:56.837930489 +0100
    +++ test_dash.txt 2025-03-18 12:16:56.830930519 +0100
    @@ -4,12 +4,12 @@
     a a a
     3,1=a/a,2=a/a,3=a/a,4=
     3,*=aaa/a a a,
    -:a: a a
    -4,1=:a:/ a ,2=a/a,3=/,4=a
    -4,*=:a::a::a/ a a a,
    -:a: a a
    -4,1=:a:/ a ,2=a/a,3=/,4=a
    -4,*=:a::a::a/ a a a,
    +:a: a a
    +3,1=:a:/ a ,2=a/a,3=a/a,4=
    +3,*=:a::a:a/ a a a,
    +:a: a a
    +3,1=:a:/ a ,2=a/a,3=a/a,4=
    +3,*=:a::a:a/ a a a,
     a a a
     3,1= a / a ,2=a/a,3=a/a,4=
     3,*= a a a/ a a a,
    $ diff -u test_dash.txt test_lksh.txt
    --- test_dash.txt 2025-03-18 12:16:56.830930519 +0100
    +++ test_lksh.txt 2025-03-18 12:16:56.813930590 +0100
    @@ -6,10 +6,10 @@
     3,*=aaa/a a a,
     :a: a a
     3,1=:a:/ a ,2=a/a,3=a/a,4=
    -3,*=:a::a:a/ a a a,
    +3,*=:a::a:a/ a a a,
     :a: a a
     3,1=:a:/ a ,2=a/a,3=a/a,4=
    -3,*=:a::a:a/ a a a,
    +3,*=:a::a:a/ a a a,
     a a a
     3,1= a / a ,2=a/a,3=a/a,4=
     3,*= a a a/ a a a, |
|
> Is the test script output supposed to be consistent across theoretically conforming shells?

Two different outputs are expected because of the optional discarding of empty fields (when the expansion occurs in a context where field splitting will be performed).

If we disregard posh (which we don't usually pay attention to) and loksh (which I believe is descended from pdksh which had appallingly bad conformance), then you're seeing three behaviours.

Unfortunately, ksh88 differs from those three:

    $ sha1sum test_ksh88.txt
    c615e994262cb2a6a6469a0f967ae8feeaa40966  test_ksh88.txt
    $ sed -n l test_ksh88.txt
    a a a$
    3,1= a / a ,2=a/a,3=a/a,4=$
    3,*= a a a/ a a a,$
    a a a $
    2,1=a a /a a ,2= a / a ,3=/,4=$
    2,*=a a a /a a a ,$
    :a: a a$
    4,1=:a:/ a ,2=a/a,3=/,4=a$
    4,*=:a::a::a/ a a a,$
    :a: a a$
    4,1=:a:/ a ,2=a/a,3=/,4=a$
    4,*=:a::a::a/ a a a,$
    a a a$
    3,1= a / a ,2=a/a,3=a/a,4=$
    3,*= a a a/ a a a,$

(tested using /usr/xpg4/bin/sh on Solaris 11.4). |
|
Here is a much simpler test script which eliminates the unspecified behaviour and shows clearly the issue that Steffen has identified:

    set a: a
    IFS=:
    printf '[%s]\n' $*

I get two different results with this script: ksh93 and dash produce:

    [a]
    [a]

but ksh88, bash and mksh produce:

    [a]
    []
    [a]

The standard requires the ksh93/dash behaviour. Shells which first do a quoted expansion and then split it end up splitting "a::a" which produces the extra empty field. This difference happens because IFS characters are terminators not separators (as stated in the RATIONALE on the sh page). |
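The two strategies described in this note can be reproduced without relying on how the running shell itself expands unquoted $*, by carrying out each strategy explicitly. The following is a minimal sketch, assuming a POSIX sh; the function names split_each and join_then_split are hypothetical and are not taken from any shell's source. It only uses quoted "$*" (to join) and unquoted ordinary variables (to split), whose behaviour is not in dispute here.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only. Each body runs in a
# subshell, so the IFS and "set -f" changes do not leak to the caller.

# Strategy 1: field-split every positional parameter on its own.
# With IFS=: the trailing ':' of "a:" only terminates the field "a".
split_each() (
    IFS=:
    set -f                      # no pathname expansion on the fields
    for param in "$@"; do
        for field in $param; do
            printf '[%s]\n' "$field"
        done
    done
)

# Strategy 2: join as if "$*" were quoted, then split the joined string.
# "a:" and "a" join to "a::a"; the second ':' terminates an empty field.
join_then_split() (
    IFS=:
    set -f
    joined="$*"
    for field in $joined; do
        printf '[%s]\n' "$field"
    done
)

echo "split each parameter:"
split_each a: a
echo "join, then split:"
join_then_split a: a
```

Under these assumptions split_each should print [a] and [a], while join_then_split should print [a], [] and [a], matching the two results reported above.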
|
Note 0001915:0007138 was discussed in the April 24, 2025 teleconference and it was agreed that the standard should allow both behaviours.

Interpretation response
------------------------
The standard states how the '*' special parameter is expanded, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
When IFS begins with a non-whitespace character and a positional parameter (other than the last one) ends with a non-whitespace IFS character, some existing implementations produce an extra empty field in the expansion of unquoted $* after the fields that the standard requires to be produced for that positional parameter.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 2479 line 80362 section 2.5.2 ('@' special parameter), change:

    When the expansion occurs in a context where field splitting will be performed, any empty fields may be discarded and each of the non-empty fields shall be further split as described in [xref to 2.6.5].

to:

    When the expansion occurs in a context where field splitting will be performed, these initial fields shall be further processed in the same manner as for the '*' special parameter.

On page 2479 line 80383 section 2.5.2 ('*' special parameter), change:

    When the expansion occurs in a context where field splitting will be performed, any empty fields may be discarded and each of the non-empty fields shall be further split as described in [xref to 2.6.5].

    When the expansion occurs in a context where field splitting will not be performed, the initial fields shall be joined to form a single field with the value of each parameter separated by the first character of the IFS variable if IFS contains at least one character, or separated by a <space> if IFS is unset, or with no separation if IFS is set to a null string.

to:

    When the expansion occurs in a context where field splitting will not be performed, the initial fields shall be joined to form a single field with the value of each parameter separated by the first character of the IFS variable if IFS contains at least one character, or separated by a <space> if IFS is unset, or with no separation if IFS is set to a null string.

    When the expansion occurs in a context where field splitting will be performed:

After page 3878 line 134509 section C.2.5.2 Special Parameters, add:

    The Korn shell changed the way $* is expanded between the 1988 and 1993 versions, when subject to field splitting: ksh88 first joined the positional parameters as if $* were quoted and then performed field splitting on the result. This produced an extra empty field if IFS begins with a non-whitespace character and a positional parameter, other than the last one, ends with a non-whitespace IFS character. For example:

        set a: a; IFS=:; printf '[%s]\n' $*
        [a]
        []
        [a]

    The natural expectation is that expanding $* here would produce the same fields as expanding $1 and $2 as separate arguments:

        set a: a; IFS=:; printf '[%s]\n' $1 $2
        [a]
        [a]

    and this is what ksh93 does. Since David Korn was involved with development of the shell language requirements in POSIX.2-1992 at the same time that he was working on ksh93, it is believed that the intention was for POSIX.2-1992 to require the ksh93 behavior, but the wording was not sufficiently clear and some shells have continued to behave like ksh88. This standard now explicitly allows both behaviors (and likewise for $@ when subject to field splitting).

    However, implementors of shells that behave like ksh88 are encouraged to change to the ksh93 behavior. |
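Because both behaviours are to be allowed, a script that depends on the number of fields may want to probe which one its shell implements. The following is a minimal sketch, assuming a POSIX sh; probe_star_split is a hypothetical helper name and the labels it prints are only illustrative. It counts the fields produced by the example from the proposed rationale text.

```shell
#!/bin/sh
# Hypothetical probe, for illustration only. The body runs in a subshell
# so the caller's IFS and positional parameters are left untouched.
probe_star_split() (
    set -- a: a
    IFS=:
    set -- $*                   # unquoted $*, subject to field splitting
    case $# in
        2) echo ksh93-style ;;  # fields: "a" "a"
        3) echo ksh88-style ;;  # fields: "a" "" "a"
        *) echo "unexpected: $# fields" ;;
    esac
)

probe_star_split
```

The label printed depends on the shell that runs the script, so no single expected output is given here. A portable script can avoid the question entirely by iterating over "$@" and splitting each parameter itself.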
|
Test note to see if I can post here. |
|
Interpretation proposed: 9 May 2025 |
Date Modified | Username | Field | Change |
---|---|---|---|
2025-03-17 19:17 | steffen | New Issue | |
2025-03-18 10:31 | geoffclare | Note Added: 0007122 | |
2025-03-18 11:29 | lanodan | Note Added: 0007123 | |
2025-03-18 11:32 | lanodan | Note Edited: 0007123 | |
2025-03-18 12:11 | geoffclare | Note Added: 0007124 | |
2025-04-17 10:00 | geoffclare | Note Added: 0007138 | |
2025-04-29 11:16 | geoffclare | Note Added: 0007160 | |
2025-04-29 11:20 | geoffclare | Note Edited: 0007160 | |
2025-04-29 11:20 | geoffclare | Note Edited: 0007160 | |
2025-04-30 18:12 | chet_ramey | Note Added: 0007164 | |
2025-05-08 15:19 | geoffclare | Note Edited: 0007160 | |
2025-05-08 15:22 | geoffclare | Status | New => Interpretation Required |
2025-05-08 15:22 | geoffclare | Resolution | Open => Accepted As Marked |
2025-05-08 15:22 | geoffclare | Interp Status | => Pending |
2025-05-08 15:22 | geoffclare | Final Accepted Text | => 0001915:0007160 |
2025-05-08 15:23 | geoffclare | Tag Attached: tc1-2024 | |
2025-05-09 08:27 | agadmin | Interp Status | Pending => Proposed |
2025-05-09 08:27 | agadmin | Note Added: 0007182 |