View Issue Details

ID: 0001915
Project: 1003.1(2016/18)/Issue7+TC2
Category: Shell and Utilities
View Status: public
Last Update: 2025-05-09 08:27
Reporter: steffen
Assigned To:
Priority: normal
Severity: Editorial
Type: Clarification Requested
Status: Interpretation Required
Resolution: Accepted As Marked
Name: steffen
Organization:
User Reference:
Section: 2.5.2
Page Number: 2479
Line Number: 80382
Interp Status: Proposed
Final Accepted Text: 0001915:0007160
Summary: 0001915: clarification of 2.6.5 field splitting of 2.5.2 special parameter $*
Description
I was implementing a shell expression parser.
It was impossible to generate $* splitting compatible with bash, NetBSD sh and NetBSD ksh.
The standard states (p. 2479, lines 80382 ff.):


   [...] one field for
  each positional parameter that is set. When the expansion occurs in a context where field
  splitting will be performed, any empty fields may be discarded and each of the non-empty
  fields shall be further split as described in Section 2.6.5.


So in an example
<code>
  a() {
          echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
          echo $#,'*'="$*"/$*,
  }
  set -- '' 'a' ''
  for f in ' ' '' : ': ' ' :'; do
          IFS=$f ; echo "$*"$* $*; a "$*"$* $*;unset IFS
  done
</code>

my parser was on par with the mentioned shells except for
<code>
  --- .1  2025-03-15 23:38:31.359307576 +0100
  +++ .2  2025-03-15 23:38:32.715974215 +0100
  @@ -6,10 +6,10 @@ a a a$
   3,*=aaa/a a a,$
   :a: a  a$
   4,1=:a:/ a ,2=a/a,3=/,4=a$
  -4,*=:a::a::a/ a  a  a,$
  +4,*=:a::a::a/ a a  a,$
   :a: a  a$
   4,1=:a:/ a ,2=a/a,3=/,4=a$
  -4,*=:a::a::a/ a  a  a,$
  +4,*=:a::a::a/ a a  a,$
    a  a a$
   3,1= a / a ,2=a/a,3=a/a,4=$
   3,*= a  a a/ a a a,$
</code>

After many months I had still not given up, and wrote to kre@ and to the bug-bash list:
<code>
By the very meaning of this [POSIX words] the fields are split individually,
*first*. This is exactly what I do.
Hence
    echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
->
    4,1=:a:/ a ,2=a/a,3=/,4=a
becomes
    :a: -> '' + a
    a -> a
    '' -> discarded (but remembered as it separates fields)
    a -> a
becomes, with IFS=:, when actually creating the argument
    :a:a::a
becomes the actual argument
    < a a  a>
</code>

Long story short (initial typo corrected):

<code>
  +    /* In order to be compatible with bash, NetBSD sh and NetBSD ksh, at minimum, we need to
  +     * deviate from POSIX standardized behaviour, and field split the quoted variant instead!
  +     * This applies to $@ as well as $* */
  +    if(*spcp->spc_ifs != '\0' && !su_cs_is_space(*spcp->spc_ifs)){
  +            cp = n_var_vlook(n_star, TRU1);
  +            goto jfs_split;
  +    }
  +
  +    /* In all other cases individually field split the expanded parameters */
</code>

I.e., the mentioned shells field split the *quoted* expansion of $* in the mentioned case, namely when the first character of IFS is not a white-space character.
This seems to have been the case for decades, if not always.
Desired Action
Please clarify whether POSIX *really* means what it says in *all* cases, or whether the text reflects an omission, in that existing application behavior was not carried over into the first version of the standard.
Or, whether the above special case for a non-(IFS-)white-space byte in IFS[0] is a legitimately desired implementation detail.

(Maybe reorder the Mantis layout so that Section etc. are at the top again?)
Tags: tc1-2024

Activities

geoffclare

2025-03-18 10:31

manager   bugnote:0007122

The following is a copy of the Description, with the "code" tags changed to "pre".
---------------------------------------------------------------------------------------------------------------

I was implementing a shell expression parser.
It was impossible to generate $* splitting compatible with bash, NetBSD sh and NetBSD ksh.
The standard defines (p. 2479, lines 80382 ff.)


   [...] one field for
  each positional parameter that is set. When the expansion occurs in a context where field
  splitting will be performed, any empty fields may be discarded and each of the non-empty
  fields shall be further split as described in Section 2.6.5.


So in an example
  a() {
          echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
          echo $#,'*'="$*"/$*,
  }
  set -- '' 'a' ''
  for f in ' ' '' : ': ' ' :'; do
          IFS=$f ; echo "$*"$* $*; a "$*"$* $*;unset IFS
  done


my parser was on par with the mentioned shells except for
  --- .1  2025-03-15 23:38:31.359307576 +0100
  +++ .2  2025-03-15 23:38:32.715974215 +0100
  @@ -6,10 +6,10 @@ a a a$
   3,*=aaa/a a a,$
   :a: a  a$
   4,1=:a:/ a ,2=a/a,3=/,4=a$
  -4,*=:a::a::a/ a  a  a,$
  +4,*=:a::a::a/ a a  a,$
   :a: a  a$
   4,1=:a:/ a ,2=a/a,3=/,4=a$
  -4,*=:a::a::a/ a  a  a,$
  +4,*=:a::a::a/ a a  a,$
    a  a a$
   3,1= a / a ,2=a/a,3=a/a,4=$
   3,*= a  a a/ a a a,$


After many months I had still not given up, and wrote to kre@ and to the bug-bash list:
By the very meaning of this [POSIX words] the fields are split individually,
*first*.  This is exactly what I do.
Hence
    echo $#,1="$1"/$1,2="$2"/$2,3="$3"/$3,4="$4"
->
    4,1=:a:/ a ,2=a/a,3=/,4=a
becomes
    :a: -> '' + a
    a -> a
    '' -> discarded (but remembered as it separates fields)
    a -> a
becomes, with IFS=:, when actually creating the argument
    :a:a::a
becomes the actual argument
    < a a  a>


Long story short (initial typo corrected):

  +                       /* In order to be compatible with bash, NetBSD sh and NetBSD ksh, at minimum, we need to
  +                        * deviate from POSIX standardized behaviour, and field split the quoted variant instead!
  +                        * This applies to $@ as well as $* */
  +                       if(*spcp->spc_ifs != '\0' && !su_cs_is_space(*spcp->spc_ifs)){
  +                               cp = n_var_vlook(n_star, TRU1);
  +                               goto jfs_split;
  +                       }
  +
  +                       /* In all other cases individually field split the expanded parameters */


I.e., the mentioned shells field split the *quoted* expansion of $* in the mentioned case, namely when the first character of IFS is not a white-space character.
This seems to have been the case for decades, if not always.

lanodan

2025-03-18 11:29

reporter   bugnote:0007123

Last edited: 2025-03-18 11:32

Is the test script output supposed to be consistent across theoretically conforming shells? I got five different outputs here.

At least AT&T ksh gives the exact same output as busybox sh (ash derivative), dash, and yash.

$ printf ',l\nq\n' | ed test.sh
178
a() {$
\techo \$#,1="\$1"/\$1,2="\$2"/\$2,3="\$3"/\$3,4="\$4"$
\techo \$#,'*'="\$*"/\$*,$
}$
set -- '' 'a' ''$
for f in ' ' '' : ': ' ' :'; do$
\tIFS=\$f ; echo "\$*"\$* \$*; a "\$*"\$* \$*;unset IFS$
done$
$ env -i POSIXLY_CORRECT=1 sh -c 'for i in sh posh ksh mksh lksh loksh dash bash yash "busybox sh"; do qfile -v `command -v ${i% *}` ; $i test.sh >|"test_${i}.txt" 2>&1; done'
app-alternatives/sh-0: /bin/sh -> lksh
app-shells/posh-0.14.1: /bin/posh
app-shells/ksh-1.0.8: /bin/ksh
app-shells/mksh-59c: /bin/mksh
app-shells/mksh-59c: /bin/lksh
app-shells/loksh-7.6: /bin/loksh
app-shells/dash-0.5.12-r1: /bin/dash
app-shells/bash-5.2_p37: /bin/bash
app-shells/yash-2.57: /bin/yash
sys-apps/busybox-1.36.1-r3: /bin/busybox
$ sha1sum test.sh *.txt | sort
031cf59fcbcfc7eede60e35c9ede332bbb962f35  test_posh.txt
7edcc231165d9b3ca2f875501c02e4f8eff73e6b  test_loksh.txt
88acd3756f017d1f74d5bb62cfa9a0f0e72a08ee  test_bash.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_busybox sh.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_dash.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_ksh.txt
a947a1ebd8dc2fcfc1068359836fc6a224a13ccb  test_yash.txt
c7f2494ab0f12315513dfc2a70ab24d693dd6592  test_lksh.txt
c7f2494ab0f12315513dfc2a70ab24d693dd6592  test_mksh.txt
c7f2494ab0f12315513dfc2a70ab24d693dd6592  test_sh.txt
ce30bb2a9eafa825dcd67be9a60e49529f091166  test.sh
$ diff -u test_posh.txt test_loksh.txt
--- test_posh.txt       2025-03-18 12:16:56.797930657 +0100
+++ test_loksh.txt      2025-03-18 12:16:56.822930552 +0100
@@ -1,9 +1,9 @@
  a  a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a  a a/ a a a,
-a a   a
-2,1=a a /a a ,2= a / a ,3=/,4=
-2,*=a a  a /a a   a ,
+a a a
+3,1=a/a,2=a/a,3=a/a,4=
+3,*=aaa/a a a,
 :a: a a
 3,1=:a:/  a ,2=a/a,3=a/a,4=
 3,*=:a::a:a/ a  a a,
$ diff -u test_posh.txt test_bash.txt
--- test_posh.txt       2025-03-18 12:16:56.797930657 +0100
+++ test_bash.txt       2025-03-18 12:16:56.837930489 +0100
@@ -1,15 +1,15 @@
  a  a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a  a a/ a a a,
-a a   a
-2,1=a a /a a ,2= a / a ,3=/,4=
-2,*=a a  a /a a   a ,
-:a: a a
-3,1=:a:/  a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a  a a,
-:a: a a
-3,1=:a:/  a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a  a a,
+a a a
+3,1=a/a,2=a/a,3=a/a,4=
+3,*=aaa/a a a,
+:a: a  a
+4,1=:a:/ a ,2=a/a,3=/,4=a
+4,*=:a::a::a/ a  a  a,
+:a: a  a
+4,1=:a:/ a ,2=a/a,3=/,4=a
+4,*=:a::a::a/ a  a  a,
  a  a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a  a a/ a a a,
$ diff -u test_bash.txt test_dash.txt
--- test_bash.txt       2025-03-18 12:16:56.837930489 +0100
+++ test_dash.txt       2025-03-18 12:16:56.830930519 +0100
@@ -4,12 +4,12 @@
 a a a
 3,1=a/a,2=a/a,3=a/a,4=
 3,*=aaa/a a a,
-:a: a  a
-4,1=:a:/ a ,2=a/a,3=/,4=a
-4,*=:a::a::a/ a  a  a,
-:a: a  a
-4,1=:a:/ a ,2=a/a,3=/,4=a
-4,*=:a::a::a/ a  a  a,
+:a: a a
+3,1=:a:/ a ,2=a/a,3=a/a,4=
+3,*=:a::a:a/ a a a,
+:a: a a
+3,1=:a:/ a ,2=a/a,3=a/a,4=
+3,*=:a::a:a/ a a a,
  a  a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a  a a/ a a a,
$ diff -u test_dash.txt test_lksh.txt
--- test_dash.txt       2025-03-18 12:16:56.830930519 +0100
+++ test_lksh.txt       2025-03-18 12:16:56.813930590 +0100
@@ -6,10 +6,10 @@
 3,*=aaa/a a a,
 :a: a a
 3,1=:a:/ a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a a a,
+3,*=:a::a:a/ a  a a,
 :a: a a
 3,1=:a:/ a ,2=a/a,3=a/a,4=
-3,*=:a::a:a/ a a a,
+3,*=:a::a:a/ a  a a,
  a  a a
 3,1= a / a ,2=a/a,3=a/a,4=
 3,*= a  a a/ a a a,

geoffclare

2025-03-18 12:11

manager   bugnote:0007124

> Is the test script output supposed to be consistent across theoretically conforming shells?

Two different outputs are expected because of the optional discarding of empty fields (when the expansion occurs in a context where field splitting will be performed). If we disregard posh (which we don't usually pay attention to) and loksh (which I believe is descended from pdksh which had appallingly bad conformance), then you're seeing three behaviours. Unfortunately, ksh88 differs from those three:
$ sha1sum test_ksh88.txt
c615e994262cb2a6a6469a0f967ae8feeaa40966  test_ksh88.txt
$ sed -n l test_ksh88.txt
 a  a a$
3,1= a / a ,2=a/a,3=a/a,4=$
3,*= a  a a/ a a a,$
a a   a $
2,1=a a /a a ,2= a / a ,3=/,4=$
2,*=a a  a /a a   a ,$
:a: a  a$
4,1=:a:/ a ,2=a/a,3=/,4=a$
4,*=:a::a::a/ a  a  a,$
:a: a  a$
4,1=:a:/ a ,2=a/a,3=/,4=a$
4,*=:a::a::a/ a  a  a,$
 a  a a$
3,1= a / a ,2=a/a,3=a/a,4=$
3,*= a  a a/ a a a,$

(tested using /usr/xpg4/bin/sh on Solaris 11.4).

geoffclare

2025-04-17 10:00

manager   bugnote:0007138

Here is a much simpler test script which eliminates the unspecified behaviour and shows clearly the issue that Steffen has identified:
set a: a
IFS=:
printf '[%s]\n' $*

I get two different results with this script: ksh93 and dash produce:
[a]
[a]
but ksh88, bash and mksh produce:
[a]
[]
[a]

The standard requires the ksh93/dash behaviour. Shells which first do a quoted expansion and then split it end up splitting "a::a", which produces the extra empty field. This difference arises because IFS characters are terminators, not separators (as stated in the RATIONALE on the sh page).

geoffclare

2025-04-29 11:16

manager   bugnote:0007160

Last edited: 2025-05-08 15:19

Note 0001915:0007138 was discussed in the April 24, 2025 teleconference and it was agreed that the standard should allow both behaviours.

Interpretation response
------------------------
The standard states how the '*' special parameter is expanded, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------

When IFS begins with a non-whitespace character and a positional parameter (other than the last one) ends with a non-whitespace IFS character, some existing implementations produce an extra empty field in the expansion of unquoted $* after the fields that the standard requires to be produced for that positional parameter.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

On page 2479 line 80362 section 2.5.2 ('@' special parameter), change:
When the expansion occurs in a context where field splitting will be performed, any empty fields may be discarded and each of the non-empty fields shall be further split as described in [xref to 2.6.5].
to:
When the expansion occurs in a context where field splitting will be performed, these initial fields shall be further processed in the same manner as for the '*' special parameter.

On page 2479 line 80383 section 2.5.2 ('*' special parameter), change:
When the expansion occurs in a context where field splitting will be performed, any empty fields may be discarded and each of the non-empty fields shall be further split as described in [xref to 2.6.5]. When the expansion occurs in a context where field splitting will not be performed, the initial fields shall be joined to form a single field with the value of each parameter separated by the first character of the IFS variable if IFS contains at least one character, or separated by a <space> if IFS is unset, or with no separation if IFS is set to a null string.
to:
When the expansion occurs in a context where field splitting will not be performed, the initial fields shall be joined to form a single field with the value of each parameter separated by the first character of the IFS variable if IFS contains at least one character, or separated by a <space> if IFS is unset, or with no separation if IFS is set to a null string. When the expansion occurs in a context where field splitting will be performed:

  • If IFS is set to a null string, any empty fields may be discarded but no further processing shall be performed.

  • Otherwise, any empty fields may be discarded and the remaining fields shall be processed in one of the following two ways:

    1. Each of the non-empty fields shall be split as described in [xref to 2.6.5].

    2. The remaining fields shall be joined to form a single field, as described above for expansion in a context where field splitting will not be performed, and this field shall then be split as described in [xref to 2.6.5].

<small>NOTE: These two alternatives produce different results if IFS begins with a non-whitespace character and a positional parameter, other than the last one, ends with a non-whitespace IFS character. Joining before splitting produces an extra empty field (after the fields that splitting each field separately produces for that positional parameter). A future version of this standard may require that this extra empty field is not produced.</small>

After page 3878 line 134509 section C.2.5.2 Special Parameters, add:
The Korn shell changed the way $* is expanded between the 1988 and 1993 versions, when subject to field splitting: ksh88 first joined the positional parameters as if $* were quoted and then performed field splitting on the result. This produced an extra empty field if IFS begins with a non-whitespace character and a positional parameter, other than the last one, ends with a non-whitespace IFS character. For example:
set a: a; IFS=:; printf '[%s]\n' $*
[a]
[]
[a]
The natural expectation is that expanding $* here would produce the same fields as expanding $1 and $2 as separate arguments:
set a: a; IFS=:; printf '[%s]\n' $1 $2
[a]
[a]
and this is what ksh93 does. Since David Korn was involved with development of the shell language requirements in POSIX.2-1992 at the same time that he was working on ksh93, it is believed that the intention was for POSIX.2-1992 to require the ksh93 behavior, but the wording was not sufficiently clear and some shells have continued to behave like ksh88. This standard now explicitly allows both behaviors (and likewise for $@ when subject to field splitting). However, implementors of shells that behave like ksh88 are encouraged to change to the ksh93 behavior.

chet_ramey

2025-04-30 18:12

reporter   bugnote:0007164

Test note to see if I can post here.

agadmin

2025-05-09 08:27

administrator   bugnote:0007182

Interpretation proposed: 9 May 2025

Issue History

Date Modified Username Field Change
2025-03-17 19:17 steffen New Issue
2025-03-18 10:31 geoffclare Note Added: 0007122
2025-03-18 11:29 lanodan Note Added: 0007123
2025-03-18 11:32 lanodan Note Edited: 0007123
2025-03-18 12:11 geoffclare Note Added: 0007124
2025-04-17 10:00 geoffclare Note Added: 0007138
2025-04-29 11:16 geoffclare Note Added: 0007160
2025-04-29 11:20 geoffclare Note Edited: 0007160
2025-04-29 11:20 geoffclare Note Edited: 0007160
2025-04-30 18:12 chet_ramey Note Added: 0007164
2025-05-08 15:19 geoffclare Note Edited: 0007160
2025-05-08 15:22 geoffclare Status New => Interpretation Required
2025-05-08 15:22 geoffclare Resolution Open => Accepted As Marked
2025-05-08 15:22 geoffclare Interp Status => Pending
2025-05-08 15:22 geoffclare Final Accepted Text => 0001915:0007160
2025-05-08 15:23 geoffclare Tag Attached: tc1-2024
2025-05-09 08:27 agadmin Interp Status Pending => Proposed
2025-05-09 08:27 agadmin Note Added: 0007182