Austin Group Defect Tracker



ID 0000947
Category [1003.1(2008)/Issue 7] Shell and Utilities
Severity Objection
Type Enhancement Request
Date Submitted 2015-05-06 11:51
Last Update 2019-10-21 09:32
Reporter joerg
View Status public
Assigned To ajosey
Priority normal
Resolution Accepted As Marked
Status Applied
Name Jörg Schilling
Organization
User Reference
Section Shell command language
Page Number 2324
Line Number 73758
Interp Status Approved
Final Accepted Text See Note: 0003054.
Summary 0000947: Shell should not have $? == 0 for exit(256)
Description Since SVr4 (1989) we have waitid() and therefore the full 32-bit value
of the exit() parameter is available, but the shell still does not make
use of this feature, which has now been available for 25 years.

We should require the shell to use waitid() and si.si_status to retrieve
the exit code from a child process. It may make sense to introduce a new
macro "$status" that carries the full exit status of a child.
Desired Action Change line 73738 from:

      ? Expands to the decimal exit status of the most recent pipeline (see Section 2.9.2).

to:

      ? Expands to the decimal exit status of the most recent pipeline (see Section 2.9.2). If the value passed to exit() is non-zero but its lower 8 bits are all zero,
it expands to 1.

Tags tc3-2008
Attached Files

- Relationships
related to 0000690 Closed 1003.1(2013)/Issue7+TC1 clarify behavior when calling waitpid with SA_NOCLDWAIT
related to 0001026 Closed 1003.1(2013)/Issue7+TC1 The shell should support access to all 32 bit from the exit code
child of 0000594 Closed ajosey 1003.1(2008)/Issue 7 Clarify interaction of si_status and WIF* macros

-  Notes
(0002657)
stephane (reporter)
2015-05-07 09:29

Note that it depends on bug http://austingroupbugs.net/view.php?id=594

awk's system() should probably be updated as well. At the moment it is not clear what it returns: as worded, the text suggests it should return the same thing as system(3), but I don't think any implementation does. Nor does it say what awk's close() should return (some implementations return what pclose() returns, some a shell-like ($?) exit status).

It may also be worth pointing out in the description of exit/_exit that applications should avoid using exit codes above 255 as they won't be available to the shell (and awk's system()/close()) and that system(3), pclose(3), wait(3), waitpid(3) will only get the lower 8 bits of it.

I would also find it worth mentioning that values 128 to 255 are used by most shell implementations to report the death-by-signal of their last child. The only exception I know of (and that may be why POSIX doesn't clearly specify it) is recent versions of ksh93, which kill themselves with the same signal, resulting in nasty side effects like spurious logs and extra core files; IMO that is plain wrong. See also http://unix.stackexchange.com/a/99134

So those values should be avoided except to convey the same meaning. Special values like the shell's (and others') 127 for command-not-found may even be mentioned there.

Also, at the moment, the only cases I've seen of applications calling exit() with a value greater than 255 are exit(-1)/return -1 (often not intentional). I wouldn't be surprised if there are a number of shell scripts out there that do:

cmd
ret=$?
if [ "$ret" -eq 255 ]; then
  echo "something horribly wrong happened to cmd"
  exit 1
fi

And the proposed change would break those scripts. Maybe using 255 for any exit code above 255 instead of 1 would be better even if it overlaps with reporting the death of a child by signal 127.
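
The mappings under discussion can be compared with a small model. This is only a sketch: it does integer arithmetic on the value passed to exit() rather than inspecting a real wait status, it reads the Desired Action text literally, and it assumes two's complement shell arithmetic so that -1 & 255 yields 255. The function names truncated, map_to_1 and map_to_255 are made up for this illustration.

```shell
#!/bin/sh
# Model of what $? would show for a given exit() argument.
# All three function names are hypothetical, chosen for this sketch.

# Current behaviour: only the low-order 8 bits survive.
truncated() {
    echo $(( $1 & 255 ))
}

# Desired Action, read literally: a non-zero value whose low 8 bits
# are all zero is reported as 1 instead of 0.
map_to_1() {
    low=$(( $1 & 255 ))
    if [ "$1" -ne 0 ] && [ "$low" -eq 0 ]; then
        echo 1
    else
        echo "$low"
    fi
}

# Alternative floated in this note: report any value above 255 as 255.
map_to_255() {
    if [ "$1" -gt 255 ]; then
        echo 255
    else
        echo $(( $1 & 255 ))
    fi
}

for value in 3 256 -1; do
    echo "exit($value): now=$(truncated "$value")" \
         "map_to_1=$(map_to_1 "$value")" \
         "map_to_255=$(map_to_255 "$value")"
done
```

Running the loop for a few more values (512, 257, 65536) shows quickly which scripts that test for a specific status would see a different number under each mapping.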
(0002658)
joerg (reporter)
2015-05-07 09:47
edited on: 2015-06-18 09:37

My proposal was to introduce a migration path that tries to fix problems
with the historic implementations.

To upgrade to something that is usable for today, it would be a good
idea to have additional macros:

$status ----> si_status
$code ----> si_code
$pid ----> si_pid

si_code is from siginfo.h:

#define CLD_EXITED 1 /* child has exited */
#define CLD_KILLED 2 /* child was killed */
#define CLD_DUMPED 3 /* child has coredumped */
#define CLD_TRAPPED 4 /* traced child has stopped */
#define CLD_STOPPED 5 /* child has stopped on signal */
#define CLD_CONTINUED 6 /* stopped child has continued */

If a child dies from a signal, $code would be == 2 (CLD_KILLED)
and $status would hold the signal number.

For a shell, we would just need to define special values to
flag "file not found" and "permission denied" from exec().

In order to make $code useful in a shell, we would also need
to standardize the exact values for the CLD_* #defines. Given the
fact that waitid() was introduced by SVr4, I recommend using
the above values taken from Solaris.

It may also be a good idea to define a method to signal the error
situations:

file found but not executable that currently results in $? == 126

and

executable not found that currently results in $? == 127

The new system could also be a solution for the current underspecified
situation where POSIX requires just a value > 128 in $? when a command
was terminated by a signal, that results in 128 + signal number on
a Bourne Shell and the closed source POSIX Korn Shell that was
derived from ksh88 (0x80 + signo) but in 0x100 + signo for ksh93.

The new system would help because $code in such a case would be
2 (killed by signal) and $status would contain the signal number.

Note: bash and zsh also deliver 0x80 + signo.
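
To see why a script cannot reliably tell these cases apart from $? alone, here is a minimal sketch of a decoder for the two conventions named above. The helper name describe_status is hypothetical, and the decoding is a guess by construction: a child that calls exit(143) is indistinguishable from one killed by signal 15 in a shell that reports 0x80 + signo, and 126 or 127 may equally come from a program that simply exits with that value.

```shell
#!/bin/sh
# Hypothetical helper: guess what a $? value means under the
# 0x80 + signo and 0x100 + signo conventions.
describe_status() {
    s=$1
    if [ "$s" -gt 256 ]; then
        echo "signal $(( s - 256 )) (0x100 + signo, ksh93 style)"
    elif [ "$s" -gt 128 ]; then
        echo "signal $(( s - 128 )) (0x80 + signo) or exit($s)"
    elif [ "$s" -eq 127 ]; then
        echo "command not found"
    elif [ "$s" -eq 126 ]; then
        echo "found but not executable"
    else
        echo "exit($s)"
    fi
}

describe_status 143
describe_status 271
describe_status 127
```

With a separate reason code and status, as proposed in this note, none of this guessing would be needed.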

(0002728)
joerg (reporter)
2015-06-22 14:58
edited on: 2015-07-02 08:45

After working on a related implementation in the Bourne Shell, I have the
following proposal:

     .sh.code
             The numerical reason waitid(2) returned for the
             child status change. It matches the CLD_* definitions
             from signal.h. Note that the numbers are usually
             in the range 1..6 but this is not guaranteed.
             Use ${.sh.codename} for portability.

     .sh.codename
             The reason waitid(2) returned for the child status
             change as text that is generated by stripping off
             CLD_ from the related definitions from signal.h.
             Possible values are:

             EXITED     The program had a normal termination and
                        the exit(2) code is in ${.sh.status}.

             KILLED     The program was killed by a signal, the
                        signal number is in ${.sh.status} the
                        signal name is in ${.sh.termsig}.

             DUMPED     The program was killed by a signal,
                        similar to KILLED above, but the program
                        in addition created a core dump.

             TRAPPED    A traced child has trapped.

             STOPPED    The program was stopped by a signal, the
                        signal number is in ${.sh.status} the
                        signal name is in ${.sh.termsig}.

             CONTINUED  A stopped child was continued.

     .sh.pid The process number of the process that caused the
             current waitid(2) status.

     .sh.signame
             The name of the causing signal. If the status is
             related to a set of waitid(2) return values, this is
             CHLD or CLD, depending on the os. When a trap command
             is executed, ${.sh.signame} holds the signal
             that caused the trap.

     .sh.signo
             The signal number related to ${.sh.signame}.

     .sh.status
             The decimal value returned by the last synchronously
             executed command. The value is unaltered and contains
             the full int from the exit(2) call in the
             child in case the shell is run on a modern os.

     .sh.termsig
             The signal name related to the numerical
             ${.sh.status} value. The translation to signal names
             takes place regardless of whether the child was
             terminated by a signal or terminated normally.

     Note that trying to use the ${.sh.xxx} parameters on older
     shells will cause the older shells to exit with a bad
     substitution message unless the shell is an interactive shell.
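
Because of the bad substitution problem just mentioned, a portable script would have to probe for the new parameters before using them. The following is a sketch only, assuming a shell that implements ${.sh.status} as proposed here; the helper name have_sh_status is made up. The expansion is kept inside a single-quoted string and evaluated in a subshell, so a shell that rejects it cannot abort the calling script.

```shell
#!/bin/sh
# Hypothetical probe: succeeds only if ${.sh.status} expands to a
# non-empty value after a command has run. Any "bad substitution"
# error stays confined to the subshell.
have_sh_status() {
    ( eval 'true; [ -n "${.sh.status}" ]' ) >/dev/null 2>&1
}

status=0
if have_sh_status; then
    mode=native
    # Only evaluated on shells that accept the ${.sh.*} syntax.
    eval 'sh -c "exit 3" || status=${.sh.status}'
else
    mode=fallback
    sh -c 'exit 3' || status=$?
fi
echo "$mode: status=$status"
```

On shells without the proposed parameters the fallback branch is taken and the script sees only the truncated value in $?.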


.sh.signo/.sh.signame could be set while a trap command is executed.
In this case, they would reflect the signal that triggered the shell trap handler.

I further propose to set $? to the special value 128 in case
si_status has a value that would result in $? == 0 after truncation to 8 bits.

Please comment on this proposal.

(0002731)
joerg (reporter)
2015-06-24 11:53
edited on: 2015-07-07 10:20

Today, I published an updated Bourne Shell source at:

http://sourceforge.net/projects/schilytools/files/schily-2015-07-07.tar.bz2

Or the latest available tarball at:

http://sourceforge.net/projects/schilytools/files/

See also the latest announcement file from the related tarball:

http://sourceforge.net/projects/schilytools/files/AN-2015-07-07

The included Bourne Shell source now implements full support for
the siginfo status. This includes exec()d programs, builtins
and setting .sh.signo and .sh.signame to the causing signal while
a trap(1) command is executed.

I would be interested in feedback on which platforms provide
a usable waitid() implementation.

Solaris works

FreeBSD masks si_status with 0xFF
HP-UX-10.20 masks si_status with 0xFF
Linux masks si_status with 0xFF
Mac OS X clears si_code and si_pid and masks si_status with 0xFFFFFF and then sign extends

So from my current knowledge, only Solaris is following POSIX rules.

If you need a program that may return arbitrary exit code, try:

bourne-shell -c "exit <value>"

or use the test program I added in the next note:

(0002732)
joerg (reporter)
2015-06-24 11:53

#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdio.h>
/*
 * Non-standard compliant platforms may need
 * #include <signal.h> or something similar
 * in addition to the include files above.
 */

int
main(void)
{
        siginfo_t si;
        pid_t pid;
        int ret;

        if ((pid = fork()) < 0)
                exit(1);
        if (pid == 0) {
                /* Child: exit with a value that does not fit into 8 bits. */
                _exit(1234567890);
        }
        /* Parent: collect the child's status via waitid(). */
        ret = waitid(P_PID, pid, &si, WEXITED);
        if (ret < 0) {
                /* si is not valid if waitid() failed. */
                perror("waitid");
                exit(1);
        }
        printf("ret: %d si_pid: %ld si_status: %d si_code: %d\n",
                ret,
                (long) si.si_pid, si.si_status, si.si_code);
        if (pid != si.si_pid)
                printf("si_pid in struct siginfo should be %ld but is %ld\n",
                        (long) pid, (long) si.si_pid);
        /* A platform that masks the value reports fewer than 32 bits here. */
        if (si.si_status != 1234567890)
                printf("si_status in struct siginfo should be %d (0x%x) but is %d (0x%x)\n",
                        1234567890, 1234567890, si.si_status, si.si_status);
        if (CLD_EXITED != 1)
                printf("CLD_EXITED is %d on this platform\n", CLD_EXITED);
        return (0);
}
(0002745)
joerg (reporter)
2015-07-06 11:56

Let me add a hint to understand the background:

The call waitid() was introduced in 1989 with SVr4.
On SVr4, it has supported retrieving the full 32 bits from
the exit() call ever since it was introduced.

Given the fact that POSIX does not intend to modify but
rather to describe existing features, it can be seen as a
POSIX bug when POSIX claims that the exit code is always
masked by 0xFF.
(0002746)
geoffclare (manager)
2015-07-06 14:39

Looking at the history of waitid() standardisation, there is an interesting change between SUSv1 and SUSv2.

If you read SUSv1 carefully, the description of exit() and _exit() was misleading but did not actually conflict with SVR4 waitid() behaviour. It said:

"If the parent process of the calling process is executing a wait(), wait3(), waitid() or waitpid(), and has neither set its SA_NOCLDWAIT flag nor set SIGCHLD to SIG_IGN, it is notified of the calling process’ termination and the low-order eight bits (that is, bits 0377) of status are made available to it."

This required the low-order eight bits to be available, but said nothing about whether other bits might also be made available.

In SUSv2 that part was the same, but the following sentence was added earlier in the description:

"The values of status can be EXIT_SUCCESS or EXIT_FAILURE, as described in <stdlib.h>, or any implementation-dependent value, although note that only the range 0 through 255 will be available to a waiting parent process."

(The sentence has since morphed into what is now the 2nd paragraph.)

This was the point at which the conflict with SVR4 waitid() behaviour was introduced.
(0002749)
joerg (reporter)
2015-07-06 15:05

Do you remember why this change was introduced?
Are there any records of the discussion for SUSv2?
(0002750)
joerg (reporter)
2015-07-08 12:55

Today, July 8 2015, FreeBSD added support for the whole int from
exit() in siginfo.si_status.
(0002754)
eblake (manager)
2015-07-15 17:45

In response to Note: 0002657, just as awk's system() needs an update, so does m4's sysval.

Also, xargs is documented to have special behavior on commands that return 255, where exit(-1) has typically triggered this special behavior. Mapping exit(256) to 1 vs. mapping it to 255 (instead of the current truncation to 0) may have implications on this usage of xargs.
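
The xargs behavior in question can be observed with a short experiment. This is a sketch, not a conformance test: it assumes mktemp is available for a scratch log file, and it uses "exit 255" inside sh as a stand-in for a program that calls exit(-1). The exact exit status xargs itself returns differs between implementations, so the script only prints it.

```shell
#!/bin/sh
# Count how many times xargs runs the command when the very first
# invocation exits with 255. The log file is a scratch file for this demo.
log=$(mktemp)
rc=0
# The inner sh receives the log path as $0 and the input item as $1.
printf '1\n2\n3\n' |
    xargs -n 1 sh -c 'echo "$1" >>"$0"; exit 255' "$log" 2>/dev/null || rc=$?
runs=$(( $(wc -l <"$log") ))
echo "xargs ran the command $runs time(s) and exited with $rc"
rm -f "$log"
```

Changing 255 to 1 in the inner command lets xargs process all three items, which is the difference a remapping of exit(256) could expose.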
(0002756)
joerg (reporter)
2015-07-16 14:30

Let me first mention that my implementation in the Bourne Shell was
mainly (in the first attempt) meant to give an impression of the
implications. Now that the implementation exists and has been complete
for a week, it seems that

See my Bourne Shell man page that is updated from time to time here:

     http://schillix.sourceforge.net/man/man1/bosh.1.html

m4's sysval seems to be as "simple" as the shell as we would just
need to define new variables for si_status and si_code.

system() and awk may be harder.
(0002776)
joerg (reporter)
2015-07-29 08:17
edited on: 2016-01-21 16:48

Some more notes:

1) meanwhile, I added "set -o fullexitcode" to the Bourne Shell to
   tell the shell not to mask the value in $?. I hope that
   Sourceforge will allow uploads again soon, so that I can
   make the related source available.

2) it may be of interest that the ability to get access to all 32 bits
   from the exit() code existed in 1980 on UNOS (the first UNIX clone)
   already and was accessible via the cwait() function that has a
   remarkable similarity to the waitid() interface introduced in 1989.
   Here is the man page from UNOS:


cwait() waits for child processes to exit or suspend

:format

int cwait(pidp,statp)
  int *pidp; /* where to fill in pid */
  int *statp; /* where to fill in status */

:description

cwait() waits for children to exit or suspend and indicates why
the particular process exited (e.g. normal termination).

:returns

Returns the following values, depending on why cwait() came
back:

     < 0 error (no children)

      0 normal termination, exit code of child

      1 ^C

      2 killed

      3 trap, particular trap in status

      4 suspended, can be resumed or killed

      5 exec failure, standard system error code in status

      6 syserr, error detected in kernel, standard error code

:notes

A process that suspends still exists in the system and should
eventually be killed to free up the resources it uses.

(0003022)
mirabilos (reporter)
2016-01-13 21:46

I have just been made aware of the move towards more than 8 bits of exit code, and I would like to object to doing that, for several reasons:

– real existing code has long been relying on the standard's formulation of an exit code as an unsigned integer

– wording in the standard suggested the exit code could only be in the range 0‥255, inclusive

– waitid() may have existed for a long time, but it’s still not present in some contemporary operating systems (such as OpenBSD), and is reportedly broken in others (and apparently, this was caused by a mistake in some versions of the standard, which, while a bug that should arguably be fixed, people were *encouraged* to rely on – nobody appreciates a standard which changes what’s allowed and forbidden for every issue)

– real existing code (both shell scripts and compiled C code) can rely on the masking (in fact, I vaguely recall writing shell scripts doing that)

– changing the standard in such a backwards-incompatible way that *additionally* imposes requirements on the operating system causes no small amount of harm to portability (e.g. that of a shell implementation)

– once a standard gets too removed from the real existing situation out in the field, the standard will lose all credibility

Not part of my objection, but it still needs to be said: one of the greatest strengths of mksh is that it behaves consistently even across platforms. This does include such things as restricting arithmetic to 32 bits everywhere, but it guarantees scripts a well-defined runtime environment. This means that a suggestion to skip the masking only when the OS supports waitid() will not be well received, as it would introduce another point of divergence depending on the underlying operating system.
(0003025)
chet_ramey (reporter)
2016-01-15 15:12

I agree with Thorsten. This seems like more of a change for change's sake, rather than solving some problem or satisfying a need.
(0003054)
rhansen (manager)
2016-01-28 17:09
edited on: 2016-01-28 17:09

Interpretation response
------------------------
The standard clearly states that if a command terminates normally (not by a signal), the shell sets $? to the value retrieved for the command by the equivalent of the wait() function WEXITSTATUS macro, and conforming implementations must conform to this.

Rationale:
-------------
If an application calls exit(256), the WEXITSTATUS macro's effective modulo 256 operation will result in the shell behaving as if the application called exit(0) (the '?' special parameter will expand to 0 and the exit status will be treated as true for purposes of if, while, AND and OR lists, etc.). The concern is that this behavior will result in certain application errors being interpreted as success by the shell. While this behavior is unfortunate, all known existing implementations behave this way, and changing the standard to require the use of the full exit value would render these implementations non-conformant. In addition, some applications may rely on the modulo 256 behavior to report different flavors of success.


Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

On page 2337-2338 lines 74309-74319 (XCU 2.8.2 Exit Status for Commands), change:
If a command is not found, the exit status shall be 127. If the command name is found, but it is not an executable utility, the exit status shall be 126. Applications that invoke utilities without using the shell should use these exit status values to report similar errors.

If a command fails during word expansion or redirection, its exit status shall be greater than zero.

Internally, for purposes of deciding whether a command exits with a non-zero exit status, the shell shall recognize the entire status value retrieved for the command by the equivalent of the wait( ) function WEXITSTATUS macro (as defined in the System Interfaces volume of POSIX.1-2008). When reporting the exit status with the special parameter '?', the shell shall report the full eight bits of exit status available. The exit status of a command that terminated because it received a signal shall be reported as greater than 128.
to:
The exit status of a command shall be determined as follows:
  • If the command is not found, the exit status shall be 127.
  • Otherwise, if the command name is found, but it is not an executable utility, the exit status shall be 126.
  • Otherwise, if the command terminated due to the receipt of a signal that was not caught, the exit status shall be greater than 128. Note that shell implementations are permitted to use an exit status greater than 255 if a command terminates due to a signal.
  • Otherwise, the exit status shall be the value obtained by the equivalent of the WEXITSTATUS macro applied to the status obtained by the wait( ) function (as defined in the System Interfaces volume of POSIX.1-2008). Note that for C programs, this value is equal to the result of performing a modulo 256 operation on the value passed to _Exit( ), _exit( ), or exit( ) or returned from main( ).
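
The four cases in the list above can be observed from a script. The sketch below assumes mktemp is available and uses SIGKILL for the signal case because it cannot be caught or ignored; the command name no_such_command_947 is made up and assumed not to exist. The modulo 256 case is left out on purpose, since passing a value above 255 to the shell's own exit utility is not a portable way to produce it.

```shell
#!/bin/sh
# Observe each rule with real commands. "status" starts at 0 and is
# only overwritten when the command fails, which also keeps the
# script usable under "set -e".
scratch=$(mktemp)          # regular file, made non-executable below
chmod a-x "$scratch"

status=0; no_such_command_947 2>/dev/null || status=$?
not_found=$status

status=0; "$scratch" 2>/dev/null || status=$?
not_executable=$status

status=0; sh -c 'kill -KILL $$' 2>/dev/null || status=$?
signalled=$status

status=0; sh -c 'exit 3' || status=$?
normal=$status

rm -f "$scratch"
echo "not found:      $not_found"
echo "not executable: $not_executable"
echo "killed:         $signalled"
echo "normal exit:    $normal"
```

The value printed for the killed child is only required to be greater than 128, so scripts should not depend on a specific number there.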

On page 3692 after line 126274 (XRAT C.2.8.2 Exit Status for Commands) insert the following new paragraphs:
If a C application calls exit(256), the command's exit status in the shell becomes zero due to the modulo 256 operation. Since zero is interpreted as "true" or "success" for if statements, AND and OR lists, set -e, and so on, applications should be careful to avoid exiting with a value that is a multiple of 256 unless the value is intended to be interpreted as true or success.

To avoid ambiguity caused by the modulo 256 operation, applications are encouraged to avoid using a count or the result of a computation as the exit value unless the value is guaranteed to be non-negative and less than 256.

The ambiguity caused by the modulo 256 operation is unfortunate, but required due to historical implementation behavior. A future version of this standard may change the definition of exit status to remove the modulo 256 requirement and use all bits of the value passed to exit( ) (or equivalent), and may introduce a way to select whether the special parameter '?' contains the exit status modulo 256 or the full exit status.
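
As an illustration of the advice about counts, here is one way an application written as a shell script might avoid the trap. It is a sketch under the assumption that callers only need to distinguish success from failure; the helper name status_from_count is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: collapse an error count into an exit status
# that survives the modulo 256 operation.
status_from_count() {
    if [ "$1" -gt 0 ]; then
        echo 1      # any number of errors means plain failure
    else
        echo 0
    fi
}

errors=256          # e.g. 256 input files could not be processed
echo "count used directly: $(( errors % 256 ))"
echo "collapsed status:    $(status_from_count "$errors")"
```

A script would then end with exit "$(status_from_count "$errors")" and report the actual count on standard error or standard output instead.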


(0003065)
ajosey (manager)
2016-02-01 21:31

Interpretation Proposed: 1st February 2016
(0003092)
ajosey (manager)
2016-03-08 19:22

Interpretation approved: 8 March 2016

- Issue History
Date Modified Username Field Change
2015-05-06 11:51 joerg New Issue
2015-05-06 11:51 joerg Status New => Under Review
2015-05-06 11:51 joerg Assigned To => ajosey
2015-05-06 11:51 joerg Name => Jörg Schilling
2015-05-06 11:51 joerg Section => Shell command language
2015-05-06 11:51 joerg Page Number => 2324
2015-05-06 11:51 joerg Line Number => 73758
2015-05-07 09:29 stephane Note Added: 0002657
2015-05-07 09:47 joerg Note Added: 0002658
2015-05-07 12:33 joerg Note Edited: 0002658
2015-05-07 12:33 joerg Note Edited: 0002658
2015-06-18 09:25 joerg Note Edited: 0002658
2015-06-18 09:36 joerg Note Edited: 0002658
2015-06-18 09:37 joerg Note Edited: 0002658
2015-06-22 14:58 joerg Note Added: 0002728
2015-06-22 15:01 joerg Note Edited: 0002728
2015-06-22 15:16 joerg Note Edited: 0002728
2015-06-24 11:53 joerg Note Added: 0002731
2015-06-24 11:53 joerg Note Added: 0002732
2015-06-24 12:29 joerg Note Edited: 0002731
2015-07-02 08:45 joerg Note Edited: 0002728
2015-07-02 10:30 joerg Note Edited: 0002731
2015-07-06 11:56 joerg Note Added: 0002745
2015-07-06 14:39 geoffclare Note Added: 0002746
2015-07-06 15:05 joerg Note Added: 0002749
2015-07-06 16:21 rhansen Relationship added related to 0000690
2015-07-07 10:20 joerg Note Edited: 0002731
2015-07-08 12:55 joerg Note Added: 0002750
2015-07-15 17:45 eblake Note Added: 0002754
2015-07-16 14:30 joerg Note Added: 0002756
2015-07-29 08:17 joerg Note Added: 0002776
2016-01-07 16:22 eblake Relationship added child of 0000594
2016-01-13 21:46 mirabilos Note Added: 0003022
2016-01-15 15:12 chet_ramey Note Added: 0003025
2016-01-21 16:48 joerg Note Edited: 0002776
2016-01-28 16:09 Don Cragun Relationship added related to 0001026
2016-01-28 17:09 rhansen Note Added: 0003054
2016-01-28 17:09 rhansen Note Edited: 0003054
2016-01-28 17:12 Don Cragun Interp Status => Pending
2016-01-28 17:12 Don Cragun Final Accepted Text => See Note: 0003054.
2016-01-28 17:12 Don Cragun Status Under Review => Interpretation Required
2016-01-28 17:12 Don Cragun Resolution Open => Accepted As Marked
2016-01-28 17:12 Don Cragun Tag Attached: tc3-2008
2016-02-01 21:31 ajosey Interp Status Pending => Proposed
2016-02-01 21:31 ajosey Note Added: 0003065
2016-03-08 19:22 ajosey Interp Status Proposed => Approved
2016-03-08 19:22 ajosey Note Added: 0003092
2019-10-21 09:32 geoffclare Status Interpretation Required => Applied

