Austin Group Defect Tracker

Aardvark Mark IV


ID 0001026
Category [1003.1(2013)/Issue7+TC1] Shell and Utilities
Severity Editorial
Type Enhancement Request
Date Submitted 2016-01-28 13:16
Last Update 2020-01-31 15:57
Reporter joerg
View Status public
Assigned To
Priority normal
Resolution Open
Status New
Name Jörg Schilling
Organization
User Reference
Section 2.5.2
Page Number 2324, 2382
Line Number 73738-73769, 75887
Interp Status ---
Final Accepted Text
Summary 0001026: The shell should support access to all 32 bits of the exit code
Description The current shell description is based on a very outdated
version of the POSIX standard that does not yet include the
waitid() interface.

Since waitid() was added approximately 20 years ago, it is a
non-optional part of the POSIX standard. It permits retrieving
all 32 bits of the value passed to exit() in a child process and,
in addition, provides an interface that is easier to integrate than
waitpid() and the W*() macros.

New software should be encouraged to use the waitid() interface
instead of wait() or waitpid(), and the shell should be enhanced
to support the better interface from waitid().

This results in a clean separation of the reason for a child's
termination from the exit() code, the termination signal, and
other problems such as "file not found" or "file not executable".

The shell should also be able to base its internal logic on
whether the exit() parameter was != 0, regardless of whether
the exit code mod 256 is zero.
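The premise can be checked on a given system with a small C program. The sketch below assumes a POSIX system and uses a hypothetical helper name, exit_status_both(). It runs two children that call _exit() with the same value and collects one with waitpid() and the other with waitid(). WEXITSTATUS() always yields only status & 0377; whether si_status carries more bits depends on the platform (Linux, for example, reports only the low 8 bits).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run a child that calls _exit(code) and report the
 * value seen through waitpid() (WEXITSTATUS) and through waitid()
 * (si_status).  Returns 0 on success, -1 on error.
 */
static int
exit_status_both(int code, int *via_waitpid, int *via_waitid)
{
	pid_t		pid;
	int		wstat;
	siginfo_t	si;

	assert(via_waitpid != NULL && via_waitid != NULL);

	/* First child: collected with waitpid(). */
	if ((pid = fork()) == -1)
		return (-1);
	if (pid == 0)
		_exit(code);
	if (waitpid(pid, &wstat, 0) == -1 || !WIFEXITED(wstat))
		return (-1);
	*via_waitpid = WEXITSTATUS(wstat);	/* always status & 0377 */

	/* Second child: collected with waitid(). */
	if ((pid = fork()) == -1)
		return (-1);
	if (pid == 0)
		_exit(code);
	memset(&si, 0, sizeof (si));
	if (waitid(P_PID, (id_t)pid, &si, WEXITED) == -1 ||
	    si.si_code != CLD_EXITED)
		return (-1);
	*via_waitid = si.si_status;	/* 8 or more bits, platform dependent */
	return (0);
}
```

On a system that keeps the full value, exit_status_both(256, &a, &b) would leave a == 0 and b == 256; on Linux both are 0.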
Desired Action On page 2382 after line 75887 insert:
               fullexitcode   Do not mask the exit code with 0xFF
                              when expanding $?. This gives
                              access to the full 32 bits of the
                              child's exit code via $? on all
                              POSIX operating systems that
                              support waitid(). This also bases
                              the shell's logic for conditional
                              execution on the full 32 bits of
                              the exit code.

On page 2324 after line 73769 insert:

     $/      This variable contains a signed decimal number if
             the program terminated normally. If the program was
             terminated by a signal, it contains the name of the
             signal; in all other cases, it contains the reason
             name listed for ${.sh.codename} below.
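The three cases in the $/ description can be sketched as a single function. This assumes a waitid() result in a siginfo_t; dollar_slash() and sig_name() are hypothetical names, the signal-name table is deliberately incomplete, and the "SIGn" and "UNKNOWN" fallbacks are assumptions the proposal does not cover.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, deliberately incomplete signal-name table. */
static const char *
sig_name(int signo)
{
	switch (signo) {
	case SIGHUP:	return ("HUP");
	case SIGINT:	return ("INT");
	case SIGKILL:	return ("KILL");
	case SIGSEGV:	return ("SEGV");
	case SIGTERM:	return ("TERM");
	default:	return (NULL);
	}
}

/*
 * Sketch of the proposed $/ value: a signed decimal number after a normal
 * exit, the signal name after termination by a signal, and the
 * ${.sh.codename} text (CLD_ prefix stripped) in all other cases.
 */
static void
dollar_slash(const siginfo_t *si, char *buf, size_t len)
{
	const char	*name;

	assert(si != NULL && buf != NULL && len > 0);
	switch (si->si_code) {
	case CLD_EXITED:
		(void) snprintf(buf, len, "%d", si->si_status);
		return;
	case CLD_KILLED:
	case CLD_DUMPED:
		name = sig_name(si->si_status);
		if (name == NULL) {
			/* assumed fallback for signals missing from the table */
			(void) snprintf(buf, len, "SIG%d", si->si_status);
			return;
		}
		break;
	case CLD_TRAPPED:	name = "TRAPPED";	break;
	case CLD_STOPPED:	name = "STOPPED";	break;
	case CLD_CONTINUED:	name = "CONTINUED";	break;
	default:		name = "UNKNOWN";	break;	/* assumption */
	}
	(void) snprintf(buf, len, "%s", name);
}
```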

     .sh.code
             The numerical  reason  waitid(2)  returned  for  the
             child   status   change.   It   matches   the  CLD_*
             definitions from signal.h.  Note  that  the  numbers
             are  usually  in  the  range  1..6  but  this is not
             guaranteed.  Use ${.sh.codename} for portability.

     .sh.codename
             The reason waitid(2) returned for the  child  status
             change  as  text  that is generated by stripping off
             CLD_ from the  related  definitions  from  signal.h.
             Possible values are:

             EXITED      The program had a normal termination and
                         the exit(2) code is in ${.sh.status}.

             KILLED      The program was killed by a signal; the
                         signal number is in ${.sh.status} and the
                         signal name is in ${.sh.termsig}.

             DUMPED      The program  was  killed  by  a  signal,
                         similar to KILLED above, but the program
                         in addition created a core dump.

             TRAPPED     A traced child has trapped.

             STOPPED     The program was stopped by a signal; the
                         signal number is in ${.sh.status} and the
                         signal name is in ${.sh.termsig}.

             CONTINUED   A stopped child was continued.

             NOEXEC      An existing file could not be executed.
                         This can happen, for example, when the
                         file is not a regular file, when it does
                         not have execute permission, or when the
                         argument list is too long.

                         This is not a result of waitid(2) but
                         of execve(2).

             NOTFOUND    A file was not found and thus could not
                         be executed.

                         This is not a result of waitid(2) but
                         of execve(2).

             The child codes NOEXEC and NOTFOUND in
             ${.sh.codename} may need shared memory (e.g. from
             vfork(2)) for reliable reporting.
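Because ${.sh.codename} is described as the CLD_* name with the prefix stripped, and C offers no portable way to turn a macro value back into its name, a shell would need an explicit table. The sketch below assumes hypothetical SH_NOEXEC and SH_NOTFOUND values for the two execve(2) failures, chosen negative on the assumption that no CLD_* code is negative, and classifies ENOENT as NOTFOUND and every other execve(2) error as NOEXEC.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/*
 * Hypothetical shell-internal codes for execve() failures.  They are not
 * CLD_* values; negative numbers assume no CLD_* constant is negative.
 */
#define SH_NOEXEC	(-1)
#define SH_NOTFOUND	(-2)

/* ${.sh.codename}: the CLD_* name with the "CLD_" prefix stripped. */
static const char *
sh_codename(int code)
{
	switch (code) {
	case CLD_EXITED:	return ("EXITED");
	case CLD_KILLED:	return ("KILLED");
	case CLD_DUMPED:	return ("DUMPED");
	case CLD_TRAPPED:	return ("TRAPPED");
	case CLD_STOPPED:	return ("STOPPED");
	case CLD_CONTINUED:	return ("CONTINUED");
	case SH_NOEXEC:		return ("NOEXEC");
	case SH_NOTFOUND:	return ("NOTFOUND");
	default:		return (NULL);
	}
}

/* Classify the errno value of a failed execve() (assumed split). */
static int
sh_exec_failure(int err)
{
	return (err == ENOENT ? SH_NOTFOUND : SH_NOEXEC);
}
```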

     .sh.pid The process number of the process  that  caused  the
             current waitid(2) status.

     .sh.signame
             The name of the causing signal.  If the status is
             related to a set of waitid(2) return values, this is
             CHLD or CLD, depending on the OS.  When a trap(1)
             command is executed, ${.sh.signame} holds the signal
             that caused the trap.

     .sh.signo
             The signal number related to ${.sh.signame}.

     .sh.status
             The decimal value returned by the last synchronously
             executed command.  The value is unaltered and contains
             the full int from the exit(2) call in the child when
             the shell runs on a modern OS.

     .sh.termsig
             The   signal   name   related   to   the   numerical
             ${.sh.status} value. The translation to signal names
             takes place regardless of whether the child was ter-
             minated by a signal or terminated normally.



It may help to include a code fragment that emulates waitid() on top
of waitpid(), for portability to systems that lack waitid():
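#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <string.h>
#include <errno.h>

/*
 * Emulate waitid() on top of waitpid().  Use this only on systems whose
 * <sys/wait.h> does not already declare waitid().  Note that this
 * emulation can only return the low 8 bits of the exit code.
 */
static int
waitid(idtype_t idtype, id_t id, siginfo_t *infop, int opts)
{
        int             exstat;
        pid_t           pid;

        if (infop == NULL) {            /* waitid() requires infop */
                errno = EINVAL;
                return (-1);
        }

        /* waitpid() does not understand these flags */
#ifdef  WEXITED
        opts &= ~WEXITED;
#endif
#ifdef  WTRAPPED
        opts &= ~WTRAPPED;
#endif
#if     defined(WSTOPPED) && WSTOPPED != WUNTRACED
        if (opts & WSTOPPED) {
                opts &= ~WSTOPPED;
                opts |= WUNTRACED;
        }
#endif

        if (idtype == P_PID)
                pid = (pid_t)id;
        else if (idtype == P_PGID)
                pid = -(pid_t)id;
        else if (idtype == P_ALL)
                pid = (pid_t)-1;
        else {                          /* unsupported idtype */
                errno = EINVAL;
                return (-1);
        }

        (void) memset(infop, 0, sizeof (*infop));
        pid = waitpid(pid, &exstat, opts);
        if (pid == (pid_t)-1)
                return (-1);
        infop->si_pid = pid;
        if (pid == 0)                   /* WNOHANG: no status available */
                return (0);
        infop->si_signo = SIGCHLD;

        if (WIFEXITED(exstat)) {
                infop->si_code = CLD_EXITED;
                infop->si_status = WEXITSTATUS(exstat);
        } else if (WIFSIGNALED(exstat)) {
#ifdef  WCOREDUMP                       /* not specified by POSIX */
                if (WCOREDUMP(exstat))
                        infop->si_code = CLD_DUMPED;
                else
#endif
                        infop->si_code = CLD_KILLED;
                infop->si_status = WTERMSIG(exstat);
        } else if (WIFSTOPPED(exstat)) {
                if (WSTOPSIG(exstat) == SIGTRAP)
                        infop->si_code = CLD_TRAPPED;
                else
                        infop->si_code = CLD_STOPPED;
                infop->si_status = WSTOPSIG(exstat);
        }
#ifdef  WIFCONTINUED
        else if (WIFCONTINUED(exstat)) {
                infop->si_code = CLD_CONTINUED;
                infop->si_status = 0;
        }
#endif
        return (0);
}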

Tags No tags attached.
Attached Files

- Relationships
related to 0000947 (Applied, ajosey) 1003.1(2008)/Issue 7: Shell should not have $? == 0 for exit(256)

-  Notes
(0003057)
user229
2016-01-30 05:32

Re. NOEXEC and NOTFOUND, I don't see why it couldn't be reported using a pipe. Also, why only these two, and not the whole suite of possible errno values that can be returned from the exec family (or fork)? It reflects the traditional boundary of 127 for "not found" and 126 for "other", but there's no reason to limit a novel reporting mechanism in this way.
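The pipe approach suggested here is a well-known technique: the child marks the write end of a pipe close-on-exec, so a successful exec closes it and the parent reads end-of-file, while a failed exec lets the child write its errno first. A sketch, assuming a POSIX system; spawn_report_exec_errno() is a hypothetical name. It reports the full errno value rather than only a NOEXEC/NOTFOUND split, and needs no shared memory.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run path with argv.  Returns 0 if the exec succeeded, the child's errno
 * if it failed, or -1 on a local error.  *statusp receives the waitpid()
 * status of the child.
 */
static int
spawn_report_exec_errno(const char *path, char *const argv[], int *statusp)
{
	int	fds[2];
	int	err = 0;
	ssize_t	n;
	pid_t	pid;

	assert(path != NULL && argv != NULL && statusp != NULL);
	if (pipe(fds) == -1)
		return (-1);
	/* A successful exec closes the write end automatically. */
	if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1 ||
	    (pid = fork()) == -1) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}
	if (pid == 0) {
		close(fds[0]);
		execv(path, argv);
		err = errno;			/* exec failed: report why */
		if (write(fds[1], &err, sizeof (err)) != (ssize_t)sizeof (err))
			_exit(126);
		_exit(err == ENOENT ? 127 : 126);
	}
	close(fds[1]);
	do
		n = read(fds[0], &err, sizeof (err));	/* EOF means exec worked */
	while (n == -1 && errno == EINTR);
	close(fds[0]);
	if (waitpid(pid, statusp, 0) == -1 || n == -1)
		return (-1);
	return (n == (ssize_t)sizeof (err) ? err : 0);
}
```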
(0003058)
shware_systems (reporter)
2016-01-30 12:16

Alternate method brought up in phone call:

<signal.h> adds SIGEXIT as a signal number with a default (SIG_DFL) action of I (ignore), and non-maskable like SIGKILL or SIGSTOP.
When a sigaction handler is used, si_code can be:
EXIT_NORMAL Child has exited.

EXIT_SIGNAL Child has terminated due to a signal and did not create a core file.

EXIT_DUMPED Child has terminated abnormally and created a core file.

While these duplicate SIGCHLD codes they are limited to termination reasons that affect the shell. A shell can install a handler that sets the above $/ variable, before initiating any action specified for EXIT using the trap builtin. $/ is superfluous, actually, but benign.

The exit() interfaces get extended to require they shall use the effect of sigqueue() with SIGEXIT as signal number to the parent process, in addition to any SIGCHLD raise() or sigqueue().

The trap builtin gets extended with a -f flag, that limits the action for a signal to invoking a script function by name.
Syntax:
trap -f funcname [SA_OPTS] condition...

This differs from the 'trap action conditions' format in that eval is not used and the default handler type is a sigaction handler rather than a plain signal handler, so [SA]_SIGINFO is unnecessary as an SA_OPTS argument.

For brevity, the SA_OPTS arguments can be specified using simply an underscore. Whether an implementation should or shouldn't support a particular option I leave open.

The reason for limiting the action to a shell function is so the siginfo fields relevant to any signal being unqueued can be passed as positional parameters of the function. Each parameter would effectively use a "$(printf '%s=%d' fieldname value)" substitution, with fieldname omitting the 'si_' prefix. For some fields, '%s=%s' can be used as the format too. Where a function handles multiple signals, the first parameter, $1, would normally be 'signo=nnn' or 'signo=EXIT', for example. If a function handles only one signal, that parameter can be the last one or be omitted entirely. Because the parameters are named, any order is possible and should not affect script portability.
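The proposed "fieldname=value" argument convention might be produced like this. It is a sketch only: siginfo_params(), the selection of fields, and their order are assumptions, and a shell would pass the strings as positional parameters rather than return them in an array.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

#define NPARAMS		5
#define PARAMLEN	32

/*
 * Hypothetical sketch of `trap -f` argument passing: render siginfo_t
 * fields as "name=value" strings, dropping the "si_" prefix.
 */
static void
siginfo_params(const siginfo_t *si, char params[NPARAMS][PARAMLEN])
{
	assert(si != NULL && params != NULL);
	(void) snprintf(params[0], PARAMLEN, "signo=%d", si->si_signo);
	(void) snprintf(params[1], PARAMLEN, "code=%d", si->si_code);
	(void) snprintf(params[2], PARAMLEN, "pid=%ld", (long)si->si_pid);
	(void) snprintf(params[3], PARAMLEN, "uid=%ld", (long)si->si_uid);
	(void) snprintf(params[4], PARAMLEN, "status=%d", si->si_status);
}
```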

When a script uses
trap -f exitfunc EXIT

exitfunc gets the 32-bit exit code passed to exit() as positional parameter status=nnnn. The other fields beginning .sh. would be parameters also, and if the script desires it can assign those values to names it chooses, not have those static names cluttering the variables name space. A side benefit is a handler can use variable prefixes in combination with a pid= value to track termination of both synchronous and asynchronous child processes or sub-shells.

For backwards compatibility the processing for $? stays unchanged. The fullexitcode option to set is unnecessary also. Existing scripts using trap don't need changes either, except for one remotely possible case. What wait() and waitpid() put in stat_loc also doesn't change, so code based on them will still work.
(0003059)
user229
2016-01-30 15:53
edited on: 2016-01-30 15:56

What does SIGEXIT provide, as an actual signal, that SIGCHLD doesn't? Couldn't this be synthesized within the shell (if the main benefit is that trap handlers work differently) rather than being a real signal? Also, the shell already has an EXIT trap, for when the shell itself exits.

(0003061)
shware_systems (reporter)
2016-01-30 19:12

SIGCHLD can be blocked or ignored, SIGEXIT is specified as non-blockable so is always queued. This matches wait() and waitid(), in that they get a notification of termination whether SIGCHLD blocked or not.
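The claim that a blocked SIGCHLD does not prevent status collection can be checked with a short sketch (hypothetical helper name wait_with_sigchld_blocked(), assuming a POSIX system). The signal simply stays pending while waitid() reaps the child.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block SIGCHLD, run a child, and still collect its status with waitid(). */
static int
wait_with_sigchld_blocked(int code, siginfo_t *infop)
{
	sigset_t	block, orig;
	pid_t		pid;
	int		rc;

	assert(infop != NULL);
	sigemptyset(&block);
	sigaddset(&block, SIGCHLD);
	if (sigprocmask(SIG_BLOCK, &block, &orig) == -1)
		return (-1);
	if ((pid = fork()) == -1) {
		rc = -1;
	} else if (pid == 0) {
		_exit(code);
	} else {
		memset(infop, 0, sizeof (*infop));
		rc = waitid(P_PID, (id_t)pid, infop, WEXITED);
	}
	/* Any pending SIGCHLD is handled per its disposition when unblocked. */
	sigprocmask(SIG_SETMASK, &orig, NULL);
	return (rc);
}
```

Ignoring SIGCHLD is a different case: if its action is set to SIG_IGN, POSIX lets terminated children not become zombies, and a later wait() or waitid() fails with ECHILD once they have all terminated, so ignoring SIGCHLD does affect the wait family.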

Also, implementations are encouraged to use some of bits 8-31 as flags the W* macros test and strip off, and these bits may also be set in the si_status field for SIGCHLD, along with easier to use values in si_code. It's not prohibited, anyways, that I see.
I left this implied, sorry, but I expect SIGEXIT to prohibit this, as exit() has only the exit code passed in to be stored in si_status, or would store the terminating signal number by itself.

I also glossed over that if an action is set using the current form of trap for EXIT that the current usage expectation still holds. A new handler would be expected to compare the passed in pid to itself, to differentiate from one used by a child process, to perform any necessary atexit() type processing. Enabling this for an application would mean defining SA_EXITSELF as an option for sigaction(SIGEXIT), effectively adding atexit(siginfo_t info) and atquickexit(info) as interfaces. Overload si_band to hold a thread id and you can get pthread_atexit(info) also easily enough.

As things go, I think what I outlined is a plausible method for any application to get the limited or full exit codes, not just the shell, with only two backwards compatibility concerns. First, -f as a flag may evaluate to an actual file on disk intended to be a condition action handler for a current script. I don't see that as a high likelihood, but it exists, and that file and the scripts referencing it would need a new name. Second, some <signal.h> I'm not familiar with may already use SIGEXIT as a signo define.

There are cleaner alternatives, sure, but I don't see the C standard changing the definition of main() and system() to return siginfo_t instead of int any time soon.
(0003062)
user229
2016-01-30 20:46

I don't see why main would have to be changed. As for system, what about a system_ex(const char *, siginfo_t *) - and pclose_ex?
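A system_ex() along these lines could be written today on top of fork(), execl() and waitid(). The sketch assumes /bin/sh as the intermediary; unlike system(), it neither ignores SIGINT/SIGQUIT nor blocks SIGCHLD in the caller. system_ex() is the hypothetical name from the note, not an existing interface, and the siginfo_t it returns describes the /bin/sh process, not necessarily the command that shell runs.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical system_ex(): run command via /bin/sh -c and return the
 * full waitid() result in *infop.  Returns 0 on success, -1 on error.
 */
static int
system_ex(const char *command, siginfo_t *infop)
{
	pid_t	pid;

	assert(command != NULL && infop != NULL);
	if ((pid = fork()) == -1)
		return (-1);
	if (pid == 0) {
		execl("/bin/sh", "sh", "-c", command, (char *)NULL);
		_exit(127);			/* could not run the shell */
	}
	memset(infop, 0, sizeof (*infop));
	while (waitid(P_PID, (id_t)pid, infop, WEXITED) == -1)
		if (errno != EINTR)
			return (-1);
	return (0);
}
```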

Another thing to keep in mind if we're thinking about system (or popen) is that the shell is an intermediary. The shell is unlikely to be killed by a signal (though it may exec a process that is), or to fail to be executed (though it still could be due to argument size).
(0003063)
shware_systems (reporter)
2016-01-31 01:14

It's not that it has to, but it is something that could be done with the current interfaces C requires. If there are problems there, POSIX inherits them; additional interfaces can hide things but don't address everything. A new application still has to elect to use the new interfaces.
(0003128)
kre (reporter)
2016-04-05 14:14

I can understand wanting to separate the exit code and exit reason (signal, etc)
and keep those separate, but I cannot think of a single reason why anyone would
want to support more than a hundred different exit codes - for a single
application, 3 or 4 should be sufficient, more in the overall system just in
case there is some perverse desire to have lots of applications, each with a
distinct set of failure exit codes (I would assume there is no plan to alter
the single 0 meaning "success"). Keeping the limitation on applications to
not expect the system to support exit codes outside the range 0..255 seems
entirely the right thing to do. That is not to prohibit systems allowing more
exit codes than that if they can find some reason for so doing.
(0003129)
joerg (reporter)
2016-04-05 14:25

640k is enough for anyone...


BTW: it was the range 0..125 not 0..255 and please note that the range
of errno is already beyond the range 0..125.
(0003133)
kre (reporter)
2016-04-07 00:24

wrt:
    640k is enough for anyone...

entirely different kinds of issues.

And:
    BTW: it was the range 0..125 not 0..255

In the shell, yes, and if there was a sane way to redefine the way that "wait"
(the shell command) works to allow it to get the values 126..255 in a way that
was backwards compatible, I'd support that. Applications can supply an exit
code from 0-255 in a portable way, which has always worked (at C/kernel level).
Outside that range is not portable (the parameter for "exit" is an int only
because of the C parameter type widening rules, the kernel has only ever
guaranteed 8 bits of value there). It is a pity that sh originally chose to
limit it further by combining the "why the application exited" value, with the
"exit code when it exited normally" in a way that made them impossible to
distinguish.
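The ambiguity described here is easy to reproduce. The sketch below assumes the common shell convention of reporting death by signal n as 128 + n (POSIX only requires a value greater than 128); dollar_q_from_wstat() and run_child() are hypothetical names. Because SIGKILL is 9, a child that calls _exit(137) and one killed by SIGKILL both map to $? == 137, although the wait status itself still tells them apart.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* $? as many shells compute it: exit code, or 128 + signal number. */
static int
dollar_q_from_wstat(int wstat)
{
	if (WIFEXITED(wstat))
		return (WEXITSTATUS(wstat));
	if (WIFSIGNALED(wstat))
		return (128 + WTERMSIG(wstat));
	return (-1);
}

/*
 * Run a child that raises signo (if non-zero) or else calls _exit(code);
 * return the raw waitpid() status, or -1 on error.  signo should be a
 * signal that cannot be blocked or ignored, such as SIGKILL.
 */
static int
run_child(int code, int signo)
{
	pid_t	pid;
	int	wstat;

	if ((pid = fork()) == -1)
		return (-1);
	if (pid == 0) {
		if (signo != 0)
			(void) raise(signo);
		_exit(code);
	}
	if (waitpid(pid, &wstat, 0) == -1)
		return (-1);
	return (wstat);
}
```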

But it did. This group (no standards group) are not legislators - it is not
appropriate to specify what we wish had been done originally, while both
requiring that for future implementations (which would not be so bad) while
also promising users that is how systems behave (which is simply false.)

Just document that standard behaviour - do not try to force changes.
(0003136)
joerg (reporter)
2016-04-07 08:52

You seem to misinterpret things. We are no longer in the first 20 years
of UNIX, but in the following 26 years that allow applications to return
32 bits from the exit code to the parent process.

This issue is not to enforce changes, but rather to document existing
behavior from the POSIX compliant waitid() and the existing recent
Bourne Shell implementation that is based on this POSIX compliant
behavior from waitid().

Given that ignoring parts of the exit code must be seen as a bug, this
is also a bug-fix.
(0003137)
kre (reporter)
2016-04-07 13:25

Joerg - aside from it being a little odd that you're dismissing
ancient history (which is actually not ancient; systems still work
this way) when you are usually complaining that the spec is different
from the way some 1980s-vintage shell worked, the issue here
is: do essentially all modern shells actually pass back 32 bits of
exit code as the result of the wait command?

If not (and I suspect "not" is correct) then what you are attempting here
is to legislate the way that you believe the world should operate, rather
than specify the way it actually does. That's not correct behaviour
of a standards body. What you need to do is convince all the shell
authors to actually implement wait the way that you believe it should be
implemented, and once that is the standard behaviour, it can be documented
as that. Until then, pretending that just because waitid() has a 32
bit field into which an exit status can be put, and that the exit() call
takes an int as a parameter, means that when a program does exit(0x12345678)
that the shell will set $? to 305419896 does no-one any good.

I cannot test this, as my system still does as posix (as currently published)
still allows, and only handles exit values in the range 0..255.
(0003140)
joerg (reporter)
2016-04-08 10:22

You miss the way POSIX works: It standardizes existing implementations,
it does not modify them unless they contain a bug. But then it needs
to give a rationale on why it did not standardize existing behavior.

For this reason, we have 20 years of history where POSIX requires to
return all 32 bits from exit() with the mandatory waitid().

If your personal environment does not support this, it is not POSIX.
(0003142)
user229
2016-04-08 14:17
edited on: 2016-04-08 14:20

"For this reason, we have 20 years of history where POSIX requires to
return all 32 bits from exit() with the mandatory waitid()."

Which requirement does not in fact entail that all other mechanisms for obtaining an exit status (wait, $?, system) shall also return all 32 bits. Nor that all languages that provide support for calling external programs (sh, awk, ex) shall support any such mechanism.

The shell does not have a binding to the system interfaces in general, or to waitid in particular, and therefore is not bound by requirements the standard puts on waitid.

(0003143)
kre (reporter)
2016-04-09 11:22
edited on: 2016-04-09 11:27

Wrt note 3140 (and the last I am going to say on this issue)

   For this reason, we have 20 years of history where POSIX requires to
   return all 32 bits from exit() with the mandatory waitid().

I can find absolutely nothing in the published standard that supports this.
On the contrary, the exact opposite (in the sense in which we are talking)
is stated.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/exit.html
     The value of status may be 0, EXIT_SUCCESS, EXIT_FAILURE,
     [CX] [Option Start] or any other value, though only the least
     significant 8 bits (that is, status & 0377) shall be available to a
     waiting parent process. [Option End]

Note "only the least significant 8 bits shall be available".

The description of waitid
  http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitid.html
says nothing to this issue directly at all, but defers to signal.h

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html

    int si_status Exit value or signal.

Which is all it says about the field - it is an int, into which a value with
no specific specified range is to be placed (other than that it obviously needs
to fit in an int.) The range of values from exit (0..255) fits, so that is
OK, the range of signal numbers (0.. who knows, but usually < 128) also fits.

All looks good, and no mention anywhere at all about systems passing back
32 bit exit values from applications to parent processes (including shells.)

None.

Any inference you could draw about the range of exit values from the type
being "int" would logically have to apply to signal numbers as well, right?
So if your argument is that the system must support a range of exit values
from -2^31 .. 2^31-1 (or perhaps 0..2^32-1) then it would logically also be
required to support signal numbers with the same range, right? After all
they both get stored (in appropriate circumstances) into the same field.

And if your inference comes from the parameter to exit() being an int, then
you can draw the same inference from the (signal number) parameter to kill().

Personally, I consider this issue closed. Not only is there no POSIX
requirement for exit codes outside the range 0..255 (it actually says that
only those values shall be available, not just that systems are not required
to support a wider set of values) I also cannot see (and no-one has presented)
a single reason why a larger set of exit code values would be of any use to
anyone whatever (and no, wanting to send back errno as an exit code is not
good enough - you'd have to explain why just "failed" with the failure reason
on stderr, or similar, is not a better approach.)

(0003146)
shware_systems (reporter)
2016-04-09 21:35

Re: 3128

One of the reasons for wanting a large error/status code range is while an application may have only a few generic types of errors, there may be multiple places in the application where each type may be returned. In the absence of a core dump or debugger providing an indication of exactly where in the code an error type occurred, the application may encode both type, as expressed by errno, and a usage count that can be mapped back to a source file and line number.
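The encoding described here could look like the following sketch. The layout (error type in bits 16 to 23, call-site number in bits 0 to 15) and the helper names are hypothetical; the point is that after masking with 0377 only the low 8 bits of the site number survive and the error type is lost.

```c
#include <assert.h>

/*
 * Hypothetical exit-code layout: error type in bits 16..23 and a
 * call-site number in bits 0..15.  Only useful where the full value
 * reaches the parent.
 */
static int
encode_exit(int type, int site)
{
	assert(type >= 0 && type <= 0xFF && site >= 0 && site <= 0xFFFF);
	return ((type << 16) | site);
}

/* Recover the error type from an exit code. */
static int
exit_type(int code)
{
	return ((code >> 16) & 0xFF);
}

/* Recover the call-site number from an exit code. */
static int
exit_site(int code)
{
	return (code & 0xFFFF);
}
```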

For cases where SIGHUP on a terminal used for stdout and stderr makes a more descriptive message impossible, what is left as an _always_available_ reporting method is the exit code. Signals are similar, in that they can be masked off or set to SIG_IGN in the parent, outside any control of the application.

Also, there are applications that display numbered choices interactively and report the choice made via the exit code to a script, avoiding the system overhead of using a pipe or stdout and catching SIGHUP. When the choices are filenames from a glob expansion, as example, this can be well over 255 entries. A zero usually means EXIT_ABORT or EXIT_NEXTBLOCK (if it groups the choices in blocks of 100 or so to stay in the 1 to 125 range), not EXIT_SUCCESS, when using apps like this.

For an international app they may elect to use a utility that shows a localized menu of 0=abort prompt, 1=yes, 2=no, to avoid the locale used with stdin to match a keyboard's charset, LC_CTYPE nominally, after output respecting LC_MESSAGES being set to a different locale, not having a direct mapping to the yesexpr and noexpr prompt string characters.

Per note 3061, the standard requires the status value returned by wait() to encode, in the int-sized container, the exit code and any additional bits necessary for evaluating all the W* macros unambiguously. As no particular encoding of those extra bits is specified, currently no application or interface, including waitid() and its users, can make assumptions about which bits wait() leaves unreserved for portably implementing extensions. The standard requires those bits to have an analogue in si_code, but does not require si_status to have the bits stripped out so that it is not also usable with the W* macros. The example in the description elects to strip them, but infop->si_status = exstat; would also appear to be legal. This possibility is inconsistent with the expectation that si_status values set by the application for use with sigqueue() are presented to the signal handler unchanged.
(0004764)
stephane (reporter)
2020-01-31 15:57
edited on: 2020-01-31 15:59

[Copied from the mailing list in a discussion about 0001321]

$/ is not a very good choice of parameter name IMO.

That would break widely seen code like

sed "s/.$//"

sed "/^$/d; s/$foo/$bar/g"



Those are non-POSIX code as POSIX currently leaves the behaviour
unspecified if an unescaped $ is followed by a /, but it's
commonly seen in the wild as it works in all implementations in
practice (except recent versions of bosh).

(in other words, / is often seen following $ in arguments to
sed/awk/perl/pax/bsdtar... sometimes not within single quotes).

zsh has similar problems with $~var, $=var, $^var, though to a
lesser extent as things like sed "s~.$~~" are less widely used
in practice.

($/ is the record separator variable in perl (/ visually conveys
"separation" more than exit status IMO). perl is the only
language I can think of other than Bourne-like shells where $?
is the exit status. Most other shells (csh, fish, rc, akanga,
zsh) use $status instead. In csh, $?var expands to 1 if $var is
set and 0 otherwise.)


- Issue History
Date Modified Username Field Change
2016-01-28 13:16 joerg New Issue
2016-01-28 13:16 joerg Name => Jörg Schilling
2016-01-28 13:16 joerg Section => 2.5.2
2016-01-28 13:16 joerg Page Number => 2324, 2382
2016-01-28 13:16 joerg Line Number => 73738-73769, 75887
2016-01-28 13:26 joerg Tag Attached: issue8
2016-01-28 16:09 Don Cragun Relationship added related to 0000947
2016-01-30 05:32 user229 Note Added: 0003057
2016-01-30 12:16 shware_systems Note Added: 0003058
2016-01-30 15:53 user229 Note Added: 0003059
2016-01-30 15:55 user229 Note Added: 0003060
2016-01-30 15:56 user229 Note Edited: 0003059
2016-01-30 15:56 user229 Note Deleted: 0003060
2016-01-30 19:12 shware_systems Note Added: 0003061
2016-01-30 20:46 user229 Note Added: 0003062
2016-01-31 01:14 shware_systems Note Added: 0003063
2016-04-05 14:14 kre Note Added: 0003128
2016-04-05 14:25 joerg Note Added: 0003129
2016-04-07 00:24 kre Note Added: 0003133
2016-04-07 08:52 joerg Note Added: 0003136
2016-04-07 13:25 kre Note Added: 0003137
2016-04-08 10:22 joerg Note Added: 0003140
2016-04-08 14:17 user229 Note Added: 0003142
2016-04-08 14:19 user229 Note Edited: 0003142
2016-04-08 14:20 user229 Note Edited: 0003142
2016-04-09 11:22 kre Note Added: 0003143
2016-04-09 11:27 kre Note Edited: 0003143
2016-04-09 21:35 shware_systems Note Added: 0003146
2019-05-23 16:06 geoffclare Tag Detached: issue8
2020-01-31 15:57 stephane Note Added: 0004764
2020-01-31 15:59 stephane Note Edited: 0004764

