ID | Category | Severity | Type | Date Submitted | Last Update | ||
0001645 | [1003.1(2016/18)/Issue7+TC2] System Interfaces | Objection | Clarification Requested | 2023-03-22 19:47 | 2024-06-11 09:07 | ||
Reporter | eblake | View Status | public | ||||
Assigned To | |||||||
Priority | normal | Resolution | Accepted As Marked | ||||
Status | Closed | ||||||
Name | Eric Blake | ||||||
Organization | Red Hat | ||||||
User Reference | ebb.execvp | ||||||
Section | XSH exec | ||||||
Page Number | 784 | ||||||
Line Number | 26548 | ||||||
Interp Status | Approved | ||||||
Final Accepted Text | Note: 0006281 | ||||||
Summary | 0001645: execvp( ) requirements on arg0 are too strict | ||||||
Description |
The standard is clear that execlp() and execvp() cannot fail with ENOEXEC (except in the extremely unlikely event that attempting to overlay the process with sh also fails with that error), but must instead attempt to re-execute sh with a command line set so that sh will execute the desired filename as a shell script. Furthermore, the standard is explicit that the original:

execlp(file, arg0, arg1, ..., NULL)

is retried as:

execl(shell path, arg0, file, arg1, ..., NULL)

that is, whatever name was passed in argv[0] in the original attempt should continue to be the argv[0] seen by the sh process that will be parsing file.

But in practice, this does not actually happen on a number of systems. Here is an email describing bugs found in three separate projects (busybox, musl libc, and glibc) while investigating why attempting to rely on what the standard says about execvp() fallback behavior fails on Alpine Linux: https://listman.redhat.com/archives/libguestfs/2023-March/031135.html

In particular:

1. busybox installs /bin/sh as a multi-name binary, whose behavior DEPENDS on argv[0] having a basename of sh. If execvp() actually calls execl("/bin/sh", arg0, file, ...), the binary installed at /bin/sh will NOT see 'sh' as its basename but instead whatever is in arg0, and fails to behave as sh. (Bug filed at https://bugs.busybox.net/show_bug.cgi?id=15481 asking the busybox team to consider installing a minimal shim for /bin/sh that is NOT dependent on argv[0].)

2. musl currently refuses to do ENOEXEC handling (a knowing violation of POSIX, but the alternative requires coordinating the allocation of memory to provide the space for the larger argv entailed by injecting /bin/sh into the argument list); see https://www.openwall.com/lists/musl/2020/02/12/9 which acknowledges the issue, where Adélie Linux has patched musl for POSIX compliance but upstream musl does not like the patch. A followup mail, https://www.openwall.com/lists/musl/2020/02/13/3 , surveyed the behavior of various other libc implementations; many use a VLA to handle things, but musl argues that a VLA is itself prone to bugs. Arguably, musl's claim that execvp() must be safe to use after vfork(), and therefore cannot use malloc(), is a bit of a stretch (the standard explicitly documents that execlp() and execvp() need not be async-signal-safe; and even though we've deprecated vfork(), the arguments about what is safe after vfork() roughly correspond to the same arguments about what async-signal-safe functions can be used between regular fork() and exec*()).

3. glibc does ENOEXEC handling, but passes "/bin/sh" rather than arg0 as the process name of the subsequent shell invocation, losing any ability to expose the original arg0 to the script. https://sourceware.org/git/?p=glibc.git;a=blob;f=posix/execvpe.c;h=871bb4c4#l51 shows that the fallback executed is equivalent to execl("/bin/sh", "/bin/sh", file, arg1, ...).

Admittedly, Linux in general, and particularly Alpine Linux, will intentionally diverge from POSIX any time they feel it practical; but we should still consider whether the standard is too strict in requiring argv[0] to pass through unchanged to the script when the fallback kicks in. And I think the real intent is less about what sh's argv[0] is, and more about what the script's $0 is. Even historically, FreeBSD used to pass in "sh" rather than preserving arg0, up until 2020: https://cgit.freebsd.org/src/commit/?id=301cb491ea

And _requiring_ arg0 to be used unchanged falls apart when a user invokes execlp("binary", NULL, NULL) (such behavior is non-conforming, since line 26559 states "The argument arg0 should point to a filename string that is associated with the process being started by one of the exec functions.", but a fallback to execl("/bin/sh", NULL, "binary", NULL) obviously won't do what is intended, so the library has to stick something there).
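To make the argv rewriting concrete, here is a minimal sketch in C of how a libc might assemble the argument vector for the ENOEXEC fallback. The helper name build_fallback_argv and its keep_arg0 flag are hypothetical and are not taken from glibc, musl, or FreeBSD. It uses malloc() purely for brevity, which is exactly the allocation musl objects to, and it substitutes "sh" when the caller supplied no arg0 at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (illustration only): build the argument vector that
 * the ENOEXEC fallback hands to the shell.
 *
 *   keep_arg0 != 0:  { arg0, file, arg1, ..., NULL }   (arg0 preserved)
 *   keep_arg0 == 0:  { "sh", file, arg1, ..., NULL }   (a variation of "sh")
 *
 * If the caller passed an empty argv (arg0 is NULL), "sh" is used in either
 * mode, because something has to occupy the argv[0] slot.
 *
 * Returns a malloc'd array that the caller frees (the strings themselves are
 * borrowed), or NULL if allocation fails.
 */
char **build_fallback_argv(const char *file, char *const argv[], int keep_arg0)
{
    assert(file != NULL);

    /* Count the caller's arguments, tolerating a NULL or empty argv. */
    size_t argc = 0;
    if (argv != NULL)
        while (argv[argc] != NULL)
            argc++;

    /* Arguments after arg0 are copied through unchanged. */
    size_t rest = argc > 0 ? argc - 1 : 0;

    /* Slots needed: argv[0], file, the remaining arguments, terminating NULL. */
    char **out = malloc((rest + 3) * sizeof *out);
    if (out == NULL)
        return NULL;

    out[0] = (keep_arg0 && argc > 0) ? argv[0] : (char *)"sh";
    out[1] = (char *)file;
    for (size_t i = 0; i < rest; i++)
        out[2 + i] = argv[1 + i];
    out[2 + rest] = NULL;
    return out;
}
```

A caller would pass the result to execv() together with the shell pathname; that call is left out so the sketch has no side effects.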
Why don't we see complaints about this more frequently? Well, for starters, MOST people install shell scripts (or even scripts designed for other interpreters) with a #! shebang line. The standard is explicit that this is outside the realm of the standards (because different systems behave differently on how that first line is parsed to determine which interpreter to invoke), but at least on Linux, a script with a #! line NEVER fails with ENOEXEC - that aspect is handled by the kernel. The only time you ever get to a glibc or musl fallback that even has to worry about ENOEXEC is when the script has no leading #! line, which tends to not be common practice (even though the standard seems to imply otherwise).

Additionally, most shells don't directly call execvp() - they instead do their _own_ PATH lookup, and then use execl() or similar - if that fails with ENOEXEC, the shell itself can then immediately parse the file contents with the desired $0 already in place, without having to rely on execvp() to try to spawn yet another instance of sh for the purpose.

In playing with this, I note that the as-if rule might permit:

execl("/bin/sh", "sh", "-c", ". quoted_filename", arg0, arg1, ..., NULL)
where quoted_filename is created by quoting the original file in such a way that the shell sees the original name after processing quoting rules (so as not to open a security hole when file contains shell metacharacters) as roughly the same effect as execl("/bin/sh", arg0, file, arg1, ..., NULL) - in that it kicks off a shell invocation that executes commands from the given file while $0 is set to the original name. It additionally has the benefit that it will work on a system with busybox as /bin/sh (because busybox still sees "sh" as argv[0], but also has enough knowledge of what to store into $0 for the duration of sourcing the file).

So I went ahead and included a mention of that in non-normative RATIONALE - but we may decide to drop that. Why? Because we took pains in 0000953 to clarify that the dot utility might parse a file as either a program or a compound_list, while the 'sh file arg1' form requires parsing as a program, so it might create an observable difference if this alternative fallback ends up parsing as a compound_list (or we might also decide to tweak the proposed normative text to allow for this difference in parsing). What's more, if musl is already complaining about injecting "/bin/sh" into argv as being hard to do safely given memory constraints after vfork( ), it will be even harder to argue in favor of creating the string ". quoted_filename", which requires even more memory.

In parallel with this, I'm planning to open a bug report against glibc to see if they will consider making the same change as FreeBSD did in 2020 of preserving arg0 to the eventual script. But they may reply that it risks breaking existing clients that have come to depend on the fallback passing $0 as a variant of "sh" rather than the original arg0. Therefore, my proposal here is to relax the requirements of the standard to allow more existing implementations to be rendered compliant as-is, even though it gives up the nice $0 guarantees.
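As an illustration of the quoting step, the following C sketch builds the ". quoted_filename" string by wrapping the name in single quotes and rewriting each embedded single quote as '\''. The function name make_dot_command is hypothetical, single-quote wrapping is only one possible quoting strategy, and the sketch again uses malloc(), so it inherits the memory concern raised above. It also assumes file contains a slash; the dot utility performs its own PATH search for names without one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): return a malloc'd string of the
 * form ". '<file>'" in which every single quote inside file is rewritten as
 * '\'' (close the quote, emit an escaped quote, reopen the quote). Inside
 * single quotes the shell treats every other character literally, so no
 * metacharacter in file can be interpreted as shell syntax.
 *
 * Returns NULL if allocation fails; the caller frees the result.
 */
char *make_dot_command(const char *file)
{
    assert(file != NULL);

    /* Each embedded quote grows from 1 byte to 4, i.e. 3 extra bytes. */
    size_t quotes = 0;
    for (const char *p = file; *p != '\0'; p++)
        if (*p == '\'')
            quotes++;

    /* ". " + opening quote + payload + closing quote + terminating NUL. */
    size_t len = 2 + 1 + strlen(file) + 3 * quotes + 1 + 1;
    char *cmd = malloc(len);
    if (cmd == NULL)
        return NULL;

    char *q = cmd;
    *q++ = '.';
    *q++ = ' ';
    *q++ = '\'';
    for (const char *p = file; *p != '\0'; p++) {
        if (*p == '\'') {
            memcpy(q, "'\\''", 4);   /* the four bytes: ' \ ' ' */
            q += 4;
        } else {
            *q++ = *p;
        }
    }
    *q++ = '\'';
    *q = '\0';
    return cmd;
}
```

The returned string would occupy the command_string position after "-c" in the execl() call, followed by arg0 and the remaining arguments.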
I also wonder if the standard should consider adding support for 'exec -a arg0 cmd arg1...', which is another common implementation extension in many sh versions for setting argv[0] of the subsequent cmd. That belongs in a separate bug report, if at all. But by the as-if rule, an implementation with that extension might use execl("/bin/sh", "sh", "-c", "exec -a \"$0\" quoted_file \"$@\"", arg0, arg1, ..., NULL) as a way to execute the correct file with the desired $0 even if it can't use the proposed dot trick due to difference in parse scope. |
Desired Action |
Line numbers are from Issue 7 + TC2 (POSIX 2017), although the same text appears in draft 3 of Issue 8.

At page 784 lines 26552-26557 (XSH exec DESCRIPTION), change:

...the executed command shall be as if the process invoked the sh utility using execl( ) as follows:

to:

...the executed command shall be as if the process invoked the sh utility using execl( ) as follows:

After page 794 line 26981 (XSH exec RATIONALE), add a new paragraph:

When execlp( ) or execvp( ) fall back to invoking sh because of an ENOEXEC condition, the standard leaves the process name (what becomes argv[0] in the resulting sh process) unspecified. Existing implementations vary on whether they pass a variation of "sh", or preserve the original arg0. There are existing implementations of sh that behave differently depending on the contents of argv[0], such that blindly passing the original arg0 on to the fallback execution can fail to invoke a compliant shell environment. An implementation may instead utilize execl(<shell name>, "sh", "-c", ". <quoted_file>", arg0, arg1, ..., NULL), where quoted_file is created by escaping any characters special to the shell, as a way to expose the original $0 to the shell commands contained within file without breaking sh sensitive to the contents of argv[0]. |
Tags | applied_after_i8d3, tc3-2008 | ||||||
Notes | |
(0006226) eblake (manager) 2023-03-22 20:35 |
Promised glibc bug at https://sourceware.org/bugzilla/show_bug.cgi?id=30262 |
(0006228) lacos (reporter) 2023-03-23 08:11 |
It occurs to me that argv[0], as execvp() is currently required to fill it in on the ENOEXEC fallback, could be irrelevant as far as $0 in the shell script is concerned.

To recap, the fallback is

execl(<shell path>, arg0, file, arg1, ..., (char *)0);

Now if you compare that with the "sh" utility's specification, it seems to correspond to the *sole* synopsis form where the "command_file" operand is provided. And then the spec states (in the description of the "command_file" operand), "Special parameter 0 [...] shall be set to the value of command_file". This means that argv[0] will never be visible to the script, only to the shell executable. (The next question is of course if the shell executable cares about argv[0]. BusyBox does.)

This suggests that
- the normative section (description) should indeed be relaxed as Eric says,
- the rationale need not recommend either "exec -a" or "sh -c", because $0 is already well specified. |
(0006231) bastien (reporter) 2023-03-23 10:53 |
@eblake dash does not support exec -a ... |
(0006281) geoffclare (manager) 2023-05-11 16:05 |
Interpretation response
------------------------
The standard states the value of arg0 to be passed to the sh utility, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
The standard does not match some existing practice, and a different arg0 value is not observable by applications (without using extensions).

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
At page 784 lines 26552-26557 (XSH exec DESCRIPTION), change:

...the executed command shall be as if the process invoked the sh utility using execl( ) as follows:

to:

...the executed command shall be as if the process invoked the sh utility using execl( ) as follows:

After page 794 line 26981 (XSH exec RATIONALE), add a new paragraph:

When execlp( ) or execvp( ) fall back to invoking sh because of an ENOEXEC condition, the standard leaves the process name (what becomes argv[0] in the resulting sh process) unspecified. Existing implementations vary on whether they pass a variation of "sh", or preserve the original arg0. There are existing implementations of sh that behave differently depending on the contents of argv[0], such that blindly passing the original arg0 on to the fallback execution can fail to invoke a compliant shell environment. Because of the requirements on how sh handles its command line arguments, the shell script will see $0 containing the pathname of the script being executed, regardless of the value of argv[0]. |
(0006284) ajosey (manager) 2023-05-12 08:30 |
Interpretation proposed: 12 May 2023 |
(0006340) ajosey (manager) 2023-06-20 10:55 |
Interpretation approved: 20 June 2023 |
Issue History | |||
Date Modified | Username | Field | Change |
2023-03-22 19:47 | eblake | New Issue | |
2023-03-22 19:47 | eblake | Name | => Eric Blake |
2023-03-22 19:47 | eblake | Organization | => Red Hat |
2023-03-22 19:47 | eblake | User Reference | => ebb.execvp |
2023-03-22 19:47 | eblake | Section | => XSH exec |
2023-03-22 19:47 | eblake | Page Number | => 784 |
2023-03-22 19:47 | eblake | Line Number | => 26548 |
2023-03-22 19:47 | eblake | Interp Status | => --- |
2023-03-22 19:55 | eblake | Description Updated | |
2023-03-22 20:05 | eblake | Relationship added | related to 0000953 |
2023-03-22 20:35 | eblake | Note Added: 0006226 | |
2023-03-22 21:19 | eblake | Description Updated | |
2023-03-23 08:11 | lacos | Note Added: 0006228 | |
2023-03-23 10:53 | bastien | Issue Monitored: bastien | |
2023-03-23 10:53 | bastien | Note Added: 0006231 | |
2023-03-23 11:11 | lacos | Issue Monitored: lacos | |
2023-04-19 17:26 | eblake | Relationship added | related to 0001674 |
2023-05-11 16:05 | geoffclare | Note Added: 0006281 | |
2023-05-11 16:07 | geoffclare | Interp Status | --- => Pending |
2023-05-11 16:07 | geoffclare | Final Accepted Text | => Note: 0006281 |
2023-05-11 16:07 | geoffclare | Status | New => Interpretation Required |
2023-05-11 16:07 | geoffclare | Resolution | Open => Accepted As Marked |
2023-05-11 16:07 | geoffclare | Tag Attached: tc3-2008 | |
2023-05-12 08:30 | ajosey | Interp Status | Pending => Proposed |
2023-05-12 08:30 | ajosey | Note Added: 0006284 | |
2023-06-20 10:55 | ajosey | Interp Status | Proposed => Approved |
2023-06-20 10:55 | ajosey | Note Added: 0006340 | |
2023-08-17 10:53 | geoffclare | Status | Interpretation Required => Applied |
2023-08-17 10:54 | geoffclare | Tag Attached: applied_after_i8d3 | |
2024-01-15 16:40 | nick | Relationship added | related to 0001789 |
2024-06-11 09:07 | agadmin | Status | Applied => Closed |