
ID: 0001055
Project: 1003.1(2013)/Issue7+TC1
Category: Shell and Utilities
View Status: public
Last Update: 2024-06-11 08:56
Reporter: rhansen
Assigned To:
Priority: normal
Severity: Objection
Type: Omission
Status: Closed
Resolution: Accepted As Marked
Name: Richard Hansen
Organization:
User Reference:
Section: 2.3 Token Recognition
Page Number: 2321-2322
Line Number: 73636-73689
Interp Status: ---
Final Accepted Text: 0001055:0004247
Summary: 0001055: unspecified how much is parsed before execution begins
Description: POSIX does not say how much of the input (or of eval text, a command substitution body, a dot script, or an ENV script) is parsed before execution begins. This matters because
  • it affects how much code is executed before a syntax error causes the shell to exit (note that a self-extracting archive is commonly created by prepending some shell code to a tarball; the expectation is that the shell executes the extraction code before it tries, and inevitably fails, to parse the tarball), and
  • there is an intimate relationship between parsing and alias substitution.
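The self-extracting case can be sketched as follows. This assumes a POSIX sh and sed on PATH. The helper name make_selfextract and the __PAYLOAD__ marker are hypothetical and chosen only for illustration; real archive generators differ. The payload is deliberately not valid shell syntax. The script therefore works only if the shell executes the header's exit before it tries to parse what follows.

```sh
# make_selfextract OUT PAYLOAD (hypothetical helper, for illustration only)
# Writes OUT as: a two-command shell header, a marker line, then PAYLOAD.
make_selfextract() {
  {
    # Header command 1: print everything after the marker line of this file.
    printf '%s\n' "sed '1,/^__PAYLOAD__\$/d' \"\$0\""
    # Header command 2: stop before the shell ever parses the payload.
    printf '%s\n' 'exit 0'
    printf '%s\n' '__PAYLOAD__'
    cat "$2"
  } > "$1"
}

# Usage: the payload below would be a syntax error if the shell parsed it.
payload=${TMPDIR:-/tmp}/payload.$$
archive=${TMPDIR:-/tmp}/archive.$$
printf '%s\n' ')( not shell code' 'second payload line' > "$payload"
make_selfextract "$archive" "$payload"
sh "$archive"
rm -f "$payload" "$archive"
```

A real archive would typically pipe the extracted bytes to an extraction tool instead of printing them.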
Desired Action: Specific wording to be provided later, but a summary of the desired changes (assuming implementations behave this way):
  • Input and ENV scripts shall be parsed using program as the start symbol.
  • eval bodies, command substitution bodies, and dot scripts shall be parsed using compound_list as the start symbol.
  • When code is parsed as a program symbol: Once a complete_command has been parsed, the shell shall execute the complete_command before it starts parsing the next complete_command.
  • When code is parsed as a compound_list symbol: The compound_list shall be fully parsed before any execution of that compound_list begins.

Tags: tc3-2008

Relationships

related to 0000953 (Closed, assigned to ajosey): Alias expansion is under-specified
related to 0001048 (Closed): deprecate alias and unalias

Activities

rhansen

2016-06-23 16:26

manager   bugnote:0003270

Last edited: 2016-06-23 16:44

On page 2322 after line 73689 (just before XCU 2.3.1 Alias Substitution), insert a new paragraph:
Once a complete_command symbol has been recognized by the grammar (see [xref to 2.10 Shell Grammar]), the complete_command shall be subjected to alias substitution (see [xref to 2.3.1 Alias Substitution]), then executed before the next complete_command is tokenized and parsed.

On page 2322 lines 73691-73693 (XCU 2.3.1 Alias Substitution), change:
After a token has been delimited, but before applying the grammatical rules in Section 2.10, a resulting word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name.
to:
After a sequence of tokens has been parsed and recognized as a command or compound list by the grammar (see [xref to 2.10 Shell Grammar]), but before the command or compound list is executed, each word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name.

On page 2331 lines 74074-74076 (XCU 2.6.3 Command Substitution), change:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command. Any valid shell script can be used for command, except a script consisting solely of redirections which produces unspecified results.
to:
With the $(command) form, all characters following the open parenthesis to the matching closing parenthesis constitute the command.

With both the backquoted and $(command) forms, command shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed. Any valid compound_list can be used for command, except a compound_list consisting solely of redirections which produces unspecified results.

On page 2325 lines 73782-73785 (XCU 2.5.3 Shell Variables, ENV) change:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file containing shell commands to execute in the current environment.
to:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file. Before any interactive commands are read, the contents of the file shall be tokenized, parsed, subjected to alias substitution, and executed as described in [xref to 2.3 Token Recognition]. The contents shall be executed in the current environment.

On page 2364 line 75304 (XCU 2.14 dot DESCRIPTION), change:
The shell shall execute commands from the file in the current environment.
to:
The contents of file shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed in the current environment.

On page 2366 line 75371 (XCU 2.14 eval), change:
The constructed command shall be read and executed by the shell.
to:
The constructed command shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed in the current environment.

On page 2350 after line 74800 (XCU 2.10.2 Shell Grammar Rules), insert the following comment above %start:
/* The start symbol is compound_list when parsing dot scripts, command
   substitution bodies, and the arguments passed to eval */

On page 3678 after line 125707 (just before XRAT C.2.3.1), insert a new paragraph:
Because a complete_command is executed before the next complete_command is tokenized and parsed, syntax errors are not discovered by the shell until just before the code would be executed. While in some cases it might be desirable to detect and react to syntax errors before anything is executed, deferring the discovery of syntax errors has several benefits:
  • It makes it possible for script authors to test for the availability of a nonstandard extension and react appropriately before the use of the extension would trigger a syntax error.
  • It makes it possible to create self-extracting tarballs (a shell script concatenated with a payload archive that extracts the archive when executed).
  • The shell does not have to read and parse the complete script before execution, which reduces memory usage when executing extremely long scripts.

On page 3681 lines 125831-125832 (XRAT C.2.5.3 ENV) change:
However, unlike dot scripts, no PATH searching is performed. This is used as a guard against Trojan Horse security breaches.
to:
However, unlike dot scripts, ENV scripts are parsed as a program, not a compound_list. This distinction matters because it influences when aliases take effect and whether syntax errors in the script are discovered before any part of the script is executed.

For security reasons, PATH is not searched when locating the ENV script.


geoffclare

2016-06-29 16:06

manager   bugnote:0003276

I'm not too keen on the proposed change to 2.6.3 Command Substitution.

The existing text saying "any valid shell script" highlights the main difference between $(...) and `...`. That difference is lexical in nature; i.e., you can copy and paste a shell script into $(...) and it works. You can't do that with `...` because of quoting. Changing to "any valid compound_list" and treating $(...) and `...` the same ignores this lexical difference.
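The following small sketch illustrates the lexical difference (the variable names are arbitrary). The same command text is placed inside both forms. Inside backquotes, the pair of backslashes is reduced to one before the command is parsed, so the two results differ.

```sh
# Identical command text inside both forms of command substitution.
dollar=$(printf '%s\n' 'a\\b')
backquoted=`printf '%s\n' 'a\\b'`

# $(...) passes the text through unchanged, so printf receives a\\b.
printf '%s\n' "$dollar"
# Inside backquotes, \\ becomes \ before the command is parsed,
# so printf receives a\b.
printf '%s\n' "$backquoted"
```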

rhansen

2016-07-07 15:50

manager   bugnote:0003291

Good catch, Geoff! I'll post a new revision.

rhansen

2016-07-07 15:52

manager   bugnote:0003292

Last edited: 2016-07-07 15:53

On page 2322 after line 73689 (just before XCU 2.3.1 Alias Substitution), insert a new paragraph:
Once a complete_command symbol has been recognized by the grammar (see [xref to 2.10 Shell Grammar]), the complete_command shall be subjected to alias substitution (see [xref to 2.3.1 Alias Substitution]), then executed before the next complete_command is tokenized and parsed.

On page 2322 lines 73691-73693 (XCU 2.3.1 Alias Substitution), change:
After a token has been delimited, but before applying the grammatical rules in Section 2.10, a resulting word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name.
to:
After a sequence of tokens has been parsed and recognized as a command or compound list by the grammar (see [xref to 2.10 Shell Grammar]), but before the command or compound list is executed, each word that is identified to be the command name word of a simple command shall be examined to determine whether it is an unquoted, valid alias name.

On page 2331 line 74073 (XCU 2.6.3 Command Substitution), add a new sentence at the end of the paragraph at lines 74067-74073:
After backslashes have been processed, the characters in command shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed.

On page 2331 line 74076 (XCU 2.6.3 Command Substitution), add a new sentence at the end of the paragraph at lines 74074-74076:
The characters in command shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed.

On page 2325 lines 73782-73785 (XCU 2.5.3 Shell Variables, ENV) change:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file containing shell commands to execute in the current environment.
to:
This variable, when and only when an interactive shell is invoked, shall be subjected to parameter expansion (see Section 2.6.2) by the shell and the resulting value shall be used as a pathname of a file. Before any interactive commands are read, the contents of the file shall be tokenized, parsed, subjected to alias substitution, and executed as described in [xref to 2.3 Token Recognition]. The contents shall be executed in the current environment.

On page 2364 line 75304 (XCU 2.14 dot DESCRIPTION), change:
The shell shall execute commands from the file in the current environment.
to:
The contents of file shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed in the current environment.

On page 2366 line 75371 (XCU 2.14 eval), change:
The constructed command shall be read and executed by the shell.
to:
The constructed command shall be tokenized (see [xref to XCU 2.3 Token Recognition]), then parsed as a single compound_list (see [xref to XCU 2.10 Shell Grammar]), then subjected to alias substitution (see [xref to 2.3.1]), then executed in the current environment.

On page 2350 after line 74800 (XCU 2.10.2 Shell Grammar Rules), insert the following comment above %start:
/* The start symbol is compound_list when parsing dot scripts, command
   substitution bodies, and the arguments passed to eval */

On page 3678 after line 125707 (just before XRAT C.2.3.1), insert a new paragraph:
Because a complete_command is executed before the next complete_command is tokenized and parsed, syntax errors are not discovered by the shell until just before the code would be executed. While in some cases it might be desirable to detect and react to syntax errors before anything is executed, deferring the discovery of syntax errors has several benefits:
  • It makes it possible for script authors to test for the availability of a nonstandard extension and react appropriately before the use of the extension would trigger a syntax error.
  • It makes it possible to create self-extracting tarballs (a shell script concatenated with a payload archive that extracts the archive when executed).
  • The shell does not have to read and parse the complete script before execution, which reduces memory usage when executing extremely long scripts.

On page 3681 lines 125831-125832 (XRAT C.2.5.3 ENV) change:
However, unlike dot scripts, no PATH searching is performed. This is used as a guard against Trojan Horse security breaches.
to:
However, unlike dot scripts, ENV scripts are parsed as a program, not a compound_list. This distinction matters because it influences when aliases take effect and whether syntax errors in the script are discovered before any part of the script is executed.

For security reasons, PATH is not searched when locating the ENV script.


geoffclare

2019-01-02 15:30

manager   bugnote:0004194

Reopening because the resolution includes a change to XCU 2.3.1 that overlaps with the one proposed in 0000953:0003113 of bug 0000953 (which was reopened after that proposal).

geoffclare

2019-02-11 16:58

manager   bugnote:0004247

On (2016 edition) page 2348 after line 74792 (just before XCU 2.3.1 Alias Substitution), insert a new paragraph:
In situations where the shell parses its input as a program, once a complete_command has been recognized by the grammar (see [xref to 2.10 Shell Grammar]), the complete_command shall be executed before the next complete_command is tokenized and parsed.
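The following sketch shows how this rule interacts with aliases. It assumes a shell that performs alias substitution in non-interactive mode, as POSIX requires. Bash does so only in its POSIX mode, for example when invoked as sh. The function names are hypothetical. Each here-document is read by sh as a program.

```sh
# Alias defined in one complete_command and used in the next: the alias
# command runs before "greet" is tokenized, so greet is substituted.
separate_lines() {
  sh <<'EOF'
alias greet='echo hello'
greet
EOF
}

# Alias defined and used within a single complete_command: "greet" has
# already been tokenized and parsed when the alias command runs, so no
# substitution occurs and the shell looks for a command named greet.
same_line() {
  sh <<'EOF'
alias greet='echo hello'; greet
EOF
}

separate_lines            # expected to print: hello
same_line 2>/dev/null || echo "same_line failed with status $?"
```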

After (2016 edition) page 2412 line 77241 (set Application Usage), add a new paragraph:
Use of set -n causes the shell to parse the rest of the script without executing any commands, meaning that set +n cannot be used to undo the effect. Syntax checking is more commonly done via sh -n script_name.
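The following minimal sketch shows this behavior. It assumes a non-interactive sh reading commands from a here-document, since an interactive shell may ignore -n. The function name is hypothetical.

```sh
# Once "set -n" has been executed, the following complete_commands
# (including "set +n") are parsed but never executed.
set_n_demo() {
  sh <<'EOF'
echo before
set -n
set +n
echo after
EOF
}

set_n_demo    # prints only: before
```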

After (2016 edition) page 3239 line 108855 (sh utility Application Usage), add a new paragraph:
sh -n can be used to check for many syntax errors without waiting for complete_commands to be executed, but it can report false positives or miss errors that would occur when the shell evaluates eval commands present in the script, or when alias (or unalias) commands in the script would alter the syntax of commands that use the affected aliases.
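The eval case can be sketched as follows, assuming a POSIX sh that reads the script from standard input. The function names are hypothetical. The only syntax error is inside a string passed to eval, so it is invisible to sh -n until the script actually runs.

```sh
# sh -n parses the script without executing it; the unterminated "if"
# is only a string argument to eval at this point, so no error is reported.
check_only() {
  sh -n <<'EOF'
eval 'if true; then echo unreachable'
EOF
}

# Running the script makes eval parse the string, which then fails.
run_it() {
  sh <<'EOF'
eval 'if true; then echo unreachable'
EOF
}

check_only && echo "sh -n reported no syntax errors"
run_it 2>/dev/null || echo "running the script failed with status $?"
```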

On (2016 edition) page 3720 after line 127520 (just before XRAT C.2.3.1), insert a new paragraph:
Because a complete_command encountered during a program is executed before the next complete_command is tokenized and parsed, syntax errors are not discovered by the shell until just before the code would be executed. While in some cases it might be desirable to detect and react to syntax errors before anything is executed (possible with sh -n), deferring the discovery of syntax errors has several benefits:
  • It makes it possible for script authors to test for the availability of a nonstandard extension and react appropriately before the use of the extension would trigger a syntax error.
  • It makes it possible to create self-extracting tarballs (a shell script concatenated with a payload archive that extracts the archive when executed).
  • The shell does not have to read and parse the complete script before execution, which reduces memory usage when executing extremely long scripts.
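The first benefit can be sketched as a script that probes for a nonstandard extension and exits before reaching the line that uses it. Array assignment is used here only as an example of an extension. The helper have_syntax and the function guarded_script are hypothetical names. Which message appears depends on the shell that sh resolves to. In either case the script should not fail with a syntax error, because the if command is executed before the a=(1 2) line is parsed.

```sh
guarded_script() {
  sh <<'EOF'
# have_syntax CODE: succeeds if this shell can parse and run CODE. The probe
# runs in a subshell, so a syntax error terminates only the subshell.
have_syntax() { (eval "$1") >/dev/null 2>&1; }

if ! have_syntax 'a=(1 2)'; then
  echo "arrays not supported"
  exit 0
fi
# Only parsed if the guard above did not exit.
a=(1 2)
echo "arrays supported"
EOF
}

guarded_script
```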

Issue History

Date Modified Username Field Change
2016-06-02 16:49 rhansen New Issue
2016-06-02 16:49 rhansen Name => Richard Hansen
2016-06-02 16:49 rhansen Section => 2.3 Token Recognition
2016-06-02 16:49 rhansen Page Number => 2321-2322
2016-06-02 16:49 rhansen Line Number => 73636-73689
2016-06-02 16:49 rhansen Interp Status => ---
2016-06-02 16:53 rhansen Relationship added related to 0000953
2016-06-23 16:26 rhansen Note Added: 0003270
2016-06-23 16:28 rhansen Note Edited: 0003270
2016-06-23 16:32 rhansen Note Edited: 0003270
2016-06-23 16:44 rhansen Note Edited: 0003270
2016-06-29 16:06 geoffclare Note Added: 0003276
2016-07-07 15:50 rhansen Note Added: 0003291
2016-07-07 15:52 rhansen Note Added: 0003292
2016-07-07 15:53 rhansen Note Edited: 0003292
2017-07-06 16:00 Don Cragun Tag Attached: issue8
2017-07-06 16:00 geoffclare Final Accepted Text => 0001055:0003292
2017-07-06 16:00 geoffclare Status New => Resolved
2017-07-06 16:00 geoffclare Resolution Open => Accepted As Marked
2019-01-02 15:30 geoffclare Note Added: 0004194
2019-01-02 15:30 geoffclare Status Resolved => Under Review
2019-01-02 15:30 geoffclare Resolution Accepted As Marked => Reopened
2019-02-11 16:58 geoffclare Note Added: 0004247
2019-02-11 16:59 geoffclare Final Accepted Text 0001055:0003292 => 0001055:0004247
2019-02-11 16:59 geoffclare Status Under Review => Resolved
2019-02-11 16:59 geoffclare Resolution Reopened => Accepted As Marked
2019-02-11 17:00 geoffclare Tag Detached: issue8
2019-02-11 17:00 geoffclare Tag Attached: tc3-2008
2019-02-14 16:21 eblake Relationship added related to 0001048
2019-10-23 14:36 geoffclare Status Resolved => Applied
2024-06-11 08:56 agadmin Status Applied => Closed