0001100: Rewrite of Section 2.10 Shell Grammar, of the Shell Standard, to fix previous reports, fix new issues, and improve presentation.

ID	Project	Category	View Status	Date Submitted	Last Update

0001100	1003.1(2016/18)/Issue7+TC2	Shell and Utilities	public	2016-10-27 12:40	2018-05-17 16:03

Reporter	Mark_Galeck	Assigned To
Priority	normal	Severity	Editorial	Type	Clarification Requested
Status	Closed	Resolution	Rejected

Name	Mark Galeck
Organization
User Reference
Section	2.10 Shell Grammar
Page Number	2375-2381
Line Number	75873-76150
Interp Status	---
Final Accepted Text


Summary	0001100: Rewrite of Section 2.10 Shell Grammar, of the Shell Standard, to fix previous reports, fix new issues, and improve presentation.
Description	I recently made several reports concerning sections 2.10.1/2, and then I saw at least one more problem of a similar kind. If I continue making incremental reports, even if the changes were approved, they will result in a bigger and bigger mess. Therefore I decided to cancel some previous reports, add new issues and make one summary report, which is a comprehensive rewrite of the whole Shell Grammar section, to fix the issues I find, as well as make the whole presentation more straightforward and less convoluted. Here is the list of all the specific bugs this report addresses, including some previous reports. I am not listing changes here that merely improve the presentation; to see all the changes, you should probably use some "diff" program.
1. Previous reports 1096, 1094, 1097, 1099, 1095, 1092 are included here and can be cancelled.
2. Previous reports 1098, 1093, 1091, 1088 can be cancelled. Let's say we classify them as bogus, and those changes are not included here.
3. (new issue) In the current standard, cmd_word cannot be a reserved word. It is very convoluted, but if you carefully trace the application of various rules to each other, you will find that in fact, cmd_name and cmd_word follow exactly the same semantics right now: neither allows reserved words. Only cmd_name should not allow reserved words.
4. (new issue) In multiple places in the current standard, rule 1 applies to WORD, and thus reserved words are not allowed, where all words should be allowed. Some of the reports above cover this. Additionally, we have: WORD in the case_clause production - currently it cannot be a reserved word, but it should be allowed to be a reserved word. Same for WORD in cmd_suffix production.
------------------------
This rewrite is intended only to include the changes mentioned above, and should otherwise be equivalent to the current standard. I will be happy to answer any questions, provide clarifications, or fix if you find any bugs.
I do not have the time to discuss the merits of the changes. The maintainer of this standard is free to reject any part or all of this report, or to continue to rewrite my Section 2.10 in any way that suits them. I completely do not mind. Yes the text I provide for the new Section 2.10 is just raw text format, it does not have hyperlinks and different fonts. Somebody else would have to do that. Thank you!
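Issues 3 and 4 above concern positions where a token spelled like a reserved word should still be an ordinary WORD. The following is a minimal sketch of two such positions, assuming a POSIX-style shell; the variable names `suffix_out` and `case_out` are illustrative only and do not come from the report. It shows what shells commonly accept, not what the current grammar text requires.

```shell
# Reserved-word spellings in cmd_suffix position are ordinary arguments.
suffix_out=$(echo if then done)

# The word after 'case' (the case subject) may also be spelled like a
# reserved word; here it is matched against an ordinary glob pattern.
case done in
    (d*) case_out=matched ;;
    (*)  case_out=unmatched ;;
esac

printf '%s\n' "$suffix_out" "$case_out"
```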
Desired Action	2.10. Shell Grammar
The following grammar defines the Shell Command Language. This formal syntax shall take precedence over the preceding text syntax description.
The rules in Token Recognition delimit operator and word tokens. In order to appear in the grammar as token identifiers, the tokens shall be classified according to the following rules, applied in the following order of precedence:
1. The token identifier for any operator, occurs when the token is that operator.
2. IO_NUMBER is if the string consists solely of digits and the delimiter character is one of '<' or '>'.
3. This rule only applies in function_body production; see below in the grammar. Word expansion and assignment shall never occur, even when required by the rules below, when this production is being parsed. WORD is each token that might either be expanded or have assignment applied to it, consisting only of characters that are exactly described in Token Recognition.
4. The token identifier for any reserved word, occurs when the token is exactly that reserved word. Note: Because at this point <quotation-mark> characters are retained in the token, quoted strings cannot be recognized as reserved words. Also note that line joining is done before tokenization, as described in Escape Character (Backslash), so escaped <newline> characters are already removed at this point.
5. This rule only applies in simple_command and cmd_prefix productions; see below in the grammar. For this rule, we define "important" <equal-sign> characters in a token: they are unquoted (as determined while applying rule 4 from Token Recognition), that are not part of an embedded parameter expansion, command substitution, or arithmetic expansion construct (as determined while applying rule 5 from Token Recognition), and do not begin the token. For the definition of a valid "name", see XBD Name.
5a. If the token does not contain important '=' and is not a reserved word, it is WORD. If there are important '=' and all the characters preceding the first such '=' do not form a valid name, it is unspecified whether it is WORD.
5b. If the token does not contain important '=', it is WORD. If there are important '=' and all the characters preceding the first such '=' do not form a valid name, it is unspecified whether it is WORD.
5c. If there are important '=' and all the characters preceding the first such '=' form a valid name, it is ASSIGNMENT_WORD. If they do not form a valid name, it is unspecified whether it is ASSIGNMENT_WORD. Assignment to the name within ASSIGNMENT_WORD token shall occur as specified in Simple Commands.
6. This rule only applies in the function_definition production; see below in the grammar. NAME is any word that is not reserved, and is a valid name.
7. This rule only applies in the for_clause production; see below in the grammar. NAME is any valid name.
8. This rule only applies in pattern_not_esac productions; see below in the grammar. WORD is any word except 'esac'.
9. This rule only applies in here_end production; see below in the grammar. Quote removal shall be applied to the word to determine the delimiter that is used to find the end of the here-document that begins after the next <newline>.
10. This rule only applies in the filename production; see below in the grammar. The expansions specified in Redirection shall occur. WORD occurs, if as specified there, exactly one field results (or the result is unspecified), and there are additional requirements on pathname expansion.
11. WORD is any word.
------------------------------
The WORD tokens shall have the word expansion rules applied to them immediately before the associated command is executed, not at the time the command is parsed.
/* -------------------------------------------------------
   The grammar symbols
   ------------------------------------------------------- */
%token  WORD
%token  ASSIGNMENT_WORD
%token  NAME
%token  NEWLINE
%token  IO_NUMBER
/* The following are the operators (see XBD Operator) containing more than one character. */
%token  AND_IF    OR_IF    DSEMI
/*      '&&'      '||'     ';;'    */
%token  DLESS  DGREAT  LESSAND  GREATAND  LESSGREAT  DLESSDASH
/*      '<<'   '>>'    '<&'     '>&'      '<>'       '<<-'   */
%token  CLOBBER
/*      '>|'   */
/* The following are the reserved words. */
%token  If    Then    Else    Elif    Fi    Do    Done
/*      'if'  'then'  'else'  'elif'  'fi'  'do'  'done'   */
%token  Case    Esac    While    Until    For
/*      'case'  'esac'  'while'  'until'  'for'   */
/* These are reserved words, not operator tokens, and are recognized when reserved words are recognized. */
%token  Lbrace    Rbrace    Bang
/*      '{'       '}'       '!'   */
%token  In
/*      'in'   */
/* -------------------------------------------------------
   The Grammar
   ------------------------------------------------------- */
%start program
%%
program          : linebreak complete_commands linebreak
                 | linebreak
                 ;
complete_commands: complete_commands newline_list complete_command
                 |                                complete_command
                 ;
complete_command : list separator_op
                 | list
                 ;
list             : list separator_op and_or
                 |                   and_or
                 ;
and_or           :                         pipeline
                 | and_or AND_IF linebreak pipeline
                 | and_or OR_IF  linebreak pipeline
                 ;
pipeline         :      pipe_sequence
                 | Bang pipe_sequence
                 ;
pipe_sequence    :                             command
                 | pipe_sequence '|' linebreak command
                 ;
command          : simple_command
                 | compound_command
                 | compound_command redirect_list
                 | function_definition
                 ;
compound_command : brace_group
                 | subshell
                 | for_clause
                 | case_clause
                 | if_clause
                 | while_clause
                 | until_clause
                 ;
subshell         : '(' compound_list ')'
                 ;
compound_list    : linebreak term
                 | linebreak term separator
                 ;
term             : term separator and_or
                 |                and_or
                 ;
/* Apply rule 7:*/
for_clause       : For NAME                                      do_group
                 | For NAME                       sequential_sep do_group
                 | For NAME linebreak In          sequential_sep do_group
                 | For NAME linebreak In wordlist sequential_sep do_group
                 ;
wordlist         : wordlist WORD
                 |          WORD
                 ;
case_clause      : Case WORD linebreak In linebreak case_list    Esac
                 | Case WORD linebreak In linebreak case_list_ns Esac
                 | Case WORD linebreak In linebreak              Esac
                 ;
case_list_ns     : case_list case_item_ns
                 |           case_item_ns
                 ;
case_list        : case_list case_item
                 |           case_item
                 ;
case_item_ns     :     pattern_not_esac ')' linebreak
                 |     pattern_not_esac ')' compound_list
                 | '(' pattern ')' linebreak
                 | '(' pattern ')' compound_list
                 ;
case_item        :     pattern_not_esac ')' linebreak     DSEMI linebreak
                 |     pattern_not_esac ')' compound_list DSEMI linebreak
                 | '(' pattern ')' linebreak     DSEMI linebreak
                 | '(' pattern ')' compound_list DSEMI linebreak
                 ;
/* Apply rule 8:*/
pattern_not_esac : WORD
                 | WORD '|' pattern
                 ;
pattern          :             WORD
                 | pattern '|' WORD
                 ;
if_clause        : If compound_list Then compound_list else_part Fi
                 | If compound_list Then compound_list           Fi
                 ;
else_part        : Elif compound_list Then compound_list
                 | Elif compound_list Then compound_list else_part
                 | Else compound_list
                 ;
while_clause     : While compound_list do_group
                 ;
until_clause     : Until compound_list do_group
                 ;
/* Apply rule 6:*/
function_definition : NAME '(' ')' linebreak function_body
                 ;
/* Apply rule 3:*/
function_body    : compound_command
                 | compound_command redirect_list
                 ;
brace_group      : Lbrace compound_list Rbrace
                 ;
do_group         : Do compound_list Done
                 ;
simple_command   : cmd_prefix WORD cmd_suffix   /* Apply rule 5b */
                 | cmd_prefix WORD              /* Apply rule 5b */
                 | cmd_prefix
                 | WORD cmd_suffix              /* Apply rule 5a */
                 | WORD                         /* Apply rule 5a */
                 ;
/* Apply rule 5c:*/
cmd_prefix       :            io_redirect
                 | cmd_prefix io_redirect
                 |            ASSIGNMENT_WORD
                 | cmd_prefix ASSIGNMENT_WORD
                 ;
cmd_suffix       :            io_redirect
                 | cmd_suffix io_redirect
                 |            WORD
                 | cmd_suffix WORD
                 ;
redirect_list    :               io_redirect
                 | redirect_list io_redirect
                 ;
io_redirect      :           io_file
                 | IO_NUMBER io_file
                 |           io_here
                 | IO_NUMBER io_here
                 ;
io_file          : '<'       filename
                 | LESSAND   filename
                 | '>'       filename
                 | GREATAND  filename
                 | DGREAT    filename
                 | LESSGREAT filename
                 | CLOBBER   filename
                 ;
filename         : WORD                      /* Apply rule 10*/
                 ;
io_here          : DLESS     here_end
                 | DLESSDASH here_end
                 ;
here_end         : WORD                      /* Apply rule 9 */
                 ;
newline_list     :              NEWLINE
                 | newline_list NEWLINE
                 ;
linebreak        : newline_list
                 | /* empty */
                 ;
separator_op     : '&'
                 | ';'
                 ;
separator        : separator_op linebreak
                 | newline_list
                 ;
sequential_sep   : ';' linebreak
                 | newline_list
                 ;
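The "important" <equal-sign> test in proposed rule 5c can be sketched in shell itself. This is a deliberately simplified illustration, not the proposal's algorithm: it assumes the token is already delimited and contains no quoting or embedded expansions, it ignores the reserved-word check of rule 5a, and the function name `classify_token` is hypothetical.

```shell
# Hypothetical helper: classify one plain token along the lines of
# proposed rules 5b/5c. Quoting and embedded expansions are NOT handled.
classify_token() {
    token=$1
    rest=${token#=}              # a leading '=' is never "important"
    case $rest in
        *=*) ;;                  # at least one important '=' remains
        *) echo WORD; return ;;
    esac
    # A token that begins with '=' cannot have a valid name before
    # its first important '='.
    case $token in
        =*) echo unspecified; return ;;
    esac
    prefix=${token%%=*}          # text before the first '='
    case $prefix in
        [!A-Za-z_]* | *[!A-Za-z0-9_]*) echo unspecified ;;
        *) echo ASSIGNMENT_WORD ;;
    esac
}

for t in foo=bar =bar 1a=b echo; do
    printf '%s -> %s\n' "$t" "$(classify_token "$t")"
done
```

A real tokenizer would also have to track quoting state and nested expansions to decide which '=' characters are important; that part is omitted here.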
Tags	No tags attached.

has duplicate	0001098	Closed	1003.1(2016/18)/Issue7+TC2	do_group symbol cannot be accepted as written, because rule 6 cannot yield Done token
has duplicate	0001088	Closed	1003.1(2016/18)/Issue7+TC2	"When more than one rule applies, the highest numbered rule shall apply " is pointless
has duplicate	0001091	Closed	1003.1(2016/18)/Issue7+TC2	Some "WORD tokens" do not have "the associated command"
has duplicate	0001093	Closed	1003.1(2016/18)/Issue7+TC2	"or applies globally" is pointless
related to	0001082	Closed	1003.1(2016/18)/Issue7+TC2	"delimited" is incorrect
related to	0001083	Closed	1003.1(2016/18)/Issue7+TC2	"next" character is misleading
related to	0001084	Resolved	1003.1(2016/18)/Issue7+TC2	rule 3, 4, 5 do not say that a token is started, if needed
related to	0001085	Closed	1003.1(2016/18)/Issue7+TC2	"token shall be from the current position in the input" is incorrect
related to	0001086	Closed	1003.1(2016/18)/Issue7+TC2	Token "Recognition" is misleading and the usage of "word" in that section should be clarified.
related to	0001276	Closed	1003.1(2013)/Issue7+TC1	incorrect resolution in 0000839

Mark_Galeck 2016-10-27 12:57 reporter bugnote:0003470	In report 1098, shware_systems wrote note 3457, and I don't understand parts of that note. I asked several questions, but they have not responded yet. Report 1098 is now intended to be included here. Once shware_systems responds, I will see if there is anything I need to fix, and then I will fix those things here.

kre 2018-03-28 03:59 reporter bugnote:0003944	I have not yet (after all this time) been able to find the time to see if the reworded section is correct or not.... (which also means I have not discovered any cases where it is incorrect.) But I do have 2 comments - both of which relate to other issues I believe.
First, the proposed rule 9, "next <newline>" is not nearly specific (or correct) enough to be useful. See issue 1043 (still unresolved...)
Second, in proposed rule 6, and the grammar production for function_definition, there is absolutely no reason for the function name to be a NAME rather than a word - in fact it should not be. Aside from (perhaps) disallowing '/' in function names (as such a function can never be executed because of the command search and execution rules) anything that can be a filesystem command name should be able to be a function name (including characters that need to be quoted to be entered without meaning something different, like white space and the operator and quoting characters). Most shells implement this already. (I kind of remember a bug report for this, but cannot find it now.)
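Proposed rule 9 applies quote removal to the here_end word to obtain the delimiter. The following is a small sketch of that step, assuming a POSIX-style shell and the `cat` utility; the variable names are illustrative. Both here-documents end at the same delimiter, EOF, but quoting the delimiter word also suppresses expansion of the body.

```shell
value=expanded

# Unquoted delimiter: the here-document body undergoes expansion.
plain=$(cat <<EOF
$value
EOF
)

# Quoted delimiter: quote removal yields the same delimiter, EOF,
# but the body is taken literally.
literal=$(cat <<'EOF'
$value
EOF
)

printf '%s\n' "$plain" "$literal"
```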

shware_systems 2018-05-11 20:10 reporter bugnote:0004030	Re: Note 3457/3470
>This isn't obvious if one is thinking the grammar matches yacc or another standard's production style.
As Geoff pointed out, the Introduction to Shell & Utilities, says "The grammar is based on the syntax used by the yacc utility." QUESTION. Are you contradicting that? Please explain.
No, it is not a contradiction, as "based on" is not "matches" or "equivalent to", as is explained after that quote in XCU. My examples follow from the paragraph at XCU 2.9.1, Line 75534 - that there is a defined behavior when no command name is present... The grammar or note 7 may not reflect this properly, but I don't see that any shell should be reporting it as a syntax error; since no I/O redirects are specified that might affect the sub-shell being set up, it's simply a no-op that uses up some time, that I see.

kre 2018-05-11 21:39 reporter bugnote:0004031	Mark (Galeck) - forget note 3457 (issue 1098) - it is 90% gibberish, and has essentially no relation to anything.
The point about the notes that you and Geoff were disagreeing about is due to a misunderstanding about how they work - which should be clarified. When a rule says "apply rule N" it is not intended to mean that rule N applies here, and nothing else does, what it means is to look and see if rule N can apply here, and if it can, apply it. Otherwise rule 1 applies.
Rule 6 applies only when a for or case statement is being parsed (which must be a for statement for the do_group rule - case does not use do_group - rule 6 gets used there via the production for "in") and it only applies to the third word of that statement. Its purpose is so that
    for WORD *in*
    for word *do*
can recognise in or do as a reserved word (In or Do) and not just a word (the asterisks are just for emphasis here). That (and the "case word in" via the in: In production) are the only times rule 6 ever applies, it is never used anywhere else, that's what the:
    6. [Third word of for and case]
is all about. And then either 6a or 6b applies depending upon whether the first word was "case" or "for" resp.
Done is recognised by rule 1, only when it appears in a command name position (certainly not anywhere in a do_group) so, for example in
    for x in a b c; do echo done; done
similarly
    for x in do ; do echo $x; done
the first "do" is WORD, as it is not the 3rd word, so rule 6 does not apply to it (and it is not the command word, so rule 1 does not apply either). but
    for x do echo done; done
the "do" is "Do" as that is the 3rd word of a for, so rule 6 does apply (and once it does, rule 1 is irrelevant to that token.) the first "done" parses as WORD as it is not in the command name position ("echo" is there) but after the ';' we have a new command name next, rule 1 applies, and "done" produces Done
The grammar you are suggesting seems to do things a different way, with a rule applying for a whole production. I suspect a change like that is more than we would want to make - better to just clarify better exactly what "apply rule N" means, and the conditions upon that.
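The three loops from the note above can be run directly to observe the classification being described. This is a minimal sketch assuming a POSIX-style shell; the variable names `out1` to `out3` and the positional parameters `p q` are illustrative additions.

```shell
# 'done' after 'echo' is an ordinary word; the last 'done' closes the loop.
out1=$(for x in a b c; do echo done; done)

# The 'do' after 'in' is a list item (WORD); the 'do' after ';' opens the body.
out2=$(for x in do ; do echo "$x"; done)

# With no 'in', the third word 'do' is the reserved word and the loop
# iterates over the positional parameters.
set -- p q
out3=$(for x do echo done; done)

printf '%s\n' "$out1" "$out2" "$out3"
```

The first loop prints "done" once per list item, the second prints the single list item "do", and the third prints "done" once per positional parameter.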

shware_systems 2018-05-12 06:59 reporter bugnote:0004032	I'd forget Note 4031: he considers anything he isn't smart enough to understand gibberish, apparently, and likes being rude in the process.

eblake 2018-05-17 15:33 manager bugnote:0004037	Here is a diff between the original formal grammar and the proposed new one:
--- /tmp/grammar.1	2018-05-10 09:11:35.894306140 -0700
+++ /tmp/grammar.2	2018-05-10 09:12:23.347012514 -0700
@@ -96,21 +96,18 @@
 term             : term separator and_or
                  |                and_or
                  ;
-for_clause       : For name                                      do_group
-                 | For name                       sequential_sep do_group
-                 | For name linebreak in          sequential_sep do_group
-                 | For name linebreak in wordlist sequential_sep do_group
-                 ;
-name             : NAME                     /* Apply rule 5 */
-                 ;
-in               : In                       /* Apply rule 6 */
+/* Apply rule 7:*/
+for_clause       : For NAME                                      do_group
+                 | For NAME                       sequential_sep do_group
+                 | For NAME linebreak In          sequential_sep do_group
+                 | For NAME linebreak In wordlist sequential_sep do_group
                  ;
 wordlist         : wordlist WORD
                  |          WORD
                  ;
-case_clause      : Case WORD linebreak in linebreak case_list    Esac
-                 | Case WORD linebreak in linebreak case_list_ns Esac
-                 | Case WORD linebreak in linebreak              Esac
+case_clause      : Case WORD linebreak In linebreak case_list    Esac
+                 | Case WORD linebreak In linebreak case_list_ns Esac
+                 | Case WORD linebreak In linebreak              Esac
                  ;
 case_list_ns     : case_list case_item_ns
                  |           case_item_ns
@@ -118,18 +115,22 @@
 case_list        : case_list case_item
                  |           case_item
                  ;
-case_item_ns     :     pattern ')' linebreak
-                 |     pattern ')' compound_list
+case_item_ns     :     pattern_not_esac ')' linebreak
+                 |     pattern_not_esac ')' compound_list
                  | '(' pattern ')' linebreak
                  | '(' pattern ')' compound_list
                  ;
-case_item        :     pattern ')' linebreak     DSEMI linebreak
-                 |     pattern ')' compound_list DSEMI linebreak
+case_item        :     pattern_not_esac ')' linebreak     DSEMI linebreak
+                 |     pattern_not_esac ')' compound_list DSEMI linebreak
                  | '(' pattern ')' linebreak     DSEMI linebreak
                  | '(' pattern ')' compound_list DSEMI linebreak
                  ;
-pattern          :             WORD         /* Apply rule 4 */
-                 | pattern '|' WORD         /* Do not apply rule 4 */
+/* Apply rule 8:*/
+pattern_not_esac : WORD
+                 | WORD '|' pattern
+                 ;
+pattern          :             WORD
+                 | pattern '|' WORD
                  ;
 if_clause        : If compound_list Then compound_list else_part Fi
                  | If compound_list Then compound_list           Fi
@@ -142,27 +143,24 @@
                  ;
 until_clause     : Until compound_list do_group
                  ;
-function_definition : fname '(' ')' linebreak function_body
-                 ;
-function_body    : compound_command                /* Apply rule 9 */
-                 | compound_command redirect_list  /* Apply rule 9 */
+/* Apply rule 6:*/
+function_definition : NAME '(' ')' linebreak function_body
                  ;
-fname            : NAME                            /* Apply rule 8 */
+/* Apply rule 3:*/
+function_body    : compound_command
+                 | compound_command redirect_list
                  ;
 brace_group      : Lbrace compound_list Rbrace
                  ;
-do_group         : Do compound_list Done           /* Apply rule 6 */
+do_group         : Do compound_list Done
                  ;
-simple_command   : cmd_prefix cmd_word cmd_suffix
-                 | cmd_prefix cmd_word
+simple_command   : cmd_prefix WORD cmd_suffix   /* Apply rule 5b */
+                 | cmd_prefix WORD              /* Apply rule 5b */
                  | cmd_prefix
-                 | cmd_name cmd_suffix
-                 | cmd_name
-                 ;
-cmd_name         : WORD                   /* Apply rule 7a */
-                 ;
-cmd_word         : WORD                   /* Apply rule 7b */
+                 | WORD cmd_suffix              /* Apply rule 5a */
+                 | WORD                         /* Apply rule 5a */
                  ;
+/* Apply rule 5c:*/
 cmd_prefix       :            io_redirect
                  | cmd_prefix io_redirect
                  |            ASSIGNMENT_WORD
@@ -189,12 +187,12 @@
                  | LESSGREAT filename
                  | CLOBBER   filename
                  ;
-filename         : WORD                      /* Apply rule 2 */
+filename         : WORD                      /* Apply rule 10*/
                  ;
 io_here          : DLESS     here_end
                  | DLESSDASH here_end
                  ;
-here_end         : WORD                      /* Apply rule 3 */
+here_end         : WORD                      /* Apply rule 9 */
                  ;
 newline_list     :              NEWLINE
                  | newline_list NEWLINE
@@ -211,4 +209,3 @@
 sequential_sep   : ';' linebreak
                  | newline_list
                  ;
-

Don Cragun 2018-05-17 15:58 viewer bugnote:0004038	We believe that some of the changes suggested in this bug report reflect a misunderstanding of the grammar as it is presented in the standard rather than problems in the grammar itself. With no rationale for the changes that are being made, no indication of what is intended to be fixed by the changes that have been made, and no definitions for new terms that have been added to the grammar and the description of the grammar, we are unable to determine which, if any, of the suggested changes should be made.
We believe that there may be discrepancies between the grammar as it currently appears in the standard and the shell language described by the standard, but are unable to determine which, if any, of the changes suggested in this bug report address those problems.
We are going to reject this bug report, but would be happy to have the submitter provide another bug report with a list of defects that need to be addressed and a set of changes to meet those defects (with each change identifying the defect it addresses). We would also like to see additions to the definitions section for newly defined terms (e.g., "important" <equal-sign> characters) and changes to the rationale in XRAT C.2.10 explaining how the grammar is being changed to reflect differences between what the standard has intended to require and what the grammar currently does require.
When describing problems in the grammar, giving an example of a shell construct that is not accepted by the grammar when it should be or that is accepted by the grammar when it should not be would be a big help in understanding the issues that are being addressed by proposed changes.
Note that existing shells are allowed to support extensions to constructs required by the POSIX shell grammar. Therefore, there is no requirement that all existing shell constructs need to be recognized by the grammar.

Date Modified	Username	Field	Change
2016-10-27 12:40	Mark_Galeck	New Issue
2016-10-27 12:40	Mark_Galeck	Name	=> Mark Galeck
2016-10-27 12:40	Mark_Galeck	Section	=> 2.10 Shell Grammar
2016-10-27 12:40	Mark_Galeck	Page Number	=> 2375-2381
2016-10-27 12:40	Mark_Galeck	Line Number	=> 75873-76150
2016-10-27 12:57	Mark_Galeck	Note Added: 0003470
2016-10-28 08:19	geoffclare	Relationship added	related to 0001082
2016-10-28 08:20	geoffclare	Relationship added	related to 0001083
2016-10-28 08:20	geoffclare	Relationship added	related to 0001084
2016-10-28 08:21	geoffclare	Relationship added	related to 0001085
2016-10-28 08:21	geoffclare	Relationship added	related to 0001086
2016-10-28 08:22	geoffclare	Relationship added	related to 0001098
2018-03-28 03:59	kre	Note Added: 0003944
2018-04-12 15:38	eblake	Relationship added	has duplicate 0001088
2018-04-12 15:39	eblake	Relationship added	has duplicate 0001091
2018-04-12 15:39	eblake	Relationship added	has duplicate 0001093
2018-04-12 15:40	eblake	Relationship replaced	has duplicate 0001098
2018-05-11 20:10	shware_systems	Note Added: 0004030
2018-05-11 21:39	kre	Note Added: 0004031
2018-05-12 06:59	shware_systems	Note Added: 0004032
2018-05-17 15:33	eblake	Note Added: 0004037
2018-05-17 15:58	Don Cragun	Note Added: 0004038
2018-05-17 16:03	Don Cragun	Interp Status	=> ---
2018-05-17 16:03	Don Cragun	Status	New => Closed
2018-05-17 16:03	Don Cragun	Resolution	Open => Rejected
2019-07-30 14:27	eblake	Relationship added	related to 0001276
