0001094: Rule 7a is needed to apply to ASSIGNMENT_WORD productions

Notes
(0003433) kre (reporter) 2016-10-19 09:27 edited on: 2016-10-19 10:31	That solution cannot be correct, applying rule 7 (either half) in the cmd_prefix rules where ASSIGNMENT_WORD is expected would be too late - we must already have an ASSIGNMENT_WORD for those productions to apply. I believe the intent is that the parser attempt to match either a cmd_name or cmd_word, at which point 7a or 7b (resp) is applied, which may turn the WORD we started with into an ASSIGNMENT_WORD and cause the cmd_name or cmd_word production to fail (there is no longer a WORD) - backtracking goes back to simple command, which can now match cmd_prefix which did not apply earlier. Whether it is possible to make this work with any currently available parser generator is an irrelevant issue for current purposes.

(0003437) Mark_Galeck (reporter) 2016-10-19 20:21	I see. I thought, that since the grammar is given in Bison/Yacc syntax, then it is implicitly assumed that the algorithm used for accepting strings, is the same as Bison. From what you are saying, that may not be the case, especially because you talk about some "backtracking", which is not present in Bison. It also sounds like, there is no particular "parser generator" at all. So if it's not Bison, then please explain, exactly how does this algorithm work. How does this "backtracking" work? How is accepting strings done? Where is this explained?

(0003439) geoffclare (manager) 2016-10-20 08:23	Not sure if it completely answers your question, but XCU 1.3 Grammar Conventions says: Portions of this volume of POSIX.1-2008 are expressed in terms of a special grammar notation. It is used to portray the complex syntax of certain program input. The grammar is based on the syntax used by the yacc utility. However, it does not represent fully functional yacc input, suitable for program use; the lexical processing and all semantic requirements are described only in textual form. The grammar is not based on source used in any traditional implementation and has not been tested with the semantic code that would normally be required to accompany it. Furthermore, there is no implication that the partial yacc code presented represents the most efficient, or only, means of supporting the complex syntax within the utility. Implementations may use other programming languages or algorithms, as long as the syntax supported is the same as that represented by the grammar.

(0003440) Mark_Galeck (reporter) 2016-10-20 09:13	Well, from section 2.10.2, the algorithm used is ? + the rules from 2.10.1. And from the text quoted above, it definitely appears that ? is equivalent to the yacc algorithm. Therefore, there is no backtracking like kre says. So as best I can tell, the current status of this report is that something needs to be added to the productions that involve "ASSIGNMENT_WORD", and kre says that what I want to add is incorrect, but no alternative has been given so far.

(0003441) geoffclare (manager) 2016-10-20 09:49	I believe your suggestion to add a comment to the ASSIGNMENT_WORD productions is correct, and kre is mistaken when he says "we must already have an ASSIGNMENT_WORD for those productions to apply". He has probably not taken into account the statement in 2.10.1 that "When a TOKEN is seen where one of those annotated productions could be used to reduce the symbol, the applicable rule shall be applied to convert the token identifier type of the TOKEN to a token identifier acceptable at that point in the grammar." Tokens are always initially either an operator, IO_NUMBER, or TOKEN. The only way to get an ASSIGNMENT_WORD token is to convert TOKEN to ASSIGNMENT_WORD by applying rule 7. Rule 1 is the only one that applies globally, so the only way to have rule 7 apply is via an annotation in the grammar (as per Section 2.10.1). Your suggestion was to add /* Apply Rule 7a */ to both productions, but I believe it should be 7a for the first and 7b for the second.
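
The TOKEN conversion described in this note can be sketched roughly as follows. This is only an illustration based on the wording of rules 1 and 7 in XCU 2.10.2 as understood here, not text from the standard. The names RESERVED, rule_1, rule_7a and rule_7b are hypothetical. The sketch assumes the token contains no quoting and no embedded parameter expansion, command substitution, or arithmetic expansion, and it resolves the case the standard leaves unspecified (an '=' preceded by something that is not a valid name) by applying rule 1.

```python
import re

# Hypothetical table mapping reserved words to grammar token identifiers.
RESERVED = {
    "if": "If", "then": "Then", "else": "Else", "elif": "Elif", "fi": "Fi",
    "do": "Do", "done": "Done", "case": "Case", "esac": "Esac",
    "while": "While", "until": "Until", "for": "For", "in": "In",
    "{": "Lbrace", "}": "Rbrace", "!": "Bang",
}


def rule_1(token: str) -> str:
    """Rule 1: a reserved word yields its own identifier, anything else WORD."""
    return RESERVED.get(token, "WORD")


def rule_7b(token: str) -> str:
    """Rule 7b [Not the first word], simplified: quoting is ignored."""
    eq = token.find("=")
    if eq == -1:
        return rule_1(token)          # no '=' at all
    if eq == 0:
        return rule_1(token)          # token begins with '='
    if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", token[:eq]):
        return "ASSIGNMENT_WORD"      # valid name before the first '='
    # Unspecified by the standard; this sketch ASSUMES rule 1 is applied.
    return rule_1(token)


def rule_7a(token: str) -> str:
    """Rule 7a [When the first word]: no '=' means rule 1, otherwise 7b."""
    if "=" not in token:
        return rule_1(token)
    return rule_7b(token)


print(rule_7a("foobar"))    # WORD
print(rule_7a("foobar="))   # ASSIGNMENT_WORD
print(rule_7b("=x"))        # WORD
```

Under these assumptions, rule 1 alone can never return ASSIGNMENT_WORD, which matches the statement above that a TOKEN only becomes an ASSIGNMENT_WORD through rule 7.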

(0003442) Mark_Galeck (reporter) 2016-10-20 12:03 edited on: 2016-10-20 12:07	Hmm, that was a typo actually, I intended to /* Apply Rule 7b */. I don't understand (in this and many other cases) why it would matter to apply Rule 7a to the first ASSIGNMENT_WORD production. Why do we need to apply rules that we know won't yield the token ID that is acceptable at this point in the algorithm?
To illustrate this with an SSCCE, let's say this is the only token in the input, and that it comes from the Token Recognition section as TOKEN. Out of all the productions in the grammar with WORD, NAME, ASSIGNMENT_WORD, or reserved word, we need to decide which production to use. The algorithm "somehow" (I am not concerned with that part of the algorithm) makes the connection between start symbol "program" and these productions. Based on that, the following productions are possible, which will be tried in the order given:
    pipeline: Bang pipe_sequence
    fname: NAME /* Apply rule 8 */
    until_clause: Until (...)
    while_clause: While (...)
    if_clause: If (...)
    case_clause: Case (...)
    for_clause: For (...)
    brace_group: Lbrace compound_list Rbrace
    cmd_name: WORD /* Apply rule 7a */
    cmd_prefix: ASSIGNMENT_WORD /* Apply rule 7a as you say, or 7b, as I say */
For all the productions without the rule, rule 1 is implicit.
Let's say the token is "foobar". In this case, all the productions will fail until we get to cmd_name. In particular, fname will fail because we look ahead one token and do not see '(', but see end of input. During application of rule 8, we do assign NAME to this token, but we cannot keep that (as kre implies that is done) because then we could not accept that once we get to cmd_name. So then we get to cmd_name, still have TOKEN and not NAME, apply rule 7a and succeed. No need to go to cmd_prefix.
Now, let's say the token is "foobar=". As above, we get to cmd_name, apply 7a, redirect to 7b, and fail to get WORD, so this production fails. Then we go to cmd_prefix.
If it were as you say, we should first apply 7a, then we would have to redirect to 7b, and succeed. Then my point is, why not just start with 7b for the production
    cmd_prefix: ASSIGNMENT_WORD
7a cannot succeed anyway in this case, other than by redirecting to 7b, because rule 1 cannot yield ASSIGNMENT_WORD. Why bother?
In fact, this is not just the case here, but in several other places in the grammar: we spend time applying rules or parts of rules that will fail for sure, to differentiate between several different token ID outcomes that will all fail anyway. For example, in
    in : In /* Apply Rule 6 */
Rule 6 for "for" differentiates between "In", "Do", and "WORD". Why bother with "Do"? It fails anyway. We should just apply another rule here that says, "yield In if it is exactly 'in', and WORD otherwise". This would be faster than the current Rule 6b. Why not?
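
The Rule 6 suggestion at the end of this note can be made concrete with a small sketch. It compares Rule 6b as characterised in the note (distinguishing In, Do and WORD) with the suggested replacement (In or WORD only). The function names rule_6b_current and rule_6_proposed are hypothetical, and the sketch says nothing about whether the Do outcome is needed elsewhere in the grammar.

```python
def rule_6b_current(token: str) -> str:
    """Rule 6b as characterised in the note: In, Do, or WORD."""
    if token == "in":
        return "In"
    if token == "do":
        return "Do"
    return "WORD"


def rule_6_proposed(token: str) -> str:
    """Suggested alternative: In if the token is exactly 'in', else WORD."""
    return "In" if token == "in" else "WORD"


# The two variants differ only on the token "do".
for tok in ("in", "do", "foo"):
    print(tok, rule_6b_current(tok), rule_6_proposed(tok))
```

The only input on which the two functions disagree is "do", so the saving argued for in the note amounts to one string comparison per application of the rule.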

(0003443) Mark_Galeck (reporter) 2016-10-20 12:14	Not to confuse things, there is the original report, and then above I added another question "why bother applying rules or parts of rules that fail anyway" - I will split this off into another report.

(0003444) geoffclare (manager) 2016-10-20 14:26	The reason I said 7a for the first and 7b for the second is because of the text that appears next to those rule numbers:
    a. [When the first word]
    b. [Not the first word]
I felt that the comments should be in keeping with those; if nothing else it would prevent future bug reports saying there is a mismatch.

(0003460) Mark_Galeck (reporter) 2016-10-27 12:43	This report is included in the summary report 1100 and can be cancelled.

(0003473) Don Cragun (manager) 2016-10-27 18:08	Withdrawn by submitter as noted in Note: 0003460

Issue History
Date Modified	Username	Field	Change
2016-10-18 12:19	Mark_Galeck	New Issue
2016-10-18 12:19	Mark_Galeck	Name	=> Mark Galeck
2016-10-18 12:19	Mark_Galeck	Section	=> 2.10.2 Shell Grammar Rules
2016-10-18 12:19	Mark_Galeck	Page Number	=> 2380
2016-10-18 12:19	Mark_Galeck	Line Number	=> 76105-76106
2016-10-19 09:27	kre	Note Added: 0003433
2016-10-19 10:31	kre	Note Edited: 0003433
2016-10-19 20:21	Mark_Galeck	Note Added: 0003437
2016-10-20 08:23	geoffclare	Note Added: 0003439
2016-10-20 09:13	Mark_Galeck	Note Added: 0003440
2016-10-20 09:49	geoffclare	Note Added: 0003441
2016-10-20 12:03	Mark_Galeck	Note Added: 0003442
2016-10-20 12:04	Mark_Galeck	Note Edited: 0003442
2016-10-20 12:07	Mark_Galeck	Note Edited: 0003442
2016-10-20 12:14	Mark_Galeck	Note Added: 0003443
2016-10-20 14:26	geoffclare	Note Added: 0003444
2016-10-27 12:43	Mark_Galeck	Note Added: 0003460
2016-10-27 18:08	Don Cragun	Interp Status	=> ---
2016-10-27 18:08	Don Cragun	Note Added: 0003473
2016-10-27 18:08	Don Cragun	Status	New => Closed
2016-10-27 18:08	Don Cragun	Resolution	Open => Withdrawn
