|Viewing Issue Simple Details|
|ID||Category||Severity||Type||Date Submitted||Last Update|
|0001094||[1003.1(2016)/Issue7+TC2] Shell and Utilities||Editorial||Error||2016-10-18 12:19||2016-10-27 18:08|
|Section||2.10.2 Shell Grammar Rules|
|Final Accepted Text|
|Summary||0001094: Rule 7a is needed to apply to ASSIGNMENT_WORD productions|
Since the lexer section "Token Recognition" does not yield the "ASSIGNMENT_WORD" identifier, but only "TOKEN", section 2.10.1 is applied, using commented actions in the grammar productions, to assign identifiers. However, for the two productions involving "ASSIGNMENT_WORD", no such action is specified.
One can perhaps assume that, in such a case, the specified identifier is the only one that "TOKEN" can turn into at that specific point in the Bison LALR(1) algorithm. That is, in fact, how I understand the lack of comment actions in some cases of the "WORD" identifier (and so I am not making another report on that topic).
However, in this particular case, that does not work either: one cannot assume that at this point in the parsing process, only "ASSIGNMENT_WORD" must result.
So it is necessary to add a comment action "/* Apply Rule 7a */".
Add comment action:
/* Apply Rule 7a */
to each of the two productions involving ASSIGNMENT_WORD.
|Tags||No tags attached.|
edited on: 2016-10-19 10:31
That solution cannot be correct; applying rule 7 (either half) in
the cmd_prefix rules where ASSIGNMENT_WORD is expected would be
too late - we must already have an ASSIGNMENT_WORD for those productions
to apply.
I believe the intent is that the parser attempt to match either a cmd_name
or cmd_word, at which point 7a or 7b (resp) is applied, which may turn
the WORD we started with into an ASSIGNMENT_WORD and cause the cmd_name
or cmd_word production to fail (there is no longer a WORD) - backtracking
goes back to simple command, which can now match cmd_prefix which did not
Whether it is possible to make this work with any currently available
parser generator is an irrelevant issue for current purposes.
I see. I thought that, since the grammar is given in Bison/Yacc syntax, it is implicitly assumed that the algorithm used for accepting strings is the same as Bison's.
From what you are saying, that may not be the case, especially because you talk about some "backtracking", which is not present in Bison. It also sounds like there is no particular "parser generator" at all.
So if it's not Bison, then please explain exactly how this algorithm works. How does this "backtracking" work? How is accepting strings done? Where is this explained?
Not sure if it completely answers your question, but XCU 1.3 Grammar Conventions says:
Portions of this volume of POSIX.1-2008 are expressed in terms of a special grammar notation. It is used to portray the complex syntax of certain program input. The grammar is based on the syntax used by the yacc utility. However, it does not represent fully functional yacc input, suitable for program use; the lexical processing and all semantic requirements are described only in textual form. The grammar is not based on source used in any traditional implementation and has not been tested with the semantic code that would normally be required to accompany it. Furthermore, there is no implication that the partial yacc code presented represents the most efficient, or only, means of supporting the complex syntax within the utility. Implementations may use other programming languages or algorithms, as long as the syntax supported is the same as that represented by the grammar.
Well, from section 2.10.2, the algorithm used is ? + the rules from 2.10.1. And from the text quoted above, it definitely appears that ? is equivalent to the yacc algorithm. Therefore, there is no backtracking as kre describes.
So as best I can tell, the current status of this report is that something needs to be added to the productions that involve "ASSIGNMENT_WORD"; kre says that what I want to add is incorrect, but no alternative has been given so far.
I believe your suggestion to add a comment to the ASSIGNMENT_WORD productions is correct, and kre is mistaken when he says "we must already have an ASSIGNMENT_WORD for those productions to apply". He has probably not taken into account the statement in 2.10.1 that "When a TOKEN is seen where one of those annotated productions could be used to reduce the symbol, the applicable rule shall be applied to convert the token identifier type of the TOKEN to a token identifier acceptable at that point in the grammar."
Tokens are always initially either an operator, IO_NUMBER, or TOKEN. The only way to get an ASSIGNMENT_WORD token is to convert TOKEN to ASSIGNMENT_WORD by applying rule 7. Rule 1 is the only one that applies globally, so the only way to have rule 7 apply is via an annotation in the grammar (as per Section 2.10.1).
Your suggestion was to add /* Apply Rule 7a */ to both productions, but I believe it should be 7a for the first and 7b for the second.
edited on: 2016-10-20 12:07
Hmm, that was a typo actually, I intended to write /* Apply Rule 7b */.
I don't understand (in this and many other cases) why it would matter to apply Rule 7a to the first ASSIGNMENT_WORD production. Why do we need to apply rules that we know won't yield the token ID that is acceptable at this point in the algorithm?
To illustrate this with an SSCCE, let's say there is only one token in the input, and it comes out of the Token Recognition section as TOKEN. Out of all the productions in the grammar with WORD, NAME, ASSIGNMENT_WORD, or a reserved word, we need to decide which production to use. The algorithm "somehow" (I am not concerned with that part of the algorithm) makes the connection between the start symbol "program" and these productions. Based on that, the following productions are possible, and they will be tried in the order given:
pipeline: Bang pipe_sequence
fname: NAME /* Apply rule 8 */
until_clause: Until (...)
while_clause: While (...)
if_clause: If (...)
case_clause: Case (...)
for_clause: For (...)
brace_group : Lbrace compound_list Rbrace
cmd_name: WORD /* Apply rule 7a */
cmd_prefix: ASSIGNMENT_WORD /* Apply rule 7a as you say, or 7b, as I say */
For all the productions without the rule, rule 1 is implicit.
Let's say the token is "foobar". In this case, all the productions will fail until we get to cmd_name. In particular, fname will fail because we look ahead one token and do not see '(', but see end of input. During application of rule 8, we do assign NAME to this token, but we cannot keep that (as kre implies is done) because then we could not accept it once we get to cmd_name.
So then we get to cmd_name, still have TOKEN and not NAME, apply rule 7a and succeed. No need to go to cmd_prefix.
Now, let's say the token is "foobar=". As above, we get to cmd_name, apply 7a, are directed to 7b, and fail to get WORD, so this production fails.
Then we go to cmd_prefix.
If it were as you say, we should first apply 7a, then we would have to redirect to 7b, and succeed.
Then my point is, why not just start with 7b for the production?
7a cannot succeed anyway in this case, other than by redirecting to 7b, because rule 1 cannot yield ASSIGNMENT_WORD, so why bother?
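The two walk-throughs above ("foobar" and "foobar=") can be sketched in code. This is only a rough model of the try-in-order procedure described in this note, not of how any real shell or yacc-generated parser works. The function names, the shortened reserved-word table and the two-entry production list are made up for illustration, quoting and embedded expansions are ignored, and the case the standard leaves unspecified (an '=' preceded by something that is not a valid name) is arbitrarily resolved by falling back to rule 1.

```python
import re

# Hypothetical, shortened reserved-word table (assumption for this sketch).
RESERVED = {"if": "If", "while": "While", "until": "Until",
            "for": "For", "case": "Case", "!": "Bang", "{": "Lbrace"}
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def rule_1(token):
    """Reserved word -> its token identifier; anything else -> WORD."""
    return RESERVED.get(token, "WORD")


def rule_7b(token):
    """[Not the first word], simplified: quoting and expansions ignored."""
    if "=" not in token or token.startswith("="):
        return rule_1(token)
    name = token.split("=", 1)[0]
    if NAME_RE.fullmatch(name):
        return "ASSIGNMENT_WORD"
    # Unspecified by the standard; this sketch falls back to rule 1.
    return rule_1(token)


def rule_7a(token):
    """[When the first word]: no '=' means rule 1, otherwise defer to 7b."""
    return rule_1(token) if "=" not in token else rule_7b(token)


def first_match(token, prefix_rule):
    """Try cmd_name, then cmd_prefix, in the order used in the note above."""
    productions = [
        ("cmd_name", "WORD", rule_7a),
        ("cmd_prefix", "ASSIGNMENT_WORD", prefix_rule),
    ]
    for lhs, expected, rule in productions:
        if rule(token) == expected:
            return lhs
    return None


for tok in ("foobar", "foobar="):
    print(tok, first_match(tok, rule_7a), first_match(tok, rule_7b))
```

In this model, annotating cmd_prefix with 7a or with 7b selects the same production for both tokens, which is the point being argued; the model says nothing about whether the two annotations are equivalent under a real LALR(1) parse.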
In fact, this is not just the case here: in several other places in the grammar, we spend time applying rules or parts of rules that will fail for sure, differentiating between several token ID outcomes that will all fail anyway.
in : In /*Apply Rule 6*/
Rule 6 for "for" differentiates between "In", "Do", and "WORD". Why bother with "Do"? It fails anyway. We should just apply another rule here that says, "yield In if it is exactly 'in', and WORD otherwise". This would be faster than the current Rule 6b. Why not?
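As a rough illustration of the comparison being made, the sketch below paraphrases the [for only] part of Rule 6 next to the simplified rule proposed in this note. The function names are hypothetical, and the sketch only looks at the single production `in : In`; it does not model the other for_clause alternatives, where a `Do` result may still be relevant.

```python
def rule_6_for(token):
    """Paraphrase of Rule 6 [for only]: 'in' -> In, 'do' -> Do, else WORD."""
    if token == "in":
        return "In"
    if token == "do":
        return "Do"
    return "WORD"


def proposed_rule(token):
    """Hypothetical variant from the note above: only 'in' is singled out."""
    return "In" if token == "in" else "WORD"


def accepts_in(rule, token):
    """True when the production `in : In` can use the classified token."""
    return rule(token) == "In"


for tok in ("in", "do", "foo"):
    print(tok, accepts_in(rule_6_for, tok), accepts_in(proposed_rule, tok))
```

For this one production the two rules accept exactly the same tokens; they differ only in the identifier reported for "do".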
|Not to confuse things, there is the original report, and then above I added another question "why bother applying rules or parts of rules that fail anyway" - I will split this off into another report.|
The reason I said 7a for the first and 7b for the second is because of the text that appears next to those rule numbers:
a. [When the first word]
b. [Not the first word]
I felt that the comments should be in keeping with those; if nothing else it would prevent future bug reports saying there is a mismatch.
|This report is included in the summary report 1100 and can be cancelled.|
Don Cragun (manager)
|Withdrawn by submitter as noted in Note: 0003460|
|2016-10-18 12:19||Mark_Galeck||New Issue|
|2016-10-18 12:19||Mark_Galeck||Name||=> Mark Galeck|
|2016-10-18 12:19||Mark_Galeck||Section||=> 2.10.2 Shell Grammar Rules|
|2016-10-18 12:19||Mark_Galeck||Page Number||=> 2380|
|2016-10-18 12:19||Mark_Galeck||Line Number||=> 76105-76106|
|2016-10-19 09:27||kre||Note Added: 0003433|
|2016-10-19 10:31||kre||Note Edited: 0003433|
|2016-10-19 20:21||Mark_Galeck||Note Added: 0003437|
|2016-10-20 08:23||geoffclare||Note Added: 0003439|
|2016-10-20 09:13||Mark_Galeck||Note Added: 0003440|
|2016-10-20 09:49||geoffclare||Note Added: 0003441|
|2016-10-20 12:03||Mark_Galeck||Note Added: 0003442|
|2016-10-20 12:04||Mark_Galeck||Note Edited: 0003442|
|2016-10-20 12:07||Mark_Galeck||Note Edited: 0003442|
|2016-10-20 12:14||Mark_Galeck||Note Added: 0003443|
|2016-10-20 14:26||geoffclare||Note Added: 0003444|
|2016-10-27 12:43||Mark_Galeck||Note Added: 0003460|
|2016-10-27 18:08||Don Cragun||Interp Status||=> ---|
|2016-10-27 18:08||Don Cragun||Note Added: 0003473|
|2016-10-27 18:08||Don Cragun||Status||New => Closed|
|2016-10-27 18:08||Don Cragun||Resolution||Open => Withdrawn|