|ID||Category||Severity||Type||Date Submitted||Last Update|
|0001043||[1003.1(2013)/Issue7+TC1] Shell and Utilities||Objection||Omission||2016-04-07 13:16||2017-03-23 16:11|
|Final Accepted Text|
|Summary||0001043: Which newline starts collection of here document data?|
The spec for a here doc says that the here doc will begin
after the next newline.
First, let's assume that really means after the next NEWLINE
token, as is written elsewhere, that is in
sed << FILE_END '
none of the newlines in the quoted string is intended to be the
"next newline" in question. Since none of those is a NEWLINE token
making that simple change avoids problems there. To the best of
my belief, there is no shell that doesn't do it this way already.
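A minimal sketch of that point follows. The function name, the sed script and the input line are illustrative assumptions, not taken from the report. The newlines inside the quoted sed script belong to a single word, so the here document data begins only after the line that holds the closing quote.

```shell
# Hypothetical demo: the sed script is one quoted word spanning three lines.
# The newlines inside the quotes are not NEWLINE tokens, so the here document
# starts after the line containing the closing quote.
quoted_newline_demo() {
    sed << FILE_END '
s/-/_/
'
file-name
FILE_END
}

quoted_newline_demo    # prints: file_name
```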
That still leaves two unresolved issues, probably highly related to
each other, but seemingly different in a sense, and both relate to
subshell environments used in relationship with here docs.
One easy way to see this is to use command substitution to make
the subshell environment, so let's concentrate on that first.
The first issue is: Does a newline token in a command substitution
that starts on the same line (hence no earlier NEWLINE token)
as a here doc redirection operator, count as the NEWLINE token
(meaning the here doc would appear in the middle of the command
substitution, even though it is not used with it in any way) or
is the search for a NEWLINE token interrupted while processing
the text of the command substitution, meaning that the here
document starts not at the "next" NEWLINE token, but at the
NEWLINE token that next appears at the "same parser level"
(which I am sure is not the correct way to say what I mean.)
And second, if a here document redirect operator appears within
a command substitution, does the here document also have to appear
within the same command substitution, even in cases where otherwise
the command substitution would contain no NEWLINE token at all?
Examples to illustrate:
cat $( find . -name text-file* -mtime +3
-ctime -1 )
is intended to cat a bunch of files found by find (for the purpose
of this example, let's ignore the filename issues raised by doing find
this way; they aren't relevant to the point.)
If we then assume that we also want to prefix the output with a
standard message, we might want to include that in what cat reads
and prints. One way would be
printf "%s\n" message > /tmp/file
cat /tmp/file $( find ...
but this is a perfect use of here documents, so ...
cat - $( find ...
... ) <<EOF
will clearly work. But that separates the << from the "-" that
uses it, so we may prefer to write
cat - <<EOF $( find ...
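and at this point we need to answer the question: does the here
document data start on the very next line, inside the command
substitution, or only after the line that closes the command
substitution?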
This one seems to have been implemented both ways by different shells.
A literal reading of the current specification would suggest that the
first way is correct - the "next NEWLINE token" (or even just the next
newline - in this example they are the same) is the one in the middle
of the command substitution, so the here document should start there.
But most people are likely to find that form difficult to comprehend,
and probably even more difficult to write correctly.
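For comparison, here is a sketch of the ordering described earlier as one that "will clearly work", with the << operator placed after the closing parenthesis. It assumes printf as a stand-in for the find command; the function name, file names and message text are invented for illustration.

```shell
# Sketch only: printf stands in for the report's find command.
prefix_and_cat() {
    # The command substitution spans two lines, but the << operator follows
    # its closing parenthesis, so the here document data starts after the
    # NEWLINE that ends the "<<EOF" line.
    cat - $( printf '%s\n' "$1"
             printf '%s\n' "$2" ) <<EOF
standard message
EOF
}

# Usage: two small temporary files are printed after the message.
tmpdir=$(mktemp -d)
printf 'first file\n' > "$tmpdir/a"
printf 'second file\n' > "$tmpdir/b"
prefix_and_cat "$tmpdir/a" "$tmpdir/b"
rm -rf "$tmpdir"
```

With the operator moved in front of the command substitution, the result depends on which of the two readings a given shell implements, so that form is deliberately not shown as runnable code.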
For the second issue consider
printf "%s\n" $( cat << EOF )
line 1
line 2
EOF
Is that valid, or not? Again, according to a literal reading of the
specification, it is - the next newline token is the one that appears
after the closing ')' of the command substitution. However, many
shells expect that command (inside the $( ) ) to be complete by itself,
and treat the here document referenced there as being empty (delimited
by the "end of file", which is the end of the command substitution string).
Those shells either simply pass an empty file to cat as its stdin, or
generate a syntax error (which of those is appropriate is one of the
issues of 0001036). Shells that don't abort because of a syntax error and
which act this way then go on to attempt to execute "line" and "EOF"
as commands. Other shells simply keep looking for a "next newline"
outside the command substitution, and pass "line 1" and "line 2" to
cat, which eventually gives those lines to printf.
Which of those is correct?
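A sketch of the placement that keeps the data inside the command substitution, so that the closing parenthesis comes after the delimiter line. The function name is hypothetical, and the double quotes around the substitution are an addition here so that each data line reaches printf as a whole.

```shell
# Hypothetical demo: operator and data both sit inside $( ... ).
inner_heredoc() {
    # The quotes around $( ) are not in the report's example; they stop
    # field splitting from breaking "line 1" into two arguments.
    printf '%s\n' "$( cat << EOF
line 1
line 2
EOF
    )"
}

inner_heredoc
```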
I should also say that for these, it should make no difference
whether $( ) (new style) command substitution, or `...` (old style)
is used; the same issues arise. I will also say that when old
style is used, no shell I know of (until I changed the NetBSD shell
within the past week) parses
cat ` sed 's/-/_/' <<FILENAME ) `
as the author of that script clearly intended it to be parsed
(the actual script where this was detected was a little more
complicated, and had a better reason to be written in this kind
of way, though it could easily have moved the closing ` to
after the line containing FILENAME).
The same issues arise with ( ) sub-shells
cat << FOO | ( while read line
whatever || break
if [ "$line" = something ]
In that, where is the correct spot to put the here document data?
This one doesn't even have the easy answer "just move the << to later"
that exists in the earlier case. However, it could be written
cat << FOO |
(while read line
which I suspect all shells would parse as intended. Much the
same issue arises if the ( ) are not used, as in
cat << FOO | while read line
Here doc data here, or not ??
Or is the here doc data here?
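A sketch of the rewritten form mentioned above, where the line carrying << ends with the pipe and the subshell starts on a later line. The loop body, the data lines and the function name are placeholders chosen for illustration.

```shell
# Hypothetical demo: the pipe ends the operator line, so the here document
# data follows immediately and the rest of the pipeline comes after it.
pipeline_form() {
    cat << FOO |
alpha
something
beta
FOO
    ( while read line
      do
          # Stop at the marker line; "beta" is never printed.
          [ "$line" = something ] && break
          printf 'saw %s\n' "$line"
      done )
}

pipeline_form    # prints: saw alpha
```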
And for the other issue
(cat << FILE1; cat << FILE2) | wc -l
data for file1
FILE1
data for file2
FILE2
I know, an unlikely command, but still... Is this correct, or should
it be written as
(cat << FILE1; cat << FILE2
data for file1
FILE1
data for file2
FILE2
) | wc -l
I have no doubt that the second form is correct, but is the first
correct as well?
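The form with both here documents inside the parentheses can be sketched as a runnable function. The function name is hypothetical, and the exact padding of the wc output varies between systems.

```shell
# Hypothetical demo: two here documents are pending on one line; their data
# is read in operator order, before the closing parenthesis.
count_lines() {
    (cat << FILE1; cat << FILE2
data for file1
FILE1
data for file2
FILE2
    ) | wc -l
}

count_lines    # reports a count of 2
```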
For the second issue, I believe a suitable solution is clear.
Add words like
It is unspecified whether the here document data for
a here document redirection operator is required to
occur in the same subshell environment as the operator.
Applications shall ensure that when a here document
redirection operator occurs in a subshell environment
the data is also placed in that same environment.
Though what that does to pipelines, and similar, I am not sure.
cat << EOF | wc -l
Would that remain valid? If not, how should the wording be fixed
to allow that one to work, while requiring applications to keep
here docs inside () and $( ) and `..` ?
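For reference, a sketch of the pipeline form in question, with the data following the line that contains both the operator and the pipe. The data lines and function name are illustrative only.

```shell
# Hypothetical demo: here document feeding the first command of a pipeline.
count_heredoc_lines() {
    cat << EOF | wc -l
one
two
three
EOF
}

count_heredoc_lines    # reports a count of 3
```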
For the first issue, the solution appears to be less clear.
I suspect that the best that can be done is ...
It is unspecified whether a NEWLINE token that appears
within shell input that is to be executed within a
sub-shell environment, where the redirection operator
occurs outside that sub-shell, is the "next NEWLINE token"
which starts collection of data for the here document.
Applications shall be coded so as to avoid this.
But that is ugly...
|Tags||No tags attached.|
|Note 0003141||joerg||2016-04-08 13:33|
Some of your questions are easy to answer once you understand that a
command substitution with $(..) or `..` always is a word or part of a word.
Given that the shell needs to first collect all characters that form the
word, it is obvious that "the next NEWLINE" must be seen locally first,
in case of a here document that appears to be inside a command substitution.
|Note 0003144||kre||2016-04-09 11:42|
Sorry, I have no idea what "must be seen locally first" means.
The point here is that shells interpret these things in different
ways. Perhaps there is something in the spec which makes it clear
which is correct, but personally, I cannot see it.
Perhaps it is obvious which should be correct - and maybe this is
a case where what should be correct (rather than what is actually
implemented) might be specified (since it is rather an outlier in
the syntax) but if that is the case, I cannot come to a conclusion
about what should be correct, and what should not. I know what the
NetBSD shell does in these cases, and I have done some testing of
other shells, but none of that has blessed me with magic enlightenment.
ps: I do understand that command substitution is part of a word, but
I cannot fathom how that helps - the actual here document, and the
here document operator that creates it, are separated lexically in the
input. What matters is just how that is to be resolved in some kind
of consistent manner that is more or less in accordance with what works
|Note 0003145||jilles||2016-04-09 15:33|
That a command substitution is always part of a WORD or similar token implies that any newlines part of the command substitution are not NEWLINE tokens on that level and do not start here-document contents. For example:
cat - <<EOF $(find .
is a valid command.
A different situation is where the << redirection is within the command substitution and the here-document contents are outside of it. Historically, ash variants have used their implementation technique that fully parses command substitutions when encountered to allow things like:
in addition to the standard
The ash-specific form violates the statement in XCU 2.6.3 Command Substitution that "all characters following the open parenthesis to the matching closing parenthesis constitute the command", since the here-document contents are outside the parentheses.
More practically, the ash-specific form is hard to parse for implementations that only parse command substitutions to the minimal level necessary to find their end while parsing the outer command and only fully parse them just before execution. I think both implementation techniques (ash-style immediate full parse and bash/ksh93-style minimal immediate parse) should be valid. Changing from the latter to the former technique is likely to break existing scripts that contain invalid command substitutions that are not executed.
The same special form with `...` command substitution:
seems to have no historical basis.
|Note 0003147||kre||2016-04-10 02:27|
Perhaps I erred by concentrating so much on command substitution in
the original filing of this issue, it is just that that is where it
first really came to my attention, so ...
But this, from note 3145
That a command substitution is always part of a WORD or similar token
implies that any newlines part of the command substitution are not NEWLINE
tokens on that level
gets right to the crux of the issue, and which led to the title of this
bug report "which newline ..."
That is, from where, in the standard, do you get the qualification
"on that level"? I do not see that anywhere.
If we take that same example, and re-cast it slightly to:
cat - <<EOF ; if find .
then (ignoring the command args, and whether this is a sane way to write
the command) is that a legal command sequence, or not (this time using "if"
and "fi" as the bracketing operators rather than $( ) ).
If this is correct, upon what basis is the newline after "." being ignored?
What if we made it a simple subshell instead ..
cat - <<EOF ; ( find .
Is that one correct? And if so, the same question. There is no "one word"
or even "same level" argument to use here.
And if those forms are not valid, how exactly do you explain to script
writers how those (particularly the sub-shell version) are different from
the command substitution example in a way they can comprehend?
And while doing this also explain how
(cat << EOF) | cmd
works in a consistent way (which I am assuming we agree is how it should work)
Or is it required to be written
(cat << EOF
) | cmd
? And if that is required, where is that written? The spec just says that
here doc data comes after the next newline (token) - and we are back to the
topic of the bug report - "which newline (token)" ?
Historically, ash variants have used their implementation technique that
fully parses command substitutions when encountered to allow things like:
in addition to the standard
I have no problem with considering the second of those "standard", but
I am by no means convinced that the first is not just as standard. I
see nothing written currently that makes it so - maybe the ash technique is
how those things should be parsed? Or maybe the doc is just deficient
and needs fixing?
Note: I have no particular axe to grind here, I am not advocating one result
over another (which the wording I proposed adding, as poor and sloppy as it
was, should, I hope, make clear.) What I would like to see happen is for
some resolution to be reached so that this same discussion doesn't have to
happen again sometime in the future, when perhaps there is actually something
important riding on the outcome.
Lastly, I agree that the form:
seems to have never been implemented (previously) anywhere. However I saw it
used in an actual script (one I did not write - rather, one I got bug reports
about when I made NetBSD's sh start to object to, rather than simply ignore,
missing here document data - previously the script had been parsed without
error, after my earlier change, it no longer was, and that was brought to my
attention as a problem caused by my first change.)
Now the script in question had other errors, it could never have actually
worked as intended, so it is not really a good example to use, but when I
thought about it, I could find nothing in the standard to forbid this (that
the command actually embedded in the `...` did not do what its author intended
was not material), if anything the "next newline (token)" wording seems to
explicitly allow it.
It turned out to be easy to "fix" (and looked to be something of an oversight
caused by the way that `...` type command substitutions are parsed, that it
had not worked all along) so I did. That handled the "bug report" ... the
script in question still doesn't work, but that doesn't matter, it has no
syntax error any more, so it parses "correctly" (even if differently than
before) and the actual command sequence is, in practice, never used anyway.
So, everyone was happy...
|2016-04-07 13:16||kre||New Issue|
|2016-04-07 13:16||kre||Name||=> Robert Elz|
|2016-04-07 13:16||kre||Section||=> 2.7.4|
|2016-04-07 13:16||kre||Page Number||=> 2335-2336|
|2016-04-07 13:16||kre||Line Number||=> 74235-74256|
|2016-04-08 13:33||joerg||Note Added: 0003141|
|2016-04-09 11:42||kre||Note Added: 0003144|
|2016-04-09 15:33||jilles||Note Added: 0003145|
|2016-04-10 02:27||kre||Note Added: 0003147|
|2017-03-23 16:11||nick||Relationship added||related to 0001036|
|2017-03-23 16:11||nick||Relationship added||related to 0001037|