ID | Category | Severity | Type | Date Submitted | Last Update
0001043 | [1003.1(2013)/Issue7+TC1] Shell and Utilities | Objection | Omission | 2016-04-07 13:16 | 2017-03-23 16:11
Reporter | kre | View Status | public
Assigned To |
Priority | normal | Resolution | Open
Status | New
Name | Robert Elz
Organization |
User Reference |
Section | 2.7.4
Page Number | 2335-2336
Line Number | 74235-74256
Interp Status | ---
Final Accepted Text |
Summary | 0001043: Which newline starts collection of here document data?
Description |
The spec for a here doc says that the here doc will begin after the next newline.

First, let's assume that really means after the next NEWLINE token, as is written elsewhere, that is, in

    sed << FILE_END '
    s/$/: EOL/
    22i\
    --------------
    /foo/s/bar/& bletch/
    '

none of the newlines in the quoted string is intended to be the "next newline" in question. Since none of those is a NEWLINE token, making that simple change avoids problems there. To the best of my belief, there is no shell that doesn't do it this way already.

That still leaves two unresolved issues, probably highly related to each other, but seemingly different in a sense, and both relate to subshell environments used in connection with here docs. One easy way to see this is to use command substitution to make the subshell environment, so let's concentrate on that first.

The first issue is: Does a NEWLINE token in a command substitution that starts on the same line (hence no earlier NEWLINE token) as a here doc redirection operator count as the NEWLINE token (meaning the here doc would appear in the middle of the command substitution, even though it is not used with it in any way), or is the search for a NEWLINE token interrupted while processing the text of the command substitution, meaning that the here document starts not at the "next" NEWLINE token, but at the NEWLINE token that next appears at the "same parser level" (which I am sure is not the correct way to say what I mean)?

And second, if a here document redirection operator appears within a command substitution, does the here document also have to appear within the same command substitution, even in cases where otherwise the command substitution would contain no NEWLINE token at all?

Examples to illustrate:

First, given

    cat $(
        find . -name text-file* -mtime +3 -ctime -1
    )

is intended to cat a bunch of files found by find (for the purpose of this example, let's ignore the filename issues raised by doing find this way, that isn't relevant to the point.)
If we then assume that we also want to prefix the output with a standard message, we might want to include that in what cat reads and prints. One way would be

    printf "%s\n" message > /tmp/file

and then

    cat /tmp/file $(
        find ... ...
    )

but this is a perfect use of here documents, so ...

    cat - $(
        find ... ...
    ) <<EOF
    message
    EOF

will clearly work. But that separates the << from the "-" that uses it, so we may prefer to write

    cat - <<EOF $(
        find ...

and at this point we need to answer the question, is what follows that

    message
    EOF
        ...
    )

or is what follows

        ...
    )
    message
    EOF

This one seems to have been implemented both ways by different shells. A literal reading of the current specification would suggest that the first way is correct - the "next NEWLINE token" (or even just the next newline - in this example they are the same) is the one in the middle of the command substitution, so the here document should start there. But most people are likely to find that form difficult to comprehend, and probably even more difficult to write correctly.

For the second issue consider

    printf "%s\n" $( cat << EOF )
    line 1
    line 2
    EOF

Is that valid, or not? Again, according to a literal reading of the specification, it is - the next NEWLINE token is the one that appears after the closing ')' of the command substitution. However, many shells expect that command (inside the $( ) ) to be complete by itself, and treat the here document referenced there as being empty (delimited by the "end of file" which is the end of the command substitution string), and either simply pass an empty file to cat as its stdin, or generate a syntax error - which of those is appropriate is one of the issues of 0001036. Shells that don't abort because of a syntax error and which act this way then go on to attempt to execute "line" and "EOF" as commands.
Other shells simply keep looking for a "next newline" outside the command substitution, and pass "line 1" and "line 2" to cat, which eventually gives those lines to printf. Which of those is correct?

I should also say that for these, it should make no difference whether $( ) (new style) command substitution, or `...` (old style) is used, the same issues arise. I will also say that when old style is used, no shell I know of (until I changed the NetBSD shell within the past week) parses

    cat ` sed 's/-/_/' <<FILENAME `
    file-name
    FILENAME

as the author of that script clearly intended it to be parsed (the actual script where this was detected was a little more complicated, and had a better reason to be written in this kind of way - though it could easily have moved the closing ` to after the line containing FILENAME).

The same issues arise with ( ) sub-shells

    cat << FOO | (
        while read line
        do
            whatever || break
        done
        if [ "$line" = something ]
        then
            something -else
        fi
    )

In that, where is the correct spot to put the here document data? This one doesn't even have the easy answer "just move the << to later" that exists in the earlier case. However, it could be written

    cat << FOO |
    data
    data
    data
    FOO
    (while read line
    do
        ...
    )

which I suspect all shells would parse as intended.

Much the same issue arises if the ( ) are not used, as in

    cat << FOO | while read line
    Here doc data here, or not ??
    do
        ...
    done
    Or is the here doc data here?

And for the other issue

    (cat << FILE1; cat << FILE2) | wc -l
    data for file1
    FILE1
    data for file2
    FILE2

I know, an unlikely command, but still... Is this correct, or should it be written as

    (cat << FILE1; cat << FILE2
    data for file1
    FILE1
    data for file2
    FILE2
    ) | wc -l

I have no doubt that the second form is correct, but is the first correct as well?
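The two questions above can be checked against any particular shell with a small probe. What follows is only a sketch under stated assumptions: the `SH` variable (the shell under test), the `probe` function, and the use of `echo /dev/null` in place of the find command are all hypothetical and not part of the original report. The control case keeps the `<<` after the command substitution, so every shell should agree on it; the other two cases are exactly the ambiguous forms, and no particular result is assumed for them.

```shell
#!/bin/sh
# Sketch only: SH (the shell under test) and probe() are hypothetical names.
SH=${SH:-sh}

# probe LABEL SCRIPT: run SCRIPT in a fresh "$SH" and print its combined
# output and exit status. stdin is /dev/null so a stray cat cannot block.
probe() {
    if out=$("$SH" -c "$2" </dev/null 2>&1); then
        status=0
    else
        status=$?
    fi
    printf '== %s (exit %s)\n%s\n' "$1" "$status" "$out"
}

# Control: the << follows the command substitution, so the next NEWLINE
# token is unambiguous.
control='cat - $(
echo /dev/null
) <<EOF
message
EOF
'

# First issue: the << precedes a command substitution that contains
# NEWLINE tokens; the data is placed after the closing parenthesis.
first_issue='cat - <<EOF $(
echo /dev/null
)
message
EOF
'

# Second issue: the << is inside the command substitution, the data is
# outside it.
second_issue='printf "%s\n" $( cat << EOF )
line 1
line 2
EOF
'

probe control "$control"
probe "first issue" "$first_issue"
probe "second issue" "$second_issue"
```

Running it as, for example, `SH=dash sh probe.sh` and then with another shell gives a quick side-by-side view; a syntax error or a "command not found" for "line" in the output shows which reading that shell takes.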
Desired Action |
For the second issue, I believe a suitable solution is clear. Add words like

    It is unspecified whether the here document data for a here document redirection operator is required to occur in the same subshell environment as the operator. Applications shall ensure that when a here document redirection operator occurs in a subshell environment the data is also placed in that same environment.

Though what that does to pipelines, and similar, I am not sure. That is

    cat << EOF | wc -l
    data
    EOF

Would that remain valid? If not, how should the wording be fixed to allow that one to work, while requiring applications to keep here docs inside () and $( ) and `..` ?

For the first issue, the solution appears to be less clear. I suspect that the best that can be done is ...

    It is unspecified whether a NEWLINE token that appears within shell input that is to be executed within a sub-shell environment, where the redirection operator occurs outside that sub-shell, is the "next NEWLINE token" which starts collection of data for the here document. Applications shall be coded so as to avoid this.

But that is ugly...
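As an illustration of what the proposed wording would ask of applications, here is a minimal sketch in which the here document data always sits in the same subshell environment as its redirection operator. The function names are hypothetical and exist only to keep the three cases apart; the third function is the pipeline form that the report asks to keep valid.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical labels for the cases.

# Data inside the same command substitution as the << operator.
inside_cmdsub() {
    printf '%s\n' $( cat << EOF
line 1
line 2
EOF
    )
}

# Data inside the same ( ) subshell as both << operators.
inside_subshell() {
    (cat << FILE1; cat << FILE2
data for file1
FILE1
data for file2
FILE2
    ) | wc -l
}

# Pipeline form: the data follows the line holding the whole pipeline.
pipeline_form() {
    cat << EOF | wc -l
data
EOF
}

inside_cmdsub
inside_subshell
pipeline_form
```

The first function relies on field splitting of the unquoted command substitution, so printf receives four arguments and prints each on its own line.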
Tags | No tags attached.
Attached Files |