ID: 0001036
Category: [1003.1(2013)/Issue7+TC1] Shell and Utilities
Severity: Objection
Type: Error
Date Submitted: 2016-03-22 03:18
Last Update: 2017-06-10 00:58
Reporter: kre
View Status: public
Assigned To:
Priority: normal
Resolution: Open
Status: New
Name: Robert Elz
Organization:
User Reference:
Section: 2.7.4
Page Number: 2335-2336
Line Number: 74235-74256
Interp Status: ---
Final Accepted Text:
Summary: 0001036: Errors/Omissions in specification of here document redirection
Description

Aside from the question of just which newline is the "next" newline, which has been canvassed elsewhere (without any resolution that I can see), there are several problems with the specification of here documents.

First, given that the here doc is processed after encountering a newline (which newline is the other issue), it must largely be processed as a side effect of lexical processing, as newlines (other than those that happen to be literal) no longer exist in the scanned form of the shell input: they have served as token delimiters, and are not otherwise relevant. This would suggest that the here document is processed during lexical analysis, and nothing in the specification contradicts that. The spec does say that, given an unquoted delimiter word, the text is subject to various expansions. It does not say that those expansions should not be performed while reading the here doc text; however, I believe that is (or should be) the intent. That is, if the here doc is never used because it is attached to a command that is never executed (on the "wrong" side of an && or similar), then the expansions in the here doc should not be performed. It could be that I am missing something, but I cannot see any text that says that the expansions in the here doc should be evaluated in the context of the command that is about to use the data, immediately before it is used (in the appropriate sequence of all applicable redirect operations).

Second, the text says

    If any character in word is quoted, the delimiter shall be...

and in the following paragraph

    If no characters in word are quoted, all lines of the ...

but I do not believe that is what is intended, and it is not what is actually implemented in any shell I can find. Consider

    cat << ""EOF
    lines of text
    EOF

The delimiter there is the string EOF, in which none of the characters were quoted. True, it was preceded by a quoted null string, but that contains no quoted characters. Hence no characters of the delimiter word were quoted, and according to the spec, "lines of text" should be subject to the various expansions. No-one implements it that way. What matters is not whether any characters in the word are quoted, but whether any quote characters were encountered while scanning word.

Third, in cases where expansions are done, nothing makes it explicit that the end delimiter cannot be found as the result of an expansion. That is, in the following, there is one here document that happens to contain the string "echo foo <<EOF", and not two here documents

    end=EOF
    cat <<EOF
    lines of text
    $end
    echo foo <<EOF
    another line
    EOF

Of course, if the first question above is resolved to make it clear that expansions do not happen when the document is being read, this would be a moot point, as $end expanding to EOF would not be known while the here doc is being read, which I believe is the correct interpretation.

Fourth, I am totally confused by the relationship between double quoting and backtick command expansions. Section 2.2.3 appears to say that if backticks appear inside double quotes, then the double-quote interpretation continues through the command expansion (if that were not true, it would not be possible for a double quoted string to start before a ` command substitution and end inside it, as a " inside the `...` would be the start of a new string, not the end of the previous one, the same as it is in $( ) command substitutions). The relevance of this to here documents is illustrated by the following

    echo "` cat << EOF
    X = $(( 1 + 2 ))
    EOF
    `"

If things are as I have postulated, then the EOF is quoted (by the double quotes that surround the command substitution) and hence the here document should not be expanded, and echo should (eventually) output

    X = $(( 1 + 2 ))

and not

    X = 3

but again, I do not believe this is in accordance with what any shell does. This again may be an artifact of the 2nd point above, and if the text is changed so that only quote characters encountered while scanning the delimiter word cause the expansion to be suppressed, and not whether "characters are quoted", then this issue will go away.

Fifth, and more minor I think, when the delimiter is not quoted, the text states that backslashes work the way they do in double quoted strings, and references section 2.2.3 for the details. There we are informed that inside double quotes, \ is only special (only a quote character) when the following character is one of \ " ` $ and newline (so for example "\n" is a two character string). But then (back to 2.7.4) the text goes on to say that inside the here document " is not special. The problem is that it is not clear whether \ continues to act as a quote character when followed by this non-special " or not (i.e. is \" in a here document, with an unquoted delimiter word, one character or two?). I believe two is correct.

Sixth, and perhaps most important of all, there is no discussion of what is expected to happen when the input string ends before the here document delimiter is encountered. This is the most important because, unlike the previous issues, where I believe all shells (all I could find) actually agree on what should be done and the text just needs to be more clear, for this one there is a difference of opinion. Some shells treat end of file as equivalent to the delimiter, and go ahead and execute whatever command the here document was attached to with as much input as they managed to gather (one issues a warning when it does this, but does it anyway; most that adopt this behaviour do it silently). Other shells consider this to be a redirect error, suppress execution of the command, and set $? to indicate failure. Personally I believe that the latter is the best approach, as it avoids situations where the shell eats the entire rest of the script as the here document because of some error or other (the one that happens to me from time to time is that I cut & paste a script, or script segment, and the tabs that had been present get converted to spaces, and then the <<-EOF does not stop on "space space..EOF" where it would have with "tab EOF").
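
The second and third points are easy to try in whatever shell is at hand. The following is only an illustrative sketch: the variable names (x, end, unquoted, null_quoted, one_doc) are invented for the example, and the outcomes described in the comments are what the report says the shells the reporter tried do, not something the current text of 2.7.4 guarantees.

```shell
x=value
end=EOF

# Unquoted delimiter: the body is subject to expansion.
unquoted=$(cat <<EOF
$x
EOF
)

# Delimiter preceded by a quoted null string: no character of "EOF" is
# quoted, yet the shells described in the report leave the body unexpanded.
null_quoted=$(cat << ""EOF
$x
EOF
)

# $end expands to EOF only after the document has been read, so the line
# it produces does not terminate the here document.
one_doc=$(cat <<EOF
lines of text
$end
another line
EOF
)

printf '%s\n' "$unquoted" "$null_quoted" "$one_doc"
```

A shell that follows the literal "any character in word is quoted" wording would instead expand $x in the second document.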
Desired Action

Change the words "If any character in word is quoted" to "If any quoting character is encountered while scanning word", and "If no characters in word are quoted" to "If no quoting characters are encountered while scanning word".

At the end of the paragraph that currently starts "If no characters in word are quoted", add an extra sentence along the lines of

    The expansions listed are performed in the context of the command about to be executed to which the here document contents are to be input, and at the appropriate time in the sequence of all redirect operators applying to that command - if that command is never executed the here document shall not be expanded.

Where it talks about <backslash> quoting in here documents with unquoted delimiter words, add some text to make it clear that even though a \ in "" quotes a ", a \ in a here document does not, and the sequence \" is 2 chars.

Finally, add (somewhere) words to the effect

    If the terminating line of the here document is not located before the shell exhausts its input, the behaviour is undefined, implementers are encouraged to treat this as a redirect error, but applications should not rely upon this.
Tags | No tags attached. | |||||||||||
(0003097) kre (reporter) 2016-03-22 06:19 |
When I look more closely, I think you can forget the fifth issue in the list (and the third of the desired actions). I missed the phrase "when considered special" in 2.2.3 (the description of \ inside "...").

But I think you can replace that one with a request that the "<newline>" in "begins after the next <newline>" (2nd paragraph in 2.7.4) should be changed to "NEWLINE token" (as is done in the description of token recognition in section 2.3, 2nd paragraph).

The difference is that in

    nl='
    '

there is a <newline> but no NEWLINE token. I think it is clear that in a sequence like

    cat <<EOF; nl='
    '
    line 1
    line 2
    EOF

the here document starts at "line 1", not at the line containing just the ' character, even though that is the line after the next <newline>.

Similarly, there is a <newline> in the \ sequence (\ followed immediately by newline) that is also not a NEWLINE token, and a here document would not start after that either.

This still leaves the question of NEWLINE tokens embedded in command substitutions and similar, where the << operator was outside.
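
The <newline> versus NEWLINE token distinction can be checked with a small script. This is a sketch only: the temporary file (from mktemp) and the variable names tmp and doc are additions for the illustration, and the comment states the behaviour the note argues for rather than anything the current wording spells out.

```shell
tmp=$(mktemp)

# The <newline> inside the quoted assignment is part of a word, not a
# NEWLINE token, so the here document is expected to start at "line 1".
cat <<EOF >"$tmp"; nl='
'
line 1
line 2
EOF

doc=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$doc"
```

If a shell instead started the document after the first literal <newline>, the assignment to nl would be swallowed into the here document.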
(0003098) joerg (reporter) 2016-03-22 19:52 edited on: 2016-03-23 11:15 |
I have not yet checked the other cases, but you are mistaken with respect to

    cat << ""EOF
    $$
    EOF

as it expands to the process id in the Bourne Shell, and even ksh88 and ksh93 document exactly this behavior. From the ksh93 documentation:

    If any character of word is quoted, then no interpretation is placed upon the characters of the document. Otherwise, parameter expansion, command substitution, and arithmetic substitution occur,

so it seems that you discovered a ksh bug. Check the original Bourne Shell at http://schilytools.sourceforge.net/bosh.html to verify that your example causes parameter substitution.

Note that the documentation for the Bourne Shell, ksh88, ksh93, bash, mksh and zsh clearly mentions that no expansion occurs when any of the characters from word is quoted. The dash man page is not written clearly and thus does not help.

Looking at the ksh88 source, it is obvious that the deviating behavior of ksh is an unintended side-effect of the rewritten field splitting code that is used to strip off the quoting from "word".
(0003099) geoffclare (manager) 2016-03-23 09:29 |
We are already fixing the "any character in word" problem: TC2 changes it to "any part of word". See 0000583 |
(0003100) joerg (reporter) 2016-03-23 10:57 |
Geoff, it seems that this was a mistake as from what I can say, the ksh behavior was changed unintentionally and the new text is in conflict with both the documentation and the behavior of the Bourne Shell. |
(0003101) geoffclare (manager) 2016-03-23 11:30 |
No it is not a mistake. That is how ksh88 behaves, and the POSIX shell was based on ksh88 not Bourne. |
(0003103) kre (reporter) 2016-03-24 00:04 |
Thanks for the pointer to issue 583 - I actually did a search (I looked for references to 2.7.4) and the search did not produce that one... But that resolution is fine for that point, though you might want to amend the language just a little more to handle the case where the whole redirection (including the delimiter word) is in a quoted environment (making it clear that quotes need to be explicit in "word" itself to count as quoting existing for this purpose). The suggestion (in 583) that "if quote removal changes the word, it was quoted" seems about right to me, kre |
(0003104) kre (reporter) 2016-03-24 00:18 |
To make the first point in the issue more clear (or one aspect of it anyway), consider the following

    unset X
    cat <<EOF
    ${X=2}
    EOF
    echo "${X-1}"

No question but that the output from cat is "2" (a line containing 2), but what value does the echo line print, 1 or 2?

That all depends upon the context in which the here doc is evaluated. If it is in the context of the shell running the script, then the answer is 2. On the other hand, if it is in the context being established to run cat, then 1 would be the answer.

My testing shows shells (those I have to test) about equally divided on this issue, but I would have expected that 1 makes most sense, given that the here document is processed at the correct point in the sequence of redirections.

kre
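
A variant of the same test that records both results may be handy when comparing shells. It is a sketch under stated assumptions: the temporary file and the names tmp, doc and after are introduced here for illustration, and no particular value of after is claimed, since the note reports that shells are divided.

```shell
tmp=$(mktemp)
unset X

# The here document assigns X as a side effect of its expansion.
cat <<EOF >"$tmp"
${X=2}
EOF

doc=$(cat "$tmp")
rm -f "$tmp"

# 2 if the expansion ran in the context of the calling shell,
# 1 if it ran in the context set up for cat.
after=${X-1}

echo "here document: $doc"
echo "X afterwards: $after"
```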
(0003105) kre (reporter) 2016-03-24 00:28 |
For the sixth point, consider this example

    cat > File1 <<EOF 2>File2
    lines of text
    but no line containing "EOF"
    and the script ends right here.

There are several possibilities here.

One is that the here doc with no end delimiter is a lexical or parser level syntax error, and nothing else is done with the command at all (a non-interactive shell would exit). That is the one I prefer...

Another is that the faulty here document is discovered during redirect processing, after File1 is created and before File2 is created, and things stop in that state, again with a syntax error, but later in the processing.

Third is that it isn't an error at all: cat is run, the data present is sent to File1, and File2 is created, but empty, as there are no errors. Exit status would be 0. This solution (though seemingly quite common) I do not like at all, as this almost always indicates some kind of user error. Giving either half the intended data or, just as likely, more than intended when the end delimiter is entered in an incorrect way, is just as bad as being unable to open a file in a normal '<' redirection and just going ahead and substituting some other file (/dev/null maybe) instead. Not sane.

kre
(0003124) chet_ramey (reporter) 2016-04-04 19:52 |
I'm interested in what the group would like to do about the EOF-as-here-document-delimiter issue. As kre says, just about every shell I looked at allows EOF to delimit a here document (mksh is the notable exception). It's clearly existing practice, but the standard is silent. Does this render all these shells non-conformant?

The other interesting case is whether or not a shell allows an instance of the delimiter immediately followed by the end of a command substitution (`)' or ``') to delimit a here-document. For example, what should the following output?

    x=$(cat <<EOF
    a b
    EOF)
    echo "$x"
    echo after

There are varying behaviors.

Shells allowing delimiter+right paren to delimit here document in $(...): ksh93, bash, mksh, zsh, posh
Shells that do not: dash, BSD(s) sh

Shells allowing delimiter+backquote to delimit here document in `...`: ksh93, SVR4.2 sh, bash, mksh, zsh, posh, dash, BSD(s) sh

Even in this there is varying behavior: dash uses EOF (in the form of the end of the `...` command substitution) as the delimiter and includes the delimiter word as part of the here document.

Is it worthwhile to add text saying the behavior is unspecified if the shell encounters end-of-file before finding the here-document delimiter? What about the command substitution case?
(0003125) jilles (reporter) 2016-04-04 21:48 |
Given that

    $(case x in x) : ;; esac)

is a single valid command substitution, I would expect the following two to be as well:

    $(cat <<EOF &&
    EOF)
    EOF
    :)

    $(if :; then cat <<EOF
    EOF)
    EOF
    fi)

If that is accepted but:

    $(cat <<EOF
    x
    EOF)

is also to be a single valid command substitution, detecting whether the end marker with closing parenthesis is an end marker is rather complicated.

There is no such issue with:

    `cat <<EOF
    x
    EOF`
(0003126) joerg (reporter) 2016-04-05 12:20 |
    `cat <<EOF
    x
    EOF`

is parsed in a different way than

    $(cat <<EOF
    x
    EOF)

The first one is parsed on a lexical basis, i.e. the next unescaped "`" is searched for before any part of the shell tries to understand the here document.

In the second form, there is a need to recursively call the parser in order to understand where the end of the $() command is. This is caused by the fact that the number of opening and closing parentheses in a command is not always equal. For this reason, with $(), the here document is already read in during the lexical scan, because the lexical scan calls a recursive parser.

Note that this is a POSIXly correct command:

    echo $(if cat <<EOF
    1 2 3
    EOF)
    EOF
    then echo a
    fi)

but it is not accepted by ksh93 because ksh93 implements a funny recognition of "EOF)".
(0003127) kre (reporter) 2016-04-05 13:58 |
Re note 3124:

    Is it worthwhile to add text saying the behavior is unspecified if the shell encounters end-of-file before finding the here-document delimiter?

I would say yes, along with an admonition on applications to always supply the end delimiter.

The NetBSD shell is another which (now) complains about here docs without an end delim (treats it as a syntax error), which I think is far and away the best thing for shells to do, and I would encourage all of you who are implementers to do that.

Since we made that change, we have (as far as I know) encountered exactly 1 script which was working only because of the "eof terminates a here doc" behaviour, and that one was almost certainly an accident (it was one of many scripts, several others of which also had here docs that ran to the end of the script, and only that one was missing the end delimiter). So, I would not be too worried that you will be breaking large numbers of scripts that are relying on EOF delimiting here docs; the users don't seem to know about this, or find it simple enough to add the string just before EOF...

On the other hand, accidentally getting the here doc end delimiter incorrect is easy to do (say a space amongst the leading tabs that are to be stripped, or a simple typo). Having that silently cause the rest of the script to be treated as here doc content, and simply carrying on working, is not friendly.

On:

    What about the command substitution case?

That one should simply be regarded as incorrect. The spec is already clear on what makes an end delimiter, and a ) following the string is not it; the newline before the ) is required. (And yes, I agree, the `` case is quite different, in all ways, even though it seems initially to be just a different, more difficult to nest, equivalent.)
(0003134) kre (reporter) 2016-04-07 00:46 |
Since it seems agreed that there is no consistency on what happens if a here doc is not terminated, and that applications neither need to, nor ever should, rely upon any particular behaviour there (and in practice, do not seem to), I suggest that the following wording be added to the end of the normative text in section 2.7.4 (just before the informative example). [Sorry, I do not have page or line numbers - someone else will need to add those.]

    The effect of failing to detect the here-document delimiter before the shell exhausts its input stream is unspecified. Applications shall ensure the delimiter is present.

And perhaps in a rationale section somewhere...

    Traditional shell behaviour has been to treat "end of file" as being equivalent to the delimiter of a here document, terminating the here document, usually without any indication, and continuing as if the delimiter had been recognised. This can cause problems where the delimiter had been intended to occur much earlier in the script, but was incorrectly entered - a mistake which for many other errors would have resulted in a syntax error, and an aborted script, instead simply generates incorrect results. Because of this some shell implementations have changed to reporting an undelimited here document as a syntax error. Other implementations are encouraged to do the same.

or maybe something less wordy with similar effect...

The other issues still need resolution.
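
To see which of the behaviours a given shell has, a probe along the following lines could be used. It is a sketch only: it assumes the shell to be examined is the one found as sh on PATH, the names script, out and status are invented for the example, and no expected result is given because, as discussed above, shells differ.

```shell
# A script whose here document never sees its delimiter.
script='cat <<EOF
lines of text'

# Run it in a separate shell so that a syntax error cannot abort this one.
out=$(sh -c "$script" 2>/dev/null)
status=$?

# A shell treating end of input as the delimiter reports status 0 and the
# text; one treating it as an error reports non-zero status and no output.
printf 'status=%s output=%s\n' "$status" "$out"
```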
(0003135) kre (reporter) 2016-04-07 01:05 |
Another of my original issues, on which I believe (hope) there is no dissent (as in, I believe all shells act this way, it is just not, yet, written in the standard): also add to the normative text of section 2.7.4, in the paragraph that begins

    If no characters in word are quoted, ...

the following sentence after the initial sentence of the paragraph (again, sorry, page/line numbers not available to me):

    This expansion happens after the here document delimiter has been recognised and the here document extracted from the input stream, and thus the end delimiter for the here document cannot be generated as a by-product of the expansion.

And to make things even clearer (I believe this is how shells behave, but am less certain that this is universal), at the end of that same paragraph add

    When an unquoted backslash is followed by a newline, line joining occurs, and the backslash newline combination is removed. This occurs while the here document is being scanned, the end delimiter will not be recognised immediately after a newline that has been deleted in this way.
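
The line joining described in the proposed wording can be illustrated as follows. This is a minimal sketch; the variable names joined and kept are made up for the example, and it deliberately avoids the case of a delimiter directly after a removed newline.

```shell
# Unquoted delimiter: backslash-newline is removed and the lines are joined.
joined=$(cat <<EOF
one \
two
EOF
)

# Quoted delimiter: the backslash and the newline are both kept.
kept=$(cat <<'EOF'
one \
two
EOF
)

printf '%s\n' "$joined" "$kept"
```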
(0003572) stephane (reporter) 2017-02-25 14:12 |
About your sixth and last point, I'm not sure I see a problem. It's currently unspecified, so applications have to make sure their delimiter is provided, and implementations can do what they want when it's not: issue a warning or error message, or ignore the problem, all of which are valid approaches to me.

There's also the case of

    eval "cat << EOF"
    xxx
    EOF

(and the same with the "." command) that may need to be covered.
(0003592) kre (reporter) 2017-03-03 15:57 |
Re note 3572 ... I agree that the behaviour is unspecified in the literal sense (in that the spec says nothing about it at all), I do not agree that is adequate however - if unspecified behaviour is what is expected in this case (and I'd certainly accept that as an outcome for that point), it ought to be explicitly unspecified, not just literally. kre |
(0003690) kre (reporter) 2017-05-11 13:46 |
When this issue reaches the head of the queue, it might be worthwhile spending a minute or two on the subject of \newline continuation lines in here docs, and their effect on tab suppression, and end-string recognition. For the first of those, given cat <<-EOF \ X EOF (where the white space is supposed to represent a tab character, but is spaces here, as I cannot seem to input a tab in the form...) what exactly is expected to be written to stdout? That is, is the tab before X a leading tab, or not? For the second, the following script (with the same caveat about spaces and tabs...) EOF() { printf 'EOF executed as a command\n'; } cat <<-EOF \ EOF EOF executes differently in different shells, in some it simply says "EOF" (where the second EOF is the end-string) and in others it says "EOF executed as a command" where the first EOF is the end string. (Here similar things happen without the tab stripping if the \ line contains only a \ character). |
(0003691) joerg (reporter) 2017-05-11 14:01 edited on: 2017-05-11 14:05 |
Re: Note: 0003690

With your first example, all shells except ksh93 seem to print just an X.

With your second example, the historic Bourne Shell, ksh88, bash, bosh and mksh print "EOF executed as a command", while ksh93, dash, yash and zsh are non-compliant.

You discovered an interesting aspect.
(0003756) kre (reporter) 2017-06-10 00:58 |
There is another case worth considering

    cat <<'EOF
    REALLY'
    Hello
    EOF
    REALLY

Is this supposed to work, or not? I see nothing in the text that prohibits a \n as one of the characters of the end delimiter (obviously, like spaces and other operator type characters, it can only occur in a quoted delimiter).

My tests show that yash simply forbids it. Nothing else I tested does, though when the here doc delimiter contains a \n, none of bash, zsh, mksh, or bosh seem to recognise anything as the delimiter.

The ash derived shells (dash, FreeBSD, NetBSD) and ksh93 all just say "Hello" when given the command above. The two line end delimiter is handled just fine. I didn't test more than two lines, but I am fairly confident that at least the FreeBSD and NetBSD shells would handle as many embedded \n's as you want to give, and probably the others as well.