View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0001190 | 1003.1(2016/18)/Issue7+TC2 | Base Definitions and Headers | public | 2018-04-13 11:16 | 2024-06-11 09:08 |
Reporter | geoffclare | Assigned To | |||
Priority | normal | Severity | Comment | Type | Clarification Requested |
Status | Closed | Resolution | Accepted As Marked | ||
Name | Geoff Clare | ||||
Organization | The Open Group | ||||
User Reference | |||||
Section | 9.3.5 | ||||
Page Number | 184 | ||||
Line Number | 6089 | ||||
Interp Status | Approved | ||||
Final Accepted Text | 0001190:0004277 | ||||
Summary | 0001190: backslash has two special meanings in the shell and only loses one of them in bracket expressions | ||||
Description | XBD 9.3.5 item 1 says: The special characters '.', '*', '[', and '\\' (<period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression. In the case of <backslash>, in the shell the character has two different special meanings and this text does not make clear that it is only referring to the pattern-matching special meaning of <backslash> and does not affect its shell-quoting special meaning. | ||||
Desired Action | On page 184 line 6089 section 9.3.5 RE Bracket Expression, after: ... shall lose their special meaning within a bracket expression. add a small-font note: Note: In the context of shell pattern matching, although <backslash> ('\\') loses its special meaning as a pattern matching character in bracket expressions, in situations where shell quoting is performed it is still a shell escape character as described in [xref to XCU 2.2 Quoting]. For example: $ ls ! $ - a b c $ echo [a\-c] - a c $ echo [\!a] ! a $ echo ["!\$a-c"] ! $ - a c $ echo [!"\$a-c"] ! b | ||||
Tags | tc3-2008 |
|
Just to restate what I have said on the list, I don't think this is the correct approach to solve the problem at all. Much better would be to completely divorce this section (which deals with regular expressions) from anything related to shell pattern matching (glob patterns) which are superficially similar, but really completely different, and should be described independently. But if it is eventually decided to do it this way for some reason, the examples should avoid the complication of needing \$ inside the double quotes by simply using single quotes instead - none of them are relying on the expansions that can happen in double quoted strings. |
|
The examples demonstrate cases where backslash is a shell-quoting escape character. Inside single quotes it is not, and that is the reason I did not include any examples using single quotes. |
|
Geoff, what are those "two special meanings" you're referring to? AFAICT, there's only one: a quoting operator. \ is not a glob operator in shells, it's only for fnmatch() patterns. It's just that quoted characters are not considered as glob operators in shells. Note that the notion of "special meaning" inside bracket expressions would have to be clarified. For instance, the example seems to imply that by quoting that "-", its "special meaning" as a range operator was removed, but would [\a-c] remove "a"'s special meaning as a range start? What about ["[:alnum:]"], [[:"$class":]], [[=$'\ue9'=]], etc (and there are variations between implementations there). |
|
Special meaning 1 is an escape character in shell quoting, as per 2.2.1 "A <backslash> that is not quoted shall preserve the literal value of the following character, with the exception of a <newline>." Special meaning 2 is an escape character in pattern matching, as per 2.13.1 "A <backslash> character shall escape the following character. The escaping <backslash> shall be discarded." |
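Special meaning 1 can be illustrated with a minimal sketch, assuming any POSIX-like sh. The helper name is_literal_star is hypothetical and not part of the bug report. A backslash written directly in a case label quotes the asterisk at the shell level, so the label matches only the one-character string "*".

```shell
#!/bin/sh
# Hypothetical helper: report whether $1 matches the case label \*
# (a shell-quoted asterisk written directly in the script text).
is_literal_star() {
    case $1 in
        \*) echo yes ;;   # quoted asterisk: matches only the string "*"
        *)  echo no ;;    # unquoted asterisk: matches everything else
    esac
}

is_literal_star '*'     # prints: yes
is_literal_star 'abc'   # prints: no
```

Special meaning 2 only comes into play when the backslash survives shell quoting, for example when the pattern is stored in a variable; the later notes show that shells disagree on that case, so it is not asserted here.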
|
Another problem with this part of 9.3.5 has been identified in email discussion: it only lists the special characters for BREs. It should have different lists for BREs, EREs and shell patterns. New proposed changes...
On page 184 line 6087 section 9.3.5 RE Bracket Expression, change:
The special characters '.', '*', '[', and '\\' (<period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.
to:
When the bracket expression appears within a BRE, the special characters '.', '*', '[', and '\\' (<period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within the bracket expression.
When the bracket expression appears within an ERE, the special characters '.', '(', '*', '+', '?', '{', '|', '$', '[', and '\\' (<period>, <left-parenthesis>, <asterisk>, <plus-sign>, <question-mark>, <left-brace>, <vertical-line>, <dollar-sign>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within the bracket expression; <circumflex> ('^') shall lose its special meaning as an anchor.
When the bracket expression appears within a shell pattern (see [xref to XCU 2.13]), the special characters '?', '*', and '[' (<question-mark>, <asterisk>, and <left-square-bracket>, respectively) shall lose their special meaning within the bracket expression; whether or not <backslash> ('\\') loses its special meaning as a pattern matching character is described in [xref to XCU 2.13.1], but in contexts where a shell-quoting <backslash> can be used it shall retain its special meaning (see [xref to XCU 2.2]). For example:
$ ls
! $ - \ a b c
$ echo [a\-c]
- a c
$ echo [\!a]
! a
$ echo ["!\$a-c"]
! $ - a c
$ echo [!"\$a-c"]
! \ b
$ echo [!\]\\]
! $ - a b c |
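The example in the proposed text can be reproduced with a short sketch. It assumes a shell that follows the proposed wording, a mktemp that supports -d, and the C locale for sorting; glob_demo and show are hypothetical helper names. The comments give the results listed in the proposed text. printf is used instead of echo so that a file named backslash is printed unchanged.

```shell
#!/bin/sh
# Sketch only: recreate the seven files from the example in a scratch
# directory and expand each bracket expression as a pathname pattern.
glob_demo() (
    LC_ALL=C; export LC_ALL            # sort expansion results by byte value
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    : > ./'!'; : > ./'$'; : > ./-; : > ./'\'; : > a; : > b; : > c
    show() { printf '%s\n' "$*"; }     # join the matches with single spaces
    show [a\-c]        # - a c        (quoted '-': no range)
    show [\!a]         # ! a          (quoted '!': no negation)
    show ["!\$a-c"]    # ! $ - a c    (everything inside the quotes is literal)
    show [!"\$a-c"]    # ! \ b        (unquoted '!' negates the quoted set)
    show [!\]\\]       # ! $ - a b c  (negated set containing ] and \)
    cd / && rm -rf "$dir"
)

glob_demo
```

Shells that the thread describes as mishandling quoting inside bracket expressions (ksh88 and ksh93 are named) may print different results.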
|
That's wrong for shell pattern matching. As I said earlier, backslash is not a pattern matching operator in shell wildcards (not any more than ' or "), it's only a quoting operator and quoting disables wildcard operators. For instance,
pattern='\*'
case $string in $pattern) echo something;; esac
Matches on any string that starts with backslash, not on a literal star (at least in Bourne/ksh/ash/pdksh, not in bash nor zsh (which match on *)). That's different for
find . -name '\*'
Which matches on files called literally "*" (same as -name '[*]') as in fnmatch() backslash is used as an ersatz of shell quoting. Also note that $ loses its special meaning inside [...] in ERE (in BRE, it loses it already by not being the last character of the RE). |
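The find half of this comparison can be checked with a small sketch, assuming a mktemp that supports -d; find_literal_star is a hypothetical helper name. The single quotes keep the shell from touching the pattern, so find itself sees the backslash and treats it as an escape.

```shell
#!/bin/sh
# Sketch only: create a file literally named "*" next to an ordinary file
# and ask find for it using both spellings mentioned in the note.
find_literal_star() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    : > ./'*'; : > star
    find . -name '\*'     # backslash escapes the asterisk in find's pattern
    find . -name '[*]'    # same match written as a bracket expression
    cd / && rm -rf "$dir"
)

find_literal_star         # prints ./* twice; the file "star" is not listed
```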
|
Note that:
pattern='[\]*'
case $string in $pattern) echo something;; esac
matches on []anything in bash and dash, gives an error in zsh, matches on \anything in Bourne/ksh88/yash/mksh/busybox-sh (as I'd expect), only on [\]* (that I can tell) in ksh93. So it does look like a jolly mess. |
|
More fun:
$ ksh -c 'case "[\\]" in [\\]) echo yes;; esac'
yes
$ ksh -c 'case "\\" in [\\]) echo yes;; esac'
yes
(both ksh88 and ksh93, also in Bourne). Same for [\\\\] or [\\\\\\\\]. It looks as if those shells resort to string equality comparison when the patterns don't match (case [a] in [a]) echo yes;; esac also matches). AFAICT, that is not allowed by POSIX. It does look like a historical "feature", as ksh doesn't do it for its [[ $string = $pattern ]] other pattern matching operator. What that means is that it looks like it's impossible to have a variable contain a pattern meant to match a string that starts with backslash.
- pattern='\*' doesn't work in bash/zsh
- pattern='[\\]*' doesn't work reliably in ksh88/ksh93. |
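A small probe for the string-equality fallback described in this note might look like the following; probe_strcmp_fallback is a hypothetical helper name. Which line is printed depends on the shell under test, so no result is asserted here.

```shell
#!/bin/sh
# Sketch only: the pattern [a] should match the one-character string "a".
# If the three-character string "[a]" also matches, the shell has fallen
# back to comparing the pattern text with the string.
probe_strcmp_fallback() {
    case '[a]' in
        [a]) echo "fallback" ;;
        *)   echo "no fallback" ;;
    esac
}

probe_strcmp_fallback
```

According to the note, Bourne-derived shells and ksh88/ksh93 are expected to report the fallback for case patterns.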
|
Re: 0001190:0003960, XCU 2.13.1 clearly defines a pattern-matching rule, distinct from the usual quoting rule, for backslash in the shell (in the first paragraph - it specifies it separately in the last paragraph for non-shell pattern matchers). It appears that bash and zsh are implementing the standard as written but the other shells you tested are not. When testing this stuff note that ksh93 is known to behave incorrectly as regards quoting inside bracket expressions - that was the reason this whole discussion started in the first place. ksh88 also has some weird bugs, such as ["a\-c"] matching 'a', backslash and 'c' but not '-'. Re: 0001190:0003962, your final observation would seem to be a reason to keep the standard's requirements in 2.13.1 as-is, so that pattern='\\*' can be used for this, which works in bash and presumably zsh. |
|
I have edited 0001190:0003959 to add '$' and '^' in the part about ERE special characters. |
|
Re: 0001190:0003963 Hmmm. Looks like pattern='\\*' is yet another different case with different differences between shells. In Bourne/ksh88/mksh/yash/FreeBSD-sh, it matches on \\anything (as I'd expect); with dash, ksh93, bash, zsh, it matches on \anything instead. For busybox-sh, I see two different behaviours with 2 different versions. Note that as per your proposed text, if I understand correctly, bash, dash and zsh would not be compliant as with pattern='[\]*', they match on []anything instead of \anything and with pattern='[\-^]', they match on - and not \. That is, backslash didn't lose its special meaning as a wildcard operator. |
|
Could you please present a reproducible script to verify your claims and could you please mention which ksh93 version you are testing? |
|
Re: 0001190:0003966 That was ksh93u+ on Ubuntu 16.04 amd64. Try for instance
#! /usr/bin/env bash
export PATTERN STRING
set -o noglob
while read -r PATTERN strings; do
  printf '\n%s\n' "$PATTERN"
  for shell do
    # [1]: the pattern reaches the tested shell through a variable
    printf ' %12s[1]:' "$shell"
    for STRING in $strings; do
      (exec -a sh "$shell" -c '
        case $STRING in $PATTERN) ;; *) exit 1; esac') && printf ' %s' "$STRING"
    done
    # [2]: the pattern is pasted literally into the tested shell's code
    printf '\n %12s[2]:' "$shell"
    for STRING in $strings; do
      (exec -a sh "$shell" -c "
        case \$STRING in $PATTERN) ;; *) exit 1; esac") && printf ' %s' "$STRING"
    done
    echo
  done
done << 'EOF'
[\]* \anything []anything [\]anything [\]* [\\]* *
[\\]* \anything []anything [\]anything [\]* [\\]* *
\* \anything anything \* *
\\* \anything \\anything \\* \* *
EOF
To run as
that-script dash bash ksh93 mksh posh yash zsh busybox schily-sh |
|
The Bourne Shell, ksh88 and bosh give this result: [\]* sh[1]: \anything [\]* sh[2]: [\\]* sh[1]: \anything [\\]* sh[2]: \anything [\]* \* sh[1]: \anything \* sh[2]: * \\* sh[1]: \\anything \\* sh[2]: \anything \\anything \\* \* It has been manually verified for correctness. |
|
Let me add another script:
--->
if [ "$BASH_VERSION" != "" ]; then
  echo() { command echo -e "$@"; }
fi
chk() { echo [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]; }
mkdir td && cd td || exit
printf '%s\n' '---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]'
echo ": \c"; chk
:> a; echo "a: \c"; chk; rm a
:> b; echo "b: \c"; chk; rm b
:> ./-; echo "-: \c"; chk; rm ./-
:> c; echo "c: \c"; chk; rm c
:> _; echo "_: \c"; chk; rm _
:> \\; echo "\\: \c"; chk; rm \\
:> d; echo "d: \c"; chk; rm d
rm -f *
cd ..
rmdir td
<---
Call: $shell ./test-script
Expected result:
---> [a-c] ["a-c"] ["a\-c"] [\a\-\c] [a\-c]
: [a-c] [a-c] [a\-c] [a-c] [a-c]
a: a a a a a
b: b [a-c] [a\-c] [a-c] [a-c]
-: [a-c] - - - -
c: c c c c c
_: [a-c] [a-c] [a\-c] [a-c] [a-c]
\: [a-c] [a-c] \ [a-c] [a-c]
d: [a-c] [a-c] [a\-c] [a-c] [a-c] |
|
The above discussion glosses over that the current requirements, as I read them, are that there are two contexts for evaluation of patterns: before and after quote removal has been performed. Before removal usage can occur, it appears, when evaluating ASSIGNMENT_WORDs, cmd_suffix WORDs and the operand to the in reserved word, as a glob expansion; after removal usage applies to case labels, and is limited to clauses XCU 2.13.1 and 2.13.2 as the file system is not being implicitly accessed. I believe each case has to be considered separately on how '\' is treated, as an escape or ordinary character, and this may entail additions to the grammar to make the distinctions fully normative. |
|
Re note 3972 There should be just one definition of shell pattern matching, which should apply to all three places where patterns are used in sh: glob (aka pathname expansion), case matching, and substring extraction (${var%pattern} etc). In particular, all of them operate on the pattern before any quote removal is done (the change proposed in issue number 985, which added quote removal as one of the expansions to be done on patterns, is simply wrong, does not match historical practice, and simply complicates things). There's no reason for it, since, as you say, glob and substring matching happen before quote removal (and manage to work) then case pattern matching can also happen before quote removal, and work in exactly the same way. The one difference is that after matching (or not) a case pattern is not of any further use, so there is nothing to happen to it afterward - quote removal is not needed at all, ever.
How much the definition of shell matching is intertwined with the (totally different) pattern matching definition for regular expressions is a different issue - I'd prefer the answer to be "not at all", but as long as the specification ends up correct, this is really a style/editorial issue which does not matter too much (mixing them together saves a few lines of close to duplicated text, at the expense of needing "except when used ..." type noise added in a few places, which makes it all harder to read.)
Re Joerg's note 3969 - aside from the perverse use of echo in a non-portable way (for no particular reason I can see, printf would work just as well, and in a fully specified and portable manner, and not need a special case "if it is bash" hack - which omits the ash derived shells which also have echo -e, if you wanted to use it). Never mind, if that script were redone using printf instead of echo, I have no issues with it, or its results from what I can tell.
However, the "manually verified for correctness" results from Stephane's script given in note 3968 do not match what any other shells do. They obviously suffer from the "if the pattern doesn't match, use strcmp instead" nonsense that those shells implement (which is non-standard, perverse, and unneeded -- if the script author wants to match as a literal string, that's easy to specify as an alternative pattern, just by quoting it). Best practice results are:
[\]* (older ash shells match \anything for [1])
 bash[1]: []anything
 bash[2]: []anything
[\\]* (excluding strcmp() most (perhaps all) agree)
 bash[1]: \anything
 bash[2]: \anything
\* (older ash shells get this one wrong too)
 bash[1]: *
 bash[2]: *
\\* (and this)
 bash[1]: \anything \\anything \\* \*
 bash[2]: \anything \\anything \\* \*
The rationales are:
[\]* cannot contain a bracket expression, as there is no closing ']', the ] that is there is quoted. The \ serves only that purpose, it quotes the ] (this happens as a shell quoting mechanism, there is no \ inside [] for the \ to be "not considered special" here). So, all that is left is a sequence of plain chars, '[]anything' and hence that is all that can match.
[\\]* here the first \ (as a shell quoting character) quotes the second \ and so the pattern match is handed [\]anything (with the \ quoted, which is irrelevant, as a \ is not special inside [], quoted or not). So we have a bracket expr containing a single char, which therefore matches that char as a literal, followed by anything. So \anything matches (and nothing else). This is the one all shells agree on (ignoring strcmp nonsense.)
\* This is just a quoted asterisk, and hence nothing special, it matches only itself, as does any other quoted character, the only possible match for this is the string '*'.
\\* is a quoted \ (the first \ quotes the second) followed by an unquoted asterisk (match anything) and consequently matches any string starting with a literal backslash.
All of these match exactly the same way (match the exact same inputs) when used as glob patterns, case patterns, and substring matching. Note the quoting characters were not removed, but still served as quoting chars, if the quotes had been removed, \* would just be '*' and would match anything, so ls \* would match all files - which we all know, and agree, is not what happens (I hope.) For the exact same reason ${var%\*} deletes a literal '*' from the end of the value of ${var} (if it was there). And in a case match, \* as a pattern matches a literal '*' (it is the exact same matching operation). For differences: zsh treats [\]* as a bad pattern - which is some new invention, as sh glob patterns (as distinct from their csh cousins) never had any notion of "bad pattern" - everything has a defined meaning. Old ash shells, and yash treat the [\] the same as [\\] which is just a bug (it cannot rationally be anything else (after all, the inputs are different, a different character is quoted, they're not the same at all.) The other patterns tested (except for \\* which everyone handles - modulo strcmp) have similar misinterpretations in old ash and yash ("old ash" as in dash, and the NetBSD sh no longer work that way, but match as expected.) [Aside: you need a recent NetBSD sh to get a version with the bugs fixed.] |
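The substring-removal case mentioned above can be sketched as follows, assuming any POSIX-like sh; strip_star and strip_any are hypothetical helper names. The backslash inside the braces quotes the asterisk, so only a literal trailing "*" is removed, whereas an unquoted "?" removes any single trailing character.

```shell
#!/bin/sh
# Sketch only: compare a quoted and an unquoted pattern character in
# the ${parameter%pattern} form of parameter expansion.
strip_star() { printf '%s\n' "${1%\*}"; }   # remove a literal trailing *
strip_any()  { printf '%s\n' "${1%?}"; }    # remove any one trailing character

strip_star 'abc*'   # prints: abc
strip_star 'abcd'   # prints: abcd  (no trailing *, so nothing is removed)
strip_any  'abcd'   # prints: abc
```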
|
Interpretation response
------------------------
The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.
Rationale:
-------------
None.
Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
Make the changes in 0001190:0003959 |
|
As ISO/IEC JTC 1/SC 22 OR I approve this change. |
|
As ISO/IEC JTC 1/SC 22 OR I approve this change. |
|
As IEEE PASC OR I approve this change. |
|
I'm personally objecting to backslash having no meaning within bracket expressions. There are too many utility implementations where [\t] doesn't match on backslash and t, or where [\]] matches a ] only, so that in practice one needs to write [\\t] or [\\]] if one wants a literal backslash to be matched. Modern (non POSIX) regexps (perl, perl-like or derived from it) treat backslash specially within [...] and are becoming de-facto standards. Requiring backslash not to be treated specially hinders progress as it forbids common extensions like [\t] [\d\s]... |
|
It also doesn't address the fact that \ has no special meaning as a wildcard operator (2.13.1 wrong for shells) in any of the original sh implementations (Thompson, Mashey, Bourne, Almquist, Forsyth, Korn, also pdksh, yash). It's only a quoting operator there. As already mentioned, echo '*'* and echo \** match files whose name starts with * because the first * is quoted, but pattern="'*'*" or pattern="\**"; echo $pattern (where $pattern is not inside single quotes) in those shells match files like 'foo'bar or \foobar, not *foobar (bash and zsh are the only two exceptions). In practice, if an application wants to store a pattern in a variable and match a literal \ or * there, it needs to use bracket expressions:
pattern='[\\]*' (the double backslash being my point in 0001190:0004288)
pattern='[*]*'
Also, please make it clear, maybe as a "rationale" section stating why it was not considered for inclusion (and also add a conformance test case) that the Bourne/Korn feature by which "case [a] in [a]) echo match; esac" matches (fall back to strcmp when pattern matching doesn't match) is not allowed. Still allow "rm [a]" to remove "[a]" when there's no "a" file in the current directory, another misfeature introduced by the Bourne shell but unfortunately followed by other Bourne-like shells (except zsh when not in sh mode) and specified by POSIX (some shells have a nomatch, failglob or cshnullglob option to work around it). |
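The bracket-expression workaround for patterns stored in variables can be sketched like this, assuming any POSIX-like sh; starts_with_star is a hypothetical helper name. Only the '[*]*' form is exercised, because the thread reports that the backslash forms behave differently between shells.

```shell
#!/bin/sh
# Sketch only: keep the pattern in a variable and make the asterisk
# literal with a bracket expression instead of a backslash.
starts_with_star() {
    pattern='[*]*'
    case $1 in
        $pattern) echo yes ;;   # first character is a literal *
        *)        echo no ;;
    esac
}

starts_with_star '*foobar'   # prints: yes
starts_with_star 'foobar'    # prints: no
```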
|
The points raised in 0001190:0004288 and 0001190:0004289 are separate issues that do not affect the shell quoting issue that this bug addresses. Please file separate defect reports for them. |
|
Re: 0001190:0004290 Done now (bugs 1233, 1234 and 1235). They (1234 in particular) are not completely separate though. |
|
Wrt notes 3959 and 4277 ... Leaving aside any other issues, and accepting that the substance of this issue is approved, I still believe that the examples in the text to be added should be written as:
$ ls
! $ - \ a b c
$ echo [a\-c]
- a c
$ echo [\!a]
! a
$ echo ['!$a-c']
! $ - a c
$ echo [!'$a-c']
! \ b
$ echo [!\]\\]
! $ - a b c
which is simply substituting ' for " quoting (in two of them) and then removing the (no longer appropriate) \ before the $ in the quoted string. That \ only confuses things here, it is not relevant for the purposes of the example (it simply avoided parameter expansion inside the string) and is best avoided for this purpose. |
|
Re 0001190:0004308, as I said in 0001190:0003955, I chose examples which demonstrate cases where backslash is a shell-quoting escape character. Since it is still a shell-quoting escape character when inside double-quotes, there should be examples that include backslash in double quotes. I would have no objection to adding single-quote cases to the examples, but not to removing the double-quote cases. |
|
Note from the March 14, 2019 conference call: Since there is an on-going discussion about 0001234, this interpretation is on hold. Once 0001234 is resolved, we will either re-open this bug and supply a new interpretation or close this bug and handle the issue in the changes for that bug. |
|
Per the 23 Sept 2019 conference call, now that 0001234 is ready, the following changes are needed to this bug. They are small enough that we edited 0001190:0003959 in place, changing:with:
|
|
Interpretation proposed: 7 October 2019 |
|
Interpretation Approved: 11 Nov 2019 |
Date Modified | Username | Field | Change |
---|---|---|---|
2018-04-13 11:16 | geoffclare | New Issue | |
2018-04-13 11:16 | geoffclare | Name | => Geoff Clare |
2018-04-13 11:16 | geoffclare | Organization | => The Open Group |
2018-04-13 11:16 | geoffclare | Section | => 9.3.5 |
2018-04-13 11:16 | geoffclare | Page Number | => 184 |
2018-04-13 11:16 | geoffclare | Line Number | => 6089 |
2018-04-13 11:16 | geoffclare | Interp Status | => --- |
2018-04-13 11:18 | geoffclare | Desired Action Updated | |
2018-04-13 12:39 | kre | Note Added: 0003954 | |
2018-04-13 13:47 | geoffclare | Note Added: 0003955 | |
2018-04-14 17:17 | stephane | Note Added: 0003957 | |
2018-04-16 08:31 | geoffclare | Note Added: 0003958 | |
2018-04-16 09:23 | geoffclare | Note Added: 0003959 | |
2018-04-16 09:25 | geoffclare | Note Edited: 0003959 | |
2018-04-16 11:06 | stephane | Note Added: 0003960 | |
2018-04-16 11:10 | stephane | Note Edited: 0003960 | |
2018-04-16 11:12 | stephane | Note Edited: 0003960 | |
2018-04-16 11:21 | stephane | Note Added: 0003961 | |
2018-04-16 11:22 | stephane | Note Edited: 0003961 | |
2018-04-16 11:27 | stephane | Note Edited: 0003961 | |
2018-04-16 11:45 | stephane | Note Added: 0003962 | |
2018-04-16 12:18 | stephane | Note Edited: 0003962 | |
2018-04-16 12:22 | stephane | Note Edited: 0003962 | |
2018-04-16 14:41 | geoffclare | Note Added: 0003963 | |
2018-04-16 14:50 | geoffclare | Note Edited: 0003959 | |
2018-04-16 14:52 | geoffclare | Note Added: 0003964 | |
2018-04-16 16:27 | stephane | Note Added: 0003965 | |
2018-04-16 16:39 | stephane | Note Edited: 0003965 | |
2018-04-16 16:57 | joerg | Note Added: 0003966 | |
2018-04-16 20:11 | stephane | Note Added: 0003967 | |
2018-04-16 20:13 | stephane | Note Edited: 0003967 | |
2018-04-16 20:17 | stephane | Note Edited: 0003967 | |
2018-04-17 10:21 | joerg | Note Added: 0003968 | |
2018-04-17 10:26 | joerg | Note Edited: 0003968 | |
2018-04-17 10:34 | joerg | Note Added: 0003969 | |
2018-04-19 06:27 | shware_systems | Note Added: 0003972 | |
2019-02-18 14:22 | kre | Note Added: 0004257 | |
2019-03-04 16:50 | eblake | Relationship added | related to 0000985 |
2019-03-07 16:09 | geoffclare | Note Edited: 0003959 | |
2019-03-07 16:12 | geoffclare | Note Added: 0004277 | |
2019-03-07 16:13 | geoffclare | Interp Status | --- => Pending |
2019-03-07 16:13 | geoffclare | Final Accepted Text | => 0001190:0004277 |
2019-03-07 16:13 | geoffclare | Status | New => Interpretation Required |
2019-03-07 16:13 | geoffclare | Resolution | Open => Accepted As Marked |
2019-03-07 16:13 | geoffclare | Tag Attached: tc3-2008 | |
2019-03-07 18:47 | nick | Note Added: 0004284 | |
2019-03-07 18:47 | nick | Note Added: 0004285 | |
2019-03-07 19:00 | Don Cragun | Note Added: 0004286 | |
2019-03-07 20:33 | stephane | Note Added: 0004288 | |
2019-03-08 07:19 | stephane | Note Added: 0004289 | |
2019-03-08 09:45 | geoffclare | Note Added: 0004290 | |
2019-03-09 00:48 | stephane | Note Added: 0004293 | |
2019-03-12 03:44 | kre | Note Edited: 0004257 | |
2019-03-12 03:50 | kre | Note Added: 0004308 | |
2019-03-12 11:16 | geoffclare | Note Added: 0004312 | |
2019-03-14 15:58 | Don Cragun | Relationship added | related to 0001234 |
2019-03-14 16:05 | Don Cragun | Note Added: 0004321 | |
2019-03-14 16:09 | Don Cragun | Note Edited: 0004321 | |
2019-09-23 15:37 | eblake | Note Added: 0004563 | |
2019-09-23 15:39 | eblake | Note Edited: 0003959 | |
2019-09-23 15:42 | eblake | Note Edited: 0003959 | |
2019-10-07 15:18 | agadmin | Interp Status | Pending => Proposed |
2019-10-07 15:18 | agadmin | Note Added: 0004611 | |
2019-11-11 12:21 | agadmin | Interp Status | Proposed => Approved |
2019-11-11 12:21 | agadmin | Note Added: 0004655 | |
2019-12-12 10:21 | geoffclare | Status | Interpretation Required => Applied |
2024-06-11 09:08 | agadmin | Status | Applied => Closed |