Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details
ID 0001234
Category [1003.1(2016/18)/Issue7+TC2] Shell and Utilities
Severity Editorial
Type Enhancement Request
Date Submitted 2019-03-08 23:58
Last Update 2019-12-12 10:45
Reporter stephane View Status public  
Assigned To
Priority normal Resolution Accepted As Marked  
Status Applied  
Name Stephane Chazelas
Organization
User Reference
Section 2.13.1
Page Number 2382
Line Number 76212-76215
Interp Status Approved
Final Accepted Text Note: 0004564
Summary 0001234: in most shells, backslash doesn't have two meaning wrt pattern matching
Description That's a follow-up on bug:1190 where Geoff claims backslash has
two meanings in shell, one as a quoting operator and another one
as an "escaping" pattern matching operator.

The relevant part of the spec in 2.13.1:

> A <backslash> character shall escape the following character.
> The escaping <backslash> shall be discarded. If a pattern ends
> with an unescaped <backslash>, it is unspecified whether the
> pattern does not match anything or the pattern is treated as
> invalid.

That text seems to me to be referring to what happens in
fnmatch() where backslash is used as a substitute to shell
quoting.

In most shells, beside quoting, backslash doesn't have that
"second" meaning for shell globbing and "case" pattern matching.

See also the last paragraph in that same section:

> When pattern matching is used where shell quote removal is not
> performed (such as in the argument to the find −name primary
> when find is being called using one of the exec functions as
> defined in the System Interfaces volume of POSIX.1-2017, or in
> the pattern argument to the fnmatch( ) function), special
> characters can be escaped to remove their special meaning by
> preceding them with a <backslash> character. This escaping
> <backslash> is discarded. The sequence "\\" represents one
> literal <backslash>. All of the requirements and effects of
> quoting on ordinary, shell special, and special pattern
> characters shall apply to escaping in this context.

That paragraph makes little sense if like Geoff we interpret the
first one above as meaning backslash has a special meaning
independent of quoting in all cases including shell and
fnmatch().

There is one and only one shell however that implemented Geoff's
interpretation: bash. And it's done partly (but differently) in
ksh93, some ash-derived shells and zsh. In all other shells:
Thompson sh, Mashey sh, Bourne sh, ksh88, Almquist sh, Forsyth
sh, pdksh, mksh, yash, backslash is just a quoting operator like
'...' and "..." in that regard (and quoting removes the special
meaning of wildcard operators), backslash doesn't have a special
meaning.

glob came from Unix V1 where the shell would invoke a /etc/glob
helper to expand globs. That /etc/glob didn't treat backslash
specially. sh would convey to glob what characters were quoted
by setting the 8th bit on them. /etc/glob was moved inside the
shell in the Mashey shell, and the Bourne shell fixed most of
the issues (and added some) but worked on the same principle:
quoting marks characters as "quoted" internally which prevents
them from being treated as wildcards by the globbing routine.

The second meaning, if any, only reveals itself when the pattern
comes from some expansion, as in:

   files='\?*.ext'; ls -d -- $files

In all shells but bash and zsh (in sh emulation), that expands
to the file names that start with \ followed by one or more
characters and ends in .ext as \ is not a wildcard operator
there, just a quoting one and that \ is not used as a quoting
operator.

In bash and zsh, that expands to the file names that start with
? and end in .ext.

zsh doesn't implement Geoff's interpretation of the POSIX
requirement fully as the \ above is only considered as an escape
operator before wildcard operators. It's not removed when
followed by normal ones. You'll see the difference in:

   $ touch ab '\a\b' 'a*'
   $ p='\a\b' bash -c 'ls -d -- $p'
   \a\b
   $ p='\a\b' bash -c 'ls -d -- $p*'
   ab
   $ (p='\a\b' exec -a sh zsh -c 'ls -d -- $p*')
   \a\b

Where it gets a bit silly is in:

   $ p='a\*' bash -c 'printf "%s\n" $p'
   a\*

That * is escaped (by a wildcard operator?) but since the word no
longer contains any wildcard operator (so the answer is "no"), it's
left as is and doesn't match the a* file (see how it differs in
"case" statements below).

In dash and busybox sh (but not other ash derivatives like
NetBSD sh or FreeBSD sh) and ksh93, while backslash doesn't have
a second meaning for globs, it seems to have one in case
statements.

In those (and in bash)

   p='a\*'; case "a*" in $p) echo match; esac
   p='\a'; case "a" in $p) echo match; esac

outputs "match" twice (note the difference with globs for the
first one in bash and zsh).
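
A minimal sketch for checking which of the two behaviours a given shell implements in "case" patterns. The helper name probe_case_backslash is made up for illustration, and no particular result is claimed here, since the result is exactly what differs between shells.

```shell
# probe_case_backslash: hypothetical helper, not part of any shell.
# Reports whether a backslash that comes from an unquoted expansion is
# treated as an escape ("escape") or as an ordinary character ("literal")
# when the expansion is used as a case pattern.
probe_case_backslash() {
    p='\a'
    case a in
        $p) echo escape ;;
        *)  echo literal ;;
    esac
}

probe_case_backslash
```

Running the same file under several shells (for example sh, bash, dash, if installed) makes the divergence visible without relying on any one shell's documentation.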
Desired Action I don't think that second meaning was intended in the POSIX
specification. If it was, it was poorly worded. Still, some
shells (very few) have implemented it, if only partially.

I would recommend that any suggestion of that second meaning be
removed from the specification, but that shells that implemented it
be accommodated.

So remove the first quoted text above, undo the changes in
bug:1190 that suggest that backslash has a second meaning in
shell wildcards, keep the last paragraph about find/fnmatch()
(possibly clarified as per bug:1233), but make it clear to
application writers, that in a wildcard pattern that comes from
an unquoted parameter expansion or command substitution, a
backslash should be matched by [\\] (the double backslash as per
bug:1233), and a wildcard operator ([, *, ?) by [[], [*], [?]
respectively:

Maybe clarified with some examples:

- files='[?]*'; ls -d -- $files for the list of files that start
  with ?
- files='[\\]*'; ls -d -- $files for the list of files that start
  with \
- p=*; ls -d -- '\'$p for the list of files that start with \
- files='\?*'; ls -d -- $files unspecified
- files='\a*'; ls -d -- $files unspecified
- files='*\'; ls -d -- $files unspecified
- pattern='\a'; case $var in $pattern) ...; esac unspecified

Now for:

- files='\a'; ls -d -- $files
- files='a\*'; ls -d -- $files

(globs that contain backslashes but no unescaped wildcard) we
can make it unspecified for consistency with the "case"
equivalent above, or since even bash appears to not treat the \
specially there, require that it be expanded as is.
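
A minimal sketch of the bracket-expression forms recommended above, for patterns that reach the shell through an unquoted expansion. The helper name list_matches and the file names are invented for the example, and mktemp is assumed to be available (it is not required by POSIX).

```shell
# list_matches: hypothetical helper for illustration only.
# Prints the names in directory $1 that match the pattern in $2, where
# the pattern reaches pathname expansion through an unquoted expansion.
list_matches() {
    (
        cd "$1" || exit 1
        pattern=$2
        # $pattern is deliberately unquoted so that pathname expansion
        # applies (field splitting applies too, so keep IFS characters
        # out of the pattern)
        for f in $pattern; do
            # skip the unexpanded pattern left behind when nothing matches
            [ -e "$f" ] && printf '%s\n' "$f"
        done
        exit 0
    )
}

dir=$(mktemp -d) || exit 1
touch "$dir/?x" "$dir/ax" "$dir/\\x" "$dir/*x"
list_matches "$dir" '[?]*'    # names starting with a literal ?
list_matches "$dir" '[\\]*'   # names starting with a literal backslash
list_matches "$dir" '[*]*'    # names starting with a literal *
rm -rf "$dir"
```

The bracket forms give the same result whether or not the shell treats a backslash from an expansion as an escape, which is why they are suggested as the portable spelling.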
Tags tc3-2008
Attached Files

Relationships
parent of 0001295 Closed 1003.1(2016/18)/Issue7+TC2 Left brackets in shell patterns may cause other pattern matching characters to be taken literally in all contexts
related to 0001190 Applied 1003.1(2016/18)/Issue7+TC2 backslash has two special meanings in the shell and only loses one of them in bracket expressions
related to 0000247 Closed ajosey 1003.1(2008)/Issue 7 Add nullglob (null globbing) support to shell's "set" and glob()
related to 0000985 Applied 1003.1(2013)/Issue7+TC1 quote removal missing from case statement patterns and alternative expansions

Notes
(0004296)
kre (reporter)
2019-03-11 02:42
edited on: 2019-03-11 03:16

I regard this:

    In most shells, beside quoting, backslash doesn't have that
    "second" meaning for shell globbing and "case" pattern matching.

as an indication that most shells have had bugs in the matching code.

That is, few even really considered what happens with
    ls $var

when var contains pattern matching characters. Everyone agrees
that pathname expansion is performed on the result, so if var='*.c'
the ls would list all files with names ending in .c

Similarly, all agree that if one wants to avoid pathname expansion,
the parameter expansion just gets quoted
    ls "$var"
will only list the file '*.c'

But it was not historic practice to correctly deal with the possibility
that var might contain both characters that are to be treated specially,
and those that are not. All shells correctly handle

    ls \**.c

to list all files with names starting with a '*' and ending with '.c'
but (until relatively recently) few provided any method to allow such
a pattern to be stored in a variable, and accessed using parameter
expansion.
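
The three uncontroversial cases described so far can be sketched as follows. The helper names count_args and demo and the file names are invented for the example, and mktemp is assumed to be available.

```shell
# count_args: hypothetical helper that prints how many arguments it got.
count_args() { echo "$#"; }

# demo: runs in a subshell so the cd does not affect the caller.
demo() (
    cd "$1" || exit 1
    var='*.c'
    count_args $var      # unquoted: pathname expansion yields every .c name
    count_args "$var"    # quoted: a single literal word, *.c
    count_args \**.c     # backslash typed in the script quotes the first * only
)

dir=$(mktemp -d) || exit 1
touch "$dir/a.c" "$dir/b.c" "$dir/*x.c"
demo "$dir"              # prints 3, then 1, then 1
rm -rf "$dir"
```

What the sketch cannot show portably is the fourth case, a variable holding '\**.c', because that is the behaviour under dispute.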

Quoting the parameter expansion cannot work - that always disables pathname
expansion (as does "set -f"). So there *must be* a method to allow a
parameter expansion to contain both wildcard characters, and those characters
treated as literal characters.

The way this is done in all other contexts is by using \ as the (one and
only) escape character:
      find -name '\**.c' ...
works, and \ used this way works in regular expressions. Everywhere but
in broken shells.

I cannot treat this defect as anything other than a bug in those shells,
however common it is for this bug to exist.
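
A minimal sketch of the find case mentioned above, where the pattern is matched by find itself and the backslash acts as an escape. The helper name find_by_name and the file names are invented for the example, and mktemp is assumed to be available.

```shell
# find_by_name: hypothetical helper for illustration only.
# $1: directory to search, $2: pattern handed to find unchanged
# (it is quoted, so the shell performs no pathname expansion on it).
find_by_name() {
    find "$1" -type f -name "$2" | sed 's|.*/||' | LC_ALL=C sort
}

dir=$(mktemp -d) || exit 1
touch "$dir/a.c" "$dir/*x.c"
find_by_name "$dir" '\**.c'   # only the name starting with a literal *
find_by_name "$dir" '*.c'     # both names
rm -rf "$dir"
```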

Wrt:
        files='\?*.ext'; ls -d -- $files

    In all shells but bash and zsh (in sh emulation), that expands
    to the file names that start with \ followed by one or more
    characters and ends in .ext as \ is not a wildcard operator
    there, just a quoting one and that \ is not used as a quoting
    operator.

First, it would be wrong to ever consider a \ as a wildcard operator.
It is not, never was, never will be. It is a quoting character,
and should work there in the shell as it does in other applications.

Further, contrary to what that says, in the (fixed) NetBSD sh (fixed
since late last year, and will be in both 8.1 - as it is a bug fix, and
9 when that appears) this would expand to files whose names start with
a '?' and end in ".ext" - the ? is quoted, and so matches literally,
exactly the same as would happen if the contents of the var were listed
on the command line, or if one did

     eval command "${files}"

but without the side issues that has if files contains other quoting
characters (' or ") which have no effect on matching in any case, and
are simply characters.

Consequently I object to the desired action. I would tolerate some
words added noting that this is a common bug - but a bug is all this
one is, working the (various different) ways that the various shells
operate has no specifically designed benefit - this case simply has not
been properly considered in their implementations (and is not all that
hard to fix - I know, I have done that.)

There is a real need to be able to put generic patterns in variables,
and have them work, and work consistently in all 3 uses of patterns in
the shell (pathname expansion, case patterns, and substring matching in
parameter expansion) so that the same results occur in all 3 cases.

If the price for that is to have some shells listed as non-conformant
until they fix their bugs, then so be it.

The conformance test suite should be explicitly testing for these cases,
and failing shells that do not comply.

On the other hand, warning script writers that this is an area that is
fraught with danger, currently, is reasonable - even if there really is
no good workaround available currently, with a broken shell, there are
simply some things that cannot be accomplished, which should be possible.

(0004298)
kre (reporter)
2019-03-11 03:23

In note 4296 I wrote
   Quoting the parameter expansion cannot work - that always disables pathname
   expansion

That is loose (ok, incorrect) wording, pathname expansion isn't disabled
by quoting, rather it is just ineffective - as all of the quoted characters
are literals, so if the whole word is quoted (as in ls "${var}") pathname
expansion might as well be disabled, as only one pathname can possibly
result - that which is the exact contents of $var (with everything treated
literally). Still this is not really "disabling" pathname expansion, and I
should have been more precise in my wording. This glitch has no effect on
the substance of the note however.
(0004300)
stephane (reporter)
2019-03-11 07:24
edited on: 2019-06-15 11:42

Re: Note: 0004296

> All shells correctly handle
>
> ls \**.c
>
> to list all files with names starting with a '*' and ending with '.c'

As a historical note, that didn't work until V7. In the Thompson or even the Mashey shell (where the /etc/glob functionality was moved into the shell), that 8th bit on quoted character was removed just before calling exec(), so was still there at the filename matching stage, so you couldn't have globs with quoted characters. Even "a"*.c didn't work.

You would do [*]*.c then, just like you'd do

   files='[*]*.c'
   rm $files

now (except that it would delete a literal [*]*.c if there was no matching file, a misfeature/bug the Thompson/Mashey shells didn't have that was introduced by the Bourne shell (and no (cf 1235), it's not easy to work around and almost never worked around in real life scripts, but it's too late to fix now))

There is no need to add that confusing second meaning to backslash.

In awk (where different implementations handle backslash differently) and more generally in tools where I'm not sure how escaping works, I tend to use [*] instead of \* as well, to be on the safe side, as that always works (the fish shell being an exception as it doesn't have the [...] wildcard operator).

(0004301)
stephane (reporter)
2019-03-11 07:31

Re: Note: 0004298

I would say that yes quoting a variable, or more generally making sure that a word in list context doesn't have any unquoted wildcard disables pathname expansion.

That's how it was done in the original glob implementation where /etc/glob would only do pathname expansion on arguments that contained unquoted */?/[.

"echo foo" would output foo, but "echo [f]oo" would fail with a "No match" error if there was no matching file. If it did output "No match" in the "echo foo" case, that would clearly be a bug.

Modern shells that still fail commands when globs don't match do the same.

And in those that don't, you wouldn't want echo foo/bar/baz to cause the shell to read the content of ., foo and foo/bar to realise there's only at most one match and then leave the result as is.
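
A minimal sketch of the Bourne-style behaviour referred to above, where a glob that matches nothing is left in place instead of causing a "No match" error. It assumes a POSIX sh with default options (no nullglob or failglob style settings); the helper name show_glob is invented, and mktemp is assumed to be available.

```shell
# show_glob: hypothetical helper; runs in a subshell so the cd is local.
show_glob() (
    cd "$1" || exit 1
    echo [f]oo
)

dir=$(mktemp -d) || exit 1
show_glob "$dir"      # no matching file: the word is left as [f]oo
touch "$dir/foo"
show_glob "$dir"      # one match: the word is replaced by foo
rm -rf "$dir"
```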
(0004303)
geoffclare (manager)
2019-03-11 09:41

This bug touches on many of the same issues that came up during discussion of bug 0000985. It may be best to close it as a duplicate of that bug and put our efforts into resolving the issues there instead of having a separate discussion here.
(0004306)
kre (reporter)
2019-03-12 03:24
edited on: 2019-03-12 12:14

Lacking the ability to write a pattern in anything other than original
shell code reliably is too big a loss to suffer - there must be a reliable
way to access a variable pattern, reliably, and use it (fully functional).

I'm not aware of any shell claiming that their broken way is correct, the
most I have seen is "posix doesn't mandate it" (which always was questionable)
"so we do not need to fix it".

What the Thompson (6th edn and earlier) shell, or the Mashey (PWD) shell
did, just as what csh (and derivatives) do is of no relevance whatever.
As I recall the Thompson shell (and probably the Mashey one too, though I
used that very little) didn't even have variables for this to be an issue.

And while the [] form of quoting works, most of the time, as an alternative
to \ quoting, it is needlessly different for no good reason, I can do

var='\**.c'; find /wherever -name "$var" -print

and that works as expected, but if I later do

ls -l $var

it does not? In your other issues you are arguing for consistency
across multiple different applications (extending as far away from sh
as perl) yet here you want things to remain different? Why?

Further, given that (for historical reasons) both ! and ^ are (or
might be) special when used as the first character of a bracket
expression, without allowing \ quoting, there's no way to write a
pattern that matches either of those chars (if one wanted to write
a pattern to look for glob patterns for example). One of the two
must be first, and thus have a special meaning which is unwanted.
Nothing else can be inserted into the bracket expression, as that
would result in a match against 3 chars instead of just the two.
On the other hand, if \ quoting works, then [\^!] or [\!^] work
just fine. Those do work when entered on the shell command line,
shouldn't they work in a variable as well?

Wrt Note: 0004301 ... whether pathname expansion is "disabled" or just ineffective
makes no difference (nor does the optimisation in the Thompson shell to not
exec /etc/glob when it would obviously change nothing). For most purposes this
is just semantics. But for the purposes of the standard, pathname expansion
is only disabled when done so via "set -f" (or "set -o noglob" of course).
In some situations it is not performed, and in others it is performed but
does nothing, and in others it produces a set of new words to replace the
original.

The details of how pathname expansion is performed - given that it must comply
with the constraint that 'r' permission is not to be required on any directory
component unless there is a wildcard operator in the source word component to
match (and one might even argue whether a usage like [a] is such, given the
only possible match is 'a') - is all up to the implementation, but as soon as
it starts looking at a word breaking it into pathname components and looking
to see if unquoted wildcard characters exist, it is performing pathname expansion (no other function in the shell requires such inspection) - and it
would not be doing any of that if pathname expansion were disabled.

And last, wrt Note: 0004303, I will go and (look again) at bug 0000985, though it
seems to me that when a bug is that old, and not yet resolved, it takes something like this to get it moved back into being under active consideration
rather than simply stagnating.

(0004307)
kre (reporter)
2019-03-12 03:37
edited on: 2019-03-12 12:16

Re Note: 0004303 again...

I suspect that you added that note to the wrong bug report. Bug 0001235
falls squarely into what is/was being discussed in bug 0000985 whereas
this one is not.

This is more related to bug 0001190, in which, in Note: 0004290, you explicitly
asked for new issues to be opened to discuss these side issues of the
original report.

(0004309)
Konrad_Schwarz (reporter)
2019-03-12 09:13

Re Note 4306:

why does

   eval ls -l $var

not suffice?
(0004310)
geoffclare (manager)
2019-03-12 09:58

Re Note: 0004307 I had misremembered where the relevant discussion occurred (it was on the mailing list, not initially in connection with bug 0000985), but I still believe the issue raised here about backslash in shell patterns should be dealt with in bug 985 because the proposed resolution for that bug puts the detail of pattern matching into fnmatch() and then rewrites 2.13 to refer to fnmatch(). The effect of the aforementioned discussion on how to rewrite 2.13 is the reason bug 985 was reopened (see Note: 0003947 and Note: 0003948).
(0004313)
kre (reporter)
2019-03-12 12:15
edited on: 2019-03-12 14:56

Re:Note: 0004309

You probably mean
    eval ls -l "$var"
without the quotes you'd get pathname expansion on $var (files that start
with a '*' for example - however that is encoded in var) and then the eval
would do pathname expansion again on each of the results from the first time.
That's rarely going to be what anyone wants.

But for the form with quotes in it, imagine var="[']*.c"

then what the eval sees is

     eval ls -l [']*.c

That isn't going to work at all. There's often (but not always) a way
to code around this, but then it means that the variable needs to be
different to be used in glob expansions like this, and in case patterns,
or parameter expansion substring operators - and certainly as an arg to
find. The same thing ought to be able to be used, with the same meaning
in all of those contexts.
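
A minimal sketch of the eval problem described above: a pattern that is valid for matching can be invalid shell syntax once it is re-parsed. The helper name try_eval_pattern is invented for the example.

```shell
# try_eval_pattern: hypothetical helper for illustration only.
# Returns success if the pattern in $1 survives being re-parsed by eval.
try_eval_pattern() {
    # Run in a subshell: a syntax error inside eval can terminate a
    # non-interactive shell. Output and diagnostics are discarded.
    ( eval "printf '%s\n' $1" ) >/dev/null 2>&1
}

if try_eval_pattern "[']*.c"; then
    echo "eval accepted the pattern"
else
    echo "eval rejected the pattern"   # the lone ' is an unterminated quote
fi
```

The same pattern is unproblematic when it is only expanded and matched, which is why eval is not a general substitute for patterns held in variables.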

(0004314)
kre (reporter)
2019-03-12 12:23
edited on: 2019-03-12 14:58

Re: Note: 0004310

I disagree actually - I'd keep 0000985 to discuss
just how case should be handled, and move all the discussions of how patterns
ought to be processed elsewhere (eg: here) as patterns affect much more than case
statements.

(0004317)
Konrad_Schwarz (reporter)
2019-03-12 14:29

Re: Note: 0004313

(I think) I understand.

But as desirable as it may be, there is not going to be a way for $var to
mean the same thing to the shell globber and to find -name, because when a backslash arrives at the shell globber, it is literal,
whereas it is an escape for find -name.

See also http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_13, 2.13.1 Patterns Matching a Single Character:

When pattern matching is used where shell quote removal is not performed (such as in the argument to the find -name primary when find is being called using one of the exec functions as defined in the System Interfaces volume of IEEE Std 1003.1-2001, or in the pattern argument to the fnmatch() function), special characters can be escaped to remove their special meaning by preceding them with a backslash character. This escaping backslash is discarded. The sequence "\\" represents one literal backslash. All of the requirements and effects of quoting on ordinary, shell special, and special pattern characters shall apply to escaping in this context.
(0004318)
kre (reporter)
2019-03-12 15:11

Re Note: 0004317

     "because when a backslash arrives at the shell globber, it is literal,
      whereas it is an escape for find -name."

That's exactly what the issue is. And what needs to change in order for
patterns in variables to work correctly. And here, "change" means "be
interpreted in a reasonable way", all of that is actually correct as it is.

Certainly a \ entered in a script (part of the shell input) is a quoting
character, and if it gets through to the pattern matching, it must have
been quoted (and as such be in a place where quote removal will be
performed, or at least would be, when used in a place where that happens).
That (quoted) \ which gets to the pattern matching code is certainly a
literal backslash, and simply matches itself.

But when we have
     var='\**.c'
(or similar) and then
     ls $var
(which becomes after parameter expansion)
     ls \**.c
that \**.c (as it is not the original text) is not subject to quote removal
(only quoting characters that were in the original script get removed by
quote removal) and is "When pattern matching is used where shell quote
removal is not performed" and the "special characters can be escaped"
(as in the first '*') "by preceding them with a backslash character."
which is exactly what has been done here.

This is another case where the standard as written is correct, and doesn't
really need any changes - we just need to actually believe what it says,
and interpret it correctly.
(0004319)
stephane (reporter)
2019-03-12 22:37
edited on: 2019-06-15 11:43

Re: Note: 0004306

> Lacking the ability to write a pattern in anything other than original
> shell code reliably is too big a loss to suffer - there must be a reliable
> way to access a variable pattern, reliably, and use it (fully functional).

"loss" implies something that you had and that you no longer have. Again, that "essential" feature you couldn't do without is hardly implemented by any shell, not even yours (at least not in the NetBSD 7.1.2 test VM I have here).

On the other hand, [*]* has been the way to match strings starting with * for almost 50 years (hence my "historical" reference to the Thompson shell, which the Mashey shell built upon which the Bourne shell built upon which the Korn shell built upon, a subset of which POSIX specifies as sh).

I avoid \ as a quoting operator whenever I can as \ is way too overloaded (as quoting operator (with several layers in shells with `...`), as escape sequence introducer, as line continuation, and again in most utilities called by the shell).

Rather than

    case $x in (\?*)

I prefer:

    case $x in ('?'*)

Should I argue that

    pattern="'?'*"
    case $x in ($pattern)

should match on strings that start with "?"?

How about

    pattern='$var*'

Should we ask for a second round of shell evaluation (which is what you're asking for \ hence Konrad's eval suggestion, why stop there?) on the content of a variable when that variable is used as a pattern (reminding me of the infamous wordexp(3))

Again, fnmatch() doing backslash handling is its own implementation of quoting as a poor man's substitute for shell quoting (globs came from the shell, fnmatch() was an effort to bring it to other commands). It's only doing \, not "..." nor '...', possibly for simplicity and/or to align with regexps.

Note that I'm not asking that POSIX prohibit that double backslash evaluation. It's too late, some shells are already doing it. Since bash, one can no longer do files='\x*'; ls -d $files and expect that to list the filenames that start with \x (not that anybody is likely to want to do that).

What I'm saying is: allow it if you want, but do not mandate it.

I'm not going to argue much longer, that's enough of a corner case, that I don't really care. If POSIX decides to mandate it, that p='\**'; echo $p should match files starting with *, OK (though I'll probably keep using the more portable [*]*), but at least leave p='\x*'; echo $p (that is where \ is not followed by a glob or \) unspecified so implementations can keep matching on '\x'* files and break fewer existing scripts.

(0004421)
geoffclare (manager)
2019-06-14 07:32

Re: Note: 0004319 you refer to "double backslash evaluation", implying that shell quoting turns \\ into \ and the \ can then be a pattern matching character. That is not the case. When shell quoting is applied to \\ the first \ turns the second one into a literal character.
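
A minimal sketch of the shell-quoting case described above, with the patterns written directly in the script so that only the quoting meaning of \ is involved. The helper name classify is invented for the example.

```shell
# classify: hypothetical helper for illustration only.
# The patterns are typed in the script, so each backslash is a shell
# quoting operator: \\ is a literal backslash, \* is a literal star.
classify() {
    case $1 in
        \\*) echo "starts with a backslash" ;;
        \**) echo "starts with a star" ;;
        *)   echo "something else" ;;
    esac
}

classify '\x'    # starts with a backslash
classify '*x'    # starts with a star
classify 'ax'    # something else
```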

So the two meanings of \ can never both be applied to the same pattern. I think we should make this clear by rearranging 2.13.1 to describe them one after the other, emphasising the different contexts:

When pattern matching is used where shell quoting affects the pattern, a <backslash> character shall escape the following character as described in [xref to 2.2.1] ...

When pattern matching is used where shell quoting does not affect the pattern (such as ...), a <backslash> character ...


and the "such as ..." part should be extended to include "word expansions when a pattern used in pathname expansion is not present in the original word but results from an earlier expansion".
(0004431)
stephane (reporter)
2019-06-18 17:37

Re: Note: 0004296
> I regard this:
>
> In most shells, beside quoting, backslash doesn't have that
> "second" meaning for shell globbing and "case" pattern matching.
>
> as an indication that most shells have had bugs in the matching code.

I feel like I have to reply to this here even if those points have already been made on the mailing list.

To me a bug is when the implementation doesn't behave as documented.

I've not managed to find any shell documentation that mentions \ as a wildcard quoting operator. AFAICT:

x='\x'
case $x in $x) echo yes; esac

is documented by all shells to be supposed to output "yes".

So ksh93, bash, dash and the sh of recent NetBSD have a bug (even if a documentation bug) in that they don't work as documented and don't output "yes" here.

You might argue that it's a minor bug in that people rarely use variables as case patterns and even if they do, rarely have backslashes in them.

bash 5 (which Geoff says exhibits the conforming behaviour) however has a major regression.

Its documentation says that:

x='\.'
printf '%s\n' $x

Should output \., but since version 5, it outputs . instead. I'm starting to suspect the change was not intentional (or at least that the consequences of the change were overlooked) as, despite being a major non-backward-compatible change, it's not documented in the release notes, not reverted with BASH_COMPAT=4.4 and still not documented in the manual (the only related note in the CHANGES file states "d. Reverted a change from April, 2018 that caused strings containing backslashes to be flagged as glob patterns.", suggesting the "bug" has not been correctly fixed/reverted).


It's much more serious because passing variable expansions (or other word expansions) unquoted is very common in practice.
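
A minimal sketch for checking how a given shell treats that unquoted expansion. The helper name probe_expansion_backslash is made up for illustration, the default IFS is assumed, and no particular result is claimed since it depends on the shell and version being run.

```shell
# probe_expansion_backslash: hypothetical helper, not part of any shell.
# Reports whether the backslash in an unquoted expansion with no
# * ? or [ survives ("kept") or is removed ("stripped").
probe_expansion_backslash() {
    x='\.'
    # $x is deliberately unquoted so that pathname expansion may apply
    set -- $x
    case $1 in
        '\.') echo kept ;;
        .)    echo stripped ;;
        *)    echo other ;;
    esac
}

probe_expansion_backslash
```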
(0004432)
joerg (reporter)
2019-06-20 08:06
edited on: 2019-06-20 08:09

In order to understand the problem and possible solutions, I need to
explain that there are at least two different methods to implement a
POSIX shell that affect the behavior in this context.

Method 1:

The parser reads multi-byte characters from stdin and converts them into
wide characters. The data structures created by the parser contain the
strings converted back into multi byte character strings and any quoted
character is prepended by a backslash. This results in a parser output
of:

        \a\b

that is e.g. created for the input

        'ab'

Double quotes are kept in the parser output and removed in the macro
expansion stage.

Method 2:

The parser reads multi-byte characters from stdin and converts them into
wide characters. The data structures created by the parser contain the
strings as wide character strings and any wide character that was quoted
has the top bit set as marker for a quoted character.

Method 1 is what has been implemented for the Bourne Shell Svr3 in the
mid 1980s. It saves space in the virtual memory of the shell but does
not allow distinguishing a backslash that has been typed in on the command
line from a backslash that has been created by the shell in order to
mark a quoted character in a string.

Method 1 is used by the Bourne Shell and its derivatives like ksh88
and ksh93.

Since the POSIX standard intentionally uses an abstract wording that
does not require either of these methods, it is sometimes hard to
understand what the POSIX standard text intends to say.

It should be obvious that any new wording in the POSIX standard needs
to be done in a way that does not make method 1 illegal.

(0004433)
geoffclare (manager)
2019-06-20 09:22

Re: Note: 0004432 where you say "the POSIX standard intentionally uses an abstract wording that does not require either of these methods" I'm afraid you are completely wrong. The standard clearly requires backslash to be special in all shell pattern matches, regardless of whether they are directly coded or result from earlier expansions. If method 1 cannot comply with this requirement (which seems to be what you are saying) then the standard does not allow method 1.
(0004436)
joerg (reporter)
2019-06-20 12:38

I am afraid you misunderstand things.

The intention of POSIX is not to introduce changes that make
existing historic UNIX versions non-compliant. Existing historic
UNIX versions use ksh88 that is based on method 1.

If you believe that the current wording of POSIX is not compatible
with method 1, then the current POSIX wording is wrong.

I have trouble understanding what you are currently interested in,
but maybe you can write an explanation that makes use of the staged
processing from method 1 to explain what you would like to achieve.
(0004437)
kre (reporter)
2019-06-20 12:59

Re Note: 0004432 and Note: 0004433

Please, neither of you should be getting concerned about that, the method
the shell uses internally to mark which characters were quoted in the
original has no bearing on any of this. If it does it correctly, it all
simply works, whatever method is used.

The issue here all relates to unquoted backslashes in unquoted variable
expansions, so what happens to anything quoted (which as best I can tell in
all shells is largely correct) is completely irrelevant.

But re Note: 0004436 - Joerg, 20 years (30 years?) later that cannot any
more be a tenable position. POSIX must document what the standard is for
shells - if that has altered in the intervening period, the standard needs
to keep up, it cannot be held back because some ancient implementation cannot
be bothered updating to match contemporary standards.
(0004438)
joerg (reporter)
2019-06-20 14:41
edited on: 2019-06-20 14:42

Shells must be constantly rewritten to fix minor bugs and this is OK.

I am however concerned if an apparently small textual change in the
standard could force shell authors to rewrite the whole shell. The
Bourne Shell is now 42 years old, it even started as a patch to the
Thompson Shell (so it really is older) and I cannot see even a single
year without changes, but all these changes have been small changes
that have been made in a way that allows testing for the bugs
they are likely to cause.

(0004439)
geoffclare (manager)
2019-06-20 14:46

Re: Note: 0004436 Although POSIX based the shell description on the behaviour of ksh88, it did not just specify the exact behaviour of that shell. There are a number of deliberate differences. You cannot argue that if the behaviour the standard requires is not the same as (historic) ksh88 then the standard is automatically wrong.

In this particular case, it is clear that the intention of the standard authors was for backslash escapes to work in all shell patterns. They probably did not realise that ksh88 did not do that when the pattern comes from an expansion, but if they had been told about this difference, I expect they would have kept the standard as-is because that is the more consistent behaviour.

In any case we have already discussed this question thoroughly when we resolved bug 1190 and decided to uphold the standard. I see no reason to change that decision now.
(0004442)
stephane (reporter)
2019-06-21 05:23

Re: Note: 0004439
> In this particular case, it is clear that the intention of the standard
> authors was for backslash escapes to work in all shell patterns.

That is simply not true. If you look at the standard's history, it's pretty obvious that it was never the standard's intention for the shell to implement an extra layer of backslash processing.

While newer versions could be seen as ambiguous about it, if you look at SUSv2 (http://pubs.opengroup.org/onlinepubs/7908799/xcu/chap2.html#tag_001_013_001) there was extra verbiage that made the intention of the standard clearer: that the \ was for fnmatch() (without FNM_NOESCAPE) / find / pax (since the specification of those point to that section when specifying the pattern they implement) as an ersatz of shell quoting there.

SUSv3 (https://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_13_01) removed some of that verbiage so it could be seen as more ambiguous, but if it had been a decision to start to force shells to implement that extra operator as well, you'd have thought *more* verbiage would have been added to make it clear as it would have been a deviation from *all* existing implementations (or at least of their documentation for some).

Again, *not a single shell* documents that extra backslash processing, not even those that do it partly, not even bash5 that does it fully (or at least fuller than other implementations). Even yash that was written to the standard doesn't implement it.

So, you are not "upholding the standard", you're upholding a misinterpretation of that standard (which admittedly was unclear), that goes against what *all* shells document. That's pure invention, which is breaking backward compatibility and for a feature that was never needed and is making the sh syntax even more arcane. The POSIX sh already has a portable syntax for patterns used as a result of word expansions to match literal wildcard operators ([?], [*], [[]), here we just need to specify that you also need [\\] to match a literal backslash to account for the few shells that have made unquoted \ special in wildcards as well.
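As an illustration of the bracket-expression syntax referred to above, here is a minimal sketch of matching literal wildcard characters through a pattern held in a variable. The helper name `matches` is hypothetical and used only for this example; it relies solely on `case` with an unquoted expansion, and deliberately avoids any backslash in the pattern because that is the disputed case.

```shell
#!/bin/sh
# matches STRING PATTERN
# Hypothetical helper: reports whether STRING matches PATTERN, where
# PATTERN reaches "case" through an unquoted expansion (an "indirect"
# pattern in the terminology of this issue).
matches() {
    case $1 in
        $2) echo yes ;;
        *)  echo no ;;
    esac
}

# Each wildcard is placed in its own bracket expression so that it is
# matched literally without relying on backslash handling.
pat='a[*]b[?]c'

matches 'a*b?c' "$pat"    # literal * and ? are matched
matches 'aXbYc' "$pat"    # X and Y are not accepted in their place
matches 'x[y'   'x[[]y'   # [[] matches a literal opening bracket
```

The backslash case ([\\] versus a bare \) is not exercised here, since its outcome is exactly what differs between the shells discussed in this thread.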

That change (assuming you want to make it explicit now) would break backward compatibility in *every* shell (except bash5 which has just broken its own backward compatibility already to implement it).

That seems to go against everything POSIX is about.
(0004443)
stephane (reporter)
2019-06-21 05:35

Re: Note: 0004439
> In any case we have already discussed this question thoroughly when we resolved bug 1190 and decided to uphold the standard.

This bug (0001234) is *explicitly* about refuting that claim in 0001190 that \ has a double meaning in globs, which it doesn't have in most shells, and when it has it's not documented. And again AFAICT it was never the standard's intention for \ to have a double meaning in globs.
(0004444)
geoffclare (manager)
2019-06-21 09:36

Re: Note: 0004442 I believe that the intention of the original authors was to specify \ in pattern matching as an ersatz of shell quoting everywhere that shell quoting did not provide that meaning of \. However, they got the wording slightly wrong.

They wrote "When pattern matching is used where shell quote removal is not performed", but to match the intent they should have written something like "When pattern matching is used where the meaning of <backslash> as a shell quoting character is not in effect". By referring to quote removal they made it apply to case statements but not to pathname expansions. I think that's an inconsistency that nobody would want.

However, when interpreting the current standard you don't need to rely on deducing this intention, since the other reference to backslash escaping in the first paragraph of 2.13.1 clearly applies to all shell patterns.

Where you refer to "*every* shell (except bash5 ...)", that's inaccurate because:

1. Robert Elz and Harald van Dijk have shells that behave like bash5.

2. You imply that bash4 behaves like ksh88, but it doesn't: it treats indirect backslashes as special in pathname expansions, but it has a bug where it only does so when it needs to read the directory due to the presence of unescaped *, ? or [...]:
$ echo $BASH_VERSION
4.4.12(1)-release
$ ls
*.c  \*.c  \a.c
$ var='\**.c'; printf '%s\n' $var
*.c

(ksh lists \*.c and \a.c here).

So bash4 almost conforms to the POSIX requirement, it just doesn't handle situations properly where it doesn't need to read the directory to look for matches. (I assume bash5 fixes this by doing an lstat() in those situations.)
(0004445)
stephane (reporter)
2019-06-21 18:48
edited on: 2019-06-22 05:18

Re: Note: 0004444
> Re: http://austingroupbugs.net/view.php?id=1234#c4442 I believe that the
> intention of the original authors was
> to specify \ in pattern matching as an ersatz of shell quoting everywhere
> that shell quoting did not provide that meaning of \. However, they got the
> wording slightly wrong.
[...]

That's one of several possible interpretations of a text that was probably not meant to cover that. It seems what's most likely is that the standard authors overlooked that completely, they did not consider that it could apply to the case of a pattern used in an unquoted word expansion (in "case" or globs). If they did, they would have made it explicit. Most likely they would have described the behaviour of the implementations of the time which didn't do it.

The best would have been to cover only shell pattern matching in that section (mentioning that shell quoting, '...', "..." and backslash remove the special meaning of wildcard characters), fnmatch()/glob() to refer to shell pattern matching, and documenting there (as it does already) that \ without FNM_NOESCAPE is used as an ersatz of shell quoting, and find/pax/ex... to point to fnmatch().

In the unlikely event that at the time they would have chosen to deviate from every shell implementation and enforce the behaviour you're promoting now (breaking backward compatibility in the process), I'd expect they would have done so *very* explicitly (if only to make sure the implementors were aware that they needed to change their implementation), but not before having sought David Korn's opinion as they invariably did at the time. That obviously didn't happen since ksh93 still doesn't do \ processing in globs.

The decision time is now, not then, now that that issue has been brought up. And it's between specifying existing implementations (except bash5 to keep backward compatibility) or forcing shells to implement a feature that is not useful and that only one shell has only recently started to implement and has already been proven to break existing scripts.
 
> Where you refer to "*every* shell (except bash5 ...)", that's inaccurate
> because:
>
> 1. Robert Elz and Harald van Dijk have shells that behave like bash5.

None of which are released yet. AFAICT, Robert Elz has not fully made up his mind. As stated in my earlier email which it seems you haven't read yet (https://www.mail-archive.com/austin-group-l%40opengroup.org/msg04136.html), the current version of the NetBSD shell behaves mostly like bash4 but with an unfortunate (though minor) difference. So they can always undo that mistake.

Same thing for bash where the bash5 behaviour can mostly be seen as a bug (regression) which needs to be fixed as it's mostly undocumented and deviates from every other shell (as I already said earlier). It's also not been deployed widely yet, I can't imagine anyone has made use of that new "feature" yet.

> 2. You imply that bash4 behaves like ksh88, but it doesn't

I imply no such thing, I've clearly described it including the chicken and egg conundrum that it shares with kre's shell in that message already mentioned above and a few more.

> it treats
> indirect backslashes as special in pathname expansions, but it has a bug
> where it only does so when it needs to read the directory due to the
> presence of unescaped *, ? or [...]:
[...]
> So bash4 almost conforms to the POSIX requirement, it just doesn't handle

Yes, I have no issue with the bash4 behaviour as long as we don't mandate it. It's not pretty, but it makes sure that in most cases, \ is not interpreted unless it's needed to escape a glob operator. It's not as good as zsh's which only treats \ specially in front of glob operators (though has a few bugs which I also mentioned in a separate message) or ksh93/dash which don't treat \ specially in pathname expansion at all, but at least it's mostly compatible with Bourne/ksh88 most of the time.

Not only would I not mandate the bash5 behaviour, but I would go as far as prohibiting it, as allowing it would mean we would have to leave a lot of behaviour unspecified when a word expansion is unquoted and contains a backslash. For instance, without bash5, though we can't say what x='.\*'; grep $x will do (3 different possible behaviours), x='\.'; grep $x is portable and consistent across all shells. Same for as_echo='printf %s\n'; $as_echo test

There's another aspect which I haven't mentioned yet (I'll develop more on that later) where the bash5 behaviour is making things worse when character sets like BIG5, GB18030 that have characters that contain the encoding of backslash are involved.

(0004446)
stephane (reporter)
2019-06-22 05:51
edited on: 2019-06-22 06:12

Re: Note: 0004444

> (I assume bash5 fixes this by doing an lstat() in those situations.)

No, if it were doing that, that would be wrong.

Globbing does not report the files that are accessible, but the entries that are found in a directory, like ls. For instance, it doesn't need search permission to a directory to expand files in it. It does need read permission though, which lstat() doesn't as long as there's search permission.

So, here in

a='\x' bash5 -c 'echo dir/$a'

Like in

a='[x]' anyshell -c 'echo dir/$a'

bash5 needs to read the content of dir (updating its access time by doing so!) to find a file called "x", doing an lstat("x") would give the wrong result if dir was readable but not searchable and contained a "x" file (or any other reason lstat("x") would fail; it would also give the wrong result if the file existed and was lstat()able but could not be found in the directory list because that directory is not readable or it's one of those .zfs/./.. type of file that is omitted from directory listings).

And that's another side effect of the new change of interface in bash5

Here where the cwd is a largeish directory (6000 files) on a FS on rotational drives (of course it could be a lot worse on NFS), dropping kernel caches between each command:

$ time a='\\\\' bash5 -c 'echo $a'
\\\\
a='\\\\' bash5 -c 'echo $a' 0.00s user 0.02s system 4% cpu 0.435 total
$ time a='\\\\' bash4 -c 'echo $a'
\\\\
a='\\\\' bash4 -c 'echo $a' 0.00s user 0.00s system 11% cpu 0.045 total
$ time a='\\\\\' bash5 -c 'echo $a'
\\\\\
a='\\\\\' bash5 -c 'echo $a' 0.00s user 0.01s system 18% cpu 0.044 total

(with 5 backslashes, bash5 doesn't do globbing because of the last \ with nothing following it).

(0004447)
stephane (reporter)
2019-06-23 05:14
edited on: 2019-06-23 05:32

Re: Note: 0004446 (replying to myself)

That's not quite right. If the shell is meant to do that extra layer of backslash escaping, as if calling glob() without GLOB_NOESCAPE on the content of the variable, then \ should not be treated as a "pattern character", and the shell should do a lstat() as Geoff says (as opposed to finding matching files in the content of the directory (even though the presence of \ alone is otherwise enough to trigger globbing)).

As noted on the mailing list at https://www.mail-archive.com/austin-group-l%40opengroup.org/msg04264.html bash seems to do it "right" for all but the last path component of a wildcard, which is inconsistent in addition to not conforming to your interpretation of the standard:

$ mkdir -m a=x searchable
$ p='searchable/\.' bash5 -c 'printf "%s\n" $p'
searchable/\.
$ p='searchable/\./.' bash5 -c 'printf "%s\n" $p'
searchable/./.
$ mkdir -m a=r readable
$ p='readable/\.' bash5 -c 'printf "%s\n" $p'
readable/.
$ p='readable/\./.' bash5 -c 'printf "%s\n" $p'
readable/\./.


And it does it as well for a quoted "." (which is also a change from bash4)!

$ bash5 -c 'printf "%s\n" */\.'
readable/.
$ bash5 -c 'printf "%s\n" */"."'
readable/.
$ bash5 -c 'printf "%s\n" */.'
searchable/.

$ bash4 -c 'printf "%s\n" */\.'
searchable/.


(0004448)
joerg (reporter)
2019-06-24 10:23
edited on: 2019-06-24 10:29

Re: Note: 0004445

My understanding of the standard is that it standardizes existing behavior of
UNIX platforms.

In case and only in case that there is a definite bug in that UNIX behavior,
POSIX may explain a different behavior, but only in case that there is a
detailed explanation on why the historical UNIX behavior is to be seen
as wrong and how exactly the named bug should be prevented.

My impression is that this is not the case for our bug.

(0004449)
geoffclare (manager)
2019-06-24 11:06
edited on: 2019-06-24 11:06

Re: Note: 0004448 "the standard [...] standardizes existing behavior of UNIX platforms". That was one of the goals of POSIX.2-1992 but another major goal was consistency. There are numerous examples where the 1992 standard required existing UNIX implementations to change in order to conform, in the name of consistency. Perhaps the biggest was the Utility Syntax Guidelines, but there was also clearly a goal of rationalising and bringing consistency across REs and shell pattern matching. I'm sure that if this issue had come to light during the development of POSIX.2-1992 the developers would have decided that improving consistency in the handling of backslash across shell pattern matching (direct and indirect), find -name, pax pattern operands, and fnmatch() was the right thing to do.

(0004564)
geoffclare (manager)
2019-09-23 15:39
edited on: 2019-09-30 16:04

Interpretation response
------------------------
1. The standard clearly states in XCU 2.13.1 that backslash has an escaping role in shell patterns that is distinct from its role as a quoting character, and conforming implementations must conform to this.

2. The standard states in XCU 2.13.3 that patterns in pathname expansion are matched against existing files regardless of the pattern contents, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
1. Although existing practice in some shells is not to treat backslash as special in situations where shell quoting does not affect the pattern (such as in word expansions when a pattern used in pathname expansion is "indirect", i.e. not present in the original word but resulting from an earlier expansion), relaxing the standard to allow this behavior would be undesirable, as it would mean that the only way to match a literal '?', '*' or '[' would be to put them in a bracket expression, unlike all other contexts where these characters are special and they can be escaped with backslash. Application writers should be able to use an unquoted unescaped backslash that is not inside a bracket expression in a pattern and have it interpreted the same way across the shell (in all contexts), find, pax, fnmatch() and glob(). This was the aim of the original POSIX.2-1992 developers in having all of those parts of the standard, where they talk about pattern matching, reference what is now XCU 2.13. It is unfortunate that the issue of patterns in shell variables did not come to light earlier, thus allowing the current discrepancy in some shells to persist for several years instead of being corrected long ago. However, the goal of consistency across all uses of pattern matching is still as worthwhile now as it was in 1992.

2. Existing practice in most shells that do treat backslash as special in "indirect" patterns in pathname expansions is only to match patterns against existing pathnames if the pattern includes a '*', '?' or '[' that is treated as special. This prevents accidental removal of backslash characters in variable expansions where generating a list of matching files is not intended and a (usually oddly named) file with a matching name happens to exist.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------

On page 2382 line 76210 section 2.13.1, change:
The following patterns matching a single character shall match a single character: ordinary characters, special pattern characters, and pattern bracket expressions. The pattern bracket expression also shall match a single collating element. A <backslash> character shall escape the following character. The escaping <backslash> shall be discarded. If a pattern ends with an unescaped <backslash>, it is unspecified whether the pattern does not match anything or the pattern is treated as invalid.
to:
The following patterns shall match a single character: ordinary characters, special pattern characters, and pattern bracket expressions. The pattern bracket expression also shall match a single collating element.

In a pattern, or part of one, where a shell-quoting <backslash> can be used, a <backslash> character shall escape the following character as described in [xref to 2.2.1], regardless of whether or not the <backslash> is inside a bracket expression. (The sequence "\\" represents one literal <backslash>.)

In a pattern, or part of one, where a shell-quoting <backslash> cannot be used to preserve the literal value of a character that would otherwise be treated as special:
  • A <backslash> character that is not inside a bracket expression shall preserve the literal value of the following character, unless the following character is in a part of the pattern where shell quoting can be used and is a shell quoting character, in which case the behavior is unspecified.

  • For the shell only, it is unspecified whether or not a <backslash> character inside a bracket expression preserves the literal value of the following character.

All of the requirements and effects of quoting on ordinary, shell special, and special pattern characters shall apply to escaping in this context, except where specified otherwise. (Situations where this applies include word expansions when a pattern used in pathname expansion is not present in the original word but results from an earlier expansion, or the argument to the find -name or -path primary as passed to find, or the pattern argument to the fnmatch() and glob() functions when FNM_NOESCAPE or GLOB_NOESCAPE is not set in flags respectively.)

If a pattern ends with an unescaped <backslash>, the behavior is unspecified.
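One of the situations listed above, the argument to find -name, can be tried out directly. The following is a minimal sketch, assuming a `mktemp -d` utility is available (it is not required by POSIX) and using a hypothetical function name `demo_find`. The single quotes keep the shell from touching the pattern, so find itself receives the backslash and treats it as escaping the `*`.

```shell
#!/bin/sh
# demo_find: hypothetical demonstration function, run in a subshell so
# that its variables do not leak into the caller.
demo_find() (
    dir=$(mktemp -d) || exit 1      # assumption: mktemp -d exists
    : > "$dir/*.c"                  # a file literally named *.c
    : > "$dir/a.c"

    # find receives \*.c: the backslash escapes the asterisk, so only
    # the file literally named *.c is selected.
    escaped=$(find "$dir" -name '\*.c' | wc -l)

    # find receives *.c: the asterisk is a wildcard, so both files match.
    plain=$(find "$dir" -name '*.c' | wc -l)

    rm -rf "$dir"
    # Arithmetic expansion strips any padding that wc may emit.
    echo "escaped=$((escaped)) plain=$((plain))"
)

demo_find
```

The same pattern strings passed to fnmatch() without FNM_NOESCAPE would be expected to behave the same way, since find -name is specified in terms of this pattern matching notation.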

On page 2382 line 76216 section 2.13.1 change:
An ordinary character is a pattern that shall match itself. It can be any character in the supported character set except for NUL, those special shell characters in [xref to 2.2] that require quoting, and the following three special pattern characters. Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character. If any character (ordinary, shell special, or pattern special) is quoted, that pattern shall match the character itself. The shell special characters always require quoting.

When unquoted and outside a bracket expression, ...
to:
An ordinary character is a pattern that shall match itself. In a pattern, or part of one, where a shell-quoting <backslash> can be used, an ordinary character can be any character in the supported character set except for NUL, those special shell characters in [xref to 2.2] that require quoting, and the three special pattern characters described below. In a pattern, or part of one, where a shell-quoting <backslash> cannot be used to preserve the literal value of a character that would otherwise be treated as special, an ordinary character can be any character in the supported character set except for NUL and the three special pattern characters described below. Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character. If any character (ordinary, shell special, or pattern special) is quoted, or escaped with a <backslash>, that pattern shall match the character itself. The application shall ensure that it quotes or escapes any character that would otherwise be treated as special, in order for it to be matched as an ordinary character.

When unquoted, unescaped, and not inside a bracket expression, ...

On page 2383 line 76232 section 2.13.1, delete:
When pattern matching is used where shell quote removal is not performed (such as in the argument to the find -name primary when find is being called using one of the exec functions as defined in the System Interfaces volume of POSIX.1-2017, or in the pattern argument to the fnmatch() function), special characters can be escaped to remove their special meaning by preceding them with a <backslash> character. This escaping <backslash> is discarded. The sequence "\\" represents one literal <backslash>. All of the requirements and effects of quoting on ordinary, shell special, and special pattern characters shall apply to escaping in this context.

On page 2384 line 76271 section 2.13.3, change:
3. Specified patterns shall be matched against existing filenames and pathnames, as appropriate. Each component that contains a pattern character shall require read permission in the directory containing that component. Any component, except the last, that does not contain a pattern character shall require search permission.
to:
3. If a specified pattern contains any '*', '?' or '[' characters that will be treated as special (see [xref to 2.13.1]), it shall be matched against existing filenames and pathnames, as appropriate. Each component that contains any such characters shall require read permission in the directory containing that component. Each component that contains a <backslash> that will be treated as special may require read permission in the directory containing that component. Any component, except the last, that does not contain any '*', '?', or '[' characters that will be treated as special shall require search permission.

On page 2384 line 76288 section 2.13.3, change:
it is unspecified whether other unquoted pattern matching characters within the same slash-delimited component
to:
it is unspecified whether other unquoted '*', '?', '[' or <backslash> characters within the same slash-delimited component

On page 2384 line 76295 section 2.13.3, add:
4. If a specified pattern does not contain any '*', '?' or '[' characters that will be treated as special, the pattern string shall be left unchanged.

On page 3748 line 128686 section C.2.13.1, change:
Calling a utility or function without going through a shell, as described for find and the fnmatch() function defined in the System Interfaces volume of POSIX.1-2017.
to:
Calling a utility or function without going through a shell, as described for find and the fnmatch() and glob() functions defined in the System Interfaces volume of POSIX.1-2017, or pattern matching in the shell in situations where the pattern is specified indirectly instead of directly to the shell, such as <tt>ls -ld -- $pattern</tt> or <tt>case $var in ($pattern) ...</tt>.

On page 3748 line 128696 section C.2.13.1 change:
pax -r ... "*a\(\?"

to:
pax -r ... "*a(\?"

On page 3748 line 128697 section C.2.13.1, add these new paragraphs after the numbered list:
The wording "In a pattern, or part of one, where a shell-quoting <backslash> cannot be used to preserve the literal value of a character that would otherwise be treated as special" has been carefully crafted so that for the shell it only applies to certain contexts. In particular:
  • The use of "or part of one" is needed because a single pattern can be produced partly from characters directly included in a word and partly from characters that result from one or more of the word expansions. For example, in the following command the <backslash> escapes the '?' character:
    dir='abc\?'
    ls -l -- $dir/*.c
    

  • The reference to "a shell-quoting <backslash>" rather than just using "where shell quoting cannot be used" is because there are ways that other types of shell quoting can be used where a shell-quoting <backslash> cannot, such as placing an expansion within double-quotes as in this example:
    dir='abc?'
    ls -l -- "$dir"/*.c
    

  • The use of "that would otherwise be treated as special" is needed because otherwise the condition would apply to <backslash> in single-quotes. For example, in the following command the <backslash> is not treated as escaping the '?' because the '?' would not be treated as special anyway:
    ls -l 'abc\?'/*.c
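The double-quote example in the list above can be checked with a small script. This is a minimal sketch, assuming `mktemp -d` is available and using a hypothetical function name `demo_quoted_dir`; the directory and file names are made up for the demonstration. It counts the pathnames produced with and without double-quotes around the expansion.

```shell
#!/bin/sh
# demo_quoted_dir: hypothetical demonstration function, run in a
# subshell so the directory change and positional parameters stay local.
demo_quoted_dir() (
    top=$(mktemp -d) || exit 1      # assumption: mktemp -d exists
    cd "$top" || exit 1
    mkdir 'abc?' abcd               # one name contains a literal ?
    : > 'abc?/one.c'
    : > abcd/two.c

    dir='abc?'

    # Quoted expansion: the ? is literal, only abc?/one.c is produced.
    set -- "$dir"/*.c
    quoted=$#

    # Unquoted expansion: the ? is a wildcard, so both directories match.
    set -- $dir/*.c
    unquoted=$#

    cd / && rm -rf "$top"
    echo "quoted=$quoted unquoted=$unquoted"
)

demo_quoted_dir
```

The dir='abc\?' variant from the first bullet is intentionally left out of this sketch, because its result is the behavior on which the shells discussed in this issue differ.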
    

In patterns specified indirectly to the shell, it is unspecified whether or not <backslash> is special inside bracket expressions. This is because there are two mutually exclusive consistency aims and neither is considered more important than the other. One is consistency with direct patterns, where <backslash> is special inside bracket expressions (which is, in turn, for consistency with the way single-quotes and double-quotes preserve the literal value of characters inside bracket expressions); the other is consistency with regular expressions, find, pax, fnmatch(), and glob(), where <backslash> is not special inside bracket expressions (not counting the extra C-string escaping in EREs in awk).

Earlier versions of this standard allowed two behaviors when a pattern ends with an unescaped <backslash>: it could match nothing or be treated as an invalid pattern. However, a third behavior has since been observed, where the ending <backslash> is treated as a literal <backslash>, and therefore this standard now simply states that the behavior is unspecified.

On page 3748 line 128698 section C.2.13.1 change:
Conforming applications are required to quote or escape the shell special characters (sometimes called metacharacters). If used without this protection, syntax errors can result or implementation extensions can be triggered. For example, the KornShell supports a series of extensions based on parentheses in patterns.
to:
Earlier versions of this standard included the statement "The shell special characters always require quoting" in [xref to XCU 2.13.1]. It is unclear what was intended by this, since there are pattern matching contexts in which it is not possible to quote those characters, such as:
execlp("find", "find", ".", "-name", "*[()]*", (char *)0);

where the parentheses cannot be escaped with a <backslash> because <backslash> is not special in bracket expressions in that context. The statement is thought to have been a warning to application writers and interactive shell users that shell special characters (sometimes called metacharacters) always need quoting in patterns that appear directly in shell code; for example, this code:
case $char in
[()]) ... ;;
esac
is incorrect because the parentheses are parsed as operators - they need to be quoted in order to be treated as part of the pattern. This standard now simply requires instead that applications quote or escape any character that would otherwise be treated as special, in order for it to be matched as an ordinary character. If shell special characters are used without this protection in contexts where they are treated as special, syntax errors can result or implementation extensions can be triggered. Some shells support a series of extensions based on parentheses in patterns that are valid extensions in these contexts because they would otherwise cause syntax errors. However, this means that they are not allowed by this standard to be recognized in contexts where those syntax errors would not occur anyway, such as in:
pattern='a*(b)'; ls -- $pattern
which this standard requires to list files with names beginning 'a' and ending "(b)". It is recommended that implementations do not extend pattern matching in the shell in ways that are only valid extensions because they would otherwise be syntax errors, in order to avoid inconsistency between different pattern matching contexts. One way to provide an extension that is consistent between different pattern matching contexts in the shell (although still not consistent with find -name, fnmatch(), etc.) is to enable the extension only when a non-standard shell option is set, or when the shell is executed using a command name other than sh. Consistency with non-shell contexts can then be achieved by enabling equivalent extensions in those other contexts by use of non-standard utility options or non-standard FNM_* and GLOB_* flags.
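For the case statement discussed in this rationale, one way to write the pattern so that the parentheses are part of it is to quote them inside the bracket expression. This is a minimal sketch with a hypothetical function name `classify`; backslash-escaping each parenthesis would be another way of quoting them.

```shell
#!/bin/sh
# classify CHAR
# Hypothetical helper: the parentheses are single-quoted inside the
# bracket expression, so the shell parses them as pattern characters
# rather than as operators.
classify() {
    case $1 in
        ['()']) echo paren ;;
        *)      echo other ;;
    esac
}

classify '('
classify ')'
classify x
```

Here the pattern appears directly in shell code, so ordinary shell quoting applies and no pattern-level backslash escaping is involved.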


On page 3749 line 128725 section C.2.13.3, add a new paragraph:
Patterns are matched against existing filenames and pathnames only when the pattern contains a '*', '?' or '[' character that will be treated as special. This prevents accidental removal of backslash characters in variable expansions where generating a list of matching files is not intended and a (usually oddly named) file with a matching name happens to exist. For example, a shell script that tries to be portable to systems that predate the introduction of functions and printf might use this on POSIX systems:
myecho='printf %s\n'

to be used as:
$myecho args...
If <tt>%s\n</tt> were to be matched against existing files, this would not work if a file called <tt>%sn</tt> happened to exist.


(0004571)
geoffclare (manager)
2019-09-26 16:20

As agreed in the Sep 26th teleconference, Note: 0004564 has been updated to make the following changes:

Added "unless the following character is in a part of the pattern where shell quoting can be used and is a shell quoting character, in which case the behavior is unspecified" to the line 76210 change.

Added the part about permissions to the line 76271 change.
(0004576)
geoffclare (manager)
2019-09-30 16:05

As agreed in the Sep 30th teleconference, Note: 0004564 has been updated to add the page 2384 line 76288 section 2.13.3 change.
(0004604)
agadmin (administrator)
2019-10-07 15:16

Interpretation proposed: 7 October 2019
(0004640)
agadmin (administrator)
2019-11-11 12:16

Interpretation Approved: 11 Nov 2019

- Issue History
Date Modified Username Field Change
2019-03-08 23:58 stephane New Issue
2019-03-08 23:58 stephane Name => Stephane Chazelas
2019-03-08 23:58 stephane Section => 2.13.1
2019-03-08 23:58 stephane Page Number => 2382
2019-03-08 23:58 stephane Line Number => 76212-76215
2019-03-11 02:42 kre Note Added: 0004296
2019-03-11 02:58 kre Note Edited: 0004296
2019-03-11 03:07 kre Note Edited: 0004296
2019-03-11 03:10 kre Note Added: 0004297
2019-03-11 03:16 kre Note Edited: 0004296
2019-03-11 03:23 kre Note Added: 0004298
2019-03-11 07:24 stephane Note Added: 0004300
2019-03-11 07:31 stephane Note Added: 0004301
2019-03-11 09:41 geoffclare Note Added: 0004303
2019-03-12 03:24 kre Note Added: 0004306
2019-03-12 03:37 kre Note Added: 0004307
2019-03-12 03:38 kre Note Deleted: 0004297
2019-03-12 09:13 Konrad_Schwarz Note Added: 0004309
2019-03-12 09:58 geoffclare Note Added: 0004310
2019-03-12 12:13 Don Cragun Note Edited: 0004306
2019-03-12 12:14 Don Cragun Note Edited: 0004306
2019-03-12 12:15 kre Note Added: 0004313
2019-03-12 12:16 Don Cragun Note Edited: 0004307
2019-03-12 12:23 kre Note Added: 0004314
2019-03-12 12:24 kre Note Edited: 0004314
2019-03-12 14:29 Konrad_Schwarz Note Added: 0004317
2019-03-12 14:56 kre Note Edited: 0004313
2019-03-12 14:58 kre Note Edited: 0004314
2019-03-12 15:11 kre Note Added: 0004318
2019-03-12 22:37 stephane Note Added: 0004319
2019-03-14 15:58 Don Cragun Relationship added related to 0001190
2019-06-14 07:32 geoffclare Note Added: 0004421
2019-06-15 11:42 stephane Note Edited: 0004300
2019-06-15 11:43 stephane Note Edited: 0004319
2019-06-18 17:37 stephane Note Added: 0004431
2019-06-20 08:06 joerg Note Added: 0004432
2019-06-20 08:08 joerg Note Edited: 0004432
2019-06-20 08:09 joerg Note Edited: 0004432
2019-06-20 09:22 geoffclare Note Added: 0004433
2019-06-20 12:38 joerg Note Added: 0004436
2019-06-20 12:59 kre Note Added: 0004437
2019-06-20 14:41 joerg Note Added: 0004438
2019-06-20 14:42 joerg Note Edited: 0004438
2019-06-20 14:46 geoffclare Note Added: 0004439
2019-06-21 05:23 stephane Note Added: 0004442
2019-06-21 05:35 stephane Note Added: 0004443
2019-06-21 09:36 geoffclare Note Added: 0004444
2019-06-21 18:48 stephane Note Added: 0004445
2019-06-22 05:18 stephane Note Edited: 0004445
2019-06-22 05:51 stephane Note Added: 0004446
2019-06-22 05:56 stephane Note Edited: 0004446
2019-06-22 06:12 stephane Note Edited: 0004446
2019-06-23 05:14 stephane Note Added: 0004447
2019-06-23 05:32 stephane Note Edited: 0004447
2019-06-24 10:23 joerg Note Added: 0004448
2019-06-24 10:29 joerg Note Edited: 0004448
2019-06-24 11:06 geoffclare Note Added: 0004449
2019-06-24 11:06 geoffclare Note Edited: 0004449
2019-09-23 15:27 geoffclare Relationship added related to 0000247
2019-09-23 15:39 geoffclare Note Added: 0004564
2019-09-23 15:42 geoffclare Interp Status => ---
2019-09-23 15:42 geoffclare Final Accepted Text => Note: 0004564
2019-09-23 15:42 geoffclare Status New => Interpretation Required
2019-09-23 15:42 geoffclare Resolution Open => Accepted As Marked
2019-09-23 15:43 geoffclare Interp Status --- => Pending
2019-09-23 15:43 geoffclare Tag Attached: tc3-2008
2019-09-23 15:44 Don Cragun Relationship added related to 0000985
2019-09-26 16:19 geoffclare Note Edited: 0004564
2019-09-26 16:20 geoffclare Note Added: 0004571
2019-09-30 16:04 geoffclare Note Edited: 0004564
2019-09-30 16:05 geoffclare Note Added: 0004576
2019-10-02 21:45 Don Cragun Relationship added parent of 0001295
2019-10-07 15:16 agadmin Interp Status Pending => Proposed
2019-10-07 15:16 agadmin Note Added: 0004604
2019-11-11 12:16 agadmin Interp Status Proposed => Approved
2019-11-11 12:16 agadmin Note Added: 0004640
2019-12-12 10:45 geoffclare Status Interpretation Required => Applied

