Austin Group Defect Tracker

ID 0000247
Category [1003.1(2008)/Issue 7] Shell and Utilities
Severity Objection
Type Enhancement Request
Date Submitted 2010-04-29 22:50
Last Update 2019-09-24 20:58
Reporter dwheeler
View Status public
Assigned To ajosey
Priority normal
Resolution Open
Status Under Review
Name David A. Wheeler
Organization
User Reference
Section set,glob
Page Number 256,1088,2333-2334,2357-2359
Line Number 8395,36273,73812-73813,74488-74605
Interp Status ---
Final Accepted Text
Summary 0000247: Add nullglob (null globbing) support to shell's "set" and glob()
Description Even though filename processing is a very common operation, it is surprisingly difficult to do correctly, as described here:
 http://www.dwheeler.com/essays/filenames-in-shell.html
 http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

One annoyance is that if a glob pattern is not matched, the failed pattern is returned. This is explicitly required in section 2.13.3 ("Patterns Used for Filename Expansion"), starting at line 73812: "If the pattern does not match any existing filenames or pathnames, the pattern string shall be left unchanged".

This means that many common shell constructs are incorrect, because they fail when the pattern does not match anything. For example, this is usually wrong, because there is no guarantee that a directory will have a .txt file:
 for file in ./*.txt ; do
  COMMAND "$file"  # This may try to process the file named "*.txt" if no match
 done
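
To make the failure mode concrete, here is a minimal sketch that counts how often such a loop body runs. The helper name count_loop_runs and the temporary directory are illustrative assumptions, not part of the report; the sketch assumes a POSIX-style sh (or bash) with default options.

```shell
# Hypothetical helper: count how many times the loop body runs for DIR/*.txt.
count_loop_runs() {
    n=0
    for file in "$1"/*.txt; do
        n=$((n + 1))
    done
    echo "$n"
}

empty_dir=$(mktemp -d)
# With default options the unmatched pattern is passed through as one word,
# so the body runs once even though the directory is empty: this prints 1.
count_loop_runs "$empty_dir"
rm -rf "$empty_dir"
```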


One solution is to include a check inside the loop, but this is complicated and inefficient, and thus is often not done. There are also pathological cases where the pattern failed but a filename *with* the pattern exists, which causes the wrong thing to be done. In short, while it's possible to do this, people don't do it:
 for file in ./* ; do        # Use "./*", NOT bare "*", to avoid "-filenames".
   if [ -e "$file" ] ; then  # Make sure it isn't an empty match
     COMMAND ... "$file" ...
   fi
 done


It would be far better if the shell could automatically do the "right" thing, that is, return an empty set if a metacharacter is included *and* there is no matching result. The "glob.h" header file (page 256) includes an option GLOB_NOCHECK that is close to what is desired, though not quite.
Shell null globbing returns an empty result if there is no match *and* there was at least one metacharacter; it returns the file unchanged if there is NO metacharacter. The glob() routine's GLOB_NOCHECK returns empty even if there were NO metacharacters, and is thus subtly different. The current glob() routine is sufficient to implement the shell proposal, but it would be useful to add an additional option to glob() so that its callers can also do null globbing, so that is added as well.
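
The shell-level rule described here (empty result for an unmatched pattern that contains a metacharacter, unchanged word otherwise) can be approximated in a shell without any nullglob option. The following is only a sketch under that reading; nullglob_expand is a hypothetical helper name, and it treats "*", "?" and "[" as the metacharacters.

```shell
# Hypothetical helper: print each match of PATTERN on its own line.
# Prints nothing if PATTERN contains a pattern character and matches no file;
# prints PATTERN itself if it contains no pattern character.
# The function body is a subshell, so IFS and the positional parameters of
# the caller are left untouched.
nullglob_expand() (
    pattern=$1
    IFS=                 # no field splitting; pathname expansion still applies
    set -- $pattern      # deliberately unquoted so the shell expands the pattern
    if [ "$#" -eq 1 ] && [ "$1" = "$pattern" ] && [ ! -e "$1" ] && [ ! -L "$1" ]; then
        # Nothing matched and the shell handed the pattern back unchanged.
        case $pattern in
            *\**|*\?*|*\[*) exit 0 ;;   # has a pattern character: expand to nothing
        esac
    fi
    [ "$#" -eq 0 ] || printf '%s\n' "$@"
)
```

The output is newline-separated, so filenames containing newlines are not handled; the sketch only illustrates the rule and is not a replacement for a real option.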

One possible objection is that this does not always handle failed matches when the command does something different without any files. For example, "cat" will read from a list of filenames, but will read from stdin when no filenames are listed, so "cat ./*.txt" will still do the wrong thing when no filenames are present. This is true, but "cat ./*.txt" already does the wrong thing (it will try to open a nonexistent file "*.txt" if the match fails), so using this option doesn't make it fundamentally any *worse*, while making "for" loops far more useful. Commands that don't switch to "read from stdin when no files" are also better off. Finally, since it is an option, people can enable it or disable it as they choose.

Many shells have a way of doing this, but there is no *standard* way to do it. Doing this in a shell is often called "null globbing".
Null globbing fixes this by replacing an unmatched pattern with nothing at all. In bash you can enable nullglob with "shopt -s nullglob". In zsh, you can use "setopt NULL_GLOB" for the same result. Then, "for" loops on glob patterns will work correctly if nothing matches the glob pattern.
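
As an illustration of the bash behaviour mentioned above, the sketch below runs a "for" loop in a child bash started with the nullglob option on. It assumes bash is installed; count_with_nullglob is a hypothetical helper name.

```shell
# Assumption: bash is installed. "-O nullglob" turns the shopt option on for
# the child shell only, so the calling shell's settings are untouched.
count_with_nullglob() {
    bash -O nullglob -c '
        n=0
        for file in "$1"/*.txt; do
            n=$((n + 1))
        done
        echo "$n"
    ' _ "$1"
}

empty_dir=$(mktemp -d)
count_with_nullglob "$empty_dir"   # the loop body never runs, so this prints 0
rm -rf "$empty_dir"
```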

There are many possible short and long option names; the very problem right now is that there is no standardization of the name! This proposal suggests set -N for the short option to "set", and "nullglob" as the long name in "set -o". I searched and found that these did not interfere with existing options in bash 4.0.23, dash version 0.5.5.1, ksh version 93t+, or zsh 4.3.9. Obviously, other option names are possible; the key is to standardize it.

It would be nice to use "set -G" as the option for nullglob, since zsh already supports null globbing with this very name, and bash does not have an interfering use for it. Unfortunately, ksh uses "set -G" to expand "**" into a recursive descent of files, so "set -G" should *not* be used as it would impede adoption elsewhere.

It might be nice to modify wordexp() (e.g., page 461), too. This proposal doesn't do that, but that would be an obvious next step.

This proposal proposes one approach to modifying glob() to support this as well - a new option GLOB_NULLGLOB that only has effect if GLOB_NOCHECK is enabled, and slightly modifies how GLOB_NOCHECK works. There are other ways to do this, of course.
Desired Action Document this new shell option ("nullglob") as follows:

In page 2333, line 73812-73813, replace:
"If the pattern does not match any existing filenames or pathnames, the pattern string shall be left unchanged".
with:
"If the pattern does not match any existing filenames or pathnames, and contains at least one metacharacter, the result depends on the nullglob option. If the nullglob option is enabled, a null string results. If the nullglob option is not enabled, the pattern string shall be left unchanged".

On Page 2357, in the synopsis lines 74489-74490, add a new "-N" short option name for set.

Under line 74562, add:
-N
When this option is on, a filename expansion pattern which matches no files, yet included at least one character with a special meaning (see 2.13.1), expands to a null string rather than itself.

Under line 74583, add:
nullglob
Equivalent to -N.


Document the new lower-level glob option (GLOB_NULLGLOB) as follows:


Page 256, under line 8395 (glob), add:
GLOB_NULLGLOB
If the pattern contains special characters and does not match any pathname, then the result is empty instead of the pattern. Only has effect if GLOB_NOCHECK is also enabled.
On line 8393, append "(if GLOB_NULLGLOB is enabled, the pattern will only be returned if there are no special characters in the pattern)".

Page 1088, under line 36273:
GLOB_NULLGLOB
If pattern contains wildcards and does not match any pathname, then the result is empty instead of the pattern. Only has effect if GLOB_NOCHECK is also enabled.

Line 36269, append "(if GLOB_NULLGLOB is enabled, the pattern will only be returned if there are no special characters in the pattern)".

Tags No tags attached.
Attached Files

- Relationships
related to 0001234 Applied 1003.1(2016)/Issue7+TC2 in most shells, backslash doesn't have two meaning wrt pattern matching

-  Notes
(0000411)
dwheeler (reporter)
2010-04-29 23:00

Quick fix - in my proposal, change:
"If the pattern contains special characters"
to:
"If the pattern contains at least one special character"

And change:
"If pattern contains wildcard"
to:
"If the pattern contains at least one special character"
(0000420)
nick (manager)
2010-05-27 15:31

From David Korn (email seq 13735):

Subject: Re: [1003.1(2008)/Issue 7 0000247]: Add nullglob (null globbing) support to shell's "set" and glob()
--------

> Many shells have a way of doing this, but there is no *standard* way to do
> it. Doing this in a shell is often called "null globbing"
> Null globbing fixes this by replacing an unmatched pattern with nothing at
> all. In bash you can enable nullglob with "shopt -s nullglob". In zsh, you
> can use "setopt NULL_GLOB" for the same result. Then, "for" loops on glob
> patterns will work correctly if nothing matches the glob pattern.
>

In ksh93, you can do this on a per pattern basis with ~(N) in front of the
pattern, for example
        for i in ~(N)*.c
        do xxx
        done
which will skip the loop if there are no files ending with .c.


David Korn
dgk@research.att.com
(0000421)
nick (manager)
2010-05-27 15:42
edited on: 2019-09-19 16:34

During May 27 2010 conf call, general consensus is that ksh93 filename generation appears to have many useful extensions, and we should move in that direction. See http://www2.research.att.com/sw/download/man/man1/ksh.html for man page details. New wording invited.

Update: Later discussions relating to ksh93 filename generation brought to light a number of issues. In particular, it is only a valid extension because it would otherwise cause syntax errors. This means that POSIX does not allow it to be used in contexts where those syntax errors would not occur anyway, such as in:
pattern='a*(b)'; ls -- $pattern

which POSIX requires to list files with names beginning 'a' and ending "(b)". Consideration was given to explicitly allowing the extension in pattern matching, but this would risk breaking existing applications that use parentheses in patterns specified to find, pax, fnmatch() and glob().
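
A small sketch of the case described above, assuming a shell that follows the POSIX rule for patterns obtained from an expansion (for example dash, or bash with extglob off). The file names and the helper name demo_paren_pattern are made up for the illustration; a shell with ksh-style extended patterns active for expansions may give a different result.

```shell
# Hypothetical demo: expand a pattern held in a variable in a scratch directory.
demo_paren_pattern() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    : > 'ax(b)'          # begins with "a" and ends with the literal "(b)"
    : > 'ab'
    : > 'abb'
    pattern='a*(b)'
    set -- $pattern      # unquoted: pathname expansion applies to the value
    printf '%s\n' "$@"   # under the POSIX rule only ax(b) is listed
    cd / && rm -rf "$dir"
)

demo_paren_pattern
```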

(0004560)
eblake (manager)
2019-09-23 15:21

glibc has:
       GLOB_NOMAGIC
              If the pattern contains no metacharacters, then it should be
              returned as the sole matching word, even if there is no file
              with that name.
Is that close enough, in which case it would be worth standardizing something that already exists rather than inventing things for glob(3)?
(0004567)
joerg (reporter)
2019-09-24 14:29

There might be a problem caused by the fact that I am not aware of
a shell implementation that used glob() from libc.

Shells historically used gmatch() that was either part of the shell
sources or inside the AT&T library "libgen" and these implementations
do not have flags.
(0004568)
kre (reporter)
2019-09-24 19:00
edited on: 2019-09-24 19:12

[Aside: I have not yet read all the proposed resolutions of the other related
bugs ... I was going to before commenting here, but with these recent messages
I thought perhaps a speedier reply here might be in order]

Re Note: 0004567 ... Joerg, it is irrelevant how the shell is implemented, that
is of no concern here (I suspect that we all know that glob() was invented as
a mechanism to allow other programs to duplicate the shell's pattern matching,
which necessarily implies that the shell had it elsewhere, first).

Here for glob() I see no harm in specifying the GLOB_NOMAGIC flag if it is
really needed - I am not sure it is, the glob() interface already provides all
of the mechanism needed to allow almost any desired behaviour to be implemented
without this extra option, but it is easy to do, and exists in the wild
(probably in many places) already, so, if it is really wanted, go for it
(but see below, that ought to be done in a new bug report)

The NetBSD man page for glob(3) after describing what the option does, goes
on to say:

   GLOB_NOMAGIC is provided to simplify implementing the historic csh(1)
   globbing behavior and should probably not be used anywhere else.

which I think is correct. Note particularly "simplify implementing" - the
option doesn't provide any abilities not already available with the existing
interface, it simply makes using this one mode of operation a little easier,
the implementation isn't required to duplicate some of the work already
performed inside glob().

Flags to glob() - and in fact anything related to glob() whether the shell
happens to use glob(3) to implement its own globbing or not, are irrelevant
to the specification of what the shell does however. What would matter there
is just which flags the shell set when it called glob() (or how its alternate
implementation behaves) - and if needed what the UI is to be to allow scripts
to control how the shell expands patterns when doing pathname expansion - and
further, how that UI relates to the other uses of pattern matching in the
shell, if at all.

Personally I am 100% opposed to any form of "nullglob" in the shell - with the
possible exception of a ksh93 type technique, though not the one it actually
uses. That is, an ability to specify as part of a pattern that the expansion
of filenames based upon that one pattern should return nothing, rather than the
pattern, should no matches be found is reasonable - there are occasional places
in scripting where that can simplify the code. The actual ksh93 implementation
isn't the right one - patterns using the syntax need to be possible to generate
regardless of how the pattern is generated, having it work only when the pattern
is literally in the script, and not when the pattern is obtained from an
expansion is not acceptable. There are other ways it could be done, using
patterns that are meaningless, like for example ** (though not that one as too
many implementations already give that a different meaning) but perhaps ?**
or *?* as a prefix for the pattern (both of which are equivalent to ?* (or *?)
and so are unlikely to ever be seen in anything existing). But such a change
would be pure invention at this point, and not something that should be
contemplated here - better for some shell to pick some technique and see how
much use scripts actually make of it, and then request standardisation of that
technique if it appears useful (if it is useful, it will likely be copied by
other shells in order to make scripts written for the first shell work).

However, as some kind of global option, nullglob makes script writing more
complex, and we should not even contemplate standardising any such thing.
Some shells have it already, and from best I can tell based upon how rarely
(like never) I see requests to implement it in order for scripts from those
shells to work, it isn't actually used very much, if at all.

Despite what the Description says (and perhaps because of the almost 10 year
gap between then and now, during which user expectations may have altered
a little) it is neither difficult, nor not done in practice, for scripts to
actually test whether a pattern match that returned exactly the pattern did
so because that one result is a file that happens to have the same name as the
pattern (a script can make this less likely, or impossible, to occur in some
cases, by simply replacing any literal character (eg: C) by the sequence [C]).
If the pattern is returned unchanged, then it matched no intended files
(whether it might have matched a file containing the sequence "[C]" is
irrelevant, not that it should, as that is not what the script is looking for).

That is, to take one of the examples from the description,

    for file in ./*.txt ; do [...]

can be written

    for file in ./*.[t]xt; do case "$file" in './*.[t]xt') continue;; esac; [...]

which is simple and cheap since the case pattern match is simply a strcmp()
in this case. (The "continue" could also be "break", as when this happens
there is only the one match, so the two are equivalent.)
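
For illustration, the bracket technique can be wrapped in a small counting function. This is a sketch only; count_txt is a hypothetical helper name and the directory argument is an assumption added to make the loop testable.

```shell
# Hypothetical helper: count the .txt files in DIR using the "[t]" technique.
count_txt() (
    cd "$1" || exit 1
    n=0
    for file in ./*.[t]xt; do
        # If the shell hands back the pattern itself, nothing matched. A real
        # file is never reported under this spelling by a successful match,
        # because a match of "[t]" always yields a plain "t".
        case "$file" in './*.[t]xt') continue ;; esac
        n=$((n + 1))
    done
    echo "$n"
)
```

A file whose name is literally "*.txt" is still counted, since a successful match returns "./*.txt", which differs from the pattern string "./*.[t]xt".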

I know not all cases are this simple, and in some there is a need to do an
existence test, but
    for file in *; do
(aside: there is no need to use ./* here, the expansion here is not where
issues with leading '-' can occur, it is when ${file} is expanded - such
expansions can be written ./${file} if needed, or a -- can be used to prevent
a leading '-' being treated as introducing options to the program).

Anyway that can be easily rewritten as

    for file in *; do case "${file}" in '*') test -e '*' || continue;; esac;

which is simple, and again cheap (there is an extra stat() call, or perhaps
test invocation if test is not builtin) only when '*' is returned, which
will only ever be once (at most) each time the loop is executed, plus one
extra strcmp() each time around the loop, which is not going to be noticeable
in any real application (that is, excluding benchmarks set up precisely to
make this do more work).
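
A complete form of that loop, written out as a sketch; count_entries is a hypothetical helper name, and the directory argument is an assumption added for the illustration.

```shell
# Hypothetical helper: count the (non-hidden) entries in DIR.
count_entries() (
    cd "$1" || exit 1
    n=0
    for file in *; do
        # Only when the single word "*" comes back is an existence test needed:
        # it is either an unmatched pattern or a file really named "*".
        case "$file" in '*') test -e '*' || continue ;; esac
        n=$((n + 1))
    done
    echo "$n"
)
```

The existence test runs at most once per loop, as noted above; every other iteration pays only for the string comparison in the case statement.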

On the other hand, the other example from the description "cat ./*.txt"
is definitely not any kind of example promoting nullglob, as:

     already does the wrong thing (it will try to open a nonexistent
     file "*.txt" if the match fails)

is true, but that is exactly what the script normally wants to happen:

[jinx]{2}$ mkdir /tmp/foo
[jinx]{2}$ cd /tmp/foo
[jinx]{2}$ cat ./*.txt
cat: ./*.txt: No such file or directory
[jinx]{2}$

which is certainly better than

[jinx]{2}$ bash -O nullglob
jinx$ cat ./*.txt

(at which point cat simply hangs - it is reading stdin, and in this
situation, the hang is the least evil of the possible poor effects that
can occur). That is MUCH worse than the previous.
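
The same effect can be shown non-interactively by giving cat something on stdin. This is a sketch that assumes bash is installed; cat_with_nullglob is a hypothetical helper name.

```shell
# Assumption: bash is installed. Runs "cat ./*.txt" in DIR with nullglob on.
cat_with_nullglob() {
    bash -O nullglob -c 'cd "$1" && cat ./*.txt' _ "$1"
}

empty_dir=$(mktemp -d)
# The pattern expands to nothing, so cat gets no operands and copies stdin
# instead of reporting a missing file: this prints "from stdin".
printf 'from stdin\n' | cat_with_nullglob "$empty_dir"
rm -rf "$empty_dir"
```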

Of course, it is possible to code around them, but how many scripts does
anyone see which actually do anything like

    files=$(printf '%s\n' ./*.txt | sed -e "s/.*/'&'/")
    test -n "${files}" && eval cat $files

(and yes, I know the sed script actually needs to be much more complex to
deal with files with single-quote characters in their names, here I am
illustrating the simplest possible case, not reality)

On the other hand, when one actually wants to generate nothing when the
pattern doesn't match, the techniques shown above for the for loop work
(for loops are the most common place where this is the desired action).

So, please let us not specify nullglob.

On the other hand, the companion option to nullglob that sometimes exists,
errglob (with whatever name) which causes a pathname expansion to fail if
no files match (which results in the same behaviour as any other expansion
failing) is something that can be useful, if the script really assumes that
there should be a file matching *.txt and for some reason there is not,
having the shell fail to continue is not unreasonable - so if that one were
proposed for standardisation I would be less opposed. But that would have
to happen in an entirely new bug/defect report.
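
One existing option of this kind is bash's failglob (taking it as an instance of the "errglob" idea is an assumption made here, not something the note states). A minimal sketch, assuming bash is installed; expand_or_fail is a hypothetical helper name.

```shell
# Assumption: bash is installed. Tries to print the matches of ./*.txt in DIR
# with failglob on; an unmatched pattern is an expansion error.
expand_or_fail() {
    bash -O failglob -c 'cd "$1" && printf "%s\n" ./*.txt' _ "$1"
}

empty_dir=$(mktemp -d)
if expand_or_fail "$empty_dir" 2>/dev/null; then
    echo "pattern matched"
else
    echo "expansion failed, printf was not run"
fi
rm -rf "$empty_dir"
```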

For this one, I suggest simply rejecting it. If the GLOB_NOMAGIC flag to
glob(3) is seen as useful enough to actually standardise (that is, given that
it is already fairly widely implemented, if more than one or two applications
actually exist that use it) then that should be requested in a new defect
report - it is certainly not what was requested in this one.

(0004569)
stephane (reporter)
2019-09-24 20:58

Re: Note: 0004568

> Personally I am 100% opposed to any form of "nullglob" in the shell -
> with the possible exception of a ksh93 type technique,

Note that ksh93's ~(N) was inspired by zsh's N glob qualifier. The N glob qualifier enables the nullglob option for the glob it's qualifying. The nullglob option is also a zsh invention, copied by several other shells later.

There's a similar D qualifier for the dotglob option (in ksh93 you set $FIGNORE), and a (#i) extended glob operator for nocaseglob (ksh93 has ~(i)).

I generally agree that modifying the behaviour using special syntax in the glob on a per-glob basis is much better than global options or special parameters and that's not specific to glob (see for instance the nightmare it is to manage $IFS to do word splitting; compare with rc's ``(separator){cmd} or zsh's "s" parameter expansion flag), but nullglob, nocaseglob, dotglob, noglob are already widely implemented and are better than nothing and nullglob is much needed if you want to programmatically retrieve the list of files that match a glob.

Also note fish's approach where globs with no match fail the command, except when the command is "set" or "count" in which case a nullglob behaviour occurs.

So you can do

set files *.txt

To get the list of *.txt files (similar to zsh's files=(*.txt(N)) or ksh93's files=(~(N)*.txt)), but

cat -- *.txt

fails as expected when there's no txt file.

You're making otherwise very good points that nullglob is not an option you want to enable by default or generally use except in a few specific cases. See also https://unix.stackexchange.com/questions/204803/why-is-nullglob-not-default/204944#204944

- Issue History
Date Modified Username Field Change
2010-04-29 22:50 dwheeler New Issue
2010-04-29 22:50 dwheeler Status New => Under Review
2010-04-29 22:50 dwheeler Assigned To => ajosey
2010-04-29 22:50 dwheeler Name => David A. Wheeler
2010-04-29 22:50 dwheeler Section => set,glob
2010-04-29 22:50 dwheeler Page Number => 256,1088,2333-2334,2357-2359
2010-04-29 22:50 dwheeler Line Number => 8395,36273,73812-73813,74488-74605
2010-04-29 23:00 dwheeler Note Added: 0000411
2010-05-27 15:31 nick Note Added: 0000420
2010-05-27 15:42 nick Note Added: 0000421
2019-09-19 16:33 nick Note Edited: 0000421
2019-09-19 16:34 nick Note Edited: 0000421
2019-09-23 15:21 eblake Note Added: 0004560
2019-09-23 15:27 geoffclare Relationship added related to 0001234
2019-09-24 14:29 joerg Note Added: 0004567
2019-09-24 19:00 kre Note Added: 0004568
2019-09-24 19:10 kre Note Edited: 0004568
2019-09-24 19:12 kre Note Edited: 0004568
2019-09-24 20:58 stephane Note Added: 0004569

