Austin Group Defect Tracker

ID: 0001824
Category: [Issue 8 drafts] Shell and Utilities
Severity: Editorial
Type: Clarification Requested
Date Submitted: 2024-04-01 15:31
Last Update: 2024-06-14 17:57
Reporter: dag-erling
View Status: public
Assigned To:
Priority: normal
Resolution: Open
Status: New
Product Version: Draft 4.1
Name: Dag-Erling Smørgrav
Organization:
User Reference:
Section: Utilities
Page Number: 2741-2748
Line Number: 90593-90715, 90876-90880
Final Accepted Text:
Summary: 0001824: cp: directories and symlinks
Description: I would like to request a clarification on the matter of cp's handling
of symbolic links in the destination.

To begin with, I find the wording of the final paragraph of the
rationale (90876-90880) confusing. It mentions “file types not
specified by the System Interfaces” and implies that symbolic links
fall into that category, but I can find no indication anywhere else that
this is the case. On the contrary, the definition of “file” in §3.139
on page 51 explicitly includes “symbolic link” in its enumeration of
file types (1592-1593), before stating that “[o]ther types of files
may be supported by the implementation” (1593-1594).

If we jump back to the description section, the behavior of cp if a
source file is a directory and the corresponding destination file
exists and is a symbolic link is not entirely clear to me. If you
believe the final paragraph of the rationale, it is covered by item 2c
(90638-90639) which says it's implementation-defined. If you don't,
it depends on a couple of additional factors. First, do you consider
the type of the link or the type of its target? (I will come back to
this later.) If you consider the type of the link, or if you consider
the type of its target and its target is not a directory, we turn to
item 2d (90640-90642) which says to emit an error, not descend, and go
on with the next source file. If you consider the type of its target
and its target is a directory, we turn to item 2f (90649-90650) which
says to copy the contents of the source into the destination.
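
To make the two readings concrete, here is a minimal POSIX sh sketch (the directory names are hypothetical) that sets up an existing destination which is a symbolic link to a directory and classifies it both ways. test -L acts on the link itself, while test -d and test -e follow it. Read by the type of the link, the destination falls under item 2d; read by the type of its target, it falls under item 2f.

```shell
#!/bin/sh
# Hypothetical scratch directory; only the destination is inspected here.
work=${TMPDIR:-/tmp}/cp-dest-type.$$
mkdir "$work" && cd "$work" || exit 1

mkdir targdir
ln -s targdir destdir          # destination exists and is a symlink to a directory

# Type of the link itself: test -L does not follow the link.
if [ -L destdir ]; then link_type=symlink; else link_type=other; fi

# Type of the target: test -d and test -e follow the link.
if [ -d destdir ]; then
    target_type=directory
elif [ -e destdir ]; then
    target_type=non-directory
else
    target_type=missing
fi

printf 'link: %s, target: %s\n' "$link_type" "$target_type"
cd / && rm -rf "$work"
```

On a POSIX system this prints "link: symlink, target: directory", which is exactly the ambiguity: the same destination matches item 2d or item 2f depending on which check cp is meant to use.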

I cannot find any discussion anywhere in the specification for cp of
what to do if the target of a symbolic link does not exist, unless the
second paragraph of item 4c (90697-90698) is intended to cover this
case (but 4c discusses the case where the source is a symbolic link,
so it doesn't tell us what to do if the destination exists, is a
symbolic link, and references a non-existent file).
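
The dangling case is easy to reproduce. In this sketch (names hypothetical), the destination exists as far as a check that acts on the link is concerned, but not as far as a check that follows it is concerned, so the answer depends entirely on whether destination links are followed.

```shell
#!/bin/sh
# Hypothetical scratch directory and names.
work=${TMPDIR:-/tmp}/cp-dangling.$$
mkdir "$work" && cd "$work" || exit 1

ln -s nonexistent destdir      # destination is a dangling symbolic link

# test -L acts on the link itself; test -e follows it to the missing target.
if [ -L destdir ]; then is_link=yes; else is_link=no; fi
if [ -e destdir ]; then target_exists=yes; else target_exists=no; fi

printf 'is a symlink: %s, target exists: %s\n' "$is_link" "$target_exists"
cd / && rm -rf "$work"
```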

Now for the matter of whether, if the destination file is a symbolic
link, we should consider instead the type of its target. The
descriptions of the -L and -P options repeatedly use the phrase
“symbolic links encountered during traversal of a file hierarchy”
(cf. 90625, 90627, 90712, 90715). Given that the surrounding text
mostly refers to the source, it is not clear to me whether this phrase
only applies to the source, or to both the source and the destination.
Turning to historical precedent, BSD cp has traditionally followed
symbolic links in the destination hierarchy while GNU cp appears not
to. FreeBSD recently changed its implementation to take the -R, -L,
and -P options into consideration when checking the destination as
well, while the GNU cp documentation appears to state quite clearly
(and, in my opinion, correctly) that these options only apply to the
source. I believe that this change was a mistake, and I intend to
revert it. However, I cannot make up my mind as to whether the
historical behavior of BSD cp (always follow symbolic links in the
destination) is correct. I can easily conceive of situations where
you would want cp to do that, but it has been pointed out to me that
doing so, at least by default, can be considered a security risk.

Note that in the case where the source file is a regular file and the
destination exists (item 3a, lines 90657-90671), the file type of the
destination is not taken into account at all. I am not as concerned
with this case as I am with the directory case, but it should probably
be addressed as well.

To summarize:

- The rationale implies that symbolic links are an extension, which I
  believe to be incorrect.

- It is unclear whether symbolic links in the destination should be
  followed, and whether the -L and -P options apply, when inspecting
  destination paths.

- There is historical precedent for answering these questions with
  “yes” and “no”, respectively. Recent history suggests the wording
  is vague enough that implementers are confused on the second point.

- The description section does not adequately discuss how cp should
  behave if the source is a directory and the destination exists and
  is a symbolic link.

- The description section does not consider the type of the
  destination at all in the case where the source is a file and the
  destination exists.
Desired Action:

1. Clarify the final paragraph of the rationale.

2. Modify the description section to state either:

   a) That cp always follows links in the destination.
   b) That cp never follows links in the destination.
   c) That whether cp follows links in the destination is unspecified.

   The best option is probably c) since there is historical precedent
   for (and therefore also against) both a) and b).

3. Modify the phrase “symbolic link(s) encountered during traversal of
   a file hierarchy”, which appears twice in the description section
   and twice in the options section, to clarify whether it only refers
   to the source, or to both source and destination.

4. Modify the list of steps taken for each source file in the
   description section to clarify what happens if the source is a
   directory and the destination is a symbolic link.

5. Optionally expand the list of steps taken for each source file in
   the description section to also describe what happens if the source
   is a regular file and the destination is a symbolic link.
Tags: No tags attached.
Notes
(0006731)
dannyniu (reporter)
2024-04-02 06:19
edited on: 2024-04-02 06:22

Here's how I understand the (new) wording of the rationale:

1. Implementations are allowed to copy directories to their own implementation-defined file types,

2. the wording is chosen such that implementations may support symbolic links (pointing to whatever the implementation supports) as copy destinations for directories.

As such, the normative text is modified to "support" this "loophole".

(0006734)
geoffclare (manager)
2024-04-02 15:51

The final paragraph of rationale dates back to the original POSIX.2-1992 standard, where the text was "implementation-defined file types not specified by POSIX.1 {8}". The reference to "POSIX.1 {8}" was to the POSIX.1-1990 standard which did not specify symbolic links.
(0006737)
geoffclare (manager)
2024-04-04 14:30
edited on: 2024-04-08 08:42

I believe the standard is clear on all of the points raised here.

Taking the bullet items after "To summarize" in turn:

- The rationale implies that symbolic links are an extension, which I believe to be incorrect.
Yes, as per my previous note, the final paragraph of the rationale is out of date. It should be disregarded when interpreting the normative text.

- It is unclear whether symbolic links in the destination should be followed, and whether the -L and -P options apply, when inspecting destination paths.
XBD 4.16 Pathname Resolution requires that symbolic links are followed except when all of the following are true:
1. This is the last pathname component of the pathname.
2. The pathname has no trailing <slash>.
3. The function is required to act on the symbolic link itself, or certain arguments direct that the function act on the symbolic link itself.

(Note that "function" here is a reference back to "the function being performed"; it does not mean C language function.)

So the only conditions under which cp does not follow a symlink specified as the destination are those where the cp description explicitly says cp acts on the symlink. It says so for -P on line 90628: "shall not follow any symbolic links". I believe all other mentions of acting on the symlink are not related to the destination. (Note that text referring to traversal of the file hierarchy can only be referring to source files, since no traversal is performed for destination files; existence checks are not traversals.)
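
The three conditions can be observed from the shell with test(1), which offers both kinds of check. In this sketch (names hypothetical), test -L is an operand that directs acting on the link itself, so the link is not followed when it is the last component with no trailing <slash>. Adding a trailing <slash> forces resolution, and test -d, which has no reason to act on the link, follows it in the same way that, on the reading above, an existence check on a cp destination does.

```shell
#!/bin/sh
# Hypothetical scratch directory and names.
work=${TMPDIR:-/tmp}/cp-resolve.$$
mkdir "$work" && cd "$work" || exit 1

mkdir targdir
ln -s targdir destdir

# Last component, no trailing <slash>, check acts on the link: not followed.
if [ -L destdir ]; then no_slash=link; else no_slash=not-link; fi
# Trailing <slash>: the link must be resolved, so -L now sees targdir.
if [ -L destdir/ ]; then with_slash=link; else with_slash=not-link; fi
# Ordinary check that does not act on the link: followed.
if [ -d destdir ]; then followed=directory; else followed=other; fi

printf '%s %s %s\n' "$no_slash" "$with_slash" "$followed"
cd / && rm -rf "$work"
```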

- There is historical precedent for answering these questions with "yes" and "no", respectively. Recent history suggests the wording is vague enough that implementers are confused on the second point.
I believe the text in the standard is sufficient to answer the questions. Whether the required behaviour matches what implementations do is another matter. The recent history could be taken as an indication that (at least some) implementors are willing to make changes to conform, once they are made aware of the correct interpretation of the standard.

- The description section does not adequately discuss how cp should behave if the source is a directory and the destination exists and is a symbolic link.
It needs to be read in combination with XBD 4.16 Pathname Resolution, noting that -P implies symlinks are not followed for either the source or the destination.

- The description section does not consider the type of the destination at all in the case where the source is a file and the destination exists.
Correct, and I don't see any problem with that. If the destination is a directory, the open() call in 3.a.ii will fail and cp will report an error. If -f is specified, an unlink() is attempted in 3.a.iii and will normally also fail, although it does seem to imply that if the implementation supports privileged unlinking of directories then cp will do so when run with appropriate privilege. We should consider adding a "the file is a non-directory file" condition to 3.a.iii.
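
The case described here, a regular source file whose destination pathname already names a directory, can be checked with a short sketch (names hypothetical). It only records whether cp reported failure and whether the directory survived. Assuming a system where unlink() of a directory fails, both the plain and the -f invocation are expected to fail and leave d/f in place.

```shell
#!/bin/sh
# Hypothetical names: f is the source file, d/f an existing directory.
work=${TMPDIR:-/tmp}/cp-file-onto-dir.$$
mkdir "$work" && cd "$work" || exit 1

mkdir d d/f
echo x > f

# d is a directory operand, so the destination pathname is d/f (a directory).
if cp f d 2>/dev/null; then plain=copied; else plain=failed; fi
# With -f, step 3.a.iii of the standard calls for an unlink() attempt,
# which normally fails for a directory as well.
if cp -f f d 2>/dev/null; then forced=copied; else forced=failed; fi
if [ -d d/f ]; then still_dir=yes; else still_dir=no; fi

printf 'cp: %s, cp -f: %s, d/f still a directory: %s\n' \
    "$plain" "$forced" "$still_dir"
cd / && rm -rf "$work"
```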


(0006739)
dag-erling (reporter)
2024-04-04 16:01

Geoff, I think you should look more carefully at the context of your quote. You plucked it from a portion of the text which discusses traversal of the source; I don't agree that it clearly requires -P to apply to the destination.

Furthermore, if you are correct and line 90628 applies equally to the destination as to the source, then the same must go for the rest of the surrounding text, including lines 90619-90620 which say that the default behavior is unspecified. This contradicts your claim that the default behavior should be to follow symbolic links.

Your insistence that this is all perfectly clear clashes with the reality that implementers (of which I am one) either find it unclear or outright disagree... and with your own apparent confusion.
(0006740)
geoffclare (manager)
2024-04-04 16:59
edited on: 2024-04-05 08:16

What is there before line 90628 that makes you think its context is traversal of the source? I'm not seeing it. Quite the opposite, as line 90613 says "The term source_file refers to the file that is being copied, whether specified as an operand or a file in a file hierarchy rooted in a source_file operand."

Lines 90619-90620 say "If none of the options −H, −L, nor −P were specified, it is unspecified which of −H, −L, or −P will be used as a default." It doesn't make any sense to try and think of this in terms of it applying to certain files and not to others. It is simply saying "cp -R a b" can behave the same as "cp -RH a b", "cp -RL a b", or "cp -RP a b".

You assert that I claimed "the default behavior should be to follow symbolic links". That is not true. I said that the pathname resolution rules require that symbolic links are followed except when a certain set of conditions is met. For cp, one of those exception conditions is met when the -P option is in effect. If a cp implementation has -P as the default when -R is specified and none of -H, -L or -P is specified, then that condition is met by default and symlinks will not be followed by default for cp -R.

(0006741)
dag-erling (reporter)
2024-04-05 11:33

Line 90628 is within a section about -R. Furthermore, lines 90714-90715, which describe -P in the options section, speak only of source_file or traversal, which we've established is about the source, not the destination.
(0006743)
geoffclare (manager)
2024-04-08 08:39
edited on: 2024-04-08 08:44

> Line 90628 is within a section about -R.

That doesn't imply "its context is traversal of the source". You can use -R without any traversal occurring, e.g.:

cp -RL symlink_to_regfile copy_of_regfile
cp -RP symlink copy_of_symlink

and lines 90623-90625 (for -L) and 90626-90628 (for -P) apply to these cases.
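
The two commands above can be run as-is in a scratch directory. In this sketch, regfile and the target of symlink are hypothetical additions needed to make the commands runnable. With -L the operand symlink is followed and the copy is a regular file; with -P the symlink itself is copied, even though no file hierarchy is traversed in either case.

```shell
#!/bin/sh
# Hypothetical scratch directory; regfile is an added target for both links.
work=${TMPDIR:-/tmp}/cp-r-no-traversal.$$
mkdir "$work" && cd "$work" || exit 1

echo data > regfile
ln -s regfile symlink_to_regfile
ln -s regfile symlink

cp -RL symlink_to_regfile copy_of_regfile   # -L: the operand symlink is followed
cp -RP symlink copy_of_symlink              # -P: the symlink itself is copied

if [ -f copy_of_regfile ] && [ ! -L copy_of_regfile ]; then
    l_copy=regular-file
else
    l_copy=other
fi
if [ -L copy_of_symlink ]; then p_copy=symlink; else p_copy=other; fi

printf 'cp -RL made: %s, cp -RP made: %s\n' "$l_copy" "$p_copy"
cd / && rm -rf "$work"
```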

> Furthermore, lines 90714-90715, which describe -P in the options section, speak only of source_file or traversal, which we've established is about the source, not the destination.

Aha! You've finally identified something I agree is a problem. This text conflicts with line 90628, as it implies (together with the pathname resolution rules) that symlinks are always followed for the destination whereas 90628 says they aren't.

The description of -P in OPTIONS was missing from the final POSIX.2b draft and was added by IEEE PASC Interpretation 1003.2 #194. I suspect the working group which processed that interpretation just came up with the wording by comparison to the -H and -L descriptions and missed the significance of the DESCRIPTION text for -P saying "shall not follow any symbolic links" as regards the destination. Note that the rationale for the interpretation says "The standard is clear as the -P option is described in the description section. However, it would be better to have the option described in the Options section as well." (See https://web.archive.org/web/20050116074829/http://www.pasc.org/interps/unofficial/db/p1003.2/pasc-1003.2-194.html ). So, it is clear the intention was for the new -P text in OPTIONS to match the existing DESCRIPTION text.

(0006788)
geoffclare (manager)
2024-05-20 09:26
edited on: 2024-05-20 09:28

Before we can work on wording, we need to decide what behaviour(s) to require/allow for -P. We should be guided by existing practice.

I tried a few tests on the systems I have access to.

- First I tried this:
mkdir targdir; ln -s targdir destdir
echo src > src
cp -RP src destdir
and cp created src in targdir (on Solaris, Linux, and macOS).

- Then I tried this:
mkdir destdir; cd destdir; echo targ > targ; ln -s targ src
cd ..; echo src > src
cp -RP src destdir
and cp copied src contents into the targ file (on Solaris, Linux, and macOS).

- Then I tried this:
mkdir subdir destdir; cd destdir; mkdir targdir; ln -s targdir subdir
cd ../subdir; echo src > src; cd ..
cp -RP subdir destdir
Solaris and macOS created src in targdir.
Linux failed with "cannot overwrite non-directory 'destdir/subdir' with directory 'subdir'"

Conclusion: Solaris and macOS consistently follow destination symlinks, as does historical FreeBSD according to the bug description, but Linux (GNU coreutils 9.1 on a Debian system) is inconsistent; it follows destination symlinks in two out of three of my test cases but does not in the other case.
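
The three tests above can be rerun as one POSIX sh script. This is a sketch: the scratch directory layout and the followed/not-followed classification are additions for illustration, each case runs in its own subdirectory so the cases cannot interfere, and the script only reports what the local cp does without asserting which outcome is correct.

```shell
#!/bin/sh
# Hypothetical scratch directory; one subdirectory per test case.
work=${TMPDIR:-/tmp}/cp-rp-tests.$$
mkdir "$work" || exit 1

# Case 1: the destination operand is a symlink to a directory.
mkdir "$work/t1" && cd "$work/t1" || exit 1
mkdir targdir; ln -s targdir destdir
echo src > src
cp -RP src destdir 2>/dev/null
if [ -f targdir/src ]; then case1=followed; else case1=not-followed; fi

# Case 2: destdir/src exists and is a symlink to a regular file.
mkdir "$work/t2" && cd "$work/t2" || exit 1
mkdir destdir; echo targ > destdir/targ; ln -s targ destdir/src
echo src > src
cp -RP src destdir 2>/dev/null
if [ "$(cat destdir/targ)" = src ]; then case2=followed; else case2=not-followed; fi

# Case 3: the source is a directory; destdir/subdir is a symlink to a directory.
mkdir "$work/t3" && cd "$work/t3" || exit 1
mkdir subdir destdir destdir/targdir; ln -s targdir destdir/subdir
echo src > subdir/src
if cp -RP subdir destdir 2>/dev/null; then
    if [ -f destdir/targdir/src ]; then case3=followed; else case3=not-followed; fi
else
    case3=failed
fi

printf 'case 1: %s\ncase 2: %s\ncase 3: %s\n' "$case1" "$case2" "$case3"
cd / && rm -rf "$work"
```

Going by the results reported in this note, Solaris and macOS would be expected to report "followed" for all three cases, and GNU coreutils 9.1 "followed", "followed", "failed".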

(0006807)
eblake (manager)
2024-06-10 18:16

A lot of this stems from Linux's intentional decision years ago that when readlink("dangling") returns "newdir" but stat("dangling") fails with ENOENT, then rename("olddir", "dangling/") should fail with ENOTDIR, instead of leaving "dangling" intact as a symlink and renaming "olddir" to "newdir". It is not just rename() affected; mkdir(), unlink(), and several other Linux syscalls intentionally refuse to dereference through a symlink followed by a trailing slash on the grounds that it is somewhat ambiguous on whether you wanted to act on a (potential) directory name or on the symlink itself.

Is it time to recognize the Linux syscall behaviors on dangling symlinks as a valid alternative to traditional Unix behavior (a much bigger change to identify all of those affected interfaces, but would make it easier for Linux to finally comply with POSIX), or are we only wanting to paper over this scenario at the command line interface of cp despite Linux being unwilling to change kernel behavior at the C level, or something else altogether?
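
At the command line, the closest probe of the rename() case above is mv with a trailing slash on a dangling symlink. This sketch (names hypothetical) only classifies what happened. The result also depends on how the mv implementation treats the trailing slash before it calls rename(), so it is not a direct test of the system call.

```shell
#!/bin/sh
# Hypothetical names: dangling -> newdir, and newdir does not exist yet.
work=${TMPDIR:-/tmp}/mv-trailing-slash.$$
mkdir "$work" && cd "$work" || exit 1

mkdir olddir
ln -s newdir dangling

# Trailing <slash> on a dangling symlink: resolve through it, or refuse?
if mv olddir dangling/ 2>/dev/null; then mv_ok=yes; else mv_ok=no; fi

if [ -d newdir ] && [ -L dangling ]; then
    outcome=renamed-through-link   # traditional behaviour described above
elif [ -d olddir ] && [ -L dangling ]; then
    outcome=refused                # e.g. rename() failed with ENOTDIR
else
    outcome=other
fi

printf 'mv succeeded: %s, outcome: %s\n' "$mv_ok" "$outcome"
cd / && rm -rf "$work"
```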
(0006808)
geoffclare (manager)
2024-06-11 09:38

Re: Note: 0006807 I thought that behaviour only happened when there are trailing slashes. In my tests in Note: 0006788 I intentionally did not test trailing slashes so as to avoid that issue.

To answer your question, this has been discussed in the past and we decided not to allow the Linux behaviour for trailing slashes. I see no reason to change that decision now.

Note that the Linux systems which achieved UNIX03 certification (Inspur K-UX and Huawei Euler-OS) changed their trailing slash behaviour to conform to the standard, so not all Linux systems behave that way.
(0006809)
geoffclare (manager)
2024-06-11 09:53

> A lot of this stems from Linux's intentional decision ...

I repeated my tests from Note: 0006788 with /usr/gnu/bin/cp on Solaris 11.4 and got the same results as on Linux. So I think you are mistaken to blame the GNU cp behaviour on the behaviour of the underlying Linux system calls it uses.
(0006818)
mirabilos (reporter)
2024-06-14 17:57

FWIW, MirBSD (“historic BSD from the 2000s”) does the same as Solaris and Mac OSX in Note: 0006788.

Issue History
Date Modified Username Field Change
2024-04-01 15:31 dag-erling New Issue
2024-04-01 15:31 dag-erling Name => Dag-Erling Smørgrav
2024-04-01 15:31 dag-erling Section => Utilities
2024-04-01 15:31 dag-erling Page Number => 2741-2748
2024-04-01 15:31 dag-erling Line Number => 90593-90715, 90876-90880
2024-04-02 06:19 dannyniu Note Added: 0006731
2024-04-02 06:20 dannyniu Note Added: 0006732
2024-04-02 06:21 dannyniu Note Deleted: 0006732
2024-04-02 06:22 dannyniu Note Edited: 0006731
2024-04-02 15:51 geoffclare Note Added: 0006734
2024-04-04 14:30 geoffclare Note Added: 0006737
2024-04-04 16:01 dag-erling Note Added: 0006739
2024-04-04 16:59 geoffclare Note Added: 0006740
2024-04-04 18:19 geoffclare Note Edited: 0006740
2024-04-05 08:16 geoffclare Note Edited: 0006740
2024-04-05 11:33 dag-erling Note Added: 0006741
2024-04-08 08:39 geoffclare Note Added: 0006743
2024-04-08 08:42 geoffclare Note Edited: 0006737
2024-04-08 08:44 geoffclare Note Edited: 0006743
2024-05-20 09:26 geoffclare Note Added: 0006788
2024-05-20 09:28 geoffclare Note Edited: 0006788
2024-06-10 18:16 eblake Note Added: 0006807
2024-06-11 09:38 geoffclare Note Added: 0006808
2024-06-11 09:53 geoffclare Note Added: 0006809
2024-06-14 17:57 mirabilos Note Added: 0006818

