ID | Category | Severity | Type | Date Submitted | Last Update
0000527 | [1003.1(2008)/Issue 7] Shell and Utilities | Objection | Clarification Requested | 2011-12-17 04:15 | 2019-06-10 08:55
Reporter | eblake | View Status | public
Assigned To | ajosey
Priority | normal | Resolution | Accepted As Marked
Status | Closed
Name | Eric Blake
Organization | Red Hat
User Reference | ebb.du
Section | du
Page Number | 2611
Line Number | 84170
Interp Status | Approved
Final Accepted Text | See Note: 0001105
Summary | 0000527: du and files found via multiple command line arguments
Description |
The standard currently states about du:

[line 84170]
Files with multiple links shall be counted and written for only one entry. The directory entry that is selected in the report is unspecified.

A strict interpretation of the standard applies this restriction to all ways in which the same file can be found, even if the file is found twice only by virtue of multiple command line arguments. However, this strict reading renders existing du implementations non-compliant, since they reset their hash of duplicate files when proceeding to additional arguments. Meanwhile, GNU du implemented this strict reading of the standard, and is in the middle of a debate about whether this change from traditional behavior is desirable:
http://debbugs.gnu.org/10281

This shell sequence demonstrates traditional behavior (as implemented on Solaris 10) vs. GNU behavior. The overall numbers differ, apparently because of allocation patterns on the file system and slight algorithmic differences between Solaris and GNU about whether to favor st_blocks or st_size and how rounding is applied, but that is unrelated to the point at hand about eliding duplicate entries.

    $ mkdir tmp && cd tmp
    $ mkdir a b
    $ printf %04098d 1 > a/a
    $ ln a/a b/b
    $ # both implementations detect that a/a and b/b are links, and elide b/b
    $ /usr/xpg4/bin/du -ak .
    5 ./a/a
    6 ./a
    1 ./b
    9 .
    $ ~/gnu/du -ak .
    5 ./a/a
    7 ./a
    2 ./b
    10 .
    $ # now, compare when a and b are listed as separate arguments
    $ /usr/xpg4/bin/du -ak a b
    5 a/a
    6 a
    5 b/b
    6 b
    $ ~/gnu/du -ak a b
    5 a/a
    7 a
    2 b

Meanwhile, the existing wording can be deemed self-conflicting, in that both the -a and -s options mention listing all arguments given on the command line, in contrast to the earlier statement about eliding duplicates:

[line 84177]
Regardless of the presence of the -a option, non-directories given as file operands shall always be listed.

[line 84185]
Instead of the default output, report only the total sum for each of the specified files.

In all likelihood, the intent of the standard was to codify traditional behavior, where the hash for duplicate files is reset each time du starts processing the next command line argument, and GNU du was wrong for trying to take the standard too literally. However, it was pointed out that the GNU behavior of remembering duplicates across multiple command line arguments does have a use not possible in the traditional implementation: if a user has multiple directories, all of which share some hard links, then only the GNU semantics make it possible to see how much disk space will be reclaimed by removing one directory, by invoking 'du -s' with the directory to be removed as the last argument.

Therefore, I'm presenting two options for resolving the conflict in the standard, although my preference is for option 1. The GNU implementation is willing to change its behavior to comply with option 1 by adding an extension option that provides its current behavior of remembering links across multiple command line arguments, and all other implementations already comply with option 1.
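The two behaviors described above can be modeled with a short sketch. This is not how any du implementation is written; `count_new_files` is a hypothetical helper that only counts, per operand, the regular files whose inode number has not been seen yet. It assumes all operands live on one file system (so an inode number alone identifies a file) and that path names contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper, not taken from any du implementation.
# usage: count_new_files reset|keep operand...
#   reset: forget the seen inodes at each new operand
#          (the Solaris 10 behavior shown in the report)
#   keep:  remember the seen inodes for the whole run
#          (the GNU behavior shown in the report)
# Assumes one file system and no whitespace in path names.
count_new_files() {
    policy=$1
    shift
    seen=' '
    for operand in "$@"; do
        if [ "$policy" = reset ]; then
            seen=' '
        fi
        new=0
        for path in $(find "$operand" -type f | sort); do
            inode=$(ls -di "$path" | awk '{ print $1 }')
            case $seen in
                *" $inode "*) ;;    # another link to a file already counted
                *) seen="$seen$inode "; new=$((new + 1)) ;;
            esac
        done
        printf '%s %s\n' "$operand" "$new"
    done
}
```

With the a/a and b/b layout from the report, `count_new_files reset a b` prints `a 1` and `b 1`, while `count_new_files keep a b` prints `a 1` and `b 0`, mirroring the presence or absence of the b/b line in the two du outputs.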
Desired Action |
Option 1 - require duplicate checking to reset for each /file/:

Change line 84170 [du DESCRIPTION] from:
Files with multiple links shall be counted and written for only one entry.
to:
Within the file hierarchy of each individual /file/ argument, files with multiple links shall be counted and written for only one entry.

Option 2 - leave things unspecified to permit both traditional and GNU behavior when there are multiple /file/ arguments:

Change line 84170 [du DESCRIPTION] from:
Files with multiple links shall be counted and written for only one entry.
to:
Files with multiple links shall be counted and written for only one entry, although when there are multiple /file/ arguments, it is unspecified whether a file entry encountered via two different arguments will be counted or skipped during processing of the later argument.
Tags | tc2-2008
Notes | |
(0001102) eblake (manager) 2012-01-26 16:10 |
Paul Eggert sent this comment to the mailing list:

Eric Blake's Option 1 does not appear to be tenable, as du traditionally preserved hashes of duplicate files across all of its operands. 7th Edition Unix 'du' did that, and (as Jilles Tjoelker pointed out) so do at least two current 'du' implementations, namely, FreeBSD and GNU.

The idea behind Eric's Option 2 is better, but its wording is unclear partly because of another issue Jilles raised: whether a file's disk space should be counted multiple times if the file occurs multiple times and its link count is 1. For example:

    mkdir d
    cd d
    cp /bin/sh w
    cp w y
    ln y ../y
    ln -s w x
    ln -s y z
    du -aL

This analyzes a directory with two regular files, 'w' and 'y'. GNU and Solaris du count these files once each, with an accurate sum of non-symlink disk usage under the current directory. But w's link count is 1 so FreeBSD counts 'w' twice, thus overcounting disk usage. The current POSIX wording does not say what to do for this example, but the intent is to avoid overcounting disk usage, and the GNU and Solaris behavior supports this intent better. (The 7th Edition Unix behavior agrees with FreeBSD, but this predates symbolic links so the behavior is now dubious.)

Given all the above, the standard's wording could be improved in several different ways, all elaborations of Option 2. Here are two possibilities:

Option 2A - require that files be hashed among all operands, and that disk usage be counted at most once.

Change line 84170 [du DESCRIPTION] from:
Files with multiple links shall be counted and written for only one entry.
to:
A file that occurs multiple times shall be counted and written for only one entry, even if the occurrences are under different file operands.

Option 2B - leave unspecified whether files are hashed among all operands, and leave unspecified whether disk usage is counted multiple times for files whose link count does not exceed 1. From the user's point of view, this means du's output is a reliable count of disk usage only if du is invoked without -L and with -x and with at most one operand.

Change line 84170 [du DESCRIPTION] from:
Files with multiple links shall be counted and written for only one entry.
to:
A file that occurs multiple times under one file operand and that has a link count greater than 1 shall be counted and written for only one entry. It is implementation-defined whether a file that has a link count no greater than 1 is counted and written just once, or is counted and written for each occurrence. It is implementation-defined whether a file that occurs under one file operand is counted for other file operands.

Option 2A is simpler and clearer, but it invalidates many existing implementations. Option 2B modifies the standard to describe how existing implementations actually work, but is more complicated and more of a hassle to use reliably.

Eric raised one other issue: the description of the -a option implies that "du A B" must always list B. This implication is incorrect for 7th edition Unix du, GNU du, and (I expect) FreeBSD du, so it should be fixed as well. Here's one possible fix, which is independent of the abovementioned changes.

Change line ????? [du OPTIONS] from:
Regardless of the presence of the -a option, non-directories given as file operands shall always be listed.
to:
The -a option does not affect whether non-directories given as file operands are listed.

(Sorry, I don't know the line number here; I don't have a PDF copy of the current standard and don't know offhand how to get one.)
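The overcounting in the symlink example above can be illustrated with a small model. This is a sketch under stated assumptions, not FreeBSD's or GNU's actual code: `count_occurrences` is a hypothetical helper that walks a directory with symbolic links followed and counts files either by remembering every file (`all`) or by remembering only files whose link count exceeds 1 (`multilink`). It assumes a single file system and path names without whitespace.

```shell
#!/bin/sh
# Hypothetical helper, not taken from any du implementation.
# usage: count_occurrences all|multilink directory
#   all:       remember every file, so each file is counted once
#   multilink: remember only files with link count > 1, so a file
#              with link count 1 is counted at every occurrence
# Assumes one file system and no whitespace in path names.
count_occurrences() {
    track=$1
    dir=$2
    seen=' '
    total=0
    # find -L follows symbolic links, like du -L in the example.
    for path in $(find -L "$dir" -type f | sort); do
        # ls -L reports the inode and link count of the link target.
        info=$(ls -Ldil "$path" | awk '{ print $1, $3 }')
        inode=${info% *}
        links=${info#* }
        if [ "$track" = multilink ] && [ "$links" -le 1 ]; then
            total=$((total + 1))    # never remembered, counted every time
            continue
        fi
        case $seen in
            *" $inode "*) ;;        # already counted
            *) seen="$seen$inode "; total=$((total + 1)) ;;
        esac
    done
    printf '%s\n' "$total"
}
```

In the directory built by the example (with any regular file standing in for /bin/sh), `count_occurrences all .` prints 2, while `count_occurrences multilink .` prints 3, because 'w' is reached both directly and through the symbolic link 'x'.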
(0001103) eblake (manager) 2012-01-26 16:11 |
Paul also sent these comments:

> It boils down to a decision of whether we want to standardize a useful
> behavior, and whether that behavior avoids over-counting, but possibly
> invalidating existing implementations (in which case, it is better
> targeted to Issue 8), or whether we give up and declare things
> unspecified when encountering files with link count of 1 through
> multiple locations (in which case we could make the changes in TC2 of
> Issue 7, and still make recommendations on the underlying goal of
> avoiding over-counting).

We can do both, and it makes sense to do both. That is, we can have Issue 7 TC2 specify Option 2B with a suggestion to implement Option 2A, and have Issue 8 require Option 2A.

On 01/13/2012, Geoff Clare wrote:

> One problem with requiring Option 2A is that it requires du to use
> much more memory for hierarchies where there are large numbers
> of files with link count 1. This could be a problem for embedded
> systems in particular.

The extra memory shouldn't be needed in the typical case where there is at most one file operand and where -L is not used. In the typical case, du is within its rights to not hash files whose link count is 1, even if Option 2A is required. This is because in the typical case du can't encounter the same file twice if its link count is 1.
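The memory trade-off discussed above can be made concrete with a rough sketch. `table_size` is a hypothetical helper, not part of any du implementation; it reports how many distinct inodes a duplicate-detection table would have to hold for one operand without -L, either when every file is remembered (`all`) or when only files with a link count greater than 1 are remembered (`multilink`). It assumes a single file system.

```shell
#!/bin/sh
# Hypothetical helper, not taken from any du implementation.
# usage: table_size all|multilink directory
# Prints the number of distinct inodes that would be remembered.
# Assumes one file system; symbolic links are not followed.
table_size() {
    track=$1
    dir=$2
    # ls -dil prints: inode, mode, link count, ... for each file.
    find "$dir" -type f -exec ls -dil {} + |
    awk -v track="$track" '
        track == "all" || $3 > 1 { ids[$1] = 1 }
        END { n = 0; for (i in ids) n++; print n }
    '
}
```

For a directory holding five regular files, one of which has a second hard link in the same directory, `table_size all` prints 5 and `table_size multilink` prints 1. In that single-operand, no -L case the smaller table detects the same duplicates, which is the point made in the reply to Geoff Clare.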
(0001104) nick (manager) 2012-01-26 16:19 edited on: 2012-01-26 16:28 |
Change line 84170 [du DESCRIPTION] from:
Files with multiple links shall be counted and written for only one entry.
to:
A file that occurs multiple times under one file operand and that has a link count greater than 1 shall be counted and written for only one entry. It is implementation-defined whether a file that has a link count no greater than 1 is counted and written just once, or is counted and written for each occurrence. It is implementation-defined whether a file that occurs under one file operand is counted for other file operands.

In FUTURE DIRECTIONS, change line 84274 from "None" to "A future version of this standard may require that a file that occurs multiple times shall be counted and written for only one entry, even if the occurrences are under different file operands."

Change line 84177 [du OPTIONS] from:
Regardless of the presence of the -a option, non-directories given as file operands shall always be listed.
to:
The -a option does not affect whether non-directories given as file operands are listed.
(0001105) nick (manager) 2012-01-26 16:24 edited on: 2012-01-26 16:25 |
Interpretation response
------------------------
The standard states that files with multiple links shall be counted and written for only one entry, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
Existing practice is varied in how du counts files.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
See Note: 0001104
(0001291) ajosey (manager) 2012-06-29 16:16 |
Interpretation proposed 29 June 2012 for final 45 day review |
(0001354) ajosey (manager) 2012-08-30 09:15 |
Interpretation approved 30 Aug 2012 |
Issue History | |||
Date Modified | Username | Field | Change |
2011-12-17 04:15 | eblake | New Issue | |
2011-12-17 04:15 | eblake | Status | New => Under Review |
2011-12-17 04:15 | eblake | Assigned To | => ajosey |
2011-12-17 04:15 | eblake | Name | => Eric Blake |
2011-12-17 04:15 | eblake | Organization | => Red Hat |
2011-12-17 04:15 | eblake | User Reference | => ebb.du |
2011-12-17 04:15 | eblake | Section | => du |
2011-12-17 04:15 | eblake | Page Number | => 2611 |
2011-12-17 04:15 | eblake | Line Number | => 84170 |
2011-12-17 04:15 | eblake | Interp Status | => --- |
2012-01-26 16:10 | eblake | Note Added: 0001102 | |
2012-01-26 16:11 | eblake | Note Added: 0001103 | |
2012-01-26 16:19 | nick | Note Added: 0001104 | |
2012-01-26 16:24 | nick | Note Added: 0001105 | |
2012-01-26 16:25 | nick | Note Edited: 0001105 | |
2012-01-26 16:25 | nick | Note Edited: 0001105 | |
2012-01-26 16:26 | nick | Note Edited: 0001104 | |
2012-01-26 16:27 | nick | Interp Status | --- => Pending |
2012-01-26 16:27 | nick | Final Accepted Text | => See Note: 0001104 |
2012-01-26 16:27 | nick | Status | Under Review => Interpretation Required |
2012-01-26 16:27 | nick | Resolution | Open => Accepted As Marked |
2012-01-26 16:27 | nick | Tag Attached: tc2-2008 | |
2012-01-26 16:28 | nick | Note Edited: 0001104 | |
2012-01-26 16:29 | nick | Final Accepted Text | See Note: 0001104 => See Note: 0001105 |
2012-01-26 21:04 | eblake | Relationship added | related to 0000539 |
2012-06-29 16:16 | ajosey | Interp Status | Pending => Proposed |
2012-06-29 16:16 | ajosey | Note Added: 0001291 | |
2012-08-30 09:15 | ajosey | Interp Status | Proposed => Approved |
2012-08-30 09:15 | ajosey | Note Added: 0001354 | |
2019-06-10 08:55 | agadmin | Status | Interpretation Required => Closed |