ID | Category | Severity | Type | Date Submitted | Last Update
0001784 | [Issue 8 drafts] Shell and Utilities | Objection | Error | 2023-10-22 06:14 | 2024-06-11 09:12
Reporter | kre | View Status | public
Assigned To |
Priority | normal | Resolution | Accepted As Marked
Status | Closed | Product Version | Draft 3
Name | Robert Elz
Organization |
User Reference |
Section | XCU 3 / getopts
Page Number | 2955 - 2959
Line Number | 98803 - 98966
Final Accepted Text | See Note: 0006600.
Summary | 0001784: getopts specification needs fixing (multiple issues)
Description |
First: line 98807 says "... and the index of the next argument to be processed in the shell variable OPTIND." Much the same is in the ENVIRONMENT VARIABLES section; lines 98888-9 say: "OPTIND This variable shall be used by the getopts utility as the index of the next argument to be processed." Which is the "next argument to be processed": the argument after the one that supplied the option written into the name variable, or the argument that will be processed by the next call to getopts? It makes a difference when the argument in question contains two (or more) options and anything but the last of them is being processed now. E.g. (given an optstring containing "xy", with no colons):

script -xy -d

If getopts is used in script to process those options, then when name is set to 'x', the same argument will be processed again next time to return 'y', but in many people's interpretation the "next argument" is the one containing -d. Different shells interpret it each way: in some, OPTIND is 1 for 'x' and 2 for 'y'; in others it is 2 for both 'x' and 'y'. yash is different: its (intermediate) OPTIND settings contain the index of the argument being processed, a colon, and the index of the option character within that argument (so they would be 1:2 and 1:3 in this case).

The standard is unclear about what is intended here. It would be better simply to say that the value of OPTIND at this point is unspecified, as in practice there isn't much a script can do with it anyway, even if we did pick one of the plausible interpretations. Pretending that a simple integer is useful to the implementation (which the definition at line 98888 does) is not helpful to anyone: to keep track of what it is up to, the implementation either needs to use some other mechanism (i.e., not use OPTIND for anything except noticing when the application sets OPTIND=1) or needs (as yash does) to encode more than just an integer into OPTIND. Beyond that, is the term "index of" defined anywhere?
(It isn't in XBD 3.) If it is, there should be an xref; otherwise a definition should be given here. What is its format? For the case where getopts returns an exit status of 1, it is clearly intended to contain an integer, as the EXAMPLES section shows at line 98951:

shift $(($OPTIND - 1))

which wouldn't work if OPTIND were not an integer. But is that also actually required of the OPTIND value set by other invocations? If the intent was to rely upon the standard English use of the term, that fails, as there really isn't one: to be useful, an index has to be relative to some base. Is the first option at index 0 or index 1 (or something else)?

On line 98836 it is stated: "The shell variables OPTIND and OPTARG shall be local to the caller of getopts". WTF? What is that supposed to mean? That is, what does it mean to be local to something, and what exactly is the "caller of getopts"?? Really! This is particularly absurd, as the immediately following paragraph (lines 98840-1) says: "The shell variable specified by the name operand, OPTIND, and OPTARG shall affect the current shell execution environment;" which makes sense, and is what implementations actually do. If that shell environment is "the caller", then what does it mean to be "local"? That it isn't allowed to be exported? That it doesn't survive the termination of that shell environment? If the latter, why does it need stating? What variables do survive the termination of the shell environment? Or was something else fanciful intended there?

Next, lines 98862-3 say that "the value in OPTARG shall be stripped of the option character and the '-'". So, if we have an optstring of "abc:d" and getopts is invoked as

getopts abc:d var -abcfoo -d

then when 'var' is set to 'c', is OPTARG supposed to be "abfoo" (that is, we remove the 'c' and the '-' as instructed)?
No, that can't be right. The option-argument, at least as implied by XBD 12.1 (which isn't referenced anywhere in XCU 3/getopts, directly or indirectly; only XBD 12.2 is), is the string which follows the option when it is included in the same argument as the option, so the 'ab' should not be included, just "foo". But the '-' does not follow the option there either, so why is the standard saying that the '-' must be removed? Why not just say that OPTARG is the option-argument (properly defined by an xref) and leave it at that?

Incidentally, XBD 3.244 is not very helpful here; all it says is that an Option-Argument is: "A parameter that follows certain options. In some cases an option-argument is included within the same argument string as the option--in most cases it is the next argument." The "follows" is suggestive, but "included within the same argument string" leaves more possibilities open. And why does it say "certain options"? If it means options that require one, those aren't "certain"; just "some options" would be better there.

In the RATIONALE, lines 98964-6 say: "Although a leading <plus-sign> in optstring is required to have no effect on the behavior of getopt(), this standard intentionally allows implementations of the getopts utility to use a leading <plus-sign> as an extension that alters behavior." First, I am not sure just where it intentionally does that; the RATIONALE isn't a normative part of the standard, so that paragraph can't be it. Did I miss something? But ignoring that: implementations are to be allowed to support a leading '+' in optstring. How does that affect text such as line 98821 (and I think other places, like line 98895; there might be more): "If the first character of optstring is a <colon> ..."? In XSH/getopt it is clear that the optional '+' precedes the optional ':' in optstring, but if that order is followed here, how can that ':' be the first character of optstring?
Must the application use only one or the other, or is getopts doing the reverse of getopt() and requiring the order ":+..." (and if so, where does it say so), or should the wording here be fixed so it works like the getopt() function?

And while we're here: the first mention of options (line 98803) should contain an xref to XBD 3.243, the first mention of option-arguments (also on line 98803) should have an xref to XBD 3.244, and the first mention of operand (I think on line 98831) should have an xref to XBD 3.241. These xrefs each then refer to XBD 12.1, which shows better than the definitions how those things are formed (particularly in bullet point 1), but referencing the definitions is better, I think (XBD 12.1 does not refer back to XBD 3). |
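The OPTARG question raised in the description (optstring "abc:d" with the argument "-abcfoo") can be checked with a short script. This is a minimal POSIX sh sketch, assuming a shell that behaves the way the accepted text in Note: 0006600 requires (OPTARG holds only the characters of the option-argument); the function name `show_optarg` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report each option found with optstring "abc:d".
# For 'c', which takes an option-argument, print what ended up in OPTARG.
show_optarg() {
    OPTIND=1                      # start a fresh scan of this function's "$@"
    while getopts abc:d opt; do
        case $opt in
        c) printf 'c OPTARG=%s\n' "$OPTARG" ;;
        *) printf '%s\n' "$opt" ;;
        esac
    done
}

show_optarg -abcfoo -d    # option-argument attached to the option character
show_optarg -c foo        # option-argument supplied as the next argument
```

Both calls are expected to report `c OPTARG=foo`: neither the preceding "ab" nor the '-' belongs to the option-argument.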
Desired Action |
Fix it all... Maybe some wording, for some of it, may follow sometime later, in a note. |
Tags | applied_after_i8d3, issue8
Attached Files |
Relationships |
Notes | |
(0006555) kre (reporter) 2023-10-28 05:19 |
I have just realised there is yet another problem with the spec of getopts beyond those above... On page 2955 (lines 98843...) - right at the bottom of that page (which is the first page of the getopts spec) it says: Any other attempt to invoke getopts multiple times in a single shell execution environment with parameters (positional parameters or arg operands) that are not the same in all invocations, or with an OPTIND value modified to be a value other than 1, produces unspecified results. The problem is that final "or with an OPTIND value modified..." as the spec actually requires that getopts modify OPTIND each time it is invoked, and some of those modifications will be to values other than 1 (and the application cannot know, in advance, when that will happen). In effect that sentence (the "produces unspecified results") means that every invocation of getopts, other than the first after OPTIND has been initialised to 1, is potentially unspecified. I suspect what this sentence meant to say was "or with an OPTIND value modified by the application to be a value other than 1," - but that isn't what it currently says. |
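The distinction this note draws, between getopts changing OPTIND itself and the application changing it, can be illustrated with a minimal POSIX sh sketch (the function name `two_passes` is hypothetical). The same parameter list is scanned twice in one shell execution environment, and the only change the application makes to OPTIND is resetting it to 1 before each pass.

```sh
#!/bin/sh
# Hypothetical helper: scan the positional parameters twice with the
# optstring "ab". getopts updates OPTIND on every call; the application
# itself only ever assigns OPTIND=1, which restarts the scan.
two_passes() {
    for pass in 1 2; do
        OPTIND=1                  # the only application change to OPTIND
        seen=
        while getopts ab opt; do
            seen=$seen$opt        # collect the option characters found
        done
        printf 'pass %s: %s OPTIND=%s\n' "$pass" "$seen" "$OPTIND"
    done
}

two_passes -ab -b operand
```

Both passes should report the options `abb` and a final OPTIND of 3, the index of "operand".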
(0006556) kre (reporter) 2023-10-28 05:36 edited on: 2023-10-28 06:34 |
This note deleted ... it just wondered about some relationships with other issues that were inadvertently applied (as noted in Note: 0006558) Since that has been fixed, there is no need for a note asking about it. Nor was there any need for any apologies - mistakes happen, it just surprised me at first - when I added Note: 0006557 (to the correct 0001785) I had worked out what probably happened. Thanks for fixing it so quickly. |
(0006558) Don Cragun (manager) 2023-10-28 06:25 |
Re Note: 0006556: I apologize; I should know better than to try to update bug reports this late at night. I intended to note the relationships between 0001785 (instead of this bug) and 0001535, 0001393, and 0000351. I will correct the relationships now. |
(0006568) shware_systems (reporter) 2023-11-13 18:09 |
I think the getopts utility interface originally assumed that a user would voluntarily precede every option with a <dash> (or <plus>) as a separate argument, e.g. "-a -b" rather than "-ab", and that grouping multiple options was merely a syntax-line documentation convenience. There may also have been thoughts of making it the shell's responsibility to split grouped options into this format before processing lines of a script, so getopts wouldn't need to be bothered, but it doesn't look like any shell ever implemented this. OPTIND as documented would then unambiguously specify which argument with a leading option <dash> was next to be referenced. Without such munging, it is probably better to make OPTIND an opaque variable of unspecified format, not numeric, that only getopts may reliably reference. |
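What a shell's OPTIND shows while it is partway through a grouped argument such as "-xy" can be observed with a small POSIX sh sketch (the function name `trace_optind` is hypothetical). As the description notes, the intermediate values differ between shells (yash even uses an index:offset form), so only the value after getopts returns 1 is worth relying on.

```sh
#!/bin/sh
# Hypothetical helper: print the option found and OPTIND after each call,
# then the final OPTIND once getopts reports the end of options.
trace_optind() {
    OPTIND=1
    while getopts xyd opt; do
        # Intermediate OPTIND values here vary between shells.
        printf '%s OPTIND=%s\n' "$opt" "$OPTIND"
    done
    # After getopts returns 1, OPTIND is an integer: here 1 plus the
    # number of parameters, because "-xy -d" contains no operands.
    printf 'end OPTIND=%s\n' "$OPTIND"
}

trace_optind -xy -d
```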
(0006569) kre (reporter) 2023-11-13 20:16 |
Re: Note: 0006568 The first paragraph cannot possibly be correct: Unix programs have been using multiple flag options after a single '-' since about when (perhaps exactly when; 'twas before my time) they were invented. "ls -al" is a simple example that has been with us forever. There is no way that anyone, anywhere, ever, would have even considered requiring that to be "ls -a -l". Further, it is getopts' role to parse the option arguments (and was getopt's before that, as much as it was able); expecting the shell to parse them (which it would need to do to distinguish between a: and al as the optstring, which changes how ls -al would need to be treated) and then invoke getopts to parse them again would be absurd. The second paragraph (the 2nd sentence in particular; we can't do the first, as there is no existing standard to document) I almost agree with, except that we write "unspecified value" not "opaque" (the meaning is almost the same), and that we must require OPTIND to contain a string representing an integer after getopts has returned "no more" (i.e., exit status 1), as we must be able to do "shift $(( OPTIND - 1 ))". In general, the only time a script should reference OPTIND is after getopts has indicated that the options are done (and with that in mind, it might be worth adding a note in the APPLICATION USAGE section advising against a "break" out of a while getopts ... ; do ... ; done loop; the loop should be allowed to terminate naturally), and it can be set to 1 (OPTIND=1) before the getopts loop starts, to reinitialize things. |
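The usage recommended in this note (reset OPTIND to 1, let the loop end naturally, then rely on OPTIND being an integer to discard the processed options) is the familiar pattern below. It is a minimal POSIX sh sketch; the function name `parse` and its -a/-b options are hypothetical.

```sh
#!/bin/sh
# Hypothetical command: -a is a flag, -b takes an option-argument.
parse() {
    OPTIND=1                      # reinitialize before the loop
    aflag= bval=
    while getopts ab: opt; do
        case $opt in
        a) aflag=1 ;;
        b) bval=$OPTARG ;;
        *) echo 'usage: parse [-a] [-b value] [operand...]' >&2
           return 2 ;;
        esac
    done                          # no "break": the loop ends on its own
    shift $((OPTIND - 1))         # drop options; operands remain in "$@"
    printf 'a=%s b=%s operands=%s\n' "$aflag" "$bval" "$*"
}

parse -a -b value one two
```

Because the shift happens only after getopts has returned 1, it uses the one OPTIND value that is required to be an integer.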
(0006598) geoffclare (manager) 2023-12-11 17:37 |
The behavior when optstring contains a leading <plus-sign> is unspecified, courtesy of the following normative text in the description of optstring: "The characters <question-mark> and <colon> shall not be used as option characters by an application. The use of other option characters that are not alphanumeric produces unspecified results." |
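Since a leading '+' is unspecified for getopts, a portable optstring starts with at most a ':'. The following minimal POSIX sh sketch of that leading-colon mode uses a hypothetical function name, `check`. In this mode, errors are reported through the name variable and OPTARG instead of as diagnostics on standard error.

```sh
#!/bin/sh
# Hypothetical helper using the leading-colon (silent) form ":ab:".
check() {
    OPTIND=1
    while getopts :ab: opt; do
        case $opt in
        a) printf 'option -a\n' ;;
        b) printf 'option -b with %s\n' "$OPTARG" ;;
        :) printf 'missing option-argument for -%s\n' "$OPTARG" ;;  # name is ':'
        *) printf 'unknown option -%s\n' "$OPTARG" ;;               # name is '?'
        esac
    done
}

check -a -x -b
```

In both error cases OPTARG holds the offending option character, and nothing is written to standard error.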
(0006600) Don Cragun (manager) 2023-12-11 17:50 edited on: 2023-12-14 16:51 |
On P67, L2057-2058 (XBD 3.244 Option-Argument definition) change:
A parameter that follows certain options. In some cases an option-argument is included within the same argument string as the option--in most cases it is the next argument.
to:
A parameter that follows certain options. In some cases an option-argument immediately follows the option character within the same argument string as the option; otherwise the option-argument is the next argument string.

On page 2955 line 98801 change:
getopts optstring name [arg...]
to:
getopts optstring name [param...]
and globally rename s/arg/param/ elsewhere in the remainder of the getopts page.

On page 2955 lines 98806-98808 change:
Each time it is invoked, the getopts utility shall place the value of the next option in the shell variable specified by the name operand and the index of the next argument to be processed in the shell variable OPTIND. Whenever the shell is invoked, OPTIND shall be initialized to 1.
to:
When the shell is first invoked, the shell variable OPTIND shall be initialized to 1. Each time getopts is invoked, it shall place the value of the next option found in the parameter list in the shell variable specified by the name operand and the shell variable OPTIND shall be set as follows:

Replace lines 98830-98835 with:
When the end of options is encountered, the getopts utility shall exit with a return value of one; the shell variable OPTIND shall be set to the index of the argument containing the first operand in the parameter list, or the value 1 plus the number of elements in the parameter list if there are no operands in the parameter list; the name variable shall be set to the <question-mark> character. Any of the following shall identify the end of options: the first "--" element of the parameter list that is not an option-argument, finding an element of the parameter list that is not an option-argument and does not begin with a '-', or encountering an error.
Change lines 98836-98837 from:
The shell variables OPTIND and OPTARG shall be local to the caller of getopts and shall not be exported by default.
to:
The shell variables OPTIND and OPTARG shall not be exported by default.

Change lines 98840-98841 from:
The shell variable specified by the name operand, OPTIND, and OPTARG shall affect the current shell execution environment;
to:
The getopts utility can affect OPTIND, OPTARG, and the shell variable specified by the name operand, within the current shell execution environment;

On P2956, L98845-98846 change:
... or with an OPTIND value modified to be a value other than 1, produces unspecified results
to:
... or with an OPTIND value modified by the application to be a value other than 1, produces unspecified results

On P2956, L98861-98863 change:
If the option-argument is not supplied as a separate argument from the option character, the value in OPTARG shall be stripped of the option character and the '-'.
to:
Whether or not the option-argument is supplied as a separate argument from the option character, the value in OPTARG shall only be the characters of the option-argument.

Change P2956, L98868-98869 from:
The getopts utility by default shall parse positional parameters passed to the invoking shell procedure. If args are given, they shall be parsed instead of the positional parameters.
to:
By default, the list of parameters parsed by the getopts utility shall be the positional parameters currently set in the invoking shell environment ("$@"). If param operands are given, they shall be parsed instead of the positional parameters. Note that the next element of the parameter list need not exist; in this case, OPTIND will be set to $#+1 or the number of param operands plus 1.
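As an illustration of the replacement sentences about exporting and the current shell execution environment, here is a minimal POSIX sh sketch (the function name `env_demo` is hypothetical). Changes getopts makes inside a subshell stay in that subshell, changes made in the current environment persist, and OPTARG is not passed to child processes. The last check assumes OPTARG was not already exported when the script started.

```sh
#!/bin/sh
# Hypothetical helper showing where getopts' variable changes are visible.
env_demo() {
    OPTIND=1
    # In a subshell, the changes are confined to that subshell's environment.
    ( getopts c: opt -c value )
    printf 'after subshell: OPTIND=%s\n' "$OPTIND"

    # In the current shell execution environment, the changes persist.
    getopts c: opt -c value
    printf 'after getopts: opt=%s OPTARG=%s OPTIND=%s\n' \
        "$opt" "$OPTARG" "$OPTIND"

    # OPTARG is not exported by default, so a child process cannot see it.
    if env | grep '^OPTARG=' >/dev/null; then
        printf 'OPTARG exported\n'
    else
        printf 'OPTARG not exported\n'
    fi
}

env_demo
```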
After P2958, L98964-98966, which currently says:
Although a leading <plus-sign> in optstring is required to have no effect on the behavior of getopt(), this standard intentionally allows implementations of the getopts utility to use a leading <plus-sign> as an extension that alters behavior.
add a new sentence:
In fact, a <plus-sign> anywhere in the optstring in the getopts utility produces unspecified behavior. |
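The end-of-options rules in the replacement text for lines 98830-98835 can be exercised with a minimal POSIX sh sketch. The function name `first_operand` is hypothetical, and the sketch assumes the optstring "ab:". It prints OPTIND and the first operand once getopts has returned 1.

```sh
#!/bin/sh
# Hypothetical helper: run getopts to completion, then report where the
# operands start.
first_operand() {
    OPTIND=1
    while getopts ab: opt; do
        :                         # options are ignored in this sketch
    done
    shift $((OPTIND - 1))
    printf 'OPTIND=%s first=%s\n' "$OPTIND" "${1-none}"
}

first_operand -a -- -b x   # "--" ends options; "-b" is then an operand
first_operand -a foo -a    # "foo" does not begin with '-'
first_operand -b -- x      # here "--" is the option-argument of -b
```

The expected results are `OPTIND=3 first=-b`, `OPTIND=2 first=foo` and `OPTIND=3 first=x`. In each case OPTIND is the index of the first operand.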
(0006602) kre (reporter) 2023-12-11 23:41 edited on: 2023-12-11 23:42 |
Re Note: 0006600 The changes before the final one (lines 98868...) all look to be OK to me (at least without yet considering their context; I have yet to fit them into place to see what the whole thing reads like), but that final change looks "off" to me, specifically its final sentence. What does "Note that the next element of the parameter list need not exist;" mean here? And what relationship is $# supposed to have to the case where param operands are given? That text looks like it was perhaps meant to be placed elsewhere, or is perhaps an alternate version of text that was added elsewhere. |
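The case in question, where no element of the parameter list follows the last option or option-argument, can be shown with a minimal POSIX sh sketch (the function names `from_positional` and `from_params` are hypothetical). With positional parameters, OPTIND ends at $#+1. With explicit param operands, it ends at the number of those operands plus 1.

```sh
#!/bin/sh
# Hypothetical helpers: every parameter is an option or option-argument,
# so there is no operand for OPTIND to index once getopts returns 1.
from_positional() {
    OPTIND=1
    while getopts ab: opt; do :; done
    printf 'positional: count=%s OPTIND=%s\n' "$#" "$OPTIND"
}

from_params() {
    OPTIND=1
    # Explicit param operands replace this function's positional parameters.
    while getopts ab: opt -a -b value -a; do :; done
    printf 'params: OPTIND=%s\n' "$OPTIND"
}

from_positional -a -b value    # 3 parameters, so OPTIND ends at 4
from_params                    # 4 param operands, so OPTIND ends at 5
```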
(0006603) kre (reporter) 2023-12-11 23:52 |
I agree with Note: 0006598. What I don't understand is why the Rationale (in the section quoted in the description here) says: "this standard intentionally allows implementations of the getopts utility to use a leading <plus-sign> as an extension that alters behavior." That is, I don't see where it intentionally does that, or why, given that we agree it would produce unspecified results, anything needs to be said in the Rationale about it at all. Lots of things are permitted by the standard as extensions, and almost none of them warrant mention. When they do, it is usually to explain why a common extension (or a historic but obsolete usage) has not been included in the standard, rather than to suggest that some unspecified extra feature is somehow a good thing, so good that it is being "intentionally allowed". |
(0006604) geoffclare (manager) 2023-12-12 09:25 edited on: 2023-12-12 09:25 |
Re Note: 0006603 That rationale is there in order to draw attention to the difference between getopt() and getopts in their treatment of a leading <plus-sign>. |
(0006606) kre (reporter) 2023-12-12 21:22 |
Re Note: 0006604 That's fine - having something in the rationale - but it still says it "intentionally allows" and unless you can point to some normative text which does that, I consider that sentence to be outright fabrication (or if you like, a lie). What it should say is something more like: Although a leading <plus-sign> in optstring is required to |
(0006607) Don Cragun (manager) 2023-12-14 07:05 |
Re: Note: 0006602
|
(0006608) geoffclare (manager) 2023-12-14 09:46 |
Re Note: 0006606 It is not a fabrication; it is a factually correct statement about the decision-making process during resolution of bug 0000191. The decision we made was to require the requested leading <plus-sign> behaviour for getopt() but intentionally leave it unspecified for getopts in order to allow it to be used for extensions, as stated in the rationale for getopt(): "Note that the use of a leading <plus-sign> in optstring is only standardized for getopt(). Use of a <plus-sign> is intentionally left unspecified for the getopts utility, where historical implementations did not require a leading <plus-sign> for conforming behavior, and because some historical getopts implementations used a leading <plus-sign> for a different extension." |
(0006609) Don Cragun (manager) 2023-12-14 16:50 edited on: 2023-12-14 16:53 |
Note: 0006600 has been updated in place to address issues discussed in Note: 0006602-Note: 0006604 and Note: 0006606-Note: 0006608. The status remains resolved. |
(0006610) kre (reporter) 2023-12-15 00:05 |
Re Note: 0006607, Note: 0006608 and Note: 0006609: I know what you're trying to say in that paragraph at lines 98868..., but the wording is still poor. "The next element in the parameter list" is just ordinary English text. If you wanted to make it something special, so that a reader would associate it with the earlier text, you'd need either to define a term to represent this object or, at the very least, to set it off using some stylistic mechanism (quotes, italics, ...), in both cases. It is worse because the sentence that starts "Note that" is in the paragraph about where the parameters come from, so it is natural to read "in this case" as referring to the case where the params are given on the getopts command line, rather than to those that come from the positional params, which is not what is intended. It would be improved if that were a separate paragraph (but that alone is not really enough). I'd prefer that, instead of "next argument in the parameter list need not exist", it simply said "there need not be any parameters in the list following the last option or option-argument processed", and then in the earlier bullet points, instead of making OPTIND the index of the next element of the parameter list, if it exists, just define it to be one more than the index of the last param consumed by the processing (which necessarily must exist, and hence avoids both the philosophical issue of how something which does not exist gets numbered and the need to handle that case specially). I really see no need for obscure wording when alternatives are just as good, and clearer. With respect to the other issue: the sentence being added (currently the last sentence in Note: 0006600) is OK, but is unrelated to the issue I have. Note: 0006608 makes things quite clear, I think, and requires a change of the wording. What is clear from that note is that the standards developers decided to leave '+' unspecified. That's fine, and if the text said that, it would be OK.
But it doesn't. It says that "the standard intentionally"... which is a different thing entirely. Believe it or not, the standards developers, and the standard, are two different things. Since no-one has even attempted to show any normative text in the standard which is "intentionally allowing" the use of the leading plus-sign in the getopts optstring, then I conclude that it simply does not exist. If you were to take the opinion that anything which is unspecified is intentionally allowing any kind of behaviour, then you're opening a whole bunch of cans of worms all over the place, which would make it very difficult to ever add anything to the standard, because all of those unspecified items can be argued to be intentionally allowing whatever I want them to mean - hence blocking any different interpretation in the future. Again, this is just wording, not the intended meaning, I believe, so I really cannot understand the insistence on not using words that are clearer, just because some other words currently exist. You can also compare the words added to getopt() by 0000191 (which are reasonable) which were quoted in Note: 0006608 where it says that the use of a leading <plus-sign> in getopts is unspecified. That's fine. But what the rationale in getopts is saying is that the standard is intentionally allowing implementations to use a leading '+' - which is an entirely different thing to say. Please, just fix the wording! |
(0006611) geoffclare (manager) 2023-12-18 10:43 |
> Believe it or not, the standards developers, and the standard, are two different things. That is true, but the standard, since it has no consciousness, cannot itself intend anything. The phrase "this standard intentionally allows" effectively means "the standard developers intentionally made this standard allow". As per the decision made in the Dec 14 teleconference, this bug has now been applied to the troff source so that it can be included in draft 4. If you believe that the wording should be changed to state that the intention here is that of the standard developers, by all means submit a bug against draft 4. However, the intention itself should remain described the way it is. Trying to alter it would be an attempt to alter the past, because it is a true statement about the intentions of the standard developers when bug 191 was resolved. |
Issue History | |||
Date Modified | Username | Field | Change |
2023-10-22 06:14 | kre | New Issue | |
2023-10-22 06:14 | kre | Name | => Robert Elz |
2023-10-22 06:14 | kre | Section | => XCU 3 / getopts |
2023-10-22 06:14 | kre | Page Number | => 2955 - 2959 |
2023-10-22 06:14 | kre | Line Number | => 98803 - 98966 |
2023-10-22 06:40 | kre | Tag Attached: issue8 | |
2023-10-28 05:08 | Don Cragun | Relationship added | related to 0001535 |
2023-10-28 05:10 | Don Cragun | Relationship added | related to 0001393 |
2023-10-28 05:10 | Don Cragun | Relationship added | parent of 0000351 |
2023-10-28 05:19 | kre | Note Added: 0006555 | |
2023-10-28 05:36 | kre | Note Added: 0006556 | |
2023-10-28 06:25 | Don Cragun | Note Added: 0006558 | |
2023-10-28 06:27 | Don Cragun | Relationship deleted | related to 0001535 |
2023-10-28 06:28 | Don Cragun | Relationship deleted | related to 0001393 |
2023-10-28 06:34 | Don Cragun | Relationship deleted | parent of 0000351 |
2023-10-28 06:34 | kre | Note Edited: 0006556 | |
2023-11-13 18:09 | shware_systems | Note Added: 0006568 | |
2023-11-13 20:16 | kre | Note Added: 0006569 | |
2023-11-14 09:44 | geoffclare | Tag Detached: issue8 | |
2023-11-15 22:23 | salewski | Issue Monitored: salewski | |
2023-12-11 17:37 | geoffclare | Note Added: 0006598 | |
2023-12-11 17:41 | Don Cragun | Note Added: 0006599 | |
2023-12-11 17:50 | Don Cragun | Note Added: 0006600 | |
2023-12-11 17:55 | Don Cragun | Note Edited: 0006600 | |
2023-12-11 17:58 | Don Cragun | Note Edited: 0006600 | |
2023-12-11 17:59 | Don Cragun | Status | New => Resolved |
2023-12-11 17:59 | Don Cragun | Resolution | Open => Accepted As Marked |
2023-12-11 18:01 | Don Cragun | Final Accepted Text | => See Note: 0006600. |
2023-12-11 18:02 | Don Cragun | Tag Attached: issue8 | |
2023-12-11 19:22 | Don Cragun | Note Deleted: 0006599 | |
2023-12-11 23:41 | kre | Note Added: 0006602 | |
2023-12-11 23:42 | kre | Note Edited: 0006602 | |
2023-12-11 23:52 | kre | Note Added: 0006603 | |
2023-12-12 09:25 | geoffclare | Note Added: 0006604 | |
2023-12-12 09:25 | geoffclare | Note Edited: 0006604 | |
2023-12-12 21:22 | kre | Note Added: 0006606 | |
2023-12-14 07:05 | Don Cragun | Note Added: 0006607 | |
2023-12-14 09:46 | geoffclare | Note Added: 0006608 | |
2023-12-14 09:47 | geoffclare | Relationship added | related to 0000191 |
2023-12-14 16:44 | Don Cragun | Note Edited: 0006600 | |
2023-12-14 16:50 | Don Cragun | Note Added: 0006609 | |
2023-12-14 16:51 | Don Cragun | Note Edited: 0006600 | |
2023-12-14 16:53 | Don Cragun | Note Edited: 0006609 | |
2023-12-15 00:05 | kre | Note Added: 0006610 | |
2023-12-18 10:27 | geoffclare | Status | Resolved => Applied |
2023-12-18 10:28 | geoffclare | Tag Attached: applied_after_i8d3 | |
2023-12-18 10:43 | geoffclare | Note Added: 0006611 | |
2024-06-11 09:12 | agadmin | Status | Applied => Closed |