View Issue Details
| ID | Project | Category | View Status | Date Submitted | Last Update |
|---|---|---|---|---|---|
| 0001973 | 1003.1(2024)/Issue8 | Shell and Utilities | public | 2026-03-06 07:22 | 2026-06-06 14:42 |
| Reporter | stephane | Assigned To | |||
| Priority | normal | Severity | Objection | Type | Clarification Requested |
| Status | Interpretation Required | Resolution | Accepted As Marked | ||
| Name | Stephane Chazelas | ||||
| Organization | |||||
| User Reference | |||||
| Section | awk utility | ||||
| Page Number | 2610 | ||||
| Line Number | 85386-85394 | ||||
| Interp Status | Pending | ||||
| Final Accepted Text | see 0001973:0007438 | ||||
| Summary | 0001973: awk "numeric string " origins | ||||
| Description | The awk specification (https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/utilities/awk.html#tag_20_06_13_02) has: "A string value shall be considered a numeric string if it comes from one of the following: 1. Field variables 2. Input from the getline() function 3. FILENAME 4. ARGV array elements 5. ENVIRON array elements 6. Array elements created by the split() function 7. A command line variable assignment 8. Variable assignment from another numeric string variable". This can be interpreted as meaning that, for instance, `awk 'BEGIN{$1 = "10"; print ($1 > 2)}'` should output 1. But no implementation that I know of does so: by assigning a string to $1, the field loses the special property whereby, when it contains a string that looks like a number, it is considered a number. The same applies to ARGV, FILENAME... Incidentally, there is a typo in the Rationale section: in "[...] also shall have the numeric value of the numeric string" was removed from several sections of the ISO POSIX-2:1993 standard because *is* specifies an unnecessary implementation detail", *is* should be *it*. | ||||
| Desired Action | Make it clear that, for item 1, the candidates for numeric strings are the values resulting from the splitting of $0 into $1, $2... (upon first dereferencing after reading a record, including via getline, or after assigning to $0), not the field variables per se. Alternatively, change it to "Field variables unless subsequently assigned a string value". Likewise for item 3, specify the current input file name as initially assigned to FILENAME, or "FILENAME unless subsequently assigned a string value"; and so on for ARGV and ENVIRON. Or add some verbiage below that list along the lines of: "And the corresponding variables have not been subsequently assigned a string value." That still leaves it ambiguous for things like `$1 = "10"; $0 = "11 12"; print ($1 > 2)`, where $1 becomes a numeric string again after the assignment to $0. | ||||
| Tags | tc1-2024 | ||||
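
The following is a minimal shell sketch of the cases discussed above. It assumes a POSIX `sh` and whichever `awk` is first in `PATH`, and the shell variable names are illustrative only. The expected values in the comments are the results the report attributes to the implementations it mentions. The third case is the `$0` reassignment from the Desired Action, where `$1` becomes a numeric string again.

```shell
#!/bin/sh
# $1 produced by splitting an input record: a numeric string, so
# comparing "10" with 2 is numeric and prints 1.
from_record=$(echo 10 | awk '{ print ($1 > 2) }')

# $1 assigned a string constant: a plain string, so "10" is compared
# with "2" as strings and prints 0 ("1" sorts before "2").
from_constant=$(awk 'BEGIN { $1 = "10"; print ($1 > 2) }')

# Assigning to $0 re-splits the record, so $1 ("11") is a numeric
# string again and the comparison is numeric (prints 1).
after_resplit=$(awk 'BEGIN { $1 = "10"; $0 = "11 12"; print ($1 > 2) }')

printf 'from record:    %s\n' "$from_record"
printf 'from constant:  %s\n' "$from_constant"
printf 'after re-split: %s\n' "$after_resplit"
```

In the first two commands, `$1` holds the same text `10`. Only the origin of that value differs.
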

It may also be worth clarifying (perhaps in a separate ticket) what happens in `sub(ere, repl[, in])` or `gsub(ere, repl[, in])` when `in` (or $0 if omitted) was a numeric string and at least one substitution was made. In that case it becomes a non-numeric string, even if it contains a valid representation of a number. For instance, `printf '%s\n' 12 13 | awk '{gsub("2", "2")}; $0 > 2'` should output only 13. In 12, the 2 is successfully substituted with 2, which turns the record into a string that is not greater than "2". Meanwhile, 13 remains a numeric string because no substitution was made.

For context, that came up at https://unix.stackexchange.com/questions/804798/awk-comparing-to-constant-numbers

> 1. the values resulting from the splitting of $0 into $1, $2...

Sorry, that wording is insufficient, as it doesn't cover $0 itself. For $0, it is the assignment from input (the current record, or via getline) that makes the value a candidate numeric string.

For the case where $0 is recomputed after individual fields are modified, I find that the behaviour varies between implementations. `echo 10 | LC_ALL=C awk '{$1 = $1}; $0 > 2'` outputs 10 in mawk, but not in busybox, GNU or bwk's `awk`. By contrast, `echo 10 | LC_ALL=C awk -v OFS=. '{$2 = 3}; $0 > 2'` outputs nothing in any of them: the recomputed record, 10.3, is not treated as a numeric string.

Adjust summary as requested (seq 38910)

Interpretation response
------------------------
The standard is unclear on this issue, and no conformance distinction can be made between alternative implementations based on this. This is being referred to the sponsor.

Rationale:
-------------
Strings are enclosed in double-quotes, and the standard says this. However, it is unclear what a numeric string means.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
At page 2610, line 85386, change:

to:

Re: 0001973:0007438

Thanks, but I fear the issue has been misinterpreted here, as that resolution seems to be beside the point.

My point is that the current POSIX wording would suggest that `awk 'BEGIN{ $1 = 1 2 3; $2 = 4 5; print ($1 < $2) }'` would be required to output 0, since those $1 and $2 are *field variables* and contain something that looks like a number. No awk implementation outputs 0 here; they all output 1. What matters is not whether the variable is a *field variable*, but where the value assigned to it (or to any other variable) comes from. Here, those values come from the concatenation operator, so they are *strings*, not *numeric strings*.

Just to clarify for those not familiar with this quirk of the awk language: numeric string *values* are strings, in that their exact text representation is preserved. For example, `echo 01.000e0 | awk '{print $1}'` outputs `01.000e0`. Yet they are treated numerically by the comparison operators: `echo 01.000e0 1 | awk '{print $1 == $2}'` outputs `1`. This is because they contain a representation of a number and come from some particular origin, here the implicit splitting of the input record.

One might argue that it's a misdesign for awk's `==`, `!=`, `<`, `>`, `<=` and `>=` operators to have been overloaded to do both string and numeric comparison (see how Perl introduced the separate `eq`, `ne`, `lt`, `gt`, `le` and `ge` operators). But that's not the point I'm trying to make here, and I'm not suggesting that it be changed in awk (too late for that).
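
The contrast drawn in this note can be reproduced with a short shell sketch. It again assumes a POSIX `sh` and any `awk` in `PATH`, and the variable names are illustrative. The same field variables compare numerically when their values come from splitting a record, and as strings when they come from concatenation. A numeric string also keeps its exact text when printed.

```shell
#!/bin/sh
# Values from splitting the record "123 45": numeric strings,
# so 123 < 45 is false and 0 is printed.
split_cmp=$(echo '123 45' | awk '{ print ($1 < $2) }')

# Same field variables, but values produced by concatenation: plain strings,
# so "123" < "45" is a string comparison and 1 is printed.
concat_cmp=$(awk 'BEGIN { $1 = 1 2 3; $2 = 4 5; print ($1 < $2) }')

# A numeric string keeps its original text when printed...
text=$(echo 01.000e0 1 | awk '{ print $1 }')
# ...but compares numerically, so 01.000e0 == 1 is true.
num_eq=$(echo 01.000e0 1 | awk '{ print ($1 == $2) }')

printf '%s %s %s %s\n' "$split_cmp" "$concat_cmp" "$text" "$num_eq"
```
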

I'm suggesting a different wording:

> The following string values:
>
> - input records as assigned to $0 at the start of a cycle or to any variable by getline statements
> - the result of splitting strings, as assigned to field variables when $0 is assigned a new value, or as assigned to array element values by the split() function
> - the current input file name as assigned to FILENAME
> - the command line arguments as assigned to values of elements of the ARGV array
> - the environment variable values as assigned to values of the elements of the ENVIRON array
> - the values of command line variable assignments
>
> shall be considered numeric strings if they meet an implementation-dependent condition corresponding to either case (a) or (b) below:

It would also be worth noting that the type is attached to the *value*, not to the variable it is assigned to. The value's type is nevertheless preserved upon assignment to a variable or to an array element value (but not to an array key, which is always a string).

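
A sketch that checks each origin in the proposed list, plus the point that the type travels with the value. It makes the same assumptions (POSIX `sh`, any `awk` in `PATH`), and the environment variable `X`, the array `a` and the shell variable names are hypothetical. FILENAME is left out because it would need an input file. Under the reading in this note, every `via_*` result and `copied` should be 1, while `concatenated` should be 0.

```shell
#!/bin/sh
# Each origin in the proposed list yields a numeric string, so comparing
# the value "10" with 2 is numeric and should print 1.
via_split=$(awk 'BEGIN { split("10 20", a); print (a[1] > 2) }')
via_environ=$(X=10 awk 'BEGIN { print (ENVIRON["X"] > 2) }')
via_argv=$(awk 'BEGIN { print (ARGV[1] > 2) }' 10)
via_assignment=$(awk -v x=10 'BEGIN { print (x > 2) }')
via_getline=$(echo 10 | awk 'BEGIN { getline line; print (line > 2) }')

# The type belongs to the value: copying $1 into y keeps it a numeric string...
copied=$(echo 10 | awk '{ y = $1; print (y > 2) }')
# ...whereas concatenating with "" yields a plain string, compared as "10" vs "2".
concatenated=$(echo 10 | awk '{ y = $1 ""; print (y > 2) }')

printf 'split=%s environ=%s argv=%s -v=%s getline=%s copied=%s concatenated=%s\n' \
  "$via_split" "$via_environ" "$via_argv" "$via_assignment" "$via_getline" \
  "$copied" "$concatenated"
```

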
| Date Modified | Username | Field | Change |
|---|---|---|---|
| 2026-03-06 07:22 | stephane | New Issue | |
| 2026-03-06 08:01 | stephane | Note Added: 0007389 | |
| 2026-03-06 09:59 | stephane | Note Added: 0007391 | |
| 2026-03-06 10:20 | stephane | Note Added: 0007392 | |
| 2026-03-06 10:21 | stephane | Note Edited: 0007392 | |
| 2026-03-06 10:25 | stephane | Note Edited: 0007389 | |
| 2026-03-07 11:48 | agadmin | Summary | awk "string variables" origin => awk "numeric string " origins |
| 2026-03-07 11:48 | agadmin | Interp Status | => --- |
| 2026-03-07 11:48 | agadmin | Note Added: 0007393 | |
| 2026-06-04 15:34 | nick | Page Number | (page or range of pages) => 2610 |
| 2026-06-04 15:34 | nick | Line Number | (Line or range of lines) => 85386-85394 |
| 2026-06-04 16:27 | nick | Note Added: 0007438 | |
| 2026-06-04 16:28 | nick | Status | New => Resolved |
| 2026-06-04 16:28 | nick | Resolution | Open => Accepted As Marked |
| 2026-06-04 16:28 | nick | Final Accepted Text | => see 0001973:0007438 |
| 2026-06-04 16:28 | nick | Tag Attached: tc1-2024 | |
| 2026-06-04 16:30 | nick | Note Edited: 0007438 | |
| 2026-06-04 16:30 | nick | Status | Resolved => Interpretation Required |
| 2026-06-04 16:30 | nick | Interp Status | --- => Pending |
| 2026-06-05 07:08 | stephane | Note Added: 0007439 | |
| 2026-06-06 14:42 | stephane | Note Added: 0007440 | |