Viewing Issue Simple Details
ID | Category | Severity | Type | Date Submitted | Last Update
0001320 | [1003.1(2016/18)/Issue7+TC2] Shell and Utilities | Editorial | Error | 2020-01-26 07:50 | 2024-06-11 09:08
Reporter | stephane | View Status | public
Assigned To |
Priority | normal | Resolution | Accepted As Marked
Status | Closed
Name | Stephane Chazelas
Organization |
User Reference |
Section | awk utility
Page Number | 2493
Line Number | 80185-80194
Interp Status | ---
Final Accepted Text | Note: 0005041
Summary | 0001320: /\n/ can match newline
Description |
There is some very bizarre/confused text in the awk specification:

> Except for the '~' and "!~" operators, and in the gsub,
> match, split, and sub built-in functions, ERE matching
> shall be based on input records; that is, record separator
> characters (the first character of the value of the
> variable RS, <newline> by default) cannot be embedded in
> the expression, and no expression shall match the record
> separator character. If the record separator is not
> <newline>, <newline> characters embedded in the expression
> can be matched. For the '~' and "!~" operators, and in
> those four built-in functions, ERE matching shall be based
> on text strings; that is, any character (including
> <newline> and the record separator) can be embedded in the
> pattern, and an appropriate pattern shall match any
> character.

It kind of implies that:

    echo x | awk -F'\n' '{$0 = "a\nb"; print /\n/; print $1}'

should print

    0
    a
    b

or possibly

    0
    x

because /ERE/ or FS cannot match on the record separator and should match on the input record.

That's not what awk implementations do. RE matching in those cases is not done on input records but on $0. The fact that $0 (in statements other than BEGIN) is initialised from the value of the current input record (which *at that point* didn't contain the then current value of RS) is irrelevant to describing how RE matching is done.

RE matching behaviour is totally independent of the value of RS. RS is only used at the time a record is read.
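The command from the description can be run directly to see what an implementation actually does. This is only a sketch: the variable name `out` is introduced here for illustration, and the result stated is what the report says implementations do, not something the quoted specification text guarantees.

```shell
# Overwrite $0 with a string containing a newline, then test whether
# /\n/ matches it and what $1 becomes when FS is a newline.
out=$(echo x | awk -F'\n' '{ $0 = "a\nb"; print /\n/; print $1 }')
printf '%s\n' "$out"
```

With matching and field splitting done on the current value of $0, as the report describes, this prints 1 followed by a, rather than either of the outputs the quoted text seems to imply.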
Desired Action |
Replace that whole section with something along the lines of:

    If no subject is specified (as one is with ~, !~, match()...), regexps are matched against the current value of $0.

Also, whether awk can deal with non-text data (NUL, byte values that don't form valid characters, strings longer than LINE_MAX) should probably be moved to some more generic section not specific to RE matching.
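As a small illustration of the proposed wording, the sketch below reassigns $0 after the record has been read and then uses bare /ERE/ expressions. The input text and the variable name `out` are made up for this example.

```shell
# The record read from input is "foo bar", but $0 is replaced before matching.
# A bare /ERE/ is matched against the current $0, not the original record.
out=$(printf 'foo bar\n' | awk '{ $0 = "baz"; print /foo/; print /baz/ }')
printf '%s\n' "$out"
```

This prints 0 and then 1: /foo/ no longer matches because $0 is now "baz", even though the input record contained "foo".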
Tags | tc3-2008
Attached Files |
Issue History
Date Modified | Username | Field | Change |
2020-01-26 07:50 | stephane | New Issue | |
2020-01-26 07:50 | stephane | Name | => Stephane Chazelas |
2020-01-26 07:50 | stephane | Section | => awk utility |
2020-10-08 16:40 | Don Cragun | Page Number | => 2493 |
2020-10-08 16:40 | Don Cragun | Line Number | => 80185-80194 |
2020-10-08 16:40 | Don Cragun | Interp Status | => --- |
2020-10-09 09:43 | geoffclare | Project | 1003.1(2013)/Issue7+TC1 => 1003.1(2016/18)/Issue7+TC2 |
2020-10-09 10:23 | geoffclare | Note Added: 0005041 | |
2020-10-12 15:29 | geoffclare | Final Accepted Text | => Note: 0005041 |
2020-10-12 15:29 | geoffclare | Status | New => Resolved |
2020-10-12 15:29 | geoffclare | Resolution | Open => Accepted As Marked |
2020-10-12 15:29 | geoffclare | Tag Attached: tc3-2008 | |
2020-12-04 16:36 | geoffclare | Status | Resolved => Applied |
2024-06-11 09:08 | agadmin | Status | Applied => Closed |