Austin Group Defect Tracker

Aardvark Mark IV


ID 0001320
Category [1003.1(2016/18)/Issue7+TC2] Shell and Utilities
Severity Editorial
Type Error
Date Submitted 2020-01-26 07:50
Last Update 2020-12-04 16:36
Reporter stephane
View Status public
Assigned To
Priority normal
Resolution Accepted As Marked
Status Applied
Name Stephane Chazelas
Organization
User Reference
Section awk utility
Page Number 2493
Line Number 80185-80194
Interp Status ---
Final Accepted Text Note: 0005041
Summary 0001320: /\n/ can match newline
Description There is some very bizarre/confused text in the awk specification:

> Except for the '~' and "!~" operators, and in the gsub,
> match, split, and sub built-in functions, ERE matching
> shall be based on input records; that is, record separator
> characters (the first character of the value of the
> variable RS, <newline> by default) cannot be embedded in
> the expression, and no expression shall match the record
> separator character. If the record separator is not
> <newline>, <newline> characters embedded in the expression
> can be matched. For the '~' and "!~" operators, and in
> those four built-in functions, ERE matching shall be based
> on text strings; that is, any character (including
> <newline> and the record separator) can be embedded in the
> pattern, and an appropriate pattern shall match any
> character.

It kind of implies that:

echo x | awk -F'\n' '{$0 = "a\nb"; print /\n/; print $1}'

should print

0
a
b

or possibly

0
x

because /ERE/ or FS cannot match on the record separator and should match on the input record.

That's not what awk implementations do.
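
For comparison, here is a minimal sketch that runs the same one-liner and captures what the local awk actually prints. The variable name `actual` is only illustrative. With common implementations such as gawk and mawk, the expectation is `1` followed by `a`: `/\n/` is matched against the assigned value of $0, and FS splits on the embedded newline.

```shell
# Run the one-liner from the report and capture the output of the local awk.
# "actual" is a hypothetical variable name used only for this sketch.
actual=$(echo x | awk -F'\n' '{$0 = "a\nb"; print /\n/; print $1}')

# Show the result: first line is the /\n/ match result, second line is $1.
printf '%s\n' "$actual"
```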


RE matching in those cases is not done on input records but on $0. The fact that $0 (in statements other than BEGIN) is initialised from the value of the current input record (which *at that point* did not contain the then-current value of RS) is irrelevant to describing how RE matching is done. RE matching behaviour is totally independent of the value of RS; RS is only used at the time a record is read.
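
A rough way to check the claim that RS plays no part in matching is to assign the same string to $0 under two different RS values and compare the results. This is only a sketch; the variable names `with_default_rs` and `with_custom_rs` and the test string are made up for illustration.

```shell
# Same program body, two different record separators.
# $0 is overwritten with a string containing both ';' and a newline,
# then matched against /;/ and /\n/.
prog='{
    $0 = "a;b\nc"
    m_semicolon = /;/
    m_newline = /\n/
    print m_semicolon, m_newline
}'

# Default RS (newline): the input record is "x".
with_default_rs=$(printf 'x\n' | awk "$prog")

# RS set to ";": the input record is again "x".
with_custom_rs=$(printf 'x;' | awk -v RS=';' "$prog")

printf 'default RS: %s\ncustom RS:  %s\n' "$with_default_rs" "$with_custom_rs"
```

If matching depended on RS, the two lines would differ; with the implementations discussed here they are expected to be identical.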
Desired Action Replace that whole section with something along the lines of:

If the subject is not specified (as it is with ~, !~, match(), ...), regexps are matched against the current value of $0.


Also, the text on whether awk can deal with non-text data (NUL, byte values that don't form valid characters, strings longer than LINE_MAX) should probably be moved to a more generic section that is not specific to RE matching.
Tags tc3-2008
Attached Files

-  Notes
(0005041)
geoffclare (manager)
2020-10-09 10:23

Rather than the kind of radical change suggested in the desired action, I would prefer to make a minimal fix. The purpose of the paragraph is to point out a difference between matching input records and matching text strings; the only real problem is that it incorrectly states the condition under which matching is against input records. I suggest that an appropriate fix would be the following.

Change:
Except for the '~' and "!~" operators, and in the gsub, match, split, and sub built-in functions, ERE matching shall be based on input records; that is, record separator characters ...
to:
When ERE matching is performed against input records; that is, the match is against $0 and the current value of $0 resulted from processing an input record, record separator characters ...

Change:
For the '~' and "!~" operators, and in those four built-in functions, ERE matching shall be based on text strings; that is, any character ...
to:
When ERE matching is not performed against input records, it shall be based on text strings; any character ...
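
The distinction drawn by the revised wording can be illustrated with a small sketch (variable names `same_record` and `other_rs` are hypothetical). In the first command, $0 initially comes from an input record read with the default RS, so it cannot contain a newline; after an assignment it is an ordinary text string and can. The second command covers the case where RS is not newline, so a newline can appear inside an input record.

```shell
# Case 1: default RS. Match /\n/ before and after assigning to $0.
same_record=$(printf 'a\n' | awk '{
    before = /\n/          # $0 came from an input record: no newline in it
    $0 = $0 "\n" "tail"    # $0 is now an assigned text string
    after = /\n/
    print before, after
}')

# Case 2: RS is ";" so the single input record "a<newline>b" contains a newline.
other_rs=$(printf 'a\nb;' | awk -v RS=';' '{ print /\n/ }')

printf 'default RS, before/after assignment: %s\n' "$same_record"
printf 'RS=";", newline inside record:       %s\n' "$other_rs"
```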

- Issue History
Date Modified Username Field Change
2020-01-26 07:50 stephane New Issue
2020-01-26 07:50 stephane Name => Stephane Chazelas
2020-01-26 07:50 stephane Section => awk utility
2020-10-08 16:40 Don Cragun Page Number => 2493
2020-10-08 16:40 Don Cragun Line Number => 80185-80194
2020-10-08 16:40 Don Cragun Interp Status => ---
2020-10-09 09:43 geoffclare Project 1003.1(2013)/Issue7+TC1 => 1003.1(2016/18)/Issue7+TC2
2020-10-09 10:23 geoffclare Note Added: 0005041
2020-10-12 15:29 geoffclare Final Accepted Text => Note: 0005041
2020-10-12 15:29 geoffclare Status New => Resolved
2020-10-12 15:29 geoffclare Resolution Open => Accepted As Marked
2020-10-12 15:29 geoffclare Tag Attached: tc3-2008
2020-12-04 16:36 geoffclare Status Resolved => Applied

