Austin Group Defect Tracker

Aardvark Mark IV


ID 0001320
Category [1003.1(2013)/Issue7+TC1] Shell and Utilities
Severity Editorial
Type Error
Date Submitted 2020-01-26 07:50
Last Update 2020-01-26 07:50
Reporter stephane
View Status public
Assigned To
Priority normal
Resolution Open
Status New
Name Stephane Chazelas
Organization
User Reference
Section awk utility
Page Number
Line Number
Interp Status ---
Final Accepted Text
Summary 0001320: /\n/ can match newline
Description There is some very bizarre/confused text in the awk specification:

> Except for the '~' and "!~" operators, and in the gsub,
> match, split, and sub built-in functions, ERE matching
> shall be based on input records; that is, record separator
> characters (the first character of the value of the
> variable RS, <newline> by default) cannot be embedded in
> the expression, and no expression shall match the record
> separator character. If the record separator is not
> <newline>, <newline> characters embedded in the expression
> can be matched. For the '~' and "!~" operators, and in
> those four built-in functions, ERE matching shall be based
> on text strings; that is, any character (including
> <newline> and the record separator) can be embedded in the
> pattern, and an appropriate pattern shall match any
> character.

It kind of implies that:

echo x | awk -F'\n' '{$0 = "a\nb"; print /\n/; print $1}'

should print

0
a
b

or possibly

0
x

because, read literally, neither /ERE/ nor FS can match the record separator, and both should be matched against the input record.

That's not what awk implementations do.
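
For reference, the command above can be run as-is to see what an implementation actually does. This is only a sketch: the variable name out is introduced here for illustration, and the result noted below is what I would expect from common implementations such as gawk and mawk, not something the specification text guarantees.

```shell
# Run the example from the report and capture what this awk prints.
# "out" is a hypothetical variable name used only for this illustration.
out=$(echo x | awk -F'\n' '{$0 = "a\nb"; print /\n/; print $1}')
printf '%s\n' "$out"
```

With the implementations mentioned, this should print 1 followed by a: /\n/ matches the newline in the reassigned $0, and FS splits $0 on that same newline.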


RE matching in those cases is done not on input records but on $0. The fact that $0 (in actions other than BEGIN) is initialised from the value of the current input record (which *at that point* did not contain the then-current value of RS) is irrelevant to how RE matching is done. RE matching behaviour is totally independent of the value of RS; RS is only used at the time a record is read.
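
The independence from RS can be illustrated with a record separator other than <newline>. The following is a minimal sketch, assuming ';' as RS and a hypothetical variable name rs_out; it is not taken from the specification.

```shell
# RS is ';', so the first record read from input is "x" and contains no ';'.
# After $0 is reassigned to a string containing ';', a bare /;/ matches it,
# which suggests matching is done against $0 regardless of RS.
rs_out=$(printf 'x;y;' | awk -v RS=';' 'NR == 1 { print /;/; $0 = "a;b"; print /;/ }')
printf '%s\n' "$rs_out"
```

This should print 0 for the record as read, then 1 once $0 has been reassigned to contain the record separator character.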
Desired Action Replace that whole section with something along the lines of:

If the subject is not specified explicitly (unlike with ~, !~, match()...), regexps are matched against the current value of $0.
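
Put differently, a bare /ERE/ would behave like $0 ~ /ERE/. A small sketch of that equivalence, assuming hypothetical variable names (same, bare, explicit) chosen only for this example:

```shell
# Compare a bare /\n/ with an explicit match against $0 after $0 is reassigned.
same=$(echo x | awk '{ $0 = "a\nb"; bare = /\n/; explicit = ($0 ~ /\n/); print bare, explicit }')
printf '%s\n' "$same"
```

Both values should come out as 1, since both forms match against the current value of $0.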


Also, whether awk can deal with non-text data (NUL, byte values that don't form valid characters, strings longer than LINE_MAX) should probably be moved to some more generic section not specific to RE matching.
Tags No tags attached.
Attached Files

Relationships

There are no notes attached to this issue.

Issue History
Date Modified Username Field Change
2020-01-26 07:50 stephane New Issue
2020-01-26 07:50 stephane Name => Stephane Chazelas
2020-01-26 07:50 stephane Section => awk utility

