Austin Group Defect Tracker

Aardvark Mark IV


ID: 0000773
Category: [1003.1(2008)/Issue 7] Base Definitions and Headers
Severity: Objection
Type: Enhancement Request
Date Submitted: 2013-10-16 02:51
Last Update: 2013-10-28 01:30
Reporter: dwheeler
View Status: public
Assigned To: ajosey
Priority: normal
Resolution: Open
Status: Under Review
Name: David A. Wheeler
Organization:
User Reference:
Section: 9 Regular Expressions
Page Number: 187-193
Line Number: 6068-6337
Interp Status: ---
Final Accepted Text:
Summary: 0000773: Add \+, \?, and \| to Basic Regular Expressions (BREs)
Description: BREs are the default or only regular expression format supported by some tools. However, BREs as currently defined in POSIX do not support \+, \?, or \| as BRE equivalents of the ERE operators +, ?, and |. These capabilities are built into EREs because they are convenient and useful; BREs should be updated to provide them in a backwards-compatible way (in a BRE today, a <backslash> followed by '+', '?', or '|' has undefined meaning, so no portable application can depend on those sequences).

These operators are already available in multiple implementations: GNU's BRE implementation supports \+, \?, and \|, and MacOS supports them when the REG_ENHANCED flag is used: https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man7/re_format.7.html

The proposed "desired action" was created by copying and modifying some of the ERE text into the rules for BREs.
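The correspondence described above can be illustrated with a minimal Python sketch that rewrites a BRE using the proposed \+, \?, and \| into Python `re` syntax, whose +, ?, |, (), and {} act like the ERE operators for simple patterns. The function name `bre_to_python` is hypothetical. The sketch ignores bracket-expression details (POSIX character classes, backslashes inside brackets) and the context rules for a leading '*' or a non-initial '^', and Python's `re` is not a POSIX engine: it takes the first successful alternative rather than the leftmost-longest match.

```python
import re

# Hypothetical helper: translate a BRE that uses the proposed \+, \?, \|
# extensions into Python re syntax. Sketch only; see the limitations above.
def bre_to_python(bre: str) -> str:
    out = []
    i = 0
    while i < len(bre):
        c = bre[i]
        if c == "\\" and i + 1 < len(bre):
            nxt = bre[i + 1]
            if nxt in "(){}+?|":
                out.append(nxt)             # \( \) \{ \} and the proposed \+ \? \|
            elif nxt in "123456789":
                out.append("\\" + nxt)      # back-reference \1 .. \9
            else:
                out.append(re.escape(nxt))  # \. \* \[ \\ ... stay literal
            i += 2
        elif c == "[":
            # copy a bracket expression as-is; a ']' right after '[' or '[^' is literal
            j = i + 1
            if j < len(bre) and bre[j] == "^":
                j += 1
            if j < len(bre) and bre[j] == "]":
                j += 1
            while j < len(bre) and bre[j] != "]":
                j += 1
            out.append(bre[i:j + 1])
            i = j + 1
        elif c in "+?|(){}":
            out.append("\\" + c)            # ordinary in a BRE, special in Python re
            i += 1
        else:
            out.append(c)                   # . * ^ $ and ordinary characters
            i += 1
    return "".join(out)

print(bre_to_python(r"b\+\(bc\)"))       # b+(bc)
print(bre_to_python(r"a\(\(bc\)\|d\)"))  # a((bc)|d)
print(bre_to_python("a+b?"))             # a\+b\?
```

The last call shows the backwards-compatibility point: an unescaped '+' or '?' keeps its current literal BRE meaning, and only the escaped forms gain the ERE-like behavior.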
Desired Action:
Insert before line 6068 the following text as a new numbered item (this text is based on lines 6170-6174):
“When a BRE matching a single character or a BRE enclosed in parentheses is followed by <backslash> <plus-sign> ('\+'), that sequence shall match what one or more consecutive occurrences of the BRE would match. For example, the BRE "b\+\(bc\)" matches the fourth to seventh characters in the string "acabbbcde". And, "[ab]\+" and "[ab][ab]*" are equivalent.”
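Because the proposed \+ is defined to match what the ERE + matches, the two claims in this item can be spot-checked against the ERE form "b+(bc)" it was copied from. This Python sketch uses `re` as a stand-in for ERE behavior; `same_language` and `match_positions` are hypothetical helpers, and the equivalence check is a brute-force test over short strings, not a proof.

```python
import itertools
import re

def same_language(p: str, q: str, alphabet: str = "abc", max_len: int = 5) -> bool:
    """True if p and q fully match exactly the same strings of length <= max_len."""
    rp, rq = re.compile(p), re.compile(q)
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(rp.fullmatch(s)) != bool(rq.fullmatch(s)):
                return False
    return True

def match_positions(pattern: str, text: str):
    """1-based (first, last) character positions of the leftmost match, or None."""
    m = re.search(pattern, text)
    return None if m is None else (m.start() + 1, m.end())

# "[ab]\+" versus "[ab][ab]*", written in their ERE forms
print(same_language("[ab]+", "[ab][ab]*"))     # True
# "b+(bc)" in "acabbbcde": the fourth to seventh characters, "bbbc"
print(match_positions("b+(bc)", "acabbbcde"))  # (4, 7)
```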

Insert before line 6072 the following text as a new numbered item (this text is based on lines 6181-6184):
“When a BRE matching a single character or a BRE enclosed in parentheses is followed by <backslash> <question-mark> ('\?'), that entire sequence shall match what zero or one consecutive occurrences of the BRE would match. For example, the BRE "b\?c" matches the second character in the string "acabbbcde".”
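The \? example can be checked the same way against its ERE form "b?c", again with Python's `re` standing in for ERE behavior and a hypothetical `match_positions` helper. The leftmost match wins, which is why the result is the lone "c" at position 2 rather than the "bc" at positions 6-7.

```python
import re

def match_positions(pattern: str, text: str):
    """1-based (first, last) character positions of the leftmost match, or None."""
    m = re.search(pattern, text)
    return None if m is None else (m.start() + 1, m.end())

# ERE "b?c", the form the proposed BRE "b\?c" is meant to mirror
print(match_positions("b?c", "acabbbcde"))  # (2, 2): only the second character

# zero or one "b", never more, when the whole string must match
for s in ["c", "bc", "bbc"]:
    print(s, re.fullmatch("b?c", s) is not None)
```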

Insert before line 6089 a new subsection “BRE Alternation” with the following text (this text is based on lines 6200-6205):
Two BREs separated by <backslash> <vertical-line> ('\|') shall match a string that is matched by either. For example, the BRE "a\(\(bc\)\|d\)" matches the string "abc" and the string "ad". Single characters, or expressions matching single characters, separated by the <backslash> <vertical-line> and enclosed in parentheses, shall be treated as a BRE matching a single character.
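A quick check of the alternation example, using the ERE form "a((bc)|d)" in Python's `re` as a stand-in. Python tries alternatives in order, while POSIX requires the longest of the leftmost matches, so the sketch compares only whole-string (fullmatch) results, where the two approaches accept the same strings. The second part shows that alternation binds more loosely than concatenation, as in EREs.

```python
import re

# ERE stand-in for the proposed BRE "a\(\(bc\)\|d\)"
alternation = re.compile("a((bc)|d)")
for s in ["abc", "ad", "abd", "a"]:
    print(s, alternation.fullmatch(s) is not None)  # True, True, False, False

# "ab|cd" means (ab)|(cd), not a(b|c)d
print(re.fullmatch("ab|cd", "cd") is not None)   # True
print(re.fullmatch("ab|cd", "abd") is not None)  # False
```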

In section 9.3.7's table, modify it as follows (this is based on the table in section 9.4.8): for “Single-character-BRE duplication” add \+ and \?. Also add a new row, Alternation, with value \|, as the last (lowest-precedence) row, as in the ERE table.

After line 6313, add:
%token Back_plus Back_question Back_bar
/* \+ \? \| */

On line 6314 and later, rename basic_reg_exp to BRE_expression, and insert above it the following text based on the equivalent ERE grammar:
basic_reg_exp : BRE_branch
| basic_reg_exp Back_bar BRE_branch
;
BRE_branch : BRE_expression
| BRE_branch BRE_expression
;

After line 6337, add to RE_dupl_symbol:
| Back_plus
| Back_question
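To see the precedence the grammar changes proposed above are meant to produce, here is a small recursive-descent sketch in Python. It is not yacc and not part of the standard; Back_plus and Back_bar follow the proposal's %token names, while Back_question (for \?), Back_open_paren, Back_close_paren, STAR, and CHAR are names chosen for this sketch. The left-recursive rules become loops, so \| groups left to right and binds more loosely than concatenation, while \+ and \? attach to the single preceding expression.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]

# Bracket expressions, intervals (\{m,n\}), anchors and back-references are
# not handled: they all come out as plain CHAR tokens.
ESCAPED = {"+": "Back_plus", "?": "Back_question", "|": "Back_bar",
           "(": "Back_open_paren", ")": "Back_close_paren"}
DUPL = ("STAR", "Back_plus", "Back_question")

def tokenize(bre: str) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(bre):
        if bre[i] == "\\" and i + 1 < len(bre):
            tokens.append((ESCAPED.get(bre[i + 1], "CHAR"), bre[i + 1]))
            i += 2
        else:
            tokens.append(("STAR" if bre[i] == "*" else "CHAR", bre[i]))
            i += 1
    return tokens

class Parser:
    def __init__(self, tokens: List[Token]):
        self.tokens, self.pos = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self) -> Token:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    # basic_reg_exp : BRE_branch | basic_reg_exp Back_bar BRE_branch
    def reg_exp(self):
        node = self.branch()
        while self.peek() == "Back_bar":
            self.take()
            node = ("alt", node, self.branch())
        return node

    # BRE_branch : BRE_expression | BRE_branch BRE_expression
    def branch(self):
        items = []
        while self.peek() in ("CHAR", "Back_open_paren"):
            items.append(self.expression())
        if not items:
            # e.g. "\+a" or "a\|*b": no operand before a duplication symbol.
            # (A leading '*' is literal in POSIX BREs; this sketch rejects it.)
            raise ValueError(f"expected an operand at token {self.pos}")
        return items[0] if len(items) == 1 else ("cat", *items)

    # one character or a \( \) subexpression, then any RE_dupl_symbol tokens
    def expression(self):
        kind, value = self.take()
        if kind == "Back_open_paren":
            node = ("group", self.reg_exp())
            if self.peek() != "Back_close_paren":
                raise ValueError("missing \\)")
            self.take()
        else:
            node = value
        while self.peek() in DUPL:
            node = (self.take()[0], node)
        return node

def parse(bre: str):
    parser = Parser(tokenize(bre))
    tree = parser.reg_exp()
    if parser.pos != len(parser.tokens):
        raise ValueError(f"unexpected token {parser.tokens[parser.pos]!r}")
    return tree

print(parse(r"ab\|cd"))  # ('alt', ('cat', 'a', 'b'), ('cat', 'c', 'd'))
print(parse(r"ab\+"))    # ('cat', 'a', ('Back_plus', 'b'))
```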
Tags: No tags attached.

Notes
(0001914)
geoffclare (manager)
2013-10-16 09:05

For this to stand any chance of being accepted, some major omissions
in the desired action need to be addressed:

9.3.8 needs updating (this may also affect the grammar).

Need to say something about \+ or \? at the beginning of a BRE or
following \|, ^ (when special), or \(, and about * or \{ following \|.

Some changes are needed on the regcomp() page (e.g. item 2 in the
numbered list).

Changes might be needed to the REG_* error macros on the regcomp()
and <regex.h> pages - looks like they need an overhaul anyway to
distinguish properly between BREs and EREs.

There may be more omissions - these are just the ones that have
occurred to me so far.

Also the phrase "enclosed in parentheses" is incorrect for BREs; it
should be: enclosed between "\(" and "\)".
(0001946)
dwheeler (reporter)
2013-10-28 01:30

Thanks for the comments!

I'm confused about "9.3.8 needs updating". That section is "BRE Expression Anchoring", which is about "^" and "$"... which is not related at all. I'm guessing that you meant another section, can you tell me which one?

I'd be okay with leaving "weird" situations unspecified. E.g., at lines 6084-6085, change "The behavior of multiple adjacent duplication symbols ('*' and intervals) produces undefined results." into the following:
The behavior of multiple adjacent duplication symbols ('*', '\+', '\?', and intervals) produces undefined results. The behavior of a \? or \+ that is initial (begins a BRE, follows \|, follows ^ when special, or follows "\(") produces undefined results. The behavior of "*" or "\{" following "\|" produces undefined results.

There are, of course, good arguments for producing an error instead, so if anyone wants to require that, that's great.
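Whichever way this goes (undefined results or an error such as REG_BADRPT), the positions listed in the proposed wording are easy to detect. This Python sketch, with the hypothetical name `undefined_positions`, reports them. Assumptions: '^' is treated as special at the start of the BRE and right after "\(" or "\|"; a '*' at the start of the BRE or of a subexpression is literal, as existing BRE rules say; bracket expressions are skipped without checking their contents.

```python
from typing import List

NEW_DUPS = {"\\+", "\\?"}

def undefined_positions(bre: str) -> List[str]:
    """Hypothetical checker for the constructs proposed to be undefined."""
    problems: List[str] = []
    prev = "start"  # one of: "start", "bar", "dup", "atom"
    i = 0
    while i < len(bre):
        pos = i
        if bre[i] == "\\" and i + 1 < len(bre):
            tok, i = bre[i:i + 2], i + 2
            if tok == "\\{":
                # consume the whole interval "\{m,n\}" as one token
                end = bre.find("\\}", i)
                end = len(bre) if end < 0 else end + 2
                tok, i = bre[pos:end], end
        elif bre[i] == "[":
            j = i + 1
            if j < len(bre) and bre[j] == "^":
                j += 1
            if j < len(bre) and bre[j] == "]":
                j += 1  # a leading ']' belongs to the list
            while j < len(bre) and bre[j] != "]":
                j += 1
            tok, i = bre[pos:j + 1], j + 1
        else:
            tok, i = bre[i], i + 1

        is_interval = tok.startswith("\\{")
        is_dup = tok in NEW_DUPS or tok == "*" or is_interval

        if tok in NEW_DUPS and prev in ("start", "bar"):
            problems.append(f"{tok} at offset {pos} is initial")
        elif (tok == "*" or is_interval) and prev == "bar":
            problems.append(f"{tok} at offset {pos} follows \\|")
        elif is_dup and prev == "dup":
            problems.append(f"{tok} at offset {pos} is an adjacent duplication symbol")

        if tok == "\\(":
            prev = "start"
        elif tok == "\\|":
            prev = "bar"
        elif tok == "^" and prev in ("start", "bar"):
            prev = "start"   # special '^' (assumed anchor at these positions)
        elif tok == "*" and prev == "start":
            prev = "atom"    # literal '*'
        else:
            prev = "dup" if is_dup else "atom"
    return problems

print(undefined_positions(r"a\|\?b"))        # ['\\? at offset 3 is initial']
print(undefined_positions(r"ab\+c\?\|d*"))   # []
```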

To fix regcomp(), starting at line 57468, change:
'*' or "\{\}" appears immediately after the subexpression in a basic regular expression...
Into:
'*' or "\{\}" or "\+" or "\?" appears immediately after the subexpression in a basic regular expression...

Before line 57472 add the following text (which is intentionally similar to what is there):
'\|' is used in a basic regular expression to select this subexpression or another, and the other subexpression matched.

Note: If people DO want bad uses of \? and \+ to produce errors, then we need to modify line 57505: change "REG_BADRPT '?', '*', or '+' not preceded by valid regular expression." by adding '\?' and '\+' to the end of the list.

Issue History
Date Modified | Username | Field | Change
2013-10-16 02:51 | dwheeler | New Issue |
2013-10-16 02:51 | dwheeler | Status | New => Under Review
2013-10-16 02:51 | dwheeler | Assigned To | => ajosey
2013-10-16 02:51 | dwheeler | Name | => David A. Wheeler
2013-10-16 02:51 | dwheeler | Section | => 9 Regular Expressions
2013-10-16 02:51 | dwheeler | Page Number | => 187-193
2013-10-16 02:51 | dwheeler | Line Number | => 6068-6337
2013-10-16 09:05 | geoffclare | Note Added: 0001914 |
2013-10-28 01:30 | dwheeler | Note Added: 0001946 |


Mantis 1.1.6
Copyright © 2000 - 2008 Mantis Group
Powered by Mantis Bugtracker