Austin Group Defect Tracker

ID: 0000793
Category: [1003.1(2013)/Issue7+TC1] Base Definitions and Headers
Severity: Editorial
Type: Enhancement Request
Date Submitted: 2013-11-15 14:37
Last Update: 2020-03-30 08:40
Reporter: steffen
View Status: public
Assigned To: ajosey
Priority: normal
Resolution: Accepted
Status: Applied
Name: steffen
Section: Vol. 1 9.4.6, Vol. 1 13 <regex.h>, Vol. 2 regcomp(), Vol. 4 A.9.2
Page Number: 190, 322, 1783, 3501
Line Number: 6195, 10781, 57428, 118310
Interp Status: ---
Summary: 0000793: Regular Expressions: add REG_MINIMAL and a minimum repetition modifier
Description: The current POSIX regular expressions offer only a very restricted set of functionality. This forces many, if not most, real-world programs to use external regular expression libraries that add features such as non-greedy matching, positive and negative lookaround assertions, and Unicode support.

Some Open Group members already ship regular expression facilities that support at least some of these extensions. For the others, long-proven, stable, and free alternatives exist that are almost drop-in replacements. These are BSD-licensed and therefore also usable commercially (see the mailing list for some references).
Desired Action:

- Vol. 1: Base Definitions, Chapter 9, «Regular Expressions».

  9.4.6 EREs Matching Multiple Characters, p. 190, line 6195:
  insert after

        6. Each of the duplication symbols ('+', '*', '?', and intervals) may
           be suffixed by the minimal repetition modifier '?' <question-mark>,
           in which case matching behavior is changed from the «leftmost
           longest possible match» to the «leftmost shortest possible match»,
           including the null match
           (see [reference to A.9, p. 3500 ff.]). For example, the ERE ".*c"
           matches the last character ('c') in the string "abc abc", whereas
           the ERE ".*?c" matches the first character 'c', the third character
           in the string.

           If the REG_MINIMAL flag, as defined in the <regex.h>[REF] header,
           is used when compiling an ERE via regcomp(3)[REF], the «leftmost
           shortest possible match» is the default, and the minimal repetition
           modifier '?' can be used to select the «leftmost longest possible
           match».

  change, on (current) line 6195 ff.,

        The behavior of multiple adjacent duplication symbols ('+', '*', '?',
        and intervals) produces undefined results.
        
  to
  
        The behavior of multiple adjacent duplication symbols ('+', '*', '?',
        and intervals, possibly suffixed by the minimal repetition modifier)
        produces undefined results.

- Vol. 1: Base Definitions, Chapter 13, «Headers».

  On p. 322, line 10781
  insert after

        REG_MINIMAL Change default matching behavior to «leftmost shortest
                possible match». Only applicable to REG_EXTENDED regular
                expressions.

- Vol. 2: System Interfaces.

  On p. 1783, line 57428
  insert after

        REG_MINIMAL Change default matching behavior to «leftmost shortest
                possible match». Only applicable to REG_EXTENDED regular
                expressions.

- Vol. 4: Rationale (Informative),
  A.9.2 «Regular Expression General Requirements».

  On p. 3501, line 118310
  insert after

        EREs can optionally use a «leftmost-shortest» rule (enabled via
        the REG_MINIMAL flag and/or the '?' minimal repetition modifier), in
        which case the «shortest possible matching prefix» is instead
        identified as the matching sequence.
Tags: issue8

Relationships
Parent of 0001329 (New): Problem in resolution of 0000793: "Regular Expressions: add REG_MINIMAL and a minimum repetition modifier"
Not all the children of this issue are yet resolved or closed.

Notes
(0002092)
Don Cragun (manager)
2013-12-21 20:56

It is not possible to make changes to an approved TC and the page and line numbers don't match TC1 either.

This bug has been moved from project 2008-TC1 with category Rationale to project 1003.1(2013)/Issue7+TC1 with category Base Definitions and Headers.
(0004807)
geoffclare (manager)
2020-03-30 08:40

When applying this bug I spotted that "up to" was missing from the item 6 addition to 9.4.6 and inserted it.

The source currently has:
For example, the ERE
.sG ".*c"
matches up to the last character (\c
.cH c )
in the string
.sG "abc abc" ,
whereas the ERE
.sG ".*?c"
matches up to the first character
.cH c ,
the third character in the string.

(I also fixed the spelling of "repetition").

Issue History
Date Modified Username Field Change
2013-11-15 14:37 steffen New Issue
2013-11-15 14:37 steffen Status New => Under Review
2013-11-15 14:37 steffen Assigned To => ajosey
2013-11-15 14:37 steffen Name => steffen
2013-11-15 14:37 steffen Section => Vol 1. 9.4.6, Vol. 1. 13. regex.h, Vol 2. regcomp(), Vol. 4. A.9.2
2013-11-15 14:37 steffen Page Number => 190, 322, 1783, 3501
2013-11-15 14:37 steffen Line Number => 6195, 10781, 57428, 118310
2013-12-21 20:48 Don Cragun Project 2008-TC1 => 1003.1(2013)/Issue7+TC1
2013-12-21 20:56 Don Cragun Interp Status => ---
2013-12-21 20:56 Don Cragun Note Added: 0002092
2013-12-21 20:56 Don Cragun Category Rationale => Base Definitions and Headers
2014-01-09 17:15 Don Cragun Status Under Review => Resolved
2014-01-09 17:15 Don Cragun Resolution Open => Accepted
2014-01-09 17:15 Don Cragun Tag Attached: issue8
2020-03-25 15:58 geoffclare Status Resolved => Applied
2020-03-29 13:50 rhialto Issue Monitored: rhialto
2020-03-29 23:03 Don Cragun Relationship added parent of 0001329
2020-03-30 08:40 geoffclare Note Added: 0004807

