Austin Group Defect Tracker

Aardvark Mark IV


Viewing Issue Simple Details
ID 0000616
Category [1003.1(2008)/Issue 7] System Interfaces
Severity Comment
Type Clarification Requested
Date Submitted 2012-09-26 15:47
Last Update 2020-03-23 10:31
Reporter nick
View Status public
Assigned To ajosey
Priority normal
Resolution Accepted As Marked
Status Applied
Name Nick Stoughton
Organization USENIX
User Reference nms-mbsnrtowcs-002
Section mbsnrtowcs
Page Number 1277
Line Number 41975
Interp Status ---
Final Accepted Text Note: 0001569
Summary 0000616: mbsnrtowcs clarification
Description In austin-group-l:archive/latest/17532 Matthew Dempsky posed the following question:

On Ubuntu 10.04, the code below prints "0 2".  This is the behavior
that I think logically makes sense (and that I was intending to
implement for OpenBSD).

However, my reading of mbsnrtowcs() description in Issue 7 is that the
correct output (assuming "en_US.UTF-8" is a valid UTF-8 based locale)
should be "0 0".

Issue 7 says:

"""
If dst is not a null pointer, the pointer object pointed to by src
shall be assigned either a null pointer (if conversion stopped due to
reaching a terminating null character) or the address just past the
last character converted (if any).
"""

However, in my test program, mbs+2 is in the *middle* of a
[multi-byte] character, not "just past" a [multi-byte] character.
Ubuntu 10.04's behavior would be consistent if the description was
"just past the last input byte consumed".

Am I misunderstanding something?  Or is there a bug in either Ubuntu
10.04's implementation or the POSIX wording?


#include <wchar.h>
#include <locale.h>
#include <string.h>
#include <stdio.h>

wchar_t wcs[100];
char mbs[100];

int main()
{
        setlocale(LC_CTYPE, "en_US.UTF-8");
        memcpy(mbs, "\xe7\x95\x8c", 4);
        const char *s = mbs;
        printf("%u ", (unsigned)mbsnrtowcs(wcs, &s, 2, 100, NULL));
        printf("%u\n", (unsigned)(s - mbs));
}


Further discussion noted that 'C99 does in fact state that
mbstate_t's conversion state includes tracking "the position within a
multibyte character", so multibyte character string inputs do not
necessarily need to be processed exclusively at multibyte character
boundaries. E.g., it's okay to call mbrtowc() to process one byte at
a time of a multibyte string.'

But more importantly, do any implementations of mbsnrtowcs() print "0
0"? Glibc, FreeBSD, and OS X all print "0 2". If no implementation
actually prints "0 0", then I think it makes sense to revise the
wording for mbsnrtowcs() to "just past the last byte processed"
instead of "just past the last multibyte character converted".
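
For illustration, the byte-at-a-time use of mbrtowc() mentioned in the quoted discussion can be sketched roughly as follows. This is not code from the mailing list thread: use_utf8_locale() and feed_bytewise() are hypothetical helper names, and the list of locale names tried is only an assumption about what a given system provides.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a few commonly available UTF-8 locale names
 * (an assumption; none of them is guaranteed to exist on a given host).
 * Returns 1 if one could be selected, 0 otherwise. */
static int use_utf8_locale(void)
{
    static const char *const names[] = {
        "en_US.UTF-8", "C.UTF-8", "en_US.utf8", "C.utf8"
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: pass the n bytes at mbs to mbrtowc() one byte
 * per call, carrying the partial character in an explicit mbstate_t.
 * Returns the number of calls that reported an incomplete character
 * ((size_t)-2) before a wide character was stored in *out, or -1 on an
 * encoding error or if the input ended before a character completed. */
static int feed_bytewise(const char *mbs, size_t n, wchar_t *out)
{
    mbstate_t st;
    int incomplete = 0;
    size_t i;

    memset(&st, 0, sizeof st);          /* initial conversion state */

    for (i = 0; i < n; i++) {
        size_t r = mbrtowc(out, mbs + i, 1, &st);

        if (r == (size_t)-2) {          /* byte absorbed into the state */
            incomplete++;
            continue;
        }
        if (r == (size_t)-1)            /* invalid sequence */
            return -1;
        return incomplete;              /* a character was completed */
    }
    return -1;
}
```

With the three-byte sequence from the test program above, the first two calls should each report an incomplete character and the third should complete it, so the position within the character is tracked by the mbstate_t object rather than by the caller.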

---
Given that a number of implementations do not follow the apparent requirements of the standard to process the src string character by character rather than byte by byte, I believe a formal interpretation is required.
Desired Action As described in 0000601, at page 1277 line 41977 change:

    past the last character converted (if any)

to:

    past the last byte processed (if any)

At page 1277 line 41986 change:

    ... limited to at most nmc bytes (the size of the input buffer).

to (all within the CX shading):

    ... limited to at most nmc bytes (the size of the input buffer).
    If the input buffer ends with an incomplete character,
    conversion shall stop at the end of the input buffer;
    a subsequent call to mbsnrtowcs() with an input
    buffer that starts with the remainder of the incomplete character
    shall correctly complete the conversion of that character.

    
Assuming that 0000601 is implemented,
at page 1278 line 42008 change FUTURE DIRECTIONS from:

    A future version may require that when the input buffer ends with
    an incomplete character, conversion stops at the end of the input buffer.

to:
    None.
Tags issue8
Attached Files

- Relationships
child of 0000601 Closed ajosey mbsnrtowcs clarification

-  Notes
(0001569)
geoffclare (manager)
2013-05-03 10:04

New proposed changes which match Note: 0001568...

At page 1277 line 41986 after applying the changes in 0000601, change:

    If the input buffer ends with an incomplete character, it
    is unspecified whether conversion stops at the end of the previous
    character (if any), or at the end of the input buffer. In the
    latter case, a subsequent call to mbsnrtowcs() with an input
    buffer that starts with the remainder of the incomplete character
    shall correctly complete the conversion of that character.

to:

    If the input buffer ends with an incomplete character,
    conversion shall stop at the end of the input buffer;
    a subsequent call to mbsnrtowcs() with an input
    buffer that starts with the remainder of the incomplete character
    shall correctly complete the conversion of that character.

Assuming that 0000601 is implemented,
at page 1278 line 42008 change FUTURE DIRECTIONS from:

    A future version may require that when the input buffer ends with
    an incomplete character, conversion stops at the end of the input buffer.

to:
    None.
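
As an illustration of the accepted wording, the following sketch converts a buffer in two calls to mbsnrtowcs(), where the first call's input may end in the middle of a character. It is only a sketch under stated assumptions: use_utf8_locale() and convert_in_two_calls() are hypothetical helper names, and the locale names tried are guesses about what the host provides. The second call starts wherever the first call left the source pointer, so the sketch also works on an implementation that follows the earlier 0000601 text and stops at the end of the previous character instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L         /* for mbsnrtowcs() */
#endif
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a few commonly available UTF-8 locale names
 * (an assumption; none of them is guaranteed to exist on a given host).
 * Returns 1 if one could be selected, 0 otherwise. */
static int use_utf8_locale(void)
{
    static const char *const names[] = {
        "en_US.UTF-8", "C.UTF-8", "en_US.utf8", "C.utf8"
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical helper: convert the `total` bytes at mbs in two calls,
 * the first limited to `split` bytes. *first_consumed receives the
 * number of bytes the first call consumed; it is left untouched if the
 * first call already reached a terminating null character.
 * Returns the number of wide characters stored, or (size_t)-1 on error. */
static size_t convert_in_two_calls(wchar_t *dst, size_t dstlen,
                                   const char *mbs, size_t total,
                                   size_t split, size_t *first_consumed)
{
    mbstate_t st;
    const char *s = mbs;
    size_t n1, n2;

    memset(&st, 0, sizeof st);          /* initial conversion state */

    n1 = mbsnrtowcs(dst, &s, split, dstlen, &st);
    if (n1 == (size_t)-1)
        return (size_t)-1;
    if (s == NULL)                      /* terminating null was converted */
        return n1;
    *first_consumed = (size_t)(s - mbs);

    /* Resume from the updated source pointer with the same state object,
     * so a character cut off by the first call can be completed here. */
    n2 = mbsnrtowcs(dst + n1, &s, total - *first_consumed,
                    dstlen - n1, &st);
    if (n2 == (size_t)-1)
        return (size_t)-1;
    return n1 + n2;
}
```

Called on the three bytes "\xe7\x95\x8c" with split set to 2, an implementation that follows the text above should report 2 bytes consumed by the first call and one wide character stored in total.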

- Issue History
Date Modified Username Field Change
2012-09-26 15:47 nick New Issue
2012-09-26 15:47 nick Status New => Under Review
2012-09-26 15:47 nick Assigned To => ajosey
2012-09-26 15:47 nick Name => Nick Stoughton
2012-09-26 15:47 nick Organization => USENIX
2012-09-26 15:47 nick User Reference => nms-mbsnrtowcs-002
2012-09-26 15:47 nick Section => mbsnrtowcs
2012-09-26 15:47 nick Page Number => 1277
2012-09-26 15:47 nick Line Number => 41975
2012-09-26 15:47 nick Interp Status => ---
2012-09-26 15:47 nick Issue generated from 0000601
2012-09-26 15:47 nick Relationship added child of 0000601
2012-09-26 15:47 nick Tag Attached: issue8
2012-09-26 15:50 nick Desired Action Updated
2012-09-26 15:57 nick Desired Action Updated
2012-09-26 15:59 jim_pugsley Status Under Review => Resolved
2012-09-26 15:59 jim_pugsley Resolution Open => Accepted
2012-09-27 07:26 geoffclare Desired Action Updated
2013-05-03 10:04 geoffclare Note Added: 0001569
2013-05-03 10:04 geoffclare Status Resolved => Under Review
2013-05-03 10:04 geoffclare Resolution Accepted => Reopened
2013-05-16 15:43 msbrown Final Accepted Text => Note: 0001569
2013-05-16 15:43 msbrown Status Under Review => Resolved
2013-05-16 15:43 msbrown Resolution Reopened => Accepted As Marked
2020-03-23 10:31 geoffclare Status Resolved => Applied

