Austin Group Defect Tracker

Aardvark Mark IV


ID Category Severity Type Date Submitted Last Update
0001834 [1003.1(2024)/Issue8] System Interfaces Editorial Error 2024-06-20 00:26 2024-07-13 01:10
Reporter Don Cragun View Status public  
Assigned To
Priority normal Resolution Open  
Status New  
Name Don Cragun
Organization
User Reference
Section strlen() & wcslen()
Page Number 2147, 2380
Line Number 70218-70220, 77131-77134
Interp Status ---
Final Accepted Text
Summary 0001834: strnlen() & wcsnlen() descriptions use of "terminating" NUL character
Description The description of the strlen() function is:
The strlen() function shall compute the number of bytes in the string to which s points, not including the terminating NUL character.

and this is fine since a string, by definition, is terminated by a NUL character. However, the description of the strnlen() function is:
<CX> The strnlen() function shall compute the smaller of the number of bytes in the array to which s points, not including any terminating NUL character, or the value of the maxlen argument. The strnlen() function shall never examine more than maxlen bytes of the array pointed to by s.</CX>

but this is a problem because an array of bytes does not have a terminating character. The description needs to be rewritten to more closely match the more correctly written return value section:
<CX>The strnlen() function shall return the number of bytes preceding the first null byte in the array to which s points, if s contains a null byte within the first maxlen bytes; otherwise, it shall return maxlen.</CX>

The description of the wcsnlen() function:
<CX>The wcsnlen() function shall compute the smaller of the number of wide characters in the array to which ws points, not including any terminating null wide-character code, and the value of maxlen. The wcsnlen() function shall never examine more than the first maxlen characters of the wide-character array pointed to by ws.</CX>

suffers from the same logical problem.
Desired Action On P2147, L70218-70219 (strnlen() DESCRIPTION) change:
smaller of the number of bytes in the array to which s points, not including any terminating NUL character, or the value
to:
smaller of the number of bytes before the first null byte in the array to which s points, if there is one, and the value

On P2380, L77131-77132 (wcsnlen() DESCRIPTION) change:
smaller of the number of wide characters in the array to which ws points, not including any terminating null wide-character code, and the value
to:
smaller of the number of wide characters before the first null wide-character code in the array to which ws points, if there is one, and the value
Tags No tags attached.
Attached Files n3252b.pdf (330,264 bytes) 2024-06-21 15:24

- Relationships

-  Notes
(0006820)
geoffclare (manager)
2024-06-20 11:15

I agree the current wording is in need of improvement. However, since the C committee are adding these functions in their next revision, we should wait to see what wording they decide on.
(0006821)
Don Cragun (manager)
2024-06-20 14:30
edited on: 2024-06-20 14:32

Re: Note: 0006820:

We heard last week that the C committee is currently planning to call the bytes in the array a string even when those bytes do not contain a NUL byte.

I think we should suggest new wording to them to avoid both the wording they are planning to use and the wording currently in POSIX.

(0006822)
geoffclare (manager)
2024-06-20 14:57

My point was that we shouldn't just resolve this bug with new wording of our choosing; we need to liaise with the C committee and wait for their decision.
(0006823)
Don Cragun (manager)
2024-06-20 15:33

We discussed this during the 2024-06-20 meeting. We believe that the wording in the Desired Action is better than the current wording in the standard. Nick will e-mail Chris Bazeley (the author of the proposal to add these functions to C2Y) with this as the direction in which POSIX is leaning.
(0006824)
eblake (manager)
2024-06-20 18:22

I asked the Linux man pages project about their willingness to update wording in strnlen.3, and they pointed me to https://man7.org/linux/man-pages/man7/string_copying.7.html as a useful resource (covers more than POSIX, and doesn't visit the similarly-affected wcs* functions, but has a nice overview of various consistently used concepts)
(0006830)
nick (manager)
2024-06-21 15:28

An updated proposal from the C committee is attached (n3252b.pdf)


7.26.6.5 The strnlen function

Synopsis

1

#include <string.h>

size_t strnlen(const char *s, size_t n);

Description

2 The strnlen function counts not more than n characters (a null character and characters that
follow it are not counted) in the array to which s points. At most the first n characters of s shall be
accessed by strnlen.

Returns

3 The strnlen function returns the number of characters that precede the terminating null
character. If there is no null character in the first n characters of s then strnlen returns n.

7.31.4.7.3 The wcsnlen function

Synopsis

1

#include <wchar.h>

size_t wcsnlen(const wchar_t *s, size_t n);

Description

2 The wcsnlen function counts not more than n wide characters (a null wide character and wide
characters that follow it are not counted) in the array to which s points. At most the first n wide
characters of s shall be accessed by wcsnlen.

Returns

3 The wcsnlen function returns the number of wide characters that precede the terminating null
wide character. If there is no null wide character in the first n wide characters of s then wcsnlen
returns n.
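
For illustration only, the proposed C wording above can be read as the following sequential loops. This is a minimal sketch, not the committee's or any library's implementation; the names sketch_strnlen and sketch_wcsnlen are hypothetical, and the assert is a precondition of the sketch rather than anything the proposal requires.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical sketch: count the bytes before the first null byte,
 * reading at most n bytes of the array that s points to. */
size_t sketch_strnlen(const char *s, size_t n)
{
    assert(s != NULL);              /* precondition of this sketch only */
    size_t i = 0;
    while (i < n && s[i] != '\0')   /* never reads s[n] or anything after a null */
        i++;
    return i;                       /* equals n when no null byte was seen */
}

/* The same idea for wide characters. */
size_t sketch_wcsnlen(const wchar_t *s, size_t n)
{
    assert(s != NULL);
    size_t i = 0;
    while (i < n && s[i] != L'\0')
        i++;
    return i;
}
```

Written this way, the result is n for an array with no null element in its first n elements, and no element after the first null is ever read.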
(0006831)
eblake (manager)
2024-07-10 01:31

At https://lists.gnu.org/archive/html/bug-gnulib/2024-07/msg00094.html Paul Eggert argues:


> at which point, strnlen("", SIZE_MAX) _is_ allowed to _access_ beyond
> the NUL byte,

No it wouldn't, because strnlen must stop counting at the first null byte.

If this point isn't made clear in the current proposal, it should be made
clear. Lots of user code relies on strnlen doing the right thing even if the
string is shorter than n. In practice implementations that screw up in this
area, and are incompatible with glibc etc., are deemed broken and are fixed.
The standard should not allow further breakage.


The proposed wording allows an implementation to access beyond the NUL when a string is passed in, and Paul is arguing that the standard should be stricter and stop accessing at the first NUL or at n bytes, whichever is first (implying a specific linear access pattern, and preventing optimizations such as dividing the array in two, calculating constrained lengths on both halves in parallel, and then doing the appropriate math to return the correct answer even if more bytes than the returned value were accessed). That is, Paul wants code like this (present in the wild) to "work":

      len = strnlen (string, precision <= 0 ? SIZE_MAX : precision);
(0006832)
eblake (manager)
2024-07-11 15:54

Summarizing Note: 0006831, it would be nice if the C wording for strnlen() copied the C23 requirement on memchr("", 0, SIZE_MAX) reliably returning the first argument since "The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found."
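
As a sketch of why the memchr() rule quoted above would be sufficient: if memchr() behaves as if it reads sequentially and stops at the first match, a strnlen() layered on it inherits the same guarantee. The name memchr_strnlen is hypothetical, and this is an illustration rather than how any particular libc implements strnlen().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: strnlen() expressed through memchr(). It relies on
 * memchr() behaving as if it reads sequentially and stops at the first
 * matching byte, which is the requirement quoted in the note above. */
size_t memchr_strnlen(const char *s, size_t n)
{
    assert(s != NULL);                       /* precondition of this sketch only */
    const char *nul = memchr(s, '\0', n);    /* first null byte within n bytes, if any */
    return nul != NULL ? (size_t)(nul - s) : n;
}
```

Under that memchr() requirement, a call such as memchr_strnlen("", SIZE_MAX) would not read past the null byte, whatever the value of n.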
(0006833)
kre (reporter)
2024-07-12 01:23

It would be good to remember that there are two (quite different) uses
for strnlen (and wcsnlen() -- though for that one the second use tends to
be more complex).

One is for dealing with "strings" stored in fixed length arrays of
chars (or wide chars) where the desire is to use every possible
storage element for meaningful data, omitting the terminating nul
character if the actual data fills the array - but including the
nul if the data is shorter. Examples were old style directory
entries in the filesystem with char name[14] in the struct, old style
utmp entries where the tty name (line) and user name were each stored
in char xxx[8] arrays. In this situation it is possible to access
any of 'n' (the size_t param to the function) bytes of the array,
but nothing with an index of n or greater (or negative of course).
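
A minimal sketch of this first use, assuming a record shaped like the old directory entry mentioned above. The struct old_dirent layout and the helper old_dirent_name are hypothetical and exist only for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OLD_NAME_MAX 14           /* width of the char name[14] field */

struct old_dirent {               /* hypothetical layout, illustration only */
    unsigned short ino;
    char name[OLD_NAME_MAX];      /* null-padded; no terminator when full */
};

/* Copy the name field into out as a null-terminated string and return
 * its length. out must have room for OLD_NAME_MAX + 1 bytes. */
size_t old_dirent_name(const struct old_dirent *d, char out[OLD_NAME_MAX + 1])
{
    assert(d != NULL && out != NULL);
    /* Every byte of the field may be read, but nothing at index 14 or above. */
    size_t len = strnlen(d->name, sizeof d->name);
    memcpy(out, d->name, len);
    out[len] = '\0';
    return len;
}
```

Here the bound passed to strnlen() is the true size of the array, so reading any of its n bytes is safe even when the field holds no null byte at all.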

The other use is for determining if a string is longer than N bytes,
without caring how long it actually is. Such strings are typically
stored always with a terminating nul character, in an "array" which
is exactly long enough for the string, and its terminating nul.
Referring to anything in that array beyond that terminating nul is
undefined behaviour. Examples of this are the argv and environ data,
and any strings stored in appropriately sized memory from malloc().

[For why one might want this information, consider outputting a string
 summary in a fixed column width space (which is why the wide char example
 gets more complex, though the same principles apply) - eg: assume I have
 30 columns of a fixed width font to display the leading part of the string,
 but I also want to indicate when the string was longer, and if so, indicate
 that it was truncated in the traditional way (terminating ellipsis). For
 this, all I need to know is whether the string is 31 chars long, or longer.
 If it is, I will take the first 27 chars, perhaps back up from there to a
 word end - depending upon the context, then append " ...". The actual
 string might be very long (like a chapter of a book, and all I want to
 display is "It was the best of times, ...") I don't need to know its length,
 and don't want to waste time determining that (or I'd just use strlen())
 so I use strnlen(string, 31) instead. If the answer is <= 30, then I simply
 use the entire string, however long it might be. If it is > 30 (ie: 31,
 the only possible case in this scenario) then I do the string truncation
 dance, and output that.
]
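
The truncation scenario in the bracketed aside can be sketched as follows. The name summarize and the constants are hypothetical; the sketch omits the optional step of backing up to a word end, and it computes the kept prefix as the column width minus the length of the " ..." suffix so that the result always fits the column budget.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COLS 30                           /* display width in columns */
static const char ELLIPSIS[] = " ...";    /* truncation marker */

/* Write at most COLS characters of s into out (room for COLS + 1 bytes),
 * marking truncation with a trailing " ...". Returns the length written. */
size_t summarize(const char *s, char out[COLS + 1])
{
    assert(s != NULL && out != NULL);
    /* Only asks "is it longer than COLS?"; reads at most COLS + 1 bytes of s. */
    size_t len = strnlen(s, COLS + 1);
    if (len <= COLS) {                    /* fits: use the entire string */
        memcpy(out, s, len);
        out[len] = '\0';
        return len;
    }
    size_t keep = COLS - (sizeof ELLIPSIS - 1);     /* leave room for the marker */
    memcpy(out, s, keep);
    memcpy(out + keep, ELLIPSIS, sizeof ELLIPSIS);  /* copies the null byte too */
    return COLS;
}
```

In this use s is an ordinary null-terminated string of unknown length, so the call is only safe if strnlen() stops at the first null byte when the string is shorter than the bound.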

The wording needs to be done in such a way that both of these scenarios
work correctly.
(0006834)
eblake (manager)
2024-07-12 17:57

It would also be nice if strncmp ("a", "b", SIZE_MAX) were guaranteed to work in linear order in the same manner as memchr() rather than an implementation being permitted to access arbitrary bytes within the two arrays according to an intentionally over-large size.
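
For illustration, a strictly linear strncmp() along the lines asked for here might look like the sketch below. The name sequential_strncmp is hypothetical, and nothing in the current standard text is claimed to require this access pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: compare at most n bytes in index order, stopping at
 * the first difference or at the first null byte, whichever comes first. */
int sequential_strncmp(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);      /* precondition of this sketch only */
    for (size_t i = 0; i < n; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (ca != cb)
            return ca < cb ? -1 : 1;     /* first differing byte decides */
        if (ca == '\0')
            return 0;                    /* both strings ended together */
    }
    return 0;                            /* first n bytes are equal */
}
```

With this access pattern, comparing "a" and "b" with a bound of SIZE_MAX reads only index 0 of each argument.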
(0006835)
kre (reporter)
2024-07-13 01:10

Re Note: 0006834

I agree, though this would probably be better addressed by the C
standard than here.

But in general, I'd suggest that any function (standard ones, or user
created functions) which accesses any character sequence parameter passed
to them, beyond either the first nul character, or the provided length
if there is one, should result in undefined behaviour. Naturally this
only applies in cases where the incoming parameter data is defined to
end at the first nul encountered. There should be a name for such objects,
they are almost strings, except those require that the terminating nul
exist, and this other type does not.

If that were done, then the sequential access to whatevers in all of these
functions would be guaranteed, as one cannot access p[1] without first
ensuring that p[0] is not nul, similarly accessing p[2] requires that p[1]
also not be nul (etc) - the only valid access regime is p[0] first, then
p[1] up to (but not exceeding) p[N] (where N is the provided length, if
that is available, or SIZE_MAX if it isn't), and never going beyond p[n]
if p[n] is nul, and p[i] 0 <= i < n are all not nul. Naturally n < N.

Once the maximum length that can be referenced is discovered, the
implementation is free to then access (again) the bytes in the data
in any order it likes. For something simple like strncmp() or strnlen()
it makes no real sense to do anything other than compare (or count) the
data as it is being examined for the first time, also looking for the nul
terminator, but for more complex operations, like pattern matching, other
access methods might work out better (matching the RE '^.*abc$' is faster
by starting at the end of the input string, and working backwards, than
at the beginning and working forwards).

All of this works (or should) for the wide char functions as well, which is
why I keep writing nul rather than '\0' as I intend that to mean the
appropriate nul character for the character data type involved.

- Issue History
Date Modified Username Field Change
2024-06-20 00:26 Don Cragun New Issue
2024-06-20 00:26 Don Cragun Name => Don Cragun
2024-06-20 00:26 Don Cragun Section => strlen() & wcslen()
2024-06-20 00:26 Don Cragun Page Number => 2147, 2380
2024-06-20 00:26 Don Cragun Line Number => 70218-70220, 77131-77134
2024-06-20 00:26 Don Cragun Interp Status => ---
2024-06-20 00:38 Don Cragun Description Updated
2024-06-20 00:38 Don Cragun Desired Action Updated
2024-06-20 11:15 geoffclare Note Added: 0006820
2024-06-20 14:30 Don Cragun Note Added: 0006821
2024-06-20 14:32 Don Cragun Note Edited: 0006821
2024-06-20 14:57 geoffclare Note Added: 0006822
2024-06-20 15:33 Don Cragun Note Added: 0006823
2024-06-20 18:22 eblake Note Added: 0006824
2024-06-21 15:24 nick File Added: n3252b.pdf
2024-06-21 15:28 nick Note Added: 0006830
2024-07-10 01:31 eblake Note Added: 0006831
2024-07-11 15:54 eblake Note Added: 0006832
2024-07-12 01:23 kre Note Added: 0006833
2024-07-12 17:57 eblake Note Added: 0006834
2024-07-13 01:10 kre Note Added: 0006835

