ID | Category | Severity | Type | Date Submitted | Last Update | |||||||
0001834 | [1003.1(2024)/Issue8] System Interfaces | Editorial | Error | 2024-06-20 00:26 | 2024-07-13 01:10 | |||||||
Reporter | Don Cragun | View Status | public | |||||||||
Assigned To | ||||||||||||
Priority | normal | Resolution | Open | |||||||||
Status | New | |||||||||||
Name | Don Cragun | |||||||||||
Organization | ||||||||||||
User Reference | ||||||||||||
Section | strlen() & wcslen() | |||||||||||
Page Number | 2147, 2380 | |||||||||||
Line Number | 70218-70220, 77131-77134 | |||||||||||
Interp Status | --- | |||||||||||
Final Accepted Text | ||||||||||||
Summary | 0001834: strnlen() & wcsnlen() descriptions use of "terminating" NUL character | |||||||||||
Description |
The description of the strlen() function is:

The strlen() function shall compute the number of bytes in the string to which s points, not including the terminating NUL character.

and this is fine since a string, by definition, is terminated by a NUL character. However, the description of the strnlen() function is:

<CX>The strnlen() function shall compute the smaller of the number of bytes in the array to which s points, not including any terminating NUL character, or the value of the maxlen argument. The strnlen() function shall never examine more than maxlen bytes of the array pointed to by s.</CX>

but this is a problem because an array of bytes does not have a terminating character. The description needs to be rewritten to more closely match the more correctly written return value section:

<CX>The strnlen() function shall return the number of bytes preceding the first null byte in the array to which s points, if s contains a null byte within the first maxlen bytes; otherwise, it shall return maxlen.</CX>

The description of the wcsnlen() function:

<CX>The wcsnlen() function shall compute the smaller of the number of wide characters in the array to which ws points, not including any terminating null wide-character code, and the value of maxlen. The wcsnlen() function shall never examine more than the first maxlen characters of the wide-character array pointed to by ws.</CX>

suffers from the same logical problem. |
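The difference between the two wordings can be made concrete with a minimal sketch that applies strnlen() to an array holding no null byte and to one that does. The names `full_field`, `short_field`, and `field_length` and the array contents are assumptions made for illustration; only strnlen() itself comes from the standard.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 4-byte field with every byte in use: it contains no null
   byte, so it is not a string and has no "terminating" character. */
const char full_field[4] = { 'a', 'b', 'c', 'd' };

/* Hypothetical 8-byte field holding "ab"; the remaining bytes are null. */
const char short_field[8] = "ab";

/* Illustrative wrapper: the number of bytes before the first null byte in
   the first maxlen bytes of field, or maxlen if there is no null byte. */
size_t field_length(const char *field, size_t maxlen)
{
    assert(field != NULL);
    return strnlen(field, maxlen);
}
```

Here `field_length(full_field, sizeof full_field)` yields 4 because no null byte occurs within the first 4 bytes, `field_length(short_field, sizeof short_field)` yields 2, and `field_length(short_field, 1)` yields 1. Only the return-value wording describes the first case without referring to a terminator that does not exist.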
|||||||||||
Desired Action |
On P2147, L70218-70219 (strlen() DESCRIPTION) change:

smaller of the number of bytes in the array to which s points, not including any terminating NUL character, or the value

to:

smaller of the number of bytes before the first null byte in the array to which s points, if there is one, and the value

On P2380, L77131-77132 (wcslen() DESCRIPTION) change:

smaller of the number of wide characters in the array to which ws points, not including any terminating null wide-character code, and the value

to:

smaller of the number of wide characters before the first null wide-character code in the array to which ws points, if there is one, and the value |
|||||||||||
Tags | No tags attached. | |||||||||||
Attached Files | n3252b.pdf (330,264 bytes) 2024-06-21 15:24 | |||||||||||
|
Notes | |
(0006820) geoffclare (manager) 2024-06-20 11:15 |
I agree the current wording is in need of improvement. However, since the C committee are adding these functions in their next revision, we should wait to see what wording they decide on. |
(0006821) Don Cragun (manager) 2024-06-20 14:30 edited on: 2024-06-20 14:32 |
Re: Note: 0006820: We heard last week that the C committee is currently planning to call the bytes in the array a string even when those bytes do not contain a NUL byte. I think we should suggest new wording to them to avoid both the wording they are planning to use and the wording currently in POSIX. |
(0006822) geoffclare (manager) 2024-06-20 14:57 |
My point was that we shouldn't just resolve this bug with new wording of our choosing; we need to liaise with the C committee and wait for their decision. |
(0006823) Don Cragun (manager) 2024-06-20 15:33 |
We discussed this during the 2024-06-20 meeting. We believe that the wording in the Desired Action is better than the current wording in the standard. Nick will e-mail Chris Bazeley (the author of the proposal to add these functions to C2Y) with this as the direction in which POSIX is leaning. |
(0006824) eblake (manager) 2024-06-20 18:22 |
I asked the Linux man pages project about their willingness to update wording in strnlen.3, and they pointed me to https://man7.org/linux/man-pages/man7/string_copying.7.html as a useful resource (it covers more than POSIX and doesn't visit the similarly affected wcs* functions, but has a nice overview of various consistently used concepts). |
(0006830) nick (manager) 2024-06-21 15:28 |
An updated proposal from the C committee is attached (n3252b.pdf)
|
(0006831) eblake (manager) 2024-07-10 01:31 |
At https://lists.gnu.org/archive/html/bug-gnulib/2024-07/msg00094.html, Paul Eggert argues:
The proposed wording allows an implementation to access beyond the NUL when a string is passed in. Paul is arguing that the standard should be stricter and stop accessing at the first NUL or at n bytes, whichever is first. That implies a specific linear access pattern and prevents optimizations such as dividing the array in two, calculating constrained lengths on both halves in parallel, and then doing the appropriate math to return the correct answer even if more bytes than the returned value were accessed. That is, Paul wants code like this (present in the wild) to "work": len = strnlen (string, precision <= 0 ? SIZE_MAX : precision); |
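As a rough illustration of the stricter behavior requested in this note, the sketch below reads bytes strictly in order and stops at the first null byte or after maxlen bytes, whichever comes first. The names `seq_strnlen` and `limited_length` are hypothetical, the type of `precision` is assumed to be int, and the code is not taken from any libc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference implementation: s[len] is read only after every
   earlier byte has been found to be non-null and len < maxlen holds, so
   nothing past the first null byte is ever accessed. */
size_t seq_strnlen(const char *s, size_t maxlen)
{
    size_t len = 0;

    assert(s != NULL);
    while (len < maxlen && s[len] != '\0')
        len++;
    return len;
}

/* The idiom quoted in the note: a non-positive precision means "no limit",
   expressed by passing SIZE_MAX as the bound. */
size_t limited_length(const char *string, int precision)
{
    return seq_strnlen(string, precision <= 0 ? SIZE_MAX : (size_t)precision);
}
```

With this access pattern, `limited_length("hello", 0)` returns 5 even though the bound is far larger than the object. An implementation that splits the range and scans the halves in parallel could not offer the same guarantee.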
(0006832) eblake (manager) 2024-07-11 15:54 |
Summarizing Note: 0006831, it would be nice if the C wording for strnlen() copied the C23 requirement on memchr("", 0, SIZE_MAX) reliably returning the first argument since "The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found." |
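One way to see why the memchr() requirement is relevant: strnlen() can be written in terms of memchr(), so a sequential-read guarantee on memchr() would carry over. The following is a minimal sketch under that assumption; `memchr_strnlen` is a hypothetical name, and whether an over-large maxlen is safe depends on the underlying memchr() honoring the C23 wording.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strnlen() built on memchr(): search the first maxlen bytes
   for a null byte and return its offset, or maxlen if none is found. */
size_t memchr_strnlen(const char *s, size_t maxlen)
{
    const char *end;

    assert(s != NULL);
    end = memchr(s, '\0', maxlen);
    return end != NULL ? (size_t)(end - s) : maxlen;
}
```

For in-bounds arguments this agrees with the return-value wording quoted in the Description: `memchr_strnlen("abc", 4)` gives 3 and `memchr_strnlen("abc", 2)` gives 2.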
(0006833) kre (reporter) 2024-07-12 01:23 |
It would be good to remember that there are two (quite different) uses for strnlen() (and wcsnlen() -- though for that one the second use tends to be more complex).

One is for dealing with "strings" stored in fixed length arrays of chars (or wide chars) where the desire is to use every possible storage element for meaningful data, omitting the terminating nul character if the actual data fills the array - but including the nul if the data is shorter. Examples were old style directory entries in the filesystem with char name[14] in the struct, and old style utmp entries where the tty name (line) and user name were each stored in char xxx[8] arrays. In this situation it is possible to access any of 'n' (the size_t param to the function) bytes of the array, but nothing with an index of n or greater (or negative of course).

The other use is for determining if a string is longer than N bytes, without caring how long it actually is. Such strings are typically stored always with a terminating nul character, in an "array" which is exactly long enough for the string and its terminating nul. Referring to anything in that array beyond that terminating nul is undefined behaviour. Examples of this are the argv and environ data, and any strings stored in appropriately sized memory from malloc().

[For why one might want this information, consider outputting a string summary in a fixed column width space (which is why the wide char example gets more complex, though the same principles apply) - eg: assume I have 30 columns of a fixed-width font to display the leading part of the string, but I also want to indicate when the string was longer, and if so, indicate that it was truncated in the traditional way (terminating ellipsis). For this, all I need to know is whether the string is 31 chars long, or longer. If it is, I will take the first 27 chars, perhaps back up from there to a word end - depending upon the context, then append " ...". 
The actual string might be very long (like a chapter of a book, and all I want to display is "It was the best of times, ...") I don't need to know its length, and don't want to waste time determining that (or I'd just use strlen()) so I use strnlen(string, 31) instead. If the answer is <= 30, then I simply use the entire string, however long it might be. If it is > 30 (ie: 31, the only possible case in this scenario) then I do the string truncation dance, and output that. ] The wording needs to be done in such a way that both of these scenarios work correctly. |
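The two uses described in this note can be sketched as follows. The names `NAME_FIELD`, `COLUMNS`, `ELLIPSIS`, `name_length`, and `summarize` are hypothetical. The sketch keeps 26 bytes before the " ..." so that the result fits in exactly 30 columns (27 bytes plus the four-byte " ..." would occupy 31), and it omits the optional step of backing up to a word end.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_FIELD 14       /* size of the old style directory name field */
#define COLUMNS    30       /* display width assumed in the note */
#define ELLIPSIS   " ..."   /* truncation marker, 4 bytes */

/* Use 1: a name in a fixed-size field that is null-padded only when the
   name is shorter than the field. Bytes 0..NAME_FIELD-1 may all be read. */
size_t name_length(const char name[NAME_FIELD])
{
    assert(name != NULL);
    return strnlen(name, NAME_FIELD);
}

/* Use 2: text is a null-terminated string of unknown, possibly huge,
   length. Only "longer than COLUMNS or not" matters, so at most
   COLUMNS + 1 bytes are examined. out needs room for COLUMNS + 1 bytes. */
void summarize(const char *text, char out[COLUMNS + 1])
{
    size_t len;

    assert(text != NULL && out != NULL);
    len = strnlen(text, COLUMNS + 1);
    if (len <= COLUMNS) {
        memcpy(out, text, len);              /* the whole string fits */
        out[len] = '\0';
    } else {
        size_t keep = COLUMNS - (sizeof ELLIPSIS - 1);

        memcpy(out, text, keep);
        memcpy(out + keep, ELLIPSIS, sizeof ELLIPSIS);  /* copies the null too */
    }
}
```

In the first function the bound describes the size of the object. In the second it is only a cap on how much work is done, and the object may be shorter than the bound, which is why reading past the first null byte would be an error there.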
(0006834) eblake (manager) 2024-07-12 17:57 |
It would also be nice if strncmp ("a", "b", SIZE_MAX) were guaranteed to work in linear order in the same manner as memchr() rather than an implementation being permitted to access arbitrary bytes within the two arrays according to an intentionally over-large size. |
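A minimal sketch of the linear-order behavior wished for in this note is shown below. `seq_strncmp` is a hypothetical name and the code is not taken from any libc; it only illustrates that a strictly sequential comparison never reads past the first differing byte or the first null byte, however large the bound is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sequential strncmp(): compares bytes as unsigned char, in
   order, and stops at the first difference, the first null byte, or after
   n bytes, whichever comes first. */
int seq_strncmp(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];

        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == '\0')
            return 0;       /* both strings ended at the same position */
    }
    return 0;               /* the first n bytes are equal */
}
```

With this access pattern `seq_strncmp("a", "b", SIZE_MAX)` reads one byte from each argument and returns a negative value.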
(0006835) kre (reporter) 2024-07-13 01:10 |
Re Note: 0006834 I agree, though this would probably be better addressed by the C standard than here.

But in general, I'd suggest that any function (standard ones, or user created functions) which accesses any character sequence parameter passed to it beyond either the first nul character, or the provided length if there is one, should result in undefined behaviour. Naturally this only applies in cases where the incoming parameter data is defined to end at the first nul encountered. There should be a name for such objects; they are almost strings, except that strings require that the terminating nul exist, and this other type does not.

If that were done, then the sequential access to whatevers in all of these functions would be guaranteed, as one cannot access p[1] without first ensuring that p[0] is not nul, similarly accessing p[2] requires that p[1] also not be nul (etc) - the only valid access regime is p[0] first, then p[1] up to (but not exceeding) p[N] (where N is the provided length, if that is available, or SIZE_MAX if it isn't), and never going beyond p[n] if p[n] is nul, and p[i] 0 <= i < n are all not nul. Naturally n < N.

Once the maximum length that can be referenced is discovered, the implementation is free to then access (again) the bytes in the data in any order it likes. For something simple like strncmp() or strnlen() it makes no real sense to do anything other than compare (or count) the data as it is being examined for the first time, also looking for the nul terminator, but for more complex operations, like pattern matching, other access methods might work out better (matching the RE '^.*abc$' is faster by starting at the end of the input string, and working backwards, than at the beginning and working forwards).

All of this works (or should) for the wide char functions as well, which is why I keep writing nul rather than '\0' as I intend that to mean the appropriate nul character for the character data type involved. |