Viewing Issue Simple Details
ID | Category | Severity | Type | Date Submitted | Last Update
0001832 | [Issue 8 drafts] System Interfaces | Comment | Enhancement Request | 2024-05-24 21:48 | 2024-05-27 20:04
Reporter | alanc | View Status | public
Assigned To |
Priority | normal | Resolution | Open
Status | New | Product Version |
Name | Alan Coopersmith
Organization |
User Reference |
Section | System Interfaces
Page Number | (page or range of pages)
Line Number | (Line or range of lines)
Final Accepted Text |
Summary | 0001832: Add preadv() and pwritev()
Description |
Many implementations offer preadv() and pwritev() interfaces. They are like the existing readv() and writev() APIs, except that they use a specified position instead of the current file offset, just as the existing pread() and pwrite() do for read() and write().

FreeBSD:
- https://man.freebsd.org/cgi/man.cgi?query=preadv
- https://man.freebsd.org/cgi/man.cgi?query=pwritev

illumos:
- https://illumos.org/man/2/preadv
- https://illumos.org/man/2/pwritev

Linux (GNU libc):
- https://man7.org/linux/man-pages/man2/preadv.2.html

NetBSD:
- https://man.netbsd.org/preadv.2
- https://man.netbsd.org/pwritev.2

OpenBSD:
- https://man.openbsd.org/preadv.2
- https://man.openbsd.org/pwritev.2

Solaris:
- (man pages not online yet, just added in 11.4.69 in May 2024)

Known consumers include PostgreSQL and libuv, as discussed in the thread at https://twitter.com/MengTangmu/status/1729704220368990235 .
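To illustrate the semantics described above, here is a minimal sketch of positional vectored I/O, assuming the four-argument signatures that the linked FreeBSD, Linux, and OpenBSD man pages document. The helper name, the temporary file path, and the record layout are hypothetical and are not part of the request itself.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* assumption: lets glibc declare preadv()/pwritev() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical demo helper, not part of any proposal text.
 * Returns 0 if a gather-write and a scatter-read at offset 16 round-trip. */
static int positional_vector_io_demo(void)
{
    char path[] = "/tmp/preadv_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file goes away once fd is closed */

    char tag_out[4] = { 'T', 'A', 'G', '1' };
    char body_out[8] = { 'p', 'a', 'y', 'l', 'o', 'a', 'd', '!' };
    struct iovec out[2] = {
        { .iov_base = tag_out,  .iov_len = sizeof tag_out },
        { .iov_base = body_out, .iov_len = sizeof body_out },
    };
    char tag_in[4], body_in[8];
    struct iovec in[2] = {
        { .iov_base = tag_in,  .iov_len = sizeof tag_in },
        { .iov_base = body_in, .iov_len = sizeof body_in },
    };

    /* Both calls take an explicit offset, like pread()/pwrite(). */
    int ok = pwritev(fd, out, 2, 16) == 12 && preadv(fd, in, 2, 16) == 12;
    ok = ok && memcmp(tag_in, tag_out, sizeof tag_in) == 0
            && memcmp(body_in, body_out, sizeof body_in) == 0;

    /* Unlike readv()/writev(), neither call moved the file offset. */
    assert(lseek(fd, 0, SEEK_CUR) == 0);

    close(fd);
    return ok ? 0 : -1;
}
```

Calling positional_vector_io_demo() from main() should return 0 on the implementations listed above. The point of the sketch is that the file offset stays at 0 throughout, which is what lets several threads share one descriptor without coordinating seeks.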
Desired Action | Add preadv() and pwritev() to the System Interfaces in Issue 9.
Tags | No tags attached.
Attached Files |
Notes |
(0006800) Guy Harris (reporter) 2024-05-24 21:56
macOS has it as well. Formatted man pages are not online at an Apple site, but their GitHub repository has it in the read(2) man page at https://opensource.apple.com/source/xnu/xnu-7195.60.75/bsd/man/man2/read.2.auto.html .
(0006803) alanc (reporter) 2024-05-25 23:10
If added, preadv() and pwritev() should also be considered for addition to the list of Cancellation Points in the Thread Cancellation section, to match readv/writev/pread/pwrite.
(0006804) philip-guenther (reporter) 2024-05-26 00:03
I suggest that preadv/pwritev be documented as MAY be cancellation points rather than MUST be, and even that a future direction be added forbidding them from being cancellation points.

The argument is that being a cancellation point is proper for 'slow' operations, which may block for an arbitrary length of time and where EINTR would otherwise be an appropriate return value**. For read/write those conditions can occur on sockets, pipes, FIFOs, and ttys...but none of those are valid for preadv/pwritev!

Yes, this argument applies to pread/pwrite too. IMHO, the standard's current requirement for them should be weakened to a MAY as well, with a future direction to completely remove permission for them to be cancellation points.

Compare this with fcntl(), which is only required to be a cancellation point when the cmd is F_SETLKW, the only 'unbounded blocking' call. For pread/pwrite/preadv/pwritev we know they can't exhibit that: why is it natural for them to be cancellation points?

** Yes, yes, that's not the precise statement of when cancellation would be expected to be tested, but hopefully the point is clear.
(0006805) Guy Harris (reporter) 2024-05-26 02:43
> and even to have a future direction that they be forbidden from being cancellation points.

Given that macOS has both "preadv" and "preadv_nocancel" system calls, and that the GNU libc used in most Linux systems has both "preadv()" and "__preadv_nocancel()" routines, it's likely that "preadv()" is a cancellation point on both those OSes, so that might cause problems.

Furthermore, on both those OSes (and most if not all other UN*Xes), "slow" operations can occur on special files other than ttys, *and* those special files might use the seek offset, so there might well be cases where making preadv() not a cancellation point would cause problems, even if it doesn't cause problems for POSIX-conformant programs.

So I would recommend against that. Instead, if there is a demand for non-cancellation-point calls, I would recommend, for a future direction, that _nocancel versions of calls be added. I don't know why neither macOS nor GNU(libc)/Linux provides _nocancel versions of those APIs, but, other than the namespace pollution issues, it would be trivial to provide them, at least in those OSes.
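For readers wondering what a non-cancellation-point variant could look like from the application side, the following is a rough sketch that wraps preadv() in pthread_setcancelstate(). The function name preadv_nocancel_sketch is hypothetical; it is not the macOS system call or the glibc internal routine mentioned above, whose implementations are not described in this thread, and it assumes preadv() is available with the usual four-argument signature.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* assumption: lets glibc declare preadv() */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical name: an application-level emulation, not an existing
 * libc interface. The calling thread cannot be cancelled inside preadv(). */
static ssize_t preadv_nocancel_sketch(int fd, const struct iovec *iov,
                                      int iovcnt, off_t offset)
{
    int old_state, ignored;

    /* Only fails for an invalid state value, which cannot happen here. */
    int rc = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    assert(rc == 0);
    (void)rc;

    ssize_t n = preadv(fd, iov, iovcnt, offset);
    int saved_errno = errno;

    /* Restore the caller's state. A cancellation request that arrived in
     * the meantime stays pending until the next cancellation point. */
    pthread_setcancelstate(old_state, &ignored);
    errno = saved_errno;
    return n;
}
```

The wrapper costs two extra library calls per read, which is one reason a native _nocancel entry point might be preferred if such interfaces were ever standardized.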
(0006806) steffen (reporter) 2024-05-27 20:04
This remark surely belongs to Rich Felker of musl, also because he wrote the Linux kernel patch, I think (if I remember correctly), but I want to add a note .. or, let me paste parts of a musl commit message instead:

POSIX requires pwrite to honor the explicit file offset where the write should take place even if the file was opened as O_APPEND. however, linux historically defined the pwrite syscall family as honoring O_APPEND. this cannot be changed on the kernel side due to stability policy, but the addition of the pwritev2 syscall with a flags argument opened the door to fixing it

this patch changes the pwrite function to first attempt using the pwritev2 syscall with RWF_NOAPPEND, falling back to using the old pwrite syscall only after checking that O_APPEND is not set for the open file. if O_APPEND is set, the operation fails with EOPNOTSUPP, reflecting that the kernel does not support the correct behavior. this is an extended error case needed to avoid the wrong behavior that happened before (writing the data at the wrong location), and is aligned with the spirit of the POSIX requirement that "An attempt to perform a pwrite() on a file that is incapable of seeking shall result in an error."

since the pwritev2 syscall interprets the offset of -1 as a request to write at the current file offset, it is mapped to a different negative value that will produce the expected error.

pwritev, though not governed by POSIX at this time, is adjusted to match pwrite in honoring the offset.
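The fallback branch described in the commit message can be sketched at application level as follows. This is not musl's code: the function name pwritev_strict_offset is hypothetical, and the sketch deliberately omits the first step (trying pwritev2 with RWF_NOAPPEND), showing only the check that refuses to write when O_APPEND is set rather than risk writing at the wrong location.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* assumption: lets glibc declare pwritev() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: write at `offset` only when the descriptor is not
 * in append mode; otherwise fail with EOPNOTSUPP, as the commit message
 * describes for kernels that cannot honor the offset. */
static ssize_t pwritev_strict_offset(int fd, const struct iovec *iov,
                                     int iovcnt, off_t offset)
{
    assert(iovcnt >= 0); /* caller precondition for this sketch */

    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1; /* errno already set by fcntl(), e.g. EBADF */

    if (flags & O_APPEND) {
        errno = EOPNOTSUPP;
        return -1;
    }
    return pwritev(fd, iov, iovcnt, offset);
}
```

On a descriptor opened with O_APPEND the helper returns -1 with errno set to EOPNOTSUPP and writes nothing; on any other writable, seekable descriptor it behaves like a plain pwritev() call.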