Austin Group Defect Tracker

Aardvark Mark IV


ID 0001387
Category [1003.1(2008)/Issue 7] System Interfaces
Severity Editorial
Type Clarification Requested
Date Submitted 2020-08-10 16:46
Last Update 2021-03-08 15:18
Reporter rhansen
View Status public
Assigned To ajosey
Priority normal
Resolution Accepted As Marked
Status Applied
Name Richard Hansen
Organization
User Reference
Section malloc
Page Number 1295 (Issue 7 2018 edition)
Line Number 43161 (Issue 7 2018 edition)
Interp Status Approved
Final Accepted Text Note: 0005203
Summary 0001387: Should EAGAIN be acceptable for malloc failure?
Description On failure, the implementations of malloc from Solaris, OpenSolaris, OpenIndiana, illumos, etc. set errno to either ENOMEM or EAGAIN (see https://illumos.org/man/3c/malloc). It seems to me that EAGAIN is used for at least a subset of ordinary out-of-memory conditions, so on the surface these implementations appear to be non-conforming.

I am unfamiliar with the implementation details, and the phrase "Insufficient storage space is available" could be interpreted in a few subtly different ways, so perhaps one could argue that EAGAIN is only used for error cases other than "Insufficient storage space is available" (which is permitted by the standard).

IIUC, Solaris has behaved this way for a very long time. If the implementations are considered to be non-conforming, then it might make more sense to change the standard to permit EAGAIN than to change the implementations.

Link to HTML version of Issue 7 2018 edition: https://pubs.opengroup.org/onlinepubs/9699919799/functions/malloc.html
Desired Action On page 1295 line 43161, change:

  [ENOMEM] Insufficient storage space is available.

to:

  [ENOMEM], [EAGAIN] Insufficient storage space is available.

A similar change would be required for every other function that allocates memory, assuming the implementations of those functions on Solaris and friends use malloc and leave errno unmodified on malloc failure.
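For illustration only, here is a minimal C sketch of the pattern that assumption describes. dup_cstring() is a hypothetical strdup-like helper, not a real library function. It allocates with malloc() and, on failure, returns NULL without touching errno. On an implementation whose malloc() reports EAGAIN, callers of such a function would see EAGAIN as well.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a library function): copies a string into
 * freshly malloc()ed storage. On allocation failure it returns NULL and
 * deliberately leaves errno exactly as malloc() set it, so whatever the
 * implementation reports (ENOMEM, EAGAIN, ...) reaches the caller. */
static char *dup_cstring(const char *s)
{
    size_t len = strlen(s) + 1;   /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;              /* errno untouched */
    memcpy(copy, s, len);
    return copy;
}
```

A caller tests the return value first and consults errno only after a NULL return.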
Tags tc3-2008
Attached Files

Relationships
related to 0001489 Applied Issue 8 drafts malloc RATIONALE awkward wording

Notes
(0004914)
rhansen (manager)
2020-08-10 17:09

A problem would arise if we were to replace all instances of [ENOMEM] with [ENOMEM], [EAGAIN]: some functions already specify [EAGAIN] for other error conditions. These include:
  • accept
  • fclose
  • fflush
  • fgetc
  • fgetwc
  • fork
  • fputc
  • fputwc
  • mlock
  • mlockall
  • mmap
  • mprotect
  • open
  • openat
  • posix_trace_create
  • posix_trace_create_withlog
  • pread
  • pthread_barrier_init
  • pthread_cond_init
  • pthread_key_create
  • pthread_mutex_init
  • pthread_rwlock_init
  • pthread_rwlockattr_init
  • pthread_spin_init
  • read
  • recv
  • recvfrom
  • recvmsg
  • sendmsg
  • sendto
(0004915)
Don Cragun (manager)
2020-08-10 17:39

Rather than:

      [ENOMEM], [EAGAIN] Insufficient storage space is available.

I would prefer to see:

     [EAGAIN] Allocating the requested storage space would cause the thread to be blocked.
     [ENOMEM] Insufficient storage space is available.
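The following rough C sketch shows how an application might act on that split. It assumes the EAGAIN/ENOMEM meanings proposed above. The enum and classify_alloc_failure() are hypothetical names, not POSIX interfaces.

```c
#include <errno.h>

/* Hypothetical classification of the errno left by a failed allocation. */
enum alloc_failure_kind {
    ALLOC_FAILURE_TRANSIENT, /* [EAGAIN]: retrying later might succeed */
    ALLOC_FAILURE_HARD,      /* [ENOMEM]: insufficient storage space */
    ALLOC_FAILURE_OTHER      /* any other, implementation-specific errno */
};

static enum alloc_failure_kind classify_alloc_failure(int err)
{
    switch (err) {
    case EAGAIN:
        return ALLOC_FAILURE_TRANSIENT;
    case ENOMEM:
        return ALLOC_FAILURE_HARD;
    default:
        return ALLOC_FAILURE_OTHER;
    }
}
```

The sketch gives any errno other than EAGAIN or ENOMEM its own bucket, because XSH section 2.3 allows implementations to report additional errors.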
(0004916)
alanc (reporter)
2020-08-10 17:45
edited on: 2020-08-10 17:45

I believe the Solaris behavior originally came from passing sbrk() failures through without checking or changing the reported errno value. It thus distinguishes between hitting some limit, where trying again is not worthwhile, and a temporary shortage, where waiting for other processes to exit or otherwise free up memory might help.

(0004918)
shware_systems (reporter)
2020-08-11 01:08

If EAGAIN is added as Don proposes, it should be a "may fail" case rather than a second "shall fail" case alongside ENOMEM. On most platforms ENOMEM is currently synonymous with a null return. Adding a symbolic non-NULL pointer (e.g. AGAIN_PTR) would therefore keep with the C standard, which uses only the return value to indicate errors. The wording should also make clear that the implementation-defined errno value possible with zero-size allocations does not conflict with either of these.
(0004923)
joerg (reporter)
2020-08-14 09:50
edited on: 2020-08-14 09:51

Re:Note: 0004916

Hi Alan, do you know where EAGAIN is created in the Solaris kernel while running sbrk()?

Due to the object-oriented design of the address space administration in SunOS, it is hard to find the right location in the code.

What I can say, however, is that since SunOS-4.0 (from late 1987), kmem_alloc() with a flag of KM_NOSLEEP returns EAGAIN if the operation would otherwise result in a sleep.

The anon pages segment driver vm_anon.c, however, only returns ENOMEM. The segvn driver seg_vn.c returns ENOMEM in plenty of cases (e.g. with shared mappings, which are not expected to apply to sbrk()), but also EAGAIN in other cases.

(0004927)
Konrad_Schwarz (reporter)
2020-08-19 09:53

Shouldn't this fall under the following provision in https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_03?

"Implementations may support additional errors not included in this list, may generate errors included in this list under circumstances other than those described here, or may contain extensions or limitations that prevent some errors from occurring."
(0004932)
rhansen (manager)
2020-08-19 19:57

> Shouldn't this fall under the following provision in [...]

Yes, if you interpret illumos's EAGAIN case to be distinct from "insufficient storage space is available." To me, it doesn't feel any different, but I can see how others would feel otherwise.

If I saw Don's suggested wording (Note: 0004915) in the standard, then I would interpret "insufficient storage space is available" more narrowly than I do now.
(0004933)
Konrad_Schwarz (reporter)
2020-08-20 07:23

But isn't this a distinction without a difference? In the vast majority of
cases, resolution of the EAGAIN error depends on the actions of other
processes in the system over which the application has no control.

For fork(), I think a case can be made that in Unix, many processes are
short lived and therefore a retry after a limited period can be worthwhile
(given a fixed-size process table). The same applies to open file descriptors.

For memory, in a system with long-lived, memory hogging processes
(e.g., large databases), retry after memory allocation failure
does not seem worthwhile.

If POSIX lists EAGAIN as an alternative for ENOMEM everywhere
ENOMEM is documented, scrupulous application programmers will
have to handle EAGAIN explicitly, presumably differently from the ENOMEM
case (i.e., by sleeping and then retrying).
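Such handling might look like the following bounded sleep-and-retry loop. This C sketch assumes EAGAIN is the only errno worth retrying. malloc_with_retry() and should_retry_alloc() are hypothetical helpers, and the attempt limit and delay are arbitrary caller choices.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Retry only on EAGAIN, and only while attempts remain. */
static bool should_retry_alloc(int err, int attempt, int max_attempts)
{
    return err == EAGAIN && attempt + 1 < max_attempts;
}

/* Hypothetical wrapper: calls malloc() at least once and at most
 * max_attempts times, pausing delay_ms milliseconds between attempts.
 * On final failure it returns NULL with errno as set by the last malloc(). */
static void *malloc_with_retry(size_t size, int max_attempts, long delay_ms)
{
    for (int attempt = 0; ; attempt++) {
        void *p = malloc(size);
        if (p != NULL)
            return p;
        if (!should_retry_alloc(errno, attempt, max_attempts))
            return NULL;
        struct timespec ts = { .tv_sec = delay_ms / 1000,
                               .tv_nsec = (delay_ms % 1000) * 1000000L };
        /* nanosleep() may set errno (e.g. EINTR); a later malloc()
         * failure overwrites it before it is examined again. */
        nanosleep(&ts, NULL);
    }
}
```

For example, malloc_with_retry(n, 3, 100) makes at most three attempts with 100 ms pauses. As argued above, retries like this rarely help for memory.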

As it is now, if malloc() fails with EAGAIN, this would be
handled by fully-conforming code under the "additional errors"
case, e.g., with perror(). This gives the system administrator the
input that more memory or more swap space needs to be installed --
it's not something that the application can handle in a useful way.
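By contrast, code that treats every malloc() failure alike, as described above, needs no errno-specific branch at all. The following is a minimal C sketch. malloc_or_report() is a hypothetical helper name.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reports any allocation failure via perror()
 * without inspecting the specific errno value, then returns NULL with
 * errno preserved for the caller. */
static void *malloc_or_report(size_t size, const char *context)
{
    void *p = malloc(size);
    /* malloc(0) may legitimately return NULL, so only report failures
     * for nonzero sizes. */
    if (p == NULL && size != 0) {
        int saved = errno;   /* perror() itself could change errno */
        perror(context);
        errno = saved;
    }
    return p;
}
```

Whether errno is ENOMEM, EAGAIN, or something implementation-specific, perror() prints the corresponding message and the caller simply sees NULL.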
(0004935)
rhansen (manager)
2020-08-20 17:58
edited on: 2020-08-20 18:10

> But isn't this a distinction without a difference?

I agree with your analysis. That's why it feels wrong to me that Solaris-based systems use EAGAIN—they should use ENOMEM instead.

I see these options:
  • Change Solaris-based systems to use ENOMEM instead of EAGAIN. I prefer this option, but it is unlikely to happen.
  • Declare Solaris-based systems to be non-conforming and live with that.
  • Try to convince ourselves that the Solaris EAGAIN error condition is completely different than "insufficient storage space is available", and therefore the behavior does not run afoul of "Implementations shall not generate a different error number from one required by this volume of POSIX.1-2017 for an error condition described in this volume of POSIX.1-2017." I don't like this option because I don't think the error condition is completely different, for the reasons you state in your note.
  • Change the standard to mention EAGAIN as a possible error with a subtly different meaning ("insufficient storage space right now, but it might succeed if you try again later" vs. "a hard limit would be exceeded so there will never be sufficient storage space"). I don't like this option because retries are not generally useful so why make the distinction? In my opinion this makes the standard slightly less useful. The only reason to choose this option is to allow Solaris to be declared conforming without changing its implementation.


(0004936)
alanc (reporter)
2020-08-20 18:14

There is an open bug report against Solaris for this:
15109791 malloc(3C) fails with EAGAIN where ENOMEM is expected
and it could be fixed if needed, it's just never been a high priority
since few applications do anything differently for EAGAIN vs. ENOMEM
return values - most just print the error message from them.

That bug notes the historical distinction was:

     ENOMEM
           The physical limits of the system are exceeded by size
           bytes of memory which cannot be allocated.

     EAGAIN
           There is not enough memory available to allocate size
           bytes of memory; but the application could try again
           later.

but also notes Solaris has multiple malloc implementations, and not all
of them made this distinction. (We're currently up to 8 different malloc
library options in Solaris 11.4 - see the ALTERNATIVE IMPLEMENTATIONS section
at the end of:
https://docs.oracle.com/cd/E88353_01/html/E37843/malloc-3c.html#scrolltoc .)
(0004937)
rhansen (manager)
2020-08-20 18:24
edited on: 2020-08-20 19:44

"So you're telling me there's a chance. YEAH!" :)
(reference: https://www.imdb.com/title/tt0109686/quotes/qt0995799)

Joking aside, it's good news that Oracle isn't totally opposed to changing the behavior.

(0004938)
rhansen (manager)
2020-08-20 18:30

Is that Solaris bug report publicly accessible?
(0004939)
alanc (reporter)
2020-08-20 19:42

Sorry, Oracle only makes Solaris bug reports accessible to customers with
support contracts, not to the general public.

(And yes, there's a chance, given a good reason, but that would only affect
 future support updates to Solaris 11.4, not decades of past releases that
 have had this behavior.)
(0004940)
rhansen (manager)
2020-08-20 19:48
edited on: 2020-08-20 19:49

> there's a chance, given a good reason

Do you know if POSIX conformance would be a good enough reason by itself?

> that would only affect future support updates to Solaris 11.4,
> not decades of past releases that have had this behavior

From a POSIX perspective that's OK—only changing new releases is good enough to avoid rewording the standard.

(0005203)
geoffclare (manager)
2021-01-21 16:11
edited on: 2021-01-22 09:30

Interpretation response
------------------------
The standard clearly states that implementations may use any error number that is applicable to a particular failure, and conforming implementations must conform to this.

Rationale:
-------------
Some implementations return EAGAIN if the resource (memory) is temporarily unavailable. This is acceptable under the general rules for error numbers in section 2.3 of XSH.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
On page 637 line 22025 (calloc() rationale), change:
None.
to:
See the RATIONALE for [xref to malloc()].

On page 1295 line 43167 (malloc() rationale), change:
None.
to:
Some implementations set errno to [EAGAIN] to signal memory allocation failures that might succeed if retried and [ENOMEM] for failures that are unlikely to ever succeed, for example due to configured limits. XSH section 2.3 permits this behavior, since when multiple error conditions described there are simultaneously true there is no precedence between them.

On page 1788 line 57907 (realloc() rationale), change:
None.
to:
See the RATIONALE for [xref to malloc()].


(0005204)
agadmin (administrator)
2021-01-21 16:25

Interpretation proposed: 21 Jan 2021
(0005249)
agadmin (administrator)
2021-02-22 13:52

Interpretation approved: 22nd February 2021

Issue History
Date Modified Username Field Change
2020-08-10 16:46 rhansen New Issue
2020-08-10 16:46 rhansen Status New => Under Review
2020-08-10 16:46 rhansen Assigned To => ajosey
2020-08-10 16:46 rhansen Name => Richard Hansen
2020-08-10 16:46 rhansen Section => malloc
2020-08-10 16:46 rhansen Page Number => 1295 (Issue 7 2018 edition)
2020-08-10 16:46 rhansen Line Number => 43161 (Issue 7 2018 edition)
2020-08-10 16:46 rhansen Interp Status => ---
2020-08-10 17:09 rhansen Note Added: 0004914
2020-08-10 17:14 rhansen Description Updated
2020-08-10 17:35 rhansen Desired Action Updated
2020-08-10 17:36 rhansen Desired Action Updated
2020-08-10 17:39 Don Cragun Note Added: 0004915
2020-08-10 17:45 alanc Note Added: 0004916
2020-08-10 17:45 alanc Note Edited: 0004916
2020-08-11 01:08 shware_systems Note Added: 0004918
2020-08-14 09:50 joerg Note Added: 0004923
2020-08-14 09:50 joerg Note Edited: 0004923
2020-08-14 09:51 joerg Note Edited: 0004923
2020-08-19 09:53 Konrad_Schwarz Note Added: 0004927
2020-08-19 19:57 rhansen Note Added: 0004932
2020-08-20 07:23 Konrad_Schwarz Note Added: 0004933
2020-08-20 17:58 rhansen Note Added: 0004935
2020-08-20 18:04 rhansen Note Edited: 0004935
2020-08-20 18:08 rhansen Note Edited: 0004935
2020-08-20 18:10 rhansen Note Edited: 0004935
2020-08-20 18:14 alanc Note Added: 0004936
2020-08-20 18:24 rhansen Note Added: 0004937
2020-08-20 18:25 rhansen Note Edited: 0004937
2020-08-20 18:25 rhansen Note Edited: 0004937
2020-08-20 18:30 rhansen Note Added: 0004938
2020-08-20 19:42 alanc Note Added: 0004939
2020-08-20 19:44 rhansen Note Edited: 0004937
2020-08-20 19:48 rhansen Note Added: 0004940
2020-08-20 19:49 rhansen Note Edited: 0004940
2020-08-20 19:49 rhansen Note Edited: 0004940
2021-01-21 16:11 geoffclare Note Added: 0005203
2021-01-21 16:13 geoffclare Interp Status --- => Pending
2021-01-21 16:13 geoffclare Final Accepted Text => Note: 0005203
2021-01-21 16:13 geoffclare Status Under Review => Interpretation Required
2021-01-21 16:13 geoffclare Resolution Open => Accepted As Marked
2021-01-21 16:13 geoffclare Tag Attached: tc3-2008
2021-01-21 16:25 agadmin Interp Status Pending => Proposed
2021-01-21 16:25 agadmin Note Added: 0005204
2021-01-22 09:30 geoffclare Note Edited: 0005203
2021-02-22 13:52 agadmin Interp Status Proposed => Approved
2021-02-22 13:52 agadmin Note Added: 0005249
2021-03-08 15:18 geoffclare Status Interpretation Required => Applied
2021-07-12 16:10 rhansen Relationship added related to 0001489

