Austin Group Defect Tracker

Aardvark Mark IV


ID 0001143
Category [1003.1(2016/18)/Issue7+TC2] Base Definitions and Headers
Severity Comment
Type Clarification Requested
Date Submitted 2017-06-14 13:59
Last Update 2019-11-05 12:04
Reporter dstaesse
View Status public
Assigned To
Priority normal
Resolution Accepted As Marked
Status Applied
Name Dimitri Staessens
Organization
User Reference
Section 2.9.5
Page Number 520
Line Number 18219-18223
Interp Status Approved
Final Accepted Text Note: 0004159
Summary 0001143: cancellation points: contradiction between base definition and rationale
Description The second clause in the statement

It is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution if:
• The thread is suspended at a cancellation point and the event for which it is waiting occurs
• A specified timeout expired before the cancellation request is acted upon.

contradicts the rationale for B.2.9.5 on p.3657 lines 125123-125129

Cancellation points are points inside of certain functions where a thread has to act on any pending cancellation request when cancelability is enabled. For functions in the "shall occur" list, a cancellation check must be performed on every call regardless of whether, absent the cancellation, the call would have blocked.
Desired Action Solve the contradiction
Tags tc3-2008
Attached Files

- Relationships

-  Notes
(0003760)
terekhov (reporter)
2017-06-14 14:25

Add wording clarifying that "shall occur" does NOT cover "shall fail" cases such as ETIMEDOUT.
(0003765)
dstaesse (reporter)
2017-06-15 18:49

This seems to imply an even bigger change in the way cancellation points behave.

The "shall fail" cases that are not covered must be specified.

This may introduce inconsistencies for cancellation points that may time out and
* do not have "shall fail" cases
* return success on timeout
* return EAGAIN on timeout
* return -1 and set errno
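The variety this note points to is visible within the "shall occur" list itself. The following is a minimal sketch, assuming a Linux-like system with unnamed POSIX semaphores and AF_UNIX socket pairs; the helper names (cond_timeout() and so on) are hypothetical, while the calls are standard POSIX functions, all of which are cancellation points. It contrasts four ways a timeout is reported: pthread_cond_timedwait() returns ETIMEDOUT, sem_timedwait() returns -1 with errno set to ETIMEDOUT, poll() succeeds and reports zero ready descriptors, and recv() with an SO_RCVTIMEO receive timeout returns -1 with errno set to EAGAIN or EWOULDBLOCK.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* An absolute CLOCK_REALTIME deadline that has already passed. */
static const struct timespec already_past = { .tv_sec = 0, .tv_nsec = 0 };

static pthread_mutex_t cond_mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond_cv = PTHREAD_COND_INITIALIZER;

/* pthread_cond_timedwait(): the timeout is the return value ETIMEDOUT. */
int cond_timeout(void)
{
    int rc;

    pthread_mutex_lock(&cond_mu);
    rc = pthread_cond_timedwait(&cond_cv, &cond_mu, &already_past);
    pthread_mutex_unlock(&cond_mu);
    return rc;
}

/* sem_timedwait(): returns -1 and sets errno to ETIMEDOUT. */
int sem_timeout(int *err)
{
    sem_t sem;
    int rc;

    sem_init(&sem, 0, 0);               /* value 0: the wait cannot succeed */
    rc = sem_timedwait(&sem, &already_past);
    *err = errno;                       /* capture before any other call */
    sem_destroy(&sem);
    return rc;
}

/* poll(): a timeout is a successful return reporting 0 ready descriptors. */
int poll_timeout(void)
{
    int fds[2];
    struct pollfd p;
    int rc;

    rc = pipe(fds);
    assert(rc == 0);
    p.fd = fds[0];
    p.events = POLLIN;
    p.revents = 0;
    rc = poll(&p, 1, 1);                /* 1 ms timeout; nothing is ever written */
    close(fds[0]);
    close(fds[1]);
    return rc;
}

/* recv() with SO_RCVTIMEO: returns -1 with errno EAGAIN or EWOULDBLOCK. */
int recv_timeout(int *err)
{
    int sv[2];
    char c;
    struct timeval tv = { .tv_sec = 0, .tv_usec = 1000 };  /* 1 ms */
    ssize_t n;
    int rc;

    rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    rc = setsockopt(sv[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
    assert(rc == 0);
    n = recv(sv[0], &c, 1, 0);          /* the peer never sends anything */
    *err = errno;
    close(sv[0]);
    close(sv[1]);
    return (int)n;
}
```

A caller that wants to act on a request left pending after a timeout therefore cannot recognise the timeout case by a single error code.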
(0003783)
dstaesse (reporter)
2017-06-17 12:17

The rationale in AI-136 that the delay interfaces need special consideration for timeouts is dubious. There were cancellation points that timed out before the introduction of the delay interfaces. There is no need for a special case for timeouts, and the introduction of undefined behaviour for this case opens the possibility of very awkward API differences between implementations. A cancellation check must happen even if the cancellation point times out immediately (provided that the cancellation request was pending prior to the invocation of the timed cancellation point).

Suggested action is to revert the change introduced by ERN 207.
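The case described in this note, a timed cancellation point that would time out immediately while a cancellation request is already pending, can be reproduced with sem_timedwait(), which is in the "shall occur" list. A minimal sketch, assuming unnamed POSIX semaphores are available (e.g. on Linux); worker() and timed_call_with_pending_cancel() are hypothetical names. The thread makes a deferred cancellation request against itself and then calls sem_timedwait() with an absolute deadline that has already passed. Because the thread never becomes suspended, the wording eventually adopted in Note: 0004159 requires the request to be acted upon; glibc, for instance, checks for a pending request on entry to sem_timedwait().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

static sem_t sem;                  /* value 0: a wait on it never succeeds */
static int returned_normally = 0;  /* set only if sem_timedwait() returns */

static void *worker(void *arg)
{
    /* An absolute deadline in the past: absent cancellation, the call
       would fail with ETIMEDOUT at once, without suspending the thread. */
    const struct timespec past = { .tv_sec = 0, .tv_nsec = 0 };

    (void)arg;
    /* Deferred cancellation (the default type): this only marks a request
       as pending; it is acted upon at the next cancellation point. */
    pthread_cancel(pthread_self());
    sem_timedwait(&sem, &past);
    returned_normally = 1;
    return NULL;
}

/* Returns 1 if the worker ended by cancellation rather than by timeout. */
int timed_call_with_pending_cancel(void)
{
    pthread_t t;
    void *res;
    int rc;

    rc = sem_init(&sem, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_join(t, &res);
    assert(rc == 0);
    sem_destroy(&sem);
    return res == PTHREAD_CANCELED;
}
```

pthread_join() then reports PTHREAD_CANCELED, and the statement after sem_timedwait() is never reached.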
(0003784)
terekhov (reporter)
2017-06-19 09:04

Reverting ERN 207 won't undo the normative, mandatory "shall fail" detection of a prior timeout. The "shall occur" requirement regarding cancellation is concerned with the non-failure mode (the case of no prior timeout, other 'failure modes' aside for a moment).

Consider that in the past, the Rationale said:

"Cancellation points are points inside of certain functions where a thread has to act on any pending cancellation request when cancelability is enabled, if the function would block. As with checking for signals, operations need only check for pending cancellation requests when the operation is about to block indefinitely."

It is my understanding that the "shall occur" list is meant to preclude "block indefinitely", whereas the "may also occur" list is meant for functions with an internal fast path that carries no danger of blocking indefinitely (and with cancel delivery on the slow path only).
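The fast-path/slow-path split described in the older rationale can be sketched as follows, using a hypothetical single-slot mailbox (struct mailbox, mailbox_take() and taker() are illustrative names, not from the standard). The function checks for cancellation only when it is about to block: with a request already pending, a call that finds an item returns it normally, and only the next call, which would have to wait, acts on the request. This matches the behaviour described here for the "may also occur" functions; a function in the "shall occur" list would, under the current rationale, have to act on the request in both calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical single-slot mailbox, used only for illustration. */
struct mailbox {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int has_item;
    int item;
};

static void unlock_mutex(void *mu)
{
    pthread_mutex_unlock(mu);
}

/* Checks for cancellation only on the slow path, when it is about to
   block; the fast path never acts on a pending request. */
int mailbox_take(struct mailbox *mb)
{
    int item;

    pthread_mutex_lock(&mb->mu);
    if (!mb->has_item) {
        pthread_cleanup_push(unlock_mutex, &mb->mu);
        pthread_testcancel();               /* about to block: check first */
        while (!mb->has_item)
            pthread_cond_wait(&mb->cv, &mb->mu);
        pthread_cleanup_pop(0);
    }
    item = mb->item;
    mb->has_item = 0;
    pthread_mutex_unlock(&mb->mu);
    return item;
}

static struct mailbox box = {
    .mu = PTHREAD_MUTEX_INITIALIZER,
    .cv = PTHREAD_COND_INITIALIZER,
    .has_item = 1,
    .item = 42,
};
static int first_take = -1;
static int second_take_returned = 0;

static void *taker(void *arg)
{
    (void)arg;
    pthread_cancel(pthread_self());     /* a deferred request is now pending */
    first_take = mailbox_take(&box);    /* item present: fast path, no check */
    mailbox_take(&box);                 /* empty: slow path acts on the request */
    second_take_returned = 1;
    return NULL;
}

/* Runs taker() in a new thread; returns 1 if it ended by cancellation. */
int run_taker(void)
{
    pthread_t t;
    void *res;
    int rc;

    rc = pthread_create(&t, NULL, taker, NULL);
    assert(rc == 0);
    rc = pthread_join(t, &res);
    assert(rc == 0);
    return res == PTHREAD_CANCELED;
}
```

The cleanup handler releases the mutex if the thread is cancelled on the slow path, whether in pthread_testcancel() or while suspended in pthread_cond_wait().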
(0003790)
dstaesse (reporter)
2017-06-20 03:50

I interpret the "shall occur" list as the functions that have to be cancellation points in every implementation, whereas the "may occur" list contains functions that may be cancellation points in some implementations but not in others. Portable code should take that into account.

Whether cancellation should take precedence over returning any "shall fail" case (or a success case, for that matter) is of course a choice (they can't both happen). The old rationale you quote allows the function not to perform the cancellation check if it would not block indefinitely. That seems to have been changed in newer versions so that cancellation always takes precedence at some point (which, to me, leads to the most robust specification).

The current text seems to allow undefined behaviour for some cases (functions that time out) but not for others (and contains the above contradiction). Undefined behaviour that hinges on an input parameter whose effect cannot be fixed in advance is the case to avoid. As an example, a 1 microsecond timeout may never have expired by the time the call checks it on a fast machine where the code runs on a dedicated core, may sometimes have expired on an equally fast machine that schedules other processes, and may always have expired on a very slow machine. This is hardly a portable interface, as program behaviour may become unpredictable. Moreover, not all cancellation points with a timeout return ETIMEDOUT, which would otherwise make such a check easy to implement.
 
Why would any thread that
1) has a defined cancellation point in the code and
2) has a pending cancellation (so the program expects that thread to exit as soon as possible) and
3) was not yet executing that cancellation point and suspended when the cancellation request arrived
be allowed to continue existing until some timeout expires, or be allowed to postpone that cancellation indefinitely unless another cancellation point (a pthread_testcancel()) is explicitly added to avoid such a case?

Doesn't it make more sense to make cancellation mandatory for the above case, and to allow undefined behaviour if (and only if) clause 3) above is not met? That would spare the implementation of the cancellation point the complexity of performing the cancellation check at awkward points in the code, rolling back state and releasing already allocated resources.

At least that's how I see the intention of the current version of the base specification.
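The workaround this note alludes to, an explicit pthread_testcancel() after a timed cancellation point returns with a timeout, might look like the following minimal sketch; poller(), cancel_poller() and deadline_after_usec() are hypothetical names, and the 1 microsecond timeout mirrors the example above. Whether a request that arrives during the wait is acted upon inside pthread_cond_timedwait() or left pending when the timeout wins, the thread ends up cancelled either way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int ready = 0;          /* predicate; never set in this sketch */

static void unlock_cleanup(void *m)
{
    pthread_mutex_unlock(m);   /* the mutex is held whenever cancellation acts */
}

/* Absolute CLOCK_REALTIME deadline `usec` microseconds from now. */
static struct timespec deadline_after_usec(long usec)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_nsec += usec * 1000L;
    ts.tv_sec += ts.tv_nsec / 1000000000L;
    ts.tv_nsec %= 1000000000L;
    return ts;
}

static void *poller(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mu);
    pthread_cleanup_push(unlock_cleanup, &mu);
    while (!ready) {
        struct timespec ts = deadline_after_usec(1);
        int rc = pthread_cond_timedwait(&cv, &mu, &ts);
        if (rc == ETIMEDOUT)
            pthread_testcancel();  /* act on a request the timeout left pending */
    }
    pthread_cleanup_pop(1);
    return NULL;
}

/* Starts poller(), cancels it, and returns 1 if it ended by cancellation. */
int cancel_poller(void)
{
    pthread_t t;
    void *res;
    int rc;

    rc = pthread_create(&t, NULL, poller, NULL);
    assert(rc == 0);
    pthread_cancel(t);
    rc = pthread_join(t, &res);
    assert(rc == 0);
    return res == PTHREAD_CANCELED;
}
```

In this particular loop the next pthread_cond_timedwait() call would also act on a request left pending, because the request is then pending on entry; the explicit check matters most when the timeout branch does further work before reaching another cancellation point.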
(0003792)
terekhov (reporter)
2017-06-20 09:59

I kinda like the clarification requested many years ago:

https://standards.ieee.org/findstds/interps/1003-1c-95_int/pasc-1003.1c-10.html
http://people.exeter.ac.uk/DCannon/WG15/mail/wg151111.txt

"A cancellation point will also occur in the following functions if the function causes the thread to block:"

and note that the old response:

"The "intent" is that the functions listed in ISO/IEC 9945-1: 1996 following line 56 in 18.1.2 are allowed to be cancellation points, just in case they are implemented using other routines specified to be cancellation points. Were it not for this language, these routines cannot use routines which are cancellation points in their implementation because the standard says that no POSIX routines other than those specified are cancellation points."

does not take into account the use of pthread_setcancelstate(), as an implementation detail, to disable and afterwards restore the cancel state.
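The implementation technique mentioned here, temporarily disabling cancelability around an internal call to a cancellation point, can be sketched as follows. quiet_pause() is a hypothetical routine assumed not to be a cancellation point even though it is implemented with nanosleep(), which is one; worker() and run_worker() are likewise illustrative. A request made before the call stays pending across it and is acted upon at the next cancellation point, here an explicit pthread_testcancel().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical library routine that must NOT be a cancellation point,
   although it is implemented with nanosleep(), which is one. */
void quiet_pause(long nsec)
{
    int oldstate, ignored;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = nsec };

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
    nanosleep(&ts, NULL);          /* cannot act on a request while disabled */
    pthread_setcancelstate(oldstate, &ignored);  /* restore; not a cancellation point */
}

static int pause_returned = 0;

static void *worker(void *arg)
{
    (void)arg;
    pthread_cancel(pthread_self());  /* a deferred request is now pending */
    quiet_pause(1000000L);           /* 1 ms; the request stays pending */
    pause_returned = 1;
    pthread_testcancel();            /* the request is acted upon here */
    return NULL;
}

/* Returns 1 if the worker ended by cancellation. */
int run_worker(void)
{
    pthread_t t;
    void *res;
    int rc;

    rc = pthread_create(&t, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_join(t, &res);
    assert(rc == 0);
    return res == PTHREAD_CANCELED;
}
```

If the internal call could block for a long time, disabling cancelability this way delays the cancellation for just as long, which is the risk of hanging threads raised in Note: 0003793.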
(0003793)
dstaesse (reporter)
2017-06-20 10:31

That's an interesting reference. I think the phrase

"A cancellation point will also occur in the following functions if the function causes the thread to block:"

is not correct when applied to the current specification. It was correct when there was a clause that cancellation would only occur when the cancellation point blocks indefinitely.

So indeed, a statement that these functions may cancel the thread if they are implemented using a cancellation point from the "shall occur" list is a very useful clarification to add. If they are implemented using a cancellation point, disabling the cancellation state might lead to hanging threads, so I wouldn't support removing them as possible cancellation points.

It looks to me that, at some point, there was a choice to prioritise cancellation (and actually enforce it) over returning and continuing the thread execution. (Is there an easy way to go through the revision history to pinpoint that change?)
(0003794)
terekhov (reporter)
2017-06-20 12:44

But "blocks indefinitely" was not meant to describe deadlocks/hanging threads as in waiting infinite time unless a timeout or cancel occurs. In fact, classical deadlock on mutexes is not covered by cancel at all. Think of cancel as just another event that unblocks a thread blocked "indefinitely". IOW read it as just blocked for unspecified period of time (timeouts/wakeup events aside for a moment) but not really infinite/hanging. :)
(0003795)
dstaesse (reporter)
2017-06-20 13:11

I agree. I assume sound code devoid of deadlocks, livelocks or any other problem with the implemented synchronisation logic.

I take it that "block indefinitely" means that the point in the future at which the function will unblock is undefined. By the old definition quoted above, a timeout would probably not be considered as "block indefinitely", but a read() on a UDP socket would.

I don't think cancellation should be taken into that equation. Thread cancellation is the mechanism that prevents the thread from blocking beyond a point that is deemed acceptable from the viewpoint of the program. The cancel is an asynchronous event that signals the thread to terminate at some point in the (preferably very near) future. The execution of the next cancellation point is the point at which this cancellation will take place. At least according to the base specification issue 6.

Moving from a situation where a thread "may" cancel at a cancellation point to a situation where a thread "shall" cancel (which has apparently happened) is quite easy from the viewpoint of an existing codebase. Nothing has to be changed; the result is some dead code that will never be called: the check of the return value and the explicit cancellation check in the form of a pthread_testcancel().

The proposed solution to allow "shall fail" cases to skip the cancellation check seems to be in a direction to revert that rationale, which would be a much more painful move for existing code that expects the thread to cancel.
(0004158)
geoffclare (manager)
2018-11-02 10:01
edited on: 2018-11-02 10:18

Re: Note: 0003783. ERN 207 was intended simply as a clarification as to whether a timeout counts as part of "the event for which it is waiting". If we revert the change, the standard will go back to being unclear on this point. However, I do think the change that was made was not quite right, as the "if the thread is suspended at a cancellation point" condition should have applied to the timeout case as well.

The point of this part of the standard is that once a function that is a cancellation point has become suspended, then a cancellation request interrupts the suspension and terminates the thread. The function does not perform any further processing as it does not return from suspension, and thus it can assume that once it suspends, cancellation is taken care of and it doesn't need to do any further checking for cancellation. If a cancellation request is made simultaneously with another event that would end the suspension, there is a race and it is unspecified who wins.

So I think the only point that needs addressing is the one that "A cancellation check must happen even if the cancellation point times out immediately", which is the case ERN 207 got wrong. I think changing to the following would fix that:
It is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution if the thread is suspended at a cancellation point and either:
  • The event for which it is waiting occurs
  • A specified timeout expires
before the cancellation request is acted upon.
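The behaviour described in this note, where a cancellation request interrupts a suspended cancellation point so that the function never returns, can be observed with pthread_cond_wait(). A minimal sketch; waiter(), unlock_on_cancel() and cancel_suspended_waiter() are hypothetical names. The waiter's predicate is never set, so the only way out of the wait is cancellation: the cleanup handler runs with the mutex re-acquired, and the code after the wait loop is never reached. The 10 ms delay only makes it likely that the waiter is already suspended; the outcome is the same if the request arrives before it suspends.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int event_happened = 0;  /* never set: only cancellation ends the wait */
static int cleanup_ran = 0;
static int wait_returned = 0;

/* Cleanup handler: pthread_cond_wait() re-acquires the mutex before
   cancellation cleanup handlers run, so it is released here. */
static void unlock_on_cancel(void *m)
{
    cleanup_ran = 1;
    pthread_mutex_unlock(m);
}

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mu);
    pthread_cleanup_push(unlock_on_cancel, &mu);
    while (!event_happened)
        pthread_cond_wait(&cv, &mu);
    wait_returned = 1;          /* not reached in this sketch */
    pthread_cleanup_pop(1);
    return NULL;
}

/* Starts waiter(), cancels it, and returns 1 if it ended by cancellation. */
int cancel_suspended_waiter(void)
{
    pthread_t t;
    void *res;
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 10000000L };  /* 10 ms */
    int rc;

    rc = pthread_create(&t, NULL, waiter, NULL);
    assert(rc == 0);
    nanosleep(&delay, NULL);    /* the waiter is most likely suspended by now */
    pthread_cancel(t);
    rc = pthread_join(t, &res);
    assert(rc == 0);
    return res == PTHREAD_CANCELED;
}
```

Had the main thread also set the predicate and signalled the condition variable at the same moment, either outcome would be possible, which is the race the proposed wording leaves unspecified.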


(0004159)
geoffclare (manager)
2018-11-08 17:01

Interpretation response
------------------------
The standard states the requirements for thread cancellation when a timeout occurs, and conforming implementations must conform to this. However, concerns have been raised about this which are being referred to the sponsor.

Rationale:
-------------
The change made as a result of ERN 207 in https://www.opengroup.org/austin/aardvark/latest/xshbug2.txt was intended simply as a clarification as to whether a timeout counts as part of "the event for which it is waiting". However, the change that was made was not quite right, as the "if the thread is suspended at a cancellation point" condition should have applied to the timeout case as well.

The point of this part of the standard is that once a function that is a cancellation point has become suspended, then a cancellation request interrupts the suspension and terminates the thread. The function does not perform any further processing as it does not return from suspension, and thus it can assume that once it suspends, cancellation is taken care of and it doesn't need to do any further checking for cancellation. If a cancellation request is made simultaneously with another event that would end the suspension, there is a race and it is unspecified who wins.

Notes to the Editor (not part of this interpretation):
-------------------------------------------------------
Change:
It is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution if:
  • The thread is suspended at a cancellation point and the event for which it is waiting occurs
  • A specified timeout expired
before the cancellation request is acted upon.
to:
It is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution if the thread is suspended at a cancellation point and either:
  • The event for which it is waiting occurs
  • A specified timeout expires
before the cancellation request is acted upon.
(0004167)
ajosey (manager)
2018-11-12 19:50

Interpretation proposed: 12 November 2018
(0004188)
agadmin (administrator)
2018-12-14 15:03

Interpretation approved: 14 December 2018

- Issue History
Date Modified Username Field Change
2017-06-14 13:59 dstaesse New Issue
2017-06-14 13:59 dstaesse Name => Dimitri Staessens
2017-06-14 13:59 dstaesse Section => 2.9.5
2017-06-14 13:59 dstaesse Page Number => 520
2017-06-14 13:59 dstaesse Line Number => 18219-18223
2017-06-14 14:25 terekhov Note Added: 0003760
2017-06-15 18:49 dstaesse Note Added: 0003765
2017-06-17 12:17 dstaesse Note Added: 0003783
2017-06-19 09:04 terekhov Note Added: 0003784
2017-06-20 03:50 dstaesse Note Added: 0003790
2017-06-20 09:59 terekhov Note Added: 0003792
2017-06-20 10:31 dstaesse Note Added: 0003793
2017-06-20 12:44 terekhov Note Added: 0003794
2017-06-20 13:11 dstaesse Note Added: 0003795
2018-11-02 10:01 geoffclare Note Added: 0004158
2018-11-02 10:18 geoffclare Note Edited: 0004158
2018-11-08 17:01 geoffclare Note Added: 0004159
2018-11-08 17:02 geoffclare Interp Status => Pending
2018-11-08 17:02 geoffclare Final Accepted Text => Note: 0004159
2018-11-08 17:02 geoffclare Status New => Interpretation Required
2018-11-08 17:02 geoffclare Resolution Open => Accepted As Marked
2018-11-08 17:02 geoffclare Tag Attached: tc3-2008
2018-11-12 19:50 ajosey Interp Status Pending => Proposed
2018-11-12 19:50 ajosey Note Added: 0004167
2018-12-14 15:03 agadmin Interp Status Proposed => Approved
2018-12-14 15:03 agadmin Note Added: 0004188
2019-11-05 12:04 geoffclare Status Interpretation Required => Applied

