Austin Group Defect Tracker

ID 0001220
Category [1003.1(2016/18)/Issue7+TC2] System Interfaces
Severity Editorial
Type Omission
Date Submitted 2018-12-20 13:46
Last Update 2021-05-07 15:37
Reporter bhaible
View Status public
Assigned To
Priority normal
Resolution Accepted As Marked
Status Applied
Name Bruno Haible
Organization GNU
User Reference
Section ---
Page Number ---
Line Number ---
Interp Status ---
Final Accepted Text Note: 0005341
Summary 0001220: Add an API to query the name of a locale category of a locale object
Description The purpose of the locale API is to let information flow within an application; it's not primarily an interface to the kernel. The information producer is the code that sets the locale, either implicitly at program start, or through setlocale, or per-thread through uselocale. The information consumer is the code that produces different data depending on the locale. There are information consumers defined by this standard (such as mbrtowc, which uses the LC_CTYPE category of the current locale, or fprintf, which prints floating-point numbers using the LC_NUMERIC category of the current locale), and there are information consumers defined by the application. So far, information consumers defined by the application can only query the locale set at program start or through setlocale, by calling setlocale(category,NULL). Consumers that need to query a per-thread locale set through uselocale cannot be implemented in a portable way, because no such API is defined by this standard. Instead, different APIs are provided by different libc vendors.

The need for such an API is best proved by looking at the locale_t object of OpenBSD 6.2. It contains only the minimal information needed to fulfil this standard. A locale_t object for "en_US.UTF-8" and a locale_t object for "de_DE.UTF-8" are indistinguishable; they are in fact the same pointer value. In such a system, an application has no way to produce locale-dependent data (such as a different decimal separator (LC_NUMERIC category) or different translations of a string (LC_MESSAGES category)). This proves that the API provided by this standard so far is insufficient for information consumers defined by the application.

I propose to add an API as shown below.

Rationale for this API:

Different systems have different APIs.

The API proposed here is the one considered for Solaris 12. See https://lists.gnu.org/archive/html/bug-gnulib/2015-01/msg00078.html. Solaris 11.4 in fact has __getlocalename_l.

FreeBSD and macOS have querylocale(). The problem with this API is that its first argument is a mask; the application therefore first has to convert the category to a category_mask.

GNU libc, Cygwin, and musl libc have an API that is based on <langinfo.h>: nl_langinfo_l (NL_LOCALE_NAME (category), locobj).
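
As a rough illustration of the differences listed above, the following sketch shows the category-to-mask conversion that a querylocale()-style API requires, and wraps the <langinfo.h>-based variant behind one helper. The names category_to_mask() and locale_category_name() are hypothetical and not part of any libc. The sketch assumes a libc that defines NL_LOCALE_NAME when _GNU_SOURCE is set; elsewhere the helper simply returns a null pointer.

```c
#define _GNU_SOURCE /* assumption: exposes NL_LOCALE_NAME on libcs that have it */
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: map one LC_* category to its LC_*_MASK bit.
   This is the extra step a mask-based API such as querylocale() needs. */
static int category_to_mask(int category)
{
    switch (category) {
    case LC_COLLATE:  return LC_COLLATE_MASK;
    case LC_CTYPE:    return LC_CTYPE_MASK;
    case LC_MESSAGES: return LC_MESSAGES_MASK;
    case LC_MONETARY: return LC_MONETARY_MASK;
    case LC_NUMERIC:  return LC_NUMERIC_MASK;
    case LC_TIME:     return LC_TIME_MASK;
    default:          return 0; /* LC_ALL or an unknown category */
    }
}

/* Hypothetical helper: name of one category of a locale object.
   locobj must be a real locale object, not LC_GLOBAL_LOCALE.
   Returns NULL for unsupported categories or when the libc offers
   no NL_LOCALE_NAME. */
static const char *locale_category_name(int category, locale_t locobj)
{
    assert(locobj != LC_GLOBAL_LOCALE);
    if (category_to_mask(category) == 0)
        return NULL;
#ifdef NL_LOCALE_NAME
    return nl_langinfo_l(NL_LOCALE_NAME(category), locobj);
#else
    (void)locobj;
    return NULL;
#endif
}
```

A caller would create a locale object with newlocale(), pass it to locale_category_name() together with a single category such as LC_NUMERIC, and release the object with freelocale() once the returned string is no longer needed.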

Desired Action
NAME
     getlocalename_l - get name of locale object

SYNOPSIS
     #include <locale.h>
     const char * getlocalename_l (int category, locale_t locobj);

DESCRIPTION
     The getlocalename_l() function shall return the name of the locale locobj for the given category.

     The category argument must specify the category. The possible values include LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME, with the meaning defined in [reference to setlocale() description]. Otherwise, the behavior is undefined.

     The locobj argument must be a valid locale object handle and must not be equal to LC_GLOBAL_LOCALE. Otherwise, the behavior is undefined.

RETURN VALUE
     The getlocalename_l() function shall return the string associated with the specified category for the given locale. This string's lifetime ends when freelocale(locobj) gets invoked.

ERRORS
    No errors are defined.

EXAMPLES
    Determining the name of a category of the current per-thread locale

    The following example shows how to extract the name of the LC_NUMERIC category of the current per-thread locale.

    #include <locale.h>
    ...
    const char *name;
    locale_t loc = uselocale (NULL);
    if (loc == LC_GLOBAL_LOCALE)
      name = setlocale (LC_NUMERIC, NULL);
    else
      name = getlocalename_l (LC_NUMERIC, loc);

APPLICATION USAGE
    The special locale object handle LC_GLOBAL_LOCALE must not be passed for the locobj argument, even when returned by the uselocale() function.

SEE ALSO
    newlocale, freelocale, setlocale
    XBD Locale, <locale.h>
Tags issue8
-  Notes
(0005026)
geoffclare (manager)
2020-10-05 11:11
edited on: 2020-10-05 11:13

Suggested changes to go into The Open Group company review...

(The main differences from the desired action are the addition of the <locale.h> change, avoidance of the word "must", and requiring that a null pointer is returned when category is invalid, instead of the behaviour being undefined.)

On page 286 line 9687 section <locale.h>, add:
[CX]const char *getlocalename_l(int, locale_t);[/CX]

On page 1050 insert a new getlocalename_l page:

NAME
getlocalename_l -- get a locale name from a locale object

SYNOPSIS
[CX]#include <locale.h>

const char * getlocalename_l(int category, locale_t locobj);[/CX]

DESCRIPTION
The getlocalename_l() function shall return the locale name for the given locale category of the locale object locobj.

The category argument specifies the locale category to be queried. If the value is LC_ALL or is not a supported locale category value (see [xref to setlocale()]), getlocalename_l() shall fail.

The behavior is undefined if the locobj argument is the special locale object LC_GLOBAL_LOCALE or is not a valid locale object handle.

RETURN VALUE
Upon successful completion, getlocalename_l() shall return a pointer to a string containing the locale name; otherwise, a null pointer shall be returned. The returned string shall remain valid until the locale object locobj is used in a call to freelocale() or as the base argument in a successful call to newlocale().

ERRORS
No errors are defined.

EXAMPLES
Determining the locale name for a category of the current locale

The following example shows how to extract the locale name for the LC_NUMERIC category of the current thread-local locale, or of the global locale if no thread-local locale is in use.
#include <locale.h>
...
const char *name;
locale_t loc = uselocale(NULL);
if (loc == LC_GLOBAL_LOCALE)
    name = setlocale(LC_NUMERIC, NULL);
else
    name = getlocalename_l(LC_NUMERIC, loc);

APPLICATION USAGE
Applications need to ensure that they do not pass the special locale object handle LC_GLOBAL_LOCALE as the locobj argument, even when returned by the uselocale() function.

RATIONALE
None.

FUTURE DIRECTIONS
None.

SEE ALSO
freelocale(), newlocale(), setlocale(), uselocale()

XBD Chapter 7 (on page XXX), <locale.h>

CHANGE HISTORY
First released in Issue 8.

Add getlocalename_l() to the SEE ALSO section for each page listed in the getlocalename_l() SEE ALSO above.

On page 3791 line 130104 section E.1, add getlocalename_l() to the POSIX_MULTI_CONCURRENT_LOCALES subprofile group.

(0005027)
bhaible (reporter)
2020-10-05 15:45

Alas, there is an issue with the example program:
It should be possible to implement multithread-safe information consumers. However, setlocale is not multithread-safe: "The setlocale() function need not be thread-safe."

I see two ways to fix this:

(A) Specify that setlocale(category,NULL) is multithread-safe. That is, if thread1 executes setlocale(category1,NULL) and thread2 executes setlocale(category2,NULL), these two calls will not interfere with each other. Currently this is known to be true (for category == LC_ALL) on GNU libc, HP-UX, IRIX, Solaris, Microsoft Windows, and is known to be false (again, for category == LC_ALL) on musl libc, macOS, FreeBSD, NetBSD, OpenBSD, AIX, Haiku, Cygwin.

(B) Specify that getlocalename_l(category,LC_GLOBAL_LOCALE) returns the same result as setlocale(category,NULL) and that getlocalename_l is multithread-safe.

What is preferred, (A) or (B)?
(0005030)
shware_systems (reporter)
2020-10-05 17:57
edited on: 2020-10-05 17:58

I would consider Option (A) an unwarranted CX extension to setlocale(), for reasons related to preemptive thread scheduling. I won't belabor why; it is just a 'no' vote, imho. I feel Option (B) has more to commend it: it is POSIX-specific and gives LC_GLOBAL_LOCALE a use rather than leaving its use undefined.

(0005035)
geoffclare (manager)
2020-10-07 13:29

Alternative changes that solve the thread-safety problem by having getlocalename_l() handle LC_GLOBAL_LOCALE ...

On page 286 line 9687 section <locale.h>, add:
[CX]const char *getlocalename_l(int, locale_t);[/CX]

On page 1050 insert a new getlocalename_l page:

NAME
getlocalename_l - get a locale name from a locale object

SYNOPSIS
[CX]#include <locale.h>

const char * getlocalename_l(int category, locale_t locobj);[/CX]

DESCRIPTION
The getlocalename_l() function shall return the locale name for the given locale category of the locale object locobj, or of the global locale if locobj is the special locale object LC_GLOBAL_LOCALE.

The category argument specifies the locale category to be queried. If the value is LC_ALL or is not a supported locale category value (see [xref to setlocale()]), getlocalename_l() shall fail.

The behavior is undefined if the locobj argument is neither the special locale object LC_GLOBAL_LOCALE nor a valid locale object handle.

RETURN VALUE
Upon successful completion, getlocalename_l() shall return a pointer to a string containing the locale name; otherwise, a null pointer shall be returned.

If locobj is LC_GLOBAL_LOCALE, the returned string pointer might be invalidated or the string content might be overwritten by a subsequent call in the same thread to getlocalename_l() with LC_GLOBAL_LOCALE; the returned string pointer might also be invalidated if the calling thread is terminated. Otherwise, the returned string pointer and content shall remain valid until the locale object locobj is used in a call to freelocale() or as the base argument in a successful call to newlocale().

ERRORS
No errors are defined.

EXAMPLES
Determining the locale name for a category of the current locale

The following example shows how to obtain the locale name for the LC_NUMERIC category of the current thread-local locale, or of the global locale if no thread-local locale is in use.
#include <locale.h>
...
const char *name;
locale_t loc = uselocale(NULL);
name = getlocalename_l(LC_NUMERIC, loc);

APPLICATION USAGE
None.

RATIONALE
Historical versions of getlocalename_l() did not handle the special locale object LC_GLOBAL_LOCALE, requiring that applications used setlocale(category, NULL) to query the global locale if uselocale(NULL) returned LC_GLOBAL_LOCALE. However, since setlocale() is not required to be thread-safe (even when the only concurrent calls are ones that query the locale), this method was problematic for multi-threaded processes. This standard requires that getlocalename_l(category, LC_GLOBAL_LOCALE) queries the global locale in a thread-safe manner, for example by returning a pointer to a thread-local internal buffer instead of a process-wide internal buffer.

FUTURE DIRECTIONS
None.

SEE ALSO
freelocale(), newlocale(), setlocale(), uselocale()

XBD Chapter 7 (on page XXX), <locale.h>

CHANGE HISTORY
First released in Issue 8.

Add getlocalename_l() to the SEE ALSO section for each page listed in the getlocalename_l() SEE ALSO above.

On page 3791 line 130104 section E.1, add getlocalename_l() to the POSIX_MULTI_CONCURRENT_LOCALES subprofile group.
(0005037)
shware_systems (reporter)
2020-10-07 15:08

Looks decent, but I think LC_ALL shouldn't be precluded; I'd rather see it reflect the value of LC_ALL or LANG referenced in the environment when newlocale() was called to create locobj, or inherited via duplocale(), and it may return an empty string, or "POSIX", if neither is set.
(0005059)
geoffclare (manager)
2020-10-23 14:21

The getlocalename_l() addition has been made in the Issue8NewAPIs branch in gitlab, based on Note: 0005035.
(0005062)
shware_systems (reporter)
2020-10-23 15:57

I feel Note:5059 is premature, in that Note: 5037 is still open for discussion on the Etherpad. I thought we were getting back to that after the current bug discussion was concluded. There is at least one implementation that already has the internal support for it, as well, and therefore exposing that support as suggested is trivial.
(0005063)
geoffclare (manager)
2020-10-26 10:03

Re note 5062, note 5037 was discussed in the 8th October teleconference and the decision we made was that we would not modify Note: 0005035 to add support for LC_ALL. In the 12th October teleconference we moved on to the backlog of other bugs because we had finished looking at the bugs relating to new APIs sponsored by The Open Group.

To add support for LC_ALL we would need to specify how multiple locale names would be returned, and this would be complicated by the existence of additional non-standard categories. If an application wants to know the locale names for all of the categories, it can simply query them one at a time.
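
Querying one category at a time could look roughly like the sketch below. Because getlocalename_l() may not yet be available in a given C library, the sketch assumes a single-threaded program and uses setlocale(category, NULL) on the global locale as a stand-in. The names collect_category_names(), standard_categories and LOCALE_NAME_BUF, as well as the 64-byte bound, are hypothetical choices for illustration. Each name is copied because a later setlocale() call may overwrite the returned string.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

enum { LOCALE_NAME_BUF = 64 }; /* assumed upper bound on a locale name */

/* The six categories named by the standard; implementations may add more. */
static const int standard_categories[] = {
    LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME
};

enum {
    CATEGORY_COUNT = sizeof standard_categories / sizeof standard_categories[0]
};

/* Copy the global locale's name for each standard category into names[].
   Returns 0 on success, -1 if a query fails or a name does not fit. */
static int collect_category_names(char names[][LOCALE_NAME_BUF])
{
    assert(names != NULL);
    for (int i = 0; i < CATEGORY_COUNT; i++) {
        /* Query only: a null locale argument does not change the locale. */
        const char *name = setlocale(standard_categories[i], NULL);
        if (name == NULL || strlen(name) >= LOCALE_NAME_BUF)
            return -1;
        strcpy(names[i], name); /* length was checked above */
    }
    return 0;
}
```

With getlocalename_l() available, the setlocale() call in the loop would be replaced by getlocalename_l(standard_categories[i], locobj), and the single-threaded assumption would no longer be needed.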
(0005065)
shware_systems (reporter)
2020-10-26 14:50

Nothing in Note: 5037 implies that multiple locale names are to be returned, mimicking setlocale(). The LC_ALL and LANG environment values are each just a single name. The idea that this was the intent was someone's invention during that call, and I agree it would be nonsensical to do it. We've even argued at other times that it's nonsensical for setlocale() to require it to begin with.

The primary reason this makes sense is it simplifies library routines that modify aspects of a locale but then are expected to reset them to the original LC_ALL value. Without a means to query the object for this name a separate parameter to the function is required to specify that value, since the function has no way of knowing a previous name returned after a change is the same as that LC_ALL name, nor that LC_ALL in the environment hasn't been modified.
(0005067)
geoffclare (manager)
2020-10-26 16:16

Re note 5065, environment variables are not always used (and if they are used, then they can simply be queried with getenv()). If an application calls:
setlocale(LC_NUMERIC, locale1);
setlocale(LC_TIME, locale2);
then getlocalename_l(LC_ALL, LC_GLOBAL_LOCALE) would have to return, somehow, the information that the locale name for LC_NUMERIC is locale1, for LC_TIME is locale2, and for all other categories is "C" or "POSIX".

With locale objects, in order to modify a locale and restore later there is no need to query locale names. It can be done using the existing locale_t handling functions (i.e. duplocale(), newlocale(), uselocale(), and freelocale()).
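
A minimal sketch of that modify-and-restore pattern, assuming the goal is to force LC_NUMERIC to "C" for the calling thread while some work runs, and assuming a POSIX.1-2008 <locale.h>. The names with_c_numeric(), format_value() and struct format_request are hypothetical and used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: run work(arg) with LC_NUMERIC temporarily set to
   "C" for the calling thread, then restore the previous locale.
   Returns 0 on success, -1 if a locale object could not be created. */
static int with_c_numeric(void (*work)(void *), void *arg)
{
    assert(work != NULL);

    /* Query the thread's current locale; may be LC_GLOBAL_LOCALE. */
    locale_t previous = uselocale((locale_t)0);

    /* Work on a copy so the caller's locale object stays untouched. */
    locale_t copy = duplocale(previous);
    if (copy == (locale_t)0)
        return -1;

    /* On success newlocale() takes over copy; on failure copy is ours. */
    locale_t modified = newlocale(LC_NUMERIC_MASK, "C", copy);
    if (modified == (locale_t)0) {
        freelocale(copy);
        return -1;
    }

    uselocale(modified);
    work(arg);
    uselocale(previous);  /* restore without querying any locale name */
    freelocale(modified);
    return 0;
}

/* Hypothetical work item: format a number using the current locale. */
struct format_request {
    double value;
    char text[32];
};

static void format_value(void *arg)
{
    struct format_request *req = arg;
    snprintf(req->text, sizeof req->text, "%.1f", req->value);
}
```

Calling with_c_numeric(format_value, &req) formats req.value with the "C" decimal point regardless of the thread's previous LC_NUMERIC setting, and the thread's locale is the same before and after the call.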
(0005085)
shware_systems (reporter)
2020-10-29 18:25

Environment variables are reliable only when it is known putenv() or setenv() hasn't modified the value. 3rd party libraries cannot make this assumption.

If an application calls
setlocale(LC_ALL, all_locale),
getlocalename_l(LC_ALL, LC_GLOBAL_LOCALE)
will be expected to return the value of all_locale, not multiple values. Without such a call, the application requirement that they start with the effect of setlocale(LC_ALL, "POSIX") means that gets returned, again a single value. Returning every name would be the province of a separate interface that returns a char** value, nominally, not char*, to avoid having to parse the return like is required with setlocale().

A sequence such as savobj=duplocale(oldobj); newlocale(,,oldobj); oldobj=savobj; simply uses the current values, which is frequently desirable, but doesn't guarantee that this is the relevant LC_ALL value if newlocale(LC_ALL_MASK, all_locale, 0) was what created oldobj to begin with. A library can't assume
oldobj=newlocale(e.g. LC_CTYPE_MASK,value, oldobj)
hasn't been called before it sees it or duplocale() was called. So no, it can NOT be done reliably using just the existing interfaces.
(0005086)
geoffclare (manager)
2020-10-30 09:23

Re: Note: 0005085 You misunderstand what setlocale() does with LC_ALL. It sets all of the categories, but individually; it does not remember an "LC_ALL locale".
(0005088)
shware_systems (reporter)
2020-10-30 18:23

No, I understand some are silly enough to implement it that way because the standard allows it and this simplifies the code a tiny bit, but also frequently wastes significant space on redundant string storage when a system has many processes active. Having all categories reference a single string value after a setlocale(LC_ALL) isn't precluded, however.

You are correct: for these implementations as they stand, this affects what getlocalename_l(LC_ALL, LC_GLOBAL_LOCALE) can return. The only reliable value is the default "POSIX" required at process startup. I do not see it as onerous to add a CX requirement to setlocale() that, when LC_ALL is used, the value be saved for use by this interface. The same goes for newlocale() when base is 0, with the locale_t data maintaining a reference. This does not force them to change code to make use of it for accessing locale data, but enables this feature to be portable. Other possibilities are ENOTSUP as a "may fail" error, or a sysconf() "is this reliable" check, so that adding it is robust.
(0005091)
geoffclare (manager)
2020-10-31 09:29

Re: Note: 0005088 I give up. Correcting your misconceptions is taking up too much of my time.

Unless anyone else feels like taking up the reins, I suggest that we ignore any further comments "shware_systems" makes about LC_ALL in this bug.
(0005092)
shware_systems (reporter)
2020-10-31 15:13

Re: 5091
It looks like you're fixated on the idea that there's only one way to do locale processing, but there are more. That you aren't willing to even entertain such possibilities means it is your commentary that is invalid and should be ignored, except as noted.
(0005341)
geoffclare (manager)
2021-04-29 15:48

Make the changes from "Additional APIs for Issue 8, Part 1" (Austin/1110).

- Issue History
Date Modified Username Field Change
2018-12-20 13:46 bhaible New Issue
2018-12-20 13:46 bhaible Name => Bruno Haible
2018-12-20 13:46 bhaible Organization => GNU
2018-12-20 13:46 bhaible Section => ---
2018-12-20 13:46 bhaible Page Number => ---
2018-12-20 13:46 bhaible Line Number => ---
2020-10-05 11:11 geoffclare Note Added: 0005026
2020-10-05 11:13 geoffclare Note Edited: 0005026
2020-10-05 11:13 geoffclare Note Edited: 0005026
2020-10-05 15:45 bhaible Note Added: 0005027
2020-10-05 17:57 shware_systems Note Added: 0005030
2020-10-05 17:58 shware_systems Note Edited: 0005030
2020-10-07 13:29 geoffclare Note Added: 0005035
2020-10-07 15:08 shware_systems Note Added: 0005037
2020-10-23 14:21 geoffclare Note Added: 0005059
2020-10-23 15:57 shware_systems Note Added: 0005062
2020-10-26 10:03 geoffclare Note Added: 0005063
2020-10-26 14:50 shware_systems Note Added: 0005065
2020-10-26 16:16 geoffclare Note Added: 0005067
2020-10-29 18:25 shware_systems Note Added: 0005085
2020-10-30 09:23 geoffclare Note Added: 0005086
2020-10-30 18:23 shware_systems Note Added: 0005088
2020-10-31 09:29 geoffclare Note Added: 0005091
2020-10-31 15:13 shware_systems Note Added: 0005092
2021-04-29 15:48 geoffclare Note Added: 0005341
2021-04-29 15:49 geoffclare Interp Status => ---
2021-04-29 15:49 geoffclare Final Accepted Text => Note: 0005341
2021-04-29 15:49 geoffclare Status New => Resolved
2021-04-29 15:49 geoffclare Resolution Open => Accepted As Marked
2021-04-29 15:50 geoffclare Tag Attached: issue8
2021-05-07 15:37 geoffclare Status Resolved => Applied

