Austin Group Defect Tracker

ID Category Severity Type Date Submitted Last Update
0001220 [1003.1(2016/18)/Issue7+TC2] System Interfaces Editorial Omission 2018-12-20 13:46 2020-10-26 16:16
Reporter bhaible View Status public  
Assigned To
Priority normal Resolution Open  
Status New  
Name Bruno Haible
Organization GNU
User Reference
Section ---
Page Number ---
Line Number ---
Interp Status ---
Final Accepted Text
Summary 0001220: Add an API to query the name of a locale category of a locale object
Description The purpose of the locale API is to let information flow within an application; it is not primarily an interface to the kernel. The information producer is the code that sets the locale, either implicitly at program start, or through setlocale, or per-thread through uselocale. The information consumer is the code that produces different data depending on the locale. There are information consumers defined by this standard (such as mbrtowc, which uses the LC_CTYPE category of the current locale, or fprintf, which prints floating-point numbers using the LC_NUMERIC category of the current locale), and there are information consumers defined by the application. So far, information consumers defined by the application can only query the locale set at program start or through setlocale, by calling setlocale(category,NULL). They cannot be implemented portably for a locale set per-thread through uselocale, because this standard defines no API for querying the name of a locale category of a locale object. Instead, different libc vendors provide different APIs.

The need for such an API is best proved by looking at the locale_t object of OpenBSD 6.2. It contains only the minimal information needed to fulfil this standard. A locale_t object for "en_US.UTF-8" and a locale_t object for "de_DE.UTF-8" are indistinguishable; they are in fact the same pointer value. On such a system, an application has no way to produce locale-dependent data (such as a different decimal separator (LC_NUMERIC category) or different translations of a string (LC_MESSAGES category)). This proves that the API provided by this standard so far is insufficient for information consumers defined by the application.

I propose to add an API as shown below.

Rationale for this API:

Different systems have different APIs.

The API proposed here is the one considered for Solaris 12. See https://lists.gnu.org/archive/html/bug-gnulib/2015-01/msg00078.html . Solaris 11.4 in fact has __getlocalename_l.

FreeBSD and macOS have querylocale(). The problem with this API is that its first argument is a mask; the application therefore first has to convert the category to a category_mask.

GNU libc, Cygwin, and musl libc have an API that is based on <langinfo.h>: nl_langinfo_l (NL_LOCALE_NAME (category), locobj).
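
To make the divergence concrete, here is a minimal sketch of the kind of wrapper an application currently has to write. The function name locale_category_name is hypothetical and not part of any libc; the sketch assumes that NL_LOCALE_NAME becomes visible from <langinfo.h> once _GNU_SOURCE is defined, and that querylocale() and the LC_*_MASK values are declared in <xlocale.h> on FreeBSD and macOS. It does not handle LC_GLOBAL_LOCALE, and on any other system it simply gives up.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1           /* assumed to expose NL_LOCALE_NAME */
#endif
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#if defined(__APPLE__) || defined(__FreeBSD__)
#include <xlocale.h>            /* querylocale() and the LC_*_MASK values */
#endif

/* Hypothetical wrapper, not part of any libc: returns the name of one
   category of a locale object, or NULL if no known vendor API applies. */
const char *locale_category_name(int category, locale_t loc)
{
    /* This sketch only deals with real locale objects. */
    assert(loc != (locale_t)0 && loc != LC_GLOBAL_LOCALE);
#if defined(NL_LOCALE_NAME)
    /* <langinfo.h>-based API: the category is encoded in the item. */
    return nl_langinfo_l(NL_LOCALE_NAME(category), loc);
#elif defined(__APPLE__) || defined(__FreeBSD__)
    /* querylocale() takes a mask, so the category is converted first. */
    int mask;
    switch (category) {
    case LC_COLLATE:  mask = LC_COLLATE_MASK;  break;
    case LC_CTYPE:    mask = LC_CTYPE_MASK;    break;
    case LC_MESSAGES: mask = LC_MESSAGES_MASK; break;
    case LC_MONETARY: mask = LC_MONETARY_MASK; break;
    case LC_NUMERIC:  mask = LC_NUMERIC_MASK;  break;
    case LC_TIME:     mask = LC_TIME_MASK;     break;
    default:          return NULL;
    }
    return querylocale(mask, loc);
#else
    (void)category;
    return NULL;                /* no portable way to answer */
#endif
}
```

The switch in the querylocale() branch is the category-to-mask conversion mentioned above; a function that takes the category directly makes it unnecessary.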

Desired Action NAME
     getlocalename_l - get name of locale object

SYNOPSIS
     #include <locale.h>
     const char * getlocalename_l (int category, locale_t locobj);

DESCRIPTION
     The getlocalename_l() function shall return the name of the locale locobj for the given category.

     The category argument must specify the category. The possible values include LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME, with the meaning defined in [reference to setlocale() description]. Otherwise, the behavior is undefined.

     The locobj argument must be a valid locale object handle and must not be equal to LC_GLOBAL_LOCALE. Otherwise, the behavior is undefined.

RETURN VALUE
     The getlocalename_l() function shall return the string associated with the specified category for the given locale. This string's lifetime ends when freelocale(locobj) gets invoked.

ERRORS
    No errors are defined.

EXAMPLES
    Determining the name of a category of the current per-thread locale

    The following example shows how to extract the name of the LC_NUMERIC category of the current per-thread locale.

    #include <locale.h>
    ...
    const char *name;
    locale_t loc = uselocale (NULL);
    if (loc == LC_GLOBAL_LOCALE)
      name = setlocale (LC_NUMERIC, NULL);
    else
      name = getlocalename_l (LC_NUMERIC, loc);

APPLICATION USAGE
    The special locale object handle LC_GLOBAL_LOCALE must not be passed for the locobj argument, even when returned by the uselocale() function.

SEE ALSO
    newlocale, freelocale, setlocale
    XBD Locale, <locale.h>
Tags No tags attached.
Attached Files

- Relationships

-  Notes
(0005026)
geoffclare (manager)
2020-10-05 11:11
edited on: 2020-10-05 11:13

Suggested changes to go into The Open Group company review...

(The main differences from the desired action are the addition of the <locale.h> change, avoidance of the word "must", and requiring that a null pointer is returned when category is invalid, instead of the behaviour being undefined.)

On page 286 line 9687 section <locale.h>, add:
[CX]const char *getlocalename_l(int, locale_t);[/CX]

On page 1050 insert a new getlocalename_l page:

NAME
getlocalename_l -- get a locale name from a locale object

SYNOPSIS
[CX]#include <locale.h>

const char * getlocalename_l(int category, locale_t locobj);[/CX]

DESCRIPTION
The getlocalename_l() function shall return the locale name for the given locale category of the locale object locobj.

The category argument specifies the locale category to be queried. If the value is LC_ALL or is not a supported locale category value (see [xref to setlocale()]), getlocalename_l() shall fail.

The behavior is undefined if the locobj argument is the special locale object LC_GLOBAL_LOCALE or is not a valid locale object handle.

RETURN VALUE
Upon successful completion, getlocalename_l() shall return a pointer to a string containing the locale name; otherwise, a null pointer shall be returned. The returned string shall remain valid until the locale object locobj is used in a call to freelocale() or as the base argument in a successful call to newlocale().

ERRORS
No errors are defined.

EXAMPLES
Determining the locale name for a category of the current locale

The following example shows how to extract the locale name for the LC_NUMERIC category of the current thread-local locale, or of the global locale if no thread-local locale is in use.
#include <locale.h>
...
const char *name;
locale_t loc = uselocale(NULL);
if (loc == LC_GLOBAL_LOCALE)
    name = setlocale(LC_NUMERIC, NULL);
else
    name = getlocalename_l(LC_NUMERIC, loc);

APPLICATION USAGE
Applications need to ensure that they do not pass the special locale object handle LC_GLOBAL_LOCALE as the locobj argument, even when returned by the uselocale() function.

RATIONALE
None.

FUTURE DIRECTIONS
None.

SEE ALSO
freelocale(), newlocale(), setlocale(), uselocale()

XBD Chapter 7 (on page XXX), <locale.h>

CHANGE HISTORY
First released in Issue 8.

Add getlocalename_l() to the SEE ALSO section for each page listed in the getlocalename_l() SEE ALSO above.

On page 3791 line 130104 section E.1, add getlocalename_l() to the POSIX_MULTI_CONCURRENT_LOCALES subprofile group.

(0005027)
bhaible (reporter)
2020-10-05 15:45

Alas, there is an issue with the example program:
It should be possible to implement multithread-safe information consumers. However, setlocale is not multithread-safe: "The setlocale() function need not be thread-safe."

I see two ways to fix this:

(A) Specify that setlocale(category,NULL) is multithread-safe. That is, if thread1 executes setlocale(category1,NULL) and thread2 executes setlocale(category2,NULL), these two calls will not interfere with each other. Currently this is known to be true (for category == LC_ALL) on GNU libc, HP-UX, IRIX, Solaris, Microsoft Windows, and is known to be false (again, for category == LC_ALL) on musl libc, macOS, FreeBSD, NetBSD, OpenBSD, AIX, Haiku, Cygwin.

(B) Specify that getlocalename_l(category,LC_GLOBAL_LOCALE) returns the same result as setlocale(category,NULL) and that getlocalename_l is multithread-safe.

What is preferred, (A) or (B)?
(0005030)
shware_systems (reporter)
2020-10-05 17:57
edited on: 2020-10-05 17:58

Option (A) I would consider an unwarranted CX extension to setlocale(), for reasons related to preemptive thread scheduling. I won't belabor why; it is just a 'no' vote, imho. I feel Option (B) has more to commend it: it is POSIX-specific and gives LC_GLOBAL_LOCALE a use, rather than leaving its use undefined.

(0005035)
geoffclare (manager)
2020-10-07 13:29

Alternative changes that solve the thread-safety problem by having getlocalename_l() handle LC_GLOBAL_LOCALE ...

On page 286 line 9687 section <locale.h>, add:
[CX]const char *getlocalename_l(int, locale_t);[/CX]

On page 1050 insert a new getlocalename_l page:

NAME
getlocalename_l - get a locale name from a locale object

SYNOPSIS
[CX]#include <locale.h>

const char * getlocalename_l(int category, locale_t locobj);[/CX]

DESCRIPTION
The getlocalename_l() function shall return the locale name for the given locale category of the locale object locobj, or of the global locale if locobj is the special locale object LC_GLOBAL_LOCALE.

The category argument specifies the locale category to be queried. If the value is LC_ALL or is not a supported locale category value (see [xref to setlocale()]), getlocalename_l() shall fail.

The behavior is undefined if the locobj argument is neither the special locale object LC_GLOBAL_LOCALE nor a valid locale object handle.

RETURN VALUE
Upon successful completion, getlocalename_l() shall return a pointer to a string containing the locale name; otherwise, a null pointer shall be returned.

If locobj is LC_GLOBAL_LOCALE, the returned string pointer might be invalidated or the string content might be overwritten by a subsequent call in the same thread to getlocalename_l() with LC_GLOBAL_LOCALE; the returned string pointer might also be invalidated if the calling thread is terminated. Otherwise, the returned string pointer and content shall remain valid until the locale object locobj is used in a call to freelocale() or as the base argument in a successful call to newlocale().

ERRORS
No errors are defined.

EXAMPLES
Determining the locale name for a category of the current locale

The following example shows how to obtain the locale name for the LC_NUMERIC category of the current thread-local locale, or of the global locale if no thread-local locale is in use.
#include <locale.h>
...
const char *name;
locale_t loc = uselocale(NULL);
name = getlocalename_l(LC_NUMERIC, loc);

APPLICATION USAGE
None.

RATIONALE
Historical versions of getlocalename_l() did not handle the special locale object LC_GLOBAL_LOCALE, requiring that applications used setlocale(category, NULL) to query the global locale if uselocale(NULL) returned LC_GLOBAL_LOCALE. However, since setlocale() is not required to be thread-safe (even when the only concurrent calls are ones that query the locale), this method was problematic for multi-threaded processes. This standard requires that getlocalename_l(category, LC_GLOBAL_LOCALE) queries the global locale in a thread-safe manner, for example by returning a pointer to a thread-local internal buffer instead of a process-wide internal buffer.

FUTURE DIRECTIONS
None.

SEE ALSO
freelocale(), newlocale(), setlocale(), uselocale()

XBD Chapter 7 (on page XXX), <locale.h>

CHANGE HISTORY
First released in Issue 8.

Add getlocalename_l() to the SEE ALSO section for each page listed in the getlocalename_l() SEE ALSO above.

On page 3791 line 130104 section E.1, add getlocalename_l() to the POSIX_MULTI_CONCURRENT_LOCALES subprofile group.
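
The lifetime rules in the RETURN VALUE text above mean that a caller who needs the name later has to copy it. This applies after freelocale(), after newlocale() with locobj as the base, or after another LC_GLOBAL_LOCALE query in the same thread. The following is a minimal sketch of such a copy; the helper name copy_locale_name is hypothetical, and it takes the string as a parameter so that it does not depend on getlocalename_l() being available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a locale name into caller-owned storage so
   that the copy outlives the string returned by getlocalename_l().
   Returns 0 on success, or -1 if name is a null pointer (the failure
   return in the proposed text) or does not fit in buf. */
int copy_locale_name(const char *name, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    buf[0] = '\0';              /* leave an empty string on failure */
    if (name == NULL)
        return -1;
    size_t len = strlen(name);
    if (len >= size)
        return -1;              /* would truncate; report failure instead */
    memcpy(buf, name, len + 1); /* include the terminating null byte */
    return 0;
}
```

A caller would use it as copy_locale_name(getlocalename_l(LC_NUMERIC, loc), buf, sizeof buf), immediately after the query and before any call that may invalidate the returned pointer.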
(0005037)
shware_systems (reporter)
2020-10-07 15:08

Looks decent, but I think LC_ALL shouldn't be precluded; I'd rather see it reflect the value of LC_ALL or LANG referenced in the environment when newlocale() was called to create locobj, or inherited via duplocale(), and it may return an empty string, or "POSIX", if neither is set.
(0005059)
geoffclare (manager)
2020-10-23 14:21

The getlocalename_l() addition has been made in the Issue8NewAPIs branch in gitlab, based on Note: 0005035.
(0005062)
shware_systems (reporter)
2020-10-23 15:57

I feel Note:5059 is premature, in that Note: 5037 is still open for discussion on the Etherpad. I thought we were getting back to that after the current bug discussion was concluded. There is at least one implementation that already has the internal support for it, as well, and therefore exposing that support as suggested is trivial.
(0005063)
geoffclare (manager)
2020-10-26 10:03

Re note 5062, note 5037 was discussed in the 8th October teleconference and the decision we made was that we would not modify Note: 0005035 to add support for LC_ALL. In the 12th October teleconference we moved on to the backlog of other bugs because we had finished looking at the bugs relating to new APIs sponsored by The Open Group.

To add support for LC_ALL we would need to specify how multiple locale names would be returned, and this would be complicated by the existence of additional non-standard categories. If an application wants to know the locale names for all of the categories, it can simply query them one at a time.
(0005065)
shware_systems (reporter)
2020-10-26 14:50

Nothing in Note: 5037 implies multiple locale names are to be returned, mimicking setlocale(). The LC_ALL and LANG environment values are just a single name. It was someone's invention during that call that this was the intent, and I agree it would be nonsensical to do it. We've even argued at other times that it's nonsensical for setlocale() to require it to begin with.

The primary reason this makes sense is that it simplifies library routines that modify aspects of a locale but are then expected to reset them to the original LC_ALL value. Without a means to query the object for this name, a separate parameter to the function is required to specify that value, since the function has no way of knowing whether a name returned after a change is the same as that LC_ALL name, or whether LC_ALL in the environment has been modified.
(0005067)
geoffclare (manager)
2020-10-26 16:16

Re note 5065, environment variables are not always used (and if they are used, then they can simply be queried with getenv()). If an application calls:
setlocale(LC_NUMERIC, locale1);
setlocale(LC_TIME, locale2);
then getlocalename_l(LC_ALL, LC_GLOBAL_LOCALE) would have to return, somehow, the information that the locale name for LC_NUMERIC is locale1, for LC_TIME is locale2, and for all other categories is "C" or "POSIX".

With locale objects, in order to modify a locale and restore later there is no need to query locale names. It can be done using the existing locale_t handling functions (i.e. duplocale(), newlocale(), uselocale(), and freelocale()).

- Issue History
Date Modified Username Field Change
2018-12-20 13:46 bhaible New Issue
2018-12-20 13:46 bhaible Name => Bruno Haible
2018-12-20 13:46 bhaible Organization => GNU
2018-12-20 13:46 bhaible Section => ---
2018-12-20 13:46 bhaible Page Number => ---
2018-12-20 13:46 bhaible Line Number => ---
2020-10-05 11:11 geoffclare Note Added: 0005026
2020-10-05 11:13 geoffclare Note Edited: 0005026
2020-10-05 11:13 geoffclare Note Edited: 0005026
2020-10-05 15:45 bhaible Note Added: 0005027
2020-10-05 17:57 shware_systems Note Added: 0005030
2020-10-05 17:58 shware_systems Note Edited: 0005030
2020-10-07 13:29 geoffclare Note Added: 0005035
2020-10-07 15:08 shware_systems Note Added: 0005037
2020-10-23 14:21 geoffclare Note Added: 0005059
2020-10-23 15:57 shware_systems Note Added: 0005062
2020-10-26 10:03 geoffclare Note Added: 0005063
2020-10-26 14:50 shware_systems Note Added: 0005065
2020-10-26 16:16 geoffclare Note Added: 0005067

