0000771: Expose alternate shell function usage to scripts

Notes
(0001909) dwheeler (reporter) 2013-10-15 18:21	This is not the same as, but it is somewhat related to, the proposal to add "local" (as is implemented in dash and bash, and by "typeset" in ksh). For that proposal, see: http://austingroupbugs.net/view.php?id=767

(0001913) shware_systems (reporter) 2013-10-15 23:59	Error Correction, as I overlooked '{' and '}' as singletons are reserved word tokens, not punct chars: In Suggested Text, Replace: After Line 74888, XCU 2.10.2 Shell Grammar Rules, insert with leading tabs: "| fname '{' '}' linebreak function_body" with: Replace Line 74785-6, XCU 2.10.2 Shell Grammar Rules, with: %token CLOBBER LRbrace /* '>|' '{}' */ After Line 74888 insert with leading tabs: "| fname LRbrace linebreak function_body"

(0001922) shware_systems (reporter) 2013-10-16 15:36 edited on: 2013-10-16 16:22	Re: Mailing List Discussion: This proposal is about exposing a facility now available only to implementation writers to application writers as well, not about general scoping issues. I expect functions defined using this syntax to behave as if they were separate executables: from the calling shell they receive the values passed as arguments and those referenced by the environ variable as initializers of their scope, as copies, not references. The main difference for a script is that it uses direct assignment and references to those copies rather than getenv/setenv calls for access, so these functions have a block of ENVNAME = getenv("ENVNAME"); equivalents before the function definition is evaluated to import those names into the function's 'name space', as if it were an executable, and arguments are accessed with operators rather than argv[ ] references. The only things that should affect the calling environment, variable-wise, are the return value of the function and possibly which files are open for use with the redirection operators. Variables in the calling environment that aren't exported should be treated as unset in the body of the function, and it is on the function to initialize a variable with the same name before use. This includes files that aren't piped in being treated as closed if FD_CLOEXEC is set for them. Whether these new variables are stored statically as unexported variables in the function's environment copy or dynamically on the stack is not germane to the logical model. Either way, any changes to them get thrown away as a consequence of the return and do not affect the calling environment. Whether files with FD_CLOEXEC are physically closed and reopened on function return, or the reference handle is just treated as unassigned in the environment copy, is about the only scope conflict I see due to side effects of open and close.
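
Editorial sketch: the logical model described in this note can be approximated today, at the cost of a real fork and exec, by running the body in a child sh. This is only an emulation under that assumption, not the proposed fname{} syntax; isolated_demo, EXPORTED and PRIVATE are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Emulation only: a child 'sh -c' stands in for the proposed function type.
EXPORTED=from_caller
export EXPORTED
PRIVATE=caller_only          # deliberately not exported

# isolated_demo is a hypothetical name. Its body runs in a fresh sh process,
# so it starts from the exported environment plus its arguments.
isolated_demo() {
    sh -c '
        # Exported names arrive as copies; non-exported names are unset.
        printf "%s|%s|%s\n" "$EXPORTED" "${PRIVATE-unset}" "$1"
        # This change is discarded when the child exits.
        EXPORTED=changed_in_child
    ' isolated_demo "$@"
}

isolated_demo arg1           # prints: from_caller|unset|arg1
printf '%s\n' "$EXPORTED"    # prints: from_caller
```

The note's stated aim is to get this isolation without the fork and exec that the emulation above pays for.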

(0001934) ranjit (reporter) 2013-10-19 07:56	This is a nice proposal. IIUC it essentially means to run the function "as-if" in a separate environment, such that it cannot affect the parent-script environment. I agree there needs to be a syntactic difference at the definition level, since that exposes useful functionality in a backwards-compatible manner. As a new feature, discussion is vital to make it conceptually coherent, and there's just a few points I see, which I feel need such. Obviously these are where I have differences with what has been expressed above: that should not be construed as any sort of indication of a disagreement with the idea itself. By implication it means everything else strikes me as a good idea, or necessary or I missed what seems obvious to you, due to being tired and not you. ;) Firstly, the existing specification does not expose this to application at all afaict, since it "may only be done by implementations as internal to the shell." It appears thus to be a concession to implementations. As such, it does not make sense to me, to specify anything at all about what happens when "fname matches the name of a standard utility the shell has implemented as a function," and allow scope for them to be overridden, or undefined, at the same time as a definition of every other shell built-in shall cause an error. How is a portable scripter supposed to distinguish: this appears to expose implementation detail to the scripter, where we should just be exposing application functionality in a self-consistent manner, ie: by simply stating that shell built-ins are the implementation's business alone, with a concession to implementors as to how they implement them. 
If not I could foresee situations in my own work, where it might become necessary to test the shell (either by name or via functionality) in order to override a function in a specific situation because "everyone" 'knows XYZ is built-in in fubarsh, and that you should override it to instead run: arcane -brouha' I'd rather work in a cleanly-specified environment where what the built-ins do is defined, and how they do it is up to the implementation (since it knows its environment better than I do.) And not leave scope for confusion at the language level. In a similar vein, I don't see any purpose whatsoever in a different syntax on call. Leave that to the definition, similar to how it is up to an include file to know whether it is an "include-once" or an "include-always", in a sane language. The implementor of the function knows whether it needs to run in a "separate" environment; the user should not need to worry. If they wrote it, then they know what it does, and if they didn't then the only reason this would be used is because the implementor knows it is necessary, and it should be immediately apparent from the definition that it is a func_shell, for want of a better term. (func_subshell is too long to type repeatedly for these old bones;> but feel free to s/func_shell/whateveryouprefer/g in the following.) I do not see the utility in this either: "Variables in the calling environment that aren't exported should be treated as unset in the body of the function, and it is on the function to initialize a variable with the same name before use." What for? Why not let me just use the same vars I set, and if I want to change them, then I can do that myself, just like with any other script function? Variables that are exported remain exported in the func_shell, and everything operates with least surprise. It simply cannot affect the parent env. As such any fds it opens should close when it does, imo. The same as if it were a child process, it cannot open fds in the parent.
Either that, or use set to enable fd "passing" to the parent func_shell (ie by leaving fds it opens, open on return.) This would naturally be at the caller's discretion. The alternate syntax proposed seems odd; function_definition: fname '(' ')' linebreak function_body | fname '(' '&' ')' linebreak function_body ; ..would appear more consistent with the use of '&' as fork operator. It is also immediately apparent, ie it "sticks out" for a shell scripter, since we tend to be alert to & along with | ; * ? [ and so on in our scripts.
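
Editorial sketch: the semantics preferred in this note (the function sees all of the caller's variables, exported or not, but cannot change the parent environment) resemble what the existing grammar already gives when a function body is a ( ... ) subshell compound command. The example below uses that existing form, not the proposed '(&)' syntax; func_shell_demo, counter and setting are hypothetical names.

```shell
#!/bin/sh
counter=1
setting=verbose              # not exported, yet visible in the subshell body

# func_shell_demo is a hypothetical name. Because the body is a ( ... )
# compound command, it runs in a subshell environment: it inherits the
# caller's variables but its assignments do not reach the caller.
func_shell_demo() (
    counter=$((counter + 41))
    printf '%s %s\n' "$setting" "$counter"
)

func_shell_demo              # prints: verbose 42
printf '%s\n' "$counter"     # prints: 1
```

A subshell body usually costs a fork, which is the overhead both reporters say the new function type is meant to avoid.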

(0001936) shware_systems (reporter) 2013-10-19 18:57	Re: "The alternate syntax proposed seems odd; function_definition: fname '(' ')' linebreak function_body | fname '(' '&' ')' linebreak function_body"
The syntax choice was to match the usage of command list groups, per XCU 2.9.4.1, as a mnemonic aid. This is not a fork; it avoids a full fork to a sub-shell environment as a {...} group does, as this is an alias of a BEGIN...END keyword pair in expected effect. It is also a reminder that this declaration is to behave more like a 'C' definition whose syntax uses " '{' statements '}' ". I don't see '&', as the "put to background" operator, having the same association.
Re: I do not see the utility in this either: "Variables in the calling environment that aren't exported should be treated as unset in the body of the function, and it is on the function to initialize a variable with the same name before use."
The idea there is to make explicit that the environment should be as if a separate utility executable was exec'd, or as if 'sh script_file [args...]' was exec'd. These only have access to exported global variable copies as initializers from the shell, not the script's local variables and, being copies, changes to them do not affect the calling environment. Non-export local variables are expected to be passed as arguments to the function, not referenced directly and possibly modified as with the other type of declaration. This is the intended result in the context of how it affects script writers and how it's actually accomplished is on the implementation. Currently 'fname()' is more a 'C' macro definition in effect, and 'fname{}' would be consistent with an actual 'C' function definition in effect, as mentioned above; not precisely, but a lot closer. This keeps function library include files, which are possibly from a third party, from inadvertently using bogus initial values due to a conflict of 'name within function' with 'name used outside of function'.
The 'local' keyword model I described in a prior post on the mailing list allows functions of this type to avoid name collisions with exported variables also, especially those the standard documents as expected to be exported (like PATH and LC_ALL), yet still use the initial values. This also makes explicit that variables the function may export do not affect the export flag of a caller's local variables.
Re: Firstly, the existing specification does not expose this to application at all afaict, since it "may only be done by implementations as internal to the shell." It appears thus to be a concession to implementations. As such, it does not make sense to me, to specify anything at all about what happens when "fname matches the name of a standard utility the shell has implemented as a function," and allow scope for them to be overridden, or undefined, at the same time as a definition of every other shell built-in shall cause an error.
This has to do with an arbitrary fname{} not hiding the definition of a standard utility implemented as a script from third party script library functions that rely on that utility's behavior. It's an 'unspecified' case because the implementation may also provide a separate executable or script find-able along PATH and provide an extension to allow using that instead of a possibly overridden script version. Similar reasoning applies when the utility is provided only as a separate file; how does an implementation handle a script function possibly hiding access to those. Special Built-Ins are required to be part of the shell and their behavior is required to not be overridden by the command search routines, per XCU 2.9.1.1, Case 1.a. Because of this an attempt to define these as functions will never get executed so treating it as an error lets the writer know this unambiguously; typos happen, IOW, or they forgot that name is an SBI.
Re: In a similar vein, I don't see any purpose whatsoever in a different syntax on call.
Leave that to the definition, similar to how it is up to an include file to know whether it is an "include-once" or an "include-always", in a sane language. The implementor of the function knows whether it needs to run in a "separate" environment; the user should not need to worry.
Yes, that's an error on my part, thinking more 'C' function than script function in the way that's worded (I was tired too :-) ), and forgot to put the rest of that thought. It was intended to relate to above that a call of form 'fname()', with no arguments inside the parens, would be a means of indicating a script should ignore command search or 'alias' utility redefinitions and use a script's current function definition explicitly, whether defined with 'fname()' or 'fname{}'. Maybe this should be a separate ERN but I do consider it a legitimate reason for different call syntaxes. It's a portable way, syntactically, of showing a script really wants to use its definition, which is somewhat the opposite of using an implementation-specific, and possibly non-portable, means for showing you want to use the version provided by the implementation instead of a function. Hopefully this makes the intent clearer.
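
Editorial sketch: the conflict of 'name within function' with 'name used outside of function' mentioned in this note can be seen with an ordinary fname() definition, which shares the caller's variables. lib_sum, total and i are hypothetical names standing in for a third-party library function and the variables it happens to use.

```shell
#!/bin/sh
i=10                         # the caller's own variable

# lib_sum is a hypothetical third-party library function. With the current
# fname() form it runs in the caller's environment, so its loop variable
# and accumulator are the caller's variables of the same name.
lib_sum() {
    total=0
    for i in "$@"; do
        total=$((total + i))
    done
    printf '%s\n' "$total"
}

lib_sum 1 2 3                # prints: 6
printf '%s\n' "$i"           # prints: 3 (the caller's i was overwritten)
```

Under the model the note describes, a function of the proposed type would instead leave the caller's i untouched and would not see its value unless it were exported or passed as an argument.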

(0001938) ranjit (reporter) 2013-10-21 16:10	We're agreed that the specification of a func_shell is akin to a function, only like a subshell it cannot affect the parent env. My point is simply that we should stop there, and not add anything else that is not required to make the above work. So essentially we're left with bike-shedding, if one wants to be unkind about our deliberations ;)
> The syntax choice was to match the usage of command list groups, per XCU 2.9.4.1, as a mnemonic aid.
But there's no real need: the main purpose is a function, with one simple restriction (which we can take as given, ie specified from "on-high".) And the problem is that said mnemonic aid as originally envisioned required quoting, so now we've added two chars. A single ampersand sticks out a mile, while also being easy to type, and for a user to remove, knowing what the consequence will be (now it's a normal function.)
> This is not a fork; it avoids a full fork to a sub-shell environment as a {...} group does, as this is an alias of a BEGIN...END keyword pair in expected effect.
Oh I'm well aware, and I am aware that & is more properly called the bg operator: I called it fork to get some thinking going (since that's the major aspect as far as the OS is concerned: two processes where there was one.) Either way, both ( .. ) and & are conventionally termed subshells when it comes to execution cost, and this is a "functional subshell". So I think & has more semantic consonance with what is happening than clobber, as well as being simpler to write and to change.
> The idea there is to make explicit that the environment should be as if a separate utility executable was exec'd, or as if 'sh script_file [args...]' was exec'd.
Yes but it's not very useful given that this is a function in grammatical and semantic terms. And it starts to sound more like a fork + exec than a fork, whereas the intent of this is to be lighter-weight than executing an external.
> This keeps function library include files, which are possibly from a third party, from inadvertently using bogus initial values due to a conflict of 'name within function' with 'name used outside of function'.
That makes no sense at all: presumably said function would set the values it intends to use, or indeed use the value from the script as a variable to communicate non-parametric settings with. The latter is useful, ime, of both large bash and sh scripts, using exactly the method you describe of library include files. And allowing use of extant script vars only becomes more useful with local, assuming a dynamic scope. In essence this point seems to me a concession to bad scripts; that is no reason to hobble good ones by taking away useful functionality that is easier to implement as-is.
> This also makes explicit variables the function may export do not affect the export flag of a callers local variables.
The syntax and the definition, in fact the very raison d'etre, of a func_shell already do that.
> Because of this an attempt to define these as functions will never get executed so treating it as an error lets the writer know this unambiguously; typos happen, IOW, or they forgot that name is an SBI.
That's my point though: if you want to do that, simply ban any SBI from being defined as a function, and it shall always be an error for the application to attempt it. No more needs to be said, afaic, and there is no scope for confusion by any party (read: "arcane -brouha".)
Thanks for the responses; the original intent is already very clear and I support that intent fully. Rather it is a question of making the functionality work intuitively and simply in shell, and I think the above elucidates my opinions on it, so I won't comment again unless asked or addressed; I realise this goes out on the ML, but my subscription broke, so I'll get back there soon. Thanks for your time.

(0001939) ranjit (reporter) 2013-10-21 16:31	"simply ban any SBI from being defined as a function" by the application .. in case it is unclear; not by the implementation.

(0001941) shware_systems (reporter) 2013-10-22 16:11	> My point is simply that we should stop there, and not add anything else that is not required to make the above work.
Many of the additions are already implicit... Mentioning them explicitly helps prevent complaints of "But I did it this way and it didn't ~explicitly~ say I couldn't do it this way so I want it to be considered conforming." It's also a signal to the conformance test suite maintainers that this is something that extra test cases might need to be written for.
> So I think & has more semantic consonance with what is happening than clobber, as well as being simpler to write and to change.
The proposal, as modified, defines '{}' as a new 2 char operator token, LRbrace, on the same lines as the 'clobber' operator. The single quotes are as used by the grammar, they aren't used in an actual script. This isn't obvious with a proportional font, but the standard uses Courier there and I spaced it accordingly. How clobber is interpreted isn't affected at all, nor does it complicate parsing this. I think using '(&)' would mean having it as a three character operator because current shells would parse it as a 'missing right paren' syntax error. We'll have to agree to disagree, I think, on which is more mnemonic.
> That makes no sense at all: presumably said function would...
This part just makes it explicit what is implied by 'consistent with that of a separate fork/exec of sh or other utility-as-executable', so as to rely on as few presumptions as practical. These do not have read or write access to the non-exports, or locals, of the caller so neither should these function definitions. Whether bad coding style or not a reference to a non-exported unset name is expected to return "" until assigned another value. This requires the new_env_block processing preceding execle() for a utility-exec-file invoke followed by a push_vars_ptr which includes a call to the init_vars_from_env part of sh's initialization.
On function exit pop_vars_ptr frees the vars and env_block. Positional param and value push/pops are handled the same. Note no fork(), nor actual exec() call. The idea is to be able to copy/paste an entire script that doesn't define any functions, without any other edits required, from a separate file as the <compound_list> part of this type of definition and see a performance increase. A '. file', where file contains only one or more definitions of this type, should be able to be used at any point where the shell would allow an inline definition of this type and behave the same, even as the first line of the script before any non-exports are set. That's what is being accommodated; a reliably flexible logical model, not a particular implementation type, or "good" or "bad" coding style.
> > This also makes explicit variables the function may export do not affect the export flag of a callers local variables.
> The syntax and the definition, in fact the very raison d'etre, of a func_shell already do that.
The definition requires positional parameters to be isolated. This implies named variables should be also but this isn't explicit, and thus the "static" vs "dynamic" debate. Similarly, readonly applies to the end of the script, not the end of the function definition, and this implies export applies until wherever a name is used with the unset utility, in a function call or outside it. The "var=WORD fname()" syntax specifies a value push and pop, but leaves attribute push and pop implied. The definition of readonly implies "name=WORD; readonly name; name=WORD fname()" should abort as an assignment conflict after the second 'D', but this isn't explicit either, that I see. Point being, the implications can be slanted either way as to what the intended model is, but neither are fully represented as expressed in other languages.
Re: 1939: Yes, that's implicit already also. This is for clarity that a new capability is not being introduced.

(0002011) Don Cragun (manager) 2013-11-21 17:46	This was discussed at length during the November 21, 2013 conference call. Other bug reports providing for shell variable scoping (such as 0000767, which proposes adding a local special built-in utility) provide most of what is requested here, even though it might not be as efficient as what the submitter desires. But local is already implemented and is in use in scripts, while the new feature requested here is invention and has several open issues about what should be done in certain cases. If there is a strong need for this functionality, the submitter needs to convince the maintainers of POSIX-conforming shells to implement and document this functionality and then propose standardizing that feature in a future revision of the standard.

Issue History
Date Modified	Username	Field	Change
2013-10-14 22:29	shware_systems	New Issue
2013-10-14 22:29	shware_systems	Name	=> Mark Ziegast
2013-10-14 22:29	shware_systems	Section	=> XCU 2.9.5
2013-10-14 22:29	shware_systems	Page Number	=> 2346
2013-10-14 22:29	shware_systems	Line Number	=> 74657+
2013-10-15 18:21	dwheeler	Note Added: 0001909
2013-10-15 23:59	shware_systems	Note Added: 0001913
2013-10-16 15:36	shware_systems	Note Added: 0001922
2013-10-16 16:22	shware_systems	Note Edited: 0001922
2013-10-19 07:56	ranjit	Note Added: 0001934
2013-10-19 18:57	shware_systems	Note Added: 0001936
2013-10-21 16:10	ranjit	Note Added: 0001938
2013-10-21 16:31	ranjit	Note Added: 0001939
2013-10-22 16:11	shware_systems	Note Added: 0001941
2013-11-14 16:08	geoffclare	Relationship added	related to 0000767
2013-11-21 17:46	Don Cragun	Interp Status	=> ---
2013-11-21 17:46	Don Cragun	Note Added: 0002011
2013-11-21 17:46	Don Cragun	Status	New => Closed
2013-11-21 17:46	Don Cragun	Resolution	Open => Rejected
