Is PCLI possible?
At the French Perl Workshop (and probably elsewhere before), Matt Trout talked about why and how some ideas succeed while most others fail. An interesting talk. The last part was a rant against the numerous modules on CPAN for parsing command-line options. His proposal was to do for command-line (CLI) programs what PSGI did for web applications: formulate a generic specification that developers can build applications against, instead of depending on a specific implementation.
A very interesting idea. A few of us (BooK, cmaussan, niels and myself) began thinking about what this PCLI specification could look like. I wrote down on paper what I've been doing in the numerous CLI programs (for a broad value of the term) I have written over the past years.
A CLI program has to go through these steps:
1. fetch and parse environment variables (%ENV); errors may be warned about or fatal
2. fetch and parse command-line options (@ARGV); errors are fatal
3. handle usage and help (Pod::Usage)
4. parse configuration file; errors are fatal
5. check all parameters (mandatory options, exclusive options, etc.); errors are fatal
6. apply all parameters' default values
7. set up logging; errors may be silent, warned about, or fatal
8. become a daemon, for the programs intended to run this way
9. set up basic signal management (INT, TERM, QUIT, HUP), usually for daemons, but also useful for programs running for a long time
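To fix ideas, here is a minimal sketch of steps 1 to 6 using only modules shipped with Perl (Getopt::Long, Pod::Usage); the MYAPP_DEBUG variable and the option names are made-up examples, not part of any spec:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;

# 1. fetch and parse environment variables (%ENV)
my %config = (
    debug => $ENV{MYAPP_DEBUG} // 0,   # hypothetical variable
);

# 2. fetch and parse command-line options (@ARGV); errors are fatal
GetOptions(\%config, "help|h", "man", "debug!", "quiet!", "output|o=s")
    or pod2usage(-exitval => 2);

# 3. handle usage and help
pod2usage(-exitval => 0)                if $config{help};
pod2usage(-exitval => 0, -verbose => 2) if $config{man};

# 5. check all parameters; here, two exclusive options
die "--debug and --quiet are mutually exclusive\n"
    if $config{debug} && $config{quiet};

# 6. apply default values
$config{output} //= "-";   # "-" meaning STDOUT
```

Steps 4 and 7 to 9 are deliberately left out here, since there is no consensus module for them in core.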
Now, I didn't know what PSGI looked like (I'm a sysdev, not a webdev). Today, I read the specification, and it confirmed what I had already identified: yes, PSGI is brilliantly "simple" because it was written by people who know this field very well, drawing on the experience of similar specifications (WSGI, Rack, JSGI). But it is also simple because it is a protocol built on top of a protocol (HTTP), built on top of a protocol (TCP), built... At the PSGI level, there's just a bidirectional channel with data going in both directions. And it's platform agnostic.
However, in the CLI world (and let's limit ourselves to the Unix world for the sake of simplicity), we have multiple channels of input (%ENV, options, configuration file(s), STDIN, input data files, signals) and output (STDOUT, STDERR, logging, output data files), and no standardized protocol of any kind.
Now, I'm afraid that either we try to cover most of the features we need, and we end up with a specification that is too complex, or we limit ourselves to a subset of the features, and we end up with a specification too simple to be actually useful.
Did I miss something? Was Matt too optimistic or am I once again too pessimistic?
First thing that springs to mind is the encoding of the various inputs. The encoding of the config file is determined by the editor used to write it. The encoding of @ARGV is (probably?) determined by the terminal and the LC_* variables. The encoding of %ENV is... I have no idea! Thankfully I haven't had to worry about that yet. PCLI should do the right thing and convert them all to correctly encoded Unicode character strings, with errors being fatal.
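Something like this, perhaps, using the core Encode and I18N::Langinfo modules. The decode_args helper is made up, and whether @ARGV is really locale-encoded text is exactly the open question:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);
use I18N::Langinfo qw(langinfo CODESET);

# Decode a list of byte strings (e.g. @ARGV) from the locale's charset
# into Perl character strings; malformed input is a fatal error.
sub decode_args {
    my ($codeset, @bytes) = @_;
    return map { my $b = $_; decode($codeset, $b, FB_CROAK) } @bytes;
}

# The locale's charset, e.g. "UTF-8" under a UTF-8 locale
my $codeset = langinfo(CODESET);
@ARGV = decode_args($codeset, @ARGV);
```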
Not always applicable. And for many uses of env vars, where they aren’t configuring stuff that can’t also be configured another way, I don’t see much advantage here. It’s not like a web app that might be running under Apache or any number of other web servers where the env var names might theoretically be different from one server to the next. Instead, these are generally defined by the app not its environment, so using some sort of wrapper around $ENV{MYAPP_DEBUG} seems excessive.
Here we have a ton of options. A unifying standard will likely be just a new option :-) but that’s not a reason not to try :-) Just be warned: different apps have different “modes”. For example, arguments may be parsed left-to-right, or all at once, or in an application-defined order (--foo is always parsed before --bar, no matter their order on the command line). Defaults can be dynamic (--bar’s default may depend on --foo’s value, which is why --foo is parsed first, or on whether /bar is mounted read-write or read-only). IME, this isn’t easy to genericise. As to errors, don’t forget about I18N. The ability to give errors to the developer using your framework, instead of dying, is important. I need to take your error, look it up, and display a translated version. Which also means I need the error in pieces: I need to know a) what the error is, and b) all the pieces that make it up (e.g., what option was unrecognised, or, for a recognised option, what value was invalid). All errors must be documented so I can use them to create a lookup table for other languages.

Personally, I find Pod::Usage to be too far removed from the code that actually defines the parameters. Again, don’t forget about I18N. While having a Module::ES for the Spanish translation of the Module POD might work, it’s pretty far removed from how we normally do things, and would be a barrier to me using your code at $work. On the other hand, if you delegate (or at least allow delegation) to me, I can do this with the same library that I use for all my other translations (e.g., Locale::Maketext, though that’s not what I’m using now).
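The dynamic-default case I mention can at least be expressed with Getopt::Long, by resolving defaults only after the whole command line has been read; the option names and paths below are hypothetical:

```perl
use strict;
use warnings;
use Getopt::Long;

# --bar's default depends on --foo's value, so defaults are applied
# only once all options have been parsed.
sub parse_opts {
    local @ARGV = @_;
    my %opt;
    GetOptions(\%opt, "foo=s", "bar=s") or die "bad options\n";
    $opt{foo} //= "dev";
    $opt{bar} //= $opt{foo} eq "prod" ? "/var/lib/app" : "/tmp/app";
    return %opt;
}

my %opt = parse_opts(@ARGV);
```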
Again, don’t forget I18N. I need to be able to spit out the errors translated.
This one is tricky. What logger are you using? My current project is using Log::Log4perl. It’s my first project using that logger. Others are heavily invested in other logging solutions (Log::Dispatch?). And, really, these two can be entirely overkill for many quickie apps that would otherwise gain hugely from such a proposed framework.
Again, a bit tricky, though perhaps surprisingly so. Apps running under event loops (AnyEvent, POE, Reflex, Tk) not only have their own ways of dealing with signals, but also manage them in different ways. If you set $SIG{$foo} in some way, will the event loop clobber it or work with it? It really depends on what you set the handler to: if you set it to IGNORE or some such, it will be clobbered and that’s fine. But if you set it to a real sub, that clobbering may not be so useful.
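For reference, here is the plain %SIG baseline I have in mind, which an event loop may or may not preserve; a minimal sketch of step 9:

```perl
use strict;
use warnings;

# Classic long-running-program setup: INT/TERM/QUIT request a clean
# shutdown, HUP requests a configuration reload. An event loop
# (AnyEvent, POE, ...) may install its own handlers over these.
my $running = 1;
my $reload  = 0;
$SIG{$_}  = sub { $running = 0 } for qw(INT TERM QUIT);
$SIG{HUP} = sub { $reload  = 1 };

# The main loop would then look like:
# while ($running) {
#     if ($reload) { reload_config(); $reload = 0 }
#     ...
# }
```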
Personally, I think you might not be pessimistic enough ;-)
Be careful that the spec doesn’t become too broad, and thus fall flat from the get-go. PSGI basically just covers how the web server gives information about the request to the web application (or middleware), and how the application/middleware returns a response. There is no mention (and rightly so) of logging, daemon/process management, documentation/I18N, and all that.
Sure, a typical application will have those, but does it really need to be included in the spec?
I think the early PCLI spec should just specify a common interface for getting ARGV/ENV into a CLI application, which ideally should just be a function. And how exit code and output should be returned by that function.
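Everything in the following sketch is hypothetical, since no PCLI spec exists; it is just one possible shape for "a CLI app as a single function", by analogy with a PSGI app as a coderef:

```perl
use strict;
use warnings;

# Hypothetical "PCLI application": one function receiving its
# environment and arguments, returning exit code and output.
my $app = sub {
    my ($env) = @_;                  # e.g. { argv => [...], env => {...} }
    my @args = @{ $env->{argv} };
    return {
        exit_code => @args ? 0 : 2,
        stdout    => @args ? join(" ", @args) . "\n" : "",
        stderr    => @args ? "" : "usage: myecho ARGS...\n",
    };
};

# The equivalent of a PSGI server would run it and handle the channels:
sub run_app {
    my ($app, @argv) = @_;
    my $res = $app->({ argv => \@argv, env => { %ENV } });
    print STDOUT $res->{stdout};
    print STDERR $res->{stderr};
    return $res->{exit_code};
}
```

A real program would then end with `exit run_app($app, @ARGV);`.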
My project is closely related BTW. Kindly take a look at these if you are interested: Rinci (particularly Rinci::function), Perinci::CmdLine.
App-Cmd is a pretty good framework for writing command line apps, though not as simple a design as PSGI.
App-Cmd handles argument/option processing, help, validation, etc quite easily.
I never had to deal with arguments outside ASCII, but from reading P5P, I would say that it's better to leave the arguments in their original encoding. The main type of argument outside ASCII is file names, but because they are external references, I'm not sure you can blindly convert them as if they were random strings.
I may be wrong, but my guess is that you don't have enough information about the encoding you receive, and a round-trip conversion won't be 100% safe. Especially because you would need to know the encoding of the underlying filesystem of the volume hosting the file. Different OSes use different formats (Windows uses UTF-16, most Unixes seem to use UTF-8), but also different flavours (IIRC, OS X uses Canonical Form D, or something).
All in all, it seems just safer to keep the file names as-is.
The encoding of the environment variables, as you noted, is a wild guess. And as for the configuration file, that would mean the specification has to impose a particular format, which it should not. I find .INI files very useful for most of the programs I have written over the past years, but I can see the need for more complex formats like Apache-style, or even YAML (or even XML, if you're into that).
I don’t like environment variables, because they feel like some kind of “hidden” information. Also, you can’t commit them. But many programs and modules make use of them, hence my including them in the discussion.
I think that was what made Matt angry: there are hundreds of modules to parse options, each with oh-so-slightly different features. Hence the idea of something that could possibly end this mess (quoting Matt with more urbane words ;-)
But please note that the idea of PCLI is not to make yet another framework, but to write a specification that describes how things should work. This specification could then be implemented in different modules, and they would all be compatible because they would follow the spec. That’s what PSGI does for web applications. My post was to examine whether it’s actually possible to write such a generic specification.
Your remarks are interesting, but I have an even simpler example of “options” for which I don’t know of a module on CPAN that provides a generic handling framework: object-verb based syntax. Think of the Linux ip(8) command.
Maybe we could find something here similar with the “route handlers” in web apps, and use coderefs. Or see the options as a grammar that we want to describe. In both cases it seems pretty complex to describe, explain, and use.
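The coderef idea could look something like this for an ip(8)-style command; the dispatch table and its handlers are made-up stubs, not a real implementation:

```perl
use strict;
use warnings;

# object-verb dispatch in the style of "ip route add", "ip link show";
# each handler here is a hypothetical stub returning a string.
my %dispatch = (
    route => {
        add  => sub { "adding route @_" },
        show => sub { "showing routes" },
    },
    link => {
        show => sub { "showing links" },
    },
);

sub run {
    my ($object, $verb, @rest) = @_;
    my $handler = $dispatch{$object} && $dispatch{$object}{$verb}
        or die "unknown command: $object $verb\n";
    return $handler->(@rest);
}
```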
Regarding I18N, I’d say it’s out of the PCLI scope, just as it is out of PSGI or even HTTP scope: it’s about describing how things work, not standardize the messages.
I included step 3 because it provides a shortcut to avoid doing useless processing.
Steps 7 to 9 stretch things a bit far. I agree they are useless in short-lived programs, but they are needed in long-running ones. I included them in the list because they help determine the scope of a PCLI specification.
To answer your question, I usually stick to Sys::Syslog, which is good enough, shipped with Perl and with a stable API. In fact, for the programs I wrote over the past years, the modules shipped with Perl were all good enough: Getopt::Long, Pod::Usage, Sys::Syslog. Only adding Proc::Daemon and Proc::PID::File for daemons.
Thanks for your detailed comment.
I mentioned that module to Matt, but PCLI would be a specification, not a module.
Agreed. As I said in my previous answer, the list I wrote was not necessarily intended to be a PCLI specification, but more like "what a CLI program has to do", and starting from here, what points do we consider within the scope.
I agree that some of these steps could be left out of PCLI, to avoid it being too broad. On the other end, if it is too small, will it be useful at all?
I didn't include input and output because I couldn't see the point of passing the corresponding file handles. Thinking about it again, I can see some use cases, but that feels technically awkward.
However I had forgotten exit codes, silly me.
Your modules are interesting. I'll have to take a serious look at them.
Thanks for your comment.
After meditating on the issue for a while, I decided that the problem is that “a PSGI for the command line” is fundamentally mistaken. We already have that: it’s called @ARGV + %ENV + STDIN/STDOUT/STDERR.

What Matt is actually asking for, by analogy with Web apps, would be not PSGI, but a standard for URI dispatching plus form validation (roughly corresponding to command line parsing) plus response generation, AKA a template engine interface (roughly corresponding to printing on STDOUT), along with assorted loose ends. So what he is actually saying he wants is a spec for the conceptual equivalent, in command line apps, of what in Web apps would correspond not to PSGI, but to some sort of universal web framework specification.
Which does not exist.
And I think once you realise what he is asking for in reality, you give up. It is not even achievable in Web apps, which are a lot more uniform than command line apps.
But in the meantime there is App::Cmd as the first real framework for command line apps, or else now MooseX::App if you do not think that using “command line app” and “Moose” in the same sentence is self-contradictory. :-)
Wise words from a wise man :)
Before WSGI/Rack/PSGI there was also CGI, a platform-neutral interface that works everywhere. IMO there is nothing in CGI that requires a web application to be slow or to fork per request. The Python guys, when devising SCGI, which later became WSGI, basically just abstracted STDIN/STDOUT to make it more convenient for Python programs (which is not a bad thing).
Yes. The CGI environment is simply less convenient – a lot harder – to set up in-process than inter-process, whereas PSGI makes both equally easy. I guess that is why it never caught on as a universal protocol the way PSGI has. But neither protocol is inherently unsuitable for either scenario.