Is PCLI possible?

At the French Perl Workshop (and probably elsewhere before), Matt Trout talked about why and how some ideas succeed while most others fail. An interesting talk. The last part was a rant against the numerous modules on CPAN for parsing command-line options. His proposal was to do for command-line (CLI) programs what PSGI did for web applications: formulate a generic specification that developers can use to build applications without depending on a specific implementation.

A very interesting idea. A few of us (BooK, cmaussan, niels and myself) began thinking about what this PCLI specification could look like. I wrote down on paper what I have been doing in the numerous CLI programs (for a broad value of the term) I have written over the past years.

A CLI program has to go through these steps (a sketch of the typical boilerplate follows the list):

  1. fetch and parse environment variables (%ENV); errors may be warned about or fatal

  2. fetch and parse command-line options (@ARGV); errors are fatal

  3. handle usage and help messages (Pod::Usage)

  4. parse the configuration file; errors are fatal

  5. check all parameters (mandatory options, exclusive options, etc.); errors are fatal

  6. apply default values to all parameters

  7. set up logging; errors may be silent, warned about, or fatal

  8. become a daemon, for programs intended to run this way

  9. set up basic signal handling (INT, TERM, QUIT, HUP), usually for daemons, but also useful for long-running programs
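
Concretely, the boilerplate for these steps tends to look something like the sketch below in my programs. It is only an illustration: the environment variable MYAPP_DEBUG, the option names and the default values are all made up, and daemonization (step 8) is left out.

    #!/usr/bin/perl
    # Rough sketch of the usual boilerplate (steps 1 to 7 and 9).
    use strict;
    use warnings;
    use Getopt::Long;
    use Pod::Usage;

    # 1. environment variables; here a missing value is simply defaulted
    my $debug = $ENV{MYAPP_DEBUG} // 0;

    # 2-3. command-line options, usage and help; errors are fatal
    GetOptions(
        'config|c=s' => \my $config_file,
        'verbose|v'  => \my $verbose,
        'help|h'     => sub { pod2usage(0) },
    ) or pod2usage(2);

    # 4. configuration file; errors are fatal
    my %config;
    if (defined $config_file) {
        open my $fh, '<', $config_file or die "can't read $config_file: $!";
        # ... parse it, in whatever format ...
    }

    # 5-6. check parameters and apply defaults
    $config{output} //= '/tmp/myapp.out';

    # 7. logging; here just a closure, real programs use a logging module
    my $log = sub { print STDERR "@_\n" if $verbose or $debug };

    # 9. basic signal handling
    $SIG{$_} = sub { $log->("caught SIG$_[0], exiting"); exit 0 }
        for qw(INT TERM QUIT HUP);

    $log->("ready to do the actual work");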

Now, I didn't know what PSGI looked like (I'm a sysdev, not a webdev). Today, I read the specification, and it confirmed what I had already identified: yes, PSGI is brilliantly "simple" because it was written by people who know this field very well, drawing on the experience of similar specifications (WSGI, Rack, JSGI). But it is also simple because it is a protocol built on top of a protocol (HTTP), built on top of a protocol (TCP), built... At the PSGI level, there is just a single bidirectional channel, with requests going in and responses coming out. And it's platform agnostic.
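
For the record, a complete PSGI application is essentially just this (more or less the hello-world from the PSGI documentation):

    # A complete PSGI application: one code reference that receives the
    # environment hash and returns the status, the headers and the body.
    my $app = sub {
        my ($env) = @_;
        return [
            200,
            [ 'Content-Type' => 'text/plain' ],
            [ "Hello, world\n" ],
        ];
    };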

However, in the CLI world (and let's limit ourselves to the Unix world for the sake of simplicity), we have multiple channels of input (%ENV, options, configuration file(s), STDIN, input data files, signals) and output (STDOUT, STDERR, logging, output data files), and no standardized protocol of any kind.

Now, I'm afraid that either we try to cover most of the features we need and end up with a specification that is too complex, or we limit ourselves to a subset of the features and end up with one that is too simple to actually be useful.

Did I miss something? Was Matt too optimistic or am I once again too pessimistic?

11 Comments

The first thing that springs to mind is the encoding of the various inputs. The encoding of the config file is determined by the editor used to write it. The encoding of @ARGV is (probably??) determined by the terminal and the LC_* variables. The encoding of %ENV is... I have no idea! Thankfully I haven't needed to worry about that yet. PCLI should do the right thing and convert them all to correctly encoded Unicode character strings, with errors being fatal.
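
For what it's worth, the closest thing I know to "doing the right thing" today looks roughly like the sketch below, using Encode::Locale. Whether the locale is really the correct source for each of these inputs is exactly the open question, and the config file name is made up.

    use strict;
    use warnings;
    use Encode qw(decode);
    use Encode::Locale;   # sets up the "locale" encoding alias from LC_* etc.

    # Command-line arguments: assume they follow the terminal's locale;
    # die on malformed input, as suggested above.
    @ARGV = map { decode( locale => $_, Encode::FB_CROAK ) } @ARGV;

    # Environment variables: no real standard, assume the locale as well.
    my %env = map { $_ => decode( locale => $ENV{$_}, Encode::FB_CROAK ) }
              keys %ENV;

    # Configuration file: here simply assumed to have been written as UTF-8.
    # open my $cfg, '<:encoding(UTF-8)', 'myapp.conf' or die "myapp.conf: $!";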

  1. Not always applicable. And for many uses of env vars, where they configure things that can also be configured another way, I don’t see much advantage here. It’s not like a web app that might be running under Apache or any number of other web servers, where the env var names might theoretically be different from one server to the next. Instead, these are generally defined by the app, not its environment, so using some sort of wrapper around $ENV{MYAPP_DEBUG} seems excessive.

  2. Here we have a ton of options. A unifying standard will likely just end up as one more option :-) but that’s not a reason not to try :-) Just be warned: different apps have different “modes”. For example, arguments may be parsed left-to-right, or all at once, or in an application-defined order (--foo is always parsed before --bar, no matter their order on the command line). Defaults can be dynamic (--bar’s default may depend on --foo’s value, which is why --foo is parsed first, or on whether /bar is read-write or read-only). IME, this isn’t easy to genericise. As for errors, don’t forget about I18N. The ability to hand errors back to the developer using your framework instead of dying is important. I need to take your error, look it up, and display a translated version. Which also means I need the error in pieces: I need to know a) what the error is, and b) all the pieces that make it up (e.g., which option was unrecognised, or, for a recognised option, which value was invalid). All errors must be documented so I can use them to build a lookup table for other languages (there is a sketch of the kind of structure I mean after this list).

  3. Personally, I find Pod::Usage to be too far removed from the code that actually defines the parameters. Again, don’t forget about I18N. While having a Module::ES for the Spanish translation of the Module POD might work, it’s pretty far removed from how we normally do things, and would be a barrier for me to use your code at $work. On the other hand, if you delegate (or at least allow delegation) to me, I can do this with the same library that I use for all my other translations (e.g., Locale::Maketext, though that’s not what I’m using now).

  4. Again, don’t forget I18N. I need to be able to spit out the errors translated.

  5. This one is tricky. What logger are you using? My current project is using Log::Log4perl. It’s my first project using that logger. Others are heavily invested in other logging solutions (Log::Dispatch?). And, really, these two can be entirely overkill for many quickie apps that would otherwise gain hugely from such a proposed framework.

  6. Again, a bit tricky, though I can see why this one might be more surprising. Apps running under event loops (AnyEvent, POE, Reflex, Tk) not only have their own ways of dealing with signals, they each manage them differently as well. If you set $SIG{$foo} in some way, will the event loop clobber it or work with it? It really depends on what you set the handler to, too: if you set it to IGNORE or some such, it will be clobbered and that’s fine. But if you set it to a real sub, that clobbering may not be so useful.
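
To illustrate that last point: under AnyEvent, for instance, the idiomatic way is a signal watcher rather than assigning to %SIG yourself, so a framework that writes to $SIG{INT} behind your back can end up fighting with the loop. Something like this:

    use strict;
    use warnings;
    use AnyEvent;

    my $quit = AnyEvent->condvar;

    # Install a watcher for SIGINT; it stays active as long as $sigint lives.
    my $sigint = AnyEvent->signal(
        signal => 'INT',
        cb     => sub { warn "caught SIGINT, shutting down\n"; $quit->send },
    );

    $quit->recv;   # run the event loop until the signal arrives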
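
And going back to point 2, this is roughly the shape of error I want the framework to hand me, so that I can look it up and translate it myself. All the names here are invented for the example, and the lookup hash just stands in for a real I18N system.

    # A structured parse error: a documented, stable identifier plus the
    # pieces needed to rebuild the message in any language.
    my $error = {
        code   => 'unknown_option',
        option => '--colour',
    };

    # The application, not the framework, turns that into a message.
    my %messages = (
        en => { unknown_option => 'Unknown option: %s' },
        fr => { unknown_option => 'Option inconnue : %s' },
    );

    my $lang = 'fr';
    printf "%s\n",
        sprintf( $messages{$lang}{ $error->{code} }, $error->{option} );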

Personally, I think you might not be pessimistic enough ;-)

Be careful that the spec does not become too broad and thus fall flat from the get-go. PSGI basically just covers how the web server gives information about the request to the web application (or middleware), and how the application/middleware returns a response. There is no mention (and rightly so) of logging, daemon/process management, documentation/I18N, and all that.

Sure, a typical application will have those, but does it really need to be included in the spec?

I think the early PCLI spec should just specify a common interface for getting ARGV/ENV into a CLI application, which ideally should just be a function, and how the exit code and output should be returned by that function.
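
Something as small as this, say. This is purely a strawman to show the shape, not an actual proposal; every name in it is made up.

    # The application is one function: argv/env in, exit code and output out.
    my $app = sub {
        my ($args) = @_;                    # { argv => [...], env => {...} }
        my $name = $args->{argv}[0] // 'world';
        return {
            exit_code => 0,
            stdout    => "Hello, $name\n",
            stderr    => '',
        };
    };

    # A trivial "runner" for it, the CLI equivalent of a PSGI server.
    my $res = $app->({ argv => [ @ARGV ], env => { %ENV } });
    print STDOUT $res->{stdout};
    print STDERR $res->{stderr};
    exit $res->{exit_code};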

My project is closely related BTW. Kindly take a look at these if you are interested: Rinci (particularly Rinci::function), Perinci::CmdLine.

App-Cmd is a pretty good framework for writing command line apps, though not as simple a design as PSGI.

App-Cmd handles argument/option processing, help, validation, etc. quite easily.
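
A minimal App-Cmd application, from memory (check the App::Cmd tutorial for the exact details; the MyApp and greet names are of course made up), is spread over an application class, one class per command, and a small driver script:

    # lib/MyApp.pm
    package MyApp;
    use App::Cmd::Setup -app;
    1;

    # lib/MyApp/Command/greet.pm
    package MyApp::Command::greet;
    use MyApp -command;

    sub abstract { "greet someone" }

    sub opt_spec {
        return ( [ 'name|n=s', 'who to greet', { default => 'world' } ] );
    }

    sub validate_args {
        my ($self, $opt, $args) = @_;
        $self->usage_error("no extra arguments expected") if @$args;
    }

    sub execute {
        my ($self, $opt, $args) = @_;
        print "Hello, ", $opt->name, "\n";
    }

    1;

    # bin/myapp
    #!/usr/bin/perl
    use MyApp;
    MyApp->run;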

After meditating on the issue for a while I decided that the problem is that “a PSGI for the command line” is fundamentally mistaken. We already have that, it’s called @ARGV + %ENV + STDIN/STDOUT/STDERR.

What Matt is actually asking for, translated to Web apps, would be not PSGI, but a standard for URI dispatching, plus form validation (roughly corresponding to command line parsing), plus response generation AKA a template engine interface (roughly corresponding to printing on STDOUT), along with assorted loose ends. So what he is really saying he wants is, for command line apps, the conceptual equivalent of what in Web apps would be not PSGI but some sort of universal web framework specification.

Which does not exist.

And I think once you realise what he is asking for in reality, you give up. It is not even achievable in Web apps, which are a lot more uniform than command line apps.

But in the meantime there is App::Cmd as the first real framework for command line apps, or else now MooseX::App if you do not think that using “command line app” and “Moose” in the same sentence is self-contradictory. :-)

Wise words from a wise man :)

Before WSGI/Rack/PSGI there was also already CGI, a platform-neutral interface that works everywhere. IMO there is nothing in CGI that requires a web application to be slow or to fork per request. The Python guys, when devising SCGI which later became WSGI, basically just abstracted STDIN/STDOUT to make things more convenient for Python programs (not that that is a bad thing).
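
Indeed, a bare CGI program needs nothing more than %ENV and STDOUT (plus STDIN for request bodies), which any language can provide:

    #!/usr/bin/perl
    # A bare CGI program: the request comes in through %ENV (and STDIN for
    # POST bodies), the response goes out on STDOUT. No framework involved.
    use strict;
    use warnings;

    my $method = $ENV{REQUEST_METHOD} // 'GET';
    my $query  = $ENV{QUERY_STRING}   // '';

    print "Content-Type: text/plain\r\n\r\n";
    print "method: $method\n";
    print "query:  $query\n";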

Yes. The CGI environment is simply less convenient (a lot harder, in fact) to set up in-process than inter-process, whereas PSGI makes both equally easy. I guess that is why it never caught on as a universal protocol the way PSGI has. But neither protocol is inherently unsuitable for either scenario.
