September 2011 Archives

Using cpanm just to download tarballs

Inspired by a recent blog post on patching cpanminus, I decided to spend a few minutes patching cpanm to scratch a small itch of mine: sometimes I only want to download distribution tarballs, without building or installing them. This patch adds a --download command to do just that.

To try it out for yourself:

$ git clone git://github.com/sharyanto/cpanminus.git
$ cd cpanminus
$ script/build.PL
$ pe…

YAML vs INI (Again) and the plan for yet another INI module

For the past several years, I'd been set on YAML as the format for configuration files. It's human-readable, pretty, portable, and supports arbitrary data structures. But for future projects, I'm planning to use the INI format. Why?

First of all, YAML is "too complex" for users. There are subtle syntax rules, like the requirement for the list separator character (,) or the mapping character (:) to be followed by a space. And then there are special literals like Yes/No/true/false, ~, date/time, etc., and the various ways to write heredocs. It would take at the very least an hour to explain the syntax to first-timers, and days for them to become familiar with it.

Second, and this is more important for me, there are no round-trip parsers/emitters for YAML. You can't modify data without reformatting the whole file and removing all comments.

INI doesn't have these problems. It's pretty readable, it's familiar to most users, including Windows users, and it's easy to write a round-trip parser for it. But INI has a different set of problems. One, there is no standard or specification: each implementation differs in some ways and has different features. And two, it is restricted in the data structures it can encode.

This brings me to a plan for an INI reader/writer module that I can use for my projects. It needs the following features, and so far I've not found one on CPAN that satisfies these requirements. (But maybe I can patch an existing one instead of starting from scratch.)

  • Round-trip. It needs to preserve formatting, including comments, blank lines, indentation, spacing around the "=", and so on.

  • Support storing deep arrays and hashes.

  • Support arbitrary section names, property names, & property values (e.g. property name containing "=", section name containing "]" or newlines).
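
To make the round-trip requirement concrete, here is a minimal Python sketch (not the planned module). It changes one property value while leaving every other character of the file untouched. The function name set_property and the regular expression are hypothetical, and the sketch assumes unquoted section and property names only; the quoting rules described below are not handled.

```python
import re

# Hypothetical pattern for an unquoted property line:
# indent, name, the "=" with its surrounding spacing, then the value.
PROP_RE = re.compile(r'^(\s*)([^=;\[\s][^=]*?)(\s*=\s*)(.*)$')

def set_property(text, section, key, new_value):
    """Replace the value of `key` inside [section], keeping comments,
    blank lines, indentation and spacing around "=" exactly as they were."""
    out = []
    current = None  # name of the section we are currently inside, if any
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]          # preserve the original line ending
        stripped = body.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            current = stripped[1:-1]
        elif current == section and not stripped.startswith(";"):
            m = PROP_RE.match(body)
            if m and m.group(2) == key:
                # Reuse the original indent and "=" spacing; swap only the value.
                line = m.group(1) + m.group(2) + m.group(3) + new_value + eol
        out.append(line)
    return "".join(out)

if __name__ == "__main__":
    src = "; main settings\n[server]\n  port   =  80\n\n; keep me\nhost = a\n"
    print(set_property(src, "server", "port", "8080"), end="")
```

The point of the design is that the file is treated as a sequence of original lines, and only the value part of the matching line is rewritten, which is what makes the edit safe for hand-written comments.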

So far here's the specification for the INI format:

  • UTF-8?

  • Comments are only allowed at the beginning of a line (with or without indent) and are not allowed after a property value.

  • Properties outside any section are allowed (since they are common) and will be assumed to belong to a section whose name is configurable.

  • Quoting using double quotes (") is allowed in section name, property name, and property value. All problematic characters should be escaped. Example:

    ; a section with empty name
    [""]
    "property name containing = and \"" = value
    property2 = "value\n\0"
    
  • Whitespace before a property or section name, or before/after the equal sign, is allowed and ignored. To include whitespace, use quoting.

  • Duplicate sections are allowed; their properties will be merged, with later sections taking precedence.

  • Deep hashes/arrays should be represented by a section whose name contains multiple path elements:

    ["hash" "subhash" 0]
    name=value
    
  • Arrays are represented by duplicate property names:

    names=John
    names=Paul
    names=James
    
  • To differentiate between a string and an array with a single element, use a comment:

    A string, "val".

    name=val
    

    An array of one element: ["val"].

    ; array=1
    name=val
    
  • To differentiate between an empty string and an array of zero elements, use a comment:

    An empty string, "".

    name=
    

    An array of zero elements: [].

    ; array=0
    name=
    
  • To specify an empty hash, use a section:

    ["hash"]
    
  • To specify null value and differentiate it from empty string, use a comment:

    ; null
    name=
    

This way, the INI files can contain arbitrary data structures, just like JSON, except that the basic data structure must be hash of hashes (HoH).
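
The rules above can be sketched as a small reader. This is an illustrative Python sketch under several assumptions, not the actual implementation: the name parse_ini and the default section name "DEFAULT" are hypothetical, shlex is used as a stand-in tokenizer for section paths, every path element (including an unquoted 0) is treated as a hash key, and quoting or escapes in property names and values are not handled.

```python
import shlex

def parse_ini(text, default_section="DEFAULT"):
    """Read the INI dialect described above into nested dicts (a sketch)."""
    root = {}
    section = root.setdefault(default_section, {})  # properties outside any section
    block_keys = set()   # keys already set since the last section header
    directive = None     # pending "; array=N" or "; null" comment
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";"):
            note = line[1:].strip().replace(" ", "")
            # Only the three directive comments are meaningful; others are ignored.
            directive = note if note in ("array=0", "array=1", "null") else None
            continue
        if line.startswith("[") and line.endswith("]"):
            # ["hash" "subhash" 0] walks (or creates) nested hashes.
            section = root
            for part in shlex.split(line[1:-1]):
                section = section.setdefault(part, {})
            block_keys = set()
            directive = None
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if directive == "null":
            section[key] = None
        elif directive == "array=0":
            section[key] = []
        elif directive == "array=1":
            section[key] = [value]
        elif key in block_keys:
            # Duplicate property name within one section block: build an array.
            prev = section[key]
            section[key] = prev + [value] if isinstance(prev, list) else [prev, value]
        else:
            # First occurrence in this block; overrides an earlier duplicate section.
            section[key] = value
        block_keys.add(key)
        directive = None
    return root

if __name__ == "__main__":
    sample = '["hash" "subhash" 0]\nname=value\n[people]\nnames=John\nnames=Paul\n'
    print(parse_ini(sample))
```

Whether a property repeated across duplicate sections should override or extend the earlier value is not spelled out above; this sketch chooses override, following the "later sections take precedence" rule.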

UPDATE 2011-09-30 02:22 UTC: Thanks for all the comments and suggestions, but note that what I need is a human-readable/-editable format AND a parser/emitter which preserves comments and formatting. I put a lot of comments in my config files, and they are just as valuable as the configuration itself.

UPDATE 2011-11-04: My implementation is Config::Ini::OnDrugs. It's still very early and incomplete, but you can see the updated specification there.

About Steven Haryanto

A programmer (mostly Perl 5 nowadays). My CPAN ID: SHARYANTO. I'm sedusedan on perlmonks. My twitter is stevenharyanto (but I don't tweet much). Follow me on github: sharyanto.