YAML vs INI (Again) and the plan for yet another INI module
For the past several years, I'd been set on YAML as the format for configuration files. It's human-readable, pretty, portable, and supports arbitrary data structures. But for future projects, I'm planning to use the INI format. Why?
First of all, YAML is "too complex" for users. There are subtle syntax rules, like the requirement that the list separator character (,) or the mapping character (:) be followed by a space. And then there are object literals like Yes/No/true/false, ~, date/time, etc. And the various ways to do heredocs. It would take at the very least an hour to explain the syntax to first-timers, and days for them to become familiar with it.
Second, and this is more important for me, there are no round-trip parsers/emitters for YAML. You can't modify data without reformatting the whole file and removing all comments.
INI doesn't have these problems. It's pretty readable, it's familiar to most users including Windows users, and it's easy to write a round-trip parser for it. But INI has a different set of problems. One, there is no standard/specification. Each implementation differs in some ways and has different features. And two, it is restricted in the data structures it can encode.
This brings me to a plan for an INI reader/writer module that I can use for my projects. It needs the following features, and so far I've not found one on CPAN that satisfies these requirements. (But maybe I can patch an existing one instead of starting from scratch.)
Round-trip. It needs to preserve formatting, including comments, blank lines, indentation, spacing around the "=", and so on.
Support storing deep arrays and hashes.
Support arbitrary section names, property names, and property values (e.g. a property name containing "=", a section name containing "]" or newlines).
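To make the round-trip requirement concrete, here is a minimal sketch in Python (used only for illustration; the module planned here targets CPAN). The helper name set_value and the regular expression are hypothetical, not taken from any existing module. It assumes unquoted names and values and ignores sections entirely: it just rewrites the value of the first matching property and leaves every other character alone.

```python
import re

# Hypothetical pattern: indent, property name, the spacing around "=", value.
# Lines starting with ";" (comments) or "[" (sections) never match.
PROP_RE = re.compile(r'^(\s*)([^=;\[\s][^=]*?)(\s*=\s*)(.*)$')

def set_value(text, name, new_value):
    """Return text with the first property called `name` set to new_value.

    Comments, blank lines, indentation and spacing are preserved as-is.
    """
    lines = text.split("\n")
    for i, line in enumerate(lines):
        m = PROP_RE.match(line)
        if m and m.group(2) == name:
            indent, key, eq, _old = m.groups()
            lines[i] = indent + key + eq + new_value
            break
    return "\n".join(lines)

original = (
    "; database settings\n"
    "\n"
    "[db]\n"
    "  host   =  localhost\n"
    "; port is optional\n"
    "port=5432\n"
)
updated = set_value(original, "host", "example.org")
print(updated)
```

Only the text after the "=" on the host line changes; the comments, the blank line, the two-space indent, and the odd spacing around "=" all survive.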
So far here's the specification for the INI format:
Comments are allowed only at the beginning of a line (with or without indentation), not after a property value.
Properties outside any section are allowed (since they are common) and are assumed to belong to a section (configurable).
Quoting using double quotes (") is allowed in section name, property name, and property value. All problematic characters should be escaped. Example:
    ; a section with empty name
    [""]
    "property name containing = and \"" = value
    property2 = "value\n\0"
Whitespace before a property or section name, or before/after the equal sign, is allowed and ignored. To include such whitespace in a name or value, use quoting.
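The quoting and whitespace rules above could be handled roughly as follows. This is a sketch in Python for illustration only, and the function names read_quoted and parse_property are hypothetical. The escape table is an assumption: the examples in this post only show \", \n and \0, so \\ and \t are guesses about what a full specification would include.

```python
# Assumed escape table; the post itself only shows \", \n and \0.
ESCAPES = {'"': '"', '\\': '\\', 'n': '\n', 't': '\t', '0': '\0'}

def read_quoted(s, pos):
    """Read a double-quoted token whose opening quote is at s[pos].

    Returns (decoded_text, index_just_after_the_closing_quote).
    """
    if s[pos] != '"':
        raise ValueError("expected opening quote")
    out = []
    i = pos + 1
    while i < len(s):
        ch = s[i]
        if ch == '"':
            return "".join(out), i + 1
        if ch == '\\':
            i += 1
            if i >= len(s) or s[i] not in ESCAPES:
                raise ValueError("bad escape at column %d" % i)
            out.append(ESCAPES[s[i]])
        else:
            out.append(ch)
        i += 1
    raise ValueError("unterminated quoted string")

def parse_property(line):
    """Split one 'name = value' line into (name, value).

    Unquoted names and values are stripped of surrounding whitespace;
    quoted ones keep exactly what is inside the quotes.
    """
    s = line.strip()
    if s.startswith('"'):
        name, i = read_quoted(s, 0)
        rest = s[i:].lstrip()
        if not rest.startswith("="):
            raise ValueError("expected '=' after property name")
        raw_value = rest[1:].strip()
    else:
        name, sep, raw_value = s.partition("=")
        if not sep:
            raise ValueError("expected '='")
        name, raw_value = name.strip(), raw_value.strip()
    if raw_value.startswith('"'):
        value, end = read_quoted(raw_value, 0)
        if raw_value[end:].strip():
            raise ValueError("trailing text after quoted value")
        return name, value
    return name, raw_value

print(parse_property(r'"property name containing = and \"" = value'))
print(parse_property('  key   =   "  padded  "'))
```

Reading the quoted name first, before looking for "=", is what lets a property name contain an equal sign.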
Duplicate sections are allowed; their properties are merged, with later sections taking precedence.
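The merge rule for duplicate sections might look like this, as a small Python sketch for illustration. It assumes the file has already been parsed into (section name, properties) pairs in file order; merge_sections and the sample section names are hypothetical. It also ignores the separate rule that repeated property names form arrays.

```python
def merge_sections(parsed):
    """Merge (section_name, {property: value}) pairs given in file order.

    When a section appears more than once, a later occurrence overrides
    properties set by an earlier one; other properties are kept.
    """
    merged = {}
    for section, props in parsed:
        merged.setdefault(section, {}).update(props)
    return merged

# Hypothetical parse result: the "server" section appears twice.
parsed = [
    ("server", {"host": "localhost", "port": "80"}),
    ("client", {"retries": "3"}),
    ("server", {"port": "8080", "tls": "yes"}),
]
config = merge_sections(parsed)
print(config)
```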
Deep hashes/arrays should be represented by a section whose name contains multiple path components:
    ["hash" "subhash" 0]
    name=value
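One way to read such a section header is as a path into a nested structure. The Python sketch below (illustration only) assumes that quoted components are hash keys and that a bare integer such as 0 is an array index; that interpretation, the function name set_path, and the padding of arrays with None are my assumptions, not part of the specification so far.

```python
def set_path(root, path, value):
    """Store value at the nested location named by path, creating containers.

    String components address hash keys and integer components address
    array indexes. The first component is assumed to be a string, since
    the top level is a hash.
    """
    node = root
    for depth, key in enumerate(path):
        is_last = depth == len(path) - 1
        if is_last:
            child = value
        else:
            # The next component decides which container to create.
            child = [] if isinstance(path[depth + 1], int) else {}
        if isinstance(key, int):
            while len(node) <= key:
                node.append(None)  # pad the array up to the index
            if is_last or node[key] is None:
                node[key] = child
        else:
            if is_last or key not in node:
                node[key] = child
        node = node[key]
    return root

config = {}
set_path(config, ["hash", "subhash", 0], {"name": "value"})
print(config)
```

Under these assumptions the example section above ends up as config["hash"]["subhash"][0]["name"] == "value".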
Arrays are represented by duplicate property names:
    names=John
    names=Paul
    names=James
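Folding repeated property names into an array can be sketched like this in Python (illustration only). The function name collect_properties and the extra band property are hypothetical; it assumes the parser only ever produces string values, so a list can only come from repetition.

```python
def collect_properties(pairs):
    """Fold (name, value) pairs into a dict; repeated names become lists."""
    result = {}
    for name, value in pairs:
        if name not in result:
            result[name] = value          # first occurrence: plain string
        elif isinstance(result[name], list):
            result[name].append(value)    # third and later occurrences
        else:
            result[name] = [result[name], value]  # second occurrence
    return result

pairs = [
    ("names", "John"),
    ("names", "Paul"),
    ("names", "James"),
    ("band", "unknown"),  # hypothetical extra property, appears once
]
props = collect_properties(pairs)
print(props)
```

A name that appears only once stays a plain string here, which is exactly why the one-element and zero-element array cases need some extra marker.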
To differentiate between a string and an array with a single element, use a comment. Without the comment, name=val is a string, "val". With it, the value is an array of one element, ["val"]:
    ; array=1
    name=val
To differentiate between an empty string and an array of zero elements, use a comment. Without the comment, name= is an empty string, "". With it, the value is an array of zero elements, []:
    ; array=0
    name=
To specify an empty hash, use a section:
To specify null value and differentiate it from empty string, use a comment:
    ; null
    name=
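The comment directives (array=1, array=0, null) could be applied during parsing along these lines. This Python sketch is for illustration only and makes several assumptions that the specification does not settle: a directive applies only to the property on the very next line, a blank line or an ordinary comment cancels it, and names and values are unquoted. The function name parse_with_directives is hypothetical.

```python
DIRECTIVES = ("array=0", "array=1", "null")

def parse_with_directives(lines):
    """Parse property lines, honoring a directive comment on the line before.

    Returns a dict mapping property names to a string, a list, or None.
    """
    result = {}
    pending = None
    for line in lines:
        s = line.strip()
        if not s:
            pending = None  # assumption: a blank line cancels the directive
            continue
        if s.startswith(";"):
            text = s[1:].strip()
            # Ordinary comments are not directives and cancel any pending one.
            pending = text if text in DIRECTIVES else None
            continue
        name, _, value = s.partition("=")
        name, value = name.strip(), value.strip()
        if pending == "array=1":
            result[name] = [value]
        elif pending == "array=0":
            result[name] = []
        elif pending == "null":
            result[name] = None
        else:
            result[name] = value
        pending = None
    return result

lines = [
    "name=val",
    "; array=1",
    "one=val",
    "; array=0",
    "none=",
    "; null",
    "missing=",
    "empty=",
    "; just a note",
    "plain=text",
]
parsed = parse_with_directives(lines)
print(parsed)
```

A round-trip writer would have to emit the same directive comments again when saving, otherwise a one-element array would silently turn back into a string.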
This way, INI files can contain arbitrary data structures, just like JSON, except that the basic data structure must be a hash of hashes (HoH).
UPDATE 2011-09-30 02:22 UTC: Thanks for all the comments and suggestions, but note that what I need is a human-readable/-editable format AND a parser/emitter which preserves comments and formatting. I put a lot of comments in my config files, and they are as valuable as the configuration itself.
UPDATE 2011-11-04: My implementation is at Config::Ini::OnDrugs. It's still very early and incomplete, but you can see the updated specification there.