YAML vs INI (Again) and the plan for yet another INI module

By Steven Haryanto on September 30, 2011 12:20 AM

For the past several years, I'd been set on YAML as the format for configuration files. It's human-readable, pretty, portable, and supports arbitrary data structures. But for future projects, I'm planning to use the INI format. Why?

First of all, YAML is "too complex" for users. There are subtle syntax rules, like the requirement for the list separator character (,) or the mapping character (:) to be followed by a space. Then there are special literals like Yes/No/true/false, ~, date/time, etc., and the various ways to write heredocs. It would take at the very least an hour to explain the syntax to first-timers, and days for them to become familiar with it.

Second, and this is more important for me, there are no round-trip parsers/emitters for YAML. You can't modify data without reformatting the whole file and removing all comments.

INI doesn't have these problems. It's pretty readable, it's familiar to most users including Windows users, and it's easy to write a round-trip parser for it. But INI has a different set of problems. One, there is no standard/specification: each implementation differs in some ways and has different features. And two, it is restricted in the data structures it can encode.

This brings me to a plan for an INI reader/writer module that I can use for my projects. It needs the following features, and so far I've not found one on CPAN that satisfies these requirements. (But maybe I can patch an existing one instead of starting from scratch.)

Round-trip. It needs to preserve formatting, including comments, blank lines, indentation, spacing between the "=", and so on.
Support storing deep arrays and hashes.
Support arbitrary section names, property names, & property values (e.g. property name containing "=", section name containing "]" or newlines).
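
To make the round-trip requirement concrete, here is a minimal sketch, written in Python purely for illustration and not the planned Perl module. It assumes a line-based approach: every line is kept verbatim, and only the value span of the one property being changed is rewritten. The function name `set_value` and the regular expression are hypothetical, and quoted names from the specification below are not handled.

```python
import re

# Groups: indent, key, the "=" with its surrounding spaces, value, trailing spaces.
# The key may not start with ";", "#", "[" or whitespace, so comments never match.
PROP_RE = re.compile(r'^(\s*)([^=;#\[\s][^=]*?)(\s*=\s*)(.*?)(\s*)$')

def set_value(text, section, key, new_value):
    """Return `text` with one property's value replaced and all else untouched."""
    out = []
    current = None  # name of the section currently being scanned
    for line in text.splitlines(keepends=True):
        body = line.rstrip('\r\n')
        eol = line[len(body):]          # preserve the original line ending
        stripped = body.strip()
        if stripped.startswith('[') and stripped.endswith(']'):
            current = stripped[1:-1]
        elif current == section:
            m = PROP_RE.match(body)
            if m and m.group(2) == key:
                indent, name, eq, _old, trailing = m.groups()
                # Reassemble the line from its original pieces plus the new value.
                line = indent + name + eq + new_value + trailing + eol
        out.append(line)
    return ''.join(out)
```

Because comments, blank lines, indentation and the spacing around "=" are never re-emitted from a parsed structure, they survive any edit unchanged.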

So far here's the specification for the INI format:

UTF-8?
Comments are only allowed at the beginning of a line (with or without indent), not after a property value.
Properties outside any section are allowed (since they are common) and will be assumed to belong to a default section (configurable).
Quoting using double quotes (") is allowed in section name, property name, and property value. All problematic characters should be escaped. Example:
```
; a section with empty name
[""]
"property name containing = and \"" = value
property2 = "value\n\0"
```
Whitespace before a property or section name, and before/after the equal sign, is allowed and ignored. To include such whitespace, use quoting.
Duplicate sections are allowed; their properties will be merged, with later sections taking precedence.
Deep hashes/arrays should be represented by a section name containing multiple path elements:
```
["hash" "subhash" 0]
name=value
```
Arrays are represented by duplicate property names:
```
names=John
names=Paul
names=James
```
To differentiate between a string and an array with a single element, use a comment:

A string, "val".
```
name=val
```
An array of one element: ["val"].
```
; array=1
name=val
```
To differentiate between an empty string and an array of zero elements, use a comment:

An empty string, "".
```
name=
```
An array of zero elements: [].
```
; array=0
name=
```
To specify an empty hash, use a section:
```
["hash"]
```
To specify null value and differentiate it from empty string, use a comment:
```
; null
name=
```

This way, INI files can contain arbitrary data structures, just like JSON, except that the basic data structure must be a hash of hashes (HoH).
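
The rules above can be sketched as a small reader. This is a Python illustration under stated assumptions, not the author's implementation: the names `load`, `parse_path` and `token` and the default section name `GLOBAL` are hypothetical, bare integers in a section path are used as plain dictionary keys rather than array indices, and only the `\n`, `\0`, `\"` and `\\` escapes are recognized.

```python
import re

QUOTED = r'"(?:[^"\\]|\\.)*"'
ELEM_RE = re.compile(r'(%s)|([^\s"\]]+)' % QUOTED)      # one section path element
PROP_RE = re.compile(r'^(%s|[^=]*?)\s*=(.*)$' % QUOTED)  # name = value
HINT_RE = re.compile(r'^;\s*(array=[01]|null)\s*$')      # type hint comments
ESCAPES = {'n': '\n', '0': '\0'}

def token(s):
    """Strip optional double quotes and resolve backslash escapes."""
    s = s.strip()
    if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
        return re.sub(r'\\(.)', lambda m: ESCAPES.get(m.group(1), m.group(1)), s[1:-1])
    return s

def parse_path(header):
    """'["hash" "subhash" 0]' -> ['hash', 'subhash', 0]"""
    path = []
    for quoted, bare in ELEM_RE.findall(header[1:-1]):
        path.append(token(quoted) if quoted else (int(bare) if bare.isdigit() else bare))
    return path

def load(text, default_section='GLOBAL'):
    root, section, seen, hint = {}, None, set(), None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(';'):
            # Only "; array=0", "; array=1" and "; null" affect the next property.
            m = HINT_RE.match(line)
            hint = m.group(1) if m else None
            continue
        if line.startswith('['):
            # Walk (and create) nested hashes; an empty section yields an empty hash.
            section, seen, hint = root, set(), None
            for elem in parse_path(line):
                section = section.setdefault(elem, {})
            continue
        m = PROP_RE.match(line)
        if not m:
            continue
        if section is None:
            section = root.setdefault(default_section, {})
        key, value = token(m.group(1)), token(m.group(2))
        if hint == 'null':
            section[key] = None
        elif hint == 'array=0':
            section[key] = []
        elif hint == 'array=1':
            section[key] = [value]
        elif key in seen:
            # Repeated name within the same section block: build an array.
            old = section[key]
            section[key] = old + [value] if isinstance(old, list) else [old, value]
        else:
            # First occurrence in this block: a later duplicate section overrides.
            section[key] = value
        seen.add(key)
        hint = None
    return root
```

Tracking the names seen since the last section header is one way to reconcile two of the rules: a name repeated within one block becomes an array, while the same name in a later duplicate section replaces the earlier value.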

UPDATE 2011-09-30 02:22 UTC: Thanks for all the comments and suggestions, but note that what I need is a human-readable/-editable format AND a parser/emitter which preserves comments and formatting. I put a lot of comments in my config files, and they are just as valuable as the configuration itself.

UPDATE 2011-11-04: My implementation is Config::Ini::OnDrugs. It's still very early and incomplete, but you can see the updated specification there.

28 comments


Anonymous | September 30, 2011 12:59 AM

No. Just no. Use Config::General.

Shawn H Corey | September 30, 2011 3:20 AM

Generally, I store the configuration in a *.pm file. Any maintenance programmer should know enough Perl to change it. If the user has to change it, I create a (G)UI for them.

Storing the configuration in a different format means you have to include a parser and a writer for it; a wasted effort if only programmers are going to see it.

Colin | September 30, 2011 5:33 AM

The project I spend most of my time on, WebGUI, uses JSON for configuration. There's a Config::JSON module for easy reading and manipulation, and most developers nowadays understand javascript.

Ron Savage | September 30, 2011 6:35 AM

Hi Steven

"It would take at the very least an hour to explain the syntax to first timers, and days to familiarize with it.". Yep. That's what killed YAML - A /beginner/ can't just sit down and type it.

I, and I suspect many others, have returned to the INI file style.

"* Support arbitrary section names, property names, & property values (e.g. property name containing "=", section name containing "]" or newlines).". Nope. This is offering pathological flexibility.

"* Arrays are represented by duplicate property names:". Why? Because someone else did it that way, IIRC?

Why not:

name=[one
two
three]

with the '[' and ']' also possible on a separate line.

This more-or-less eliminates the YAML-like, and ridiculous, complexity of your rules about distinguishing between a scalar and an array.

Hence: An empty string:
name=
and an empty array:
name=[]
and a null
name=\0
and an array containing a null:
name=[\0]

Really, it's not that difficult :-)).

What I miss in INI parsers, and you don't mention, is nested sections. So what I use is a global section containing the name of the section to process. E.g.:

[global]

# host:
# o Specifies which section to use after the [global] section ends.
# o Values are one of localhost || webhost.
# o Values are case-sensitive.
#
# Warning:
# o This file is processed by Config::Tiny.
# o See App::Office::Contacts::Util::Config.
# o So, do not put comments at the ends of lines.
# o 'key=value # A comment' sets key to 'value # A comment' :-(.

host=localhost

[localhost]

# Template stuff
# --------------
# This a disk path.

tmpl_path=/dev/shm/html/assets/templates/app/office/contacts

# CSS stuff
# ---------
# This is a URL.

css_url=/assets/css/app/office/contacts

# Javascript stuff
# ----------------
# This is a URL.

yui_url=/assets/js/yui

[webhost]

# TBA.

Cheers
Ron
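
The indirection described in this comment, where a [global] section names the section to use, can be sketched as follows. This uses Python's standard `configparser` as a stand-in for Config::Tiny (it is not the commenter's code), and the function name `active_settings` is hypothetical. With its default settings, `configparser` also keeps a trailing `# ...` as part of the value, much like the behaviour the sample file warns about.

```python
import configparser

SAMPLE = """\
[global]
# Selects which section to use after [global].
host=localhost

[localhost]
tmpl_path=/dev/shm/html/assets/templates/app/office/contacts
css_url=/assets/css/app/office/contacts # not a comment

[webhost]
"""

def active_settings(text):
    """Return the properties of the section named by [global] host."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    host = parser["global"]["host"]   # section names are case-sensitive
    return dict(parser[host])

settings = active_settings(SAMPLE)
print(settings["tmpl_path"])
```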

Steven Haryanto replied to comment from Shawn H Corey | September 30, 2011 9:29 AM

@Shawn: configuration is usually edited by users, not developers, and we cannot expect them to know Perl (except when the users are developers too). Also, sometimes the project is multilanguage.

Steven Haryanto replied to comment from Ron Savage | September 30, 2011 9:35 AM

@Ron: Thanks for the comments.

I will consider using [...] in property value for arrays, but this is incompatible with most other INI parsers.

By nested sections, do you mean merging of values between sections? Or includes (I'd like that)? Or a section containing another section (which I did mention, although perhaps not clearly)?

Steven Haryanto replied to comment from Anonymous | September 30, 2011 10:20 AM

@Anonymous: Config::General doesn't preserve comment/formatting.

Steven Haryanto replied to comment from Colin | September 30, 2011 10:20 AM

Agreed, JSON is pretty universal these days. But Config::JSON doesn't preserve formatting or comments.

mirod | September 30, 2011 2:39 PM

For me the one thing all formats beyond pure Perl code are missing is the ability to define variables:

    $WEB_ROOT= '/var/www';
    $CSS= "$WEB_ROOT/styles";

Whether it's INI, JSON or YAML, I haven't found a way to do this properly. I actually patched an old version of YAML.pm to allow for this, but it was very limited, with a simple regexp that replaced $var by the value associated with the var key:

    ---
    bar: $foo/bar
    foo: bar

Which doesn't look that good anyway as YAML outputs keys in alphanumeric order.

Also for complex data structures you then have to manage scope.

I wonder if there is a module that would do this, or if it would be worth writing one. And if this could be applied to a nicer format than YAML (I am not too familiar with INI)
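
A format-independent way to get this kind of `$var` substitution is to expand references after parsing, on the resulting data structure. The following is a minimal Python sketch under stated assumptions, not the YAML.pm patch described above: the function name `interpolate` is hypothetical, the data is a flat mapping, and nested scopes are not handled. Since it works on the parsed mapping, the order of keys in the file does not matter.

```python
import re

VAR_RE = re.compile(r'\$(\w+)')

def interpolate(config):
    """Expand $name references in the string values of a flat dict."""
    def expand(value, seen):
        def repl(m):
            name = m.group(1)
            if name not in config:
                return m.group(0)      # unknown variable: leave the text as is
            if name in seen:
                raise ValueError("circular reference: " + name)
            return expand(str(config[name]), seen | {name})
        return VAR_RE.sub(repl, value)
    return {k: expand(v, frozenset([k])) if isinstance(v, str) else v
            for k, v in config.items()}

print(interpolate({"bar": "$foo/bar", "foo": "bar"}))
```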

Joel Roth replied to comment from mirod | September 30, 2011 3:40 PM

A project of mine resolves a similar issue. Each top-level key in the config file is a variable name. Instead of declaring variables, as you propose, it has a list of abbreviations that get substituted, even in the abbreviations list. This allows the following config to be properly handled:

abbreviations:
  24-stereo: s24_le,2,frequency,i
  frequency: 44100
devices:
  jack:
    signal_format: f32_le,N,frequency
record_format: 24-stereo

A list of variables is used to determine that 'record_format' is a scalar and 'devices' is a hash and that both are legal keys.

Not finding anything appropriate on CPAN, I ended up writing my own code for assigning a list of variables from the deserialized reference. (Surprising, as I would think this requirement to be common.)
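
One way such abbreviation expansion could work is sketched below in Python. It is not the commenter's code: the function name `expand_abbreviations` is hypothetical, and it assumes that abbreviations are matched against whole comma-separated fields, which the comment does not spell out.

```python
def expand_abbreviations(value, abbreviations, _depth=0):
    """Replace comma-separated fields that name an abbreviation, recursively."""
    if _depth > 20:
        raise ValueError("abbreviations nested too deeply (possible cycle)")
    fields = []
    for field in str(value).split(','):
        if field in abbreviations:
            # Abbreviations may themselves contain abbreviations.
            fields.append(expand_abbreviations(abbreviations[field], abbreviations, _depth + 1))
        else:
            fields.append(field)
    return ','.join(fields)

abbreviations = {"24-stereo": "s24_le,2,frequency,i", "frequency": 44100}
print(expand_abbreviations("24-stereo", abbreviations))
```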

sigzero.myopenid.com | September 30, 2011 10:50 PM

I thought Shlomi stated somewhere that Config::IniFiles preserved comments? Might want to check with him. He might be able to add some bits that you need.

Robert | September 30, 2011 11:34 PM

Do none of the thousand config file packages preserve formatting? I thought that Config::IniFiles had a switch that allowed that.

Ron Savage | October 1, 2011 5:41 AM

Hi Steven

I meant sections entirely nested within sections.

I realize this adds complexity, which I'm uneasy about, but I feel this corresponds to real world usage enough to be appropriate.

The other way to emulate it is by naming sections:
[outer]
...
[outer.inner]
...

But you've still got the problem of how to treat the /next/ section: Is it at the level of [outer] or is it nested?

mrstlee | October 1, 2011 6:58 PM

Data-only perl config isn't actually that hard for non-programmers to understand. It is basically JSON with '=>' instead of ':'. I've used it for years and had no complaints from our customer deployment support crew.

These days I tend to use JSON for files that aren't developer-specific since so many people are familiar with it. Disallow /**/ comments and transliteration from JSON to perl is a doddle - slurp in your json and translate : to =>, // to #, true to 1 etc. 'eval' your string (Ignore the PBP police on this one) and life is good.

True - utf8 presents more of a challenge. Personally I've not had to deal with utf8 config files.

Aristotle | October 1, 2011 7:06 PM

The only worthwhile design constraint I see here is round-tripping. But even the very desire to write the configuration file is somewhat telling. It is possible that you merely want to write GUI to assist in configuration, of course, or to remember the state of a few checkboxes – and that is fine. But it could also mean you are trying to use the configuration file as a persistent store for some internal state data structures, in which case of course you want a capable format so that your code won’t have to perform mapping – it’s much better for you if the user has to!

The right thing to do for configuration is use vanilla INI and live with its limitations. Sometimes that means spending some time coming up with a way to express a configuration using just the means of INI, but that is time well spent. It will make your configuration easier to use.

To persist internal state too complex to fit a configuration format, you should really be serialising it in some other format, not writing it into your configuration.

Configuration is not state serialisation.

mrstlee | October 1, 2011 10:05 PM

Why be so prescriptive about "the" right thing to do? Would trying to configure something as sophisticated as apache, e.g. multiple virtual hosts & mod_rewrite rules, with ini-style notation, really be time well spent?

I'd say it would be pretty hard to nail down exactly what configuration is. You could argue the perl interpreter is infinitely configurable - its run-time behaviour can be completely determined by configuration files, usually referred to as 'scripts'.

If you just want a programmatic means to update values in a config file you may not need anything more sophisticated than perl one-liners + (bash|your shell of choice). It's all just text after all.

Aristotle | October 2, 2011 12:57 AM

I am prescriptive because I’ve seen what works and what doesn’t.

mod_rewrite is an awful programming language masquerading as configuration. Have you seen its documentation? You get to branch and loop by using hard to write syntax. There are also variables, though you can only assign and use them in very limited ways. If you want the user to write code, which for this use case he has to do for any need beyond the trivial, then use a programming language. Don’t try to gin up a hackneyed mini-language within the confines of a configuration file syntax, all the while ignoring all lessons of language design. The result will be painful for you and painful for your users.

When I wrote Plack::Middleware::Rewrite I was pleasantly surprised at how easy it was to offer all the more advanced features of mod_rewrite with almost no effort (except the proxying features), and yet still the “configuration” is more readable and far more easily writeable, esp. in expressing more complicated intents.

"You could argue the perl interpreter is infinitely configurable – its run-time behaviour can be completely determined by configuration files, usually referred to as “scripts”."

Answered like an architecture astronaut. Yes, you can nitpick all distinctions out of existence. Control flow is nothing but GOTOs and IFs too. Yet I doubt that you think in terms of IF and GOTO when you structure your programs. Why not? All you achieve in so nitpicking is lose all sense of how to think of things.

Configuration is not program state. Sure, there’s a thin strip of grey area in the middle, but don’t let that little grey engulf the large expanses of black and white to either side.

Ed W | October 2, 2011 2:34 AM

I *think* you have described the standard perl JSON module:
http://search.cpan.org/~makamaka/JSON-2.53/lib/JSON.pm

...give or take the choice of formatting (ie json style vs ini style)? It has claimed round trip of the file?

Also JSON is reasonably easy for beginners to write and is approximately (give or take an argument) a subset of yaml.

Apache config style is also nice - I don't know if there is a round trip config library for that?

Why not invest some time looking at adding round trip of comments to an appropriate library that is already close rather than starting again?

Steven Haryanto replied to comment from mirod | October 4, 2011 1:36 AM

@mirod: True. As an alternative to using Perl code, one can also use preprocessing with templating languages, e.g. TT, which can do variables and simple calculation too.

I do plan to add calculation capability to my to-be-written INI module, not using Perl's eval(), but using something like Language::Expr. Example syntax:

[section1]
a = 2
b := $CONFIG["section1"]["a"] + 1 
c := sqrt($CONFIG["section1"]["a"]**2 + $CONFIG["section1"]["b"]**2)

[section2]

a = "A is " + $CONFIG["section1"]["a"]
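
How `:=` properties might be evaluated without a general `eval()` is sketched below in Python. This is only an illustration of the idea of a restricted expression evaluator: it does not reproduce Language::Expr's grammar, it borrows Python's own expression syntax as a stand-in, and the names `evaluate`, `scalar` and `load` are hypothetical. Properties are evaluated top to bottom, so an expression can only refer to values defined earlier.

```python
import ast
import math
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
       ast.Div: operator.truediv, ast.Pow: operator.pow}
FUNCS = {"sqrt": math.sqrt}

def evaluate(expr, config):
    """Evaluate a small arithmetic expression; only whitelisted nodes are allowed."""
    tree = ast.parse(expr.replace("$CONFIG", "CONFIG"), mode="eval")
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name) and node.id == "CONFIG":
            return config
        if isinstance(node, ast.Subscript):      # node.slice is an expression on Python 3.9+
            return ev(node.value)[ev(node.slice)]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCS and not node.keywords):
            return FUNCS[node.func.id](*[ev(a) for a in node.args])
        raise ValueError("unsupported expression element: " + type(node).__name__)
    return ev(tree.body)

def scalar(text):
    """Turn '2' into 2 and '2.5' into 2.5; leave other text as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text

def load(text):
    config, section = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue
        if line.startswith("["):
            section = config.setdefault(line[1:-1], {})
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.endswith(":"):                    # "name := expression"
            section[key[:-1].strip()] = evaluate(value.strip(), config)
        else:                                    # "name = literal"
            section[key] = scalar(value.strip())
    return config
```

Anything outside the whitelist (attribute access, other function names, comprehensions) raises an error instead of being executed, which is the point of avoiding a plain `eval()`.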

Steven Haryanto replied to comment from Robert | October 4, 2011 1:38 AM

@Robert: Yes, Config::IniFiles preserves comments, that's one thing I like about it. But it currently does not preserve formatting, e.g. changing nicely formatted:

one          = foo
two          = bar
thirty seven = baz

to:

one=foo
two=bar
thirty seven=baz

I know it's usually not a big deal, but this is not round-trip.

Steven Haryanto replied to comment from Aristotle | October 4, 2011 1:44 AM

@Aristotle: True, it's more that my configuration files contain "the state of some checkboxes (or input fields)" instead of "program state".

But they also contain some notes about: why some configuration is set to some value, dates and times, links, reminders, etc. The ordering and formatting of values can also be significant sometimes. These are valuable.

GUI tools do not deal nicely with these.

Steven Haryanto replied to comment from Ron Savage | October 4, 2011 1:51 AM

@Ron: I did mention nested section in my post (albeit not very clearly). I think I'm going with this syntax:

[outer][inner]

as it's easier to parse and to look at. It also doesn't have the "in-band character" problem (i.e., if you use "[outer.inner]", what if the section name contains a dot? That would require escaping, on top of the normal escaping by quotes).

I don't get the "/next/ section" part.

Steven Haryanto replied to comment from Ed W | October 4, 2011 1:55 AM

@Ed W: The "round-trip" mentioned in the JSON module documentation is about preserving the integrity of values. It does not preserve comments or formatting.

I think with non line-based formats like JSON/YAML/XML, adding a round-trip parser is not exactly trivial (and apparently not popular, since I've not seen a single implementation that does this).

Steven Haryanto replied to comment from Colin | October 4, 2011 1:58 AM

@Colin: So configuration is only meant to be edited via GUI/browser, and not directly in configuration files? Hm, I would really miss the ability to put comments.

Steven Haryanto replied to comment from sigzero.myopenid.com | October 4, 2011 2:00 AM

@sigzero: See above. Yes, Config::IniFiles preserves comments, but currently not formatting. I've also sent a couple of patches for C::IF before, but I'm not sure if I want to continue patching C::IF or write a new module, since the list of new features I want to put in is a bit long, and I don't want many of the current C::IF's features.

Steven Haryanto replied to comment from Joel Roth | October 4, 2011 2:36 AM

@Joel Roth: Looks ok, but this doesn't handle calculations.

Joel Roth replied to comment from Steven Haryanto | October 4, 2011 10:20 AM

@Steven Haryanto: I'll be interested to see how you resolve the tradeoff between capability and simplicity for the end user. For me, the laziness factor also figures in: I'd rather let my users write Perl code than come up with a whole new mini-language.

Steven Haryanto | November 4, 2011 11:11 AM

@Joel: Depends on who the users are :)


About Steven Haryanto

A programmer (mostly Perl 5 nowadays). My CPAN ID: SHARYANTO. I'm sedusedan on perlmonks. My twitter is stevenharyanto (but I don't tweet much). Follow me on github: sharyanto.
