January 2010 Archives

Cantella::Store::UUID - 0.002000

I've released a new version of Cantella::Store::UUID. (git) It features one incompatible change, but I figured it was OK since the module is reasonably new and I'm pretty sure nobody is using it yet. I just couldn't live with having one lone method named so awkwardly. The POD has been updated and improved, and two new methods were added that let the user search for files in the storage directory and perform some action on them. I don't really intend for these methods to be used a lot, since they will take minutes to run on most systems and are incredibly IO-intensive (a depth-first search of the entire storage tree, which could be hundreds of thousands of nodes). I use them mainly for maintenance operations, and you probably should too.

These methods existed in the internal version this library was based on, but due to a bug in Path::Class::Dir, I was unable to add these features until Path::Class 0.18 was released with the fix. As of now, this module can be considered stable. I promise to add no code without extensive tests and change no APIs unless it's really, really necessary. Even then, there will be a long deprecation cycle and a back-compat layer.

If you haven't already seen it, I recommend you take a look at Cantella::Store::UUID. I use it to store many, many files in the file system without concentrating too many files in any one directory. Accessing files is quick because their location is deterministic (based on their UUID), and you can split the hierarchy across different physical devices to improve performance or add capacity. It's not very high-tech, but it works surprisingly well if you have huge numbers of files to store. Another nice thing is that you can mirror it with rsync or create incremental backups using rdiff-backup.
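To make the "deterministic location" idea concrete, here is a minimal sketch in Python rather than Perl. It is not the module's actual layout or API: the function name `uuid_to_path`, the `levels` parameter, the `/var/store` root and the choice of one hex digit per directory level are all assumptions made for illustration.

```python
import uuid
from pathlib import PurePosixPath


def uuid_to_path(root: PurePosixPath, file_id: uuid.UUID, levels: int = 3) -> PurePosixPath:
    """Map a UUID to a nested path under root.

    Assumed layout (hypothetical): each of the first `levels` hex digits
    names one directory level, and the full UUID string names the file.
    """
    hex_digits = file_id.hex  # 32 hex characters, no dashes
    return root.joinpath(*hex_digits[:levels], str(file_id))


root = PurePosixPath("/var/store")  # hypothetical storage root
file_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(uuid_to_path(root, file_id))
```

Because the path is computed from the UUID alone, finding a file never requires listing a directory, and each top-level directory could in principle be a mount point for a separate device.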


Two ago I began writing what was to be the first Cantella::Data::Tabular renderer class. The idea was to render a Cantella::Data::Tabular::Table object into a plain text table. I failed miserably. Within 5 seconds I got wrapped up in issues of formatting and how to render data and data-types. Eventually I resorted to #moose for ideas, and rafl pointed me towards some code of his. We agreed that it would be mutually beneficial for me to use his code, as long as I separated it from its original package and released it on its own so it could stand alone.

The idea behind MooseX::TypeMap is simple: it holds a series of "Entry" objects which have type_constraint and data attributes. You can then give MooseX::TypeMap an instance of a Moose::Meta::TypeConstraint and it will give you back the data associated with that type-constraint or, optionally, its closest super-type that TypeMap knows. That's it. It's simple and it's short and it's hardly very exciting, but it will make my life easier.
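The lookup described above can be sketched roughly as follows. This is a Python analogy, not the Perl module's API: Python classes stand in for Moose type constraints, the method resolution order stands in for the super-type chain, and the names `TypeMap`, `add`, `find` and `resolve` are made up for this sketch.

```python
from typing import Any, Dict, Optional


class TypeMap:
    """Maps types to arbitrary data, with optional fallback to super-types."""

    def __init__(self) -> None:
        self._entries: Dict[type, Any] = {}

    def add(self, type_constraint: type, data: Any) -> None:
        """Register data for exactly this type."""
        self._entries[type_constraint] = data

    def find(self, type_constraint: type) -> Optional[Any]:
        """Return data only if this exact type has an entry."""
        return self._entries.get(type_constraint)

    def resolve(self, type_constraint: type) -> Optional[Any]:
        """Return data for the type, or for its closest known super-type."""
        for candidate in type_constraint.__mro__:  # the type itself comes first
            if candidate in self._entries:
                return self._entries[candidate]
        return None


# Example: printf-style formats keyed by type.
formats = TypeMap()
formats.add(object, "%s")
formats.add(int, "%d")
formats.add(float, "%.2f")

print(formats.find(bool))     # no exact entry for bool
print(formats.resolve(bool))  # falls back to int, bool's closest known parent
```

A renderer could ship a default map like `formats` and let users add or replace entries to change how particular cell types are displayed.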

Take, for example, a Cantella::Data::Tabular renderer, which needs to render different cells based on their associated type-constraint. Different people may want different rendering rules, and a good way to implement the conversion would be to use MooseX::TypeMap to automatically fetch a closure or printf format based on the cell's type. A sane default could be shipped with the formatter, and the user could either modify it or replace it to match their needs in only a couple of lines of code.

Tabular Data, spreadsheets and cell references

Since Cantella::JobQueue has been put on hold after the discovery of Beanstalkd, I've moved on to other projects. The first on my list was to add some new views to a Reaction-based application. Specifically, I needed a couple of ways to export tabular data. I needed to display it as inline XHTML tables, export it as PDF sheets and allow for spreadsheet downloads. Because none of the current offerings gave me the flexibility I wanted, I decided to write my own.

Cantella::Data::Tabular is a library for working with (surprise, surprise) tabular data. So far it contains three main objects: cells, rows and tables. A table has zero or more rows, a row has zero or more cells, and a cell has zero or one value. Additionally, cells will be able to store certain meta-data to facilitate round-trip conversions from certain formats like ".xls". One of the decisions I made from the start was to disallow cells with 'undef' as a value. Cells can either have a value, which must be defined, or have no value. Additionally, I decided that I wanted different rows to be able to have a different number of columns. Lastly, I decided that the whole library should be completely unaware of what cell values mean. That means no references, no functions and no ranges. Values are simply values, and it should be up to a higher-level library to decipher exactly what those values mean or don't mean. I did, however, build in cell-level type-constraints.
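As a rough illustration of that data model, here is a small Python sketch. It is not the library's code: the class and attribute names are hypothetical, Python's `None` stands in for Perl's 'undef', and cell meta-data and type-constraints are left out.

```python
from dataclasses import dataclass, field
from typing import Any, List

_NO_VALUE = object()  # sentinel meaning "this cell has no value at all"


class Cell:
    """Holds zero or one value; an undefined (None) value is rejected."""

    def __init__(self, value: Any = _NO_VALUE) -> None:
        if value is None:
            raise ValueError("a cell value must be defined; omit it instead")
        self._value = value

    @property
    def has_value(self) -> bool:
        return self._value is not _NO_VALUE

    @property
    def value(self) -> Any:
        if not self.has_value:
            raise AttributeError("cell has no value")
        return self._value


@dataclass
class Row:
    cells: List[Cell] = field(default_factory=list)  # zero or more cells


@dataclass
class Table:
    rows: List[Row] = field(default_factory=list)  # zero or more rows


# Rows may have different numbers of cells, and values are opaque.
table = Table(rows=[
    Row(cells=[Cell("name"), Cell("qty")]),
    Row(cells=[Cell("widget"), Cell(3), Cell()]),  # third cell has no value
    Row(),                                         # an empty row is allowed
])
print([len(row.cells) for row in table.rows])
```

Keeping "no value" separate from "undefined value" means an exporter never has to guess whether an empty cell was intentional.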

The library may seem limited, and it may never grow beyond the most basic needs, but that's the way I want it. Exporters and importers that write or read Excel / CSV / HTML / PDF files will be made available separately as they become necessary. What I don't want to end up with is a spreadsheet library. I wouldn't mind building another library on top of C::D::T that is able to perform basic spreadsheet functions, but it's a slippery slope to go down.

As of right now, the library has basic tests and is almost fully documented. It hasn't been released to CPAN yet, but it will be as soon as it does something useful. I just wanted to let people know this thing exists and invite any comments, suggestions, feature requests or criticisms you might have.

About Guillermo Roditi

I blog about Perl.