Tie::Array::CSV is now more efficient on row ops
Not nearly so exciting as my announcement of Zoidberg, but today I announce the release of Tie::Array::CSV version 0.04.
Tie::Array::CSV leverages Tie::File and Text::CSV to allow access to a CSV file as a native Perl 2D array (i.e. array of array references), without having to read the (entire) file into memory.
The major improvement in 0.04 was inspired by a conversation with David Mertens at the WindyCity.pm informal meeting a couple of weeks back. As I was explaining T::A::CSV to the group over a couple of beers, David asked if the file is updated at every change. I said proudly that it was; however, he noted that this has some drawbacks.
His first point was that it is expensive to write out the whole row on every change when the next operation might be on the same row. To address this, same-row operations now, by default, do not write out on each change. More specifically, as long as the reference to the row stays in scope, the related line in the file is not updated; it is written once the reference goes away. This allows for more efficient operations like map on the rows.
Additionally, this feature is optional, controlled by the hold_row option to the constructors, which is true by default.
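To make the idea of holding a row concrete, here is a minimal sketch of the general technique, not the module's actual code. The `HeldRow` class, its `set` method, and the `%store` hash are all hypothetical names invented for this illustration, and the naive split on commas stands in for real CSV parsing. The row object buffers edits in memory and writes back only once, from `DESTROY`, when the last reference to it goes out of scope.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a held row (not part of Tie::Array::CSV):
# it buffers edits in memory and writes the row back to its store only
# once, when the last reference to it goes out of scope.
package HeldRow;

sub new {
    my ($class, $store, $index) = @_;
    my $self = {
        store  => $store,    # hash ref standing in for the file
        index  => $index,    # which line this row came from
        # Naive comma split; real CSV parsing needs a proper parser.
        fields => [ split /,/, $store->{lines}[$index], -1 ],
    };
    return bless $self, $class;
}

# Change one field in memory; nothing is written back yet.
sub set {
    my ($self, $col, $value) = @_;
    $self->{fields}[$col] = $value;
    return;
}

# Called by Perl when the last reference disappears: write the row once.
sub DESTROY {
    my $self = shift;
    $self->{store}{lines}[ $self->{index} ] = join ',', @{ $self->{fields} };
    $self->{store}{writes}++;
}

package main;

my %store = ( lines => [ 'a,b,c', 'd,e,f' ], writes => 0 );

{
    my $row = HeldRow->new( \%store, 0 );
    # Three edits to the same row, all buffered.
    $row->set( $_, uc $row->{fields}[$_] ) for 0 .. 2;
}   # $row goes out of scope here, which triggers the single write

print "$store{lines}[0] after $store{writes} write(s)\n";
```

With write-on-every-change, the same three edits would cost three full-row writes; here they cost one.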
His second point, which I cannot do anything about, is related to Tie::File. He asked whether changes near the top of the file are more costly than changes near the bottom. Honestly, I didn’t know, so I ask you, the reader: if you know, please comment. Either way, I don’t see changing away from Tie::File for line access, but it is something worth understanding.
One other change is that the tie and new constructors now have the same option passing mechanism. In 0.03 the new constructor allowed a more flexible system for passing options, but the tie constructor did not. This mechanism has been ported back to the tie constructor as of the new release.
I hope Tie::Array::CSV can help save people some effort when using CSV files. If you think it might help you, please check it out, and as always, let me know your thoughts.
P.S. development is hosted on my GitHub.
Hi Joel,
To answer the second point: yes, David’s hunch is correct, and Tie::File will slow down the larger the file is. It’s not so bad really; you need a large file, or to be making frequent updates, to notice.
This is because Tie::File needs to read the file and look for the separators (i.e. newlines) between the variable-length records, until it finds the correct record number.
I wrote Tie::File::FixedRecLen to work around this for a project of mine. By committing to fixed length records it can do a little math and quickly seek() to the right point in the file. Probably not applicable to Tie::Array::CSV but still maybe of interest :)
regards, oliver.
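The difference Oliver describes can be sketched in a few lines of core Perl. This is only an illustration of the two lookup strategies, not the implementation of Tie::File or Tie::File::FixedRecLen; the function names `fetch_by_scan` and `fetch_by_offset`, the 8-byte record length, and the space padding are assumptions made for the example. An in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Variable-length records: read line by line, counting newlines,
# until record $n is reached. Cost grows with $n.
sub fetch_by_scan {
    my ($fh, $n) = @_;
    seek $fh, 0, 0 or die "seek failed: $!";
    my $line;
    for ( 0 .. $n ) {
        $line = <$fh>;
        return unless defined $line;    # ran off the end of the file
    }
    chomp $line;
    $line =~ s/ +\z//;                  # drop the space padding
    return $line;
}

# Fixed-length records: compute the byte offset and jump straight to it.
# Cost does not depend on $n.
sub fetch_by_offset {
    my ($fh, $n, $reclen) = @_;
    seek $fh, $n * $reclen, 0 or die "seek failed: $!";
    my $got = read $fh, my $buf, $reclen;
    return unless $got && $got == $reclen;
    chomp $buf;
    $buf =~ s/ +\z//;                   # drop the space padding
    return $buf;
}

# Four records, each padded to 7 characters plus a newline (8 bytes).
my $reclen = 8;
my $data   = join '', map { sprintf "%-7s\n", $_ } qw(alpha beta gamma delta);
open my $fh, '<', \$data or die "open failed: $!";

my $scanned = fetch_by_scan( $fh, 2 );
my $jumped  = fetch_by_offset( $fh, 2, $reclen );
print "scan: $scanned, offset: $jumped\n";
```

Both calls return the same record; the scan has to read every earlier record first, while the offset version does one multiplication and one seek.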