Test Driven Development of Excel::Writer::XLSX Part III
In the first and second parts of this series we looked at how I used Test Driven Development in Perl to help port Spreadsheet::WriteExcel to Excel::Writer::XLSX.
The first two posts dealt with tests at the unit and class level. At the package/module level I was able to test the output of Excel::Writer::XLSX against files generated by Excel.
This is something that I had always wanted to do with Spreadsheet::WriteExcel. However, that module never targeted 100% fidelity with Excel because it would have required too much effort. Instead, it targeted the minimum possible number of binary records needed to implement a feature. As such, Spreadsheet::WriteExcel started off with less than 1% of the Excel binary records and even today it still implements less than 50%. The fact that Excel will accept such minimal implementations is a credit to the designers of the format, since it allowed me to release small yet functional representations of the file format and add to them incrementally. If I had been required to provide 50% coverage in the first release, the module would never have happened.
The advent of the Excel 2007 XML file format meant that it was now possible to target 100% fidelity. At least for the features that I wished to implement.
As we saw in the first post, an Excel XLSX file at its simplest contains the following structure and XML files:
____ [Content_Types].xml
|
|____ docProps
| |____ app.xml
| |____ core.xml
|
|____ xl
| |____ workbook.xml
| |____ worksheets
| | |____ sheet1.xml
| |
| |____ styles.xml
| |
| |____ theme
| | |____ theme1.xml
| |
| |_____rels
| |____ workbook.xml.rels
|
|_____rels
|____ .rels
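Since an XLSX file is just a zip container, you can see this structure for yourself by listing the members of any workbook, either with a standard zip tool or with a few lines of Archive::Zip. The following is only an illustrative sketch and the filename is a placeholder:

#!/usr/bin/perl

# List the members of an XLSX file to show the structure above.
# The filename 'hello.xlsx' is a placeholder; 'unzip -l hello.xlsx'
# would show the same information.

use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );

my $zip = Archive::Zip->new();

$zip->read( 'hello.xlsx' ) == AZ_OK
    or die "Couldn't read hello.xlsx as a zip file\n";

print "$_\n" for sort $zip->memberNames();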
To test at this level I use test cases that take an Excel::Writer::XLSX output file and a file generated by Excel, unzip both using Archive::Zip, test that they each contain the same files, and then test each XML element of each file. A simplified test case might look like the following:
use lib 't/lib';
use TestFunctions qw(_compare_xlsx_files _is_deep_diff);
use strict;
use warnings;
use Test::More tests => 1;
# Tests setup.
...
#######################################################################
#
# Test the creation of a simple Excel::Writer::XLSX file.
#
use Excel::Writer::XLSX;
my $workbook = Excel::Writer::XLSX->new( $got_filename );
my $worksheet = $workbook->add_worksheet();
$worksheet->write( 'A1', 'Hello' );
$worksheet->write( 'A2', 123 );
$workbook->close();
#######################################################################
#
# Compare the generated and existing Excel files.
#
my ( $got, $expected, $caption ) = _compare_xlsx_files(
    $got_filename,
    $exp_filename,
    $ignore_members,
    $ignore_elements,
);
_is_deep_diff( $got, $expected, $caption );
What is interesting about this is that the central part of the test is an actual Excel::Writer::XLSX program. This code is the same as the code used in example programs or in the documentation.
The _compare_xlsx_files() function tests the two aspects of the XLSX file described above: the files (members in Archive::Zip parlance) and the XML elements of each file. A failing test case for files might look like this:
t/regression/simple01.t .. 1/1
# Failed test ' _compare_xlsx_files(): Members.'
# at t/lib/TestFunctions.pm line 268.
# +----+----------------------+----+----------------------------+
# | Elt|Got | Elt|Expected |
# +----+----------------------+----+----------------------------+
# | 0|[Content_Types].xml | 0|[Content_Types].xml |
# | 1|_rels/.rels | 1|_rels/.rels |
# | 2|docProps/app.xml | 2|docProps/app.xml |
# | 3|docProps/core.xml | 3|docProps/core.xml |
# | | * 4|xl/_rels/workbook.xml.rels *
# | 4|xl/sharedStrings.xml | 5|xl/sharedStrings.xml |
# | 5|xl/styles.xml | 6|xl/styles.xml |
# | 6|xl/theme/theme1.xml | 7|xl/theme/theme1.xml |
# | 7|xl/workbook.xml | 8|xl/workbook.xml |
# +----+----------------------+----+----------------------------+
# Looks like you failed 1 test of 1.
A failing test for the XML elements within a file might look like this:
t/regression/simple01.t .. 1/1
# Failed test ' _compare_xlsx_files(): xl/sharedStrings.xml'
# at t/lib/TestFunctions.pm line 268.
# +----+-------------------------+----------------------------+
# | Elt|Got |Expected |
# +----+-------------------------+----------------------------+
# | 2|<si> |<si> |
# * 3|<t>_hello</t> |<t>Hello</t> *
# | 4|</si> |</si> |
# | 5|</sst> |</sst> |
# +----+-------------------------+----------------------------+
# Looks like you failed 1 test of 1.
Again, this is making extensive use of Test::Differences as shown in the previous post.
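To give a feel for how this works internally, here is a hypothetical, stripped-down sketch of the kind of helpers that a function like _compare_xlsx_files() could be built from. The real TestFunctions module in the Excel::Writer::XLSX distribution is more thorough; the helper names _xlsx_to_members() and _xml_to_list() below are purely illustrative:

use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );

# Return the sorted member names and a hash of member contents for a file.
sub _xlsx_to_members {

    my $filename = shift;
    my $zip      = Archive::Zip->new();

    # Give up early if the file can't be read as a zip container.
    return unless $zip->read( $filename ) == AZ_OK;

    my @members  = sort $zip->memberNames();
    my %contents = map { $_ => $zip->contents( $_ ) } @members;

    return \@members, \%contents;
}

# Split an XML string into a list of elements so that the diff reports the
# first differing element rather than one very long line.
sub _xml_to_list {

    my $xml = shift;

    return [ $xml =~ /<[^>]*>/g ];
}

With the member lists and element lists from both files in hand, the comparison itself is just a matter of handing the corresponding arrays to Test::Differences, via eq_or_diff() or a wrapper such as the _is_deep_diff() function used in the test case above.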
It is also possible with this technique to ignore certain files or elements, such as timestamps, randomly generated indexes and user-specific metadata.
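For example, the elided "Tests setup" section of the test case above could define something like the following for the $ignore_members and $ignore_elements arguments. The exact names here are only illustrative, but the timestamps in docProps/core.xml are an obvious candidate since they will never match between two files created at different times:

# Hypothetical ignore lists for the comparison.
my $ignore_members = [ 'xl/calcChain.xml' ];

my $ignore_elements = {
    # The creation and modification timestamps differ between any two files.
    'docProps/core.xml' => [ '<dcterms:created', '<dcterms:modified' ],
};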
So, in summary, I was able to use Perl's test driven development techniques at the unit, class and package level against actual Excel data to accelerate the porting of several years of work into several months. Added to this, I now have a comprehensive regression suite so that I can tackle refactoring with the confidence that I am not breaking any existing features.
This doesn't mean that the module is bug free; it isn't, and end users often find interactions that I haven't tested, but I can quickly respond to test reports using the same techniques.
All in all, this has meant that the development of Excel::Writer::XLSX has been a much less painful process than that of Spreadsheet::WriteExcel and has made me more inclined to add new features.
Strictly speaking, this is “pirate testing” rather than real TDD (where the tests drive not only implementation but also design). :-)
It may be overstating the case a little to say that this is an example of "pirate testing". If it is an example then, according to Google, it is only the third such example after the two that you link to.
Also, the techniques presented above don't seem to match the loose definitions of "pirate testing" provided via those links: "the practice of shanghaiing another tool’s test suite" or "tests are specified in a language neutral format (XML, JSON, YAML, whatever) and then various test harnesses for different languages are written".
If it is an example of "pirate testing" then I am happy enough to be one of the inception points for a new Test Driven Development Pattern or meme.
However, I would view the above techniques as just Plain Old TDD using, in Kent Beck's words, "Realistic Data, where you use data from the real world".
Also, it isn't fair to say that the tests aren't driving the design as well as the implementation. In this particular case the Excel::Writer::XLSX API is already defined by Spreadsheet::WriteExcel and isn't subject to change, but the internal design has changed considerably and continues to change during testing.