February 2011 Archives

Test Driven Development of Excel::Writer::XLSX Part II

Test Driven Development of Excel::Writer::XLSX Part II

In the first part of this post we looked at how I used auto-generated code and tests to speed up the re-write of Spreadsheet::WriteExcel into Excel::Writer::XLSX.

This dealt with tests at the unit or method level. At the class level I was able to use tests that compared module results against actual XML from an Excel XLSX file. The following is an extract from an example test case. The XML in the DATA section is from an Excel file, pretty printed via xmllint.

###########################################################################
#
# Test the _assemble_xml_file() method.
#
$caption = " \tWorksheet: _assemble_xml_file()";

$worksheet = _new_worksheet(\$got);

$worksheet->select();
$worksheet->_assemble_xml_file();

$expected = _expected_to_aref();
$got      = _got_to_aref( $got );

_is_deep_diff( $got, $expected, $caption );

__DATA__
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"">
  <dimension ref="A1"/>
  <sheetViews>
    <sheetView tabSelected="1" workbookViewId="0"/>
  </sheetViews>
  <sheetFormatPr defaultRowHeight="15"/>
  <sheetData/>
  <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75"/>
</worksheet>

The _is_deep_diff() test function used in the test above is a wrapper for the Test::Differences eq_or_diff() test method.

I prefer not to impose Test module dependencies on the end user unless they are necessary. In this case the _is_deep_diff() test function uses Test::Differences if it is available and falls back to the core Test::More is_deeply() function if it is not.

The Test::Differences output for a failing test case might look like the following:

$ prove -l t/worksheet/worksheet_01.t 
t/worksheet/worksheet_01.t .. 1/1 
#   Failed test '   Worksheet: _assemble_xml_file()'
#   at t/lib/TestFunctions.pm line 268.
# +----+----------------------------------+--------------------------------+
# | Elt|Got                               |Expected                        |
# +----+----------------------------------+--------------------------------+
# |   5|</sheetViews>                     |</sheetViews>                   |
# *   6|<sheetFormatPr default="12.5" />  |<sheetFormatPr default="15" />  *
# |   7|<sheetData />                     |<sheetData />                   |
# +----+----------------------------------+--------------------------------+
# Looks like you failed 1 test of 1.

If you are going to invest time in Test Driven Development then in practice you are going to spend a lot of time looking at the output of failing test cases. As such I find the the diff style context of Test::Difference to be invaluable, particularly when there is more than one difference in the failing test case.

In the next part of this blog I'll demonstrate the final component of the test framework where I test complete Excel files against the output of Excel::Writer::XLSX.

Test Driven Development of Excel::Writer::XLSX Part I

For the last few months I have been porting Spreadsheet::WriteExcel to the new Excel 2007+ file format.

The older Excel file format was comprised of sequential binary records whilst the new file format is a collection of XML files in a zip container.

The newer module maintains the same API as Spreadsheet::WriteExcel but is in a different namespace. It is called Excel::Writer::XLSX.

Some of the test driven development aspects of writing Excel::Writer::XLSX have been interesting and I'd thought that I'd blog about them here.

At its simplest an Excel XLSX file contains the following elements:

 ____ [Content_Types].xml
|
|____ docProps
| |____ app.xml
| |____ core.xml
|
|____ xl
| |____ workbook.xml
| |____ worksheets
| | |____ sheet1.xml
| |
| |____ styles.xml
| |
| |____ theme
| | |____ theme1.xml
| |
| |____ _rels
|   |____ workbook.xml.rels
|
|____ _rels
  |____ .rels

The .xml files contain the data and the .rels files link them together. These are all zipped into a single .xlsx file.

Since the files are XML based I initially considered generating them with a Class to XML mapper. However, interactions between individual files and between elements within the files made simple mapping impossible.

In the end I settled on using XML::Writer and statically auto-generating methods for outputting individual XML elements.

So, for example in the Sheet1.xml worksheet file shown above a typical element representing the page setup settings might look like this:

<pageSetup paperSize="9" orientation="landscape" />

I extract this from a sample file and use a small Perl program to auto-generate the code for creating the XML element. The output looks something like the following:

# Write the pageSetup element.
$self->_write_page_setup();


##########################################################################
#
# _write_page_setup()
#
# Write the <pageSetup> element.
#
sub _write_page_setup {

    my $self        = shift;
    my $paper_size  = 9;
    my $orientation = 'landscape';

    my @attributes = (
        'paperSize'   => $paper_size,
        'orientation' => $orientation,
    );

    $self->{_writer}->emptyTag( 'pageSetup', @attributes );
}

More importantly it also auto-generates a test case for the method like this:

###########################################################################
#
# Tests for Excel::Writer::XLSX::Worksheet methods.
#
# reverse('©'), January 2011, John McNamara, jmcnamara@cpan.org
#

use lib 't/lib';
use TestFunctions '_new_worksheet';
use strict;
use warnings;

use Test::More tests => 1;


###########################################################################
#
# Tests setup.
#
my $expected;
my $got;
my $caption;
my $worksheet;


###########################################################################
#
# Test the _write_page_setup() method.
#
$caption  = " \tWorksheet: _write_page_setup()";
$expected = '<pageSetup paperSize="9" orientation="landscape" />';

$worksheet = _new_worksheet(\$got);

$worksheet->_write_page_setup();

is( $got, $expected, $caption );

__END__

In the simplest cases the new method will work as expected and the test case will pass first time. In most cases the method will need some additional code to handle non-default cases and more tests will have to be added. However, as a minimum all new methods will have a test case.

If you are interested in seeing the final implementation of the test case have a look at sub_write_page_setup.t and the code for the method that it tests write_page_setup() (search down).

There are currently around 200 test files and 1100 test cases in the Excel::Writer::XLSX test suite and the test code base is around 18 KLOC for a code base of around 16 KLOC.

The methodology of generating test cases and code together has been quite successful and in less than 6 months I have been able to re-write 90% of the methods of Spreadsheet::WriteExcel which took over 10 years to write in the first place.

This productivity has been helped greatly by having a pre-defined and unchanging specification in the form of the Spreadsheet::WriteExcel API and existing documentation and examples. This in not insignificant since these elements probably comprised around 50% of the effort that went into Spreadsheet::WriteExcel.

Another advantage of the new module is that I am able to test complete files generated by Excel::Writer::XLSX against files generated by Excel.

I'll write about that in a separate post.

About John McNamara

user-pic Just another Perl hacker