bulk inserting tens of thousands of rows with DBIx::Class

For some crazy reason my $project had a denormalized database layout. I had a table 'job' with a column 'data' whose value was a serialized ArrayRef of HashRefs.
This design failure resulted in ever-increasing, badly scaling memory consumption and had to be refactored. Now my table 'job' has a has_many relationship with the table 'step_result', and each row in 'step_result' represents one of the above-mentioned HashRefs.
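
For illustration, here is a minimal sketch of what the two Result classes could look like after the refactor; the MyApp::Schema namespace and every column besides job_id and position are my assumptions, not the actual code.

package MyApp::Schema::Result::Job;
use strict;
use warnings;
use base 'DBIx::Class::Core';

__PACKAGE__->table('job');
__PACKAGE__->add_columns(
    id => { data_type => 'integer', is_auto_increment => 1 },
);
__PACKAGE__->set_primary_key('id');
__PACKAGE__->has_many(
    step_results => 'MyApp::Schema::Result::StepResult' => 'job_id',
);

package MyApp::Schema::Result::StepResult;
use strict;
use warnings;
use base 'DBIx::Class::Core';

__PACKAGE__->load_components('Ordered');
__PACKAGE__->table('step_result');
__PACKAGE__->add_columns(
    id       => { data_type => 'integer', is_auto_increment => 1 },
    job_id   => { data_type => 'integer' },
    position => { data_type => 'integer' },
    data     => { data_type => 'text' },   # one former HashRef, serialized or flattened
);
__PACKAGE__->set_primary_key('id');
__PACKAGE__->position_column('position');  # Ordered.pm manages this column
__PACKAGE__->grouping_column('job_id');    # positions are numbered per job
__PACKAGE__->belongs_to( job => 'MyApp::Schema::Result::Job' => 'job_id' );

1;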

But performance turned out to be really bad. I had deliberately avoided premature optimization, and with the help of the #dbix-class channel on irc.perl.org I was able to improve performance by 4200%. Yay!

One issue was that I used create() to insert the rows. For inserting large quantities of rows, populate() is a much better choice. The problem is that populate() doesn't work out of the box if you are using Ordered.pm, because Ordered.pm has to fetch the highest used position value from the RDBMS before inserting each new row.
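
To make the difference concrete, here is a sketch (assuming @results holds the HashRefs to insert):

# slow: one INSERT per row, plus Ordered.pm's extra
# MAX(position) query before every single create()
$job->step_results->create($_) for @results;

# fast: a single multi-row INSERT, but on its own this
# does not play nice with Ordered.pm; hence the subselect
# trick below
$job->step_results->populate(\@results);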

ribasushi helped out, and eventually I could use populate() by filling in the position with a subselect so Ordered.pm's magic doesn't get triggered.

# supply the position via literal SQL so Ordered.pm's
# per-row position lookup never fires
$job->step_results->search({
    position => \"(SELECT COALESCE(MAX(position)+1,1) FROM step_result WHERE job_id = $job_id)",
})->populate(\@results);

Note: It is important to execute populate() in void context because otherwise DBIC falls back to using create().
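
That is (a sketch):

# void context: one fast multi-row INSERT
$job->step_results->populate(\@results);

# any other context: DBIC instantiates and inserts the rows
# one at a time, just like create()
my @rows = $job->step_results->populate(\@results);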

These actions alone already improved performance by 700%. The final touch was to add a composite index on job_id and position.

sub sqlt_deploy_hook {
    my ( $self, $sqlt_table ) = @_;
    $sqlt_table->add_index(
        name   => 'step_result_idx_job_id_position',
        fields => [ 'job_id', 'position' ],
    );
}
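
This hook belongs in the step_result Result class; SQL::Translator invokes it when the schema is deployed via $schema->deploy() or when DDL is generated from it, so the index gets created together with the table.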


PS: It is always a good idea to use bind parameters; interpolating $job_id straight into the SQL string, as above, invites SQL injection.

$job->step_results->search({
    position => {
        '=' => \[
            "(SELECT COALESCE(MAX(position)+1,1) FROM step_result WHERE job_id = ?)",
            # repeat the bind value so there is one per row populate() inserts
            [ ($job_id) x scalar @results ],
        ],
    },
})->populate(\@results);
