bulk inserting tens of thousands of rows with DBIx::Class

For some crazy reason my $project had a denormalized database layout. I had a table 'job' with a column 'data' whose value was a serialized ArrayRef of HashRefs.
This design failure resulted in ever-increasing, badly scaling memory consumption and had to be refactored. Now my table 'job' has a has_many relationship with the table 'step_result', and each row in 'step_result' represents one of the above-mentioned HashRefs.
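
For illustration, here is a minimal sketch of what the two Result classes could look like after the refactor; the MyApp::Schema namespace and every column besides job_id and position are my assumptions, not the actual code.

package MyApp::Schema::Result::Job;
use strict;
use warnings;
use base 'DBIx::Class::Core';

__PACKAGE__->table('job');
__PACKAGE__->add_columns(
    id => { data_type => 'integer', is_auto_increment => 1 },
);
__PACKAGE__->set_primary_key('id');
__PACKAGE__->has_many(
    step_results => 'MyApp::Schema::Result::StepResult' => 'job_id',
);

package MyApp::Schema::Result::StepResult;
use strict;
use warnings;
use base 'DBIx::Class::Core';

__PACKAGE__->load_components('Ordered');
__PACKAGE__->table('step_result');
__PACKAGE__->add_columns(
    id       => { data_type => 'integer', is_auto_increment => 1 },
    job_id   => { data_type => 'integer' },
    position => { data_type => 'integer' },
    data     => { data_type => 'text' },   # one former HashRef, serialized or flattened
);
__PACKAGE__->set_primary_key('id');
__PACKAGE__->position_column('position');  # Ordered.pm manages this column
__PACKAGE__->grouping_column('job_id');    # positions are numbered per job
__PACKAGE__->belongs_to( job => 'MyApp::Schema::Result::Job' => 'job_id' );

1;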

But performance turned out to be really bad. I had deliberately avoided premature optimization, and with the help of the #dbix-class channel on irc.perl.org I was able to improve performance by 4200%. Yay!

One issue was that I used create() to insert the rows. For inserting large quantities of rows, populate() is a much better choice. The problem is that populate() doesn't work out of the box if you are using Ordered.pm, because Ordered.pm has to fetch the highest used position value from the RDBMS before inserting each new row.
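
To make the difference concrete, here is a sketch (assuming @results holds the HashRefs to insert):

# slow: one INSERT per row, plus Ordered.pm's extra
# MAX(position) query before every single create()
$job->step_results->create($_) for @results;

# fast: a single multi-row INSERT, but on its own this
# does not play nice with Ordered.pm; hence the subselect
# trick below
$job->step_results->populate(\@results);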

ribasushi helped out, and eventually I could use populate() by filling in the position with a subselect so Ordered.pm's magic doesn't get triggered.

# supply the position via literal SQL so Ordered.pm's
# per-row position lookup never fires
$job->step_results->search({
    position => \"(SELECT COALESCE(MAX(position)+1,1) FROM step_result WHERE job_id = $job_id)",
})->populate(\@results);

Note: It is important to execute populate() in void context because otherwise DBIC falls back to using create().
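
That is (a sketch):

# void context: one fast multi-row INSERT
$job->step_results->populate(\@results);

# any other context: DBIC instantiates and inserts the rows
# one at a time, just like create()
my @rows = $job->step_results->populate(\@results);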

These actions alone already improved performance by 700%. The final touch was to add a composite index on job_id and position.

sub sqlt_deploy_hook {
    my ( $self, $sqlt_table ) = @_;
    $sqlt_table->add_index(
        name   => 'step_result_idx_job_id_position',
        fields => [ 'job_id', 'position' ],
    );
}
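
This hook belongs in the step_result Result class; SQL::Translator invokes it when the schema is deployed via $schema->deploy() or when DDL is generated from it, so the index gets created together with the table.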


PS: It is always a good idea to use bind parameters; interpolating $job_id straight into the SQL string, as above, invites SQL injection.

$job->step_results->search({
    position => {
        '=' => \[
            "(SELECT COALESCE(MAX(position)+1,1) FROM step_result WHERE job_id = ?)",
            # repeat the bind value so there is one per row populate() inserts
            [ ($job_id) x scalar @results ],
        ],
    },
})->populate(\@results);
