bulk inserting tens of thousands of rows with DBIx::Class
For some crazy reason my $project had a denormalized database layout. I had a table 'job' with a column 'data' which had a serialized ArrayRef of HashRefs as value.
This design failure resulted in increasing and badly scaling memory consumption and had to be refactored. Now my table 'job' has a has_many relationship with table 'step_result' and each row in 'step_result' represents one of the above mentioned HashRefs.
But performance turned out to be really bad, I did not take any premature measures to take care of that and with the help of channel #dbix-class on irc.perl.org I was able to improve performance by 4200%. Yay!
One issue was that I used create() to insert the rows. For inserting large quantities of rows populate() is a much better choice. The problem was that populate() doesn't work if you are using Ordered.pm because it requires the highest used position value from the RDBMS before inserting a new row.
ribasushi helped out and eventually I could use populate() by filling in the position with the help of a subselect so Ordered.pms magic doesn't get triggered.
$job->step_results->search({
position => \"(SELECT COALESCE(MAX(position)+1,1) FROM step_result WHERE job_id = $job_id)",
})->populate(\@results);
Note: It is important to execute populate() in void context because otherwise DBIC falls back to using create().
These actions already improved performance by 700%. The final touch was to create an index on the position column for each job.
sub sqlt_deploy_hook {
my ( $self, $sqlt_table ) = @_;
$sqlt_table->add_index( name => 'step_result_idx_job_id_position', fields => ['job_id', 'position'] );
}
ps: It is always a good idea to use bind parameters.
$job->step_results->search({
position => {
'=' => \[
"(SELECT COALESCE(MAX(position)+1,1) FROM step_result WHERE job_id = ?)",
[ ($job_id) x scalar @results ]
],
},
})->populate(\@results);
You can get rid of part of the literal SQL.
ribasushi: (...) you need to make a correlator *and* you will not able to avoid literal sql entirely anyway
ribasushi: so if "no literal" is your goal - it outright can't be done
ribasushi: how to make a correlating statement (leaving the exact syntax in your case as an exercise to the reader): https://metacpan.org/source/RIBASUSHI/DBIx-Class-0.08250/t/prefetch/correlated.t#L24