Using AI to Optimise the Calculation of Krippendorff’s Alpha
The Experiment
At the beginning of the year, we ran a small experiment at work. We hired four annotators and let them rate 900 sentences (the details are not important). To decide whether the inter-annotator agreement was significant, we calculated (among other measures) Krippendorff’s alpha coefficient.
I’d used Perl for everything else in the project, so I reached for Perl to calculate the alpha as well. I couldn’t find any module for it on CPAN, so I wrote one: I read the Wikipedia page and implemented the formulas.
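For reference, the coefficient itself is simple once the hard part is done: alpha = 1 − D_o / D_e, where D_o is the observed disagreement and D_e is the disagreement expected by chance, and both are derived from the coincidence matrix we’ll meet below.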
The Real Data
The experiment was promising, so we got additional funding. We hired three more annotators and, a few months later, another nine, bringing the number of raters to 16. So far, they’ve rated about 200K sentences. Each sentence has been annotated by at least two annotators (usually three).
One day, I decided to calculate the inter-annotator agreement for the new data. To my surprise, the calculation took more than six hours.
Profiling
I ran the NYTProf profiler on a smaller dataset to discover the problematic areas of the code. Browsing the results, I clearly identified the culprit: the method responsible for building the coincidence matrix.
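A quick recap of what the method computes: for every unit, each ordered pair of ratings by two different annotators contributes 1 / (m − 1) to the corresponding cell of the matrix, where m is the number of ratings in the unit. For example, a sentence rated yes, yes, no by three annotators adds 1 to coinc{yes}{yes}, 1 to coinc{yes}{no}, and 1 to coinc{no}{yes}, so each unit contributes m in total.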
sub _build_coincidence($self) {
    my %coinc;
    my @s = $self->vals;
    # For each ordered pair of possible values, walk all units; within each
    # unit, for every annotator who chose $v, count the other annotators who
    # chose $v_, and divide by the number of ratings minus one.
    for my $v (@s) {
        for my $v_ (@s) {
            $coinc{$v}{$v_} = sum(map {
                my $unit = $_;
                my @k = keys %$unit;
                sum(0,
                    map {
                        my $i = $_;
                        scalar grep $unit->{$_} eq $v_,
                               grep $i ne $_, @k
                    } grep $unit->{$_} eq $v, @k
                ) / (@k - 1)
            } @{ $self->units });
        }
    }
    return \%coinc
}
You can see the four nested loops here (two for loops and two maps). I apologise; that’s how I’d understood the formula. With V possible values, U units, and m ratings per unit, that’s roughly O(V² × U × m²) work, recomputing the same per-unit counts for every pair of values.
The Solution Suggested by AI
“This is the right time to experiment with AI!” I thought to myself. I started a session with GPT-4o, showed it the whole module, and asked for a more efficient version. The LLM correctly identified the four nested loops, suggested an improvement, and generated code to implement it.
Unfortunately, the code didn’t even compile. I asked the AI several times to fix the problems until I got runnable code, but its running time was no different from the original implementation’s. So I asked it to try harder, and after a third iteration of the whole ordeal, we had the following code, which ran for 20 seconds instead of 22:
sub _build_coincidence($self) {
    my %coinc;
    my @vals = $self->vals;

    # Initialize coincidence counts
    for my $v (@vals) {
        for my $v_ (@vals) {
            $coinc{$v}{$v_} = 0;
        }
    }

    # Iterate over each unit
    for my $unit (@{ $self->units }) {
        my @keys = keys %$unit;
        my $unit_count = @keys;

        # Count occurrences of each value in the current unit
        my %value_count;
        $value_count{ $unit->{$_} }++ for @keys;

        # Calculate coincidences based on the value counts
        for my $v (@vals) {
            for my $v_ (@vals) {
                if (exists $value_count{$v} && exists $value_count{$v_}) {
                    my $coinc_count = 0;

                    # Count pairs of keys that match the values
                    for my $key1 (@keys) {
                        for my $key2 (@keys) {
                            next if $key1 eq $key2; # Skip same keys
                            if ($unit->{$key1} eq $v && $unit->{$key2} eq $v_) {
                                $coinc_count++;
                            }
                        }
                    }

                    # Update the coincidence count
                    $coinc{$v}{$v_} += $coinc_count / ($unit_count - 1) if $unit_count > 1;
                }
            }
        }
    }
    return \%coinc
}
That’s a lot of lines. We’ve got comments, which is nice, but… the code still looped over all possible pairs of values for every unit, so for the large data, I’d still have had to wait for several hours.
The Optimisation
I decided to bite the bullet and optimise the code myself. Going over both the original code and the suggested improvement, I realised the LLM had wanted to cache the number of occurrences of each value in %value_count, but it never really used the counts: it only checked them for existence.
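As an aside, those counts could have replaced the innermost loops over pairs of annotators entirely: with n_v occurrences of a value v among m ratings in a unit, the unit contributes n_v × n_v′ / (m − 1) to coinc{v}{v′} when v ≠ v′, and n_v × (n_v − 1) / (m − 1) on the diagonal. Here is a minimal sketch of that idea (my sketch, not code from the module; the sample unit is made up):

use strict;
use warnings;

my %coinc;
my $unit = { ann1 => 'yes', ann2 => 'yes', ann3 => 'no' };  # made-up unit

# Count occurrences of each value in the unit.
my %value_count;
$value_count{$_}++ for values %$unit;
my $m = keys %$unit;  # number of ratings in the unit

for my $v (keys %value_count) {
    for my $v_ (keys %value_count) {
        # Ordered pairs of distinct annotators rating ($v, $v_).
        my $pairs = $value_count{$v}
                  * ($value_count{$v_} - ($v eq $v_ ? 1 : 0));
        $coinc{$v}{$v_} += $pairs / ($m - 1);
    }
}
# Result: coinc{yes}{yes} = 1, coinc{yes}{no} = coinc{no}{yes} = 1.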
What was really needed was not to iterate over all the values every time, skipping the irrelevant ones, but to precompute the relevant values for each unit and so reduce the breadth of the loops. Usually, there were only two or three distinct values per unit (each sentence was annotated by at least two annotators), so there was no reason to check all sixteen possible ones.
This is the code I wrote and released as the new version of the module. For the large dataset, the running time dropped from six hours to four seconds.
sub _build_coincidence($self) {
    my @vals = $self->vals;
    my %coinc;
    # Zero the whole matrix using hash slices and the repetition operator.
    @{ $coinc{$_} }{@vals} = (0) x @vals for @vals;

    for my $unit (@{ $self->units }) {
        # Collect only the values that actually occur in this unit.
        my %is_value;
        @is_value{ values %$unit } = ();
        my @values = keys %is_value;

        my @keys = keys %$unit;
        for my $v (@values) {
            for my $v_ (@values) {
                my $coinc_count = 0;
                for my $key1 (@keys) {
                    for my $key2 (@keys) {
                        next if $key1 eq $key2;
                        ++$coinc_count
                            if $unit->{$key1} eq $v
                            && $unit->{$key2} eq $v_;
                    }
                }
                $coinc{$v}{$v_} += $coinc_count / (@keys - 1);
            }
        }
    }
    return \%coinc
}
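If you want to play with it, here is a self-contained rendition of the same trick on toy data (the module’s real interface may differ; the annotator names and values below are invented):

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(uniq);

# Made-up toy data: each unit maps annotator names to their values.
my @units = (
    { a1 => 'yes', a2 => 'yes', a3 => 'no'  },
    { a1 => 'no',  a2 => 'no'  },
    { a2 => 'yes', a3 => 'yes', a4 => 'yes' },
);

my @vals = uniq map values %$_, @units;
my %coinc;
# Zero the matrix with hash slices and the repetition operator.
@{ $coinc{$_} }{@vals} = (0) x @vals for @vals;

for my $unit (@units) {
    my @values = uniq values %$unit;  # only values occurring in this unit
    my @keys   = keys %$unit;
    for my $v (@values) {
        for my $v_ (@values) {
            my $coinc_count = 0;
            for my $key1 (@keys) {
                for my $key2 (@keys) {
                    next if $key1 eq $key2;
                    ++$coinc_count
                        if $unit->{$key1} eq $v
                        && $unit->{$key2} eq $v_;
                }
            }
            $coinc{$v}{$v_} += $coinc_count / (@keys - 1);
        }
    }
}

for my $v (@vals) {
    say "coinc{$v}{$_} = $coinc{$v}{$_}" for @vals;
}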
I showed my final code to the LLM. It congratulated me on the brilliant solution, emphasising some of the tricks I’d used to make it faster (e.g. filling the matrix with zeros using the repetition operator).
Did it help me? I’m not sure. It motivated me to study the code and find the solution, but it didn’t really contribute to it. The whole thing took me about two hours, including finding the solution myself.
What’s your experience with AI?