575 Pull Requests in Three Weeks: What Happens When AI Meets CPAN Maintenance

On March 17th, I installed a bot called koan on my personal Claude account. It's designed to monitor your four-hour usage limits, maintain a queue of "missions," and efficiently use your credits — even while you're sleeping.

Three weeks later, I had reviewed and merged 575 pull requests across 20 CPAN repositories, cut 58 releases across 17 of them, and caught memory leaks, security holes, and long-standing bugs that nobody had gotten around to fixing in years.

Some people think that means I let an AI commit unchecked code to critical Perl infrastructure. I want to explain what actually happened.

The Numbers

Here's what those three weeks produced:

Repository            PRs Merged   Releases
XML-Parser                    95          8
IPC-Run                       68          3
YAML-Syck                     64          7
Net-Jabber-Bot                38          3
Net-ACME2                     38          1
Net-Ident                     33          5
Crypt-RIPEMD160               33          4
IO-Stty                       30          4
Razor2-Client-Agent           27          2
Tree-MultiNode                24          2
IO-Tty                        22          7
Business-UPS                  20          2
Test-MockFile                 17          2
Safe-Hole                     17          3
Tie-DBI                       15          1
Regexp-Parser                 11          3
Sys-Mmap                       9          1
Template                      26          0
Crypt-OpenSSL-RSA              4          0
CDB_File                       4          0
Total                        575         58

Those are the kind of numbers that make people nervous, and I understand why. Let me explain why they shouldn't be.

Every PR Was Reviewed by a Human

I need to say this plainly: every single pull request was reviewed and merged by me. Not rubber-stamped. Reviewed.

The koan bot submitted fixes, improved Kwalitee across the repositories, and worked to ensure CI covered as much surface area as possible so that little could merge without being tested. But it didn't have merge access. I was the bottleneck by design.

There was real pushback during review. When fixes looked wrong, I said so. When explanations were missing, I asked for them. When an approach was wrong, I rejected it. This was not a case of an AI firehosing code into production. It was a case of a maintainer using an AI to generate candidate fixes at a pace that would have been impossible for one person to write — but not impossible for one person to review.

That's the distinction people keep missing.

The Failures Told Us What We Didn't Know

Here's the part that actually gets interesting. The CPAN Testers matrix tracks test results across Perl versions, operating systems, and configurations. When we shipped releases, some of them failed. Look at the data:

Dist         Before                           After Fix
IPC-Run      103 FAIL (20260322.0)            0 FAIL (20260327.0)
IO-Stty      20 FAIL (0.05)                   0 FAIL (0.08)
Net-Ident    25 FAIL (1.26)                   0 FAIL (1.27)
YAML-Syck    52 FAIL (1.38)                   0 FAIL (1.40)
IO-Tty       18 FAIL + 31 UNK (1.20–1.21)     1 FAIL (1.27)
XML-Parser   11 FAIL (2.49)                   0 FAIL (2.57)

Were there regressions? Yes. IPC-Run 20260322.0 shipped with 103 failures. That's because the AI-generated changes exposed CI gaps we didn't know existed — configurations we weren't testing, platforms we hadn't considered. Five days later it was at zero. IO-Stty went from 20 failures down to zero across four releases. YAML-Syck spiked at 1.38, was fixed by 1.40, spiked again at 1.42 with 86 failures on a different issue, and was clean again by 1.43.

The failures weren't the problem. The failures were the signal. They showed us where our CI was incomplete, and the rapid release cadence meant we could respond in days instead of months.
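To make "incomplete CI" concrete: the fix for gaps like these is usually a broader build matrix, so every OS and Perl version fails pre-merge instead of post-release. A minimal sketch, assuming GitHub Actions and the shogo82148/actions-setup-perl action — the OS and version lists here are illustrative, not what these repositories actually run:

```yaml
# Illustrative workflow -- the matrix values are examples only.
name: CI
on: [push, pull_request]
jobs:
  test:
    strategy:
      fail-fast: false     # report every failing cell, not just the first
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        perl: ['5.10', '5.16', '5.26', '5.36', '5.40']
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: shogo82148/actions-setup-perl@v1
        with:
          perl-version: ${{ matrix.perl }}
      - run: cpanm --installdeps --notest .
      - run: perl Makefile.PL && make test
```

Every cell in that matrix is a configuration that would otherwise only be exercised by CPAN Testers, after the release has already shipped.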

What We Actually Fixed

This wasn't just reformatting code and updating boilerplate. Across these repositories, we found and addressed:

  • Memory leaks that had been lurking for years
  • Security vulnerabilities that no one had audited for
  • Long-standing bugs that users had reported but no one had time to fix
  • Features that users had requested for years but nobody had implemented
  • CI blind spots — entire platforms and Perl configurations that were never being tested
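Several of these distributions (XML-Parser, YAML-Syck, IO-Tty) are XS modules, and XS leaks tend to have a classic shape. This fragment is illustrative only — it is hypothetical, not taken from any of these repositories — but it shows the pattern: an SV pushed onto the return stack without being mortalized is never freed.

```
/* Illustrative XS fragment -- hypothetical, not from any of these repos. */
void
get_greeting()
    PPCODE:
        /* BUG: newSVpv returns an SV with a reference count of 1.
           Pushed as-is, nothing ever decrements that count, so the
           SV leaks on every call. */
        XPUSHs(newSVpv("hello", 0));

        /* FIX: mortalize the SV so perl frees it once the caller
           has copied the value:
           XPUSHs(sv_2mortal(newSVpv("hello", 0))); */
```

Leaks like this are invisible to the test suite — the code returns the right answer — and only show up under valgrind or in long-running processes, which is exactly how they lurk for years.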

Many of these modules are deep infrastructure. XML-Parser, IPC-Run, YAML-Syck, IO-Tty — these aren't hobby projects. They're load-bearing walls in the Perl ecosystem. The work that got done in three weeks would have taken a solo maintainer the better part of a year, assuming they had the time at all.

The Reaction

The volume of activity got attention, and not all of it was positive. Some people looked at the PR count and concluded it must be AI slop — untested, unreviewed code flooding CPAN. Gentoo's packagers nearly banned my modules on the assumption that I was blindly shipping AI-generated code.

I'd encourage anyone with that concern to look at the actual diffs, the CI results, and the review comments. They're all public. If a specific change is wrong, let's talk about it — that's how open source is supposed to work.

What's worth noting is the double standard. The Perl community routinely accepts drive-by patches from complete strangers. Nobody demands that a first-time human contributor prove their code wasn't generated by autocomplete or copied from Stack Overflow. But attach the label "AI" and suddenly the code quality of the entire module is in question.

"It was generated by AI" is not a technical objection. The code either works or it doesn't.

AI_POLICY.md

In response to the concerns, we now ship an AI_POLICY.md in our repositories. You can read the full document, but it comes down to one line:

AI assists. Humans decide.

The document lays out exactly how AI is used: analyzing issues, generating draft PRs, surfacing context from the codebase. And it makes explicit what should already be obvious — every pull request, whether AI-drafted or human-authored, is reviewed by a human maintainer before merge. AI drafts are treated the same way you'd treat a junior contributor's first attempt: useful raw material that still needs experienced eyes.

We wrote this policy not because we had to, but because transparency matters. If AI is going to be part of open source maintenance — and it already is, whether projects acknowledge it or not — then the community deserves to know how it's being used.

The Real Question

The question isn't whether AI should be involved in open source maintenance. It already is. The question is whether maintainers are going to be honest about it and put guardrails in place, or whether it's going to happen quietly with no review process at all.

I chose the transparent path. I reviewed every PR. I shipped an AI policy. I responded to regressions within days. I'm accountable for every line that shipped, the same way I've been accountable for these modules for years.

575 pull requests. 58 releases. Memory leaks found. Security holes closed. CI gaps filled. Bugs fixed. Features completed.

The code speaks for itself.

9 Comments

Nicely done!!! This is a great example of AI assistance and disclosure!

I'm curious to know the dollar amount spent on AI to achieve these results. Will you be doing this again on a set of different libraries in the future?

Nice. I've been using AI to clean up some of my code, and I learned some things. It's not an existential threat if you don't let it be. It can help you get better and unclutter technical debt that weighs on you so long you forget about it.

This is very interesting, and I hope it leads somewhere positive.

What are your thoughts on addressing the problem of "trained ignorance" in LLMs?

False negatives when looking for security issues can be pretty nasty, I think.

Maybe if one uses multiple independently trained models and/or a second pair of human eyes to review the code, this risk can be minimized?

I need to say this plainly: every single pull request was reviewed and merged by me. Not rubber-stamped. Reviewed.

When you review at breakneck speeds like 190 PRs per week, what you approve is stuff like this:

https://github.com/cpan-authors/XML-Parser/pull/118#issuecomment-4165348882

… and this:

https://github.com/cpan-authors/XML-Parser/pull/140#issuecomment-4165195217

So with all that, <intonated>I need to say this plainly: each of these pull requests was reviewed and merged by you. Not rubber-stamped. Reviewed.</intonated>

Some people looked at the PR count and concluded it must be AI slop — untested, unreviewed code flooding CPAN. Gentoo's packagers nearly banned my modules on the assumption that I was blindly shipping AI-generated code.

[Silence]

I'd encourage anyone with that concern to look at the actual diffs, the CI results, and the review comments.

Yes. So would I.

If a specific change is wrong, let's talk about it

Ship first. Talk later.


About Todd Rinaldo

I blog about Perl.