# 575 Pull Requests in Three Weeks: What Happens When AI Meets CPAN Maintenance
On March 17th, I installed a bot called koan on my personal Claude account. It's designed to monitor your four-hour usage limits, maintain a queue of "missions," and efficiently use your credits — even while you're sleeping.
Three weeks later, I had reviewed and merged 575 pull requests across 20 CPAN repositories, cut 58 releases across 17 of them, and caught memory leaks, security holes, and long-standing bugs that nobody had gotten around to fixing in years.
Some people think that means I let an AI commit unchecked code to critical Perl infrastructure. I want to explain what actually happened.
## The Numbers
Here's what those three weeks produced:
| Repository | PRs Merged | Releases |
|---|---|---|
| XML-Parser | 95 | 8 |
| IPC-Run | 68 | 3 |
| YAML-Syck | 64 | 7 |
| Net-Jabber-Bot | 38 | 3 |
| Net-ACME2 | 38 | 1 |
| Net-Ident | 33 | 5 |
| Crypt-RIPEMD160 | 33 | 4 |
| IO-Stty | 30 | 4 |
| Razor2-Client-Agent | 27 | 2 |
| Tree-MultiNode | 24 | 2 |
| IO-Tty | 22 | 7 |
| Business-UPS | 20 | 2 |
| Test-MockFile | 17 | 2 |
| Safe-Hole | 17 | 3 |
| Tie-DBI | 15 | 1 |
| Regexp-Parser | 11 | 3 |
| Sys-Mmap | 9 | 1 |
| Template2 | 6 | — |
| Crypt-OpenSSL-RSA | 4 | — |
| CDB_File | 4 | — |
| Total | 575 | 58 |
Those are the kind of numbers that make people nervous, and I understand why. Let me explain why they shouldn't be.
## Every PR Was Reviewed by a Human
I need to say this plainly: every single pull request was reviewed and merged by me. Not rubber-stamped. Reviewed.
The koan bot submitted fixes, improved Kwalitee across the repositories, and worked to ensure CI covered as much surface area as possible so that little could merge without being tested. But it didn't have merge access. I was the bottleneck by design.
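Broadening CI coverage mostly means widening the test matrix. As a rough sketch, a GitHub Actions workflow along these lines covers multiple Perl versions and operating systems per PR (the specific versions, platforms, and the `actions-setup-perl` action here are illustrative choices, not the exact workflows these repositories use):

```yaml
# Illustrative CI matrix -- version and platform choices are examples only.
name: CI
on: [push, pull_request]
jobs:
  test:
    strategy:
      fail-fast: false            # report every failing cell, not just the first
      matrix:
        os: [ubuntu-latest, macos-latest]
        perl: ['5.10', '5.20', '5.30', '5.38']
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - uses: shogo82148/actions-setup-perl@v1
        with:
          perl-version: ${{ matrix.perl }}
      - run: cpanm --installdeps --notest .
      - run: perl Makefile.PL && make test
```

The point of a wide matrix is exactly the signal described below: every cell that fails is a configuration you weren't previously testing.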
There was real pushback during review. When fixes looked wrong, I said so. When explanations were missing, I asked for them. When an approach was the wrong one, I rejected it. This was not a case of an AI firehosing code into production. It was a case of a maintainer using an AI to generate candidate fixes at a pace that would have been impossible for one person to write — but not impossible for one person to review.
That's the distinction people keep missing.
## The Failures Told Us What We Didn't Know
Here's the part that actually gets interesting. The CPAN Testers matrix tracks test results across Perl versions, operating systems, and configurations. When we shipped releases, some of them failed. Look at the data:
| Dist | Before | After Fix |
|---|---|---|
| IPC-Run | 103 FAIL (20260322.0) | 0 FAIL (20260327.0) |
| IO-Stty | 20 FAIL (0.05) | 0 FAIL (0.08) |
| Net-Ident | 25 FAIL (1.26) | 0 FAIL (1.27) |
| YAML-Syck | 52 FAIL (1.38) | 0 FAIL (1.40) |
| IO-Tty | 18 FAIL + 31 UNK (1.20–1.21) | 1 FAIL (1.27) |
| XML-Parser | 11 FAIL (2.49) | 0 FAIL (2.57) |
Were there regressions? Yes. IPC-Run 20260322.0 shipped with 103 failures. That's because the AI-generated changes exposed CI gaps we didn't know existed — configurations we weren't testing, platforms we hadn't considered. Five days later it was at zero. IO-Stty went from 20 failures down to zero across four releases. YAML-Syck spiked at 1.38, was fixed by 1.40, spiked again at 1.42 with 86 failures on a different issue, and was clean again by 1.43.
The failures weren't the problem. The failures were the signal. They showed us where our CI was incomplete, and the rapid release cadence meant we could respond in days instead of months.
## What We Actually Fixed
This wasn't just reformatting code and updating boilerplate. Across these repositories, we found and addressed:
- Memory leaks that had been lurking for years
- Security vulnerabilities that no one had audited for
- Long-standing bugs that users had reported but no one had time to fix
- Features that users had requested for years but that had never been fully implemented
- CI blind spots — entire platforms and Perl configurations that were never being tested
Many of these modules are deep infrastructure. XML-Parser, IPC-Run, YAML-Syck, IO-Tty — these aren't hobby projects. They're load-bearing walls in the Perl ecosystem. The work that got done in three weeks would have taken a solo maintainer the better part of a year, assuming they had the time at all.
## The Reaction
The volume of activity got attention, and not all of it was positive. Some people looked at the PR count and concluded it must be AI slop — untested, unreviewed code flooding CPAN. Gentoo's packagers nearly banned my modules on the assumption that I was blindly shipping AI-generated code.
I'd encourage anyone with that concern to look at the actual diffs, the CI results, and the review comments. They're all public. If a specific change is wrong, let's talk about it — that's how open source is supposed to work.
What's worth noting is the double standard. The Perl community routinely accepts drive-by patches from complete strangers. Nobody demands that a first-time human contributor prove their code wasn't generated by autocomplete or copied from Stack Overflow. But attach the label "AI" and suddenly the code quality of the entire module is in question.
"It was generated by AI" is not a technical objection. The code either works or it doesn't.
## AI_POLICY.md
In response to the concerns, we now ship an AI_POLICY.md in our repositories. You can read the full document, but it comes down to one line:
AI assists. Humans decide.
The document lays out exactly how AI is used: analyzing issues, generating draft PRs, surfacing context from the codebase. And it makes explicit what should already be obvious — every pull request, whether AI-drafted or human-authored, is reviewed by a human maintainer before merge. AI drafts are treated the same way you'd treat a junior contributor's first attempt: useful raw material that still needs experienced eyes.
We wrote this policy not because we had to, but because transparency matters. If AI is going to be part of open source maintenance — and it already is, whether projects acknowledge it or not — then the community deserves to know how it's being used.
## The Real Question
The question isn't whether AI should be involved in open source maintenance. It already is. The question is whether maintainers are going to be honest about it and put guardrails in place, or whether it's going to happen quietly with no review process at all.
I chose the transparent path. I reviewed every PR. I shipped an AI policy. I responded to regressions within days. I'm accountable for every line that shipped, the same way I've been accountable for these modules for years.
575 pull requests. 58 releases. Memory leaks found. Security holes closed. CI gaps filled. Bugs fixed. Features completed.
The code speaks for itself.
Nicely done!!! This is a great example of AI assistance and disclosure!
I'm curious to know the dollar amount spent on AI to achieve these results. Will you be doing this again on a different set of libraries in the future?
Nice. I've been using AI to clean up some of my code, and I learned some things. It's not an existential threat if you don't let it be. It can help you get better and unclutter technical debt that weighs on you so long you forget about it.
This is very interesting, and I hope it leads somewhere positive.
What are your thoughts on addressing the problem of "trained ignorance" in LLMs?
False negatives when looking for security issues can be pretty nasty, I think.
Maybe if one uses multiple independently trained models and/or a second pair of human eyes to review the code, this risk can be minimized?
The $100/month Claude account was enough. The $20 plan could have worked, but it would have taken longer.
> What are your thoughts on addressing the problem of "trained ignorance" in LLMs?
Not sure what you mean here. Making sure the code is well documented and POD-ed certainly makes Claude more knowledgeable. In one case I caused a regression in "false" behavior with XML::Parser which I had to revert. This caused downstream outages, but it ended with tests and a more documented API.
> False negatives when looking for security issues can be pretty nasty, I think.
Claude only highlights the possibility of security issues, but it does so clearly in the PR. When it does, it is instructed to ALSO provide a unit test that clearly shows a way to reproduce the issue, so that any researcher or reviewer can review it and judge for themselves. Just like with a real human submitting a security report, it's up to me and security teams to decide whether it's something worthy of concern.
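For concreteness, a reproduction test of the kind described above might look like this Test::More sketch. The module and function names are invented purely to show the shape; they do not correspond to any of the repositories mentioned:

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Hypothetical module and function -- invented for illustration only.
use Some::Module qw(run_command);

# The flagged claim: untrusted input reaches the shell unescaped.
# A reviewer can run this test and judge the severity for themselves.
my $untrusted = q{innocent.txt; echo INJECTED};
my $output    = run_command($untrusted);
unlike( $output, qr/INJECTED/, 'shell metacharacters in input are not executed' );
```

A failing test here is the reproduction; a passing one means the flagged issue could not be demonstrated on that platform.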
> Will you be doing this again on a set of different libraries in the future ?
If people are open to it. I'm currently looking at offering to help with LibXML but I consider this to be a decision of the current maintainer if they want the help.
When you review at breakneck speeds like 190 PRs per week, what you approve is stuff like this:
https://github.com/cpan-authors/XML-Parser/pull/118#issuecomment-4165348882
… and this:
https://github.com/cpan-authors/XML-Parser/pull/140#issuecomment-4165195217
So with all that, *I need to say this plainly: each of these pull requests was reviewed and merged by you. Not rubber-stamped. Reviewed.*
[Silence]
Yes. So would I.
Ship first. Talk later.
> So with all that, I need to say this plainly: each of these pull requests was reviewed and merged by you. Not rubber-stamped. Reviewed.
Correct! I also remember the decision I made on both of these and I determined they were warranted. One of them broke downstream and I was able to quickly fix it. I now have documentation in the code base explaining why that behavior exists.