The Coro situation
Since my recent participation at the QA Hackathon I have become aware that rather more people than I expected do not know the specifics of this situation. Fewer than I expected have heard of it at all, even, although there appears to be some general awareness at the “something happened with that” level at least.
However, the situation is being used to characterise Marc Lehmann whenever his name comes up (and it comes up rather more often than I would expect or consider necessary).
To give a clear picture of the facts and to avoid repeating that exercise every time I have a related conversation, here is an outline of where we are and how we got here.
(Thanks to Andreas König, Graham Knop, and Peter Rabbitson for proofreading drafts of this article and verifying the stated facts.)
Here is a timeline of relevant events:
In late 2007, Coro gets a new feature: per-coroutine die/warn handlers.
Various bugs with the feature are reported and fixed, e.g. not working on perls earlier than 5.8.8. Tests slowly accumulate.
One of these bugs is that the Perl-level
PL_warnhookcan diverge (and likewise for
__DIE__). The workaround requires hooking into a very low-level point in the perl internals.
In Perl 5.21.8, in a pursuit for memory savings, the ability to hook into these internals is removed with patch c910fead7893. This results in a savings of 4 kilobytes total per perl process.
Coro is found to be broken. An issue is raised before the release of 5.22.0, but discussion fizzles out.
5.22.0 is released with both patches that break Coro.
perl #125439 is filed, about the imperfect synchronisation between
PL_foohook, but aside from Corion’s valiant but not yet successful attempt, little energy is put into fixing it.
Coro is now in a place where the options are as follows:
Make the tests and workaround conditional on perl version and accept being buggy with no recourse for some evidently non-hypothetical users, which implies headaches for testing and surprises for users moving from version to version.
The author of Coro picks option 2. He also provides an unofficial release of perl 5.22 called “stableperl” as a courtesy for users with no way out nor the skill and inclination to figure out and apply the patches necessary to unbreak Coro under 5.22 and following.
At opening of the next Perl release cycle, in 5.23.0, 9b7476d7a269 is corrected by patch 73949fca082f to restore the lvalue-ness of the macro. This leaves only one change still breaking Coro, so during most of the 5.23 cycle, it can be unbroken by applying the following patch, reproduced here in its entirety:
--- i/perl.h +++ w/perl.h @@ -5586 +5586 @@ -#define EXT_MGVTBL EXTCONST MGVTBL +#define EXT_MGVTBL EXT MGVTBL
It is decided that the patch shown above cannot be applied in 5.22.1, because that would violate the strict binary compatibility promise for point releases of perl within a stable version series. Regrettable/unfortunate though this may be, it is true, and not a simple technicality.
However, no move is ever made to apply it during the 5.23 development series.
In 5.24.0-RC3, the ability to hook those internals is still absent, so the situation for Coro remains unchanged, implying that it will remain broken for at least another release cycle.
A call is made to discuss whether the author of Coro has forfeited his ownership of the namespace.
Effects of the decision to keep that change are still cascading out, e.g. miyagawa/Task-Plack#3.
The still-unreverted patch was also reported (in perl #123687) to affect mod_perl 2, where it was fixed as documented in RT #101962, resulting in patch 82827132efd3. The solution was to increase the complexity of the hooking code, so that it can inject the same workaround as before but into a different point of the internals where it has a more limited scope. Contrast the amount of volunteer effort with the 4kb per process savings.
In all this, mod_perl 1 went entirely unaccounted for. It is unknown whether any users have tried to use it with perl 5.22, which can be expected not to work. Note also that c910fead7893 removes a comment which mentions that some SWIG bindings require the ability to hook those internals which just got removed.
People have been speaking of Marc Lehmann “refusing” to support 5.22 in Coro. It has been suggested that he could try to find a workaround.
But for one, if it is fixable, it is highly non-obvious how. The way mod_perl 2 was fixed does not appear applicable to Coro due to difference in the specifics of the two situations. This is implicitly acknowledged by Corion’s comment in perl #125439:
I think Perl should take the code from Coro to make
… and further evidenced by the fact that the unofficial release of Coro does not have a workaround despite the expertise of Reini Urban.
And for another, even if a workaround were possible, consider that Coro was working around a bug to begin with. That workaround got broken. It is not clear that a different workaround would be future-proof, outside of p5p blessing one as such. Is it on Coro’s author to keep working around broken workarounds for the same bug whenever p5p breaks another one? Or is it on p5p to either fix the underlying bug or else avoid breaking its workarounds?
To be fair, the monkeypatching performed by Coro and mod_perl hooking into that table is very global: it affects every perl interpreter within a process (of which there can be several when perl is embedded in another program). Locking down this hook point is a sensible architectural decision in the abstract. This is rumoured to have been the reason the patch never got reverted. Indeed, for most use cases, it is already possible to hook into other points with more limited scope. This is how mod_perl was saved. But it’s not clear that every existing use case already has cleaner alternatives, and even the ones available are not currently well understood – not even by the developers who locked down the hook point. Consequently there was no attempt to help downstream developers who had previously employed monkeypatches: no attempt was made to survey existing use cases, no documentation was added about clean and future-proof alternatives to existing monkeypatches, no helper functions to make those approaches more approachable… no aid of any kind. Such efforts would have been part of a carefully executed plan to promote cleaner coöperation among low-level extensions. Instead the commit message demonstrates the motivation for the patch prior to ex post facto justifications of its benefit.
Note this alternative sequence in which events could have played out:
When the 4kb-saving patch is found to break Coro and mod_perl, it gets reverted for the time being
A report about the underlying imperfect hook synchronisation bug is filed and a fix is written
A new major version of perl with the fix is released
The author of Coro is approached to say “your workaround has been made unnecessary in perl 5.$recently and later”
When the decision is made that this patch shall be re-applied, an effort is made to prepare the ecosystem (the surveying and documentation as mentioned above), during which the author of Coro is approached to say “the workaround should now be conditionalised because it will break in 5.$soon”
Nothing in that sequence is infeasible. It would have delayed the 4k-saving patch for a few releases, and in exchange it would have prevented any and all suffering inflicted on at least the maintainers and users of software built on Coro.
I close with a recent quote from the pumpking of Perl 5:
perlpolicy […] demands that we keep backward compatibility as much as possible. This is a huge deal for Perl 5, for a lot of reasons.
For one, it means that we don't knowingly break things unless we believe there's a good reason – and we’ve been trying to tighten up “good reason” a bit over time. This means that it's almost always safe to upgrade your perl without unpleasant surprises. The “incompatible changes” section of the changelog file should be short and fairly exhaustive.