CPAN Testers & pre-requisite reporting

Seeing as b.p.o won't let me comment on others' blog entries, I'm posting here.

This is in reply to brian's recent post about a trial version of Test::More.

Firstly, @preaction: CPANTS != CPAN Testers. They are two very different projects.

Secondly, this is a conversation that has cropped up before, and I'm still in two minds about it. Short answer: I tend to side with brian. The tester platform shouldn't be doing any testing with trial/development releases of pre-requisites, unless the tester is going to manually filter the results and send them to the pre-requisite author where appropriate. I do understand Ether's perspective too, and there is merit in having these reports, but more often than not they target the wrong author.

Longer answer: The difficulty we have once the reports hit the Metabase is that the parsing into the CPAN Testers databases cannot know whether the test failures are because of the pre-requisite or the distribution being tested. The fault could be in the pre-requisite, but equally, as Ether notes, the pre-requisite could be the next release and the distribution being tested may make assumptions that are no longer true. The analysis site, run by Andreas, could possibly highlight this with a suitable corpus of reports, but it too wouldn't necessarily know whether the pre-requisite or the distribution being tested is at fault. It takes a human to determine that.

I want CPAN Testers to be helpful to authors, and in this situation it isn't. The wrong author is being alerted, and for the right author to be notified, the wrong author has to take the time to feed back to the pre-requisite author. It's a step I don't think we should be asking authors to take. In the short term, the solution is to ask testers not to test with trial/development pre-requisites. This could mean ensuring smokers test in a clean environment, or excluding any trial/development releases if you test in batches.
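
Spotting a trial/development pre-requisite is usually straightforward, since PAUSE conventions mark developer releases with an underscore in the version and trial releases with a -TRIAL suffix on the tarball. A minimal sketch of such a check, purely illustrative and not part of any existing smoker client:

```perl
use strict;
use warnings;

# Decide whether an installed pre-requisite looks like a trial/development
# release, so a smoker could skip testing (or reporting) against it.
sub looks_like_dev_release {
    my ($version, $tarball) = @_;
    return 1 if defined $version && $version =~ /_/;                             # e.g. 1.301001_070
    return 1 if defined $tarball && $tarball =~ /-TRIAL\.(?:tar\.gz|tgz|zip)$/;  # e.g. Foo-1.23-TRIAL.tar.gz
    return 0;
}

print looks_like_dev_release('1.301001_070', undef) ? "skip\n" : "test\n";
```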

One longer term solution would be to add more logic in the smoker client to re-attribute the report to the pre-requisite. This would require quite a bit of work, and I'm not sure how easy it would be, but it might be worth someone investigating one of the clients to see what could be done. Another solution would be to add logic to the backend parser, which I plan to abstract out to make it easier to use in more applications. However, both of these solutions could still attribute reports to the wrong author, and we would be back in the same position, just from a different perspective.
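
To make the re-attribution idea concrete, here is a minimal sketch, assuming the parser can already extract the report's prerequisites section as module/version pairs; the helper and data layout are hypothetical, not part of any existing client or backend:

```perl
use strict;
use warnings;

# Any prerequisite installed from a developer release (underscore in the
# version) is a candidate for receiving the report instead of the tested
# distribution. A human, or further heuristics, would still need to confirm.
sub reattribution_candidates {
    my (%prereqs) = @_;
    return grep { ($prereqs{$_} // '') =~ /_/ } sort keys %prereqs;
}

my @candidates = reattribution_candidates(
    'Test::More' => '1.301001_070',
    'Exporter'   => '5.70',
);
print "Consider re-attributing to: @candidates\n" if @candidates;
```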

The Admin site now allows authors/testers to mark reports to be ignored by the system. I will look to see what can be done so that the system can re-attribute reports rather than ignore them. It's still fixing the problem in the wrong place in my opinion, and we're still relying on authors to do the work, but at least we won't lose the reports.

In summary, I think testers have a responsibility to direct reports to the right author, and if they aren't willing to do that, they should stop testing with trial/development releases as pre-requisites.

8 Comments

The problem with these proposed solutions is that they assume that the release at fault is always going to be a trial one.

I could release a seriously broken version of Exporter::Tiny as a stable release, and break a couple of hundred downstream dependencies, resulting in them each getting dozens of FAIL reports before I noticed my error and uploaded a fixed release.

The fact of the matter is that if my distribution has any dependencies at all (and I'm even counting Perl itself here in the extreme case), a FAIL report might be the result of a problem in any of my dependencies. (Or any of their dependencies!)

I think a much more robust solution would be to introduce an extra "supertype" to the results. Currently we have PASS FAIL NA UNKNOWN, which can be relatively painlessly extended to REGULAR.PASS REGULAR.FAIL ... DEVREL.PASS DEVREL.FAIL...

The "DEVREL" prefix will be appended as soon as there is *anything* in the chain that is, well - a devrel as understood by PAUSE. Then filtering these results in the consumers will be beyond trivial, while the metabase infrastructure itself won't need much tweaking at all.

> The "DEVREL" prefix will be appended as soon as there is *anything* in the chain

I like this a lot, and it's similar to what I was thinking too. However, I realize now there's a problem here -- the release status of a module can't always be determined from the module itself (at least, not from the $VERSION); you need the distribution metadata to determine that.
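
For distributions that ship v2 metadata, the release status is recorded in the distribution's META.json; a minimal sketch, assuming that file is available locally for the installed pre-requisite:

```perl
use strict;
use warnings;
use CPAN::Meta;

my $meta = CPAN::Meta->load_file('META.json');

# CPAN::Meta::Spec v2 defines release_status as stable, testing or unstable.
my $status = $meta->release_status // 'stable';
print "Not a stable release ($status)\n" if $status ne 'stable';
```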

There is one (relatively) simple way to handle misdirected failures, however -- allow a mechanism to redirect them. We've already got nearly that, with the new admin site -- just modify the interface to allow redirecting the report back to the tester, *or* to the author of a different module. I know I've certainly received FAIL reports where it's determinable from the test output that some other module was at fault, so it would be nice to be able to send that report directly to that module's queue instead.

I agree with Peter Rabbitson's solution; I left a comment to the same effect in the original post.

I am the one making these Test::More changes, and I would have been a lot more reluctant to make these alpha releases if I had known it would give other people red marks. I myself judge modules based on red vs green counts.

I still maintain that "oh, changing the set of test tags beyond the current 4 is too hard" is a dead-end strategy. In order to remain relevant, the cpantesters infrastructure must allow for extra test-type additions anyway - the current set is too limited.

In addition, I do not believe (given I know and admire the skill of everyone involved) that the current stack is written so badly that it would be a monumental effort to allow for "super-namespacing" of the test results. At worst it should be a couple of days' slog, perfect for a hackathon... if only there were an upcoming one ;)

The workarounds otherwise proposed in this thread are... well - workarounds for the real problem.

Isn't this discussion really just about the limitations we have, as individuals and as a community? The way I see it, we have microcosms of people, modules, and interactions between individual entities. Whenever some combination of those doesn't work, we get some sort of problem and impact, and what we do about it is watch the bad interactions carefully so we can counter-steer. More tags? What a brilliant idea, if everybody could tag in advance what a release would break. But why release at all, then? If people could tag their software as broken in advance, why would they not fix their crap immediately? It's because we cannot predict everything that we have a system like CPAN Testers, so we can watch what actually happens and repair faulty items as we identify them. If there is anything we need to change, it would be to calculate more permutations, not fewer. But that would again be just a discussion about our limited resources.


About CPAN Testers

This is the new account for incidental and summary updates to what's happening with the CPAN Testers. For all the latest news and views please see our blog.