How I didn't fix AnyEvent::ForkManager
For April, I was assigned AnyEvent::ForkManager, which claims to provide an interface similar to Parallel::ForkManager, but compatible with AnyEvent. The module had some CPAN testers’ failures as well as an issue reported on GitHub, so I tried to fix it. I wasn’t quite successful, though.
The issue reported that the module's tests hung on MSWin. At work, I use Cygwin, so I tried to install the module there to see how the Linux/MSWin hybrid would do. I was able to install all the prerequisites, including the main one, AnyEvent, “a framework to do event-based programming”. Nevertheless, the tests for the module itself got stuck, albeit in a different place than reported in the issue.
I sprinkled the code with debugging messages (Basic debugging checklist #2) and discovered that the following line didn’t return:
isnt $$, $pm->manager_pid, 'called by child';
At first, I thought that manager_pid was the problem, so I extracted the call from the statement:
my $mpid = $pm->manager_pid;
isnt $$, $mpid, 'called by child';
Surprisingly, $mpid was populated correctly; it was the isnt that didn’t return. It seemed very suspicious: it’s used in all the test suites on CPAN, so it shouldn’t cause problems! Or, maybe, the isnt wasn’t the isnt I thought. I checked the dependencies and discovered Test::SharedFork, which defines its own testing subroutines. Adding some debugging output to it revealed the real problem in the constructor of a Test::SharedFork::Store::Locker object:
flock $store->{fh}, LOCK_EX or die $!;
The flock was waiting for the exclusive lock indefinitely. Out of curiosity, I inserted the following before the problematic line:
use Data::Dumper;
$Data::Dumper::Deparse = 1;
warn Dumper($store);
Strangely, not only was I able to explore the structure, but all the tests passed. “Race condition!” I thought, and tried replacing the lines with Time::HiRes::usleep(200). The tests were still passing, but when I lowered the value, they started to hang again.
Race conditions show up only intermittently, so I ran the test suite 50 times on my Linux desktop. It failed 7 times with the following output:
Interrupted system call at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 104.
Can't use an undefined value as a HASH reference at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 51.
END failed--call queue aborted at xt/nonblocking.t line 104.
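The post doesn't show how the repeated runs were driven. A minimal sketch of a repeat-and-count loop follows; the helper name count_failures and the command passed on the command line are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command repeatedly and count the runs that exit with a non-zero status.
# count_failures() is a hypothetical helper; the post does not show the real loop.
sub count_failures {
    my ($runs, @command) = @_;
    my $failures = 0;
    for my $run (1 .. $runs) {
        # List-form system() bypasses the shell; $? is 0 only on success.
        system(@command);
        $failures++ if $? != 0;
    }
    return $failures;
}

# The command to repeat is taken from the command line, e.g.:
#   perl repeat.pl prove -l xt/nonblocking.t
if (@ARGV) {
    my $failed = count_failures(50, @ARGV);
    print "$failed of 50 runs failed\n";
}
```

Note that a run that hangs instead of failing would block this loop, so it only helps on platforms where the test actually terminates.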
On my laptop, the failures were less frequent (about 2/50), and sometimes, the message was different:
Interrupted system call at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 104.
Magic number checking on storable file failed at /usr/lib/perl5/5.18.1/x86_64-linux-thread-multi/Storable.pm line 398, at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 51.
END failed--call queue aborted at xt/nonblocking.t line 104.
Line 104 in Store.pm is the flock line shown above.
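“Interrupted system call” is the error text for EINTR, which means a signal arrived while flock was blocked and the call returned false instead of taking the lock. The post doesn't establish which signal that was. One common way to handle this situation is to retry the call when $! is EINTR. The sketch below shows the pattern; lock_exclusive is a hypothetical helper, not part of Test::SharedFork, and whether such a retry would cure the hang described above is untested.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Errno qw(EINTR);

# Take an exclusive lock, retrying when a signal interrupts the call.
# lock_exclusive() is a hypothetical helper, not part of Test::SharedFork.
sub lock_exclusive {
    my ($fh) = @_;
    until (flock $fh, LOCK_EX) {
        # Retry only on EINTR; any other error is treated as fatal.
        die "flock failed: $!" unless $! == EINTR;
    }
    return 1;
}
```

With this helper, the original statement would become lock_exclusive($store->{fh}), and only errors other than EINTR would still die.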
Unfortunately, I didn’t have enough time to debug this further. It was the end of April already, so I had to ask to “Stick” with the assignment. To get rid of it and get my May assignment, I just fixed some typos in the documentation, removed use utf8 where it wasn’t needed, and replaced select undef, undef, undef with Time::HiRes::usleep, especially because Time::HiRes was already used.