How I didn't fix AnyEvent::ForkManager

For April, I was assigned AnyEvent::ForkManager, which claims to provide an interface similar to Parallel::ForkManager, but compatible with AnyEvent. The module had some CPAN testers’ failures as well as an issue reported on GitHub, so I tried to fix it. I wasn’t quite successful, though.

The issue reported that the module’s tests hung on MSWin. At work, I use Cygwin, so I tried to install the module there to see how the Linux/MSWin hybrid would do. I was able to install all prerequisites, including the main one, AnyEvent, “a framework to do event-based programming”. Nevertheless, the tests for the module itself got stuck, albeit in a different place than the one reported in the issue.

I sprinkled the code with debugging messages (Basic debugging checklist #2) and discovered that the following line didn’t return:

  isnt $$, $pm->manager_pid, 'called by child';

At first, I thought that manager_pid was the problem, so I extracted the call from the statement:

  my $mpid = $pm->manager_pid;
  isnt $$, $mpid, 'called by child';

Surprisingly, $mpid was populated correctly; it was the isnt that didn’t return. That seemed very suspicious: isnt is used in test suites all over CPAN, so it shouldn’t cause problems! Or maybe the isnt wasn’t the isnt I thought it was. I checked the dependencies and discovered Test::SharedFork, which defines its own testing subroutines. Adding some debugging output to it revealed the real problem in the constructor of a Test::SharedFork::Store::Locker object:

  flock $store->{fh}, LOCK_EX or die $!;

The flock was waiting for the exclusive lock indefinitely. Just out of curiosity, I inserted the following before the problematic line:

  use Data::Dumper;
  $Data::Dumper::Deparse = 1;
  warn Dumper($store);

Strangely, not only was I able to explore the structure, but all the tests passed. “Race condition!” thought I, and tried replacing the lines with Time::HiRes::usleep(200). The tests were still passing, but when I lowered the value, they started to hang again.

Race conditions appear only sometimes, so I tried running the test suite 50 times on my Linux desktop. It failed 7 times with the following detail:

Interrupted system call at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 104.
Can't use an undefined value as a HASH reference at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 51.
END failed--call queue aborted at xt/nonblocking.t line 104.
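Repeating a test run many times to catch an intermittent failure is easy to script. The following is only a sketch: count_failures is a hypothetical helper, and the prove invocation in the comment is an assumed example, not necessarily the command used for the numbers above.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command $runs times and count the runs
# that exit with a non-zero status (or fail to start at all).
sub count_failures {
    my ($runs, @cmd) = @_;
    my $failed = 0;
    for my $run (1 .. $runs) {
        system(@cmd);
        $failed++ if $? != 0;
    }
    return $failed;
}

# Assumed usage: pass the real test command on the command line, e.g.
#   perl repeat.pl prove -l xt/nonblocking.t
if (@ARGV) {
    my $runs   = 50;
    my $failed = count_failures($runs, @ARGV);
    print "$failed of $runs runs failed\n";
}
```

Counting exit statuses only tells how often the run fails; the error text still has to be read from the output of the failing runs.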

On my laptop, the failures were less frequent (about 2/50), and sometimes, the message was different:

Interrupted system call at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 104.
Magic number checking on storable file failed at /usr/lib/perl5/5.18.1/x86_64-linux-thread-multi/Storable.pm line 398, at /home/choroba/perl5/lib/perl5/Test/SharedFork/Store.pm line 51.
END failed--call queue aborted at xt/nonblocking.t line 104.

Line 104 in Store.pm is the flock line shown above.
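“Interrupted system call” is the text of the EINTR error, which a blocking call such as flock can return when a signal arrives while it waits. A common way to cope with EINTR is to retry the call. The sketch below only illustrates that idea: flock_retry is a hypothetical helper, not part of Test::SharedFork, and nothing here shows that it would cure the hang described above.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use Errno qw(EINTR);
use File::Temp qw(tempfile);

# Hypothetical helper (an assumption, not the module's code):
# retry flock as long as it fails only because of EINTR.
sub flock_retry {
    my ($fh, $mode) = @_;
    until (flock $fh, $mode) {
        # Any error other than an interrupted call is fatal.
        die "flock failed: $!" unless $! == EINTR;
    }
    return 1;
}

# Demonstration on a temporary file that is removed at exit.
my ($fh, $filename) = tempfile(UNLINK => 1);
flock_retry($fh, LOCK_EX);   # take the exclusive lock
print {$fh} "data\n";
flock_retry($fh, LOCK_UN);   # release it again
```

The retry covers the “Interrupted system call” symptom only; the later messages from Store.pm line 51 suggest the shared file was also read in an inconsistent state, which a retry alone would not explain.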

Unfortunately, I didn’t have enough time to debug this further. It was already the end of April, so I had to ask to “Stick” with the assignment. To get rid of it and get my May assignment, I just fixed some typos in the documentation, removed use utf8 where it wasn’t needed, and replaced select undef, undef, undef with Time::HiRes::usleep, especially because Time::HiRes was already used.
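For reference, the four-argument form of select with three undefs simply sleeps for the given number of seconds, and Time::HiRes::usleep does the same with a value in microseconds. A minimal sketch, assuming an arbitrary 50 ms delay (the delay used in the module is not shown here):

```perl
use strict;
use warnings;
use Time::HiRes qw(usleep time);

# Old idiom: four-argument select used purely as a sub-second sleep.
my $start = time;
select undef, undef, undef, 0.05;
my $select_elapsed = time - $start;

# Replacement: the same delay expressed in microseconds.
$start = time;
usleep(50_000);
my $usleep_elapsed = time - $start;

printf "select: %.3f s, usleep: %.3f s\n", $select_elapsed, $usleep_elapsed;
```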

About E. Choroba

I blog about Perl.