Perl 6 IO TPF Grant: Monthly Report (March, 2017)

This document is the March, 2017 progress report for the TPF Standardization, Test Coverage, and Documentation of Perl 6 I/O Routines grant.

Timing

I delivered the Action Plan one week later than I originally expected. The delay let me assess some of the big-picture consistency issues, which led to a proposal to remove 15 methods from IO::Handle and to iron out the naming and argument format of several other routines.

I still hope to complete all the code modifications before the end of the weekend of April 15, so all of these can be included in the next Rakudo Star release. A week after that, I plan to complete the grant.

Note: to minimize user impact, some of the changes may be included only in the 6.d language, which will be available in the 2017.04 release only if the user uses the use v6.d.PREVIEW pragma.

IO Action Plan

I finished the IO Action Plan, placed it into /doc of rakudo's repository, and made it available to other core devs for review for a week (period ends on April 1st). The Action Plan is 16 pages long and contains 26 sections detailing proposed changes.

Overall, I proposed many more small changes and fewer large, breaking changes than I originally expected. This comes from a much better understanding of how rakudo's IO routines are "meant to" be used, so I think the improvement of the documentation under this grant will be much greater than I originally anticipated.

A lot of this has to do with a lack of explanatory documentation on how to manipulate and traverse paths. As a result, users were using the $*SPEC object (157 instances of its use in the ecosystem!) and its routines for that purpose, which is rather awkward. This concept is prevalent enough that I even wrote the SPEC::Func module in the past, due to user demand, and certain books whose draft copies I read used $*SPEC as well.

In reality, $*SPEC is an internal-ish thing and unless you're writing your own IO abstractions, you never need to use it directly. The changes and additions to the IO::Path methods done under this grant will make traversing paths even more pleasant, and the new tutorial documentation I plan to write under this grant will fully describe the Right Way™ to do it all.

In fact, removal of $*SPEC in future language versions is currently under consideration...

Removal of $*SPEC

lizmat++ pointed out that we can gain significant performance improvements by removing the $*SPEC infrastructure and moving it into module-space. For example, a benchmark of slurping a 10-line file shows that removing all the path processing code makes the benchmarked program run more than 14x faster. When benchmarking IO::Path creation, the dynamic variable lookup alone takes up 14.73% of the execution time.

The initial plan was to try to make IO routines handle all OSes in a unified way (e.g. using / on Windows); however, it was found that this would create several ambiguities and would be buggy, even if fast.

However, I think there are still a lot of improvements that can be gained by making $*SPEC infrastructure internal. So we'd still have the IO::Spec-type modules but they'll have a private API we can optimize freely, and we'll get rid of the dynamic lookups, consolidate what code we can into IO::Path, while keeping the functionality that differs between OSes in the ::Spec modules.

Since this all sounds like guesstimation and there's a significant-ish use of $*SPEC in the ecosystem, the plan now is to implement it all in a module first and see whether it works well and offers any significant performance improvements. If it does, I believe it should be possible to swap IO::Path to use the fast version in the 6.d language, while still leaving the $*SPEC dynvar and its modules in core, as deprecated, for removal in 6.e.

This won't be done under this grant, and while trying not to over-promise, I hope to release this module some time in May-June. So keep an eye out for it; I already picked out a name: FastIO.

newio Branch

As per the original goals of the grant, I reviewed the code in Rakudo's 2014–2015 newio branch to look for any salvageable ideas. I did not have any master-plan design documents to go with it, and of the few commits I tried, none both merged without conflicts and compiled (it kept complaining about ModuleLoader). My understanding of the branch thus comes solely from reading the source code and may differ from what the original author intended.

The major differences between newio and Rakudo's current IO structure are the type hierarchy and the removal of $*SPEC. newio provides IO::Pathy and PIO roles, which are done by the IO::File, IO::Dir, IO::Local, IO::Dup, IO::Pipe, and IO::Huh classes that represent various IO objects. Rakudo's current system has fewer abstractions: IO::Path represents a path to an IO object and IO::Handle provides read/write access to it, with IO::Pipe handling pipes, and no special objects for directories (their contents are obtained via the IO::Path.dir method and their attributes are modified via IO::Path methods).

Since 6.d language is additive to 6.c language, completely revamping the type hierarchy may be challenging and messy. I'm also not entirely sold on what appears to be one of the core design ideas in newio: most of the abstractions are of IO objects as they were at the object instantiation time. An IO::Pathy object represents an IO item that exists, despite there being no guarantees that it actually does. Thus, IO::File's .f and .e methods always return True, while its .d method always returns False. This undoubtedly gives a performance enhancement, however, if $ rm foo were executed after IO::File object's creation, the .e method would no longer return correct data and if then $ mkdir foo were executed, both .f and .d methods would be returning incorrect data.

Until recently, Rakudo cached the result of the .e call, and that produced behaviour users did not expect. I think the issue will be greatly exacerbated if this sort of caching is extended to entire objects and many of their methods.
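To make the staleness concern concrete, here is a minimal sketch written in Python as a language-neutral model. It is not newio or Rakudo code: the SnapshotFile class and the demo function are hypothetical names, and the class simply assumes that file-ness is recorded once at construction time, which is my reading of the design described above.

```python
import os
import tempfile


class SnapshotFile:
    """Hypothetical model: remembers what the path was at creation time."""

    def __init__(self, path):
        self.path = path
        # State is captured once here and never refreshed afterwards.
        self._is_file = os.path.isfile(path)
        self._is_dir = os.path.isdir(path)

    def f(self):
        # Answers from the snapshot, not from the filesystem.
        return self._is_file

    def d(self):
        return self._is_dir


def demo():
    """Return (cached f, cached d, live isfile, live isdir) after rm + mkdir."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo")
        with open(path, "w") as fh:
            fh.write("data")
        snap = SnapshotFile(path)  # "foo" is a regular file at this point
        os.remove(path)            # equivalent of `rm foo`
        os.mkdir(path)             # equivalent of `mkdir foo`
        return snap.f(), snap.d(), os.path.isfile(path), os.path.isdir(path)
```

After the file is replaced by a directory, the snapshot still reports "file, not directory" while live checks against the filesystem report the opposite. The trade-off is the one noted above: the snapshot avoids repeated filesystem calls, at the price of answers that can silently go out of date.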

However, I do think the removal of $*SPEC is a good idea. As described in the previous section, I will try to make a FastIO module, using ideas from the newio branch, for possible inclusion in future language versions.

Experimental MoarVM Coverage Reporter

As was mentioned in my grant proposal, the coverage reporter was busted by the upgrade of the information returned by the .file and .line methods on core routines. MasterDuke++ made several commits fixing numerous issues in the coverage parser, and last night I identified the final piece of the breakage. The annotations and hit reports all use the new SETTING::src/core/blah file format. The setting file has SETTING::src/core/blah markers inside of it. The parser, however, still thinks it's being fed the old gen/moar/CORE.setting filenames, so once I teach it to calculate proper offsets into the setting file, we'll have coverage reports on perl6.wtf back up and running and I'll be able to use them to judge the IO routine test coverage required for this grant.
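The offset calculation can be sketched roughly as follows, in Python as a language-neutral model rather than the actual coverage parser. The marker prefix "#line 1 " is an assumption made for illustration (the real marker format may differ), and the function names build_offsets and to_setting_line are hypothetical.

```python
MARKER = "#line 1 "  # assumed marker prefix; the real format may differ


def build_offsets(setting_lines):
    """Map each SETTING:: file name to the 1-based line of its marker."""
    offsets = {}
    for lineno, line in enumerate(setting_lines, start=1):
        if line.startswith(MARKER + "SETTING::"):
            # Everything after the prefix is taken as the file name.
            offsets[line[len(MARKER):].strip()] = lineno
    return offsets


def to_setting_line(offsets, filename, line):
    """Translate (SETTING:: file, line within it) to a setting-file line."""
    # Line 1 of a source file is the line right after its marker,
    # so the marker's own line number serves as the offset.
    return offsets[filename] + line
```

With a made-up five-line setting consisting of a marker for SETTING::src/core/a.pm, two lines of code, a marker for SETTING::src/core/b.pm, and one more line of code, line 1 of the second file maps to line 5 of the setting. The sketch assumes every source file is included exactly once and that its lines stay contiguous in the setting.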

Performance Improvements

Although not planned by the original grant, I was able to make the following performance enhancements to IO routines. So hey! Bonus deliverables \o/:

  • rakudo/fa9aa47 Make R::I::SET_LINE_ENDING_ON_HANDLE 4.1x Faster
  • rakudo/0111f10 Make IO::Spec::Unix.catdir 3.9x Faster
  • rakudo/4fdebc9 Make IO::Spec::Unix.split 36x Faster
    • Affects IO::Path's .parent, .parts, .volume, .dirname, and .basename
    • Measurement of first call to .basename shows it's now 6x-10x faster
  • rakudo/dcf1bb2 Make IO::Spec::Unix.rel2abs 35% faster
  • rakudo/55abc6d Improve IO::Path.child perf on *nix:
    • make IO::Path.child 2.1x faster on *nix
    • make IO::Spec::Unix.join 8.5x faster
    • make IO::Spec::Unix.catpath 9x faster
  • rakudo/4032953 Make IO::Handle.open 75% faster
  • rakudo/4eef6db Make IO::Spec::Unix.is-absolute about 4.4x faster
  • rakudo/ae5e510 Make IO::Path.new 7% faster when creating from Str
  • rakudo/0c6281 Make IO::Pipe.lines use IO::Handle.lines for 3.2x faster performance

Performance Improvements Made By Other Core Developers

lizmat++ also made these improvements in the IO area:

Along with the commits above, she also made IO::Handle.lines faster and eliminated a quirk that required a custom .lines implementation in IO::Pipe (which is a subclass of IO::Handle). Thanks to that, I was able to remove the old IO::Pipe.lines implementation and make it use the new-and-improved IO::Handle.lines, which made the method about 3.2x faster.

Bugs

Will (attempt to) fix as part of the grant

  • IO::Pipe inherits the .t method from IO::Handle to check if the handle is a TTY; however, an attempt to call it causes a segfault. MasterDuke++ already found the candidate for the offending code (MoarVM/Issue#561) and this should be resolved by the time this grant is completed.

Don't think I will be able to fix these as part of the grant

  • Found a strange error generated when IO::Pipe's buffer is filled up. This is too deep in the guts for me to know how to resolve yet, so I filed it as RT#131026.

Already Fixed

  • Found that IO::Path had a vestigial .pipe method that delegated to a non-existent IO::Handle method. Removed in rakudo/a01d67
  • Fixed IO::Pipe.lines not accepting a Whatever as limit, which is accepted by all other .lines. Fixed in rakudo/0c6281. Tests in roast/465795 and roast/add852
  • Fixed issues due to caching of IO::Handle.e. Reported as RT#130889. Fixed in rakudo/76f718. Tests in roast/908348
  • Rejected rakudo PR#666 and resolved RT#126262 by explaining why the methods return Str objects instead of IO::Path on ticket/PR and improving the documentation by fixing mistakes (doc/ccae74) and expanding (doc/3cf943) on what the methods do exactly.
  • IO::Path.Bridge was defunct, as it was trying to call .Bridge on Str, which does not exist. Resolved the issue by deleting this method in rakudo/212cc8
  • Per demand, made IO::Path.dir a multi, so module-space can augment it with other candidates that add more functionality. rakudo/fbe7ace

About Zoffix Znet

I blog about Perl.