Oh column, where art thou?

When ramiroencinas added FileSystem::Capacity::VolumesInfo to the Perl 6 ecosystem, I noticed that it had no macOS support. While trying to contribute to this module, I discovered how lesser-known Perl 6 features can save the day. The FileSystem::Capacity::VolumesInfo module parses the output of the df command, which looks like this:

$ df -k -P
Filesystem                                  1024-blocks       Used Available Capacity  Mounted on
/dev/disk3                                   1219749248  341555644 877937604    29%    /
devfs                                               343        343         0   100%    /dev
/dev/disk1s4                                  133638140  101950628  31687512    77%    /Volumes/Untitled
map -hosts                                            0          0         0   100%    /net
map auto_home                                         0          0         0   100%    /home
map -fstab                                            0          0         0   100%    /Network/Servers
//Pawel%20Pabian@biala-skrzynka.local./Data  1951417392 1837064992 114352400    95%    /Volumes/Data
/dev/disk5s2                                 1951081480 1836761848 114319632    95%    /Volumes/Time Machine Backups
bbkr@localhost:/Users/bbkr/foo 123           1219749248  341555644 877937604    29%    /Volumes/osxfuse

(If you see wrapped or trimmed output, check the raw one here.)

And while this looks nice to humans, it is a tough task for a parser.

  • Columns have dynamic width - so values cannot be extracted using substring with hardcoded positions.
  • Columns are space separated and space padded, and their values can contain spaces - so values cannot be extracted by splitting on whitespace.
  • Filesystem names have different escaping patterns.
  • Some columns are left-aligned, some are right-aligned, and one is centered.
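To make the whitespace problem concrete, here is a minimal sketch. It uses two rows taken from the df output above with the padding collapsed to single spaces for brevity; the variable names are only for illustration.

```raku
# Two rows from the df output above, padding collapsed to single spaces.
my @rows =
    '/dev/disk3 1219749248 341555644 877937604 29% /',
    'bbkr@localhost:/Users/bbkr/foo 123 1219749248 341555644 877937604 29% /Volumes/osxfuse';

for @rows -> $row {
    # .words splits on runs of whitespace
    my @fields = $row.words;
    say @fields.elems, ' fields, second field: ', @fields[1];
}
```

The first row splits into the expected six fields, but the second one yields seven, and the "123" that belongs to the filesystem name lands where the volume size should be.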

So let's use Perl 6 features to deal with this mess.

Capture command line output.

my ($header, @volumes) = run('df', '-k', '-P', :out).out.lines;

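The run routine executes an external command (without involving a shell) and returns a Proc object. Because of the :out argument, the out method of that object gives a pipe from which the command output can be read. Method lines splits this output into lines; the first one goes to the $header variable, the remaining ones to the @volumes array.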
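The same destructuring can be tried without calling df at all. The sketch below assumes a small hand-written sample (with padding collapsed) instead of real command output:

```raku
# A hand-written stand-in for df output, not real command output.
my $sample = q:to/END/;
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/disk3 1219749248 341555644 877937604 29% /
devfs 343 343 0 100% /dev
END

# The first line lands in the scalar, the array slurps the rest.
my ($header, @volumes) = $sample.lines;

say $header;
say @volumes.elems;   # 2
```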

Parse header.

my $parsed_header = $header ~~ /^
    ('Filesystem')
    \s+
    ('1024-blocks')
    \s+
    ('Used')
    \s+
    ('Available')
    \s+
    ('Capacity')
    \s+
    ('Mounted on')
$/;

We do this because the match object keeps each capture, and each capture knows the from and to positions at which it matched, for example:

say $parsed_header[1].Str;
say $parsed_header[1].from;
say $parsed_header[1].to;

This prints:

1024-blocks
44
55

That will help us a lot with dynamic column widths!

Extract row values.

First we have to look at the border between the Filesystem and 1024-blocks columns. Because Filesystem is left-aligned and 1024-blocks is right-aligned, data from both columns can occupy the space between those headers, for example:

Filesystem                      1024-blocks
/dev/someverybigdisk        111111111111111
me@host:/some directory 123    222222222222
         |                      |
         |<----- possible ----->|
         |<--- border space --->|

We cannot simply split on spaces. But we know where the 1024-blocks column ends, so the number that ends at the same position is our volume size. To extract it, we can use another useful Perl 6 feature - the regex position anchor.

for @volumes -> $volume {
    $volume ~~ / (\d+) <.at($parsed_header[1].to)> /;
    say 'Volume size is ' ~ $/[0] ~ 'KB';
}

That finds a sequence of digits that ends exactly at the end position of the header. Every other column can be extracted using this trick as long as we know how its data is aligned.

$volume ~~ /
    # first column is not used by module, skip it
    \s+

    # 1024-blocks, right aligned
    (\d+) <.at($parsed_header[1].to)>

    \s+

    # Used, right aligned
    (\d+) <.at($parsed_header[2].to)>

    \s+

    # Available, right aligned
    (\d+) <.at($parsed_header[3].to)>

    \s+

    # Capacity, middle aligned, never longer than header
    <.at($parsed_header[4].from)>
        \s* (\d+ '%') \s*
    <.at($parsed_header[4].to)> 

    \s+

    # Mounted on, left aligned
    <.at($parsed_header[5].from)>(.*)
$/;
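To see the position anchor in isolation, here is a small sketch that uses the two sample rows from the border illustration earlier. The strings are typed by hand to mimic df alignment, and the variable names are only for illustration.

```raku
# Header and rows from the border illustration, aligned like df output.
my $header = 'Filesystem                      1024-blocks';
my @rows =
    '/dev/someverybigdisk        111111111111111',
    'me@host:/some directory 123    222222222222';

# Remember where the right-aligned column ends in the header.
my $m   = $header ~~ / ('Filesystem') \s+ ('1024-blocks') /;
my $end = $m[1].to;

my @sizes;
for @rows -> $row {
    # Only a digit run that stops exactly at $end is accepted,
    # so the "123" inside the filesystem name is skipped.
    if $row ~~ / (\d+) <.at($end)> / {
        @sizes.push: ~$0;
        say ~$0;
    }
}
```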

Profit!

By using header name positions and position anchors in a regular expression, we get a bomb-proof df parser for macOS that works with regular disks, pen drives, NFS / AFS / FUSE shares, weird directory names and different escaping schemes.
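Putting the pieces together, the whole approach might be sketched as below. This is only an illustration and not the module's actual code: the parse-df name and the hash keys are hypothetical, and the sample text is three rows copied from the df output at the top of the post.

```raku
# Hypothetical helper, not the module's real API.
sub parse-df(Str $header, @volumes) {
    my $h = $header ~~ /^
        ('Filesystem') \s+ ('1024-blocks') \s+ ('Used') \s+
        ('Available')  \s+ ('Capacity')    \s+ ('Mounted on')
    $/;
    die "Unexpected df header: $header" unless $h;

    my @parsed;
    for @volumes -> $volume {
        my $m = $volume ~~ /
            \s+
            (\d+) <.at($h[1].to)>      # 1024-blocks, right aligned
            \s+
            (\d+) <.at($h[2].to)>      # Used, right aligned
            \s+
            (\d+) <.at($h[3].to)>      # Available, right aligned
            \s+
            <.at($h[4].from)> \s* (\d+) '%' \s* <.at($h[4].to)>
            \s+
            <.at($h[5].from)> (.*)     # Mounted on, left aligned
        $/;
        next unless $m;                # skip lines that do not fit

        @parsed.push: %(
            size-kb      => +$m[0],
            used-kb      => +$m[1],
            available-kb => +$m[2],
            capacity-pct => +$m[3],
            mounted-on   => ~$m[4],
        );
    }
    return @parsed;
}

# Sample rows copied from the df output shown earlier.
my $sample = q:to/END/;
Filesystem                                  1024-blocks       Used Available Capacity  Mounted on
/dev/disk3                                   1219749248  341555644 877937604    29%    /
/dev/disk5s2                                 1951081480 1836761848 114319632    95%    /Volumes/Time Machine Backups
bbkr@localhost:/Users/bbkr/foo 123           1219749248  341555644 877937604    29%    /Volumes/osxfuse
END

my ($header, @volumes) = $sample.lines;

for parse-df($header, @volumes) -> $v {
    say $v<mounted-on>, ' is ', $v<capacity-pct>, '% full';
}
```

Returning plain hashes keeps the sketch short; a real module would probably want its own return structure and some handling for lines that do not match.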


About Pawel bbkr Pabian
