Oh column, where art thou?
When ramiroencinas added FileSystem::Capacity::VolumesInfo to the Perl 6 ecosystem, I spotted that it had no macOS support. While trying to contribute to this module, I discovered how lesser-known Perl 6 features can save the day. The module works by parsing the output of the df command, which looks like this:
$ df -k -P
Filesystem                                  1024-blocks       Used Available Capacity  Mounted on
/dev/disk3                                   1219749248  341555644 877937604    29%    /
devfs                                               343        343         0   100%    /dev
/dev/disk1s4                                  133638140  101950628  31687512    77%    /Volumes/Untitled
map -hosts                                            0          0         0   100%    /net
map auto_home                                         0          0         0   100%    /home
map -fstab                                            0          0         0   100%    /Network/Servers
//Pawel%20Pabian@biala-skrzynka.local./Data  1951417392 1837064992 114352400    95%    /Volumes/Data
/dev/disk5s2                                 1951081480 1836761848 114319632    95%    /Volumes/Time Machine Backups
bbkr@localhost:/Users/bbkr/foo 123           1219749248  341555644 877937604    29%    /Volumes/osxfuse
(If the output looks wrapped or trimmed, check the raw version here.)
While this looks nice to humans, it is a tough task for a parser:
- Columns have dynamic widths, so values cannot be extracted by taking substrings at hardcoded positions.
- Columns are separated and padded with spaces, and their values can contain spaces, so values cannot be extracted by splitting on whitespace.
- Filesystem names use different escaping patterns.
- Some columns are left-aligned, some are right-aligned, and one is centered.
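To make the whitespace problem concrete, here is a minimal sketch that counts whitespace-separated fields in a few rows taken from the df output above, with the padding collapsed to single spaces. The helper name field-count is made up for this illustration.

```raku
# Count whitespace-separated fields in one df row (hypothetical helper).
sub field-count(Str $row --> Int) {
    # .words splits on runs of whitespace, like a naive parser would
    $row.words.elems
}

# Rows copied from the df output above, with padding collapsed.
my @rows =
    '/dev/disk3 1219749248 341555644 877937604 29% /',
    'map -hosts 0 0 0 100% /net',
    '/dev/disk5s2 1951081480 1836761848 114319632 95% /Volumes/Time Machine Backups';

# Six columns are expected, but values with embedded spaces yield more.
say field-count($_) ~ ' fields: ' ~ $_ for @rows;
```

Only the first row splits into the expected six fields. The other two yield seven and eight, and nothing in the split result says which pieces belong together.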
So let's use Perl 6 features to deal with this mess.
Capture command line output.
my ($header, @volumes) = run('df', '-k', '-P', :out).out.lines;
The run routine executes an external command (directly, without going through a shell) and returns a Proc object. The out method gives access to a pipe (IO::Pipe) from which the command's output can be read. The lines method splits this output into lines: the first goes to the $header variable, the remaining ones to the @volumes array.
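The same destructuring can be tried without running df at all. The sketch below uses a hypothetical canned string in place of the real command output, which is handy on systems where df prints a different layout.

```raku
# Hypothetical canned text standing in for the real output of `df -k -P`.
my $output = q:to/END/;
Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/disk3 1219749248 341555644 877937604 29% /
devfs 343 343 0 100% /dev
END

# List assignment: the scalar takes the first line, the array takes the rest.
my ($header, @volumes) = $output.lines;

say $header;          # the header line
say @volumes.elems;   # number of data rows
```

Because lines works the same way on a string and on a pipe, the rest of the parser does not need to know where the text came from.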
Parse header.
my $parsed_header = $header ~~ /^ ('Filesystem') \s+ ('1024-blocks') \s+ ('Used') \s+ ('Available') \s+ ('Capacity') \s+ ('Mounted on') $/;
We do this because the Match object keeps each capture, and each capture knows the from and to positions at which it matched. For example:
say $parsed_header[1].Str;
say $parsed_header[1].from;
say $parsed_header[1].to;
This prints:
1024-blocks
44
55
That will help us a lot with dynamic column widths!
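The positions are easy to check on a tiny header. This sketch uses a made-up two-column header, not real df output, and shows that to is the position right after the last matched character.

```raku
# Made-up header: 'Name', three spaces, 'Size'.
my $header = 'Name   Size';
my $m = $header ~~ / ^ ('Name') \s+ ('Size') $ /;

say $m[0].from;   # 0
say $m[0].to;     # 4
say $m[1].from;   # 7
say $m[1].to;     # 11

# from and to delimit the capture, so they can slice any other line
# at the same columns.
say $header.substr($m[1].from, $m[1].to - $m[1].from);   # Size
```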
Extract row values.
First we have to look at the border between the Filesystem and 1024-blocks columns. Filesystem is left-aligned and 1024-blocks is right-aligned, so data from both columns can occupy the space between those headers. For example:
Filesystem                      1024-blocks
/dev/someverybigdisk        111111111111111
me@host:/some directory 123    222222222222
         |                      |
         |<----- possible ----->|
         |<--- border space --->|
We cannot simply split the row on whitespace. But we know where the 1024-blocks column ends, so the number that ends at the same position is our volume size. To extract it, we can use another useful Perl 6 feature: the regex position anchor.
for @volumes -> $volume {
    $volume ~~ / (\d+) <.at($parsed_header[1].to)> /;
    say 'Volume size is ' ~ $/[0] ~ 'KB';
}
This finds the sequence of digits that ends exactly where the header ends. Every other column can be extracted with the same trick, as long as we know how its data is aligned.
$volume ~~ /

    # first column is not used by module, skip it
    \s+

    # 1024-blocks, right aligned
    (\d+) <.at($parsed_header[1].to)>

    \s+

    # Used, right aligned
    (\d+) <.at($parsed_header[2].to)>

    \s+

    # Available, right aligned
    (\d+) <.at($parsed_header[3].to)>

    \s+

    # Capacity, middle aligned, never longer than header
    <.at($parsed_header[4].from)> \s* (\d+ '%') \s* <.at($parsed_header[4].to)>

    \s+

    # Mounted on, left aligned
    <.at($parsed_header[5].from)>(.*)
$/;
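To try the whole approach without a Mac at hand, here is a self-contained sketch that runs the same regex against generated sample lines. The sprintf format is only my assumption about a layout that follows the alignment rules described above, not the real df implementation. The sub name parse-row and the hash keys are likewise invented for this example.

```raku
# Assumed layout for illustration: left-aligned name, three right-aligned
# numbers, an 8-wide capacity field, then the mount point.
my $fmt = '%-34s %11s %10s %9s %-8s  %s';

my $header = sprintf $fmt, 'Filesystem', '1024-blocks', 'Used',
                           'Available', 'Capacity', 'Mounted on';
my @volumes =
    sprintf($fmt, 'bbkr@localhost:/Users/bbkr/foo 123', '1219749248',
                  '341555644', '877937604', '   29%', '/Volumes/osxfuse'),
    sprintf($fmt, '/dev/disk5s2', '1951081480', '1836761848', '114319632',
                  '   95%', '/Volumes/Time Machine Backups');

# Same header regex as in the article.
my $h = $header ~~ / ^ ('Filesystem') \s+ ('1024-blocks') \s+ ('Used') \s+
                       ('Available') \s+ ('Capacity') \s+ ('Mounted on') $ /;

# Hypothetical helper: parse one row using the header capture positions.
sub parse-row(Match $h, Str $row) {
    $row ~~ /
        \s+
        (\d+) <.at($h[1].to)>       # 1024-blocks, right aligned
        \s+
        (\d+) <.at($h[2].to)>       # Used, right aligned
        \s+
        (\d+) <.at($h[3].to)>       # Available, right aligned
        \s+
        <.at($h[4].from)> \s* (\d+ '%') \s* <.at($h[4].to)>
        \s+
        <.at($h[5].from)> (.*)      # Mounted on, left aligned
    $ / or return Nil;

    # Key names are chosen for this sketch only.
    return %( size => +$0, used => +$1, available => +$2,
              capacity => ~$3, mount => ~$4 );
}

for @volumes -> $volume {
    my %info = parse-row($h, $volume) // next;
    say "%info<mount>: %info<size> KB total, %info<capacity> used";
}
```

The first sample row is the tricky one: its filesystem name ends with " 123", yet the size comes out as 1219749248 because only that number ends where the 1024-blocks header ends. The second row keeps its mount point with spaces in one piece.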
Profit!
By combining header name positions with position anchors in a regular expression, we get a bomb-proof df parser for macOS that works with regular disks, pen drives, NFS / AFS / FUSE shares, weird directory names, and different escaping schemes.