Yet another FASTA something

Yes, I write a lot of scripts having to do with parsing FASTA. I want to put lots of example code out into the wild so people have things to read and copy. In this example, a student needed to take a subset of reads from a set of FASTA files as her assembly was failing due to excessive memory requirements. This script will process a list of FASTA files to take the first "n" (default 100,000) reads from each, placing the output file into "out-dir" (or the same directory as the input file). So as not to accidentally overwrite the original data, I append an integer (1, 2, ...) if the output file already exists.

#!/usr/bin/env perl6

sub MAIN (Str :$out-dir='', :$n=100_000, *@files) {
USAGE unless @files;

if ($out-dir.chars > 0 && !$out-dir.IO.d) {
mkdir $out-dir;
}

my $fnum = 0;
for @files -> $file {
unless $file.IO.f {
USAGE("file ($file) not a file");
}

my $dir = $out-dir || $file.IO.dirname;
my $outfile = $*SPEC.catfile($dir, $file.IO.basename);
while $outfile.IO.e {
state $f = 0;
$outfile = $*SPEC.catfile($dir,
$file.IO.basename ~ '.' ~ ++$f);
}

my $out = open $outfile, :w;
my $i = 0;
printf "%3d: %s\n", ++$fnum, $file.IO.basename;
for $file.IO.lines -> $line {
if $line.starts-with('>') {
last if ++$i > $n;
}

$out.put($line);
}
}

printf "Done, processed %s file%s.\n",
$fnum, $fnum == 1 ?? '' !! 's';
}

sub USAGE ($message='') {
printf "%sUsage:\n %s [-n=x] [--out-dir=x] file1 [file2...]\n",
$message ?? "$message\n" !! '',
$*PROGRAM.IO.basename;
exit 1;
}

Leave a comment

About Ken Youens-Clark

user-pic I work for Dr. Bonnie Hurwitz at the University of Arizona where I use Perl quite a bit in bioinformatics and metagenomics. I am also trying to write a book at https://www.gitbook.com/book/kyclark/metagenomics/details. Comments welcome.