Split amino acid and nucleotide sequences

A labmate had a FASTA file containing a mix of amino acid and nucleotide sequences that she wanted split into separate files. Here's a little script to do that. Again, I wish there were an easier way to get a file's basename without its extension.

#!/usr/bin/env perl6

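sub MAIN (Str $file) {
    # Build the output basename by stripping the input file's extension
    my $ext = $file.IO.extension;
    (my $base = $file.IO.basename) ~~ s/\.$ext$//;
    my $dna = open "$base.fna", :w;
    my $aa  = open "$base.faa", :w;
    # Use '>' as the record separator so each "line" is one FASTA record
    my $fh  = open $file, :r, nl-in => '>';

    for $fh.lines -> $rec {
        next unless $rec;    # skip the empty record before the first '>'
        my ($header, @seqs) = $rec.split(/\n/);
        my $seq = @seqs.join;
        # Only a, c, t, g, or n (any case) counts as nucleotide; the rest go to amino acid
        my $out = $seq ~~ m:i/^<[actgn]>+$/ ?? $dna !! $aa;
        $out.put(join("\n", ">$header", $seq));
    }

    # Close the handles so buffered output is written out
    .close for $fh, $dna, $aa;
    put "Done.";
}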
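
On the basename complaint: a possible shortcut, sketched here under the assumption that your Rakudo is recent enough for IO::Path.extension to accept a replacement string (that form postdates the script above, so check your install). The helper name base-without-ext is hypothetical and not part of the original script.

```raku
# Hypothetical helper, not from the original script.
# Assumes an IO::Path.extension that takes a replacement string:
# passing '' drops the last extension and returns a new IO::Path.
sub base-without-ext (Str $file --> Str) {
    $file.IO.extension('').basename;
}

put base-without-ext('data/seqs.fa');
```

If that form is available, it could stand in for the two lines that compute $ext and $base. Only the last extension is removed, so a name like seqs.fa.gz would keep its .fa.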

About Ken Youens-Clark

I work for Dr. Bonnie Hurwitz at the University of Arizona, where I use Perl quite a bit in bioinformatics and metagenomics. I am also trying to write a book at https://www.gitbook.com/book/kyclark/metagenomics/details. Comments welcome.