Split amino acid and nucleotide sequences
A labmate had a FASTA file containing a mix of amino acid and nucleotide sequences that she wanted split into separate files. Here's a little script to do that: nucleotide records go to a ".fna" file and everything else goes to a ".faa" file, both named after the input file. Again, I wish there were an easier way to get a file's basename without its extension.
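#!/usr/bin/env perl6

sub MAIN (Str $file) {
    # Strip the extension from the basename to name the output files
    my $ext = $file.IO.extension;
    (my $base = $file.IO.basename) ~~ s/\.$ext$//;
    my $dna = open "$base.fna", :w;
    my $aa  = open "$base.faa", :w;

    # Use '>' as the input separator so each "line" is one FASTA record
    my $fh = open $file, :r, nl-in => '>';

    for $fh.lines -> $rec {
        next unless $rec;    # skip the empty chunk before the first '>'
        my ($header, @seqs) = $rec.split(/\n/);
        my $seq = @seqs.join;
        # Only a/c/t/g/n (any case) counts as nucleotide; anything else is treated as protein
        my $out = $seq ~~ m:i/^<[actgn]>+$/ ?? $dna !! $aa;
        $out.put(join("\n", ">$header", $seq));
    }

    # Close the handles so buffered output is flushed to disk
    .close for $fh, $dna, $aa;
    put "Done.";
}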
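On the basename wish above: reasonably recent Rakudo releases let IO::Path's "extension" method take a replacement string, so passing an empty string drops the extension. This is a minimal sketch, assuming your Rakudo is new enough to support that form; the path "data/seqs.fasta" is a made-up example, not one from the labmate's data.

```perl6
# Assumption: IO::Path.extension accepts a replacement string
# (not available in early Perl 6 releases).
my $file = 'data/seqs.fasta';                  # hypothetical input path

# Replace the extension with nothing, then keep only the file name part
my $base = $file.IO.extension('').basename;

put $base;
```

If that works on your install, it could stand in for the "$ext" and "s///" lines in the script.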