Split amino acid and nucleotide sequences

By Ken Youens-Clark on October 5, 2016 11:43 PM

A labmate had a FASTA file containing a mix of amino acid and nucleotide sequences, and she wanted them split into separate files. Here's a little script to do that. Again, I wish there were an easier way to get a file's basename without its extension.

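#!/usr/bin/env perl6

sub MAIN (Str $file) {
    # Name the output files after the input file, minus its extension
    my $ext   = $file.IO.extension;
    (my $base = $file.IO.basename) ~~ s/\.$ext$//;

    my $dna   = open "$base.fna", :w;   # nucleotide records
    my $aa    = open "$base.faa", :w;   # amino acid records

    # Use '>' as the line separator so each "line" is one whole FASTA record
    my $fh    = open $file, :r, nl-in => '>';

    for $fh.lines -> $rec {
        # The (empty) text before the first '>' arrives as an empty record
        next unless $rec;

        # First line is the header; the rest are sequence lines to rejoin
        my ($header, @seqs) = $rec.split(/\n/);
        my $seq = @seqs.join;

        # Only A/C/G/T/N (any case) counts as nucleotide; anything else is protein
        my $out = $seq ~~ m:i/^<[actgn]>+$/ ?? $dna !! $aa;
        $out.put(join("\n", ">$header", $seq));
    }

    # Close every handle so buffered output is flushed to disk
    .close for $fh, $dna, $aa;

    put "Done.";
}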
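
On the basename wish: later Rakudo releases (newer than this post) let IO::Path's extension method take a replacement string, and an empty replacement drops the joining dot as well. The sketch below wraps that call in a hypothetical helper named base-name; it assumes one of those newer releases and is not what the script above uses.

```perl6
#!/usr/bin/env perl6

# Hypothetical helper: strip the last extension and any directory part.
# Assumes a Rakudo new enough to support IO::Path.extension($subst).
sub base-name (Str $file --> Str) {
    # An empty replacement also removes the dot: "sample1.fasta" -> "sample1"
    $file.IO.extension('').basename
}

put base-name('reads/sample1.fasta');   # sample1
```

With that in place, the two lines that compute $ext and $base in MAIN could collapse into a single call, and names with extra dots such as "seqs.v2.fa" still lose only the final extension.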

About Ken Youens-Clark

I work for Dr. Bonnie Hurwitz at the University of Arizona where I use Perl quite a bit in bioinformatics and metagenomics. I am also trying to write a book at https://www.gitbook.com/book/kyclark/metagenomics/details. Comments welcome.