September 2016 Archives

Cleaning up the IDs in a FASTA file

By Ken Youens-Clark on September 20, 2016 8:00 PM

I have some FASTA files with headers like this:

>gi|83274083|ref|AC_000032.1| Mus musculus strain mixed chromosome 10, alternate assembly Mm_Celera, whole genome shotgun sequence

I wanted to extract just the second pipe-delimited field, so here's a Perl 6 script to do that:

#!/usr/bin/env perl6

use File::Temp;

sub MAIN (*@files) {
    my $i = 0;

    for @files -> $file {
        my ($tmpfile, $tmpfh) = tempfile();
        printf "%3d: %s\n", ++$i, $file.IO.basename;

        for $file.IO.lines -> $line {
            if $line.substr…
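The excerpt stops before the header-handling logic, so the following is not the author's code. It is only a minimal sketch of the idea described above, assuming that "fields" means the pipe-delimited parts of the header and that header lines start with `>`. The helper name `clean-header` is hypothetical.

```perl6
#!/usr/bin/env perl6

# Hypothetical helper (not from the original post): reduce a FASTA header
# to its second pipe-delimited field. Sequence lines, and headers that
# have fewer than two fields, are returned unchanged.
sub clean-header (Str $line --> Str) {
    # Only header lines (those starting with '>') are rewritten.
    return $line unless $line.starts-with('>');

    # Drop the leading '>' and split the rest on the pipe character.
    my @fields = $line.substr(1).split('|');

    # Keep the second field if there is one; otherwise leave the header alone.
    return @fields.elems > 1 ?? '>' ~ @fields[1] !! $line;
}

# prints ">83274083"
say clean-header('>gi|83274083|ref|AC_000032.1| Mus musculus strain mixed chromosome 10');
```

In a full script, each line of the input file would go through a function like this and be written to the temporary file, which would then replace the original.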


About Ken Youens-Clark

I work for Dr. Bonnie Hurwitz at the University of Arizona where I use Perl quite a bit in bioinformatics and metagenomics. I am also trying to write a book at https://www.gitbook.com/book/kyclark/metagenomics/details. Comments welcome.


