FASTQ to FASTA with Perl 6

By Ken Youens-Clark on September 20, 2016 9:07 PM

#!/usr/bin/env perl6

sub MAIN (:$out-dir="", *@fastq) {
    if ($out-dir.chars > 0 && ! $out-dir.IO.d) {
        mkdir $out-dir;
    }

    my $i = 0;
    for @fastq -> $fastq {
        # strip the file extension, e.g., "sample.fastq" -> "sample"
        (my $basename = $fastq.IO.basename) ~~ s/\.\w*?$//;
        my $out-file = $*SPEC.catfile(
            $out-dir || $fastq.IO.dirname, $basename ~ '.fa');

        printf "%3d: %s -> %s\n",
            ++$i, $fastq.IO.basename, $out-file;

        my $out-fh = open $out-file, :w;

        # read four lines (one FASTQ record) per iteration
        for $fastq.IO.lines -> $header, $seq, $break, $qual {
            # replace the leading "@" with ">"
            $out-fh.print('>' ~ $header.substr(1) ~ "\n");
            # .lines strips newlines, so add one back after the sequence
            $out-fh.print($seq ~ "\n");
        }

        $out-fh.close;
    }

    put "Done.";
}

The FASTQ format is one of the worst-conceived formats in the history of bioinformatics, and that's saying something. The only sane FASTQ format uses four lines per sequence: a header starting with an "@" sign, the sequence, a separator line starting with a "+" (either the header repeated or just the "+"), and the quality scores, one character per base, encoded as either Phred+33 or Phred+64. Here's a sample:

@HWI-ST885:65:C07WUACXX:7:2302:1866:196007 1:N:0:GCCAAT
GTAAATGATGATCTGCCGCCGCAGCTCCTTTTTTTCTTTCAAGGCCAATTCGGTAGGCTTCAGCTTGGCGGAGCTTTCAATCACAGCGGCAT
+
BBBFFAFAIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIIIIIIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
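To make the format description concrete, here is a small Perl 6 sketch that checks the basic shape of one four-line record and decodes a quality string into numbers. The sub names is-fastq-record and phred-scores are made up for this example, and the decoding assumes Phred+33 unless you pass :offset(64). It only checks the rules listed above, not every corner case of real-world FASTQ files.

```raku
# Hypothetical helper: check the four rules of a simple FASTQ record.
sub is-fastq-record (Str $header, Str $seq, Str $sep, Str $qual --> Bool) {
    $header.starts-with('@')            # header line begins with "@"
        && $sep.starts-with('+')        # separator begins with "+"
        && $qual.chars == $seq.chars    # one quality character per base
}

# Hypothetical helper: score = code point of the character minus the offset.
# Assumes Phred+33 by default; pass :offset(64) for Phred+64 files.
sub phred-scores (Str $qual, Int :$offset = 33) {
    $qual.comb.map({ .ord - $offset }).list
}

say is-fastq-record('@read1', 'ACGT', '+', 'IIII');   # True
say is-fastq-record('@read1', 'ACGT', '+', 'III');    # False

# First eight characters of the sample's quality line, read as Phred+33.
say phred-scores('BBBFFAFA');    # (33 33 33 37 37 32 37 32)
```

Under Phred+33 the long run of "I" characters in the sample works out to a score of 40 (ASCII 73 minus 33).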

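What I thought would be fun to show off here is that a "for" loop can take more than one element of a list on each pass: give the pointy block ("->") several parameters and each iteration consumes that many items. Here I want to read four lines at a time, one FASTQ record, so I just read "lines" into four variables. How simple!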
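Here is the same four-at-a-time loop run on an in-memory string instead of a file, so it is easy to see what each pass receives. The two records are invented for illustration; Str.lines, like IO.lines, splits on newlines and drops them.

```raku
# Two made-up FASTQ records held in a heredoc instead of a file.
my $fastq = q:to/END/;
@read1 desc
ACGT
+
IIII
@read2
GGCC
+
BBBB
END

my @fasta;
for $fastq.lines -> $header, $seq, $sep, $qual {
    # Each iteration binds the next four lines to the four parameters.
    @fasta.push('>' ~ $header.substr(1));
    @fasta.push($seq);
}

# prints ">read1 desc", "ACGT", ">read2", "GGCC", one per line
.say for @fasta;
```

If a file is truncated mid-record, the last pass gets fewer than four lines, and Rakudo should die with a "Too few positionals passed" error rather than silently dropping the partial record.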

2 comments

Liz | September 21, 2016 9:00 AM

Cool!

FWIW, you don't have to name variables that you don't use (but that you do want to "consume") in the for loop. So:

for $fastq.IO.lines -> $header, $seq, $, $ { ... }

is also perfectly valid.

Ken Youens-Clark replied to comment from Liz | September 26, 2016 11:55 PM

Wow, thanks, Liz! That is really cool.

Also, the title should have been "FASTQ to FASTA." Darn.

About Ken Youens-Clark

I work for Dr. Bonnie Hurwitz at the University of Arizona, where I use Perl quite a bit in bioinformatics and metagenomics. I am also trying to write a book at https://www.gitbook.com/book/kyclark/metagenomics/details. Comments welcome.
