FASTQ to FASTA with Perl 6

#!/usr/bin/env perl6

sub MAIN (:$out-dir="", *@fastq) {
    # create the output directory if one was requested and is missing
    if ($out-dir.chars > 0 && ! $out-dir.IO.d) {
        mkdir $out-dir;
    }

    my $i = 0;
    for @fastq -> $fastq {
        # strip the final extension, e.g. "reads.fastq" -> "reads"
        (my $basename = $fastq.IO.basename) ~~ s/\.\w*?$//;
        my $out-file = $*SPEC.catfile(
            $out-dir || $fastq.IO.dirname, $basename ~ '.fa');
        printf "%3d: %s -> %s\n",
            ++$i, $fastq.IO.basename, $out-file;
        my $out-fh = open $out-file, :w;

        # take the lines four at a time: one FASTQ record per iteration
        for $fastq.IO.lines -> $header, $seq, $break, $qual {
            # skip first "@"
            $out-fh.print('>' ~ $header.substr(1) ~ "\n");
            $out-fh.print($seq ~ "\n");
        }

        $out-fh.close;
    }

    put "Done.";
}

The FASTQ format is one of the worst conceived in the history of bioinformatics, and that's saying something. The only sane FASTQ format uses 4 lines per sequence: a header starting with an "@" sign, the sequence, the header repeated but starting with a "+" (or just the "+"), and the quality scores (encoded as either Phred+33 or Phred+64). Here's a sample:

@HWI-ST885:65:C07WUACXX:7:2302:1866:196007 1:N:0:GCCAAT

What I thought would be fun to show off here is that you can read the contents of a list into more than one variable. Here I'd like to read four lines at a time, so I just read "lines" into four variables. How simple!
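
To see the four-at-a-time behavior in isolation, here is a minimal sketch that loops over an in-memory list instead of a file. The read names, sequences, and quality strings are made up for illustration, and @fasta is just a hypothetical place to collect the output.

```perl6
# Two hypothetical FASTQ records, four lines each (invented data).
my @lines = '@read1', 'ACGT', '+', 'IIII',
            '@read2', 'TTGA', '+', 'HHHH';

my @fasta;
# The pointy block declares four parameters, so each pass
# through the loop takes the next four items from the list.
for @lines -> $header, $seq, $break, $qual {
    # swap the leading "@" for ">" and keep only the sequence
    @fasta.push: '>' ~ $header.substr(1), $seq;
}

.put for @fasta;
```

One thing to keep in mind with this approach: the block expects exactly four items per pass, so a file whose line count is not a multiple of four (a truncated download, say) should make the last iteration fail rather than quietly produce a partial record.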



FWIW, you don't have to name variables that you don't use (but that you do want to "consume") in the for loop. So:

for $fastq.IO.lines -> $header, $seq, $, $ { ... }

is also perfectly valid.
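
A small sketch of the anonymous-parameter form on made-up data: the two bare "$" parameters still consume the "+" line and the quality line of each record, but nothing gets bound to a name. The hash %length-of is a hypothetical name used only for this example.

```perl6
# Two hypothetical FASTQ records (invented data).
my @lines = '@read1', 'ACGT',   '+', 'IIII',
            '@read2', 'TTGACC', '+', 'HHHHHH';

my %length-of;
# Only the header and the sequence are named; the last two
# items of each group of four are consumed and discarded.
for @lines -> $header, $seq, $, $ {
    %length-of{$header.substr(1)} = $seq.chars;
}

put .key ~ ': ' ~ .value for %length-of.sort;
```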


About Ken Youens-Clark

I work for Dr. Bonnie Hurwitz at the University of Arizona where I use Perl quite a bit in bioinformatics and metagenomics. I am also trying to write a book. Comments welcome.