Ken Youens-Clark

  • About: I work for Dr. Bonnie Hurwitz at the University of Arizona, where I use Perl quite a bit in bioinformatics and metagenomics. I am also trying to write a book at …; comments welcome.
  • Posted Closures, alternatives, map in Perl 6 to kyclark

    In a script I recently wrote, I employed a few features of Perl 6 that I'd like to highlight. I'm using Mash to create a distance matrix of samples (usually metagenomes or genomes) to each other, either in a…

  • Posted The MAIN Thing to kyclark

    If you're coming to Perl 6 from Perl 5, the global variable @*ARGS will be familiar to you as the place to get the command-line arguments to your program:

    $ cat main1.pl6
  • Posted Backticks and tests in Perl 6 to kyclark

    Perl was created for systems administration, and Perl 6 has all the chops you've come to expect from the brand. Here I needed to use MD5 checksums from my collaborator to verify that I downloaded all their data without errors. Each data "$file" has an accompanying "$file.md5" that looks like…

  • Posted Movie file reader to kyclark

    Last night I finally got to see The Martian. It was a fun movie, and it seems much of the science was solid. One thing that filmmakers still like to do is have computers spit out messages one-character-at-a-time as if they were arriving like telegrams. If you would like to read a file like…

  • Posted Finding cheaters with k-mers to kyclark

    This semester I'm teaching Perl 6 to beginners. On a recent homework, student A came to see me for help, so I pretty much wrote the script (if you come for help, you get help!). With every assignment, I provide a "test.pl6" script that lets the students know if they will pass. I stress that…

  • Posted Bouncy balls with Perl 6 to kyclark

    I've never written games before, but I previously posted a Hangman that I thought was fun. I love the examples of forest fire and …

  • Commented on Web development with Perl 5
    I put all the relevant code/data here:
  • Commented on FASTQ read-pairer
    Sorry, yes, here is the code and sample data: I should also have elaborated on the problem. Usually the R1/R2 files have each read's forward/reverse (respectively) mate in the same order/location. That is, in the raw FASTQ files, read...
  • Posted FASTQ read-pairer to kyclark

    In bioinformatics, we get data from the sequencing centers usually in BAM or FASTQ format. One of the first steps is to QC the data to remove any reads that have very low quality or which are too short to use. Some sequencing technologies read from both the beginning and the end of the sequence…

  • Posted Pick to kyclark

    I love the new "pick" method of lists in Perl 6. Here's a handy Shakespearean insult generator (cf. …):

    #!/usr/bin/env perl6

    sub MAIN (Int :$n=1) {

  • Posted Web development with Perl 5 to kyclark

    Even though I am in the thrall of Perl 6, I still do all my web development in Perl 5 because the ecosystem of modules is so mature. Here I will describe how I typically go about creating a website. For example, I will reference a small project I built for an affordable housing non-profit in…

  • Posted Hangman to kyclark

    So I'll confess that I've had a big crush on Haskell for a couple of years now. I've tried and failed many times to really get beyond trivial code, but I'm utterly fascinated by the code one can write with strong, static typing. It can feel contrived at times and very constraining, but I can…

  • Posted Split amino acid and nucleotide sequences to kyclark

    A labmate had a FASTA file with a mix of amino acid and nucleotide sequences that she wanted split into separate files. Here's a little script to do that. Again, I wish there were an easier way to get the basename of a file without its extension.

  • Posted Yet another FASTA something to kyclark

    Yes, I write a lot of scripts having to do with parsing FASTA. I want to put lots of example code out into the wild so people have things to read and copy. In this example, a student needed to take a subset of reads from a set of FASTA files as her assembly was failing due to excessive memory…

  • Commented on FASTA splitter
    Nice use of "rotor" and "nl-in" to shorten up the code, Pawel! Here's a new version that I like much better using those ideas: #!/usr/bin/env perl6 sub MAIN ( Str :$fasta! where *.IO.f, Int :$number=100, Str :$out-dir=$*PROGRAM.IO.dirname ) { mkdir...
  • Commented on FASTA splitter
    Sorry, there's a bug in there in that I didn't re-incorporate the newlines from the original sequences. How embarrassing. Here's a corrected version: #!/usr/bin/env perl6 sub MAIN ( Str :$fasta! where *.IO.f, Int :$number=100, Str :$out-dir=$*PROGRAM.IO.dirname ) { mkdir $out-dir...
  • Commented on FASTA splitter
    BioPerl6 is a thing, and a very good thing it is (…). Sorry, I should have mentioned that I could have used their FASTA parser. Here I wanted to present a self-contained script for purposes of exploring just a couple...
  • Posted FASTA splitter to kyclark

    sub MAIN (
        Str :$fasta! where *.IO.f,
        Int :$number=100,
        Str :$out-dir=$*PROGRAM.IO.dirname
    ) {
        mkdir $out-dir unless $out-dir.IO.d;

        my $ext = $fasta.IO.extension;
        my $basename = $fasta.IO.basename;…

  • Commented on FASTQ to FASTQ with Perl 6
    Wow, thanks, Liz! That is really cool. Also, the title should have been "FASTQ to FASTA." Darn....
  • Commented on Cleaning up the IDs in a FASTA file
    Pawel, I accidentally stumbled into bioinformatics. I was a Perl hacker who got hired by Lincoln Stein back in 2001 to work at Cold Spring Harbor Lab. It was immediately fascinating and horribly intimidating. I have never felt even a...
  • Commented on Cleaning up the IDs in a FASTA file
    "starts-with" is a great thing to show my beginner students! I keep forgetting about that, but it's a nice borrow from Python. Thanks!...
  • Posted FASTQ to FASTQ with Perl 6 to kyclark

    sub MAIN (:$out-dir="", *@fastq) {
        if ($out-dir.chars > 0 && ! $out-dir.IO.d) {
            mkdir $out-dir;
        }

        my $i = 0;
        for @fastq -> $fastq {
            (my $basename = $fastq.IO.basename) ~~ s/\.\w*?$//;

  • Posted Cleaning up the IDs in a FASTA file to kyclark

    I have some FASTA files with headers like this:

    >gi|83274083|ref|AC_000032.1| Mus musculus strain mixed chromosome 10, alternate assembly Mm_Celera, whole genome shotgun sequence

    I wanted to extract just the 2nd field, so here's a Perl 6 script to do that:
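The script itself is not part of this excerpt. As a minimal sketch, assuming "the 2nd field" means the second `|`-delimited field (here `83274083`), header lines could be rewritten like this. `clean-header` is a hypothetical name; the `.starts-with` check is the method that comes up in the comments on that post.

```raku
# Hypothetical helper (not the original script): keep only the second
# |-delimited field of a FASTA header line; other lines pass through unchanged.
sub clean-header(Str $line --> Str) {
    return $line unless $line.starts-with('>');
    my @fields = $line.substr(1).split('|');
    return @fields.elems > 1 ?? '>' ~ @fields[1] !! $line;
}

put clean-header('>gi|83274083|ref|AC_000032.1| Mus musculus strain mixed chromosome 10');
# prints ">83274083"
```

To process a whole file, one could loop with `for $file.IO.lines { put clean-header($_) }`.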

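For "The MAIN Thing": instead of reading `@*ARGS` by hand, a `sub MAIN` signature can declare the command-line arguments directly, including types, defaults, and named `--options`, and Perl 6 prints a generated usage message when the arguments don't match. A minimal sketch; `greeting`, `$name`, and `--times` are hypothetical:

```raku
# Hypothetical helper: the actual work, kept separate from argument handling.
sub greeting(Str $name, Int $times --> Str) {
    ("Hello, $name!" xx $times).join("\n")
}

# The signature replaces manual @*ARGS parsing: one optional positional
# argument and one optional named option (--times=N), both with defaults.
sub MAIN(Str $name = 'World', Int :$times = 1) {
    put greeting($name, $times);
}
```

Run as `perl6 hello.pl6 --times=2 Ken` (the file name is hypothetical); by default, Rakudo expects named options before positional arguments.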
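For "Backticks and tests": a hedged sketch of checking each `$file` against its `$file.md5`, assuming the `.md5` file holds md5sum-style output (checksum first) and that an `md5sum` executable is on the `PATH`. `run ..., :out` stands in for shell backticks here; `checksum-of` and `verify` are hypothetical names.

```raku
# Hypothetical helper: the checksum is the first whitespace-separated word
# of md5sum-style output ("<checksum>  <filename>").
sub checksum-of(Str $text --> Str) {
    $text.words.head // ''
}

# Hypothetical helper: compare the checksum computed by an external
# md5sum (assumed to be installed) with the one recorded in "$file.md5".
sub verify(IO::Path $file --> Bool) {
    my $md5-file = "$file.md5".IO;
    return False unless $md5-file.e;
    my $expected = checksum-of($md5-file.slurp);
    my $proc     = run 'md5sum', $file.Str, :out;
    my $actual   = checksum-of($proc.out.slurp(:close));
    return so $expected && $expected eq $actual;
}
```

Each file could then be reported with `ok verify($file), ~$file` from the core `Test` module, followed by `done-testing`.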
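For "Movie file reader": a minimal sketch of printing text one character at a time, telegram-style. The `typewriter` name and the 0.05-second default delay are assumptions.

```raku
# Hypothetical helper: print $text one character at a time, pausing
# $delay seconds between characters. Flushing after each character makes
# it appear immediately even when STDOUT is buffered.
sub typewriter(Str $text, Real :$delay = 0.05) {
    for $text.comb -> $char {
        print $char;
        $*OUT.flush;
        sleep $delay if $delay > 0;
    }
}

typewriter("Hello, world.\n", :delay(0));
```

For a file, one could call `typewriter($path.IO.slurp)`.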
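For "Finding cheaters with k-mers": the excerpt doesn't show the actual method, so this sketch assumes one plausible measure, the Jaccard similarity of two texts' character k-mer sets (shared k-mers divided by all distinct k-mers). `kmers`, `similarity`, and the default `k` of 10 are hypothetical.

```raku
# Hypothetical helper: the set of all length-$k substrings of $text.
sub kmers(Str $text, Int $k --> Set) {
    return Set.new if $text.chars < $k;
    (0 .. $text.chars - $k).map({ $text.substr($_, $k) }).Set;
}

# Hypothetical helper: Jaccard similarity of two texts' k-mer sets,
# from 0 (nothing shared) to 1 (identical sets).
sub similarity(Str $first, Str $second, Int $k = 10 --> Real) {
    my $x = kmers($first, $k);
    my $y = kmers($second, $k);
    my $union = ($x (|) $y).elems;
    return $union == 0 ?? 0 !! ($x (&) $y).elems / $union;
}

say similarity('my $total = 0;', 'my $total = 1;', 3);
```

Scores near 1 for two independently written scripts would be worth a closer look; the right `k` and threshold would have to be tuned on real submissions.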
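For "FASTQ read-pairer": once QC has removed reads independently from R1 and R2, the two files no longer line up mate for mate. A hedged sketch of the bookkeeping, assuming the read ID is the first word of each header and mates differ only by a trailing `/1` or `/2` (other header conventions would need a different `read-id`). `read-id` and `pair-ids` are hypothetical names.

```raku
# Hypothetical helper: normalize a FASTQ header to a mate-independent ID,
# assuming the "@name/1" / "@name/2" convention.
sub read-id(Str $header --> Str) {
    ($header.words.head // '').subst(/ \/ <[12]> $ /, '')
}

# Hypothetical helper: split IDs into mates present in both files and
# singletons left in only one of them (original order is kept).
sub pair-ids(@r1-ids, @r2-ids --> Hash) {
    my $in-r1 = @r1-ids.Set;
    my $in-r2 = @r2-ids.Set;
    return %(
        paired  => @r1-ids.grep({ $in-r2{$_} }).List,
        single1 => @r1-ids.grep({ !$in-r2{$_} }).List,
        single2 => @r2-ids.grep({ !$in-r1{$_} }).List,
    );
}

my %groups = pair-ids(<@r1/1 @r2/1 @r3/1>.map(&read-id), <@r2/2 @r3/2 @r4/2>.map(&read-id));
say %groups<paired>;   # (@r2 @r3)
```

The paired IDs can then drive writing matched R1/R2 files, with the singletons going to a separate file.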
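For "Pick": `.pick` returns a random element of a list, `.pick($n)` returns `$n` distinct elements, and `.roll` samples with replacement. A minimal insult-generator sketch; the word lists are a small hypothetical sample, not the ones from the post.

```raku
# Hypothetical word lists in the style of the Shakespearean insult columns.
my @adjectives = <artless bawdy beslubbering bootless churlish>;
my @compounds  = <base-court bat-fowling beef-witted beetle-headed boil-brained>;
my @nouns      = <apple-john baggage barnacle bladder boar-pig>;

# Build $n insults, each using one randomly picked word from every list.
sub insults(Int $n = 1 --> List) {
    (1 .. $n).map({ "Thou {@adjectives.pick} {@compounds.pick} {@nouns.pick}!" }).List
}

.put for insults(3);
```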
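For "Split amino acid and nucleotide sequences": a hedged sketch of the classification step, assuming a sequence counts as nucleotide when it contains only the letters A, C, G, T, U, or N (a peptide built only from those letters would be misclassified). The basename helper strips the last extension with a regex. `is-nucleotide` and `base-no-ext` are hypothetical names.

```raku
# Hypothetical helper: True when every residue is a nucleotide code.
sub is-nucleotide(Str $seq --> Bool) {
    so $seq.uc ~~ / ^ <[ACGTUN]>+ $ /
}

# Hypothetical helper: file name without its last extension
# ("reads.fa" -> "reads", "a.b.fa" -> "a.b").
sub base-no-ext(Str $path --> Str) {
    $path.IO.basename.subst(/ \. <-[.]>* $ /, '')
}

my %seqs = seq1 => 'ACGTTGCA', seq2 => 'MKVLAAGIV';
for %seqs.sort(*.key) -> $pair {
    say $pair.key, ': ', is-nucleotide($pair.value) ?? 'nucleotide' !! 'amino acid';
}
```

Output files could then be named from `base-no-ext($fasta)` plus a suffix for each class.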
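For "Yet another FASTA something": the excerpt doesn't say how the subset was chosen, so this sketch assumes a random sample of `$n` records, using `.pick` to sample without replacement. It holds the whole text in memory, so very large inputs would call for a streaming approach instead. `sample-fasta` is a hypothetical name.

```raku
# Hypothetical helper: return up to $n randomly chosen records from FASTA
# text. Assumes '>' appears only at the start of header lines.
sub sample-fasta(Str $text, Int $n --> Str) {
    $text.split('>', :skip-empty).map('>' ~ *).pick($n).join
}

my $fasta = ">r1\nACGT\n>r2\nGGCC\n>r3\nTTAA\n";
print sample-fasta($fasta, 2);
```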
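For "FASTQ to FASTQ with Perl 6" (which, per the comment on that post, should have read "FASTQ to FASTA"): a hedged sketch of the core conversion, assuming simple four-line FASTQ records (header, sequence, `+`, quality) with no wrapped lines. `fastq-to-fasta` is a hypothetical name.

```raku
# Hypothetical helper: turn 4-line FASTQ records into 2-line FASTA records.
# A trailing incomplete record is dropped by .rotor(4).
sub fastq-to-fasta(@lines --> List) {
    my @out;
    for @lines.rotor(4) -> ($header, $seq, $plus, $qual) {
        @out.push: '>' ~ $header.substr(1);   # '@id ...' becomes '>id ...'
        @out.push: $seq;                      # the quality line is discarded
    }
    return @out.List;
}

.put for fastq-to-fasta(<@read1 ACGT + IIII @read2 GGCC + IIII>);
```

For a real file, `fastq-to-fasta($fastq.IO.lines)` would do, since a `Seq` can bind to the `@lines` parameter.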
  • Pawel bbkr Pabian commented on FASTA splitter

    Thanks for sharing your knowledge. I've learned about $*SPEC from your post :)

    I'm not a FASTA expert, but my approach would be to use ">" as the input separator instead of "\n", and then simply push 100 lines (=sequences) into each file through rotor:

    # beware first empty element
    writer($_) for $fasta.IO.lines( nl-in => '>').[1..*].rotor(100, :partial);

    sub writer (@seqs) {
        # restore sequence start character
        $out.print('>', $_) for $seqs;
    }

    Of course TIMTOWTDI; however, from my experience, the delimiter approach is waaa…

  • Pawel bbkr Pabian commented on FASTA splitter

    My mistake, should be:

    $out.print('>', $_) for @seqs;
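Following the record-separator idea from these comments, here is a hedged sketch that treats `>` as the record delimiter and groups records with `.rotor($n, :partial)`. It uses `Str.split` with `:skip-empty` on in-memory text instead of `.lines(nl-in => '>')` so that it is self-contained and never produces the empty first element. `chunk-fasta` is a hypothetical name; writing each chunk to its own file (e.g., with `IO::Path.spurt`) is left to the caller.

```raku
# Hypothetical helper: split FASTA text into strings of at most $n records
# each. Assumes '>' appears only at the start of header lines.
sub chunk-fasta(Str $text, Int $n --> List) {
    # Record bodies without their leading '>'; then restore the '>'.
    my @records = $text.split('>', :skip-empty).map('>' ~ *);
    # Groups of $n records (the last may be smaller), one string per group.
    return @records.rotor($n, :partial).map(*.join).List;
}

my @chunks = chunk-fasta(">a\nAC\n>b\nGT\n>c\nTT\n", 2);
say @chunks.elems;   # 2
```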

This site is a common blogging platform for the Perl community. It is written in Perl, with a graphic design donated by Six Apart, Ltd.