Of Go, C, Perl and fastq file conversion Vol I : intro
Next Generation Sequencing
(NGS) has really taken off in the last few years, as the cost of both
devices and experiments has dramatically declined. NGS instruments decipher the
identity (base composition, i.e. the sequence of letters in the alphabet of
DNA and RNA) of nucleic acids and return the results in the open fastq data format. Fastq
files are flat text files with a standardized layout: each molecule
in the sample that is captured by the sequencer is represented
by four fields:
- a header line that starts with a '@' character, followed by a sequence identifier and an optional description
- one (typically) or more lines of characters in the four letter alphabet of nucleic acids
- a metadata field starting with a '+' character, optionally followed by the same sequence identifier and description as in the first field
- one or more lines encoding the quality of each symbol of the sequence reported in field 2