What to know about aligning

I am a biologist, not a bioinformatician. I have two groups of nucleotide sequences in FASTA format; each group contains around 40,000 sequences ranging from 100 bp to 12 kb. I want to align the sequences in one group against those in the other and find the best match for each fragment. Can I do this in Perl? If so, how? Is there any software I can use?
Second question:
How can I find the secondary structures of the sequences in each group? Which program should I use?


I know nothing about biology or bioinformatics, but I strongly suspect that you should be looking at BioPerl.

Hi "ms". I recommend you also post your question at http://www.biostars.org/.

Also, there's a good bioinformatics community on Stack Overflow - http://stackoverflow.com/questions/tagged/bioperl

Speaking as a BioPerl core developer, I strongly suggest looking at C/C++-based options. Pure-Perl solutions are technically possible but not recommended because of the overhead (memory use, run time, etc.), particularly if you have long sequences.

If you look through BioPerl you'll note we wrap commonly-used aligners; even our Smith-Waterman code relies on (unwieldy, out-of-date) XS bindings to a C library.
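
To make the "let a compiled aligner do the work" advice concrete, here is a minimal sketch of the step that comes after the alignment: picking the best partner for each query sequence. It is not from this thread. It assumes the aligner was BLAST+ `blastn` run with `-outfmt 6` (12 tab-separated columns in the default order), and it treats "best" as the highest bit score, which is only one reasonable choice. The function name `best_hits` and the sequence IDs are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: (subject_id, bitscore)} with the top-scoring hit per query.

    Assumption: 'best' means highest bit score; ties keep the first hit seen.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(FIELDS, row))
        query = rec["qseqid"]
        score = float(rec["bitscore"])
        if query not in best or score > best[query][1]:
            best[query] = (rec["sseqid"], score)
    return best


if __name__ == "__main__":
    # Made-up rows for illustration only, not real alignment results.
    rows = [
        ["a1", "b7", "98.0", "200", "4", "0", "1", "200", "1", "200", "1e-90", "350"],
        ["a1", "b2", "91.0", "180", "16", "0", "1", "180", "5", "184", "1e-60", "250"],
        ["a2", "b2", "99.5", "400", "2", "0", "1", "400", "1", "400", "0.0", "720"],
    ]
    text = "\n".join("\t".join(r) for r in rows) + "\n"
    for query, (subject, score) in best_hits(io.StringIO(text)).items():
        print(query, subject, score)
```

On real output you would pass an open file instead of the `io.StringIO` object. If e-value or percent identity is a better criterion for your data, only the comparison inside the loop needs to change.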


About ms

I blog about Perl.