what to know about aligning

By ms on February 1, 2013 12:17 AM

Hi
I am a biologist, not a bioinformatician. I have two groups of nucleotide sequences in FASTA format; each group contains around 40,000 sequences ranging from 100 bp to 12 kb. I want to align the sequences in one group against those in the other and find the best-matching pair for each fragment. Can I do this in Perl? If so, how? Is there any software I can use?
Second question:
How can I find the secondary structures of the sequences in each group? Which program should I use?
Thanks
MS

7 comments

Dave Cross | February 1, 2013 6:25 AM | Reply

I know nothing about biology or bioinformatics, but I strongly suspect that you should be looking at BioPerl.

ugexe | February 1, 2013 8:28 AM | Reply

I think you want to use http://search.cpan.org/~vbar/Algorithm-NeedlemanWunsch-0.03/lib/Algorithm/NeedlemanWunsch.pm
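
For readers unfamiliar with Needleman-Wunsch, here is a minimal sketch of the idea in Python rather than Perl. It is not the API of the CPAN module linked above. The scoring values (match +1, mismatch -1, gap -1) and the helper name `best_partner` are assumptions chosen purely for illustration.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    The scoring scheme (match/mismatch/gap) is an illustrative assumption.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # First column: a[:i] aligned against nothing costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def best_partner(query, candidates):
    """Hypothetical helper: return (name, score) of the best-scoring candidate.

    `candidates` maps a sequence name to its sequence string.
    """
    scored = ((name, needleman_wunsch_score(query, seq))
              for name, seq in candidates.items())
    return max(scored, key=lambda pair: pair[1])
```

Each comparison costs time proportional to the product of the two sequence lengths, and comparing 40,000 sequences against 40,000 others means 1.6 billion such comparisons, so a sketch like this is only practical for small test sets.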

Matt Perry | February 1, 2013 6:48 PM | Reply

Hi "ms". I recommend you also post your question at http://www.biostars.org/.

Ether | February 2, 2013 7:35 PM | Reply

Also, there's a good bioinformatics community on Stack Overflow - http://stackoverflow.com/questions/tagged/bioperl

pyrimidine | February 4, 2013 1:49 AM | Reply

Speaking as a BioPerl core developer, I strongly suggest looking at C/C++-based options. Pure-Perl solutions are technically possible but not recommended because of the overhead (memory use, run time, etc.), particularly if you have long sequences.

If you look through BioPerl you'll note we wrap commonly-used aligners; even our Smith-Waterman code relies on (unwieldy, out-of-date) XS bindings to a C library.
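
Once an external aligner has done the heavy lifting, picking the best pair for each fragment is a small scripting job. The sketch below is in Python and assumes the aligner wrote tab-separated results with a query ID, a subject ID, and a score on each line. The default column positions follow the BLAST+ tabular layout (`-outfmt 6`, bit score in the twelfth column). The choice of aligner is an assumption, since the thread does not name one, and `best_hits` is a hypothetical helper name.

```python
import csv
import io


def best_hits(tabular_text, query_col=0, subject_col=1, score_col=11):
    """Return {query_id: (subject_id, score)} keeping the top-scoring hit per query.

    Column indices are assumptions; adjust them to match your aligner's output.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[query_col], row[subject_col]
        score = float(row[score_col])
        # Keep this hit only if it beats the best one seen so far for the query.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best
```

For a results file on disk, the contents could be passed in with `best_hits(open(path).read())`, where `path` is whatever file your aligner produced.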

ms | February 5, 2013 4:59 AM | Reply

Thank you all for your comments.

ms | February 5, 2013 5:48 AM | Reply

Thank you all for your comments.

