
Working with plain-text sequence files (FASTA, in most cases) is not very efficient. I would much rather work with Python objects (str or similar) than with FASTA files. What I need is simply:

>>> s1 = Seq('atgctttccg....act')
>>> s2 = Seq( 'tactttccg....tat')
>>> result = align(s1, s2, scoring_matrix)
>>> result.identity, result.score, result.expect
(79.37, 1086, 9e-105)
>>> result.alignment
('atgctttccg....act--','-tactttccg....tat')

This way I can also avoid repeatedly parsing output files, which is boring, time-consuming and error-prone. I don't expect high performance. I'm planning to write a Python extension implementing the Smith-Waterman algorithm, but I'm wondering: 1. Is there an existing module that meets my need? 2. Is there any recommended reading on common optimizations for a Smith-Waterman alignment implementation?

Any suggestions appreciated.


1 Answer


I use Biopython to handle and parse FASTA files, and it is great. It won't be hard to implement the Smith-Waterman algorithm in pure Python, but it will be slow. Good luck.
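
To give an idea of how little code a plain-Python version takes, here is a minimal sketch that works directly on str objects. The function name `smith_waterman` and the default scores (match +3, mismatch -3, linear gap -2) are illustrative assumptions, not part of any library. It uses a simple match/mismatch score instead of a scoring matrix, reports only one best local alignment, and does not compute identity or an E-value.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    """Local alignment of two strings with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b). The scoring values are
    illustrative defaults, not taken from any particular tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("TGTTACGG", "GGTTGACTA")
print(score)
print(top)
print(bottom)
```

Both time and memory grow with len(a) * len(b), which is why a pure-Python version is slow on long sequences. The inner double loop is the part worth moving into a C extension or vectorizing.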