
I am using Python to do multiple sequence alignment. To evaluate the alignment I use the weighted sum-of-pairs (WSP) score. For three sequences seq1, seq2 and seq3, the score is calculated as follows: first compute the score of (seq1, seq2), the score of (seq1, seq3) and the score of (seq2, seq3), then add them up:

WSP=score(seq1,seq2)+score(seq1,seq3)+score(seq2,seq3)
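As a concrete sketch of this formula, the code below assumes a hypothetical pairwise score `pair_score` that awards +1 for every column where both sequences hold the same non-gap character. The question does not show the real scoring scheme, so treat `pair_score` and `wsp3` as illustrative names only. Under that assumption, the example alignment quoted in the comments (seq1='AG-GT', seq2='AG-GT', seq3='ACT-T') scores 8.

```python
GAP = "-"

def pair_score(a: str, b: str) -> int:
    """Hypothetical pairwise score: +1 per column where a and b share the same non-gap character."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y and x != GAP)

def wsp3(seq1: str, seq2: str, seq3: str, w: float = 1) -> float:
    """WSP = w * (score(seq1, seq2) + score(seq1, seq3) + score(seq2, seq3))."""
    return w * (pair_score(seq1, seq2) + pair_score(seq1, seq3) + pair_score(seq2, seq3))

print(wsp3("AG-GT", "AG-GT", "ACT-T"))  # 8  (4 + 2 + 2)
```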

Python code:

def wsp(seq1, seq2, seq3, w=1):
    # sum_distance adds up the pairwise distances between the three sequences
    dis = sum_distance(seq1, seq2, seq3)
    return w * dis

Here sum_distance is a function that calculates the distances between the sequences.

Now I want to use a FASTA file that contains many sequences. How can I calculate the WSP score for all sequences in a FASTA file?
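Going from three sequences to every sequence in a file, the same formula becomes a sum over all unordered pairs i < j. The sketch below is a minimal version under two assumptions: the pairwise function is passed in (here a hypothetical `hamming` mismatch count stands in for the unseen `sum_distance`), and an optional weight matrix `weights` (also hypothetical) gives one weight per pair, defaulting to the constant `w = 1` used in the question.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence

def wsp(
    seqs: Sequence[str],
    score: Callable[[str, str], float],
    weights: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Weighted sum of pairs over every unordered pair (i, j) with i < j.

    weights[i][j] is the weight of pair (i, j); if omitted, every pair gets w = 1.
    """
    total = 0.0
    for i, j in combinations(range(len(seqs)), 2):
        w = 1 if weights is None else weights[i][j]
        total += w * score(seqs[i], seqs[j])
    return total

# Hypothetical pairwise distance: number of columns where the two sequences differ
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

print(wsp(["AG-GT", "AG-GT", "ACT-T"], hamming))  # 6.0  (0 + 3 + 3)
```

With n sequences this makes n(n-1)/2 calls to the pairwise function, so for three sequences it reduces to exactly the three terms of the formula above.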

What is the expected output? A 3D matrix? - Willem Van Onsem
Do you want to get all possible triples of sequences, and compute the WSP of each triple? - inspectorG4dget
Have you looked into using Biopython? I haven't checked if it includes WSP or sum of distances, but I wouldn't be surprised if it did. It includes all sorts of tools for working with and aligning sequences, and is much easier than coding everything yourself. - MattDMo
For example, for the three sequences seq1='AG-GT', seq2='AG-GT' and seq3='ACT-T', the WSP function gives a score of 8. - user3216969
Not for each triple, but for all sequences in the file. - user3216969

1 Answer


The simplest way is to run the sum_distance function over each pair of sequences in your file:

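from itertools import combinations

# Parse the FASTA file: a '>' line starts a new record, other lines extend its sequence
sequences = []
with open('yourfile.fa') as handle:
    for line in map(str.strip, handle):
        if line.startswith('>'):
            sequences.append('')
        elif line and sequences:
            sequences[-1] += line

# Add up the distance of every unordered pair (seq_1, seq_2), i.e. the WSP with w = 1
total_distance = sum(sum_distance(seq_1, seq_2)
                     for seq_1, seq_2 in combinations(sequences, 2))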
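For a large file the pair loop makes n(n-1)/2 calls. If the pairwise score happens to be a simple per-column count of identical non-gap characters (an assumption: the question never shows sum_distance), the unweighted sum over all pairs can be computed one column at a time instead: a character that appears k times in a column forms k(k-1)/2 matching pairs. The sketch below uses the hypothetical helpers `identical_columns`, `sp_bruteforce` and `sp_by_column` to compare that shortcut with the brute-force pair loop on the alignment quoted in the comments.

```python
from collections import Counter
from itertools import combinations

GAP = "-"

def identical_columns(a: str, b: str) -> int:
    """Assumed pairwise score: columns where a and b hold the same non-gap character."""
    return sum(1 for x, y in zip(a, b) if x == y and x != GAP)

def sp_bruteforce(seqs):
    # One pairwise call per unordered pair: n*(n-1)/2 calls
    return sum(identical_columns(a, b) for a, b in combinations(seqs, 2))

def sp_by_column(seqs):
    # For each alignment column, a character seen k times forms k*(k-1)/2 matching pairs
    total = 0
    for column in zip(*seqs):
        counts = Counter(c for c in column if c != GAP)
        total += sum(k * (k - 1) // 2 for k in counts.values())
    return total

alignment = ["AG-GT", "AG-GT", "ACT-T"]
print(sp_bruteforce(alignment), sp_by_column(alignment))  # 8 8
```

The column version touches each character once, so it scales with n times the alignment length rather than with the number of pairs. It only applies when every pair has the same weight and the score decomposes column by column; for an arbitrary sum_distance, the pair loop is the general approach.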