Python for-loop stopping prematurely

Question

I'm trying to convert a DNA sequence to an amino acid sequence. I have a dictionary of codons:

codon_mapping = {'AAA': 'K','AAC': 'N','AAG': 'K','AAT': 'N','ACA': 'T','ACC': 'T','ACG': 'T','ACT': 'T','AGA': 'R','AGC': 'S','AGG': 'R','AGT': 'S','ATA': 'I','ATC': 'I','ATG': 'M','ATT': 'I','CAA': 'Q','CAC': 'H','CAG': 'Q','CAT': 'H','CCA': 'P','CCC': 'P','CCG': 'P','CCT': 'P','CGA': 'R','CGC': 'R','CGG': 'R','CGT': 'R','CTA': 'L','CTC': 'L','CTG': 'L','CTT': 'L','GAA': 'E','GAC': 'D','GAG': 'E','GAT': 'D','GCA': 'A','GCC': 'A','GCG': 'A','GCT': 'A','GGA': 'G','GGC': 'G','GGG': 'G','GGT': 'G','GTA': 'V','GTC': 'V','GTG': 'V','GTT': 'V','TAA': '*','TAC': 'Y','TAG': '*','TAT': 'Y','TCA': 'S','TCC': 'S','TCG': 'S','TCT': 'S','TGA': '*','TGC': 'C','TGG': 'W','TGT': 'C','TTA': 'L','TTC': 'F','TTG': 'L','TTT': 'F'}

And an input sequence:

seq = 'ATGTATGGCTAGCTTACTACTGCGCACTGATGTGGCTATCGATCGCTGGTCGTTGCTGACCGAGCTAAA'

I currently have this code:

#import re
import re

#find the start codons in the sequence
starts=[m.start() for m in re.finditer('ATG', seq)]

#establish new dictionary
seqDictionary={}
#translate sequences
for i in starts:
    mySeq=seq[i:]
    translated=''
    for n in range(0, len(mySeq), 3):
        print(mySeq[n:n+3])
        if codon_mapping[mySeq[n:n+3]] != '*':
            translated += codon_mapping[mySeq[n:n+3]]
        if codon_mapping[seq[n:n+3]] == '*':
            break 
    print("translated: " + translated)
    seqDictionary[i]=(translated)
print(seqDictionary)
            
AA_frame1 = seqDictionary[0] 
AA_frame2 = seqDictionary[4] 
AA_frame3 = seqDictionary[29]
AA_longest = None

The problem is that for the second and third sequences (starting at positions 4 and 29, respectively), the for-loop exits after the fourth amino acid, even though the codons that follow are not stop codons.

The output of the above code is:

ATG
TAT
GGC
TAG
translated: MYG
ATG
GCT
AGC
TTA
translated: MASL
ATG
TGG
CTA
TCG
translated: MWLS
{0: 'MYG', 4: 'MASL', 29: 'MWLS'}

I'm not getting any error messages, and I can't figure out why the loop is exiting. I know the correct solutions for the translated sequences are:

MYG
MASLLLRTDVAIDRWSLLTEL
MWLSIAGRC

Edit: this final code worked:

#import re
import re

#find the start codons in the sequence
starts=[m.start() for m in re.finditer('ATG', seq)]

#establish new dictionary
seqDictionary={}
#translate sequences
for i in starts:
    mySeq=seq[i:]
    translated=''
    for n in range(0, len(mySeq), 3):
        if len(mySeq[n:n+3]) < 3:
            break
        if codon_mapping[mySeq[n:n+3]] == '*':
            break
        else:
            translated += codon_mapping[mySeq[n:n+3]]
    seqDictionary[i]=(translated)
print(seqDictionary)

Output:

{0: 'MYG', 4: 'MASLLLRTDVAIDRWSLLTEL', 29: 'MWLSIAGRC'}

Is it because of a typo? In if codon_mapping[seq[n:n+3]] == '*':, should seq be mySeq? — adrtam
I also suspect some confusion between seq and mySeq. Why not if codon_mapping[mySeq[n:n+3]] != '*': #...; else: break ? — bli

Shannon Shannon · Accepted Answer · 2020-10-26T02:52:48

if codon_mapping[mySeq[n:n+3]] != '*':
    translated += codon_mapping[mySeq[n:n+3]]
if codon_mapping[seq[n:n+3]] == '*':
    break

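Here you are not checking the same thing. The first if looks up the codon in mySeq, but the second if looks it up in seq, the original untrimmed sequence. seq[9:12] is 'TAG', a stop codon, so the break fires at n = 9 for every start position, right after the fourth codon of mySeq has been appended. The first frame only looks correct because it starts at position 0, where mySeq and seq are identical.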
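
A small check makes the mismatch visible. This is only an illustrative sketch using the seq from the question; the names my_seq, codon_translated and codon_tested_for_stop are made up for the example.

```python
seq = 'ATGTATGGCTAGCTTACTACTGCGCACTGATGTGGCTATCGATCGCTGGTCGTTGCTGACCGAGCTAAA'

start = 4            # second start codon found in the question
my_seq = seq[start:]
n = 9                # fourth iteration of range(0, len(my_seq), 3)

# Codon the first `if` translates: taken from the trimmed sequence.
codon_translated = my_seq[n:n+3]
# Codon the second `if` tests for a stop: taken from the full sequence.
codon_tested_for_stop = seq[n:n+3]

print(codon_translated, codon_tested_for_stop)
```

The two slices differ, and the second one is the stop codon from the first reading frame, which is why every frame stops at the same iteration.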

This is better written as a single if/else than as two separate ifs, so the codon is only ever looked up in one sequence:

if codon_mapping[mySeq[n:n+3]] == '*':
    break
else:
    translated += codon_mapping[mySeq[n:n+3]]
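
For reference, here is one way the whole thing could look once the lookup is done in a single place. It is a sketch, not the only way to do it: translate_from is a hypothetical helper name, and the codon table is rebuilt compactly from the standard genetic code (it should be equivalent to the codon_mapping dictionary in the question, which can be used instead).

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed equivalent
# to the codon_mapping dictionary given in the question).
BASES = 'TCAG'
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_mapping = {
    ''.join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from(seq, start):
    """Translate seq from index start until a stop codon or the last full codon."""
    protein = []
    # Stopping at len(seq) - 2 skips any trailing partial codon.
    for n in range(start, len(seq) - 2, 3):
        amino_acid = codon_mapping[seq[n:n+3]]  # one lookup per codon
        if amino_acid == '*':
            break
        protein.append(amino_acid)
    return ''.join(protein)


seq = 'ATGTATGGCTAGCTTACTACTGCGCACTGATGTGGCTATCGATCGCTGGTCGTTGCTGACCGAGCTAAA'
starts = [m.start() for m in re.finditer('ATG', seq)]
seq_dictionary = {i: translate_from(seq, i) for i in starts}
print(seq_dictionary)
```

Because the codon is sliced and looked up once per iteration, the translation and the stop test cannot drift apart, and indexing into seq directly avoids needing a separate mySeq copy at all.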
