2
votes

I am trying to get my Levenshtein Edit Distance algorithm working but for some reason, the number of edits is coming out incorrect. I can't see where my mistake is and I was wondering if someone see's what I am doing incorrectly.

Input

5
ATCGTT
AGTTAC
ACGAAT
CCGTAAAT
TTACGACCAGT

expected output

Strand A: ATCGTT--
Strand B: A--GTTAC
Edit Distance: 4

Strand A: ATCG-TT
Strand B: A-CGAAT
Edit Distance: 3

Strand A: ATCGT---T
Strand B: -CCGTAAAT
Edit Distance: 5

Strand A: AT-CG----TT
Strand B: TTACGACCAGT
Edit Distance: 7

Strand A: AGTTAC
Strand B: ACGAAT
Edit Distance: 4

Strand A: -AGT-TAC
Strand B: CCGTAAAT
Edit Distance: 5

Strand A: --A-G-TTA-C
Strand B: TTACGACCAGT
Edit Distance: 8

Strand A: ACG--AAT
Strand B: CCGTAAAT
Edit Distance: 3

Strand A: --ACGA--A-T
Strand B: TTACGACCAGT
Edit Distance: 5

Strand A: --CCG-TAAAT
Strand B: TTACGACCAGT
Edit Distance: 7

my output

Strand A: ATCGT-
Strand B: AGTTAC
Edit Distance: 5

Strand A: ATC-T-
Strand B: ACGAAT
Edit Distance: 5

Strand A: ATC-T-
Strand B: CCGTAAAT
Edit Distance: 5

Strand A: A-C-T-
Strand B: TTACGACCAGT
Edit Distance: 10

Strand A: AGTTAC
Strand B: ACGAAT
Edit Distance: 5

Strand A: AG-TAC
Strand B: CCGTAAAT
Edit Distance: 6

Strand A: A--T-C
Strand B: TTACGACCAGT
Edit Distance: 7

Strand A: AC-AAT
Strand B: CCGTAAAT
Edit Distance: 7

Strand A: AC---T
Strand B: TTACGACCAGT
Edit Distance: 8

Strand A: CC-TAAAT
Strand B: TTACGACCAGT
Edit Distance: 8

findEditDistance

void EditDistance::findEditDistance()
{
    int upperValue, leftValue, diagonalValue;

    for (int i = 0; i < mLengthX; ++i)
    {
        table[i][0].stringLength = i;
    }

    for (int i = 0; i < mLengthY; ++i)
    {
        table[0][i].stringLength = i;
    }

    for (int i = 1; i < mLengthX; ++i)
    {
        for (int j = 1; j < mLengthY; ++j)
        {
            if (mStringX[i] == mStringY[j])
            {
                table[i][j].direction = DIAGONAL;
                table[i][j].stringLength = table[i - 1][j -1].stringLength;
            }
            else
            {
                upperValue = table[i - 1][j].stringLength;
                leftValue = table[i][j - 1].stringLength;
                diagonalValue = table[i - 1][j - 1].stringLength;

                if (upperValue < leftValue)
                {
                    if (upperValue < diagonalValue)
                    {
                        //upper is the lowest
                        table[i][j].stringLength = table[i - 1][j].stringLength + 1;
                        table[i][j].direction = UP;
                    }
                    else
                    {
                        //diagonal is lowest
                        table[i][j].stringLength = table[i - 1][j -1].stringLength + 1;
                        table[i][j].direction = DIAGONAL;
                    }
                }
                else if (leftValue < diagonalValue)
                {
                    //left is lowest
                    table[i][j].stringLength = table[i][j - 1].stringLength + 1;
                    table[i][j].direction = LEFT;
                }
                else
                {
                    //diagonal is lowest
                    table[i][j].stringLength = table[i - 1][j -1].stringLength + 1;
                    table[i][j].direction = DIAGONAL;
                }
            }
        }
    }   
}

getDistance

void EditDistance::getDistance()
{
    int i = mStringX.length() - 1;
    int j = mStringY.length() - 1;

    numEdits = 0;

    updateStrands (i, j);
}

updateStrands

void EditDistance::updateStrands (int i, int j)
{
    if (i == 0 || j == 0)
    {
        return;
    }

    if (table[i][j].direction == DIAGONAL)
    {
        ++numEdits;
        updateStrands (i - 1, j - 1);
    }
    else if (table[i][j].direction == UP)
    {
        mStringY[j] = '-';
        ++numEdits;
        updateStrands (i - 1, j);
    }
    else
    {
        mStringX[i] = '-';
        ++numEdits;
        updateStrands (i, j - 1);
    }
}
1
Can you provide what your input and output vs expected output was?kristianp
Oh yeah I can do that. Let me do that really quickRyan Newman
You've only posted one set of outputs.samgak
Could you tell us which ones are incorrect, and what the correct answers ought to have been? It would save us some time.Beta
Try printing the whole distance table - it will help you debug the processOphir Yoktan

1 Answers

1
votes

The problem with the edit distance is in your updateStrands. It counts diagonal moves as 1, when in fact a diagonal move can have a distance of 1 (substitution) or 0 (match). You could fix this in updateStrands, but there's really no need to do the calculation there at all, when the number is already in the lower-right corner of table.

If you want the correct "strands" (e.g. "ATCGTT--" and "A--GTTAC"), you will have to make corrections in updateStrands (you change elements of the strings where you should insert), getDistance (you start in the wrong place) and findEditDistance (you neglect to assign values to direction along the upper and left edges when you set stringLength to i).