42
votes

We're given two sequences of lowercase latin alphabet letters. They're both the same length and have the same amount of given types of letters (the first has an equal number of t's as the second and so on). We are required to find the minimum number of swaps (by "swap" we mean changing the order of two neighboring letters) required to transform the first sequence into the second. We can safely assume every two sequences CAN be transformed into each other. We could do this with brute-force, but the sequences are too long for that.

Input:
The length of the sequences (at least 2, at most 999999) and then two sequences.

Output:
An integer representing the number of swaps needed for the sequences to become the same.

Example:
{5, aaaaa, aaaaa} should output {0},
{4, abcd, acdb} should output {2}.

The first thing that came to my mind was bubblesort. We can simply bubblesort the sequence counting each swap. The problem is: a) it's O(n^2) worst-case b) I'm not convinced it would give me the smallest number for every case... Even the optimized bubblesort doesn't seem to be doing the trick. We could implement the cocktail sort which would solve the problem with turtles - but will it give me the best performance? Or maybe there's something simpler/faster?

This question can also be phrased as: How can we determine the edit distance between two strings when the only operation allowed is transposition?

6
Not really, prof gave us this today and before we got to work, the bell rang. It's not our homework but I find it interesting and would like to find out the way to solve it.Positive Int
No, it's not. There, you can swap any two cells - here, only the adjacent.Positive Int
ah -- right you are, I missed that detailJason S
If you can only swap adjacent cells, then a bubblesort is probably the optimal method. Amazingly enough, it was long ago proved to be optimal under essentially those circumstances. About the only possibility I can see for improvement would be a Shakersort (bubblesort where you alternate directions). I'm not sure that'll do any better either though -- I think it really only stands a chance of reduce the number of comparisons, not swaps.Jerry Coffin

6 Answers

10
votes

Here's a simple and efficient solution:

Let Q[ s2[i] ] = the positions character s2[i] is on in s2. Let P[i] = on what position is the character corresponding to s1[i] in the second string.

To build Q and P:

for ( int i = 0; i < s1.size(); ++i )
    Q[ s2[i] ].push_back(i); // basically, Q is a vector [0 .. 25] of lists

temp[0 .. 25] = {0}
for ( int i = 0; i < s1.size(); ++i )
    P[i + 1] = 1 + Q[ s1[i] ][ temp[ s1[i] ]++ ];

Example:

    1234
s1: abcd
s2: acdb
Q: Q[a = 0] = {0}, Q[b = 1] = {3}, Q[c = 2] = {1}, Q[d = 3] = {2}
P: P[1] = 1, P[2] = 4 (because the b in s1 is on position 4 in s2), P[3] = 2
   P[4] = 3

P has 2 inversions (4 2 and 4 3), so this is the answer.

This solution is O(n log n) because building P and Q can be done in O(n) and merge sort can count inversions in O(n log n).

13
votes

Regarding the minimum number of (not necessarily adjacent) swaps needed to convert a permutation into another, the metric you should use is the Cayley distance which is essentially the size of the permutation - the number of cycles.

Counting the number of cycles in a permutation is a quite trivial issue. A simple example. Suppose permutation 521634.

If you check the first position, you have 5, in the 5th you have 3 and in the 3rd you have 1, closing the first cycle. 2 is in the 2nd position, so it make a cycle itself and 4 and 6 make the last cycle (4 is in the 6th position and 6 in the 4th). If you want to convert this permutation in the identity permutation (with the minimum number of swaps), you need to reorder each cycle independently. The total number of swaps is the length of the permutation (6) minus the number of cycles (3).

Given any two permutations, the distance between them is equivalent to the distance between the composition of the first with the inverse of the second and the identity (computed as explained above). Therefore, the only thing you need to do is composing the first permutation and the inverse of the second and count the number of cycles in the result. All these operations are O(n), so you can get the minimum number of swaps in linear time.

8
votes

What you are looking for may be identical to the "Kendall tau distance", which is the (normalized) difference of concordant minus discordant pairs. See Wikipedia, where it is claimed that it is equivalent to the bubble sort distance.

In R, functions are avialable not only for computing tau, e.g.

cor( X, method="kendall", use="pairwise" ) ,

but also for testing the significance of the difference, e.g.

cor.test( x1, x2, method="kendall" ) ,

and they are even able to properly take into account ties.

See here for more.

4
votes

"Kendall tau distance" algorithm is the exact solution in this case, where the number of swaps of adjacent elements must be found.

Example.

eyssaasse (base string)
seasysaes

Base string provides indexes for each element: e=0, y=1, s=2, s=3, a=4, a=5, s=6, s=7, e=8;

Some elements are duplicate, so:
1) Create a dictionary where elements are keys, and values are lists of indices:

idx = {'e'=>[0, 8], 'y'=>[1], 's'=>[2, 3, 6, 7], 'a'=>[4, 5]}

2) Create an index map of the second string using element indexes in the idx dictionary:

seasysaes -> 204316587 (loop 'seasysaes' and pop next index from lists for each key in idx)

3) Create a list of all paired combinations of this map, 204316587: 20 24 23 21 26 25 28 27 04 03 01 06 ... 65 68 67 58 57 87;
Loop through these pairs counting those where first number bigger than second number.
This count is the sought-for number of adjacent swaps between strings.

Python script:

from itertools import combinations, cycle

word = 'eyssaasse' # base string
cmpr = 'seasysaes' # a string to find number of swaps from the base string
swaps = 0

# 1)
chars = {c: [] for c in word}
[chars[c].append(i) for i, c in enumerate(word)]
for k in chars.keys():
    chars[k] = cycle(chars[k])

# 2)
idxs = [next(chars[c]) for c in cmpr]

# 3)
for cmb in combinations(idxs, 2):
    if cmb[0] > cmb[1]:
        swaps += 1

print(swaps)

Number of swaps between 'eyssaasse' and 'seasysaes' is 7.
For 'reviver' and 'vrerevi' it's 8.

2
votes

I have written a class Permutation which among other things can return a number of transpositions needed to convert given permutation into identity. This is done by creating orbits (cycles) and counting their lengths. Terminology is taken from Kostrikin A., I., "Introduction to Linear Algebra I".

Includes:

#include <iostream>
#include <vector>
#include <set>
#include <algorithm>
#include <iterator>

class Permutation:

class Permutation {
public:
    struct ei_element {    /* element of the orbit*/
        int e; /* identity index */
        int i; /* actual permutation index */
    };
    typedef std::vector<ei_element> Orbit; /* a cycle */

    Permutation( std::vector<int> const& i_vector);
    /* permute i element, vector is 0 indexed */
    int pi( int i) const { return iv[ i - 1]; }
    int i( int k) const { return pi( k); } /* i_k = pi(k) */
    int q() const { /* TODO: return rank = q such that pi^q = e */ return 0; }
    int n() const { return n_; }
    /* return the sequence 1, 2, ..., n */
    std::vector<int> const& Omega() const { return ev; }
    /* return vector of cycles */
    std::vector<Orbit> const& orbits() const { return orbits_; }
    int l( int k) const { return orbits_[ k].size(); } /* length of k-th cycle */
    int transpositionsCount() const;  /* return sum of all transpositions */
    void make_orbits();

    private:
    struct Increment {
        int current;
        Increment(int start) : current(start) {}
        int operator() () {
            return current++;
        }
    };
    int n_;
    std::vector<int> iv; /* actual permutation */
    std::vector<int> ev; /* identity permutation */
    std::vector<Orbit> orbits_;
};

Definitions:

Permutation::Permutation( std::vector<int> const& i_vector) : 
                                                      n_( i_vector.size()), 
                                                      iv( i_vector), ev( n_) {
        if ( n_) { /* fill identity vector 1, 2, ..., n */
            Increment g ( 1);
            std::generate( ev.begin(), ev.end(), g);
        }
}

/* create orbits (cycles) */
void Permutation::make_orbits() {
    std::set<int> to_visit( ev.begin(), ev.end()); // identity elements to visit
    while ( !to_visit.empty()) {
        /* new cycle */
        Orbit orbit;
        int first_to_visit_e = *to_visit.begin();
        to_visit.erase( first_to_visit_e);
        int k = first_to_visit_e; // element in identity vector
        /* first orbit element */
        ei_element element;
        element.e = first_to_visit_e;
        element.i = i( first_to_visit_e);
        orbit.push_back( element);
        /* traverse permutation until cycle is closed */
        while ( pi( k) != first_to_visit_e && !to_visit.empty()) {
            k = pi( k);
            ei_element element;
            element.e = k;
            element.i = pi( k);
            orbit.push_back( element);
            to_visit.erase( k);
        }
        orbits_.push_back( orbit);
    }
}

and:

/* return sum of all transpositions */
int Permutation::transpositionsCount() const {
    int count = 0;
    int k = 0;
    while ( k < orbits_.size()) {
        count += l( k++) - 1; /* sum += l_k - 1 */ 
    }
    return count;
}

usage:

/*
 * 
 */
int main(int argc, char** argv) {
                       //1, 2, 3, 4, 5, 6, 7, 8       identity (e)
    int permutation[] = {2, 3, 4, 5, 1, 7, 6, 8}; //  actual (i)
    std::vector<int> vp( permutation, permutation + 8);

    Permutation p( vp);
    p.make_orbits();
    int k = p.orbits().size();
    std::cout << "Number of cycles:" << k << std::endl;

    for ( int i = 0; i < k; ++i) {
        std::vector<Permutation::ei_element> v = p.orbits()[ i];
        for ( int j = 0; j < v.size(); ++j) {
            std::cout << v[ j].e << "," << v[ j].i << " | ";
        }
        std::cout << std::endl;
    }

    std::cout << "Steps needed to create identity permutation: " 
                                                << p.transpositionsCount();
    return 0;
}

output:

Number of cycles:3

1,2 | 2,3 | 3,4 | 4,5 | 5,1 |

6,7 | 7,6 |

8,8 |

Steps needed to create identity permutation: 5

RUN SUCCESSFUL (total time: 82ms)


coliru

2
votes

Converting permutation from one to another can be converted to a similar problem (Number of swaps in a permutation) by inverting the target permutation in O(n), composing the permutations in O(n) and then finding the number of swaps from there to an identity permutation. Given:

int P1[] = {0, 1, 2, 3}; // abcd
int P2[] = {0, 2, 3, 1}; // acdb

// we can follow a simple algebraic modification
// (see http://en.wikipedia.org/wiki/Permutation#Product_and_inverse):
// P1 * P = P2                   | premultiply P1^-1 *
// P1^-1 * P1 * P = P1^-1 * P2
// I * P = P1^-1 * P2
// P = P1^-1 * P2
// where P is a permutation that makes P1 into P2.
// also, the number of steps from P to identity equals
// the number of steps from P1 to P2.

int P1_inv[4];
for(int i = 0; i < 4; ++ i)
    P1_inv[P1[i]] = i;
// invert the first permutation O(n)

int P[4];
for(int i = 0; i < 4; ++ i)
    P[i] = P2[P1_inv[i]];
// chain the permutations

int num_steps = NumSteps(P, 4); // will return 2
// now we just need to count the steps

To count the steps, a simple algorithm can be devised, such as:

int NumSteps(int *P, int n)
{
    int count = 0;
    for(int i = 0; i < n; ++ i) {
        for(; P[i] != i; ++ count) // could be permuted multiple times
            swap(P[P[i]], P[i]); // look where the number at hand should be
    }
    // count number of permutations

    return count;
}

This always swaps an item for a place where it should be in the identity permutation, therefore at every step it undoes and counts one swap. Now, provided that the number of swaps it returns is indeed minimum, the runtime of the algorithm is bounded by it and is guaranteed to finish (instead of getting stuck in an infinite loop). It will run in O(m) swaps or O(m + n) loop iterations where m is number of swaps (the count returned) and n is number of items in the sequence (4). Note that m < n is always true. Therefore, this should be superior to O(n log n) solutions, as the upper bound is O(n - 1) of swaps or O(n + n - 1) of loop iterations here, which is both practically O(n) (constant factor of 2 omitted in the latter case).

The algorithm will only work for valid permutations, it will loop infinitely for sequences with duplicate values and will do out-of-bounds array access (and crash) for sequences with values other than [0, n). A complete test case can be found here (builds with Visual Studio 2008, the algorithm itself should be fairly portable). It generates all possible permutations of lengths 1 to 32 and checks against solutions, generated with breadth first search (BFS), seems to work for all of permutations of lengths 1 to 12, then it becomes fairly slow but I assume it will just continue working.