88
votes

When we sort a list, like

a = [1,2,3,3,2,2,1]
sorted(a) => [1, 1, 2, 2, 2, 3, 3]

equal elements are always adjacent in the resulting list.

How can I achieve the opposite task - shuffle the list so that equal elements are never (or as seldom as possible) adjacent?

For example, for the above list one of the possible solutions is

p = [1,3,2,3,2,1,2]

More formally, given a list a, generate a permutation p of it that minimizes the number of pairs p[i]==p[i+1].

Since the lists are large, generating and filtering all permutations is not an option.

Bonus question: how to generate all such permutations efficiently?

This is the code I'm using to test the solutions: https://gist.github.com/gebrkn/9f550094b3d24a35aebd

UPD: Choosing a winner here was a tough choice, because many people posted excellent answers. @VincentvanderWeele, @David Eisenstat, @Coady, @enrico.bacis and @srgerg provided functions that generate the best possible permutation flawlessly. @tobias_k and David also answered the bonus question (generate all permutations). Additional points to David for the correctness proof.

The code from @VincentvanderWeele appears to be the fastest.

12
So you care only for equality? something like [1, 2, 1, 3, 1, 4, 1, 5] is exactly the same as [1, 3, 1, 2, 1, 4, 1, 5] by your criterion?Bakuriu
There cannot be an "efficient" algorithm. Take a list like [1, 1, 1, ..., 2, 3, 4, ..., N] with 2N elements. You can put a number n > 1 between each pair of consecutive 1 to obtain a good permutation. Then you permute the N/2 elements and obtain all valid permutations (meaning none is a bad one, but there may be more). The number of such permutations is O(N^2), so you cannot do better than O(N^2). Still better than O(N^3) of the naive approach though.Bakuriu
@Bakuriu: Two things: (1) to be clear, your example shows that there can be no efficient algorithm for the bonus question. (2) Enumerating all optimal solutions to your example is O((N/2)!), which is much worse than O(N^2) (i.e. your example is much stronger than you realised :-)j_random_hacker
@msw: I'm making a website, and there is a row with ads blocks from different providers. I want to arrange them so that no blocks from the same provider stand side by side.georg
I wouldn't say that this is "not even close to a duplicate", but the alleged duplicate is a different question, since the distance between identical elements is considered. People who voted to close after WhyCry's comment: please pay more attention in the future.David Eisenstat

12 Answers

30
votes

This is along the lines of Thijser's currently incomplete pseudocode. The idea is to take the most frequent of the remaining item types unless it was just taken. (See also Coady's implementation of this algorithm.)

import collections
import heapq


class Sentinel:
    pass


def david_eisenstat(lst):
    counts = collections.Counter(lst)
    heap = [(-count, key) for key, count in counts.items()]
    heapq.heapify(heap)
    output = []
    last = Sentinel()
    while heap:
        minuscount1, key1 = heapq.heappop(heap)
        if key1 != last or not heap:
            last = key1
            minuscount1 += 1
        else:
            minuscount2, key2 = heapq.heappop(heap)
            last = key2
            minuscount2 += 1
            if minuscount2 != 0:
                heapq.heappush(heap, (minuscount2, key2))
        output.append(last)
        if minuscount1 != 0:
            heapq.heappush(heap, (minuscount1, key1))
    return output

Proof of correctness

For two item types, with counts k1 and k2, the optimal solution has k2 - k1 - 1 defects if k1 < k2, 0 defects if k1 = k2, and k1 - k2 - 1 defects if k1 > k2. The = case is obvious. The others are symmetric; each instance of the minority element prevents at most two defects out of a total of k1 + k2 - 1 possible.

This greedy algorithm returns optimal solutions, by the following logic. We call a prefix (partial solution) safe if it extends to an optimal solution. Clearly the empty prefix is safe, and if a safe prefix is a whole solution then that solution is optimal. It suffices to show inductively that each greedy step maintains safety.

The only way that a greedy step introduces a defect is if only one item type remains, in which case there is only one way to continue, and that way is safe. Otherwise, let P be the (safe) prefix just before the step under consideration, let P' be the prefix just after, and let S be an optimal solution extending P. If S extends P' also, then we're done. Otherwise, let P' = Px and S = PQ and Q = yQ', where x and y are items and Q and Q' are sequences.

Suppose first that P does not end with y. By the algorithm's choice, x is at least as frequent in Q as y. Consider the maximal substrings of Q containing only x and y. If the first substring has at least as many x's as y's, then it can be rewritten without introducing additional defects to begin with x. If the first substring has more y's than x's, then some other substring has more x's than y's, and we can rewrite these substrings without additional defects so that x goes first. In both cases, we find an optimal solution T that extends P', as needed.

Suppose now that P does end with y. Modify Q by moving the first occurrence of x to the front. In doing so, we introduce at most one defect (where x used to be) and eliminate one defect (the yy).

Generating all solutions

This is tobias_k's answer plus efficient tests to detect when the choice currently under consideration is globally constrained in some way. The asymptotic running time is optimal, since the overhead of generation is on the order of the length of the output. The worst-case delay unfortunately is quadratic; it could be reduced to linear (optimal) with better data structures.

from collections import Counter
from itertools import permutations
from operator import itemgetter
from random import randrange


def get_mode(count):
    return max(count.items(), key=itemgetter(1))[0]


def enum2(prefix, x, count, total, mode):
    prefix.append(x)
    count_x = count[x]
    if count_x == 1:
        del count[x]
    else:
        count[x] = count_x - 1
    yield from enum1(prefix, count, total - 1, mode)
    count[x] = count_x
    del prefix[-1]


def enum1(prefix, count, total, mode):
    if total == 0:
        yield tuple(prefix)
        return
    if count[mode] * 2 - 1 >= total and [mode] != prefix[-1:]:
        yield from enum2(prefix, mode, count, total, mode)
    else:
        defect_okay = not prefix or count[prefix[-1]] * 2 > total
        mode = get_mode(count)
        for x in list(count.keys()):
            if defect_okay or [x] != prefix[-1:]:
                yield from enum2(prefix, x, count, total, mode)


def enum(seq):
    count = Counter(seq)
    if count:
        yield from enum1([], count, sum(count.values()), get_mode(count))
    else:
        yield ()


def defects(lst):
    return sum(lst[i - 1] == lst[i] for i in range(1, len(lst)))


def test(lst):
    perms = set(permutations(lst))
    opt = min(map(defects, perms))
    slow = {perm for perm in perms if defects(perm) == opt}
    fast = set(enum(lst))
    print(lst, fast, slow)
    assert slow == fast


for r in range(10000):
    test([randrange(3) for i in range(randrange(6))])
23
votes

Pseudocode:

  1. Sort the list
  2. Loop over the first half of the sorted list and fill all even indices of the result list
  3. Loop over the second half of the sorted list and fill all odd indices of the result list

You will only have p[i]==p[i+1] if more than half of the input consists of the same element, in which case there is no other choice than putting the same element in consecutive spots (by the pidgeon hole principle).


As pointed out in the comments, this approach may have one conflict too many in case one of the elements occurs at least n/2 times (or n/2+1 for odd n; this generalizes to (n+1)/2) for both even and odd). There are at most two such elements and if there are two, the algorithm works just fine. The only problematic case is when there is one element that occurs at least half of the time. We can simply solve this problem by finding the element and dealing with it first.

I don't know enough about python to write this properly, so I took the liberty to copy the OP's implementation of a previous version from github:

# Sort the list
a = sorted(lst)

# Put the element occurring more than half of the times in front (if needed)
n = len(a)
m = (n + 1) // 2
for i in range(n - m + 1):
    if a[i] == a[i + m - 1]:
        a = a[i:] + a[:i]
        break

result = [None] * n

# Loop over the first half of the sorted list and fill all even indices of the result list
for i, elt in enumerate(a[:m]):
    result[2*i] = elt

# Loop over the second half of the sorted list and fill all odd indices of the result list
for i, elt in enumerate(a[m:]):
    result[2*i+1] = elt

return result
10
votes

The algorithm already given of taking the most common item left that isn't the previous item is correct. Here's a simple implementation, which optimally uses a heap to track the most common.

import collections, heapq
def nonadjacent(keys):
    heap = [(-count, key) for key, count in collections.Counter(a).items()]
    heapq.heapify(heap)
    count, key = 0, None
    while heap:
        count, key = heapq.heapreplace(heap, (count, key)) if count else heapq.heappop(heap)
        yield key
        count += 1
    for index in xrange(-count):
        yield key

>>> a = [1,2,3,3,2,2,1]
>>> list(nonadjacent(a))
[2, 1, 2, 3, 1, 2, 3]
8
votes

You can generate all the 'perfectly unsorted' permutations (that have no two equal elements in adjacent positions) using a recursive backtracking algorithm. In fact, the only difference to generating all the permutations is that you keep track of the last number and exclude some solutions accordingly:

def unsort(lst, last=None):
    if lst:
        for i, e in enumerate(lst):
            if e != last:
                for perm in unsort(lst[:i] + lst[i+1:], e):
                    yield [e] + perm
    else:
        yield []

Note that in this form the function is not very efficient, as it creates lots of sub-lists. Also, we can speed it up by looking at the most-constrained numbers first (those with the highest count). Here's a much more efficient version using only the counts of the numbers.

def unsort_generator(lst, sort=False):
    counts = collections.Counter(lst)
    def unsort_inner(remaining, last=None):
        if remaining > 0:
            # most-constrained first, or sorted for pretty-printing?
            items = sorted(counts.items()) if sort else counts.most_common()
            for n, c in items:
                if n != last and c > 0:
                    counts[n] -= 1   # update counts
                    for perm in unsort_inner(remaining - 1, n):
                        yield [n] + perm
                    counts[n] += 1   # revert counts
        else:
            yield []
    return unsort_inner(len(lst))

You can use this to generate just the next perfect permutation, or a list holding all of them. But note, that if there is no perfectly unsorted permutation, then this generator will consequently yield no results.

>>> lst = [1,2,3,3,2,2,1]
>>> next(unsort_generator(lst))
[2, 1, 2, 3, 1, 2, 3]
>>> list(unsort_generator(lst, sort=True))
[[1, 2, 1, 2, 3, 2, 3], 
 ... 36 more ...
 [3, 2, 3, 2, 1, 2, 1]]
>>> next(unsort_generator([1,1,1]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

To circumvent this problem, you could use this together with one of the algorithms proposed in the other answers as a fallback. This will guarantee to return a perfectly unsorted permutation, if there is one, or a good approximation otherwise.

def unsort_safe(lst):
    try:
        return next(unsort_generator(lst))
    except StopIteration:
        return unsort_fallback(lst)
5
votes

In python you could do the following.

Consider you have a sorted list l, you can do:

length = len(l)
odd_ind = length%2
odd_half = (length - odd_ind)/2
for i in range(odd_half)[::2]:
    my_list[i], my_list[odd_half+odd_ind+i] = my_list[odd_half+odd_ind+i], my_list[i]

These are just in place operations and should thus be rather fast (O(N)). Note that you will shift from l[i] == l[i+1] to l[i] == l[i+2] so the order you end up with is anything but random, but from how I understand the question it is not randomness you are looking for.

The idea is to split the sorted list in the middle then exchange every other element in the two parts.

For l= [1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5] this leads to l = [3, 1, 4, 2, 5, 1, 3, 1, 4, 2, 5]

The method fails to get rid of all the l[i] == l[i + 1] as soon as the abundance of one element is bigger than or equal to half of the length of the list.

While the above works fine as long as the abundance of the most frequent element is smaller than half the size of the list, the following function also handles the limit cases (the famous off-by-one issue) where every other element starting with the first one must be the most abundant one:

def no_adjacent(my_list):
    my_list.sort()
    length = len(my_list)
    odd_ind = length%2
    odd_half = (length - odd_ind)/2
    for i in range(odd_half)[::2]:
        my_list[i], my_list[odd_half+odd_ind+i] = my_list[odd_half+odd_ind+i], my_list[i]

    #this is just for the limit case where the abundance of the most frequent is half of the list length
    if max([my_list.count(val) for val in set(my_list)]) + 1 - odd_ind > odd_half:
        max_val = my_list[0]
        max_count = my_list.count(max_val)
        for val in set(my_list):
            if my_list.count(val) > max_count:
               max_val = val
               max_count = my_list.count(max_val)
        while max_val in my_list:
            my_list.remove(max_val)
        out = [max_val]
        max_count -= 1
        for val in my_list:
            out.append(val)
            if max_count:
                out.append(max_val)
                max_count -= 1
        if max_count:
            print 'this is not working'
            return my_list
            #raise Exception('not possible')
        return out
    else:
        return my_list
5
votes

Here is a good algorithm:

  1. First of all count for all numbers how often they occur. Place the answer in a map.

  2. sort this map so that the numbers that occur most often come first.

  3. The first number of your answer is the first number in the sorted map.

  4. Resort the map with the first now being one smaller.

If you want to improve efficiency look for ways to increase the efficiency of the sorting step.

5
votes

In answer to the bonus question: this is an algorithm which finds all permutations of a set where no adjacent elements can be identical. I believe this to be the most efficient algorithm conceptually (although others may be faster in practice because they translate into simpler code). It doesn't use brute force, it only generates unique permutations, and paths not leading to solutions are cut off at the earliest point.

I will use the term "abundant element" for an element in a set which occurs more often than all other elements combined, and the term "abundance" for the number of abundant elements minus the number of other elements.
e.g. the set abac has no abundant element, the sets abaca and aabcaa have a as the abundant element, and abundance 1 and 2 respectively.

  1. Start with a set like:

aaabbcd

  1. Seperate the first occurances from the repeats:

firsts: abcd
repeats: aab

  1. Find the abundant element in the repeats, if any, and calculate the abundance:

abundant element: a
abundance: 1

  1. Generate all permutations of the firsts where the number of elements after the abundant element is not less than the abundance: (so in the example the "a" cannot be last)

abcd, abdc, acbd, acdb, adbc, adcb, bacd, badc, bcad, bcda, bdac, bdca,
cabd, cadb, cbad, cbda, cdab, cdba, dabc, dacb, abac, dbca, dcab, dcba

  1. For each permutation, insert the set of repeated characters one by one, following these rules:

5.1. If the abundance of the set is greater than the number of elements after the last occurance of the abundant element in the permutation so far, skip to the next permutation.
e.g. when permutation so far is abc, a set with abundant element a can only be inserted if the abundance is 2 or less, so aaaabc is ok, aaaaabc isn't.

5.2. Select the element from the set whose last occurance in the permutation comes first.
e.g. when permutation so far is abcba and set is ab, select b

5.3. Insert the selected element at least 2 positions to the right of its last occurance in the permutation.
e.g. when inserting b into permutation babca, results are babcba and babcab

5.4. Recurse step 5 with each resulting permutation and the rest of the set.

EXAMPLE:
set = abcaba
firsts = abc
repeats = aab

perm3  set    select perm4  set    select perm5  set    select perm6

abc    aab    a      abac   ab     b      ababc  a      a      ababac  
                                                               ababca  
                                          abacb  a      a      abacab  
                                                               abacba  
                     abca   ab     b      abcba  a      -
                                          abcab  a      a      abcaba  
acb    aab    a      acab   ab     a      acaba  b      b      acabab  
                     acba   ab     b      acbab  a      a      acbaba  
bac    aab    b      babc   aa     a      babac  a      a      babaca  
                                          babca  a      -
                     bacb   aa     a      bacab  a      a      bacaba  
                                          bacba  a      -  
bca    aab    -
cab    aab    a      caba   ab     b      cabab  a      a      cababa  
cba    aab    -

This algorithm generates unique permutations. If you want to know the total number of permutations (where aba is counted twice because you can switch the a's), multiply the number of unique permutations with a factor:

F = N1! * N2! * ... * Nn!

where N is the number of occurances of each element in the set. For a set abcdabcaba this would be 4! * 3! * 2! * 1! or 288, which demonstrates how inefficient an algorithm is that generates all permutations instead of only the unique ones. To list all permutations in this case, just list the unique permutations 288 times :-)

Below is a (rather clumsy) implementation in Javascript; I suspect that a language like Python may be better suited for this sort of thing. Run the code snippet to calculate the seperated permutations of "abracadabra".

// FIND ALL PERMUTATONS OF A SET WHERE NO ADJACENT ELEMENTS ARE IDENTICAL
function seperatedPermutations(set) {
    var unique = 0, factor = 1, firsts = [], repeats = [], abund;

    seperateRepeats(set);
    abund = abundance(repeats);
    permutateFirsts([], firsts);
    alert("Permutations of [" + set + "]\ntotal: " + (unique * factor) + ", unique: " + unique);

    // SEPERATE REPEATED CHARACTERS AND CALCULATE TOTAL/UNIQUE RATIO
    function seperateRepeats(set) {
        for (var i = 0; i < set.length; i++) {
            var first, elem = set[i];
            if (firsts.indexOf(elem) == -1) firsts.push(elem)
            else if ((first = repeats.indexOf(elem)) == -1) {
                repeats.push(elem);
                factor *= 2;
            } else {
                repeats.splice(first, 0, elem);
                factor *= repeats.lastIndexOf(elem) - first + 2;
            }
        }
    }

    // FIND ALL PERMUTATIONS OF THE FIRSTS USING RECURSION
    function permutateFirsts(perm, set) {
        if (set.length > 0) {
            for (var i = 0; i < set.length; i++) {
                var s = set.slice();
                var e = s.splice(i, 1);
                if (e[0] == abund.elem && s.length < abund.num) continue;
                permutateFirsts(perm.concat(e), s, abund);
            }
        }
        else if (repeats.length > 0) {
            insertRepeats(perm, repeats);
        }
        else {
            document.write(perm + "<BR>");
            ++unique;
        }
    }

    // INSERT REPEATS INTO THE PERMUTATIONS USING RECURSION
    function insertRepeats(perm, set) {
        var abund = abundance(set);
        if (perm.length - perm.lastIndexOf(abund.elem) > abund.num) {
            var sel = selectElement(perm, set);
            var s = set.slice();
            var elem = s.splice(sel, 1)[0];
            for (var i = perm.lastIndexOf(elem) + 2; i <= perm.length; i++) {
                var p = perm.slice();
                p.splice(i, 0, elem);
                if (set.length == 1) {
                    document.write(p + "<BR>");
                    ++unique;
                } else {
                    insertRepeats(p, s);
                }
            }
        }
    }

    // SELECT THE ELEMENT FROM THE SET WHOSE LAST OCCURANCE IN THE PERMUTATION COMES FIRST
    function selectElement(perm, set) {
        var sel, pos, min = perm.length;
        for (var i = 0; i < set.length; i++) {
            pos = perm.lastIndexOf(set[i]);
            if (pos < min) {
                min = pos;
                sel = i;
            }
        }
        return(sel);
    }

    // FIND ABUNDANT ELEMENT AND ABUNDANCE NUMBER
    function abundance(set) {
        if (set.length == 0) return ({elem: null, num: 0});
        var elem = set[0], max = 1, num = 1;
        for (var i = 1; i < set.length; i++) {
            if (set[i] != set[i - 1]) num = 1
            else if (++num > max) {
                max = num;
                elem = set[i];
            }
        }
        return ({elem: elem, num: 2 * max - set.length});
    }
}

seperatedPermutations(["a","b","r","a","c","a","d","a","b","r","a"]);
4
votes

The idea is to sort the elements from the most common to the least common, take the most common, decrease its count and put it back in the list keeping the descending order (but avoiding putting the last used element first to prevent repetitions when possible).

This can be implemented using Counter and bisect:

from collections import Counter
from bisect import bisect

def unsorted(lst):
    # use elements (-count, item) so bisect will put biggest counts first
    items = [(-count, item) for item, count in Counter(lst).most_common()]
    result = []

    while items:
        count, item = items.pop(0)
        result.append(item)
        if count != -1:
            element = (count + 1, item)
            index = bisect(items, element)
            # prevent insertion in position 0 if there are other items
            items.insert(index or (1 if items else 0), element)

    return result

Example

>>> print unsorted([1, 1, 1, 2, 3, 3, 2, 2, 1])
[1, 2, 1, 2, 1, 3, 1, 2, 3]

>>> print unsorted([1, 2, 3, 2, 3, 2, 2])
[2, 3, 2, 1, 2, 3, 2]
2
votes
  1. Sort the list.
  2. Generate a "best shuffle" of the list using this algorithm

It will give the minimum of items from the list in their original place (by item value) so it will try, for your example, to put the 1's, 2's and 3's away from their sorted positions.

2
votes

Start with the sorted list of length n. Let m=n/2. Take the values at 0, then m, then 1, then m+1, then 2, then m+2, and so on. Unless you have more than half of the numbers the same, you'll never get equivalent values in consecutive order.

2
votes

Please forgive my "me too" style answer, but couldn't Coady's answer be simplified to this?

from collections import Counter
from heapq import heapify, heappop, heapreplace
from itertools import repeat

def srgerg(data):
    heap = [(-freq+1, value) for value, freq in Counter(data).items()]
    heapify(heap)

    freq = 0
    while heap:
        freq, val = heapreplace(heap, (freq+1, val)) if freq else heappop(heap)
        yield val
    yield from repeat(val, -freq)

Edit: Here's a python 2 version that returns a list:

def srgergpy2(data):
    heap = [(-freq+1, value) for value, freq in Counter(data).items()]
    heapify(heap)

    freq = 0
    result = list()
    while heap:
        freq, val = heapreplace(heap, (freq+1, val)) if freq else heappop(heap)
        result.append(val)
    result.extend(repeat(val, -freq))
    return result
2
votes
  1. Count number of times each value appears
  2. Select values in order from most frequent to least frequent
  3. Add selected value to final output, incrementing the index by 2 each time
  4. Reset index to 1 if index out of bounds
from heapq import heapify, heappop
def distribute(values):
    counts = defaultdict(int)
    for value in values:
        counts[value] += 1
    counts = [(-count, key) for key, count in counts.iteritems()]
    heapify(counts)
    index = 0
    length = len(values)
    distributed = [None] * length
    while counts:
        count, value = heappop(counts)
        for _ in xrange(-count):
            distributed[index] = value
            index = index + 2 if index + 2 < length else 1
    return distributed