79
votes

I need to cluster the consecutive elements of a NumPy array. Consider the following example:

a = [ 0, 47, 48, 49, 50, 97, 98, 99]

The output should be a list of tuples, as follows:

[(0,), (47, 48, 49, 50), (97, 98, 99)]

Here the difference between neighbouring elements is exactly one. It would be great if the difference could also be specified as a limit or a hard-coded number.

6
I found this answer while having EXACTLY the same problem... Small world! :o) - heltonbiker

6 Answers

22
votes

Here's a lil func that might help:

def group_consecutives(vals, step=1):
    """Return list of consecutive lists of numbers from vals (number list)."""
    run = []
    result = [run]
    expect = None
    for v in vals:
        if (v == expect) or (expect is None):
            run.append(v)
        else:
            run = [v]
            result.append(run)
        expect = v + step
    return result

>>> group_consecutives(a)
[[0], [47, 48, 49, 50], [97, 98, 99]]
>>> group_consecutives(a, step=47)
[[0, 47], [48], [49], [50, 97], [98], [99]]

P.S. This is pure Python. For a NumPy solution, see unutbu's answer.
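Since the question asks for a list of tuples, here is a hedged pure-Python alternative that swaps in `itertools.groupby` (a different technique from the loop above, not part of this answer). It assumes the input is a sequence of numbers; `group_consecutive_tuples` is a hypothetical name. Two neighbours end up in the same group exactly when `v[i + 1] - v[i] == step`, because then `v - i * step` has the same value for both, so that expression works as a grouping key.

```python
from itertools import groupby


def group_consecutive_tuples(vals, step=1):
    """Hypothetical helper: group runs where each element is the previous one plus step."""
    # Within a run, value - index * step stays constant, so it serves as the groupby key.
    keyed = groupby(enumerate(vals), key=lambda iv: iv[1] - iv[0] * step)
    # Drop the indices and keep only the values of each run, as a tuple.
    return [tuple(v for _, v in group) for _, group in keyed]


a = [0, 47, 48, 49, 50, 97, 98, 99]
print(group_consecutive_tuples(a))
# [(0,), (47, 48, 49, 50), (97, 98, 99)]
print(group_consecutive_tuples(a, step=47))
# [(0, 47), (48,), (49,), (50, 97), (98,), (99,)]
```

One small difference from `group_consecutives`: an empty input gives `[]` here rather than `[[]]`.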

210
votes
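import numpy as np

def consecutive(data, stepsize=1):
    # Start a new group after every position where the step to the next element differs from stepsize
    return np.split(data, np.where(np.diff(data) != stepsize)[0]+1)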

a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
consecutive(a)

yields

[array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]
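The question also asks for the difference to be given as a limit. A minimal sketch, assuming "limit" means "split only where the gap to the next element is larger than `max_gap`" and that the data are sorted in ascending order; `consecutive_within` and `max_gap` are hypothetical names. It reuses the same `np.diff`/`np.split` idea and converts each chunk to a tuple, as the question requests.

```python
import numpy as np


def consecutive_within(data, max_gap=1):
    """Hypothetical helper: split a sorted 1-D array wherever a gap exceeds max_gap."""
    data = np.asarray(data)
    # Positions i where data[i + 1] - data[i] > max_gap; a new run starts at i + 1.
    breaks = np.flatnonzero(np.diff(data) > max_gap) + 1
    # Convert each chunk to a tuple of plain Python numbers.
    return [tuple(chunk.tolist()) for chunk in np.split(data, breaks)]


a = np.array([0, 47, 48, 49, 50, 97, 98, 99])
print(consecutive_within(a))
# [(0,), (47, 48, 49, 50), (97, 98, 99)]
print(consecutive_within(a, max_gap=47))
# [(0, 47, 48, 49, 50, 97, 98, 99)]
```

With `max_gap=1` on strictly increasing integers this yields the same groups as `consecutive`, because a step of exactly 1 is then the only gap that is not larger than 1.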

12
votes

(a[1:]-a[:-1])==1 will produce a boolean array where False indicates breaks in the runs. You can also use the built-in numpy.diff, which computes the same differences: numpy.diff(a)==1.
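A short sketch of that mask on the question's array, plus one way (my choice, not from this answer) to turn its False entries into split points with `np.flatnonzero` and `np.split`. The assertion checks that `np.diff` gives the same mask as the slicing expression.

```python
import numpy as np

a = np.array([0, 47, 48, 49, 50, 97, 98, 99])

# True where the next element is exactly one larger; False marks a break
# (here at positions 0 and 4).
mask = (a[1:] - a[:-1]) == 1

# np.diff(a) computes the same first differences as a[1:] - a[:-1].
assert np.array_equal(mask, np.diff(a) == 1)

# A False at position i means a new run starts at index i + 1.
runs = np.split(a, np.flatnonzero(~mask) + 1)
print([r.tolist() for r in runs])
# [[0], [47, 48, 49, 50], [97, 98, 99]]
```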

5
votes

This is what I came up with so far; I'm not sure it is 100% correct:

import numpy as np
a = np.array([ 0, 47, 48, 49, 50, 97, 98, 99])
# Split just after every position where the gap to the next element exceeds 1
print(np.split(a, np.where(a[1:] - a[:-1] > 1)[0] + 1))

prints:

[array([0]), array([47, 48, 49, 50]), array([97, 98, 99])]
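The `> 1` test used here is a different criterion from the `!= stepsize` test in the `consecutive` function above: it keeps repeated values (a gap of 0) and decreasing steps in the same run, and only splits on gaps larger than 1. A small comparison on a hypothetical input containing a repeated value:

```python
import numpy as np

b = np.array([1, 1, 2, 5])  # hypothetical input with a repeated value

# Split only where the gap to the next element exceeds 1.
gt_one = np.split(b, np.where(b[1:] - b[:-1] > 1)[0] + 1)
# Split wherever the gap is anything other than exactly 1.
ne_one = np.split(b, np.where(np.diff(b) != 1)[0] + 1)

print([g.tolist() for g in gt_one])  # [[1, 1, 2], [5]]
print([g.tolist() for g in ne_one])  # [[1], [1, 2], [5]]
```

For strictly increasing integers, such as the array in the question, both criteria give the same groups.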

0
votes

Tested for one-dimensional arrays.

Find where the difference between neighbouring elements isn't one:

import numpy
# array is the 1-D input, e.g. numpy.array([0, 47, 48, 49, 50, 97, 98, 99])
diffs = numpy.diff(array) != 1

Get the indices where diffs is True, take the first (and only) dimension of the result, and add one to each: element i of diffs compares array[i + 1] with array[i], so a new run starts at i + 1.

indexes = numpy.nonzero(diffs)[0] + 1

Split at those indices:

groups = numpy.split(array, indexes)
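Put together, the three steps make a small self-contained function. `split_consecutive` is a hypothetical name, and the input is assumed to be a 1-D array, as the answer notes.

```python
import numpy


def split_consecutive(array):
    """Hypothetical wrapper around the three steps: split a 1-D array into runs of +1 steps."""
    diffs = numpy.diff(array) != 1         # True where a run is broken
    indexes = numpy.nonzero(diffs)[0] + 1  # first index of each new run
    return numpy.split(array, indexes)     # one sub-array per run


groups = split_consecutive(numpy.array([0, 47, 48, 49, 50, 97, 98, 99]))
print([g.tolist() for g in groups])
# [[0], [47, 48, 49, 50], [97, 98, 99]]
```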

-2
votes

This sounds a little like homework, so if you don't mind, I will just suggest an approach.

You can iterate over a list using:

for i in range(len(a)):
    print(a[i])

You could test whether the next element in the list meets some criterion, as follows:

if a[i + 1] == a[i] + 1:
    print("it must be a consecutive run")

And you can store the results separately in:

results = []

Beware: there is an index-out-of-range error hidden in the above that you will need to deal with.
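One way to finish that approach, offered as a sketch rather than the answerer's own code: stop the loop one element early so `a[i + 1]` stays in range, and start a new run whenever the next element is not the current one plus `step`. The function name `runs_by_index` and the `step` parameter are hypothetical, the latter matching the question's request for a configurable difference.

```python
def runs_by_index(a, step=1):
    """Hypothetical helper: collect runs where each element is the previous one plus step."""
    results = []
    if len(a) == 0:
        return results
    current = [a[0]]
    # range(len(a) - 1) keeps a[i + 1] inside the list, avoiding the IndexError.
    for i in range(len(a) - 1):
        if a[i + 1] == a[i] + step:
            current.append(a[i + 1])
        else:
            results.append(tuple(current))
            current = [a[i + 1]]
    results.append(tuple(current))  # don't forget the last run
    return results


print(runs_by_index([0, 47, 48, 49, 50, 97, 98, 99]))
# [(0,), (47, 48, 49, 50), (97, 98, 99)]
```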