python - How to make a flat list out of a list of lists?

4146

votes

Is there a shortcut to make a simple list out of a list of lists in Python?

I can do it in a for loop, but maybe there is some cool "one-liner"? I tried it with functools.reduce()

from functools import reduce
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
reduce(lambda x, y: x.extend(y), l)

but I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'extend'

pythonlistmultidimensional-arrayflatten

There's an in-depth discussion of this here: rightfootin.blogspot.com/2006/09/more-on-python-flatten.html, discussing several methods of flattening arbitrarily nested lists of lists. An interesting read! - RichieHindle

Some other answers are better but the reason yours fails is that the 'extend' method always returns None. For a list with length 2, it will work but return None. For a longer list, it will consume the first 2 args, which returns None. It then continues with None.extend(<third arg>), which causes this erro - mehtunguh

@shawn-chin solution is the more pythonic here, but if you need to preserve the sequence type, say you have a tuple of tuples rather than a list of lists, then you should use reduce(operator.concat, tuple_of_tuples). Using operator.concat with tuples seems to perform faster than chain.from_iterables with list. - Meitham

stackoverflow.com/questions/50259290/… (this article explain the difference between an np.flatten() and a tf.flatten() use (static vs dynamic) ndarray. - Golden Lion

Is your list-of-lists a perfectly two-level list? by which I mean, is every integer in a list which is in another list - so exactly two levels? (as opposed to having some integers in a nested list and some integers directly in the first list). The fact that you have [7] in your list (instead of 7) indicates that yes, it is a perfectly two-level list. So wouldn't sthg very simple like l2=[] followed by [l2.extend(i) for i in l] do the trick? - gnoodle

5879

votes

Given a list of lists t,

flat_list = [item for sublist in t for item in sublist]

which means:

flat_list = []
for sublist in t:
    for item in sublist:
        flat_list.append(item)

is faster than the shortcuts posted so far. (t is the list to flatten.)

Here is the corresponding function:

def flatten(t):
    return [item for sublist in t for item in sublist]

As evidence, you can use the timeit module in the standard library:

$ python -mtimeit -s't=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in t for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s't=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(t, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s't=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,t)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(T**2) when there are T sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have T sublists of k items each: the first k items are copied back and forth T-1 times, the second k items T-2 times, and so on; total number of copies is k times the sum of x for x from 1 to T excluded, i.e., k * (T**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

1864

votes

You can use itertools.chain():

import itertools

list2d = [[1,2,3], [4,5,6], [7], [8,9]]
merged = list(itertools.chain(*list2d))

Or you can use itertools.chain.from_iterable() which doesn't require unpacking the list with the * operator:

merged = list(itertools.chain.from_iterable(list2d))

1065

votes

Note from the author: This is inefficient. But fun, because monoids are awesome. It's not appropriate for production Python code.

>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> sum(l, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

This just sums the elements of iterable passed in the first argument, treating second argument as the initial value of the sum (if not given, 0 is used instead and this case will give you an error).

Because you are summing nested lists, you actually get [1,3]+[2,4] as a result of sum([[1,3],[2,4]],[]), which is equal to [1,3,2,4].

Note that only works on lists of lists. For lists of lists of lists, you'll need another solution.

647

votes

I tested most suggested solutions with perfplot (a pet project of mine, essentially a wrapper around timeit), and found

import functools
import operator
functools.reduce(operator.iconcat, a, [])

to be the fastest solution, both when many small lists and few long lists are concatenated. (operator.iadd is equally fast.)

Code to reproduce the plot:

import functools
import itertools
import numpy
import operator
import perfplot


def forfor(a):
    return [item for sublist in a for item in sublist]


def sum_brackets(a):
    return sum(a, [])


def functools_reduce(a):
    return functools.reduce(operator.concat, a)


def functools_reduce_iconcat(a):
    return functools.reduce(operator.iconcat, a, [])


def itertools_chain(a):
    return list(itertools.chain.from_iterable(a))


def numpy_flat(a):
    return list(numpy.array(a).flat)


def numpy_concatenate(a):
    return list(numpy.concatenate(a))


perfplot.show(
    setup=lambda n: [list(range(10))] * n,
    # setup=lambda n: [list(range(n))] * 10,
    kernels=[
        forfor,
        sum_brackets,
        functools_reduce,
        functools_reduce_iconcat,
        itertools_chain,
        numpy_flat,
        numpy_concatenate,
    ],
    n_range=[2 ** k for k in range(16)],
    xlabel="num lists (of length 10)",
    # xlabel="len lists (10 lists total)"
)

287

votes

Don't reinvent the wheel

If you use -

...Django:

>>> from django.contrib.admin.utils import flatten
>>> l = [[1,2,3], [4,5], [6]]
>>> flatten(l)
>>> [1, 2, 3, 4, 5, 6]

...Pandas:

>>> from pandas.core.common import flatten
>>> list(flatten(l))

...Itertools:

>>> import itertools
>>> flatten = itertools.chain.from_iterable
>>> list(flatten(l))

...Matplotlib

>>> from matplotlib.cbook import flatten
>>> list(flatten(l))

...Unipath:

>>> from unipath.path import flatten
>>> list(flatten(l))

...Setuptools:

>>> from setuptools.namespaces import flatten
>>> list(flatten(l))

227

votes

>>> from functools import reduce
>>> l = [[1,2,3], [4,5,6], [7], [8,9]]
>>> reduce(lambda x, y: x+y, l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

The extend() method in your example modifies x instead of returning a useful value (which functools.reduce() expects).

A faster way to do the reduce version would be

>>> import operator
>>> l = [[1,2,3], [4,5,6], [7], [8,9]]
>>> reduce(operator.concat, l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

134

votes

Here is a general approach that applies to numbers, strings, nested lists and mixed containers. This can flatten both simple and complicated containers (see also Demo).

Code

from typing import Iterable 
#from collections import Iterable                            # < py38


def flatten(items):
    """Yield items from any nested iterable; see Reference."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            for sub_x in flatten(x):
                yield sub_x
        else:
            yield x

Notes:

In Python 3, yield from flatten(x) can replace for sub_x in flatten(x): yield sub_x
In Python 3.8, abstract base classes are moved from collection.abc to the typing module.

Demo

simple = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(flatten(simple))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

complicated = [[1, [2]], (3, 4, {5, 6}, 7), 8, "9"]              # numbers, strs, nested & mixed
list(flatten(complicated))
# [1, 2, 3, 4, 5, 6, 7, 8, '9']

Reference

This solution is modified from a recipe in Beazley, D. and B. Jones. Recipe 4.14, Python Cookbook 3rd Ed., O'Reilly Media Inc. Sebastopol, CA: 2013.
Found an earlier SO post, possibly the original demonstration.

63

votes

If you want to flatten a data-structure where you don't know how deep it's nested you could use iteration_utilities.deepflatten¹

>>> from iteration_utilities import deepflatten

>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(deepflatten(l, depth=1))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> l = [[1, 2, 3], [4, [5, 6]], 7, [8, 9]]
>>> list(deepflatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

It's a generator so you need to cast the result to a list or explicitly iterate over it.

To flatten only one level and if each of the items is itself iterable you can also use iteration_utilities.flatten which itself is just a thin wrapper around itertools.chain.from_iterable:

>>> from iteration_utilities import flatten
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(flatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Just to add some timings (based on Nico Schlömer answer that didn't include the function presented in this answer):

It's a log-log plot to accommodate for the huge range of values spanned. For qualitative reasoning: Lower is better.

The results show that if the iterable contains only a few inner iterables then sum will be fastest, however for long iterables only the itertools.chain.from_iterable, iteration_utilities.deepflatten or the nested comprehension have reasonable performance with itertools.chain.from_iterable being the fastest (as already noticed by Nico Schlömer).

from itertools import chain
from functools import reduce
from collections import Iterable  # or from collections.abc import Iterable
import operator
from iteration_utilities import deepflatten

def nested_list_comprehension(lsts):
    return [item for sublist in lsts for item in sublist]

def itertools_chain_from_iterable(lsts):
    return list(chain.from_iterable(lsts))

def pythons_sum(lsts):
    return sum(lsts, [])

def reduce_add(lsts):
    return reduce(lambda x, y: x + y, lsts)

def pylangs_flatten(lsts):
    return list(flatten(lsts))

def flatten(items):
    """Yield items from any nested iterable; see REF."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

def reduce_concat(lsts):
    return reduce(operator.concat, lsts)

def iteration_utilities_deepflatten(lsts):
    return list(deepflatten(lsts, depth=1))


from simple_benchmark import benchmark

b = benchmark(
    [nested_list_comprehension, itertools_chain_from_iterable, pythons_sum, reduce_add,
     pylangs_flatten, reduce_concat, iteration_utilities_deepflatten],
    arguments={2**i: [[0]*5]*(2**i) for i in range(1, 13)},
    argument_name='number of inner lists'
)

b.plot()

^{1 Disclaimer: I'm the author of that library}

45

votes

I take my statement back. sum is not the winner. Although it is faster when the list is small. But the performance degrades significantly with larger lists.

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000'
    ).timeit(100)
2.0440959930419922

The sum version is still running for more than a minute and it hasn't done processing yet!

For medium lists:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
20.126545906066895
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
22.242258071899414
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
16.449732065200806

Using small lists and timeit: number=1000000

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
2.4598159790039062
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.5289170742034912
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.0598428249359131

44

votes

There seems to be a confusion with operator.add! When you add two lists together, the correct term for that is concat, not add. operator.concat is what you need to use.

If you're thinking functional, it is as easy as this::

>>> from functools import reduce
>>> list2d = ((1, 2, 3), (4, 5, 6), (7,), (8, 9))
>>> reduce(operator.concat, list2d)
(1, 2, 3, 4, 5, 6, 7, 8, 9)

You see reduce respects the sequence type, so when you supply a tuple, you get back a tuple. Let's try with a list::

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> reduce(operator.concat, list2d)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Aha, you get back a list.

How about performance::

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> %timeit list(itertools.chain.from_iterable(list2d))
1000000 loops, best of 3: 1.36 µs per loop

from_iterable is pretty fast! But it's no comparison to reduce with concat.

>>> list2d = ((1, 2, 3),(4, 5, 6), (7,), (8, 9))
>>> %timeit reduce(operator.concat, list2d)
1000000 loops, best of 3: 492 ns per loop

41

votes

Why do you use extend?

reduce(lambda x, y: x+y, l)

This should work fine.

37

votes

Consider installing the more_itertools package.

> pip install more_itertools

It ships with an implementation for flatten (source, from the itertools recipes):

import more_itertools


lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.flatten(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

Note: as mentioned in the docs, flatten requires a list of lists. See below on flattening more irregular inputs.

As of version 2.4, you can flatten more complicated, nested iterables with more_itertools.collapse (source, contributed by abarnet).

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.collapse(lst)) 
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

lst = [[1, 2, 3], [[4, 5, 6]], [[[7]]], 8, 9]              # complex nesting
list(more_itertools.collapse(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

29

votes

The reason your function didn't work is because the extend extends an array in-place and doesn't return it. You can still return x from lambda, using something like this:

reduce(lambda x,y: x.extend(y) or x, l)

Note: extend is more efficient than + on lists.

23

votes

def flatten(l, a):
    for i in l:
        if isinstance(i, list):
            flatten(i, a)
        else:
            a.append(i)
    return a

print(flatten([[[1, [1,1, [3, [4,5,]]]], 2, 3], [4, 5],6], []))

# [1, 1, 1, 3, 4, 5, 2, 3, 4, 5, 6]

23

votes

Recursive version

x = [1,2,[3,4],[5,[6,[7]]],8,9,[10]]

def flatten_list(k):
    result = list()
    for i in k:
        if isinstance(i,list):

            #The isinstance() function checks if the object (first argument) is an 
            #instance or subclass of classinfo class (second argument)

            result.extend(flatten_list(i)) #Recursive call
        else:
            result.append(i)
    return result

flatten_list(x)
#result = [1,2,3,4,5,6,7,8,9,10]

22

votes

The accepted answer did not work for me when dealing with text-based lists of variable lengths. Here is an alternate approach that did work for me.

l = ['aaa', 'bb', 'cccccc', ['xx', 'yyyyyyy']]

Accepted answer that did not work:

flat_list = [item for sublist in l for item in sublist]
print(flat_list)
['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'xx', 'yyyyyyy']

New proposed solution that did work for me:

flat_list = []
_ = [flat_list.extend(item) if isinstance(item, list) else flat_list.append(item) for item in l if item]
print(flat_list)
['aaa', 'bb', 'cccccc', 'xx', 'yyyyyyy']

22

votes

A bad feature of Anil's function above is that it requires the user to always manually specify the second argument to be an empty list []. This should instead be a default. Due to the way Python objects work, these should be set inside the function, not in the arguments.

Here's a working function:

def list_flatten(l, a=None):
    #check a
    if a is None:
        #initialize with empty list
        a = []

    for i in l:
        if isinstance(i, list):
            list_flatten(i, a)
        else:
            a.append(i)
    return a

Testing:

In [2]: lst = [1, 2, [3], [[4]],[5,[6]]]

In [3]: lst
Out[3]: [1, 2, [3], [[4]], [5, [6]]]

In [11]: list_flatten(lst)
Out[11]: [1, 2, 3, 4, 5, 6]

17

votes

matplotlib.cbook.flatten() will work for nested lists even if they nest more deeply than the example.

import matplotlib
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
print(list(matplotlib.cbook.flatten(l)))
l2 = [[1, 2, 3], [4, 5, 6], [7], [8, [9, 10, [11, 12, [13]]]]]
print list(matplotlib.cbook.flatten(l2))

Result:

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

This is 18x faster than underscore._.flatten:

Average time over 1000 trials of matplotlib.cbook.flatten: 2.55e-05 sec
Average time over 1000 trials of underscore._.flatten: 4.63e-04 sec
(time for underscore._)/(time for matplotlib.cbook) = 18.1233394636

16

votes

Following seem simplest to me:

>>> import numpy as np
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> print (np.concatenate(l))
[1 2 3 4 5 6 7 8 9]

14

votes

One can also use NumPy's flat:

import numpy as np
list(np.array(l).flat)

Edit 11/02/2016: Only works when sublists have identical dimensions.

10

votes

I personally find it hard to remember all the modules that needed to be imported. Thus I tend to use a simple method, even though I don't know how its performance is compared to other answers.

If you just want to flatten nested lists, then the following will do the job:

def flatten(lst):
    for item in lst:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

# test case:
a =[0, [], "fun", [1, 2, 3], [4, [5], 6], 3, [7], [8, 9]]
list(flatten(a))
# output 
# [0, 'fun', 1, 2, 3, 4, 5, 6, 3, 7, 8, 9]

However, if you want to flatten a list of iterables (list and/or tuples), it can also do the job with a slight modification:

from collections.abc import Iterable
def flatten(lst):
    for item in lst:
        if isinstance(item,Iterable) and not isinstance(item,str):
            yield from flatten(item)
        else:
            yield item

# test case:
a =[0, [], "fun", (1, 2, 3), [4, [5], (6)], 3, [7], [8, 9]]
list(flatten(a))
# output: 
# [0, 'fun', 1, 2, 3, 4, 5, 6, 3, 7, 8, 9]

8

votes

you can use list extend method, it shows to be the fastest:

flat_list = []
for sublist in l:
    flat_list.extend(sublist)

performance:

import functools
import itertools
import numpy
import operator
import perfplot



def functools_reduce_iconcat(a):
    return functools.reduce(operator.iconcat, a, [])


def itertools_chain(a):
    return list(itertools.chain.from_iterable(a))


def numpy_flat(a):
    return list(numpy.array(a).flat)


def extend(a):
    n = []

    list(map(n.extend, a))

    return n 


perfplot.show(
    setup=lambda n: [list(range(10))] * n,
    kernels=[
        functools_reduce_iconcat, extend,itertools_chain, numpy_flat
        ],
    n_range=[2**k for k in range(16)],
    xlabel='num lists',
    )

output:

8

votes

This is a play on the original poster's code. (He wasn't far off)

f = []
list(map(f.extend, l))

7

votes

from nltk import flatten

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
flatten(l)

The advantage of this solution over most others here is that if you have a list like:

l = [1, [2, 3], [4, 5, 6], [7], [8, 9]]

while most other solutions throw an error this solution handles them.

6

votes

Note: Below applies to Python 3.3+ because it uses yield_from. six is also a third-party package, though it is stable. Alternately, you could use sys.version.

In the case of obj = [[1, 2,], [3, 4], [5, 6]], all of the solutions here are good, including list comprehension and itertools.chain.from_iterable.

However, consider this slightly more complex case:

>>> obj = [[1, 2, 3], [4, 5], 6, 'abc', [7], [8, [9, 10]]]

There are several problems here:

One element, 6, is just a scalar; it's not iterable, so the above routes will fail here.
One element, 'abc', is technically iterable (all strs are). However, reading between the lines a bit, you don't want to treat it as such--you want to treat it as a single element.
The final element, [8, [9, 10]] is itself a nested iterable. Basic list comprehension and chain.from_iterable only extract "1 level down."

You can remedy this as follows:

>>> from collections import Iterable
>>> from six import string_types

>>> def flatten(obj):
...     for i in obj:
...         if isinstance(i, Iterable) and not isinstance(i, string_types):
...             yield from flatten(i)
...         else:
...             yield i


>>> list(flatten(obj))
[1, 2, 3, 4, 5, 6, 'abc', 7, 8, 9, 10]

Here, you check that the sub-element (1) is iterable with Iterable, an ABC from itertools, but also want to ensure that (2) the element is not "string-like."

6

votes

You can use numpy :
flat_list = list(np.concatenate(list_of_list))

5

votes

If you are willing to give up a tiny amount of speed for a cleaner look, then you could use numpy.concatenate().tolist() or numpy.concatenate().ravel().tolist():

import numpy

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

%timeit numpy.concatenate(l).ravel().tolist()
1000 loops, best of 3: 313 µs per loop

%timeit numpy.concatenate(l).tolist()
1000 loops, best of 3: 312 µs per loop

%timeit [item for sublist in l for item in sublist]
1000 loops, best of 3: 31.5 µs per loop

You can find out more here in the docs numpy.concatenate and numpy.ravel

5

votes

Fastest solution I have found (for large list anyway):

import numpy as np
#turn list into an array and flatten()
np.array(l).flatten()

Done! You can of course turn it back into a list by executing list(l)

5

votes

Simple code for underscore.py package fan

from underscore import _
_.flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

It solves all flatten problems (none list item or complex nesting)

from underscore import _
# 1 is none list item
# [2, [3]] is complex nesting
_.flatten([1, [2, [3]], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

You can install underscore.py with pip

pip install underscore.py

5

votes

def flatten(alist):
    if alist == []:
        return []
    elif type(alist) is not list:
        return [alist]
    else:
        return flatten(alist[0]) + flatten(alist[1:])

python - How to make a flat list out of a list of lists?

30 Answers

Don't reinvent the wheel

If you use -

Accepted answer that did not work:

New proposed solution that did work for me: