Use itertools.groupby
:
>>> import itertools
>>> import operator
>>> foo = [("a", 1), ("a", 2), ("b", 1), ("c", 1), ("c", 2)]
>>> for group in itertools.groupby(foo, operator.itemgetter(0)):
... print group[0], list(map(operator.itemgetter(1), group[1]))
...
a [1, 2]
b [1]
c [1, 2]
Explanation:
groupby
, as the name suggest, groups elements of an iterable into chunks based on some key function. That is, it calls keyfunc
on the first element of the iterable, then pulls elements one by one from the iterable until the value of keyfunc
changes, at which point it yields all of the elements it has got so far and starts again from the new key. It is also sensible and does not consume more memory than necessary; once values have been yielded they are no longer held by groupby
.
Here, we group the elements of the input by operator.itemgetter(0)
, which is a useful "toolbox" function which maps x
to x[0]
. In other words, we group by the first element of the tuple, which is a key.
Naturally, you will need to write a custom generator to handle reading the input (from sys.stdin
, probably) and yield them one by one. Fortunately, this is pretty easy, using the yield
keyword.
Note also that this assumes that the keys are sorted. Naturally, if they are not sorted there is nothing you can do: you would need to look until the end of the input to ensure that you have all values for a given key.