Iterators are objects that implement the iter and next methods. If those methods are defined, we can use for loop or comprehensions.
class Squares:
def __init__(self, length):
self.length = length
self.i = 0
def __iter__(self):
print('calling __iter__') # this will be called first and only once
return self
def __next__(self):
print('calling __next__') # this will be called for each iteration
if self.i >= self.length:
raise StopIteration
else:
result = self.i ** 2
self.i += 1
return result
Iterators get exhausted. It means after you iterate over items, you cannot reiterate, you have to create a new object. Let's say you have a class, which holds the cities properties and you want to iterate over.
class Cities:
def __init__(self):
self._cities = ['Brooklyn', 'Manhattan', 'Prag', 'Madrid', 'London']
self._index = 0
def __iter__(self):
return self
def __next__(self):
if self._index >= len(self._cities):
raise StopIteration
else:
item = self._cities[self._index]
self._index += 1
return item
Instance of class Cities is an iterator. However if you want to reiterate over cities, you have to create a new object which is an expensive operation. You can separate the class into 2 classes: one returns cities and second returns an iterator which gets the cities as init param.
class Cities:
def __init__(self):
self._cities = ['New York', 'Newark', 'Istanbul', 'London']
def __len__(self):
return len(self._cities)
class CityIterator:
def __init__(self, city_obj):
# cities is an instance of Cities
self._city_obj = city_obj
self._index = 0
def __iter__(self):
return self
def __next__(self):
if self._index >= len(self._city_obj):
raise StopIteration
else:
item = self._city_obj._cities[self._index]
self._index += 1
return item
Now if we need to create a new iterator, we do not have to create the data again, which is cities. We creates cities object and pass it to the iterator. But we are still doing extra work. We could implement this by creating only one class.
Iterable is a Python object that implements the iterable protocol. It requires only __iter__()
that returns a new instance of iterator object.
class Cities:
def __init__(self):
self._cities = ['New York', 'Newark', 'Istanbul', 'Paris']
def __len__(self):
return len(self._cities)
def __iter__(self):
return self.CityIterator(self)
class CityIterator:
def __init__(self, city_obj):
self._city_obj = city_obj
self._index = 0
def __iter__(self):
return self
def __next__(self):
if self._index >= len(self._city_obj):
raise StopIteration
else:
item = self._city_obj._cities[self._index]
self._index += 1
return item
Iterators has __iter__
and __next__
, iterables have __iter__
, so we can say Iterators are also iterables but they are iterables that get exhausted. Iterables on the other hand never become exhausted
because they always return a new iterator that is then used to iterate
You notice that the main part of the iterable code is in the iterator, and the iterable itself is nothing more than an extra layer that allows us to create and access the iterator.
Iterating over an iterable
Python has a built function iter() which calls the __iter__()
. When we iterate over an iterable, Python calls the iter() which returns an iterator, then it starts using __next__()
of iterator to iterate over the data.
NOte that in the above example, Cities creates an iterable but it is not a sequence type, it means we cannot get a city by an index. To fix this we should just add __get_item__
to the Cities class.
class Cities:
def __init__(self):
self._cities = ['New York', 'Newark', 'Budapest', 'Newcastle']
def __len__(self):
return len(self._cities)
def __getitem__(self, s): # now a sequence type
return self._cities[s]
def __iter__(self):
return self.CityIterator(self)
class CityIterator:
def __init__(self, city_obj):
self._city_obj = city_obj
self._index = 0
def __iter__(self):
return self
def __next__(self):
if self._index >= len(self._city_obj):
raise StopIteration
else:
item = self._city_obj._cities[self._index]
self._index += 1
return item