17
votes

For example, how much memory is required to store a list of one million (32-bit) integers?

alist = range(1000000) # or list(range(1000000)) in Python 3.0
5
Test it? Graphical example I made: stackoverflow.com/a/30008338/2087463 – tmthydvnprt
@tmthydvnprt: your answer relies solely on a single sys.getsizeof() call that is incorrect. Compare the results with asizeof.asizeof() from my answer (e.g., on my system: sys.getsizeof(range(10**6)) -> 8000072 while asizeof.asizeof(range(10**6)) -> 32000072). – jfs
why is a single sys.getsizeof() incorrect? – tmthydvnprt

5 Answers

24
votes

"It depends." Python allocates space for lists in such a way as to achieve amortized constant time for appending elements to the list.

In practice, what this means with the current implementation is... the list always has space allocated for a power-of-two number of elements. So range(1000000) will actually allocate a list big enough to hold 2^20 elements (≈ 1.05 million).

This is only the space required to store the list structure itself (which is an array of pointers to the Python objects for each element). A 32-bit system requires 4 bytes per element; a 64-bit system uses 8 bytes per element.
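A quick way to check the per-element pointer cost on your own interpreter (Python 3 syntax; the exact figures vary by build):

import sys

empty = sys.getsizeof([])
million = sys.getsizeof(list(range(1000000)))
# The difference is (roughly) one pointer per element:
# about 8.0 on a 64-bit build, about 4.0 on a 32-bit build.
print((million - empty) / 1000000)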

Furthermore, you need space to store the actual elements. This varies widely. For small integers (-5 to 256 currently), no additional space is needed, but for larger numbers Python allocates a new object for each integer, which takes 10-100 bytes and tends to fragment memory.
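A small illustration of the caching behaviour on CPython (the cached range and the exact object sizes are implementation details):

import sys

a = 3 + 4              # 7 falls inside the cached small-int range
b = 7
print(a is b)          # True on CPython: both names refer to the shared object

big1 = int("1000000")  # built at runtime to avoid constant folding
big2 = int("1000000")
print(big1 is big2)    # False: each call allocated a distinct int object
print(sys.getsizeof(big1))  # each such object costs a few dozen bytes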

Bottom line: it's complicated and Python lists are not a good way to store large homogeneous data structures. For that, use the array module or, if you need to do vectorized math, use NumPy.
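A rough side-by-side of the three options (a sketch assuming a 64-bit build with NumPy available; the list figure excludes the integer objects themselves):

import array
import sys
import numpy as np

n = 1000000
as_list = list(range(n))                  # ~8 MB of pointers, plus the int objects
as_array = array.array('i', range(n))     # ~4 MB of packed 32-bit C ints
as_numpy = np.arange(n, dtype=np.int32)   # ~4 MB, and supports vectorized math

print(sys.getsizeof(as_list))
print(sys.getsizeof(as_array))
print(as_numpy.nbytes)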

PS- Tuples, unlike lists, are not designed to have elements progressively appended to them. I don't know how the allocator works, but don't even think about using it for large data structures :-)

15
votes

Useful links:

How to get memory size/usage of python object

Memory sizes of python objects?

if you put data into dictionary, how do we calculate the data size?

However, they don't give a definitive answer. The way to go:

  1. Measure memory consumed by the Python interpreter with/without the list (use OS tools); see the sketch after this list.

  2. Use a third-party extension module which defines some sort of sizeof(PyObject).
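A minimal sketch of option 1, assuming a Linux-like system where ru_maxrss is reported in kilobytes (it is in bytes on macOS, and it tracks the peak, so the delta is only approximate):

import resource

def peak_rss_kb():
    # Peak resident set size of this process so far
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss_kb()
alist = list(range(1000000))
after = peak_rss_kb()
print("approx. kB taken by the list and its elements:", after - before)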

Update:

Recipe 546530: Size of Python objects (revised)

import asizeof

N = 1000000
print asizeof.asizeof(range(N)) / N
# -> 20 (python 2.5, WinXP, 32-bit Linux)
# -> 33 (64-bit Linux)
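The recipe later became part of the Pympler package; assuming Pympler is installed, roughly the same measurement in Python 3 looks like this:

from pympler import asizeof

N = 1000000
print(asizeof.asizeof(list(range(N))) // N)  # deep bytes per element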
6
votes

Addressing the "tuple" part of the question

The declaration of CPython's PyTuple in a typical build configuration boils down to this:

struct PyTuple {
  size_t refcount; // tuple's reference count
  typeobject *type; // tuple type object
  size_t n_items; // number of items in tuple
  PyObject *items[1]; // contains space for n_items elements
};

The size of a PyTuple instance is fixed during its construction and cannot be changed afterwards. The number of bytes occupied by a PyTuple can be calculated as

sizeof(size_t) x 2 + sizeof(void*) x (n_items + 1).

This gives the shallow size of the tuple. To get the full size, you also need to add the total number of bytes consumed by the object graph rooted in the PyTuple::items[] array.

It's worth noting that the tuple construction routines make sure that only a single instance of the empty tuple is ever created (it is a singleton).

References: Python.h, object.h, tupleobject.h, tupleobject.c
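A quick way to see the fixed-header-plus-pointers pattern on your own build (shallow sizes only; the exact constants are implementation details):

import sys

for n in (0, 1, 2, 10, 100):
    # The shallow size grows by sizeof(void *) per extra item
    print(n, sys.getsizeof(tuple(range(n))))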

4
votes

A new function, getsizeof(), takes a Python object and returns the amount of memory used by the object, measured in bytes. Built-in objects return correct results; third-party extensions may not, but can define a __sizeof__() method to return the object’s size.

kveretennicov@nosignal:~/py/r26rc2$ ./python
Python 2.6rc2 (r26rc2:66712, Sep  2 2008, 13:11:55) 
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
>>> import sys
>>> sys.getsizeof(range(1000000))
4000032
>>> sys.getsizeof(tuple(range(1000000)))
4000024

Obviously the returned numbers don't include the memory consumed by the contained objects (sys.getsizeof(1) == 12).
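To account for the contained objects you can (roughly) add them up yourself; note that shared objects such as the cached small ints then get counted once per reference (Python 3 syntax):

import sys

lst = list(range(1000000))
shallow = sys.getsizeof(lst)
deep = shallow + sum(sys.getsizeof(item) for item in lst)
print(shallow, deep)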

2
votes

This is implementation-specific, I'm pretty sure. Certainly it depends on the internal representation of integers - you can't assume they'll be stored as 32-bit, since Python gives you arbitrarily large integers, so perhaps small ints are stored more compactly.

On my Python (2.5.1 on Fedora 9 on core 2 duo) the VmSize before allocation is 6896kB, after is 22684kB. After assigning one more million-element list, VmSize goes to 38340kB. This very grossly indicates around 16000kB for 1000000 integers, which is around 16 bytes per integer. That suggests a lot of overhead for the list. I'd take these numbers with a large pinch of salt.
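A rough way to reproduce this measurement (a sketch; /proc/self/status is Linux-specific and VmSize counts everything the process has mapped):

def vmsize_kb():
    # Read the "VmSize:" line from /proc/self/status (value is in kB)
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmSize:"):
                return int(line.split()[1])

before = vmsize_kb()
data = list(range(1000000))
after = vmsize_kb()
print("VmSize grew by about", after - before, "kB")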