python - PyPy -- How can it possibly beat CPython?

Question

PyPy is a reimplementation of Python in Python, using advanced techniques to try to attain better performance than CPython. Many years of hard work have finally paid off. Our speed results often beat CPython, ranging from being slightly slower, to speedups of up to 2x on real application code, to speedups of up to 10x on small benchmarks.

How is this possible? Which Python implementation was used to implement PyPy? CPython? And what are the chances of a PyPyPy or PyPyPyPy beating their score?

(On a related note... why would anyone try something like this?)

Nitpick: PyPy is PyPyPy. Think of the Py-* prefix as a projection operator. — u0b34a0f6ae
Ok. so PyPy should be preferred than to CPython? does it have any drawbacks? — balki
PyPy is excellent at runtime optimization, but its different innards make it incompatible with several popular C extensions. — Cees Timmerman
Almost everyone is missing the question, as to how a speed gain is THEORETICALLY possible. But think about it: Python can do anything, just like a Turing machine. It can call gcc, after all. So you can also write some python code that runs on CPython, that interprets some other python code, translates it to C, and executes gcc, and then executes the compiled program. And it could be faster, if the code is called often enough. — Sergey Orshanskiy

Noufal Ibrahim Noufal Ibrahim · Accepted Answer · 2010-04-07T11:48:51

Q1. How is this possible?

Manual memory management (which is what CPython does with its counting) can be slower than automatic management in some cases.

Limitations in the implementation of the CPython interpreter preclude certain optimisations that PyPy can do (eg. fine grained locks).

As Marcelo mentioned, the JIT. Being able to on the fly confirm the type of an object can save you the need to do multiple pointer dereferences to finally arrive at the method you want to call.

Q2. Which Python implementation was used to implement PyPy?

The PyPy interpreter is implemented in RPython which is a statically typed subset of Python (the language and not the CPython interpreter). - Refer https://pypy.readthedocs.org/en/latest/architecture.html for details.

Q3. And what are the chances of a PyPyPy or PyPyPyPy beating their score?

That would depend on the implementation of these hypothetical interpreters. If one of them for example took the source, did some kind of analysis on it and converted it directly into tight target specific assembly code after running for a while, I imagine it would be quite faster than CPython.

Update: Recently, on a carefully crafted example, PyPy outperformed a similar C program compiled with gcc -O3. It's a contrived case but does exhibit some ideas.

Q4. Why would anyone try something like this?

From the official site. https://pypy.readthedocs.org/en/latest/architecture.html#mission-statement

We aim to provide:

a common translation and support framework for producing
implementations of dynamic languages, emphasizing a clean
separation between language specification and implementation
aspects. We call this the RPython toolchain_.

a compliant, flexible and fast implementation of the Python_ Language which uses the above toolchain to enable new advanced high-level features without having to encode the low-level details.

By separating concerns in this way, our implementation of Python - and other dynamic languages - is able to automatically generate a Just-in-Time compiler for any dynamic language. It also allows a mix-and-match approach to implementation decisions, including many that have historically been outside of a user's control, such as target platform, memory and threading models, garbage collection strategies, and optimizations applied, including whether or not to have a JIT in the first place.

The C compiler gcc is implemented in C, The Haskell compiler GHC is written in Haskell. Do you have any reason for the Python interpreter/compiler to not be written in Python?

python - PyPy -- How can it possibly beat CPython?

4 Answers