0
votes

This is proving to be a very difficult question for me to figure out how to properly ask.

For example, the Python interpreter is written in C. Say you wrote another interpreter in Python, which got run through CPython, and you then intend to run a program through your interpreter. Does that code have to go through your interpreter and then through CPython? Or would the new interpreter be self-contained and not require CPython to interpret its output? Because if it is not a standalone interpreter, then there would be a performance loss, in which case it would seem like all compilers would be written in low-level languages until the end of time.

EDIT: PyPy was the very thing that got me wondering about this topic. I was under the impression that it was an interpreter, written in Python, and I did not understand how it could be faster than CPython. I also do not understand how bytecode gets executed by the machine without being translated to machine code, though I suppose that is another topic.

1
Look at PyPy. It's a Python interpreter written in Python that runs under… well, usually under PyPy itself. By using a JIT custom-designed for Python-like languages, it's usually significantly faster than the usual CPython interpreter, which is written in C but does virtually no runtime optimization, or Jython and IronPython, which are written in Java and C# but rely on generic JVM and .NET JITs. And that's without even getting into non-interpreting compilers, where the hosting language isn't relevant. – abarnert
@abarnert, PyPy is written in RPython -- a restricted subset of the Python language which can be compiled to C. Thus, it is not itself interpreted, and thus quite emphatically does not provide an existence proof of an interpreted interpreter which runs faster than its parent platform. – Charles Duffy
Answering the question specifically about PyPy, then -- PyPy can be faster than CPython because it doesn't run on CPython at runtime, but is compiled to C and then to native code. Modern JIT runtimes are indeed a bigger discussion. – Charles Duffy
The right answer to the question the OP really wants to ask is "modern JIT runtimes". Or, more generally, (static and dynamic) optimization. CPython does a small amount of static optimization and a few tiny bits of special-cased dynamic optimization; PyPy (usually) automatically finds the hotspots and compiles them to optimized machine code on the fly; and that's why PyPy is (often) faster. (But I'll bet there's already a dup of that question here which covers this a lot better than a series of comments trying to puzzle out the real question from this one can…) – abarnert
@abarnert and OP, here's one of the more interesting discussions of Python and its various intermediate representations that I've found on SO: stackoverflow.com/a/2998544/20789 – Dan Lenski

1 Answer

5
votes

You seem to be confused about the distinction between compilers and interpreters, since you refer to both in your question without a clear distinction. (Quite understandable... see all the comments flying around this thread :-))

Compilers and interpreters are somewhat, though not totally, orthogonal concepts:

Compilers

Compilers take source code and produce a form that can be executed more efficiently, whether that be native machine code, or an intermediate form like CPython's bytecode.

C is perhaps the canonical example of a language that is almost always compiled to native machine code. The language was indeed designed to be relatively easy and efficient to translate into machine code. RISC CPU architectures became popular after the C language was already well-adopted for operating system programming, and they were often designed to make it even more efficient to translate certain features of C to machine code.

So the "compilability" of C has become self-reinforcing: it is difficult to introduce a new architecture (e.g. Itanium) if it is hard to write a good C compiler for it that fully takes advantage of the hardware's potential. If your CPU can't run C code efficiently, it can't run most operating systems efficiently (the low-level parts of Linux, Unix, and Windows are mainly written in C).

Interpreters

Interpreters are traditionally defined as programs that try to run source code directly from its source representation. Most implementations of BASIC worked like this, back in the good ol' days: BASIC would literally re-parse each line of code on each iteration through a loop.
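To make the re-parsing overhead concrete, here is a minimal sketch of a line-by-line interpreter for a made-up mini-language (the `LET`/`ADD`/`GOTO_IF_LT` statements are hypothetical, not real BASIC). Note that each statement is tokenized again on every execution, even when a loop revisits the same line:

```python
# Toy line-by-line interpreter: statements are re-parsed on EVERY
# execution, which is the overhead traditional interpreters pay.

def run(program):
    variables = {}
    pc = 0  # program counter: index into the list of source lines
    while pc < len(program):
        line = program[pc]
        tokens = line.split()            # re-parsed every single time
        if tokens[0] == "LET":           # e.g. "LET x 0"
            variables[tokens[1]] = int(tokens[2])
        elif tokens[0] == "ADD":         # e.g. "ADD x 1"
            variables[tokens[1]] += int(tokens[2])
        elif tokens[0] == "GOTO_IF_LT":  # e.g. "GOTO_IF_LT x 5 1"
            if variables[tokens[1]] < int(tokens[2]):
                pc = int(tokens[3])      # jump back: line 1 re-parsed again
                continue
        pc += 1
    return variables

result = run([
    "LET x 0",
    "ADD x 1",
    "GOTO_IF_LT x 5 1",  # loop back to line 1 while x < 5
])
print(result)  # {'x': 5}
```

The body of the loop here runs five times, and the `ADD x 1` line is split into tokens five times; a compiler would have parsed it once.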

Modern languages

Modern programming languages and platforms blur the lines a lot. Languages like Java, C#, or Python are typically not compiled to native machine code, but to various intermediate forms like bytecode.

CPython's bytecode can be interpreted, but the overhead of interpretation is much lower because the code is fully parsed beforehand (and saved in a .pyc file so it doesn't need to be re-parsed until it is modified).
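You can watch this happen from within CPython itself: `compile()` parses source into a code object once, `exec()` runs that pre-parsed bytecode, and the standard-library `dis` module shows the instructions the interpreter loop actually executes.

```python
# Parse Python source ONCE into a code object, then execute the
# resulting bytecode without any further parsing.
import dis

source = "total = sum(i * i for i in range(10))"
code = compile(source, "<example>", "exec")  # parse/compile happens here

namespace = {}
exec(code, namespace)      # runs the already-compiled bytecode
print(namespace["total"])  # 285

dis.dis(code)  # dump the bytecode instructions CPython interprets
```

Calling `exec(code, ...)` again would reuse the same code object, which is essentially what CPython does when it loads a cached .pyc file.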

Just-in-time compilation can be used to translate bytecode to native machine code just before it is actually run, with many different strategies for exactly when the native code compilation should take place.
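The core JIT idea, stripped of all the sophistication of a real runtime like PyPy's, is just "count executions; once a piece of code is hot, compile it once and reuse the compiled form." Here is a toy sketch of that strategy (the `TinyJit` class and the threshold of 3 are inventions for illustration, not any real JIT's design):

```python
# Toy hotspot-counting "JIT" sketch: interpret an expression from its
# source string until it becomes hot, then compile once and reuse.

HOT_THRESHOLD = 3  # arbitrary cutoff for this sketch

class TinyJit:
    def __init__(self, source):
        self.source = source
        self.count = 0
        self.compiled = None

    def run(self, env):
        self.count += 1
        if self.compiled is None and self.count >= HOT_THRESHOLD:
            # Hot spot detected: pay the compilation cost exactly once.
            self.compiled = compile(self.source, "<jit>", "eval")
        if self.compiled is not None:
            return eval(self.compiled, {}, env)  # fast path, no parsing
        return eval(self.source, {}, env)        # cold path, re-parses

expr = TinyJit("x * x + 1")
print([expr.run({"x": n}) for n in range(5)])  # [1, 2, 5, 10, 17]
```

A real JIT compiles to native machine code rather than bytecode and chooses what to compile far more cleverly, but the cost/benefit trade-off is the same shape.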

Some languages that have "traditionally" been run via a bytecode interpreter or JIT compiler are also amenable to ahead-of-time compilation. For example, the Dalvik VM used in earlier versions of Android relied on just-in-time compilation, while Android 4.4 introduced ART, which uses ahead-of-time compilation instead.

Intermediate representations of Python

Here's a great thread containing a really useful and thoughtful answer by @AlexMartelli on the lower-level compiled forms generated by various implementations of Python.

Answering the original question (I think...)

A traditional interpreter will almost certainly execute code slower than if that same code were compiled to "bare metal" machine code, all else being equal (which it typically is not), because the interpreter imposes an additional cost of parsing every line or unit of code every time it is executed.
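You can get a rough feel for that parsing cost in CPython itself: `eval()` of a source string re-parses the string on every call, while `eval()` of a pre-compiled code object skips straight to executing bytecode. (Exact numbers depend on the machine; this only illustrates the direction of the difference.)

```python
# Compare re-parsing on every execution vs. executing a code object
# that was parsed once up front.
import timeit

source = "2 * 3 + 4"
code = compile(source, "<t>", "eval")  # parsed exactly once

parse_every_time = timeit.timeit(lambda: eval(source), number=100_000)
precompiled = timeit.timeit(lambda: eval(code), number=100_000)

print(eval(code))                       # 10
print(parse_every_time > precompiled)   # typically True, often by a wide margin
```

Both calls compute the same value; the string version simply pays the parser on every one of the 100,000 iterations.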

So if a traditional interpreter were running under an interpreter, which was itself running under an interpreter, etc., ... that would result in a performance loss, just as running a VM (virtual machine) under a VM under a VM will be slower than running on "bare metal."

This is not so for a compiler. I could write a compiler which runs under an interpreter which runs under an interpreter which has been compiled by a compiler, etc... the resulting compiler could generate native machine code that is just as good as the native code generated by any other compiler. (It is important to realize that the performance of the compiler itself can be entirely independent of the performance of the executed code; an aggressive optimizing C compiler typically takes much more time to compile code than a non-optimizing compiler, but the intention is for the resultant native code to run significantly faster.)