Aside from local/global variable store times, opcode prediction makes the function faster.
As the other answers explain, the function uses the STORE_FAST opcode in the loop. Here's the bytecode for the function's loop:
>> 13 FOR_ITER 6 (to 22) # get next value from iterator
16 STORE_FAST 0 (x) # set local variable
19 JUMP_ABSOLUTE 13 # back to FOR_ITER
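To check which store opcode your own interpreter emits, the standard dis module can list the instructions for both variants of the loop. This is a minimal sketch: the names loop_in_function, module_code and opnames are made up for illustration, and the exact opcode names and offsets vary between CPython versions (the listing above comes from an older release).

```python
import dis

def loop_in_function():
    for x in range(10):   # x is a local variable here
        pass

# The same loop compiled as module-level code, where x is a global name.
module_code = compile("for x in range(10):\n    pass\n", "<module-level demo>", "exec")

def opnames(code):
    """Return the set of opcode names used by a code object."""
    return {ins.opname for ins in dis.get_instructions(code)}

function_ops = opnames(loop_in_function.__code__)
module_ops = opnames(module_code)

# Show only the store opcodes, which is where the two versions differ.
print("function:", sorted(n for n in function_ops if n.startswith("STORE_")))
print("module:  ", sorted(n for n in module_ops if n.startswith("STORE_")))
```

Calling dis.dis(loop_in_function) prints the full listing if you want to compare it with the one above.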
Normally when a program is run, Python executes each opcode one after the other, keeping track of the stack and performing other checks on the stack frame after each opcode is executed. Opcode prediction means that in certain cases Python is able to jump directly to the next opcode, thus avoiding some of this overhead.
In this case, every time Python sees FOR_ITER (the top of the loop), it will "predict" that STORE_FAST is the next opcode it has to execute. Python then peeks at the next opcode and, if the prediction was correct, it jumps straight to STORE_FAST. This has the effect of squeezing the two opcodes into a single opcode.
On the other hand, the STORE_NAME opcode is used in the loop at the global level. Python does *not* make similar predictions when it sees this opcode. Instead, it must go back to the top of the evaluation loop, which adds overhead to every iteration of the loop.
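A rough way to observe the difference is to time the same loop once inside a function and once as module-level code. The sketch below is only an approximation: the names N, loop_in_function, module_code and loop_at_module_level are hypothetical, the measurement includes the cost of the global store itself and not just prediction, and the size of the gap depends on the CPython version and machine.

```python
import timeit

N = 200_000

def loop_in_function(n):
    for x in range(n):        # x is stored as a local variable
        pass

# The same loop compiled as module-level code: x is stored by name in a dict.
module_code = compile("for x in range(n):\n    pass\n", "<module-level demo>", "exec")

def loop_at_module_level(n):
    exec(module_code, {"n": n})   # fresh global namespace for each run

# Take the best of a few repeats to reduce noise.
t_function = min(timeit.repeat(lambda: loop_in_function(N), number=5, repeat=3))
t_module = min(timeit.repeat(lambda: loop_at_module_level(N), number=5, repeat=3))

print("in function:     %.4f s" % t_function)
print("at module level: %.4f s" % t_module)
```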
To give some more technical detail about this optimization, here's a quote from the ceval.c file (the "engine" of Python's virtual machine):
Some opcodes tend to come in pairs thus making it possible to
predict the second code when the first is run. For example,
GET_ITER is often followed by FOR_ITER. And FOR_ITER is often
followed by STORE_FAST or UNPACK_SEQUENCE.
Verifying the prediction costs a single high-speed test of a register
variable against a constant. If the pairing was good, then the
processor's own internal branch predication has a high likelihood of
success, resulting in a nearly zero-overhead transition to the
next opcode. A successful prediction saves a trip through the eval-loop
including its two unpredictable branches, the HAS_ARG test and the
switch-case. Combined with the processor's internal branch prediction,
a successful PREDICT has the effect of making the two opcodes run as if
they were a single new opcode with the bodies combined.
We can see in the source code for the FOR_ITER opcode exactly where the prediction for STORE_FAST is made:
case FOR_ITER:                          // the FOR_ITER opcode case
    v = TOP();
    x = (*v->ob_type->tp_iternext)(v);  // x is the next value from iterator
    if (x != NULL) {
        PUSH(x);                        // put x on top of the stack
        PREDICT(STORE_FAST);            // predict STORE_FAST will follow - success!
        PREDICT(UNPACK_SEQUENCE);       // this and everything below is skipped
        continue;
    }
    // error-checking and more code for when the iterator ends normally
The PREDICT macro expands to if (*next_instr == op) goto PRED_##op, i.e. we just jump to the start of the predicted opcode. In this case, we jump here:
PREDICTED_WITH_ARG(STORE_FAST);
case STORE_FAST:
    v = POP();                          // pop x back off the stack
    SETLOCAL(oparg, v);                 // set it as the new local variable
    goto fast_next_opcode;
The local variable is now set and the next opcode is up for execution. Python continues through the iterable until it reaches the end, making the successful prediction each time.
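The effect of the prediction can be modelled with a toy interpreter written in Python. This is only a sketch of the idea, not CPython's implementation: the run function, the PROGRAM tuple format and the dispatches counter are invented for illustration. It counts how many times execution passes through the top of the dispatch loop, with and without a PREDICT-style shortcut from FOR_ITER to STORE_FAST.

```python
def run(program, iterable, predict):
    """Toy eval loop: return (local variables, trips through the dispatch loop)."""
    it = iter(iterable)
    stack, local_vars = [], {}
    pc = 0
    dispatches = 0
    while pc < len(program):
        dispatches += 1                    # one trip through the top of the eval loop
        op, arg = program[pc]
        pc += 1
        if op == "FOR_ITER":
            try:
                stack.append(next(it))     # push the next value from the iterator
            except StopIteration:
                pc = arg                   # iterator exhausted: jump past the loop
                continue
            # Model of PREDICT(STORE_FAST): peek at the next opcode and,
            # if it matches, run its body here without another dispatch.
            if predict and pc < len(program) and program[pc][0] == "STORE_FAST":
                local_vars[program[pc][1]] = stack.pop()
                pc += 1
        elif op == "STORE_FAST":
            local_vars[arg] = stack.pop()  # pop the value into a local variable
        elif op == "JUMP_ABSOLUTE":
            pc = arg                       # back to FOR_ITER

    return local_vars, dispatches

# Same shape as the loop bytecode: FOR_ITER, STORE_FAST x, JUMP_ABSOLUTE.
PROGRAM = [("FOR_ITER", 3), ("STORE_FAST", "x"), ("JUMP_ABSOLUTE", 0)]

print(run(PROGRAM, range(5), predict=False))
print(run(PROGRAM, range(5), predict=True))
```

For five items, the toy loop makes 16 dispatch trips without prediction and 11 with it, since each iteration handles FOR_ITER and STORE_FAST in a single trip. In real CPython the saving is in branch and bookkeeping overhead rather than a literal counter, so treat the numbers as illustrative only.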
The Python wiki page has more information about how CPython's virtual machine works.