In the last episode, I showed a rather important limitation of the tiny-interp
interpreter:
def cond():
    x = 3
    if x < 5:
        return "yes"
    else:
        return "no"
Control flow and function calls were not handled; as a result, tiny-interp could
not interpret the above code fragment.
In the following, I’ll ditch tiny-interp and switch to the “real” pygo interpreter.
Real Python bytecode
Readers of the AOSA article know that the instruction set of the tiny-interp
interpreter is in fact very similar to that of real Python bytecode.
Indeed, one can define the above cond() function at a python3 prompt and enter:
### bytecode as raw bytes
>>> print(cond.__code__.co_code)
b'd\x01\x00}\x00\x00|\x00\x00d\x02\x00k\x00\x00r\x16\x00d\x03\x00Sd\x04\x00Sd\x00\x00S'
### bytecode as numbers
>>> print(list(cond.__code__.co_code))
[100, 1, 0, 125, 0, 0, 124, 0, 0, 100, 2, 0, 107,
0, 0, 114, 22, 0, 100, 3, 0, 83, 100, 4, 0, 83,
100, 0, 0, 83]
This doesn’t look very human-friendly.
Luckily, there is the dis module, which can ingest low-level bytecode
and print it in a more human-readable way:
>>> import dis
>>> dis.dis(cond)
  2           0 LOAD_CONST               1 (3)
              3 STORE_FAST               0 (x)

  3           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               2 (5)
             12 COMPARE_OP               0 (<)
             15 POP_JUMP_IF_FALSE       22

  4          18 LOAD_CONST               3 ('yes')
             21 RETURN_VALUE

  6     >>   22 LOAD_CONST               4 ('no')
             25 RETURN_VALUE
             26 LOAD_CONST               0 (None)
             29 RETURN_VALUE
Have a look at the official dis module documentation for more information.
In a nutshell, LOAD_CONST is the same as our toy OpLoadValue and LOAD_FAST
is the same as our toy OpLoadName.
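To make the correspondence concrete, here is a small sketch (not part of pygo) showing that the argument of LOAD_CONST is an index into the code object's co_consts tuple, while the names of local variables live in co_varnames. It relies only on the standard dis module. The exact opcodes emitted vary between CPython versions, so the sketch filters on LOAD_CONST and assumes nothing about the other instructions.

```python
import dis


def cond():
    x = 3
    if x < 5:
        return "yes"
    else:
        return "no"


code = cond.__code__

# Collect (argument, resolved value) for every LOAD_CONST instruction.
# Which constants are loaded this way depends on the CPython version.
const_loads = [
    (ins.arg, ins.argval)
    for ins in dis.get_instructions(cond)
    if ins.opname == "LOAD_CONST"
]

for index, value in const_loads:
    # The raw argument is just a position in the constants table.
    print(f"LOAD_CONST {index} -> co_consts[{index}] = {code.co_consts[index]!r}")

print("constants:  ", code.co_consts)
print("local names:", code.co_varnames)
```

The toy OpLoadValue and OpLoadName instructions played the same role: their argument selected an entry in a table of values or names.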
Simply inspecting this little bytecode snippet shows how conditions and branch-y
code might be handled.
The instruction POP_JUMP_IF_FALSE implements the if x < 5 statement from the
cond() function.
If the condition is false (i.e. x is greater than or equal to 5), the interpreter
is instructed to jump to position 22 in the bytecode stream, i.e. the return "no"
body of the false branch.
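A minimal sketch of how an interpreter loop might implement such a conditional jump. The instruction set is a toy one: the run helper and the COMPARE_LT opcode are hypothetical names chosen for illustration, and jump targets are instruction indexes rather than byte offsets. This is neither CPython's nor pygo's actual code.

```python
def run(program, consts, nlocals):
    """Execute a toy program made of (opname, arg) pairs and return its result."""
    stack = []                      # data stack
    local_vars = [None] * nlocals   # slots for local variables
    pc = 0                          # index of the next instruction
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "LOAD_CONST":
            stack.append(consts[arg])
        elif op == "STORE_FAST":
            local_vars[arg] = stack.pop()
        elif op == "LOAD_FAST":
            stack.append(local_vars[arg])
        elif op == "COMPARE_LT":    # hypothetical stand-in for COMPARE_OP (<)
            right = stack.pop()
            left = stack.pop()
            stack.append(left < right)
        elif op == "POP_JUMP_IF_FALSE":
            # Pop the condition; on False, overwrite the program counter.
            if not stack.pop():
                pc = arg
        elif op == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction: {op}")


# Same shape as the disassembly of cond(), with index 8 as the jump target.
COND = [
    ("LOAD_CONST", 1),
    ("STORE_FAST", 0),
    ("LOAD_FAST", 0),
    ("LOAD_CONST", 2),
    ("COMPARE_LT", None),
    ("POP_JUMP_IF_FALSE", 8),
    ("LOAD_CONST", 3),
    ("RETURN_VALUE", None),
    ("LOAD_CONST", 4),
    ("RETURN_VALUE", None),
]
CONSTS = (None, 3, 5, "yes", "no")

print(run(COND, CONSTS, nlocals=1))
```

The only thing a jump does is assign a new value to the program counter; everything else in the loop is unchanged from a straight-line interpreter.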
Loops are handled pretty much the same way:
>>> def loop():
...     x = 1
...     while x < 5:
...         x = x + 1
...     return x
...
>>> dis.dis(loop)
  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (x)

  3           6 SETUP_LOOP              26 (to 35)
        >>    9 LOAD_FAST                0 (x)
             12 LOAD_CONST               2 (5)
             15 COMPARE_OP               0 (<)
             18 POP_JUMP_IF_FALSE       34

  4          21 LOAD_FAST                0 (x)
             24 LOAD_CONST               1 (1)
             27 BINARY_ADD
             28 STORE_FAST               0 (x)
             31 JUMP_ABSOLUTE            9
        >>   34 POP_BLOCK

  5     >>   35 LOAD_FAST                0 (x)
             38 RETURN_VALUE
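What makes this a loop is the jump that goes backwards, to an offset the interpreter has already executed. The sketch below locates such jumps with the standard dis module. The backward_jumps helper is a hypothetical name, and the opcode that closes the loop differs between CPython versions (JUMP_ABSOLUTE in the listing above, other opcodes in more recent releases), so the code only compares a jump's target with its own offset.

```python
import dis


def loop():
    x = 1
    while x < 5:
        x = x + 1
    return x


def backward_jumps(func):
    """Return (offset, opname, target) for every jump whose target precedes it."""
    jump_opcodes = set(dis.hasjabs) | set(dis.hasjrel)
    return [
        (ins.offset, ins.opname, ins.argval)
        for ins in dis.get_instructions(func)
        # For jump instructions, argval is the resolved target offset.
        if ins.opcode in jump_opcodes and ins.argval < ins.offset
    ]


for offset, opname, target in backward_jumps(loop):
    print(f"{offset:>4} {opname} -> {target}")
```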
The disassembly of loop() should be rather self-explanatory, except perhaps for
the RETURN_VALUE instruction: where does the instruction return to?
To answer this, a new concept must be introduced: the Frame.
Frames
As the AOSA article puts it:
A frame is a collection of information and context for a chunk of code.
Whenever a function is called, a new Frame is created, carrying a data stack
(the local variables we have played with so far) and a block stack (to handle
control flow such as loops and exceptions).
The RETURN_VALUE instruction tells the interpreter to pass a value between Frames,
from the callee’s data stack back to the caller’s data stack.
I’ll show the pygo implementation of a Frame in a moment.
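In the meantime, the mechanism can be sketched in a few lines of Python. The Frame class and the call and return_value helpers below are hypothetical and deliberately simplified; they are not pygo's or CPython's actual data structures.

```python
class Frame:
    """Simplified frame: just enough state to show a value crossing frames."""

    def __init__(self, name, back=None):
        self.name = name
        self.back = back        # calling frame, None for the outermost one
        self.data_stack = []    # operands and intermediate results
        self.block_stack = []   # bookkeeping for loops and exception handlers


def call(caller, name):
    """Create the callee's frame, remembering who called it."""
    return Frame(name, back=caller)


def return_value(callee):
    """Move the top of the callee's data stack onto the caller's data stack."""
    value = callee.data_stack.pop()
    caller = callee.back
    if caller is not None:
        caller.data_stack.append(value)
    return caller, value


main = Frame("main")
callee = call(main, "cond")
callee.data_stack.append("yes")          # what LOAD_CONST would have pushed
current, value = return_value(callee)    # what RETURN_VALUE would do

print(current.name, current.data_stack)
```

After the return, execution continues in the frame found through the back pointer, which answers the question of where RETURN_VALUE returns to.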
Pygo components
Still following the blueprints of AOSA and byterun, pygo is built on
the following types:
- a VM (virtual machine), which manages the high-level structures (call stack of frames, mapping of instructions to operations, etc.). The VM is a slightly more complex version of the previous Interpreter type from tiny-interp,
- a Frame: every Frame value contains a code value and manages some state (such as the global and local namespaces, a pointer to the calling Frame and the last bytecode instruction executed),
- a Function, to model real Python functions: this is to correctly handle the creation and destruction of Frames,
- a Block, to handle Python block management, onto which control flow and loops are mapped.
Virtual machine
Each pygo.VM value must store the call stack, the Python exception state and
the return values as they flow between frames:
type VM struct {
	frames Frames    // call stack of Frames
	fp     *Frame    // pointer to current Frame
	ret    Value     // return value
	exc    Exception // last exception
}
A pygo.VM value can run bytecode with the RunCode method:
func (vm *VM) RunCode(code Code, globals, locals map[string]Value) (Value, error) {
	frame := vm.makeFrame(code, globals, locals, vm.fp)
	return vm.runFrame(frame)
}
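To illustrate how the call stack and the current-frame pointer might evolve during such a call, here is a rough model written in Python rather than Go for brevity. Everything in it is an assumption made for illustration: the push and pop inside run_frame, and the use of a plain Python callable as a stand-in for a code object and its dispatch loop, are not taken from pygo.

```python
class Frame:
    def __init__(self, code, global_names, local_names, back):
        self.code = code                  # stand-in for a bytecode object
        self.global_names = global_names
        self.local_names = local_names
        self.back = back                  # calling frame, if any


class VM:
    def __init__(self):
        self.frames = []   # call stack of frames
        self.fp = None     # current frame
        self.ret = None    # last return value

    def make_frame(self, code, global_names, local_names, back):
        return Frame(code, global_names, local_names, back)

    def run_frame(self, frame):
        # Assumed behaviour: push the frame, execute it, then restore the caller.
        self.frames.append(frame)
        self.fp = frame
        try:
            self.ret = frame.code(frame)  # a real VM would dispatch bytecode here
        finally:
            self.frames.pop()
            self.fp = self.frames[-1] if self.frames else None
        return self.ret

    def run_code(self, code, global_names, local_names):
        frame = self.make_frame(code, global_names, local_names, self.fp)
        return self.run_frame(frame)


vm = VM()
depths = []


def inner(frame):
    depths.append(len(vm.frames))
    return frame.local_names["x"] + 1


def outer(frame):
    depths.append(len(vm.frames))
    # A nested call creates a new frame whose back pointer is the current one.
    return vm.run_code(inner, frame.global_names, {"x": 41})


result = vm.run_code(outer, {}, {})
print(result, depths)
```

Passing the current frame pointer when building the new frame is what links each callee to its caller, mirroring the vm.fp argument in the Go snippet.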