12 Sep 2016, 14:35

Introduction to the pygo virtual machine

In the last episode, I have showed a rather important limitation of the tiny-interp interpreter:

def cond():
	x = 3
	if x < 5:
		return "yes"
	else:
		return "no"

Control flow and function calls were not handled, as a result tiny-interp could not interpret the above code fragment.

In the following, I’ll ditch tiny-interp and switch to the “real” pygo interpreter.

Real Python bytecode

People having read the AOSA article know that the structure of the bytecode of the tiny-interp interpreter instruction set is in fact very similar to the one of the real python bytecode.

Indeed, if one defines the above cond() function in a python3 prompt and enters:

### bytecode as raw bytes
>>> print(cond.__code__.co_code)
b'd\x01\x00}\x00\x00|\x00\x00d\x02\x00k\x00\x00r\x16\x00d\x03
\x00Sd\x04\x00Sd\x00\x00S'

### bytecode as numbers
>>> print(list(cond.__code__.co_code))
[100, 1, 0, 125, 0, 0, 124, 0, 0, 100, 2, 0, 107,
0, 0, 114, 22, 0, 100, 3, 0, 83, 100, 4, 0, 83,
100, 0, 0, 83]

This doesn’t look very human friendly. Luckily, there is the dis module that can ingest low-level bytecode and prints it in a more human-readable way:

>>> import dis
>>> dis.dis(cond)
  2           0 LOAD_CONST               1 (3)
              3 STORE_FAST               0 (x)

  3           6 LOAD_FAST                0 (x)
              9 LOAD_CONST               2 (5)
             12 COMPARE_OP               0 (<)
             15 POP_JUMP_IF_FALSE       22

  4          18 LOAD_CONST               3 ('yes')
             21 RETURN_VALUE

  6     >>   22 LOAD_CONST               4 ('no')
             25 RETURN_VALUE
             26 LOAD_CONST               0 (None)
             29 RETURN_VALUE

Have a look at the official dis module documentation for more informations. In a nutshell, the LOAD_CONST is the same than our toy OpLoadValue and LOAD_FAST is the same than our toy OpLoadName.

Simply inspecting this little bytecode snippet shows how conditions and branch-y code might be handled. The instruction POP_JUMP_IF_FALSE implements the if x < 5 statement from the cond() function. If the condition is false (i.e.: x is greater or equal than 5), the interpreter is instructed to jump to position 22 in the bytecode stream, i.e. the return "no" body of the false branch. Loops are handled pretty much the same way:

>>> def loop():
...     x = 1
...     while x < 5:
...             x = x + 1
...     return x
... 
>>> dis.dis(loop)
  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (x)

  3           6 SETUP_LOOP              26 (to 35)
        >>    9 LOAD_FAST                0 (x)
             12 LOAD_CONST               2 (5)
             15 COMPARE_OP               0 (<)
             18 POP_JUMP_IF_FALSE       34

  4          21 LOAD_FAST                0 (x)
             24 LOAD_CONST               1 (1)
             27 BINARY_ADD
             28 STORE_FAST               0 (x)
             31 JUMP_ABSOLUTE            9
        >>   34 POP_BLOCK

  5     >>   35 LOAD_FAST                0 (x)
             38 RETURN_VALUE

The above bytecode dump should be rather self-explanatory. Except perhaps for the RETURN_VALUE instruction: where does the instruction return to?

To answer this, a new concept must be introduced: the Frame.

Frames

As the AOSA article puts it:

A frame is a collection of information[s] and context for a chunk of code.

Whenever a function is called, a new Frame is created, carrying a data stack (the local variables we have played with so far) and a block stack (to handle control flow such as loops and exceptions.)

The RETURN_VALUE instructs the interpreter to pass a value between Frames, from the callee’s data stack back to the caller’s data stack.

I’ll show the pygo implementation of a Frame in a moment.

Pygo components

Still following the blueprints of AOSA and byterun, pygo is built on the following types:

a VM (virtual machine) which manages the high-level structures (call stack of frames, mapping of instructions to operations, etc…). The VM is a slightly more complex version of the previous Interpreter type from tiny-interp,
a Frame: every Frame value contains a code value and manages some state (such as the global and local namespaces, a pointer to the calling Frame and the last bytecode instruction executed),
a Function to model real Python functions: this is to correctly handle the creation and destruction of Frames,
a Block to handle Python block management on to which control flow and loops are mapped.

Virtual machine

Each value of a pygo.VM must store the call stack, the Python exception state and the return values as they flow between frames:

type VM struct {
	frames Frames    // call stack of Frames
	fp     *Frame    // pointer to current Frame
	ret    Value     // return value
	exc    Exception // last exception
}

A pygo.VM value can run bytecode with the RunCode method:

func (vm *VM) RunCode(code Code, globals, locals map[string]Value) (Value, error) {
	frame := vm.makeFrame(code, globals, locals, vm.fp)
	return vm.runFrame(frame)
}

09 Sep 2016, 10:37

A tiny python-like interpreter

under go interpreter

Last episode saw me slowly building up towards setting the case for a pygo interpreter: a python interpreter in Go.

Still following the Python interpreter written in Python blueprints, let me first do (yet another!) little detour: let me build a tiny (python-like) interpreter.

A Tiny Interpreter

This tiny interpreter will understand three instructions:

LOAD_VALUE
ADD_TWO_VALUES
PRINT_ANSWER

As stated before, my interpreter doesn’t care about lexing, parsing nor compiling. It has just, somehow, got the instructions from somewhere.

So, let’s say I want to interpret:

7 + 5

The instruction set to interpret it would look like:

code := Code{
	Prog: []Instruction{
		OpLoadValue, 0, // load first number
		OpLoadValue, 1, // load second number
		OpAdd,
		OpPrint,
	},
	Numbers: []int{7, 5},
}

var interp Interpreter
interp.Run(code)

The astute reader will probably notice I have slightly departed from AOSA’s python code. In the book, each instruction is actually a 2-tuple (Opcode, Value). Here, an instruction is just a stream of “integers”, being (implicitly) either an Opcode or an operand.

The CPython interpreter is a stack machine. Its instruction set reflects that implementation detail and thus, our tiny interpreter implementation will have to cater for this aspect too:

type Interpreter struct {
	stack stack
}

type stack struct {
	stk []int
}

Now, the interpreter has to actually run the code, iterating over each instructions, pushing/popping values to/from the stack, according to the current instruction. That’s done in the Run(code Code) method:

func (interp *Interpreter) Run(code Code) {
	prog := code.Prog
	for pc := 0; pc < len(prog); pc++ {
		op := prog[pc].(Opcode)
		switch op {
		case OpLoadValue:
			pc++
			val := code.Numbers[prog[pc].(int)]
			interp.stack.push(val)
		case OpAdd:
			lhs := interp.stack.pop()
			rhs := interp.stack.pop()
			sum := lhs + rhs
			interp.stack.push(sum)
		case OpPrint:
			val := interp.stack.pop()
			fmt.Println(val)
		}
	}
}

And, yes, sure enough, running this:

func main() {
	code := Code{
		Prog: []Instruction{
			OpLoadValue, 0, // load first number
			OpLoadValue, 1, // load second number
			OpAdd,
			OpPrint,
		},
		Numbers: []int{7, 5},
	}

	var interp Interpreter
	interp.Run(code)
}

outputs:

$> go run ./cmd/tiny-interpreter/main.go
12

The full code is here: github.com/sbinet/pygo/cmd/tiny-interp.

Variables

The AOSA article sharply notices that, even though this tiny-interp interpreter is quite limited, its overall architecture and modus operandi are quite comparable to how the real python interpreter works.

Save for variables. tiny-interp doesn’t do variables. Let’s fix that.

Consider this code fragment:

a = 1
b = 2
print(a+b)

tiny-interp needs to be modified so that:

values can be associated to names (variables), and
new Opcodes need to be added to describe these associations.

Under these new considerations, the above code fragment would be compiled down to the following program:

func main() {
	code := Code{
		Prog: []Instruction{
			OpLoadValue, 0,
			OpStoreName, 0,
			OpLoadValue, 1,
			OpStoreName, 1,
			OpLoadName, 0,
			OpLoadName, 1,
			OpAdd,
			OpPrint,
		},
		Numbers: []int{1, 2},
		Names:   []string{"a", "b"},
	}

	interp := New()
	interp.Run(code)
}

The new opcodes OpStoreName and OpLoadName respectively store the current value on the stack with some variable name (the index into the Names slice) and load the value (push it on the stack) associated with the current variable.

The Interpreter now looks like:

type Interpreter struct {
	stack stack
	env   map[string]int
}

where env is the association of variable names with their current value.

The Run method is then modified to handle OpLoadName and OpStoreName:

 func (interp *Interpreter) Run(code Code) {
@@ -63,6 +79,16 @@ func (interp *Interpreter) Run(code Code) {
                case OpPrint:
                        val := interp.stack.pop()
                        fmt.Println(val)
+               case OpLoadName:
+                       pc++
+                       name := code.Names[prog[pc].(int)]
+                       val := interp.env[name]
+                       interp.stack.push(val)
+               case OpStoreName:
+                       pc++
+                       name := code.Names[prog[pc].(int)]
+                       val := interp.stack.pop()
+                       interp.env[name] = val
                }
        }
 }

At this point, tiny-interp correctly handles variables:

$> tiny-interp
3

which is indeed the expected result.

The complete code is here: github.com/sbinet/pygo/cmd/tiny-interp

Control flow && function calls

tiny-interp is already quite great. I think. But there is at least one glaring defect. Consider:

def cond():
	x = 3
	if x < 5:
		return "yes"
	else:
		return "no"

tiny-interp doesn’t handle conditionals. It’s also completely ignorant about loops and can’t actually call (nor define) functions. In a nutshell, there is no control flow in tiny-interp. Yet.

To properly implement function calls, though, tiny-interp will need to grow a new concept: activation records, also known as Frames.

Stay tuned…

Update: correctly associate Frames with function calls. Thanks to /u/munificent.

07 Sep 2016, 10:37

Starting a Go interpreter

under go interpreter

In this series of posts, I’ll try to explain how one can write an interpreter in Go and for Go. If, like me, you lack a bit in terms of interpreters know-how, you should be in for a treat.

Introduction

Go is starting to get traction in the science and data science communities. And, why not? Go is fast to compile and run, is statically typed and thus presents a nice “edit/compile/run” development cycle. Moreover, a program written in Go is easily deployable and cross-compilable on a variety of machines and operating systems.

Go is also starting to have the foundation libraries for scientific work:

And the data science community is bootstrapping itself around the gopherds community (slack channel: #data-science).

For data science, a central tool and workflow is the Jupyter and its notebook. The Jupyter notebook provides a nice “REPL”-based workflow and the ability to share algorithms, plots and results. The REPL (Read-Eval-Print-Loop) allows people to engage fast exploratory work of someone’s data, quickly iterating over various algorithms or different ways to interpret data. For this kind of work, an interactive interpreter is paramount.

But Go is compiled and even if the compilation is lightning fast, a true interpreter is needed to integrate well with a REPL-based workflow.

The go-interpreter project (also available on Slack: #go-interpreter) is starting to work on that: implement a Go interpreter, in Go and for Go. The first step is to design a bit this beast: here.

Before going there, let’s do a little detour: writing a (toy) interpreter in Go for Python. Why? you ask… Well, there is a very nice article in the AOSA series: A Python interpreter written in Python. I will use it as a guide to gain a bit of knowledge in writing interpreters.

PyGo: A (toy) Python interpreter

In the following, I’ll show how one can write a toy Python interpreter in Go. But first, let me define exactly what pygo will do. pygo won’t lex, parse nor compile Python code.

No. pygo will take directly the already compiled bytecode, produced with a python3 program, and then interpret the bytecode instructions:

shell> python3 -m compileall -l my-file.py
shell> pygo ./__pycache__/my-file.cpython-35.pyc

pygo will be a simple bytecode interpreter, with a main loop fetching bytecode instructions and then executing them. In pseudo Go code:

func run(instructions []instruction) {
	for _, instruction := range instructions {
		switch inst := instruction.(type) {
			case opADD:
				// perform a+b
			case opPRINT:
				// print values
			// ...
		}
	}
}

pygo will export a few types to implement such an interpreter:

a virtual machine pygo.VM that will hold the call stack of frames and manage the execution of instructions inside the context of these frames,
a pygo.Frame type to hold informations about the stack (globals, locals, functions’ code, …),
a pygo.Block type to handle the control flow (if, else, return, continue, etc…),
a pygo.Instruction type to model opcodes (ADD, LOAD_FAST, PRINT, …) and their arguments (if any).

Ok. That’s enough for today. Stay tuned…

In the meantime, I recommend reading the AOSA article.

0xb1ns>> go run

Digital ramblings of a french gopher