09 Sep 2016

A tiny python-like interpreter

Last episode saw me slowly building up towards setting the case for a pygo interpreter: a python interpreter in Go.

Still following the Python interpreter written in Python blueprints, let me first do (yet another!) little detour: let me build a tiny (python-like) interpreter.

A Tiny Interpreter

This tiny interpreter will understand three instructions:

  • LOAD_VALUE
  • ADD_TWO_VALUES
  • PRINT_ANSWER

As stated before, my interpreter doesn’t care about lexing, parsing nor compiling. It has just, somehow, got the instructions from somewhere.

So, let’s say I want to interpret:

7 + 5

The instruction set to interpret it would look like:

code := Code{
	Prog: []Instruction{
		OpLoadValue, 0, // load first number
		OpLoadValue, 1, // load second number
		OpAdd,
		OpPrint,
	},
	Numbers: []int{7, 5},
}

var interp Interpreter
interp.Run(code)

The astute reader will probably notice I have slightly departed from AOSA’s python code. In the book, each instruction is actually a 2-tuple (Opcode, Value). Here, an instruction is just a stream of “integers”, being (implicitly) either an Opcode or an operand.


The CPython interpreter is a stack machine. Its instruction set reflects that implementation detail and thus, our tiny interpreter implementation will have to cater for this aspect too:

type Interpreter struct {
	stack stack
}

type stack struct {
	stk []int
}

Now, the interpreter has to actually run the code, iterating over each instructions, pushing/popping values to/from the stack, according to the current instruction. That’s done in the Run(code Code) method:

func (interp *Interpreter) Run(code Code) {
	prog := code.Prog
	for pc := 0; pc < len(prog); pc++ {
		op := prog[pc].(Opcode)
		switch op {
		case OpLoadValue:
			pc++
			val := code.Numbers[prog[pc].(int)]
			interp.stack.push(val)
		case OpAdd:
			lhs := interp.stack.pop()
			rhs := interp.stack.pop()
			sum := lhs + rhs
			interp.stack.push(sum)
		case OpPrint:
			val := interp.stack.pop()
			fmt.Println(val)
		}
	}
}

And, yes, sure enough, running this:

func main() {
	code := Code{
		Prog: []Instruction{
			OpLoadValue, 0, // load first number
			OpLoadValue, 1, // load second number
			OpAdd,
			OpPrint,
		},
		Numbers: []int{7, 5},
	}

	var interp Interpreter
	interp.Run(code)
}

outputs:

$> go run ./cmd/tiny-interpreter/main.go
12

The full code is here: github.com/sbinet/pygo/cmd/tiny-interp.

Variables

The AOSA article sharply notices that, even though this tiny-interp interpreter is quite limited, its overall architecture and modus operandi are quite comparable to how the real python interpreter works.

Save for variables. tiny-interp doesn’t do variables. Let’s fix that.

Consider this code fragment:

a = 1
b = 2
print(a+b)

tiny-interp needs to be modified so that:

  • values can be associated to names (variables), and
  • new Opcodes need to be added to describe these associations.

Under these new considerations, the above code fragment would be compiled down to the following program:

func main() {
	code := Code{
		Prog: []Instruction{
			OpLoadValue, 0,
			OpStoreName, 0,
			OpLoadValue, 1,
			OpStoreName, 1,
			OpLoadName, 0,
			OpLoadName, 1,
			OpAdd,
			OpPrint,
		},
		Numbers: []int{1, 2},
		Names:   []string{"a", "b"},
	}

	interp := New()
	interp.Run(code)
}

The new opcodes OpStoreName and OpLoadName respectively store the current value on the stack with some variable name (the index into the Names slice) and load the value (push it on the stack) associated with the current variable.

The Interpreter now looks like:

type Interpreter struct {
	stack stack
	env   map[string]int
}

where env is the association of variable names with their current value.

The Run method is then modified to handle OpLoadName and OpStoreName:

 func (interp *Interpreter) Run(code Code) {
@@ -63,6 +79,16 @@ func (interp *Interpreter) Run(code Code) {
                case OpPrint:
                        val := interp.stack.pop()
                        fmt.Println(val)
+               case OpLoadName:
+                       pc++
+                       name := code.Names[prog[pc].(int)]
+                       val := interp.env[name]
+                       interp.stack.push(val)
+               case OpStoreName:
+                       pc++
+                       name := code.Names[prog[pc].(int)]
+                       val := interp.stack.pop()
+                       interp.env[name] = val
                }
        }
 }

At this point, tiny-interp correctly handles variables:

$> tiny-interp
3

which is indeed the expected result.

The complete code is here: github.com/sbinet/pygo/cmd/tiny-interp

Control flow && function calls

tiny-interp is already quite great. I think. But there is at least one glaring defect. Consider:

def cond():
	x = 3
	if x < 5:
		return "yes"
	else:
		return "no"

tiny-interp doesn’t handle conditionals. It’s also completely ignorant about loops and can’t actually call (nor define) functions. In a nutshell, there is no control flow in tiny-interp. Yet.

To properly implement function calls, though, tiny-interp will need to grow a new concept: activation records, also known as Frames.

Stay tuned…


Update: correctly associate Frames with function calls. Thanks to /u/munificent.