Build an AST Interpreter

Posted on 2022-02-04 | In Programing Language

Interpreter

Interpreted Languages
- Implement semantics themselves
- AST-based (recursive) interpreters
  - Tree-like structure
- Bytecode-interpreters(VM)
  - Instruction-like structure
Compiled languages
- Delegate semantics to a target language
- Ahead-of-time(AOT) compilers (C++)
  - Can call interpreter in during the compilation process
- Just-in-time(JIT) compilers
- AST-transformers

AST interpreter

Capture high-level semantics

Let’s say we have the following example code:

x = 15;
x + 10 - 5;

The AST result could look like this

// AST

[program, [
    [assign, x, 15],
    [sub,
        [add, x, 10],
        5
    ]
]]

This AST result is handed to an AST interpreter.

You can find the AST definition for different languages in astexplorer.net.

Bytecode interpreter

Bytecode interpreter is also known as a virtual machine. It has one more step for generating bytecode instructions compared to the AST interpreter.

The reason we need this type of interpreter is that it produces memory/space efficient data structures (array of bytes) and is faster to traverse in contrast to tree-like data structures.

Stack-based Machines
- Stack for operands and operators
- Result is “on top of the stack”
Register-based machines
- Set of virtual registers
- Result is in “accumulator” register
- Mapped to real via register allocation

For Stack-based machines, take the following source code example, the produced bytecode can look something like this:

// Bytecode

push %15
set %0
push %0
push $10
add
push $5
sub

stack pointer(sp) and instruction pointer(ip) are the two most important pointers for any stack-based virtual machines.

S-Expressions

We will build a small toy language called Eva whose language rule can be outlined as follows:

//Expression format:

(+ 5 10)            // addidtion
(set x 15)          // assignment
(if (> x 10)        // if
    (print "ok")
    (print "err")
)

// define a function:
(def foo (bar)
      (+ bar 10))

// lambda function
(lambda (x) (* x x) 10)

// all functions in Eva are closures
(def createCounter()
  (begin
    (var i 0)
    (lambda () (++i))
  )
)

(var count (creatCounter))
(count) //1
(count) //2

// variable declaration
(var foo 10)

// assignment expression
(set foo 10)

Desgin goals

Simple syntax: S-expression
- lambda functions, IILEs
Everything is an expression
- Functional programming
No explicit return; last evaluated expression is the result
- Imperative programming
First-class functions: assgin to variables, pass as arguments, return as values
- Namespaces and modules
Static scope: all functions are closures
- OOP: class-based, prototype-based

BNF (S-Expression)

Exp
  : Atom
  | List
  ;

Atom
  : NUMBER { $$ = Number($1) }
  | STRING
  | SYMBOL
  ;

List
  : '(' ListEntries ')' { $$ = $2 }
  ;

ListEntries
  : ListEntries Exp { $1.push($2); $$ = $1 }
  | /* empty */     { $$ = [] }
  ;

Environment

Environment is just a repository of variables and functions defined in the scope.

Environment Record (actual storage)
Optional reference to Parent Environment

Block

When entering a new block of code, a new environment will be created

(var x 10)
(print x)       // 10

(begin
    (set x 20)
    (print x)   // 20
)
(print x)       // 10

Functions

// call to global `print` function
(print "Hellow World")

// call to global `+` function:
(+ ratio 30)

// function declaration
(var x 10)
(def foo() x)
(def bar()
    (begin
        (var x 20)
        (+ (foo) x)
    )
)

For function call, we need two things

A new Activation Environment to store local variables and function parameters. The activation environment is also known as
- activation frame
- activation object
- activation record
- stack frame
- frame
A parent link to capture parent scope
A call stack for debugging. The stack is also known as
- execution stack
- control stack
- stack

It turns out that once we have the activation environment, if we don’t need to think about debugging, there is no need to build stacks for functions. All we need is a garbage collector that can clean up the objects stored in the activation environments. The environment itself can be allocated on heap.

Syntactic sugar

switch

The switch expression can be a syntax sugar of nested if-else expressions.

//Expression:

(switch
    (<cond1> <block1>)
    ...
    (<condN> <blockN>)
    (else <alternate>)
)

// Example:
(switch
    ((> x 1) 100)
    ((= x 1) 200)
    (else 0)
)

// Implementation
(if <cond1>
    <block1>
    ...
    (if <condN>
        <blockN>
        <alternate>
    )
)

for

For loop is a syntax sugar of the while loop

// Expressions:
(for <init>
     <condition>
     <modifier>
     <exp>
)

// Example:
(for (var x 10)
     (> x 0)
     (- x 1)
     (print x)
)

// Implementation
(begin
    <init>
    (while <condition>
        (begin
            <exp>
            <modifier>
        )
    )
)

OOP

There are generally two approaches to implement OOP: Class-based OOP and Prototype-based OOP.

Classes and Instances (Inheritance chains)

(class Point null
    (begin
        (def constructor (self x y)
            (begin
                (set (prop self x) x)
                (set (prop self y) y)
            )
        )
        (def calc (self)
            (+
                (prop self x)
                (prop self y)
            )
        )
    )
)

(var p (new Point 10 20))
((prop p calc) p)

Objects and Prototypes (Delegation chain)

(object base
    (x 10)
    (y 20)
)

(object child
    (z 30)
    (__proto__ base)
)

(prop child x) //10
(prop child z) //30

From the implementation perspective, a class is just a named environment which can be instantiated to create objects. Since it is an environment, it has an environment record (actual storage) and an optional reference to the parent environment.

As you can see, the class is also an object that has its own space in memory similar to its instance.

Module

A module is nothing but a named first-class environment.

// `Math` module declaration
(module Math
    (begin
        (def abs (value)
            (if (< value 0)
                (- value)
                value
            )
        )

        (def square (x)
            (* x x)
        )

        (var PI 3.1415926)
    )
)