The Vassar Interpreter

Table of Contents

Introduction

The Vassar Interpreter (TVI) is a simple interpreter that executes a language that bears a passing resemblance to assembly language. The purpose of the interpreter is to allow students in an introductory compiler construction course to develop a compiler that generates "executable" code while avoiding many of the complexities that are beyond the scope of an introductory course. This document gives an introduction to TVI and covers some of the possible ways that semantic actions might be implemented. Throughout this document the phrase source program will refer to the language that will be compiled into TVI code and target program refers to the TVI source code that your compiler will generate.

TVI Statements

Most instructions take three operands, and the destination operand is always the last operand in the instruction. The general form of instructions is:

        opcode operand1, operand2, destination

The above should be read as:

        destination := operand1 opcode operand2

A statement in a code section can also be labeled if it is to be the destination of a goto statement or branching statement. A labeled statement takes the form:

        label: opcode operand1, operand2, destination

where label is an integer. The labels assigned to statements do not need to be in sequential order, nor do they need to start at any particular value. Labels do need to be unique within a source file. A statement can only have one label, however labels may be placed on blank lines to achieve the same effect.

...
goto 0
...
goto 1
...
goto 2
...
0:
1:
2: opcode operand1, operand2, destination
        

Comments

Single line comments are started with a semi-colon. C-style multi-line comments are also supported.

; This is a single line comment.

/*
 * This is a multiline comment.
 */
        

TIP: Have your compiler generate comments when generating the TVI code for while loops and if statements. It can make the code much easier to understand when debugging is required.

Back to the top.

Procedures and functions

Code for procedures and functions is enclosed in PROCBEGIN and PROCEND statements.

Procedures and functions are invoked with the call instruction, and execution jumps to the first instruction after the PROCBEGIN. When the PROCEND is encountered execution jumps back to the first instruction following the call.

TVI Memory Management

TVI memory is managed in much the same way as memory on any computer system. TVI has a series of individual memory locations that are addressed as if they were arranged in one long array.

The STR data type provides a simple string type that can be used to provide basic interactivity. The STR data type is not an array of bytes and no manipulation of STR variables is allowed other than printing them.

To allocate TVI memory, use the alloc instruction.

Memory required at runtime must be allocated on the stack with the alloc instruction. The alloc instruction doesn't actually allocate memory (and free doesn't actually release any memory) it simply increases the stack frame pointer in the interpreter. If a procedure or function is called, the interpreter knows not to overwrite those memory locations with the next stack frame. Memory in the stack frame may still be accessed even if no alloc has been generated, but these memory locations will be overwritten if a procedure or function is called.

Variables declared in procedures and functions are assigned offsets into the current stack frame. The easiest way to do this is to maintain a counter which is set to zero every time the parser moves into a procedure or function definition. Then, after the subroutine has been parsed the generated code can be wrapped with PROCBEGIN, and alloc (the prologue) and free and PROCEND (the epilogue).

Back to the top.

Addressing Modes

Most CPUs support several addressing modes to simplify memory access. Unfortunately (or fortunately, depending on your point of view) most of those are unavailable in TVI. TVI supports two basic addressing modes that allow us to address either an absolute memory location or an offset into the current stack frame:

%5 ; offset five bytes into the current stack frame.
_3 ; absolute memory address.
        

There are also two additional modifiers that can be used with these addressing modes to allow pointer operations; the address of modifier and deference modifier:

@%5  ; evaluates to the absolute memory address of a stack frame location
^%5  ; dereference the address stored in location %5
        

Any address, including aliases, can be dereferenced with the '^' modifier. However, the '@' modifier can only be applied to memory addresses.

Back to the top.

Generating Addresses

You will need to assign addresses to two types of objects, addresses for the variables declared in the source program and temporary addresses that the compiler uses for intermediate results of computations. Assigning addresses to variables defined in the source program is simply a matter of maintaining a counter and numbering the variables as the parser encounters them in the VAR section.

You will need to generate a temporary memory location for most operations that compute a value. The simplest way to generate valid addresses to use for temporary storage is to continue using the same counter that you used to generate addresses for the variables in the symbol table. This way you are sure that the temporaries you create won't overwrite any parameters or local variables.

This will result in your compiler generating quite a few redundant statements of the form:

move n, %1
move %1, %2
move %2, %3

which will tend to use up huge amounts of memory if no optimization is done. A human can easily see that the instruction move %1, %2 is redundant if we simply move n directly into address %3. However, there is no practical or easy way to determine during parsing which memory locations are safe to reuse, so we don't even bother trying. Getting rid of all the redundant temporary memory locations is a task for the optimizer and there is a very simple algorithm to optimize memory (or register) allocations once all the code has been generated.

Back to the top.

TVI Opcodes

TVI opcodes are generally three address instructions. The destination of the instruction is always the last operand and must evaluate to an address. TVI opcodes are case sensitive.

Integer Operations

add op1, op2, dest  ; addition
sub op1, op2, dest  ; subtraction
div op1, op2, dest  ; division
mul op1, op2, dest  ; multiplication
uminus src, dest    ; unary minus

Integer Operations

fadd op1, op2, dest  ; addition
fsub op1, op2, dest  ; subtraction
fdiv op1, op2, dest  ; division
fmul op1, op2, dest ; multiplication

Using the integer operations on floating point values or the floating point operations on integer values WILL result in giberish.

Type Conversions

ltof src, dest  ; converts an integer to a floating point representation
ftol src, dest  ; converts a floating point representaion of an integer

Use ltof and ftol to convert values between integer and floating point representations.

Moving Memory

move src, dest           ; moves from src to dest
store src, offset, dest  ; move from src into dest + offset
load src, off, dest      ;  move from src+offset into dest

Flow of Control

goto label           ; execution jumps to the specified statement
blt op1, op2, label  ; jump to label if op1 < op2
ble op1, op2, label  ; jump to label if op1 <= op2
bgt op1, op2, label  ; jump to label if op1 > op2
bge op1, op2, label  ; jump to label if op1 >= op2
beq op1, op2, label  ; jump to label if op1 == op2
bne op1, op2, label  ; jump to label if op1 != op2

Input/Output

print op1  ; prints a literal string or alias defined in the DATA section
outp  src  ; integer output of the value at memory address src
foutp src  ; floating point output of the value at memory address src
inp dest   ; reads an integer and stores it to dest
finp dest  ; reads a floating point value and stores it to dest.

Calling Subroutines

PROCBEGIN name   ; marks the beginning of a subroutine.
PROCEND          ; marks the end of a subroutine.
param operand    ; pushes the operand onto the parameter stack for a function call
alloc n          ; adds n memory locations to the current stack frame.
free n           ; decreases the size of the stack frame by n memory locations.
call name, count ; calls the subroutine name. count is the number of parameters 
                 ; pushed onto the stack.

Misc Opcodes

nop  ; does nothing
exit ; causes program execution to halt.

Back to the top.

Known Problems

  1. The interpreter will crash if you use more than 256 memory addresses.
  2. There is no modulo operator. Compilers will need to generate code to calculate modulo values manually.
  3. Branching opcodes assume integer values are being compared.

Back to the top.