The Vassar Interpreter (TVI) is a simple interpreter that executes a language that bears a passing resemblance to assembly language. The purpose of the interpreter is to allow students in an introductory compiler construction course to develop a compiler that generates "executable" code while avoiding many of the complexities that are beyond the scope of an introductory course. This document gives an introduction to TVI and covers some of the possible ways that semantic actions might be implemented. Throughout this document the phrase source program will refer to the language that will be compiled into TVI code and target program refers to the TVI source code that your compiler will generate.
Most instructions take three operands, and the destination operand is always the last operand in the instruction. The general form of instructions is:
opcode operand1, operand2, destination
The above should be read as:
destination := operand1 opcode operand2
A statement in a code section can also be labeled if it is to be the destination of a goto statement or branching statement. A labeled statement takes the form:
label: opcode operand1, operand2, destination
where label is an integer. The labels assigned to statements do not need to be in sequential order, nor do they need to start at any particular value. Labels do need to be unique within a source file. A statement can only have one label, however labels may be placed on blank lines to achieve the same effect.
... goto 0 ... goto 1 ... goto 2 ... 0: 1: 2: opcode operand1, operand2, destination
Single line comments are started with a semi-colon. C-style multi-line comments are also supported.
; This is a single line comment. /* * This is a multiline comment. */
TIP: Have your compiler generate comments when generating the TVI code for while loops and if statements. It can make the code much easier to understand when debugging is required.
Code for procedures and functions is enclosed in PROCBEGIN and PROCEND statements.
Procedures and functions are invoked with the call instruction, and execution jumps to the first instruction after the PROCBEGIN. When the PROCEND is encountered execution jumps back to the first instruction following the call.
TVI memory is managed in much the same way as memory on any computer system. TVI has a series of individual memory locations that are addressed as if they were arranged in one long array.
The STR data type provides a simple string type that can be used to provide basic interactivity. The STR data type is not an array of bytes and no manipulation of STR variables is allowed other than printing them.
To allocate TVI memory, use the alloc instruction.
Memory required at runtime must be allocated on the stack with the alloc instruction. The alloc instruction doesn't actually allocate memory (and free doesn't actually release any memory) it simply increases the stack frame pointer in the interpreter. If a procedure or function is called, the interpreter knows not to overwrite those memory locations with the next stack frame. Memory in the stack frame may still be accessed even if no alloc has been generated, but these memory locations will be overwritten if a procedure or function is called.
Variables declared in procedures and functions are assigned offsets into the current stack frame. The easiest way to do this is to maintain a counter which is set to zero every time the parser moves into a procedure or function definition. Then, after the subroutine has been parsed the generated code can be wrapped with PROCBEGIN, and alloc (the prologue) and free and PROCEND (the epilogue).
Most CPUs support several addressing modes to simplify memory access. Unfortunately (or fortunately, depending on your point of view) most of those are unavailable in TVI. TVI supports two basic addressing modes that allow us to address either an absolute memory location or an offset into the current stack frame:
%5 ; offset five bytes into the current stack frame. _3 ; absolute memory address.
There are also two additional modifiers that can be used with these addressing modes to allow pointer operations; the address of modifier and deference modifier:
@%5 ; evaluates to the absolute memory address of a stack frame location ^%5 ; dereference the address stored in location %5
Any address, including aliases, can be dereferenced with the '^' modifier. However, the '@' modifier can only be applied to memory addresses.
You will need to assign addresses to two types of objects, addresses for the variables declared in the source program and temporary addresses that the compiler uses for intermediate results of computations. Assigning addresses to variables defined in the source program is simply a matter of maintaining a counter and numbering the variables as the parser encounters them in the VAR section.
You will need to generate a temporary memory location for most operations that compute a value. The simplest way to generate valid addresses to use for temporary storage is to continue using the same counter that you used to generate addresses for the variables in the symbol table. This way you are sure that the temporaries you create won't overwrite any parameters or local variables.
This will result in your compiler generating quite a few redundant statements of the form:
move n, %1 move %1, %2 move %2, %3
which will tend to use up huge amounts of memory if no optimization is done. A human can easily see that the instruction move %1, %2 is redundant if we simply move n directly into address %3. However, there is no practical or easy way to determine during parsing which memory locations are safe to reuse, so we don't even bother trying. Getting rid of all the redundant temporary memory locations is a task for the optimizer and there is a very simple algorithm to optimize memory (or register) allocations once all the code has been generated.
TVI opcodes are generally three address instructions. The destination of the instruction is always the last operand and must evaluate to an address. TVI opcodes are case sensitive.
add op1, op2, dest ; addition sub op1, op2, dest ; subtraction div op1, op2, dest ; division mul op1, op2, dest ; multiplication uminus src, dest ; unary minus
fadd op1, op2, dest ; addition fsub op1, op2, dest ; subtraction fdiv op1, op2, dest ; division fmul op1, op2, dest ; multiplication
Using the integer operations on floating point values or the floating point operations on integer values WILL result in giberish.
ltof src, dest ; converts an integer to a floating point representation ftol src, dest ; converts a floating point representaion of an integer
Use ltof and ftol to convert values between integer and floating point representations.
move src, dest ; moves from src to dest store src, offset, dest ; move from src into dest + offset load src, off, dest ; move from src+offset into dest
goto label ; execution jumps to the specified statement blt op1, op2, label ; jump to label if op1 < op2 ble op1, op2, label ; jump to label if op1 <= op2 bgt op1, op2, label ; jump to label if op1 > op2 bge op1, op2, label ; jump to label if op1 >= op2 beq op1, op2, label ; jump to label if op1 == op2 bne op1, op2, label ; jump to label if op1 != op2
print op1 ; prints a literal string or alias defined in the DATA section outp src ; integer output of the value at memory address src foutp src ; floating point output of the value at memory address src inp dest ; reads an integer and stores it to dest finp dest ; reads a floating point value and stores it to dest.
PROCBEGIN name ; marks the beginning of a subroutine. PROCEND ; marks the end of a subroutine. param operand ; pushes the operand onto the parameter stack for a function call alloc n ; adds n memory locations to the current stack frame. free n ; decreases the size of the stack frame by n memory locations. call name, count ; calls the subroutine name. count is the number of parameters ; pushed onto the stack.
nop ; does nothing exit ; causes program execution to halt.
Copyright © 2015 Vassar College