

TVI simulates a simple CPU with a limited amount of memory and no general purpose registers . In order to simplify the job of writing a compiler that generates TVI code many of the messy details of function calling , memory allocation for strings, and the placement of the actual program in memory are handled by the CPU itself. While this simplifies the task of writing a compiler, it does so at the expense of the implementation of the interpreter. However, since TVI only needs to interpret the code and not actually simulate a CPU there are many things that can be done to ease the implementation. i.e. memory allocation for strings can be done with a list of strings rather than actually storing them in the CPU's memory.
The CPU will manage a pool of contiguous memory locations of type Byte, where Byte is simply a typedef for type char. Memory on the CPU can be addressed as a Byte, as a Word (two Bytes), or as a Long (two Words). The CPU will need to provide methods that allow read and write access to individual memory locations and for running the current instruction. The CPU will also need to provide four registers that will be used during program execution (but not directly in TVI statements). These are:
The CPU will also manage two stacks: one for activation records for function calls and another another that will be available to the TVI programmer through the push and pop opcodes. An activation record will be pushed onto the stack whenever the call opcode is executed and is used to record: the value of the frame pointer, the size of the current frame, and the address of the next instruction (i.e. where the function should return to). When a function call returns the activation stack will be popped and the values of the registers restored.
(Note: these are a tasks that are usual up to the assembly language programmer/compiler writer to look after, but that have been abstracted away by the interpreter.)
Execution of the TVI program will consist of two phases. During the first phase the interpreter will read the input file and create a list of the instructions found. During the second phase the interpreter will iterate through the list and execute each instruction.
During the first phase of execution the interpreter will read the source file and parse the TVI program. The interpreter should only attempt to execute valid TVI source code and should print an appropriate error message if there is a syntax error in the file.
Most programming languages require a variable to be declared before it can be used. However, the TVI programmer is allowed to use a variable before it has been declared, causing a slight problem for the parser. When parsing an instruction the parser needs to substitue variable names with its addresses, however if the variable's declaration hasn't been encountered yet the variable won't have an address assigned. The same is also true for procedures and labels. A procedure may be called before it has been defined, and a goto can jump to a statement later in the file. I.E.
CODE call foo exit PROCBEGIN foo bgt i, 0, 99 ; if i > 0 then goto 99 print msg 99: PROCEND DATA LONG i = 1 STR msg = "i is not greater than 0\n"
The simplest solution is for the parser to do a pre-processing pass of the file. During the pre-processing pass the parser simply looks for variable declarations, procedure definitions, and labelled statements. Labels and function names are entered into the symbol table. Since the address of the statement that the label or function name refers to isn't known yet dummy instructions are generated and entered into the symbol table with the name. After parsing is complete these dummy instructions can be removed, but at least during parsing the parser will have an address to use.
The memory offsets for variables is also determined and the variable names and offsets are stored in the symbol table. Any instructions necessary to initialize the variable are generated at this time as these statements must be the first ones executed by the interpreter. In the above example, the interpreter must prepend the instruction
move 1, _0
to the program (assuming i has been assigned offset 0) so that i
has the proper value when the program runs.
During the second pass, the parser will reread the source file, this time parsing code sections and generating the instruction list for the opcodes in the file. Instructions are simply appended to a list as they are encountered. After the second pass completes the parser should return a list of all the instructions needed to run the program.
To rund the program the CPU should load the first instruction in the list and continue executing instructions until the instruction pointer becomes null. The action taken by the CPU at each step of the program will depend on the opcode of the current instruction. The opcodes can be divided into several categories:
When executing arithmatic, memory, or I/O instructions the CPU will take the appropriate action and move to the next instruction in the list. A flow of control instruction will cause the instruction pointer to be moved to an instruction other than the next one in the list. For example, when a procedure call is encountered the procedure name is looked up in the symbol table and the instruction pointer is set to the address recorded in the table.
TVI files constist of DATA sections and CODE sections. There can be any number of CODE and DATA sections in a TVI file and they may appear in any order in the source code.
A DATA section consists of zero or more variable declarations, where a variable declarations is a size specifier (one of BYTE, WORD, or LONG) and a comma delimited list of identifiers. Variables may also be intialized at the same time they are declared: I.E.
DATA LONG x, y = 1, z
DATA sections may also contain string variables of type STR. The strings in TVI are provided so the TVI programmer can print simple prompts and messages. It is not possible to manipulate string programmatically via TVI code. (mostly because we won't be implemementing any of the opcodes that allow access to a single byte of memory). Therefore, STR variable must initialized when they are declared. I.E.
DATA STR prompt = "Enter a number : "
CODE sections contain the list programming statements that make up a TVI program. The list of statements can contain zero or more instructions and procedure declarations.
A procedure declaration is the keyword PROCBEGIN followed by an identifier and a newline character, a list of statements, and terminated with the keyword PROCEND. TVI procedure can not be nested.
An instruction can be a label on a line by itself, a label followed by an opcode and its operands, or simply an opcode and its operands. The number of operands to parse depends on the opcode in the instruction.
A label is a non-negative integer value followed by a colon.
Operands can be literal integer or floating point values, a memory address (either absolute or an offset from the frame pointer), or the name of a user defined variable or function. All memory offsets are non-negative integer values.
An absolute address consists of an underscore followed by a memory offset (i.e. _4). A local address, that is, an offset from the frame pointer, consists of a percent sign followed by a memory offset (i.e. %8).
There are also two addressing modes that may be applied to operands. Operands may be dereferenced with the caret, i.e, ^_4, and the absolute address of a memory offset can be determined with the at symbol, i.e. @%4, @i.
The Vassar Interpreter (TVI) is a program that reads and interprets a language that bears a passing resemblance to three address assembly code. The goal of the interpreter is (was) to allow students in the compiler construction course (CS331) to write a compiler that generates a reasonable fascimile of assembly language and then be able to run the generated code. TVI abstracts away many of the messy CPU details for the 331 students allowing them to concentrate on the compiler itself.
Our task is to implement TVI. More details regarding TVI instruction set can be found here.