

CMPU-331 Compilers (Fall 2018)

Overview

The semantic action subsystem generates the intermediate code, which can be run using The Vassar Interpreter. (We output intermediate code for several reasons.) The subsystem receives tokens from the parser whenever it is triggered by symbols in an augmented version of the Vascal grammar. These symbols mark the points where enough information has been gathered to carry semantic meaning, so an operation can be performed.

Once the semantic routines are integrated into the project as it exists so far, the project will be complete: you should have a working generator that produces an intermediate code representation of any valid program written in Vascal, as described by its grammar.

Design

To implement the semantic actions, you will use an augmented version of the original grammar that includes semantic actions in the right-hand sides of most productions. The parser must be modified to recognize when a semantic action is on top of the parse stack and to call the appropriate routine when it is.
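The dispatch step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the "#n" spelling for action symbols, the class and function names, and the choice to pass the most recently matched token are all hypothetical and should be adapted to your own parser.

```python
from typing import Callable, Dict, List, Optional

Token = str  # stand-in for the project's real token class (assumption)


def is_action(symbol: str) -> bool:
    """Treat symbols such as '#3' as semantic action markers (assumed format)."""
    return symbol.startswith("#") and symbol[1:].isdigit()


class SemanticActions:
    """Maps an action number to the routine that implements it."""

    def __init__(self) -> None:
        self.log: List[str] = []  # records calls so the sketch is observable
        self._routines: Dict[int, Callable[[Optional[Token]], None]] = {
            1: self._action_1,
            2: self._action_2,
        }

    def execute(self, number: int, token: Optional[Token]) -> None:
        routine = self._routines.get(number)
        if routine is None:
            raise ValueError(f"no semantic action #{number}")
        routine(token)

    # Placeholder routines; real ones would update the semantic stack,
    # the symbol table, and the generated intermediate code.
    def _action_1(self, token: Optional[Token]) -> None:
        self.log.append(f"action 1 saw {token}")

    def _action_2(self, token: Optional[Token]) -> None:
        self.log.append(f"action 2 saw {token}")


def step(stack: List[str], previous: Optional[Token],
         actions: SemanticActions) -> bool:
    """If an action symbol is on top of the parse stack, pop it and run it.

    Returns True when an action was handled, so the caller can loop again
    without consuming any input.
    """
    if stack and is_action(stack[-1]):
        number = int(stack.pop()[1:])
        actions.execute(number, previous)
        return True
    return False
```

In a table-driven parser, a check like `step` would run at the top of the main loop, before the usual terminal and non-terminal cases, so that action symbols never reach the parse table lookup.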

The semantic actions are the biggest part of the compiler, in part because this is where all the pieces you have developed finally start working together. Therefore, this part of the project is itself divided into four phases, each due on a given date between now and the time the entire project is due. The due dates are listed on the Project page.

Error Handling

Error Types

Semantic errors are detected by the semantic action routines. They include undeclared and multiply declared variables, mismatches in the number or type of actual and formal parameters in a procedure call, objects of any kind used in the wrong context, arrays used without subscripts (or simple variables used with subscripts), and so on.

In short, if the error is not lexical or syntactic, and has not been caught earlier, it is a semantic error and must be trapped here.

Minimum Requirement

At the very least, the semantic actions should detect errors and emit as explicit and meaningful a message as possible. Don’t issue messages that are meaningless to the user, such as messages that refer to the contents of the semantic stack. As usual, you should indicate where in the source the error occurred by giving the line number and, ideally, the character position. In practice, this will likely just duplicate the error reporting already in your lexer and parser.

This 'halt-and-catch-fire' approach is entirely fine for this project.
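A minimal sketch of such an error, assuming a Python-style implementation, is shown here. The `ErrorKind` and `SemanticError` names, the set of error kinds, and the message wording are illustrative choices, not part of the project specification.

```python
from enum import Enum
from typing import Optional


class ErrorKind(Enum):
    """A few of the error types listed above (not an exhaustive set)."""
    UNDECLARED_VARIABLE = "undeclared variable"
    MULTIPLE_DECLARATION = "multiply declared variable"
    PARAMETER_MISMATCH = "wrong number or type of parameters for"
    MISSING_SUBSCRIPT = "array used without a subscript:"


class SemanticError(Exception):
    """Carries a user-facing message with the source position."""

    def __init__(self, kind: ErrorKind, name: str, line: int,
                 column: Optional[int] = None) -> None:
        self.kind = kind
        self.name = name
        self.line = line
        self.column = column
        where = f"line {line}"
        if column is not None:
            where += f", column {column}"
        # The message names the offending identifier, never internal state
        # such as the contents of the semantic stack.
        super().__init__(f"Semantic error at {where}: {kind.value} '{name}'")
```

Raising this exception from a semantic routine and letting it propagate to the top level, where the message is printed and compilation stops, gives the halt-on-first-error behaviour described above.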

More Advanced Error Handling

One of the most common issues is an undeclared variable. When an undeclared variable is detected, you can print a message only the first time it is encountered, rather than every time it subsequently appears in the source. To do this, enter the variable into the symbol table and set a flag in its symbol table entry to indicate that it was entered as the result of an error condition. Similarly, when a simple variable is referenced as if it were an array, set a flag in the symbol table entry for that name so that subsequent illegal references do not generate the same error over and over again.

This addition to the error-handling system is a little tricky, but it does reduce the number of halts. You can choose whether compiled code is still emitted in this situation (on the assumption that everything is all right) or is not emitted (so that the user is required to declare variables explicitly).
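The flagging scheme described above might look something like the following sketch. The `Entry` fields, the function names, and the use of a plain dictionary as the symbol table are assumptions made for illustration; your own symbol table will have a different interface.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    name: str
    is_array: bool = False
    declared_in_error: bool = False  # entered because of an undeclared use
    misused_as_array: bool = False   # subscript error already reported


def use_variable(table: Dict[str, Entry], name: str, line: int,
                 messages: List[str]) -> Entry:
    """Look up a variable, reporting an undeclared name only once."""
    entry = table.get(name)
    if entry is None:
        messages.append(f"line {line}: undeclared variable '{name}'")
        # Enter the name so that later references find it and stay silent.
        entry = Entry(name, declared_in_error=True)
        table[name] = entry
    return entry


def use_subscripted(table: Dict[str, Entry], name: str, line: int,
                    messages: List[str]) -> Entry:
    """Handle a reference of the form name[...], reporting misuse once."""
    entry = use_variable(table, name, line, messages)
    # Skip names entered in error: their intended kind is unknown.
    if (not entry.is_array and not entry.declared_in_error
            and not entry.misused_as_array):
        messages.append(
            f"line {line}: '{name}' is not an array and cannot be subscripted")
        entry.misused_as_array = True
    return entry
```

Collecting messages in a list rather than halting is what allows compilation to continue. Whether code is still emitted once the list is non-empty is the design choice left open above.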

Submission