CMPU-331 Compilers (Spring 2019)


This phase of the compiler project involves designing symbol table management routines, which will be accessed by the lexical analyzer and semantic routines.

Our compiler uses three symbol tables:

  1. a constant table, which contains constants (represented as strings) used in the program;
  2. a global symbol table, which contains names known throughout all scopes of the Vascal program;
  3. a local symbol table, which contains names known in a nested block while it is being parsed.

The symbol tables are used by the semantic actions, which are called at appropriate points in the parse to insert names and information associated with them (when parsing declarations) and look up names used in the body of the program (to see that they have been declared and are used appropriately according to the specifications of their declarations). The semantic actions also use information about the stage of the parse to determine whether an identfier is inserted into the local or global table.

At this point, without the semantic actions, most of the information that will eventually be stored in the symbol tables is not available. So for this assignment, you will just set up the symbol table class and the symbol table entry classes for later use and test that they work.


Symbol Tables

The constant and global tables are set up at the very outset of compilation, before the first token is scanned, and are not deleted until the compilation process terminates.

The global table initially contains the names of all built-in functions, in our case, read and write, the (internal) program name main, and input and output, the filenames passed to the Vascal program). The names READ and WRITE should be stored in the symbol table as procedure entries (with 0 parameters). The names INPUT and OUTPUT can be stored as variable entries. (Note that INPUT and OUTPUT are never referenced after the first mention, but they need to be in the symbol table so the names are not used otherwise.)

The constant table is initially empty.


Local tables are instantiated when the compiler begins to parse a procedure or function and deleted when parsing of that procedure or function ends. All of the information in each of the symbol tables is accessed using hashing. Each symbol table object will have the capability to insert a new entry and lookup an entry.

NOTE: Do not use the symbol table class to create a keyword table in the lexical analyzer: make it a simple hash table. Using the symbol table class in the lexical analyzer creates a a circular dependency that Maven and IntelliJ get very upset about.

Table Management

The table management routines will do the following:

  • Insert or search for a given identifier in the symbol table.
  • Search for a given constant; if not found, insert the constant in the table.

Since the stage of the parse determines whether an identifier needs to be inserted in the symbol table, a global flag or parameter (set and used by the semantic actions) will be used to indicate which is the case.

Constants are always searched for; if not found, it is inserted in the table.

The table management routines also have to know if the parser is currently dealing with the main program or a procedure or function. Identifiers declared in the main program are stored in the global table, and identifiers declared in procedures and functions are stored in the local table. Identifiers referenced in a procedure or function are searched for first in the local table, and if not found, then in the global table. After a procedure or function is scanned and parsed, the local table containing identifiers for that procedure or function is deleted. Again, without the semantic actions, we cannot determine if we are in global or local scope.

Note that the name of a procedure or function is global: it is known in the main program and therefore stored as a global identifier.


SymbolTable Class

Define a SymbolTable class, which includes:

  • a hashtable, capacity (size) to be passed as a parameter to the constructor.
  • a lookup method.
  • an insert method.
  • a size method that returns the number of entries in the table.
  • a dumpTable() method that prints the contents of the table (name and the contents of each field in the entry, mostly empty for now).

SymbolTableEntry Class & Children

Define a SymbolTableEntry class. This will be the parent class of a suite of child classes.

This class includes the following boolean methods, all of which return false in the parent class and are overridden by specific entries as needed (upon instantiation if possible, or later by the semantic actions):

  • isVariable
  • isProcedure
  • isFunction
  • isFunctionResult
  • isParameter
  • isArray
  • isReserved

Define a a set of subclasses that extend the SymbolTableEntry class, one for each of the following types:

  • ArrayEntry (name [string], address [int], type [TokenType], upperBound [int], lowerBound [int])
  • ConstantEntry (name [string], type [TokenType])
  • FunctionEntry (name [string], numberOfParameters [int], parameterInfo [List], result [VariableEntry])
  • ProcedureEntry (name [string], numberOfParameters [int], parameterInfo [List])
  • VariableEntry (name [string], address [int], type [TokenType])
  • IODeviceEntry (name [string])

Error Handling

Define a SymbolTableError class. The only error at this point (and maybe ever) is an attempt to insert a name into a table where it already exists.


  1. Depending on the type of the entry, different fields will be required. The basic fields and their types for each are listed above. In this assignment only the name field will be used for all entry types except ProcedureEntry, which will also set numberOfParameters when inserting built-in names.
  2. You should enter the names of built-in procedures and IO devices to conform to the way your lexical analyzer normalizes names, as either all upper or all lower case.