CMPU 331:   Compiler Design (Spring 2016)
1 Overview
2 Requirements
3 Submission

CMPU 331: Compiler Design (Spring 2016)

Symbol Tables

1 Overview

This phase of the compiler project involves designing symbol table management routines, which will be accessed by the lexical analyzer and semantic routines.

Our compiler uses three symbol tables:

The constant and global tables are set up at the very outset of compilation, before the first token is scanned, and are not deleted until the compilation process terminates. The global table initially contains the names of all built-in functions, in our case, read and write, the (internal) program name main, and input and output, the filenames passed to the Pascal program). The names read and write should be stored in the symbol table as procedure entries (with 0 parameters). The names input and output can be stored as variable entries. (note that input and output are never referenced after the first mention, but they need to be in the symbol table so the names are not used otherwise. The constant table is initally empty.

Local tables are instantiated when the compiler begins to parse a procedure or function and deleted when parsing of that procedure or function ends. All of the information in each of the symbol tables is accessed using hashing. Each symbol table object will have the capability to insert a new entry and lookup an entry.

NOTE: Do not use the symbol table class to create a keyword table in the lexical analyzer–make it a simple hash table. Using the symbol table class in the lexical analyzer creates a a circular dependency that Maven and IntelliJ get very upset about.

The table management routines will do the following:

  • insert or search for a given identifier in the symbol table.

  • search for a given constant, and if not found, insert the constant in the table.

Since the stage of the parse determines whether an identifier needs to be inserted in the symbol table, a global flag or parameter (set and used by the semantic actions) will be used to indicate which is the case. Constants are always searched for, and if not found, inserted in the table.

The table management routines also have to know if the parser is currently dealing with the main program or a procedure or function. Identifiers declared in the main program are stored in the global table, and identifiers declared in procedures and functions are stored in the local table. Identifiers referenced in a procedure or function are searched for first in the local table, and if not found, then in the global table. After a procedure or function is scanned and parsed, the local table containing identifiers for that procedure or function is deleted. Again, without the semantic actions, we cannot determine if we are in global or local scope.

Note that the name of a procedure or function is global–it is known in the main program and therefore stored as a global identifier.

2 Requirements

The symbol tables are used by the semantic actions, which are called at appropriate points in the parse to insert names and information associated with them (when parsing declarations) and look up names used in the body of the program (to see that they have been declared and are used appropriately according to the specifications of their declarations). The semantic actions also use information about the stage of the parse to determine whether an identfier is inserted into the local or global table.

At this point, without the semantic actions, most of the information that will eventually be stored in the symbol tables is not available. So for this assignment, you will just set up the symbol table class and the symbol table entry classes for later use and test that they work.

For this assignment, you should do the following:

Depending on the type of the entry, different fields will be required. The basic fields and their types for each are listed above. In this assignment only the name field will be used for all entry types except ProcedureEntry, which will also set numberOfParameters when inserting built-in names.

NOTE: You should enter the names of built-in procedures and IO devices to conform to the way your lexical analyzer normalizes names, as either all upper or all lower case.

3 Submission

Please follow the Submission instructions.