CMPU 331:   Compiler Design (Spring 2017)
1 Overview
2 Error detection and recovery
3 Considerations
4 Additional resources
5 Submission

CMPU 331: Compiler Design (Spring 2017)

Parser

1 Overview

This phase of the project implements an LL(1) parser for the language described by this grammar.

The parser consists of the following:

The parse table lists all the entries for your matrix. Note that negative entries denote productions whose right-hand-side is the empty string. The table is also available as an Excel file with the terminals and non-terminals given as headers to the rows and columns.

The parser works by examining the symbol on top of the stack and the current input symbol, which is the token most recently passed to the parser by the lexical analyzer. If the stack symbol is a terminal and matches the current input symbol, the top symbol is popped from the stack and the parser calls for the next input token. If the stack symbol is a non-terminal, the parser consults the cell in the parse table corresponding to the stack symbol and the current input symbol. If the entry in the table is accept, the parse has successfully completed. If the entry is error, the error routine is called. If the entry is a production number, the parser does the following: the top symbol on the stack is popped. The production number from the parse table is used as an index into the right-hand side table. This table can be set up as an array with integer indeces that correspond to the production numbers in the grammar. Each cell should point to a list containing the actual symbols in the right hand side for that production. You may want to store the symbols in reverse order–i.e., last one at the front of the list, etc.–since this is the order in which you will push them on the stack.

The parse starts by initializing the stack to contain the distinguished symbol of the grammar (<Goal>) with the ENDOFFILE terminal symbol underneath. It then proceeds as described above, repeatedly comparing stack top and current input and pushing and popping symbols from the stack as required, until an error is detected or the input string is successfully parsed.

You should implement a dump_stack function that can be called during debugging that prints the current contents of the parse stack. Be sure this can be turned on and off, since the compiler will ultimately not generate this kind of internal information for the user.

2 Error detection and recovery

A few general principles apply to errors found regardless of parsing technique being used:

Recognizing that the input is not syntactically valid can be relatively straightforward. An error is detected in predictive parsing when the terminal on top of the stack does not match the next input symbol or when nonterminal A is on top of the stack, a is the next input symbol and the parsing table entry M[A,a] is empty.

Recovering from errors and being able to resume and successfully parse is a bit more difficult. Therefore, the basic requirement for the assignment is to abort after reporting the first error.

However, most users would like to find out more than one error per compilation, so you may want to implement some error recovery. The problem is how to fix the error in some way to allow parsing to continue, including finding a reasonable place to resume the parse.

Many errors are relatively minor and involve syntactic violations for which the parser has a correction that it believes is likely to be what the programmer intended. For example, a missing semicolon at the end of the line or a misspelled keyword can usually be recognized. For many minor errors, the parser can "fix" the program by guessing at what was intended and reporting a warning, but allowing compilation to proceed unhindered. The parser might skip what appears to be an erroneous token in the input or insert a necessary, but missing, token or change a token into the one expected (substituting BEGIN for BGEIN). For more major or complex errors, the parser may have no reliable correction. The parser will attempt to continue but will probably have to skip over part of the input or take some other exceptional action to do so.

Panic mode error recovery is a simple technique that just bails out of the current construct, looking for a safe symbol at which to restart parsing. The parser just discards input tokens until it finds what is called a synchronizing token. The set of synchronizing tokens are those that we believe confirm the end of the invalid statement (e.g., a semicolon) and allow us to pick up at the next piece of code. The basic procedurre is this:

Phrase-level error recovery fills blank entries in the parse table with specific error actions. These actions may change, delete, or insert symbols in the input and/or issue appropriate error messages. At the least, you can provide a precise error message that describes the mismatch between expected and found.

Please refer to your text on pp. 228-31 for a more detailed description of methods for panic mode and phrase-level error recovery.

3 Considerations

The right-hand-side table will ultimately contain markers that indicate that a call to semantic action routines is in order, interspersed within the right-hand-side symbols themselves. Your semantic routines will not be implemented until later, so at this point you can either use the grammar with actions and include code to just ignore semantic actions when they appear on the stack; or use the grammar without actions, and modify it later to include the actions.

4 Additional resources

5 Submission

Please follow the Submission instructions.