The Acceptable-set Method of Error Recovery


The Acceptable-set Method of Error Recovery consists of three steps, all performed after the error is detected:

Step 1: Constructing the Acceptable set

The Acceptable sets are constructed using continuations. A continuation for a given stack configuration repalces each non-terminal on the stack with a terminal production for it, in particular, the production that produces the shortest string.

Each alternative right hand side for each non-terminal in a grammar defeines in itself a language, a set of strings. We are interested here in the length of the shortest string in each of these languages. ONce computed, we know for each non-terminal which of its alternatives produces the shortest string; if two alternatives produce shortest strings of the same length we simply choose one of them arbitrarily. This information is used to ill the array Shortest Production Table [].

The lengths of the shortest productions of all aternatives of all non-terminals is computed as follows:

Data definitions:

1. A set of pairs of the form (production rule, integer).
2a. A set of pairs of the form (non-terminal, integer).
2b. A set of pairs of the form (terminal, integer).

Initializations:

1a. For each production rule N -> A 1...An with n>0 there is a pair (N -> A 1...An,ï).
1b. For each production rule N -> e there is a pair (N -> e, 0).
2a. For each non-terminal N there is a pair (N, ï).
2b. For each terminal T there is a pair (T, 1).

Inference rules:

1. For each production rule N -> A 1...An with n>0 , if there are pairs (A1 , l 1) to (An , ln) with all li< ï, the pair (N -> A 1...An, lN ) must be replaced by a pair (N -> A 1...An, lnew) where lnew=

provided lnew< lN .

2. For each non-terminal N, if there are one or more pairs of the form (N -> a, li) with li<   ï, the pair (N, lN) must be replaced by (N, lnew ) where lnew is the minimum of the lis, provided lnew < lN .

This algorithm is based on the fact that the length of the shortest productions of an alternaitve N-> AB... is the sum of the lengths of the shortest productions of A,B, etc. The initializations 1b. and 2b. set the minimum lengths of empty alternatives to 0 and those of terminal symbols to 1.  All other lengths are set to ï so any actual lengths found will be shorter. The first inference rule says that the shortest length of an alternative is the sum of the shortest lengths of its components; more complicated but fiarly obvious rules apply if the alternative includes repetition operators. The second inference rule says that the shortest length of a non-terminal is the minimum of the shortest lengths of its alternatives. Note that variables are implemented as (name, value) pairs.
 

Step 2: Discard unacceptable tokens

Zero or more tokens from the input are discarded in order, until  a token is found that is in the acceptable set. Since the token EOF is always acceptable, this step terminates.

Step 3: Resynchronize the parser

Continue parsing with a modified parser. This parser first tries the usual  predict or match move.  If this succeeds, the parser is back in synch and the parse can continue normally, but if it fails, the modified parser proceeds as follows: This step is repeated until a move succeeds and the parser is resynchronized.
 

Here is the overall algorithm:

// Step 1: construct acceptable set:
SET Acceptable set TO Acceptable set of (Prediction stack);

// Step 2: skip unacceptable tokens:
WHILE Input token [Input token index] IS NOT IN Acceptable set:
        MESSAGE "Token skipped: ", Input token [Input token index];
        SET Input token index TO Input token index + 1;

// Step 3: resynchronize the parser:
SET the flag Resynchronized TO False;
WHILE NOT Resynchronized:
        Pop top of Prediction stack into Predicted;
        IF Predicted is a Token class:
                // Try a match move:
                IF Predicted = Input token [Input token index] .class:
                        SET Input token index TO Input token index + 1; // matched
                        SET Resynchronized TO True; // resynchronized!
                ELSE Predicted /= Input token:
                        Insert a token of class Predicted, including representation;
                        MESSAGE "Token inserted of class ", Predicted;
        ELSE Predicted is a Non-terminal:
                // Do a prediction move:
                SET Prediction TO
                        Prediction table [Predicted, Input token [Input token index]];
                IF Prediction = Empty:
                        SET Prediction TO Shortest production table [Predicted];
                // Now Prediction /= Empty:
                PUSH The symbols of Prediction ON Prediction stack;