The Acceptable-set Method of Error Recovery consists of three steps,
all performed after the error is detected:
Each alternative right hand side for each non-terminal in a grammar defeines in itself a language, a set of strings. We are interested here in the length of the shortest string in each of these languages. ONce computed, we know for each non-terminal which of its alternatives produces the shortest string; if two alternatives produce shortest strings of the same length we simply choose one of them arbitrarily. This information is used to ill the array Shortest Production Table [].
The lengths of the shortest productions of all aternatives of all non-terminals is computed as follows:
Data definitions:
1. A set of pairs of the form (production rule, integer).
2a. A set of pairs of the form (non-terminal, integer).
2b. A set of pairs of the form (terminal, integer).
Initializations:
1a. For each production rule N -> A 1...An
with n>0 there is a pair (N -> A 1...An,ï).
1b. For each production rule N -> e
there is a pair (N -> e, 0).
2a. For each non-terminal N there is a pair (N, ï).
2b. For each terminal T there is a pair (T, 1).
Inference rules:
1. For each production rule N -> A 1...An with n>0 , if there are pairs (A1 , l 1) to (An , ln) with all li< ï, the pair (N -> A 1...An, lN ) must be replaced by a pair (N -> A 1...An, lnew) where lnew=
provided lnew< lN .
2. For each non-terminal N, if there are one or more pairs of the form (N -> a, li) with li< ï, the pair (N, lN) must be replaced by (N, lnew ) where lnew is the minimum of the lis, provided lnew < lN .
This algorithm is based on the fact that the length of the shortest
productions of an alternaitve N-> AB... is the sum of the lengths of the
shortest productions of A,B, etc. The initializations 1b. and 2b. set the
minimum lengths of empty alternatives to 0 and those of terminal symbols
to 1. All other lengths are set to ï so
any actual lengths found will be shorter. The first inference rule says
that the shortest length of an alternative is the sum of the shortest lengths
of its components; more complicated but fiarly obvious rules apply if the
alternative includes repetition operators. The second inference rule says
that the shortest length of a non-terminal is the minimum of the shortest
lengths of its alternatives. Note that variables are implemented as (name,
value) pairs.
Here is the overall algorithm:
// Step 1: construct acceptable set:
SET Acceptable set TO Acceptable set of (Prediction stack);
// Step 2: skip unacceptable tokens:
WHILE Input token [Input token index] IS NOT IN Acceptable set:
MESSAGE "Token skipped:
", Input token [Input token index];
SET Input token index
TO Input token index + 1;
// Step 3: resynchronize the parser:
SET the flag Resynchronized TO False;
WHILE NOT Resynchronized:
Pop top of Prediction
stack into Predicted;
IF Predicted is a Token
class:
// Try a match move:
IF Predicted = Input token [Input token index] .class:
SET Input token index TO Input token index + 1; // matched
SET Resynchronized TO True; // resynchronized!
ELSE Predicted /= Input token:
Insert a token of class Predicted, including representation;
MESSAGE "Token inserted of class ", Predicted;
ELSE Predicted is a
Non-terminal:
// Do a prediction move:
SET Prediction TO
Prediction table [Predicted, Input token [Input token index]];
IF Prediction = Empty:
SET Prediction TO Shortest production table [Predicted];
// Now Prediction /= Empty:
PUSH The symbols of Prediction ON Prediction stack;