class: center, middle

# Abstract Syntax Trees and Error Handling

_CMPU 331 - Compilers_

---
# Abstract Syntax Trees

* A parser traces the derivation of a sequence of tokens
* Parse trees trace the operation of the parser
* Capture the nesting structure of the productions

![parse tree](/~cs331/images/lectures/cfg_noparen_diagram.png)

---
# Abstract Syntax Trees

* But, parse trees capture a lot of information we really don't care about
    * parentheses
    * single-successor nodes
    * hierarchy of productions (sometimes forced by the chosen parser tool, such as eliminating left recursion)

![parse tree](/~cs331/images/lectures/cfg_diagram.png)

---
# Abstract Syntax Trees

* Abstract syntax trees (ASTs) are like parse trees, but cleaned up and streamlined
* Designed to simplify later stages of compilation
* Simple but important data structure in compilers

![AST](/~cs331/images/lectures/ast_diagram.png)

---
# Error Handling

* Purpose of the compiler is to:
    * Detect non-valid programs
    * Translate the valid ones
* Different kinds of possible errors

---
# Error Handling

Error kind | Example | Detected by
---------- | ------- | -----------
Lexical | invalid character '@' | lexer
Syntax | expected ')' after '(' | parser
Semantic | can't add integer to string 'x + 5' | type checker
Correctness | unexpected result | code review, tests, proofs

---
# Syntax Error Handling

* Error handler should:
    * Report errors accurately and clearly
    * Recover from an error quickly
    * Not slow down compilation of valid code
* Good error handling is not easy to achieve

---
# Approaches to Syntax Error Recovery

* From simple to complex
    * Panic mode
    * Error productions
    * Automatic local or global correction
* Not all are supported by all parser generators

---
# Panic Mode

* Simplest, most popular method
* When an error is detected:
    * Discard tokens until one with a clear role is found
    * Continue from there
* Such tokens are called _synchronizing_ tokens
    * Typically pick statement or expression terminators

---
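# Panic Mode: A Sketch

The discard-and-resume loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy statement form `int NAME = NUMBER ;`, the pre-split token list, and the names `parse_statements`, `ParseError`, and `SYNC` are all hypothetical and not part of any particular parser generator.

```python
SYNC = {";"}  # synchronizing tokens (assumed: statement terminators only)


class ParseError(Exception):
    """Raised when the next token does not fit the toy grammar."""


def parse_statements(tokens):
    """Parse statements of the assumed form `int NAME = NUMBER ;`.

    Returns (declarations, errors). On a syntax error, panic mode discards
    tokens up to and including the next synchronizing token, then resumes.
    """
    decls, errors = [], []
    pos = 0

    def expect(check, what):
        nonlocal pos
        if pos < len(tokens) and check(tokens[pos]):
            pos += 1
            return tokens[pos - 1]
        found = repr(tokens[pos]) if pos < len(tokens) else "end of input"
        raise ParseError(f"token {pos}: expected {what}, found {found}")

    while pos < len(tokens):
        try:
            expect(lambda t: t == "int", "'int'")
            name = expect(str.isidentifier, "a name")
            expect(lambda t: t == "=", "'='")
            value = expect(str.isdigit, "a number")
            expect(lambda t: t == ";", "';'")
            decls.append((name, int(value)))
        except ParseError as err:
            errors.append(str(err))
            # Panic mode: discard tokens until a synchronizing token is found.
            while pos < len(tokens) and tokens[pos] not in SYNC:
                pos += 1
            pos += 1  # consume the synchronizing token, then continue


    return decls, errors


decls, errors = parse_statements("int x = ; int y = 5 ;".split())
print(decls)        # [('y', 5)]
print(len(errors))  # 1
```

The first statement is reported and skipped; the second still parses, so one run can report more than one error.

---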
# Panic Mode

Example:

```
(1 + + 2) + 3
```

* Skip ahead to the next integer and then continue

Example:

```
int x = ;
int y = x + 5;
```

* Skip ahead to the end of statement ";" and then continue

---
# Error Productions

* Idea: specify common mistakes in the grammar
* Essentially promotes common errors to alternative syntax
* Example:
    * Common error: write `5 x` instead of `5 * x`
    * Add production `E ... | E E`
* Disadvantage: complicates the grammar

---
# Local and Global Correction

* Idea: find a correct program, similar to the actual input
    * Try token insertions and deletions
    * Exhaustive search
* Disadvantages:
    * Hard to implement
    * Slows down parsing of correct programs
    * The "corrected" program may not actually have the intended behavior
    * Not all tools support it

---
# Syntax Error Recovery

* Past
    * Slow recompilation cycle (once a day, once a week, once a month)
    * Tried to find as many errors in one cycle as possible
    * Researchers were obsessed with error recovery
* Present
    * Quick recompilation cycle (every minute, several times a minute)
    * Developers tend to correct one error and then recompile
    * Complex error recovery is less compelling
    * Panic mode is generally regarded as enough
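---
# Error Productions: A Sketch

Returning to the `5 x` example from the error-productions slide: one possible way to picture an error production is a hand-written parser that accepts the mistaken form, reports it, and builds the tree the programmer probably meant. The grammar (products of numbers and names only), the tuple-based tree, and the name `parse_product` are assumptions made for this sketch, not the output of any real parser tool.

```python
def parse_product(tokens):
    """Parse `atom ('*' atom)*`, plus the error production `atom atom`.

    Returns (tree, warnings); the tree is nested ("*", left, right) tuples.
    Grammar, token format, and messages are assumptions of this sketch.
    """
    warnings = []
    pos = 0

    def atom():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isalnum():
            raise ValueError(f"token {pos}: expected a number or name")
        pos += 1
        return tokens[pos - 1]

    tree = atom()
    while pos < len(tokens):
        if tokens[pos] == "*":
            pos += 1  # ordinary production: explicit multiplication
        elif tokens[pos].isalnum():
            # Error production (E E): accept it, but report the mistake.
            warnings.append(f"token {pos}: missing '*' before {tokens[pos]!r}")
        else:
            raise ValueError(f"token {pos}: unexpected {tokens[pos]!r}")
        tree = ("*", tree, atom())
    return tree, warnings


print(parse_product(["5", "*", "x"]))  # (('*', '5', 'x'), [])
print(parse_product(["5", "x"])[0])    # ('*', '5', 'x')
```

Both inputs yield the same tree; only the second produces a warning. The extra branch in the loop is the "complicates the grammar" cost in miniature.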