class: center, middle

## CS331
# Compilers
## Fall 2019

---

# Overview

* Website: https://www.cs.vassar.edu/~cs331/
* Lectures: Tuesday & Thursday 1:30pm-2:45pm
* Instructor: Allison Randal
* Communication
  * Moodle discussion forum
  * email: arandal@vassar.edu
* Office hours: Tuesday 3-4pm, Monday & Wednesday 10:30am-noon
* Meetings by Zoom/Hangouts

---

# Course Structure

* Readings: _A Practical Approach to Compiler Construction_ by Des Watson (and optional additional readings)
* Theory: 4 written assignments
* Practice: 4 project parts (+ 1 for extra credit)
* No Exams
* Due dates: on the [course calendar](https://www.cs.vassar.edu/~cs331/calendar.html)

---

# Quick Survey

* How do you prefer to receive assignments/project descriptions?
  * Paper handout
  * PDF
  * Mobile-friendly web page
* How do you prefer to submit written assignments?
  * On paper (hand-written or printed)
  * PDFs uploaded to Moodle
* Do you have a GitHub account?

---

# Course Project

* Large project, start early
* Compiler for C-like language, called DL (described in the textbook)
* 4 parts
  * Lexer
  * Parser
  * Semantic Analyzer
  * Generator
  * (extra credit) Optimizer
* Project parts will be submitted through GitHub

---

# Topics

* How does a programming language work?
* Techniques and tools
* Practical experience building a compiler

---

# A Bit of History

**Machine Code (Binary)**

* Machine friendly
* Not human friendly, prone to errors (typos)
* Few instructions, which correspond to low-level operations of the machine

Example of 32-bit x86 machine code (XOR operation):

```
00110010 00001100 00100101 00010010 00000000 00000000 00000000
```

---

# A Bit of History

**Assembly Language**

* Adds symbolic names for
  * Values
  * Registers
  * Instructions
  * Storage locations
* Close to machine code, but needs an _assembler_ to translate
* Specific to hardware architecture
* Not human friendly, prone to errors (typos)

Example of ARM assembly language:

```
add r2, r1, #16
```

---

# A Bit of History

**High-level languages**

* Human friendly, easier to write and read
* Similar to natural human languages (usually English)
* Portable
* Many operators/keywords
* Needs a _compiler_ to translate
  * Can check for human errors during compilation
  * Can optimize during compilation

Example of C language:

```
int expr(int n)
{
    int d;
    d = 4 * n * n * (n + 1) * (n + 1);
    return d;
}
```

---

# A Bit of History

**Multiple families of higher-level languages**

* Imperative
  * Procedural (BASIC, C, Fortran, ...)
  * Object-Oriented (Java, C++, Python, ...)
* Declarative
  * Functional (Lisp, Scheme/Racket, ML, Haskell, ...)
  * Logic (Prolog, ...)
* Different syntax and semantics ⇒ different experience for programmer
* All are translated using compiler techniques/tools
* All ultimately run machine code underneath

---

# Translation

* Overall process:
  * Read the source, words and structure
  * Validate it
  * (optional) Optimize it
  * Generate equivalent lower-level code

![multistage pipeline](/~cs331/images/lectures/multistage.png)

(source: _Language Implementation Patterns_ by Terence Parr)

---

# Translation

1. Lexical Analysis
2. Parsing
3. Semantic Analysis
4. Optimization
5. Code Generation

![multistage pipeline](/~cs331/images/lectures/multistage.png)

(source: _Language Implementation Patterns_ by Terence Parr)

---

# Lexical Analysis

First step is to recognize the words:

> _This is a sentence._

---

# Lexical Analysis

First step is to recognize the tokens (words):

```
(+ 3 4)
```

---

# Lexical Analysis

First step is to recognize the tokens (words):

```
if x == y then z = 1; else z = 2;
```

---

# Parsing

Second step is to understand the structure (syntax):

![sentence diagram](/~cs331/images/lectures/sentence_diagram.png)

---

# Parsing

Second step is to understand the structure (syntax):

![scheme diagram](/~cs331/images/lectures/scheme_diagram.png)

---

# Semantic Analysis

Third step is to understand the "meaning".

For compilers this means a limited form of analysis to catch inconsistencies.

---

# Semantic Analysis

Third step is to understand the "meaning":

> _Jack said Jerry left his assignment at home._

Who does "his" refer to?

---

# Semantic Analysis

Third step is to understand the "meaning":

```
int jack = 3;
{
    int jack = 4;
}
```

Programming languages define strict rules to avoid ambiguities.

---

# Optimization

Similar to editing to improve human language.

```
X = Y * 0
```

is the same as

```
X = 0
```

---

# Optimization

Why optimize?
* run faster
* use less memory
* generally, conserve some resource

---

# Code Generation

* Write out equivalent code in a lower-level language
  * Similar to translating human languages
  * Result has the same meaning as the original
* May generate assembly or machine code, or may target an intermediate language

---

# Intermediate Languages

* Compiler may translate in successive stages
* Each stage is a lower level of abstraction

![multistage pipeline](/~cs331/images/lectures/multistage.png)

(source: _Language Implementation Patterns_ by Terence Parr)

---

# Interpreter

* Stop at an intermediate level and execute
* Interpreter is written in some (high-level) language and compiled to machine code

![multistage pipeline](/~cs331/images/lectures/multistage.png)

(source: _Language Implementation Patterns_ by Terence Parr)

---

# Other Considerations

* Overall, a compiler is that simple
* Language design affects compiler implementation
  * Can be easier or harder to compile
  * Many trade-offs in language design
* Error handling trade-offs
* Modern compilers emphasize optimization

---

# Summary

We'll look at each stage in detail, and implement 4 of them.

1. Lexical Analysis
2. Parsing
3. Semantic Analysis
4. Optimization
5. Code Generation
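
---

# Lexical Analysis: A Sketch

To make the first stage in the list above concrete, here is a minimal tokenizer sketch in Python for the `if x == y then z = 1; else z = 2;` example from the lexical analysis slides. The token categories (`KEYWORD`, `IDENT`, `OP`, ...) and the `tokenize` function are assumptions made for illustration; they are not the token set of DL or the design the project requires.

```python
import re

# Hypothetical token categories for this sketch only; the DL language in
# the textbook defines its own tokens.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("KEYWORD", r"\b(?:if|then|else)\b"),  # tried before IDENT so keywords win
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"==|="),                  # "==" listed first so it is not split
    ("SEMI",    r";"),
    ("SKIP",    r"\s+"),                   # whitespace separates tokens
    ("ERROR",   r"."),                     # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) pairs for the given source string."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        tokens.append((kind, text))
    return tokens


tokens = tokenize("if x == y then z = 1; else z = 2;")
for kind, text in tokens:
    print(kind, text)
```

The lexer only recognizes the words. Deciding whether they form a valid `if` statement is left to the parser, the next stage.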