class: center, middle

# Intermediate Code Generation

_CMPU 331 - Compilers_

---

# Recap: Stages of Compilation

1. Lexical Analysis
2. Parsing
3. Semantic Analysis
4. Optimization
5. Code Generation

![multistage pipeline](/~cs331/images/lectures/multistage.png)

(source: _Language Implementation Patterns_ by Terence Parr)

---

# LLVM Project

* A collection of (C++) libraries and tools to help in building compilers, debuggers, program analysers, etc.
* Already installed on lab workstations
* Can be installed on your own computer; see [llvm.org](https://releases.llvm.org/download.html)
    * On Ubuntu, install the package `llvm-runtime`
    * On macOS, it might already be installed with Xcode

---

# LLVM Project

**History**

* Started as an academic project at the University of Illinois in 2000
* Now a large open source project with many contributors and a growing user base

**Related projects**

* Clang - C/C++/Objective-C compiler, an alternative to GCC
* Rust - its compiler uses LLVM for code generation
* Microsoft's Common Language Infrastructure (CLI), the platform-agnostic core of .NET
* Glasgow Haskell Compiler (GHC) offers LLVM as one of several alternative code generators

---

# Recap - Machine Code (Binary)

* Machine friendly
* Not human friendly, prone to errors (typos)
* Simple instructions, each corresponding to a low-level operation of the machine

Example of 32-bit x86 machine code (an XOR instruction):

```
00110010 00001100 00100101 00010010 00000000 00000000 00000000
```

---

# Recap - Assembly Language

* Adds symbolic names for
    * Values
    * Registers
    * Instructions
    * Storage locations
* Close to machine code, but needs an _assembler_ to translate
* Specific to a hardware architecture
* Still not human friendly, prone to errors (typos)

Example of ARM assembly language:

```
add r2, r1, #16
```

---

# Recap - High-level Languages

* Human friendly, easier to write and read
* Closer to natural human languages (usually English)
* Portable
* Many operators/keywords
* Needs a _compiler_ to translate
    * Can check for human errors during compilation
    * Can optimize during compilation

Example in C:

```
int expr(int n) {
    int d;
    d = 4 * n * n * (n + 1) * (n + 1);
    return d;
}
```

---

# Intermediate Languages

* Like a higher-level assembly language
* Portable, compiled to different hardware architectures
* Good form for optimizations
* Close to the machine, but
    * Retains enough higher-level information to be useful
* Uses registers, but an unlimited number of them
* Uses assembly-style control flow (labels and branches)
* Uses opcodes like assembly
    * Most correspond directly to assembly opcodes
    * Some are higher level

---

# LLVM's Intermediate Language

* Three-address code, with one destination register and two source operands:

```
%x = add i32 %y, %z
```

* A source can also be a constant value:

```
%x = mul i32 %y, 8
```

* Instructions are typed (`i32` is a 32-bit integer, `double` a 64-bit floating-point number):

```
%x = fadd double %y, %z
```

* A new register for each result (Static Single Assignment, or SSA, form)

---

# Example

A simple calculation expression with two operations:

```
(1 + 5) * 10
```

is translated into two separate lines of IR:

```
%tmp = add i32 1, 5
%result = mul i32 %tmp, 10
```

The `%tmp` identifier is a temporary register:

* It stores the result of the `add` operation ("destination")
* It is an argument to the `mul` operation ("source")
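
---

# Sketch: Emitting Three-Address Code

A minimal C sketch of how a code generator might produce IR like the example above: it walks an expression tree in post-order and emits one LLVM-style instruction per operation, giving every result a fresh register as SSA form requires. The `Expr`, `emit`, and `gen` names and the `%t0`, `%t1`, ... register naming are hypothetical, only `i32` operations are handled, and the IR is written as plain text into a fixed buffer. Compilers built on LLVM would normally construct IR through LLVM's own libraries (for example its `IRBuilder` class) instead.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST for this sketch: integer constants, named
   parameters, and binary operations. */
typedef enum { EXPR_CONST, EXPR_VAR, EXPR_BINOP } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;                     /* EXPR_CONST: literal value */
    const char *name;              /* EXPR_VAR: parameter name, e.g. "n" */
    const char *op;                /* EXPR_BINOP: LLVM opcode, e.g. "add" */
    const struct Expr *lhs, *rhs;  /* EXPR_BINOP: operands */
} Expr;

static char ir[4096];      /* IR text produced so far */
static size_t ir_len = 0;  /* bytes used in ir */
static int next_temp = 0;  /* counter for fresh register names */

/* Append one three-address instruction: dst = op i32 a, b */
static void emit(const char *dst, const char *op, const char *a, const char *b) {
    size_t room = sizeof ir - ir_len;
    int n = snprintf(ir + ir_len, room, "%s = %s i32 %s, %s\n", dst, op, a, b);
    if (n < 0 || (size_t)n >= room) {  /* fixed-size buffer: fail loudly */
        fprintf(stderr, "IR buffer full\n");
        abort();
    }
    ir_len += (size_t)n;
}

/* Generate IR for e (operands first, then the operation) and write the
   operand that holds its value into out: a constant such as 5, a
   parameter such as %n, or a fresh temporary such as %t0. */
static void gen(const Expr *e, char *out, size_t outlen) {
    char a[32], b[32];
    assert(e != NULL);
    switch (e->kind) {
    case EXPR_CONST:
        snprintf(out, outlen, "%d", e->value);
        break;
    case EXPR_VAR:
        snprintf(out, outlen, "%%%s", e->name);
        break;
    case EXPR_BINOP:
        gen(e->lhs, a, sizeof a);
        gen(e->rhs, b, sizeof b);
        /* A new register for every result, so each is assigned once. */
        snprintf(out, outlen, "%%t%d", next_temp++);
        emit(out, e->op, a, b);
        break;
    }
}
```

For `(1 + 5) * 10`, calling `gen` on the `mul` node first emits `%t0 = add i32 1, 5` and then `%t1 = mul i32 %t0, 10`, mirroring the `%tmp`/`%result` pair in the example. Generating both operands before the operation guarantees that every source register is defined before it is used.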