class: center, middle

# Machine Code Generation

_CMPU 331 - Compilers_

---

# Recap: Stages of Compilation

1. Lexical Analysis
2. Parsing
3. Semantic Analysis
4. Optimization
5. Code Generation

![multistage pipeline](/~cs331/images/lectures/multistage.png)

(source: _Language Implementation Patterns_ by Terence Parr)

---

# Machine Code Generation

* Generating machine code is similar to generating intermediate code such as LLVM IR, except:
  * The instructions are specific to the target hardware architecture (x86, ARM, RISC-V, etc.)
  * The compiler has to manage more low-level hardware details, such as memory management and register allocation

---

# Run-time Environments

* Execution of a program is initially under the control of the operating system
* When a program is invoked:
  * The OS allocates space in memory for the program
  * The code is loaded into part of that space
  * The OS jumps to the entry point (that is, the "main" function, or equivalent)

---

# Memory Layout

![memory layout](/~cs331/images/lectures/memory_layout.png)

---

# Memory Layout

By tradition, pictures of machine organization have:

* Low addresses at the top
* High addresses at the bottom
* Lines delimiting areas for different kinds of data

These pictures are simplifications; for example, not all memory is necessarily contiguous.

---

# What is Other Space?

* Holds all data for the program
* Other Space == Data Space
* The compiler is responsible for:
  * Generating code
  * Managing use of the data area

---

# Assumptions about Execution

1. Execution is sequential; control moves from one point in a program to another in a well-defined order
2. When a function/method is called, control eventually returns to the point immediately after the call

---

# Lifetimes of Variables

* The _lifetime_ of a variable `x` is the portion of execution in which `x` is defined and used
* Lifetime is a dynamic (run-time) concept
* Scope is a static (compile-time) concept

---

# Activations

* A call to function/method `y` is an _activation_ of `y`
* The _lifetime_ of an activation of `y` is:
  * All the steps to execute `y`
  * Including all the steps in any functions/methods that `y` calls

---

# Activation Trees

* Control returns to the point immediately after a function/method call
* This requires that when `y` calls `z`, `z` returns before `y` does
* Activation lifetimes are nested
* They can be represented as a tree
* They can be implemented as a stack
* The activation tree depends on run-time behavior and may differ each time the program runs

---

# Example

```
g(); { return 1 }
f(); { return g() }
/* main */ { g(); return f() }
```

![activation record](/~cs331/images/lectures/activation_record_first.png)

---

# Example

```
g(); { return 1 }
f(); { return g() }
/* main */ { g(); return f() }
```

![activation record](/~cs331/images/lectures/activation_record_second.png)

---

# Example

```
g(); { return 1 }
f(); { return g() }
/* main */ { g(); return f() }
```

![activation record](/~cs331/images/lectures/activation_record_third.png)

---

# Example

```
g(); { return 1 }
f(); { return g() }
/* main */ { g(); return f() }
```

![activation record](/~cs331/images/lectures/activation_record_final.png)

---

# Revised Memory Layout

![memory layout](/~cs331/images/lectures/memory_layout_stack.png)

---

# Activation Records

* The information needed to manage one function/method activation is called an _activation record_ (AR) or _frame_
* Contents of a typical AR for `g`:
  * Space for `g`'s return value
  * Parameters passed to `g`
  * Pointer to the previous (caller's) AR, also called the _control link_
  * A _return address_, where to resume
    execution after returning from `g`
  * Machine status before calling `g` (contents of registers, program counter, local variables)
  * Other temporary values

---

# Activation Records

* If function `f` calls `g`, then `g`'s activation record contains some information about `f` as well as information about `g`
* Since `f` is "suspended" until `g` completes, `g`'s AR contains information needed to resume execution of `f`
* The AR for `g` may also contain:
  * `g`'s return value (needed by `f`)
  * Actual parameters to `g` (supplied by `f`)
  * Space for `g`'s local variables

---

# Example

```
f(x); { if (x==0) { return 1 } else { return f(x - 1) } }
/* main */ { return f(3) }
```

![activation record](/~cs331/images/lectures/activation_record_f.png)

---

# Example

```
f(x); { if (x==0) { return 1 } else { return f(x - 1) /*!*/ } }
/* main */ { return f(3) /*@*/ }
```

![activation record](/~cs331/images/lectures/activation_record_f_extended.png)

---

# Example

* The `main` function has no arguments or local variables, and its result is never used, so its AR is uninteresting
* The comments `/*!*/` and `/*@*/` represent the return addresses for the two different calls to `f`
* This is only one of many possible AR designs
  * It would work for C, Pascal, FORTRAN, and other similar languages

---

# Discussion

* The advantage of placing the return value first in the frame is that the caller can find it at a fixed offset from its own frame
* There is nothing magic about this AR layout
  * Can rearrange the order of frame elements
  * Can divide caller/callee responsibilities differently
* An AR layout is "better" if it improves execution speed or simplifies code generation
* Real compilers hold as much of the frame as possible in registers
  * Especially function arguments and return values

---

# Activation Records

* The compiler must determine the layout of activation records at compile time and generate code that correctly accesses locations in the activation record
* So, the AR layout and the code generator must
  be designed together

---

# Globals

* All references to a global variable point to the same object, so a global can't be stored in an activation record
* Globals are assigned a fixed address once; they are "statically allocated"
* Depending on the language, there may be other statically allocated values (like global constants)

---

# Memory Layout with Static Data

![memory layout](/~cs331/images/lectures/memory_layout_static.png)

---

# Heap Storage

* A value that outlives the function/method that creates it cannot be kept in the AR

```
method foo() { return new Bar }
```

> (The `Bar` object must survive deallocation of `foo`'s AR)

* Languages with dynamically allocated data use a _heap_ to store dynamic data
* Both the heap and the stack grow, so we must take care that they don't grow into each other
* Solution: start the heap and the stack at opposite ends of memory and let them grow towards each other

---

# Memory Layout with Heap

![memory layout](/~cs331/images/lectures/memory_layout_heap.png)

---

# Memory Layout

* The code area contains object code
  * For most languages, fixed size and read-only
* The static area contains data (not code) with fixed addresses (global data)
  * Fixed size, may be read-only or writable
* The stack contains an AR for each currently active function/method
  * Each AR is usually fixed size and contains local variables
* The heap contains all other data
  * In C, the heap is managed with `malloc` and `free`
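
---

# Sketch: Heap and Stack Growing Toward Each Other

As a rough illustration of "start the heap and the stack at opposite ends of memory," here is a toy Python model, a sketch under our own assumptions and not how any real runtime works. It assumes the heap begins right after the static area and grows toward higher addresses, while the stack begins at the top of memory and grows toward lower addresses (real platforms and the pictures in these slides may arrange this differently). `ToyMemory`, `push_frame`, `pop_frame`, and `heap_alloc` are hypothetical names, and `heap_alloc` is a simple bump allocator that never frees.

```python
class ToyMemory:
    """Toy model of a program's memory: code, static data, heap, stack.

    Assumption (hypothetical layout): the heap begins right after the static
    area and grows toward higher addresses; the stack begins at the top of
    memory and grows toward lower addresses. Addresses are plain integers.
    """

    def __init__(self, size, code_size, static_size):
        if code_size + static_size > size:
            raise ValueError("code and static areas do not fit in memory")
        self.size = size
        self.code = (0, code_size)                          # [start, end), read-only
        self.static = (code_size, code_size + static_size)  # fixed (static) addresses
        self.heap_top = code_size + static_size             # next free heap address
        self.stack_pointer = size                           # empty stack: SP at the top

    def push_frame(self, frame_size):
        """Allocate an activation record; return its lowest address."""
        new_sp = self.stack_pointer - frame_size
        if new_sp < self.heap_top:
            raise MemoryError("stack would grow into the heap")
        self.stack_pointer = new_sp
        return new_sp

    def pop_frame(self, frame_size):
        """Deallocate the most recently pushed activation record."""
        if self.stack_pointer + frame_size > self.size:
            raise ValueError("popping more than was pushed")
        self.stack_pointer += frame_size

    def heap_alloc(self, nbytes):
        """Bump-allocate heap space; this toy never frees heap memory."""
        new_top = self.heap_top + nbytes
        if new_top > self.stack_pointer:
            raise MemoryError("heap would grow into the stack")
        addr = self.heap_top
        self.heap_top = new_top
        return addr


mem = ToyMemory(size=1000, code_size=200, static_size=100)
bar = mem.heap_alloc(50)    # like the `Bar` object created inside `foo`
frame = mem.push_frame(80)  # AR for a call
mem.pop_frame(80)           # the AR is gone; the heap object at `bar` remains
print(bar, frame, mem.stack_pointer, mem.heap_top)  # 300 920 1000 350
```

Popping the frame releases the AR, but the heap block at address 300 stays allocated, which is exactly why a value like `Bar` must live in the heap rather than in `foo`'s AR. Any request that would make the two regions overlap raises `MemoryError`.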
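
---

# Sketch: Activation Records for `f(3)`

The stack area holds one AR per active call. As a toy model of that idea, applied to the recursive `f` example from earlier in the deck, the Python sketch below pushes one activation record per call onto an explicit stack. It is our own simplification, not a real calling convention: the fields mirror the AR contents listed earlier (return value, actual parameter, control link, return address), the names `ActivationRecord` and `run_f` are hypothetical, and the "return addresses" are just the labels `/*@*/` and `/*!*/` from the example rather than machine addresses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActivationRecord:
    name: str
    argument: Optional[int]                     # actual parameter from the caller
    control_link: Optional["ActivationRecord"]  # the caller's AR
    return_address: str                         # label where the caller resumes
    result: Optional[int] = None                # return value, set by the callee


def run_f(n):
    """Evaluate f(n) from the slides, keeping an explicit stack of ARs.

    Returns f(n) and a snapshot of the stack at its deepest point.
    """
    stack: List[ActivationRecord] = [ActivationRecord("main", None, None, "<exit>")]
    deepest: List[ActivationRecord] = []

    def f(x, return_address):
        ar = ActivationRecord("f", x, stack[-1], return_address)
        stack.append(ar)                   # push an AR on call
        if len(stack) > len(deepest):
            deepest[:] = list(stack)       # remember the deepest stack seen
        if x == 0:
            ar.result = 1
        else:
            ar.result = f(x - 1, "/*!*/")  # recursive call returns to /*!*/
        stack.pop()                        # pop the AR on return
        return ar.result

    return f(n, "/*@*/"), deepest          # main's call returns to /*@*/


result, frames = run_f(3)
for ar in frames:
    caller = ar.control_link.name if ar.control_link else None
    print(ar.name, ar.argument, ar.return_address, "caller:", caller)
```

At the deepest point the stack holds `main` plus four `f` ARs (for arguments 3, 2, 1, 0). Each `f` AR's control link points to the AR just below it, only the outermost call returns to `/*@*/`, and every recursive call returns to `/*!*/`.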