class: center, middle

# Rust and Type Safety

_CMPU 331 - Compilers_

---

# Undefined Behavior

> _...behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements..._
>
> — C programming language standard (version ISO/IEC 9899:1999)

Undefined behavior doesn't just produce an unspecified result: it is allowed to cause the program to do _anything at all_.

---

# Undefined Behavior

This program tries to set the value of an array element, using an index past the end of the array. What should the program do?

```c
int main(int argc, char **argv) {
    unsigned long a[1];
    a[3] = 0x7ffff7b36cebUL;
    return 0;
}
```

* What it actually does here is set the return address of the `main` function to `0x7ffff7b36cebUL`, and then continue executing whatever it finds at that memory address.
* Whoops!
* The standard says this is allowed, because the behavior is "undefined". But this kind of behavior is the cause of many dangerous security flaws in C code.

---

# Terminology

* A _well defined_ program is one that is written so that no execution can lead to undefined behavior.
* A language is _type safe_ if it has type checking sufficient to ensure that every program is well defined.

_(Note: a language can be type safe by doing type checking statically at compile time, dynamically at runtime, or both.)_

---

# Type Safety

A similar attempt to assign to an index past the end of a list in Python raises an exception:

```python
>>> a = [0]
>>> a[3] = 0x7ffff7b36ceb
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range
>>>
```

* Raising an exception is defined behavior. The exception will either be caught and handled, or the program will terminate. The program will not go off and set, access, or execute random regions of memory.
* Many high-level languages are type safe: Java, JavaScript, Ruby, Haskell, etc.
---

# Type Safety

* Type safety isn't magic: it doesn't solve all security problems.
* But it does help eliminate a class of very common security flaws related to memory access. Specifically:
  * _spatial errors_: for example, out-of-bounds array index, out-of-bounds pointer
  * _temporal errors_: for example, use-after-free, use of uninitialized memory

_(Note: see "[SoK: Eternal War in Memory](https://people.eecs.berkeley.edu/~dawnsong/papers/Oakland13-SoK-CR.pdf)" by László Szekeres, Mathias Payer, Tao Wei, and Dawn Song)_

---

# Rust

* A modern systems language, designed to work well for the same purposes as C.
* Can use C libraries directly, and Rust code can also be used directly as a library from C.
* Type safe, and also takes type checking one step further...

---

# Rust and Type Safety

* Rust has familiar simple types for integers, floating-point numbers, booleans, and characters.
* The `String` type is a growable sequence of UTF-8 encoded characters (not an array of `char`).
* The size of an array and the type of its elements are fixed at compile time.
* A vector is similar to an array, but can be resized dynamically.

---

# Rust and Type Safety

By default in Rust, variables are immutable, so the elements of an array cannot be modified:

```rust
let letters = ["a", "b", "c", "d"];
letters[0] = "q";
```

Gives the error (exact wording varies by compiler version):

```
cannot mutably borrow field of immutable binding
```

The array has to be declared as `mut`:

```rust
let mut letters = ["a", "b", "c", "d"];
letters[0] = "q";
```

---

# Rust and Type Safety

If we try to set an out-of-bounds array index:

```rust
let mut letters = ["a", "b", "c", "d"];
letters[4] = "q";
```

Rust gives the error:

```
index out of bounds: the len is 4 but the index is 4
```

---

# Lifetimes

The _lifetime_ of a variable is the span of code where that variable is actively usable.

* Starting when the variable is declared and defined.
* Ending when the variable is destroyed or goes out of scope.
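
The two endpoints above can be observed directly. The following is a minimal sketch, not part of the original slides: `Tracked` and `trace_scopes` are hypothetical names introduced here, and the sketch assumes that recording a message in `Drop::drop` is an acceptable way to mark the moment a variable is destroyed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical helper type: writes to a shared log when it is destroyed.
struct Tracked {
    name: String,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// Runs two nested blocks and returns the events in the order they happened.
fn trace_scopes() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Tracked { name: "outer".to_string(), log: Rc::clone(&log) };
        {
            let _inner = Tracked { name: "inner".to_string(), log: Rc::clone(&log) };
            log.borrow_mut().push("end of inner block".to_string());
        } // `_inner` goes out of scope and is destroyed here
        log.borrow_mut().push("end of outer block".to_string());
    } // `_outer` goes out of scope and is destroyed here
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in trace_scopes() {
        println!("{}", event);
    }
}
```

Each value is destroyed exactly at the closing brace of the block that declared it, so the inner variable's lifetime ends before the outer one's.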

We can compare two different lifetimes in terms of which _outlives_ the other:

```
'a: I----------------I
'b:     I---------I
```

---

# Lifetimes

Outliving is a transitive relation: if `'a` outlives `'b`, and `'b` outlives `'c`, then `'a` also outlives `'c`:

```
'a: I--------------------I
'b:   I----------------I
'c:      I---------I
```

But not all lifetimes can be compared by _outlives_:

```
'a: I---------I
'b:      I------------I
```

Rust defines one maximal lifetime named `'static` that outlives all the others.

---

# Rust and Lifetimes

The _lifetime_ of a variable is the span of code where that variable is actively usable.

* Starting when the variable is declared and defined.
* Ending when the variable is destroyed or goes out of scope.

```rust
let x = 4;
{
    let y = 7;
}
let z = y;
```

Compile-time error:

```
cannot find value `y` in this scope
```

---

# Rust and Lifetimes

Rust also requires references (pointers) to respect lifetimes:

```rust
let r;
{
    let x = 4;
    r = &x;
}
let y = r;
```

Compile-time error:

```
5  |         r = &x;
   |              ^ borrowed value does not live long enough
6  |     }
   |     - `x` dropped here while still borrowed
...
12 | }
   | - borrowed value needs to live until here
```

---

# Lifetimes as Types

How does Rust do this? With type checking, of course.

* If U is a parent type, and T is a subtype of U, then we can always store a T-typed value in a U-typed variable, but not vice versa.
* If T is a type with lifetime `'a` and U is a type with lifetime `'b`, and `'a` outlives `'b`, then T is a subtype of U.
* In the example below, `&x` has the shorter lifetime `'b`, so its type is not a subtype of the type of `r`, and the assignment is rejected.

```rust
let r;             // reference to an integer, with lifetime 'a
{
    let x = 4;     // integer with the shorter lifetime 'b
    r = &x;        // &x only has lifetime 'b
}
let y = r;         // not allowed because 'a outlives 'b
```
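
The subtyping rule also explains what _is_ allowed. The following is a minimal sketch, not taken from the original slides: `longer` and `demo` are hypothetical names, and the example assumes string slices as the referenced type. A reference with the long lifetime `'static` is passed where a shorter lifetime is expected, which the compiler accepts because `'static` outlives every other lifetime.

```rust
// Hypothetical helper: both arguments and the result share one lifetime 'a.
fn longer<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

// Hypothetical demo: mixes a 'static reference with a short-lived one.
fn demo() -> usize {
    let forever: &'static str = "lives for the whole program";
    let local = String::from("short");
    // `forever` has type `&'static str`. Because 'static outlives the scope
    // of `local`, it is accepted where the shorter lifetime is expected.
    let picked = longer(forever, local.as_str());
    picked.len()
}

fn main() {
    println!("{}", demo());
}
```

Going the other way, returning `picked` itself from `demo` as a `&'static str` would be rejected, since its lifetime is limited by `local`.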