Lecture 18

Theory and Design of PL (CS 538)

March 30, 2020

News

HW5 Out: Start Early

  • Due in 2.5 weeks: April 17 (FRIDAY)
  • WR5 Part 1: Short answers (why code is rejected)
    • Do these first
  • HW5: implement key-value Map based on BST

This assignment is big, with lots of compiler errors.

HW5 Out: Tips

  • Read the README carefully…
  • Try using recursion
    • Will avoid the borrow checker a bit
    • For more compiler errors, use loops (optional)
  • Most of the functions are one-liners
  • Get the first iterator (consuming) right
    • Other two iterators are nearly copy-paste

HW4: Feedback?

Mixing moving and borrowing

Option::take()

impl Option<T> {
    pub fn take(&mut self) -> Option<T> { ... }
}
  • Remember: &mut self is ref. to Option<T>
  • What does this function do?
    1. Get what self is pointing at (take ownership!)
    2. Write None to self

How does ownership change?

  • Before and after take:
    • Before: caller doesn’t own, someone else owns Some(...)
    • After: caller owns Some(...), someone else owns None.
  • Note: ownership transfers, but data is never copied!
  • Also see std::mem::replace, std::mem::swap

Revisiting

let my_str = String::from("Hello world!");
let maybe_str = Some(my_str);
    
match maybe_str {
  None => println!("Nothing!"),
  Some(s) => println!("Something!"),  // String *moved* into s
                                      // s dropped here
}
  
println!("Still there? {}", maybe_str.is_none());  // Not OK!
  • Even maybe_str is dropped: inner s is gone!

Take inner, leave wrapper

  • What happens if we take the maybe_str instead?
let mut maybe_str = Some(String::from("Hello world!"));
let mut_str_ref = &mut maybe_str;   // type: &mut Option<String>

let took_str = mut_str_ref.take();  // type: Option<String>
                                    // maybe_str is now None

match took_str {
  None => println!("Nothing here!"),
  Some(s) => ... s owns String ...,
}

println!("Still there? {}", maybe_str.is_none());  // Now OK

Generics and Polymorphism

Type with parameters

  • Just like in Haskell
    • Types: [a], Maybe a, …
  • Similar idea in Rust
    • Types: Option<T>, …

Generic types

  • Put type variables in angle brackets
struct MyPair<T, U> {
  first: T,
  second: U,
}

enum MySum<T, U> {
  Left(T),
  Right(U),
}

Generic functions

  • Like polymorphic functions in Haskell
fn swap_pair<T, U>(input: MyPair<T, U>) -> MyPair<U, T> {
  MyPair { first: input.second, second: input.first }
}

fn swap_sum<T, U>(input: MySum<T, U>) -> MySum<U, T> {
  match input {
    Left(val)  => MySum::Right(val),
    Right(val) => MySum::Left(val),
  }
}

Generic methods

  • Can put type parameters on impl blocks
    • Don’t need to annotate type params inside
impl<T, U> MyPair<T, U> {
  fn pair_fn_t(self, t: T) { ... }

  fn pair_fn_u(self, u: U) { ... }

  fn pair_fn(self, pair: MyPair<T, U>) { ... }
}

Rust details

  • Generic functions are specialized at compile time
    • Change foo<T>(t: T) to foo_i32(t: i32)
    • No extra runtime cost for using generics
    • Polymorphic to monomorphic (monomorphization)
  • Sizes of type params must be known at compile time

Aliasing

The golden rules

  • Aliasing: two references to same memory
  • In any scope, there can be either:
    1. Any number of immutable references
    2. At most one mutable reference
  • … referring to the same data

One or the other: not both!

Why aliasing matters

  • Aliasing makes optimizations harder
    • Makes it harder to cache, reorder code, …
  • Aliasing and mutation are dangerous together
    • Very common source of memory errors

Is this optimization OK?

fn compute(input: &u32, output: &mut u32) {
  if *input > 10 { *output = 1; }  // lookup input
  if *input > 5 { *output *= 2; }  // lookup again
}

fn compute_opt(input: &u32, output: &mut u32) {
  let cached_input = *input;  // cache *input
  if cached_input > 10 {
    *output = 2;
  } else if cached_input > 5 {
    *output *= 2;
  }
}
  • Not OK if input and output point to same thing
  • In Rust: OK since input and output can’t alias

Aliasing and mutation: danger!

  • Rules are crucial to ensure memory safety
let mut data = vec![1, 2, 3];
let fst_ref = &data[0];

data.clear();  // rejected by Rust: breaks ref rules!
println!("{}", fst_ref);  // what is this pointing at now???

Lifetimes

Don’t focus on details

  • Rust rejects lots of valid programs
  • Analysis is getting better/more sophisticated
    • Rules for lifetimes are changing/evolving
  • Think of this as a sketch about how Rust checks

High-level: how Rust analyzes aliasing

Back to the bad example

let mut data = vec![1, 2, 3];
let fst_ref = &data[0];

data.clear();  // rejected by Rust: breaks ref rules!
println!("{}", fst_ref);  // what is this pointing at now???

How does Rust know?

  • In Rust, each reference has a lifetime
  • Borrow-checker reasons about facts like:
    • Whenever Ref 1 is valid, Ref 2 is valid too
    • “Ref 2 lives longer than Ref 1”

Lifetimes: scope names

  • Think: name for a scope/block in program
  • Static lifetime 'static is global scope (biggest)
  • Scope variables 'a refer to some scope
    • Can’t write concrete lifetimes besides 'static

Lifetimes are nested

  • Think: scopes are nested too
  • Write: 'b:'a for 'b contains 'a
    • That is: 'b lives longer than 'a
  • Example: 'static:'a, global scope is longest

Example

{                            //               < 'a1
  let foo = 1;               //               |
  {                          //         < 'a2 |
    let bar = 2;             //         |     |
    {                        //  < 'a3  |     |
      let baz = 3;           //  |      |     |
    }                        //  <      |     |
  }                          //         <     |
}                            //               <
  • Lifetimes are nested: 'a1:'a2 and 'a2:'a3

References have lifetimes

  • Describes how long reference is valid for
  • Lifetimes appear in ref types (and a few other places)
&'a String      // Ref living 'a to String living 'a
&'b mut String  // Mutable ref living 'b to String

Lifetime Examples

let x = 0;
let y = &x;
let z = &y;

Lifetime Examples

let mut data = vec![1, 2, 3];
let fst_ref = &data[0];

data.clear();  // rejected by Rust: breaks ref rules!
println!("{}", fst_ref);  // what is this pointing at now???

Lifetimes evolve

  • “Rust 2015”: what we just saw
    • Lifetimes are scopes (lexical lifetimes)
    • Rejects many safe programs
  • “Rust 2018”: lifetimes are sets of references
    • Also known as non-lexical lifetimes (NLL)
    • Gory details/examples in RFC proposal

Annotating lifetimes

Usually: no need to worry

  • Lifetimes inferred automatically 99.9% of the time
  • Certain kinds of code need annotations
    • Structs storing references
    • Functions returning references

Functions and lifetimes

  • Typical use case
    • Function takes references as arguments
    • Function returns reference
  • Need to describe how long returned reference lives
    • Usually: depends on lifetimes of arguments

Example: lives forever

static NAME: &'static str = "Steve";

// Omitting lifetimes
fn foo (arg: &String) -> &String { NAME }

// Annotating lifetimes
fn annot_foo<'a> (arg: &'a String) -> &'static String { NAME }

// Return ref doesn't depend on input, lives forever
  • Function must work for all choices of 'a
    • Just like all generic functions in Rust

Example: lifetime of inputs

// Omitting lifetimes
fn plus_foo (arg: &mut String) -> &mut String {
  arg.push_str(" and foo");
  arg
}

// Annotating lifetimes
fn annot_plus_foo<'a> (arg: &'a mut String) -> &'a mut String {
  arg.push_str(" and foo");
  arg
}
  • Return ref lives (at least) as long as input arg

Dangling references

  • This function is broken: it creates a dangling pointer
fn bad_foo () -> &String {
  let too_short = String::from("too short");

  &too_short
} // too_short goes out of scope, is dropped here
  • Returns a reference, but too_short is dropped
    • Returned reference points to nothing!

Prevented in Rust

  • Compiler complains: can’t infer lifetimes
  • What if we try to fill in some lifetimes?
fn bad_foo<'a> () -> &'a String {
  let too_short = String::from("too short");

  &too_short
} // too_short goes out of scope, is dropped here
  • Compiler rejects: returned reference doesn’t live (at least) as long as 'a for all possible lifetimes 'a
    • Would work if ref had 'static lifetime

Compiler may need help

  • The following simple function does not compile
fn longest(x: &String, y: &String) -> &String {
  if x.len() > y.len() {
    x
  } else {
    y
  }
}
  • Compiler not sure how long the returned string lives

Add annotations

  • Help the compiler by supplying lifetimes
fn longest<'a> (x: &'a String, y: &'a String) -> &'a String {
  if x.len() > y.len() {
    x
  } else {
    y
  }
}
  • Read: if x and y live at least as long as 'a, then returned string also lives at least as long as 'a

Rust traits

Think: typeclasses

  • Defining a new trait
    • List methods required to implement trait
    • Can put default implementations
trait Summary {
  fn summarize_author(&self) -> String;

  fn summarize(&self) -> String {
    format!("(Read more from {}...)", self.summarize_author())
  }
}

Implementing a trait

  • Provide missing implementations (or use defaults)
// Our type
struct NewsArticle {
  author: String,
  content: String,
}

// Implementing the trait
impl Summary for NewsArticle {
  fn summarize_author(&self) -> String {
    format!("{}", self.author)
  }

  // leave summarize as default
}

Requiring a trait

  • Function may require parameters implement traits
  • Put requirements with type parameters
    • Can require several traits with “+”
    • Called “trait bounds”
fn cmp_auth<T: Summary + Ord>(x: &T, y: &T) {
  // can use Summary trait
  let auth_x = x.summarize_author();

  // can use Cmp trait
  let cmp_two = x.cmp(y);

  ...
}

Requiring a trait

  • Often cleaner to separate out trait bounds
fn cmp_auth<T>(x: &T, y: &T)
where
  T: Summary + Ord,
  // can list other bounds here
{
  // can use Summary trait
  let auth_x = x.summarize_author();

  // can use Cmp trait
  let cmp_two = x.cmp(y);

  ...
}

Traits: Examples

Ord

  • Ordering is an enum: Less, Equal, or Greater
    • Requires PartialOrd and Eq
  • Self (in caps) is the type with this trait
trait Ord: Eq + PartialOrd {
  fn cmp(&self, other: &Self) -> Ordering;
  // Example: match x.cmp(&y) { ... }

  fn max(self, other: Self) -> Self { ... }
  fn min(self, other: Self) -> Self { ... }
}

Clone

  • Types with ability to do deep copy
  • May be expensive, always explicitly stated
trait Clone {
  fn clone(&self) -> Self;

  // Example: let dolly_two = dolly.clone();
}

// Can also be auto-derived if members are Clone
#[derive(Clone)]
struct Person {
    name: String,
    age: u32,
}

Drop

  • Add custom behavior when type is dropped
    • Note: memory is freed no matter what
  • Implemented by default, usually no need
trait Drop {
  fn drop(&mut self);
}

impl Drop for Person {
  fn drop(&mut self) {
    println!("Don't drop me!!!");
  }
}

From/Into

  • Conversions from a type, and into a type
  • Again, conversions always explicit
trait From<T> {
  fn from(other: T) -> Self;
  // Can convert from T's to this type
}

trait Into<T> {
  fn into(self) -> T;
  // Can convert from this type to T's
}

Many, many more

  • Rust makes very liberal use of traits
  • Many syntax features hook into traits
    • For loops: IntoIterator
    • Square-brackets: Index/IndexMut
    • Dereference: Deref/DerefMut
    • Operator overloading (+/-/*): Add/Sub/Mult

Missing anything?

Inductive datatypes?

  • Can do, but not so easy
    • Types must have statically known size on stack
  • Size of inductive datatypes not known statically
  • First type definition is rejected:
enum MyList<T> {
  Nil,
  Cons(T, MyList<T>),  // know size of T, but not MyList<T>
}

enum MyListOk<T> {
  Nil,
  Cons(T, Box<MyListOk<T>>),  // Box: put inner list on heap
}

Function types?

  • No plain arrow types
    • Size of functions is not statically known
    • Can’t place data on the stack
  • Can model various function types using traits (later)

Is that really polymorphism?

  • Type variables only types with statically-known size
    • Usually needed to specialize generics
  • Can override this behavior:
// Sized trait: T has size known at compile time
// negative annotation `?Sized`: T *doesn't* need to be Sized
fn foo<T: ?Sized>(t: &T) { ... }
  • Usually: when working with references to generics
    • Size not important if we don’t need to move data

Are those really typeclasses?

  • A few differences compared to Haskell
  • Operations always take instance as first argument
    • Can’t do stuff like Read typeclass:
class Read a where
  read :: String -> a