Lecture 13

Theory and Design of PL (CS 538)

March 4, 2020

Property-based Testing

Unit testing is boring

  • Most common kind of test today
  • Idea: write test cases one by one
    • Write down one input (and maybe external state)
    • Write down expected output
    • Check if input produces expected output
  • Build up a lot of hand-crafted tests
    • Write new tests when bugs are found
    • Keep tests up to date

Idea: test properties

  • Idea: write down properties of programs
  • Properties hold for class of inputs, not just one input
  • Don’t need to write tests one-by-one

Randomly generate test cases!

Examples

  • Applying twice same as applying once (idempotence)
  • One function “undoes” another function (inverse)
  • Optimized implementation mirrors simple version
  • Relationships between insert, delete, lookup, etc.
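
The first three kinds of property above might look like this in QuickCheck. This is only a sketch: the property names, and the choice of sort, reverse, and a hand-written sum as the functions under test, are illustrative assumptions rather than examples taken from the lecture.

```haskell
import Test.QuickCheck
import Data.List (sort)

-- Idempotence: sorting twice is the same as sorting once.
prop_sortIdempotent :: [Int] -> Bool
prop_sortIdempotent xs = sort (sort xs) == sort xs

-- Inverse: reverse undoes itself.
prop_reverseInverse :: [Int] -> Bool
prop_reverseInverse xs = reverse (reverse xs) == xs

-- Optimized vs. simple: a hand-written accumulator loop (a hypothetical
-- "optimized" version) should agree with the library's sum.
fastSum :: [Int] -> Int
fastSum = go 0
  where
    go acc []     = acc
    go acc (y:ys) = acc `seq` go (acc + y) ys

prop_fastSumMatches :: [Int] -> Bool
prop_fastSumMatches xs = fastSum xs == sum xs
```

Loading this in GHCi and running quickCheck prop_sortIdempotent checks the property on 100 random lists by default.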

QuickCheck

  • Haskell library for property-based testing
    • Write random input generators with combinators
    • Write properties of functions we want to test
  • QuickCheck randomly generates inputs and tests the properties
    • “Shrinks” failing test inputs to find minimal ones
  • Implementations in 40+ other languages

Taking it for a spin

  • Install with Cabal (or Stack)
cabal v2-install --lib QuickCheck
  • Import Haskell module
import Test.QuickCheck
  • Documentation available here

QuickCheck demo

How to test a parser?

  • Parser goes from String to structured data
  • How to generate input Strings?
    • Randomly? Almost certainly won’t parse…
  • Even if parser succeeds, is the answer right?

Use the pretty-printer (HW3)

  • Parser: String to structured data
  • Printing: structured data to String
    • This direction is usually easy…

Inverse property: printing data, then parsing it back should give original data!
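
The inverse property can be sketched on a toy language. Everything here is an assumption made for illustration: the Expr type, the fully parenthesized printer, and the ReadP-based parser are stand-ins, since the HW3 types and parser are not shown in these notes.

```haskell
import Test.QuickCheck
import Text.ParserCombinators.ReadP (ReadP, char, munch1, readP_to_S, (+++))
import Data.Char (isDigit)

-- Hypothetical toy AST (not the HW3 one).
data Expr = Lit Int | Add Expr Expr
  deriving (Eq, Show)

-- Printer: always parenthesize sums so the output is unambiguous.
pretty :: Expr -> String
pretty (Lit n)   = show n
pretty (Add l r) = "(" ++ pretty l ++ "+" ++ pretty r ++ ")"

-- Parser for exactly the strings the printer emits.
exprP :: ReadP Expr
exprP = litP +++ addP
  where
    litP = Lit . read <$> munch1 isDigit
    addP = do _ <- char '('
              l <- exprP
              _ <- char '+'
              r <- exprP
              _ <- char ')'
              return (Add l r)

-- Succeed only if the whole input is consumed by a single parse.
parseExpr :: String -> Maybe Expr
parseExpr s = case [e | (e, "") <- readP_to_S exprP s] of
                [e] -> Just e
                _   -> Nothing

-- Generate structured data directly; literals are kept non-negative
-- because this toy parser does not handle a leading minus sign.
genExpr :: Int -> Gen Expr
genExpr 0 = Lit <$> choose (0, 99)
genExpr n = oneof [ Lit <$> choose (0, 99)
                  , Add <$> genExpr (n `div` 2) <*> genExpr (n `div` 2) ]

instance Arbitrary Expr where
  arbitrary = sized genExpr

-- Inverse property: printing, then parsing, gives back the original data.
prop_roundTrip :: Expr -> Bool
prop_roundTrip e = parseExpr (pretty e) == Just e
```

Generating the structured data and printing it sidesteps the problem of random strings that almost never parse.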

How is the library designed?

Gen type

  • Gen a: something that can generate random a’s
-- Build generator drawing a's from a list of a's
elements :: [a] -> Gen a

-- Select a random generator from a list
oneof :: [Gen a] -> Gen a

-- Customize distribution of generators
frequency :: [(Int, Gen a)] -> Gen a
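
A few small generators built from these three functions, as a sketch. The generator names and the value ranges are made up for illustration.

```haskell
import Test.QuickCheck

-- Pick uniformly from a fixed list of values.
genVowel :: Gen Char
genVowel = elements "aeiou"

-- Pick one of several generators uniformly:
-- either a single digit or a number just above 100.
genSmallOrLarge :: Gen Int
genSmallOrLarge = oneof [choose (0, 9), choose (100, 109)]

-- Weighted choice: True is drawn about three times as often as False.
genBiasedBool :: Gen Bool
genBiasedBool = frequency [(3, return True), (1, return False)]
```

In GHCi, sample genVowel prints a handful of generated values, which is a quick way to eyeball a distribution.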

Gen is a monad

instance Monad Gen where
  return :: a -> Gen a

  (>>=) :: Gen a -> (a -> Gen b) -> Gen b
  • Return: from val of type a, build generator that always returns val
  • Bind: draw something from first generator (of a’s) and use to select the next generator (of b’s)
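
To make the role of bind concrete, here is a sketch in which the second generator depends on the value drawn from the first. The names genLengthAndList and genAlwaysSeven are hypothetical.

```haskell
import Test.QuickCheck

-- Bind: first draw a length n, then use n to select the list generator.
genLengthAndList :: Gen (Int, [Bool])
genLengthAndList =
  choose (0, 5) >>= \n ->
  vectorOf n arbitrary >>= \xs ->
  return (n, xs)

-- Return: a generator that always yields the same value.
genAlwaysSeven :: Gen Int
genAlwaysSeven = return 7
```

The do-notation used elsewhere in these notes is shorthand for exactly this chain of binds.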

Combining generators

  • Combinators: build new generators out of old ones
-- Turn generator of a into generator of pairs of a
genPairOf :: Gen a -> Gen (a, a)
genPairOf g = do x <- g
                 y <- g
                 return (x, y)

-- Turn generator of a into generator of lists of a
vectorOf :: Int -> Gen a -> Gen [a]
vectorOf 0 _ = return []
vectorOf n g = do x  <- g
                  xs <- vectorOf (n - 1) g
                  return (x:xs)

Typeclass: Arbitrary

  • Arbitrary a: means a is “generatable”
  • Concretely: there is something of type Gen a
class Arbitrary a where
  arbitrary :: Gen a

Using Arbitrary

  • Typeclass machinery will automatically get generator
  • Compare with previous: no need to pass in Gen a
genPair :: Arbitrary a => Gen (a, a)
genPair = do x <- arbitrary  -- From typeclass constraint
             y <- arbitrary  -- Automatically inferred
             return (x, y)

vector :: Arbitrary a => Int -> Gen [a]
vector 0 = return []
vector n = do x  <- arbitrary
              xs <- vector (n - 1)
              return (x:xs)

Instances of Arbitrary

  • Library has tons of instances for base types
    • Arbitrary Bool, Arbitrary Char, Arbitrary Int, …
  • Also has instances for more complex types
instance (Arbitrary a, Arbitrary b) 
         => Arbitrary (a, b) where        -- products
instance (Arbitrary a, Arbitrary b)
         => Arbitrary (Either a b) where  -- sums
instance Arbitrary a
         => Arbitrary [a] where           -- lists

Arbitrary products

instance (Arbitrary a, Arbitrary b) => Arbitrary (a, b) where
  arbitrary = do getA <- arbitrary  -- type: Gen a
                 getB <- arbitrary  -- type: Gen b
                 return (getA, getB)

Arbitrary sums

instance (Arbitrary a, Arbitrary b) => Arbitrary (Either a b) where
  arbitrary = oneof [ do aa <- arbitrary  -- type: Gen a
                         return (Left aa)
                    , do bb <- arbitrary  -- type: Gen b
                         return (Right bb) ]
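
The same two patterns carry over to user-defined types: choose among constructors as for sums, and draw each field as for products. A sketch, using a hypothetical Shape type that is not part of the lecture:

```haskell
import Test.QuickCheck

-- Hypothetical example type: a sum of two constructors, each a product.
data Shape = Circle Int | Rect Int Int
  deriving (Eq, Show)

instance Arbitrary Shape where
  -- Sum: pick a constructor with oneof.
  -- Product: draw each field from its own generator.
  arbitrary = oneof
    [ do r <- choose (1, 10)
         return (Circle r)
    , do w <- choose (1, 10)
         h <- choose (1, 10)
         return (Rect w h) ]

-- Integer approximation of area, to keep the example small.
area :: Shape -> Int
area (Circle r) = 3 * r * r
area (Rect w h) = w * h

-- With the instance in scope, properties can take a Shape directly.
prop_positiveArea :: Shape -> Bool
prop_positiveArea s = area s > 0
```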

Testing properties

  • Combine generator of a’s and property of a’s
forAll :: (Show a, Testable prop) => Gen a -> (a -> prop) -> Property

myProp2 = forAll genX $ \x ->
            forAll genY $ \y ->
              fst (x, y) == x
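
A runnable version of this property, as a sketch. The generators genSmallInt and genLetter are hypothetical stand-ins for genX and genY, which the notes leave undefined.

```haskell
import Test.QuickCheck

-- Hypothetical concrete generators standing in for genX and genY.
genSmallInt :: Gen Int
genSmallInt = choose (0, 100)

genLetter :: Gen Char
genLetter = elements ['a' .. 'z']

-- The inner forAll returns a Property, which is itself testable,
-- so quantifiers can be nested.
prop_fstOfPair :: Property
prop_fstOfPair =
  forAll genSmallInt $ \x ->
    forAll genLetter $ \y ->
      fst (x, y) == x
```

Running quickCheck prop_fstOfPair draws a fresh x and y for each test case.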

Additional features

  • Sizes: control “size” of generated things
  • Shrinking: given a failing test case, “make it smaller”
    • Search for simplest failing test cases
    • Can customize how to shrink test cases
  • Implications: if one prop holds, then other one holds
    • “If input is valid, then function behaves correctly”
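
Each of the three features has a small entry point in the library. The sketch below uses ==> for implications, resize for sizes, and shrink for shrinking; the property and generator names are illustrative only.

```haskell
import Test.QuickCheck

-- Implication: only inputs satisfying the precondition are tested.
-- Empty lists are discarded, since maximum is undefined on them.
prop_maximumIsUpperBound :: [Int] -> Property
prop_maximumIsUpperBound xs =
  not (null xs) ==> all (<= maximum xs) xs

-- Sizes: resize fixes the size parameter, so these lists
-- have at most 5 elements.
genShortList :: Gen [Int]
genShortList = resize 5 arbitrary

-- Shrinking: shrink proposes smaller candidates for a given input.
shrinkCandidates :: [[Int]]
shrinkCandidates = shrink [3, 1, 2]
```

A precondition that is rarely satisfied makes QuickCheck discard most inputs and possibly give up, so it is often better to write a generator that produces valid inputs directly.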

Quick review: Our Favorite Types

Always the same pattern

  1. Add a new type
  2. Add constructor expressions
  3. Add destructor expressions
  4. Add typing rules for new expressions
  5. Add evaluation rules for new expressions

Function types

  1. Type of the form s \to t, where s, t are types
  2. Constructing functions: \lambda x.\ e
  3. Destructing functions: e~e'

Functions in Haskell

-- Function types look like this
myFun :: Int -> String

-- Building functions
myFun = \arg -> "Int: " ++ (show arg)

mySameFun arg = "Int: " ++ (show arg)

-- Using functions
myOtherFun = myFun 42

Product types

  1. Type of the form s \times t, where s, t are types
  2. Constructing pairs: (e_1, e_2)
  3. Destructing pairs: fst(e) and snd(e)
    • Or: pattern match

Think: an s \times t contains an s AND a t

Products in Haskell

-- Product types look like this:
myPair :: (Bool, Int)

-- Building pairs
myPair = (True, 1234)

-- Using pairs via projections
myFst = fst myPair  -- True
mySnd = snd myPair  -- 1234

Records in Haskell

  • “Record types”: products in disguise
-- Declaring a record type
data RecordType = MkRt { getBool :: Bool, getInt :: Int }
myRecord :: RecordType

-- Building records
myRecord = MkRt { getBool = True, getInt = 1234 }

-- Using records via accessors
myBool = getBool myRecord -- True
myInt  = getInt  myRecord -- 1234

-- Using records pattern match
myFoo = case myRecord of
          MkRt { getBool = b, getInt = i } -> ... b ... i ...

Sum types

  1. Type of the form s + t, where s, t are types
  2. Constructing sums: inl(e_1), inr(e_2)
  3. Destructing sums: case analysis/pattern match
    • Can’t use fst/snd: don’t know if it’s an s or a t!

Think: an s + t contains an s OR a t

Sums in Haskell

-- Sum types look like this:
data BoolPlusInt = Inl Bool | Inr Int

-- Building sums: two ways
myBool = Inl True :: BoolPlusInt
myInt  = Inr 1234 :: BoolPlusInt

-- Using sums: pattern match
myFun :: BoolPlusInt -> String
myFun bOrI = case bOrI of
               Inl b -> "Got bool: " ++ (show b)
               Inr i -> "Got int: "  ++ (show i)

The Algebra of Datatypes

What is an algebra?

  • A bunch of stuff you can multiply and add together
  • Think: high-school algebra, polynomials, etc.
  • How can we multiply and add types?

More care needed for non-termination
(Course theme: we won’t be careful)

When are two types “the same”?

  • Given two equivalent types t and s:
    • Program (function) converting t to s
    • Program (function) converting s to t
    • Converting back and forth should be identity
  • We call such types isomorphic, and write t \cong s
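
As a small sketch of what an isomorphism looks like in code, here are two conversions between Bool and Either () (). The function names are made up for this example.

```haskell
-- Converting t to s, with t = Bool and s = Either () ().
toSum :: Bool -> Either () ()
toSum False = Left ()
toSum True  = Right ()

-- Converting s back to t.
fromSum :: Either () () -> Bool
fromSum (Left ())  = False
fromSum (Right ()) = True

-- Both round trips are the identity. The types are small enough
-- to check every value.
roundTripsOk :: Bool
roundTripsOk =
  all (\b -> fromSum (toSum b) == b) [False, True]
    && all (\s -> toSum (fromSum s) == s) [Left (), Right ()]
```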

Finite types

  • A type with no values: 0 (“Void”/“Empty”/“False”)
    • No constructors
  • A type with one possible value: 1 (“Unit”)
    • Exactly one constructor: ()
  • A type with two possible values: 2 (“Bool”)
    • Exactly two constructors: true and false
  • A type with three possible values: 3 (“Three”)

Expected equations hold!

  • Basic arithmetic: 2 \times 2 \cong 1 + 1 + 1 + 1 \cong 4
  • Commutativity: t \times s \cong s \times t \qquad t + s \cong s + t
  • Associativity: t_1 + (t_2 + t_3) \cong (t_1 + t_2) + t_3
  • Distributivity: t_1 \times (t_2 + t_3) \cong (t_1 \times t_2) + (t_1 \times t_3)
  • Identities: 1 \times t \cong t \times 1 \cong t \qquad 0 + t \cong t + 0 \cong t
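
Some of these equations can be witnessed by ordinary Haskell functions, with pairs as products, Either as sums, and () as 1. A sketch with hypothetical function names:

```haskell
-- Commutativity of products: swapping twice is the identity.
swapPair :: (t, s) -> (s, t)
swapPair (x, y) = (y, x)

-- Distributivity, left to right.
distribute :: (a, Either b c) -> Either (a, b) (a, c)
distribute (x, Left y)  = Left (x, y)
distribute (x, Right z) = Right (x, z)

-- Distributivity, right to left.
factor :: Either (a, b) (a, c) -> (a, Either b c)
factor (Left (x, y))  = (x, Left y)
factor (Right (x, z)) = (x, Right z)

-- Identity for products: the unit component carries no information.
dropUnit :: ((), t) -> t
dropUnit ((), x) = x

addUnit :: t -> ((), t)
addUnit x = ((), x)
```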

Exponentials

  • Write s^t for function types t \to s
  • Satisfies the expected properties, for instance:
    • Arithmetic: 2^2 \cong 4, t^2 \cong t \times t
    • Tower rule: (Z^Y)^X \cong Z^{X \times Y}
    • Z^{X + Y} \cong Z^X \times Z^Y
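
The exponential laws also have direct witnesses. In the sketch below the tower rule is just currying, and the sum law says that a function out of a sum is a pair of functions. The function names are chosen for this example.

```haskell
-- Tower rule: (Z^Y)^X versus Z^(X * Y), the same as uncurry and curry.
toUncurried :: (x -> y -> z) -> ((x, y) -> z)
toUncurried f (a, b) = f a b

toCurried :: ((x, y) -> z) -> (x -> y -> z)
toCurried g a b = g (a, b)

-- Z^(X + Y) versus Z^X * Z^Y.
splitCases :: (Either x y -> z) -> (x -> z, y -> z)
splitCases h = (h . Left, h . Right)

joinCases :: (x -> z, y -> z) -> (Either x y -> z)
joinCases (f, g) = either f g

-- t^2 versus t * t: a function out of Bool is determined by its two results.
toPair :: (Bool -> t) -> (t, t)
toPair f = (f False, f True)

fromPair :: (t, t) -> (Bool -> t)
fromPair (a, _) False = a
fromPair (_, b) True  = b
```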

Derivatives

  • High-school calculus: derivative of X^n is n \times X^{n - 1}
    • Surprisingly: forms the zipper of a type!
    • Original type with a “hole” in it
  • Example: derivative of pair t \times t is 2 \times t \cong t + t
    • Left: hole in first component, t in second
    • Right: hole in second component, t in first
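
The pair example can be written out as a datatype. This is a sketch; PairHole, plug, and holes are names invented here.

```haskell
-- One-hole contexts of a pair (t, t): the derivative 2 * t, written as t + t.
data PairHole t
  = HoleInFirst t   -- hole in first component; stores the second
  | HoleInSecond t  -- hole in second component; stores the first
  deriving (Eq, Show)

-- Filling the hole with a value recovers a full pair.
plug :: t -> PairHole t -> (t, t)
plug x (HoleInFirst second) = (x, second)
plug x (HoleInSecond first) = (first, x)

-- Every way of pulling one component out of a pair.
holes :: (t, t) -> [(t, PairHole t)]
holes (a, b) = [(a, HoleInFirst b), (b, HoleInSecond a)]
```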

Inductive types

data List a = Nil | Cons a (List a)
  • Reading: should satisfy List(t) \cong 1 + t \times List(t)
  • One solution: 1 + t + t^2 + t^3 + \cdots
    • Reading: either empty, or one t, or two t, …
  • Take derivative: 1 + 2 \times t + 3 \times t^2 + \cdots
    • You’ve programmed with this type before…
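
The derivative series factors as (1 + t + t^2 + \cdots)^2, that is, List(t) \times List(t): the elements before the hole and the elements after it. A sketch of that type, with hypothetical names:

```haskell
-- One-hole context of a list: the elements before the hole
-- (stored nearest-first) and the elements after it.
data ListHole a = ListHole [a] [a]
  deriving (Eq, Show)

-- Filling the hole with a value recovers a full list.
plug :: a -> ListHole a -> [a]
plug x (ListHole before after) = reverse before ++ x : after

-- Every way of pulling one element out of a list.
holes :: [a] -> [(a, ListHole a)]
holes = go []
  where
    go _      []       = []
    go before (x:rest) = (x, ListHole before rest) : go (x : before) rest
```

Pairing a context with the element in focus gives the usual list zipper, which supports moving left and right in constant time.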

Haskell Wrapup

Highly experimental

  • Original goal: implement a lazy language
  • An academic experiment that escaped from the lab
    • “Avoid success at all costs”
  • Remains a testbed for wild and wacky PL ideas
  • GHC has a huge list of experimental flags
    • IncoherentInstances, UndecidableInstances, RankNTypes, GADTs, …

Haskell is extreme

  • Extreme control of side-effects: can’t just print a line!
  • Pervasive use of monads: hard to avoid
  • Style encourages lots of symbol operators
    • Impossible to Google, looks like line noise
  • Takes abstraction to an extreme
    • Highly generic and reusable code
    • Very dense: looks small, but unpacks to a lot

Tremendous influence

  • Popularized many features
    • Typeclasses and polymorphism
    • Algebraic datatypes and pattern matching
    • Higher-order functions
  • Showed: strongly-typed languages can be elegant
    • Every language should have type inference
  • Changed how people think about programming
    • Got lots of people to learn about monads

Does anyone use this?

  • More than you might think
    • Finance: Credit Suisse, DB, JPM, Standard Chartered, Barclays, HFT shops, …
    • Big tech: Microsoft, Facebook, Google, Intel, …
    • R&D: Galois, NICTA, MITRE, …
    • Security: Kaspersky, lots of blockchain, …
    • Startups: too many
  • Strengths
    • Anything working with source code
    • Static analysis, transformations, compilers, …
    • Hardware design

Can this stuff go fast?

  • Haskell code can be really fast
  • Performance tuning makes a huge difference
    • Requires very solid understanding of GHC
    • Few resources, somewhat of a dark art
    • You have to know what you’re doing

Lazy versus eager

  • Laziness is double-edged
  • Elegant, simple code via recursion
    • Very natural to work with infinite data
    • Usually don’t hit non-termination
  • Hard to reason about performance, especially space
    • Things might not be run when you think they are
    • “Space leaks”: buildup of suspended computations
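
Both edges of laziness show up in very small programs. The sketch below uses only standard library functions; the top-level names are illustrative.

```haskell
import Data.List (foldl')

-- Infinite data: only the demanded prefix is ever computed.
powersOfTwo :: [Integer]
powersOfTwo = iterate (* 2) 1

firstFive :: [Integer]
firstFive = take 5 powersOfTwo  -- [1,2,4,8,16]

-- Possible space leak: lazy foldl can build a long chain of suspended
-- additions before evaluating any of them (whether it does depends on
-- how GHC optimizes the code).
lazySum :: [Integer] -> Integer
lazySum = foldl (+) 0

-- The strict variant forces the accumulator at every step.
strictSum :: [Integer] -> Integer
strictSum = foldl' (+) 0
```

The two sums always agree on the result; they differ only in how much memory they may use along the way.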

Less radical cousins

  • Haskell is a member of the ML family of languages
  • Same family: OCaml, SML, F#, Purescript, …
    • More popular in industry
    • Reason (from Facebook): OCaml with curly braces and semicolons
  • General features
    • Eager, easier to reason about performance
    • No control of side effects (no monads)
    • No typeclasses (but real modules)
    • Syntax is very similar to Haskell

Scratching the surface

  • Much more to Haskell than we covered
    • Type level programming, dependent types
    • Concurrency and parallelism
    • Generic Haskell (how does deriving work?)
    • Arbitrarily complex category theory stuff
  • Related systems