Lecture 13

Theory and Design of PL (CS 538)

March 4, 2020

Property-based Testing

Unit testing is boring

Most common kind of test today
Idea: write test cases one by one
- Write down one input (and maybe external state)
- Write down expected output
- Check if input produces expected output
Build up a lot of hand-crafted tests
- Write new tests when bugs are found
- Keep tests up to date

Idea: test properties

Idea: write down properties of programs
Properties hold for class of inputs, not just one input
Don’t need to write tests one-by-one

Randomly generate test cases!

Examples

Applying twice same as applying once (idempotence)
One function “undoes” another function (inverse)
Optimized implementation mirrors simple version
Relationships between insert, delete, lookup, etc.
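
The four kinds of properties above can be sketched as QuickCheck properties. This is only an illustration: fastSum is a hypothetical "optimized" function made up for the example, and the list functions come from Data.List.

```haskell
import Data.List (insert, sort)
import Test.QuickCheck

-- Idempotence: sorting twice is the same as sorting once.
prop_sortIdempotent :: [Int] -> Bool
prop_sortIdempotent xs = sort (sort xs) == sort xs

-- Inverse: reverse undoes reverse.
prop_reverseInverse :: [Int] -> Bool
prop_reverseInverse xs = reverse (reverse xs) == xs

-- Hypothetical "optimized" implementation: sum with an accumulator.
fastSum :: [Int] -> Int
fastSum = go 0
  where
    go acc []       = acc
    go acc (y : ys) = go (acc + y) ys

-- The optimized version mirrors the simple version.
prop_fastSumMatches :: [Int] -> Bool
prop_fastSumMatches xs = fastSum xs == sum xs

-- Relationship between operations: an inserted element can be found.
prop_insertThenElem :: Int -> [Int] -> Bool
prop_insertThenElem x xs = x `elem` insert x xs
```

In GHCi, each one can be checked with, for example, quickCheck prop_sortIdempotent.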

QuickCheck

Haskell library for property-based testing
- Write random input generators with combinators
- Write properties of functions we want to test
QuickCheck will randomly generate inputs and test the properties
- “Shrinks” failing test inputs to find minimal ones
Implementations in 40+ other languages

Taking it for a spin

Install with Cabal (or Stack)

cabal v2-install --lib QuickCheck

Import Haskell module

import Test.QuickCheck

Documentation is available on Hackage (package QuickCheck)

QuickCheck demo

How to test a parser?

Parser goes from String to structured data
How to generate input Strings?
- Randomly? Almost certainly won’t parse…
Even if the parser succeeds, is the answer right?

Use the pretty-printer (HW3)

Parser: String to structured data
Printing: structured data to String
- This direction is usually easy…

Inverse property: printing data, then parsing it back should give original data!
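
A minimal sketch of the round-trip idea, assuming a toy expression type. Expr, pretty, parseExpr, parseAll and genExpr are all hypothetical names made up here, not the HW3 code. The printer fully parenthesizes, so the parser needs no precedence rules.

```haskell
import Data.Char (isDigit)
import Test.QuickCheck

-- Hypothetical toy AST standing in for the real structured data.
data Expr = Num Int | Add Expr Expr
  deriving (Eq, Show)

-- Printer: structured data to String (the easy direction).
pretty :: Expr -> String
pretty (Num n)   = show n
pretty (Add l r) = "(" ++ pretty l ++ "+" ++ pretty r ++ ")"

-- Parser: returns the parsed expression and the unconsumed input.
parseExpr :: String -> Maybe (Expr, String)
parseExpr ('(' : rest) = do
  (l, rest1) <- parseExpr rest
  rest2      <- expect '+' rest1
  (r, rest3) <- parseExpr rest2
  rest4      <- expect ')' rest3
  return (Add l r, rest4)
parseExpr s = case span isDigit s of
  ("", _)        -> Nothing
  (digits, rest) -> Just (Num (read digits), rest)

-- Consume exactly one expected character.
expect :: Char -> String -> Maybe String
expect c (x : xs) | c == x = Just xs
expect _ _                 = Nothing

-- Succeed only if the whole input is consumed.
parseAll :: String -> Maybe Expr
parseAll s = case parseExpr s of
  Just (e, "") -> Just e
  _            -> Nothing

-- Generate structured data instead of raw Strings.
-- Literals are non-negative because the toy parser has no minus sign.
genExpr :: Int -> Gen Expr
genExpr 0 = Num <$> choose (0, 100)
genExpr n = oneof
  [ Num <$> choose (0, 100)
  , Add <$> genExpr (n `div` 2) <*> genExpr (n `div` 2) ]

-- Inverse property: printing, then parsing, gives back the original.
prop_roundTrip :: Property
prop_roundTrip =
  forAll (sized genExpr) $ \e -> parseAll (pretty e) == Just e
```

Generating Expr values and printing them sidesteps the problem of random Strings that almost never parse, and the original value serves as the expected answer.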

How is the library designed?

Gen type

Gen a: something that can generate random a’s

-- Build generator drawing a's from a list of a's
elements :: [a] -> Gen a

-- Select a random generator from a list
oneof :: [Gen a] -> Gen a

-- Customize distribution of generators
frequency :: [(Int, Gen a)] -> Gen a
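
A small sketch of how these three combinators might be used. The generator names (genVowel, genSmallOrLarge, genBiasedBool) are made up for illustration; choose, which draws from a range, is another library combinator.

```haskell
import Test.QuickCheck

-- elements: pick uniformly from a fixed list of values.
genVowel :: Gen Char
genVowel = elements "aeiou"

-- oneof: pick one of several generators with equal probability.
genSmallOrLarge :: Gen Int
genSmallOrLarge = oneof [choose (0, 9), choose (1000, 1009)]

-- frequency: weighted choice; weights 3 and 1 give True about 75% of the time.
genBiasedBool :: Gen Bool
genBiasedBool = frequency [(3, return True), (1, return False)]
```

In GHCi, sample genVowel prints a handful of generated values, and generate genVowel draws a single one in IO.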

Gen is a monad

instance Monad Gen where
  return :: a -> Gen a

  (>>=) :: Gen a -> (a -> Gen b) -> Gen b

Return: from val of type a, build generator that always returns val
Bind: draw something from first generator (of a’s) and use to select the next generator (of b’s)
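
As a sketch of what bind buys us: the second generator below depends on the value drawn from the first. The names genAnswer and genShortString are hypothetical.

```haskell
import Test.QuickCheck

-- return: a generator that always yields the same value.
genAnswer :: Gen Int
genAnswer = return 42

-- Bind: draw a length n first, then use n to select the next generator.
genShortString :: Gen String
genShortString = choose (0, 5) >>= \n -> vectorOf n (elements "ab")
```

The same generator could be written with do-notation; the explicit >>= just makes the dependency on n visible.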

Combining generators

Combinators: build new generators out of old ones

-- Turn generator of a into generator of pairs of a
genPairOf :: Gen a -> Gen (a, a)
genPairOf g = do x <- g
                 y <- g
                 return (x, y)

-- Turn generator of a into generator of lists of a
vectorOf :: Int -> Gen a -> Gen [a]
vectorOf 0 _ = return []
vectorOf n g = do x  <- g
                  xs <- vectorOf (n - 1) g
                  return (x:xs)

Typeclass: Arbitrary

Arbitrary a: means a is “generatable”
Concretely: there is something of type Gen a

class Arbitrary a where
  arbitrary :: Gen a
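
For a user-defined type, an instance might look like the sketch below. Color, toCode and fromCode are hypothetical names used only for this example.

```haskell
import Test.QuickCheck

-- Hypothetical user-defined type.
data Color = Red | Green | Blue
  deriving (Eq, Show)

-- Making Color "generatable": supply a Gen Color.
instance Arbitrary Color where
  arbitrary = elements [Red, Green, Blue]

-- Hypothetical encoding and decoding functions to test.
toCode :: Color -> Int
toCode Red   = 0
toCode Green = 1
toCode Blue  = 2

fromCode :: Int -> Color
fromCode 0 = Red
fromCode 1 = Green
fromCode _ = Blue

-- QuickCheck finds the generator for Color through the instance.
prop_codeRoundTrip :: Color -> Bool
prop_codeRoundTrip c = fromCode (toCode c) == c
```

With the instance in scope, quickCheck prop_codeRoundTrip needs no explicit generator argument.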

Using Arbitrary

Typeclass machinery will automatically get generator
Compare with previous: no need to pass in Gen a

genPair :: Arbitrary a => Gen (a, a)
genPair = do x <- arbitrary  -- From typeclass constraint
             y <- arbitrary  -- Automatically inferred
             return (x, y)

vector :: Arbitrary a => Int -> Gen [a]
vector 0 = return []
vector n = do x  <- arbitrary
              xs <- vector (n - 1)
              return (x:xs)

Instances of Arbitrary

Library has tons of instances for base types
- Arbitrary Bool, Arbitrary Char, Arbitrary Int, …
Also has instances for more complex types

instance (Arbitrary a, Arbitrary b) 
         => Arbitrary (a, b) where        -- products
instance (Arbitrary a, Arbitrary b)
         => Arbitrary (Either a b) where  -- sums
instance Arbitrary a
         => Arbitrary [a] where           -- lists

Arbitrary products

instance (Arbitrary a, Arbitrary b) => Arbitrary (a, b) where
  arbitrary = do getA <- arbitrary  -- arbitrary :: Gen a, so getA :: a
                 getB <- arbitrary  -- arbitrary :: Gen b, so getB :: b
                 return (getA, getB)

Arbitrary sums

instance (Arbitrary a, Arbitrary b) => Arbitrary (Either a b) where
  arbitrary = oneof [ do aa <- arbitrary  -- arbitrary :: Gen a, so aa :: a
                         return (Left aa)
                    , do bb <- arbitrary  -- arbitrary :: Gen b, so bb :: b
                         return (Right bb) ]

Testing properties

Combine generator of a’s and property of a’s

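-- The body can return any Testable (Bool, Property, ...), so forAll can be nested
forAll :: (Show a, Testable prop) => Gen a -> (a -> prop) -> Property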

myProp2 = forAll genX $ \x ->
            forAll genY $ \y ->
              fst (x, y) == x
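
A runnable sketch of the same shape, with concrete generators filled in. The generators genSmall and genWord and the property itself are made up for illustration.

```haskell
import Test.QuickCheck

-- Hypothetical generators standing in for genX and genY.
genSmall :: Gen Int
genSmall = choose (0, 100)

genWord :: Gen String
genWord = listOf (elements ['a' .. 'z'])

-- Nested forAll: the inner forAll returns a Property, which is itself testable.
prop_replicateLength :: Property
prop_replicateLength =
  forAll genSmall $ \n ->
    forAll genWord $ \w ->
      length (concat (replicate n w)) == n * length w
```

Running quickCheck prop_replicateLength tries 100 random cases by default. The Show constraint is what lets the library print a counterexample on failure.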

Additional features

Sizes: control “size” of generated things
Shrinking: given a failing test case, “make it smaller”
- Search for simplest failing test cases
- Can customize how to shrink test cases
Implications: if one prop holds, then other one holds
- “If input is valid, then function behaves correctly”
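
A brief sketch touching each of the three features, using the library's resize, shrink and ==>. The names genShortList, shrunkCandidates and prop_divMod are hypothetical.

```haskell
import Test.QuickCheck

-- Sizes: resize caps the size parameter, so these lists have at most 5 elements.
genShortList :: Gen [Int]
genShortList = resize 5 (listOf arbitrary)

-- Shrinking: shrink lists "smaller" candidates to try after a failure.
shrunkCandidates :: [[Int]]
shrunkCandidates = shrink [3, 1, 2]

-- Implication: the law is only tested on inputs where the precondition holds.
prop_divMod :: Int -> Int -> Property
prop_divMod x y = y /= 0 ==> (x `div` y) * y + (x `mod` y) == x
```

Inputs that fail the precondition are discarded rather than counted as passing, so a precondition that is rarely true can make QuickCheck give up before it has run enough tests.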

Quick review: Our Favorite Types

Always the same pattern

Add a new type
Add constructor expressions
Add destructor expressions
Add typing rules for new expressions
Add evaluation rules for new expressions

Function types

Type of the form s \to t, where s, t are types
Constructing functions: \lambda x.\ e
Destructing functions: e~e'

Functions in Haskell

-- Function types look like this
myFun :: Int -> String

-- Building functions
myFun = \arg -> "Int: " ++ (show arg)

mySameFun arg = "Int: " ++ (show arg)

-- Using functions
myOtherFun = myFun 42

Product types

Type of the form s \times t, where s, t are types
Constructing pairs: (e_1, e_2)
Destructing pairs: fst(e) and snd(e)
- Or: pattern match

Think: an s \times t contains an s AND a t

Products in Haskell

-- Product types look like this:
myPair :: (Bool, Int)

-- Building pairs
myPair = (True, 1234)

-- Using pairs via projections
myFst = fst myPair  -- True
mySnd = snd myPair  -- 1234

Records in Haskell

“Record types”: products in disguise

-- Declaring a record type
data RecordType = MkRt { getBool :: Bool, getInt :: Int }
myRecord :: RecordType

-- Building records
myRecord = MkRt { getBool = True, getInt = 1234 }

-- Using records via accessors
myBool = getBool myRecord -- True
myInt  = getInt  myRecord -- 1234

-- Using records via pattern matching
myFoo = case myRecord of
          MkRt { getBool = b, getInt = i } -> ... b ... i ...

Sum types

Type of the form s + t, where s, t are types
Constructing sums: inl(e_1), inr(e_2)
Destructing sums: case analysis/pattern match
- Can’t use fst/snd: don’t know if it’s an s or a t!

Think: an s + t contains an s OR a t

Sums in Haskell

-- Sum types look like this:
data BoolPlusInt = Inl Bool | Inr Int

-- Building sums: two ways
myBool = Inl True :: BoolPlusInt
myInt  = Inr 1234 :: BoolPlusInt

-- Using sums: pattern match
myFun :: BoolPlusInt -> String
myFun bOrI = case bOrI of
               Inl b -> "Got bool: " ++ (show b)
               Inr i -> "Got int: "  ++ (show i)

The Algebra of Datatypes

What is an algebra?

A bunch of stuff you can multiply and add together
Think: high-school algebra, polynomials, etc.
How can we multiply and add types?

More care needed for non-termination
(Course theme: we won’t be careful)

When are two types “the same”?

Two types t and s are “the same” when there is:
- A program (function) converting t to s
- A program (function) converting s to t
- Converting back and forth (in either order) is the identity
We call such types isomorphic, and write t \cong s
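
As a small illustration, here is a sketch of an isomorphism between Either () () and Bool. The function names toBool and fromBool are made up; which side maps to True is an arbitrary choice.

```haskell
-- Converting one way...
toBool :: Either () () -> Bool
toBool (Left ())  = False
toBool (Right ()) = True

-- ...and back.
fromBool :: Bool -> Either () ()
fromBool False = Left ()
fromBool True  = Right ()

-- Both round trips are the identity; the types are finite, so check every value.
roundTripsOk :: Bool
roundTripsOk =
  all (\b -> toBool (fromBool b) == b) [False, True]
    && all (\e -> fromBool (toBool e) == e) [Left (), Right ()]
```

Both functions are needed: a single conversion function that merely exists in one direction says nothing about the types having the same values.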

Finite types

A type with no values: 0 (“Void”/“Empty”/“False”)
- No constructors
A type with one possible value: 1 (“Unit”)
- Exactly one constructor: ()
A type with two possible values: 2 (“Bool”)
- Exactly two constructors: true and false
A type with three possible values: 3 (“Three”)
…

Expected equations hold!

Basic arithmetic: 2 \times 2 \cong 1 + 1 + 1 + 1 \cong 4
Commutativity: t \times s \cong s \times t \qquad t + s \cong s + t
Associativity: t_1 + (t_2 + t_3) \cong (t_1 + t_2) + t_3
Distributivity: t_1 \times (t_2 + t_3) \cong (t_1 \times t_2) + (t_1 \times t_3)
Identities: 1 \times t \cong t \times 1 \cong t \qquad 0 + t \cong t + 0 \cong t
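
Some of these equations can be witnessed by Haskell functions. In this sketch, pairs play the role of products, Either of sums, and Void (from Data.Void) of 0. The function names are made up for illustration.

```haskell
import Data.Void (Void, absurd)

-- Commutativity of products: swapPair is its own inverse (up to types).
swapPair :: (t, s) -> (s, t)
swapPair (x, y) = (y, x)

-- Distributivity, left to right...
distribute :: (a, Either b c) -> Either (a, b) (a, c)
distribute (x, Left y)  = Left (x, y)
distribute (x, Right z) = Right (x, z)

-- ...and right to left.
factor :: Either (a, b) (a, c) -> (a, Either b c)
factor (Left (x, y))  = (x, Left y)
factor (Right (x, z)) = (x, Right z)

-- Identity for sums: the Left case can never occur, since Void has no values.
dropVoid :: Either Void t -> t
dropVoid (Left v)  = absurd v
dropVoid (Right x) = x

addVoid :: t -> Either Void t
addVoid = Right
```

For example, factor (distribute p) gives back p for any pair p, and dropVoid (addVoid x) gives back x.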

Exponentials

Write s^t for function types t \to s
Satisfies the expected properties, for instance:
- Arithmetic: 2^2 \cong 4, t^2 \cong t \times t
- Tower rule: (Z^Y)^X \cong Z^{X \times Y}
- Z^{X + Y} \cong Z^X \times Z^Y
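
These laws also have familiar witnesses in Haskell. The tower rule is exactly the Prelude's curry and uncurry; the other function names in this sketch are made up for illustration.

```haskell
-- Tower rule: (Z^Y)^X ≅ Z^(X × Y) is witnessed by the Prelude functions
--   curry   :: ((x, y) -> z) -> (x -> y -> z)
--   uncurry :: (x -> y -> z) -> ((x, y) -> z)

-- Z^(X + Y) ≅ Z^X × Z^Y: a function out of a sum is a pair of functions.
splitCases :: (Either x y -> z) -> (x -> z, y -> z)
splitCases f = (f . Left, f . Right)

joinCases :: (x -> z, y -> z) -> (Either x y -> z)
joinCases (f, g) = either f g

-- t^2 ≅ t × t: a function out of Bool is a pair of results.
tabulate :: (Bool -> t) -> (t, t)
tabulate f = (f False, f True)

index :: (t, t) -> (Bool -> t)
index (a, _) False = a
index (_, b) True  = b
```

Since functions cannot be compared with ==, checking the round trips means comparing results on sample arguments, or reasoning by hand.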

Derivatives

High-school calculus: derivative of X^n is n \times X^{n - 1}
- Surprisingly: forms the zipper of a type!
- Original type with a “hole” in it
Example: derivative of pair t \times t is 2 \times t \cong t + t
- Left: hole in first component, t in second
- Right: hole in second component, t in first

Inductive types

data List a = Nil | Cons a (List a)

Reading: should satisfy List(t) \cong 1 + t \times List(t)
One solution: 1 + t + t^2 + t^3 + \cdots
- Reading: either empty, or one t, or two t, …
Take derivative: 1 + 2 \times t + 3 \times t^2 + \cdots
- You’ve programmed with this type before…
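
One way to read the derivative: 1 + 2 \times t + 3 \times t^2 + \cdots equals (1 + t + t^2 + \cdots)^2, so a list with a hole is a pair of lists (the elements before the hole and the elements after it). Pairing that context with the element in the hole gives a list zipper. A minimal sketch, with all names made up for illustration:

```haskell
-- A list zipper: elements before the focus (nearest first),
-- the focused element, and elements after the focus.
data Zipper a = Zipper [a] a [a]
  deriving (Eq, Show)

-- Focus on the first element, if there is one.
fromList :: [a] -> Maybe (Zipper a)
fromList []       = Nothing
fromList (x : xs) = Just (Zipper [] x xs)

-- Move the focus one step right or left in constant time.
right :: Zipper a -> Maybe (Zipper a)
right (Zipper ls x (r : rs)) = Just (Zipper (x : ls) r rs)
right _                      = Nothing

left :: Zipper a -> Maybe (Zipper a)
left (Zipper (l : ls) x rs) = Just (Zipper ls l (x : rs))
left _                      = Nothing

-- Replace the focused element.
set :: a -> Zipper a -> Zipper a
set y (Zipper ls _ rs) = Zipper ls y rs

-- Forget the focus and rebuild the plain list.
toList :: Zipper a -> [a]
toList (Zipper ls x rs) = reverse ls ++ x : rs
```

For instance, focusing on [1,2,3], moving right once, and setting the focus to 9 rebuilds to [1,9,3].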

Haskell Wrapup

Highly experimental

Original goal: implement a lazy language
An academic experiment that escaped from the lab
- “Avoid success at all costs”
Remains a testbed for wild and wacky PL ideas
GHC has a huge list of experimental flags
- IncoherentInstances, UndecidableInstances, RankNTypes, GADTs, …

Haskell is extreme

Extreme control of side-effects: can’t just print a line!
Pervasive use of monads: hard to avoid
Style encourages lots of symbol operators
- Impossible to Google, looks like line noise
Takes abstraction to an extreme
- Highly generic and reusable code
- Very dense: looks small, but unpacks to a lot

Tremendous influence

Popularized many features
- Typeclasses and polymorphism
- Algebraic datatypes and pattern matching
- Higher-order functions
Showed: strongly-typed languages can be elegant
- Every language should have type inference
Changed how people think about programming
- Got lots of people to learn about monads

Does anyone use this?

More than you might think
- Finance: Credit Suisse, DB, JPM, Standard Chartered, Barclays, HFT shops, …
- Big tech: Microsoft, Facebook, Google, Intel, …
- R&D: Galois, NICTA, MITRE, …
- Security: Kaspersky, lots of blockchain, …
- Startups: too many
Strengths
- Anything working with source code
- Static analysis, transformations, compilers, …
- Hardware design

Can this stuff go fast?

Haskell code can be really fast
- Can be competitive with C, sometimes
- GHC optimizes aggressively, exploiting purity and types
Performance tuning makes a huge difference
- Requires very solid understanding of GHC
- Few resources, somewhat of a dark art
- You have to know what you’re doing

Lazy versus eager

Laziness is double-edged
Elegant, simple code via recursion
- Very natural to work with infinite data
- Usually don’t hit non-termination
Hard to reason about performance, especially space
- Things might not be run when you think they are
- “Space leaks”: buildup of suspended computations
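
Both edges can be seen in a few lines. This is a sketch: the definitions are made up for illustration, and whether lazySum actually leaks depends on the GHC version and optimization flags.

```haskell
import Data.List (foldl')

-- Infinite data: only the demanded prefix is ever computed.
powersOfTwo :: [Integer]
powersOfTwo = iterate (* 2) 1

firstFive :: [Integer]
firstFive = take 5 powersOfTwo  -- [1,2,4,8,16]

-- Lazy left fold: without optimization this can build a long chain of
-- suspended additions before anything is added up (a space leak).
lazySum :: [Integer] -> Integer
lazySum = foldl (+) 0

-- Strict left fold: forces the accumulator at each step,
-- so the running total stays a plain number.
strictSum :: [Integer] -> Integer
strictSum = foldl' (+) 0
```

Both sums return the same answer; the difference is only in when the additions run and how much memory is held in the meantime.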

Less radical cousins

Haskell is a member of the ML family of languages
Same family: OCaml, SML, F#, Purescript, …
- More popular in industry
- Reason: Facebook's syntax for OCaml, with curly braces and semicolons
General features
- Eager, easier to reason about performance
- No control of side effects (no monads)
- No typeclasses (but real modules)
- Syntax is very similar to Haskell

Scratching the surface

Much more to Haskell than we covered
- Type level programming, dependent types
- Concurrency and parallelism
- Generic Haskell (how does deriving work?)
- Arbitrarily complex category theory stuff
Related systems
- Liquid Haskell: types with custom assertions
- Agda: computer-aided proof assistant