Language-Based Security

Topics in Security and Privacy Technologies (CS 839)

November 21, 2018

LangSec: Principles

Programs are written in programming languages

Security holes are bugs

  1. Programmer writes some code
  2. Programmer makes a mistake!
    • Forgets to check permissions
    • Mixes private and public data
    • Doesn’t allocate enough space
    • Reads from malicious input
  3. Attacker exploits the security flaw

Programming languages:
first line of defense

  • Catching errors earlier is better
  • Earliest possible time: when program is written
  • Easier to reject program than try to defend against it

Design languages
to reduce security flaws

  • Make it easier for programmer to do right thing
  • Make certain kinds of bugs impossible
  • Limit damage caused by any bugs

When are errors caught?

  • When the program is running
    • Stop program when it does something unsafe
    • “Dynamic analysis”
  • When the program is compiled
    • Reject bad program before it even runs
    • “Static analysis”

Overall Strategy

1. Pick a language

  • Real languages: Java, C, …
    • Highly complex: tons of features
    • Hard to modify language
  • Idealized “core” languages
    • Much simpler, small number of features
    • Model “essence” of real languages

2. Formalize what programs “do”

  • Run it on a machine and find out?
    • Not very useful for proofs…
  • Formalize behavior mathematically, on paper
    • Discard “unimportant” details
    • Describe how program “steps”

3. Describe how to check
a given program

  • Ideally: works without running the program
  • Other desirable features:
    • Scales up to large programs
    • Runs in a reasonable amount of time
    • Doesn’t reject too many correct programs

4. Prove correctness

  • We want to prove two things:
    • Soundness: catch all buggy programs
    • Completeness: accept all correct programs

Almost always: can’t have both!

Usually: pick soundness

  • All buggy programs are rejected
    • If the check says “safe”, then it is safe
  • But: some safe programs might be rejected
    • Hopefully, not too many

Imperative Languages

Most familiar

  • Basis of many popular languages
    • Java, C++, Python, …
  • Program executes sequence of instructions
  • Can read/write to variables

Keep essential features

  • Assignments to variables
  • Sequencing (“semicolon”)
  • Conditionals (“if-then-else”)
  • Loops

Drop fancier features

  • Memory management
  • Jumps and gotos
  • Function pointers
  • Templates

Example

Functional Languages

Maybe a bit less familiar

  • No imperative features
    • Can’t modify variables
    • Usually no looping command (instead: recursion)
  • Instead: functions
    • Define functions
    • Call (“apply”) functions on arguments

Simpler to formalize

  • Program includes “all the information”
    • Behavior doesn’t depend on state of variables
  • Program “runs” by changing the code itself
    • Simplifies all the way down to final answer

Example

Operational Semantics

Execute by “stepping”

  • Start with program
    • Imperative: plus variable setting
  • In each step, perform update:
    • Functional: modify the program
    • Imperative: update variables
  • Terminates when it stops stepping (is a “value”)

Example

Different styles

  • Big-step
    • Describe value a program eventually steps to
  • Small-step
    • Describe one step of a program

Type Systems

Assign “types” to programs

  • A type \tau describes a class of programs
    • Usually: well-behaved in some way
  • Can automatically check if program has type \tau
    • Type of program depends on types of components
    • Analysis scales to large programs

Example

Strengths

  • Lightweight
    • Checking types is simple, automatic
    • Don’t need to run program
  • Natural and intuitive
    • Can’t add a String to a Boolean
    • Programmers often think in terms of types
  • Identify correct programs

Weaknesses

  • Programmer may need to add annotations
    • Extra hints for compiler
    • Common for more complex types
  • Compiler sometimes rejects correct programs
    • Figuring out why can be very frustrating
    • May need to write program in less natural form

Types can be complex

  • Simpler types
    • String, Char, Bool, Int, function types, …
  • More complex types
    • Secret values/public values
    • Trusted values/untrusted values
    • Local data/remote data
    • Random values

Prove: soundness

  • If program has a type, it should be well-behaved
    • Relate type system to operational behavior
    • “Soundness theorem”
  • Many possible notions of “well-behaved”
    • Don’t add Strings to Bools
    • Don’t mix public and private data
    • Don’t write past end of buffer

“Well-typed programs can’t go wrong”