Course Welcome

Topics in Security and Privacy Technologies (CS 839)

September 05, 2018

Security and Privacy

It’s everywhere!

Stuff is totally insecure!

It’s really difficult!

What topics to cover?

A really, really vast field

  • Things we will not be able to cover:
    • Real-world attacks
    • Computer systems security
    • Defenses and countermeasures
    • Social aspects of security
    • Theoretical cryptography

Theme 1: Formalizing S&P

  • Mathematically formalize notions of security
  • Rigorously prove security
  • Guarantee that certain breakages can’t occur

Remember: definitions are tricky things!

Theme 2: Automating S&P

  • Use computers to help build more secure systems
  • Automatically check security properties
  • Search for attacks and vulnerabilities

Our focus: four modules

  1. Differential privacy
  2. Applied cryptography
  3. Language-based security
  4. Adversarial machine learning

Differential privacy

A mathematically solid definition of privacy

  • Simple and clean formal property
  • Satisfied by many algorithms
  • Degrades gracefully under composition

Applied crypto

Computing in an untrusted world

  • Proving you know something without revealing it
  • Certifying that you did a computation correctly
  • Computing on encrypted data, without decryption
  • Computing a joint answer without revealing your data

Language-based security

Ensure security by construction

  • Programming languages for security
  • Compiler checks that programs are secure
  • Information flow, privacy, cryptography, …

Adversarial machine learning

Manipulating ML systems

  • Crafting examples to fool ML systems
  • Messing with training data
  • Extracting training information

Tedious course details

Class format

Paper presentations

  • Sign up to lead a discussion on one paper
  • Suggested topic, papers, and schedule on website
  • Before each presentation:
    • I will send out brief questions
    • Please email me brief answers

If you want advice, come talk to me!

Final project

  • Work individually or in pairs
  • Project details and suggestions on website
  • Key dates:
    • September 19: Pick groups and topic
    • October 15: Milestone 1
    • November 14: Milestone 2
    • End of class: Final writeups and presentations

If you want advice, come talk to me!

Todos for you

  1. Complete the course survey
  2. Check out the course website
  3. Think about what paper you want to present
  4. Brainstorm project topics

Defining privacy

What does privacy mean?

  • Many kinds of “privacy breaches”
    • Obvious: third party learns your private data
    • Retention: you give data, company keeps it forever
    • Passive: you don’t know your data is collected

Why is privacy hard?

  • Hard to pin down what privacy means!
  • Once data is out, you can’t put it back into the bottle
  • A data release that preserves privacy today may violate it tomorrow, once combined with “side-information”
  • Data may be used many times, often doesn’t change

Hiding private data

  • Delete “personally identifiable information”
    • Name and age
    • Birthday
    • Social security number
  • Publish the “anonymized” or “sanitized” data

Problem: not enough

  • Can match up anonymized data with public sources
  • De-anonymize data, associate names to records
  • Really, really hard to think about side information
    • May not even be public at time of data release!

Netflix challenge

  • Database of movie ratings
  • Published: ID number, movie rating, and rating date
  • Attack: from public IMDb ratings, recover names for Netflix data
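
A toy sketch of this kind of linkage, in Python. All data, names, and the `link` helper are made up for illustration. It matches records exactly, whereas a real attack would have to tolerate inexact ratings and dates.

```python
from collections import Counter

# Hypothetical "anonymized" release: (user ID, movie, rating, date).
released = [
    (17, "Movie A", 5, "2005-03-01"),
    (17, "Movie B", 2, "2005-03-04"),
    (17, "Movie C", 4, "2005-04-10"),
    (42, "Movie A", 3, "2005-03-02"),
    (42, "Movie D", 5, "2005-05-20"),
]

# Hypothetical public reviews posted under real names: (name, movie, rating, date).
public = [
    ("alice", "Movie A", 5, "2005-03-01"),
    ("alice", "Movie B", 2, "2005-03-04"),
    ("bob", "Movie D", 5, "2005-05-20"),
]

def link(released, public):
    """Guess a name for each ID by counting exact (movie, rating, date) matches."""
    index = {}
    for name, movie, rating, date in public:
        index.setdefault((movie, rating, date), []).append(name)

    votes = {}
    for uid, movie, rating, date in released:
        for name in index.get((movie, rating, date), []):
            votes.setdefault(uid, Counter())[name] += 1

    # Keep the most frequently matched name for each ID.
    return {uid: counts.most_common(1)[0][0] for uid, counts in votes.items()}

guesses = link(released, public)
```

Here both IDs get a name even though the release contains no names at all: the side information does the work.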

“Blending in a crowd”

  • Only release records that are similar to others
  • k-anonymity: every released record must match at least k - 1 others on its identifying attributes
  • Other variants: l-diversity, t-closeness, …
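
A minimal sketch of a k-anonymity check, assuming records are dictionaries and the caller names the identifying columns. The `is_k_anonymous` helper and the sample table are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of identifying values appears in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical table with generalized zip codes and age ranges.
records = [
    {"zip": "537**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "537**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "537**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "537**", "age": "30-39", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(records, ["zip", "age"], k=2)
three_anonymous = is_k_anonymous(records, ["zip", "age"], k=3)
```

Each (zip, age) group has two rows, so the table passes for k = 2 and fails for k = 3.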

Problem: composition

  • Repeating k-anonymous releases may lose privacy
  • Privacy protection may fall off a cliff
    • First few queries fine, then suddenly total violation
  • Again, interacts poorly with side-information
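
One way composition can fail, sketched on made-up data: two releases that are each 2-anonymous, and an attacker who knows the target appears in both. The hospitals, tables, and `possible_diagnoses` helper are assumptions for illustration only.

```python
# Hypothetical 2-anonymous releases from two hospitals that both treated the target.
hospital_a = [
    {"zip": "537**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "537**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "537**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "537**", "age": "30-39", "diagnosis": "diabetes"},
]
hospital_b = [
    {"zip": "5370*", "age": "20-39", "diagnosis": "asthma"},
    {"zip": "5370*", "age": "20-39", "diagnosis": "diabetes"},
    {"zip": "5371*", "age": "20-39", "diagnosis": "flu"},
    {"zip": "5371*", "age": "20-39", "diagnosis": "migraine"},
]

def possible_diagnoses(release, target_group):
    """Diagnoses of all rows in the group the target is known to fall into."""
    return {row["diagnosis"] for row in release
            if (row["zip"], row["age"]) == target_group}

# Side information: the target lives in zip 53703 and is 25 years old,
# which fixes the group they belong to in each release.
from_a = possible_diagnoses(hospital_a, ("537**", "20-29"))
from_b = possible_diagnoses(hospital_b, ("5370*", "20-39"))

# Each release alone leaves two candidates; together they leave one.
narrowed = from_a & from_b
```

Neither release pins down the diagnosis on its own, but their intersection does.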

Differential privacy

  • Proposed by Dwork, McSherry, Nissim, Smith (2006)

A new approach to formulating privacy goals: the risk to one’s privacy, or in general, any type of risk… should not substantially increase as a result of participating in a statistical database. This is captured by differential privacy.

Basic setting

  • Private data: set of records from individuals
    • Each individual: one record
    • Example: set of medical records
  • Private query: function from database to output
    • Randomized: adds noise to protect privacy
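
A minimal sketch of a randomized query: a counting query that adds random noise to the true answer. The use of Laplace noise, the `noisy_count` name, the `scale` parameter, and the sample records are assumptions here, not something the setting above prescribes. How much noise is enough is what a formal privacy definition has to pin down.

```python
import random

def noisy_count(db, predicate, scale, rng=random):
    """Count the records matching `predicate`, then add Laplace noise."""
    true_count = sum(1 for record in db if predicate(record))
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with mean 0 and scale `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

# Hypothetical medical records, one per individual.
db = [
    {"age": 34, "smoker": True},
    {"age": 51, "smoker": False},
    {"age": 29, "smoker": True},
]

rng = random.Random(0)  # seeded only to make the sketch repeatable
answer = noisy_count(db, lambda r: r["smoker"], scale=1.0, rng=rng)
```

Each call returns a different value scattered around the true count of 2, so a single answer reveals less about any one record.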

Basic definition

A query $Q$ is $(\varepsilon, \delta)$-differentially private if for every two databases $db, db'$ that differ in one individual’s record, and for every subset $S$ of outputs, we have:

$$\Pr[ Q(db) \in S ] \leq e^\varepsilon \cdot \Pr[ Q(db') \in S ] + \delta$$
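
To make the inequality concrete, here is a sketch that checks it by brute force for a small example. The example is randomized response on a one-record database holding a single bit: report the true bit with probability 3/4, the flipped bit otherwise. This mechanism is swapped in for illustration and is not taken from the definition above. The helper names and the floating-point tolerance `tol` are assumptions.

```python
import math
from itertools import chain, combinations

def output_distribution(bit, p_truth=0.75):
    """Exact output distribution of randomized response on a one-bit database."""
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}

def satisfies_dp(dist_a, dist_b, epsilon, delta=0.0, tol=1e-12):
    """Check the inequality for every subset S of outputs, in both directions."""
    outputs = sorted(set(dist_a) | set(dist_b))
    subsets = chain.from_iterable(
        combinations(outputs, r) for r in range(len(outputs) + 1)
    )
    bound = math.exp(epsilon)
    for s in subsets:
        pa = sum(dist_a.get(o, 0.0) for o in s)
        pb = sum(dist_b.get(o, 0.0) for o in s)
        # `tol` absorbs floating-point rounding in exp(epsilon).
        if pa > bound * pb + delta + tol or pb > bound * pa + delta + tol:
            return False
    return True

# The two neighboring databases: the single record holds 0 or 1.
dist0 = output_distribution(0)
dist1 = output_distribution(1)

holds_ln3 = satisfies_dp(dist0, dist1, epsilon=math.log(3))
holds_ln2 = satisfies_dp(dist0, dist1, epsilon=math.log(2))
holds_ln2_delta = satisfies_dp(dist0, dist1, epsilon=math.log(2), delta=0.25)
```

The check passes for $\varepsilon = \ln 3$ with $\delta = 0$, since the worst case is $0.75 \leq 3 \cdot 0.25$. It fails for $\varepsilon = \ln 2$ unless $\delta$ makes up the difference of 0.25.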
