"We do not collect personally identifiable information"... "This dataset has been de-identified prior to release"... From advertisers tracking Web clicks to biomedical researchers sharing clinical records, anonymization is the main privacy protection mechanism used for sensitive data today.

I will argue that the distinction between "personally identifiable" and "non-personally identifiable" information is fallacious by showing how to infer private information from fully anonymized data in three settings: (1) records of individual transactions and preferences, illustrated by the Netflix Prize dataset, (2) social networks, and (3) recommender systems, where temporal changes in aggregate statistics allow accurate inference of hidden individual transactions.

I will then outline a program for data privacy research. It includes several challenging problems in the design and implementation of privacy-preserving systems, domain-specific algorithmic research, and policy and economic issues.


Vitaly Shmatikov is an associate professor of computer science at the University of Texas at Austin.  He works on security and privacy. After getting his PhD from Stanford and before joining UT, he worked at SRI on formal methods for security protocol analysis.  Most recently, he served as the program co-chair of the ACM Conference on Computer and Communications Security (CCS).

To Join the Webinar:

The Webinar will be held from 12:00-1:00pm EST on January 19, 2012 in Room 110.

To attend virtually, please register by January 18, 23:59 PDT at:

After your registration is accepted, you will get an email with a URL to join the meeting. Please be sure to join a few minutes before the start of the webinar. This system does not establish a voice connection on your computer; instead, your acceptance message will have a toll-free phone number that you will be prompted to call after joining.  Please note that this registration is a manual process; therefore, do not expect an immediate acceptance.  In the event the number of requests exceeds the capacity, some requests may have to be denied.

