Enabling Accurate Analysis of Private Network Data

21 Jul
Wednesday, 07/21/2010 6:00am to 8:00am
Ph.D. Thesis Defense

Michael Hay

Computer Science Building, Room 151

This dissertation addresses the challenge of enabling accurate analysis of network data while protecting the privacy of network participants. This is an important problem: massive amounts of data are being collected (Facebook activity, email correspondence, cell phone records) and there is huge interest in analyzing them, yet the data are not being shared because of privacy concerns. Despite much research in privacy-preserving data analysis, existing technologies fail to provide a solution because they were designed for tables, not networks, and cannot be easily adapted to handle the complexities of network data.

Building on the foundation of prior work, we develop several technologies that advance us toward our goal. First, we develop a framework for assessing the risk of publishing a network that has been "anonymized." Using this framework, we show that only a small amount of background knowledge about local network structure is needed to re-identify an "anonymous" individual. This motivates our second contribution: an algorithm that transforms the structure of the network to provably lower re-identification risk. We show that, in comparison with other algorithms, our approach more accurately preserves important features of the network topology. Finally, we consider an alternative paradigm, in which the analyst can analyze private data through a carefully controlled query interface. We show that the degree sequence of a network can be accurately estimated under strong guarantees of privacy.

Advisors: Gerome Miklau & David Jensen