Graph Classification with Differential Privacy
Examensarbete för masterexamen
Complex adaptive systems (MPCAS), MSc
Thufvesson Retzner, Carl
With increasing usage of online services such as email, social networks and online shopping more data than ever before is being gathered. Many types of data for which structure, flow and relationships are important may naturally be represented by graphs. Services may want to use machine learning algorithms to gain insight about their customers or users, and to successfully use the concepts of machine learning they must take the structure and properties of graph data into account. Seemingly innocuous data could potentially be used to infer sensitive information about individuals and should be kept private, the algorithms and techniques used must therefore take privacy into consideration. In this thesis we consider the combination of machine learning and privacy by bringing together the concepts of support vector machines and differential privacy. We examine the classification of graphs by means of kernel methods and present a framework for constructing private representations of two well known graph kernels, the random walk kernel and graphlet kernels. Furthermore, to allow for classification of large graphs we present a novel sampling scheme for approximation of subgraph counts. We evaluate both sampling and classification using four real world datasets consisting of social, road and protein networks.
Data- och informationsvetenskap , Computer and Information Science