Entity Disambiguation in Anonymized Graphs Using Graph Kernels

Hermansson, Linus; Kerola, Tommi

Entity Disambiguation in Anonymized Graphs Using Graph Kernels

dc.contributor.author	Hermansson, Linus
dc.contributor.author	Kerola, Tommi
dc.contributor.department	Chalmers tekniska högskola / Institutionen för data- och informationsteknik (Chalmers)	sv
dc.contributor.department	Chalmers University of Technology / Department of Computer Science and Engineering (Chalmers)	en
dc.date.accessioned	2019-07-03T13:17:30Z
dc.date.available	2019-07-03T13:17:30Z
dc.date.issued	2013
dc.description.abstract	In recent years, the explosion of available online information has brought forth new data mining applications into the spotlight, such as automated querying about realworld entities. This requires extraction of identifiers such as names and places from text. The problem, however, is complicated by the non-uniqueness of identifiers. A motivating example is the name Chris Anderson, which could either refer to Chris Anderson, the curator of TED Talks, or Chris Anderson, the former editor-in-chief of WIRED Magazine. Both individuals work in overlapping fields, and deciding whom is referred to could be a difficult task, even when considering context. Correctly identifying and resolving such ambiguous identifiers is crucial for enabling such applications to advance from the research lab into practical usage. This master's thesis presents a novel method for entity disambiguation in anonymized graphs based on local neighborhood structure. Most existing approaches leverage node information, which might not be available in several contexts due to privacy concerns, or information about the sources of the data. We consider this problem in the supervised setting where we are provided only with a base graph and a set of nodes labeled as ambiguous or unambiguous. We characterize the similarity between two nodes based on their local neighborhood structure using graph kernels; and efficiently solve the resulting classification task using a support vector machine (SVM), a standard machine learning technique. Leveraging kernels is a powerful method for extending linear SVM classifiers to nonlinear classification tasks. Recently, a number of graph kernels have been proposed for classifying graph structures. In this thesis, we present extensions of two existing graphs kernels, namely, the direct product kernel and the shortest-path kernel, with significant improvements in accuracy. For the direct product kernel, our extension also provides significant computational benefits. A key concern today is scalability of algorithms to web-scale datasets. This poses new challenges for designing new machine learning methods. We use GraphLab, a framework for distributed computing, to allow our extended kernels to be computed in parallel. This ensures scalability and allows our method to handle large-size data. We test our method on two real-world datasets, comparing our approach to a stateof-the-art method. We show that using less information, our method is significantly better in terms of either speed or accuracy or both.
dc.identifier.uri	https://hdl.handle.net/20.500.12380/184373
dc.language.iso	eng
dc.setspec.uppsok	Technology
dc.subject	Data- och informationsvetenskap
dc.subject	Informations- och kommunikationsteknik
dc.subject	Computer and Information Science
dc.subject	Information & Communication Technology
dc.title	Entity Disambiguation in Anonymized Graphs Using Graph Kernels
dc.type.degree	Examensarbete för masterexamen	sv
dc.type.degree	Master Thesis	en
dc.type.uppsok	H
local.programme	Computer science – algorithms, languages and logic (MPALG), MSc

Ladda ner

Original bundle

Visar 1 - 1 av 1

Namn:: 184373.pdf
Size:: 1.06 MB
Format:: Adobe Portable Document Format
Description:: Fulltext

Ladda ner

Samlingar

Examensarbeten för masterexamen