Locating faulty data in an harvested database - Extending a Metadata language with support for semantic rules to find erroneous data in a vast and incomplete database

Type
Master's thesis
Published
2012
Authors
Gundberg, Per
Steen Timle, Joel
Abstract
This thesis deals with the task of finding erroneous entries in a large database whose content has been automatically collected by scanning different sources on the World Wide Web. The information is divided into events, which are organized into event classes. As part of the thesis work, a language for describing semantic and structural rules on the information has been designed as an extension to the existing Metadata language of the database. A set of rules describing these extended demands has been written in this language. A tool that tests the information in the database against rules written in the extended language has also been implemented. The evaluation reports not only whether an entry fails a rule, but also which part of the entry breaks it. This information is stored in a database for further analysis and use. Subsets of the database have been checked, and in these tests about five percent of the events did not fulfill all of the rules defined for their event class.
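As a rough illustration of the kind of check the abstract describes, the sketch below validates one event against the rules of its event class and records which field breaks which rule. It is not the thesis's rule language or tool: the event class "concert", the field names, the rule names, and the helper check_event are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Rule:
    name: str                      # hypothetical rule identifier
    field: str                     # the part of the entry the rule inspects
    check: Callable[[Any], bool]   # returns True when the value is acceptable


@dataclass
class Violation:
    rule: str
    field: str
    value: Any


# Hypothetical rules, grouped by event class (names are assumptions).
RULES = {
    "concert": [
        Rule("has_title", "title",
             lambda v: isinstance(v, str) and v.strip() != ""),
        Rule("year_plausible", "year",
             lambda v: isinstance(v, int) and 1900 <= v <= 2100),
    ],
}


def check_event(event_class: str, event: dict) -> list:
    """Return one Violation per broken rule, naming the offending field."""
    violations = []
    for rule in RULES.get(event_class, []):
        value = event.get(rule.field)  # a missing field yields None
        if not rule.check(value):
            violations.append(Violation(rule.name, rule.field, value))
    return violations


# A harvested entry with a mistyped year.
event = {"title": "Spring concert", "year": 20012}
violations = check_event("concert", event)
for v in violations:
    print(f"rule {v.rule!r} broken by field {v.field!r} (value {v.value!r})")
```

Returning a list of violations rather than a single pass/fail flag mirrors the abstract's point that the report should say which part of the entry is at fault, so the results can be stored and analyzed later.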
Subject/keywords
Computer and Information Science, Information & Communication Technology