Locating faulty data in an harvested database - Extending a Metadata language with support for semantic rules to find erroneous data in a vast and incomplete database

Type

Master's thesis

Abstract

This thesis deals with the task of finding erroneous entries in a large database whose content has been collected automatically by scanning different sources on the World Wide Web. The information is divided into events, which are organized into event classes. As part of the thesis work, a language for describing semantic and structural rules on the information has been designed as an extension to the database's existing Metadata language. A set of rules describing the extended demands has been written in this language. A tool that tests the information in the database against rules written in the extended language has also been implemented. The evaluation reports not only whether an entry fails a rule but also which part of the entry breaks it. This information is stored in a database for further analysis and use. Subsets of the database have been checked, and in these tests about five percent of the events did not fulfil all of the rules defined for their event class.
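As a rough illustration of the kind of check described in the abstract, the following Python sketch tests events against per-class rules and records which field broke which rule. It is not the thesis's rule language or tool: the event class `concert`, the field names, the rule names, and the `Rule`/`Violation` types are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    name: str                               # hypothetical rule identifier
    field: str                              # the part of the entry the rule is about
    check: Callable[[dict[str, Any]], bool]  # returns True when the rule holds


@dataclass(frozen=True)
class Violation:
    event_id: str
    rule: str
    field: str


# Hypothetical rules for a hypothetical "concert" event class.
RULES: dict[str, list[Rule]] = {
    "concert": [
        # Structural rule: a non-empty title must be present.
        Rule("has_title", "title", lambda e: bool(e.get("title"))),
        # Semantic rule: if both dates are known, the end must not precede
        # the start. Missing dates are tolerated (incomplete data).
        Rule(
            "end_not_before_start",
            "end",
            lambda e: "start" not in e or "end" not in e or e["start"] <= e["end"],
        ),
    ],
}


def check_event(event: dict[str, Any]) -> list[Violation]:
    """Return one Violation per rule of the event's class that does not hold."""
    violations = []
    for rule in RULES.get(event["class"], []):
        if not rule.check(event):
            violations.append(Violation(event["id"], rule.name, rule.field))
    return violations


# Made-up sample entries; dates are ISO strings, so string comparison orders them.
events = [
    {"id": "e1", "class": "concert", "title": "Jazz night",
     "start": "2010-05-01", "end": "2010-05-02"},
    {"id": "e2", "class": "concert", "title": "",
     "start": "2010-06-03", "end": "2010-06-01"},
]

# Collected violations could then be stored for later analysis.
report = [v for e in events for v in check_event(e)]
for v in report:
    print(f"{v.event_id}: rule '{v.rule}' broken by field '{v.field}'")
```

Attaching a field name to each rule is one simple way to report what part of an entry is at fault; the actual tool may determine this differently.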

Subject/keywords

Computer and Information Science, Information & Communication Technology
