Developing a Cooperative Data Cleaning Tool
Type
Master's thesis
Program
Engineering mathematics and computational science (MPENM), MSc
Published
2021
Author
Chatterjee, Devosmita
Abstract
Organizations today generate large amounts of data that drive their business decisions. This data
is often inconsistent, inaccurate and incomplete, and poor data quality can lead organizations to
make incorrect decisions that harm them. High-quality data is therefore a priority for sound business
decisions and strategies. Data cleaning is the principal means of addressing data quality issues,
but it is a time-consuming task. This calls for data cleaning tools that systematically examine
data for errors and clean them automatically using algorithms. Such tools help organizations save
time and work more efficiently.
In this thesis, we develop 'DataCleaningTool', a cooperative, free and open-source standalone data
cleaning application. The tool identifies potential data problems and reports them so that users
can make informed decisions and clean their data effectively.
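The abstract does not describe how DataCleaningTool detects problems. Purely as an illustration, the sketch below shows one common way a tool of this kind could report two of the problems named in the keywords: it counts missing entries and flags outliers with Tukey's 1.5 × IQR rule, a standard technique chosen here as an assumption rather than the thesis's method. The function names `profile_column` and `profile_table`, and the encoding of missing values as `None`, are hypothetical.

```python
from statistics import quantiles


def profile_column(values):
    """Report missing entries and IQR-based outliers for one numeric column.

    Assumption for this sketch: missing values are encoded as None.
    """
    present = [v for v in values if v is not None]
    report = {"missing": len(values) - len(present), "outliers": []}
    # Require a few observations so the quartile estimates are meaningful.
    if len(present) >= 4:
        q1, _, q3 = quantiles(present, n=4)  # first, second and third quartiles
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's fences
        report["outliers"] = [v for v in present if v < low or v > high]
    return report


def profile_table(table):
    """Profile every column of a table given as {column name: list of values}."""
    return {name: profile_column(col) for name, col in table.items()}


if __name__ == "__main__":
    data = {
        "age": [23, 25, None, 27, 24, 26, 250, None],
        "income": [3000, 3200, 3100, None, 2900, 3050],
    }
    for column, result in profile_table(data).items():
        print(column, result)
```

A report like this only flags candidates; in the cooperative spirit described above, the user would still decide whether a flagged value such as an age of 250 is an error to fix or a genuine observation.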
Subject/keywords
Data Cleaning, Noisy Data, Missing Data, MissForest Method, Outliers, Data Transformation, Interactive Data Visualization