Machine learning based warning system for failed procurement classification documents
Master's thesis
In machine learning, a warning system is a tool that generates a warning based on a model’s prediction results. This thesis aims to create such a system to identify potentially problematic procurement classification documents. A dataset was created from a company’s database, and a feature analysis was carried out to investigate which properties of a document can cause either a classification error or a formatting error. The most challenging part of the research was the feature engineering, since each feature had to be preprocessed differently depending on the importance of the information it contained. Several supervised machine learning methods were then implemented, and their hyperparameters were tuned using grid search. After the models were evaluated and compared, the XGBoost classifier was found to be the most successful in terms of both performance and computation time, achieving 90.5% accuracy. Gathering more data, especially documents containing formatting errors, is expected to further improve the performance of the XGBoost-based warning system.
Warning system, supervised learning, machine learning, feature engineering, XGBoost Classifier
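
As a rough illustration of the grid search step mentioned in the abstract, the sketch below tries every combination of hyperparameters on a held-out validation set and then flags documents that the best model predicts as problematic. It is not the thesis implementation: a small k-nearest-neighbours classifier stands in for XGBoost so that the example needs only the Python standard library, a single validation split replaces whatever evaluation scheme the thesis used, and the data, the document ids and all function names (`grid_search`, `warn`, and so on) are hypothetical.

```python
from itertools import product

# Hypothetical toy data, not taken from the thesis.
# Each row is (feature vector, label); label 1 = problematic document.
TRAIN = [
    ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.3), 0), ((0.2, 0.4), 0),
    ((0.8, 0.9), 1), ((0.9, 0.7), 1), ((0.7, 0.8), 1), ((0.9, 0.9), 1),
]
VALID = [((0.15, 0.25), 0), ((0.25, 0.2), 0), ((0.85, 0.8), 1), ((0.75, 0.9), 1)]


def distance(a, b, p):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train, x, k, p):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda row: distance(row[0], x, p))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, data, k, p):
    """Fraction of rows in data whose label is predicted correctly."""
    correct = sum(knn_predict(train, x, k, p) == y for x, y in data)
    return correct / len(data)


def grid_search(train, valid, grid):
    """Evaluate every combination in the grid; return best params and score."""
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = accuracy(train, valid, **params)
        if score > best_score:  # strict '>' keeps the first of tied candidates
            best_params, best_score = params, score
    return best_params, best_score


def warn(train, documents, params):
    """Return the ids of documents predicted as problematic."""
    return [doc_id for doc_id, x in documents
            if knn_predict(train, x, **params) == 1]


best_params, best_score = grid_search(TRAIN, VALID, {"k": [1, 3, 5], "p": [1, 2]})
flagged = warn(TRAIN, [("doc-a", (0.2, 0.2)), ("doc-b", (0.8, 0.85))], best_params)
print(best_params, best_score, flagged)
```

With a real model, the same loop would typically be replaced by a library routine that also performs cross-validation; the exhaustive enumeration of the parameter grid is the part that defines grid search.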