A machine learning approach for predicting bacteria content in drinking water

Type

Master's Thesis

Abstract

The current method for determining whether drinking water contains bacterial contamination is slow: it can take up to eight days before results are obtained. During this time, a significant proportion of the population may have contracted diseases from contaminated water. To help mitigate this, the thesis aimed to understand whether machine learning could be a promising method for forecasting the bacteria level and how such a model could be designed. The project was performed in association with a case company, Nocoli, which is spun out of Chalmers Ventures and wanted the potential implementation examined. A literature review covering eight case studies of how machine learning has previously been applied in the field was conducted, together with three semi-structured interviews with industry-specific stakeholders. This methodology was chosen because the study required both an overview of the current industry situation and an assessment of machine learning's applicability. Moreover, using theory extracted from the literature on which machine learning algorithms suit different objectives, the case studies were evaluated to find patterns that could meet the case company's demands. It was found that machine learning is promising and desired in the industry as a way to improve current operations. The Random Forest algorithm was recommended for the initial stage because of its trade-off between accuracy and interpretability. Data on bacterial content and other factors, including weather, was intended as the data source. The recommendation included a 3:1:1 split between training, validation, and test sets, as well as the use of a recursive feature selection algorithm. Additionally, a combination of error measures was recommended, including Mean Squared Error supplemented by an out-of-bag error estimate to reduce overfitting.
Furthermore, although no data could be obtained to evaluate the recommended model, it was concluded that machine learning could have a positive impact on today’s approach and contribute to improved water management and safety by enabling reliable forecasts.
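
As an illustration of the recommended 3:1:1 split and the Mean Squared Error measure, the following is a minimal sketch using only the Python standard library. It is not the thesis's implementation: the function names `split_3_1_1` and `mean_squared_error`, the fixed seed, and the placeholder rows are assumptions made for the example, and no real water-quality data is used. For time-ordered measurements, a chronological split may be more appropriate than the random shuffle shown here.

```python
import random


def split_3_1_1(rows, seed=0):
    """Shuffle rows and split them 3:1:1 (60/20/20) into train, validation, test.

    Hypothetical helper for illustration; the seed makes the shuffle repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = (3 * n) // 5          # three fifths for training
    n_val = n // 5                  # one fifth for validation
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder (about one fifth) for testing
    return train, val, test


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Placeholder sample identifiers standing in for real measurements (assumption).
rows = list(range(100))
train, val, test = split_3_1_1(rows)
print(len(train), len(val), len(test))
```

In this sketch the validation set would be used for choosing model settings and the test set held back for a final error estimate; a Random Forest's out-of-bag error could complement both, as the abstract suggests.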

Keywords

machine learning, forecasting, drinking water quality, contaminated water, drinking water treatment, Escherichia coli prediction, HPC method, Random Forest
