Feature selection in an industrial data set

dc.contributor.author: Andreasson, Philip
dc.contributor.department: Chalmers tekniska högskola / Institutionen för fysik [sv]
dc.contributor.examiner: Granath, Mats
dc.contributor.supervisor: Stormig, Jimmy
dc.date.accessioned: 2020-01-08T12:39:07Z
dc.date.available: 2020-01-08T12:39:07Z
dc.date.issued: 2019 [sv]
dc.date.submitted: 2019
dc.description.abstract: Feature selection is a technique for reducing the dimensionality of data sets, which can provide benefits in terms of computational time, performance and interpretability. This thesis presents the development of a genetic algorithm for feature selection in an industrial data set on investigations, where a large proportion of the features are categorical. The genetic algorithm is designed to always select one-hot encoded categorical features as a group. The quality of a proposed feature subset was assessed using Naive Bayes classifiers, decision trees, artificial neural networks, support vector machines and logistic regression classifiers. The classification performance of the subsets obtained from the genetic algorithm was further compared to that of subsets obtained from stepwise forward selection, Relief, LASSO and random forests. The results showed that the dimensionality of the data set could be reduced drastically while maintaining good classification accuracy. The most significant results were obtained for the Naive Bayes classifier, where the genetic algorithm and stepwise forward selection produced subsets whose prediction performance significantly exceeded that of both the full data set and the subsets from the other feature selection algorithms. For the other classifiers, the differences were smaller. Given the extensive time required to run the genetic algorithm and stepwise forward selection, the other feature selection algorithms are a better choice for these classifiers. [sv]
dc.identifier.coursecode: TIFX05 [sv]
dc.identifier.uri: https://hdl.handle.net/20.500.12380/300642
dc.language.iso: eng [sv]
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: feature selection [sv]
dc.subject: genetic algorithms [sv]
dc.subject: categorical features [sv]
dc.title: Feature selection in an industrial data set [sv]
dc.type.degree: Examensarbete för masterexamen (Master's thesis) [sv]
dc.type.uppsok: H
local.programme: Complex adaptive systems (MPCAS), MSc
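
The abstract states that the genetic algorithm always selects one-hot encoded categorical features as a group. The record does not say how the thesis implements this, so the following is only a minimal sketch of one possible approach: the chromosome holds one gene per original feature, and each gene is expanded to all of its encoded columns. The feature names, column indices and function names are hypothetical and chosen for illustration.

```python
import random

# Hypothetical layout (not from the thesis): each original feature maps to the
# column indices it occupies after encoding. A numeric feature has one column,
# a one-hot encoded categorical feature has several.
FEATURE_GROUPS = {
    "temperature": [0],
    "machine_type": [1, 2, 3],
    "pressure": [4],
    "shift": [5, 6],
}


def expand(chromosome, groups=FEATURE_GROUPS):
    """Turn a per-feature chromosome into a sorted list of selected columns."""
    columns = []
    for gene, name in zip(chromosome, groups):
        if gene:
            columns.extend(groups[name])
    return sorted(columns)


def crossover(a, b, rng):
    """Single-point crossover on the per-feature genes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


rng = random.Random(0)
parent_a = [1, 1, 0, 0]
parent_b = [0, 0, 1, 1]
child = mutate(crossover(parent_a, parent_b, rng), rate=0.25, rng=rng)
selected = expand(child)  # column indices to pass to a classifier
```

Because crossover and mutation act on per-feature genes rather than on individual columns, no operator can select only part of a one-hot group. The fitness of a chromosome would then be the performance of a classifier trained on the `selected` columns.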
Download
Original bundle
Name: Philip_Andreasson_Master_Thesis.pdf
Size: 1.04 MB
Format: Adobe Portable Document Format