Interpreting Machine Learning Models using Conditional Counterfactual Generation
Type
Master's Thesis
Abstract
With the rapid development and application of complex machine learning models, the
need to interpret the internal processes of such models has become increasingly relevant.
In this thesis, a novel method for interpreting black box machine learning models is
proposed, where an autoencoder is used to generate reconstructions of data to visualize
in an interpretable way what patterns a model has learned to detect. The method is
first shown to work for a simple constructed problem, being able to interpret a model
that has learned to predict the mean of an underlying normal distribution from samples.
It is then evaluated for a more complex problem, where a model has learned to classify
the existence of disease in images from the CheXpert dataset of X-ray images. It is
demonstrated that naively implementing the method to interpret this model leads to the
autoencoder generating adversarial patterns to trick the model, instead of showing an
interpretable explanation of what the model has learned. To mitigate this issue, the thesis
explores adding an additional model in the latent space of the conditional autoencoder and
demonstrates that this can provide a certain degree of interpretability. Because of this,
the method shows promise for interpreting black box models and with further research it
might become viable for practical use.
Subject/keywords
machine learning, interpretability, autoencoders, counterfactual, chest X-ray images.
