Weakly Supervised Deep Learning Classification
Published
Author
Type
Master's thesis
Model builder
Journal title
ISSN
Volume title
Publisher
Abstract
The use of increasingly large and complex datasets is rapidly gaining traction
within healthcare and the life sciences. Handling these datasets calls for more sophisticated
methods, and a key one is artificial intelligence (AI). There are
numerous examples of successful applications of AI in healthcare, especially in diagnostic
disciplines, e.g., automatic analysis of X-ray images, treatment recommendations,
and adherence monitoring [25]. In some of these disciplines, AI has been
shown to outperform humans. AI is therefore receiving more and
more attention as a way to increase efficiency and safety in healthcare.
A key hindrance to the adoption of such systems is the large quantity of labeled
data required to train deep learning models. One proposed method of overcoming
this annotation bottleneck is weak supervision, or data programming, where the data
annotation is done using labeling functions. These labeling functions encode the
annotator's expert domain knowledge, and statistical models combine their outputs into
“denoised”, probabilistic labels that can be used to train deep learning algorithms
without ground truth labels provided by an expert annotator.
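To make the idea of labeling functions concrete, the following is a minimal sketch in Python and is not the implementation used in the thesis. The concept ("diabetes"), the keyword rules, and the per-function accuracies are all hypothetical. In data programming the accuracies would be estimated by a statistical label model from the agreements and disagreements between labeling functions; here they are fixed by hand and combined under a simple independence assumption with a balanced class prior.

```python
import math

# Vote values: a labeling function may abstain when its rule does not apply.
ABSTAIN, NEGATIVE, POSITIVE = 0, -1, 1


def lf_mentions_diabetes(note):
    """Hypothetical rule: the concept name appears in the note."""
    return POSITIVE if "diabetes" in note.lower() else ABSTAIN


def lf_negated_diabetes(note):
    """Hypothetical rule: the concept is explicitly negated."""
    text = note.lower()
    negated = "no history of diabetes" in text or "denies diabetes" in text
    return NEGATIVE if negated else ABSTAIN


def lf_mentions_insulin(note):
    """Hypothetical rule: a related treatment is mentioned."""
    return POSITIVE if "insulin" in note.lower() else ABSTAIN


# (labeling function, assumed accuracy). These accuracies are made up for
# illustration; a label model would estimate them without ground truth.
LABELING_FUNCTIONS = [
    (lf_mentions_diabetes, 0.7),
    (lf_negated_diabetes, 0.9),
    (lf_mentions_insulin, 0.8),
]


def probabilistic_label(note):
    """Combine the votes into P(concept present | votes).

    Each non-abstaining vote adds or subtracts the log-odds of its assumed
    accuracy; abstentions contribute nothing.
    """
    log_odds = 0.0
    for labeling_function, accuracy in LABELING_FUNCTIONS:
        vote = labeling_function(note)
        log_odds += vote * math.log(accuracy / (1.0 - accuracy))
    return 1.0 / (1.0 + math.exp(-log_odds))
```

A note on which every function abstains receives a label of 0.5, while conflicting votes are resolved in favor of the function with the higher assumed accuracy.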
This thesis investigates weak supervision for concept classification from
electronic health records. We describe the development of a distant supervision
method, where the external medical database MeSH is used to create labeling functions
for different phenotypes (concepts) from the MIMIC-III database [20]. These
labeling functions are then used to create probabilistic labels on which a few different
deep learning models are trained. A deep CNN model trained on the probabilistic
labels from the labeling functions achieves an F1-score of 0.93 on the test set and is
clearly able to generalize beyond the probabilistic labels it is trained on. We
conclude that weak supervision is a promising approach for NLP problems
within the medical field and could potentially drastically decrease the need for
expert annotations, which are both time-consuming and expensive.
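Training on probabilistic labels usually means replacing hard 0/1 targets with soft ones in the loss. The sketch below shows one common choice, binary cross-entropy with soft targets. It is an assumption made for illustration: the abstract does not state which loss the thesis models use, and the function name is hypothetical.

```python
import math


def soft_binary_cross_entropy(predicted, soft_labels, eps=1e-12):
    """Mean binary cross-entropy where each target is a probability in [0, 1].

    predicted   -- model outputs, P(concept present) for each example
    soft_labels -- probabilistic labels produced by the labeling functions
    """
    total = 0.0
    for p, y in zip(predicted, soft_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(predicted)
```

With a soft target of 0.9, this loss is lowest when the model also predicts 0.9, so an overconfident prediction such as 0.99 is penalized rather than rewarded.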
Description
Subject/keywords
weak supervision, deep learning, machine learning, NLP