Inter-hospital brain tumour diagnostics using Private Federated Learning: An empirical analysis of convergence in a heterogeneous, non-IID setting and a theoretical review of privacy mechanisms
Type
Master's thesis
Programme
Complex adaptive systems (MPCAS), MSc
Published
2020
Author
Nyström, Lukas
Model builder
Journal title
ISSN
Volume title
Publisher
Abstract
This study investigated whether high-performing brain tumour segmentation can be
achieved using Deep Learning without breaching strict privacy regulations, such as
GDPR, that govern the use of medical data. This was achieved using a novel
technique called Federated Learning (FL), in which models are shared across institutions
rather than the sensitive raw data. The aim was to develop an autonomous
AI system that can help medical professionals diagnose patients in a more time- and
resource-efficient way. Reducing the cost of treatment is a crucial first step towards
more equal health care.
To achieve this objective, extensive empirical experiments were carried out using
state-of-the-art (SOTA) 3D ResNet U-Net models. The experiments were divided into
three parts and used a data set comprising 3686 samples, making it almost ten times
larger than the commonly used benchmark data set. First, a model was developed to
work in the common, centralised setting. It achieved human-level performance, with a
median Dice score of 0.87. It performs well for all seven analysed subtypes of brain
tumours as well as on data collected from several different sources. The latter is
a key finding, since previous studies have struggled with this due to the large
inter-institution heterogeneity in terms of data quality. The second experiment
extended the centralised model to the federated setting. The data was distributed
non-IID across five virtual hospitals. Each hospital first trained a local model on
its own data, which led to only 68.2% of the benchmark performance. Then the
five sites trained a joint model using FL and the proposed novel technique, adaptive
momentum, which was shown to improve on the current SOTA. This improved performance
significantly, reaching as high as 88.6% of the conventional benchmark.
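The two quantities this paragraph relies on, the Dice score and the aggregation of locally trained models, can be illustrated with a minimal sketch. It shows the standard Dice coefficient on flattened binary masks and plain sample-weighted federated averaging; it is not the adaptive momentum technique proposed in the thesis, which this abstract does not specify. The function names, the dictionary layout of the parameters and the empty-mask convention are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A n B| / (|A| + |B|) for two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def federated_average(
    site_params: List[Dict[str, List[float]]],
    site_sizes: List[int],
) -> Dict[str, List[float]]:
    """Average per-site parameters, weighting each site by its sample count.

    This is plain weighted averaging; no momentum term is applied.
    """
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    averaged: Dict[str, List[float]] = {}
    for name in site_params[0]:
        length = len(site_params[0][name])
        averaged[name] = [
            sum(params[name][i] * n for params, n in zip(site_params, site_sizes)) / total
            for i in range(length)
        ]
    return averaged


if __name__ == "__main__":
    # Two hypothetical hospitals holding 1 and 3 samples respectively.
    sites = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(federated_average(sites, [1, 3]))
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

Weighting by sample count means that larger sites pull the joint model more strongly towards their own data, which is one reason non-IID data across sites makes convergence harder.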
Finally, although FL does not share any raw data, this study highlights several
other privacy vulnerabilities as well as techniques for protecting the system
against them. It is shown that, by using several layers of protection, it is possible to
provide complete privacy without any significant loss of performance. The layers include
Differential Privacy, Homomorphic Encryption, Shamir's secret sharing, AES
encryption and SHA-256 authentication. The study thus shows that it is indeed
possible to achieve human-level performance even in a federated, private scenario in
which the model is trained on non-IID and highly heterogeneous data.
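Of the protection layers listed above, Shamir's secret sharing is compact enough to sketch. The example below splits an integer secret into shares so that any `threshold` of them reconstruct it, while fewer reveal nothing about it. How the thesis combines this scheme with the other layers is not described in this abstract; the field prime, the function names and the integer encoding of the secret are assumptions made for illustration only.

```python
import secrets
from typing import List, Tuple

# Assumed field size for this sketch: the Mersenne prime 2**127 - 1.
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int, num_shares: int) -> List[Tuple[int, int]]:
    """Split `secret` into `num_shares` points on a random polynomial of
    degree `threshold - 1` whose constant term is the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct_secret(shares: List[Tuple[int, int]]) -> int:
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = split_secret(123456789, threshold=3, num_shares=5)
    # Any three of the five shares are enough to recover the secret.
    print(reconstruct_secret(shares[:3]) == 123456789)
```

In a five-hospital setting such as the one studied here, a threshold of three would let the scheme tolerate two sites dropping out while still preventing any two sites from reconstructing the secret on their own.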
Description
Subject/keywords
Federated Learning, Brain Tumour Segmentation, Computer Aided Diagnostics, Privacy Preserving Machine Learning, Differential Privacy, Deep Learning