Inter-hospital brain tumour diagnostics using Private Federated Learning: An empirical analysis of convergence in a heterogeneous, non-IID setting and a theoretical review of privacy mechanisms

Type
Master's thesis
Programme
Complex adaptive systems (MPCAS), MSc
Published
2020
Author
Nyström, Lukas
Abstract
This study investigated whether high-performing brain tumour segmentation can be achieved with Deep Learning without breaching strict privacy regulations, such as GDPR, that govern the use of medical data. This was achieved using a novel technique called Federated Learning (FL), in which models are shared across institutions rather than the sensitive raw data. The aim was to develop an autonomous AI system that can help medical professionals diagnose patients in a more time- and resource-efficient way. Reducing the cost of treatment is a crucial first step towards more equal health care. To achieve this objective, extensive empirical experiments were carried out using state-of-the-art (SOTA) 3D ResNet U-Net models. The experiments were divided into three parts and used a data set comprising 3686 samples, making it almost ten times larger than the commonly used benchmark data set. First, a model was developed for the common, centralised setting. It achieved human-level performance and a median Dice score of 0.87. It performs well for all seven analysed subtypes of brain tumours, as well as on data collected from several different sources. The latter is a key finding, since previous studies have struggled with this because of the large inter-institution heterogeneity in data quality. The second experiment extended the centralised model to the federated setting. The data was distributed non-IID across five virtual hospitals. Each hospital first trained a local model on its own data, which led to only 68.2% of the benchmark performance. The five sites then trained a joint model using FL and the proposed novel technique, adaptive momentum, which was shown to improve on the current SOTA. This improved performance significantly, reaching as high as 88.6% of the conventional benchmark.
Finally, although FL does not share any raw data, this study highlights several other privacy vulnerabilities, as well as techniques for protecting the system against them. It is shown that, by using several layers of protection, it is possible to provide complete privacy without any significant loss of performance. The layers include Differential Privacy, Homomorphic Encryption, Shamir’s secret sharing, AES encryption and SHA-256 authentication. The study thus shows that it is indeed possible to reach human-level performance even in a federated, private scenario when the model is trained on non-IID and highly heterogeneous data.
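The abstract does not spell out how the sites combine their models. As a rough illustration only, the sketch below shows plain federated averaging (FedAvg), in which each site's weights are averaged in proportion to its sample count. It is not the adaptive momentum technique proposed in the thesis, and the function name, the weight layout and the two example hospitals are all hypothetical.

```python
from typing import Dict, List

# Assumed layout: parameter name -> flat list of values (hypothetical).
Weights = Dict[str, List[float]]


def federated_average(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Average per-site model weights, weighted by each site's sample count.

    This is plain FedAvg, shown for illustration; it is not the thesis's
    adaptive momentum method.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one sample count per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in site_weights[0]:
        length = len(site_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical example: two hospitals holding 100 and 300 samples.
hospital_a = {"layer": [1.0, 2.0]}
hospital_b = {"layer": [3.0, 6.0]}
global_weights = federated_average([hospital_a, hospital_b], [100, 300])
```

Only the weight values and the sample counts cross site boundaries here; the raw data stays local, which is the property the abstract relies on.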
Subject/keywords
Federated Learning, Brain Tumour Segmentation, Computer Aided Diagnostics, Privacy Preserving Machine Learning, Differential Privacy, Deep Learning