Tackling Missing Values in Mass Spectrometry-based Proteomics Data

dc.contributor.author: Leonard, Louise
dc.contributor.department: Chalmers University of Technology / Department of Mathematical Sciences
dc.contributor.examiner: Kristiansson, Erik
dc.contributor.supervisor: van Zuydam, Natalie
dc.date.accessioned: 2021-03-10T20:31:10Z
dc.date.available: 2021-03-10T20:31:10Z
dc.date.issued: 2021
dc.date.submitted: 2020
dc.description.abstract: In the development of therapeutics, analysis of differentially abundant proteins (DAPs) using mass spectrometry (MS) is essential. However, MS-based data suffer from high rates of missing values that severely complicate downstream analyses. Various imputation methods have been proposed to deal with the missing data, but there is no standard protocol for selecting a method. Here we comprehensively evaluated common methods in order to develop a best practice for imputation that informs downstream statistical analyses of MS proteomics data. We compared the performance of five imputation methods applied to values missing completely at random (MCAR) and missing not at random (MNAR), introduced into data from the Cancer Cell Line Encyclopedia and into data simulated from a multivariate mixed-effects model, respectively. Performance was measured as the true positive rate (TPR) and false positive rate (FPR) of detected DAPs (p_adj < 0.05, |estimated log2 fold-change| ≥ 1, and an accuracy metric [&] 10^3). The FPR was below 5% for all methods under all conditions tested. When less than 10% of the data was missing, imputation did not increase the TPR compared with removing the missing values. For 30% missingness, irrespective of data or missingness type, the TPR was below 80%; for 50% missingness, the TPR was 25-75% depending on the imputation method. Since the FPR was controlled, no method introduced artefacts under any of the conditions tested. For large proportions of missingness (50%), we recommend Principal Component Analysis (PCA) imputation if the sample size is large (n ≥ 50). With small sample sizes (n = 10) or small proportions of missingness (10%), imputation is advised against.
dc.identifier.coursecode: MVEX03
dc.identifier.uri: https://hdl.handle.net/20.500.12380/302263
dc.language.iso: eng
dc.setspec.uppsok: PhysicsChemistryMaths
dc.subject: imputation, missing data, mass spectrometry, multivariate mixed-effects models, differential abundance, proteomics
dc.title: Tackling Missing Values in Mass Spectrometry-based Proteomics Data
dc.type.degree: Master's thesis
dc.type.uppsok: H
local.programme: Engineering mathematics and computational science (MPENM), MSc
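The abstract describes two missingness mechanisms (MCAR and MNAR) and scores DAP detection by TPR and FPR. Below is a minimal sketch of those ingredients. It assumes a proteins × samples matrix of log2 intensities.

- The helper names `add_mcar`, `add_mnar`, `tpr_fpr` and `is_dap` are hypothetical.
- The MNAR rule (masking the lowest intensities, i.e. left-censoring below a detection limit) is an illustrative assumption, not necessarily the thesis's procedure.
- The DAP thresholds mirror the abstract's first two criteria as reconstructed above. The accuracy-metric criterion is omitted.

```python
import numpy as np

def add_mcar(x, frac, rng):
    """MCAR: mask a fraction `frac` of all entries uniformly at random."""
    x = np.array(x, dtype=float)  # float copy so NaN can be stored
    n_missing = int(round(frac * x.size))
    idx = rng.choice(x.size, size=n_missing, replace=False)
    x.flat[idx] = np.nan
    return x

def add_mnar(x, frac):
    """Hypothetical MNAR rule: mask the lowest `frac` of all intensities,
    mimicking values that fall below an instrument detection limit."""
    x = np.array(x, dtype=float)
    n_missing = int(round(frac * x.size))
    order = np.argsort(x, axis=None)  # flat indices, lowest intensities first
    x.flat[order[:n_missing]] = np.nan
    return x

def is_dap(p_adj, log2fc, alpha=0.05, min_lfc=1.0):
    """Call a protein differentially abundant (assumed thresholds)."""
    return (np.asarray(p_adj) < alpha) & (np.abs(np.asarray(log2fc)) >= min_lfc)

def tpr_fpr(called, truth):
    """TPR and FPR of DAP calls against known ground truth (boolean per protein)."""
    called = np.asarray(called, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(called & truth)   # true DAPs that were detected
    fp = np.sum(called & ~truth)  # non-DAPs wrongly detected
    return tp / truth.sum(), fp / (~truth).sum()
```

For example, `add_mcar(x, 0.3, np.random.default_rng(0))` masks 30% of the entries of `x`. `tpr_fpr` would then be applied to the DAP calls obtained after imputing (or dropping) the masked values. The imputation and differential-abundance test themselves are out of scope for this sketch.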

Download

Original bundle: Master_s_thesis_Louise_Leonard.pdf (6.71 MB, Adobe Portable Document Format)