Reference Citation Analysis

For: Parker SJ, Chen L, Spivia W, Saylor G, Mao C, Venkatraman V, Holewinski RJ, Mastali M, Pandey R, Athas G, Yu G, Fu Q, Troxlair D, Vander Heide R, Herrington D, Van Eyk JE, Wang Y. Identification of Putative Early Atherosclerosis Biomarkers by Unsupervised Deconvolution of Heterogeneous Vascular Proteomes. J Proteome Res 2020;19:2794-2806. [PMID: 32202800 DOI: 10.1021/acs.jproteome.0c00118] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/18/2022]

Cited by Other Article(s)

Plubell DL, Huang E, Spencer SE, Poston KL, Montine TJ, MacCoss MJ. Data Independent Acquisition to Inform the Development of Targeted Proteomics Assays Using a Triple Quadrupole Mass Spectrometer. J Proteome Res 2025. [PMID: 40328514 DOI: 10.1021/acs.jproteome.5c00016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2025]

Xu S, Chuang CY, Hawkins CL, Hägglund P, Davies MJ. Quantitative analysis of the proteome and protein oxidative modifications in primary human coronary artery endothelial cells and associated extracellular matrix. Redox Biol 2025;81:103524. [PMID: 39954365 PMCID: PMC11875191 DOI: 10.1016/j.redox.2025.103524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2024] [Revised: 01/28/2025] [Accepted: 01/30/2025] [Indexed: 02/17/2025]

Abstract

Vascular endothelial cells (ECs) play a key role in physiology by controlling arterial contraction and relaxation, and molecular transport. EC dysfunction is associated with multiple pathologies. Here, we characterize the cellular and extracellular matrix (ECM) proteomes of primary human coronary artery ECs from multiple donors, and the oxidation/nitration products formed on these during cell culture, using liquid chromatography-mass spectrometry. In total, ∼9900 proteins were identified in cells from 3 donors, with ∼7000 proteins per donor. Of these, ∼5300 were consistently identified, indicating some heterogeneity across the donors, with age a possible cause. Multiple endogenous oxidation products were detected on both ECM and cellular proteins (particularly endoplasmic reticulum species). In contrast, nitration was mostly detected on cell proteins, particularly cytoskeletal proteins, consistent with intracellular generation of nitrating agents, possibly from endothelial nitric oxide synthase (eNOS) or peroxidase enzymes. The modifications are ascribed to both physiological enzymatic activity (hydroxylation at proline/lysine; predominantly on ECM proteins and especially collagens) and the formation of reactive species (oxidation at tryptophan/tyrosine/histidine; nitration at tryptophan/tyrosine). The identified sites are present on a limited number of peptides (104 oxidized; 23 nitrated) from a modest number of proteins. A small number of proteins were detected with multiple modifications, consistent with these being selective and specific targets. Several nitrated peptides were consistently detected across all donors, and also in human smooth muscle cells, suggesting that these are major targets in the vascular proteome. These data provide a 'background' proteome dataset for studies of endothelial dysfunction in disease.

Du D, Bhardwaj S, Lu Y, Wang Y, Parker SJ, Zhang Z, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. Embracing the informative missingness and silent gene in analyzing biologically diverse samples. Sci Rep 2024;14:28265. [PMID: 39550430 PMCID: PMC11569126 DOI: 10.1038/s41598-024-78076-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 10/28/2024] [Indexed: 11/18/2024]

Du D, Bhardwaj S, Lu Y, Wang Y, Parker SJ, Zhang Z, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. ABDS: a bioinformatics tool suite for analyzing biologically diverse samples. Research Square 2024:rs.3.rs-4419408. [PMID: 38853832 PMCID: PMC11160903 DOI: 10.21203/rs.3.rs-4419408/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]

Nguyen H, Nguyen H, Tran D, Draghici S, Nguyen T. Fourteen years of cellular deconvolution: methodology, applications, technical evaluation and outstanding challenges. Nucleic Acids Res 2024;52:4761-4783. [PMID: 38619038 PMCID: PMC11109966 DOI: 10.1093/nar/gkae267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 03/01/2024] [Accepted: 04/02/2024] [Indexed: 04/16/2024]

Wu CT, Du D, Chen L, Dai R, Liu C, Yu G, Bhardwaj S, Parker SJ, Zhang Z, Clarke R, Herrington DM, Wang Y. CAM3.0: determining cell type composition and expression from bulk tissues with fully unsupervised deconvolution. Bioinformatics 2024;40:btae107. [PMID: 38407991 PMCID: PMC10924278 DOI: 10.1093/bioinformatics/btae107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 01/13/2024] [Accepted: 02/25/2024] [Indexed: 02/28/2024]

Lorentzen LG, Yeung K, Eldrup N, Eiberg JP, Sillesen HH, Davies MJ. Proteomic analysis of the extracellular matrix of human atherosclerotic plaques shows marked changes between plaque types. Matrix Biol Plus 2024;21:100141. [PMID: 38292008 PMCID: PMC10825564 DOI: 10.1016/j.mbplus.2024.100141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 01/01/2024] [Accepted: 01/04/2024] [Indexed: 02/01/2024]

Abstract

Cardiovascular disease is the leading cause of death, with atherosclerosis the major underlying cause. While often asymptomatic for decades, atherosclerotic plaque destabilization and rupture can arise suddenly and cause acute arterial occlusion or peripheral embolization resulting in myocardial infarction, stroke and lower limb ischaemia. As extracellular matrix (ECM) remodelling is associated with plaque instability, we hypothesized that the ECM composition would differ between plaques. We analyzed atherosclerotic plaques obtained from 21 patients who underwent carotid surgery following recent symptomatic carotid artery stenosis. Plaques were solubilized using a new efficient, single-step approach. Solubilized proteins were digested to peptides, and analyzed by liquid chromatography-mass spectrometry using data-independent acquisition. Identification and quantification of 4498 plaque proteins was achieved, including 354 ECM proteins, with unprecedented coverage and high reproducibility. Multidimensional scaling analysis and hierarchical clustering indicate two distinct clusters, which correlate with macroscopic plaque morphology (soft/unstable versus hard/stable), ultrasound classification (echolucent versus echogenic) and the presence of hemorrhage/ulceration. We identified 714 proteins with differential abundances between these groups. Soft/unstable plaques were enriched in proteins involved in inflammation, ECM remodelling, and protein degradation (e.g. matrix metalloproteinases, cathepsins). In contrast, hard/stable plaques contained higher levels of ECM structural proteins (e.g. collagens, versican, nidogens, biglycan, lumican, proteoglycan 4, mineralization proteins). These data indicate that a single-step proteomics method can provide unique mechanistic insights into ECM remodelling and inflammatory mechanisms within plaques that correlate with clinical parameters, and help rationalize plaque destabilization. 
These data also provide an approach towards identifying biomarkers for individualized risk profiling of atherosclerosis.

Momenzadeh A, Kreimer S, Guo D, Ayres M, Berman D, Chyu KY, Shah PK, Milewicz D, Azizzadeh A, Meyer JG, Parker S. Differentiation between Descending Thoracic Aortic Diseases using Machine Learning and Plasma Proteomic Signatures. bioRxiv 2023:2023.04.26.538468. [PMID: 37162892 PMCID: PMC10168345 DOI: 10.1101/2023.04.26.538468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]

Abstract

Background

Descending thoracic aortic aneurysms and dissections can go undetected until severe and catastrophic, and few clinical indices exist to screen for aneurysms or predict risk of dissection.

Methods

This study generated a plasma proteomic dataset from 75 patients with descending type B dissection (Type B) and 62 patients with descending thoracic aortic aneurysm (DTAA). Standard statistical approaches were compared to supervised machine learning (ML) algorithms to distinguish Type B from DTAA cases. Quantitatively similar proteins were clustered based on linkage distance from hierarchical clustering, and ML models were trained with uncorrelated protein lists across various linkage distances, with hyperparameter optimization using 5-fold cross-validation. Permutation importance (PI) was used to rank the most important predictor proteins for ML classification between disease states, and the proteins among the top 10 PI protein groups were submitted for pathway analysis.
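
The Methods paragraph combines three steps: grouping correlated proteins by cutting a hierarchical clustering at a linkage distance, training on one protein per group, and ranking inputs by permutation importance. The sketch below shows those mechanics on toy data with only the Python standard library (3.8+ for `statistics.fmean`). It is not the authors' pipeline: the correlation distance 1 - |r|, single linkage, the nearest-centroid classifier (swapped in for their tuned Support Vector Classifier), all function names, and the omission of the train/test split and 5-fold cross-validation are assumptions made to keep it short.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def single_linkage_groups(features, threshold):
    """Cluster features by correlation distance d = 1 - |r|, cut at `threshold`.

    Cutting a single-linkage dendrogram at height t yields the connected
    components of the graph linking every pair with d <= t, so a union-find
    over those pairs is sufficient.
    """
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if 1 - abs(pearson(features[i], features[j])) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def fit_centroids(X, y):
    """Nearest-centroid classifier: per-class mean of each feature column."""
    return {
        label: [statistics.fmean(col) for col in zip(*[x for x, t in zip(X, y) if t == label])]
        for label in sorted(set(y))
    }


def accuracy(model, X, y):
    def predict(x):
        return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, col, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling column `col` across samples."""
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[col] for row in X]
        rng.shuffle(column)
        permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, column)]
        drops.append(base - accuracy(model, permuted, y))
    return statistics.fmean(drops)


# Toy data: one informative protein, a near-copy of it, and pure noise.
rng = random.Random(1)
y = [0] * 20 + [1] * 20
signal = [t + rng.gauss(0, 0.3) for t in y]
echo = [s + rng.gauss(0, 0.05) for s in signal]
noise = [rng.gauss(0, 1) for _ in y]
features = [signal, echo, noise]

groups = single_linkage_groups(features, threshold=0.1)
reps = [g[0] for g in groups]  # keep one protein per correlated group
X = [[features[r][i] for r in reps] for i in range(len(y))]
model = fit_centroids(X, y)
imp_signal = permutation_importance(model, X, y, col=0)
imp_noise = permutation_importance(model, X, y, col=1)
print(groups, round(imp_signal, 2), round(imp_noise, 2))
```

Here the near-duplicate protein is absorbed into the informative one's group, and shuffling the informative column costs far more accuracy than shuffling noise. In practice permutation importance is usually computed on held-out data so that it reflects generalization rather than training fit; scikit-learn offers `sklearn.inspection.permutation_importance` for fitted estimators.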

Results

Of the 1,549 peptides and 198 proteins used in this study, no peptides and only one protein, hemopexin (HPX), were significantly different at an adjusted p-value <0.01 between Type B and DTAA cases. The highest-performing model on the training set (Support Vector Classifier) and its corresponding linkage distance (0.5) were used for evaluation of the test set, yielding a precision-recall area under the curve of 0.7 for classifying Type B versus DTAA cases. The five proteins with the highest PI scores were immunoglobulin heavy variable 6-1 (IGHV6-1), lecithin-cholesterol acyltransferase (LCAT), coagulation factor 12 (F12), HPX, and immunoglobulin heavy variable 4-4 (IGHV4-4). All proteins from the top 10 most important correlated groups yielded the following significantly enriched pathways in the plasma of Type B versus DTAA patients: complement activation, humoral immune response, and blood coagulation.

Conclusions

We conclude that ML may be useful in differentiating the plasma proteome of highly similar disease states that would otherwise not be distinguishable using statistics, and, in such cases, ML may enable prioritizing important proteins for model prediction.

Affiliation(s)

Amanda Momenzadeh Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California, USA Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California, USA Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
Simion Kreimer Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California, USA Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
Dongchuan Guo Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center, Houston, Texas
Matthew Ayres Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
Daniel Berman Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA Cedars-Sinai Imaging Department, Cedars Sinai Medical Center, Los Angeles, California, USA
Kuang-Yuh Chyu Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
Prediman K Shah Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
Dianna Milewicz Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center, Houston, Texas
Ali Azizzadeh Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
Jesse G. Meyer Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California, USA Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California, USA Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
Sarah Parker Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California, USA Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA Department of Biomedical Sciences, Cedars Sinai Medical Center, Los Angeles, California, USA

Hasman M, Mayr M, Theofilatos K. Uncovering Protein Networks in Cardiovascular Proteomics. Mol Cell Proteomics 2023;22:100607. [PMID: 37356494 PMCID: PMC10460687 DOI: 10.1016/j.mcpro.2023.100607] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/14/2022] [Revised: 05/01/2023] [Accepted: 06/20/2023] [Indexed: 06/27/2023]

Du D, Bhardwaj S, Parker SJ, Cheng Z, Zhang Z, Lu Y, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. ABDS: tool suite for analyzing biologically diverse samples. bioRxiv 2023:2023.07.05.547797. [PMID: 37461566 PMCID: PMC10349978 DOI: 10.1101/2023.07.05.547797] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/23/2023]

Wu CT, Shen M, Du D, Cheng Z, Parker SJ, Lu Y, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. Cosbin: cosine score-based iterative normalization of biologically diverse samples. Bioinformatics Advances 2022;2:vbac076. [PMID: 36330358 PMCID: PMC9614059 DOI: 10.1093/bioadv/vbac076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 10/02/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022]

Lu Y, Wu CT, Parker SJ, Cheng Z, Saylor G, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. COT: an efficient and accurate method for detecting marker genes among many subtypes. Bioinformatics Advances 2022;2:vbac037. [PMID: 35673616 PMCID: PMC9163574 DOI: 10.1093/bioadv/vbac037] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/09/2021] [Revised: 04/10/2022] [Accepted: 05/16/2022] [Indexed: 01/27/2023]

Chen L, Wu CT, Lin CH, Dai R, Liu C, Clarke R, Yu G, Van Eyk JE, Herrington DM, Wang Y. swCAM: estimation of subtype-specific expressions in individual samples with unsupervised sample-wise deconvolution. Bioinformatics 2022;38:1403-1410. [PMID: 34904628 PMCID: PMC8826012 DOI: 10.1093/bioinformatics/btab839] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/10/2021] [Revised: 10/30/2021] [Accepted: 12/10/2021] [Indexed: 02/04/2023]

Abstract

MOTIVATION

Complex biological tissues are often a heterogeneous mixture of several molecularly distinct cell subtypes. Both subtype compositions and subtype-specific (STS) expressions can vary across biological conditions. Computational deconvolution aims to dissect patterns of bulk tissue data into subtype compositions and STS expressions. Existing deconvolution methods can only estimate averaged STS expressions in a population, while many downstream analyses, such as inferring co-expression networks in particular subtypes, require subtype expression estimates in individual samples. However, individual-level deconvolution is a mathematically underdetermined problem because there are more variables than observations.
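
The underdetermination can be made concrete with a simple count. The block below assumes a sample-wise linear mixing model, in which each bulk profile is a proportion-weighted sum of that sample's own subtype profiles; the symbols m (features), n (samples) and K (subtypes) are introduced here for illustration, and the paper gives the exact formulation.

```latex
\begin{aligned}
\mathbf{x}_i &= \sum_{k=1}^{K} a_{ik}\,\mathbf{s}_{ik}, \qquad \mathbf{x}_i,\ \mathbf{s}_{ik} \in \mathbb{R}^{m},\quad i = 1,\dots,n \\
\text{observations} &= mn, \qquad
\text{unknowns} = \underbrace{mKn}_{\mathbf{s}_{ik}} + \underbrace{Kn}_{a_{ik}} > mn \quad \text{for every } K \ge 1 \\
\text{population level } (\mathbf{s}_{ik} = \mathbf{s}_k):&\qquad \text{unknowns} = mK + Kn
\end{aligned}
```

For example, with m = 10,000, n = 50 and K = 3 there are 500,000 observations, 1,500,150 sample-wise unknowns, but only 30,150 population-level unknowns. This is why sample-wise estimation needs extra structure, such as the regularization terms described under RESULTS.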

RESULTS

We report a sample-wise Convex Analysis of Mixtures (swCAM) method that can estimate subtype proportions and STS expressions in individual samples from bulk tissue transcriptomes. We extend our previous CAM framework to include a new term accounting for between-sample variations and formulate swCAM as a nuclear-norm and ℓ2,1-norm regularized matrix factorization problem. We determine hyperparameter values using cross-validation with random entry exclusion and obtain a swCAM solution using an efficient alternating direction method of multipliers. Experimental results on realistic simulation data show that swCAM can accurately estimate STS expressions in individual samples and successfully extract co-expression networks in particular subtypes that are otherwise unobtainable using bulk data. In two real-world applications, swCAM analysis of bulk RNASeq data from brain tissue of cases and controls with bipolar disorder or Alzheimer's disease identified significant changes in cell proportion, expression pattern and co-expression module in patient neurons. Comparative evaluation of swCAM versus peer methods is also provided.
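
In ADMM solvers for problems regularized by a nuclear norm and an ℓ2,1 norm, each iteration typically reduces to the proximal operators of those penalties, which have closed forms. The block below gives the standard definitions as background only; it is not the swCAM objective or update, and whether swCAM applies the ℓ2,1 norm over rows or columns is not stated in the abstract (rows are assumed here).

```latex
\begin{aligned}
\operatorname{prox}_{\tau f}(\mathbf{Y}) &= \arg\min_{\mathbf{X}}\ \tau f(\mathbf{X}) + \tfrac{1}{2}\|\mathbf{X}-\mathbf{Y}\|_F^2 \\
\|\mathbf{X}\|_{*} &= \textstyle\sum_j \sigma_j(\mathbf{X}), \qquad \|\mathbf{X}\|_{2,1} = \textstyle\sum_r \|\mathbf{x}_r\|_2 \\
\operatorname{prox}_{\tau\|\cdot\|_{*}}(\mathbf{Y}) &= \mathbf{U}\,\operatorname{diag}\big(\max(\sigma_j-\tau,\,0)\big)\,\mathbf{V}^{\top}, \qquad \mathbf{Y} = \mathbf{U}\,\operatorname{diag}(\sigma_j)\,\mathbf{V}^{\top} \\
\big[\operatorname{prox}_{\tau\|\cdot\|_{2,1}}(\mathbf{Y})\big]_r &= \max\!\Big(1-\tfrac{\tau}{\|\mathbf{y}_r\|_2},\,0\Big)\,\mathbf{y}_r \qquad (\mathbf{0} \text{ if } \mathbf{y}_r = \mathbf{0})
\end{aligned}
```

The first operator (singular value thresholding) pushes a matrix toward low rank; the second (group soft-thresholding) zeroes whole rows, which encourages the between-sample variation term to be sparse across features.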

AVAILABILITY AND IMPLEMENTATION

The R scripts for swCAM are freely available at https://github.com/Lululuella/swCAM. A user's guide and a vignette are provided.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Comparative assessment and novel strategy on methods for imputing proteomics data. Sci Rep 2022;12:1067. [PMID: 35058491 PMCID: PMC8776850 DOI: 10.1038/s41598-022-04938-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Accepted: 01/04/2022] [Indexed: 11/09/2022]

Saddic L, Orosco A, Guo D, Milewicz DM, Troxlair D, Heide RV, Herrington D, Wang Y, Azizzadeh A, Parker SJ. Proteomic analysis of descending thoracic aorta identifies unique and universal signatures of aneurysm and dissection. JVS Vasc Sci 2022;3:85-181. [PMID: 35280433 PMCID: PMC8914561 DOI: 10.1016/j.jvssci.2022.01.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/21/2021] [Accepted: 01/05/2022] [Indexed: 01/05/2023]

Kammers K, Taub MA, Mathias RA, Yanek LR, Kanchan K, Venkatraman V, Sundararaman N, Martin J, Liu S, Hoyle D, Raedschelders K, Holewinski R, Parker S, Dardov V, Faraday N, Becker DM, Cheng L, Wang ZZ, Leek JT, Van Eyk JE, Becker LC. Gene and protein expression in human megakaryocytes derived from induced pluripotent stem cells. J Thromb Haemost 2021;19:1783-1799. [PMID: 33829634 DOI: 10.1111/jth.15334] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/18/2020] [Revised: 01/25/2021] [Accepted: 02/19/2021] [Indexed: 01/26/2023]

Affiliation(s)

Kai Kammers Division of Biostatistics and Bioinformatics, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Margaret A Taub Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
Rasika A Mathias The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA Division of Allergy and Clinical Immunology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Lisa R Yanek The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Kanika Kanchan Division of Allergy and Clinical Immunology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Vidya Venkatraman Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
Niveda Sundararaman Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
Joshua Martin The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Senquan Liu Division of Hematology and Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Dixie Hoyle Division of Hematology and Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Koen Raedschelders Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
Ronald Holewinski Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
Sarah Parker Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
Victoria Dardov Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
Nauder Faraday The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Diane M Becker The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Linzhao Cheng Division of Hematology and Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Zack Z Wang Division of Hematology and Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
Jeffrey T Leek Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
Jennifer E Van Eyk Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
Lewis C Becker The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA

Dabke K, Kreimer S, Jones MR, Parker SJ. A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Data Sets. J Proteome Res 2021;20:3214-3229. [PMID: 33939434 DOI: 10.1021/acs.jproteome.1c00070] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 01/07/2023]

Abstract

Missing values in proteomic data sets have real consequences for downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single imputation method is best suited to a diverse range of data sets, and no clear strategy exists for evaluating imputation methods for clinical DIA-MS data sets, especially at different levels of protein quantification. To navigate the different imputation strategies available in the literature, we have established a strategy to assess imputation methods on clinical label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods with multiple parameters at different levels of protein quantification: a dilution series data set, a small pilot data set, and a clinical proteomic data set comparing paired tumor and stroma tissue. We found that imputation methods based on local structures within the data, like local least-squares (LLS) and random forest (RF), worked well in our dilution series data set, whereas imputation methods based on global structures within the data, like Bayesian principal component analysis (BPCA), performed well in the other two data sets. We also found that imputation at the most basic protein quantification level, the fragment level, improved accuracy and the number of proteins quantified. With this analytical framework, we quickly and cost-effectively evaluated different imputation methods using two smaller complementary data sets to narrow down the most accurate methods for the larger proteomic data set. This acquisition strategy allowed us to provide reproducible evidence of the accuracy of the imputation method, even in the absence of a ground truth. Overall, this study indicates that the most suitable imputation method depends on the overall structure of the data set, and it provides an example of an analytic framework that may assist in identifying the most appropriate imputation strategies for the differential analysis of proteins.
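
One widely used way to compare imputation methods when the true values of the missing entries are unknown is to hide a random subset of values that were actually measured, impute them, and score how well each method recovers them. The sketch below shows that masking-and-scoring loop with only the Python standard library (3.8+). It is not the authors' workflow: the two methods (a per-protein mean, and a k-nearest-neighbour average standing in for local-structure methods such as LLS and RF), the RMSE score, the 10% masking fraction, the toy data, and all function names are hypothetical choices.

```python
import math
import random
import statistics


def mask_observed(data, frac, seed=0):
    """Hide a random fraction of observed entries (None marks a missing value)."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(data) for j, v in enumerate(row) if v is not None]
    hidden = rng.sample(observed, int(frac * len(observed)))
    masked = [row[:] for row in data]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def impute_row_mean(data):
    """Baseline: fill each protein's gaps with that protein's observed mean."""
    out = []
    for row in data:
        seen = [v for v in row if v is not None]
        fill = statistics.fmean(seen) if seen else 0.0
        out.append([fill if v is None else v for v in row])
    return out


def impute_knn(data, k=3):
    """Local-structure imputation: mean of the k most similar proteins observed in that sample."""

    def dist(a, b):
        # RMS difference over the samples both proteins have observed.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    out = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, v in enumerate(row):
            if v is None:
                donors = sorted(
                    (dist(row, other), other[j])
                    for r, other in enumerate(data)
                    if r != i and other[j] is not None
                )[:k]
                out[i][j] = statistics.fmean([val for _, val in donors]) if donors else 0.0
    return out


def rmse_on_hidden(truth, imputed, hidden):
    """Root-mean-square error restricted to the entries that were masked."""
    return math.sqrt(statistics.fmean([(truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden]))


# Toy data: 12 proteins x 8 samples in three co-varying groups of four.
rng = random.Random(7)
patterns = [[rng.gauss(20, 2) for _ in range(8)] for _ in range(3)]
truth = [[v + rng.gauss(0, 0.1) for v in patterns[p // 4]] for p in range(12)]

masked, hidden = mask_observed(truth, frac=0.1, seed=3)
rmse_mean = rmse_on_hidden(truth, impute_row_mean(masked), hidden)
rmse_knn = rmse_on_hidden(truth, impute_knn(masked, k=3), hidden)
print(f"per-protein mean RMSE {rmse_mean:.2f}, kNN RMSE {rmse_knn:.2f}")
```

Because the masked entries are drawn uniformly from observed values, this scores methods under a missing-at-random assumption. Proteomic missingness is often intensity-dependent (values below the detection limit), so a fairer benchmark may need to mask low-abundance values preferentially.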
