1
Plubell DL, Huang E, Spencer SE, Poston KL, Montine TJ, MacCoss MJ. Data Independent Acquisition to Inform the Development of Targeted Proteomics Assays Using a Triple Quadrupole Mass Spectrometer. J Proteome Res 2025. [PMID: 40328514 DOI: 10.1021/acs.jproteome.5c00016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2025]
Abstract
Mass spectrometry based targeted proteomics methods provide a sensitive and high-throughput analysis of selected proteins. To develop a targeted bottom-up proteomics assay, peptides must be evaluated as proxies for the measurement of a protein or proteoform in a biological matrix. Candidate peptide selection typically relies on predetermined biochemical properties, data from semistochastic sampling, or empirical measurements. These strategies require extensive testing and method refinement due to the difficulties associated with prediction of the peptide response in the biological matrix of interest. Gas-phase fractionated (GPF) narrow window data-independent acquisition (DIA) aids in the development of reproducible selected reaction monitoring (SRM) assays by providing matrix-specific information on peptide detectability and quantification by mass spectrometry. To demonstrate the suitability of DIA data for selecting peptide targets, we reimplement a portion of an existing assay to measure 98 Alzheimer's disease proteins in cerebrospinal fluid (CSF). Peptides were selected from GPF-DIA based on signal intensity and reproducibility. The resulting SRM assay exhibits a quantitative precision similar to that of published data, despite the inclusion of different peptides between the assays. This workflow enables development of new assays without additional upfront data acquisition, demonstrated here through generation of a separate assay for an unrelated set of proteins in CSF from the same data set.
Affiliation(s)
- Deanna L Plubell
- University of Washington, Department of Genome Sciences, Seattle, Washington 98195, United States
- Eric Huang
- University of Washington, Department of Genome Sciences, Seattle, Washington 98195, United States
- Sandra E Spencer
- University of Washington, Department of Genome Sciences, Seattle, Washington 98195, United States
- Kathleen L Poston
- Stanford University, Department of Neurology & Neurological Sciences, Stanford, California 94305, United States
- Thomas J Montine
- Stanford University, Department of Pathology, Stanford, California 94305, United States
- Michael J MacCoss
- University of Washington, Department of Genome Sciences, Seattle, Washington 98195, United States
2
Xu S, Chuang CY, Hawkins CL, Hägglund P, Davies MJ. Quantitative analysis of the proteome and protein oxidative modifications in primary human coronary artery endothelial cells and associated extracellular matrix. Redox Biol 2025; 81:103524. [PMID: 39954365 PMCID: PMC11875191 DOI: 10.1016/j.redox.2025.103524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2024] [Revised: 01/28/2025] [Accepted: 01/30/2025] [Indexed: 02/17/2025] Open
Abstract
Vascular endothelial cells (ECs) play a key role in physiology by controlling arterial contraction and relaxation, and molecular transport. EC dysfunction is associated with multiple pathologies. Here, we characterize the cellular and extracellular matrix (ECM) proteomes of primary human coronary artery ECs, from multiple donors, and oxidation/nitration products formed on these during cell culture, using liquid chromatography-mass spectrometry. In total ∼9900 proteins were identified in cells from 3 donors, with ∼7000 proteins per donor. Of these ∼5300 were consistently identified, indicating some heterogeneity across the donors, with age a possible cause. Multiple endogenous oxidation products were detected on both ECM and cellular proteins (and particularly endoplasmic reticulum species). In contrast, nitration was mostly detected on cell proteins and particularly cytoskeletal proteins, consistent with intracellular generation of nitrating agents, possibly from endothelial nitric oxide synthase (eNOS) or peroxidase enzymes. The modifications are ascribed to both physiological enzymatic activity (hydroxylation at proline/lysine; predominantly on ECM proteins and especially collagens) and the formation of reactive species (oxidation at tryptophan/tyrosine/histidine; nitration at tryptophan/tyrosine). The identified sites are present on a limited number of peptides (104 oxidized; 23 nitrated) from a modest number of proteins. A small number of proteins were detected with multiple modifications, consistent with these being selective and specific targets. Several nitrated peptides were consistently detected across all donors, and also in human smooth muscle cells suggesting that these are major targets in the vascular proteome. These data provide a 'background' proteome dataset for studies of endothelial dysfunction in disease.
Affiliation(s)
- Shuqi Xu
- Department of Biomedical Sciences, Panum Institute, University of Copenhagen, Denmark; Department of Cardiovascular Medicine, The Affiliated Yongchuan Hospital of Chongqing Medical University, Chongqing, China
- Christine Y Chuang
- Department of Biomedical Sciences, Panum Institute, University of Copenhagen, Denmark
- Clare L Hawkins
- Department of Biomedical Sciences, Panum Institute, University of Copenhagen, Denmark
- Per Hägglund
- Department of Biomedical Sciences, Panum Institute, University of Copenhagen, Denmark
- Michael J Davies
- Department of Biomedical Sciences, Panum Institute, University of Copenhagen, Denmark
3
Du D, Bhardwaj S, Lu Y, Wang Y, Parker SJ, Zhang Z, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. Embracing the informative missingness and silent gene in analyzing biologically diverse samples. Sci Rep 2024; 14:28265. [PMID: 39550430 PMCID: PMC11569126 DOI: 10.1038/s41598-024-78076-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 10/28/2024] [Indexed: 11/18/2024] Open
Abstract
Bioinformatics software tools are essential to identify informative molecular features that define different phenotypic sample groups. Among the most fundamental and interrelated tasks are missing value imputation, signature gene detection, and differential pattern visualization. However, many commonly used analytics tools can be problematic when handling biologically diverse samples if either informative missingness possesses high missing rates with mixed missing mechanisms, or multiple sample groups are compared and visualized in parallel. We developed the ABDS tool suite specifically for analyzing biologically diverse samples. Collectively, a mechanism-integrated group-wise pre-imputation scheme is proposed to retain informative missingness associated with signature genes, a cosine-based one-sample test is extended to detect group-silenced signature genes, and a unified heatmap is designed to display multiple sample groups. We describe the methodological principles and demonstrate the effectiveness of three analytics tools under targeted scenarios, supported by comparative evaluations and biomedical showcases. As an open-source R package, the ABDS tool suite complements rather than replaces existing tools and will allow biologists to more accurately detect interpretable molecular signals among phenotypically diverse sample groups.
Affiliation(s)
- Dongping Du
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
- Saurabh Bhardwaj
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
- Department of Electrical and Instrumentation Engineering, Thapar Institute of Engineering and Technology, Patiala, 147004, Punjab, India
- Yingzhou Lu
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
- Yizhi Wang
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
- Sarah J Parker
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA, 90048, USA
- Zhen Zhang
- Department of Pathology, Johns Hopkins University, Baltimore, MD, 21231, USA
- Jennifer E Van Eyk
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA, 90048, USA
- Guoqiang Yu
- Department of Automation, Tsinghua University, Beijing, 100084, P. R. China
- Robert Clarke
- The Hormel Institute, University of Minnesota, Austin, MN, 55912, USA
- David M Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC, 27157, USA
- Yue Wang
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, 22203, USA
- Dept. of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 900 N. Glebe Road, Arlington, VA, 22203, USA
4
Du D, Bhardwaj S, Lu Y, Wang Y, Parker SJ, Zhang Z, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. ABDS: a bioinformatics tool suite for analyzing biologically diverse samples. Research Square 2024:rs.3.rs-4419408. [PMID: 38853832 PMCID: PMC11160903 DOI: 10.21203/rs.3.rs-4419408/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Bioinformatics software tools are essential to identify informative molecular features that define different phenotypic sample groups. Among the most fundamental and interrelated tasks are missing value imputation, signature gene detection, and differential pattern visualization. However, many commonly used analytics tools can be problematic when handling biologically diverse samples if either informative missingness possesses high missing rates with mixed missing mechanisms, or multiple sample groups are compared and visualized in parallel. We developed the ABDS tool suite specifically for analyzing biologically diverse samples. Collectively, a mechanism-integrated group-wise pre-imputation scheme is proposed to retain informative missingness associated with signature genes, a cosine-based one-sample test is extended to detect group-silenced signature genes, and a unified heatmap is designed to display multiple sample groups. We describe the methodological principles and demonstrate the effectiveness of three analytics tools under targeted scenarios, supported by comparative evaluations and biomedical showcases. As an open-source R package, the ABDS tool suite complements rather than replaces existing tools and will allow biologists to more accurately detect interpretable molecular signals among phenotypically diverse sample groups.
Affiliation(s)
- Dongping Du
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Saurabh Bhardwaj
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Department of Electrical and Instrumentation Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India
- Yingzhou Lu
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Yizhi Wang
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Sarah J. Parker
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Zhen Zhang
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
- Jennifer E. Van Eyk
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Guoqiang Yu
- Department of Automation, Tsinghua University, Beijing 100084, P. R. China
- Robert Clarke
- The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
- David M. Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Yue Wang
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
5
Nguyen H, Nguyen H, Tran D, Draghici S, Nguyen T. Fourteen years of cellular deconvolution: methodology, applications, technical evaluation and outstanding challenges. Nucleic Acids Res 2024; 52:4761-4783. [PMID: 38619038 PMCID: PMC11109966 DOI: 10.1093/nar/gkae267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 03/01/2024] [Accepted: 04/02/2024] [Indexed: 04/16/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-Seq) is a recent technology that allows for the measurement of the expression of all genes in each individual cell contained in a sample. Information at the single-cell level has been shown to be extremely useful in many areas. However, performing single-cell experiments is expensive. Although cellular deconvolution cannot provide the same comprehensive information as single-cell experiments, it can extract cell-type information from bulk RNA data, and therefore it allows researchers to conduct studies at cell-type resolution from existing bulk datasets. For these reasons, a great effort has been made to develop such methods for cellular deconvolution. The large number of methods available, the requirement of coding skills, inadequate documentation, and lack of performance assessment all make it extremely difficult for life scientists to choose a suitable method for their experiment. This paper aims to fill this gap by providing a comprehensive review of 53 deconvolution methods regarding their methodology, applications, performance, and outstanding challenges. More importantly, the article presents a benchmarking of all these 53 methods using 283 cell types from 30 tissues of 63 individuals. We also provide an R package named DeconBenchmark that allows readers to execute and benchmark the reviewed methods (https://github.com/tinnlab/DeconBenchmark).
Affiliation(s)
- Hung Nguyen
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Ha Nguyen
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Duc Tran
- Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Sorin Draghici
- Department of Computer Science, Wayne State University, Detroit, MI, USA
- Advaita Bioinformatics, Ann Arbor, MI, USA
- Tin Nguyen
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
6
Wu CT, Du D, Chen L, Dai R, Liu C, Yu G, Bhardwaj S, Parker SJ, Zhang Z, Clarke R, Herrington DM, Wang Y. CAM3.0: determining cell type composition and expression from bulk tissues with fully unsupervised deconvolution. Bioinformatics 2024; 40:btae107. [PMID: 38407991 PMCID: PMC10924278 DOI: 10.1093/bioinformatics/btae107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 01/13/2024] [Accepted: 02/25/2024] [Indexed: 02/28/2024] Open
Abstract
MOTIVATION Complex tissues are dynamic ecosystems consisting of molecularly distinct yet interacting cell types. Computational deconvolution aims to dissect bulk tissue data into cell type compositions and cell-specific expressions. With few exceptions, most existing deconvolution tools exploit supervised approaches requiring various types of references that may be unreliable or even unavailable for specific tissue microenvironments. RESULTS We previously developed a fully unsupervised deconvolution method, Convex Analysis of Mixtures (CAM), that enables estimation of cell type composition and expression from bulk tissues. We now introduce the CAM3.0 tool, which improves this framework with three new and highly efficient algorithms, namely, radius-fixed clustering to identify reliable markers, linear programming to detect an initial scatter simplex, and a smart floating search for the optimum latent variable model. The comparative experimental results obtained from both realistic simulations and case studies show that the CAM3.0 tool can help biologists more accurately identify known or novel cell markers, determine cell proportions, and estimate cell-specific expressions, complementing the existing tools particularly when study- or datatype-specific references are unreliable or unavailable. AVAILABILITY AND IMPLEMENTATION The open-source R scripts of CAM3.0 are freely available at https://github.com/ChiungTingWu/CAM3/ (https://github.com/Bioconductor/Contributions/issues/3205). A user's guide and a vignette are provided.
Affiliation(s)
- Chiung-Ting Wu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Dongping Du
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Lulu Chen
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, United States
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, United States
- Guoqiang Yu
- Department of Automation, Tsinghua University, Beijing 100084, P. R. China
- Saurabh Bhardwaj
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
- Department of Electrical and Instrumentation Engineering, Thapar Institute of Engineering & Technology, Punjab 147004, India
- Sarah J Parker
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, United States
- Zhen Zhang
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, United States
- Robert Clarke
- The Hormel Institute, University of Minnesota, Austin, MN 55912, United States
- David M Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, United States
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, United States
7
Lorentzen LG, Yeung K, Eldrup N, Eiberg JP, Sillesen HH, Davies MJ. Proteomic analysis of the extracellular matrix of human atherosclerotic plaques shows marked changes between plaque types. Matrix Biol Plus 2024; 21:100141. [PMID: 38292008 PMCID: PMC10825564 DOI: 10.1016/j.mbplus.2024.100141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 01/01/2024] [Accepted: 01/04/2024] [Indexed: 02/01/2024] Open
Abstract
Cardiovascular disease is the leading cause of death, with atherosclerosis the major underlying cause. While often asymptomatic for decades, atherosclerotic plaque destabilization and rupture can arise suddenly and cause acute arterial occlusion or peripheral embolization resulting in myocardial infarction, stroke and lower limb ischaemia. As extracellular matrix (ECM) remodelling is associated with plaque instability, we hypothesized that the ECM composition would differ between plaques. We analyzed atherosclerotic plaques obtained from 21 patients who underwent carotid surgery following recent symptomatic carotid artery stenosis. Plaques were solubilized using a new efficient, single-step approach. Solubilized proteins were digested to peptides, and analyzed by liquid chromatography-mass spectrometry using data-independent acquisition. Identification and quantification of 4498 plaque proteins was achieved, including 354 ECM proteins, with unprecedented coverage and high reproducibility. Multidimensional scaling analysis and hierarchical clustering indicate two distinct clusters, which correlate with macroscopic plaque morphology (soft/unstable versus hard/stable), ultrasound classification (echolucent versus echogenic) and the presence of hemorrhage/ulceration. We identified 714 proteins with differential abundances between these groups. Soft/unstable plaques were enriched in proteins involved in inflammation, ECM remodelling, and protein degradation (e.g. matrix metalloproteinases, cathepsins). In contrast, hard/stable plaques contained higher levels of ECM structural proteins (e.g. collagens, versican, nidogens, biglycan, lumican, proteoglycan 4, mineralization proteins). These data indicate that a single-step proteomics method can provide unique mechanistic insights into ECM remodelling and inflammatory mechanisms within plaques that correlate with clinical parameters, and help rationalize plaque destabilization. 
These data also provide an approach towards identifying biomarkers for individualized risk profiling of atherosclerosis.
Affiliation(s)
- Lasse G. Lorentzen
- Department of Biomedical Sciences, Panum Institute, University of Copenhagen, Denmark
- Karin Yeung
- Department of Vascular Surgery, Heart Centre, University Hospital Copenhagen - Rigshospitalet, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
- Nikolaj Eldrup
- Department of Vascular Surgery, Heart Centre, University Hospital Copenhagen - Rigshospitalet, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
- Jonas P. Eiberg
- Department of Vascular Surgery, Heart Centre, University Hospital Copenhagen - Rigshospitalet, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
- Copenhagen Academy for Medical Education and Simulation (CAMES), Capital Region of Denmark, Copenhagen, Denmark
- Henrik H. Sillesen
- Department of Vascular Surgery, Heart Centre, University Hospital Copenhagen - Rigshospitalet, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
- Michael J. Davies
- Department of Biomedical Sciences, Panum Institute, University of Copenhagen, Denmark
8
Momenzadeh A, Kreimer S, Guo D, Ayres M, Berman D, Chyu KY, Shah PK, Milewicz D, Azizzadeh A, Meyer JG, Parker S. Differentiation between Descending Thoracic Aortic Diseases using Machine Learning and Plasma Proteomic Signatures. bioRxiv 2023:2023.04.26.538468. [PMID: 37162892 PMCID: PMC10168345 DOI: 10.1101/2023.04.26.538468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Background Descending thoracic aortic aneurysms and dissections can go undetected until severe and catastrophic, and few clinical indices exist to screen for aneurysms or predict risk of dissection. Methods This study generated a plasma proteomic dataset from 75 patients with descending type B dissection (Type B) and 62 patients with descending thoracic aortic aneurysm (DTAA). Standard statistical approaches were compared to supervised machine learning (ML) algorithms to distinguish Type B from DTAA cases. Quantitatively similar proteins were clustered based on linkage distance from hierarchical clustering and ML models were trained with uncorrelated protein lists across various linkage distances with hyperparameter optimization using 5-fold cross validation. Permutation importance (PI) was used for ranking the most important predictor proteins of ML classification between disease states and the proteins among the top 10 PI protein groups were submitted for pathway analysis. Results Of the 1,549 peptides and 198 proteins used in this study, no peptides and only one protein, hemopexin (HPX), were significantly different at an adjusted p-value <0.01 between Type B and DTAA cases. The highest performing model on the training set (Support Vector Classifier) and its corresponding linkage distance (0.5) were used for evaluation of the test set, yielding a precision-recall area under the curve of 0.7 to distinguish Type B from DTAA cases. The five proteins with the highest PI scores were immunoglobulin heavy variable 6-1 (IGHV6-1), lecithin-cholesterol acyltransferase (LCAT), coagulation factor 12 (F12), HPX, and immunoglobulin heavy variable 4-4 (IGHV4-4). All proteins from the top 10 most important correlated groups generated the following significantly enriched pathways in the plasma of Type B versus DTAA patients: complement activation, humoral immune response, and blood coagulation.
Conclusions We conclude that ML may be useful in differentiating the plasma proteome of highly similar disease states that would otherwise not be distinguishable using statistics, and, in such cases, ML may enable prioritizing important proteins for model prediction.
Affiliation(s)
- Amanda Momenzadeh
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California, USA
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Simion Kreimer
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Dongchuan Guo
- Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center, Houston, Texas
- Matthew Ayres
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Daniel Berman
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Cedars-Sinai Imaging Department, Cedars Sinai Medical Center, Los Angeles, California, USA
- Kuang-Yuh Chyu
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Prediman K Shah
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Dianna Milewicz
- Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center, Houston, Texas
- Ali Azizzadeh
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Jesse G. Meyer
- Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, California, USA
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Sarah Parker
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Smidt Heart Institute, Cedars Sinai Medical Center, Los Angeles, California, USA
- Department of Biomedical Sciences, Cedars Sinai Medical Center, Los Angeles, California, USA
9
Hasman M, Mayr M, Theofilatos K. Uncovering Protein Networks in Cardiovascular Proteomics. Mol Cell Proteomics 2023; 22:100607. [PMID: 37356494 PMCID: PMC10460687 DOI: 10.1016/j.mcpro.2023.100607] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/14/2022] [Revised: 05/01/2023] [Accepted: 06/20/2023] [Indexed: 06/27/2023] Open
Abstract
Biological networks have been widely used in many different diseases to identify potential biomarkers and design drug targets. In the present review, we describe the main computational techniques for reconstructing and analyzing different types of protein networks and summarize the previous applications of such techniques in cardiovascular diseases. Existing tools are critically compared, discussing when each method is preferred such as the use of co-expression networks for functional annotation of protein clusters and the use of directed networks for inferring regulatory associations. Finally, we are presenting examples of reconstructing protein networks of different types (regulatory, co-expression, and protein-protein interaction networks). We demonstrate the necessity to reconstruct networks separately for each cardiovascular tissue type and disease entity and provide illustrative examples of the importance of taking into consideration relevant post-translational modifications. Finally, we demonstrate and discuss how the findings of protein networks could be interpreted using single-cell RNA-sequencing data.
Affiliation(s)
- Maria Hasman
- King's British Heart Foundation Centre, King's College London, London, United Kingdom
- Manuel Mayr
- King's British Heart Foundation Centre, King's College London, London, United Kingdom
10
Du D, Bhardwaj S, Parker SJ, Cheng Z, Zhang Z, Lu Y, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. ABDS: tool suite for analyzing biologically diverse samples. bioRxiv 2023:2023.07.05.547797. [PMID: 37461566 PMCID: PMC10349978 DOI: 10.1101/2023.07.05.547797] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/23/2023]
Abstract
Motivation Analytics tools are essential to identify informative molecular features about different phenotypic groups. Among the most fundamental tasks are missing value imputation, signature gene detection, and expression pattern visualization. However, most commonly used analytics tools may be problematic for characterizing biologically diverse samples when either signature genes possess uneven missing rates across different groups yet involving complex missing mechanisms, or multiple biological groups are simultaneously compared and visualized. Results We develop ABDS tool suite tailored specifically to analyzing biologically diverse samples. Mechanism-integrated group-wise imputation is developed to recruit signature genes involving informative missingness, cosine-based one-sample test is extended to detect enumerated signature genes, and unified heatmap is designed to comparably display complex expression patterns. We discuss the methodological principles and demonstrate the conceptual advantages of the three software tools. We also showcase the biomedical applications of these individual tools. Implemented in open-source R scripts, ABDS tool suite complements rather than replaces the existing tools and will allow biologists to more accurately detect interpretable molecular signals among diverse phenotypic samples. Availability and implementation The R Scripts of ABDS tool suite is freely available at https://github.com/niccolodpdu/ABDS.
Affiliation(s)
- Dongping Du
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Saurabh Bhardwaj
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Department of Electrical and Instrumentation Engineering, Thapar Institute of Engineering and Technology, Patiala, Punjab 147004, India
- Sarah J. Parker
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Zuolin Cheng
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Zhen Zhang
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
- Yingzhou Lu
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jennifer E. Van Eyk
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Guoqiang Yu
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Robert Clarke
- The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
- David M. Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Yue Wang
- Department of Electrical & Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
11
Wu CT, Shen M, Du D, Cheng Z, Parker SJ, Lu Y, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. Cosbin: cosine score-based iterative normalization of biologically diverse samples. Bioinformatics Advances 2022; 2:vbac076. [PMID: 36330358 PMCID: PMC9614059 DOI: 10.1093/bioadv/vbac076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Revised: 10/02/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022]
Abstract
Motivation: Data normalization is essential to ensure accurate inference and comparability of gene expression measures across samples or conditions. Ideally, gene expression data should be rescaled based on consistently expressed reference genes. However, when normalizing biologically diverse samples, the most commonly used reference genes exhibit striking expression variability, and size-factor or distribution-based normalization methods can be problematic when the amount of asymmetry in differential expression is significant. Results: We report an efficient and accurate data-driven method, Cosine score-based iterative normalization (Cosbin), to normalize biologically diverse samples. Based on the Cosine scores of cross-condition expression patterns, the Cosbin pipeline iteratively eliminates asymmetric differentially expressed genes, identifies consistently expressed genes, and calculates sample-wise normalization factors. We demonstrate the superior performance and enhanced utility of Cosbin compared with six representative peer methods using both simulation and real multi-omics expression datasets. Implemented in open-source R scripts and specifically designed to address normalization bias due to significant asymmetry in differential expression across multiple conditions, the Cosbin tool complements rather than replaces the existing methods and will allow biologists to more accurately detect true molecular signals among diverse phenotypic groups. Availability and implementation: The R scripts of the Cosbin pipeline are freely available at https://github.com/MinjieSh/Cosbin. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Dongping Du
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Zuolin Cheng
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Sarah J Parker
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Yingzhou Lu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jennifer E Van Eyk
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Guoqiang Yu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Robert Clarke
- The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
- David M Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Yue Wang
- To whom correspondence should be addressed.
12
Lu Y, Wu CT, Parker SJ, Cheng Z, Saylor G, Van Eyk JE, Yu G, Clarke R, Herrington DM, Wang Y. COT: an efficient and accurate method for detecting marker genes among many subtypes. Bioinformatics Advances 2022; 2:vbac037. [PMID: 35673616 PMCID: PMC9163574 DOI: 10.1093/bioadv/vbac037] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/09/2021] [Revised: 04/10/2022] [Accepted: 05/16/2022] [Indexed: 01/27/2023]
Abstract
Motivation: Ideally, a molecularly distinct subtype would be composed of molecular features that are expressed uniquely in the subtype of interest but in no others, so-called marker genes (MGs). MGs play a critical role in the characterization, classification or deconvolution of tissue or cell subtypes. We and others have recognized that the test statistics used by most methods do not exactly satisfy the MG definition and often identify inaccurate MGs. Results: We report an efficient and accurate data-driven method, formulated as a Cosine-based One-sample Test (COT) in scatter space, to detect MGs among many subtypes using subtype expression profiles. Fundamentally different from existing approaches, the test statistic in COT precisely matches the mathematical definition of an ideal MG. We demonstrate the performance and utility of COT on both simulated and real gene expression and proteomics data. The open source Python/R tool will allow biologists to efficiently detect MGs and perform a more comprehensive and unbiased molecular characterization of tissue or cell subtypes in many biomedical contexts. Nevertheless, COT complements rather than replaces existing methods. Availability and implementation: The Python COT software with a detailed user's manual and a vignette is freely available at https://github.com/MintaYLu/COT. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
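The idea of scoring a gene against an "ideal" marker can be illustrated with a short sketch. This is not the COT package or its API. It assumes only what the abstract states: an ideal marker gene is expressed in exactly one subtype, so its cross-subtype expression vector lies along one coordinate axis. The function name `marker_cosine_scores` and the genes-by-subtypes input layout are hypothetical choices made for illustration.

```python
import numpy as np


def marker_cosine_scores(subtype_means):
    """Score each gene by its cosine similarity to the nearest subtype axis.

    subtype_means: array-like of shape (genes, subtypes) holding non-negative
    mean expression per subtype (an assumed input layout, not COT's own).
    Returns (scores, best_subtype). A score of 1.0 means the gene is expressed
    in exactly one subtype, i.e. it matches the ideal marker pattern.
    """
    x = np.asarray(subtype_means, dtype=float)
    norms = np.linalg.norm(x, axis=1)
    # Guard all-zero genes: dividing by 1 keeps their cosine at 0.
    safe_norms = np.where(norms > 0, norms, 1.0)
    # Cosine between a gene vector and axis e_k reduces to x_k / ||x||.
    cosines = x / safe_norms[:, None]
    return cosines.max(axis=1), cosines.argmax(axis=1)


if __name__ == "__main__":
    profiles = [[0.0, 5.0, 0.0],   # expressed only in subtype 1
                [1.0, 1.0, 1.0]]   # expressed equally everywhere
    scores, best = marker_cosine_scores(profiles)
    print(scores, best)
```

Ranking genes by this score and applying a significance threshold would be the next step; how COT derives its null distribution is not described in the abstract, so that step is left out here.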
Affiliation(s)
- Yingzhou Lu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Chiung-Ting Wu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Sarah J Parker
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Zuolin Cheng
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Georgia Saylor
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Jennifer E Van Eyk
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- Guoqiang Yu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Robert Clarke
- The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
- David M Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA. To whom correspondence should be addressed.
13
Chen L, Wu CT, Lin CH, Dai R, Liu C, Clarke R, Yu G, Van Eyk JE, Herrington DM, Wang Y. swCAM: estimation of subtype-specific expressions in individual samples with unsupervised sample-wise deconvolution. Bioinformatics 2022; 38:1403-1410. [PMID: 34904628 PMCID: PMC8826012 DOI: 10.1093/bioinformatics/btab839] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/10/2021] [Revised: 10/30/2021] [Accepted: 12/10/2021] [Indexed: 02/04/2023]
Abstract
MOTIVATION: Complex biological tissues are often a heterogeneous mixture of several molecularly distinct cell subtypes. Both subtype compositions and subtype-specific (STS) expressions can vary across biological conditions. Computational deconvolution aims to dissect patterns of bulk tissue data into subtype compositions and STS expressions. Existing deconvolution methods can only estimate averaged STS expressions in a population, while many downstream analyses such as inferring co-expression networks in particular subtypes require subtype expression estimates in individual samples. However, individual-level deconvolution is a mathematically underdetermined problem because there are more variables than observations. RESULTS: We report a sample-wise Convex Analysis of Mixtures (swCAM) method that can estimate subtype proportions and STS expressions in individual samples from bulk tissue transcriptomes. We extend our previous CAM framework to include a new term accounting for between-sample variations and formulate swCAM as a nuclear-norm and ℓ2,1-norm regularized matrix factorization problem. We determine hyperparameter values using cross-validation with random entry exclusion and obtain a swCAM solution using an efficient alternating direction method of multipliers. Experimental results on realistic simulation data show that swCAM can accurately estimate STS expressions in individual samples and successfully extract co-expression networks in particular subtypes that are otherwise unobtainable using bulk data. In two real-world applications, swCAM analysis of bulk RNASeq data from brain tissue of cases and controls with bipolar disorder or Alzheimer's disease identified significant changes in cell proportion, expression pattern and co-expression module in patient neurons. Comparative evaluation of swCAM versus peer methods is also provided. AVAILABILITY AND IMPLEMENTATION: The R scripts of swCAM are freely available at https://github.com/Lululuella/swCAM. A user's guide and a vignette are provided. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lulu Chen
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Chiung-Ting Wu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Chia-Hsiang Lin
- Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
- Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Robert Clarke
- The Hormel Institute, University of Minnesota, Austin, MN 55912, USA
- Guoqiang Yu
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Jennifer E Van Eyk
- Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA
- David M Herrington
- Department of Internal Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
14
Comparative assessment and novel strategy on methods for imputing proteomics data. Sci Rep 2022; 12:1067. [PMID: 35058491 PMCID: PMC8776850 DOI: 10.1038/s41598-022-04938-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Accepted: 01/04/2022] [Indexed: 11/09/2022]
Abstract
Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because mechanisms contributing to true missing values are complex and existing evaluation methodologies are imperfect. Moreover, few studies have provided an outlook of future methodological development. We first re-evaluate the performance of eight representative methods targeting three typical missing mechanisms. These methods are compared on both simulated and masked missing values embedded within real proteomics datasets, and performance is evaluated using three quantitative measures. We then introduce fused regularization matrix factorization, a low-rank global matrix factorization framework capable of integrating local similarity derived from additional data types. We also explore a biologically inspired latent variable modeling strategy, convex analysis of mixtures, for missing value imputation and present preliminary experimental results. While some winners emerged from our comparative assessment, the evaluation is intrinsically imperfect because performance is evaluated indirectly on artificial missing or masked values rather than on authentic missing values. Nevertheless, we show that our fused regularization matrix factorization provides a novel incorporation of external and local information, and the exploratory implementation of convex analysis of mixtures presents a biologically plausible new approach.
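The masked-value evaluation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it hides a random fraction of known entries, imputes them, and scores the result on the hidden entries only. The row-mean imputer is a deliberately simple stand-in for the eight methods compared, and normalized RMSE is an assumed metric, since the abstract does not name its three measures. All function names are hypothetical, and rows are assumed to be proteins and columns samples.

```python
import numpy as np


def mask_random(data, frac, rng):
    """Hide a random fraction of entries; return (masked copy, boolean mask)."""
    data = np.asarray(data, dtype=float)
    mask = rng.random(data.shape) < frac
    masked = data.copy()
    masked[mask] = np.nan
    return masked, mask


def impute_row_mean(masked):
    """Fill each NaN with the mean of the observed values in its row.

    Rows with no observed values fall back to the global observed mean.
    This is a baseline stand-in, not one of the methods from the paper.
    """
    filled = np.array(masked, dtype=float)
    observed = ~np.isnan(filled)
    counts = observed.sum(axis=1)
    sums = np.where(observed, filled, 0.0).sum(axis=1)
    global_mean = sums.sum() / max(counts.sum(), 1)
    row_means = np.where(counts > 0, sums / np.maximum(counts, 1), global_mean)
    rows, cols = np.where(~observed)
    filled[rows, cols] = row_means[rows]
    return filled


def nrmse(truth, imputed, mask):
    """RMSE on the masked entries, divided by the std of their true values."""
    truth = np.asarray(truth, dtype=float)
    if not mask.any():
        raise ValueError("mask selects no entries")
    scale = np.std(truth[mask])
    if scale == 0:
        raise ValueError("true masked values have zero variance")
    diff = truth[mask] - np.asarray(imputed, dtype=float)[mask]
    return float(np.sqrt(np.mean(diff ** 2)) / scale)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.normal(loc=20.0, scale=2.0, size=(50, 8))  # synthetic log-intensities
    masked, mask = mask_random(truth, 0.2, rng)
    print(nrmse(truth, impute_row_mean(masked), mask))
```

As the abstract notes, a score obtained this way is indirect: random masking approximates values missing completely at random and says little about intensity-dependent missingness, which would need a different masking rule.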
15
Saddic L, Orosco A, Guo D, Milewicz DM, Troxlair D, Heide RV, Herrington D, Wang Y, Azizzadeh A, Parker SJ. Proteomic analysis of descending thoracic aorta identifies unique and universal signatures of aneurysm and dissection. JVS Vasc Sci 2022; 3:85-181. [PMID: 35280433 PMCID: PMC8914561 DOI: 10.1016/j.jvssci.2022.01.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/21/2021] [Accepted: 01/05/2022] [Indexed: 01/05/2023]
Abstract
Diseases of the descending thoracic aorta such as aneurysms and dissections carry a high degree of morbidity and mortality. At present, a complete understanding is still lacking of the genetics that drive these diseases and of why some aortic segments dissect in the presence or absence of an aneurysm. We compared and contrasted the whole proteome expression of descending aortas from patients whose aortic tissue showed normal, dissected, aneurysmal, or aneurysmal with dissection pathology. We uncovered potential tissue markers that might serve as future targets for therapy or predictors of disease progression.
Affiliation(s)
- Louis Saddic
- Department of Anesthesiology and Perioperative Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, Calif
- Amanda Orosco
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, Calif
- Dongchuan Guo
- Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center, Houston, Tex
- Dianna M. Milewicz
- Department of Internal Medicine, McGovern Medical School, University of Texas Health Science Center, Houston, Tex
- Dana Troxlair
- Department of Pathology, Louisiana State University, New Orleans, La
- David Herrington
- Department of Cardiovascular Medicine, Wake Forest University, Winston-Salem, NC
- Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Va
- Ali Azizzadeh
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, Calif
- Sarah J. Parker
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, Calif
- Correspondence: Sarah J. Parker, PhD, Department of Cardiology, Smidt Heart Institute, Cedars Sinai Medical Center, AHSP A9228, 8700 Beverly Blvd, Los Angeles, CA 90048
16
Kammers K, Taub MA, Mathias RA, Yanek LR, Kanchan K, Venkatraman V, Sundararaman N, Martin J, Liu S, Hoyle D, Raedschelders K, Holewinski R, Parker S, Dardov V, Faraday N, Becker DM, Cheng L, Wang ZZ, Leek JT, Van Eyk JE, Becker LC. Gene and protein expression in human megakaryocytes derived from induced pluripotent stem cells. J Thromb Haemost 2021; 19:1783-1799. [PMID: 33829634 DOI: 10.1111/jth.15334] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/18/2020] [Revised: 01/25/2021] [Accepted: 02/19/2021] [Indexed: 01/26/2023]
Abstract
BACKGROUND: There is interest in deriving megakaryocytes (MKs) from induced pluripotent stem cells (iPSCs) for biological studies. We previously found that genomic structural integrity and genotype concordance are maintained in iPSC-derived MKs. OBJECTIVE: To establish a comprehensive dataset of genes and proteins expressed in iPSC-derived MKs. METHODS: iPSCs were reprogrammed from peripheral blood mononuclear cells (MNCs) and MKs were derived from the iPSCs in 194 healthy European American and African American subjects. mRNA was isolated and gene expression measured by RNA sequencing. Protein expression was measured in 62 of the subjects using mass spectrometry. RESULTS AND CONCLUSIONS: MKs expressed genes and proteins known to be important in MK and platelet function and demonstrated good agreement with previous studies in human MKs derived from CD34+ progenitor cells. The percent of cells expressing the MK markers CD41 and CD42a was consistent in biological replicates, but variable across subjects, suggesting that unidentified subject-specific factors determine differentiation of MKs from iPSCs. Gene and protein sets important in platelet function were associated with increasing expression of CD41/42a, while those related to more basic cellular functions were associated with lower CD41/42a expression. There was differential gene expression by the sex and race (but not age) of the subject. Numerous genes and proteins were highly expressed in MKs but not known to play a role in MK or platelet function; these represent excellent candidates for future study of hematopoiesis, platelet formation, and/or platelet function.
Affiliation(s)
- Kai Kammers
- Division of Biostatistics and Bioinformatics, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Margaret A Taub
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Rasika A Mathias
- The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Division of Allergy and Clinical Immunology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Lisa R Yanek
- The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Kanika Kanchan
- Division of Allergy and Clinical Immunology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Vidya Venkatraman
- Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Niveda Sundararaman
- Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Joshua Martin
- The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Senquan Liu
- Division of Hematology and Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Dixie Hoyle
- Division of Hematology and Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Koen Raedschelders
- Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Ronald Holewinski
- Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Sarah Parker
- Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Victoria Dardov
- Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Nauder Faraday
- The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Diane M Becker
- The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Linzhao Cheng
- Division of Hematology and Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Zack Z Wang
- Division of Hematology and Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Jeffrey T Leek
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Jennifer E Van Eyk
- Advanced Clinical Biosystems Research Institute, Barbra Streisand Woman's Heart Center, The Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, California, USA
- Lewis C Becker
- The GeneSTAR Program, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
17
Dabke K, Kreimer S, Jones MR, Parker SJ. A Simple Optimization Workflow to Enable Precise and Accurate Imputation of Missing Values in Proteomic Data Sets. J Proteome Res 2021; 20:3214-3229. [PMID: 33939434 DOI: 10.1021/acs.jproteome.1c00070] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 01/07/2023]
Abstract
Missing values in proteomic data sets have real consequences on downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single imputation method is best suited for a diverse range of data sets, and no clear strategy exists for evaluating imputation methods for clinical DIA-MS data sets, especially at different levels of protein quantification. To navigate through the different imputation strategies available in the literature, we have established a strategy to assess imputation methods on clinical label-free DIA-MS data sets. We used three DIA-MS data sets with real missing values to evaluate eight imputation methods with multiple parameters at different levels of protein quantification: a dilution series data set, a small pilot data set, and a clinical proteomic data set comparing paired tumor and stroma tissue. We found that imputation methods based on local structures within the data, like local least-squares (LLS) and random forest (RF), worked well in our dilution series data set, whereas imputation methods based on global structures within the data, like BPCA, performed well in the other two data sets. We also found that imputation at the most basic protein quantification level, the fragment level, improved accuracy and the number of proteins quantified. With this analytical framework, we quickly and cost-effectively evaluated different imputation methods using two smaller complementary data sets to narrow down the most accurate methods for the larger proteomic data set. This acquisition strategy allowed us to provide reproducible evidence of the accuracy of the imputation method, even in the absence of a ground truth. Overall, this study indicates that the most suitable imputation method relies on the overall structure of the data set and provides an example of an analytic framework that may assist in identifying the most appropriate imputation strategies for the differential analysis of proteins.
Affiliation(s)
- Kruttika Dabke
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States; Graduate Program in Biomedical Sciences, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Simion Kreimer
- Advanced Clinical Biosystems Research Institute, Smidt Heart Institute, Departments of Cardiology and Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Michelle R Jones
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States
- Sarah J Parker
- Advanced Clinical Biosystems Research Institute, Smidt Heart Institute, Departments of Cardiology and Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, California 90048, United States