1
Gu C, Ghasemi SM, Cai Y, Fahrmann JF, Long JP, Katayama H, Wu C, Vykoukal J, Dennison JB, Hanash S, Do KA, Irajizad E. Grape-Pi: graph-based neural networks for enhanced protein identification in proteomics pipelines. Bioinformatics Advances 2025; 5:vbaf095. [PMID: 40406669 PMCID: PMC12096076 DOI: 10.1093/bioadv/vbaf095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2024] [Revised: 04/02/2025] [Accepted: 04/24/2025] [Indexed: 05/26/2025]
Abstract
Motivation: Protein identification via mass spectrometry (MS) is the primary method for untargeted protein detection. However, the identification process is challenging due to data complexity and the need to control false discovery rates (FDR) of protein identification. To address these challenges, we developed a graph neural network (GNN)-based model, Graph Neural Network using Protein-Protein Interaction for Enhancing Protein Identification (Grape-Pi), which is applicable to all proteomics pipelines. This model leverages protein-protein interaction (PPI) data and employs two types of message-passing layers to integrate evidence from both the target protein and its interactors, thereby improving identification accuracy.
Results: Grape-Pi achieved significant improvements in area under receiver-operating characteristic curve (AUC) in differentiating present and absent proteins: 18% and 7% in two yeast samples and 9% in gastric samples over traditional methods in the test dataset. Additionally, proteins identified via Grape-Pi in gastric samples demonstrated a high correlation with mRNA data and identified gastric cancer proteins, like MAP4K4, missed by conventional methods.
Availability and Implementation: Grape-Pi is freely available at https://zenodo.org/records/11310518 and https://github.com/FDUguchunhui/GrapePi.
Affiliation(s)
- Chunhui Gu
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Seyyed Mahmood Ghasemi
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Yining Cai
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Johannes F Fahrmann
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- James P Long
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Hiroyuki Katayama
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Chong Wu
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Jody Vykoukal
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Jennifer B Dennison
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Samir Hanash
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Kim-Anh Do
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Ehsan Irajizad
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States

2
Affiliation(s)
- Axel Gandy
  - Department of Mathematics, Imperial College London
- Georg Hahn
  - Department of Mathematics, Imperial College London

3
Mining for genes related to choroidal neovascularization based on the shortest path algorithm and protein interaction information. Biochim Biophys Acta Gen Subj 2016; 1860:2740-9. [PMID: 26987808 DOI: 10.1016/j.bbagen.2016.03.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/02/2016] [Revised: 03/05/2016] [Accepted: 03/10/2016] [Indexed: 12/24/2022]
Abstract
BACKGROUND: Choroidal neovascularization (CNV) is a serious eye disease that may cause visual loss, especially for older people. Many factors have been proven to induce this disease including age, gender, obesity, and so on. However, until now, we have had limited knowledge on CNV's pathogenic mechanism. Discovering the genes that underlie this disease and performing extensive studies on them can help us to understand how CNV occurs and design effective treatments.
METHODS: In this study, we designed a computational method to identify novel CNV-related genes in a large protein network constructed using the protein-protein interaction information in STRING. The candidate genes were first extracted from the shortest paths connecting any two known CNV-related genes and then filtered by a permutation test and using knowledge of their linkages to known CNV-related genes.
RESULTS: A list of putative CNV-related candidate genes was obtained by our method. These genes are deemed to have strong relationships with CNV.
CONCLUSIONS: Extensive analyses of several of the putative genes such as ANK1, ITGA4, CD44 and others indicate that they are related to specific biological processes involved in CNV, implying they may be novel CNV-related genes.
GENERAL SIGNIFICANCE: The newfound putative CNV-related genes may provide new insights into CNV and help design more effective treatments. This article is part of a Special Issue entitled "System Genetics" (Guest Editors: Dr. Yudong Cai and Dr. Tao Huang).
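The two steps named in the methods summary (collecting genes that lie on shortest paths between known disease genes, then filtering them with a permutation test) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an unweighted toy network instead of STRING's scored interactions, uses breadth-first search rather than a weighted shortest-path algorithm, and all names (`shortest_path`, `path_counts`, `permutation_p_values`, the toy `graph` and `seeds`) are hypothetical.

```python
from collections import deque
from itertools import combinations
import random


def shortest_path(graph, source, target):
    """Return one shortest path (list of nodes) found by breadth-first search, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def path_counts(graph, seeds):
    """Count how often each non-seed node is an inner node of a seed-to-seed shortest path."""
    counts = {}
    for a, b in combinations(sorted(seeds), 2):
        path = shortest_path(graph, a, b)
        if path:
            for node in path[1:-1]:
                if node not in seeds:
                    counts[node] = counts.get(node, 0) + 1
    return counts


def permutation_p_values(graph, seeds, n_perm=200, rng_seed=0):
    """Empirical p-value per candidate: how often random seed sets give a count at least as high."""
    rng = random.Random(rng_seed)
    observed = path_counts(graph, seeds)
    nodes = sorted(graph)
    exceed = {node: 0 for node in observed}
    for _ in range(n_perm):
        random_seeds = set(rng.sample(nodes, len(seeds)))
        null = path_counts(graph, random_seeds)
        for node in observed:
            if null.get(node, 0) >= observed[node]:
                exceed[node] += 1
    return {node: (exceed[node] + 1) / (n_perm + 1) for node in observed}


# Hypothetical toy network (undirected adjacency lists) and known disease genes.
graph = {
    "A": ["X"],
    "B": ["X"],
    "C": ["Y"],
    "X": ["A", "B", "Y"],
    "Y": ["X", "C"],
}
seeds = {"A", "B", "C"}

candidates = path_counts(graph, seeds)
p_values = permutation_p_values(graph, seeds)
```

On a network this small the permutation p-values carry no statistical meaning; the sketch only shows the order of operations. Candidates with small p-values would be kept for the further linkage-based filtering that the abstract mentions.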

4
Hur B, Chae H, Kim S. Combined analysis of gene regulatory network and SNV information enhances identification of potential gene markers in mouse knockout studies with small number of samples. BMC Med Genomics 2015; 8 Suppl 2:S10. [PMID: 26044212 PMCID: PMC4460612 DOI: 10.1186/1755-8794-8-s2-s10] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/23/2023] Open
Abstract
RNA-sequencing is widely used to measure gene expression level at the whole genome level. Comparing expression data from control and case studies provides good insight on potential gene markers for phenotypes. However, discovering gene markers that represent phenotypic differences in a small number of samples remains a challenging task, since finding gene markers using standard differentially expressed gene methods produces too many candidate genes and the number of candidates varies at different threshold values. In addition, in a small number of samples, the statistical power is too low to discriminate whether gene expressions were altered by genetic differences or not. In this study, to address this challenge, we propose a four-step filtering method that predicts gene markers from RNA-sequencing data of mouse knockout studies by utilizing a gene regulatory network constructed from omics data in the public domain, biological knowledge from curated pathways, and information of single-nucleotide variants. Our prediction method was not only able to reduce the number of candidate genes relative to the differentially expressed gene-only filtering method, but also successfully predicted significant genes that were reported in research findings of the data contributors.

5
Chen L, Chu C, Kong X, Huang G, Huang T, Cai YD. A hybrid computational method for the discovery of novel reproduction-related genes. PLoS One 2015; 10:e0117090. [PMID: 25768094 PMCID: PMC4358884 DOI: 10.1371/journal.pone.0117090] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 09/06/2014] [Accepted: 12/13/2014] [Indexed: 12/12/2022] Open
Abstract
Uncovering the molecular mechanisms underlying reproduction is of great importance to infertility treatment and to the generation of healthy offspring. In this study, we discovered novel reproduction-related genes with a hybrid computational method, integrating three different types of methods, which offered new clues for further reproduction research. This method was first executed on a weighted graph, constructed based on known protein-protein interactions, to search the shortest paths connecting any two known reproduction-related genes. Genes occurring in these paths were deemed to have a special relationship with reproduction. These newly discovered genes were filtered with a randomization test. Then, the remaining genes were further selected according to their associations with known reproduction-related genes measured by protein-protein interaction score and alignment score obtained by BLAST. The in-depth analysis of the high-confidence novel reproduction genes revealed hidden mechanisms of reproduction and provided guidelines for further experimental validations.
Affiliation(s)
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People’s Republic of China
- Chen Chu
  - State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People’s Republic of China
- Xiangyin Kong
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200025, People’s Republic of China
- Guohua Huang
  - Institute of Systems Biology, Shanghai University, Shanghai, 200444, People’s Republic of China
- Tao Huang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200025, People’s Republic of China
  - * E-mail: (TH); (YDC)
- Yu-Dong Cai
  - Institute of Systems Biology, Shanghai University, Shanghai, 200444, People’s Republic of China
  - * E-mail: (TH); (YDC)

6
Nandal UK, Vlietstra WJ, Byrman C, Jeeninga RE, Ringrose JH, van Kampen AHC, Speijer D, Moerland PD. Candidate prioritization for low-abundant differentially expressed proteins in 2D-DIGE datasets. BMC Bioinformatics 2015; 16:25. [PMID: 25627479 PMCID: PMC4384356 DOI: 10.1186/s12859-015-0455-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/25/2014] [Accepted: 01/09/2015] [Indexed: 01/17/2023] Open
Abstract
Background: Two-dimensional differential gel electrophoresis (2D-DIGE) provides a powerful technique to separate proteins on their isoelectric point and apparent molecular mass and quantify changes in protein expression. Abundantly available proteins in spots can be identified using mass spectrometry-based approaches. However, identification is often not possible for low-abundant proteins.
Results: We present a novel computational approach to prioritize candidate proteins for unidentified spots. Our approach exploits noisy information on the isoelectric point and apparent molecular mass of a protein spot in combination with functional similarities of candidate proteins to already identified proteins to select and rank candidates. We evaluated our method on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. Using leave-one-out cross-validation, we show that the true-positive rate for the top-5 ranked proteins is 43.8%.
Conclusions: Our approach shows good performance on a 2D-DIGE dataset comparing protein expression in uninfected and HIV-1 infected T-cells. We expect our method to be highly useful in (re-)mining other 2D-DIGE experiments in which especially the low-abundant protein spots remain to be identified.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0455-x) contains supplementary material, which is available to authorized users.
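The top-5 true-positive rate quoted in the results can be read as a simple hit rate: each identified spot is held out in turn, candidates for it are ranked, and a hit is scored when the true protein appears among the first k candidates. The sketch below computes only that final metric from rankings that are already available. The function name `top_k_hit_rate`, the spot labels, and the candidate names are hypothetical, and the ranking step itself is not reproduced here.

```python
def top_k_hit_rate(rankings, truths, k=5):
    """Fraction of held-out spots whose true protein is among the first k ranked candidates."""
    if not rankings:
        raise ValueError("rankings must contain at least one held-out spot")
    hits = sum(1 for spot, ranked in rankings.items() if truths[spot] in ranked[:k])
    return hits / len(rankings)


# Hypothetical leave-one-out output: for each held-out spot, candidates ordered best first.
rankings = {
    "spot1": ["P1", "P7", "P3"],
    "spot2": ["P9", "P2", "P4"],
    "spot3": ["P5", "P6", "P8"],
    "spot4": ["P4", "P1", "P2"],
}
# Hypothetical true identities of the held-out spots.
truths = {"spot1": "P7", "spot2": "P4", "spot3": "P5", "spot4": "P9"}

rate_at_2 = top_k_hit_rate(rankings, truths, k=2)
```

With k=2, spot1 and spot3 count as hits while spot2 (true protein ranked third) and spot4 (true protein not listed) do not.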
Affiliation(s)
- Umesh K Nandal
  - Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, The Netherlands.
- Wytze J Vlietstra
  - Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, The Netherlands.
- Carsten Byrman
  - Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, The Netherlands.
- Rienk E Jeeninga
  - Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, The Netherlands.
- Jeffrey H Ringrose
  - Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, The Netherlands.
- Antoine H C van Kampen
  - Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, The Netherlands.
  - Biosystems Data Analysis Group, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands.
- Dave Speijer
  - Department of Medical Biochemistry, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, The Netherlands.
- Perry D Moerland
  - Bioinformatics Laboratory, Academic Medical Center, University of Amsterdam, PO Box 22700, 1100 DE Amsterdam, The Netherlands.

7
Wang X, Zhang B. Integrating genomic, transcriptomic, and interactome data to improve peptide and protein identification in shotgun proteomics. J Proteome Res 2014; 13:2715-23. [PMID: 24792918 PMCID: PMC4059263 DOI: 10.1021/pr500194t] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 12/26/2022]
Abstract
Mass spectrometry (MS)-based shotgun proteomics is an effective technology for global proteome profiling. The ultimate goal is to assign tandem MS spectra to peptides and subsequently infer proteins and their abundance. In addition to database searching and protein assembly algorithms, computational approaches have been developed to integrate genomic, transcriptomic, and interactome information to improve peptide and protein identification. Earlier efforts focus primarily on making databases more comprehensive using publicly available genomic and transcriptomic data. More recently, with the increasing affordability of the Next Generation Sequencing (NGS) technologies, personalized protein databases derived from sample-specific genomic and transcriptomic data have emerged as an attractive strategy. In addition, incorporating interactome data not only improves protein identification but also puts identified proteins into their functional context and thus facilitates data interpretation. In this paper, we survey the major integrative bioinformatics approaches that have been developed during the past decade and discuss their merits and demerits.
Affiliation(s)
- Xiaojing Wang
  - Department of Biomedical Informatics, Vanderbilt-Ingram Cancer Center, and Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232, United States

8
Gandy A, Hahn G. MMCTest-A Safe Algorithm for Implementing Multiple Monte Carlo Tests. Scand Stat Theory Appl 2014. [DOI: 10.1111/sjos.12085] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 01/19/2023]
Affiliation(s)
- Axel Gandy
  - Department of Mathematics, Imperial College London
- Georg Hahn
  - Department of Mathematics, Imperial College London

9
Franceschi P, Giordan M, Wehrens R. Multiple comparisons in mass-spectrometry-based -omics technologies. Trends Analyt Chem 2013. [DOI: 10.1016/j.trac.2013.04.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]