1
Henarejos-Castillo I, Sanz FJ, Solana-Manrique C, Sebastian-Leon P, Medina I, Remohi J, Paricio N, Diaz-Gimeno P. Whole-exome sequencing and Drosophila modelling reveal mutated genes and pathways contributing to human ovarian failure. Reprod Biol Endocrinol 2024; 22:153. [PMID: 39633407 PMCID: PMC11616368 DOI: 10.1186/s12958-024-01325-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 11/24/2024] [Indexed: 12/07/2024] Open
Abstract
BACKGROUND Ovarian failure (OF) is a multifactorial, complex disease affecting up to 1% of women under 40 years of age. Although 90% of patients are diagnosed with idiopathic OF, the underlying molecular mechanisms remain unknown, making it difficult to personalize treatments for these patients in the clinical setting. Studying the presence and/or accumulation of SNVs at the gene and pathway levels will help describe novel genes and characterize disrupted biological pathways linked with ovarian failure.
METHODS An ad hoc case-control SNV screening, conducted from 2020 to 2023 on 150 VCF files of WES data, included Spanish IVF patients with (n = 118) and without (n = 32) OF (< 40 years of age; mean BMI 22.78), along with gnomAD (n = 38,947) and IGSR (n = 1,271; 258 European female VCF) data as pseudo-control female populations. SNVs were prioritized according to their predicted deleteriousness, frequency in genomic databases, and proportional differences across populations. A burden test was performed to reveal genes with a higher presence of SNVs in the OF cohort than in the control and pseudo-control groups. Systematic in-silico analyses were performed to assess the potential disruptions caused by the mutated genes in relevant biological pathways. Finally, genes with orthologues in Drosophila melanogaster were used to experimentally validate the potential impediments to ovarian function and reproductive potential.
RESULTS Eighteen genes had a higher presence of SNVs in the OF population (FDR < 0.05). AK2, CDC27, CFTR, CTBP2, KMT2C, and MTCH2 were associated with OF for the first time, and their silenced/knockout forms reduced fertility in Drosophila. We also predicted the disruption of 29 sub-pathways across four signalling pathways (FDR < 0.05). These sub-pathways included the metaphase to anaphase transition during oocyte meiosis, inflammatory processes related to necroptosis, DNA mismatch repair systems and the MAPK signalling cascade.
CONCLUSIONS This study sheds light on the underlying molecular mechanisms of OF, providing novel associations between six genes and OF-related infertility, setting a foundation for further biomarker development, and improving precision medicine in infertility.
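The gene-level burden comparison with FDR control described in this abstract can be illustrated with a small sketch. The abstract does not state which burden statistic was used, so the one-sided Fisher exact test and the Benjamini-Hochberg adjustment below are assumptions chosen for illustration. The cohort sizes (118 cases, 32 controls) come from the abstract; the gene labels and carrier counts are invented toy values.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a = cases carrying a qualifying SNV in the gene, b = cases without,
    c = controls carrying one, d = controls without.
    Returns P(X >= a) under the hypergeometric null.
    """
    cases, carriers, total = a + b, a + c, a + b + c + d
    upper = min(cases, carriers)
    tail = sum(
        comb(cases, k) * comb(total - cases, carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, carriers)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Cohort sizes from the abstract; carrier counts are hypothetical.
N_CASES, N_CONTROLS = 118, 32
toy_carriers = {"GENE_A": (14, 0), "GENE_B": (9, 2), "GENE_C": (3, 1)}

genes = list(toy_carriers)
pvals = [
    fisher_exact_greater(case, N_CASES - case, ctrl, N_CONTROLS - ctrl)
    for case, ctrl in toy_carriers.values()
]
fdr = benjamini_hochberg(pvals)
for gene, p, q in zip(genes, pvals, fdr):
    print(gene, round(p, 4), round(q, 4))
```

A real analysis would likely also weight variants by predicted deleteriousness and population frequency, as the abstract indicates; this sketch only counts carriers.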
Affiliation(s)
- Ismael Henarejos-Castillo
  - IVI-RMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Valencia, 46026, Spain
  - Department of Pediatrics, Obstetrics and Gynaecology, University of Valencia, Av. Blasco Ibáñez 15, Valencia, 46010, Spain
- Francisco José Sanz
  - IVI-RMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Valencia, 46026, Spain
  - Department of Genetics, Biotechnology and Biomedicine Institute (BioTecMed), University of Valencia, C. Dr. Moliner, 50, Burjassot, 46100, Spain
- Cristina Solana-Manrique
  - Department of Genetics, Biotechnology and Biomedicine Institute (BioTecMed), University of Valencia, C. Dr. Moliner, 50, Burjassot, 46100, Spain
  - Department of Physiotherapy, Faculty of Health Sciences, European University of Valencia, Passeig de l'Albereda, 7, Valencia, 46010, Spain
- Patricia Sebastian-Leon
  - IVI-RMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Valencia, 46026, Spain
- Ignacio Medina
  - High-Performance Computing Service, University of Cambridge, 7 JJ Thomson Ave, Cambridge, CB3 0RB, UK
- José Remohi
  - IVI-RMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Valencia, 46026, Spain
  - Department of Pediatrics, Obstetrics and Gynaecology, University of Valencia, Av. Blasco Ibáñez 15, Valencia, 46010, Spain
- Nuria Paricio
  - Department of Genetics, Biotechnology and Biomedicine Institute (BioTecMed), University of Valencia, C. Dr. Moliner, 50, Burjassot, 46100, Spain
- Patricia Diaz-Gimeno
  - IVI-RMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Valencia, 46026, Spain
  - Department of Genomic & Systems Reproductive Medicine, IVI Foundation, Valencia, Spain - Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Torre A, Planta 1ª, Valencia, 46026, Spain
2
Zhang T, Zhang SW, Xie MY, Li Y. Identifying cooperating cancer driver genes in individual patients through hypergraph random walk. J Biomed Inform 2024; 157:104710. [PMID: 39159864 DOI: 10.1016/j.jbi.2024.104710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 07/30/2024] [Accepted: 08/14/2024] [Indexed: 08/21/2024]
Abstract
OBJECTIVE Identifying cancer driver genes, especially rare or patient-specific cancer driver genes, is a primary goal in cancer therapy. Although researchers have proposed some methods to tackle this problem, these methods mostly identify cancer driver genes at the single-gene level, overlooking the cooperative relationships among cancer driver genes. Identifying cooperating cancer driver genes in individual patients is pivotal for understanding cancer etiology and advancing the development of personalized therapies.
METHODS Here, we propose a novel Personalized Cooperating cancer Driver Genes (PCoDG) method that uses a hypergraph random walk to identify the cancer driver genes that cooperatively drive cancer progression in an individual patient. Leveraging the ability of hypergraphs to represent multi-way relationships, PCoDG first employs a personalized hypergraph to depict the complex interactions among the mutated genes and differentially expressed genes of an individual patient. Then, a hypergraph random walk algorithm based on hyperedge similarity is used to calculate the importance scores of mutated genes, and these scores are integrated with signaling pathway data to identify the cooperating cancer driver genes in individual patients.
RESULTS The experimental results on three TCGA cancer datasets (i.e., BRCA, LUAD, and COADREAD) demonstrate the effectiveness of PCoDG in identifying personalized cooperating cancer driver genes. The genes identified by PCoDG not only offer valuable insights into patient stratification correlating with clinical outcomes, but also provide a useful reference resource for tailoring personalized treatments.
CONCLUSION We propose a novel method that can effectively identify cooperating cancer driver genes for individual patients, thereby deepening our understanding of the cooperative relationships among personalized cancer driver genes and advancing the development of precision oncology.
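As a rough illustration of how a random walk on a hypergraph can score genes, the sketch below uses the generic weighted hypergraph walk (choose a hyperedge containing the current node in proportion to its weight, then a member of that hyperedge uniformly) combined with restart. This is not the hyperedge-similarity walk that PCoDG uses, whose details are not given in the abstract. The node names, hyperedges, weights, and restart probability are all hypothetical.

```python
def hypergraph_transition(hyperedges, weights, nodes):
    """Row-stochastic transition matrix of a weighted hypergraph walk.

    From node u: pick a hyperedge e containing u with probability
    w(e) / d(u), then pick a node of e uniformly at random.
    """
    index = {g: i for i, g in enumerate(nodes)}
    n = len(nodes)
    degree = [0.0] * n
    for edge, w in zip(hyperedges, weights):
        for g in edge:
            degree[index[g]] += w
    P = [[0.0] * n for _ in range(n)]
    for edge, w in zip(hyperedges, weights):
        members = [index[g] for g in edge]
        for u in members:
            for v in members:
                P[u][v] += (w / degree[u]) / len(members)
    return P


def random_walk_with_restart(P, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate s <- restart * seed + (1 - restart) * s P until convergence."""
    n = len(P)
    s = list(seed)
    for _ in range(max_iter):
        new = [
            restart * seed[v]
            + (1 - restart) * sum(s[u] * P[u][v] for u in range(n))
            for v in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new, s)) < tol:
            return new
        s = new
    return s


# Toy patient hypergraph: each hyperedge groups genes that act together.
nodes = ["g1", "g2", "g3", "g4"]
hyperedges = [{"g1", "g2", "g3"}, {"g3", "g4"}]
weights = [1.0, 2.0]  # hypothetical hyperedge weights

P = hypergraph_transition(hyperedges, weights, nodes)
scores = random_walk_with_restart(P, seed=[1.0, 0.0, 0.0, 0.0])
for gene, score in sorted(zip(nodes, scores), key=lambda t: -t[1]):
    print(gene, round(score, 4))
```

The restart term keeps scores anchored to the seed genes (for example, a patient's mutated genes), so genes that share many heavily weighted hyperedges with the seeds rank higher.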
Affiliation(s)
- Tong Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China; School of Electrical and Mechanical Engineering, Pingdingshan University, Pingdingshan 467000, China
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Ming-Yu Xie
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Yan Li
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
3
Niu R, Guo Y, Shang X. GLIMS: A two-stage gradual-learning method for cancer genes prediction using multi-omics data and co-splicing network. iScience 2024; 27:109387. [PMID: 38510118 PMCID: PMC10951990 DOI: 10.1016/j.isci.2024.109387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 11/30/2023] [Accepted: 02/27/2024] [Indexed: 03/22/2024] Open
Abstract
Identifying cancer genes is vital for cancer diagnosis and treatment. However, because of the complexity of cancer occurrence and limited knowledge of cancer genes, it is hard to identify cancer genes accurately using only a few types of omics data, and the overall performance of existing methods needs further improvement. Here, we introduce GLIMS, a two-stage gradual-learning strategy that predicts cancer genes using integrative features from multi-omics data. First, it uses a semi-supervised hierarchical graph neural network to predict the initial candidate cancer genes by integrating multi-omics data and a protein-protein interaction (PPI) network. Then, it uses an unsupervised approach to further optimize the initial prediction by integrating the co-splicing network in post-transcriptional regulation, which plays an important role in cancer development. Systematic experiments on multi-omics cancer data demonstrated that GLIMS outperforms the state-of-the-art methods for the identification of cancer genes and could be a useful tool to help advance cancer analysis.
Affiliation(s)
- Rui Niu
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
- Yang Guo
  - School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
4
Xu X, Qi Z, Wang L, Zhang M, Geng Z, Han X. Gsw-fi: a GLM model incorporating shrinkage and double-weighted strategies for identifying cancer driver genes with functional impact. BMC Bioinformatics 2024; 25:99. [PMID: 38448819 PMCID: PMC10916024 DOI: 10.1186/s12859-024-05707-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 02/16/2024] [Indexed: 03/08/2024] Open
Abstract
BACKGROUND Cancer, a disease with high morbidity and mortality rates, poses a significant threat to human health. Driver genes, which harbor mutations accountable for the initiation and progression of tumors, play a crucial role in cancer development. Identifying driver genes stands as a paramount objective in cancer research and precision medicine.
RESULTS In the present work, we propose a method for identifying driver genes using a Generalized Linear Regression Model (GLM) with Shrinkage and double-Weighted strategies based on Functional Impact, which is named GSW-FI. First, an estimating model based on the GLM is proposed for assessing the background functional impacts of genes, using gene features as predictors. Second, the shrinkage and double-weighted strategies are integrated as two revising approaches to ensure the rationality of the identified driver genes. Last, a hypothesis-testing method is designed to identify driver genes by leveraging the estimated background functional impacts. Experimental results on 31 datasets from The Cancer Genome Atlas demonstrate that GSW-FI outperforms ten other prediction methods in terms of the overlap fraction with well-known databases and consensus predictions among different methods.
CONCLUSIONS GSW-FI presents a novel approach that efficiently identifies driver genes with functional impact mutations using computational methods, thereby advancing the development of precision medicine for cancer.
Affiliation(s)
- Xiaolu Xu
  - School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian, China
- Zitong Qi
  - Department of Statistics, University of Washington, Seattle, USA
- Lei Wang
  - Center for Reproductive and Genetic Medicine, Dalian Women and Children's Medical Group, Dalian, China
- Meiwei Zhang
  - Center for Reproductive and Genetic Medicine, Dalian Women and Children's Medical Group, Dalian, China
- Zhaohong Geng
  - Department of Cardiology, Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Xiumei Han
  - College of Artificial Intelligence, Dalian Maritime University, Dalian, China
5
Nourbakhsh M, Degn K, Saksager A, Tiberti M, Papaleo E. Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks. Brief Bioinform 2024; 25:bbad519. [PMID: 38261338 PMCID: PMC10805075 DOI: 10.1093/bib/bbad519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 11/27/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024] Open
Abstract
The vast amount of available sequencing data allows the scientific community to explore different genetic alterations that may drive cancer or favor cancer progression. Software developers have proposed a myriad of predictive tools, allowing researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach or on a gold standard for comparison. Hence, benchmarking the different tools depends highly on the input data, indicating that overfitting is still a massive problem. One of the solutions is to limit the scope and usage of specific tools. However, such limitations force researchers to walk a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations driving cancer. While the knowledge of cancer development increases daily, many bioinformatic pipelines rely on single nucleotide variants or alterations in a vacuum without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, the research fields work in silos, risking overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes, driver mutations, and the impact of these based on structural analysis. Additionally, we recommend directions for the field to avoid siloed research and move towards integrative frameworks.
Affiliation(s)
- Mona Nourbakhsh
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Kristine Degn
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Astrid Saksager
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Tiberti
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elena Papaleo
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
6
Wang Y, Zhou B, Ru J, Meng X, Wang Y, Liu W. Advances in computational methods for identifying cancer driver genes. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:21643-21669. [PMID: 38124614 DOI: 10.3934/mbe.2023958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Cancer driver genes (CDGs) are crucial in cancer prevention, diagnosis and treatment. This study employed computational methods for identifying CDGs, categorizing them into four groups. The major frameworks for each of these four categories were summarized. Additionally, we systematically gathered data from public databases and biological networks, and we elaborated on computational methods for identifying CDGs using the aforementioned databases. Further, we summarized the algorithms, mainly involving statistics and machine learning, used for identifying CDGs. Notably, the performances of nine typical identification methods for eight types of cancer were compared to analyze the applicability areas of these methods. Finally, we discussed the challenges and prospects associated with methods for identifying CDGs. The present study revealed that the network-based algorithms and machine learning-based methods demonstrated superior performance.
Affiliation(s)
- Ying Wang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Bohao Zhou
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Jidong Ru
  - School of Textile Garment and Design, Changshu Institute of Technology, Changshu 215500, China
- Xianglian Meng
  - School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
- Yundong Wang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Wenjie Liu
  - School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
7
Chu X, Guan B, Dai L, Liu JX, Li F, Shang J. Network embedding framework for driver gene discovery by combining functional and structural information. BMC Genomics 2023; 24:426. [PMID: 37516822 PMCID: PMC10386255 DOI: 10.1186/s12864-023-09515-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Accepted: 07/13/2023] [Indexed: 07/31/2023] Open
Abstract
Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but there is still a need to explore how the functional and structural information of genes in protein interaction networks can be combined to identify driver genes. Therefore, we propose a network embedding framework combining functional and structural information to identify driver genes. First, we combine the mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Second, the struc2vec model is used to extract gene features from the mutation integration network, which contains both the functional and the structural information of genes. Finally, machine learning algorithms are used to identify the driver genes. Compared with four strong existing methods, our method can find gene pairs that are distant from each other through structural similarities and performs better in identifying driver genes for 12 cancers in The Cancer Genome Atlas. We also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective for feature selection to identify novel driver genes.
Affiliation(s)
- Xin Chu
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
- Boxin Guan
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
- Lingyun Dai
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
- Jin-Xing Liu
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
- Feng Li
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao, 27826, China
8
Zhu X, Zhao W, Zhou Z, Gu X. Unraveling the Drivers of Tumorigenesis in the Context of Evolution: Theoretical Models and Bioinformatics Tools. J Mol Evol 2023:10.1007/s00239-023-10117-0. [PMID: 37246992 DOI: 10.1007/s00239-023-10117-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 05/09/2023] [Indexed: 05/30/2023]
Abstract
Cancer originates from somatic cells that have accumulated mutations. These mutations alter the phenotype of the cells, allowing them to escape homeostatic regulation that maintains normal cell numbers. The emergence of malignancies is an evolutionary process in which the random accumulation of somatic mutations and sequential selection of dominant clones cause cancer cells to proliferate. The development of technologies such as high-throughput sequencing has provided a powerful means to measure subclonal evolutionary dynamics across space and time. Here, we review the patterns that may be observed in cancer evolution and the methods available for quantifying the evolutionary dynamics of cancer. An improved understanding of the evolutionary trajectories of cancer will enable us to explore the molecular mechanism of tumorigenesis and to design tailored treatment strategies.
Affiliation(s)
- Xunuo Zhu
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Wenyi Zhao
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhan Zhou
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, 322000, China
  - Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 310058, China
- Xun Gu
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50011, USA
9
Xu X, Qi Z, Zhang D, Zhang M, Ren Y, Geng Z. DriverGenePathway: Identifying driver genes and driver pathways in cancer based on MutSigCV and statistical methods. Comput Struct Biotechnol J 2023; 21:3124-3135. [PMID: 37293242 PMCID: PMC10244682 DOI: 10.1016/j.csbj.2023.05.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Revised: 05/18/2023] [Accepted: 05/18/2023] [Indexed: 06/10/2023] Open
Abstract
Although computational methods for driver gene identification have progressed rapidly, it is far from the goal of obtaining widely recognized driver genes for all cancer types. The driver gene lists predicted by these methods often lack consistency and stability across different studies or datasets. In addition to analytical performance, some tools may require further improvement regarding operability and system compatibility. Here, we developed a user-friendly R package (DriverGenePathway) integrating MutSigCV and statistical methods to identify cancer driver genes and pathways. The theoretical basis of the MutSigCV program is elaborated and integrated into DriverGenePathway, such as mutation categories discovery based on information entropy. Five methods of hypothesis testing, including the beta-binomial test, Fisher combined p-value test, likelihood ratio test, convolution test, and projection test, are used to identify the minimal core driver genes. Moreover, de novo methods, which can effectively overcome mutational heterogeneity, are introduced to identify driver pathways. Herein, we describe the computational structure and statistical fundamentals of the DriverGenePathway pipeline and demonstrate its performance using eight types of cancer from TCGA. DriverGenePathway correctly confirms many expected driver genes with high overlap with the Cancer Gene Census list and driver pathways associated with cancer development. The DriverGenePathway R package is freely available on GitHub: https://github.com/bioinformatics-xu/DriverGenePathway.
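Of the five hypothesis tests listed in this abstract, the Fisher combined p-value test has a compact closed form, sketched below. The statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null when the k p-values are independent, and for an even number of degrees of freedom the survival function reduces to a finite sum. This is a generic textbook formulation, not code from the DriverGenePathway package; the function name is hypothetical.

```python
from math import exp, log


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom.
    For even degrees of freedom:
        P(chi2_{2k} >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    half_x = -sum(log(p) for p in pvalues)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return exp(-half_x) * total


# Toy example: per-category p-values for one gene (invented numbers).
print(fisher_combined_pvalue([0.04, 0.20, 0.01]))
```

With a single p-value the function returns that p-value unchanged, which is a convenient sanity check.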
Affiliation(s)
- Xiaolu Xu
  - School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China
- Zitong Qi
  - Department of Statistics, University of Washington, Seattle, WA 98195, USA
- Dawei Zhang
  - School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China
- Meiwei Zhang
  - Center for Reproductive and Genetic Medicine, Dalian Women and Children’s Medical Group, Dalian 116037, China
- Yonggong Ren
  - School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China
- Zhaohong Geng
  - Department of Cardiology, Second Affiliated Hospital of Dalian Medical University, Dalian 116023, China
10
Parvandeh S, Donehower LA, Katsonis P, Hsu TK, Asmussen J, Lee K, Lichtarge O. EPIMUTESTR: a nearest neighbor machine learning approach to predict cancer driver genes from the evolutionary action of coding variants. Nucleic Acids Res 2022; 50:e70. [PMID: 35412634 PMCID: PMC9262594 DOI: 10.1093/nar/gkac215] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/28/2021] [Revised: 03/17/2022] [Accepted: 03/21/2022] [Indexed: 02/01/2023] Open
Abstract
Discovering rare cancer driver genes is difficult because their mutational frequency is too low for statistical detection by computational methods. EPIMUTESTR is an integrative nearest-neighbor machine learning algorithm that identifies such marginal genes by modeling the fitness of their mutations with the phylogenetic Evolutionary Action (EA) score. Over cohorts of sequenced patients from The Cancer Genome Atlas representing 33 tumor types, EPIMUTESTR detected 214 previously inferred cancer driver genes and 137 new candidates never before identified computationally, of which seven are supported in the COSMIC Cancer Gene Census. EPIMUTESTR achieved better robustness and specificity than existing methods across a number of benchmark methods and datasets.
Affiliation(s)
- Saeid Parvandeh
  - To whom correspondence should be addressed. Tel: +1 713 798 7677
- Lawrence A Donehower
  - Department of Molecular Virology and Microbiology, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Teng-Kuei Hsu
  - Department of Biochemistry & Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Jennifer K Asmussen
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Kwanghyuk Lee
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge
  - Correspondence may also be addressed to Olivier Lichtarge. Tel: +1 713 798 5646
11
Andrades R, Recamonde-Mendoza M. Machine learning methods for prediction of cancer driver genes: a survey paper. Brief Bioinform 2022; 23:6551145. [PMID: 35323900 DOI: 10.1093/bib/bbac062] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/06/2021] [Revised: 02/06/2022] [Accepted: 02/08/2022] [Indexed: 12/21/2022] Open
Abstract
Identifying the genes and mutations that drive the emergence of tumors is a critical step to improving our understanding of cancer and identifying new directions for disease diagnosis and treatment. Despite the large volume of genomics data, the precise detection of driver mutations and their carrying genes, known as cancer driver genes, from the millions of possible somatic mutations remains a challenge. Computational methods play an increasingly important role in discovering genomic patterns associated with cancer drivers and developing predictive models to identify these elements. Machine learning (ML), including deep learning, has been the engine behind many of these efforts and provides excellent opportunities for tackling remaining gaps in the field. Thus, this survey aims to perform a comprehensive analysis of ML-based computational approaches to identify cancer driver mutations and genes, providing an integrated, panoramic view of the broad data and algorithmic landscape within this scientific problem. We discuss how the interactions among data types and ML algorithms have been explored in previous solutions and outline current analytical limitations that deserve further attention from the scientific community. We hope that by helping readers become more familiar with significant developments in the field brought by ML, we may inspire new researchers to address open problems and advance our knowledge towards cancer driver discovery.
Affiliation(s)
- Renan Andrades
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil; Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
- Mariana Recamonde-Mendoza
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil; Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
12
Kan Y, Jiang L, Guo Y, Tang J, Guo F. Two-stage-vote ensemble framework based on integration of mutation data and gene interaction network for uncovering driver genes. Brief Bioinform 2021; 23:6426028. [PMID: 34791034 DOI: 10.1093/bib/bbab429] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/25/2021] [Revised: 08/30/2021] [Accepted: 09/18/2021] [Indexed: 11/14/2022] Open
Abstract
Accurately identifying driver genes among the large number of mutated genes promotes accurate diagnosis and treatment of cancer. In recent years, work on uncovering driver genes by integrating mutation data and gene interaction networks has been gaining attention. However, it remains unclear whether integrating various types of mutation information (frequency and functional impact) with gene networks is more effective for prioritizing driver genes. Hence, we build a two-stage-vote ensemble framework based on somatic mutations and mutual interactions. Specifically, we first represent and combine various kinds of mutation information, which are propagated through networks by an improved iterative framework. The first vote is conducted on the iteration results using voting methods, and the second vote is performed on the results of the first vote to obtain the final ensemble driver gene list. Compared with four strong previous approaches, our method performs better in identifying driver genes on 33 types of cancer from The Cancer Genome Atlas. We also conduct a comparative analysis of two kinds of mutation information, five gene interaction networks and four voting strategies. Our framework offers a new view of data integration and helps more latent cancer genes to be recognized.
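The two-round voting idea in this abstract can be sketched with a simple rank-aggregation rule. The abstract mentions four voting strategies without naming them, so the Borda count below is an assumed stand-in, and the grouping of rankings (for example, one group per gene interaction network) is likewise a guess at the layout. Gene labels are placeholders.

```python
def borda_vote(rankings):
    """Aggregate ranked gene lists with a Borda count.

    Each list awards n points to its top gene, n - 1 to the next, and so on,
    where n is the length of that list. Ties are broken alphabetically.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            scores[gene] = scores.get(gene, 0) + (n - position)
    return sorted(scores, key=lambda g: (-scores[g], g))


def two_stage_vote(groups):
    """First vote within each group of rankings, second vote across groups."""
    first_round = [borda_vote(rankings) for rankings in groups]
    return borda_vote(first_round)


# Toy input: two groups (e.g. two networks), each with its own rankings.
groups = [
    [["G1", "G2", "G3"], ["G1", "G3", "G2"]],
    [["G2", "G1", "G3"]],
]
print(two_stage_vote(groups))
```

Voting within groups first prevents a group that happens to contain many rankings from dominating the final list, since each group contributes exactly one ballot to the second round.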
Affiliation(s)
- Yingxin Kan
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Limin Jiang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yan Guo
- Comprehensive Cancer Center, Department of Internal Medicine, University of New Mexico, Albuquerque, USA
- Jijun Tang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; School of Computational Science and Engineering, University of South Carolina, Columbia, USA
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, China
|
13
|
Trairatphisan P, de Souza TM, Kleinjans J, Jennen D, Saez-Rodriguez J. Contextualization of causal regulatory networks from toxicogenomics data applied to drug-induced liver injury. Toxicol Lett 2021; 350:40-51. [PMID: 34229068 DOI: 10.1016/j.toxlet.2021.06.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Revised: 06/19/2021] [Accepted: 06/30/2021] [Indexed: 11/19/2022]
Abstract
In recent years, network-based methods have become an attractive analytical approach for toxicogenomics studies. They can capture not only the global changes of regulatory gene networks but also the relationships between their components. Among them, a causal reasoning approach depicts the mechanisms of regulation that connect upstream regulators in signaling networks to their downstream gene targets. In this work, we applied CARNIVAL, a causal network contextualisation tool, to infer upstream signaling networks deregulated in drug-induced liver injury (DILI) from gene expression microarray data from the TG-GATEs database. We focussed on six compounds that induce observable histopathologies linked to DILI in repeated-dosing experiments in rats. We compared responses in vitro and in vivo to identify potential cross-platform concordances in rats as well as network preservation between rat and human. Our results showed similarities of enriched pathways and network motifs between compounds. These pathways and motifs induced the same pathology in rats but not in humans. In particular, the causal interaction "LCK activates SOCS3, which in turn inhibits TFDP1" was commonly identified as a regulatory path among the fibrosis-inducing compounds. This potential pathology-inducing regulation illustrates the value of our approach for generating hypotheses that can be further validated experimentally.
Affiliation(s)
- Panuwat Trairatphisan
- Heidelberg University, Faculty of Medicine, Institute of Computational Biomedicine, 69120, Heidelberg, Germany.
- Terezinha Maria de Souza
- Department of Toxicogenomics (TGX), GROW School for Oncology and Developmental Biology, Maastricht University, 6200 MD, Maastricht, the Netherlands.
- Jos Kleinjans
- Department of Toxicogenomics (TGX), GROW School for Oncology and Developmental Biology, Maastricht University, 6200 MD, Maastricht, the Netherlands.
- Danyel Jennen
- Department of Toxicogenomics (TGX), GROW School for Oncology and Developmental Biology, Maastricht University, 6200 MD, Maastricht, the Netherlands.
- Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, Institute of Computational Biomedicine, 69120, Heidelberg, Germany; RWTH Aachen University, Faculty of Medicine, Joint Research Centre for Computational Biomedicine (JRC-COMBINE), 52074, Aachen, Germany.
|
14
|
Xenos A, Malod-Dognin N, Milinković S, Pržulj N. Linear functional organization of the omic embedding space. Bioinformatics 2021; 37:3839-3847. [PMID: 34213534 PMCID: PMC8570782 DOI: 10.1093/bioinformatics/btab487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2020] [Revised: 06/21/2021] [Accepted: 06/30/2021] [Indexed: 11/21/2022] Open
Abstract
Motivation: We are increasingly accumulating complex omics data that capture different aspects of cellular functioning. A key challenge is to untangle their complexity and effectively mine them for new biomedical information. To decipher this new information, we introduce algorithms based on network embeddings. Such algorithms represent biological macromolecules as vectors in d-dimensional space, in which topologically similar molecules are embedded close together and knowledge is extracted directly by vector operations. Recently, it has been shown that neural networks used to obtain vectorial representations (embeddings) implicitly factorize a mutual information matrix, called the Positive Pointwise Mutual Information (PPMI) matrix. Thus, we propose using the PPMI matrix to represent the human protein–protein interaction (PPI) network and also introduce the graphlet degree vector PPMI matrix of the PPI network to capture different topological (structural) similarities of the nodes in the molecular network. Results: We generate the embeddings by decomposing these matrices with Nonnegative Matrix Tri-Factorization. We demonstrate that genes that are embedded close together in these spaces have similar biological functions, so we can extract new biomedical knowledge directly by performing linear operations on their embedding vector representations. We exploit this property to predict new genes participating in protein complexes and to identify new cancer-related genes based on the cosine similarities between the vector representations of the genes. We validate 80% of our novel cancer-related gene predictions in the literature and also by patient survival curves, which demonstrate that 93.3% of them have potential clinical relevance as biomarkers of cancer. Availability and implementation: Code and data are available online at https://gitlab.bsc.es/axenos/embedded-omics-data-geometry/. Supplementary information: Supplementary data are available at Bioinformatics online.
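The two building blocks named in this abstract, a PPMI matrix and cosine similarity between vectors, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy 4-node adjacency matrix used directly as a co-occurrence count matrix, and it compares raw PPMI rows rather than embeddings obtained by Nonnegative Matrix Tri-Factorization. The function names `ppmi_matrix` and `cosine_similarity` and the toy data are hypothetical.

```python
import math


def ppmi_matrix(counts):
    """Positive pointwise mutual information of a square count matrix.

    PPMI[i][j] = max(0, log(c_ij * total / (row_sum_i * col_sum_j))).
    Cells with a zero count are set to 0.
    """
    n = len(counts)
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    ppmi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c = counts[i][j]
            if c > 0:
                pmi = math.log(c * total / (row_sums[i] * col_sums[j]))
                ppmi[i][j] = max(0.0, pmi)
    return ppmi


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Assumption: a toy undirected network, NOT real PPI data.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

ppmi = ppmi_matrix(adjacency)
# Nodes 0 and 1 share neighbour 2, so their PPMI rows overlap.
similarity_01 = cosine_similarity(ppmi[0], ppmi[1])
```

In the workflow the abstract describes, the matrix would first be factorized and the cosine similarity taken between the resulting embedding vectors. The sketch skips that step to stay short.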
Affiliation(s)
- A Xenos
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Universitat Politecnica de Catalunya (UPC), 08034 Barcelona, Spain
- N Malod-Dognin
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Department of Computer Science, University College London, WC1E 6BT London, United Kingdom
- S Milinković
- RAF School of Computing, Union University, Belgrade, Serbia
- N Pržulj
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Department of Computer Science, University College London, WC1E 6BT London, United Kingdom; ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain
|
15
|
Weighill D, Ben Guebila M, Glass K, Platig J, Yeh JJ, Quackenbush J. Gene Targeting in Disease Networks. Front Genet 2021; 12:649942. [PMID: 33968133 PMCID: PMC8103030 DOI: 10.3389/fgene.2021.649942] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/05/2021] [Accepted: 03/15/2021] [Indexed: 01/12/2023] Open
Abstract
Profiling of whole transcriptomes has become a cornerstone of molecular biology and an invaluable tool for the characterization of clinical phenotypes and the identification of disease subtypes. Analyses of these data are becoming ever more sophisticated as we move beyond simple comparisons to consider networks of higher-order interactions and associations. Gene regulatory networks (GRNs) model the regulatory relationships of transcription factors and genes and have allowed the identification of differentially regulated processes in disease systems. In this perspective, we discuss gene targeting scores, which measure changes in inferred regulatory network interactions, and their use in identifying disease-relevant processes. In addition, we present an example analysis for pancreatic ductal adenocarcinoma (PDAC), demonstrating the power of gene targeting scores to identify differential processes between complex phenotypes, processes that would have been missed by only performing differential expression analysis. This example demonstrates that gene targeting scores are an invaluable addition to gene expression analysis in the characterization of diseases and other complex phenotypes.
Affiliation(s)
- Deborah Weighill
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA, United States
- Marouen Ben Guebila
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA, United States
- Kimberly Glass
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA, United States
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Harvard Medical School, Harvard University, Boston, MA, United States
- John Platig
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Harvard Medical School, Harvard University, Boston, MA, United States
- Jen Jen Yeh
- Departments of Surgery and Pharmacology, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
- John Quackenbush
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA, United States
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
|
16
|
Cutigi JF, Evangelista RF, Ramos RH, de Oliveira Lage Ferreira C, Evangelista AF, de Carvalho ACPLF, Simao A. Combining Mutation and Gene Network Data in a Machine Learning Approach for False-Positive Cancer Driver Gene Discovery. Lecture Notes in Computer Science 2020:81-92. [DOI: 10.1007/978-3-030-65775-8_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|