1
Yi S, Xie M. DriverMEDS: Cancer driver gene identification using mutual exclusivity from embeded features and driver mutation scoring. Methods 2025; 239:22-29. [PMID: 40113153 DOI: 10.1016/j.ymeth.2025.03.010] [Received: 10/11/2024] [Revised: 01/24/2025] [Accepted: 03/14/2025] [Indexed: 03/22/2025] Open
Abstract
Efficiently identifying cancer driver genes is key to understanding cancer development and to improving diagnosis and treatment. Current unsupervised driver gene identification methods typically integrate multi-omics data into gene function networks and employ network embedding algorithms to learn gene features. Additionally, they consider mutual exclusivity and mutation frequency as crucial concepts in identifying driver genes. However, existing approaches neglect the possibly important implications of mutual exclusivity in the embedding space. Furthermore, they simply assume that all driver genes exhibit high mutation frequencies. We explored the mutual exclusivity embedded in the learned features and verified that the Euclidean distances between learned features are strongly related to mutual exclusivity and reveal additional information about it. Based on this idea and a novel driver mutation scoring strategy, we designed an unsupervised driver gene prediction framework, DriverMEDS. First, we design a feature clustering algorithm to generate gene modules. In each module, the Euclidean distances of learned features are used to calculate a module importance score for each gene based on the related mutual exclusivity. Then, based on the observation that most driver genes have intermediate mutation frequencies, a driver mutation scoring function is designed for each gene to improve on the existing mutation frequency scoring strategy. Finally, the weighted sum of the module importance score and the driver mutation score is used to prioritize the genes. Experimental results and analysis show that DriverMEDS can detect novel cancer driver genes and relevant function modules, and that it outperforms five other state-of-the-art methods for cancer driver identification.
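The final prioritization step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formulas, so the bump-shaped `mutation_score`, its `peak` and `width` parameters, the `weight` value, and the toy gene data are all assumptions made for illustration. The module importance scores are taken as given inputs here.

```python
import math

def mutation_score(freq, peak=0.05, width=1.0):
    """Hypothetical score that is highest at an intermediate mutation frequency.

    A log-normal-shaped bump centred on `peak`; an assumed stand-in, not the
    scoring function from the paper.
    """
    if freq <= 0:
        return 0.0
    return math.exp(-(math.log(freq / peak) ** 2) / (2 * width ** 2))

def prioritize(module_scores, mutation_freqs, weight=0.5):
    """Rank genes by a weighted sum of the two scores (best first)."""
    combined = {
        gene: weight * module_scores[gene]
        + (1 - weight) * mutation_score(mutation_freqs[gene])
        for gene in module_scores
    }
    return sorted(combined, key=combined.get, reverse=True)

# Toy inputs, made up for illustration only.
module_scores = {"G1": 0.9, "G2": 0.4, "G3": 0.4}
mutation_freqs = {"G1": 0.30, "G2": 0.05, "G3": 0.001}
ranking = prioritize(module_scores, mutation_freqs)
print(ranking)
```

In this toy case the gene with an intermediate mutation frequency (`G2`) is ranked ahead of the frequently mutated gene (`G1`) even though its module score is lower, which is the behaviour the abstract motivates.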
Affiliation(s)
- Sichen Yi
  - Key Laboratory of Computing and Stochastic Mathematics (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha 410081, China.
- Minzhu Xie
  - Key Laboratory of Computing and Stochastic Mathematics (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha 410081, China; College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China.

2
Su X, Hu P, Li D, Zhao B, Niu Z, Herget T, Yu PS, Hu L. Interpretable identification of cancer genes across biological networks via transformer-powered graph representation learning. Nat Biomed Eng 2025; 9:371-389. [PMID: 39789329 DOI: 10.1038/s41551-024-01312-5] [Received: 07/27/2023] [Accepted: 11/01/2024] [Indexed: 01/12/2025]
Abstract
Graph representation learning has been leveraged to identify cancer genes from biological networks. However, its applicability is limited by insufficient interpretability and generalizability under integrative network analysis. Here we report the development of an interpretable and generalizable transformer-based model that accurately predicts cancer genes by leveraging graph representation learning and the integration of multi-omics data with the topologies of homogeneous and heterogeneous networks of biological interactions. The model allows for the interpretation of the respective importance of multi-omic and higher-order structural features, achieved state-of-the-art performance in the prediction of cancer genes across biological networks (including networks of interactions between miRNA and proteins, transcription factors and proteins, and transcription factors and miRNA) in pan-cancer and cancer-specific scenarios, and predicted 57 cancer-gene candidates (including three genes that had not been identified by other models) among 4,729 unlabelled genes across 8 pan-cancer datasets. The model's interpretability and generalization may facilitate the understanding of gene-related regulatory mechanisms and the discovery of new cancer genes.
Affiliation(s)
- Xiaorui Su
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
  - Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA
- Pengwei Hu
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
- Dongxu Li
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
- Bowei Zhao
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
  - University of Chinese Academy of Sciences, Beijing, China
- Zhaomeng Niu
  - Department of Health Informatics, Rutgers School of Health Professions, Piscataway, NJ, USA
- Philip S Yu
  - Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA
- Lun Hu
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China.
  - University of Chinese Academy of Sciences, Beijing, China.

3
Zhang H, Lin C, Chen Y, Shen X, Wang R, Chen Y, Lyu J. Enhancing Molecular Network-Based Cancer Driver Gene Prediction Using Machine Learning Approaches: Current Challenges and Opportunities. J Cell Mol Med 2025; 29:e70351. [PMID: 39804102 PMCID: PMC11726689 DOI: 10.1111/jcmm.70351] [Received: 11/12/2024] [Revised: 12/24/2024] [Accepted: 01/02/2025] [Indexed: 01/16/2025] Open
Abstract
Cancer is a complex disease driven by mutations in genes that play critical roles in cellular processes. The identification of cancer driver genes is crucial for understanding tumorigenesis, developing targeted therapies and identifying rational drug targets. Experimental identification and validation of cancer driver genes are time-consuming and costly. Studies have demonstrated that interacting genes are associated with similar phenotypes. Therefore, identifying cancer driver genes using molecular network-based approaches is necessary. Random walk-based approaches on molecular networks, which integrate mutation data with protein-protein interaction networks, have been widely employed in predicting cancer driver genes and have demonstrated robust predictive potential. However, recent advancements in deep learning, particularly graph-based models, have provided novel opportunities for enhancing the prediction of cancer driver genes. This review aimed to comprehensively explore how machine learning methodologies, particularly network propagation, graph neural networks, autoencoders, graph embeddings, and attention mechanisms, improve the scalability and interpretability of molecular network-based cancer gene prediction.
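As a rough illustration of the random walk-based network propagation mentioned in this abstract, the Python sketch below runs a random walk with restart on a toy undirected graph. The graph, the node labels, the seed weights, and the `restart` value are hypothetical; published methods run on protein-protein interaction networks with mutation-derived seed scores and differ in their details.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate seed scores over an undirected graph.

    adjacency: dict mapping node -> list of neighbours (each node needs >= 1).
    seeds: dict mapping node -> non-negative initial weight.
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = list(adjacency)
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` jump back to the seed distribution.
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise move to a uniformly chosen neighbour.
        for n in nodes:
            share = (1 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Toy path graph B - A - C - D with a single seed at A (illustrative only).
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
scores = random_walk_with_restart(graph, seeds={"A": 1.0})
print(sorted(scores, key=scores.get, reverse=True))
```

Nodes close to the seed receive more probability mass than distant ones, which is how propagation lets rarely mutated genes inherit evidence from mutated neighbours.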
Affiliation(s)
- Hao Zhang
  - Postgraduate Training Base Alliance of Wenzhou Medical University, Wenzhou, Zhejiang, China
  - Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Chaohuan Lin
  - Postgraduate Training Base Alliance of Wenzhou Medical University, Wenzhou, Zhejiang, China
  - Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Ying'ao Chen
  - Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China
- Ruizhe Wang
  - Wenzhou Longwan High School, Wenzhou, Zhejiang, China
- Yiqi Chen
  - Wenzhou Longwan High School, Wenzhou, Zhejiang, China
- Jie Lyu
  - Postgraduate Training Base Alliance of Wenzhou Medical University, Wenzhou, Zhejiang, China
  - Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, China

4
Nourian R, Motamedi SA, Pourfard M. BHBA-GRNet: Cancer detection through improved gene expression profiling using Binary Honey Badger Algorithm and Gene Residual-based Network. Comput Biol Med 2025; 184:109348. [PMID: 39615230 DOI: 10.1016/j.compbiomed.2024.109348] [Received: 01/14/2024] [Revised: 10/29/2024] [Accepted: 10/30/2024] [Indexed: 12/22/2024]
Abstract
Cancer, a pervasive and devastating disease, remains a leading global cause of mortality, emphasizing the growing urgency for effective detection methods. Gene Expression Microarray (GEM) data has emerged as a crucial tool in this context, offering insights into early cancer detection and treatment. While deep learning methods offer promise in detecting various cancers through GEM analysis, they suffer from the high dimensionality inherent in gene sequences, which prevents optimal detection performance across diverse cancer types. Additionally, existing methods often resort to synthetic features and data augmentation to enhance performance. To address these challenges and enhance accuracy, a novel Binary Honey Badger Algorithm (BHBA) integrated with the Gene Residual Network (GRNet) method has been proposed. Our approach capitalizes on BHBA's feature reduction mechanism, eliminating the need for additional preprocessing steps. Comprehensive evaluations on three well-established datasets representing lung and blood-type cancers demonstrate that our method reduces GEM data size by approximately 40% and improves accuracy by around 1% in lung cancer types compared to state-of-the-art methods.
Affiliation(s)
- Reza Nourian
  - Electrical Engineering Department, Amirkabir University of Technology, No. 350, Hafez Ave, Valiasr Square, 15875-4413, Tehran, 159163-4311, Iran.
- Seyed Ahmad Motamedi
  - Electrical Engineering Department, Amirkabir University of Technology, No. 350, Hafez Ave, Valiasr Square, 15875-4413, Tehran, 159163-4311, Iran.
- Mohammadreza Pourfard
  - Electrical Engineering Department, Amirkabir University of Technology, No. 350, Hafez Ave, Valiasr Square, 15875-4413, Tehran, 159163-4311, Iran.

5
Li X, Xu J, Li J, Gu J, Shang X. Towards simplified graph neural networks for identifying cancer driver genes in heterophilic networks. Brief Bioinform 2024; 26:bbae691. [PMID: 39751645 PMCID: PMC11697181 DOI: 10.1093/bib/bbae691] [Received: 11/09/2024] [Revised: 11/26/2024] [Accepted: 12/16/2024] [Indexed: 01/04/2025] Open
Abstract
The identification of cancer driver genes is crucial for understanding the complex processes involved in cancer development, progression, and therapeutic strategies. Multi-omics data and biological networks provided by numerous databases enable the application of graph deep learning techniques that incorporate network structures into the deep learning framework. However, most existing methods do not account for the heterophily in the biological networks, which hinders the improvement of model performance. Meanwhile, feature confusion often arises in models based on graph neural networks in such graphs. To address this, we propose a Simplified Graph neural network for identifying Cancer Driver genes in heterophilic networks (SGCD), which comprises primarily two components: a graph convolutional neural network with representation separation and a bimodal feature extractor. The results demonstrate that SGCD not only performs exceptionally well but also exhibits robust discriminative capabilities compared to state-of-the-art methods across all benchmark datasets. Moreover, subsequent interpretability experiments on both the model and biological aspects provide compelling evidence supporting the reliability of SGCD. Additionally, the model can dissect gene modules, revealing clearer connections between driver genes in cancers. We are confident that SGCD holds potential in the field of precision oncology and may be applied to prognosticate biomarkers for a wide range of complex diseases.
Affiliation(s)
- Xingyi Li
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 Shaanxi, China
  - Research & Development Institute of Northwestern Polytechnical University in Shenzhen, Shenzhen, 518063 Guangdong, China
  - Faculty of Data Science, City University of Macau, Macau, 999078 Macau, China
- Jialuo Xu
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 Shaanxi, China
- Junming Li
  - Research & Development Institute of Northwestern Polytechnical University in Shenzhen, Shenzhen, 518063 Guangdong, China
  - School of Software, Northwestern Polytechnical University, Xi’an, 710072 Shaanxi, China
- Jia Gu
  - School of Software, Northwestern Polytechnical University, Xi’an, 710072 Shaanxi, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 Shaanxi, China

6
Xu J, Hao J, Liao X, Shang X, Li X. SSCI: Self-Supervised Deep Learning Improves Network Structure for Cancer Driver Gene Identification. Int J Mol Sci 2024; 25:10351. [PMID: 39408682 PMCID: PMC11476395 DOI: 10.3390/ijms251910351] [Received: 08/27/2024] [Revised: 09/21/2024] [Accepted: 09/23/2024] [Indexed: 10/20/2024] Open
Abstract
The pathogenesis of cancer is complex, involving abnormalities in some genes in organisms. Accurately identifying cancer genes is crucial for the early detection of cancer and personalized treatment, among other applications. Recent studies have used graph deep learning methods to identify cancer driver genes based on biological networks. However, the incompleteness and noise of the networks weaken the performance of models. To address this, we propose SSCI, a cancer driver gene identification method based on self-supervision for graph convolutional networks, which can efficiently enhance the structure of the network and further improve predictive accuracy. The reliability of SSCI is verified by the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the F1 score, with respective values of 0.966, 0.964, and 0.913. The results show that our method can identify cancer driver genes with strong discriminative power and biological interpretability.
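For reference, the three evaluation metrics named in this abstract can be computed from predicted scores as in the following Python sketch. The labels and scores are made-up toy values, the 0.5 threshold for F1 is an assumption, and AUPRC is approximated here by average precision; none of this reproduces the evaluation protocol of the paper.

```python
def auroc(labels, scores):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes untied scores)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / hits

def f1_score(labels, scores, threshold=0.5):
    """Harmonic mean of precision and recall at a fixed score threshold."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn)

# Toy data: 1 = known driver gene, 0 = non-driver (illustrative only).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
f1 = f1_score(labels, scores)
print(round(roc, 3), round(ap, 3), round(f1, 3))
```

Because known driver genes are a small minority of all genes, AUPRC is usually the more demanding of the two curve-based metrics in this setting.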
Affiliation(s)
- Jialuo Xu
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
- Jun Hao
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
- Xingyu Liao
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
- Xingyi Li
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
  - Research & Development Institute of Northwestern Polytechnical University in Shenzhen, Shenzhen 518063, China

7
Zhang T, Zhang SW, Xie MY, Li Y. Identifying cooperating cancer driver genes in individual patients through hypergraph random walk. J Biomed Inform 2024; 157:104710. [PMID: 39159864 DOI: 10.1016/j.jbi.2024.104710] [Received: 04/27/2024] [Revised: 07/30/2024] [Accepted: 08/14/2024] [Indexed: 08/21/2024]
Abstract
OBJECTIVE: Identifying cancer driver genes, especially rare or patient-specific cancer driver genes, is a primary goal in cancer therapy. Although researchers have proposed some methods to tackle this problem, these methods mostly identify cancer driver genes at the single-gene level, overlooking the cooperative relationship among cancer driver genes. Identifying cooperating cancer driver genes in individual patients is pivotal for understanding cancer etiology and advancing the development of personalized therapies. METHODS: Here, we propose a novel Personalized Cooperating cancer Driver Genes (PCoDG) method that uses a hypergraph random walk to identify the cancer driver genes that cooperatively drive cancer progression in an individual patient. By leveraging the ability of hypergraphs to represent multi-way relationships, PCoDG first employs a personalized hypergraph to depict the complex interactions among mutated genes and differentially expressed genes of an individual patient. Then, a hypergraph random walk algorithm based on hyperedge similarity is used to calculate the importance scores of mutated genes, and these scores are integrated with signaling pathway data to identify the cooperating cancer driver genes in individual patients. RESULTS: The experimental results on three TCGA cancer datasets (i.e., BRCA, LUAD, and COADREAD) demonstrate the effectiveness of PCoDG in identifying personalized cooperating cancer driver genes. The genes identified by PCoDG not only offer valuable insights into patient stratification correlating with clinical outcomes, but also provide a useful reference resource for tailoring personalized treatments. CONCLUSION: We propose a novel method that can effectively identify cooperating cancer driver genes for individual patients, thereby deepening our understanding of the cooperative relationship among personalized cancer driver genes and advancing the development of precision oncology.
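To give a feel for how a random walk on a hypergraph scores genes, the Python sketch below implements a plain two-step walk: from a vertex, pick one of its hyperedges uniformly, then pick a vertex inside that hyperedge uniformly, with an occasional restart to a uniform distribution. This is a generic textbook-style walk, not PCoDG itself: the hyperedge-similarity weighting and the pathway integration described in the abstract are not reproduced, and the hyperedges, gene labels, and `restart` value are hypothetical.

```python
def hypergraph_walk(hyperedges, restart=0.15, iters=200):
    """Score vertices by a random walk with restart on a hypergraph.

    hyperedges: list of tuples, each tuple holding the vertices it connects.
    Returns a dict mapping vertex -> visiting probability.
    """
    nodes = sorted({v for e in hyperedges for v in e})
    # Hyperedges that each vertex belongs to.
    member = {v: [e for e in hyperedges if v in e] for v in nodes}
    p0 = {v: 1.0 / len(nodes) for v in nodes}
    p = dict(p0)
    for _ in range(iters):
        nxt = {v: restart * p0[v] for v in nodes}
        for v in nodes:
            for e in member[v]:
                # Choose hyperedge e (1/deg(v)), then a vertex in e (1/|e|).
                share = (1 - restart) * p[v] / (len(member[v]) * len(e))
                for u in e:
                    nxt[u] += share
        p = nxt
    return p

# Toy hypergraph: g3 takes part in all three multi-gene relations.
hyperedges = [("g1", "g2", "g3"), ("g3", "g4"), ("g3", "g5")]
importance = hypergraph_walk(hyperedges)
print(max(importance, key=importance.get))
```

A gene that participates in many hyperedges accumulates the most probability mass, which is the intuition behind using such scores to rank mutated genes.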
Affiliation(s)
- Tong Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China; School of Electrical and Mechanical Engineering, Pingdingshan University, Pingdingshan 467000, China
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China.
- Ming-Yu Xie
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Yan Li
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China

8
Yang J, Fu H, Xue F, Li M, Wu Y, Yu Z, Luo H, Gong J, Niu X, Zhang W. Multiview representation learning for identification of novel cancer genes and their causative biological mechanisms. Brief Bioinform 2024; 25:bbae418. [PMID: 39210506 PMCID: PMC11361854 DOI: 10.1093/bib/bbae418] [Received: 05/22/2024] [Revised: 07/08/2024] [Accepted: 08/23/2024] [Indexed: 09/04/2024] Open
Abstract
Tumorigenesis arises from the dysfunction of cancer genes, leading to uncontrolled cell proliferation through various mechanisms. Establishing a complete cancer gene catalogue will make precision oncology possible. Although existing methods based on graph neural networks (GNN) are effective in identifying cancer genes, they fall short in effectively integrating data from multiple views and interpreting predictive outcomes. To address these shortcomings, an interpretable representation learning framework IMVRL-GCN is proposed to capture both shared and specific representations from multiview data, offering significant insights into the identification of cancer genes. Experimental results demonstrate that IMVRL-GCN outperforms state-of-the-art cancer gene identification methods and several baselines. Furthermore, IMVRL-GCN is employed to identify a total of 74 high-confidence novel cancer genes, and multiview data analysis highlights the pivotal roles of shared, mutation-specific, and structure-specific representations in discriminating distinctive cancer genes. Exploration of the mechanisms behind their discriminative capabilities suggests that shared representations are strongly associated with gene functions, while mutation-specific and structure-specific representations are linked to mutagenic propensity and functional synergy, respectively. Finally, our in-depth analyses of these candidates suggest potential insights for individualized treatments: afatinib could counteract many mutation-driven risks, and targeting interactions with cancer gene SRC is a reasonable strategy to mitigate interaction-induced risks for NR3C1, RXRA, HNF4A, and SP1.
Affiliation(s)
- Jianye Yang
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Haitao Fu
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  - School of Artificial Intelligence, Hubei University, Wuhan 430070, China
- Feiyang Xue
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Menglu Li
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yuyang Wu
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zhanhui Yu
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Haohui Luo
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jing Gong
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  - College of Biomedicine and Health, Huazhong Agricultural University, Wuhan 430062, China
- Xiaohui Niu
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wen Zhang
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

9
Li H, Han Z, Sun Y, Wang F, Hu P, Gao Y, Bai X, Peng S, Ren C, Xu X, Liu Z, Chen H, Yang Y, Bo X. CGMega: explainable graph neural network framework with attention mechanisms for cancer gene module dissection. Nat Commun 2024; 15:5997. [PMID: 39013885 PMCID: PMC11252405 DOI: 10.1038/s41467-024-50426-6] [Received: 07/18/2023] [Accepted: 07/09/2024] [Indexed: 07/18/2024] Open
Abstract
Cancer is rarely the straightforward consequence of an abnormality in a single gene, but rather reflects a complex interplay of many genes, represented as gene modules. Here, we leverage recent advances in model-agnostic interpretation approaches and develop CGMega, an explainable, graph attention-based deep learning framework to perform cancer gene module dissection. CGMega outperforms current approaches in cancer gene prediction, and it provides a promising approach to integrate multi-omics information. We apply CGMega to a breast cancer cell line and acute myeloid leukemia (AML) patients, and we uncover the high-order gene module formed by the ErbB family and tumor factors NRG1, PPM1A and DLG2. We identify 396 candidate AML genes, and observe the enrichment of either known AML genes or candidate AML genes in a single gene module. We also identify patient-specific AML genes and associated gene modules. Together, these results indicate that CGMega can be used to dissect cancer gene modules, and provide high-order mechanistic insights into cancer development and heterogeneity.
Affiliation(s)
- Hao Li
  - Academy of Military Medical Sciences, Beijing, China
- Zebei Han
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Yu Sun
  - Academy of Military Medical Sciences, Beijing, China
- Fu Wang
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Pengzhen Hu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, China
- Yuang Gao
  - Department of Hematology, PLA General Hospital, the Fifth Medical Center, Beijing, China
- Xuemei Bai
  - Academy of Military Medical Sciences, Beijing, China
- Shiyu Peng
  - Academy of Military Medical Sciences, Beijing, China
- Chao Ren
  - Academy of Military Medical Sciences, Beijing, China
- Xiang Xu
  - Academy of Military Medical Sciences, Beijing, China
- Zeyu Liu
  - Academy of Military Medical Sciences, Beijing, China
- Hebing Chen
  - Academy of Military Medical Sciences, Beijing, China.
- Yang Yang
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China.
- Xiaochen Bo
  - Academy of Military Medical Sciences, Beijing, China.

10
Jung S, Wang S, Lee D. CancerGATE: Prediction of cancer-driver genes using graph attention autoencoders. Comput Biol Med 2024; 176:108568. [PMID: 38744009 DOI: 10.1016/j.compbiomed.2024.108568] [Received: 11/07/2023] [Revised: 04/13/2024] [Accepted: 05/05/2024] [Indexed: 05/16/2024]
Abstract
Discovery of cancer type-specific driver genes is important for understanding the molecular mechanisms of each cancer type and for providing proper treatment. Recently, graph deep learning methods have become widely used in finding cancer-driver genes. However, previous methods had limited performance in individual cancer types due to the small number of cancer-driver genes used in training and biases toward the cancer-driver genes used in training the models. Here, we introduce a novel pipeline, CancerGATE, that predicts cancer-driver genes using a graph attention autoencoder (GATE) trained in a self-supervised manner and that can be applied to each cancer type. CancerGATE utilizes biological network topology and multi-omics data from 20,079 samples across 15 cancer types from The Cancer Genome Atlas (TCGA). Attention coefficients calculated in the model are used to prioritize cancer-driver genes by comparing coefficients of cancer and normal contexts. CancerGATE shows a higher AUPRC, with a difference ranging from 1.5% to 36.5%, compared to previous graph deep learning models in each cancer type. We also show that CancerGATE is free from the bias toward cancer-driver genes used in training, revealing mechanisms of the cancer-driver genes in specific cancer types. Finally, we propose novel cancer-driver gene candidates that could be therapeutic targets for specific cancer types.
Affiliation(s)
- Seunghwan Jung
  - Department of Bio and Brain Engineering, KAIST, Daejeon 34141, Republic of Korea.
- Seunghyun Wang
  - Department of Bio and Brain Engineering, KAIST, Daejeon 34141, Republic of Korea.
- Doheon Lee
  - Department of Bio and Brain Engineering, KAIST, Daejeon 34141, Republic of Korea.

11
Patel Y, Shah T, Dhar MK, Zhang T, Niezgoda J, Gopalakrishnan S, Yu Z. Integrated image and location analysis for wound classification: a deep learning approach. Sci Rep 2024; 14:7043. [PMID: 38528003 DOI: 10.1038/s41598-024-56626-w] [Received: 11/01/2023] [Accepted: 03/08/2024] [Indexed: 03/27/2024] Open
Abstract
The global burden of acute and chronic wounds presents a compelling case for enhancing wound classification methods, a vital step in diagnosing and determining optimal treatments. Recognizing this need, we introduce an innovative multi-modal network based on a deep convolutional neural network for categorizing wounds into four categories: diabetic, pressure, surgical, and venous ulcers. Our multi-modal network uses wound images and their corresponding body locations for more precise classification. A unique aspect of our methodology is incorporating a body map system that facilitates accurate wound location tagging, improving upon traditional wound image classification techniques. A distinctive feature of our approach is the integration of models such as VGG16, ResNet152, and EfficientNet within a novel architecture. This architecture includes elements like spatial and channel-wise Squeeze-and-Excitation modules, Axial Attention, and an Adaptive Gated Multi-Layer Perceptron, providing a robust foundation for classification. Our multi-modal network was trained and evaluated on two distinct datasets comprising relevant images and corresponding location information. Notably, our proposed network outperformed traditional methods, reaching an accuracy range of 74.79-100% for Region of Interest (ROI) without location classifications, 73.98-100% for ROI with location classifications, and 78.10-100% for whole image classifications. This marks a significant enhancement over previously reported performance metrics in the literature. Our results indicate the potential of our multi-modal network as an effective decision-support tool for wound image classification, paving the way for its application in various clinical contexts.
Affiliation(s)
- Yash Patel
  - Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
- Tirth Shah
  - Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
- Mrinal Kanti Dhar
  - Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
- Taiyu Zhang
  - Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
- Jeffrey Niezgoda
  - Advancing the Zenith of Healthcare (AZH) Wound and Vascular Center, Milwaukee, WI, USA
- Zeyun Yu
  - Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, USA.
  - Department of Biomedical Engineering, University of Wisconsin-Milwaukee, Milwaukee, WI, USA.

12
Nourbakhsh M, Degn K, Saksager A, Tiberti M, Papaleo E. Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks. Brief Bioinform 2024; 25:bbad519. [PMID: 38261338 PMCID: PMC10805075 DOI: 10.1093/bib/bbad519] [Received: 06/09/2023] [Revised: 11/27/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024] Open
Abstract
The vast amount of available sequencing data allows the scientific community to explore different genetic alterations that may drive cancer or favor cancer progression. Software developers have proposed a myriad of predictive tools, allowing researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach or a golden standard for comparison. Hence, benchmarking the different tools depends highly on the input data, indicating that overfitting is still a massive problem. One of the solutions is to limit the scope and usage of specific tools. However, such limitations force researchers to walk on a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations driving cancer. While the knowledge of cancer development increases daily, many bioinformatic pipelines rely on single nucleotide variants or alterations in a vacuum without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, the research fields work in silos, risking overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes, driver mutations, and the impact of these based on structural analysis. Additionally, we suggest and recommend directions in the field to avoid silo-research, moving towards integrative frameworks.
Affiliation(s)
- Mona Nourbakhsh
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Kristine Degn
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Astrid Saksager
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Tiberti
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elena Papaleo
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
13
Wang Y, Zhou B, Ru J, Meng X, Wang Y, Liu W. Advances in computational methods for identifying cancer driver genes. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:21643-21669. [PMID: 38124614 DOI: 10.3934/mbe.2023958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Cancer driver genes (CDGs) are crucial in cancer prevention, diagnosis and treatment. This study employed computational methods for identifying CDGs, categorizing them into four groups. The major frameworks for each of these four categories were summarized. Additionally, we systematically gathered data from public databases and biological networks, and we elaborated on computational methods for identifying CDGs using the aforementioned databases. Further, we summarized the algorithms, mainly involving statistics and machine learning, used for identifying CDGs. Notably, the performances of nine typical identification methods for eight types of cancer were compared to analyze the applicability areas of these methods. Finally, we discussed the challenges and prospects associated with methods for identifying CDGs. The present study revealed that the network-based algorithms and machine learning-based methods demonstrated superior performance.
Affiliation(s)
- Ying Wang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Bohao Zhou
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Jidong Ru
  - School of Textile Garment and Design, Changshu Institute of Technology, Changshu 215500, China
- Xianglian Meng
  - School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
- Yundong Wang
  - School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Wenjie Liu
  - School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
14
Cui Y, Wang Z, Wang X, Zhang Y, Zhang Y, Pan T, Zhang Z, Li S, Guo Y, Akutsu T, Song J. SMG: self-supervised masked graph learning for cancer gene identification. Brief Bioinform 2023; 24:bbad406. [PMID: 37950905 PMCID: PMC10639095 DOI: 10.1093/bib/bbad406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 09/26/2023] [Accepted: 10/24/2023] [Indexed: 11/13/2023] Open
Abstract
Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterization of molecular-level mechanism in cancer research. In recent years, the growing availability of high-throughput molecular data and advancements in deep learning technologies has enabled the modelling of complex interactions and topological information within genomic data. Nevertheless, because of the limited labelled data, pinpointing CGs from a multitude of potential mutations remains an exceptionally challenging task. To address this, we propose a novel deep learning framework, termed self-supervised masked graph learning (SMG), which comprises SMG reconstruction (pretext task) and task-specific fine-tuning (downstream task). In the pretext task, the nodes of multi-omic featured protein-protein interaction (PPI) networks are randomly substituted with a defined mask token. The PPI networks are then reconstructed using the graph neural network (GNN)-based autoencoder, which explores the node correlations in a self-prediction manner. In the downstream tasks, the pre-trained GNN encoder embeds the input networks into feature graphs, whereas a task-specific layer proceeds with the final prediction. To assess the performance of the proposed SMG method, benchmarking experiments are performed on three node-level tasks (identification of CGs, essential genes and healthy driver genes) and one graph-level task (identification of disease subnetwork) across eight PPI networks. Benchmarking experiments and performance comparison with existing state-of-the-art methods demonstrate the superiority of SMG on multi-omic feature engineering.
Affiliation(s)
- Yan Cui
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Zhikang Wang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Xiaoyu Wang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Yiwen Zhang
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Ying Zhang
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Tong Pan
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Shanshan Li
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Yuming Guo
  - School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
15
Zhu X, Zhao W, Zhou Z, Gu X. Unraveling the Drivers of Tumorigenesis in the Context of Evolution: Theoretical Models and Bioinformatics Tools. J Mol Evol 2023:10.1007/s00239-023-10117-0. [PMID: 37246992 DOI: 10.1007/s00239-023-10117-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 05/09/2023] [Indexed: 05/30/2023]
Abstract
Cancer originates from somatic cells that have accumulated mutations. These mutations alter the phenotype of the cells, allowing them to escape homeostatic regulation that maintains normal cell numbers. The emergence of malignancies is an evolutionary process in which the random accumulation of somatic mutations and sequential selection of dominant clones cause cancer cells to proliferate. The development of technologies such as high-throughput sequencing has provided a powerful means to measure subclonal evolutionary dynamics across space and time. Here, we review the patterns that may be observed in cancer evolution and the methods available for quantifying the evolutionary dynamics of cancer. An improved understanding of the evolutionary trajectories of cancer will enable us to explore the molecular mechanism of tumorigenesis and to design tailored treatment strategies.
Affiliation(s)
- Xunuo Zhu
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Wenyi Zhao
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhan Zhou
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, 322000, China
  - Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 310058, China
- Xun Gu
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50011, USA
16
Xu X, Qi Z, Zhang D, Zhang M, Ren Y, Geng Z. DriverGenePathway: Identifying driver genes and driver pathways in cancer based on MutSigCV and statistical methods. Comput Struct Biotechnol J 2023; 21:3124-3135. [PMID: 37293242 PMCID: PMC10244682 DOI: 10.1016/j.csbj.2023.05.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Revised: 05/18/2023] [Accepted: 05/18/2023] [Indexed: 06/10/2023] Open
Abstract
Although computational methods for driver gene identification have progressed rapidly, it is far from the goal of obtaining widely recognized driver genes for all cancer types. The driver gene lists predicted by these methods often lack consistency and stability across different studies or datasets. In addition to analytical performance, some tools may require further improvement regarding operability and system compatibility. Here, we developed a user-friendly R package (DriverGenePathway) integrating MutSigCV and statistical methods to identify cancer driver genes and pathways. The theoretical basis of the MutSigCV program is elaborated and integrated into DriverGenePathway, such as mutation categories discovery based on information entropy. Five methods of hypothesis testing, including the beta-binomial test, Fisher combined p-value test, likelihood ratio test, convolution test, and projection test, are used to identify the minimal core driver genes. Moreover, de novo methods, which can effectively overcome mutational heterogeneity, are introduced to identify driver pathways. Herein, we describe the computational structure and statistical fundamentals of the DriverGenePathway pipeline and demonstrate its performance using eight types of cancer from TCGA. DriverGenePathway correctly confirms many expected driver genes with high overlap with the Cancer Gene Census list and driver pathways associated with cancer development. The DriverGenePathway R package is freely available on GitHub: https://github.com/bioinformatics-xu/DriverGenePathway.
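One of the hypothesis tests named in this abstract, the Fisher combined p-value test, can be illustrated with a minimal sketch. This is not code from the DriverGenePathway R package, and the abstract does not say which per-gene p-values the package combines; the function name `fisher_combined_pvalue` and the example inputs are assumptions made for illustration. The sketch relies only on the standard result that, for k independent p-values, the statistic -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis, whose upper tail has a closed form when the degrees of freedom are even.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method (illustrative helper)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Half of the Fisher statistic: X / 2 = -sum(ln p_i), where X ~ chi2(2k) under H0.
    half_stat = -sum(math.log(p) for p in p_values)

    # Closed-form upper tail of chi2 with even dof 2k:
    # P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical p-values for one gene from two independent tests.
    print(fisher_combined_pvalue([0.05, 0.05]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed form. Where SciPy is available, `scipy.stats.combine_pvalues` offers the same method alongside alternatives.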
Affiliation(s)
- Xiaolu Xu
  - School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China
- Zitong Qi
  - Department of Statistics, University of Washington, Seattle, WA 98195, USA
- Dawei Zhang
  - School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China
- Meiwei Zhang
  - Center for Reproductive and Genetic Medicine, Dalian Women and Children’s Medical Group, Dalian 116037, China
- Yonggong Ren
  - School of Computer and Information Technology, Liaoning Normal University, Dalian 116029, China
- Zhaohong Geng
  - Department of Cardiology, Second Affiliated Hospital of Dalian Medical University, Dalian 116023, China
17
He Z, Lin Y, Wei R, Liu C, Jiang D. Repulsion and attraction in searching: A hybrid algorithm based on gravitational kernel and vital few for cancer driver gene prediction. Comput Biol Med 2022; 151:106236. [PMID: 36370584 DOI: 10.1016/j.compbiomed.2022.106236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 10/15/2022] [Accepted: 10/22/2022] [Indexed: 12/27/2022]
Abstract
By taking a new perspective to combine a machine learning method with an evolutionary algorithm, a new hybrid algorithm is developed to predict cancer driver genes. Firstly, inspired by the global search strategy of evolutionary algorithms, a gravitational kernel is proposed to act on the full range of gene features. Constructed by fusing PPI and mutation features, the gravitational kernel is capable of producing repulsion effects. The candidate genes with greater mutation effects and PPI have higher similarity scores. Owing to repulsion, the similarity scores of these promising genes are larger than those of ordinary genes, which helps the search for these promising genes. Secondly, inspired by the idea of elite populations in evolutionary algorithms, the concept of vital few is proposed. Targeted at a local scale, it acts on the candidate genes associated with vital few genes. Under the attraction effect, these vital few driver genes attract those with similar mutational effects, which leads to greater similarity scores. Lastly, the model and parameters are optimized by using an evolutionary algorithm, so as to obtain the optimal model and parameters for cancer driver gene prediction. Herein, a comparison is performed with six other advanced methods of cancer driver gene prediction. According to the experimental results, the method proposed in this study outperforms these six state-of-the-art algorithms on the pan-oncogene dataset.
Affiliation(s)
- Zhihui He
  - Department of Computer Science, Shantou University, 515063, China
- Yingqing Lin
  - Department of Computer Science, Shantou University, 515063, China
- Runguo Wei
  - Department of Computer Science, Shantou University, 515063, China
- Cheng Liu
  - Department of Computer Science, Shantou University, 515063, China
- Dazhi Jiang
  - Department of Computer Science, Shantou University, 515063, China; Guangdong Provincial Key Laboratory of Information Security Technology, Sun Yat-sen University, Guangzhou 510399, China
18
Parvandeh S, Donehower LA, Katsonis P, Hsu TK, Asmussen J, Lee K, Lichtarge O. EPIMUTESTR: a nearest neighbor machine learning approach to predict cancer driver genes from the evolutionary action of coding variants. Nucleic Acids Res 2022; 50:e70. [PMID: 35412634 PMCID: PMC9262594 DOI: 10.1093/nar/gkac215] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/28/2021] [Revised: 03/17/2022] [Accepted: 03/21/2022] [Indexed: 02/01/2023] Open
Abstract
Discovering rare cancer driver genes is difficult because their mutational frequency is too low for statistical detection by computational methods. EPIMUTESTR is an integrative nearest-neighbor machine learning algorithm that identifies such marginal genes by modeling the fitness of their mutations with the phylogenetic Evolutionary Action (EA) score. Over cohorts of sequenced patients from The Cancer Genome Atlas representing 33 tumor types, EPIMUTESTR detected 214 previously inferred cancer driver genes and 137 new candidates never identified computationally before of which seven genes are supported in the COSMIC Cancer Gene Census. EPIMUTESTR achieved better robustness and specificity than existing methods in a number of benchmark methods and datasets.
Affiliation(s)
- Saeid Parvandeh
  - To whom correspondence should be addressed. Tel: +1 713 798 7677
- Lawrence A Donehower
  - Department of Molecular Virology and Microbiology, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Teng-Kuei Hsu
  - Department of Biochemistry & Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Jennifer K Asmussen
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Kwanghyuk Lee
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge
  - Correspondence may also be addressed to Olivier Lichtarge. Tel: +1 713 798 5646
19
Andrades R, Recamonde-Mendoza M. Machine learning methods for prediction of cancer driver genes: a survey paper. Brief Bioinform 2022; 23:6551145. [PMID: 35323900 DOI: 10.1093/bib/bbac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/06/2021] [Revised: 02/06/2022] [Accepted: 02/08/2022] [Indexed: 12/21/2022] Open
Abstract
Identifying the genes and mutations that drive the emergence of tumors is a critical step to improving our understanding of cancer and identifying new directions for disease diagnosis and treatment. Despite the large volume of genomics data, the precise detection of driver mutations and their carrying genes, known as cancer driver genes, from the millions of possible somatic mutations remains a challenge. Computational methods play an increasingly important role in discovering genomic patterns associated with cancer drivers and developing predictive models to identify these elements. Machine learning (ML), including deep learning, has been the engine behind many of these efforts and provides excellent opportunities for tackling remaining gaps in the field. Thus, this survey aims to perform a comprehensive analysis of ML-based computational approaches to identify cancer driver mutations and genes, providing an integrated, panoramic view of the broad data and algorithmic landscape within this scientific problem. We discuss how the interactions among data types and ML algorithms have been explored in previous solutions and outline current analytical limitations that deserve further attention from the scientific community. We hope that by helping readers become more familiar with significant developments in the field brought by ML, we may inspire new researchers to address open problems and advance our knowledge towards cancer driver discovery.
Affiliation(s)
- Renan Andrades
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil; Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
- Mariana Recamonde-Mendoza
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil; Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
20
Rostami B, Anisuzzaman DM, Wang C, Gopalakrishnan S, Niezgoda J, Yu Z. Multiclass wound image classification using an ensemble deep CNN-based classifier. Comput Biol Med 2021; 134:104536. [PMID: 34126281 DOI: 10.1016/j.compbiomed.2021.104536] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 01/10/2021] [Revised: 05/21/2021] [Accepted: 05/22/2021] [Indexed: 10/21/2022]
Abstract
Acute and chronic wounds are a challenge to healthcare systems around the world and affect many people's lives annually. Wound classification is a key step in wound diagnosis that would help clinicians to identify an optimal treatment procedure. Hence, having a high-performance classifier assists wound specialists to classify wound types with less financial and time costs. Different wound classification methods based on machine learning and deep learning have been proposed in the literature. In this study, we have developed an ensemble Deep Convolutional Neural Network-based classifier to categorize wound images into multiple classes including surgical, diabetic, and venous ulcers. The output classification scores of two classifiers (namely, patch-wise and image-wise) are fed into a Multilayer Perceptron to provide a superior classification performance. A 5-fold cross-validation approach is used to evaluate the proposed method. We obtained maximum and average classification accuracy values of 96.4% and 94.28% for binary and 91.9% and 87.7% for 3-class classification problems. The proposed classifier was compared with some common deep classifiers and showed significantly higher accuracy metrics. We also tested the proposed method on the Medetec wound image dataset, and the accuracy values of 91.2% and 82.9% were obtained for binary and 3-class classifications. The results show that our proposed method can be used effectively as a decision support system in classification of wound images or other related clinical applications.
Affiliation(s)
- Behrouz Rostami
  - Electrical Engineering Department, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
- D M Anisuzzaman
  - Computer Science Department, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
- Chuanbo Wang
  - Computer Science Department, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
- Jeffrey Niezgoda
  - Advancing the Zenith of Healthcare (AZH) Wound and Vascular Center, Milwaukee, WI, USA
- Zeyun Yu
  - Electrical Engineering Department, University of Wisconsin-Milwaukee, Milwaukee, WI, USA; Computer Science Department, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
21
Ülgen E, Sezerman OU. driveR: a novel method for prioritizing cancer driver genes using somatic genomics data. BMC Bioinformatics 2021; 22:263. [PMID: 34030627 PMCID: PMC8142487 DOI: 10.1186/s12859-021-04203-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/28/2020] [Accepted: 05/17/2021] [Indexed: 12/15/2022] Open
Abstract
Background: Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR. Results: Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets, and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than all of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets. Conclusions: This study demonstrates that the proposed method is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04203-7.
Affiliation(s)
- Ege Ülgen
  - Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
- O Uğur Sezerman
  - Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
22
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 131] [Impact Index Per Article: 32.8] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied with the long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as an indispensable tool to draw meaningful insights and improve decision making in drug discovery. Thanks to the advancements in AI and ML algorithms, now the AI/ML-driven solutions have an unprecedented potential to accelerate the process of CNS drug discovery with better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state-of-the-art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.
Affiliation(s)
- Sezen Vatansever
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming‐Ming Zhou
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
23
Integration of multiomics data with graph convolutional networks to identify new cancer genes and their associated molecular mechanisms. NAT MACH INTELL 2021. [DOI: 10.1038/s42256-021-00325-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 12/15/2022]
24
Gaudelet T, Malod-Dognin N, Pržulj N. Integrative Data Analytic Framework to Enhance Cancer Precision Medicine. NETWORK AND SYSTEMS MEDICINE 2021; 4:60-73. [PMID: 33796878 PMCID: PMC8006589 DOI: 10.1089/nsm.2020.0015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 02/15/2021] [Indexed: 12/20/2022] Open
Abstract
With the advancement of high-throughput biotechnologies, we increasingly accumulate biomedical data about diseases, especially cancer. There is a need for computational models and methods to sift through, integrate, and extract new knowledge from the diverse available data, to improve the mechanistic understanding of diseases and patient care. To uncover molecular mechanisms and drug indications for specific cancer types, we develop an integrative framework able to harness a wide range of diverse molecular and pan-cancer data. We show that our approach outperforms the competing methods and can identify new associations. Furthermore, it captures the underlying biology predictive of drug response. Through the joint integration of data sources, our framework can also uncover links between cancer types and molecular entities for which no prior knowledge is available. Our new framework is flexible and can be easily reformulated to study any biomedical problem.
Affiliation(s)
- Thomas Gaudelet
- Department of Computer Science, University College London, London, United Kingdom
- Noël Malod-Dognin
- Department of Computer Science, University College London, London, United Kingdom
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Nataša Pržulj
- Department of Computer Science, University College London, London, United Kingdom
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- ICREA, Barcelona, Spain
|
25
|
Auslander N, Gussow AB, Koonin EV. Incorporating Machine Learning into Established Bioinformatics Frameworks. Int J Mol Sci 2021; 22:2903. [PMID: 33809353 PMCID: PMC8000113 DOI: 10.3390/ijms22062903] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 02/15/2021] [Revised: 03/08/2021] [Accepted: 03/10/2021] [Indexed: 12/23/2022] Open
Abstract
The exponential growth of biomedical data in recent years has urged the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling the automatic feature extraction, selection, and generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework with techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
Affiliation(s)
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
26
|
Dong G, Wendl MC, Zhang B, Ding L, Huang KL. AeQTL: eQTL analysis using region-based aggregation of rare genomic variants. Pacific Symposium on Biocomputing 2021; 26:172-183. [PMID: 33691015 PMCID: PMC8050802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Concurrently available genomic and transcriptomic data from large cohorts provide opportunities to discover expression quantitative trait loci (eQTLs), genetic variants associated with gene expression changes. However, the statistical power of detecting rare variant eQTLs is often limited and most existing eQTL tools are not compatible with sequence variant file formats. We have developed AeQTL (Aggregated eQTL), a software tool that performs eQTL analysis on variants aggregated according to user-specified regions and is designed to accommodate standard genomic files. AeQTL consistently yielded similar or higher power for identifying rare variant eQTLs than single-variant tests. Using AeQTL, we discovered that aggregated rare germline truncations in cis exomic regions are significantly associated with the expression of BRCA1 and SLC25A39 in breast tumors. In a somatic mutation pan-cancer analysis, aggregated mutations of those predicted to be missense versus truncations were differentially associated with gene expressions of cancer drivers, and somatic truncation eQTLs were further identified as a new multi-omic classifier of oncogenes versus tumor-suppressor genes. AeQTL is easy to use and customize, allowing a broad application for discovering rare variants, including coding and noncoding variants, associated with gene expression. AeQTL is implemented in Python and the source code is freely available at https://github.com/Huan-glab/AeQTL under the MIT license.
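The region-based aggregation idea in this abstract can be illustrated with a minimal sketch. It is not AeQTL's actual code or interface: the function names `aggregate_region` and `burden_eqtl`, the 0/1 carrier coding, and the toy data are all assumptions made for illustration. The sketch collapses the variants in one region into a per-sample carrier indicator and fits an ordinary least-squares line of expression on that indicator.

```python
import numpy as np


def aggregate_region(genotypes, region_mask):
    """Collapse the variants inside one region into a per-sample 0/1 indicator.

    genotypes: (n_samples, n_variants) array of alternate-allele counts.
    region_mask: boolean array of length n_variants selecting the region.
    Hypothetical helper; the 0/1 carrier coding is an assumption of this sketch.
    """
    carrier = genotypes[:, region_mask].sum(axis=1) > 0
    return carrier.astype(float)


def burden_eqtl(expression, burden):
    """Least-squares fit of expression ~ intercept + burden.

    Returns (intercept, slope). Covariates and significance testing are
    deliberately left out to keep the sketch short.
    """
    design = np.column_stack([np.ones_like(burden), burden])
    coef, *_ = np.linalg.lstsq(design, expression, rcond=None)
    return coef[0], coef[1]


# Toy data (made up): 6 samples, 3 variants; the region covers variants 0 and 1.
genotypes = np.array([
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
])
region_mask = np.array([True, True, False])
expression = np.array([2.0, 5.0, 2.2, 5.1, 4.9, 5.0])

burden = aggregate_region(genotypes, region_mask)
intercept, slope = burden_eqtl(expression, burden)
```

With a binary indicator, the fitted slope is simply the difference in mean expression between carriers and non-carriers, which is why pooling several rare variants into one indicator can recover signal that no single variant shows on its own.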
Affiliation(s)
- Guanlan Dong
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Michael C. Wendl
- Department of Medicine, McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Li Ding
- Department of Medicine, McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Kuan-lin Huang
- Department of Genetics and Genomic Sciences, Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA (corresponding author)
|
27
|
Preston RJ, Rühm W, Azzam EI, Boice JD, Bouffler S, Held KD, Little MP, Shore RE, Shuryak I, Weil MM. Adverse outcome pathways, key events, and radiation risk assessment. Int J Radiat Biol 2020; 97:804-814. [PMID: 33211576 PMCID: PMC10666972 DOI: 10.1080/09553002.2020.1853847] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/15/2020] [Revised: 11/09/2020] [Accepted: 11/12/2020] [Indexed: 12/12/2022]
Abstract
The overall aim of this contribution to the 'Second Bill Morgan Memorial Special Issue' is to provide a high-level review of a recent report developed by a Committee for the National Council on Radiation Protection and Measurements (NCRP) titled 'Approaches for Integrating Information from Radiation Biology and Epidemiology to Enhance Low-Dose Health Risk Assessment'. It derives from previous NCRP Reports and Commentaries that provide the case for integrating data from radiation biology studies (available and proposed) with epidemiological studies (also available and proposed) to develop Biologically-Based Dose-Response (BBDR) models. In this review, it is proposed for such models to leverage the adverse outcome pathways (AOP) and key events (KE) approach for better characterizing radiation-induced cancers and circulatory disease (as the example for a noncancer outcome). The review discusses the current state of knowledge of mechanisms of carcinogenesis, with an emphasis on radiation-induced cancers, and a similar discussion for circulatory disease. The types of the various informative BBDR models are presented along with a proposed generalized BBDR model for cancer and a more speculative one for circulatory disease. The way forward is presented in a comprehensive discussion of the research needs to address the goal of enhancing health risk assessment of exposures to low doses of radiation. The use of an AOP/KE approach for developing a mechanistic framework for BBDR models of radiation-induced cancer and circulatory disease is considered to be a viable one based upon current knowledge of the mechanisms of formation of these adverse health outcomes and the available technical capabilities and computational advances. 
The way forward for enhancing low-dose radiation risk estimates will require there to be a tight integration of epidemiology data and radiation biology information to meet the goals of relevance and sensitivity of the adverse health outcomes required for overall health risk assessment at low doses and dose rates.
Affiliation(s)
- R Julian Preston
- Office of Air and Radiation, Radiation Protection Division, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
- Werner Rühm
- Institute of Radiation Medicine, Helmholtz Zentrum Muenchen, German Research Center for Environmental Health (GmbH) Ingolstaedter, Neuherberg, Germany
- Edouard I Azzam
- Department of Radiology, Rutgers Biomedical and Health Sciences, New Jersey Medical School, Newark, NJ, USA
- John D Boice
- National Council on Radiation Protection and Measurements, Bethesda, MD, USA
- Simon Bouffler
- Radiation Effects Department, Centre for Radiation, Chemical and Environmental Hazards, Public Health England, Oxfordshire, UK
- Kathryn D Held
- National Council on Radiation Protection and Measurements, Bethesda, MD, USA
- Mark P Little
- Radiation Epidemiology Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Roy E Shore
- Department of Population Health, New York University School of Medicine, New York, NY, USA
- Igor Shuryak
- Center for Radiological Research, Columbia University Irving Medical Center, New York, NY, USA
- Michael M Weil
- Department of Environmental and Radiological Health Sciences, Colorado State University, Fort Collins, CO, USA
|
28
|
Chen X, Fan Z, Li KKW, Wu G, Yang Z, Gao X, Liu Y, Wu H, Chen H, Tang Q, Chen L, Wang Y, Mao Y, Ng HK, Shi Z, Yu J, Zhou L. Molecular subgrouping of medulloblastoma based on few-shot learning of multitasking using conventional MR images: a retrospective multicenter study. Neurooncol Adv 2020; 2:vdaa079. [PMID: 32760911 PMCID: PMC7393307 DOI: 10.1093/noajnl/vdaa079] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/24/2022] Open
Abstract
Background: The determination of the molecular subgroups of medulloblastomas, namely wingless (WNT), sonic hedgehog (SHH), Group 3, and Group 4, is very important for prognostication and risk-adaptive treatment strategies. Because medulloblastoma is a rare disease, we designed a multitask framework for the few-shot scenario to achieve noninvasive molecular subgrouping with high accuracy.
Methods: We introduced a multitask technique based on the mask regional convolutional neural network (Mask-RCNN). By using comprehensive information including genotyping, tumor mask, and prognosis, the multitask technique both enabled multi-purpose modeling and improved the accuracy of molecular subgrouping. One hundred and thirteen medulloblastoma cases were collected from 4 hospitals over an 8-year period in this retrospective study and were divided into a 3-fold cross-validation cohort (N = 74) from 2 hospitals and an independent testing cohort (N = 39) from the other 2 hospitals. Comparative experiments with different auxiliary tasks were designed to illustrate the effect of multitasking on molecular subgrouping.
Results: Compared to the single-task framework, the multitask framework that combined 3 tasks increased the average accuracy of molecular subgrouping from 0.84 to 0.93 in cross-validation and from 0.79 to 0.85 in independent testing. The average areas under the receiver operating characteristic curves (AUCs) of molecular subgrouping were 0.97 in cross-validation and 0.92 in independent testing. The average AUCs of prognostication reached 0.88 in cross-validation and 0.79 in independent testing. The tumor segmentation results achieved a Dice coefficient of 0.90 in both cohorts.
Conclusions: The multitask Mask-RCNN is an effective method for the molecular subgrouping and prognostication of medulloblastomas with high accuracy in few-shot learning.
Affiliation(s)
- Xi Chen
- Department of Electronic Engineering, Fudan University, Shanghai, China
- Zhen Fan
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
- Kay Ka-Wai Li
- Department of Anatomical and Cellular Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Hong Kong, China SAR
- Guoqing Wu
- Department of Electronic Engineering, Fudan University, Shanghai, China
- Zhong Yang
- Department of Radiology, Huashan Hospital, Fudan University, Shanghai, China
- Xin Gao
- Department of Neurosurgery, Huadong Hospital, Fudan University, Shanghai, China
- Yingchao Liu
- Department of Neurosurgery, Shandong Provincial Hospital, Jinan, China
- Haibo Wu
- Department of Pathology, the First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Hong Chen
- Department of Pathology, Huashan Hospital, Fudan University, Shanghai, China
- Qisheng Tang
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
- Liang Chen
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
- Yuanyuan Wang
- Department of Electronic Engineering, Fudan University, Shanghai, China
- Ying Mao
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
- Ho-Keung Ng
- Department of Anatomical and Cellular Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Hong Kong, China SAR
- Zhifeng Shi
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
- Jinhua Yu
- Department of Electronic Engineering, Fudan University, Shanghai, China
- Liangfu Zhou
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, China
|
29
|
Cutigi JF, Evangelista RF, Ramos RH, de Oliveira Lage Ferreira C, Evangelista AF, de Carvalho ACPLF, Simao A. Combining Mutation and Gene Network Data in a Machine Learning Approach for False-Positive Cancer Driver Gene Discovery. Lecture Notes in Computer Science 2020:81-92. [DOI: 10.1007/978-3-030-65775-8_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|
30
|
Song J, Peng W, Wang F, Wang J. Identifying driver genes involving gene dysregulated expression, tissue-specific expression and gene-gene network. BMC Med Genomics 2019; 12:168. [PMID: 31888619 PMCID: PMC6936147 DOI: 10.1186/s12920-019-0619-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/06/2019] [Accepted: 11/11/2019] [Indexed: 12/16/2022] Open
Abstract
Background: Cancer, a disease of genomic alteration, claims many lives each year. The biggest challenge in overcoming cancer is to distinguish driver genes, which promote cancer development, from the huge number of passenger mutations that confer no selective growth advantage. To address this problem, some researchers have begun to identify driver genes by integrating networks with other biological information. However, prediction performance still needs to improve.
Methods: Driver genes affect the expression of their downstream genes and are likely to interact with each other to form functional modules, and those modules tend to be expressed similarly in the same tissue. Based on these observations, we proposed a novel model named DyTidriver, which identifies driver genes by incorporating gene dysregulated expression, tissue-specific expression and variation frequency into the human functional interaction network (human FIN).
Results: The method was applied to 974 breast, 316 prostate and 230 lung cancer patients. The results show that our method outperformed five other existing methods in terms of F-score, precision and recall. The enrichment and cociter analyses illustrate that DyTidriver not only identifies driver genes enriched in significant pathways but can also uncover unknown driver genes.
Conclusion: The final results imply that driver genes are those that impact more dysregulated genes and are expressed similarly in the same tissue.
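The evaluation metrics named in this abstract (precision, recall and F-score) can be computed for a predicted driver gene list as in the minimal sketch below. It assumes the F-score is the usual harmonic mean of precision and recall; the function name `precision_recall_f1` and both gene lists are hypothetical illustrations, not data or code from the paper.

```python
def precision_recall_f1(predicted, known):
    """Score a predicted gene list against a reference set of known drivers.

    predicted: iterable of gene symbols returned by a method (e.g. its top-k).
    known: iterable of gene symbols in the reference (benchmark) set.
    Returns (precision, recall, f1), with F1 as the harmonic mean.
    """
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative lists only; "GENE_X" is a placeholder for an unlisted prediction.
top_genes = ["TP53", "PIK3CA", "GENE_X", "GATA3"]
reference = ["TP53", "PIK3CA", "GATA3", "CDH1", "PTEN"]

precision, recall, f1 = precision_recall_f1(top_genes, reference)
```

Here three of the four predictions are in the reference set, so precision is 0.75, recall is 0.6 (three of five reference genes recovered), and the F-score is 2/3.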
Affiliation(s)
- Junrong Song
- Faculty of Management and Economics/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan, 650500, People's Republic of China
- Wei Peng
- Faculty of Management and Economics/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan, 650500, People's Republic of China
- Feng Wang
- Faculty of Management and Economics/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan, 650500, People's Republic of China
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha, Hunan, 410083, People's Republic of China
|