1
Yi S, Xie M. DriverMEDS: Cancer driver gene identification using mutual exclusivity from embeded features and driver mutation scoring. Methods 2025; 239:22-29. [PMID: 40113153 DOI: 10.1016/j.ymeth.2025.03.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Revised: 01/24/2025] [Accepted: 03/14/2025] [Indexed: 03/22/2025] Open
Abstract
Efficiently identifying cancer driver genes is important for understanding cancer development and for diagnosis and treatment. Current unsupervised driver gene identification methods typically integrate multi-omics data into gene function networks and employ network embedding algorithms to learn gene features. They also treat mutual exclusivity and mutation frequency as crucial concepts in identifying driver genes. However, existing approaches neglect the potentially important implications of mutual exclusivity in the embedding space, and they simply assume that all driver genes exhibit high mutation frequencies. We explored the mutual exclusivity embedded in the learned features and verified that the Euclidean distances between learned features are strongly related to mutual exclusivity and can reveal additional information about it. Based on this idea and a novel driver mutation scoring strategy, we designed an unsupervised driver gene prediction framework, DriverMEDS. First, a feature clustering algorithm generates gene modules. Within each module, the Euclidean distances between learned features are used to calculate a module importance score for each gene based on the related mutual exclusivity. Then, because most driver genes have intermediate mutation frequencies, a driver mutation scoring function is designed for each gene to improve on the existing mutation frequency scoring strategy. Finally, the weighted sum of the module importance score and the driver mutation score is used to prioritize the genes. The experimental results and analysis show that DriverMEDS can detect novel cancer driver genes and relevant functional modules, and that it outperforms five other state-of-the-art methods for cancer driver identification.
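The final prioritization step described in this abstract (a weighted sum of a module importance score and a driver mutation score) can be illustrated with a minimal sketch. The function name `prioritize`, the weight `alpha`, the gene labels, and all score values are hypothetical; the abstract does not state how the two scores are computed, normalized, or weighted, so this only shows the combination and ranking.

```python
def prioritize(module_score, mutation_score, alpha=0.5):
    """Rank genes by a weighted sum of two per-gene scores.

    alpha is a hypothetical weight in [0, 1]; the value used by the
    method itself is not given in the abstract.
    """
    # Only genes that have both scores can be combined.
    genes = module_score.keys() & mutation_score.keys()
    combined = {
        g: alpha * module_score[g] + (1 - alpha) * mutation_score[g]
        for g in genes
    }
    # Highest combined score first; ties are broken by gene label.
    return sorted(combined, key=lambda g: (-combined[g], g))


# Toy inputs with placeholder gene labels (not real data).
toy_module = {"G1": 0.9, "G2": 0.4, "G3": 0.1}
toy_mutation = {"G1": 0.6, "G2": 0.8, "G3": 0.2}
ranking = prioritize(toy_module, toy_mutation, alpha=0.5)
```

Changing `alpha` shifts the ranking toward one score or the other, which is the only design choice this sketch exposes.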
Affiliation(s)
- Sichen Yi
- Key Laboratory of Computing and Stochastic Mathematics (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha 410081, China.
- Minzhu Xie
- Key Laboratory of Computing and Stochastic Mathematics (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha 410081, China; College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China.
2
Deng Z, Wu J, Chen X, Li G, Liu J, Hu Z, Li R, Deng W. MNMO: discover driver genes from a multi-omics data based-multi-layer network. Bioinformatics 2025; 41:btaf134. [PMID: 40152235 PMCID: PMC12033032 DOI: 10.1093/bioinformatics/btaf134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Revised: 03/05/2025] [Accepted: 03/25/2025] [Indexed: 03/29/2025] Open
Abstract
MOTIVATION Cancer as a public health problem is driven by genomic variations in "cancer driver" genes. The identification of driver genes is critical for the discovery of key biomarkers and the development of personalized therapy. RESULTS We propose a prediction method MNMO: a multi-layer network model based on multi-omics data. MNMO firstly constructs a dynamically adjusted four-layer network composed of miRNAs and three kinds of genes with different features. Then three kinds of scores, i.e. control capacity, mutation score, and network score, are devised and calculated by harmonic mean to produce the integrated gene score. Experiments were performed on three kinds of real cancer data to compare the identification performance of method MNMO with that of six state-of-the-art ones. The results indicate that method MNMO presents the best identification performance under most circumstances. The genes prioritized by method MNMO not only have a better match to the benchmark ones than those identified by the other methods, but also are all associated with the development and progression of cancers. In addition, some extended versions of method MNMO can further achieve better performance on most evaluation metrics for some specific datasets. They may be more conducive to identifying tissue-specific genes, which has been verified through a number of experiments. AVAILABILITY AND IMPLEMENTATION The source code and the R package "MNMO" are available at https://github.com/Zheng-D/MNMO. The dataset and code are archived at https://doi.org/10.5281/zenodo.14969986.
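The score integration mentioned in this abstract (three scores combined by harmonic mean) can be sketched as follows. The function name `integrated_score` and the example values are assumptions for illustration; the abstract does not say how the individual scores are scaled, so the sketch assumes all three are positive and on comparable scales.

```python
from statistics import harmonic_mean


def integrated_score(control_capacity, mutation_score, network_score):
    """Combine three positive scores with a harmonic mean.

    The harmonic mean is dominated by the smallest input, so a gene
    needs to score reasonably on all three criteria to rank highly.
    """
    return harmonic_mean([control_capacity, mutation_score, network_score])


# Placeholder values: one weak score pulls the result well below
# the arithmetic mean of the same three numbers.
balanced = integrated_score(0.5, 0.5, 0.5)
lopsided = integrated_score(0.9, 0.9, 0.1)
```

Compared with an arithmetic mean, this choice penalizes genes that are strong on only one or two of the three criteria.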
Affiliation(s)
- Zheng Deng
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- College of Computer Science and Information Engineering, Guangxi Normal University, Guilin 541004, China
- Jingli Wu
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- College of Computer Science and Information Engineering, Guangxi Normal University, Guilin 541004, China
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Xiaorong Chen
- College of Computer, National University of Defense Technology, Changsha 410073, China
- Gaoshi Li
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- College of Computer Science and Information Engineering, Guangxi Normal University, Guilin 541004, China
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Jiafei Liu
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- College of Computer Science and Information Engineering, Guangxi Normal University, Guilin 541004, China
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin 541004, China
- Zhipeng Hu
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- College of Computer Science and Information Engineering, Guangxi Normal University, Guilin 541004, China
- Rongyuan Li
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China
- College of Computer Science and Information Engineering, Guangxi Normal University, Guilin 541004, China
- Wansu Deng
- Department of Radiopharmaceuticals, School of Pharmacy, Nanjing Medical University, Nanjing 211166, China
3
Das S, Patel V, Chakravarty S, Ghosh A, Mukhopadhyay A, Biswas NK. An ensemble machine learning-based performance evaluation identifies top In-Silico pathogenicity prediction methods that best classify driver mutations in cancer. BioData Min 2025; 18:7. [PMID: 39833905 PMCID: PMC11744934 DOI: 10.1186/s13040-024-00420-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Accepted: 12/26/2024] [Indexed: 01/22/2025] Open
Abstract
BACKGROUND AND OBJECTIVE Accurate identification and prioritization of driver-mutations in cancer is critical for effective patient management. Despite the presence of numerous bioinformatic algorithms for estimating mutation pathogenicity, there is significant variation in their assessments. This inconsistency is evident even for well-established cancer driver mutations. This study aims to develop an ensemble machine learning approach to evaluate the performance (rank) of pathogenic and conservation scoring algorithms (PCSAs) based on their ability to distinguish pathogenic driver mutations from benign passenger (non-driver) mutations in head and neck squamous cell carcinoma (HNSC). METHODS The study used a dataset from 502 HNSC patients, classifying mutations based on 299 known high-confidence cancer driver genes. Missense somatic mutations in driver genes were treated as driver mutations, while non-driver mutations were randomly selected from other genes. Each mutation was annotated with 41 PCSAs. Three machine learning algorithms-logistic regression, random forest, and support vector machine-along with recursive feature elimination, were used to rank these PCSAs. The final ranking of the PCSAs was determined using rank-average-sort and rank-sum-sort methods. RESULTS The random forest algorithm emerged as the top performer among the three tested ML algorithms, with an AUC-ROC of 0.89, compared to 0.83 for the other two, in distinguishing pathogenic driver mutations from benign passenger mutations using all 41 PCSAs. The top 11 PCSAs were selected based on the first quintile cut-off from the final rank-sum distribution. Classifiers built using these top 11 PCSAs (DEOGEN2, Integrated_fitCons, MVP, etc.) demonstrated significantly higher performance (p-value < 2.22e-16) compared to those using the remaining 30 PCSAs across all three ML algorithms, in separating pathogenic driver from benign passenger mutations. 
The top PCSAs demonstrated strong performance on a validation cohort including independent HNSC and other cancer types: breast, lung, and colorectal - reflecting its consistency, robustness and generalizability. CONCLUSIONS The ensemble machine learning approach effectively evaluates the performance of PCSAs based on their ability to differentiate pathogenic drivers from benign passenger mutations in HNSC and other cancer types. Notably, some well-known PCSAs performed poorly, underscoring the importance of data-driven selection over relying solely on popularity.
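The rank aggregation named in this abstract (rank-sum-sort over the rankings produced by several classifiers) can be read as in the sketch below. This is one plausible interpretation, not the authors' procedure: the function name `rank_sum_sort`, the scorer labels, and the ranks are hypothetical, and the abstract does not specify how ties are handled.

```python
def rank_sum_sort(rankings):
    """Aggregate several rankings by summing each item's ranks.

    rankings: list of dicts mapping item -> rank (1 = best), one dict
    per ranking method. Every dict is assumed to rank the same items.
    Returns the items ordered from best (lowest rank sum) to worst.
    """
    items = set(rankings[0])
    totals = {item: sum(r[item] for r in rankings) for item in items}
    # Lowest total first; ties are broken alphabetically (an assumption).
    return sorted(totals, key=lambda item: (totals[item], item))


# Placeholder ranks from three hypothetical classifiers.
toy_rankings = [
    {"scorer_A": 1, "scorer_B": 2, "scorer_C": 3},
    {"scorer_A": 2, "scorer_B": 1, "scorer_C": 3},
    {"scorer_A": 1, "scorer_B": 3, "scorer_C": 2},
]
final_order = rank_sum_sort(toy_rankings)
```

When every method ranks every item, sorting by the average rank gives the same order as sorting by the rank sum, since the two differ only by a constant factor.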
Affiliation(s)
- Subrata Das
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Vatsal Patel
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Shouvik Chakravarty
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Biotechnology Research and Innovation Council-Regional Centre for Biotechnology (BRIC- RCB), Faridabad, India
- Arnab Ghosh
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India
- Biotechnology Research and Innovation Council-Regional Centre for Biotechnology (BRIC- RCB), Faridabad, India
- Anirban Mukhopadhyay
- Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, 741235, India.
- Nidhan K Biswas
- Biotechnology Research and Innovation Council-National Institute of Biomedical Genomics (BRIC-NIBMG), National Institute of Biomedical Genomics, Kalyani, West Bengal, India.
4
Shi P, Han J, Zhang Y, Li G, Zhou X. IMI-driver: Integrating multi-level gene networks and multi-omics for cancer driver gene identification. PLoS Comput Biol 2024; 20:e1012389. [PMID: 39186807 PMCID: PMC11379397 DOI: 10.1371/journal.pcbi.1012389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 09/06/2024] [Accepted: 08/05/2024] [Indexed: 08/28/2024] Open
Abstract
The identification of cancer driver genes is crucial for early detection, effective therapy, and precision medicine of cancer. Cancer is caused by the dysregulation of several genes at various levels of regulation. However, current techniques only capture a limited amount of regulatory information, which may hinder their efficacy. In this study, we present IMI-driver, a model that integrates multi-omics data into eight biological networks and applies Multi-view Collaborative Network Embedding to embed the gene regulation information from the biological networks into a low-dimensional vector space to identify cancer drivers. We apply IMI-driver to 29 cancer types from The Cancer Genome Atlas (TCGA) and compare its performance with nine other methods on nine benchmark datasets. IMI-driver outperforms the other methods, demonstrating that multi-level network integration enhances prediction accuracy. We also perform a pan-cancer analysis using the genes identified by IMI-driver, which confirms almost all our selected candidate genes as known or potential drivers. Case studies of the new positive genes suggest their roles in cancer development and progression.
Affiliation(s)
- Peiting Shi
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, People's Republic of China
- Junmin Han
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, People's Republic of China
- Yinghao Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, People's Republic of China
- Guanpu Li
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, People's Republic of China
- Xionghui Zhou
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, People's Republic of China
- Key Laboratory of Smart Farming for Agricultural Animals, Ministry of Agriculture and Rural Affairs, People's Republic of China
5
Patil SS, Roberts SA, Gebremedhin AH. Network analysis of driver genes in human cancers. Front Bioinform 2024; 4:1365200. [PMID: 39040139 PMCID: PMC11260686 DOI: 10.3389/fbinf.2024.1365200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 06/14/2024] [Indexed: 07/24/2024] Open
Abstract
Cancer is a heterogeneous disease that results from genetic alteration of cell cycle and proliferation controls. Identifying mutations that drive cancer, understanding cancer type specificities, and delineating how driver mutations interact with each other to establish disease is vital for identifying therapeutic vulnerabilities. Such cancer specific patterns and gene co-occurrences can be identified by studying tumor genome sequences, and networks have proven effective in uncovering relationships between sequences. We present two network-based approaches to identify driver gene patterns among tumor samples. The first approach relies on analysis using the Directed Weighted All Nearest Neighbors (DiWANN) model, which is a variant of sequence similarity network, and the second approach uses bipartite network analysis. A data reduction framework was implemented to extract the minimal relevant information for the sequence similarity network analysis, where a transformed reference sequence is generated for constructing the driver gene network. This data reduction process combined with the efficiency of the DiWANN network model, greatly lowered the computational cost (in terms of execution time and memory usage) of generating the networks enabling us to work at a much larger scale than previously possible. The DiWANN network helped us identify cancer types in which samples were more closely connected to each other suggesting they are less heterogeneous and potentially susceptible to a common drug. The bipartite network analysis provided insight into gene associations and co-occurrences. We identified genes that were broadly mutated in multiple cancer types and mutations exclusive to only a few. Additionally, weighted one-mode gene projections of the bipartite networks revealed a pattern of occurrence of driver genes in different cancers. Our study demonstrates that network-based approaches can be an effective tool in cancer genomics. 
The analysis identifies co-occurring and exclusive driver genes and mutations for specific cancer types, providing a better understanding of the driver genes that lead to tumor initiation and evolution.
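The weighted one-mode gene projection of a bipartite network mentioned in this abstract can be sketched as below. The sketch assumes the bipartite network links tumor samples to the driver genes mutated in them and that an edge between two genes is weighted by the number of samples they share; both are assumptions, as the abstract does not define the node sets or the weighting. The function name, sample labels, and gene labels are placeholders.

```python
from collections import Counter
from itertools import combinations


def gene_projection(sample_to_genes):
    """Project a sample-gene bipartite network onto its gene side.

    sample_to_genes: dict mapping a sample label to the genes mutated
    in that sample. Returns a Counter keyed by sorted gene pairs whose
    value is the number of samples in which both genes are mutated.
    """
    weights = Counter()
    for genes in sample_to_genes.values():
        # Each unordered pair of distinct genes in a sample co-occurs once.
        for pair in combinations(sorted(set(genes)), 2):
            weights[pair] += 1
    return weights


# Placeholder data: three samples, three genes.
toy_samples = {
    "sample_1": ["G1", "G2"],
    "sample_2": ["G1", "G2", "G3"],
    "sample_3": ["G3"],
}
co_occurrence = gene_projection(toy_samples)
```

Gene pairs that never appear in the resulting counter are candidates for exclusivity under this toy definition, while heavily weighted pairs indicate frequent co-occurrence.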
Affiliation(s)
- Shruti S. Patil
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
- Steven A. Roberts
- School of Molecular Biosciences, Washington State University, Pullman, WA, United States
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, United States
- UVM’s Larner College of Medicine, University of Vermont Cancer Center, Burlington, VT, United States
- Assefaw H. Gebremedhin
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
6
Wang Y, Zhou B, Ru J, Meng X, Wang Y, Liu W. Advances in computational methods for identifying cancer driver genes. Math Biosci Eng 2023; 20:21643-21669. [PMID: 38124614 DOI: 10.3934/mbe.2023958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Cancer driver genes (CDGs) are crucial in cancer prevention, diagnosis and treatment. This study employed computational methods for identifying CDGs, categorizing them into four groups. The major frameworks for each of these four categories were summarized. Additionally, we systematically gathered data from public databases and biological networks, and we elaborated on computational methods for identifying CDGs using the aforementioned databases. Further, we summarized the algorithms, mainly involving statistics and machine learning, used for identifying CDGs. Notably, the performances of nine typical identification methods for eight types of cancer were compared to analyze the applicability areas of these methods. Finally, we discussed the challenges and prospects associated with methods for identifying CDGs. The present study revealed that the network-based algorithms and machine learning-based methods demonstrated superior performance.
Affiliation(s)
- Ying Wang
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Bohao Zhou
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Jidong Ru
- School of Textile Garment and Design, Changshu Institute of Technology, Changshu 215500, China
- Xianglian Meng
- School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
- Yundong Wang
- School of Computer Science and Engineering, Changshu Institute of Technology, Changshu 215500, China
- Wenjie Liu
- School of Computer Information and Engineering, Changzhou Institute of Technology, Changzhou 213032, China
7
Peng W, Yu P, Dai W, Fu X, Liu L, Pan Y. A Graph Convolution Network-Based Model for Prioritizing Personalized Cancer Driver Genes of Individual Patients. IEEE Trans Nanobioscience 2023; 22:744-754. [PMID: 37195839 DOI: 10.1109/tnb.2023.3277316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2023]
Abstract
Cancer driver genes are mutated genes that play a key role in the growth of cancer cells. Accurately identifying cancer driver genes helps us understand cancer pathogenesis and develop effective treatment strategies. However, cancers are highly heterogeneous diseases; patients with the same cancer type may have different genomic characteristics and clinical symptoms. Hence, there is an urgent need for effective methods that identify personalized cancer driver genes of individual patients to help determine whether a patient can be treated with a certain targeted drug. This work presents NIGCNDriver, a method for predicting personalized cancer driver genes of individual patients based on Graph Convolution Networks and Neighbor Interactions. NIGCNDriver first constructs a gene-sample association matrix using the associations between a sample and its known driver genes. Then, it employs graph convolution models on the gene-sample network to aggregate each node's own features with those of its neighbors, and combines them with element-wise interactions between neighbors to learn new feature representations for the sample and gene nodes. Finally, a linear correlation coefficient decoder is used to reconstruct the association between the sample and the mutant gene, enabling the prediction of a personalized driver gene for the individual sample. We applied the NIGCNDriver method to predict cancer driver genes for individual samples in the TCGA and cancer cell line datasets. The results show that our method outperforms the baseline methods in cancer driver gene prediction for individual samples.
8
Zhu X, Zhao W, Zhou Z, Gu X. Unraveling the Drivers of Tumorigenesis in the Context of Evolution: Theoretical Models and Bioinformatics Tools. J Mol Evol 2023:10.1007/s00239-023-10117-0. [PMID: 37246992 DOI: 10.1007/s00239-023-10117-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 05/09/2023] [Indexed: 05/30/2023]
Abstract
Cancer originates from somatic cells that have accumulated mutations. These mutations alter the phenotype of the cells, allowing them to escape homeostatic regulation that maintains normal cell numbers. The emergence of malignancies is an evolutionary process in which the random accumulation of somatic mutations and sequential selection of dominant clones cause cancer cells to proliferate. The development of technologies such as high-throughput sequencing has provided a powerful means to measure subclonal evolutionary dynamics across space and time. Here, we review the patterns that may be observed in cancer evolution and the methods available for quantifying the evolutionary dynamics of cancer. An improved understanding of the evolutionary trajectories of cancer will enable us to explore the molecular mechanism of tumorigenesis and to design tailored treatment strategies.
Affiliation(s)
- Xunuo Zhu
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Wenyi Zhao
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Zhan Zhou
- Innovation Institute for Artificial Intelligence in Medicine, Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, 322000, China.
- Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 310058, China.
- Xun Gu
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50011, USA.
9
Meng P, Wang G, Guo H, Jiang T. Identifying cancer driver genes using a two-stage random walk with restart on a gene interaction network. Comput Biol Med 2023; 158:106810. [PMID: 37011433 DOI: 10.1016/j.compbiomed.2023.106810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Revised: 03/08/2023] [Accepted: 03/20/2023] [Indexed: 04/03/2023]
Abstract
Cancer development and progression are significantly influenced by cancer driver genes. Understanding cancer driver genes and their mechanisms of action is essential for developing effective cancer treatments. As a result, identifying driver genes is important for drug development, cancer diagnosis, and treatment. Here, we present an algorithm to discover driver genes based on a two-stage random walk with restart (RWR) and a modified method for calculating the transition probability matrix in the random walk algorithm. First, we performed the first stage of RWR on the whole gene interaction network, employing the new method for calculating the transition probability matrix, and extracted a subnetwork from the nodes that had a high correlation with the seed nodes. The second stage of RWR was then applied to this subnetwork, and the nodes were re-ranked within it. Our approach outperformed existing methods in identifying driver genes. We also compared the effects of three gene interaction networks, of using two rounds of random walk, and of the choice of seed nodes. In addition, we identified several potential driver genes, some of which are involved in driving cancer development. Overall, our method is efficient in various cancer types, significantly outperforms existing methods, and can identify possible driver genes.
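The two-stage procedure outlined in this abstract can be sketched with a textbook random walk with restart. This is not the authors' algorithm: the abstract mentions a modified transition probability matrix that it does not define, so the sketch substitutes plain column normalization of the adjacency matrix. The function names, the `keep` and `restart` parameters, and the toy network are all hypothetical.

```python
def rwr(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on a graph given as an adjacency matrix.

    Iterates p <- (1 - restart) * W p + restart * p0, where W is the
    column-normalized adjacency matrix and p0 is uniform over the seeds.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            # Probability mass flowing into node i from its neighbors.
            walk = sum(adj[i][j] / col_sums[j] * p[j]
                       for j in range(n) if col_sums[j] > 0)
            nxt.append((1 - restart) * walk + restart * p0[i])
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def two_stage_rwr(adj, seeds, keep, restart=0.5):
    """Stage 1: RWR on the full network, keep the top `keep` nodes.
    Stage 2: RWR on the induced subnetwork; return nodes re-ranked."""
    first = rwr(adj, seeds, restart)
    ranked = sorted(range(len(adj)), key=lambda i: -first[i])
    # Seeds are always retained so the second walk can restart from them.
    nodes = sorted(set(ranked[:keep]) | set(seeds))
    sub = [[adj[i][j] for j in nodes] for i in nodes]
    sub_seeds = [nodes.index(s) for s in seeds]
    second = rwr(sub, sub_seeds, restart)
    order = sorted(range(len(nodes)), key=lambda i: -second[i])
    return [nodes[i] for i in order]


# Toy network: an undirected path 0-1-2-3-4 with node 0 as the seed.
path = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
reranked = two_stage_rwr(path, seeds=[0], keep=3)
```

On this toy path the scores fall off with distance from the seed, so the first stage keeps the three nodes nearest the seed and the second stage ranks them in the same order.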
10
Zhang D, Wang Y, Zhao F, Yang Q. Integrated multiomics analyses unveil the implication of a costimulatory molecule score on tumor aggressiveness and immune evasion in breast cancer: A large-scale study through over 8,000 patients. Comput Biol Med 2023; 159:106866. [PMID: 37068318 DOI: 10.1016/j.compbiomed.2023.106866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 02/05/2023] [Accepted: 03/30/2023] [Indexed: 04/08/2023]
Abstract
BACKGROUND Although immunotherapy has revolutionised cancer management, reliable genomic biomarkers for identifying eligible patient subpopulations are lacking. Costimulatory molecules play a crucial role in mounting anti-tumour responses, and clinical trials targeting these novel biomarkers are underway. However, whether these molecules can determine tumour aggressiveness and the risk of tumour evasion in breast cancer (BC) remains largely unknown. METHODS The whole-tissue transcriptomic data of 8236 patients with BC from 15 independent cohorts were extracted. An integrated scoring system named 'costimulatory molecule score' (CMS) was constructed and sufficiently validated using least absolute shrinkage and selection operator regression (1000 iterations) and the random survival forest algorithm (1000 trees). The correlation among CMSs, cancer genotypes and clinicopathological characteristics was examined. Extensive multiomics and immunogenomic analyses were performed to investigate and verify the association among CMSs, enriched pathways, potential intrinsic and extrinsic immune escape mechanisms, immunotherapy response and therapeutic options. RESULTS The prognostic value of the CMS model, which relies on the expression pattern of merely 5 costimulatory genes, is almost universally applicable to BC patients in a platform-independent manner. Through internal and external in silico validation, high CMS was characterized by favorable genotypes but decreased tumor immunogenicity, activation of stroma, immune-suppressive states and potential immunotherapeutic resistance. Similar results were observed in a real-world immunotherapy cohort and Pan-Cancer analysis. CONCLUSION This comprehensive characterization indicates that the CMS model may be complemented for predicting tumor aggressiveness and immune evasion in BC patients, underlining the future clinical potential for further exploration of resistance mechanisms and optimization of immunotherapeutic strategies.
Affiliation(s)
- Dong Zhang
- Department of Breast Surgery, General Surgery, Qilu Hospital of Shandong University, Jinan, 250012, China; Department of Clinical Medicine, The First Clinical College, Shandong University, Jinan, 250012, China
- Yingnan Wang
- Department of Breast Surgery, General Surgery, Qilu Hospital of Shandong University, Jinan, 250012, China; Department of Clinical Medicine, The First Clinical College, Shandong University, Jinan, 250012, China
- Faming Zhao
- Key Laboratory of Environmental Health, Ministry of Education & Ministry of Environmental Protection, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Qifeng Yang
- Department of Breast Surgery, General Surgery, Qilu Hospital of Shandong University, Jinan, 250012, China; Pathology Tissue Bank, Qilu Hospital of Shandong University, Jinan, Shandong, 250012, China; Research Institute of Breast Cancer, Shandong University, Jinan, 250102, China.
11
Patterson A, Elbasir A, Tian B, Auslander N. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications. Cancers (Basel) 2023; 15:1958. [PMID: 37046619 PMCID: PMC10093138 DOI: 10.3390/cancers15071958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/24/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023] Open
Abstract
Since the rise of next-generation sequencing technologies, the catalogue of mutations in cancer has been continuously expanding. To address the complexity of the cancer-genomic landscape and extract meaningful insights, numerous computational approaches have been developed over the last two decades. In this review, we survey the current leading computational methods to derive intricate mutational patterns in the context of clinical relevance. We begin with mutation signatures, explaining first how mutation signatures were developed and then examining the utility of studies using mutation signatures to correlate environmental effects on the cancer genome. Next, we examine current clinical research that employs mutation signatures and discuss the potential use cases and challenges of mutation signatures in clinical decision-making. We then examine computational studies developing tools to investigate complex patterns of mutations beyond the context of mutational signatures. We survey methods to identify cancer-driver genes, from single-driver studies to pathway and network analyses. In addition, we review methods inferring complex combinations of mutations for clinical tasks and using mutations integrated with multi-omics data to better predict cancer phenotypes. We examine the use of these tools for either discovery or prediction, including prediction of tumor origin, treatment outcomes, prognosis, and cancer typing. We further discuss the main limitations preventing widespread clinical integration of computational tools for the diagnosis and treatment of cancer. We end by proposing solutions to address these challenges using recent advances in machine learning.
Affiliation(s)
- Andrew Patterson
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- The Wistar Institute, Philadelphia, PA 19104, USA
- Bin Tian
- The Wistar Institute, Philadelphia, PA 19104, USA
- Noam Auslander
- The Wistar Institute, Philadelphia, PA 19104, USA
- Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
12
|
Yang H, Gan L, Chen R, Li D, Zhang J, Wang Z. From multi-omics data to the cancer druggable gene discovery: a novel machine learning-based approach. Brief Bioinform 2023; 24:6896032. [PMID: 36515158 DOI: 10.1093/bib/bbac528] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/07/2022] [Revised: 08/31/2022] [Accepted: 11/07/2022] [Indexed: 12/15/2022] Open
Abstract
The development of targeted drugs allows precision medicine in cancer treatment and optimal targeted therapies. Accurate identification of cancer-druggable genes helps strengthen the understanding of targeted cancer therapy and promotes precise cancer treatment. However, few cancer-druggable genes have been found, owing to the diversity and complexity of multi-omics data. This study proposes deep forest for cancer druggable genes discovery (DF-CAGE), a novel machine learning-based method for cancer-druggable gene discovery. DF-CAGE integrated somatic mutations, copy number variants, DNA methylation and RNA-Seq data across ~10,000 TCGA profiles to identify the landscape of cancer-druggable genes. We found that DF-CAGE discovers the commonalities of currently known cancer-druggable genes from the perspective of multi-omics data and achieved excellent performance on the OncoKB, Target and Drugbank data sets. Among the ~20,000 protein-coding genes, DF-CAGE pinpointed 465 potential cancer-druggable genes. We found that the candidate cancer-druggable genes (CDG) are clinically meaningful and divided the CDG into known, reliable and potential gene sets. Finally, we analyzed the contribution of each omics data type to identifying druggable genes. We found that DF-CAGE reports druggable genes mainly based on copy number variation (CNV) data, gene rearrangements and mutation rates in the population. These findings may inform the future study and development of new drugs.
Affiliation(s)
- Hai Yang
  - Department of Computer Science and Engineering, East China University of Science and Technology, 200237 Shanghai, PR China
- Lipeng Gan
  - Department of Computer Science and Engineering, East China University of Science and Technology, 200237 Shanghai, PR China
- Rui Chen
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
- Dongdong Li
  - Department of Computer Science and Engineering, East China University of Science and Technology, 200237 Shanghai, PR China
- Jing Zhang
  - Department of Computer Science and Engineering, East China University of Science and Technology, 200237 Shanghai, PR China
- Zhe Wang
  - Department of Computer Science and Engineering, East China University of Science and Technology, 200237 Shanghai, PR China
13
Chen Y, Li H, Sun X. Construction and analysis of sample-specific driver modules for breast cancer. BMC Genomics 2022; 23:717. [PMID: 36266635 PMCID: PMC9583575 DOI: 10.1186/s12864-022-08928-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 10/07/2022] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND It is important to understand the functional impact of somatic mutation and methylation aberration at an individual level to implement precision medicine. Recent studies have demonstrated that the perturbation of gene interaction networks can provide a fundamental link between genotype (or epigenotype) and phenotype. However, it is unclear how individual mutations affect the function of biological networks, especially for individual methylation aberration. To solve this, we provided a sample-specific driver module construction method using the 2-order network theory and hub-gene theory to identify individual perturbation networks driven by mutations or methylation aberrations. RESULTS Our method integrated multi-omics of breast cancer, including genomics, transcriptomics, epigenomics and interactomics, and provided new insight into the synergistic collaboration between methylation and mutation at an individual level. A common driver pattern of breast cancer was identified from a novel perspective of a driver module, which is correlated to the occurrence and development of breast cancer. The constructed driver module reflects the survival prognosis and degree of malignancy among different subtypes of breast cancer. Additionally, subtype-specific driver modules were identified. CONCLUSIONS This study explores the driver module of individual cancer, and contributes to a better understanding of the mechanism of breast cancer driven by the mutations and methylation variations from the point of view of the driver network. This work will help identify new therapeutic combinations of gene mutations and drugs in humans.
Affiliation(s)
- Yuanyuan Chen
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096 P. R. China
  - College of Science, Nanjing Agricultural University, Nanjing, 210095 P. R. China
- Haitao Li
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096 P. R. China
- Xiao Sun
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096 P. R. China
14
Zhang SW, Xu JY, Zhang T. DGMP: Identifying Cancer Driver Genes by Jointing DGCN and MLP from Multi-omics Genomic Data. GENOMICS, PROTEOMICS & BIOINFORMATICS 2022; 20:928-938. [PMID: 36464123 PMCID: PMC10025764 DOI: 10.1016/j.gpb.2022.11.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/02/2021] [Revised: 10/21/2022] [Accepted: 11/04/2022] [Indexed: 12/03/2022]
Abstract
Identification of cancer driver genes plays an important role in precision oncology research and helps to understand cancer initiation and progression. However, most existing computational methods either rely mainly on protein-protein interaction (PPI) networks or treat directed gene regulatory networks (GRNs) as undirected gene-gene association networks when identifying cancer driver genes. This discards the regulatory structure information unique to directed GRNs and thereby affects the outcome of cancer driver gene identification. Here, based on multi-omics pan-cancer data (i.e., gene expression, mutation, copy number variation, and DNA methylation), we propose a novel method (called DGMP) to identify cancer driver genes by joining a directed graph convolutional network (DGCN) and a multilayer perceptron (MLP). DGMP learns the multi-omics features of genes as well as the topological structure features in the GRN with the DGCN model and uses the MLP to give more weight to gene features, mitigating the bias toward graph topological features in the DGCN learning process. The results on three GRNs show that DGMP outperforms other existing state-of-the-art methods. The ablation experimental results on the DawnNet network indicate that introducing the MLP into the DGCN can offset the performance degradation of the DGCN, and that joining the MLP and the DGCN can effectively improve the performance of identifying cancer driver genes. DGMP can identify not only highly mutated cancer driver genes but also driver genes harboring other kinds of alterations (e.g., differential expression and aberrant DNA methylation) or genes involved in GRNs with other cancer genes. The source code of DGMP can be freely downloaded from https://github.com/NWPU-903PR/DGMP.
Affiliation(s)
- Shao-Wu Zhang
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Jing-Yu Xu
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Tong Zhang
  - MOE Key Laboratory of Information Fusion Technology, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
15
Zhao W, Gu X, Chen S, Wu J, Zhou Z. MODIG: Integrating Multi-Omics and Multi-Dimensional Gene Network for Cancer Driver Gene Identification based on Graph Attention Network Model. Bioinformatics 2022; 38:4901-4907. [PMID: 36094338 DOI: 10.1093/bioinformatics/btac622] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/02/2022] [Revised: 09/07/2022] [Accepted: 09/10/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Identifying genes that play a causal role in cancer evolution remains one of the biggest challenges in cancer biology. With the accumulation of high-throughput multi-omics data over decades, it becomes a great challenge to effectively integrate these data into the identification of cancer driver genes. RESULTS Here, we propose MODIG, a graph attention network (GAT)-based framework to identify cancer driver genes by combining multi-omics pan-cancer data (mutations, copy number variants, gene expression, and methylation levels) with multi-dimensional gene networks. First, we established diverse types of gene relationship maps based on protein-protein interactions (PPI), gene sequence similarity, KEGG pathway co-occurrence, gene co-expression patterns, and Gene Ontology (GO). Then, we constructed a multi-dimensional gene network consisting of approximately 20,000 genes as nodes and five types of gene associations as multiplex edges. We applied a GAT to model within-dimension interactions to generate a gene representation for each dimension based on this graph. Moreover, we introduced a joint learning module to fuse multiple dimension-specific representations to generate general gene representations. Finally, we used the obtained gene representation to perform a semi-supervised driver gene identification task. The experiment results show that MODIG outperforms the baseline models in terms of area under precision-recall curves (AUPR) and area under the receiver operating characteristic curves (AUROC). AVAILABILITY The MODIG program is available at https://github.com/zjupgx/modig. The code and data underlying this article are also available on Zenodo, at https://doi.org/10.5281/zenodo.7057241. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
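The two evaluation metrics named in this abstract, AUROC and AUPR, can be illustrated with a small standard-library sketch. This is not MODIG's evaluation code: the function names `auroc` and `average_precision` and the toy labels and scores are assumptions made for illustration, and average precision is used here as one common estimator of the area under the precision-recall curve.

```python
def auroc(labels, scores):
    """Area under the ROC curve as the probability that a random positive
    (known driver) outscores a random negative; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values measured at the rank
    of each positive, with genes ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Hypothetical toy data: 1 marks a known driver gene, 0 a non-driver.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

AUPR is usually the more informative of the two when known drivers are a small fraction of all genes, because it is not inflated by the large number of easily ranked negatives.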
Affiliation(s)
- Wenyi Zhao
  - Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - Collaborative Innovation Center of Artificial Intelligence by MOE and Zhejiang Provincial Government, College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, 310018, China
- Xun Gu
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50011, USA
- Shuqing Chen
  - Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Jian Wu
  - Second Affiliated Hospital School of Medicine, and School of Public Health, Zhejiang University, Hangzhou, 310058, China
  - Collaborative Innovation Center of Artificial Intelligence by MOE and Zhejiang Provincial Government, College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, 310018, China
- Zhan Zhou
  - Institute of Drug Metabolism and Pharmaceutical Analysis and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine, Zhejiang University, Hangzhou, 310018, China
  - Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 310058, China
16
Zhou L, Yi Y, Liu C, Chen Z. Constructing a novel prognostic signature of tumor driver genes for breast cancer. Am J Transl Res 2022; 14:4515-4531. [PMID: 35958490 PMCID: PMC9360863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2022] [Accepted: 05/27/2022] [Indexed: 06/15/2023]
Abstract
OBJECTIVES To systematically explore the function and prognostic ability of tumor-driver genes (TDGs) in breast carcinoma (BRCA). METHODS Functional enrichment analysis of BRCA differentially expressed TDGs was performed. We used univariate Cox, lasso, and multivariate Cox regression to identify the independent prognostic TDGs of BRCA. Then we constructed a prognostic signature and verified its predictive performance. Gene set enrichment analysis of the signal pathway revealed the differences between the high- and low-risk groups of the prognostic signature. Finally, a nomogram related to the prognostic model was established and verified. RESULTS A total of 595 differentially expressed TDGs were identified, which are related to various molecular mechanisms of BRCA progression. We identified 8 independent prognostic TDGs for BRCA and validated their expression and prognosis with public data and clinical samples. The BRCA cohort was divided into training and validation cohorts, and prognostic signatures were constructed separately. The log-rank test showed that the survival rate of the high-risk group was significantly lower than that of the low-risk group in the prognostic signature (P<0.001); the AUCs in the three cohorts were 0.805, 0.712, and 0.760, respectively; the nomogram also showed better predictive performance. Analysis of the difference between the two risk subtypes showed that the high-risk group is mainly enriched in angiogenesis, MTORC1, epithelial-mesenchymal transition and glycolysis, which means it is highly malignant. CONCLUSIONS The prognostic signature and nomogram were confirmed to accurately predict the prognosis of patients with BRCA, and we validated the hub genes, suggesting their potential as future therapeutic targets.
Affiliation(s)
- Liqiang Zhou
  - Department of General Surgery, The Second Affiliated Hospital of Nanchang University, Nanchang 330006, Jiangxi, China
- Yali Yi
  - Department of Oncology, The Second Affiliated Hospital of Nanchang University, Nanchang 330006, Jiangxi, China
- Chuan Liu
  - Key Laboratory of Molecular Medicine of Jiangxi Province, The Second Affiliated Hospital of Nanchang University, Nanchang 330006, Jiangxi, China
- Zhiqing Chen
  - Key Laboratory of Molecular Medicine of Jiangxi Province, The Second Affiliated Hospital of Nanchang University, Nanchang 330006, Jiangxi, China
17
Wang C, Shi J, Cai J, Zhang Y, Zheng X, Zhang N. DriverRWH: discovering cancer driver genes by random walk on a gene mutation hypergraph. BMC Bioinformatics 2022; 23:277. [PMID: 35831792 PMCID: PMC9281118 DOI: 10.1186/s12859-022-04788-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/21/2021] [Accepted: 06/08/2022] [Indexed: 12/24/2022] Open
Abstract
Background: Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data. A critical challenge in cancer genomics is the identification of the few cancer driver genes whose mutations cause tumor growth. However, the majority of existing computational approaches underuse the co-occurrence mutation information of individuals, which is deemed to be important in tumorigenesis and tumor progression, resulting in a high false-positive rate. Results: To make full use of co-mutation information, we present a random walk algorithm referred to as DriverRWH on a weighted gene mutation hypergraph model, using somatic mutation data and molecular interaction network data to prioritize candidate driver genes. Applied to tumor samples of different cancer types from The Cancer Genome Atlas, DriverRWH shows significantly better performance than state-of-the-art prioritization methods in terms of the area under the curve scores and the cumulative number of known driver genes recovered in top-ranked candidate genes. Besides, DriverRWH discovers several potential drivers, which are enriched in cancer-related pathways. DriverRWH recovers approximately 50% of known driver genes in the top 30 ranked candidate genes for more than half of the cancer types. In addition, DriverRWH is also highly robust to perturbations in the mutation data and gene functional network data. Conclusion: DriverRWH is effective across various cancer types in prioritizing cancer driver genes and provides considerable improvement over other tools with a better balance of precision and sensitivity. It can be a useful tool for detecting potential driver genes and facilitating targeted cancer therapies. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04788-7.
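The general idea of a random walk on a weighted gene mutation hypergraph can be sketched in a few lines. This is a generic random walk with restart, not DriverRWH's published algorithm: treating each tumor sample as one hyperedge, the uniform restart vector, the default `restart=0.3`, and the name `hypergraph_rwr` are all assumptions made for illustration, and the molecular interaction network that DriverRWH also uses is left out.

```python
def hypergraph_rwr(hyperedges, weights=None, restart=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart on a hypergraph.

    hyperedges: list of sets of gene names (here, one set per tumor sample).
    weights:    optional positive weight per hyperedge (defaults to 1.0 each).
    Returns a dict mapping each gene to its steady-state visiting probability.
    """
    if weights is None:
        weights = [1.0] * len(hyperedges)
    genes = sorted({g for edge in hyperedges for g in edge})
    # Weighted degree of a gene: total weight of the hyperedges containing it.
    degree = {g: 0.0 for g in genes}
    for edge, w in zip(hyperedges, weights):
        for g in edge:
            degree[g] += w
    start = {g: 1.0 / len(genes) for g in genes}  # uniform restart distribution
    prob = dict(start)
    for _ in range(max_iter):
        nxt = {g: restart * start[g] for g in genes}
        for edge, w in zip(hyperedges, weights):
            # A walker at gene g picks a hyperedge with probability w / degree[g],
            # then moves to a gene chosen uniformly inside that hyperedge.
            inflow = sum(prob[g] * w / degree[g] for g in edge)
            share = (1.0 - restart) * inflow / len(edge)
            for g in edge:
                nxt[g] += share
        delta = sum(abs(nxt[g] - prob[g]) for g in genes)
        prob = nxt
        if delta < tol:
            break
    return prob


# Hypothetical toy cohort: each set lists the genes mutated in one sample.
toy_samples = [{"TP53", "KRAS"}, {"TP53", "PIK3CA"}, {"TP53"}, {"KRAS", "PIK3CA", "TTN"}]
toy_ranking = sorted(hypergraph_rwr(toy_samples).items(), key=lambda kv: -kv[1])
print(toy_ranking)
```

Each update preserves total probability, so the returned scores sum to one and can be read directly as a gene ranking.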
Affiliation(s)
- Chenye Wang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Junhan Shi
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Jiansheng Cai
  - Department of Mathematics, Weifang University, Weifang, 261061, Shandong, China
- Yusen Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
- Xiaoqi Zheng
  - Department of Mathematics, Shanghai Normal University, Shanghai, 200234, China
- Naiqian Zhang
  - School of Mathematics and Statistics, Shandong University, Weihai, 264209, China
18
Yan J, Hu Z, Li ZW, Sun S, Guo WF. Network Control Models With Personalized Genomics Data for Understanding Tumor Heterogeneity in Cancer. Front Oncol 2022; 12:891676. [PMID: 35712516 PMCID: PMC9195174 DOI: 10.3389/fonc.2022.891676] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/08/2022] [Accepted: 04/12/2022] [Indexed: 11/25/2022] Open
Abstract
The rapid development of high-throughput sequencing and biotechnology has brought new opportunities and challenges for developing efficient computational methods to explore the personalized genomics data of cancer patients. Because these data are high-dimensional with small sample sizes, it is difficult to extract useful information from them using traditional statistical methods. In the past few years, network control methods have been proposed to handle networked systems with high dimension and small sample size. Researchers have made progress in the design and optimization of network control principles. However, few studies have comprehensively surveyed network control methods for analyzing the biomolecular network data of individual patients. To address this gap, we comprehensively survey complex network control methods applied to personalized omics data for understanding tumor heterogeneity in the precision medicine of individual patients with cancer.
Affiliation(s)
- Jipeng Yan
  - Department of Nephrology, Xijing Hospital, The Fourth Military Medical University, Xi’an, China
- Zhuo Hu
  - School of Electrical Engineering, Zhengzhou University, Zhengzhou, China
- Zong-Wei Li
  - School of Electrical Engineering, Zhengzhou University, Zhengzhou, China
- Shiren Sun
  - Department of Nephrology, Xijing Hospital, The Fourth Military Medical University, Xi’an, China
  - *Correspondence: Wei-Feng Guo; Shiren Sun
- Wei-Feng Guo
  - School of Electrical Engineering, Zhengzhou University, Zhengzhou, China
  - State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
  - *Correspondence: Wei-Feng Guo; Shiren Sun
19
Erten C, Houdjedj A, Kazan H, Taleb Bahmed AA. PersonaDrive: A Method for the Identification and Prioritization of Personalized Cancer Drivers. Bioinformatics 2022; 38:3407-3414. [PMID: 35579340 DOI: 10.1093/bioinformatics/btac329] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/30/2021] [Revised: 05/06/2022] [Accepted: 05/11/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION A major challenge in cancer genomics is to distinguish the driver mutations that are causally linked to cancer from passenger mutations that do not contribute to cancer development. The majority of existing methods provide a single driver gene list for the entire cohort of patients. However, since mutation profiles of patients from the same cancer type show a high degree of heterogeneity, a more ideal approach is to identify patient-specific drivers. RESULTS We propose a novel method that integrates genomic data, biological pathways, and protein connectivity information for personalized identification of driver genes. The method is formulated on a personalized bipartite graph for each patient. Our approach provides a personalized ranking of the mutated genes of a patient based on the sum of weighted 'pairwise pathway coverage' scores across all the samples, where appropriate pairwise patient similarity scores are used as weights to normalize these coverage scores. We compare our method against three state-of-the-art patient-specific cancer gene prioritization methods. The comparisons are with respect to a novel evaluation method that takes into account the personalized nature of the problem. We show that our approach outperforms the existing alternatives for both the TCGA and the cell line data. Additionally, we show that the KEGG/Reactome pathways enriched in our ranked genes and those that are enriched in cell lines' reference sets overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods. Our findings can provide valuable information towards the development of personalized treatments and therapies. AVAILABILITY All the code and data are available at https://github.com/abu-compbio/PersonaDrive (archived at https://doi.org/10.5281/zenodo.6520187). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cesim Erten
  - Department of Computer Engineering, Antalya Bilim University, Antalya, 07190, Turkey
- Aissa Houdjedj
  - Department of Computer Engineering, Antalya Bilim University, Antalya, 07190, Turkey
  - Department of Computer Engineering, Akdeniz University, Antalya, 07070, Turkey
- Hilal Kazan
  - Department of Computer Engineering, Antalya Bilim University, Antalya, 07190, Turkey
- Ahmed Amine Taleb Bahmed
  - Electrical and Computer Engineering Graduate Program, Antalya Bilim University, Antalya, 07190, Turkey
20
Zeng Z, Bromberg Y. Inferring Potential Cancer Driving Synonymous Variants. Genes (Basel) 2022; 13:778. [PMID: 35627162 PMCID: PMC9140830 DOI: 10.3390/genes13050778] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/25/2022] [Revised: 04/25/2022] [Accepted: 04/26/2022] [Indexed: 02/01/2023] Open
Abstract
Synonymous single nucleotide variants (sSNVs) are often considered functionally silent, but a few cases of cancer-causing sSNVs have been reported. From available databases, we collected four categories of sSNVs: germline, somatic in normal tissues, somatic in cancerous tissues, and putative cancer drivers. We found that screening sSNVs for recurrence among patients, conservation of the affected genomic position, and synVep prediction (synVep is a machine learning-based sSNV effect predictor) recovers cancer driver variants (termed proposed drivers) and previously unknown putative cancer genes. Of the 2.9 million somatic sSNVs found in the COSMIC database, we identified 2111 proposed cancer driver sSNVs. Of these, 326 sSNVs could be further tagged for possible RNA splicing effects, RNA structural changes, and affected RBP motifs. This list of proposed cancer driver sSNVs provides computational guidance in prioritizing the experimental evaluation of synonymous mutations found in cancers. Furthermore, our list of novel potential cancer genes, galvanized by synonymous mutations, may highlight yet unexplored cancer mechanisms.
Affiliation(s)
- Zishuo Zeng
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA
  - Department of Genetics, Rutgers University, Piscataway, NJ 08854, USA
21
Andrades R, Recamonde-Mendoza M. Machine learning methods for prediction of cancer driver genes: a survey paper. Brief Bioinform 2022; 23:6551145. [PMID: 35323900 DOI: 10.1093/bib/bbac062] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 10/06/2021] [Revised: 02/06/2022] [Accepted: 02/08/2022] [Indexed: 12/21/2022] Open
Abstract
Identifying the genes and mutations that drive the emergence of tumors is a critical step to improving our understanding of cancer and identifying new directions for disease diagnosis and treatment. Despite the large volume of genomics data, the precise detection of driver mutations and their carrying genes, known as cancer driver genes, from the millions of possible somatic mutations remains a challenge. Computational methods play an increasingly important role in discovering genomic patterns associated with cancer drivers and developing predictive models to identify these elements. Machine learning (ML), including deep learning, has been the engine behind many of these efforts and provides excellent opportunities for tackling remaining gaps in the field. Thus, this survey aims to perform a comprehensive analysis of ML-based computational approaches to identify cancer driver mutations and genes, providing an integrated, panoramic view of the broad data and algorithmic landscape within this scientific problem. We discuss how the interactions among data types and ML algorithms have been explored in previous solutions and outline current analytical limitations that deserve further attention from the scientific community. We hope that by helping readers become more familiar with significant developments in the field brought by ML, we may inspire new researchers to address open problems and advance our knowledge towards cancer driver discovery.
Affiliation(s)
- Renan Andrades
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil
  - Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
- Mariana Recamonde-Mendoza
  - Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre/RS, Brazil
  - Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre/RS, Brazil
22
Wang C, Ma H, Wu W, Lu X. Drug Discovery in Spinal Cord Injury With Ankylosing Spondylitis Identified by Text Mining and Biomedical Databases. Front Genet 2022; 13:799970. [PMID: 35281834 PMCID: PMC8914062 DOI: 10.3389/fgene.2022.799970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Accepted: 01/19/2022] [Indexed: 11/15/2022] Open
Abstract
Spinal cord injury (SCI) and ankylosing spondylitis (AS) are common inflammatory diseases in spine surgery. However, the relationship between the two diseases is ambiguous and the efficiency of drug discovery is limited. Therefore, the study aimed to investigate new drug therapies for SCI and AS. First, text mining was used to obtain the interacting genes related to SCI and AS, and then functional analysis was conducted. Protein–protein interaction (PPI) networks were constructed with STRING online and Cytoscape software to identify hub genes. Last, hub genes and potential drugs were identified through drug–gene interaction analysis, and microRNA and transcription factor regulatory networks were also analyzed. Two hundred five genes common to “SCI” and “AS” identified by text mining were enriched in inflammatory responses. PPI network analysis showed that 30 genes constructed two significant modules. Ultimately, nine (SST, VWF, IL1B, IL6, CXCR4, VEGFA, SERPINE1, FN1, and PROS1) out of 30 genes could be targeted by a total of 13 drugs. In conclusion, the novel core genes offer new insight into latent functional mechanisms and present potential prognostic indicators and therapeutic targets in SCI and AS.
23
Liu Y, Li G, Yang Y, Lu Z, Wang T, Wang X, Liu J. Analysis of N6-Methyladenosine Modification Patterns and Tumor Immune Microenvironment in Pancreatic Adenocarcinoma. Front Genet 2022; 12:752025. [PMID: 35046996 PMCID: PMC8762218 DOI: 10.3389/fgene.2021.752025] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/02/2021] [Accepted: 12/10/2021] [Indexed: 12/26/2022] Open
Abstract
Background: Pancreatic adenocarcinoma (PAAD) is a rare cancer with a poor prognosis. N6-methyladenosine (m6A) is the most common mRNA modification. However, little is known about the relationship between m6A modification and the tumor immune microenvironment (TIME) in PAAD. Methods: Based on 22 m6A regulators, m6A modification patterns of PAAD samples extracted from public databases were systematically evaluated and correlated with the tumor immune and prognosis characteristics. An integrated model called the "m6Ascore" was constructed, and its prognostic role was evaluated. Results: Three different m6Aclusters and gene clusters were successively identified; these clusters were characterized by differences in prognosis, immune cell infiltration, and pathway signatures. The m6Ascore was constructed to quantify the m6A modifications of individual patients. Subsequent analysis revealed that m6Ascore was an independent prognostic factor of PAAD and could be a potential indicator to predict the response to immunotherapy. Conclusion: This study comprehensively evaluated the features of m6A modification patterns in PAAD. m6A modification patterns play a non-negligible role in the TIME of PAAD. m6Ascore provides a more holistic understanding of m6A modification in PAAD, and will help clinicians predict the prognosis and response to immunotherapy.
Affiliation(s)
- Yong Liu
  - Department of Liver Transplantation and Hepatobiliary Surgery, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Guangbing Li
  - Department of Liver Transplantation and Hepatobiliary Surgery, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
- Yang Yang
  - Department of Liver Transplantation and Hepatobiliary Surgery, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Ziwen Lu
  - Department of Liver Transplantation and Hepatobiliary Surgery, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Tao Wang
  - Department of Liver Transplantation and Hepatobiliary Surgery, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Xiaoyu Wang
  - Department of Liver Transplantation and Hepatobiliary Surgery, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jun Liu
  - Department of Liver Transplantation and Hepatobiliary Surgery, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
  - Department of Liver Transplantation and Hepatobiliary Surgery, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China

24
Integration of Genomic Profiling and Organoid Development in Precision Oncology. Int J Mol Sci 2021; 23:ijms23010216. [PMID: 35008642 PMCID: PMC8745679 DOI: 10.3390/ijms23010216] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/01/2021] [Revised: 12/20/2021] [Accepted: 12/22/2021] [Indexed: 11/26/2022]
Abstract
Precision oncology involves an innovative personalized treatment strategy for each cancer patient that provides strategies and options for cancer treatment. Currently, personalized cancer medicine is primarily based on molecular matching. Next-generation sequencing and related technologies, such as single-cell whole-transcriptome sequencing, enable the accurate elucidation of the genetic landscape in individual cancer patients and consequently provide clinical benefits. Furthermore, advances in cancer organoid models that represent genetic variations and mutations in individual cancer patients have direct and important clinical implications in precision oncology. This review aimed to discuss recent advances, clinical potential, and limitations of genomic profiling and the use of organoids in breast and ovarian cancer. We also discuss the integration of genomic profiling and organoid models for applications in cancer precision medicine.

25
Cutigi JF, Evangelista AF, Reis RM, Simao A. A computational approach for the discovery of significant cancer genes by weighted mutation and asymmetric spreading strength in networks. Sci Rep 2021; 11:23551. [PMID: 34876593 PMCID: PMC8651746 DOI: 10.1038/s41598-021-02671-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2021] [Accepted: 10/26/2021] [Indexed: 11/25/2022]
Abstract
Identifying significantly mutated genes in cancer is essential for understanding the mechanisms of tumor initiation and progression. This task is a key challenge because large-scale genomic studies have reported a very large number of genes mutated at low frequency. To uncover infrequently mutated genes, gene interaction networks combined with mutation data have been explored. This work proposes Discovering Significant Cancer Genes (DiSCaGe), a computational method for discovering significant genes for cancer. DiSCaGe computes a mutation score for each gene based on the types of mutations it carries. The influence each gene receives from its neighbors in the network is also considered and is obtained through an asymmetric spreading strength applied to a consensus gene network. DiSCaGe produces a ranking of prioritized possible cancer genes. An experimental evaluation with six types of cancer revealed the potential of DiSCaGe for discovering known and possible novel significant cancer genes.
Affiliation(s)
- Jorge Francisco Cutigi
  - Federal Institute of Sao Paulo, Sao Carlos, SP, Brazil
  - University of Sao Paulo, Sao Carlos, SP, Brazil
- Rui Manuel Reis
  - Molecular Oncology Research Center, Barretos Cancer Hospital, Barretos, SP, Brazil

26
Venkatraman DL, Pulimamidi D, Shukla HG, Hegde SR. Tumor relevant protein functional interactions identified using bipartite graph analyses. Sci Rep 2021; 11:21530. [PMID: 34728699 PMCID: PMC8563864 DOI: 10.1038/s41598-021-00879-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Accepted: 09/30/2021] [Indexed: 12/02/2022]
Abstract
The surge of -omics data for diseases such as cancer allows insights to be derived into the associated protein interactions. We used bipartite network principles to build protein functional associations of the differentially regulated genes in 18 cancer types. This approach allowed us to combine expression data with functional associations in many cancers simultaneously. Further, graph centrality measures suggested the importance of upregulated genes such as BIRC5, UBE2C, BUB1B, KIF20A and PTH1R in cancer. Pathway analysis of the high-centrality network nodes suggested the importance of the upregulation of cell cycle and replication-associated proteins in cancer. Some of the downregulated high-centrality proteins include actins, myosins and ATPase subunits. Among the transcription factors, mini-chromosome maintenance proteins (MCMs) and E2F family proteins appeared prominently in regulating many differentially regulated genes. The projected unipartite networks of the up- and downregulated genes comprised 37,411 and 41,756 interactions, respectively. Collating these interactions revealed pan-cancer as well as subtype-specific protein complexes and clusters. Therefore, we demonstrate that incorporating expression data from multiple cancers into bipartite graphs validates existing cancer-associated mechanisms and points to novel interactions and pathways.
Affiliation(s)
- Deepshika Pulimamidi
  - Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India
- Harsh G Shukla
  - Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India
- Shubhada R Hegde
  - Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, 560 100, India

27
Peng W, Tang Q, Dai W, Chen T. Improving cancer driver gene identification using multi-task learning on graph convolutional network. Brief Bioinform 2021; 23:6394994. [PMID: 34643232 DOI: 10.1093/bib/bbab432] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 08/13/2021] [Revised: 09/08/2021] [Accepted: 09/18/2021] [Indexed: 01/18/2023]
Abstract
Cancer is thought to be caused by the accumulation of driver genetic mutations. Therefore, identifying cancer driver genes plays a crucial role in understanding the molecular mechanism of cancer and developing precision therapies and biomarkers. In this work, we propose a multi-task learning method, called MTGCN, based on the Graph Convolutional Network to identify cancer driver genes. First, we augment gene features by introducing their features on the protein-protein interaction (PPI) network. After that, the multi-task learning framework propagates and aggregates node and graph features from the input to the next layer to learn node embedding features, simultaneously optimizing the node prediction task and the link prediction task. Finally, we use a Bayesian task weight learner to balance the two tasks automatically. The outputs of MTGCN assign each gene a probability of being a cancer driver gene. Our method and four existing methods are applied to predict cancer drivers for pan-cancer and some single cancer types. The experimental results show that our model achieves outstanding performance compared with the state-of-the-art methods in terms of the area under the Receiver Operating Characteristic (ROC) curves and the area under the precision-recall curves. MTGCN is freely available via https://github.com/weiba/MTGCN.
Affiliation(s)
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China
  - Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China
- Qi Tang
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China
  - Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China
- Tielin Chen
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, P. R. China

28
Guo WF, Zhang SW, Zeng T, Akutsu T, Chen L. Network control principles for identifying personalized driver genes in cancer. Brief Bioinform 2021; 21:1641-1662. [PMID: 31711128 DOI: 10.1093/bib/bbz089] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 04/10/2019] [Revised: 06/26/2019] [Accepted: 06/27/2019] [Indexed: 02/02/2023]
Abstract
To understand tumor heterogeneity in cancer, personalized driver genes (PDGs) need to be identified to unravel the genotype-phenotype associations of particular patients. However, most existing driver-focused methods pay attention mainly to cohort information rather than individual information. Recently developed computational approaches based on network control principles are opening a new way to discover driver genes in cancer, particularly at the individual level. To provide a comprehensive perspective on network control methods for this timely topic, we first framed cancer progression as a network control problem, in which the expected PDGs are genes altered by oncogene activation signals that can change the individual molecular network from a healthy state to a disease state. Then, we reviewed the network reconstruction methods for single samples and introduced novel network control methods on single-sample networks to identify PDGs in cancer. In particular, we gave a performance assessment of the network structure control-based PDG identification methods on multiple cancer datasets from TCGA, for which the data and evaluation package are also publicly available. Finally, we discussed future directions for the application of network control methods to identify PDGs in cancer and diverse biological processes.
Affiliation(s)
- Wei-Feng Guo
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Tao Zeng
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, 200031, China
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, 611-0011, Japan
- Luonan Chen
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
  - Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, 200031, China
  - School of Life Science and Technology, ShanghaiTech University, 201210 Shanghai, China
  - Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai 201210, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China

29
Smith J, Shi Y, Benedikt M, Nikolic M. Scalable analysis of multi-modal biomedical data. Gigascience 2021; 10:giab058. [PMID: 34508579 PMCID: PMC8434767 DOI: 10.1093/gigascience/giab058] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/18/2020] [Revised: 05/31/2021] [Accepted: 08/18/2021] [Indexed: 11/15/2022]
Abstract
BACKGROUND Targeted diagnosis and treatment options are dependent on insights drawn from multi-modal analysis of large-scale biomedical datasets. Advances in genomics sequencing, image processing, and medical data management have supported data collection and management within medical institutions. These efforts have produced large-scale datasets and have enabled integrative analyses that provide a more thorough look at the impact of a disease on the underlying system. The integration of large-scale biomedical data commonly involves several complex data transformation steps, such as combining datasets to build feature vectors for learning analysis. Thus, scalable data integration solutions play a key role in the future of targeted medicine. Though large-scale data processing frameworks have shown promising performance for many domains, they fail to support scalable processing of complex datatypes. SOLUTION To address these issues and achieve scalable processing of multi-modal biomedical data, we present TraNCE, a framework that automates away the difficulties of designing distributed analyses with complex biomedical data types. PERFORMANCE We outline research and clinical applications for the platform, including data integration support for building feature sets for classification. We show that the system is capable of outperforming the common alternative, based on "flattening" complex data structures, and runs efficiently when alternative approaches are unable to perform at all.
Affiliation(s)
- Jaclyn Smith
  - University of Oxford, Computer Science, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
- Yao Shi
  - University of Oxford, Computer Science, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
- Michael Benedikt
  - University of Oxford, Computer Science, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
- Milos Nikolic
  - University of Edinburgh, School of Informatics, Informatics Forum, 10 Crichton St, Newington, Edinburgh EH8 9AB, Scotland

30
Lu X, Wang X, Ding L, Li J, Gao Y, He K. frDriver: A Functional Region Driver Identification for Protein Sequence. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1773-1783. [PMID: 32870797 DOI: 10.1109/tcbb.2020.3020096] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
Identifying cancer drivers is a crucial challenge in explaining the underlying mechanisms of cancer development. Many methods identify cancer drivers based on a single mutation site or the entire gene, but they ignore a large number of medium-sized functional elements. It is hypothesized that mutations occurring in different regions of the protein sequence have different effects on the progression of cancer. Here, we develop a novel functional region driver (frDriver) identification method based on Bayesian probability and multiple linear regression models to identify protein regions that can regulate gene expression levels and have high functional impact potential. Combining gene expression data and somatic mutation data, with functional impact scores (SIFT, PROVEAN) as a priori knowledge, we identified the cancer driver regions that are most accurate in predicting gene expression levels. We evaluated the performance of frDriver on the BRCA and GBM datasets from TCGA. The results showed that frDriver identified known cancer drivers and outperformed three other state-of-the-art methods (eDriver, ActiveDriver and OncodriveCLUST). In addition, we performed KEGG pathway and GO term enrichment analysis, and the results indicated that the cancer drivers predicted by frDriver were related to processes such as cancer formation and gene regulation.

31
Lin C, Liu X, Zheng B, Ke R, Tzeng CM. Liquid Biopsy, ctDNA Diagnosis through NGS. Life (Basel) 2021; 11:life11090890. [PMID: 34575039 PMCID: PMC8468354 DOI: 10.3390/life11090890] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 07/16/2021] [Revised: 08/08/2021] [Accepted: 08/11/2021] [Indexed: 12/15/2022]
Abstract
Liquid biopsy with circulating tumor DNA (ctDNA) profiling by next-generation sequencing holds great promise to revolutionize clinical oncology. It rests on the premise that ctDNA represents the real-time status of the tumor genome, which contains information on genetic alterations. Compared to tissue biopsy, liquid biopsy possesses great advantages such as a less demanding procedure, minimal invasiveness, ease of frequent sampling, and less sampling bias. Next-generation sequencing (NGS) methods have reached a point where both cost and performance are suitable for clinical diagnosis. Thus, profiling ctDNA by NGS technologies is becoming increasingly popular, since it can be applied throughout the whole process of cancer diagnosis and management. Further developments of liquid biopsy ctDNA testing will be beneficial for cancer patients, paving the way for precision medicine. In conclusion, profiling ctDNA with NGS for cancer diagnosis is both biologically sound and technically convenient.
Affiliation(s)
- Chen Lin
  - School of Medicine, Huaqiao University, Quanzhou 362021, China
- Xuzhu Liu
  - School of Medicine, Huaqiao University, Quanzhou 362021, China
- Bingyi Zheng
  - Translational Medicine Research Center, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361102, China
  - Xiamen Key Laboratory of Cancer Cell Theranostics and Clinical Translation, Xiamen 361102, China
- Rongqin Ke
  - School of Medicine, Huaqiao University, Quanzhou 362021, China
  - Corresponding author
- Chi-Meng Tzeng
  - Translational Medicine Research Center, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361102, China
  - Xiamen Key Laboratory of Cancer Cell Theranostics and Clinical Translation, Xiamen 361102, China
  - Corresponding author

32
Yang C, Guo Y, Qian R, Huang Y, Zhang L, Wang J, Huang X, Liu Z, Qin W, Wang C, Chen H, Ma X, Zhang D. Mapping the landscape of synthetic lethal interactions in liver cancer. Theranostics 2021; 11:9038-9053. [PMID: 34522226 PMCID: PMC8419043 DOI: 10.7150/thno.63416] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/01/2021] [Accepted: 08/14/2021] [Indexed: 12/11/2022]
Abstract
Almost all the current therapies against liver cancer are based on the "one size fits all" principle and offer only limited survival benefit. Fortunately, synthetic lethality (SL) may provide an alternate route towards individualized therapy in liver cancer. The concept that simultaneous losses of two genes are lethal to a cell while a single loss is non-lethal can be utilized to selectively eliminate tumors with genetic aberrations. Methods: To infer liver cancer-specific SL interactions, we propose a computational pipeline termed SiLi (statistical inference-based synthetic lethality identification) that incorporates five inference procedures. Based on large-scale sequencing datasets, SiLi analysis was performed to identify SL interactions in liver cancer. Results: By SiLi analysis, a total of 272 SL pairs were discerned, which included 209 unique target candidates. Among these, polo-like kinase 1 (PLK1) was considered to have considerable therapeutic potential. Further computational and experimental validation of the SL pair TP53-PLK1 demonstrated that inhibition of PLK1 could be a novel therapeutic strategy specifically targeting those patients with TP53-mutant liver tumors. Conclusions: In this study, we report a comprehensive analysis of synthetic lethal interactions of liver cancer. Our findings may open new possibilities for patient-tailored therapeutic interventions in liver cancer.
Affiliation(s)
- Chen Yang
  - Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China
  - State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yuchen Guo
  - State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Ruolan Qian
  - State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yiwen Huang
  - Department of Clinical Medicine, Fujian Medical University, Fuzhou, China
- Linmeng Zhang
  - State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jun Wang
  - State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xiaowen Huang
  - Division of Gastroenterology and Hepatology, Key Laboratory of Gastroenterology and Hepatology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Zhicheng Liu
  - Hepatic Surgery Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Wenxin Qin
  - State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Cun Wang
  - State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Huimin Chen
  - Division of Gastroenterology and Hepatology, Key Laboratory of Gastroenterology and Hepatology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Xuhui Ma
  - State Key Laboratory of Oncogenes and Related Genes, Shanghai Cancer Institute, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Dayong Zhang
  - Department of Clinical Medicine, School of Medicine, Zhejiang University City College, Hangzhou, China

33
Song J, Peng W, Wang F. Identifying cancer patient subgroups by finding co-modules from the driver mutation profiles and downstream gene expression profiles. IEEE/ACM Trans Comput Biol Bioinform 2021; PP:2863-2872. [PMID: 34415837 DOI: 10.1109/tcbb.2021.3106344] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Identifying cancer subtypes sheds new light on effective personalized cancer medicine, future therapeutic strategies and minimizing treatment-related costs. Recently, many clustering methods have been proposed for categorizing cancer patients. However, these methods still fail to fully use prior biological knowledge in the model design process to improve precision and efficiency. It is acknowledged that a driver gene regulates its downstream genes in the network to perform a certain function. By analyzing known clinical cancer subtype data, we found some special co-pathways between the driver genes and the downstream genes in cancer patients of the same subgroup. Hence, we proposed a novel model named DDCMNMF (Driver and Downstream gene Co-Module Assisted Multiple Non-negative Matrix Factorization model) that stratifies cancer subtypes by first identifying co-modules of driver genes and downstream genes. We applied our model to lung and breast cancer datasets and compared it with four other state-of-the-art models. The final results show that our model could identify cancer subtypes with high compactness and separateness and achieve a high degree of consistency with the known cancer subtypes. The survival time analysis further supports the clinical significance of the cancer subgroups identified by our model.

34
Zhang T, Zhang SW, Li Y. Identifying Driver Genes for Individual Patients through Inductive Matrix Completion. Bioinformatics 2021; 37:4477-4484. [PMID: 34175939 DOI: 10.1093/bioinformatics/btab477] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/31/2020] [Revised: 04/30/2021] [Accepted: 06/25/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Driver genes play a key role in the evolutionary process of cancer. Effectively identifying these driver genes is crucial to cancer diagnosis and treatment. However, due to the high heterogeneity of cancers, it remains challenging to identify the driver genes for individual patients. Although some computational methods have been proposed to tackle this problem, they seldom consider that genes functionally similar to well-established driver genes are likely to play similar roles in the cancer process, a fact that can aid driver gene identification. Thus, we developed a novel approach, IMCDriver, to improve driver gene identification both for cohorts and for individual patients. RESULTS IMCDriver first takes the well-established driver genes as prior information and uses multi-omics data (e.g., somatic mutation, gene expression and protein-protein interaction) to compute the similarity between patients/genes. Then, IMCDriver prioritizes the personalized mutated genes according to their functional similarity to the well-established driver genes via Inductive Matrix Completion. Finally, IMCDriver identifies the top-ranked genes as the personalized driver genes. The results on five cancer datasets from TCGA show that IMCDriver outperforms other existing state-of-the-art methods in both cohort and patient-specific driver gene identification. IMCDriver also reveals some novel driver genes that potentially drive cancer development. In addition, IMCDriver can still identify driver genes that are rarely mutated in a population and rank them highly. AVAILABILITY Code available at https://github.com/NWPU-903PR/IMCDriver. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tong Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
  - School of Electrical and Mechanical Engineering, Pingdingshan University, Pingdingshan, China
- Shao-Wu Zhang
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China
- Yan Li
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, China

35
Ülgen E, Sezerman OU. driveR: a novel method for prioritizing cancer driver genes using somatic genomics data. BMC Bioinformatics 2021; 22:263. [PMID: 34030627 PMCID: PMC8142487 DOI: 10.1186/s12859-021-04203-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/28/2020] [Accepted: 05/17/2021] [Indexed: 12/15/2022]
Abstract
Background Cancer develops due to “driver” alterations. Numerous approaches exist for predicting cancer drivers from cohort-scale genomics data. However, methods for personalized analysis of driver genes are underdeveloped. In this study, we developed a novel personalized/batch analysis approach for driver gene prioritization utilizing somatic genomics data, called driveR. Results Combining genomics information and prior biological knowledge, driveR accurately prioritizes cancer driver genes via a multi-task learning model. Testing on 28 different datasets, this study demonstrates that driveR performs adequately, achieving a median AUC of 0.684 (range 0.651–0.861) on the 28 batch analysis test datasets, and a median AUC of 0.773 (range 0–1) on the 5157 personalized analysis test samples. Moreover, it outperforms existing approaches, achieving a significantly higher median AUC than all of MutSigCV (Wilcoxon rank-sum test p < 0.001), DriverNet (p < 0.001), OncodriveFML (p < 0.001) and MutPanning (p < 0.001) on batch analysis test datasets, and a significantly higher median AUC than DawnRank (p < 0.001) and PRODIGY (p < 0.001) on personalized analysis datasets. Conclusions This study demonstrates that the proposed method is an accurate and easy-to-utilize approach for prioritizing driver genes in cancer genomes in personalized or batch analyses. driveR is available on CRAN: https://cran.r-project.org/package=driveR. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04203-7.
Affiliation(s)
- Ege Ülgen
  - Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey
- O Uğur Sezerman
  - Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem Mehmet Ali Aydinlar University, Istanbul, Turkey

36
Rogozin IB, Roche-Lima A, Tyryshkin K, Carrasquillo-Carrión K, Lada AG, Poliakov LY, Schwartz E, Saura A, Yurchenko V, Cooper DN, Panchenko AR, Pavlov YI. DNA Methylation, Deamination, and Translesion Synthesis Combine to Generate Footprint Mutations in Cancer Driver Genes in B-Cell Derived Lymphomas and Other Cancers. Front Genet 2021; 12:671866. [PMID: 34093666 PMCID: PMC8170131 DOI: 10.3389/fgene.2021.671866] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/24/2021] [Accepted: 04/21/2021] [Indexed: 11/13/2022]
Abstract
Cancer genomes harbor numerous genomic alterations and many cancers accumulate thousands of nucleotide sequence variations. A prominent fraction of these mutations arises as a consequence of the off-target activity of DNA/RNA editing cytosine deaminases followed by the replication/repair of edited sites by DNA polymerases (pol), as deduced from the analysis of the DNA sequence context of mutations in different tumor tissues. We have used the weight matrix (sequence profile) approach to analyze mutagenesis due to Activation Induced Deaminase (AID) and two error-prone DNA polymerases. Control experiments using shuffled weight matrices and somatic mutations in immunoglobulin genes confirmed the power of this method. Analysis of somatic mutations in various cancers suggested that AID and DNA polymerases η and θ contribute to mutagenesis in contexts that almost universally correlate with the context of mutations in A:T and G:C sites during the affinity maturation of immunoglobulin genes. Previously, we demonstrated that AID contributes to mutagenesis in (de)methylated genomic DNA in various cancers. Our current analysis of methylation data from malignant lymphomas suggests that driver genes are subject to different (de)methylation processes than non-driver genes and, in addition to AID, the activity of pols η and θ contributes to the establishment of methylation-dependent mutation profiles. This may reflect the functional importance of interplay between mutagenesis in cancer and (de)methylation processes in different groups of genes. The resulting changes in CpG methylation levels and chromatin modifications are likely to cause changes in the expression levels of driver genes that may affect cancer initiation and/or progression.
Affiliation(s)
- Igor B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States
| | - Abiel Roche-Lima
- Center for Collaborative Research in Health Disparities - RCMI Program, University of Puerto Rico, San Juan, Puerto Rico
| | - Kathrin Tyryshkin
- Department of Pathology and Molecular Medicine, School of Medicine, Queen's University, Kingston, ON, Canada
| | | | - Artem G Lada
- Department Microbiology and Molecular Genetics, University of California, Davis, Davis, CA, United States
| | - Lennard Y Poliakov
- Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava, Czechia
| | - Elena Schwartz
- Coordinating Center for Clinical Trials, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
| | - Andreu Saura
- Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava, Czechia
| | - Vyacheslav Yurchenko
- Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava, Czechia.,Martsinovsky Institute of Medical Parasitology, Tropical and Vector Borne Diseases, Sechenov First Moscow State Medical University, Moscow, Russia
| | - David N Cooper
- Institute of Medical Genetics, Cardiff University, Cardiff, United Kingdom
| | - Anna R Panchenko
- Department of Pathology and Molecular Medicine, School of Medicine, Queen's University, Kingston, ON, Canada
| | - Youri I Pavlov
- Eppley Institute for Research in Cancer and Allied Diseases, Omaha, NE, United States.,Department of Microbiology and Pathology, Biochemistry and Molecular Biology, Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States.,Department of Genetics and Biotechnology, Saint-Petersburg State University, Saint-Petersburg, Russia
| |
|
37
|
Zhao W, Yang J, Wu J, Cai G, Zhang Y, Haltom J, Su W, Dong MJ, Chen S, Wu J, Zhou Z, Gu X. CanDriS: posterior profiling of cancer-driving sites based on two-component evolutionary model. Brief Bioinform 2021; 22:6238585. [PMID: 33876217 DOI: 10.1093/bib/bbab131] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/21/2021] [Revised: 03/17/2021] [Accepted: 03/18/2021] [Indexed: 12/12/2022]
Abstract
Current cancer genomics databases have accumulated millions of somatic mutations that remain to be further explored. Because most of these mutations are unrelated to cancer, the great challenge is to identify the somatic mutations that drive cancer. Under the notion that carcinogenesis is a form of somatic-cell evolution, we developed a two-component mixture model: the ground component corresponds to passenger mutations, and the rapidly evolving component corresponds to driver mutations. We then implemented an empirical Bayesian procedure to calculate the posterior probability of a site being cancer-driving. On this basis, we developed the software CanDriS (Cancer Driver Sites) to profile potential cancer-driving sites for thousands of tumor samples from the Cancer Genome Atlas and the International Cancer Genome Consortium, both across tumor types and at the pan-cancer level. As a result, we found that approximately 1% of the sites have posterior probabilities larger than 0.90, and we listed potential cancer-wide and cancer-specific driver mutations. By comprehensively profiling all potential cancer-driving sites, CanDriS greatly enhances our ability to refine our knowledge of the genetic basis of cancer and might guide clinical medication in the upcoming era of precision medicine. The results are displayed in the database CandrisDB (http://biopharm.zju.edu.cn/candrisdb/).
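The posterior calculation behind a two-component mixture can be sketched as follows. This is an illustration under stated assumptions, not the CanDriS model: it assumes Poisson-distributed mutation counts per site, and the mixing weight and the two rates are hypothetical values fixed by hand, whereas an empirical Bayes procedure would estimate them from the data.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing k events at rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def posterior_driver(k, pi_driver, lam_passenger, lam_driver):
    """Posterior probability, by Bayes' rule, that a site with k observed
    mutations belongs to the rapidly evolving (driver) component."""
    p_drv = pi_driver * poisson_pmf(k, lam_driver)
    p_pas = (1.0 - pi_driver) * poisson_pmf(k, lam_passenger)
    return p_drv / (p_drv + p_pas)

# Hypothetical parameters (assumptions for illustration only).
PARAMS = dict(pi_driver=0.01, lam_passenger=0.5, lam_driver=8.0)

for k in (0, 2, 5, 10):
    print(k, round(posterior_driver(k, **PARAMS), 4))
```

Sites whose posterior exceeds a chosen cutoff (the abstract reports results at 0.90) would then be listed as candidate driver sites.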
Affiliation(s)
- Wenyi Zhao
- College of Pharmaceutical Sciences & College of Computer Science and Technology, Zhejiang University, China
| | - Jingwen Yang
- MOE Key Laboratory of Contemporary Anthropology, Human Phenome Institute, School of Life Sciences, Fudan University, China
| | - Jingcheng Wu
- College of Pharmaceutical Sciences, Zhejiang University, China
| | - Guoxing Cai
- College of Pharmaceutical Sciences, Zhejiang University, China
| | - Yao Zhang
- College of Pharmaceutical Sciences, Zhejiang University, China
| | - Jeffrey Haltom
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa 50011, USA
| | - Weijia Su
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa 50011, USA
| | - Michael J Dong
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa 50011, USA
| | - Shuqing Chen
- College of Pharmaceutical Sciences, Zhejiang University, China
| | - Jian Wu
- College of Computer Science and Technology & School of Medicine, Zhejiang University, China
| | - Zhan Zhou
- College of Pharmaceutical Sciences, Innovation Institute for Artificial Intelligence in Medicine, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Zhejiang University, China
| | - Xun Gu
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa 50011, USA
| |
|
38
|
Zhou K, Wang Y, Bretonnel Cohen K, Kim JD, Ma X, Shen Z, Meng X, Xia J. Bridging heterogeneous mutation data to enhance disease gene discovery. Brief Bioinform 2021; 22:6224263. [PMID: 33847357 DOI: 10.1093/bib/bbab079] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/04/2020] [Revised: 02/01/2021] [Accepted: 02/19/2021] [Indexed: 12/18/2022]
Abstract
Bridging heterogeneous mutation data fills the gap between data categories and propels the discovery of disease-related genes. Genome-wide association studies (GWAS) infer significant mutation associations that link genotype and phenotype. However, because GWAS differ in size and quality, not all genuinely important variants pass multiple-testing correction. Meanwhile, mutation events widely reported in the literature reveal typical functional biological processes, including mutation types such as gain of function and loss of function. To bring together the heterogeneous mutation data, we propose a 'Gene-Disease Association prediction by Mutation Data Bridging (GDAMDB)' pipeline with a statistical generative model. The model learns the distribution parameters of mutation associations and mutation types and recovers false-negative GWAS mutations that fail the significance test but are supported in the literature as evidence of functional biological processes. We applied GDAMDB to Alzheimer's disease (AD) and predicted 79 AD-associated genes. Of these, 12 come from the original GWAS, 60 are supported as AD-related by other GWAS or literature reports, and the rest are newly predicted genes. Our model can enhance GWAS-based gene association discovery by combining it with text mining results. The positive result indicates that bridging heterogeneous mutation data contributes to the discovery of novel disease-related genes.
Affiliation(s)
- Kaiyin Zhou
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, P.R. China
| | - Yuxing Wang
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, P.R. China
| | - Kevin Bretonnel Cohen
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, P.R. China
| | - Jin-Dong Kim
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, P.R. China
| | - Xiaohang Ma
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, P.R. China
| | - Zhixue Shen
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, P.R. China
| | - Xiangyu Meng
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, P.R. China
| | - Jingbo Xia
- Hubei Key Lab of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, Hubei Province, P.R. China
| |
|
39
|
Li R, Lin CY, Guo WF, Akutsu T. Weighted minimum feedback vertex sets and implementation in human cancer genes detection. BMC Bioinformatics 2021; 22:143. [PMID: 33752597 PMCID: PMC7986389 DOI: 10.1186/s12859-021-04062-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2020] [Accepted: 03/03/2021] [Indexed: 11/22/2022]
Abstract
Background: Recently, many computational methods have been proposed to predict cancer genes. One typical kind of method is to find the differentially expressed genes between tumour and normal samples. However, there are also some genes, for example, ‘dark’ genes, that play important roles at the network level but are difficult to find by traditional differential gene expression analysis. In addition, network controllability methods, such as the minimum feedback vertex set (MFVS) method, have been used frequently in cancer gene prediction. However, the weights of vertices (or genes) are ignored in the traditional MFVS methods, leading to difficulty in finding the optimal solution because of the existence of many possible MFVSs. Results: Here, we introduce a novel method, called weighted MFVS (WMFVS), which integrates the gene differential expression value with MFVS to select the maximum-weighted MFVS from all possible MFVSs in a protein interaction network. Our experimental results show that WMFVS achieves better performance than using traditional bio-data or network-data analyses alone. Conclusion: This method balances the advantages of differential gene expression analyses and network analyses, improves the low accuracy of differential gene expression analyses and decreases the instability of pure network analyses. Furthermore, WMFVS can be easily applied to various kinds of networks, providing a useful framework for data analysis and prediction. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04062-2.
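The idea of choosing the maximum-weighted set among all minimum feedback vertex sets can be shown on a toy directed graph. The sketch below is an exhaustive search written for illustration; it is not the solver used in the paper (which the abstract does not describe), and the graph, the node names, and the weights standing in for differential expression values are all hypothetical.

```python
from itertools import combinations

def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph induced on `nodes` has no cycle."""
    nodes = set(nodes)
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            succ[u].append(v)
            indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        n = queue.pop()
        seen += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return seen == len(nodes)

def weighted_mfvs(nodes, edges, weight):
    """Try removal sets of increasing size; at the first size that breaks
    every cycle, return the candidate with the largest total weight.
    Exponential time, so suitable for toy graphs only."""
    nodes = sorted(nodes)
    for size in range(len(nodes) + 1):
        candidates = [
            set(c) for c in combinations(nodes, size)
            if is_acyclic(set(nodes) - set(c), edges)
        ]
        if candidates:
            return max(candidates, key=lambda s: sum(weight[n] for n in s))

# Hypothetical toy network: one 3-cycle plus a downstream node.
NODES = ["A", "B", "C", "D"]
EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
WEIGHT = {"A": 0.2, "B": 1.5, "C": 0.7, "D": 3.0}  # e.g. |log fold change|

print(weighted_mfvs(NODES, EDGES, WEIGHT))
```

Here any one of A, B, or C breaks the only cycle, so all three are minimum feedback vertex sets; the weights resolve the tie in favour of B, while D is never chosen despite its high weight because it lies on no cycle.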
Affiliation(s)
- Ruiming Li
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan
| | - Chun-Yu Lin
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan.,Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, 300, Hsinchu, Taiwan.,Center for Intelligent Drug Systems and Smart Bio-devices, National Yang Ming Chiao Tung University, 300, Hsinchu, Taiwan
| | - Wei-Feng Guo
- School of Electrical Engineering, Zhengzhou University, 450001, Zhengzhou, China
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan.
| |
|
40
|
Luo P, Chen B, Liao B, Wu F. Predicting disease‐associated genes: Computational methods, databases, and evaluations. WIREs Data Mining and Knowledge Discovery 2021; 11. [DOI: 10.1002/widm.1383] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/28/2019] [Accepted: 06/13/2020] [Indexed: 09/09/2024]
Abstract
Complex diseases are associated with a set of genes (called disease genes), the identification of which can help scientists uncover the mechanisms of diseases and develop new drugs and treatment strategies. Due to the huge cost and time of experimental identification techniques, many computational algorithms have been proposed to predict disease genes. Although several review publications in recent years have discussed many computational methods, some of them focus on cancer driver genes while others focus on biomolecular networks, which only cover a specific aspect of existing methods. In this review, we summarize existing methods and classify them into three categories based on their rationales. Then, the algorithms, biological data, and evaluation methods used in the computational prediction are discussed. Finally, we highlight the limitations of existing methods and point out some future directions for improving these algorithms. This review could help investigators understand the principles of existing methods, and thus develop new methods to advance the computational prediction of disease genes. This article is categorized under: Technologies > Machine Learning; Technologies > Prediction; Algorithmic Development > Biological Data Mining.
Affiliation(s)
- Ping Luo
- Division of Biomedical Engineering University of Saskatchewan Saskatoon Canada
- Princess Margaret Cancer Centre University Health Network Toronto Canada
| | - Bolin Chen
- School of Computer Science and Technology Northwestern Polytechnical University China
| | - Bo Liao
- School of Mathematics and Statistics Hainan Normal University Haikou China
| | - Fang‐Xiang Wu
- Department of Mechanical Engineering and Department of Computer Science University of Saskatchewan Saskatoon Canada
| |
|
41
|
Zhao Y, Yang B, Chen D, Zhou X, Wang M, Jiang J, Wei L, Chen Z. Combined identification of ARID1A, CSMD1, and SENP3 as effective prognostic biomarkers for hepatocellular carcinoma. Aging (Albany NY) 2021; 13:4696-4712. [PMID: 33558447 PMCID: PMC7906131 DOI: 10.18632/aging.202586] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/18/2020] [Accepted: 12/09/2020] [Indexed: 01/01/2023]
Abstract
Background: The current study aimed to understand the genetic landscape and investigate the diagnostic and prognostic biomarkers of primary hepatocellular carcinoma (HCC). Methods: A cohort of 36 Chinese HCC samples with hepatitis B virus (HBV) infection was examined by whole-exome sequencing (WES). Prognosis-related alterations were identified and further verified in the TCGA database and GSE65372 profiles in the GEO database. A Chinese replication cohort of 180 HCC samples with HBV infection was collected to evaluate the candidate genes by immunohistochemical analysis. A receiver operating characteristic (ROC) curve analysis evaluated the prognostic power of candidate genes. Finally, EdU and transwell invasion assays were performed to assess the function of candidate genes. Results: A total of 11 novel genes showed a significant association with HCC in the discovery cohort. The data were verified using the GEO and TCGA databases, and the expression of ARID1A, CSMD1, and SENP3 was evaluated in the replication cohort. Furthermore, ARID1A, CSMD1, and SENP3 are effective prognostic biomarkers for HCC patients in the replication population. Conclusions: Molecular heterogeneity was detected in HCC patients, and ARID1A, CSMD1, and SENP3 were identified as effective HCC prognosis biomarkers. CSMD1 prevents HCC by suppressing cell invasion.
Affiliation(s)
- Yuanyuan Zhao
- Institute of Organ Transplantation, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Ministry of Education, Wuhan 430030, China.,NHC Key Laboratory of Organ Transplantation, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Chinese Academy of Medical Sciences, Wuhan 430030, China
| | - Bo Yang
- Institute of Organ Transplantation, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Ministry of Education, Wuhan 430030, China.,NHC Key Laboratory of Organ Transplantation, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Chinese Academy of Medical Sciences, Wuhan 430030, China
| | - Dong Chen
- Institute of Organ Transplantation, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Ministry of Education, Wuhan 430030, China.,NHC Key Laboratory of Organ Transplantation, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Chinese Academy of Medical Sciences, Wuhan 430030, China
| | - Xiaojun Zhou
- Institute of Organ Transplantation, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Ministry of Education, Wuhan 430030, China.,NHC Key Laboratory of Organ Transplantation, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Chinese Academy of Medical Sciences, Wuhan 430030, China
| | - Meixi Wang
- Institute of Organ Transplantation, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Ministry of Education, Wuhan 430030, China.,NHC Key Laboratory of Organ Transplantation, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Chinese Academy of Medical Sciences, Wuhan 430030, China
| | - Jipin Jiang
- Institute of Organ Transplantation, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Ministry of Education, Wuhan 430030, China.,NHC Key Laboratory of Organ Transplantation, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Chinese Academy of Medical Sciences, Wuhan 430030, China
| | - Lai Wei
- Institute of Organ Transplantation, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Ministry of Education, Wuhan 430030, China.,NHC Key Laboratory of Organ Transplantation, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Chinese Academy of Medical Sciences, Wuhan 430030, China
| | - Zhishui Chen
- Institute of Organ Transplantation, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Ministry of Education, Wuhan 430030, China.,NHC Key Laboratory of Organ Transplantation, Wuhan 430030, China.,Key Laboratory of Organ Transplantation, Chinese Academy of Medical Sciences, Wuhan 430030, China
| |
|
42
|
Hou J, Ye X, Wang Y, Li C. Stratification of Estrogen Receptor-Negative Breast Cancer Patients by Integrating the Somatic Mutations and Transcriptomic Data. Front Genet 2021; 12:610087. [PMID: 33613637 PMCID: PMC7886807 DOI: 10.3389/fgene.2021.610087] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/25/2020] [Accepted: 01/04/2021] [Indexed: 01/26/2023]
Abstract
Patients with estrogen receptor-negative breast cancer generally have a worse prognosis than estrogen receptor-positive patients. Nevertheless, a significant proportion of the estrogen receptor-negative cases have favorable outcomes. Identifying patients with a good prognosis, however, remains difficult, as recent studies are quite limited. Molecular biomarkers are needed to better stratify patients. Significantly mutated genes may potentially be used as biomarkers to identify the subtype and to predict outcomes. To identify biomarkers of estrogen receptor-negative breast cancer among the significantly mutated genes, we developed a workflow that screens significantly mutated genes associated with the estrogen receptor in breast cancer by means of gene coexpression modules. The similarity matrix was calculated with distance correlation to obtain gene modules through a weighted gene coexpression network analysis. The modules highly associated with the estrogen receptor, called important modules, were enriched for breast cancer-related pathways or diseases. To screen significantly mutated genes, a new gene list was obtained from the overlap of the important module genes and the significantly mutated genes. The genes on this list can be used as biomarkers to predict survival of estrogen receptor-negative breast cancer patients. Furthermore, we selected six hub significantly mutated genes from the gene list, which were also able to separate these patients. Our method provides a new and alternative way to integrate somatic gene mutations and expression data for patient stratification of estrogen receptor-negative breast cancers.
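Distance correlation, the similarity measure named in this abstract, can be computed from pairwise distances alone. The sketch below follows the standard sample definition (double-centred distance matrices) in plain Python; the toy expression vectors are hypothetical, and how the paper feeds the resulting matrix into the coexpression network analysis is not shown here.

```python
import math

def _centered_distances(x):
    """Pairwise |x_i - x_j| matrix, double-centred by row, column and grand means."""
    n = len(x)
    d = [[abs(a - b) for b in x] for a in x]
    row = [sum(r) / n for r in d]          # equals column means (d is symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation between two equal-length numeric vectors."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    cells = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in cells) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in cells) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in cells) / n ** 2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))

# Hypothetical expression profiles of three genes across five samples.
gene_a = [-2.0, -1.0, 0.0, 1.0, 2.0]
gene_b = [2 * v + 1 for v in gene_a]   # linear in gene_a
gene_c = [v * v for v in gene_a]       # nonlinear, Pearson correlation 0

print(distance_correlation(gene_a, gene_b))
print(distance_correlation(gene_a, gene_c))
```

Unlike Pearson correlation, the measure stays clearly above zero for the symmetric quadratic relationship between `gene_a` and `gene_c`, which is the usual motivation for using it to build a gene similarity matrix.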
Affiliation(s)
| | - Xiufen Ye
- College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China
| | | | | |
|
43
|
Yu J, Chang X. Topological Data Analysis: A New Method to Identify Genetic Alterations in Cancer. Asia Pac J Oncol Nurs 2021; 8:112-114. [PMID: 33688559 PMCID: PMC7934599 DOI: 10.4103/2347-5625.308301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2020] [Accepted: 11/30/2020] [Indexed: 12/05/2022]
Abstract
Cancer is the largest health problem worldwide. A number of targeted therapies are currently employed for the treatment of different cancers. Determining the molecular mechanisms that are necessary for cancer development and progression is the most critical step in targeted therapies. Many studies have identified a large number of frequently mutated cancer-associated genes using recurrence-based methods. However, these methods can identify only cancer-associated mutations with a mutation frequency >15%. In other words, they cannot be used to identify driver genes that have a low mutation frequency but play a major role in tumorigenesis and development. Thus, there is an urgent need for a method of identifying cancer-associated genes that is not based on recurrence. In a study recently published in Nature Communications, a research team led by Prof. Raúl Rabadán from Columbia University devised a novel topological data analysis approach to identify low-prevalence cancer-associated gene mutations using expression data from multiple cancers.
Affiliation(s)
- Jie Yu
- Foreign Languages College, Tianjin Normal University, Tianjin, China
| | - Xinzhong Chang
- Department of Breast Surgery, Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| |
|
44
|
Hu R, Xu H, Jia P, Zhao Z. KinaseMD: kinase mutations and drug response database. Nucleic Acids Res 2021; 49:D552-D561. [PMID: 33137204 PMCID: PMC7779064 DOI: 10.1093/nar/gkaa945] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 08/14/2020] [Revised: 10/05/2020] [Accepted: 10/07/2020] [Indexed: 12/11/2022]
Abstract
Mutations in kinases are abundant and critical to study signaling pathways and regulatory roles in human disease, especially in cancer. Somatic mutations in kinase genes can affect drug treatment, both sensitivity and resistance, to clinically used kinase inhibitors. Here, we present a newly constructed database, KinaseMD (kinase mutations and drug response), to structurally and functionally annotate kinase mutations. KinaseMD integrates 679 374 somatic mutations, 251 522 network-rewiring events, and 390 460 drug response records curated from various sources for 547 kinases. We uniquely annotate the mutations and kinase inhibitor response in four types of protein substructures (gatekeeper, A-loop, G-loop and αC-helix) that are linked to kinase inhibitor resistance in literature. In addition, we annotate functional mutations that may rewire kinase regulatory network and report four phosphorylation signals (gain, loss, up-regulation and down-regulation). Overall, KinaseMD provides the most updated information on mutations, unique annotations of drug response especially drug resistance and functional sites of kinases. KinaseMD is accessible at https://bioinfo.uth.edu/kmd/, having functions for searching, browsing and downloading data. To our knowledge, there has been no systematic annotation of these structural mutations linking to kinase inhibitor response. In summary, KinaseMD is a centralized database for kinase mutations and drug response.
Affiliation(s)
- Ruifeng Hu
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston TX 77030, USA
| | - Haodong Xu
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston TX 77030, USA
| | - Peilin Jia
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston TX 77030, USA
| | - Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston TX 77030, USA.,Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston TX 77030, USA.,MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston TX 77030, USA
| |
|
45
|
Hassan R, Allali I, Agamah FE, Elsheikh SSM, Thomford NE, Dandara C, Chimusa ER. Drug response in association with pharmacogenomics and pharmacomicrobiomics: towards a better personalized medicine. Brief Bioinform 2020; 22:6012864. [PMID: 33253350 DOI: 10.1093/bib/bbaa292] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/21/2020] [Revised: 09/19/2020] [Accepted: 10/03/2020] [Indexed: 12/15/2022]
Abstract
Researchers have long been presented with the challenge imposed by the role of genetic heterogeneity in drug response. For many years, pharmacogenomics and pharmacomicrobiomics have been investigating the influence of an individual's genetic background on drug response and disposition. More recently, the human gut microbiome has proven to play a crucial role in the way patients respond to different therapeutic drugs, and it has been shown that by understanding the composition of the human microbiome, we can improve drug efficacy and effectively identify drug targets. However, our knowledge of the effect of host genetics on specific gut microbes related to variation in drug-metabolizing enzymes remains limited, which in turn limits the application of joint host-microbiome genome-wide association studies. In this paper, we provide a historical overview of the complex interactions between the host, the human microbiome and drugs. While discussing applications, challenges and opportunities of these studies, we draw attention to the critical need for inclusion of diverse populations and for the development of an innovative, combined pharmacogenomics and pharmacomicrobiomics approach that may provide an important basis for personalized medicine.
Affiliation(s)
- Radia Hassan
- Division of Human Genetics, Department of Pathology, University of Cape Town
| | - Imane Allali
- Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Morocco
| | - Francis E Agamah
- Division of Human Genetics, Department of Pathology, University of Cape Town
| | | | - Nicholas E Thomford
- Lecturers at the Department of Medical Biochemistry School of Medical Sciences, University of Cape Coast, Ghana
| | - Collet Dandara
- Division of Human Genetics, Department of Pathology, University of Cape Town
| | - Emile R Chimusa
- Division of Human Genetics, Department of Pathology, University of Cape Town
| |
|
46
|
Ma H, Li G, Su Z. KSP: an integrated method for predicting catalyzing kinases of phosphorylation sites in proteins. BMC Genomics 2020; 21:537. [PMID: 32753030 PMCID: PMC7646512 DOI: 10.1186/s12864-020-06895-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/14/2019] [Accepted: 07/08/2020] [Indexed: 12/21/2022]
Abstract
BACKGROUND: Protein phosphorylation by kinases plays crucial roles in various biological processes including signal transduction and tumorigenesis; thus, a better understanding of protein phosphorylation events in cells is fundamental for studying protein functions and designing drugs to treat diseases caused by the malfunction of phosphorylation. Although a large number of phosphorylation sites in proteins have been identified using high-throughput phosphoproteomic technologies, their specific catalyzing kinases remain largely unknown. Therefore, computational methods are urgently needed to predict the kinases that catalyze the phosphorylation of these sites. RESULTS: We developed KSP, a new algorithm for predicting catalyzing kinases for experimentally identified phosphorylation sites in human proteins. KSP constructs a network based on known protein-protein interactions and kinase-substrate relationships. Based on the network, it computes an affinity score between a phosphorylation site and kinases, and returns the top-scoring kinases as candidate catalyzing kinases. When tested on known kinase-substrate pairs, KSP outperforms existing methods including NetworKIN, iGPS, and PKIS. CONCLUSIONS: We developed a novel, accurate tool for predicting catalyzing kinases of known phosphorylation sites. It can work as a complementary network approach for sequence-based phosphorylation site predictors.
Affiliation(s)
- Hongli Ma
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China.,School of Mathematics, Shandong University, Jinan, 250100, China
| | - Guojun Li
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao, 266237, China.,School of Mathematics, Shandong University, Jinan, 250100, China
| | - Zhengchang Su
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
| |
|
47
|
Identifying and ranking potential cancer drivers using representation learning on attributed network. Methods 2020; 192:13-24. [PMID: 32758683 DOI: 10.1016/j.ymeth.2020.07.013] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/04/2020] [Revised: 07/16/2020] [Accepted: 07/29/2020] [Indexed: 12/14/2022]
Abstract
Cancer can arise as a consequence of the accumulation of genomic alterations. Only a small part of driver mutations contributes to cancer development and progression. Hence, the identification of genes and alterations that serve as drivers for cancer development plays a critical role in drug design, cancer diagnoses and treatment. In this study, we propose a novel method to identify potential cancer drivers by using a Representation Learning method on Attributed Graphs (called RLAG). It is a first attempt to use both network structure and node attributes to learn feature representation for the genes in the network. Then it leverages these feature vectors to divide the genes into several subgroups. Finally, potential cancer driver genes are prioritized according to ranking scores that measure both genes' properties and their importance in the subgroups. We apply our method to predict driver genes for lung cancer, breast cancer and prostate cancer. The results show that our method outperforms the other three state-of-the-art methods in terms of Precision, Recall and F1-score values.
|
48
|
Rabadán R, Mohamedi Y, Rubin U, Chu T, Alghalith AN, Elliott O, Arnés L, Cal S, Obaya ÁJ, Levine AJ, Cámara PG. Identification of relevant genetic alterations in cancer using topological data analysis. Nat Commun 2020; 11:3808. [PMID: 32732999 PMCID: PMC7393176 DOI: 10.1038/s41467-020-17659-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/30/2018] [Accepted: 07/09/2020] [Indexed: 01/05/2023] Open
Abstract
Large-scale cancer genomic studies enable the systematic identification of mutations that lead to the genesis and progression of tumors, uncovering the underlying molecular mechanisms and potential therapies. While some such mutations are recurrently found in many tumors, many others exist solely within a few samples, precluding detection by conventional recurrence-based statistical approaches. Integrated analysis of somatic mutations and RNA expression data across 12 tumor types reveals that mutations of cancer genes are usually accompanied by substantial changes in expression. We use topological data analysis to leverage this observation and uncover 38 elusive candidate cancer-associated genes, including inactivating mutations of the metalloproteinase ADAMTS12 in lung adenocarcinoma. We show that ADAMTS12-/- mice have a five-fold increase in the susceptibility to develop lung tumors, confirming the role of ADAMTS12 as a tumor suppressor gene. Our results demonstrate that data integration through topological techniques can increase our ability to identify previously unreported cancer-related alterations.
Affiliation(s)
- Raúl Rabadán
  - Departments of Systems Biology and Biomedical Informatics, Columbia University, 1130 St. Nicholas Ave., New York, NY, 10032, USA
- Yamina Mohamedi
  - Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Asturias, Spain
  - IUOPA, Instituto Universitario de Oncologia, Oviedo, Asturias, Spain
- Udi Rubin
  - Departments of Systems Biology and Biomedical Informatics, Columbia University, 1130 St. Nicholas Ave., New York, NY, 10032, USA
  - Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, 10065, USA
- Tim Chu
  - Departments of Systems Biology and Biomedical Informatics, Columbia University, 1130 St. Nicholas Ave., New York, NY, 10032, USA
- Adam N Alghalith
  - Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA, 19104, USA
- Oliver Elliott
  - Departments of Systems Biology and Biomedical Informatics, Columbia University, 1130 St. Nicholas Ave., New York, NY, 10032, USA
- Luis Arnés
  - Departments of Systems Biology and Biomedical Informatics, Columbia University, 1130 St. Nicholas Ave., New York, NY, 10032, USA
- Santiago Cal
  - Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Asturias, Spain
  - IUOPA, Instituto Universitario de Oncologia, Oviedo, Asturias, Spain
- Álvaro J Obaya
  - IUOPA, Instituto Universitario de Oncologia, Oviedo, Asturias, Spain
  - Departamento de Biologia Funcional, Universidad de Oviedo, Oviedo, Asturias, Spain
- Arnold J Levine
  - The Simons Center for Systems Biology, Institute for Advanced Study, Princeton, NJ, 08540, USA
- Pablo G Cámara
  - Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA, 19104, USA
|
49
|
Cutigi JF, Evangelista AF, Simao A. Approaches for the identification of driver mutations in cancer: A tutorial from a computational perspective. J Bioinform Comput Biol 2020; 18:2050016. [PMID: 32698724 DOI: 10.1142/s021972002050016x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022]
Abstract
Cancer is a complex disease caused by the accumulation of genetic alterations during the individual's life. Such alterations are called genetic mutations and can be divided into two groups: (1) Passenger mutations, which are not responsible for cancer and (2) Driver mutations, which are significant for cancer and responsible for its initiation and progression. Cancer cells undergo a large number of mutations, of which most are passengers, and few are drivers. The identification of driver mutations is a key point and one of the biggest challenges in Cancer Genomics. Many computational methods for such a purpose have been developed in Cancer Bioinformatics. Such computational methods are complex and are usually described in a high level of abstraction. This tutorial details some classical computational methods, from a computational perspective, with the transcription in an algorithmic format towards an easy access by researchers.
Affiliation(s)
- Jorge Francisco Cutigi
  - Federal Institute of São Paulo (IFSP), São Carlos, SP, Brazil
  - University of São Paulo (USP), São Carlos, SP, Brazil
|
50
|
Kim P, Li H, Wang J, Zhao Z. Landscape of drug-resistance mutations in kinase regulatory hotspots. Brief Bioinform 2020; 22:5854404. [PMID: 32510566 DOI: 10.1093/bib/bbaa108] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/23/2020] [Revised: 04/23/2020] [Accepted: 05/05/2020] [Indexed: 12/13/2022] Open
Abstract
More than 48 kinase inhibitors (KIs) have been approved by Food and Drug Administration. However, drug-resistance (DR) eventually occurs, and secondary mutations have been found in the previously targeted primary-mutated cancer cells. Cancer and drug research communities recognize the importance of the kinase domain (KD) mutations for kinasopathies. So far, a systematic investigation of kinase mutations on DR hotspots has not been done yet. In this study, we systematically investigated four types of representative mutation hotspots (gatekeeper, G-loop, αC-helix and A-loop) associated with DR in 538 human protein kinases using large-scale cancer data sets (TCGA, ICGC, COSMIC and GDSC). Our results revealed 358 kinases harboring 3318 mutations that covered 702 drug resistance hotspot residues. Among them, 197 kinases had multiple genetic variants on each residue. We further computationally assessed and validated the epidermal growth factor receptor mutations on protein structure and drug-binding efficacy. This is the first study to provide a landscape view of DR-associated mutation hotspots in kinase's secondary structures, and its knowledge will help the development of effective next-generation KIs for better precision medicine.
|