1
Li G, Hu Z, Luo X, Liu J, Wu J, Peng W, Zhu X. Identification of cancer driver genes based on hierarchical weak consensus model. Health Inf Sci Syst 2024; 12:21. [PMID: 38464463 PMCID: PMC10917728 DOI: 10.1007/s13755-024-00279-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Accepted: 01/31/2024] [Indexed: 03/12/2024] Open
Abstract
Cancer is a complex disease of gene mutation that arises from the accumulation of mutations during somatic cell evolution. With the advent of high-throughput technology, a large amount of omics data has been generated, and finding cancer-related driver genes in these data is a challenge. Early on, researchers developed many frequency-based methods for identifying driver genes, but these could not reliably identify driver genes with low mutation rates. Researchers later developed network-based methods that fuse multi-omics data, but these rarely considered the connections among features. In this paper, after analyzing a large number of methods for integrating multi-omics data, a hierarchical weak consensus model for fusing multiple features is proposed according to the connections among features. By analyzing the connection between the PPI network and the co-mutation hypergraph network, this paper first proposes a new topological feature, called the co-mutation clustering coefficient (CMCC). Then, a hierarchical weak consensus model is used to integrate CMCC with mRNA and miRNA differential expression scores, yielding a new driver gene identification method, HWC. The HWC method is compared with seven current state-of-the-art methods on three types of cancer. The comparison results show that HWC has the best identification performance in terms of statistical evaluation indices, functional consistency, and the partial area under the ROC curve. Supplementary Information: The online version contains supplementary material available at 10.1007/s13755-024-00279-6.
Affiliation(s)
- Gaoshi Li
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Zhipeng Hu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Xinlong Luo
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Jiafei Liu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Jingli Wu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
- Xiaoshu Zhu
  - Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
  - Guangxi Key Lab of Multi-Source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
  - College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
  - School of Computer and Information Security & School of Software Engineering, Guilin University of Electronic Science and Technology, Guilin, China
2
Hu Z, Li G, Luo X, Peng W, Liu J, Zhu X, Wu J. Identification of Cancer Driver Genes based on Dynamic Incentive Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:2371-2381. [PMID: 39316497 DOI: 10.1109/tcbb.2024.3467119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2024]
Abstract
Cancer is a complex genomic mutation disease, and identifying cancer driver genes promotes the development of targeted drugs and personalized therapies. Current computational methods give little consideration to the relationships among features and to the effect of noise in protein-protein interaction (PPI) data, resulting in a low recognition rate. In this paper, we propose DIM, a cancer driver gene identification method based on a dynamic incentive model. The method first constructs a hypergraph to reduce the impact of false-positive data in the PPI network. Then, the importance of genes in each hyperedge of the hypergraph is considered from three perspectives, and a network and functional score (NFS) is proposed. By analyzing the relations among features, the dynamic incentive model is proposed to fuse NFS, the differential expression score of mRNA, and the differential expression score of miRNA. DIM is compared with some classical methods on breast cancer, lung cancer, prostate cancer, and pan-cancer datasets. The results show that DIM has the best performance on statistical evaluation indicators, functional consistency, and the partial area under the ROC curve, and has good cross-cancer capability.
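The abstract names the partial area under the ROC curve as one of its evaluation measures. The sketch below only illustrates that metric in general terms; it is not the authors' evaluation code. The function names, the toy scores and labels, and the false-positive-rate cutoff of 0.5 are all assumptions made for the example, and the abstract does not state which cutoff was used. Tied scores are assumed not to occur.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) for a ranking; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i]:
            tp += 1  # a known driver gene ranked here
        else:
            fp += 1  # a non-driver gene ranked here
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points, max_fpr):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at the cutoff by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical gene scores (higher = more driver-like) and benchmark labels.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
example_labels = [1, 1, 0, 1, 0, 0]
example_pauc = partial_auc(roc_points(example_scores, example_labels), 0.5)
```

Restricting the area to low false-positive rates emphasizes how well the top of the ranking is ordered, which is usually what matters when only the highest-ranked candidate genes are followed up.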
3
Huang J, Mao L, Lei Q, Guo AY. Bioinformatics tools and resources for cancer and application. Chin Med J (Engl) 2024; 137:2052-2064. [PMID: 39075637 PMCID: PMC11374212 DOI: 10.1097/cm9.0000000000003254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2024] [Indexed: 07/31/2024] Open
Abstract
Tumor bioinformatics plays an important role in cancer research and precision medicine. The primary focus of traditional cancer research has been molecular and clinical studies of a number of fundamental pathways and genes. In recent years, driven by breakthroughs in high-throughput technologies, large-scale cancer omics data have accumulated rapidly. How to effectively utilize and share these data is particularly important. To address this crucial task, many computational tools and databases have been developed over the past few years. To help researchers quickly learn and understand the functions of these tools, in this review, we summarize publicly available bioinformatics tools and resources for pan-cancer multi-omics analysis, regulatory analysis of tumorigenesis, tumor treatment and prognosis, immune infiltration analysis, immune repertoire analysis, cancer driver gene and driver mutation analysis, and cancer single-cell analysis, which may further help researchers find more suitable tools for their research.
Affiliation(s)
- Jin Huang
  - Department of Thoracic Surgery, West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Lingzi Mao
  - Hubei Bioinformatics & Molecular Imaging Key Laboratory, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Qian Lei
  - Department of Thoracic Surgery, West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- An-Yuan Guo
  - Department of Thoracic Surgery, West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
4
Srivastava S, Jain P. Computational Approaches: A New Frontier in Cancer Research. Comb Chem High Throughput Screen 2024; 27:1861-1876. [PMID: 38031782 DOI: 10.2174/0113862073265604231106112203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 09/08/2023] [Accepted: 09/21/2023] [Indexed: 12/01/2023]
Abstract
Cancer is a broad category of disease that can start in virtually any organ or tissue of the body when aberrant cells proliferate uncontrollably and invade surrounding organs. According to recent statistics, cancer caused about 10 million deaths worldwide in 2020, roughly one in every six deaths. The typical approach used in anti-cancer research is highly time-consuming and expensive, and the outcomes are not particularly encouraging. Computational techniques have been employed in anti-cancer research to advance our understanding. In recent years, the rapid development of computational tools has had a significant impact on anti-cancer research, including novel drug discovery, drug design, genetic studies, genome characterization, cancer imaging and detection, radiotherapy, cancer metabolomics, and novel therapeutic approaches. In this paper, we examine the various subfields of contemporary computational techniques, including molecular docking, artificial intelligence, bioinformatics, virtual screening, and QSAR, and their applications in the study of cancer.
Affiliation(s)
- Shubham Srivastava
  - Department of Pharmacy, IIMT College of Pharmacy, Uttar Pradesh, 201310, India
- Pushpendra Jain
  - Department of Pharmacy, IIMT College of Pharmacy, Uttar Pradesh, 201310, India
5
Rocha J, Sastre J, Amengual-Cladera E, Hernandez-Rodriguez J, Asensio-Landa V, Heine-Suñer D, Capriotti E. Identification of Driver Epistatic Gene Pairs Combining Germline and Somatic Mutations in Cancer. Int J Mol Sci 2023; 24:ijms24119323. [PMID: 37298272 DOI: 10.3390/ijms24119323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 05/20/2023] [Accepted: 05/22/2023] [Indexed: 06/12/2023] Open
Abstract
Cancer arises from the complex interplay of various factors. Traditionally, the identification of driver genes focuses primarily on the analysis of somatic mutations. We describe a new method for the detection of driver gene pairs based on an epistasis analysis that considers both germline and somatic variations. Specifically, the identification of significantly mutated gene pairs entails the calculation of a contingency table, wherein one of the co-mutated genes can exhibit a germline variant. By adopting this approach, it is possible to select gene pairs in which the individual genes do not exhibit significant associations with cancer. Finally, a survival analysis is used to select clinically relevant gene pairs. To test the efficacy of the new algorithm, we analyzed the colon adenocarcinoma (COAD) and lung adenocarcinoma (LUAD) samples available at The Cancer Genome Atlas (TCGA). In the analysis of the COAD and LUAD samples, we identify epistatic gene pairs significantly mutated in tumor tissue with respect to normal tissue. We believe that further analysis of the gene pairs detected by our method will unveil new biological insights, enhancing a better description of the cancer mechanism.
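The contingency-table step described in the abstract can be illustrated with a small sketch. It builds a 2x2 table of co-occurrence for two genes across samples and computes a one-sided Fisher exact p-value from the hypergeometric distribution. The abstract does not say which statistical test the authors apply, so Fisher's exact test is an assumption here, as are the function names and the toy sample sets (gene A could stand for a germline variant and gene B for a somatic mutation).

```python
from math import comb


def cooccurrence_table(altered_a, altered_b, all_samples):
    """Return (both, only_a, only_b, neither) counts for two genes."""
    both = len(altered_a & altered_b)
    only_a = len(altered_a - altered_b)
    only_b = len(altered_b - altered_a)
    neither = len(all_samples) - both - only_a - only_b
    return both, only_a, only_b, neither


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Gives P(overlap >= a) under the null hypothesis that the two genes
    are altered independently (hypergeometric distribution).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical sample identifiers; not taken from TCGA.
samples = {"s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"}
gene_a = {"s1", "s2", "s3", "s4"}   # samples carrying a variant in gene A
gene_b = {"s1", "s2", "s3", "s5"}   # samples carrying a mutation in gene B
table = cooccurrence_table(gene_a, gene_b, samples)
p_value = fisher_right_tail(*table)
```

A small p-value would indicate that the two genes are altered together more often than expected by chance; in practice such tests would be run over many pairs and corrected for multiple testing.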
Affiliation(s)
- Jairo Rocha
  - Department of Mathematics and Computer Science, University of the Balearic Islands, 07122 Palma de Majorca, Spain
  - Genomics of Health Group, Health Research Institute of the Balearic Islands (IDISBA), 07120 Palma de Majorca, Spain
- Jaume Sastre
  - Department of Mathematics and Computer Science, University of the Balearic Islands, 07122 Palma de Majorca, Spain
- Emilia Amengual-Cladera
  - Genomics of Health Group, Health Research Institute of the Balearic Islands (IDISBA), 07120 Palma de Majorca, Spain
- Jessica Hernandez-Rodriguez
  - Genomics of Health Group, Health Research Institute of the Balearic Islands (IDISBA), 07120 Palma de Majorca, Spain
- Victor Asensio-Landa
  - Genomics of Health Group, Health Research Institute of the Balearic Islands (IDISBA), 07120 Palma de Majorca, Spain
- Damià Heine-Suñer
  - Genomics of Health Group, Health Research Institute of the Balearic Islands (IDISBA), 07120 Palma de Majorca, Spain
- Emidio Capriotti
  - BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, 40126 Bologna, Italy
6
Lu X, Wang X, Ding L, Li J, Gao Y, He K. frDriver: A Functional Region Driver Identification for Protein Sequence. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:1773-1783. [PMID: 32870797 DOI: 10.1109/tcbb.2020.3020096] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
Identifying cancer drivers is a crucial challenge in explaining the underlying mechanisms of cancer development. Many methods identify cancer drivers based on a single mutation site or the entire gene, but they ignore a large number of medium-sized functional elements. It is hypothesized that mutations occurring in different regions of the protein sequence have different effects on the progression of cancer. Here, we develop a novel functional region driver (frDriver) identification method based on Bayesian probability and multiple linear regression models to identify protein regions that can regulate gene expression levels and have high functional impact potential. Combining gene expression data and somatic mutation data, with functional impact scores (SIFT, PROVEAN) as a priori knowledge, we identified the cancer driver regions that are most accurate in predicting gene expression levels. We evaluated the performance of frDriver on the BRCA and GBM datasets from TCGA. The results showed that frDriver identified known cancer drivers and outperformed three other state-of-the-art methods (eDriver, ActiveDriver and OncodriveCLUST). In addition, we performed KEGG pathway and GO term enrichment analysis, and the results indicated that the cancer drivers predicted by frDriver were related to processes such as cancer formation and gene regulation.
7
Al Hajri Q, Dash S, Feng WC, Garner HR, Anandakrishnan R. Identifying multi-hit carcinogenic gene combinations: Scaling up a weighted set cover algorithm using compressed binary matrix representation on a GPU. Sci Rep 2020; 10:2022. [PMID: 32029803 PMCID: PMC7005272 DOI: 10.1038/s41598-020-58785-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/29/2019] [Accepted: 01/20/2020] [Indexed: 01/16/2023] Open
Abstract
Despite decades of research, effective treatments for most cancers remain elusive. One reason is that different instances of cancer result from different combinations of multiple genetic mutations (hits). Therefore, treatments that may be effective in some cases are not effective in others. We previously developed an algorithm for identifying combinations of carcinogenic genes with mutations (multi-hit combinations), which could suggest a likely cause for individual instances of cancer. Most cancers are estimated to require three or more hits. However, the computational complexity of the algorithm scales exponentially with the number of hits, making it impractical for identifying combinations of more than two hits. To identify combinations of greater than two hits, we used a compressed binary matrix representation, and optimized the algorithm for parallel execution on an NVIDIA V100 graphics processing unit (GPU). With these enhancements, the optimized GPU implementation was on average an estimated 12,144 times faster than the original integer matrix based CPU implementation, for the 3-hit algorithm, allowing us to identify 3-hit combinations. The 3-hit combinations identified using a training set were able to differentiate between tumor and normal samples in a separate test set with 90% overall sensitivity and 93% overall specificity. We illustrate how the distribution of mutations in tumor and normal samples in the multi-hit gene combinations can suggest potential driver mutations for further investigation. With experimental validation, these combinations may provide insight into the etiology of cancer and a rational basis for targeted combination therapy.
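The idea behind a compressed binary matrix can be sketched in a few lines. Each gene's row of 0/1 mutation flags (one per sample) is packed into a single integer, so that counting the samples in which every gene of a combination is mutated reduces to bitwise AND operations followed by a population count. This is only a CPU illustration of the general idea; the paper's actual GPU data layout and its weighted set cover scoring are not described in the abstract, and the function names and toy matrix are assumptions.

```python
def pack_rows(matrix):
    """Pack each gene's 0/1 row (one flag per sample) into one integer."""
    packed = []
    for row in matrix:
        bits = 0
        for sample_idx, flag in enumerate(row):
            if flag:
                bits |= 1 << sample_idx  # set the bit for this sample
        packed.append(bits)
    return packed


def samples_hit_by_all(packed, genes):
    """Count samples in which every gene in `genes` is mutated."""
    acc = packed[genes[0]]
    for g in genes[1:]:
        acc &= packed[g]  # keep only samples mutated in all genes so far
    return bin(acc).count("1")


# Hypothetical tumor matrix: 3 genes (rows) by 4 samples (columns).
tumor_matrix = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
tumor_packed = pack_rows(tumor_matrix)
three_hit_count = samples_hit_by_all(tumor_packed, [0, 1, 2])
```

Packing many samples into one machine word lets a single AND instruction compare all of them at once, which is the kind of saving that makes enumerating 3-hit combinations more tractable.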
Affiliation(s)
- Qais Al Hajri
  - Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, 24060, USA
- Sajal Dash
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, 24060, USA
- Wu-Chun Feng
  - Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, 24060, USA
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, 24060, USA
- Harold R Garner
  - Department of Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, 24060, USA
  - Gibbs Cancer Center and Research Institute, Spartanburg, SC, 29303, USA
- Ramu Anandakrishnan
  - Department of Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, 24060, USA.
  - Gibbs Cancer Center and Research Institute, Spartanburg, SC, 29303, USA.
8
Zia A, Rashid S. Systems Biology and Integrated Computational Methods for Cancer-Associated Mutation Analysis. Essentials of Cancer Genomic, Computational Approaches and Precision Medicine 2020:335-362. [DOI: 10.1007/978-981-15-1067-0_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
9
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022] Open
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches for prioritizing genes according to their relevance for a given disease have been developed. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. Among about a hundred published gene prioritization tools, we select and briefly describe the 14 most up-to-date and user-friendly ones. We also discuss the advantages and disadvantages of existing tools, the challenges of their validation, and directions for future research.
Affiliation(s)
- Olga Zolotareva
  - Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
- Maren Kleine
  - Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany
10
Mukherjee S, Perumal TM, Daily K, Sieberts SK, Omberg L, Preuss C, Carter GW, Mangravite LM, Logsdon BA. Identifying and ranking potential driver genes of Alzheimer's disease using multiview evidence aggregation. Bioinformatics 2019; 35:i568-i576. [PMID: 31510680 PMCID: PMC6612835 DOI: 10.1093/bioinformatics/btz365] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022] Open
Abstract
MOTIVATION: Late-onset Alzheimer's disease currently has no known effective treatment options. To better understand the disease, new multi-omic datasets have recently been generated with the goal of identifying molecular causes of disease. However, most analytic studies using these datasets focus on uni-modal analysis of the data. Here, we propose a data-driven approach that integrates multiple data types and analytic outcomes to aggregate evidence supporting the hypothesis that a gene is a genetic driver of the disease. The main algorithmic contributions of our article are: (i) a general machine learning framework to learn the key characteristics of a few known driver genes from multiple feature sets and identify other potential driver genes that have similar feature representations, and (ii) a flexible ranking scheme with the ability to integrate external validation in the form of Genome Wide Association Study summary statistics. While we currently focus on demonstrating the effectiveness of the approach using different analytic outcomes from RNA-Seq studies, this method is easily generalizable to other data modalities and analysis types. RESULTS: We demonstrate the utility of our machine learning algorithm on two benchmark multiview datasets by significantly outperforming the baseline approaches in predicting missing labels. We then use the algorithm to predict and rank potential drivers of Alzheimer's. We show that our ranked genes show a significant enrichment for single nucleotide polymorphisms associated with Alzheimer's and are enriched in pathways that have been previously associated with the disease. AVAILABILITY AND IMPLEMENTATION: Source code and links to all feature sets are available at https://github.com/Sage-Bionetworks/EvidenceAggregatedDriverRanking.
Affiliation(s)
- Christoph Preuss
  - The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, USA
- Gregory W Carter
  - The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME, USA
- Benjamin A Logsdon
  - Sage Bionetworks, Seattle, WA, USA. To whom correspondence should be addressed.
11
Song J, Peng W, Wang F. A random walk-based method to identify driver genes by integrating the subcellular localization and variation frequency into bipartite graph. BMC Bioinformatics 2019; 20:238. [PMID: 31088372 PMCID: PMC6518800 DOI: 10.1186/s12859-019-2847-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 04/14/2019] [Accepted: 04/24/2019] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: Cancer is a worldwide problem driven by genomic alterations. With the advent of high-throughput sequencing technology, huge amounts of genomic data are generated every second, offering much valuable information about cancer while posing a big challenge to investigators. A major characteristic of cancer is heterogeneity, and most alterations are thought to be passenger mutations that make no contribution to cancer progression. Hence, identifying driver genes that confer a selective growth advantage on tumor cells from such large and noisy data remains an urgent task. RESULTS: Considering that previous network-based methods ignore some important biological properties of driver genes and that gene interaction networks have low reliability, we proposed a random walk method named Subdyquency that integrates subcellular localization, variation frequency, and interactions with other dysregulated genes to improve the prediction accuracy of driver genes. We applied our model to three different cancers: lung, prostate, and breast cancer. The results show that our model can not only identify well-known important driver genes but also prioritize rare unknown driver genes. Moreover, compared with other existing methods, our method improves precision, recall, and F-score for most cancer types. CONCLUSIONS: The final results imply that driver genes are those prone to have higher variation frequency and to impact more dysregulated genes in the common significant compartment. AVAILABILITY: The source code can be obtained at https://github.com/weiba/Subdyquency.
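To give a feel for how a random walk can rank genes on a network, here is a generic random walk with restart on a tiny undirected graph. It is a common textbook variant swapped in for illustration, not the Subdyquency algorithm itself, which works on a bipartite graph and weights steps by subcellular localization and variation frequency in ways the abstract does not detail. The gene names, the restart probability of 0.3, and the function name are assumptions, and every node is assumed to have at least one neighbour.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    adj maps each node to a list of neighbours (every node needs >= 1).
    seeds are the nodes the walker jumps back to when it restarts.
    """
    nodes = sorted(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            # Spread the walking share of u's mass evenly over its neighbours.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical path-shaped interaction network: A - B - C - D.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
toy_scores = random_walk_with_restart(toy_network, seeds={"A"})
toy_ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

Genes close to the seed set in the network accumulate more probability mass, so the converged scores can be read as a proximity-based ranking.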
Affiliation(s)
- Junrong Song
  - Faculty of Management and Economics/Computer center/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Lianhua Road, 650050, Kunming, People's Republic of China
- Wei Peng
  - Faculty of Management and Economics/Computer center/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Lianhua Road, 650050, Kunming, People's Republic of China.
- Feng Wang
  - Faculty of Management and Economics/Computer center/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Lianhua Road, 650050, Kunming, People's Republic of China
12
Dash S, Kinney NA, Varghese RT, Garner HR, Feng WC, Anandakrishnan R. Differentiating between cancer and normal tissue samples using multi-hit combinations of genetic mutations. Sci Rep 2019; 9:1005. [PMID: 30700767 PMCID: PMC6353925 DOI: 10.1038/s41598-018-37835-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/01/2018] [Accepted: 12/14/2018] [Indexed: 01/06/2023] Open
Abstract
Cancer is known to result from a combination of a small number of genetic defects. However, the specific combinations of mutations responsible for the vast majority of cancers have not been identified. Current computational approaches focus on identifying driver genes and mutations. Although individually these mutations can increase the risk of cancer they do not result in cancer without additional mutations. We present a fundamentally different approach for identifying the cause of individual instances of cancer: we search for combinations of genes with carcinogenic mutations (multi-hit combinations) instead of individual driver genes or mutations. We developed an algorithm that identified a set of multi-hit combinations that differentiate between tumor and normal tissue samples with 91% sensitivity (95% Confidence Interval (CI) = 89-92%) and 93% specificity (95% CI = 91-94%) on average for seventeen cancer types. We then present an approach based on mutational profile that can be used to distinguish between driver and passenger mutations within these genes. These combinations, with experimental validation, can aid in better diagnosis, provide insights into the etiology of cancer, and provide a rational basis for designing targeted combination therapies.
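The sensitivity and specificity figures quoted in the abstract can be made concrete with a toy calculation. In the sketch below a sample is called "tumor" when at least one multi-hit combination has all of its genes mutated in that sample. That decision rule is an assumption about how such combinations would be applied, and the gene names, samples, and function names are hypothetical.

```python
def is_called_tumor(mutated_genes, combinations):
    """True if any combination is fully contained in the sample's mutated genes."""
    return any(combo <= mutated_genes for combo in combinations)


def sensitivity_specificity(tumor_samples, normal_samples, combinations):
    """Return (sensitivity, specificity) of the combination-based rule."""
    true_pos = sum(is_called_tumor(s, combinations) for s in tumor_samples)
    true_neg = sum(not is_called_tumor(s, combinations) for s in normal_samples)
    return true_pos / len(tumor_samples), true_neg / len(normal_samples)


# Hypothetical 2-hit combinations and per-sample sets of mutated genes.
toy_combinations = [frozenset({"G1", "G2"}), frozenset({"G3", "G4"})]
toy_tumors = [{"G1", "G2", "G5"}, {"G3", "G4"}, {"G1"}, {"G2", "G3", "G4"}]
toy_normals = [{"G1"}, set(), {"G1", "G2"}, {"G5"}]
toy_sens, toy_spec = sensitivity_specificity(toy_tumors, toy_normals, toy_combinations)
```

In this toy data three of four tumors and three of four normals are classified correctly, so both values come out at 0.75; the published figures are computed on held-out test samples across seventeen cancer types.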
Affiliation(s)
- Sajal Dash
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Nicholas A Kinney
  - Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, USA
  - Gibbs Cancer Center and Research Institute, Spartanburg, SC, USA
- Robin T Varghese
  - Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, USA
  - Gibbs Cancer Center and Research Institute, Spartanburg, SC, USA
- Harold R Garner
  - Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, USA
  - Gibbs Cancer Center and Research Institute, Spartanburg, SC, USA
- Wu-Chun Feng
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
  - Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA
- Ramu Anandakrishnan
  - Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, USA.
  - Gibbs Cancer Center and Research Institute, Spartanburg, SC, USA.
13
Hou Y, Gao B, Li G, Su Z. MaxMIF: A New Method for Identifying Cancer Driver Genes through Effective Data Integration. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2018; 5:1800640. [PMID: 30250803 PMCID: PMC6145398 DOI: 10.1002/advs.201800640] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 04/26/2018] [Revised: 06/14/2018] [Indexed: 05/05/2023]
Abstract
Identification of a few cancer driver mutation genes from a much larger number of passenger mutation genes in cancer samples remains a highly challenging task. Here, a novel method for distinguishing the driver genes from the passenger genes by effective integration of somatic mutation data and molecular interaction data using a maximal mutational impact function (MaxMIF) is presented. When evaluated on six somatic mutation datasets of Pan-Cancer and 19 datasets of different cancer types from TCGA, MaxMIF almost always significantly outperforms all the existing state-of-the-art methods in terms of predictive accuracy, sensitivity, and specificity. It recovers about 30% more known cancer genes in 500 top-ranked candidate genes than the best among the other tools evaluated. MaxMIF is also highly robust to data perturbation. Intriguingly, MaxMIF is able to identify potential cancer driver genes, with strong experimental data support. Therefore, MaxMIF can be very useful for identifying or prioritizing cancer driver genes in the increasing number of available cancer genomic data.
Affiliation(s)
- Yingnan Hou
  - School of Mathematics, Shandong University, Jinan 250100, P. R. China
  - State Key Laboratory of Microbial Technology, Shandong University, Jinan 250100, P. R. China
- Bo Gao
  - School of Mathematics, Shandong University, Jinan 250100, P. R. China
  - State Key Laboratory of Microbial Technology, Shandong University, Jinan 250100, P. R. China
- Guojun Li
  - School of Mathematics, Shandong University, Jinan 250100, P. R. China
  - State Key Laboratory of Microbial Technology, Shandong University, Jinan 250100, P. R. China
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA
- Zhengchang Su
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA
14
Przytycki PF, Singh M. Differential analysis between somatic mutation and germline variation profiles reveals cancer-related genes. Genome Med 2017; 9:79. [PMID: 28841835 PMCID: PMC5574113 DOI: 10.1186/s13073-017-0465-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 02/02/2017] [Accepted: 08/07/2017] [Indexed: 12/30/2022] Open
Abstract
A major aim of cancer genomics is to pinpoint which somatically mutated genes are involved in tumor initiation and progression. We introduce a new framework for uncovering cancer genes, differential mutation analysis, which compares the mutational profiles of genes across cancer genomes with their natural germline variation across healthy individuals. We present DiffMut, a fast and simple approach for differential mutational analysis, and demonstrate that it is more effective in discovering cancer genes than considerably more sophisticated approaches. We conclude that germline variation across healthy human genomes provides a powerful means for characterizing somatic mutation frequency and identifying cancer driver genes. DiffMut is available at https://github.com/Singh-Lab/Differential-Mutation-Analysis.
Affiliation(s)
- Pawel F Przytycki
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Mona Singh
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
|
15
|
Hibsh D, Buetow KH, Yaari G, Efroni S. Quantification of read species behavior within whole genome sequencing of cancer genomes for the stratification and visualization of genomic variation. Nucleic Acids Res 2016; 44:e81. [PMID: 26809676 PMCID: PMC4872078 DOI: 10.1093/nar/gkw031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2015] [Accepted: 01/11/2016] [Indexed: 11/13/2022]
Abstract
The cancer genome is an abnormal genome, and the ability to monitor its sequence has undergone a technological revolution. Yet prognosis and diagnosis remain expert-based decisions, with only limited ability to provide machine-based decisions. We introduce a heterogeneity-based method for stratifying and visualizing whole-genome sequencing (WGS) reads. This method uses the heterogeneity within WGS reads to markedly reduce the dimensionality of next-generation sequencing data; it is available through the tool HiBS (Heterogeneity-Based Subclassification), which allows cancer sample classification. We validated HiBS using >200 WGS samples from nine different cancer types from The Cancer Genome Atlas (TCGA). With HiBS, we show progress on two WGS-related issues: (i) differentiation between normal (NB) and tumor (TP) samples based solely on the information structure of their WGS data, and (ii) identification of specific regions of chromosomal amplification/deletion and their association with tumor stage. By comparing results to those obtained through available WGS analysis tools, we demonstrate some of the novelties obtained by the approach implemented in HiBS and also show nearly perfect normal/tumor classification, used to identify known and unknown chromosomal aberrations. Finally, the HiBS index has been associated with breast cancer tumor stage.
Affiliation(s)
- Dror Hibsh
  - Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 52900, Israel
- Kenneth H Buetow
  - Computational Sciences and Informatics Program, Complex Adaptive Systems Initiative, Arizona State University, Tempe, AZ 85281, USA
- Gur Yaari
  - Faculty of Engineering, Bar-Ilan University, Ramat Gan 52900, Israel
- Sol Efroni
  - Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 52900, Israel
|
16
|
Niroula A, Vihinen M. Variation Interpretation Predictors: Principles, Types, Performance, and Choice. Hum Mutat 2016; 37:579-97. [DOI: 10.1002/humu.22987] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 11/16/2015] [Accepted: 03/07/2016] [Indexed: 12/18/2022]
Affiliation(s)
- Abhishek Niroula
  - Department of Experimental Medical Science, Lund University, BMC B13, Lund SE-22184, Sweden
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, BMC B13, Lund SE-22184, Sweden
|
17
|
Cheng F, Zhao J, Zhao Z. Advances in computational approaches for prioritizing driver mutations and significantly mutated genes in cancer genomes. Brief Bioinform 2015; 17:642-56. [PMID: 26307061 DOI: 10.1093/bib/bbv068] [Citation(s) in RCA: 94] [Impact Index Per Article: 9.4] [Received: 05/31/2015] [Indexed: 12/27/2022]
Abstract
Cancer is often driven by the accumulation of genetic alterations, including single nucleotide variants, small insertions or deletions, gene fusions, copy-number variations, and large chromosomal rearrangements. Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data and catalog somatic mutations in both common and rare cancer types. So far, the somatic mutation landscapes and signatures of >10 major cancer types have been reported; however, pinpointing driver mutations and cancer genes from millions of available cancer somatic mutations remains a monumental challenge. To tackle this important task, many methods and computational tools have been developed during the past several years and, thus, a review of these advances is urgently needed. Here, we first summarize the main features of these methods and tools for whole-exome, whole-genome and whole-transcriptome sequencing data. Then, we discuss major challenges such as intra-tumor heterogeneity, tumor sample saturation and the functionality of synonymous mutations in cancer, all of which may result in false-positive discoveries. Finally, we highlight new directions in studying the regulatory roles of noncoding somatic mutations and quantitatively measuring circulating tumor DNA in cancer. This review may help investigators find an appropriate tool for detecting potential driver or actionable mutations in rapidly emerging precision cancer medicine.
|
18
|
Tian R, Basu MK, Capriotti E. Computational methods and resources for the interpretation of genomic variants in cancer. BMC Genomics 2015; 16 Suppl 8:S7. [PMID: 26111056 PMCID: PMC4480958 DOI: 10.1186/1471-2164-16-s8-s7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 12/31/2022]
Abstract
Recent improvements in high-throughput sequencing technologies are having a strong impact on the detection of genetic variations associated with cancer. Several institutions worldwide have been sequencing the whole exomes and/or genomes of thousands of cancer patients, thereby providing an invaluable collection of new somatic mutations in different cancer types. These initiatives have promoted the development of methods and tools for the analysis of cancer genomes that are aimed at studying the relationship between genotype and phenotype in cancer. In this article we review the online resources and computational tools for the analysis of cancer genomes. First, we describe the available repositories of cancer genome data. Next, we provide an overview of the methods for the detection of genetic variation and computational tools for the prioritization of cancer-related genes and causative somatic variations. Finally, we discuss future perspectives in cancer genomics, focusing on the impact of computational methods and quantitative approaches for defining personalized strategies to improve the diagnosis and treatment of cancer.
|