51
Song J, Peng W, Wang F. An Entropy-Based Method for Identifying Mutual Exclusive Driver Genes in Cancer. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:758-768. [PMID: 30763245 DOI: 10.1109/tcbb.2019.2897931] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 06/09/2023]
Abstract
Cancer is in essence a complex disease of genomic alteration, caused by somatic mutations accumulated over a lifetime. According to previous research, the first step toward overcoming cancer is to identify the driver genes that promote carcinogenesis. However, precisely and efficiently extracting cancer-related driver genes remains a major challenge, because cancer is heterogeneous and there are a great many irrelevant passenger mutations that have no functional impact on the cancer's development. In this work, we propose a novel entropy-based method, EntroRank, that identifies driver genes by integrating subcellular localization information and the mutual exclusivity of variation frequency into the network. EntroRank takes several different properties of driver genes into account. Considering the modularity of driver genes, the mutated genes in the network are first clustered into subgroups according to the compartments in which they are located. The structural entropy of each gene in its subgroup is then used to measure its indispensability. Considering the mutual exclusivity between driver genes within modules, relative entropy is used to measure the degree of mutual exclusivity between two mutated genes in terms of their variation frequency. We applied our method to three cancers: lung, prostate, and breast cancer. The results show that our method not only detects well-known important drivers but also prioritizes rare, previously unknown driver genes. In addition, EntroRank can identify driver genes that exhibit mutual exclusivity. Compared with existing methods, our method achieves better performance for most cancer types in terms of Precision, Recall, and Fscore.
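The abstract does not give the exact formula, so the following is only a minimal sketch of how relative entropy (Kullback-Leibler divergence) could score the dissimilarity of two genes' mutation patterns. The pseudocount smoothing, the function names, and the toy 0/1 vectors are all assumptions made for illustration and are not taken from EntroRank.

```python
import math

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q), in bits, of two discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutation_profile(mutations, pseudocount=0.5):
    """Turn a 0/1 mutation vector over patients into a smoothed distribution.

    The pseudocount is an assumption; it keeps every probability positive
    so that the divergence stays finite.
    """
    smoothed = [m + pseudocount for m in mutations]
    total = sum(smoothed)
    return [s / total for s in smoothed]

# Hypothetical genes over six patients (1 = mutated).
gene_a = [1, 1, 1, 0, 0, 0]
gene_b = [0, 0, 0, 1, 1, 1]  # never co-mutated with gene_a
gene_c = [1, 1, 0, 0, 0, 0]  # mostly co-mutated with gene_a

d_exclusive = relative_entropy(mutation_profile(gene_a), mutation_profile(gene_b))
d_overlap = relative_entropy(mutation_profile(gene_a), mutation_profile(gene_c))
print(d_exclusive > d_overlap)
```

In this toy setting the non-overlapping pair receives the larger divergence, which is the behaviour one would want from a mutual exclusivity score.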
52
Zhang S, Zhou Y, Wang Y, Wang Z, Xiao Q, Zhang Y, Lou Y, Qiu Y, Zhu F. The mechanistic, diagnostic and therapeutic novel nucleic acids for hepatocellular carcinoma emerging in past score years. Brief Bioinform 2020; 22:1860-1883. [PMID: 32249290 DOI: 10.1093/bib/bbaa023] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/14/2020] [Revised: 02/09/2020] [Accepted: 02/12/2020] [Indexed: 02/07/2023]
Abstract
Although the Central Dogma describes the destiny of a gene as 'DNA makes RNA and RNA makes protein', nucleic acids not only store and transmit genetic information but also, surprisingly, take part in vital intracellular processes as regulators of gene expression. Bioinformatics has contributed to knowledge of a series of emerging novel nucleic acid molecules. Typically, microRNA (miRNA), long noncoding RNA (lncRNA) and circular RNA (circRNA) play crucial roles in regulating vital biological processes, especially in malignant diseases. Because of its extraordinary heterogeneity among malignancies, hepatocellular carcinoma (HCC) presents enormous limitations in diagnosis and therapy. Here, the mechanistic, diagnostic and therapeutic nucleic acids for HCC that have emerged in the past 20 years are systematically reviewed. In particular, we organize recent advances on nucleic acids in HCC into three facets: (i) summarizing the diverse nucleic acids and their modifications (miRNA, lncRNA, circRNA, circulating tumor DNA and DNA methylation) that act as potential biomarkers in HCC diagnosis; (ii) summarizing the different patterns of three key noncoding RNAs (miRNA, lncRNA and circRNA) in gene regulation; and (iii) outlining the progress of these novel nucleic acids for HCC diagnosis and therapy in clinical trials and discussing their potential for clinical application. Overall, this review takes a detailed look at the advances in novel nucleic acids over the past 20 years, from their potential as biomarkers and the elaboration of mechanisms to early clinical application.
Affiliation(s)
- Song Zhang
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital in Zhejiang University, China; College of Pharmaceutical Sciences in Zhejiang University, China
- Ying Zhou
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital in Zhejiang University, China
- Yanan Wang
- School of Life Sciences in Nanchang University, China
- Zhengwen Wang
- College of Pharmaceutical Sciences in Zhejiang University, China
- Qitao Xiao
- College of Pharmaceutical Sciences in Zhejiang University, China
- Ying Zhang
- College of Pharmaceutical Sciences in Zhejiang University, China
- Yan Lou
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital in Zhejiang University, China
- Yunqing Qiu
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital in Zhejiang University, China
- Feng Zhu
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital in Zhejiang University, China; College of Pharmaceutical Sciences in Zhejiang University, China
53
Chang JW, Ding Y, Tahir Ul Qamar M, Shen Y, Gao J, Chen LL. A deep learning model based on sparse auto-encoder for prioritizing cancer-related genes and drug target combinations. Carcinogenesis 2020; 40:624-632. [PMID: 30944926 DOI: 10.1093/carcin/bgz044] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/25/2018] [Revised: 01/06/2019] [Accepted: 03/10/2019] [Indexed: 12/21/2022]
Abstract
Prioritization of cancer-related genes from gene expression profiles and proteomic data is vital for improving targeted therapy research. Although computational approaches have complemented high-throughput biological experiments in the understanding of human diseases, it remains a big challenge to accurately discover cancer-related proteins/genes via automatic learning from large-scale protein/gene expression data and protein-protein interaction data. Most existing methods are based on network construction combined with gene expression profiles, and they ignore the diversity between normal samples and disease cell lines. In this study, we introduced a deep learning model based on a sparse auto-encoder to learn the specific characteristics of protein interactions in cancer cell lines integrated with protein expression data. The model was able to identify cancer-related proteins/genes from different input protein expression profiles by extracting the characteristics of protein interaction information, and it could also predict cancer-related protein combinations. Compared with other reported methods, including differential expression and network-based methods, our model achieved the highest area under the curve value (>0.8) in predicting cancer-related genes. Our study prioritized ~500 high-confidence cancer-related genes; among these, 211 were already known cancer drug targets, which supports the accuracy of our method. These results indicate that the proposed auto-encoder model can computationally prioritize candidate proteins/genes involved in cancer and improve targeted therapy research.
Affiliation(s)
- Ji-Wei Chang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
- Yuduan Ding
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
- Muhammad Tahir Ul Qamar
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
- Yin Shen
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Junxiang Gao
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Ling-Ling Chen
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
54
Dinstag G, Shamir R. PRODIGY: personalized prioritization of driver genes. Bioinformatics 2020; 36:1831-1839. [PMID: 31681944 PMCID: PMC7703777 DOI: 10.1093/bioinformatics/btz815] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/11/2019] [Revised: 09/03/2019] [Accepted: 10/30/2019] [Indexed: 12/12/2022]
Abstract
MOTIVATION: Evolution of cancer is driven by a few somatic mutations that disrupt cellular processes, causing abnormal proliferation and tumor development, whereas most somatic mutations have no impact on progression. Distinguishing those mutated genes that drive tumorigenesis in a patient is a primary goal in cancer therapy: knowledge of these genes and the pathways on which they operate can illuminate disease mechanisms and indicate potential therapies and drug targets. Current research focuses mainly on cohort-level driver gene identification, but patient-specific driver gene identification remains a challenge. METHODS: We developed a new algorithm for patient-specific ranking of driver genes. The algorithm, called PRODIGY, analyzes the expression and mutation profiles of the patient along with data on known pathways and protein-protein interactions. PRODIGY quantifies the impact of each mutated gene on every deregulated pathway using the prize-collecting Steiner tree model. Mutated genes are ranked by their aggregated impact on all deregulated pathways. RESULTS: In testing on five TCGA cancer cohorts spanning >2500 patients and comparison to validated driver genes, PRODIGY outperformed extant methods and ranking based on network centrality measures. Our results pinpoint the pleiotropic effect of driver genes and show that PRODIGY is capable of identifying even very rare drivers. Hence, PRODIGY takes a step further toward personalized medicine and treatment. AVAILABILITY AND IMPLEMENTATION: The PRODIGY R package is available at: https://github.com/Shamir-Lab/PRODIGY. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gal Dinstag
- Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 6997801, Israel
- Ron Shamir
- Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 6997801, Israel
55
Al Hajri Q, Dash S, Feng WC, Garner HR, Anandakrishnan R. Identifying multi-hit carcinogenic gene combinations: Scaling up a weighted set cover algorithm using compressed binary matrix representation on a GPU. Sci Rep 2020; 10:2022. [PMID: 32029803 PMCID: PMC7005272 DOI: 10.1038/s41598-020-58785-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/29/2019] [Accepted: 01/20/2020] [Indexed: 01/16/2023]
Abstract
Despite decades of research, effective treatments for most cancers remain elusive. One reason is that different instances of cancer result from different combinations of multiple genetic mutations (hits). Therefore, treatments that may be effective in some cases are not effective in others. We previously developed an algorithm for identifying combinations of carcinogenic genes with mutations (multi-hit combinations), which could suggest a likely cause for individual instances of cancer. Most cancers are estimated to require three or more hits. However, the computational complexity of the algorithm scales exponentially with the number of hits, making it impractical for identifying combinations of more than two hits. To identify combinations of greater than two hits, we used a compressed binary matrix representation, and optimized the algorithm for parallel execution on an NVIDIA V100 graphics processing unit (GPU). With these enhancements, the optimized GPU implementation was on average an estimated 12,144 times faster than the original integer matrix based CPU implementation, for the 3-hit algorithm, allowing us to identify 3-hit combinations. The 3-hit combinations identified using a training set were able to differentiate between tumor and normal samples in a separate test set with 90% overall sensitivity and 93% overall specificity. We illustrate how the distribution of mutations in tumor and normal samples in the multi-hit gene combinations can suggest potential driver mutations for further investigation. With experimental validation, these combinations may provide insight into the etiology of cancer and a rational basis for targeted combination therapy.
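To make the idea of a compressed binary matrix concrete, here is a small CPU-only sketch, assuming each gene's row of per-sample mutation flags is packed into one integer so that a multi-hit combination can be evaluated with bitwise AND and a population count. The scoring rule (tumor count minus normal count), the gene names, and the toy data are hypothetical simplifications; they are not the weighted set cover objective or the GPU kernel described in the abstract.

```python
from itertools import combinations

def pack_row(bits):
    """Pack a list of 0/1 flags (one per sample) into a single Python int."""
    word = 0
    for i, b in enumerate(bits):
        if b:
            word |= 1 << i
    return word

def count_samples_with_all_hits(packed_rows, genes):
    """Count samples in which every gene of the combination is mutated."""
    combined = packed_rows[genes[0]]
    for g in genes[1:]:
        combined &= packed_rows[g]
    return bin(combined).count("1")

# Invented toy data: four genes, four tumor samples, four normal samples.
tumor = {"G1": [1, 1, 1, 0], "G2": [1, 1, 0, 1], "G3": [1, 1, 1, 1], "G4": [0, 0, 1, 1]}
normal = {"G1": [1, 0, 0, 0], "G2": [0, 1, 0, 0], "G3": [1, 1, 0, 0], "G4": [0, 0, 0, 1]}
tumor_packed = {g: pack_row(bits) for g, bits in tumor.items()}
normal_packed = {g: pack_row(bits) for g, bits in normal.items()}

def score(combo):
    """Toy score: tumor samples covered minus normal samples covered."""
    return (count_samples_with_all_hits(tumor_packed, combo)
            - count_samples_with_all_hits(normal_packed, combo))

# Exhaustively enumerate every 3-hit combination and keep the best one.
best = max(combinations(sorted(tumor), 3), key=score)
print(best, score(best))
```

Packing turns the per-sample comparison into a handful of word-level operations, which is the property that makes the representation attractive for parallel hardware.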
Affiliation(s)
- Qais Al Hajri
- Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, 24060, USA
- Sajal Dash
- Department of Computer Science, Virginia Tech, Blacksburg, VA, 24060, USA
- Wu-Chun Feng
- Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, 24060, USA
- Department of Computer Science, Virginia Tech, Blacksburg, VA, 24060, USA
- Harold R Garner
- Department of Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, 24060, USA
- Gibbs Cancer Center and Research Institute, Spartanburg, SC, 29303, USA
- Ramu Anandakrishnan
- Department of Biomedical Sciences, Edward Via College of Osteopathic Medicine, Blacksburg, VA, 24060, USA
- Gibbs Cancer Center and Research Institute, Spartanburg, SC, 29303, USA
56
Kong F, Kong D, Yang X, Yuan D, Zhang N, Hua X, You H, Zheng K, Tang R. Integrative analysis of highly mutated genes in hepatitis B virus-related hepatic carcinoma. Cancer Med 2020; 9:2462-2479. [PMID: 32017470 PMCID: PMC7131865 DOI: 10.1002/cam4.2903] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/23/2019] [Revised: 01/15/2020] [Accepted: 01/21/2020] [Indexed: 12/14/2022]
Abstract
Gene mutation is responsible for the development of hepatocellular carcinoma (HCC) with hepatitis B virus (HBV) infection; however, the characteristics and associated biological functions of highly mutated genes, defined as genes mutated in at least 5% of HCC patients with HBV infection, have not been clearly evaluated. In this study, we analyzed somatic mutation information obtained by whole‐exome sequencing of 280 HBV‐related HCC tissues from public databases and published studies. Via integrative analysis, 78 genes, including TP53, TTN, MUC16, CTNNB1, and PCLO, were identified as highly mutated genes, and some of these were further identified as cancer driver genes. In addition, we found that the highly mutated genes were enriched in various biological functions and pathways. The expression of many of the highly mutated genes was significantly altered in HBV‐related HCC, and several highly mutated genes were related to a variety of clinical factors and associated with poor survival. Taken together, these results could enrich our understanding of highly mutated genes and their relationships with HBV‐related HCC. Some of the identified highly mutated genes might be used as novel biomarkers of disease prognosis, or as molecular targets for the treatment of HCC with HBV infection.
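The 5% frequency cutoff for calling a gene "highly mutated" can be illustrated with a short sketch. The function name, the input layout (gene mapped to the sample IDs in which it is mutated), and the toy cohort are assumptions for illustration only and do not reproduce the study's data or pipeline.

```python
def highly_mutated_genes(mutated_samples, n_samples, min_frequency=0.05):
    """Return {gene: frequency} for genes mutated in at least min_frequency of samples."""
    result = {}
    for gene, samples in mutated_samples.items():
        # De-duplicate sample IDs so a sample with several mutations counts once.
        frequency = len(set(samples)) / n_samples
        if frequency >= min_frequency:
            result[gene] = frequency
    return result

# Invented toy cohort of 40 samples with hypothetical gene names.
toy_cohort = {
    "GENE_A": range(0, 12),  # mutated in 12 of 40 samples (30%)
    "GENE_B": range(5, 9),   # mutated in 4 of 40 samples (10%)
    "GENE_C": [3],           # mutated in 1 of 40 samples (2.5%), below the cutoff
}
selected = highly_mutated_genes(toy_cohort, n_samples=40)
print(sorted(selected))
```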
Affiliation(s)
- Fanyun Kong
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Xuzhou Medical University, Xuzhou, Jiangsu, P. R. China
- Delong Kong
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Xuzhou Medical University, Xuzhou, Jiangsu, P. R. China
- Xiaoying Yang
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Xuzhou Medical University, Xuzhou, Jiangsu, P. R. China
- Dongchen Yuan
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Xuzhou Medical University, Xuzhou, Jiangsu, P. R. China
- Ning Zhang
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Xuzhou Medical University, Xuzhou, Jiangsu, P. R. China
- Xuan Hua
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Xuzhou Medical University, Xuzhou, Jiangsu, P. R. China
- Hongjuan You
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Xuzhou Medical University, Xuzhou, Jiangsu, P. R. China
- Kuiyang Zheng
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Xuzhou Medical University, Xuzhou, Jiangsu, P. R. China; National Demonstration Center for Experimental Basic Medical Sciences Education, Xuzhou Medical University, Xuzhou, Jiangsu, P. R. China
- Renxian Tang
- Jiangsu Key Laboratory of Immunity and Metabolism, Department of Pathogenic Biology and Immunology, Xuzhou Medical University, Xuzhou, Jiangsu, P. R. China; National Demonstration Center for Experimental Basic Medical Sciences Education, Xuzhou Medical University, Xuzhou, Jiangsu, P. R. China
57
Si Z, Hu K. Identification of osteosarcoma driver genes using a network method. Oncol Lett 2020; 19:1215-1222. [PMID: 31966051 PMCID: PMC6956419 DOI: 10.3892/ol.2019.11212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/12/2019] [Accepted: 11/07/2019] [Indexed: 02/05/2023]
Abstract
Osteosarcoma (OS) is a severe disease that is generally caused by genetic alterations. Systematic identification of driver genes may be used to increase the understanding of the mechanisms underlying the disease. The present study identified a framework to predict driver genes, with the hypothesis that driver genes operate through a number of connected functional genes. OS-related genes were extracted from the Catalogue Of Somatic Mutations In Cancer and subsequently ranked by virtue of their effect on a set of functional genes using a network-based algorithm. This revealed the driver genes associated with dysregulated networks. In addition, compared with the Mutations For Functional Impact on Network Neighbors algorithm, the results obtained using the aforementioned network-based algorithm revealed that the proposed method is effective. Gene functional analysis demonstrated that the potential OS driver genes were involved in OS-associated pathways. Through the validation of the 15 candidate OS driver genes, the classifier constructed in the present study revealed that the identified driver genes were able to distinguish 184 cancer samples from controls. Therefore, the present study provided insights into the identification of driver genes from a vast amount of sequencing data.
Affiliation(s)
- Zebing Si
- Department of Orthopedics, The Affiliated Yuebei People's Hospital of Shantou University Medical College, Wujiang, Shaoguan 512026, P.R. China
- Konghe Hu
- Department of Orthopedics, The Affiliated Yuebei People's Hospital of Shantou University Medical College, Wujiang, Shaoguan 512026, P.R. China
- Correspondence to: Dr Konghe Hu, Department of Orthopedics, The Affiliated Yuebei People's Hospital of Shantou University Medical College, 133 Shaoguan Huimin South Avenue, Wujiang, Shaoguan 512026, P.R. China, E-mail:
58
Zia A, Rashid S. Systems Biology and Integrated Computational Methods for Cancer-Associated Mutation Analysis. Essentials of Cancer Genomic, Computational Approaches and Precision Medicine 2020:335-362. [DOI: 10.1007/978-981-15-1067-0_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
59
Song J, Peng W, Wang F, Wang J. Identifying driver genes involving gene dysregulated expression, tissue-specific expression and gene-gene network. BMC Med Genomics 2019; 12:168. [PMID: 31888619 PMCID: PMC6936147 DOI: 10.1186/s12920-019-0619-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/06/2019] [Accepted: 11/11/2019] [Indexed: 12/16/2022]
Abstract
BACKGROUND: Cancer, a disease of genomic alteration, claims many lives each year. The biggest challenge in overcoming cancer is to distinguish the driver genes that promote cancer development from the huge number of passenger mutations that have no effect on the selective growth advantage of cancer. To address this problem, some researchers have begun to identify driver genes by integrating networks with other biological information. However, more effort is needed to improve prediction performance. METHODS: Driver genes affect the expression of their downstream genes and are likely to interact with each other to form functional modules, and those modules tend to be expressed similarly in the same tissue. Based on these observations, we proposed a novel model named DyTidriver that identifies driver genes by incorporating dysregulated gene expression, tissue-specific expression and variation frequency into the human functional interaction network (human FIN). RESULTS: The method was applied to 974 breast, 316 prostate and 230 lung cancer patients. The results show that our method outperformed five other existing methods in terms of Fscore, Precision and Recall values. The enrichment and cociter analyses illustrate that DyTidriver not only identifies driver genes enriched in some significant pathways but can also uncover some unknown driver genes. CONCLUSION: The final results imply that driver genes are those that impact more dysregulated genes and are expressed similarly in the same tissue.
Affiliation(s)
- Junrong Song
- Faculty of Management and Economics/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan, 650500, People's Republic of China
- Wei Peng
- Faculty of Management and Economics/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan, 650500, People's Republic of China
- Feng Wang
- Faculty of Management and Economics/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, Yunnan, 650500, People's Republic of China
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha, Hunan, 410083, People's Republic of China
60
Guo WF, Zhang SW, Zeng T, Li Y, Gao J, Chen L. A novel network control model for identifying personalized driver genes in cancer. PLoS Comput Biol 2019; 15:e1007520. [PMID: 31765387 PMCID: PMC6901264 DOI: 10.1371/journal.pcbi.1007520] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 06/20/2019] [Revised: 12/09/2019] [Accepted: 10/30/2019] [Indexed: 12/11/2022]
Abstract
Although existing computational models have identified many common driver genes, it remains challenging to identify personalized driver genes using samples from an individual patient. Recently, methods that exploit the structure-based control principles of complex networks have provided new clues for identifying the minimum number of driver nodes needed to drive the state transition of large-scale complex networks from an initial state to a desired state. However, structure-based network control methods cannot be directly applied to identify personalized driver genes because the network dynamics of the personalized system are unknown. Here we propose the personalized network control model (PNC) to identify personalized driver genes by applying the structure-based network control principle to genetic data of individual patients. In the PNC model, we first present a paired single sample network construction method to construct the personalized state transition network, which captures the phenotype transitions between healthy and disease states. Then, we design a novel structure-based network control method from the Feedback Vertex Sets-based control perspective to identify the personalized driver genes. Extensive experimental results on 13 cancer datasets from The Cancer Genome Atlas show, first, that the PNC model outperforms current state-of-the-art methods in terms of F-measures for identifying cancer driver genes enriched in the gold-standard cancer driver gene lists. Furthermore, these results show that personalized driver genes can be explored through their network characteristics even when they are hidden factors in transcription and mutation profiles. PNC provides novel insights and useful tools for understanding tumor heterogeneity in cancer. The PNC package and data resources used in this work can be freely downloaded from https://github.com/NWPU-903PR/PNC. Notably, there may be unique personalized driver genes for an individual patient in cancer.
Identifying the personalized driver genes that lead to cancer initiation and progression in an individual patient is one of the biggest challenges in precision medicine. However, most methods for cancer driver gene identification focus mainly on cohort information rather than individual information and fail to identify personalized driver genes. Here we propose the personalized network control model (PNC) to identify personalized driver genes by applying the structure-based network control principle to personalized data of individual patients. By treating the progression from the healthy state to the disease state as a network control problem, PNC aims to detect a small number of personalized driver genes that are altered in response to input signals and trigger the state transition in individual patients at the expression level. PNC has two main components. One is a paired single sample network construction method (Paired-SSN) for constructing personalized state transition networks that capture the phenotypic transitions between normal and disease attractors. The other is a novel structure-based network control method (NCUA) that operates on personalized state transition networks to identify personalized driver genes that can drive an individual patient's system from the healthy state to the disease state through oncogene activations. Each part of the proposed method has been examined in depth and shown to be efficient. Compared with other existing models, PNC achieves higher performance in terms of F-measures for the cancer driver genes in the well-known Cancer Census Genes (CCG) and Network of Cancer Genes (NCG). Extensive experimental results on multiple cancer datasets highlight that sample-specific network theory and structure-based network control theory can contribute to identifying personalized driver genes in cancer.
Affiliation(s)
- Wei-Feng Guo
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xian, China
- Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xian, China
- * E-mail: (S-WZ); (JG); (LC)
- Tao Zeng
- Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institutes for Biological Science, Chinese Academy Science, Shanghai, China
- Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, China
- Yan Li
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xian, China
- Jianxi Gao
- Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, United States of America
- Network Science and Technology Center, Rensselaer Polytechnic Institute, Troy, New York, United States of America
- * E-mail: (S-WZ); (JG); (LC)
- Luonan Chen
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xian, China
- Key Laboratory of Systems Biology, Center for Excellence in Molecular Cell Science, Shanghai Institutes for Biological Science, Chinese Academy Science, Shanghai, China
- Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- * E-mail: (S-WZ); (JG); (LC)
61
Nagarajan N, Yapp EKY, Le NQK, Kamaraj B, Al-Subaie AM, Yeh HY. Application of Computational Biology and Artificial Intelligence Technologies in Cancer Precision Drug Discovery. BioMed Research International 2019; 2019:8427042. [PMID: 31886259 PMCID: PMC6925679 DOI: 10.1155/2019/8427042] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 08/30/2019] [Accepted: 10/14/2019] [Indexed: 02/08/2023]
Abstract
Artificial intelligence (AI) has enormous potential in many areas of healthcare, including research and chemical discovery. Using large amounts of aggregated data, AI can discover and learn, transforming these data into "usable" knowledge. Well aware of this, the world's leading pharmaceutical companies have already begun to use artificial intelligence to improve their research on new drugs. The goal is to exploit modern computational biology and machine learning systems to predict molecular behaviour and the likelihood of obtaining a useful drug, thus saving time and money on unnecessary tests. Clinical studies, electronic medical records, high-resolution medical images, and genomic profiles can be used as resources to aid drug development. Pharmaceutical and medical researchers have extensive data sets that can be analyzed by strong AI systems. This review focuses on how computational biology and artificial intelligence technologies can be implemented by integrating knowledge of cancer drugs, drug resistance, next-generation sequencing, genetic variants, and structural biology in cancer precision drug discovery.
Affiliation(s)
| | - Edward K. Y. Yapp
- Singapore Institute of Manufacturing Technology, 2 Fusionopolis Way, Singapore 138634
| | - Nguyen Quoc Khanh Le
- School of Humanities, Nanyang Technological University, 14 Nanyang Dr, Singapore 637332
| | - Balu Kamaraj
- Department of Neuroscience Technology, College of Applied Medical Sciences, Imam Abdulrahman Bin Faisal University, Jubail 35816, Saudi Arabia
| | - Abeer Mohammed Al-Subaie
- Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
| | - Hui-Yuan Yeh
- School of Humanities, Nanyang Technological University, 14 Nanyang Dr, Singapore 637332
|
62
|
Deng Y, Luo S, Deng C, Luo T, Yin W, Zhang H, Zhang Y, Zhang X, Lan Y, Ping Y, Xiao Y, Li X. Identifying mutual exclusivity across cancer genomes: computational approaches to discover genetic interaction and reveal tumor vulnerability. Brief Bioinform 2019; 20:254-266. [PMID: 28968730 DOI: 10.1093/bib/bbx109] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 05/23/2017] [Indexed: 02/06/2023] Open
Abstract
Systematic sequencing of cancer genomes has revealed prevalent heterogeneity, with patients harboring various combinatorial patterns of genetic alteration. In particular, groups of genes exhibiting mutually exclusive alteration patterns are widespread across cancers, covering a broad spectrum of crucial cancer pathways. There is now considerable evidence that mutual exclusivity reflects alternative functions in tumor initiation and progression, or suggests adverse effects of concurrent alterations. Given its importance, numerous computational approaches have been proposed to study mutual exclusivity using genomic profiles alone, or by integrating networks and phenotypes. Some of them are routinely used to explore genetic associations, which leads to a deeper understanding of carcinogenic mechanisms and reveals unexpected tumor vulnerabilities. Here, we present an overview of mutual exclusivity from the perspective of the cancer genome. We describe the common hypotheses underlying mutual exclusivity, summarize the strategies for identifying significant mutually exclusive patterns, compare the performance of representative algorithms on simulated data sets and discuss their common confounders.
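As a rough illustration of what testing a gene pair for mutual exclusivity can look like, the sketch below uses a one-sided hypergeometric (Fisher-type) test on the number of samples in which both genes are altered. This is one simple pairwise strategy chosen for illustration, not one of the algorithms benchmarked in the review; the function name and the toy sample sets are hypothetical.

```python
from math import comb


def mutual_exclusivity_pvalue(samples_a, samples_b, n_samples):
    """P(co-occurrence <= observed) if alterations were placed independently.

    samples_a, samples_b: sets of sample IDs in which gene A / gene B is altered.
    n_samples: total number of samples in the cohort.
    A small p-value means the two genes co-occur less often than expected.
    """
    a, b = len(samples_a), len(samples_b)
    observed = len(samples_a & samples_b)
    lowest = max(0, a + b - n_samples)  # smallest overlap that is possible
    total = comb(n_samples, b)
    favourable = sum(
        comb(a, i) * comb(n_samples - a, b - i)
        for i in range(lowest, observed + 1)
    )
    return favourable / total


# Toy cohort of 10 samples (hypothetical data): the two genes never co-occur.
gene_a = {0, 1, 2, 3, 4}
gene_b = {5, 6, 7, 8, 9}
p = mutual_exclusivity_pvalue(gene_a, gene_b, 10)
print(f"p = {p:.4f}")
```

A pairwise test of this kind ignores confounders such as tumor subtype and per-sample mutation load, which the abstract lists among the issues its authors discuss.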
Affiliation(s)
- Yulan Deng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
| | - Shangyi Luo
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
| | - Chunyu Deng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
| | - Tao Luo
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
| | - Wenkang Yin
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
| | - Hongyi Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
| | - Yong Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
| | - Xinxin Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
| | - Yujia Lan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
| | - Yanyan Ping
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
| | - Yun Xiao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
| | - Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
|
63
|
Li J, Zhao T, Zhang Y, Zhang K, Shi L, Chen Y, Wang X, Sun Z. Performance evaluation of pathogenicity-computation methods for missense variants. Nucleic Acids Res 2019; 46:7793-7804. [PMID: 30060008 PMCID: PMC6125674 DOI: 10.1093/nar/gky678] [Citation(s) in RCA: 142] [Impact Index Per Article: 23.7] [Received: 02/08/2018] [Accepted: 07/17/2018] [Indexed: 12/20/2022] Open
Abstract
With expanding applications of next-generation sequencing in medical genetics, an increasing number of computational methods are being developed to predict the pathogenicity of missense variants. Selecting optimal methods can accelerate the identification of candidate genes. However, the performance of different computational methods under various conditions has not been completely evaluated. Here, we compared 12 performance measures of 23 methods based on three independent benchmark datasets: (i) clinical variants from the ClinVar database related to genetic diseases, (ii) somatic variants from the IARC TP53 and ICGC databases related to human cancers and (iii) experimentally evaluated PPARG variants. Some methods performed differently under different conditions, suggesting that they are not equally applicable in all settings. Furthermore, the specificities were lower than the sensitivities for most methods (especially for the experimentally evaluated benchmark datasets), suggesting that more rigorous cutoff values are necessary to distinguish pathogenic variants. In addition, REVEL, VEST3 and the combination of both methods (i.e. ReVe) showed the best overall performance with all the benchmark data. Finally, we evaluated the performance of these methods with de novo mutations, finding that ReVe consistently showed the best performance. We have summarized the performance of different methods under various conditions, providing tentative guidance for optimal tool selection.
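The point about cutoffs can be made concrete with a small sketch: sensitivity and specificity are computed from a confusion matrix at a chosen score cutoff, and a stricter cutoff trades sensitivity for specificity. The scores and labels below are invented toy values, not data from the study, and the function names are hypothetical.

```python
def confusion_counts(labels, scores, cutoff):
    """Count TP, FP, TN, FN when a score >= cutoff is called pathogenic."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        called_pathogenic = score >= cutoff
        if called_pathogenic and label == 1:
            tp += 1
        elif called_pathogenic and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of truly pathogenic variants that are called pathogenic."""
    return tp / (tp + fn)  # assumes at least one pathogenic variant


def specificity(tn, fp):
    """Fraction of truly benign variants that are called benign."""
    return tn / (tn + fp)  # assumes at least one benign variant


# Toy benchmark (hypothetical): 1 = pathogenic, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]

for cutoff in (0.5, 0.75):
    tp, fp, tn, fn = confusion_counts(labels, scores, cutoff)
    print(cutoff, round(sensitivity(tp, fn), 3), round(specificity(tn, fp), 3))
```

In this toy set, raising the cutoff from 0.5 to 0.75 removes the single false positive while leaving sensitivity unchanged; on real benchmarks the two measures usually move in opposite directions.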
Affiliation(s)
- Jinchen Li
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China.,National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Tingting Zhao
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
| | - Yi Zhang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
| | - Kun Zhang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
| | - Leisheng Shi
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
| | - Yun Chen
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
| | - Xingxing Wang
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
| | - Zhongsheng Sun
- Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China.,Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
|
64
|
Agajanian S, Oluyemi O, Verkhivker GM. Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations. Front Mol Biosci 2019; 6:44. [PMID: 31245384 PMCID: PMC6579812 DOI: 10.3389/fmolb.2019.00044] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 02/28/2019] [Accepted: 05/23/2019] [Indexed: 12/21/2022] Open
Abstract
Development of machine learning solutions for predicting the functional and clinical significance of cancer driver genes and mutations is paramount in modern biomedical research and has gained significant momentum in the past decade. In this work, we integrate different machine learning approaches, including tree-based methods, random forest and gradient boosted tree (GBT) classifiers, along with deep convolutional neural networks (CNN), for prediction of cancer driver mutations in genomic datasets. The feasibility of using raw nucleotide sequences in a CNN to classify cancer driver mutations was initially explored by employing label encoding, one-hot encoding, and embedding to preprocess the DNA information. These classifiers were benchmarked against their tree-based alternatives in order to evaluate the performance on a relative scale. We then integrated DNA-based scores generated by the CNN with various categories of conservational, evolutionary and functional features into a generalized random forest classifier. The results of this study demonstrate that a CNN can learn high-level features from genomic information that are complementary to the ensemble-based predictors often employed for classification of cancer mutations. By combining the deep learning-generated score with only two main ensemble-based functional features, we can achieve superior performance of various machine learning classifiers. Our findings also suggest that the synergy of nucleotide-based deep learning scores and integrated metrics derived from protein sequence conservation scores can allow for robust classification of cancer driver mutations with a limited number of highly informative features. Machine learning predictions are leveraged in molecular simulations, protein stability, and network-based analysis of cancer mutations in the protein kinase genes to obtain insights into molecular signatures of driver mutations and enhance the interpretability of cancer-specific classification models.
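Two of the preprocessing schemes named in the abstract, label encoding and one-hot encoding, can be sketched in a few lines. The base ordering A, C, G, T and the handling of ambiguous bases such as N are assumptions made for this illustration; the abstract does not specify them, and the function names are hypothetical.

```python
NUCLEOTIDES = "ACGT"  # assumed ordering; any fixed order works
INDEX = {base: i for i, base in enumerate(NUCLEOTIDES)}
UNKNOWN = len(NUCLEOTIDES)  # assumed label for ambiguous bases such as N


def label_encode(sequence):
    """Map each base to one integer: A=0, C=1, G=2, T=3, anything else=4."""
    return [INDEX.get(base, UNKNOWN) for base in sequence.upper()]


def one_hot_encode(sequence):
    """Map each base to a 4-element indicator row; unknown bases are all zeros."""
    rows = []
    for base in sequence.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in INDEX:
            row[INDEX[base]] = 1
        rows.append(row)
    return rows


print(label_encode("GATTACA"))
print(one_hot_encode("ACGT"))
```

Label encoding imposes an arbitrary numeric order on the bases, whereas one-hot rows treat them as unordered categories, which is one reason the two are usually compared rather than assumed to be interchangeable.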
Affiliation(s)
- Steve Agajanian
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
| | - Odeyemi Oluyemi
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States
| | - Gennady M Verkhivker
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, United States.,Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, United States
|
65
|
Zhang W, Wang SL. A Novel Method for Identifying the Potential Cancer Driver Genes Based on Molecular Data Integration. Biochem Genet 2019; 58:16-39. [PMID: 31115714 DOI: 10.1007/s10528-019-09924-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/06/2019] [Accepted: 05/02/2019] [Indexed: 12/17/2022]
Abstract
The identification of cancer driver genes is essential for personalized therapy. The mutation frequency of most driver genes is in the middle (2-20%) or even lower range, which makes it difficult to find driver genes with low-frequency mutations. Other forms of genomic aberration, such as copy number variations (CNVs) and epigenetic changes, may also reflect cancer progression. In this work, a method for identifying the potential cancer driver genes (iPDG) based on molecular data integration is proposed. DNA copy number variation, somatic mutation, and gene expression data of matched cancer samples are integrated. In combination with the method of iKEEG, the "key genes" of cancer are identified, and the change in their expression levels is used for auxiliary evaluation of whether the mutated genes are potential drivers. For a mutated gene, the concept of mutational effect is defined, which takes into account the effects of copy number variation, the mutated gene itself, and its neighbor genes. The method consists of two main steps. The first step is data preprocessing: DNA copy number variation and somatic mutation data are integrated, the integrated data are mapped to a given interaction network, and the diffusion kernel is used to form the mutation effect matrix. The second step is to obtain the key genes by using the iKGGE method and to construct the connection matrix from the gene expression data of the key genes and the mutation impact matrix of the mutated genes. Experiments on TCGA breast cancer and glioblastoma multiforme datasets demonstrate that iPDG is effective not only in identifying known cancer driver genes but also in discovering rare potential driver genes. Functional enrichment analysis shows that these genes are clearly associated with these two types of cancer.
Affiliation(s)
- Wei Zhang
- College of Computer Science and Electronics Engineering, Hunan University, Changsha, 410082, Hunan, China
| | - Shu-Lin Wang
- College of Computer Science and Electronics Engineering, Hunan University, Changsha, 410082, Hunan, China.
|
66
|
Song J, Peng W, Wang F. A random walk-based method to identify driver genes by integrating the subcellular localization and variation frequency into bipartite graph. BMC Bioinformatics 2019; 20:238. [PMID: 31088372 PMCID: PMC6518800 DOI: 10.1186/s12859-019-2847-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 04/14/2019] [Accepted: 04/24/2019] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Cancer is a worldwide problem driven by genomic alterations. With the advent of high-throughput sequencing technology, a huge amount of genomic data is generated every second, which offers valuable information about cancer while posing a major challenge to investigators. The major characteristic of cancer is heterogeneity, and most alterations are thought to be passenger mutations that make no contribution to cancer progression. Hence, extracting driver genes that confer a selective growth advantage on tumor cells from such large and noisy data remains an urgent task. RESULTS Because previous network-based methods ignore some important biological properties of driver genes and gene interaction networks have low reliability, we proposed a random walk method named Subdyquency that integrates subcellular localization, variation frequency and a gene's interactions with other dysregulated genes to improve the prediction accuracy of driver genes. We applied our model to three different cancers: lung, prostate and breast cancer. The results show that our model can not only identify well-known important driver genes but also prioritize rare unknown driver genes. Moreover, compared with other existing methods, our method improves precision, recall and F-score for most cancer types. CONCLUSIONS The final results imply that driver genes tend to have higher variation frequency and to affect more dysregulated genes in the common significant compartment. AVAILABILITY The source code can be obtained at https://github.com/weiba/Subdyquency .
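The abstract names a random walk on a network but does not state its update rule. As a generic illustration only, the sketch below implements random walk with restart, a common variant in network-based gene prioritization; it is not the Subdyquency algorithm, and the toy graph, restart probability and function name are all assumptions.

```python
def random_walk_with_restart(neighbors, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score nodes by proximity to the seed nodes.

    neighbors: dict mapping each node to a list of adjacent nodes
               (undirected graph, every node assumed to have >= 1 neighbor).
    seeds:     nodes where the walker starts and restarts.
    Iterates p <- (1 - restart) * W p + restart * p0 until convergence,
    where W moves the walker to a uniformly chosen neighbor.
    """
    p0 = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in neighbors}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {node: restart * p0[node] for node in neighbors}
        for node, adjacent in neighbors.items():
            share = (1.0 - restart) * p[node] / len(adjacent)
            for other in adjacent:
                nxt[other] += share
        change = sum(abs(nxt[node] - p[node]) for node in neighbors)
        p = nxt
        if change < tol:
            break
    return p


# Toy path graph A - B - C - D with A as the only seed (hypothetical example).
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(graph, seeds={"A"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

Nodes closer to the seed receive higher steady-state scores, which is the property that network propagation methods exploit when ranking candidate genes.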
Affiliation(s)
- Junrong Song
- Faculty of Management and Economics/Computer center/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Lianhua Road, 650050, Kunming, People's Republic of China
| | - Wei Peng
- Faculty of Management and Economics/Computer center/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Lianhua Road, 650050, Kunming, People's Republic of China.
| | - Feng Wang
- Faculty of Management and Economics/Computer center/Faculty of Information Engineering and Automation/Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Lianhua Road, 650050, Kunming, People's Republic of China
|
67
|
Halperin RF, Liang WS, Kulkarni S, Tassone EE, Adkins J, Enriquez D, Tran NL, Hank NC, Newell J, Kodira C, Korn R, Berens ME, Kim S, Byron SA. Leveraging Spatial Variation in Tumor Purity for Improved Somatic Variant Calling of Archival Tumor Only Samples. Front Oncol 2019; 9:119. [PMID: 30949446 PMCID: PMC6435595 DOI: 10.3389/fonc.2019.00119] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/16/2018] [Accepted: 02/11/2019] [Indexed: 12/28/2022] Open
Abstract
Archival tumor samples represent a rich resource of annotated specimens for translational genomics research. However, standard variant calling approaches require a matched normal sample from the same individual, which is often not available in the retrospective setting, making it difficult to distinguish between true somatic variants and individual-specific germline variants. Archival sections often contain adjacent normal tissue, but this tissue can include infiltrating tumor cells. As existing comparative somatic variant callers are designed to exclude variants present in the normal sample, a novel approach is required to leverage adjacent normal tissue with infiltrating tumor cells for somatic variant calling. Here we present lumosVar 2.0, a software package designed to jointly analyze multiple samples from the same patient, built upon our previous single-sample tumor-only variant caller lumosVar 1.0. The approach assumes that the allelic fractions of somatic variants and germline variants follow different patterns as tumor content and copy number state change. lumosVar 2.0 estimates allele-specific copy number and tumor sample fractions from the data, and uses a model to determine expected allelic fractions for somatic and germline variants and to classify variants accordingly. To evaluate the utility of lumosVar 2.0 for jointly calling somatic variants with tumor and adjacent normal samples, we used a glioblastoma dataset with matched high and low tumor content and germline whole exome sequencing data (for true somatic variants) available for each patient. Both sensitivity and positive predictive value were improved when analyzing the high tumor and low tumor samples jointly compared to analyzing the samples individually or in-silico pooling of the two samples. Finally, we applied this approach to a set of breast and prostate archival tumor samples for which tumor blocks containing adjacent normal tissue were available for sequencing. Joint analysis using lumosVar 2.0 detected several variants, including known cancer hotspot mutations, that were not detected by standard somatic variant calling tools using the adjacent tissue as presumed normal reference. Together, these results demonstrate the utility of leveraging paired tissue samples to improve somatic variant calling when a constitutional sample is not available.
Affiliation(s)
- Rebecca F Halperin
- Quantitative Medicine and Systems Biology Division, Translational Genomics Research Institute, Phoenix, AZ, United States
| | - Winnie S Liang
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, AZ, United States
| | - Sidharth Kulkarni
- Quantitative Medicine and Systems Biology Division, Translational Genomics Research Institute, Phoenix, AZ, United States
| | - Erica E Tassone
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, AZ, United States
| | - Jonathan Adkins
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, AZ, United States
| | - Daniel Enriquez
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, AZ, United States
| | | | | | - James Newell
- HonorHealth Scottsdale Shea Medical Center, Scottsdale, AZ, United States
| | - Chinnappa Kodira
- GE Global Research Center, Niskayuna, NY, United States.,PureTech Health, Boston, MA, United States
| | - Ronald Korn
- Imaging Endpoints, Scottsdale, AZ, United States.,HonorHealth Scottsdale Shea Medical Center, Scottsdale, AZ, United States
| | - Michael E Berens
- Cancer and Cell Biology Division, Translational Genomics Research Institute, Phoenix, AZ, United States
| | - Seungchan Kim
- Prairie View A&M University, Prairie View, TX, United States
| | - Sara A Byron
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, AZ, United States
|
68
|
Luo P, Ding Y, Lei X, Wu FX. deepDriver: Predicting Cancer Driver Genes Based on Somatic Mutations Using Deep Convolutional Neural Networks. Front Genet 2019; 10:13. [PMID: 30761181 PMCID: PMC6361806 DOI: 10.3389/fgene.2019.00013] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 08/02/2018] [Accepted: 01/11/2019] [Indexed: 12/16/2022] Open
Abstract
With the advances in high-throughput technologies, millions of somatic mutations have been reported in the past decade. Identifying driver genes with oncogenic mutations from these data is a critical and challenging problem. Many computational methods have been proposed to predict driver genes. Among them, machine learning-based methods usually train a classifier with representations that concatenate various types of features extracted from different kinds of data. Although successful, simply concatenating different types of features may not be the best way to fuse these data. We notice that a few types of data characterize the similarities of genes. To better integrate them with other data and improve the accuracy of driver gene prediction, this study proposes a deep learning-based method (deepDriver) that performs convolution on mutation-based features of genes and their neighbors in the similarity networks. The method allows the convolutional neural network to learn information within mutation data and similarity networks simultaneously, which enhances the prediction of driver genes. deepDriver achieves AUC scores of 0.984 and 0.976 on breast cancer and colorectal cancer, respectively, which are superior to those of the competing algorithms. Further evaluations of the top 10 predictions also demonstrate that deepDriver is valuable for predicting new driver genes.
Affiliation(s)
- Ping Luo
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
| | - Yulian Ding
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
| | - Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xian, China
| | - Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada.,School of Mathematics and Statistics, Hainan Normal University, Haikou, China.,Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada.,Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
| |
Collapse
|
69
|
Precision medicine review: rare driver mutations and their biophysical classification. Biophys Rev 2019; 11:5-19. [PMID: 30610579 PMCID: PMC6381362 DOI: 10.1007/s12551-018-0496-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 12/04/2018] [Accepted: 12/18/2018] [Indexed: 02/07/2023] Open
Abstract
How can biophysical principles help precision medicine identify rare driver mutations? A major tenet of pragmatic approaches to precision oncology and pharmacology is that driver mutations are very frequent. However, frequency is a statistical attribute, not a mechanistic one. Rare mutations can also act through the same mechanism, and as we discuss below, “latent driver” mutations may also follow the same route, with “helper” mutations. Here, we review how biophysics provides mechanistic guidelines that extend precision medicine. We outline principles and strategies, especially focusing on mutations that drive cancer. Biophysics has contributed profoundly to deciphering biological processes. However, driven by data science, precision medicine has skirted some of its major tenets. Data science embodies genomics, tissue- and cell-specific expression levels, making it capable of defining genome- and systems-wide molecular disease signatures. It classifies cancer driver genes/mutations and affected pathways, and its associated protein structural data guide drug discovery. Biophysics complements data science. It considers structures and their heterogeneous ensembles, explains how mutational variants can signal through distinct pathways, and how allo-network drugs can be harnessed. Biophysics clarifies how one mutation, frequent or rare, can affect multiple phenotypic traits by populating conformations that favor interactions with other network modules. It also suggests how to identify such mutations and their signaling consequences. Biophysics offers principles and strategies that can help precision medicine push the boundaries to transform our insight into biological processes and the practice of personalized medicine. By contrast, “phenotypic drug discovery,” which capitalizes on physiological cellular conditions and first-in-class drug discovery, may not capture the proper molecular variant. This is because variants of the same protein can express more than one phenotype, and a phenotype can be encoded by several variants.
|
70
|
Abstract
Network-aided in silico approaches have been widely used for prediction of drug-target interactions and evaluation of drug safety to increase the clinical efficiency and productivity during drug discovery and development. Here we review the advances and new progress in this field and summarize the translational applications of several new network-aided in silico approaches we developed recently. In addition, we describe the detailed protocols for a network-aided drug repositioning infrastructure for identification of new targets for old drugs, failed drugs in clinical trials, and new chemical entities. These state-of-the-art network-aided in silico approaches have been used for the discovery and development of broad-acting and targeted clinical therapies for various complex diseases, in particular for oncology drug repositioning. In this chapter, the described network-aided in silico protocols are appropriate for target-centric drug repositioning to various complex diseases, but expertise is still necessary to perform the specific oncology projects based on the cancer targets of interest.
|
71
|
Frost HR, Amos CI. A multi-omics approach for identifying important pathways and genes in human cancer. BMC Bioinformatics 2018; 19:479. [PMID: 30541428 PMCID: PMC6292115 DOI: 10.1186/s12859-018-2476-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 08/01/2017] [Accepted: 11/09/2018] [Indexed: 12/15/2022] Open
Abstract
Background Cancer develops when pathways controlling cell survival, cell fate or genome maintenance are disrupted by the somatic alteration of key driver genes. Understanding how pathway disruption is driven by somatic alterations is thus essential for an accurate characterization of cancer biology and identification of therapeutic targets. Unfortunately, current cancer pathway analysis methods fail to fully model the relationship between somatic alterations and pathway activity. Results To address these limitations, we developed a multi-omics method for identifying biologically important pathways and genes in human cancer. Our approach combines single-sample pathway analysis with multi-stage, lasso-penalized regression to find pathways whose gene expression can be explained largely in terms of gene-level somatic alterations in the tumor. Importantly, this method can analyze case-only data sets, does not require information regarding pathway topology and supports personalized pathway analysis using just somatic alteration data for a limited number of cancer-associated genes. The practical effectiveness of this technique is illustrated through an analysis of data from The Cancer Genome Atlas using gene sets from the Molecular Signatures Database. Conclusions Novel insights into the pathophysiology of human cancer can be obtained from statistical models that predict expression-based pathway activity in terms of non-silent somatic mutations and copy number variation. These models enable the identification of biologically important pathways and genes and support personalized pathway analysis in cases where gene expression data is unavailable.
Affiliation(s)
- H Robert Frost
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, 03755, NH, USA.
| | - Christopher I Amos
- Department of Biomedical Data Science, Geisel School of Medicine, Dartmouth College, Hanover, 03755, NH, USA
|
72
|
Fang J, Liu C, Wang Q, Lin P, Cheng F. In silico polypharmacology of natural products. Brief Bioinform 2018; 19:1153-1171. [PMID: 28460068 DOI: 10.1093/bib/bbx045] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 01/06/2017] [Indexed: 01/03/2025] Open
Abstract
Natural products with polypharmacological profiles have demonstrated promise as novel therapeutics for various complex diseases, including cancer. Currently, many gaps exist in our knowledge of which compounds interact with which targets, and experimentally testing all possible interactions is infeasible. Recent advances in systems pharmacology and computational (in silico) approaches provide powerful tools for exploring the polypharmacological profiles of natural products. In this review, we introduce recent progress in computational tools and systems pharmacology approaches for identifying drug targets of natural products, focusing on the development of targeted cancer therapy. We survey the polypharmacological and systems immunology profiles of five representative natural products that are being considered as cancer therapies. We summarize various chemoinformatics, bioinformatics and systems biology resources for reconstructing drug-target networks of natural products. We then review currently available computational approaches and tools for prediction of drug-target interactions by focusing on five domains: target-based, ligand-based, chemogenomics-based, network-based and omics-based systems biology approaches. In addition, we describe a practical example of the application of systems pharmacology approaches by integrating the polypharmacology of natural products and large-scale cancer genomics data for the development of precision oncology under the systems biology framework. Finally, we highlight the promise of cancer immunotherapies and combination therapies that target tumor ecosystems (e.g. clones or 'selfish' sub-clones) via exploiting the immunological and inflammatory 'side' effects of natural products in the cancer post-genomics era.
Affiliation(s)
- Jiansong Fang
- Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou, China
| | - Chuang Liu
- Alibaba Research Center for Complexity Sciences at the Hangzhou Normal University, Hangzhou, China
| | - Qi Wang
- Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou, China
| | - Ping Lin
- National Key Laboratory of Biotherapy and Cancer Center, West China Hospital, West China Medical School, Sichuan University, Chengdu, Sichuan, China
| | - Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University Medical Center in Nashville (United States)
| |
Collapse
|
73
|
Agajanian S, Odeyemi O, Bischoff N, Ratra S, Verkhivker GM. Machine Learning Classification and Structure–Functional Analysis of Cancer Mutations Reveal Unique Dynamic and Network Signatures of Driver Sites in Oncogenes and Tumor Suppressor Genes. J Chem Inf Model 2018; 58:2131-2150. [DOI: 10.1021/acs.jcim.8b00414] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/29/2022]
Affiliation(s)
- Steve Agajanian
  - Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
- Oluyemi Odeyemi
  - Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
- Nathaniel Bischoff
  - Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
- Simrath Ratra
  - Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
- Gennady M. Verkhivker
  - Graduate Program in Computational and Data Sciences, Department of Computational Sciences, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, California 92866, United States
  - Chapman University, School of Pharmacy, Irvine, California 92618, United States
|
74
|
Ozturk K, Dow M, Carlin DE, Bejar R, Carter H. The Emerging Potential for Network Analysis to Inform Precision Cancer Medicine. J Mol Biol 2018; 430:2875-2899. [PMID: 29908887 PMCID: PMC6097914 DOI: 10.1016/j.jmb.2018.06.016] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 03/12/2018] [Revised: 05/30/2018] [Accepted: 06/06/2018] [Indexed: 12/19/2022]
Abstract
Precision cancer medicine promises to tailor clinical decisions to patients using genomic information. Indeed, successes of drugs targeting genetic alterations in tumors, such as imatinib, which targets BCR-ABL in chronic myelogenous leukemia, have demonstrated the power of this approach. However, biological systems are complex, and patients may differ not only by the specific genetic alterations in their tumor, but also by more subtle interactions among such alterations. Systems biology, and more specifically network analysis, provides a framework for advancing precision medicine beyond clinical actionability of individual mutations. Here we discuss applications of network analysis to study tumor biology, early methods for N-of-1 tumor genome analysis, and the path for such tools to the clinic.
Affiliation(s)
- Kivilcim Ozturk
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Michelle Dow
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Daniel E Carlin
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Rafael Bejar
  - Moores Cancer Center, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
- Hannah Carter
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
  - Moores Cancer Center and Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - CIFAR, MaRS Centre, West Tower, 661 University Ave., Suite 505, Toronto, ON M5G 1M1, Canada
|
75
|
Computational Approaches to Prioritize Cancer Driver Missense Mutations. Int J Mol Sci 2018; 19:ijms19072113. [PMID: 30037003 PMCID: PMC6073793 DOI: 10.3390/ijms19072113] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/23/2018] [Revised: 07/02/2018] [Accepted: 07/05/2018] [Indexed: 12/31/2022] Open
Abstract
Cancer is a complex disease that is driven by genetic alterations. There has been a rapid development of genome-wide techniques during the last decade along with a significant lowering of the cost of gene sequencing, which has generated widely available cancer genomic data. However, the interpretation of genomic data and the prediction of the association of genetic variations with cancer and disease phenotypes still require significant improvement. Missense mutations, which can render proteins non-functional and provide a selective growth advantage to cancer cells, are frequently detected in cancer. Effects caused by missense mutations can be pinpointed by in silico modeling, which makes it more feasible to find a treatment and reverse the effect. Specific human phenotypes are largely determined by stability, activity, and interactions between proteins and other biomolecules that work together to execute specific cellular functions. Therefore, analysis of the effects of missense mutations on proteins and their complexes would provide important clues for identifying functionally important missense mutations, understanding the molecular mechanisms of cancer progression and facilitating treatment and prevention. Herein, we summarize the major computational approaches and tools that provide not only the classification of missense mutations as cancer drivers or passengers but also the molecular mechanisms induced by driver mutations. This review focuses on annotation and prediction methods based on structural and biophysical data, analysis of somatic cancer missense mutations in 3D structures of proteins and their complexes, predictions of the effects of missense mutations on protein stability and on protein-protein and protein-nucleic acid interactions, and assessment of conformational changes in proteins induced by mutations.
|
76
|
NIPS, a 3D network-integrated predictor of deleterious protein SAPs, and its application in cancer prognosis. Sci Rep 2018; 8:6021. [PMID: 29662108 PMCID: PMC5902451 DOI: 10.1038/s41598-018-24286-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/12/2017] [Accepted: 03/27/2018] [Indexed: 12/20/2022] Open
Abstract
Identifying deleterious mutations remains a challenge in cancer genome sequencing projects, reflecting the vast number of candidate mutations per tumour and the existence of interpatient heterogeneity. Based on a 3D protein interaction network profiled via large-scale cross-linking mass spectrometry, we propose a weighted average formula that combines three types of information into a 'meta-score'. We assume that a single amino acid polymorphism (SAP) may have a deleterious effect if the mutation rarely occurs naturally during evolution, if it inhibits binding between a pair of interacting proteins when located at their interface, or if it plays an important role in a protein-protein interaction (PPI) network. Cross-validation indicated that this new method achieves an AUC value of 0.93 and outperforms other widely used tools. The application of this method to the CPTAC colorectal cancer dataset enabled the accurate identification of validated deleterious mutations and yielded insights into their potential pathogenesis. Survival analysis showed that the accumulation of deleterious SAPs is significantly associated with a poor prognosis. The new method provides an alternative way to identify and rank deleterious cancer SAPs based on a 3D PPI network and will contribute to the understanding of pathogenesis and the discovery of prognostic biomarkers.
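As a rough illustration of the weighted-average 'meta-score' idea described in this abstract, the sketch below combines three component scores into one value. The abstract does not give the formula, the weights, or the scaling of the components, so the function name `meta_score`, the equal default weights, and the assumption that each component is scaled to [0, 1] are all hypothetical.

```python
def meta_score(conservation, interface, network, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of three component scores.

    All three inputs are ASSUMED to be pre-scaled to [0, 1]; the weights are
    hypothetical placeholders, not the values used by the NIPS authors.
    """
    scores = (conservation, interface, network)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("component scores are assumed to lie in [0, 1]")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise by the weight total so the result stays in [0, 1].
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


# Hypothetical SAP: rarely seen in evolution, at an interface, peripheral in the network.
example = meta_score(conservation=0.9, interface=0.5, network=0.1)
```

Dividing by the weight total means the weights need not sum to one, which makes it easy to experiment with different emphasis on each information type.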
|
77
|
Gao B, Li G, Liu J, Li Y, Huang X. Identification of driver modules in pan-cancer via coordinating coverage and exclusivity. Oncotarget 2018; 8:36115-36126. [PMID: 28415609 PMCID: PMC5482642 DOI: 10.18632/oncotarget.16433] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 02/04/2017] [Accepted: 03/13/2017] [Indexed: 12/30/2022] Open
Abstract
It is widely accepted that cancer is driven by somatic mutations accumulated during the lifetime of an individual. Cancer mutations may target a relatively small number of cellular functional modules. The heterogeneity among cancer patients makes it difficult to identify driver mutations or functional modules related to cancer. It is biologically desirable to identify cancer pathway modules by coordinating coverage and exclusivity. A few approaches have been developed for this purpose, but they all have practical limitations in computational complexity and prediction accuracy. We present a network-based approach, CovEx, to predict patient-oriented modules by 1) discovering candidate modules for each considered gene, 2) extracting significant candidates by harmonizing coverage and exclusivity, and 3) further selecting the patient-oriented modules based on a set cover model. Applied to pan-cancer datasets spanning 12 cancer types collected from the public database TCGA, CovEx demonstrates significant superiority over the current leading competitors in performance. It is published under the GNU General Public License and the source code is available at: https://sourceforge.net/projects/cancer-pathway/files/
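To make the two quantities named in this abstract concrete, the sketch below computes coverage and exclusivity for a candidate gene module using one common formulation: coverage is the fraction of patients with at least one mutated module gene, and exclusivity is the fraction of covered patients with exactly one. This is an assumption for illustration; the abstract does not state CovEx's own definitions, and the function name and toy data are hypothetical.

```python
def coverage_and_exclusivity(mutations, module):
    """Return (coverage, exclusivity) of a gene module.

    mutations: dict mapping patient id -> set of mutated genes.
    module:    iterable of gene symbols forming the candidate module.
    Definitions follow a common textbook formulation, NOT necessarily CovEx's.
    """
    module = set(module)
    # Number of module genes mutated in each patient.
    hits = [len(genes & module) for genes in mutations.values()]
    covered = sum(1 for h in hits if h >= 1)
    exclusive = sum(1 for h in hits if h == 1)
    coverage = covered / len(hits) if hits else 0.0
    exclusivity = exclusive / covered if covered else 0.0
    return coverage, exclusivity


# Toy cohort (hypothetical data): four patients, module of two genes.
cohort = {
    "p1": {"TP53"},
    "p2": {"KRAS"},
    "p3": {"TP53", "KRAS"},
    "p4": {"BRCA1"},
}
cov, excl = coverage_and_exclusivity(cohort, ["TP53", "KRAS"])
# Three of four patients are covered; two of those three carry exactly one module gene.
```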
Affiliation(s)
- Bo Gao
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
  - Department of Computer Science, Arkansas State University, Jonesboro, Arkansas, 72401, USA
- Guojun Li
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
  - Department of Computer Science, Arkansas State University, Jonesboro, Arkansas, 72401, USA
- Juntao Liu
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Yang Li
  - School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Xiuzhen Huang
  - Department of Computer Science, Arkansas State University, Jonesboro, Arkansas, 72401, USA
  - Molecular Biosciences Program, Arkansas State University, Jonesboro, Arkansas, 72401, USA
|
78
|
Bertrand D, Drissler S, Chia BK, Koh JY, Li C, Suphavilai C, Tan IB, Nagarajan N. ConsensusDriver Improves upon Individual Algorithms for Predicting Driver Alterations in Different Cancer Types and Individual Patients. Cancer Res 2017; 78:290-301. [PMID: 29259006 DOI: 10.1158/0008-5472.can-17-1345] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 05/11/2017] [Revised: 08/02/2017] [Accepted: 10/24/2017] [Indexed: 11/16/2022]
Abstract
Existing cancer driver prediction methods are based on very different assumptions and each of them can detect only a particular subset of driver genes. Here we perform a comprehensive assessment of 18 driver prediction methods on more than 3,400 tumor samples from 15 cancer types, all to determine their suitability in guiding precision medicine efforts. We categorized these methods into five groups: functional impact on proteins in general (FI) or specific to cancer (FIC), cohort-based analysis for recurrent mutations (CBA), mutations with expression correlation (MEC), and methods that use gene interaction network-based analysis (INA). The performance of driver prediction methods varied considerably, with concordance with a gold standard varying from 9% to 68%. FI methods showed relatively poor performance (concordance <22%), while CBA methods provided conservative results but required large sample sizes for high sensitivity. INA methods, through the integration of genomic and transcriptomic data, and FIC methods, by training cancer-specific models, provided the best trade-off between sensitivity and specificity. As the methods were found to predict different subsets of driver genes, we propose a novel consensus-based approach, ConsensusDriver, which significantly improves the quality of predictions (20% increase in sensitivity) in patient subgroups or even individual patients. Consensus-based methods like ConsensusDriver promise to harness the strengths of different driver prediction paradigms. Significance: These findings assess state-of-the-art cancer driver prediction methods and develop a new and improved consensus-based approach for use in precision oncology. Cancer Res; 78(1); 290-301. ©2017 AACR.
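The general idea of a consensus across driver prediction methods can be sketched with a simple Borda count over ranked gene lists. This is a stand-in technique chosen for illustration only: the abstract does not describe how ConsensusDriver aggregates predictions, and the function name and example rankings below are hypothetical.

```python
from collections import defaultdict


def borda_consensus(rankings):
    """Aggregate ranked gene lists (best first) with a Borda count.

    Each method awards n points to its top gene, n-1 to the next, and so on,
    where n is the length of that method's list. Genes a method does not
    report receive no points from it. Illustrative only, not ConsensusDriver.
    """
    points = defaultdict(float)
    for ranked in rankings:
        n = len(ranked)
        for position, gene in enumerate(ranked):
            points[gene] += n - position
    # Sort by descending points; break ties alphabetically for determinism.
    return sorted(points, key=lambda g: (-points[g], g))


# Hypothetical outputs of three driver prediction methods.
consensus = borda_consensus([
    ["TP53", "KRAS", "PTEN"],
    ["KRAS", "TP53"],
    ["TP53", "EGFR"],
])
```

A rank-based vote like this rewards genes that several methods agree on, which mirrors the motivation given in the abstract that individual methods each recover only a subset of drivers.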
Affiliation(s)
- Denis Bertrand
  - Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Sibyl Drissler
  - Computational and Systems Biology, Genome Institute of Singapore, Singapore
  - Terry Fox Laboratory, BC Cancer Agency, British Columbia, Canada
- Burton K Chia
  - Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Jia Yu Koh
  - Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Chenhao Li
  - Computational and Systems Biology, Genome Institute of Singapore, Singapore
- Chayaporn Suphavilai
  - Computational and Systems Biology, Genome Institute of Singapore, Singapore
  - Department of Computer Science, School of Computing, National University of Singapore, Singapore
- Iain Beehuat Tan
  - Division of Medical Oncology, National Cancer Centre Singapore, Singapore
  - Cancer Therapeutics & Stratified Oncology, Genome Institute of Singapore, Singapore
- Niranjan Nagarajan
  - Computational and Systems Biology, Genome Institute of Singapore, Singapore
|
79
|
Cheng F, Hong H, Yang S, Wei Y. Individualized network-based drug repositioning infrastructure for precision oncology in the panomics era. Brief Bioinform 2017; 18:682-697. [PMID: 27296652 DOI: 10.1093/bib/bbw051] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 03/01/2016] [Indexed: 12/12/2022] Open
Abstract
Advances in next-generation sequencing technologies have generated data supporting a large volume of somatic alterations in several national and international cancer genome projects, such as The Cancer Genome Atlas and the International Cancer Genome Consortium. These cancer genomics data have facilitated the revolution of a novel oncology drug discovery paradigm from candidate target or gene studies toward targeting clinically relevant driver mutations or molecular features for precision cancer therapy. This paradigm focuses on identifying the most appropriate targeted therapy for an individual patient harboring a particular genetic profile or molecular feature. However, traditional experimental approaches that are used to develop new chemical entities for targeting the clinically relevant driver mutations are costly and high-risk. Drug repositioning, also known as drug repurposing, re-tasking or re-profiling, has been demonstrated as a promising strategy for drug discovery and development. Recently, computational techniques and methods have been proposed for oncology drug repositioning and identifying pharmacogenomics biomarkers, but overall progress remains to be seen. In this review, we focus on introducing new developments and advances in individualized network-based drug repositioning approaches that target the clinically relevant driver events or molecular features derived from cancer panomics data for the development of precision oncology drug therapies (e.g. one-person trials) to fully realize the promise of precision medicine. We discuss several potential challenges (e.g. tumor heterogeneity and cancer subclones) for precision oncology. Finally, we highlight several new directions for precision oncology drug discovery via biotherapies (e.g. gene therapy and immunotherapy) that target the 'undruggable' cancer genome in the functional genomics era.
|
82
|
Lu X, Lu J, Liao B, Li X, Qian X, Li K. Driver pattern identification over the gene co-expression of drug response in ovarian cancer by integrating high throughput genomics data. Sci Rep 2017; 7:16188. [PMID: 29170526 PMCID: PMC5700962 DOI: 10.1038/s41598-017-16286-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 07/05/2017] [Accepted: 11/09/2017] [Indexed: 01/08/2023] Open
Abstract
Multiple types of high throughput genomics data create a potential opportunity to identify driver patterns in ovarian cancer, which may yield novel clinical biomarkers for appropriate diagnosis and treatment of cancer patients. To identify candidate driver genes and the corresponding driving patterns for resistant and sensitive tumors from the heterogeneous data, we combined gene co-expression modules with mutation modulators and proposed a method to identify driver patterns. Firstly, co-expression network analysis is applied to explore gene modules in gene expression profiles through weighted correlation network analysis (WGCNA). Secondly, a mutation matrix is generated by integrating the CNV data and somatic mutation data, and a mutation network is constructed from the mutation matrix. Thirdly, candidate modulators are selected from significant genes by clustering vertices of the mutation network. Finally, a regression tree model is utilized for module network learning, in which the obtained gene modules and candidate modulators are trained for driving pattern identification and exploration of modulator regulation. Many identified candidate modulators are known to be involved in biologically meaningful processes associated with ovarian cancer, such as CCL11, CCL16, CCL18, CCL23, CCL8, CCL5, APOB, BRCA1, SLC18A1, FGF22, GADD45B, GNA15, GNA11, and so on.
Affiliation(s)
- Xinguo Lu
  - College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Rd., Changsha, 410082, China
- Jibo Lu
  - College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Rd., Changsha, 410082, China
- Bo Liao
  - College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Rd., Changsha, 410082, China
- Xing Li
  - College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Rd., Changsha, 410082, China
- Xin Qian
  - College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Rd., Changsha, 410082, China
- Keqin Li
  - College of Computer Science and Electronic Engineering, Hunan University, Lushan Nan Rd., Changsha, 410082, China
  - Department of Computer Science, State University of New York, New Paltz, NY, 12561, USA
|
83
|
Zhang T, Zhang D. Integrating omics data and protein interaction networks to prioritize driver genes in cancer. Oncotarget 2017; 8:58050-58060. [PMID: 28938536 PMCID: PMC5601632 DOI: 10.18632/oncotarget.19481] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/19/2017] [Accepted: 06/19/2017] [Indexed: 11/25/2022] Open
Abstract
Although numerous approaches have been proposed to discern drivers from passengers, identification of driver genes remains a critical challenge in the cancer genomics field. Driver genes with low mutation frequency tend to be filtered out in cancer research. In addition, the accumulation of different omics data necessitates the development of algorithmic frameworks for nominating putative driver genes. In this study, we presented a novel framework to identify driver genes by integrating multi-omics data such as somatic mutation, gene expression, and copy number alterations. We developed a computational approach to detect potential driver genes by virtue of their effect on their neighbors in the network. Applied to three datasets from The Cancer Genome Atlas (TCGA), namely head and neck squamous cell carcinoma (HNSC), thyroid carcinoma (THCA) and kidney renal clear cell carcinoma (KIRC), our method outperformed DriverNet and MUFFINN on all three in terms of Precision, Recall and F1 score. In addition, our method was less affected by protein length compared with DriverNet. Lastly, our method not only identified the known cancer genes but also detected the potential rare drivers (PTPN6 in THCA, SRC, GRB2 and PTPN6 in KIRC, MAPK1 and SMAD2 in HNSC).
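The evaluation metrics named in this abstract (Precision, Recall and F1 score) can be computed from a predicted gene list and a reference set of known cancer genes, as in the minimal sketch below. The function name and the gene symbols in the example are hypothetical, and the abstract does not say which reference gene set or cut-off the authors used.

```python
def precision_recall_f1(predicted, reference):
    """Compare predicted driver genes with a reference set of known drivers.

    predicted: iterable of gene symbols returned by a method.
    reference: iterable of gene symbols treated as the gold standard.
    Returns (precision, recall, f1); each is 0.0 when its denominator is zero.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical example: four predictions, three reference genes, two in common.
p, r, f1 = precision_recall_f1(
    predicted=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    reference=["GENE_A", "GENE_B", "GENE_E"],
)
```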
Affiliation(s)
- Tiejun Zhang
  - GMU-GIBH Joint School of Life Sciences, Guangzhou Medical University, Guangzhou, Guangdong 511436, China
- Di Zhang
  - School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
|
84
|
Breast Cancer Risk Associated with Genotype Polymorphisms of the Aurora Kinase a Gene (AURKA): a Case-Control Study in a High Altitude Ecuadorian Mestizo Population. Pathol Oncol Res 2017. [PMID: 28647900 DOI: 10.1007/s12253-017-0267-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
Abstract
Breast cancer (BC) was the leading cause of cancer-related death among women in 2014. The AURKA gene, which encodes the protein Aurora kinase A, plays an important role in the progression of the cell cycle by controlling and promoting entry into mitosis. The single nucleotide polymorphism AURKA T91A (rs2273535) (Phe21Ile) has been identified as a functional variant of this kinase; the Ile allele is associated with the occurrence of chromosome segregation errors and tumor progression. Therefore, it is essential to know how BC risk is associated with histopathological characteristics, immunohistochemical characteristics, and genotype polymorphism in a high altitude Ecuadorian mestizo population. In this retrospective case-control study, 200 individuals were analyzed. DNA was extracted from 100 healthy and 100 affected women. Genotypes were determined by genomic sequencing. We found a significant association between the AURKA T91A (rs2273535) (Phe21Ile) genotype and an increased risk of BC development: Phe/Ile (odds ratio [OR] = 2.6; 95% confidence interval [CI] = 1.4-4.9; P = 0.004), Ile/Ile (OR = 3.8; 95% CI = 1.6-9.0; P = 0.002), and Phe/Ile + Ile/Ile (OR = 2.9; 95% CI = 1.6-5.2; P = 0.001). Additionally, the rs2273535 variant was associated with the tumor grade SBR III (OR = 9.6; 95% CI = 1.0-91.9; P = 0.048) and Ki-67 ≥ 20 (OR = 16.5; 95% CI = 2.7-101.3; P = 0.002). In brief, this study provides the first evidence that the Ile allele of the AURKA gene could act as a potentially predictive biomarker of BC in the high altitude Ecuadorian mestizo population that lives at 2800 m above sea level (masl).
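For readers unfamiliar with how odds ratios and 95% confidence intervals of the kind reported in this abstract are obtained from a case-control table, the sketch below uses the standard log-odds (Woolf) approximation. The genotype counts are not given in the abstract, so the counts in the example are invented purely for illustration and do not reproduce the study's results; the authors may also have used a different interval method.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald-type confidence interval on the log scale.

    a: cases with the risk genotype      b: cases without it
    c: controls with the risk genotype   d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts for 100 cases and 100 controls (NOT the study's data).
or_value, ci_low, ci_high = odds_ratio_ci(a=60, b=40, c=35, d=65)
```

An interval that excludes 1 corresponds to a statistically significant association at roughly the 5% level, which is how the reported intervals such as 1.4-4.9 are read.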
|
85
|
Fang J, Cai C, Wang Q, Lin P, Zhao Z, Cheng F. Systems Pharmacology-Based Discovery of Natural Products for Precision Oncology Through Targeting Cancer Mutated Genes. CPT-PHARMACOMETRICS & SYSTEMS PHARMACOLOGY 2017; 6:177-187. [PMID: 28294568 PMCID: PMC5356618 DOI: 10.1002/psp4.12172] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 11/11/2016] [Revised: 01/09/2017] [Accepted: 01/10/2017] [Indexed: 02/05/2023]
Abstract
Massive cancer genomics data have facilitated the rapid revolution of a novel oncology drug discovery paradigm through targeting clinically relevant driver genes or mutations for the development of precision oncology. Natural products with polypharmacological profiles have been demonstrated as promising agents for the development of novel cancer therapies. In this study, we developed an integrated systems pharmacology framework that facilitated identifying potential natural products that target mutated genes across 15 cancer types or subtypes in the realm of precision medicine. High performance was achieved for our systems pharmacology framework. In case studies, we computationally identified novel anticancer indications for several US Food and Drug Administration-approved or clinically investigational natural products (e.g., resveratrol, quercetin, genistein, and fisetin) through targeting significantly mutated genes in multiple cancer types. In summary, this study provides a powerful tool for the development of molecularly targeted cancer therapies through targeting the clinically actionable alterations by exploiting the systems pharmacology of natural products.
Collapse
Affiliation(s)
- J Fang
- Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou, P.R. China
| | - C Cai
- Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou, P.R. China
| | - Q Wang
- Institute of Clinical Pharmacology, Guangzhou University of Chinese Medicine, Guangzhou, P.R. China
| | - P Lin
- State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, Sichuan, P.R. China
| | - Z Zhao
- Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA.,Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - F Cheng
- State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu, Sichuan, P.R. China.,Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, USA.,Center for Complex Networks Research, Northeastern University, Boston, Massachusetts, USA
|
86
|
Mounika Inavolu S, Renbarger J, Radovich M, Vasudevaraja V, Kinnebrew GH, Zhang S, Cheng L. IODNE: An integrated optimization method for identifying the deregulated subnetwork for precision medicine in cancer. CPT-PHARMACOMETRICS & SYSTEMS PHARMACOLOGY 2017; 6:168-176. [PMID: 28266149 PMCID: PMC5351413 DOI: 10.1002/psp4.12167] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 12/05/2016] [Revised: 01/05/2017] [Accepted: 01/06/2017] [Indexed: 12/18/2022]
Abstract
Subnetwork analysis can explore complex patterns of entire molecular pathways for the purpose of drug target identification. In this article, the gene expression profiles of a cohort of patients with breast cancer are integrated with protein‐protein interaction (PPI) networks using, simultaneously, both edge scoring and node scoring. A novel optimization algorithm, integrated optimization method to identify deregulated subnetwork (IODNE), is developed to search for the optimal dysregulated subnetwork of the merged gene and protein network. IODNE is applied to select subnetworks for Luminal‐A breast cancer from The Cancer Genome Atlas (TCGA) data. A large fraction of cancer‐related genes and the well‐known clinical targets, ER1/PR and HER2, are found by IODNE. This validates the utility of IODNE. When applying IODNE to the triple‐negative breast cancer (TNBC) subtype data, we identified subnetworks that contain genes such as ERBB2, HRAS, PGR, CAD, POLE, and SLC2A1.
Affiliation(s)
- S Mounika Inavolu
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA.,Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA
| | - J Renbarger
- Department of Pediatrics, Hematology/Oncology, School of Medicine, Indiana University, Indianapolis, Indiana, USA
| | - M Radovich
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA.,Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA
| | - V Vasudevaraja
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA.,Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA
| | - G H Kinnebrew
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA.,Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA
| | - S Zhang
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA.,Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA
| | - L Cheng
- Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, Indiana, USA.,Department of Medical and Molecular Genetics, School of Medicine, Indiana University, Indianapolis, Indiana, USA.,Department of Pediatrics, Hematology/Oncology, School of Medicine, Indiana University, Indianapolis, Indiana, USA
|
87
|
Proteome-Scale Investigation of Protein Allosteric Regulation Perturbed by Somatic Mutations in 7,000 Cancer Genomes. Am J Hum Genet 2017; 100:5-20. [PMID: 27939638 DOI: 10.1016/j.ajhg.2016.09.020] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 05/27/2016] [Accepted: 09/27/2016] [Indexed: 02/05/2023] Open
Abstract
The allosteric regulation triggering the protein's functional activity via conformational changes is an intrinsic function of protein under many physiological and pathological conditions, including cancer. Identification of the biological effects of specific somatic variants on allosteric proteins and the phenotypes that they alter during tumor initiation and progression is a central challenge for cancer genomes in the post-genomic era. Here, we mapped more than 47,000 somatic missense mutations observed in approximately 7,000 tumor-normal matched samples across 33 cancer types into protein allosteric sites to prioritize the mutated allosteric proteins and we tested our prediction in cancer cell lines. We found that the deleterious mutations identified in cancer genomes were more significantly enriched at protein allosteric sites than tolerated mutations, suggesting a critical role for protein allosteric variants in cancer. Next, we developed a statistical approach, namely AlloDriver, and further identified 15 potential mutated allosteric proteins during pan-cancer and individual cancer-type analyses. More importantly, we experimentally confirmed that p.Pro360Ala on PDE10A played a potential oncogenic role in mediating tumorigenesis in non-small cell lung cancer (NSCLC). In summary, these findings shed light on the role of allosteric regulation during tumorigenesis and provide a useful tool for the timely development of targeted cancer therapies.
|
88
|
Xi J, Wang M, Li A. Discovering potential driver genes through an integrated model of somatic mutation profiles and gene functional information. MOLECULAR BIOSYSTEMS 2017; 13:2135-2144. [DOI: 10.1039/c7mb00303j] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 01/03/2025]
Abstract
An integrated approach to identify driver genes based on information of somatic mutations, the interaction network and Gene Ontology similarity.
Affiliation(s)
- Jianing Xi
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China
| | - Minghui Wang
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China.,Centers for Biomedical Engineering
| | - Ao Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei AH 230027, People’s Republic of China.,Centers for Biomedical Engineering
|
89
|
Wei PJ, Zhang D, Xia J, Zheng CH. LNDriver: identifying driver genes by integrating mutation and expression data based on gene-gene interaction network. BMC Bioinformatics 2016; 17:467. [PMID: 28155630 PMCID: PMC5259866 DOI: 10.1186/s12859-016-1332-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Cancer is a complex disease characterized by the accumulation of genetic alterations during the patient's lifetime. With the development of next-generation sequencing technology, multiple omics data, such as cancer genomic, epigenomic, and transcriptomic data, can be measured from each individual. Correspondingly, one of the key challenges is to pinpoint functional driver mutations or pathways, which contribute to tumorigenesis, from millions of functionally neutral passenger mutations. RESULTS In this paper, in order to identify driver genes effectively, we applied a generalized additive model to mutation profiles to filter genes with long length and constructed a new gene-gene interaction network. Then we integrated the mutation data and expression data into the gene-gene interaction network. Lastly, a greedy algorithm was used to prioritize candidate driver genes from the integrated data. We named the proposed method Length-Net-Driver (LNDriver). CONCLUSIONS Experiments on three TCGA datasets, i.e., head and neck squamous cell carcinoma, kidney renal clear cell carcinoma, and thyroid carcinoma, demonstrated that the proposed method was effective. It can also identify not only frequently mutated drivers but also rare candidate driver genes.
Affiliation(s)
- Pi-Jing Wei
- College of Electrical Engineering and Automation, Anhui University, Hefei, Anhui 230601 China
| | - Di Zhang
- College of Computer Science and Technology, Anhui University, Hefei, Anhui 230601 China
| | - Junfeng Xia
- Institute of Health Sciences, Anhui University, Hefei, Anhui 230601 China
| | - Chun-Hou Zheng
- College of Computer Science and Technology, Anhui University, Hefei, Anhui 230601 China
|
90
|
Abstract
Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning-based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future.
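The point about reported P values deviating from the uniform distribution expected under the null can be illustrated with a simple goodness-of-fit check. The sketch below is not the evaluation framework described in the abstract; it assumes a one-sample Kolmogorov-Smirnov distance against Uniform(0, 1) and the asymptotic 5% critical value 1.36/sqrt(n). The function names and the toy inputs are hypothetical.

```python
from math import sqrt


def ks_uniform_statistic(p_values):
    """Kolmogorov-Smirnov distance between the empirical CDF and Uniform(0, 1)."""
    xs = sorted(p_values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one p-value")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # At x the empirical CDF jumps from (i - 1) / n to i / n,
        # while the Uniform(0, 1) CDF equals x.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


def looks_non_uniform(p_values, critical_constant=1.36):
    """Flag a set of p-values whose KS distance exceeds c / sqrt(n).

    Assumption: 1.36 is the asymptotic constant for a 5% test; it is only
    approximate for small n.
    """
    n = len(p_values)
    return ks_uniform_statistic(p_values) > critical_constant / sqrt(n)


# Toy inputs (hypothetical, not taken from any driver-gene method).
evenly_spread = [(i + 0.5) / 10 for i in range(10)]
piled_near_zero = [0.001] * 10

print(looks_non_uniform(evenly_spread))    # False
print(looks_non_uniform(piled_near_zero))  # True
```

In practice such a check would be run on the P values of genes assumed to be non-drivers, since a method that finds true drivers is expected to produce an excess of small values.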
|
91
|
Cheng F, Zhao J, Hanker AB, Brewer MR, Arteaga CL, Zhao Z. Transcriptome- and proteome-oriented identification of dysregulated eIF4G, STAT3, and Hippo pathways altered by PIK3CA H1047R in HER2/ER-positive breast cancer. Breast Cancer Res Treat 2016; 160:457-474. [PMID: 27771839 PMCID: PMC10183099 DOI: 10.1007/s10549-016-4011-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 08/02/2016] [Accepted: 10/05/2016] [Indexed: 01/25/2023]
Abstract
PURPOSE Phosphatidylinositol 3-kinase (PI3K)/AKT pathway aberrations are common in human breast cancer. Furthermore, PIK3CA mutations are commonly associated with resistance to anti-epidermal growth factor receptor 2 (HER2) or anti-estrogen receptor (ER) agents in HER2 or ER positive (HER2+/ER+) breast cancer. Hence, deciphering the underlying mechanisms of PIK3CA mutations in HER2+/ER+ breast cancer would provide novel insights into elucidating resistance to anti-HER2/ER therapies. METHODS In this study, we systematically investigated the biological consequences of PIK3CA H1047R in HER2+/ER+ breast cancer by uniquely incorporating mRNA transcriptomic data from The Cancer Genome Atlas and proteomic data from reverse-phase protein arrays. RESULTS Our integrative bioinformatics analyses revealed that several important pathways such as STAT3 and VEGF/hypoxia were selectively altered by PIK3CA H1047R in HER2+/ER+ breast cancer. Protein differential expression analysis indicated that an elevated eIF4G might promote tumor angiogenesis and growth via regulation of the hypoxia-activated switch in HER2+ PIK3CA H1047R breast cancer. We observed hypo-phosphorylation of EGFR in HER2+ PIK3CA H1047R breast cancer versus HER2+ PIK3CA wild-type (PIK3CA WT). In addition, ER and PIK3CA H1047R might cooperate to activate STAT3, MAPK, AKT, and Hippo pathways in ER+ PIK3CA H1047R breast cancer. A higher YAPpS127 level was observed in ER+ PIK3CA H1047R patients than in an ER+ PIK3CA WT subgroup. By examining breast cancer cell lines having both microarray gene expression and drug treatment data from the Genomics of Drug Sensitivity in Cancer and the Stand Up to Cancer datasets, we found that elevated YAP1 mRNA expression was associated with resistance to BCL-2 family inhibitors, but with sensitivity to MEK/MAPK inhibitors in breast cancer cells. CONCLUSIONS In summary, these findings shed light on the functional consequences of PIK3CA H1047R-driven breast tumorigenesis and resistance to the existing therapeutic agents in HER2+/ER+ breast cancer.
Affiliation(s)
- Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA.,Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 02215, USA.,Center for Complex Networks Research, Northeastern University, Boston, MA, 02115, USA
| | - Junfei Zhao
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA.,Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Ariella B Hanker
- Department of Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Monica Red Brewer
- Department of Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Carlos L Arteaga
- Department of Medicine, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.,Department of Cancer Biology, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.,Breast Cancer Research Program, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
| | - Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA.,Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.,Department of Cancer Biology, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.,Breast Cancer Research Program, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
|
92
|
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides. Sci Rep 2016; 6:32115. [PMID: 27558848 PMCID: PMC4997263 DOI: 10.1038/srep32115] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 04/06/2016] [Accepted: 08/02/2016] [Indexed: 12/19/2022] Open
Abstract
Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. This algorithm can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.
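The two validation schemes named in the abstract, leave-one-out and five-fold cross-validation, differ only in how the samples are partitioned. The sketch below shows that partitioning on sample indices; it is a generic illustration and not the sNebula evaluation code. The function names are hypothetical, and the folds are taken in order without shuffling, which is an assumption made to keep the example deterministic.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k folds whose sizes differ by at most one."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def leave_one_out_indices(n_samples):
    """Leave-one-out is the special case of k-fold with k equal to the sample count."""
    return k_fold_indices(n_samples, n_samples)


# Hypothetical usage: 10 HLA-peptide pairs, five folds of two pairs each.
for train, test in k_fold_indices(10, 5):
    print(test)
```

Each sample appears in exactly one test fold, so every prediction is made by a model that never saw that sample during training.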
|
93
|
Qi H, Dong C, Chung WK, Wang K, Shen Y. Deep Genetic Connection Between Cancer and Developmental Disorders. Hum Mutat 2016; 37:1042-50. [PMID: 27363847 DOI: 10.1002/humu.23040] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 04/04/2016] [Revised: 06/15/2016] [Accepted: 06/23/2016] [Indexed: 12/19/2022]
Abstract
Cancer and developmental disorders (DDs) share dysregulated cellular processes such as proliferation and differentiation. There are well-known genes implicated in both cancer and DDs. In this study, we aim to quantify this genetic connection using publicly available data. We found that among DD patients, germline damaging de novo variants are more enriched in cancer driver genes than in non-drivers. We estimate that cancer driver genes comprise about a third of DD risk genes. Additionally, de novo likely-gene-disrupting variants are more enriched in tumor suppressors, and about 40% of implicated de novo damaging missense variants are located in cancer somatic mutation hotspots, indicating that many genes have a similar mode of action in cancer and DDs. Our results suggest that we can view tumors as natural laboratories for assessing the deleterious effects of mutations that are applicable to germline variants and identification of causal genes and variants in DDs.
Affiliation(s)
- Hongjian Qi
- Department of Applied Physics and Applied Mathematics, Columbia University, New York, New York.,Department of Systems Biology, Columbia University Medical Center, New York, New York
| | - Chengliang Dong
- Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California.,Biostatistics Division, Department of Preventive Medicine, University of Southern California, Los Angeles, California
| | - Wendy K Chung
- Departments of Pediatrics and Medicine, Columbia University Medical Center, New York, New York
| | - Kai Wang
- Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California.,Biostatistics Division, Department of Preventive Medicine, University of Southern California, Los Angeles, California
| | - Yufeng Shen
- Department of Systems Biology, Columbia University Medical Center, New York, New York.,Department of Biomedical Informatics, Columbia University Medical Center, New York, New York.,JP Sulzberger Columbia Genome Center, Columbia University Medical Center, New York, New York.
|
94
|
Cheng F, Zhao J, Fooksa M, Zhao Z. A network-based drug repositioning infrastructure for precision cancer medicine through targeting significantly mutated genes in the human cancer genomes. J Am Med Inform Assoc 2016; 23:681-91. [PMID: 27026610 DOI: 10.1093/jamia/ocw007] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 08/30/2015] [Accepted: 01/13/2016] [Indexed: 11/15/2022] Open
Abstract
OBJECTIVE Development of computational approaches and tools to effectively integrate multidomain data is urgently needed for the development of newly targeted cancer therapeutics. METHODS We proposed an integrative network-based infrastructure to identify new druggable targets and anticancer indications for existing drugs through targeting significantly mutated genes (SMGs) discovered in the human cancer genomes. The underlying assumption is that a drug would have a high potential for anticancer indication if its up-/down-regulated genes from the Connectivity Map tended to be SMGs or their neighbors in the human protein interaction network. RESULTS We assembled and curated 693 SMGs in 29 cancer types and found 121 proteins currently targeted by known anticancer or noncancer (repurposed) drugs. We found that the approved or experimental cancer drugs could potentially target these SMGs in 33.3% of the mutated cancer samples, and this number increased to 68.0% by drug repositioning through surveying exome-sequencing data in approximately 5000 normal-tumor pairs from The Cancer Genome Atlas. Furthermore, we identified 284 potential new indications connecting 28 cancer types and 48 existing drugs (adjusted P < .05), with a 66.7% success rate validated by literature data. Several existing drugs (e.g., niclosamide, valproic acid, captopril, and resveratrol) were predicted to have potential indications for multiple cancer types. Finally, we used integrative analysis to showcase a potential mechanism-of-action for resveratrol in breast and lung cancer treatment whereby it targets several SMGs (ARNTL, ASPM, CTTN, EIF4G1, FOXP1, and STIP1). CONCLUSIONS In summary, we demonstrated that our integrative network-based infrastructure is a promising strategy to identify potential druggable targets and uncover new indications for existing drugs to speed up molecularly targeted cancer therapeutics.
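The underlying assumption stated in METHODS, that a drug is a promising candidate when its up- or down-regulated genes tend to be SMGs or their network neighbors, can be phrased as an over-representation test. The sketch below is one possible reading and not the authors' implementation: it assumes a one-sided hypergeometric test, and the gene identifiers, the toy network, and all function names are hypothetical.

```python
from math import comb


def smg_neighborhood(smgs, edges):
    """Return the SMGs plus their direct neighbors in an undirected interaction network."""
    smgs = set(smgs)
    hits = set(smgs)
    for a, b in edges:
        if a in smgs:
            hits.add(b)
        if b in smgs:
            hits.add(a)
    return hits


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes without replacement from `population` genes."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


def enrichment_p_value(signature, smgs, edges, background):
    """P-value that a drug signature overlaps the SMG neighborhood at least as much as observed."""
    background = set(background)
    hits = smg_neighborhood(smgs, edges) & background
    signature = set(signature) & background
    overlap = len(signature & hits)
    return hypergeom_upper_tail(overlap, len(background), len(hits), len(signature))


# Toy data (hypothetical gene IDs, not real SMGs or Connectivity Map signatures).
background = {f"G{i}" for i in range(10)}
edges = [("G0", "G1"), ("G1", "G2"), ("G3", "G4")]
p = enrichment_p_value({"G0", "G1", "G5"}, {"G0"}, edges, background)
print(round(p, 4))  # 0.0667
```

With many drugs and cancer types tested, the resulting P values would need a multiple-testing adjustment, consistent with the "adjusted P < .05" threshold reported in RESULTS; the adjustment method is not specified in the abstract.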
Affiliation(s)
- Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
| | - Junfei Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
| | - Michaela Fooksa
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203, USA.,Chemical and Physical Biology Program, Vanderbilt University School of Medicine, Nashville, TN 37232, USA
| | - Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203, USA.,Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232, USA.,Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37212, USA.,Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
95
|
Xu J, Gong B, Wu L, Thakkar S, Hong H, Tong W. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine. Pharmaceutics 2016; 8:E8. [PMID: 26999190 PMCID: PMC4810084 DOI: 10.3390/pharmaceutics8010008] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 01/12/2016] [Revised: 03/08/2016] [Accepted: 03/10/2016] [Indexed: 01/22/2023] Open
Abstract
Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation, several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.
Affiliation(s)
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
| | - Binsheng Gong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
| | - Leihong Wu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
| | - Shraddha Thakkar
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
| | - Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
|
96
|
Discovering gene re-ranking efficiency and conserved gene-gene relationships derived from gene co-expression network analysis on breast cancer data. Sci Rep 2016; 6:20518. [PMID: 26892392 PMCID: PMC4759568 DOI: 10.1038/srep20518] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 10/16/2015] [Accepted: 01/05/2016] [Indexed: 12/18/2022] Open
Abstract
Systemic approaches are essential in the discovery of disease-specific genes, offering a different perspective and new tools for the analysis of several types of molecular relationships, such as gene co-expression or protein-protein interactions. However, owing to a lack of experimental information, this analysis is not fully applicable. The aim of this study is to reveal the multi-potent contribution of statistical network inference methods in highlighting significant genes and interactions. We have investigated the ability of statistical co-expression networks to highlight and prioritize genes for breast cancer subtypes and stages in terms of: (i) classification efficiency, (ii) gene network pattern conservation, (iii) indication of involved molecular mechanisms and (iv) systems level momentum to drug repurposing pipelines. We have found that statistical network inference methods are advantageous in gene prioritization, can contribute to meaningful network signature discovery, give insights regarding the disease-related mechanisms and boost drug discovery pipelines from a systems point of view.
|
97
|
Zhao J, Cheng F, Wang Y, Arteaga CL, Zhao Z. Systematic Prioritization of Druggable Mutations in ∼5000 Genomes Across 16 Cancer Types Using a Structural Genomics-based Approach. Mol Cell Proteomics 2015; 15:642-56. [PMID: 26657081 DOI: 10.1074/mcp.m115.053199] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 06/25/2015] [Indexed: 11/06/2022] Open
Abstract
A massive number of somatic mutations have been cataloged in large-scale projects such as The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium projects. The majority of the somatic mutations found in tumor genomes are neutral "passenger" rather than damaging "driver" mutations. Understanding their biological consequences and prioritizing them for druggable targets are now urgently needed. Thanks to the rapid advances in structural genomics technologies (e.g. X-ray), large-scale protein structural data have now been made available, providing critical information for deciphering functional roles of mutations in cancer and prioritizing those alterations that may mediate drug binding at atomic resolution and, as such, be druggable targets. We hypothesized that mutations at protein-ligand binding-site residues are likely to be druggable targets. Thus, to prioritize druggable mutations, we developed SGDriver, a structural genomics-based method incorporating the somatic missense mutations into protein-ligand binding-site residues using a Bayes inference statistical framework. We applied SGDriver to 746,631 missense mutations observed in 4997 tumor-normal pairs across 16 cancer types from The Cancer Genome Atlas. SGDriver detected 14,471 potential druggable mutations in 2091 proteins (including 1,516 recurrently mutated proteins) across 3558 cancer genomes (71.2%), and further identified 298 proteins harboring mutations that were significantly enriched at protein-ligand binding-site residues (adjusted p value < 0.05). The identified proteins are significantly enriched in both oncoproteins and tumor suppressors. The follow-up drug-target network analysis suggested 98 known and 126 repurposed druggable anticancer targets (e.g. SPOP and NR3C1). Furthermore, our integrative analysis indicated that 13% of patients might benefit from current targeted therapy, and this proportion would increase to 31% when considering drug repositioning. This study provides a testable strategy for prioritizing druggable mutations in precision cancer medicine.
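The hypothesis that missense mutations cluster at protein-ligand binding-site residues can be illustrated with a per-protein enrichment test. The abstract states that SGDriver uses a Bayes inference framework, whose details are not given here; the sketch below swaps in a plain one-sided binomial test under the assumption that, by chance, each mutation lands on a binding-site residue with probability equal to the fraction of such residues. The function names and the toy numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(k, m, p):
    """P(X >= k) for X ~ Binomial(m, p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(k, m + 1))


def binding_site_enrichment(n_site_mutations, n_mutations, n_site_residues, protein_length):
    """P-value that at least n_site_mutations of n_mutations hit binding-site residues by chance.

    Assumption: mutations fall uniformly along the protein under the null.
    """
    p_site = n_site_residues / protein_length
    return binomial_upper_tail(n_site_mutations, n_mutations, p_site)


# Toy protein (hypothetical): 100 residues, 10 of them at a ligand-binding site,
# 5 observed missense mutations, 3 of which fall on binding-site residues.
p_value = binding_site_enrichment(3, 5, 10, 100)
print(round(p_value, 5))  # 0.00856
```

Across thousands of proteins the raw P values would then be adjusted for multiple testing before applying a threshold such as the adjusted p value < 0.05 quoted in the abstract.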
Affiliation(s)
- Junfei Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37203
| | - Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37203
| | - Yuanyuan Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37203
| | - Carlos L Arteaga
- Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee 37232.,Breast Cancer Program, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, Tennessee 37232.,Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232
| | - Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee 37203.,Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee 37232.,Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, Tennessee 37232.,School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas 77030
|
98
|
Cheng F, Liu C, Lin CC, Zhao J, Jia P, Li WH, Zhao Z. A Gene Gravity Model for the Evolution of Cancer Genomes: A Study of 3,000 Cancer Genomes across 9 Cancer Types. PLoS Comput Biol 2015; 11:e1004497. [PMID: 26352260 PMCID: PMC4564226 DOI: 10.1371/journal.pcbi.1004497] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 03/28/2015] [Accepted: 08/11/2015] [Indexed: 12/14/2022] Open
Abstract
Cancer development and progression result from somatic evolution by an accumulation of genomic alterations. The effects of those alterations on the fitness of somatic cells lead to evolutionary adaptations such as increased cell proliferation, angiogenesis, and altered anticancer drug responses. However, there are few general mathematical models to quantitatively examine how perturbations of a single gene shape subsequent evolution of the cancer genome. In this study, we proposed the gene gravity model to study the evolution of cancer genomes by incorporating the genome-wide transcription and somatic mutation profiles of ~3,000 tumors across 9 cancer types from The Cancer Genome Atlas into a broad gene network. We found that somatic mutations of a cancer driver gene may drive cancer genome evolution by inducing mutations in other genes. This functional consequence is often generated by the combined effect of genetic and epigenetic (e.g., chromatin regulation) alterations. By quantifying cancer genome evolution using the gene gravity model, we identified six putative cancer genes (AHNAK, COL11A1, DDX3X, FAT4, STAG2, and SYNE1). The tumor genomes harboring the nonsynonymous somatic mutations in these genes had a higher mutation density at the genome level compared to the wild-type groups. Furthermore, we provided statistical evidence that hypermutation of cancer driver genes on inactive X chromosomes is a general feature in female cancer genomes. In summary, this study sheds light on the functional consequences and evolutionary characteristics of somatic mutations during tumorigenesis by propelling adaptive cancer genome evolution, which would provide new perspectives for cancer research and therapeutics.
Affiliation(s)
- Feixiong Cheng
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Chuang Liu
  - Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
- Chen-Ching Lin
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Junfei Zhao
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Peilin Jia
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Wen-Hsiung Li
  - Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America
  - Biodiversity Research Center and Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Zhongming Zhao
  - Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
  - Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
|