1
Campitelli P, Kazan IC, Hamilton S, Ozkan SB. Dynamic Allostery: Evolution's Double-Edged Sword in Protein Function and Disease. J Mol Biol 2025:169175. [PMID: 40286867 DOI: 10.1016/j.jmb.2025.169175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2025] [Revised: 04/21/2025] [Accepted: 04/21/2025] [Indexed: 04/29/2025]
Abstract
Allostery is a core mechanism in biology that allows proteins to communicate and regulate activity over long structural distances. While classical models of allostery focus on conformational changes triggered by ligand binding, dynamic allostery, in which protein function is modulated through alterations in thermal fluctuations without major conformational shifts, has emerged as a critical evolutionary mechanism. This review explores how evolution leverages dynamic allostery to fine-tune protein function through subtle mutations at distal sites, preserving core structural architecture while dramatically altering functional properties. Using a combination of computational approaches including Dynamic Flexibility Index (DFI), Dynamic Coupling Index (DCI), and vibrational density of states (VDOS) analysis, we demonstrate that functional adaptations in proteins often involve "hinge-shift" mechanisms, where redistribution of rigid and flexible regions modulates collective motions without changing the overall fold. This evolutionary principle is a double-edged sword: the same mechanisms that enable functional innovation also create vulnerabilities that can be exploited in disease states. Disease-associated variants frequently occur at positions highly coupled to functional sites despite being physically distant, forming Dynamic Allosteric Residue Couples (DARC sites). We demonstrate applications of these principles in understanding viral evolution, drug resistance, and capsid assembly dynamics. Understanding dynamic allostery provides critical insights into protein evolution and offers new avenues for therapeutic interventions targeting allosteric regulation.
Affiliation(s)
- Paul Campitelli
  - Department of Physics, Arizona State University, Tempe, AZ, United States; Center for Biological Physics, Arizona State University, Tempe, AZ, United States
- I Can Kazan
  - Department of Physics, Arizona State University, Tempe, AZ, United States; Center for Biological Physics, Arizona State University, Tempe, AZ, United States
- Sean Hamilton
  - Department of Physics, Arizona State University, Tempe, AZ, United States; Center for Biological Physics, Arizona State University, Tempe, AZ, United States
- S Banu Ozkan
  - Department of Physics, Arizona State University, Tempe, AZ, United States; Center for Biological Physics, Arizona State University, Tempe, AZ, United States

2
Kumar S, Stecher G, Suleski M, Sanderford M, Sharma S, Tamura K. MEGA12: Molecular Evolutionary Genetic Analysis Version 12 for Adaptive and Green Computing. Mol Biol Evol 2024; 41:msae263. [PMID: 39708372 PMCID: PMC11683415 DOI: 10.1093/molbev/msae263] [Citation(s) in RCA: 49] [Impact Index Per Article: 49.0] [Received: 10/02/2024] [Revised: 12/12/2024] [Accepted: 12/17/2024] [Indexed: 12/23/2024] Open
Abstract
We introduce the 12th version of the Molecular Evolutionary Genetics Analysis (MEGA12) software. This latest version brings many significant improvements by reducing the computational time needed for selecting optimal substitution models and conducting bootstrap tests on phylogenies using maximum likelihood (ML) methods. These improvements are achieved by implementing heuristics that minimize likely unnecessary computations. Analyses of empirical and simulated datasets show substantial time savings by using these heuristics without compromising the accuracy of results. MEGA12 also links in an evolutionary sparse learning approach to identify fragile clades and associated sequences in evolutionary trees inferred through phylogenomic analyses. In addition, this version includes fine-grained parallelization for ML analyses, support for high-resolution monitors, and an enhanced Tree Explorer. MEGA12 can be downloaded from https://www.megasoftware.net.
Affiliation(s)
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Glen Stecher
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Michael Suleski
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Maxwell Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
- Sudip Sharma
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Koichiro Tamura
  - Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan
  - Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University, Tokyo, Japan

3
Luu DD, Ramesh N, Kazan IC, Shah KH, Lahiri G, Mana MD, Ozkan SB, Van Horn WD. Evidence that the cold- and menthol-sensing functions of the human TRPM8 channel evolved separately. Sci Adv 2024; 10:eadm9228. [PMID: 38905339 PMCID: PMC11192081 DOI: 10.1126/sciadv.adm9228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 05/16/2024] [Indexed: 06/23/2024]
Abstract
Transient receptor potential melastatin 8 (TRPM8) is a temperature- and menthol-sensitive ion channel that contributes to diverse physiological roles, including cold sensing and pain perception. Clinical trials targeting TRPM8 have faced repeated setbacks predominantly due to the knowledge gap in unraveling the molecular underpinnings governing polymodal activation. A better understanding of the molecular foundations between the TRPM8 activation modes may aid the development of mode-specific, thermal-neutral therapies. Ancestral sequence reconstruction was used to explore the origins of TRPM8 activation modes. By resurrecting key TRPM8 nodes along the human evolutionary trajectory, we gained valuable insights into the trafficking, stability, and function of these ancestral forms. Notably, this approach unveiled the differential emergence of cold and menthol sensitivity over evolutionary time, providing a fresh perspective on complex polymodal behavior. These studies provide a paradigm for understanding polymodal behavior in TRPM8 and other proteins with the potential to enhance our understanding of sensory receptor biology and pave the way for innovative therapeutic interventions.
Affiliation(s)
- Dustin D. Luu
  - School of Molecular Sciences and The Virginia G. Piper Biodesign Center for Personalized Diagnostics, The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Nikhil Ramesh
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, USA
- I. Can Kazan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, USA
- Karan H. Shah
  - School of Molecular Sciences and The Virginia G. Piper Biodesign Center for Personalized Diagnostics, The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Gourab Lahiri
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Miyeko D. Mana
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- S. Banu Ozkan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, USA
- Wade D. Van Horn
  - School of Molecular Sciences and The Virginia G. Piper Biodesign Center for Personalized Diagnostics, The Biodesign Institute, Arizona State University, Tempe, AZ, USA

4
Chen H, Shu J, Maley CC, Liu L. A Mouse-Specific Model to Detect Genes under Selection in Tumors. Cancers (Basel) 2023; 15:5156. [PMID: 37958330 PMCID: PMC10647215 DOI: 10.3390/cancers15215156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/16/2023] [Accepted: 10/18/2023] [Indexed: 11/15/2023] Open
Abstract
The mouse is a widely used model organism in cancer research. However, no computational methods exist to identify cancer driver genes in mice due to a lack of labeled training data. To address this knowledge gap, we adapted the GUST (Genes Under Selection in Tumors) model, originally trained on human exomes, to mouse exomes via transfer learning. The resulting tool, called GUST-mouse, can estimate long-term and short-term evolutionary selection in mouse tumors, and distinguish between oncogenes, tumor suppressor genes, and passenger genes using high-throughput sequencing data. We applied GUST-mouse to analyze 65 exomes of mouse primary breast cancer models and 17 exomes of mouse leukemia models. Comparing the predictions between cancer types and between human and mouse tumors revealed common and unique driver genes. The GUST-mouse method is available as an open-source R package on GitHub.
Affiliation(s)
- Hai Chen
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Jingmin Shu
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Carlo C. Maley
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
  - Arizona Cancer Evolution Center, Arizona State University, Tempe, AZ 85281, USA
- Li Liu
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
  - Arizona Cancer Evolution Center, Arizona State University, Tempe, AZ 85281, USA

5
Ose NJ, Butler BM, Kumar A, Kazan IC, Sanderford M, Kumar S, Ozkan SB. Dynamic coupling of residues within proteins as a mechanistic foundation of many enigmatic pathogenic missense variants. PLoS Comput Biol 2022; 18:e1010006. [PMID: 35389981 PMCID: PMC9017885 DOI: 10.1371/journal.pcbi.1010006] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/20/2021] [Revised: 04/19/2022] [Accepted: 03/09/2022] [Indexed: 01/07/2023] Open
Abstract
Many pathogenic missense mutations are found in protein positions that are neither well-conserved nor fall in any known functional domains. Consequently, we lack any mechanistic underpinning of dysfunction caused by such mutations. We explored the disruption of allosteric dynamic coupling between these positions and the known functional sites as a possible mechanism for pathogenesis. In this study, we present an analysis of 591 pathogenic missense variants in 144 human enzymes that suggests that allosteric dynamic coupling of mutated positions with known active sites is a plausible biophysical mechanism and evidence of their functional importance. We illustrate this mechanism in a case study of β-Glucocerebrosidase (GCase) in which a vast majority of 94 sites harboring Gaucher disease-associated missense variants are located some distance away from the active site. An analysis of the conformational dynamics of GCase suggests that mutations on these distal sites cause changes in the flexibility of active site residues despite their distance, indicating a dynamic communication network throughout the protein. The disruption of the long-distance dynamic coupling caused by missense mutations may provide a plausible general mechanistic explanation for biological dysfunction and disease.
Affiliation(s)
- Nicholas J. Ose
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Brandon M. Butler
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Avishek Kumar
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- I. Can Kazan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America
- Maxwell Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
  - Department of Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
  - Department of Biology, Temple University, Philadelphia, Pennsylvania, United States of America
  - Center for Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
- S. Banu Ozkan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, United States of America

6
Wei SA, Xu R, Ji YY, Ding ZW, Zou YZ. Deduction and exploration of the evolution and function of vertebrate GFPT family. Genes Genomics 2022; 44:175-185. [PMID: 35038160 DOI: 10.1007/s13258-021-01188-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2020] [Accepted: 11/06/2021] [Indexed: 11/27/2022]
Abstract
BACKGROUND: Glutamine-fructose-6-phosphate aminotransferase (GFPT) is a key factor in the hexosamine metabolism pathway. It regulates the downstream factor O-GlcNAc to change cell function and plays an important role in the metabolism and immune processes of tissues and organs. However, the evolutionary relationship of GFPT family proteins in vertebrates has not been elucidated. OBJECTIVE: To deduce and explore the evolution and function of the vertebrate GFPT family. METHODS: Eighteen GFPT sequences were obtained from the National Center for Biotechnology Information (NCBI) for Homo sapiens (H. sapiens), Trachypithecus francoisi (T. francoisi), Mus musculus (M. musculus), Rattus norvegicus (R. norvegicus), Gallus gallus (G. gallus), Zootoca vivipara (Z. vivipara), Xenopus tropicalis (X. tropicalis), Danio rerio (D. rerio), Rhincodon typus (R. typus), and Plasmodium relictum. The physical and chemical characteristics and molecular evolution of GFPT family protein and nucleic acid sequences were analyzed with ClustalX2, GeneDoc, MEGA-X, SMART, Datamonkey, R, and other tools. RESULTS: Based on the neighbor-joining (NJ) phylogenetic tree and evolution fingerprints, GFPT family members of vertebrates can be divided into two groups: the GFPT1 group and the GFPT2 group. Seven positive selection sites were identified by IFEL and by the integrated methods mixed effects model of evolution (MEME) and fixed effects likelihood (REL). Finally, we predicted 28 phosphorylation sites and 18 ubiquitination sites in the human GFPT1 sequence, and 10 phosphorylation sites and five ubiquitination sites in GFPT2. Gene ontology (GO) analysis was applied to the protein molecules and KEGG signaling pathways of vertebrates interacting with GFPT family proteins. CONCLUSIONS: Our work confirmed that the GFPT family in higher animals may have differentiated into GFPT1 and GFPT2 to meet their own functional needs. This knowledge addresses the question of the origin and evolution of the GFPT family in vertebrates and provides a basis for disease treatment and functional research on GFPT proteins.
Affiliation(s)
- Si-Ang Wei
  - Shanghai Institute of Cardiovascular Diseases, Zhongshan Hospital, Fudan University, 180 Feng Lin Road, Shanghai, 200032, People's Republic of China
- Ran Xu
  - Shanghai Institute of Cardiovascular Diseases, Zhongshan Hospital, Fudan University, 180 Feng Lin Road, Shanghai, 200032, People's Republic of China
- Yu-Yao Ji
  - Shanghai Institute of Cardiovascular Diseases, Zhongshan Hospital, Fudan University, 180 Feng Lin Road, Shanghai, 200032, People's Republic of China
- Zhi-Wen Ding
  - Shanghai Institute of Cardiovascular Diseases, Zhongshan Hospital, Fudan University, 180 Feng Lin Road, Shanghai, 200032, People's Republic of China
- Yun-Zeng Zou
  - Shanghai Institute of Cardiovascular Diseases, Zhongshan Hospital, Fudan University, 180 Feng Lin Road, Shanghai, 200032, People's Republic of China

7
Pejaver V, Urresti J, Lugo-Martinez J, Pagel KA, Lin GN, Nam HJ, Mort M, Cooper DN, Sebat J, Iakoucheva LM, Mooney SD, Radivojac P. Inferring the molecular and phenotypic impact of amino acid variants with MutPred2. Nat Commun 2020; 11:5918. [PMID: 33219223 PMCID: PMC7680112 DOI: 10.1038/s41467-020-19669-x] [Citation(s) in RCA: 388] [Impact Index Per Article: 77.6] [Received: 07/23/2019] [Accepted: 10/23/2020] [Indexed: 01/02/2023] Open
Abstract
Identifying pathogenic variants and underlying functional alterations is challenging. To this end, we introduce MutPred2, a tool that improves the prioritization of pathogenic amino acid substitutions over existing methods, generates molecular mechanisms potentially causative of disease, and returns interpretable pathogenicity score distributions on individual genomes. Whilst its prioritization performance is state-of-the-art, a distinguishing feature of MutPred2 is the probabilistic modeling of variant impact on specific aspects of protein structure and function that can serve to guide experimental studies of phenotype-altering variants. We demonstrate the utility of MutPred2 in the identification of the structural and functional mutational signatures relevant to Mendelian disorders and the prioritization of de novo mutations associated with complex neurodevelopmental disorders. We then experimentally validate the functional impact of several variants identified in patients with such disorders. We argue that mechanism-driven studies of human inherited disease have the potential to significantly accelerate the discovery of clinically actionable variants.
Affiliation(s)
- Vikas Pejaver
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Jorge Urresti
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Jose Lugo-Martinez
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213, USA
- Kymberleigh A Pagel
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
  - Institute for Computational Medicine, Whiting School of Engineering, Johns Hopkins University, 220 Hackerman Hall, 3400 N Charles St, Baltimore, MD, 21218, USA
- Guan Ning Lin
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
  - School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200030, People's Republic of China
- Hyun-Jun Nam
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Matthew Mort
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK
- Jonathan Sebat
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
  - Beyster Center for Genomics of Psychiatric Diseases, University of California San Diego, La Jolla, CA, USA
  - Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA
- Lilia M Iakoucheva
  - Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Sean D Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Predrag Radivojac
  - Department of Computer Science, Indiana University, Bloomington, IN, USA
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA

8
Chandrashekar P, Ahmadinejad N, Wang J, Sekulic A, Egan JB, Asmann YW, Kumar S, Maley C, Liu L. Somatic selection distinguishes oncogenes and tumor suppressor genes. Bioinformatics 2020; 36:1712-1717. [PMID: 32176769 PMCID: PMC7703750 DOI: 10.1093/bioinformatics/btz851] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 08/30/2019] [Revised: 10/22/2019] [Accepted: 11/12/2019] [Indexed: 02/06/2023] Open
Abstract
Motivation: Functions of cancer driver genes vary substantially across tissues and organs. Distinguishing passenger genes, oncogenes (OGs) and tumor-suppressor genes (TSGs) for each cancer type is critical for understanding tumor biology and identifying clinically actionable targets. Although many computational tools are available to predict putative cancer driver genes, resources for context-aware classifications of OGs and TSGs are limited. Results: We show that the direction and magnitude of somatic selection of protein-coding mutations are significantly different for passenger genes, OGs and TSGs. Based on these patterns, we develop a new method (genes under selection in tumors) to discover OGs and TSGs in a cancer-type specific manner. Genes under selection in tumors shows a high accuracy (92%) when evaluated via strict cross-validations. Its application to 10,172 tumor exomes found known and novel cancer drivers with high tissue-specificities. In 11 out of 13 OGs shared among multiple cancer types, we found functional domains selectively engaged in different cancers, suggesting differences in disease mechanisms. Availability and implementation: An R implementation of the GUST algorithm is available at https://github.com/liliulab/gust. A database with pre-computed results is available at https://liliulab.shinyapps.io/gust. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pramod Chandrashekar
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, 85281, USA
- Navid Ahmadinejad
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, 85281, USA
- Junwen Wang
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic Arizona, Scottsdale, AZ, 85259, USA
- Aleksandar Sekulic
  - Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic Arizona, Scottsdale, AZ, 85259, USA
- Jan B Egan
  - Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic Arizona, Scottsdale, AZ, 85259, USA
- Yan W Asmann
  - Department of Health Sciences Research, Mayo Clinic Florida, Jacksonville, FL, 32224, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Carlo Maley
  - Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, 85281, USA
- Li Liu
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Center for Personalized Diagnostics, Biodesign Institute, Arizona State University, Tempe, AZ, 85281, USA
  - Department of Health Sciences Research & Center for Individualized Medicine, Mayo Clinic Arizona, Scottsdale, AZ, 85259, USA

9
Guan X, Runger G, Liu L. Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery. BMC Bioinformatics 2020; 21:77. [PMID: 32164534 PMCID: PMC7068914 DOI: 10.1186/s12859-020-3344-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/22/2022] Open
Abstract
Background: In biomarker discovery, applying domain knowledge is an effective approach to eliminating false positive features, prioritizing functionally impactful markers and facilitating the interpretation of predictive signatures. Several computational methods have been developed that formulate the knowledge-based biomarker discovery as a feature selection problem guided by prior information. These methods often require that prior information is encoded as a single score and the algorithms are optimized for biological knowledge of a specific type. However, in practice, domain knowledge from diverse resources can provide complementary information, and no current methods can integrate heterogeneous prior information for biomarker discovery. To address this problem, we developed the Know-GRRF (know-guided regularized random forest) method that enables dynamic incorporation of domain knowledge from multiple disciplines to guide feature selection. Results: Know-GRRF embeds domain knowledge in a regularized random forest framework. It combines prior information from multiple domains in a linear model to derive a composite score, which, together with other tuning parameters, controls the regularization of the random forests model. Know-GRRF concurrently optimizes the weight given to each type of domain knowledge and other tuning parameters to minimize the AIC of out-of-bag predictions. The objective is to select a compact feature subset that has a high discriminative power and strong functional relevance to the biological phenotype. Via rigorous simulations, we show that Know-GRRF guided by multiple-domain prior information outperforms feature selection methods guided by single-domain prior information or no prior information. We then applied Know-GRRF to a real-world study to identify prognostic biomarkers of prostate cancers. We evaluated the combination of cancer-related gene annotations, evolutionary conservation and pre-computed statistical scores as the prior knowledge to assemble a panel of biomarkers. We discovered a compact set of biomarkers with significant improvements in prediction accuracy. Conclusions: Know-GRRF is a powerful novel method to incorporate knowledge from multiple domains for feature selection. It has a broad range of applications in biomarker discovery. We implemented this method and released a KnowGRRF package in the R/CRAN archive.
Affiliation(s)
- Xin Guan
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Intel Corporation, Chandler, AZ, 85226, USA
- George Runger
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
- Li Liu
  - College of Health Solutions, Arizona State University, Phoenix, AZ, 85004, USA
  - Biodesign Institute, Arizona State University, Tempe, AZ, 85287, USA
  - Department of Neurology, Mayo Clinic, Scottsdale, AZ, 85259, USA

10
Lin CY, Ruan P, Li R, Yang JM, See S, Song J, Akutsu T. Deep learning with evolutionary and genomic profiles for identifying cancer subtypes. J Bioinform Comput Biol 2019; 17:1940005. [DOI: 10.1142/s0219720019400055] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/21/2022]
Abstract
Cancer subtype identification is an unmet need in precision diagnosis. Recently, evolutionary conservation has been indicated to contain informative signatures for functional significance in cancers. However, the importance of evolutionary conservation in distinguishing cancer subtypes remains largely unclear. Here, we identified the evolutionarily conserved genes (i.e. core genes) and observed that they are primarily involved in cellular pathways relevant to cell growth and metabolism. By using these core genes, we developed two novel strategies, namely a feature-based strategy (FES) and an image-based strategy (IMS), by integrating their evolutionary and genomic profiles with a deep learning algorithm. In comparison with the FES using the random set and the strategy using the PAM50 classifier, the core gene set-based FES achieved a higher accuracy for identifying breast cancer subtypes. The IMS and FES using the core gene set yielded better performances than the other strategies, in terms of classifying both breast cancer subtypes and multiple cancer types. Moreover, the IMS is reproducible even using different gene expression data (i.e. RNA-seq and microarray). Comprehensive analysis of eight cancer types demonstrates that our evolutionary conservation-based models represent a valid and helpful approach for identifying cancer subtypes and the core gene set offers distinguishable clues of cancer subtypes.
Affiliation(s)
- Chun-Yu Lin
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 6110011, Japan
- Peiying Ruan
  - NVIDIA AI Technology Center, NVIDIA Corporation Japan, Tokyo 1070052, Japan
- Ruiming Li
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 6110011, Japan
- Jinn-Moon Yang
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Simon See
  - NVIDIA AI Technology Center, NVIDIA Corporation Singapore, Singapore 138522, Singapore
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry & Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 6110011, Japan

11
Klee EW, Zimmermann MT. Molecular modeling of LDLR aids interpretation of genomic variants. J Mol Med (Berl) 2019; 97:533-540. [PMID: 30778614 PMCID: PMC6440939 DOI: 10.1007/s00109-019-01755-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/07/2018] [Revised: 01/14/2019] [Accepted: 02/05/2019] [Indexed: 11/24/2022]
Abstract
Genetic variants in low-density lipoprotein receptor (LDLR) are known to cause familial hypercholesterolemia (FH), occurring in up to 1 in 200 people (Youngblom E. et al. 1993 and Nordestgaard BG et al. 34:3478–3490a, 2013) and leading to significant risk for heart disease. Clinical genomics testing using high-throughput sequencing is identifying novel genomic variants of uncertain significance (VUS) in individuals suspected of having FH, but for whom the causal link to the disease remains to be established (Nordestgaard BG et al. 34:3478–3490a, 2013). Unfortunately, experimental data about the atomic structure of the LDL binding domains of LDLR at extracellular pH does not exist. This leads to an inability to apply protein structure-based methods for assessing novel variants identified through genetic testing. Thus, the ambiguities in interpretation of LDLR variants are a barrier to achieving the expected clinical value for personalized genomics assays for management of FH. In this study, we integrated data from the literature and related cellular receptors to develop high-resolution models of full-length LDLR at extracellular conditions and use them to predict which VUS alter LDL binding. We believe that the functional effects of LDLR variants can be resolved using a combination of structural bioinformatics and functional assays, leading to a better correlation with clinical presentation. We have completed modeling of LDLR in two major physiologic conditions, generating detailed hypotheses for how each of the 1007 reported protein variants may affect function.
Key messages
- Hundreds of variants are observed in the LDLR, but most lack interpretation.
- Molecular modeling is aided by biochemical knowledge.
- We generated context-specific 3D protein models of LDLR.
- Our models allowed mechanistic interpretation of many variants.
- We interpreted both rare and common genomic variants in their physiologic context.
- Effects of genomic variants are often context-specific.
Electronic supplementary material: The online version of this article (10.1007/s00109-019-01755-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Eric W Klee
  - Department of Health Science Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
  - Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
- Michael T Zimmermann
  - Bioinformatics Research and Development Laboratory, Genomic Sciences and Precision Medicine Center, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
|
12
|
Liu L, Sanderford MD, Patel R, Chandrashekar P, Gibson G, Kumar S. Biological relevance of computationally predicted pathogenicity of noncoding variants. Nat Commun 2019; 10:330. [PMID: 30659175 PMCID: PMC6338804 DOI: 10.1038/s41467-018-08270-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 04/19/2018] [Accepted: 12/19/2018] [Indexed: 11/15/2022] [Open Access]
Abstract
Computational prediction of the phenotypic propensities of noncoding single nucleotide variants typically combines annotation of genomic, functional and evolutionary attributes into a single score. Here, we evaluate if the claimed excellent accuracies of these predictions translate into high rates of success in addressing questions important in biological research, such as fine mapping causal variants, distinguishing pathogenic allele(s) at a given position, and prioritizing variants for genetic risk assessment. A significant disconnect is found to exist between the statistical modelling and biological performance of predictive approaches. We discuss fundamental reasons underlying these deficiencies and suggest that future improvements of computational predictions need to address confounding of allelic, positional and regional effects as well as imbalance of the proportion of true positive variants in candidate lists. Researchers can make use of a variety of computational tools to prioritize genetic variants and predict their pathogenicity. Here, the authors evaluate the performance of six of these tools in three typical biological tasks and find generally low concordance of predictions and experimental confirmation.
Affiliation(s)
- Li Liu
  - College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Maxwell D Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Ravi Patel
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
- Pramod Chandrashekar
  - College of Health Solutions, Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Greg Gibson
  - School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
  - Department of Biology, Temple University, Philadelphia, PA, USA
|
13
|
Butler BM, Kazan IC, Kumar A, Ozkan SB. Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs. PLoS Comput Biol 2018; 14:e1006626. [PMID: 30496278 PMCID: PMC6289467 DOI: 10.1371/journal.pcbi.1006626] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/21/2018] [Revised: 12/11/2018] [Accepted: 11/09/2018] [Indexed: 11/18/2022] [Open Access]
Abstract
The conformational dynamics of proteins is rarely used in methodologies used to predict the impact of genetic mutations due to the paucity of three-dimensional protein structures as compared to the vast number of available sequences. Until now a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein, without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures. Proteins are dynamic machines that undergo atomic fluctuations, side chain rotations, and collective domain movements that are required for biological function. There is, therefore, a need for quantitative metrics that capture the dynamic fluctuations per position to understand the critical role of protein dynamics in shaping biological functions. 
A limiting factor in incorporating structural dynamics information in the classification of non-synonymous single nucleotide variants (nSNVs) is the limited number of known 3D structures compared to the vast number of available sequences. We have developed a new sequence-based GNM method, termed Seq-GNM, which uses co-evolving amino acid positions based on the multiple sequence alignment of a given query sequence to estimate the thermal motions of C-alpha atoms. In this paper, we have demonstrated that the predicted thermal motions using Seq-GNM are in reasonable agreement with experimental B-factors as well as B-factors computed using 3D crystal structures. We also provide evidence that B-factors predicted by Seq-GNM are capable of distinguishing between disease-associated and neutral nSNVs.
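The core GNM step described in this abstract (a contact list turned into per-residue thermal fluctuations) can be sketched as follows. This is a minimal illustration, not the authors' Seq-GNM implementation: the function name `gnm_bfactors`, the `scale` parameter, and the toy contact list are assumptions, and the abstract does not specify how coevolving pairs are selected or weighted. In Seq-GNM the contacts would come from coevolving residue pairs inferred from an MSA; here a simple five-residue chain stands in for them. The calculation itself is the standard GNM result that B-factors are proportional to the diagonal of the pseudo-inverse of the Kirchhoff (connectivity) matrix.

```python
import numpy as np

def gnm_bfactors(n_residues, contacts, scale=1.0):
    """Relative GNM B-factors from a list of residue-residue contacts.

    contacts: iterable of (i, j) index pairs (0-based). In a Seq-GNM-style
    setting these would be coevolving pairs; here they are a toy input.
    scale: hypothetical prefactor standing in for the physical constants.
    """
    kirchhoff = np.zeros((n_residues, n_residues))
    for i, j in contacts:
        if i == j:
            continue  # self-contacts carry no information
        kirchhoff[i, j] = kirchhoff[j, i] = -1.0
    # Diagonal entries hold the number of contacts of each residue.
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    # The Kirchhoff matrix is singular (zero mode), so use the pseudo-inverse.
    inverse = np.linalg.pinv(kirchhoff)
    return scale * np.diag(inverse)

# Toy example (assumed data): a linear chain of five residues.
chain_contacts = [(0, 1), (1, 2), (2, 3), (3, 4)]
bfactors = gnm_bfactors(5, chain_contacts)
print(bfactors)  # chain termini fluctuate more than the central residue
```

The values are relative: comparing them with crystallographic B-factors would require fitting `scale` (or using a rank correlation), which is why the sketch leaves it as a free parameter.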
Affiliation(s)
- Brandon M. Butler
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- I. Can Kazan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- Avishek Kumar
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
  - Harris School of Public Policy and Center for Data Science and Public Policy, University of Chicago, Chicago, IL, United States of America
- S. Banu Ozkan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
|
14
|
Patel R, Scheinfeldt LB, Sanderford MD, Lanham TR, Tamura K, Platt A, Glicksberg BS, Xu K, Dudley JT, Kumar S. Adaptive Landscape of Protein Variation in Human Exomes. Mol Biol Evol 2018; 35:2015-2025. [PMID: 29846678 PMCID: PMC6063297 DOI: 10.1093/molbev/msy107] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/26/2022] [Open Access]
Abstract
The human genome contains hundreds of thousands of missense mutations. However, only a handful of these variants are known to be adaptive, which implies that adaptation through protein sequence change is an extremely rare phenomenon in human evolution. Alternatively, existing methods may lack the power to pinpoint adaptive variation. We have developed and applied an Evolutionary Probability Approach (EPA) to discover candidate adaptive polymorphisms (CAPs) through the discordance between allelic evolutionary probabilities and their observed frequencies in human populations. EPA reveals thousands of missense CAPs, which suggest that a large number of previously optimal alleles experienced a reversal of fortune in the human lineage. We explored nonadaptive mechanisms to explain CAPs, including the effects of demography, mutation rate variability, and negative and positive selective pressures in modern humans. Many nonadaptive hypotheses were tested, but failed to explain the data, which suggests that a large proportion of CAP alleles have increased in frequency due to beneficial selection. This suggestion is supported by the fact that a vast majority of adaptive missense variants discovered previously in humans are CAPs, and hundreds of CAP alleles are protective in genotype-phenotype association data. Our integrated phylogenomic and population genetic EPA approach predicts the existence of thousands of nonneutral candidate variants in the human proteome. We expect this collection to be enriched in beneficial variation. The EPA approach can be applied to discover candidate adaptive variation in any protein, population, or species for which allele frequency data and reliable multispecies alignments are available.
Affiliation(s)
- Ravi Patel
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
- Laura B Scheinfeldt
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
  - Coriell Institute for Medical Research, Camden, NJ
- Maxwell D Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Tamera R Lanham
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Koichiro Tamura
  - Department of Biology, Tokyo Metropolitan University, Tokyo, Japan
- Alexander Platt
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
  - Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA
- Benjamin S Glicksberg
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY
- Ke Xu
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY
- Joel T Dudley
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
|
15
|
Abstract
Genetic differences between species and within populations are two sides of the same coin under the neutral theory of molecular evolution. This theory posits that a vast majority of evolutionary substitutions, which appear as differences between species, are (nearly) neutral, that is, these substitutions are permitted without a significantly adverse impact on a species' survival. We refer to them as evolutionarily permissible (ePerm) variation. Evolutionary permissibility of any possible variant can be inferred from multispecies sequence alignments by applying sophisticated statistical methods to the evolutionary tree of species. Here, we explore the evolutionary permissibility of amino acid variants associated with genetic diseases and those observed in personal exomes. Consistent with the predictions of the neutral theory, disease associated amino acid variants are rarely ePerm, much more biochemically radical, and found predominantly at more conserved positions than their non-disease counterparts. Only 10% of amino acid mutations are ePerm, but these variants rise to become two-thirds of all substitutions in the human lineage (a 6-fold enrichment). In contrast, only a minority of the variants in a personal exome are ePerm, a seemingly counterintuitive pattern that results from a combination of mutational and evolutionary processes that are, in fact, broadly consistent with the neutral theory. Evolutionarily forbidden variants outnumber detrimental variants in individual exomes and may play an underappreciated role in protecting against disease. We discuss these observations and conclude that the long-term evolutionary history of species can illuminate functional biomedical properties of variation present in personal exomes.
Affiliation(s)
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Ravi Patel
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
  - Department of Biology, Temple University, Philadelphia, PA
|
16
|
Jin ZB, Li Z, Liu Z, Jiang Y, Cai XB, Wu J. Identification of de novo germline mutations and causal genes for sporadic diseases using trio-based whole-exome/genome sequencing. Biol Rev Camb Philos Soc 2017; 93:1014-1031. [PMID: 29154454 DOI: 10.1111/brv.12383] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 01/16/2017] [Revised: 09/28/2017] [Accepted: 10/10/2017] [Indexed: 12/14/2022]
Abstract
Whole-genome or whole-exome sequencing (WGS/WES) of the affected proband together with normal parents (trio) is commonly adopted to identify de novo germline mutations (DNMs) underlying sporadic cases of various genetic disorders. However, our current knowledge of the occurrence and functional effects of DNMs remains limited and accurately identifying the disease-causing DNM from a group of irrelevant DNMs is complicated. Herein, we provide a general-purpose discussion of important issues related to pathogenic gene identification based on trio-based WGS/WES data. Specifically, the relevance of DNMs to human sporadic diseases, current knowledge of DNM biogenesis mechanisms, and common strategies or software tools used for DNM detection are reviewed, followed by a discussion of pathogenic gene prioritization. In addition, several key factors that may affect DNM identification accuracy and causal gene prioritization are reviewed. Based on recent major advances, this review both sheds light on how trio-based WGS/WES technologies can play a significant role in the identification of DNMs and causal genes for sporadic diseases, and also discusses existing challenges.
Affiliation(s)
- Zi-Bing Jin
  - Division of Ophthalmic Genetics, The Eye Hospital, School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China
  - State Key Laboratory of Ophthalmology Optometry and Vision Science, Wenzhou Medical University, Wenzhou, 325027, China
- Zhongshan Li
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
- Zhenwei Liu
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
- Yi Jiang
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
- Xue-Bi Cai
  - Division of Ophthalmic Genetics, The Eye Hospital, School of Ophthalmology & Optometry, Wenzhou Medical University, Wenzhou, 325027, China
  - State Key Laboratory of Ophthalmology Optometry and Vision Science, Wenzhou Medical University, Wenzhou, 325027, China
- Jinyu Wu
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, 325000, China
|
17
|
Cousin MA, Zimmermann MT, Mathison AJ, Blackburn PR, Boczek NJ, Oliver GR, Lomberk GA, Urrutia RA, Deyle DR, Klee EW. Functional validation reveals the novel missense V419L variant in TGFBR2 associated with Loeys-Dietz syndrome (LDS) impairs canonical TGF-β signaling. Cold Spring Harb Mol Case Stud 2017; 3:mcs.a001727. [PMID: 28679693 PMCID: PMC5495030 DOI: 10.1101/mcs.a001727] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/06/2017] [Accepted: 04/12/2017] [Indexed: 12/31/2022] [Open Access]
Abstract
TGF-β-related heritable connective tissue disorders are characterized by a similar pattern of cardiovascular defects, including aortic root dilatation, mitral valve prolapse, vascular aneurysms, and vascular dissections and exhibit incomplete penetrance and variable expressivity. Because of the phenotypic overlap of these disorders, panel-based genetic testing is frequently used to confirm the clinical findings. Unfortunately in many cases, variants of uncertain significance (VUSs) obscure the genetic diagnosis until more information becomes available. Here, we describe and characterize the functional impact of a novel VUS in the TGFBR2 kinase domain (c.1255G>T; p.Val419Leu), in a patient with the clinical diagnosis of Marfan syndrome spectrum. We assessed the structural and functional consequence of this VUS using molecular modeling, molecular dynamic simulations, and in vitro cell-based assays. A high-quality homology-based model of TGFBR2 was generated and computational mutagenesis followed by refinement and molecular dynamics simulations were used to assess structural and dynamic changes. Relative to wild type, the V419L induced conformational and dynamic changes that may affect ATP binding, increasing the likelihood of adopting an inactive state, and, we hypothesize, alter canonical signaling. Experimentally, we tested this by measuring the canonical TGF-β signaling pathway activation at two points; V419L significantly delayed SMAD2 phosphorylation by western blot and significantly decreased TGF-β-induced gene transcription by reporter assays consistent with known pathogenic variants in this gene. Thus, our results establish that the V419L variant leads to aberrant TGF-β signaling and confirm the diagnosis of Loeys-Dietz syndrome in this patient.
Affiliation(s)
- Margot A Cousin
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota 55905, USA
  - Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Michael T Zimmermann
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota 55905, USA
- Angela J Mathison
  - Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Patrick R Blackburn
  - Department of Health Sciences Research, Mayo Clinic, Jacksonville, Florida 32224, USA
  - Center for Individualized Medicine, Mayo Clinic, Jacksonville, Florida 32224, USA
- Nicole J Boczek
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota 55905, USA
  - Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- Gavin R Oliver
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota 55905, USA
- Gwen A Lomberk
  - Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
  - Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, Minnesota 55905, USA
- Raul A Urrutia
  - Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
  - Laboratory of Epigenetics and Chromatin Dynamics, Gastroenterology Research Unit, Department of Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
- David R Deyle
  - Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
  - Department of Clinic Genomics, Mayo Clinic, Rochester, Minnesota 55905, USA
- Eric W Klee
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota 55905, USA
  - Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA
  - Department of Clinic Genomics, Mayo Clinic, Rochester, Minnesota 55905, USA
|
18
|
McSkimming DI, Dastgheib S, Baffi TR, Byrne DP, Ferries S, Scott ST, Newton AC, Eyers CE, Kochut KJ, Eyers PA, Kannan N. KinView: a visual comparative sequence analysis tool for integrated kinome research. Mol Biosyst 2016; 12:3651-3665. [PMID: 27731453 PMCID: PMC5508867 DOI: 10.1039/c6mb00466k] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 12/29/2022]
Abstract
Multiple sequence alignments (MSAs) are a fundamental analysis tool used throughout biology to investigate relationships between protein sequence, structure, function, evolutionary history, and patterns of disease-associated variants. However, their widespread application in systems biology research is currently hindered by the lack of user-friendly tools to simultaneously visualize, manipulate and query the information conceptualized in large sequence alignments, and the challenges in integrating MSAs with multiple orthogonal data such as cancer variants and post-translational modifications, which are often stored in heterogeneous data sources and formats. Here, we present the Multiple Sequence Alignment Ontology (MSAOnt), which represents a profile or consensus alignment in an ontological format. Subsets of the alignment are easily selected through the SPARQL Protocol and RDF Query Language for downstream statistical analysis or visualization. We have also created the Kinome Viewer (KinView), an interactive integrative visualization that places eukaryotic protein kinase cancer variants in the context of natural sequence variation and experimentally determined post-translational modifications, which play central roles in the regulation of cellular signaling pathways. Using KinView, we identified differential phosphorylation patterns between tyrosine and serine/threonine kinases in the activation segment, a major kinase regulatory region that is often mutated in proliferative diseases. We discuss cancer variants that disrupt phosphorylation sites in the activation segment, and show how KinView can be used as a comparative tool to identify differences and similarities in natural variation, cancer variants and post-translational modifications between kinase groups, families and subfamilies. Based on KinView comparisons, we identify and experimentally characterize a regulatory tyrosine (Y177 in PLK4) in the PLK4 C-terminal activation segment region termed the P+1 loop.
To further demonstrate the application of KinView in hypothesis generation and testing, we formulate and validate a hypothesis explaining a novel predicted loss-of-function variant (D523N in PKCβ) in the regulatory spine of PKCβ, a recently identified tumor suppressor kinase. KinView provides a novel, extensible interface for performing comparative analyses between subsets of kinases and for integrating multiple types of residue specific annotations in user friendly formats.
Affiliation(s)
- Shima Dastgheib
  - Department of Computer Science, University of Georgia, Athens, GA 30602, USA
- Timothy R Baffi
  - Department of Pharmacology, University of California at San Diego, La Jolla, CA 92093, USA
- Dominic P Byrne
  - Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Samantha Ferries
  - Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Steven Thomas Scott
  - Department of Biochemistry & Molecular Biology, University of Georgia, Athens, GA 30602, USA
- Alexandra C Newton
  - Department of Pharmacology, University of California at San Diego, La Jolla, CA 92093, USA
- Claire E Eyers
  - Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Krzysztof J Kochut
  - Department of Computer Science, University of Georgia, Athens, GA 30602, USA
- Patrick A Eyers
  - Department of Biochemistry, Institute of Integrative Biology, University of Liverpool, Liverpool, UK
- Natarajan Kannan
  - Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
  - Department of Biochemistry & Molecular Biology, University of Georgia, Athens, GA 30602, USA
|
19
|
Liu L, Chang Y, Yang T, Noren DP, Long B, Kornblau S, Qutub A, Ye J. Evolution-informed modeling improves outcome prediction for cancers. Evol Appl 2016; 10:68-76. [PMID: 28035236 PMCID: PMC5192825 DOI: 10.1111/eva.12417] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/08/2016] [Accepted: 08/17/2016] [Indexed: 12/19/2022] [Open Access]
Abstract
Despite wide applications of high-throughput biotechnologies in cancer research, many biomarkers discovered by exploring large-scale omics data do not provide satisfactory performance when used to predict cancer treatment outcomes. This problem is partly due to the overlooking of functional implications of molecular markers. Here, we present a novel computational method that uses evolutionary conservation as prior knowledge to discover bona fide biomarkers. Evolutionary selection at the molecular level is nature's test on functional consequences of genetic elements. By prioritizing genes that show significant statistical association and high functional impact, our new method reduces the chances of including spurious markers in the predictive model. When applied to predicting therapeutic responses for patients with acute myeloid leukemia and to predicting metastasis for patients with prostate cancers, the new method gave rise to evolution-informed models that enjoyed low complexity and high accuracy. The identified genetic markers also have significant implications in tumor progression and embrace potential drug targets. Because evolutionary conservation can be estimated as a gene-specific, position-specific, or allele-specific parameter on the nucleotide level and on the protein level, this new method can be extended to apply to miscellaneous "omics" data to accelerate biomarker discoveries.
Affiliation(s)
- Li Liu
  - Department of Biomedical Informatics, Arizona State University, Tempe, AZ, USA
- Yung Chang
  - School of Life Science, Arizona State University, Tempe, AZ, USA
- Tao Yang
  - Department of Computer Science and Engineering, Arizona State University, Tempe, AZ, USA
- David P Noren
  - Department of Bioengineering, Rice University, Houston, TX, USA
- Byron Long
  - Department of Bioengineering, Rice University, Houston, TX, USA
- Steven Kornblau
  - The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Amina Qutub
  - Department of Bioengineering, Rice University, Houston, TX, USA
- Jieping Ye
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
|
20
|
Karim S, NourEldin HF, Abusamra H, Salem N, Alhathli E, Dudley J, Sanderford M, Scheinfeldt LB, Chaudhary AG, Al-Qahtani MH, Kumar S. e-GRASP: an integrated evolutionary and GRASP resource for exploring disease associations. BMC Genomics 2016; 17:770. [PMID: 27766955 PMCID: PMC5073857 DOI: 10.1186/s12864-016-3088-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 03/14/2023] [Open Access]
Abstract
Background: Genome-wide association studies (GWAS) have become a mainstay of biological research concerned with discovering genetic variation linked to phenotypic traits and diseases. Both discrete and continuous traits can be analyzed in GWAS to discover associations between single nucleotide polymorphisms (SNPs) and traits of interest. Associations are typically determined by estimating the significance of the statistical relationship between genetic loci and the given trait. However, the prioritization of bona fide, reproducible genetic associations from GWAS results remains a central challenge in identifying genomic loci underlying common complex diseases. Evolutionary-aware meta-analysis of the growing GWAS literature is one way to address this challenge and to advance from association to causation in the discovery of genotype-phenotype relationships.
Description: We have created an evolutionary GWAS resource to enable in-depth query and exploration of published GWAS results. This resource uses the publicly available GWAS results annotated in the GRASP2 database. The GRASP2 database includes results from 2082 studies, 177 broad phenotype categories, and ~8.87 million SNP-phenotype associations. For each SNP in e-GRASP, we present information from the GRASP2 database for convenience as well as evolutionary information (e.g., rate and timespan). Users can, therefore, identify not only SNPs with highly significant phenotype-association P-values, but also SNPs that are highly replicated and/or occur at evolutionarily conserved sites that are likely to be functionally important. Additionally, we provide an evolutionary-adjusted SNP association ranking (E-rank) that uses cross-species evolutionary conservation scores and population allele frequencies to transform P-values in an effort to enhance the discovery of SNPs with a greater probability of biologically meaningful disease associations.
Conclusion: By adding an evolutionary dimension to the GWAS results available in the GRASP2 database, our e-GRASP resource will enable a more effective exploration of SNPs not only by the statistical significance of trait associations, but also by the number of studies in which associations have been replicated, and the evolutionary context of the associated mutations. Therefore, e-GRASP will be a valuable resource for aiding researchers in the identification of bona fide, reproducible genetic associations from GWAS results. This resource is freely available at http://www.mypeg.info/egrasp.
Affiliation(s)
- Sajjad Karim
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Hend Fakhri NourEldin
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Heba Abusamra
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Nada Salem
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Elham Alhathli
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Joel Dudley
  - Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY, 10029, USA
- Max Sanderford
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Laura B Scheinfeldt
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
- Sudhir Kumar
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA, 19122, USA
|
21
|
Kulshreshtha S, Chaudhary V, Goswami GK, Mathur N. Computational approaches for predicting mutant protein stability. J Comput Aided Mol Des 2016; 30:401-12. [DOI: 10.1007/s10822-016-9914-3] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 02/09/2016] [Accepted: 05/02/2016] [Indexed: 11/24/2022]
|
22
|
A Broad Overview of Computational Methods for Predicting the Pathophysiological Effects of Non-synonymous Variants. Methods Mol Biol 2016; 1415:423-40. [PMID: 27115646 DOI: 10.1007/978-1-4939-3572-7_22] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022]
Abstract
Next-generation sequencing has provided extraordinary opportunities to investigate the massive human genetic variability. It has helped to identify several kinds of genomic mismatches from the wild-type reference genome sequences and to explain the onset of several pathogenic phenotypes and disease susceptibility. In this context, distinguishing pathogenic from functionally neutral amino acid changes turns out to be a task that is as useful as it is complex, expensive, and time-consuming. Here, we present an exhaustive and up-to-date survey of the algorithms and software packages conceived for the estimation of the putative pathogenicity of mutations, along with a description of the most popular mutation datasets that these tools used as training sets. Finally, we present and describe software for the prediction of cancer-related mutations.
|
23
|
Subramanian S. Europeans have a higher proportion of high‑frequency deleterious variants than Africans. Hum Genet 2016; 135:1-7. [PMID: 26462918 DOI: 10.1007/s00439-015-1604-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/08/2015] [Accepted: 10/01/2015] [Indexed: 12/11/2022]
Abstract
Recent studies have shown that a high proportion of rare variants in European and African populations are deleterious in nature. However, the deleterious fraction of high-frequency variants is unclear. Using more than 6500 exomes we show a much higher fraction (11 %) of relatively high-frequency nonsynonymous (amino acid altering) variants (DAF 0.1–10 %) in European Americans (EA) compared to those from African Americans (AA). In contrast, this difference was only marginal (<2 %) for low-frequency nonsynonymous variants (DAF <0.1 %). Our results also revealed that the proportion of high-frequency deleterious nonsynonymous variants in EA was much higher (24 %) than that in AA and this difference was very small (4 %) for the low-frequency deleterious amino acid altering variants. We also show that EA have significantly more high-frequency deleterious nonsynonymous variants per genome than AA. The high proportion of high-frequency deleterious variants in EA could be the result of the well-known bottleneck experienced by European populations in which harmful mutations may have drifted to high frequencies. The estimated ages of deleterious variants support this prediction. Our results suggest that high-frequency variants could be relatively more likely to be associated with diseases in Europeans than in Africans and hence emphasize the need for population-specific strategies in genomic medicine studies.
Affiliation(s)
- Sankar Subramanian
- Environmental Futures Research Institute, Griffith University, 170 Kessels Road, Nathan, QLD 4111, Australia.
|
24
|
Kumar A, Butler BM, Kumar S, Ozkan SB. Integration of structural dynamics and molecular evolution via protein interaction networks: a new era in genomic medicine. Curr Opin Struct Biol 2015; 35:135-42. [PMID: 26684487 PMCID: PMC4856467 DOI: 10.1016/j.sbi.2015.11.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 08/18/2015] [Revised: 11/03/2015] [Accepted: 11/05/2015] [Indexed: 01/08/2023]
Abstract
Sequencing technologies are revealing many new non-synonymous single nucleotide variants (nsSNVs) in each personal exome. To assess their functional impacts, comparative genomics is frequently employed to predict if they are benign or not. However, evolutionary analysis alone is insufficient, because it misdiagnoses many disease-associated nsSNVs, such as those at positions involved in protein interfaces, and because evolutionary predictions do not provide mechanistic insights into functional change or loss. Structural analyses can aid in overcoming both of these problems by incorporating conformational dynamics and allostery in nsSNV diagnosis. Finally, protein-protein interaction networks using systems-level methodologies shed light onto disease etiology and pathogenesis. Bridging these network approaches with structurally resolved protein interactions and dynamics will advance genomic medicine.
Affiliation(s)
- Avishek Kumar
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States
| | - Brandon M Butler
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, United States; Department of Biology, Temple University, Philadelphia, PA 19122, United States; Center for Genomic Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - S Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ 85281, United States.
|
25
|
Miura S, Tate S, Kumar S. Using Disease-Associated Coding Sequence Variation to Investigate Functional Compensation by Human Paralogous Proteins. Evol Bioinform Online 2015; 11:245-51. [PMID: 26604664 PMCID: PMC4631161 DOI: 10.4137/ebo.s30594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2015] [Revised: 09/14/2015] [Accepted: 09/18/2015] [Indexed: 11/09/2022]
Abstract
Gene duplication enables functional diversification in species. It is thought that duplicated genes may be able to compensate if the function of one of the gene copies is disrupted. This possibility is extensively debated, with some studies reporting proteome-wide compensation, whereas others suggest functional compensation among only recent gene duplicates or no compensation at all. We report results from a systematic molecular evolutionary analysis to test the predictions of the functional compensation hypothesis. We contrasted the density of Mendelian disease-associated single nucleotide variants (dSNVs) in proteins with no discernible paralogs (singletons) with the dSNV density in proteins found in multigene families. Under the functional compensation hypothesis, we expected to find greater numbers of dSNVs in singletons due to the lack of any compensating partners. Our analyses produced the opposite pattern; paralogs have over 35% higher dSNV density than singletons. We found that these patterns are concordant with similar differences in the rates of amino acid evolution (i.e., functional constraints), as the proteins with paralogs have evolved 33% slower than singletons. Our evolutionary constraint explanation is robust to differences in family sizes, ages (young vs. old duplicates), and degrees of amino acid sequence similarities among paralogs. Therefore, disease-associated human variation does not exhibit significant signals of functional compensation among paralogous proteins, but rather an evolutionary constraint hypothesis provides a better explanation for the observed patterns of disease-associated and neutral polymorphisms in the human genome.
Affiliation(s)
- Sayaka Miura
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
| | - Stephanie Tate
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA; Department of Biology, Temple University, Philadelphia, PA, USA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
|
26
|
Belinsky MG, Rink L, Cai KQ, Capuzzi SJ, Hoang Y, Chien J, Godwin AK, von Mehren M. Somatic loss of function mutations in neurofibromin 1 and MYC associated factor X genes identified by exome-wide sequencing in a wild-type GIST case. BMC Cancer 2015; 15:887. [PMID: 26555092 PMCID: PMC4641358 DOI: 10.1186/s12885-015-1872-y] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 07/22/2015] [Accepted: 10/30/2015] [Indexed: 12/25/2022]
Abstract
Background: Approximately 10–15 % of gastrointestinal stromal tumors (GISTs) lack gain of function mutations in the KIT and platelet-derived growth factor receptor alpha (PDGFRA) genes. An alternate mechanism of oncogenesis through loss of function of the succinate-dehydrogenase (SDH) enzyme complex has been identified for a subset of these “wild type” GISTs. Methods: Paired tumor and normal DNA from an SDH-intact wild-type GIST case was subjected to whole exome sequencing to identify the pathogenic mechanism(s) in this tumor. Selected findings were further investigated in panels of GIST tumors through Sanger DNA sequencing, quantitative real-time PCR, and immunohistochemical approaches. Results: A hemizygous frameshift mutation (p.His2261Leufs*4) in the neurofibromin 1 (NF1) gene was identified in the patient’s GIST; however, no germline NF1 mutation was found. A somatic frameshift mutation (p.Lys54Argfs*31) in the MYC associated factor X (MAX) gene was also identified. Immunohistochemical analysis for MAX on a large panel of GISTs identified loss of MAX expression in the MAX-mutated GIST and in a subset of mainly KIT-mutated tumors. Conclusion: This study suggests that inactivating NF1 mutations outside the context of neurofibromatosis may be the oncogenic mechanism for a subset of sporadic GIST. In addition, loss of function mutation of the MAX gene was identified for the first time in GIST, and a broader role for MAX in GIST progression was suggested.
Affiliation(s)
- Martin G Belinsky
- Molecular Therapeutics Program, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA, 19111-2497, USA.
| | - Lori Rink
- Molecular Therapeutics Program, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA, 19111-2497, USA.
| | - Kathy Q Cai
- Cancer Biology Program, Fox Chase Cancer Center, Philadelphia, PA, USA.
| | - Stephen J Capuzzi
- Molecular Therapeutics Program, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA, 19111-2497, USA; Division of Chemical Biology and Medicinal Chemistry, University of North Carolina, Chapel Hill, NC, USA.
| | - Yen Hoang
- Department of Bioinformatics and Biosystems Technology, University of Applied Sciences Wildau, Wildau, Germany; Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS, USA.
| | - Jeremy Chien
- Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS, USA.
| | - Andrew K Godwin
- Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, USA.
| | - Margaret von Mehren
- Molecular Therapeutics Program, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA, 19111-2497, USA.
|
27
|
Liu L, Tamura K, Sanderford M, Gray VE, Kumar S. A Molecular Evolutionary Reference for the Human Variome. Mol Biol Evol 2015; 33:245-54. [PMID: 26464126 DOI: 10.1093/molbev/msv198] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]
Abstract
Widespread sequencing efforts are revealing an unprecedented amount of genomic variation in populations. Such information is routinely used to derive consensus reference sequences and to infer positions subject to natural selection. Here, we present a new molecular evolutionary method for estimating neutral evolutionary probabilities (EPs) of each amino acid or nucleotide state at a genomic position without using intraspecific polymorphism data. Because EPs are derived independently of population-level information, they serve as null expectations that can be used to evaluate selective forces on alleles at both polymorphic and monomorphic positions in populations. We applied this method to coding sequences in the human genome and produced a comprehensive evolutionary variome reference for all human proteins. We found that EPs accurately predict neutral and disease-associated alleles. Through an analysis of discordance between allelic EPs and their observed population frequencies, we discovered thousands of novel candidate sites for nonneutral evolution in human proteins. Many of these were validated in a joint analysis of disease-associated variants and population data. The EP method is also directly applicable to the analysis of noncoding sequences and genomic analyses of nonmodel species.
Affiliation(s)
- Li Liu
- Department of Biomedical Informatics, Arizona State University, Scottsdale; Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia
| | - Koichiro Tamura
- Department of Biological Sciences, Tokyo Metropolitan University, Hachioji, Tokyo, Japan
| | - Maxwell Sanderford
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia
| | - Vanessa E Gray
- Department of Genome Sciences, University of Washington, Seattle
| | - Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia; Department of Biology, Temple University, Philadelphia; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
|
28
|
Cheng F, Liu C, Lin CC, Zhao J, Jia P, Li WH, Zhao Z. A Gene Gravity Model for the Evolution of Cancer Genomes: A Study of 3,000 Cancer Genomes across 9 Cancer Types. PLoS Comput Biol 2015; 11:e1004497. [PMID: 26352260 PMCID: PMC4564226 DOI: 10.1371/journal.pcbi.1004497] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 03/28/2015] [Accepted: 08/11/2015] [Indexed: 12/14/2022]
Abstract
Cancer development and progression result from somatic evolution by an accumulation of genomic alterations. The effects of those alterations on the fitness of somatic cells lead to evolutionary adaptations such as increased cell proliferation, angiogenesis, and altered anticancer drug responses. However, there are few general mathematical models to quantitatively examine how perturbations of a single gene shape subsequent evolution of the cancer genome. In this study, we proposed the gene gravity model to study the evolution of cancer genomes by incorporating the genome-wide transcription and somatic mutation profiles of ~3,000 tumors across 9 cancer types from The Cancer Genome Atlas into a broad gene network. We found that somatic mutations of a cancer driver gene may drive cancer genome evolution by inducing mutations in other genes. This functional consequence is often generated by the combined effect of genetic and epigenetic (e.g., chromatin regulation) alterations. By quantifying cancer genome evolution using the gene gravity model, we identified six putative cancer genes (AHNAK, COL11A1, DDX3X, FAT4, STAG2, and SYNE1). The tumor genomes harboring the nonsynonymous somatic mutations in these genes had a higher mutation density at the genome level compared to the wild-type groups. Furthermore, we provided statistical evidence that hypermutation of cancer driver genes on inactive X chromosomes is a general feature in female cancer genomes. In summary, this study sheds light on the functional consequences and evolutionary characteristics of somatic mutations during tumorigenesis by propelling adaptive cancer genome evolution, which would provide new perspectives for cancer research and therapeutics.
Affiliation(s)
- Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
| | - Chuang Liu
- Alibaba Research Center for Complexity Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
| | - Chen-Ching Lin
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
| | - Junfei Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
| | - Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
| | - Wen-Hsiung Li
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America
- Biodiversity Research Center and Genomics Research Center, Academia Sinica, Taipei, Taiwan
| | - Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
|
29
|
Hart SN, Duffy P, Quest DJ, Hossain A, Meiners MA, Kocher JP. VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files. Brief Bioinform 2015. [PMID: 26210358 PMCID: PMC4793895 DOI: 10.1093/bib/bbv051] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Indexed: 12/14/2022]
Abstract
Next-generation sequencing platforms are widely used to discover variants associated with disease. The processing of sequencing data involves read alignment, variant calling, variant annotation and variant filtering. The standard file format to hold variant calls is the variant call format (VCF) file. According to the format specifications, any arbitrary annotation can be added to the VCF file for downstream processing. However, most downstream analysis programs disregard annotations already present in the VCF and re-annotate variants using the annotation provided by that particular program. This precludes investigators who have collected information on variants from literature or other sources from including these annotations in the filtering and mining of variants. We have developed VCF-Miner, a graphical user interface-based stand-alone tool, to mine variants and annotation stored in the VCF. Powered by a MongoDB database engine, VCF-Miner enables the stepwise trimming of non-relevant variants. The grouping feature implemented in VCF-Miner can be used to identify somatic variants by contrasting variants in tumor and in normal samples or to identify recessive/dominant variants in family studies. It is not limited to human data, but can also be extended to include non-diploid organisms. It also supports copy number or any other variant type supported by the VCF specification. VCF-Miner can be used on a personal computer or large institutional servers and is freely available for download from http://bioinformaticstools.mayo.edu/research/vcf-miner/.
|
30
|
A commentary on identification of the rare compound heterozygous variants in the NEB gene in a Korean family with intellectual disability, epilepsy and early-childhood-onset generalized muscle weakness. J Hum Genet 2015; 60:161-2. [DOI: 10.1038/jhg.2014.120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/12/2022]
|
31
|
Popovic D, Sifrim A, Davis J, Moreau Y, De Moor B. Problems with the nested granularity of feature domains in bioinformatics: the eXtasy case. BMC Bioinformatics 2015; 16 Suppl 4:S2. [PMID: 25734591 PMCID: PMC4347616 DOI: 10.1186/1471-2105-16-s4-s2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/16/2023]
Abstract
Background: Data from biomedical domains often have an inherent hierarchical structure. As this structure is usually implicit, its existence can be overlooked by practitioners interested in constructing and evaluating predictive models from such data. Ignoring these constructs leads to potentially problematic and routinely unrecognized bias in the models and results. In this work, we discuss this bias in detail and propose a simple, sampling-based solution for it. Next, we explore its sources and extent on synthetic data. Finally, we demonstrate how the state-of-the-art variant prioritization framework, eXtasy, benefits from using the described approach in its Random forest-based core classification model. Results and conclusions: The conducted simulations clearly indicate that the heterogeneous granularity of feature domains poses significant problems for both the standard Random forest classifier and a modification that relies on stratified bootstrapping. Conversely, using the proposed sampling scheme when training the classifier mitigates the described bias. Furthermore, when applied to the eXtasy data under a realistic class distribution scenario, a Random forest learned using the proposed sampling scheme displays much better precision than its standard version, without degrading recall. Moreover, the largest performance gains are achieved in the most important part of the operating range: the top of the prioritized gene list.
|
32
|
Gerek NZ, Liu L, Gerold K, Biparva P, Thomas ED, Kumar S. Evolutionary Diagnosis of non-synonymous variants involved in differential drug response. BMC Med Genomics 2015; 8 Suppl 1:S6. [PMID: 25952014 PMCID: PMC4315320 DOI: 10.1186/1755-8794-8-s1-s6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Background: Many pharmaceutical drugs are known to be ineffective or have negative side effects in a substantial proportion of patients. Genomic advances are revealing that some non-synonymous single nucleotide variants (nsSNVs) may cause differences in drug efficacy and side effects. Therefore, it is desirable to evaluate nsSNVs of interest in their ability to modulate the drug response. Results: We found that the available data on the link between drug response and nsSNVs is rather modest. There were only 31 distinct drug response-altering (DR-altering) and 43 distinct drug response-neutral (DR-neutral) nsSNVs in the whole Pharmacogenomics Knowledge Base (PharmGKB). However, even with this modest dataset, it was clear that existing bioinformatics tools have difficulties in correctly predicting the known DR-altering and DR-neutral nsSNVs. They exhibited an overall accuracy of less than 50%, which was not better than random diagnosis. We found that the underlying problem is the markedly different evolutionary properties between positions harboring nsSNVs linked to drug responses and those observed for inherited diseases. To solve this problem, we developed a new diagnosis method, Drug-EvoD, which was trained on the evolutionary properties of nsSNVs associated with drug responses in a sparse learning framework. Drug-EvoD achieves a TPR of 84% and a TNR of 53%, with a balanced accuracy of 69%, which improves upon other methods significantly. Conclusions: The new tool will enable researchers to computationally identify nsSNVs that may affect drug responses. However, much larger training and testing datasets are needed to develop more reliable and accurate tools.
|
33
|
Butler BM, Gerek ZN, Kumar S, Ozkan SB. Conformational dynamics of nonsynonymous variants at protein interfaces reveals disease association. Proteins 2015; 83:428-35. [PMID: 25546381 DOI: 10.1002/prot.24748] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 09/17/2014] [Revised: 11/20/2014] [Accepted: 12/10/2014] [Indexed: 12/12/2022]
Abstract
Recent studies have shown that the protein interface sites between individual monomeric units in biological assemblies are enriched in disease-associated non-synonymous single nucleotide variants (nsSNVs). To elucidate the mechanistic underpinning of this observation, we investigated the conformational dynamic properties of protein interface sites through a site-specific structural dynamic flexibility metric (dfi) for 333 multimeric protein assemblies. dfi measures the dynamic resilience of a single residue to perturbations that occurred in the rest of the protein structure and identifies sites contributing the most to functionally critical dynamics. Analysis of dfi profiles of over a thousand positions harboring variation revealed that amino acid residues at interfaces have lower average dfi (31%) than those present at non-interfaces (50%), which means that protein interfaces have less dynamic flexibility. Interestingly, interface sites with disease-associated nsSNVs have significantly lower average dfi (23%) as compared to those of neutral nsSNVs (42%), which directly relates structural dynamics to functional importance. We found that less conserved interface positions show much lower dfi for disease nsSNVs as compared to neutral nsSNVs. In this case, dfi is better as compared to the accessible surface area metric, which is based on the static protein structure. Overall, our proteome-wide conformational dynamic analysis indicates that certain interface sites play a critical role in functionally related dynamics (i.e., those with low dfi values), therefore mutations at those sites are more likely to be associated with disease.
|
34
|
Castellana S, Rónai J, Mazza T. MitImpact: an exhaustive collection of pre-computed pathogenicity predictions of human mitochondrial non-synonymous variants. Hum Mutat 2014; 36:E2413-22. [PMID: 25516408 DOI: 10.1002/humu.22720] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 11/11/2022]
Abstract
Mitochondrial DNA carries a tiny but fundamental portion of the eukaryotic genetic code. Like its nuclear counterpart, it is susceptible to point mutations. Their level of pathogenicity has been assessed for the newly discovered mutations only, leaving some degree of uncertainty on the potential impact of the unknown mutations. Here we present Mitochondrial mutation Impact (MitImpact), a queryable lightweight web interface to a reasoned collection of structurally and evolutionarily annotated pathogenicity predictions, obtained by assembling pre-computed with on-the-fly-computed sets of pathogenicity estimations, for all the possible mitochondrial missense variants. It presents itself as a resource for fast and reliable evaluation of gene-specific susceptibility of unknown and verified amino acid changes. MitImpact is freely available at http://bioinformatics.css-mendel.it/ (tools section). ©2014 Wiley Periodicals, Inc.
Affiliation(s)
- Stefano Castellana
- IRCCS Casa Sollievo della Sofferenza, Istituto Mendel, Bioinformatics Unit. Viale Regina Margherita, 261. 00198, Roma, Italy
|
35
|
Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history. Nat Genet 2014; 46:1303-10. [DOI: 10.1038/ng.3137] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Received: 02/24/2014] [Accepted: 10/09/2014] [Indexed: 11/08/2022]
|
36
|
Gaston D, Hansford S, Oliveira C, Nightingale M, Pinheiro H, Macgillivray C, Kaurah P, Rideout AL, Steele P, Soares G, Huang WY, Whitehouse S, Blowers S, LeBlanc MA, Jiang H, Greer W, Samuels ME, Orr A, Fernandez CV, Majewski J, Ludman M, Dyack S, Penney LS, McMaster CR, Huntsman D, Bedard K. Germline mutations in MAP3K6 are associated with familial gastric cancer. PLoS Genet 2014; 10:e1004669. [PMID: 25340522 PMCID: PMC4207611 DOI: 10.1371/journal.pgen.1004669] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 03/03/2014] [Accepted: 08/14/2014] [Indexed: 12/13/2022]
Abstract
Gastric cancer is among the leading causes of cancer-related deaths worldwide. While heritable forms of gastric cancer are relatively rare, identifying the genes responsible for such cases can inform diagnosis and treatment for both hereditary and sporadic cases of gastric cancer. Mutations in the E-cadherin gene, CDH1, account for 40% of the most common form of familial gastric cancer (FGC), hereditary diffuse gastric cancer (HDGC). The genes responsible for the remaining forms of FGC are currently unknown. Here we examined a large family from Maritime Canada with FGC without CDH1 mutations, and identified a germline coding variant (p.P946L) in mitogen-activated protein kinase kinase kinase 6 (MAP3K6). Based on conservation, predicted pathogenicity and a known role of the gene in cancer predisposition, MAP3K6 was considered a strong candidate and was investigated further. Screening of an additional 115 unrelated individuals with non-CDH1 FGC identified the p.P946L MAP3K6 variant, as well as four additional coding variants in MAP3K6 (p.F849Sfs*142, p.P958T, p.D200Y and p.V207G). A somatic second-hit variant (p.H506Y) was present in DNA obtained from one of the tumor specimens, and evidence of DNA hypermethylation within the MAP3K6 gene was observed in DNA from the tumor of another affected individual. These findings, together with previous evidence from mouse models that MAP3K6 acts as a tumor suppressor, and studies showing the presence of somatic mutations in MAP3K6 in non-hereditary gastric cancers and gastric cancer cell lines, point towards MAP3K6 variants as a predisposing factor for FGC. The underlying genetic mutations involved in 60% of inherited gastric cancer cases remain unknown. Here we present a large, extended pedigree with familial gastric cancer and an association in part of the family with a mutation in MAP3K6. 
The conservation, predicted pathogenicity of the variant, tissue distribution, and known function of MAP3K6 made this a strong candidate that warranted further investigation. Examination of an additional 115 unrelated probands identified additional mutations in MAP3K6, including a truncating mutation.
Affiliation(s)
- Daniel Gaston
- Department of Pathology, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Samantha Hansford
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
| | - Carla Oliveira
- Expression Regulation in Cancer Group, IPATIMUP, Institute of Molecular Pathology and Immunology of the University of Porto & Medical Faculty of the University of Porto, Porto, Portugal
| | - Mathew Nightingale
- Department of Pathology, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Hugo Pinheiro
- Expression Regulation in Cancer Group, IPATIMUP, Institute of Molecular Pathology and Immunology of the University of Porto & Medical Faculty of the University of Porto, Porto, Portugal
| | - Christine Macgillivray
- Department of Ophthalmology and Visual Sciences, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Pardeep Kaurah
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
| | | | - Patricia Steele
- Medical Genetics, IWK Health Centre, Halifax, Nova Scotia, Canada
| | - Gabriela Soares
- Center of Medical Genetics Jacinto de Magalhães, Porto Hospital Center, Porto, Portugal
| | - Weei-Yuarn Huang
- Division of Anatomical Pathology, Department of Pathology, Queen Elizabeth II Health Science Center and Dalhousie University, Halifax, Nova Scotia, Canada
| | - Scott Whitehouse
- Department of Pathology, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Sarah Blowers
- Queen's Family Health Team, Kingston, Ontario, Canada
| | - Marissa A. LeBlanc
- Department of Pathology, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Haiyan Jiang
- Department of Biostatistics, Princess Margaret Cancer Centre, Toronto, Ontario, Canada
| | - Wenda Greer
- Department of Pathology, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Mark E. Samuels
- Department of Pathology, Dalhousie University, Halifax, Nova Scotia, Canada
- Centre de Recherche du CHU Ste-Justine and Department of Medicine, University of Montreal, Montreal, Quebec, Canada
| | - Andrew Orr
- Department of Pathology, Dalhousie University, Halifax, Nova Scotia, Canada
- Department of Ophthalmology and Visual Sciences, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Conrad V. Fernandez
- Department of Pediatrics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Jacek Majewski
- Department of Human Genetics, McGill University, Montreal, Québec, Canada
| | - Mark Ludman
- Medical Genetics, IWK Health Centre, Halifax, Nova Scotia, Canada
- Oncogenetics Service, Institute of Medical Genetics, Meir Medical Center, Kfar Saba, Israel
| | - Sarah Dyack
- Medical Genetics, IWK Health Centre, Halifax, Nova Scotia, Canada
- Department of Pediatrics, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Lynette S. Penney
- Medical Genetics, IWK Health Centre, Halifax, Nova Scotia, Canada
- Department of Pediatrics, Dalhousie University, Halifax, Nova Scotia, Canada
| | | | - David Huntsman
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
| | - Karen Bedard
- Department of Pathology, Dalhousie University, Halifax, Nova Scotia, Canada
|
37
|
Subramanian S. Using the plurality of codon positions to identify deleterious variants in human exomes. Bioinformatics 2014; 31:301-5. [PMID: 25282643 DOI: 10.1093/bioinformatics/btu653] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
Abstract
MOTIVATION A codon position could perform different or multiple roles in alternative transcripts of a gene. For instance, a non-synonymous position in one transcript could be a synonymous site in another. Alternatively, a position could remain as non-synonymous in multiple transcripts. Here we examined the impact of codon position plurality on the frequency of deleterious single-nucleotide variations (SNVs) using data from 6500 human exomes. RESULTS Our results showed that the proportion of deleterious SNVs was more than 2-fold higher in positions that remain non-synonymous in multiple transcripts compared with that observed in positions that are non-synonymous in one or some transcript(s) and synonymous or intronic in other(s). Furthermore, we observed a positive relationship between the fraction of deleterious non-synonymous SNVs and the number of proteins (alternative splice variants) affected. These results demonstrate that the plurality of codon positions is an important attribute, which could be useful in identifying mutations associated with diseases. CONTACT s.subramanian@griffith.edu.au SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sankar Subramanian
- Environmental Futures Research Institute, Griffith University, 170 Kessels Road, Nathan Qld 4111, Australia
|
38
|
Lachance J, Tishkoff SA. Biased gene conversion skews allele frequencies in human populations, increasing the disease burden of recessive alleles. Am J Hum Genet 2014; 95:408-20. [PMID: 25279983 PMCID: PMC4185123 DOI: 10.1016/j.ajhg.2014.09.008] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2014] [Revised: 08/21/2014] [Accepted: 09/10/2014] [Indexed: 10/25/2022] Open
Abstract
Gene conversion results in the nonreciprocal transfer of genetic information between two recombining sequences, and there is evidence that this process is biased toward G and C alleles. However, the strength of GC-biased gene conversion (gBGC) in human populations and its effects on hereditary disease have yet to be assessed on a genomic scale. Using high-coverage whole-genome sequences of African hunter-gatherers, agricultural populations, and primate outgroups, we quantified the effects of GC-biased gene conversion on population genomic data sets. We find that genetic distances (FST and population branch statistics) are modified by gBGC. In addition, the site frequency spectrum is left-shifted when ancestral alleles are favored by gBGC and right-shifted when derived alleles are favored by gBGC. Allele frequency shifts due to gBGC mimic the effects of natural selection. As expected, these effects are strongest in high-recombination regions of the human genome. By comparing the relative rates of fixation of unbiased and biased sites, the strength of gene conversion was estimated to be on the order of Nb ≈ 0.05 to 0.09. We also find that derived alleles favored by gBGC are much more likely to be homozygous than derived alleles at unbiased SNPs (+42.2% to 62.8%). This results in a curse of the converted, whereby gBGC causes substantial increases in hereditary disease risks. Taken together, our findings reveal that GC-biased gene conversion has important population genetic and public health implications.
MESH Headings
- Bias
- Evolution, Molecular
- Gene Conversion
- Gene Frequency
- Genes, Recessive/genetics
- Genetic Diseases, Inborn/genetics
- Genetics, Population
- Genome, Human/genetics
- Humans
- Models, Genetic
- Models, Theoretical
- Polymorphism, Single Nucleotide/genetics
- Recombination, Genetic
- Selection, Genetic/genetics
Affiliation(s)
- Joseph Lachance
- Departments of Biology and Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA.
| | - Sarah A Tishkoff
- Departments of Biology and Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA.
|
42
|
Cheng F, Jia P, Wang Q, Lin CC, Li WH, Zhao Z. Studying tumorigenesis through network evolution and somatic mutational perturbations in the cancer interactome. Mol Biol Evol 2014; 31:2156-69. [PMID: 24881052 DOI: 10.1093/molbev/msu167] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Cells govern biological functions through complex biological networks. Perturbations to networks may drive cells to new phenotypic states, for example, tumorigenesis. Identifying how genetic lesions perturb molecular networks is a fundamental challenge. This study used large-scale human interactome data to systematically explore the relationship among network topology, somatic mutation, evolutionary rate, and evolutionary origin of cancer genes. We found the unique network centrality of cancer proteins, which is largely independent of gene essentiality. Cancer genes likely have experienced a lower evolutionary rate and stronger purifying selection than those of noncancer, Mendelian disease, and orphan disease genes. Cancer proteins tend to have ancient histories, likely originated in early metazoan, although they are younger than proteins encoded by Mendelian disease genes, orphan disease genes, and essential genes. We found that the protein evolutionary origin (age) positively correlates with protein connectivity in the human interactome. Furthermore, we investigated the network-attacking perturbations due to somatic mutations identified from 3,268 tumors across 12 cancer types in The Cancer Genome Atlas. We observed a positive correlation between protein connectivity and the number of nonsynonymous somatic mutations, whereas a weaker or insignificant correlation between protein connectivity and the number of synonymous somatic mutations. These observations suggest that somatic mutational network-attacking perturbations to hub genes play an important role in tumor emergence and evolution. Collectively, this work has broad biomedical implications for both basic cancer biology and the development of personalized cancer therapy.
Affiliation(s)
- Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
| | - Peilin Jia
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
| | - Quan Wang
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
| | - Chen-Ching Lin
- Department of Biomedical Informatics, Vanderbilt University School of Medicine
| | - Wen-Hsiung Li
- Department of Ecology and Evolution, University of Chicago; Biodiversity Research Center and Genomics Research Center, Academia Sinica, Taipei, Taiwan
| | - Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine; Department of Cancer Biology, Vanderbilt University School of Medicine; Department of Psychiatry, Vanderbilt University School of Medicine; Center for Quantitative Sciences, Vanderbilt University Medical Center
|
43
|
Bermejo-Das-Neves C, Nguyen HN, Poch O, Thompson JD. A comprehensive study of small non-frameshift insertions/deletions in proteins and prediction of their phenotypic effects by a machine learning method (KD4i). BMC Bioinformatics 2014; 15:111. [PMID: 24742296 PMCID: PMC4021375 DOI: 10.1186/1471-2105-15-111] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2013] [Accepted: 04/09/2014] [Indexed: 11/10/2022] Open
Abstract
Background Small insertion and deletion polymorphisms (Indels) are the second most common mutations in the human genome, after Single Nucleotide Polymorphisms (SNPs). Recent studies have shown that they have significant influence on genetic variation by altering human traits and can cause multiple human diseases. In particular, many Indels that occur in protein coding regions are known to impact the structure or function of the protein. A major challenge is to predict the effects of these Indels and to distinguish between deleterious and neutral variants. When an Indel occurs within a coding region, it can be either frameshifting (FS) or non-frameshifting (NFS). FS-Indels either modify the complete C-terminal region of the protein or result in premature termination of translation. NFS-Indels insert/delete multiples of three nucleotides leading to the insertion/deletion of one or more amino acids. Results In order to study the relationships between NFS-Indels and Mendelian diseases, we characterized NFS-Indels according to numerous structural, functional and evolutionary parameters. We then used these parameters to identify specific characteristics of disease-causing and neutral NFS-Indels. Finally, we developed a new machine learning approach, KD4i, that can be used to predict the phenotypic effects of NFS-Indels. Conclusions We demonstrate in a large-scale evaluation that the accuracy of KD4i is comparable to existing state-of-the-art methods. However, a major advantage of our approach is that we also provide the reasons for the predictions, in the form of a set of rules. The rules are interpretable by non-expert humans and they thus represent new knowledge about the relationships between the genotype and phenotypes of NFS-Indels and the causative molecular perturbations that result in the disease.
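The FS/NFS distinction described in the abstract above reduces to a length check, which can be sketched as follows. This is a minimal illustration only, not part of KD4i: the function name `classify_indel` and the ref/alt string representation are assumptions made here, and the sketch ignores real-world complications such as indels spanning splice sites or in-frame insertions that introduce a stop codon.

```python
def classify_indel(ref: str, alt: str) -> str:
    """Label a coding-region indel as frameshifting (FS) or non-frameshifting (NFS).

    Hypothetical helper: ref and alt are the reference and alternate
    nucleotide strings of the variant. An indel whose net length change is a
    multiple of three keeps the reading frame (NFS); any other change shifts it (FS).
    """
    delta = abs(len(alt) - len(ref))
    if delta == 0:
        # Equal lengths mean a substitution, not an insertion or deletion.
        raise ValueError("ref and alt have equal length; not an indel")
    return "NFS" if delta % 3 == 0 else "FS"


if __name__ == "__main__":
    print(classify_indel("A", "ACGT"))   # 3-nt insertion
    print(classify_indel("ACG", "A"))    # 2-nt deletion
```

A length check of this kind only separates the two classes; predicting whether a given NFS-Indel is deleterious is the harder problem that the abstract's structural, functional and evolutionary parameters address.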
Affiliation(s)
| | | | | | - Julie D Thompson
- ICube Laboratory and Strasbourg Federation of Translational Medicine (FMTS), University of Strasbourg and CNRS, Strasbourg, France.
|
44
|
Unifying immunology with informatics and multiscale biology. Nat Immunol 2014; 15:118-27. [PMID: 24448569 DOI: 10.1038/ni.2787] [Citation(s) in RCA: 109] [Impact Index Per Article: 9.9] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2013] [Accepted: 11/14/2013] [Indexed: 12/14/2022]
Abstract
The immune system is a highly complex and dynamic system. Historically, the most common scientific and clinical practice has been to evaluate its individual components. This kind of approach cannot always expose the interconnecting pathways that control immune-system responses and does not reveal how the immune system works across multiple biological systems and scales. High-throughput technologies can be used to measure thousands of parameters of the immune system at a genome-wide scale. These system-wide surveys yield massive amounts of quantitative data that provide a means to monitor and probe immune-system function. New integrative analyses can help synthesize and transform these data into valuable biological insight. Here we review some of the computational analysis tools for high-dimensional data and how they can be applied to immunology.
|
45
|
Gala MK, Mizukami Y, Le LP, Moriichi K, Austin T, Yamamoto M, Lauwers GY, Bardeesy N, Chung DC. Germline mutations in oncogene-induced senescence pathways are associated with multiple sessile serrated adenomas. Gastroenterology 2014; 146:520-9. [PMID: 24512911 PMCID: PMC3978775 DOI: 10.1053/j.gastro.2013.10.045] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/16/2013] [Revised: 10/17/2013] [Accepted: 10/21/2013] [Indexed: 12/20/2022]
Abstract
BACKGROUND & AIMS Little is known about the genetic factors that contribute to the development of sessile serrated adenomas (SSAs). SSAs contain somatic mutations in BRAF or KRAS early in development. However, evidence from humans and mouse models indicates that these mutations result in oncogene-induced senescence (OIS) of intestinal crypt cells. Progression to serrated neoplasia requires cells to escape OIS via inactivation of tumor suppressor pathways. We investigated whether subjects with multiple SSAs carry germline loss-of-function mutations (nonsense and splice site) in genes that regulate OIS: the p16-Rb and ATM-ATR DNA damage response pathways. METHODS Through a bioinformatic analysis of the literature, we identified a set of genes that function at the main nodes of the p16-Rb and ATM-ATR DNA damage response pathways. We performed whole-exome sequencing of 20 unrelated subjects with multiple SSAs; most had features of serrated polyposis. We compared sequences with those from 4300 subjects matched for ethnicity (controls). We also used an integrative genomics approach to identify additional genes involved in senescence mechanisms. RESULTS We identified mutations in genes that regulate senescence (ATM, PIF1, TELO2, XAF1, and RBL1) in 5 of 20 subjects with multiple SSAs (odds ratio, 3.0; 95% confidence interval, 0.9–8.9; P = .04). In 2 subjects, we found nonsense mutations in RNF43, indicating that it is also associated with multiple serrated polyps (odds ratio, 460; 95% confidence interval, 23.1–16,384; P = 6.8 × 10⁻⁵). In knockdown experiments with pancreatic duct cells exposed to UV light, RNF43 appeared to function as a regulator of the ATM-ATR DNA damage response. CONCLUSIONS We associated germline loss-of-function variants in genes that regulate senescence pathways with the development of multiple SSAs. We identified RNF43 as a regulator of the DNA damage response and associated nonsense variants in this gene with a high risk of developing SSAs.
Affiliation(s)
- Manish K. Gala
- Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA
| | - Yusuke Mizukami
- Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA; Massachusetts General Hospital Cancer Center and Harvard Medical School, Boston, MA; Center for Clinical and Biomedical Research, Sapporo Higashi Tokushukai Hospital, Sapporo, Japan
| | - Long P. Le
- Massachusetts General Hospital Department of Pathology and Harvard Medical School, Boston, MA
| | - Kentaro Moriichi
- Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA
| | - Thomas Austin
- Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA
| | - Masayoshi Yamamoto
- Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA
| | - Gregory Y. Lauwers
- Massachusetts General Hospital Department of Pathology and Harvard Medical School, Boston, MA
| | - Nabeel Bardeesy
- Massachusetts General Hospital Cancer Center and Harvard Medical School, Boston, MA
| | - Daniel C. Chung
- Massachusetts General Hospital Department of Medicine, G.I. Unit and Harvard Medical School, Boston, MA; Massachusetts General Hospital Cancer Center and Harvard Medical School, Boston, MA
|
46
|
Stecher G, Liu L, Sanderford M, Peterson D, Tamura K, Kumar S. MEGA-MD: molecular evolutionary genetics analysis software with mutational diagnosis of amino acid variation. Bioinformatics 2014; 30:1305-7. [PMID: 24413669 DOI: 10.1093/bioinformatics/btu018] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Computational diagnosis of amino acid variants in the human exome is the first step in assessing the disruptive impacts of non-synonymous single nucleotide variants (nsSNVs) on human health and disease. The Molecular Evolutionary Genetics Analysis software with mutational diagnosis (MEGA-MD) is a suite of tools developed to forecast the deleteriousness of nsSNVs using multiple methods and to explore nsSNVs in the context of the variability permitted in the long-term evolution of the affected position. In its graphical interface for use on desktops, it enables interactive computational diagnosis and evolutionary exploration of nsSNVs. As a web service, MEGA-MD is suitable for diagnosing variants on an exome scale. The MEGA-MD suite intends to serve the needs for conducting low- and high-throughput analysis of nsSNVs in diverse applications.
Affiliation(s)
- Glen Stecher
- Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University (ASU), Tempe, AZ 85287, Research Center for Genomics and Bioinformatics, Tokyo Metropolitan University (TMU), Hachioji, Tokyo, Japan, Department of Biological Sciences, TMU, Tokyo, Japan, School of Life Sciences, ASU, Tempe, AZ 85287, USA and Center for Excellence in Genomic Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
|
47
|
Goswami DB, Ogawa LM, Ward JM, Miller GM, Vallender EJ. Large-scale polymorphism discovery in macaque G-protein coupled receptors. BMC Genomics 2013; 14:703. [PMID: 24119066 PMCID: PMC3907043 DOI: 10.1186/1471-2164-14-703] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2013] [Accepted: 10/04/2013] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND G-protein coupled receptors (GPCRs) play an inordinately large role in human health. Variation in the genes that encode these receptors is associated with numerous disorders across the entire spectrum of disease. GPCRs also represent the single largest class of drug targets and associated pharmacogenetic effects are modulated, in part, by polymorphisms. Recently, non-human primate models have been developed focusing on naturally-occurring, functionally-parallel polymorphisms in candidate genes. This work aims to extend those studies broadly across the roughly 377 non-olfactory GPCRs. Initial efforts include resequencing 44 Indian-origin rhesus macaques (Macaca mulatta), 20 Chinese-origin rhesus macaques, and 32 cynomolgus macaques (M. fascicularis). RESULTS Using the Agilent target enrichment system, capture baits were designed for GPCRs off the human and rhesus exonic sequence. Using next generation sequencing technologies, nearly 25,000 SNPs were identified in coding sequences including over 14,000 non-synonymous and more than 9,500 synonymous protein-coding SNPs. As expected, regions showing the least evolutionary constraint show greater rates of polymorphism and greater numbers of higher frequency polymorphisms. While the vast majority of these SNPs are singletons, roughly 1,750 non-synonymous and 2,900 synonymous SNPs were found in multiple individuals. CONCLUSIONS In all three populations, polymorphism and divergence is highly concentrated in N-terminal and C-terminal domains and the third intracellular loop region of GPCRs, regions critical to ligand-binding and signaling. SNP frequencies in macaques follow a similar pattern of divergence from humans and new polymorphisms in primates have been identified that may parallel those seen in humans, helping to establish better non-human primate models of disease.
Affiliation(s)
- Dharmendra B Goswami
- New England Primate Research Center, Harvard Medical School, One Pine Hill Drive, Southborough, MA 01772, USA.
|
48
|
eXtasy: variant prioritization by genomic data fusion. Nat Methods 2013; 10:1083-4. [PMID: 24076761 DOI: 10.1038/nmeth.2656] [Citation(s) in RCA: 124] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2013] [Accepted: 08/26/2013] [Indexed: 01/01/2023]
Abstract
Massively parallel sequencing greatly facilitates the discovery of novel disease genes causing Mendelian and oligogenic disorders. However, many mutations are present in any individual genome, and identifying which ones are disease causing remains a largely open problem. We introduce eXtasy, an approach to prioritize nonsynonymous single-nucleotide variants (nSNVs) that substantially improves prediction of disease-causing variants in exome sequencing data by integrating variant impact prediction, haploinsufficiency prediction and phenotype-specific gene prioritization.
|
49
|
Veeramah KR, Johnstone L, Karafet TM, Wolf D, Sprissler R, Salogiannis J, Barth-Maron A, Greenberg ME, Stuhlmann T, Weinert S, Jentsch T, Pazzi M, Restifo LL, Talwar D, Erickson RP, Hammer MF. Exome sequencing reveals new causal mutations in children with epileptic encephalopathies. Epilepsia 2013; 54:1270-81. [PMID: 23647072 PMCID: PMC3700577 DOI: 10.1111/epi.12201] [Citation(s) in RCA: 232] [Impact Index Per Article: 19.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/19/2013] [Indexed: 02/06/2023]
Abstract
PURPOSE The management of epilepsy in children is particularly challenging when seizures are resistant to antiepileptic medications, or undergo many changes in seizure type over time, or have comorbid cognitive, behavioral, or motor deficits. Despite efforts to classify such epilepsies based on clinical and electroencephalographic criteria, many children never receive a definitive etiologic diagnosis. Whole exome sequencing (WES) is proving to be a highly effective method for identifying de novo variants that cause neurologic disorders, especially those associated with abnormal brain development. Herein we explore the utility of WES for identifying candidate causal de novo variants in a cohort of children with heterogeneous sporadic epilepsies without etiologic diagnoses. METHODS We performed WES (mean coverage approximately 40×) on 10 trios comprised of unaffected parents and a child with sporadic epilepsy characterized by difficult-to-control seizures and some combination of developmental delay, epileptic encephalopathy, autistic features, cognitive impairment, or motor deficits. Sequence processing and variant calling were performed using standard bioinformatics tools. A custom filtering system was used to prioritize de novo variants of possible functional significance for validation by Sanger sequencing. KEY FINDINGS In 9 of 10 probands, we identified one or more de novo variants predicted to alter protein function, for a total of 15. Four probands had de novo mutations in genes previously shown to harbor heterozygous mutations in patients with severe, early onset epilepsies (two in SCN1A, and one each in CDKL5 and EEF1A2). In three children, the de novo variants were in genes with functional roles that are plausibly relevant to epilepsy (KCNH5, CLCN4, and ARHGEF15). The variant in KCNH5 alters one of the highly conserved arginine residues of the voltage sensor of the encoded voltage-gated potassium channel. 
In vitro analyses using cell-based assays revealed that the CLCN4 mutation greatly impaired ion transport by the ClC-4 2Cl⁻/H⁺ exchanger and that the mutation in ARHGEF15 reduced GEF exchange activity of the gene product, Ephexin5, by about 50%. Of interest, these seven probands all presented with seizures within the first 6 months of life, and six of these have intractable seizures. SIGNIFICANCE The finding that 7 of 10 children carried de novo mutations in genes of known or plausible clinical significance to neuronal excitability suggests that WES will be of use for the molecular genetic diagnosis of sporadic epilepsies in children, especially when seizures are of early onset and difficult to control.
Affiliation(s)
- Krishna R Veeramah
- ARL Division of Biotechnology, University of Arizona, Tucson, AZ, 85721, USA
| | - Laurel Johnstone
- ARL Division of Biotechnology, University of Arizona, Tucson, AZ, 85721, USA
| | - Tatiana M Karafet
- ARL Division of Biotechnology, University of Arizona, Tucson, AZ, 85721, USA
| | - Daniel Wolf
- ARL Division of Biotechnology, University of Arizona, Tucson, AZ, 85721, USA
| | - Ryan Sprissler
- ARL Division of Biotechnology, University of Arizona, Tucson, AZ, 85721, USA
| | - John Salogiannis
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
| | - Asa Barth-Maron
- Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA
| | | | - Till Stuhlmann
- Leibniz-Institut für Molekulare Pharmakologie (FMP) and Max-Delbrück-Centrum für Molekulare Medizin (MDC), 13125 Berlin, Germany
| | - Stefanie Weinert
- Leibniz-Institut für Molekulare Pharmakologie (FMP) and Max-Delbrück-Centrum für Molekulare Medizin (MDC), 13125 Berlin, Germany
| | - Thomas Jentsch
- Leibniz-Institut für Molekulare Pharmakologie (FMP) and Max-Delbrück-Centrum für Molekulare Medizin (MDC), 13125 Berlin, Germany
| | | | - Linda L Restifo
- Department of Neurology, Arizona Health Science Center, Tucson AZ 85724, USA
- Department of Neuroscience, University of Arizona, Tucson, AZ 85821, USA
- Department of Cellular & Molecular Medicine, Arizona Health Science Center, Tucson, AZ 85724, USA
| | - Dinesh Talwar
- Center for Neurosciences, Tucson, AZ 85718, USA
- Department of Neurology, Arizona Health Science Center, Tucson AZ 85724, USA
- Department of Pediatrics, Arizona Health Science Center, Tucson AZ 85724, USA
| | - Robert P Erickson
- Department of Pediatrics, Arizona Health Science Center, Tucson AZ 85724, USA
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
| | - Michael F Hammer
- ARL Division of Biotechnology, University of Arizona, Tucson, AZ, 85721, USA
|
50
|
Kirwan JD, Bekaert M, Commins JM, Davies KTJ, Rossiter SJ, Teeling EC. A phylomedicine approach to understanding the evolution of auditory sensory perception and disease in mammals. Evol Appl 2013; 6:412-22. [PMID: 23745134 PMCID: PMC3673470 DOI: 10.1111/eva.12047] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2012] [Accepted: 12/21/2012] [Indexed: 01/31/2023] Open
Abstract
Hereditary deafness affects 0.1% of individuals globally and is considered as one of the most debilitating diseases of man. Despite recent advances, the molecular basis of normal auditory function is not fully understood and little is known about the contribution of single-nucleotide variations to the disease. Using cross-species comparisons of 11 ‘deafness’ genes (Myo15, Ush1g, Strc, Tecta, Tectb, Otog, Col11a2, Gjb2, Cldn14, Kcnq4, Pou3f4) across 69 evolutionary and ecologically divergent mammals, we elucidated whether there was evidence for: (i) adaptive evolution acting on these genes across mammals with similar hearing capabilities; and, (ii) regions of long-term evolutionary conservation within which we predict disease-associated mutations should occur. We find evidence of adaptive evolution acting on the eutherian mammals in Myo15, Otog and Tecta. Examination of selection pressures in Tecta and Pou3f4 across a taxonomic sample that included a wide representation of auditory specialists, the bats, did not uncover any evidence for a role in echolocation. We generated ‘conservation indices’ based on selection estimates at nucleotide sites and found that known disease mutations fall within sites of high evolutionary conservation. We suggest that methods such as this, derived from estimates of evolutionary conservation using phylogenetically divergent taxa, will help to differentiate between deleterious and benign mutations.
Affiliation(s)
- John D Kirwan
- UCD School of Biology and Environmental Science & UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
|