1
Riccio C, Jansen ML, Guo L, Ziegler A. Variant effect predictors: a systematic review and practical guide. Hum Genet 2024; 143:625-634. [PMID: 38573379 PMCID: PMC11098935 DOI: 10.1007/s00439-024-02670-5] [Received: 12/13/2023] [Accepted: 03/11/2024] [Indexed: 04/05/2024]
Abstract
Large-scale association analyses using whole-genome sequence data have become feasible, but understanding the functional impacts of these associations remains challenging. Although many tools are available to predict the functional impacts of genetic variants, it is unclear which tool should be used in practice. This work provides a practical guide to assist in selecting appropriate tools for variant annotation. We conducted a MEDLINE search up to November 10, 2023, and included tools that are applicable to a broad range of phenotypes, can be used locally, and have been recently updated. Tools were categorized based on the types of variants they accept and the functional impacts they predict. Sequence Ontology terms were used for standardization. We identified 118 databases and software packages, encompassing 36 variant types and 161 functional impacts. Combining only three tools, namely SnpEff, FAVOR, and SparkINFERNO, allows predicting 99 (61%) distinct functional impacts. Thirty-seven tools predict 89 functional impacts that are not supported by any other tool, while 75 tools predict pathogenicity and can be used within the ACMG/AMP guidelines in a clinical context. We launched a website allowing researchers to select tools based on desired variants and impacts. In summary, more than 100 tools are already available to predict approximately 160 functional impacts. About 60% of the functional impacts can be predicted by the combination of three tools. Unexpectedly, recent tools do not predict more impacts than older ones. Future research should allow predicting the functionality of so far unsupported variant types, such as gene fusions. URL: https://cardio-care.shinyapps.io/VEP_Finder/. Registration: OSF Registries on November 10, 2023, https://osf.io/s2gct.
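The abstract does not state how the three-tool combination (SnpEff, FAVOR, SparkINFERNO) was identified. A minimal sketch of one standard way to search for a small tool set that covers many functional impacts is the greedy maximum-coverage heuristic, shown below under that assumption. The tool names and Sequence Ontology term sets are hypothetical placeholders, not data from the review. Greedy selection is also not guaranteed to find the optimal combination.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical catalogue: tool name -> Sequence Ontology terms it predicts.
# Illustrative placeholders only, not the review's data.
TOOLS: Dict[str, Set[str]] = {
    "tool_A": {"missense_variant", "stop_gained", "splice_region_variant"},
    "tool_B": {"missense_variant", "TF_binding_site_variant"},
    "tool_C": {"regulatory_region_variant", "TF_binding_site_variant", "intron_variant"},
    "tool_D": {"stop_gained"},
}


def greedy_tool_selection(
    tool_impacts: Dict[str, Set[str]], k: int
) -> Tuple[List[str], Set[str]]:
    """Pick up to k tools, each time adding the tool that covers the most
    not-yet-covered impacts (greedy maximum coverage)."""
    chosen: List[str] = []
    covered: Set[str] = set()
    for _ in range(k):
        candidates = sorted(t for t in tool_impacts if t not in chosen)
        if not candidates:
            break
        # max() keeps the first maximal element, so ties go to the
        # alphabetically first tool name.
        best = max(candidates, key=lambda t: len(tool_impacts[t] - covered))
        gain = tool_impacts[best] - covered
        if not gain:
            break  # no remaining tool adds a new impact
        chosen.append(best)
        covered |= gain
    return chosen, covered


all_impacts = set().union(*TOOLS.values())
chosen, covered = greedy_tool_selection(TOOLS, k=2)
print(chosen, f"{len(covered)}/{len(all_impacts)} impacts",
      f"({100 * len(covered) / len(all_impacts):.0f}%)")
```

The last line reports coverage the same way the abstract does: covered impacts divided by all distinct impacts. For example, 99 of 161 rounds to 61%.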
Affiliation(s)
- Cristian Riccio
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 1, Davos Wolfgang, 7265, Davos, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Max L Jansen
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 1, Davos Wolfgang, 7265, Davos, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Linlin Guo
- Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- University Center of Cardiovascular Science & Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Andreas Ziegler
- Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 1, Davos Wolfgang, 7265, Davos, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Center for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- University Center of Cardiovascular Science & Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa
2
Shukla K, Idanwekhai K, Naradikian M, Ting S, Schoenberger SP, Brunk E. Machine Learning of Three-Dimensional Protein Structures to Predict the Functional Impacts of Genome Variation. J Chem Inf Model 2024. [PMID: 38635316 DOI: 10.1021/acs.jcim.3c01967] [Indexed: 04/19/2024]
Abstract
Research in the human genome sciences generates a substantial amount of genetic data for hundreds of thousands of individuals, which concomitantly increases the number of variants of unknown significance (VUS). Bioinformatic analyses can successfully reveal rare variants and variants with clear associations with disease-related phenotypes. These studies have had a significant impact on how clinical genetic screens are interpreted and how patients are stratified for treatment. However, there are few, if any, computational methods that predict how variants affect biological activity. To address this gap, we developed a machine learning method that uses protein three-dimensional structures from AlphaFold to predict how a variant will change a gene's downstream biological pathways. We trained state-of-the-art machine learning classifiers to predict which protein regions are most likely to affect the transcriptional activities of two proto-oncogenes, nuclear factor erythroid 2-related factor 2 (NRF2, encoded by NFE2L2) and c-Myc. We identified classifiers that attain accuracies higher than 80%, which allowed us to identify a set of key protein regions that lead to significant perturbations in c-Myc or NRF2 transcriptional pathway activities.
Affiliation(s)
- Kriti Shukla
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Kelvin Idanwekhai
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Martin Naradikian
- La Jolla Institute for Immunology, San Diego, California 92093, United States
- Stephanie Ting
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Elizabeth Brunk
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Integrative Program for Biological and Genome Sciences (IBGS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
3
Stefanski A, Pérez-Palma E, Brünger T, Montanucci L, Gati C, Klöckner C, Johannesen KM, Goodspeed K, Macnee M, Deng AT, Aledo-Serrano Á, Borovikov A, Kava M, Bouman AM, Hajianpour MJ, Pal DK, Engelen M, Hagebeuk EEO, Shinawi M, Heidlebaugh AR, Oetjens K, Hoffman TL, Striano P, Freed AS, Futtrup L, Balslev T, Abulí A, Danvoye L, Lederer D, Balci T, Nouri MN, Butler E, Drewes S, van Engelen K, Howell KB, Khoury J, May P, Trinidad M, Froelich S, Lemke JR, Tiller J, Freed AN, Kang JQ, Wuster A, Møller RS, Lal D. SLC6A1 variant pathogenicity, molecular function and phenotype: a genetic and clinical analysis. Brain 2023; 146:5198-5208. [PMID: 37647852 PMCID: PMC10689929 DOI: 10.1093/brain/awad292] [Received: 12/16/2022] [Revised: 06/05/2023] [Accepted: 07/08/2023] [Indexed: 09/01/2023]
Abstract
Genetic variants in the SLC6A1 gene can cause a broad phenotypic disease spectrum by altering the protein function. Thus, systematically curated clinically relevant genotype-phenotype associations are needed to understand the disease mechanism and improve therapeutic decision-making. We aggregated genetic and clinical data from 172 individuals with likely pathogenic/pathogenic (lp/p) SLC6A1 variants and functional data for 184 variants (14.1% lp/p). Clinical and functional data were available for a subset of 126 individuals. We explored the potential associations of variant positions on the GAT1 3D structure with variant pathogenicity, altered molecular function and phenotype severity using bioinformatic approaches. The GAT1 transmembrane domains 1, 6 and extracellular loop 4 (EL4) were enriched for patient over population variants. Across functionally tested missense variants (n = 156), spatial proximity to the ligand was associated with loss-of-function in GAT1 transporter activity. For variants with complete loss of in vitro GABA uptake, we found a 4.6-fold enrichment in patients having severe disease versus non-severe disease (P = 2.9 × 10⁻³, 95% confidence interval: 1.5-15.3). In summary, we delineated associations between the 3D structure and variant pathogenicity, variant function and phenotype in SLC6A1-related disorders. This knowledge supports biology-informed variant interpretation and research on GAT1 function. All our data can be interactively explored in the SLC6A1 portal (https://slc6a1-portal.broadinstitute.org/).
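The abstract reports a 4.6-fold enrichment with a P-value and a 95% confidence interval, but it does not say which estimator produced them. The minimal sketch below assumes a 2 × 2 table: complete loss of GABA uptake versus other variants, crossed with severe versus non-severe disease. It computes a sample odds ratio with a Woolf (log-scale) interval and a two-sided Fisher exact P-value. The counts are invented for illustration and do not reproduce the study's figures. The study's wide, asymmetric interval may come from an exact or conditional method rather than Woolf's approximation.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def odds_ratio_woolf(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """Sample odds ratio (a*d)/(b*c) with a Woolf log-scale 95% CI.

    Table layout (hypothetical):
                          severe   non-severe
        complete LoF         a          b
        other variants       c          d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf CI undefined; use an exact method")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - Z_95 * se), math.exp(math.log(or_) + Z_95 * se)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P: total probability of all tables with the
    observed margins that are no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def p_table(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = p_table(a)
    probs = [p_table(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


or_, lo, hi = odds_ratio_woolf(12, 8, 20, 60)
p = fisher_exact_two_sided(12, 8, 20, 60)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), Fisher P = {p:.2g}")
```

The zero-cell guard exists because the Woolf interval needs the logarithm of every cell count. With small or sparse tables like these, an exact interval is usually the safer choice.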
Affiliation(s)
- Arthur Stefanski
- Genomic Medicine Institute and Epilepsy Center, Cleveland Clinic, Cleveland, OH 44195, USA
- Eduardo Pérez-Palma
- Universidad del Desarrollo, Centro de Genética y Genómica, Facultad de Medicina Clínica Alemana, Santiago de Chile 7610658, Chile
- Tobias Brünger
- Cologne Center for Genomics (CCG), Medical Faculty of the University of Cologne, University Hospital of Cologne, Cologne 50931, Germany
- Ludovica Montanucci
- Genomic Medicine Institute and Epilepsy Center, Cleveland Clinic, Cleveland, OH 44195, USA
- Cornelius Gati
- Department of Biological Sciences, Bridge Institute, USC Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA
- Chiara Klöckner
- Institute of Human Genetics, University of Leipzig Medical Center, Leipzig 04103, Germany
- Katrine M Johannesen
- Department of Epilepsy Genetics and Personalized Medicine, The Danish Epilepsy Centre, Dianalund 4293, Denmark
- Department of Genetics, University Hospital of Copenhagen, Rigshospitalet, Copenhagen 2100, Denmark
- Kimberly Goodspeed
- Children’s Health, Medical Center, Dallas, TX 75235, USA
- Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Marie Macnee
- Cologne Center for Genomics (CCG), Medical Faculty of the University of Cologne, University Hospital of Cologne, Cologne 50931, Germany
- Alexander T Deng
- Clinical Genetics, Guy’s and St Thomas’ NHS Trust, London SE19RT, UK
- Ángel Aledo-Serrano
- Epilepsy Program, Neurology Department, Hospital Ruber Internacional, Madrid 28034, Spain
- Artem Borovikov
- Research and Counseling Department, Research Centre for Medical Genetics, Moscow 115478, Russia
- Maina Kava
- Department of Neurology and Metabolic Medicine, Perth Children’s Hospital, Perth 6009, Australia
- School of Paediatrics and Child Health, UWA Medical School, University of Western Australia, Perth 6009, Australia
- Arjan M Bouman
- Department of Clinical Genetics, Erasmus MC, University Medical Center, Rotterdam 3015GD, The Netherlands
- M J Hajianpour
- Department of Pediatrics, Division of Medical Genetics and Genomics, Albany Medical College, Albany Med Health System, Albany, NY 12208, USA
- Deb K Pal
- Department of Basic and Clinical Neurosciences, Institute of Psychiatry, Psychology and Neuroscience, King’s College, London SE58AF, UK
- Department of Basic and Clinical Neurosciences, King’s College Hospital, London SE59RS, UK
- Marc Engelen
- Department of Pediatric Neurology, Amsterdam Public Health, Amsterdam University Medical Center, Amsterdam 1081HV, The Netherlands
- Eveline E O Hagebeuk
- Department of Pediatric Neurology, Stichting Epilepsie Instellingen Nederland (SEIN), Heemstede and Zwolle 2103SW, The Netherlands
- Marwan Shinawi
- Division of Genetics and Genomic Medicine, Department of Pediatrics, St. Louis Children’s Hospital, Washington University School of Medicine, St. Louis, MO 63110, USA
- Kathryn Oetjens
- Autism and Developmental Medicine Institute, Geisinger, Danville, PA 17837, USA
- Trevor L Hoffman
- Department of Regional Genetics, Anaheim, Southern California Kaiser Permanente Medical Group, CA 92806, USA
- Pasquale Striano
- Pediatric Neurology and Muscular Diseases Unit, IRCCS Istituto Giannina Gaslini, Genoa 16147, Italy
- Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health, University of Genoa, Genoa 16132, Italy
- Amanda S Freed
- Department of Clinical Science, Kaiser Permanente Bernard J. Tyson School of Medicine, Pasadena, CA 91101, USA
- Line Futtrup
- Department of Paediatrics, Regional Hospital of Central Jutland, Viborg 8800, Denmark
- Thomas Balslev
- Department of Paediatrics, Regional Hospital of Central Jutland, Viborg 8800, Denmark
- Centre for Educational Development, Aarhus University, Aarhus 8200, Denmark
- Anna Abulí
- Department of Clinical and Molecular Genetics and Medicine Genetics Group, VHIR, University Hospital Vall d’Hebron, Barcelona 08035, Spain
- Leslie Danvoye
- Department of Neurology, Université catholique de Louvain, Cliniques universitaires Saint-Luc, Brussels 1200, Belgium
- Damien Lederer
- Centre for Human Genetics, Institute for Pathology and Genetics, Gosselies 6041, Belgium
- Tugce Balci
- Department of Pediatrics, Division of Medical Genetics, Western University, London, ON N6A3K7, Canada
- Medical Genetics Program of Southwestern Ontario, London Health Sciences Centre and Children's Health Research Institute, London, ON N6A5A5, Canada
- Maryam Nabavi Nouri
- Department of Paediatrics, Division of Pediatric Neurology, London Health Sciences Centre, London, ON N6A5W9, Canada
- Sarah Drewes
- Department of Medical Genetics, UPMC Children’s Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
- Kalene van Engelen
- Medical Genetics Program of Southwestern Ontario, London Health Sciences Centre, London, ON N6A5W9, Canada
- Katherine B Howell
- Department of Neurology, Royal Children’s Hospital, Melbourne, VIC 3052, Australia
- Department of Pediatrics, University of Melbourne, Melbourne, VIC 3052, Australia
- Murdoch Children’s Research Institute, Melbourne, VIC 3052, Australia
- Jean Khoury
- Genomic Medicine Institute and Epilepsy Center, Cleveland Clinic, Cleveland, OH 44195, USA
- Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette 4362, Luxembourg
- Marena Trinidad
- Translational Genomics, BioMarin Pharmaceutical Inc., Novato, CA 94949, USA
- Steven Froelich
- Translational Genomics, BioMarin Pharmaceutical Inc., Novato, CA 94949, USA
- Johannes R Lemke
- Institute of Human Genetics, University of Leipzig Medical Center, Leipzig 04103, Germany
- Center for Rare Diseases, University of Leipzig Medical Center, Leipzig 04103, Germany
- Jing-Qiong Kang
- Department of Neurology, Vanderbilt University Medical Center, Nashville, TN 37240, USA
- Neuroscience Graduate Program, Vanderbilt University, Nashville, TN 37235, USA
- Department of Neurology, Vanderbilt Brain Institute, Nashville, TN 37235, USA
- Department of Pharmacology, Vanderbilt University, Nashville, TN 37232, USA
- Vanderbilt Kennedy Center of Human Development, Nashville, TN 37203, USA
- Arthur Wuster
- Translational Genomics, BioMarin Pharmaceutical Inc., Novato, CA 94949, USA
- Rikke S Møller
- Department of Epilepsy Genetics and Personalized Medicine, The Danish Epilepsy Centre, Dianalund 4293, Denmark
- Department of Regional Health Research, University of Southern Denmark, Odense 5000, Denmark
- Dennis Lal
- Genomic Medicine Institute and Epilepsy Center, Cleveland Clinic, Cleveland, OH 44195, USA
- Stanley Center of Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Department of Neurology, University of Texas Health Sciences Center at Houston, Houston, TX 77030, USA
4
Sierk M, Ratnayake S, Wagle MM, Chen B, Park B, Wang J, Youkharibache P, Meerzaman D. 3DVizSNP: a tool for rapidly visualizing missense mutations identified in high throughput experiments in iCn3D. BMC Bioinformatics 2023; 24:244. [PMID: 37296383 DOI: 10.1186/s12859-023-05370-5] [Received: 04/14/2023] [Accepted: 05/30/2023] [Indexed: 06/12/2023]
Abstract
BACKGROUND High-throughput experiments in cancer and other areas of genomic research identify large numbers of sequence variants that need to be evaluated for phenotypic impact. While many tools exist to score the likely impact of single nucleotide polymorphisms (SNPs) based on sequence alone, the three-dimensional structural environment is essential for understanding the biological impact of a nonsynonymous mutation. RESULTS We present a program, 3DVizSNP, that enables the rapid visualization of nonsynonymous missense mutations extracted from a Variant Call Format (VCF) file using the web-based iCn3D visualization platform. The program, written in Python, leverages REST APIs and can be run locally without installing any other software or databases, or from a webserver hosted by the National Cancer Institute. It automatically selects the appropriate experimental structure from the Protein Data Bank, if available, or otherwise the predicted structure from the AlphaFold database, enabling users to rapidly screen SNPs based on their local structural environment. 3DVizSNP leverages iCn3D annotations and its structural analysis functions to assess changes in structural contacts associated with mutations. CONCLUSIONS This tool enables researchers to efficiently make use of 3D structural information to prioritize mutations for further computational and experimental impact assessment. The program is available as a webserver at https://analysistools.cancer.gov/3dvizsnp or as a standalone Python program at https://github.com/CBIIT-CGBB/3DVizSNP.
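The structure-selection rule described above can be sketched offline: prefer an experimental PDB structure when one is available, otherwise use the AlphaFold model. This is not 3DVizSNP's code. It assumes variants arrive in a VCF annotated with a SnpEff-style `ANN` INFO field. The `Structure` records, the `pdb_hit_*`/`af_model` identifiers and their residue ranges are hypothetical stand-ins for what the tool obtains through REST APIs. Choosing the best-resolution covering structure is likewise an assumed tie-breaker.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Structure:
    source: str                 # "PDB" (experimental) or "AlphaFold" (predicted)
    identifier: str             # hypothetical identifiers in the example below
    start: int                  # first protein residue covered by the model
    end: int                    # last protein residue covered by the model
    resolution: Optional[float] = None  # in Å; None for predicted models


def missense_from_ann(info: str) -> List[Tuple[str, str]]:
    """Return (gene, HGVS.p) pairs for missense records in a SnpEff-style
    ANN field: '|'-separated, gene name at index 3, HGVS.p at index 10."""
    hits = []
    for field in info.split(";"):
        if not field.startswith("ANN="):
            continue
        for record in field[len("ANN="):].split(","):
            parts = record.split("|")
            if len(parts) > 10 and "missense_variant" in parts[1].split("&"):
                hits.append((parts[3], parts[10]))
    return hits


def residue_number(hgvs_p: str) -> int:
    """Extract the residue position from a protein change such as 'p.Arg175His'."""
    match = re.search(r"\d+", hgvs_p)
    if match is None:
        raise ValueError(f"no residue number in {hgvs_p!r}")
    return int(match.group())


def select_structure(residue: int, pdb_hits: List[Structure],
                     predicted: Optional[Structure]) -> Optional[Structure]:
    """Prefer an experimental structure covering the residue (best
    resolution first); otherwise fall back to the predicted model."""
    covering = [s for s in pdb_hits if s.start <= residue <= s.end]
    if covering:
        return min(covering, key=lambda s: (
            s.resolution if s.resolution is not None else float("inf"), s.identifier))
    if predicted is not None and predicted.start <= residue <= predicted.end:
        return predicted
    return None


info = ("DP=30;ANN=T|missense_variant|MODERATE|TP53|ENSG00000141510|transcript|"
        "ENST00000269305|protein_coding|5/11|c.524G>A|p.Arg175His|||175/393||")
pdb_hits = [Structure("PDB", "pdb_hit_1", 94, 312, 2.1),
            Structure("PDB", "pdb_hit_2", 1, 60, 1.8)]
af_model = Structure("AlphaFold", "af_model", 1, 393)

for gene, hgvs in missense_from_ann(info):
    chosen = select_structure(residue_number(hgvs), pdb_hits, af_model)
    label = f"{chosen.source}:{chosen.identifier}" if chosen else "no structure"
    print(f"{gene} {hgvs} -> {label}")
```

Returning `None` when nothing covers the residue keeps the caller responsible for deciding whether to skip or flag such variants.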
Affiliation(s)
- Michael Sierk
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD, 20852, USA
- Shashikala Ratnayake
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD, 20852, USA
- Manoj M Wagle
- Faculty of Pharmacy, University of Grenoble Alpes, Grenoble, France
- Department of Bioinformatics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal, 576104, India
- School of Mathematics and Statistics, Faculty of Science, and Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
- Ben Chen
- Digital Services and Solutions Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD, 20852, USA
- Brian Park
- Digital Services and Solutions Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD, 20852, USA
- Jiyao Wang
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA
- Philippe Youkharibache
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, 20892, USA
- Daoud Meerzaman
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD, 20852, USA
5
Leiva S, Bugnon Valdano M, Gardiol D. Unravelling the epidemiological diversity of Zika virus by analyzing key protein variations. Arch Virol 2023; 168:115. [PMID: 36943525 DOI: 10.1007/s00705-023-05726-5] [Received: 09/09/2022] [Accepted: 01/19/2023] [Indexed: 03/23/2023]
Abstract
The consequences of Zika virus (ZIKV) infection were limited to sporadic mild disease until almost a decade ago, when epidemic outbreaks occurred and the virus spread rapidly through the Americas. At the same time, novel severe neurological manifestations of ZIKV infection were identified, including congenital microcephaly. However, why the epidemic strains behave differently is not yet completely understood, and many questions remain about the actual significance of genetic variation in the epidemiology and biology of ZIKV. In this study, we analysed a large number of viral sequences to identify genes with different levels of variability and patterns of genomic variation that could be associated with ZIKV diversity. We compared numerous epidemic strains with pre-epidemic strains using the BWA-MEM algorithm, and we also examined specific variations among epidemic ZIKV strains derived from microcephaly cases. We identified several viral genes with dissimilar mutation rates among the ZIKV strain groups, as well as novel protein variation profiles that might be associated with epidemiological particularities. Finally, we assessed the impact of the detected changes on the structure and stability of the NS1, NS5, and E proteins using the I-TASSER, trRosetta, and RaptorX modelling algorithms, and we found some interesting variations that might help to explain the heterogeneous features of the diverse ZIKV strains. This work contributes to the identification of genetic differences in the ZIKV genome that might have a phenotypic impact, providing a basis for future experimental analysis to elucidate the genetic causes of the recent ZIKV emergency.
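The abstract does not describe how per-gene variability was quantified. One crude descriptive proxy assumes the genomes are already aligned to a common coordinate system, for example after read mapping. It measures the fraction of variable alignment columns within each gene for each strain group. The gene names, coordinates, and sequences below are hypothetical toy values rather than real ZIKV data. This proxy is not a substitution rate, which would need a phylogenetic or time-calibrated model.

```python
from typing import Dict, List, Tuple

IGNORED = set("-N")  # gaps and ambiguous bases are not counted as alleles


def variable_site_fraction(seqs: List[str], start: int, end: int) -> float:
    """Fraction of alignment columns in [start, end) where the sequences
    carry more than one distinct base."""
    if end <= start:
        raise ValueError("empty region")
    variable = 0
    for i in range(start, end):
        bases = {s[i] for s in seqs if s[i] not in IGNORED}
        if len(bases) > 1:
            variable += 1
    return variable / (end - start)


def per_gene_variability(groups: Dict[str, List[str]],
                         genes: Dict[str, Tuple[int, int]]) -> Dict[str, Dict[str, float]]:
    """Variable-site fraction for every (strain group, gene) pair."""
    return {
        group: {gene: variable_site_fraction(seqs, s, e) for gene, (s, e) in genes.items()}
        for group, seqs in groups.items()
    }


# Toy aligned sequences and 0-based, end-exclusive gene coordinates (hypothetical).
GENES = {"geneX": (0, 6), "geneY": (6, 12)}
GROUPS = {
    "pre_epidemic": ["ACGTACGTACGT", "ACGAACGTACGT", "ACGTACGTACGT"],
    "epidemic": ["ACGTACGTTCGA", "ACGTACGAACGT", "ACGTACGTACGT"],
}

for group, fractions in per_gene_variability(GROUPS, GENES).items():
    print(group, {g: round(f, 3) for g, f in fractions.items()})
```

Normalizing by gene length makes genes of different sizes comparable. Group sizes still matter, though, because larger groups tend to reveal more variable sites.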
Affiliation(s)
- Santiago Leiva
- Facultad de Ciencias Bioquímicas y Farmacéuticas, Instituto de Biología Molecular y Celular de Rosario-CONICET, Universidad Nacional de Rosario, Suipacha 531, 2000, Rosario, Argentina
- Marina Bugnon Valdano
- Facultad de Ciencias Bioquímicas y Farmacéuticas, Instituto de Biología Molecular y Celular de Rosario-CONICET, Universidad Nacional de Rosario, Suipacha 531, 2000, Rosario, Argentina
- Daniela Gardiol
- Facultad de Ciencias Bioquímicas y Farmacéuticas, Instituto de Biología Molecular y Celular de Rosario-CONICET, Universidad Nacional de Rosario, Suipacha 531, 2000, Rosario, Argentina
6
Oluwole OG, Henry M. Genomic medicine in Africa: a need for molecular genetics and pharmacogenomics experts. Curr Med Res Opin 2023; 39:141-147. [PMID: 36094413 DOI: 10.1080/03007995.2022.2124072] [Indexed: 01/03/2023]
Abstract
The large-scale implementation of genomic medicine in Africa has not yet been realized. This overview describes how routine molecular genetics and advanced protein engineering/structural biotechnology could accelerate the implementation of genomic medicine. Using data-mining and analysis approaches, we analyzed relevant information obtained from public genomic databases on pharmacogenomic biomarkers and reviewed published studies to discuss these ideas. The results showed that only 68 very important pharmacogenes (VIPs) are currently listed, while 867 drug label annotations, 201 curated functional pathways, and 746 annotated drugs have been catalogued in the largest pharmacogenomics database (PharmGKB). Only 5009 of the ∼25,000 reported variants have been clinically annotated. The genetic variants were predominantly derived from 43 genes that contribute to 2318 clinically relevant variations in 57 diseases. The majority (∼60%; 1390) of the clinically relevant genetic variations in the pharmacogenes are missense variants. The enrichment analysis showed that 15 pharmacogenes are biologically connected and are involved in the metabolism of cardiovascular and cancer drugs. The review of studies showed that cardiovascular diseases are the most frequent non-communicable diseases, responsible for approximately 13% of all deaths in Africa. In addition, warfarin is the most studied drug in pharmacogenomic research on the continent, while CYP2D6, CYP2C9, DPD, and TPMT are the most investigated pharmacogenes, with allele activities reported in Africans; 8.4% (DPD) and 11% (TPMT) were considered intermediate metabolisers. In summary, we outline a framework for implementing genomic medicine, starting from the resources already available on the ground.
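The two proportions quoted above can be checked directly from the reported counts. The ∼25,000 total is itself approximate, so the second ratio is only indicative:

```latex
\frac{1390}{2318} \approx 0.600 \;(\approx 60\%), \qquad
\frac{5009}{25\,000} \approx 0.200 \;(\approx 20\%)
```

In other words, roughly one in five reported pharmacogenomic variants carries a clinical annotation.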
Affiliation(s)
- Oluwafemi G Oluwole
- Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa
- Marc Henry
- Medical Biotechnology and Immunotherapy Unit, Department of Integrative Biomedical Sciences Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
7
Babbi G, Savojardo C, Baldazzi D, Martelli PL, Casadio R. Pathogenic variation types in human genes relate to diseases through Pfam and InterPro mapping. Front Mol Biosci 2022; 9:966927. [PMID: 36188216 PMCID: PMC9523224 DOI: 10.3389/fmolb.2022.966927] [Received: 06/11/2022] [Accepted: 08/31/2022] [Indexed: 11/13/2022]
Abstract
Grouping residue variations in a protein according to their physicochemical properties reduces the dimensionality of all the possible substitutions in a variant with respect to the wild type. Here, using a large dataset of proteins with disease-related and benign variations, derived by merging Humsavar and ClinVar data, we investigate to what extent our physicochemical grouping procedure can help determine whether patterns of variation types are related to specific groups of diseases and whether they occur in Pfam and/or InterPro gene domains. We downloaded 75,145 germline disease-related and benign variations of 3,605 genes, grouped them according to physicochemical categories and mapped them onto Pfam and InterPro gene domains. Statistically validated analysis indicates that each cluster of genes associated with a Mondo anatomical system category is characterized by a specific variation pattern. These patterns identify specific associations between Pfam and InterPro domains and Mondo categories. Our data suggest that the association of variation patterns with Mondo categories is unique and may help in associating gene variants with genetic diseases. This work corroborates, in a much larger dataset, previous observations from our group.
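The sketch below illustrates the dimensionality reduction described above. It assumes a six-class physicochemical scheme (hydrophobic, aromatic, polar, positive, negative, special) and maps each wild-type-to-mutant substitution to a class pair. These class assignments are an assumption for illustration, not necessarily the grouping used by the authors. Under this scheme, the 380 possible amino-acid substitutions (20 × 19) collapse to 36 class pairs, which can then be counted per domain. The domain labels and variants in the example are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, Tuple

# Assumed six-class physicochemical scheme (illustrative only).
CLASSES: Dict[str, str] = {
    **dict.fromkeys("AVLIM", "hydrophobic"),
    **dict.fromkeys("FWY", "aromatic"),
    **dict.fromkeys("STNQC", "polar"),
    **dict.fromkeys("KRH", "positive"),
    **dict.fromkeys("DE", "negative"),
    **dict.fromkeys("GP", "special"),
}


def substitution_class(wt: str, mut: str) -> str:
    """Map a one-letter substitution such as R->H to a class pair."""
    if wt == mut:
        raise ValueError("not a substitution")
    return f"{CLASSES[wt]}>{CLASSES[mut]}"


def pattern_by_domain(variants: Iterable[Tuple[str, str, str]]) -> Dict[str, Counter]:
    """Count class-pair substitutions per (hypothetical) domain label."""
    patterns: Dict[str, Counter] = {}
    for domain, wt, mut in variants:
        patterns.setdefault(domain, Counter())[substitution_class(wt, mut)] += 1
    return patterns


all_pairs = list(permutations(CLASSES, 2))  # ordered (wild type, mutant) pairs
reduced = {substitution_class(w, m) for w, m in all_pairs}
print(len(all_pairs), "substitutions ->", len(reduced), "class pairs")

example = [("domain_1", "R", "H"), ("domain_1", "R", "C"),
           ("domain_1", "K", "R"), ("domain_2", "G", "D")]
print(pattern_by_domain(example))
```

Ordered class pairs keep the direction of change (for example, positive to polar differs from polar to positive). That information would be lost by an unordered grouping.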
Affiliation(s)
- Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- *Correspondence: Pier Luigi Martelli; Rita Casadio
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
8
Tichkule S, Myung Y, Naung MT, Ansell BRE, Guy AJ, Srivastava N, Mehra S, Cacciò SM, Mueller I, Barry AE, van Oosterhout C, Pope B, Ascher DB, Jex AR. VIVID: a web application for variant interpretation and visualisation in multidimensional analyses. Mol Biol Evol 2022; 39:6697981. [PMID: 36103257 PMCID: PMC9514033 DOI: 10.1093/molbev/msac196] [Indexed: 11/13/2022]
Abstract
Large-scale comparative genomics and population genetics studies generate enormous amounts of polymorphism data in the form of DNA variants. Ultimately, the goal of many of these studies is to associate genetic variants with phenotypes or fitness. We introduce VIVID, an interactive, user-friendly web application that integrates a wide range of approaches for relating genotypic to phenotypic information in three-dimensional (3D) space, for any organism or disease and for individuals or populations. It allows mutation mapping and annotation, calculation of interactions and conservation scores, prediction of harmful effects, analysis of diversity and selection, and 3D visualization of genotypic information encoded in Variant Call Format on AlphaFold2 protein models. VIVID enables the rapid assessment of genes of interest in the study of adaptive evolution and genetic load, and it helps prioritize targets for experimental validation. We demonstrate the utility of VIVID by exploring the evolutionary genetics of the parasitic protist Plasmodium falciparum, revealing geographic variation in the signature of balancing selection in potential targets of functional antibodies.
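VIVID's abstract lists conservation scores among its calculations without specifying the measure. This sketch assumes one widely used option: the Shannon entropy of each column of a multiple sequence alignment. A value of 0 bits means the column is fully conserved, and higher values mean more variability. The alignment below is a toy example, and ignoring gaps is a design choice of this sketch rather than VIVID's documented behaviour.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column,
    ignoring gaps; returns NaN for an all-gap column."""
    residues = [r for r in column if r != gap]
    if not residues:
        return math.nan
    n = len(residues)
    # Sum of p * log2(1/p) over the residue types observed in the column.
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())


def entropy_profile(alignment: List[str]) -> List[float]:
    """Per-column entropies for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


alignment = ["MKVL", "MKIL", "MRVF", "MKAW"]
print([round(h, 3) for h in entropy_profile(alignment)])
```

Writing each term as p·log2(1/p) keeps a fully conserved column at +0.0 rather than -0.0. Returning NaN for all-gap columns avoids reporting them as perfectly conserved.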
Affiliation(s)
- Swapnil Tichkule
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research , Melbourne , Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Australia
- Yoochan Myung
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes, Melbourne, Australia
- Myo T Naung
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Department of Medical Biology, University of Melbourne, Melbourne, Australia
- Brendan R E Ansell
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Andrew J Guy
- School of Science, RMIT University, Melbourne, Australia
- Namrata Srivastava
- Department of Data Science and AI, Monash University, Melbourne, Australia
- Somya Mehra
- Life Sciences Discipline, Burnet Institute, Melbourne, Australia
- Simone M Cacciò
- Department of Infectious Disease, Istituto Superiore di Sanità, Rome, Italy
- Ivo Mueller
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Alyssa E Barry
- Life Sciences Discipline, Burnet Institute, Melbourne, Australia
- Institute of Mental and Physical Health and Clinical Translation (IMPACT) and School of Medicine, Deakin University, Geelong, Australia
- Cock van Oosterhout
- School of Environmental Sciences, University of East Anglia, Norwich Research Park, Norwich, UK
- Bernard Pope
- Melbourne Bioinformatics, University of Melbourne, Melbourne, Australia
- Australian BioCommons, Sydney, Australia
- Department of Clinical Pathology, University of Melbourne, Melbourne, Australia
- Department of Surgery (Royal Melbourne Hospital), University of Melbourne, Melbourne, Australia
- David B Ascher
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes, Melbourne, Australia
- Aaron R Jex
- Population Health and Immunity, Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
- Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Melbourne, Australia
9
Mintoff D, Pace NP, Borg I. Interpreting the spectrum of gamma-secretase complex missense variation in the context of hidradenitis suppurativa—An in-silico study. Front Genet 2022; 13:962449. [PMID: 36118898 PMCID: PMC9478468 DOI: 10.3389/fgene.2022.962449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/08/2022] [Indexed: 11/23/2022]
Abstract
Hidradenitis suppurativa (HS) is a disease of the pilosebaceous unit characterized by recurrent nodules, abscesses and draining tunnels, with a predilection for intertriginous skin. The pathophysiology of HS is complex; however, inflammation and hyperkeratinization at the hair follicle are known to play crucial roles in disease manifestation. Genetic and environmental factors are considered the main drivers of these two pathophysiological processes. Although a considerable proportion of patients have a positive family history of the disease, only a minority of patients with HS have been found to harbor monogenic variants that segregate with disease in affected kindreds. Most of these variants are in the γ-secretase complex (GSC) protein-coding genes. In this manuscript, we set out to characterize the burden of pathogenic missense variants in a healthy reference population using a large-scale genomic dataset, thereby providing a standard against which genomic variation in GSC protein-coding genes in the HS patient cohort can be compared.
Affiliation(s)
- Dillon Mintoff
- Department of Pathology, Faculty of Medicine and Surgery, University of Malta, Msida, Malta
- Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta
- Nikolai P. Pace
- Centre for Molecular Biology and Biobanking, University of Malta, Msida, Malta
- Department of Anatomy, Faculty of Medicine and Surgery, University of Malta, Msida, Malta
- *Correspondence: Nikolai P. Pace
- Isabella Borg
- Department of Pathology, Faculty of Medicine and Surgery, University of Malta, Msida, Malta
- Centre for Molecular Biology and Biobanking, University of Malta, Msida, Malta
- Department of Pathology, Mater Dei Hospital, Msida, Malta
10
Kim M, Huffman JE, Justice A, Goethert I, Agasthya G, Danciu I. Identifying intragenic functional modules of genomic variations associated with cancer phenotypes by learning representation of association networks. BMC Med Genomics 2022; 15:151. [PMID: 35794577 PMCID: PMC9258200 DOI: 10.1186/s12920-022-01298-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2022] [Accepted: 06/14/2022] [Indexed: 11/27/2022]
Abstract
BACKGROUND Genome-wide association studies (GWAS) aim to uncover links between genomic variation and phenotype. They have been actively applied in cancer biology to investigate associations between variations and cancer phenotypes, such as susceptibility to certain types of cancer and predisposed responsiveness to specific treatments. Because GWAS primarily focus on associations between individual genomic variations and cancer phenotypes, they are limited in explaining the mechanisms by which more than one genomic variation cooperatively affects cancer phenotypes. RESULTS This paper proposes a network representation learning approach to learn associations among genomic variations using a prostate cancer cohort. The learned associations are encoded into representations that can be used to identify functional modules of genomic variations within genes associated with early- and late-onset prostate cancer. The proposed method was applied to a prostate cancer cohort provided by the Veterans Administration's Million Veteran Program to identify candidate functional modules associated with early-onset prostate cancer. The cohort included 33,159 prostate cancer patients: 3181 early-onset and 29,978 late-onset patients. Reproducibility analyses showed that the proposed approach improves the robustness of model performance. CONCLUSIONS To our knowledge, this is the first attempt to use a network representation learning approach to learn associations among genomic variations within genes. Associations learned in this way can lead to an understanding of the underlying mechanisms by which genomic variations cooperatively affect each cancer phenotype. This method can reveal new knowledge in cancer biology and can be used to design more advanced cancer-targeted therapies.
Affiliation(s)
- Minsu Kim
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jennifer E. Huffman
- Center for Population Genomics, MAVERIC, VA Boston Healthcare System, Jamaica Plain, MA, USA
- Massachusetts Veterans Epidemiology Research and Information Center, Veterans Affairs Boston Healthcare System, Boston, MA, USA
- Amy Justice
- Department of Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
- Yale School of Medicine, New Haven, CT, USA
- Ian Goethert
- Information Technology Services Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Greeshma Agasthya
- Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Ioana Danciu
- Advanced Computing for Health Sciences Group, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
11
Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure. Nat Commun 2022; 13:3895. [PMID: 35794153 PMCID: PMC9259657 DOI: 10.1038/s41467-022-31686-6] [Citation(s) in RCA: 62] [Impact Index Per Article: 31.0] [Received: 11/02/2021] [Accepted: 06/29/2022] [Indexed: 12/12/2022]
Abstract
Most known pathogenic mutations occur in protein-coding regions of DNA and change the way proteins are made. Taking protein structure into account has therefore provided great insight into the molecular mechanisms underlying human genetic disease. While there has been much focus on how mutations can disrupt protein structure and thus cause a loss of function (LOF), alternative mechanisms, specifically dominant-negative (DN) and gain-of-function (GOF) effects, are less understood. Here, we investigate the protein-level effects of pathogenic missense mutations associated with different molecular mechanisms. We observe striking differences between recessive and dominant mutations, and between LOF and non-LOF mutations, with dominant, non-LOF disease mutations having much milder effects on protein structure, and DN mutations being highly enriched at protein interfaces. We also find that nearly all computational variant effect predictors, even those based solely on sequence conservation, underperform on non-LOF mutations. However, we do show that non-LOF mutations could potentially be identified by their tendency to cluster in three-dimensional space. Overall, our work suggests that many pathogenic mutations that act via DN and GOF mechanisms are likely being missed by current variant prioritisation strategies, but that there is considerable scope to improve computational predictions through consideration of molecular disease mechanisms. Here the authors analyse the locations of thousands of human disease mutations and their predicted effects on protein structure and show that, while loss-of-function mutations tend to be highly disruptive, non-loss-of-function mutations are in general much milder at a protein structural level.
12
Veatch OJ, Mazzotti DR, Schultz RT, Abel T, Michaelson JJ, Brodkin ES, Tunc B, Assouline SG, Nickl-Jockschat T, Malow BA, Sutcliffe JS, Pack AI. Calculating genetic risk for dysfunction in pleiotropic biological processes using whole exome sequencing data. J Neurodev Disord 2022; 14:39. [PMID: 35751013 PMCID: PMC9233372 DOI: 10.1186/s11689-022-09448-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2021] [Accepted: 06/08/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Numerous genes are implicated in autism spectrum disorder (ASD). ASD encompasses a wide range of symptoms of varying severity, as well as co-occurring conditions; however, how genetic variation contributes to phenotypic differences is unclear. This creates a challenge for translating genetic evidence into clinically useful knowledge. Sleep disturbances are particularly prevalent co-occurring conditions in ASD, and genetics may inform treatment. Identifying convergent mechanisms with evidence for dysfunction that connect ASD and sleep biology could help identify better treatments for sleep disturbances in these individuals. METHODS To identify mechanisms that influence risk for ASD and co-occurring sleep disturbances, we analyzed whole exome sequence data from individuals in the Simons Simplex Collection (n = 2380). We predicted protein damaging variants (PDVs) in genes currently implicated in either ASD or sleep duration in typically developing children. We predicted a network of ASD-related proteins with direct evidence for interaction with sleep duration-related proteins encoded by genes with PDVs. Overrepresentation analyses of Gene Ontology-defined biological processes were conducted on the resulting gene set. We calculated the likelihood of dysfunction in the top overrepresented biological process. We then tested whether scores reflecting genetic dysfunction in the process were associated with parent-reported sleep duration. RESULTS There were 29 genes with PDVs in the ASD dataset in which variation was reported in the literature to be associated with both ASD and sleep duration. A network of 108 proteins encoded by ASD and sleep duration candidate genes with PDVs was identified. The mechanism overrepresented in PDV-containing genes that encode proteins in the interaction network with the most evidence for dysfunction was cerebral cortex development (GO:0021987).
Scores reflecting dysfunction in this process were associated with sleep duration; the largest effects were observed in adolescents (p = 4.65 × 10⁻³). CONCLUSIONS Our bioinformatics-driven approach detected a biological process enriched for genes encoding a protein-protein interaction network that links ASD gene products with sleep duration gene products, in which the accumulation of potentially damaging variants in individuals with ASD was associated with sleep duration as reported by the parents. Specifically, genetic dysfunction affecting development of the cerebral cortex may affect sleep by disrupting sleep homeostasis, which evidence suggests is regulated by this brain region. Future functional assessments and objective measurements of sleep in adolescents with ASD could provide the basis for more informed treatment of sleep problems in these individuals.
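The abstract does not name the statistic behind its Gene Ontology overrepresentation analyses. A common choice for this kind of term enrichment is a one-sided hypergeometric test, so the sketch below assumes that test; the function names and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, annotated, drawn).

    population: genes in the background set
    annotated:  background genes annotated to the GO term
    drawn:      genes in the study set (e.g. a network gene list)
    k:          study-set genes annotated to the GO term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def fold_enrichment(k: int, population: int, annotated: int, drawn: int) -> float:
    """Fraction of study-set genes on the term divided by the background fraction."""
    return (k / drawn) / (annotated / population)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the study's data.
    p = hypergeom_upper_tail(k=10, population=20000, annotated=300, drawn=108)
    fe = fold_enrichment(k=10, population=20000, annotated=300, drawn=108)
    print(f"p = {p:.3g}, fold enrichment = {fe:.2f}")
```

When many GO terms are tested at once, the raw p-values would normally also be corrected for multiple testing, which this sketch does not do.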
Affiliation(s)
- Olivia J Veatch
- Department of Psychiatry and Behavioral Sciences, Medical Center, University of Kansas, Kansas City, KS, USA
- Diego R Mazzotti
- Division of Medical Informatics, Department of Internal Medicine, Medical Center, University of Kansas, Kansas City, KS, USA
- Robert T Schultz
- Center for Autism Research, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Ted Abel
- Department of Neuroscience and Pharmacology, Iowa Neuroscience Institute, University of Iowa, Iowa City, Iowa, USA
- Edward S Brodkin
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Birkan Tunc
- Center for Autism Research, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Susan G Assouline
- Belin-Blank Center for Gifted Education and Talent Development, University of Iowa, Iowa City, Iowa, USA
- Beth A Malow
- Division of Sleep Medicine, Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, USA
- James S Sutcliffe
- Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Allan I Pack
- Division of Sleep Medicine, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
13
Lal M, Bhardwaj E, Chahar N, Yadav S, Das S. Comprehensive analysis of 1R- and 2R-MYBs reveals novel genic and protein features, complex organisation, selective expansion and insights into evolutionary tendencies. Funct Integr Genomics 2022; 22:371-405. [PMID: 35260976 DOI: 10.1007/s10142-022-00836-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/05/2021] [Revised: 02/10/2022] [Accepted: 02/23/2022] [Indexed: 11/28/2022]
Abstract
The myeloblastosis (MYB) family, the largest plant transcription factor family, has been subcategorised based on the number and type of repeats in the MYB domain. Despite several reports, the evolution of MYB genes and repeats remains enigmatic. Brassicaceae members are endowed with complex genomes, including dysploidy, because of the family's unique history of multiple rounds of polyploidisation, genomic fractionation and rearrangement. The present study is an attempt to gain insights into the complexities of MYB family diversity, understand the impacts of genome evolution on gene families and develop an evolutionary framework to understand the origin of the various subcategories of the MYB gene family. We identified and analysed 1129 MYBs, including 1R-, 2R-, 3R- and atypical MYBs, across sixteen species representing protists, fungi, animals and plants, excluding Brassicaceae MYBs other than those of Arabidopsis thaliana; in addition, a total of 1137 2R-MYB genes from six Brassicaceae species were analysed. Comparative analysis revealed a predominance of 1R-MYBs in protists, fungi, animals and lower plants. Phylogenetic reconstruction and analysis of selection pressure suggested the ancestral nature of 1R-MYBs containing an R1-type repeat, which might have undergone intragenic duplication to form multi-repeat MYBs. Distinct differences in gene structure between 1R- and 2R-MYBs were observed in intron number, the ratio of gene length to coding DNA sequence (CDS) length and the length of exons encoding the MYB domain. Conserved as well as novel and lineage-specific intron phases were identified. Analyses of physicochemical properties revealed drastic differences indicating functional diversification in MYBs. Phylogenetic reconstruction of 1R- and 2R-MYB genes revealed a shared structure-function relationship within clades, which was supported by in silico analysis of transcriptome data.
Comparative genomic analysis of the distribution and mapping of 2R-MYBs revealed congruency and a greater degree of synteny and collinearity among closely related species. Micro-synteny analysis of genomic segments revealed high conservation of the genes immediately flanking tandemly organised 2R-MYBs, along with instances of local duplication, reorganisation and genome fractionation. In summary, polyploidy, dysploidy, reshuffling and genome fractionation were found to cause loss or gain of 2R-MYB genes. These findings need to be supported by functional validation to understand the gene structure-function relationship along the evolutionary lineage and adaptive strategies based on comparative functional genomics in plants.
Affiliation(s)
- Mukund Lal
- Department of Botany, University of Delhi, Delhi, 110007, India
- Ekta Bhardwaj
- Department of Botany, University of Delhi, Delhi, 110007, India
- Nishu Chahar
- Department of Botany, University of Delhi, Delhi, 110007, India
- Shobha Yadav
- Department of Botany, University of Delhi, Delhi, 110007, India
- Sandip Das
- Department of Botany, University of Delhi, Delhi, 110007, India
14
Abstract
Three-dimensional protein structural data at the molecular level are pivotal for successful precision medicine. Such data are crucial not only for discovering drugs that block the active site of the target mutant protein but also for clarifying to the patient and the clinician how the patient's mutations work. The relative paucity of structural data reflects their cost, challenges in their interpretation, and a lack of clinical guidelines for their use. Rapid technological advances in experimental high-resolution structure determination are generating increasing numbers of structures. Computationally, modeling algorithms, including molecular dynamics simulations, are becoming more powerful, as is compute-intensive hardware, particularly graphics processing units, coinciding with the inception of the exascale era. Accessible, freely available, and detailed structural and dynamical data can be merged with big data to powerfully transform personalized pharmacology. Here we review protein and emerging genome high-resolution data, along with the means, applications, and examples underscoring their usefulness in precision medicine. Expected final online publication date for the Annual Review of Biomedical Data Science, Volume 5 is August 2022.
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Guy Nir
- Department of Biochemistry and Molecular Biology, Department of Neuroscience, Cell Biology and Anatomy, and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, Texas, USA
- Chung-Jung Tsai
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research in the Laboratory of Cancer Immunometabolism, National Cancer Institute, Frederick, Maryland, USA
- Feixiong Cheng
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, USA
- Department of Molecular Medicine, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
- Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
15
SWAAT Bioinformatics Workflow for Protein Structure-Based Annotation of ADME Gene Variants. J Pers Med 2022; 12:jpm12020263. [PMID: 35207751 PMCID: PMC8875676 DOI: 10.3390/jpm12020263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2021] [Revised: 01/26/2022] [Accepted: 02/01/2022] [Indexed: 02/01/2023]
Abstract
Recent genomic studies have revealed the critical impact of genetic diversity within small population groups in determining how individuals respond to drugs. One of the biggest challenges is to accurately predict the effect of single nucleotide variants and to obtain the information needed for a better functional interpretation of genetic data. Conformational changes caused by amino acid substitutions in pharmacologically important proteins might affect their stability and plasticity, which in turn might alter their interaction with drugs. Current sequence-based annotation methods have limited power to access this type of information. To address this limitation, we have developed the Structural Workflow for Annotating ADME Targets (SWAAT), which predicts variant effects based on structural properties. SWAAT annotates a panel of 36 ADME genes, including 22 of the 23 clinically important members identified by the PharmVar consortium. The workflow consists of a set of Python scripts, whose execution is managed by Nextflow, that annotate coding variants based on 37 criteria. SWAAT also includes an auxiliary workflow that allows it to be used for genes other than ADME members. Our tool also includes a random forest binary classifier that showed an accuracy of 73%. Moreover, SWAAT outperformed six commonly used sequence-based variant prediction tools (PROVEAN, SIFT, PolyPhen-2, CADD, MetaSVM, and FATHMM) in terms of sensitivity, with comparable specificity. SWAAT is available as an open-source tool.
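The abstract compares tools by accuracy, sensitivity and specificity of binary (damaging vs. neutral) calls. As a reminder of how these three figures relate, the sketch below computes them from a confusion matrix; the `Confusion` type, the helper names and the example labels are hypothetical and do not reproduce SWAAT's own evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # damaging variants predicted damaging
    fp: int  # neutral variants predicted damaging
    tn: int  # neutral variants predicted neutral
    fn: int  # damaging variants predicted neutral


def confusion_from_labels(y_true: list[bool], y_pred: list[bool]) -> Confusion:
    """Count outcomes, treating True as 'damaging' (the positive class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have equal length")
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(t and p for t, p in pairs),
        fp=sum((not t) and p for t, p in pairs),
        tn=sum((not t) and (not p) for t, p in pairs),
        fn=sum(t and (not p) for t, p in pairs),
    )


def sensitivity(c: Confusion) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: Confusion) -> float:
    """Fraction of all calls that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


if __name__ == "__main__":
    # Hypothetical truth labels and predictions for eight variants.
    truth = [True, True, True, True, False, False, False, False]
    calls = [True, True, True, False, True, False, False, False]
    c = confusion_from_labels(truth, calls)
    print(c, sensitivity(c), specificity(c), accuracy(c))
```

A predictor can trade specificity for sensitivity by lowering its decision threshold, which is why the two are usually reported together rather than accuracy alone.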
16
Fellner A, Goldberg Y, Lev D, Basel-Salmon L, Shor O, Benninger F. In-silico phenotype prediction by normal mode variant analysis in TUBB4A-related disease. Sci Rep 2022; 12:58. [PMID: 34997144 PMCID: PMC8741991 DOI: 10.1038/s41598-021-04337-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/18/2021] [Accepted: 12/21/2021] [Indexed: 11/09/2022]
Abstract
TUBB4A-associated disorder is a rare condition affecting the central nervous system. It displays a wide phenotypic spectrum, ranging from isolated late-onset torsion dystonia to a severe early-onset disease with developmental delay, neurological deficits, and atrophy of the basal ganglia and cerebellum, which complicates variant interpretation and phenotype prediction in patients carrying TUBB4A variants. We applied entropy-based normal mode analysis (NMA) to investigate genotype–phenotype correlations in TUBB4A-related disease and to develop an in-silico approach to assist in variant interpretation and phenotype prediction in this disorder. The analysis included variants reported before data collection for this study concluded in October 2019: all TUBB4A pathogenic missense variants reported in ClinVar and PubMed for which associated clinical information was available, and all benign/likely benign TUBB4A missense variants reported in ClinVar. Pathogenic variants were divided into five phenotypic subgroups. In-silico point mutagenesis of the modeled wild-type protein structure was performed for each variant. Wild-type and mutated structures were analyzed by coarse-grained NMA to quantify protein stability as an entropy difference value (ΔG) for each variant. Pairwise ΔG differences between all variant pairs in each structural cluster were calculated and clustered into dendrograms. Our search yielded 41 TUBB4A pathogenic variants in 126 patients, divided into 11 partially overlapping structural clusters across the TUBB4A protein. ΔG-based cluster analysis of the NMA results revealed a continuum of genotype–phenotype correlation across each structural cluster, as well as in transition areas of partially overlapping structural clusters. Benign/likely benign variants were integrated into the genotype–phenotype continuum as expected and were clearly separated from pathogenic variants.
We conclude that our results support incorporating the NMA-based approach used in this study into the interpretation of variant pathogenicity and phenotype prediction in TUBB4A-related disease. Moreover, our results suggest that NMA may be of value in variant interpretation in other monogenic conditions.
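The clustering step described above (pairwise ΔG differences between all variant pairs, clustered into dendrograms) can be illustrated with a small pure-Python sketch. The abstract does not state the distance or linkage method; this sketch assumes the absolute difference |ΔG_i − ΔG_j| and single linkage, and the variant names and ΔG values are hypothetical. In practice, `scipy.cluster.hierarchy.linkage` and `dendrogram` would typically be used to build and draw the tree.

```python
from itertools import combinations


def pairwise_differences(delta_g: dict[str, float]) -> dict[tuple[str, str], float]:
    """Absolute ΔG difference for every unordered pair of variants."""
    return {
        (a, b): abs(delta_g[a] - delta_g[b])
        for a, b in combinations(sorted(delta_g), 2)
    }


def single_linkage(delta_g: dict[str, float]) -> list[tuple[list[str], list[str], float]]:
    """Naive agglomerative clustering; returns merges in the order a dendrogram is built."""
    diffs = pairwise_differences(delta_g)

    def dist(c1: frozenset[str], c2: frozenset[str]) -> float:
        # Single linkage: distance between the closest members of the two clusters.
        return min(diffs[tuple(sorted((a, b)))] for a in c1 for b in c2)

    clusters = [frozenset([name]) for name in sorted(delta_g)]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        d = dist(clusters[i], clusters[j])
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


if __name__ == "__main__":
    # Hypothetical ΔG values for four variants in one structural cluster.
    example = {"V1": 0.10, "V2": 0.12, "V3": 0.90, "V4": 1.00}
    for left, right, d in single_linkage(example):
        print(left, "+", right, f"at {d:.2f}")
```

Variants with similar ΔG values merge early (low in the tree), so phenotypically similar variants would be expected to share low branches if ΔG tracks phenotype as the abstract reports.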
Affiliation(s)
- Avi Fellner
- Raphael Recanati Genetics Institute, Rabin Medical Center, Beilinson Hospital, 49100, Petah Tikva, Israel
- Department of Neurology, Rabin Medical Center, Beilinson Hospital, 49100, Petah Tikva, Israel
- Yael Goldberg
- Raphael Recanati Genetics Institute, Rabin Medical Center, Beilinson Hospital, 49100, Petah Tikva, Israel
- Sackler Faculty of Medicine, Tel-Aviv University, 69978, Tel-Aviv, Israel
- Dorit Lev
- Sackler Faculty of Medicine, Tel-Aviv University, 69978, Tel-Aviv, Israel
- Metabolic-Neurogenetic Clinic, Wolfson Medical Center, 58220, Holon, Israel
- Rina Mor Institute of Medical Genetics, Wolfson Medical Center, 58220, Holon, Israel
- Lina Basel-Salmon
- Raphael Recanati Genetics Institute, Rabin Medical Center, Beilinson Hospital, 49100, Petah Tikva, Israel
- Sackler Faculty of Medicine, Tel-Aviv University, 69978, Tel-Aviv, Israel
- Felsenstein Medical Research Center, 49100, Petah Tikva, Israel
- Oded Shor
- Department of Neurology, Rabin Medical Center, Beilinson Hospital, 49100, Petah Tikva, Israel
- Sackler Faculty of Medicine, Tel-Aviv University, 69978, Tel-Aviv, Israel
- Felsenstein Medical Research Center, 49100, Petah Tikva, Israel
- Felix Benninger
- Department of Neurology, Rabin Medical Center, Beilinson Hospital, 49100, Petah Tikva, Israel
- Sackler Faculty of Medicine, Tel-Aviv University, 69978, Tel-Aviv, Israel
- Felsenstein Medical Research Center, 49100, Petah Tikva, Israel
17
Duong HTT, Suzuki H, Katagiri S, Shibata M, Arai M, Yura K. Computational study of the impact of nucleotide variations on highly conserved proteins: In the case of actin. Biophys Physicobiol 2022; 19:e190025. [PMID: 36160324 PMCID: PMC9465404 DOI: 10.2142/biophysico.bppb-v19.0025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2022] [Accepted: 07/27/2022] [Indexed: 12/01/2022]
Abstract
Sequencing of individual human genomes enables study of the relationships among nucleotide variations, amino acid substitutions, effects on protein structure, and disease. Many studies have found general tendencies: for instance, pathogenic variations tend to be found in buried regions of protein structures, benign variations tend to be found on protein surfaces, and variations at evolutionarily conserved residues tend to be pathogenic. These tendencies were deduced from globular proteins with standard evolutionary changes in amino acid sequence. In this study, we investigated the distribution of variations in actin, a highly conserved protein. Many nucleotide variations and three-dimensional structures of actin have been registered in databases. By combining these data, we found that, contrary to the general tendencies, variations buried inside the protein were rather benign and variations on the surface of the protein were pathogenic. This idiosyncratic distribution of variation impact is likely attributable to actin's extensive use of its surface for protein-protein interactions.
Affiliation(s)
- Ha T. T. Duong
- Graduate School of Humanities and Sciences, Ochanomizu University
- Hirofumi Suzuki
- Graduate School of Advanced Science and Engineering, Waseda University
- Saki Katagiri
- Graduate School of Humanities and Sciences, Ochanomizu University
- Mayu Shibata
- Graduate School of Humanities and Sciences, Ochanomizu University
- Misae Arai
- Graduate School of Humanities and Sciences, Ochanomizu University
- Kei Yura
- Graduate School of Humanities and Sciences, Ochanomizu University
18
Findlay GM. Linking genome variants to disease: scalable approaches to test the functional impact of human mutations. Hum Mol Genet 2021; 30:R187-R197. [PMID: 34338757 PMCID: PMC8490018 DOI: 10.1093/hmg/ddab219] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 07/19/2021] [Revised: 07/19/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022]
Abstract
The application of genomics to medicine has accelerated the discovery of mutations underlying disease and has enhanced our knowledge of the molecular underpinnings of diverse pathologies. As the amount of human genetic material queried via sequencing has grown exponentially in recent years, so too has the number of rare variants observed. Despite progress, our ability to distinguish which rare variants have clinical significance remains limited. Over the last decade, however, powerful experimental approaches have emerged to characterize variant effects orders of magnitude faster than before. Fueled by improved DNA synthesis and sequencing and, more recently, by CRISPR/Cas9 genome editing, multiplex functional assays provide a means of generating variant effect data in wide-ranging experimental systems. Here, I review recent applications of multiplex assays that link human variants to disease phenotypes and I describe emerging strategies that will enhance their clinical utility in coming years.
Affiliation(s)
- Gregory M Findlay
- The Francis Crick Institute, The Genome Function Laboratory, London NW1 1AT, UK
19
Functional and structural analyses of novel Smith-Kingsmore Syndrome-Associated MTOR variants reveal potential new mechanisms and predictors of pathogenicity. PLoS Genet 2021; 17:e1009651. [PMID: 34197453 PMCID: PMC8279410 DOI: 10.1371/journal.pgen.1009651] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/26/2021] [Revised: 07/14/2021] [Accepted: 06/08/2021] [Indexed: 12/31/2022]
Abstract
Smith-Kingsmore syndrome (SKS) is a rare neurodevelopmental disorder characterized by macrocephaly/megalencephaly, developmental delay, intellectual disability, hypotonia, and seizures. It is caused by dominant missense mutations in MTOR. The pathogenicity of novel variants in MTOR in patients with neurodevelopmental disorders can be difficult to determine and the mechanism by which variants cause disease remains poorly understood. We report 7 patients with SKS with 4 novel MTOR variants and describe their phenotypes. We perform in vitro functional analyses to confirm MTOR activation and interrogate disease mechanisms. We complete structural analyses to understand the 3D properties of pathogenic variants. We examine the accuracy of relative accessible surface area, a quantitative measure of amino acid side-chain accessibility, as a predictor of MTOR variant pathogenicity. We describe novel clinical features of patients with SKS. We confirm MTOR Complex 1 activation and identify MTOR Complex 2 activation as a new potential mechanism of disease in SKS. We find that pathogenic MTOR variants disproportionately cluster in hotspots in the core of the protein, where they disrupt alpha helix packing due to the insertion of bulky amino acid side chains. We find that relative accessible surface area is significantly lower for SKS-associated variants compared to benign variants. We expand the phenotype of SKS and demonstrate that additional pathways of activation may contribute to disease. Incorporating 3D properties of MTOR variants may help in pathogenicity classification. We hope these findings may contribute to improving the precision of care and therapeutic development for individuals with SKS.
Smith-Kingsmore Syndrome is a rare disease caused by damage in a gene named MTOR that is associated with excessive growth of the head and brain, delays in development and deficits in intellectual functioning. We report 7 patients who have changes in MTOR that have never been reported before. We describe new medical findings in these patients that may be common in Smith-Kingsmore Syndrome more broadly. We then identify how these new gene changes impact the function of the MTOR protein and thus cell function downstream. Lastly, we show that changes in the gene that lie deep inside the 3D structure of the MTOR protein are more likely to cause disease than those changes that lie on the surface of the protein. We may be able to use the 3D properties of MTOR gene changes to predict if future changes we see are likely to cause disease or not.
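Relative accessible surface area (RSA), the predictor examined above, is a residue's solvent-accessible surface area (SASA) divided by the maximum SASA possible for that residue type. The minimal Python sketch below computes RSA and a buried/exposed call. The maximum-area values (approximately the theoretical scale of Tien et al., 2013), the 0.2 burial cutoff, and the per-residue SASA inputs are assumptions for illustration only, not the study's data or pipeline.

```python
from statistics import mean

# Assumed maximum accessible surface areas in Å^2 for a few residue types,
# approximately the theoretical values of Tien et al. (2013).
# Illustrative only: check against the original table before real use.
MAX_ASA = {
    "ALA": 129.0, "GLY": 104.0, "LEU": 201.0,
    "PHE": 240.0, "SER": 155.0, "TRP": 285.0,
}


def relative_accessibility(residue: str, sasa: float) -> float:
    """RSA = observed SASA / maximum SASA for the residue type, capped at 1.0."""
    return min(sasa / MAX_ASA[residue.upper()], 1.0)


def is_buried(rsa: float, threshold: float = 0.2) -> bool:
    """Hypothetical cutoff: residues with RSA below `threshold` count as buried."""
    return rsa < threshold


# Hypothetical (residue, SASA in Å^2) pairs for two variant sets.
pathogenic = [("LEU", 8.0), ("PHE", 12.0), ("ALA", 5.0)]
benign = [("SER", 90.0), ("GLY", 60.0), ("TRP", 150.0)]

path_rsa = [relative_accessibility(r, s) for r, s in pathogenic]
benign_rsa = [relative_accessibility(r, s) for r, s in benign]

print(f"mean RSA, pathogenic: {mean(path_rsa):.3f}")
print(f"mean RSA, benign:     {mean(benign_rsa):.3f}")
print("buried pathogenic sites:", sum(is_buried(x) for x in path_rsa), "of", len(path_rsa))
```

A real comparison like the one in the abstract would also apply a significance test (for example, a Mann-Whitney U test) to the two RSA distributions instead of only comparing means.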
20
Pereira JM, Vieira M, Santos SM. Step-by-step design of proteins for small molecule interaction: A review on recent milestones. Protein Sci 2021; 30:1502-1520. [PMID: 33934427 DOI: 10.1002/pro.4098] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/25/2021] [Revised: 04/21/2021] [Accepted: 04/23/2021] [Indexed: 01/01/2023]
Abstract
Protein design is the field of synthetic biology that aims at developing de novo custom-made proteins and peptides for specific applications. Although this goal is ambitious, recent advances in computational hardware and software have paved the way for high-throughput screening and detailed design of novel folds and improved functionalities. This review describes modern advances in protein design for small-molecule targeting, organized step by step: from the conception of a new or upgraded active binding site, to scaffold design, sequence optimization, and experimental expression of the custom protein. For each step, contemporary examples are described and state-of-the-art software is briefly explored.
Affiliation(s)
- José M Pereira
- CICECO & Departamento de Química, Universidade de Aveiro, Aveiro, Portugal
- Maria Vieira
- CICECO & Departamento de Química, Universidade de Aveiro, Aveiro, Portugal
- Sérgio M Santos
- CICECO & Departamento de Química, Universidade de Aveiro, Aveiro, Portugal
21
Missense3D-DB web catalogue: an atom-based analysis and repository of 4M human protein-coding genetic variants. Hum Genet 2021; 140:805-812. [PMID: 33502607 PMCID: PMC8052235 DOI: 10.1007/s00439-020-02246-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 09/09/2020] [Accepted: 12/07/2020] [Indexed: 12/22/2022]
Abstract
The interpretation of human genetic variation is one of the greatest challenges of modern genetics. New approaches are urgently needed to prioritize variants, especially those that are rare or lack a definitive clinical interpretation. We examined 10,136,597 human missense genetic variants from GnomAD, ClinVar and UniProt. We were able to perform large-scale atom-based mapping and phenotype interpretation of 3,960,015 of these variants onto 18,874 experimental and 84,818 in-house predicted three-dimensional coordinates of the human proteome. We demonstrate that 14% of amino acid substitutions from the GnomAD database that could be structurally analysed are predicted to affect protein structure (n = 568,548, of which 566,439 are rare or extremely rare) and may, therefore, have a yet unknown disease-causing effect. The same is true for 19.0% (n = 6266) of variants of unknown clinical significance or conflicting interpretation reported in the ClinVar database. The results of the structural analysis are available in the dedicated web catalogue Missense3D-DB (http://missense3d.bc.ic.ac.uk/). For each of the 4M variants, the results of the structural analysis are presented in a concise, user-friendly format that can be included in clinical genetic reports. A detailed report of the structural analysis is also available for non-experts in structural biology. Population frequency and predictions from SIFT and PolyPhen are included for a more comprehensive variant interpretation. This is the first large-scale atom-based structural interpretation of human genetic variation and offers geneticists and the biomedical community a new approach to genetic variant interpretation.
22
Ultrarare heterozygous pathogenic variants of genes causing dominant forms of early-onset deafness underlie severe presbycusis. Proc Natl Acad Sci U S A 2020; 117:31278-31289. [PMID: 33229591 DOI: 10.1073/pnas.2010782117] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 02/06/2023]
Abstract
Presbycusis, or age-related hearing loss (ARHL), is a major public health issue. About half the phenotypic variance has been attributed to genetic factors. Here, we assessed the contribution to presbycusis of ultrarare pathogenic variants, considered indicative of Mendelian forms. We focused on severe presbycusis without environmental or comorbidity risk factors and studied multiplex family age-related hearing loss (mARHL) and simplex/sporadic age-related hearing loss (sARHL) cases and controls with normal hearing by whole-exome sequencing. Ultrarare variants (allele frequency [AF] < 0.0001) of 35 genes responsible for autosomal dominant early-onset forms of deafness, predicted to be pathogenic, were detected in 25.7% of mARHL and 22.7% of sARHL cases vs. 7.5% of controls (P = 0.001); half were previously unknown (AF < 0.000002). MYO6, MYO7A, PTPRQ, and TECTA variants were present in 8.9% of ARHL cases but less than 1% of controls. Evidence for a causal role of variants in presbycusis was provided by pathogenicity prediction programs, documented haploinsufficiency, three-dimensional structure/function analyses, cell biology experiments, and reported early effects. We also established Tmc1 N321I/+ mice, carrying the TMC1:p.(Asn327Ile) variant detected in an mARHL case, as a mouse model for a monogenic form of presbycusis. Deafness gene variants can thus result in a continuum of auditory phenotypes. Our findings demonstrate that the genetics of presbycusis is shaped not only by well-studied polygenic risk factors of small effect size revealed by common variants but also by ultrarare variants likely resulting in monogenic forms, thereby paving the way for treatment with emerging inner ear gene therapy.
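The variant filter described above (ultrarare, AF < 0.0001, predicted pathogenic, in a dominant early-onset deafness gene) and the carrier fractions it yields can be sketched as follows. The `Variant` record, the four-gene subset of the 35-gene panel, and the toy samples are hypothetical; the pathogenicity call is assumed to come from upstream prediction tools.

```python
from dataclasses import dataclass

# Threshold quoted in the abstract: ultrarare means allele frequency < 0.0001.
ULTRARARE_AF = 1e-4


@dataclass
class Variant:
    gene: str
    af: float                   # population allele frequency (0.0 if unseen)
    predicted_pathogenic: bool  # assumed output of upstream prediction tools


def qualifies(v: Variant, panel: set) -> bool:
    """Ultrarare, predicted-pathogenic variant in a panel gene."""
    return v.gene in panel and v.af < ULTRARARE_AF and v.predicted_pathogenic


def carrier_fraction(samples: dict, panel: set) -> float:
    """Fraction of samples carrying at least one qualifying variant."""
    carriers = sum(any(qualifies(v, panel) for v in variants) for variants in samples.values())
    return carriers / len(samples)


# Four genes named in the abstract; the study's panel had 35 genes.
PANEL = {"MYO6", "MYO7A", "PTPRQ", "TECTA"}

# Hypothetical toy data, not from the study.
cases = {
    "case1": [Variant("MYO7A", 5e-5, True)],
    "case2": [Variant("TECTA", 3e-3, True)],   # too common to qualify
    "case3": [],
    "case4": [Variant("MYO6", 1e-6, True)],
}
controls = {
    "ctrl1": [Variant("MYO7A", 5e-5, False)],  # not predicted pathogenic
    "ctrl2": [], "ctrl3": [], "ctrl4": [],
}

print("cases:", carrier_fraction(cases, PANEL))
print("controls:", carrier_fraction(controls, PANEL))
```

The case-versus-control difference reported in the abstract (with its P value) would additionally require a statistical test on the carrier counts, which this sketch omits.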
23
Spreafico R, Soriaga LB, Grosse J, Virgin HW, Telenti A. Advances in Genomics for Drug Development. Genes (Basel) 2020; 11:E942. [PMID: 32824125 PMCID: PMC7465049 DOI: 10.3390/genes11080942] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/24/2020] [Revised: 08/04/2020] [Accepted: 08/13/2020] [Indexed: 11/16/2022]
Abstract
Drug development (target identification, advancing drug leads to candidates for preclinical and clinical studies) can be facilitated by genetic and genomic knowledge. Here, we review the contribution of population genomics to target identification, the value of bulk and single cell gene expression analysis for understanding the biological relevance of a drug target, and genome-wide CRISPR editing for the prioritization of drug targets. In genomics, we discuss the different scope of genome-wide association studies using genotyping arrays, versus exome and whole genome sequencing. In transcriptomics, we discuss the information from drug perturbation and the selection of biomarkers. For CRISPR screens, we discuss target discovery, mechanism of action and the concept of gene to drug mapping. Harnessing genetic support increases the probability of drug developability and approval.
Affiliation(s)
- Amalio Telenti
- Vir Biotechnology, Inc., San Francisco, CA 94158, USA; (R.S.); (L.B.S.); (J.G.); (H.W.V.)
24
Sanavia T, Birolo G, Montanucci L, Turina P, Capriotti E, Fariselli P. Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput Struct Biotechnol J 2020; 18:1968-1979. [PMID: 32774791 PMCID: PMC7397395 DOI: 10.1016/j.csbj.2020.07.011] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 03/15/2020] [Revised: 07/10/2020] [Accepted: 07/14/2020] [Indexed: 12/13/2022]
Abstract
Protein stability predictions are becoming essential in medicine to develop novel immunotherapeutic agents and for drug discovery. Despite the large number of computational approaches for predicting the protein stability upon mutation, there are still critical unsolved problems: 1) the limited number of thermodynamic measurements for proteins provided by current databases; 2) the large intrinsic variability of ΔΔG values due to different experimental conditions; 3) biases in the development of predictive methods caused by ignoring the anti-symmetry of ΔΔG values between mutant and native protein forms; 4) over-optimistic prediction performance, due to sequence similarity between proteins used in training and test datasets. Here, we review these issues, highlighting new challenges required to improve current tools and to achieve more reliable predictions. In addition, we provide a perspective of how these methods will be beneficial for designing novel precision medicine approaches for several genetic disorders caused by mutations, such as cancer and neurodegenerative diseases.
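Point 3 above concerns anti-symmetry: for a direct mutation and its inverse, the predicted ΔΔG values should satisfy ΔΔG(direct) = −ΔΔG(inverse). One common way to check a predictor is the Pearson correlation between direct and inverse predictions (ideally −1) together with the bias ⟨δ⟩ = Σ(ΔΔG_dir + ΔΔG_inv)/(2N) (ideally 0). The minimal sketch below assumes that definition and uses made-up predictions; note that ΔΔG sign conventions differ between tools.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def antisymmetry(direct, inverse):
    """Return (r, bias): r should be near -1 and bias near 0 for an
    anti-symmetric predictor, with bias = sum(ddG_dir + ddG_inv) / (2N)."""
    r = pearson(direct, inverse)
    bias = sum(d + i for d, i in zip(direct, inverse)) / (2 * len(direct))
    return r, bias


# Hypothetical ΔΔG predictions (kcal/mol) for four mutations and their inverses.
direct = [1.2, -0.5, 2.0, 0.3]
ideal_inverse = [-d for d in direct]       # perfectly anti-symmetric
biased_inverse = [0.4, 0.8, -0.6, 0.5]     # skewed toward positive values

print("ideal:  r=%.2f bias=%.3f" % antisymmetry(direct, ideal_inverse))
print("biased: r=%.2f bias=%.3f" % antisymmetry(direct, biased_inverse))
```

A predictor trained only on direct mutations can score well on correlation with experiment yet show a large bias here, which is the kind of hidden problem the abstract points to.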
Affiliation(s)
- Tiziana Sanavia
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Giovanni Birolo
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Ludovica Montanucci
- Department of Comparative Biomedicine and Food Science (BCA), University of Padova, Viale dell'Università 16, 35020 Legnaro, Italy
- Paola Turina
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
25
Zaucha J, Heinzinger M, Kulandaisamy A, Kataka E, Salvádor ÓL, Popov P, Rost B, Gromiha MM, Zhorov BS, Frishman D. Mutations in transmembrane proteins: diseases, evolutionary insights, prediction and comparison with globular proteins. Brief Bioinform 2020; 22:5872174. [PMID: 32672331 DOI: 10.1093/bib/bbaa132] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/14/2020] [Revised: 05/26/2020] [Accepted: 05/28/2020] [Indexed: 12/18/2022]
Abstract
Membrane proteins are unique in that they interact with lipid bilayers, making them indispensable for transporting molecules and relaying signals between and across cells. Because these functions are so important, mutations often have profound effects on the fitness of the host. This is apparent both from experimental studies, which have implicated numerous missense variants in diseases, and from evolutionary signals that allow elucidation of the physicochemical constraints imposed by intermembrane and aqueous environments. In this review, we report on the current state of knowledge about missense variants (referred to as single amino acid variants) affecting membrane proteins, as well as the insights that can be extrapolated from data already available. This includes an overview of the annotations for membrane protein variants collated within dedicated databases, bioinformatics approaches that leverage evolutionary information to shed light on previously uncharacterized membrane protein structures or interaction interfaces, tools for predicting the effects of mutations tailored specifically to the characteristics of membrane proteins, and two clinically relevant case studies explaining the implications of mutated membrane proteins in cancer and cardiomyopathy.
Affiliation(s)
- Jan Zaucha
- Department of Bioinformatics of the TUM School of Life Sciences Weihenstephan in Freising, Germany
- Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology of the TUM Faculty of Informatics in Garching, Germany
- A Kulandaisamy
- Department of Biotechnology of the IIT Bhupat and Jyoti Mehta School of BioSciences in Madras, India
- Evans Kataka
- Department of Bioinformatics of the TUM School of Life Sciences Weihenstephan in Freising, Germany
- Óscar Llorian Salvádor
- Department of Informatics, Bioinformatics and Computational Biology of the TUM Faculty of Informatics in Garching, Germany
- Petr Popov
- Center for Computational and Data-Intensive Science and Engineering of the Skolkovo Institute of Science and Technology in Moscow, Russia
- Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology at the TUM Faculty of Informatics in Garching, Germany
- Boris S Zhorov
- Department of Biochemistry and Biomedical Sciences, McMaster University in Hamilton, Canada
- Dmitrij Frishman
- Department of Bioinformatics at the TUM School of Life Sciences Weihenstephan in Freising, Germany
26
Epidemiology and evolutionary analysis of Torque teno sus virus. Vet Microbiol 2020; 244:108668. [PMID: 32402339 DOI: 10.1016/j.vetmic.2020.108668] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/13/2019] [Revised: 01/14/2020] [Accepted: 01/15/2020] [Indexed: 11/20/2022]
Abstract
Single-stranded (ss) DNA viruses are increasingly being discovered owing to the ongoing development of modern technologies for exploring the virosphere. They are characterized by high rates of recombination and nucleotide substitution, which can be comparable to those of RNA viruses. Torque teno sus virus (TTSuV) is a typical ssDNA virus with high population diversity whose evolution remains obscure; moreover, it is frequently found in co-infections with other viruses that threaten the porcine industry, and it therefore shares the same host and epidemiological context. Here, we implement and describe an approach that integrates viral nucleotide sequence analysis, surveillance data, and structural analysis to examine the evolution of TTSuVs. We collected samples from pigs displaying respiratory signs in China and found a high prevalence of TTSuV1 and TTSuVk2, frequently as part of co-infections with porcine circoviruses (PCVs), especially in spleen and lung. In addition, thirty-six strains were sequenced to investigate their genetic diversity in China. The evolutionary history of TTSuVs was characterized as follows. At the nucleotide sequence level, TTSuV ORF1 was confirmed to be a robust phylogenetic marker for studying evolution, comparable to full genomes. Additionally, extensive recombination was discovered within TTSuVk2a (5 of the 36 strains sequenced in this study were also found to be recombinant). Pairwise distances, phylogenetic trees, and amino acid analysis then confirmed the TTSuV species and allowed circulating genotypes to be defined (TTSuV1a-1, 1a-2, 1b-1, 1b-2, 1b-3, and k2a-1, k2a-2, k2b). Selection analysis uncovered seven and six positively selected sites in TTSuV1 and TTSuVk2, respectively. At the protein structure level, mapping of these sites onto the three-dimensional structure revealed that several positively selected sites are located in potential epitopes, which might be related to escape from the host immune response.
Our results could assist future studies on the classification, surveillance, and control of swine ssDNA viruses.
27
Wang C, Balch WE. Bridging Genomics to Phenomics at Atomic Resolution through Variation Spatial Profiling. Cell Rep 2018; 24:2013-2028.e6. [PMID: 30134164 PMCID: PMC6261431 DOI: 10.1016/j.celrep.2018.07.059] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 01/22/2018] [Revised: 06/25/2018] [Accepted: 07/16/2018] [Indexed: 01/04/2023]
Abstract
To understand the impact of genome sequence variation (the genotype) responsible for biological diversity and human health (the phenotype) including cystic fibrosis and Alzheimer's disease, we developed a Gaussian-process-based machine learning (ML) approach, variation spatial profiling (VSP). VSP uses a sparse collection of known variants found in the population that perturb the protein fold to define unknown variant function based on the emergent general principle of spatial covariance (SCV). SCV quantitatively captures the role of proximity in genotype-to-phenotype spatial-temporal relationships. Phenotype landscapes generated through SCV provide a platform that can be used to describe the functional properties that drive sequence-to-function-to-structure design of the polypeptide fold at atomic resolution. We provide proof of principle that SCV can enable the use of population-based genomic platforms to define the origins and mechanism of action of genotype-to-phenotype transformations contributing to the health and disease of an individual.
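VSP is described above as Gaussian-process-based, with phenotypes of unknown variants inferred from spatially nearby known ones. As a generic illustration only (not the authors' implementation), the sketch below uses standard Gaussian-process regression with a squared-exponential kernel on residue 3D coordinates. The coordinates, phenotype values, length scale, and noise level are all hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=8.0, variance=1.0):
    """Squared-exponential covariance between two sets of 3D coordinates (Å)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def gp_predict(train_xyz, train_y, test_xyz, noise=0.1, length_scale=8.0):
    """Posterior mean and variance of a zero-mean GP fitted to centred phenotypes."""
    offset = train_y.mean()
    K = rbf_kernel(train_xyz, train_xyz, length_scale) + noise * np.eye(len(train_xyz))
    K_s = rbf_kernel(test_xyz, train_xyz, length_scale)
    L = np.linalg.cholesky(K)                          # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, train_y - offset))
    mean = offset + K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(rbf_kernel(test_xyz, test_xyz, length_scale)) - (v**2).sum(axis=0)
    return mean, var


# Hypothetical residue coordinates (Å) with measured variant phenotypes:
# one cluster near the origin with high values, one ~30 Å away with low values.
train_xyz = np.array([[0, 0, 0], [2, 0, 0], [0, 2, 0],
                      [30, 0, 0], [32, 0, 0], [30, 2, 0]], dtype=float)
train_y = np.array([1.0, 0.9, 1.1, 0.0, 0.1, -0.1])

# Two unmeasured residues, one inside each cluster.
test_xyz = np.array([[1, 1, 0], [31, 1, 0]], dtype=float)
mean, var = gp_predict(train_xyz, train_y, test_xyz)
print("predicted phenotype:", mean.round(2), "posterior variance:", var.round(3))
```

The length scale controls how far spatial covariance reaches: residues much farther apart than it contribute almost nothing to each other's prediction, and the posterior variance shows where the known variants are too sparse to be informative.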
Affiliation(s)
- Chao Wang
- Department of Molecular Medicine, The Scripps Research Institute (TSRI), La Jolla, CA 92037, USA
- William E Balch
- Department of Molecular Medicine, The Scripps Research Institute (TSRI), La Jolla, CA 92037, USA; The Skaggs Institute for Chemical Biology, The Scripps Research Institute (TSRI), La Jolla, CA 92037, USA
28
Target discovery using biobanks and human genetics. Drug Discov Today 2019; 25:438-445. [PMID: 31562982 DOI: 10.1016/j.drudis.2019.09.014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/09/2018] [Revised: 08/18/2019] [Accepted: 09/18/2019] [Indexed: 11/22/2022]
Abstract
Large-scale biobanks can yield unprecedented insights into our health and provide discoveries of new and potentially targetable biomarkers. Several protective loss-of-function alleles have been identified, including variants that protect against cardiovascular disease, obesity, type 2 diabetes, and asthma and allergic diseases. These alleles serve as indicators of efficacy, mimicking the effects of drugs and suggesting that inhibiting these genes could provide therapeutic benefit, as has been observed for PCSK9. We provide a context for these findings through a multifaceted review covering the use of genetics in drug discovery efforts through genome-wide and phenome-wide association studies, linking deep mutation scanning data to molecular function and highlighting some additional tools that might help in the interpretation of newly discovered variants.
29
Strokach A, Corbi-Verge C, Kim PM. Predicting changes in protein stability caused by mutation using sequence- and structure-based methods in a CAGI5 blind challenge. Hum Mutat 2019; 40:1414-1423. [PMID: 31243847 PMCID: PMC6744338 DOI: 10.1002/humu.23852] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/29/2019] [Revised: 05/16/2019] [Accepted: 06/24/2019] [Indexed: 12/26/2022]
Abstract
Predicting the impact of mutations on proteins remains an important problem. As part of the CAGI5 frataxin challenge, we evaluate the accuracy with which Provean, FoldX, and ELASPIC can predict changes in the Gibbs free energy of a protein using a limited data set of eight mutations. We find that different methods have distinct strengths and limitations, with no method being strictly superior to the others on all metrics. ELASPIC achieves the highest accuracy while also providing a web interface that simplifies the evaluation and analysis of mutations. FoldX is slightly less accurate than ELASPIC but is easier to run locally, as it does not depend on external tools or datasets. Provean achieves reasonable results while being computationally less expensive than the other methods and not requiring a structure of the protein. In addition to the methods submitted to the CAGI5 community experiment, and with the aim of informing readers about other highly accurate methods, we also evaluate predictions made by Rosetta's ddg_monomer protocol, Rosetta's cartesian_ddg protocol, and thermodynamic integration calculations using the Amber package. ELASPIC still achieves the highest accuracy, while Rosetta's cartesian_ddg protocol appears to perform best in capturing the overall trend in the data.
Affiliation(s)
- Alexey Strokach
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Carles Corbi-Verge
- Donnelly Centre for Cellular and Biomolecular Research, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Philip M Kim
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Donnelly Centre for Cellular and Biomolecular Research, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
30
Wang B, Yan C, Lou S, Emani P, Li B, Xu M, Kong X, Meyerson W, Yang YT, Lee D, Gerstein M. Building a Hybrid Physical-Statistical Classifier for Predicting the Effect of Variants Related to Protein-Drug Interactions. Structure 2019; 27:1469-1481.e3. [PMID: 31279629 DOI: 10.1016/j.str.2019.06.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/30/2018] [Revised: 02/14/2019] [Accepted: 06/03/2019] [Indexed: 11/17/2022]
Abstract
A key issue in drug design is how population variation affects drug efficacy by altering binding affinity (BA) in different individuals, an essential consideration for government regulators. Ideally, we would like to evaluate the BA perturbations of millions of single-nucleotide variants (SNVs). However, only hundreds of protein-drug complexes with SNVs have experimentally characterized BAs, constituting too small a gold standard for straightforward statistical model training. Thus, we take a hybrid approach: using physically based calculations to bootstrap the parameterization of a full model. In particular, we do 3D structure-based docking on ∼10,000 SNVs modifying known protein-drug complexes to construct a pseudo gold standard. Then we use this augmented set of BAs to train a statistical model combining structure, ligand and sequence features and illustrate how it can be applied to millions of SNVs. Finally, we show that our model has good cross-validated performance (97% AUROC) and can also be validated by orthogonal ligand-binding data.
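The cross-validated performance above is reported as AUROC, which equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal pure-Python sketch, with hypothetical scores and labels rather than the study's data:

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random
    negative; ties count as one half (equivalent to the Mann-Whitney U
    statistic divided by the number of positive-negative pairs)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for variants that do (1) or do not (0)
# disrupt drug binding.
print(auroc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))
print(auroc([0.9, 0.4, 0.5, 0.2], [1, 1, 0, 0]))
```

The pairwise loop is O(P·N), which is fine for small sets; library implementations such as scikit-learn's `roc_auc_score` are better suited to millions of variants.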
Affiliation(s)
- Bo Wang
- Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Chengfei Yan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Shaoke Lou
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Prashant Emani
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Bian Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Min Xu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Xiangmeng Kong
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- William Meyerson
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Yale School of Medicine, Yale University, New Haven, CT 06520, USA
- Yucheng T Yang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Donghoon Lee
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Computer Science, Yale University, New Haven, CT 06520, USA
31
Stein A, Fowler DM, Hartmann-Petersen R, Lindorff-Larsen K. Biophysical and Mechanistic Models for Disease-Causing Protein Variants. Trends Biochem Sci 2019; 44:575-588. [PMID: 30712981 PMCID: PMC6579676 DOI: 10.1016/j.tibs.2019.01.003] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 11/20/2018] [Revised: 01/04/2019] [Accepted: 01/08/2019] [Indexed: 12/13/2022]
Abstract
The rapid decrease in DNA sequencing cost is revolutionizing medicine and science. In medicine, genome sequencing has revealed millions of missense variants that change protein sequences, yet we only understand the molecular and phenotypic consequences of a small fraction. Within protein science, high-throughput deep mutational scanning experiments enable us to probe thousands of variants in a single, multiplexed experiment. We review efforts that bring together these topics via experimental and computational approaches to determine the consequences of missense variants in proteins. We focus on the role of changes in protein stability as a driver for disease, and how experiments, biophysical models, and computation are providing a framework for understanding and predicting how changes in protein sequence affect cellular protein stability.
Affiliation(s)
- Amelie Stein
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Douglas M Fowler
- Departments of Genome Sciences and Bioengineering, University of Washington, Seattle, WA, USA
- Rasmus Hartmann-Petersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
32
Miller JE, Veturi Y, Ritchie MD. Innovative strategies for annotating the "relationSNP" between variants and molecular phenotypes. BioData Min 2019; 12:10. [PMID: 31114635 PMCID: PMC6518798 DOI: 10.1186/s13040-019-0197-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/20/2018] [Accepted: 04/18/2019] [Indexed: 11/10/2022]
Abstract
Characterizing how variation at the level of individual nucleotides contributes to traits and diseases has been an area of growing interest since the first human genome was sequenced. Understanding how a single nucleotide polymorphism (SNP) leads to a pathogenic phenotype on a genome-wide scale is a fruitful endeavor for anyone interested in developing diagnostic tests or therapeutics, or simply in understanding the etiology of a disease or trait. To this end, many datasets and algorithms have been developed as resources and tools to annotate SNPs. One of the most common practices is to annotate coding SNPs that affect the protein sequence. Synonymous variants are often grouped as one type of variant; however, many tools are in fact available to dissect their effects on gene expression. More recently, large consortia such as ENCODE and GTEx have made it possible to annotate non-coding regions. Although annotating variants is a common technique among human geneticists, the constant advances in tools and biology surrounding SNPs require an updated summary of what is known and of the trajectory of the field. This review will discuss the history behind SNP annotation, commonly used tools, and newer strategies for SNP annotation. Additionally, we will comment on the caveats that distinguish approaches from one another, along with gaps in the current state of knowledge and potential future directions. We do not intend this to be a comprehensive review of any specific area of SNP annotation; rather, it will be an excellent resource for those unfamiliar with the computational tools used to functionally characterize SNPs. In summary, this review will help illustrate how each SNP annotation method affects the way in which the genetic and molecular etiology of a disease is explored in silico.
Affiliation(s)
- Jason E. Miller
- Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA 19104 USA
- Yogasudha Veturi
- Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA 19104 USA
- Marylyn D. Ritchie
- Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA 19104 USA
33
Ofoegbu TC, David A, Kelley LA, Mezulis S, Islam SA, Mersmann SF, Strömich L, Vakser IA, Houlston RS, Sternberg MJE. PhyreRisk: A Dynamic Web Application to Bridge Genomics, Proteomics and 3D Structural Data to Guide Interpretation of Human Genetic Variants. J Mol Biol 2019; 431:2460-2466. [PMID: 31075275 PMCID: PMC6597944 DOI: 10.1016/j.jmb.2019.04.043] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/12/2019] [Revised: 04/02/2019] [Accepted: 04/29/2019] [Indexed: 12/12/2022]
Abstract
PhyreRisk is an open-access, publicly accessible web application for interactively bridging genomic, proteomic and structural data, facilitating the mapping of human variants onto protein structures. A major advance over other tools for sequence-structure variant mapping is that PhyreRisk provides information on 20,214 human canonical proteins and an additional 22,271 alternative protein sequences (isoforms). Specifically, PhyreRisk provides structural coverage (partial or complete) for 70% (14,035 of 20,214 canonical proteins) of the human proteome, by storing 18,874 experimental structures and 84,818 pre-built models of canonical proteins and their isoforms generated using our in-house Phyre2. PhyreRisk reports 55,732 experimentally multi-validated protein interactions from IntAct and 24,260 experimental structures of protein complexes. Another major feature of PhyreRisk is that, rather than presenting a limited set of precomputed variant-structure mappings of known genetic variants, it allows the user to explore novel variants using genomic coordinate formats (Ensembl, VCF, reference SNP ID and HGVS notations) as input, on human genome builds GRCh37 and GRCh38. PhyreRisk also supports mapping variants using amino acid coordinates and searching for genes or proteins of interest. PhyreRisk is designed to empower researchers to translate genetic data into protein structural information, thereby providing a more comprehensive appreciation of the functional impact of variants. PhyreRisk is freely available at http://phyrerisk.bc.ic.ac.uk.
Affiliation(s)
- Tochukwu C Ofoegbu
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Lawrence A Kelley
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Stefans Mezulis
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Suhail A Islam
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Sophia F Mersmann
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Léonie Strömich
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Ilya A Vakser
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, KS 66045, USA
- Richard S Houlston
- Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SM2 5NG, UK
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
34
Functional characterization of 3D protein structures informed by human genetic diversity. Proc Natl Acad Sci U S A 2019; 116:8960-8965. [PMID: 30988206 PMCID: PMC6500140 DOI: 10.1073/pnas.1820813116] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 01/21/2023]
Abstract
Sequence variation data of the human proteome can be used to analyze 3D protein structures to derive functional insights. We used genetic variant data from nearly 140,000 individuals to analyze 3D positional conservation in 4,715 proteins and 3,951 homology models using 860,292 missense and 465,886 synonymous variants. Sixty percent of protein structures harbor at least one intolerant 3D site as defined by significant depletion of observed over expected missense variation. Structural intolerance data correlated with deep mutational scanning functional readouts for PPARG, MAPK1/ERK2, UBE2I, SUMO1, PTEN, CALM1, CALM2, and TPK1 and with shallow mutagenesis data for 1,026 proteins. The 3D structural intolerance analysis revealed different features for ligand binding pockets and orthosteric and allosteric sites. Large-scale data on human genetic variation support a definition of functional 3D sites proteome-wide.
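The abstract defines an intolerant 3D site by a significant depletion of observed relative to expected missense variation, but it does not state the statistical model. A minimal sketch, assuming that an expected missense count for a site is already available from some background mutation model and that the observed count is Poisson-distributed around it, computes the observed/expected (O/E) ratio and a one-sided depletion p-value. The function names and the 0.05 threshold are illustrative assumptions, not the authors' method.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) obtained from P(X = i - 1)
        total += term
    return total


def site_intolerance(observed: int, expected: float, alpha: float = 0.05):
    """Return (O/E ratio, one-sided depletion p-value, flagged-as-intolerant)."""
    ratio = observed / expected
    # Probability of seeing this few (or fewer) missense variants by chance.
    p_depletion = poisson_cdf(observed, expected)
    return ratio, p_depletion, p_depletion < alpha


# A site expected to carry 10 missense variants but observed with only 2:
print(site_intolerance(2, 10.0))  # ratio 0.2, p ≈ 0.0028, flagged as intolerant
```

A real analysis over thousands of sites would also need a multiple-testing correction and a calibrated expectation model; both are outside this sketch.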
35
Can Predicted Protein 3D Structures Provide Reliable Insights into whether Missense Variants Are Disease Associated? J Mol Biol 2019; 431:2197-2212. [PMID: 30995449 PMCID: PMC6544567 DOI: 10.1016/j.jmb.2019.04.009] [Citation(s) in RCA: 268] [Impact Index Per Article: 53.6] [Received: 12/12/2018] [Revised: 03/18/2019] [Accepted: 04/07/2019] [Indexed: 01/29/2023]
Abstract
Knowledge of protein structure can be used to predict the phenotypic consequence of a missense variant. Since structural coverage of the human proteome can be roughly tripled to over 50% of the residues if homology-predicted structures are included in addition to experimentally determined coordinates, it is important to assess the reliability of using predicted models when analyzing missense variants. Accordingly, we assess whether a missense variant is structurally damaging by using experimental and predicted structures. We considered 606 experimental structures and show that 40% of the 1965 disease-associated missense variants analyzed have a structurally damaging change in the mutant structure. Only 11% of the 2134 neutral variants are structurally damaging. Importantly, similar results are obtained when 1052 structures predicted using Phyre2 algorithm were used, even when the model shares low (<40%) sequence identity to the template. Thus, structure-based analysis of the effects of missense variants can be effectively applied to homology models. Our in-house pipeline, Missense3D, for structurally assessing missense variants was made available at http://www.sbg.bio.ic.ac.uk/~missense3d.
36
Review: Precision medicine and driver mutations: Computational methods, functional assays and conformational principles for interpreting cancer drivers. PLoS Comput Biol 2019; 15:e1006658. [PMID: 30921324 PMCID: PMC6438456 DOI: 10.1371/journal.pcbi.1006658] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Indexed: 12/19/2022]
Abstract
At the root of so-called precision medicine, or precision oncology, which is our focus here, is the hypothesis that cancer treatment would be considerably better if therapies were guided by a tumor’s genomic alterations. This hypothesis has sparked major initiatives focusing on whole-genome and/or exome sequencing, the creation of large databases, and the development of tools for their statistical analysis, all aspiring to identify actionable alterations, and thus molecular targets, in a patient. At the center of the massive amount of collected sequence data are their interpretations, which largely rest on statistical analysis and phenotypic observations. Statistics is vital because it guides the identification of cancer-driving alterations. However, mutation statistics do not identify a change in protein conformation; therefore, they may not define actionable mutations with sufficient accuracy, and they neglect rare ones. Among the many thematic overviews of precision oncology, this review innovates by also comprehensively covering precision pharmacology and, within this framework, articulating its protein structural landscape and its consequences for cellular signaling pathways. It provides the underlying physicochemical basis, thereby also opening the door to a broader community.
37
Lappalainen T, Scott AJ, Brandt M, Hall IM. Genomic Analysis in the Age of Human Genome Sequencing. Cell 2019; 177:70-84. [PMID: 30901550 PMCID: PMC6532068 DOI: 10.1016/j.cell.2019.02.032] [Citation(s) in RCA: 147] [Impact Index Per Article: 29.4] [Received: 01/20/2019] [Revised: 02/19/2019] [Accepted: 02/19/2019] [Indexed: 02/08/2023]
Abstract
Affordable genome sequencing technologies promise to revolutionize the field of human genetics by enabling comprehensive studies that interrogate all classes of genome variation, genome-wide, across the entire allele frequency spectrum. Ongoing projects worldwide are sequencing many thousands, and soon millions, of human genomes as part of various gene mapping studies, biobanking efforts, and clinical programs. However, while genome sequencing data production has become routine, genome analysis and interpretation remain challenging endeavors with many limitations and caveats. Here, we review the current state of technologies for genetic variant discovery, genotyping, and functional interpretation and discuss the prospects for future advances. We focus on germline variants discovered by whole-genome sequencing, genome-wide functional genomic approaches for predicting and measuring variant functional effects, and implications for studies of common and rare human disease.
Affiliation(s)
- Tuuli Lappalainen
- New York Genome Center, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA
- Alexandra J Scott
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Margot Brandt
- New York Genome Center, New York, NY, USA; Department of Systems Biology, Columbia University, New York, NY, USA
- Ira M Hall
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA; Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
38
Klee EW, Zimmermann MT. Molecular modeling of LDLR aids interpretation of genomic variants. J Mol Med (Berl) 2019; 97:533-540. [PMID: 30778614 PMCID: PMC6440939 DOI: 10.1007/s00109-019-01755-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/07/2018] [Revised: 01/14/2019] [Accepted: 02/05/2019] [Indexed: 11/24/2022]
Abstract
Genetic variants in the low-density lipoprotein receptor (LDLR) are known to cause familial hypercholesterolemia (FH), which occurs in up to 1 in 200 people (Youngblom E. et al. 1993 and Nordestgaard BG et al. 34:3478–3490a, 2013) and leads to significant risk for heart disease. Clinical genomics testing using high-throughput sequencing is identifying novel genomic variants of uncertain significance (VUS) in individuals suspected of having FH, but for whom the causal link to the disease remains to be established (Nordestgaard BG et al. 34:3478–3490a, 2013). Unfortunately, experimental data on the atomic structure of the LDL binding domains of LDLR at extracellular pH do not exist. This prevents the application of protein structure-based methods to assess novel variants identified through genetic testing. Thus, ambiguities in the interpretation of LDLR variants are a barrier to achieving the expected clinical value of personalized genomics assays for the management of FH. In this study, we integrated data from the literature and related cellular receptors to develop high-resolution models of full-length LDLR at extracellular conditions and used them to predict which VUS alter LDL binding. We believe that the functional effects of LDLR variants can be resolved using a combination of structural bioinformatics and functional assays, leading to a better correlation with clinical presentation. We have completed modeling of LDLR in two major physiologic conditions, generating detailed hypotheses for how each of the 1007 reported protein variants may affect function.
Key messages
• Hundreds of variants are observed in the LDLR, but most lack interpretation.
• Molecular modeling is aided by biochemical knowledge.
• We generated context-specific 3D protein models of LDLR.
• Our models allowed mechanistic interpretation of many variants.
• We interpreted both rare and common genomic variants in their physiologic context.
• Effects of genomic variants are often context-specific.
Electronic supplementary material: The online version of this article (10.1007/s00109-019-01755-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Eric W Klee
- Department of Health Science Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA; Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
- Michael T Zimmermann
- Bioinformatics Research and Development Laboratory, Genomic Sciences and Precision Medicine Center, Medical College of Wisconsin, Milwaukee, WI, 53226-0509, USA
39
Gutierrez B, Escalera-Zamudio M, Pybus OG. Parallel molecular evolution and adaptation in viruses. Curr Opin Virol 2019; 34:90-96. [PMID: 30703578 PMCID: PMC7102768 DOI: 10.1016/j.coviro.2018.12.006] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 10/08/2018] [Accepted: 12/11/2018] [Indexed: 01/05/2023]
Abstract
Parallel molecular evolution is the independent evolution of the same genotype or phenotype from distinct ancestors. The simple genomes and rapid evolution of many viruses mean they are useful model systems for studying parallel evolution by natural selection. Parallel adaptation occurs in the context of several viral behaviours, including cross-species transmission, drug resistance, and host immune escape, and its existence suggests that at least some aspects of virus evolution and emergence are repeatable and predictable. We introduce examples of virus parallel evolution and summarise key concepts. We outline the difficulties in detecting parallel adaptation using virus genomes, with a particular focus on phylogenetic and structural approaches, and we discuss future approaches that may improve our understanding of the phenomenon.
Affiliation(s)
- Oliver G Pybus
- Department of Zoology, University of Oxford, Oxford, United Kingdom.
40
Single nucleotide polymorphisms alter kinase anchoring and the subcellular targeting of A-kinase anchoring proteins. Proc Natl Acad Sci U S A 2018; 115:E11465-E11474. [PMID: 30455320 DOI: 10.1073/pnas.1816614115] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 12/31/2022]
Abstract
A-kinase anchoring proteins (AKAPs) shape second-messenger signaling responses by constraining protein kinase A (PKA) at precise intracellular locations. A defining feature of AKAPs is a helical region that binds to regulatory subunits (RII) of PKA. Mining patient-derived databases identified 42 nonsynonymous SNPs in the PKA-anchoring helices of five AKAPs. Solid-phase RII binding assays confirmed that 21 of these amino acid substitutions disrupt PKA anchoring. The most deleterious side-chain modifications are situated toward the C termini of AKAP helices. More extensive analysis was conducted on a valine-to-methionine variant in the PKA-anchoring helix of AKAP18. Molecular modeling indicates that the additional density provided by methionine at position 282 in the AKAP18γ isoform deflects the pitch of the helical anchoring surface outward by 6.6°. Fluorescence polarization measurements show that this subtle topological change reduces RII-binding affinity 8.8-fold and impairs cAMP-responsive potentiation of L-type Ca²⁺ currents in situ. Live-cell imaging of AKAP18γ V282M-GFP adducts led to the unexpected discovery that loss of PKA anchoring promotes nuclear accumulation of this polymorphic variant. Targeting proceeds via a mechanism whereby association with the PKA holoenzyme masks a polybasic nuclear localization signal on the anchoring protein. This led to the discovery of AKAP18ε, an exclusively nuclear isoform that lacks a PKA-anchoring helix. Enzyme-mediated proximity proteomics revealed that compartment-selective variants of AKAP18 associate with distinct binding partners. Thus, naturally occurring PKA-anchoring-defective AKAP variants not only perturb the dissemination of local second-messenger responses but may also influence the intracellular distribution of certain AKAP18 isoforms.
41
Telenti A, Lippert C, Chang PC, DePristo M. Deep learning of genomic variation and regulatory network data. Hum Mol Genet 2018; 27:R63-R71. [PMID: 29648622 PMCID: PMC6499235 DOI: 10.1093/hmg/ddy115] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 01/09/2018] [Revised: 03/26/2018] [Accepted: 03/27/2018] [Indexed: 02/07/2023]
Abstract
The human genome is now investigated through high-throughput functional assays, and through the generation of population genomic data. These advances support the identification of functional genetic variants and the prediction of traits (e.g. deleterious variants and disease). This review summarizes lessons learned from the large-scale analyses of genome and exome data sets, modeling of population data and machine-learning strategies to solve complex genomic sequence regions. The review also portrays the rapid adoption of artificial intelligence/deep neural networks in genomics; in particular, deep learning approaches are well suited to model the complex dependencies in the regulatory landscape of the genome, and to provide predictors for genetic variant calling and interpretation.
Affiliation(s)
- Amalio Telenti
- Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, CA 92037, USA
42
Buljan M, Blattmann P, Aebersold R, Boutros M. Systematic characterization of pan-cancer mutation clusters. Mol Syst Biol 2018; 14:e7974. [PMID: 29572294 PMCID: PMC5866917 DOI: 10.15252/msb.20177974] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 12/21/2022]
Abstract
Cancer genome sequencing has shown that driver genes can often be distinguished not only by an elevated mutation frequency but also by specific nucleotide positions that accumulate changes at a high rate. However, the properties associated with a residue's potential to drive tumorigenesis when mutated have not yet been systematically investigated. Here, using a novel methodological approach, we identify and characterize a compendium of 180 hotspot residues within 160 human proteins that are mutated with significant frequency and are likely to have a functionally relevant impact. We find that such mutations (i) are more prominent in proteins that can exist in on and off states, (ii) reflect the identity of the tumor of origin, and (iii) often localize within interfaces that mediate interactions with other proteins or ligands. Next, we examine structural data for human protein complexes and identify a number of additional protein interfaces that accumulate cancer mutations at a high rate. Jointly, these analyses suggest that disruption and dysregulation of protein interactions can be instrumental in switching the functions of cancer proteins and activating downstream changes.
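The abstract does not describe how the novel hotspot-calling approach works. As a generic illustration of what "significant frequency" can mean for a single residue, the sketch below assumes that, under the null hypothesis, mutations fall uniformly along a protein; it scores each mutated residue with a binomial upper-tail probability and applies a Bonferroni correction over all residues. This textbook test is swapped in for illustration only; it is not the method of Buljan et al., and `binom_sf` and `find_hotspots` are hypothetical names.

```python
import math
from collections import Counter


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def find_hotspots(positions, protein_length: int, alpha: float = 0.05):
    """Flag residues whose mutation count is improbably high under a uniform model.

    positions: iterable of 1-based residue positions, one entry per observed mutation.
    Returns a sorted list of (position, count, Bonferroni-adjusted p-value).
    """
    counts = Counter(positions)
    n = sum(counts.values())            # total mutations in the protein
    p_uniform = 1.0 / protein_length    # null chance that a mutation hits a given residue
    hits = []
    for pos, k in counts.items():
        p_adj = min(1.0, binom_sf(k, n, p_uniform) * protein_length)
        if p_adj < alpha:
            hits.append((pos, k, p_adj))
    return sorted(hits)


# 20 mutations in a 100-residue protein: 8 at residue 12, the rest scattered.
hits = find_hotspots([12] * 8 + list(range(20, 32)), protein_length=100)
print(hits)  # only residue 12 (8 of 20 mutations) is flagged
```

A uniform background ignores mutational signatures, sequence context and gene length across tumor types, so a real pan-cancer analysis would need a richer null model than this sketch uses.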
Affiliation(s)
- Marija Buljan
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; Division Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Peter Blattmann
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; Faculty of Science, University of Zurich, Zurich, Switzerland
- Michael Boutros
- Division Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; Department Cell and Molecular Biology, Faculty of Medicine Mannheim, Heidelberg University, Heidelberg, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany