101
Seaby EG, Leggatt G, Cheng G, Thomas NS, Ashton JJ, Stafford I, Baralle D, Rehm HL, O'Donnell-Luria A, Ennis S. A gene pathogenicity tool 'GenePy' identifies missed biallelic diagnoses in the 100,000 Genomes Project. medRxiv 2023:2023.03.21.23287545. [PMID: 37034701 PMCID: PMC10081430 DOI: 10.1101/2023.03.21.23287545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]
Abstract
The 100,000 Genomes Project (100KGP) diagnosed a quarter of recruited affected participants, but 26% of diagnoses were in genes not on the chosen gene panel(s), with many being de novo variants of high impact. However, assessing biallelic variants without a gene panel is challenging, due to the number of variants requiring scrutiny. We sought to identify potential missed biallelic diagnoses independent of the gene panel applied using GenePy - a whole gene pathogenicity metric. GenePy scores all variants called in a given individual, incorporating allele frequency, zygosity, and a user-defined deleterious metric (CADD v1.6 applied herein). GenePy then combines all variant scores for individual genes, generating an aggregate score per gene, per participant. We calculated GenePy scores for 2862 recessive disease genes in 78,216 individuals in 100KGP. For each gene, we ranked participant GenePy scores for that gene, and scrutinised affected individuals without a diagnosis whose scores ranked amongst the top-5 for each gene. We assessed these participants' phenotypes for overlap with the disease gene associated phenotype for which they were highly ranked. Where phenotypes overlapped, we extracted rare variants in the gene of interest and applied phase, ClinVar and ACMG classification looking for putative causal biallelic variants. 3184 affected individuals without a molecular diagnosis had a top-5 ranked GenePy gene score and 682/3184 (21%) had phenotypes overlapping with one of the top-ranking genes. After removing 13 withdrawn participants, in 122/669 (18%) of the phenotype-matched cases, we identified a putative missed diagnosis in a top-ranked gene supported by phasing, ClinVar and ACMG classification. A further 334/669 (50%) of cases have a possible missed diagnosis but require functional validation.
Applying GenePy at scale has identified potential diagnoses for 456/3183 (14%) of undiagnosed participants who had a top-5 ranked GenePy score in a recessive disease gene, whilst adding only 1.2 additional variants (per individual) for assessment.
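The per-gene ranking step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the per-gene scores are taken as already computed, and the function names (`top_ranked`, `shortlist`), the tie-breaking rule, and the toy data are hypothetical rather than taken from the GenePy software.

```python
from collections import defaultdict


def top_ranked(scores, k=5):
    """Return {gene: [participant ids with the k highest scores]}.

    `scores` is an iterable of (gene, participant_id, score) tuples.
    Ties are broken by participant id so the result is deterministic
    (an assumption; the abstract does not say how ties are handled).
    """
    by_gene = defaultdict(list)
    for gene, pid, score in scores:
        by_gene[gene].append((score, pid))
    return {
        gene: [pid for _, pid in sorted(rows, key=lambda r: (-r[0], r[1]))[:k]]
        for gene, rows in by_gene.items()
    }


def shortlist(scores, undiagnosed, k=5):
    """Return {participant: [genes]} for undiagnosed participants in a top-k."""
    hits = defaultdict(list)
    for gene, pids in top_ranked(scores, k).items():
        for pid in pids:
            if pid in undiagnosed:
                hits[pid].append(gene)
    return {pid: sorted(genes) for pid, genes in hits.items()}


if __name__ == "__main__":
    # Hypothetical toy scores; real input would be one score per gene per participant.
    toy = [
        ("GENE_A", "P1", 9.0), ("GENE_A", "P2", 3.0), ("GENE_A", "P3", 7.5),
        ("GENE_B", "P1", 1.0), ("GENE_B", "P3", 4.2),
    ]
    print(shortlist(toy, undiagnosed={"P3"}, k=2))
```

The shortlisted participant-gene pairs would then go on to the phenotype-overlap, phasing, ClinVar and ACMG steps, which are manual or tool-specific and are not modelled here.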
Affiliation(s)
- Eleanor G Seaby
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, USA
  - Paediatric Infectious Diseases, Imperial College London, London, W2 1NY, UK
- Gary Leggatt
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
- Guo Cheng
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
- N Simon Thomas
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
  - Wessex Regional Genomics Laboratory, Salisbury NHS Foundation Trust, Salisbury, SP2 8BJ, UK
- James J Ashton
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
- Diana Baralle
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
- Heidi L Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, USA
- Sarah Ennis
  - Human Development and Health, Faculty of Medicine, University Hospital Southampton, Southampton, Hampshire, SO16 6YD, UK
102
García-Ruiz S, Zhang D, Gustavsson EK, Rocamora-Perez G, Grant-Peters M, Fairbrother-Browne A, Reynolds RH, Brenton JW, Gil-Martínez AL, Chen Z, Rio DC, Botia JA, Guelfi S, Collado-Torres L, Ryten M. Splicing accuracy varies across human introns, tissues and age. bioRxiv 2023:2023.03.29.534370. [PMID: 37034741 PMCID: PMC10081249 DOI: 10.1101/2023.03.29.534370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Alternative splicing impacts most multi-exonic human genes. Inaccuracies during this process may have an important role in ageing and disease. Here, we investigated mis-splicing using RNA-sequencing data from ~14K control samples and 42 human body sites, focusing on split reads partially mapping to known transcripts in annotation. We show that mis-splicing occurs at different rates across introns and tissues and that these splicing inaccuracies are primarily affected by the abundance of core components of the spliceosome assembly and its regulators. Using publicly available data on short-hairpin RNA-knockdowns of numerous spliceosomal components and related regulators, we found support for the importance of RNA-binding proteins in mis-splicing. We also demonstrated that age is positively correlated with mis-splicing, and it affects genes implicated in neurodegenerative diseases. This in-depth characterisation of mis-splicing can have important implications for our understanding of the role of splicing inaccuracies in human disease and the interpretation of long-read RNA-sequencing data.
Affiliation(s)
- S García-Ruiz
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- D Zhang
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- E K Gustavsson
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- G Rocamora-Perez
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
- M Grant-Peters
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- A Fairbrother-Browne
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
  - Department of Medical and Molecular Genetics, School of Basic and Medical Biosciences, King's College London, London, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
- R H Reynolds
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- J W Brenton
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
- A L Gil-Martínez
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
- Z Chen
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London, UK
- D C Rio
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
  - California Institute for Quantitative Biosciences, University of California, Berkeley, CA 94720, USA
- J A Botia
  - Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- S Guelfi
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - Verge Genomics, South San Francisco, CA, 94080, USA
- L Collado-Torres
  - Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- M Ryten
  - Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, UK
  - NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
  - Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815
103
Lyulcheva-Bennett E, Genomics England Research Consortium, Bennett D. A retrospective analysis of phosphatase catalytic subunit gene variants in patients with rare disorders identifies novel candidate neurodevelopmental disease genes. Front Cell Dev Biol 2023; 11:1107930. [PMID: 37056996 PMCID: PMC10086149 DOI: 10.3389/fcell.2023.1107930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Accepted: 03/03/2023] [Indexed: 03/30/2023]
Abstract
Rare genetic disorders represent some of the most severe and life-limiting conditions that constitute a considerable burden on global healthcare systems and societies. Most individuals affected by rare disorders remain undiagnosed, highlighting the unmet need for improved disease gene discovery and novel variant interpretation. Aberrant (de)phosphorylation can have profound pathological consequences underpinning many disease processes. Numerous phosphatases and associated proteins have been identified as disease genes, with many more likely to have gone undiscovered thus far. To begin to address these issues, we have performed a systematic survey of de novo variants amongst 189 genes encoding phosphatase catalytic subunits found in rare disease patients recruited to the 100,000 Genomes Project (100 kGP), the largest national sequencing project of its kind in the United Kingdom. We found that 49% of phosphatases carried de novo mutation(s) in this cohort. Only 25% of these phosphatases have been previously linked to genetic disorders. A gene-to-patient approach matching variants to phenotypic data identified 9 novel candidate rare-disease genes: PTPRD, PTPRG, PTPRT, PTPRU, PTPRZ1, MTMR3, GAK, TPTE2, PTPN18. As the number of patients undergoing whole genome sequencing increases and information sharing improves, we anticipate that reiterative analysis of genomic and phenotypic data will continue to identify candidate phosphatase disease genes for functional validation. This is the first step towards delineating the aetiology of rare genetic disorders associated with altered phosphatase function, leading to new biological insights and improved clinical outcomes for the affected individuals and their families.
Affiliation(s)
- Daimark Bennett
  - Division of Developmental Biology and Medicine, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
104
Varabyou A, Erdogdu B, Salzberg SL, Pertea M. Investigating Open Reading Frames in Known and Novel Transcripts using ORFanage. bioRxiv 2023:2023.03.23.533704. [PMID: 36993373 PMCID: PMC10055401 DOI: 10.1101/2023.03.23.533704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
Abstract
ORFanage is a system designed to assign open reading frames (ORFs) to both known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing (RNA-seq) experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the RefSeq and GENCODE human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
Affiliation(s)
- Ales Varabyou
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
- Beril Erdogdu
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Steven L Salzberg
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
- Mihaela Pertea
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
105
Amaral P, Carbonell-Sala S, De La Vega FM, Faial T, Frankish A, Gingeras T, Guigo R, Harrow JL, Hatzigeorgiou AG, Johnson R, Murphy TD, Pertea M, Pruitt KD, Pujar S, Takahashi H, Ulitsky I, Varabyou A, Wells CA, Yandell M, Carninci P, Salzberg SL. The status of the human gene catalogue. arXiv 2023:arXiv:2303.13996v1. [PMID: 36994150 PMCID: PMC10055485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
Abstract
Scientists have been trying to identify all of the genes in the human genome since the initial draft of the genome was published in 2001. Over the intervening years, much progress has been made in identifying protein-coding genes, and the estimated number has shrunk to fewer than 20,000, although the number of distinct protein-coding isoforms has expanded dramatically. The invention of high-throughput RNA sequencing and other technological breakthroughs have led to an explosion in the number of reported non-coding RNA genes, although most of them do not yet have any known function. A combination of recent advances offers a path forward to identifying these functions and towards eventually completing the human gene catalogue. However, much work remains to be done before we have a universal annotation standard that includes all medically significant genes, maintains their relationships with different reference genomes, and describes clinically relevant genetic variants.
Affiliation(s)
- Paulo Amaral
  - INSPER Institute of Education and Research, São Paulo, SP, Brasil
- Silvia Carbonell-Sala
  - Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
- Francisco M. De La Vega
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA; Tempus Labs, Inc., Chicago, IL
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thomas Gingeras
  - Department of Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
- Roderic Guigo
  - Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Catalonia, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain
- Jennifer L Harrow
  - Centre for Genomics Research, Discovery Sciences, AstraZeneca, Da Vinci Building. Melbourn Science Park, Royston UK SG8 6HB
- Artemis G. Hatzigeorgiou
  - University of Thessaly, Department of Computer Science and Biomedical Informatics, Lamia, Greece; Hellenic Pasteur Institute, Athens, Greece
- Rory Johnson
  - School of Biology and Environmental Science, University College Dublin, D04 V1W8 Dublin, Ireland; Conway Institute of Biomedical and Biomolecular Research, University College Dublin, D04 V1W8 Dublin, Ireland; Department of Medical Oncology, Inselspital, Bern University Hospital, University of Bern, 3010 Bern, Switzerland; Department for BioMedical Research, University of Bern, 3008 Bern, Switzerland
- Terence D. Murphy
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Mihaela Pertea
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Kim D. Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Shashikant Pujar
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Hazuki Takahashi
  - Laboratory for Transcriptome Technology, RIKEN Center for Integrative Medical Sciences, Yokohama Kanagawa 230-0045 Japan
- Igor Ulitsky
  - Department of Immunology and Regenerative Biology; Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 76100, Israel
- Ales Varabyou
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Christine A. Wells
  - Stem Cell Systems, Department of Anatomy and Physiology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville 3010 Vic Australia
- Mark Yandell
  - Department of Human Genetics, Utah Center for Genetic Discovery, University of Utah, Salt Lake City, UT, USA
- Piero Carninci
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Human Technopole, via Rita Levi Montalcini 1, Milan 20157 Italy
- Steven L. Salzberg
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Immunology and Regenerative Biology; Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 76100, Israel
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
106
Greene D, Pirri D, Frudd K, Sackey E, Al-Owain M, Giese APJ, Ramzan K, Riaz S, Yamanaka I, Boeckx N, Thys C, Gelb BD, Brennan P, Hartill V, Harvengt J, Kosho T, Mansour S, Masuno M, Ohata T, Stewart H, Taibah K, Turner CLS, Imtiaz F, Riazuddin S, Morisaki T, Ostergaard P, Loeys BL, Morisaki H, Ahmed ZM, Birdsey GM, Freson K, Mumford A, Turro E. Genetic association analysis of 77,539 genomes reveals rare disease etiologies. Nat Med 2023; 29:679-688. [PMID: 36928819 PMCID: PMC10033407 DOI: 10.1038/s41591-023-02211-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/21/2022] [Accepted: 01/06/2023] [Indexed: 03/18/2023]
Abstract
The genetic etiologies of more than half of rare diseases remain unknown. Standardized genome sequencing and phenotyping of large patient cohorts provide an opportunity for discovering the unknown etiologies, but this depends on efficient and powerful analytical methods. We built a compact database, the 'Rareservoir', containing the rare variant genotypes and phenotypes of 77,539 participants sequenced by the 100,000 Genomes Project. We then used the Bayesian genetic association method BeviMed to infer associations between genes and each of 269 rare disease classes assigned by clinicians to the participants. We identified 241 known and 19 previously unidentified associations. We validated associations with ERG, PMEPA1 and GPR156 by searching for pedigrees in other cohorts and using bioinformatic and experimental approaches. We provide evidence that (1) loss-of-function variants in the Erythroblast Transformation Specific (ETS)-family transcription factor encoding gene ERG lead to primary lymphoedema, (2) truncating variants in the last exon of transforming growth factor-β regulator PMEPA1 result in Loeys-Dietz syndrome and (3) loss-of-function variants in GPR156 give rise to recessive congenital hearing impairment. The Rareservoir provides a lightweight, flexible and portable system for synthesizing the genetic and phenotypic data required to study rare disease cohorts with tens of thousands of participants.
Affiliation(s)
- Daniel Greene
  - Department of Medicine, University of Cambridge, Cambridge, UK
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniela Pirri
  - National Heart and Lung Institute, Imperial College London, London, UK
- Karen Frudd
  - National Heart and Lung Institute, Imperial College London, London, UK
  - University College London Institute of Ophthalmology, University College London, London, UK
- Ege Sackey
  - Molecular and Clinical Sciences Institute, St. George's University of London, London, UK
- Mohammed Al-Owain
  - Department of Medical Genomics, Centre for Genomic Medicine, King Faisal Specialist Hospital & Research Centre, Riyadh, Saudi Arabia
- Arnaud P J Giese
  - Department of Otorhinolaryngology Head and Neck Surgery, School of Medicine, University of Maryland, Baltimore, MD, USA
- Khushnooda Ramzan
  - Department of Clinical Genomics, Centre for Genomic Medicine, King Faisal Specialist Hospital & Research Centre, Riyadh, Saudi Arabia
- Sehar Riaz
  - Department of Otorhinolaryngology Head and Neck Surgery, School of Medicine, University of Maryland, Baltimore, MD, USA
  - Department of Biochemistry and Molecular Biology, School of Medicine, University of Maryland, Baltimore, MD, USA
- Itaru Yamanaka
  - Department of Bioscience and Genetics, National Cerebral and Cardiovascular Center, Osaka, Japan
- Nele Boeckx
  - Center for Medical Genetics, Antwerp University Hospital/University of Antwerp, Antwerp, Belgium
- Chantal Thys
  - Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, KU Leuven, Leuven, Belgium
- Bruce D Gelb
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Paul Brennan
  - Northern Genetics Service, Newcastle upon Tyne Hospitals National Health Service Trust International Centre for Life, Newcastle upon Tyne, UK
- Verity Hartill
  - Department of Clinical Genetics, Chapel Allerton Hospital, Leeds Teaching Hospitals National Health Service Trust, Leeds, UK
  - Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Julie Harvengt
  - Centre for Medical Genetics, Centre Hospitalier Universitaire de Liège, Liège, Belgium
- Tomoki Kosho
  - Department of Medical Genetics, Shinshu University School of Medicine, Nagano, Japan
  - Center for Medical Genetics, Shinshu University Hospital, Nagano, Japan
- Sahar Mansour
  - Molecular and Clinical Sciences Institute, St. George's University of London, London, UK
  - South West Thames Regional Genetics Service, St. George's University Hospitals National Health Service Foundation Trust, London, UK
- Mitsuo Masuno
  - Department of Medical Genetics, Kawasaki Medical School Hospital, Okayama, Japan
- Helen Stewart
  - Oxford University Hospitals National Health Service Foundation Trust, Oxford, UK
- Khalid Taibah
  - Ear Nose and Throat Medical Centre, Riyadh, Saudi Arabia
- Claire L S Turner
  - Peninsula Clinical Genetics Service, Royal Devon & Exeter Hospital, Exeter, UK
- Faiqa Imtiaz
  - Department of Clinical Genomics, Centre for Genomic Medicine, King Faisal Specialist Hospital & Research Centre, Riyadh, Saudi Arabia
- Saima Riazuddin
  - Department of Otorhinolaryngology Head and Neck Surgery, School of Medicine, University of Maryland, Baltimore, MD, USA
  - Department of Biochemistry and Molecular Biology, School of Medicine, University of Maryland, Baltimore, MD, USA
- Takayuki Morisaki
  - Department of Bioscience and Genetics, National Cerebral and Cardiovascular Center, Osaka, Japan
  - Division of Molecular Pathology and Department of Internal Medicine, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Pia Ostergaard
  - Molecular and Clinical Sciences Institute, St. George's University of London, London, UK
- Bart L Loeys
  - Center for Medical Genetics, Antwerp University Hospital/University of Antwerp, Antwerp, Belgium
  - Department of Human Genetics, Radboud University Medical Center, Nijmegen, the Netherlands
- Hiroko Morisaki
  - Department of Bioscience and Genetics, National Cerebral and Cardiovascular Center, Osaka, Japan
  - Department of Medical Genetics, Sakakibara Heart Institute, Tokyo, Japan
- Zubair M Ahmed
  - Department of Otorhinolaryngology Head and Neck Surgery, School of Medicine, University of Maryland, Baltimore, MD, USA
  - Department of Biochemistry and Molecular Biology, School of Medicine, University of Maryland, Baltimore, MD, USA
- Graeme M Birdsey
  - National Heart and Lung Institute, Imperial College London, London, UK
- Kathleen Freson
  - Department of Cardiovascular Sciences, Center for Molecular and Vascular Biology, KU Leuven, Leuven, Belgium
- Andrew Mumford
  - School of Cellular and Molecular Medicine, University of Bristol, Bristol, UK
  - South West National Health Service Genomic Medicine Service Alliance, Bristol, UK
- Ernest Turro
  - Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
107
Seaby EG, Thomas NS, Webb A, Brittain H, Taylor Tavares AL, Baralle D, Rehm HL, O'Donnell-Luria A, Ennis S. Targeting de novo loss-of-function variants in constrained disease genes improves diagnostic rates in the 100,000 Genomes Project. Hum Genet 2023; 142:351-362. [PMID: 36477409 PMCID: PMC9950176 DOI: 10.1007/s00439-022-02509-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/03/2022] [Accepted: 11/28/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Genome sequencing was first offered clinically in the UK through the 100,000 Genomes Project (100KGP). Analysis was restricted to predefined gene panels associated with the patient's phenotype. However, panels rely on clearly characterised phenotypes and risk missing diagnoses outside of the panel(s) applied. We propose a complementary method to rapidly identify pathogenic variants, including those missed by 100KGP methods. METHODS: The Loss-of-function Observed/Expected Upper-bound Fraction (LOEUF) score quantifies gene constraint, with low scores correlated with haploinsufficiency. We applied DeNovoLOEUF, a filtering strategy, to sequencing data from 13,949 rare disease trios in the 100KGP, filtering for rare, de novo, loss-of-function variants in disease genes with a LOEUF score < 0.2. We compared our findings with the corresponding patient's diagnostic reports. RESULTS: 324/332 (98%) of the variants identified using DeNovoLOEUF were diagnostic or partially diagnostic (whereby the variant was responsible for some of the phenotype). We identified 39 diagnoses that were "missed" by 100KGP standard analyses, which are now being returned to patients. CONCLUSION: We have demonstrated a highly specific and rapid method with a 98% positive predictive value that has good concordance with standard analysis, low false-positive rate, and can identify additional diagnoses. Globally, as more patients are being offered genome sequencing, we anticipate that DeNovoLOEUF will rapidly identify new diagnoses and facilitate iterative analyses when new disease genes are discovered.
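The filtering criteria named in this abstract (rare, de novo, loss-of-function, disease gene, LOEUF < 0.2) can be sketched as a simple predicate over variant records. This is a minimal illustration, not the authors' pipeline: the `Variant` record, the function names, and the allele-frequency cutoff for "rare" are assumptions, since the abstract does not state them; only the LOEUF threshold of 0.2 comes from the text.

```python
from dataclasses import dataclass

LOEUF_CUTOFF = 0.2    # threshold stated in the abstract
ASSUMED_MAX_AF = 1e-4  # hypothetical definition of "rare"; not given in the abstract


@dataclass(frozen=True)
class Variant:
    """Hypothetical, already-annotated variant record from a trio."""
    gene: str
    is_de_novo: bool
    is_loss_of_function: bool
    allele_frequency: float


def denovo_loeuf_filter(variants, loeuf_by_gene, disease_genes, max_af=ASSUMED_MAX_AF):
    """Keep rare de novo loss-of-function variants in constrained disease genes."""
    return [
        v for v in variants
        if v.is_de_novo
        and v.is_loss_of_function
        and v.allele_frequency <= max_af
        and v.gene in disease_genes
        # Genes without a LOEUF score are treated as unconstrained.
        and loeuf_by_gene.get(v.gene, float("inf")) < LOEUF_CUTOFF
    ]


def positive_predictive_value(true_positives, total_called):
    """Fraction of flagged variants that were (partially) diagnostic."""
    return true_positives / total_called


if __name__ == "__main__":
    # The reported 324/332 works out to 97.6%, quoted as 98% in the abstract.
    print(f"{positive_predictive_value(324, 332):.1%}")
```

Because every condition must hold, the filter is deliberately strict, which is consistent with the abstract's emphasis on specificity over sensitivity.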
Affiliation(s)
- Eleanor G Seaby
  - Genomic Informatics Group, Human Development and Health, Faculty of Medicine, University Hospital Southampton, MP 808, Duthie Building, Southampton, SO16 6YD, Hampshire, UK
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, 02115, USA
  - Paediatric Infectious Diseases, Imperial College London, London, W2 1NY, UK
- N Simon Thomas
  - Genomic Informatics Group, Human Development and Health, Faculty of Medicine, University Hospital Southampton, MP 808, Duthie Building, Southampton, SO16 6YD, Hampshire, UK
  - Wessex Regional Genomics Laboratory, Salisbury NHS Foundation Trust, Salisbury, SP2 8BJ, UK
- Amy Webb
  - Wessex Regional Genomics Laboratory, Salisbury NHS Foundation Trust, Salisbury, SP2 8BJ, UK
- Helen Brittain
  - Genomics England, Charterhouse Square, London, EC1M 6BQ, UK
- Ana Lisa Taylor Tavares
  - Genomics England, Charterhouse Square, London, EC1M 6BQ, UK
  - East Anglian Medical Genetics Service, Cambridge University Hospital, Hills Road, Cambridge, CB2 0QQ, UK
- Diana Baralle
  - Genomic Informatics Group, Human Development and Health, Faculty of Medicine, University Hospital Southampton, MP 808, Duthie Building, Southampton, SO16 6YD, Hampshire, UK
- Heidi L Rehm
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA
- Anne O'Donnell-Luria
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, 02115, USA
- Sarah Ennis
  - Genomic Informatics Group, Human Development and Health, Faculty of Medicine, University Hospital Southampton, MP 808, Duthie Building, Southampton, SO16 6YD, Hampshire, UK
| |
|
108
|
Walker LC, de la Hoya M, Wiggins GA, Lindy A, Vincent LM, Parsons M, Canson DM, Bis-Brewer D, Cass A, Tchourbanov A, Zimmermann H, Byrne AB, Pesaran T, Karam R, Harrison SM, Spurdle AB. Application of the ACMG/AMP framework to capture evidence relevant to predicted and observed impact on splicing: recommendations from the ClinGen SVI Splicing Subgroup. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.02.24.23286431. [PMID: 36865205 PMCID: PMC9980257 DOI: 10.1101/2023.02.24.23286431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
Abstract
The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) framework for classifying variants uses six evidence categories related to the splicing potential of variants: PVS1 (null variant in a gene where loss-of-function is the mechanism of disease), PS3 (functional assays show damaging effect on splicing), PP3 (computational evidence supports a splicing effect), BS3 (functional assays show no damaging effect on splicing), BP4 (computational evidence suggests no splicing impact), and BP7 (silent change with no predicted impact on splicing). However, the lack of guidance on how to apply such codes has contributed to variation in the specifications developed by different Clinical Genome Resource (ClinGen) Variant Curation Expert Panels. The ClinGen Sequence Variant Interpretation (SVI) Splicing Subgroup was established to refine recommendations for applying ACMG/AMP codes relating to splicing data and computational predictions. Our study utilised empirically derived splicing evidence to: 1) determine the evidence weighting of splicing-related data and appropriate criteria code selection for general use, 2) outline a process for integrating splicing-related considerations when developing a gene-specific PVS1 decision tree, and 3) exemplify methodology to calibrate bioinformatic splice prediction tools. We propose repurposing of the PVS1_Strength code to capture splicing assay data that provide experimental evidence for variants resulting in RNA transcript(s) with loss of function. Conversely BP7 may be used to capture RNA results demonstrating no impact on splicing for both intronic and synonymous variants, and for missense variants if protein functional impact has been excluded. Furthermore, we propose that the PS3 and BS3 codes are applied only for well-established assays that measure functional impact that is not directly captured by RNA splicing assays. 
We recommend the application of PS1 based on similarity of predicted RNA splicing effects for a variant under assessment in comparison to a known Pathogenic variant. The recommendations and approaches for consideration and evaluation of RNA assay evidence described aim to help standardise variant pathogenicity classification processes and result in greater consistency when interpreting splicing-based evidence.
|
109
|
Kovačević M, Janković M, Branković M, Milićević O, Novaković I, Sokić D, Ristić A, Shamsani J, Vojvodić N. Novel GATOR1 variants in focal epilepsy. Epilepsy Behav 2023; 141:109139. [PMID: 36848747 DOI: 10.1016/j.yebeh.2023.109139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 02/03/2023] [Accepted: 02/05/2023] [Indexed: 02/27/2023]
Abstract
INTRODUCTION Variants in GATOR1 genes are well established in focal epilepsy syndromes. A strong association of GATOR1 variants with drug-resistant epilepsy as well as an increased risk of sudden unexplained death in epilepsy warrants developing strategies to facilitate the identification of patients who could potentially benefit from genetic testing and precision medicine. We aimed to determine the yield of GATOR1 gene sequencing in patients with focal epilepsy typically referred for genetic testing, establish novel GATOR1 variants and determine clinical, electroencephalographic, and radiological characteristics of variant carriers. PATIENTS AND METHODS Ninety-six patients with clinical suspicion of genetic focal epilepsy with previous comprehensive diagnostic epilepsy evaluation in The Neurology Clinic, University Clinical Center of Serbia, were included in the study. Sequencing was performed using a custom gene panel encompassing DEPDC5, NPRL2, and NPRL3. Variants of interest (VOI) were classified according to criteria proposed by the American College of Medical Genetics and the Association for Molecular Pathology. RESULTS Four previously unreported VOI in 4/96 (4.2%) patients were found in our cohort. Three likely pathogenic variants were determined in 3/96 (3.1%) patients, one frameshift variant in DEPDC5 in a patient with nonlesional frontal lobe epilepsy, one splicogenic DEPDC5 variant in a patient with nonlesional posterior quadrant epilepsy, and one frameshift variant in NPRL2 in a patient with temporal lobe epilepsy associated with hippocampal sclerosis. Only one VOI, a missense variant in NPRL3, found in 1/96 (1.1%) patients, was classified as a variant of unknown significance. CONCLUSION GATOR1 gene sequencing was diagnostic in 3.1% of our cohort and revealed three novel likely pathogenic variants, including a previously unreported association of temporal lobe epilepsy with hippocampal sclerosis with an NPRL2 variant. 
Further research is essential for a better understanding of the clinical scope of GATOR1 gene-associated epilepsy.
Affiliation(s)
- Maša Kovačević
- Neurology Clinic, University Clinical Center of Serbia, Serbia; Faculty of Medicine, University of Belgrade, Serbia.
| | - Milena Janković
- Neurology Clinic, University Clinical Center of Serbia, Serbia
| | | | | | | | - Dragoslav Sokić
- Neurology Clinic, University Clinical Center of Serbia, Serbia; Faculty of Medicine, University of Belgrade, Serbia
| | - Aleksandar Ristić
- Neurology Clinic, University Clinical Center of Serbia, Serbia; Faculty of Medicine, University of Belgrade, Serbia
| | | | - Nikola Vojvodić
- Neurology Clinic, University Clinical Center of Serbia, Serbia; Faculty of Medicine, University of Belgrade, Serbia
| |
|
110
|
SpliceAI-visual: a free online tool to improve SpliceAI splicing variant interpretation. Hum Genomics 2023; 17:7. [PMID: 36765386 PMCID: PMC9912651 DOI: 10.1186/s40246-023-00451-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 01/18/2023] [Indexed: 02/12/2023] Open
Abstract
SpliceAI is an open-source deep learning splicing prediction algorithm that has demonstrated in the past few years its high ability to predict splicing defects caused by DNA variations. However, its outputs present several drawbacks: (1) although the numerical values are very convenient for batch filtering, their precise interpretation can be difficult, (2) the outputs are delta scores which can sometimes mask a severe consequence, and (3) complex delins are most often not handled. We present here SpliceAI-visual, a free online tool based on the SpliceAI algorithm, and show how it complements the traditional SpliceAI analysis. First, SpliceAI-visual manipulates raw scores and not delta scores, as the latter can be misleading in certain circumstances. Second, the outcome of SpliceAI-visual is user-friendly thanks to the graphical presentation. Third, SpliceAI-visual is currently one of the only SpliceAI-derived implementations able to annotate complex variants (e.g., complex delins). We report here the benefits of using SpliceAI-visual and demonstrate its relevance in the assessment/modulation of the PVS1 classification criteria. We also show how SpliceAI-visual can elucidate several complex splicing defects taken from the literature but also from unpublished cases. SpliceAI-visual is available as a Google Colab notebook and has also been fully integrated in a free online variant interpretation tool, MobiDetails ( https://mobidetails.iurc.montp.inserm.fr/MD ).
|
111
|
Wright CF, FitzPatrick DR, Ware JS, Rehm HL, Firth HV. Importance of adopting standardized MANE transcripts in clinical reporting. Genet Med 2023; 25:100331. [PMID: 36441169 DOI: 10.1016/j.gim.2022.10.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/04/2022] [Revised: 10/24/2022] [Accepted: 10/25/2022] [Indexed: 11/29/2022] Open
Affiliation(s)
- Caroline F Wright
- Institute of Biomedical and Clinical Science, University of Exeter Medical School, Royal Devon and Exeter Hospital, Exeter, United Kingdom
| | - David R FitzPatrick
- MRC Human Genetics Unit, Institute of Genetics and Cancer, The University of Edinburgh, Edinburgh, United Kingdom
| | - James S Ware
- National Heart and Lung Institute and MRC London Institute of Medical Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Heidi L Rehm
- Center for Genomic Medicine, Massachusetts General Hospital and Broad Institute of MIT and Harvard, Boston, MA.
| | - Helen V Firth
- East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom; Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom.
| |
|
112
|
aRgus: Multilevel visualization of non-synonymous single nucleotide variants & advanced pathogenicity score modeling for genetic vulnerability assessment. Comput Struct Biotechnol J 2023; 21:1077-1083. [PMID: 36789265 PMCID: PMC9900257 DOI: 10.1016/j.csbj.2023.01.027] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/14/2022] [Revised: 01/18/2023] [Accepted: 01/18/2023] [Indexed: 01/26/2023] Open
Abstract
The widespread use of high-throughput sequencing techniques is leading to a rapidly increasing number of disease-associated variants of unknown significance and candidate genes. Integration of knowledge concerning their genetic, protein as well as functional and conservational aspects is necessary for an exhaustive assessment of their relevance and for prioritization of further clinical and functional studies investigating their role in human disease. To collect the necessary information, a multitude of different databases has to be accessed, and data extraction from the original sources is commonly not user-friendly and requires advanced bioinformatics skills. This leads to decreased data accessibility for a relevant number of potential users such as clinicians, geneticists, and clinical researchers. Here, we present aRgus (https://argus.urz.uni-heidelberg.de/), a standalone webtool for simple extraction and intuitive visualization of multi-layered gene, protein, variant, and variant effect prediction data. aRgus provides interactive exploitation of these data within seconds for any known gene of the human genome. In contrast to existing online platforms for compilation of variant data, aRgus complements visualization of chromosomal exon-intron structure and protein domain annotation with ClinVar and gnomAD variant distributions as well as position-specific variant effect prediction score modeling. aRgus thereby enables timely assessment of protein regions vulnerable to variation with single amino acid resolution and provides numerous applications in variant and protein domain interpretation as well as in the design of in vitro experiments.
|
113
|
Dvorak P, Hanicinec V, Soucek P. The position of the longest intron is related to biological functions in some human genes. Front Genet 2023; 13:1085139. [PMID: 36712854 PMCID: PMC9875286 DOI: 10.3389/fgene.2022.1085139] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/31/2022] [Accepted: 12/27/2022] [Indexed: 01/12/2023] Open
Abstract
The evidence that introns can influence different levels of transfer of genetic information between DNA and the final product is increasing. Longer first introns were found to be a general property of eukaryotic gene structure and shown to contain a higher fraction of conserved sequence and different functional elements. Our work brings more precise information about the position of the longest introns in human protein-coding genes and possible connection with biological function and gene expression. According to our results, the position of the longest intron can be localized to the first third of introns in 64%, the second third in 19%, and the final third in 17%, with notable peaks at the middle and last introns of approximately 5% and 6%, respectively. The median lengths of the longest introns decrease with increasing distance from the start of the gene from approximately 15,000 to 5,000 bp. We have shown that the position of the longest intron is in some cases linked to the biological function of the given gene. For example, DNA repair genes have the longest intron more often in the second or third. In the distribution of gene expression according to the position of the longest intron, tissue-specific profiles can be traced with the highest expression usually at the absolute positions of intron 1 and 2. In this work, we present arguments supporting the hypothesis that the position of the longest intron in a gene is another biological factor modulating the transmission of genetic information.
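The positional classification described in this abstract (which third of a gene's introns contains the longest one) can be illustrated with a short sketch. The function name and the binning rule are assumptions made for the example; the abstract does not state how genes whose intron count is not divisible by three were binned, nor how ties in length were resolved.

```python
def longest_intron_third(intron_lengths):
    """Return (1-based index of the longest intron, third it falls in: 1, 2 or 3).

    Assumptions: introns are listed 5' to 3'; ties go to the first longest
    intron; thirds are assigned by integer binning of the 0-based index.
    """
    if not intron_lengths:
        raise ValueError("gene has no introns")
    n = len(intron_lengths)
    # max() with a key returns the first index among equal maxima.
    idx = max(range(n), key=lambda i: intron_lengths[i])
    third = (idx * 3) // n + 1  # idx < n, so the result is always 1, 2 or 3
    return idx + 1, third
```

For a hypothetical gene with intron lengths `[15000, 800, 1200, 500, 900, 700]`, the longest intron is intron 1 and falls in the first third, the most common case according to the abstract.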
Affiliation(s)
- Pavel Dvorak
- Department of Biology, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czechia; Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czechia; Institute of Medical Genetics, University Hospital Pilsen, Pilsen, Czechia. *Correspondence: Pavel Dvorak
| | - Vojtech Hanicinec
- Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czechia
| | - Pavel Soucek
- Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czechia; Toxicogenomics Unit, National Institute of Public Health, Prague, Czechia
| |
|
114
|
Sayers EW, Bolton EE, Brister J, Canese K, Chan J, Comeau D, Farrell C, Feldgarden M, Fine AM, Funk K, Hatcher E, Kannan S, Kelly C, Kim S, Klimke W, Landrum M, Lathrop S, Lu Z, Madden T, Malheiro A, Marchler-Bauer A, Murphy T, Phan L, Pujar S, Rangwala S, Schneider V, Tse T, Wang J, Ye J, Trawick B, Pruitt K, Sherry S. Database resources of the National Center for Biotechnology Information in 2023. Nucleic Acids Res 2023; 51:D29-D38. [PMID: 36370100 PMCID: PMC9825438 DOI: 10.1093/nar/gkac1032] [Citation(s) in RCA: 85] [Impact Index Per Article: 85.0] [Received: 09/17/2022] [Revised: 10/11/2022] [Accepted: 11/09/2022] [Indexed: 11/15/2022] Open
Abstract
The National Center for Biotechnology Information (NCBI) provides online information resources for biology, including the GenBank® nucleic acid sequence database and the PubMed® database of citations and abstracts published in life science journals. NCBI provides search and retrieval operations for most of these data from 35 distinct databases. The E-utilities serve as the programming interface for most of these databases. New resources include the Comparative Genome Resource (CGR) and the BLAST ClusteredNR database. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, IgBLAST, GDV, RefSeq, NCBI Virus, GenBank type assemblies, iCn3D, ClinVar, GTR, dbGaP, ALFA, ClinicalTrials.gov, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
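As an illustration of the E-utilities programming interface mentioned in this abstract, the sketch below builds an ESearch request URL. The helper name `build_esearch_url` is hypothetical; the `esearch.fcgi` endpoint and the `db`, `term`, `retmax`, and `retmode` parameters are those of the public E-utilities service. No request is sent here, and real use should follow NCBI's usage guidelines.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Build an ESearch URL for the given database and query term.

    Only constructs the URL; fetching it is left to the caller.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes characters such as '[' and ']' in the term.
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example: look up this article's own PubMed record by PMID.
url = build_esearch_url("pubmed", "36370100[pmid]", retmax=1)
```

The resulting URL can be passed to any HTTP client; the response format depends on `retmode`.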
Affiliation(s)
- Eric W Sayers
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - J Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Kathi Canese
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Jessica Chan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Donald C Comeau
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Catherine M Farrell
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Michael Feldgarden
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Anna M Fine
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Kathryn Funk
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Eneida Hatcher
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Sivakumar Kannan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Christopher Kelly
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - William Klimke
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Melissa J Landrum
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Stacy Lathrop
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Thomas L Madden
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Adriana Malheiro
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Lon Phan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Shashikant Pujar
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Sanjida H Rangwala
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Tony Tse
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Jiyao Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Jian Ye
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Barton W Trawick
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Stephen T Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
| |
|
115
|
Burley SK, Bhikadiya C, Bi C, Bittrich S, Chao H, Chen L, Craig PA, Crichlow GV, Dalenberg K, Duarte JM, Dutta S, Fayazi M, Feng Z, Flatt JW, Ganesan S, Ghosh S, Goodsell DS, Green RK, Guranovic V, Henry J, Hudson BP, Khokhriakov I, Lawson CL, Liang Y, Lowe R, Peisach E, Persikova I, Piehl DW, Rose Y, Sali A, Segura J, Sekharan M, Shao C, Vallat B, Voigt M, Webb B, Westbrook JD, Whetstone S, Young JY, Zalevsky A, Zardecki C. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res 2023; 51:D488-D508. [PMID: 36420884 PMCID: PMC9825554 DOI: 10.1093/nar/gkac1077] [Citation(s) in RCA: 141] [Impact Index Per Article: 141.0] [Received: 09/30/2022] [Revised: 10/17/2022] [Accepted: 11/02/2022] [Indexed: 11/27/2022] Open
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), founding member of the Worldwide Protein Data Bank (wwPDB), is the US data center for the open-access PDB archive. As wwPDB-designated Archive Keeper, RCSB PDB is also responsible for PDB data security. Annually, RCSB PDB serves >10 000 depositors of three-dimensional (3D) biostructures working on all permanently inhabited continents. RCSB PDB delivers data from its research-focused RCSB.org web portal to many millions of PDB data consumers based in virtually every United Nations-recognized country, territory, etc. This Database Issue contribution describes upgrades to the research-focused RCSB.org web portal that created a one-stop-shop for open access to ∼200 000 experimentally-determined PDB structures of biological macromolecules alongside >1 000 000 incorporated Computed Structure Models (CSMs) predicted using artificial intelligence/machine learning methods. RCSB.org is a 'living data resource.' Every PDB structure and CSM is integrated weekly with related functional annotations from external biodata resources, providing up-to-date information for the entire corpus of 3D biostructure data freely available from RCSB.org with no usage limitations. Within RCSB.org, PDB structures and the CSMs are clearly identified as to their provenance and reliability. Both are fully searchable, and can be analyzed and visualized using the full complement of RCSB.org web portal capabilities.
Affiliation(s)
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Charmi Bhikadiya
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Chunxiao Bi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Sebastian Bittrich
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Henry Chao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Li Chen
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Paul A Craig
- School of Chemistry and Materials Science, Rochester Institute of Technology, Rochester, NY 14623, USA
| | - Gregg V Crichlow
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Kenneth Dalenberg
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Shuchismita Dutta
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
| | - Maryam Fayazi
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Zukang Feng
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Justin W Flatt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Sai Ganesan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - Sutapa Ghosh
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - David S Goodsell
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Rachel Kramer Green
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Vladimir Guranovic
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jeremy Henry
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Brian P Hudson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Igor Khokhriakov
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Catherine L Lawson
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yuhe Liang
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Robert Lowe
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Ezra Peisach
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Irina Persikova
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Dennis W Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Yana Rose
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Andrej Sali
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - Joan Segura
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
| | - Monica Sekharan
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Chenghua Shao
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Maria Voigt
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Ben Webb
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ 08901, USA
| | - Shamara Whetstone
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Jasmine Y Young
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Arthur Zalevsky
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA
| | - Christine Zardecki
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
|
116
|
UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res 2023; 51:D523-D531. [PMID: 36408920 PMCID: PMC9825514 DOI: 10.1093/nar/gkac1052] [Citation(s) in RCA: 1157] [Impact Index Per Article: 1157.0] [Received: 09/16/2022] [Revised: 10/05/2022] [Accepted: 10/25/2022] [Indexed: 11/22/2022]
Abstract
The aim of the UniProt Knowledgebase is to provide users with a comprehensive, high-quality and freely accessible set of protein sequences annotated with functional information. In this publication we describe enhancements made to our data processing pipeline and to our website to adapt to an ever-increasing information content. The number of sequences in UniProtKB has risen to over 227 million and we are working towards including a reference proteome for each taxonomic group. We continue to extract detailed annotations from the literature to update or create reviewed entries, while unreviewed entries are supplemented with annotations provided by automated systems using a variety of machine-learning techniques. In addition, the scientific community continues their contributions of publications and annotations to UniProt entries of their interest. Finally, we describe our new website (https://www.uniprot.org/), designed to enhance our users' experience and make our data easily accessible to the research community. This interface includes access to AlphaFold structures for more than 85% of all entries as well as improved visualisations for subcellular localisation of proteins.
|
117
|
Fan S, Zhao T, Sun L. The global prevalence and ethnic heterogeneity of iron-refractory iron deficiency anaemia. Orphanet J Rare Dis 2023; 18:2. [PMID: 36604716 PMCID: PMC9814447 DOI: 10.1186/s13023-022-02612-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/01/2022] [Accepted: 12/29/2022] [Indexed: 01/06/2023]
Abstract
BACKGROUND Iron-refractory iron deficiency anaemia (IRIDA) is an autosomal recessive iron deficiency anaemia caused by mutations in the TMPRSS6 gene. Iron deficiency anaemia is common, whereas IRIDA is rare. The prevalence of IRIDA is unclear. This study aimed to estimate the carrier frequency and genetic prevalence of IRIDA using Genome Aggregation Database (gnomAD) data. METHODS The pathogenicity of TMPRSS6 variants was interpreted according to the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) standards and guidelines. The minor allele frequency (MAF) of TMPRSS6 gene disease-causing variants in 141,456 unique individuals was examined to estimate the global prevalence of IRIDA in seven ethnicities: African/African American (afr), American Admixed/Latino (amr), Ashkenazi Jewish (asj), East Asian (eas), Finnish (fin), Non-Finnish European (nfe) and South Asian (sas). The global and population-specific carrier frequencies and genetic prevalence of IRIDA were calculated using the Hardy-Weinberg equation. RESULTS In total, 86 pathogenic/likely pathogenic variants (PV/LPV) were identified according to the ACMG/AMP guidelines. The global carrier frequency and genetic prevalence of IRIDA were 2.02 per thousand and 1.02 per million, respectively. CONCLUSIONS The prevalence of IRIDA is greater than previous estimates.
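The Hardy-Weinberg step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual recessive-disease convention: the aggregate pathogenic allele frequency q is the sum of the individual variant MAFs, carrier frequency is 2pq, and genetic prevalence is q². The function name and the value q = 0.00101 are hypothetical; that value was back-calculated here only so that the output lands near the reported figures, and it is not taken from the paper.

```python
def hardy_weinberg(q: float) -> tuple[float, float]:
    """Return (carrier_frequency, genetic_prevalence) for an autosomal
    recessive condition, given the aggregate pathogenic allele frequency q."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    p = 1.0 - q                      # frequency of the non-pathogenic allele
    carrier_frequency = 2.0 * p * q  # heterozygotes
    genetic_prevalence = q * q       # biallelic (affected) individuals
    return carrier_frequency, genetic_prevalence


# Illustrative input only: an assumed aggregate allele frequency,
# not a value reported by the authors.
q_assumed = 0.00101
carrier, prevalence = hardy_weinberg(q_assumed)
print(f"carrier frequency:  {carrier * 1e3:.2f} per thousand")
print(f"genetic prevalence: {prevalence * 1e6:.2f} per million")
```

Under this assumed q the two quantities come out close to 2.02 per thousand and 1.02 per million, which shows that the two reported figures are mutually consistent under Hardy-Weinberg equilibrium.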
Affiliation(s)
- Shanghua Fan
- Department of Neurology, Renmin Hospital of Wuhan University, Wuhan, 430060 China
| | - Ting Zhao
- Department of Neurology, Henan Provincial People’s Hospital, People’s Hospital of Zhengzhou University, Zhengzhou, 450003 China
| | - Liu Sun
- Department of Information Technology, School of Mathematics and Information Technology, Yuxi Normal University, Yuxi, 653100, China.
|
118
|
McCann EP, Grima N, Fifita JA, Chan Moi Fat S, Lehnert K, Henden L, Blair IP, Williams KL. Characterising the Genetic Landscape of Amyotrophic Lateral Sclerosis: A Catalogue and Assessment of Over 1,000 Published Genetic Variants. J Neuromuscul Dis 2023; 10:1127-1141. [PMID: 37638449 PMCID: PMC10657717 DOI: 10.3233/jnd-230148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/07/2023] [Indexed: 08/29/2023]
Abstract
BACKGROUND Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with genetic and phenotypic heterogeneity. Pathogenic genetic variants remain the only validated cause of disease, the majority of which were discovered in familial ALS patients. While causal gene variants are a lesser contributor to sporadic ALS, an increasing number of risk alleles (low penetrance genetic variants associated with a small increase in disease risk) and variants of uncertain significance have been reported. OBJECTIVE To examine the pathogenic potential of genetic variation in ALS, we sought to characterise variant- and gene-level attributes of previously reported ALS-implicated variants. METHODS A list of 1,087 genetic variants reported in ALS to March 2021 was compiled through comprehensive literature review. Individual variants were annotated using in silico tools and databases across variant features including pathogenicity scores, localisation to protein domains, evolutionary conservation, and minor allele frequencies. Gene level attributes of genic tolerance, gene expression in ALS-relevant tissues and gene ontology terms were assessed for 33 ALS genes. Statistical analysis was performed for each characteristic, and we compared the most penetrant variants found in familial cases with risk alleles exclusive to sporadic cases, to explore genetic variant features that associate with disease penetrance. RESULTS We provide spreadsheet (hg19 and GRCh38) and variant call format (GRCh38) resources for all 1,087 reported ALS-implicated variants, including detailed summaries for each attribute. We demonstrate that the characteristics of variants found exclusively in sporadic ALS cases are less severe than those observed in familial ALS. CONCLUSIONS We provide a comprehensive, literature-derived catalogue of genetic variation in ALS thus far and reveal crucial attributes that contribute to ALS pathogenicity. 
Our variant- and gene-level observations highlight the complexity of genetic variation in ALS, and we discuss important implications and considerations for novel variant interpretation.
Affiliation(s)
- Emily P. McCann
- Motor Neuron Disease Research Centre, Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Natalie Grima
- Motor Neuron Disease Research Centre, Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Jennifer A. Fifita
- Motor Neuron Disease Research Centre, Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Sandrine Chan Moi Fat
- Motor Neuron Disease Research Centre, Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Klaus Lehnert
- School of Biological Sciences, Centre for Brain Research, University of Auckland, Auckland, New Zealand
| | - Lyndal Henden
- Motor Neuron Disease Research Centre, Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Ian P. Blair
- Motor Neuron Disease Research Centre, Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, New South Wales, Australia
| | - Kelly L. Williams
- Motor Neuron Disease Research Centre, Macquarie Medical School, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, New South Wales, Australia
|
119
|
Alenezi WM, Fierheller CT, Serruya C, Revil T, Oros KK, Subramanian DN, Bruce J, Spiegelman D, Pugh T, Campbell IG, Mes-Masson AM, Provencher D, Foulkes WD, Haffaf ZE, Rouleau G, Bouchard L, Greenwood CMT, Ragoussis J, Tonin PN. Genetic analyses of DNA repair pathway associated genes implicate new candidate cancer predisposing genes in ancestrally defined ovarian cancer cases. Front Oncol 2023; 13:1111191. [PMID: 36969007 PMCID: PMC10030840 DOI: 10.3389/fonc.2023.1111191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 02/06/2023] [Indexed: 03/29/2023]
Abstract
Not all familial ovarian cancer (OC) cases are explained by pathogenic germline variants in known risk genes. A candidate gene approach involving DNA repair pathway genes was applied to identify rare recurring pathogenic variants in familial OC cases not associated with known OC risk genes from a population exhibiting genetic drift. Whole exome sequencing (WES) data of 15 OC cases from 13 families tested negative for pathogenic variants in known OC risk genes were investigated for candidate variants in 468 DNA repair pathway genes. Filtering and prioritization criteria were applied to WES data to select top candidates for further analyses. Candidates were genotyped in ancestry defined study groups of 214 familial and 998 sporadic OC or breast cancer (BC) cases and 1025 population-matched controls and screened for additional carriers in 605 population-matched OC cases. The candidate genes were also analyzed in WES data from 937 familial or sporadic OC cases of diverse ancestries. Top candidate variants in ERCC5, EXO1, FANCC, NEIL1 and NTHL1 were identified in 5/13 (39%) OC families. Collectively, candidate variants were identified in 7/435 (1.6%) sporadic OC cases and 1/566 (0.2%) sporadic BC cases versus 1/1025 (0.1%) controls. Additional carriers were identified in 6/605 (0.9%) OC cases. Tumour DNA from ERCC5, NEIL1 and NTHL1 variant carriers exhibited loss of the wild-type allele. Carriers of various candidate variants in these genes were identified in 31/937 (3.3%) OC cases of diverse ancestries versus 0-0.004% in cancer-free controls. The strategy of applying a candidate gene approach in a population exhibiting genetic drift identified new candidate OC predisposition variants in DNA repair pathway genes.
Affiliation(s)
- Wejdan M. Alenezi
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Cancer Research Program, Centre for Translational Biology, The Research Institute of McGill University Health Centre, Montreal, QC, Canada
- Department of Medical Laboratory Technology, Taibah University, Medina, Saudi Arabia
| | - Caitlin T. Fierheller
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Cancer Research Program, Centre for Translational Biology, The Research Institute of McGill University Health Centre, Montreal, QC, Canada
| | - Corinne Serruya
- Cancer Research Program, Centre for Translational Biology, The Research Institute of McGill University Health Centre, Montreal, QC, Canada
| | - Timothée Revil
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- McGill Genome Centre, McGill University, Montreal, QC, Canada
| | - Kathleen K. Oros
- Lady Davis Institute for Medical Research of the Jewish General Hospital, Montreal, QC, Canada
| | - Deepak N. Subramanian
- Cancer Genetics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
| | - Jeffrey Bruce
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
| | - Dan Spiegelman
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Montreal Neurological Institute, McGill University, Montreal, QC, Canada
| | - Trevor Pugh
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Ontario Institute for Cancer Research, Toronto, ON, Canada
| | - Ian G. Campbell
- Cancer Genetics Laboratory, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, VIC, Australia
| | - Anne-Marie Mes-Masson
- Centre de recherche du Centre hospitalier de l’Université de Montréal and Institut du cancer de Montréal, Montreal, QC, Canada
- Department of Medicine, Université de Montréal, Montreal, QC, Canada
| | - Diane Provencher
- Centre de recherche du Centre hospitalier de l’Université de Montréal and Institut du cancer de Montréal, Montreal, QC, Canada
- Division of Gynecologic Oncology, Université de Montréal, Montreal, QC, Canada
| | - William D. Foulkes
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Cancer Research Program, Centre for Translational Biology, The Research Institute of McGill University Health Centre, Montreal, QC, Canada
- Lady Davis Institute for Medical Research of the Jewish General Hospital, Montreal, QC, Canada
- Department of Medical Genetics, McGill University Health Centre, Montreal, QC, Canada
- Department of Medicine, McGill University, Montreal, QC, Canada
- Gerald Bronfman Department of Oncology, McGill University, Montreal, QC, Canada
| | - Zaki El Haffaf
- Centre de recherche du Centre hospitalier de l’Université de Montréal and Institut du cancer de Montréal, Montreal, QC, Canada
- Service de Médecine Génique, Centre Hospitalier de l’Université de Montréal, Montreal, QC, Canada
| | - Guy Rouleau
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Montreal Neurological Institute, McGill University, Montreal, QC, Canada
| | - Luigi Bouchard
- Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC, Canada
- Department of Medical Biology, Centres intégrés universitaires de santé et de services sociaux du Saguenay-Lac-Saint-Jean hôpital Universitaire de Chicoutimi, Saguenay, QC, Canada
- Centre de Recherche du Centre hospitalier l’Université de Sherbrooke, Sherbrooke, QC, Canada
| | - Celia M. T. Greenwood
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Lady Davis Institute for Medical Research of the Jewish General Hospital, Montreal, QC, Canada
- Gerald Bronfman Department of Oncology, McGill University, Montreal, QC, Canada
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
| | - Jiannis Ragoussis
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- McGill Genome Centre, McGill University, Montreal, QC, Canada
| | - Patricia N. Tonin
- Department of Human Genetics, McGill University, Montreal, QC, Canada
- Cancer Research Program, Centre for Translational Biology, The Research Institute of McGill University Health Centre, Montreal, QC, Canada
- Department of Medicine, McGill University, Montreal, QC, Canada
- Correspondence: Patricia N. Tonin
|
120
|
Chen LL, Bindereif A, Bozzoni I, Chang HY, Matera AG, Gorospe M, Hansen TB, Kjems J, Ma XK, Pek JW, Rajewsky N, Salzman J, Wilusz JE, Yang L, Zhao F. A guide to naming eukaryotic circular RNAs. Nat Cell Biol 2023; 25:1-5. [PMID: 36658223 PMCID: PMC10114414 DOI: 10.1038/s41556-022-01066-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Indexed: 01/21/2023]
Abstract
Alternative splicing of eukaryotic transcripts often leads to production of multiple mature RNAs from a single gene locus. In addition to encoding linear RNAs, genes can produce stable circular RNAs (circRNAs) that are often co-expressed with their cognate linear RNAs. Multiple distinct circRNAs are frequently generated from a gene locus via back-splicing, with each mature transcript having a potentially unique function due to its distinct combination of exons and sometimes retained introns. However, names currently given to circRNAs are often ambiguous and lack consistency across studies. Here, we call on the community to embrace standards for naming circRNAs so that a common nomenclature is used to ensure clarity and reproducibility.
Affiliation(s)
- Ling-Ling Chen
- State Key Laboratory of Molecular Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China.
| | - Albrecht Bindereif
- Institute of Biochemistry, Faculty of Biology and Chemistry, Justus Liebig University of Giessen, Giessen, Germany
| | - Irene Bozzoni
- Department of Biology and Biotechnologies 'Charles Darwin' and IIT Center for Life Nano- & Neuro-Science@Sapienza, Sapienza University of Rome, Rome, Italy
| | - Howard Y Chang
- Center for Personal Dynamic Regulomes, Howard Hughes Medical Institute, Stanford University, Stanford, CA, USA
| | - A Gregory Matera
- Integrative Program in Biological and Genome Sciences, University of North Carolina, Chapel Hill, NC, USA
| | - Myriam Gorospe
- Laboratory of Genetics and Genomics, National Institute on Aging Intramural Research Program, National Institutes of Health, Baltimore, MD, USA
| | - Thomas B Hansen
- Interdisciplinary Nanoscience Center (iNANO), Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
| | - Jørgen Kjems
- Interdisciplinary Nanoscience Center (iNANO), Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
| | - Xu-Kai Ma
- Center for Molecular Medicine, Children's Hospital, Fudan University and Shanghai Key Laboratory of Medical Epigenetics, International Laboratory of Medical Epigenetics and Metabolism, Institutes of Biomedical Sciences, Fudan University, Shanghai, China
| | - Jun Wei Pek
- Temasek Life Sciences Laboratory, 1 Research Link National University of Singapore, Singapore, Singapore
| | - Nikolaus Rajewsky
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Julia Salzman
- Department of Biochemistry, Stanford University, Stanford, CA, USA
| | - Jeremy E Wilusz
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Therapeutic Innovation Center, Baylor College of Medicine, Houston, TX, USA.
| | - Li Yang
- Center for Molecular Medicine, Children's Hospital, Fudan University and Shanghai Key Laboratory of Medical Epigenetics, International Laboratory of Medical Epigenetics and Metabolism, Institutes of Biomedical Sciences, Fudan University, Shanghai, China.
| | - Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
|
121
|
Viho EMG, Punt AM, Distel B, Houtman R, Kroon J, Elgersma Y, Meijer OC. The Hippocampal Response to Acute Corticosterone Elevation Is Altered in a Mouse Model for Angelman Syndrome. Int J Mol Sci 2022; 24:303. [PMID: 36613751 PMCID: PMC9820460 DOI: 10.3390/ijms24010303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 12/18/2022] [Accepted: 12/21/2022] [Indexed: 12/28/2022]
Abstract
Angelman Syndrome (AS) is a severe neurodevelopmental disorder, caused by the neuronal absence of the ubiquitin protein ligase E3A (UBE3A). UBE3A promotes ubiquitin-mediated protein degradation and functions as a transcriptional coregulator of nuclear hormone receptors, including the glucocorticoid receptor (GR). Previous studies showed anxiety-like behavior and hippocampal-dependent memory disturbances in AS mouse models. Hippocampal GR is an important regulator of the stress response and memory formation, and we therefore investigated whether the absence of UBE3A in AS mice disrupted GR signaling in the hippocampus. We first established a strong cortisol-dependent interaction between the GR ligand binding domain and a UBE3A nuclear receptor box in a high-throughput interaction screen. In vivo, we found that UBE3A-deficient AS mice displayed significantly more variation in circulating corticosterone levels throughout the day compared to wildtypes (WT), with low to undetectable levels of corticosterone at the trough of the circadian cycle. Additionally, we observed an enhanced transcriptomic response in the AS hippocampus following acute corticosterone treatment. Surprisingly, chronic corticosterone treatment showed less contrast between AS and WT mice in the hippocampus and liver transcriptomic responses. This suggests that UBE3A limits the acute stimulation of GR signaling, likely as a member of the GR transcriptional complex. Altogether, these data indicate that AS mice are more sensitive to acute glucocorticoid exposure in the brain compared to WT mice. This suggests that stress responsiveness is altered in AS which could lead to anxiety symptoms.
Affiliation(s)
- Eva M. G. Viho
- Department of Medicine, Division of Endocrinology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA Leiden, The Netherlands
- Einthoven Laboratory for Experimental Vascular Medicine, Leiden University Medical Center, 2333 ZA Leiden, The Netherlands
| | - A. Mattijs Punt
- Department of Clinical Genetics, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- ENCORE Expertise Center for Neurodevelopmental Disorders, Erasmus MC, 3015 GD Rotterdam, The Netherlands
| | - Ben Distel
- Department of Clinical Genetics, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- ENCORE Expertise Center for Neurodevelopmental Disorders, Erasmus MC, 3015 GD Rotterdam, The Netherlands
| | - René Houtman
- Precision Medicine Lab, 5349 AB Oss, The Netherlands
| | - Jan Kroon
- Department of Medicine, Division of Endocrinology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA Leiden, The Netherlands
- Einthoven Laboratory for Experimental Vascular Medicine, Leiden University Medical Center, 2333 ZA Leiden, The Netherlands
| | - Ype Elgersma
- Department of Clinical Genetics, Erasmus MC, 3015 GD Rotterdam, The Netherlands
- ENCORE Expertise Center for Neurodevelopmental Disorders, Erasmus MC, 3015 GD Rotterdam, The Netherlands
| | - Onno C. Meijer
- Department of Medicine, Division of Endocrinology, Leiden University Medical Center, Albinusdreef 2, 2333 ZA Leiden, The Netherlands
- Einthoven Laboratory for Experimental Vascular Medicine, Leiden University Medical Center, 2333 ZA Leiden, The Netherlands
|
122
|
Dolin RH, Heale BSE, Alterovitz G, Gupta R, Aronson J, Boxwala A, Gothi SR, Haines D, Hermann A, Hongsermeier T, Husami A, Jones J, Naeymi-Rad F, Rapchak B, Ravishankar C, Shalaby J, Terry M, Xie N, Zhang P, Chamala S. Introducing HL7 FHIR Genomics Operations: a developer-friendly approach to genomics-EHR integration. J Am Med Inform Assoc 2022; 30:485-493. [PMID: 36548217 PMCID: PMC9933060 DOI: 10.1093/jamia/ocac246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 11/16/2022] [Accepted: 12/02/2022] [Indexed: 12/24/2022]
Abstract
OBJECTIVE Enabling clinicians to formulate individualized clinical management strategies from the sea of molecular data remains a fundamentally important but daunting task. Here, we describe efforts towards a new paradigm in genomics-electronic health record (EHR) integration, using a standardized suite of FHIR Genomics Operations that encapsulates the complexity of molecular data so that precision medicine solution developers can focus on building applications. MATERIALS AND METHODS FHIR Genomics Operations essentially "wrap" a genomics data repository, presenting a uniform interface to applications. More importantly, operations encapsulate the complexity of data within a repository and normalize redundant data representations, which is particularly relevant in genomics, where a tremendous amount of raw data exists in often-complex non-FHIR formats. RESULTS Fifteen FHIR Genomics Operations have been developed, designed to support a wide range of clinical scenarios, such as variant discovery; clinical trial matching; hereditary condition and pharmacogenomic screening; and variant reanalysis. Operations are being matured through the HL7 balloting process, connectathons, pilots, and the HL7 FHIR Accelerator program. DISCUSSION Next-generation sequencing can identify thousands to millions of variants, whose clinical significance can change over time as our knowledge evolves. To manage such a large volume of dynamic and complex data, new models of genomics-EHR integration are needed. Qualitative observations to date suggest that freeing application developers from the need to understand the nuances of genomic data, and instead basing applications on standardized APIs, can not only accelerate integration but also dramatically expand the applications of Omic data in driving precision care at scale for all.
Affiliation(s)
- Robert H Dolin
- Corresponding Author: Robert H. Dolin, MD, Elimu Informatics, 1709 Julian Ct, El Cerrito, CA 94530, USA
| | | | - Gil Alterovitz
- Brigham and Women’s Hospital, Boston, Massachusetts, USA
- Harvard/MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts, USA
| | - Rohan Gupta
- Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
| | | | | | - Shaileshbhai R Gothi
- Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, Florida, USA
| | - David Haines
- Leap of Faith Technologies, Libertyville, Illinois, USA
| | - Arthur Hermann
- Department of Health IT Strategy & Policy, Kaiser Permanente, Pasadena, California, USA
| | | | - Ammar Husami
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
| | - James Jones
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
| | | | | | | | | | - May Terry
- MITRE Corporation, McLean, Virginia, USA
| | - Ning Xie
- Biomedical Cybernetics Laboratory, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
| | - Powell Zhang
- Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
| | - Srikar Chamala
- Keck School of Medicine, Department of Pathology, University of Southern California, Los Angeles, California, USA
- Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California, USA
|
123
|
Sommer MJ, Cha S, Varabyou A, Rincon N, Park S, Minkin I, Pertea M, Steinegger M, Salzberg SL. Structure-guided isoform identification for the human transcriptome. eLife 2022; 11:e82556. [PMID: 36519529 PMCID: PMC9812405 DOI: 10.7554/elife.82556] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/09/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Recently developed methods to predict three-dimensional protein structure with high accuracy have opened new avenues for genome and proteome research. We explore a new hypothesis in genome annotation, namely whether computationally predicted structures can help to identify which of multiple possible gene isoforms represents a functional protein product. Guided by protein structure predictions, we evaluated over 230,000 isoforms of human protein-coding genes assembled from over 10,000 RNA sequencing experiments across many human tissues. From this set of assembled transcripts, we identified hundreds of isoforms with more confidently predicted structure and potentially superior function in comparison to canonical isoforms in the latest human gene database. We illustrate our new method with examples where structure provides a guide to function in combination with expression and evolutionary evidence. Additionally, we provide the complete set of structures as a resource to better understand the function of human genes and their isoforms. These results demonstrate the promise of protein structure prediction as a genome annotation tool, allowing us to refine even the most highly curated catalog of human proteins. More generally we demonstrate a practical, structure-guided approach that can be used to enhance the annotation of any genome.
Affiliation(s)
- Markus J Sommer
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, United States
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
| | - Sooyoung Cha
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea
| | - Ales Varabyou
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
- Department of Computer Science, Johns Hopkins University, Baltimore, United States
| | - Natalia Rincon
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, United States
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
| | - Sukhwan Park
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea
| | - Ilia Minkin
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, United States
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
| | - Mihaela Pertea
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, United States
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
| | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea
| | - Steven L Salzberg
- Department of Biomedical Engineering, Johns Hopkins School of Medicine and Whiting School of Engineering, Baltimore, United States
- Center for Computational Biology, Johns Hopkins University, Baltimore, United States
- Department of Computer Science, Johns Hopkins University, Baltimore, United States
- Department of Biostatistics, Johns Hopkins University, Baltimore, United States
|
124
|
Parthasarathy S, Ruggiero SM, Gelot A, Soardi FC, Ribeiro BFR, Pires DEV, Ascher DB, Schmitt A, Rambaud C, Represa A, Xie HM, Lusk L, Wilmarth O, McDonnell PP, Juarez OA, Grace AN, Buratti J, Mignot C, Gras D, Nava C, Pierce SR, Keren B, Kennedy BC, Pena SDJ, Helbig I, Cuddapah VA. A recurrent de novo splice site variant involving DNM1 exon 10a causes developmental and epileptic encephalopathy through a dominant-negative mechanism. Am J Hum Genet 2022; 109:2253-2269. [PMID: 36413998 PMCID: PMC9748255 DOI: 10.1016/j.ajhg.2022.11.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 05/31/2022] [Accepted: 11/01/2022] [Indexed: 11/23/2022]
Abstract
Heterozygous pathogenic variants in DNM1 cause developmental and epileptic encephalopathy (DEE) as a result of a dominant-negative mechanism impeding vesicular fission. Thus far, pathogenic variants in DNM1 have been studied with a canonical transcript that includes the alternatively spliced exon 10b. However, after performing RNA sequencing in 39 pediatric brain samples, we find the primary transcript expressed in the brain includes the downstream exon 10a instead. Using this information, we evaluated genotype-phenotype correlations of variants affecting exon 10a and identified a cohort of eleven previously unreported individuals. Eight individuals harbor a recurrent de novo splice site variant, c.1197-8G>A (GenBank: NM_001288739.1), which affects exon 10a and leads to DEE consistent with the classical DNM1 phenotype. We find this splice site variant leads to disease through an unexpected dominant-negative mechanism. Functional testing reveals an in-frame upstream splice acceptor causing insertion of two amino acids predicted to impair oligomerization-dependent activity. This is supported by neuropathological samples showing accumulation of enlarged synaptic vesicles adherent to the plasma membrane consistent with impaired vesicular fission. Two additional individuals with missense variants affecting exon 10a, p.Arg399Trp and p.Gly401Asp, had a similar DEE phenotype. In contrast, one individual with a missense variant affecting exon 10b, p.Pro405Leu, which is less expressed in the brain, had a correspondingly less severe presentation. Thus, we implicate variants affecting exon 10a as causing the severe DEE typically associated with DNM1-related disorders. We highlight the importance of considering relevant isoforms for disease-causing variants as well as the possibility of splice site variants acting through a dominant-negative mechanism.
Affiliation(s)
- Shridhar Parthasarathy
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA
| | - Sarah McKeown Ruggiero
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA
| | - Antoinette Gelot
- AP-HP, Hôpital Armand-Trousseau, Service d'Anatomie Pathologique, 75012 Paris, France; INMED INSERM U 901 Parc Scientifique de Luminy, 13273 Marseille, France; Centre de Recherche Clinique ConCer-LD, Paris, France
| | - Fernanda C Soardi
- GENE - Núcleo de Genética Médica, Belo Horizonte, MG, Brazil; Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil; Laboratório de Genômica Clínica, Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | | | - Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, VIC 3052, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3053, Australia
| | - David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia; Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, VIC 3052, Australia; School of Chemistry and Molecular Biology, University of Queensland, St Lucia, QLD 4072, Australia
| | - Alain Schmitt
- INSERM U 1016, Institut Cochin, Paris, France; CNRS UMR 8104, Paris, France; Université Paris Descartes, Sorbonne Paris Cité, Paris, France
| | - Caroline Rambaud
- AP-HP, Hôpital Raymond-Poincaré, Laboratoire Anatomie Pathologique, Garches, France
| | - Alfonso Represa
- INMED, INSERM, Aix-Marseille Université, Campus de Luminy, 13009 Marseille, France
| | - Hongbo M Xie
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA
| | - Laina Lusk
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA
| | - Olivia Wilmarth
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Pamela Pojomovsky McDonnell
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Neurology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Olivia A Juarez
- Baylor College of Medicine Genetics Clinic, Children's Hospital of San Antonio, San Antonio, TX, USA
| | - Alexandra N Grace
- Baylor College of Medicine Genetics Clinic, Children's Hospital of San Antonio, San Antonio, TX, USA
| | - Julien Buratti
- AP-HP, Hôpital de la Pitié Salpêtrière, Département de Génétique, 75013 Paris, France
| | - Cyril Mignot
- AP-HP, Hôpital de la Pitié Salpêtrière, Département de Génétique, 75013 Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, INSERM U 1127, CNRS UMR 7225, ICM, 75013 Paris, France; AP-HP, Hôpital Robert Debré, Service de Neurologie Pediatrique et de Maladies Métaboliques, 75019 Paris, France
| | - Domitille Gras
- AP-HP, Hôpital Robert Debré, Service de Neurologie Pediatrique et de Maladies Métaboliques, 75019 Paris, France
| | - Caroline Nava
- AP-HP, Hôpital de la Pitié Salpêtrière, Département de Génétique, 75013 Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, INSERM U 1127, CNRS UMR 7225, ICM, 75013 Paris, France; AP-HP, Hôpital Robert Debré, Service de Neurologie Pediatrique et de Maladies Métaboliques, 75019 Paris, France
| | - Samuel R Pierce
- The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Physical Therapy, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Boris Keren
- AP-HP, Hôpital de la Pitié Salpêtrière, Département de Génétique, 75013 Paris, France; Sorbonne Universités, UPMC Univ Paris 06, UMR S 1127, INSERM U 1127, CNRS UMR 7225, ICM, 75013 Paris, France; AP-HP, Hôpital Robert Debré, Service de Neurologie Pediatrique et de Maladies Métaboliques, 75019 Paris, France
| | - Benjamin C Kennedy
- Division of Neurosurgery, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA; Department of Neurosurgery, The University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Sergio D J Pena
- GENE - Núcleo de Genética Médica, Belo Horizonte, MG, Brazil; Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil; Laboratório de Genômica Clínica, Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
| | - Ingo Helbig
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA; Department of Neurology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
| | - Vishnu Anand Cuddapah
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
|
125
|
Nassar LR, Barber GP, Benet-Pagès A, Casper J, Clawson H, Diekhans M, Fischer C, Gonzalez JN, Hinrichs A, Lee B, Lee C, Muthuraman P, Nguy B, Pereira T, Nejad P, Perez G, Raney B, Schmelter D, Speir M, Wick B, Zweig A, Haussler D, Kuhn R, Haeussler M, Kent W. The UCSC Genome Browser database: 2023 update. Nucleic Acids Res 2022; 51:D1188-D1195. [PMID: 36420891 PMCID: PMC9825520 DOI: 10.1093/nar/gkac1072] [Citation(s) in RCA: 132] [Impact Index Per Article: 66.0] [Received: 09/15/2022] [Revised: 10/14/2022] [Accepted: 10/25/2022] [Indexed: 11/26/2022]
Abstract
The UCSC Genome Browser (https://genome.ucsc.edu) is an omics data consolidator, graphical viewer, and general bioinformatics resource that continues to serve the community as it enters its 23rd year. This year has seen an emphasis in clinical data, with new tracks and an expanded Recommended Track Sets feature on hg38 as well as the addition of a single cell track group. SARS-CoV-2 continues to remain a focus, with regular annotation updates to the browser and continued curation of our phylogenetic sequence placing tool, hgPhyloPlace, whose tree has now reached over 12M sequences. Our GenArk resource has also grown, offering over 2500 hubs and a system for users to request any absent assemblies. We have expanded our bigBarChart display type and created new ways to visualize data via bigRmsk and dynseq display. Displaying custom annotations is now easier due to our chromAlias system which eliminates the requirement for renaming sequence names to the UCSC standard. Users involved in data generation may also be interested in our new tools and trackDb settings which facilitate the creation and display of their custom annotations.
Affiliation(s)
- Luis R Nassar
- To whom correspondence should be addressed. Tel: +1 305 205 9160;
| | - Galt P Barber
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Anna Benet-Pagès
- Institute of Neurogenomics, Helmholtz Zentrum München GmbH - German Research Center for Environmental Health, 85764 Neuherberg, Germany; Medical Genetics Center (Medizinisch Genetisches Zentrum), Munich 80335, Germany
| | - Jonathan Casper
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Hiram Clawson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Clay Fischer
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | | | - Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Brian T Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Christopher M Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Pranav Muthuraman
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Beagan Nguy
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Tiana Pereira
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Parisa Nejad
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Gerardo Perez
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Brian J Raney
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Daniel Schmelter
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Matthew L Speir
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Brittney D Wick
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Ann S Zweig
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - David Haussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Robert M Kuhn
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Maximilian Haeussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - W James Kent
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
|
126
|
Frankish A, Carbonell-Sala S, Diekhans M, Jungreis I, Loveland J, Mudge J, Sisu C, Wright J, Arnan C, Barnes I, Banerjee A, Bennett R, Berry A, Bignell A, Boix C, Calvet F, Cerdán-Vélez D, Cunningham F, Davidson C, Donaldson S, Dursun C, Fatima R, Giorgetti S, Giron C, Gonzalez J, Hardy M, Harrison P, Hourlier T, Hollis Z, Hunt T, James B, Jiang Y, Johnson R, Kay M, Lagarde J, Martin F, Gómez L, Nair S, Ni P, Pozo F, Ramalingam V, Ruffier M, Schmitt B, Schreiber J, Steed E, Suner MM, Sumathipala D, Sycheva I, Uszczynska-Ratajczak B, Wass E, Yang Y, Yates A, Zafrulla Z, Choudhary J, Gerstein M, Guigo R, Hubbard TJP, Kellis M, Kundaje A, Paten B, Tress M, Flicek P. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res 2022; 51:D942-D949. [PMID: 36420896 PMCID: PMC9825462 DOI: 10.1093/nar/gkac1071] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Received: 09/22/2022] [Revised: 10/15/2022] [Accepted: 11/07/2022] [Indexed: 11/27/2022]
Abstract
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Affiliation(s)
- Adam Frankish
- To whom correspondence should be addressed. Tel: +44 1223 494388; Fax: +44 1223 484696;
| | - Sílvia Carbonell-Sala
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
| | - Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cristina Sisu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Department of Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK
| | - James C Wright
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
| | - Carme Arnan
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Abhimanyu Banerjee
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Alexandra Bignell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carles Boix
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Ferriol Calvet
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Cagatay Dursun
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Carlos García Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jose Manuel Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Benjamin James
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Rory Johnson
- Department of Medical Oncology, Bern University Hospital, Murtenstrasse 35, 3008 Bern, Switzerland; School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Julien Lagarde
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Laura Martínez Gómez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Surag Nair
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Pengyu Ni
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Vivek Ramalingam
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Bianca M Schmitt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jacob M Schreiber
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Dulika Sumathipala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Irina Sycheva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Barbara Uszczynska-Ratajczak
- Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Yucheng T Yang
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
| | - Andrew Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Zahoor Zafrulla
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Jyoti S Choudhary
- Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
| | - Roderic Guigo
- Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain; Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
| | - Tim J P Hubbard
- Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
| | - Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Anshul Kundaje
- Department of Genetics, Stanford University, Palo Alto, CA, USA; Department of Computer Science, Stanford University, Palo Alto, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
| | - Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
127
|
García-Ruiz S, Gustavsson EK, Zhang D, Reynolds RH, Chen Z, Fairbrother-Browne A, Gil-Martínez AL, Botia JA, Collado-Torres L, Ryten M. IntroVerse: a comprehensive database of introns across human tissues. Nucleic Acids Res 2022; 51:D167-D178. [PMID: 36399497 PMCID: PMC9825543 DOI: 10.1093/nar/gkac1056] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/15/2022] [Revised: 10/21/2022] [Accepted: 10/30/2022] [Indexed: 11/19/2022]
Abstract
Dysregulation of RNA splicing contributes to both rare and complex diseases. RNA-sequencing data from human tissues has shown that this process can be inaccurate, resulting in the presence of novel introns detected at low frequency across samples and within an individual. To enable the full spectrum of intron use to be explored, we have developed IntroVerse, which offers an extensive catalogue on the splicing of 332,571 annotated introns and a linked set of 4,679,474 novel junctions covering 32,669 different genes. This dataset has been generated through the analysis of 17,510 human control RNA samples from 54 tissues provided by the Genotype-Tissue Expression Consortium. IntroVerse has two unique features: (i) it provides a complete catalogue of novel junctions and (ii) each novel junction has been assigned to a specific annotated intron. This unique, hierarchical structure offers multiple uses, including the identification of novel transcripts from known genes and their tissue-specific usage, and the assessment of background splicing noise for introns thought to be mis-spliced in disease states. IntroVerse provides a user-friendly web interface and is freely available at https://rytenlab.com/browser/app/introverse.
Affiliation(s)
- Sonia García-Ruiz
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK; Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
| | - Emil K Gustavsson
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK; Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
| | - David Zhang
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK; Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
| | - Regina H Reynolds
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK; Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
| | - Zhongbo Chen
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK; Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK
| | - Aine Fairbrother-Browne
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, WC1N 1EH, UK; Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK; Department of Medical and Molecular Genetics, School of Basic and Medical Biosciences, King's College London, London, WC2R 2LS, UK
| | - Ana Luisa Gil-Martínez
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, UCL, London WC1N 3BG, UK; Department of Information and Communications Engineering, Faculty of Informatics, Espinardo Campus, University of Murcia, Murcia, 30100, Spain
| | - Juan A Botia
- Department of Information and Communications Engineering Faculty of Informatics, Espinardo Campus, University of Murcia, Murcia, 30100, Spain
| | | | - Mina Ryten
- To whom correspondence should be addressed. Tel: +44 2081387617;
128
Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, Gable AL, Fang T, Doncheva N, Pyysalo S, Bork P, Jensen L, von Mering C. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 2022; 51:D638-D646. [PMID: 36370105 PMCID: PMC9825434 DOI: 10.1093/nar/gkac1000] [Citation(s) in RCA: 923] [Impact Index Per Article: 461.5] [Received: 09/15/2022] [Revised: 10/10/2022] [Accepted: 10/19/2022] [Indexed: 11/13/2022]
Abstract
Much of the complexity within cells arises from functional and regulatory interactions among proteins. The core of these interactions is increasingly known, but novel interactions continue to be discovered, and the information remains scattered across different database resources, experimental modalities and levels of mechanistic detail. The STRING database (https://string-db.org/) systematically collects and integrates protein-protein interactions, both physical interactions and functional associations. The data originate from a number of sources: automated text mining of the scientific literature, computational interaction predictions from co-expression, conserved genomic context, databases of interaction experiments and known complexes/pathways from curated sources. All of these interactions are critically assessed, scored, and subsequently automatically transferred to less well-studied organisms using hierarchical orthology information. The data can be accessed via the website, but also programmatically and via bulk downloads. The most recent developments in STRING (version 12.0) are: (i) it is now possible to create, browse and analyze a full interaction network for any novel genome of interest, by submitting its complement of encoded proteins, (ii) the co-expression channel now uses variational auto-encoders to predict interactions, and it covers two new sources, single-cell RNA-seq and experimental proteomics data, and (iii) the confidence in each experimentally derived interaction is now estimated based on the detection method used, and communicated to the user in the web interface. Furthermore, STRING continues to enhance its facilities for functional enrichment analysis, which are now also fully available for user-submitted genomes.
Affiliation(s)
- Damian Szklarczyk
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Rebecca Kirsch
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Mikaela Koutrouli
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Katerina Nastou
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Farrokh Mehryary
- TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
| | - Radja Hachilif
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Annika L Gable
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Tao Fang
- Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Nadezhda T Doncheva
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
| | - Sampo Pyysalo
- TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
| | - Peer Bork
- Correspondence may also be addressed to Peer Bork. Tel: +49 6221 387 8526; Fax: +49 6221 387 517;
| | - Lars J Jensen
- Correspondence may also be addressed to Lars J. Jensen. Tel: +45 3 532 5025;
| | - Christian von Mering
- To whom correspondence should be addressed. Tel: +41 44 6353147; Fax: +41 44 6356864;
129
Martinez-Gomez L, Cerdán-Vélez D, Abascal F, Tress ML. Origins and Evolution of Human Tandem Duplicated Exon Substitution Events. Genome Biol Evol 2022; 14:6809199. [PMID: 36346145 PMCID: PMC9741552 DOI: 10.1093/gbe/evac162] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/06/2022] [Revised: 10/25/2022] [Accepted: 10/29/2022] [Indexed: 11/10/2022]
Abstract
The mutually exclusive splicing of tandem duplicated exons produces protein isoforms that are identical save for a homologous region that allows for the fine tuning of protein function. Tandem duplicated exon substitution events are rare, yet highly important alternative splicing events. Most events are ancient, their isoforms are highly expressed, and they have significantly more pathogenic mutations than other splice events. Here, we analyzed the physicochemical properties and functional roles of the homologous polypeptide regions produced by the 236 tandem duplicated exon substitutions annotated in the human gene set. We find that the most important structural and functional residues in these homologous regions are maintained, and that most changes are conservative rather than drastic. Three quarters of the isoforms produced from tandem duplicated exon substitution events are tissue-specific, particularly in nervous and cardiac tissues, and tandem duplicated exon substitution events are enriched in functional terms related to structures in the brain and skeletal muscle. We find considerable evidence for the convergent evolution of tandem duplicated exon substitution events in vertebrates, arthropods, and nematodes. Twelve human gene families have orthologues with tandem duplicated exon substitution events in both Drosophila melanogaster and Caenorhabditis elegans. Six of these gene families are ion transporters, suggesting that tandem exon duplication in genes that control the flow of ions into the cell has an adaptive benefit. The ancient origins, the strong indications of tissue-specific functions, and the evidence of convergent evolution suggest that these events may have played important roles in the evolution of animal tissues and organs.
Affiliation(s)
- Laura Martinez-Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
| | - Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
130
Martin FJ, Amode MR, Aneja A, Austine-Orimoloye O, Azov A, Barnes I, Becker A, Bennett R, Berry A, Bhai J, Bhurji S, Bignell A, Boddu S, Branco Lins PR, Brooks L, Ramaraju SB, Charkhchi M, Cockburn A, Da Rin Fiorretto L, Davidson C, Dodiya K, Donaldson S, El Houdaigui B, El Naboulsi T, Fatima R, Giron CG, Genez T, Ghattaoraya GS, Martinez JG, Guijarro C, Hardy M, Hollis Z, Hourlier T, Hunt T, Kay M, Kaykala V, Le T, Lemos D, Marques-Coelho D, Marugán JC, Merino G, Mirabueno L, Mushtaq A, Hossain S, Ogeh DN, Sakthivel MP, Parker A, Perry M, Piližota I, Prosovetskaia I, Pérez-Silva JG, Salam A, Saraiva-Agostinho N, Schuilenburg H, Sheppard D, Sinha S, Sipos B, Stark W, Steed E, Sukumaran R, Sumathipala D, Suner MM, Surapaneni L, Sutinen K, Szpak M, Tricomi F, Urbina-Gómez D, Veidenberg A, Walsh T, Walts B, Wass E, Willhoft N, Allen J, Alvarez-Jarreta J, Chakiachvili M, Flint B, Giorgetti S, Haggerty L, Ilsley G, Loveland J, Moore B, Mudge J, Tate J, Thybert D, Trevanion S, Winterbottom A, Frankish A, Hunt SE, Ruffier M, Cunningham F, Dyer S, Finn R, Howe K, Harrison PW, Yates AD, Flicek P. Ensembl 2023. Nucleic Acids Res 2022; 51:D933-D941. [PMID: 36318249 PMCID: PMC9825606 DOI: 10.1093/nar/gkac958] [Citation(s) in RCA: 186] [Impact Index Per Article: 93.0] [Received: 09/15/2022] [Revised: 10/06/2022] [Accepted: 10/14/2022] [Indexed: 11/22/2022]
Abstract
Ensembl (https://www.ensembl.org) has produced high-quality genomic resources for vertebrates and model organisms for more than twenty years. During that time, our resources, services and tools have continually evolved in line with both the publicly available genome data and the downstream research and applications that utilise the Ensembl platform. In recent years we have witnessed a dramatic shift in the genomic landscape. There has been a large increase in the number of high-quality reference genomes through global biodiversity initiatives. In parallel, there have been major advances towards pangenome representations of higher species, where many alternative genome assemblies representing different breeds, cultivars, strains and haplotypes are now available. In order to support these efforts and accelerate downstream research, it is our goal at Ensembl to create high-quality annotations, tools and services for species across the tree of life. Here, we report our resources for popular reference genomes, the dramatic growth of our annotations (including haplotypes from the first human pangenome graphs), updates to the Ensembl Variant Effect Predictor (VEP), interactive protein structure predictions from AlphaFold DB, and the beta release of our new website.
Affiliation(s)
- Fergal J Martin
- To whom correspondence should be addressed. Tel: +44 1223 49 44 44;
| | - M Ridwan Amode
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Alisha Aneja
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Olanrewaju Austine-Orimoloye
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Andrey G Azov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - If Barnes
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Arne Becker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Ruth Bennett
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Andrew Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Jyothish Bhai
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Simarpreet Kaur Bhurji
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Alexandra Bignell
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Sanjay Boddu
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Paulo R Branco Lins
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Lucy Brooks
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Shashank Budhanuru Ramaraju
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Mehrnaz Charkhchi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Alexander Cockburn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Luca Da Rin Fiorretto
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Claire Davidson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Kamalkumar Dodiya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Sarah Donaldson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Bilal El Houdaigui
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Tamara El Naboulsi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Reham Fatima
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Thiago Genez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Gurpreet S Ghattaoraya
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Jose Gonzalez Martinez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Cristi Guijarro
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Matthew Hardy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Zoe Hollis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Toby Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Mike Kay
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Vinay Kaykala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Tuan Le
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Diana Lemos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Diego Marques-Coelho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - José Carlos Marugán
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Gabriela Alejandra Merino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Louisse Paola Mirabueno
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Aleena Mushtaq
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Syed Nakib Hossain
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Denye N Ogeh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Manoj Pandian Sakthivel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Anne Parker
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Malcolm Perry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Ivana Piližota
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Irina Prosovetskaia
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - José G Pérez-Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Ahamed Imran Abdul Salam
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Nuno Saraiva-Agostinho
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Helen Schuilenburg
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Dan Sheppard
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Swati Sinha
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Botond Sipos
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - William Stark
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Emily Steed
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Ranjit Sukumaran
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Dulika Sumathipala
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Marie-Marthe Suner
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Kyösti Sutinen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Michal Szpak
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - David Urbina-Gómez
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Andres Veidenberg
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Thomas A Walsh
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Brandon Walts
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Elizabeth Wass
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Natalie Willhoft
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Jorge Alvarez-Jarreta
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Marc Chakiachvili
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Bethany Flint
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Stefano Giorgetti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Garth R Ilsley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Jane E Loveland
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Benjamin Moore
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Jonathan M Mudge
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - John Tate
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - David Thybert
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Stephen J Trevanion
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Andrea Winterbottom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Magali Ruffier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Fiona Cunningham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Sarah Dyer
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Kevin L Howe
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Peter W Harrison
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Andrew D Yates
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, Cambridge, UK
131
Clinical variant interpretation and biologically relevant reference transcripts. NPJ Genom Med 2022; 7:59. [PMID: 36257961 PMCID: PMC9579139 DOI: 10.1038/s41525-022-00329-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 09/29/2022] [Indexed: 12/03/2022]
Abstract
Clinical variant interpretation is highly dependent on the choice of reference transcript. Although the longest transcript has traditionally been chosen as the reference, APPRIS principal and MANE Select transcripts, which are biologically supported reference sequences, are now available. In this study, we show that MANE Select and APPRIS principal transcripts are the best reference transcripts for clinical variation. APPRIS principal and MANE Select transcripts capture almost all ClinVar pathogenic variants, and they are particularly powerful over the 94% of coding genes in which they agree. We find that a vanishingly small number of ClinVar pathogenic variants affect alternative protein products. Alternative isoforms that are likely to be clinically relevant can be predicted using TRIFID scores; the highest-scoring alternative transcripts are almost 700 times more likely to house pathogenic variants. We believe that APPRIS, MANE and TRIFID are essential tools for clinical variant interpretation.
132
Seal RL, Braschi B, Gray K, Jones TEM, Tweedie S, Haim-Vilmovsky L, Bruford EA. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res 2022; 51:D1003-D1009. [PMID: 36243972 PMCID: PMC9825485 DOI: 10.1093/nar/gkac888] [Citation(s) in RCA: 83] [Impact Index Per Article: 41.5] [Received: 09/15/2022] [Revised: 09/28/2022] [Accepted: 10/03/2022] [Indexed: 01/30/2023]
Abstract
The HUGO Gene Nomenclature Committee (HGNC) assigns unique symbols and names to human genes. The HGNC database (www.genenames.org) currently contains over 43 000 approved gene symbols, over 19 200 of which are assigned to protein-coding genes, 14 000 to pseudogenes and nearly 9000 to non-coding RNA genes. The public website, www.genenames.org, displays all approved nomenclature within Symbol Reports that contain data curated by HGNC nomenclature advisors and links to related genomic, clinical, and proteomic information. Here, we describe updates to our resource, including improvements to our search facility and new download features.
Affiliation(s)
- Ruth L Seal
- To whom correspondence should be addressed. Tel: +44 1223 494444; Fax: +44 1223 494446;
| | - Bryony Braschi
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Kristian Gray
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK; Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge CB2 0PT, UK
| | - Tamsin E M Jones
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Susan Tweedie
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Liora Haim-Vilmovsky
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Elspeth A Bruford
- HUGO Gene Nomenclature Committee, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK; Department of Haematology, University of Cambridge School of Clinical Medicine, Cambridge CB2 0PT, UK
133
Tung KF, Lin WC. TEx-MST: tissue expression profiles of MANE select transcripts. Database (Oxford) 2022; 2022:6726258. [PMID: 36170113 PMCID: PMC9518666 DOI: 10.1093/database/baac089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 09/16/2022] [Accepted: 09/23/2022] [Indexed: 12/05/2022]
Abstract
Recently, a new reference transcript dataset [Matched Annotation from the NCBI and EMBL-EBI (MANE) select] was released by NCBI and EMBL-EBI to make available a new unified representative transcript for human protein-coding genes. While the main purpose of the MANE project is to provide a harmonized gene and transcript information standard, there is no explicit tissue expression information about these MANE select transcripts. In this report, we aimed to provide useful expression profiles of MANE select transcripts in various normal human tissues to allow further interrogation of their molecular modulations and functional significance. We obtained the new V9 transcript expression dataset from the Genotype-Tissue Expression (GTEx) web portal. This new GTEx dataset, based on a long-read sequencing platform, affords better assessment of the expression of alternatively spliced transcripts. The tissue expression profiles of MANE select transcripts (TEx-MST) database provides not only basic information on MANE select transcripts but also tissue expression profiles of alternative transcripts in protein-coding genes. Users can initiate the interrogation by gene symbol searches or by browsing the MANE genes with various criteria (such as genome locations or expression rankings). We further utilized the GENCODE biotype feature to identify the top-ranked protein-coding transcripts by choosing the most expressed protein-coding transcripts from GTEx datasets (both V8 and V9 datasets). In summary, there are 18 083 genes matched between MANE and GTEx. Among them, 13 245 MANE select transcripts matched with the top-ranked protein-coding transcripts in the GTEx V9 dataset, which underlines the dominant expression of MANE select transcripts. The TEx-MST web bioinformatic database provides a visualized user interface for the normal tissue expression patterns of MANE select transcripts using the newly released GTEx dataset.
Database URL: TEx-MST is available at https://texmst.ibms.sinica.edu.tw/
Affiliation(s)
- Kuo-Feng Tung
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, R.O.C
- Wen-chang Lin
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan, R.O.C
- Institute of Biomedical Informatics, National Yang-Ming Chiao Tung University, Taipei 112, Taiwan, R.O.C
|
134
|
Seaby EG, Baralle D, Rehm HL, O'Donnell-Luria A, Ennis S. Response to Ramos et al. Genet Med 2022; 24:2593-2594. [PMID: 36121441 DOI: 10.1016/j.gim.2022.08.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Revised: 08/11/2022] [Accepted: 08/15/2022] [Indexed: 10/14/2022]
Affiliation(s)
- Eleanor G Seaby
- Faculty of Medicine, University of Southampton, Southampton, United Kingdom; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
- Diana Baralle
- Faculty of Medicine, University of Southampton, Southampton, United Kingdom
- Heidi L Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; Center for Genomic Medicine, Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA
- Anne O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA
- Sarah Ennis
- Faculty of Medicine, University of Southampton, Southampton, United Kingdom
|
135
|
Pozo F, Rodriguez JM, Martínez Gómez L, Vázquez J, Tress ML. APPRIS principal isoforms and MANE Select transcripts define reference splice variants. Bioinformatics 2022; 38:ii89-ii94. [PMID: 36124785 PMCID: PMC9486585 DOI: 10.1093/bioinformatics/btac473] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/25/2022]
Abstract
MOTIVATION: Selecting the splice variant that best represents a coding gene is a crucial first step in many experimental analyses, and vital for mapping clinically relevant variants. This study compares the longest isoforms, MANE Select transcripts, APPRIS principal isoforms, and expression data, and aims to determine which method is best for selecting biologically important reference splice variants for large-scale analyses. RESULTS: Proteomics analyses and human genetic variation data suggest that most coding genes have a single main protein isoform. We show that APPRIS principal isoforms and MANE Select transcripts best describe these main cellular isoforms, and find that using the longest splice variant as the representative is a poor strategy. Exons unique to the longest splice isoforms are not under selective pressure, and so are unlikely to be functionally relevant. Expression data are also a poor means of selecting the main splice variant. APPRIS principal and MANE Select exons are under purifying selection, while exons specific to alternative transcripts are not. There are MANE and APPRIS representatives for almost 95% of genes, and where they agree they are particularly effective, coinciding with the main proteomics isoform for over 98.2% of genes. AVAILABILITY AND IMPLEMENTATION: APPRIS principal isoforms for human, mouse and other model species can be downloaded from the APPRIS database (https://appris.bioinfo.cnio.es), GENCODE genes (https://www.gencodegenes.org/) and the Ensembl website (https://www.ensembl.org). MANE Select transcripts for the human reference set are available from the Ensembl, GENCODE and RefSeq databases (https://www.ncbi.nlm.nih.gov/refseq/). Lists of splice variants where MANE and APPRIS coincide are available from the APPRIS database. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- José Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- Laura Martínez Gómez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Jesús Vázquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain; CIBER de Investigaciones Cardiovasculares (CIBERCV), 28029 Madrid, Spain
|
136
|
Yang Y, Zhao J, Zeng L, Vihinen M. ProTstab2 for Prediction of Protein Thermal Stabilities. Int J Mol Sci 2022; 23:ijms231810798. [PMID: 36142711 PMCID: PMC9505338 DOI: 10.3390/ijms231810798] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/19/2022] [Revised: 09/12/2022] [Accepted: 09/13/2022] [Indexed: 11/16/2022]
Abstract
The stability of proteins is an essential property that has several biological implications. Knowledge about protein stability is important in many ways, ranging from protein purification and structure determination to stability in cells and biotechnological applications. Experimental determination of thermal stabilities has been tedious and available data have been limited. The introduction of limited proteolysis and mass spectrometry approaches has facilitated more extensive cellular protein stability data production. We collected melting temperature information for 34,913 proteins and developed a machine learning predictor, ProTstab2, by utilizing a gradient boosting algorithm after testing seven algorithms. The method performance was assessed on a blind test data set and showed a Pearson correlation coefficient of 0.753 and root mean square error of 7.005. Comparison to previous methods indicated that ProTstab2 had superior performance. The method is fast, so it was applied to predict and compare the stabilities of all proteins in human, mouse, and zebrafish proteomes for which experimental data were not determined. The tool is freely available.
Affiliation(s)
- Yang Yang
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Jianjun Zhao
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Lianjie Zeng
- School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Mauno Vihinen
- Department of Experimental Medical Science, BMC B13, Lund University, SE-22184 Lund, Sweden
- Correspondence:
|
137
|
Aoi Y, Shah AP, Ganesan S, Soliman SHA, Cho BK, Goo YA, Kelleher NL, Shilatifard A. SPT6 functions in transcriptional pause/release via PAF1C recruitment. Mol Cell 2022; 82:3412-3423.e5. [PMID: 35973425 PMCID: PMC9714687 DOI: 10.1016/j.molcel.2022.06.037] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 11/28/2021] [Revised: 05/11/2022] [Accepted: 06/29/2022] [Indexed: 01/24/2023]
Abstract
It is unclear how various factors functioning in the transcriptional elongation by RNA polymerase II (RNA Pol II) cooperatively regulate pause/release and productive elongation in living cells. Using an acute protein-depletion approach, we report that SPT6 depletion results in the release of paused RNA Pol II into gene bodies through an impaired recruitment of PAF1C. Short genes demonstrate a release with increased mature transcripts, whereas long genes are released but fail to yield mature transcripts, due to a reduced processivity resulting from both SPT6 and PAF1C loss. Unexpectedly, SPT6 depletion causes an association of NELF with the elongating RNA Pol II on gene bodies, without any observed functional significance on transcriptional elongation pattern, arguing against a role for NELF in keeping RNA Pol II in the paused state. Furthermore, SPT6 depletion impairs heat-shock-induced pausing, pointing to a role for SPT6 in regulating RNA Pol II pause/release through PAF1C recruitment.
Affiliation(s)
- Yuki Aoi
- Simpson Querrey Institute for Epigenetics, Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Avani P Shah
- Simpson Querrey Institute for Epigenetics, Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Sheetal Ganesan
- Simpson Querrey Institute for Epigenetics, Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Shimaa H A Soliman
- Simpson Querrey Institute for Epigenetics, Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Byoung-Kyu Cho
- Simpson Querrey Institute for Epigenetics, Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Proteomics Center of Excellence, Northwestern University, Evanston, IL 60611, USA
- Young Ah Goo
- Simpson Querrey Institute for Epigenetics, Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Proteomics Center of Excellence, Northwestern University, Evanston, IL 60611, USA
- Neil L Kelleher
- Simpson Querrey Institute for Epigenetics, Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA; Proteomics Center of Excellence, Northwestern University, Evanston, IL 60611, USA
- Ali Shilatifard
- Simpson Querrey Institute for Epigenetics, Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
|
138
|
Kapral TH, Farnhammer F, Zhao W, Lu ZJ, Zagrovic B. Widespread autogenous mRNA-protein interactions detected by CLIP-seq. Nucleic Acids Res 2022; 50:9984-9999. [PMID: 36107779 PMCID: PMC9508846 DOI: 10.1093/nar/gkac756] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/12/2021] [Revised: 07/12/2022] [Accepted: 08/24/2022] [Indexed: 02/02/2023]
Abstract
Autogenous interactions between mRNAs and the proteins they encode are implicated in cellular feedback-loop regulation, but their extent and mechanistic foundation are unclear. It was recently hypothesized that such interactions may be common, reflecting the role of intrinsic nucleobase-amino acid affinities in shaping the genetic code's structure. Here we analyze a comprehensive set of CLIP-seq experiments involving multiple protocols and report on widespread autogenous interactions across different organisms. Specifically, 230 of 341 (67%) studied RNA-binding proteins (RBPs) interact with their own mRNAs, with a heavy enrichment among high-confidence hits and a preference for coding sequence binding. We account for different confounding variables, including physical (overexpression and proximity during translation), methodological (difference in CLIP protocols, peak callers and cell types) and statistical (treatment of null backgrounds). In particular, we demonstrate a high statistical significance of autogenous interactions by sampling null distributions of fixed-margin interaction matrices. Furthermore, we study the dependence of autogenous binding on the presence of RNA-binding motifs and structured domains in RBPs. Finally, we show that intrinsic nucleobase-amino acid affinities favor co-aligned binding between mRNA coding regions and the proteins they encode. Our results suggest a central role for autogenous interactions in RBP regulation and support the possibility of a fundamental connection between coding and binding.
Affiliation(s)
- Thomas H Kapral
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, A-1030, Austria; Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, A-1030, Austria
- Fiona Farnhammer
- Department of Structural and Computational Biology, Max Perutz Labs, University of Vienna, Vienna, A-1030, Austria; Division of Metabolism, University Children's Hospital Zurich and Children's Research Center, University of Zurich, Zurich, 8032, Switzerland; Division of Oncology, University Children's Hospital Zurich and Children's Research Center, University of Zurich, Zurich, 8032, Switzerland
- Weihao Zhao
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Zhi J Lu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Bojan Zagrovic
- To whom correspondence should be addressed. Tel: +43 1 4277 52271; Fax: +43 1 4277 9522;
|
139
|
Bi-allelic loss-of-function variants in PPFIBP1 cause a neurodevelopmental disorder with microcephaly, epilepsy, and periventricular calcifications. Am J Hum Genet 2022; 109:1421-1435. [PMID: 35830857 PMCID: PMC9388382 DOI: 10.1016/j.ajhg.2022.06.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 06/13/2022] [Indexed: 02/06/2023]
Abstract
PPFIBP1 encodes for the liprin-β1 protein, which has been shown to play a role in neuronal outgrowth and synapse formation in Drosophila melanogaster. By exome and genome sequencing, we detected nine ultra-rare homozygous loss-of-function variants in 16 individuals from 12 unrelated families. The individuals presented with moderate to profound developmental delay, often refractory early-onset epilepsy, and progressive microcephaly. Further common clinical findings included muscular hyper- and hypotonia, spasticity, failure to thrive and short stature, feeding difficulties, impaired vision, and congenital heart defects. Neuroimaging revealed abnormalities of brain morphology with leukoencephalopathy, ventriculomegaly, cortical abnormalities, and intracranial periventricular calcifications as major features. In a fetus with intracranial calcifications, we identified a rare homozygous missense variant that by structural analysis was predicted to disturb the topology of the SAM domain region that is essential for protein-protein interaction. For further insight into the effects of PPFIBP1 loss of function, we performed automated behavioral phenotyping of a Caenorhabditis elegans PPFIBP1/hlb-1 knockout model, which revealed defects in spontaneous and light-induced behavior and confirmed resistance to the acetylcholinesterase inhibitor aldicarb, suggesting a defect in the neuronal presynaptic zone. In conclusion, we establish bi-allelic loss-of-function variants in PPFIBP1 as a cause of an autosomal recessive severe neurodevelopmental disorder with early-onset epilepsy, microcephaly, and periventricular calcifications.
|
140
|
Ellingford JM, Ahn JW, Bagnall RD, Baralle D, Barton S, Campbell C, Downes K, Ellard S, Duff-Farrier C, FitzPatrick DR, Greally JM, Ingles J, Krishnan N, Lord J, Martin HC, Newman WG, O’Donnell-Luria A, Ramsden SC, Rehm HL, Richardson E, Singer-Berk M, Taylor JC, Williams M, Wood JC, Wright CF, Harrison SM, Whiffin N. Recommendations for clinical interpretation of variants found in non-coding regions of the genome. Genome Med 2022; 14:73. [PMID: 35850704 PMCID: PMC9295495 DOI: 10.1186/s13073-022-01073-3] [Citation(s) in RCA: 60] [Impact Index Per Article: 30.0] [Received: 02/09/2022] [Accepted: 06/16/2022] [Indexed: 01/28/2023]
Abstract
BACKGROUND: The majority of clinical genetic testing focuses almost exclusively on regions of the genome that directly encode proteins. The important role of variants in non-coding regions in penetrant disease is, however, increasingly being demonstrated, and the use of whole genome sequencing in clinical diagnostic settings is rising across a large range of genetic disorders. Despite this, there is no existing guidance on how current guidelines designed primarily for variants in protein-coding regions should be adapted for variants identified in other genomic contexts. METHODS: We convened a panel of nine clinical and research scientists with wide-ranging expertise in clinical variant interpretation, with specific experience in variants within non-coding regions. This panel discussed and refined an initial draft of the guidelines which were then extensively tested and reviewed by external groups. RESULTS: We discuss considerations specifically for variants in non-coding regions of the genome. We outline how to define candidate regulatory elements, highlight examples of mechanisms through which non-coding region variants can lead to penetrant monogenic disease, and outline how existing guidelines can be adapted for the interpretation of these variants. CONCLUSIONS: These recommendations aim to increase the number and range of non-coding region variants that can be clinically interpreted, which, together with a compatible phenotype, can lead to new diagnoses and catalyse the discovery of novel disease mechanisms.
Affiliation(s)
- Jamie M. Ellingford
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, M13 9PT UK; Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL UK; Genomics England, London, UK
- Joo Wook Ahn
- Cambridge Genomics Laboratory, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK
- Richard D. Bagnall
- Agnes Ginges Centre for Molecular Cardiology at Centenary Institute, University of Sydney, Sydney, Australia
- Diana Baralle
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK; Wessex Clinical Genetics Service, University Hospital Southampton NHS Foundation Trust, Southampton, UK
- Stephanie Barton
- Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL UK
- Chris Campbell
- Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL UK
- Kate Downes
- Cambridge Genomics Laboratory, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, UK
- Sian Ellard
- Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK; South West Genomic Laboratory Hub, Exeter Genomic Laboratory, Royal Devon and Exeter NHS Foundation Trust, Exeter, UK
- Celia Duff-Farrier
- South West NHS Genomic Laboratory Hub, Bristol Genetics Laboratory, North Bristol NHS Trust, Bristol, UK
- David R. FitzPatrick
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Edinburgh, UK
- John M. Greally
- Department of Pediatrics, Division of Pediatric Genetic Medicine, Children’s Hospital at Montefiore/Montefiore Medical Center/Albert Einstein College of Medicine, Bronx, NY USA
- Jodie Ingles
- Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia; Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
- Neesha Krishnan
- Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia; Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
- Jenny Lord
- School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
- Hilary C. Martin
- Human Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- William G. Newman
- Division of Evolution, Infection and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, M13 9PT UK; Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL UK
- Anne O’Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA USA; Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA USA
- Simon C. Ramsden
- Manchester Centre for Genomic Medicine, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, M13 9WL UK
- Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA USA
- Ebony Richardson
- Centre for Population Genomics, Garvan Institute of Medical Research, and UNSW Sydney, Sydney, Australia; Centre for Population Genomics, Murdoch Children’s Research Institute, Melbourne, Australia
- Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA USA
- Jenny C. Taylor
- National Institute for Health Research Oxford Biomedical Research Centre, Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN UK; Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN UK
- Maggie Williams
- South West NHS Genomic Laboratory Hub, Bristol Genetics Laboratory, North Bristol NHS Trust, Bristol, UK
- Jordan C. Wood
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA USA
- Caroline F. Wright
- Institute of Biomedical and Clinical Science, University of Exeter Medical School, Exeter, UK
- Steven M. Harrison
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA USA; Ambry Genetics, Aliso Viejo, CA USA
- Nicola Whiffin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA USA; Wellcome Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN UK
|
141
|
Smith AP, Creagh EM. Caspase-4 and -5 Biology in the Pathogenesis of Inflammatory Bowel Disease. Front Pharmacol 2022; 13:919567. [PMID: 35712726 PMCID: PMC9194562 DOI: 10.3389/fphar.2022.919567] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/13/2022] [Accepted: 05/11/2022] [Indexed: 12/14/2022]
Abstract
Inflammatory bowel disease (IBD) is a chronic relapsing inflammatory disease of the gastrointestinal tract, associated with high levels of inflammatory cytokine production. Human caspases-4 and -5, and their murine ortholog caspase-11, are essential components of the innate immune pathway, capable of sensing and responding to intracellular lipopolysaccharide (LPS), a component of Gram-negative bacteria. Following their activation by LPS, these caspases initiate potent inflammation by causing pyroptosis, a lytic form of cell death. While this pathway is essential for host defence against bacterial infection, it is also negatively associated with inflammatory pathologies. Caspases-4/-5/-11 display increased intestinal expression during IBD and have been implicated in chronic IBD inflammation. This review discusses the current literature in this area, identifying links between inflammatory caspase activity and IBD in both human and murine models. Differences in the expression and functions of caspases-4, -5 and -11 are discussed, in addition to mechanisms of their activation, function and regulation, and how these mechanisms may contribute to the pathogenesis of IBD.
Affiliation(s)
- Aoife P Smith
- School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin, Ireland
- Emma M Creagh
- School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin, Ireland
|
142
|
Shirota M, Kinoshita K. Current status and future perspectives of the evaluation of missense variants by using three-dimensional structures of proteins. Biophys Physicobiol 2022; 19:e190023. [PMID: 36071878 PMCID: PMC9402263 DOI: 10.2142/biophysico.bppb-v19.0023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 07/12/2022] [Indexed: 12/01/2022]
|