1. Percudani R, De Rito C. Predicting Protein Function in the AI and Big Data Era. Biochemistry 2025. [PMID: 40380914 DOI: 10.1021/acs.biochem.5c00186] [Indexed: 05/19/2025]
Abstract
It is an exciting time for researchers working to link proteins to their functions. Most techniques for extracting functional information from genomic sequences were developed several years ago, with major progress driven by the availability of big data. Now, groundbreaking advances in deep-learning and AI-based methods have enriched protein databases with three-dimensional information and offer the potential to predict biochemical properties and biomolecular interactions, providing key functional insights. This progress is expected to increase the proportion of functionally bright proteins in databases and deepen our understanding of life at the molecular level.
Affiliation(s)
- Riccardo Percudani
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, 43124 Parma, Italy
| | - Carlo De Rito
- Department of Chemistry, Life Sciences and Environmental Sustainability, University of Parma, 43124 Parma, Italy
2. Humphrey J, Brophy E, Kosoy R, Zeng B, Coccia E, Mattei D, Ravi A, Naito T, Efthymiou AG, Navarro E, De Sanctis C, Flores-Almazan V, Muller BZ, Snijders GJLJ, Allan A, Münch A, Kitata RB, Kleopoulos SP, Argyriou S, Malakates P, Psychogyiou K, Shao Z, Francoeur N, Tsai CF, Gritsenko MA, Monroe ME, Paurus VL, Weitz KK, Shi T, Sebra R, Liu T, de Witte LD, Goate AM, Bennett DA, Haroutunian V, Hoffman GE, Fullard JF, Roussos P, Raj T. Long-read RNA sequencing atlas of human microglia isoforms elucidates disease-associated genetic regulation of splicing. Nat Genet 2025; 57:604-615. [PMID: 40033057 DOI: 10.1038/s41588-025-02099-0] [Received: 09/13/2023] [Accepted: 01/23/2025] [Indexed: 03/05/2025]
Abstract
Microglia, the innate immune cells of the central nervous system, have been genetically implicated in multiple neurodegenerative diseases. Mapping the genetics of gene expression in human microglia has identified several loci associated with disease-associated genetic variants in microglia-specific regulatory elements. However, identifying genetic effects on splicing is challenging because of the use of short sequencing reads. Here, we present the isoform-centric microglia genomic atlas (isoMiGA), which leverages long-read RNA sequencing to identify 35,879 novel microglia isoforms. We show that these isoforms are involved in stimulation response and brain region specificity. We then quantified the expression of both known and novel isoforms in a multi-ancestry meta-analysis of 555 human microglia short-read RNA sequencing samples from 391 donors, and found associations with genetic risk loci in Alzheimer's and Parkinson's disease. We nominate several loci that may act through complex changes in isoform and splice-site usage.
Affiliation(s)
- Jack Humphrey
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Erica Brophy
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Roman Kosoy
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Biao Zeng
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Elena Coccia
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Daniele Mattei
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ashvin Ravi
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Tatsuhiko Naito
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Anastasia G Efthymiou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Elisa Navarro
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Biochemistry and Molecular Biology, Universidad Complutense de Madrid, Madrid, Spain
- Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
- Instituto Ramon y Cajal de Investigacion Sanitaria (IRYCIS), Madrid, Spain
| | - Claudia De Sanctis
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Pathology, Department of Artificial Intelligence & Human Health, Neuropathology Brain Bank & Research CoRE, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Victoria Flores-Almazan
- Department of Pathology, Department of Artificial Intelligence & Human Health, Neuropathology Brain Bank & Research CoRE, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Benjamin Z Muller
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Gijsje J L J Snijders
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Amanda Allan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Alexandra Münch
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Reta Birhanu Kitata
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Steven P Kleopoulos
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Stathis Argyriou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Periklis Malakates
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Konstantina Psychogyiou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Zhiping Shao
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Nancy Francoeur
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Chia-Feng Tsai
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Marina A Gritsenko
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Matthew E Monroe
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Vanessa L Paurus
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Karl K Weitz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Tujin Shi
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Tao Liu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Lot D de Witte
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Alison M Goate
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Vahram Haroutunian
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mental Illness Research Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA
| | - Gabriel E Hoffman
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - John F Fullard
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Panos Roussos
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Mental Illness Research Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA.
| | - Towfique Raj
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Ronald M. Loeb Center for Alzheimer's Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
3. Loving RK, Sullivan DK, Booeshagi AS, Reese F, Rebboah E, Sakr J, Rezaie N, Liang HY, Filimban G, Kawauchi S, Oakes C, Trout D, Williams BA, MacGregor G, Wold BJ, Mortazavi A, Pachter L. Long-read sequencing transcriptome quantification with lr-kallisto. bioRxiv 2025:2024.07.19.604364. [PMID: 39071335 PMCID: PMC11275803 DOI: 10.1101/2024.07.19.604364] [Indexed: 07/30/2024]
Abstract
RNA abundance quantification has become routine and affordable thanks to high-throughput "short-read" technologies that provide accurate molecule counts at the gene level. Similarly accurate and affordable quantification of definitive, full-length transcript isoforms has remained a stubborn challenge, despite its obvious biological significance across a wide range of problems. "Long-read" sequencing platforms now produce data types that can, in principle, drive routine definitive isoform quantification. However, some particulars of contemporary long-read data types, together with isoform complexity and genetic variation, present bioinformatic challenges. We show here, using ONT data, that fast and accurate quantification of long-read data is possible and that it is improved by exome capture. To perform quantifications we developed lr-kallisto, which adapts the kallisto bulk and single-cell RNA-seq quantification methods for long-read technologies.
Affiliation(s)
- Rebekah K. Loving
- Division of Biology and Biological Engineering, California Institute of Technology, USA
| | - Delaney K. Sullivan
- Division of Biology and Biological Engineering, California Institute of Technology, USA
- UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - A. Sina Booeshagi
- Department of Bioengineering, University of California, Berkeley, Berkeley, USA
| | - Fairlie Reese
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, USA
| | - Elisabeth Rebboah
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, USA
| | - Jasmine Sakr
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, USA
| | - Narges Rezaie
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, USA
| | - Heidi Y. Liang
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, USA
| | - Ghassan Filimban
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, USA
| | - Shimako Kawauchi
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
| | - Conrad Oakes
- Division of Biology and Biological Engineering, California Institute of Technology, USA
| | - Diane Trout
- Division of Biology and Biological Engineering, California Institute of Technology, USA
| | - Brian A. Williams
- Division of Biology and Biological Engineering, California Institute of Technology, USA
| | - Grant MacGregor
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
| | - Barbara J. Wold
- Division of Biology and Biological Engineering, California Institute of Technology, USA
| | - Ali Mortazavi
- Developmental and Cell Biology, University of California Irvine, Irvine, USA
- Center for Complex Biological Systems, University of California Irvine, Irvine, USA
| | - Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, USA
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, USA
4. García-Ruiz S, Zhang D, Gustavsson EK, Rocamora-Perez G, Grant-Peters M, Fairbrother-Browne A, Reynolds RH, Brenton JW, Gil-Martínez AL, Chen Z, Rio DC, Botia JA, Guelfi S, Collado-Torres L, Ryten M. Splicing accuracy varies across human introns, tissues, age and disease. Nat Commun 2025; 16:1068. [PMID: 39870615 PMCID: PMC11772838 DOI: 10.1038/s41467-024-55607-x] [Received: 04/11/2023] [Accepted: 12/17/2024] [Indexed: 01/29/2025]
Abstract
Alternative splicing impacts most multi-exonic human genes. Inaccuracies during this process may have an important role in ageing and disease. Here, we investigate splicing accuracy using RNA-sequencing data from >14k control samples and 40 human body sites, focusing on split reads partially mapping to known transcripts in annotation. We show that splicing inaccuracies occur at different rates across introns and tissues and are affected by the abundance of core components of the spliceosome assembly and its regulators. We find that age is positively correlated with a global decline in splicing fidelity, mostly affecting genes implicated in neurodegenerative diseases. We find support for the latter by observing a genome-wide increase in splicing inaccuracies in samples affected with Alzheimer's disease as compared to neurologically normal individuals. In this work, we provide an in-depth characterisation of splicing accuracy, with implications for our understanding of the role of inaccuracies in ageing and neurodegenerative disorders.
Affiliation(s)
- S García-Ruiz
- UK Dementia Research Institute, University of Cambridge, Cambridge, United Kingdom
- Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, United Kingdom
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, United Kingdom
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - D Zhang
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, United Kingdom
| | - E K Gustavsson
- UK Dementia Research Institute, University of Cambridge, Cambridge, United Kingdom
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, United Kingdom
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - G Rocamora-Perez
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, United Kingdom
| | - M Grant-Peters
- UK Dementia Research Institute, University of Cambridge, Cambridge, United Kingdom
- Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, United Kingdom
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - A Fairbrother-Browne
- UK Dementia Research Institute, University of Cambridge, Cambridge, United Kingdom
- Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, United Kingdom
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - R H Reynolds
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, United Kingdom
| | - J W Brenton
- UK Dementia Research Institute, University of Cambridge, Cambridge, United Kingdom
- Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, United Kingdom
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
| | - A L Gil-Martínez
- Department of Clinical and Movement Neuroscience, Queen Square Institute of Neurology, UCL, London, United Kingdom
| | - Z Chen
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, United Kingdom
- Department of Clinical and Movement Neuroscience, Queen Square Institute of Neurology, UCL, London, United Kingdom
- The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, United Kingdom
| | - D C Rio
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, 94720, USA
| | - J A Botia
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
| | - S Guelfi
- Department of Clinical and Movement Neuroscience, Queen Square Institute of Neurology, UCL, London, United Kingdom
| | - L Collado-Torres
- Lieber Institute for Brain Development, Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
| | - M Ryten
- UK Dementia Research Institute, University of Cambridge, Cambridge, United Kingdom.
- Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom.
- Department of Genetics and Genomic Medicine Research & Teaching, UCL GOS Institute of Child Health, London, United Kingdom.
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, United Kingdom.
- Aligning Science Across Parkinson's (ASAP) Collaborative Research Network, Chevy Chase, MD, 20815, USA.
5. Margasyuk S, Kuznetsova A, Zavileyskiy L, Vlasenok M, Skvortsov D, Pervouchine D. Human introns contain conserved tissue-specific cryptic poison exons. NAR Genom Bioinform 2024; 6:lqae163. [PMID: 39664813 PMCID: PMC11632617 DOI: 10.1093/nargab/lqae163] [Received: 07/05/2024] [Revised: 10/10/2024] [Accepted: 11/10/2024] [Indexed: 12/13/2024]
Abstract
Eukaryotic cells express a large number of transcripts from a single gene due to alternative splicing. Despite hundreds of thousands of splice isoforms being annotated in databases, it has been reported that the current exon catalogs remain incomplete. At the same time, introns of human protein-coding (PC) genes contain a large number of evolutionarily conserved elements with unknown function. Here, we explore the possibility that some of them represent cryptic exons that are expressed in rare conditions. We identified a group of cryptic exons that are similar to the annotated exons in terms of evolutionary conservation and RNA-seq read coverage in the Genotype-Tissue Expression dataset. Most of them were poison exons, i.e. they generated a nonsense-mediated decay (NMD) isoform upon inclusion, and many showed signs of tissue-specific and cancer-specific expression and regulation. We performed RNA-seq in the A549 cell line treated with cycloheximide to inactivate NMD and confirmed using quantitative polymerase chain reaction that seven of eight exons tested are, indeed, expressed. This study shows that introns of human PC genes contain cryptic poison exons, which reside in conserved intronic regions and remain incompletely annotated due to insufficient representation in RNA-seq libraries.
Affiliation(s)
- Sergey Margasyuk
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
| | - Antonina Kuznetsova
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
| | - Lev Zavileyskiy
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
| | - Maria Vlasenok
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
| | - Dmitry Skvortsov
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
- Faculty of Chemistry, Moscow State University, Ul Kolmogorova, 1, 119991, Moscow, Russia
| | - Dmitri D Pervouchine
- Center for Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Bolshoy Bulvar, 30, 121205, Moscow, Russia
| |
|
6
|
Tang L, Xu D, Luo L, Ma W, He X, Diao Y, Ke R, Kapranov P. A novel human protein-coding locus identified using a targeted RNA enrichment technique. BMC Biol 2024; 22:273. [PMID: 39593153 PMCID: PMC11590353 DOI: 10.1186/s12915-024-02069-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2024] [Accepted: 11/12/2024] [Indexed: 11/28/2024] Open
Abstract
BACKGROUND: Accurate and comprehensive genomic annotation, including the full list of protein-coding genes, is vital for understanding the molecular mechanisms of human biology. We have previously shown that the genome contains a multitude of yet hidden functional exons and transcripts, some of which might represent novel mRNAs. These results resonate with those from other groups and strongly argue that two decades after the completion of the first draft of the human genome sequence, the current annotation of human genes and transcripts remains far from complete. RESULTS: Using a targeted RNA enrichment technique, we showed that one of the novel functional exons previously discovered by us and currently annotated as part of a long non-coding RNA is actually part of a novel protein-coding gene, InSETG-4, which encodes a novel human protein with no known homologs or motifs. We found that InSETG-4 is induced by various DNA-damaging agents across multiple cell types and therefore might represent a novel component of the DNA damage response. Despite its low abundance in bulk cell populations, InSETG-4 exhibited expression restricted to a small fraction of cells, as demonstrated by the amplification-based single-molecule fluorescence in situ hybridization (asmFISH) analysis. CONCLUSIONS: This study argues that yet undiscovered human protein-coding genes exist and provides an example of how targeted RNA enrichment techniques can help to fill this major gap in our knowledge of the information encoded in the human genome.
Affiliation(s)
- Lu Tang
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
| | - Dongyang Xu
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
| | - Lingcong Luo
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
| | - Weiyan Ma
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
| | - Xiaojie He
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
| | - Yong Diao
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
| | - Rongqin Ke
- School of Medicine, Huaqiao University, 668 Jimei Road, Xiamen, 361021, China
| | - Philipp Kapranov
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen, 361102, China
| |
|
7
|
Zang XC, Chen K, Khan IM, Shao M. Augmenting Transcriptome Annotations through the Lens of Splicing Evolution. bioRxiv 2024:2024.11.04.621892. [PMID: 39574730 PMCID: PMC11580973 DOI: 10.1101/2024.11.04.621892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2024]
Abstract
Alternative splicing (AS) is a ubiquitous mechanism in eukaryotes. It is estimated that 90% of human genes are alternatively spliced. Despite enormous efforts, transcriptome annotations remain incomplete. Conventional means of annotation were largely driven by experimental data such as RNA-seq and protein sequences, while little insight was shed on understanding transcriptomes and alternative splicing from the perspective of evolution. This study addresses this critical gap by presenting TENNIS (Transcript EvolutioN for New Isoform Splicing), an evolution-based model to predict unannotated isoforms and refine existing annotations without requiring additional data. The model of TENNIS is based on two minimal premises: AS isoforms evolve sequentially from existing isoforms, and each evolutionary step involves a single AS event. We formulate the identification of missing transcripts as an optimization problem and parsimoniously find the minimal number of novel transcripts. Our analysis showed that approximately 80% of multi-transcript groups from six transcriptome annotations satisfy our evolutionary model. At a high confidence level, 40% of isoforms predicted by TENNIS were validated by deep long-read RNA-seq. In a simulated incomplete annotation scenario, TENNIS dramatically outperforms two randomized baseline approaches by a 2.25-3 fold-change in precision or a 3.5-3.9 fold-change in recall, after controlling for the same level of recall or precision of the baseline methods. These results demonstrate that TENNIS effectively identifies missing transcripts by complying with minimal propositions, offering a powerful approach for transcriptome augmentation through the lens of alternative splicing evolution. TENNIS is freely available at https://github.com/Shao-Group/tennis.
Affiliation(s)
- Xiaofei Carl Zang
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
| | - Ke Chen
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
| | - Irtesam Mahmud Khan
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
| | - Mingfu Shao
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
| |
|
8
|
Haj Abdullah Alieh L, Cardoso de Toledo B, Hadarovich A, Toth-Petroczy A, Calegari F. Characterization of alternative splicing during mammalian brain development reveals the extent of isoform diversity and potential effects on protein structural changes. Biol Open 2024; 13:bio061721. [PMID: 39387301 PMCID: PMC11554263 DOI: 10.1242/bio.061721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Accepted: 09/09/2024] [Indexed: 10/15/2024] Open
Abstract
Regulation of gene expression is critical for fate commitment of stem and progenitor cells during tissue formation. In the context of mammalian brain development, a plethora of studies have described how changes in the expression of individual genes characterize cell types across ontogeny and phylogeny. However, little attention has been paid to the fact that different transcripts can arise from any given gene through alternative splicing (AS). Although AS is considered a key mechanism expanding transcriptome diversity during evolution, assessing its full potential to affect isoform diversity and protein function has been notoriously difficult. Here, we capitalize on the use of a validated reporter mouse line to isolate neural stem cells, neurogenic progenitors and neurons during corticogenesis and combine the use of short- and long-read sequencing to reconstruct the full transcriptome diversity characterizing neurogenic commitment. Extending available transcriptional profiles of the mammalian brain by nearly 50,000 new isoforms, we found that neurogenic commitment is characterized by a progressive increase in exon inclusion resulting in the profound remodeling of the transcriptional profile of specific cortical cell types. Most importantly, we computationally infer the biological significance of AS on protein structure by using AlphaFold2, revealing how radical protein conformational changes can arise from subtle changes in isoform sequence. Together, our study reveals that AS has a greater potential to impact protein diversity and function than previously thought, independently of changes in gene expression.
Affiliation(s)
| | | | - Anna Hadarovich
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Agnes Toth-Petroczy
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- Cluster of Excellence Physics of Life, TU Dresden, 01062 Dresden, Germany
| | - Federico Calegari
- CRTD-Center for Regenerative Therapies Dresden, School of Medicine, TU Dresden, Germany
| |
|
9
|
Gustavsson EK, Sethi S, Gao Y, Brenton JW, García-Ruiz S, Zhang D, Garza R, Reynolds RH, Evans JR, Chen Z, Grant-Peters M, Macpherson H, Montgomery K, Dore R, Wernick AI, Arber C, Wray S, Gandhi S, Esselborn J, Blauwendraat C, Douse CH, Adami A, Atacho DAM, Kouli A, Quaegebeur A, Barker RA, Englund E, Platt F, Jakobsson J, Wood NW, Houlden H, Saini H, Bento CF, Hardy J, Ryten M. The annotation of GBA1 has been concealed by its protein-coding pseudogene GBAP1. Science Advances 2024; 10:eadk1296. [PMID: 38924406 PMCID: PMC11204300 DOI: 10.1126/sciadv.adk1296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 05/17/2024] [Indexed: 06/28/2024]
Abstract
Mutations in GBA1 cause Gaucher disease and are the most important genetic risk factor for Parkinson's disease. However, analysis of transcription at this locus is complicated by its highly homologous pseudogene, GBAP1. We show that >50% of short RNA-sequencing reads mapping to GBA1 also map to GBAP1. Thus, we used long-read RNA sequencing in the human brain, which allowed us to accurately quantify expression from both GBA1 and GBAP1. We discovered significant differences in expression compared to short-read data and identified currently unannotated transcripts of both GBA1 and GBAP1. These included protein-coding transcripts from both genes that were translated in human brain, but without the known lysosomal function, yet accounting for almost a third of transcription. Analyzing brain-specific cell types using long-read and single-nucleus RNA sequencing revealed region-specific variations in transcript expression. Overall, these findings suggest nonlysosomal roles for GBA1 and GBAP1 with implications for our understanding of the role of GBA1 in health and disease.
Affiliation(s)
- Emil K. Gustavsson
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
| | - Siddharth Sethi
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Astex Pharmaceuticals, 436 Cambridge Science Park, Cambridge, UK
| | - Yujing Gao
- Astex Pharmaceuticals, 436 Cambridge Science Park, Cambridge, UK
| | - Jonathan W. Brenton
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
| | - Sonia García-Ruiz
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
| | - David Zhang
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - Raquel Garza
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center and Lund Stem Cell Center, Lund, Sweden
| | - Regina H. Reynolds
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
| | - James R. Evans
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, London, UK
- The Francis Crick Institute, London, UK
| | - Zhongbo Chen
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Melissa Grant-Peters
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
| | - Hannah Macpherson
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Kylie Montgomery
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Rhys Dore
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - Anna I. Wernick
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, London, UK
- The Francis Crick Institute, London, UK
| | - Charles Arber
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Selina Wray
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Sonia Gandhi
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, London, UK
- The Francis Crick Institute, London, UK
| | - Julian Esselborn
- Astex Pharmaceuticals, 436 Cambridge Science Park, Cambridge, UK
| | - Cornelis Blauwendraat
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
| | - Christopher H. Douse
- Laboratory of Epigenetics and Chromatin Dynamics, Department of Experimental Medical Science, Lund Stem Cell Center, Lund University, Lund, Sweden
| | - Anita Adami
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center and Lund Stem Cell Center, Lund, Sweden
| | - Diahann A. M. Atacho
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center and Lund Stem Cell Center, Lund, Sweden
| | - Antonina Kouli
- Wellcome-MRC Cambridge Stem Cell Institute and John Van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
| | - Annelies Quaegebeur
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
- Department of Clinical Neurosciences, University of Cambridge, Clifford Albutt Building, Cambridge, UK
| | - Roger A. Barker
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
- Wellcome-MRC Cambridge Stem Cell Institute and John Van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
| | | | - Frances Platt
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
- Department of Pharmacology, University of Oxford, Oxford, UK
| | - Johan Jakobsson
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
- Laboratory of Molecular Neurogenetics, Department of Experimental Medical Science, Wallenberg Neuroscience Center and Lund Stem Cell Center, Lund, Sweden
| | - Nicholas W. Wood
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Henry Houlden
- Department of Neuromuscular Disease, UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Harpreet Saini
- Astex Pharmaceuticals, 436 Cambridge Science Park, Cambridge, UK
| | - Carla F. Bento
- Astex Pharmaceuticals, 436 Cambridge Science Park, Cambridge, UK
| | - John Hardy
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Reta Lila Weston Institute, UCL Queen Square Institute of Neurology, UCL, London, UK
- UK Dementia Research Institute at UCL, UCL Queen Square Institute of Neurology, UCL, London, UK
- NIHR University College London Hospitals Biomedical Research Centre, London, UK
- Institute for Advanced Study, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | - Mina Ryten
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Aligning Science Across Parkinson’s (ASAP) Collaborative Research Network, Chevy Chase, MD 20815, USA
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
| |
|
10
|
Wen C, Margolis M, Dai R, Zhang P, Przytycki PF, Vo DD, Bhattacharya A, Matoba N, Tang M, Jiao C, Kim M, Tsai E, Hoh C, Aygün N, Walker RL, Chatzinakos C, Clarke D, Pratt H, Peters MA, Gerstein M, Daskalakis NP, Weng Z, Jaffe AE, Kleinman JE, Hyde TM, Weinberger DR, Bray NJ, Sestan N, Geschwind DH, Roeder K, Gusev A, Pasaniuc B, Stein JL, Love MI, Pollard KS, Liu C, Gandal MJ. Cross-ancestry atlas of gene, isoform, and splicing regulation in the developing human brain. Science 2024; 384:eadh0829. [PMID: 38781368 DOI: 10.1126/science.adh0829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 03/07/2024] [Indexed: 05/25/2024]
Abstract
Neuropsychiatric genome-wide association studies (GWASs), including those for autism spectrum disorder and schizophrenia, show strong enrichment for regulatory elements in the developing brain. However, prioritizing risk genes and mechanisms is challenging without a unified regulatory atlas. Across 672 diverse developing human brains, we identified 15,752 genes harboring gene, isoform, and/or splicing quantitative trait loci, mapping 3739 to cellular contexts. Gene expression heritability drops during development, likely reflecting both increasing cellular heterogeneity and the intrinsic properties of neuronal maturation. Isoform-level regulation, particularly in the second trimester, mediated the largest proportion of GWAS heritability. Through colocalization, we prioritized mechanisms for about 60% of GWAS loci across five disorders, exceeding adult brain findings. Finally, we contextualized results within gene and isoform coexpression networks, revealing the comprehensive landscape of transcriptome regulation in development and disease.
Affiliation(s)
- Cindy Wen
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Michael Margolis
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
| | - Pan Zhang
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Pawel F Przytycki
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
| | - Daniel D Vo
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Nana Matoba
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Miao Tang
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Chuan Jiao
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, Team Krebs, 75014 Paris, France
| | - Minsoo Kim
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Ellen Tsai
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Celine Hoh
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Rebecca L Walker
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Christos Chatzinakos
- Department of Psychiatry, Harvard Medical School, Boston, MA 02215, USA
- McLean Hospital, Belmont, MA 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Declan Clarke
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
| | - Henry Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Mette A Peters
- CNS Data Coordination Group, Sage Bionetworks, Seattle, WA 98109, USA
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
| | - Nikolaos P Daskalakis
- Department of Psychiatry, Harvard Medical School, Boston, MA 02215, USA
- McLean Hospital, Belmont, MA 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Andrew E Jaffe
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Neumora Therapeutics, Watertown, MA 02472, USA
| | - Joel E Kleinman
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Thomas M Hyde
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Daniel R Weinberger
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Nicholas J Bray
- MRC Centre for Neuropsychiatric Genetics & Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University School of Medicine, Cardiff CF24 4HQ, UK
| | - Nenad Sestan
- Department of Comparative Medicine, Yale University School of Medicine, New Haven, CT 06520, USA
- Department of Neuroscience, Yale University School of Medicine, New Haven, CT 06520, USA
| | - Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Kathryn Roeder
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Alexander Gusev
- Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Harvard Medical School, Boston, MA 02215, USA
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02215, USA
| | - Bogdan Pasaniuc
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
| | - Michael J Gandal
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| |
|
11
|
Richardson R, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: Understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. eLife 2024; 12:RP93429. [PMID: 38546716 PMCID: PMC10977968 DOI: 10.7554/elife.93429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/01/2024] Open
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese Richardson
- Interdisciplinary Biological Sciences, Northwestern University, Evanston, United States
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
| | - Heliodoro Tejedor Navarro
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
| | - Luis A Nunes Amaral
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
- Department of Molecular Biosciences, Northwestern University, Evanston, United States
- Department of Physics and Astronomy, Northwestern University, Evanston, United States
| | - Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- The Potocsnak Longevity Institute, Northwestern University, Chicago, United States
- Simpson Querrey Lung Institute for Translational Science, Northwestern University, Chicago, United States
| |
|
12
|
Richardson RAK, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.02.28.530483. [PMID: 36909550 PMCID: PMC10002660 DOI: 10.1101/2023.02.28.530483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese AK Richardson
- Interdisciplinary Biological Sciences, Northwestern University
- Department of Chemical and Biological Engineering, Northwestern University
| | - Heliodoro Tejedor Navarro
- Department of Chemical and Biological Engineering, Northwestern University
- Northwestern Institute on Complex Systems, Northwestern University
| | - Luis A Nunes Amaral
- Department of Chemical and Biological Engineering, Northwestern University
- Northwestern Institute on Complex Systems, Northwestern University
- Department of Physics and Astronomy, Northwestern University
- Department of Molecular Biosciences, Northwestern University
| | - Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University
- The Potocsnak Longevity Institute, Northwestern University
- Simpson Querrey Lung Institute for Translational Science, Northwestern University
| |
|
13
|
Humphrey J, Brophy E, Kosoy R, Zeng B, Coccia E, Mattei D, Ravi A, Efthymiou AG, Navarro E, Muller BZ, Snijders GJLJ, Allan A, Münch A, Kitata RB, Kleopoulos SP, Argyriou S, Shao Z, Francoeur N, Tsai CF, Gritsenko MA, Monroe ME, Paurus VL, Weitz KK, Shi T, Sebra R, Liu T, de Witte LD, Goate AM, Bennett DA, Haroutunian V, Hoffman GE, Fullard JF, Roussos P, Raj T. Long-read RNA-seq atlas of novel microglia isoforms elucidates disease-associated genetic regulation of splicing. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.01.23299073. [PMID: 38076956 PMCID: PMC10705658 DOI: 10.1101/2023.12.01.23299073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2023]
Abstract
Microglia, the innate immune cells of the central nervous system, have been genetically implicated in multiple neurodegenerative diseases. We previously mapped the genetic regulation of gene expression and mRNA splicing in human microglia, identifying several loci where common genetic variants in microglia-specific regulatory elements explain disease risk loci identified by GWAS. However, identifying genetic effects on splicing has been challenging due to the use of short sequencing reads to identify causal isoforms. Here we present the isoform-centric microglia genomic atlas (isoMiGA) which leverages the power of long-read RNA-seq to identify 35,879 novel microglia isoforms. We show that the novel microglia isoforms are involved in stimulation response and brain region specificity. We then quantified the expression of both known and novel isoforms in a multi-ethnic meta-analysis of 555 human microglia short-read RNA-seq samples from 391 donors, the largest to date, and found associations with genetic risk loci in Alzheimer's disease and Parkinson's disease. We nominate several loci that may act through complex changes in isoform and splice site usage.
Affiliation(s)
- Jack Humphrey
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Erica Brophy
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Roman Kosoy
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Biao Zeng
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Elena Coccia
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Daniele Mattei
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ashvin Ravi
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Anastasia G. Efthymiou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Elisa Navarro
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Biochemistry and Molecular Biology, Faculty of Medicine (Universidad Complutense de Madrid), Madrid, Spain
- Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
- Instituto Ramon y Cajal de Investigacion Sanitaria (IRYCIS), Madrid, Spain
| | - Benjamin Z. Muller
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Gijsje JLJ Snijders
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Amanda Allan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Alexandra Münch
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Reta Birhanu Kitata
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Steven P Kleopoulos
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Stathis Argyriou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Zhiping Shao
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Nancy Francoeur
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Chia-Feng Tsai
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Marina A Gritsenko
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Matthew E Monroe
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Vanessa L Paurus
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Karl K Weitz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Tujin Shi
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Tao Liu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
| | - Lot D. de Witte
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Alison M. Goate
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - David A. Bennett
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, USA
| | - Vahram Haroutunian
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mental Illness Research, Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA
| | - Gabriel E. Hoffman
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - John F. Fullard
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
| | - Panos Roussos
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Mental Illness Research, Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA
| | - Towfique Raj
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| |
|
14
|
Guo LT, Pyle AM. End-to-end RT-PCR of long RNA and highly structured RNA. Methods Enzymol 2023; 691:3-15. [PMID: 37914451 DOI: 10.1016/bs.mie.2023.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2023]
Abstract
RNA molecules play important roles in numerous normal cellular processes and disease states, from protein coding to gene regulation. RT-PCR, applying the power of polymerase chain reaction (PCR) to RNA by coupling reverse transcription with PCR, is one of the most important techniques to characterize RNA transcripts and monitor gene expression. The ability to analyze full-length RNA transcripts and detect their expression is critical to decipher their biological functions. However, due to the low processivity of retroviral reverse transcriptases (RTs), we can only monitor a small fraction of long RNA transcripts, especially those containing stable secondary and tertiary structures. The full-length sequences can only be deduced by computational analysis, which is often misleading. Group II intron-encoded RTs are a new type of RT enzymes. They have evolved specialized structural elements that unwind template structures and maintain close contact with the RNA template. Therefore, group II intron-encoded RTs are more processive than the retroviral RTs. The discovery, optimization and deployment of processive group II intron RTs provide us the opportunity to analyze RNA transcripts with single molecule resolution. MarathonRT, the most processive group II intron RT, has been extensively optimized for processive reverse transcription. In this chapter, we use MarathonRT to deliver a general protocol for long amplicon generation by RT-PCR, and also provide guidance for troubleshooting and further optimization.
Affiliation(s)
- Li-Tao Guo
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, United States
| | - Anna Marie Pyle
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, United States; Department of Chemistry, Yale University, New Haven, CT, United States; Howard Hughes Medical Institute, Chevy Chase, MD, United States.
| |
|
15
|
Zhao X, Song L, Yang A, Zhang Z, Zhang J, Yang YT, Zhao XM. Prioritizing genes associated with brain disorders by leveraging enhancer-promoter interactions in diverse neural cells and tissues. Genome Med 2023; 15:56. [PMID: 37488639 PMCID: PMC10364416 DOI: 10.1186/s13073-023-01210-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 07/10/2023] [Indexed: 07/26/2023] Open
Abstract
BACKGROUND Prioritizing genes that underlie complex brain disorders poses a considerable challenge. Although previous studies have found that these disorders share symptoms and heterogeneity, it remains difficult to systematically identify the risk genes associated with them. METHODS Using the CAGE (Cap Analysis of Gene Expression) read alignment files for 439 human cell and tissue types (including primary cells, tissues and cell lines) from the FANTOM5 project, we predicted enhancer-promoter interactions (EPIs) in each of these cell and tissue types and examined their reliability. Then we evaluated the genetic heritability of 17 diverse brain disorders and behavioral-cognitive phenotypes in each neural cell type, brain region, and developmental stage. Furthermore, we prioritized genes associated with brain disorders and phenotypes by leveraging the EPIs in each neural cell and tissue type, and analyzed their pleiotropy and functionality for different categories of disorders and phenotypes. Finally, we characterized the spatiotemporal expression dynamics of these associated genes in cells and tissues. RESULTS We found that the identified EPIs showed activity specificity and network aggregation in cell and tissue types, and that TFs with enriched binding in neural cells (e.g., EGR1 and the SOX family) played key roles in synaptic plasticity and nerve cell development. We also discovered that most neurological disorders exhibit heritability enrichment in neural stem cells and astrocytes, while psychiatric disorders and behavioral-cognitive phenotypes exhibit enrichment in neurons. Furthermore, our identified genes recapitulated well-known risk genes, which exhibited widespread pleiotropy between psychiatric disorders and behavioral-cognitive phenotypes (e.g., FOXP2), and indicated expression specificity in neural cell types, brain regions, and developmental stages associated with disorders and phenotypes. Importantly, we showed the potential associations of brain disorders with brain regions and developmental stages that have not been well studied. CONCLUSIONS Overall, our study characterized the gene-enhancer regulatory networks and genetic mechanisms in human neural cells and tissues, and illustrated the value of reanalysis of publicly available genomic datasets.
Affiliation(s)
- Xingzhong Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Liting Song
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Anyi Yang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Zichao Zhang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Jinglong Zhang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
| | - Yucheng T Yang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China.
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China.
| | - Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China.
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China.
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, 200032, China.
- International Human Phenome Institutes (Shanghai), Shanghai, 200433, China.
| |
|
16
|
Mayer C, Vogt A, Uslu T, Scalzitti N, Chennen K, Poch O, Thompson JD. CeGAL: Redefining a Widespread Fungal-Specific Transcription Factor Family Using an In Silico Error-Tracking Approach. J Fungi (Basel) 2023; 9:jof9040424. [PMID: 37108879 PMCID: PMC10141177 DOI: 10.3390/jof9040424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 03/21/2023] [Accepted: 03/28/2023] [Indexed: 03/31/2023] Open
Abstract
In fungi, the most abundant transcription factor (TF) class contains a fungal-specific ‘GAL4-like’ Zn2C6 DNA binding domain (DBD), while the second class contains another fungal-specific domain, known as ‘fungal_trans’ or middle homology domain (MHD), whose function remains largely uncharacterized. Remarkably, almost a third of MHD-containing TFs in public sequence databases apparently lack DNA binding activity, since they are not predicted to contain a DBD. Here, we reassess the domain organization of these ‘MHD-only’ proteins using an in silico error-tracking approach. In a large-scale analysis of ~17,000 MHD-only TF sequences present in all fungal phyla except Microsporidia and Cryptomycota, we show that the vast majority (>90%) result from genome annotation errors and we are able to predict a new DBD sequence for 14,261 of them. Most of these sequences correspond to a Zn2C6 domain (82%), with a small proportion of C2H2 domains (4%) found only in Dikarya. Our results contradict previous findings that MHD-only TFs are widespread in fungi. In contrast, we show that they are exceptional cases, and that the fungal-specific Zn2C6–MHD domain pair represents the canonical domain signature defining the most predominant fungal TF family. We call this family CeGAL, after its best-characterized members: Cep3, whose 3D structure is determined, and GAL4, a eukaryotic TF archetype. We believe that this will not only improve the annotation and classification of the Zn2C6 TFs but will also provide critical guidance for future fungal gene regulatory network analyses.
Affiliation(s)
- Claudine Mayer
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Faculté des Sciences, Université Paris Cité, UFR Sciences du Vivant, 75013 Paris, France
- Correspondence: (C.M.); (J.D.T.)
| | - Arthur Vogt
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
| | - Tuba Uslu
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
| | - Nicolas Scalzitti
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
| | - Kirsley Chennen
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
| | - Olivier Poch
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
| | - Julie D. Thompson
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Correspondence: (C.M.); (J.D.T.)
| |
|
17
|
Wen C, Margolis M, Dai R, Zhang P, Przytycki PF, Vo DD, Bhattacharya A, Matoba N, Jiao C, Kim M, Tsai E, Hoh C, Aygün N, Walker RL, Chatzinakos C, Clarke D, Pratt H, Consortium P, Peters MA, Gerstein M, Daskalakis NP, Weng Z, Jaffe AE, Kleinman JE, Hyde TM, Weinberger DR, Bray NJ, Sestan N, Geschwind DH, Roeder K, Gusev A, Pasaniuc B, Stein JL, Love MI, Pollard KS, Liu C, Gandal MJ. Cross-ancestry, cell-type-informed atlas of gene, isoform, and splicing regulation in the developing human brain. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.03.03.23286706. [PMID: 36945630 PMCID: PMC10029021 DOI: 10.1101/2023.03.03.23286706] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 03/23/2023]
Abstract
Genomic regulatory elements active in the developing human brain are notably enriched in genetic risk for neuropsychiatric disorders, including autism spectrum disorder (ASD), schizophrenia, and bipolar disorder. However, prioritizing the specific risk genes and candidate molecular mechanisms underlying these genetic enrichments has been hindered by the lack of a single unified large-scale gene regulatory atlas of human brain development. Here, we uniformly process and systematically characterize gene, isoform, and splicing quantitative trait loci (xQTLs) in 672 fetal brain samples from unique subjects across multiple ancestral populations. We identify 15,752 genes harboring a significant xQTL and map 3,739 eQTLs to a specific cellular context. We observe a striking drop in gene expression and splicing heritability as the human brain develops. Isoform-level regulation, particularly in the second trimester, mediates the greatest proportion of heritability across multiple psychiatric GWAS, compared with eQTLs. Via colocalization and TWAS, we prioritize biological mechanisms for ~60% of GWAS loci across five neuropsychiatric disorders, nearly two-fold that observed in the adult brain. Finally, we build a comprehensive set of developmentally regulated gene and isoform co-expression networks capturing unique genetic enrichments across disorders. Together, this work provides a comprehensive view of genetic regulation across human brain development as well as the stage- and cell type-informed mechanistic underpinnings of neuropsychiatric disorders.
Affiliation(s)
- Cindy Wen
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
| | - Michael Margolis
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
| | - Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
| | - Pan Zhang
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
| | - Pawel F Przytycki
- Gladstone Institute of Data Science and Biotechnology; San Francisco, CA, 94158, USA
| | - Daniel D Vo
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA, 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia; Philadelphia, PA, 19104, USA
| | - Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
| | - Nana Matoba
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
| | - Chuan Jiao
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
| | - Minsoo Kim
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
| | - Ellen Tsai
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
| | - Celine Hoh
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
| | - Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
| | - Rebecca L Walker
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
| | - Christos Chatzinakos
- Department of Psychiatry, Harvard Medical School; Boston, MA, 02215, USA
- McLean Hospital; Belmont, MA, 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
| | - Declan Clarke
- Department of Molecular Biophysics and Biochemistry, Yale University; New Haven, CT, 06520, USA
| | - Henry Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School; Worcester, MA, 01605, USA
| | - PsychENCODE Consortium
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
- Gladstone Institute of Data Science and Biotechnology; San Francisco, CA, 94158, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA, 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia; Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Department of Psychiatry, Harvard Medical School; Boston, MA, 02215, USA
- McLean Hospital; Belmont, MA, 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Department of Molecular Biophysics and Biochemistry, Yale University; New Haven, CT, 06520, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School; Worcester, MA, 01605, USA
- CNS Data Coordination Group, Sage Bionetworks; Seattle, WA, 98109, USA
- Program in Computational Biology and Bioinformatics, Yale University; New Haven, CT, 06520, USA
- Department of Computer Science, Yale University; New Haven, CT, 06520, USA
- Department of Statistics and Data Science, Yale University; New Haven, CT, 06520, USA
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Neumora Therapeutics; Watertown, MA, 02472, USA
- Department of Neurology, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- MRC Centre for Neuropsychiatric Genetics & Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University School of Medicine; Cardiff, CF24 4HQ, UK
- Department of Comparative Medicine, Yale University School of Medicine; New Haven, CT, 06520, USA
- Department of Neuroscience, Yale University School of Medicine; New Haven, CT, 06520, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Precision Health, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Statistics & Data Science, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
- Computational Biology Department, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
- Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute; Boston, MA, 02215, USA
- Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Harvard Medical School; Boston, MA, 02215, USA
- Division of Genetics, Brigham and Women's Hospital; Boston, MA, 02215, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco; San Francisco, CA, 94158, USA
- Chan Zuckerberg Biohub; San Francisco, CA, 94158, USA
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University; Changsha, Hunan, 410008, China
| | - Mette A Peters
- CNS Data Coordination Group, Sage Bionetworks; Seattle, WA, 98109, USA
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University; New Haven, CT, 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University; New Haven, CT, 06520, USA
- Department of Computer Science, Yale University; New Haven, CT, 06520, USA
- Department of Statistics and Data Science, Yale University; New Haven, CT, 06520, USA
| | - Nikolaos P Daskalakis
- Department of Psychiatry, Harvard Medical School; Boston, MA, 02215, USA
- McLean Hospital; Belmont, MA, 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School; Worcester, MA, 01605, USA
| | - Andrew E Jaffe
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Neumora Therapeutics; Watertown, MA, 02472, USA
| | - Joel E Kleinman
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
| | - Thomas M Hyde
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
| | - Daniel R Weinberger
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
| | - Nicholas J Bray
- MRC Centre for Neuropsychiatric Genetics & Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University School of Medicine; Cardiff, CF24 4HQ, UK
| | - Nenad Sestan
- Department of Comparative Medicine, Yale University School of Medicine; New Haven, CT, 06520, USA
- Department of Neuroscience, Yale University School of Medicine; New Haven, CT, 06520, USA
| | - Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Precision Health, University of California, Los Angeles; Los Angeles, CA, 90095, USA
| | - Kathryn Roeder
- Department of Statistics & Data Science, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
- Computational Biology Department, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
| | - Alexander Gusev
- Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute; Boston, MA, 02215, USA
- Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Harvard Medical School; Boston, MA, 02215, USA
- Division of Genetics, Brigham and Women's Hospital; Boston, MA, 02215, USA
| | - Bogdan Pasaniuc
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Precision Health, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
| | - Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
| | - Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
| | - Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology; San Francisco, CA, 94158, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco; San Francisco, CA, 94158, USA
- Chan Zuckerberg Biohub; San Francisco, CA, 94158, USA
| | - Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University; Changsha, Hunan, 410008, China
| | - Michael J Gandal
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA, 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia; Philadelphia, PA, 19104, USA
|
18
|
Premzl M. Revised eutherian gene collections. BMC Genom Data 2022; 23:56. [PMID: 35870891 PMCID: PMC9308196 DOI: 10.1186/s12863-022-01071-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2021] [Accepted: 07/13/2022] [Indexed: 11/24/2022]
Abstract
Objectives: The most recent research projects in the scientific field of eutherian comparative genomics included intentions to sequence the genome of every extant eutherian species in the foreseeable future, so that future revisions and updates of eutherian gene data sets were expected.
Data description: Using 35 public eutherian reference genomic sequence assemblies and freely available software, the eutherian comparative genomic analysis protocol RRID:SCR_014401 was published as guidance against potential genomic sequence errors. The protocol curated 14 eutherian third-party data gene data sets, including, in aggregate, 2615 complete coding sequences that were deposited in the European Nucleotide Archive. The published eutherian gene collections were used in revisions and updates of eutherian gene data set classifications and nomenclatures that included gene annotations, phylogenetic analyses and protein molecular evolution analyses.
|
19
|
Meyer E, Chaung K, Dehghannasiri R, Salzman J. ReadZS detects cell type-specific and developmentally regulated RNA processing programs in single-cell RNA-seq. Genome Biol 2022; 23:226. [PMID: 36284317 PMCID: PMC9594907 DOI: 10.1186/s13059-022-02795-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Accepted: 10/13/2022] [Indexed: 11/13/2022]
Abstract
RNA processing, including splicing and alternative polyadenylation, is crucial to gene function and regulation, but methods to detect RNA processing from single-cell RNA sequencing data are limited by reliance on pre-existing annotations, peak calling heuristics, and collapsing measurements by cell type. We introduce ReadZS, an annotation-free statistical approach to identify regulated RNA processing in single cells. ReadZS discovers cell type-specific RNA processing in human lung and conserved, developmentally regulated RNA processing in mammalian spermatogenesis, including global 3' UTR shortening in human spermatogenesis. ReadZS also discovers global 3' UTR lengthening in Arabidopsis development, highlighting the usefulness of this method in under-annotated transcriptomes.
Affiliation(s)
- Elisabeth Meyer
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Kaitlin Chaung
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Roozbeh Dehghannasiri
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
| | - Julia Salzman
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Department of Statistics (by courtesy), Stanford University, Stanford, CA, 94305, USA
|
20
|
Guan D, Halstead MM, Islas-Trejo AD, Goszczynski DE, Cheng HH, Ross PJ, Zhou H. Prediction of transcript isoforms in 19 chicken tissues by Oxford Nanopore long-read sequencing. Front Genet 2022; 13:997460. [PMID: 36246588 PMCID: PMC9561881 DOI: 10.3389/fgene.2022.997460] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/18/2022] [Accepted: 08/30/2022] [Indexed: 11/22/2022]
Abstract
To identify and annotate transcript isoforms in the chicken genome, we generated Nanopore long-read sequencing data from 68 samples that encompassed 19 diverse tissues collected from experimental adult male and female White Leghorn chickens. More than 23.8 million reads with mean read length of 790 bases and average quality of 18.2 were generated. The annotation and subsequent filtering resulted in the identification of 55,382 transcripts at 40,547 loci with mean length of 1,700 bases. We predicted 30,967 coding transcripts at 19,461 loci, and 16,495 lncRNA transcripts at 15,512 loci. Compared to existing reference annotations, we found ∼52% of annotated transcripts could be partially or fully matched while ∼47% were novel. Seventy percent of novel transcripts were potentially transcribed from lncRNA loci. Based on our annotation, we quantified transcript expression across tissues and found two brain tissues (i.e., cerebellum and cortex) expressed the highest number of transcripts and loci. Furthermore, ∼22% of the transcripts displayed tissue specificity with the reproductive tissues (i.e., testis and ovary) exhibiting the most tissue-specific transcripts. Despite our wide sampling, ∼20% of Ensembl reference loci were not detected. This suggests that deeper sequencing and additional samples that include different breeds, cell types, developmental stages, and physiological conditions, are needed to fully annotate the chicken genome. The application of Nanopore sequencing in this study demonstrates the usefulness of long-read data in discovering additional novel loci (e.g., lncRNA loci) and resolving complex transcripts (e.g., the longest transcript for the TTN locus).
Affiliation(s)
- Dailu Guan
- Department of Animal Science, University of California Davis, Davis, CA, United States
| | - Michelle M. Halstead
- Department of Animal Science, University of California Davis, Davis, CA, United States
| | - Alma D. Islas-Trejo
- Department of Animal Science, University of California Davis, Davis, CA, United States
| | - Daniel E. Goszczynski
- Department of Animal Science, University of California Davis, Davis, CA, United States
| | - Hans H. Cheng
- USDA, ARS, USNPRC, Avian Disease and Oncology Laboratory, East Lansing, MI, United States
| | - Pablo J. Ross
- Department of Animal Science, University of California Davis, Davis, CA, United States
- *Correspondence: Pablo J. Ross; Huaijun Zhou
| | - Huaijun Zhou
- Department of Animal Science, University of California Davis, Davis, CA, United States
- *Correspondence: Pablo J. Ross; Huaijun Zhou
|
21
|
Kondratyeva L, Alekseenko I, Chernov I, Sverdlov E. Data Incompleteness May Form a Hard-to-Overcome Barrier to Decoding Life's Mechanism. Biology 2022; 11:1208. [PMID: 36009835 PMCID: PMC9404739 DOI: 10.3390/biology11081208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Revised: 08/03/2022] [Accepted: 08/10/2022] [Indexed: 11/23/2022]
Abstract
In this brief review, we attempt to demonstrate that the incompleteness of data, as well as the intrinsic heterogeneity of biological systems, may form very strong and possibly insurmountable barriers for researchers trying to decipher the mechanisms of the functioning of live systems. We illustrate this challenge using the two most studied organisms: E. coli, with 34.6% of genes lacking experimental evidence of function, and C. elegans, with identified proteins for approximately 50% of its genes. Another striking example is an artificial unicellular entity named JCVI-syn3.0, with a minimal set of genes. A total of 31.5% of the genes of JCVI-syn3.0 cannot be ascribed a specific biological function. The human interactome mapping project identified only 5-10% of all protein interactions in humans. In addition, most of the available data are static snapshots, and it is barely possible to generate realistic models of the dynamic processes within cells. Moreover, the existing interactomes reflect the de facto interaction but not its functional result, which is an unpredictable emerging property. Perhaps the completeness of molecular data on any living organism is beyond our reach and represents an unsolvable problem in biology.
Affiliation(s)
- Liya Kondratyeva
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russia
| | - Irina Alekseenko
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russia
- Institute of Molecular Genetics of National Research Centre “Kurchatov Institute”, Moscow 123182, Russia
| | - Igor Chernov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russia
| | - Eugene Sverdlov
- Institute of Molecular Genetics of National Research Centre “Kurchatov Institute”, Moscow 123182, Russia
- Kurchatov Center for Genome Research, National Research Center “Kurchatov Institute”, Moscow 123182, Russia
|
22
|
Lee AJ, Reiter T, Doing G, Oh J, Hogan DA, Greene CS. Using genome-wide expression compendia to study microorganisms. Comput Struct Biotechnol J 2022; 20:4315-4324. [PMID: 36016717 PMCID: PMC9396250 DOI: 10.1016/j.csbj.2022.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 08/07/2022] [Accepted: 08/07/2022] [Indexed: 11/30/2022]
Abstract
A gene expression compendium is a heterogeneous collection of gene expression experiments assembled from data collected for diverse purposes. The widely varied experimental conditions and genetic backgrounds across samples create a tremendous opportunity for gaining a systems-level understanding of the transcriptional responses that influence phenotypes. Variety in experimental design is particularly important for studying microbes, where the transcriptional responses integrate many signals and demonstrate plasticity across strains, including responses to which nutrients are available and which microbes are present. Advances in high-throughput measurement technology have made it feasible to construct compendia for many microbes. In this review we discuss how these compendia are constructed and analyzed to reveal transcriptional patterns.
Affiliation(s)
- Alexandra J. Lee
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA, USA
| | - Taylor Reiter
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, CO, USA
| | - Georgia Doing
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Julia Oh
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Deborah A. Hogan
- Department of Microbiology and Immunology, Geisel School of Medicine, Dartmouth, Hanover, NH, USA
| | - Casey S. Greene
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, CO, USA
|
23
|
Tan S, Wang W, Jie W, Liu J. FishExp: A comprehensive database and analysis platform for gene expression and alternative splicing of fish species. Comput Struct Biotechnol J 2022; 20:3676-3684. [PMID: 35891795 PMCID: PMC9293738 DOI: 10.1016/j.csbj.2022.07.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/23/2022] [Revised: 07/07/2022] [Accepted: 07/07/2022] [Indexed: 11/09/2022]
Abstract
Publicly archived RNA-seq data have grown exponentially, yet much of their valuable information, such as alternative splicing and its integration with gene expression, has not been fully discovered and utilized. This is especially true for fish species, which play important roles in ecology, research and the food industry. Furthermore, there is a lack of online platforms that let users analyze their new data individually and jointly with existing data for the comprehensive analysis of alternative splicing and gene expression. Here, we present FishExp, a web-based data platform covering gene expression and alternative splicing in 26,081 RNA-seq experiments from 44 fishes. It allows users to query the data in a variety of ways, including gene identifier/symbol, functional term, and BLAST alignment. Moreover, users can customize experiments and tools to perform differential/specific expression and alternative splicing analysis, co-expression and cross-species analysis. In addition, functional enrichment is provided to confer biological significance. Notably, users are allowed to submit their own data and perform various analyses using the new data alone or alongside existing data in FishExp. Results of retrieval and analysis can be visualized on the gene-, transcript- and splicing event-level webpage in a highly interactive and intuitive manner. All data in FishExp can be downloaded for more in-depth analysis. The manually curated sample information, uniform data processing and various tools make it efficient for users to gain new insights from these large data sets, facilitating scientific hypothesis generation. FishExp is freely accessible at https://bioinfo.njau.edu.cn/fishExp.
Affiliation(s)
- Suxu Tan
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
| | - Wenwen Wang
- School of Fisheries, Aquaculture and Aquatic Sciences, Auburn University, Auburn, AL 36849, USA
| | - Wencai Jie
- Institute for Plant Molecular Biology, State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, Jiangsu 210023, China
| | - Jinding Liu
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Bioinformatics Center, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
|
24
|
An unexplored angle: T cell antigen discoveries reveal a marginal contribution of proteasome splicing to the immunogenic MHC class I antigen pool. Proc Natl Acad Sci U S A 2022; 119:e2119736119. [PMID: 35858315 PMCID: PMC9303865 DOI: 10.1073/pnas.2119736119] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
In the current era of T cell–based immunotherapies, it is crucial to understand which types of MHC-presented T cell antigens are produced by tumor cells. In addition to linear peptide antigens, chimeric peptides are generated through proteasome-catalyzed peptide splicing (PCPS). Whether such spliced peptides are abundantly presented by MHC is highly disputed because of disagreement in computational analyses of mass spectrometry data of MHC-eluted peptides. Moreover, such mass spectrometric analyses cannot elucidate how much spliced peptides contribute to the pool of immunogenic antigens. In this Perspective, we explain the significance of knowing the contribution of spliced peptides for accurate analyses of peptidomes on one hand, and to serve as a potential source of targetable tumor antigens on the other hand. Toward a strategy for mass spectrometry independent estimation of the contribution of PCPS to the immunopeptidome, we first reviewed methodologies to identify MHC-presented spliced peptide antigens expressed by tumors. Data from these identifications allowed us to compile three independent datasets containing 103, 74, and 83 confirmed T cell antigens from cancer patients. Only 3.9%, 1.4%, and between 0% and 7.2% of these truly immunogenic antigens are produced by PCPS, therefore providing a marginal contribution to the pool of immunogenic tumor antigens. We conclude that spliced peptides will not serve as a comprehensive source to expand the number of targetable antigens for immunotherapies.
|
25
|
Kwak Y, Daly CWP, Fogarty EA, Grimson A, Kwak H. Dynamic and widespread control of poly(A) tail length during macrophage activation. RNA (New York, N.Y.) 2022; 28:947-971. [PMID: 35512831 PMCID: PMC9202586 DOI: 10.1261/rna.078918.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2021] [Accepted: 03/21/2022] [Indexed: 06/14/2023]
Abstract
The poly(A) tail enhances translation and transcript stability, and tail length is under dynamic control during cell state transitions. Tail regulation plays essential roles in translational timing and fertilization in early development, but poly(A) tail dynamics have not been fully explored in post-embryonic systems. Here, we examined the landscape and impact of tail length control during macrophage activation. Upon activation, more than 1500 mRNAs, including proinflammatory genes, underwent distinctive changes in tail lengths. Increases in tail length correlated with mRNA levels regardless of transcriptional activity, and many mRNAs that underwent tail extension encode proteins necessary for immune function and post-transcriptional regulation. Strikingly, we found that ZFP36, whose protein product destabilizes target transcripts, undergoes tail extension. Our analyses indicate that many mRNAs undergoing tail lengthening are, in turn, degraded by elevated levels of ZFP36, constituting a post-transcriptional feedback loop that ensures transient regulation of transcripts integral to macrophage activation. Taken together, this study establishes the complexity, relevance, and widespread nature of poly(A) tail dynamics, and the resulting post-transcriptional regulation during macrophage activation.
Affiliation(s)
- Yeonui Kwak
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Graduate Field of Genetics, Genomics, and Development, Cornell University, Ithaca, New York 14853, USA
| | - Ciarán W P Daly
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Graduate Field of Biochemistry, Molecular, and Cell Biology, Cornell University, Ithaca, New York 14853, USA
| | - Elizabeth A Fogarty
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Andrew Grimson
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
| | - Hojoong Kwak
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
|
26
|
Leveraging omic features with F3UTER enables identification of unannotated 3'UTRs for synaptic genes. Nat Commun 2022; 13:2270. [PMID: 35477703 PMCID: PMC9046390 DOI: 10.1038/s41467-022-30017-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/24/2021] [Accepted: 03/18/2022] [Indexed: 11/08/2022]
Abstract
There is growing evidence for the importance of 3' untranslated region (3'UTR) dependent regulatory processes. However, our current human 3'UTR catalogue is incomplete. Here, we develop a machine learning-based framework, leveraging both genomic and tissue-specific transcriptomic features to predict previously unannotated 3'UTRs. We identify unannotated 3'UTRs associated with 1,563 genes across 39 human tissues, with the greatest abundance found in the brain. These unannotated 3'UTRs are significantly enriched for RNA binding protein (RBP) motifs and exhibit high human lineage-specificity. We find that brain-specific unannotated 3'UTRs are enriched for the binding motifs of important neuronal RBPs such as TARDBP and RBFOX1, and their associated genes are involved in synaptic function. Our data is shared through an online resource, F3UTER (https://astx.shinyapps.io/F3UTER/). Overall, our data improves 3'UTR annotation and provides additional insights into the mRNA-RBP interactome in the human brain, with implications for our understanding of neurological and neurodevelopmental diseases.
|
27
|
Zhang CY, Xiao X, Zhang Z, Hu Z, Li M. An alternative splicing hypothesis for neuropathology of schizophrenia: evidence from studies on historical candidate genes and multi-omics data. Mol Psychiatry 2022; 27:95-112. [PMID: 33686213 DOI: 10.1038/s41380-021-01037-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/04/2020] [Revised: 01/08/2021] [Accepted: 01/22/2021] [Indexed: 01/31/2023]
Abstract
Alternative splicing of schizophrenia risk genes, such as DRD2, GRM3, and DISC1, has been extensively described. Nevertheless, the alternative splicing characteristics of the growing number of schizophrenia risk genes identified through genetic analyses remain relatively opaque. Recently, transcriptomic analyses in human brains based on short-read RNA-sequencing have discovered many "local splicing" events (e.g., exon skipping junctions) associated with genetic risk of schizophrenia, and further molecular characterizations have identified novel spliced isoforms, such as AS3MTd2d3 and ZNF804AE3E4. In addition, long-read sequencing analyses of schizophrenia risk genes (e.g., CACNA1C and NRXN1) have revealed multiple previously unannotated brain-abundant isoforms with therapeutic potential, and functional analyses of KCNH2-3.1 and Ube3a1 have provided examples for investigating such spliced isoforms in vitro and in vivo. These findings suggest that alternative splicing may be an essential molecular mechanism underlying genetic risk of schizophrenia. However, the incomplete annotations of human brain transcriptomes might have limited our understanding of schizophrenia pathogenesis, and further efforts to elucidate these transcriptional characteristics are urgently needed, both to gain insights into the illness-correlated brain physiology and pathology and to translate genetic discoveries into novel therapeutic targets.
Affiliation(s)
- Chu-Yi Zhang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Xiao Xiao
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
| | - Zhuohua Zhang
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
| | - Zhonghua Hu
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China; Department of Critical Care Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China; Hunan Key Laboratory of Animal Models for Human Diseases, School of Life Sciences, Central South University, Changsha, Hunan, China; Eye Center of Xiangya Hospital and Hunan Key Laboratory of Ophthalmology, Central South University, Changsha, Hunan, China; National Clinical Research Center on Mental Disorders, Changsha, Hunan, China
| | - Ming Li
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China; KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
|
28
|
Wilks C, Zheng SC, Chen FY, Charles R, Solomon B, Ling JP, Imada EL, Zhang D, Joseph L, Leek JT, Jaffe AE, Nellore A, Collado-Torres L, Hansen KD, Langmead B. recount3: summaries and queries for large-scale RNA-seq expression and splicing. Genome Biol 2021; 22:323. [PMID: 34844637 PMCID: PMC8628444 DOI: 10.1186/s13059-021-02533-6] [Citation(s) in RCA: 137] [Impact Index Per Article: 34.3] [Received: 07/12/2021] [Accepted: 10/29/2021] [Indexed: 12/12/2022]
Abstract
We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private data, allowing results to be directly compared to any study in recount3. Taken together, our tools help biologists maximize the utility of publicly available RNA-seq data, especially to improve their understanding of newly collected data. recount3 is available from http://rna.recount.bio.
Affiliation(s)
- Christopher Wilks
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Shijie C Zheng
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
| | | | - Rone Charles
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Brad Solomon
- Thomas M. Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Jonathan P Ling
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, USA
| | - Eddie Luidy Imada
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
| | - David Zhang
- Institute of Child Health, University College London (UCL), London, UK
| | | | - Jeffrey T Leek
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
| | - Andrew E Jaffe
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Lieber Institute for Brain Development, Baltimore, USA
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
| | - Abhinav Nellore
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Department of Surgery, Oregon Health & Science University, Portland, OR, USA
| | | | - Kasper D Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA.
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA.
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, USA.
|
29
|
Scalzitti N, Kress A, Orhand R, Weber T, Moulinier L, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. Spliceator: multi-species splice site prediction using convolutional neural networks. BMC Bioinformatics 2021; 22:561. [PMID: 34814826 PMCID: PMC8609763 DOI: 10.1186/s12859-021-04471-3] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 07/04/2021] [Accepted: 11/09/2021] [Indexed: 12/14/2022]
Abstract
Background: Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene structures from model organisms. However, Deep Learning methods for non-model organisms are lacking. Results: We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. We show that Spliceator achieves consistently high accuracy (89–92%) compared to existing methods on independent benchmarks from human, fish, fly, worm, plant and protist organisms. Conclusions: Spliceator is a new Deep Learning method trained on high-quality data, which can be used to predict splice sites in diverse organisms, ranging from human to protists, with consistently high accuracy. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04471-3.
Affiliation(s)
- Nicolas Scalzitti
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Arnaud Kress
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France; BiGEst-ICube Platform, ICube Laboratory, UMR7357, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Romain Orhand
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Thomas Weber
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Luc Moulinier
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France; BiGEst-ICube Platform, ICube Laboratory, UMR7357, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Anne Jeannin-Girardon
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Pierre Collet
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Olivier Poch
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
| | - Julie D Thompson
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France.
|
30
|
Leung SK, Jeffries AR, Castanho I, Jordan BT, Moore K, Davies JP, Dempster EL, Bray NJ, O'Neill P, Tseng E, Ahmed Z, Collier DA, Jeffery ED, Prabhakar S, Schalkwyk L, Jops C, Gandal MJ, Sheynkman GM, Hannon E, Mill J. Full-length transcript sequencing of human and mouse cerebral cortex identifies widespread isoform diversity and alternative splicing. Cell Rep 2021; 37:110022. [PMID: 34788620 PMCID: PMC8609283 DOI: 10.1016/j.celrep.2021.110022] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Received: 10/14/2020] [Revised: 07/30/2021] [Accepted: 10/28/2021] [Indexed: 12/05/2022]
Abstract
Alternative splicing is a post-transcriptional regulatory mechanism producing distinct mRNA molecules from a single pre-mRNA with a prominent role in the development and function of the central nervous system. We used long-read isoform sequencing to generate full-length transcript sequences in the human and mouse cortex. We identify novel transcripts not present in existing genome annotations, including transcripts mapping to putative novel (unannotated) genes and fusion transcripts incorporating exons from multiple genes. Global patterns of transcript diversity are similar between human and mouse cortex, although certain genes are characterized by striking differences between species. We also identify developmental changes in alternative splicing, with differential transcript usage between human fetal and adult cortex. Our data confirm the importance of alternative splicing in the cortex, dramatically increasing transcriptional diversity and representing an important mechanism underpinning gene regulation in the brain. We provide transcript-level data for human and mouse cortex as a resource to the scientific community.
- There is widespread transcript diversity in the cortex and many novel transcripts
- Some genes display big differences in isoform number between human and mouse cortex
- There is evidence of differential transcript usage between human fetal and adult cortex
- There are many novel isoforms of genes associated with human brain disease
Key Words
- isoform, transcript, expression, brain, cortex, mouse, human, adult, fetal, long-read sequencing, alternative splicing
Affiliation(s)
| | | | - Isabel Castanho
- University of Exeter, Exeter, UK; Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
| | - Ben T Jordan
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
| | | | | | | | | | | | | | | | | | - Erin D Jeffery
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
| | - Shyam Prabhakar
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | | | - Connor Jops
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
| | - Michael J Gandal
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
| | - Gloria M Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; UVA Cancer Center, University of Virginia, Charlottesville, VA, USA
|
31
|
Eagles NJ, Burke EE, Leonard J, Barry BK, Stolz JM, Huuki L, Phan BN, Serrato VL, Gutiérrez-Millán E, Aguilar-Ordoñez I, Jaffe AE, Collado-Torres L. SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses. BMC Bioinformatics 2021; 22:224. [PMID: 33932985 PMCID: PMC8088074 DOI: 10.1186/s12859-021-04142-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/29/2021] [Accepted: 04/21/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND: RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, there are a large number of individual steps a researcher must perform before raw RNA-seq reads yield directly valuable information, such as differential gene expression data. Existing software tools are typically specialized, only performing one step (such as alignment of reads to a reference genome) of a larger workflow. The demand for a more comprehensive and reproducible workflow has led to the production of a number of publicly available RNA-seq pipelines. However, we have found that most require computational expertise to set up or share among several users, are not actively maintained, or lack features we have found to be important in our own analyses. RESULTS: In response to these concerns, we have developed a Scalable Pipeline for Expression Analysis and Quantification (SPEAQeasy), which is easy to install and share, and provides a bridge towards R/Bioconductor downstream analysis solutions. SPEAQeasy is portable across computational frameworks (SGE, SLURM, local, docker integration) and different configuration files are provided (http://research.libd.org/SPEAQeasy/). CONCLUSIONS: SPEAQeasy is user-friendly and lowers the computational-domain entry barrier for biologists and clinicians to RNA-seq data processing, as the main input file is a table with sample names and their corresponding FASTQ files. The goal is to provide a flexible pipeline that is immediately usable by researchers, regardless of their technical background or computing environment.
Affiliation(s)
- Nicholas J Eagles
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
| | - Emily E Burke
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
| | - Jacob Leonard
- Winter Genomics, Salaverry 874 int 100, Lindavista, CDMX, 07300, Mexico
- QuestBridge Scholar, Palo Alto, CA, 94303, USA
| | - Brianna K Barry
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
| | - Joshua M Stolz
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
| | - Louise Huuki
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
| | - BaDoi N Phan
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Medical Scientist Training Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
| | - Violeta Larios Serrato
- Winter Genomics, Salaverry 874 int 100, Lindavista, CDMX, 07300, Mexico
- Instituto Politécnico Nacional, Escuela Nacional de Ciencias Biológicas, Mexico City, CDMX, 11340, Mexico
| | | | - Israel Aguilar-Ordoñez
- Winter Genomics, Salaverry 874 int 100, Lindavista, CDMX, 07300, Mexico
- Department of Supercomputing, Instituto Nacional de Medicina Genómica (INMEGEN), Mexico City, CDMX, 14610, Mexico
| | - Andrew E Jaffe
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Department of Genetic Medicine, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
| | - Leonardo Collado-Torres
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA.
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA.
|
32
|
Chen Z, Zhang D, Reynolds RH, Gustavsson EK, García-Ruiz S, D'Sa K, Fairbrother-Browne A, Vandrovcova J, Hardy J, Houlden H, Gagliano Taliun SA, Botía J, Ryten M. Human-lineage-specific genomic elements are associated with neurodegenerative disease and APOE transcript usage. Nat Commun 2021; 12:2076. [PMID: 33824317 PMCID: PMC8024253 DOI: 10.1038/s41467-021-22262-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/28/2020] [Accepted: 03/03/2021] [Indexed: 12/12/2022]
Abstract
Knowledge of genomic features specific to the human lineage may provide insights into brain-related diseases. We leverage high-depth whole genome sequencing data to generate a combined annotation identifying regions simultaneously depleted for genetic variation (constrained regions) and poorly conserved across primates. We propose that these constrained, non-conserved regions (CNCRs) have been subject to human-specific purifying selection and are enriched for brain-specific elements. We find that CNCRs are depleted from protein-coding genes but enriched within lncRNAs. We demonstrate that per-SNP heritability of a range of brain-relevant phenotypes is enriched within CNCRs. We find that genes implicated in neurological diseases have high CNCR density, including APOE, highlighting an unannotated intron-3 retention event. Using human brain RNA-sequencing data, we show the intron-3-retaining transcript to be more abundant in Alzheimer's disease with more severe tau and amyloid pathological burden. Thus, we demonstrate potential association of human-lineage-specific sequences in brain development and neurological disease.
Affiliation(s)
- Zhongbo Chen
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - David Zhang
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - Regina H Reynolds
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - Emil K Gustavsson
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - Sonia García-Ruiz
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - Karishma D'Sa
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - Aine Fairbrother-Browne
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
| | - Jana Vandrovcova
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
| | - John Hardy
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- Reta Lila Weston Institute, Queen Square Institute of Neurology, UCL, London, UK
- UK Dementia Research Institute, Queen Square Institute of Neurology, UCL, London, UK
- NIHR University College London Hospitals Biomedical Research Centre, London, UK
- Institute for Advanced Study, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | - Henry Houlden
- Department of Neuromuscular Disease, Queen Square Institute of Neurology, UCL, London, UK
| | - Sarah A Gagliano Taliun
- Department of Medicine & Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
- Montréal Heart Institute, Montréal, Québec, Canada
| | - Juan Botía
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
| | - Mina Ryten
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK.
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK.
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK.
|
33
|
Minnis CJ, Townsend S, Petschnigg J, Tinelli E, Bähler J, Russell C, Mole SE. Global network analysis in Schizosaccharomyces pombe reveals three distinct consequences of the common 1-kb deletion causing juvenile CLN3 disease. Sci Rep 2021; 11:6332. [PMID: 33737578 PMCID: PMC7973434 DOI: 10.1038/s41598-021-85471-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/18/2020] [Accepted: 02/23/2021] [Indexed: 12/15/2022]
Abstract
Juvenile CLN3 disease is a recessively inherited paediatric neurodegenerative disorder, with most patients homozygous for a 1-kb intragenic deletion in CLN3. The btn1 gene is the Schizosaccharomyces pombe orthologue of CLN3. Here, we have extended the use of synthetic genetic array (SGA) analyses to delineate functional signatures for two different disease-causing mutations in addition to complete deletion of btn1. We show that genetic-interaction signatures can differ for mutations in the same gene, which helps to dissect their distinct functional effects. The mutation equivalent to the minor transcript arising from the 1-kb deletion, btn1(102–208del), shows a distinct interaction pattern. Taken together, our results imply that the minor 1-kb deletion transcript has three consequences for CLN3: to both lose and retain some inherent functions and to acquire abnormal characteristics. This has particular implications for the therapeutic development of juvenile CLN3 disease. In addition, this proof of concept could be applied to conserved genes for other mendelian disorders or any gene of interest, aiding in the dissection of their functional domains, unpacking the global consequences of disease pathogenesis, and clarifying genotype–phenotype correlations. In doing so, this detail will enhance the goals of personalised medicine to improve treatment outcomes and reduce adverse events.
Affiliation(s)
- Christopher J Minnis
- MRC Laboratory for Molecular Cell Biology and Great Ormond Street, Institute of Child Health, University College London, London, WC1E 6BT, UK; Department of Comparative Biomedical Sciences, Royal Veterinary College, Royal College Street, London, NW1 0TU, UK
| | - StJohn Townsend
- Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK; The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
| | - Julia Petschnigg
- MRC Laboratory for Molecular Cell Biology and Great Ormond Street, Institute of Child Health, University College London, London, WC1E 6BT, UK
| | - Elisa Tinelli
- MRC Laboratory for Molecular Cell Biology and Great Ormond Street, Institute of Child Health, University College London, London, WC1E 6BT, UK
| | - Jürg Bähler
- Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
| | - Claire Russell
- Department of Comparative Biomedical Sciences, Royal Veterinary College, Royal College Street, London, NW1 0TU, UK
| | - Sara E Mole
- MRC Laboratory for Molecular Cell Biology and Great Ormond Street, Institute of Child Health, University College London, London, WC1E 6BT, UK
| |
|
34
|
Wilks C, Ahmed O, Baker DN, Zhang D, Collado-Torres L, Langmead B. Megadepth: efficient coverage quantification for BigWigs and BAMs. Bioinformatics 2021; 37:3014-3016. [PMID: 33693500 PMCID: PMC8528031 DOI: 10.1093/bioinformatics/btab152] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 12/10/2020] [Revised: 01/16/2021] [Accepted: 03/04/2021] [Indexed: 02/02/2023]
Abstract
MOTIVATION A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. RESULTS Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19 000 GTExV8 BigWig files in approximately 1 h using 32 threads. Megadepth is available both as a command-line tool and as an R/Bioconductor package providing much faster quantification compared to the rtracklayer package. AVAILABILITY AND IMPLEMENTATION https://github.com/ChristopherWilks/megadepth, https://bioconductor.org/packages/megadepth. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher Wilks
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA. To whom correspondence should be addressed.
| | - Omar Ahmed
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Daniel N Baker
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - David Zhang
- Department of Molecular Neuroscience, Institute of Neurology, University College London (UCL), London WC1E 6BT, UK; NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London WC1E 6BT, UK; Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London WC1E 6BT, UK
| | | | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA. To whom correspondence should be addressed.
| |
|
35
|
Kölsch Y, Hahn J, Sappington A, Stemmer M, Fernandes AM, Helmbrecht TO, Lele S, Butrus S, Laurell E, Arnold-Ammer I, Shekhar K, Sanes JR, Baier H. Molecular classification of zebrafish retinal ganglion cells links genes to cell types to behavior. Neuron 2021; 109:645-662.e9. [PMID: 33357413 PMCID: PMC7897282 DOI: 10.1016/j.neuron.2020.12.003] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 07/09/2020] [Revised: 11/09/2020] [Accepted: 12/01/2020] [Indexed: 12/12/2022]
Abstract
Retinal ganglion cells (RGCs) form an array of feature detectors, which convey visual information to central brain regions. Characterizing RGC diversity is required to understand the logic of the underlying functional segregation. Using single-cell transcriptomics, we systematically classified RGCs in adult and larval zebrafish, thereby identifying marker genes for >30 mature types and several developmental intermediates. We used this dataset to engineer transgenic driver lines, enabling specific experimental access to a subset of RGC types. Expression of one or few transcription factors often predicts dendrite morphologies and axonal projections to specific tectal layers and extratectal targets. In vivo calcium imaging revealed that molecularly defined RGCs exhibit specific functional tuning. Finally, chemogenetic ablation of eomesa+ RGCs, which comprise melanopsin-expressing types with projections to a small subset of central targets, selectively impaired phototaxis. Together, our study establishes a framework for systematically studying the functional architecture of the visual system.
Affiliation(s)
- Yvonne Kölsch
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany; Graduate School of Systemic Neurosciences, Ludwig Maximilian University, 82152 Martinsried, Germany
| | - Joshua Hahn
- Department of Chemical and Biomolecular Engineering, UC Berkeley, Berkeley, CA 94720, USA
| | - Anna Sappington
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, USA
| | - Manuel Stemmer
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
| | - António M Fernandes
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
| | - Thomas O Helmbrecht
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
| | - Shriya Lele
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
| | - Salwan Butrus
- Department of Chemical and Biomolecular Engineering, UC Berkeley, Berkeley, CA 94720, USA
| | - Eva Laurell
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
| | - Irene Arnold-Ammer
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
| | - Karthik Shekhar
- Department of Chemical and Biomolecular Engineering, UC Berkeley, Berkeley, CA 94720, USA; Helen Wills Neuroscience Institute, California Institute for Quantitative Biosciences, QB3, Center for Computational Biology, UC Berkeley, Berkeley, CA 94720, USA.
| | - Joshua R Sanes
- Center for Brain Science and Department of Molecular and Cell Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Herwig Baier
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany.
| |
|