1
Wen C, Margolis M, Dai R, Zhang P, Przytycki PF, Vo DD, Bhattacharya A, Matoba N, Tang M, Jiao C, Kim M, Tsai E, Hoh C, Aygün N, Walker RL, Chatzinakos C, Clarke D, Pratt H, Peters MA, Gerstein M, Daskalakis NP, Weng Z, Jaffe AE, Kleinman JE, Hyde TM, Weinberger DR, Bray NJ, Sestan N, Geschwind DH, Roeder K, Gusev A, Pasaniuc B, Stein JL, Love MI, Pollard KS, Liu C, Gandal MJ. Cross-ancestry atlas of gene, isoform, and splicing regulation in the developing human brain. Science 2024; 384:eadh0829. [PMID: 38781368 DOI: 10.1126/science.adh0829] [Received: 02/14/2023] [Accepted: 03/07/2024] [Indexed: 05/25/2024]
Abstract
Neuropsychiatric genome-wide association studies (GWASs), including those for autism spectrum disorder and schizophrenia, show strong enrichment for regulatory elements in the developing brain. However, prioritizing risk genes and mechanisms is challenging without a unified regulatory atlas. Across 672 diverse developing human brains, we identified 15,752 genes harboring gene, isoform, and/or splicing quantitative trait loci, mapping 3739 to cellular contexts. Gene expression heritability drops during development, likely reflecting both increasing cellular heterogeneity and the intrinsic properties of neuronal maturation. Isoform-level regulation, particularly in the second trimester, mediated the largest proportion of GWAS heritability. Through colocalization, we prioritized mechanisms for about 60% of GWAS loci across five disorders, exceeding adult brain findings. Finally, we contextualized results within gene and isoform coexpression networks, revealing the comprehensive landscape of transcriptome regulation in development and disease.
Affiliation(s)
- Cindy Wen
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Michael Margolis
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Pan Zhang
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Pawel F Przytycki
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
- Daniel D Vo
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Nana Matoba
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Miao Tang
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Chuan Jiao
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Université Paris Cité, Institute of Psychiatry and Neuroscience of Paris (IPNP), INSERM U1266, Team Krebs, 75014 Paris, France
- Minsoo Kim
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Ellen Tsai
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Celine Hoh
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Rebecca L Walker
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Christos Chatzinakos
- Department of Psychiatry, Harvard Medical School, Boston, MA 02215, USA
- McLean Hospital, Belmont, MA 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Declan Clarke
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Henry Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Mette A Peters
- CNS Data Coordination Group, Sage Bionetworks, Seattle, WA 98109, USA
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT 06520, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT 06520, USA
- Nikolaos P Daskalakis
- Department of Psychiatry, Harvard Medical School, Boston, MA 02215, USA
- McLean Hospital, Belmont, MA 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Andrew E Jaffe
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Neumora Therapeutics, Watertown, MA 02472, USA
- Joel E Kleinman
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Thomas M Hyde
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Daniel R Weinberger
- Lieber Institute for Brain Development, Baltimore, MD 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Nicholas J Bray
- MRC Centre for Neuropsychiatric Genetics & Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University School of Medicine, Cardiff CF24 4HQ, UK
- Nenad Sestan
- Department of Comparative Medicine, Yale University School of Medicine, New Haven, CT 06520, USA
- Department of Neuroscience, Yale University School of Medicine, New Haven, CT 06520, USA
- Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kathryn Roeder
- Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Alexander Gusev
- Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Harvard Medical School, Boston, MA 02215, USA
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02215, USA
- Bogdan Pasaniuc
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY 13210, USA
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan 410008, China
- Michael J Gandal
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
2
Richardson R, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: Understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. eLife 2024; 12:RP93429. [PMID: 38546716 PMCID: PMC10977968 DOI: 10.7554/elife.93429] [Indexed: 04/01/2024]
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese Richardson
- Interdisciplinary Biological Sciences, Northwestern University, Evanston, United States
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Heliodoro Tejedor Navarro
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
- Luis A Nunes Amaral
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
- Department of Molecular Biosciences, Northwestern University, Evanston, United States
- Department of Physics and Astronomy, Northwestern University, Evanston, United States
- Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- The Potocsnak Longevity Institute, Northwestern University, Chicago, United States
- Simpson Querrey Lung Institute for Translational Science, Northwestern University, Chicago, United States
3
Richardson RAK, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.02.28.530483. [PMID: 36909550 PMCID: PMC10002660 DOI: 10.1101/2023.02.28.530483] [Indexed: 03/06/2023]
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese AK Richardson
- Interdisciplinary Biological Sciences, Northwestern University
- Department of Chemical and Biological Engineering, Northwestern University
- Heliodoro Tejedor Navarro
- Department of Chemical and Biological Engineering, Northwestern University
- Northwestern Institute on Complex Systems, Northwestern University
- Luis A Nunes Amaral
- Department of Chemical and Biological Engineering, Northwestern University
- Northwestern Institute on Complex Systems, Northwestern University
- Department of Physics and Astronomy, Northwestern University
- Department of Molecular Biosciences, Northwestern University
- Thomas Stoeger
- Department of Chemical and Biological Engineering, Northwestern University
- The Potocsnak Longevity Institute, Northwestern University
- Simpson Querrey Lung Institute for Translational Science, Northwestern University
4
Humphrey J, Brophy E, Kosoy R, Zeng B, Coccia E, Mattei D, Ravi A, Efthymiou AG, Navarro E, Muller BZ, Snijders GJLJ, Allan A, Münch A, Kitata RB, Kleopoulos SP, Argyriou S, Shao Z, Francoeur N, Tsai CF, Gritsenko MA, Monroe ME, Paurus VL, Weitz KK, Shi T, Sebra R, Liu T, de Witte LD, Goate AM, Bennett DA, Haroutunian V, Hoffman GE, Fullard JF, Roussos P, Raj T. Long-read RNA-seq atlas of novel microglia isoforms elucidates disease-associated genetic regulation of splicing. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.01.23299073. [PMID: 38076956 PMCID: PMC10705658 DOI: 10.1101/2023.12.01.23299073] [Indexed: 12/21/2023]
Abstract
Microglia, the innate immune cells of the central nervous system, have been genetically implicated in multiple neurodegenerative diseases. We previously mapped the genetic regulation of gene expression and mRNA splicing in human microglia, identifying several loci where common genetic variants in microglia-specific regulatory elements explain disease risk loci identified by GWAS. However, identifying genetic effects on splicing has been challenging due to the use of short sequencing reads to identify causal isoforms. Here we present the isoform-centric microglia genomic atlas (isoMiGA) which leverages the power of long-read RNA-seq to identify 35,879 novel microglia isoforms. We show that the novel microglia isoforms are involved in stimulation response and brain region specificity. We then quantified the expression of both known and novel isoforms in a multi-ethnic meta-analysis of 555 human microglia short-read RNA-seq samples from 391 donors, the largest to date, and found associations with genetic risk loci in Alzheimer's disease and Parkinson's disease. We nominate several loci that may act through complex changes in isoform and splice site usage.
Affiliation(s)
- Jack Humphrey
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Erica Brophy
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Roman Kosoy
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Biao Zeng
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Elena Coccia
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Daniele Mattei
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ashvin Ravi
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Anastasia G. Efthymiou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Elisa Navarro
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Biochemistry and Molecular Biology, Faculty of Medicine (Universidad Complutense de Madrid), Madrid, Spain
- Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Madrid, Spain
- Instituto Ramon y Cajal de Investigacion Sanitaria (IRYCIS), Madrid, Spain
- Benjamin Z. Muller
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gijsje JLJ Snijders
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Amanda Allan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alexandra Münch
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Reta Birhanu Kitata
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Steven P Kleopoulos
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Stathis Argyriou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Zhiping Shao
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Nancy Francoeur
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Chia-Feng Tsai
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Marina A Gritsenko
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Matthew E Monroe
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Vanessa L Paurus
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Karl K Weitz
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Tujin Shi
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Black Family Stem Cell Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Tao Liu
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Lot D. de Witte
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Alison M. Goate
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- David A. Bennett
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, USA
- Vahram Haroutunian
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mental Illness Research Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA
- Gabriel E. Hoffman
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- John F. Fullard
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Panos Roussos
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Center for Disease Neurogenomics, Icahn School of Medicine at Mount Sinai, New York, USA
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, USA
- Mental Illness Research Education, and Clinical Center (VISN 2 South), James J. Peters VA Medical Center, Bronx, NY, USA
- Towfique Raj
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience & Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ronald M. Loeb Center for Alzheimer’s Disease, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Genomics Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Estelle and Daniel Maggin Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, USA

5
Guo LT, Pyle AM. End-to-end RT-PCR of long RNA and highly structured RNA. Methods Enzymol 2023; 691:3-15. [PMID: 37914451 DOI: 10.1016/bs.mie.2023.07.002] [Indexed: 11/03/2023]
Abstract
RNA molecules play important roles in numerous normal cellular processes and disease states, from protein coding to gene regulation. RT-PCR, applying the power of polymerase chain reaction (PCR) to RNA by coupling reverse transcription with PCR, is one of the most important techniques to characterize RNA transcripts and monitor gene expression. The ability to analyze full-length RNA transcripts and detect their expression is critical to decipher their biological functions. However, due to the low processivity of retroviral reverse transcriptases (RTs), we can only monitor a small fraction of long RNA transcripts, especially those containing stable secondary and tertiary structures. The full-length sequences can only be deduced by computational analysis, which is often misleading. Group II intron-encoded RTs are a new type of RT enzyme. They have evolved specialized structural elements that unwind template structures and maintain close contact with the RNA template. Therefore, group II intron-encoded RTs are more processive than the retroviral RTs. The discovery, optimization and deployment of processive group II intron RTs provide the opportunity to analyze RNA transcripts with single-molecule resolution. MarathonRT, the most processive group II intron RT, has been extensively optimized for processive reverse transcription. In this chapter, we use MarathonRT to deliver a general protocol for long amplicon generation by RT-PCR, and also provide guidance for troubleshooting and further optimization.
Affiliation(s)
- Li-Tao Guo
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, United States
- Anna Marie Pyle
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, United States; Department of Chemistry, Yale University, New Haven, CT, United States; Howard Hughes Medical Institute, Chevy Chase, MD, United States.

6
Zhao X, Song L, Yang A, Zhang Z, Zhang J, Yang YT, Zhao XM. Prioritizing genes associated with brain disorders by leveraging enhancer-promoter interactions in diverse neural cells and tissues. Genome Med 2023; 15:56. [PMID: 37488639 PMCID: PMC10364416 DOI: 10.1186/s13073-023-01210-6] [Received: 02/07/2023] [Accepted: 07/10/2023] [Indexed: 07/26/2023]
Abstract
BACKGROUND Prioritizing genes that underlie complex brain disorders poses a considerable challenge. Although previous studies have found that these disorders share symptoms and heterogeneity, it has remained difficult to systematically identify the risk genes associated with them. METHODS Using the CAGE (Cap Analysis of Gene Expression) read alignment files for 439 human cell and tissue types (including primary cells, tissues and cell lines) from the FANTOM5 project, we predicted enhancer-promoter interactions (EPIs) in these 439 human cell and tissue types and examined their reliability. We then evaluated the genetic heritability of 17 diverse brain disorders and behavioral-cognitive phenotypes in each neural cell type, brain region, and developmental stage. Furthermore, we prioritized genes associated with brain disorders and phenotypes by leveraging the EPIs in each neural cell and tissue type, and analyzed their pleiotropy and functionality for different categories of disorders and phenotypes. Finally, we characterized the spatiotemporal expression dynamics of these associated genes in cells and tissues. RESULTS We found that the identified EPIs showed activity specificity and network aggregation across cell and tissue types, and that transcription factors (TFs) with enriched binding in neural cells, e.g., EGR1 and the SOX family, played key roles in synaptic plasticity and nerve cell development. We also discovered that most neurological disorders exhibit heritability enrichment in neural stem cells and astrocytes, while psychiatric disorders and behavioral-cognitive phenotypes exhibit enrichment in neurons. Furthermore, our identified genes recapitulated well-known risk genes, which exhibited widespread pleiotropy between psychiatric disorders and behavioral-cognitive phenotypes (e.g., FOXP2) and showed expression specificity in the neural cell types, brain regions, and developmental stages associated with these disorders and phenotypes. 
Importantly, we showed the potential associations of brain disorders with brain regions and developmental stages that have not been well studied. CONCLUSIONS Overall, our study characterized the gene-enhancer regulatory networks and genetic mechanisms in the human neural cells and tissues, and illustrated the value of reanalysis of publicly available genomic datasets.
Affiliation(s)
- Xingzhong Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
- Liting Song
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
- Anyi Yang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
- Zichao Zhang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
- Jinglong Zhang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
- Yucheng T Yang
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence, and Department of Neurology of Zhongshan Hospital, Fudan University, 220 Handan Road, Shanghai, 200433, China
- MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433, China
- State Key Laboratory of Medical Neurobiology, Institutes of Brain Science, Fudan University, Shanghai, 200032, China
- International Human Phenome Institutes (Shanghai), Shanghai, 200433, China

7
Mayer C, Vogt A, Uslu T, Scalzitti N, Chennen K, Poch O, Thompson JD. CeGAL: Redefining a Widespread Fungal-Specific Transcription Factor Family Using an In Silico Error-Tracking Approach. J Fungi (Basel) 2023; 9:jof9040424. [PMID: 37108879 PMCID: PMC10141177 DOI: 10.3390/jof9040424] [Received: 02/27/2023] [Revised: 03/21/2023] [Accepted: 03/28/2023] [Indexed: 03/31/2023]
Abstract
In fungi, the most abundant transcription factor (TF) class contains a fungal-specific ‘GAL4-like’ Zn2C6 DNA binding domain (DBD), while the second class contains another fungal-specific domain, known as ‘fungal_trans’ or middle homology domain (MHD), whose function remains largely uncharacterized. Remarkably, almost a third of MHD-containing TFs in public sequence databases apparently lack DNA binding activity, since they are not predicted to contain a DBD. Here, we reassess the domain organization of these ‘MHD-only’ proteins using an in silico error-tracking approach. In a large-scale analysis of ~17,000 MHD-only TF sequences present in all fungal phyla except Microsporidia and Cryptomycota, we show that the vast majority (>90%) result from genome annotation errors and we are able to predict a new DBD sequence for 14,261 of them. Most of these sequences correspond to a Zn2C6 domain (82%), with a small proportion of C2H2 domains (4%) found only in Dikarya. Our results contradict previous findings that the MHD-only TF are widespread in fungi. In contrast, we show that they are exceptional cases, and that the fungal-specific Zn2C6–MHD domain pair represents the canonical domain signature defining the most predominant fungal TF family. We call this family CeGAL, after the highly characterized members: Cep3, whose 3D structure has been determined, and GAL4, a eukaryotic TF archetype. We believe that this will not only improve the annotation and classification of the Zn2C6 TF but will also provide critical guidance for future fungal gene regulatory network analyses.
Affiliation(s)
- Claudine Mayer
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Faculté des Sciences, Université Paris Cité, UFR Sciences du Vivant, 75013 Paris, France
- Correspondence: (C.M.); (J.D.T.)
- Arthur Vogt
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Tuba Uslu
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Nicolas Scalzitti
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Kirsley Chennen
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Olivier Poch
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Julie D. Thompson
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000 Strasbourg, France
- Correspondence: (C.M.); (J.D.T.)

8
Wen C, Margolis M, Dai R, Zhang P, Przytycki PF, Vo DD, Bhattacharya A, Matoba N, Jiao C, Kim M, Tsai E, Hoh C, Aygün N, Walker RL, Chatzinakos C, Clarke D, Pratt H, PsychENCODE Consortium, Peters MA, Gerstein M, Daskalakis NP, Weng Z, Jaffe AE, Kleinman JE, Hyde TM, Weinberger DR, Bray NJ, Sestan N, Geschwind DH, Roeder K, Gusev A, Pasaniuc B, Stein JL, Love MI, Pollard KS, Liu C, Gandal MJ. Cross-ancestry, cell-type-informed atlas of gene, isoform, and splicing regulation in the developing human brain. medRxiv 2023:2023.03.03.23286706. [PMID: 36945630 PMCID: PMC10029021 DOI: 10.1101/2023.03.03.23286706] [Indexed: 03/23/2023]
Abstract
Genomic regulatory elements active in the developing human brain are notably enriched in genetic risk for neuropsychiatric disorders, including autism spectrum disorder (ASD), schizophrenia, and bipolar disorder. However, prioritizing the specific risk genes and candidate molecular mechanisms underlying these genetic enrichments has been hindered by the lack of a single unified large-scale gene regulatory atlas of human brain development. Here, we uniformly process and systematically characterize gene, isoform, and splicing quantitative trait loci (xQTLs) in 672 fetal brain samples from unique subjects across multiple ancestral populations. We identify 15,752 genes harboring a significant xQTL and map 3,739 eQTLs to a specific cellular context. We observe a striking drop in gene expression and splicing heritability as the human brain develops. Isoform-level regulation, particularly in the second trimester, mediates the greatest proportion of heritability across multiple psychiatric GWAS, compared with eQTLs. Via colocalization and TWAS, we prioritize biological mechanisms for ~60% of GWAS loci across five neuropsychiatric disorders, nearly two-fold that observed in the adult brain. Finally, we build a comprehensive set of developmentally regulated gene and isoform co-expression networks capturing unique genetic enrichments across disorders. Together, this work provides a comprehensive view of genetic regulation across human brain development as well as the stage- and cell type-informed mechanistic underpinnings of neuropsychiatric disorders.
Affiliation(s)
- Cindy Wen
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Michael Margolis
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Rujia Dai
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
- Pan Zhang
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Pawel F Przytycki
- Gladstone Institute of Data Science and Biotechnology; San Francisco, CA, 94158, USA
- Daniel D Vo
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA, 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia; Philadelphia, PA, 19104, USA
- Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Nana Matoba
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Chuan Jiao
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
- Minsoo Kim
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Ellen Tsai
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Celine Hoh
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Nil Aygün
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Rebecca L Walker
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Christos Chatzinakos
- Department of Psychiatry, Harvard Medical School; Boston, MA, 02215, USA
- McLean Hospital; Belmont, MA, 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Declan Clarke
- Department of Molecular Biophysics and Biochemistry, Yale University; New Haven, CT, 06520, USA
- Henry Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School; Worcester, MA, 01605, USA
- PsychENCODE Consortium
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
- Gladstone Institute of Data Science and Biotechnology; San Francisco, CA, 94158, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA, 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia; Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Department of Psychiatry, Harvard Medical School; Boston, MA, 02215, USA
- McLean Hospital; Belmont, MA, 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Department of Molecular Biophysics and Biochemistry, Yale University; New Haven, CT, 06520, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School; Worcester, MA, 01605, USA
- CNS Data Coordination Group, Sage Bionetworks; Seattle, WA, 98109, USA
- Program in Computational Biology and Bioinformatics, Yale University; New Haven, CT, 06520, USA
- Department of Computer Science, Yale University; New Haven, CT, 06520, USA
- Department of Statistics and Data Science, Yale University; New Haven, CT, 06520, USA
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Neumora Therapeutics; Watertown, MA, 02472, USA
- Department of Neurology, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- MRC Centre for Neuropsychiatric Genetics & Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University School of Medicine; Cardiff, CF24 4HQ, UK
- Department of Comparative Medicine, Yale University School of Medicine; New Haven, CT, 06520, USA
- Department of Neuroscience, Yale University School of Medicine; New Haven, CT, 06520, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Precision Health, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Statistics & Data Science, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
- Computational Biology Department, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
- Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute; Boston, MA, 02215, USA
- Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Harvard Medical School; Boston, MA, 02215, USA
- Division of Genetics, Brigham and Women's Hospital; Boston, MA, 02215, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco; San Francisco, CA, 94158, USA
- Chan Zuckerberg Biohub; San Francisco, CA, 94158, USA
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University; Changsha, Hunan, 410008, China
- Mette A Peters
- CNS Data Coordination Group, Sage Bionetworks; Seattle, WA, 98109, USA
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University; New Haven, CT, 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University; New Haven, CT, 06520, USA
- Department of Computer Science, Yale University; New Haven, CT, 06520, USA
- Department of Statistics and Data Science, Yale University; New Haven, CT, 06520, USA
- Nikolaos P Daskalakis
- Department of Psychiatry, Harvard Medical School; Boston, MA, 02215, USA
- McLean Hospital; Belmont, MA, 02478, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School; Worcester, MA, 01605, USA
- Andrew E Jaffe
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health; Baltimore, MD, 21205, USA
- Neumora Therapeutics; Watertown, MA, 02472, USA
- Joel E Kleinman
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Thomas M Hyde
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Daniel R Weinberger
- Lieber Institute for Brain Development; Baltimore, MD, 21205, USA
- Department of Psychiatry & Behavioral Sciences, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Genetic Medicine, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Department of Neurology, Johns Hopkins University School of Medicine; Baltimore, MD, 21205, USA
- Nicholas J Bray
- MRC Centre for Neuropsychiatric Genetics & Genomics, Division of Psychological Medicine & Clinical Neurosciences, Cardiff University School of Medicine; Cardiff, CF24 4HQ, UK
- Nenad Sestan
- Department of Comparative Medicine, Yale University School of Medicine; New Haven, CT, 06520, USA
- Department of Neuroscience, Yale University School of Medicine; New Haven, CT, 06520, USA
- Daniel H Geschwind
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Precision Health, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Kathryn Roeder
- Department of Statistics & Data Science, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
- Computational Biology Department, Carnegie Mellon University; Pittsburgh, PA, 15213, USA
- Alexander Gusev
- Department of Medical Oncology, Division of Population Sciences, Dana-Farber Cancer Institute; Boston, MA, 02215, USA
- Broad Institute of MIT and Harvard; Cambridge, MA, 02142, USA
- Harvard Medical School; Boston, MA, 02215, USA
- Division of Genetics, Brigham and Women's Hospital; Boston, MA, 02215, USA
- Bogdan Pasaniuc
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Institute for Precision Health, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Jason L Stein
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- UNC Neuroscience Center, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Michael I Love
- Department of Genetics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Department of Biostatistics, University of North Carolina at Chapel Hill; Chapel Hill, NC, 27599, USA
- Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology; San Francisco, CA, 94158, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco; San Francisco, CA, 94158, USA
- Chan Zuckerberg Biohub; San Francisco, CA, 94158, USA
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University; Syracuse, NY, 13210, USA
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University; Changsha, Hunan, 410008, China
- Michael J Gandal
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles; Los Angeles, CA, 90095, USA
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania; Philadelphia, PA, 19104, USA
- Lifespan Brain Institute, The Children's Hospital of Philadelphia; Philadelphia, PA, 19104, USA

9
Premzl M. Revised eutherian gene collections. BMC Genom Data 2022; 23:56. [PMID: 35870891 PMCID: PMC9308196 DOI: 10.1186/s12863-022-01071-9] [Received: 03/27/2021] [Accepted: 07/13/2022] [Indexed: 11/24/2022]
Abstract
Objectives The most recent research projects in the scientific field of eutherian comparative genomics included plans to sequence the genome of every extant eutherian species in the foreseeable future, so future revisions and updates of eutherian gene data sets were expected. Data description Using 35 public eutherian reference genomic sequence assemblies and freely available software, the eutherian comparative genomic analysis protocol RRID:SCR_014401 was published as guidance against potential genomic sequence errors. The protocol curated 14 eutherian third-party data gene data sets, comprising in aggregate 2615 complete coding sequences that were deposited in the European Nucleotide Archive. The published eutherian gene collections were used to revise and update eutherian gene data set classifications and nomenclatures, including gene annotations, phylogenetic analyses and protein molecular evolution analyses.
10
Meyer E, Chaung K, Dehghannasiri R, Salzman J. ReadZS detects cell type-specific and developmentally regulated RNA processing programs in single-cell RNA-seq. Genome Biol 2022; 23:226. [PMID: 36284317 PMCID: PMC9594907 DOI: 10.1186/s13059-022-02795-8] [Received: 04/15/2022] [Accepted: 10/13/2022] [Indexed: 11/13/2022]
Abstract
RNA processing, including splicing and alternative polyadenylation, is crucial to gene function and regulation, but methods to detect RNA processing from single-cell RNA sequencing data are limited by reliance on pre-existing annotations, peak calling heuristics, and collapsing measurements by cell type. We introduce ReadZS, an annotation-free statistical approach to identify regulated RNA processing in single cells. ReadZS discovers cell type-specific RNA processing in human lung and conserved, developmentally regulated RNA processing in mammalian spermatogenesis-including global 3' UTR shortening in human spermatogenesis. ReadZS also discovers global 3' UTR lengthening in Arabidopsis development, highlighting the usefulness of this method in under-annotated transcriptomes.
Affiliation(s)
- Elisabeth Meyer
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Kaitlin Chaung
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Roozbeh Dehghannasiri
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Julia Salzman
- Department of Biochemistry, Stanford University, Stanford, CA, 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Department of Statistics (by courtesy), Stanford University, Stanford, CA, 94305, USA
11
Guan D, Halstead MM, Islas-Trejo AD, Goszczynski DE, Cheng HH, Ross PJ, Zhou H. Prediction of transcript isoforms in 19 chicken tissues by Oxford Nanopore long-read sequencing. Front Genet 2022; 13:997460. [PMID: 36246588 PMCID: PMC9561881 DOI: 10.3389/fgene.2022.997460] [Received: 07/18/2022] [Accepted: 08/30/2022] [Indexed: 11/22/2022]
Abstract
To identify and annotate transcript isoforms in the chicken genome, we generated Nanopore long-read sequencing data from 68 samples that encompassed 19 diverse tissues collected from experimental adult male and female White Leghorn chickens. More than 23.8 million reads with mean read length of 790 bases and average quality of 18.2 were generated. The annotation and subsequent filtering resulted in the identification of 55,382 transcripts at 40,547 loci with mean length of 1,700 bases. We predicted 30,967 coding transcripts at 19,461 loci, and 16,495 lncRNA transcripts at 15,512 loci. Compared to existing reference annotations, we found ∼52% of annotated transcripts could be partially or fully matched while ∼47% were novel. Seventy percent of novel transcripts were potentially transcribed from lncRNA loci. Based on our annotation, we quantified transcript expression across tissues and found two brain tissues (i.e., cerebellum and cortex) expressed the highest number of transcripts and loci. Furthermore, ∼22% of the transcripts displayed tissue specificity with the reproductive tissues (i.e., testis and ovary) exhibiting the most tissue-specific transcripts. Despite our wide sampling, ∼20% of Ensembl reference loci were not detected. This suggests that deeper sequencing and additional samples that include different breeds, cell types, developmental stages, and physiological conditions, are needed to fully annotate the chicken genome. The application of Nanopore sequencing in this study demonstrates the usefulness of long-read data in discovering additional novel loci (e.g., lncRNA loci) and resolving complex transcripts (e.g., the longest transcript for the TTN locus).
Affiliation(s)
- Dailu Guan
- Department of Animal Science, University of California Davis, Davis, CA, United States
- Michelle M. Halstead
- Department of Animal Science, University of California Davis, Davis, CA, United States
- Alma D. Islas-Trejo
- Department of Animal Science, University of California Davis, Davis, CA, United States
- Daniel E. Goszczynski
- Department of Animal Science, University of California Davis, Davis, CA, United States
- Hans H. Cheng
- USDA, ARS, USNPRC, Avian Disease and Oncology Laboratory, East Lansing, MI, United States
- Pablo J. Ross
- Department of Animal Science, University of California Davis, Davis, CA, United States
- *Correspondence: Pablo J. Ross, Huaijun Zhou
- Huaijun Zhou
- Department of Animal Science, University of California Davis, Davis, CA, United States
12
Using genome-wide expression compendia to study microorganisms. Comput Struct Biotechnol J 2022; 20:4315-4324. [PMID: 36016717 PMCID: PMC9396250 DOI: 10.1016/j.csbj.2022.08.012] [Received: 03/10/2022] [Revised: 08/07/2022] [Accepted: 08/07/2022] [Indexed: 11/30/2022]
Abstract
A gene expression compendium is a heterogeneous collection of gene expression experiments assembled from data collected for diverse purposes. The widely varied experimental conditions and genetic backgrounds across samples create a tremendous opportunity for gaining a systems-level understanding of the transcriptional responses that influence phenotypes. Variety in experimental design is particularly important for studying microbes, whose transcriptional responses integrate many signals and show plasticity across strains, including responses to the nutrients available and to the other microbes present. Advances in high-throughput measurement technology have made it feasible to construct compendia for many microbes. In this review we discuss how these compendia are constructed and analyzed to reveal transcriptional patterns.
13
Data Incompleteness May form a Hard-to-Overcome Barrier to Decoding Life’s Mechanism. BIOLOGY 2022; 11:biology11081208. [PMID: 36009835 PMCID: PMC9404739 DOI: 10.3390/biology11081208] [Received: 06/19/2022] [Revised: 08/03/2022] [Accepted: 08/10/2022] [Indexed: 11/23/2022]
Abstract
Simple Summary: The influence of data incompleteness on the correctness of conclusions about the structure and functions of the objects under study is widely discussed in the literature. It has been noted that even a small percentage of missing data can lead to incorrect conclusions and imperfect knowledge. In particular, incompleteness can lead to critical errors in the qualitative and quantitative assessments of interactions in biological systems and a distorted understanding of the functioning mechanisms of living systems. In this brief review, we attempt to demonstrate the extent of this incompleteness in functional information about living systems using the best-studied examples. We suggest that this incompleteness may form seemingly insurmountable barriers in deciphering the mechanisms of the functioning of complex systems with unpredictable properties arising from the interaction of the system components.
Abstract: In this brief review, we attempt to demonstrate that the incompleteness of data, as well as the intrinsic heterogeneity of biological systems, may form very strong and possibly insurmountable barriers for researchers trying to decipher the mechanisms of the functioning of living systems. We illustrate this challenge using the two most studied organisms: E. coli, with 34.6% of genes lacking experimental evidence of function, and C. elegans, with identified proteins for approximately 50% of its genes. Another striking example is an artificial unicellular entity named JCVI-syn3.0, with a minimal set of genes. A total of 31.5% of the genes of JCVI-syn3.0 cannot be ascribed a specific biological function. The human interactome mapping project identified only 5–10% of all protein interactions in humans. In addition, most of the available data are static snapshots, and it is barely possible to generate realistic models of the dynamic processes within cells. Moreover, the existing interactomes reflect the de facto interaction but not its functional result, which is an unpredictable emergent property. Perhaps the completeness of molecular data on any living organism is beyond our reach and represents an unsolvable problem in biology.
14
Tan S, Wang W, Jie W, Liu J. FishExp: A comprehensive database and analysis platform for gene expression and alternative splicing of fish species. Comput Struct Biotechnol J 2022; 20:3676-3684. [PMID: 35891795 PMCID: PMC9293738 DOI: 10.1016/j.csbj.2022.07.015] [Received: 03/23/2022] [Revised: 07/07/2022] [Accepted: 07/07/2022] [Indexed: 11/09/2022]
Abstract
Publicly archived RNA-seq data have grown exponentially, but much of their valuable information, such as alternative splicing and its integration with gene expression, has not yet been fully explored and utilized. This is especially true for fish species, which play important roles in ecology, research and the food industry. Furthermore, there is no online platform that lets users analyze their new data individually and jointly with existing data for a comprehensive analysis of alternative splicing and gene expression. Here, we present FishExp, a web-based data platform covering gene expression and alternative splicing in 26,081 RNA-seq experiments from 44 fishes. It allows users to query the data in a variety of ways, including gene identifier/symbol, functional term, and BLAST alignment. Moreover, users can customize experiments and tools to perform differential/specific expression and alternative splicing analysis, co-expression and cross-species analysis. In addition, functional enrichment is provided to confer biological significance. Notably, users can submit their own data and perform various analyses using the new data alone or alongside existing data in FishExp. Results of retrieval and analysis can be visualized on gene-, transcript- and splicing event-level webpages in a highly interactive and intuitive manner. All data in FishExp can be downloaded for more in-depth analysis. The manually curated sample information, uniform data processing and various tools make it efficient for users to gain new insights from these large data sets, facilitating scientific hypothesis generation. FishExp is freely accessible at https://bioinfo.njau.edu.cn/fishExp.
Affiliation(s)
- Suxu Tan
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Wenwen Wang
- School of Fisheries, Aquaculture and Aquatic Sciences, Auburn University, Auburn, AL 36849, USA
- Wencai Jie
- Institute for Plant Molecular Biology, State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, Jiangsu 210023, China
- Jinding Liu
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Bioinformatics Center, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
15
An unexplored angle: T cell antigen discoveries reveal a marginal contribution of proteasome splicing to the immunogenic MHC class I antigen pool. Proc Natl Acad Sci U S A 2022; 119:e2119736119. [PMID: 35858315 PMCID: PMC9303865 DOI: 10.1073/pnas.2119736119] [Indexed: 11/18/2022]
Abstract
In the current era of T cell–based immunotherapies, it is crucial to understand which types of MHC-presented T cell antigens are produced by tumor cells. In addition to linear peptide antigens, chimeric peptides are generated through proteasome-catalyzed peptide splicing (PCPS). Whether such spliced peptides are abundantly presented by MHC is highly disputed because of disagreement in computational analyses of mass spectrometry data of MHC-eluted peptides. Moreover, such mass spectrometric analyses cannot elucidate how much spliced peptides contribute to the pool of immunogenic antigens. In this Perspective, we explain the significance of knowing the contribution of spliced peptides for accurate analyses of peptidomes on one hand, and to serve as a potential source of targetable tumor antigens on the other hand. Toward a strategy for mass spectrometry independent estimation of the contribution of PCPS to the immunopeptidome, we first reviewed methodologies to identify MHC-presented spliced peptide antigens expressed by tumors. Data from these identifications allowed us to compile three independent datasets containing 103, 74, and 83 confirmed T cell antigens from cancer patients. Only 3.9%, 1.4%, and between 0% and 7.2% of these truly immunogenic antigens are produced by PCPS, therefore providing a marginal contribution to the pool of immunogenic tumor antigens. We conclude that spliced peptides will not serve as a comprehensive source to expand the number of targetable antigens for immunotherapies.
16
Kwak Y, Daly CWP, Fogarty EA, Grimson A, Kwak H. Dynamic and widespread control of poly(A) tail length during macrophage activation. RNA 2022; 28:947-971. [PMID: 35512831 PMCID: PMC9202586 DOI: 10.1261/rna.078918.121] [Received: 07/16/2021] [Accepted: 03/21/2022] [Indexed: 06/14/2023]
Abstract
The poly(A) tail enhances translation and transcript stability, and tail length is under dynamic control during cell state transitions. Tail regulation plays essential roles in translational timing and fertilization in early development, but poly(A) tail dynamics have not been fully explored in post-embryonic systems. Here, we examined the landscape and impact of tail length control during macrophage activation. Upon activation, more than 1500 mRNAs, including proinflammatory genes, underwent distinctive changes in tail lengths. Increases in tail length correlated with mRNA levels regardless of transcriptional activity, and many mRNAs that underwent tail extension encode proteins necessary for immune function and post-transcriptional regulation. Strikingly, we found that ZFP36, whose protein product destabilizes target transcripts, undergoes tail extension. Our analyses indicate that many mRNAs undergoing tail lengthening are, in turn, degraded by elevated levels of ZFP36, constituting a post-transcriptional feedback loop that ensures transient regulation of transcripts integral to macrophage activation. Taken together, this study establishes the complexity, relevance, and widespread nature of poly(A) tail dynamics, and the resulting post-transcriptional regulation during macrophage activation.
Affiliation(s)
- Yeonui Kwak
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Graduate Field of Genetics, Genomics, and Development, Cornell University, Ithaca, New York 14853, USA
- Ciarán W P Daly
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Graduate Field of Biochemistry, Molecular, and Cell Biology, Cornell University, Ithaca, New York 14853, USA
- Elizabeth A Fogarty
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Andrew Grimson
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
- Hojoong Kwak
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
17
Leveraging omic features with F3UTER enables identification of unannotated 3'UTRs for synaptic genes. Nat Commun 2022; 13:2270. [PMID: 35477703 PMCID: PMC9046390 DOI: 10.1038/s41467-022-30017-z] [Received: 02/24/2021] [Accepted: 03/18/2022] [Indexed: 11/08/2022]
Abstract
There is growing evidence for the importance of 3' untranslated region (3'UTR) dependent regulatory processes. However, our current human 3'UTR catalogue is incomplete. Here, we develop a machine learning-based framework, leveraging both genomic and tissue-specific transcriptomic features to predict previously unannotated 3'UTRs. We identify unannotated 3'UTRs associated with 1,563 genes across 39 human tissues, with the greatest abundance found in the brain. These unannotated 3'UTRs are significantly enriched for RNA binding protein (RBP) motifs and exhibit high human lineage-specificity. We find that brain-specific unannotated 3'UTRs are enriched for the binding motifs of important neuronal RBPs such as TARDBP and RBFOX1, and their associated genes are involved in synaptic function. Our data is shared through an online resource F3UTER ( https://astx.shinyapps.io/F3UTER/ ). Overall, our data improves 3'UTR annotation and provides additional insights into the mRNA-RBP interactome in the human brain, with implications for our understanding of neurological and neurodevelopmental diseases.
18
Zhang CY, Xiao X, Zhang Z, Hu Z, Li M. An alternative splicing hypothesis for neuropathology of schizophrenia: evidence from studies on historical candidate genes and multi-omics data. Mol Psychiatry 2022; 27:95-112. [PMID: 33686213 DOI: 10.1038/s41380-021-01037-w] [Received: 10/04/2020] [Revised: 01/08/2021] [Accepted: 01/22/2021] [Indexed: 01/31/2023]
Abstract
Alternative splicing of schizophrenia risk genes, such as DRD2, GRM3, and DISC1, has been extensively described. Nevertheless, the alternative splicing characteristics of the growing number of schizophrenia risk genes identified through genetic analyses remain relatively opaque. Recently, transcriptomic analyses in human brains based on short-read RNA-sequencing have discovered many "local splicing" events (e.g., exon skipping junctions) associated with genetic risk of schizophrenia, and further molecular characterizations have identified novel spliced isoforms, such as AS3MTd2d3 and ZNF804AE3E4. In addition, long-read sequencing analyses of schizophrenia risk genes (e.g., CACNA1C and NRXN1) have revealed multiple previously unannotated brain-abundant isoforms with therapeutic potential, and functional analyses of KCNH2-3.1 and Ube3a1 have provided examples for investigating such spliced isoforms in vitro and in vivo. These findings suggest that alternative splicing may be an essential molecular mechanism underlying genetic risk of schizophrenia. However, the incomplete annotation of human brain transcriptomes may have limited our understanding of schizophrenia pathogenesis, and further efforts to elucidate these transcriptional characteristics are urgently needed to gain insight into illness-related brain physiology and pathology and to translate genetic discoveries into novel therapeutic targets.
Affiliation(s)
- Chu-Yi Zhang
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China
- Xiao Xiao
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Zhuohua Zhang
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Zhonghua Hu
- Institute of Molecular Precision Medicine and Hunan Key Laboratory of Molecular Precision Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Center for Medical Genetics and Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- Department of Critical Care Medicine, Xiangya Hospital, Central South University, Changsha, Hunan, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
- Hunan Key Laboratory of Animal Models for Human Diseases, School of Life Sciences, Central South University, Changsha, Hunan, China
- Eye Center of Xiangya Hospital and Hunan Key Laboratory of Ophthalmology, Central South University, Changsha, Hunan, China
- National Clinical Research Center on Mental Disorders, Changsha, Hunan, China
- Ming Li
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, Kunming, Yunnan, China
- KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
19
Wilks C, Zheng SC, Chen FY, Charles R, Solomon B, Ling JP, Imada EL, Zhang D, Joseph L, Leek JT, Jaffe AE, Nellore A, Collado-Torres L, Hansen KD, Langmead B. recount3: summaries and queries for large-scale RNA-seq expression and splicing. Genome Biol 2021; 22:323. [PMID: 34844637 PMCID: PMC8628444 DOI: 10.1186/s13059-021-02533-6] [Received: 07/12/2021] [Accepted: 10/29/2021] [Indexed: 12/12/2022]
Abstract
We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private data, allowing results to be directly compared to any study in recount3. Taken together, our tools help biologists maximize the utility of publicly available RNA-seq data, especially to improve their understanding of newly collected data. recount3 is available from http://rna.recount.bio .
Affiliation(s)
- Christopher Wilks
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Shijie C Zheng
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Rone Charles
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Brad Solomon
- Thomas M. Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Jonathan P Ling
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, USA
- Eddie Luidy Imada
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA
- David Zhang
- Institute of Child Health, University College London (UCL), London, UK
- Jeffrey T Leek
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Andrew E Jaffe
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Lieber Institute for Brain Development, Baltimore, USA
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Abhinav Nellore
- Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Department of Surgery, Oregon Health & Science University, Portland, OR, USA
- Kasper D Hansen
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, USA
- Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
20
Scalzitti N, Kress A, Orhand R, Weber T, Moulinier L, Jeannin-Girardon A, Collet P, Poch O, Thompson JD. Spliceator: multi-species splice site prediction using convolutional neural networks. BMC Bioinformatics 2021; 22:561. [PMID: 34814826 PMCID: PMC8609763 DOI: 10.1186/s12859-021-04471-3] [Received: 07/04/2021] [Accepted: 11/09/2021] [Indexed: 12/14/2022]
Abstract
Background: Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene structures from model organisms. However, Deep Learning methods for non-model organisms are lacking.
Results: We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. We show that Spliceator achieves consistently high accuracy (89–92%) compared to existing methods on independent benchmarks from human, fish, fly, worm, plant and protist organisms.
Conclusions: Spliceator is a new Deep Learning method trained on high-quality data, which can be used to predict splice sites in diverse organisms, ranging from human to protists, with consistently high accuracy.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04471-3.
Affiliation(s)
- Nicolas Scalzitti
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Arnaud Kress
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- BiGEst-ICube Platform, ICube Laboratory, UMR7357, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Romain Orhand
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Thomas Weber
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Luc Moulinier
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- BiGEst-ICube Platform, ICube Laboratory, UMR7357, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Anne Jeannin-Girardon
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Pierre Collet
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Olivier Poch
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
- Julie D Thompson
- Complex Systems and Translational Bioinformatics (CSTB), ICube Laboratory, UMR7357, University of Strasbourg, 1 rue Eugène Boeckel, 67000, Strasbourg, France
21
Leung SK, Jeffries AR, Castanho I, Jordan BT, Moore K, Davies JP, Dempster EL, Bray NJ, O'Neill P, Tseng E, Ahmed Z, Collier DA, Jeffery ED, Prabhakar S, Schalkwyk L, Jops C, Gandal MJ, Sheynkman GM, Hannon E, Mill J. Full-length transcript sequencing of human and mouse cerebral cortex identifies widespread isoform diversity and alternative splicing. Cell Rep 2021; 37:110022. [PMID: 34788620 PMCID: PMC8609283 DOI: 10.1016/j.celrep.2021.110022] [Received: 10/14/2020] [Revised: 07/30/2021] [Accepted: 10/28/2021] [Indexed: 12/05/2022]
Abstract
Alternative splicing is a post-transcriptional regulatory mechanism producing distinct mRNA molecules from a single pre-mRNA with a prominent role in the development and function of the central nervous system. We used long-read isoform sequencing to generate full-length transcript sequences in the human and mouse cortex. We identify novel transcripts not present in existing genome annotations, including transcripts mapping to putative novel (unannotated) genes and fusion transcripts incorporating exons from multiple genes. Global patterns of transcript diversity are similar between human and mouse cortex, although certain genes are characterized by striking differences between species. We also identify developmental changes in alternative splicing, with differential transcript usage between human fetal and adult cortex. Our data confirm the importance of alternative splicing in the cortex, dramatically increasing transcriptional diversity and representing an important mechanism underpinning gene regulation in the brain. We provide transcript-level data for human and mouse cortex as a resource to the scientific community.
Highlights
- There is widespread transcript diversity in the cortex and many novel transcripts
- Some genes display big differences in isoform number between human and mouse cortex
- There is evidence of differential transcript usage between human fetal and adult cortex
- There are many novel isoforms of genes associated with human brain disease
Key Words
- isoform, transcript, expression, brain, cortex, mouse, human, adult, fetal, long-read sequencing, alternative splicing
Affiliation(s)
- Isabel Castanho
- University of Exeter, Exeter, UK; Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Ben T Jordan
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Erin D Jeffery
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Shyam Prabhakar
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Connor Jops
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Michael J Gandal
- Department of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, USA
- Gloria M Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; UVA Cancer Center, University of Virginia, Charlottesville, VA, USA
22
Eagles NJ, Burke EE, Leonard J, Barry BK, Stolz JM, Huuki L, Phan BN, Serrato VL, Gutiérrez-Millán E, Aguilar-Ordoñez I, Jaffe AE, Collado-Torres L. SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses. BMC Bioinformatics 2021; 22:224. [PMID: 33932985 PMCID: PMC8088074 DOI: 10.1186/s12859-021-04142-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/29/2021] [Accepted: 04/21/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND RNA sequencing (RNA-seq) is a common and widespread biological assay, and an increasing amount of data is generated with it. In practice, a researcher must perform a large number of individual steps before raw RNA-seq reads yield directly valuable information, such as differential gene expression data. Existing software tools are typically specialized, performing only one step of a larger workflow, such as alignment of reads to a reference genome. The demand for a more comprehensive and reproducible workflow has led to the production of a number of publicly available RNA-seq pipelines. However, we have found that most require computational expertise to set up or share among several users, are not actively maintained, or lack features we have found to be important in our own analyses. RESULTS In response to these concerns, we have developed a Scalable Pipeline for Expression Analysis and Quantification (SPEAQeasy), which is easy to install and share, and provides a bridge towards R/Bioconductor downstream analysis solutions. SPEAQeasy is portable across computational frameworks (SGE, SLURM, local, Docker integration) and different configuration files are provided (http://research.libd.org/SPEAQeasy/). CONCLUSIONS SPEAQeasy is user-friendly and lowers the computational-domain entry barrier to RNA-seq data processing for biologists and clinicians, because the main input file is simply a table with sample names and their corresponding FASTQ files. The goal is to provide a flexible pipeline that is immediately usable by researchers, regardless of their technical background or computing environment.
Affiliation(s)
- Nicholas J Eagles
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Emily E Burke
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Jacob Leonard
- Winter Genomics, Salaverry 874 int 100, Lindavista, CDMX, 07300, Mexico
- QuestBridge Scholar, Palo Alto, CA, 94303, USA
- Brianna K Barry
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Joshua M Stolz
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Louise Huuki
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- BaDoi N Phan
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
- Medical Scientist Training Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Violeta Larios Serrato
- Winter Genomics, Salaverry 874 int 100, Lindavista, CDMX, 07300, Mexico
- Instituto Politécnico Nacional, Escuela Nacional de Ciencias Biológicas, Mexico City, CDMX, 11340, Mexico
- Israel Aguilar-Ordoñez
- Winter Genomics, Salaverry 874 int 100, Lindavista, CDMX, 07300, Mexico
- Department of Supercomputing, Instituto Nacional de Medicina Genómica (INMEGEN), Mexico City, CDMX, 14610, Mexico
- Andrew E Jaffe
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Department of Genetic Medicine, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, 21205, USA
- Leonardo Collado-Torres
- Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, 21205, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, 21205, USA
23
Chen Z, Zhang D, Reynolds RH, Gustavsson EK, García-Ruiz S, D'Sa K, Fairbrother-Browne A, Vandrovcova J, Hardy J, Houlden H, Gagliano Taliun SA, Botía J, Ryten M. Human-lineage-specific genomic elements are associated with neurodegenerative disease and APOE transcript usage. Nat Commun 2021; 12:2076. [PMID: 33824317 PMCID: PMC8024253 DOI: 10.1038/s41467-021-22262-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/28/2020] [Accepted: 03/03/2021] [Indexed: 12/12/2022]
Abstract
Knowledge of genomic features specific to the human lineage may provide insights into brain-related diseases. We leverage high-depth whole genome sequencing data to generate a combined annotation identifying regions simultaneously depleted for genetic variation (constrained regions) and poorly conserved across primates. We propose that these constrained, non-conserved regions (CNCRs) have been subject to human-specific purifying selection and are enriched for brain-specific elements. We find that CNCRs are depleted from protein-coding genes but enriched within lncRNAs. We demonstrate that per-SNP heritability of a range of brain-relevant phenotypes is enriched within CNCRs. We find that genes implicated in neurological diseases, including APOE, have high CNCR density; in APOE, this highlights an unannotated intron-3 retention event. Using human brain RNA-sequencing data, we show the intron-3-retaining transcript to be more abundant in Alzheimer's disease with more severe tau and amyloid pathological burden. Thus, we demonstrate a potential association of human-lineage-specific sequences with brain development and neurological disease.
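The CNCR annotation described above is, in effect, the intersection of two region sets: regions depleted for genetic variation and regions poorly conserved across primates. The sketch below shows only that intersection step, assuming both annotations have already been thresholded into non-overlapping half-open intervals on a single chromosome. The function name and coordinates are hypothetical, and the constraint and conservation scoring used in the paper is not reproduced here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome

def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlap of two interval sets, each assumed to contain no mutually overlapping intervals."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # non-empty overlap
            out.append((start, end))
        # Advance whichever interval finishes first; the other may still overlap the next one.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical coordinates for illustration only.
constrained = [(100, 500), (800, 1200)]
non_conserved = [(300, 900), (1100, 1500)]
print(intersect(constrained, non_conserved))  # [(300, 500), (800, 900), (1100, 1200)]
```

The two-pointer sweep visits each interval once after sorting, so it scales to genome-wide annotations processed one chromosome at a time.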
Affiliation(s)
- Zhongbo Chen
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- David Zhang
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Regina H Reynolds
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Emil K Gustavsson
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Sonia García-Ruiz
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Karishma D'Sa
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Aine Fairbrother-Browne
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
- Jana Vandrovcova
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- John Hardy
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- Reta Lila Weston Institute, Queen Square Institute of Neurology, UCL, London, UK
- UK Dementia Research Institute, Queen Square Institute of Neurology, UCL, London, UK
- NIHR University College London Hospitals Biomedical Research Centre, London, UK
- Institute for Advanced Study, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Henry Houlden
- Department of Neuromuscular Disease, Queen Square Institute of Neurology, UCL, London, UK
- Sarah A Gagliano Taliun
- Department of Medicine & Department of Neurosciences, Université de Montréal, Montréal, QC, Canada
- Montréal Heart Institute, Montréal, Québec, Canada
- Juan Botía
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- Mina Ryten
- Department of Neurodegenerative Disease, Queen Square Institute of Neurology, University College London (UCL), London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London, UK
24
Minnis CJ, Townsend S, Petschnigg J, Tinelli E, Bähler J, Russell C, Mole SE. Global network analysis in Schizosaccharomyces pombe reveals three distinct consequences of the common 1-kb deletion causing juvenile CLN3 disease. Sci Rep 2021; 11:6332. [PMID: 33737578 PMCID: PMC7973434 DOI: 10.1038/s41598-021-85471-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/18/2020] [Accepted: 02/23/2021] [Indexed: 12/15/2022]
Abstract
Juvenile CLN3 disease is a recessively inherited paediatric neurodegenerative disorder, with most patients homozygous for a 1-kb intragenic deletion in CLN3. The btn1 gene is the Schizosaccharomyces pombe orthologue of CLN3. Here, we have extended the use of synthetic genetic array (SGA) analyses to delineate functional signatures for two different disease-causing mutations in addition to complete deletion of btn1. We show that genetic-interaction signatures can differ for mutations in the same gene, which helps to dissect their distinct functional effects. The mutation equivalent to the minor transcript arising from the 1-kb deletion (btn1^102–208del) shows a distinct interaction pattern. Taken together, our results imply that the minor 1-kb deletion transcript has three consequences for CLN3: it loses some inherent functions, retains others, and acquires abnormal characteristics. This has particular implications for therapeutic development in juvenile CLN3 disease. In addition, this proof of concept could be applied to conserved genes for other mendelian disorders or any gene of interest, aiding in the dissection of their functional domains, unpacking the global consequences of disease pathogenesis, and clarifying genotype–phenotype correlations. In doing so, this level of detail will support the goals of personalised medicine to improve treatment outcomes and reduce adverse events.
Affiliation(s)
- Christopher J Minnis
- MRC Laboratory for Molecular Cell Biology and Great Ormond Street, Institute of Child Health, University College London, London, WC1E 6BT, UK
- Department of Comparative Biomedical Sciences, Royal Veterinary College, Royal College Street, London, NW1 0TU, UK
- StJohn Townsend
- Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- The Molecular Biology of Metabolism Laboratory, The Francis Crick Institute, London, NW1 1AT, UK
- Julia Petschnigg
- MRC Laboratory for Molecular Cell Biology and Great Ormond Street, Institute of Child Health, University College London, London, WC1E 6BT, UK
- Elisa Tinelli
- MRC Laboratory for Molecular Cell Biology and Great Ormond Street, Institute of Child Health, University College London, London, WC1E 6BT, UK
- Jürg Bähler
- Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
- Claire Russell
- Department of Comparative Biomedical Sciences, Royal Veterinary College, Royal College Street, London, NW1 0TU, UK
- Sara E Mole
- MRC Laboratory for Molecular Cell Biology and Great Ormond Street, Institute of Child Health, University College London, London, WC1E 6BT, UK
25
Wilks C, Ahmed O, Baker DN, Zhang D, Collado-Torres L, Langmead B. Megadepth: efficient coverage quantification for BigWigs and BAMs. Bioinformatics 2021; 37:3014-3016. [PMID: 33693500 PMCID: PMC8528031 DOI: 10.1093/bioinformatics/btab152] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/10/2020] [Revised: 01/16/2021] [Accepted: 03/04/2021] [Indexed: 02/02/2023]
Abstract
MOTIVATION A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. RESULTS Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19 000 GTExV8 BigWig files in approximately 1 h using 32 threads. Megadepth is available both as a command-line tool and as an R/Bioconductor package providing much faster quantification compared to the rtracklayer package. AVAILABILITY AND IMPLEMENTATION https://github.com/ChristopherWilks/megadepth, https://bioconductor.org/packages/megadepth. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
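Summarizing coverage within genomic intervals amounts to adding up per-base read depth over each interval. The sketch below does this with prefix sums, assuming per-base coverage for one chromosome has already been decoded into a Python list. Megadepth itself reads BigWig and BAM/CRAM files directly and is far more efficient; the function name and data here are hypothetical and do not reflect Megadepth's command-line or R interface.

```python
from itertools import accumulate
from typing import List, Tuple

def interval_coverage(coverage: List[int], intervals: List[Tuple[int, int]]) -> List[Tuple[int, float]]:
    """Return (sum, mean) of per-base coverage for each half-open, 0-based [start, end) interval."""
    # prefix[k] holds the total coverage of bases 0..k-1, so any interval sum is one subtraction.
    prefix = [0] + list(accumulate(coverage))
    results = []
    for start, end in intervals:
        total = prefix[end] - prefix[start]
        mean = total / (end - start) if end > start else 0.0
        results.append((total, mean))
    return results

# Hypothetical per-base depths for a 10-bp region and two disjoint intervals.
depth = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(interval_coverage(depth, [(0, 5), (5, 10)]))  # [(10, 2.0), (35, 7.0)]
```

After one linear pass to build the prefix array, each interval costs a single subtraction, which is why this pattern suits summarizing many intervals over the same coverage track.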
Affiliation(s)
- Christopher Wilks
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Omar Ahmed
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- Daniel N Baker
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
- David Zhang
- Department of Molecular Neuroscience, Institute of Neurology, University College London (UCL), London WC1E 6BT, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London WC1E 6BT, UK
- Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London WC1E 6BT, UK
- Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
26
Kölsch Y, Hahn J, Sappington A, Stemmer M, Fernandes AM, Helmbrecht TO, Lele S, Butrus S, Laurell E, Arnold-Ammer I, Shekhar K, Sanes JR, Baier H. Molecular classification of zebrafish retinal ganglion cells links genes to cell types to behavior. Neuron 2021; 109:645-662.e9. [PMID: 33357413 PMCID: PMC7897282 DOI: 10.1016/j.neuron.2020.12.003] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 07/09/2020] [Revised: 11/09/2020] [Accepted: 12/01/2020] [Indexed: 12/12/2022]
Abstract
Retinal ganglion cells (RGCs) form an array of feature detectors, which convey visual information to central brain regions. Characterizing RGC diversity is required to understand the logic of the underlying functional segregation. Using single-cell transcriptomics, we systematically classified RGCs in adult and larval zebrafish, thereby identifying marker genes for >30 mature types and several developmental intermediates. We used this dataset to engineer transgenic driver lines, enabling specific experimental access to a subset of RGC types. Expression of one or few transcription factors often predicts dendrite morphologies and axonal projections to specific tectal layers and extratectal targets. In vivo calcium imaging revealed that molecularly defined RGCs exhibit specific functional tuning. Finally, chemogenetic ablation of eomesa+ RGCs, which comprise melanopsin-expressing types with projections to a small subset of central targets, selectively impaired phototaxis. Together, our study establishes a framework for systematically studying the functional architecture of the visual system.
Affiliation(s)
- Yvonne Kölsch
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany; Graduate School of Systemic Neurosciences, Ludwig Maximilian University, 82152 Martinsried, Germany
- Joshua Hahn
- Department of Chemical and Biomolecular Engineering, UC Berkeley, Berkeley, CA 94720, USA
- Anna Sappington
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139, USA
- Manuel Stemmer
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
- António M Fernandes
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
- Thomas O Helmbrecht
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
- Shriya Lele
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
- Salwan Butrus
- Department of Chemical and Biomolecular Engineering, UC Berkeley, Berkeley, CA 94720, USA
- Eva Laurell
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
- Irene Arnold-Ammer
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany
- Karthik Shekhar
- Department of Chemical and Biomolecular Engineering, UC Berkeley, Berkeley, CA 94720, USA; Helen Wills Neuroscience Institute, California Institute for Quantitative Biosciences, QB3, Center for Computational Biology, UC Berkeley, Berkeley, CA 94720, USA
- Joshua R Sanes
- Center for Brain Science and Department of Molecular and Cell Biology, Harvard University, Cambridge, MA 02138, USA
- Herwig Baier
- Max Planck Institute of Neurobiology, Department Genes - Circuits - Behavior, 82152 Martinsried, Germany