1
Maquedano M, Cerdán-Vélez D, Tress ML. More than 2,500 coding genes in the human reference gene set still have unsettled status. bioRxiv 2024:2024.12.05.626965. [PMID: 39713347 PMCID: PMC11661123 DOI: 10.1101/2024.12.05.626965] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2024]
Abstract
In 2018 we analysed the three main repositories for the human proteome, Ensembl/GENCODE, RefSeq and UniProtKB. They disagreed on the coding status of one of every eight annotated coding genes. The analysis inspired bilateral collaborations between annotation groups. Here we have repeated our analysis with updated versions of the three reference coding gene sets. Superficially, little appears to have changed. Although there are slightly fewer genes predicted as coding overall, the three groups still disagree on the status of 2,606 annotated genes. However, a comparison without read-through genes and immunoglobulin fragments shows that the three reference sets have merged or reclassified more than 700 genes since the last analysis and that just 0.6% of Ensembl/GENCODE coding genes are not also annotated by the other two reference sets. We used eight features indicative of non-coding genes to examine the 21,873 coding genes annotated across the three reference sets. We found that more than 2,000 had one or more potential non-coding features. While some of these genes will be protein coding, we believe that most are likely to be non-coding genes or pseudogenes. Our results suggest that annotators still vastly overestimate the number of true coding genes.
Affiliation(s)
- Miguel Maquedano
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO)
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO)
2
Rodriguez JM, Maquedano M, Cerdan-Velez D, Calvo E, Vazquez J, Tress ML. A deep audit of the PeptideAtlas database uncovers evidence for unannotated coding genes and aberrant translation. bioRxiv 2024:2024.11.14.623419. [PMID: 39605392 PMCID: PMC11601488 DOI: 10.1101/2024.11.14.623419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
The human genome has been the subject of intense scrutiny by experimental and manual curation projects for more than two decades. Novel coding genes have been proposed from large-scale RNASeq, ribosome profiling and proteomics experiments. Here we carry out an in-depth analysis of an entire proteomics database. We analysed the proteins, peptides and spectra housed in the human build of the PeptideAtlas proteomics database to identify coding regions that are not yet annotated in the GENCODE reference gene set. We find support for hundreds of missing alternative protein isoforms and unannotated upstream translations, and evidence of cross-contamination from other species. There was reliable peptide evidence for 34 novel unannotated open reading frames (ORFs) in PeptideAtlas. We find that almost half belong to coding genes that are missing from GENCODE and other reference sets. Most of the remaining ORFs were not conserved beyond human, however, and their peptide confirmation was restricted to cancer cell lines. We show that this is strong evidence for aberrant translation, raising important questions about the extent of aberrant translation and how these ORFs should be annotated in reference genomes.
Affiliation(s)
- Jose Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Miguel Maquedano
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Daniel Cerdan-Velez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
- Enrique Calvo
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Jesús Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), 28029 Madrid, Spain
- CIBER de Enfermedades Cardiovasculares (CIBERCV), 28029 Madrid, Spain
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
3
Gussakovsky D, Black NA, Booy EP, McKenna SA. The role of SRP9/SRP14 in regulating Alu RNA. RNA Biol 2024; 21:1-12. [PMID: 39563162 PMCID: PMC11581171 DOI: 10.1080/15476286.2024.2430817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 11/08/2024] [Accepted: 11/12/2024] [Indexed: 11/21/2024] Open
Abstract
SRP9/SRP14 is a protein heterodimer that plays a critical role in the signal recognition particle through its interaction with the scaffolding signal recognition particle RNA (7SL). SRP9/SRP14 binding to 7SL is mediated through a conserved structural motif that is shared with the primate-specific Alu RNA. Alu RNA are transcription products of Alu elements, a retroelement that comprises ~10% of the human genome. Alu RNA are involved in myriad biological processes and are dysregulated in several human disease states. This review focuses on the roles SRP9/SRP14 has in regulating Alu RNA diversification, maturation, and function. The diverse mechanisms through which SRP9/SRP14 regulates Alu RNA exemplify the breadth of protein-mediated regulation of non-coding RNA.
Affiliation(s)
- Nicole A. Black
- Department of Chemistry, University of Manitoba, Winnipeg, MB, Canada
- Evan P. Booy
- Department of Chemistry, University of Manitoba, Winnipeg, MB, Canada
- Sean A. McKenna
- Department of Chemistry, University of Manitoba, Winnipeg, MB, Canada
4
Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023] Open
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarize the definition, arrangement, and structural characteristics of repeats. We also introduce the diverse biological functions of repeats and review existing methods for automatic repeat detection, classification, and masking. Finally, we analyze the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Wufei Zhu
- Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
- Juexiao Zhou
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Haoyang Li
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xiaopeng Xu
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Bin Zhang
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
5
Pasquesi GIM, Allen H, Ivancevic A, Barbachano-Guerrero A, Joyner O, Guo K, Simpson DM, Gapin K, Horton I, Nguyen L, Yang Q, Warren CJ, Florea LD, Bitler BG, Santiago ML, Sawyer SL, Chuong EB. Regulation of human interferon signaling by transposon exonization. bioRxiv 2023:2023.09.11.557241. [PMID: 37745311 PMCID: PMC10515820 DOI: 10.1101/2023.09.11.557241] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 09/26/2023]
Abstract
Innate immune signaling is essential for clearing pathogens and damaged cells, and must be tightly regulated to avoid excessive inflammation or autoimmunity. Here, we found that the alternative splicing of exons derived from transposable elements is a key mechanism controlling immune signaling in human cells. By analyzing long-read transcriptome datasets, we identified numerous transposon exonization events predicted to generate functional protein variants of immune genes, including the type I interferon receptor IFNAR2. We demonstrated that the transposon-derived isoform of IFNAR2 is more highly expressed than the canonical isoform in almost all tissues, and functions as a decoy receptor that potently inhibits interferon signaling including in cells infected with SARS-CoV-2. Our findings uncover a primate-specific axis controlling interferon signaling and show how a transposon exonization event can be co-opted for immune regulation.
Affiliation(s)
- Giulia Irene Maria Pasquesi
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Crnic Institute Boulder Branch, BioFrontiers Institute, University of Colorado Boulder, Boulder, CO, 80303
- Holly Allen
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Atma Ivancevic
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Arturo Barbachano-Guerrero
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Olivia Joyner
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Kejun Guo
- Division of Infectious Diseases, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045
- David M. Simpson
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Keala Gapin
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Isabella Horton
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Lily Nguyen
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045
- Qing Yang
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Fred Hutchinson Cancer Research Center, Seattle, WA, 98109
- Cody J. Warren
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- The Ohio State University College of Veterinary Medicine, Columbus, OH, 43210
- Liliana D. Florea
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205
- Benjamin G. Bitler
- Division of Reproductive Sciences, Department of Obstetrics and Gynecology, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045
- Mario L. Santiago
- Division of Infectious Diseases, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045
- Sara L. Sawyer
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Edward B. Chuong
- BioFrontiers Institute and Department of Molecular, Cellular & Developmental Biology, University of Colorado Boulder, Boulder, CO, 80309
- Crnic Institute Boulder Branch, BioFrontiers Institute, University of Colorado Boulder, Boulder, CO, 80303
6
Wang J, Weatheritt R, Voineagu I. Alu-minating the Mechanisms Underlying Primate Cortex Evolution. Biol Psychiatry 2022; 92:760-771. [PMID: 35981906 DOI: 10.1016/j.biopsych.2022.04.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 04/04/2022] [Accepted: 04/28/2022] [Indexed: 11/02/2022]
Abstract
The higher-order cognitive functions observed in primates correlate with the evolutionary enhancement of cortical volume and folding, which in turn are driven by the primate-specific expansion of cellular diversity in the developing cortex. Underlying these changes is the diversification of molecular features including the creation of human and/or primate-specific genes, the activation of specific molecular pathways, and the interplay of diverse layers of gene regulation. We review and discuss evidence for connections between Alu elements and primate brain evolution, the evolutionary milestones of which are known to coincide along primate lineages. Alus are repetitive elements that contribute extensively to the acquisition of novel genes and the expansion of diverse gene regulatory layers, including enhancers, alternative splicing, RNA editing, and microRNA pathways. By reviewing the impact of Alus on molecular features linked to cortical expansions or gyrification or implications in cognitive deficits, we suggest that future research focusing on the role of Alu-derived molecular events in the context of brain development may greatly advance our understanding of higher-order cognitive functions and neurologic disorders.
Affiliation(s)
- Juli Wang
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia.
- Robert Weatheritt
- St Vincent Clinical School, University of New South Wales, Sydney, Australia; Garvan Institute of Medical Research, EMBL Australia, Sydney, New South Wales, Australia
- Irina Voineagu
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia; Cellular Genomics Futures Institute, University of New South Wales, Sydney, Australia.
7
Martinez-Gomez L, Cerdán-Vélez D, Abascal F, Tress ML. Origins and Evolution of Human Tandem Duplicated Exon Substitution Events. Genome Biol Evol 2022; 14:6809199. [PMID: 36346145 PMCID: PMC9741552 DOI: 10.1093/gbe/evac162] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/06/2022] [Revised: 10/25/2022] [Accepted: 10/29/2022] [Indexed: 11/10/2022] Open
Abstract
The mutually exclusive splicing of tandem duplicated exons produces protein isoforms that are identical save for a homologous region that allows for the fine tuning of protein function. Tandem duplicated exon substitution events are rare, yet highly important alternative splicing events. Most events are ancient, their isoforms are highly expressed, and they have significantly more pathogenic mutations than other splice events. Here, we analyzed the physicochemical properties and functional roles of the homologous polypeptide regions produced by the 236 tandem duplicated exon substitutions annotated in the human gene set. We find that the most important structural and functional residues in these homologous regions are maintained, and that most changes are conservative rather than drastic. Three quarters of the isoforms produced from tandem duplicated exon substitution events are tissue-specific, particularly in nervous and cardiac tissues, and tandem duplicated exon substitution events are enriched in functional terms related to structures in the brain and skeletal muscle. We find considerable evidence for the convergent evolution of tandem duplicated exon substitution events in vertebrates, arthropods, and nematodes. Twelve human gene families have orthologues with tandem duplicated exon substitution events in both Drosophila melanogaster and Caenorhabditis elegans. Six of these gene families are ion transporters, suggesting that tandem exon duplication in genes that control the flow of ions into the cell has an adaptive benefit. The ancient origins, the strong indications of tissue-specific functions, and the evidence of convergent evolution suggest that these events may have played important roles in the evolution of animal tissues and organs.
Affiliation(s)
- Laura Martinez-Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
8
Clinical variant interpretation and biologically relevant reference transcripts. NPJ Genom Med 2022; 7:59. [PMID: 36257961 PMCID: PMC9579139 DOI: 10.1038/s41525-022-00329-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/08/2022] [Accepted: 09/29/2022] [Indexed: 12/03/2022] Open
Abstract
Clinical variant interpretation is highly dependent on the choice of reference transcript. Although the longest transcript has traditionally been chosen as the reference, APPRIS principal and MANE Select transcripts, biologically supported reference sequences, are now available. In this study, we show that MANE Select and APPRIS principal transcripts are the best reference transcripts for clinical variation. APPRIS principal and MANE Select transcripts capture almost all ClinVar pathogenic variants, and they are particularly powerful over the 94% of coding genes in which they agree. We find that a vanishingly small number of ClinVar pathogenic variants affect alternative protein products. Alternative isoforms that are likely to be clinically relevant can be predicted using TRIFID scores: the highest scoring alternative transcripts are almost 700 times more likely to house pathogenic variants. We believe that APPRIS, MANE and TRIFID are essential tools for clinical variant interpretation.
9
Wright CJ, Smith CWJ, Jiggins CD. Alternative splicing as a source of phenotypic diversity. Nat Rev Genet 2022; 23:697-710. [PMID: 35821097 DOI: 10.1038/s41576-022-00514-4] [Citation(s) in RCA: 177] [Impact Index Per Article: 59.0] [Accepted: 06/13/2022] [Indexed: 12/27/2022]
Abstract
A major goal of evolutionary genetics is to understand the genetic processes that give rise to phenotypic diversity in multicellular organisms. Alternative splicing generates multiple transcripts from a single gene, enriching the diversity of proteins and phenotypic traits. It is well established that alternative splicing contributes to key innovations over long evolutionary timescales, such as brain development in bilaterians. However, recent developments in long-read sequencing and the generation of high-quality genome assemblies for diverse organisms have facilitated comparisons of splicing profiles between closely related species, providing insights into how alternative splicing evolves over shorter timescales. Although most splicing variants are probably non-functional, alternative splicing is nonetheless emerging as a dynamic, evolutionarily labile process that can facilitate adaptation and contribute to species divergence.
Affiliation(s)
- Charlotte J Wright
- Tree of Life, Wellcome Sanger Institute, Cambridge, UK; Department of Zoology, University of Cambridge, Cambridge, UK.
- Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge, UK.
10
Pinto A, Cunha C, Chaves R, Butchbach MER, Adega F. Comprehensive In Silico Analysis of Retrotransposon Insertions within the Survival Motor Neuron Genes Involved in Spinal Muscular Atrophy. Biology 2022; 11:824. [PMID: 35741345 PMCID: PMC9219815 DOI: 10.3390/biology11060824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Revised: 05/19/2022] [Accepted: 05/25/2022] [Indexed: 11/16/2022]
Abstract
Transposable elements (TEs) are interspersed repetitive and mobile DNA sequences within the genome. Better tools for evaluating TE-derived sequences have provided insights into the contribution of TEs to human development and disease. Spinal muscular atrophy (SMA) is an autosomal recessive motor neuron disease that is caused by deletions or mutations in the Survival Motor Neuron 1 (SMN1) gene but retention of its nearly perfect orthologue SMN2. Both genes are highly enriched in TEs. To establish a link between TEs and SMA, we conducted a comprehensive in silico analysis of TE insertions within the SMN1/2 loci of SMA, carrier and healthy genomes. We found an Alu insertion in the promoter region and one L1 element in the 3'UTR that may play an important role in alternative promoter usage as well as in alternative transcriptional termination. Additionally, several intronic Alu repeats may influence alternative splicing via RNA circularization and cause the presence of new alternative exons. These Alu repeats, present throughout the genes, are also prone to recombination events that could lead to SMN1 exon deletions and, ultimately, SMA. TE characterization of the SMA genomic region could provide a better understanding of the implications of TEs for human disease and genomic evolution.
Affiliation(s)
- Albano Pinto
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- BioISI-Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1749-016 Lisbon, Portugal
- Catarina Cunha
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- BioISI-Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1749-016 Lisbon, Portugal
- Raquel Chaves
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- BioISI-Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1749-016 Lisbon, Portugal
- Matthew E. R. Butchbach
- Division of Neurology, Nemours Children’s Hospital Delaware, Wilmington, DE 19803, USA
- Department of Biological Sciences, University of Delaware, Newark, DE 19716, USA
- Department of Pediatrics, Sidney Kimmel College of Medicine, Thomas Jefferson University, Philadelphia, PA 19107, USA
- Filomena Adega
- Laboratory of Cytogenomics and Animal Genomics (CAG), Department of Genetics and Biotechnology (DGB), University of Trás-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal
- BioISI-Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, 1749-016 Lisbon, Portugal
11
Abstract
Alu RNA are implicated in the poor prognosis of several human disease states. These RNA are transcription products of primate-specific transposable elements called Alu elements. These elements are extremely abundant, comprising over 10% of the human genome, with 100 to 1000 cytoplasmic copies of Alu RNA per cell. Alu RNA do not have a single universal functional role aside from selfish self-propagation. Despite this, Alu RNA have been found to operate in a diverse set of translational and transcriptional mechanisms. This review will focus on the current knowledge of Alu RNA involved in human disease states and known mechanisms of action. Examples of Alu RNA that are transcribed in a variety of contexts such as introns, mature mRNA, and non-coding transcripts will be discussed. Past and present challenges in studying Alu RNA, and the future directions of Alu RNA in basic and clinical research, will also be examined.
Affiliation(s)
- Sean A McKenna
- Department of Chemistry, University of Manitoba, Winnipeg, Canada
12
Martinez Gomez L, Pozo F, Walsh TA, Abascal F, Tress ML. The clinical importance of tandem exon duplication-derived substitutions. Nucleic Acids Res 2021; 49:8232-8246. [PMID: 34302486 PMCID: PMC8373072 DOI: 10.1093/nar/gkab623] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/07/2021] [Accepted: 07/21/2021] [Indexed: 01/04/2023] Open
Abstract
Most coding genes in the human genome are annotated with multiple alternative transcripts. However, clear evidence for the functional relevance of the protein isoforms produced by these alternative transcripts is often hard to find. Alternative isoforms generated from tandem exon duplication-derived substitutions are an exception. These splice events are rare, but have important functional consequences. Here, we have catalogued the 236 tandem exon duplication-derived substitutions annotated in the GENCODE human reference set. We find that more than 90% of the events have a last common ancestor in teleost fish, so are at least 425 million years old, and twenty-one can be traced back to the Bilateria clade. Alternative isoforms generated from tandem exon duplication-derived substitutions also have significantly more clinical impact than other alternative isoforms. Tandem exon duplication-derived substitutions have >25 times as many pathogenic and likely pathogenic mutations as other alternative events. Tandem exon duplication-derived substitutions appear to have vital functional roles in the cell and may have played a prominent part in metazoan evolution.
Affiliation(s)
- Laura Martinez Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Thomas A Walsh
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain; Eukaryotic Annotation Team, EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
13
Pozo F, Martinez-Gomez L, Walsh TA, Rodriguez JM, Di Domenico T, Abascal F, Vazquez J, Tress ML. Assessing the functional relevance of splice isoforms. NAR Genom Bioinform 2021; 3:lqab044. [PMID: 34046593 PMCID: PMC8140736 DOI: 10.1093/nargab/lqab044] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/27/2020] [Revised: 04/22/2021] [Accepted: 05/17/2021] [Indexed: 12/20/2022] Open
Abstract
Alternative splicing of messenger RNA can generate an array of mature transcripts, but it is not clear how many go on to produce functionally relevant protein isoforms. There is only limited evidence for alternative proteins in proteomics analyses, and data from population genetic variation studies indicate that most alternative exons are evolving neutrally. Determining which transcripts produce biologically important isoforms is key to understanding isoform function and to interpreting the real impact of somatic mutations and germline variations. Here we have developed a method, TRIFID, to classify the functional importance of splice isoforms. TRIFID was trained on isoforms detected in large-scale proteomics analyses and distinguishes these biologically important splice isoforms with high confidence. Isoforms predicted as functionally important by the algorithm had measurable cross-species conservation and significantly fewer broken functional domains. Additionally, exons that code for these functionally important protein isoforms are under purifying selection, while exons from low-scoring transcripts largely appear to be evolving neutrally. TRIFID has been developed for the human genome, but it could in principle be applied to other well-annotated species. We believe that this method will generate valuable insights into the cellular importance of alternative splicing.
Affiliation(s)
- Fernando Pozo
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Laura Martinez-Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Thomas A Walsh
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- José Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Tomas Di Domenico
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Hinxton CB10 1SA, UK
- Jesús Vazquez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
- Michael L Tress
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
14
Stenz L. The L1-dependant and Pol III transcribed Alu retrotransposon, from its discovery to innate immunity. Mol Biol Rep 2021; 48:2775-2789. [PMID: 33725281 PMCID: PMC7960883 DOI: 10.1007/s11033-021-06258-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/08/2021] [Accepted: 02/26/2021] [Indexed: 02/07/2023]
Abstract
The 300 bp dimeric repeats digestible by AluI were discovered in 1979. Since then, Alu have been implicated in the most fundamental epigenetic mechanisms, namely reprogramming, pluripotency, imprinting and mosaicism. These Alu constitute a family of retrotransposons transcribed by the RNA Pol III machinery, notably when the cytosines in their sequences are demethylated. Alu then hijack the functions of ORF2, encoded by another transposon named L1, during reverse transcription and integration into new sites. That mechanism functions as a complex genetic parasite able to copy and paste Alu sequences. In doing so, Alu have modified even the size of the human genome, as well as that of other primate genomes, during 65 million years of co-evolution. Indeed, one germline retrotransposition still occurs in every 20 births. Thus, Alu continue to modify the human genome today and have been implicated in de novo disease-causing mutations, including deletions, duplications and rearrangements. Most recently, retrotransposons were found to trigger neuronal diversity by inducing mosaicism in the brain. Finally, boosted during viral infections, Alu clearly interact with the innate immune system. The purpose of this review is to give a condensed overview of all these major findings concerning the fascinating physiology of Alu, from their discovery up to current knowledge.
Affiliation(s)
- Ludwig Stenz
- Department of Genetic Medicine and Development, Faculty of Medicine, Geneva University, Geneva, Switzerland; Swiss Centre for Applied Human Toxicology, University of Basel, Basel, Switzerland.