1. Radrizzani S, Kudla G, Izsvák Z, Hurst LD. Selection on synonymous sites: the unwanted transcript hypothesis. Nat Rev Genet 2024; 25:431-448. [PMID: 38297070 DOI: 10.1038/s41576-023-00686-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/04/2023] [Indexed: 02/02/2024]
Abstract
Although translational selection to favour codons that match the most abundant tRNAs is not readily observed in humans, there is nonetheless selection in humans on synonymous mutations. We hypothesize that much of this synonymous site selection can be explained in terms of protection against unwanted RNAs - spurious transcripts, mis-spliced forms or RNAs derived from transposable elements or viruses. We propose not only that selection on synonymous sites functions to reduce the rate of creation of unwanted transcripts (for example, through selection on exonic splice enhancers and cryptic splice sites) but also that high-GC content (but low-CpG content), together with intron presence and position, is both particular to functional native mRNAs and used to recognize transcripts as native. In support of this hypothesis, transcription, nuclear export, liquid phase condensation and RNA degradation have all recently been shown to promote GC-rich transcripts and suppress AU/CpG-rich ones. With such 'traps' being set against AU/CpG-rich transcripts, the codon usage of native genes has, in turn, evolved to avoid such suppression. That parallel filters against AU/CpG-rich transcripts also affect the endosomal import of RNAs further supports the unwanted transcript hypothesis of synonymous site selection and explains the similar design rules that have enabled the successful use of transgenes and RNA vaccines.
Affiliation(s)
- Sofia Radrizzani
- Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
- Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Grzegorz Kudla
- MRC Human Genetics Unit, Institute for Genetics and Cancer, The University of Edinburgh, Edinburgh, UK
- Zsuzsanna Izsvák
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Society, Berlin, Germany
- Laurence D Hurst
- Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK.
2. Sakamoto F, Kanamori S, Díaz LM, Cádiz A, Ishii Y, Yamaguchi K, Shigenobu S, Nakayama T, Makino T, Kawata M. Detection of evolutionary conserved and accelerated genomic regions related to adaptation to thermal niches in Anolis lizards. Ecol Evol 2024; 14:e11117. [PMID: 38455144 PMCID: PMC10920033 DOI: 10.1002/ece3.11117] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/13/2023] [Revised: 02/18/2024] [Accepted: 02/22/2024] [Indexed: 03/09/2024] Open
Abstract
Understanding the genetic basis for adapting to thermal environments is important due to serious effects of global warming on ectothermic species. Various genes associated with thermal adaptation in lizards have been identified mainly focusing on changes in gene expression or the detection of positively selected genes using coding regions. Only a few comprehensive genome-wide analyses have included noncoding regions. This study aimed to identify evolutionarily conserved and accelerated genomic regions using whole genomes of eight Anolis lizard species that have repeatedly adapted to similar thermal environments in multiple lineages. Evolutionarily conserved genomic regions were extracted as regions with overall sequence conservation (regions with fewer base substitutions) across all lineages compared with the neutral model. Genomic regions that underwent accelerated evolution in the lineage of interest were identified as those with more base substitutions in the target branch than in the entire background branch. Conserved elements across all branches were relatively abundant in "intergenic" genomic regions among noncoding regions. Accelerated regions (ARs) of each lineage contained a significantly greater proportion of noncoding RNA genes than the entire multiple alignment. Common genes containing ARs within 5 kb of their vicinity in lineages with similar thermal habitats were identified. Many genes associated with circadian rhythms and behavior were found in hot-open and cool-shaded habitat lineages. These genes might play a role in contributing to thermal adaptation and assist future studies examining the function of genes involved in thermal adaptation via genome editing.
Affiliation(s)
- Fuku Sakamoto
- Graduate School of Life Sciences, Tohoku University, Sendai, Japan
- Luis M. Díaz
- National Museum of Natural History of Cuba, Havana, Cuba
- Antonio Cádiz
- Faculty of Biology, University of Havana, Havana, Cuba
- Present address: Department of Biology, University of Miami, Coral Gables, Florida, USA
- Yuu Ishii
- Graduate School of Life Sciences, Tohoku University, Sendai, Japan
- Shuji Shigenobu
- Trans‐Omics Facility, National Institute for Basic Biology, Okazaki, Japan
- Department of Basic Biology, School of Life Science, The Graduate University for Advanced Studies, SOKENDAI, Okazaki, Japan
- Takuro Nakayama
- Division of Life Sciences, Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
- Takashi Makino
- Graduate School of Life Sciences, Tohoku University, Sendai, Japan
- Masakado Kawata
- Graduate School of Life Sciences, Tohoku University, Sendai, Japan
3. Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
- Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany.
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
- Universidad Nacional de Colombia, Bogotá, Colombia.
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria.
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.
- Santa Fe Institute, Santa Fe, NM, USA.
4. Benisty H, Hernandez-Alias X, Weber M, Anglada-Girotto M, Mantica F, Radusky L, Senger G, Calvet F, Weghorn D, Irimia M, Schaefer MH, Serrano L. Genes enriched in A/T-ending codons are co-regulated and conserved across mammals. Cell Syst 2023; 14:312-323.e3. [PMID: 36889307 DOI: 10.1016/j.cels.2023.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/25/2022] [Revised: 07/11/2022] [Accepted: 02/09/2023] [Indexed: 03/09/2023]
Abstract
Codon usage influences gene expression distinctly depending on the cell context. Yet, the importance of codon bias in the simultaneous turnover of specific groups of protein-coding genes remains to be investigated. Here, we find that genes enriched in A/T-ending codons are expressed more coordinately in general and across tissues and development than those enriched in G/C-ending codons. tRNA abundance measurements indicate that this coordination is linked to the expression changes of tRNA isoacceptors reading A/T-ending codons. Genes with similar codon composition are more likely to be part of the same protein complex, especially for genes with A/T-ending codons. The codon preferences of genes with A/T-ending codons are conserved among mammals and other vertebrates. We suggest that this orchestration contributes to tissue-specific and ontogenetic-specific expression, which can facilitate, for instance, timely protein complex formation.
Affiliation(s)
- Hannah Benisty
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain.
- Xavier Hernandez-Alias
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Marc Weber
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Miquel Anglada-Girotto
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Federica Mantica
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Leandro Radusky
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Gökçe Senger
- Department of Experimental Oncology, European Institute of Oncology (IEO) IRCCS, Via Adamello 16, Milan 20139, Italy
- Ferriol Calvet
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Donate Weghorn
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Manuel Irimia
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain; ICREA, Pg. Lluis Companys 23, Barcelona 08010, Spain
- Martin H Schaefer
- Department of Experimental Oncology, European Institute of Oncology (IEO) IRCCS, Via Adamello 16, Milan 20139, Italy
- Luis Serrano
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain; ICREA, Pg. Lluis Companys 23, Barcelona 08010, Spain.
5. Antonov IV, O’Loughlin S, Gorohovski AN, O’Connor PB, Baranov PV, Atkins JF. Streptomyces rare codon UUA: from features associated with 2 adpA related locations to candidate phage regulatory translational bypassing. RNA Biol 2023; 20:926-942. [PMID: 37968863 PMCID: PMC10732093 DOI: 10.1080/15476286.2023.2270812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 10/02/2023] [Indexed: 11/17/2023] Open
Abstract
In Streptomyces species, the cell cycle involves a switch from an early and vegetative state to a later phase where secondary products including antibiotics are synthesized, aerial hyphae form and sporulation occurs. AdpA, which has two domains, activates the expression of numerous genes involved in the switch from the vegetative growth phase. The adpA mRNA of many Streptomyces species has a UUA codon in a linker region between 5' sequence encoding one domain and 3' sequence encoding its other and C-terminal domain. UUA codons are exceptionally rare in Streptomyces, and its functional cognate tRNA is not present in a fully modified and acylated form, in the early and vegetative phase of the cell cycle though it is aminoacylated later. Here, we report candidate recoding signals that may influence decoding of the linker region UUA. Additionally, a short ORF 5' of the main ORF has been identified with a GUG at, or near, its 5' end and an in-frame UUA near its 3' end. The latter is commonly 5 nucleotides 5' of the main ORF start. Ribosome profiling data show translation of that 5' region. Ten years ago, UUA-mediated translational bypassing was proposed as a sensor by a Streptomyces phage of its host's cell cycle stage and an effector of its lytic/lysogeny switch. We provide the first experimental evidence supportive of this proposal.
Affiliation(s)
- Ivan V. Antonov
- Russian Academy of Science, Institute of Bioengineering, Research Center of Biotechnology, Moscow, Russia
- Laboratory of Bioinformatics, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Sinéad O’Loughlin
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- Alessandro N. Gorohovski
- Russian Academy of Science, Institute of Bioengineering, Research Center of Biotechnology, Moscow, Russia
- Structural Biology and BioComputing Program, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Pavel V. Baranov
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
- John F. Atkins
- School of Biochemistry and Cell Biology, University College Cork, Cork, Ireland
6. Transcriptional Regulation and Implications for Controlling Hox Gene Expression. J Dev Biol 2022; 10:jdb10010004. [PMID: 35076545 PMCID: PMC8788451 DOI: 10.3390/jdb10010004] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 12/17/2021] [Revised: 01/04/2022] [Accepted: 01/06/2022] [Indexed: 02/06/2023] Open
Abstract
Hox genes play key roles in axial patterning and regulating the regional identity of cells and tissues in a wide variety of animals from invertebrates to vertebrates. Nested domains of Hox expression generate a combinatorial code that provides a molecular framework for specifying the properties of tissues along the A–P axis. Hence, it is important to understand the regulatory mechanisms that coordinately control the precise patterns of the transcription of clustered Hox genes required for their roles in development. New insights are emerging about the dynamics and molecular mechanisms governing transcriptional regulation, and there is interest in understanding how these may play a role in contributing to the regulation of the expression of the clustered Hox genes. In this review, we summarize some of the recent findings, ideas and emerging mechanisms underlying the regulation of transcription in general and consider how they may be relevant to understanding the transcriptional regulation of Hox genes.
7. Klapproth C, Sen R, Stadler PF, Findeiß S, Fallmann J. Common Features in lncRNA Annotation and Classification: A Survey. Noncoding RNA 2021; 7:77. [PMID: 34940758 PMCID: PMC8708962 DOI: 10.3390/ncrna7040077] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/12/2021] [Revised: 12/03/2021] [Accepted: 12/06/2021] [Indexed: 12/29/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are widely recognized as important regulators of gene expression. Their molecular functions range from miRNA sponging to chromatin-associated mechanisms, leading to effects in disease progression and establishing them as diagnostic and therapeutic targets. Still, only a few representatives of this diverse class of RNAs are well studied, while the vast majority is poorly described beyond the existence of their transcripts. In this review we survey common in silico approaches for lncRNA annotation. We focus on the well-established sets of features used for classification and discuss their specific advantages and weaknesses. While the available tools perform very well for the task of distinguishing coding sequence from other RNAs, we find that current methods are not well suited to distinguish lncRNAs or parts thereof from other non-protein-coding input sequences. We conclude that the distinction of lncRNAs from intronic sequences and untranslated regions of coding mRNAs remains a pressing research gap.
Affiliation(s)
- Christopher Klapproth
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Rituparno Sen
- Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz-Center for Infection Research (HZI), D-97080 Würzburg, Germany
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, University Leipzig, D-04103 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá CO-111321, Colombia
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
- Sven Findeiß
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Jörg Fallmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
8. Jungreis I, Sealfon R, Kellis M. SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes. Nat Commun 2021; 12:2642. [PMID: 33976134 PMCID: PMC8113528 DOI: 10.1038/s41467-021-22905-7] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Received: 09/18/2020] [Accepted: 03/28/2021] [Indexed: 02/03/2023] Open
Abstract
Despite its clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. We use comparative genomics to provide a high-confidence protein-coding gene set, characterize evolutionary constraint, and prioritize functional mutations. We select 44 Sarbecovirus genomes at ideally-suited evolutionary distances, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for ORFs 3a, 6, 7a, 7b, 8, 9b, and a novel alternate-frame gene, ORF3c, whereas ORFs 2b, 3d/3d-2, 3b, 9c, and 10 lack protein-coding signatures or convincing experimental evidence of protein-coding function. Furthermore, we show no other conserved protein-coding genes remain to be discovered. Mutation analysis suggests ORF8 contributes to within-individual fitness but not person-to-person transmission. Cross-strain and within-strain evolutionary pressures agree, except for fewer-than-expected within-strain mutations in nsp3 and S1, and more-than-expected in nucleocapsid, which shows a cluster of mutations in a predicted B-cell epitope, suggesting immune-avoidance selection. Evolutionary histories of residues disrupted by spike-protein substitutions D614G, N501Y, E484K, and K417N/T provide clues about their biology, and we catalog likely-functional co-inherited mutations. Previously reported RNA-modification sites show no enrichment for conservation. Here we report a high-confidence gene set and evolutionary-history annotations providing valuable resources and insights on SARS-CoV-2 biology, mutations, and evolution.
Affiliation(s)
- Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Rachel Sealfon
- Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Manolis Kellis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
9. Jordan-Paiz A, Franco S, Martínez MA. Impact of Synonymous Genome Recoding on the HIV Life Cycle. Front Microbiol 2021; 12:606087. [PMID: 33796084 PMCID: PMC8007914 DOI: 10.3389/fmicb.2021.606087] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/14/2020] [Accepted: 02/25/2021] [Indexed: 12/19/2022] Open
Abstract
Synonymous mutations within protein coding regions introduce changes in DNA or messenger (m) RNA, without mutating the encoded proteins. Synonymous recoding of virus genomes has facilitated the identification of previously unknown virus biological features. Moreover, large-scale synonymous recoding of the genome of human immunodeficiency virus type 1 (HIV-1) has elucidated new antiviral mechanisms within the innate immune response, and has improved our knowledge of new functional virus genome structures, the relevance of codon usage for the temporal regulation of viral gene expression, and HIV-1 mutational robustness and adaptability. Continuous improvements in our understanding of the impacts of synonymous substitutions on virus phenotype - coupled with the decreased cost of chemically synthesizing DNA and improved methods for assembling DNA fragments - have enhanced our ability to identify potential HIV-1 and host factors and other aspects involved in the infection process. In this review, we address how silent mutagenesis impacts HIV-1 phenotype and replication capacity. We also discuss the general potential of synonymous recoding of the HIV-1 genome to elucidate unknown aspects of the virus life cycle, and to identify new therapeutic targets.
Affiliation(s)
- Ana Jordan-Paiz
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Sandra Franco
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Miguel Angel Martínez
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
10. Kousi M, Söylemez O, Ozanturk A, Mourtzi N, Akle S, Jungreis I, Muller J, Cassa CA, Brand H, Mokry JA, Wolf MY, Sadeghpour A, McFadden K, Lewis RA, Talkowski ME, Dollfus H, Kellis M, Davis EE, Sunyaev SR, Katsanis N. Evidence for secondary-variant genetic burden and non-random distribution across biological modules in a recessive ciliopathy. Nat Genet 2020; 52:1145-1150. [PMID: 33046855 PMCID: PMC8272915 DOI: 10.1038/s41588-020-0707-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/30/2018] [Accepted: 08/31/2020] [Indexed: 11/08/2022]
Abstract
The influence of genetic background on driver mutations is well established; however, the mechanisms by which the background interacts with Mendelian loci remain unclear. We performed a systematic secondary-variant burden analysis of two independent cohorts of patients with Bardet-Biedl syndrome (BBS) with known recessive biallelic pathogenic mutations in one of 17 BBS genes for each individual. We observed a significant enrichment of trans-acting rare nonsynonymous secondary variants in patients with BBS compared with either population controls or a cohort of individuals with a non-BBS diagnosis and recessive variants in the same gene set. Strikingly, we found a significant over-representation of secondary alleles in chaperonin-encoding genes, a finding corroborated by the observation of epistatic interactions involving this complex in vivo. These data indicate a complex genetic architecture for BBS that informs the biological properties of disease modules and presents a model for secondary-variant burden analysis in recessive disorders.
Affiliation(s)
- Maria Kousi
- Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Onuralp Söylemez
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Aysegül Ozanturk
- Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA
- Niki Mourtzi
- Advanced Center for Translational and Genetic Medicine, Lurie Children's Hospital, Chicago, IL, USA
- Sebastian Akle
- Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Boston, MA, USA
- Irwin Jungreis
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Jean Muller
- Laboratoire de Génétique Médicale, Institut de Génétique Médicale d'Alsace, INSERM U1112, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, Strasbourg, France
- Laboratoire de Diagnostic Génétique, Institut de Génétique Médicale d'Alsace, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Christopher A Cassa
- Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Harrison Brand
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine and Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- Program in Population and Medical Genetics and Genomics Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jill Anne Mokry
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Maxim Y Wolf
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Azita Sadeghpour
- Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA
- Kelsey McFadden
- Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA
- Richard A Lewis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Ophthalmology, Baylor College of Medicine, Houston, TX, USA
- Michael E Talkowski
- Molecular Neurogenetics Unit and Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine and Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- Program in Population and Medical Genetics and Genomics Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Genomics, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
- Hélène Dollfus
- Laboratoire de Génétique Médicale, Institut de Génétique Médicale d'Alsace, INSERM U1112, Fédération de Médecine Translationnelle de Strasbourg (FMTS), Université de Strasbourg, Strasbourg, France
- Manolis Kellis
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Erica E Davis
- Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA
- Advanced Center for Translational and Genetic Medicine, Lurie Children's Hospital, Chicago, IL, USA
- Shamil R Sunyaev
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Nicholas Katsanis
- Center for Human Disease Modeling, Duke University Medical Center, Durham, NC, USA.
- Advanced Center for Translational and Genetic Medicine, Lurie Children's Hospital, Chicago, IL, USA.
- Departments of Pediatrics and Cellular and Molecular Biology, Northwestern University School of Medicine, Chicago, IL, USA.
11. Jungreis I, Sealfon R, Kellis M. SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes. Research Square 2020:rs.3.rs-80345. [PMID: 33024961 PMCID: PMC7536840 DOI: 10.21203/rs.3.rs-80345/v1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/11/2023]
Abstract
Despite its overwhelming clinical importance, the SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Here, we use comparative genomics to provide a high-confidence protein-coding gene set, characterize protein-level and nucleotide-level evolutionary constraint, and prioritize functional mutations from the ongoing COVID-19 pandemic. We select 44 complete Sarbecovirus genomes at evolutionary distances ideally-suited for protein-coding and non-coding element identification, create whole-genome alignments, and quantify protein-coding evolutionary signatures and overlapping constraint. We find strong protein-coding signatures for all named genes and for 3a, 6, 7a, 7b, 8, 9b, and also ORF3c, a novel alternate-frame gene. By contrast, ORF10, and overlapping-ORFs 9c, 3b, and 3d lack protein-coding signatures or convincing experimental evidence and are not protein-coding. Furthermore, we show no other protein-coding genes remain to be discovered. Cross-strain and within-strain evolutionary pressures largely agree at the gene, amino-acid, and nucleotide levels, with some notable exceptions, including fewer-than-expected mutations in nsp3 and Spike subunit S1, and more-than-expected mutations in Nucleocapsid. The latter also shows a cluster of amino-acid-changing variants in otherwise-conserved residues in a predicted B-cell epitope, which may indicate positive selection for immune avoidance. Several Spike-protein mutations, including D614G, which has been associated with increased transmission, disrupt otherwise-perfectly-conserved amino acids, and could be novel adaptations to human hosts. The resulting high-confidence gene set and evolutionary-history annotations provide valuable resources and insights on COVID-19 biology, mutations, and evolution.
Affiliation(s)
- Irwin Jungreis
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA
  - Broad Institute of MIT and Harvard, Cambridge, MA
- Rachel Sealfon
  - Center for Computational Biology, Flatiron Institute, New York, NY
- Manolis Kellis
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA
  - Broad Institute of MIT and Harvard, Cambridge, MA

12
Jungreis I, Sealfon R, Kellis M. SARS-CoV-2 gene content and COVID-19 mutation impact by comparing 44 Sarbecovirus genomes. bioRxiv 2020:2020.06.02.130955. [PMID: 32577641 PMCID: PMC7302193 DOI: 10.1101/2020.06.02.130955] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/15/2022]

13
Martínez MA, Jordan-Paiz A, Franco S, Nevot M. Synonymous genome recoding: a tool to explore microbial biology and new therapeutic strategies. Nucleic Acids Res 2019; 47:10506-10519. [PMID: 31584076 PMCID: PMC6846928 DOI: 10.1093/nar/gkz831] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/13/2019] [Revised: 09/12/2019] [Accepted: 09/30/2019] [Indexed: 12/18/2022] [Open Access]
Abstract
Synthetic genome recoding is a new means of generating designed organisms with altered phenotypes. Synonymous mutations introduced into the protein coding region tolerate modifications in DNA or mRNA without modifying the encoded proteins. Synonymous genome-wide recoding has allowed the synthetic generation of different small-genome viruses with modified phenotypes and biological properties. Recently, a decreased cost of chemically synthesizing DNA and improved methods for assembling DNA fragments (e.g. lambda red recombination and CRISPR-based editing) have enabled the construction of an Escherichia coli variant with a 4-Mb synthetic synonymously recoded genome with a reduced number of sense codons (n = 59) encoding the 20 canonical amino acids. Synonymous genome recoding is increasing our knowledge of microbial interactions with innate immune responses, identifying functional genome structures, and strategically ameliorating cis-inhibitory signaling sequences related to splicing, replication (in eukaryotes), and complex microbe functions, unraveling the relevance of codon usage for the temporal regulation of gene expression and the microbe mutant spectrum and adaptability. New biotechnological and therapeutic applications of this methodology can easily be envisaged. In this review, we discuss how synonymous genome recoding may impact our knowledge of microbial biology and the development of new and better therapeutic methodologies.
Affiliation(s)
- Miguel Angel Martínez
  - IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Ana Jordan-Paiz
  - IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Sandra Franco
  - IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Maria Nevot
  - IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain

14
The long-term restoration of ecosystem complexity. Nat Ecol Evol 2020; 4:676-685. [DOI: 10.1038/s41559-020-1154-1] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 03/28/2019] [Accepted: 02/19/2020] [Indexed: 12/25/2022]

15
Hecker N, Hiller M. A genome alignment of 120 mammals highlights ultraconserved element variability and placenta-associated enhancers. Gigascience 2020; 9:giz159. [PMID: 31899510 PMCID: PMC6941714 DOI: 10.1093/gigascience/giz159] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2019] [Revised: 11/29/2019] [Accepted: 12/13/2019] [Indexed: 01/02/2023] [Open Access]
Abstract
BACKGROUND Multiple alignments of mammalian genomes have been the basis of many comparative genomic studies aiming at annotating genes, detecting regions under evolutionary constraint, and studying genome evolution. A key factor that affects the power of comparative analyses is the number of species included in a genome alignment.
RESULTS To utilize the increased number of sequenced genomes and to provide an accessible resource for genomic studies, we generated a mammalian genome alignment comprising 120 species. We used this alignment and the CESAR method to provide protein-coding gene annotations for 119 non-human mammals. Furthermore, we illustrate the utility of this alignment by 2 exemplary analyses. First, we quantified how variable ultraconserved elements (UCEs) are among placental mammals. Leveraging the high taxonomic coverage in our alignment, we estimate that UCEs contain on average 4.7%-15.6% variable alignment columns. Furthermore, we show that the center regions of UCEs are generally most constrained. Second, we identified enhancer sequences that are only conserved in placental mammals. We found that these enhancers are significantly associated with placenta-related genes, suggesting that some of these enhancers may be involved in the evolution of placental mammal-specific aspects of the placenta.
CONCLUSION The 120-mammal alignment and all other data are available for analysis and visualization in a genome browser at https://genome-public.pks.mpg.de/ and for download at https://bds.mpi-cbg.de/hillerlab/120MammalAlignment/.
Affiliation(s)
- Nikolai Hecker
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstr. 108, 01307 Dresden, Germany
- Michael Hiller
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr. 108, 01307 Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Noethnitzer Str. 38, 01187 Dresden, Germany
  - Center for Systems Biology Dresden, Pfotenhauerstr. 108, 01307 Dresden, Germany

16
Chuang TJ, Chen YJ, Chen CY, Mai TL, Wang YD, Yeh CS, Yang MY, Hsiao YT, Chang TH, Kuo TC, Cho HH, Shen CN, Kuo HC, Lu MY, Chen YH, Hsieh SC, Chiang TW. Integrative transcriptome sequencing reveals extensive alternative trans-splicing and cis-backsplicing in human cells. Nucleic Acids Res 2018; 46:3671-3691. [PMID: 29385530 PMCID: PMC6283421 DOI: 10.1093/nar/gky032] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 08/24/2016] [Accepted: 01/13/2018] [Indexed: 01/16/2023] [Open Access]
Abstract
Transcriptionally non-co-linear (NCL) transcripts can originate from trans-splicing (trans-spliced RNA; 'tsRNA') or cis-backsplicing (circular RNA; 'circRNA'). While numerous circRNAs have been detected in various species, tsRNAs remain largely uninvestigated. Here, we utilize integrative transcriptome sequencing of poly(A)- and non-poly(A)-selected RNA-seq data from diverse human cell lines to distinguish between tsRNAs and circRNAs. We identified 24,498 NCL events and found that a considerable proportion (20-35%) of them arise from both tsRNAs and circRNAs, representing extensive alternative trans-splicing and cis-backsplicing in human cells. We show that sequence generalities of exon circularization are also observed in tsRNAs. Recapitulation of NCL RNAs further shows that inverted Alu repeats can simultaneously promote the formation of tsRNAs and circRNAs. However, tsRNAs and circRNAs exhibit quite different, or even opposite, expression patterns, in terms of correlation with the expression of their co-linear counterparts, expression breadth/abundance, transcript stability, and subcellular localization preference. These results indicate that tsRNAs and circRNAs may play different regulatory roles and analysis of NCL events should take the joint effects of different NCL-splicing types and joint effects of multiple NCL events into consideration. This study describes the first transcriptome-wide analysis of trans-splicing and cis-backsplicing, expanding our understanding of the complexity of the human transcriptome.
Affiliation(s)
- Trees-Juen Chuang
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
  - Genome and Systems Biology Degree Program, National Taiwan University, Taipei 10617 & Academia Sinica, Taipei 11529, Taiwan
- Yen-Ju Chen
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
  - Genome and Systems Biology Degree Program, National Taiwan University, Taipei 10617 & Academia Sinica, Taipei 11529, Taiwan
- Chia-Ying Chen
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Te-Lun Mai
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Yi-Da Wang
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Chung-Shu Yeh
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
  - Institute of Biochemistry and Molecular Biology, National Yang-Ming University, Taipei 11221, Taiwan
- Min-Yu Yang
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Yu-Ting Hsiao
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Tzu-Chien Kuo
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Hsin-Hua Cho
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Chia-Ning Shen
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Hung-Chih Kuo
  - Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 11529, Taiwan
- Mei-Yeh Lu
  - High Throughput Genomics Core, Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
- Yi-Hua Chen
  - High Throughput Genomics Core, Biodiversity Research Center, Academia Sinica, Taipei 11529, Taiwan
- Shan-Chi Hsieh
  - Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 11529, Taiwan
- Tai-Wei Chiang
  - Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan

17
Fontrodona N, Aubé F, Claude JB, Polvèche H, Lemaire S, Tranchevent LC, Modolo L, Mortreux F, Bourgeois CF, Auboeuf D. Interplay between coding and exonic splicing regulatory sequences. Genome Res 2019; 29:711-722. [PMID: 30962178 PMCID: PMC6499313 DOI: 10.1101/gr.241315.118] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/02/2018] [Accepted: 03/28/2019] [Indexed: 01/24/2023]
Abstract
The inclusion of exons during the splicing process depends on the binding of splicing factors to short low-complexity regulatory sequences. The relationship between exonic splicing regulatory sequences and coding sequences is still poorly understood. We demonstrate that exons that are coregulated by any given splicing factor share a similar nucleotide composition bias and preferentially code for amino acids with similar physicochemical properties because of the nonrandomness of the genetic code. Indeed, amino acids sharing similar physicochemical properties correspond to codons that have the same nucleotide composition bias. In particular, we uncover that the TRA2A and TRA2B splicing factors that bind to adenine-rich motifs promote the inclusion of adenine-rich exons coding preferentially for hydrophilic amino acids that correspond to adenine-rich codons. SRSF2 that binds guanine/cytosine-rich motifs promotes the inclusion of GC-rich exons coding preferentially for small amino acids, whereas SRSF3 that binds cytosine-rich motifs promotes the inclusion of exons coding preferentially for uncharged amino acids, like serine and threonine that can be phosphorylated. Finally, coregulated exons encoding amino acids with similar physicochemical properties correspond to specific protein features. In conclusion, the regulation of an exon by a splicing factor that relies on the affinity of this factor for specific nucleotide(s) is tightly interconnected with the exon-encoded physicochemical properties. We therefore uncover an unanticipated bidirectional interplay between the splicing regulatory process and its biological functional outcome.
Affiliation(s)
- Nicolas Fontrodona
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Fabien Aubé
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Jean-Baptiste Claude
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Hélène Polvèche
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Sébastien Lemaire
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Léon-Charles Tranchevent
  - Proteome and Genome Research Unit, Department of Oncology, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg
- Laurent Modolo
  - LBMC Biocomputing Center, CNRS UMR 5239, INSERM U1210, F-69007, Lyon, France
- Franck Mortreux
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Cyril F Bourgeois
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
- Didier Auboeuf
  - Université Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France

18
Chen CY, Chuang TJ. NCLcomparator: systematically post-screening non-co-linear transcripts (circular, trans-spliced, or fusion RNAs) identified from various detectors. BMC Bioinformatics 2019; 20:3. [PMID: 30606103 PMCID: PMC6318855 DOI: 10.1186/s12859-018-2589-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/22/2017] [Accepted: 12/21/2018] [Indexed: 01/05/2023] [Open Access]
Abstract
BACKGROUND Non-co-linear (NCL) transcripts consist of exonic sequences that are topologically inconsistent with the reference genome in an intragenic fashion (circular or intragenic trans-spliced RNAs) or in an intergenic fashion (fusion or intergenic trans-spliced RNAs). On the basis of RNA-seq data, numerous NCL event detectors have been developed and have detected thousands of NCL events in diverse species. However, there are great discrepancies in the identification results among detectors, indicating a considerable proportion of false positives in the detected NCL events. Although several helpful guidelines for evaluating the performance of NCL event detectors have been provided, a systematic guideline for measurement of NCL events identified by existing tools has not been available.
RESULTS We develop a software tool, NCLcomparator, for systematically post-screening the intragenic or intergenic NCL events identified by various NCL detectors. NCLcomparator first examines whether the input NCL events are potentially false positives derived from ambiguous alignments (i.e., the NCL events have an alternative co-linear explanation or multiple matches against the reference genome). To evaluate the reliability of the identified NCL events, we define the NCL score (NCLscore) based on the variation in the number of supporting NCL junction reads identified by the tools examined. Of the input NCL events, we show that the ambiguous alignment-derived events have relatively lower NCLscore values than the other events, indicating that an NCL event with a higher NCLscore has a higher level of reliability. To help select highly expressed NCL events, NCLcomparator also provides a series of useful measurements such as the expression levels of the detected NCL events and their corresponding host genes and the junction usage of the co-linear splice junctions at both NCL donor and acceptor sites.
CONCLUSION NCLcomparator provides useful guidelines, with the input of identified NCL events from various detectors and the corresponding paired-end RNA-seq data only, to help users select potentially high-confidence NCL events for further functional investigation. The software thus helps to facilitate future studies into NCL events, shedding light on the fundamental biology of this important but understudied class of transcripts. NCLcomparator is freely accessible at https://github.com/TreesLab/NCLcomparator.
Affiliation(s)
- Chia-Ying Chen
  - Genomics Research Center, Academia Sinica, Taipei, 11529 Taiwan

19
Functional relevance of synonymous alleles reflected in allele rareness in the population. Genomics 2018; 110:347-354. [DOI: 10.1016/j.ygeno.2018.04.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/02/2017] [Accepted: 04/09/2018] [Indexed: 12/19/2022]

20
Abrahams L, Hurst LD. Refining the Ambush Hypothesis: Evidence That GC- and AT-Rich Bacteria Employ Different Frameshift Defence Strategies. Genome Biol Evol 2018; 10:1153-1173. [PMID: 29617761 PMCID: PMC5909447 DOI: 10.1093/gbe/evy075] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Accepted: 03/30/2018] [Indexed: 12/13/2022] [Open Access]
Abstract
Stop codons are frequently selected for beyond their regular termination function for error control. The “ambush hypothesis” proposes out-of-frame stop codons (OSCs) terminating frameshifted translations are selected for. Although early indirect evidence was partially supportive, recent evidence suggests OSC frequencies are not exceptional when considering underlying nucleotide content. However, prior null tests fail to control amino acid/codon usages or possible local mutational biases. We therefore return to the issue using bacterial genomes, considering several tests defining and testing against a null. We employ simulation approaches preserving amino acid order but shuffling synonymous codons or preserving codons while shuffling amino acid order. Additionally, we compare codon usage in amino acid pairs, where one codon can but the next, otherwise identical codon, cannot encode an OSC. OSC frequencies exceed expectations typically in AT-rich genomes, the +1 frame and for TGA/TAA but not TAG. With this complex evidence, simply rejecting or accepting the ambush hypothesis is not warranted. We propose a refined post hoc model, whereby AT-rich genomes have more accidental frameshifts, handled by RF2–RF3 complexes (associated with TGA/TAA) and are mostly +1 (or −2) slips. Supporting this, excesses positively correlate with in silico predicted frameshift probabilities. Thus, we propose a more viable framework, whereby genomes broadly adopt one of the two strategies to combat frameshifts: preventing frameshifting (GC-rich) or permitting frameshifts but minimizing impacts when most are caught early (AT-rich). Our refined framework holds promise yet some features, such as the bias of out-of-frame sense codons, remain unexplained.
Affiliation(s)
- Liam Abrahams
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, United Kingdom
- Laurence D Hurst
  - Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, United Kingdom

21
Abstract
Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Ivo L Hofacker
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
  - Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Peter F Stadler
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
  - Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
  - Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany
  - Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA

22
Circular RNA expression is abundant and correlated to aggressiveness in early-stage bladder cancer. NPJ Genom Med 2017; 2:36. [PMID: 29263845 PMCID: PMC5705701 DOI: 10.1038/s41525-017-0038-z] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Received: 07/25/2017] [Revised: 10/13/2017] [Accepted: 10/31/2017] [Indexed: 12/26/2022] [Open Access]
Abstract
The functions and biomarker potential of circular RNAs (circRNAs) in various cancer types are a rising field of study, as emerging evidence relates circRNAs to tumorigenesis. Here, we profiled the expression of circRNAs in 457 tumors from patients with non-muscle-invasive bladder cancer (NMIBC). We show that a set of highly expressed circRNAs have conserved core splice sites, are associated with Alu repeats, and enriched with Synonymous Constraint Elements as well as microRNA target sites. We identified 113 abundant circRNAs that are differentially expressed between high and low-risk tumor subtypes. Analysis of progression-free survival revealed 13 circRNAs, among them circHIPK3 and circCDYL, where expression correlated with progression independently of the linear transcript and the host gene. In summary, our results demonstrate that abundant circRNAs possess multiple biological features, distinguishing them from low-expressed circRNAs and non-circularized exons, and suggest that circRNAs might serve as a new class of prognostic biomarkers in NMIBC.
Expression levels of non-coding “circular” RNA molecules could be used as a prognostic biomarker for patients with early-stage bladder cancer. A team led by Trine Line Hauge Okholm and Jakob Skou Pedersen from Aarhus University Hospital, Denmark, profiled the expression of these loop-forming, potentially gene-regulating RNAs in biopsied tumor samples from 457 patients with bladder cancer that had not invaded nearby muscle tissue. They identified a suite of 113 circular RNAs that were abundant and differentially expressed between patients with different molecular subtypes of bladder cancer. The researchers also found a smaller set of 13 circular RNAs for which expression levels correlated with disease progression. These non-coding RNA molecules, by indicating likely patient outcomes, could potentially serve as future diagnostic aids to inform treatment strategies and decisions.

23
Harrisson KA, Amish SJ, Pavlova A, Narum SR, Telonis‐Scott M, Rourke ML, Lyon J, Tonkin Z, Gilligan DM, Ingram BA, Lintermans M, Gan HM, Austin CM, Luikart G, Sunnucks P. Signatures of polygenic adaptation associated with climate across the range of a threatened fish species with high genetic connectivity. Mol Ecol 2017; 26:6253-6269. [DOI: 10.1111/mec.14368] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 04/13/2017] [Revised: 09/22/2017] [Accepted: 09/25/2017] [Indexed: 12/25/2022]
Affiliation(s)
- Katherine A. Harrisson
  - School of Biological Sciences, Monash University, Clayton, Vic., Australia
  - Department of Ecology, Environment and Evolution, School of Life Sciences, La Trobe University, Bundoora, Vic., Australia
  - Arthur Rylah Institute for Environmental Research, Heidelberg, Vic., Australia
- Stephen J. Amish
  - Conservation Genomics Group, Division of Biological Sciences, University of Montana, Missoula, MT, USA
  - Flathead Lake Biological Station, University of Montana, Polson, MT, USA
- Alexandra Pavlova
  - School of Biological Sciences, Monash University, Clayton, Vic., Australia
- Shawn R. Narum
  - Columbia River Inter‐Tribal Fish Commission, Hagerman Fish Culture Experiment Station, Hagerman, ID, USA
- Meaghan L. Rourke
  - Department of Primary Industries, DPI Fisheries, Narrandera, NSW, Australia
- Jarod Lyon
  - Arthur Rylah Institute for Environmental Research, Heidelberg, Vic., Australia
- Zeb Tonkin
  - Arthur Rylah Institute for Environmental Research, Heidelberg, Vic., Australia
- Dean M. Gilligan
  - Department of Primary Industries, DPI Fisheries, Batemans Bay Fisheries Office, Batemans Bay, NSW, Australia
- Mark Lintermans
  - Institute for Applied Ecology, University of Canberra, Canberra, ACT, Australia
- Han Ming Gan
  - Centre for Integrative Ecology, School of Life and Environmental Sciences, Deakin University, Geelong, Vic., Australia
  - School of Science, Monash University Malaysia, Petaling Jaya, Selangor, Malaysia
  - Genomics Facility, Tropical Medicine and Biology Platform, Monash University Malaysia, Petaling Jaya, Selangor, Malaysia
- Christopher M. Austin
  - Centre for Integrative Ecology, School of Life and Environmental Sciences, Deakin University, Geelong, Vic., Australia
  - School of Science, Monash University Malaysia, Petaling Jaya, Selangor, Malaysia
  - Genomics Facility, Tropical Medicine and Biology Platform, Monash University Malaysia, Petaling Jaya, Selangor, Malaysia
- Gordon Luikart
  - Conservation Genomics Group, Division of Biological Sciences, University of Montana, Missoula, MT, USA
  - Flathead Lake Biological Station, University of Montana, Polson, MT, USA
- Paul Sunnucks
  - School of Biological Sciences, Monash University, Clayton, Vic., Australia

24
Sharma V, Hiller M. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Res 2017. [PMID: 28645144 PMCID: PMC5737078 DOI: 10.1093/nar/gkx554] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 01/23/2023] [Open Access]
Abstract
Genome alignments provide a powerful basis to transfer gene annotations from a well-annotated reference genome to many other aligned genomes. The completeness of these annotations crucially depends on the sensitivity of the underlying genome alignment. Here, we investigated the impact of the genome alignment parameters and found that parameters with a higher sensitivity allow the detection of thousands of novel alignments between orthologous exons that have been missed before. In particular, comparisons between species separated by an evolutionary distance of >0.75 substitutions per neutral site, like human and other non-placental vertebrates, benefit from increased sensitivity. To systematically test if increased sensitivity improves comparative gene annotations, we built a multiple alignment of 144 vertebrate genomes and used this alignment to map human genes to the other 143 vertebrates with CESAR. We found that higher alignment sensitivity substantially improves the completeness of comparative gene annotations by adding on average 2382 and 7440 novel exons and 117 and 317 novel genes for mammalian and non-mammalian species, respectively. Our results suggest a more sensitive alignment strategy that should generally be used for genome alignments between distantly-related species. Our 144-vertebrate genome alignment and the comparative gene annotations (https://bds.mpi-cbg.de/hillerlab/144VertebrateAlignment_CESAR/) are a valuable resource for comparative genomics.
Affiliation(s)
- Virag Sharma
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany; Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
- Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany; Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
25
Dinan AM, Atkins JF, Firth AE. ASXL gain-of-function truncation mutants: defective and dysregulated forms of a natural ribosomal frameshifting product? Biol Direct 2017; 12:24. [PMID: 29037253 PMCID: PMC5644247 DOI: 10.1186/s13062-017-0195-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/14/2017] [Accepted: 10/04/2017] [Indexed: 12/16/2022]
Abstract
BACKGROUND: Programmed ribosomal frameshifting (PRF) is a gene expression mechanism which enables the translation of two N-terminally coincident, C-terminally distinct protein products from a single mRNA. Many viruses utilize PRF to control or regulate gene expression, but very few phylogenetically conserved examples are known in vertebrate genes. Additional sex combs-like (ASXL) genes 1 and 2 encode important epigenetic and transcriptional regulatory proteins that control the expression of homeotic genes during key developmental stages. Here we describe an ~150-codon overlapping ORF (termed TF) in ASXL1 and ASXL2 that, with few exceptions, is conserved throughout vertebrates. RESULTS: Conservation of the TF ORF, strong suppression of synonymous site variation in the overlap region, and the completely conserved presence of an EH[N/S]Y motif (a known binding site for Host Cell Factor-1, HCF-1, an epigenetic regulatory factor), all indicate that TF is a protein-coding sequence. A highly conserved UCC_UUU_CGU sequence (identical to the known site of +1 ribosomal frameshifting for influenza virus PA-X expression) occurs at the 5' end of the region of enhanced synonymous site conservation in ASXL1. Similarly, a highly conserved RG_GUC_UCU sequence (identical to a known site of -2 ribosomal frameshifting for arterivirus nsp2TF expression) occurs at the 5' end of the region of enhanced synonymous site conservation in ASXL2. CONCLUSIONS: Due to a lack of appropriate splice forms, or initiation sites, the most plausible mechanism for translation of the ASXL1 and 2 TF regions is ribosomal frameshifting, resulting in a transframe fusion of the N-terminal half of ASXL1 or 2 to the TF product, termed ASXL-TF. Truncation or frameshift mutants of ASXL are linked to myeloid malignancies and genetic diseases, such as Bohring-Opitz syndrome, likely at least in part as a result of gain-of-function or dominant-negative effects. Our hypothesis now indicates that these disease-associated mutant forms represent overexpressed defective versions of ASXL-TF. REVIEWERS: This article was reviewed by Laurence Hurst and Eugene Koonin.
Affiliation(s)
- Adam M Dinan
- Department of Pathology, Division of Virology, University of Cambridge, Cambridge, CB2 1QP, UK
- John F Atkins
- School of Biochemistry and Cell Biology, University College Cork, T12 YT57, Cork, Ireland; Department of Human Genetics, University of Utah, Salt Lake City, UT, 84112, USA
- Andrew E Firth
- Department of Pathology, Division of Virology, University of Cambridge, Cambridge, CB2 1QP, UK
26
Hagedorn PH, Persson R, Funder ED, Albæk N, Diemer SL, Hansen DJ, Møller MR, Papargyri N, Christiansen H, Hansen BR, Hansen HF, Jensen MA, Koch T. Locked nucleic acid: modality, diversity, and drug discovery. Drug Discov Today 2017; 23:101-114. [PMID: 28988994 DOI: 10.1016/j.drudis.2017.09.018] [Citation(s) in RCA: 158] [Impact Index Per Article: 19.8] [Received: 06/20/2017] [Revised: 09/01/2017] [Accepted: 09/27/2017] [Indexed: 01/05/2023]
Abstract
Over the past 20 years, the field of RNA-targeted therapeutics has advanced based on discoveries of modified oligonucleotide chemistries, and an ever-increasing understanding of how to apply cellular assays to identify oligonucleotides with improved pharmacological properties in vivo. Locked nucleic acid (LNA), which exhibits high binding affinity and potency, is widely used for this purpose. Our understanding of RNA biology has also expanded tremendously, resulting in new approaches to engage RNA as a therapeutic target. Recent observations indicate that each oligonucleotide is a unique entity, and small structural differences between oligonucleotides can often lead to substantial differences in their pharmacological properties. Here, we outline new principles for drug discovery exploiting oligonucleotide diversity to identify rare molecules with unique pharmacological properties.
Affiliation(s)
- Peter H Hagedorn
- Therapeutic Modalities, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
- Robert Persson
- Pharmaceutical Sciences, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
- Erik D Funder
- Therapeutic Modalities, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
- Nanna Albæk
- Therapeutic Modalities, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
- Sanna L Diemer
- Therapeutic Modalities, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
- Dennis J Hansen
- Therapeutic Modalities, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
- Marianne R Møller
- Therapeutic Modalities, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
- Natalia Papargyri
- Department of Biotechnology and Biomedicine, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Helle Christiansen
- Therapeutic Modalities, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
- Bo R Hansen
- Therapeutic Modalities, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
- Henrik F Hansen
- Therapeutic Modalities, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
- Mads A Jensen
- Therapeutic Modalities, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
- Troels Koch
- Therapeutic Modalities, Roche Pharma Research and Early Development, Roche Innovation Center Copenhagen, 2970 Hørsholm, Denmark
27
Qi F, Frishman D. Melting temperature highlights functionally important RNA structure and sequence elements in yeast mRNA coding regions. Nucleic Acids Res 2017; 45:6109-6118. [PMID: 28335026 PMCID: PMC5449622 DOI: 10.1093/nar/gkx161] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/20/2016] [Accepted: 02/24/2017] [Indexed: 11/13/2022]
Abstract
Secondary structure elements in the coding regions of mRNAs play an important role in gene expression and regulation, but distinguishing functional from non-functional structures remains challenging. Here we investigate the dependence of sequence–structure relationships in the coding regions on temperature based on the recent PARTE data by Wan et al. Our main finding is that the regions with high and low thermostability (high Tm and low Tm regions) are under evolutionary pressure to preserve RNA secondary structure and primary sequence, respectively. Sequences of low Tm regions display a higher degree of evolutionary conservation compared to high Tm regions. Low Tm regions are under strong synonymous constraint, while high Tm regions are not. These findings imply that high Tm regions contain thermo-stable functionally important RNA structures, which impose relaxed evolutionary constraint on sequence as long as the base-pairing patterns remain intact. By contrast, low thermostability regions contain single-stranded functionally important conserved RNA sequence elements accessible for binding by other molecules. We also find that theoretically predicted structures of paralogous mRNA pairs become more similar with growing temperature, while experimentally measured structures tend to diverge, which implies that the melting pathways of RNA structures cannot be fully captured by current computational approaches.
Affiliation(s)
- Fei Qi
- Department of Bioinformatics, Technische Universität München, Wissenschaftzentrum Weihenstephan, Maximus-von-Imhof-Forum 3, D-85354 Freising, Germany
- Dmitrij Frishman
- Department of Bioinformatics, Technische Universität München, Wissenschaftzentrum Weihenstephan, Maximus-von-Imhof-Forum 3, D-85354 Freising, Germany; St Petersburg State Polytechnic University, St Petersburg 195251, Russia
28
Savisaar R, Hurst LD. Both Maintenance and Avoidance of RNA-Binding Protein Interactions Constrain Coding Sequence Evolution. Mol Biol Evol 2017; 34:1110-1126. [PMID: 28138077 PMCID: PMC5400389 DOI: 10.1093/molbev/msx061] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Indexed: 01/07/2023]
Abstract
While the principal force directing coding sequence (CDS) evolution is selection on protein function, to ensure correct gene expression CDSs must also maintain interactions with RNA-binding proteins (RBPs). Understanding how our genes are shaped by these RNA-level pressures is necessary for diagnostics and for improving transgenes. However, the evolutionary impact of the need to maintain RBP interactions remains unresolved. Are coding sequences constrained by the need to specify RBP binding motifs? If so, what proportion of mutations are affected? Might sequence evolution also be constrained by the need not to specify motifs that might attract unwanted binding, for instance because it would interfere with exon definition? Here, we have scanned human CDSs for motifs that have been experimentally determined to be recognized by RBPs. We observe two sets of motifs: those that are enriched over nucleotide-controlled null and those that are depleted. Importantly, the depleted set is enriched for motifs recognized by non-CDS binding RBPs. Supporting the functional relevance of our observations, we find that motifs that are more enriched are also slower-evolving. The net effect of this selection to preserve is a reduction in the overall rate of synonymous evolution of 2-3% in both primates and rodents. Stronger motif depletion, on the other hand, is associated with stronger selection against motif gain in evolution. The challenge faced by our CDSs is therefore not only one of attracting the right RBPs but also of avoiding the wrong ones, all while also evolving under selection pressures related to protein structure.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
29
Savisaar R, Hurst LD. Estimating the prevalence of functional exonic splice regulatory information. Hum Genet 2017; 136:1059-1078. [PMID: 28405812 PMCID: PMC5602102 DOI: 10.1007/s00439-017-1798-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 01/26/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022]
Abstract
In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1–4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the assumption that constraint only operates on exon ends, the conservation-based methods can be overly conservative.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
30
Pancsa R, Tompa P. Coding Regions of Intrinsic Disorder Accommodate Parallel Functions. Trends Biochem Sci 2016; 41:898-906. [DOI: 10.1016/j.tibs.2016.08.009] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/19/2016] [Revised: 08/16/2016] [Accepted: 08/19/2016] [Indexed: 02/01/2023]
31
Rensch T, Villar D, Horvath J, Odom DT, Flicek P. Mitochondrial heteroplasmy in vertebrates using ChIP-sequencing data. Genome Biol 2016; 17:139. [PMID: 27349964 PMCID: PMC4922064 DOI: 10.1186/s13059-016-0996-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 11/09/2015] [Accepted: 06/03/2016] [Indexed: 01/29/2023]
Abstract
BACKGROUND Mitochondrial heteroplasmy, the presence of more than one mitochondrial DNA (mtDNA) variant in a cell or individual, is not as uncommon as previously thought. It is mostly due to the high mutation rate of the mtDNA and limited repair mechanisms present in the mitochondrion. Motivated by mitochondrial diseases, much focus has been placed into studying this phenomenon in human samples and in medical contexts. To place these results in an evolutionary context and to explore general principles of heteroplasmy, we describe an integrated cross-species evaluation of heteroplasmy in mammals that exploits previously reported NGS data. Focusing on ChIP-seq experiments, we developed a novel approach to detect heteroplasmy from the concomitant mitochondrial DNA fraction sequenced in these experiments. RESULTS We first demonstrate that the sequencing coverage of mtDNA in ChIP-seq experiments is sufficient for heteroplasmy detection. We then describe a novel detection method for accurate detection of heteroplasmies, which also accounts for the error rate of NGS technology. Applying this method to 79 individuals from 16 species resulted in 107 heteroplasmic positions present in a total of 45 individuals. Further analysis revealed that the majority of detected heteroplasmies occur in intergenic regions. CONCLUSION In addition to documenting the prevalence of mtDNA in ChIP-seq data, the results of our mitochondrial heteroplasmy detection method suggest that mitochondrial heteroplasmies identified across vertebrates share similar characteristics as found for human heteroplasmies. Although largely consistent with previous studies in individual vertebrates, our integrated cross-species analysis provides valuable insights into the evolutionary dynamics of mitochondrial heteroplasmy.
Affiliation(s)
- Thomas Rensch
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Diego Villar
- Cancer Research UK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
- Julie Horvath
- Biological and Biomedical Sciences, North Carolina Central University, Durham, NC, 27707, USA
- North Carolina Museum of Natural Sciences, Raleigh, NC, 27601, USA
- Duncan T Odom
- Cancer Research UK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge, CB2 0RE, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
32
Warnefors M, Hartmann B, Thomsen S, Alonso CR. Combinatorial Gene Regulatory Functions Underlie Ultraconserved Elements in Drosophila. Mol Biol Evol 2016; 33:2294-306. [PMID: 27247329 PMCID: PMC4989106 DOI: 10.1093/molbev/msw101] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/15/2022]
Abstract
Ultraconserved elements (UCEs) are discrete genomic elements conserved across large evolutionary distances. Although UCEs have been linked to multiple facets of mammalian gene regulation their extreme evolutionary conservation remains largely unexplained. Here, we apply a computational approach to investigate this question in Drosophila, exploring the molecular functions of more than 1,500 UCEs shared across the genomes of 12 Drosophila species. Our data indicate that Drosophila UCEs are hubs for gene regulatory functions and suggest that UCE sequence invariance originates from their combinatorial roles in gene control. We also note that the gene regulatory roles of intronic and intergenic UCEs (iUCEs) are distinct from those found in exonic UCEs (eUCEs). In iUCEs, transcription factor (TF) and epigenetic factor binding data strongly support iUCE roles in transcriptional and epigenetic regulation. In contrast, analyses of eUCEs indicate that they are two orders of magnitude more likely than the expected to simultaneously include protein-coding sequence, TF-binding sites, splice sites, and RNA editing sites but have reduced roles in transcriptional or epigenetic regulation. Furthermore, we use a Drosophila cell culture system and transgenic Drosophila embryos to validate the notion of UCE combinatorial regulatory roles using an eUCE within the Hox gene Ultrabithorax and show that its protein-coding region also contains alternative splicing regulatory information. Taken together our experiments indicate that UCEs emerge as a result of combinatorial gene regulatory roles and highlight common features in mammalian and insect UCEs implying that similar processes might underlie ultraconservation in diverse animal taxa.
Affiliation(s)
- Maria Warnefors
- Sussex Neuroscience, School of Life Sciences, University of Sussex, Brighton, United Kingdom; Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Britta Hartmann
- Institute of Human Genetics, Freiburg, Germany; BIOSS Centre for Biological Signaling Studies, University Medical Center Freiburg, Freiburg, Germany
- Stefan Thomsen
- Sussex Neuroscience, School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Claudio R Alonso
- Sussex Neuroscience, School of Life Sciences, University of Sussex, Brighton, United Kingdom
33
Annibalini G, Bielli P, De Santi M, Agostini D, Guescini M, Sisti D, Contarelli S, Brandi G, Villarini A, Stocchi V, Sette C, Barbieri E. MIR retroposon exonization promotes evolutionary variability and generates species-specific expression of IGF-1 splice variants. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2016; 1859:757-68. [DOI: 10.1016/j.bbagrm.2016.03.014] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 12/03/2015] [Revised: 03/07/2016] [Accepted: 03/23/2016] [Indexed: 12/18/2022]
34
Abstract
Exonic enhancers (eExons) are coding exons that also function as enhancers of the gene in which they reside or (a) nearby gene(s). Mutations that affect the enhancer activity of these eExons have been associated with human disease. Therefore, eExon mutations should be taken into account in exome and genome sequencing projects, not only because of the ability of these mutations to modify the encoded proteins but also because of their effects on enhancer activity.
Affiliation(s)
- Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, 94158, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, CA, 94158, USA
35
Martínez MA, Jordan-Paiz A, Franco S, Nevot M. Synonymous Virus Genome Recoding as a Tool to Impact Viral Fitness. Trends Microbiol 2016; 24:134-147. [DOI: 10.1016/j.tim.2015.11.002] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 09/19/2015] [Revised: 10/28/2015] [Accepted: 11/04/2015] [Indexed: 01/28/2023]
36
Agoglia RM, Fraser HB. Disentangling Sources of Selection on Exonic Transcriptional Enhancers. Mol Biol Evol 2015; 33:585-90. [PMID: 26500252 DOI: 10.1093/molbev/msv234] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
In addition to coding for proteins, exons can also impact transcription by encoding regulatory elements such as enhancers. It has been debated whether such features confer heightened selective constraint, or evolve neutrally. We have addressed this question by developing a new approach to disentangle the sources of selection acting on exonic enhancers, in which we model the evolutionary rates of every possible substitution as a function of their effects on both protein sequence and enhancer activity. In three exonic enhancers, we found no significant association between evolutionary rates and effects on enhancer activity. This suggests that despite having biochemical activity, these exonic enhancers have no detectable selective constraint, and thus are unlikely to play a major role in protein evolution.
37
Bergantino F, Guariniello S, Raucci R, Colonna G, De Luca A, Normanno N, Costantini S. Structure–fluctuation–function relationships of seven pro-angiogenic isoforms of VEGFA, important mediators of tumorigenesis. Biochimica et Biophysica Acta - Proteins and Proteomics 2015; 1854:410-25. [DOI: 10.1016/j.bbapap.2015.01.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/01/2014] [Revised: 01/06/2015] [Accepted: 01/14/2015] [Indexed: 10/24/2022]
38
Sealfon RS, Lin MF, Jungreis I, Wolf MY, Kellis M, Sabeti PC. FRESCo: finding regions of excess synonymous constraint in diverse viruses. Genome Biol 2015; 16:38. [PMID: 25853568 PMCID: PMC4376164 DOI: 10.1186/s13059-015-0603-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 12/31/2014] [Accepted: 02/02/2015] [Indexed: 11/18/2022]
Abstract
Background: The increasing availability of sequence data for many viruses provides power to detect regions under unusual evolutionary constraint at a high resolution. One approach leverages the synonymous substitution rate as a signature to pinpoint genic regions encoding overlapping or embedded functional elements. Protein-coding regions in viral genomes often contain overlapping RNA structural elements, reading frames, regulatory elements, microRNAs, and packaging signals. Synonymous substitutions in these regions would be selectively disfavored and thus these regions are characterized by excess synonymous constraint. Codon choice can also modulate transcriptional efficiency, translational accuracy, and protein folding. Results: We developed a phylogenetic codon model-based framework, FRESCo, designed to find regions of excess synonymous constraint in short, deep alignments, such as individual viral genes across many sequenced isolates. We demonstrated the high specificity of our approach on simulated data and applied our framework to the protein-coding regions of approximately 30 distinct species of viruses with diverse genome architectures. Conclusions: FRESCo recovers known multifunctional regions in well-characterized viruses such as hepatitis B virus, poliovirus, and West Nile virus, often at a single-codon resolution, and predicts many novel functional elements overlapping viral genes, including in Lassa and Ebola viruses. In a number of viruses, the synonymously constrained regions that we identified also display conserved, stable predicted RNA structures, including putative novel elements in multiple viral species.
39
Chen FC, Pan CL, Lin HY. Functional Implications of RNA Splicing for Human Long Intergenic Noncoding RNAs. Evol Bioinform Online 2014; 10:219-28. [PMID: 25574121 PMCID: PMC4264600 DOI: 10.4137/ebo.s20772] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/02/2014] [Revised: 11/12/2014] [Accepted: 11/12/2014] [Indexed: 01/12/2023]
Abstract
Long intergenic noncoding RNAs (lincRNAs) have been suggested as playing important roles in human gene regulation. The majority of annotated human lincRNAs include multiple exons and are alternatively spliced. However, the connections between alternative RNA splicing (AS) and the functions/regulations of lincRNAs have remained elusive. In this study, we compared the sequence evolution and biological features between single-exonic lincRNAs and multi-exonic lincRNAs (SELs and MELs, respectively) that were present only in the hominoids (hominoid-specific) or conserved in primates (primate-conserved). The MEL exons were further classified into alternatively spliced exons (ASEs) and constitutively spliced exons (CSEs) for evolutionary analyses. Our results indicate that SELs and MELs differed significantly from each other. Firstly, in hominoid-specific lincRNAs, MELs (both CSEs and ASEs) evolved slightly more rapidly than SELs, which evolved approximately at the neutral rate. In primate-conserved lincRNAs, SELs and ASEs evolved slightly more slowly than CSEs and neutral sequences. The evolutionary path of hominid-specific lincRNAs thus seemed to have diverged from that of their more ancestral counterparts. Secondly, both of the exons and transcripts of SELs were significantly longer than those of MELs, and this was probably because SEL transcripts were more resistant to RNA splicing than MELs. Thirdly, SELs were physically closer to coding genes than MELs. Fourthly, SELs were more widely expressed in human tissues than MELs. These results suggested that SELs and MELs represented two biologically distinct groups of genes. In addition, the SEL-MEL and ASE-CSE differences implied that splicing might be important for the functionality or regulations of lincRNAs in primates.
Affiliation(s)
- Feng-Chi Chen
- Institute of Population Health Sciences, National Health Research Institutes, Taiwan; Department of Biological Science and Technology, National Chiao-Tung University, Taiwan; Department of Dentistry, China Medical University, Taiwan
- Chia-Lin Pan
- Institute of Population Health Sciences, National Health Research Institutes, Taiwan
- Hsuan-Yu Lin
- Institute of Population Health Sciences, National Health Research Institutes, Taiwan
40
Liu G, Zhang R, Xu J, Wu CI, Lu X. Functional conservation of both CDS- and 3'-UTR-located microRNA binding sites between species. Mol Biol Evol 2014; 32:623-8. [PMID: 25414126 DOI: 10.1093/molbev/msu323] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 12/11/2022]
Abstract
MicroRNAs (miRNAs) mediate gene regulation posttranscriptionally through pairing of their seed (2-7 nt) to 3'-untranslated regions (3'-UTRs) or coding regions (coding sequences [CDSs]) of their target genes. CDS target sites generally show weaker repression effects than 3'-UTR sites. However, little is known about the conservation of the function, that is, repression effect, for these two groups of target sites. In addition, no systematic analysis of the evolutionary constraint on CDS sites exists to date. To address these questions, we performed RNA-sequencing to quantify the regulatory effect of miR-15a/miR-16 and miR-92a on their CDS and 3'-UTR targets in human and macaque cells. These miRs were knocked down transiently so the repression effect could be tracked immediately. Although on average CDS targets are less derepressed than 3'-UTR targets in both species, both the 3'-UTR targets and the CDS targets are functionally conserved. The evolutionary analysis of miRNA target sites shows that CDS sites are more conserved than nontarget control, albeit to a lesser extent than 3'-UTR sites. In conclusion, CDS target sites are functional, even though they are subject to less functional constraint than 3'-UTR target sites.
Collapse
Affiliation(s)
- Guojing Liu
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing, China
- Rui Zhang
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Jin Xu
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing, China
- Chung-I Wu
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-Sen University, Guangzhou, China; Department of Ecology and Evolution, University of Chicago
- Xuemei Lu
- Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing, China
|
41
|
Firth AE. Mapping overlapping functional elements embedded within the protein-coding regions of RNA viruses. Nucleic Acids Res 2014; 42:12425-39. [PMID: 25326325 PMCID: PMC4227794 DOI: 10.1093/nar/gku981] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Received: 08/17/2014] [Revised: 09/20/2014] [Accepted: 10/04/2014] [Indexed: 12/29/2022]
Abstract
Identification of the full complement of genes and other functional elements in any virus is crucial to fully understand its molecular biology and guide the development of effective control strategies. RNA viruses have compact multifunctional genomes that frequently contain overlapping genes and non-coding functional elements embedded within protein-coding sequences. Overlapping features often escape detection because it can be difficult to disentangle the multiple roles of the constituent nucleotides via mutational analyses, while high-throughput experimental techniques are often unable to distinguish functional elements from incidental features. However, RNA viruses evolve very rapidly so that, even within a single species, substitutions rapidly accumulate at neutral or near-neutral sites providing great potential for comparative genomics to distinguish the signature of purifying selection. Computationally identified features can then be efficiently targeted for experimental analysis. Here we analyze alignments of protein-coding virus sequences to identify regions where there is a statistically significant reduction in the degree of variability at synonymous sites, a characteristic signature of overlapping functional elements. Having previously tested this technique by experimental verification of discoveries in selected viruses, we now analyze sequence alignments for ∼700 RNA virus species to identify hundreds of such regions, many of which have not been previously described.
Affiliation(s)
- Andrew E Firth
- Division of Virology, Department of Pathology, University of Cambridge, Cambridge CB2 1QP, UK
|
42
|
Harrisson KA, Pavlova A, Telonis-Scott M, Sunnucks P. Using genomics to characterize evolutionary potential for conservation of wild populations. Evol Appl 2014; 7:1008-25. [PMID: 25553064 PMCID: PMC4231592 DOI: 10.1111/eva.12149] [Citation(s) in RCA: 162] [Impact Index Per Article: 14.7] [Received: 08/29/2013] [Accepted: 02/10/2014] [Indexed: 12/16/2022]
Abstract
Genomics promises exciting advances towards the important conservation goal of maximizing evolutionary potential, notwithstanding associated challenges. Here, we explore some of the complexity of adaptation genetics and discuss the strengths and limitations of genomics as a tool for characterizing evolutionary potential in the context of conservation management. Many traits are polygenic and can be strongly influenced by minor differences in regulatory networks and by epigenetic variation not visible in DNA sequence. Much of this critical complexity is difficult to detect using methods commonly used to identify adaptive variation, and this needs appropriate consideration when planning genomic screens, and when basing management decisions on genomic data. When the genomic basis of adaptation and future threats are well understood, it may be appropriate to focus management on particular adaptive traits. For more typical conservation scenarios, we argue that screening genome-wide variation should be a sensible approach that may provide a generalized measure of evolutionary potential that accounts for the contributions of small-effect loci and cryptic variation and is robust to uncertainty about future change and required adaptive response(s). The best conservation outcomes should be achieved when genomic estimates of evolutionary potential are used within an adaptive management framework.
Affiliation(s)
- Alexandra Pavlova
- School of Biological Sciences, Monash University, Melbourne, Vic., Australia
- Paul Sunnucks
- School of Biological Sciences, Monash University, Melbourne, Vic., Australia
|
43
|
Systematic dissection of coding exons at single nucleotide resolution supports an additional role in cell-specific transcriptional regulation. PLoS Genet 2014; 10:e1004592. [PMID: 25340400 PMCID: PMC4207465 DOI: 10.1371/journal.pgen.1004592] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 01/13/2014] [Accepted: 07/09/2014] [Indexed: 12/22/2022]
Abstract
In addition to their protein coding function, exons can also serve as transcriptional enhancers. Mutations in these exonic-enhancers (eExons) could alter both protein function and transcription. However, the functional consequence of eExon mutations is not well known. Here, using massively parallel reporter assays, we dissect the enhancer activity of three liver eExons (SORL1 exon 17, TRAF3IP2 exon 2, PPARG exon 6) at single nucleotide resolution in the mouse liver. We find that both synonymous and non-synonymous mutations have similar effects on enhancer activity and many of the deleterious mutation clusters overlap known liver-associated transcription factor binding sites. Carrying out a similar massively parallel reporter assay in HeLa cells with these three eExons found differences in their mutation profiles compared to the liver, suggesting that enhancers could have distinct operating profiles in different tissues. Our results demonstrate that eExon mutations could lead to multiple phenotypes by disrupting both the protein sequence and enhancer activity and that enhancers can have distinct mutation profiles in different cell types.
Exons that code for protein can also have additional functions, such as regulating gene transcription through enhancer activity. Here, we changed every nucleotide in three different exons that also function as enhancers, and examined their enhancer activity to test whether nucleotide changes in these exons can affect both the protein sequence and enhancer function. We found that mutations with a significant effect on enhancer function can reside both in regions that change the protein sequence (non-synonymous) and regions that do not change it (synonymous). When we conducted a similar analysis in a different cell type, we observed a difference in the nucleotide changes that cause a significant effect on enhancer activity, suggesting that the enhancer functional units can differ between tissues.
|
44
|
N-terminal region of human chemokine receptor CXCR3: Structural analysis of CXCR3(1–48) by experimental and computational studies. Biochimica et Biophysica Acta - Proteins and Proteomics 2014; 1844:1868-80. [DOI: 10.1016/j.bbapap.2014.08.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 06/14/2014] [Revised: 08/06/2014] [Accepted: 08/07/2014] [Indexed: 11/20/2022]
|
45
|
Kessler MD, Dean MD. Effective population size does not predict codon usage bias in mammals. Ecol Evol 2014; 4:3887-900. [PMID: 25505518 PMCID: PMC4242573 DOI: 10.1002/ece3.1249] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/22/2014] [Revised: 08/04/2014] [Accepted: 08/07/2014] [Indexed: 12/20/2022]
Abstract
Synonymous codons are not used at equal frequency throughout the genome, a phenomenon termed codon usage bias (CUB). It is often assumed that interspecific variation in the intensity of CUB is related to species differences in effective population sizes (Ne), with selection on CUB operating less efficiently in species with small Ne. Here, we specifically ask whether variation in Ne predicts differences in CUB in mammals and report two main findings. First, across 41 mammalian genomes, CUB was not correlated with two indirect proxies of Ne (body mass and generation time), even though there was statistically significant evidence of selection shaping CUB across all species. Interestingly, autosomal genes showed higher codon usage bias compared to X-linked genes, and high-recombination genes showed higher codon usage bias compared to low recombination genes, suggesting intraspecific variation in Ne predicts variation in CUB. Second, across six mammalian species with genetic estimates of Ne (human, chimpanzee, rabbit, and three mouse species: Mus musculus, M. domesticus, and M. castaneus), Ne and CUB were weakly and inconsistently correlated. At least in mammals, interspecific divergence in Ne does not strongly predict variation in CUB. One hypothesis is that each species responds to a unique distribution of selection coefficients, confounding any straightforward link between Ne and CUB.
Affiliation(s)
- Michael D Kessler
- Molecular and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, California, 90089
- Matthew D Dean
- Molecular and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, California, 90089
|
46
|
An overview of the sequence features of N- and C-terminal segments of the human chemokine receptors. Cytokine 2014; 70:141-50. [PMID: 25138014 DOI: 10.1016/j.cyto.2014.07.257] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/27/2014] [Revised: 06/21/2014] [Accepted: 07/29/2014] [Indexed: 01/10/2023]
Abstract
Chemokine receptors play a crucial role in the cellular signaling enrolling extracellular ligands chemotactic proteins which recruit immune cells. They possess seven trans-membrane helices, an extracellular N-terminal region with three extracellular hydrophilic loops being important for search and recognition of specific ligand(s), and an intracellular C-terminal region with three intracellular loops that couple G-proteins. Although the functional aspects of the terminal segments of the extra- and intra-cellular G proteins are universally identified, the molecular basis on which they rest is still unclear because they are not definable by means of X-rays due to their high mobility and are not easy to study in the membrane. The purpose of this work is to define which physical-chemical properties of the terminal segments of the human chemokine receptors are at the basis of their functional mechanisms. Therefore, we have evaluated their physical-chemical properties in terms of amino acid composition, local flexibility, disorder propensity, net charge distribution and putative sites of post-translational modifications. Our results support the conclusion that all 19 C-terminal and N-terminal segments of human chemokine receptors are very flexible due to the systematic presence of intrinsic disorder. Although the purpose of this plasticity clearly appears to be that of controlling and modulating the binding of ligands, we provide evidence that the overlap of linearly charged stretches, intrinsic disorder and post-translational modification sites, consistently found in these motifs, is a necessary feature to exert the function.
The role of the intrinsic disorder has been discussed considering the structural information coming from intrinsically disordered model compounds which support the view that the chemokine terminals have to be considered as strong polyampholytes or polyelectrolytes where conformational ensembles and structural transitions between them are modulated by charge fraction variations. Also the role of post-translational modifications has been found coherent with this view because, changing the charge fraction, they guide structural transitions between ensembles. Moreover, we have also considered our results from an evolutionary point of view in order to understand if the features found in humans were also present in other species. Our data evidenced that the structural features of the human terminals of the chemokine receptors were shared and evolutionarily conserved particularly among mammals. This means that the various organisms not only tolerate but select intrinsic disorder for the terminal regions of their receptors, reflecting constraints that point to molecular recognition. In conclusion the terminal segments of chemokine receptors must be considered as strong polyampholytes where the charge fraction variations induced by post-translational modifications are the driving physico-chemical feature able to adapt the conformations of the terminal segments to their functions.
|
47
|
Chen FC, Chuang TJ, Lin HY, Hsu MK. The evolution of the coding exome of the Arabidopsis species--the influences of DNA methylation, relative exon position, and exon length. BMC Evol Biol 2014; 14:145. [PMID: 24965500 PMCID: PMC4079183 DOI: 10.1186/1471-2148-14-145] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/27/2014] [Accepted: 06/19/2014] [Indexed: 11/10/2022]
Abstract
Background: The evolution of the coding exome is a major driving force of functional divergence both between species and between protein isoforms. Exons at different positions in the transcript or in different transcript isoforms may (1) mutate at different rates due to variations in DNA methylation level; and (2) serve distinct biological roles, and thus be differentially targeted by natural selection. Furthermore, intrinsic exonic features, such as exon length, may also affect the evolution of individual exons. Importantly, the evolutionary effects of these intrinsic/extrinsic features may differ significantly between animals and plants. Such inter-lineage differences, however, have not been systematically examined.
Results: Here we examine how DNA methylation at CpG dinucleotides (CpG methylation), in the context of intrinsic exonic features (exon length and relative exon position in the transcript), influences the evolution of coding exons of Arabidopsis thaliana. We observed fairly different evolutionary patterns in A. thaliana as compared with those reported for animals. Firstly, the mutagenic effect of CpG methylation is the strongest for internal exons and the weakest for first exons despite the stringent selective constraints on the former group. Secondly, the mutagenic effect of CpG methylation increases significantly with length in first exons but not in the other two exon groups. Thirdly, CpG methylation level is correlated with evolutionary rates (dS, dN, and the dN/dS ratio) with markedly different patterns among the three exon groups. The correlations are generally positive, negative, and mixed for first, last, and internal exons, respectively. Fourthly, exon length is a CpG methylation-independent indicator of evolutionary rates, particularly for dN and the dN/dS ratio in last and internal exons. Finally, the evolutionary patterns of coding exons with regard to CpG methylation differ significantly between Arabidopsis species and mammals.
Conclusions: Our results suggest that intrinsic features, including relative exonic position in the transcript and exon length, play an important role in the evolution of A. thaliana coding exons. Furthermore, CpG methylation is correlated with exonic evolutionary rates differentially between A. thaliana and animals, and may have served different biological roles in the two lineages.
Affiliation(s)
- Feng-Chi Chen
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan.
|
48
|
Macossay-Castillo M, Kosol S, Tompa P, Pancsa R. Synonymous constraint elements show a tendency to encode intrinsically disordered protein segments. PLoS Comput Biol 2014; 10:e1003607. [PMID: 24809503 PMCID: PMC4014394 DOI: 10.1371/journal.pcbi.1003607] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 01/14/2014] [Accepted: 03/17/2014] [Indexed: 01/22/2023]
Abstract
Synonymous constraint elements (SCEs) are protein-coding genomic regions with very low synonymous mutation rates believed to carry additional, overlapping functions. Thousands of such potentially multi-functional elements were recently discovered by analyzing the levels and patterns of evolutionary conservation in human coding exons. These elements provide a good opportunity to improve our understanding of how the redundant nature of the genetic code is exploited in the cell. Our premise is that the protein segments encoded by such elements might better comply with the increased functional demands if they are structurally less constrained (i.e. intrinsically disordered). To test this idea, we investigated the protein segments encoded by SCEs with computational tools to describe the underlying structural properties. In addition to SCEs, we examined the level of disorder, secondary structure, and sequence complexity of protein regions overlapping with experimentally validated splice regulatory sites. We show that multi-functional gene regions translate into protein segments that are significantly enriched in structural disorder and compositional bias, while they are depleted in secondary structure and domain annotations compared to reference segments of similar lengths. This tendency suggests that relaxed protein structural constraints provide an advantage when accommodating multiple overlapping functions in coding regions.
Affiliation(s)
- Mauricio Macossay-Castillo
- Vlaams Instituut voor Biotechnologie (VIB) Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Simone Kosol
- Vlaams Instituut voor Biotechnologie (VIB) Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Peter Tompa
- Vlaams Instituut voor Biotechnologie (VIB) Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Rita Pancsa
- Vlaams Instituut voor Biotechnologie (VIB) Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
|
49
|
Abstract
Discoveries over the past decade portend a paradigm shift in molecular biology. Evidence suggests that RNA is not only functional as a messenger between DNA and protein but also involved in the regulation of genome organization and gene expression, which is increasingly elaborate in complex organisms. Regulatory RNA seems to operate at many levels; in particular, it plays an important part in the epigenetic processes that control differentiation and development. These discoveries suggest a central role for RNA in human evolution and ontogeny. Here, we review the emergence of the previously unsuspected world of regulatory RNA from a historical perspective.
Affiliation(s)
- Kevin V Morris
- School of Biotechnology and Biomedical Sciences, University of New South Wales, Sydney, NSW 2052, Australia; and Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California 92037, USA
- John S Mattick
- Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW 2010, Australia; the School of Biotechnology and Biomedical Sciences, and St. Vincent's Clinical School, University of New South Wales, Sydney, NSW 2052, Australia
|
50
|
Supek F, Miñana B, Valcárcel J, Gabaldón T, Lehner B. Synonymous Mutations Frequently Act as Driver Mutations in Human Cancers. Cell 2014; 156:1324-1335. [DOI: 10.1016/j.cell.2014.01.051] [Citation(s) in RCA: 331] [Impact Index Per Article: 30.1] [Received: 05/01/2013] [Revised: 11/20/2013] [Accepted: 01/15/2014] [Indexed: 01/05/2023]
|