1. Eddy SR. Mammalian cells repress random DNA that yeast transcribes. Nature 2024; 628:271-273. [PMID: 38448526 DOI: 10.1038/d41586-024-00575-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]

2. Camellato BR, Brosh R, Ashe HJ, Maurano MT, Boeke JD. Synthetic reversed sequences reveal default genomic states. Nature 2024; 628:373-380. [PMID: 38448583 PMCID: PMC11006607 DOI: 10.1038/s41586-024-07128-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/27/2022] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Pervasive transcriptional activity is observed across diverse species. The genomes of extant organisms have undergone billions of years of evolution, making it unclear whether these genomic activities represent effects of selection or 'noise' (refs. 1-4). Characterizing default genome states could help understand whether pervasive transcriptional activity has biological meaning. Here we addressed this question by introducing a synthetic 101-kb locus into the genomes of Saccharomyces cerevisiae and Mus musculus and characterizing genomic activity. The locus was designed by reversing but not complementing human HPRT1, including its flanking regions, thus retaining basic features of the natural sequence but ablating evolved coding or regulatory information. We observed widespread activity of both reversed and native HPRT1 loci in yeast, despite the lack of evolved yeast promoters. By contrast, the reversed locus displayed no activity at all in mouse embryonic stem cells, and instead exhibited repressive chromatin signatures. The repressive signature was alleviated in a locus variant lacking CpG dinucleotides; nevertheless, this variant was also transcriptionally inactive. These results show that synthetic genomic sequences that lack coding information are active in yeast, but inactive in mouse embryonic stem cells, consistent with a major difference in 'default genomic states' between these two divergent eukaryotic cell types, with implications for understanding pervasive transcription, horizontal transfer of genetic information and the birth of new genes.
Affiliation(s)
- Ran Brosh
  - Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Hannah J Ashe
  - Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
- Matthew T Maurano
  - Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
  - Department of Pathology, NYU Langone Health, New York, NY, USA
- Jef D Boeke
  - Institute for Systems Genetics, NYU Langone Health, New York, NY, USA
  - Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY, USA
  - Department of Biomedical Engineering, NYU Tandon School of Engineering, New York, NY, USA

3. Kaur J, Sharma A, Mundlia P, Sood V, Pandey A, Singh G, Barnwal RP. RNA-Small-Molecule Interaction: Challenging the "Undruggable" Tag. J Med Chem 2024. [PMID: 38498010 DOI: 10.1021/acs.jmedchem.3c01354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
RNA targeting, specifically with small molecules, is a relatively new and rapidly emerging avenue with the promise to expand the target space in the drug discovery field. From being "disregarded" as an "undruggable" messenger molecule to FDA approval of an RNA-targeting small-molecule drug Risdiplam, a radical change in perspective toward RNA has been observed in the past decade. RNAs serve important regulatory functions beyond canonical protein synthesis, and their dysregulation has been reported in many diseases. A deeper understanding of RNA biology reveals that RNA molecules can adopt a variety of structures, carrying defined binding pockets that can accommodate small-molecule drugs. Due to its functional diversity and structural complexity, RNA can be perceived as a prospective target for therapeutic intervention. This perspective highlights the proof of concept of RNA-small-molecule interactions, exemplified by targeting of various transcripts with functional modulators. The advent of RNA-oriented knowledge would help expedite drug discovery.
Affiliation(s)
- Jaskirat Kaur
  - Department of Biophysics, Panjab University, Chandigarh 160014, India
- Akanksha Sharma
  - Department of Biophysics, Panjab University, Chandigarh 160014, India
  - University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh 160014, India
- Poonam Mundlia
  - Department of Biophysics, Panjab University, Chandigarh 160014, India
- Vikas Sood
  - Department of Biochemistry, Jamia Hamdard, New Delhi 110062, India
- Ankur Pandey
  - Department of Chemistry, Panjab University, Chandigarh 160014, India
- Gurpal Singh
  - University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh 160014, India

4. Luthra I, Jensen C, Chen XE, Salaudeen AL, Rafi AM, de Boer CG. Regulatory activity is the default DNA state in eukaryotes. Nat Struct Mol Biol 2024; 31:559-567. [PMID: 38448573 DOI: 10.1038/s41594-024-01235-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2023] [Accepted: 01/29/2024] [Indexed: 03/08/2024]
Abstract
Genomes encode for genes and non-coding DNA, both capable of transcriptional activity. However, unlike canonical genes, many transcripts from non-coding DNA have limited evidence of conservation or function. Here, to determine how much biological noise is expected from non-genic sequences, we quantify the regulatory activity of evolutionarily naive DNA using RNA-seq in yeast and computational predictions in humans. In yeast, more than 99% of naive DNA bases were transcribed. Unlike the evolved transcriptome, naive transcripts frequently overlapped with opposite sense transcripts, suggesting selection favored coherent gene structures in the yeast genome. In humans, regulation-associated chromatin activity is predicted to be common in naive dinucleotide-content-matched randomized DNA. Here, naive and evolved DNA have similar co-occurrence and cell-type specificity of chromatin marks, challenging these as indicators of selection. However, in both yeast and humans, extreme high activities were rare in naive DNA, suggesting they result from selection. Overall, basal regulatory activity seems to be the default, which selection can hone to evolve a function or, if detrimental, repress.
Affiliation(s)
- Ishika Luthra
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Cassandra Jensen
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Xinyi E Chen
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Asfar Lathif Salaudeen
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Abdul Muntakim Rafi
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Carl G de Boer
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada

5. de Boer CG, Taipale J. Hold out the genome: a roadmap to solving the cis-regulatory code. Nature 2024; 625:41-50. [PMID: 38093018 DOI: 10.1038/s41586-023-06661-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 09/20/2023] [Indexed: 01/05/2024]
Abstract
Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
Affiliation(s)
- Carl G de Boer
  - School of Biomedical Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Jussi Taipale
  - Applied Tumor Genomics Research Program, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
  - Department of Biochemistry, University of Cambridge, Cambridge, UK

6. Ardern Z, Uz-Zaman MH. Between noise and function: Toward a taxonomy of the non-canonical translatome. Cell Syst 2023; 14:343-345. [PMID: 37201506 DOI: 10.1016/j.cels.2023.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 04/17/2023] [Indexed: 05/20/2023]
Abstract
Eukaryotic genomes are pervasively translated, but the properties of translated sequences outside of canonical genes are poorly understood. A new study in Cell Systems reveals a large translatome that is not under significant evolutionary constraint but is still an active part of diverse cellular systems.
Affiliation(s)
- Zachary Ardern
  - Parasites and Microbes Programme, Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
- Md Hassan Uz-Zaman
  - Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, USA

7. Kravchuk EV, Ashniev GA, Gladkova MG, Orlov AV, Vasileva AV, Boldyreva AV, Burenin AG, Skirda AM, Nikitin PI, Orlova NN. Experimental Validation and Prediction of Super-Enhancers: Advances and Challenges. Cells 2023; 12:1191. [PMID: 37190100 DOI: 10.3390/cells12081191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Revised: 04/07/2023] [Accepted: 04/14/2023] [Indexed: 05/17/2023]
Abstract
Super-enhancers (SEs) are cis-regulatory elements of the human genome that have been widely discussed since the discovery and origin of the term. Super-enhancers have been shown to be strongly associated with the expression of genes crucial for cell differentiation, cell stability maintenance, and tumorigenesis. Our goal was to systematize research studies dedicated to the investigation of structure and functions of super-enhancers as well as to define further perspectives of the field in various applications, such as drug development and clinical use. We overviewed the fundamental studies which provided experimental data on various pathologies and their associations with particular super-enhancers. The analysis of mainstream approaches for SE search and prediction allowed us to accumulate existing data and propose directions for further algorithmic improvements of SEs' reliability levels and efficiency. Thus, here we provide the description of the most robust algorithms such as ROSE, imPROSE, and DEEPSEN and suggest their further use for various research and development tasks. The most promising research directions, judged by topic and number of published studies, are cancer-associated super-enhancers and prospective SE-targeted therapy strategies, most of which are discussed in this review.
Affiliation(s)
- Ekaterina V Kravchuk
  - Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
  - Faculty of Biology, Lomonosov Moscow State University, Leninskiye Gory, MSU, 1-12, 119991 Moscow, Russia
- German A Ashniev
  - Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
  - Faculty of Biology, Lomonosov Moscow State University, Leninskiye Gory, MSU, 1-12, 119991 Moscow, Russia
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, GSP-1, Leninskiye Gory, MSU, 1-73, 119234 Moscow, Russia
- Marina G Gladkova
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, GSP-1, Leninskiye Gory, MSU, 1-73, 119234 Moscow, Russia
- Alexey V Orlov
  - Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Anastasiia V Vasileva
  - Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Anna V Boldyreva
  - Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Alexandr G Burenin
  - Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Artemiy M Skirda
  - Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Petr I Nikitin
  - Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia
- Natalia N Orlova
  - Prokhorov General Physics Institute of the Russian Academy of Sciences, 38 Vavilov St., 119991 Moscow, Russia

8. Palazzo AF, Kejiou NS. Non-Darwinian Molecular Biology. Front Genet 2022; 13:831068. [PMID: 35251134 PMCID: PMC8888898 DOI: 10.3389/fgene.2022.831068] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/07/2021] [Accepted: 01/24/2022] [Indexed: 12/14/2022]
Abstract
With the discovery of the double helical structure of DNA, a shift occurred in how biologists investigated questions surrounding cellular processes, such as protein synthesis. Instead of viewing biological activity through the lens of chemical reactions, this new field used biological information to gain a new profound view of how biological systems work. Molecular biologists asked new types of questions that would have been inconceivable to the older generation of researchers, such as how cellular machineries convert inherited biological information into functional molecules like proteins. This new focus on biological information also gave molecular biologists a way to link their findings to concepts developed by genetics and the modern synthesis. However, by the late 1960s this all changed. Elevated rates of mutation, unsustainable genetic loads, and high levels of variation in populations challenged Darwinian evolution, a central tenet of the modern synthesis, where adaptation was the main driver of evolutionary change. Building on these findings, Motoo Kimura advanced the neutral theory of molecular evolution, which advocates that selection in multicellular eukaryotes is weak and that most genomic changes are neutral and due to random drift. This was further elaborated by Jack King and Thomas Jukes, in their paper “Non-Darwinian Evolution”, where they pointed out that the observed changes seen in proteins and the types of polymorphisms observed in populations only become understandable when we take into account biochemistry and Kimura’s new theory. Fifty years later, most molecular biologists remain unaware of these fundamental advances. Their adaptationist viewpoint fails to explain data collected from new powerful technologies which can detect exceedingly rare biochemical events. For example, high throughput sequencing routinely detects RNA transcripts that are produced from almost the entire genome yet are present at less than one copy per thousand cells and appear to lack any function. Molecular biologists must now reincorporate ideas from classical biochemistry and absorb modern concepts from molecular evolution, to craft a new lens through which they can evaluate the functionality of transcriptional units, and make sense of our messy, intricate, and complicated genome.

9. Lima JRS, Azevedo-Pinheiro J, Andrade RB, Khayat AS, de Assumpção PP, Ribeiro-dos-Santos Â, Batista dos Santos SE, Moreira FC. Identification and Characterization of Polymorphisms in piRNA Regions. Curr Issues Mol Biol 2022; 44:942-951. [PMID: 35723347 PMCID: PMC8929088 DOI: 10.3390/cimb44020062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/13/2021] [Revised: 01/06/2022] [Accepted: 01/20/2022] [Indexed: 12/19/2022]
Abstract
piRNAs are a class of noncoding RNAs that perform functions in epigenetic regulation and silencing of transposable elements, a mechanism conserved among most mammals. At present, there are more than 30,000 known piRNAs in humans, of which more than 80% are derived from intergenic regions, and approximately 20% are derived from the introns and exons of pre-mRNAs. It was observed that the expression of the piRNA profile is specific in several organs, suggesting that they play functional roles in different tissues. In addition, some studies suggest that changes in regions that encode piRNAs may have an impact on their function. To evaluate the conservation of these regions and explore the existence of a seed region, SNP and INDEL variant rates were investigated in several genomic regions and compared to piRNA region variant rates. Thus, data analysis, data collection, cleaning, treatment, and exploration were implemented using the R programming language with the help of the RStudio platform. We found that piRNA regions are highly conserved after considering INDELs and do not seem to present an identifiable seed region after considering SNPs and INDEL variants. These findings may contribute to future studies attempting to determine how polymorphisms in piRNA regions can impact diseases.
Affiliation(s)
- José Roberto Sobrinho Lima
  - Núcleo de Pesquisas em Oncologia (NPO), Programa de Pós-Graduação em Oncologia e Ciências Médicas, Universidade Federal do Pará, Belém 66073-005, PA, Brazil
- Jhully Azevedo-Pinheiro
  - Laboratório de Genética Humana e Médica (LGHM), Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Pará, Belém 66075-110, PA, Brazil
- Roberta Borges Andrade
  - Núcleo de Pesquisas em Oncologia (NPO), Programa de Pós-Graduação em Oncologia e Ciências Médicas, Universidade Federal do Pará, Belém 66073-005, PA, Brazil
  - Laboratório de Genética Humana e Médica (LGHM), Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Pará, Belém 66075-110, PA, Brazil
- André Salim Khayat
  - Núcleo de Pesquisas em Oncologia (NPO), Programa de Pós-Graduação em Oncologia e Ciências Médicas, Universidade Federal do Pará, Belém 66073-005, PA, Brazil
- Paulo Pimentel de Assumpção
  - Núcleo de Pesquisas em Oncologia (NPO), Programa de Pós-Graduação em Oncologia e Ciências Médicas, Universidade Federal do Pará, Belém 66073-005, PA, Brazil
- Ândrea Ribeiro-dos-Santos
  - Núcleo de Pesquisas em Oncologia (NPO), Programa de Pós-Graduação em Oncologia e Ciências Médicas, Universidade Federal do Pará, Belém 66073-005, PA, Brazil
  - Laboratório de Genética Humana e Médica (LGHM), Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Pará, Belém 66075-110, PA, Brazil
- Sidney Emanuel Batista dos Santos
  - Núcleo de Pesquisas em Oncologia (NPO), Programa de Pós-Graduação em Oncologia e Ciências Médicas, Universidade Federal do Pará, Belém 66073-005, PA, Brazil
  - Laboratório de Genética Humana e Médica (LGHM), Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Pará, Belém 66075-110, PA, Brazil
- Fabiano Cordeiro Moreira
  - Núcleo de Pesquisas em Oncologia (NPO), Programa de Pós-Graduação em Oncologia e Ciências Médicas, Universidade Federal do Pará, Belém 66073-005, PA, Brazil
  - Correspondence: Tel.: +55-091-98107-0858

10. Mulvey B, Lagunas T, Dougherty JD. Massively Parallel Reporter Assays: Defining Functional Psychiatric Genetic Variants Across Biological Contexts. Biol Psychiatry 2021; 89:76-89. [PMID: 32843144 PMCID: PMC7938388 DOI: 10.1016/j.biopsych.2020.06.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 02/11/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 12/18/2022]
Abstract
Neuropsychiatric phenotypes have long been known to be influenced by heritable risk factors, directly confirmed by the past decade of genetic studies that have revealed specific genetic variants enriched in disease cohorts. However, the initial hope that a small set of genes would be responsible for a given disorder proved false. The more complex reality is that a given disorder may be influenced by myriad small-effect noncoding variants and/or by rare but severe coding variants, many de novo. Noncoding genomic sequences, for which molecular functions cannot usually be inferred, harbor a large portion of these variants, creating a substantial barrier to understanding higher-order molecular and biological systems of disease. Fortunately, novel genetic technologies, namely scalable oligonucleotide synthesis, RNA sequencing, and CRISPR (clustered regularly interspaced short palindromic repeats), have opened novel avenues to experimentally identify biologically significant variants en masse. Massively parallel reporter assays (MPRAs) are an especially versatile technique resulting from such innovations. MPRAs are powerful molecular genetics tools that can be used to screen thousands of untranscribed or untranslated sequences and their variants for functional effects in a single experiment. This approach, though underutilized in psychiatric genetics, has several useful features for the field. We review methods for assaying putatively functional genetic variants and regions, emphasizing MPRAs and the opportunities they hold for dissection of psychiatric polygenicity. We discuss literature applying functional assays in neurogenetics, highlighting strengths, caveats, and design considerations, especially regarding disease-relevant variables (cell type, neurodevelopment, and sex), and we ultimately propose applications of MPRA to both computational and experimental neurogenetics of polygenic disease risk.
Affiliation(s)
- Bernard Mulvey
  - Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri
  - Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Tomás Lagunas
  - Division of Biology and Biomedical Sciences, Washington University School of Medicine in St. Louis, St. Louis, Missouri
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri
  - Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri
- Joseph D Dougherty
  - Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, Missouri
  - Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, Missouri

11. Palazzo AF, Kang YM. GC-content biases in protein-coding genes act as an "mRNA identity" feature for nuclear export. Bioessays 2020; 43:e2000197. [PMID: 33165929 DOI: 10.1002/bies.202000197] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/22/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 01/11/2023]
Abstract
It has long been observed that human protein-coding genes have a particular distribution of GC-content: the 5' end of these genes has high GC-content while the 3' end has low GC-content. In 2012, it was proposed that this pattern of GC-content could act as an mRNA identity feature that would lead to it being better recognized by the cellular machinery to promote its nuclear export. In contrast, junk RNA, which largely lacks this feature, would be retained in the nucleus and targeted for decay. Now two recent papers have provided evidence that GC-content does promote the nuclear export of many mRNAs in human cells.
Affiliation(s)
- Alexander F Palazzo
  - Department of Biochemistry, University of Toronto, Toronto, ON, M5G 1M1, Canada
- Yoon Mo Kang
  - Department of Biochemistry, University of Toronto, Toronto, ON, M5G 1M1, Canada

12. Palazzo AF, Koonin EV. Functional Long Non-coding RNAs Evolve from Junk Transcripts. Cell 2020; 183:1151-1161. [PMID: 33068526 DOI: 10.1016/j.cell.2020.09.047] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Received: 05/19/2020] [Revised: 08/20/2020] [Accepted: 09/17/2020] [Indexed: 12/30/2022]
Abstract
Transcriptome studies reveal pervasive transcription of complex genomes, such as those of mammals. Despite popular arguments for functionality of most, if not all, of these transcripts, genome-wide analysis of selective constraints indicates that most of the produced RNA are junk. However, junk is not garbage. On the contrary, junk transcripts provide the raw material for the evolution of diverse long non-coding (lnc) RNAs by non-adaptive mechanisms, such as constructive neutral evolution. The generation of many novel functional entities, such as lncRNAs, that fuels organismal complexity does not seem to be driven by strong positive selection. Rather, the weak selection regime that dominates the evolution of most multicellular eukaryotes provides ample material for functional innovation with relatively little adaptation involved.
Affiliation(s)
- Alexander F Palazzo
  - Department of Biochemistry, University of Toronto, Toronto, ON M5G 1M1, Canada
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

13. Modular Organization of Cis-regulatory Control Information of Neurotransmitter Pathway Genes in Caenorhabditis elegans. Genetics 2020; 215:665-681. [PMID: 32444379 DOI: 10.1534/genetics.120.303206] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 03/24/2020] [Accepted: 05/20/2020] [Indexed: 11/18/2022]
Abstract
We explore here the cis-regulatory logic that dictates gene expression in specific cell types in the nervous system. We focus on a set of eight genes involved in the synthesis, transport, and breakdown of three neurotransmitter systems: acetylcholine (unc-17/VAChT, cha-1/ChAT, cho-1/ChT, and ace-2/AChE), glutamate (eat-4/VGluT), and γ-aminobutyric acid (unc-25/GAD, unc-46/LAMP, and unc-47/VGAT). These genes are specifically expressed in defined subsets of cells in the nervous system. Through transgenic reporter gene assays, we find that the cellular specificity of expression of all of these genes is controlled in a modular manner through distinct cis-regulatory elements, corroborating the previously inferred piecemeal nature of specification of neurotransmitter identity. This modularity provides the mechanistic basis for the phenomenon of "phenotypic convergence," in which distinct regulatory pathways can generate similar phenotypic outcomes (i.e., the acquisition of a specific neurotransmitter identity) in different neuron classes. We also identify cases of enhancer pleiotropy, in which the same cis-regulatory element is utilized to control gene expression in distinct neuron types. We engineered a cis-regulatory allele of the vesicular acetylcholine transporter, unc-17/VAChT, to assess the functional contribution of a "shadowed" enhancer. We observed a selective loss of unc-17/VAChT expression in one cholinergic pharyngeal pacemaker motor neuron class and a behavioral phenotype that matches microsurgical removal of this neuron. Our analysis illustrates the value of understanding cis-regulatory information to manipulate gene expression and control animal behavior.

14.
Affiliation(s)
- Stefan Linquist
  - Department of Philosophy, University of Guelph, Guelph, Ontario, Canada
- W. Ford Doolittle
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada

15. Brzović Z, Šustar P. Postgenomics function monism. Stud Hist Philos Biol Biomed Sci 2020; 80:101243. [PMID: 31924514 DOI: 10.1016/j.shpsc.2019.101243] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/08/2019] [Revised: 10/08/2019] [Accepted: 12/27/2019] [Indexed: 06/10/2023]
Abstract
The ENCODE project has made important new estimates of human genome functionality, now revising the percentage considered functional to more than 80%, which is in stark contrast to the received view, which estimated that less than 10% of the conserved parts of the human genome are functional. ENCODE's unorthodox use of the notion of biological function has stirred the so-called ENCODE controversy, involving conflicting views about the correct notion of function in postgenomics. The debate hinges on the traditional philosophical contrast between the causal role (CR) and selected effects (SE) approaches. In this paper, we examine the ENCODE controversy in terms of the distinction between function monism and pluralism. We propose to apply a weak etiological account to genomic function ascriptions. In this approach, we can ascribe a function to a genomic structure of an organism if and only if performing the function persists in causally contributing to the organism's and its ancestors' fitness. In comparison to the strong etiological (i.e., the selected effects) approach, the present account does not require there to be selection for the structure in question. This is a monistic approach that enables us to avoid the main difficulties of CR, as well as SE's overdependence on natural selection, while still preserving an evolutionary-constrained notion of biological functions. Our proposal is much more moderate in accommodating the estimates of the functionality of the human genome than both ENCODE's proposal itself and the views of the critics relying on a version of the SE account of functions.
Affiliation(s)
- Zdenka Brzović
- Department of Philosophy, Faculty of Humanities and Social Sciences, University of Rijeka, Sveučilišna avenija 4, 51000, Rijeka, Croatia.
- Predrag Šustar
- Department of Philosophy, Faculty of Humanities and Social Sciences, University of Rijeka, Sveučilišna avenija 4, 51000, Rijeka, Croatia.
|
16
|
Hatje K, Mühlhausen S, Simm D, Kollmar M. The Protein-Coding Human Genome: Annotating High-Hanging Fruits. Bioessays 2019; 41:e1900066. [PMID: 31544971 DOI: 10.1002/bies.201900066] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/15/2019] [Revised: 08/07/2019] [Indexed: 12/19/2022]
Abstract
The major transcript variants of human protein-coding genes are annotated to a certain degree of accuracy combining manual curation, transcript data, and proteomics evidence. However, there is considerable disagreement on the annotation of about 2000 genes-they can be protein-coding, noncoding, or pseudogenes-and on the annotation of most of the predicted alternative transcripts. Pure transcriptome mapping approaches seem to be limited in discriminating functional expression from noise. These limitations have partially been overcome by dedicated algorithms to detect alternative spliced micro-exons and wobble splice variants. Recently, knowledge about splice mechanism and protein structure are incorporated into an algorithm to predict neighboring homologous exons, often spliced in a mutually exclusive manner. Predicted exons are evaluated by transcript data, structural compatibility, and evolutionary conservation, revealing hundreds of novel coding exons and splice mechanism re-assignments. The emerging human pan-genome is necessitating distinctive annotations incorporating differences between individuals and between populations.
Affiliation(s)
- Klas Hatje
- Roche Pharmaceutical Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstr. 124, 4070, Basel, Switzerland
- Stefanie Mühlhausen
- Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
- Dominic Simm
- Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany; Theoretical Computer Science and Algorithmic Methods, Institute of Computer Science, Georg-August-University Göttingen, Goldschmidtstr. 7, 37077, Göttingen, Germany
- Martin Kollmar
- Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany
|
17
|
He Y, Tian S, Tian P. Fundamental asymmetry of insertions and deletions in genomes size evolution. J Theor Biol 2019; 482:109983. [PMID: 31445016 DOI: 10.1016/j.jtbi.2019.08.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/21/2019] [Revised: 08/18/2019] [Accepted: 08/21/2019] [Indexed: 12/01/2022]
Abstract
The origin of large genomes that underlies the long standing "C-value enigma" is only partially explained by selfish DNA. We investigated insertions and deletions (indels) of nucleotides and discussed their relevance in size evolution of random biological sequences (RBS) and genomes. By developing a probabilistic model of RBS based on size evolution of expandable sites in a thought perfect genome, it was found that insertion bias engenders exponential increase of average RBS sizes. When combined with existing large segments of genome that are not subject to selection pressure (e.g. selfish DNA), such insertion bias results in explosive expansion of genomes, and therefore helps explain the "C value enigma" besides selfish DNA. Such increase of RBS size is caused by the fundamental asymmetry of indels, with insertions result in more available sites and deletions result in less deletable nucleotides. In qualitative agreement with the size distribution of known genomes, tails of RBS size distributions exhibit exponential decay with probabilities of larger RBS segments being smaller. Unsurprisingly, a slight deletion bias (higher deletions probabilities) results in a slow decrease of average RBS size and may lead to their eventual vanishing. Contrary to intuition, strictly balanced insertion and deletion results in linearly increasing instead of completely fixed RBS size. Nonetheless, such slow linear increase of average RBS sizes with time are small in magnitude and are consequently not influential on genome size evolution, and certainly not a major contributor for the "C-value enigma". Our model suggested that insertion bias of nucleotides may provide complementary explanation for large genomes besides selfish DNA. The fundamental indel asymmetry is applicable for all forms of genomic insertions and deletions. Long-lasting exponential increase of genome size present energy and material requirement that is impossible to sustain. 
We therefore concluded that if there were explosively accelerating expansion caused by significant effective insertion bias for any survival species, it must have occurred sporadically. Our model also provided an explanation for the observed proportional evolution of genome size.
Affiliation(s)
- Yang He
- School of Life Sciences, Jilin University, 2699 Qianjin Street, Changchun, China 130012
- Suyan Tian
- Division of Clinical Epidemiology, First Hospital of The Jilin University, 71 Xinmin Street, Changchun, China, 130021.
- Pu Tian
- School of Life Sciences and MOE Key Laboratory of Molecular Enzymology and Engineering, Jilin University, 2699 Qianjin Street, Changchun, China 130012.
|
18
|
Hoeppner MP, Denisenko E, Gardner PP, Schmeier S, Poole AM. An Evaluation of Function of Multicopy Noncoding RNAs in Mammals Using ENCODE/FANTOM Data and Comparative Genomics. Mol Biol Evol 2019; 35:1451-1462. [PMID: 29617896 PMCID: PMC5967550 DOI: 10.1093/molbev/msy046] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/02/2023] Open
Abstract
Mammalian diversification has coincided with a rapid proliferation of various types of noncoding RNAs, including members of both snRNAs and snoRNAs. The significance of this expansion however remains obscure. While some ncRNA copy-number expansions have been linked to functionally tractable effects, such events may equally likely be neutral, perhaps as a result of random retrotransposition. Hindering progress in our understanding of such observations is the difficulty in establishing function for the diverse features that have been identified in our own genome. Projects such as ENCODE and FANTOM have revealed a hidden world of genomic expression patterns, as well as a host of other potential indicators of biological function. However, such projects have been criticized, particularly from practitioners in the field of molecular evolution, where many suspect these data provide limited insight into biological function. The molecular evolution community has largely taken a skeptical view, thus it is important to establish tests of function. We use a range of data, including data drawn from ENCODE and FANTOM, to examine the case for function for the recent copy number expansion in mammals of six evolutionarily ancient RNA families involved in splicing and rRNA maturation. We use several criteria to assess evidence for function: conservation of sequence and structure, genomic synteny, evidence for transposition, and evidence for species-specific expression. Applying these criteria, we find that only a minority of loci show strong evidence for function and that, for the majority, we cannot reject the null hypothesis of no function.
Affiliation(s)
- Marc P Hoeppner
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
- Elena Denisenko
- Institute of Natural and Mathematical Sciences, Massey University, Auckland, New Zealand
- Paul P Gardner
- Biomolecular Interaction Centre, School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Sebastian Schmeier
- Institute of Natural and Mathematical Sciences, Massey University, Auckland, New Zealand
- Anthony M Poole
- Bioinformatics Institute, School of Biological Sciences, University of Auckland, Auckland, New Zealand
|
19
|
Lloyd JP, Tsai ZTY, Sowers RP, Panchy NL, Shiu SH. A Model-Based Approach for Identifying Functional Intergenic Transcribed Regions and Noncoding RNAs. Mol Biol Evol 2019; 35:1422-1436. [PMID: 29554332 DOI: 10.1093/molbev/msy035] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/16/2022] Open
Abstract
With advances in transcript profiling, the presence of transcriptional activities in intergenic regions has been well established. However, whether intergenic expression reflects transcriptional noise or activity of novel genes remains unclear. We identified intergenic transcribed regions (ITRs) in 15 diverse flowering plant species and found that the amount of intergenic expression correlates with genome size, a pattern that could be expected if intergenic expression is largely nonfunctional. To further assess the functionality of ITRs, we first built machine learning models using Arabidopsis thaliana as a model that accurately distinguish functional sequences (benchmark protein-coding and RNA genes) and likely nonfunctional ones (pseudogenes and unexpressed intergenic regions) by integrating 93 biochemical, evolutionary, and sequence-structure features. Next, by applying the models genome-wide, we found that 4,427 ITRs (38%) and 796 annotated ncRNAs (44%) had features significantly similar to benchmark protein-coding or RNA genes and thus were likely parts of functional genes. Approximately 60% of ITRs and ncRNAs were more similar to nonfunctional sequences and were likely transcriptional noise. The predictive framework established here provides not only a comprehensive look at how functional, genic sequences are distinct from likely nonfunctional ones, but also a new way to differentiate novel genes from genomic regions with noisy transcriptional activities.
Affiliation(s)
- John P Lloyd
- Department of Plant Biology, Michigan State University, East Lansing, MI
- Zing Tsung-Yeh Tsai
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI
- Rosalie P Sowers
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA
- Shin-Han Shiu
- Department of Plant Biology, Michigan State University, East Lansing, MI; Genetics Program, Michigan State University, East Lansing, MI; Ecology, Evolutionary Biology, and Behavior Program, Michigan State University, East Lansing, MI
|
20
|
Transcriptional noise and exaptation as sources for bacterial sRNAs. Biochem Soc Trans 2019; 47:527-539. [PMID: 30837318 DOI: 10.1042/bst20180171] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 10/25/2018] [Revised: 02/01/2019] [Accepted: 02/01/2019] [Indexed: 11/17/2022]
Abstract
Understanding how new genes originate and integrate into cellular networks is key to understanding evolution. Bacteria present unique opportunities for both the natural history and experimental study of gene origins, due to their large effective population sizes, rapid generation times, and ease of genetic manipulation. Bacterial small non-coding RNAs (sRNAs), in particular, many of which operate through a simple antisense regulatory logic, may serve as tractable models for exploring processes of gene origin and adaptation. Understanding how and on what timescales these regulatory molecules arise has important implications for understanding the evolution of bacterial regulatory networks, in particular, for the design of comparative studies of sRNA function. Here, we introduce relevant concepts from evolutionary biology and review recent work that has begun to shed light on the timescales and processes through which non-functional transcriptional noise is co-opted to provide regulatory functions. We explore possible scenarios for sRNA origin, focusing on the co-option, or exaptation, of existing genomic structures which may provide protected spaces for sRNA evolution.
|
21
|
Veller C, Kleckner N, Nowak MA. A rigorous measure of genome-wide genetic shuffling that takes into account crossover positions and Mendel's second law. Proc Natl Acad Sci U S A 2019; 116:1659-1668. [PMID: 30635424 PMCID: PMC6358705 DOI: 10.1073/pnas.1817482116] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Indexed: 11/18/2022] Open
Abstract
Comparative studies in evolutionary genetics rely critically on evaluation of the total amount of genetic shuffling that occurs during gamete production. Such studies have been hampered by the absence of a direct measure of this quantity. Existing measures consider crossing-over by simply counting the average number of crossovers per meiosis. This is qualitatively inadequate, because the positions of crossovers along a chromosome are also critical: a crossover toward the middle of a chromosome causes more shuffling than a crossover toward the tip. Moreover, traditional measures fail to consider shuffling from independent assortment of homologous chromosomes (Mendel's second law). Here, we present a rigorous measure of genome-wide shuffling that does not suffer from these limitations. We define the parameter r̄ as the probability that the alleles at two randomly chosen loci are shuffled during gamete production. This measure can be decomposed into separate contributions from crossover number and position and from independent assortment. Intrinsic implications of this metric include the fact that r̄ is larger when crossovers are more evenly spaced, which suggests a selective advantage of crossover interference. Utilization of r̄ is enabled by powerful emergent methods for determining crossover positions either cytologically or by DNA sequencing. Application of our analysis to such data from human male and female reveals that (i) r̄ in humans is close to its maximum possible value of 1/2 and that (ii) this high level of shuffling is due almost entirely to independent assortment, the contribution of which is ∼30 times greater than that of crossovers.
Affiliation(s)
- Carl Veller
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Program for Evolutionary Dynamics, Harvard University, Cambridge, MA 02138
- Nancy Kleckner
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138
- Martin A Nowak
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Program for Evolutionary Dynamics, Harvard University, Cambridge, MA 02138
- Department of Mathematics, Harvard University, Cambridge, MA 02138
|
22
|
Human Genomics in Immunology. Clin Immunol 2019. [DOI: 10.1016/b978-0-7020-6896-6.00033-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
|
23
|
Gulko B, Siepel A. An evolutionary framework for measuring epigenomic information and estimating cell-type-specific fitness consequences. Nat Genet 2018; 51:335-342. [PMID: 30559490 DOI: 10.1038/s41588-018-0300-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 05/30/2018] [Accepted: 10/30/2018] [Indexed: 01/22/2023]
Abstract
Here we ask the question "How much information do epigenomic datasets provide about human genomic function?" We consider nine epigenomic features across 115 cell types and measure information about function as a reduction in entropy under a probabilistic evolutionary model fitted to human and nonhuman primate genomes. Several epigenomic features yield more information in combination than they do individually. We find that the entropy in human genetic variation predominantly reflects a balance between mutation and neutral drift. Our cell-type-specific FitCons scores reveal relationships among cell types and suggest that around 8% of nucleotide sites are constrained by natural selection.
Affiliation(s)
- Brad Gulko
- Graduate Field of Computer Science, Cornell University, Ithaca, NY, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
|
24
|
Bourque G, Burns KH, Gehring M, Gorbunova V, Seluanov A, Hammell M, Imbeault M, Izsvák Z, Levin HL, Macfarlan TS, Mager DL, Feschotte C. Ten things you should know about transposable elements. Genome Biol 2018; 19:199. [PMID: 30454069 PMCID: PMC6240941 DOI: 10.1186/s13059-018-1577-z] [Citation(s) in RCA: 613] [Impact Index Per Article: 102.2] [Indexed: 02/07/2023] Open
Abstract
Transposable elements (TEs) are major components of eukaryotic genomes. However, the extent of their impact on genome evolution, function, and disease remain a matter of intense interrogation. The rise of genomics and large-scale functional assays has shed new light on the multi-faceted activities of TEs and implies that they should no longer be marginalized. Here, we introduce the fundamental properties of TEs and their complex interactions with their cellular environment, which are crucial to understanding their impact and manifold consequences for organismal biology. While we draw examples primarily from mammalian systems, the core concepts outlined here are relevant to a broad range of organisms.
Affiliation(s)
- Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal, Québec, H3A 0G1, Canada.
- Canadian Center for Computational Genomics, McGill University, Montréal, Québec, H3A 0G1, Canada.
- Kathleen H Burns
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
- Mary Gehring
- Whitehead Institute for Biomedical Research and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
- Vera Gorbunova
- Department of Biology, University of Rochester, Rochester, NY, 14627, USA
- Andrei Seluanov
- Department of Biology, University of Rochester, Rochester, NY, 14627, USA
- Molly Hammell
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Michaël Imbeault
- Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Zsuzsanna Izsvák
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
- Henry L Levin
- The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health, Bethesda, Maryland, USA
- Todd S Macfarlan
- The Eunice Kennedy Shriver National Institute of Child Health and Human Development, The National Institutes of Health, Bethesda, Maryland, USA
- Dixie L Mager
- Terry Fox Laboratory, British Columbia Cancer Agency and Department of Medical Genetics, University of BC, Vancouver, BC, V5Z1L3, Canada
- Cédric Feschotte
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, 14850, USA.
|
25
|
Abstract
New genes arise from pre-existing genes, but some de novo origin from non-genic sequence also seems plausible. A new study has surprisingly concluded that 25% of random DNA sequences yield beneficial products when expressed in bacteria.
|
26
|
Edwards JR, Yarychkivska O, Boulard M, Bestor TH. DNA methylation and DNA methyltransferases. Epigenetics Chromatin 2017; 10:23. [PMID: 28503201 PMCID: PMC5422929 DOI: 10.1186/s13072-017-0130-8] [Citation(s) in RCA: 285] [Impact Index Per Article: 40.7] [Received: 02/21/2017] [Accepted: 04/26/2017] [Indexed: 12/18/2022] Open
Abstract
The prevailing views as to the form, function, and regulation of genomic methylation patterns have their origin many years in the past, at a time when the structure of the mammalian genome was only dimly perceived, when the number of protein-encoding mammalian genes was believed to be at least five times greater than the actual number, and when it was not understood that only ~10% of the genome is under selective pressure and likely to have biological function. We use more recent findings from genome biology and whole-genome methylation profiling to provide a reappraisal of the shape of genomic methylation patterns and the nature of the changes that they undergo during gametogenesis and early development. We observe that the sequences that undergo deep changes in methylation status during early development are largely sequences without regulatory function. We also discuss recent findings that begin to explain the remarkable fidelity of maintenance methylation. Rather than a general overview of DNA methylation in mammals (which has been the subject of many reviews), we present a new analysis of the distribution of methylated CpG dinucleotides across the multiple sequence compartments that make up the mammalian genome, and we offer an updated interpretation of the nature of the changes in methylation patterns that occur in germ cells and early embryos. We discuss the cues that might designate specific sequences for demethylation or de novo methylation during development, and we summarize recent findings on mechanisms that maintain methylation patterns in mammalian genomes. We also describe the several human disorders, each very different from the other, that are caused by mutations in DNA methyltransferase genes.
Affiliation(s)
- John R Edwards
- Center for Pharmacogenomics, Department of Medicine, Washington University School of Medicine, St. Louis, MO USA
- Olya Yarychkivska
- Department of Genetics and Development, College of Physicians and Surgeons of Columbia University, New York, NY USA
- Mathieu Boulard
- Department of Genetics and Development, College of Physicians and Surgeons of Columbia University, New York, NY USA
- Timothy H Bestor
- Department of Genetics and Development, College of Physicians and Surgeons of Columbia University, New York, NY USA
|
27
|
Savisaar R, Hurst LD. Estimating the prevalence of functional exonic splice regulatory information. Hum Genet 2017; 136:1059-1078. [PMID: 28405812 PMCID: PMC5602102 DOI: 10.1007/s00439-017-1798-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 01/26/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022]
Abstract
In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1–4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the assumption that constraint only operates on exon ends, the conservation-based methods can be overly conservative.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK.
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
|
28
|
Wright JB, Sanjana NE. CRISPR Screens to Discover Functional Noncoding Elements. Trends Genet 2016; 32:526-529. [PMID: 27423542 DOI: 10.1016/j.tig.2016.06.004] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 04/27/2016] [Revised: 06/20/2016] [Accepted: 06/21/2016] [Indexed: 12/17/2022]
Abstract
A major challenge in genomics is to identify functional elements in the noncoding genome. Recently, pooled clustered regularly interspersed palindromic repeat (CRISPR) mutagenesis screens of noncoding regions have emerged as a novel method for finding elements that impact gene expression and phenotype/disease-relevant biological processes. Here we review and compare different approaches for high-throughput dissection of noncoding elements.
Affiliation(s)
- Jason B Wright
- Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA; McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Neville E Sanjana
- New York Genome Center, New York, NY 10013, USA; Center for Genomics and Systems Biology, Department of Biology, New York University, NY 10003, USA.
|
29
|
Meryet-Figuière M, Lambert B, Gauduchon P, Vigneron N, Brotin E, Poulain L, Denoyelle C. An overview of long non-coding RNAs in ovarian cancers. Oncotarget 2016; 7:44719-44734. [PMID: 26992233 PMCID: PMC5190131 DOI: 10.18632/oncotarget.8089] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 11/06/2015] [Accepted: 02/23/2016] [Indexed: 12/14/2022] Open
Abstract
As with miRNAs a decade ago, the scientific community recently understood that lncRNAs represent a new layer of complexity in the regulation of gene expression. Although only a subset of lncRNAs has been functionally characterized, it is clear that they are deeply involved in the most critical physiological and pathological biological processes. This review shows that in ovarian carcinoma, data already available testify to the importance of lncRNAs and that the demonstration of an ever-growing role of lncRNAs in the biology of this malignancy can be expected from future studies. We also underline the importance of their relationship with associated protein partners and miRNAs. Together, the available information suggests that the emerging field of lncRNAs will pave the way for a better understanding of ovarian cancer biology and might lead to the development of innovative therapeutic approaches. Moreover, lncRNAs expression signatures either alone or in combination with other types of markers (miRNAs, mRNAs, proteins) could prove useful to predict outcome or treatment follow-up in order to improve the therapeutic care of ovarian carcinoma patients.
Affiliation(s)
- Matthieu Meryet-Figuière
- Inserm U1199, Biology and Innovative Therapeutics for Locally Aggressive Cancer (BioTICLA) Unit, Caen, France
- Normandie University, Caen, France
- UNICAEN, Caen, France
- Comprehensive Cancer Center CLCC François Baclesse, Unicancer, Caen, France
- Bernard Lambert
- Inserm U1199, Biology and Innovative Therapeutics for Locally Aggressive Cancer (BioTICLA) Unit, Caen, France
- Normandie University, Caen, France
- UNICAEN, Caen, France
- Comprehensive Cancer Center CLCC François Baclesse, Unicancer, Caen, France
- CNRS, Paris, France
- Pascal Gauduchon
- Inserm U1199, Biology and Innovative Therapeutics for Locally Aggressive Cancer (BioTICLA) Unit, Caen, France
- Normandie University, Caen, France
- UNICAEN, Caen, France
- Comprehensive Cancer Center CLCC François Baclesse, Unicancer, Caen, France
- Nicolas Vigneron
- Inserm U1199, Biology and Innovative Therapeutics for Locally Aggressive Cancer (BioTICLA) Unit, Caen, France
- Normandie University, Caen, France
- UNICAEN, Caen, France
- Comprehensive Cancer Center CLCC François Baclesse, Unicancer, Caen, France
- Emilie Brotin
- Inserm U1199, Biology and Innovative Therapeutics for Locally Aggressive Cancer (BioTICLA) Unit, Caen, France
- Normandie University, Caen, France
- UNICAEN, Caen, France
- Comprehensive Cancer Center CLCC François Baclesse, Unicancer, Caen, France
- Laurent Poulain
- Inserm U1199, Biology and Innovative Therapeutics for Locally Aggressive Cancer (BioTICLA) Unit, Caen, France
- Normandie University, Caen, France
- UNICAEN, Caen, France
- Comprehensive Cancer Center CLCC François Baclesse, Unicancer, Caen, France
- Christophe Denoyelle
- Inserm U1199, Biology and Innovative Therapeutics for Locally Aggressive Cancer (BioTICLA) Unit, Caen, France
- Normandie University, Caen, France
- UNICAEN, Caen, France
- Comprehensive Cancer Center CLCC François Baclesse, Unicancer, Caen, France
|
30
|
Transforming Big Data into Cancer-Relevant Insight: An Initial, Multi-Tier Approach to Assess Reproducibility and Relevance. Mol Cancer Res 2016; 14:675-82. [PMID: 27401613 DOI: 10.1158/1541-7786.mcr-16-0090] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/15/2016] [Accepted: 06/02/2016] [Indexed: 11/16/2022]
Abstract
The Cancer Target Discovery and Development (CTD²) Network was established to accelerate the transformation of "Big Data" into novel pharmacologic targets, lead compounds, and biomarkers for rapid translation into improved patient outcomes. It rapidly became clear in this collaborative network that a key central issue was to define what constitutes sufficient computational or experimental evidence to support a biologically or clinically relevant finding. This article represents a first attempt to delineate the challenges of supporting and confirming discoveries arising from the systematic analysis of large-scale data resources in a collaborative work environment and to provide a framework that would begin a community discussion to resolve these challenges. The Network implemented a multi-tier framework designed to substantiate the biological and biomedical relevance as well as the reproducibility of data and insights resulting from its collaborative activities. The same approach can be used by the broad scientific community to drive development of novel therapeutic and biomarker strategies for cancer. Mol Cancer Res; 14(8); 675-82. ©2016 AACR.
|
31
|
Evolutionary direction of processed pseudogenes. Sci China Life Sci 2016; 59:839-49. [PMID: 27333782 DOI: 10.1007/s11427-016-5074-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 12/16/2015] [Accepted: 03/08/2016] [Indexed: 10/21/2022]
Abstract
While some pseudogenes have been reported to play important roles in gene regulation, little is known about the possible relationship between pseudogene functions and evolutionary process of pseudogenes, or about the forces responsible for the pseudogene evolution. In this study, we characterized human processed pseudogenes in terms of evolutionary dynamics. Our results show that pseudogenes tend to evolve toward: lower GC content, strong dinucleotide bias, reduced abundance of transcription factor binding motifs and short palindromes, and decreased ability to form nucleosomes. We explored possible evolutionary forces that shaped the evolution pattern of pseudogenes, and concluded that mutations in pseudogenes are likely determined, at least partially, by neighbor-dependent mutational bias and recombination-associated selection.
|
32
|
Moraes F, Góes A. A decade of human genome project conclusion: Scientific diffusion about our genome knowledge. Biochem Mol Biol Educ 2016; 44:215-23. [PMID: 26952518 DOI: 10.1002/bmb.20952] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 06/10/2015] [Revised: 10/09/2015] [Accepted: 11/29/2015] [Indexed: 05/15/2023]
Abstract
The Human Genome Project (HGP) was initiated in 1990 and completed in 2003. It aimed to sequence the whole human genome. Although it represented an advance in understanding the human genome and its complexity, many questions remained unanswered. Other projects were launched in order to unravel the mysteries of our genome, including the ENCyclopedia of DNA Elements (ENCODE). This review aims to analyze the evolution of scientific knowledge related to both the HGP and ENCODE projects. Data were retrieved from scientific articles published in 1990-2014, a period comprising the development and the 10 years following the HGP completion. The fact that only 20,000 genes are protein and RNA-coding is one of the most striking HGP results. A new concept about the organization of genome arose. The ENCODE project was initiated in 2003 and targeted to map the functional elements of the human genome. This project revealed that the human genome is pervasively transcribed. Therefore, it was determined that a large part of the non-protein coding regions are functional. Finally, a more sophisticated view of chromatin structure emerged. The mechanistic functioning of the genome has been redrafted, revealing a much more complex picture. Besides, a gene-centric conception of the organism has to be reviewed. A number of criticisms have emerged against the ENCODE project approaches, raising the question of whether non-conserved but biochemically active regions are truly functional. Thus, HGP and ENCODE projects accomplished a great map of the human genome, but the data generated still requires further in depth analysis. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:215-223, 2016.
Affiliation(s)
- Fernanda Moraes
- Rio de Janeiro State University, Science and Biology Teaching Department-Biology Institute, Rio de Janeiro, Rio de Janeiro, Brazil
- Andréa Góes
- Rio de Janeiro State University, Science and Biology Teaching Department-Biology Institute, Rio de Janeiro, Rio de Janeiro, Brazil
|
33
|
Deciphering ENCODE. Trends Genet 2016; 32:238-249. [DOI: 10.1016/j.tig.2016.02.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 12/02/2015] [Revised: 02/03/2016] [Accepted: 02/04/2016] [Indexed: 12/16/2022]
|
34
|
Ball P. The problems of biological information. Philos Trans A Math Phys Eng Sci 2016; 374:rsta.2015.0072. [PMID: 26857677 DOI: 10.1098/rsta.2015.0072] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Accepted: 12/17/2015] [Indexed: 06/05/2023]
Abstract
The discovery of genetic encoding in the DNA molecule, and its mode of translation into protein structures, secured the modern view of biology as an information science. But it remains unclear what kind of information science it is. The all-too-ready analogy with computer programs stored on spools of magnetic tape has been hard to relinquish, even while the complexity of information storage and flow in the cell has become ever more apparent. To understand how life is sustained and evolves through encoding and processing of information, new ideas are now required, within which genetic encoding in DNA seems likely to provide only one part of a much broader and more profound puzzle. In particular, it seems likely that the emerging picture will need to take a more subtle view of causation, context and meaning in the orchestrated, hierarchical processes that make life possible.
Affiliation(s)
- Philip Ball
- 18 Hillcourt Road, East Dulwich, London SE22 0PE, UK
|
35
|
Cournac A, Koszul R, Mozziconacci J. The 3D folding of metazoan genomes correlates with the association of similar repetitive elements. Nucleic Acids Res 2016; 44:245-55. [PMID: 26609133 PMCID: PMC4705657 DOI: 10.1093/nar/gkv1292] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 12/26/2014] [Revised: 10/13/2015] [Accepted: 11/04/2015] [Indexed: 12/11/2022] Open
Abstract
The potential roles of the numerous repetitive elements found in the genomes of multi-cellular organisms remain speculative. Several studies have suggested a role in stabilizing specific 3D genomic contacts. To test this hypothesis, we exploited inter-chromosomal contacts frequencies obtained from Hi-C experiments and show that the folding of the human, mouse and Drosophila genomes is associated with a significant co-localization of several specific repetitive elements, notably many elements of the SINE family. These repeats tend to be the oldest ones and are enriched in transcription factor binding sites. We propose that the co-localization of these repetitive elements may explain the global conservation of genome folding observed between homologous regions of the human and mouse genome. Taken together, these results support a contribution of specific repetitive elements in maintaining and/or reshaping genome architecture over evolutionary times.
Affiliation(s)
- Axel Cournac
- LPTMC, Université Pierre et Marie Curie, Sorbonne université, 4 Place Jussieu 75005 Paris, France; Institut Pasteur, Group Spatial Regulation of Genomes, Department of Genomes and Genetics, F-75015 Paris, France; CNRS, UMR3525, F-75015 Paris, France
- Romain Koszul
- Institut Pasteur, Group Spatial Regulation of Genomes, Department of Genomes and Genetics, F-75015 Paris, France; CNRS, UMR3525, F-75015 Paris, France
- Julien Mozziconacci
- LPTMC, Université Pierre et Marie Curie, Sorbonne université, 4 Place Jussieu 75005 Paris, France
|
36
|
Lesurf R, Cotto KC, Wang G, Griffith M, Kasaian K, Jones SJM, Montgomery SB, Griffith OL. ORegAnno 3.0: a community-driven resource for curated regulatory annotation. Nucleic Acids Res 2015; 44:D126-32. [PMID: 26578589 PMCID: PMC4702855 DOI: 10.1093/nar/gkv1203] [Citation(s) in RCA: 102] [Impact Index Per Article: 11.3] [Received: 09/16/2015] [Accepted: 10/26/2015] [Indexed: 12/26/2022] Open
Abstract
The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18 species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available for browsing and download at http://www.oreganno.org/.
Affiliation(s)
- Robert Lesurf
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Kelsy C Cotto
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Grace Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA
- Malachi Griffith
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA; Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA
- Katayoon Kasaian
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC V5Z 4S6, Canada
- Steven J M Jones
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC V5Z 4S6, Canada; Department of Molecular Biology & Biochemistry, Simon Fraser University, Burnaby, BC V5A 1S6, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC V6T 1Z3, Canada
- Stephen B Montgomery
- Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Obi L Griffith
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO 63108, USA; Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA; Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO 63110, USA; Department of Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, MO 63110, USA
|
37
|
The relativity of biological function. Theory Biosci 2015; 134:143-7. [DOI: 10.1007/s12064-015-0215-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/02/2015] [Accepted: 09/03/2015] [Indexed: 01/09/2023]
|
38
|
Brunet TDP, Doolittle WF. Multilevel Selection Theory and the Evolutionary Functions of Transposable Elements. Genome Biol Evol 2015; 7:2445-57. [PMID: 26253318 PMCID: PMC4558868 DOI: 10.1093/gbe/evv152] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 01/08/2023] Open
Abstract
One of several issues at play in the renewed debate over “junk DNA” is the organizational level at which genomic features might be seen as selected, and thus to exhibit function, as etiologically defined. The intuition frequently expressed by molecular geneticists that junk DNA is functional because it serves to “speed evolution” or as an “evolutionary repository” could be recast as a claim about selection between species (or clades) rather than within them, but this is not often done. Here, we review general arguments for the importance of selection at levels above that of organisms in evolution, and develop them further for a common genomic feature: the carriage of transposable elements (TEs). In many species, not least our own, TEs comprise a large fraction of all nuclear DNA, and whether they individually or collectively contribute to fitness—or are instead junk— is a subject of ongoing contestation. Even if TEs generally owe their origin to selfish selection at the lowest level (that of genomes), their prevalence in extant organisms and the prevalence of extant organisms bearing them must also respond to selection within species (on organismal fitness) and between species (on rates of speciation and extinction). At an even higher level, the persistence of clades may be affected (positively or negatively) by TE carriage. If indeed TEs speed evolution, it is at these higher levels of selection that such a function might best be attributed to them as a class.
Affiliation(s)
- Tyler D P Brunet
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- W Ford Doolittle
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
|
39
|
Casane D, Fumey J, Laurenti P. [ENCODE apophenia or a panglossian analysis of the human genome]. Med Sci (Paris) 2015; 31:680-6. [PMID: 26152174 DOI: 10.1051/medsci/20153106023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022] Open
Abstract
In September 2012, a batch of more than 30 articles presenting the results of the ENCODE (Encyclopaedia of DNA Elements) project was released. Many of these articles appeared in Nature and Science, the two most prestigious interdisciplinary scientific journals. Since that time, hundreds of other articles dedicated to the further analyses of the Encode data have been published. The time of hundreds of scientists and hundreds of millions of dollars were not invested in vain since this project had led to an apparent paradigm shift: contrary to the classical view, 80% of the human genome is not junk DNA, but is functional. This hypothesis has been criticized by evolutionary biologists, sometimes eagerly, and detailed refutations have been published in specialized journals with impact factors far below those that published the main contribution of the Encode project to our understanding of genome architecture. In 2014, the Encode consortium released a new batch of articles that neither suggested that 80% of the genome is functional nor commented on the disappearance of their 2012 scientific breakthrough. Unfortunately, by that time many biologists had accepted the idea that 80% of the genome is functional, or at least, that this idea is a valid alternative to the long held evolutionary genetic view that it is not. In order to understand the dynamics of the genome, it is necessary to re-examine the basics of evolutionary genetics because, not only are they well established, they also will allow us to avoid the pitfall of a panglossian interpretation of Encode. Actually, the architecture of the genome and its dynamics are the product of trade-offs between various evolutionary forces, and many structural features are not related to functional properties. In other words, evolution does not produce the best of all worlds, not even the best of all possible worlds, but only one possible world.
Affiliation(s)
- Didier Casane
- Laboratoire Évolution, génomes, comportement, écologie, CNRS université Paris-Sud UMR 9191, IRD UMR 247, Avenue de la Terrasse, bâtiment 13, boîte postale 1, 91198 Gif-sur-Yvette, France; Université Paris-Diderot, Sorbonne Paris-Cité, Paris, France
- Julien Fumey
- Laboratoire Évolution, génomes, comportement, écologie, CNRS université Paris-Sud UMR 9191, IRD UMR 247, Avenue de la Terrasse, bâtiment 13, boîte postale 1, 91198 Gif-sur-Yvette, France
- Patrick Laurenti
- Laboratoire Évolution, génomes, comportement, écologie, CNRS université Paris-Sud UMR 9191, IRD UMR 247, Avenue de la Terrasse, bâtiment 13, boîte postale 1, 91198 Gif-sur-Yvette, France; Université Paris-Diderot, Sorbonne Paris-Cité, Paris, France
|
40
|
Abstract
Eukaryogenesis is widely viewed as an improbable evolutionary transition uniquely affecting the evolution of life on this planet. However, scientific and popular rhetoric extolling this event as a singularity lacks rigorous evidential and statistical support. Here, we question several of the usual claims about the specialness of eukaryogenesis, focusing on both eukaryogenesis as a process and its outcome, the eukaryotic cell. We argue in favor of four ideas. First, the criteria by which we judge eukaryogenesis to have required a genuinely unlikely series of events 2 billion years in the making are being eroded by discoveries that fill in the gaps of the prokaryote:eukaryote "discontinuity." Second, eukaryogenesis confronts evolutionary theory in ways not different from other evolutionary transitions in individuality; parallel systems can be found at several hierarchical levels. Third, identifying which of several complex cellular features confer on eukaryotes a putative richer evolutionary potential remains an area of speculation: various keys to success have been proposed and rejected over the five-decade history of research in this area. Fourth, and perhaps most importantly, it is difficult and may be impossible to eliminate eukaryocentric bias from the measures by which eukaryotes as a whole are judged to have achieved greater success than prokaryotes as a whole. Overall, we question whether premises of existing theories about the uniqueness of eukaryogenesis and the greater evolutionary potential of eukaryotes have been objectively formulated and whether, despite widespread acceptance that eukaryogenesis was "special," any such notion has more than rhetorical value.
|
41
|
Andersson L, Archibald AL, Bottema CD, Brauning R, Burgess SC, Burt DW, Casas E, Cheng HH, Clarke L, Couldrey C, Dalrymple BP, Elsik CG, Foissac S, Giuffra E, Groenen MA, Hayes BJ, Huang LS, Khatib H, Kijas JW, Kim H, Lunney JK, McCarthy FM, McEwan JC, Moore S, Nanduri B, Notredame C, Palti Y, Plastow GS, Reecy JM, Rohrer GA, Sarropoulou E, Schmidt CJ, Silverstein J, Tellam RL, Tixier-Boichard M, Tosser-Klopp G, Tuggle CK, Vilkki J, White SN, Zhao S, Zhou H. Coordinated international action to accelerate genome-to-phenome with FAANG, the Functional Annotation of Animal Genomes project. Genome Biol 2015; 16:57. [PMID: 25854118 PMCID: PMC4373242 DOI: 10.1186/s13059-015-0622-4] [Citation(s) in RCA: 203] [Impact Index Per Article: 22.6] [Indexed: 12/24/2022] Open
Abstract
We describe the organization of a nascent international effort, the Functional Annotation of Animal Genomes (FAANG) project, whose aim is to produce comprehensive maps of functional elements in the genomes of domesticated animal species.
|
42
|
Ashour ME, Atteya R, El-Khamisy SF. Topoisomerase-mediated chromosomal break repair: an emerging player in many games. Nat Rev Cancer 2015; 15:137-51. [PMID: 25693836 DOI: 10.1038/nrc3892] [Citation(s) in RCA: 126] [Impact Index Per Article: 14.0] [Indexed: 12/13/2022]
Abstract
The mammalian genome is constantly challenged by exogenous and endogenous threats. Although much is known about the mechanisms that maintain DNA and RNA integrity, we know surprisingly little about the mechanisms that underpin the pathology and tissue specificity of many disorders caused by defective responses to DNA or RNA damage. Of the different types of endogenous damage, protein-linked DNA breaks (PDBs) are emerging as an important player in cancer development and therapy. PDBs can arise during the abortive activity of DNA topoisomerases, a class of enzymes that modulate DNA topology during several chromosomal transactions, such as gene transcription and DNA replication, recombination and repair. In this Review, we discuss the mechanisms underpinning topoisomerase-induced PDB formation and repair with a focus on their role during gene transcription and the development of tissue-specific cancers.
Affiliation(s)
- Mohamed E Ashour
- [1] Krebs Institute, Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield, S10 2TN, UK. [2] Center for Genomics, Helmy Institute, Zewail City of Science and Technology, Giza 12588, Egypt
- Reham Atteya
- Center for Genomics, Helmy Institute, Zewail City of Science and Technology, Giza 12588, Egypt
- Sherif F El-Khamisy
- [1] Krebs Institute, Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield, S10 2TN, UK. [2] Center for Genomics, Helmy Institute, Zewail City of Science and Technology, Giza 12588, Egypt
|
43
|
Palazzo AF, Lee ES. Non-coding RNA: what is functional and what is junk? Front Genet 2015; 6:2. [PMID: 25674102 PMCID: PMC4306305 DOI: 10.3389/fgene.2015.00002] [Citation(s) in RCA: 497] [Impact Index Per Article: 55.2] [Received: 11/20/2014] [Accepted: 01/06/2015] [Indexed: 12/12/2022] Open
Abstract
The genomes of large multicellular eukaryotes are mostly comprised of non-protein coding DNA. Although there has been much agreement that a small fraction of these genomes has important biological functions, there has been much debate as to whether the rest contributes to development and/or homeostasis. Much of the speculation has centered on the genomic regions that are transcribed into RNA at some low level. Unfortunately these RNAs have been arbitrarily assigned various names, such as “intergenic RNA,” “long non-coding RNAs” etc., which have led to some confusion in the field. Many researchers believe that these transcripts represent a vast, unchartered world of functional non-coding RNAs (ncRNAs), simply because they exist. However, there are reasons to question this Panglossian view because it ignores our current understanding of how evolution shapes eukaryotic genomes and how the gene expression machinery works in eukaryotic cells. Although there are undoubtedly many more functional ncRNAs yet to be discovered and characterized, it is also likely that many of these transcripts are simply junk. Here, we discuss how to determine whether any given ncRNA has a function. Importantly, we advocate that in the absence of any such data, the appropriate null hypothesis is that the RNA in question is junk.
Affiliation(s)
- Eliza S Lee
- Department of Biochemistry, University of Toronto, Toronto, ON, Canada
|
44
|
Gulko B, Hubisz MJ, Gronau I, Siepel A. A method for calculating probabilities of fitness consequences for point mutations across the human genome. Nat Genet 2015; 47:276-83. [PMID: 25599402 PMCID: PMC4342276 DOI: 10.1038/ng.3196] [Citation(s) in RCA: 173] [Impact Index Per Article: 19.2] [Received: 08/13/2014] [Accepted: 12/19/2014] [Indexed: 12/17/2022]
Abstract
We describe a novel computational method for estimating the probability that a point mutation at each position in a genome will influence fitness. These fitness consequence (fitCons) scores serve as evolution-based measures of potential genomic function. Our approach is to cluster genomic positions into groups exhibiting distinct “fingerprints” based on high-throughput functional genomic data, then to estimate a probability of fitness consequences for each group from associated patterns of genetic polymorphism and divergence. We have generated fitCons scores for three human cell types based on public data from ENCODE. Compared with conventional conservation scores, fitCons scores show considerably improved prediction power for cis-regulatory elements. In addition, fitCons scores indicate that 4.2–7.5% of nucleotides in the human genome have influenced fitness since the human-chimpanzee divergence, and they suggest that recent evolutionary turnover has had limited impact on the functional content of the genome.
Affiliation(s)
- Brad Gulko
- Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Melissa J Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Adam Siepel
- [1] Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA. [2] Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
|
45
|
Approximation to the distribution of fitness effects across functional categories in human segregating polymorphisms. PLoS Genet 2014; 10:e1004697. [PMID: 25375159 PMCID: PMC4222666 DOI: 10.1371/journal.pgen.1004697] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 01/06/2014] [Accepted: 08/22/2014] [Indexed: 02/03/2023] Open
Abstract
Quantifying the proportion of polymorphic mutations that are deleterious or neutral is of fundamental importance to our understanding of evolution, disease genetics and the maintenance of variation genome-wide. Here, we develop an approximation to the distribution of fitness effects (DFE) of segregating single-nucleotide mutations in humans. Unlike previous methods, we do not assume that synonymous mutations are neutral or not strongly selected, and we do not rely on fitting the DFE of all new nonsynonymous mutations to a single probability distribution, which is poorly motivated on a biological level. We rely on a previously developed method that utilizes a variety of published annotations (including conservation scores, protein deleteriousness estimates and regulatory data) to score all mutations in the human genome based on how likely they are to be affected by negative selection, controlling for mutation rate. We map this and other conservation scores to a scale of fitness coefficients via maximum likelihood using diffusion theory and a Poisson random field model on SNP data. Our method serves to approximate the deleterious DFE of mutations that are segregating, regardless of their genomic consequence. We can then compare the proportion of mutations that are negatively selected or neutral across various categories, including different types of regulatory sites. We observe that the distribution of intergenic polymorphisms is highly peaked at neutrality, while the distribution of nonsynonymous polymorphisms has a second peak at [Formula: see text]. Other types of polymorphisms have shapes that fall roughly in between these two. We find that transcriptional start sites, strong CTCF-enriched elements and enhancers are the regulatory categories with the largest proportion of deleterious polymorphisms.
|
46
|
Vandenbergh DJ, Schlomer GL. Finding genomic function for genetic associations in nicotine addiction research: the ENCODE project's role in future pharmacogenomic analysis. Pharmacol Biochem Behav 2014; 123:34-44. [PMID: 24486638 PMCID: PMC4117825 DOI: 10.1016/j.pbb.2014.01.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/13/2013] [Revised: 01/17/2014] [Accepted: 01/22/2014] [Indexed: 11/16/2022]
Abstract
Tobacco-related behaviors and the underlying addiction to nicotine are complex tangles of genetic and environmental factors. Efforts to understand the genetic component of these traits have identified sites in the genome (single nucleotide polymorphisms, or SNPs) that might account for some part of the role of genetics in nicotine addiction. Encouragingly, some of these candidate SNPs remain significant in meta-analyses. However, genetic associations cannot be fully assessed, regardless of statistical significance, without an understanding of the functional consequences of the alleles present at these SNPs. The proper experimental test for allelic function can be very difficult to define, representing a roadblock in translating genetic results into treatment to prevent smoking and other nicotine-related behaviors. This roadblock can be navigated in part with a new web-based tool, the Encyclopedia of DNA Elements (ENCODE). ENCODE is a compilation of searchable data on several types of biochemical functions or "marks" across the genome. These data can be queried for the co-localization of a candidate SNP and a biochemical mark. The presence of a SNP within a marked region of DNA enables the generation of better-informed hypotheses to test possible functional roles of alleles at a candidate SNP. Two examples of such co-localizations are presented. One example reveals ENCODE's ability to relate a candidate SNP's function with a gene very far from the physical location of the SNP. The second example reveals a new potential function of the SNP, rs4105144, that has been genetically associated with the number of cigarettes smoked per day. Details for accessing the ENCODE data for this SNP are provided to serve as a tutorial. By serving as a bridge between genetic associations and biochemical function, ENCODE has the power to propel progress in untangling the genetic aspects of nicotine addiction - a major public health concern.
Affiliation(s)
- David J Vandenbergh
- Department of Biobehavioral Health, The Pennsylvania State University, 219 Biobehavioral Health Building, University Park, PA 16802, USA; Penn State Institute of the Neurosciences, 101 Life Sciences Building, University Park, PA 16802, USA.
- Gabriel L Schlomer
- Department of Human Development and Family Studies, The Pennsylvania State University, 315 Health and Human Development, East, University Park, PA 16802, USA.
|
47
|
Bracken-Grissom H, Collins AG, Collins T, Crandall K, Distel D, Dunn C, Giribet G, Haddock S, Knowlton N, Martindale M, Medina M, Messing C, O'Brien SJ, Paulay G, Putnam N, Ravasi T, Rouse GW, Ryan JF, Schulze A, Wörheide G, Adamska M, Bailly X, Breinholt J, Browne WE, Diaz MC, Evans N, Flot JF, Fogarty N, Johnston M, Kamel B, Kawahara AY, Laberge T, Lavrov D, Michonneau F, Moroz LL, Oakley T, Osborne K, Pomponi SA, Rhodes A, Santos SR, Satoh N, Thacker RW, Van de Peer Y, Voolstra CR, Welch DM, Winston J, Zhou X. The Global Invertebrate Genomics Alliance (GIGA): developing community resources to study diverse invertebrate genomes. J Hered 2014; 105:1-18. [PMID: 24336862 DOI: 10.1093/jhered/est084] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Indexed: 12/16/2022] Open
Abstract
Over 95% of all metazoan (animal) species comprise the "invertebrates," but very few genomes from these organisms have been sequenced. We have, therefore, formed a "Global Invertebrate Genomics Alliance" (GIGA). Our intent is to build a collaborative network of diverse scientists to tackle major challenges (e.g., species selection, sample collection and storage, sequence assembly, annotation, analytical tools) associated with genome/transcriptome sequencing across a large taxonomic spectrum. We aim to promote standards that will facilitate comparative approaches to invertebrate genomics and collaborations across the international scientific community. Candidate study taxa include species from Porifera, Ctenophora, Cnidaria, Placozoa, Mollusca, Arthropoda, Echinodermata, Annelida, Bryozoa, and Platyhelminthes, among others. GIGA will target 7000 noninsect/nonnematode species, with an emphasis on marine taxa because of the unrivaled phyletic diversity in the oceans. Priorities for selecting invertebrates for sequencing will include, but are not restricted to, their phylogenetic placement; relevance to organismal, ecological, and conservation research; and their importance to fisheries and human health. We highlight benefits of sequencing both whole genomes (DNA) and transcriptomes and also suggest policies for genomic-level data access and sharing based on transparency and inclusiveness. The GIGA Web site (http://giga.nova.edu) has been launched to facilitate this collaborative venture.
|
48
|
Tagu D, Colbourne JK, Nègre N. Genomic data integration for ecological and evolutionary traits in non-model organisms. BMC Genomics 2014; 15:490. [PMID: 25047861 PMCID: PMC4108784 DOI: 10.1186/1471-2164-15-490] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 11/29/2012] [Accepted: 06/17/2014] [Indexed: 02/02/2023] Open
Abstract
Why do we need to develop systems biology initiatives such as ENCODE for non-model organisms?
Affiliation(s)
- Denis Tagu
- INRA Rennes, UMR 1349 IGEPP, BP 35327, 35657 Le Rheu Cedex, France
- John K Colbourne
- School of Bioscience, University of Birmingham, Birmingham, West Midlands, England
- Nicolas Nègre
- Université Montpellier 2, UMR1333 DGIMI, F-34095 Montpellier, France
- INRA, UMR1333 DGIMI, F-34095 Montpellier, France
|
49
|
Lu S. Zn2+ blocks annealing of complementary single-stranded DNA in a sequence-selective manner. Sci Rep 2014; 4:5464. [PMID: 24965053 PMCID: PMC4071324 DOI: 10.1038/srep05464] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 04/28/2014] [Accepted: 06/11/2014] [Indexed: 12/19/2022] Open
Abstract
Zinc is the second most abundant trace element essential for all living organisms. In the human body, 30–40% of the total zinc ion (Zn2+) is localized in the nucleus. Intranuclear free Zn2+ sparks caused by reactive oxygen species have been observed in eukaryotic cells, but the question of whether these free Zn2+ sparks could affect annealing of complementary single-stranded (ss) DNA, a crucial step in DNA synthesis, repair and recombination, has never been raised. Here the author reports that Zn2+ blocks annealing of complementary ssDNA in a sequence-selective manner under near-physiological conditions, as demonstrated in vitro using a low-temperature EDTA-free agarose gel electrophoresis (LTEAGE) procedure. Specifically, it is shown that Zn2+ does not block annealing of repetitive DNA sequences lacking CG/GC sites that are the major components of junk DNA. It is also demonstrated that Zn2+ blocks end-joining of double-stranded (ds) DNA fragments with 3′ overhangs mimicking double-strand breaks, and prevents renaturation of long stretches (>1 kb) of denatured dsDNA, in which Zn2+-tolerant intronic DNA provides annealing protection on otherwise Zn2+-sensitive coding DNA. These findings raise a challenging hypothesis that Zn2+-ssDNA interaction might be among the natural forces driving eukaryotic genomes to maintain Zn2+-tolerant repetitive DNA as an adaptation to the Zn2+-rich nucleus.
Affiliation(s)
- Shunwen Lu
- USDA-ARS, Cereal Crops Research Unit, Fargo, ND 58102, USA
|
50
|
Doolittle WF, Brunet TDP, Linquist S, Gregory TR. Distinguishing between "function" and "effect" in genome biology. Genome Biol Evol 2014; 6:1234-7. [PMID: 24814287 PMCID: PMC4041003 DOI: 10.1093/gbe/evu098] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022] Open
Abstract
Much confusion in genome biology results from conflation of possible meanings of the word “function.” We suggest that, in this connection, attention should be paid to evolutionary biologists and philosophers who have previously dealt with this problem. We need only decide that although all genomic structures have effects, only some of them should be said to have functions. Although it will very often be difficult or impossible to establish function (strictly defined), it should not automatically be assumed. We enjoin genomicists in particular to pay greater attention to parsing biological effects.
Affiliation(s)
- W Ford Doolittle
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
- Tyler D P Brunet
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
- T Ryan Gregory
- Department of Integrative Biology, University of Guelph, ON, Canada
|