1
Abstract
We developed dbCNS (http://yamasati.nig.ac.jp/dbcns), a new database for conserved noncoding sequences (CNSs). CNSs exist in many eukaryotes and are assumed to be involved in protein expression control. Version 1 of dbCNS, introduced here, includes a powerful and precise CNS identification pipeline for multiple vertebrate genomes. Mutations in CNSs may induce morphological changes and cause genetic diseases. For this reason, many vertebrate CNSs have been identified, with particular attention to primate genomes. We integrated ∼6.9 million CNSs from many vertebrate genomes into dbCNS, which allows users to extract CNSs near genes of interest using keyword searches. In addition to CNSs, dbCNS contains published genome sequences of 161 species. With purposeful taxonomic sampling of genomes, users can employ CNSs as queries to reconstruct CNS alignments and phylogenetic trees, to evaluate CNS modifications, acquisitions, and losses, and to roughly identify species with CNSs having accelerated substitution rates. dbCNS also produces links to dbSNP for searching pathogenic single-nucleotide polymorphisms in human CNSs. Thus, dbCNS connects morphological changes with genetic diseases. A test analysis using 38 gnathostome genomes was completed within 30 s. dbCNS results can be used to evaluate CNSs identified by other stand-alone programs from genome-scale data.
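The keyword-driven extraction of CNSs near genes of interest can be illustrated as a simple interval-overlap search. This is a minimal sketch and not dbCNS's implementation. The `Interval` record, the `cns_near_genes` function, the 0-based half-open coordinates and the default 100 kb flanking window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A named genomic interval; coordinates are 0-based and half-open (assumption)."""
    name: str
    chrom: str
    start: int
    end: int


def cns_near_genes(keyword, genes, cnss, flank=100_000):
    """Hypothetical helper: return (gene, CNS) name pairs for CNSs overlapping a gene
    whose name contains `keyword` (case-insensitive), extended by `flank` bp per side."""
    key = keyword.lower()
    hits = []
    for gene in genes:
        if key not in gene.name.lower():
            continue
        lo = max(0, gene.start - flank)
        hi = gene.end + flank
        for cns in cnss:
            # Half-open intervals overlap when each starts before the other ends.
            if cns.chrom == gene.chrom and cns.start < hi and cns.end > lo:
                hits.append((gene.name, cns.name))
    return hits


# Toy coordinates, not real genome positions.
genes = [Interval("HOXA1", "chr7", 1_000, 2_000), Interval("SHH", "chr7", 500_000, 510_000)]
cnss = [Interval("cns1", "chr7", 2_500, 2_600),
        Interval("cns2", "chr7", 300_000, 300_100),
        Interval("cns3", "chr1", 1_500, 1_600)]
print(cns_near_genes("hoxa", genes, cnss, flank=10_000))  # [('HOXA1', 'cns1')]
```

The double loop keeps the example short. A database holding millions of CNSs would instead index intervals per chromosome, for example with a sorted array or an interval tree.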
Affiliation(s)
- Jun Inoue
- Population Genetics Laboratory, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Japan; Center for Earth Surface System Dynamics, Atmosphere and Ocean Research Institute, University of Tokyo, Kashiwa, Japan
- Naruya Saitou
- Population Genetics Laboratory, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Japan; Department of Okinawa Bioinformation Bank, Faculty of Medicine, University of the Ryukyus, Okinawa, Japan

2
Kitano T, Kim CG, Saitou N. Nucleotide sequencing of the HoxA gene cluster using Gorilla fosmid clones. J Genomics 2020; 8:80-83. [PMID: 32934753 PMCID: PMC7484619 DOI: 10.7150/jgen.50468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2020] [Accepted: 08/21/2020] [Indexed: 11/05/2022]
Abstract
We sequenced the western gorilla (Gorilla gorilla) HoxA cluster region using seven fosmid clones, and found that the total tiling path sequence was 214,185 bp from the 5' non-genic region of HoxA1 to the 3' non-genic region of Evx1. We compared the nucleotide sequence with the gorilla genome sequence in the NCBI database, and the overall proportion of nucleotide difference was estimated to be 0.0005-0.0007. These estimates are lower than overall genomic polymorphism in gorillas.
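The proportion of nucleotide difference reported above is the p-distance: the number of differing sites divided by the number of compared sites in an alignment. The sketch below assumes two sequences that are already aligned to equal length. It also skips every column that contains a gap or an ambiguous base; the abstract does not state how such columns were handled, so that rule is an assumption.

```python
def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences, counting only
    columns where both bases are unambiguous (A, C, G, T); gaps and N are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGT")
    compared = diffs = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


print(p_distance("ACGTACGTAC", "ACGTACGTAA"))  # 0.1
```

For scale, assume every site in the 214,185 bp tiling path is comparable. A proportion of 0.0005-0.0007 then corresponds to roughly 107-150 differing sites.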
Affiliation(s)
- Takashi Kitano
- Division of Population Genetics, National Institute of Genetics, Japan
- Choong-Gon Kim
- Division of Population Genetics, National Institute of Genetics, Japan
- Naruya Saitou
- Division of Population Genetics, National Institute of Genetics, Japan

3
Onimaru K, Kuraku S. Inference of the ancestral vertebrate phenotype through vestiges of the whole-genome duplications. Brief Funct Genomics 2019; 17:352-361. [PMID: 29566222 PMCID: PMC6158797 DOI: 10.1093/bfgp/ely008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022]
Abstract
Inferring the phenotype of the last common ancestor of living vertebrates is a challenging problem because of several unresolvable factors. These include the lack of reliable out-groups of living vertebrates, poor information about less fossilizable organs, and specialized traits of phylogenetically important species such as lampreys and hagfishes (e.g. secondary loss of vertebrae in adult hagfishes). These factors undermine the reliability of ancestral reconstruction by traditional character mapping approaches based on maximum parsimony. In this article, we formulate an approach to hypothesizing ancestral vertebrate phenotypes using information from the phylogenetic and functional properties of genes duplicated by genome expansions in early vertebrate evolution. We named this conjecture ‘chronological reconstruction of ohnolog functions (CHROF)’. The CHROF conjecture raises the possibility that the last common ancestor of living vertebrates may have had more complex traits than currently thought.
Affiliation(s)
- Koh Onimaru
- RIKEN Center for Life Science Technologies, Kobe, Hyogo, Japan; Department of Biological Science, Tokyo Institute of Technology, Tokyo, Japan

4
Sakuma Y, Matsunami M, Takada T, Suzuki H. Multiple Conserved Elements Structuring Inverted Repeats in the Mammalian Coat Color-Related Gene Asip. Zoolog Sci 2019; 36:23-30. [PMID: 31116535 DOI: 10.2108/zs180081] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/07/2018] [Accepted: 09/17/2018] [Indexed: 11/17/2022]
Abstract
In the agouti signaling protein gene (Asip) of the house mouse (Mus musculus), inverted repeat (IR) arrays are known to exist in a non-coding region adjacent to the ventral-specific promoter region and its two accompanying exons (exons 1A and 1A'), which lie around 100 kb upstream of the amino acid coding regions of exons 2, 3, and 4. To determine the gene structure of mammalian Asip and to elucidate trends in its evolution, non-coding sequences of six rodent (mouse, rat, Chinese hamster, squirrel, guinea pig, and naked mole rat) and three non-rodent (rabbit, human, and cow) species were retrieved from databases and compared. Our homology search analyses revealed the presence of three to five highly conserved non-coding elements (CNEs). These CNEs were found to form IRs in rodents and lagomorphs. Combinations of IRs were further shown to build symmetric, long IR arrays. Intra- and inter-specific comparisons of the sequences of three universal CNEs showed homogeneity between CNE pairs within species. This implies that certain evolutionary constraints have maintained the IR structure in the rodent and rabbit species.
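An inverted repeat has two arms, and one arm reads as the reverse complement of the other. The minimal sketch below scores such a pair. It assumes the two arms are already extracted, are the same length and need no gapped alignment. The study itself used homology searches, which this does not reproduce, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string; symbols other than ACGT are left unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def ir_identity(left_arm, right_arm):
    """Hypothetical helper: fraction of positions where left_arm matches the reverse
    complement of right_arm (assumes equal-length, ungapped arms)."""
    rc = reverse_complement(right_arm).upper()
    left = left_arm.upper()
    if len(left) != len(rc):
        raise ValueError("arms must have equal length")
    if not left:
        raise ValueError("arms must be non-empty")
    matches = sum(a == b for a, b in zip(left, rc))
    return matches / len(left)


print(reverse_complement("AACG"))   # CGTT
print(ir_identity("AACG", "CGTA"))  # 0.75
```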
Affiliation(s)
- Yuki Sakuma
- Laboratory of Ecology and Genetics, Graduate School of Environmental Science, Hokkaido University, Kita-ku, Sapporo 060-0810, Japan
- Masatoshi Matsunami
- Laboratory of Ecology and Genetics, Graduate School of Environmental Science, Hokkaido University, Kita-ku, Sapporo 060-0810, Japan; Graduate School of Medicine, University of the Ryukyus, Nishihara-cho 903-0215, Japan
- Toyoyuki Takada
- Mammalian Genetics Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Hitoshi Suzuki
- Laboratory of Ecology and Genetics, Graduate School of Environmental Science, Hokkaido University, Kita-ku, Sapporo 060-0810, Japan

5
Quinn JP, Savage AL, Bubb VJ. Non-coding genetic variation shaping mental health. Curr Opin Psychol 2018; 27:18-24. [PMID: 30099302 PMCID: PMC6624474 DOI: 10.1016/j.copsyc.2018.07.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/05/2018] [Accepted: 07/16/2018] [Indexed: 12/12/2022]
Abstract
Gene expression is determined by the genome mediating a response to the cell environment. Genetic variation results in distinct individual responses in gene expression. Non-coding DNA is an important site for such functional genetic variation. Gene expression is a major modulator of brain chemistry and thus behavior.
Over 98% of our genome is non-coding and is now recognised to have a major role in orchestrating the tissue specific and stimulus inducible gene expression pattern which underpins our wellbeing and mental health. The non-coding genome responds functionally to our environment at all levels, encompassing the span from psychological to physiological challenge. The gene expression pattern, termed the transcriptome, ultimately gives us our neurochemistry. Therefore a major modulator of mental wellbeing is how our genes are regulated in response to life experiences. Superimposed on the aforementioned non-coding DNA framework is a vast body of genetic variation in the elements that control response to challenges. These differences, termed polymorphisms, allow for a differential response from a specific DNA element to the same challenge thus potentially allowing ‘individuality’ in the modulation of our transcriptome. This review will focus on a fundamental mechanism defining our psychological and psychiatric wellbeing, namely how genetic variation can be correlated with differential gene expression in response to specific challenges, thus resulting in altered neurochemistry which consequently may shape behaviour.
Affiliation(s)
- John P Quinn
- Department of Molecular and Clinical Pharmacology, Institute of Translational Medicine, The University of Liverpool, Liverpool L69 3BX, UK.
- Abigail L Savage
- Department of Molecular and Clinical Pharmacology, Institute of Translational Medicine, The University of Liverpool, Liverpool L69 3BX, UK
- Vivien J Bubb
- Department of Molecular and Clinical Pharmacology, Institute of Translational Medicine, The University of Liverpool, Liverpool L69 3BX, UK

6
Saitou N. Neutral Evolution. Introduction to Evolutionary Genomics 2018. [PMCID: PMC7121930 DOI: 10.1007/978-3-319-92642-1_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
Neutral evolution is the default process of genomic changes. This is because our world is finite, and the randomness, indispensable for neutral evolution, is important when we consider the history of a finite world. The random nature of DNA propagation is discussed using branching process, coalescent process, Markov process, and diffusion process. Expected evolutionary patterns under neutrality are then discussed on fixation probability, rate of evolution, and amount of DNA variation kept in population. We then discuss various features of neutral evolution starting from evolutionary rates, synonymous and nonsynonymous substitutions, junk DNA, and pseudogenes.
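The fixation probability and rate of evolution under neutrality mentioned above can be shown as a worked example. Below is the standard diffusion-approximation result (Kimura) for a new mutation in a diploid population. It assumes the effective population size equals the census size N and uses s for the selection coefficient and μ for the mutation rate per generation. This is textbook background, not material taken from the chapter.

```latex
u(p) = \frac{1 - e^{-4Nsp}}{1 - e^{-4Ns}}, \qquad p = \frac{1}{2N}

\lim_{s \to 0} u(p) = p = \frac{1}{2N}

k = \underbrace{2N\mu}_{\text{new mutations per generation}} \times \underbrace{\frac{1}{2N}}_{\text{fixation probability}} = \mu
```

As s approaches 0, the numerator behaves like 4Nsp and the denominator like 4Ns, so u(p) reduces to p. The neutral substitution rate k therefore equals the mutation rate μ, independent of population size.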
Affiliation(s)
- Naruya Saitou
- Division of Population Genetics, National Institute of Genetics (NIG), Mishima, Shizuoka Japan

7
Mahmoudi Saber M, Saitou N. Silencing Effect of Hominoid Highly Conserved Noncoding Sequences on Embryonic Brain Development. Genome Biol Evol 2017; 9:2037-2048. [PMID: 28633494 PMCID: PMC5591954 DOI: 10.1093/gbe/evx105] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Accepted: 06/16/2017] [Indexed: 12/12/2022]
Abstract
Superfamily Hominoidea, which consists of Hominidae (humans and great apes) and Hylobatidae (gibbons), is well known for sharing human-like characteristics; however, the genomic origins of these shared unique phenotypes have mainly remained elusive. To decipher the underlying genomic basis of Hominoidea-restricted phenotypes, we identified and characterized Hominoidea-restricted highly conserved noncoding sequences (HCNSs), a class of potential regulatory elements that may be involved in the evolution of lineage-specific phenotypes. We discovered 679 such HCNSs from human, chimpanzee, gorilla, orangutan and gibbon genomes. These HCNSs were demonstrated to be under purifying selection but with lineage-restricted characteristics different from old CNSs. A significant proportion of their ancestral sequences had accelerated rates of nucleotide substitutions, insertions and deletions during the evolution of the common ancestor of Hominoidea, suggesting the intervention of positive Darwinian selection in creating those HCNSs. Contrary to enhancer elements and similar to silencer sequences, these Hominoidea-restricted HCNSs are located in close proximity to transcription start sites. Their target genes are enriched in nervous system, development and transcription functions, and they tend to be remotely located from the nearest coding gene. ChIP-seq signals and gene expression patterns suggest that Hominoidea-restricted HCNSs are likely to be functional regulatory elements by imposing silencing effects on their target genes in a tissue-restricted manner during fetal brain development. These HCNSs, which emerged through adaptive evolution and were conserved through purifying selection, represent a set of promising targets for future functional studies of the evolution of Hominoidea-restricted phenotypes.
Affiliation(s)
- Morteza Mahmoudi Saber
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan
- Naruya Saitou
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan

8
Meyer KA, Marques-Bonet T, Sestan N. Differential Gene Expression in the Human Brain Is Associated with Conserved, but Not Accelerated, Noncoding Sequences. Mol Biol Evol 2017; 34:1217-1229. [PMID: 28204568 PMCID: PMC5400397 DOI: 10.1093/molbev/msx076] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/04/2023]
Abstract
Previous studies have found that genes which are differentially expressed within the developing human brain disproportionately neighbor conserved noncoding sequences (CNSs) that have an elevated substitution rate in humans and in other species. One explanation for this general association of differential expression with accelerated CNSs is that genes with pre-existing patterns of differential expression have been preferentially targeted by species-specific regulatory changes. Here we provide support for an alternative explanation: genes that neighbor a greater number of CNSs have a higher probability of differential expression and a higher probability of neighboring a CNS with lineage-specific acceleration. Thus, neighboring an accelerated element from any species signals that a gene likely neighbors many CNSs. We extend the analyses beyond the prenatal time points considered in previous studies to demonstrate that this association persists across developmental and adult periods. Examining differential expression between non-neural tissues suggests that the relationship between the number of CNSs a gene neighbors and its differential expression status may be particularly strong for expression differences among brain regions. In addition, by considering this relationship, we highlight a recently defined set of putative human-specific gain-of-function sequences that, even after adjusting for the number of CNSs neighbored by genes, shows a positive relationship with upregulation in the brain compared with other tissues examined.
Affiliation(s)
- Kyle A. Meyer
- Department of Neuroscience and Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT
- Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Nenad Sestan
- Department of Neuroscience and Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT
- Departments of Genetics and Psychiatry, Section of Comparative Medicine, Program in Cellular Neuroscience, Neurodegeneration and Repair, and Yale Child Study Center, Yale School of Medicine, New Haven, CT

9
Hettiarachchi N, Saitou N. GC Content Heterogeneity Transition of Conserved Noncoding Sequences Occurred at the Emergence of Vertebrates. Genome Biol Evol 2016; 8:3377-3392. [PMID: 28040773 PMCID: PMC5203776 DOI: 10.1093/gbe/evw231] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Conserved non-coding sequences (CNSs) of eukaryotes are known to be significantly enriched in regulatory sequences. CNSs of diverse lineages follow different patterns in abundance, sequence composition, and location. Here, we report a thorough analysis of CNSs in diverse groups of eukaryotes with respect to GC content heterogeneity. We examined 24 fungi, 19 invertebrates, and 12 non-mammalian vertebrates to find lineage-specific features of CNSs. We found that fungal and invertebrate CNSs are predominantly GC rich, as we previously observed in plants, whereas vertebrate CNSs are GC poor. This result suggests that, in the vertebrate lineage, CNS GC content shifted from the ancestral GC-rich state of eukaryotes to a GC-poor state, due to the recruitment of lineage-specific, GC-poor transcription factor binding sites. CNS GC content is closely linked with nucleosome occupancy, which determines the location and structural architecture of DNA.
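GC content, the quantity compared across lineages here, is the fraction of G and C among unambiguous bases. The minimal sketch below ignores N and gap characters, which is an assumption. It does not reproduce the study's own criteria for calling a CNS GC rich or GC poor.

```python
def gc_content(seq):
    """Fraction of G+C among A, C, G, T in seq; other symbols (N, gaps) are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return gc / (gc + at)


print(round(gc_content("GGCCAT"), 3))  # 0.667
print(gc_content("ATNNGC"))            # 0.5
```

One simple way to label a CNS as relatively GC rich or GC poor is to compare its value with the GC content of its flanking sequence or of the whole genome.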
Affiliation(s)
- Nilmini Hettiarachchi
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan; Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Naruya Saitou
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan; Division of Population Genetics, National Institute of Genetics, Mishima, Japan

10
Babarinde IA, Saitou N. Genomic Locations of Conserved Noncoding Sequences and Their Proximal Protein-Coding Genes in Mammalian Expression Dynamics. Mol Biol Evol 2016; 33:1807-17. [PMID: 27017584 DOI: 10.1093/molbev/msw058] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 02/07/2023]
Abstract
Experimental studies have found that certain conserved noncoding sequences (CNSs) are involved in the regulation of proximal protein-coding genes in mammals. However, reported cases of long-range enhancer activity and inter-chromosomal regulation suggest that proximity of CNSs to protein-coding genes might not be important for regulation. To test the importance of CNS genomic location, we extracted the CNSs conserved between chicken and four mammalian species (human, mouse, dog, and cattle). These CNSs were confirmed to be under purifying selection. The intergenic CNSs are often found in clusters in gene deserts, where protein-coding genes are scarce. The distribution pattern, ChIP-Seq, and RNA-Seq data suggested that the CNSs are more likely to be regulatory elements than long intergenic noncoding RNAs. Physical distances between CNSs and their nearest protein-coding genes were well conserved between human and mouse genomes, and CNS-flanking genes were often found in evolutionarily conserved genomic neighborhoods. ChIP-Seq signals and gene expression patterns also suggested that CNSs regulate nearby genes. Interestingly, genes with more CNSs have more evolutionarily conserved expression than those with fewer CNSs. These computationally obtained results suggest that the genomic locations of CNSs are important for their regulatory functions. In fact, various kinds of evolutionary constraints may be acting to maintain the genomic locations of CNSs and protein-coding genes in mammals to ensure proper regulation.
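The distance from a CNS to its nearest protein-coding gene, compared between human and mouse above, reduces to a gap calculation between intervals. The minimal sketch below makes several assumptions: the function names and tuple layouts are hypothetical, coordinates are 0-based and half-open, and overlapping intervals score as distance 0. It scans all genes linearly, which suits illustration but not whole genomes.

```python
def interval_distance(a_start, a_end, b_start, b_end):
    """Gap in bp between half-open intervals [a_start, a_end) and [b_start, b_end); 0 if they overlap."""
    if a_end <= b_start:
        return b_start - a_end
    if b_end <= a_start:
        return a_start - b_end
    return 0


def nearest_gene(cns, genes):
    """Hypothetical helper. cns: (chrom, start, end); genes: list of (name, chrom, start, end).
    Returns (gene_name, distance) for the closest gene on the same chromosome, or None."""
    chrom, start, end = cns
    best = None
    for name, g_chrom, g_start, g_end in genes:
        if g_chrom != chrom:
            continue
        d = interval_distance(start, end, g_start, g_end)
        if best is None or d < best[1]:
            best = (name, d)
    return best


# Toy coordinates, not real genome positions.
genes = [("g1", "chr1", 100, 200), ("g2", "chr1", 1000, 1100), ("g3", "chr2", 300, 400)]
print(nearest_gene(("chr1", 350, 360), genes))  # ('g1', 150)
```

To check whether CNS-gene spacing is conserved, one could run the same function on orthologous CNSs in two genomes and correlate the resulting distances.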
Affiliation(s)
- Isaac Adeyemi Babarinde
- Department of Genetics, Graduate University for Advanced Studies, Mishima, Japan; Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Naruya Saitou
- Department of Genetics, Graduate University for Advanced Studies, Mishima, Japan; Division of Population Genetics, National Institute of Genetics, Mishima, Japan

11
Hettiarachchi N, Kryukov K, Sumiyama K, Saitou N. Lineage-specific conserved noncoding sequences of plant genomes: their possible role in nucleosome positioning. Genome Biol Evol 2014; 6:2527-42. [PMID: 25364802 PMCID: PMC4202324 DOI: 10.1093/gbe/evu188] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Accepted: 08/26/2014] [Indexed: 01/01/2023]
Abstract
Many studies on conserved noncoding sequences (CNSs) have found that CNSs are significantly enriched in regulatory sequence elements. We conducted a whole-genome analysis of plant CNSs to identify lineage-specific CNSs in eudicots, monocots, angiosperms, and vascular plants, based on the premise that lineage-specific CNSs define lineage-specific characters and functions in groups of organisms. We identified 27 eudicot, 204 monocot, 6,536 grass, 19 angiosperm, and 2 vascular plant lineage-specific CNSs (lengths ranging from 16 to 1,517 bp) that presumably originated in their respective common ancestors. A stronger constraint was observed on the CNSs located in untranslated regions. The CNSs were often flanked by genes involved in transcription regulation. A drop in A+T content was observed near CNS borders, and CNS regions showed a higher nucleosome occupancy probability. These CNSs are candidate regulatory elements, which are expected to define lineage-specific features of various plant groups.
Affiliation(s)
- Nilmini Hettiarachchi
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Kirill Kryukov
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Kenta Sumiyama
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Naruya Saitou
- Department of Genetics, School of Life Science, Graduate University for Advanced Studies (SOKENDAI), Mishima, Japan
- Division of Population Genetics, National Institute of Genetics, Mishima, Japan
- Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan

12
Sheetlin SL, Park Y, Frith MC, Spouge JL. Frameshift alignment: statistics and post-genomic applications. Bioinformatics 2014; 30:3575-82. [PMID: 25172925 DOI: 10.1093/bioinformatics/btu576] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 01/23/2023]
Abstract
MOTIVATION The alignment of DNA sequences to proteins, allowing for frameshifts, is a classic method in sequence analysis. It can help identify pseudogenes (which accumulate mutations), analyze raw DNA and RNA sequence data (which may have frameshift sequencing errors), investigate ribosomal frameshifts, etc. Often, however, only ad hoc approximations or simulations are available to provide the statistical significance of a frameshift alignment score. RESULTS We describe a method to estimate statistical significance of frameshift alignments, similar to classic BLAST statistics. (BLAST presently does not permit its alignments to include frameshifts.) We also illustrate the continuing usefulness of frameshift alignment with two 'post-genomic' applications: (i) when finding pseudogenes within the human genome, frameshift alignments show that most anciently conserved non-coding human elements are recent pseudogenes with conserved ancestral genes; and (ii) when analyzing metagenomic DNA reads from polluted soil, frameshift alignments show that most alignable metagenomic reads contain frameshifts, suggesting that metagenomic analysis needs to use frameshift alignment to derive accurate results.
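For reference, the "classic BLAST statistics" that this method parallels are the Karlin-Altschul formulas for ungapped local alignment scores. They are given below for a query of length m, a database of length n, a raw score S and fitted parameters K and λ. The paper's frameshift-specific estimation of these parameters is not reproduced here.

```latex
E = K\,m\,n\,e^{-\lambda S}

S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad E = m\,n\,2^{-S'}

P = 1 - e^{-E}
```

Here E is the expected number of chance alignments scoring at least S, and S' is the corresponding bit score. P is the probability of at least one such alignment under the Poisson approximation.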
Affiliation(s)
- Sergey L Sheetlin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA and Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan
- Yonil Park
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA and Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan
- Martin C Frith
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA and Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan
- John L Spouge
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA and Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan

13
Babarinde IA, Saitou N. Heterogeneous tempo and mode of conserved noncoding sequence evolution among four mammalian orders. Genome Biol Evol 2014; 5:2330-43. [PMID: 24259317 PMCID: PMC3879966 DOI: 10.1093/gbe/evt177] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
Abstract
Conserved noncoding sequences (CNSs) of vertebrates are considered to be closely linked with protein-coding gene regulatory functions. We examined the abundance and genomic distribution of CNSs in four mammalian orders: primates, rodents, carnivores, and cetartiodactyls. We defined two thresholds for CNSs based on the conservation level of coding genes: one using all three codon positions and one using only the first and second codon positions. The abundance of CNSs varied among lineages, with primates and rodents having the highest and lowest numbers of CNSs, respectively, whereas carnivores and cetartiodactyls had intermediate values. These CNSs cover 1.3-5.5% of the mammalian genomes and carry signatures of selective constraint that are stronger in more ancestral CNSs than in recent ones. Evolution of new CNSs as well as retention of ancestral CNSs contribute to the differences in abundance. The genomic distribution of CNSs is dynamic, with higher proportions of rodent and primate CNSs located in introns compared with carnivores and cetartiodactyls. In fact, 19% of orthologous single-copy CNSs between human and dog are located in different genomic regions. If CNSs can be considered candidate gene expression regulatory sequences, heterogeneity of CNSs among the four mammalian orders may have played an important role in creating order-specific phenotypes. Fewer CNSs in rodents suggest that rodent diversity is related to lower regulatory conservation. Given that CNSs cluster around genes involved in the nervous system and that primates have more CNSs, our results suggest that CNSs may be involved in the higher complexity of the primate nervous system.
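Genome coverage figures such as the 1.3-5.5% above require merging overlapping elements before summing their lengths, so that shared bases are not counted twice. The minimal sketch below assumes 0-based half-open intervals on a single coordinate axis. For a multi-chromosome genome, one would apply it per chromosome and sum the covered lengths. The function name is hypothetical.

```python
def covered_fraction(intervals, genome_size):
    """Hypothetical helper: fraction of genome_size covered by the union of
    half-open (start, end) intervals on one coordinate axis."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the current block
        else:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous block
            cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_size


print(covered_fraction([(0, 10), (5, 15), (20, 30)], 100))  # 0.25
```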
Affiliation(s)
- Isaac Adeyemi Babarinde
- Department of Genetics, School of Life Science, The Graduate University for Advanced Studies (SOKENDAI), Mishima Japan

14
Classification of selectively constrained DNA elements using feature vectors and rule-based classifiers. Genomics 2014; 104:79-86. [PMID: 25058025 DOI: 10.1016/j.ygeno.2014.07.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 06/09/2014] [Accepted: 07/15/2014] [Indexed: 12/29/2022]
Abstract
Little work has been done on analyzing the composition of conserved non-coding elements (CNEs), which are identified by comparing two or more genomes and are found in all metazoan genomes. Here we present an analysis of CNEs with a methodology that takes into account word occurrence at various length scales, in the form of a feature vector representation, together with rule-based classifiers. We apply our approach to both protein-coding exons and CNEs originating from human, insect (Drosophila melanogaster) and worm (Caenorhabditis elegans) genomes, either identified in the present study or obtained from the literature. Alignment-free feature vector representation of sequences combined with rule-based classification methods leads to successful classification of the different CNE classes. Biologically meaningful results are derived by comparison with the genomic signatures approach, and classification rates for a variety of functional elements of the genomes, along with surrogates, are presented.
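The "word occurrence at various length scales" representation can be sketched as concatenated k-mer frequency vectors. This is an assumed, simplified version: words overlap, frequencies are normalized separately for each k, and words containing non-ACGT symbols are skipped. It does not reproduce the paper's exact feature definition or its rule-based classifiers, and the function names are hypothetical.

```python
from itertools import product


def kmer_vector(seq, k):
    """Normalized frequencies of all 4**k DNA words of length k, in lexicographic order.
    Overlapping words are counted; words containing non-ACGT symbols are skipped."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    s = seq.upper()
    for i in range(len(s) - k + 1):
        j = index.get(s[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def multi_scale_features(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequency vectors for several word lengths into one feature vector."""
    features = []
    for k in ks:
        features.extend(kmer_vector(seq, k))
    return features


print(kmer_vector("AACG", 1))                 # [0.5, 0.25, 0.25, 0.0]
print(len(multi_scale_features("ACGTACGT")))  # 84 (4 + 16 + 64)
```

The approach is alignment-free because sequences are compared through these word statistics rather than through aligned positions. Any standard classifier can consume the resulting vectors.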

15
Polychronopoulos D, Sellis D, Almirantis Y. Conserved noncoding elements follow power-law-like distributions in several genomes as a result of genome dynamics. PLoS One 2014; 9:e95437. [PMID: 24787386 PMCID: PMC4008492 DOI: 10.1371/journal.pone.0095437] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 11/20/2013] [Accepted: 03/26/2014] [Indexed: 12/31/2022]
Abstract
Conserved, ultraconserved and other classes of constrained elements (collectively referred to as CNEs here), identified by comparative genomics in a wide variety of genomes, are non-randomly distributed across chromosomes. These elements are defined using various degrees of conservation between organisms and several thresholds of minimal length. We here investigate the chromosomal distribution of CNEs by studying the statistical properties of distances between consecutive CNEs. We find widespread power-law-like distributions, i.e. linearity in double logarithmic scale, in the inter-CNE distances, a feature which is connected with fractality and self-similarity. Given that CNEs are often found to be spatially associated with genes, especially with those that regulate developmental processes, we verify by appropriate gene masking that a power-law-like pattern emerges irrespective of whether elements found close to or inside genes are excluded. To explain these findings, we put forward an evolutionary model that includes segmental or whole-genome duplication events and the elimination (loss) of most of the duplicated CNEs. Simulations reproduce the main features of the observed size distributions. Power-law-like patterns in the genomic distributions of CNEs are in accordance with current knowledge about their evolutionary history in several genomes.
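One common way to look for the power-law-like behavior described here has three steps. Take the gaps between consecutive elements, compute the complementary cumulative distribution P(D ≥ d), and fit a straight line on double logarithmic axes. The minimal sketch below assumes 0-based half-open intervals on one chromosome, skips overlapping elements, and fits by ordinary least squares on log10 values. The paper's own distance definition and fitting procedure may differ. A good linear fit is consistent with a power law but does not by itself establish one.

```python
import math


def inter_element_distances(elements):
    """Gaps (bp) between consecutive half-open (start, end) elements on one chromosome;
    overlapping or adjacent pairs are skipped."""
    ordered = sorted(elements)
    return [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:]) if nxt[0] > prev[1]]


def loglog_ccdf_slope(distances):
    """Least-squares slope of log10 P(D >= d) versus log10 d over the distinct positive distances."""
    ds = [d for d in distances if d > 0]
    n = len(ds)
    xs, ys = [], []
    for d in sorted(set(ds)):
        frac = sum(1 for x in ds if x >= d) / n  # empirical P(D >= d)
        xs.append(math.log10(d))
        ys.append(math.log10(frac))
    if len(xs) < 2:
        raise ValueError("need at least two distinct positive distances")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


print(inter_element_distances([(0, 10), (30, 40), (15, 20)]))  # [5, 10]
# Toy distances whose CCDF halves each time d doubles, i.e. P(D >= d) = 1/d.
print(round(loglog_ccdf_slope([1, 1, 1, 1, 2, 2, 4, 8]), 6))   # -1.0
```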
Affiliation(s)
- Dimitris Polychronopoulos
- Institute of Biosciences and Applications, National Center for Scientific Research “Demokritos”, Athens, Greece
- Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, Athens, Greece
- Diamantis Sellis
- Department of Biology, Stanford University, Stanford, California, United States of America
- Yannis Almirantis
- Institute of Biosciences and Applications, National Center for Scientific Research “Demokritos”, Athens, Greece