Reference Citation Analysis

For:	Bernardi G. Isochores and the evolutionary genomics of vertebrates. Gene 2000;241:3-17. [PMID: 10607893 DOI: 10.1016/s0378-1119(99)00485-0] [Citation(s) in RCA: 357] [Impact Index Per Article: 14.9] [Indexed: 11/25/2022]

Number	Cited by Other Article(s)
1	Distinctive Nucleic Acid Recognition by Lysine-Embedded Phenanthridine Peptides. Int J Mol Sci 2024;25:4866. [PMID: 38732083 PMCID: PMC11084427 DOI: 10.3390/ijms25094866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 04/25/2024] [Accepted: 04/27/2024] [Indexed: 05/13/2024]
Abstract: Three new phenanthridine peptide derivatives (19, 22, and 23) were synthesized to explore their potential as spectrophotometric probes for DNA and RNA. UV/Vis and circular dichroism (CD) spectra, mass spectroscopy, and computational analysis confirmed the presence of intramolecular interactions in all three compounds. Computational analysis revealed that compounds alternate between bent and open conformations, highlighting the latter's crucial influence on successful polynucleotide recognition. Substituting one glycine with lysine in two regioisomers (22, 23) resulted in stronger binding interactions with DNA and RNA than for a compound containing two glycines (19), thus emphasizing the importance of lysine. The regioisomer with lysine closer to the phenanthridine ring (23) exhibited a dual and selective fluorimetric response with non-alternating AT and ATT polynucleotides and induction of triplex formation from the AT duplex. The best binding constant (K) with a value of 2.5 × 10⁷ M⁻¹ was obtained for the interaction with AT and ATT polynucleotides. Furthermore, apart from distinguishing between different types of ds-DNA and ds-RNA, the same compound could recognize GC-rich DNA through distinct induced CD signals.
Key Words: DNA/RNA recognition; molecular dynamics simulations; phenanthridine peptides; spectrophotometric AT- and GC-base pair probe
MESH Headings: Phenanthridines/chemistry; Lysine/chemistry; Peptides/chemistry; DNA/chemistry; DNA/metabolism; Circular Dichroism; RNA/chemistry; Nucleic Acid Conformation
Grants: IP-2018-01-4694 Croatian Science Foundation; IP-2013-11-1477 Croatian Science Foundation
2	The protein domains of vertebrate species in which selection is more effective have greater intrinsic structural disorder. bioRxiv 2024:2023.03.02.530449. [PMID: 38712167 PMCID: PMC11071303 DOI: 10.1101/2023.03.02.530449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract: The nearly neutral theory of molecular evolution posits variation among species in the effectiveness of selection. In an idealized model, the census population size determines both this minimum magnitude of the selection coefficient required for deleterious variants to be reliably purged, and the amount of neutral diversity. Empirically, an "effective population size" is often estimated from the amount of putatively neutral genetic diversity and is assumed to also capture a species' effectiveness of selection. A potentially more direct measure of the effectiveness of selection is the degree to which selection maintains preferred codons. However, past metrics that compare codon bias across species are confounded by among-species variation in %GC content and/or amino acid composition. Here we propose a new Codon Adaptation Index of Species (CAIS), based on Kullback-Leibler divergence, that corrects for both confounders. We demonstrate the use of CAIS correlations, as well as the Effective Number of Codons, to show that the protein domains of more highly adapted vertebrate species evolve higher intrinsic structural disorder.
Grants: R01 GM104040 NIGMS NIH HHS; T32 GM132008 NIGMS NIH HHS
3	Emergence of enhancers at late DNA replicating regions. Nat Commun 2024;15:3451. [PMID: 38658544 PMCID: PMC11043393 DOI: 10.1038/s41467-024-47391-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 03/26/2024] [Indexed: 04/26/2024]
Abstract: Enhancers are fast-evolving genomic sequences that control spatiotemporal gene expression patterns. By examining enhancer turnover across mammalian species and in multiple tissue types, we uncover a relationship between the emergence of enhancers and genome organization as a function of germline DNA replication time. While enhancers are most abundant in euchromatic regions, enhancers emerge almost twice as often in late compared to early germline replicating regions, independent of transposable elements. Using a deep learning sequence model, we demonstrate that new enhancers are enriched for mutations that alter transcription factor (TF) binding. Recently evolved enhancers appear to be mostly neutrally evolving and enriched in eQTLs. They also show more tissue specificity than conserved enhancers, and the TFs that bind to these elements, as inferred by binding sequences, also show increased tissue-specific gene expression. We find a similar relationship with DNA replication time in cancer, suggesting that these observations may be time-invariant principles of genome evolution. Our work underscores that genome organization has a profound impact in shaping mammalian gene regulation.
Key Words: evolutionary genetics; functional genomics; gene regulation; computational biology and bioinformatics
MESH Headings: Enhancer Elements, Genetic; Animals; Humans; DNA Replication; Evolution, Molecular; Transcription Factors/metabolism; Transcription Factors/genetics; Mice; Gene Expression Regulation; Organ Specificity/genetics; Mutation; Genome/genetics; DNA Transposable Elements/genetics
Grants: GNT2009309 Department of Health | National Health and Medical Research Council (NHMRC); DP200100250 Department of Education and Training | Australian Research Council (ARC); Snow Medical
4	Differences in Alu vs L1-rich chromosome bands underpin architectural reorganization of the inactive-X chromosome and SAHFs. bioRxiv 2024:2024.01.09.574742. [PMID: 38260534 PMCID: PMC10802495 DOI: 10.1101/2024.01.09.574742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract: The linear DNA sequence of mammalian chromosomes is organized in large blocks of DNA with similar sequence properties, producing a pattern of dark and light staining bands on mitotic chromosomes. Cytogenetic banding is essentially invariant between people and cell-types and thus may be assumed unrelated to genome regulation. We investigate whether large blocks of Alu-rich R-bands and L1-rich G-bands provide a framework upon which functional genome architecture is built. We examine two models of large-scale chromatin condensation: X-chromosome inactivation and formation of senescence-associated heterochromatin foci (SAHFs). XIST RNA triggers gene silencing but also formation of the condensed Barr Body (BB), thought to reflect cumulative gene silencing. However, we find Alu-rich regions are depleted from the L1-rich BB, supporting it is a dense core but not the entire chromosome. Alu-rich bands are also gene-rich, affirming our earlier findings that genes localize at the outer periphery of the BB. SAHFs similarly form within each territory by coalescence of syntenic L1 regions depleted for highly Alu-rich DNA. Analysis of senescent cell Hi-C data also shows large contiguous blocks of G-band and R-band DNA remodel as a segmental unit. Entire dark-bands gain distal intrachromosomal interactions as L1-rich regions form the SAHF. Most striking is that sharp Alu peaks within R-bands resist these changes in condensation. We further show that Chr19, which is exceptionally Alu rich, fails to form a SAHF. Collective results show regulation of genome architecture corresponding to large blocks of DNA and demonstrate resistance of segments with high Alu to chromosome condensation.
Key Words: Alu; Barr body; Chromosome bands; LINE1; Nuclear structure; Repeats; SAHFs; Senescence; X-inactivation; heterochromatin
Grants: R01 HD091357 NICHD NIH HHS; R01 HD094788 NICHD NIH HHS; R35 GM122597 NIGMS NIH HHS
5	GC heterogeneity reveals sequence-structures evolution of angiosperm ITS2. BMC Plant Biology 2023;23:608. [PMID: 38036992 PMCID: PMC10691020 DOI: 10.1186/s12870-023-04634-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Accepted: 11/26/2023] [Indexed: 12/02/2023]
Abstract:
BACKGROUND: Although GC variation constitutes a fundamental element of genome and species diversity, the precise mechanisms driving it remain unclear. The abundant sequence data available for the ITS2, a commonly employed phylogenetic marker in plants, offers an exceptional resource for exploring the GC variation across angiosperms.
RESULTS: A comprehensive selection of 8666 species, comprising 165 genera, 63 families, and 30 orders, was used for the analyses. The alignment of ITS2 sequence-structures and partitioning of secondary structures into paired and unpaired regions were performed using 4SALE. Substitution rates and frequencies among GC base-pairs in the paired regions of ITS2 were calculated using RNA-specific models in the PHASE package. The results showed that the distribution of ITS2 GC contents on the angiosperm phylogeny was heterogeneous, but their increase was generally associated with ITS2 sequence homogenization, thereby supporting the occurrence of GC-biased gene conversion (gBGC) during the concerted evolution of ITS2. Additionally, the GC content in the paired regions of the ITS2 secondary structure was significantly higher than that of the unpaired regions, indicating the selection of GC for thermodynamic stability. Furthermore, the RNA substitution models demonstrated that base-pair transformations favored both the elevation and fixation of GC in the paired regions, providing further support for gBGC.
CONCLUSIONS: Our findings highlight the significance of secondary structure in GC investigation and demonstrate that both gBGC and structure-based selection are influential factors driving angiosperm ITS2 GC content.
Key Words: GC-biased gene conversion; ITS2 content; Secondary structure; Thermodynamic stability
MESH Headings: Humans; Magnoliopsida/genetics; Phylogeny; Gene Conversion; Base Composition; RNA; Evolution, Molecular
Grants: 82173936 National Natural Science Foundation of China
6	Compositional Structure of the Genome: A Review. Biology 2023;12:849. [PMID: 37372134 DOI: 10.3390/biology12060849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 06/06/2023] [Accepted: 06/07/2023] [Indexed: 06/29/2023]
Abstract: As the genome carries the historical information of a species' biotic and environmental interactions, analyzing changes in genome structure over time by using powerful statistical physics methods (such as entropic segmentation algorithms, fluctuation analysis in DNA walks, or measures of compositional complexity) provides valuable insights into genome evolution. Nucleotide frequencies tend to vary along the DNA chain, resulting in a hierarchically patchy chromosome structure with heterogeneities at different length scales that range from a few nucleotides to tens of millions of them. Fluctuation analysis reveals that these compositional structures can be classified into three main categories: (1) short-range heterogeneities (below a few kilobase pairs (Kbp)) primarily attributed to the alternation of coding and noncoding regions, interspersed or tandem repeats densities, etc.; (2) isochores, spanning tens to hundreds of tens of Kbp; and (3) superstructures, reaching sizes of tens of megabase pairs (Mbp) or even larger. The obtained isochore and superstructure coordinates in the first complete T2T human sequence are now shared in a public database. In this way, interested researchers can use T2T isochore data, as well as the annotations for different genome elements, to check a specific hypothesis about genome structure. Similarly to other levels of biological organization, a hierarchical compositional structure is prevalent in the genome. Once the compositional structure of a genome is identified, various measures can be derived to quantify the heterogeneity of such structure. The distribution of segment G+C content has recently been proposed as a new genome signature that proves to be useful for comparing complete genomes. Another meaningful measure is the sequence compositional complexity (SCC), which has been used for genome structure comparisons. Lastly, we review the recent genome comparisons in species of the ancient phylum Cyanobacteria, conducted by phylogenetic regression of SCC against time, which have revealed positive trends towards higher genome complexity. These findings provide the first evidence for a driven progressive evolution of genome compositional structure.
Key Words: DNA compositional structure; evolutionary adaptive trends; hierarchical genome structure; segment compositional signature; sequence compositional complexity
Grants: AGL2017-88702-C2-2-R Spanish Minister of Science, Innovation and Universities (former Spanish Minister of Economy and Competitiveness); CCA2021-9-77 Stitching Cancer Center Amsterdam; PID2020-116711GB-I00 Spanish Ministerio de Ciencia e Innovación; FQM-362 Spanish Junta de Andalucía
7	Main Factors Shaping Amino Acid Usage Across Evolution. J Mol Evol 2023:10.1007/s00239-023-10120-5. [PMID: 37264211 DOI: 10.1007/s00239-023-10120-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 05/17/2023] [Indexed: 06/03/2023]
Abstract: The standard genetic code determines that in most species, including viruses, there are 20 amino acids that are coded by 61 codons, while the other three codons are stop triplets. Considering the whole proteome, each species features its own amino acid frequencies; given the slow rate of change, closely related species display similar GC content and amino acid usage. In contrast, distantly related species display different amino acid frequencies. Furthermore, within certain multicellular species, such as mammals, intragenomic differences in the usage of amino acids are evident. In this communication, we shall summarize some of the most prominent and well-established factors that determine the differences found in the amino acid usage, both across evolution and intragenomically.
Key Words: Amino acid cost; Amino acids; GC content; GC-skew; Hydropathy; Optimal growth temperature
8	Genes enriched in A/T-ending codons are co-regulated and conserved across mammals. Cell Syst 2023;14:312-323.e3. [PMID: 36889307 DOI: 10.1016/j.cels.2023.02.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/25/2022] [Revised: 07/11/2022] [Accepted: 02/09/2023] [Indexed: 03/09/2023]
Abstract: Codon usage influences gene expression distinctly depending on the cell context. Yet, the importance of codon bias in the simultaneous turnover of specific groups of protein-coding genes remains to be investigated. Here, we find that genes enriched in A/T-ending codons are expressed more coordinately in general and across tissues and development than those enriched in G/C-ending codons. tRNA abundance measurements indicate that this coordination is linked to the expression changes of tRNA isoacceptors reading A/T-ending codons. Genes with similar codon composition are more likely to be part of the same protein complex, especially for genes with A/T-ending codons. The codon preferences of genes with A/T-ending codons are conserved among mammals and other vertebrates. We suggest that this orchestration contributes to tissue-specific and ontogenetic-specific expression, which can facilitate, for instance, timely protein complex formation.
Key Words: A/T-ending codons; RAS genes; co-regulation; codon usage; conservation; development; mammals; synonymous codons; tRNA; translation efficiency
9	Genomic Signature in Evolutionary Biology: A Review. Biology 2023;12:322. [PMID: 36829597 PMCID: PMC9953303 DOI: 10.3390/biology12020322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/19/2023]
Abstract: Organisms are unique physical entities in which information is stored and continuously processed. The digital nature of DNA sequences enables the construction of a dynamic information reservoir. However, the distinction between the hardware and software components in the information flow is crucial to identify the mechanisms generating specific genomic signatures. In this work, we perform a bibliometric analysis to identify the different purposes of looking for particular patterns in DNA sequences associated with a given phenotype. This study has enabled us to make a conceptual breakdown of the genomic signature and differentiate the leading applications. On the one hand, it refers to gene expression profiling associated with a biological function, which may be shared across taxa. This signature is the focus of study in precision medicine. On the other hand, it also refers to characteristic patterns in species-specific DNA sequences. This interpretation plays a key role in comparative genomics, identifying evolutionary relationships. Looking at the relevant studies in our bibliographic database, we highlight the main factors causing heterogeneities in genome composition and how they can be quantified. All these findings lead us to reformulate some questions relevant to evolutionary biology.
Key Words: alignment-free methods; chaos game representation; evolutionary biology; genome sequence; genomic signature
10	Germline mutations directions are different between introns of the same gene: case study of the gene coding for amyloid-beta precursor protein. Genetica 2023;151:61-73. [PMID: 36129589 DOI: 10.1007/s10709-022-00166-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 09/08/2022] [Indexed: 02/01/2023]
Abstract: Amyloid-beta precursor protein (APP) is highly conserved in mammals. This feature allowed us to compare nucleotide usage biases in fourfold degenerated sites along the length of its coding region for 146 species of mammals and birds in search of fragments with significant deviations. Even though cytosine usage has the highest value in fourfold degenerated sites in the APP coding region from all tested placental mammals, in contrast to marsupial mammals with the bias toward thymine usage, the most frequent germline and somatic mutations in the human APP coding region are C to T and G to A transitions. The same mutational AT-pressure is characteristic for germline mutations in introns of the human APP gene. However, surprisingly, there are several exceptional introns with deviations in germline mutation rates. Most of those introns surround exons with exceptional biases in nucleotide usage in fourfold degenerated sites. Existence of such fragments in exons 4 and 5, as well as in exon 14, can be connected with the presence of lncRNA genes in the complementary strand of DNA. Exceptional nucleotide usage bias in exons 16 and 17, which contain a sequence encoding amyloid-beta peptides, can be explained either by the presence of yet unmapped lncRNA(s), or by the autonomous expression of a short mRNA that encodes just the C-terminal part of the APP, providing an alternative source of amyloid-beta peptides. This hypothesis is supported by the increased rate of T to C transitions in introns 16-17 and 17-18 of the human APP gene relative to other introns.
Key Words: Amyloid-beta precursor protein; Autonomous transcription; Beta-amyloid peptides; Germline mutations; Mutational pressure
MESH Headings: Pregnancy; Animals; Female; Humans; Amyloid beta-Protein Precursor/genetics; Introns; Germ-Line Mutation; RNA, Long Noncoding; Base Sequence; Placenta; Mammals/genetics; Nucleotides; Peptides/genetics
11	Genome Evolution and the Future of Phylogenomics of Non-Avian Reptiles. Animals (Basel) 2023;13:471. [PMID: 36766360 PMCID: PMC9913427 DOI: 10.3390/ani13030471] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/16/2022] [Revised: 01/13/2023] [Accepted: 01/15/2023] [Indexed: 02/01/2023]
Abstract: Non-avian reptiles comprise a large proportion of amniote vertebrate diversity, with squamate reptiles (lizards and snakes) recently overtaking birds as the most species-rich tetrapod radiation. Despite displaying an extraordinary diversity of phenotypic and genomic traits, genomic resources in non-avian reptiles have accumulated more slowly than they have in mammals and birds, the remaining amniotes. Here we review the remarkable natural history of non-avian reptiles, with a focus on the physical traits, genomic characteristics, and sequence compositional patterns that comprise key axes of variation across amniotes. We argue that the high evolutionary diversity of non-avian reptiles can fuel a new generation of whole-genome phylogenomic analyses. A survey of phylogenetic investigations in non-avian reptiles shows that sequence capture-based approaches are the most commonly used, with studies of markers known as ultraconserved elements (UCEs) especially well represented. However, many other types of markers exist and are increasingly being mined from genome assemblies in silico, including some with greater information potential than UCEs for certain investigations. We discuss the importance of high-quality genomic resources and methods for bioinformatically extracting a range of marker sets from genome assemblies. Finally, we encourage herpetologists working in genomics, genetics, evolutionary biology, and other fields to work collectively towards building genomic resources for non-avian reptiles, especially squamates, that rival those already in place for mammals and birds. Overall, the development of this cross-amniote phylogenomic tree of life will help illuminate interesting dimensions of biodiversity across non-avian reptiles and broader amniotes.
Key Words: GC content; anonymous loci; genome size; isochores; karyotype; natural history; reduced representation; repetitive elements; sex determination and chromosomes; target capture; ultraconserved elements
12	Stress-induced transcriptional readthrough into neighboring genes is linked to intron retention. iScience 2022;25:105543. [PMID: 36505935 PMCID: PMC9732411 DOI: 10.1016/j.isci.2022.105543] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/16/2022] [Revised: 07/10/2022] [Accepted: 11/07/2022] [Indexed: 11/11/2022]
Abstract: Exposure to certain stresses leads to readthrough transcription. Using polyA-selected RNA-seq in mouse fibroblasts subjected to heat shock, oxidative, or osmotic stress, we found that readthrough transcription can proceed into proximal downstream genes, in a phenomenon previously termed "read-in." We found that read-in genes share distinctive genomic characteristics; they are GC-rich and extremely short, with genomic features conserved in human. Using ribosome profiling, we found that read-in genes show significantly reduced translation. Strikingly, read-in genes demonstrate marked intron retention, mostly in their first introns, which could not be explained solely by their short introns and GC-richness, features often associated with intron retention. Finally, we revealed H3K36me3 enrichment upstream to read-in genes. Moreover, demarcation of exon-intron junctions by H3K36me3 was absent in read-in first introns. Our data portray a relationship between read-in and intron retention, suggesting they may have co-evolved to facilitate reduced translation of read-in genes during stress.
Key Words: Biological sciences; Molecular Genetics; Molecular biology; Molecular interaction
13	The Shift in Synonymous Codon Usage Reveals Similar Genomic Variation during Domestication of Asian and African Rice. Int J Mol Sci 2022;23:12860. [PMID: 36361651 PMCID: PMC9656316 DOI: 10.3390/ijms232112860] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/04/2022] [Revised: 10/20/2022] [Accepted: 10/24/2022] [Indexed: 10/29/2023]
Abstract: The domestication of wild rice occurred together with genomic variation, including the synonymous nucleotide substitutions that result in synonymous codon usage bias (SCUB). SCUB mirrors the evolutionary specialization of plants, but its characteristics during domestication have not yet been addressed. Here, we found cytosine- and guanine-ending (NNC and NNG) synonymous codons (SCs) were more pronounced than adenosine- and thymine-ending SCs (NNA and NNT) in both wild and cultivated species of Asian and African rice. The ratios of NNC/G to NNA/T codons gradually decreased following the rise in the number of introns, and the preference for NNA/T codons became more obvious in genes with more introns in cultivated rice when compared with those in wild rice. SCUB frequencies were heterogeneous across the exons, with a higher preference for NNA/T in internal exons than in terminal exons. The preference for NNA/T in internal but not terminal exons was more predominant in cultivated rice than in wild rice, with the difference between wild and cultivated rice becoming more remarkable with the rise in exon numbers. The difference in the ratios of codon combinations representing DNA methylation-mediated conversion from cytosine to thymine between wild and cultivated rice coincided with their difference in SCUB frequencies, suggesting that SCUB reveals the possible association between genetic and epigenetic variation during the domestication of rice. Similar patterns of SCUB shift in Asian and African rice indicate that genomic variation occurs in the same non-random manner. SCUB representing non-neutral synonymous mutations can provide insight into the mechanism of genomic variation in domestication and can be used for the genetic dissection of agricultural traits in rice and other crops.
Key Words: DNA methylation; domestication; genomic variation; rice; synonymous codon usage bias
MESH Headings: Domestication; Oryza/genetics; Codon Usage; Thymine; Genomics; Codon/genetics; Crops, Agricultural/genetics; Cytosine
Grants: 31870242 National Natural Science Foundation of China; 32170297 National Natural Science Foundation of China; ZR2021ZD32 Key Project of Natural Science Foundation of Shandong; 2020ZX08009-11B National Transgenic Project
14	Alteration of synonymous codon usage bias accompanies polyploidization in wheat. Front Genet 2022;13:979902. [PMID: 36313462 PMCID: PMC9614214 DOI: 10.3389/fgene.2022.979902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Accepted: 10/03/2022] [Indexed: 11/13/2022]
Abstract: The diploidization of polyploid genomes is accompanied by genomic variation, including synonymous nucleotide substitutions that may lead to synonymous codon usage bias (SCUB). SCUB can mirror the evolutionary specialization of plants, but its effect on the formation of polyploids is not well documented. We explored this issue here with hexaploid wheat and its progenitors. Synonymous codons (SCs) ending in either cytosine (NNC) or guanine (NNG) were more frequent than those ending in either adenosine (NNA) or thymine (NNT), and the preference for NNC/G codons followed the increase in genome ploidy. The ratios between NNC/G and NNA/T codons gradually decreased in genes with more introns, and the difference in these ratios between wheat and its progenitors diminished with increasing ploidy. SCUB frequencies were heterogeneous among exons, and the bias favored NNA/T in more internal exons, especially for genes with more exons, while the preference did not appear to be associated with ploidy. The SCUB alteration of the progenitors was different during the formation of hexaploid wheat, so that SCUB was homogeneous among the A, B and D subgenomes. DNA methylation-mediated conversion from cytosine to thymine weakened following the increase of genome ploidy, coinciding with the stronger bias for NNC/G SCs in the genome as a function of ploidy, suggesting that SCUB contributes to the epigenetic variation in hexaploid wheat. The patterns in SCUB mirrored the formation of hexaploid wheat, which provides new insight into genome shock-induced genetic variation during polyploidization. SCs representing non-neutral synonymous mutations can be used for genetic dissection and improvement of agricultural traits of wheat and other polyploids.
Grants: National Natural Science Foundation of China; National Key Project for Research on Transgenic Biology; Natural Science Foundation of Shandong Province
15	Slaying (Yet Again) the Brain-Eating Zombie Called the "Isochore Theory": A Segmentation Algorithm Used to "Confirm" the Existence of Isochores Creates "Isochores" Where None Exist. Int J Mol Sci 2022;23:ijms23126558. [PMID: 35743002 PMCID: PMC9224211 DOI: 10.3390/ijms23126558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Revised: 06/07/2022] [Accepted: 06/09/2022] [Indexed: 01/27/2023] Open Abstract The isochore theory, which was proposed more than 40 years ago, depicts the mammalian genome as a mosaic of long, homogeneous regions that are characterized by their guanine and cytosine (GC) content. The human genome, for instance, was claimed to consist of five compositionally distinct isochore families. The isochore theory, in all its reincarnations, has been repeatedly falsified in the literature, yet isochore proponents have persistently resurrected it by either redefining isochores or by proposing alternative means of testing the theory. Here, I deal with the latest attempt to salvage this seemingly immortal zombie—a sequence segmentation method called isoSegmenter, which was claimed to “identify” isochores while at the same time disregarding the main characteristic attribute of isochores—compositional homogeneity. I used a series of controlled, randomly generated simulated sequences as a benchmark to study the performance of isoSegmenter. The main advantage of using simulated sequences is that, unlike real data, the exact start and stop point of any isochore or homogeneous compositional domain is known. Based on three key performance metrics—sensitivity, precision, and Jaccard similarity index—isoSegmenter was found to be vastly inferior to isoPlotter, a segmentation algorithm with no user input. 
Moreover, isoSegmenter identified isochores where none exist and failed to identify compositionally homogeneous sequences that were shorter than 100–200 kb. Will this zillionth refutation of “isochores” ensure a final and permanent entombment of the isochore theory? This author is not holding his breath. Key Words: GC content; benchmark simulations; isoPlotter; isoSegmenter; isochores; segmentation algorithms.
16	Arginine Depletion in Human Cancers. Cancers (Basel) 2021;13:cancers13246274. [PMID: 34944895 PMCID: PMC8699593 DOI: 10.3390/cancers13246274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2021] [Revised: 12/04/2021] [Accepted: 12/09/2021] [Indexed: 11/25/2022] Open Abstract Simple Summary Thousands of cancer genomes are now publicly available which has led to new insights into the underlying features of cancers. These include the identification of mutational signatures at both nucleotide and amino acid levels. Here, we discuss C > T transitions as a key nucleotide-level mutational signature that leads to a dramatic overrepresentation of arginine substitutions in cancers. We propose that this underlying C > T mutational signature canalizes possible arginine substitution outcomes, favoring histidine, cysteine, glutamine, and tryptophan. This initial asymmetry is then acted on at the amino acid level by purifying selection. Thus, a model of “sequential selection” could explain the documented bias towards arginine substitutions in multiple cancers. Abstract Arginine is encoded by six different codons. Base pair changes in any of these codons can have a broad spectrum of effects including substitutions to twelve different amino acids, eighteen synonymous changes, and two stop codons. Four amino acids (histidine, cysteine, glutamine, and tryptophan) account for over 75% of amino acid substitutions of arginine. This suggests that a mutational bias, or “purifying selection”, mechanism is at work. This bias appears to be driven by C > T and G > A transitions in four of the six arginine codons, a signature that is universal and independent of cancer tissue of origin or histology. 
Here, we provide a review of the available literature and reanalyze publicly available data from the Catalogue of Somatic Mutations in Cancer (COSMIC). Our analysis identifies several genes with an arginine substitution bias. These include known factors such as IDH1, as well as previously unreported genes, including four cancer driver genes (FGFR3, PPP6C, MAX, GNAQ). We propose that base pair substitution bias and amino acid physiology both play a role in purifying selection. This model may explain the documented arginine substitution bias in cancers. Key Words: arginine; cancer; mutation; purifying selection. Grants: 13150773 NIH HHS.
17	Data-driven selection of the number of change-points via error rate control. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1999820] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
18	Application-based guidelines for best practices in plant flow cytometry. Cytometry A 2021;101:749-781. [PMID: 34585818 DOI: 10.1002/cyto.a.24499] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2021] [Revised: 08/10/2021] [Accepted: 08/26/2021] [Indexed: 12/15/2022] Abstract Flow cytometry (FCM) is currently the most widely-used method to establish nuclear DNA content in plants. Since simple, 1-3-parameter, flow cytometers, which are sufficient for most plant applications, are commercially available at a reasonable price, the number of laboratories equipped with these instruments, and consequently new FCM users, has greatly increased over the last decade. This paper meets an urgent need for comprehensive recommendations for best practices in FCM for different plant science applications. We discuss advantages and limitations of establishing plant ploidy, genome size, DNA base composition, cell cycle activity, and level of endoreduplication. Applications of such measurements in plant systematics, ecology, molecular biology research, reproduction biology, tissue cultures, plant breeding, and seed sciences are described. Advice is included on how to obtain accurate and reliable results, as well as how to manage troubleshooting that may occur during sample preparation, cytometric measurements, and data handling. Each section is followed by best practice recommendations; tips as to what specific information should be provided in FCM papers are also provided. Key Words: DNA base composition; DNA content; cell cycle; endoreduplication; flow cytometric seed screening; genome size; in vitro cultures; intraspecific variation; ploidy.
19	Codon Usage Bias: An Endless Tale. J Mol Evol 2021;89:589-593. [PMID: 34383106 DOI: 10.1007/s00239-021-10027-z] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2021] [Accepted: 08/06/2021] [Indexed: 11/28/2022] Abstract Since the genetic code is degenerate, several codons are translated to the same amino acid. Although these triplets were historically considered to be "synonymous" and therefore expected to be used at rather equal frequencies in all genomes, we now know that this is not the case. Indeed, since several coding sequences were obtained in the late '70s and early '80s in the last century, coming from either the same or different species, it was evident that (a) each genome, taken globally, displayed different codon usage patterns, which means that different genomes display a particular global codon usage table when all genes are considered together, and (b) there is a strong intragenomic diversity: in other words, within a given species the codon usage pattern can (and usually does) differ greatly among genes in the same genome. These different patterns were attributed to two main factors: first, the mutational bias characteristic of each genome, which determines that GC-poor species display a general bias towards A/T codons while the reverse is true for GC-rich species. Second, the differences in codon usage among genes from the same species are due to natural selection acting at the level of translation, in such a way that highly expressed genes tend to use codons that match with the most abundant isoacceptor tRNAs. Thus, these genes are translated at a highest rate, which in turn leads to avoid the limiting factor in translation which is the number of available ribosomes per cell. Although these explanations are still valid, new factors are almost constantly postulated to affect codon usage. 
In this mini review, we shall try to summarize them.
20	Development of shiny dashboard application for “genome-wide association study on analysis of SNPs injected in Homo sapiens genome (snips-HsG)”. GENE REPORTS 2021. [DOI: 10.1016/j.genrep.2021.101033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
21	Neutralism versus selectionism: Chargaff's second parity rule, revisited. Genetica 2021;149:81-88. [PMID: 33880685 PMCID: PMC8057000 DOI: 10.1007/s10709-021-00119-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Accepted: 04/09/2021] [Indexed: 11/03/2022] Abstract Of Chargaff's four "rules" on DNA base frequencies, the functional interpretation of his second parity rule (PR2) is the most contentious. Thermophile base compositions (GC%) were taken by Galtier and Lobry (1997) as favoring Sueoka's neutral PR2 hypothesis over Forsdyke's selective PR2 hypothesis, namely that mutations improving local within-species recombination efficiency had generated a genome-wide potential for the strands of duplex DNA to separate and initiate recombination through the "kissing" of the tips of stem-loops. However, following Chargaff's GC rule, base composition mainly reflects a species-specific, genome-wide, evolutionary pressure. GC% could not have consistently followed the dictates of temperature, since it plays fundamental roles in both sustaining species integrity and, through primarily neutral genome-wide mutation, fostering speciation. Evidence for a local within-species recombination-initiating role of base order was obtained with a novel technology that masked the contribution of base composition to nucleic acid folding energy. Forsdyke's results were consistent with his PR2 hypothesis, appeared to resolve some root problems in biology and provided a theoretical underpinning for alignment-free taxonomic analyses using relative oligonucleotide frequencies (k-mer analysis). Moreover, consistent with Chargaff's cluster rule, discovery of the thermoadaptive role of the "purine-loading" of open reading frames made less tenable the Galtier-Lobry anti-selectionist arguments. 
Key Words: Base composition; Purine-loading; Speciation; Stem-loops; Taxonomy; Thermoadaptation.
22	The "Genomic Code": DNA Pervasively Moulds Chromatin Structures Leaving no Room for "Junk". Life (Basel) 2021;11:342. [PMID: 33924668 PMCID: PMC8070607 DOI: 10.3390/life11040342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2021] [Revised: 04/06/2021] [Accepted: 04/07/2021] [Indexed: 02/07/2023] Open Abstract The chromatin of the human genome was analyzed at three DNA size levels. At the first, compartment level, two "gene spaces" were found many years ago: A GC-rich, gene-rich "genome core" and a GC-poor, gene-poor "genome desert", the former corresponding to open chromatin centrally located in the interphase nucleus, the latter to closed chromatin located peripherally. This bimodality was later confirmed and extended by the discoveries (1) of LADs, the Lamina-Associated Domains, and InterLADs; (2) of two "spatial compartments", A and B, identified on the basis of chromatin interactions; and (3) of "forests and prairies" characterized by high and low CpG islands densities. Chromatin compartments were shown to be associated with the compositionally different, flat and single- or multi-peak DNA structures of the two, GC-poor and GC-rich, "super-families" of isochores. At the second, sub-compartment, level, chromatin corresponds to flat isochores and to isochore loops (due to compositional DNA gradients) that are susceptible to extrusion. Finally, at the short-sequence level, two sets of sequences, GC-poor and GC-rich, define two different nucleosome spacings, a short one and a long one. In conclusion, chromatin structures are moulded according to a "genomic code" by DNA sequences that pervade the genome and leave no room for "junk". Key Words: Genomic code; chromatin structure; junk DNA.
23	Polyploidization is accompanied by synonymous codon usage bias in the chloroplast genomes of both cotton and wheat. PLoS One 2020;15:e0242624. [PMID: 33211753 PMCID: PMC7676672 DOI: 10.1371/journal.pone.0242624] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2020] [Accepted: 11/05/2020] [Indexed: 11/27/2022] Open Abstract Synonymous codon usage bias (SCUB) of both nuclear and organellar genes can mirror the evolutionary specialization of plants. The polyploidization process exposes the nucleus to genomic shock, a syndrome which promotes, among other genetic variants, SCUB. Its effect on organellar genes has not, however, been widely addressed. The present analysis targeted the chloroplast genomes of two leading polyploid crop species, namely cotton and bread wheat. The frequency of codons in the chloroplast genomes ending in either adenosine (NNA) or thymine (NNT) proved to be higher than those ending in either guanine or cytosine (NNG or NNC), and this difference was conserved when comparisons were made between polyploid and diploid forms in both the cotton and wheat taxa. Preference for NNA/T codons was heterogeneous among genes with various numbers of introns and was also differential among the exons. SCUB patterns distinguished tetraploid cotton from its diploid progenitor species, as well as bread wheat from its diploid/tetraploid progenitor species, indicating that SCUB in the chloroplast genome partially mirrors the formation of polyploidies. MESH Headings: Codon; Codon Usage; Exons; Genome, Chloroplast; Gossypium/genetics; Polyploidy; Triticum/genetics. Grants: Key Technology Research and Development Program of Shandong; National Natural Science Foundation of China.
24	GC-content biases in protein-coding genes act as an "mRNA identity" feature for nuclear export. Bioessays 2020;43:e2000197. [PMID: 33165929 DOI: 10.1002/bies.202000197] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 01/11/2023] Abstract It has long been observed that human protein-coding genes have a particular distribution of GC-content: the 5' end of these genes has high GC-content while the 3' end has low GC-content. In 2012, it was proposed that this pattern of GC-content could act as an mRNA identity feature that would lead to it being better recognized by the cellular machinery to promote its nuclear export. In contrast, junk RNA, which largely lacks this feature, would be retained in the nucleus and targeted for decay. Now two recent papers have provided evidence that GC-content does promote the nuclear export of many mRNAs in human cells. Key Words: NXF1; TREX; junk RNA; lncRNA; mRNA; mRNA nuclear export; nuclear pore complex; transcriptional noise.
25	Mycobacterium lepromatosis genome exhibits unusually high CpG dinucleotide content and selection is key force in shaping codon usage. INFECTION GENETICS AND EVOLUTION 2020;84:104399. [PMID: 32512206 DOI: 10.1016/j.meegid.2020.104399] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/12/2019] [Revised: 05/30/2020] [Accepted: 06/03/2020] [Indexed: 01/06/2023] Abstract Mycobacterium lepromatosis was identified as a causative agent for leprosy in the year 2008 in the United States and later more cases were identified in Canada, Singapore, Brazil, and Myanmar. It is known to cause diffuse lepromatosis leprosy among humans. Since it is invasive, the mortality rates are higher in comparison to the M. leprae. At genomic level, there exists 90.9% similarity between M. lepromatosis and M. leprae. Codon usage analysis based on analyses of 228 coding sequences (CDSs) of M. lepromatosis, revealed that the genome is GC rich. Among the total 16 dinucleotides, CpG dinucleotide possesses the highest dinucleotide frequency in M. lepromatosis, that is strikingly an unobvious observation since higher CpG is associated with higher proinflammatory cytokine production and NF-κB activation that eventually leads to high pathogenicity. To evade immune response, CpG content is generally less in pathogens. The unusually high CpG content can be explained by the fact that the nucleotide composition of M. lepromatosis is CG rich. Various forces interplay to shape codon usage pattern of any organism including selection; mutation, nucleotide composition as well as GC biased gene conversion. To understand the interplay between various forces; neutrality, parity, Nc-GC3 (Effective number of codons-GC content at 3rd position of the codon), aromaticity (AROMO) and the general average hydropathicity score (GRAVY) analyses have been carried out. 
The analyses revealed that selection is the major contributing force. Along with selection, mutation, nucleotide composition, and GC-biased gene conversion also play a role in shaping codon usage bias in M. lepromatosis. This is the first report on codon usage in M. lepromatosis. Key Words: CAI; Codon usage; Compositional constraints; GC biased gene conversion; Mycobacterium lepromatosis; RSCU.
26	Sex Differences in the Recombination Landscape. Am Nat 2020;195:361-379. [PMID: 32017625 PMCID: PMC7537610 DOI: 10.1086/704943] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Abstract Sex differences in overall recombination rates are well known, but little theoretical or empirical attention has been given to how and why sexes differ in their recombination landscapes: the patterns of recombination along chromosomes. In the first scientific review of this phenomenon, we find that recombination is biased toward telomeres in males and more uniformly distributed in females in most vertebrates and many other eukaryotes. Notable exceptions to this pattern exist, however. Fine-scale recombination patterns also frequently differ between males and females. The molecular mechanisms responsible for sex differences remain unclear, but chromatin landscapes play a role. Why these sex differences evolve also is unclear. Hypotheses suggest that they may result from sexually antagonistic selection acting on coding genes and their regulatory elements, meiotic drive in females, selection during the haploid phase of the life cycle, selection against aneuploidy, or mechanistic constraints. No single hypothesis, however, can adequately explain the evolution of sex differences in all cases. Sex-specific recombination landscapes have important consequences for population differentiation and sex chromosome evolution. Key Words: heterochiasmy; meiotic drive; recombination; sex chromosomes. MESH Headings: Animals; Biological Evolution; Chromosomes/genetics; Crossing Over, Genetic; Epigenesis, Genetic; Female; Humans; Male; Meiosis; Plants/genetics; Recombination, Genetic; Sex Characteristics. Grants: R01 GM116853 NIGMS NIH HHS.
27	The Genomic Code: A Pervasive Encoding/Molding of Chromatin Structures and a Solution of the "Non-Coding DNA" Mystery. Bioessays 2019;41:e1900106. [PMID: 31701567 DOI: 10.1002/bies.201900106] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2019] [Revised: 08/07/2019] [Indexed: 12/15/2022] Abstract Recent investigations have revealed 1) that the isochores of the human genome group into two super-families characterized by two different long-range 3D structures, and 2) that these structures, essentially based on the distribution and topology of short sequences, mold primary chromatin domains (and define nucleosome binding). More specifically, GC-poor, gene-poor isochores are low-heterogeneity sequences with oligo-A spikes that mold the lamina-associated domains (LADs), whereas GC-rich, gene-rich isochores are characterized by single or multiple GC peaks that mold the topologically associating domains (TADs). The formation of these "primary TADs" may be followed by extrusion under the action of cohesin and CTCF. Finally, the genomic code, which is responsible for the pervasive encoding and molding of primary chromatin domains (LADs and primary TADs, namely the "gene spaces"/"spatial compartments") resolves the longstanding problems of "non-coding DNA," "junk DNA," and "selfish DNA" leading to a new vision of the genome as shaped by DNA sequences. Key Words: chromatin folding principles; isochores; lamina-associated domains; topologically associating domains. MESH Headings: Animals; Cell Cycle Proteins/metabolism; Chromatin/metabolism; Chromosomal Proteins, Non-Histone/metabolism; DNA/genetics; DNA/metabolism; Genome, Human/genetics; Genomics/methods; Humans; Isochores/metabolism; Cohesins.
28	Chromatin structure changes during various processes from a DNA sequence view. Curr Opin Struct Biol 2019;62:1-8. [PMID: 31765966 DOI: 10.1016/j.sbi.2019.10.010] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2019] [Revised: 10/14/2019] [Accepted: 10/28/2019] [Indexed: 12/19/2022] Abstract Chromatin mainly consists of protein and DNA, and the sequence information of DNA contributes to controlling the spatial structure of chromatin. Genome-wide contact patterns of chromosome at high precision uncover fine structural properties, conducive to exploring underlying mechanisms on structure establishment and function realization for chromatin. In this short review, we describe changes of chromatin structure during various biological processes from a DNA sequence view, with an increase of the overall domain segregation from birth to senescence and establishment of cell identity related cross-domain contacts. Segregation patterns vary with cell stage and genomic distance. Meanwhile, possible effects of cell cycle, temperature, nuclear lamina and nucleolus on chromatin structure are discussed. Finally, important roles of transcription factors and other proteins in proper chromatin organization are also discussed. MESH Headings: Animals; Base Sequence; Cell Differentiation; Cellular Senescence; Chromatin/chemistry; Chromosome Segregation; DNA/chemistry; Humans.
29	From 1D sequence to 3D chromatin dynamics and cellular functions: a phase separation perspective. Nucleic Acids Res 2019;46:9367-9383. [PMID: 30053116 PMCID: PMC6182157 DOI: 10.1093/nar/gky633] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2018] [Accepted: 07/12/2018] [Indexed: 11/28/2022] Open Abstract The high-order chromatin structure plays a non-negligible role in gene regulation. However, the mechanism, especially the sequence dependence for the formation of varied chromatin structures in different cells remains to be elucidated. As the nucleotide distributions in human and mouse genomes are highly uneven, we identified CGI (CpG island) forest and prairie genomic domains based on CGI densities of a species, dividing the genome into two sequentially, epigenetically, and transcriptionally distinct regions. These two megabase-sized domains also spatially segregate to different extents in different cell types. Forests and prairies show enhanced segregation from each other in development, differentiation, and senescence, meanwhile the multi-scale forest-prairie spatial intermingling is cell-type specific and increases in differentiation, helping to define cell identity. We propose that the phase separation of the 1D mosaic sequence in space serves as a potential driving force, and together with cell type specific epigenetic marks and transcription factors, shapes the chromatin structure in different cell types. The mosaicity in genome of different species in terms of forests and prairies could relate to observations in their biological processes like development and aging. In this way, we provide a bottoms-up theory to explain the chromatin structural and epigenetic changes in different processes.
30	Codon Usage Differences among Genes Expressed in Different Tissues of Drosophila melanogaster. Genome Biol Evol 2019;11:1054-1065. [PMID: 30859203 PMCID: PMC6456009 DOI: 10.1093/gbe/evz051] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/08/2019] [Indexed: 12/22/2022] Open Abstract Codon usage patterns are affected by both mutational biases and translational selection. The frequency at which each codon is used in the genome is directly linked to the cellular concentrations of their corresponding tRNAs. Transfer RNA abundances—as well as the abundances of other potentially relevant factors, such as RNA-binding proteins—may vary across different tissues, making it possible that genes expressed in different tissues are subject to different translational selection regimes, and thus differ in their patterns of codon usage. These differences, however, are poorly understood, having been studied only in Arabidopsis, rice and human, with controversial results in human. Drosophila melanogaster is a suitable model organism to study tissue-specific codon adaptation given its large effective population size. Here, we compare 2,046 genes, each expressed specifically in one tissue of D. melanogaster. We show that genes expressed in different tissues exhibit significant differences in their patterns of codon usage, and that these differences are only partially due to differences in GC content, expression levels, or protein lengths. Remarkably, these differences are stronger when analyses are restricted to highly expressed genes. Our results strongly suggest that genes expressed in different tissues are subject to different regimes of translational selection. 
Key Words: GC content; codon usage; expression; multivariate analysis; tissue specificity.
31	Conditional Approximate Bayesian Computation: A New Approach for Across-Site Dependency in High-Dimensional Mutation-Selection Models. Mol Biol Evol 2019;35:2819-2834. [PMID: 30203003 DOI: 10.1093/molbev/msy173] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open Abstract A key question in molecular evolutionary biology concerns the relative roles of mutation and selection in shaping genomic data. Moreover, features of mutation and selection are heterogeneous along the genome and over time. Mechanistic codon substitution models based on the mutation-selection framework are promising approaches to separating these effects. In practice, however, several complications arise, since accounting for such heterogeneities often implies handling models of high dimensionality (e.g., amino acid preferences), or leads to across-site dependence (e.g., CpG hypermutability), making the likelihood function intractable. Approximate Bayesian Computation (ABC) could address this latter issue. Here, we propose a new approach, named Conditional ABC (CABC), which combines the sampling efficiency of MCMC and the flexibility of ABC. To illustrate the potential of the CABC approach, we apply it to the study of mammalian CpG hypermutability based on a new mutation-level parameter implying dependence across adjacent sites, combined with site-specific purifying selection on amino-acids captured by a Dirichlet process. Our proof-of-concept of the CABC methodology opens new modeling perspectives. Our application of the method reveals a high level of heterogeneity of CpG hypermutability across loci and mild heterogeneity across taxonomic groups; and finally, we show that CpG hypermutability is an important evolutionary factor in rendering relative synonymous codon usage. 
All source code is available as a GitHub repository (https://github.com/Simonll/LikelihoodFreePhylogenetics.git).
32	Genome-wide nucleotide patterns and potential mechanisms of genome divergence following domestication in maize and soybean. Genome Biol 2019;20:74. [PMID: 31018867 PMCID: PMC6482504 DOI: 10.1186/s13059-019-1683-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2018] [Accepted: 03/28/2019] [Indexed: 01/21/2023] Open Abstract BACKGROUND Plant domestication provides a unique model to study genome evolution. Many studies have been conducted to examine genes, genetic diversity, genome structure, and epigenome changes associated with domestication. Interestingly, domesticated accessions have significantly higher [A] and [T] values across genome-wide polymorphic sites than accessions sampled from the corresponding progenitor species. However, the relative contributions of different genomic regions to this genome divergence pattern and underlying mechanisms have not been well characterized. RESULTS Here, we investigate the genome-wide base-composition patterns by analyzing millions of SNPs segregating among 100 accessions from a teosinte-maize comparison set and among 302 accessions from a wild-domesticated soybean comparison set. We show that non-genic part of the genome has a greater contribution than genic SNPs to the [AT]-increase observed between wild and domesticated accessions in maize and soybean. The separation between wild and domesticated accessions in [AT] values is significantly enlarged in non-genic and pericentromeric regions. Motif frequency and sequence context analyses show the motifs (PyCG) related to solar-UV signature are enriched in these regions, particularly when they are methylated. Additional analysis using population-private SNPs also implicates the role of these motifs in relatively recent mutations. 
With base-composition across polymorphic sites as a genome phenotype, genome scans identify a set of putative candidate genes involved in UV damage repair pathways. CONCLUSIONS The [AT]-increase is more pronounced in genomic regions that are non-genic, pericentromeric, transposable elements; methylated; and with low recombination. Our findings establish important links among UV radiation, mutation, DNA repair, methylation, and genome evolution. Key Words: Base composition; Domestication; Evolution; Genome divergence; Methylation; Mutation; Solar UV; UV damage repair. MESH Headings: Base Composition; Domestication; Mutation; Polymorphism, Single Nucleotide; Glycine max/genetics; Glycine max/radiation effects; Sunlight; Zea mays/genetics; Zea mays/radiation effects. Grants: Directorate for Biological Sciences.
33	Fine-Grained Analysis of Spontaneous Mutation Spectrum and Frequency in Arabidopsis thaliana. Genetics 2018;211:703-714. [PMID: 30514707 PMCID: PMC6366913 DOI: 10.1534/genetics.118.301721] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2018] [Accepted: 11/29/2018] [Indexed: 01/17/2023] Open Abstract Mutations are the ultimate source of all genetic variation. However, few direct estimates of the contribution of mutation to molecular genetic variation are available. To address this issue, we first analyzed the rate and spectrum of mutations in the Arabidopsis thaliana reference accession after 25 generations of single-seed descent. We then compared the mutation profile in these mutation accumulation (MA) lines against genetic variation observed in the 1001 Genomes Project. The estimated haploid single nucleotide mutation (SNM) rate for A. thaliana is 6.95 × 10⁻⁹ (SE ± 2.68 × 10⁻¹⁰) per site per generation, with SNMs having higher frequency in transposable elements (TEs) and centromeric regions. The estimated indel mutation rate is 1.30 × 10⁻⁹ (±1.07 × 10⁻¹⁰) per site per generation, with deletions being more frequent and larger than insertions. Among the 1694 unique SNMs identified in the MA lines, the positions of 389 SNMs (23%) coincide with biallelic SNPs from the 1001 Genomes population, and in 289 (17%) cases the changes are identical. Of the 329 unique indels identified in the MA lines, 96 (29%) overlap with indels from the 1001 Genomes dataset, and 16 indels (5% of the total) are identical. These overlap frequencies are significantly higher than expected, suggesting that de novo mutations are not uniformly distributed and arise at polymorphic sites more frequently than assumed. 
These results suggest that a high mutation rate potentially contributes to high polymorphism, and a low mutation rate to reduced polymorphism, in natural populations, providing insight into the mutational inputs that generate natural genetic diversity. Key Words: Arabidopsis thaliana; indel; mutation accumulation line; mutation rate; natural polymorphism; transposable element.
34	Evidence for Strong Fixation Bias at 4-fold Degenerate Sites Across Genes in the Great Tit Genome. Front Ecol Evol 2018. [DOI: 10.3389/fevo.2018.00203] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
35	Genome assembly of the Pink Ipê (Handroanthus impetiginosus, Bignoniaceae), a highly valued, ecologically keystone Neotropical timber forest tree. Gigascience 2018;7:1-16. [PMID: 29253216 PMCID: PMC5905499 DOI: 10.1093/gigascience/gix125] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2017] [Accepted: 11/30/2017] [Indexed: 12/30/2022] Open Abstract Background Handroanthus impetiginosus (Mart. ex DC.) Mattos is a keystone Neotropical hardwood tree widely distributed in seasonally dry tropical forests of South and Mesoamerica. Regarded as the “new mahogany,” it is the second most expensive timber, the most logged species in Brazil, and currently under significant illegal trading pressure. The plant produces large amounts of quinoids, specialized metabolites with documented antitumorous and antibiotic effects. The development of genomic resources is needed to better understand and conserve the diversity of the species, to empower forensic identification of the origin of timber, and to identify genes for important metabolic compounds. Findings The genome assembly covers 503.7 Mb (N50 = 81 316 bp), 90.4% of the 557-Mbp genome, with 13 206 scaffolds. A repeat database with 1508 sequences was developed, allowing masking of ∼31% of the assembly. Depth of coverage indicated that consensus determination adequately removed haplotypes assembled separately due to the extensive heterozygosity of the species. Automatic gene prediction provided 31 688 structures and 35 479 messenger RNA transcripts, while external evidence supported a well-curated set of 28 603 high-confidence models (90% of total). Finally, we used the genomic sequence and the comprehensive gene content annotation to identify genes related to the production of specialized metabolites. 
Conclusions This genome assembly is the first well-curated resource for a Neotropical forest tree and the first one for a member of the Bignoniaceae family, opening exceptional opportunities to empower molecular, phytochemical, and breeding studies. This work should inspire the development of similar genomic resources for the largely neglected forest trees of the mega-diverse tropical biomes. Key Words: Bignoniaceae; RNA-Seq; heterozygous genome; quinoids; transposable elements.
36	Statistical Estimation of Parameters for Binary Conditionally Nonlinear Autoregressive Time Series. MATHEMATICAL METHODS OF STATISTICS 2018. [DOI: 10.3103/s1066530718020023] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
37	Avian Influenza Virus PB1 Gene in H3N2 Viruses Evolved in Humans To Reduce Interferon Inhibition by Skewing Codon Usage toward Interferon-Altered tRNA Pools. mBio 2018;9:mBio.01222-18. [PMID: 29970470 PMCID: PMC6030557 DOI: 10.1128/mbio.01222-18] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023] Open Abstract Influenza A viruses cause an annual contagious respiratory disease in humans and are responsible for periodic high-mortality human pandemics. Pandemic influenza A viruses usually result from the reassortment of gene segments between human and avian influenza viruses. These avian influenza virus gene segments need to adapt to humans. Here we focus on the human adaptation of the synonymous codons of the avian influenza virus PB1 gene of the 1968 H3N2 pandemic virus. We generated recombinant H3N2 viruses differing only in codon usage of PB1 mRNA and demonstrated that codon usage of the PB1 mRNA of recent H3N2 virus isolates enhances replication in interferon (IFN)-treated human cells without affecting replication in untreated cells, thereby partially alleviating the interferon-induced antiviral state. High-throughput sequencing of tRNA pools explains the reduced inhibition of replication by interferon: the levels of some tRNAs differ between interferon-treated and untreated human cells, and evolution of the codon usage of H3N2 PB1 mRNA is skewed toward interferon-altered human tRNA pools. Consequently, the avian influenza virus-derived PB1 mRNAs of modern H3N2 viruses have acquired codon usages that better reflect tRNA availabilities in IFN-treated cells. Our results indicate that the change in tRNA availabilities resulting from interferon treatment is a previously unknown aspect of the antiviral action of interferon, which has been partially overcome by human-adapted H3N2 viruses. 
Pandemic influenza A viruses that cause high human mortality usually result from reassortment of gene segments between human and avian influenza viruses. These avian influenza virus gene segments need to adapt to humans. Here we focus on the human adaptation of the avian influenza virus PB1 gene that was incorporated into the 1968 H3N2 pandemic virus. We demonstrate that the coding sequence of the PB1 mRNA of modern H3N2 viruses enhances replication in human cells in which interferon has activated a potent antiviral state. Reduced interferon inhibition results from evolution of PB1 mRNA codons skewed toward the pools of tRNAs in interferon-treated human cells, which, as shown here, differ significantly from the tRNA pools in untreated human cells. Consequently, avian influenza virus-derived PB1 mRNAs of modern H3N2 viruses have acquired codon usages that better reflect tRNA availabilities in IFN-treated cells and are translated more efficiently. Key Words: codon usage; evolution; influenza A virus; influenza PB1 protein; interferon.
38	Protein intrinsic disorder negatively associates with gene age in different eukaryotic lineages. MOLECULAR BIOSYSTEMS 2018;13:2044-2055. [PMID: 28783193 DOI: 10.1039/c7mb00230k] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Abstract The emergence of new protein-coding genes in a specific lineage or species provides raw materials for evolutionary adaptations. Until recently, the biology of new genes emerging particularly from non-genic sequences remained unexplored. Although the new genes are subjected to variable selection pressure and face rapid deletion, some of them become functional and are retained in the gene pool. To acquire functional novelties, new genes often get integrated into the pre-existing ancestral networks. However, the mechanism by which young proteins acquire novel interactions remains unanswered till date. Since structural orientation contributes hugely to the mode of proteins' physical interactions, in this regard, we put forward an interesting question - Do new genes encode proteins with stable folds? Addressing the question, we demonstrated that the intrinsic disorder inversely correlates with the evolutionary gene ages - i.e. young proteins are richer in intrinsic disorder than the ancient ones. We further noted that young proteins, which are initially poorly connected hubs, prefer to be structurally more disordered than well-connected ancient proteins. The phenomenon strikingly defies the usual trend of well-connected proteins being highly disordered in structure. We justified that structural disorder might help poorly connected young proteins to undergo promiscuous interactions, which provides the foundation for novel protein interactions. The study focuses on the evolutionary perspectives of young proteins in the light of structural adaptations. 
39	Large scale variation in the rate of germ-line de novo mutation, base composition, divergence and diversity in humans. PLoS Genet 2018;14:e1007254. [PMID: 29590096 PMCID: PMC5891062 DOI: 10.1371/journal.pgen.1007254] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2017] [Revised: 04/09/2018] [Accepted: 02/13/2018] [Indexed: 01/17/2023] Open Abstract It has long been suspected that the rate of mutation varies across the human genome at a large scale based on the divergence between humans and other species. However, it is now possible to directly investigate this question using the large number of de novo mutations (DNMs) that have been discovered in humans through the sequencing of trios. We investigate a number of questions pertaining to the distribution of mutations using more than 130,000 DNMs from three large datasets. We demonstrate that the amount and pattern of variation differ between datasets at the 1MB and 100KB scales, probably as a consequence of differences in sequencing technology and processing. In particular, datasets show different patterns of correlation to genomic variables such as replication time. Nevertheless, there are many commonalities between datasets, which likely represent true patterns. We show that there is variation in the mutation rate at the 100KB, 1MB and 10MB scale that cannot be explained by variation at smaller scales; however, the level of this variation is modest at large scales: at the 1MB scale we infer that ~90% of regions have a mutation rate within 50% of the mean. Different types of mutation show similar levels of variation and appear to vary in concert, which suggests the pattern of mutation is relatively constant across the genome. 
We demonstrate that variation in the mutation rate does not generate large-scale variation in GC-content, and hence that mutation bias does not maintain the isochore structure of the human genome. We find that genomic features explain less than 40% of the explainable variance in the rate of DNM. As expected, the rate of divergence between species is correlated to the rate of DNM. However, the correlations are weaker than expected if all the variation in divergence was due to variation in the mutation rate. We provide evidence that this is due to the effect of biased gene conversion on the probability that a mutation will become fixed. In contrast to divergence, we find that most of the variation in diversity can be explained by variation in the mutation rate. Finally, we show that the correlation between divergence and DNM density declines as increasingly divergent species are considered. MESH Headings: Animals; Base Composition; Datasets as Topic; Gene Conversion; Genetic Variation; Genome, Human; Germ-Line Mutation; Humans.
40	Distinct patterns of simple sequence repeats and GC distribution in intragenic and intergenic regions of primate genomes. Aging (Albany NY) 2017;8:2635-2654. [PMID: 27644032 PMCID: PMC5191860 DOI: 10.18632/aging.101025] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2016] [Accepted: 08/22/2016] [Indexed: 01/23/2023] Abstract As the first systematic examination of simple sequence repeats (SSRs) and guanine-cytosine (GC) distribution in intragenic and intergenic regions of ten primates, our study showed that SSRs and GC displayed nonrandom distribution for both intragenic and intergenic regions, suggesting that they have potential roles in transcriptional or translational regulation. Our results suggest that the majority of SSRs are distributed in non-coding regions, such as the introns, TEs, and intergenic regions. In these primates, trinucleotide perfect (P) SSRs were the most abundant repeat type in the 5'UTRs and CDSs, whereas mononucleotide P-SSRs were the most abundant in the introns, 3'UTRs, TEs, and intergenic regions. The GC-contents varied greatly among different intragenic and intergenic regions: 5'UTRs > CDSs > 3'UTRs > TEs > introns > intergenic regions, and high GC-content was frequently distributed in exon-rich regions. Our results also showed that, within the same intragenic and intergenic regions, the distributions of GC-content were highly similar across the different primates. Tri- and hexanucleotide P-SSRs had the highest GC-contents in the 5'UTRs and CDSs, whereas mononucleotide P-SSRs had the lowest GC-contents in the six genomic regions of these primates. The most frequent motifs of each length varied markedly among the different genomic regions. 
Key Words: GC; Simple sequence repeats; genomic regions; patterns; primate genomes.
41	Mutational Biases and GC-Biased Gene Conversion Affect GC Content in the Plastomes of Dendrobium Genus. Int J Mol Sci 2017;18:E2307. [PMID: 29099062 PMCID: PMC5713276 DOI: 10.3390/ijms18112307] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2017] [Revised: 09/27/2017] [Accepted: 10/20/2017] [Indexed: 01/03/2023] Open Abstract The variation of GC content is a key genome feature because it is associated with fundamental elements of genome organization. However, the reason for this variation is still an open question. Different kinds of hypotheses have been proposed to explain the variation of GC content during genome evolution. However, these hypotheses have not been explicitly investigated in whole plastome sequences. Dendrobium is one of the largest genera of orchids. Evolutionary studies of the plastomic organization and base composition are limited in this genus. In this study, we obtained the high-quality plastome sequences of D. loddigesii and D. devonianum. The comparison results showed a nearly identical organization in Dendrobium plastomes, indicating that the plastomic organization is highly conserved in Dendrobium genus. Furthermore, the impact of three evolutionary forces (selection, mutational biases, and GC-biased gene conversion, gBGC) on the variation of GC content in Dendrobium plastomes was evaluated. Our results revealed: (1) consistent GC content evolution trends and mutational biases in single-copy (SC) and inverted repeats (IRs) regions; and (2) that gBGC has influenced the plastome-wide GC content evolution. These results suggest that both mutational biases and gBGC affect GC content in the plastomes of Dendrobium genus. 
Key Words: Dendrobium; GC-biased gene conversion (gBGC); GCeq; mutational biases; plastome assembly; selection. MESH Headings: Base Composition; Dendrobium/genetics; Evolution, Molecular; Gene Conversion; Genome, Plastid; Mutation; Phylogeny; Plastids/genetics.
42	High GC content causes orphan proteins to be intrinsically disordered. PLoS Comput Biol 2017;13:e1005375. [PMID: 28355220 PMCID: PMC5389847 DOI: 10.1371/journal.pcbi.1005375] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2016] [Revised: 04/12/2017] [Accepted: 01/21/2017] [Indexed: 01/29/2023] Open Abstract De novo creation of protein coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should for instance not aggregate. Therefore, although the creation of short ORFs could be truly random, the fixation should be subjected to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge no valid explanation for this difference has been proposed. To solve this riddle we studied structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in the properties between proteins of different ages. However, when we take the GC content into account we noted that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for fixation of orphan proteins. Instead these proteins largely resemble random proteins given a particular GC level. 
During evolution the properties of a protein change faster than the GC level causing the relationship between disorder and GC to gradually weaken. We show that the GC content of a genome is of great importance for the properties of an orphan protein. GC content affects the frequency of the codons and this affects the probability for each amino acid to be included in a de novo created protein. The codons encoding for Ala, Pro and Gly contain 80% GC, while codons for Lys, Phe, Asn, Tyr and Ile contain 20% or less. The three high GC amino acids are all disorder promoting, while Phe, Tyr and Ile are order promoting. Therefore, random protein sequences at a high GC will be more disordered than the ones created at a low GC. The structural properties of the youngest proteins match to a large degree the properties of random proteins when the GC content is taken into account. In contrast, structural properties of ancient proteins only show a weak correlation with GC content. This suggests that even after fixation in the population, proteins largely resemble random proteins given a certain GC content. Thereafter, during evolution the correlation between structural properties and GC weakens. MESH Headings: Animals; Base Composition; Computational Biology; Databases, Protein; Drosophila Proteins/chemistry; Drosophila Proteins/genetics; Evolution, Molecular; Gene Ontology; Intrinsically Disordered Proteins/chemistry; Intrinsically Disordered Proteins/genetics; Open Reading Frames; Phylogeny; Saccharomyces cerevisiae Proteins/chemistry; Saccharomyces cerevisiae Proteins/genetics; Selection, Genetic; Structural Homology, Protein. Grants: Vetenskapsrådet; BILS; SNIC.
43	Migration of mitochondrial DNA in the nuclear genome of colorectal adenocarcinoma. Genome Med 2017;9:31. [PMID: 28356157 PMCID: PMC5370490 DOI: 10.1186/s13073-017-0420-6] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2016] [Accepted: 03/09/2017] [Indexed: 12/31/2022] Open Abstract Background Colorectal adenocarcinomas are characterized by abnormal mitochondrial DNA (mtDNA) copy number and genomic instability, but a molecular interaction between the mitochondrial and nuclear genomes remains unknown. Here we report the discovery of increased copies of nuclear mtDNA (NUMT) in colorectal adenocarcinomas, which supports a link between mtDNA and genomic instability in the nucleus. We name this phenomenon of nuclear occurrence of mitochondrial components numtogenesis. We provide a description of NUMT abundance and distribution in tumor versus matched blood-derived normal genomes. Methods Whole-genome sequence data were obtained for colon adenocarcinoma and rectum adenocarcinoma patients participating in The Cancer Genome Atlas, via the Cancer Genomics Hub, using the GeneTorrent file acquisition tool. Data were analyzed to determine NUMT proportion and distribution on a genome-wide scale. A NUMT suppressor gene was identified by comparing numtogenesis in other organisms. Results Our study reveals that colorectal adenocarcinoma genomes, on average, contain up to 4.2-fold more somatic NUMTs than matched normal genomes. Colorectal tumors from women contained more NUMTs than those from men. NUMT abundance in tumor predicted parallel abundance in blood. NUMT abundance positively correlated with GC content and gene density. Increased numtogenesis was observed with higher mortality. We identified YME1L1, a human homolog of yeast YME1 (yeast mitochondrial DNA escape 1) to be frequently mutated in colorectal tumors. 
YME1L1 was also mutated in tumors derived from other tissues. We show that inactivation of YME1L1 results in increased transfer of mtDNA in the nuclear genome. Conclusions Our study demonstrates increased somatic transfer of mtDNA in colorectal tumors. Our study also reveals sex-based differences in frequency of NUMT occurrence and that NUMT in blood reflects NUMT in tumors, suggesting NUMT may be used as a biomarker for tumorigenesis. We identify YME1L1 as the first NUMT suppressor gene in human and demonstrate that inactivation of YME1L1 induces migration of mtDNA to the nuclear genome. Our study reveals that numtogenesis plays an important role in the development of cancer. Key Words: Cancer; Colorectal cancer; Genetic instability; Mitochondria; Mitochondrial DNA; NUMT; Numtogenesis; Tumor; YME1L1; mtDNA transfer.
44	Statistical Methods for Identifying Sequence Motifs Affecting Point Mutations. Genetics 2016;205:843-856. [PMID: 27974498 DOI: 10.1534/genetics.116.195677] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2016] [Accepted: 12/01/2016] [Indexed: 11/18/2022] Open Abstract Mutation processes differ between types of point mutation, genomic locations, cells, and biological species. For some point mutations, specific neighboring bases are known to be mechanistically influential. Beyond these cases, numerous questions remain unresolved, including: what are the sequence motifs that affect point mutations? How large are the motifs? Are they strand symmetric? And, do they vary between samples? We present new log-linear models that allow explicit examination of these questions, along with sequence logo style visualization to enable identifying specific motifs. We demonstrate the performance of these methods by analyzing mutation processes in human germline and malignant melanoma. We recapitulate the known CpG effect, and identify novel motifs, including a highly significant motif associated with A→G mutations. We show that major effects of neighbors on germline mutation lie within [Formula: see text] of the mutating base. Models are also presented for contrasting the entire mutation spectra (the distribution of the different point mutations). We show the spectra vary significantly between autosomes and X-chromosome, with a difference in T→C transition dominating. Analyses of malignant melanoma confirmed reported characteristic features of this cancer, including statistically significant strand asymmetry, and markedly different neighboring influences. The methods we present are made freely available as a Python library https://bitbucket.org/pycogent3/mutationmotif. 
Key Words: 5-methyl-cytosine; bioinformatics; context dependent mutation; germline mutation; log-linear model; mutation spectrum; sequence motif analysis; somatic mutation.
45	The length of chromatin loops in meiotic prophase I of warm-blooded vertebrates depends on the DNA compositional organization. RUSS J GENET+ 2016. [DOI: 10.1134/s1022795416110144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
46	A comparative study on the regulatory region of the PERIOD1 gene among diurnal/nocturnal primates. J Physiol Anthropol 2016;35:21. [PMID: 27680326 PMCID: PMC5039903 DOI: 10.1186/s40101-016-0111-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2016] [Accepted: 09/14/2016] [Indexed: 11/10/2022] Open Abstract BACKGROUND The circadian clock is set up around a 24-h period in humans who are awake in the daytime and sleep in the nighttime, accompanied with physiological and metabolic rhythms. Most haplorhine primates, including humans, are diurnal, while most "primitive" strepsirrhine primates are nocturnal, suggesting primates have evolved from nocturnal to diurnal habits. The mechanisms of physiological changes causing the habits and of genetic changes causing the physiological changes are, however, unknown. To reveal these mechanisms, we focus on the nucleotide sequences of the regulatory region of the PERIOD1 (PER1) gene that is known as one of the key elements of the circadian clock in mammalians. METHODS We determined nucleotide sequences of the regulatory region of PER1 concerning the gene expression for six primates and compared those with those of eight primates from the international DNA database. Based on the sequence data, we constructed a phylogenetic tree including both the diurnal/nocturnal species and investigated the guanine and cytosine (GC) content in the regulatory region. RESULTS The motif sequences regulating gene expression were evolutionary conservative in the primates examined. The phylogenetic tree simply showed phylogenetic relationship among the species and no branching pattern distinguishable between the diurnal and nocturnal groups. 
We found two cores showing a statistically significant difference between the diurnal and the nocturnal habits related to the GC contents of the regulatory region of PER1. CONCLUSION Our results suggest the possibility that the two cores in the upstream region of PER1 are related to the regulation of gene expression leading to behavioral differences between diurnal and nocturnal primates. Key Words: Circadian clock; Diurnal; GC content; Nocturnal; PERIOD1.
47	OcculterCut: A Comprehensive Survey of AT-Rich Regions in Fungal Genomes. Genome Biol Evol 2016;8:2044-64. [PMID: 27289099 PMCID: PMC4943192 DOI: 10.1093/gbe/evw121] [Citation(s) in RCA: 83] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/14/2016] [Indexed: 12/03/2022] Open Abstract We present a novel method to measure the local GC-content bias in genomes and a survey of published fungal species. The method, enacted as "OcculterCut" (https://sourceforge.net/projects/occultercut, last accessed April 30, 2016), identified species containing distinct AT-rich regions. In most fungal taxa, AT-rich regions are a signature of repeat-induced point mutation (RIP), which targets repetitive DNA and decreases GC-content through the conversion of cytosine to thymine bases. RIP has in turn been identified as a driver of fungal genome evolution, as RIP mutations can also occur in single-copy genes neighboring repeat-rich regions. Over time RIP perpetuates "two speeds" of gene evolution in the GC-equilibrated and AT-rich regions of fungal genomes. In this study, genomes showing evidence of this process are found to be common, particularly among the Pezizomycotina. Further analysis highlighted differences in amino acid composition and putative functions of genes from these regions, supporting the hypothesis that these regions play an important role in fungal evolution. OcculterCut can also be used to identify genes undergoing RIP-assisted diversifying selection, such as small, secreted effector proteins that mediate host-microbe disease interactions. 
Key Words: fungi; genome evolution; isochore; repeat-induced point mutation; two-speed genome. MESH Headings: AT Rich Sequence/genetics; Ascomycota/genetics; DNA Transposable Elements/genetics; DNA, Fungal/genetics; Evolution, Molecular; Genome, Fungal; Mutation; Phylogeny.
48	Cytochrome P450 genes in coronary artery diseases: Codon usage analysis reveals genomic GC adaptation. Gene 2016;590:35-43. [PMID: 27275533 DOI: 10.1016/j.gene.2016.06.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2016] [Revised: 04/12/2016] [Accepted: 06/03/2016] [Indexed: 10/21/2022] Abstract Establishing codon usage biases is imperative for understanding the etiology of coronary artery diseases (CAD) as well as the genetic factors associated with these diseases. The aim of this study was to evaluate the contribution of 18 responsible cytochrome P450 (CYP) genes to the risk of CAD. The effective number of codons (Nc) showed a negative correlation with both GC3 and synonymous codon usage order (SCUO), suggesting an antagonistic relationship between codon usage and Nc of genes. The dinucleotide analysis revealed that CG and TA dinucleotides have the lowest odds ratio in these genes. Principal component analysis showed that GC composition has a profound effect in separating the genes along the first major axis. Our findings revealed that mutational pressure and natural selection could possibly be the major factors responsible for codon bias in these genes. The study not only offers an insight into the mechanisms of genomic GC adaptation, but also illustrates the complexity of CYP genes in CAD. Key Words: Codon usage bias; Coronary artery disease; Cytochrome P450.
49	Evolutionary Rate Heterogeneity of Primary and Secondary Metabolic Pathway Genes in Arabidopsis thaliana. Genome Biol Evol 2015;8:17-28. [PMID: 26556590 PMCID: PMC4758233 DOI: 10.1093/gbe/evv217] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/11/2022] Abstract Primary metabolism is essential to plants for growth and development, and secondary metabolism helps plants to interact with the environment. Many plant metabolites are industrially important. These metabolites are produced by plants through complex metabolic pathways. Lack of knowledge about these pathways is hindering successful breeding practices for these metabolites. For a better knowledge of metabolism in plants as a whole, understanding the evolutionary rate variation of primary and secondary metabolic pathway genes is a prerequisite. In this study, evolutionary rate variation of primary and secondary metabolic pathway genes has been analyzed in the model plant Arabidopsis thaliana. Primary metabolic pathway genes were found to be more conserved than secondary metabolic pathway genes. Several factors, such as gene structure, expression level, tissue specificity, multifunctionality, and domain number, are key to this evolutionary rate variation. This study will help to better understand the evolutionary dynamics of plant metabolism. Key Words: effective number of codons; metabolic pathway genes; multifunctionality; principal component analysis.
50	Synonymous codon usage bias in plant mitochondrial genes is associated with intron number and mirrors species evolution. PLoS One 2015;10:e0131508. [PMID: 26110418 PMCID: PMC4481540 DOI: 10.1371/journal.pone.0131508] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/31/2015] [Accepted: 06/03/2015] [Indexed: 11/21/2022] Abstract Synonymous codon usage bias (SCUB), the non-uniform usage of synonymous codons, is common and occurs in nearly all organisms. We previously found that SCUB is correlated with both intron number and exon position in the plant nuclear genome but not in the plastid genome; SCUB in both nuclear and plastid genomes can mirror evolutionary specialization. However, the rules in the mitochondrial genome have not been addressed. Here, we present an analysis of SCUB in the mitochondrial genome, based on 24 plant species ranging from algae to land plants. The frequencies of NNA and NNT (A- and T-ending codons) are higher than those of NNG and NNC, with the strongest preference in bryophytes and the weakest in land plants, suggesting an association between SCUB and plant evolution. The preference for NNA and NNT is more evident in genes harboring a greater number of introns in land plants, but the bias toward NNA and NNT is exhibited evenly among exons. The pattern of SCUB in the mitochondrial genome differs in some respects from that present in both the nuclear and plastid genomes. MESH Headings: Biological Evolution; Chloroplasts/genetics; Codon; Codon, Terminator; DNA, Mitochondrial/genetics; Evolution, Molecular; Exons; Gene Frequency; Genes, Mitochondrial; Genes, Plant; Introns; Phylogeny; Plants/genetics; Plastids/genetics; Species Specificity.