1
Rakheja I, Bharti V, Sahana S, Das PK, Ranjan G, Kumar A, Jain N, Maiti S. Development of an In Silico Platform (TRIPinRNA) for the Identification of Novel RNA Intramolecular Triple Helices and Their Validation Using Biophysical Techniques. Biochemistry 2024. [PMID: 39668452 DOI: 10.1021/acs.biochem.4c00334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2024]
Abstract
There are surprisingly few RNA intramolecular triple helices known in the human transcriptome. The structure has been best studied as a stability element at the 3' end of lncRNAs such as MALAT1 and NEAT1, but it remains unclear whether it is as rare as currently understood or simply awaits a closer look from a new vantage point. TRIPinRNA, our Python-based in silico platform, allows a comprehensive sequence-pattern search for potential triplex formation in the human transcriptome, noncoding as well as coding. Using this tool, we report the putative occurrence of homopyrimidine type (canonical) triple helices as well as heteropurine-pyrimidine strand type (noncanonical) triple helices in the human transcriptome and validate the formation of both types of triplexes using biophysical approaches. We find that the occurrence of triplex structures correlates strongly with local GC content, which might influence their formation. By employing a search that encompasses both canonical and noncanonical triplex structures across the human transcriptome, this study enriches the understanding of RNA biology. Lastly, TRIPinRNA can be used to find triplex structures in any organism with an annotated transcriptome.
Affiliation(s)
- Isha Rakheja
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Vishal Bharti
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
- S Sahana
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Prosad Kumar Das
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
- Gyan Ranjan
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Ajit Kumar
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Niyati Jain
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
- Souvik Maiti
  - CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
  - Institute of Genomics and Integrative Biology (IGIB)-National Chemical Laboratory (NCL) Joint Center, Council of Scientific and Industrial Research-NCL, Pune 411008, India
2
Liu PC, Wang ZY, Qi M, Hu HY. The Chromosome-level Genome Provides Insights into the Evolution and Adaptation of Extreme Aggression. Mol Biol Evol 2024; 41:msae195. [PMID: 39271164 PMCID: PMC11427683 DOI: 10.1093/molbev/msae195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 08/29/2024] [Accepted: 09/09/2024] [Indexed: 09/15/2024] Open
Abstract
Extremely aggressive behavior, in which contests end with contestants severely injured or killed, is a special pattern that is rare in most species. Current studies of extreme aggression come mainly from the perspectives of behavioral ecology and evolution and lack a molecular evolutionary perspective. Here, we provide a high-quality chromosome-level genome of the parasitoid Anastatus disparis, in which the males exhibit extreme mate-competition aggression. The integrated multiomics analysis highlighted that overexpression of the neurotransmitter dopamine, energy metabolism (especially from lipid), and antibacterial activity are likely major aspects of the evolutionary formation of, and adaptation to, extreme aggression in A. disparis. In conclusion, our study provides new perspectives for molecular evolutionary studies of extreme aggression as well as a valuable genomic resource for Hymenoptera.
Affiliation(s)
- Peng-Cheng Liu
  - The School of Ecology and Environment, Anhui Normal University, Wuhu, Anhui Province, China
- Zi-Yin Wang
  - The School of Ecology and Environment, Anhui Normal University, Wuhu, Anhui Province, China
- Mei Qi
  - The School of Ecology and Environment, Anhui Normal University, Wuhu, Anhui Province, China
- Hao-Yuan Hu
  - The School of Ecology and Environment, Anhui Normal University, Wuhu, Anhui Province, China
3
Qiu Y, Kang YM, Korfmann C, Pouyet F, Eckford A, Palazzo AF. The GC-content at the 5' ends of human protein-coding genes is undergoing mutational decay. Genome Biol 2024; 25:219. [PMID: 39138526 PMCID: PMC11323403 DOI: 10.1186/s13059-024-03364-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Accepted: 07/31/2024] [Indexed: 08/15/2024] Open
Abstract
BACKGROUND: In vertebrates, most protein-coding genes have a peak of GC-content near their 5' transcriptional start site (TSS). This feature promotes both the efficient nuclear export and translation of mRNAs. Despite the importance of GC-content for RNA metabolism, its general features, origin, and maintenance remain mysterious. We investigate the evolutionary forces shaping GC-content at the TSS of genes, both through comparative genomic analysis of nucleotide substitution rates between different species and by examining human de novo mutations. RESULTS: Our data suggest that GC-peaks at TSSs were present in the last common ancestor of amniotes, and likely that of vertebrates. We observe that in apes and rodents, where recombination is directed away from TSSs by PRDM9, GC-content at the 5' end of protein-coding genes is currently undergoing mutational decay. In canids, which lack PRDM9 and perform recombination at TSSs, GC-content at the 5' end of protein-coding genes is increasing. We show that these patterns extend into the 5' end of the open reading frame, thus impacting synonymous codon position choices. CONCLUSIONS: Our results indicate that the dynamics of this GC-peak in amniotes are largely shaped by historic patterns of recombination. Since decay of GC-content towards the mutation rate equilibrium is the default state for non-functional DNA, the observed decrease in GC-content at TSSs in apes and rodents indicates that the GC-peak is not being maintained by selection on most protein-coding genes in those species.
Affiliation(s)
- Yi Qiu
  - Department of Biochemistry, University of Toronto, Toronto, Ontario, M5G1M1, Canada
- Yoon Mo Kang
  - Department of Biochemistry, University of Toronto, Toronto, Ontario, M5G1M1, Canada
- Christopher Korfmann
  - Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, M3J1P3, Canada
- Fanny Pouyet
  - Laboratoire Interdisciplinaire des Sciences du Numérique, Université Paris-Saclay, 91190, Gif-sur-Yvette, France
- Andrew Eckford
  - Department of Electrical Engineering and Computer Science, York University, Toronto, Ontario, M3J1P3, Canada
- Alexander F Palazzo
  - Department of Biochemistry, University of Toronto, Toronto, Ontario, M5G1M1, Canada
4
Picard MAL, Leblay F, Cassan C, Willemsen A, Daron J, Bauffe F, Decourcelle M, Demange A, Bravo IG. Transcriptomic, proteomic, and functional consequences of codon usage bias in human cells during heterologous gene expression. Protein Sci 2023; 32:e4576. [PMID: 36692287 PMCID: PMC9926478 DOI: 10.1002/pro.4576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/17/2022] [Revised: 01/12/2023] [Accepted: 01/14/2023] [Indexed: 01/25/2023]
Abstract
Differences in codon frequency between genomes, genes, or positions along a gene, modulate transcription and translation efficiency, leading to phenotypic and functional differences. Here, we present a multiscale analysis of the effects of synonymous codon recoding during heterologous gene expression in human cells, quantifying the phenotypic consequences of codon usage bias at different molecular and cellular levels, with an emphasis on translation elongation. Six synonymous versions of an antibiotic resistance gene were generated, fused to a fluorescent reporter, and independently expressed in HEK293 cells. Multiscale phenotype was analyzed by means of quantitative transcriptome and proteome assessment, as proxies for gene expression; cellular fluorescence, as a proxy for single-cell level expression; and real-time cell proliferation in absence or presence of antibiotic, as a proxy for the cell fitness. We show that differences in codon usage bias strongly impact the molecular and cellular phenotype: (i) they result in large differences in mRNA levels and protein levels, leading to differences of over 15 times in translation efficiency; (ii) they introduce unpredicted splicing events; (iii) they lead to reproducible phenotypic heterogeneity; and (iv) they lead to a trade-off between the benefit of antibiotic resistance and the burden of heterologous expression. In human cells in culture, codon usage bias modulates gene expression by modifying mRNA availability and suitability for translation, leading to differences in protein levels and eventually eliciting functional phenotypic changes.
Affiliation(s)
- Marion A. L. Picard
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Fiona Leblay
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Cécile Cassan
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Anouk Willemsen
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Josquin Daron
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Frédérique Bauffe
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Mathilde Decourcelle
  - BioCampus Montpellier (University of Montpellier, CNRS, INSERM), Montpellier, France
- Antonin Demange
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
- Ignacio G. Bravo
  - French National Center for Scientific Research, Laboratory MIVEGEC (CNRS, IRD, University of Montpellier), Montpellier, France
5
Lodato MA, Ziegenfuss JS. The two faces of DNA oxidation in genomic and functional mosaicism during aging in human neurons. FRONTIERS IN AGING 2022; 3:991460. [PMID: 36313183 PMCID: PMC9596766 DOI: 10.3389/fragi.2022.991460] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/11/2022] [Accepted: 09/26/2022] [Indexed: 11/29/2022]
Abstract
Maintaining genomic integrity in post-mitotic neurons in the human brain is paramount because these cells must survive for an individual's entire lifespan. Due to life-long synaptic plasticity and electrochemical transmission between cells, the brain engages in an exceptionally high level of mitochondrial metabolic activity. This activity results in the generation of reactive oxygen species with 8-oxo-7,8-dihydroguanine (8-oxoG) being one of the most prevalent oxidation products in the cell. 8-oxoG is important for the maintenance and transfer of genetic information into proper gene expression: a low basal level of 8-oxoG plays an important role in epigenetic modulation of neurodevelopment and synaptic plasticity, while a dysregulated increase in 8-oxoG damages the genome leading to somatic mutations and transcription errors. The slow yet persistent accumulation of DNA damage in the background of increasing cellular 8-oxoG is associated with normal aging as well as neurological disorders such as Alzheimer's disease and Parkinson's disease. This review explores the current understanding of how 8-oxoG plays a role in brain function and genomic instability, highlighting new methods being used to advance pathological hallmarks that differentiate normal healthy aging and neurodegenerative disease.
Affiliation(s)
- Michael A. Lodato
  - University of Massachusetts Chan Medical School, Worcester, MA, United States
6
CSB-independent, XPC-dependent transcription-coupled repair in Drosophila. Proc Natl Acad Sci U S A 2022; 119:2123163119. [PMID: 35217627 PMCID: PMC8892495 DOI: 10.1073/pnas.2123163119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 01/27/2022] [Indexed: 02/08/2023] Open
Abstract
Drosophila melanogaster has been extensively used as a model system to study ionizing radiation and chemical-induced mutagenesis, double-strand break repair, and recombination. However, there are only limited studies on nucleotide excision repair in this important model organism. An early study reported that Drosophila lacks the transcription-coupled repair (TCR) form of nucleotide excision repair. This conclusion was seemingly supported by the Drosophila genome sequencing project, which revealed that Drosophila lacks a homolog to CSB, which is known to be required for TCR in mammals and yeasts. However, by using excision repair sequencing (XR-seq) genome-wide repair mapping technology, we recently found that the Drosophila S2 cell line performs TCR comparable to human cells. Here, we have extended this work to Drosophila at all its developmental stages. We find TCR takes place throughout the life cycle of the organism. Moreover, we find that in contrast to humans and other multicellular organisms previously studied, the XPC repair factor is required for both global and transcription-coupled repair in Drosophila.
7
Khandia R, Ali Khan A, Alexiou A, Povetkin SN, Nikolaevna VM. Codon Usage Analysis of Pro-Apoptotic Bim Gene Isoforms. J Alzheimers Dis 2022; 86:1711-1725. [DOI: 10.3233/jad-215691] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/23/2022]
Abstract
Background: Bim is a Bcl-2 homology 3 (BH3)-only protein, one of a group of pro-apoptotic proteins involved in physiological and pathological conditions. Both overexpression and under-expression of Bim protein are associated with disease, and various isoforms of Bim protein are present with differential apoptotic potential. Objective: The present study attempted to examine the association of various molecular signatures with the codon choices of Bim isoforms. Methods: Molecular signatures such as composition, codon usage, nucleotide skews, the free energy of the mRNA transcript, physical properties of proteins, codon adaptation index, relative synonymous codon usage, and dinucleotide odds ratio were determined and analyzed for their associations with codon choices of the Bim gene. Results: Skew analysis of the Bim gene indicated a preference for C over G, A, and T, and a preference for G over T and A. An increase in C content at the first and third codon position increased gene expression while it decreased at the second codon position. Compositional constraints on nucleotide C at all three codon positions affected gene expression. The analysis revealed an exceptionally high usage of the CpC dinucleotide in all 31 envisaged isoforms of Bim. We correlated it with the requirement of rapid demethylation machinery to fine-tune Bim gene expression. Also, mutational pressure played a dominant role in shaping codon usage bias in Bim isoforms. Conclusion: An exceptionally high usage of the CpC dinucleotide in all 31 envisaged isoforms of Bim indicates a high-order selectional force to fine-tune Bim gene expression.
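To make the "dinucleotide odds ratio" named in the Methods concrete, here is a minimal sketch. It assumes the common single-strand definition, observed dinucleotide frequency divided by the product of the two mononucleotide frequencies; the abstract does not state the authors' exact formula, and the function name `dinucleotide_odds_ratio` is hypothetical.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq):
    """Return {dinucleotide: f_xy / (f_x * f_y)} for a single strand.

    Assumption: frequencies are computed on the given strand only,
    over all overlapping dinucleotides.
    """
    seq = seq.upper()
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for pair, count in di.items():
        f_xy = count / (n - 1)      # observed dinucleotide frequency
        f_x = mono[pair[0]] / n     # frequency of the first base
        f_y = mono[pair[1]] / n     # frequency of the second base
        ratios[pair] = f_xy / (f_x * f_y)
    return ratios
```

Under this definition a ratio well above 1 marks a dinucleotide as over-represented relative to base composition, which is the sense in which a high CpC usage would show up.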
Affiliation(s)
- Rekha Khandia
  - Department of Biochemistry and Genetics, Barkatullah University, Bhopal, India
- Azmat Ali Khan
  - Pharmaceutical Biotechnology Laboratory, Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Athanasios Alexiou
  - Novel Global Community Educational Foundation, Australia & AFNP Med, Austria
8
Górski AZ, Piwowar M. Nucleotide spacing distribution analysis for human genome. Mamm Genome 2021; 32:123-128. [PMID: 33723659 PMCID: PMC8012312 DOI: 10.1007/s00335-021-09865-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/15/2020] [Accepted: 03/02/2021] [Indexed: 11/30/2022]
Abstract
The distribution of nucleotide spacing in the human genome was investigated. An analysis of the frequency of occurrence in the human genome of different sequence lengths flanked by one type of nucleotide was carried out, showing that the distribution has no self-similar (fractal) structure. The results nevertheless revealed several characteristic features: (i) the distribution for short-range spacing is quite similar to that of purely stochastic sequences; (ii) the distribution for long-range spacing deviates substantially from the random sequence distribution, showing strong long-range correlations; (iii) the differences between (A, T) and (C, G) nucleotides are quite significant; (iv) the spacing distribution displays tiny oscillations.
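The spacing distribution described above can be sketched as follows. This is an illustration only: it assumes "spacing" means the number of intervening bases between consecutive occurrences of the same nucleotide, and the function name `spacing_distribution` is hypothetical rather than taken from the paper.

```python
from collections import Counter


def spacing_distribution(seq, base):
    """Count gap lengths between consecutive occurrences of `base`.

    Assumption: a gap is the number of other bases lying strictly
    between two neighbouring occurrences (adjacent copies give 0).
    """
    positions = [i for i, b in enumerate(seq.upper()) if b == base]
    return Counter(b - a - 1 for a, b in zip(positions, positions[1:]))
```

Comparing such a histogram for a real sequence against one from a shuffled sequence of the same composition is one simple way to look for the deviations from randomness the abstract reports.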
Affiliation(s)
- Andrzej Z Górski
  - Polish Academy of Sciences, Institute of Nuclear Physics, Radzikowskiego 152 st, 31-342, Kraków, Poland
- Monika Piwowar
  - Jagiellonian University, Collegium Medicum, Kopernika 7E st, 31-034, Kraków, Poland
9
The Role of H3K4 Trimethylation in CpG Islands Hypermethylation in Cancer. Biomolecules 2021; 11:biom11020143. [PMID: 33499170 PMCID: PMC7912453 DOI: 10.3390/biom11020143] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/29/2020] [Revised: 12/30/2020] [Accepted: 01/15/2021] [Indexed: 01/01/2023] Open
Abstract
CpG methylation in transposons, exons, introns and intergenic regions is important for long-term silencing, silencing of parasitic sequences and alternative promoters, regulating imprinted gene expression and determining X chromosome inactivation. Promoter CpG islands, although rich in CpG dinucleotides, are unmethylated and remain so during all phases of mammalian embryogenesis and development, except in specific cases. The biological mechanisms that contribute to the maintenance of the unmethylated state of CpG islands remain elusive, but the modification of established DNA methylation patterns is a common feature in all types of tumors and is considered as an event that intrinsically, or in association with genetic lesions, feeds carcinogenesis. In this review, we focus on the latest results describing the role that the levels of H3K4 trimethylation may have in determining the aberrant hypermethylation of CpG islands in tumors.
10
Palazzo AF, Kang YM. GC-content biases in protein-coding genes act as an "mRNA identity" feature for nuclear export. Bioessays 2020; 43:e2000197. [PMID: 33165929 DOI: 10.1002/bies.202000197] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/22/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 01/11/2023]
Abstract
It has long been observed that human protein-coding genes have a particular distribution of GC-content: the 5' end of these genes has high GC-content while the 3' end has low GC-content. In 2012, it was proposed that this pattern of GC-content could act as an mRNA identity feature that would lead to it being better recognized by the cellular machinery to promote its nuclear export. In contrast, junk RNA, which largely lacks this feature, would be retained in the nucleus and targeted for decay. Now two recent papers have provided evidence that GC-content does promote the nuclear export of many mRNAs in human cells.
Affiliation(s)
- Alexander F Palazzo
  - Department of Biochemistry, University of Toronto, Toronto, ON, M5G 1M1, Canada
- Yoon Mo Kang
  - Department of Biochemistry, University of Toronto, Toronto, ON, M5G 1M1, Canada
11
An intron-derived motif strongly increases gene expression from transcribed sequences through a splicing independent mechanism in Arabidopsis thaliana. Sci Rep 2019; 9:13777. [PMID: 31551463 PMCID: PMC6760150 DOI: 10.1038/s41598-019-50389-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 06/19/2019] [Accepted: 09/10/2019] [Indexed: 12/29/2022] Open
Abstract
Certain introns significantly increase mRNA accumulation by a poorly understood mechanism. These introns have no effect when located upstream, or more than ~1 Kb downstream, of the start of transcription. We tested the ability of a formerly non-stimulating intron containing 11 copies of the sequence TTNGATYTG, which is over-represented in promoter-proximal introns in Arabidopsis thaliana, to affect expression from various positions. The activity profile of this intron at different locations was similar to that of a natural intron from the UBQ10 gene, suggesting that the motif increases mRNA accumulation by the same mechanism. A series of introns with different numbers of this motif revealed that the effect on expression is linearly dependent on motif copy number up to at least 20, with each copy adding another 1.5-fold increase in mRNA accumulation. Furthermore, 6 copies of the motif stimulated mRNA accumulation to a similar degree from within an intron or when introduced into the 5'-UTR and coding sequences of an intronless construct, demonstrating that splicing is not required for this sequence to boost expression. The ability of this motif to substantially elevate expression from several hundred nucleotides downstream of the transcription start site reveals a novel type of eukaryotic gene regulation.
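Counting copies of a degenerate motif such as TTNGATYTG can be sketched with a regular expression. The sketch below assumes standard IUPAC codes (N = any base, Y = C or T) and counts overlapping matches on the given strand only; the helper name `count_motif` is hypothetical and this is not the authors' pipeline.

```python
import re

# Subset of the IUPAC nucleotide codes, mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "Y": "[CT]", "R": "[AG]",
}


def count_motif(seq, motif="TTNGATYTG"):
    """Count (possibly overlapping) occurrences of an IUPAC motif in seq."""
    pattern = "".join(IUPAC[ch] for ch in motif)
    # A lookahead matches at every start position without consuming text,
    # so overlapping occurrences are all counted.
    return len(re.findall(f"(?=({pattern}))", seq.upper()))
```

For example, `count_motif("TTAGATCTG")` counts one copy, while `"TTAGATATG"` has none because the seventh position is A rather than C or T.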
12
Huttener R, Thorrez L, In't Veld T, Granvik M, Snoeck L, Van Lommel L, Schuit F. GC content of vertebrate exome landscapes reveal areas of accelerated protein evolution. BMC Evol Biol 2019; 19:144. [PMID: 31311498 PMCID: PMC6636035 DOI: 10.1186/s12862-019-1469-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/18/2019] [Accepted: 06/26/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Rapid accumulation of vertebrate genome sequences renders comparative genomics a powerful approach to study macro-evolutionary events. The assessment of phylogenetic relationships between species routinely depends on the analysis of sequence homology at the nucleotide or protein level. RESULTS: We analyzed mRNA GC content, codon usage and divergence of orthologous proteins in 55 vertebrate genomes. Data were visualized in genome-wide landscapes using a sliding window approach. Landscapes of GC content reveal both evolutionary conservation of clustered genes and lineage-specific changes, so that it was possible to construct a phylogenetic tree that closely matched the classic "tree of life". Landscapes of GC content also strongly correlated with landscapes of amino acid usage: positive correlation with glycine, alanine, arginine and proline and negative correlation with phenylalanine, tyrosine, methionine, isoleucine, asparagine and lysine. Peaks of GC content correlated strongly with increased protein divergence. CONCLUSIONS: Landscapes of base and amino acid composition of the coding genome open a new approach in comparative genomics, allowing identification of discrete regions in which protein evolution accelerated over deep evolutionary time. Insight into the evolution of genome structure may spur novel studies assessing the evolutionary benefit of genes in particular genomic regions.
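A sliding-window GC profile of the kind mentioned in the Results can be sketched as below. Note the assumptions: the study slides its window over genes along the genome, whereas this simplified version slides over the nucleotides of a single sequence, and the window and step sizes are arbitrary placeholders.

```python
def gc_sliding_window(seq, window=100, step=50):
    """Return [(start, gc_fraction), ...] for each full window in seq.

    Assumption: only windows that fit entirely inside the sequence
    are reported; a trailing partial window is skipped.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        profile.append((start, gc / window))
    return profile
```

Plotting the second element of each tuple against the first gives a one-dimensional "landscape" in which GC peaks and troughs are easy to spot.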
Affiliation(s)
- R Huttener
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Thorrez
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
  - Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- T In't Veld
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- M Granvik
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Snoeck
  - Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- L Van Lommel
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- F Schuit
  - Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
13
Fuertes MA, Rodrigo JR, Alonso C. Conserved Critical Evolutionary Gene Structures in Orthologs. J Mol Evol 2019; 87:93-105. [PMID: 30815710 DOI: 10.1007/s00239-019-09889-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/19/2018] [Accepted: 02/13/2019] [Indexed: 12/18/2022]
Abstract
Unravelling gene structure requires the identification and understanding of the constraints that are often associated with the evolutionary history and functional domains of genes. In this manuscript we speculate on the possible existence in orthologs of an emergent, highly conserved gene structure that might explain their coordinated evolution during speciation events and their parental function. Here, we address the following issues: (1) Is there any conserved hypothetical structure along ortholog gene sequences? (2) If so, are such conserved structures maintained during speciation events? The data presented provide evidence supporting this hypothesis. We have found that (1) most orthologs studied share highly conserved compositional structures not observed previously; (2) while the percent identity of nucleotide sequences of orthologs correlates with the percent identity of composon sequences, the number of emergent compositional structures conserved during speciation does not correlate with the percent identity; and (3) a broad range of species conserves the emergent compositional stretches. We also discuss the concept of critical gene structure.
Affiliation(s)
- Miguel A Fuertes
  - Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain
- Carlos Alonso
  - Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain
14
Compositional dynamics and codon usage pattern of BRCA1 gene across nine mammalian species. Genomics 2019; 111:167-176. [DOI: 10.1016/j.ygeno.2018.01.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 09/01/2017] [Revised: 12/22/2017] [Accepted: 01/22/2018] [Indexed: 11/19/2022]
15
Castillo AI, Nelson ADL, Lyons E. Tail Wags the Dog? Functional Gene Classes Driving Genome-Wide GC Content in Plasmodium spp. Genome Biol Evol 2019; 11:497-507. [PMID: 30689842 PMCID: PMC6385630 DOI: 10.1093/gbe/evz015] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Accepted: 01/18/2019] [Indexed: 01/16/2023] Open
Abstract
Plasmodium parasites are valuable models to understand how nucleotide composition affects mutation, diversification, and adaptation. No other observed eukaryotes have undergone such large changes in genomic guanine-cytosine (GC) content as seen in the genus Plasmodium (∼30% within 35-40 Myr). Although mutational biases are known to influence GC content in the human-infective Plasmodium vivax and Plasmodium falciparum, no study has addressed how different gene functional classes contribute to genus-wide compositional changes, or whether Plasmodium GC content variation is driven by natural selection. Here, we tested the hypothesis that certain gene processes and functions drive variation in global GC content between Plasmodium species. We performed a large-scale comparative genomic analysis using the genomes and predicted genes of 17 Plasmodium species encompassing a wide genomic GC content range. Genic GC content was sorted and divided into ten equally sized quantiles that were then assessed for functional enrichment classes. Consistent with the idea that selection on gene classes may drive genomic GC content, trans-membrane proteins were enriched within extreme GC content quantiles (Q1 and Q10). Specifically, variant surface antigens, which primarily interact with vertebrate immune systems, showed skewed GC content distributions compared with other trans-membrane proteins. Although a definitive causal link between GC content, expression, and positive selection within variant surface antigens from Plasmodium vivax, Plasmodium berghei, and Plasmodium falciparum could not be established, we found that regardless of genomic nucleotide composition, genic GC content and expression were positively correlated during trophozoite stages. Overall, these data suggest that, alongside mutational biases, functional protein classes drive Plasmodium GC content change.
Affiliation(s)
- Andreina I Castillo
  - School of Environmental Science, Policy, and Management, University of California, Berkeley
- Eric Lyons
  - BIO5 Institute, School of Plant Sciences, University of Arizona
16
Abstract
Peptides encoded by short open reading frames (sORFs) are usually defined as peptides ≤100 aa long. Usually sORFs were ignored by automatic genome annotation programs due to the high probability of false discovery. However, improved computational tools along with a high-throughput RIBO-seq approach identified a myriad of translated sORFs. Their importance becomes evident as we are gaining experimental validation of their diverse cellular functions. This Review examines various computational and experimental approaches of sORFs identification as well as provides the summary of our current knowledge of their functional roles in cells.
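The ≤100 aa definition can be illustrated with a naive scan for short ORFs. This is a sketch under stated assumptions, not one of the tools the Review covers: it looks only at the three forward reading frames, requires an ATG start and a standard stop codon, and the name `find_sorfs` is hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_aa=100, min_aa=1):
    """Return [(start, end, n_aa), ...] for ORFs encoding <= max_aa residues.

    `start` is the index of the ATG, `end` is one past the stop codon,
    and `n_aa` counts codons from the ATG up to (not including) the stop.
    Assumption: forward strand only, first ATG in each open stretch.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa <= max_aa:
                    found.append((start, i + 3, n_aa))
                start = None
    return found
```

A scan this permissive reports many ORFs that are never translated, which is one way to see why annotation programs historically ignored sORFs and why experimental evidence such as RIBO-seq matters.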
Affiliation(s)
- Anastasia Chugunova
- Lomonosov Moscow State University, Department of Chemistry and A.N. Belozersky Institute of Physico-Chemical Biology, Moscow 119992, Russia; Skolkovo Institute of Science and Technology, Skolkovo, Moscow Region 143025, Russia
- Tsimafei Navalayeu
- Lomonosov Moscow State University, Department of Chemistry and A.N. Belozersky Institute of Physico-Chemical Biology, Moscow 119992, Russia
- Olga Dontsova
- Lomonosov Moscow State University, Department of Chemistry and A.N. Belozersky Institute of Physico-Chemical Biology, Moscow 119992, Russia; Skolkovo Institute of Science and Technology, Skolkovo, Moscow Region 143025, Russia
- Petr Sergiev
- Lomonosov Moscow State University, Department of Chemistry and A.N. Belozersky Institute of Physico-Chemical Biology, Moscow 119992, Russia; Skolkovo Institute of Science and Technology, Skolkovo, Moscow Region 143025, Russia
|
17
|
Špoljarić D, Ugrina I. Limiting distribution of the number of clumps of palindromes in DNA. Commun Stat Theory Methods 2017. [DOI: 10.1080/03610926.2016.1189573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Drago Špoljarić
- Faculty of Mining, Geology and Petroleum Engineering, University of Zagreb, Zagreb, Croatia
- Ivo Ugrina
- Faculty of Science, Department of Mathematics, University of Zagreb, Zagreb, Croatia
|
18
|
Nath Choudhury M, Uddin A, Chakraborty S. Codon usage bias and its influencing factors for Y-linked genes in human. Comput Biol Chem 2017; 69:77-86. [DOI: 10.1016/j.compbiolchem.2017.05.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 10/19/2016] [Revised: 05/04/2017] [Accepted: 05/20/2017] [Indexed: 11/30/2022]
|
19
|
Edwards JR, Yarychkivska O, Boulard M, Bestor TH. DNA methylation and DNA methyltransferases. Epigenetics Chromatin 2017; 10:23. [PMID: 28503201 PMCID: PMC5422929 DOI: 10.1186/s13072-017-0130-8] [Citation(s) in RCA: 302] [Impact Index Per Article: 37.8] [Received: 02/21/2017] [Accepted: 04/26/2017] [Indexed: 12/18/2022] [Open Access]
Abstract
The prevailing views as to the form, function, and regulation of genomic methylation patterns have their origin many years in the past, at a time when the structure of the mammalian genome was only dimly perceived, when the number of protein-encoding mammalian genes was believed to be at least five times greater than the actual number, and when it was not understood that only ~10% of the genome is under selective pressure and likely to have biological function. We use more recent findings from genome biology and whole-genome methylation profiling to provide a reappraisal of the shape of genomic methylation patterns and the nature of the changes that they undergo during gametogenesis and early development. We observe that the sequences that undergo deep changes in methylation status during early development are largely sequences without regulatory function. We also discuss recent findings that begin to explain the remarkable fidelity of maintenance methylation. Rather than a general overview of DNA methylation in mammals (which has been the subject of many reviews), we present a new analysis of the distribution of methylated CpG dinucleotides across the multiple sequence compartments that make up the mammalian genome, and we offer an updated interpretation of the nature of the changes in methylation patterns that occur in germ cells and early embryos. We discuss the cues that might designate specific sequences for demethylation or de novo methylation during development, and we summarize recent findings on mechanisms that maintain methylation patterns in mammalian genomes. We also describe the several human disorders, each very different from the other, that are caused by mutations in DNA methyltransferase genes.
Affiliation(s)
- John R Edwards
- Center for Pharmacogenomics, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Olya Yarychkivska
- Department of Genetics and Development, College of Physicians and Surgeons of Columbia University, New York, NY, USA
- Mathieu Boulard
- Department of Genetics and Development, College of Physicians and Surgeons of Columbia University, New York, NY, USA
- Timothy H Bestor
- Department of Genetics and Development, College of Physicians and Surgeons of Columbia University, New York, NY, USA
|
20
|
Quigley IK, Kintner C. Rfx2 Stabilizes Foxj1 Binding at Chromatin Loops to Enable Multiciliated Cell Gene Expression. PLoS Genet 2017; 13:e1006538. [PMID: 28103240 PMCID: PMC5245798 DOI: 10.1371/journal.pgen.1006538] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 07/11/2016] [Accepted: 12/14/2016] [Indexed: 11/18/2022] [Open Access]
Abstract
Cooperative transcription factor binding at cis-regulatory sites in the genome drives robust eukaryotic gene expression, and many such sites must be coordinated to produce coherent transcriptional programs. The transcriptional program leading to motile cilia formation requires members of the DNA-binding forkhead (Fox) and Rfx transcription factor families and these factors co-localize to cilia gene promoters, but it is not clear how many cilia genes are regulated by these two factors, whether these factors act directly or indirectly, or how these factors act with specificity in the context of a 3-dimensional genome. Here, we use genome-wide approaches to show that cilia genes reside at the boundaries of topological domains and that these areas have low enhancer density. We show that the transcription factors Foxj1 and Rfx2 binding occurs in the promoters of more cilia genes than other known cilia transcription factors and that while Rfx2 binds directly to promoters and enhancers equally, Foxj1 prefers direct binding to enhancers and is stabilized at promoters by Rfx2. Finally, we show that Rfx2 and Foxj1 lie at the anchor endpoints of chromatin loops, suggesting that target genes are activated when Foxj1 bound at distal sites is recruited via a loop created by Rfx2 binding at both sites. We speculate that the primary function of Rfx2 is to stabilize distal enhancers with proximal promoters by operating as a scaffolding factor, bringing key regulatory domains bound by Foxj1 into close physical proximity and enabling coordinated cilia gene expression. The multiciliated cell extends hundreds of motile cilia to produce fluid flow in the airways and other organ systems. The formation of this specialized cell type requires the coordinated expression of hundreds of genes in order to produce all the protein parts motile cilia require. 
While a relatively small number of transcription factors has been identified that promote gene expression during multiciliate cell differentiation, it is not clear how they work together to coordinate the expression of genes required for multiple motile ciliation. Here, we show that two transcription factors known to drive cilia formation, Foxj1 and Rfx2, play complementary roles wherein Foxj1 activates target genes but tends not to bind near them in the genome, whereas Rfx2 can’t activate target genes by itself but instead acts as a scaffold by localizing Foxj1 to the proper targets. These results suggest not only a mechanism by which complex gene expression is coordinated in multiciliated cells, but also how transcriptional programs in general could be modular and deployed across different cellular contexts with the same basic promoter configuration.
Affiliation(s)
- Ian K. Quigley
- Molecular Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, California, United States of America
- Chris Kintner
- Molecular Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, California, United States of America
|
21
|
Attig J, Ruiz de Los Mozos I, Haberman N, Wang Z, Emmett W, Zarnack K, König J, Ule J. Splicing repression allows the gradual emergence of new Alu-exons in primate evolution. eLife 2016; 5. [PMID: 27861119 PMCID: PMC5115870 DOI: 10.7554/elife.19545] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 07/13/2016] [Accepted: 11/01/2016] [Indexed: 01/01/2023] [Open Access]
Abstract
Alu elements are retrotransposons that frequently form new exons during primate evolution. Here, we assess the interplay of splicing repression by hnRNPC and nonsense-mediated mRNA decay (NMD) in the quality control and evolution of new Alu-exons. We identify 3100 new Alu-exons and show that NMD more efficiently recognises transcripts with Alu-exons compared to other exons with premature termination codons. However, some Alu-exons escape NMD, especially when an adjacent intron is retained, highlighting the importance of concerted repression by splicing and NMD. We show that evolutionary progression of 3' splice sites is coupled with longer repressive uridine tracts. Once the 3' splice site at ancient Alu-exons reaches a stable phase, splicing repression by hnRNPC decreases, but the exons generally remain sensitive to NMD. We conclude that repressive motifs are strongest next to cryptic exons and that gradual weakening of these motifs contributes to the evolutionary emergence of new alternative exons.
Affiliation(s)
- Jan Attig
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom; MRC-Laboratory of Molecular Biology, Cambridge, United Kingdom
- Igor Ruiz de Los Mozos
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Nejc Haberman
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom
- Zhen Wang
- Institut de Biologie de l'ENS (IBENS), CNRS UMR 8197, Paris, France
- Warren Emmett
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom; University College London Genetics Institute, London, United Kingdom
- Kathi Zarnack
- Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Frankfurt, Germany
- Julian König
- Institute of Molecular Biology (IMB), Mainz, Germany
- Jernej Ule
- Department of Molecular Neuroscience, UCL Institute of Neurology, London, United Kingdom; MRC-Laboratory of Molecular Biology, Cambridge, United Kingdom
|
22
|
Yin C. Identification of repeats in DNA sequences using nucleotide distribution uniformity. J Theor Biol 2016; 412:138-145. [PMID: 27816675 DOI: 10.1016/j.jtbi.2016.10.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 08/14/2016] [Revised: 10/10/2016] [Accepted: 10/27/2016] [Indexed: 11/30/2022]
Abstract
Repetitive elements are important in genomic structures, functions and regulations, yet effective methods in precisely identifying repetitive elements in DNA sequences are not fully accessible, and the relationship between repetitive elements and periodicities of genomes is not clearly understood. We present an ab initio method to quantitatively detect repetitive elements and infer the consensus repeat pattern in repetitive elements. The method uses the measure of the distribution uniformity of nucleotides at periodic positions in DNA sequences or genomes. It can identify periodicities, consensus repeat patterns, copy numbers and perfect levels of repetitive elements. The results of using the method on different DNA sequences and genomes demonstrate efficacy and accuracy in identifying repeat patterns and periodicities. The complexity of the method is linear with respect to the lengths of the analyzed sequences. The Python programs in this study are freely available to the public upon request or at https://github.com/cyinbox/DNADU.
Affiliation(s)
- Changchuan Yin
- Department of Mathematics, Statistics and Computer Science, The University of Illinois at Chicago, Chicago, IL 60607-7045, USA
|
23
|
Fuertes MA, Rodrigo JR, Alonso C. Do Intron and Coding Sequences of Some Human-Mouse Orthologs Evolve as a Single Unit? J Mol Evol 2016; 82:247-50. [PMID: 27220874 DOI: 10.1007/s00239-016-9746-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/04/2016] [Accepted: 05/11/2016] [Indexed: 11/25/2022]
Abstract
It has been previously suggested that both the coding and the associated non-coding sequences of some human-mouse orthologs could evolve as a single unit. This letter reports that some orthologs significantly change their compositional features between mouse and human, an indication that molecular evolution is a local process. Moreover, the data indicate that the coding and intron sequences of these orthologs do not evolve independently but instead undergo concerted evolution, evolving as a single unit, from a compositional cluster in mouse to a different compositional cluster in human.
Affiliation(s)
- Miguel Angel Fuertes
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain
- Carlos Alonso
- Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, c/Nicolás Cabrera 1, 28049, Madrid, Spain
|
24
|
Makova KD, Hardison RC. The effects of chromatin organization on variation in mutation rates in the genome. Nat Rev Genet 2015; 16:213-23. [PMID: 25732611 PMCID: PMC4500049 DOI: 10.1038/nrg3890] [Citation(s) in RCA: 160] [Impact Index Per Article: 16.0] [Indexed: 12/12/2022]
Abstract
The variation in local rates of mutations can affect both the evolution of genes and their function in normal and cancer cells. Deciphering the molecular determinants of this variation will be aided by the elucidation of distinct types of mutations, as they differ in regional preferences and in associations with genomic features. Chromatin organization contributes to regional variation in mutation rates, but its contribution differs among mutation types. In both germline and somatic mutations, base substitutions are more abundant in regions of closed chromatin, perhaps reflecting error accumulation late in replication. By contrast, a distinctive mutational state with very high levels of insertions and deletions (indels) and substitutions is enriched in regions of open chromatin. These associations indicate an intricate interplay between the nucleotide sequence of DNA and its dynamic packaging into chromatin, and have important implications for current biomedical research. This Review focuses on recent studies showing associations between chromatin state and mutation rates, including pairwise and multivariate investigations of germline and somatic (particularly cancer) mutations.
Affiliation(s)
- Kateryna D Makova
- Department of Biology, Huck Institute for Genome Sciences, The Pennsylvania State University, University Park, State College, Pennsylvania 16802, USA
- Ross C Hardison
- Department of Biochemistry and Molecular Biology, Huck Institute for Genome Sciences, The Pennsylvania State University, University Park, State College, Pennsylvania 16802, USA
|
25
|
Haerty W, Ponting CP. Unexpected selection to retain high GC content and splicing enhancers within exons of multiexonic lncRNA loci. RNA 2015; 21:333-46. [PMID: 25589248 PMCID: PMC4338330 DOI: 10.1261/rna.047324.114] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 07/16/2014] [Accepted: 11/25/2014] [Indexed: 06/04/2023]
Abstract
If sequencing were possible only for genomes, and not for RNAs or proteins, then functional protein-coding exons would be recognizable by their unusual patterns of nucleotide composition, specifically a high GC content across the body of exons, and an unusual nucleotide content near their edges. RNAs and proteins can, of course, be sequenced, but the extent of functionality of intergenic long noncoding RNAs (lncRNAs) remains under question owing to their low nucleotide conservation. Inspired by the nucleotide composition patterns of protein-coding exons, we sought evidence for functionality across lncRNA loci from diverse species. We found that such patterns across multiexonic lncRNA loci mirror those of protein-coding genes, although to a lesser degree: Specifically, compared with introns, lncRNA exons are GC rich. Additionally, we report evidence for the action of purifying selection to preserve exonic splicing enhancers within human multiexonic lncRNAs and nucleotide composition in fruit fly lncRNAs. Our findings provide evidence for selection for more efficient rates of transcription and splicing within lncRNA loci. Despite only a minor proportion of their RNA bases being constrained, multiexonic intergenic lncRNAs appear to require accurate splicing of their exons to transact their function.
|
26
|
Abstract
It has been nearly 40 y since it was suggested that genomic methylation patterns could be transmitted via maintenance methylation during S phase and might play a role in the dynamic regulation of gene expression during development [Holliday R, Pugh JE (1975) Science 187(4173):226-232; Riggs AD (1975) Cytogenet Cell Genet 14(1):9-25]. This revolutionary proposal was justified by "... our almost complete ignorance of the mechanism for the unfolding of the genetic program during development" that prevailed at the time. Many correlations between transcriptional activation and demethylation have since been reported, but causation has not been demonstrated and to date there is no reasonable proof of the existence of a complex biochemical system that activates and represses genes via reversible DNA methylation. Such a system would supplement or replace the conserved web of transcription factors that regulate cellular differentiation in organisms that have unmethylated genomes (such as Caenorhabditis elegans and the Dipteran insects) and those that methylate their genomes. DNA methylation does have essential roles in irreversible promoter silencing, as in the monoallelic expression of imprinted genes, in the silencing of transposons, and in X chromosome inactivation in female mammals. Rather than reinforcing or replacing regulatory pathways that are conserved between organisms that have either methylated or unmethylated genomes, DNA methylation endows genomes with the ability to subject specific sequences to irreversible transcriptional silencing even in the presence of all of the factors required for their expression, an ability that is generally unavailable to organisms that have unmethylated genomes.
|
27
|
Variation and constraints in species-specific promoter sequences. J Theor Biol 2014; 363:357-66. [PMID: 25149367 DOI: 10.1016/j.jtbi.2014.08.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/26/2014] [Revised: 07/30/2014] [Accepted: 08/04/2014] [Indexed: 11/24/2022]
Abstract
A vast literature is now devoted to the search for correlations between transcription-related functions and the composition of sequences upstream of the transcription start site. Little is known about the possible functional effects of nucleotide distributions on the conformational landscape of DNA in such regions. We have used suitable statistical indicators to identify sequences that may play an important role in regulating transcription processes. In particular, we have analyzed base composition, periodicity and information content in sets of aligned promoters clustered according to functional information, in order to gain insight into the main structural differences between promoters regulating genes with different functions. Our results show that when we select promoters according to some biological information, in a single species, at least in vertebrates, we observe structurally different classes of sequences. The highly variable and differentiated gene expression patterns may explain the great extent of structural differentiation observed in complex organisms. Although our analysis is focused on Homo sapiens, we also provide a comparison with other species, selected at different positions in the phylogenetic tree.
|
28
|
Saponaro M, Kantidakis T, Mitter R, Kelly GP, Heron M, Williams H, Söding J, Stewart A, Svejstrup JQ. RECQL5 controls transcript elongation and suppresses genome instability associated with transcription stress. Cell 2014; 157:1037-49. [PMID: 24836610 PMCID: PMC4032574 DOI: 10.1016/j.cell.2014.03.048] [Citation(s) in RCA: 146] [Impact Index Per Article: 13.3] [Received: 07/25/2013] [Revised: 01/21/2014] [Accepted: 03/13/2014] [Indexed: 01/03/2023]
Abstract
RECQL5 is the sole member of the RECQ family of helicases associated with RNA polymerase II (RNAPII). We now show that RECQL5 is a general elongation factor that is important for preserving genome stability during transcription. Depletion or overexpression of RECQL5 results in corresponding shifts in the genome-wide RNAPII density profile. Elongation is particularly affected, with RECQL5 depletion causing a striking increase in the average rate, concurrent with increased stalling, pausing, arrest, and/or backtracking (transcription stress). RECQL5 therefore controls the movement of RNAPII across genes. Loss of RECQL5 also results in the loss or gain of genomic regions, with the breakpoints of lost regions located in genes and common fragile sites. The chromosomal breakpoints overlap with areas of elevated transcription stress, suggesting that RECQL5 suppresses such stress and its detrimental effects, and thereby prevents genome instability in the transcribed region of genes.
Affiliation(s)
- Marco Saponaro
- Mechanisms of Transcription Laboratory, Clare Hall Laboratories, Cancer Research UK London Research Institute, South Mimms, EN6 3LD, UK
- Theodoros Kantidakis
- Mechanisms of Transcription Laboratory, Clare Hall Laboratories, Cancer Research UK London Research Institute, South Mimms, EN6 3LD, UK
- Richard Mitter
- Bioinformatics and Biostatistics Group, Cancer Research UK London Research Institute, 44 Lincoln's Inn Fields, London WC2A 3LY, UK
- Gavin P Kelly
- Bioinformatics and Biostatistics Group, Cancer Research UK London Research Institute, 44 Lincoln's Inn Fields, London WC2A 3LY, UK
- Mark Heron
- Gene Center and Center for Integrated Protein Science Munich (CIPSM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81377 Munich, Germany
- Hannah Williams
- Mechanisms of Transcription Laboratory, Clare Hall Laboratories, Cancer Research UK London Research Institute, South Mimms, EN6 3LD, UK
- Johannes Söding
- Gene Center and Center for Integrated Protein Science Munich (CIPSM), Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, 81377 Munich, Germany
- Aengus Stewart
- Bioinformatics and Biostatistics Group, Cancer Research UK London Research Institute, 44 Lincoln's Inn Fields, London WC2A 3LY, UK
- Jesper Q Svejstrup
- Mechanisms of Transcription Laboratory, Clare Hall Laboratories, Cancer Research UK London Research Institute, South Mimms, EN6 3LD, UK
|
29
|
Genome-wide analysis of promoters: clustering by alignment and analysis of regular patterns. PLoS One 2014; 9:e85260. [PMID: 24465517 PMCID: PMC3898993 DOI: 10.1371/journal.pone.0085260] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/19/2013] [Accepted: 11/26/2013] [Indexed: 01/08/2023] [Open Access]
Abstract
In this paper we perform a genome-wide analysis of H. sapiens promoters. To this aim, we developed and combined two mathematical methods that allow us to (i) classify promoters into groups characterized by specific global structural features, and (ii) recover, in full generality, any regular sequence in the different classes of promoters. One of the main findings of this analysis is that H. sapiens promoters can be classified into three main groups. Two of them are distinguished by the prevalence of weak or strong nucleotides and are characterized by short compositionally biased sequences, while the most frequent regular sequences in the third group are strongly correlated with transposons. Taking advantage of the generality of these mathematical procedures, we have compared the promoter database of H. sapiens with those of other species. We have found that the above-mentioned features characterize also the evolutionary content appearing in mammalian promoters, at variance with ancestral species in the phylogenetic tree, that exhibit a definitely lower level of differentiation among promoters.
|
30
|
Špoljarić D, Ugrina I. On Statistical Properties of Palindromes in DNA. Commun Stat Theory Methods 2013. [DOI: 10.1080/03610926.2012.739253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
31
|
Soon WW, Hariharan M, Snyder MP. High-throughput sequencing for biology and medicine. Mol Syst Biol 2013; 9:640. [PMID: 23340846 PMCID: PMC3564260 DOI: 10.1038/msb.2012.61] [Citation(s) in RCA: 176] [Impact Index Per Article: 14.7] [Received: 07/06/2012] [Accepted: 10/29/2012] [Indexed: 02/06/2023] [Open Access]
Abstract
Advances in genome sequencing have progressed at a rapid pace, with increased throughput accompanied by plunging costs. But these advances go far beyond faster and cheaper. High-throughput sequencing technologies are now routinely being applied to a wide range of important topics in biology and medicine, often allowing researchers to address important biological questions that were not possible before. In this review, we discuss these innovative new approaches (including ever finer analyses of transcriptome dynamics, genome structure and genomic variation) and provide an overview of the new insights into complex biological systems catalyzed by these technologies. We also assess the impact of genotyping, genome sequencing and personal omics profiling on medical applications, including diagnosis and disease monitoring. Finally, we review recent developments in single-cell sequencing, and conclude with a discussion of possible future advances and obstacles for sequencing in biology and health.
Affiliation(s)
- Wendy Weijia Soon
- Department of Genetics, Stanford University School of Medicine, Alway Building, 300 Pasteur Drive, Stanford, CA, USA
- Manoj Hariharan
- Department of Genetics, Stanford University School of Medicine, Alway Building, 300 Pasteur Drive, Stanford, CA, USA
- Michael P Snyder
- Department of Genetics, Stanford University School of Medicine, Alway Building, 300 Pasteur Drive, Stanford, CA, USA
|
32
|
Liou SW, Huang YF. An exon/intron disparity framework based on the nucleotide profile of single sequence. Netw Model Anal Health Inform Bioinform 2012. [DOI: 10.1007/s13721-012-0007-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/27/2022]
|
33
|
McLean MA, Tirosh I. Opposite GC skews at the 5' and 3' ends of genes in unicellular fungi. BMC Genomics 2011; 12:638. [PMID: 22208287 PMCID: PMC3315797 DOI: 10.1186/1471-2164-12-638] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/15/2011] [Accepted: 12/30/2011] [Indexed: 11/24/2022] [Open Access]
Abstract
Background: GC-skews have previously been linked to transcription in some eukaryotes. They have been associated with transcription start sites, with the coding strand G-biased in mammals and C-biased in fungi and invertebrates. Results: We show a consistent and highly significant pattern of GC-skew within genes of almost all unicellular fungi. The pattern of GC-skew is asymmetrical: the coding strand of genes is typically C-biased at the 5' ends but G-biased at the 3' ends, with intermediate skews at the middle of genes. Thus, the initiation, elongation, and termination phases of transcription are associated with different skews. This pattern influences the encoded proteins by generating differential usage of amino acids at the 5' and 3' ends of genes. These biases also affect fourfold-degenerate positions and extend into promoters and 3' UTRs, indicating that skews cannot be accounted for by selection for protein function or translation. Conclusions: We propose two explanations: the mutational pressure hypothesis and the adaptive hypothesis. The mutational pressure hypothesis is that different co-factors bind to RNA pol II at different phases of transcription, producing different mutational regimes. The adaptive hypothesis is that cytidine triphosphate deficiency may lead to C-avoidance at the 3' ends of transcripts to control the flow of RNA pol II molecules and reduce their frequency of collisions.
Affiliation(s)
- Malcolm A McLean
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
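The GC-skew pattern summarized in the abstract of this entry (C-biased at the 5' end, intermediate in the middle, G-biased at the 3' end of the coding strand) can be illustrated with the standard skew statistic (G − C)/(G + C). The sketch below is an assumption-laden illustration, not the authors' method: splitting a gene into thirds and the function names `gc_skew` and `skew_by_thirds` are hypothetical choices made here for clarity.

```python
def gc_skew(seq):
    """Standard GC skew, (G - C) / (G + C); 0.0 if the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def skew_by_thirds(coding_strand):
    """Skew of the 5' third, middle third, and 3' third of a coding strand.
    Splitting into thirds is an illustrative assumption."""
    n = len(coding_strand)
    a, b = n // 3, 2 * n // 3
    return (
        gc_skew(coding_strand[:a]),
        gc_skew(coding_strand[a:b]),
        gc_skew(coding_strand[b:]),
    )


# Hypothetical toy sequence built to mimic the reported pattern.
five_prime, middle, three_prime = skew_by_thirds("CCCAGCAAAGGG")
```

A negative value indicates a C-biased segment and a positive value a G-biased one, so the toy sequence reproduces the sign change from the 5' to the 3' end.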
|
34
|
Managadze D, Rogozin IB, Chernikova D, Shabalina SA, Koonin EV. Negative correlation between expression level and evolutionary rate of long intergenic noncoding RNAs. Genome Biol Evol 2011; 3:1390-404. [PMID: 22071789 PMCID: PMC3242500 DOI: 10.1093/gbe/evr116] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Indexed: 12/11/2022] [Open Access]
Abstract
Mammalian genomes contain numerous genes for long noncoding RNAs (lncRNAs). The functions of the lncRNAs remain largely unknown but their evolution appears to be constrained by purifying selection, albeit relatively weakly. To gain insights into the mode of evolution and the functional range of the lncRNAs, they can be compared with much better characterized protein-coding genes. The evolutionary rate of the protein-coding genes shows a universal negative correlation with expression: highly expressed genes are on average more conserved during evolution than the genes with lower expression levels. This correlation was conceptualized in the misfolding-driven protein evolution hypothesis, according to which misfolding is the principal cost incurred by protein expression. We sought to determine whether long intergenic ncRNAs (lincRNAs) follow the same evolutionary trend and indeed detected a moderate but statistically significant negative correlation between the evolutionary rate and expression level of human and mouse lincRNA genes. The magnitude of the correlation for the lincRNAs is similar to that for equal-sized sets of protein-coding genes with similar levels of sequence conservation. Additionally, the expression level of the lincRNAs is significantly and positively correlated with the predicted extent of lincRNA molecule folding (base-pairing); however, the contributions of evolutionary rates and folding to the expression level are independent. Thus, the anticorrelation between evolutionary rate and expression level appears to be a general feature of gene evolution that might be caused by similar deleterious effects of protein and RNA misfolding and/or other factors, for example, the number of interacting partners of the gene product.
Affiliation(s)
- David Managadze
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
35
Calistri E, Livi R, Buiatti M. Evolutionary trends of GC/AT distribution patterns in promoters. Mol Phylogenet Evol 2011; 60:228-35. [PMID: 21554969 DOI: 10.1016/j.ympev.2011.04.015] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 11/19/2010] [Revised: 03/25/2011] [Accepted: 04/17/2011] [Indexed: 11/18/2022]
Abstract
Nucleotide distributions in genomes are known not to be random, showing the presence of specific motifs, long and short range correlations, periodicities, etc. Particularly, motifs are critical for the recognition by specific proteins affecting chromosome organization, transcription and DNA replication but little is known about the possible functional effects of nucleotide distributions on the conformational landscape of DNA, putatively leading to differential selective pressures throughout evolution. Promoter sequences have a fundamental role in the regulation of gene activity and a vast literature suggests that their conformational landscapes may be a critical factor in gene expression dynamics. On these grounds, with the aim of investigating the putative existence of phylogenetic patterns of promoter base distributions, we analyzed GC/AT ratios along the 1000 nucleotide sequences upstream of TSS in wide sets of promoters belonging to organisms ranging from bacteria to pluricellular eukaryotes. The data obtained showed very clear phylogenetic trends throughout evolution of promoter sequence base distributions. Particularly, in all cases either GC-rich or AT-rich monotone gradients were observed: the former being present in eukaryotes, the latter in bacteria along with strand biases. Moreover, within eukaryotes, GC-rich gradients increased in length from unicellular organisms to plants, to vertebrates and, within them, from ancestral to more recent species. Finally, results were thoroughly discussed with particular attention to the possible correlation between nucleotide distribution patterns, evolution, and the putative existence of differential selection pressures, deriving from structural and/or functional constraints, between and within prokaryotes and eukaryotes.
Affiliation(s)
- Elisa Calistri
- Dipartimento di Biologia Evoluzionistica, Universita' degli Studi di Firenze, via Romana 19, 50125 Firenze, Italy.
36
Lu ZX, Jiang P, Cai JJ, Xing Y. Context-dependent robustness to 5' splice site polymorphisms in human populations. Hum Mol Genet 2011; 20:1084-96. [PMID: 21224255 PMCID: PMC3043661 DOI: 10.1093/hmg/ddq553] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 09/15/2010] [Revised: 12/12/2010] [Accepted: 12/20/2010] [Indexed: 01/23/2023]
Abstract
There has been growing evidence for extensive diversity of alternative splicing in human populations. Genetic variants within the 5' splice site can cause splicing differences among human individuals and constitute an important class of human disease mutations. In this study, we explored whether natural variations of splicing could reveal important signals of 5' splice site recognition. In seven lymphoblastoid cell lines of Asian, European and African ancestry, we identified 1174 single nucleotide polymorphisms (SNPs) within the consensus 5' splice site. We selected 129 SNPs predicted to significantly alter the splice site activity, and quantitatively examined their splicing impact in the seven individuals. Surprisingly, outside of the essential GT dinucleotide position, only ∼14% of the tested SNPs altered splicing. Bioinformatic and minigene analyses identified signals that could modify the impact of 5' splice site polymorphisms, most notably a strong 3' splice site and the presence of intronic motifs downstream of the 5' splice site. Strikingly, we found that the poly-G run, a known intronic splicing enhancer, was the most significantly enriched motif downstream of exons unaffected by 5' splice site SNPs. In TRIM62, the upstream 3' splice site and downstream intronic poly-G runs functioned redundantly to protect an exon from its 5' splice site polymorphism. Collectively, our study reveals widespread context-dependent robustness to 5' splice site polymorphisms in human transcriptomes. Consequently, certain exons are more susceptible to 5' splice site mutations. Additionally, our work demonstrates that genetic diversity of alternative splicing can provide significant insights into the splicing code of mammalian cells.
Affiliation(s)
- James J. Cai
- Department of Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
- Yi Xing
- Department of Internal Medicine and
- Department of Biomedical Engineering, University of Iowa, 3294 CBRB, 285 Newton Rd, Iowa City, IA 52242, USA and
37
SpliceIT: a hybrid method for splice signal identification based on probabilistic and biological inference. J Biomed Inform 2009; 43:208-17. [PMID: 19800027 DOI: 10.1016/j.jbi.2009.09.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/26/2008] [Revised: 08/25/2009] [Accepted: 09/21/2009] [Indexed: 11/23/2022]
Abstract
Splice sites define the boundaries of exonic regions and dictate protein synthesis and function. The splicing mechanism involves complex interactions among positional and compositional features of different lengths. Computational modeling of the underlying constructive information is especially challenging, in order to decipher splicing-inducing elements and alternative splicing factors. SpliceIT (Splice Identification Technique) introduces a hybrid method for splice site prediction that couples probabilistic modeling with discriminative computational or experimental features inferred from published studies in two subsequent classification steps. The first step is undertaken by a Gaussian support vector machine (SVM) trained on the probabilistic profile that is extracted using two alternative position-dependent feature selection methods. In the second step, the extracted predictions are combined with known species-specific regulatory elements, in order to induce a tree-based modeling. The performance evaluation on human and Arabidopsis thaliana splice site datasets shows that SpliceIT is highly accurate compared to current state-of-the-art predictors in terms of the maximum sensitivity, specificity tradeoff without compromising space complexity and in a time-effective way. The source code and supplementary material are available at: http://www.med.auth.gr/research/spliceit/.
38
Kren BT, Unger GM, Sjeklocha L, Trossen AA, Korman V, Diethelm-Okita BM, Reding MT, Steer CJ. Nanocapsule-delivered Sleeping Beauty mediates therapeutic Factor VIII expression in liver sinusoidal endothelial cells of hemophilia A mice. J Clin Invest 2009; 119:2086-99. [PMID: 19509468 DOI: 10.1172/jci34332] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 10/24/2007] [Accepted: 04/22/2009] [Indexed: 12/16/2022]
Abstract
Liver sinusoidal endothelial cells are a major endogenous source of Factor VIII (FVIII), lack of which causes the human congenital bleeding disorder hemophilia A. Despite extensive efforts, gene therapy using viral vectors has shown little success in clinical hemophilia trials. Here we achieved cell type-specific gene targeting using hyaluronan- and asialoorosomucoid-coated nanocapsules, generated using dispersion atomization, to direct genes to liver sinusoidal endothelial cells and hepatocytes, respectively. To highlight the therapeutic potential of this approach, we encapsulated Sleeping Beauty transposon expressing the B domain-deleted canine FVIII in cis with Sleeping Beauty transposase in hyaluronan nanocapsules and injected them intravenously into hemophilia A mice. The treated mice exhibited activated partial thromboplastin times that were comparable to those of wild-type mice at 5 and 50 weeks and substantially shorter than those of untreated controls at the same time points. Further, plasma FVIII activity in the treated hemophilia A mice was nearly identical to that in wild-type mice through 50 weeks, while untreated hemophilia A mice exhibited no detectable FVIII activity. Thus, Sleeping Beauty transposon targeted to liver sinusoidal endothelial cells provided long-term expression of FVIII, without apparent antibody formation, and improved the phenotype of hemophilia A mice.
Affiliation(s)
- Betsy T Kren
- Department of Medicine, University of Minnesota Medical School, Minneapolis, Minnesota 55455, USA
39
Ivashchenko AT, Khailenko VA, Atambaeva SA. Variations of the length of exons and introns in human genome genes. Russ J Genet 2009. [DOI: 10.1134/s1022795409010025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
40
Shabalina SA, Zaykin DV, Gris P, Ogurtsov AY, Gauthier J, Shibata K, Tchivileva IE, Belfer I, Mishra B, Kiselycznyk C, Wallace MR, Staud R, Spiridonov NA, Max MB, Goldman D, Fillingim RB, Maixner W, Diatchenko L. Expansion of the human mu-opioid receptor gene architecture: novel functional variants. Hum Mol Genet 2008; 18:1037-51. [PMID: 19103668 PMCID: PMC2649019 DOI: 10.1093/hmg/ddn439] [Citation(s) in RCA: 121] [Impact Index Per Article: 7.1] [Indexed: 11/19/2022]
Abstract
The μ-opioid receptor (OPRM1) is the principal receptor target for both endogenous and exogenous opioid analgesics. There are substantial individual differences in human responses to painful stimuli and to opiate drugs that are attributed to genetic variations in OPRM1. In searching for new functional variants, we employed comparative genome analysis and obtained evidence for the existence of an expanded human OPRM1 gene locus with new promoters, alternative exons and regulatory elements. Examination of polymorphisms within the human OPRM1 gene locus identified strong association between single nucleotide polymorphism (SNP) rs563649 and individual variations in pain perception. SNP rs563649 is located within a structurally conserved internal ribosome entry site (IRES) in the 5′-UTR of novel exon 13-containing OPRM1 isoforms (MOR-1K) and affects both mRNA levels and translation efficiency of these variants. Furthermore, rs563649 exhibits very strong linkage disequilibrium throughout the entire OPRM1 gene locus and thus affects the functional contribution of the corresponding haplotype that includes other functional OPRM1 SNPs. Our results provide evidence for an essential role for MOR-1K isoforms in nociceptive signaling and suggest that genetic variations in alternative OPRM1 isoforms may contribute to individual differences in opiate responses.
Affiliation(s)
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
41
Bemmo A, Benovoy D, Kwan T, Gaffney DJ, Jensen RV, Majewski J. Gene expression and isoform variation analysis using Affymetrix Exon Arrays. BMC Genomics 2008; 9:529. [PMID: 18990248 PMCID: PMC2585104 DOI: 10.1186/1471-2164-9-529] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Received: 07/25/2008] [Accepted: 11/07/2008] [Indexed: 12/22/2022]
Abstract
Background: Alternative splicing and isoform level expression profiling is an emerging field of interest within genomics. Splicing sensitive microarrays, with probes targeted to individual exons or exon-junctions, are becoming increasingly popular as a tool capable of both expression profiling and finer scale isoform detection. Despite their intuitive appeal, relatively little is known about the performance of such tools, particularly in comparison with more traditional 3' targeted microarrays. Here, we use the well studied Microarray Quality Control (MAQC) dataset to benchmark the Affymetrix Exon Array, and compare it to two other popular platforms: Illumina, and Affymetrix U133. Results: We show that at the gene expression level, the Exon Array performs comparably with the two 3' targeted platforms. However, the interplatform correlation of the results is slightly lower than between the two 3' arrays. We show that some of the discrepancies stem from the RNA amplification protocols, e.g. the Exon Array is able to detect expression of non-polyadenylated transcripts. More importantly, we show that many other differences result from the ability of the Exon Array to monitor more detailed isoform-level changes; several examples illustrate that changes detected by the 3' platforms are actually isoform variations, and that the nature of these variations can be resolved using Exon Array data. Finally, we show how the Exon Array can be used to detect alternative isoform differences, such as alternative splicing, transcript termination, and alternative promoter usage. We discuss the possible pitfalls and false positives resulting from isoform-level analysis. Conclusion: The Exon Array is a valuable tool that can be used to profile gene expression while providing important additional information regarding the types of gene isoforms that are expressed and variable. However, analysis of alternative splicing requires much more hands on effort and visualization of results in order to correctly interpret the data, and generally results in considerably higher false positive rates than expression analysis. One of the main sources of error in the MAQC dataset is variation in amplification efficiency across transcripts, most likely caused by joint effects of elevated GC content in the 5' ends of genes and reduced likelihood of random-primed first strand synthesis in the 3' ends of genes. These effects are currently not adequately corrected using existing statistical methods. We outline approaches to reduce such errors by filtering out potentially problematic data.
42
Gorlov IP, Gorlova OY, Amos CI. Relative effects of mutability and selection on single nucleotide polymorphisms in transcribed regions of the human genome. BMC Genomics 2008; 9:292. [PMID: 18559102 PMCID: PMC2442617 DOI: 10.1186/1471-2164-9-292] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 04/02/2008] [Accepted: 06/17/2008] [Indexed: 11/10/2022]
Abstract
MOTIVATION: Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in humans. However, the factors that affect SNP density are poorly understood. The goal of this study was to estimate the relative effects of mutability and selection on SNP density in transcribed regions of human genes. It is important for prediction of the regions that harbor functional polymorphisms. RESULTS: We used frequency-validated SNPs resulting from single-nucleotide substitutions. SNPs were subdivided into five functional categories: (i) 5' untranslated region (UTR) SNPs, (ii) 3' UTR SNPs, (iii) synonymous SNPs, (iv) SNPs producing conservative missense mutations, and (v) SNPs producing radical missense mutations. Each of these categories was further subdivided into nine mutational categories on the basis of the single-nucleotide substitution type. Thus, 45 functional/mutational categories were analyzed. The relative mutation rate in each mutational category was estimated on the basis of published data. The proportion of segregating sites (PSSs) for each functional/mutational category was estimated by dividing the observed number of SNPs by the number of potential sites in the genome for a given functional/mutational category. By analyzing each functional group separately, we found significant positive correlations between PSSs and relative mutation rates (Spearman's correlation coefficient, at least r = 0.96, df = 9, P < 0.001). We adjusted the PSSs for the mutation rate and found that the functional category had a significant effect on SNP density (F = 5.9, df = 4, P = 0.001), suggesting that selection affects SNP density in transcribed regions of the genome. We used analyses of variance and covariance to estimate the relative effects of selection (functional category) and mutability (relative mutation rate) on the PSSs and found that approximately 87% of variation in PSS was due to variation in the mutation rate and approximately 13% was due to selection, suggesting that the probability that a site located in a transcribed region of a gene is polymorphic mostly depends on the mutability of the site.
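The proportion of segregating sites in this abstract is a plain ratio (observed SNPs divided by potential sites), and its relationship to mutation rate is summarized with a rank correlation. A minimal Python sketch of both steps follows. It is an illustration only, not the authors' code: the category labels, counts, and rates are made-up placeholders, and the hand-written Spearman coefficient (Pearson correlation on ranks) assumes there are no ties.

```python
from statistics import mean

def proportion_segregating(observed_snps, potential_sites):
    """PSS = observed SNP count / number of potential sites for a category."""
    return observed_snps / potential_sites

def ranks(values):
    """Rank values from 1..n (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman's coefficient computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical mutational categories: (observed SNPs, potential sites, relative mutation rate).
# These numbers are invented for illustration and do not come from the paper.
categories = {
    "cat_1": (120, 100_000, 1.0),
    "cat_2": (310, 100_000, 2.5),
    "cat_3": (45, 100_000, 0.4),
}

pss = [proportion_segregating(snps, sites) for snps, sites, _ in categories.values()]
rates = [rate for _, _, rate in categories.values()]
print(spearman(pss, rates))
```

With real data, tied values would need average ranks, which a statistics library handles more robustly than this sketch.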
Affiliation(s)
- Ivan P Gorlov
- Department of Epidemiology, The University of Texas M D Anderson Cancer Center, Houston, Texas 77030, USA.
43
Abstract
A regional analysis of nucleotide substitution rates along human genes and their flanking regions allows us to quantify the effect of mutational mechanisms associated with transcription in germ line cells. Our analysis reveals three distinct patterns of substitution rates. First, a sharp decline in the deamination rate of methylated CpG dinucleotides, which is observed in the vicinity of the 5' end of genes. Second, a strand asymmetry in complementary substitution rates, which extends from the 5' end to 1 kbp downstream from the 3' end, associated with transcription-coupled repair. Finally, a localized strand asymmetry, an excess of C→T over G→A substitution in the nontemplate strand confined to the first 1-2 kbp downstream of the 5' end of genes. We hypothesize that higher exposure of the nontemplate strand near the 5' end of genes leads to a higher cytosine deamination rate. Up to now, only the somatic hypermutation (SHM) pathway has been known to mediate localized and strand-specific mutagenic processes associated with transcription in mammalia. The mutational patterns in SHM are induced by cytosine deaminase, which just targets single-stranded DNA. This DNA conformation is induced by R-loops, which preferentially occur at the 5' ends of genes. We predict that R-loops are extensively formed in the beginning of transcribed regions in germ line cells.
44
Searching for splicing motifs. Adv Exp Med Biol 2008; 623:85-106. [PMID: 18380342 DOI: 10.1007/978-0-387-77374-2_6] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.2] [Indexed: 10/20/2022]
Abstract
Intron removal during pre-mRNA splicing in higher eukaryotes requires the accurate identification of the two splice sites at the ends of the exons, or exon definition. The sequences constituting the splice sites provide insufficient information to distinguish true splice sites from the greater number of false splice sites that populate transcripts. Additional information used for exon recognition resides in a large number of positively or negatively acting elements that lie both within exons and in the adjacent introns. The identification of such sequence motifs has progressed rapidly in recent years, such that extensive lists are now available for exonic splicing enhancers and exonic splicing silencers. These motifs have been identified both by empirical experiments and by computational predictions, the validity of the latter being confirmed by experimental verification. Molecular searches have been carried out either by the selection of sequences that bind to splicing factors, or enhance or silence splicing in vitro or in vivo. Computational methods have focused on sequences of 6 or 8 nucleotides that are over- or under-represented in exons, compared to introns or transcripts that do not undergo splicing. These various methods have sought to provide global definitions of motifs, yet the motifs are distinctive to the method used for identification and display little overlap. Astonishingly, at least three-quarters of a typical mRNA would be comprised of these motifs. A present challenge lies in understanding how the cell integrates this surfeit of information to generate what is usually a binary splicing decision.
45
Evans KJ. Genomic DNA from animals shows contrasting strand bias in large and small subsequences. BMC Genomics 2008; 9:43. [PMID: 18221531 PMCID: PMC2267173 DOI: 10.1186/1471-2164-9-43] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/07/2007] [Accepted: 01/25/2008] [Indexed: 01/09/2023]
Abstract
Background: For eukaryotes, there is almost no strand bias with regard to base composition, with exceptions for origins of replication and transcription start sites and transcribed regions. This paper revisits the question for subsequences of DNA taken at random from the genome. Results: For a typical mammal, for example mouse or human, there is a small strand bias throughout the genomic DNA: there is a correlation between (G - C) and (A - T) on the same strand (that is, between the difference in the number of guanine and cytosine bases and the difference in the number of adenine and thymine bases). For small subsequences – up to 1 kb – this correlation is weak but positive; but for large windows – around 50 kb to 2 Mb – the correlation is strong and negative. This effect is largely independent of GC%. Transcribed and untranscribed regions give similar correlations both for small and large subsequences, but there is a difference in these regions for intermediate sized subsequences. An analysis of the human genome showed that position within the isochore structure did not affect these correlations. An analysis of available genomes of different species shows that this contrast between large and small windows is a general feature of mammals and birds. Further down the evolutionary tree, other organisms show a similar but smaller effect. Except for the nematode, all the animals analysed showed at least a small effect. Conclusion: The correlations on the large scale may be explained by DNA replication. Transcription may be a modifier of these effects but is not the fundamental cause. These results cast light on how DNA mutations affect the genome over evolutionary time. At least for vertebrates, there is a broad relationship between body temperature and the size of the correlation. The genome of mammals and birds has a structure marked by strand bias segments.
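The strand-bias measure in this abstract, a correlation between (G - C) and (A - T) counted on one strand across subsequences of a given size, can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions rather than the paper's procedure: it uses non-overlapping windows and raw count differences, whereas the abstract describes subsequences taken at random and does not say whether the differences were normalized. The function names and the toy sequence are hypothetical.

```python
def window_skews(seq, window):
    """Return (G - C, A - T) count differences for consecutive non-overlapping windows."""
    seq = seq.upper()
    pairs = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc_diff = chunk.count("G") - chunk.count("C")
        at_diff = chunk.count("A") - chunk.count("T")
        pairs.append((gc_diff, at_diff))
    return pairs

def pearson(pairs):
    """Pearson correlation between the two components of each pair."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two windows")
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined when one component is constant")
    return cov / (var_x * var_y) ** 0.5

# Toy sequence, far too short to be biologically meaningful.
toy = "GGCAATGGCAATCCGTTA"
print(pearson(window_skews(toy, 3)))
```

Running the same calculation at several window sizes (for example 1 kb versus hundreds of kilobases on a real chromosome) is what would expose the sign change the abstract reports.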
Affiliation(s)
- Kenneth J Evans
- School of Crystallography, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK.
46
Ke S, Zhang XHF, Chasin LA. Positive selection acting on splicing motifs reflects compensatory evolution. Genome Res 2008; 18:533-43. [PMID: 18204002 DOI: 10.1101/gr.070268.107] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Indexed: 11/25/2022]
Abstract
We have used comparative genomics to characterize the evolutionary behavior of predicted splicing regulatory motifs. Using base substitution rates in intronic regions as a calibrator for neutral change, we found a strong avoidance of synonymous substitutions that disrupt predicted exonic splicing enhancers or create predicted exonic splicing silencers. These results attest to the functionality of the hexameric motif set used and suggest that they are subject to purifying selection. We also found that synonymous substitutions in constitutive exons tend to create exonic splicing enhancers and to disrupt exonic splicing silencers, implying positive selection for these splicing promoting events. We present evidence that this positive selection is the result of splicing-positive events compensating for splicing-negative events as well as for mutations that weaken splice-site sequences. Such compensatory events include nonsynonymous mutations, synonymous mutations, and mutations at splice sites. Compensation was also seen from the fact that orthologous exons tend to maintain the same number of predicted splicing motifs. Our data fit a splicing compensation model of exon evolution, in which selection for splicing-positive mutations takes place to counter the effect of an ongoing splicing-negative mutational process, with the exon as a whole being conserved as a unit of splicing. In the course of this analysis, we observed that synonymous positions in general are conserved relative to intronic sequences, suggesting that messenger RNA molecules are rich in sequence information for functions beyond protein coding and splicing.
Affiliation(s)
- Shengdong Ke
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA
47
Evans KJ. Strand bias structure in mouse DNA gives a glimpse of how chromatin structure affects gene expression. BMC Genomics 2008; 9:16. [PMID: 18194530 PMCID: PMC2266913 DOI: 10.1186/1471-2164-9-16] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/16/2007] [Accepted: 01/14/2008] [Indexed: 12/20/2022]
Abstract
Background: On a single strand of genomic DNA the number of As is usually about equal to the number of Ts (and similarly for Gs and Cs), but deviations have been noted for transcribed regions and origins of replication. Results: The mouse genome is shown to have a segmented structure defined by strand bias. Transcription is known to cause a strand bias and numerous analyses are presented to show that the strand bias in question is not caused by transcription. However, these strand bias segments influence the position of genes and their unspliced length. The position of genes within the strand bias structure affects the probability that a gene is switched on and its expression level. Transcription has a highly directional flow within this structure and the peak volume of transcription is around 20 kb from the A-rich/T-rich segment boundary on the T-rich side, directed away from the boundary. The A-rich/T-rich boundaries are SATB1 binding regions, whereas the T-rich/A-rich boundary regions are not. Conclusion: The direct cause of the strand bias structure may be DNA replication. The strand bias segments represent a further biological feature, the chromatin structure, which in turn influences the ease of transcription.
Affiliation(s)
- Kenneth J Evans
- School of Crystallography, Birkbeck College, University of London, Malet Street, London, WC1E 7HX, UK.
48
DNA sequence and structural properties as predictors of human and mouse promoters. Gene 2007; 410:165-76. [PMID: 18234453 PMCID: PMC2672154 DOI: 10.1016/j.gene.2007.12.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 05/23/2007] [Revised: 11/30/2007] [Accepted: 12/05/2007] [Indexed: 11/21/2022]
Abstract
Promoters play a central role in gene regulation, yet our power to discriminate them from non-promoter sequences in higher eukaryotes is mainly restricted to those associated with CpG islands. Here, we examined in silico the promoters of 30,954 human and 18,083 mouse transcripts in the DBTSS database, to assess the impact of particular sequence and structural features (propeller twist, bendability and nucleosome positioning preference) on promoter classification and prediction. Our analysis showed that a stricter-than-traditional definition of CpG islands captures low and high CpG count promoter classes more accurately than the traditional one. We observed that both human and mouse promoter sequences are flexible with the exception of the TATA box and TSS, which are rigid regions irrespective of association with a CpG island. Therefore varying levels of structural flexibility in promoters may affect their accessibility to proteins, and hence their specificity. For all features investigated, averaged values across core promoters discriminated CpG island associated promoters from background, whereas the same did not hold for promoters without a CpG island. However, local changes around -34 to -23 (expected position of TATA box) and the TSS were informative in discriminating promoters (both classes) from non-promoter sequences. Additionally, we investigated ATG deserts and observed that they occur in all promoter sets except those with a TATA-box and without a CpG island in human. Interestingly, all mouse promoter sets showed ATG codon depletion irrespective of the presence of a TATA-box, possibly reflecting a weaker contribution to TSS specificity in mouse.
49
Onouchi Y, Gunji T, Burns JC, Shimizu C, Newburger JW, Yashiro M, Nakamura Y, Yanagawa H, Wakui K, Fukushima Y, Kishi F, Hamamoto K, Terai M, Sato Y, Ouchi K, Saji T, Nariai A, Kaburagi Y, Yoshikawa T, Suzuki K, Tanaka T, Nagai T, Cho H, Fujino A, Sekine A, Nakamichi R, Tsunoda T, Kawasaki T, Nakamura Y, Hata A. ITPKC functional polymorphism associated with Kawasaki disease susceptibility and formation of coronary artery aneurysms. Nat Genet 2007; 40:35-42. [PMID: 18084290 DOI: 10.1038/ng.2007.59] [Citation(s) in RCA: 368] [Impact Index Per Article: 20.4] [Received: 04/13/2007] [Accepted: 10/02/2007] [Indexed: 01/19/2023]
Abstract
Kawasaki disease is a pediatric systemic vasculitis of unknown etiology for which a genetic influence is suspected. We identified a functional SNP (itpkc_3) in the inositol 1,4,5-trisphosphate 3-kinase C (ITPKC) gene on chromosome 19q13.2 that is significantly associated with Kawasaki disease susceptibility and also with an increased risk of coronary artery lesions in both Japanese and US children. Transfection experiments showed that the C allele of itpkc_3 reduces splicing efficiency of the ITPKC mRNA. ITPKC acts as a negative regulator of T-cell activation through the Ca2+/NFAT signaling pathway, and the C allele may contribute to immune hyper-reactivity in Kawasaki disease. This finding provides new insights into the mechanisms of immune activation in Kawasaki disease and emphasizes the importance of activated T cells in the pathogenesis of this vasculitis.
Affiliation(s)
- Yoshihiro Onouchi
- Laboratory for Gastrointestinal Diseases, SNP Research Center, RIKEN, Yokohama, Kanagawa, 230-0045, Japan.
50
Lu ZX, Peng J, Su B. A human-specific mutation leads to the origin of a novel splice form of neuropsin (KLK8), a gene involved in learning and memory. Hum Mutat 2007; 28:978-84. [PMID: 17487847 DOI: 10.1002/humu.20547] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
Neuropsin (kallikrein 8, KLK8) is a secreted-type serine protease preferentially expressed in the central nervous system and involved in learning and memory. Its splicing pattern is different in human and mouse, with the longer form (type II) only expressed in human. Sequence analysis suggested a recent origin of type II during primate evolution. Here we demonstrate that the type II form is absent in nonhuman primates, and is thus a human-specific splice form. With the use of an in vitro splicing assay, we show that a human-specific T to A mutation (c.71-127T>A) triggers the change of splicing pattern, leading to the origin of a novel splice form in the human brain. Using mutation assay, we prove that this mutation is not only necessary but also sufficient for type II expression. Our results demonstrate a molecular mechanism for the creation of novel proteins through alternative splicing in the central nervous system during human evolution.
Affiliation(s)
- Zhi-xiang Lu
- Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China