1
Wang J, Li C, Han J, Xue Y, Zheng X, Wang R, Radak Z, Nakabeppu Y, Boldogh I, Ba X. Reassessing the roles of oxidative DNA base lesion 8-oxoGua and repair enzyme OGG1 in tumorigenesis. J Biomed Sci 2025; 32:1. [PMID: 39741341 DOI: 10.1186/s12929-024-01093-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Accepted: 11/08/2024] [Indexed: 01/02/2025] Open
Abstract
Reactive oxygen species (ROS) cause multiple forms of DNA damage, and among them, 8-oxoguanine (8-oxoGua), an oxidized product of guanine, is one of the most abundant. If left unrepaired, 8-oxoGua may pair with A instead of C, leading to a G:C to T:A mutation during DNA replication. 8-Oxoguanine DNA glycosylase 1 (OGG1) is a tailored repair enzyme that recognizes 8-oxoGua in the DNA duplex and initiates the base excision repair (BER) pathway to remove the lesion and ensure the fidelity of the genome. The accumulation of genomic 8-oxoGua and the dysfunction of OGG1 are readily linked to mutagenesis, and subsequently to aging-related diseases and tumorigenesis; however, direct experimental evidence has long been lacking. Recently, a series of studies have shown that guanine oxidation in the genome has a conservative bias, with a tendency to occur in regulatory regions; thus, 8-oxoGua is not only a lesion to be repaired but also an epigenetic modification. In this regard, OGG1 is a specific reader of this base modification. Substrate recognition and/or excision by OGG1 can cause DNA conformation changes and affect chromatin modifications, thereby modulating the transcription of genes involved in a variety of cellular processes, including inflammation, cell proliferation, differentiation, and apoptosis. Thus, in addition to its potential mutagenicity, 8-oxoGua may contribute to tumor development and progression through the altered gene expression stemming from its epigenetic effects.
Affiliation(s)
- Jing Wang
  - Department of Respiratory Medicine, China-Japan Union Hospital of Jilin University, Changchun, 130031, China
- Chunshuang Li
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, College of Life Sciences, Northeast Normal University, Changchun, 130024, China
- Jinling Han
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, College of Life Sciences, Northeast Normal University, Changchun, 130024, China
- Yaoyao Xue
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, College of Life Sciences, Northeast Normal University, Changchun, 130024, China
- Xu Zheng
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, College of Life Sciences, Northeast Normal University, Changchun, 130024, China
- Ruoxi Wang
  - College of Life Sciences, Shandong Normal University, Jinan, 250014, China
- Zsolt Radak
  - Research Institute of Sport Science, University of Physical Education, Budapest, 1123, Hungary
- Yusaku Nakabeppu
  - Division of Neurofunctional Genomics, Department of Immunobiology and Neuroscience, Medical Institute of Bioregulation, Kyushu University, Fukuoka, 812-8582, Japan
- Istvan Boldogh
  - Department of Microbiology and Immunology, University of Texas Medical Branch at Galveston, Galveston, TX, 77555, USA
- Xueqing Ba
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, College of Life Sciences, Northeast Normal University, Changchun, 130024, China
2
Panjali Z, Abdolmaleki P, Hajipour-Verdom B, Hahad O, Zendehdel R. Lung cell toxicity of co-exposure to airborne particulate matter and extremely low-frequency magnetic field. Xenobiotica 2022; 52:370-379. [PMID: 35608272 DOI: 10.1080/00498254.2022.2082342] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
Abstract
Although the toxic effects of urban airborne particulate matter (PM) on lung cells are known, less attention has been paid to co-exposure to PM and extremely low-frequency magnetic fields (ELF-MF) in occupational settings. The present study investigated the influence of PM and ELF-MF co-exposure on toxicity in human lung cells (A549). Total PM (TPM) was evaluated according to NIOSH-0500. The SiO2 and metal contents of the TPM were determined based on NIOSH-7602 and 7302, respectively. In addition, 900 mG ELF-MF exposure was simulated based on field measurements. The toxicity mechanisms were assessed by examining malondialdehyde (MDA), glutathione ratio, gene expression, and DNA strand breaks. The toxicity indicators of the TPM samples were MDA generation, glutathione depletion, and DNA damage, and their impacts were analysed at doses below the LD50 (4 µg). Gene expression of OGG1 and MTH1 was upregulated after TPM exposure at the lowest dose (2 µg), whereas ITPA was upregulated in the presence of ELF-MF. Co-exposure to TPM and ELF-MF decreased oxidative stress and DNA damage levels compared with exposure to TPM alone. Although ELF-MF reduced the toxicity induced by TPM, the resulting toxicity was not lower than that of unexposed cells.
Affiliation(s)
- Zahra Panjali
  - Department of Occupational Health Engineering, Faculty of Health and Medical Engineering, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Parviz Abdolmaleki
  - Department of Biophysics, Faculty of Biological Science, Tarbiat Modarres University, Tehran, Iran
- Behnam Hajipour-Verdom
  - Department of Biophysics, Faculty of Biological Science, Tarbiat Modarres University, Tehran, Iran
- Omar Hahad
  - Department of Cardiology, Cardiology I, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany; German Center for Cardiovascular Research (DZHK), Partner Site Rhine-Main, Mainz, Germany
- Rezvan Zendehdel
  - Department of Occupational Health and Safety, School of Public Health and Safety, Shahid Beheshti University of Medical Science, Tehran, Iran
3
Malfatti MC, Antoniali G, Codrich M, Burra S, Mangiapane G, Dalla E, Tell G. New perspectives in cancer biology from a study of canonical and non-canonical functions of base excision repair proteins with a focus on early steps. Mutagenesis 2021; 35:129-149. [PMID: 31858150 DOI: 10.1093/mutage/gez051] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/16/2019] [Accepted: 12/05/2019] [Indexed: 12/15/2022] Open
Abstract
Alterations of DNA repair enzymes and consequential triggering of aberrant DNA damage response (DDR) pathways are thought to play a pivotal role in genomic instabilities associated with cancer development, and are further thought to be important predictive biomarkers for therapy using the synthetic lethality paradigm. However, novel unpredicted perspectives are emerging from the identification of several non-canonical roles of DNA repair enzymes, particularly in gene expression regulation, by different molecular mechanisms, such as (i) non-coding RNA regulation of tumour suppressors, (ii) epigenetic and transcriptional regulation of genes involved in genotoxic responses and (iii) paracrine effects of secreted DNA repair enzymes triggering the cell senescence phenotype. The base excision repair (BER) pathway, canonically involved in the repair of non-distorting DNA lesions generated by oxidative stress, ionising radiation, alkylation damage and spontaneous or enzymatic deamination of nucleotide bases, represents a paradigm for the multifaceted roles of complex DDR in human cells. This review will focus on what is known about the canonical and non-canonical functions of BER enzymes related to cancer development, highlighting novel opportunities to understand the biology of cancer and representing future perspectives for designing new anticancer strategies. We will specifically focus on APE1 as an example of a pleiotropic and multifunctional BER protein.
Affiliation(s)
- Matilde Clarissa Malfatti
  - Laboratory of Molecular Biology and DNA repair, Department of Medicine (DAME), University of Udine, Udine, Italy
- Giulia Antoniali
  - Laboratory of Molecular Biology and DNA repair, Department of Medicine (DAME), University of Udine, Udine, Italy
- Marta Codrich
  - Laboratory of Molecular Biology and DNA repair, Department of Medicine (DAME), University of Udine, Udine, Italy
- Silvia Burra
  - Laboratory of Molecular Biology and DNA repair, Department of Medicine (DAME), University of Udine, Udine, Italy
- Giovanna Mangiapane
  - Laboratory of Molecular Biology and DNA repair, Department of Medicine (DAME), University of Udine, Udine, Italy
- Emiliano Dalla
  - Laboratory of Molecular Biology and DNA repair, Department of Medicine (DAME), University of Udine, Udine, Italy
- Gianluca Tell
  - Laboratory of Molecular Biology and DNA repair, Department of Medicine (DAME), University of Udine, Udine, Italy
4
Hao W, Wang J, Zhang Y, Wang C, Xia L, Zhang W, Zafar M, Kang JY, Wang R, Ali Bohio A, Pan L, Zeng X, Wei M, Boldogh I, Ba X. Enzymatically inactive OGG1 binds to DNA and steers base excision repair toward gene transcription. FASEB J 2020; 34:7427-7441. [PMID: 32378256 PMCID: PMC7318607 DOI: 10.1096/fj.201902243r] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 09/02/2019] [Revised: 11/19/2019] [Accepted: 03/17/2020] [Indexed: 12/11/2022]
Abstract
8‐Oxoguanine DNA glycosylase 1 (OGG1)‐initiated base excision repair (BER) is the primary pathway to remove the pre‐mutagenic 8‐oxo‐7,8‐dihydroguanine (8‐oxoG) from DNA. Recent studies documented that 8‐oxoG serves as an epigenetic‐like mark and that OGG1 modulates gene expression in oxidatively stressed cells. For this new role of OGG1, two distinct mechanisms have been proposed: one is coupled to base excision, while the other requires only substrate binding by OGG1. Both result in conformational adjustment of the adjacent DNA sequences, providing transcription factors with access to their cis‐elements. The present study aimed to examine whether the BER activity of OGG1 is required for pro‐inflammatory gene expression. To this end, Ogg1/OGG1 knockout/depleted cells were transfected with constructs expressing wild‐type (wt) and repair‐deficient mutants of OGG1. OGG1's promoter enrichment, oxidative state, and gene expression were examined. Results showed that TNFα exposure increased levels of oxidatively modified cysteine(s) of wt OGG1 without impairing its association with promoter and facilitated gene expression. The excision‐deficient K249Q mutant was an even more potent activator of gene expression, whereas mutant OGG1 with impaired substrate recognition/binding was not. These data suggest that the interaction of OGG1 with its substrate at regulatory regions, followed by conformational adjustment in the adjacent DNA, is the primary mode of modulating inflammatory gene expression.
Affiliation(s)
- Wenjing Hao
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; School of Life Science, Northeast Normal University, Changchun, China
- Jing Wang
  - Department of Respiratory Medicine, China-Japan Union Hospital of Jilin University, Changchun, China
- Yuanhang Zhang
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; School of Life Science, Northeast Normal University, Changchun, China
- Chenxin Wang
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; School of Life Science, Northeast Normal University, Changchun, China
- Lan Xia
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; School of Life Science, Northeast Normal University, Changchun, China
- Wenhe Zhang
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; School of Life Science, Northeast Normal University, Changchun, China
- Muhammad Zafar
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; School of Life Science, Northeast Normal University, Changchun, China
- Ju-Yong Kang
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; Faculty of Life Science, Kim Il Sung University, Pyongyang, DPRK
- Ruoxi Wang
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; School of Life Science, Northeast Normal University, Changchun, China; Key Laboratory of Animal Resistance Biology of Shandong Province, Institute of Biomedical Sciences, College of Life Sciences, Shandong Normal University, Jinan, China
- Ameer Ali Bohio
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; School of Life Science, Northeast Normal University, Changchun, China
- Lang Pan
  - School of Life Science, Northeast Normal University, Changchun, China; Department of Microbiology and Immunology, University of Texas Medical Branch at Galveston, Galveston, TX, USA
- Xianlu Zeng
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; School of Life Science, Northeast Normal University, Changchun, China
- Min Wei
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; School of Life Science, Northeast Normal University, Changchun, China
- Istvan Boldogh
  - Department of Microbiology and Immunology, University of Texas Medical Branch at Galveston, Galveston, TX, USA
- Xueqing Ba
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, China; School of Life Science, Northeast Normal University, Changchun, China
5
Wang R, Hao W, Pan L, Boldogh I, Ba X. The roles of base excision repair enzyme OGG1 in gene expression. Cell Mol Life Sci 2018; 75:3741-3750. [PMID: 30043138 PMCID: PMC6154017 DOI: 10.1007/s00018-018-2887-8] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 05/14/2018] [Revised: 07/11/2018] [Accepted: 07/19/2018] [Indexed: 12/13/2022]
Abstract
Modifications of DNA strands and nucleobases, both induced and accidental, are associated with unfavorable consequences including loss or gain of genetic information and mutations. Therefore, DNA repair proteins have essential roles in maintaining genome fidelity. Recently, mounting evidence supports that 8-oxoguanine (8-oxoG), one of the most abundant genomic base modifications generated by reactive oxygen and nitrogen species, along with its cognate repair protein 8-oxoguanine DNA glycosylase 1 (OGG1), has distinct roles in gene expression through transcription modulation or signal transduction. By binding to 8-oxoG located in gene regulatory regions, OGG1 acts as a transcription modulator, which can control transcription factor homing, induce allosteric transition of G-quadruplex structure, or recruit chromatin remodelers. In addition, the post-repair complex formed between OGG1 and its repair product, free 8-oxoG, increases the levels of active small GTPases and induces downstream signaling cascades to trigger gene expression. The present review discusses how cells exploit damaged guanine base(s) and the authentic repair protein to orchestrate a profile of various transcriptomes in redox-regulated biological processes.
Affiliation(s)
- Ruoxi Wang
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Institute of Genetics and Cytology, Northeast Normal University, 5268 Renmin Street, Changchun, 130024, Jilin, China
  - School of Life Science, Northeast Normal University, Changchun, 130024, Jilin, China
- Wenjing Hao
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Institute of Genetics and Cytology, Northeast Normal University, 5268 Renmin Street, Changchun, 130024, Jilin, China
  - School of Life Science, Northeast Normal University, Changchun, 130024, Jilin, China
- Lang Pan
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Institute of Genetics and Cytology, Northeast Normal University, 5268 Renmin Street, Changchun, 130024, Jilin, China
  - Department of Physiology, Xiangya Medicine School in Central South University, Changsha, 410078, Hunan, China
- Istvan Boldogh
  - Department of Microbiology and Immunology, University of Texas Medical Branch at Galveston, Galveston, TX, 77555, USA
  - Sealy Center for Molecular Medicine, University of Texas Medical Branch at Galveston, Galveston, TX, 77555, USA
- Xueqing Ba
  - Key Laboratory of Molecular Epigenetics of Ministry of Education, Institute of Genetics and Cytology, Northeast Normal University, 5268 Renmin Street, Changchun, 130024, Jilin, China
  - School of Life Science, Northeast Normal University, Changchun, 130024, Jilin, China
6
Akhter S, Aziz RK, Kashef MT, Ibrahim ES, Bailey B, Edwards RA. Kullback Leibler divergence in complete bacterial and phage genomes. PeerJ 2017; 5:e4026. [PMID: 29204318 PMCID: PMC5712468 DOI: 10.7717/peerj.4026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/30/2017] [Accepted: 10/22/2017] [Indexed: 12/11/2022] Open
Abstract
The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses.
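The metric named in this abstract, the Kullback–Leibler divergence of a genome's amino acid composition from the mean composition, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the pseudocount, the use of log base 2, and the unweighted mean over genomes are assumptions made here and are not taken from the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_frequencies(proteins, pseudocount=1.0):
    """Relative amino acid frequencies over all proteins of one genome.

    The pseudocount (an assumption of this sketch) keeps every frequency
    positive so the logarithm below is always defined.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def kl_divergence(p, q):
    """D_KL(p || q) in bits for two distributions over AMINO_ACIDS."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS)


def divergence_from_mean(genomes):
    """Map genome name -> KL divergence from the mean composition.

    `genomes` maps a name to a list of protein sequences; the mean is the
    unweighted average of the per-genome frequency vectors.
    """
    freqs = {name: aa_frequencies(prots) for name, prots in genomes.items()}
    mean = {a: sum(f[a] for f in freqs.values()) / len(freqs) for a in AMINO_ACIDS}
    return {name: kl_divergence(f, mean) for name, f in freqs.items()}
```

Under these assumptions, a genome whose proteins are dominated by a few residues receives a larger value than one with a balanced composition, which is the kind of skew the abstract reports for endosymbionts and parasites.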
Affiliation(s)
- Sajia Akhter
  - Computational Science Research Center, San Diego State University, San Diego, CA, USA
- Ramy K Aziz
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, Egypt; Department of Computer Science, San Diego State University, San Diego, CA, United States of America
- Mona T Kashef
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, Egypt
- Eslam S Ibrahim
  - Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Cairo, Egypt
- Barbara Bailey
  - Department of Mathematics & Statistics, San Diego State University, San Diego, CA, USA
- Robert A Edwards
  - Computational Science Research Center, San Diego State University, San Diego, CA, USA; Department of Computer Science, San Diego State University, San Diego, CA, United States of America; Department of Mathematics & Statistics, San Diego State University, San Diego, CA, USA; Department of Biology, San Diego State University, San Diego, CA, USA
7
Ba X, Boldogh I. 8-Oxoguanine DNA glycosylase 1: Beyond repair of the oxidatively modified base lesions. Redox Biol 2017; 14:669-678. [PMID: 29175754 PMCID: PMC5975208 DOI: 10.1016/j.redox.2017.11.008] [Citation(s) in RCA: 167] [Impact Index Per Article: 20.9] [Received: 07/31/2017] [Revised: 10/08/2017] [Accepted: 11/08/2017] [Indexed: 12/11/2022] Open
Abstract
Oxidative stress and the resulting damage to genomic DNA are inevitable consequences of endogenous physiological processes, and they are amplified by cellular responses to environmental exposures. One of the most frequent reactions of reactive oxygen species with DNA is the oxidation of guanine to pre-mutagenic 8-oxo-7,8-dihydroguanine (8-oxoG). Despite the vulnerability of guanine to oxidation, vertebrate genes are primarily embedded in GC-rich genomic regions, and over 72% of the promoters of human genes belong to a class with a high GC content. In the promoter, 8-oxoG may serve as an epigenetic mark, and when complexed with the oxidatively inactivated repair enzyme 8-oxoguanine DNA glycosylase 1, provide a platform for the coordination of the initial steps of DNA repair and the assembly of the transcriptional machinery to launch the prompt and preferential expression of redox-regulated genes. Deviations/variations from this artful coordination may be the etiological links between guanine oxidation and various cellular pathologies and diseases during ageing processes.
Affiliation(s)
- Xueqing Ba
  - The Key Laboratory of Molecular Epigenetics of Ministry of Education, Northeast Normal University, Changchun, Jilin 130024, China; School of Life Science, Northeast Normal University, Changchun, Jilin 130024, China
- Istvan Boldogh
  - Department of Microbiology and Immunology, University of Texas Medical Branch at Galveston, Galveston, TX 77555, USA; Sealy Center for Molecular Medicine, University of Texas Medical Branch at Galveston, Galveston, TX 77555, USA
8
Gupta A, Singh TR. SHIFT: server for hidden stops analysis in frame-shifted translation. BMC Res Notes 2013; 6:68. [PMID: 23432998 PMCID: PMC3598200 DOI: 10.1186/1756-0500-6-68] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/04/2012] [Accepted: 02/21/2013] [Indexed: 02/07/2023] Open
Abstract
Background: Frameshift is one of the three classes of recoding. Frameshifts lead to waste of energy, resources and activity of the biosynthetic machinery. In addition, some peptides synthesized after frameshifts are probably cytotoxic and are a plausible cause of numerous diseases and disorders such as muscular dystrophies, lysosomal storage disorders, and cancer. Hidden stop codons occur naturally in coding sequences among all organisms. These codons are associated with the early termination of translation after incorrect reading frame selection and help to reduce the metabolic cost of frameshift events. Researchers have identified several consequences of hidden stop codons and their association with myriad disorders. However, the wealth of information available is scattered and not readily amenable to data mining. To reduce this gap, this work describes an algorithmic web-based tool to study hidden stops in frameshifted translation for all lineages through their respective genetic code systems. Findings: This paper describes SHIFT, an algorithmic web application tool that provides a user-friendly interface for identifying and analyzing hidden stops in frameshifted translation of genomic sequences for all available genetic code systems. We have calculated the correlation between codon usage frequencies and the plausible contribution of codons towards hidden stops in an off-frame context. Markov chains of various orders have been used to model hidden stops in frameshifted peptides and their evolutionary association with naturally occurring hidden stops. To obtain reliable and persuasive estimates for the naturally occurring and predicted hidden stops, statistical measures have been implemented. Conclusions: This paper presents SHIFT, an algorithmic tool that allows user-friendly exploration, analysis, and visualization of hidden stop codons in frameshifted translations. It is expected that this web-based tool will serve as a useful complement for analyzing hidden stop codons in all available genetic code systems. SHIFT is freely available for academic and research purposes at http://www.nuccore.org/shift/.
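To make the notion of a hidden stop concrete, here is a minimal Python sketch that lists stop codons appearing in the two shifted reading frames of a coding sequence. It is not the SHIFT server's algorithm: the function name `hidden_stops`, the choice of the standard genetic code's stop codons, and the example sequences are assumptions for illustration only.

```python
# Stop codons of the standard genetic code (assumption: other genetic
# code systems would supply a different set via the `stops` argument).
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def hidden_stops(cds, stops=STANDARD_STOPS):
    """Return {shift: [positions]} of stop codons in the +1 and +2 frames.

    `cds` is a coding sequence read in frame 0 starting at index 0.
    Positions are 0-based indices of the first base of each off-frame stop.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    found = {}
    for shift in (1, 2):
        found[shift] = [
            i
            for i in range(shift, len(cds) - 2, 3)
            if cds[i:i + 3] in stops
        ]
    return found


if __name__ == "__main__":
    # Hypothetical toy sequence: in-frame codons are ATG CTG ACC TTG GCA.
    print(hidden_stops("ATGCTGACCTTGGCA"))
```

A ribosome that slips into the +1 frame of the toy sequence would meet TGA at index 4 and terminate early, which is the cost-limiting effect the abstract attributes to hidden stops.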
Affiliation(s)
- Arun Gupta
  - School of Computer Science and IT, DAVV, Indore, M.P., India
9
Hilterbrand A, Saelens J, Putonti C. CBDB: the codon bias database. BMC Bioinformatics 2012; 13:62. [PMID: 22536831 PMCID: PMC3463423 DOI: 10.1186/1471-2105-13-62] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 01/09/2012] [Accepted: 03/26/2012] [Indexed: 02/01/2023] Open
Abstract
Background: In many genomes, a clear preference in the usage of particular codons exists. The mechanisms that induce codon biases remain an open question; studies have attributed codon usage to translational selection, mutational bias and drift. Furthermore, correlations between codon usage within host genomes and their viral pathogens have been observed for a myriad of host-virus systems. As such, numerous studies have investigated codon usage and codon bias in an effort to better understand how species evolve. Numerous metrics have been developed to identify biases in codon usage. In addition, a few data repositories of codon bias data are available, differing in the metrics reported as well as the number and taxonomy of strains examined. Description: We have created a new web resource called the Codon Bias Database (CBDB), which provides information regarding the codon bias within the set of highly expressed genes for 300+ bacterial genomes. CBDB was developed to provide a resource for researchers investigating codon bias in bacteria, facilitating comparisons between strains and species. Furthermore, the site was created to serve those studying adaptation in phage; the genera selected for this first release of CBDB all have sequenced, annotated bacteriophages. The annotations and sequences for the highly expressed gene set are available for each strain in addition to the strain’s codon bias measurements. Conclusions: Comparing species and strains provides a comprehensive look at how codon usage has been shaped over evolutionary time and can elucidate the putative mechanisms behind it. The Codon Bias Database provides a centralized repository of look-up tables and codon usage bias measures for a wide variety of genera, species and strains. Through our analysis of the variation in codon usage within the strains presently available, we find that most members of a genus have a codon composition most similar to that of other members of their genus, although not necessarily other members of their species.
Affiliation(s)
- Adam Hilterbrand
  - Department of Biology, Loyola University Chicago, 1032 W Sheridan Road, Chicago, IL 60660, USA
10
Dass JFP, Sudandiradoss C. Insight into pattern of codon biasness and nucleotide base usage in serotonin receptor gene family from different mammalian species. Gene 2012; 503:92-100. [PMID: 22480817 DOI: 10.1016/j.gene.2012.03.057] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 06/08/2011] [Revised: 03/14/2012] [Accepted: 03/17/2012] [Indexed: 11/16/2022]
Abstract
5-HT (5-hydroxytryptamine) or serotonin receptors are found in both the central and peripheral nervous systems as well as in non-neuronal tissues. In the animal and human nervous system, serotonin produces various functional effects through a variety of membrane-bound receptors. In this study, we focus on the 5-HT receptor family from different mammals and examine the factors that account for codon and nucleotide usage variation. A total of 110 homologous coding sequences from 11 different mammalian species were analyzed using relative synonymous codon usage (RSCU), correspondence analysis (COA) and hierarchical cluster analysis, together with the nucleotide base usage frequency of chemically similar amino acid codons. The mean effective number of codons (ENc) value of 37.06 for 5-HT(6) shows very high codon bias within the family and may be due to high selective translational efficiency. The COA and Spearman's rank correlation reveal that nucleotide compositional mutation bias is the major factor influencing codon usage in serotonin receptor genes. The hierarchical cluster analysis suggests that gene function is another dominant factor that affects codon usage bias, while species is a minor factor. Nucleotide base usage, reported using the Goldman, Engelman, Steitz (GES) scale, reveals the presence of high uracil (>45%) content in functionally important hydrophobic regions. Our in silico approach should help further investigations of the evolution, structure, function and gene expression of the 5-HT receptor family, whose members are potential antipsychotic drug targets.
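Relative synonymous codon usage (RSCU), one of the measures named in this abstract, is conventionally the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below illustrates that standard definition; the function name `rscu`, the use of the standard genetic code, and the toy input are assumptions and do not reproduce the authors' pipeline.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stops).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every sense codon whose amino acid occurs.

    RSCU = observed count * (number of synonymous codons) / (total count of
    all synonymous codons for that amino acid). Amino acids absent from
    `cds` are skipped to avoid dividing by zero.
    """
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))

    families = defaultdict(list)  # amino acid -> its synonymous codons
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values
```

A value of 1.0 means a codon is used exactly as often as expected under equal usage; values above 1.0 indicate a preferred codon and values below 1.0 a disfavoured one.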
Affiliation(s)
- J Febin Prabhu Dass
  - School of Biosciences and Technology, VIT University, Vellore, Tamil Nadu State, India
11
Zhang W, Wu W, Lin W, Zhou P, Dai L, Zhang Y, Huang J, Zhang D. Deciphering heterogeneity in pig genome assembly Sscrofa9 by isochore and isochore-like region analyses. PLoS One 2010; 5:e13303. [PMID: 20948965 PMCID: PMC2952626 DOI: 10.1371/journal.pone.0013303] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/27/2010] [Accepted: 09/15/2010] [Indexed: 11/18/2022] Open
Abstract
Background: The isochore, a large DNA sequence with relatively small GC variance, is one of the most important structures in eukaryotic genomes. Although the isochore has been widely studied in humans and other species, little is known about its distribution in pigs. Principal Findings: In this paper, we construct a map of long homogeneous genome regions (LHGRs), i.e., isochores and isochore-like regions, in pigs to provide an intuitive view of GC heterogeneity in each chromosome. The LHGR pattern study not only quantifies heterogeneities, but also reveals some primary characteristics of chromatin organization, including the following: (1) the majority of LHGRs belong to GC-poor families and are long; (2) a high gene density tends to occur with the appearance of GC-rich LHGRs; and (3) the density of LINE repeats decreases with an increase in the GC content of LHGRs. Furthermore, a portion of LHGRs with particular GC ranges (50%–51% and 54%–55%) tend to have abnormally high gene densities, suggesting that biased gene conversion (BGC), as well as time- and energy-saving principles, could be of importance to the formation of genome organization. Conclusion: This study significantly improves our knowledge of chromatin organization in the pig genome. Correlations between the different biological features (e.g., gene density and repeat density) and the GC content of LHGRs provide a unique glimpse into in silico gene and repeat prediction.
Affiliation(s)
- Wenqian Zhang
- Bioinformatics Center, College of Life Science, Northwest A&F University, Xianyang, Shaanxi, China
- Wenwu Wu
- Bioinformatics Center, College of Life Science, Northwest A&F University, Xianyang, Shaanxi, China
- Wenchao Lin
- Bioinformatics Center, College of Life Science, Northwest A&F University, Xianyang, Shaanxi, China
- Pengfang Zhou
- Bioinformatics Center, College of Life Science, Northwest A&F University, Xianyang, Shaanxi, China
- Li Dai
- Bioinformatics Center, College of Life Science, Northwest A&F University, Xianyang, Shaanxi, China
- Yang Zhang
- Investigation Group of Molecular Virology, Immunology, Oncology and Systems Biology, and Bioinformatics Center, College of Veterinary Medicine, Northwest A&F University, Xianyang, Shaanxi, China
- Jingfei Huang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- * E-mail: (DZ); (JH)
- Deli Zhang
- Investigation Group of Molecular Virology, Immunology, Oncology and Systems Biology, and Bioinformatics Center, College of Veterinary Medicine, Northwest A&F University, Xianyang, Shaanxi, China
- * E-mail: (DZ); (JH)
|
12
|
Variable correlation of genome GC% with transfer RNA number as well as with transfer RNA diversity among bacterial groups: α-Proteobacteria and Tenericutes exhibit strong positive correlation. Microbiol Res 2010; 165:232-42. [DOI: 10.1016/j.micres.2009.05.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2009] [Revised: 05/04/2009] [Accepted: 05/15/2009] [Indexed: 11/21/2022]
|
13
|
Nuel G, Regad L, Martin J, Camproux AC. Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data. Algorithms Mol Biol 2010; 5:15. [PMID: 20205909 PMCID: PMC2828453 DOI: 10.1186/1748-7188-5-15] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2009] [Accepted: 01/26/2010] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations both for low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models. RESULTS: The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also avoids any convolution product of the pattern distributions in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in the complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms and their relative interest in comparison with existing ones were then tested and discussed on a toy example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists of concatenating the sequences in the data set into a single sequence.
CONCLUSIONS: Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model displays a slow convergence toward the stationary distribution. We conclude with a discussion of our method and its potential improvements.
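To make the counting problem in this abstract concrete, the following is a minimal Python sketch of a simple baseline, not any of the paper's three algorithms. It assumes independent, identically distributed letters (rather than a general Markov source), a single word as the pattern, and overlapping occurrences; it embeds the count in a KMP-style automaton and then combines sequences by plain convolution, which is the step the abstract says Algorithm 1 avoids. All function names are hypothetical.

```python
def build_dfa(word, alphabet):
    """KMP-style automaton: a state is the length of the longest prefix of
    `word` that is a suffix of the text read so far."""
    m = len(word)
    dfa = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in alphabet:
            text = word[:state] + ch
            k = min(m, len(text))
            while k > 0 and text[-k:] != word[:k]:
                k -= 1
            dfa[state][ch] = k
    return dfa


def count_distribution(word, length, probs):
    """P(N = k), k = 0, 1, ..., for overlapping occurrences of `word` in one
    sequence of `length` i.i.d. letters drawn with probabilities `probs`."""
    dfa = build_dfa(word, probs)
    m = len(word)
    dist = {(0, 0): 1.0}  # (automaton state, occurrences so far) -> probability
    for _ in range(length):
        nxt = {}
        for (state, count), p in dist.items():
            for ch, q in probs.items():
                new_state = dfa[state][ch]
                new_count = count + (1 if new_state == m else 0)
                key = (new_state, new_count)
                nxt[key] = nxt.get(key, 0.0) + p * q
        dist = nxt
    marginal = {}
    for (_, count), p in dist.items():
        marginal[count] = marginal.get(count, 0.0) + p
    return [marginal.get(k, 0.0) for k in range(max(marginal) + 1)]


def convolve(a, b):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def set_distribution(word, lengths, probs):
    """Distribution of the total count over independent sequences."""
    total = [1.0]
    for n in lengths:
        total = convolve(total, count_distribution(word, n, probs))
    return total
```

For example, `set_distribution("AA", [2, 2], {"A": 0.5, "B": 0.5})` gives the distribution of the total number of "AA" occurrences across two independent sequences of length 2. Treating each sequence separately, as here, is what distinguishes the exact computation from concatenating the sequences into one.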
Affiliation(s)
- Gregory Nuel
- LSG, Laboratoire Statistique et Génome, CNRS UMR-8071, INRA UMR-1152, University of Evry, Evry, France
- CNRS, Paris, France
- MAP5, Department of Applied Mathematics, CNRS UMR-8145, University Paris Descartes, Paris, France
- Leslie Regad
- EBGM, Equipe de Bioinformatique Génomique et Moleculaire, INSERM UMRS-726, University Paris Diderot, Paris, France
- MTi, Molecules Thérapeutique in silico, INSERM UMRS-973, University Paris Diderot, Paris, France
- Juliette Martin
- EBGM, Equipe de Bioinformatique Génomique et Moleculaire, INSERM UMRS-726, University Paris Diderot, Paris, France
- MIG, Mathématique Informatique et Genome, INRA UR-1077, Jouy-en-Josas, France
- IBCP, Institut de Biologie et Chimie des Protéines, IFR 128, CNRS UMR 5086, University of Lyon 1, Lyon, France
- Anne-Claude Camproux
- EBGM, Equipe de Bioinformatique Génomique et Moleculaire, INSERM UMRS-726, University Paris Diderot, Paris, France
- MTi, Molecules Thérapeutique in silico, INSERM UMRS-973, University Paris Diderot, Paris, France
|
14
|
Sabbia V, Romero H, Musto H, Naya H. Composition Profile of the Human Genome at the Chromosome Level. J Biomol Struct Dyn 2009; 27:361-70. [DOI: 10.1080/07391102.2009.10507322] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
|
15
|
Brandão A, Jiang T. The composition of untranslated regions in Trypanosoma cruzi genes. Parasitol Int 2009; 58:215-9. [PMID: 19505588 DOI: 10.1016/j.parint.2009.06.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2009] [Revised: 05/26/2009] [Accepted: 06/01/2009] [Indexed: 11/25/2022]
Abstract
We collected the UTRs from Trypanosoma cruzi genes that have been experimentally mapped and are publicly available, and made a comprehensive analysis of their composition features, including sequence length, G+C content and relationship to ORF, composition of the most frequent words, and distribution of Simple Sequence Repeats (SSR). T. cruzi UTRs range in length from 10-400 bp for the 5' UTR and 17-2800 bp for the 3' UTR. Both UTRs display a mean G+C content of 40%. Ratios between the UTR and protein coding segments show that the 5' UTR is limited to a maximum of 20% of the total length in the final transcript. The most frequent 5' UTR words in the range of 4-12 bases are almost exact complements of the respective 3' UTR words. SSRs in the 3' UTR are longer than in the 5' UTR and are mostly derived from TA/AT, TG/GT, and TTA/ATT. SSRs account for up to 20% of the nucleotide composition in the 5' UTR and up to 90% in the 3' UTR.
|
16
|
Abstract
Genomes exhibit diverse patterns of species-specific GC content, GC and AT skews, codon bias, and mutation bias. Despite intensive investigations and the rapid accumulation of sequence data, the causes of these a priori different genome biases have not been agreed on and seem multifactorial and idiosyncratic. We show that these biases can arise generically from an instability of the coevolutionary dynamics between genome composition and resource allocation for translation, transcription, and replication. Thus, we offer a unifying framework for understanding and analyzing different genome biases. We develop a test of multistability of nucleotide composition of completely sequenced genomes and reveal a bistability for Borrelia burgdorferi, a genome with pronounced replication-related biases. These results indicate that evolution generates rhetoric: it improves the efficiency of the genome's communication with the cell without modifying the message, and this leads to bias.
|
17
|
Guo X, Bao J, Fan L. Evidence of selectively driven codon usage in rice: implications for GC content evolution of Gramineae genes. FEBS Lett 2007; 581:1015-21. [PMID: 17306258 DOI: 10.1016/j.febslet.2007.01.088] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2006] [Revised: 01/27/2007] [Accepted: 01/31/2007] [Indexed: 10/23/2022]
Abstract
Two gene classes characterized by high and low GC content have been found in rice and other cereals, but not in dicot genomes. We used paralogs with high and low GC contents in rice and found: (a) a greater increase in GC content at exonic fourfold-redundant sites than at flanking introns; (b) with reference to their orthologs in Arabidopsis, most substitution sites between the two kinds of paralogs are found at 2- and 4-degenerate sites with a T→C mode, while A→C and A→G play major roles at 0-degenerate sites; and (c) high-GC genes have greater bias, and codon usage is skewed toward codons that are preferred in highly expressed genes. We believe this is strong evidence for selectively driven codon usage in rice. Another cereal, maize, showed the same trend as rice. This represents a potential evolutionary process for the origin of genes with a high GC content in rice and other cereals.
Affiliation(s)
- Xingyi Guo
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310029, China
|
18
|
Yang X, Scheffler BE, Weston LA. Recent developments in primer design for DNA polymorphism and mRNA profiling in higher plants. PLANT METHODS 2006; 2:4. [PMID: 16509990 PMCID: PMC1526718 DOI: 10.1186/1746-4811-2-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/14/2006] [Accepted: 03/01/2006] [Indexed: 05/06/2023]
Abstract
Primer design is a critical step in the application of PCR-based technologies in gene expression and genetic diversity analysis. As more plant genomes have been sequenced in recent years, the emphasis of primer design strategy has shifted toward genome-wide, high-throughput approaches. This paper summarizes recent advances in primer design for profiling of DNA polymorphism and mRNA in higher plants, as well as new primer systems developed for animals that can be adapted for plants.
Affiliation(s)
- Xiaohan Yang
- Department of Horticulture, Cornell University, Ithaca, NY 14853, USA
- Department of Plant Sciences, University of Tennessee, 2431 Joe Johnson Drive, Knoxville, TN 37996, USA
- Brian E Scheffler
- USDA-ARS-CGRU, MSA Genomics Laboratory, 141 Experiment Station Rd., Stoneville, MS 38776, USA
- Leslie A Weston
- Department of Horticulture, Cornell University, Ithaca, NY 14853, USA
|
19
|
Chen LL, Gao F. Detection of nucleolar organizer and mitochondrial DNA insertion regions based on the isochore map of Arabidopsis thaliana. FEBS J 2005; 272:3328-36. [PMID: 15978039 DOI: 10.1111/j.1742-4658.2005.04748.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Eukaryotic genomes are composed of isochores, i.e. long sequences relatively homogeneous in GC content. In this paper, the isochore structure of the Arabidopsis thaliana genome has been studied using a windowless technique based on the Z curve method, and intuitive curves are drawn for all five chromosomes. Using these curves, we can calculate the GC content at any resolution, even at the base level. It is observed that all five chromosomes are composed of several alternating GC-rich and AT-rich regions. Usually, these regions, named 'isochore-like regions', have large fluctuations in the GC content. Five isochores with little fluctuation are also observed. Detailed analyses have been performed for these isochores. A GC-rich 'isochore-like region' and a GC-isochore in chromosomes II and IV, respectively, are the nucleolar organizer regions (NORs), and genes located in the two regions prefer to use GC-ending codons. Another GC-isochore located in chromosome II is a mitochondrial DNA insertion region; the position and size of this region are precisely predicted by the current method. The amino acid usage and codon preference of genes in this organellar-to-nuclear transfer region show significant differences from other regions. Moreover, the centromeres are located in GC-rich 'isochore-like regions' in all five chromosomes. The current method can provide a useful tool for analyzing whole genomic sequences of eukaryotes.
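The claim that a cumulative curve lets one read off GC content "at any resolution" can be illustrated with a short Python sketch. It uses only the AT-versus-GC component of a Z-curve-style cumulative walk, z_n = (A_n + T_n) - (G_n + C_n), and is a simplified illustration rather than the authors' implementation; the function names are hypothetical, and the input is assumed to contain only A, C, G, and T.

```python
from itertools import accumulate


def z_component(seq):
    """Cumulative z_n = (A_n + T_n) - (G_n + C_n) for n = 0 .. len(seq).

    Assumes seq contains only A, C, G, T; any other symbol contributes 0
    and would distort the GC estimate below.
    """
    steps = (1 if b in "AT" else -1 if b in "GC" else 0 for b in seq.upper())
    return [0] + list(accumulate(steps))


def gc_content(z, start, end):
    """GC fraction of seq[start:end], read from two points on the curve.

    In a segment of length L, (AT count) - (GC count) = z[end] - z[start]
    and (AT count) + (GC count) = L, so GC/L = 1/2 - (z[end] - z[start]) / (2L).
    """
    length = end - start
    if length <= 0:
        raise ValueError("end must be greater than start")
    return 0.5 - (z[end] - z[start]) / (2 * length)
```

Once `z` has been computed, the GC content of any segment costs one subtraction, so no window size has to be fixed in advance; a steadily falling stretch of the curve marks a GC-rich region and a rising stretch an AT-rich one.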
Affiliation(s)
- Ling-Ling Chen
- Laboratory for Computational Biology, Shandong Provincial Research Center for Bioinformatic Engineering and Techniques, Shandong University of Technology, Zibo, China.
|
20
|
Wernegreen JJ, Funk DJ. Mutation exposed: a neutral explanation for extreme base composition of an endosymbiont genome. J Mol Evol 2005; 59:849-58. [PMID: 15599516 DOI: 10.1007/s00239-003-0192-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2003] [Accepted: 06/29/2004] [Indexed: 10/26/2022]
Abstract
The influence of neutral mutation pressure versus selection on base composition evolution is a subject of considerable controversy. Yet the present study represents the first explicit population genetic analysis of this issue in prokaryotes, the group in which base composition variation is most dramatic. Here, we explore the impact of mutation and selection on the dynamics of synonymous changes in Buchnera aphidicola, the AT-rich bacterial endosymbiont of aphids. Specifically, we evaluated three forms of evidence. (i) We compared the frequencies of directional base changes (AT→GC vs. GC→AT) at synonymous sites within and between Buchnera species, to test for selective preference versus effective neutrality of these mutational categories. Reconstructed mutational changes across a robust intraspecific phylogeny showed a nearly 1:1 AT→GC:GC→AT ratio. Likewise, stationarity of base composition among Buchnera species indicated equal rates of AT→GC and GC→AT substitutions. The similarity of these patterns within and between species supported the neutral model. (ii) We observed an equivalence of relative per-site AT mutation rate and current AT content at synonymous sites, indicating that base composition is at mutational equilibrium. (iii) We demonstrated statistically greater equality in the frequency of mutational categories in Buchnera than in parallel mammalian studies that documented selection on synonymous sites. Our results indicate that effectively neutral mutational pressure, rather than selection, represents the major force driving base composition evolution in Buchnera. Thus they further corroborate recent evidence for the critical role of reduced N_e in the molecular evolution of bacterial endosymbionts.
Affiliation(s)
- Jennifer J Wernegreen
- Josephine Bay Paul Center for Comparative Molecular Biology & Evolution, The Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA.
|
21
|
Musto H, Naya H, Zavala A, Romero H, Alvarez-Valín F, Bernardi G. Correlations between genomic GC levels and optimal growth temperatures in prokaryotes. FEBS Lett 2004; 573:73-7. [PMID: 15327978 DOI: 10.1016/j.febslet.2004.07.056] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2004] [Revised: 07/20/2004] [Accepted: 07/22/2004] [Indexed: 10/26/2022]
Abstract
In prokaryotes, GC levels range from 25% to 75%, and Topt from approximately 0 °C to >100 °C. When all species are considered together, no correlation is found between the two variables. Correlations are found, however, when Families of prokaryotes are analysed. Indeed, when Families comprising at least 10 species were studied (a set of 20 Families), positive correlations were found for 15 of them. Furthermore, a comparative analysis by independent contrasts made within the Families in order to control for phylogenetic non-independence showed qualitatively equivalent results. We conclude that Topt is one of the factors that influences genomic GC in prokaryotes.
Affiliation(s)
- Héctor Musto
- Laboratorio de Organización y Evolución del Genoma, Facultad de Ciencias, Montevideo, Uruguay
|
22
|
Zhang CT, Zhang R. A nucleotide composition constraint of genome sequences. Comput Biol Chem 2004; 28:149-53. [PMID: 15130543 DOI: 10.1016/j.compbiolchem.2004.02.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2004] [Revised: 02/15/2004] [Accepted: 02/15/2004] [Indexed: 11/23/2022]
Abstract
Let a, c, g and t denote the occurrence frequencies of A, C, G and T, respectively, in a genome. We calculated the statistical quantity $S = a^2 + c^2 + g^2 + t^2$ for each of 809 genomes (11 archaea, 42 bacteria, 3 eukaryota, 90 phages, 36 viroids and 627 viruses) and 236 plasmids. We found that $S < 1/3$ is strictly valid for almost all of the above genomes or plasmids. As a direct deduction of the above observation, it is shown that (i) the statistical quantity S is a kind of genome order index, which is negatively correlated with the Shannon H function; (ii) $S < 1/3$ suggests that a minimal value of the Shannon H function is required for each genome; (iii) S defined above would be a new biological statistical quantity, useful to describe the composition features of genomes; (iv) by jointly considering the Chargaff Parity Rule 2, it is shown that the genomic G + C content should be between 0.211 and 0.789.
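The quantities in this abstract are simple enough to compute directly, and the G + C bounds in point (iv) can be checked with a few lines of algebra. The Python sketch below is illustrative only: the function names are hypothetical, and the logarithm base for the Shannon H function (base 2 here) is an assumption, since the abstract does not state it. For the bounds, Chargaff Parity Rule 2 is taken to mean a = t and c = g; writing x for the G + C content then gives S = (x - 1/2)^2 + 1/4, so S < 1/3 is equivalent to |x - 1/2| < sqrt(1/12).

```python
import math
from collections import Counter


def base_frequencies(seq):
    """Occurrence frequencies of A, C, G, T (other symbols are ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: counts[b] / total for b in "ACGT"}


def order_index(freqs):
    """S = a^2 + c^2 + g^2 + t^2."""
    return sum(p * p for p in freqs.values())


def shannon_entropy(freqs):
    """Shannon H in bits (base 2 is an assumption of this sketch)."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def gc_bounds(s_max=1 / 3):
    """G + C range implied by S < s_max when a = t and c = g.

    With x = G + C, S = (x - 1/2)^2 + 1/4, so the bound is 1/2 +/- sqrt(s_max - 1/4).
    """
    half_width = math.sqrt(s_max - 0.25)
    return 0.5 - half_width, 0.5 + half_width
```

Calling `gc_bounds()` returns values that round to 0.211 and 0.789, matching the range quoted in the abstract. A uniform composition gives the minimum S = 1/4 and the maximum H = 2 bits, which is the sense in which S and H are negatively related.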
Affiliation(s)
- Chun-Ting Zhang
- Department of Physics, Tianjin University, Tianjin 300072, China.
|
23
|
Chattopadhyay S, Chakrabarti J. Temporal changes in phosphoglycerate kinase coding sequences: a quantitative measure. J Comput Biol 2003; 10:83-93. [PMID: 12676052 DOI: 10.1089/106652703763255688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
The ratio of the average of the square of the number of the nucleotides to that of the random sequence of the same strand bias is proposed as a quantitative measure of evolution in some coding DNA sequences. Applying this measure to the phosphoglycerate kinase gene we observe a monotonic rise of the ratio with evolution. We present an interpretation of this data on some bacteria.
Affiliation(s)
- Sujay Chattopadhyay
- Department of Theoretical Physics, Indian Association for the Cultivation of Science, Calcutta 700 032,
|
24
|
Abe T, Kanaya S, Kinouchi M, Ichiba Y, Kozuki T, Ikemura T. Informatics for unveiling hidden genome signatures. Genome Res 2003; 13:693-702. [PMID: 12671005 PMCID: PMC430167 DOI: 10.1101/gr.634603] [Citation(s) in RCA: 140] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
With the increasing amount of available genome sequences, novel tools are needed for comprehensive analysis of species-specific sequence characteristics for a wide variety of genomes. We used an unsupervised neural network algorithm, a self-organizing map (SOM), to analyze di-, tri-, and tetranucleotide frequencies in a wide variety of prokaryotic and eukaryotic genomes. The SOM, which can cluster complex data efficiently, was shown to be an excellent tool for analyzing global characteristics of genome sequences and for revealing key combinations of oligonucleotides representing individual genomes. From analysis of 1- and 10-kb genomic sequences derived from 65 bacteria (a total of 170 Mb) and from 6 eukaryotes (460 Mb), clear species-specific separations of major portions of the sequences were obtained with the di-, tri-, and tetranucleotide SOMs. The unsupervised algorithm could recognize, in most 10-kb sequences, the species-specific characteristics (key combinations of oligonucleotide frequencies) that are signature features of each genome. We were able to classify DNA sequences within one and between many species into subgroups that corresponded generally to biological categories. Because the classification power is very high, the SOM is an efficient and fundamental bioinformatic strategy for extracting a wide range of genomic information from a vast amount of sequences.
Affiliation(s)
- Takashi Abe
- Division of Evolutionary Genetics, Department of Population Genetics, National Institute of Genetics, The Graduate University for Advanced Studies, Mishima, Shizuoka-ken 411-8540, Japan
|
25
|
Chen WJ, Bonillo C, Lecointre G. Repeatability of clades as a criterion of reliability: a case study for molecular phylogeny of Acanthomorpha (Teleostei) with larger number of taxa. Mol Phylogenet Evol 2003; 26:262-88. [PMID: 12565036 DOI: 10.1016/s1055-7903(02)00371-8] [Citation(s) in RCA: 211] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Although much progress has been made recently in teleostean phylogeny, relationships among the main lineages of the higher teleosts (Acanthomorpha), containing more than 60% of all fish species, remain poorly defined. This study represents the most extensive taxonomic sampling effort to date to collect new molecular characters for phylogenetic analysis of acanthomorph fishes. We compiled and analyzed three independent data sets, including: (i) mitochondrial ribosomal fragments from 12S and 16S (814 bp for 97 taxa); (ii) nuclear ribosomal 28S sequences (847 bp for 74 taxa); and (iii) a nuclear protein-coding gene, rhodopsin (759 bp for 86 taxa). Detailed analyses were conducted on each data set separately and the principle of taxonomic congruence without consensus trees was used to assess confidence in the results as follows. Repeatability of clades from separate analyses was considered the primary criterion to establish reliability, rather than bootstrap proportions from a single combined (total evidence) data matrix. The new and reliable clades emerging from this study of the acanthomorph radiation were: Gadiformes (cods) with Zeioids (dories); Beloniformes (needlefishes) with Atheriniformes (silversides); blenioids (blennies) with Gobiesocoidei (clingfishes); Channoidei (snakeheads) with Anabantoidei (climbing gouramies); Mastacembeloidei (spiny eels) with Synbranchioidei (swamp-eels); the last two pairs of taxa grouping together, Syngnathoidei (aulostomids, macroramphosids) with Dactylopteridae (flying gurnards); Scombroidei (mackerels) plus Stromatoidei plus Chiasmodontidae; Ammodytidae (sand lances) with Cheimarrhichthyidae (torrentfish); Zoarcoidei (eelpouts) with Cottoidei; Percidae (perches) with Notothenioidei (Antarctic fishes); and a clade grouping Carangidae (jacks), Echeneidae (remoras), Sphyraenidae (barracudas), Menidae (moonfish), Polynemidae (threadfins), Centropomidae (snooks), and Pleuronectiformes (flatfishes).
Affiliation(s)
- Wei-Jen Chen
- Laboratoire d'Ichtyologie générale et appliquée, et service de systématique moléculaire (IFR CNRS 1541), Muséum National d'Histoire Naturelle, 43 rue Cuvier, 75231 Paris cedex 05, France.
|
26
|
Bronner G, Spataro B, Page M, Gautier C, Rechenmann F. Modeling comparative mapping using objects and associations. COMPUTERS & CHEMISTRY 2002; 26:413-20. [PMID: 12144172 DOI: 10.1016/s0097-8485(02)00004-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
Spatial information on genome organization is essential both for gene prediction and annotation among species and for a better understanding of genome function and evolution. In this article we propose an object-association model to formalize comparative genomic mapping. This model is being implemented in the GeMCore knowledge base, for which some original capabilities are described. GeMCore, together with the GeMME graphical interface for molecular evolution, was used to spatially characterize the minor shift phenomenon between human and mouse.
Affiliation(s)
- G Bronner
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558 Lyon I Bât Grégoire Mendel, Villeurbanne, France.
|
27
|
Yu J, Hu S, Wang J, Wong GKS, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 2002; 296:79-92. [PMID: 11935017 DOI: 10.1126/science.1068037] [Citation(s) in RCA: 1791] [Impact Index Per Article: 77.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC content of rice coding sequences.
MESH Headings
- Arabidopsis/genetics
- Base Composition
- Computational Biology
- Contig Mapping
- DNA Transposable Elements
- DNA, Intergenic
- DNA, Plant/chemistry
- DNA, Plant/genetics
- Databases, Nucleic Acid
- Exons
- Gene Duplication
- Genes, Plant
- Genome, Plant
- Genomics
- Introns
- Molecular Sequence Data
- Oryza/genetics
- Plant Proteins/chemistry
- Plant Proteins/genetics
- Polymorphism, Genetic
- Repetitive Sequences, Nucleic Acid
- Sequence Analysis, DNA
- Sequence Homology, Nucleic Acid
- Software
- Species Specificity
- Synteny
Affiliation(s)
- Jun Yu
- Beijing Genomics Institute/Center of Genomics and Bioinformatics, Chinese Academy of Sciences, Beijing 101300, China
|
28
|
Paulsen M, Takada S, Youngson NA, Benchaib M, Charlier C, Segers K, Georges M, Ferguson-Smith AC. Comparative sequence analysis of the imprinted Dlk1-Gtl2 locus in three mammalian species reveals highly conserved genomic elements and refines comparison with the Igf2-H19 region. Genome Res 2001; 11:2085-94. [PMID: 11731499 PMCID: PMC311216 DOI: 10.1101/gr.206901] [Citation(s) in RCA: 101] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
Abstract
The Dlk1-Gtl2 domain on mouse chromosome 12 contains reciprocally imprinted genes with the potential to contribute to our understanding of common features involved in imprinting control. We have sequenced this conserved region in the mouse and sheep and included the human sequence in a three species comparison. This analysis resulted in a precise conservation map and identification of highly conserved sequence elements, some of which we have shown previously to be differentially methylated in the mouse. Additionally, this analysis facilitated identification of a CpG-rich tandem repeat array located approximately 13-15 kb upstream of Gtl2. Furthermore, we have identified a third imprinted transcript that overlaps with the last Dlk1 exon in the mouse. This transcript lacks a conserved open reading frame and is probably generated by cleavage of extended Dlk1 transcripts. Because Dlk1 and Gtl2 share many of the imprinting properties of the well-characterized Igf2-H19 domain, it has been proposed that the two regions may be regulated in the same way. Comparative genomic examination of the two domains indicates that although there are similarities, other features are very different, including the location of conserved CTCF-binding sites, and the level of conservation at regulatory regions.
Affiliation(s)
- M Paulsen
- University of Cambridge, Department of Anatomy, Cambridge CB2 3DY, United Kingdom
|
29
|
Wang J, Zhang Q, Ren K, She Z. Multi-scaling hierarchical structure analysis on the sequence of E. coli complete genome. ACTA ACUST UNITED AC 2001. [DOI: 10.1007/bf02901913] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
30
|
Abstract
Analytical DNA ultracentrifugation revealed that eukaryotic genomes are mosaics of isochores: long DNA segments (>>300 kb on average) relatively homogeneous in G+C. Important genome features are dependent on this isochore structure, e.g. genes are found predominantly in the GC-richest isochore classes. However, no reliable method is available to rigorously partition the genome sequence into relatively homogeneous regions of different composition, thereby revealing the isochore structure of chromosomes at the sequence level. Homogeneous regions are currently ascertained by plain statistics on moving windows of arbitrary length, or simply by eye on G+C plots. By contrast, the entropic segmentation method is able to divide a DNA sequence into relatively homogeneous, statistically significant domains. An early version of this algorithm only produced domains having an average length far below the typical isochore size. Here we show that an improved segmentation method, specifically intended to determine the most statistically significant partition of the sequence at each scale, is able to identify the boundaries between long homogeneous genome regions displaying the typical features of isochores. The algorithm precisely locates classes II and III of the human major histocompatibility complex region, two well-characterized isochores at the sequence level, the boundary between them being the first isochore boundary experimentally characterized at the sequence level. The analysis is then extended to a collection of human large contigs. The relatively homogeneous regions we find show many of the features (G+C range, relative proportion of isochore classes, size distribution, and relationship with gene density) of the isochores identified through DNA centrifugation. Isochore chromosome maps, with many potential applications in genomics, are then drawn for all the completely sequenced eukaryotic genomes available.
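As a rough illustration of what an entropy-based split looks like, the Python sketch below finds the single cut point that maximizes the Jensen-Shannon divergence between the base compositions of the left and right parts of a sequence. It is a sketch under stated assumptions, not the improved method described in the abstract: using Jensen-Shannon divergence as the criterion is an assumption of this example, the function names are hypothetical, the input must contain only A, C, G, and T, and the significance test and recursive, multi-scale application are omitted.

```python
import math

BASES = "ACGT"


def entropy(counts, total):
    """Shannon entropy (bits) of a composition given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def best_split(seq):
    """Return (position, divergence) for the cut seq[:position] | seq[position:]
    that maximizes the Jensen-Shannon divergence between the two parts.

    Assumes seq contains only A, C, G, T and has length >= 2.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two bases to split")
    total = [seq.count(b) for b in BASES]
    h_all = entropy(total, n)
    left = [0, 0, 0, 0]
    best_pos, best_js = None, -1.0
    for i in range(1, n):
        left[BASES.index(seq[i - 1])] += 1
        right = [t - l for t, l in zip(total, left)]
        # JS divergence = H(whole) - weighted entropies of the two parts.
        js = h_all - (i / n) * entropy(left, i) - ((n - i) / n) * entropy(right, n - i)
        if js > best_js:
            best_pos, best_js = i, js
    return best_pos, best_js
```

A full segmentation would accept the cut only if the divergence is statistically significant and would then repeat the search within each resulting segment, which is where the choice of scale and significance level discussed in the abstract comes in.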
Affiliation(s)
- J L Oliver
- Departamento de Genética, Instituto de Biotecnología, Universidad de Granada, E-18071, Granada, Spain.
|
31
|
Abstract
We tried to identify the substitutions involved in the establishment of replication strand bias, which has been recognized as an important evolutionary factor in the evolution of bacterial genomes. First, we analyzed the composition asymmetry of 28 complete bacterial genomes and used it to test the possibility that asymmetric deamination of cytosine might be at the origin of the bias. The model showed significant correlation to the data but left unexplained a significant portion of the variance and indicated a systematic underestimation of GC skews in comparison with TA skews. Second, we analyzed the substitutions acting on the genes from five fully sequenced Chlamydia genomes that had not suffered strand switch since speciation. This analysis showed that substitutions were not at equilibrium in Chlamydia trachomatis or in C. muridarum and that strand bias is still an on-going process in these genes. Third, we identified substitutions involved in the adaptation of genes that had switched strands after speciation. These genes adapted quickly to the skewed composition of the new strand, mostly due to C→T, A→G, and C→G asymmetric substitutions. This observation was reinforced by the analysis of genes that switched strands after divergence between Bacillus subtilis and B. halodurans. Finally, we propose a more extended model based on the analysis of the substitution asymmetries of Chlamydia. This model fits well with the data provided by bacterial genomes presenting strong strand bias.
Affiliation(s)
- E P Rocha
- Atelier de BioInformatique, Université Paris VI, Paris, France.
|
32
|
Abstract
According to New Synthesis doctrine, the direction of evolution is determined by selection and not by "internal causes" that act by way of propensities of variation. This doctrine rests on the theoretical claim that because mutation rates are small in comparison to selection coefficients, mutation is powerless to overcome opposing selection. Using a simple population-genetic model, this claim is shown to depend on assuming the prior availability of variation, so that mutation may act only as a "pressure" on the frequencies of existing alleles, and not as the evolutionary process that introduces novelty. As shown here, mutational bias in the introduction of novelty can strongly influence the course of evolution, even when mutation rates are small in comparison to selection coefficients. Recognizing this mode of causation provides a distinct mechanistic basis for an "internalist" approach to determining the contribution of mutational and developmental factors to evolutionary phenomena such as homoplasy, parallelism, and directionality.
Affiliation(s)
- L Y Yampolsky
- Center for Advanced Research in Biotechnology, Rockville, MD 20874, USA
|
33
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2001. [PMCID: PMC2447213 DOI: 10.1002/cfg.58] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] Open
|