1
Brovkina MV, Chapman MA, Holding ML, Clowney EJ. Emergence and influence of sequence bias in evolutionarily malleable, mammalian tandem arrays. BMC Biol 2023; 21:179. [PMID: 37612705 PMCID: PMC10463633 DOI: 10.1186/s12915-023-01673-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/25/2023] [Accepted: 08/01/2023] [Indexed: 08/25/2023] Open
Abstract
BACKGROUND The radiation of mammals at the extinction of the dinosaurs produced a plethora of new forms, as diverse as bats, dolphins, and elephants, in only 10-20 million years. Behind the scenes, adaptation to new niches is accompanied by extensive innovation in large families of genes that allow animals to contact the environment, including chemosensors, xenobiotic enzymes, and immune and barrier proteins. Genes in these "outward-looking" families are allelically diverse among humans and exhibit tissue-specific and sometimes stochastic expression. RESULTS Here, we show that these tandem arrays of outward-looking genes occupy AT-biased isochores and comprise the "tissue-specific" gene class that lacks CpG islands in their promoters. Models of mammalian genome evolution have not incorporated the sharply different functions and transcriptional patterns of genes in AT- versus GC-biased regions. To examine the relationship between gene family expansion, sequence content, and allelic diversity, we use population genetic data and comparative analysis. First, we find that AT bias can emerge during evolutionary expansion of gene families in cis. Second, human genes in AT-biased isochores or with GC-poor promoters experience relatively low rates of de novo point mutation today but are enriched for non-synonymous variants. Finally, we find that isochores containing gene clusters exhibit low rates of recombination. CONCLUSIONS Our analyses suggest that tolerance of non-synonymous variation and low recombination are two forces that have produced the depletion of GC bases in outward-facing gene arrays. In turn, high AT content exerts a profound effect on their chromatin organization and transcriptional regulation.
Affiliation(s)
- Margarita V Brovkina
  - Graduate Program in Cellular and Molecular Biology, University of Michigan Medical School, Ann Arbor, MI, USA
- Margaret A Chapman
  - Neurosciences Graduate Program, University of Michigan Medical School, Ann Arbor, MI, USA
- E Josephine Clowney
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI, USA
  - Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
2
Yu W, Chakravarthi VP, Borosha S, Dilower I, Lee EB, Ratri A, Starks RR, Fields PE, Wolfe MW, Faruque MO, Tuteja G, Rumi MAK. Transcriptional regulation of Satb1 in mouse trophoblast stem cells. Front Cell Dev Biol 2022; 10:918235. [PMID: 36589740 PMCID: PMC9795202 DOI: 10.3389/fcell.2022.918235] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2022] [Accepted: 11/18/2022] [Indexed: 12/15/2022] Open
Abstract
SATB homeobox proteins are important regulators of developmental gene expression. Among the stem cell lineages that emerge during early embryonic development, trophoblast stem (TS) cells exhibit robust SATB expression. Both SATB1 and SATB2 act to maintain the trophoblast stem-state. However, the molecular mechanisms that regulate TS-specific Satb expression are not yet known. We identified Satb1 variant 2 as the predominant transcript in trophoblasts. Histone marks and RNA polymerase II occupancy in TS cells indicated an active state of the promoter. A novel cis-regulatory region with active histone marks was identified ∼21 kbp upstream of the variant 2 promoter. CRISPR/Cas9-mediated disruption of this sequence decreased Satb1 expression in TS cells, and chromosome conformation capture analysis confirmed looping of this distant regulatory region into the proximal promoter. Scanning position weight matrices across the enhancer predicted two ELF5 binding sites in close proximity to SATB1 sites, which were confirmed by chromatin immunoprecipitation. Knockdown of ELF5 downregulated Satb1 expression in TS cells, and overexpression of ELF5 increased the enhancer-reporter activity. Interestingly, ELF5 interacts with SATB1 in TS cells, and the enhancer activity was upregulated following SATB overexpression. Our findings indicate that trophoblast-specific Satb1 expression is regulated by long-range chromatin looping of an enhancer that interacts with ELF5 and SATB proteins.
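The abstract above mentions scanning position weight matrices across an enhancer. As a rough illustration of that general technique only, the sketch below converts a count matrix into log-odds scores and slides it along a sequence. The count matrix, pseudocount, uniform background, and function names are all assumptions made up for this example; they are not the ELF5 motif or the authors' pipeline.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# These counts are invented for illustration and are NOT the ELF5 motif.
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 9, "C": 0, "G": 0, "T": 1},
]


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Return (offset, score) for every window; assumes uppercase A/C/G/T only."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[pos][base] for pos, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, pwm):
    """Return the (offset, score) pair with the highest score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])
```

In practice a score threshold and a scan of the reverse strand would usually be added; both are omitted here to keep the sketch short.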
Affiliation(s)
- Wei Yu
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- V. Praveen Chakravarthi
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Shaon Borosha
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Iman Dilower
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Eun Bee Lee
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Anamika Ratri
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Rebekah R. Starks
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
- Patrick E. Fields
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Michael W. Wolfe
  - Department of Cell Biology and Physiology, University of Kansas Medical Center, Kansas City, KS, United States
- M. Omar Faruque
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Geetu Tuteja
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, United States
- M. A. Karim Rumi
  - Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS, United States
3
Bergero R, Ellis P, Haerty W, Larcombe L, Macaulay I, Mehta T, Mogensen M, Murray D, Nash W, Neale MJ, O'Connor R, Ottolini C, Peel N, Ramsey L, Skinner B, Suh A, Summers M, Sun Y, Tidy A, Rahbari R, Rathje C, Immler S. Meiosis and beyond - understanding the mechanistic and evolutionary processes shaping the germline genome. Biol Rev Camb Philos Soc 2021; 96:822-841. [PMID: 33615674 PMCID: PMC8246768 DOI: 10.1111/brv.12680] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/26/2019] [Revised: 12/15/2020] [Accepted: 12/15/2020] [Indexed: 12/11/2022]
Abstract
The separation of germ cell populations from the soma is part of the evolutionary transition to multicellularity. Only genetic information present in the germ cells will be inherited by future generations, and any molecular processes affecting the germline genome are therefore likely to be passed on. Despite its prevalence across taxonomic kingdoms, we are only starting to understand details of the underlying micro-evolutionary processes occurring at the germline genome level. These include segregation, recombination, mutation and selection and can occur at any stage during germline differentiation and mitotic germline proliferation to meiosis and post-meiotic gamete maturation. Selection acting on germ cells at any stage from the diploid germ cell to the haploid gametes may cause significant deviations from Mendelian inheritance and may be more widespread than previously assumed. The mechanisms that affect and potentially alter the genomic sequence and allele frequencies in the germline are pivotal to our understanding of heritability. With the rise of new sequencing technologies, we are now able to address some of these unanswered questions. In this review, we comment on the most recent developments in this field and identify current gaps in our knowledge.
Affiliation(s)
- Roberta Bergero
  - Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, EH9 3JT, U.K.
- Peter Ellis
  - School of Biosciences, University of Kent, Canterbury, CT2 7NJ, U.K.
- Lee Larcombe
  - Applied Exomics Ltd, Stevenage Bioscience Catalyst, Stevenage, SG1 2FX, U.K.
- Iain Macaulay
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, U.K.
- Tarang Mehta
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, U.K.
- Mette Mogensen
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, U.K.
- David Murray
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, U.K.
- Will Nash
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, U.K.
- Matthew J. Neale
  - Genome Damage and Stability Centre, School of Life Sciences, University of Sussex, Brighton, BN1 9RH, U.K.
- Ned Peel
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, U.K.
- Luke Ramsey
  - The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, U.K.
- Ben Skinner
  - School of Life Sciences, University of Essex, Colchester, CO4 3SQ, U.K.
- Alexander Suh
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, U.K.
  - Department of Organismal Biology, Uppsala University, Norbyvägen 18D, Uppsala, 752 36, Sweden
- Michael Summers
  - School of Biosciences, University of Kent, Canterbury, CT2 7NJ, U.K.
  - The Bridge Centre, 1 St Thomas Street, London Bridge, London, SE1 9RY, U.K.
- Yu Sun
  - Norwich Medical School, University of East Anglia, Norwich Research Park, Colney Ln, Norwich, NR4 7UG, U.K.
- Alison Tidy
  - School of Biosciences, University of Nottingham, Plant Science, Sutton Bonington Campus, Sutton Bonington, LE12 5RD, U.K.
- Claudia Rathje
  - School of Biosciences, University of Kent, Canterbury, CT2 7NJ, U.K.
- Simone Immler
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, U.K.
4
Papin C, Le Gras S, Ibrahim A, Salem H, Karimi MM, Stoll I, Ugrinova I, Schröder M, Fontaine-Pelletier E, Omran Z, Bronner C, Dimitrov S, Hamiche A. CpG Islands Shape the Epigenome Landscape. J Mol Biol 2020; 433:166659. [PMID: 33010306 DOI: 10.1016/j.jmb.2020.09.018] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/02/2020] [Revised: 09/22/2020] [Accepted: 09/22/2020] [Indexed: 02/07/2023]
Abstract
Epigenetic modifications and nucleosome positioning play an important role in modulating gene expression. However, how the patterns of epigenetic modifications and nucleosome positioning are established around promoters is not well understood. Here, we have addressed these questions in a series of genome-wide experiments coupled to a novel bioinformatic analysis approach. Our data reveal a clear correlation between CpG density, promoter activity and accumulation of active or repressive histone marks. CGI boundaries define the chromatin promoter regions that will be epigenetically modified. CpG-rich promoters are targeted by histone modifications and histone variants, while CpG-poor promoters are regulated by DNA methylation. CGI boundaries, but not transcriptional activity, are essential determinants of H2A.Z positioning in the vicinity of the promoters, suggesting that the presence of H2A.Z is not related to transcriptional control. Accordingly, H2A.Z depletion has no impact on gene expression of arrested mouse embryonic fibroblasts. Therefore, the underlying DNA sequence, the promoter CpG density and, to a lesser extent, transcriptional activity, are key factors implicated in promoter chromatin architecture.
Affiliation(s)
- Christophe Papin
  - Institut de Génétique et Biologie Moléculaire et Cellulaire (IGBMC), UdS, CNRS, INSERM, Equipe labellisée Ligue contre le Cancer, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch Cedex, France
- Stéphanie Le Gras
  - Institut de Génétique et Biologie Moléculaire et Cellulaire (IGBMC), UdS, CNRS, INSERM, Equipe labellisée Ligue contre le Cancer, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch Cedex, France
- Abdulkhaleg Ibrahim
  - Institut de Génétique et Biologie Moléculaire et Cellulaire (IGBMC), UdS, CNRS, INSERM, Equipe labellisée Ligue contre le Cancer, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch Cedex, France; Biotechnology Research Center (BTRC), Tripoli, Libya
- Hatem Salem
  - Institut de Génétique et Biologie Moléculaire et Cellulaire (IGBMC), UdS, CNRS, INSERM, Equipe labellisée Ligue contre le Cancer, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch Cedex, France; Biotechnology Research Center (BTRC), Tripoli, Libya
- Mohammad Mahdi Karimi
  - Comprehensive Cancer Centre, School of Cancer & Pharmaceutical Sciences, Faculty of Life Sciences & Medicine, King's College London, Denmark Hill, London, UK
- Isabelle Stoll
  - Institut de Génétique et Biologie Moléculaire et Cellulaire (IGBMC), UdS, CNRS, INSERM, Equipe labellisée Ligue contre le Cancer, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch Cedex, France
- Iva Ugrinova
  - Roumen Tsanev Institute of Molecular Biology, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Maria Schröder
  - Roumen Tsanev Institute of Molecular Biology, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Emeline Fontaine-Pelletier
  - Institute for Advanced Biosciences, Inserm U 1209, CNRS UMR 5309, Université Grenoble Alpes, 38000 Grenoble, France
- Ziad Omran
  - Umm AlQura University, Faculty of Pharmacy, Saudi Arabia
- Christian Bronner
  - Institut de Génétique et Biologie Moléculaire et Cellulaire (IGBMC), UdS, CNRS, INSERM, Equipe labellisée Ligue contre le Cancer, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch Cedex, France
- Stefan Dimitrov
  - Roumen Tsanev Institute of Molecular Biology, Bulgarian Academy of Sciences, Sofia, Bulgaria; Institute for Advanced Biosciences, Inserm U 1209, CNRS UMR 5309, Université Grenoble Alpes, 38000 Grenoble, France
- Ali Hamiche
  - Institut de Génétique et Biologie Moléculaire et Cellulaire (IGBMC), UdS, CNRS, INSERM, Equipe labellisée Ligue contre le Cancer, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch Cedex, France
5
Wu Z, Ioannidis NM, Zou J. Predicting target genes of non-coding regulatory variants with IRT. Bioinformatics 2020; 36:4440-4448. [PMID: 32330225 DOI: 10.1093/bioinformatics/btaa254] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/18/2019] [Revised: 03/15/2020] [Accepted: 04/17/2020] [Indexed: 11/14/2022] Open
Abstract
SUMMARY Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association studies (GWAS) analyses. Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in studies of eQTLs. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative pairs based on annotations characterizing the variant, gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 for a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows a significant enrichment in IRT identifying the true target genes versus negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and top-3 accuracy of 90%. Salient features, including GC-content, histone modifications and Hi-C interactions are further analyzed and visualized to illustrate their influences on predictions. IRT can be applied to any VUS of interest and each candidate nearby gene to output a score reflecting the likelihood of regulatory effect on the expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies. AVAILABILITY AND IMPLEMENTATION Codes and data used in this work are available at https://github.com/miaecle/eQTL_Trees. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
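The abstract above reports ROC-AUC and top-1/top-3 gene-ranking accuracy. The sketch below shows how those two kinds of metric can be computed from scratch on toy data. The variant and gene identifiers, scores, and function names are hypothetical and chosen only for illustration; this is not the IRT code from the linked repository.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Assumes labels are 0/1 and both classes are present.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def top_k_accuracy(ranked_candidates, true_targets, k):
    """Fraction of variants whose true target gene appears among the top k ranked genes."""
    hits = sum(
        1
        for variant, genes in ranked_candidates.items()
        if true_targets[variant] in genes[:k]
    )
    return hits / len(ranked_candidates)


# Hypothetical toy data: candidate genes per variant, ordered best score first.
RANKED = {
    "v1": ["g1", "g2", "g3"],
    "v2": ["g5", "g4", "g6"],
    "v3": ["g7", "g8", "g9"],
    "v4": ["g12", "g11", "g10"],
}
TRUTH = {"v1": "g1", "v2": "g4", "v3": "g7", "v4": "g13"}
```

The pairwise AUC definition is quadratic in the number of examples; it is used here because it is easy to verify by hand, not because it is efficient.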
Affiliation(s)
- Zhenqin Wu
  - Department of Chemistry, Stanford University, CA 94305, USA; Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, 94305 CA, USA
- Nilah M Ioannidis
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, 94305 CA, USA
- James Zou
  - Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, 94305 CA, USA; Chan-Zuckerberg Biohub, San Francisco, 94158 CA, USA
6
Singh R, Sophiarani Y. A report on DNA sequence determinants in gene expression. Bioinformation 2020; 16:422-431. [PMID: 32831525 PMCID: PMC7434957 DOI: 10.6026/97320630016422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/23/2020] [Accepted: 04/24/2020] [Indexed: 11/26/2022] Open
Abstract
The biased usage of nucleotides in coding sequences and its correlation with gene expression have been observed in several studies. A complex set of interactions between genes and other components of the expression system determines the amount of protein produced from coding sequences. It is known that the elongation rate of the polypeptide chain is affected by both codon usage bias and specific amino acid compositional constraints. Therefore, it is of interest to review local DNA-sequence elements and other positional as well as combinatorial constraints that play a significant role in gene expression.
Affiliation(s)
- Ravail Singh
  - Indian Institute of Integrative Medicine, CSIR, Canal Road, Jammu-180001
7
Biswas R, Panja AS, Bandopadhyay R. In Silico Analyses of Burial Codon Bias Among the Species of Dipterocarpaceae Through Molecular and Phylogenetic Data. Evol Bioinform Online 2019; 15:1176934319834888. [PMID: 31223230 PMCID: PMC6563522 DOI: 10.1177/1176934319834888] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/24/2019] [Accepted: 02/07/2019] [Indexed: 11/15/2022] Open
Abstract
Introduction: DNA barcode, a molecular marker, is used to distinguish among closely related species, and it can be applied across a broad range of taxa to understand ecology and evolution. The MaturaseK gene (matK) and the rubisco bisphosphate carboxylase/oxygenase form I gene (rbcL) of the chloroplast are highly conserved in plants and are used as core barcodes. The present work comprehensively examines threatened plant species based on the success of discrimination by DNA barcode under selection pressure. Result: The family Dipterocarpaceae, comprising 15 genera, is under threat due to factors such as deforestation, habitat alteration, and poor seed and pollen dispersal. Species of this family were grouped into 6 clusters for matK and into 5 clusters and 2 sub-clusters for rbcL in the phylogenetic tree using the neighbor-joining method. Cluster I to cluster VI of matK and cluster I to cluster V of rbcL were analyzed by various codon and substitution bias tools. Mutational pressure guided the codon bias, which was favored by the avoidance of higher GC content and a significant negative correlation between GC12 and GC3 (in sub-cluster I of cluster I [0.03 < P], cluster I [0.00001 < P], and cluster II [0.01 < P] of rbcL, and cluster IV [0.013 < P] of matK). After refining the results, it could be speculated that the lower null expectation values (R = 0.5 or <0.5) were less divergent from the evolutionary perspective. Apart from that, the higher null expectation values (R = >0.85) also showed the same result, which could possibly be due to the negative impact of a very high or very low transition rate relative to transversion. Conclusion: Through the analysis of inter-generic and inter/intra-specific variation and phylogenetic data, it was found that both selection and mutation played an important role in synonymous codon choice in these genes, but they acted inconsistently on matK and rbcL. In vitro stable proteins of both matK and rbcL were selected through natural selection rather than mutational selection. The matK gene had higher individual discrimination and barcode success than rbcL. These discriminatory approaches may describe the problem related to the extinction of plant species. Hence, it becomes imperative to identify and detect threatened plant species in advance.
Affiliation(s)
- Raju Biswas
  - UGC-Center of Advanced Study, Department of Botany, The University of Burdwan, Bardhaman, India
- Anindya Sundar Panja
  - Department of Biotechnology, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore, India
- Rajib Bandopadhyay
  - UGC-Center of Advanced Study, Department of Botany, The University of Burdwan, Bardhaman, India
8
Uddin A, Paul N, Chakraborty S. The codon usage pattern of genes involved in ovarian cancer. Ann N Y Acad Sci 2019; 1440:67-78. [PMID: 30843242 DOI: 10.1111/nyas.14019] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 09/10/2018] [Revised: 01/04/2019] [Accepted: 01/14/2019] [Indexed: 12/20/2022]
Abstract
In this study, we analyzed the compositional dynamics and codon usage pattern of genes involved in ovarian cancer (OC) using a computational method. Mutations in specific genes are associated with OC, and some genes are risk factors for progression of OC, but no work has been reported yet on the codon usage pattern of genes involved in OC. Nucleotide composition analysis of OC-related genes suggested that the overall GC content was higher than AT content; that is, the genes were GC rich. The improved effective number of codons indicated that the overall extent of codon usage bias of genes involved in OC was low. The codons AGC, CTG, ATC, ACC, GTG, and GCC were overrepresented, while the codons TCG, TTA, CTA, CCG, CAA, CGT, ATA, ACG, GTA, GTT, GCG, and GGT were underrepresented in the genes. Correspondence analysis suggested that the codon usage pattern was different in different genes. A highly significant correlation was observed between GC12 and GC3 (r = 0.587, P < 0.01) of genes, suggesting that directional mutation affected the three codon positions. Our report on the codon usage pattern of genes involved in OC includes a new perspective for elucidating the mechanisms of biased usage of synonymous codons, as well as providing useful clues for molecular genetic engineering.
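The abstract above correlates GC12 with GC3 across genes. As a minimal sketch of how such a comparison can be set up, the code below computes GC12 (mean GC fraction at codon positions 1 and 2) and GC3 (GC fraction at position 3) for a coding sequence, plus a plain Pearson correlation. The function names are hypothetical, and the simplifications (no exclusion of Met, Trp, or stop codons; trailing partial codons ignored) are assumptions of this example rather than the authors' stated procedure.

```python
import math


def gc12_gc3(cds):
    """Return (GC12, GC3) for a coding sequence; requires at least one full codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop any trailing partial codon
    counts = [0, 0, 0]                    # G/C counts at codon positions 1, 2, 3
    for index in range(usable):
        if cds[index] in "GC":
            counts[index % 3] += 1
    n_codons = usable // 3
    gc1, gc2, gc3 = (count / n_codons for count in counts)
    return (gc1 + gc2) / 2, gc3


def pearson_r(xs, ys):
    """Pearson correlation coefficient; assumes neither input is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Applied to a set of genes, one would collect the GC12 and GC3 values into two lists and pass them to `pearson_r`; a significance test would need to be added separately.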
Affiliation(s)
- Arif Uddin
  - Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Assam, India
- Nirmal Paul
  - Department of Biotechnology, Assam University, Assam, India
9
Kryuchkova-Mostacci N, Robinson-Rechavi M. A benchmark of gene expression tissue-specificity metrics. Brief Bioinform 2017; 18:205-214. [PMID: 26891983 PMCID: PMC5444245 DOI: 10.1093/bib/bbw008] [Citation(s) in RCA: 185] [Impact Index Per Article: 23.1] [Received: 12/11/2015] [Indexed: 01/06/2023] Open
Abstract
One of the major properties of genes is their expression pattern. Notably, genes are often classified as tissue specific or housekeeping. This property is of interest to molecular evolution as an explanatory factor of, e.g. evolutionary rate, as well as a functional feature which may in itself evolve. While many different methods of measuring tissue specificity have been proposed and used for such studies, there has been no comparison or benchmarking of these methods to our knowledge, and little justification of their use. In this study, we compare nine measures of tissue specificity. Most methods were established for ESTs and microarrays, and several were later adapted to RNA-seq. We analyse their capacity to distinguish gene categories, their robustness to the choice and number of tissues used and their capture of evolutionary conservation signal.
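The abstract above does not name the nine measures it compares. As one example of what a tissue-specificity metric looks like, the sketch below implements the tau index, a widely used measure that ranges from 0 (uniform, housekeeping-like expression) to 1 (expression in a single tissue). The function name and the choice to return 0.0 for a gene with no expression anywhere are assumptions of this example.

```python
def tau_index(expression):
    """Tau tissue-specificity index for one gene.

    expression: non-negative expression values, one per tissue.
    tau = sum(1 - x_i / x_max) / (n - 1)
    """
    n_tissues = len(expression)
    if n_tissues < 2:
        raise ValueError("tau needs expression values from at least two tissues")
    x_max = max(expression)
    if x_max == 0:
        # Convention chosen for this sketch: an unexpressed gene is scored 0.0.
        return 0.0
    return sum(1 - value / x_max for value in expression) / (n_tissues - 1)
```

For instance, a gene expressed at 10, 5, and 0 units in three tissues scores (0 + 0.5 + 1) / 2 = 0.75. Expression values are often log-transformed before tau is computed, which this sketch leaves to the caller.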
Affiliation(s)
- Nadezda Kryuchkova-Mostacci
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marc Robinson-Rechavi
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
10
Karbalaie K, Vallian S, Lachinani L, Tanhaei S, Baharvand H, Nasr-Esfahani MH. Analysis of Promyelocytic Leukemia in Human Embryonic Carcinoma Stem Cells During Retinoic Acid-Induced Neural Differentiation. Iranian Journal of Biotechnology 2017; 14:169-176. [PMID: 28959333 PMCID: PMC5492245 DOI: 10.15171/ijb.1358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/19/2022]
Abstract
BACKGROUND Promyelocytic leukemia protein (PML) is a tumor suppressor protein that is involved in myeloid cell differentiation in response to retinoic acid (RA). In addition, RA acts as a natural morphogen in neural development. OBJECTIVES This study aimed to examine PML gene expression in different stages of in vitro neural differentiation of NT2 cells, and to investigate the possible role of PML in pluripotency and/or neural development. MATERIALS AND METHODS RA was used as a neural inducer for in vitro neural differentiation of NT2 cells. During this process, PML mRNA and protein levels were assessed by quantitative real-time RT-PCR (QRT-PCR) and immunoblotting, respectively. Furthermore, bisulfite sequencing PCR (BSP) was used to assess PML promoter methylation in NT2 cells and NT2-derived neuronal precursor cells (NT2.NPCs). RESULTS QRT-PCR results showed that PML had maximum expression, with significant differences, in NT2-derived neuronal precursor cells relative to NT2 cells and NT2-derived neural cells (NT2.NCs). Numerous isoforms of PML with different intensities appeared in immunoblots of pluripotent NT2 cells, NT2.NPCs, and NT2.NCs. Furthermore, the methylation of the PML promoter in NT2.NCs was 2.6 percent lower than in NT2 cells. CONCLUSIONS The observed differences in PML expression in different cellular stages could possibly be attributed to the fact that PML in each developmental state might be involved in different cell signaling machinery and different functions. The appearance of different PML isoforms with more intensity in neural progenitor cells may suggest a possible role for this protein in neural development.
Affiliation(s)
- Khadijeh Karbalaie
  - Division of Genetics, Department of Biology, Faculty of Science, University of Isfahan, Isfahan, Iran; Department of Cellular Biotechnology, Cell Science Research Center, Royan Institute for Biotechnology, ACECR, Isfahan, Iran
- Sadeq Vallian
  - Division of Genetics, Department of Biology, Faculty of Science, University of Isfahan, Isfahan, Iran
- Liana Lachinani
  - Department of Cell and Molecular Biology, Cell Science Research Center, Royan Institute for Biotechnology, ACECR, Isfahan, Iran
- Somayeh Tanhaei
  - Department of Molecular Genetics, Cell Science Research Center, Royan Institute for Biotechnology, ACECR, Isfahan, Iran
- Hossein Baharvand
  - Department of Stem Cells and Developmental Biology, Cell Science Research Center, Royan Institute for Stem Cell Biology and Technology, ACECR, Tehran, Iran; Department of Developmental Biology, University of Science and Culture, Tehran, Iran
- Mohammad Hossein Nasr-Esfahani
  - Department of Cellular Biotechnology, Cell Science Research Center, Royan Institute for Biotechnology, ACECR, Isfahan, Iran
11
Elhamamsy AR. DNA methylation dynamics in plants and mammals: overview of regulation and dysregulation. Cell Biochem Funct 2016; 34:289-98. [PMID: 27003927 DOI: 10.1002/cbf.3183] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 09/12/2015] [Revised: 02/18/2016] [Accepted: 02/29/2016] [Indexed: 12/22/2022]
Abstract
DNA methylation is a major epigenetic marking mechanism regulating various biological functions in mammals and plants. The crucial role of DNA methylation has been observed in cellular differentiation, embryogenesis, genomic imprinting and X-chromosome inactivation. Furthermore, DNA methylation takes part in disease susceptibility, responses to environmental stimuli and the biodiversity of natural populations. In plants, different types of environmental stress have demonstrated the ability to alter the archetype of DNA methylation through the genome, change gene expression and confer a mechanism of adaptation. DNA methylation dynamics are regulated by three processes: de novo DNA methylation, methylation maintenance and DNA demethylation. These processes have their similarities and differences between mammals and plants. Furthermore, the dysregulation of DNA methylation dynamics represents one of the primary molecular mechanisms of developing diseases in mammals. This review discusses the regulation and dysregulation of DNA methylation in plants and mammals. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Amr Rafat Elhamamsy
  - Clinical Pharmacy Department, Faculty of Pharmacy, Tanta University, Tanta, Egypt
12
Jiang N, Wang L, Chen J, Wang L, Leach L, Luo Z. Conserved and divergent patterns of DNA methylation in higher vertebrates. Genome Biol Evol 2014; 6:2998-3014. [PMID: 25355807 PMCID: PMC4255770 DOI: 10.1093/gbe/evu238] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Accepted: 10/20/2014] [Indexed: 02/07/2023] Open
Abstract
DNA methylation plays a fundamental role in the regulation of gene expression and is widespread in the genomes of eukaryotic species. For example, in higher vertebrates, there is a "global" methylation pattern involving complete methylation of CpG sites genome-wide, except in promoter regions that are typically enriched for CpG dinucleotides, the so-called "CpG islands." Here, we comprehensively examined and compared the distribution of CpG sites within ten model eukaryotic species and linked the observed patterns to the role of DNA methylation in controlling gene transcription. The analysis revealed two distinct but conserved methylation patterns for gene promoters in human and mouse genomes, involving genes with distinct distributions of promoter CpGs and gene expression patterns. Comparative analysis with four other higher vertebrates revealed that the primary regulatory role of the DNA methylation system is highly conserved in higher vertebrates.
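The abstract above rests on measuring how CpG dinucleotides are distributed in promoters. A common way to quantify CpG enrichment is the observed/expected CpG ratio together with GC content; the sketch below computes both for a single sequence. The function names are hypothetical, and the abstract does not say which statistic the authors used, so this illustrates the general idea only.

```python
def gc_content(sequence):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def cpg_obs_exp(sequence):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, where the ratio is undefined.
    """
    sequence = sequence.upper()
    n_c = sequence.count("C")
    n_g = sequence.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    n_cpg = sequence.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return n_cpg * len(sequence) / (n_c * n_g)
```

A frequently cited rule of thumb labels a window of at least 200 bp as a CpG island when GC content is at least 0.5 and the observed/expected ratio is at least 0.6; those cutoffs vary between tools and are mentioned here only as context.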
Affiliation(s)
- Ning Jiang
- Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China; School of Biosciences, The University of Birmingham, Birmingham B15 2TT, United Kingdom
- Lin Wang
- Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China
- Jing Chen
- School of Biosciences, The University of Birmingham, Birmingham B15 2TT, United Kingdom
- Luwen Wang
- Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China
- Lindsey Leach
- School of Biosciences, The University of Birmingham, Birmingham B15 2TT, United Kingdom
- Zewei Luo
- Department of Biostatistics & Computational Biology, SKLG, School of Life Sciences, Fudan University, Shanghai, China; School of Biosciences, The University of Birmingham, Birmingham B15 2TT, United Kingdom
13
|
Thomas M, Pingault L, Poulet A, Duarte J, Throude M, Faure S, Pichon JP, Paux E, Probst AV, Tatout C. Evolutionary history of Methyltransferase 1 genes in hexaploid wheat. BMC Genomics 2014; 15:922. [PMID: 25342325 PMCID: PMC4223845 DOI: 10.1186/1471-2164-15-922] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/14/2014] [Accepted: 10/13/2014] [Indexed: 11/19/2022]
Abstract
BACKGROUND Plant and animal methyltransferases are key enzymes involved in DNA methylation at cytosine residues, required for gene expression control and genome stability. Taking advantage of the new sequence surveys of the wheat genome recently released by the International Wheat Genome Sequencing Consortium, we identified and characterized MET1 genes in the hexaploid wheat Triticum aestivum (TaMET1). RESULTS Nine TaMET1 genes were identified and mapped on homoeologous chromosome groups 2A/2B/2D, 5A/5B/5D and 7A/7B/7D. Synteny analysis and evolution rates suggest that the genome organization of TaMET1 genes results from a whole genome duplication shared within the grass family, and a second gene duplication, which occurred specifically in the Triticeae tribe prior to the speciation of diploid wheat. Higher expression levels were observed for TaMET1 homoeologous group 2 genes compared to group 5 and 7, indicating that group 2 homoeologous genes are predominant at the transcriptional level, while group 5 evolved into pseudogenes. We show the connection between low expression levels, elevated evolution rates and unexpected enrichment in CG-dinucleotides (CG-rich isochores) at putative promoter regions of homoeologous group 5 and 7, but not of group 2 TaMET1 genes. Bisulfite sequencing reveals that these CG-rich isochores are highly methylated in a CG context, which is the expected target of TaMET1. CONCLUSIONS We retraced the evolutionary history of MET1 genes in wheat, explaining the predominance of group 2 homoeologous genes and suggest CG-DNA methylation as one of the mechanisms involved in wheat genome dynamics.
Affiliation(s)
- Mélanie Thomas
- UMR CNRS 6293 INSERM U 1103 Clermont Université, Genetics Reproduction and Development (GReD), 24 avenue des Landais, BP80026, 63171 Aubière Cedex, France
- BIOGEMMA, route d’Ennezat, Centre de Recherche de Chappes, CS 90126, 63720 Chappes, France
- Lise Pingault
- UMR INRA 1095 Blaise Pascal University, Genetics Diversity & Ecophysiology of Cereals (GDEC), Clermont-Ferrand – Theix, 5 chemin de Beaulieu, 63039 Clermont-Ferrand Cedex 2, France
- Axel Poulet
- UMR CNRS 6293 INSERM U 1103 Clermont Université, Genetics Reproduction and Development (GReD), 24 avenue des Landais, BP80026, 63171 Aubière Cedex, France
- Jorge Duarte
- BIOGEMMA, route d’Ennezat, Centre de Recherche de Chappes, CS 90126, 63720 Chappes, France
- Mickaël Throude
- BIOGEMMA, route d’Ennezat, Centre de Recherche de Chappes, CS 90126, 63720 Chappes, France
- Sébastien Faure
- BIOGEMMA, route d’Ennezat, Centre de Recherche de Chappes, CS 90126, 63720 Chappes, France
- Jean-Philippe Pichon
- BIOGEMMA, route d’Ennezat, Centre de Recherche de Chappes, CS 90126, 63720 Chappes, France
- Etienne Paux
- UMR INRA 1095 Blaise Pascal University, Genetics Diversity & Ecophysiology of Cereals (GDEC), Clermont-Ferrand – Theix, 5 chemin de Beaulieu, 63039 Clermont-Ferrand Cedex 2, France
- Aline Valeska Probst
- UMR CNRS 6293 INSERM U 1103 Clermont Université, Genetics Reproduction and Development (GReD), 24 avenue des Landais, BP80026, 63171 Aubière Cedex, France
- Christophe Tatout
- UMR CNRS 6293 INSERM U 1103 Clermont Université, Genetics Reproduction and Development (GReD), 24 avenue des Landais, BP80026, 63171 Aubière Cedex, France
14
|
Yang H, Li D, Cheng C. Relating gene expression evolution with CpG content changes. BMC Genomics 2014; 15:693. [PMID: 25142157 PMCID: PMC4148958 DOI: 10.1186/1471-2164-15-693] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 02/21/2013] [Accepted: 08/15/2014] [Indexed: 11/10/2022]
Abstract
Background Previous studies have shown that CpG dinucleotides are enriched in a subset of promoters and that the CpG content of promoters is positively correlated with gene expression levels. But the relationship between divergence of CpG content and gene expression evolution has not been investigated. Here we calculate the normalized CpG (nCpG) content in DNA regions around the transcription start site (TSS) and transcription terminal site (TTS) of genes in nine organisms, and relate them to expression levels measured by RNA-seq. Results The nCpG content of TSS shows a bimodal distribution in all organisms except platypus, whereas the nCpG content of TTS has only a single peak. When the nCpG contents are compared between different organisms, we observe a different evolution pattern between TSS and TTS: compared with TTS, TSS exhibits a faster divergence rate between closely related species but is more conserved between distant species. More importantly, we demonstrate the link between gene expression evolution and nCpG content changes: up-/down-regulation of genes in an organism is accompanied by an increase/decrease in nCpG content in their TSS- and TTS-proximal regions. Conclusions Our results suggest that gene expression changes between different organisms are correlated with alterations in the normalized CpG content of promoters. Our analyses provide evidence for the impact of nCpG content on gene expression evolution.
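The abstract above does not spell out how nCpG is computed. A minimal sketch, assuming the widely used observed/expected normalization (number of CpG dinucleotides times region length, divided by the product of the C and G counts), could look like the following; the function name `normalized_cpg` and the zero-return convention for regions without C or G are illustrative assumptions, not the authors' implementation.

```python
def normalized_cpg(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA region.

    Assumed definition (not taken from the abstract):
        nCpG = (#CpG * L) / (#C * #G)
    where L is the region length.
    """
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    if n_c == 0 or n_g == 0:
        # No CpG is possible; returning 0.0 is a convention chosen here.
        return 0.0
    n_cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return n_cpg * len(s) / (n_c * n_g)


if __name__ == "__main__":
    # Hypothetical promoter-like fragments, for illustration only.
    for fragment in ("CCGG", "ACGT", "ATAT"):
        print(fragment, normalized_cpg(fragment))
```

Under this definition a value near 1 means CpG occurs about as often as base composition predicts, and values well below 1 indicate CpG depletion.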
Affiliation(s)
- Chao Cheng
- HB7400, Remsen 702, Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover NH 03755, USA.
15
|
Han F, Peng Y, Xu L, Xiao P. Identification, characterization, and utilization of single copy genes in 29 angiosperm genomes. BMC Genomics 2014; 15:504. [PMID: 24950957 PMCID: PMC4092219 DOI: 10.1186/1471-2164-15-504] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 11/19/2013] [Accepted: 06/17/2014] [Indexed: 01/01/2023]
Abstract
Background Single copy genes are common across angiosperm genomes. With sufficiently high-quality sequenced genomes, the large-scale identification of single copy genes among multiple species is possible. Although some characteristics have been reported, our study provides novel insights into single copy genes. Results We identified single copy genes across 29 angiosperm genomes. A significant negative correlation was found between the number of duplicate blocks and the number of single copy genes. We found that a considerable number of single copy genes are located in organelles, showing a preference for binding and catalytic activity. The analysis of the effective number of codons (Nc) illustrates that single copy genes have a stronger codon bias than non-single copy genes in eudicots. The relatively high expression level of single copy genes was partially confirmed by the RNA-seq data, rather than the Codon Adaptation Index (CAI). Unlike in most other species, a strongly negative correlation occurs between Nc and GC3 among single copy genes in grass genomes. When compared to all non-single copy genes, single copy genes are more conserved (as indicated by Ka and Ks values). But our alternative splicing (AS) results reveal that selective constraints are weaker in single copy genes than in low copy family genes (1–10 in-paralogs) and stronger than in high copy family genes (>10 in-paralogs). Using concatenated shared single copy genes, we obtained a well-resolved phylogenetic tree. With the addition of intron sequences, the branch support is improved, but striking incongruences are also evident. Therefore, it is noteworthy that inclusion of intron sequences seems more appropriate for phylogenetic reconstruction at lower taxonomic levels. Conclusions Our analysis provides insight into the evolutionary characteristics of single copy genes across 29 angiosperm genomes. The results suggest that there are key differences in evolutionary constraints between single copy genes and non-single copy genes. And to some extent, these evolutionary constraints show some species-specific differences, especially between eudicots and monocots. Our preliminary evidence also suggests that the concatenated shared single copy genes are well suited for use in resolving phylogenetic relationships.
Affiliation(s)
- Peigen Xiao
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Beijing 100193, PR China.
16
|
Borštnik B, Pumpernik D. The apparent enhancement of CpG transversions in primate lineage is a consequence of multiple replacements. J Bioinform Comput Biol 2014; 12:1450011. [PMID: 24969749 DOI: 10.1142/s0219720014500115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
We claim that the apparently enhanced CpG transversions of the form CpG to CpC/GpG or to ApG/CpT are caused by the hypermutable CpG to CpA/TpG transition. To estimate the ratio of CpG versus non-CpG transversion probabilities, we analyzed nucleotide replacement counts from human/chimpanzee/gorilla/orangutan sequence alignments, which represent replacements due to evolutionary species divergence, together with results of the 1000 Genomes Project, which provide the differences due to intraspecies diversification. The trinucleotide replacement counts were extracted from regions that are free of functional constraints. The CpG transversion probabilities based upon the genomic comparisons were found to be more than twice the non-CpG transversion probabilities. The diversity data from 14 population groups were partitioned into five classes according to a parameter quantifying the spread of the polymorphic allele among the group of individuals. The results based upon human polymorphism exhibit a trend in which the CpG over non-CpG transversion probability ratio exceeds unity by less and less as the derived allele frequency (DAF) of the SNPs decreases. A computer simulation of a simplified model indicates that the apparent enhancement of CpG transversions can have its source in the interference of entropic effects with maximum likelihood methodologies.
Affiliation(s)
- Branko Borštnik
- National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
17
|
Abstract
Proximal promoters are fundamental genomic elements for gene expression. They vary in terms of GC percentage, CpG abundance, presence of TATA signal, evolutionary conservation, chromosomal spread of transcription start sites and breadth of expression across cell types. These properties are correlated, and it has been suggested that there are two classes of promoters: one class with high CpG, widely spread transcription start sites and broad expression, and another with TATA signals, narrow spread and restricted expression. However, it has been unclear why these properties are correlated in this way. We reexamined these features using the deep FANTOM5 CAGE data from hundreds of cell types. First, we point out subtle but important biases in previous definitions of promoters and of expression breadth. Second, we show that most promoters are rather nonspecifically expressed across many cell types. Third, promoters’ expression breadth is independent of maximum expression level, and therefore correlates with average expression level. Fourth, the data show a more complex picture than two classes, with a network of direct and indirect correlations among promoter properties. By tentatively distinguishing the direct from the indirect correlations, we reveal simple explanations for them.
Affiliation(s)
- Martin C Frith
- Computational Biology Research Center, AIST, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
18
|
Müller F, Tora L. Chromatin and DNA sequences in defining promoters for transcription initiation. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2013; 1839:118-28. [PMID: 24275614 DOI: 10.1016/j.bbagrm.2013.11.003] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 06/26/2013] [Revised: 11/11/2013] [Accepted: 11/11/2013] [Indexed: 01/29/2023]
Abstract
One of the key events in eukaryotic gene regulation and consequent transcription is the assembly of general transcription factors and RNA polymerase II into a functional pre-initiation complex at core promoters. An emerging view of complexity arising from a variety of promoter associated DNA motifs, their binding factors and recent discoveries in characterising promoter associated chromatin properties brings an old question back into the limelight: how is a promoter defined? In addition to position-dependent DNA sequence motifs, accumulating evidence suggests that several parallel acting mechanisms are involved in orchestrating a pattern marked by the state of chromatin and general transcription factor binding in preparation for defining transcription start sites. In this review we attempt to summarise these promoter features and discuss the available evidence pointing at their interactions in defining transcription initiation in developmental contexts. This article is part of a Special Issue entitled: Chromatin and epigenetic regulation of animal development.
Affiliation(s)
- Ferenc Müller
- School of Clinical and Experimental Medicine, College of Medical and Dental Sciences, University of Birmingham, B15 2TT Edgbaston, Birmingham, UK.
- Làszlò Tora
- Cellular Signaling and Nuclear Dynamics Program, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), UMR 7104 CNRS, UdS, INSERM U964, BP 10142, F-67404 Illkirch Cedex, CU de Strasbourg, France; School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, 637551, Singapore.
19
|
Julienne H, Zoufir A, Audit B, Arneodo A. Epigenetic regulation of the human genome: coherence between promoter activity and large-scale chromatin environment. Frontiers in Life Science 2013. [DOI: 10.1080/21553769.2013.832706] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
20
|
Audit B, Zaghloul L, Baker A, Arneodo A, Chen CL, d'Aubenton-Carafa Y, Thermes C. Megabase replication domains along the human genome: relation to chromatin structure and genome organisation. Subcell Biochem 2013; 61:57-80. [PMID: 23150246 DOI: 10.1007/978-94-007-4525-4_3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023]
Abstract
In higher eukaryotes, the absence of specific sequence motifs marking the origins of replication has been a serious hindrance to the understanding of (i) the mechanisms that regulate the spatio-temporal replication program, and (ii) the links between origin activation, chromatin structure and transcription. In this chapter, we review the partitioning of the human genome into megabase-sized replication domains delineated as N-shaped motifs in the strand compositional asymmetry profiles. They collectively span 28.3% of the genome and are bordered by more than 1,000 putative replication origins. We recapitulate the comparison of this partition of the human genome with high-resolution experimental data, which confirms that replication domain borders are likely to be preferential replication initiation zones in the germline. In addition, we highlight the specific distribution of experimental and numerical chromatin marks along replication domains. Domain borders correspond to particular open chromatin regions, possibly encoded in the DNA sequence, around which replication and transcription are highly coordinated. These regions also present a high evolutionary breakpoint density, suggesting that susceptibility to breakage might be linked to the local open state of the chromatin fiber. Altogether, this chapter presents a compartmentalization of the human genome into replication domains that are landmarks of human genome organization and are likely to play a key role in genome dynamics during evolution and in pathological situations.
21
|
Abstract
Cancer has been considered a genetic disease with a wide array of well-characterized gene mutations and chromosomal abnormalities. Of late, aberrant epigenetic modifications have been elucidated in cancer, and together with genetic alterations, they have been helpful in understanding the complex traits observed in neoplasia. "Cancer Epigenetics" has therefore contributed substantially towards understanding the complexity and diversity of various cancers. However, the positioning of epigenetic events during cancer progression is still not clear, though there are some reports implicating aberrant epigenetic modifications in very early stages of cancer. Amongst the most studied aberrant epigenetic modifications are DNA methylation differences at the promoter regions of genes, which affect their expression. Hypomethylation-mediated increased expression of oncogenes and hypermethylation-mediated silencing of tumor suppressor genes are well-known examples. This chapter also explores the correlation of DNA methylation and demethylation enzymes with cancer.
Affiliation(s)
- Gopinathan Gokul
- Laboratory of Mammalian Genetics, CDFD, Hyderabad, 500001, India
22
|
Vavouri T, Lehner B. Human genes with CpG island promoters have a distinct transcription-associated chromatin organization. Genome Biol 2012. [PMID: 23186133 PMCID: PMC3580500 DOI: 10.1186/gb-2012-13-11-r110] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.5] [Indexed: 11/19/2022]
Abstract
Background More than 50% of human genes initiate transcription from CpG dinucleotide-rich regions referred to as CpG islands. These genes show differences in their patterns of transcription initiation, and have been reported to have higher levels of some activation-associated chromatin modifications. Results Here we report that genes with CpG island promoters have a characteristic transcription-associated chromatin organization. This signature includes high levels of the transcription elongation-associated histone modifications H4K20me1, H2BK5me1 and H3K79me1/2/3 in the 5' end of the gene, depletion of the activation marks H2AK5ac, H3K14ac and H3K23ac immediately downstream of the transcription start site (TSS), and characteristic epigenetic asymmetries around the TSS. The chromosome organization factor CTCF may be bound upstream of RNA polymerase in most active CpG island promoters, and an unstable nucleosome at the TSS may be specifically marked by H4K20me3, the first example of such a modification. H3K36 monomethylation is only detected as enriched in the bodies of active genes that have CpG island promoters. Finally, as expression levels increase, peak modification levels of the histone methylations H3K9me1, H3K4me1, H3K4me2 and H3K27me1 shift further away from the TSS into the gene body. Conclusions These results suggest that active genes with CpG island promoters have a distinct step-like series of modified nucleosomes after the TSS. The identity, positioning, shape and relative ordering of transcription-associated histone modifications differ between genes with and without CpG island promoters. This supports a model where chromatin organization reflects not only transcription activity but also the type of promoter in which transcription initiates.
23
|
Baker A, Audit B, Chen CL, Moindrot B, Leleu A, Guilbaud G, Rappailles A, Vaillant C, Goldar A, Mongelard F, d'Aubenton-Carafa Y, Hyrien O, Thermes C, Arneodo A. Replication fork polarity gradients revealed by megabase-sized U-shaped replication timing domains in human cell lines. PLoS Comput Biol 2012; 8:e1002443. [PMID: 22496629 PMCID: PMC3320577 DOI: 10.1371/journal.pcbi.1002443] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 11/30/2011] [Accepted: 02/09/2012] [Indexed: 12/26/2022]
Abstract
In higher eukaryotes, replication program specification in different cell types remains to be fully understood. We show for seven human cell lines that about half of the genome is divided into domains that display a characteristic U-shaped replication timing profile, with early initiation zones at borders and late replication at centers. Significant overlap is observed between U-domains of different cell lines and also with germline replication domains exhibiting an N-shaped nucleotide compositional skew. Having demonstrated that the average fork polarity is directly reflected by both the compositional skew and the derivative of the replication timing profile, we argue that the N-shape of this derivative in U-domains supports the existence of large-scale gradients of replication fork polarity in somatic and germline cells. Analysis of chromatin interaction (Hi-C) and chromatin marker data reveals that U-domains correspond to high-order chromatin structural units. We discuss possible models for replication origin activation within U/N-domains. The compartmentalization of the genome into replication U/N-domains provides new insights into the organization of the replication program in the human genome.
Affiliation(s)
- Antoine Baker
- Université de Lyon, Lyon, France
- Laboratoire Joliot-Curie, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
- Laboratoire de Physique, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
- Benjamin Audit
- Université de Lyon, Lyon, France
- Laboratoire Joliot-Curie, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
- Laboratoire de Physique, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
- Chun-Long Chen
- Centre de Génétique Moléculaire UPR 3404, CNRS, Gif-sur-Yvette, France
- Benoit Moindrot
- Université de Lyon, Lyon, France
- Laboratoire Joliot-Curie, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
- Antoine Leleu
- Université de Lyon, Lyon, France
- Laboratoire Joliot-Curie, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
- Guillaume Guilbaud
- Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR8197, Inserm U1024, Paris, France
- Aurélien Rappailles
- Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR8197, Inserm U1024, Paris, France
- Cédric Vaillant
- Université de Lyon, Lyon, France
- Laboratoire Joliot-Curie, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
- Laboratoire de Physique, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
- Arach Goldar
- Commissariat à l'énergie atomique, iBiTecS, Gif-sur-Yvette, France
- Fabien Mongelard
- Université de Lyon, Lyon, France
- Laboratoire Joliot-Curie, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
- Laboratoire de Biologie Moléculaire de la Cellule, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
- Olivier Hyrien
- Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR8197, Inserm U1024, Paris, France
- Claude Thermes
- Centre de Génétique Moléculaire UPR 3404, CNRS, Gif-sur-Yvette, France
- Alain Arneodo
- Université de Lyon, Lyon, France
- Laboratoire Joliot-Curie, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
- Laboratoire de Physique, CNRS, Ecole Normale Supérieure de Lyon, Lyon, France
24
|
Bérard J, Guéguen L. Accurate estimation of substitution rates with neighbor-dependent models in a phylogenetic context. Syst Biol 2012; 61:510-21. [PMID: 22331438 DOI: 10.1093/sysbio/sys024] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
Most models and algorithms developed to perform statistical inference from DNA data make the assumption that substitution processes affecting distinct nucleotide sites are stochastically independent. This assumption ensures both mathematical and computational tractability but is in disagreement with observed data in many situations--one well-known example being CpG dinucleotide hypermutability in mammalian genomes. In this paper, we consider the class of RN95 + YpR substitution models, which allows neighbor-dependent effects--including CpG hypermutability--to be taken into account, through transitions between pyrimidine-purine dinucleotides. We show that it is possible to adapt inference methods originally developed under the assumption of independence between sites to RN95 + YpR models, using a mathematically rigorous framework provided by specific structural properties of this class of models. We assess how efficient this approach is at inferring the CpG hypermutability rate from aligned DNA sequences. The method is tested on simulated data and compared against several alternatives; the results suggest that it delivers a high degree of accuracy at a low computational cost. We then apply our method to an alignment of 10 DNA sequences from primate species. Model comparisons within the RN95 + YpR class show the importance of taking into account neighbor-dependent effects. An application of the method to the detection of hypomethylated islands is discussed.
Affiliation(s)
- Jean Bérard
- Institut Camille Jordan, UMR CNRS 5208, Université Lyon 1, Villeurbanne F-69622 Cedex, Université de Lyon, Lyon 69003, France
25
|
Xu C, Cai X, Chen Q, Zhou H, Cai Y, Ben A. Factors affecting synonymous codon usage bias in chloroplast genome of Oncidium Gower Ramsey. Evol Bioinform Online 2011; 7:271-8. [PMID: 22253533 PMCID: PMC3255522 DOI: 10.4137/ebo.s8092] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Indexed: 12/04/2022]
Abstract
Oncidium Gower Ramsey is a fascinating and important ornamental flower in the floral industry. In this research, the complete nucleotide sequence of the chloroplast genome of Oncidium Gower Ramsey was studied and then analyzed using the CodonW software. Correspondence analysis and the effective number of codons (Nc-plot) method were used to analyze synonymous codon usage. According to the correspondence analysis, codon bias in the chloroplast genome of Oncidium Gower Ramsey is related to gene length, mutation bias, the hydropathy level of each protein, and gene function, whereas selection or gene expression only subtly affects codon usage. This study will provide insights for molecular evolution studies and high-level transgene expression.
Affiliation(s)
- Chen Xu
- School of Biochemical and Environmental Engineering, Nanjing Xiaozhuang University, Nanjing 211171, Jiangsu, China
26
|
Lee YM, Chen HW, Maurya PK, Su CM, Tzeng CR. MicroRNA regulation via DNA methylation during the morula to blastocyst transition in mice. Mol Hum Reprod 2011; 18:184-93. [PMID: 22053057 DOI: 10.1093/molehr/gar072] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 12/24/2022]
Abstract
Epigenetic regulation is responsible for transcriptional silencing of genes and parental imprinting. This study addresses the question whether microRNAs (miRNAs) could be affected by DNA methylation during morula-blastocyst transition. Mouse embryos were treated with/without a DNA methyltransferase inhibitor (5-aza-2'-deoxycytidine, 5-aza-dC, 10 nM-5 μM). Changes of miRNAs were analyzed by quantitative real-time (Q-PCR)-based megaplex pre-amp microRNA assays. Development from morula to blastocyst in mice was inhibited by 5-aza-dC in a dose-dependent manner (10 nM-5 μM), with half of the embryos arrested at morula stage when treated with levels of 5-aza-dC as low as 50 nM. In total, 48 down-regulated microRNAs and 17 up-regulated microRNAs (≥5-fold changes) were identified after 5-aza-dC treatment, including let-7e, mir-20a, mir-21, mir-34b, mir-128b and mir-452. Their predicted targets were selected based on software analysis, published databases and further confirmed by Q-PCR. At least eight targets, including dnmt3a, jagged 1, sp1, edg2, abcg4, numa1, tmsb10 and csf1r were confirmed. In conclusion, 5-aza-dC-modified microRNA profiles and identification of the microRNA's targets during the morula to blastocyst stage in mice provide information that helps us to explore the relationship between fertility, microRNA regulation and epigenetic intervention.
Affiliation(s)
- Yee-Ming Lee
- Institute of Pharmacology, College of Medicine, National Yang-Ming University, Taipei, Taiwan
27
|
Rach EA, Winter DR, Benjamin AM, Corcoran DL, Ni T, Zhu J, Ohler U. Transcription initiation patterns indicate divergent strategies for gene regulation at the chromatin level. PLoS Genet 2011; 7:e1001274. [PMID: 21249180 PMCID: PMC3020932 DOI: 10.1371/journal.pgen.1001274] [Citation(s) in RCA: 103] [Impact Index Per Article: 7.4] [Received: 06/17/2010] [Accepted: 12/13/2010] [Indexed: 11/18/2022]
Abstract
The application of deep sequencing to map 5' capped transcripts has confirmed the existence of at least two distinct promoter classes in metazoans: "focused" promoters with transcription start sites (TSSs) that occur in a narrowly defined genomic span and "dispersed" promoters with TSSs that are spread over a larger window. Previous studies have explored the presence of genomic features, such as CpG islands and sequence motifs, in these promoter classes, but virtually no studies have directly investigated the relationship with chromatin features. Here, we show that promoter classes are significantly differentiated by nucleosome organization and chromatin structure. Dispersed promoters display higher associations with well-positioned nucleosomes downstream of the TSS and a more clearly defined nucleosome free region upstream, while focused promoters have a less organized nucleosome structure, yet higher presence of RNA polymerase II. These differences extend to histone variants (H2A.Z) and marks (H3K4 methylation), as well as insulator binding (such as CTCF), independent of the expression levels of affected genes. Notably, differences are conserved across mammals and flies, and they provide for a clearer separation of promoter architectures than the presence and absence of CpG islands or the occurrence of stalled RNA polymerase. Computational models support the stronger contribution of chromatin features to the definition of dispersed promoters compared to focused start sites. Our results show that promoter classes defined from 5' capped transcripts not only reflect differences in the initiation process at the core promoter but also are indicative of divergent transcriptional programs established within gene-proximal nucleosome organization.
Affiliation(s)
- Elizabeth A. Rach
- Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- Deborah R. Winter
- Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- Ashlee M. Benjamin
- Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- David L. Corcoran
- Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, North Carolina, United States of America
- Ting Ni
- Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, North Carolina, United States of America
- Department of Cell Biology, Duke University, Durham, North Carolina, United States of America
- Jun Zhu
- Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, North Carolina, United States of America
- Department of Cell Biology, Duke University, Durham, North Carolina, United States of America
- Uwe Ohler
- Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, North Carolina, United States of America
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina, United States of America
- Department of Computer Science, Duke University, Durham, North Carolina, United States of America
|
28
|
Hutter B, Paulsen M, Helms V. Identifying CpG islands by different computational techniques. OMICS 2010; 13:153-64. [PMID: 19196100 DOI: 10.1089/omi.2008.0046] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/31/2022]
Abstract
CpG islands (CGIs) are generally regarded as important epigenetic regulatory elements due to their association with promoter regions. However, identification of functional CGIs is hampered by repetitive elements and species-specific particularities. Here, we compared the performance of different CGI detection programs on genomic sequences of human and mouse genes. Although mouse CGIs are shorter and G+C poorer than their human counterparts, the different tools tested in our study reliably identify CGIs in promoter regions in both species. Our study confirms that substantially fewer murine than human CGIs coincide with repetitive elements and indicates that such CGIs are subject to accelerated cytosine deamination. In addition, CpG depletion appears to anticorrelate with the epigenetic features of functional regulatory CGIs. Taking into account different deamination rates in unmethylated CGIs versus those in methylated CGIs might support the detection of functional CGIs in other species for which there is little epigenetic information available.
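As a rough illustration of the kind of sequence statistics CGI detection programs rely on, the sketch below scores a single window by G+C fraction and observed/expected CpG ratio. The thresholds (length of at least 200 bp, G+C of at least 50%, observed/expected CpG of at least 0.6) are the commonly cited Gardiner-Garden and Frommer criteria and are an assumption here: the abstract does not say which criteria the compared tools apply, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (gc_fraction, cpg_observed_over_expected) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")          # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    expected = c * g / n           # expected CpG count if C and G were independent
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def looks_like_cgi(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed length, G+C, and obs/exp thresholds to one window."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp
```

Real detectors slide such a window along the genome and merge adjacent hits; this sketch only evaluates one candidate region.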
Affiliation(s)
- Barbara Hutter
- Lehrstuhl für Computational Biology, Universität des Saarlandes, Saarbrücken, Germany
|
29
|
Medvedeva YA, Fridman MV, Oparina NJ, Malko DB, Ermakova EO, Kulakovskiy IV, Heinzel A, Makeev VJ. Intergenic, gene terminal, and intragenic CpG islands in the human genome. BMC Genomics 2010; 11:48. [PMID: 20085634 PMCID: PMC2817693 DOI: 10.1186/1471-2164-11-48] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Received: 02/26/2009] [Accepted: 01/19/2010] [Indexed: 11/10/2022]
Abstract
Background Recently, it has been discovered that the human genome contains many transcription start sites for non-coding RNA. Regulatory regions related to transcription of these non-coding RNAs are poorly studied. Some of these regulatory regions may be associated with CpG islands located far from transcription start sites of any protein coding gene. The human genome contains many such CpG islands; however, until now their properties were not systematically studied. Results We studied CpG islands located in different regions of the human genome using methods of bioinformatics and comparative genomics. We have observed that CpG islands have a preference to overlap with exons, including exons located far from transcription start site, but usually extend well into introns. Synonymous substitution rate of CpG-containing codons becomes substantially reduced in regions where CpG islands overlap with protein-coding exons, even if they are located far downstream from transcription start site. CAGE tag analysis displayed frequent transcription start sites in all CpG islands, including those found far from transcription start sites of protein coding genes. Computational prediction and analysis of published ChIP-chip data revealed that CpG islands contain an increased number of sites recognized by Sp1 protein. CpG islands containing more CAGE tags usually also contain more Sp1 binding sites. This is especially relevant for CpG islands located in 3' gene regions. Various examples of transcription, confirmed by mRNAs or ESTs, but with no evidence of protein coding genes, were found in CAGE-enriched CpG islands located far from transcription start site of any known protein coding gene. Conclusions CpG islands located far from transcription start sites of protein coding genes have transcription initiation activity and display Sp1 binding properties. In exons overlapping with these islands, the synonymous substitution rate of CpG-containing codons is decreased. This suggests that these CpG islands are involved in transcription initiation, possibly of some non-coding RNAs.
Affiliation(s)
- Yulia A Medvedeva
- Research Institute for Genetics and Selection of Industrial Microorganisms, Genetika, 1st Dorozhny proezd, 1, Moscow, 117545, Russia.
|
30
|
Kwon MJ, Oh E, Lee S, Roh MR, Kim SE, Lee Y, Choi YL, In YH, Park T, Koh SS, Shin YK. Identification of novel reference genes using multiplatform expression data and their validation for quantitative gene expression analysis. PLoS One 2009; 4:e6162. [PMID: 19584937 PMCID: PMC2703796 DOI: 10.1371/journal.pone.0006162] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.4] [Received: 03/03/2009] [Accepted: 06/10/2009] [Indexed: 11/18/2022]
Abstract
Normalization of mRNA levels using endogenous reference genes (ERGs) is critical for an accurate comparison of gene expression between different samples. Despite the popularity of traditional ERGs (tERGs) such as GAPDH and ACTB, their expression variability in different tissues or disease status has been reported. Here, we first selected candidate housekeeping genes (HKGs) using human gene expression data from different platforms including EST, SAGE, and microarray, and 13 novel ERGs (nERGs) (ARL8B, CTBP1, CUL1, DIMT1L, FBXW2, GPBP1, LUC7L2, OAZ1, PAPOLA, SPG21, TRIM27, UBQLN1, ZNF207) were further identified from these HKGs. The mean coefficient of variation (CV) values of nERGs were significantly lower than those of tERGs, and the expression level of most nERGs was relatively lower than that of high-expressing tERGs in all datasets. The higher expression stability and lower expression levels of most nERGs were validated in 108 human samples including formalin-fixed paraffin-embedded (FFPE) tissues, frozen tissues and cell lines, through quantitative real-time RT-PCR (qRT-PCR). Furthermore, the optimal number of nERGs required for accurate normalization was as few as two, while four genes were required when using tERGs in FFPE tissues. Most nERGs identified in this study should be better reference genes than tERGs, based on their higher expression stability and fewer numbers needed for normalization when multiple ERGs are required.
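The stability measure named in this abstract, the coefficient of variation, is the standard deviation of a gene's expression divided by its mean. A minimal sketch follows, assuming the sample standard deviation (the abstract does not say whether the sample or population form was used); the function name is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean, for one gene across samples."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # stdev() is the sample (n - 1) form; this choice is an assumption.
    return stdev(values) / m
```

A gene measured at 8, 10, and 12 units across three samples has a CV of 0.2 under this definition; a lower CV indicates more stable expression, which is the property the authors report for the nERGs.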
Affiliation(s)
- Mi Jeong Kwon
- Laboratory of Molecular Pathology, Department of Pharmacy, College of Pharmacy, Seoul National University, Seoul, Korea
- Ensel Oh
- Interdisciplinary Program of Bioinformatics, College of Natural Science, Seoul National University, Seoul, Korea
- Seungmook Lee
- Department of Statistics, College of Natural Science, Seoul National University, Seoul, Korea
- Si Eun Kim
- Laboratory of Molecular Pathology, Department of Pharmacy, College of Pharmacy, Seoul National University, Seoul, Korea
- Yangsoon Lee
- LG Life Sciences, Ltd., R&D Research Park, Daejeon, Korea
- Yoon-La Choi
- Department of Pathology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Taesung Park
- Department of Statistics, College of Natural Science, Seoul National University, Seoul, Korea
- Sang Seok Koh
- Protein Therapeutics Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Korea
- Young Kee Shin
- Laboratory of Molecular Pathology, Department of Pharmacy, College of Pharmacy, Seoul National University, Seoul, Korea
- Interdisciplinary Program of Bioinformatics, College of Natural Science, Seoul National University, Seoul, Korea
|
31
|
Elhaik E, Landan G, Graur D. Can GC content at third-codon positions be used as a proxy for isochore composition? Mol Biol Evol 2009; 26:1829-33. [PMID: 19443854 DOI: 10.1093/molbev/msp100] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
The isochore theory depicts the genomes of warm-blooded vertebrates as a mosaic of long genomic regions that are characterized by relatively homogeneous GC content. In the absence of genomic data, the GC content at third-codon positions of protein-coding genes (GC3) was commonly used as a proxy for the GC content of isochores. Oddly, in the postgenomic era, GC3 is still sometimes used as a proxy for the GC composition of isochores. Here, we use genic and genomic sequences from human, chimpanzee, cow, mouse, rat, chicken, and zebrafish to show that GC3 only explains a very small proportion of the variation in GC content of long genomic sequences flanking the genes (GCf), and what little correlation there is between GC3 and GCf was found to decay rapidly with distance from the gene. The coefficient of variation of GC3 was found to be much larger than that of GCf and, therefore, GC3 and GCf values are not comparable with each other. Comparisons of orthologous gene pairs from 1) human and chimpanzee and 2) mouse and rat show strong correlations between their GC3 values, but very weak correlations between their GCf values. We conclude that the GC content of third-codon position cannot be used as stand-in for isochoric composition.
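The two quantities compared in this abstract can be computed directly from sequence. The sketch below assumes an in-frame coding sequence for GC3 and an arbitrary flanking string for GCf; the function names and the decision to ignore non-ACGT characters are illustrative assumptions, not the authors' pipeline.

```python
def gc_fraction(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc3(cds):
    """GC content at third-codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    third_positions = cds[2:usable:3]  # bases at indices 2, 5, 8, ...
    return gc_fraction(third_positions)
```

Applying `gc3` to each gene and `gc_fraction` to its flanking sequence gives the paired values whose correlation the study reports to be weak.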
Affiliation(s)
- Eran Elhaik
- Department of Biology and Biochemistry, University of Houston, TX, USA
|
32
|
Previti C, Harari O, Zwir I, del Val C. Profile analysis and prediction of tissue-specific CpG island methylation classes. BMC Bioinformatics 2009; 10:116. [PMID: 19383127 PMCID: PMC2683815 DOI: 10.1186/1471-2105-10-116] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 10/24/2008] [Accepted: 04/21/2009] [Indexed: 11/10/2022]
Abstract
Background The computational prediction of DNA methylation has become an important topic in recent years due to its role in the epigenetic control of normal and cancer-related processes. While previous prediction approaches focused merely on differences between methylated and unmethylated DNA sequences, recent experimental results have shown the presence of much more complex patterns of methylation across tissues and time in the human genome. These patterns are only partially described by a binary model of DNA methylation. In this work we propose a novel approach, based on profile analysis of tissue-specific methylation that uncovers significant differences in the sequences of CpG islands (CGIs) that predispose them to a tissue-specific methylation pattern. Results We defined CGI methylation profiles that separate not only between constitutively methylated and unmethylated CGIs, but also identify CGIs showing a differential degree of methylation across tissues and cell-types or a lack of methylation exclusively in sperm. These profiles are clearly distinguished by a number of CGI attributes including their evolutionary conservation, their significance, as well as the evolutionary evidence of prior methylation. Additionally, we assess profile functionality with respect to the different compartments of protein coding genes and their possible use in the prediction of DNA methylation. Conclusion Our approach provides new insights into the biological features that determine if a CGI has a functional role in the epigenetic control of gene expression and the features associated with CGI methylation susceptibility. Moreover, we show that the ability to predict CGI methylation is based primarily on the quality of the biological information used and the relationships uncovered between different sources of knowledge. The strategy presented here is able to predict, besides the constitutively methylated and unmethylated classes, two more tissue-specific methylation classes conserving the accuracy provided by leading binary methylation classification methods.
Affiliation(s)
- Christopher Previti
- Department of Molecular Biophysics, DKFZ, German Cancer Research Center, Heidelberg, Germany.
|
33
|
Illingworth RS, Bird AP. CpG islands--'a rough guide'. FEBS Lett 2009; 583:1713-20. [PMID: 19376112 DOI: 10.1016/j.febslet.2009.04.012] [Citation(s) in RCA: 590] [Impact Index Per Article: 36.9] [Received: 03/09/2009] [Revised: 04/04/2009] [Accepted: 04/06/2009] [Indexed: 02/07/2023]
Abstract
Mammalian genomes are punctuated by DNA sequences containing an atypically high frequency of CpG sites termed CpG islands (CGIs). CGIs generally lack DNA methylation and associate with the majority of annotated gene promoters. Many studies, however, have identified examples of CGI methylation in malignant cells, leading to improper gene silencing. CGI methylation also occurs in normal tissues and is known to function in X-inactivation and genomic imprinting. More recently, differential methylation has been shown between tissues, suggesting a potential role in transcriptional regulation during cell specification. Many of these tissue-specific methylated CGIs localise to regions distal to promoters, the regulatory function of which remains to be determined.
Affiliation(s)
- Robert S Illingworth
- Wellcome Trust Centre for Cell Biology, Michael Swann Building, University of Edinburgh, Mayfield Road, Edinburgh EH9 3JR, United Kingdom.
|
34
|
Transcription initiation activity sets replication origin efficiency in mammalian cells. PLoS Genet 2009; 5:e1000446. [PMID: 19360092 PMCID: PMC2661365 DOI: 10.1371/journal.pgen.1000446] [Citation(s) in RCA: 185] [Impact Index Per Article: 11.6] [Received: 11/19/2008] [Accepted: 03/04/2009] [Indexed: 12/24/2022]
Abstract
Genomic mapping of DNA replication origins (ORIs) in mammals provides a powerful means for understanding the regulatory complexity of our genome. Here we combine a genome-wide approach to identify preferential sites of DNA replication initiation at 0.4% of the mouse genome with detailed molecular analysis at distinct classes of ORIs according to their location relative to the genes. Our study reveals that 85% of the replication initiation sites in mouse embryonic stem (ES) cells are associated with transcriptional units. Nearly half of the identified ORIs map at promoter regions and, interestingly, ORI density strongly correlates with promoter density, reflecting the coordinated organisation of replication and transcription in the mouse genome. Detailed analysis of ORI activity showed that CpG island promoter-ORIs are the most efficient ORIs in ES cells and both ORI specification and firing efficiency are maintained across cell types. Remarkably, the distribution of replication initiation sites at promoter-ORIs exactly parallels that of transcription start sites (TSS), suggesting a co-evolution of the regulatory regions driving replication and transcription. Moreover, we found that promoter-ORIs are significantly enriched in CAGE tags derived from early embryos relative to all promoters. This association implies that transcription initiation early in development sets the probability of ORI activation, unveiling a new hallmark in ORI efficiency regulation in mammalian cells. The duplication of the genetic information of a cell starts from specific sites on the chromosomes called DNA replication origins. Their number varies from a few hundred in yeast cells to several thousands in human cells, distributed along the genome at comparable distances in both systems. 
An important question in the field is to understand how origins of replication are specified and regulated in the mammalian genome, as neither their location nor their activity can be directly inferred from the DNA sequence. Previous studies at individual origins and, more recently, at large scale across 1% of the human genome, have revealed that most origins overlap with transcriptional regulatory elements, and specifically with gene promoters. To gain insight into the nature of the relationship between active transcription and origin specification we have combined a genomic mapping of origins at 0.4% of the mouse genome with detailed studies of activation efficiency. The data identify two types of origins with distinct regulatory properties: highly efficient origins map at CpG island-promoters and low efficient origins locate elsewhere in association with transcriptional units. We also find a remarkable parallel organisation of the replication initiation sites and transcription start sites at efficient promoter-origins that suggests a prominent role of transcription initiation in setting the efficiency of replication origin activation.
|
35
|
Roymondal U, Das S, Sahoo S. Predicting gene expression level from relative codon usage bias: an application to Escherichia coli genome. DNA Res 2009; 16:13-30. [PMID: 19131380 PMCID: PMC2646356 DOI: 10.1093/dnares/dsn029] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022]
Abstract
We present an expression measure of a gene, devised to predict the level of gene expression from relative codon bias (RCB). There are a number of measures currently in use that quantify codon usage in genes. Based on the hypothesis that gene expressivity and codon composition are strongly correlated, RCB has been defined to provide an intuitively meaningful measure of the extent of codon preference in a gene. We outline a simple approach to assess the strength of RCB (RCBS) in genes as a guide to their likely expression levels and illustrate this with an analysis of the Escherichia coli (E. coli) genome. Our efforts to quantitatively predict gene expression levels in E. coli met with a high level of success. Surprisingly, we observe a strong correlation between RCBS and protein length, indicating natural selection in favour of shorter genes being expressed at higher levels. The agreement of our result with high protein abundances, microarray data and radioactive data demonstrates that the genomic expression profile available in our method can be applied in a meaningful way to the study of cell physiology and also for more detailed studies of particular genes of interest.
Affiliation(s)
- Uttam Roymondal
- Department of Mathematics, Raidighi College, South 24 Parganas, Raidighi, West Bengal, India
|
36
|
Necsulea A, Guillet C, Cadoret JC, Prioleau MN, Duret L. The relationship between DNA replication and human genome organization. Mol Biol Evol 2009; 26:729-41. [PMID: 19126867 DOI: 10.1093/molbev/msn303] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 12/23/2022]
Abstract
Assessment of the impact of DNA replication on genome architecture in Eukaryotes has long been hampered by the scarcity of experimental data. Recent work, relying on computational predictions of origins of replication, suggested that replication might be a major determinant of gene organization in human (Huvet et al. 2007. Human gene organization driven by the coordination of replication and transcription. Genome Res. 17:1278-1285). Here, we address this question by analyzing the first large-scale data set of experimentally determined origins of replication in human: 283 origins identified in HeLa cells, in 1% of the genome covered by ENCODE regions (Cadoret et al. 2008. Genome-wide studies highlight indirect links between human replication origins and gene regulation. Proc Natl Acad Sci USA. 105:15837-15842). We show that origins of replication are not randomly distributed as they display significant overlap with promoter regions and CpG islands. The hypothesis of a selective pressure to avoid frontal collisions between replication and transcription polymerases is not supported by experimental data as we find no evidence for gene orientation bias in the proximity of origins of replication. The lack of a significant orientation bias remains manifest even when considering only genes expressed at a high rate, or in a wide number of tissues, and is not affected by the regional replication timing. Gene expression breadth does not appear to be correlated with the distance from the origins of replication. We conclude that the impact of DNA replication on human genome organization is considerably weaker than previously proposed.
|
37
|
Borštnik B, Oblak B, Pumpernik D. The Evolutionary Constraints in Mutational Replacements. Evol Biol 2009. [DOI: 10.1007/978-3-642-00952-5_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
38
|
Mukhopadhyay P, Basak S, Ghosh TC. Differential selective constraints shaping codon usage pattern of housekeeping and tissue-specific homologous genes of rice and arabidopsis. DNA Res 2008; 15:347-56. [PMID: 18827062 PMCID: PMC2608846 DOI: 10.1093/dnares/dsn023] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
Intra-genomic variation between housekeeping and tissue-specific genes has always been a study of interest in higher eukaryotes. To date, however, no such investigation has been done in plants. Availability of whole genome expression data for both rice and Arabidopsis has made it possible to examine the evolutionary forces shaping codon usage patterns in both housekeeping and tissue-specific genes in plants. In the present work, we have taken 4065 rice-Arabidopsis homologous gene pairs to study evolutionary forces responsible for codon usage divergence between housekeeping and tissue-specific genes. In both rice and Arabidopsis, it is mutational bias that regulates error minimization in highly expressed genes of both housekeeping and tissue-specific genes. Our results show that, in comparison to tissue-specific genes, housekeeping genes are under strong selective constraint in plants. However, in tissue-specific genes, lowly expressed genes are under stronger selective constraint compared with highly expressed genes. We demonstrated that constraint acting on mRNA secondary structure is responsible for modulating codon usage variations in rice tissue-specific genes. Thus, different evolutionary forces must underlie the evolution of synonymous codon usage of highly expressed genes of housekeeping and tissue-specific genes in rice and Arabidopsis.
Affiliation(s)
- Pamela Mukhopadhyay
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
|
39
|
Li C, Hirsch M, Carter P, Asokan A, Zhou X, Wu Z, Samulski RJ. A small regulatory element from chromosome 19 enhances liver-specific gene expression. Gene Ther 2008; 16:43-51. [PMID: 18701910 DOI: 10.1038/gt.2008.134] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]
Abstract
Tissue-specific promoters for gene therapy are typically too big for adeno-associated virus (AAV) vectors; thus, the exploration of small effective non-viral regulatory elements is of particular interest. Wild-type AAV can specifically integrate into a region on human chromosome 19 termed AAVS1. Earlier work has determined that a 347 bp fragment (Chr19) of AAVS1 has promoter and transcriptional enhancer activities. In this study, we further characterized this genetic regulation and investigated its application to AAV gene therapy in vitro and in vivo. The Chr19 347 bp fragment was dissected into three regulatory elements in human embryonic kidney cells: (i) TATA-independent promoter activity distributed throughout the fragment regardless of orientation, (ii) an orientation-dependent insulator function near the 5' end and (iii) a 107 bp enhancer region near the 3' end. The small enhancer region, coupled to the mini-CMV promoter, was used to drive the expression of several reporters following transduction by AAV2. In vivo data demonstrated enhanced transgene expression from the Chr19-mini-CMV promoter cassette after tail vein injection primarily in the liver at levels comparable to the chicken beta-actin promoter and higher than the liver-specific TTR promoter (>2-fold). However, we did not observe this increase after muscle injection, suggesting tissue-specific enhancement. All of the results support identification of a small DNA fragment (347 bp) from AAV Chr19 integration site capable of providing efficient and enhanced liver-specific transcription when used in recombinant AAV vectors.
Affiliation(s)
- C Li
- Gene Therapy Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
40
|
Urrutia AO, Ocaña LB, Hurst LD. Do Alu repeats drive the evolution of the primate transcriptome? Genome Biol 2008; 9:R25. [PMID: 18241332 PMCID: PMC2374697 DOI: 10.1186/gb-2008-9-2-r25] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 10/03/2007] [Revised: 01/02/2008] [Accepted: 02/01/2008] [Indexed: 12/17/2022]
Abstract
BACKGROUND Of all repetitive elements in the human genome, Alus are unusual in being enriched near to genes that are expressed across a broad range of tissues. This has led to the proposal that Alus might be modifying the expression breadth of neighboring genes, possibly by providing CpG islands, modifying transcription factor binding, or altering chromatin structure. Here we consider whether Alus have increased expression breadth of genes in their vicinity. RESULTS Contrary to the modification hypothesis, we find that those genes that have always had broad expression are richest in Alus, whereas those that are more likely to have become more broadly expressed have lower enrichment. This finding is consistent with a model in which Alus accumulate near broadly expressed genes but do not affect their expression breadth. Furthermore, this model is consistent with the finding that expression breadth of mouse genes predicts Alu density near their human orthologs. However, Alus were found to be related to some alternative measures of transcription profile divergence, although evidence is contradictory as to whether Alus associate with lowly or highly diverged genes. If Alu have any effect it is not by provision of CpG islands, because they are especially rare near to transcriptional start sites. Previously reported Alu enrichment for genes serving certain cellular functions, suggested to be evidence of functional importance of Alus, appears to be partly a byproduct of the association with broadly expressed genes. CONCLUSION The abundance of Alu near broadly expressed genes is better explained by their preferential preservation near to housekeeping genes rather than by a modifying effect on expression of genes.
Affiliation(s)
- Araxi O Urrutia
- Department of Biology and Biochemistry, University of Bath, Bath, BA4 7AY, UK.
|
41
|
DNA sequence and structural properties as predictors of human and mouse promoters. Gene 2007; 410:165-76. [PMID: 18234453 PMCID: PMC2672154 DOI: 10.1016/j.gene.2007.12.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Received: 05/23/2007] [Revised: 11/30/2007] [Accepted: 12/05/2007] [Indexed: 11/21/2022]
Abstract
Promoters play a central role in gene regulation, yet our power to discriminate them from non-promoter sequences in higher eukaryotes is mainly restricted to those associated with CpG islands. Here, we examined in silico the promoters of 30,954 human and 18,083 mouse transcripts in the DBTSS database, to assess the impact of particular sequence and structural features (propeller twist, bendability and nucleosome positioning preference) on promoter classification and prediction. Our analysis showed that a stricter-than-traditional definition of CpG islands captures low and high CpG count promoter classes more accurately than the traditional one. We observed that both human and mouse promoter sequences are flexible with the exception of the TATA box and TSS, which are rigid regions irrespective of association with a CpG island. Therefore varying levels of structural flexibility in promoters may affect their accessibility to proteins, and hence their specificity. For all features investigated, averaged values across core promoters discriminated CpG island associated promoters from background, whereas the same did not hold for promoters without a CpG island. However, local changes around -34 to -23 (expected position of TATA box) and the TSS were informative in discriminating promoters (both classes) from non-promoter sequences. Additionally, we investigated ATG deserts and observed that they occur in all promoter sets except those with a TATA-box and without a CpG island in human. Interestingly, all mouse promoter sets showed ATG codon depletion irrespective of the presence of a TATA-box, possibly reflecting a weaker contribution to TSS specificity in mouse.
|
42
|
Wiznerowicz M, Jakobsson J, Szulc J, Liao S, Quazzola A, Beermann F, Aebischer P, Trono D. The Kruppel-associated box repressor domain can trigger de novo promoter methylation during mouse early embryogenesis. J Biol Chem 2007; 282:34535-41. [PMID: 17893143 DOI: 10.1074/jbc.m705898200] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.1] [Indexed: 01/10/2023]
Abstract
The Krüppel-associated box (KRAB) domain is a transcriptional repression module responsible for the DNA binding-dependent gene silencing activity of hundreds of vertebrate zinc finger proteins. We previously exploited KRAB-mediated repression within the context of a tet repressor-KRAB fusion protein and of lentiviral vectors to create a method of external gene control. We demonstrated that with this system transcriptional silencing was fully reversible in cell culture as well as in vivo. Here we reveal that, in sharp contrast, KRAB-mediated repression results in irreversible gene silencing through promoter DNA methylation if it acts during the first few days of mouse development.
Affiliation(s)
- Maciej Wiznerowicz
- School of Life Sciences, "Frontiers in Genetics" National Center for Competence in Research, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
|
43
|
Jiang C, Han L, Su B, Li WH, Zhao Z. Features and Trend of Loss of Promoter-Associated CpG Islands in the Human and Mouse Genomes. Mol Biol Evol 2007; 24:1991-2000. [PMID: 17591602 DOI: 10.1093/molbev/msm128] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Indexed: 12/28/2022] Open
Abstract
CpG islands (CGIs) are often considered as gene markers, but the number of CGIs varies among mammalian genomes that have similar numbers of genes. In this study, we investigated the distribution of CGIs in the promoter regions of 3,197 human-mouse orthologous gene pairs and found that the mouse genome has notably fewer CGIs in the promoter regions and less pronounced CGI characteristics than does the human genome. We further inferred the ancestral state of CGIs using the dog genome as a reference and examined the nucleotide substitution pattern and the mutational direction in the conserved regions of human and mouse CGIs. The results reveal many losses of CGIs in both genomes, but the loss rate in the mouse lineage is two to four times the rate in the human lineage. We found an intriguing feature of CGI loss, namely that the loss of a CGI usually starts with erosion at both edges and gradually moves towards the center. We found functional bias in the genes that have lost promoter-associated CGIs in the human or mouse lineage. Finally, our analysis indicates that the association of CGIs with housekeeping genes is not as strong as previously estimated. Our study provides a detailed view of the evolution of promoter-associated CGIs in the human and mouse genomes, and our findings are helpful for understanding the evolution of mammalian genomes and the role of CGIs in gene function.
Affiliation(s)
- Cizhong Jiang
- Department of Psychiatry and Center for the Study of Biological Complexity, Virginia Commonwealth, USA
|
44
|
Ren L, Gao G, Zhao D, Ding M, Luo J, Deng H. Developmental stage related patterns of codon usage and genomic GC content: searching for evolutionary fingerprints with models of stem cell differentiation. Genome Biol 2007; 8:R35. [PMID: 17349061 PMCID: PMC1868930 DOI: 10.1186/gb-2007-8-3-r35] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 09/12/2006] [Revised: 01/08/2007] [Accepted: 03/12/2007] [Indexed: 11/26/2022] Open
Abstract
Developmental-stage-related patterns of gene expression correlate with codon usage and genomic GC content in stem cell hierarchies. BACKGROUND The usage of synonymous codons shows considerable variation among mammalian genes. How and why this usage is non-random are fundamental biological questions and remain controversial. It is also important to explore whether mammalian genes that are selectively expressed at different developmental stages bear different molecular features. RESULTS In two models of mouse stem cell differentiation, we established correlations between codon usage and the patterns of gene expression. We found that the optimal codons exhibited variation (AT- or GC-ending codons) in different cell types within the developmental hierarchy. We also found that genes that were enriched (developmental-pivotal genes) or specifically expressed (developmental-specific genes) at different developmental stages had different patterns of codon usage and local genomic GC (GCg) content. Moreover, at the same developmental stage, developmental-specific genes generally used more GC-ending codons and had higher GCg content compared with developmental-pivotal genes. Further analyses suggest that the model of translational selection might be consistent with the developmental stage-related patterns of codon usage, especially for the AT-ending optimal codons. In addition, our data show that after human-mouse divergence, the influence of selective constraints is still detectable. CONCLUSION Our findings suggest that developmental stage-related patterns of gene expression are correlated with codon usage (GC3) and GCg content in stem cell hierarchies. Moreover, this paper provides evidence for the influence of natural selection at synonymous sites in the mouse genome and novel clues for linking the molecular features of genes to their patterns of expression during mammalian ontogenesis.
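The GC3 measure mentioned in this abstract is the GC fraction at third codon positions. A minimal sketch follows, assuming the input is an in-frame coding sequence; the function name is hypothetical, and whether start and stop codons are excluded is a choice the abstract does not specify (here every complete codon is counted).

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    s = cds.upper()
    usable = len(s) - len(s) % 3      # ignore a trailing partial codon
    third_positions = s[2:usable:3]   # bases at indices 2, 5, 8, ...
    if not third_positions:
        return 0.0
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


if __name__ == "__main__":
    # Codons: ATG GCC AAA TAA, so third positions are G, C, A, A
    print(gc3("ATGGCCAAATAA"))
```

Comparing this value between gene sets is one simple way to express a preference for GC-ending versus AT-ending codons.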
Affiliation(s)
- Lichen Ren
- College of Life Sciences, Shanghai Jiao Tong University, Shanghai, 200240, PR China
- Ge Gao
- Center for Bioinformatics, College of Life Sciences, National Laboratory of Protein Engineering and Plant Genetics Engineering, Peking University, Beijing, 100871, PR China
- Dongxin Zhao
- Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, 100871, PR China
- Mingxiao Ding
- Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, 100871, PR China
- Jingchu Luo
- Center for Bioinformatics, College of Life Sciences, National Laboratory of Protein Engineering and Plant Genetics Engineering, Peking University, Beijing, 100871, PR China
- Hongkui Deng
- Department of Cell Biology and Genetics, College of Life Sciences, Peking University, Beijing, 100871, PR China
|
45
|
Yamada Y, Shirakawa T, Taylor TD, Okamura K, Soejima H, Uchiyama M, Iwasaka T, Mukai T, Muramoto KI, Sakaki Y, Ito T. A comprehensive analysis of allelic methylation status of CpG islands on human chromosome 11q: comparison with chromosome 21q. DNA Seq 2007; 17:300-6. [PMID: 17312950 DOI: 10.1080/10425170600886128] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/29/2022]
Abstract
It was generally believed that autosomal CpG islands (CGIs) escape methylation. However, our comprehensive analysis of allelic methylation status of 149 CGIs on human chromosome 21q revealed that a sizable fraction of them are methylated on both alleles even in normal blood cells. Here, we performed a similar analysis of 656 CGIs on chromosome 11q, which is gene-rich in contrast with 21q. The results indicate that 11q contains fewer methylated CGIs, especially those with tandem repeats and those in the coding or 3'-untranslated regions (UTRs), than 21q. Thus, the methylation status of CGIs may differ substantially from one chromosome to another.
Affiliation(s)
- Yoichi Yamada
- Department of Information and Systems Engineering, Faculty of Engineering, Kanazawa University, Kakuma-machi, Kanazawa 920-1192, Japan
|
46
|
Suzuki MM, Kerr ARW, De Sousa D, Bird A. CpG methylation is targeted to transcription units in an invertebrate genome. Genome Res 2007; 17:625-31. [PMID: 17420183 PMCID: PMC1855171 DOI: 10.1101/gr.6163007] [Citation(s) in RCA: 173] [Impact Index Per Article: 9.6] [Indexed: 11/24/2022]
Abstract
DNA is methylated at the dinucleotide CpG in genomes of a wide range of plants and animals. Among animals, variable patterns of genomic CpG methylation have been described, ranging from undetectable levels (e.g., in Caenorhabditis elegans) to high levels of global methylation in the vertebrates. The most frequent pattern in invertebrate animals, however, is mosaic methylation, comprising domains of methylated DNA interspersed with unmethylated domains. To understand the origin of mosaic DNA methylation patterns, we examined the distribution of DNA methylation in the Ciona intestinalis genome. Bisulfite sequencing and computational analysis revealed methylated domains with sharp boundaries that strongly colocalize with approximately 60% of transcription units. By contrast, promoters, intergenic DNA, and transposons are not preferentially targeted by DNA methylation. Methylated transcription units include evolutionarily conserved genes, whereas the most highly expressed genes preferentially belong to the unmethylated fraction. The results lend support to the hypothesis that CpG methylation functions to suppress spurious transcriptional initiation within infrequently transcribed genes.
Affiliation(s)
- Miho M Suzuki
- The Wellcome Trust Centre for Cell Biology, The University of Edinburgh, Michael Swann Building, The King's Buildings, Edinburgh EH9 3JR, UK.
|
47
|
Park SY, Kim BH, Kim JH, Cho NY, Choi M, Yu EJ, Lee S, Kang GH. Methylation profiles of CpG island loci in major types of human cancers. J Korean Med Sci 2007; 22:311-7. [PMID: 17449942 PMCID: PMC2693600 DOI: 10.3346/jkms.2007.22.2.311] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/20/2022] Open
Abstract
Several reports have described aberrant methylation in various types of human cancers. However, the interpretation of methylation frequency in various human cancers has some limitations because of the different materials and methods used for methylation analysis. To gain insight into the role of DNA hypermethylation in human cancers and allow direct comparison of tissue-specific methylation, we generated methylation profiles in 328 human cancers, including 24 breast, 48 colon, 61 stomach, 48 liver, 37 larynx, 24 lung, 40 prostate, and 46 uterine cervical cancer samples, by analyzing CpG island hypermethylation of 13 genes using methylation-specific PCR. The mean numbers of methylated genes were 6.5, 4.4, 3.6, 3.4, 3.1, 3.1, 3.1, and 2.1 in gastric, liver, prostate, larynx, colon, lung, uterine cervix, and breast cancer samples, respectively. The number of genes that were methylated at a frequency of more than 40% in each tumor type ranged from nine (stomach) to one (breast). Generally, genes frequently methylated in a specific cancer type differed from those methylated in other cancer types. The findings indicate that aberrant CpG island hypermethylation is a frequent finding in human cancers of various tissue types, and that each tissue type has its own distinct methylation pattern.
Affiliation(s)
- Seog-Yun Park
- Department of Pathology, Seoul National University College of Medicine, Seoul, Korea
- Baek-Hee Kim
- Department of Pathology, Seoul National University College of Medicine, Seoul, Korea
- Jeong Ho Kim
- Department of Pathology, Seoul National University College of Medicine, Seoul, Korea
- Eun Joo Yu
- the Cancer Research Institute, Seoul, Korea
- Sun Lee
- Department of Pathology, Kyung Hee University College of Medicine, Seoul, Korea
- Gyeong Hoon Kang
- Department of Pathology, Seoul National University College of Medicine, Seoul, Korea
- the Cancer Research Institute, Seoul, Korea
|
48
|
Baek D, Davis C, Ewing B, Gordon D, Green P. Characterization and predictive discovery of evolutionarily conserved mammalian alternative promoters. Genome Res 2007; 17:145-55. [PMID: 17210929 PMCID: PMC1781346 DOI: 10.1101/gr.5872707] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.2] [Indexed: 11/24/2022]
Abstract
Recent studies suggest that surprisingly many mammalian genes have alternative promoters (APs); however, their biological roles, and the characteristics that distinguish them from single promoters (SPs), remain poorly understood. We constructed a large data set of evolutionarily conserved promoters, and used it to identify sequence features, functional associations, and expression patterns that differ by promoter type. The four promoter categories (CpG-rich APs, CpG-poor APs, CpG-rich SPs, and CpG-poor SPs) each show characteristic strengths and patterns of sequence conservation, frequencies of putative transcription-related motifs, and tissue and developmental stage expression preferences. APs display substantially higher sequence conservation than SPs, and CpG-poor promoters display higher conservation than CpG-rich promoters. Among CpG-poor promoters, APs and SPs show sharply contrasting developmental stage preferences and TATA box frequencies. We developed a discriminator to computationally predict promoter type, verified its accuracy through experimental tests that incorporate a novel method for deconvolving mixed sequence traces, and used it to find several new APs. The discriminator predicts that almost half of all mammalian genes have evolutionarily conserved APs. This high frequency of APs, together with the strong purifying selection maintaining them, implies a crucial role in expanding the expression diversity of the mammalian genome.
Affiliation(s)
- Daehyun Baek
- Department of Bioengineering, University of Washington, Seattle, Washington 98195, USA
- Corresponding author; fax (206) 685-9720
- Colleen Davis
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Brent Ewing
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- David Gordon
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Phil Green
- Howard Hughes Medical Institute and Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Corresponding author; fax (206) 685-9720
|
49
|
Seo D, Jiang C, Zhao Z. A novel statistical method to estimate the effective SNP size in vertebrate genomes and categorized genomic regions. BMC Genomics 2006; 7:329. [PMID: 17196097 PMCID: PMC1769377 DOI: 10.1186/1471-2164-7-329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2006] [Accepted: 12/29/2006] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND The local environment of single nucleotide polymorphisms (SNPs) contains abundant genetic information for the study of mechanisms of mutation, genome evolution, and causes of diseases. Recent studies revealed that neighboring-nucleotide biases on SNPs were strong and that the genome-wide bias patterns could be represented by a small subset of the total SNPs. Estimating the effective SNP size, that is, the number of SNPs sufficient to represent the bias patterns observed in the whole SNP data, remains an unsolved problem. RESULTS To estimate the effective SNP size, we developed a novel statistical method, SNPKS, which considers both statistical and biological significance. SNPKS consists of two major steps: obtaining an initial effective size by the Kolmogorov-Smirnov test (KS test) and finding an intermediate effective size by interval evaluation. The SNPKS algorithm was implemented in computer programs and applied to real SNP data. The effective SNP size was estimated to be 38,200, 39,300, 38,000, and 38,700 in the human, chimpanzee, dog, and mouse genomes, respectively, and 39,100, 39,600, 39,200, and 42,200 in human intergenic, genic, intronic, and CpG island regions, respectively. CONCLUSION SNPKS is the first statistical method to estimate the effective SNP size. It runs efficiently and greatly outperforms the algorithm implemented in SNPNB. The application of SNPKS to real SNP data revealed a similarly small effective SNP size (38,000-42,200) in the human, chimpanzee, dog, and mouse genomes as well as in human genomic regions. The findings suggest a strong influence of genetic factors across vertebrate genomes.
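The Kolmogorov-Smirnov step can be illustrated with the two-sample KS statistic, the largest gap between two empirical cumulative distribution functions. This is only a generic sketch of that statistic and not the SNPKS procedure, whose details the abstract does not give; the function name and the toy samples are hypothetical.

```python
def ks_statistic(sample_a: list[float], sample_b: list[float]) -> float:
    """Two-sample KS statistic: maximum distance between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    i = j = 0
    largest_gap = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x so ties are handled together
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / na - j / nb))
    return largest_gap


if __name__ == "__main__":
    subset_bias = [0.10, 0.20, 0.30, 0.40]   # toy values for a SNP subset
    full_bias = [0.12, 0.22, 0.31, 0.45]     # toy values for the full SNP set
    print(ks_statistic(subset_bias, full_bias))
```

A small statistic means the subset's distribution tracks the full data closely, which is the intuition behind asking how few SNPs are enough to reproduce a genome-wide pattern.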
|
50
|
Jiang C, Zhao Z. Mutational spectrum in the recent human genome inferred by single nucleotide polymorphisms. Genomics 2006; 88:527-34. [PMID: 16860534 DOI: 10.1016/j.ygeno.2006.06.003] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.8] [Received: 04/03/2006] [Revised: 06/01/2006] [Accepted: 06/06/2006] [Indexed: 01/09/2023]
Abstract
So far, there is no genome-wide estimation of the mutational spectrum in humans. In this study, we systematically examined the directionality of point mutations and the maintenance of GC content in the human genome using approximately 1.8 million high-quality human single nucleotide polymorphisms and their ancestral sequences in chimpanzees. The frequency of C→T (G→A) changes was the highest among all mutation types, and the frequency of each type of transition was approximately fourfold that of each type of transversion. In intergenic regions, when the GC content increased, the frequency of changes from G or C increased. In exons, the frequency of G:C→A:T was the highest among the genomic categories and was contributed mainly by the frequent mutations at CpG sites. In contrast, mutations at CpG sites, or CpG→TpG/CpA mutations, occurred less frequently in CpG islands relative to intergenic regions with similar GC content. Our results suggest that the GC content is overall not in equilibrium in the human genome, with a trend toward shifting the human genome to be AT rich and shifting the GC content of a region to approach the genome average. Our results, which differ from previous estimates based on limited loci or on the rodent lineage, provide the first representative and reliable mutational spectrum in the recent human genome and categorized genomic regions.
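The tallying of mutation directions described here can be sketched as follows. Each SNP is taken as an (ancestral, derived) base pair, with the ancestral allele assumed to come from the chimpanzee sequence as in the abstract; complementary changes are folded together so that G to A is counted with C to T. The function names and the input format are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(ancestral: str, derived: str) -> str:
    """Label a point change as a transition or a transversion."""
    if ancestral == derived:
        raise ValueError("ancestral and derived bases must differ")
    pair = {ancestral, derived}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def fold_strand(ancestral: str, derived: str) -> str:
    """Report a change with a pyrimidine as ancestral base, e.g. G>A becomes C>T."""
    if ancestral in PURINES:
        ancestral, derived = COMPLEMENT[ancestral], COMPLEMENT[derived]
    return f"{ancestral}>{derived}"


def mutation_spectrum(snps: list[tuple[str, str]]) -> Counter:
    """Count strand-folded mutation types over (ancestral, derived) pairs."""
    return Counter(fold_strand(anc, der) for anc, der in snps)


if __name__ == "__main__":
    toy_snps = [("C", "T"), ("G", "A"), ("A", "C"), ("T", "C")]
    print(mutation_spectrum(toy_snps))
    print(Counter(classify(anc, der) for anc, der in toy_snps))
```

With real data, dividing each count by the number of ancestral sites of that base would turn these tallies into rates, which is what a comparison across regions of different GC content requires.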
Affiliation(s)
- Cizhong Jiang
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23298-0126, USA
|