1
Grech L, Jeffares DC, Sadée CY, Rodríguez-López M, Bitton DA, Hoti M, Biagosch C, Aravani D, Speekenbrink M, Illingworth CJR, Schiffer PH, Pidoux AL, Tong P, Tallada VA, Allshire R, Levin HL, Bähler J. Fitness Landscape of the Fission Yeast Genome. Mol Biol Evol 2019; 36:1612-1623. [PMID: 31077324 PMCID: PMC6657727 DOI: 10.1093/molbev/msz113] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022]
Abstract
The relationship between DNA sequence, biochemical function, and molecular evolution is relatively well-described for protein-coding regions of genomes, but far less clear in noncoding regions, particularly, in eukaryote genomes. In part, this is because we lack a complete description of the essential noncoding elements in a eukaryote genome. To contribute to this challenge, we used saturating transposon mutagenesis to interrogate the Schizosaccharomyces pombe genome. We generated 31 million transposon insertions, a theoretical coverage of 2.4 insertions per genomic site. We applied a five-state hidden Markov model (HMM) to distinguish insertion-depleted regions from insertion biases. Both raw insertion-density and HMM-defined fitness estimates showed significant quantitative relationships to gene knockout fitness, genetic diversity, divergence, and expected functional regions based on transcription and gene annotations. Through several analyses, we conclude that transposon insertions produced fitness effects in 66-90% of the genome, including substantial portions of the noncoding regions. Based on the HMM, we estimate that 10% of the insertion depleted sites in the genome showed no signal of conservation between species and were weakly transcribed, demonstrating limitations of comparative genomics and transcriptomics to detect functional units. In this species, 3'- and 5'-untranslated regions were the most prominent insertion-depleted regions that were not represented in measures of constraint from comparative genomics. We conclude that the combination of transposon mutagenesis, evolutionary, and biochemical data can provide new insights into the relationship between genome function and molecular evolution.
Affiliation(s)
- Leanne Grech
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Daniel C Jeffares
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
  - Department of Biology and York Biomedical Research Institute, University of York, United Kingdom
- Christoph Y Sadée
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- María Rodríguez-López
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Danny A Bitton
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Mimoza Hoti
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Carolina Biagosch
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Dimitra Aravani
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Philipp H Schiffer
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Alison L Pidoux
  - Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom
- Pin Tong
  - Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom
- Victor A Tallada
  - Centro Andaluz de Biología del Desarrollo, Universidad Pablo de Olavide/Consejo Superior de Investigaciones Científicas, Seville, Spain
- Robin Allshire
  - Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom
- Henry L Levin
  - Division of Molecular and Cellular Biology, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Jürg Bähler
  - Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
  - UCL Genetics Institute, University College London, London, United Kingdom
2
New insights into the pathogenicity of non-synonymous variants through multi-level analysis. Sci Rep 2019; 9:1667. [PMID: 30733553 PMCID: PMC6367327 DOI: 10.1038/s41598-018-38189-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 07/11/2018] [Accepted: 12/19/2018] [Indexed: 12/17/2022]
Abstract
Precise classification of non-synonymous single nucleotide variants (SNVs) is a fundamental goal of clinical genetics. Next-generation sequencing technology is effective for establishing the basis of genetic diseases. However, identification of variants that are causal for genetic diseases remains a challenge. We analyzed human non-synonymous SNVs from a multilevel perspective to characterize pathogenicity. We showed that computational tools, though each has its own strengths and weaknesses, tend to be overly dependent on the degree of conservation. For mutations at non-degenerate sites, the amino acid sites of pathogenic substitutions show a distinct distribution across the classes of protein domains compared with the sites of benign substitutions. Overlooked disease susceptibility of genes explains in part the failures of computational tools. The more pathogenic sites observed in a gene, the more likely the gene is to be expressed at high abundance or in a highly tissue-specific manner, and to have a high node degree in protein-protein interaction networks. The loss of function caused by some false-negative mutations may arise from a reprieve from the epigenetically repressed state, which should not happen in multiple biological conditions, rather than from a defective protein. Our work adds to our knowledge of the pathogenicity of non-synonymous SNVs and will thus benefit the field of clinical genetics.
3
Chuong EB, Elde NC, Feschotte C. Regulatory activities of transposable elements: from conflicts to benefits. Nat Rev Genet 2016; 18:71-86. [PMID: 27867194 DOI: 10.1038/nrg.2016.139] [Citation(s) in RCA: 851] [Impact Index Per Article: 94.6] [Indexed: 12/19/2022]
Abstract
Transposable elements (TEs) are a prolific source of tightly regulated, biochemically active non-coding elements, such as transcription factor-binding sites and non-coding RNAs. Many recent studies reinvigorate the idea that these elements are pervasively co-opted for the regulation of host genes. We argue that the inherent genetic properties of TEs and the conflicting relationships with their hosts facilitate their recruitment for regulatory functions in diverse genomes. We review recent findings supporting the long-standing hypothesis that the waves of TE invasions endured by organisms for eons have catalysed the evolution of gene-regulatory networks. We also discuss the challenges of dissecting and interpreting the phenotypic effect of regulatory activities encoded by TEs in health and disease.
Affiliation(s)
- Edward B Chuong
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah 84103, USA
- Nels C Elde
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah 84103, USA
- Cédric Feschotte
  - Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah 84103, USA
4
Abstract
Evolutionary conservation has been an accurate predictor of functional elements across the first decade of metazoan genomics. More recently, there has been a move to define functional elements instead from biochemical annotations. Evolutionary methods are, however, more comprehensive than biochemical approaches can be and can assess quantitatively, especially for subtle effects, how biologically important--how injurious after mutation--different types of elements are. Evolutionary methods are thus critical for understanding the large fraction (up to 10%) of the human genome that does not encode proteins and yet might convey function. These methods can also capture the ephemeral nature of much noncoding functional sequence, with large numbers of functional elements having been gained and lost rapidly along each mammalian lineage. Here, we review how different strengths of purifying selection have impacted on protein-coding and non-protein-coding loci and on transcription factor binding sites in mammalian and fruit fly genomes.
Affiliation(s)
- Wilfried Haerty
  - MRC Functional Genomics Unit, Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
5
Xiao ZD, Diao LT, Yang JH, Xu H, Huang MB, Deng YJ, Zhou H, Qu LH. Deciphering the transcriptional regulation of microRNA genes in humans with ACTLocater. Nucleic Acids Res 2012; 41:e5. [PMID: 22941648 PMCID: PMC3592406 DOI: 10.1093/nar/gks821] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/11/2022]
Abstract
Understanding the transcriptional regulation of microRNAs (miRNAs) is extremely important for determining the specific roles they play in signaling cascades. However, precise identification of transcription factor binding sites (TFBSs) orchestrating the expressions of miRNAs remains a challenge. By combining accessible chromatin sequences of 12 cell types released by the ENCODE Project, we found that a significant fraction (∼80%) of such integrated sequences, evolutionary conserved and in regions upstream of human miRNA genes that are independently transcribed, were preserved across cell types. Accordingly, we developed a computational method, Accessible and Conserved TFBSs Locater (ACTLocater), incorporating this chromatin feature and evolutionary conservation to identify the TFBSs associated with human miRNA genes. ACTLocater achieved high positive predictive values, as revealed by the experimental validation of FOXA1 predictions and by the comparison of its predictions of some other transcription factors (TFs) to empirical ChIP-seq data. Most notably, ACTLocater was widely applicable as indicated by the successful prediction of TF→miRNA interactions in cell types whose chromatin accessibility profiles were not incorporated. By applying ACTLocater to TFs with characterized binding specificities, we compiled a novel repository of putative TF→miRNA interactions and displayed it in ACTViewer, providing a promising foundation for future investigations to elucidate the regulatory mechanisms of miRNA transcription in humans.
Affiliation(s)
- Zhen-Dong Xiao
  - Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou 510275, People's Republic of China
6
Functional elements demarcated by histone modifications in breast cancer cells. Biochem Biophys Res Commun 2012; 418:475-82. [PMID: 22285863 DOI: 10.1016/j.bbrc.2012.01.042] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 12/29/2011] [Accepted: 01/08/2012] [Indexed: 01/29/2023]
Abstract
Histone modifications are regarded as one of the markers for identifying regulatory elements, the DNA segments that modulate gene transcription. Aberrant changes in histone modification levels are frequently observed in cancer. We employed ChIP-Seq to identify regulatory elements in the human breast cancer cell line MCF-7 by comparing histone modification patterns of H3K4me1, H3K4me3, and H3K9/14ac to those in the normal mammary epithelial cell line MCF-10A. The genome-wide analysis shows that H3K4me3 and H3K9/14ac are highly enriched at promoter regions, whereas H3K4me1 has a relatively broad distribution over the proximity of TSSs as well as other genomic regions. We found that many differentially expressed genes in MCF-7 have divergent histone modification patterns. To understand the functional roles of distinctively histone-modified regions, we selected 35 genomic regions marked by at least one histone modification and located 3 to 10 kb upstream of a TSS in both MCF-7 and MCF-10A and assessed their transcriptional activities. About 66% and 60% of the selected regions in MCF-7 and MCF-10A, respectively, enhanced transcriptional activity. Interestingly, most regions marked by H3K4me1 exhibited enhancer activity. Regions with two or more kinds of histone modifications showed varying activities. In conclusion, our data indicate that comprehensive analysis of histone modification profiles in a cell type-specific chromatin environment should provide a better chance of defining functional regulatory elements in the genome.
7
Koroteev MV, Miller J. Scale-free duplication dynamics: a model for ultraduplication. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 84:061919. [PMID: 22304128 DOI: 10.1103/physreve.84.061919] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/17/2010] [Revised: 07/04/2011] [Indexed: 05/31/2023]
Abstract
Empirical studies of the genome-wide length distribution of duplicated sequences have revealed an algebraic tail common to nearly all clades. The decay of the tail is often well approximated by a single exponent that takes values within a limited range. We propose and study here scale-free duplication dynamics, a class of model for genome sequence evolution that generates the observed shapes of this distribution. A transition between self-similar and non-self-similar regimes is exhibited. Our model accounts plausibly for the observed form of the algebraic tail, which is not produced by standard models for generating long-range sequence correlations.
Affiliation(s)
- M V Koroteev
  - Physics and Biology Unit, Okinawa Institute of Science and Technology, Suzaki 12-22, Uruma, Okinawa 904-2234, Japan
8
Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet 2011; 12:628-40. [PMID: 21850043 DOI: 10.1038/nrg3046] [Citation(s) in RCA: 397] [Impact Index Per Article: 28.4] [Indexed: 02/07/2023]
Abstract
Genome and exome sequencing yield extensive catalogues of human genetic variation. However, pinpointing the few phenotypically causal variants among the many variants present in human genomes remains a major challenge, particularly for rare and complex traits wherein genetic information alone is often insufficient. Here, we review approaches to estimate the deleteriousness of single nucleotide variants (SNVs), which can be used to prioritize disease-causal variants. We describe recent advances in comparative and functional genomics that enable systematic annotation of both coding and non-coding variants. Application and optimization of these methods will be essential to find the genetic answers that sequencing promises to hide in plain sight.
9
Algebraic distribution of segmental duplication lengths in whole-genome sequence self-alignments. PLoS One 2011; 6:e18464. [PMID: 21779315 PMCID: PMC3136455 DOI: 10.1371/journal.pone.0018464] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 09/17/2010] [Accepted: 03/08/2011] [Indexed: 01/25/2023]
Abstract
Distributions of duplicated sequences from genome self-alignment are characterized, including forward and backward alignments in bacteria and eukaryotes. A Markovian process without auto-correlation should generate an exponential distribution expected from local effects of point mutation and selection on localised function; however, the observed distributions show substantial deviation from exponential form – they are roughly algebraic instead – suggesting a novel kind of long-distance correlation that must be non-local in origin.
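The contrast this abstract draws between an exponential and an algebraic (power-law) length distribution can be made concrete with a small numerical sketch. The per-site match probability `p_match` and the exponent `alpha` below are arbitrary illustrative values, not figures from the paper, and the two functions are hypothetical helpers rather than the authors' method.

```python
def exponential_tail(length, p_match=0.99):
    """P(run >= length) if every site matches independently with
    probability p_match (a memoryless, Markovian model).
    p_match is an illustrative assumption."""
    return p_match ** length


def algebraic_tail(length, alpha=3.0):
    """P(run >= length) for a power-law density ~ length**(-alpha),
    normalised so that the tail equals 1 at length 1.
    alpha is an illustrative assumption."""
    return float(length) ** (1.0 - alpha)


# Compare how fast the two tails decay as the duplicated length grows.
for length in (10, 100, 1000, 10000):
    print(length, exponential_tail(length), algebraic_tail(length))
```

With these assumed parameters the exponential tail is larger for short lengths but falls far below the algebraic tail for long ones, which is why an excess of very long duplicated segments is hard to reconcile with a purely local, memoryless process.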
10
A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol 2011; 9:e1001046. [PMID: 21526222 PMCID: PMC3079585 DOI: 10.1371/journal.pbio.1001046] [Citation(s) in RCA: 1093] [Impact Index Per Article: 78.1] [Received: 09/23/2010] [Accepted: 03/10/2011] [Indexed: 12/18/2022]
Abstract
The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.
11
Oldmeadow C, Mengersen K, Mattick JS, Keith JM. Multiple evolutionary rate classes in animal genome evolution. Mol Biol Evol 2009; 27:942-53. [PMID: 19955480 DOI: 10.1093/molbev/msp299] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 12/19/2022]
Abstract
The proportion of functional sequence in the human genome is currently a subject of debate. The most widely accepted figure is that approximately 5% is under purifying selection. In Drosophila, estimates are an order of magnitude higher, though this corresponds to a similar quantity of sequence. These estimates depend on the difference between the distribution of genomewide evolutionary rates and that observed in a subset of sequences presumed to be neutrally evolving. Motivated by the widening gap between these estimates and experimental evidence of genome function, especially in mammals, we developed a sensitive technique for evaluating such distributions and found that they are much more complex than previously apparent. We found strong evidence for at least nine well-resolved evolutionary rate classes in an alignment of four Drosophila species and at least seven classes in an alignment of four mammals, including human. We also identified at least three rate classes in human ancestral repeats. By positing that the largest of these ancestral repeat classes is neutrally evolving, we estimate that the proportion of nonneutrally evolving sequence is 30% of human ancestral repeats and 45% of the aligned portion of the genome. However, we also question whether any of the classes represent neutrally evolving sequences and argue that a plausible alternative is that they reflect variable structure-function constraints operating throughout the genomes of complex organisms.
Affiliation(s)
- Christopher Oldmeadow
  - School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD, Australia
12
Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 2009; 20:110-21. [PMID: 19858363 DOI: 10.1101/gr.097857.109] [Citation(s) in RCA: 1615] [Impact Index Per Article: 100.9] [Indexed: 11/25/2022]
Abstract
Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny. Here we examine the more general problem of detecting departures from the neutral rate of substitution in either direction, possibly in a clade-specific manner. We consider four statistical, phylogenetic tests for addressing this problem: a likelihood ratio test, a score test, a test based on exact distributions of numbers of substitutions, and the genomic evolutionary rate profiling (GERP) test. All four tests have been implemented in a freely available program called phyloP. Based on extensive simulation experiments, these tests are remarkably similar in statistical power. With 36 mammalian species, they all appear to be capable of fairly good sensitivity with low false-positive rates in detecting strong selection at individual nucleotides, moderate selection in 3-bp elements, and weaker or clade-specific selection in longer elements. By applying phyloP to mammalian multiple alignments from the ENCODE project, we shed light on patterns of conservation/acceleration in known and predicted functional elements, approximate fractions of sites subject to constraint, and differences in clade-specific selection in the primate and glires clades. We also describe new "Conservation" tracks in the UCSC Genome Browser that display both phyloP and phastCons scores for genome-wide alignments of 44 vertebrate species.
Affiliation(s)
- Katherine S Pollard
  - Gladstone Institutes, University of California, San Francisco, San Francisco, California 94158, USA
13
Binkley J, Karra K, Kirby A, Hosobuchi M, Stone EA, Sidow A. ProPhylER: a curated online resource for protein function and structure based on evolutionary constraint analyses. Genome Res 2009; 20:142-54. [PMID: 19846609 DOI: 10.1101/gr.097121.109] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 12/31/2022]
Abstract
ProPhylER (Protein Phylogeny and Evolutionary Rates) is a next-generation curated proteome resource that uses comparative sequence analysis to predict constraint and mutation impact for eukaryotic proteins. Its purpose is to inform any research program for which protein function and structure are relevant, by the predictive power of evolutionary constraint analyses. ProPhylER currently has nearly 9000 clusters of related proteins, including more than 200,000 sequences. It serves data via two interfaces. The "ProPhylER Interface" displays predictive analyses in sequence space; the "CrystalPainter" maps evolutionary constraints onto solved protein structures. Here we summarize ProPhylER's data content and analysis pipeline, demonstrate the use of ProPhylER's interfaces, and evaluate ProPhylER's unique regional analysis of evolutionary constraint. The high accuracy of ProPhylER's regional analysis complements the high resolution of its single-site analysis to effectively guide and inform structure-function investigations and predict the impact of polymorphisms.
Affiliation(s)
- Jonathan Binkley
  - Stanford University School of Medicine, Departments of Pathology and Genetics, Stanford, California 94305, USA
14
Abstract
Each human carries a large number of deleterious mutations. Together, these mutations make a significant contribution to human disease. Identification of deleterious mutations within individual genome sequences could substantially impact an individual's health through personalized prevention and treatment of disease. Yet, distinguishing deleterious mutations from the massive number of nonfunctional variants that occur within a single genome is a considerable challenge. Using a comparative genomics data set of 32 vertebrate species we show that a likelihood ratio test (LRT) can accurately identify a subset of deleterious mutations that disrupt highly conserved amino acids within protein-coding sequences, which are likely to be unconditionally deleterious. The LRT is also able to identify known human disease alleles and performs as well as two commonly used heuristic methods, SIFT and PolyPhen. Application of the LRT to three human genomes reveals 796-837 deleterious mutations per individual, approximately 40% of which are estimated to be at <5% allele frequency. However, the overlap between predictions made by the LRT, SIFT, and PolyPhen, is low; 76% of predictions are unique to one of the three methods, and only 5% of predictions are shared across all three methods. Our results indicate that only a small subset of deleterious mutations can be reliably identified, but that this subset provides the raw material for personalized medicine.
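The overlap statistics quoted in this abstract (predictions unique to one method versus shared by all three) can be computed from sets of predicted variants as sketched below. The function name `overlap_fractions` and the toy variant identifiers are hypothetical; they are not data from the study and do not reproduce its 76% and 5% figures.

```python
from collections import Counter


def overlap_fractions(*prediction_sets):
    """Return {k: fraction of distinct predictions made by exactly k methods}."""
    support = Counter()
    for predictions in prediction_sets:
        # set() ensures a variant counts at most once per method
        support.update(set(predictions))
    total = len(support)
    if total == 0:
        return {k: 0.0 for k in range(1, len(prediction_sets) + 1)}
    by_k = Counter(support.values())
    return {k: by_k.get(k, 0) / total
            for k in range(1, len(prediction_sets) + 1)}


# Toy example with made-up variant identifiers (assumption, not real data).
lrt = {"v1", "v2", "v3", "v4"}
sift = {"v3", "v4", "v5"}
polyphen = {"v4", "v6"}

fractions = overlap_fractions(lrt, sift, polyphen)
print(fractions)
```

In this toy input four of the six distinct variants are called by a single method and one is called by all three, so `fractions[1]` is 4/6 and `fractions[3]` is 1/6.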
15
Wright SI, Andolfatto P. The Impact of Natural Selection on the Genome: Emerging Patterns in Drosophila and Arabidopsis. Annu Rev Ecol Evol Syst 2008. [DOI: 10.1146/annurev.ecolsys.39.110707.173342] [Citation(s) in RCA: 87] [Impact Index Per Article: 5.1] [Indexed: 11/09/2022]
Affiliation(s)
- Stephen I. Wright
  - Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks St., Toronto, Ontario, M5S 3B2 Canada
- Peter Andolfatto
  - Department of Ecology and Evolutionary Biology and the Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544
16
Kuntz SG, Schwarz EM, DeModena JA, De Buysscher T, Trout D, Shizuya H, Sternberg PW, Wold BJ. Multigenome DNA sequence conservation identifies Hox cis-regulatory elements. Genome Res 2008; 18:1955-68. [PMID: 18981268 DOI: 10.1101/gr.085472.108] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 01/31/2023]
Abstract
To learn how well ungapped sequence comparisons of multiple species can predict cis-regulatory elements in Caenorhabditis elegans, we made such predictions across the large, complex ceh-13/lin-39 locus and tested them transgenically. We also examined how prediction quality varied with different genomes and parameters in our comparisons. Specifically, we sequenced approximately 0.5% of the C. brenneri and C. sp. 3 PS1010 genomes, and compared five Caenorhabditis genomes (C. elegans, C. briggsae, C. brenneri, C. remanei, and C. sp. 3 PS1010) to find regulatory elements in 22.8 kb of noncoding sequence from the ceh-13/lin-39 Hox subcluster. We developed the MUSSA program to find ungapped DNA sequences with N-way transitive conservation, applied it to the ceh-13/lin-39 locus, and transgenically assayed 21 regions with both high and low degrees of conservation. This identified 10 functional regulatory elements whose activities matched known ceh-13/lin-39 expression, with 100% specificity and a 77% recovery rate. One element was so well conserved that a similar mouse Hox cluster sequence recapitulated the native nematode expression pattern when tested in worms. Our findings suggest that ungapped sequence comparisons can predict regulatory elements genome-wide.
Affiliation(s)
- Steven G Kuntz
  - Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
17
Varki A, Geschwind DH, Eichler EE. Explaining human uniqueness: genome interactions with environment, behaviour and culture. Nat Rev Genet 2008; 9:749-63. [PMID: 18802414 DOI: 10.1038/nrg2428] [Citation(s) in RCA: 107] [Impact Index Per Article: 6.3] [Indexed: 12/14/2022]
Abstract
What makes us human? Specialists in each discipline respond through the lens of their own expertise. In fact, 'anthropogeny' (explaining the origin of humans) requires a transdisciplinary approach that eschews such barriers. Here we take a genomic and genetic perspective towards molecular variation, explore systems analysis of gene expression and discuss an organ-systems approach. Rejecting any 'genes versus environment' dichotomy, we then consider genome interactions with environment, behaviour and culture, finally speculating that aspects of human uniqueness arose because of a primate evolutionary trend towards increasing and irreversible dependence on learned behaviours and culture - perhaps relaxing allowable thresholds for large-scale genomic diversity.
Affiliation(s)
- Ajit Varki
  - Center for Academic Research and Training in Anthropogeny, University of California, San Diego, La Jolla, California 92093, USA
18
Abstract
In the lab, the cis-regulatory network seems to exhibit great functional redundancy. Many experiments testing enhancer activity of neighboring cis-regulatory elements show largely overlapping expression domains. Of recent interest, mice in which cis-regulatory ultraconserved elements were knocked out showed no obvious phenotype, further suggesting functional redundancy. Here, we present a global evolutionary analysis of mammalian conserved nonexonic elements (CNEs), and find strong evidence to the contrary. Given a set of CNEs conserved between several mammals, we characterize functional dispensability as the propensity for the ancestral element to be lost in mammalian species internal to the spanned species tree. We show that ultraconserved-like elements are over 300-fold less likely than neutral DNA to have been lost during rodent evolution. In fact, many thousands of noncoding loci under purifying selection display near uniform indispensability during mammalian evolution, largely irrespective of nucleotide conservation level. These findings suggest that many genomic noncoding elements possess functions that contribute noticeably to organism fitness in naturally evolving populations.
Affiliation(s)
- Cory McLean
  - Department of Computer Science, Stanford University, Stanford, California 94305, USA
19
Abstract
While hundreds of microbial genomes are sequenced, the challenge remains to define their cis-regulatory maps. Here, we present a comparative genomic analysis of the cis-regulatory map of Shewanella oneidensis, an important model organism for bioremediation because of its extraordinary abilities to use a wide variety of metals and organic molecules as electron acceptors in respiration. First, from the experimentally verified transcriptional regulatory networks of Escherichia coli, we inferred 24 DNA motifs that are conserved in S. oneidensis. We then applied a new comparative approach on five Shewanella genomes that allowed us to systematically identify 194 nonredundant palindromic DNA motifs and corresponding regulons in S. oneidensis. Sixty-four percent of the predicted motifs are conserved in at least three of the seven newly sequenced and distantly related Shewanella genomes. In total, we obtained 209 unique DNA motifs in S. oneidensis that cover 849 unique transcription units. Besides conservation in other genomes, 77 of these motifs are supported by at least one additional type of evidence, including matching to known transcription factor binding motifs and significant functional enrichment or expression coherence of the corresponding target genes. Using the same approach on a more focused gene set, 990 differentially expressed genes derived from published microarray data of S. oneidensis during exposure to metal ions, we identified 31 putative cis-regulatory motifs (16 with at least one type of additional supporting evidence) that are potentially involved in the process of metal reduction. The majority (18/31) of those motifs had been found in our whole-genome comparative approach, further demonstrating that such an approach is capable of uncovering a large fraction of the regulatory map of a genome even in the absence of experimental data. 
The integrated computational approach developed in this study provides a useful strategy to identify genome-wide cis-regulatory maps and a novel avenue to explore the regulatory pathways for particular biological processes in bacterial systems.
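The abstract above refers to palindromic DNA motifs. As a minimal sketch of what "palindromic" means for a DNA site (equal to its own reverse complement), and not the comparative motif-finding method of the study itself, one could write the following. The function names are hypothetical and introduced only for illustration.

```python
# Illustrative sketch only: these helpers are hypothetical and are not
# taken from the study; they just show the reverse-complement test.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    site = site.upper()
    return site == reverse_complement(site)


def palindromic_kmers(seq: str, k: int) -> list[tuple[int, str]]:
    """List (offset, k-mer) pairs for every palindromic k-mer in seq."""
    seq = seq.upper()
    return [
        (i, seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if is_palindromic(seq[i:i + k])
    ]
```

For example, the EcoRI site GAATTC is palindromic in this sense, whereas GATTACA is not.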
Affiliation(s)
- Jiajian Liu
- Department of Genetics, Washington University School of Medicine, 660 S Euclid, Box 8232, St Louis, MO 63110, USA
20
Rana-Díez P, Colón C, Alonso-Fernández JR, Solar A, Barros-Tizón JC, Barros-Casas D, Sirvent J, Carracedo A, Barros F. Three novel mutations in the CFTR gene identified in Galician patients. J Cyst Fibros 2008; 7:520-2. [PMID: 18676185 DOI: 10.1016/j.jcf.2008.05.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/28/2008] [Revised: 05/26/2008] [Accepted: 05/29/2008] [Indexed: 10/21/2022]
Abstract
We report three novel CFTR missense mutations detected in Spanish patients from Galicia (North West of Spain). In the first case, a patient homozygous for a novel S1045Y mutation died due to pulmonary problems. In the other two cases, both heterozygous for novel mutations combined with the F508del mutation, clinical symptoms were different depending on the mutation, detected as M595I and A107V.
Affiliation(s)
- P Rana-Díez
- Fundacion Publica Galega de Medicina Xenomica, Grupo de Medicina Xenómica, CIBERER, Santiago de Compostela, 15706 Spain
21
Cooper GM, Brown CD. Qualifying the relationship between sequence conservation and molecular function. Genome Res 2008; 18:201-5. [PMID: 18245453 DOI: 10.1101/gr.7205808] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.4] [Indexed: 11/25/2022]
Abstract
Quantification of evolutionary constraints via sequence conservation can be leveraged to annotate genomic functional sequences. Recent efforts addressing the converse of this relationship have identified many sites in metazoan genomes with molecular function but without detectable conservation between related species. Here, we discuss explanations and implications for these results considering both practical and theoretical issues. In particular, phylogenetic scope influences the relationship between sequence conservation and function. Comparisons of distantly related species can detect constraint with high specificity due to the loss of conserved neutral sequence, but sensitivity is sacrificed as a result of functional changes related to lineage-specific biology. The strength of natural selection operating on functional sequence is also important. Mutations to functional sequences that result in small fitness effects are subject to weaker constraints. Therefore, particularly when comparing highly divergent species, functional sequences that are degenerate or biologically redundant will be prone to turnover, wherein functional sequences are replaced by effectively equivalent, but nonorthologous counterparts. Finally, considering the size and complexity of metazoan genomes and the fact that many nonconserved sequences are associated with sequence-degenerate, low-level molecular functions, we find it likely that there exist many biochemically functional sequences that are not under constraint. This hypothesis does not lead to the conclusion that huge amounts of vertebrate genomes are functionally important, but rather that such "functionality" represents molecular noise that has weak or no effect on organismal phenotypes.
Affiliation(s)
- Gregory M Cooper
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
22
Maston GA, Evans SK, Green MR. Transcriptional regulatory elements in the human genome. Annu Rev Genomics Hum Genet 2006; 7:29-59. [PMID: 16719718 DOI: 10.1146/annurev.genom.7.080505.115623] [Citation(s) in RCA: 567] [Impact Index Per Article: 33.4] [Indexed: 11/09/2022]
Abstract
The faithful execution of biological processes requires a precise and carefully orchestrated set of steps that depend on the proper spatial and temporal expression of genes. Here we review the various classes of transcriptional regulatory elements (core promoters, proximal promoters, distal enhancers, silencers, insulators/boundary elements, and locus control regions) and the molecular machinery (general transcription factors, activators, and coactivators) that interacts with the regulatory elements to mediate precisely controlled patterns of gene expression. The biological importance of transcriptional regulation is highlighted by examples of how alterations in these transcriptional components can lead to disease. Finally, we discuss the methods currently used to identify transcriptional regulatory elements, and the ability of these methods to be scaled up for the purpose of annotating the entire human genome.
Affiliation(s)
- Glenn A Maston
- Howard Hughes Medical Institute, Programs in Gene Function and Expression and Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA.
23
Abstract
Nonsynonymous single nucleotide polymorphisms (nsSNPs) are coding variants that introduce amino acid changes in their corresponding proteins. Because nsSNPs can affect protein function, they are believed to have the largest impact on human health compared with SNPs in other regions of the genome. Therefore, it is important to distinguish those nsSNPs that affect protein function from those that are functionally neutral. Here we provide an overview of amino acid substitution (AAS) prediction methods, which use sequence and/or structure to predict the effect of an AAS on protein function. Most methods predict approximately 25-30% of human nsSNPs to negatively affect protein function, and such nsSNPs tend to be rare in the population. We discuss the utility of AAS prediction methods for Mendelian and complex diseases as well as their broader applications for understanding protein function.
Affiliation(s)
- Pauline C Ng
- Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA.
24
Approaches to comparative sequence analysis: towards a functional view of vertebrate genomes. Nat Rev Genet 2008; 9:303-13. [PMID: 18347593 DOI: 10.1038/nrg2185] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 12/23/2022]
Abstract
The comparison of genomic sequences is now a common approach to identifying and characterizing functional regions in vertebrate genomes. However, for theoretical reasons and because of practical issues, the generation of these data sets is non-trivial and can have many pitfalls. We are currently seeing an explosion of comparative sequence data, the benefits and limitations of which need to be disseminated to the scientific community. This Review provides a critical overview of the different types of sequence data that are available for analysis and of contemporary comparative sequence analysis methods, highlighting both their strengths and limitations. Approaches to determining the biological significance of constrained sequence are also explored.
25
Abstract
Recent progress resolving the phylogenetic relationships of the major lineages of mammals has had a broad impact in evolutionary biology, comparative genomics and the biomedical sciences. Novel insights into the timing and historical biogeography of early mammalian diversification have resulted from a new molecular tree for placental mammals coupled with dating approaches that relax the assumption of the molecular clock. We highlight the numerous applications to come from a well-resolved phylogeny and genomic prospecting in multiple lineages of mammals, from identifying regulatory elements in mammalian genomes to assessing the functional consequences of mutations in human disease loci and those driving adaptive evolution.
Affiliation(s)
- Mark S Springer
- Department of Biology, University of California, Riverside, CA 92521, USA.
26
Abstract
While less than 1.5% of the mammalian genome encodes proteins, it is now evident that the vast majority is transcribed, mainly into non-protein-coding RNAs. This raises the question of what fraction of the genome is functional, i.e., composed of sequences that yield functional products, are required for the expression (regulation or processing) of these products, or are required for chromosome replication and maintenance. Many of the observed noncoding transcripts are differentially expressed, and, while most have not yet been studied, increasing numbers are being shown to be functional and/or trafficked to specific subcellular locations, as well as exhibit subtle evidence of selection. On the other hand, analyses of conservation patterns indicate that only approximately 5% (3%-8%) of the human genome is under purifying selection for functions common to mammals. However, these estimates rely on the assumption that reference sequences (usually ancient transposon-derived sequences) have evolved neutrally, which may not be the case, and if so would lead to an underestimate of the fraction of the genome under evolutionary constraint. These analyses also do not detect functional sequences that are evolving rapidly and/or have acquired lineage-specific functions. Indeed, many regulatory sequences and known functional noncoding RNAs, including many microRNAs, are not conserved over significant evolutionary distances, and recent evidence from the ENCODE project suggests that many functional elements show no detectable level of sequence constraint. Thus, it is likely that much more than 5% of the genome encodes functional information, and although the upper bound is unknown, it may be considerably higher than currently thought.
Affiliation(s)
- Michael Pheasant
- ARC Special Research Centre for Functional and Applied Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland 4072, Australia
27
Margulies EH, Cooper GM, Asimenos G, Thomas DJ, Dewey CN, Siepel A, Birney E, Keefe D, Schwartz AS, Hou M, Taylor J, Nikolaev S, Montoya-Burgos JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Brown JB, Bickel P, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Stone EA, Rosenbloom KR, Kent WJ, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VVB, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Hinrichs A, Trumbower H, Clawson H, Zweig A, Kuhn RM, Barber G, Harte R, Karolchik D, Field MA, Moore RA, Matthewson CA, Schein JE, Marra MA, Antonarakis SE, Batzoglou S, Goldman N, Hardison R, Haussler D, Miller W, Pachter L, Green ED, Sidow A. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res 2007; 17:760-74. [PMID: 17567995 PMCID: PMC1891336 DOI: 10.1101/gr.6034307] [Citation(s) in RCA: 149] [Impact Index Per Article: 8.3] [Indexed: 01/10/2023]
Abstract
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.
Affiliation(s)
- Elliott H Margulies
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
28
Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC, Navas PA, Neri F, Parker SCJ, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS, Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried
C, Hackermüller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O, Pedersen JS, Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT, Gilbert J, Drenkow J, Bell I, Zhao X, Srinivasan KG, Sung WK, Ooi HS, Chiu KP, Foissac S, Alioto T, Brent M, Pachter L, Tress ML, Valencia A, Choo SW, Choo CY, Ucla C, Manzano C, Wyss C, Cheung E, Clark TG, Brown JB, Ganesh M, Patel S, Tammana H, Chrast J, Henrichsen CN, Kai C, Kawai J, Nagalakshmi U, Wu J, Lian Z, Lian J, Newburger P, Zhang X, Bickel P, Mattick JS, Carninci P, Hayashizaki Y, Weissman S, Hubbard T, Myers RM, Rogers J, Stadler PF, Lowe TM, Wei CL, Ruan Y, Struhl K, Gerstein M, Antonarakis SE, Fu Y, Green ED, Karaöz U, Siepel A, Taylor J, Liefer LA, Wetterstrand KA, Good PJ, Feingold EA, Guyer MS, Cooper GM, Asimenos G, Dewey CN, Hou M, Nikolaev S, Montoya-Burgos JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Huang H, Zhang NR, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Seringhaus M, Church D, Rosenbloom K, Kent WJ, Stone EA, Batzoglou S, Goldman N, Hardison RC, Haussler D, Miller W, Sidow A, Trinklein ND, Zhang ZD, Barrera L, Stuart R, King DC, Ameur A, Enroth S, Bieda MC, Kim J, Bhinge AA, Jiang N, Liu J, Yao F, Vega VB, Lee CWH, Ng P, Shahab A, Yang A, Moqtaderi Z, Zhu Z, Xu X, Squazzo S, Oberley MJ, Inman D, Singer MA, Richmond TA, Munn KJ, Rada-Iglesias A, Wallerman O, Komorowski J, Fowler JC, Couttet P, Bruce AW, Dovey OM, Ellis PD, Langford CF, Nix DA, Euskirchen G, Hartman S, Urban AE, Kraus P, Van Calcar S, Heintzman N, Kim TH, Wang K, Qu C, Hon G, Luna R, Glass CK, Rosenfeld MG, Aldred SF, Cooper SJ, Halees A, Lin JM, Shulha HP, Zhang X, Xu M, Haidar JNS, Yu Y, Ruan Y, Iyer VR, Green RD, Wadelius C, Farnham PJ, Ren B, Harte RA, Hinrichs AS, Trumbower H, Clawson H, Hillman-Jackson J, Zweig AS, Smith K, Thakkapallayil A, Barber G, Kuhn RM, Karolchik D, Armengol L, Bird CP, de Bakker PIW, Kern AD, Lopez-Bigas N, Martin JD, Stranger BE, Woodroffe 
A, Davydov E, Dimas A, Eyras E, Hallgrímsdóttir IB, Huppert J, Zody MC, Abecasis GR, Estivill X, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VVB, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Koriabine M, Nefedov M, Osoegawa K, Yoshinaga Y, Zhu B, de Jong PJ. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007; 447:799-816. [PMID: 17571346 PMCID: PMC2212820 DOI: 10.1038/nature05874] [Citation(s) in RCA: 3864] [Impact Index Per Article: 214.7] [Indexed: 01/17/2023]
Abstract
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
29
Devaney JM, Hoffman EP, Gordish-Dressman H, Kearns A, Zambraski E, Clarkson PM. IGF-II gene region polymorphisms related to exertional muscle damage. J Appl Physiol (1985) 2007; 102:1815-23. [PMID: 17289909 DOI: 10.1152/japplphysiol.01165.2006] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 11/22/2022]
Abstract
We examined the association of a novel single-nucleotide polymorphism (SNP) in IGF-I (IGF-I -C1245T located in the promoter) and eight SNPs in the IGF-II gene region with indicators of muscle damage [strength loss, muscle soreness, and increases in circulating levels of creatine kinase (CK) and myoglobin] after eccentric exercise. We also examined two SNPs in the IGF binding protein-3 (IGFBP-3). The age, height, and body mass of the 151 subjects studied were 24.1 +/- 5.2 yr, 170.8 +/- 9.9 cm, and 73.3 +/- 17.0 kg, respectively. There were no significant associations of phenotypes with IGF-I. IGF-II SNP (G12655A, rs3213216) and IGFBP-3 SNP (A8618T, rs6670) were not significantly associated with any variable. The most significant finding in this study was that for men, IGF-II (C13790G, rs3213221), IGF-II (ApaI, G17200A, rs680), IGF-II antisense (IGF2AS) (G11711T, rs7924316), and IGFBP-3 (-C1592A, rs2132570) were significantly associated with muscle damage indicators. We found that men who were 1) homozygous for the rare IGF-II C13790G allele and rare allele for the ApaI (G17200A) SNP demonstrated the greatest strength loss immediately after exercise, greatest soreness, and highest postexercise serum CK activity; 2) homozygous wild type for IGF2AS (G11711T, rs7924316) had the greatest strength loss and most muscle soreness; and 3) homozygous wild type for the IGF2AS G11711T SNP showed the greatest strength loss, highest muscle soreness, and greater CK and myoglobin response to exercise. In women, fewer significant associations appeared.
Affiliation(s)
- Joseph M Devaney
- Children's National Medical Center, Washington, District of Columbia, USA
30
Roh TY, Wei G, Farrell CM, Zhao K. Genome-wide prediction of conserved and nonconserved enhancers by histone acetylation patterns. Genome Res 2006; 17:74-81. [PMID: 17135569 PMCID: PMC1716270 DOI: 10.1101/gr.5767907] [Citation(s) in RCA: 109] [Impact Index Per Article: 5.7] [Indexed: 11/25/2022]
Abstract
Comparative genomic studies have been useful in identifying transcriptional regulatory elements in higher eukaryotic genomes, but many important regulatory elements cannot be detected by such analyses due to evolutionary variations and alignment tool limitations. Therefore, in this study we exploit the highly conserved nature of epigenetic modifications to identify potential transcriptional enhancers. By using a high-resolution genome-wide mapping technique, which combines the chromatin immunoprecipitation and serial analysis of gene expression assays, we have recently determined the distribution of lysine 9/14-diacetylated histone H3 in human T cells. We showed the existence of 46,813 regions with clusters of histone acetylation, termed histone acetylation islands, some of which correspond to known transcriptional regulatory elements. In the present study, we find that 4679 sequences conserved between human and pufferfish coincide with histone acetylation islands, and random sampling shows that 33% (13/39) of these can function as transcriptional enhancers in human Jurkat T cells. In addition, by comparing the human histone acetylation island sequences with mouse genome sequences, we find that despite the conservation of many of these regions between these species, 21,855 of these sequences are not conserved. Furthermore, we demonstrate that about 50% (26/51) of these nonconserved sequences have enhancer activity in Jurkat cells, and that many of the orthologous mouse sequences also have enhancer activity in addition to conserved epigenetic modification patterns in mouse T-cell chromatin. Therefore, by combining epigenetic modification and sequence data, we have established a novel genome-wide method for identifying regulatory elements not discernable by comparative genomics alone.
Affiliation(s)
- Tae-young Roh
- Laboratory of Molecular Immunology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Gang Wei
- Laboratory of Molecular Immunology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Catherine M. Farrell
- Laboratory of Molecular Immunology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Keji Zhao
- Laboratory of Molecular Immunology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Corresponding author; fax (301) 480-0961
31
Tang CS, Zhao YZ, Smith DK, Epstein RJ. Intron length and accelerated 3' gene evolution. Genomics 2006; 88:682-689. [PMID: 16928427 DOI: 10.1016/j.ygeno.2006.06.017] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 05/04/2006] [Revised: 06/27/2006] [Accepted: 06/28/2006] [Indexed: 11/24/2022]
Abstract
Genetic evolution depends in part upon a balance between negative selection and environmentally driven mutation. To explore whether this balance is affected by gene structure, we have used phylogenetic data mining to compare gene compositions across a range of species. Here we show that genomes of higher species exhibit a greater frequency of 5' CpG islands and of CpG-->TpG/CpA transitions. This latter mutational pattern exhibits a 5'-to-3' trend in higher species, consistent with a length-dependent effect on methylation-dependent CpG suppression. Associated strand asymmetry (TpG>CpA) declines with gene length, implying attenuation of transcription-coupled repair 3' to introns. A sharp 3' rise in coding region single-nucleotide polymorphism frequency further supports a mechanistic role for intron length in promoting genetic variation by reducing repair and/or weakening negative selection. Consistent with this, the Ka/Ks ratio of 3' exons exceeds that of centrally located exons in intron-containing, but not in intronless, genes (p<0.0003). We conclude that the efficiency of transcription-coupled repair decreases with gene length, suggesting in turn that 3' gene evolution is accelerated both by introns and by gene methylation.
Affiliation(s)
- Clara S Tang
- Laboratory of Computational Oncology, Department of Medicine, Pokfulam, Hong Kong
- Yong Z Zhao
- Laboratory of Computational Oncology, Department of Medicine, Pokfulam, Hong Kong
- David K Smith
- Department of Biochemistry, The University of Hong Kong, Pokfulam, Hong Kong
- Richard J Epstein
- Laboratory of Computational Oncology, Department of Medicine, Pokfulam, Hong Kong.
32
Pollard DA, Moses AM, Iyer VN, Eisen MB. Detecting the limits of regulatory element conservation and divergence estimation using pairwise and multiple alignments. BMC Bioinformatics 2006; 7:376. [PMID: 16904011 PMCID: PMC1613255 DOI: 10.1186/1471-2105-7-376] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Received: 04/13/2006] [Accepted: 08/14/2006] [Indexed: 01/01/2023]
Abstract
Background: Molecular evolutionary studies of noncoding sequences rely on multiple alignments. Yet how multiple alignment accuracy varies across sequence types, tree topologies, divergences and tools, and further how this variation impacts specific inferences, remains unclear.
Results: Here we develop a molecular evolution simulation platform, CisEvolver, with models of background noncoding and transcription factor binding site evolution, and use simulated alignments to systematically examine multiple alignment accuracy and its impact on two key molecular evolutionary inferences: transcription factor binding site conservation and divergence estimation. We find that the accuracy of multiple alignments is determined almost exclusively by the pairwise divergence distance of the two most diverged species and that additional species have a negligible influence on alignment accuracy. Conserved transcription factor binding sites align better than surrounding noncoding DNA yet are often found to be misaligned at relatively short divergence distances, such that studies of binding site gain and loss could easily be confounded by alignment error. Divergence estimates from multiple alignments tend to be overestimated at short divergence distances but reach a tool specific divergence at which they cease to increase, leading to underestimation at long divergences. Our most striking finding was that overall alignment accuracy, binding site alignment accuracy and divergence estimation accuracy vary greatly across branches in a tree and are most accurate for terminal branches connecting sister taxa and least accurate for internal branches connecting sub-alignments.
Conclusion: Our results suggest that variation in alignment accuracy can lead to errors in molecular evolutionary inferences that could be construed as biological variation. These findings have implications for which species to choose for analyses, what kind of errors would be expected for a given set of species and how multiple alignment tools and phylogenetic inference methods might be improved to minimize or control for alignment errors.
Affiliation(s)
- Daniel A Pollard
- Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA
- Alan M Moses
- Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA
- Venky N Iyer
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Michael B Eisen
- Graduate Group in Biophysics, University of California, Berkeley, CA 94720, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
- Department of Genome Sciences, Genomics Division, Ernest Orlando Lawrence Berkeley National Lab, Berkeley, CA 94720, USA
- Center for Integrative Genomics, University of California, Berkeley, CA 94720, USA
33
Prabhakar S, Poulin F, Shoukry M, Afzal V, Rubin EM, Couronne O, Pennacchio LA. Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res 2006; 16:855-63. [PMID: 16769978 PMCID: PMC1484452 DOI: 10.1101/gr.4717506] [Citation(s) in RCA: 154] [Impact Index Per Article: 8.1] [Indexed: 02/04/2023]
Abstract
Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons. To address this problem, we identified evolutionarily conserved noncoding regions in primate, mammalian, and more distant comparisons using a uniform approach (Gumby) that facilitates unbiased assessment of the impact of evolutionary distance on predictive power. We benchmarked computational predictions against previously identified cis-regulatory elements at diverse genomic loci and also tested numerous extremely conserved human-rodent sequences for transcriptional enhancer activity using an in vivo enhancer assay in transgenic mice. Human regulatory elements were identified with acceptable sensitivity (53%-80%) and true-positive rate (27%-67%) by comparison with one to five other eutherian mammals or six other simian primates. More distant comparisons (marsupial, avian, amphibian, and fish) failed to identify many of the empirically defined functional noncoding elements. Our results highlight the practical utility of close sequence comparisons, and the loss of sensitivity entailed by more distant comparisons. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole-genome comparative analysis that explains most of the observations from empirical benchmarking. Lastly, we determined that, in addition to strength of conservation, genomic location and/or density of surrounding conserved elements must also be considered in selecting candidate enhancers for in vivo testing at embryonic time points.
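The benchmarking described above reports two figures per comparison: sensitivity and a "true-positive rate". A minimal sketch of how such figures could be computed from sets of element identifiers follows. It assumes that "true-positive rate" here means the fraction of predictions that match a known element (which the abstract does not define), and the function names and toy identifiers are hypothetical.

```python
# Illustrative sketch only: the interpretation of "true-positive rate" as
# the fraction of correct predictions is an assumption, and all names and
# identifiers below are hypothetical.

def sensitivity(predicted: set[str], known: set[str]) -> float:
    """Fraction of known functional elements recovered by the predictions."""
    return len(predicted & known) / len(known)


def true_prediction_fraction(predicted: set[str], known: set[str]) -> float:
    """Fraction of predictions that coincide with a known functional element."""
    return len(predicted & known) / len(predicted)


# Toy benchmark: five known elements, six predictions, three in common.
known_elements = {"e1", "e2", "e3", "e4", "e5"}
predicted_elements = {"e1", "e2", "e3", "p1", "p2", "p3"}

toy_sensitivity = sensitivity(predicted_elements, known_elements)
toy_true_fraction = true_prediction_fraction(predicted_elements, known_elements)
```

In this toy case three of five known elements are recovered and three of six predictions are correct, so the two measures differ even though they share a numerator.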
Affiliation(s)
- Shyam Prabhakar: Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA. Corresponding author; fax (510) 486-4229.
- Francis Poulin: Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Malak Shoukry: Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Veena Afzal: Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Edward M. Rubin: Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA
- Olivier Couronne: Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA
- Len A. Pennacchio: Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; U.S. Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA. Corresponding author; fax (510) 486-4229.
34
Abstract
MicroRNAs are short (∼22 nt) regulatory RNA molecules that play key roles in metazoan development and have been implicated in human disease. First discovered in Caenorhabditis elegans, over 2500 microRNAs have been isolated in metazoans and plants; it has been estimated that there may be more than a thousand microRNA genes in the human genome alone. Motivated by the experimental observation of strong conservation of the microRNA let-7 among nearly all metazoans, we developed a novel methodology to characterize the class of such strongly conserved sequences: we identified a non-redundant set of all sequences 20 to 29 bases in length that are shared among three insects: fly, bee and mosquito. Among the few hundred sequences greater than 20 bases in length are close to 40% of the 78 confirmed fly microRNAs, along with other non-coding RNAs and coding sequence.
Affiliation(s)
- T. Tran: Department of Biochemistry, Baylor College of Medicine, TX, USA
- P. Havlak: Human Genome Sequencing Center, Baylor College of Medicine, TX, USA
- J. Miller: Department of Biochemistry, Baylor College of Medicine, TX, USA. To whom correspondence should be addressed. Tel: +1 713 798 3542; Fax: +1 713 796 9438.
35
Zhou L, Nian M, Gu J, Irwin DM. Intron 1 sequences are required for pancreatic expression of the human proglucagon gene. Am J Physiol Regul Integr Comp Physiol 2006; 290:R634-41. [PMID: 16223847 DOI: 10.1152/ajpregu.00596.2005] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
The mammalian proglucagon gene is expressed in pancreatic islet A-cells, intestinal L-cells, and select neurons of the brain, where posttranslational processing results in the liberation of a unique profile of peptides. Despite the importance of proglucagon-derived peptides in human biology, little is known about the regulation of the human gene, as the rat gene has been the preferred model for understanding the regulation of proglucagon gene expression. Previously, we have shown that although the immediate promoter region of the rat proglucagon gene is sufficient for expression in pancreatic islet cells, the homologous human proglucagon promoter sequences are not sufficient. We have now used a comparative genomic approach to identify noncoding sequences near the human proglucagon gene that are conserved among mammals, and thus potentially are regulatory sequences. Our alignments identified three evolutionarily conserved noncoding regions (ECR): the first is the immediate promoter region (ECR1), the second is about 5 kb 5′ to the mRNA start site (ECR2), and the third is near the 3′ end of the first intron (ECR3). In our in vitro transient transfection assays, reporter gene constructs that include the human ECR3 support expression in rodent islet cell lines. In complementary studies, transgenic mice carrying a reporter gene regulated by human proglucagon promoter and intron 1 sequences (including ECR3) express the reporter gene in the pancreas, as well as in the intestine and selected neurons. These studies suggest that conserved sequences within intron 1 of the human proglucagon gene are important for expression in the pancreas.
Affiliation(s)
- Li Zhou: Department of Laboratory Medicine and Pathobiology, University of Toronto, 100 College St., Toronto, Ontario, Canada, M5G 1L5
36
Abstract
Genome sequence analysis of RNAs presents special challenges to computational biology, because conserved RNA secondary structure plays a large part in RNA analysis. Algorithms well suited for RNA secondary structure and sequence analysis have been borrowed from computational linguistics. These "stochastic context-free grammar" (SCFG) algorithms have enabled the development of new RNA gene-finding and RNA homology search software. The aim of this paper is to provide an accessible introduction to the strengths and weaknesses of SCFG methods and to describe the state of the art in one particular kind of application: SCFG-based RNA similarity searching. The INFERNAL and RSEARCH programs are capable of identifying distant RNA homologs in a database search by looking for both sequence and secondary structure conservation.
Affiliation(s)
- S R Eddy: Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, Saint Louis, Missouri 63108, USA