251
Abbasi AA. Unraveling ancient segmental duplication events in human genome by phylogenetic analysis of multigene families residing on HOX-cluster paralogons. Mol Phylogenet Evol 2010; 57:836-48. [PMID: 20696259 DOI: 10.1016/j.ympev.2010.07.021] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/17/2010] [Revised: 07/21/2010] [Accepted: 07/27/2010] [Indexed: 11/25/2022]
Abstract
BACKGROUND Vertebrate genomes contain extensive intra-genomic conserved synteny, which is the presence of a similar set of genes on two or more chromosomes (paralogons). The existence of these paralogons has led to the proposal that the vertebrate genome was structured by one or more rounds of ancient whole genome duplications (2R hypothesis). RESULTS The 2R hypothesis was tested by phylogenetic analysis of gene families residing on human HOX-bearing chromosomes (HOX-cluster paralogons). The results revealed that, based on their duplication history, 23 gene families with representation on three or four of the human HOX-bearing chromosomes can be partitioned into four discrete co-duplicated groups. The distinct genes within each co-duplicated group share the same evolutionary history and were duplicated in concert with each other, while the constituent genes of two different co-duplicated groups do not share their evolutionary history and were not duplicated simultaneously. These co-duplicated groups are large, comprising members of 3 to 8 gene families, and suggest that human HOX-cluster paralogons were shaped by ancient segmental duplications (SDs) and rearrangement events that occurred at least as early as the divergence of bony fishes and tetrapods. CONCLUSIONS Based on the recovery of ancient SD events in this analysis, and given the widespread evidence that recent SD events played a pivotal role in changing the genome architecture of primates and other recently diverged animals, it is concluded that a more realistic model of ancient vertebrate genome evolutionary history can be deduced by tracing the evolutionary trajectory of the genomes of recently diverged vertebrate species.
Affiliation(s)
- Amir Ali Abbasi
- National Center for Bioinformatics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan.
252
Buljan M, Frankish A, Bateman A. Quantifying the mechanisms of domain gain in animal proteins. Genome Biol 2010; 11:R74. [PMID: 20633280 PMCID: PMC2926785 DOI: 10.1186/gb-2010-11-7-r74] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.5] [Received: 04/23/2010] [Revised: 06/04/2010] [Accepted: 07/15/2010] [Indexed: 11/21/2022] Open
Abstract
Background Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. By using animal gene phylogenies we were able to identify a set of high confidence domain gain events and by looking at their coding DNA investigate the causative mechanisms. Results Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events. Conclusions The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.
Affiliation(s)
- Marija Buljan
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
253
Alexander RP, Fang G, Rozowsky J, Snyder M, Gerstein MB. Annotating non-coding regions of the genome. Nat Rev Genet 2010; 11:559-71. [PMID: 20628352 DOI: 10.1038/nrg2814] [Citation(s) in RCA: 339] [Impact Index Per Article: 22.6] [Indexed: 02/06/2023]
Abstract
Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.
Affiliation(s)
- Roger P Alexander
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
254
Khurana E, Lam HYK, Cheng C, Carriero N, Cayting P, Gerstein MB. Segmental duplications in the human genome reveal details of pseudogene formation. Nucleic Acids Res 2010; 38:6997-7007. [PMID: 20615899 PMCID: PMC2978362 DOI: 10.1093/nar/gkq587] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 01/11/2023] Open
Abstract
Duplicated pseudogenes in the human genome are disabled copies of functioning parent genes. They result from block duplication events occurring throughout evolutionary history. Relatively recent duplications (with sequence similarity ≥90% and length ≥1 kb) are termed segmental duplications (SDs); here, we analyze the interrelationship of SDs and pseudogenes. We present a decision-tree approach to classify pseudogenes based on their (and their parents’) characteristics in relation to SDs. The classification identifies 140 novel pseudogenes and makes possible improved annotation for the 3172 pseudogenes located in SDs. In particular, it reveals that many pseudogenes in SDs likely did not arise directly from parent genes, but are the result of a multi-step process. In these cases, the initial duplication or retrotransposition of a parent gene gives rise to a ‘parent pseudogene’, followed by further duplication creating duplicated–duplicated or duplicated–processed pseudogenes, respectively. Moreover, we can precisely identify these parent pseudogenes by overlap with ancestral SD loci. Finally, a comparison of nucleotide substitutions per site in a pseudogene with its surrounding SD region allows us to estimate the time difference between duplication and disablement events, and this suggests that most duplicated pseudogenes in SDs were likely disabled around the time of the original duplication.
Affiliation(s)
- Ekta Khurana
- Program in Computational Biology and Bioinformatics, Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
255
von Grotthuss M, Ashburner M, Ranz JM. Fragile regions and not functional constraints predominate in shaping gene organization in the genus Drosophila. Genome Res 2010; 20:1084-96. [PMID: 20601587 DOI: 10.1101/gr.103713.109] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Indexed: 12/14/2022]
Abstract
During evolution, gene repatterning across eukaryotic genomes is not uniform. Some genomic regions exhibit a gene organization conserved phylogenetically, while others are recurrently involved in chromosomal rearrangement, resulting in breakpoint reuse. Both gene order conservation and breakpoint reuse can result from the existence of functional constraints on where chromosomal breakpoints occur or from the existence of regions that are susceptible to breakage. The balance between these two mechanisms is still poorly understood. Drosophila species have very dynamic genomes and, therefore, can be very informative. We compared the gene organization of the main five chromosomal elements (Muller's elements A-E) of nine Drosophila species. Under a parsimonious evolutionary scenario, we estimate that 6116 breakpoints differentiate the gene orders of the species and that breakpoint reuse is associated with approximately 80% of the orthologous landmarks. The comparison of the observed patterns of change in gene organization with those predicted under different simulated modes of evolution shows that fragile regions alone can explain the observed key patterns of Muller's element A (X chromosome) more often than for any other Muller's element. High levels of fragility plus constraints operating on approximately 15% of the genome are sufficient to explain the observed patterns of change and conservation across species. The orthologous landmarks more likely to be under constraint exhibit both a remarkable internal functional heterogeneity and a lack of common functional themes with the exception of the presence of highly conserved noncoding elements. Fragile regions rather than functional constraints have been the main determinant of the evolution of the Drosophila chromosomes.
Affiliation(s)
- Marcin von Grotthuss
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California 92697, USA
256
Nowick K, Hamilton AT, Zhang H, Stubbs L. Rapid sequence and expression divergence suggest selection for novel function in primate-specific KRAB-ZNF genes. Mol Biol Evol 2010; 27:2606-17. [PMID: 20573777 DOI: 10.1093/molbev/msq157] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.0] [Indexed: 12/31/2022] Open
Abstract
Recent segmental duplications (SDs), arising from duplication events that occurred within the past 35-40 My, have provided a major resource for the evolution of proteins with primate-specific functions. KRAB zinc finger (KRAB-ZNF) transcription factor genes are overrepresented among genes contained within these recent human SDs. Here, we examine the structural and functional diversity of the 70 human KRAB-ZNF genes involved in the most recent primate SD events including genes that arose in the hominid lineage. Despite their recent advent, many parent-daughter KRAB-ZNF gene pairs display significant differences in zinc finger structure and sequence, expression, and splicing patterns, each of which could significantly alter the regulatory functions of the paralogous genes. Paralogs that emerged on the lineage to humans and chimpanzees have undergone more evolutionary changes per unit of time than genes already present in the common ancestor of rhesus macaques and great apes. Taken together, these data indicate that a substantial fraction of the recently evolved primate-specific KRAB-ZNF gene duplicates have acquired novel functions that may possibly define novel regulatory pathways and suggest an active ongoing selection for regulatory diversity in primates.
Affiliation(s)
- Katja Nowick
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, USA
257
Teague B, Waterman MS, Goldstein S, Potamousis K, Zhou S, Reslewic S, Sarkar D, Valouev A, Churas C, Kidd JM, Kohn S, Runnheim R, Lamers C, Forrest D, Newton MA, Eichler EE, Kent-First M, Surti U, Livny M, Schwartz DC. High-resolution human genome structure by single-molecule analysis. Proc Natl Acad Sci U S A 2010; 107:10848-53. [PMID: 20534489 PMCID: PMC2890719 DOI: 10.1073/pnas.0914638107] [Citation(s) in RCA: 144] [Impact Index Per Article: 9.6] [Indexed: 12/24/2022] Open
Abstract
Variation in genome structure is an important source of human genetic polymorphism: It affects a large proportion of the genome and has a variety of phenotypic consequences relevant to health and disease. In spite of this, human genome structure variation is incompletely characterized due to a lack of approaches for discovering a broad range of structural variants in a global, comprehensive fashion. We addressed this gap with Optical Mapping, a high-throughput, high-resolution single-molecule system for studying genome structure. We used Optical Mapping to create genome-wide restriction maps of a complete hydatidiform mole and three lymphoblast-derived cell lines, and we validated the approach by demonstrating a strong concordance with existing methods. We also describe thousands of new variants with sizes ranging from kb to Mb.
Affiliation(s)
- Brian Teague
- The Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706-1580
- Michael S. Waterman
- Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089-2910
- Steven Goldstein
- The Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706-1580
- Konstantinos Potamousis
- The Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706-1580
- Shiguo Zhou
- The Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706-1580
- Susan Reslewic
- The Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706-1580
- Deepayan Sarkar
- Department of Statistics, University of Wisconsin, 1300 University Avenue, Madison, WI 53706-1510
- Anton Valouev
- Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089-2910
- Christopher Churas
- The Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706-1580
- Jeffrey M. Kidd
- Department of Genome Sciences, University of Washington, 1705 NE Pacific Street, Seattle, WA 98195-5065
- Scott Kohn
- The Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706-1580
- Rodney Runnheim
- The Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706-1580
- Casey Lamers
- The Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706-1580
- Dan Forrest
- The Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706-1580
- Michael A. Newton
- Department of Statistics, University of Wisconsin, 1300 University Avenue, Madison, WI 53706-1510
- Department of Biostatistics and Medical Informatics, University of Wisconsin, 1300 University Avenue, Madison, WI 53706-1510
- Evan E. Eichler
- Department of Genome Sciences, University of Washington, 1705 NE Pacific Street, Seattle, WA 98195-5065
- Marijo Kent-First
- Department of Animal Science, Department of Biological Sciences, Mississippi State University, 130 Harned Hall, Lee Boulevard, Mississippi State, MS 39762-9698
- Urvashi Surti
- Department of Pathology, University of Pittsburgh, 200 Lothrop Street, Pittsburgh, PA 15213-2536
- Miron Livny
- Department of Computer Sciences, University of Wisconsin, 1210 West Dayton Street, Madison, WI 53706-1685
- David C. Schwartz
- The Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin, 425 Henry Mall, Madison, WI 53706-1580
258
Shumay E, Fowler JS, Volkow ND. Genomic features of the human dopamine transporter gene and its potential epigenetic States: implications for phenotypic diversity. PLoS One 2010; 5:e11067. [PMID: 20548783 PMCID: PMC2883569 DOI: 10.1371/journal.pone.0011067] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Received: 01/25/2010] [Accepted: 05/18/2010] [Indexed: 02/06/2023] Open
Abstract
The human dopamine transporter gene (DAT1 or SLC6A3) has been associated with various brain-related diseases and behavioral traits and, as such, has been investigated intensely in experimental and clinical settings. However, the abundance of research data has not clarified the biological mechanism of DAT regulation; similarly, studies of DAT genotype-phenotype associations have yielded inconsistent results. Hence, our understanding of the control of the DAT protein product is incomplete; having this knowledge is critical, since DAT plays the major role in the brain's dopaminergic circuitry. Accordingly, we reevaluated the genomic attributes of the SLC6A3 gene that might confer sensitivity to regulation, hypothesizing that its unique genomic characteristics might facilitate highly dynamic, region-specific DAT expression, thereby enabling multiple regulatory modes. Our comprehensive bioinformatic analyses revealed very distinctive genomic characteristics of SLC6A3, including high inter-individual variability of its sequence (897 SNPs, about 90 repeats and several CNVs) and pronounced sensitivity to regulation by epigenetic mechanisms, as evident from the GC-biased composition (0.55) of SLC6A3 and its numerous intragenic CpG islands (27 CGIs). We propose that this unique combination of genomic features and regulatory attributes enables the differential expression of the DAT1 gene and fulfills seemingly contradictory demands on its regulation; that is, robustness of region-specific expression and functional dynamics.
Affiliation(s)
- Elena Shumay
- Brookhaven National Laboratory, Medical Department, Upton, New York, United States of America
- Joanna S. Fowler
- Brookhaven National Laboratory, Medical Department, Upton, New York, United States of America
- Nora D. Volkow
- National Institute on Drug Abuse, National Institutes of Health, Bethesda, Maryland, United States of America
259
Brown CA, Murray AW, Verstrepen KJ. Rapid expansion and functional divergence of subtelomeric gene families in yeasts. Curr Biol 2010; 20:895-903. [PMID: 20471265 DOI: 10.1016/j.cub.2010.04.027] [Citation(s) in RCA: 245] [Impact Index Per Article: 16.3] [Received: 02/08/2010] [Revised: 04/13/2010] [Accepted: 04/14/2010] [Indexed: 12/31/2022]
Abstract
BACKGROUND Subtelomeres, regions proximal to telomeres, exhibit characteristics unique to eukaryotic genomes. Genes residing in these loci are subject to epigenetic regulation and elevated rates of both meiotic and mitotic recombination. However, most genome sequences do not contain assembled subtelomeric sequences, and, as a result, subtelomeres are often overlooked in comparative genomics. RESULTS We studied the evolution and functional divergence of subtelomeric gene families in the yeast lineage. Our computational results show that subtelomeric families are evolving and expanding much faster than families that do not contain subtelomeric genes. Focusing on three related subtelomeric MAL gene families involved in disaccharide metabolism that show typical patterns of rapid expansion and evolution, we show experimentally how frequent duplication events followed by functional divergence yield novel alleles that allow the metabolism of different carbohydrates. CONCLUSIONS Taken together, our computational and experimental analyses show that the extraordinary instability of eukaryotic subtelomeres supports rapid adaptation to novel niches by promoting gene recombination and duplication followed by functional divergence of the alleles.
Affiliation(s)
- Chris A Brown
- Faculty of Arts and Sciences Center for Systems Biology, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA
260
Dick DM, Riley B, Kendler KS. Nature and nurture in neuropsychiatric genetics: where do we stand? Dialogues in Clinical Neuroscience 2010. [PMID: 20373663 PMCID: PMC3181950 DOI: 10.31887/dcns.2010.12.1/ddick] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 01/31/2023]
Abstract
Both genetic and nongenetic risk factors, as well as interactions and correlations between them, are thought to contribute to the etiology of psychiatric and behavioral phenotypes. Genetic epidemiology consistently supports the involvement of genes in liability. Molecular genetic studies have been less successful in identifying liability genes, but recent progress suggests that a number of specific genes contributing to risk have been identified. Collectively, the results are complex and inconsistent, with no single common DNA variant in any gene influencing risk across human populations. Few specific genetic variants influencing risk have been unambiguously identified. Contemporary approaches, however, hold great promise to further elucidate liability genes and variants, as well as their potential inter-relationships with each other and with the environment. We will review the fields of genetic epidemiology and molecular genetics, providing examples from the literature to illustrate the key concepts emerging from this work.
Affiliation(s)
- Danielle M Dick
- Department of Psychiatry, Virginia Institute of Psychiatric and Behavioral Genetics, Richmond 23298, USA
261
Völker M, Backström N, Skinner BM, Langley EJ, Bunzey SK, Ellegren H, Griffin DK. Copy number variation, chromosome rearrangement, and their association with recombination during avian evolution. Genome Res 2010; 20:503-11. [PMID: 20357050 PMCID: PMC2847753 DOI: 10.1101/gr.103663.109] [Citation(s) in RCA: 114] [Impact Index Per Article: 7.6] [Received: 11/29/2009] [Accepted: 02/08/2010] [Indexed: 11/25/2022]
Abstract
Chromosomal rearrangements and copy number variants (CNVs) play key roles in genome evolution and genetic disease; however, the molecular mechanisms underlying these types of structural genomic variation are not fully understood. The availability of complete genome sequences for two bird species, the chicken and the zebra finch, provides, for the first time, an ideal opportunity to analyze the relationship between structural genomic variation (chromosomal and CNV) and recombination on a genome-wide level. The aims of this study were therefore threefold: (1) to combine bioinformatics and physical mapping to produce comprehensive comparative maps of the genomes of chicken and zebra finch, which allowed the identification of evolutionary chromosomal rearrangements distinguishing them; the previously reported interchromosomal conservation of synteny was confirmed, but a larger than expected number of intrachromosomal rearrangements was found; (2) to hybridize zebra finch genomic DNA to a chicken tiling path microarray and identify CNVs in the zebra finch genome relative to chicken; 32 interspecific CNVs were identified; and (3) to test the hypothesis that there is an association between CNV, chromosomal rearrangements, and recombination by correlating data from (1) and (2) with recombination rate data from a high-resolution genetic linkage map of the zebra finch. We found a highly significant association of both chromosomal rearrangements and CNVs with elevated recombination rates. The results thus provide support for the notion of recombination-based processes playing a major role in avian genome evolution.
Affiliation(s)
- Martin Völker
- School of Biosciences, University of Kent, Canterbury, Kent CT2 7NJ, United Kingdom
- Niclas Backström
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
- Benjamin M. Skinner
- School of Biosciences, University of Kent, Canterbury, Kent CT2 7NJ, United Kingdom
- Elizabeth J. Langley
- School of Biosciences, University of Kent, Canterbury, Kent CT2 7NJ, United Kingdom
- Sydney K. Bunzey
- School of Biosciences, University of Kent, Canterbury, Kent CT2 7NJ, United Kingdom
- Hans Ellegren
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
- Darren K. Griffin
- School of Biosciences, University of Kent, Canterbury, Kent CT2 7NJ, United Kingdom
262
Lemmers RJLF, van der Vliet PJ, van der Gaag KJ, Zuniga S, Frants RR, de Knijff P, van der Maarel SM. Worldwide population analysis of the 4q and 10q subtelomeres identifies only four discrete interchromosomal sequence transfers in human evolution. Am J Hum Genet 2010; 86:364-77. [PMID: 20206332 DOI: 10.1016/j.ajhg.2010.01.035] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.1] [Received: 11/05/2009] [Revised: 01/07/2010] [Accepted: 01/22/2010] [Indexed: 01/01/2023] Open
Abstract
Subtelomeres are dynamic structures composed of blocks of homologous DNA sequences. These so-called duplicons are dispersed over many chromosome ends. We studied the human 4q and 10q subtelomeres, which contain the polymorphic macrosatellite repeat D4Z4 and which share high sequence similarity over a region of, on average, >200 kb. Sequence analysis of four polymorphic markers in the African, European, and Asian HAPMAP panels revealed 17 subtelomeric 4q and eight subtelomeric 10qter haplotypes. Haplotypes that are composed of a mixture of 4q and 10q sequences were detected at frequencies >10% in all three populations, seemingly supporting a mechanism of ongoing interchromosomal exchanges between these chromosomes. We constructed an evolutionary network of most haplotypes and identified the 4q haplotype ancestral to all 4q and 10q haplotypes. According to the network, all subtelomeres originate from only four discrete sequence-transfer events during human evolution, and haplotypes with mixtures of 4q- and 10q-specific sequences represent intermediate structures in the transition from 4q to 10q subtelomeres. Haplotype distribution studies on a large number of globally dispersed human DNA samples from the HGDP-CEPH panel supported our findings and show that all haplotypes were present before human migration out of Africa. D4Z4 repeat array contractions on the 4A161 haplotype cause Facioscapulohumeral muscular dystrophy (FSHD), whereas contractions on most other haplotypes are nonpathogenic. We propose that the limited occurrence of interchromosomal sequence transfers results in an accumulation of haplotype-specific polymorphisms that can explain the unique association of FSHD with D4Z4 contractions in a single 4q subtelomere.
MESH Headings
- Alleles
- Base Sequence
- Chromosomes, Human, Pair 10/genetics
- Chromosomes, Human, Pair 4/genetics
- DNA/genetics
- DNA Primers/genetics
- Databases, Nucleic Acid
- Evolution, Molecular
- Genetics, Population
- Haplotypes
- Humans
- Molecular Sequence Data
- Polymorphism, Genetic
- Repetitive Sequences, Nucleic Acid
- Sequence Homology, Nucleic Acid
- Telomere/genetics
Affiliation(s)
- Richard J L F Lemmers
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
263
Chen FC, Chen CJ, Li WH, Chuang TJ. Gene family size conservation is a good indicator of evolutionary rates. Mol Biol Evol 2010; 27:1750-8. [PMID: 20194423 PMCID: PMC2908708 DOI: 10.1093/molbev/msq055] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Indexed: 01/05/2023] Open
Abstract
The evolution of duplicate genes has been a topic of broad interest. Here, we propose that the conservation of gene family size is a good indicator of the rate of sequence evolution and some other biological properties. By comparing the human–chimpanzee–macaque orthologous gene families with and without family size conservation, we demonstrate that genes with family size conservation evolve more slowly than those without family size conservation. Our results further demonstrate that both family expansion and contraction events may accelerate gene evolution, resulting in elevated evolutionary rates in the genes without family size conservation. In addition, we show that the duplicate genes with family size conservation evolve significantly more slowly than those without family size conservation. Interestingly, the median evolutionary rate of singletons falls in between those of the above two types of duplicate gene families. Our results thus suggest that the controversy on whether duplicate genes evolve more slowly than singletons can be resolved when family size conservation is taken into consideration. Furthermore, we also observe that duplicate genes with family size conservation have the highest level of gene expression/expression breadth, the highest proportion of essential genes, and the lowest gene compactness, followed by singletons and then by duplicate genes without family size conservation. Such a trend accords well with our observations of evolutionary rates. Our results thus point to the importance of family size conservation in the evolution of duplicate genes.
Affiliation(s)
- Feng-Chi Chen
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan
264
Genome destabilization by homologous recombination in the germ line. Nat Rev Mol Cell Biol 2010; 11:182-95. [PMID: 20164840 DOI: 10.1038/nrm2849] [Citation(s) in RCA: 159] [Impact Index Per Article: 10.6] [Indexed: 01/16/2023]
Abstract
Meiotic recombination, which promotes proper homologous chromosome segregation at the first meiotic division, normally occurs between allelic sequences on homologues. However, recombination can also take place between non-allelic DNA segments that share high sequence identity. Such non-allelic homologous recombination (NAHR) can markedly alter genome architecture during gametogenesis by generating chromosomal rearrangements. Indeed, NAHR-mediated deletions, duplications, inversions and other alterations have been implicated in numerous human genetic disorders. Studies in yeast have provided insights into the molecular mechanisms of meiotic NAHR as well as the cellular strategies that limit it.
|
265
|
Abstract
Single nucleotide polymorphism arrays (SNP-A) have recently been widely applied as a powerful karyotyping tool in numerous translational cancer studies. SNP-A complements traditional metaphase cytogenetics with the unique ability to delineate a previously hidden chromosomal defect, copy neutral loss of heterozygosity (CN-LOH). Emerging data demonstrate that selected hematologic malignancies exhibit abundant CN-LOH, often in the setting of a normal metaphase karyotype and no previously identified clonal marker. In this review, we explore emerging biologic and clinical features of CN-LOH relevant to hematologic malignancies. In myeloid malignancies, CN-LOH has been associated with the duplication of oncogenic mutations with concomitant loss of the normal allele. Examples include JAK2, MPL, c-KIT, and FLT3. More recent investigations have focused on evaluation of candidate genes contained in common CN-LOH and deletion regions and have led to the discovery of tumor suppressor genes, including c-CBL and family members, as well as TET2. Investigations into the underlying mechanisms generating CN-LOH have great promise for elucidating general cancer mechanisms. We anticipate that further detailed characterization of CN-LOH lesions will probably facilitate our discovery of a more complete set of pathogenic molecular lesions, disease and prognosis markers, and better understanding of the initiation and progression of hematologic malignancies.
|
266
|
Jun J, Ryvkin P, Hemphill E, Nelson C. Duplication mechanism and disruptions in flanking regions determine the fate of Mammalian gene duplicates. J Comput Biol 2010; 16:1253-66. [PMID: 19772436 DOI: 10.1089/cmb.2009.0074] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022] Open
Abstract
Here we identify duplicated genes in five mammalian genomes and classify these duplicates based on the mechanisms by which they were generated. Retrotransposition accounts for at least half of all predicted duplicate genes in these genomes, with tandem and interspersed DNA-mediated duplicates comprising the other half. Estimation of the evolutionary rates in each class revealed greater rate asymmetry between retrotransposed and interspersed DNA duplicate pairs than between tandem duplicates, suggesting that retrotransposed and interspersed DNA duplicates are diverging more quickly. In an attempt to understand the basis of this asymmetry, we identified disruption of flanking DNA as an indicator of new duplicate fate: loss of local synteny accelerates the asymmetry of divergence of interspersed DNA duplicates. We also show that intact retrogenes are enriched in intergenic regions and indel-purified regions of the human genome. Moreover, intact retrogenes closest to annotated genes show the greatest levels of purifying selective pressure. Together, these findings suggest that the differential evolution of duplicate genes may be significantly influenced by changes in local genome architecture.
Affiliation(s)
- Jin Jun
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
|
267
|
Pasic I, Shlien A, Durbin AD, Stavropoulos DJ, Baskin B, Ray PN, Novokmet A, Malkin D. Recurrent focal copy-number changes and loss of heterozygosity implicate two noncoding RNAs and one tumor suppressor gene at chromosome 3q13.31 in osteosarcoma. Cancer Res 2010; 70:160-71. [PMID: 20048075 DOI: 10.1158/0008-5472.can-09-1902] [Citation(s) in RCA: 133] [Impact Index Per Article: 8.9] [Indexed: 01/12/2023]
Abstract
Osteosarcomas are copy number alteration (CNA)-rich malignant bone tumors. Using microarrays, fluorescence in situ hybridization, and quantitative PCR, we characterize a focal region of chr3q13.31 (osteo3q13.31) harboring CNAs in 80% of osteosarcomas. As such, osteo3q13.31 is the most altered region in osteosarcoma and contests the view that CNAs in osteosarcoma are nonrecurrent. Most (67%) osteo3q13.31 CNAs are deletions, with 75% of these monoallelic and frequently accompanied by loss of heterozygosity (LOH) in flanking DNA. Notably, these CNAs often involve the noncoding RNAs LOC285194 and BC040587 and, in some cases, a tumor suppressor gene that encodes the limbic system-associated membrane protein (LSAMP). Ubiquitous changes occur in these genes in osteosarcoma, usually involving loss of expression. Underscoring their functional significance, expression of these genes is correlated with the presence of osteo3q13.31 CNAs. Focal osteo3q13.31 CNAs and LOH are also common in cell lines from other cancers, identifying osteo3q13.31 as a generalized candidate region for tumor suppressor genes. Osteo3q13.31 genes may function as a unit, given significant correlation in their expression despite the great genetic distances between them. In support of this notion, depleting either LSAMP or LOC285194 promoted proliferation of normal osteoblasts by regulation of apoptotic and cell-cycle transcripts and also VEGF receptor 1. Moreover, genetic deletions of LOC285194 or BC040587 were also associated with poor survival of osteosarcoma patients. Our findings identify osteo3q13.31 as a novel region of cooperatively acting tumor suppressor genes.
Affiliation(s)
- Ivan Pasic
- Institute of Medical Science, Department of Medical Biophysics, University of Toronto, and Program in Genetics and Genome Biology, Department of Pediatric Laboratory Medicine, Division of Hematology/Oncology, The Hospital for Sick Children, Toronto, Ontario, Canada
|
268
|
Genomic segmental duplications on the basis of the t(9;22) rearrangement in chronic myeloid leukemia. Oncogene 2010; 29:2509-16. [DOI: 10.1038/onc.2009.524] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
|
269
|
Nowick K, Stubbs L. Lineage-specific transcription factors and the evolution of gene regulatory networks. Brief Funct Genomics 2010; 9:65-78. [PMID: 20081217 DOI: 10.1093/bfgp/elp056] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022] Open
Abstract
Nature is replete with examples of diverse cell types, tissues and body plans, forming very different creatures from genomes with similar gene complements. However, while the genes and the structures of proteins they encode can be highly conserved, the production of those proteins in specific cell types and at specific developmental time points might differ considerably between species. A full understanding of the factors that orchestrate gene expression will be essential to fully understand evolutionary variety. Transcription factor (TF) proteins, which form gene regulatory networks (GRNs) to act in cooperative or competitive partnerships to regulate gene expression, are key components of these unique regulatory programs. Although many TFs are conserved in structure and function, certain classes of TFs display extensive levels of species diversity. In this review, we highlight families of TFs that have expanded through gene duplication events to create species-unique repertoires in different evolutionary lineages. We discuss how the hierarchical structures of GRNs allow for flexible small to large-scale phenotypic changes. We survey evidence that explains how newly evolved TFs may be integrated into an existing GRN and how molecular changes in TFs might impact the GRNs. Finally, we review examples of traits that evolved due to lineage-specific TFs and species differences in GRNs.
Affiliation(s)
- Katja Nowick
- Department of Cell and Developmental Biology, Institute for Genomic Biology, University of Illinois, 1206 W. Gregory Drive, Urbana, IL 61802, USA
|
270
|
Evolution in health and medicine Sackler colloquium: Genomic disorders: a window into human gene and genome evolution. Proc Natl Acad Sci U S A 2010; 107 Suppl 1:1765-71. [PMID: 20080665 DOI: 10.1073/pnas.0906222107] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Indexed: 12/13/2022] Open
Abstract
Gene duplications alter the genetic constitution of organisms and can be a driving force of molecular evolution in humans and the great apes. In this context, the study of genomic disorders has uncovered the essential role played by the genomic architecture, especially low copy repeats (LCRs) or segmental duplications (SDs). In fact, regardless of the mechanism, LCRs can mediate or stimulate rearrangements, inciting genomic instability and generating dynamic and unstable regions prone to rapid molecular evolution. In humans, copy-number variation (CNV) has been implicated in common traits such as neuropathy, hypertension, color blindness, infertility, and behavioral traits including autism and schizophrenia, as well as disease susceptibility to HIV, lupus nephritis, and psoriasis among many other clinical phenotypes. The same mechanisms implicated in the origin of genomic disorders may also play a role in the emergence of segmental duplications and the evolution of new genes by means of genomic and gene duplication and triplication, exon shuffling, exon accretion, and fusion/fission events.
|
271
|
Abbasi AA. Piecemeal or big bangs: correlating the vertebrate evolution with proposed models of gene expansion events. Nat Rev Genet 2010; 11:166. [PMID: 20051988 DOI: 10.1038/nrg2600-c1] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 02/07/2023]
|
272
|
The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet 2010; 11:97-108. [PMID: 20051986 DOI: 10.1038/nrg2689] [Citation(s) in RCA: 899] [Impact Index Per Article: 59.9] [Indexed: 11/08/2022]
Abstract
Gene duplications and their subsequent divergence play an important part in the evolution of novel gene functions. Several models for the emergence, maintenance and evolution of gene copies have been proposed. However, a clear consensus on how gene duplications are fixed and maintained in genomes is lacking. Here, we present a comprehensive classification of the models that are relevant to all stages of the evolution of gene duplications. Each model predicts a unique combination of evolutionary dynamics and functional properties. Setting out these predictions is an important step towards identifying the main mechanisms that are involved in the evolution of gene duplications.
|
273
|
Kahn CL, Mozes S, Raphael BJ. Efficient algorithms for analyzing segmental duplications with deletions and inversions in genomes. Algorithms Mol Biol 2010; 5:11. [PMID: 20047668 PMCID: PMC2820476 DOI: 10.1186/1748-7188-5-11] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/11/2009] [Accepted: 01/04/2010] [Indexed: 02/06/2023] Open
Abstract
Background Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics comprised of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is a model of repeated aggregation and subsequent duplication of genomic sequences. Results We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide a description of a sequence of duplication events as a context-free grammar (CFG). Conclusion These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.
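The duplication distance defined in this abstract can be illustrated with a small interval dynamic program. The sketch below is not the authors' algorithm: it assumes that each operation copies one contiguous substring of the source and pastes it at any position of the growing target (so later pastes may split earlier ones), and it ignores the deletion and inversion extensions. The function name `duplication_distance` is hypothetical.

```python
from functools import lru_cache


def duplication_distance(source: str, target: str) -> float:
    """Minimum number of copy-and-paste operations needed to build `target`
    from substrings of `source` (illustrative sketch, see assumptions above).
    Returns float('inf') if the target cannot be built."""
    inf = float("inf")

    @lru_cache(maxsize=None)
    def build(i: int, j: int) -> float:
        # Cost of generating target[i:j] from scratch.
        if i >= j:
            return 0
        best = inf
        for k, ch in enumerate(source):
            if ch == target[i]:
                best = min(best, extend(i, j, k))
        return best

    @lru_cache(maxsize=None)
    def extend(i: int, j: int, k: int) -> float:
        # Cost of generating target[i:j] when target[i] is the copy of
        # source[k] inside one pasted substring (that paste is counted once).
        best = 1 + build(i + 1, j)  # the pasted substring ends at target[i]
        if k + 1 < len(source):
            nxt = source[k + 1]
            for m in range(i + 1, j):
                if target[m] == nxt:
                    # target[i+1:m] was pasted later, inside the current copy.
                    best = min(best, build(i + 1, m) + extend(m, j, k + 1))
        return best

    return build(0, len(target))


print(duplication_distance("ab", "aabb"))  # 2: paste "ab", then paste "ab" inside it
```

The memoized recursion runs in polynomial time in the lengths of the two strings, but it is intended for short examples only.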
|
274
|
Abstract
High-throughput genotyping technologies have become popular in studies that aim to reveal the genetics behind polygenic traits such as complex disease and the diverse response to some drug treatments. These technologies utilize bioinformatics tools to define strategies, analyze data, and estimate the final associations between certain genetic markers and traits. The strategy followed for an association study depends on its efficiency and cost. The efficiency is based on the assumed characteristics of the polymorphisms' allele frequencies and linkage disequilibrium for putative causal alleles. Statistically significant markers (single mutations or haplotypes) that cause a human disorder should be validated and their biological function elucidated. The aim of this chapter is to present a subset of bioinformatics tools for haplotype inference, tag SNP selection, and genome-wide association studies using a high-throughput generated SNP data set.
Affiliation(s)
- Ana M Aransay
- Functional Genomics Unit, Parque Technológico de Bizkaia, Derio, Spain
|
275
|
Abstract
The rat is an important system for modeling human disease. Four years ago, the rich 150-year history of rat research was transformed by the sequencing and annotation of the rat genome, ushering in an era of exceptional opportunity for identifying genes and pathways underlying disease phenotypes. With the genome sequence in place, there is the prospect of not only linking the extensive literature of mechanistic and pharmacological studies in the rat to its genome but also, by using comparative genomics, to other organisms. Genome-wide association studies (GWAS) in human populations have recently provided a direct approach for finding robust genetic associations in common diseases, but identifying the precise genes and their mechanisms of action remains challenging. The explosion of genomic tools and sequence over the last decade has created a wealth of data. Along with the data has arisen a need to manage it and to make it usable to scientists with a wide range of research interests. This chapter is designed to overview the existing sequence and its utility, as well as provide a glimpse of some of the databases and bioinformatic tools available to the investigator.
Affiliation(s)
- Elizabeth A Worthey
- Human & Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI, USA
|
276
|
Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nat Biotechnol 2009; 28:47-55. [PMID: 20037582 DOI: 10.1038/nbt.1600] [Citation(s) in RCA: 135] [Impact Index Per Article: 8.4] [Received: 07/31/2009] [Accepted: 12/08/2009] [Indexed: 11/09/2022]
Abstract
Structural variants (SVs) are a major source of human genomic variation; however, characterizing them at nucleotide resolution remains challenging. Here we assemble a library of breakpoints at nucleotide resolution from collating and standardizing ~2,000 published SVs. For each breakpoint, we infer its ancestral state (through comparison to primate genomes) and its mechanism of formation (e.g., nonallelic homologous recombination, NAHR). We characterize breakpoint sequences with respect to genomic landmarks, chromosomal location, sequence motifs and physical properties, finding that the occurrence of insertions and deletions is more balanced than previously reported and that NAHR-formed breakpoints are associated with relatively rigid, stable DNA helices. Finally, we demonstrate an approach, BreakSeq, for scanning the reads from short-read sequenced genomes against our breakpoint library to accurately identify previously overlooked SVs, which we then validate by PCR. As new data become available, we expect our BreakSeq approach will become more sensitive and facilitate rapid SV genotyping of personal genomes.
|
277
|
Tandem repeats modify the structure of human genes hosted in segmental duplications. Genome Biol 2009; 10:R137. [PMID: 19954527 PMCID: PMC2812944 DOI: 10.1186/gb-2009-10-12-r137] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 07/23/2009] [Revised: 10/08/2009] [Accepted: 12/02/2009] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Recently duplicated genes are often subject to genomic rearrangements that can lead to the development of novel gene structures. Here we specifically investigated the effect of variations in internal tandem repeats (ITRs) on the gene structure of human paralogs located in segmental duplications. RESULTS We found that around 7% of the primate-specific genes located within duplicated regions of the genome contain variable tandem repeats. These genes are members of large groups of recently duplicated paralogs that are often polymorphic in the human population. Half of the identified ITRs occur within coding exons and may be either kept or spliced out from the mature transcript. When ITRs reside within exons, they encode variable amino acid repeats. When located at exon-intron boundaries, ITRs can generate alternative splicing patterns through the formation of novel introns. CONCLUSIONS Our study shows that variation in the number of ITRs impacts on recently duplicated genes by modifying their coding sequence, splicing pattern, and tissue expression. The resulting effect is the production of a variety of primate-specific proteins, which mostly differ in number and sequence of amino acid repeats.
|
278
|
Liu GE, Ventura M, Cellamare A, Chen L, Cheng Z, Zhu B, Li C, Song J, Eichler EE. Analysis of recent segmental duplications in the bovine genome. BMC Genomics 2009; 10:571. [PMID: 19951423 PMCID: PMC2796684 DOI: 10.1186/1471-2164-10-571] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.5] [Received: 07/22/2009] [Accepted: 12/01/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Duplicated sequences are an important source of gene innovation and structural variation within mammalian genomes. We performed the first systematic and genome-wide analysis of segmental duplications in the modern domesticated cattle (Bos taurus). Using two distinct computational analyses, we estimated that 3.1% (94.4 Mb) of the bovine genome consists of recently duplicated sequences (≥ 1 kb in length, ≥ 90% sequence identity). Similar to other mammalian draft assemblies, almost half (47% of 94.4 Mb) of these sequences have not been assigned to cattle chromosomes. RESULTS In this study, we provide the first experimental validation of large duplications and briefly compare their distribution on two independent bovine genome assemblies using fluorescent in situ hybridization (FISH). Our analyses suggest that the majority (75-90%) of segmental duplications are organized into local tandem duplication clusters. Along with rodents and carnivores, these results now confidently establish tandem duplications as the most likely mammalian archetypical organization, in contrast to humans and great ape species which show a preponderance of interspersed duplications. A cross-species survey of duplicated genes and gene families indicated that duplication, positive selection and gene conversion have shaped primates, rodents, carnivores and ruminants to different degrees for their speciation and adaptation. We identified that bovine segmental duplications corresponding to genes are significantly enriched for specific biological functions such as immunity, digestion, lactation and reproduction. CONCLUSION Our results suggest that in most mammalian lineages segmental duplications are organized in a tandem configuration. Segmental duplications remain problematic for genome assembly and we highlight genic regions that require higher quality sequence characterization.
This study provides insights into mammalian genome evolution and generates a valuable resource for cattle genomics research.
Affiliation(s)
- George E Liu
- USDA, ARS, ANRI, Bovine Functional Genomics Laboratory, Beltsville, Maryland 20705, USA.
|
279
|
Delprat A, Negre B, Puig M, Ruiz A. The transposon Galileo generates natural chromosomal inversions in Drosophila by ectopic recombination. PLoS One 2009; 4:e7883. [PMID: 19936241 PMCID: PMC2775673 DOI: 10.1371/journal.pone.0007883] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.7] [Received: 06/18/2009] [Accepted: 10/01/2009] [Indexed: 11/25/2022] Open
Abstract
Background Transposable elements (TEs) are responsible for the generation of chromosomal inversions in several groups of organisms. However, in Drosophila and other Dipterans, where inversions are abundant both as intraspecific polymorphisms and interspecific fixed differences, the evidence for a role of TEs is scarce. Previous work revealed that the transposon Galileo was involved in the generation of two polymorphic inversions of Drosophila buzzatii. Methodology/Principal Findings To assess the impact of TEs in Drosophila chromosomal evolution and shed light on the mechanism involved, we isolated and sequenced the two breakpoints of another widespread polymorphic inversion from D. buzzatii, 2z3. In the non-inverted chromosome, the 2z3 distal breakpoint is located between genes CG2046 and CG10326 whereas the proximal breakpoint lies between two novel genes that we have named Dlh and Mdp. In the inverted chromosome, the analysis of the breakpoint sequences revealed relatively large insertions (2,870-bp and 4,786-bp long) including two copies of the transposon Galileo (subfamily Newton), one at each breakpoint, plus several other TEs. The two Galileo copies: (i) are inserted in opposite orientation; (ii) present exchanged target site duplications; and (iii) are both chimeric. Conclusions/Significance Our observations provide the best evidence gathered so far for the role of TEs in the generation of Drosophila inversions. In addition, they show unequivocally that ectopic recombination is the causative mechanism. The fact that the three polymorphic D. buzzatii inversions investigated so far were generated by the same transposon family is remarkable and is conceivably due to Galileo's unusual structure and current (or recent) transpositional activity.
Affiliation(s)
- Alejandra Delprat
- Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Bellaterra (Barcelona), Spain
|
280
|
Transposable elements in gene regulation and in the evolution of vertebrate genomes. Curr Opin Genet Dev 2009; 19:607-12. [PMID: 19914058 DOI: 10.1016/j.gde.2009.10.013] [Citation(s) in RCA: 114] [Impact Index Per Article: 7.1] [Received: 09/16/2009] [Revised: 10/20/2009] [Accepted: 10/26/2009] [Indexed: 01/30/2023]
Abstract
Repetitive DNA and in particular transposable elements have been intimately linked to eukaryotic genomes for millions of years. Once overlooked for being only a collection of selfish debris and a nuisance for sequence assembly, genomic repeats are now being recognized as a key driving force in genome evolution. Indeed, by changing the DNA landscape of genomes, transposable elements have been a rich source of innovation in genes, regulatory elements and genome structures. In this review, I will focus on recent advances that demonstrate that genomic repeats have had a global impact on vertebrate gene regulatory networks. I will also summarize results that show how transposable elements have been a major catalyst of structural rearrangements throughout evolution.
|
281
|
Liu YJ, Zheng D, Balasubramanian S, Carriero N, Khurana E, Robilotto R, Gerstein MB. Comprehensive analysis of the pseudogenes of glycolytic enzymes in vertebrates: the anomalously high number of GAPDH pseudogenes highlights a recent burst of retrotranspositional activity. BMC Genomics 2009; 10:480. [PMID: 19835609 PMCID: PMC2770531 DOI: 10.1186/1471-2164-10-480] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Received: 03/25/2009] [Accepted: 10/16/2009] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Pseudogenes provide a record of the molecular evolution of genes. As glycolysis is such a highly conserved and fundamental metabolic pathway, the pseudogenes of glycolytic enzymes comprise a standardized genomic measuring stick and an ideal platform for studying molecular evolution. One of the glycolytic enzymes, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), has already been noted to have one of the largest numbers of associated pseudogenes, among all proteins. RESULTS We assembled the first comprehensive catalog of the processed and duplicated pseudogenes of glycolytic enzymes in many vertebrate model-organism genomes, including human, chimpanzee, mouse, rat, chicken, zebrafish, pufferfish, fruitfly, and worm (available at http://pseudogene.org/glycolysis/). We found that glycolytic pseudogenes are predominantly processed, i.e. retrotransposed from the mRNA of their parent genes. Although each glycolytic enzyme plays a unique role, GAPDH has by far the most pseudogenes, perhaps reflecting its large number of non-glycolytic functions or its possession of a particularly retrotranspositionally active sub-sequence. Furthermore, the number of GAPDH pseudogenes varies significantly among the genomes we studied: none in zebrafish, pufferfish, fruitfly, and worm, 1 in chicken, 50 in chimpanzee, 62 in human, 331 in mouse, and 364 in rat. Next, we developed a simple method of identifying conserved syntenic blocks (consistently applicable to the wide range of organisms in the study) by using orthologous genes as anchors delimiting a conserved block between a pair of genomes. This approach showed that few glycolytic pseudogenes are shared between primate and rodent lineages. Finally, by estimating pseudogene ages using Kimura's two-parameter model of nucleotide substitution, we found evidence for bursts of retrotranspositional activity approximately 42, 36, and 26 million years ago in the human, mouse, and rat lineages, respectively. 
CONCLUSION Overall, we performed a consistent analysis of one group of pseudogenes across multiple genomes, finding evidence that most of them were created within the last 50 million years, subsequent to the divergence of rodent and primate lineages.
Affiliation(s)
- Yuen-Jong Liu
- Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, 110 Francis Street, Boston, MA, USA
- Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Deyou Zheng
- Albert Einstein College of Medicine of Yeshiva University, Department of Neurology, Rose F. Kennedy Center, 1410 Pelham Parkway South, Room 915B, Bronx, NY 10461, USA
- Suganthi Balasubramanian
- Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Nicholas Carriero
- Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Ekta Khurana
- Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Rebecca Robilotto
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Mark B Gerstein
- Department of Molecular Biophysics and Biochemistry, P.O. Box 208114, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
|
282
|
Marques-Bonet T, Ryder OA, Eichler EE. Sequencing primate genomes: what have we learned? Annu Rev Genomics Hum Genet 2009; 10:355-86. [PMID: 19630567 DOI: 10.1146/annurev.genom.9.081307.164420] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 12/13/2022]
Abstract
We summarize the progress in whole-genome sequencing and analyses of primate genomes. These emerging genome datasets have broadened our understanding of primate genome evolution revealing unexpected and complex patterns of evolutionary change. This includes the characterization of genome structural variation, episodic changes in the repeat landscape, differences in gene expression, new models regarding speciation, and the ephemeral nature of the recombination landscape. The functional characterization of genomic differences important in primate speciation and adaptation remains a significant challenge. Limited access to biological materials, the lack of detailed phenotypic data and the endangered status of many critical primate species have significantly attenuated research into the genetic basis of primate evolution. Next-generation sequencing technologies promise to greatly expand the number of available primate genome sequences; however, such draft genome sequences will likely miss critical genetic differences within complex genomic regions unless dedicated efforts are put forward to understand the full spectrum of genetic variation.
Affiliation(s)
- Tomas Marques-Bonet
- Department of Genome Sciences, University of Washington and the Howard Hughes Medical Institute, Seattle, Washington 98105, USA.
|
283
|
Trombetta B, Cruciani F, Underhill PA, Sellitto D, Scozzari R. Footprints of X-to-Y gene conversion in recent human evolution. Mol Biol Evol 2009; 27:714-25. [PMID: 19812029 DOI: 10.1093/molbev/msp231] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 12/16/2022] Open
Abstract
Different X-homologous regions of the male-specific portion of the human Y chromosome (MSY) are characterized by a different content of putative single nucleotide polymorphisms (SNPs), as reported in public databases. The possible role of X-to-Y nonallelic gene conversion in contributing to these differences remains poorly understood. We explored this issue by analyzing sequence variation in three regions of the MSY characterized by a different degree of X-Y similarity and a different density of putative SNPs: the PCDH11Y gene in the X-transposed region (X-Y identity 99%, high putative SNP content); the TBL1Y gene in the X-degenerate region (X-Y identity 86-88%, low putative SNP content); and the VCY gene-containing region in the P8 palindrome (X-Y identity 95%, low putative SNP content). Present findings do not provide any evidence for gene conversion in the PCDH11Y and TBL1Y genes; they also strongly suggest that most putative SNPs of the PCDH11Y gene (and possibly the entire X-transposed region) are X-Y paralogous sequence variants, which have been entered in the databases as SNPs. On the other hand, clear evidence for the VCY genes in the P8 palindrome having acted as an acceptor of X-to-Y gene conversion was obtained. A rate of 1.8 × 10^-7 X-to-Y conversions/bp/year was estimated for these genes. These findings indicate that in the VCY region of the MSY, X-to-Y gene conversion can be highly effective in increasing the level of diversity among human Y chromosomes and suggest an additional explanation for the ability of the Y chromosome to retard degradation during evolution. Present data are expected to pave the way for future investigations on the role of nonallelic gene conversion in double-strand break repair and the maintenance of Y chromosome integrity.
Affiliation(s)
- Beniamino Trombetta
- Dipartimento di Genetica e Biologia Molecolare, Sapienza Università di Roma, Rome, Italy
|
284
|
Marques-Bonet T, Girirajan S, Eichler EE. The origins and impact of primate segmental duplications. Trends Genet 2009; 25:443-54. [PMID: 19796838 PMCID: PMC2847396 DOI: 10.1016/j.tig.2009.08.002] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.3] [Received: 07/02/2009] [Revised: 08/07/2009] [Accepted: 08/10/2009] [Indexed: 12/25/2022]
Abstract
Duplicated sequences are substrates for the emergence of new genes and are an important source of genetic instability associated with rare and common diseases. Analyses of primate genomes have shown an increase in the proportion of interspersed segmental duplications (SDs) within the genomes of humans and great apes. This contrasts with other mammalian genomes that seem to have their recently duplicated sequences organized in a tandem configuration. In this review, we focus on the mechanistic origin and impact of this difference with respect to evolution, genetic diversity and primate phenotype. Although many genomes will be sequenced in the future, resolution of this aspect of genomic architecture still requires high quality sequences and detailed analyses.
Affiliation(s)
- Tomas Marques-Bonet
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
|
285
|
Norman PJ, Abi-Rached L, Gendzekhadze K, Hammond JA, Moesta AK, Sharma D, Graef T, McQueen KL, Guethlein LA, Carrington CVF, Chandanayingyong D, Chang YH, Crespí C, Saruhan-Direskeneli G, Hameed K, Kamkamidze G, Koram KA, Layrisse Z, Matamoros N, Milà J, Park MH, Pitchappan RM, Ramdath DD, Shiau MY, Stephens HAF, Struik S, Tyan D, Verity DH, Vaughan RW, Davis RW, Fraser PA, Riley EM, Ronaghi M, Parham P. Meiotic recombination generates rich diversity in NK cell receptor genes, alleles, and haplotypes. Genome Res 2009; 19:757-69. [PMID: 19411600 DOI: 10.1101/gr.085738.108] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.9] [Indexed: 12/18/2022]
Abstract
Natural killer (NK) cells contribute to the essential functions of innate immunity and reproduction. Various genes encode NK cell receptors that recognize the major histocompatibility complex (MHC) Class I molecules expressed by other cells. For primate NK cells, the killer-cell immunoglobulin-like receptors (KIR) are a variable and rapidly evolving family of MHC Class I receptors. Studied here is KIR3DL1/S1, which encodes receptors for highly polymorphic human HLA-A and -B and comprises three ancient allelic lineages that have been preserved by balancing selection throughout human evolution. While the 3DS1 lineage of activating receptors has been conserved, the two 3DL1 lineages of inhibitory receptors were diversified through inter-lineage recombination with each other and with 3DS1. Prominent targets for recombination were D0-domain polymorphisms, which modulate enhancer function, and dimorphism at position 283 in the D2 domain, which influences inhibitory function. In African populations, unequal crossing over between the 3DL1 and 3DL2 genes produced a deleted KIR haplotype in which the telomeric "half" was reduced to a single fusion gene with functional properties distinct from its 3DL1 and 3DL2 parents. Conversely, in Eurasian populations, duplication of the KIR3DL1/S1 locus by unequal crossing over has enabled individuals to carry and express alleles of all three KIR3DL1/S1 lineages. These results demonstrate how meiotic recombination combines with an ancient, preserved diversity to create new KIR phenotypes upon which natural selection acts. A consequence of such recombination is to blur the distinction between alleles and loci in the rapidly evolving human KIR gene family.
Affiliation(s)
- Paul J Norman
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
|
286
|
Stahl PD, Wainszelbaum MJ. Human-specific genes may offer a unique window into human cell signaling. Sci Signal 2009; 2:pe59. [PMID: 19797272 DOI: 10.1126/scisignal.289pe59] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 01/04/2023]
Abstract
The identification and characterization of human-specific genes and the cellular processes that the encoded proteins control have the potential to help us understand at the molecular level what makes humans different from other species. The sequencing of the human genome and the genomes of closely related primates has revealed the presence of a small number of human- or human-lineage-specific genes that have no orthologs in lower species. Human-specific and human-lineage-specific genes are likely to function as regulators of cell signaling events, and by fine-tuning pathways, the encoded proteins may contribute to human-specific characteristics and behaviors. In addition, human-specific genes may represent biomarkers for examining human-specific characteristics of various diseases. Investigation of the gene encoding TBC1D3 is one example of a search that may lead to understanding the evolution and the function of human-specific genes, because it is absent in lower species and present in high copy number in the human genome.
Affiliation(s)
- Philip D Stahl
- Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, MO 63110, USA.
|
287
|
Abstract
The analysis of genome rearrangements provides a global view on the evolution of a set of related species. We present a new algorithm called EMRAE (efficient method to recover ancestral events) to reliably predict a wide range of rearrangement events in the ancestry of a group of species. Using simulated data sets, we show that EMRAE achieves comparable sensitivity but significantly higher specificity when predicting evolutionary events relative to other tools to study genome rearrangements. We apply our approach to the synteny blocks of six mammalian genomes (human, chimpanzee, rhesus macaque, mouse, rat, and dog) and predict 1109 rearrangement events, including 831 inversions, 15 translocations, 237 transpositions, and 26 fusions/fissions. Studying the sequence features at the breakpoints of the primate rearrangement events, we demonstrate that they are not only enriched in segmental duplications (SDs), but that the enrichment of matching pairs of SDs is even stronger within the pairs of breakpoints associated with recovered events. We also show that pairs of L1 repeats are frequently associated with ancestral inversions across all studied lineages. Together, this substantiates the model that regions of high sequence identity have been associated with rearrangement events throughout the mammalian phylogeny.
Affiliation(s)
- Hao Zhao
- Computational and Mathematical Biology, Genome Institute of Singapore, Singapore
|
288
|
Abstract
Orthology analysis aims at identifying orthologous genes and gene products from different organisms and, therefore, is a powerful tool in modern computational and experimental biology. Although reconciliation-based orthology methods are generally considered more accurate than distance-based ones, the traditional parsimony-based implementation of reconciliation-based orthology analysis (most parsimonious reconciliation [MPR]) suffers from a number of shortcomings. For example, 1) it is limited to orthology predictions from the reconciliation that minimizes the number of gene duplication and loss events, 2) it cannot evaluate the support of this reconciliation in relation to the other reconciliations, and 3) it cannot make use of prior knowledge (e.g., about species divergence times) that provides auxiliary information for orthology predictions. We present a probabilistic approach to reconciliation-based orthology analysis that addresses all these issues by estimating orthology probabilities. The method is based on the gene evolution model, an explicit evolutionary model for gene duplication and gene loss inside a species tree, that generalizes the standard birth-death process. We describe the probabilistic approach to orthology analysis using 2 experimental data sets and show that the use of orthology probabilities allows a more informative analysis than MPR and, in particular, that it is less sensitive to taxon sampling problems. We generalize these anecdotal observations and show, using data generated under biologically realistic conditions, that MPR gives false orthology predictions at a substantial frequency. Last, we provide a new orthology prediction method that allows an orthology and paralogy classification with any chosen sensitivity/specificity combination from the spectra of achievable combinations. We conclude that probabilistic orthology analysis is a strong and more advanced alternative to traditional orthology analysis and that it provides a framework for sophisticated comparative studies of processes in genome evolution.
Affiliation(s)
- Bengt Sennblad
- Stockholm Bioinformatics Center, Department of Biochemistry, Stockholm University, AlbaNova, 106 91 Stockholm, Sweden.
|
289
|
Tian X, Pascal G, Monget P. Evolution and functional divergence of NLRP genes in mammalian reproductive systems. BMC Evol Biol 2009; 9:202. [PMID: 19682372 PMCID: PMC2735741 DOI: 10.1186/1471-2148-9-202] [Citation(s) in RCA: 136] [Impact Index Per Article: 8.5] [Received: 04/08/2009] [Accepted: 08/14/2009] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND NLRPs (Nucleotide-binding oligomerization domain, Leucine rich Repeat and Pyrin domain containing Proteins) are members of the NLR (Nod-like receptor) protein family. Recent research has shown that NLRP genes play important roles in both the mammalian innate immune system and the reproductive system. Several NLRP genes were shown to be specifically expressed in the oocyte in mammals. The aim of the present work was to study how these genes evolved and diverged after their duplication, as well as whether natural selection played a role during their evolution. RESULTS By using in silico methods, we have evaluated the evolution and functional divergence of NLRP genes, in particular of mouse reproduction-related Nlrp genes. We found that (1) major NLRP genes were duplicated before the divergence of mammals, with certain lineage-specific duplications in primates (NLRP7 and 11) and in rodents (Nlrp1, 4 and 9 duplicates); (2) tandem duplication events gave rise to a mammalian reproduction-related NLRP cluster including NLRP2, 4, 5, 7, 8, 9, 11, 13 and 14 genes; (3) the function of mammalian oocyte-specific NLRP genes (NLRP4, 5, 9 and 14) might have diverged during gene evolution; (4) recent segmental duplications involving Nlrp4 copies and vomeronasal 1 receptor encoding genes (V1r) have occurred in the mouse; and (5) duplicates of Nlrp4 and 9 in the mouse might have been subjected to adaptive evolution. CONCLUSION This study provides novel information on the evolution of mammalian reproduction-related NLRPs. On the one hand, NLRP genes duplicated and functionally diversified in mammalian reproductive systems (such as NLRP4, 5, 9 and 14). On the other hand, during evolution, different lineages adapted to develop their own NLRP genes, particularly in reproductive function (such as the specific expansion of Nlrp4 and Nlrp9 in the mouse).
Affiliation(s)
- Xin Tian
- Physiologie de la Reproduction et des Comportements, UMR 6175 INRA-CNRS-Université François Rabelais de Tours-Haras Nationaux, 37380 Nouzilly, France.
|
290
|
Skinner BM, Robertson LBW, Tempest HG, Langley EJ, Ioannou D, Fowler KE, Crooijmans RPMA, Hall AD, Griffin DK, Völker M. Comparative genomics in chicken and Pekin duck using FISH mapping and microarray analysis. BMC Genomics 2009; 10:357. [PMID: 19656363 PMCID: PMC2907691 DOI: 10.1186/1471-2164-10-357] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.4] [Received: 03/18/2009] [Accepted: 08/05/2009] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND The availability of the complete chicken (Gallus gallus) genome sequence as well as a large number of chicken probes for fluorescent in-situ hybridization (FISH) and microarray resources facilitates comparative genomic studies between chicken and other bird species. In a previous study, we provided a comprehensive cytogenetic map for the turkey (Meleagris gallopavo) and the first analysis of copy number variants (CNVs) in birds. Here, we extend this approach to the Pekin duck (Anas platyrhynchos), an obvious target for comparative genomic studies due to its agricultural importance and resistance to avian flu. RESULTS We provide a detailed molecular cytogenetic map of the duck genome through FISH assignment of 155 chicken clones. We identified one inter- and six intrachromosomal rearrangements between chicken and duck macrochromosomes and demonstrated conserved synteny among all microchromosomes analysed. Array comparative genomic hybridisation revealed 32 CNVs, of which 5 overlap previously designated "hotspot" regions between chicken and turkey. CONCLUSION Our results suggest extensive conservation of avian genomes across 90 million years of evolution in both macro- and microchromosomes. The data on CNVs between chicken and duck extend previous analyses in chicken and turkey and support the hypotheses that avian genomes contain fewer CNVs than mammalian genomes and that genomes of evolutionarily distant species share regions of copy number variation ("CNV hotspots"). Our results will expedite duck genomics, assist marker development and highlight areas of interest for future evolutionary and functional studies.
Affiliation(s)
- Lindsay BW Robertson
- Department of Biosciences, University of Kent, Canterbury, CT2 7NJ, UK
- Institute of Cancer Research, Belmont, Surrey, SM2 5NG, UK
- Helen G Tempest
- Department of Biosciences, University of Kent, Canterbury, CT2 7NJ, UK
- Bridge Genoma, 1 St Thomas Street, London Bridge, London, SE1 9RY, UK
- Dimitris Ioannou
- Department of Biosciences, University of Kent, Canterbury, CT2 7NJ, UK
- Katie E Fowler
- Department of Biosciences, University of Kent, Canterbury, CT2 7NJ, UK
- Richard PMA Crooijmans
- Animal Breeding and Genomics Centre, Wageningen University, Marijkeweg 40, 6709 PG Wageningen, The Netherlands
- Anthony D Hall
- Cherry Valley Ltd, Rothwell, Market Rasen, Lincolnshire, LN7 6BJ, UK
- Darren K Griffin
- Department of Biosciences, University of Kent, Canterbury, CT2 7NJ, UK
- Martin Völker
- Department of Biosciences, University of Kent, Canterbury, CT2 7NJ, UK
|
291
|
Hastings PJ, Lupski JR, Rosenberg SM, Ira G. Mechanisms of change in gene copy number. Nat Rev Genet 2009; 10:551-64. [PMID: 19597530 DOI: 10.1038/nrg2593] [Citation(s) in RCA: 885] [Impact Index Per Article: 55.3] [Indexed: 01/13/2023]
Abstract
Deletions and duplications of chromosomal segments (copy number variants, CNVs) are a major source of variation between individual humans and are an underlying factor in human evolution and in many diseases, including mental illness, developmental disorders and cancer. CNVs form at a faster rate than other types of mutation, and seem to do so by similar mechanisms in bacteria, yeast and humans. Here we review current models of the mechanisms that cause copy number variation. Non-homologous end-joining mechanisms are well known, but recent models focus on perturbation of DNA replication and replication of non-contiguous DNA segments. For example, cellular stress might induce repair of broken replication forks to switch from high-fidelity homologous recombination to non-homologous repair, thus promoting copy number change.
Affiliation(s)
- P J Hastings
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.
|
292
|
Vergara IA, Mah AK, Huang JC, Tarailo-Graovac M, Johnsen RC, Baillie DL, Chen N. Polymorphic segmental duplication in the nematode Caenorhabditis elegans. BMC Genomics 2009; 10:329. [PMID: 19622155 PMCID: PMC2728738 DOI: 10.1186/1471-2164-10-329] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 02/27/2009] [Accepted: 07/21/2009] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND The nematode Caenorhabditis elegans was the first multicellular organism to have its genome fully sequenced. Over the last 10 years since the original publication in 1998, the C. elegans genome has been scrutinized and the last gaps were filled in November 2002, which presents a unique opportunity for examining genome-wide segmental duplications. RESULTS Here, we analyzed the C. elegans genome in search of segmental duplications using a new tool, OrthoCluster, that we have recently developed. We detected 3,484 duplicated segments (duplicons) ranging in size from 234 bp to 108 kb. The largest pair of duplicons, 108 kb in length and located on the left arm of Chromosome V, was further characterized. They are nearly identical at the DNA level (99.7% identity) and each duplicon contains 26 putative protein coding genes. Genotyping of 76 wild-type strains obtained from different labs in the C. elegans community revealed that not all strains contain this duplication. In fact, only 29 strains carry this large segmental duplication, suggesting a very recent duplication event in the C. elegans genome. CONCLUSION This report represents the first demonstration that the C. elegans laboratory wild-type N2 strains have acquired large-scale differences.
Affiliation(s)
- Ismael A Vergara
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
- Allan K Mah
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
- Jim C Huang
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
- Maja Tarailo-Graovac
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
- Robert C Johnsen
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
- David L Baillie
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
- Nansheng Chen
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
|
293
|
Kemppainen E, Fernández-Ayala DJM, Galbraith LCA, O'Dell KMC, Jacobs HT. Phenotypic suppression of the Drosophila mitochondrial disease-like mutant tko(25t) by duplication of the mutant gene in its natural chromosomal context. Mitochondrion 2009; 9:353-63. [PMID: 19616644 DOI: 10.1016/j.mito.2009.07.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 03/20/2009] [Revised: 06/24/2009] [Accepted: 07/13/2009] [Indexed: 10/20/2022]
Abstract
A mutation in the Drosophila gene technical knockout (tko(25t)), encoding mitoribosomal protein S12, phenocopies human mitochondrial disease. We isolated three spontaneous X-dominant suppressors of tko(25t) (designated Weeble), exhibiting an almost wild-type phenotype and containing overlapping segmental duplications including the mutant allele, plus a second mitoribosomal protein gene, mRpL14. Ectopic, expressed copies of tko(25t) and mRpL14 conferred no phenotypic suppression. When placed over a null allele of tko, Weeble retained the mutant phenotype, even in the presence of additional transgenic copies of tko(25t). Increased mutant gene dosage can thus compensate for the mutant phenotype, but only when located in its normal chromosomal context.
Affiliation(s)
- Esko Kemppainen
- Institute of Medical Technology and Tampere University Hospital, FI-33014 University of Tampere, Finland
|
294
|
|
295
|
Costantini M, Bernardi G. Mapping insertions, deletions and SNPs on Venter's chromosomes. PLoS One 2009; 4:e5972. [PMID: 19543403 PMCID: PMC2696090 DOI: 10.1371/journal.pone.0005972] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 03/31/2009] [Accepted: 05/19/2009] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND The very recent availability of fully sequenced individual human genomes is a major revolution in biology which is certainly going to provide new insights into genetic diseases and genomic rearrangements. RESULTS We mapped the insertions, deletions and SNPs (single nucleotide polymorphisms) that are present in Craig Venter's genome, more precisely on chromosomes 17 to 22, and compared them with the human reference genome hg17. Our results show that insertions and deletions are almost absent in L1 and generally scarce in L2 isochore families (GC-poor L1+L2 isochores represent slightly over half of the human genome), whereas they increase in GC-rich isochores, largely paralleling the densities of genes, retroviral integrations and Alu sequences. The distributions of insertions/deletions are in striking contrast with those of SNPs which exhibit almost the same density across all isochore families with, however, a trend for lower concentrations in gene-rich regions. CONCLUSIONS Our study strongly suggests that the distribution of insertions/deletions is due to the structure of chromatin which is mostly open in gene-rich, GC-rich isochores, and largely closed in gene-poor, GC-poor isochores. The different distributions of insertions/deletions and SNPs are clearly related to the two different responsible mechanisms, namely recombination and point mutations.
Affiliation(s)
- Maria Costantini
- Stazione Zoologica Anton Dohrn, Naples, Italy
- Giorgio Bernardi
- Stazione Zoologica Anton Dohrn, Naples, Italy
|
296
|
Gökçümen O, Lee C. Copy number variants (CNVs) in primate species using array-based comparative genomic hybridization. Methods 2009; 49:18-25. [PMID: 19545629 DOI: 10.1016/j.ymeth.2009.06.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 03/11/2009] [Revised: 06/13/2009] [Accepted: 06/18/2009] [Indexed: 01/10/2023] Open
Abstract
A substantial amount of genomic variation is now known to exist in humans and other primate species. Single nucleotide polymorphisms (SNPs) are thought to represent the vast majority of genomic differences among individuals in a given primate species and comprise about 0.1% of the genomes of two humans. However, recent studies have now shown that structural variation may account for as much as 0.7% of the genomic differences in humans, of which copy number variants (CNVs) are the largest component. CNVs are segments of DNA that can range in size from hundreds of bases to millions of base pairs in length and differ in copy number between individuals. Recent technological advancements in array technologies led to genome-wide identification of CNVs and consequently revealed thousands of variable loci in humans, comprising as much as 12% of the human genome [A.J. Iafrate, L. Feuk, M.N. Rivera, M.L. Listewnik, P.K. Donahoe, Y. Qi, S.W. Scherer, C. Lee, Nat. Genet. 36 (2004) 949-951, [3]]. CNVs in humans have already been associated with susceptibility to certain complex diseases, dietary adaptation, and several neurological conditions. In addition, recent studies have shown that CNVs can be successfully implemented in population genetics research, providing important insights into human genetic variation. Nevertheless, the important role of CNVs in primate evolution and genetic diversity is still largely unknown. This article aims to outline the strengths and weaknesses of current comparative genomic hybridization array technologies that have been employed to detect CNV variation and the applications of these techniques to primate genetic research.
Affiliation(s)
- Omer Gökçümen
- Cytogenetics Research Laboratory, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Avenue, EBRC 404, Boston, MA 02115, USA.
|
297
|
She X, Rohl CA, Castle JC, Kulkarni AV, Johnson JM, Chen R. Definition, conservation and epigenetics of housekeeping and tissue-enriched genes. BMC Genomics 2009; 10:269. [PMID: 19534766 PMCID: PMC2706266 DOI: 10.1186/1471-2164-10-269] [Citation(s) in RCA: 106] [Impact Index Per Article: 6.6] [Received: 11/08/2008] [Accepted: 06/17/2009] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Housekeeping genes (HKGs) are constitutively expressed in all tissues while tissue-enriched genes (TEGs) are expressed at a much higher level in a single tissue type than in others. HKGs serve as valuable experimental controls in gene and protein expression experiments, while TEGs tend to represent distinct physiological processes and are frequently candidates for biomarkers or drug targets. The genomic features of these two groups of genes expressed in opposing patterns may shed light on the mechanisms by which cells maintain basic and tissue-specific functions. RESULTS Here, we generate gene expression profiles of 42 normal human tissues on custom high-density microarrays to systematically identify 1,522 HKGs and 975 TEGs and compile a small subset of 20 housekeeping genes which are highly expressed in all tissues with lower variance than many commonly used HKGs. Cross-species comparison shows that both the functions and expression patterns of HKGs are conserved. TEGs are enriched with respect to both segmental duplication and copy number variation, while no such enrichment is observed for HKGs, suggesting that the high expression of HKGs is not due to high copy numbers. Analysis of genomic and epigenetic features of HKGs and TEGs reveals that the high expression of HKGs across different tissues is associated with decreased nucleosome occupancy at the transcription start site as indicated by enhanced DNase hypersensitivity. Additionally, we systematically and quantitatively demonstrate the enrichment of CpG islands at HKG transcription start sites (TSS) and their depletion at TEG TSS. Histone methylation patterns differ significantly between HKGs and TEGs, suggesting that methylation contributes to the differential expression patterns as well. CONCLUSION We have compiled a set of high quality HKGs that should provide higher and more consistent expression when used as references in laboratory experiments than currently used HKGs. The comparison of genomic features between HKGs and TEGs shows that HKGs are more conserved than TEGs in terms of functions, expression pattern and polymorphisms. In addition, our results identify chromatin structure and epigenetic features of HKGs and TEGs that are likely to play an important role in regulating their strikingly different expression patterns.
Affiliation(s)
- Xinwei She
- Rosetta Inpharmatics LLC, Seattle, WA 98109, USA.
|
298
|
A primate subfamily of galectins expressed at the maternal-fetal interface that promote immune cell death. Proc Natl Acad Sci U S A 2009; 106:9731-6. [PMID: 19497882 DOI: 10.1073/pnas.0903568106] [Citation(s) in RCA: 160] [Impact Index Per Article: 10.0] [Indexed: 12/22/2022] Open
Abstract
Galectins are proteins that regulate immune responses through the recognition of cell-surface glycans. We present evidence that 16 human galectin genes are expressed at the maternal-fetal interface and demonstrate that a cluster of 5 galectin genes on human chromosome 19 emerged during primate evolution as a result of duplication and rearrangement of genes and pseudogenes via a birth and death process primarily mediated by transposable long interspersed nuclear elements (LINEs). Genes in the cluster are found only in anthropoids, a group of primate species that differ from their strepsirrhine counterparts by having relatively large brains and long gestations. Three of the human cluster genes (LGALS13, -14, and -16) were found to be placenta-specific. Homology modeling revealed conserved three-dimensional structures of galectins in the human cluster; however, analyses of 24 newly derived and 69 publicly available sequences in 10 anthropoid species indicate functional diversification by evidence of positive selection and amino acid replacements in carbohydrate-recognition domains. Moreover, we demonstrate altered sugar-binding capacities of 6 recombinant galectins in the cluster. We show that human placenta-specific galectins are predominantly expressed by the syncytiotrophoblast, a primary site of metabolic exchange where, early during pregnancy, the fetus comes in contact with immune cells circulating in maternal blood. Because ex vivo functional assays demonstrate that placenta-specific galectins induce the apoptosis of T lymphocytes, we propose that these galectins reduce the danger of maternal immune attacks on the fetal semiallograft, presumably conferring additional immune tolerance mechanisms and in turn sustaining hemochorial placentation during the long gestation of anthropoid primates.
|
299
|
Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol 2009; 7:e1000112. [PMID: 19468303 PMCID: PMC2680341 DOI: 10.1371/journal.pbio.1000112] [Citation(s) in RCA: 361] [Impact Index Per Article: 22.6] [Received: 12/19/2008] [Accepted: 04/03/2009] [Indexed: 02/06/2023]
Abstract
A finished clone-based assembly of the mouse genome reveals extensive recent sequence duplication during recent evolution and rodent-specific expansion of certain gene families. Newly assembled duplications contain protein-coding genes that are mostly involved in reproductive function. The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. 
The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not. The availability of an accurate genome sequence provides the bedrock upon which modern biomedical research is based. Here we describe a high-quality assembly, Build 36, of the mouse genome. This assembly was put together by aligning overlapping individual clones representing parts of the genome, and it provides a more complete picture than previous assemblies, because it adds much rodent-specific sequence that was previously unavailable. The addition of these sequences provides insight into both the genomic architecture and the gene complement of the mouse. In particular, it highlights recent gene duplications and the expansion of certain gene families during rodent evolution. An improved understanding of the mouse genome and thus mouse biology will enhance the utility of the mouse as a model for human disease.
|
300
|
Patrushev LI, Minkevich IG. The problem of the eukaryotic genome size. Biochemistry (Moscow) 2009; 73:1519-52. [PMID: 19216716 DOI: 10.1134/s0006297908130117] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/23/2022]
Abstract
The current state of knowledge concerning the unsolved problem of the huge interspecific eukaryotic genome size variations not correlating with the species phenotypic complexity (C-value enigma also known as C-value paradox) is reviewed. Characteristic features of eukaryotic genome structure and molecular mechanisms that are the basis of genome size changes are examined in connection with the C-value enigma. It is emphasized that endogenous mutagens, including reactive oxygen species, create a constant nuclear environment where any genome evolves. An original quantitative model and general conception are proposed to explain the C-value enigma. In accordance with the theory, the noncoding sequences of the eukaryotic genome provide genes with global and differential protection against chemical mutagens and (in addition to the anti-mutagenesis and DNA repair systems) form a new, third system that protects eukaryotic genetic information. The joint action of these systems controls the spontaneous mutation rate in coding sequences of the eukaryotic genome. It is hypothesized that the genome size is inversely proportional to functional efficiency of the anti-mutagenesis and/or DNA repair systems in a particular biological species. In this connection, a model of eukaryotic genome evolution is proposed.
Affiliation(s)
- L I Patrushev
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia.
|