1
|
Porubsky D, Dashnow H, Sasani TA, Logsdon GA, Hallast P, Noyes MD, Kronenberg ZN, Mokveld T, Koundinya N, Nolan C, Steely CJ, Guarracino A, Dolzhenko E, Harvey WT, Rowell WJ, Grigorev K, Nicholas TJ, Goldberg ME, Oshima KK, Lin J, Ebert P, Watkins WS, Leung TY, Hanlon VCT, McGee S, Pedersen BS, Happ HC, Jeong H, Munson KM, Hoekzema K, Chan DD, Wang Y, Knuth J, Garcia GH, Fanslow C, Lambert C, Lee C, Smith JD, Levy S, Mason CE, Garrison E, Lansdorp PM, Neklason DW, Jorde LB, Quinlan AR, Eberle MA, Eichler EE. Human de novo mutation rates from a four-generation pedigree reference. Nature 2025:10.1038/s41586-025-08922-2. [PMID: 40269156 DOI: 10.1038/s41586-025-08922-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2024] [Accepted: 03/20/2025] [Indexed: 04/25/2025]
Abstract
Understanding the human de novo mutation (DNM) rate requires complete sequence information1. Here using five complementary short-read and long-read sequencing technologies, we phased and assembled more than 95% of each diploid human genome in a four-generation, twenty-eight-member family (CEPH 1463). We estimate 98-206 DNMs per transmission, including 74.5 de novo single-nucleotide variants, 7.4 non-tandem repeat indels, 65.3 de novo indels or structural variants originating from tandem repeats, and 4.4 centromeric DNMs. Among male individuals, we find 12.4 de novo Y chromosome events per generation. Short tandem repeats and variable-number tandem repeats are the most mutable, with 32 loci exhibiting recurrent mutation through the generations. We accurately assemble 288 centromeres and six Y chromosomes across the generations and demonstrate that the DNM rate varies by an order of magnitude depending on repeat content, length and sequence identity. We show a strong paternal bias (75-81%) for all forms of germline DNM, yet we estimate that 16% of de novo single-nucleotide variants are postzygotic in origin with no paternal bias, including early germline mosaic mutations. We place all this variation in the context of a high-resolution recombination map (~3.4 kb breakpoint resolution) and find no correlation between meiotic crossover and de novo structural variants. These near-telomere-to-telomere familial genomes provide a truth set to understand the most fundamental processes underlying human genetic variation.
Collapse
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Thomas A Sasani
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Michelle D Noyes
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Nidhi Koundinya
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Cody J Steely
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, KY, USA
| | - Andrea Guarracino
- Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | | | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Kirill Grigorev
- Space Biosciences Research Branch, NASA Ames Research Center, Moffett Field, CA, USA
- Blue Marble Space Institute of Science, Seattle, WA, USA
| | - Thomas J Nicholas
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Michael E Goldberg
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Keisuke K Oshima
- Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Jiadong Lin
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter Ebert
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - W Scott Watkins
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Tiffany Y Leung
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
| | - Vincent C T Hanlon
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
| | - Sean McGee
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Hannah C Happ
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Altos Labs, San Diego, CA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Daniel D Chan
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
| | - Yanni Wang
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
| | - Jordan Knuth
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Gage H Garcia
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Joshua D Smith
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
| | - Erik Garrison
- Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Peter M Lansdorp
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Deborah W Neklason
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Lynn B Jorde
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | | | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| |
Collapse
|
2
|
Porubsky D, Dashnow H, Sasani TA, Logsdon GA, Hallast P, Noyes MD, Kronenberg ZN, Mokveld T, Koundinya N, Nolan C, Steely CJ, Guarracino A, Dolzhenko E, Harvey WT, Rowell WJ, Grigorev K, Nicholas TJ, Oshima KK, Lin J, Ebert P, Watkins WS, Leung TY, Hanlon VCT, McGee S, Pedersen BS, Goldberg ME, Happ HC, Jeong H, Munson KM, Hoekzema K, Chan DD, Wang Y, Knuth J, Garcia GH, Fanslow C, Lambert C, Lee C, Smith JD, Levy S, Mason CE, Garrison E, Lansdorp PM, Neklason DW, Jorde LB, Quinlan AR, Eberle MA, Eichler EE. A familial, telomere-to-telomere reference for human de novo mutation and recombination from a four-generation pedigree. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.05.606142. [PMID: 39149261 PMCID: PMC11326147 DOI: 10.1101/2024.08.05.606142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 08/17/2024]
Abstract
Using five complementary short- and long-read sequencing technologies, we phased and assembled >95% of each diploid human genome in a four-generation, 28-member family (CEPH 1463) allowing us to systematically assess de novo mutations (DNMs) and recombination. From this family, we estimate an average of 192 DNMs per generation, including 75.5 de novo single-nucleotide variants (SNVs), 7.4 non-tandem repeat indels, 79.6 de novo indels or structural variants (SVs) originating from tandem repeats, 7.7 centromeric de novo SVs and SNVs, and 12.4 de novo Y chromosome events per generation. STRs and VNTRs are the most mutable with 32 loci exhibiting recurrent mutation through the generations. We accurately assemble 288 centromeres and six Y chromosomes across the generations, documenting de novo SVs, and demonstrate that the DNM rate varies by an order of magnitude depending on repeat content, length, and sequence identity. We show a strong paternal bias (75-81%) for all forms of germline DNM, yet we estimate that 17% of de novo SNVs are postzygotic in origin with no paternal bias. We place all this variation in the context of a high-resolution recombination map (~3.5 kbp breakpoint resolution). We observe a strong maternal recombination bias (1.36 maternal:paternal ratio) with a consistent reduction in the number of crossovers with increasing paternal (r=0.85) and maternal (r=0.65) age. However, we observe no correlation between meiotic crossover locations and de novo SVs, arguing against non-allelic homologous recombination as a predominant mechanism. The use of multiple orthogonal technologies, near-telomere-to-telomere phased genome assemblies, and a multi-generation family to assess transmission has created the most comprehensive, publicly available "truth set" of all classes of genomic variants. The resource can be used to test and benchmark new algorithms and technologies to understand the most fundamental processes underlying human genetic variation.
Collapse
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Harriet Dashnow
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Thomas A Sasani
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Present address: Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Michelle D Noyes
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Nidhi Koundinya
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Cody J Steely
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, KY, USA
| | - Andrea Guarracino
- Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | | | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - William J Rowell
- Department of Internal Medicine, University of Kentucky College of Medicine, Lexington, KY, USA
| | - Kirill Grigorev
- Blue Marble Space Institute of Science, Seattle, WA, USA
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
| | - Thomas J Nicholas
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Keisuke K Oshima
- Present address: Department of Genetics, Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Jiadong Lin
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter Ebert
- Core Unit Bioinformatics, Medical Faculty and University Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - W Scott Watkins
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Tiffany Y Leung
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, Canada
| | | | - Sean McGee
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Brent S Pedersen
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Michael E Goldberg
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Hannah C Happ
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Present address: Altos Labs, San Diego, CA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Daniel D Chan
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, Canada
| | - Yanni Wang
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, Canada
| | - Jordan Knuth
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Gage H Garcia
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | - Joshua D Smith
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Shawn Levy
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
| | - Erik Garrison
- Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | | | - Deborah W Neklason
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Lynn B Jorde
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Aaron R Quinlan
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | | | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
Collapse
|
3
|
Vollger MR, Dishuck PC, Harvey WT, DeWitt WS, Guitart X, Goldberg ME, Rozanski AN, Lucas J, Asri M, Munson KM, Lewis AP, Hoekzema K, Logsdon GA, Porubsky D, Paten B, Harris K, Hsieh P, Eichler EE. Increased mutation and gene conversion within human segmental duplications. Nature 2023; 617:325-334. [PMID: 37165237 PMCID: PMC10172114 DOI: 10.1038/s41586-023-05895-y] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]
Abstract
Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data1,2. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions3,4. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences5,6.
Collapse
Affiliation(s)
- Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - William S DeWitt
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
| | - Xavi Guitart
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Michael E Goldberg
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Allison N Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Julian Lucas
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Kelley Harris
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, Chevy Chase, MD, USA.
| |
Collapse
|
4
|
Hollox EJ, Zuccherato LW, Tucci S. Genome structural variation in human evolution. Trends Genet 2021; 38:45-58. [PMID: 34284881 DOI: 10.1016/j.tig.2021.06.015] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2021] [Revised: 06/21/2021] [Accepted: 06/22/2021] [Indexed: 01/01/2023]
Abstract
Structural variation (SV) is a large difference (typically >100 bp) in the genomic structure of two genomes and includes both copy number variation and variation that does not change copy number of a genomic region, such as an inversion. Improved reference genomes, combined with widespread genome sequencing using short-read sequencing technology, and increasingly using long-read sequencing, have reignited interest in SV. Recent large-scale studies and functional focused analyses have highlighted the role of SV in human evolution. In this review, we highlight human-specific SVs involved in changes in the brain, population-specific SVs that affect response to the environment, including adaptation to diet and infectious diseases, and summarise the contribution of archaic hominin admixture to present-day human SV.
Collapse
Affiliation(s)
- Edward J Hollox
- Department of Genetics and Genome Biology, University of Leicester, UK.
| | - Luciana W Zuccherato
- Núcleo de Ensino e Pesquisa, Instituto Mário Penna, Belo Horizonte, Brazil; Departmento de Bioquímica e Imunologia, Universidade de Minas Gerais, Belo Horizonte, Brazil
| | - Serena Tucci
- Department of Anthropology, Yale University, New Haven, CT, USA
| |
Collapse
|
5
|
Esoh KK, Apinjoh TO, Nyanjom SG, Wonkam A, Chimusa ER, Amenga-Etego L, Amambua-Ngwa A, Achidi EA. Fine scale human genetic structure in three regions of Cameroon reveals episodic diversifying selection. Sci Rep 2021; 11:1039. [PMID: 33441574 PMCID: PMC7807043 DOI: 10.1038/s41598-020-79124-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2020] [Accepted: 10/28/2020] [Indexed: 01/29/2023] Open
Abstract
Inferences from genetic association studies rely largely on the definition and description of the underlying populations that highlight their genetic similarities and differences. The clustering of human populations into subgroups (population structure) can significantly confound disease associations. This study investigated the fine-scale genetic structure within Cameroon that may underlie disparities observed with Cameroonian ethnicities in malaria genome-wide association studies in sub-Saharan Africa. Genotype data of 1073 individuals from three regions and three ethnic groups in Cameroon were analyzed using measures of genetic proximity to ascertain fine-scale genetic structure. Model-based clustering revealed distinct ancestral proportions among the Bantu, Semi-Bantu and Foulbe ethnic groups, while haplotype-based coancestry estimation revealed possible longstanding and ongoing sympatric differentiation among individuals of the Foulbe ethnic group, and their Bantu and Semi-Bantu counterparts. A genome scan found strong selection signatures in the HLA gene region, confirming longstanding knowledge of natural selection on this genomic region in African populations following immense disease pressure. Signatures of selection were also observed in the HBB gene cluster, a genomic region known to be under strong balancing selection in sub-Saharan Africa due to its co-evolution with malaria. This study further supports the role of evolution in shaping genomes of Cameroonian populations and reveals fine-scale hierarchical structure among and within Cameroonian ethnicities that may impact genetic association studies in the country.
Collapse
Affiliation(s)
- Kevin K Esoh
- Department of Biochemistry, Jomo Kenyatta University of Agriculture and Technology, P.O. Box 62000, Nairobi, City Square, Kenya
| | - Tobias O Apinjoh
- Department of Biochemistry and Molecular Biology, University of Buea, P.O. Box 63, Buea, South West Region, Cameroon.
| | - Steven G Nyanjom
- Department of Biochemistry, Jomo Kenyatta University of Agriculture and Technology, P.O. Box 62000, Nairobi, City Square, Kenya
| | - Ambroise Wonkam
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
| | - Emile R Chimusa
- Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Health Sciences Campus, Anzio Rd, Observatory, 7925, South Africa
| | - Lucas Amenga-Etego
- West African Centre for Cell Biology of Infectious Pathogens, University of Ghana, Legon, Accra, Ghana
| | | | - Eric A Achidi
- Department of Biochemistry and Molecular Biology, University of Buea, P.O. Box 63, Buea, South West Region, Cameroon
| |
Collapse
|
6
|
MacQueen A, Tian D, Chang W, Holub E, Kreitman M, Bergelson J. Population Genetics of the Highly Polymorphic RPP8 Gene Family. Genes (Basel) 2019; 10:E691. [PMID: 31500388 PMCID: PMC6771003 DOI: 10.3390/genes10090691] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2019] [Revised: 08/31/2019] [Accepted: 09/03/2019] [Indexed: 02/06/2023] Open
Abstract
Plant nucleotide-binding domain and leucine-rich repeat containing (NLR) genes provide some of the most extreme examples of polymorphism in eukaryotic genomes, rivalling even the vertebrate major histocompatibility complex. Surprisingly, this is also true in Arabidopsis thaliana, a predominantly selfing species with low heterozygosity. Here, we investigate how gene duplication and intergenic exchange contribute to this extraordinary variation. RPP8 is a three-locus system that is configured chromosomally as either a direct-repeat tandem duplication or as a single copy locus, plus a locus 2 Mb distant. We sequenced 48 RPP8 alleles from 37 accessions of A. thaliana and 12 RPP8 alleles from Arabidopsis lyrata to investigate the patterns of interlocus shared variation. The tandem duplicates display fixed differences and share less variation with each other than either shares with the distant paralog. A high level of shared polymorphism among alleles at one of the tandem duplicates, the single-copy locus and the distal locus, must involve both classical crossing over and intergenic gene conversion. Despite these polymorphism-enhancing mechanisms, the observed nucleotide diversity could not be replicated under neutral forward-in-time simulations. Only by adding balancing selection to the simulations do they approach the level of polymorphism observed at RPP8. In this NLR gene triad, genetic architecture, gene function and selection all combine to generate diversity.
Collapse
Affiliation(s)
- Alice MacQueen
- Integrative Biology, The University of Texas at Austin, Austin, TX 78712, USA.
| | - Dacheng Tian
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210008, China.
| | - Wenhan Chang
- Department of Ecology & Evolution, The University of Chicago, Chicago, IL 60637, USA.
| | - Eric Holub
- School of Life Sciences, Wellesbourne Innovation Campus, University of Warwick, Wellesbourne CV359EF, UK.
| | - Martin Kreitman
- Department of Ecology & Evolution, The University of Chicago, Chicago, IL 60637, USA.
| | - Joy Bergelson
- Department of Ecology & Evolution, The University of Chicago, Chicago, IL 60637, USA.
| |
Collapse
|
7
|
Hartasánchez DA, Brasó-Vives M, Heredia-Genestar JM, Pybus M, Navarro A. Effect of Collapsed Duplications on Diversity Estimates: What to Expect. Genome Biol Evol 2018; 10:2899-2905. [PMID: 30364947 PMCID: PMC6239678 DOI: 10.1093/gbe/evy223] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/08/2018] [Indexed: 12/19/2022] Open
Abstract
The study of segmental duplications (SDs) and copy-number variants (CNVs) is of great importance in the fields of genomics and evolution. However, SDs and CNVs are usually excluded from genome-wide scans for natural selection. Because of high identity between copies, SDs and CNVs that are not included in reference genomes are prone to be collapsed-that is, mistakenly aligned to the same region-when aligning sequence data from single individuals to the reference. Such collapsed duplications are additionally challenging because concerted evolution between duplications alters their site frequency spectrum and linkage disequilibrium patterns. To investigate the potential effect of collapsed duplications upon natural selection scans we obtained expectations for four summary statistics from simulations of duplications evolving under a range of interlocus gene conversion and crossover rates. We confirm that summary statistics traditionally used to detect the action of natural selection on DNA sequences cannot be applied to SDs and CNVs since in some cases values for known duplications mimic selective signatures. As a proof of concept of the pervasiveness of collapsed duplications, we analyzed data from the 1,000 Genomes Project. We find that, within regions identified as variable in copy number, diversity between individuals with the duplication is consistently higher than between individuals without the duplication. Furthermore, the frequency of single nucleotide variants (SNVs) deviating from Hardy-Weinberg Equilibrium is higher in individuals with the duplication, which strongly suggests that higher diversity is a consequence of collapsed duplications and incorrect evaluation of SNVs within these CNV regions.
Collapse
Affiliation(s)
- Diego A Hartasánchez
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, Barcelona, Catalonia, Spain.,Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.,Laboratoire de Biométrie et Biologie Évolutive UMR 5558, Université de Lyon, Université Lyon 1, CNRS, Villeurbanne, France
| | - Marina Brasó-Vives
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, Barcelona, Catalonia, Spain.,Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
| | - Jose Maria Heredia-Genestar
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, Barcelona, Catalonia, Spain.,Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
| | - Marc Pybus
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, Barcelona, Catalonia, Spain.,Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
| | - Arcadi Navarro
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, Barcelona, Catalonia, Spain.,Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.,National Institute for Bioinformatics (INB), Barcelona, Catalonia, Spain.,Centre for Genomic Regulation (CRG), Barcelona, Catalonia, Spain
| |
Collapse
|
8
|
Steenwyk JL, Soghigian JS, Perfect JR, Gibbons JG. Copy number variation contributes to cryptic genetic variation in outbreak lineages of Cryptococcus gattii from the North American Pacific Northwest. BMC Genomics 2016; 17:700. [PMID: 27590805 PMCID: PMC5009542 DOI: 10.1186/s12864-016-3044-0] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2016] [Accepted: 08/24/2016] [Indexed: 12/13/2022] Open
Abstract
Background Copy number variants (CNVs) are a class of structural variants (SVs) and are defined as fragments of DNA that are present at variable copy number in comparison with a reference genome. Recent advances in bioinformatics methodologies and sequencing technologies have enabled the high-resolution quantification of genome-wide CNVs. In pathogenic fungi SVs have been shown to alter gene expression, influence host specificity, and drive fungicide resistance, but little attention has focused specifically on CNVs. Using publicly available sequencing data, we identified 90 isolates across 212 Cryptococcus gattii genomes that belong to the VGII subgroups responsible for the recent deadly outbreaks in the North American Pacific Northwest. We generated CNV profiles for each sample to investigate the prevalence and function of CNV in C. gattii. Results We identified eight genetic clusters among publicly available Illumina whole genome sequence data from 212 C. gattii isolates through population structure analysis. Three clusters represent the VGIIa, VGIIb, and VGIIc subgroups from the North American Pacific Northwest. CNV was bioinformatically predicted and affected ~300–400 Kilobases (Kb) of the C. gattii VGII subgroup genomes. Sixty-seven loci, encompassing 58 genes, showed highly divergent patterns of copy number variation between VGII subgroups. Analysis of PFam domains within divergent CN variable genes revealed enrichment of protein domains associated with transport, cell wall organization and external encapsulating structure. Conclusions CNVs may contribute to pathological and phenotypic differences observed between the C. gattii VGIIa, VGIIb, and VGIIc subpopulations. Genes overlapping with population differentiated CNVs were enriched for several virulence related functional terms. These results uncover novel candidate genes to examine the genetic and functional underpinnings of C. gattii pathogenicity. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3044-0) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Jacob L Steenwyk
- Biology Department, Clark University, 950 Main Street, Worcester, MA, USA.,Current address: Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
| | - John S Soghigian
- Biology Department, Clark University, 950 Main Street, Worcester, MA, USA.,Current address: Department of Environmental Sciences, The Connecticut Agricultural Experiment Station, New Haven, CT, USA
| | - John R Perfect
- Division of Infectious Diseases, Department of Medicine, Duke University Medical Center, Durham, NC, USA
| | - John G Gibbons
- Biology Department, Clark University, 950 Main Street, Worcester, MA, USA.
| |
Collapse
|
9
|
Bickhart DM, Xu L, Hutchison JL, Cole JB, Null DJ, Schroeder SG, Song J, Garcia JF, Sonstegard TS, Van Tassell CP, Schnabel RD, Taylor JF, Lewin HA, Liu GE. Diversity and population-genetic properties of copy number variations and multicopy genes in cattle. DNA Res 2016; 23:253-62. [PMID: 27085184 PMCID: PMC4909312 DOI: 10.1093/dnares/dsw013] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2015] [Accepted: 02/29/2016] [Indexed: 11/14/2022] Open
Abstract
The diversity and population genetics of copy number variation (CNV) in domesticated animals are not well understood. In this study, we analysed 75 genomes of major taurine and indicine cattle breeds (including Angus, Brahman, Gir, Holstein, Jersey, Limousin, Nelore, and Romagnola), sequenced to 11-fold coverage to identify 1,853 non-redundant CNV regions. Supported by high validation rates in array comparative genomic hybridization (CGH) and qPCR experiments, these CNV regions accounted for 3.1% (87.5 Mb) of the cattle reference genome, representing a significant increase over previous estimates of the area of the genome that is copy number variable (∼2%). Further population genetics and evolutionary genomics analyses based on these CNVs revealed the population structures of the cattle taurine and indicine breeds and uncovered potential diversely selected CNVs near important functional genes, including AOX1, ASZ1, GAT, GLYAT, and KRTAP9-1. Additionally, 121 CNV gene regions were found to be either breed specific or differentially variable across breeds, such as RICTOR in dairy breeds and PNPLA3 in beef breeds. In contrast, clusters of the PRP and PAG genes were found to be duplicated in all sequenced animals, suggesting that subfunctionalization, neofunctionalization, or overdominance play roles in diversifying those fertility-related genes. These CNV results provide a new glimpse into the diverse selection histories of cattle breeds and a basis for correlating structural variation with complex traits in the future.
Collapse
Affiliation(s)
- Derek M Bickhart
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | - Lingyang Xu
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
| | - Jana L Hutchison
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | - John B Cole
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | - Daniel J Null
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | - Steven G Schroeder
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | - Jiuzhou Song
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
| | | | - Tad S Sonstegard
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| | | | - Robert D Schnabel
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA Informatics Institute, University of Missouri, Columbia, MO, USA
| | - Jeremy F Taylor
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
| | - Harris A Lewin
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
| | - George E Liu
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
| |
Collapse
|
10
|
Hartasánchez DA, Brasó-Vives M, Fuentes-Díaz J, Vallès-Codina O, Navarro A. SeDuS: segmental duplication simulator. Bioinformatics 2015; 32:148-50. [PMID: 26358728 DOI: 10.1093/bioinformatics/btv481] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2014] [Accepted: 08/12/2015] [Indexed: 11/12/2022] Open
Abstract
SUMMARY SeDuS is the first flexible and user-friendly forward-in-time simulator of patterns of molecular evolution within segmental duplications undergoing interlocus gene conversion and crossover. SeDuS introduces known features of interlocus gene conversion such as biased directionality and dependence on local sequence identity. Additionally, it includes aspects such as different selective pressures acting upon copy number and flexible crossover distributions. A graphical user interface allows fast fine-tuning of relevant parameters and straightforward real-time analysis of the evolution of duplicates. AVAILABILITY AND IMPLEMENTATION SeDuS is implemented in C++ and can be run via command line or through a graphical user interface developed using Qt C++. Source code and binary executables for Linux, OS X and Windows are freely available at www.biologiaevolutiva.org/sedus/. A tutorial with a detailed description of implementation, parameters and output files is available online. CONTACT arcadi.navarro@upf.edu.
Collapse
Affiliation(s)
- Diego A Hartasánchez
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, 08003
| | - Marina Brasó-Vives
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, 08003
| | - Juanma Fuentes-Díaz
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, 08003
| | - Oriol Vallès-Codina
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, 08003
| | - Arcadi Navarro
- Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, 08003, National Institute for Bioinformatics, 08003, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 and Centre for Genomic Regulation (CRG), 08003, Barcelona, Spain
| |
Collapse
|
11
|
Estimating copy numbers of alleles from population-scale high-throughput sequencing data. BMC Bioinformatics 2015; 16 Suppl 1:S4. [PMID: 25707811 PMCID: PMC4331703 DOI: 10.1186/1471-2105-16-s1-s4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open
Abstract
Background With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes are proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci. Results We propose a novel approach to infer copy unit alleles and their numbers in each sample simultaneously from population-scale HTS data by variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, which is a well studied model for document classification problems. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy number dataset, in which precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data of 1123 samples at highly variable salivary amylase gene locus and a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring. Conclusions Our proposed approach enables detailed analysis of copy number variations, such as association study between copy unit alleles and phenotypes or biological features including human diseases.
Collapse
|
12
|
Interplay of interlocus gene conversion and crossover in segmental duplications under a neutral scenario. G3-GENES GENOMES GENETICS 2014; 4:1479-89. [PMID: 24906640 PMCID: PMC4132178 DOI: 10.1534/g3.114.012435] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
Interlocus gene conversion is a major evolutionary force that drives the concerted evolution of duplicated genomic regions. Theoretical models successfully have addressed the effects of interlocus gene conversion and the importance of crossover in the evolutionary fate of gene families and duplications but have not considered complex recombination scenarios, such as the presence of hotspots. To study the interplay between interlocus gene conversion and crossover, we have developed a forward-time simulator that allows the exploration of a wide range of interlocus gene conversion rates under different crossover models. Using it, we have analyzed patterns of nucleotide variation and linkage disequilibrium within and between duplicate regions, focusing on a neutral scenario with constant population size and validating our results with the existing theoretical models. We show that the interaction of gene conversion and crossover is nontrivial and that the location of crossover junctions is a fundamental determinant of levels of variation and linkage disequilibrium in duplicated regions. We also show that if crossover activity between duplications is strong enough, recurrent interlocus gene conversion events can break linkage disequilibrium within duplicates. Given the complex nature of interlocus gene conversion and crossover, we provide a framework to explore their interplay to help increase knowledge on molecular evolution within segmental duplications under more complex scenarios, such as demographic changes or natural selection.
Collapse
|
13
|
Human gene copy number variation and infectious disease. Hum Genet 2014; 133:1217-33. [PMID: 25110110 DOI: 10.1007/s00439-014-1457-x] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2014] [Accepted: 05/20/2014] [Indexed: 01/05/2023]
Abstract
Variability in the susceptibility to infectious disease and its clinical manifestation can be determined by variation in the environment and by genetic variation in the pathogen and the host. Despite several successes based on candidate gene studies, defining the host variation affecting infectious disease has not been as successful as for other multifactorial diseases. Both single nucleotide variation and copy number variation (CNV) of the host contribute to the host's susceptibility to infectious disease. In this review we focus on CNV, particularly on complex multiallelic CNV that is often not well characterised either directly by hybridisation methods or indirectly by analysis of genotypes and flanking single nucleotide variants. We summarise the well-known examples, such as α-globin deletion and susceptibility to severe malaria, as well as more recent controversies, such as the extensive CNV of the chemokine gene CCL3L1 and HIV infection. We discuss the potential biological mechanisms that could underly any genetic association and reflect on the extensive complexity and functional variation generated by a combination of CNV and sequence variation, as illustrated by the Fc gamma receptor genes FCGR3A, FCGR3B and FCGR2C. We also highlight some understudied areas that might prove fruitful areas for further research.
Collapse
|
14
|
Both positive and negative selection pressures contribute to the polymorphism pattern of the duplicated human CYP21A2 gene. PLoS One 2013; 8:e81977. [PMID: 24312389 PMCID: PMC3843699 DOI: 10.1371/journal.pone.0081977] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2013] [Accepted: 10/20/2013] [Indexed: 11/19/2022] Open
Abstract
The human steroid 21-hydroxylase gene (CYP21A2) participates in cortisol and aldosterone biosynthesis, and resides together with its paralogous (duplicated) pseudogene in a multiallelic copy number variation (CNV), called RCCX CNV. Concerted evolution caused by non-allelic gene conversion has been described in great ape CYP21 genes, and the same conversion activity is responsible for a serious genetic disorder of CYP21A2, congenital adrenal hyperplasia (CAH). In the current study, 33 CYP21A2 haplotype variants encoding 6 protein variants were determined from a European population. CYP21A2 was shown to be one of the most diverse human genes (HHe=0.949), but the diversity of intron 2 was greater still. Contrary to previous findings, the evolution of intron 2 did not follow concerted evolution, although the remaining part of the gene did. Fixed sites (different fixed alleles of sites in human CYP21 paralogues) significantly accumulated in intron 2, indicating that the excess of fixed sites was connected to the lack of effective non-allelic conversion and concerted evolution. Furthermore, positive selection was presumably focused on intron 2, and possibly associated with the previous genetic features. However, the positive selection detected by several neutrality tests was discerned along the whole gene. In addition, the clear signature of negative selection was observed in the coding sequence. The maintenance of the CYP21 enzyme function is critical, and could lead to negative selection, whereas the presumed gene regulation altering steroid hormone levels via intron 2 might help fast adaptation, which broadly characterizes the genes of human CNVs responding to the environment.
Collapse
|
15
|
Hanikenne M, Kroymann J, Trampczynska A, Bernal M, Motte P, Clemens S, Krämer U. Hard selective sweep and ectopic gene conversion in a gene cluster affording environmental adaptation. PLoS Genet 2013; 9:e1003707. [PMID: 23990800 PMCID: PMC3749932 DOI: 10.1371/journal.pgen.1003707] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2013] [Accepted: 06/22/2013] [Indexed: 12/27/2022] Open
Abstract
Among the rare colonizers of heavy-metal rich toxic soils, Arabidopsis halleri is a compelling model extremophile, physiologically distinct from its sister species A. lyrata, and A. thaliana. Naturally selected metal hypertolerance and extraordinarily high leaf metal accumulation in A. halleri both require Heavy Metal ATPase4 (HMA4) encoding a PIB-type ATPase that pumps Zn(2+) and Cd(2+) out of specific cell types. Strongly enhanced HMA4 expression results from a combination of gene copy number expansion and cis-regulatory modifications, when compared to A. thaliana. These findings were based on a single accession of A. halleri. Few studies have addressed nucleotide sequence polymorphism at loci known to govern adaptations. We thus sequenced 13 DNA segments across the HMA4 genomic region of multiple A. halleri individuals from diverse habitats. Compared to control loci flanking the three tandem HMA4 gene copies, a gradual depletion of nucleotide sequence diversity and an excess of low-frequency polymorphisms are hallmarks of positive selection in HMA4 promoter regions, culminating at HMA4-3. The accompanying hard selective sweep is segmentally eclipsed as a consequence of recurrent ectopic gene conversion among HMA4 protein-coding sequences, resulting in their concerted evolution. Thus, HMA4 coding sequences exhibit a network-like genealogy and locally enhanced nucleotide sequence diversity within each copy, accompanied by lowered sequence divergence between paralogs in any given individual. Quantitative PCR corroborated that, across A. halleri, three genomic HMA4 copies generate overall 20- to 130-fold higher transcript levels than in A. thaliana. Together, our observations constitute an unexpectedly complex profile of polymorphism resulting from natural selection for increased gene product dosage. We propose that these findings are paradigmatic of a category of multi-copy genes from a broad range of organisms. Our results emphasize that enhanced gene product dosage, in addition to neo- and sub-functionalization, can account for the genomic maintenance of gene duplicates underlying environmental adaptation.
Collapse
Affiliation(s)
- Marc Hanikenne
- Functional Genomics and Plant Molecular Imaging, Center for Protein Engineering (CIP), Department of Life Sciences, University of Liège, Liège, Belgium
| | - Juergen Kroymann
- Laboratoire d'Ecologie, Systématique et Evolution, Université Paris-Sud/CNRS, Orsay, France
| | | | - María Bernal
- Department of Plant Physiology, Ruhr University Bochum, Bochum, Germany
| | - Patrick Motte
- Functional Genomics and Plant Molecular Imaging, Center for Protein Engineering (CIP), Department of Life Sciences, University of Liège, Liège, Belgium
| | - Stephan Clemens
- Department of Plant Physiology, University of Bayreuth, Bayreuth, Germany
| | - Ute Krämer
- Department of Plant Physiology, Ruhr University Bochum, Bochum, Germany
| |
Collapse
|
16
|
Bánlaki Z, Szabó JA, Szilágyi Á, Patócs A, Prohászka Z, Füst G, Doleschall M. Intraspecific evolution of human RCCX copy number variation traced by haplotypes of the CYP21A2 gene. Genome Biol Evol 2013; 5:98-112. [PMID: 23241443 PMCID: PMC3595039 DOI: 10.1093/gbe/evs121] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
The RCCX region is a complex, multiallelic, tandem copy number variation (CNV). Two complete genes, complement component 4 (C4) and steroid 21-hydroxylase (CYP21A2, formerly CYP21B), reside in its variable region. RCCX is prone to nonallelic homologous recombination (NAHR) such as unequal crossover, generating duplications and deletions of RCCX modules, and gene conversion. A series of allele-specific long-range polymerase chain reaction coupled to the whole-gene sequencing of CYP21A2 was developed for molecular haplotyping. By means of the developed techniques, 35 different kinds of CYP21A2 haplotype variant were experimentally determined from 112 unrelated European subjects. The number of the resolved CYP21A2 haplotype variants was increased to 61 by bioinformatic haplotype reconstruction. The CYP21A2 haplotype variants could be assigned to the haplotypic RCCX CNV structures (the copy number of RCCX modules) in most cases. The genealogy network constructed from the CYP21A2 haplotype variants delineated the origin of RCCX structures. The different RCCX structures were located in tight groups. The minority of groups with identical RCCX structure occurred once in the network, implying monophyletic origin, but the majority of groups occurred several times and in different locations, indicating polyphyletic origin. The monophyletic groups were often created by single unequal crossover, whereas recurrent unequal crossover events generated some of the polyphyletic groups. As a result of recurrent NAHR events, more CYP21A2 haplotype variants with different allele patterns belonged to the same RCCX structure. The intraspecific evolution of RCCX CNV described here has provided a reasonable expectation for that of complex, multiallelic, tandem CNVs in humans.
Collapse
Affiliation(s)
- Zsófia Bánlaki
- 3rd Department of Internal Medicine, Semmelweis University, Budapest, Hungary
| | | | | | | | | | | | | |
Collapse
|