1. Rivas-González I, Schierup MH, Wakeley J, Hobolth A. TRAILS: Tree reconstruction of ancestry using incomplete lineage sorting. PLoS Genet 2024; 20:e1010836. [PMID: 38330138 PMCID: PMC10880969 DOI: 10.1371/journal.pgen.1010836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 02/21/2024] [Accepted: 01/22/2024] [Indexed: 02/10/2024] Open
Abstract
Genome-wide genealogies of multiple species carry detailed information about demographic and selection processes on individual branches of the phylogeny. Here, we introduce TRAILS, a hidden Markov model that accurately infers time-resolved population genetics parameters, such as ancestral effective population sizes and speciation times, for ancestral branches using a multi-species alignment of three species and an outgroup. TRAILS leverages the information contained in incomplete lineage sorting fragments by modelling genealogies along the genome as rooted three-leaved trees, each with a topology and two coalescent events happening in discretized time intervals within the phylogeny. Posterior decoding of the hidden Markov model can be used to infer the ancestral recombination graph for the alignment and details on demographic changes within a branch. Since TRAILS performs posterior decoding at the base-pair level, genome-wide scans based on the posterior probabilities can be devised to detect deviations from neutrality. Using TRAILS on a human-chimp-gorilla-orangutan alignment, we recover speciation parameters and extract information about the topology and coalescent times at high resolution.
Affiliation(s)
- Mikkel H. Schierup
  - Bioinformatics Research Center (BiRC), Aarhus University, Aarhus, Denmark
- John Wakeley
  - Department of Organismic and Evolutionary Biology, Harvard University, Massachusetts, United States of America
- Asger Hobolth
  - Department of Mathematics, Aarhus University, Aarhus, Denmark
2. Tigano A, Khan R, Omer AD, Weisz D, Dudchenko O, Multani AS, Pathak S, Behringer RR, Aiden EL, Fisher H, MacManes MD. Chromosome size affects sequence divergence between species through the interplay of recombination and selection. Evolution 2022; 76:782-798. [PMID: 35271737 PMCID: PMC9314927 DOI: 10.1111/evo.14467] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/10/2021] [Accepted: 12/12/2021] [Indexed: 01/21/2023]
Abstract
The structure of the genome shapes the distribution of genetic diversity and sequence divergence. To investigate how the relationship between chromosome size and recombination rate affects sequence divergence between species, we combined empirical analyses and evolutionary simulations. We estimated pairwise sequence divergence among 15 species from three different mammalian clades (Peromyscus rodents, Mus mice, and great apes) from chromosome-level genome assemblies. We found a strong significant negative correlation between chromosome size and sequence divergence in all species comparisons within the Peromyscus and great apes clades but not the Mus clade, suggesting that the dramatic chromosomal rearrangements among Mus species may have masked the ancestral genomic landscape of divergence in many comparisons. Our evolutionary simulations showed that the main factor determining differences in divergence among chromosomes of different sizes is the interplay of recombination rate and selection, with greater variation in larger populations than in smaller ones. In ancestral populations, shorter chromosomes harbor greater nucleotide diversity. As ancestral populations diverge, diversity present at the onset of the split contributes to greater sequence divergence in shorter chromosomes among daughter species. The combination of empirical data and evolutionary simulations revealed that chromosomal rearrangements, demography, and divergence times may also affect the relationship between chromosome size and divergence, thus deepening our understanding of the role of genome structure in the evolution of species divergence.
Affiliation(s)
- Anna Tigano
  - Molecular, Cellular, and Biomedical Sciences Department, University of New Hampshire, Durham, NH 03824, USA
  - Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH 03824, USA
  - Current address: Department of Biology, University of British Columbia – Okanagan Campus, Kelowna, BC V1V 1V7, Canada
- Ruqayya Khan
  - The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Arina D. Omer
  - The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- David Weisz
  - The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Olga Dudchenko
  - The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Computer Science, Department of Computational and Applied Mathematics, Rice University, Houston, TX 77030, USA
- Asha S. Multani
  - Department of Genetics, M.D. Anderson Cancer Center, University of Texas, Houston, TX 77030, USA
- Sen Pathak
  - Department of Genetics, M.D. Anderson Cancer Center, University of Texas, Houston, TX 77030, USA
- Richard R. Behringer
  - Department of Genetics, M.D. Anderson Cancer Center, University of Texas, Houston, TX 77030, USA
- Erez L. Aiden
  - The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Computer Science, Department of Computational and Applied Mathematics, Rice University, Houston, TX 77030, USA
  - Center for Theoretical and Biological Physics, Rice University, Houston, TX 77030, USA
  - Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, Shanghai 201210, China
  - School of Agriculture and Environment, University of Western Australia, Perth, WA 6009, Australia
- Heidi Fisher
  - Department of Biology, University of Maryland, College Park, MD 20742, USA
- Matthew D. MacManes
  - Molecular, Cellular, and Biomedical Sciences Department, University of New Hampshire, Durham, NH 03824, USA
  - Hubbard Center for Genome Studies, University of New Hampshire, Durham, NH 03824, USA
3. Zhu T, Flouri T, Yang Z. A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. Mol Ecol 2022; 31:2814-2829. [PMID: 35313033 PMCID: PMC9321900 DOI: 10.1111/mec.16433] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/17/2021] [Revised: 01/25/2022] [Accepted: 02/28/2022] [Indexed: 11/28/2022]
Affiliation(s)
- Tianqi Zhu
  - Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
4. Wang MS, Thakur M, Jhala Y, Wang S, Srinivas Y, Dai SS, Liu ZX, Chen HM, Green RE, Koepfli KP, Shapiro B. OUP accepted manuscript. Genome Biol Evol 2022; 14:6524629. [PMID: 35137061 PMCID: PMC8841465 DOI: 10.1093/gbe/evac012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 01/13/2022] [Indexed: 11/14/2022] Open
Affiliation(s)
- Ming-Shan Wang
  - Howard Hughes Medical Institute, University of California Santa Cruz, USA
  - Department of Ecology and Evolutionary Biology, University of California Santa Cruz, USA
  - Corresponding author
- Mukesh Thakur
  - Zoological Survey of India, New Alipore, Kolkata, West Bengal, India
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
  - Corresponding author
- Sheng Wang
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Yellapu Srinivas
  - Wildlife Institute of India, Chandrabani, Dehradun, Uttarakhand, India
- Shan-Shan Dai
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Zheng-Xi Liu
  - College of Animal Science, Jilin University, Changchun, China
- Hong-Man Chen
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, China
- Richard E Green
  - Department of Biomolecular Engineering, University of California Santa Cruz, USA
- Klaus-Peter Koepfli
  - Smithsonian-Mason School of Conservation, George Mason University, USA
  - Center for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Washington, District of Columbia, USA
  - Computer Technologies Laboratory, ITMO University, St. Petersburg, Russia
  - Corresponding author
- Beth Shapiro
  - Howard Hughes Medical Institute, University of California Santa Cruz, USA
  - Department of Ecology and Evolutionary Biology, University of California Santa Cruz, USA
  - Corresponding author
5. Klink GV, O'Keefe H, Gogna A, Bazykin GA, Elson JL. A broad comparative genomics approach to understanding the pathogenicity of Complex I mutations. Sci Rep 2021; 11:19578. [PMID: 34599203 PMCID: PMC8486755 DOI: 10.1038/s41598-021-98360-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/16/2021] [Accepted: 09/01/2021] [Indexed: 12/29/2022] Open
Abstract
Diseases caused by mutations of mitochondrial DNA (mtDNA) are highly variable in both presentation and penetrance. Over the last 30 years, clinical recognition of this group of diseases has increased. It has been suggested that haplogroup background could influence the penetrance and presentation of disease-causing mutations; however, to date there is only one well-established example of such an effect: the increased penetrance of two Complex I Leber's hereditary optic neuropathy mutations on a haplogroup J background. This paper conducts the most extensive investigation to date into the importance of haplogroup context in the pathogenicity of mtDNA mutations in Complex I. We searched for proven human point mutations across more than 900 metazoans, finding human disease-causing mutations and potential masking variants. We found more than half of human pathogenic variants as compensated pathogenic deviations (CPD) in at least one animal species from our multiple sequence alignments. Some variants were found in many species, and some were even the most prevalent amino acids across our dataset. Variants were also found in other primates, and in such cases, we looked for non-human amino acids in sites with a high probability of interacting with the CPD in the folded protein. Using this "local interactions" approach allowed us to find potential masking substitutions in other amino acid sites. We suggest that the masking variants might arise in humans, resulting in variability of mutation effect in our species.
Affiliation(s)
- Galya V Klink
  - Sector of Molecular Evolution, Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, Russian Federation
- Hannah O'Keefe
  - Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Amrita Gogna
  - Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
- Georgii A Bazykin
  - Sector of Molecular Evolution, Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, Russian Federation
  - Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, Russian Federation
- Joanna L Elson
  - Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
  - Human Metabolomics, North-West University, Potchefstroom, South Africa
6. Liu X, Ogilvie HA, Nakhleh L. Variational inference using approximate likelihood under the coalescent with recombination. Genome Res 2021; 31:2107-2119. [PMID: 34426513 PMCID: PMC8559707 DOI: 10.1101/gr.273631.120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/30/2020] [Accepted: 08/17/2021] [Indexed: 11/30/2022]
Abstract
Coalescent methods are proven and powerful tools for population genetics, phylogenetics, epidemiology, and other fields. A promising avenue for the analysis of large genomic alignments, which are increasingly common, is coalescent hidden Markov model (coalHMM) methods, but these methods have lacked general usability and flexibility. We introduce a novel method for automatically learning a coalHMM and inferring the posterior distributions of evolutionary parameters using black-box variational inference, with the transition rates between local genealogies derived empirically by simulation. This derivation enables our method to work directly with three or four taxa and through a divide-and-conquer approach with more taxa. Using a simulated data set resembling a human–chimp–gorilla scenario, we show that our method has comparable or better accuracy to previous coalHMM methods. Both species divergence times and population sizes were accurately inferred. The method also infers local genealogies, and we report on their accuracy. Furthermore, we discuss a potential direction for scaling the method to larger data sets through a divide-and-conquer approach. This accuracy means our method is useful now, and by deriving transition rates by simulation, it is flexible enough to enable future implementations of various population models.
Affiliation(s)
- Xinhao Liu
  - Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Huw A Ogilvie
  - Department of Computer Science, Rice University, Houston, Texas 77005, USA
- Luay Nakhleh
  - Department of Computer Science, Rice University, Houston, Texas 77005, USA
7. Wang W, Wuyun Q, Liu KJ. An Application of Random Walk Resampling to Phylogenetic HMM Inference and Learning. IEEE Trans Nanobioscience 2020; 19:506-517. [PMID: 32396096 DOI: 10.1109/tnb.2020.2991302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Statistical resampling methods are widely used for confidence interval placement and as a data perturbation technique for statistical inference and learning. An important assumption of popular resampling methods such as the standard bootstrap is that input observations are identically and independently distributed (i.i.d.). However, within the area of computational biology and bioinformatics, many different factors can contribute to intra-sequence dependence, such as recombination and other evolutionary processes governing sequence evolution. The SEquential RESampling ("SERES") framework was previously proposed to relax the simplifying assumption of i.i.d. input observations. SERES resampling takes the form of random walks on an input of either aligned or unaligned biomolecular sequences. This study introduces the first application of SERES random walks on aligned sequence inputs and is also the first to demonstrate the utility of SERES as a data perturbation technique to yield improved statistical estimates. We focus on the classical problem of recombination-aware local genealogical inference. We show in a simulation study that coupling SERES resampling and re-estimation with recHMM, a hidden Markov model-based method, produces local genealogical inferences with consistent and often large improvements in terms of topological accuracy. We further evaluate method performance using empirical HIV genome sequence datasets.
8. Castellano D, Macià MC, Tataru P, Bataillon T, Munch K. Comparison of the Full Distribution of Fitness Effects of New Amino Acid Mutations Across Great Apes. Genetics 2019; 213:953-966. [PMID: 31488516 PMCID: PMC6827385 DOI: 10.1534/genetics.119.302494] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 07/08/2019] [Accepted: 08/29/2019] [Indexed: 12/31/2022] Open
Abstract
The distribution of fitness effects (DFE) is central to many questions in evolutionary biology. However, little is known about the differences in DFE between closely related species. We use >9000 coding genes orthologous one-to-one across great apes, gibbons, and macaques to assess the stability of the DFE across great apes. We use the unfolded site frequency spectrum of polymorphic mutations (n = 8 haploid chromosomes per population) to estimate the DFE. We find that the shape of the deleterious DFE is strikingly similar across great apes. We confirm that effective population size (Ne) is a strong predictor of the strength of negative selection, consistent with the nearly neutral theory. However, we also find that the strength of negative selection varies more than expected given the differences in Ne between species. Across species, mean fitness effects of new deleterious mutations covary with Ne, consistent with positive epistasis among deleterious mutations. We find that the strength of negative selection for the smallest populations, bonobos and western chimpanzees, is higher than expected given their Ne. This may result from a more efficient purging of strongly deleterious recessive variants in these populations. Forward simulations confirm that these findings are not artifacts of the way we are inferring Ne and DFE parameters. All findings are replicated using only GC-conservative mutations, thereby confirming that GC-biased gene conversion is not affecting our conclusions.
Affiliation(s)
- David Castellano
  - Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
- Moisès Coll Macià
  - Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
- Paula Tataru
  - Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
- Thomas Bataillon
  - Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
- Kasper Munch
  - Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
9. Steinrücken M, Kamm J, Spence JP, Song YS. Inference of complex population histories using whole-genome sequences from multiple populations. Proc Natl Acad Sci U S A 2019; 116:17115-17120. [PMID: 31387977 PMCID: PMC6708337 DOI: 10.1073/pnas.1905060116] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 11/18/2022] Open
Abstract
There has been much interest in analyzing genome-scale DNA sequence data to infer population histories, but inference methods developed hitherto are limited in model complexity and computational scalability. Here we present an efficient, flexible statistical method, diCal2, that can use whole-genome sequence data from multiple populations to infer complex demographic models involving population size changes, population splits, admixture, and migration. Applying our method to data from Australian, East Asian, European, and Papuan populations, we find that the population ancestral to Australians and Papuans started separating from East Asians and Europeans about 100,000 y ago, and that the separation of East Asians and Europeans started about 50,000 y ago, with pervasive gene flow between all pairs of populations.
Affiliation(s)
- Matthias Steinrücken
  - Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637
- Jack Kamm
  - Department of Statistics, University of California, Berkeley, CA 94720
  - Chan Zuckerberg Biohub, San Francisco, CA 94158
- Jeffrey P Spence
  - Computational Biology Graduate Group, University of California, Berkeley, CA 94720
- Yun S Song
  - Department of Statistics, University of California, Berkeley, CA 94720
  - Chan Zuckerberg Biohub, San Francisco, CA 94158
  - Computer Science Division, University of California, Berkeley, CA 94720
10. Dutheil JY, Hobolth A. Ancestral Population Genomics. Methods Mol Biol 2019; 1910:555-589. [PMID: 31278677 DOI: 10.1007/978-1-4939-9074-0_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Borrowing both from population genetics and phylogenetics, the field of population genomics emerged as full genomes of several closely related species were available. Provided we can properly model sequence evolution within populations undergoing speciation events, this resource enables us to estimate key population genetics parameters such as ancestral population sizes and split times. Furthermore, we can enhance our understanding of the recombination process and investigate various selective forces. With the advent of resequencing technologies, genome-wide patterns of diversity in extant populations have now come to complement this picture, offering an increasing power to study more recent genetic history. We discuss the basic models of genomes in populations, including speciation models for closely related species. A major point in our discussion is that only a few complete genomes contain much information about the whole population. The reason is that recombination unlinks genomic regions, and therefore a few genomes contain many segments with distinct histories. The challenge of population genomics is to decode this mosaic of histories in order to infer scenarios of demography and selection. We survey modeling strategies for understanding genetic variation in ancestral populations and species. The underlying models build on the coalescent with recombination process and introduce further assumptions to scale the analyses to genomic data sets.
Affiliation(s)
- Julien Y Dutheil
  - Department of Evolutionary Genetics, Max Planck Institute of Evolutionary Biology, Plön, Germany
- Asger Hobolth
  - Bioinformatics Research Center (BiRC), Aarhus University, Aarhus, Denmark
11. Leaché AD, Zhu T, Rannala B, Yang Z. The Spectre of Too Many Species. Syst Biol 2019; 68:168-181. [PMID: 29982825 PMCID: PMC6292489 DOI: 10.1093/sysbio/syy051] [Citation(s) in RCA: 133] [Impact Index Per Article: 26.6] [Received: 11/07/2017] [Revised: 06/29/2018] [Accepted: 06/29/2018] [Indexed: 11/21/2022] Open
Abstract
Recent simulation studies examining the performance of Bayesian species delimitation as implemented in the bpp program have suggested that bpp may detect population splits but not species divergences and that it tends to over-split when data of many loci are analyzed. Here, we confirm these results and provide the mathematical justifications. We point out that the distinction between population and species splits made in the protracted speciation model (PSM) has no influence on the generation of gene trees and sequence data, which explains why no method can use such data to distinguish between population splits and speciation. We suggest that the PSM is unrealistic as its mechanism for assigning species status assumes instantaneous speciation, contradicting prevailing taxonomic practice. We confirm the suggestion, based on simulation, that in the case of speciation with gene flow, Bayesian model selection as implemented in bpp tends to detect population splits when the amount of data (the number of loci) increases. We discuss the use of a recently proposed empirical genealogical divergence index (gdi) for species delimitation and illustrate that parameter estimates produced by a full likelihood analysis as implemented in bpp provide much more reliable inference under the gdi than the approximate method phrapl. We distinguish between Bayesian model selection and parameter estimation and suggest that the model selection approach is useful for identifying sympatric cryptic species, while the parameter estimation approach may be used to implement empirical criteria for determining species status among allopatric populations.
Affiliation(s)
- Adam D Leaché
  - Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, USA
- Tianqi Zhu
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
  - Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
- Bruce Rannala
  - Department of Evolution and Ecology, University of California Davis, One Shields Avenue, Davis, USA
- Ziheng Yang
  - National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, China
  - Department of Genetics, University College London, London, UK
  - Radcliffe Institute for Advanced Studies, Harvard University, Cambridge, USA
12. Rogers J, Raveendran M, Harris RA, Mailund T, Leppälä K, Athanasiadis G, Schierup MH, Cheng J, Munch K, Walker JA, Konkel MK, Jordan V, Steely CJ, Beckstrom TO, Bergey C, Burrell A, Schrempf D, Noll A, Kothe M, Kopp GH, Liu Y, Murali S, Billis K, Martin FJ, Muffato M, Cox L, Else J, Disotell T, Muzny DM, Phillips-Conroy J, Aken B, Eichler EE, Marques-Bonet T, Kosiol C, Batzer MA, Hahn MW, Tung J, Zinner D, Roos C, Jolly CJ, Gibbs RA, Worley KC. The comparative genomics and complex population history of Papio baboons. Sci Adv 2019; 5:eaau6947. [PMID: 30854422 PMCID: PMC6401983 DOI: 10.1126/sciadv.aau6947] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 07/06/2018] [Accepted: 12/06/2018] [Indexed: 05/26/2023]
Abstract
Recent studies suggest that closely related species can accumulate substantial genetic and phenotypic differences despite ongoing gene flow, thus challenging traditional ideas regarding the genetics of speciation. Baboons (genus Papio) are Old World monkeys consisting of six readily distinguishable species. Baboon species hybridize in the wild, and prior data imply a complex history of differentiation and introgression. We produced a reference genome assembly for the olive baboon (Papio anubis) and whole-genome sequence data for all six extant species. We document multiple episodes of admixture and introgression during the radiation of Papio baboons, thus demonstrating their value as a model of complex evolutionary divergence, hybridization, and reticulation. These results help inform our understanding of similar cases, including modern humans, Neanderthals, Denisovans, and other ancient hominins.
Affiliation(s)
- Jeffrey Rogers
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Muthuswamy Raveendran
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- R. Alan Harris
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Thomas Mailund
  - Bioinformatics Research Centre, Aarhus University, CF Møllers Alle 8, DK-8000 Aarhus, Denmark
- Kalle Leppälä
  - Bioinformatics Research Centre, Aarhus University, CF Møllers Alle 8, DK-8000 Aarhus, Denmark
- Georgios Athanasiadis
  - Bioinformatics Research Centre, Aarhus University, CF Møllers Alle 8, DK-8000 Aarhus, Denmark
- Mikkel Heide Schierup
  - Bioinformatics Research Centre, Aarhus University, CF Møllers Alle 8, DK-8000 Aarhus, Denmark
- Jade Cheng
  - Bioinformatics Research Centre, Aarhus University, CF Møllers Alle 8, DK-8000 Aarhus, Denmark
- Kasper Munch
  - Bioinformatics Research Centre, Aarhus University, CF Møllers Alle 8, DK-8000 Aarhus, Denmark
- Jerilyn A. Walker
  - Department of Biological Sciences, 202 Life Sciences Building, Louisiana State University, Baton Rouge, LA 70803, USA
- Miriam K. Konkel
  - Department of Genetics and Biochemistry, 105 Collings Street, Clemson University, Clemson, SC 29634, USA
- Vallmer Jordan
  - Department of Biological Sciences, 202 Life Sciences Building, Louisiana State University, Baton Rouge, LA 70803, USA
- Cody J. Steely
  - Department of Biological Sciences, 202 Life Sciences Building, Louisiana State University, Baton Rouge, LA 70803, USA
- Thomas O. Beckstrom
  - Department of Biological Sciences, 202 Life Sciences Building, Louisiana State University, Baton Rouge, LA 70803, USA
- Christina Bergey
  - Department of Anthropology, New York University, 25 Waverly Place, New York, NY 10003, USA
  - Departments of Anthropology and Biology, Pennsylvania State University, 514 Carpenter Building, University Park, PA 16802, USA
- Andrew Burrell
  - Department of Anthropology, New York University, 25 Waverly Place, New York, NY 10003, USA
- Dominik Schrempf
  - Institut für Populationsgenetik, Veterinärmedizinische Universität Wien, Veterinärplatz 1, 1210 Vienna, Austria
- Angela Noll
  - Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, 37077 Göttingen, Germany
- Maximillian Kothe
  - Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, 37077 Göttingen, Germany
- Gisela H. Kopp
  - Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, 37077 Göttingen, Germany
  - Department of Biology, University of Konstanz, Universitätsstr. 10, 78467 Konstanz, Germany
  - Department of Migration and Immuno-Ecology, Max Planck Institute for Ornithology, Am Obstberg 1, 78315 Radolfzell, Germany
- Yue Liu
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Shwetha Murali
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, S413C, Box 355065, Seattle, WA 98195-5065, USA
  - Howard Hughes Medical Institute, University of Washington, 3720 15th Avenue NE, S413C, Box 355065, Seattle, WA 98195-5065, USA
- Konstantinos Billis
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Fergal J. Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Matthieu Muffato
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Laura Cox
  - Southwest National Primate Research Center, Texas Biomedical Research Institute, 8715 W. Military Drive, San Antonio, TX 78227, USA
  - Center for Precision Medicine, Department of Internal Medicine, Section on Molecular Medicine, Wake Forest School of Medicine, 475 Vine Street, Winston-Salem, NC 27101, USA
- James Else
  - Department of Pathology and Laboratory Medicine and Yerkes Primate Research Center, 954 Gatewood Road, Emory University, Atlanta, GA 30322, USA
- Todd Disotell
  - Department of Anthropology, New York University, 25 Waverly Place, New York, NY 10003, USA
- Donna M. Muzny
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Jane Phillips-Conroy
  - Department of Neuroscience, Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, MO 63110, USA
  - Department of Anthropology, Washington University, McMillan Hall, 1 Brookings Drive, St. Louis, MO 63130, USA
- Bronwen Aken
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, S413C, Box 355065, Seattle, WA 98195-5065, USA
  - Howard Hughes Medical Institute, University of Washington, 3720 15th Avenue NE, S413C, Box 355065, Seattle, WA 98195-5065, USA
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Dr. Aiguader, 88. 08003, Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Passeig de Lluís Companys, 23, 08010, Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology, Baldiri Reixac, 4, 08028, Barcelona, Spain
- Institut Catala de Paleontologia Miquel Crusafont, Universitat Autonoma de Barcelona, c/de les Columnes, s/n. Campus de la UAB. 08193–Cerdanyola del Vallès, Barcelona, Spain
| | - Carolin Kosiol
- Institut für Populationsgenetik, Veterinärmedizinische Universität Wien, Veterinärplatz 1, 1210 Vienna, Austria
- Centre for Biological Diversity, School of Biology, University of St. Andrews, Dyers Brae House, Greenside Place, St Andrews, Fife, KY16 9TH, UK
| | - Mark A. Batzer
- Department of Biological Sciences, 202 Life Sciences Building, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Matthew W. Hahn
- Department of Biology and Department of Computer Science, Indiana University, 1001 E. 3rd Street, Bloomington, IN 47405, USA
| | - Jenny Tung
- Department of Biology, Duke University, Box 90338, Durham, NC 27708, USA
- Duke Population Research Institute, Duke University, Box 90989, Durham, NC 27708, USA
- Institute of Primate Research, P.O. Box 24481, Nairobi, Kenya
| | - Dietmar Zinner
- Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, 37077 Göttingen, Germany
| | - Christian Roos
- Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Kellnerweg 4, 37077 Göttingen, Germany
| | - Clifford J. Jolly
- Department of Anthropology, New York University, 25 Waverly Place, New York, NY 10003, USA
| | - Richard A. Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Kim C. Worley
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | | |
|
13
|
Incomplete lineage sorting rather than hybridization explains the inconsistent phylogeny of the wisent. Commun Biol 2018; 1:169. [PMID: 30374461 PMCID: PMC6195592 DOI: 10.1038/s42003-018-0176-6] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Received: 03/06/2018] [Accepted: 09/12/2018] [Indexed: 12/30/2022] Open
Abstract
The wisent or European bison is the largest European herbivore and is completely cross-fertile with its American relative. However, the mtDNA genome of wisent is similar to that of cattle, which suggests that wisent emerged as a hybrid of bison and an extinct cattle-like species. Here, we analyzed nuclear whole-genome sequences of the bovine species, and found only a minor and recent gene flow between wisent and cattle. Furthermore, we identified an appreciable heterogeneity of the nuclear gene tree topologies of the bovine species. The relative frequencies of various topologies, including the mtDNA topology, were consistent with frequencies of incomplete lineage sorting (ILS) as estimated by tree coalescence analysis. This indicates that ILS has occurred and may well account for the anomalous wisent mtDNA phylogeny as the outcome of a rare event. We propose that ILS is a possible explanation of phylogenomic anomalies among closely related species. Kun Wang et al. present a genomic analysis identifying incomplete lineage sorting and hybridization in the mitochondrial DNA of the European bison (wisent). They find that incomplete lineage sorting is the most feasible explanation for the phylogenetic heterogeneity observed in Bovidae.
|
14
|
Beeravolu CR, Hickerson MJ, Frantz LAF, Lohse K. ABLE: blockwise site frequency spectra for inferring complex population histories and recombination. Genome Biol 2018; 19:145. [PMID: 30253810 PMCID: PMC6156964 DOI: 10.1186/s13059-018-1517-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 01/12/2018] [Accepted: 08/22/2018] [Indexed: 01/08/2023] Open
Abstract
We introduce ABLE (Approximate Blockwise Likelihood Estimation), a novel simulation-based composite likelihood method that uses the blockwise site frequency spectrum to jointly infer past demography and recombination. ABLE is explicitly designed for a wide variety of data from unphased diploid genomes to genome-wide multi-locus data (for example, RADSeq) and can also accommodate arbitrarily large samples. We use simulations to demonstrate the accuracy of this method to infer complex histories of divergence and gene flow and reanalyze whole genome data from two species of orangutan. ABLE is available for download at https://github.com/champost/ABLE.
Affiliation(s)
- Champak R Beeravolu
- Biology Department, The City College of New York, New York, 10031, NY, USA; Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, 8057, Switzerland.
| | - Michael J Hickerson
- Biology Department, The City College of New York, New York, 10031, NY, USA; The Graduate Center, The City University of New York, New York, 10016, NY, USA; Division of Invertebrate Zoology, American Museum of Natural History, New York, 10024, NY, USA
| | - Laurent A F Frantz
- Paleogenomics and Bio-Archaeology Research Network, Research Laboratory for Archeology and History of Art, University of Oxford, Oxford, OX1 3QY, UK; School of Biological and Chemical Sciences, Queen Mary University of London, London, E1 4NS, UK
| | - Konrad Lohse
- Institute of Evolutionary Biology, University of Edinburgh, King's Buildings, Edinburgh, EH9 3FL, UK
| |
|
15
|
The Evolution and Population Diversity of Bison in Pleistocene and Holocene Eurasia: Sex Matters. DIVERSITY 2018. [DOI: 10.3390/d10030065] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 01/04/2023]
Abstract
Knowledge about the origin and evolutionary history of the bison has been improved recently owing to several genomic and paleogenomic studies published in the last two years, which elucidated large parts of the evolution of bison populations during the Upper Pleistocene and Holocene in Eurasia. The produced data, however, were interpreted in contradicting manners. Here, we have gathered, reanalyzed and compared previously published or unpublished morphometric and genetic data that have not yet been integrated and that we synthesize in a unified framework. In particular, we re-estimate dates of divergence of mitogenome lineages based on an extended dataset comprising 81 complete ancient bison mitogenomes and we revisit putative gene flow between the Bos and Bison genera based on comparative analyses of ancient and modern bison genomes, thereby questioning published conclusions. Morphometric analyses taking into account sexual dimorphism invalidate a previous claim that Bison schoetensacki was present in France during the Late Pleistocene. Both morphometric and genome analyses reveal that Eurasian bison belonging to different Bison priscus and Bison bonasus lineages maintained parallel evolutionary paths with gene flow during a long period of incomplete speciation that ceased only upon the migration of B. priscus to the American continent establishing the American bison lineage. Our nuclear genome analysis of the evolutionary history of B. bonasus allows us to reject the previous hypothesis that it is a hybrid of B. priscus and Bos primigenius. Based on present-day behavioral studies of European and American bison, we propose that apparently conflicting lines of evidence can be reconciled by positing that female bison drove the specialization of bison populations to different ecological niches while male bison drove regular homogenizing genetic exchanges between populations.
|
16
|
Pervasive introgression facilitated domestication and adaptation in the Bos species complex. Nat Ecol Evol 2018; 2:1139-1145. [DOI: 10.1038/s41559-018-0562-y] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Received: 11/06/2017] [Accepted: 04/20/2018] [Indexed: 12/23/2022]
|
17
|
Schrider DR, Kern AD. Supervised Machine Learning for Population Genetics: A New Paradigm. Trends Genet 2018; 34:301-312. [PMID: 29331490 PMCID: PMC5905713 DOI: 10.1016/j.tig.2017.12.005] [Citation(s) in RCA: 199] [Impact Index Per Article: 33.2] [Received: 10/20/2017] [Revised: 11/29/2017] [Accepted: 12/08/2017] [Indexed: 01/21/2023]
Abstract
As population genomic datasets grow in size, researchers are faced with the daunting task of making sense of a flood of information. To keep pace with this explosion of data, computational methodologies for population genetic inference are rapidly being developed to best utilize genomic sequence data. In this review we discuss a new paradigm that has emerged in computational population genomics: that of supervised machine learning (ML). We review the fundamentals of ML, discuss recent applications of supervised ML to population genetics that outperform competing methods, and describe promising future directions in this area. Ultimately, we argue that supervised ML is an important and underutilized tool that has considerable potential for the world of evolutionary genomics.
Affiliation(s)
- Daniel R Schrider
- Department of Genetics, and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08554, USA.
| | - Andrew D Kern
- Department of Genetics, and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08554, USA.
| |
|
18
|
Springer MS, Gatesy J. Delimiting Coalescence Genes (C-Genes) in Phylogenomic Data Sets. Genes (Basel) 2018; 9:genes9030123. [PMID: 29495400 PMCID: PMC5867844 DOI: 10.3390/genes9030123] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 12/01/2017] [Revised: 02/02/2018] [Accepted: 02/19/2018] [Indexed: 02/07/2023] Open
Abstract
Summary coalescence methods have emerged as a popular alternative for inferring species trees with large genomic datasets, because these methods explicitly account for incomplete lineage sorting. However, statistical consistency of summary coalescence methods is not guaranteed unless several model assumptions are true, including the critical assumption that recombination occurs freely among but not within coalescence genes (c-genes), which are the fundamental units of analysis for these methods. Each c-gene has a single branching history, and large sets of these independent gene histories should be the input for genome-scale coalescence estimates of phylogeny. By contrast, numerous studies have reported the results of coalescence analyses in which complete protein-coding sequences are treated as c-genes even though exons for these loci can span more than a megabase of DNA. Empirical estimates of recombination breakpoints suggest that c-genes may be much shorter, especially when large clades with many species are the focus of analysis. Although this idea has been challenged recently in the literature, the inverse relationship between c-gene size and increased taxon sampling in a dataset (the 'recombination ratchet') is a fundamental property of c-genes. For taxonomic groups characterized by genes with long intron sequences, complete protein-coding sequences are likely not valid c-genes and are inappropriate units of analysis for summary coalescence methods unless they occur in recombination deserts that are devoid of incomplete lineage sorting (ILS). Finally, it has been argued that coalescence methods are robust when the no-recombination within loci assumption is violated, but recombination must matter at some scale because ILS, a by-product of recombination, is the raison d'être for coalescence methods. That is, extensive recombination is required to yield the large number of independently segregating c-genes used to infer a species tree. If coalescent methods are powerful enough to infer the correct species tree for difficult phylogenetic problems in the anomaly zone, where concatenation is expected to fail because of ILS, then there should be a decreasing probability of inferring the correct species tree using longer loci with many intralocus recombination breakpoints (i.e., increased levels of concatenation).
Affiliation(s)
- Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA 92521, USA.
| | - John Gatesy
- Division of Vertebrate Zoology and Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA.
| |
|
19
|
Leonardi M, Librado P, Der Sarkissian C, Schubert M, Alfarhan AH, Alquraishi SA, Al-Rasheid KAS, Gamba C, Willerslev E, Orlando L. Evolutionary Patterns and Processes: Lessons from Ancient DNA. Syst Biol 2018; 66:e1-e29. [PMID: 28173586 PMCID: PMC5410953 DOI: 10.1093/sysbio/syw059] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 03/26/2016] [Revised: 06/04/2016] [Accepted: 06/06/2016] [Indexed: 12/02/2022] Open
Abstract
Ever since its emergence in 1984, the field of ancient DNA has struggled to overcome the challenges related to the decay of DNA molecules in the fossil record. With the recent development of high-throughput DNA sequencing technologies and molecular techniques tailored to ultra-damaged templates, it has now come of age, merging together approaches in phylogenomics, population genomics, epigenomics, and metagenomics. By leveraging complete temporal sample series, ancient DNA provides direct access to the most important dimension in evolution, time, allowing a wealth of fundamental evolutionary processes to be addressed at unprecedented resolution. This review taps into the most recent findings in ancient DNA research to present analyses of ancient genomic and metagenomic data.
Affiliation(s)
- Michela Leonardi
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark
| | - Pablo Librado
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark
| | - Clio Der Sarkissian
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark
| | - Mikkel Schubert
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark
| | - Ahmed H Alfarhan
- Zoology Department, College of Science, King Saud University, Riyadh, Saudi Arabia
| | - Saleh A Alquraishi
- Zoology Department, College of Science, King Saud University, Riyadh, Saudi Arabia
| | | | - Cristina Gamba
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark
| | - Eske Willerslev
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark; Zoology Department, College of Science, King Saud University, Riyadh, Saudi Arabia
| | - Ludovic Orlando
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade, Copenhagen, Denmark; Université de Toulouse, University Paul Sabatier (UPS), Laboratoire AMIS, Toulouse, France
| |
|
20
|
Pedersen CET, Albrechtsen A, Etter PD, Johnson EA, Orlando L, Chikhi L, Siegismund HR, Heller R. A southern African origin and cryptic structure in the highly mobile plains zebra. Nat Ecol Evol 2018; 2:491-498. [PMID: 29358610 DOI: 10.1038/s41559-017-0453-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/13/2017] [Accepted: 12/14/2017] [Indexed: 12/30/2022]
Abstract
The plains zebra (Equus quagga) is an ecologically important species of the African savannah. It is also one of the most numerous and widely distributed ungulates, and six subspecies have been described based on morphological variation. However, the within-species evolutionary processes have been difficult to resolve due to its high mobility and a lack of consensus regarding the population structure. We obtained genome-wide DNA polymorphism data from more than 167,000 loci for 59 plains zebras from across the species range, encompassing all recognized extant subspecies, as well as three mountain zebras (Equus zebra) and three Grevy's zebras (Equus grevyi). Surprisingly, the population genetic structure does not mirror the morphology-based subspecies delineation, underlining the dangers of basing management units exclusively on morphological variation. We use demographic modelling to provide insights into the past phylogeography of the species. The results identify a southern African location as the most likely source region from which all extant populations expanded around 370,000 years ago. We show evidence for inclusion of the extinct and phenotypically divergent quagga (Equus quagga quagga) in the plains zebra variation and reveal that it was less divergent from the other subspecies than the northernmost (Ugandan) extant population.
Affiliation(s)
- Casper-Emil T Pedersen
- Department of Biology, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark.
| | - Anders Albrechtsen
- Department of Biology, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark
| | - Paul D Etter
- Institute of Molecular Biology, University of Oregon, Eugene, OR, USA
| | - Eric A Johnson
- Institute of Molecular Biology, University of Oregon, Eugene, OR, USA
| | - Ludovic Orlando
- Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Lounes Chikhi
- Instituto Gulbenkian de Ciência, Oeiras, Portugal; Centre National de la Recherche Scientifique, Université Paul Sabatier, École Nationale de Formation Agronomique, UMR 5174 Laboratoire Évolution et Diversité Biologique, Toulouse, France
| | - Hans R Siegismund
- Department of Biology, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark
| | - Rasmus Heller
- Department of Biology, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark.
| |
|
21
|
Abstract
With the advent of sequencing techniques, population genomics underwent a major shift. The structure of data sets has evolved from a sample of a few loci in the genome, sequenced in dozens of individuals, to collections of complete genomes, virtually comprising all available loci. Initially sequenced in a few individuals, such genomic data sets are now reaching and even exceeding the size of traditional data sets in the number of haplotypes sequenced. Because the loci in a genome are not all independent, this evolution of data sets is mirrored by a methodological change. The evolutionary processes that generate the observed sequences are now modeled spatially along genomes, whereas they were previously described temporally (either in a forward or backward manner). Although the spatial process of sequence evolution is complex, approximations to the model feature Markovian properties, permitting efficient inference. In this chapter, we introduce these recent developments that enable the modeling of the evolutionary history of a sample of several individual genomes. Such models assume the occurrence of meiotic recombination, and therefore, to date, they are dedicated to the analysis of eukaryotic species.
|
22
|
Reid MJC, Switzer WM, Schillaci MA, Klegarth AR, Campbell E, Ragonnet-Cronin M, Joanisse I, Caminiti K, Lowenberger CA, Galdikas BMF, Hollocher H, Sandstrom PA, Brooks JI. Bayesian inference reveals ancient origin of simian foamy virus in orangutans. INFECTION GENETICS AND EVOLUTION 2017; 51:54-66. [PMID: 28274887 DOI: 10.1016/j.meegid.2017.03.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/25/2016] [Revised: 02/25/2017] [Accepted: 03/03/2017] [Indexed: 02/08/2023]
Abstract
Simian foamy viruses (SFVs) infect most nonhuman primate species and appear to co-evolve with their hosts. This co-evolutionary signal is particularly strong among great apes, including orangutans (genus Pongo). Previous studies have identified three distinct orangutan SFV clades. The first of these three clades is composed of SFV from P. abelii from Sumatra, the second consists of SFV from P. pygmaeus from Borneo, while the third clade is mixed, comprising an SFV strain found in both species of orangutan. The existence of the mixed clade has been attributed to an expansion of P. pygmaeus into Sumatra following the Mount Toba super-volcanic eruption about 73,000 years ago. Divergence dating, however, has yet to be performed to establish a temporal association with the Toba eruption. Here, we use a Bayesian framework and a relaxed molecular clock model with fossil calibrations to test the Toba hypothesis and to gain a more complete understanding of the evolutionary history of orangutan SFV. As with previous studies, our results show a similar three-clade orangutan SFV phylogeny, along with strong statistical support for SFV-host co-evolution in orangutans. Using Bayesian inference, we date the origin of orangutan SFV to >4.7 million years ago (mya), while the mixed species clade dates to approximately 1.7 mya, >1.6 million years older than the Toba super-eruption. These results, combined with fossil and paleogeographic evidence, suggest that the origin of SFV in Sumatran and Bornean orangutans, including the mixed species clade, likely occurred on the mainland of Indo-China during the Late Pliocene and Calabrian stage of the Pleistocene, respectively.
Affiliation(s)
- Michael J C Reid
- Department of Anthropology, University of Toronto Scarborough, 1265 Military Trail, Scarborough, Ontario M1C 1A4, Canada; Department of Anthropology, University of Toronto, 19 Russell Street, Toronto, Ontario M5S 2S2, Canada.
| | - William M Switzer
- Laboratory Branch, Division of HIV/AIDS Prevention, Center for Disease Control and Prevention, Atlanta, GA 30329, USA.
| | - Michael A Schillaci
- Department of Anthropology, University of Toronto Scarborough, 1265 Military Trail, Scarborough, Ontario M1C 1A4, Canada.
| | - Amy R Klegarth
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA; Department of Anthropology, University of Washington, Seattle, WA 98105, USA.
| | - Ellsworth Campbell
- Laboratory Branch, Division of HIV/AIDS Prevention, Center for Disease Control and Prevention, Atlanta, GA 30329, USA.
| | - Manon Ragonnet-Cronin
- Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, West Mains Road, Edinburgh EH9 3JT, United Kingdom
| | - Isabelle Joanisse
- National HIV & Retrovirology Laboratories, JC Wilt Infectious Diseases Research Centre, National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
| | - Kyna Caminiti
- Centre for Biosecurity, Public Health Agency of Canada, 100 Colonnade Road, Ottawa, Ontario, Canada.
| | - Carl A Lowenberger
- Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, Canada.
| | - Birute Mary F Galdikas
- Department of Archaeology, Simon Fraser University, Burnaby, British Columbia, Canada; Orangutan Foundation International, 824 S. Wellesley Ave., Los Angeles, CA 90049, USA
| | - Hope Hollocher
- Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA.
| | - Paul A Sandstrom
- National HIV & Retrovirology Laboratories, JC Wilt Infectious Diseases Research Centre, National Microbiology Laboratory, Public Health Agency of Canada, Ottawa, Ontario, Canada.
| | - James I Brooks
- National HIV & Retrovirology Laboratories, JC Wilt Infectious Diseases Research Centre, National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada; The Ottawa Hospital, Division of Infectious Diseases, Department of Medicine, University of Ottawa, 1053 Carling Ave., Ottawa, ON K1Y 4E9, Canada
| |
|
23
|
Abascal F, Corvelo A, Cruz F, Villanueva-Cañas JL, Vlasova A, Marcet-Houben M, Martínez-Cruz B, Cheng JY, Prieto P, Quesada V, Quilez J, Li G, García F, Rubio-Camarillo M, Frias L, Ribeca P, Capella-Gutiérrez S, Rodríguez JM, Câmara F, Lowy E, Cozzuto L, Erb I, Tress ML, Rodriguez-Ales JL, Ruiz-Orera J, Reverter F, Casas-Marce M, Soriano L, Arango JR, Derdak S, Galán B, Blanc J, Gut M, Lorente-Galdos B, Andrés-Nieto M, López-Otín C, Valencia A, Gut I, García JL, Guigó R, Murphy WJ, Ruiz-Herrera A, Marques-Bonet T, Roma G, Notredame C, Mailund T, Albà MM, Gabaldón T, Alioto T, Godoy JA. Extreme genomic erosion after recurrent demographic bottlenecks in the highly endangered Iberian lynx. Genome Biol 2016; 17:251. [PMID: 27964752 PMCID: PMC5155386 DOI: 10.1186/s13059-016-1090-1] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Received: 02/26/2016] [Accepted: 10/25/2016] [Indexed: 11/21/2022] Open
Abstract
BACKGROUND: Genomic studies of endangered species provide insights into their evolution and demographic history, reveal patterns of genomic erosion that might limit their viability, and offer tools for their effective conservation. The Iberian lynx (Lynx pardinus) is the most endangered felid and a unique example of a species on the brink of extinction. RESULTS: We generate the first annotated draft of the Iberian lynx genome and carry out genome-based analyses of lynx demography, evolution, and population genetics. We identify a series of severe population bottlenecks in the history of the Iberian lynx that predate its known demographic decline during the 20th century and have greatly impacted its genome evolution. We observe drastically reduced rates of weak-to-strong substitutions associated with GC-biased gene conversion and increased rates of fixation of transposable elements. We also find multiple signatures of genetic erosion in the two remnant Iberian lynx populations, including a high frequency of potentially deleterious variants and substitutions, as well as the lowest genome-wide genetic diversity reported so far in any species. CONCLUSIONS: The genomic features observed in the Iberian lynx genome may hamper short- and long-term viability through reduced fitness and adaptive potential. The knowledge and resources developed in this study will boost the research on felid evolution and conservation genomics and will benefit the ongoing conservation and management of this emblematic species.
Affiliation(s)
- Federico Abascal
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
| | - André Corvelo
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Fernando Cruz
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
- Department of Integrative Ecology, Doñana Biological Station (EBD), Spanish National Research Council (CSIC), C/ Americo Vespucio, s/n, 41092, Sevilla, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - José L Villanueva-Cañas
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Anna Vlasova
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Marina Marcet-Houben
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Begoña Martínez-Cruz
- Department of Integrative Ecology, Doñana Biological Station (EBD), Spanish National Research Council (CSIC), C/ Americo Vespucio, s/n, 41092, Sevilla, Spain
| | - Jade Yu Cheng
- Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, 8000, Aarhus, Denmark
| | - Pablo Prieto
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Víctor Quesada
- Departamento de Bioquímica y Biología Molecular, Instituto Universitario de Oncología (IUOPA), Universidad de Oviedo, 33006, Oviedo, Spain
| | - Javier Quilez
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, 08003, Barcelona, Spain
| | - Gang Li
- Department of Veterinary Integrative Biosciences, College of Veterinary Medicine, Texas A&M University, College Station, TX, 77843, USA
| | - Francisca García
- Servei de Cultius Cel.lulars (SCC, SCAC), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Miriam Rubio-Camarillo
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
| | - Leonor Frias
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Paolo Ribeca
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Salvador Capella-Gutiérrez
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - José M Rodríguez
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
- National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
| | - Francisco Câmara
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Ernesto Lowy
- Bioinformatics Core Facility, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Luca Cozzuto
- Bioinformatics Core Facility, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Ionas Erb
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Michael L Tress
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
| | - Jose L Rodriguez-Ales
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Jorge Ruiz-Orera
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Ferran Reverter
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Mireia Casas-Marce
- Department of Integrative Ecology, Doñana Biological Station (EBD), Spanish National Research Council (CSIC), C/ Americo Vespucio, s/n, 41092, Sevilla, Spain
| | - Laura Soriano
- Department of Integrative Ecology, Doñana Biological Station (EBD), Spanish National Research Council (CSIC), C/ Americo Vespucio, s/n, 41092, Sevilla, Spain
| | - Javier R Arango
- Departamento de Bioquímica y Biología Molecular, Instituto Universitario de Oncología (IUOPA), Universidad de Oviedo, 33006, Oviedo, Spain
| | - Sophia Derdak
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Beatriz Galán
- Department of Environmental Biology, Center for Biological Research (CIB), Spanish National Research Council (CSIC), Ramiro de Maeztu 9, 28040, Madrid, Spain
| | - Julie Blanc
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Marta Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - Belen Lorente-Galdos
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, 08003, Barcelona, Spain
| | - Marta Andrés-Nieto
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, 08193, Cerdanyola del Vallès, Spain
| | - Carlos López-Otín
- Departamento de Bioquímica y Biología Molecular, Instituto Universitario de Oncología (IUOPA), Universidad de Oviedo, 33006, Oviedo, Spain
| | - Alfonso Valencia
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
- National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
| | - Ivo Gut
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
| | - José L García
- Department of Environmental Biology, Center for Biological Research (CIB), Spanish National Research Council (CSIC), Ramiro de Maeztu 9, 28040, Madrid, Spain
| | - Roderic Guigó
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
- Computational Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - William J Murphy
- Department of Veterinary Integrative Biosciences, College of Veterinary Medicine, Texas A&M University, College Station, TX, 77843, USA
| | - Aurora Ruiz-Herrera
- Institut de Biotecnologia i de Biomedicina, Universitat Autònoma de Barcelona, 08193, Cerdanyola del Vallès, Spain
- Departament de Biologia Cel.lular, Fisiologia i Immunologia, Universitat Autònoma de Barcelona, 08193, Cerdanyola del Vallès, Spain
| | - Tomas Marques-Bonet
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
- Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, 08003, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010, Barcelona, Spain
| | - Guglielmo Roma
- Bioinformatics Core Facility, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Cedric Notredame
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Thomas Mailund
- Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, 8000, Aarhus, Denmark
| | - M Mar Albà
- Evolutionary Genomics Group, Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute (IMIM), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010, Barcelona, Spain
| | - Toni Gabaldón
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010, Barcelona, Spain
| | - Tyler Alioto
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Dr. Aiguader 88, 08003, Barcelona, Spain
| | - José A Godoy
- Department of Integrative Ecology, Doñana Biological Station (EBD), Spanish National Research Council (CSIC), C/ Americo Vespucio, s/n, 41092, Sevilla, Spain.
|
24
|
|
25
|
Gautier M, Moazami-Goudarzi K, Levéziel H, Parinello H, Grohs C, Rialle S, Kowalczyk R, Flori L. Deciphering the Wisent Demographic and Adaptive Histories from Individual Whole-Genome Sequences. Mol Biol Evol 2016; 33:2801-2814. [PMID: 27436010 PMCID: PMC5062319 DOI: 10.1093/molbev/msw144] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Indexed: 11/21/2022] Open
Abstract
As the largest European herbivore, the wisent (Bison bonasus) is emblematic of the continent wildlife but has unclear origins. Here, we infer its demographic and adaptive histories from two individual whole-genome sequences via a detailed comparative analysis with bovine genomes. We estimate that the wisent and bovine species diverged from 1.7 × 10⁶ to 850,000 years before present (YBP) through a speciation process involving an extended period of limited gene flow. Our data further support the occurrence of more recent secondary contacts, posterior to the Bos taurus and Bos indicus divergence (∼150,000 YBP), between the wisent and (European) taurine cattle lineages. Although the wisent and bovine population sizes experienced a similar sharp decline since the Last Glacial Maximum, we find that the wisent demography remained more fluctuating during the Pleistocene. This is in agreement with a scenario in which wisents responded to successive glaciations by habitat fragmentation rather than southward and eastward migration as for the bovine ancestors. We finally detect 423 genes under positive selection between the wisent and bovine lineages, which shed a new light on the genome response to different living conditions (temperature, available food resource, and pathogen exposure) and on the key gene functions altered by the domestication process.
Affiliation(s)
- Mathieu Gautier
- CBGP, INRA, CIRAD, IRD, Supagro, Montferrier-sur-Lez, France; IBC, Institut de Biologie Computationnelle, Montpellier, France
| | - Hugues Parinello
- MGX-Montpellier GenomiX, c/o Institut de Génomique Fonctionnelle, Montpellier, France
| | - Cécile Grohs
- GABI, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France
| | - Stéphanie Rialle
- MGX-Montpellier GenomiX, c/o Institut de Génomique Fonctionnelle, Montpellier, France
| | - Rafał Kowalczyk
- Mammal Research Institute, Polish Academy of Sciences, Białowieża, Poland
| | - Laurence Flori
- GABI, INRA, AgroParisTech, Université Paris-Saclay, Jouy-en-Josas, France; INTERTRYP, CIRAD, IRD, Montpellier, France
|
26
|
Herrera CS, Hirooka Y, Chaverri P. Pseudocospeciation of the mycoparasite Cosmospora with their fungal hosts. Ecol Evol 2016; 6:1504-14. [PMID: 27087926 PMCID: PMC4775519 DOI: 10.1002/ece3.1967] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 09/24/2015] [Revised: 12/28/2015] [Accepted: 01/03/2016] [Indexed: 01/07/2023] Open
Abstract
Species of Cosmospora are parasites of other fungi (mycoparasites), including species belonging to the Xylariales. Based on prior taxonomic work, these fungi were determined to be highly host specific. We suspected that the association of Cosmospora and their hosts could not be a result of random chance, and tested the cospeciation of Cosmospora and their hosts with contemporary methods (e.g., ParaFit, PACo, and Jane). The cophylogeny of Cosmospora and their hosts was found to be congruent, but only host‐parasite links in more recent evolutionary lineages of the host were determined as coevolutionary. Reconciliation reconstructions determined at least five host‐switch events early in the evolution of Cosmospora. Additionally, the rates of evolution between Cosmospora and their hosts were unequal. This pattern is more likely to be explained by pseudocospeciation (i.e., host switches followed by cospeciation), which also produces congruent cophylogenies.
Affiliation(s)
- Cesar S Herrera
- Department of Plant Science and Landscape Architecture, University of Maryland, 2112 Plant Sciences Building, College Park, Maryland 20742, United States
| | - Yuuri Hirooka
- Department of Clinical Plant Science, Faculty of Bioscience, Hosei University, 3-7-2 Kajino-cho, Koganei, Tokyo, Japan
| | - Priscila Chaverri
- Department of Plant Science and Landscape Architecture, University of Maryland, 2112 Plant Sciences Building, College Park, Maryland 20742, United States; Escuela de Biología, Universidad de Costa Rica, Apartado 11501-2060, San Pedro, San José, Costa Rica
|
27
|
Hejase HA, Liu KJ. Mapping the genomic architecture of adaptive traits with interspecific introgressive origin: a coalescent-based approach. BMC Genomics 2016; 17 Suppl 1:8. [PMID: 26819241 PMCID: PMC4895787 DOI: 10.1186/s12864-015-2298-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
Recent studies of eukaryotes including human and Neandertal, mice, and butterflies have highlighted the major role that interspecific introgression has played in adaptive trait evolution. A common question arises in each case: what is the genomic architecture of the introgressed traits? One common approach that can be used to address this question is association mapping, which looks for genotypic markers that have significant statistical association with a trait. It is well understood that sample relatedness can be a confounding factor in association mapping studies if not properly accounted for. Introgression and other evolutionary processes (e.g., incomplete lineage sorting) typically introduce variation among local genealogies, which can also differ from global sample structure measured across all genomic loci. In contrast, state-of-the-art association mapping methods assume fixed sample relatedness across the genome, which can lead to spurious inference. We therefore propose a new association mapping method called Coal-Map, which uses coalescent-based models to capture local genealogical variation alongside global sample structure. Using simulated and empirical data reflecting a range of evolutionary scenarios, we compare the performance of Coal-Map against EIGENSTRAT, a leading association mapping method in terms of its popularity, power, and type I error control. Our empirical data makes use of hundreds of mouse genomes for which adaptive interspecific introgression has recently been described. We found that Coal-Map's performance is comparable or better than EIGENSTRAT in terms of statistical power and false positive rate. Coal-Map's performance advantage was greatest on model conditions that most closely resembled empirically observed scenarios of adaptive introgression. These conditions had: (1) causal SNPs contained in one or a few introgressed genomic loci and (2) varying rates of gene flow - from high rates to very low rates where incomplete lineage sorting dominated as a primary cause of local genealogical variation.
Affiliation(s)
- Hussein A Hejase
- Department of Computer Science and Engineering, Michigan State University, 428 S. Shaw Lane, East Lansing, 48824, MI, USA.
| | - Kevin J Liu
- Department of Computer Science and Engineering, Michigan State University, 428 S. Shaw Lane, East Lansing, 48824, MI, USA.
|
28
|
Zhou J, Teo YY. Estimating time to the most recent common ancestor (TMRCA): comparison and application of eight methods. Eur J Hum Genet 2015; 24:1195-201. [PMID: 26669663 DOI: 10.1038/ejhg.2015.258] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/03/2015] [Revised: 10/19/2015] [Accepted: 10/29/2015] [Indexed: 11/09/2022] Open
Abstract
Investigating how an ancestral population diverges to give rise to distinct subpopulations remains a fundamental pursuit in population genetics. There is broad consensus for the 'Out-of-Africa' hypothesis that states that modern humans arose ∼200 000 years ago in Africa and spread throughout the continent ∼100 000 years ago. This was followed by several waves of major population dispersals across the globe, although the exact nature of the population divergence remains debatable. Existing methods to estimate population divergence time differ in their methodological frameworks and demographic assumptions, and require different types of genetic data as input. These fundamental differences often result in the methods producing inconsistent estimates of the population divergence time, further confounding attempts to robustly uncover the history of human migration, especially when most population genetic studies do not employ multiple methods to estimate the time to the most recent common ancestor (TMRCA). Here, we chose eight popular methods for estimating TMRCA and evaluated their robustness and accuracy in correctly identifying the true TMRCA through a series of simulations that mimicked different evolutionary scenarios. We subsequently applied all eight methods to estimate the population divergence time between Southeast Asian Malays and South Asian Indians using deep whole-genome sequencing data.
Affiliation(s)
- Jin Zhou
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
| | - Yik-Ying Teo
- Department of Statistics and Applied Probability, National University of Singapore, Singapore; Saw Swee Hock School of Public Health, National University of Singapore, Singapore; NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore; Life Sciences Institute, National University of Singapore, Singapore; Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore
|
29
|
|
30
|
Ramaswamy K, Yik WY, Wang XM, Oliphant EN, Lu W, Shibata D, Ryder OA, Hacia JG. Derivation of induced pluripotent stem cells from orangutan skin fibroblasts. BMC Res Notes 2015; 8:577. [PMID: 26475477 PMCID: PMC4609060 DOI: 10.1186/s13104-015-1567-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 06/18/2015] [Accepted: 10/07/2015] [Indexed: 01/08/2023] Open
Abstract
Background: Orangutans are an endangered species whose natural habitats are restricted to the Southeast Asian islands of Borneo and Sumatra. Along with the African great apes, orangutans are among the closest living relatives to humans. For potential species conservation and functional genomics studies, we derived induced pluripotent stem cells (iPSCs) from cryopreserved somatic cells obtained from captive orangutans. Results: Primary skin fibroblasts from two Sumatran orangutans were transduced with retroviral vectors expressing the human OCT4, SOX2, KLF4, and c-MYC factors. Candidate orangutan iPSCs were characterized by global gene expression and DNA copy number analysis. All were consistent with pluripotency and provided no evidence of large genomic insertions or deletions. In addition, orangutan iPSCs were capable of producing cells derived from all three germ layers in vitro through embryoid body differentiation assays and in vivo through teratoma formation in immune-compromised mice. Conclusions: We demonstrate that orangutan skin fibroblasts are capable of being reprogrammed into iPSCs with hallmark molecular signatures and differentiation potential. We suggest that reprogramming orangutan somatic cells in genome resource banks could provide new opportunities for advancing assisted reproductive technologies relevant for species conservation efforts. Furthermore, orangutan iPSCs could have applications for investigating the phenotypic relevance of genomic changes that occurred in the human, African great ape, and/or orangutan lineages. This provides opportunities for orangutan cell culture models that would otherwise be impossible to develop from living donors due to the invasive nature of the procedures required for obtaining primary cells.
Affiliation(s)
- Krishna Ramaswamy
- Department of Biochemistry and Molecular Biology, University of Southern California, Los Angeles, CA, USA.
| | - Wing Yan Yik
- Department of Biochemistry and Molecular Biology, University of Southern California, Los Angeles, CA, USA.
| | - Xiao-Ming Wang
- Department of Biochemistry and Molecular Biology, University of Southern California, Los Angeles, CA, USA.
| | - Erin N Oliphant
- Department of Biochemistry and Molecular Biology, University of Southern California, Los Angeles, CA, USA.
| | - Wange Lu
- Department of Biochemistry and Molecular Biology, University of Southern California, Los Angeles, CA, USA.
| | - Darryl Shibata
- Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA.
| | - Oliver A Ryder
- San Diego Zoo Institute for Conservation Research, San Diego Zoo Global, San Diego, CA, USA.
| | - Joseph G Hacia
- Department of Biochemistry and Molecular Biology, University of Southern California, Los Angeles, CA, USA.
|
31
|
Chen H. Population genetic studies in the genomic sequencing era. DONG WU XUE YAN JIU = ZOOLOGICAL RESEARCH 2015; 36:223-32. [PMID: 26228473 DOI: 10.13918/j.issn.2095-8137.2015.4.223] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/01/2022]
Abstract
Recent advances in high-throughput sequencing technologies have revolutionized the field of population genetics. Data now routinely contain genomic level polymorphism information, and the low cost of DNA sequencing enables researchers to investigate tens of thousands of subjects at a time. This provides an unprecedented opportunity to address fundamental evolutionary questions, while posing challenges on traditional population genetic theories and methods. This review provides an overview of the recent methodological developments in the field of population genetics, specifically methods used to infer ancient population history and investigate natural selection using large-sample, large-scale genetic data. Several open questions are also discussed at the end of the review.
Affiliation(s)
- Hua Chen
- Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101,
|
32
|
Cheng JY, Mailund T. Ancestral population genomics using coalescence hidden Markov models and heuristic optimisation algorithms. Comput Biol Chem 2015; 57:80-92. [PMID: 25819138 DOI: 10.1016/j.compbiolchem.2015.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2015] [Accepted: 02/02/2015] [Indexed: 10/23/2022]
Abstract
With full genome data from several closely related species now readily available, we have the ultimate data for demographic inference. Exploiting these full genomes, however, requires models that can explicitly model recombination along alignments of full chromosomal length. Over the last decade a class of models, based on the sequential Markov coalescence model combined with hidden Markov models, has been developed and used to make inference in simple demographic scenarios. To move forward to more complex demographic modelling we need better and more automated ways of specifying these models and efficient optimisation algorithms for inferring the parameters in complex and often high-dimensional models. In this paper we present a framework for building such coalescence hidden Markov models for pairwise alignments and present results for using heuristic optimisation algorithms for parameter estimation. We show that we can build more complex demographic models than our previous frameworks and that we obtain more accurate parameter estimates using heuristic optimisation algorithms than when using our previous gradient based approaches. Our new framework provides a flexible way of constructing coalescence hidden Markov models almost automatically. While estimating parameters in more complex models is still challenging we show that using heuristic optimisation algorithms we still get a fairly good accuracy.
Affiliation(s)
- Jade Yu Cheng
- Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, 8000 Aarhus, Denmark.
| | - Thomas Mailund
- Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, 8000 Aarhus, Denmark.
|
33
|
The SMC' is a highly accurate approximation to the ancestral recombination graph. Genetics 2015; 200:343-55. [PMID: 25786855 DOI: 10.1534/genetics.114.173898] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 12/17/2014] [Accepted: 03/12/2015] [Indexed: 11/18/2022] Open
Abstract
Two sequentially Markov coalescent models (SMC and SMC') are available as tractable approximations to the ancestral recombination graph (ARG). We present a Markov process describing coalescence at two fixed points along a pair of sequences evolving under the SMC'. Using our Markov process, we derive a number of new quantities related to the pairwise SMC', thereby analytically quantifying for the first time the similarity between the SMC' and the ARG. We use our process to show that the joint distribution of pairwise coalescence times at recombination sites under the SMC' is the same as it is marginally under the ARG, which demonstrates that the SMC' is, in a particular well-defined, intuitive sense, the most appropriate first-order sequentially Markov approximation to the ARG. Finally, we use these results to show that population size estimates under the pairwise SMC are asymptotically biased, while under the pairwise SMC' they are approximately asymptotically unbiased.
|
34
|
Kumagai S, Uyenoyama MK. Genealogical histories in structured populations. Theor Popul Biol 2015; 102:3-15. [PMID: 25770971 DOI: 10.1016/j.tpb.2015.01.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/27/2013] [Revised: 12/13/2014] [Accepted: 01/29/2015] [Indexed: 11/28/2022]
Abstract
In genealogies of genes sampled from structured populations, lineages coalesce at rates dependent on the states of the lineages. For migration and coalescence events occurring on comparable time scales, for example, only lineages residing in the same deme of a geographically subdivided population can have descended from a common ancestor in the immediately preceding generation. Here, we explore aspects of genealogical structure in a population comprising two demes, between which migration may occur. We use generating functions to obtain exact densities and moments of coalescence time, number of mutations, total tree length, and age of the most recent common ancestor of the sample. We describe qualitative features of the distribution of gene genealogies, including factors that influence the geographical location of the most recent common ancestor and departures of the distribution of internode lengths from exponential.
Affiliation(s)
- Seiji Kumagai
- Department of Biology, Box 90338, Duke University, Durham, NC 27708-0338, USA
| | - Marcy K Uyenoyama
- Department of Biology, Box 90338, Duke University, Durham, NC 27708-0338, USA.
|
35
|
Moaeen-ud-Din M, Bilal G. Sequence diversity and molecular evolutionary rates between buffalo and cattle. J Anim Breed Genet 2015; 132:74-84. [PMID: 25619307 DOI: 10.1111/jbg.12100] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/04/2014] [Accepted: 05/12/2014] [Indexed: 12/19/2022]
Abstract
Identification of genes of importance regarding production traits in buffalo is impaired by a paucity of genomic resources. One option to fill this gap is to exploit data available for the cow. The cross-species application of comparative genomics tools is a potential means to investigate the buffalo genome. However, this is dependent on nucleotide sequence similarity. In this study, gene diversity between buffalo and cattle was determined using 86 gene orthologues. There was approximately 3% difference in all genes in terms of nucleotide diversity and 0.267 ± 0.134 in amino acids, indicating the possibility of successfully using cross-species strategies for genomic studies. There were significantly higher non-synonymous substitutions in both cattle and buffalo; however, the difference in terms of dN - dS was similar (4.414 versus 4.745) in buffalo and cattle, respectively. A higher rate of non-synonymous substitutions at a similar level in buffalo and cattle indicated a similar positive selection pressure. Results for the relative rate test were assessed with the chi-squared test. There was no significant difference in unique mutations between cattle and buffalo lineages at synonymous sites. However, there was a significant difference in unique mutations at non-synonymous sites, indicating an ongoing mutagenic process that generates substitutional mutations at approximately the same rate at silent sites. Moreover, despite common ancestry, our results indicate different divergence times among genes of cattle and buffalo. This is the first demonstration that variable rates of molecular evolution may be present within the family Bovidae.
Affiliation(s)
- M Moaeen-ud-Din
- Laboratories of Animal Breeding & Genetics, Faculty of Veterinary & Animal Sciences, PMAS-Arid Agriculture University, Rawalpindi, Pakistan
|
36
|
Nater A, Greminger MP, Arora N, van Schaik CP, Goossens B, Singleton I, Verschoor EJ, Warren KS, Krützen M. Reconstructing the demographic history of orang-utans using Approximate Bayesian Computation. Mol Ecol 2015; 24:310-27. [PMID: 25439562 DOI: 10.1111/mec.13027] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 04/24/2014] [Revised: 11/24/2014] [Accepted: 11/27/2014] [Indexed: 11/27/2022]
Abstract
Investigating how different evolutionary forces have shaped patterns of DNA variation within and among species requires detailed knowledge of their demographic history. Orang-utans, whose distribution is currently restricted to the South-East Asian islands of Borneo (Pongo pygmaeus) and Sumatra (Pongo abelii), have likely experienced a complex demographic history, influenced by recurrent changes in climate and sea levels, volcanic activities and anthropogenic pressures. Using the most extensive sample set of wild orang-utans to date, we employed an Approximate Bayesian Computation (ABC) approach to test the fit of 12 different demographic scenarios to the observed patterns of variation in autosomal, X-chromosomal, mitochondrial and Y-chromosomal markers. In the best-fitting model, Sumatran orang-utans exhibit a deep split of populations north and south of Lake Toba, probably caused by multiple eruptions of the Toba volcano. In addition, we found signals for a strong decline in all Sumatran populations ~24 ka, probably associated with hunting by human colonizers. In contrast, Bornean orang-utans experienced a severe bottleneck ~135 ka, followed by a population expansion and substructuring starting ~82 ka, which we link to an expansion from a glacial refugium. We showed that orang-utans went through drastic changes in population size and connectedness, caused by recurrent contraction and expansion of rainforest habitat during Pleistocene glaciations and probably hunting by early humans. Our findings emphasize the fact that important aspects of the evolutionary past of species with complex demographic histories might remain obscured when applying overly simplified models.
Affiliation(s)
- Alexander Nater
- Anthropological Institute & Museum, University of Zurich, Winterthurerstrasse 190, 8057, Zurich, Switzerland
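The rejection form of Approximate Bayesian Computation named in the abstract above can be sketched in a few lines. This is a generic toy illustration in Python, not the study's pipeline: the function names (`abc_rejection`, `simulate_mean`), the uniform prior, the single summary statistic and the tolerance are all assumptions chosen for the example, whereas the study compared 12 demographic scenarios across several marker types.

```python
import random

def abc_rejection(observed, simulate, sample_prior, n_draws, tolerance, rng):
    """Keep the prior draws whose simulated summary statistic lands
    within `tolerance` of the observed summary statistic."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if abs(simulate(theta, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted

def simulate_mean(theta, rng, n=50):
    # Hypothetical simulator: the summary is the mean of n noisy observations.
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the sketch is reproducible
posterior = abc_rejection(
    observed=4.0,
    simulate=simulate_mean,
    sample_prior=lambda r: r.uniform(0.0, 10.0),
    n_draws=5000,
    tolerance=0.2,
    rng=rng,
)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted draws approximate the posterior distribution of the parameter; a smaller tolerance gives a closer approximation at the cost of a lower acceptance rate.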
37
Hobolth A, Jensen JL. Markovian approximation to the finite loci coalescent with recombination along multiple sequences. Theor Popul Biol 2014; 98:48-58. [DOI: 10.1016/j.tpb.2014.01.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 05/23/2013] [Revised: 10/23/2013] [Accepted: 01/18/2014] [Indexed: 10/25/2022]
38
Whole-genome sequencing of the snub-nosed monkey provides insights into folivory and evolutionary history. Nat Genet 2014; 46:1303-10. [DOI: 10.1038/ng.3137] [Citation(s) in RCA: 137] [Impact Index Per Article: 13.7] [Received: 02/24/2014] [Accepted: 10/09/2014] [Indexed: 11/08/2022]
39
Dasmeh P, Serohijos AWR, Kepp KP, Shakhnovich EI. The influence of selection for protein stability on dN/dS estimations. Genome Biol Evol 2014; 6:2956-67. [PMID: 25355808 PMCID: PMC4224349 DOI: 10.1093/gbe/evu223] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Indexed: 11/12/2022] Open
Abstract
Understanding the relative contributions of various evolutionary processes (purifying selection, neutral drift, and adaptation) is fundamental to evolutionary biology. A common metric to distinguish these processes is the ratio of nonsynonymous to synonymous substitutions (i.e., dN/dS) interpreted from the neutral theory as a null model. However, from biophysical considerations, mutations have non-negligible effects on the biophysical properties of proteins such as folding stability. In this work, we investigated how stability affects the rate of protein evolution in phylogenetic trees by using simulations that combine explicit protein sequences with associated stability changes. We first simulated myoglobin evolution in phylogenetic trees with a biophysically realistic approach that accounts for 3D structural information and estimates of changes in stability upon mutation. We then compared evolutionary rates inferred directly from simulation to those estimated using maximum-likelihood (ML) methods. We found that the dN/dS estimated by ML methods (ωML) is highly predictive of the per gene dN/dS inferred from the simulated phylogenetic trees. This agreement is strong in the regime of high stability where protein evolution is neutral. At low folding stabilities and under mutation-selection balance, we observe deviations from neutrality (per gene dN/dS > 1 and dN/dS < 1). We showed that although per gene dN/dS is robust to these deviations, ML tests for positive selection detect statistically significant per site dN/dS > 1. Altogether, we show how protein biophysics affects the dN/dS estimations and its subsequent interpretation. These results are important for improving the current approaches for detecting positive selection.
Affiliation(s)
- Pouria Dasmeh
- Department of Chemistry and Chemical Biology, Harvard University; DTU Chemistry, Technical University of Denmark, Kongens Lyngby, Denmark; Present address: Max Planck Institute of Immunobiology and Epigenetics, Stübeweg, Freiburg, Germany
- Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Kongens Lyngby, Denmark
40
Abstract
Recombination allows different parts of the genome to have different genealogical histories. When a species splits in two, allelic lineages sort into the two descendant species, and this lineage sorting varies along the genome. If speciation events are close in time, the lineage sorting process may be incomplete at the second speciation event and lead to gene genealogies that do not match the species phylogeny. We review different recent approaches to model lineage sorting along the genome and show how it is possible to learn about population sizes, natural selection, and recombination rates in ancestral species from application of these models to genome alignments of great ape species.
Affiliation(s)
- Thomas Mailund
- Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark
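The dependence of incomplete lineage sorting on the time between speciation events, as described in the abstract above, can be made concrete with the standard three-species coalescent result. The sketch below assumes a neutral, constant-size, panmictic diploid ancestral population; the function names and the numerical values are illustrative and are not taken from the review.

```python
import math

def ils_probability(generations_between_splits, n_e):
    """Probability that two lineages do NOT coalesce in the ancestral
    population that exists between the two speciation events.
    n_e is the diploid effective size of that ancestral population."""
    t = generations_between_splits / (2.0 * n_e)  # time in coalescent units
    return math.exp(-t)

def discordance_probability(generations_between_splits, n_e):
    """Probability that the gene tree topology differs from the species tree.
    When sorting is incomplete, all three topologies are equally likely,
    so two out of three such genealogies are discordant."""
    return (2.0 / 3.0) * ils_probability(generations_between_splits, n_e)

# Illustrative values only: 100,000 generations between splits, N_e = 50,000.
p_discordant = discordance_probability(100_000, 50_000)
```

Shorter intervals between speciation events or larger ancestral populations both push the discordance probability toward its upper limit of 2/3.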
41
Abstract
This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
Affiliation(s)
- Gergely J Szöllősi
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Eric Tannier
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Vincent Daubin
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Bastien Boussau
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
42
Abstract
Recombination maps of ancestral species can be constructed from comparative analyses of genomes from closely related species, exemplified by a recently published map of the human-chimpanzee ancestor. Such maps resolve differences in recombination rate between species into changes along individual branches in the speciation tree, and allow identification of associated changes in the genomic sequences. We describe how coalescent hidden Markov models are able to call individual recombination events in ancestral species through inference of incomplete lineage sorting along a genomic alignment. In the great apes, speciation events are sufficiently close in time that a map can be inferred for the ancestral species at each internal branch - allowing evolution of recombination rate to be tracked over evolutionary time scales from speciation event to speciation event. We see this approach as a way of characterizing the evolution of recombination rate and the genomic properties that influence it.
43
Liu KJ, Dai J, Truong K, Song Y, Kohn MH, Nakhleh L. An HMM-based comparative genomic framework for detecting introgression in eukaryotes. PLoS Comput Biol 2014; 10:e1003649. [PMID: 24922281 PMCID: PMC4055573 DOI: 10.1371/journal.pcbi.1003649] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 10/28/2013] [Accepted: 04/14/2014] [Indexed: 12/20/2022] Open
Abstract
One outcome of interspecific hybridization and subsequent effects of evolutionary forces is introgression, which is the integration of genetic material from one species into the genome of an individual in another species. The evolution of several groups of eukaryotic species has involved hybridization, and cases of adaptation through introgression have already been established. In this work, we report on PhyloNet-HMM, a new comparative genomic framework for detecting introgression in genomes. PhyloNet-HMM combines phylogenetic networks with hidden Markov models (HMMs) to simultaneously capture the (potentially reticulate) evolutionary history of the genomes and dependencies within genomes. A novel aspect of our work is that it also accounts for incomplete lineage sorting and dependence across loci. Application of our model to variation data from chromosome 7 in the mouse (Mus musculus domesticus) genome detected a recently reported adaptive introgression event involving the rodent poison resistance gene Vkorc1, in addition to other newly detected introgressed genomic regions. Based on our analysis, it is estimated that about 9% of all sites within chromosome 7 are of introgressive origin (these cover about 13 Mbp of chromosome 7, and over 300 genes). Further, our model detected no introgression in a negative control data set. We also found that our model accurately detected introgression and other evolutionary processes from synthetic data sets simulated under the coalescent model with recombination, isolation, and migration. Our work provides a powerful framework for systematic analysis of introgression while simultaneously accounting for dependence across sites, point mutations, recombination, and ancestral polymorphism.
Hybridization is the mating between individuals from two different species. While hybridization introduces genetic material into a host genome, this genetic material may be transient and is purged from the population within a few generations after hybridization. However, in other cases, the introduced genetic material persists in the population (a process known as introgression) and can have significant evolutionary implications. In this paper, we introduce a novel method for detecting introgression in genomes using a comparative genomic approach. The method scans multiple aligned genomes for signatures of introgression by incorporating phylogenetic networks and hidden Markov models. The method allows for teasing apart true signatures of introgression from spurious ones that arise due to population effects and resemble those of introgression. Using the new method, we analyzed two sets of variation data from chromosome 7 in mouse genomes. The method detected previously reported introgressed regions as well as new ones in one of the data sets. In the other data set, which was selected as a negative control, the method detected no introgression. Furthermore, our method accurately detected introgression in simulated evolutionary scenarios and accurately inferred related population genetic quantities. Our method enables systematic comparative analyses of genomes where introgression is suspected, and can work with genome-wide data.
Affiliation(s)
- Kevin J. Liu
- Department of Computer Science, Rice University, Houston, Texas, United States of America
- Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas, United States of America
- * E-mail: (KJL); (LN)
- Jingxuan Dai
- Department of Computer Science, Rice University, Houston, Texas, United States of America
- Kathy Truong
- Department of Computer Science, Rice University, Houston, Texas, United States of America
- Ying Song
- The State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, China
- Michael H. Kohn
- Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas, United States of America
- Luay Nakhleh
- Department of Computer Science, Rice University, Houston, Texas, United States of America
- Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas, United States of America
- * E-mail: (KJL); (LN)
44
Rasmussen MD, Hubisz MJ, Gronau I, Siepel A. Genome-wide inference of ancestral recombination graphs. PLoS Genet 2014; 10:e1004342. [PMID: 24831947 PMCID: PMC4022496 DOI: 10.1371/journal.pgen.1004342] [Citation(s) in RCA: 176] [Impact Index Per Article: 17.6] [Received: 12/03/2013] [Accepted: 03/17/2014] [Indexed: 01/23/2023] Open
Abstract
The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps.
Affiliation(s)
- Matthew D. Rasmussen
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- * E-mail: (MDR); (AS)
- Melissa J. Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambs, United Kingdom
- * E-mail: (MDR); (AS)
45
Harris K, Sheehan S, Kamm JA, Song YS. Decoding coalescent hidden Markov models in linear time. RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY : ... ANNUAL INTERNATIONAL CONFERENCE, RECOMB ... : PROCEEDINGS. RECOMB (CONFERENCE : 2005- ) 2014; 8394:100-114. [PMID: 25340178 DOI: 10.1007/978-3-319-05269-4_8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 01/11/2023]
Abstract
In many areas of computational biology, hidden Markov models (HMMs) have been used to model local genomic features. In particular, coalescent HMMs have been used to infer ancient population sizes, migration rates, divergence times, and other parameters such as mutation and recombination rates. As more loci, sequences, and hidden states are added to the model, however, the runtime of coalescent HMMs can quickly become prohibitive. Here we present a new algorithm for reducing the runtime of coalescent HMMs from quadratic in the number of hidden time states to linear, without making any additional approximations. Our algorithm can be incorporated into various coalescent HMMs, including the popular method PSMC for inferring variable effective population sizes. Here we implement this algorithm to speed up our demographic inference method diCal, which is equivalent to PSMC when applied to a sample of two haplotypes. We demonstrate that the linear-time method can reconstruct a population size change history more accurately than the quadratic-time method, given similar computation resources. We also apply the method to data from the 1000 Genomes project, inferring a high-resolution history of size changes in the European population.
Affiliation(s)
- Kelley Harris
- Department of Mathematics, University of California, Berkeley
- Sara Sheehan
- Computer Science Division, University of California, Berkeley
- John A Kamm
- Department of Statistics, University of California, Berkeley
- Yun S Song
- Department of Integrative Biology, University of California, Berkeley
46
Bedoya-Reina OC, Ratan A, Burhans R, Kim HL, Giardine B, Riemer C, Li Q, Olson TL, Loughran TP, Vonholdt BM, Perry GH, Schuster SC, Miller W. Galaxy tools to study genome diversity. Gigascience 2013; 2:17. [PMID: 24377391 PMCID: PMC3877877 DOI: 10.1186/2047-217x-2-17] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 07/24/2013] [Accepted: 12/12/2013] [Indexed: 12/02/2022] Open
Abstract
Background: Intra-species genetic variation can be used to investigate population structure, selection, and gene flow in non-model vertebrates; and due to the plummeting costs for genome sequencing, it is now possible for small labs to obtain full-genome variation data from their species of interest. However, those labs may not have easy access to, and familiarity with, computational tools to analyze those data. Results: We have created a suite of tools for the Galaxy web server aimed at handling nucleotide and amino-acid polymorphisms discovered by full-genome sequencing of several individuals of the same species, or using a SNP genotyping microarray. In addition to providing user-friendly tools, a main goal is to make published analyses reproducible. While most of the examples discussed in this paper deal with nuclear-genome diversity in non-human vertebrates, we also illustrate the application of the tools to fungal genomes, human biomedical data, and mitochondrial sequences. Conclusions: This project illustrates that a small group can design, implement, test, document, and distribute a Galaxy tool collection to meet the needs of a particular community of biologists.
Affiliation(s)
- Webb Miller
- Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, University Park, PA 16802, USA.
47
Sand A, Kristiansen M, Pedersen CNS, Mailund T. zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm. BMC Bioinformatics 2013; 14:339. [PMID: 24266924 PMCID: PMC4222747 DOI: 10.1186/1471-2105-14-339] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/18/2013] [Accepted: 11/14/2013] [Indexed: 11/10/2022] Open
Abstract
Background: Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms. Calculating the likelihood of a model using the forward algorithm has worst case time complexity linear in the length of the sequence and quadratic in the number of states in the model. For genome analysis, however, the length runs to millions or billions of observations, and when maximising the likelihood hundreds of evaluations are often needed. A time efficient forward algorithm is therefore a key ingredient in an efficient hidden Markov model library. Results: We have built a software library for efficiently computing the likelihood of a hidden Markov model. The library exploits commonly occurring substrings in the input to reuse computations in the forward algorithm. In a pre-processing step our library identifies common substrings and builds a structure over the computations in the forward algorithm which can be reused. This analysis can be saved between uses of the library and is independent of concrete hidden Markov models so one preprocessing can be used to run a number of different models. Using this library, we achieve up to 78 times shorter wall-clock time for realistic whole-genome analyses with a real and reasonably complex hidden Markov model. In one particular case the analysis was performed in less than 8 minutes compared to 9.6 hours for the previously fastest library. Conclusions: We have implemented the preprocessing procedure and forward algorithm as a C++ library, zipHMM, with Python bindings for use in scripts. The library is available at http://birc.au.dk/software/ziphmm/.
Affiliation(s)
- Andreas Sand
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark.
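The complexity statement in the abstract above (linear in the sequence length, quadratic in the number of states) is visible in the loop structure of the plain forward algorithm. The following is a minimal pure-Python sketch with per-step scaling. It is not zipHMM's API and it does not implement the substring-reuse preprocessing; the function name, the argument names (`pi`, `A`, `B`, `obs`) and the toy model are assumptions made for illustration.

```python
import math

def forward_log_likelihood(pi, A, B, obs):
    """Log-likelihood of `obs` under an HMM with initial distribution pi,
    transition matrix A[i][j] and emission matrix B[state][symbol].
    The outer loop runs once per observation (linear in length); each
    step touches every pair of states (quadratic in the state count)."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        # Rescale to avoid underflow on long sequences; the log of each
        # scaling factor accumulates into the log-likelihood.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

# Hypothetical two-state model with binary observations.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.9, 0.1], [0.3, 0.7]]
obs = [0, 1, 1, 0]
log_lik = forward_log_likelihood(pi, A, B, obs)
```

Because every likelihood evaluation repeats this full pass over the data, any reuse of work across repeated substrings of the input directly reduces the cost of each evaluation during optimisation.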
48
Tataru P, Sand A, Hobolth A, Mailund T, Pedersen CNS. Algorithms for hidden Markov models restricted to occurrences of regular expressions. BIOLOGY 2013; 2:1282-95. [PMID: 24833225 PMCID: PMC4009796 DOI: 10.3390/biology2041282] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/28/2013] [Revised: 10/08/2013] [Accepted: 11/05/2013] [Indexed: 11/24/2022]
Abstract
Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper, we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model.
Affiliation(s)
- Paula Tataru
- Bioinformatics Research Centre, Aarhus University, C. F. Møllers Allé 8, DK-8000 Aarhus C, Denmark.
- Andreas Sand
- Bioinformatics Research Centre, Aarhus University, C. F. Møllers Allé 8, DK-8000 Aarhus C, Denmark.
- Asger Hobolth
- Bioinformatics Research Centre, Aarhus University, C. F. Møllers Allé 8, DK-8000 Aarhus C, Denmark.
- Thomas Mailund
- Bioinformatics Research Centre, Aarhus University, C. F. Møllers Allé 8, DK-8000 Aarhus C, Denmark.
- Christian N S Pedersen
- Bioinformatics Research Centre, Aarhus University, C. F. Møllers Allé 8, DK-8000 Aarhus C, Denmark.
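What the abstract above calls the typical procedure (decode the hidden path, then look for the pattern in the annotation) can be sketched as Viterbi decoding followed by a regular-expression count. This is a baseline illustration only and does not implement the restricted algorithms the paper introduces; the two-state model, the state labels `N` and `G`, the pattern `G+` and the function names are hypothetical.

```python
import math
import re

def viterbi(pi, A, B, obs):
    """Most probable hidden state path, computed in log space."""
    n = len(pi)
    lg = lambda x: math.log(x) if x > 0 else float("-inf")
    score = [lg(pi[i]) + lg(B[i][obs[0]]) for i in range(n)]
    back = []  # back[t-1][j] = best predecessor of state j at time t
    for symbol in obs[1:]:
        prev, pointers, score = score, [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + lg(A[i][j]))
            pointers.append(best)
            score.append(prev[best] + lg(A[best][j]) + lg(B[j][symbol]))
        back.append(pointers)
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def count_pattern(path, labels, pattern):
    """Render the path as a label string and count non-overlapping matches."""
    annotation = "".join(labels[s] for s in path)
    return annotation, len(re.findall(pattern, annotation))

# Hypothetical model: state 0 = non-gene (N), state 1 = gene (G).
pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.2, 0.8]]
B = [[0.9, 0.1], [0.1, 0.9]]
obs = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
annotation, n_genes = count_pattern(viterbi(pi, A, B, obs), "NG", r"G+")
```

The count obtained this way depends on a single decoded path, which is the source of the bias that the abstract reports for the typical procedure.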
49
Ma X, Kelley JL, Eilertson K, Musharoff S, Degenhardt JD, Martins AL, Vinar T, Kosiol C, Siepel A, Gutenkunst RN, Bustamante CD. Population genomic analysis reveals a rich speciation and demographic history of orang-utans (Pongo pygmaeus and Pongo abelii). PLoS One 2013; 8:e77175. [PMID: 24194868 PMCID: PMC3806739 DOI: 10.1371/journal.pone.0077175] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 03/14/2013] [Accepted: 08/30/2013] [Indexed: 12/04/2022] Open
Abstract
To gain insights into evolutionary forces that have shaped the history of Bornean and Sumatran populations of orang-utans, we compare patterns of variation across more than 11 million single nucleotide polymorphisms found by previous mitochondrial and autosomal genome sequencing of 10 wild-caught orang-utans. Our analysis of the mitochondrial data yields a far more ancient split time between the two populations (~3.4 million years ago) than estimates based on autosomal data (0.4 million years ago), suggesting a complex speciation process with moderate levels of primarily male migration. We find that the distribution of selection coefficients consistent with the observed frequency spectrum of autosomal non-synonymous polymorphisms in orang-utans is similar to the distribution in humans. Our analysis indicates that 35% of genes have evolved under detectable negative selection. Overall, our findings suggest that purifying natural selection, genetic drift, and a complex demographic history are the dominant drivers of genome evolution for the two orang-utan populations.
Affiliation(s)
- Xin Ma
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Joanna L. Kelley
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Kirsten Eilertson
- Bioinformatics Core, Gladstone Institutes, San Francisco, California, United States of America
- Shaila Musharoff
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Jeremiah D. Degenhardt
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- André L. Martins
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Tomas Vinar
- Department of Applied Informatics, Comenius University, Bratislava, Slovakia
- Carolin Kosiol
- Institute of Population Genetics, Vetmeduni Vienna, Vienna, Austria
- Adam Siepel
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Ryan N. Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona, United States of America
- Carlos D. Bustamante
- Department of Genetics, Stanford University, Stanford, California, United States of America
50
Mathew LA, Staab PR, Rose LE, Metzler D. Why to account for finite sites in population genetic studies and how to do this with Jaatha 2.0. Ecol Evol 2013; 3:3647-62. [PMID: 24198930 PMCID: PMC3810865 DOI: 10.1002/ece3.722] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 04/19/2013] [Revised: 06/20/2013] [Accepted: 06/23/2013] [Indexed: 11/06/2022] Open
Abstract
With the advent of next-generation sequencing technologies, large data sets of several thousand loci from multiple conspecific individuals are available. Such data sets should make it possible to obtain accurate estimates of population genetic parameters, even for complex models of population history. In the analyses of large data sets, it is difficult to consider finite-sites mutation models (FSMs). Here, we use extensive simulations to demonstrate that the inclusion of FSMs is necessary to avoid severe biases in the estimation of the population mutation rate θ, population divergence times, and migration rates. We present a new version of Jaatha, an efficient composite-likelihood method for estimating demographic parameters from population genetic data and evaluate the usefulness of Jaatha in two biological examples. For the first application, we infer the speciation process of two wild tomato species, Solanum chilense and Solanum peruvianum. In our second application example, we demonstrate that Jaatha is readily applicable to NGS data by analyzing genome-wide data from two southern European populations of Arabidopsis thaliana. Jaatha is now freely available as an R package from the Comprehensive R Archive Network (CRAN).
Affiliation(s)
- Lisha A Mathew
- Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland