1
Goli RC, Chishi KG, Ganguly I, Singh S, Dixit S, Rathi P, Diwakar V, Sree C C, Limbalkar OM, Sukhija N, Kanaka K. Global and Local Ancestry and its Importance: A Review. Curr Genomics 2024; 25:237-260. [PMID: 39156729 PMCID: PMC11327809 DOI: 10.2174/0113892029298909240426094055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Revised: 03/02/2024] [Accepted: 03/11/2024] [Indexed: 08/20/2024] Open
Abstract
The fastest way to significantly change the composition of a population is through admixture, an evolutionary mechanism. In animal breeding history, genetic admixture has provided both short-term and long-term advantages by utilizing the phenomenon of complementarity and heterosis in several traits and genetic diversity, respectively. The traditional method of admixture analysis by pedigree records has now been replaced greatly by genome-wide marker data that enables more precise estimations. Among these markers, SNPs have been the popular choice since they are cost-effective, not so laborious, and automation of genotyping is easy. Certain markers can suggest the possibility of a population's origin from a sample of DNA where the source individual is unknown or unwilling to disclose their lineage, which are called Ancestry-Informative Markers (AIMs). Revealing admixture level at the locus-specific level is termed as local ancestry and can be exploited to identify signs of recent selective response and can account for genetic drift. Considering the importance of genetic admixture and local ancestry, in this mini-review, both concepts are illustrated, encompassing basics, their estimation/identification methods, tools/software used and their applications.
Affiliation(s)
- Kiyevi G. Chishi
- ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Indrajit Ganguly
- ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- Sanjeev Singh
- ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- S.P. Dixit
- ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, Haryana, India
- Pallavi Rathi
- ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Vikas Diwakar
- ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Chandana Sree C
- ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Nidhi Sukhija
- ICAR-National Dairy Research Institute, Karnal, 132001, Haryana, India
- Central Tasar Research and Training Institute, Ranchi, 835303, Jharkhand, India
- K.K. Kanaka
- ICAR-Indian Institute of Agricultural Biotechnology, Ranchi, 834010, Jharkhand, India
2
Kelleher J, Etheridge AM, McVean G. Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLoS Comput Biol 2016; 12:e1004842. [PMID: 27145223 PMCID: PMC4856371 DOI: 10.1371/journal.pcbi.1004842] [Citation(s) in RCA: 382] [Impact Index Per Article: 42.4] [Received: 12/10/2015] [Accepted: 03/02/2016] [Indexed: 01/23/2023] Open
Abstract
A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present day coalescent simulations do not scale well, or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods.
Our understanding of the distribution of genetic variation in natural populations has been driven by mathematical models of the underlying biological and demographic processes. A key strength of such coalescent models is that they enable efficient simulation of data we might see under a variety of evolutionary scenarios. However, current methods are not well suited to simulating genome-scale data sets on hundreds of thousands of samples, which is essential if we are to understand the data generated by population-scale sequencing projects. Similarly, processing the results of large simulations also presents researchers with a major challenge, as it can take many days just to read the data files. In this paper we solve these problems by introducing a new way to represent information about the ancestral process. This new representation leads to huge gains in simulation speed and storage efficiency so that large simulations complete in minutes and the output files can be processed in seconds.
Affiliation(s)
- Jerome Kelleher
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Gilean McVean
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
3
Arenas M. The importance and application of the ancestral recombination graph. Front Genet 2013; 4:206. [PMID: 24133504 PMCID: PMC3796270 DOI: 10.3389/fgene.2013.00206] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 08/28/2013] [Accepted: 09/24/2013] [Indexed: 11/13/2022] Open
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology “Severo Ochoa,” Consejo Superior de Investigaciones Científicas, Universidad Autónoma de Madrid, Madrid, Spain
4
Arenas M. Computer programs and methodologies for the simulation of DNA sequence data with recombination. Front Genet 2013; 4:9. [PMID: 23378848 PMCID: PMC3561691 DOI: 10.3389/fgene.2013.00009] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 11/20/2012] [Accepted: 01/17/2013] [Indexed: 11/13/2022] Open
Abstract
Computer simulations are useful in evolutionary biology for hypothesis testing, to verify analytical methods, to analyze interactions among evolutionary processes, and to estimate evolutionary parameters. In particular, the simulation of DNA sequences with recombination may help in understanding the role of recombination in diverse evolutionary questions, such as the genome structure. Consequently, plenty of computer simulators have been developed to simulate DNA sequence data with recombination. However, the choice of an appropriate tool, among all currently available simulators, is critical if recombination simulations are to be biologically meaningful. This review provides a practical survival guide to commonly used computer programs and methodologies for the simulation of coding and non-coding DNA sequences with recombination. It may help in the correct design of computer simulation experiments of recombination. In addition, the study includes a review of simulation studies investigating the impact of ignoring recombination when performing various evolutionary analyses, such as phylogenetic tree and ancestral sequence reconstructions. Alternative analytical methodologies accounting for recombination are also reviewed.
Affiliation(s)
- Miguel Arenas
- Centre for Molecular Biology "Severo Ochoa," Consejo Superior de Investigaciones Científicas, Madrid, Spain
5
Buendia P, Narasimhan G. Sliding MinPD: building evolutionary networks of serial samples via an automated recombination detection approach. Bioinformatics 2007; 23:2993-3000. [PMID: 17717035 PMCID: PMC3187926 DOI: 10.1093/bioinformatics/btm413] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Traditional phylogenetic methods assume tree-like evolutionary models and are likely to perform poorly when provided with sequence data from fast-evolving, recombining viruses. Furthermore, these methods assume that all the sequence data are from contemporaneous taxa, which is not valid for serially-sampled data. A more general approach is proposed here, referred to as the Sliding MinPD method, that reconstructs evolutionary networks for serially-sampled sequences in the presence of recombination.
RESULTS: Sliding MinPD combines distance-based phylogenetic methods with automated recombination detection based on the best-known sliding window approaches to reconstruct serial evolutionary networks. Its performance was evaluated through comprehensive simulation studies and was also applied to a set of serially-sampled HIV sequences from a single patient. The resulting network organizations reveal unique patterns of viral evolution and may help explain the emergence of disease-associated mutants and drug-resistant strains with implications for patient prognosis and treatment strategies.