Reference Citation Analysis

For: Intarapanich A, Shaw PJ, Assawamakin A, Wangkumhang P, Ngamphiw C, Chaichoompu K, Piriyapongsa J, Tongsima S. Iterative pruning PCA improves resolution of highly structured populations. BMC Bioinformatics 2009;10:382. [PMID: 19930644 PMCID: PMC2790469 DOI: 10.1186/1471-2105-10-382] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 04/30/2009] [Accepted: 11/23/2009] [Indexed: 12/12/2022] Open

Cited by Other Article(s)

Smaragdov MG, Kudinov AA. Assessing the power of principal components and wright's fixation index analyzes applied to reveal the genome-wide genetic differences between herds of Holstein cows. BMC Genet 2020;21:47. [PMID: 32345235 PMCID: PMC7189535 DOI: 10.1186/s12863-020-00848-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/25/2019] [Accepted: 03/27/2020] [Indexed: 11/30/2022] Open

Abstract

Background

With the advent of SNP array technology, genome-wide analysis of genetic differences between populations and breeds has become possible at a previously unattainable level. Wright's fixation index (F_st) and principal component analysis (PCA) are widely used methods in animal genetics studies. In this paper we compared the power of these methods, examined how they complement each other, and determined which of them is more powerful.

Results

A comparative analysis of the power of principal component analysis (PCA) and F_st was carried out to reveal genetic differences between herds of Holsteinized cows. In total, 803 BovineSNP50 genotypes of cows from 13 herds were used in the current study. The F_st values obtained were in the range 0.002–0.012 (mean 0.0049), while for rare SNPs with MAF 0.0001–0.005 they were even smaller, in the range 0.001–0.01 (mean 0.0027). Genetic relatedness of the cows in the herds was the cause of such small F_st values. The contribution of rare alleles with MAF 0.0001–0.01 to the F_st values was much smaller than that of common alleles, and this effect depends on linkage disequilibrium (LD). Despite a substantial change in the MAF spectrum and the number of SNPs, LD-based pruning had only a small effect on the F_st data. PCA confirmed the mutual admixture and small genetic differences between herds. Moreover, visualization of the results of a single eigenvector cannot be used to significantly differentiate herds; only summed eigenvectors should be used to realize the full power of PCA to differentiate small genetic differences between herds. Finally, we present evidence that the significance of F_st data far exceeds the significance of PCA data when these methods are used to reveal genetic differences between herds.

Conclusions

LD-based pruning had a small effect on the findings of the F_st and PCA analyses. Therefore, for weakly structured populations LD-based pruning is not effective. In addition, our results show that the significance of genetic differences between herds obtained by F_st analysis exceeds that obtained by PCA. To differentiate herds or weakly structured populations, we recommend using the F_st approach first and only then PCA.
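For readers unfamiliar with the statistic compared above, the following is a minimal sketch of Wright's fixation index for a single biallelic SNP, computed as F_st = (H_T - H_S) / H_T from per-herd allele frequencies. The abstract does not say which estimator the authors used, so the equal-weight textbook definition here is an assumption, and the function name `wright_fst` is hypothetical.

```python
def wright_fst(subpop_freqs):
    """Wright's F_st for one biallelic SNP.

    subpop_freqs: allele frequency of the same allele in each subpopulation
    (herd). Subpopulations are weighted equally, which is an assumption of
    this sketch and not something stated in the abstract.
    """
    if not subpop_freqs:
        raise ValueError("need at least one subpopulation")
    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    # Expected heterozygosity of the pooled (total) population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # SNP is fixed everywhere, so there is no differentiation
    # Mean expected heterozygosity within subpopulations.
    h_s = sum(2 * p * (1 - p) for p in subpop_freqs) / n
    return (h_t - h_s) / h_t
```

Two herds with frequencies 0.4 and 0.6 give a value of about 0.04 under this definition, while identical frequencies give 0, which illustrates why closely related herds yield values as small as those reported above.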

Yahya P, Sulong S, Harun A, Wangkumhang P, Wilantho A, Ngamphiw C, Tongsima S, Zilfalil BA. Ancestry-informative marker (AIM) SNP panel for the Malay population. Int J Legal Med 2019;134:123-134. [PMID: 31760471 DOI: 10.1007/s00414-019-02184-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/24/2019] [Accepted: 10/15/2019] [Indexed: 10/25/2022]

Chaichoompu K, Abegaz F, Cavadas B, Fernandes V, Müller-Myhsok B, Pereira L, Van Steen K. A different view on fine-scale population structure in Western African populations. Hum Genet 2019;139:45-59. [PMID: 31630246 PMCID: PMC6942040 DOI: 10.1007/s00439-019-02069-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/16/2018] [Accepted: 10/09/2019] [Indexed: 01/03/2023]

Tvedebrink T, Eriksen PS. Inference of admixed ancestry with Ancestry Informative Markers. Forensic Sci Int Genet 2019;42:147-153. [DOI: 10.1016/j.fsigen.2019.06.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/20/2018] [Revised: 05/29/2019] [Accepted: 06/18/2019] [Indexed: 01/26/2023]

Chaichoompu K, Abegaz F, Tongsima S, Shaw PJ, Sakuntabhai A, Pereira L, Van Steen K. IPCAPS: an R package for iterative pruning to capture population structure. Source Code Biol Med 2019;14:2. [PMID: 30936940 PMCID: PMC6427891 DOI: 10.1186/s13029-019-0072-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/25/2017] [Accepted: 02/21/2019] [Indexed: 01/29/2023]

Cheung EY, Gahan ME, McNevin D. Prediction of biogeographical ancestry in admixed individuals. Forensic Sci Int Genet 2018;36:104-111. [DOI: 10.1016/j.fsigen.2018.06.013] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/22/2017] [Revised: 05/09/2018] [Accepted: 06/20/2018] [Indexed: 12/14/2022]

Alhusain L, Hafez AM. Nonparametric approaches for population structure analysis. Hum Genomics 2018;12:25. [PMID: 29743099 PMCID: PMC5944014 DOI: 10.1186/s40246-018-0156-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/06/2018] [Accepted: 04/24/2018] [Indexed: 12/28/2022] Open

Yahya P, Sulong S, Harun A, Wan Isa H, Ab Rajab NS, Wangkumhang P, Wilantho A, Ngamphiw C, Tongsima S, Zilfalil BA. Analysis of the genetic structure of the Malay population: Ancestry-informative marker SNPs in the Malay of Peninsular Malaysia. Forensic Sci Int Genet 2017;30:152-159. [DOI: 10.1016/j.fsigen.2017.07.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/03/2017] [Revised: 06/23/2017] [Accepted: 07/10/2017] [Indexed: 12/27/2022]

A comparison of DMET Plus microarray and genome-wide technologies by assessing population substructure. Pharmacogenet Genomics 2016;26:147-153. [PMID: 26731477 DOI: 10.1097/fpc.0000000000000200] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 01/23/2023]

Abstract

OBJECTIVE

The capacity of the Affymetrix drug metabolism enzymes and transporters (DMET) Plus pharmacogenomics genotyping chip to estimate population substructure and cryptic relatedness was evaluated. The results were compared with estimates using genome-wide HapMap data for the same individuals.

METHODS

For 301 unrelated individuals, spanning three continental populations and one admixed population, genotypic data were collected using the Affymetrix DMET Plus microarray. Genome-wide data on these individuals were obtained from HapMap release 3. Population substructure was assessed using Eigenstrat and ADMIXTURE software for both platforms. Cryptic relatedness was explored by inbreeding coefficient estimation. Nonparametric tests were used to determine correlations of the analytical results of the two genotyping platforms.

RESULTS

Principal components analysis identified population substructure for both datasets, with 15.8 and 16.6% of the total variance explained in the first two principal components for DMET Plus and HapMap data, respectively. ADMIXTURE results correctly identified four subpopulations within each dataset. Nonparametric rank correlations indicated significant associations between analyses with an average ρ=0.7272 (P<10) across the three continental populations and ρ=0.4888 for the admixed population. Concordance correlation coefficients (average ρc=0.9693 across all four subpopulations) strongly indicate concordance between ADMIXTURE results. Inbreeding coefficients were slightly inflated (16 individuals>0.15) using DMET Plus data and no cryptic relatedness was indicated using HapMap data. The inflated inbreeding estimation could be because of the limited number of markers provided by DMET as a random sample of 1832 markers from HapMap also yielded inflated estimates of cryptic relatedness (39 individuals>0.15). Furthermore, use of single nucleotide polymorphisms located in genes involved in metabolism and transport may have different allele frequencies in subpopulations than single nucleotide polymorphisms sampled from the whole genome.

CONCLUSION

The DMET Plus pharmacogenomics genotyping chip is effective in quantifying population substructure across the three continental populations and inferring the presence of an admixed population. On the basis of our results, these microarrays offer sufficient depth for covariate adjustment of population substructure in genomic association studies.
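The concordance correlation coefficient (ρc) reported in the results can be illustrated with a short sketch. It assumes the authors used Lin's concordance correlation coefficient with population (1/n) moments; the abstract does not specify the estimator, and the function name `concordance_ccc` is hypothetical.

```python
def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient between paired measurements.

    rho_c = 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2),
    using population (1/n) variances and covariance. Which estimator the
    study used is not stated in the abstract, so this is an assumption.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("undefined when both inputs are the same constant")
    return 2 * cov_xy / denom
```

Unlike a rank correlation, this coefficient penalizes a systematic offset between the two platforms: ancestry proportions that are perfectly correlated but shifted by a constant score below 1.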

Duforet-Frebourg N, Gattepaille LM, Blum MGB, Jakobsson M. HaploPOP: a software that improves population assignment by combining markers into haplotypes. BMC Bioinformatics 2015;16:242. [PMID: 26227424 PMCID: PMC4521458 DOI: 10.1186/s12859-015-0661-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 11/12/2014] [Accepted: 07/03/2015] [Indexed: 01/27/2023] Open

Waters EK, Sidhu HS, Sidhu LA, Mercer GN. Extended Lotka–Volterra equations incorporating population heterogeneity: Derivation and analysis of the predator–prey case. Ecol Modell 2015. [DOI: 10.1016/j.ecolmodel.2014.11.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]

Putman AI, Carbone I. Challenges in analysis and interpretation of microsatellite data for population genetic studies. Ecol Evol 2014;4:4399-428. [PMID: 25540699 PMCID: PMC4267876 DOI: 10.1002/ece3.1305] [Citation(s) in RCA: 237] [Impact Index Per Article: 23.7] [Received: 05/02/2014] [Revised: 10/02/2014] [Accepted: 10/03/2014] [Indexed: 12/14/2022] Open

Abstract

Advancing technologies have facilitated the ever-widening application of genetic markers such as microsatellites into new systems and research questions in biology. In light of the data and experience accumulated from several years of using microsatellites, we present here a literature review that synthesizes the limitations of microsatellites in population genetic studies. With a focus on population structure, we review the widely used fixation (F_ST) statistics and Bayesian clustering algorithms and find that the former can be confusing and problematic for microsatellites and that the latter may be confounded by complex population models and lack power in certain cases. Clustering, multivariate analyses, and diversity-based statistics are increasingly being applied to infer population structure, but in some instances these methods lack formalization with microsatellites. Migration-specific methods perform well only under narrow constraints. We also examine the use of microsatellites for inferring effective population size, changes in population size, and deeper demographic history, and find that these methods are untested and/or highly context-dependent. Overall, each method possesses important weaknesses for use with microsatellites, and there are significant constraints on inferences commonly made using microsatellite markers in the areas of population structure, admixture, and effective population size. To ameliorate and better understand these constraints, researchers are encouraged to analyze simulated datasets both prior to and following data collection and analysis, the latter of which is formalized within the approximate Bayesian computation framework. We also examine trends in the literature and show that microsatellites continue to be widely used, especially in non-human subject areas. This review assists with study design and molecular marker selection, facilitates sound interpretation of microsatellite data while fostering respect for their practical limitations, and identifies lessons that could be applied toward emerging markers and high-throughput technologies in population genetics.

Limpiti T, Amornbunchornvej C, Intarapanich A, Assawamakin A, Tongsima S. iNJclust: Iterative Neighbor-Joining Tree Clustering Framework for Inferring Population Structure. IEEE/ACM Trans Comput Biol Bioinform 2014;11:903-914. [PMID: 26356862 DOI: 10.1109/tcbb.2014.2322372] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]

Wangkumhang P, Shaw PJ, Chaichoompu K, Ngamphiw C, Assawamakin A, Nuinoon M, Sripichai O, Svasti S, Fucharoen S, Praphanphoj V, Tongsima S. Insight into the peopling of Mainland Southeast Asia from Thai population genetic structure. PLoS One 2013;8:e79522. [PMID: 24223962 PMCID: PMC3817124 DOI: 10.1371/journal.pone.0079522] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 06/20/2013] [Accepted: 09/23/2013] [Indexed: 12/22/2022] Open

Abstract

There is considerable ethno-linguistic and genetic variation among human populations in Asia, although tracing the origins of this diversity is complicated by migration events. Thailand is at the center of Mainland Southeast Asia (MSEA), a region within Asia that has not been extensively studied. Genetic substructure may exist in the Thai population, since waves of migration from southern China throughout its recent history may have contributed to substantial gene flow. Autosomal SNP data were collated for 438,503 markers from 992 Thai individuals. Using the available self-reported regional origin, four Thai subpopulations genetically distinct from each other and from other Asian populations were resolved by Neighbor-Joining analysis using a 41,569 marker subset. Using an independent Principal Components-based unsupervised clustering approach, four major MSEA subpopulations were resolved in which regional bias was apparent. A major ancestry component was common to these MSEA subpopulations and distinguishes them from other Asian subpopulations. On the other hand, these MSEA subpopulations were admixed with other ancestries, in particular one shared with Chinese. Subpopulation clustering using only Thai individuals and the complete marker set resolved four subpopulations, which are distributed differently across Thailand. A Sino-Thai subpopulation was concentrated in the Central region of Thailand, although this constituted a minority in an otherwise diverse region. Among the most highly differentiated markers which distinguish the Thai subpopulations, several map to regions known to affect phenotypic traits such as skin pigmentation and susceptibility to common diseases. The subpopulation patterns elucidated have important implications for evolutionary and medical genetics. The subpopulation structure within Thailand may reflect the contributions of different migrants throughout the history of MSEA. 
The information will also be important for genetic association studies to account for population-structure confounding effects.

Liu Y, Nyunoya T, Leng S, Belinsky SA, Tesfaigzi Y, Bruse S. Softwares and methods for estimating genetic ancestry in human populations. Hum Genomics 2013;7:1. [PMID: 23289408 PMCID: PMC3542037 DOI: 10.1186/1479-7364-7-1] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 09/11/2012] [Accepted: 11/26/2012] [Indexed: 01/10/2023] Open

Neuditschko M, Khatkar MS, Raadsma HW. NetView: a high-definition network-visualization approach to detect fine-scale population structures from genome-wide patterns of variation. PLoS One 2012;7:e48375. [PMID: 23152744 PMCID: PMC3485224 DOI: 10.1371/journal.pone.0048375] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 03/24/2012] [Accepted: 09/25/2012] [Indexed: 02/06/2023] Open

Abstract

High-throughput sequencing and single nucleotide polymorphism (SNP) genotyping can be used to infer complex population structures. Fine-scale population structure analysis tracing individual ancestry remains one of the major challenges. Based on network theory and recent advances in SNP chip technology, we investigated an unsupervised network clustering method called Super Paramagnetic Clustering (Spc). When applied to whole-genome marker data it identifies the natural divisions of groups of individuals into population clusters without use of prior ancestry information. Furthermore, we optimised an analysis pipeline called NetView, a high-definition network visualization, starting with computation of genetic distance, followed by clustering using Spc and finally visualization of clusters with Cytoscape. We compared NetView against commonly used methodologies including Principal Component Analyses (PCA) and a model-based algorithm, Admixture, on whole-genome-wide SNP data derived from three previously described data sets: simulated (2.5 million SNPs, 5 populations), human (1.4 million SNPs, 11 populations) and cattle (32,653 SNPs, 19 populations). We demonstrate that individuals can be effectively allocated to their correct population whilst simultaneously revealing fine-scale structure within the populations. Analyzing the human HapMap populations, we identified unexpected genetic relatedness among individuals, and population stratification within the Indian, African and Mexican samples. In the cattle data set, we correctly assigned all individuals to their respective breeds and detected fine-scale population sub-structures reflecting different sample origins and phenotypes. The NetView pipeline is computationally extremely efficient and can be easily applied on large-scale genome-wide data sets to assign individuals to particular populations and to reproduce fine-scale population structures without prior knowledge of individual ancestry. NetView can be used on any data from which a genetic relationship/distance between individuals can be calculated.

Lawson DJ, Falush D. Population identification using genetic data. Annu Rev Genomics Hum Genet 2012;13:337-61. [PMID: 22703172 DOI: 10.1146/annurev-genom-082410-101510] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Indexed: 11/09/2022]

Lenstra JA, Groeneveld LF, Eding H, Kantanen J, Williams JL, Taberlet P, Nicolazzi EL, Sölkner J, Simianer H, Ciani E, Garcia JF, Bruford MW, Ajmone-Marsan P, Weigend S. Molecular tools and analytical approaches for the characterization of farm animal genetic diversity. Anim Genet 2012;43:483-502. [DOI: 10.1111/j.1365-2052.2011.02309.x] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Accepted: 09/15/2011] [Indexed: 12/30/2022]

Limpiti T, Intarapanich A, Assawamakin A, Shaw PJ, Wangkumhang P, Piriyapongsa J, Ngamphiw C, Tongsima S. Study of large and highly stratified population datasets by combining iterative pruning principal component analysis and structure. BMC Bioinformatics 2011;12:255. [PMID: 21699684 PMCID: PMC3148578 DOI: 10.1186/1471-2105-12-255] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 10/08/2010] [Accepted: 06/23/2011] [Indexed: 01/20/2023] Open

Abstract

Background

The ever increasing sizes of population genetic datasets pose great challenges for population structure analysis. The Tracy-Widom (TW) statistical test is widely used for detecting structure. However, it has not been adequately investigated whether the TW statistic is susceptible to type I error, especially in large, complex datasets. Non-parametric, Principal Component Analysis (PCA) based methods for resolving structure have been developed which rely on the TW test. Although PCA-based methods can resolve structure, they cannot infer ancestry. Model-based methods are still needed for ancestry analysis, but they are not suitable for large datasets. We propose a new structure analysis framework for large datasets. This includes a new heuristic for detecting structure and incorporation of the structure patterns inferred by a PCA method to complement STRUCTURE analysis.

Results

A new heuristic called EigenDev for detecting population structure is presented. When tested on simulated data, this heuristic is robust to sample size. In contrast, the TW statistic was found to be susceptible to type I error, especially for large population samples. EigenDev is thus better-suited for analysis of large datasets containing many individuals, in which spurious patterns are likely to exist and could be incorrectly interpreted as population stratification. EigenDev was applied to the iterative pruning PCA (ipPCA) method, which resolves the underlying subpopulations. This subpopulation information was used to supervise STRUCTURE analysis to infer patterns of ancestry at an unprecedented level of resolution. To validate the new approach, a bovine and a large human genetic dataset (3945 individuals) were analyzed. We found new ancestry patterns consistent with the subpopulations resolved by ipPCA.

Conclusions

The EigenDev heuristic is robust to sampling and is thus superior for detecting structure in large datasets. The application of EigenDev to the ipPCA algorithm improves the estimation of the number of subpopulations and the individual assignment accuracy, especially for very large and complex datasets. Furthermore, we have demonstrated that the structure resolved by this approach complements parametric analysis, allowing a much more comprehensive account of population structure. The new version of the ipPCA software with EigenDev incorporated can be downloaded from http://www4a.biotec.or.th/GI/tools/ippca.
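The general idea of resolving subpopulations by repeated PCA can be sketched as a recursive bisection. This is only an illustration in the spirit of iterative pruning, not the ipPCA software: the stopping rule (fraction of variance on the first component, `threshold`) is a simple stand-in for the TW test or EigenDev, whose details the abstract does not give, and splitting by the sign of the first-component score is a stand-in for the clustering step. All function and parameter names are hypothetical.

```python
from math import sqrt


def top_component(rows, iters=200):
    """Return (PC1 scores, fraction of total variance on PC1) by power iteration."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]  # centre each marker
    total = sum(v * v for r in x for v in r)
    if total == 0:
        return [0.0] * n, 0.0  # no variation left to explain
    w = [1.0 / (j + 1) for j in range(m)]  # deterministic start vector
    for _ in range(iters):
        s = [sum(a * b for a, b in zip(r, w)) for r in x]  # X w
        w_new = [sum(x[i][j] * s[i] for i in range(n)) for j in range(m)]  # X^T (X w)
        norm = sqrt(sum(v * v for v in w_new))
        if norm == 0:
            break
        w = [v / norm for v in w_new]
    scores = [sum(a * b for a, b in zip(r, w)) for r in x]
    return scores, sum(s * s for s in scores) / total


def iterative_pruning(rows, ids=None, min_size=4, threshold=0.5):
    """Recursively bisect individuals along PC1 until no structure is detected.

    The stopping rule is a stand-in (assumption): stop when the group is
    smaller than min_size or PC1 explains less than `threshold` of the variance.
    """
    if ids is None:
        ids = list(range(len(rows)))
    if len(rows) < min_size:
        return [ids]
    scores, fraction = top_component(rows)
    if fraction < threshold:
        return [ids]
    left = [i for i, s in enumerate(scores) if s < 0]
    right = [i for i, s in enumerate(scores) if s >= 0]
    if not left or not right:
        return [ids]
    return (
        iterative_pruning([rows[i] for i in left], [ids[i] for i in left], min_size, threshold)
        + iterative_pruning([rows[i] for i in right], [ids[i] for i in right], min_size, threshold)
    )
```

Each call either accepts the current group as one subpopulation or splits it and repeats the analysis on each half separately. A fixed variance threshold is crude, which is why the choice of structure test matters as much as the abstract above argues.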
