1
Sethuraman A, Janzen FJ, Weisrock DW, Obrycki JJ. Insights from Population Genomics to Enhance and Sustain Biological Control of Insect Pests. Insects 2020; 11:E462. [PMID: 32708047 PMCID: PMC7469154 DOI: 10.3390/insects11080462] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/18/2020] [Revised: 07/15/2020] [Accepted: 07/17/2020] [Indexed: 01/25/2023]
Abstract
Biological control, the use of organisms (e.g., nematodes, arthropods, bacteria, fungi, viruses) to suppress insect pest species, is a well-established, ecologically sound and economically profitable tactic for crop protection. This approach has served as a sustainable solution to many insect pest problems in North America for over a century. However, all pest management tactics carry risks. The ecological non-target effects of biological control, in particular, have been examined in numerous systems. In contrast, the need to understand the short- and long-term evolutionary consequences of human-mediated manipulation of biological control organisms for importation, augmentation and conservation biological control has only recently been acknowledged. Population genomics presents exceptional opportunities to study adaptive evolution and invasiveness of pests and biological control organisms. It also provides insights into (1) long-term biological consequences of releases, (2) the ecological success and sustainability of this pest management tactic and (3) non-target effects on native species, populations and ecosystems. Recent advances in genomic sequencing technology and in model-based statistical methods for analyzing population-scale genomic data provide much-needed impetus for biological control programs to benefit from considering evolutionary consequences. Here, we review current technology and methods in population genomics and their applications to biological control, and we include basic guidelines for biological control researchers on implementing genomic technology and statistical modeling.
Affiliation(s)
- Arun Sethuraman
- Department of Biological Sciences, California State University San Marcos, San Marcos, CA 92096, USA
- Fredric J Janzen
- Department of Ecology, Evolution, & Organismal Biology, Iowa State University, Ames, IA 50010, USA
- Kellogg Biological Station, Michigan State University, Hickory Corners, MI 49060, USA
- David W Weisrock
- Department of Biology, University of Kentucky, Lexington, KY 40506, USA
- John J Obrycki
- Department of Entomology, University of Kentucky, Lexington, KY 40506, USA
2
Chen Y, Liang KY, Tong P, Beaty TH, Barnes KC, Linda Kao WH. A pseudolikelihood approach for assessing genetic association in case-control studies with unmeasured population structure. Stat Methods Med Res 2020; 29:3153-3165. [PMID: 32393154 DOI: 10.1177/0962280220921212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
The case-control study design is one of the main tools for detecting associations between genetic markers and diseases. It is well known that population substructure can lead to spurious association between disease status and a genetic marker if disease prevalence and marker allele frequency both vary across subpopulations. In this paper, we propose a novel statistical method to estimate the association in case-control studies with unmeasured population substructure. The proposed method has two steps. First, information on genomic markers and disease status is used to infer the population substructure; second, the association between the disease and the test marker, adjusted for the population substructure, is modeled and estimated parametrically through polytomous logistic regression. The performance of the proposed method relative to existing methods, in terms of bias, coverage probability and computational time, is assessed through simulations. The method is applied to an end-stage renal disease study in an African American population.
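The confounding mechanism described above can be made concrete with a small worked example. The sketch below is not the pseudolikelihood method proposed in this paper: it uses the classic Mantel-Haenszel stratified odds ratio and assumes the subpopulation labels are known, which is exactly the information the paper treats as unmeasured. The counts are invented for illustration. Within each subpopulation the marker is unrelated to disease (odds ratio 1), yet pooling the two strata produces a spurious odds ratio.

```python
# Each stratum is a 2x2 table (a, b, c, d):
#   a = case carriers,    b = case non-carriers,
#   c = control carriers, d = control non-carriers.
# Hypothetical counts: subpopulation 1 has both more cases AND more carriers.
STRATA = [(80, 20, 40, 10), (10, 40, 20, 80)]


def odds_ratio(a, b, c, d):
    """Crude odds ratio ad / bc for a single 2x2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Collapse the strata into one table, i.e. ignore population structure.
pooled = tuple(sum(cells) for cells in zip(*STRATA))

print([odds_ratio(*t) for t in STRATA])  # [1.0, 1.0]
print(odds_ratio(*pooled))               # 2.25
print(mantel_haenszel_or(STRATA))        # 1.0
```

Stratifying removes the spurious signal only when the strata are observed; the approach summarized in this abstract instead infers the substructure from genomic markers first.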
Affiliation(s)
- Yong Chen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, USA
- Pan Tong
- Department of Bioinformatics & Computational Biology, University of Texas, Houston, USA
- Terri H Beaty
- Department of Epidemiology, Johns Hopkins University, Baltimore, USA
- Kathleen C Barnes
- University of Colorado Denver - Anschutz Medical Campus, Aurora, USA
- W H Linda Kao
- Department of Epidemiology, Johns Hopkins University, Baltimore, USA
3
Abstract
The analysis of population structure has many applications in medical and population genetic research. It provides insight into the underlying genetic substructure of a population and is a crucial prerequisite for any analysis of genetic data. The analysis involves grouping individuals into subpopulations based on shared genetic variation. The most widely used markers for studying DNA sequence variation between populations are single nucleotide polymorphisms (SNPs). Data preprocessing is a necessary step to assess data quality and to determine which markers or individuals can reasonably be included in the analysis. After preprocessing, several methods can be used to uncover population substructure; these fall into two broad approaches, parametric and nonparametric. Parametric approaches use statistical models to infer population structure and assign individuals to subpopulations. However, these approaches suffer from many drawbacks that make them impractical for large datasets. Nonparametric approaches do not share these drawbacks, which makes them more viable for analyzing large datasets, and they are consequently increasingly used to reveal population substructure. This paper therefore reviews the nonparametric approaches available for population structure analysis and discusses their implications for resolving outstanding challenges.
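The preprocessing step mentioned above typically removes individuals and markers with too many missing calls, and markers with a very low minor allele frequency (MAF). The Python/NumPy sketch below is not taken from the paper; the genotype coding (0/1/2 copies of one allele, `np.nan` for a missing call), the function name `qc_filter` and the default thresholds are illustrative assumptions, and real pipelines often add further checks such as Hardy-Weinberg tests.

```python
import numpy as np


def qc_filter(G, max_ind_missing=0.10, max_snp_missing=0.05, min_maf=0.01):
    """Drop poorly genotyped individuals, then poorly genotyped or rare SNPs.

    G: individuals x SNPs array coded 0/1/2, with np.nan for missing calls.
    Returns the filtered matrix plus boolean masks of kept rows and columns.
    """
    G = np.asarray(G, dtype=float)
    missing = np.isnan(G)

    # 1. Remove individuals whose missing-call rate exceeds the threshold.
    ind_keep = missing.mean(axis=1) <= max_ind_missing
    G, missing = G[ind_keep], missing[ind_keep]

    # 2. Per-SNP missing rate and allele frequency among the observed calls.
    snp_missing = missing.mean(axis=0)
    observed = (~missing).sum(axis=0)
    allele_sum = np.nansum(G, axis=0)
    p = np.divide(allele_sum, 2.0 * observed,
                  out=np.zeros_like(allele_sum), where=observed > 0)
    maf = np.minimum(p, 1.0 - p)   # an all-missing SNP gets MAF 0 and is dropped

    # 3. Keep SNPs that are both well genotyped and polymorphic enough.
    snp_keep = (snp_missing <= max_snp_missing) & (maf >= min_maf)
    return G[:, snp_keep], ind_keep, snp_keep
```

Filtering individuals before computing per-SNP statistics means a badly genotyped sample cannot push an otherwise good marker over the missingness threshold; the opposite order is also used in practice.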
Affiliation(s)
- Luluah Alhusain
- College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Alaaeldin M Hafez
- College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
4
de Los Campos G, Veturi Y, Vazquez AI, Lehermeier C, Pérez-Rodríguez P. Incorporating Genetic Heterogeneity in Whole-Genome Regressions Using Interactions. J Agric Biol Environ Stat 2015; 20:467-490. [PMID: 26660276 PMCID: PMC4666286 DOI: 10.1007/s13253-015-0222-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 02/19/2015] [Accepted: 09/16/2015] [Indexed: 11/22/2022]
Abstract
Naturally and artificially selected populations usually exhibit some degree of stratification. In Genome-Wide Association Studies and in Whole-Genome Regressions (WGR) analyses, population stratification has been either ignored or dealt with as a potential confounder. However, systematic differences in allele frequency and in patterns of linkage disequilibrium can induce sub-population-specific effects. From this perspective, structure acts as an effect modifier rather than as a confounder. In this article, we extend WGR models commonly used in plant and animal breeding to allow for sub-population-specific effects. This is achieved by decomposing marker effects into main effects and interaction components that describe group-specific deviations. The model can be used both with variable selection and shrinkage methods and can be implemented using existing software for genomic selection. Using a wheat and a pig breeding data set, we compare parameter estimates and the prediction accuracy of the interaction WGR model with WGR analysis ignoring population stratification (across-group analysis) and with a stratified (i.e., within-sub-population) WGR analysis. The interaction model renders trait-specific estimates of the average correlation of effects between sub-populations; we find that such correlation not only depends on the extent of genetic differentiation in allele frequencies between groups but also varies among traits. The evaluation of prediction accuracy shows a modest superiority of the interaction model relative to the other two approaches. This superiority is the result of better stability in performance of the interaction models across data sets and traits; indeed, in almost all cases, the interaction model was either the best performing model or it performed close to the best performing model.
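The decomposition described above (a main effect per marker plus group-specific deviations) can be written as a single regression on an augmented marker matrix: one shared copy of the markers, plus one copy per sub-population that is zero for individuals outside that group. The Python/NumPy sketch below builds that design. As an assumption, it fits the model with closed-form ridge regression standing in for the shrinkage and variable-selection methods mentioned in the abstract; the function names, the single penalty `lam` and the centering of `y` are illustrative choices, not the authors' implementation.

```python
import numpy as np


def interaction_design(X, groups):
    """Return W = [X | X*1{group=g1} | X*1{group=g2} | ...] and the group labels.

    Column block 0 carries the main marker effects; block g carries the
    deviation of each marker's effect in group g from that main effect.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    blocks = [X]
    for g in labels:
        in_group = (groups == g)[:, None]   # column mask, broadcasts over markers
        blocks.append(np.where(in_group, X, 0.0))
    return np.hstack(blocks), labels


def ridge_fit(W, y, lam):
    """Ridge solution (W'W + lam*I)^-1 W'y on a mean-centered phenotype."""
    yc = np.asarray(y, dtype=float) - np.mean(y)
    p = W.shape[1]
    return np.linalg.solve(W.T @ W + lam * np.eye(p), W.T @ yc)


def group_specific_effects(beta, n_markers, labels):
    """Effect of each marker within group g = main effect + deviation for g."""
    main = beta[:n_markers]
    return {g: main + beta[(k + 1) * n_markers:(k + 2) * n_markers]
            for k, g in enumerate(labels)}
```

With `lam > 0` the system is always solvable even though the augmented design is collinear (the group blocks sum to the main block); the penalty is what splits each effect between its shared and group-specific parts.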
Affiliation(s)
- Gustavo de Los Campos
- Department of Epidemiology & Biostatistics, Michigan State University, 909 Fee Road, Room B601, East Lansing, MI 48824 USA ; Department of Statistics & Probability, Michigan State University, 619 Red Cedar Rd., East Lansing, MI 48824 USA
- Yogasudha Veturi
- University of Alabama at Birmingham, Ryals Public Health Bldg. 443, Birmingham, AL 35294 USA
- Ana I Vazquez
- Department of Epidemiology & Biostatistics, Michigan State University, 909 Fee Road, Room B601, East Lansing, MI 48824 USA
- Christina Lehermeier
- Department of Plant Breeding, Technische Universität München, Liesel-Beckmann-Str. 2, 85354 Freising, Germany
- Paulino Pérez-Rodríguez
- Colegio de Postgraduados, Km. 36.5, Carretera Mexico, Montecillo, 56230 Texcoco, Estado de México Mexico
5
Lehermeier C, Schön CC, de Los Campos G. Assessment of Genetic Heterogeneity in Structured Plant Populations Using Multivariate Whole-Genome Regression Models. Genetics 2015; 201:323-37. [PMID: 26122758 DOI: 10.1534/genetics.115.177394] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 04/14/2015] [Accepted: 06/25/2015] [Indexed: 01/27/2023]
Abstract
Plant breeding populations exhibit varying levels of structure and admixture; these features are likely to induce heterogeneity of marker effects across subpopulations. Traditionally, structure has been dealt with as a potential confounder, and various methods exist to "correct" for population stratification. However, these methods induce a mean correction that does not account for heterogeneity of marker effects. The animal breeding literature offers a few recent studies that use multivariate models to account for genetic heterogeneity in multibreed data. However, these methods have received little attention in plant breeding, where population structure can take different forms. In this article we address the problem of analyzing data from heterogeneous plant breeding populations, using three approaches: (a) a model that ignores population structure [A-genome-based best linear unbiased prediction (A-GBLUP)], (b) a stratified (i.e., within-group) analysis (W-GBLUP), and (c) a multivariate approach that uses multigroup data and accounts for heterogeneity (MG-GBLUP). The performance of the three models was assessed on three different data sets: a diversity panel of rice (Oryza sativa), a maize (Zea mays L.) half-sib panel, and a wheat (Triticum aestivum L.) data set that originated from plant breeding programs. The estimated genomic correlations between subpopulations varied from null to moderate, depending on the genetic distance between subpopulations and on the trait. Our assessment of prediction accuracy features cases where ignoring population structure leads to a more parsimonious and more powerful model, as well as others where the multivariate and stratified approaches have higher predictive power. In general, the multivariate approach appeared slightly more robust than either A-GBLUP or W-GBLUP.
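A-GBLUP and W-GBLUP differ mainly in which individuals share a genomic relationship matrix and a fit. As an assumption (the abstract does not specify the matrix), the Python/NumPy sketch below uses a VanRaden-style relationship matrix G = ZZ'/(2 * sum p(1-p)), with Z the genotype matrix centered by twice the allele frequency, and the GBLUP predictor g_hat = G(G + lam*I)^-1 (y - mean(y)), where lam stands for the residual-to-genetic variance ratio and is treated as known. Using the sample mean as the intercept is a simplification. The multivariate MG-GBLUP, which also estimates genomic correlations between subpopulations, is not shown, and the helper names are hypothetical.

```python
import numpy as np


def vanraden_grm(M):
    """Genomic relationship matrix from an individuals x SNPs matrix coded 0/1/2."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0            # allele frequency per SNP in this sample
    Z = M - 2.0 * p                     # center each SNP by its expected dosage
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y, lam):
    """Genomic values g_hat = G (G + lam*I)^-1 (y - mean(y)) for the training set."""
    y = np.asarray(y, dtype=float)
    n = G.shape[0]
    return G @ np.linalg.solve(G + lam * np.eye(n), y - y.mean())


def a_gblup(M, y, lam):
    """Across-group analysis: one relationship matrix for all individuals."""
    return gblup_predict(vanraden_grm(M), y, lam)


def w_gblup(M, y, groups, lam):
    """Within-group analysis: a separate relationship matrix and fit per subpopulation."""
    M, y, groups = np.asarray(M, float), np.asarray(y, float), np.asarray(groups)
    g_hat = np.empty_like(y)
    for g in np.unique(groups):
        idx = groups == g
        g_hat[idx] = gblup_predict(vanraden_grm(M[idx]), y[idx], lam)
    return g_hat
```

These functions return fitted genomic values for the training individuals only; predicting new individuals would additionally require the relationship matrix between new and training individuals.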
6
Abstract
BACKGROUND Population inference is an important problem in genetics, used to remove population stratification in genome-wide association studies and to detect migration patterns or shared ancestry. An individual's genotype can be modeled as a probabilistic function of ancestral population memberships, Q, and the allele frequencies in those populations, P. The parameters P and Q of this binomial likelihood model can be inferred using slow sampling methods such as Markov chain Monte Carlo or faster gradient-based approaches such as sequential quadratic programming. This paper proposes a least-squares simplification of the binomial likelihood model motivated by a Euclidean interpretation of the genotype feature space. This results in a faster algorithm that easily incorporates the degree of admixture within the sample of individuals and improves estimates without requiring trial-and-error tuning. RESULTS We show that the expected value of the least-squares solution across all possible genotype datasets is equal to the true solution when part of the problem has been solved, and that the variance of the solution approaches zero as its size increases. The least-squares algorithm performs nearly as well as Admixture for these theoretical scenarios. We compare least-squares, Admixture, and FRAPPE for a variety of problem sizes and difficulties. For particularly hard problems with a large number of populations, a small number of samples, or a greater degree of admixture, least-squares performs better than the other methods. On simulated mixtures of real population allele frequencies from the HapMap project, Admixture estimates sparsely mixed individuals better than least-squares; however, the least-squares approach performs within 1.5% of the Admixture error. On individual genotypes from the HapMap project, Admixture and least-squares perform qualitatively similarly and within 1.2% of each other. Notably, the least-squares approach nearly always converges 1.5 to 6 times faster. CONCLUSIONS The computational advantage of the least-squares approach, along with its good estimation performance, warrants further research, especially for very large datasets. As problem sizes increase, the difference in estimation performance between all algorithms decreases. In addition, when prior information is available, the least-squares approach easily incorporates the expected degree of admixture to improve the estimate.
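The model above treats each genotype as a binomial draw whose expected allele dosage is 2 * sum_k Q[i, k] * P[k, j]. The Python/NumPy sketch below is a simplified alternating least-squares heuristic in the spirit of that idea, not the authors' published algorithm: it alternately solves unconstrained least-squares problems for P and Q, clips P to [0, 1], and projects each row of Q onto the probability simplex so that ancestry proportions are non-negative and sum to 1. The fixed iteration count, random initialization and function names are assumptions, and projecting after an unconstrained solve is only an approximation to the constrained optimum.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto {q : q >= 0, sum(q) = 1}."""
    V = np.asarray(V, dtype=float)
    n, k = V.shape
    U = -np.sort(-V, axis=1)                        # each row sorted descending
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, k + 1)
    cond = U - css / idx > 0                        # True on a prefix of each row
    rho = k - 1 - np.argmax(cond[:, ::-1], axis=1)  # last index where cond holds
    theta = css[np.arange(n), rho] / (rho + 1)
    return np.maximum(V - theta[:, None], 0.0)


def least_squares_admixture(G, K, n_iter=100, seed=0):
    """Alternating least-squares fit of G/2 ~ Q P.

    G: individuals x SNPs genotype matrix coded 0/1/2 (no missing values).
    Returns Q (individuals x K ancestry proportions) and
    P (K x SNPs allele frequencies).
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    X = np.asarray(G, dtype=float) / 2.0            # E[G] / 2 = Q P
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    Q = project_rows_to_simplex(rng.random((n, K)))
    for _ in range(n_iter):
        # Fix Q, solve for P, then keep allele frequencies in [0, 1].
        P = np.clip(np.linalg.lstsq(Q, X, rcond=None)[0], 0.0, 1.0)
        # Fix P, solve for Q, then project each row back onto the simplex.
        Q = project_rows_to_simplex(np.linalg.lstsq(P.T, X.T, rcond=None)[0].T)
    return Q, P
```

For example, `Q, P = least_squares_admixture(G, K=3)` on a 0/1/2 matrix `G` returns one row of ancestry proportions per individual; because each step is a small dense least-squares solve, the cost per iteration grows roughly linearly with the number of SNPs.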
Affiliation(s)
- R Mitchell Parry
- The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- May D Wang
- The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
- Parker H. Petit Institute of Bioengineering and Biosciences and Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Winship Cancer Institute and Hematology and Oncology Department, Emory University, 30322, Atlanta, GA, USA
7
Kumar R, Williams LK, Kato A, Peterson EL, Favoreto S, Hulse K, Wang D, Beckman K, Thyne S, LeNoir M, Meade K, Lanfear DE, Levin AM, Favro D, Yang JJ, Weiss K, Boushey HA, Grammer L, Avila PC, Burchard EG, Schleimer R. Genetic variation in B cell-activating factor of the TNF family (BAFF) and asthma exacerbations among African American subjects. J Allergy Clin Immunol 2012; 130:996-9.e6. [PMID: 22728080 DOI: 10.1016/j.jaci.2012.04.047] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/20/2011] [Revised: 01/30/2012] [Accepted: 04/11/2012] [Indexed: 10/28/2022]
8
Blair C, Weigel DE, Balazik M, Keeley ATH, Walker FM, Landguth E, Cushman S, Murphy M, Waits L, Balkenhol N. A simulation-based evaluation of methods for inferring linear barriers to gene flow. Mol Ecol Resour 2012; 12:822-33. [PMID: 22551194 DOI: 10.1111/j.1755-0998.2012.03151.x] [Citation(s) in RCA: 116] [Impact Index Per Article: 9.7] [Indexed: 11/28/2022]
Abstract
Different analytical techniques used on the same data set may lead to different conclusions about the existence and strength of genetic structure. Therefore, reliable interpretation of the results from different methods depends on the efficacy and reliability of different statistical methods. In this paper, we evaluated the performance of multiple analytical methods to detect the presence of a linear barrier dividing populations. We were specifically interested in determining if simulation conditions, such as dispersal ability and genetic equilibrium, affect the power of different analytical methods for detecting barriers. We evaluated two boundary detection methods (Monmonier's algorithm and WOMBLING), two spatial Bayesian clustering methods (TESS and GENELAND), an aspatial clustering approach (STRUCTURE), and two recently developed, non-Bayesian clustering methods [PSMIX and discriminant analysis of principal components (DAPC)]. We found that clustering methods had higher success rates than boundary detection methods and also detected the barrier more quickly. All methods detected the barrier more quickly when dispersal was long distance in comparison to short-distance dispersal scenarios. Bayesian clustering methods performed best overall, both in terms of highest success rates and lowest time to barrier detection, with GENELAND showing the highest power. None of the methods suggested a continuous linear barrier when the data were generated under an isolation-by-distance (IBD) model. However, the clustering methods had higher potential for leading to incorrect barrier inferences under IBD unless strict criteria for successful barrier detection were implemented. Based on our findings and those of previous simulation studies, we discuss the utility of different methods for detecting linear barriers to gene flow.
Affiliation(s)
- Christopher Blair
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcocks Street, Toronto, ON M5S 3B2, Canada.
9
Ding L, Wiener H, Abebe T, Altaye M, Go RCP, Kercsmar C, Grabowski G, Martin LJ, Khurana Hershey GK, Chakorborty R, Baye TM. Comparison of measures of marker informativeness for ancestry and admixture mapping. BMC Genomics 2011; 12:622. [PMID: 22185208 PMCID: PMC3276602 DOI: 10.1186/1471-2164-12-622] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 07/13/2011] [Accepted: 12/20/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Admixture mapping is a powerful gene mapping approach for an admixed population formed from ancestral populations with different allele frequencies. The power of this method relies on the ability of ancestry informative markers (AIMs) to infer ancestry along the chromosomes of admixed individuals. In this study, more than one million SNPs from HapMap databases and simulated data have been interrogated in admixed populations using various measures of ancestry informativeness: Fisher Information Content (FIC), Shannon Information Content (SIC), F statistics (FST), Informativeness for Assignment Measure (In), and the Absolute Allele Frequency Differences (delta, δ). The objectives are to compare these measures of informativeness to select SNP markers for ancestry inference, and to determine the accuracy of AIM panels selected by each measure in estimating the contributions of the ancestors to the admixed population. RESULTS FST and In had the highest Spearman correlation and the best agreement as measured by Kappa statistics based on deciles. Although the different measures of marker informativeness performed comparably well, analyses based on the top 1 to 10% ranked informative markers of simulated data showed that In was better in estimating ancestry for an admixed population. CONCLUSIONS Although millions of SNPs have been identified, only a small subset needs to be genotyped in order to accurately predict ancestry with a minimal error rate in a cost-effective manner. In this article, we compared various methods for selecting ancestry informative SNPs using simulations as well as SNP genotype data from samples of admixed populations and showed that the In measure estimates ancestry proportion (in an admixed population) with lower bias and mean square error.
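For a biallelic SNP, three of the measures compared here can be computed directly from ancestral allele frequencies. The Python sketch below assumes that the frequency of one allele is known in each of two or more ancestral populations. delta is the absolute frequency difference; FST is computed as (H_T - H_S) / H_T from expected heterozygosities, which is one common definition among several and not necessarily the estimator used in the study; and In follows the informativeness-for-assignment formula of Rosenberg and colleagues, here with natural logarithms. FIC and SIC are omitted, and the function names are illustrative.

```python
import math


def _xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def delta(p1, p2):
    """Absolute allele frequency difference between two populations."""
    return abs(p1 - p2)


def fst(freqs):
    """(H_T - H_S) / H_T for one biallelic SNP, populations weighted equally."""
    K = len(freqs)
    p_bar = sum(freqs) / K
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / K
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def informativeness_for_assignment(freqs):
    """In = sum over alleles j of [-pbar_j ln pbar_j + (1/K) sum_i p_ij ln p_ij].

    freqs holds the frequency of one allele in each of the K populations;
    the other allele's frequencies are 1 - p.
    """
    K = len(freqs)
    total = 0.0
    for allele_freqs in (freqs, [1.0 - p for p in freqs]):
        p_bar = sum(allele_freqs) / K
        total += -_xlogx(p_bar) + sum(_xlogx(p) for p in allele_freqs) / K
    return total
```

A SNP fixed for different alleles in two populations gives delta = 1, FST = 1 and In = ln 2 (its maximum for two populations), while identical frequencies give 0 for all three, which is why all three can be used to rank candidate ancestry informative markers.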
Affiliation(s)
- Lili Ding
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Howard Wiener
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Tilahun Abebe
- Department of Biology, University of Northern Iowa, Cedar Falls, IA, USA
- Mekbib Altaye
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Rodney CP Go
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Carolyn Kercsmar
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Greg Grabowski
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Lisa J Martin
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Gurjit K Khurana Hershey
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
- Ranajit Chakorborty
- Center for Computational Genomics, Institute of Applied Genetics, Department of Forensic and Investigative Genetics, University of North Texas Health Science Center, Fort Worth, TX, USA
- Tesfaye M Baye
- Cincinnati Children's Hospital Medical Center, Department of Pediatrics, University of Cincinnati, Cincinnati, OH, USA
10
Abstract
A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy-Weinberg and linkage equilibrium within populations, neither of which will apply for many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only the summary of a posterior sample of assignments, but also the estimation of the posterior density for the demographic history. It is shown that joint estimates of assignment and demographic history are possible, including estimation of population phylogeny for samples from three populations. The new method is compared to results of a widely used assignment method, using simulated and published empirical data sets.
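The partition distance used to summarize posterior samples of assignments is commonly defined as the smallest number of individuals that must be removed for two assignments to agree once their cluster labels are matched up. The brute-force Python sketch below implements that standard definition under the assumption of only a few clusters (it tries every label matching, so its cost grows factorially); the function name and label encoding are illustrative, and the paper's own implementation may differ.

```python
from collections import Counter
from itertools import permutations


def partition_distance(a, b):
    """Minimum number of individuals to remove so two labelings agree up to relabeling.

    a, b: sequences of cluster labels, one per individual, in the same order.
    Brute force over label matchings: only practical for a handful of clusters.
    """
    if len(a) != len(b):
        raise ValueError("both assignments must cover the same individuals")
    labels_a, labels_b = sorted(set(a)), sorted(set(b))
    if len(labels_a) > len(labels_b):          # always match the smaller label set
        a, b, labels_a, labels_b = b, a, labels_b, labels_a
    overlap = Counter(zip(a, b))               # (label in a, label in b) -> count
    best = max(
        sum(overlap[(x, y)] for x, y in zip(labels_a, perm))
        for perm in permutations(labels_b, len(labels_a))
    )
    return len(a) - best
```

Two assignments that differ only by a renaming of clusters have distance 0, which is what makes this measure suitable for comparing MCMC samples whose cluster labels are arbitrary.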
11
Onogi A, Nurimoto M, Morita M. Characterization of a Bayesian genetic clustering algorithm based on a Dirichlet process prior and comparison among Bayesian clustering methods. BMC Bioinformatics 2011; 12:263. [PMID: 21708038 PMCID: PMC3161044 DOI: 10.1186/1471-2105-12-263] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 10/16/2010] [Accepted: 06/28/2011] [Indexed: 11/16/2022]
Abstract
Background A Bayesian approach based on a Dirichlet process (DP) prior is useful for inferring genetic population structure because it can infer the number of populations and the assignment of individuals simultaneously. However, the properties of the DP prior method are not well understood, and the method is therefore relatively rarely used. We characterized the DP prior method to increase its practical use. Results First, we evaluated the usefulness of the sequentially-allocated merge-split (SAMS) sampler, a technique for improving the mixing of Markov chain Monte Carlo algorithms. Although this sampler has been implemented in an earlier program, HWLER, its effectiveness had not been investigated. We showed that the sampler was effective for population structure analysis and was beneficial for both the accuracy of inference and computational time. Second, we examined the effect of a hyperparameter for the prior distribution of allele frequencies and showed that the specification of this parameter was important and that the problem could be resolved by treating the parameter as a variable. Third, we compared the DP prior method with other Bayesian clustering methods and showed that the DP prior method was suitable for data sets with unbalanced sample sizes among populations. In contrast, although current popular algorithms for population structure analysis, such as those implemented in STRUCTURE, were suitable for data sets with uniform sample sizes, their inferences for unbalanced sample sizes tended to be less accurate than those of the DP prior method. Conclusions The clustering method based on the DP prior was found to be useful because it can infer the number of populations and simultaneously assign individuals to populations, and it is suitable for data sets with unbalanced sample sizes among populations. Here, we present a novel program, DPART, that implements the SAMS sampler and can treat the hyperparameter for the prior distribution of allele frequencies as a variable.
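A DP prior lets the number of populations vary because, in its Chinese restaurant process representation, each individual joins an existing cluster with probability proportional to that cluster's current size, or starts a new cluster with probability proportional to the concentration parameter alpha. The Python sketch below only draws partitions from this prior and computes the prior mean number of clusters; it is not the SAMS sampler or DPART, and it ignores the genotype likelihood that an actual clustering analysis combines with the prior. Function names are illustrative.

```python
import random


def sample_crp_partition(n, alpha, seed=None):
    """Draw cluster labels for n individuals from a CRP(alpha) prior.

    With i individuals already placed, the next one joins existing cluster c
    with probability size(c) / (i + alpha) and starts a new cluster with
    probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    labels, sizes = [], []
    for _ in range(n):
        weights = sizes + [alpha]              # last slot means "new cluster"
        c = rng.choices(range(len(weights)), weights=weights)[0]
        if c == len(sizes):
            sizes.append(1)
        else:
            sizes[c] += 1
        labels.append(c)
    return labels


def expected_num_clusters(n, alpha):
    """Prior mean number of clusters among n individuals: sum_i alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))
```

Because the expected number of clusters grows with both n and alpha, the choice of alpha (or a hyperprior on it) influences how many populations the posterior tends to support.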
Affiliation(s)
- Akio Onogi
- Maebashi Institute of Animal Science, Livestock Improvement Association of Japan, Inc., 316 Kanamaru, Maebashi, Gunma, 371-0121, Japan.
12
Gould W, Peterson EL, Karungi G, Zoratti A, Gaggin J, Toma G, Yan S, Levin AM, Yang JJ, Wells K, Wang M, Burke RR, Beckman K, Popadic D, Land SJ, Kumar R, Seibold MA, Lanfear DE, Burchard EG, Williams LK. Factors predicting inhaled corticosteroid responsiveness in African American patients with asthma. J Allergy Clin Immunol 2011; 126:1131-8. [PMID: 20864153 DOI: 10.1016/j.jaci.2010.08.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 06/16/2010] [Revised: 07/30/2010] [Accepted: 08/02/2010] [Indexed: 01/13/2023]
Abstract
BACKGROUND African American patients disproportionately experience uncontrolled asthma. Treatment with an inhaled corticosteroid (ICS) is considered first-line therapy for persistent asthma. OBJECTIVE We sought to determine the degree to which African American patients respond to ICS medication and whether the level of response is influenced by other factors, including genetic ancestry. METHODS Patients aged 12 to 56 years who received care from a large health system in southeast Michigan and who resided in Detroit were recruited to participate if they had a diagnosis of asthma. Patients were treated with 6 weeks of inhaled beclomethasone dipropionate, and pulmonary function was remeasured after treatment. Ancestry was determined by genotyping ancestry-informative markers. The main outcome measure was ICS responsiveness defined as the change in prebronchodilator FEV(1) over the 6-week course of treatment. RESULTS Among 147 participating African American patients with asthma, average improvement in FEV(1) after 6 weeks of ICS treatment was 11.6%. The mean proportion of African ancestry in this group was 78.4%. The degree of baseline bronchodilator reversibility was the only factor consistently associated with ICS responsiveness, as measured by both an improvement in FEV(1) and patient-reported asthma control (P = .001 and P = .021, respectively). The proportion of African ancestry was not significantly associated with ICS responsiveness. CONCLUSIONS Although baseline pulmonary function parameters appear to be associated with the likelihood to respond to ICS treatment, the proportion of genetic African ancestry does not. This study suggests that genetic ancestry might not contribute to differences in ICS controller response among African American patients with asthma.
Affiliation(s)
- Wendy Gould
- Department of Internal Medicine, Henry Ford Health System, Detroit, MI 48202, USA
13
Liu N, Zhao H, Patki A, Limdi NA, Allison DB. Controlling Population Structure in Human Genetic Association Studies with Samples of Unrelated Individuals. Stat Interface 2011; 4:317-326. [PMID: 22308192 PMCID: PMC3269890 DOI: 10.4310/sii.2011.v4.n3.a6] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 05/31/2023]
Abstract
In genetic studies, associations between genotypes and phenotypes may be confounded by unrecognized population structure and/or admixture. Studies have shown that population stratification exists even in European populations, which are thought to be relatively homogeneous, and that it can affect the validity of association studies. A number of methods have been proposed to address this issue in recent years. Among them, the mixed-model approach and the principal component-based approach have several advantages over other methods. However, these approaches have not been thoroughly evaluated on large human datasets. The objectives of this study are to (1) evaluate and compare the performance of the mixed-model approach and the principal component-based approach for genetic association mapping using human data consisting of unrelated individuals, and (2) understand the relationship between these two approaches. To achieve these goals, we simulate datasets based on the HapMap data under various scenarios. Our results indicate that the mixed-model approach performs well in controlling for population structure/admixture, with performance similar to that of the principal component-based approach. However, an approach combining the mixed model and principal component analysis does not perform as well as either method alone.
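The principal component-based approach evaluated here is often implemented in the style of EIGENSTRAT: compute the top principal components of the standardized genotype matrix, regress them out of both the test SNP and the phenotype, and test the correlation of the residuals. The Python/NumPy sketch below follows that recipe under stated assumptions: complete genotypes coded 0/1/2, standardization by the sample standard deviation (a simplification), a user-chosen number of components `k`, and hypothetical function names. Under the null hypothesis the statistic (n - k - 1) * r^2 is approximately chi-squared with 1 degree of freedom. The mixed-model approach is not shown.

```python
import numpy as np


def top_pcs(G, k):
    """Top-k principal axes (left singular vectors) of the standardized genotypes."""
    X = np.asarray(G, dtype=float)
    X = X - X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                   # monomorphic SNPs stay at zero
    U, _, _ = np.linalg.svd(X / sd, full_matrices=False)
    return U[:, :k]


def residualize(v, pcs):
    """Remove the mean of v and its projection onto the PCs."""
    v = np.asarray(v, dtype=float)
    v = v - v.mean()
    coef, *_ = np.linalg.lstsq(pcs, v, rcond=None)
    return v - pcs @ coef


def pc_adjusted_test(snp, y, pcs):
    """Correlation of PC-adjusted genotype and phenotype, and (n - k - 1) * r^2."""
    g_res, y_res = residualize(snp, pcs), residualize(y, pcs)
    r = float(g_res @ y_res / np.sqrt((g_res @ g_res) * (y_res @ y_res)))
    n, k = pcs.shape
    return r, (n - k - 1) * r ** 2
```

Adjusting both the genotype and the phenotype removes the ancestry-driven component they share, which is the source of the spurious associations described in the abstract.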
Affiliation(s)
- Nianjun Liu
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294
- Hongyu Zhao
- Department of Epidemiology and Public Health, Department of Genetics, Yale University School of Medicine, New Haven, CT 06520
- Amit Patki
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294
- Nita A Limdi
- Department of Neurology, University of Alabama at Birmingham, 1719 6th Avenue South, CIRC-312, Birmingham, AL 35294
14
Baye TM, Wilke RA. Mapping genes that predict treatment outcome in admixed populations. Pharmacogenomics J 2010; 10:465-77. [PMID: 20921971 PMCID: PMC2991422 DOI: 10.1038/tpj.2010.71] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 04/09/2010] [Revised: 07/07/2010] [Accepted: 08/05/2010] [Indexed: 01/19/2023]
Abstract
There is great interest in characterizing the genetic architecture underlying drug response. For many drugs, gene-based dosing models explain a considerable amount of the overall variation in treatment outcome. As such, prescription drug labels are increasingly being modified to contain pharmacogenetic information. Genetic data must, however, be interpreted within the context of relevant clinical covariates. Even the most predictive models improve with the addition of data related to biogeographical ancestry. The current review explores analytical strategies that leverage population structure to more fully characterize genetic determinants of outcome in large clinical practice-based cohorts. The success of this approach will depend upon several key factors: (1) the availability of outcome data from groups of admixed individuals (that is, populations recombined over multiple generations), (2) a measurable difference in treatment outcome (that is, efficacy and toxicity end points), and (3) a measurable difference in allele frequency between the ancestral populations.
Affiliation(s)
- T M Baye
- Division of Asthma Research, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH 45229-3039, USA.
15
Jin Y, Hu D, Peterson EL, Eng C, Levin AM, Wells K, Beckman K, Kumar R, Seibold MA, Karungi G, Zoratti A, Gaggin J, Campbell J, Galanter J, Chapela R, Rodríguez-Santana JR, Watson HG, Meade K, Lenoir M, Rodríguez-Cintrón W, Avila PC, Lanfear DE, Burchard EG, Williams LK. Dual-specificity phosphatase 1 as a pharmacogenetic modifier of inhaled steroid response among asthmatic patients. J Allergy Clin Immunol 2010; 126:618-25.e1-2. [PMID: 20673984 DOI: 10.1016/j.jaci.2010.06.007] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 11/17/2009] [Revised: 06/03/2010] [Accepted: 06/08/2010] [Indexed: 11/15/2022]
Abstract
BACKGROUND Inhaled corticosteroids (ICSs) are considered first-line treatment for persistent asthma, yet there is significant variability in treatment response. Dual-specificity phosphatase 1 (DUSP1) appears to mediate the anti-inflammatory action of corticosteroids. OBJECTIVE We sought to determine whether variants in the DUSP1 gene are associated with clinical response to ICS treatment. METHODS Study participants with asthma were drawn from the following multiethnic cohorts: the Genetics of Asthma in Latino Americans (GALA) study; the Study of African Americans, Asthma, Genes & Environments (SAGE); and the Study of Asthma Phenotypes and Pharmacogenomic Interactions by Race-ethnicity (SAPPHIRE). We screened GALA study participants for genetic variants that modified the relationship between ICS use and bronchodilator response. We then replicated our findings in SAGE and SAPPHIRE participants. In a group of SAPPHIRE participants treated with ICSs for 6 weeks, we examined whether a DUSP1 polymorphism was associated with changes in FEV(1) and self-reported asthma control. RESULTS The DUSP1 polymorphisms rs881152 and rs34507926 localized to different haplotype blocks and appeared to significantly modify the relationship between ICS use and bronchodilator response among GALA study participants. This interaction was also seen for rs881152 among SAPPHIRE but not SAGE participants. Among the group of SAPPHIRE participants prospectively treated with ICSs for 6 weeks, rs881152 genotype was significantly associated with changes in self-reported asthma control but not FEV(1). CONCLUSION DUSP1 polymorphisms were associated with clinical response to ICS therapy and therefore might be useful in the future to identify asthmatic patients more likely to respond to this controller treatment.
Affiliation(s)
- Ying Jin
- Center for Health Services Research, Henry Ford Health System, Detroit, Mich; Wayne State University School of Medicine, Detroit, Mich 48202, USA
16
Kumar R, Seibold MA, Aldrich MC, Williams LK, Reiner AP, Colangelo L, Galanter J, Gignoux C, Hu D, Sen S, Choudhry S, Peterson EL, Rodriguez-Santana J, Rodriguez-Cintron W, Nalls MA, Leak TS, O'Meara E, Meibohm B, Kritchevsky SB, Li R, Harris TB, Nickerson DA, Fornage M, Enright P, Ziv E, Smith LJ, Liu K, Burchard EG. Genetic ancestry in lung-function predictions. N Engl J Med 2010; 363:321-30. [PMID: 20647190 PMCID: PMC2922981 DOI: 10.1056/nejmoa0907897] [Citation(s) in RCA: 191] [Impact Index Per Article: 13.6] [Indexed: 01/10/2023]
Abstract
BACKGROUND Self-identified race or ethnic group is used to determine normal reference standards in the prediction of pulmonary function. We conducted a study to determine whether the genetically determined percentage of African ancestry is associated with lung function and whether its use could improve predictions of lung function among persons who identified themselves as African American. METHODS We assessed the ancestry of 777 participants self-identified as African American in the Coronary Artery Risk Development in Young Adults (CARDIA) study and evaluated the relation between pulmonary function and ancestry by means of linear regression. We performed similar analyses of data for two independent cohorts of subjects identifying themselves as African American: 813 participants in the Health, Aging, and Body Composition (HABC) study and 579 participants in the Cardiovascular Health Study (CHS). We compared the fit of two types of models to lung-function measurements: models based on the covariates used in standard prediction equations and models incorporating ancestry. We also evaluated the effect of the ancestry-based models on the classification of disease severity in two asthma-study populations. RESULTS African ancestry was inversely related to forced expiratory volume in 1 second (FEV(1)) and forced vital capacity in the CARDIA cohort. These relations were also seen in the HABC and CHS cohorts. In predicting lung function, the ancestry-based model fit the data better than standard models. Ancestry-based models resulted in the reclassification of asthma severity (based on the percentage of the predicted FEV(1)) in 4 to 5% of participants. CONCLUSIONS Current predictive equations, which rely on self-identified race alone, may misestimate lung function among subjects who identify themselves as African American. Incorporating ancestry into normative equations may improve lung-function estimates and more accurately categorize disease severity. 
(Funded by the National Institutes of Health and others.)
Affiliation(s)
- Rajesh Kumar
- Division of Allergy and Immunology, Children's Memorial Hospital, 2300 Children's Plaza, Box 60, Chicago IL 60614, USA.
17
Abstract
This article reviews recent developments in Bayesian algorithms that explicitly include geographical information in the inference of population structure. Current models substantially differ in their prior distributions and background assumptions, falling into two broad categories: models with or without admixture. To aid users of this new generation of spatially explicit programs, we clarify the assumptions underlying the models, and we test these models in situations where their assumptions are not met. We show that models without admixture are not robust to the inclusion of admixed individuals in the sample, thus providing an incorrect assessment of population genetic structure in many cases. In contrast, admixture models are robust to an absence of admixture in the sample. We also give statistical and conceptual reasons why data should be explored using spatially explicit models that include admixture.
Affiliation(s)
- Olivier François
- Grenoble IT, Joseph Fourier University, CNRS UMR 5525, TIMC, Group of Computational and Mathematical Biology, 38706 La Tronche, France
18
Intarapanich A, Shaw PJ, Assawamakin A, Wangkumhang P, Ngamphiw C, Chaichoompu K, Piriyapongsa J, Tongsima S. Iterative pruning PCA improves resolution of highly structured populations. BMC Bioinformatics 2009; 10:382. [PMID: 19930644 PMCID: PMC2790469 DOI: 10.1186/1471-2105-10-382] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 04/30/2009] [Accepted: 11/23/2009] [Indexed: 12/12/2022]
Abstract
Background Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. Computationally efficient non-parametric methods, chiefly those based on Principal Components Analysis (PCA), are thus increasingly relied upon for population structure analysis. Current PCA-based methods can accurately detect structure; however, their accuracy in resolving subpopulations and assigning individuals to them is limited. When subpopulations are closely related to one another, they overlap in PCA space and appear as a conglomerate. This problem is exacerbated when some subpopulations in the dataset are genetically far removed from others. We propose a novel PCA-based framework that addresses this shortcoming. Results A novel population structure analysis algorithm called iterative pruning PCA (ipPCA) was developed, which assigns individuals to subpopulations and infers the total number of subpopulations present. Genotypic data from simulated and real population datasets with different degrees of structure were analyzed. For datasets with simple structures, the subpopulation assignments of individuals made by ipPCA were largely consistent with those of the STRUCTURE, BAPS and AWclust algorithms. On the other hand, highly structured populations containing many closely related subpopulations could be accurately resolved only by ipPCA, and not by the other methods. Conclusion The algorithm is computationally efficient and not constrained by the dataset complexity.
This systematic subpopulation assignment approach removes the need for prior population labels, which could be advantageous when cryptic stratification is encountered in datasets containing individuals otherwise assumed to belong to a homogeneous population.
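The iterative-pruning idea can be sketched as a recursive bisection. Run PCA on a subset and stop if no appreciable structure remains; otherwise split the subset and recurse on each half. This is a simplified sketch under our own assumptions, not the published ipPCA, which uses its own clustering and stopping criteria. Here, individuals are split by the sign of their PC1 score. The hypothetical stopping rule uses PC1's share of total variance (`min_share`) and a minimum cluster size (`min_size`).

```python
import math


def pc1(rows, iters=200):
    """PC1 scores (unit norm) of the rows and PC1's share of total variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]
    total = sum(v * v for r in x for v in r)            # trace of X X^T
    if total <= 1e-12:
        return None, 0.0                                 # no variation left
    v = [float(i + 1) for i in range(n)]                 # fixed start vector
    for _ in range(iters):
        w = [sum(x[i][j] * v[i] for i in range(n)) for j in range(m)]   # X^T v
        v = [sum(x[i][j] * w[j] for j in range(m)) for i in range(n)]   # X X^T v
        norm = math.sqrt(sum(t * t for t in v))
        if norm == 0.0:
            return None, 0.0                             # start vector orthogonal to all PCs
        v = [t / norm for t in v]
    w = [sum(x[i][j] * v[i] for i in range(n)) for j in range(m)]
    return v, sum(t * t for t in w) / total              # lambda_1 / trace


def ip_pca(rows, ids=None, min_share=0.5, min_size=2):
    """Hypothetical iterative pruning: split on the sign of PC1 until no structure remains."""
    if ids is None:
        ids = list(range(len(rows)))
    if len(ids) < 2 * min_size:
        return [ids]
    scores, share = pc1([rows[i] for i in ids])
    if scores is None or share < min_share:
        return [ids]
    left = [i for i, s in zip(ids, scores) if s < 0]
    right = [i for i, s in zip(ids, scores) if s >= 0]
    if len(left) < min_size or len(right) < min_size:
        return [ids]
    return (ip_pca(rows, left, min_share, min_size)
            + ip_pca(rows, right, min_share, min_size))


a = [0] * 10 + [0, 0]          # subpopulation A
b = [0] * 10 + [2, 2]          # B: close to A (differs at 2 loci)
c = [2] * 10 + [0, 0]          # C: far from both (differs at 10 loci)
rows = [a] * 4 + [b] * 4 + [c] * 4
clusters = ip_pca(rows)
```

In the toy data, the first split separates the distant subpopulation C. Only the second PCA, run within A and B alone, resolves the closely related pair, giving three clusters.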
Affiliation(s)
- Apichart Intarapanich
- BIOTEC 113 Thailand Science Park, Paholyothin Road, Klong 1, Klong Luang, Pathumtani 12120, Thailand.
19
Rodríguez-Ramilo ST, Toro MA, Fernández J. Assessing population genetic structure via the maximisation of genetic distance. Genet Sel Evol 2009; 41:49. [PMID: 19900278 PMCID: PMC2776585 DOI: 10.1186/1297-9686-41-49] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 03/13/2009] [Accepted: 11/09/2009] [Indexed: 01/23/2023]
Abstract
Background The inference of the hidden structure of a population is an essential issue in population genetics, and several methods have recently been proposed for this purpose. Methods In this study, a new method to infer the number of clusters and to assign individuals to the inferred populations is proposed. This approach makes no assumption about Hardy-Weinberg or linkage equilibrium. The implemented criterion is the maximisation (via a simulated annealing algorithm) of the averaged genetic distance between a predefined number of clusters. The performance of this method is compared with two Bayesian approaches, STRUCTURE and BAPS, using simulated data and also a real human data set. Results The simulations show that with a reduced number of markers, BAPS overestimates the number of clusters and presents a reduced proportion of correct groupings. The accuracy of the new method is approximately the same as that of STRUCTURE. Also, in cases of Hardy-Weinberg and linkage disequilibrium, BAPS performs incorrectly. In these situations, STRUCTURE and the new method show equivalent behaviour with respect to the number of inferred clusters, although the proportion of correct groupings is slightly better with the new method. Re-establishing equilibrium with the randomisation procedures improves the precision of the Bayesian approaches. All methods have good precision for FST ≥ 0.03, but only STRUCTURE estimates the correct number of clusters for FST as low as 0.01. In situations with a high number of clusters or a more complex population structure, the new method (maximisation of genetic distance, MGD) performs better than STRUCTURE and BAPS. The results for a human data set analysed with the new method are congruent with the geographical regions previously found.
Conclusion This new method for inferring the hidden structure of a population, based on the maximisation of the genetic distance and making no assumption about Hardy-Weinberg or linkage equilibrium, performs well under different simulated scenarios and with real data. Therefore, it could be a useful tool to determine genetically homogeneous groups, especially in situations where the number of clusters is high, the population structure is complex, and Hardy-Weinberg and/or linkage disequilibrium are present.
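The clustering criterion described above maximises the average genetic distance between a predefined number of clusters by simulated annealing, and it can be sketched as follows. The distance used here and the annealing schedule (`t0`, `cooling`, `iters`) are illustrative assumptions, not the settings of the published method. The distance is the mean absolute allele-frequency difference across loci. Genotypes are coded as 0/1/2 copies of one allele.

```python
import math
import random


def cluster_freqs(genos, labels, k):
    """Allele frequency (genotype / 2) per locus for each cluster."""
    m = len(genos[0])
    freqs = []
    for c in range(k):
        members = [g for g, lab in zip(genos, labels) if lab == c]
        freqs.append([sum(g[j] for g in members) / (2 * len(members)) for j in range(m)])
    return freqs


def mean_distance(genos, labels, k):
    """Average over cluster pairs of the mean absolute allele-frequency difference."""
    f = cluster_freqs(genos, labels, k)
    m = len(genos[0])
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    return sum(sum(abs(f[a][j] - f[b][j]) for j in range(m)) / m
               for a, b in pairs) / len(pairs)


def anneal(genos, k, iters=2000, t0=1.0, cooling=0.995, seed=1):
    """Simulated annealing over cluster labels; returns (best labels, best distance)."""
    assert k >= 2 and len(genos) >= k
    rng = random.Random(seed)
    n = len(genos)
    labels = [i % k for i in range(n)]      # every cluster starts non-empty
    rng.shuffle(labels)
    score = mean_distance(genos, labels, k)
    best, best_score = labels[:], score
    t = t0
    for _ in range(iters):
        i = rng.randrange(n)
        new = rng.choice([c for c in range(k) if c != labels[i]])
        if labels.count(labels[i]) == 1:
            continue                        # never empty a cluster
        old, labels[i] = labels[i], new
        cand = mean_distance(genos, labels, k)
        # Accept improvements always, worse moves with probability exp(delta / t).
        if cand >= score or rng.random() < math.exp((cand - score) / t):
            score = cand
            if score > best_score:
                best, best_score = labels[:], score
        else:
            labels[i] = old                 # reject: undo the move
        t *= cooling
    return best, best_score


# Hypothetical data: two groups fixed for different alleles at 4 loci.
genos = [[0, 0, 0, 0]] * 6 + [[2, 2, 2, 2]] * 6
labels, score = anneal(genos, k=2)
```

The best partition found is kept separately from the current state. A late downhill move therefore cannot lose it.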
Affiliation(s)
- Silvia T Rodríguez-Ramilo
- Departamento de Mejora Genética Animal, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Ctra. A Coruña km 7.5, 28040 Madrid, Spain.
20
Vaughan LK, Divers J, Padilla M, Redden DT, Tiwari HK, Pomp D, Allison DB. The use of plasmodes as a supplement to simulations: A simple example evaluating individual admixture estimation methodologies. Comput Stat Data Anal 2009; 53:1755-1766. [PMID: 20161321 DOI: 10.1016/j.csda.2008.02.032] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]
Abstract
With the advent of powerful computers, simulation studies are becoming an important tool in statistical methodology research. However, computer simulations of a specific process are only as good as our understanding of the underlying mechanisms. An attractive supplement to simulations is the use of plasmode datasets. Plasmodes are data sets that are generated by natural biologic processes, under experimental conditions that allow some aspect of the truth to be known. The benefit of the plasmode approach is that the data are generated through completely natural processes, thus circumventing the common concern about the realism and accuracy of computer-simulated data. The estimation of admixture, or the proportion of an individual's genome that originates from different founding populations, is a particularly difficult research endeavor that is well suited to the use of plasmodes. Current methods have been tested with simulations of complex populations where the underlying mechanisms, such as the rate and distribution of recombination, are not well understood. To demonstrate the utility of this method, data derived from mouse crosses are used to evaluate the effectiveness of several admixture estimation methodologies. Each cross shares a common founding population so that the ancestry proportion for each individual is known, allowing for the comparison of true and estimated individual admixture values. Analysis shows that the estimation methodologies examined (Structure, AdmixMap and FRAPPE) all perform well with simple datasets. However, the performance of the estimation methodologies varied greatly when applied to a plasmode consisting of three founding populations. The results of these examples illustrate the utility of plasmodes in the evaluation of statistical genetics methodologies.
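Plasmodes work because the true ancestry of each cross is known from the breeding design. Below is a minimal sketch that assumes an offspring's expected ancestry is the average of its parents' proportions. The realized proportion varies around this value through segregation and recombination. The sketch scores an estimate by its root-mean-square error. The function names, founder labels and the estimate in the example are hypothetical.

```python
import math


def offspring_ancestry(mother, father):
    """Expected ancestry proportions of an offspring: the mean of its parents' proportions."""
    pops = set(mother) | set(father)
    return {p: (mother.get(p, 0.0) + father.get(p, 0.0)) / 2 for p in sorted(pops)}


def rmse(true_props, est_props):
    """Root-mean-square error between true and estimated ancestry proportions."""
    pops = sorted(set(true_props) | set(est_props))
    return math.sqrt(sum((true_props.get(p, 0.0) - est_props.get(p, 0.0)) ** 2
                         for p in pops) / len(pops))


# Founder strains (hypothetical labels) have pure ancestry.
A, B, C = {"A": 1.0}, {"B": 1.0}, {"C": 1.0}
f1 = offspring_ancestry(A, B)          # {'A': 0.5, 'B': 0.5}
backcross = offspring_ancestry(f1, A)  # {'A': 0.75, 'B': 0.25}
three_way = offspring_ancestry(f1, C)  # {'A': 0.25, 'B': 0.25, 'C': 0.5}
error = rmse(backcross, {"A": 0.70, "B": 0.30})  # a made-up estimate
```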
Affiliation(s)
- Laura K Vaughan
- Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294
21
Bingham E, Mannila H. Complexity control in a mixture model by the Hardy–Weinberg equilibrium. Comput Stat Data Anal 2009. [DOI: 10.1016/j.csda.2008.07.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
22
Arrigo N, Tuszynski JW, Ehrich D, Gerdes T, Alvarez N. Evaluating the impact of scoring parameters on the structure of intra-specific genetic variation using RawGeno, an R package for automating AFLP scoring. BMC Bioinformatics 2009; 10:33. [PMID: 19171029 PMCID: PMC2656475 DOI: 10.1186/1471-2105-10-33] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.3] [Received: 08/18/2008] [Accepted: 01/26/2009] [Indexed: 11/10/2022]
Abstract
Background Since the transfer and application of modern sequencing technologies to the analysis of amplified fragment-length polymorphisms (AFLP), evolutionary biologists have included an increasing number of samples and markers in their studies. Although justified in this context, the use of automated scoring procedures may result in technical biases that weaken the power and reliability of further analyses. Results Using a new scoring algorithm, RawGeno, we show that scoring errors – in particular "bin oversplitting" (i.e. when variant sizes of the same AFLP marker are not considered as homologous) and "technical homoplasy" (i.e. when two AFLP markers that differ slightly in size are mistakenly considered as being homologous) – induce a loss of discriminatory power, decrease the robustness of results and, in extreme cases, introduce erroneous information in genetic structure analyses. In the present study, we evaluate several descriptive statistics that can be used to optimize the scoring of the AFLP analysis, and we describe a new statistic, the information content per bin (Ibin) that represents a valuable estimator during the optimization process. This statistic can be computed at any stage of the AFLP analysis without requiring the inclusion of replicated samples. Finally, we show that downstream analyses are not equally sensitive to scoring errors. Indeed, although a reasonable amount of flexibility is allowed during the optimization of the scoring procedure without causing considerable changes in the detection of genetic structure patterns, notable discrepancies are observed when estimating genetic diversities from differently scored datasets. Conclusion Our algorithm appears to perform as well as a commercial program in automating AFLP scoring, at least in the context of population genetics or phylogeographic studies. 
To our knowledge, RawGeno is the only freely available public-domain software for fully automated AFLP scoring, from electropherogram files to user-defined working binary matrices. RawGeno was implemented as an R CRAN package (with a user-friendly GUI) and can be found at .
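The two scoring errors named above depend on how fragment sizes are grouped into bins. The following sketch uses a deliberately simple greedy rule with a single hypothetical `max_width` parameter; it is not RawGeno's algorithm. It shows how bins that are too narrow split one marker into several (bin oversplitting) and how bins that are too wide merge distinct markers (technical homoplasy). Fragment sizes in the example are made up.

```python
def make_bins(samples, max_width):
    """Greedy binning of fragment sizes (bp): a new bin opens when the next size
    lies more than max_width bp above the current bin's lower edge."""
    sizes = sorted(s for frags in samples.values() for s in frags)
    bins = []
    for s in sizes:
        if bins and s - bins[-1][0] <= max_width:
            bins[-1][1] = s                # extend the current bin's upper edge
        else:
            bins.append([s, s])            # open a new bin [lower, upper]
    return [tuple(b) for b in bins]


def score(samples, bins):
    """Binary presence/absence matrix: one row per sample, one column per bin."""
    return {name: [int(any(lo <= s <= hi for s in frags)) for lo, hi in bins]
            for name, frags in samples.items()}


# Made-up sizes: two markers (~100 bp and ~150 bp) with small sizing variation.
samples = {"s1": [100.0, 150.2], "s2": [100.4, 151.0], "s3": [100.2, 150.6]}
narrow = make_bins(samples, 0.3)    # oversplitting: 5 bins for 2 markers
suited = make_bins(samples, 1.0)    # 2 bins, one per marker
wide = make_bins(samples, 60.0)     # homoplasy: both markers in 1 bin
matrix = score(samples, suited)
```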
Affiliation(s)
- Nils Arrigo
- Laboratory of Evolutionary Botany, Institute of Biology, University of Neuchâtel, 11 rue Emile-Argand, CH-2000 Neuchâtel, Switzerland.
23
Sazonova N, Harner EJ. Haplotype inference and block partitioning in mixed population samples. J Bioinform Comput Biol 2008; 6:1177-92. [PMID: 19090023 DOI: 10.1142/s0219720008003898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2007] [Revised: 07/25/2008] [Accepted: 07/26/2008] [Indexed: 11/18/2022]
Abstract
Multi-population haplotype inference and block partitioning is a difficult task when dealing with mixed genotype samples. A number of studies have shown that the haplotype block structures, as well as the collections of common haplotypes and their frequencies, vary significantly among world populations. These differences are more extreme when the geographical locations for the populations are more distant. Some of the previous studies performed haplotype inference in multi-population samples with known population assignment. Others developed algorithms for clustering of the mixed haplotype or genotype samples with different block structures or genetic marker profiles. We present a new algorithm that performs haplotype inference and block partitioning in a mixed sample of genotypes from two populations when the population assignments are not known. Given a mixed genotype sample, the proposed algorithm (HAPLOCLUST) extracts two clusters of genotypes with different block structures in addition to performing haplotype inference on each of these clusters. When tested on a set of unrelated individuals, our algorithm provides correct assignments comparable to those of two state-of-the-art algorithms for population stratification. The contribution of HAPLOCLUST consists of performing haplotype/block-based population stratification and simultaneously finding the haplotype resolution and block partitioning for the extracted clusters.
Affiliation(s)
- Nadezhda Sazonova
- Department of Mathematics and Computer Science, Clarkson University, Potsdam, NY 13676, USA.
24
Yang JJ, Burchard EG, Choudhry S, Johnson CC, Ownby DR, Favro D, Chen J, Akana M, Ha C, Kwok PY, Krajenta R, Havstad SL, Joseph CL, Seibold MA, Shriver MD, Williams LK. Differences in allergic sensitization by self-reported race and genetic ancestry. J Allergy Clin Immunol 2008; 122:820-827.e9. [PMID: 19014772 DOI: 10.1016/j.jaci.2008.07.044] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Received: 05/20/2008] [Revised: 07/29/2008] [Accepted: 07/30/2008] [Indexed: 01/10/2023]
Abstract
BACKGROUND Many allergic conditions occur more frequently in African American patients when compared with white patients; however, it is not known whether this represents genetic predisposition or disparate environmental exposures. OBJECTIVE We sought to assess the relationship of self-reported race and genetic ancestry to allergic sensitization. METHODS We included 601 women enrolled in a population-based cohort study whose self-reported race was African American or white. Genetic ancestry was estimated by using markers that differentiate West African and European ancestry. We assessed the relationship between allergic sensitization (defined as ≥1 allergen-specific IgE results) and both self-reported race and genetic ancestry. Regression models adjusted for sociodemographic variables, environmental exposures, and location of residence. RESULTS The average proportion of West African ancestry in African American participants was 0.69, whereas the mean proportion of European ancestry in white participants was 0.79. Self-reported African American race was associated with allergic sensitization when compared with those who reported being white (adjusted odds ratio, 2.19; 95% CI, 1.22-3.93), even after adjusting for other variables. Genetic ancestry was not significantly associated with allergic sensitization after accounting for location of residence (adjusted odds ratio, 2.09 for urban vs suburban residence; 95% CI, 1.32-3.31). CONCLUSION Self-reported race and location of residence appeared to be more important predictors of allergic sensitization when compared with genetic ancestry, suggesting that the disparity in allergic sensitization by race might be primarily a result of environmental factors rather than genetic differences.
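The odds ratios above come from adjusted regression models. A simpler example shows how an odds ratio and its 95% confidence interval are obtained. The sketch below computes the crude (unadjusted) odds ratio from a 2×2 table with a Wald interval on the log scale. It is a substitute illustration, not the study's model, and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald CI for a 2x2 table.

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Hypothetical counts: 20/30 sensitized in one group vs 10/30 in the other.
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, as in the intervals reported above.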
25
Aldrich MC, Selvin S, Hansen HM, Barcellos LF, Wrensch MR, Sison JD, Quesenberry CP, Kittles RA, Silva G, Buffler PA, Seldin MF, Wiencke JK. Comparison of statistical methods for estimating genetic admixture in a lung cancer study of African Americans and Latinos. Am J Epidemiol 2008; 168:1035-46. [PMID: 18791191 DOI: 10.1093/aje/kwn224] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
A variety of methods are available for estimating genetic admixture proportions in populations; however, few investigators have conducted detailed comparisons using empirical data. The authors characterized admixture proportions among self-identified African Americans (n = 535) and Latinos (n = 412) living in the San Francisco Bay Area who participated in a lung cancer case-control study (1998-2003). Individual estimates of genetic ancestry based on 184 informative markers were obtained from a Bayesian approach and 2 maximum likelihood approaches and were compared using descriptive statistics, Pearson correlation coefficients, and Bland-Altman plots. Case-control differences in individual admixture proportions were assessed using 2-sample t tests and logistic regression analysis. Results indicated that Bayesian and frequentist approaches to estimating admixture provide similar estimates and inferences. No difference was observed in admixture proportions between African-American cases and controls, but Latino cases and controls significantly differed according to Amerindian and European genetic ancestry. Differences in admixture proportions between Latino cases and controls were not unexpected, since cases were more likely to have been born in the United States. Genetic admixture proportions provide a quantitative measure of ancestry differences among Latinos that can be used in analyses of genetic risk factors.
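A Bland-Altman comparison of two admixture estimators reduces to two quantities, usually plotted against the pairwise means. The first is the mean of the paired differences (the bias). The second is the 95% limits of agreement, bias ± 1.96 SD of the differences. Below is a minimal sketch of these statistics and of the Pearson correlation; the paired estimates in the example are hypothetical.

```python
import math
import statistics


def bland_altman(x, y):
    """Bias and 95% limits of agreement between paired estimates x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)          # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical individual admixture estimates from two methods.
bayesian = [0.78, 0.82, 0.65, 0.90, 0.71]
max_lik = [0.76, 0.83, 0.66, 0.88, 0.70]
bias, lower, upper = bland_altman(bayesian, max_lik)
r = pearson(bayesian, max_lik)
```

A high correlation alone does not show agreement. Two methods can be perfectly correlated yet differ by a constant offset, which the Bland-Altman bias reveals.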
Affiliation(s)
- Melinda C Aldrich
- University of California, San Francisco, Box 2911 Rock Hall, Mission Bay 582, 1550 4th Street, San Francisco, CA 94143-2911, USA.
26
NADIR ALVAREZ, NILS ARRIGO, CONSORTIUM INTRABIODIV. SIMIL: an R (CRAN) scripts collection for computing genetic structure similarities based on STRUCTURE 2 outputs. Mol Ecol Resour 2008; 8:757-62. [DOI: 10.1111/j.1755-0998.2007.02076.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
27
Santafé G, Lozano JA, Larrañaga P. Inference of population structure using genetic markers and a Bayesian model averaging approach for clustering. J Comput Biol 2008; 15:207-20. [PMID: 18312151 DOI: 10.1089/cmb.2007.0051] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
Abstract
The analysis of the structure of populations on the basis of genetic data is essential in population genetics. It is used, for instance, to study the evolution of species or to correct for population stratification in association studies. These genetic data, normally based on DNA polymorphisms, may contain irrelevant information that biases the inference of population structure. In this paper we adapt a recently proposed algorithm, named multistart EMA, to be used in the inference of population structure. This algorithm is able to deal with irrelevant information when obtaining the (probabilistic) population partition. Additionally, we present a marker selection test able to identify the markers most relevant for retrieving that population partition. The proposed algorithm is compared with the widely used STRUCTURE software on the basis of the F(ST) metric and the log-likelihood score. It is shown that the proposed algorithm improves the inference of population structure. Moreover, information about relevant markers obtained by the multistart EMA can be used to improve the results obtained by other methods, to correct for population stratification, or even to reduce the economic cost of sequencing new samples. The software presented in this paper is available online at http://www.sc.ehu.es/ccwbayes/members/guzman.
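F(ST), used above as an evaluation metric, can be computed in several ways, and the exact estimator used in the paper is not given here. One common textbook form for biallelic loci is Wright's ratio FST = (H_T − H_S)/H_T. Here H_S is the mean expected heterozygosity within subpopulations and H_T is the expected heterozygosity at the pooled allele frequency. The sketch below assumes equally weighted subpopulations and combines loci as a ratio of sums.

```python
def fst_locus(freqs):
    """Return (H_T - H_S, H_T) for one biallelic locus given subpopulation allele frequencies."""
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)   # mean within-subpopulation heterozygosity
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                              # heterozygosity of the pooled population
    return h_t - h_s, h_t


def fst(loci):
    """Multi-locus FST as a ratio of sums over loci; 0.0 if every locus is monomorphic."""
    num = den = 0.0
    for freqs in loci:
        n, d = fst_locus(freqs)
        num += n
        den += d
    return num / den if den > 0 else 0.0


# Allele frequencies in two subpopulations at three hypothetical loci.
loci = [[0.2, 0.4], [0.5, 0.5], [0.1, 0.3]]
value = fst(loci)
```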
Affiliation(s)
- Guzmán Santafé
- Computer Science and Artificial Intelligence Department, University of the Basque Country, San Sebastian, Spain.
28
Tiwari HK, Barnholtz-Sloan J, Wineinger N, Padilla MA, Vaughan LK, Allison DB. Review and evaluation of methods correcting for population stratification with a focus on underlying statistical principles. Hum Hered 2008; 66:67-86. [PMID: 18382087 PMCID: PMC2803696 DOI: 10.1159/000119107] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Indexed: 01/06/2023]
Abstract
When two or more populations have been separated by geographic or cultural boundaries for many generations, drift, spontaneous mutations, differential selection pressures and other factors may lead to allele frequency differences among populations. If these 'parental' populations subsequently come together and begin inter-mating, disequilibrium among linked markers may span a greater genetic distance than it typically does among populations under panmixia [see glossary]. This extended disequilibrium can make association studies highly effective and more economical than disequilibrium mapping in panmictic populations, since fewer marker loci are needed to detect regions of the genome that harbor phenotype-influencing loci. However, under some circumstances, this process of intermating (as well as other processes) can produce disequilibrium between pairs of unlinked loci and thus create the possibility of confounding or spurious associations due to this population stratification. Accordingly, researchers are advised to employ valid statistical tests for linkage disequilibrium mapping that allow genetic association studies to control for such confounding. Many recent papers have addressed this need. We provide a comprehensive review of advances made in recent years in correcting for population stratification and then evaluate and synthesize these methods based on statistical principles such as (1) randomization, (2) conditioning on sufficient statistics, and (3) identifying whether the method is based on testing the genotype-phenotype covariance (conditional upon familial information) and/or testing departures of the marginal distribution from the expected genotypic frequencies.
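The mechanism behind admixture-generated disequilibrium can be made concrete. Suppose a population is formed by mixing a proportion m of population 1 with 1 − m of population 2, each in linkage equilibrium. The disequilibrium between alleles with frequencies p1, p2 and q1, q2 is then D = m(1 − m)(p1 − p2)(q1 − q2), even for unlinked loci. Under random mating it decays as D_t = (1 − r)^t D_0. Unlinked loci (r = 0.5) therefore lose half of it each generation, while tightly linked loci retain it much longer. The short sketch below checks the formula against haplotype frequencies. It assumes discrete generations and an infinitely large, randomly mating population, and the frequencies in the example are hypothetical.

```python
def admixture_ld(m, p1, p2, q1, q2):
    """D between allele A (freqs p1, p2) and allele B (freqs q1, q2) right after
    mixing a proportion m of population 1 with 1 - m of population 2, assuming
    each source population is itself in linkage equilibrium."""
    hap_ab = m * p1 * q1 + (1 - m) * p2 * q2   # frequency of the A-B haplotype
    p = m * p1 + (1 - m) * p2                  # frequency of A in the mixture
    q = m * q1 + (1 - m) * q2                  # frequency of B in the mixture
    return hap_ab - p * q                      # equals m(1-m)(p1-p2)(q1-q2)


def ld_after(d0, r, generations):
    """Expected D after t generations of random mating: D_t = (1 - r)^t D_0."""
    return d0 * (1 - r) ** generations


d0 = admixture_ld(0.5, 0.9, 0.1, 0.8, 0.2)   # 0.5 * 0.5 * 0.8 * 0.6 = 0.12
unlinked = ld_after(d0, 0.5, 3)              # 0.12 / 8 = 0.015
linked = ld_after(d0, 0.01, 3)               # most of D0 remains
```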
Affiliation(s)
- Hemant K Tiwari
- Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, AL 35294, USA.
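The point above, that mixing populations with different allele frequencies can produce disequilibrium even between unlinked loci, can be made concrete with a short calculation. This is a minimal sketch under stated assumptions, not a method from the review: two parental populations, each assumed to be in linkage equilibrium, contribute fractions m and 1 - m of the gene pool, and D = p_AB - p_A * p_B is computed for the pooled gametes. Subsequent random mating is assumed to erode D by the standard factor (1 - r) per generation, with r = 0.5 for unlinked loci. The function names are hypothetical.

```python
def admixture_ld(m, pa1, pa2, pb1, pb2):
    """Disequilibrium D between loci A and B in a pooled gene pool.

    m is the fraction of gametes from population 1; pa1/pa2 and pb1/pb2
    are the frequencies of alleles A and B in populations 1 and 2. Each
    parental population is assumed to be in linkage equilibrium, so its
    AB haplotype frequency is simply pa * pb.
    """
    p_ab = m * pa1 * pb1 + (1 - m) * pa2 * pb2  # AB haplotype frequency after pooling
    p_a = m * pa1 + (1 - m) * pa2               # pooled frequency of allele A
    p_b = m * pb1 + (1 - m) * pb2               # pooled frequency of allele B
    # Algebraically this equals m * (1 - m) * (pa1 - pa2) * (pb1 - pb2).
    return p_ab - p_a * p_b


def ld_after_random_mating(d0, r, generations):
    """Decay of D under random mating with recombination fraction r."""
    return d0 * (1 - r) ** generations


if __name__ == "__main__":
    # Hypothetical allele frequencies for two diverged parental populations.
    d0 = admixture_ld(0.5, 0.9, 0.1, 0.8, 0.2)
    print(f"D in the pooled sample: {d0:.3f}")
    for t in (1, 2, 5):
        print(t, ld_after_random_mating(d0, 0.5, t))
```

In the example, D starts at 0.25 * 0.8 * 0.6 = 0.12 and halves every generation for unlinked loci; for tightly linked loci (small r) it decays far more slowly, which is why admixture-generated disequilibrium can span long genetic distances.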
29
Alvarez N, Arrigo N, IntraBioDiv Consortium. SIMIL: an R (CRAN) scripts collection for computing genetic structure similarities based on STRUCTURE 2 outputs. Mol Ecol Resour 2008. [DOI: 10.1111/j.1471-8286.2007.02076.x]
30
Gao X, Starmer JD. AWclust: point-and-click software for non-parametric population structure analysis. BMC Bioinformatics 2008; 9:77. [PMID: 18237431 PMCID: PMC2253519 DOI: 10.1186/1471-2105-9-77] [Received: 08/14/2007] [Accepted: 01/31/2008]
Abstract
BACKGROUND Population structure analysis is important to genetic association studies and evolutionary investigations. Parametric approaches, e.g. STRUCTURE and L-POP, usually assume Hardy-Weinberg equilibrium (HWE) and linkage equilibrium among loci in sampled individuals. However, these assumptions may not hold, and allele frequency estimates may be inaccurate in some data sets. The improved version of STRUCTURE (version 2.1) can incorporate linkage information among loci but is still sensitive to high background linkage disequilibrium. Large-scale single nucleotide polymorphism (SNP) data are now increasingly common in genetic studies. It is therefore imperative to have software that makes full use of these data to draw inferences even when model assumptions do not hold or allele frequency estimates are highly variable. RESULTS We have developed point-and-click software for non-parametric population structure analysis, distributed as an R package. The software takes advantage of the large number of available SNPs to group individuals into ethnically similar clusters, and it requires no assumptions about population models. Nor does it estimate allele frequencies. Moreover, the software can infer the optimal number of populations. CONCLUSION Our software tool uses non-parametric approaches to assign individuals to clusters using SNPs. It provides efficient computation and an intuitive way for researchers to explore ethnic relationships among individuals. It can complement parametric approaches in population structure analysis.
Affiliation(s)
- Xiaoyi Gao
- Miami Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, FL 33136, USA.
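The abstract does not spell out AWclust's algorithm, so the following is only a sketch of the general non-parametric idea it describes: cluster individuals directly from SNP genotypes without assuming HWE or estimating allele frequencies. It assumes biallelic SNPs coded 0/1/2, an allele-sharing (identical-by-state) distance, and plain average-linkage (UPGMA) hierarchical clustering into a fixed number of groups; choosing the number of populations, which AWclust also does, is not covered. All names and the toy data are hypothetical.

```python
from itertools import combinations


def allele_sharing_distance(g1, g2):
    """Allele-sharing distance for biallelic SNPs coded 0/1/2.

    |g1 - g2| counts the alleles (0, 1 or 2) that two individuals do not
    share identical-by-state at a locus; dividing by 2 * n_loci scales
    the distance to [0, 1].
    """
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def average_linkage_clusters(genotypes, k):
    """Agglomerative average-linkage clustering of individuals into k groups."""
    n = len(genotypes)
    dist = {
        (i, j): allele_sharing_distance(genotypes[i], genotypes[j])
        for i, j in combinations(range(n), 2)
    }

    def d(i, j):
        return dist[(i, j) if i < j else (j, i)]

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters.
            avg = sum(d(i, j) for i in clusters[a] for j in clusters[b]) / (
                len(clusters[a]) * len(clusters[b])
            )
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair of clusters
        del clusters[b]                  # b > a, so index a stays valid
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: six individuals, five SNPs, two obvious groups.
TOY_GENOTYPES = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [2, 2, 1, 2, 2],
    [2, 1, 2, 2, 2],
    [2, 2, 2, 2, 1],
]

if __name__ == "__main__":
    print(average_linkage_clusters(TOY_GENOTYPES, 2))  # [[0, 1, 2], [3, 4, 5]]
```

Average linkage is used here only because it is simple and well defined for arbitrary distances; other linkage rules or partitioning methods could be swapped in.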
31
Abstract
An association study can be used to investigate how individuals with unique genetic variants respond to a drug treatment. In an association study, individuals may come from different ethnic groups or from an admixed population. Heterogeneity of genetic background among individuals in association studies may lead to false-positive or false-negative results. Confounding caused by population structure and recent admixture may be a major factor contributing to the lack of replication of association study results. Confounding can be detected and adjusted for. The major methods that adjust for population stratification are described and explained in this chapter, and their advantages and disadvantages are discussed.
Affiliation(s)
- Donglei Hu
- Institute for Human Genetics, Comprehensive Cancer Center, Department of Medicine, University of California San Francisco, San Francisco, California, USA
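As one illustration of detecting and adjusting for stratification, the sketch below implements genomic control, a widely used adjustment; whether and how the chapter presents it is not stated in the abstract. It assumes 1-degree-of-freedom chi-square association statistics from many markers, estimates an inflation factor lambda as their median divided by the median of a chi-square(1) distribution, and divides every statistic by lambda. The function names and example statistics are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom: if Z ~ N(0, 1),
# then P(Z**2 <= c) = 0.5 exactly when c = (Phi^-1(0.75))**2, about 0.455.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(chi2_stats):
    """Genomic-control inflation factor lambda from 1-df association statistics."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide each statistic by lambda.

    lambda is floored at 1 so that a scan that looks deflated is left
    unchanged rather than having its statistics inflated.
    """
    lam = max(1.0, inflation_factor(chi2_stats))
    return [s / lam for s in chi2_stats], lam


if __name__ == "__main__":
    # Hypothetical 1-df statistics from a genome scan with stratification.
    stats = [0.2, 0.9, 1.1, 1.4, 0.7, 2.5, 0.6, 12.0]
    adjusted, lam = genomic_control(stats)
    print(f"lambda = {lam:.2f}")
    print([round(s, 2) for s in adjusted])
```

Genomic control assumes that stratification inflates all markers by roughly the same factor; markers with unusually large frequency differences between subpopulations may remain over- or under-corrected, which is one reason other adjustment methods exist.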
32
Abstract
Recently, the amplified fragment length polymorphism (AFLP) technique has gained considerable popularity and is now frequently applied to a wide variety of organisms. The technical specifics of the AFLP procedure have been well documented over the years, but information about the statistical analysis of AFLPs is, by contrast, scarce and scattered. In this review, we describe the various methods available to handle AFLP data, focusing on four research topics at the population or individual level of analysis: (i) assessment of genetic diversity; (ii) identification of population structure; (iii) identification of hybrid individuals; and (iv) detection of markers associated with phenotypes. Two kinds of analysis methods can be distinguished, depending on whether they are based on the direct study of band presence or absence in AFLP profiles ('band-based' methods) or on allele frequencies estimated at each locus from these profiles ('allele frequency-based' methods). We investigate the characteristics and limitations of these statistical tools; finally, we call for wider adoption of methodologies borrowed from other research fields, such as those designed specifically for binary data.
Affiliation(s)
- A Bonin
- Diversity Arrays Technology P/L, Yarralumla, ACT 2600, Australia
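The distinction between 'band-based' and 'allele frequency-based' methods can be illustrated with one small example of each. This sketch is not taken from the review: it assumes profiles coded 1 (band present) and 0 (band absent), uses the Jaccard coefficient as the band-based similarity, and uses the simple square-root estimator as the allele frequency-based step, which assumes Hardy-Weinberg equilibrium and treats band absence as homozygosity for a null allele. Function names and data are hypothetical.

```python
from math import sqrt


def jaccard_similarity(profile1, profile2):
    """Band-based similarity between two 0/1 AFLP profiles.

    Shared band absences are ignored, because a band missing in both
    individuals need not reflect a shared allele.
    """
    shared = sum(1 for a, b in zip(profile1, profile2) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile1, profile2) if a == 1 or b == 1)
    if either == 0:
        raise ValueError("Jaccard similarity is undefined for two empty profiles")
    return shared / either


def null_allele_frequency(profiles, locus):
    """Allele frequency-based estimate for one dominant AFLP locus.

    Under Hardy-Weinberg equilibrium, band absence means the individual is
    homozygous for the null allele, so its frequency q satisfies
    q**2 = proportion of individuals lacking the band.
    """
    absent = sum(1 for p in profiles if p[locus] == 0)
    return sqrt(absent / len(profiles))


if __name__ == "__main__":
    # Hypothetical presence (1) / absence (0) profiles: 4 individuals x 3 loci.
    profiles = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 0, 0]]
    print(round(jaccard_similarity(profiles[0], profiles[1]), 3))
    print([round(null_allele_frequency(profiles, k), 3) for k in range(3)])
```

The band-based measure uses the data exactly as scored, while the frequency-based estimate depends on the HWE assumption, a trade-off of the kind the review examines.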
33
Abstract
Many studies in the fields of genetic epidemiology and applied population genetics are predicated on, or require, an assessment of the genetic background diversity of the individuals chosen for study. A number of strategies have been developed for assessing genetic background diversity. These strategies typically focus on genotype data collected on the individuals in the study, based on a panel of DNA markers. However, many of these strategies are either rooted in cluster analysis techniques, and hence suffer from the problems inherent in assigning biological and statistical meaning to the resulting clusters, or have formulations that do not permit easy and intuitive extensions. We describe a very general approach to the problem of assessing genetic background diversity that extends the analysis of molecular variance (AMOVA) strategy introduced by Excoffier and colleagues in 1992. As in the original AMOVA strategy, the proposed approach, termed generalized AMOVA (GAMOVA), requires a genetic similarity matrix constructed from the allelic profiles of the individuals under study and/or allele frequency summaries of the populations from which the individuals have been sampled. The proposed strategy can be used either to estimate the fraction of genetic variation explained by grouping factors such as country of origin, race, or ethnicity, or to quantify the strength of the relationship between the observed genetic background variation and quantitative measures collected on the subjects, such as blood pressure levels or anthropometric measures. Since the formulation of our test statistic is rooted in multivariate linear models, sets of variables can be related to genetic background in multiple regression-like contexts. GAMOVA can also be used to complement graphical representations of genetic diversity such as tree diagrams (dendrograms) or heatmaps. We examine the features, advantages, and power of the proposed procedure and showcase its flexibility by using it to analyze a wide variety of published data sets, including data from the Human Genome Diversity Project, classical anthropometry data collected by Howells, and the International HapMap Project.
Humans exhibit great genetic diversity. Understanding the factors that contribute to and sustain this diversity is an important research area. Not only can such understanding shed light on human origins, but it can also assist in the discovery of genes and genetic factors that contribute to debilitating diseases. Statistical methods that can facilitate the identification of factors contributing to, or associated with, human genetic diversity are growing in number as new high-throughput molecular genetic assays and technologies are developed. We consider an analysis method termed generalized analysis of molecular variance (GAMOVA), which builds on previously proposed methods for testing hypotheses about the factors associated with genetic background diversity. We apply the method in a wide variety of settings and show that it is both flexible and powerful. GAMOVA has great potential to assist in population-based human genetic studies, as it can be used to address questions such as: Is a sample of affected cases and unaffected controls from a homogeneous population, or is there evidence of heterogeneity that could affect the results of an association study? Is there reason to believe that the ancestry of a set of individuals influences the traits that they have?
Affiliation(s)
- Caroline M Nievergelt
- Department of Psychiatry, University of California at San Diego, La Jolla, California, United States of America
- Rebecca and John Moores UCSD Cancer Center, University of California at San Diego, La Jolla, California, United States of America
- The Center for Human Genetics and Genomics, University of California at San Diego, La Jolla, California, United States of America
- The Stein Institute for Research on Aging, University of California at San Diego, La Jolla, California, United States of America
- Ondrej Libiger
- Department of Psychiatry, University of California at San Diego, La Jolla, California, United States of America
- Rebecca and John Moores UCSD Cancer Center, University of California at San Diego, La Jolla, California, United States of America
- The Center for Human Genetics and Genomics, University of California at San Diego, La Jolla, California, United States of America
- Nicholas J Schork
- Department of Psychiatry, University of California at San Diego, La Jolla, California, United States of America
- Department of Family and Preventive Medicine, University of California at San Diego, La Jolla, California, United States of America
- Rebecca and John Moores UCSD Cancer Center, University of California at San Diego, La Jolla, California, United States of America
- The Center for Human Genetics and Genomics, University of California at San Diego, La Jolla, California, United States of America
- The Stein Institute for Research on Aging, University of California at San Diego, La Jolla, California, United States of America
- Scripps Genomic Medicine and Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
- * To whom correspondence should be addressed. E-mail:
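GAMOVA is built on multivariate linear models applied to a genetic similarity matrix. As a deliberately simplified illustration of the AMOVA logic it extends, the sketch below handles only a single grouping factor: it partitions squared pairwise distances into among-group and within-group sums of squares, forms a pseudo-F ratio, and obtains a permutation p-value by shuffling group labels. This is not the authors' GAMOVA implementation; the toy one-dimensional "genetic positions", the seed, and the function names are hypothetical, and any pairwise distance matrix (for example, one derived from allele sharing) could be substituted.

```python
import random
from itertools import combinations


def pseudo_f(dist, groups):
    """One-way distance-based pseudo-F from a symmetric distance matrix.

    SS_total uses squared distances over all pairs divided by n;
    SS_within sums, for each group, its within-group squared distances
    divided by that group's size; SS_among is the difference.
    """
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permutation_p_value(dist, groups, n_perm=999, seed=1):
    """Observed pseudo-F and the share of label permutations at least as large."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: ten individuals on one genetic axis, two groups.
POSITIONS = [0.0, 0.1, 0.2, 0.15, 0.05, 1.0, 1.1, 0.9, 1.05, 0.95]
DIST = [[abs(x - y) for y in POSITIONS] for x in POSITIONS]
GROUPS = ["A"] * 5 + ["B"] * 5

if __name__ == "__main__":
    f_obs, p = permutation_p_value(DIST, GROUPS)
    print(f"pseudo-F = {f_obs:.1f}, permutation p = {p:.3f}")
```

For one-dimensional Euclidean distances this pseudo-F reduces to the classical one-way ANOVA F (324 for the toy data); with genetic distances, significance is better judged from the permutation distribution than from F tables, because the usual distributional assumptions need not hold.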