1
|
Šorgić D, Stefanović A, Keckarević D, Popović M. XGBoost as a reliable machine learning tool for predicting ancestry using autosomal STR profiles - Proof of method. Forensic Sci Int Genet 2025; 76:103183. [PMID: 39637759 DOI: 10.1016/j.fsigen.2024.103183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 10/28/2024] [Accepted: 11/25/2024] [Indexed: 12/07/2024]
Abstract
The aim of this study was to test the validity of a predictive model of ancestry affiliation based on Short Tandem Repeat (STR) profiles. Frequencies of 29 genetic markers from the Promega website for four distinct population groups (African Americans, Asians, Caucasians, Hispanic Americans) were used to generate 360,000 profiles (90,000 profiles per group), which were then used to train and test a range of machine learning algorithms with the goal of identifying the best model for accurate ancestry prediction. The chosen models (Decision Trees, Support Vector Machines, XGBoost, among others) were implemented in Python, and their performance was compared. The XGBoost model outperformed the others, reaching an accuracy of 94.24% for all four classes, 99.06% on a differentiation task involving the Asian, African American, and Caucasian subsamples, and 98.57% when differentiating between the African American group, the Asian group, and a mixed group combining Caucasians and Hispanics. Evaluating the impact of training set size revealed that model accuracy peaked at 94% with 90,000 profiles per category but decreased to 83% when the number of profiles per category was reduced to 500, particularly affecting precision when distinguishing between the Caucasian and Hispanic subgroups. The study further investigated the impact of marker quantity on model accuracy, finding that the use of 21 markers, commonly available in commercial amplification kits, resulted in an accuracy of 96.3% for African Americans, Asians, and Caucasians, and 88.28% for all four groups combined.
These findings underscore the potential of STR-based models in forensic analysis and hint at the broader applicability of machine learning in genetic ancestry determination, with implications for enhancing the precision and reliability of forensic investigations, particularly in heterogeneous environments where ancestral background can be a crucial piece of information.
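The profile-generation step described in this abstract can be illustrated with a minimal sketch. It assumes that each profile is built by drawing two alleles per marker independently from a group's allele frequencies; the abstract does not spell out the exact procedure. The frequency values, group names, and marker names below are invented for illustration and are not the Promega data. A simple maximum-likelihood assignment stands in for the classifier; it is not XGBoost and is not the authors' model.

```python
import math
import random

# Hypothetical allele frequencies for two markers in two groups.
# These numbers are invented for illustration; they are not the Promega data.
FREQS = {
    "group_A": {
        "marker_1": {10: 0.7, 11: 0.2, 12: 0.1},
        "marker_2": {14: 0.6, 15: 0.3, 16: 0.1},
    },
    "group_B": {
        "marker_1": {10: 0.1, 11: 0.3, 12: 0.6},
        "marker_2": {14: 0.1, 15: 0.2, 16: 0.7},
    },
}


def simulate_profile(group, rng):
    """Draw two alleles per marker independently from the group's frequencies."""
    profile = {}
    for marker, table in FREQS[group].items():
        alleles = rng.choices(list(table), weights=list(table.values()), k=2)
        profile[marker] = tuple(sorted(alleles))
    return profile


def log_likelihood(profile, group):
    """Log-probability of a profile assuming Hardy-Weinberg proportions."""
    total = 0.0
    for marker, (a, b) in profile.items():
        table = FREQS[group][marker]
        # Alleles unseen in a group get a small floor instead of zero.
        p, q = table.get(a, 1e-6), table.get(b, 1e-6)
        # Homozygote: p^2; heterozygote: 2pq.
        total += math.log(p * q if a == b else 2 * p * q)
    return total


def classify(profile):
    """Assign the profile to the group with the highest likelihood."""
    return max(FREQS, key=lambda g: log_likelihood(profile, g))


rng = random.Random(0)  # fixed seed so the simulation is repeatable
profiles = [(g, simulate_profile(g, rng)) for g in FREQS for _ in range(1000)]
accuracy = sum(classify(p) == g for g, p in profiles) / len(profiles)
print(f"accuracy on simulated profiles: {accuracy:.3f}")
```

With only two markers the groups overlap considerably, which is one way to see why the number of markers and the similarity of allele frequencies between groups both limit achievable accuracy.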
|
2
|
Yang HC, Chen CW, Lin YT, Chu SK. Genetic ancestry plays a central role in population pharmacogenomics. Commun Biol 2021; 4:171. [PMID: 33547344 PMCID: PMC7864978 DOI: 10.1038/s42003-021-01681-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 10/27/2019] [Accepted: 01/06/2021] [Indexed: 12/12/2022]
Abstract
Recent studies have pointed out the essential role of genetic ancestry in population pharmacogenetics. In this study, we analyzed the whole-genome sequencing data from The 1000 Genomes Project (Phase 3) and the pharmacogenetic information from Drug Bank, PharmGKB, PharmaADME, and Biotransformation. Here we show that ancestry-informative markers are enriched in pharmacogenetic loci, suggesting that trans-ancestry differentiation must be carefully considered in population pharmacogenetics studies. Ancestry-informative pharmacogenetic loci are located in both protein-coding and non-protein-coding regions, illustrating that a whole-genome analysis is necessary for an unbiased examination of pharmacogenetic loci. Finally, those ancestry-informative pharmacogenetic loci that target multiple drugs are often functional variants, which reflects their importance in biological functions and pathways. In summary, we develop an efficient algorithm for an ultrahigh-dimensional principal component analysis. We create genetic catalogs of ancestry-informative markers and genes. We explore pharmacogenetic patterns and establish a high-accuracy prediction panel of genetic ancestry. Moreover, we construct a genetic ancestry pharmacogenomic database, Genetic Ancestry PhD (http://hcyang.stat.sinica.edu.tw/databases/genetic_ancestry_phd/). Hsin-Chou Yang et al. examine population structure in several genomic databases and identify that pharmacogenetic loci are enriched for markers of genetic ancestry. Their results suggest that genetic ancestry must be carefully considered in population pharmacogenetics studies.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan; Institute of Statistics, National Cheng Kung University, Tainan, Taiwan; Institute of Public Health, National Yang-Ming University, Taipei, Taiwan.
| | - Chia-Wei Chen
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Yu-Ting Lin
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Shih-Kai Chu
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| |
|
3
|
Verdugo RA, Di Genova A, Herrera L, Moraga M, Acuña M, Berríos S, Llop E, Valenzuela CY, Bustamante ML, Digman D, Symon A, Asenjo S, López P, Blanco A, Suazo J, Barozet E, Caba F, Villalón M, Alvarado S, Cáceres D, Salgado K, Portales P, Moreno-Estrada A, Gignoux CR, Sandoval K, Bustamante CD, Eng C, Huntsman S, Burchard EG, Loira N, Maass A, Cifuentes L. Development of a small panel of SNPs to infer ancestry in Chileans that distinguishes Aymara and Mapuche components. Biol Res 2020; 53:15. [PMID: 32299502 PMCID: PMC7161194 DOI: 10.1186/s40659-020-00284-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/26/2019] [Accepted: 04/09/2020] [Indexed: 12/30/2022]
Abstract
Background: Current South American populations trace their origins mainly to three continental ancestries, i.e. European, Amerindian and African. Individual variation in relative proportions of each of these ancestries may be confounded with socio-economic factors due to population stratification. Therefore, ancestry is a potential confounder variable that should be considered in epidemiologic studies and in public health plans. However, there are few studies that have assessed the ancestry of the current admixed Chilean population. This is partly due to the high cost of genome-scale technologies commonly used to estimate ancestry. In this study we have designed a small panel of SNPs to accurately assess ancestry in the largest sampling to date of the Chilean mestizo population (n = 3349) from eight cities. Our panel is also able to distinguish between the two main Amerindian components of Chileans: Aymara from the north and Mapuche from the south. Results: A panel of 150 ancestry-informative markers (AIMs) of SNP type was selected to maximize ancestry informativeness and genome coverage. Of these, 147 were successfully genotyped by KASPar assays in 2843 samples, with an average missing rate of 0.012, and a 0.95 concordance with microarray data. The ancestries estimated with the panel of AIMs had relatively high correlations (0.88 for European, 0.91 for Amerindian, 0.70 for Aymara, and 0.68 for Mapuche components) with those obtained with the AXIOM LAT1 array. The country’s average ancestry was 0.53 ± 0.14 European, 0.04 ± 0.04 African, and 0.42 ± 0.14 Amerindian, disaggregated into 0.18 ± 0.15 Aymara and 0.25 ± 0.13 Mapuche. However, Mapuche ancestry was highest in the south (40.03%) and Aymara in the north (35.61%), as expected from the historical location of these ethnic groups. We make our results available through an online app and demonstrate how it can be used to adjust for ancestry when testing association between incidence of a disease and nongenetic risk factors.
Conclusions: We have conducted the most extensive sampling, across many different cities, of the current Chilean population. Ancestry varied significantly by latitude and human development. The panel of AIMs is available to the community for estimating ancestry at low cost in Chileans and other populations with similar ancestry.
Affiliation(s)
- Ricardo A Verdugo
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile; Departamento de Oncología Básico Clínica, Facultad de Medicina, Universidad de Chile, Santiago, Chile
| | - Alex Di Genova
- Mathomics, Centro de Modelamiento Matemático y Centro para la Regulación del Genoma, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
| | - Luisa Herrera
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile
| | - Mauricio Moraga
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile
| | - Mónica Acuña
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile
| | - Soledad Berríos
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile
| | - Elena Llop
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile
| | - Carlos Y Valenzuela
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile
| | - M Leonor Bustamante
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile; Departamento de Psiquiatría y Salud Mental Norte, Facultad de Medicina, Universidad de Chile, Santiago, Chile
| | - Dayhana Digman
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile
| | - Adriana Symon
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile
| | - Soledad Asenjo
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile
| | - Pamela López
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile
| | - Alejandro Blanco
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile
| | - José Suazo
- Instituto de Investigación en Ciencias Odontológicas, Facultad de Odontología, Universidad de Chile, Santiago, Chile
| | - Emmanuelle Barozet
- Departamento de Sociología, Facultad de Ciencias Sociales, Universidad de Chile, Centro de Estudios de Conflicto y Cohesión Social, Santiago, Chile
| | - Fresia Caba
- Facultad de Ciencias de la Salud, Universidad de Tarapacá, Arica, Chile
| | - Marcelo Villalón
- Instituto de Salud Poblacional "Escuela de Salud Pública", Universidad de Chile, Santiago, Chile
| | - Sergio Alvarado
- Instituto de Salud Poblacional "Escuela de Salud Pública", Universidad de Chile, Santiago, Chile
| | - Dante Cáceres
- Instituto de Salud Poblacional "Escuela de Salud Pública", Universidad de Chile, Santiago, Chile
| | - Katherine Salgado
- Facultad de Ciencias de la Salud, Universidad de Tarapacá, Arica, Chile
| | - Pilar Portales
- Corporación Municipal de Desarrollo Social, Iquique, Chile
| | - Andrés Moreno-Estrada
- National Laboratory of Genomics for Biodiversity (LANGEBIO), CINVESTAV, Irapuato, Guanajuato, 36821, Mexico
| | | | - Karla Sandoval
- National Laboratory of Genomics for Biodiversity (LANGEBIO), CINVESTAV, Irapuato, Guanajuato, 36821, Mexico
| | | | - Celeste Eng
- Department of Medicine, University of California, San Francisco, CA, USA
| | - Scott Huntsman
- Department of Medicine, University of California, San Francisco, CA, USA
| | - Esteban G Burchard
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
| | - Nicolás Loira
- Mathomics, Centro de Modelamiento Matemático y Centro para la Regulación del Genoma, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
| | - Alejandro Maass
- Mathomics, Centro de Modelamiento Matemático y Centro para la Regulación del Genoma, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile; Departamento de Ingeniería Matemática, Facultad de Ciencias Físicas y Matemáticas, Universidad de Chile, Santiago, Chile
| | - Lucía Cifuentes
- Programa de Genética Humana del ICBM, Facultad de Medicina, Universidad de Chile, Independencia 1027, Santiago, Chile.
| |
|
4
|
Ancestry informative markers (AIMs) for Korean and other East Asian and South East Asian populations. Int J Legal Med 2019; 133:1711-1719. [DOI: 10.1007/s00414-019-02129-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/03/2019] [Accepted: 07/26/2019] [Indexed: 01/28/2023]
|
5
|
The three-hybrid genetic composition of an Ecuadorian population using AIMs-InDels compared with autosomes, mitochondrial DNA and Y chromosome data. Sci Rep 2019; 9:9247. [PMID: 31239502 PMCID: PMC6592923 DOI: 10.1038/s41598-019-45723-w] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 01/22/2019] [Accepted: 06/04/2019] [Indexed: 11/08/2022]
Abstract
The history of Ecuador was marked by the arrival of Europeans with Africans, resulting in the mixture of Native Americans with Africans and Europeans. The present study contributes to the knowledge of the Ecuadorian mestizo population by offering information about ancestry and ethnic heterogeneity. Forty-six AIM-InDels (Ancestry Informative Insertion/Deletion Markers) were used to obtain information on 240 Ecuadorian individuals from three regions (Amazonia, the Highlands, and the Coast). As a result, the population involved a significant contribution from Native Americans (values up to 51%), followed by Europeans (values up to 33%) and Africans (values up to 13%). Furthermore, we compared the data obtained with nine previously reported scientific articles on autosomal, mitochondrial DNA and Y chromosomes. The admixture results correspond to Ecuador's historical background and vary slightly between regions.
|
6
|
Moriot A, Santos C, Freire-Aradas A, Phillips C, Hall D. Inferring biogeographic ancestry with compound markers of slow and fast evolving polymorphisms. Eur J Hum Genet 2018; 26:1697-1707. [PMID: 29995845 PMCID: PMC6189140 DOI: 10.1038/s41431-018-0215-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/01/2017] [Revised: 04/23/2018] [Accepted: 06/12/2018] [Indexed: 11/09/2022]
Abstract
Biogeographic ancestry is an area of considerable interest in medical genetics, anthropology and forensics. Although genome-wide panels are ideal as they provide dense genotyping data, small sets of ancestry-informative markers provide a cost-effective way to investigate genetic ancestry and population structure. Here, we investigate the performance of a reduced marker set that combines different types of autosomal markers through haplotype analysis. In particular, recently described DIP-STR markers should offer the advantage of comprising both low mutation rate Indels (DIPs), to study human history over a longer time scale, and high mutation rate STRs, to trace relatively recent demographic events. In this study, we assessed the ability of an initial set of 23 DIP-STRs to distinguish major population groups using the HGDP-CEPH reference samples. The results obtained applying the STRUCTURE algorithm show that the discrimination capacity of the DIP-STRs is comparable to currently used small-scale ancestry informative markers by approaching seven major demographic groups. Yet, the DIP-STRs show an improved success rate in assigning individuals to populations of Europe and the Middle East. These data show a remarkable ability of a preliminary set of 23 DIP-STR markers to infer major biogeographic origins. A novel set of DIP-STRs preselected to contain ancestry information should lead to further improvements.
Affiliation(s)
- Amandine Moriot
- Unité de Génétique Forensique, Centre Universitaire Romand de Médecine Légale, Centre Hospitalier Universitaire Vaudois et Université de Lausanne, Lausanne, Switzerland
| | - Carla Santos
- Forensic Genetics Unit, Institute of Forensic Science, University of Santiago de Compostela, Santiago de Compostela, Spain
| | - Ana Freire-Aradas
- Forensic Genetics Unit, Institute of Forensic Science, University of Santiago de Compostela, Santiago de Compostela, Spain
| | - Christopher Phillips
- Forensic Genetics Unit, Institute of Forensic Science, University of Santiago de Compostela, Santiago de Compostela, Spain
| | - Diana Hall
- Unité de Génétique Forensique, Centre Universitaire Romand de Médecine Légale, Centre Hospitalier Universitaire Vaudois et Université de Lausanne, Lausanne, Switzerland.
| |
|
7
|
Park A, Kim J, Zaso MJ, Glatt SJ, Sher KJ, Scott-Sheldon LAJ, Eckert TL, Vanable PA, Carey KB, Ewart CK, Carey MP. The interaction between the dopamine receptor D4 (DRD4) variable number tandem repeat polymorphism and perceived peer drinking norms in adolescent alcohol use and misuse. Dev Psychopathol 2017; 29:173-183. [PMID: 26902782 PMCID: PMC4995157 DOI: 10.1017/s0954579416000080] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022]
Abstract
Peer drinking norms are arguably one of the strongest correlates of adolescent drinking. Prospective studies indicate that adolescents tend to select peers based on drinking (peer selection) and their peers' drinking is associated with changes in adolescent drinking over time (peer socialization). The present study investigated whether the peer selection and socialization processes in adolescent drinking differed as a function of the dopamine receptor D4 (DRD4) variable number tandem repeat genotype in two independent prospective data sets. The first sample was 174 high school students drawn from a two-wave 6-month prospective study. The second sample was 237 college students drawn from a three-wave annual prospective study. Multigroup cross-lagged panel analyses of the high school student sample indicated stronger socialization via peer drinking norms among carriers, whereas analyses of the college student sample indicated stronger drinking-based peer selection in the junior year among carriers, compared to noncarriers. Although replication and meta-analytic synthesis are needed, these findings suggest that in part genetically determined peer selection (carriers of the DRD4 seven-repeat allele tend to associate with peers who have more favorable attitudes toward drinking and greater alcohol use) and peer socialization (carriers' subsequent drinking behaviors are more strongly associated with their peer drinking norms) may differ across adolescent developmental stages.
|
8
|
GlobalFiler® Express DNA amplification kit in South Africa: Extracting the past from the present. Forensic Sci Int Genet 2016; 24:194-201. [DOI: 10.1016/j.fsigen.2016.07.007] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 10/12/2015] [Revised: 06/10/2016] [Accepted: 07/11/2016] [Indexed: 01/23/2023]
|
9
|
Phillips C, Santos C, Fondevila M, Carracedo Á, Lareu MV. Inference of Ancestry in Forensic Analysis I: Autosomal Ancestry-Informative Marker Sets. Methods Mol Biol 2016; 1420:233-53. [PMID: 27259744 DOI: 10.1007/978-1-4939-3597-0_18] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 05/05/2023]
Abstract
An expanding choice of ancestry-informative marker single nucleotide polymorphisms (AIM-SNPs) is becoming available for the forensic user in the form of sensitive SNaPshot-based tests or in alternative single-base extension genotyping systems (e.g., Sequenom iPLEX) that can be adapted for analysis with SNaPshot. In addition, alternative ancestry-informative variation: Indels and STRs can be analyzed using direct PCR-to-CE techniques that offer the possibility to detect mixed profiles. We review the current forensically viable AIM panels, their optimized PCR multiplexes, and the population differentiation power they offer. We also describe how improved population divergence balance can be achieved with the enlarged multiplex scales of next-generation sequencing approaches to enable analysis of admixed individuals without biased estimation of co-ancestry proportions.
Affiliation(s)
- Chris Phillips
- Forensic Genetics Unit, Luis Concheiro Institute of Forensic Sciences, Genomic Medicine Group, University of Santiago de Compostela, Galicia, 15782, Spain.
| | - Carla Santos
- Forensic Genetics Unit, Luis Concheiro Institute of Forensic Sciences, Genomic Medicine Group, University of Santiago de Compostela, Galicia, 15782, Spain
| | - Manuel Fondevila
- Forensic Genetics Unit, Luis Concheiro Institute of Forensic Sciences, Genomic Medicine Group, University of Santiago de Compostela, Galicia, 15782, Spain
| | - Ángel Carracedo
- Forensic Genetics Unit, Luis Concheiro Institute of Forensic Sciences, Genomic Medicine Group, University of Santiago de Compostela, Galicia, 15782, Spain
- Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Maria Victoria Lareu
- Forensic Genetics Unit, Luis Concheiro Institute of Forensic Sciences, Genomic Medicine Group, University of Santiago de Compostela, Galicia, 15782, Spain
| |
|
10
|
Forensic genetic analysis of bio-geographical ancestry. Forensic Sci Int Genet 2015; 18:49-65. [DOI: 10.1016/j.fsigen.2015.05.012] [Citation(s) in RCA: 151] [Impact Index Per Article: 15.1] [Received: 10/23/2014] [Revised: 05/02/2015] [Accepted: 05/14/2015] [Indexed: 01/20/2023]
|
11
|
Regan JF, Kamitaki N, Legler T, Cooper S, Klitgord N, Karlin-Neumann G, Wong C, Hodges S, Koehler R, Tzonev S, McCarroll SA. A rapid molecular approach for chromosomal phasing. PLoS One 2015; 10:e0118270. [PMID: 25739099 PMCID: PMC4349636 DOI: 10.1371/journal.pone.0118270] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 09/25/2014] [Accepted: 01/12/2015] [Indexed: 11/18/2022]
Abstract
Determining the chromosomal phase of pairs of sequence variants - the arrangement of specific alleles as haplotypes - is a routine challenge in molecular genetics. Here we describe Drop-Phase, a molecular method for quickly ascertaining the phase of pairs of DNA sequence variants (separated by 1-200 kb) without cloning or manual single-molecule dilution. In each Drop-Phase reaction, genomic DNA segments are isolated in tens of thousands of nanoliter-sized droplets together with allele-specific fluorescence probes, in a single reaction well. Physically linked alleles partition into the same droplets, revealing their chromosomal phase in the co-distribution of fluorophores across droplets. We demonstrate the accuracy of this method by phasing members of trios (revealing 100% concordance with inheritance information) and show a common clinical application by phasing CFTR alleles at genomic distances of 11-116 kb in the genomes of cystic fibrosis patients. Drop-Phase is rapid (requiring less than 4 hours), scalable (to hundreds of samples), and effective at long genomic distances (200 kb).
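The co-distribution idea in this abstract can be sketched with simple droplet counts. The example below is an illustrative simplification, not the published Drop-Phase analysis: it assumes that unlinked alleles land in droplets independently, so the chance-expected number of double-positive droplets is n_a * n_b / n_total, and it calls two alleles "cis" when the excess over that expectation is a large share of the rarer probe's droplets. The function names, the 0.5 threshold, and the counts are all hypothetical.

```python
def excess_double_positives(n_total, n_a, n_b, n_both):
    """Return (chance expectation, observed excess) of double-positive droplets.

    n_a and n_b count droplets positive for each probe (double positives
    included); n_both counts droplets positive for both probes.
    """
    expected_by_chance = n_a * n_b / n_total
    return expected_by_chance, n_both - expected_by_chance


def call_phase(n_total, n_a, n_b, n_both, min_linked_fraction=0.5):
    """Call 'cis' when co-occurrence far exceeds chance (illustrative threshold)."""
    if min(n_a, n_b) == 0:
        raise ValueError("each probe needs at least one positive droplet")
    _, excess = excess_double_positives(n_total, n_a, n_b, n_both)
    linked_fraction = excess / min(n_a, n_b)
    return "cis" if linked_fraction >= min_linked_fraction else "trans_or_unlinked"


# Hypothetical counts: 20,000 droplets, 2,000 positive for each probe.
print(call_phase(20000, 2000, 2000, 1500))  # many shared droplets
print(call_phase(20000, 2000, 2000, 210))   # shared droplets near chance level
```

A real analysis would also need to account for DNA fragmentation, which breaks linkage more often as the distance between variants grows.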
Affiliation(s)
- John F. Regan
- Digital Biology Center, Bio-Rad Laboratories, Pleasanton, California, United States of America
| | - Nolan Kamitaki
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Cambridge, Massachusetts, United States of America
| | - Tina Legler
- Digital Biology Center, Bio-Rad Laboratories, Pleasanton, California, United States of America
| | - Samantha Cooper
- Digital Biology Center, Bio-Rad Laboratories, Pleasanton, California, United States of America
| | - Niels Klitgord
- Digital Biology Center, Bio-Rad Laboratories, Pleasanton, California, United States of America
| | - George Karlin-Neumann
- Digital Biology Center, Bio-Rad Laboratories, Pleasanton, California, United States of America
| | - Catherine Wong
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Shawn Hodges
- Digital Biology Center, Bio-Rad Laboratories, Pleasanton, California, United States of America
| | - Ryan Koehler
- Digital Biology Center, Bio-Rad Laboratories, Pleasanton, California, United States of America
| | - Svilen Tzonev
- Digital Biology Center, Bio-Rad Laboratories, Pleasanton, California, United States of America
| | - Steven A. McCarroll
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Cambridge, Massachusetts, United States of America
| |
|
12
|
Brough HA, Liu AH, Sicherer S, Makinson K, Douiri A, Brown SJ, Stephens AC, Irwin McLean WH, Turcanu V, Wood RA, Jones SM, Burks W, Dawson P, Stablein D, Sampson H, Lack G. Atopic dermatitis increases the effect of exposure to peanut antigen in dust on peanut sensitization and likely peanut allergy. J Allergy Clin Immunol 2014; 135:164-70. [PMID: 25457149 PMCID: PMC4282723 DOI: 10.1016/j.jaci.2014.10.007] [Citation(s) in RCA: 245] [Impact Index Per Article: 22.3] [Received: 04/02/2014] [Revised: 10/11/2014] [Accepted: 10/14/2014] [Indexed: 01/17/2023]
Abstract
BACKGROUND: History and severity of atopic dermatitis (AD) are risk factors for peanut allergy. Recent evidence suggests that children can become sensitized to food allergens through an impaired skin barrier. Household peanut consumption, which correlates strongly with peanut protein levels in household dust, is a risk factor for peanut allergy. OBJECTIVE: We sought to assess whether environmental peanut exposure (EPE) is a risk for peanut sensitization and allergy and whether markers of an impaired skin barrier modify this risk. METHODS: Peanut protein in household dust (in micrograms per gram) was assessed in highly atopic children (age, 3-15 months) recruited to the Consortium of Food Allergy Research Observational Study. History and severity of AD, peanut sensitization, and likely allergy (peanut-specific IgE, ≥5 kUA/mL) were assessed at recruitment into the Consortium of Food Allergy Research study. RESULTS: There was an exposure-response relationship between peanut protein levels in household dust and peanut skin prick test (SPT) sensitization and likely allergy. In the final multivariate model an increase in 4 log2 EPE units increased the odds of peanut SPT sensitization (1.71-fold; 95% CI, 1.13- to 2.59-fold; P = .01) and likely peanut allergy (PA; 2.10-fold; 95% CI, 1.20- to 3.67-fold; P < .01). The effect of EPE on peanut SPT sensitization was augmented in children with a history of AD (OR, 1.97; 95% CI, 1.26-3.09; P < .01) and augmented even further in children with a history of severe AD (OR, 2.41; 95% CI, 1.30-4.47; P < .01); the effect of EPE on PA was also augmented in children with a history of AD (OR, 2.34; 95% CI, 1.31-4.18; P < .01). CONCLUSION: Exposure to peanut antigen in dust through an impaired skin barrier in atopically inflamed skin is a plausible route for peanut SPT sensitization and PA.
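The "4 log2 EPE units" in the results can be unpacked with a short calculation. The sketch below assumes the usual logistic-regression form, in which log-odds are linear in the exposure variable, so an odds ratio reported for k units can be rescaled to any other number of units; the abstract does not state the model form, and the function name is hypothetical. Under that assumption, 4 log2 units correspond to a 16-fold increase in dust peanut protein, and the reported 1.71-fold odds ratio corresponds to roughly 1.14-fold per doubling of exposure.

```python
import math


def rescale_odds_ratio(odds_ratio, units_reported, units_wanted):
    """Rescale an odds ratio, assuming log-odds are linear in the exposure."""
    beta = math.log(odds_ratio) / units_reported  # log-odds change per unit
    return math.exp(beta * units_wanted)


# 4 log2 units of exposure is a 2**4 = 16-fold increase in peanut protein.
fold_change_in_exposure = 2 ** 4

# Reported: 1.71-fold odds of SPT sensitization per 4 log2 units.
or_per_doubling = rescale_odds_ratio(1.71, units_reported=4, units_wanted=1)
print(f"{fold_change_in_exposure}-fold exposure -> OR 1.71")
print(f"2-fold exposure -> OR {or_per_doubling:.2f}")
```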
Affiliation(s)
- Helen A Brough
- Paediatric Allergy, Department of Asthma, Allergy and Respiratory Science, King's College London, Guys' Hospital, London, United Kingdom
| | - Andrew H Liu
- Paediatric Allergy, National Jewish Health, Denver, Colo
| | - Scott Sicherer
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, Jaffe Food Allergy Institute, New York, NY
| | - Kerry Makinson
- Paediatric Allergy, Department of Asthma, Allergy and Respiratory Science, King's College London, Guys' Hospital, London, United Kingdom
| | - Abdel Douiri
- Department of Public Health Science, School of Medicine, King's College London, London, United Kingdom
| | - Sara J Brown
- Centre for Dermatology and Genetic Medicine, College of Life Sciences and College of Medicine, Dentistry and Nursing, University of Dundee, Dundee, United Kingdom
| | - Alick C Stephens
- Paediatric Allergy, Department of Asthma, Allergy and Respiratory Science, King's College London, Guys' Hospital, London, United Kingdom
| | - W H Irwin McLean
- Centre for Dermatology and Genetic Medicine, College of Life Sciences and College of Medicine, Dentistry and Nursing, University of Dundee, Dundee, United Kingdom
| | - Victor Turcanu
- Paediatric Allergy, Department of Asthma, Allergy and Respiratory Science, King's College London, Guys' Hospital, London, United Kingdom
| | - Robert A Wood
- Department of Pediatrics, Division of Allergy and Immunology, Johns Hopkins University School of Medicine, Baltimore, Md
| | - Stacie M Jones
- Department of Pediatrics, University of Arkansas for Medical Sciences and Arkansas Children's Hospital, Little Rock, Ark
| | - Wesley Burks
- Department of Pediatrics, University of North Carolina, Chapel Hill, NC
| | | | | | - Hugh Sampson
- Department of Pediatrics, Icahn School of Medicine at Mount Sinai, Jaffe Food Allergy Institute, New York, NY
| | - Gideon Lack
- Paediatric Allergy, Department of Asthma, Allergy and Respiratory Science, King's College London, Guys' Hospital, London, United Kingdom.
| |
|
13
|
Yang HC, Lin CW, Chen CW, Chen JJ. Applying genome-wide gene-based expression quantitative trait locus mapping to study population ancestry and pharmacogenetics. BMC Genomics 2014; 15:319. [PMID: 24779372 PMCID: PMC4236814 DOI: 10.1186/1471-2164-15-319] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 12/19/2013] [Accepted: 04/15/2014] [Indexed: 01/13/2023]
Abstract
BACKGROUND: Gene-based analysis has become popular in genomic research because of its appealing biological and statistical properties compared with those of a single-locus analysis. However, only a few, if any, studies have discussed a mapping of expression quantitative trait loci (eQTL) in a gene-based framework. No study has discussed ancestry-informative eQTL or investigated their roles in pharmacogenetics by integrating single nucleotide polymorphism (SNP)-based eQTL (s-eQTL) and gene-based eQTL (g-eQTL). RESULTS: In this g-eQTL mapping study, the transcript expression levels of genes (transcript-level genes; T-genes) were correlated with the SNPs of genes (sequence-level genes; S-genes) by using a method of gene-based partial least squares (PLS). Ancestry-informative transcripts were identified using a rank-score-based multivariate association test, and ancestry-informative eQTL were identified using Fisher's exact test. Furthermore, key ancestry-predictive eQTL were selected in a flexible discriminant analysis. We analyzed SNPs and gene expression of 210 independent people of African, Asian and European descent. We identified numerous cis- and trans-acting g-eQTL and s-eQTL for each population by using PLS. We observed ancestry information enriched in eQTL. Furthermore, we identified 2 ancestry-informative eQTL associated with adverse drug reactions and/or drug response. Rs1045642, located on MDR1, is an ancestry-informative eQTL (P = 2.13E-13, using Fisher's exact test) associated with adverse drug reactions to amitriptyline and nortriptyline and drug responses to morphine. Rs20455, located in KIF6, is an ancestry-informative eQTL (P = 2.76E-23, using Fisher's exact test) associated with the response to statin drugs (e.g., pravastatin and atorvastatin). The ancestry-informative eQTL of drug biotransformation genes were also observed; cross-population cis-acting expression regulators included SPG7, TAP2, SLC7A7, and CYP4F2.
Finally, we also identified key ancestry-predictive eQTL and established classification models with promising training and testing accuracies in separating samples from close populations. CONCLUSIONS: In summary, we developed a gene-based PLS procedure and a SAS macro for identifying g-eQTL and s-eQTL. We established data archives of eQTL for global populations. The program and data archives are accessible at http://www.stat.sinica.edu.tw/hsinchou/genetics/eQTL/HapMapII.htm. Finally, the results from our investigations regarding the interrelationship between eQTL, ancestry information, and pharmacodynamics provide rich resources for future eQTL studies and practical applications in population genetics and medical genetics.
Affiliation(s)
- Hsin-Chou Yang
  - Institute of Statistical Science, Academia Sinica, No 128, Academia Road, Section 2, Nankang, Taipei, Taiwan
  - School of Public Health, National Defense Medical Center, Taipei, Taiwan
- Chien-Wei Lin
  - Institute of Statistical Science, Academia Sinica, No 128, Academia Road, Section 2, Nankang, Taipei, Taiwan
- Chia-Wei Chen
  - Institute of Statistical Science, Academia Sinica, No 128, Academia Road, Section 2, Nankang, Taipei, Taiwan
- James J Chen
  - National Center for Toxicological Research, Food and Drug Administration, Little Rock, Arkansas, USA
|
14
|
Porras-Hurtado L, Ruiz Y, Santos C, Phillips C, Carracedo A, Lareu MV. An overview of STRUCTURE: applications, parameter settings, and supporting software. Front Genet 2013; 4:98. [PMID: 23755071 PMCID: PMC3665925 DOI: 10.3389/fgene.2013.00098] [Citation(s) in RCA: 282] [Impact Index Per Article: 23.5] [Received: 03/01/2013] [Accepted: 05/14/2013] [Indexed: 12/22/2022] Open
Abstract
Objectives: We present an up-to-date review of STRUCTURE software: one of the most widely used population analysis tools that allows researchers to assess patterns of genetic structure in a set of samples. STRUCTURE can identify subsets of the whole sample by detecting allele frequency differences within the data and can assign individuals to those sub-populations based on analysis of likelihoods. The review covers STRUCTURE's most commonly used ancestry and frequency models, plus an overview of the main applications of the software in human genetics including case-control association studies (CCAS), population genetics, and forensic analysis. The review is accompanied by supplementary material providing a step-by-step guide to running STRUCTURE. Methods: With reference to a worked example, we explore the effects of changing the principal analysis parameters on STRUCTURE results when analyzing a uniform set of human genetic data. Use of the supporting software: CLUMPP and distruct is detailed and we provide an overview and worked example of STRAT software, applicable to CCAS. Conclusion: The guide offers a simplified view of how STRUCTURE, CLUMPP, distruct, and STRAT can be applied to provide researchers with an informed choice of parameter settings and supporting software when analyzing their own genetic data.
Affiliation(s)
- Liliana Porras-Hurtado
  - Universidad Tecnológica de Pereira, Pereira, Colombia
  - Forensic Genetics Unit, Institute of Legal Medicine, University of Santiago de Compostela, Santiago de Compostela, Spain
|
15
|
Phillips C, Fernandez-Formoso L, Gelabert-Besada M, Garcia-Magariños M, Santos C, Fondevila M, Carracedo Á, Lareu MV. Development of a novel forensic STR multiplex for ancestry analysis and extended identity testing. Electrophoresis 2013; 34:1151-62. [DOI: 10.1002/elps.201200621] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 06/19/2012] [Revised: 12/13/2012] [Accepted: 12/17/2012] [Indexed: 11/11/2022]
Affiliation(s)
- Chris Phillips
  - Forensic Genetics Unit; Institute of Legal Medicine; University of Santiago de Compostela; Santiago de Compostela; Spain
- Luis Fernandez-Formoso
  - Forensic Genetics Unit; Institute of Legal Medicine; University of Santiago de Compostela; Santiago de Compostela; Spain
- Miguel Gelabert-Besada
  - Forensic Genetics Unit; Institute of Legal Medicine; University of Santiago de Compostela; Santiago de Compostela; Spain
- Carla Santos
  - Forensic Genetics Unit; Institute of Legal Medicine; University of Santiago de Compostela; Santiago de Compostela; Spain
- Manuel Fondevila
  - Forensic Genetics Unit; Institute of Legal Medicine; University of Santiago de Compostela; Santiago de Compostela; Spain
- Maria Victoria Lareu
  - Forensic Genetics Unit; Institute of Legal Medicine; University of Santiago de Compostela; Santiago de Compostela; Spain
|
16
|
Yang HC, Wang PL, Lin CW, Chen CH, Chen CH. Integrative analysis of single nucleotide polymorphisms and gene expression efficiently distinguishes samples from closely related ethnic populations. BMC Genomics 2012; 13:346. [PMID: 22839760 PMCID: PMC3453505 DOI: 10.1186/1471-2164-13-346] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/18/2012] [Accepted: 07/16/2012] [Indexed: 01/11/2023] Open
Abstract
Background Ancestry informative markers (AIMs) are a type of genetic marker that is informative for tracing the ancestral ethnicity of individuals. Application of AIMs has gained substantial attention in population genetics, forensic sciences, and medical genetics. Single nucleotide polymorphisms (SNPs), the materials of AIMs, are useful for classifying individuals from distinct continental origins but cannot discriminate individuals with subtle genetic differences from closely related ancestral lineages. Proof-of-principle studies have shown that gene expression (GE) also is a heritable human variation that exhibits differential intensity distributions among ethnic groups. GE supplies ethnic information supplemental to SNPs; this motivated us to integrate SNP and GE markers to construct AIM panels with a reduced number of required markers and provide high accuracy in ancestry inference. Few studies in the literature have considered GE in this aspect, and none have integrated SNP and GE markers to aid classification of samples from closely related ethnic populations. Results We integrated a forward variable selection procedure into flexible discriminant analysis to identify key SNP and/or GE markers with the highest cross-validation prediction accuracy. By analyzing genome-wide SNP and/or GE markers in 210 independent samples from four ethnic groups in the HapMap II Project, we found that average testing accuracies for a majority of classification analyses were quite high, except for SNP-only analyses that were performed to discern study samples containing individuals from two close Asian populations. The average testing accuracies ranged from 0.53 to 0.79 for SNP-only analyses and increased to around 0.90 when GE markers were integrated together with SNP markers for the classification of samples from closely related Asian populations. 
Compared to GE-only analyses, integrative analyses of SNP and GE markers showed comparable testing accuracies and a reduced number of selected markers in AIM panels. Conclusions Integrative analysis of SNP and GE markers provides high-accuracy and/or cost-effective classification results for assigning samples from closely related or distantly related ancestral lineages to their original ancestral populations. User-friendly BIASLESS (Biomarkers Identification and Samples Subdivision) software was developed as an efficient tool for selecting key SNP and/or GE markers and then building models for sample subdivision. BIASLESS was programmed in R and R-GUI and is available online at http://www.stat.sinica.edu.tw/hsinchou/genetics/prediction/BIASLESS.htm.
Affiliation(s)
- Hsin-Chou Yang
  - Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan.
|
17
|
Non AL, Gravlee CC, Mulligan CJ. Education, genetic ancestry, and blood pressure in African Americans and Whites. Am J Public Health 2012; 102:1559-65. [PMID: 22698014 DOI: 10.2105/ajph.2011.300448] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Indexed: 01/11/2023]
Abstract
OBJECTIVES We assessed the relative roles of education and genetic ancestry in predicting blood pressure (BP) within African Americans and explored the association between education and BP across racial groups. METHODS We used t tests and linear regressions to examine the associations of genetic ancestry, estimated from a genomewide set of autosomal markers, and education with BP variation among African Americans in the Family Blood Pressure Program. We also performed linear regressions in self-identified African Americans and Whites to explore the association of education with BP across racial groups. RESULTS Education, but not genetic ancestry, significantly predicted BP variation in the African American subsample (b=-0.51 mm Hg per year additional education; P=.001). Although education was inversely associated with BP in the total population, within-group analyses showed that education remained a significant predictor of BP only among the African Americans. We found a significant interaction (b=3.20; P=.006) between education and self-identified race in predicting BP. CONCLUSIONS Racial disparities in BP may be better explained by differences in education than by genetic ancestry. Future studies of ancestry and disease should include measures of the social environment.
Affiliation(s)
- Amy L Non
  - Harvard Center for Population and Development Studies, Harvard University, Cambridge, MA 02138, USA.
|
18
|
Ancestry informative marker set for Han Chinese population. G3: Genes, Genomes, Genetics 2012; 2:339-41. [PMID: 22413087 PMCID: PMC3291503 DOI: 10.1534/g3.112.001941] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/16/2011] [Accepted: 01/08/2012] [Indexed: 12/02/2022]
Abstract
The population of Han Chinese is ∼1.226 billion people. Genetic heterogeneity between northern Han Chinese (N-Han) and southern Han Chinese (S-Han) has been demonstrated by recent genome-wide studies. As an initial step toward health disparities and personalized medicine in Chinese population, this study developed a set of ancestry informative markers (AIM) for Han Chinese population.
|
19
|
Pereira R, Phillips C, Pinto N, Santos C, dos Santos SEB, Amorim A, Carracedo Á, Gusmão L. Straightforward inference of ancestry and admixture proportions through ancestry-informative insertion deletion multiplexing. PLoS One 2012; 7:e29684. [PMID: 22272242 PMCID: PMC3260179 DOI: 10.1371/journal.pone.0029684] [Citation(s) in RCA: 196] [Impact Index Per Article: 15.1] [Received: 08/22/2011] [Accepted: 12/02/2011] [Indexed: 02/06/2023] Open
Abstract
Ancestry-informative markers (AIMs) show high allele frequency divergence between different ancestral or geographically distant populations. These genetic markers are especially useful in inferring the likely ancestral origin of an individual or estimating the apportionment of ancestry components in admixed individuals or populations. The study of AIMs is of great interest in clinical genetics research, particularly to detect and correct for population substructure effects in case-control association studies, but also in population and forensic genetics studies. This work presents a set of 46 ancestry-informative insertion deletion polymorphisms selected to efficiently measure population admixture proportions of four different origins (African, European, East Asian and Native American). All markers are analyzed in short fragments (under 230 basepairs) through a single PCR followed by capillary electrophoresis (CE) allowing a very simple one tube PCR-to-CE approach. HGDP-CEPH diversity panel samples from the four groups, together with Oceanians, were genotyped to evaluate the efficiency of the assay in clustering populations from different continental origins and to establish reference databases. In addition, other populations from diverse geographic origins were tested using the HGDP-CEPH samples as reference data. The results revealed that the AIM-INDEL set developed is highly efficient at inferring the ancestry of individuals and provides good estimates of ancestry proportions at the population level. In conclusion, we have optimized the multiplexed genotyping of 46 AIM-INDELs in a simple and informative assay, enabling a more straightforward alternative to the commonly available AIM-SNP typing methods dependent on complex, multi-step protocols or implementation of large-scale genotyping technologies.
Affiliation(s)
- Rui Pereira
  - IPATIMUP – Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
  - Institute of Forensic Sciences Luis Concheiro, University of Santiago de Compostela, Santiago de Compostela, Spain
- Christopher Phillips
  - Institute of Forensic Sciences Luis Concheiro, University of Santiago de Compostela, Santiago de Compostela, Spain
- Nádia Pinto
  - IPATIMUP – Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
  - Faculty of Sciences, University of Porto, Porto, Portugal
  - Mathematics Research Centre, University of Porto, Porto, Portugal
- Carla Santos
  - Institute of Forensic Sciences Luis Concheiro, University of Santiago de Compostela, Santiago de Compostela, Spain
- António Amorim
  - IPATIMUP – Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
  - Faculty of Sciences, University of Porto, Porto, Portugal
- Ángel Carracedo
  - Institute of Forensic Sciences Luis Concheiro, University of Santiago de Compostela, Santiago de Compostela, Spain
  - Genomics Medicine Group, CIBERER, University of Santiago de Compostela, Santiago de Compostela, Spain
- Leonor Gusmão
  - IPATIMUP – Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
|