1
Vilà-Valls L, Abdeli A, Lucas-Sánchez M, Bekada A, Calafell F, Benhassine T, Comas D. Understanding the genomic heterogeneity of North African Imazighen: from broad to microgeographical perspectives. Sci Rep 2024; 14:9979. [PMID: 38693301 PMCID: PMC11063056 DOI: 10.1038/s41598-024-60568-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Accepted: 04/24/2024] [Indexed: 05/03/2024] Open
Abstract
The strategic location of North Africa has led to cultural and demographic shifts, shaping its genetic structure. Historical migrations brought different genetic components that are evident in present-day North African genomes, along with autochthonous components. The Imazighen (plural of Amazigh) are believed to be the descendants of autochthonous North Africans and speak various Amazigh languages, which belong to the Afro-Asiatic language family. However, the arrival of different human groups, especially during the Arab conquest, caused cultural and linguistic changes in local populations, increasing their heterogeneity. We aim to characterize the genetic structure of the region, using the largest Amazigh dataset to date and other reference samples. Our findings indicate microgeographical genetic heterogeneity among Amazigh populations, modeled by various admixture waves and different effective population sizes. A first admixture wave is detected group-wide around the twelfth century, whereas a second wave appears in some Amazigh groups around the nineteenth century. These events involved populations with higher genetic ancestry from south of the Sahara compared to the current North Africans. A plausible explanation would be the historical trans-Saharan slave trade, which lasted from the Roman times to the nineteenth century. Furthermore, our investigation shows that assortative mating in North Africa has been rare.
Affiliation(s)
- Laura Vilà-Valls
  - Departament de Medicina i Ciències de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain
- Amine Abdeli
  - Laboratoire de Biologie Cellulaire et Moléculaire, Faculté des Sciences Biologiques, Université des Sciences et de la Technologie Houari Boumediene, Alger, Algeria
- Marcel Lucas-Sánchez
  - Departament de Medicina i Ciències de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain
- Asmahan Bekada
  - Département de Biotechnologie, Faculté des Sciences de la Nature et de la Vie, Université Oran 1 (Ahmad Ben Bella), Oran, Algeria
- Francesc Calafell
  - Departament de Medicina i Ciències de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain
- Traki Benhassine
  - Laboratoire de Biologie Cellulaire et Moléculaire, Faculté des Sciences Biologiques, Université des Sciences et de la Technologie Houari Boumediene, Alger, Algeria
- David Comas
  - Departament de Medicina i Ciències de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain
2
Valenzuela-García LI, Ayala-García VM, Ramos-Rosales DF, Jacquez-Flores RE, Urtiz-Estrada N, Hernández EMM, Barraza-Salas M. The rs7208505 Polymorphism and Differential Expression of the SKA2 Gene in the Prefrontal Cortex of Suicide Victims from the Mexican Population. Arch Suicide Res 2024; 28:674-685. [PMID: 37204142 DOI: 10.1080/13811118.2023.2209155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2023]
Abstract
OBJECTIVE: The main aim of the current study was to investigate whether SKA2 gene expression in the postmortem brain and the rs7208505 genotype are altered in suicide victims from a Mexican population. METHODS: We report a genetic analysis of expression levels of the SKA2 gene in the prefrontal cortex of the postmortem brain of suicidal subjects (n = 22) compared to subjects who died of causes other than suicide (n = 22) in a Mexican population using RT-qPCR assays. Additionally, we genotyped the rs7208505 polymorphism in suicide victims (n = 98) and controls (n = 88) and evaluated the association of rs7208505 genotypes with the expression level of SKA2. RESULTS: The expression of the SKA2 gene was significantly higher in suicide victims than in control subjects (p = 0.044). Interestingly, we observed a greater proportion of allele A of rs7208505 in suicide victims than in controls. Although the SNP was not associated with suicide in the study population, we found a significant association of the SKA2 expression level with allele A of rs7208505 and suicide. CONCLUSION: The evidence suggests that the expression of SKA2 in the prefrontal cortex may be a critical factor in the etiology of suicidal behavior.
3
Guo B, Borda V, Laboulaye R, Spring MD, Wojnarski M, Vesely BA, Silva JC, Waters NC, O'Connor TD, Takala-Harrison S. Strong positive selection biases identity-by-descent-based inferences of recent demography and population structure in Plasmodium falciparum. Nat Commun 2024; 15:2499. [PMID: 38509066 PMCID: PMC10954658 DOI: 10.1038/s41467-024-46659-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 02/28/2024] [Indexed: 03/22/2024] Open
Abstract
Malaria genomic surveillance often estimates parasite genetic relatedness using metrics such as Identity-By-Descent (IBD), yet strong positive selection stemming from antimalarial drug resistance or other interventions may bias IBD-based estimates. In this study, we use simulations, a true IBD inference algorithm, and empirical data sets from different malaria transmission settings to investigate the extent of this bias and explore potential correction strategies. We analyze whole genome sequence data generated from 640 new and 3089 publicly available Plasmodium falciparum clinical isolates. We demonstrate that positive selection distorts IBD distributions, leading to underestimated effective population size and blurred population structure. Additionally, we discover that the removal of IBD peak regions partially restores the accuracy of IBD-based inferences, with this effect contingent on the population's background genetic relatedness and extent of inbreeding. Consequently, we advocate for selection correction for parasite populations undergoing strong, recent positive selection, particularly in high malaria transmission settings.
Affiliation(s)
- Bing Guo
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
  - Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Victor Borda
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Roland Laboulaye
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Michele D Spring
  - Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Mariusz Wojnarski
  - Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Brian A Vesely
  - Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Joana C Silva
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, USA
  - Global Health and Tropical Medicine (GHTM), Instituto de Higiene e Medicina Tropical (IHMT), Universidade NOVA de Lisboa (NOVA), Lisbon, Portugal
- Norman C Waters
  - Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Timothy D O'Connor
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Shannon Takala-Harrison
  - Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
4
Ringbauer H, Huang Y, Akbari A, Mallick S, Olalde I, Patterson N, Reich D. Accurate detection of identity-by-descent segments in human ancient DNA. Nat Genet 2024; 56:143-151. [PMID: 38123640 PMCID: PMC10786714 DOI: 10.1038/s41588-023-01582-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Accepted: 10/20/2023] [Indexed: 12/23/2023]
Abstract
Long DNA segments shared between two individuals, known as identity-by-descent (IBD), reveal recent genealogical connections. Here we introduce ancIBD, a method for identifying IBD segments in ancient human DNA (aDNA) using a hidden Markov model and imputed genotype probabilities. We demonstrate that ancIBD accurately identifies IBD segments >8 cM for aDNA data with an average depth of >0.25× for whole-genome sequencing or >1× for 1240k single nucleotide polymorphism capture data. Applying ancIBD to 4,248 ancient Eurasian individuals, we identify relatives up to the sixth degree and genealogical connections between archaeological groups. Notably, we reveal long IBD sharing between Corded Ware and Yamnaya groups, indicating that the Yamnaya herders of the Pontic-Caspian Steppe and the Steppe-related ancestry in various European Corded Ware groups share substantial co-ancestry within only a few hundred years. These results show that detecting IBD segments can generate powerful insights into the growing aDNA record, both on a small scale relevant to life stories and on a large scale relevant to major cultural-historical events.
Affiliation(s)
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Swapan Mallick
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Iñigo Olalde
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - BIOMICs Research Group, University of the Basque Country, Vitoria-Gasteiz, Spain
  - Ikerbasque-Basque Foundation of Science, Bilbao, Spain
- Nick Patterson
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
5
Koyama S, Wang Y, Paruchuri K, Uddin MM, Cho SMJ, Urbut SM, Haidermota S, Hornsby WE, Green RC, Daly MJ, Neale BM, Ellinor PT, Smoller JW, Lebo MS, Karlson EW, Martin AR, Natarajan P. Decoding Genetics, Ancestry, and Geospatial Context for Precision Health. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.10.24.23297096. [PMID: 37961173 PMCID: PMC10635180 DOI: 10.1101/2023.10.24.23297096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Mass General Brigham, an integrated healthcare system based in the Greater Boston area of Massachusetts, annually serves 1.5 million patients. We established the Mass General Brigham Biobank (MGBB), encompassing 142,238 participants, to unravel the intricate relationships among genomic profiles, environmental context, and disease manifestations within clinical practice. In this study, we highlight the impact of ancestral diversity in the MGBB by employing population genetics, geospatial assessment, and association analyses of rare and common genetic variants. The population structures captured by the genetics mirror the sequential immigration to the Greater Boston area throughout American history, highlighting communities tied to shared genetic and environmental factors. Our investigation underscores the potency of unbiased, large-scale analyses in a healthcare-affiliated biobank, elucidating the dynamic interplay across genetics, immigration, structural geospatial factors, and health outcomes in one of the earliest American sites of European colonization.
Affiliation(s)
- Satoshi Koyama
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Ying Wang
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kaavya Paruchuri
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Md Mesbah Uddin
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- So Mi J. Cho
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Integrative Research Center for Cerebrovascular and Cardiovascular Diseases, Yonsei University College of Medicine, Seoul, Republic of Korea
- Sarah M. Urbut
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Sara Haidermota
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Whitney E. Hornsby
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Robert C. Green
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Department of Medicine (Genetics), MassGeneralBrigham, Boston, MA, USA
  - Broad Institute and Ariadne Labs, Boston, MA, USA
- Mark J. Daly
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Institute for Molecular Medicine Finland (FIMM), Finland
  - University of Helsinki, Helsinki, Finland
- Benjamin M. Neale
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Patrick T. Ellinor
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
- Jordan W. Smoller
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Matthew S. Lebo
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Mass General Brigham Personalized Medicine, Cambridge, MA, USA
  - Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA
- Elizabeth W. Karlson
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
  - Mass General Brigham Personalized Medicine, Cambridge, MA, USA
  - Division of Rheumatology, Inflammation and Immunity, Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Alicia R. Martin
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Pradeep Natarajan
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
6
Ziyatdinov A, Torres J, Alegre-Díaz J, Backman J, Mbatchou J, Turner M, Gaynor SM, Joseph T, Zou Y, Liu D, Wade R, Staples J, Panea R, Popov A, Bai X, Balasubramanian S, Habegger L, Lanche R, Lopez A, Maxwell E, Jones M, García-Ortiz H, Ramirez-Reyes R, Santacruz-Benítez R, Nag A, Smith KR, Damask A, Lin N, Paulding C, Reppell M, Zöllner S, Jorgenson E, Salerno W, Petrovski S, Overton J, Reid J, Thornton TA, Abecasis G, Berumen J, Orozco-Orozco L, Collins R, Baras A, Hill MR, Emberson JR, Marchini J, Kuri-Morales P, Tapia-Conyer R. Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature 2023; 622:784-793. [PMID: 37821707 PMCID: PMC10600010 DOI: 10.1038/s41586-023-06595-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/27/2022] [Accepted: 08/31/2023] [Indexed: 10/13/2023]
Abstract
The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City [1]. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.
Affiliation(s)
- Jason Torres
  - Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
  - MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jesús Alegre-Díaz
  - Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Michael Turner
  - Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
  - Oxford Kidney Unit, Churchill Hospital, Oxford, UK
- Yuxin Zou
  - Regeneron Genetics Center, Tarrytown, NY, USA
- Daren Liu
  - Regeneron Genetics Center, Tarrytown, NY, USA
- Rachel Wade
  - Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
  - MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Alex Popov
  - Regeneron Genetics Center, Tarrytown, NY, USA
- Alex Lopez
  - Regeneron Genetics Center, Tarrytown, NY, USA
- Raul Ramirez-Reyes
  - Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Rogelio Santacruz-Benítez
  - Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Abhishek Nag
  - Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Katherine R Smith
  - Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Amy Damask
  - Regeneron Genetics Center, Tarrytown, NY, USA
- Nan Lin
  - Regeneron Genetics Center, Tarrytown, NY, USA
- Sebastian Zöllner
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Slavé Petrovski
  - Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Jaime Berumen
  - Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Rory Collins
  - Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Aris Baras
  - Regeneron Genetics Center, Tarrytown, NY, USA
- Michael R Hill
  - Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
  - MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jonathan R Emberson
  - Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
  - MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Pablo Kuri-Morales
  - Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Mexico
  - Faculty of Medicine, National Autonomous University of Mexico, Mexico City, Mexico
- Roberto Tapia-Conyer
  - Faculty of Medicine, National Autonomous University of Mexico, Mexico City, Mexico
7
Guo B, Borda V, Laboulaye R, Spring MD, Wojnarski M, Vesely BA, Silva JC, Waters NC, O'Connor TD, Takala-Harrison S. Strong Positive Selection Biases Identity-By-Descent-Based Inferences of Recent Demography and Population Structure in Plasmodium falciparum. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.14.549114. [PMID: 37502843 PMCID: PMC10370022 DOI: 10.1101/2023.07.14.549114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Malaria genomic surveillance often estimates parasite genetic relatedness using metrics such as Identity-By-Descent (IBD). Yet, strong positive selection stemming from antimalarial drug resistance or other interventions may bias IBD-based estimates. In this study, we utilized simulations, a true IBD inference algorithm, and empirical datasets from different malaria transmission settings to investigate the extent of such bias and explore potential correction strategies. We analyzed whole genome sequence data generated from 640 new and 4,026 publicly available Plasmodium falciparum clinical isolates. Our findings demonstrated that positive selection distorts IBD distributions, leading to underestimated effective population size and blurred population structure. Additionally, we discovered that the removal of IBD peak regions partially restored the accuracy of IBD-based inferences, with this effect contingent on the population's background genetic relatedness. Consequently, we advocate for selection correction for parasite populations undergoing strong, recent positive selection, particularly in high malaria transmission settings.
Affiliation(s)
- Bing Guo
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
  - Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
- Victor Borda
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Roland Laboulaye
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Michele D Spring
  - Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Mariusz Wojnarski
  - Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Brian A Vesely
  - Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Joana C Silva
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Norman C Waters
  - Armed Forces Research Institute of Medical Sciences, Bangkok, Thailand
- Timothy D O'Connor
  - Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Shannon Takala-Harrison
  - Center for Vaccine Development and Global Health, University of Maryland School of Medicine, Baltimore, MD, USA
8
Witt KE, Funk A, Añorve-Garibay V, Fang LL, Huerta-Sánchez E. The Impact of Modern Admixture on Archaic Human Ancestry in Human Populations. Genome Biol Evol 2023; 15:evad066. [PMID: 37103242 PMCID: PMC10194819 DOI: 10.1093/gbe/evad066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 03/07/2023] [Accepted: 04/17/2023] [Indexed: 04/28/2023] Open
Abstract
Admixture, the genetic merging of parental populations resulting in mixed ancestry, has occurred frequently throughout the course of human history. Numerous admixture events have occurred between human populations across the world, which have shaped genetic ancestry in modern humans. For example, populations in the Americas are often mosaics of different ancestries due to recent admixture events as part of European colonization. Admixed individuals also often have introgressed DNA from Neanderthals and Denisovans that may have come from multiple ancestral populations, which may affect how archaic ancestry is distributed across an admixed genome. In this study, we analyzed admixed populations from the Americas to assess whether the proportion and location of admixed segments due to recent admixture impact an individual's archaic ancestry. We identified a positive correlation between non-African ancestry and archaic alleles, as well as a slight increase of Denisovan alleles in Indigenous American segments relative to European segments in admixed genomes. We also identify several genes as candidates for adaptive introgression, based on archaic alleles present at high frequency in admixed American populations but low frequency in East Asian populations. These results provide insights into how recent admixture events between modern humans redistributed archaic ancestry in admixed genomes.
Affiliation(s)
- Kelsey E Witt
  - Ecology, Evolution, and Organismal Biology, Brown University, Providence, Rhode Island
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island
- Alyssa Funk
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island
  - Molecular Biology, Cell Biology, & Biochemistry, Brown University, Providence, Rhode Island
- Valeria Añorve-Garibay
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island
  - Licenciatura en Ciencias Genómicas, Escuela Nacional de Estudios Superiores Unidad Juriquilla, Universidad Nacional Autónoma de México, Querétaro, Mexico
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Querétaro, Mexico
- Lesly Lopez Fang
  - Department of Life & Environmental Sciences, University of California, Merced, California, United States of America
- Emilia Huerta-Sánchez
  - Ecology, Evolution, and Organismal Biology, Brown University, Providence, Rhode Island
  - Center for Computational Molecular Biology, Brown University, Providence, Rhode Island
9
Gilbert E, Zurel H, MacMillan ME, Demiriz S, Mirhendi S, Merrigan M, O'Reilly S, Molloy AM, Brody LC, Bodmer W, Leach RA, Scott REM, Mugford G, Randhawa R, Stephens JC, Symington AL, Cavalleri GL, Phillips MS. The Newfoundland and Labrador mosaic founder population descends from an Irish and British diaspora from 300 years ago. Commun Biol 2023; 6:469. [PMID: 37117635 PMCID: PMC10147672 DOI: 10.1038/s42003-023-04844-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Accepted: 03/28/2023] [Indexed: 04/30/2023] Open
Abstract
The founder population of Newfoundland and Labrador (NL) is a unique genetic resource, in part due to its geographic and cultural isolation, where historical records describe a migration of European settlers, primarily from Ireland and England, to NL in the 18th and 19th centuries. Whilst its historical isolation, and increased prevalence of certain monogenic disorders are well appreciated, details of the fine-scale genetic structure and ancestry of the population are lacking. Understanding the genetic origins and background of functional, disease causing, genetic variants would aid genetic mapping efforts in the Province. Here, we leverage dense genome-wide SNP data on 1,807 NL individuals to reveal fine-scale genetic structure in NL that is clustered around coastal communities and correlated with Christian denomination. We show that the majority of NL European ancestry can be traced back to the south-east and south-west of Ireland and England, respectively. We date a substantial population size bottleneck approximately 10-15 generations ago in NL, associated with increased haplotype sharing and autozygosity. Our results reveal insights into the population history of NL and demonstrate evidence of a population conducive to further genetic studies and biomarker discovery.
Affiliation(s)
- Edmund Gilbert
  - School of Pharmacy and Biomolecular Sciences, Royal College of Surgeons in Ireland, Dublin, Ireland
  - FutureNeuro SFI Research Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
- Heather Zurel
  - Sequence Bioinformatics, Inc., St. John's, Newfoundland and Labrador, Canada
- Sedat Demiriz
  - Sequence Bioinformatics, Inc., St. John's, Newfoundland and Labrador, Canada
- Sadra Mirhendi
  - Sequence Bioinformatics, Inc., St. John's, Newfoundland and Labrador, Canada
- Anne M Molloy
  - School of Medicine, Trinity College, Dublin, Ireland
- Lawrence C Brody
  - Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Walter Bodmer
  - Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Oxford, UK
- Richard A Leach
  - Sequence Bioinformatics, Inc., St. John's, Newfoundland and Labrador, Canada
- Roderick E M Scott
  - Sequence Bioinformatics, Inc., St. John's, Newfoundland and Labrador, Canada
- Gerald Mugford
  - Sequence Bioinformatics, Inc., St. John's, Newfoundland and Labrador, Canada
- Ranjit Randhawa
  - Sequence Bioinformatics, Inc., St. John's, Newfoundland and Labrador, Canada
- Alison L Symington
  - Sequence Bioinformatics, Inc., St. John's, Newfoundland and Labrador, Canada
- Gianpiero L Cavalleri
  - School of Pharmacy and Biomolecular Sciences, Royal College of Surgeons in Ireland, Dublin, Ireland
  - FutureNeuro SFI Research Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
- Michael S Phillips
  - Sequence Bioinformatics, Inc., St. John's, Newfoundland and Labrador, Canada
|
10
|
Ringbauer H, Huang Y, Akbari A, Mallick S, Patterson N, Reich D. ancIBD - Screening for identity by descent segments in human ancient DNA. bioRxiv 2023:2023.03.08.531671. [PMID: 36945531 PMCID: PMC10028887 DOI: 10.1101/2023.03.08.531671] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 06/18/2023]
Abstract
Long DNA sequences shared between two individuals, known as Identical by descent (IBD) segments, are a powerful signal for identifying close and distant biological relatives because they only arise when the pair shares a recent common ancestor. Existing methods to call IBD segments between present-day genomes cannot be straightforwardly applied to ancient DNA data (aDNA) due to typically low coverage and high genotyping error rates. We present ancIBD, a method to identify IBD segments for human aDNA data implemented as a Python package. Our approach is based on a Hidden Markov Model, using as input genotype probabilities imputed based on a modern reference panel of genomic variation. Through simulation and downsampling experiments, we demonstrate that ancIBD robustly identifies IBD segments longer than 8 centimorgan for aDNA data with at least either 0.25x average whole-genome sequencing (WGS) coverage depth or at least 1x average depth for in-solution enrichment experiments targeting a widely used aDNA SNP set ('1240k'). This application range allows us to screen a substantial fraction of the aDNA record for IBD segments and we showcase two downstream applications. First, leveraging the fact that biological relatives up to the sixth degree are expected to share multiple long IBD segments, we identify relatives between 10,156 ancient Eurasian individuals and document evidence of long-distance migration, for example by identifying a pair of two approximately fifth-degree relatives who were buried 1410km apart in Central Asia 5000 years ago. Second, by applying ancIBD, we reveal new details regarding the spread of ancestry related to Steppe pastoralists into Europe starting 5000 years ago. 
We find that the first individuals in Central and Northern Europe carrying high amounts of Steppe-ancestry, associated with the Corded Ware culture, share high rates of long IBD (12-25 cM) with Yamnaya herders of the Pontic-Caspian steppe, signaling a strong bottleneck and a recent biological connection on the order of only few hundred years, providing evidence that the Yamnaya themselves are a main source of Steppe ancestry in Corded Ware people. We also detect elevated sharing of long IBD segments between Corded Ware individuals and people associated with the Globular Amphora culture (GAC) from Poland and Ukraine, who were Copper Age farmers not yet carrying Steppe-like ancestry. These IBD links appear for all Corded Ware groups in our analysis, indicating that individuals related to GAC contexts must have had a major demographic impact early on in the genetic admixtures giving rise to various Corded Ware groups across Europe. These results show that detecting IBD segments in aDNA can generate new insights both on a small scale, relevant to understanding the life stories of people, and on the macroscale, relevant to large-scale cultural-historical events.
Affiliation(s)
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Swapan Mallick
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Nick Patterson
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Department of Genetics, Harvard Medical School, Boston, MA, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
|
11
|
VanHise K, Chan JL, Wertheimer S, Handelsman RG, Clark E, Buttle R, Wang ET, Azziz R, Pisarska MD. Regional Variation in Hormonal and Metabolic Parameters of White and Black Women With PCOS in the United States. J Clin Endocrinol Metab 2023; 108:706-712. [PMID: 36218376 DOI: 10.1210/clinem/dgac515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Indexed: 11/19/2022]
Abstract
CONTEXT Ongoing research is needed to determine geo-epidemiologic differences of polycystic ovary syndrome (PCOS). OBJECTIVE Determine hormonal and metabolic parameters of women with PCOS in 2 environments. METHODS Prospective cohort study. SETTING Tertiary-care based specialty clinics in Alabama and California. PATIENTS OR OTHER PARTICIPANTS A total of 1610 women with PCOS by National Institutes of Health Criteria from 1987 to 2010. INTERVENTIONS Interview, physical examination, laboratory studies. MAIN OUTCOMES MEASURES Demographic data, menstrual cycle history, and hormonal and metabolic parameters were collected. Hirsutism was defined as modified Ferriman-Gallwey scores ≥4. Androgen values greater than laboratory reference ranges or >95th percentile of all values were considered elevated (hyperandrogenemia). Metabolic parameters included body mass index (BMI), waist-hip-ratio (WHR), glucose tolerance test, and homeostatic model assessment for insulin resistance (HOMA-IR) scores. RESULTS Alabama women with PCOS were younger with a higher BMI. After adjustment for age and BMI, Alabama women with PCOS were more likely hirsute (adjusted odds ratio [aOR], 1.8; 95% CI, 1.4-2.4; P < 0.001), with elevated HOMA-IR scores (adjusted beta coefficient 3.6; 95% CI, 1.61-5.5; P < 0.001). California women with PCOS were more likely to have hyperandrogenemia (free testosterone aOR, 0.14; 95% CI, 0.11-0.18; P < 0.001; total testosterone aOR, 0.41; 95% CI, 0.33-0.51). Results were similar when stratified by White race. In Black women with PCOS, BMI and WHR did not differ between locations, yet differences in androgen profiles and metabolic dysfunction remained. CONCLUSION Alabama women with PCOS, regardless of Black or White race, were more likely hirsute with metabolic dysfunction, whereas California women with PCOS were more likely to demonstrate hyperandrogenemia, highlighting potential environmental impacts on PCOS.
Affiliation(s)
- Rae Buttle
  - Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Erica T Wang
  - Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Ricardo Azziz
  - Department of Obstetrics and Gynecology, Division of Reproductive Endocrinology and Infertility, University of Alabama at Birmingham, Birmingham, AL 35233, USA
  - Department of Obstetrics and Gynecology, Division of Reproductive Endocrinology and Infertility, University of Albany, SUNY, Albany, NY 12208, USA
|
12
|
Vilà-Valls L, Aizpurua-Iraola J, Casinge S, Bojs K, Flores-Bello A, Font-Porterias N, Comas D. Genomic Insights into the Population History of the Resande or Swedish Travelers. Genome Biol Evol 2023; 15:6991919. [PMID: 36655389 PMCID: PMC9907538 DOI: 10.1093/gbe/evad006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 12/29/2022] [Accepted: 01/09/2023] [Indexed: 01/20/2023] Open
Abstract
The Resande are a minority ethnic group in Sweden, who were characterized by an itinerant way of life, and they have been suggested to originate from the mixture between Swedish and Romani populations. Because the population history of the Resande has been scarcely studied, we analyzed genome-wide genotype array data from unrelated Resande individuals in order to shed light on their origins and demographic history for the first time from a genetic perspective. Our results confirm the Romani-related ancestry of this population and suggest an admixture event between a Romani-like population and a general Swedish-like population that occurred approximately between the mid-18th and mid-19th centuries, two centuries after the arrival of the first historically reported Romani families in Sweden. This inferred date suggests that the Romani group involved in the admixture is related to the pre-18th-century arrivals of Romani in Scandinavia. In addition, a reduction in the population size is detected previous to the admixture event, suggesting a subtle signal of isolation. The present work constitutes a step forward toward a better representation of ethnic minorities and underrepresented groups in population genetic analyses. In order to know in more detail the complete history of human populations, it is time to focus on studying populations that have not been previously considered for a general scenario and that can provide valuable information to fill in the gaps that still remain uncovered.
Affiliation(s)
- Laura Vilà-Valls
  - Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Julen Aizpurua-Iraola
  - Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
|
13
|
The impact of modern admixture on archaic human ancestry in human populations. bioRxiv 2023:2023.01.16.524232. [PMID: 36711776 PMCID: PMC9882123 DOI: 10.1101/2023.01.16.524232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
Admixture, the genetic merging of parental populations resulting in mixed ancestry, has occurred frequently throughout the course of human history. Numerous admixture events have occurred between human populations across the world, as well as introgression between humans and archaic humans, Neanderthals and Denisovans. One example are genomes from populations in the Americas, as these are often mosaics of different ancestries due to recent admixture events as part of European colonization. In this study, we analyzed admixed populations from the Americas to assess whether the proportion and location of admixed segments due to recent admixture impact an individual’s archaic ancestry. We identified a positive correlation between non-African ancestry and archaic alleles, as well as a slight enrichment of Denisovan alleles in Indigenous American segments relative to European segments in admixed genomes. We also identify several genes as candidates for adaptive introgression, based on archaic alleles present at high frequency in admixed American populations but low frequency in East Asian populations. These results provide insights into how recent admixture events between modern humans redistributed archaic ancestry in admixed genomes.
|
14
|
Stites SD, Coe NB. Let's Not Repeat History's Mistakes: Two Cautions to Scientists on the Use of Race in Alzheimer's Disease and Alzheimer's Disease Related Dementias Research. J Alzheimers Dis 2023; 92:729-740. [PMID: 36806503 PMCID: PMC10123855 DOI: 10.3233/jad-220507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2023]
Abstract
Alzheimer's disease and Alzheimer's disease related dementias (AD/ADRD) research has advanced gene and biomarker technologies to aid identification of individuals at risk for dementia. This innovation is a lynchpin in development of disease-modifying therapies. The emerging science could transform outcomes for patients and families. However, current limitations in the racial representation and inclusion of racial diversity in research limits the relevance of these technologies: AD/ADRD research cohorts used to define biomarker cutoffs are mostly White, despite clinical and epidemiologic research that shows Black populations are among those experiencing the greatest burdens of AD/ADRD. White cohorts alone are insufficient to characterize heterogeneity in disease and in life experiences that can alter AD/ADRD's courses. The National Institute on Aging (NIA) has called for increased racial diversity in AD/ADRD research. While scientists are working to implement NIA's plan to build more diverse research cohorts, they are also seeking out opportunities to consider race in AD/ADRD research. Recently, scientists have posed two ways of including race in AD/ADRD research: ancestry-based verification of race and race-based adjustment of biomarker test results. Both warrant careful examination for how they are impacting AD/ADRD science with respect to specific study objectives and the broader mission of the field. If these research methods are not grounded in pursuit of equity and justice, biases they introduce into AD/ADRD science could perpetuate, or even worsen, disparities in AD/ADRD research and care.
Affiliation(s)
- Shana D. Stites
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Norma B. Coe
  - Department of Medical Ethics and Health Policy, Perelman School of Medicine and Co-Director of the Population Aging Research Center (PARC), University of Pennsylvania, Philadelphia, PA, USA
|
15
|
Sherif FF, Ahmed KS. Unsupervised clustering of SARS-CoV-2 using deep convolutional autoencoder. Journal of Engineering and Applied Science 2022. [PMCID: PMC9383682 DOI: 10.1186/s44147-022-00125-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
SARS-CoV-2’s population structure might have a substantial impact on public health management and diagnostics if it can be identified. It is critical to rapidly monitor and characterize their lineages circulating globally for a more accurate diagnosis, improved care, and faster treatment. For a clearer picture of the SARS-CoV-2 population structure, clustering the sequencing data is essential. Here, deep clustering techniques were used to automatically group 29,017 different strains of SARS-CoV-2 into clusters. We aim to identify the main clusters of SARS-CoV-2 population structure based on convolutional autoencoder (CAE) trained with numerical feature vectors mapped from coronavirus Spike peptide sequences. Our clustering findings revealed that there are six large SARS-CoV-2 population clusters (C1, C2, C3, C4, C5, C6). These clusters contained 43 unique lineages in which the 29,017 publicly accessible strains were dispersed. In all the resulting six clusters, the genetic distances within the same cluster (intra-cluster distances) are less than the distances between inter-clusters (P-value 0.0019, Wilcoxon rank-sum test). This indicates substantial evidence of a connection between the cluster’s lineages. Furthermore, comparisons of the K-means and hierarchical clustering methods have been examined against the proposed deep learning clustering method. The intra-cluster genetic distances of the proposed method were smaller than those of K-means alone and hierarchical clustering methods. We used T-distributed stochastic-neighbor embedding (t-SNE) to show the outcomes of the deep learning clustering. The strains were isolated correctly between clusters in the t-SNE plot. Our results showed that the (C5) cluster exclusively includes Gamma lineage (P.1) only, suggesting that strains of P.1 in C5 are more diversified than those in the other clusters. 
Our study indicates that the genetic similarity between strains in the same cluster enables a better understanding of the major features of the unknown population lineages when compared to some of the more prevalent viral isolates. This information helps researchers figure out how the virus changed over time and spread to people all over the world.
|
16
|
Knight SC, McCurdy SR, Rhead B, Coignet MV, Park DS, Roberts GHL, Berkowitz ND, Zhang M, Turissini D, Delgado K, Pavlovic M, Haug Baltzell AK, Guturu H, Rand KA, Girshick AR, Hong EL, Ball CA. COVID-19 susceptibility and severity risks in a cross-sectional survey of over 500 000 US adults. BMJ Open 2022; 12:e049657. [PMID: 36223959 PMCID: PMC9561492 DOI: 10.1136/bmjopen-2021-049657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
Abstract
OBJECTIVES The enormous toll of the COVID-19 pandemic has heightened the urgency of collecting and analysing population-scale datasets in real time to monitor and better understand the evolving pandemic. The objectives of this study were to examine the relationship of risk factors to COVID-19 susceptibility and severity and to develop risk models to accurately predict COVID-19 outcomes using rapidly obtained self-reported data. DESIGN A cross-sectional study. SETTING AncestryDNA customers in the USA who consented to research. PARTICIPANTS The AncestryDNA COVID-19 Study collected self-reported survey data on symptoms, outcomes, risk factors and exposures for over 563 000 adult individuals in the USA in just under 4 months, including over 4700 COVID-19 cases as measured by a self-reported positive test. RESULTS We replicated previously reported associations between several risk factors and COVID-19 susceptibility and severity outcomes, and additionally found that differences in known exposures accounted for many of the susceptibility associations. A notable exception was elevated susceptibility for men even after adjusting for known exposures and age (adjusted OR=1.36, 95% CI=1.19 to 1.55). We also demonstrated that self-reported data can be used to build accurate risk models to predict individualised COVID-19 susceptibility (area under the curve (AUC)=0.84) and severity outcomes including hospitalisation and critical illness (AUC=0.87 and 0.90, respectively). The risk models achieved robust discriminative performance across different age, sex and genetic ancestry groups within the study. CONCLUSIONS The results highlight the value of self-reported epidemiological data to rapidly provide public health insights into the evolving COVID-19 pandemic.
Affiliation(s)
- Miao Zhang
  - Ancestry.com, San Francisco, California, USA
|
17
|
Avadhanam S, Williams AL. Simultaneous inference of parental admixture proportions and admixture times from unphased local ancestry calls. Am J Hum Genet 2022; 109:1405-1420. [PMID: 35908549 PMCID: PMC9388397 DOI: 10.1016/j.ajhg.2022.06.016] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/05/2022] [Accepted: 06/24/2022] [Indexed: 02/06/2023] Open
Abstract
Population genetic analyses of local ancestry tracts routinely assume that the ancestral admixture process is identical for both parents of an individual, an assumption that may be invalid when considering recent admixture. Here, we present Parental Admixture Proportion Inference (PAPI), a Bayesian tool for inferring the admixture proportions and admixture times for each parent of a single admixed individual. PAPI analyzes unphased local ancestry tracts and has two components: a binomial model that leverages genome-wide ancestry fractions to infer parental admixture proportions and a hidden Markov model (HMM) that infers admixture times from tract lengths. Crucially, the HMM accounts for unobserved within-ancestry recombination by approximating the pedigree crossover dynamics, enabling inference of parental admixture times. In simulations, we find that PAPI's admixture proportion estimates deviate from the truth by 0.047 on average, outperforming ANCESTOR and PedMix by 46.0% and 57.6%, respectively. Moreover, PAPI's admixture time estimates were strongly correlated with the truth (R = 0.76) but have an average downward bias of 1.01 generations that is partly attributable to inaccuracies in local ancestry inference. As an illustration of its utility, we ran PAPI on African American genotypes from the PAGE study (N = 5,786) and found strong evidence of assortative mating by ancestry proportion: couples' ancestry proportions are highly correlated (R = 0.87) and are closer to each other than expected under random mating (p < 10⁻⁶). We anticipate that PAPI will be useful in studying the population dynamics of admixture and will also be of interest to individuals seeking to learn about their personal genealogies.
Affiliation(s)
- Siddharth Avadhanam
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Amy L Williams
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
|
18
|
Caro-Consuegra R, Nieves-Colón MA, Rawls E, Rubin-de-Celis V, Lizárraga B, Vidaurre T, Sandoval K, Fejerman L, Stone AC, Moreno-Estrada A, Bosch E. Uncovering signals of positive selection in Peruvian populations from three ecological regions. Mol Biol Evol 2022; 39:6647595. [PMID: 35860855 PMCID: PMC9356722 DOI: 10.1093/molbev/msac158] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022] Open
Abstract
Perú hosts extremely diverse ecosystems which can be broadly classified into three major ecoregions: the Pacific desert coast, the Andean highlands, and the Amazon rainforest. Since its initial peopling approximately 12,000 years ago, the populations inhabiting such ecoregions might have differentially adapted to their contrasting environmental pressures. Previous studies have described several candidate genes underlying adaptation to hypobaric hypoxia among Andean highlanders. However, the adaptive genetic diversity of coastal and rainforest populations has been less studied. Here, we gathered genome-wide SNP-array data from 286 Peruvians living across the three ecoregions and analysed signals of recent positive selection through population differentiation and haplotype-based selection scans. Among highland populations, we identify candidate genes related to cardiovascular function (TLL1, DUSP27, TBX5, PLXNA4, SGCD), to the Hypoxia-Inducible Factor pathway (TGFA, APIP), to skin pigmentation (MITF), as well as to glucose (GLIS3) and glycogen metabolism (PPP1R3C, GANC). In contrast, most signatures of adaptation in coastal and rainforest populations comprise candidate genes related to the immune system (including SIGLEC8, TRIM21, CD44 and ICAM1 in the coast; CBLB and PRDM1 in rainforest and the BRD2-HLA-DOA-HLA-DPA1 region in both), possibly as a result of strong pathogen-driven selection. This study identifies candidate genes related to human adaptation to the diverse environments of South America.
Affiliation(s)
- Rocio Caro-Consuegra
  - Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Maria A Nieves-Colón
  - Laboratorio Nacional de Genómica para la Biodiversidad, Unidad de Genómica Avanzada (UGA-LANGEBIO), CINVESTAV, Irapuato, Guanajuato, Mexico
  - School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
  - Department of Anthropology, University of Minnesota Twin Cities, Minneapolis, MN, USA
- Erin Rawls
  - School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
- Verónica Rubin-de-Celis
  - Laboratorio de Genómica Molecular Evolutiva, Instituto de Ciencia y Tecnología, Universidad Ricardo Palma, Lima, Perú
- Beatriz Lizárraga
  - Emeritus Professor, Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Perú
- Karla Sandoval
  - Laboratorio Nacional de Genómica para la Biodiversidad, Unidad de Genómica Avanzada (UGA-LANGEBIO), CINVESTAV, Irapuato, Guanajuato, Mexico
- Laura Fejerman
  - Department of Public Health Sciences, University of California Davis, Davis, CA, USA
- Anne C Stone
  - School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, USA
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Andrés Moreno-Estrada
  - Laboratorio Nacional de Genómica para la Biodiversidad, Unidad de Genómica Avanzada (UGA-LANGEBIO), CINVESTAV, Irapuato, Guanajuato, Mexico
- Elena Bosch
  - Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
  - Centro de Investigación Biomédica en Red de Salud Mental (CIBERSAM), Reus, Spain
|
19
|
Revealing the recent demographic history of Europe via haplotype sharing in the UK Biobank. Proc Natl Acad Sci U S A 2022; 119:e2119281119. [PMID: 35696575 PMCID: PMC9233301 DOI: 10.1073/pnas.2119281119] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 02/01/2023] Open
Abstract
Haplotype-based analyses have recently been leveraged to interrogate the fine-scale structure in specific geographic regions, notably in Europe, although an equivalent haplotype-based understanding across the whole of Europe with these tools is lacking. Furthermore, study of identity-by-descent (IBD) sharing in a large sample of haplotypes across Europe would allow a direct comparison between different demographic histories of different regions. The UK Biobank (UKBB) is a population-scale dataset of genotype and phenotype data collected from the United Kingdom, with established sampling of worldwide ancestries. The exact content of these non-UK ancestries is largely uncharacterized, where study could highlight valuable intracontinental ancestry references with deep phenotyping within the UKBB. In this context, we sought to investigate the sample of European ancestry captured in the UKBB. We studied the haplotypes of 5,500 UKBB individuals with a European birthplace; investigated the population structure and demographic history in Europe, showing in parallel the variety of footprints of demographic history in different genetic regions around Europe; and expand knowledge of the genetic landscape of the east and southeast of Europe. Providing an updated map of European genetics, we leverage IBD-segment sharing to explore the extent of population isolation and size across the continent. In addition to building and expanding upon previous knowledge in Europe, our results show the UKBB as a source of diverse ancestries beyond Britain. These worldwide ancestries sampled in the UKBB may complement and inform researchers interested in specific communities or regions not limited to Britain.
|
20
|
Using an Unsupervised Clustering Model to Detect the Early Spread of SARS-CoV-2 Worldwide. Genes (Basel) 2022; 13:genes13040648. [PMID: 35456454 PMCID: PMC9030792 DOI: 10.3390/genes13040648] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/27/2022] [Revised: 03/29/2022] [Accepted: 04/05/2022] [Indexed: 02/04/2023] Open
Abstract
Deciphering the population structure of SARS-CoV-2 is critical to inform public health management and reduce the risk of future dissemination. With the continuous accruing of SARS-CoV-2 genomes worldwide, discovering an effective way to group these genomes is critical for organizing the landscape of the population structure of the virus. Taking advantage of recently published state-of-the-art machine learning algorithms, we used an unsupervised deep learning clustering algorithm to group a total of 16,873 SARS-CoV-2 genomes. Using single nucleotide polymorphisms as input features, we identified six major subtypes of SARS-CoV-2. The proportions of the clusters across the continents revealed distinct geographical distributions. Comprehensive analysis indicated that both genetic factors and human migration factors shaped the specific geographical distribution of the population structure. This study provides a different approach using clustering methods to study the population structure of a never-seen-before and fast-growing species such as SARS-CoV-2. Moreover, clustering techniques can be used for further studies of local population structures of the proliferating virus.
|
21
|
Coop G, Przeworski M. Lottery, luck, or legacy. A review of "The Genetic Lottery: Why DNA matters for social equality". Evolution 2022; 76:846-853. [PMID: 35225362 PMCID: PMC9313868 DOI: 10.1111/evo.14449] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/21/2022] [Accepted: 01/26/2022] [Indexed: 01/30/2023]
Abstract
A book review of "The genetic lottery: why DNA matters for social equality." (Princeton University Press, 2021) by Kathryn Paige Harden.
Affiliation(s)
- Graham Coop
- Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, Davis, California, USA
| | - Molly Przeworski
- Department of Biological Sciences and Department of Systems Biology, Columbia University, New York, USA
| |
|
22
|
Parcha V, Heindl B, Kalra R, Bress A, Rao S, Pandey A, Gower B, Irvin MR, McDonald MLN, Li P, Arora G, Arora P. Genetic European Ancestry and Incident Diabetes in Black Individuals: Insights From the SPRINT Trial. Circulation: Genomic and Precision Medicine 2022; 15:e003468. [PMID: 35089798 PMCID: PMC8847245 DOI: 10.1161/circgen.121.003468] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/03/2023]
Abstract
BACKGROUND Black individuals have high incident diabetes risk, despite having paradoxically lower triglyceride and higher HDL (high-density lipoprotein) cholesterol levels. The basis of this is poorly understood. We evaluated the participants of SPRINT (Systolic Blood Pressure Intervention Trial) to assess the association of estimated European genetic ancestry with the risk of incident diabetes in self-identified Black individuals. METHODS Self-identified non-Hispanic Black SPRINT participants free of diabetes at baseline were included. Black participants were stratified into tertiles (T1-T3) of European ancestry proportions estimated using 106 biallelic ancestry informative genetic markers. The multivariable-adjusted association of European ancestry proportion with indices of baseline metabolic syndrome (ie, fasting plasma glucose, triglycerides, HDL cholesterol, body mass index, and blood pressure) was assessed. Multivariable-adjusted Cox regression determined the risk of incident diabetes (fasting plasma glucose ≥126 mg/dL or self-reported diabetes treatment) across tertiles of European ancestry proportion. RESULTS Among 2466 Black SPRINT participants, a higher European ancestry proportion was independently associated with higher baseline triglyceride and lower HDL cholesterol levels (P<0.001 for both). European ancestry proportion was not associated with baseline fasting plasma glucose, body mass index, and blood pressure (P>0.05). Compared with the first tertile, those in the second (hazard ratio, 0.64 [95% CI, 0.45-0.90]) and third tertiles (hazard ratio, 0.61 [95% CI, 0.44-0.89]) of the European ancestry proportion had a lower risk of incident diabetes. A 5% point higher European ancestry was associated with a 29% lower risk of incident diabetes (hazard ratio, 0.71 [95% CI, 0.55-0.93]). 
There was no evidence of a differential association between the European ancestry proportion tertiles and incident diabetes between those randomized to intensive versus standard blood pressure treatment. CONCLUSIONS The higher risk of incident diabetes in Black individuals may have genetic determinants in addition to adverse social factors. Further research may help understand the interplay between biological and social determinants of cardiometabolic health in Black individuals. Registration: URL: https://www.clinicaltrials.gov; Unique identifier: NCT01206062.
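The hazard ratios quoted in this abstract can be read as percent changes in hazard relative to the reference group. The following is a minimal sketch of that arithmetic only; the helper name `percent_risk_change` is hypothetical and is not taken from the study, and the inputs are the point estimates reported above.

```python
def percent_risk_change(hazard_ratio: float) -> float:
    """Percent change in hazard versus the reference group.

    A negative value means a lower hazard than the reference.
    (Hypothetical helper for illustration; not from the study.)
    """
    return (hazard_ratio - 1.0) * 100.0


# Point estimates as reported in the abstract.
reported = [
    ("second vs first tertile", 0.64),
    ("third vs first tertile", 0.61),
    ("per 5-point higher European ancestry", 0.71),
]

for label, hr in reported:
    print(f"{label}: HR={hr:.2f} -> {percent_risk_change(hr):+.0f}% hazard")
```

The last line corresponds to the "29% lower risk" stated in the abstract; the confidence intervals would be converted the same way.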
Affiliation(s)
- Vibhu Parcha
- Division of Cardiovascular Disease, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Brittain Heindl
- Division of Cardiovascular Disease, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Rajat Kalra
- Cardiovascular Division, University of Minnesota, Minneapolis, MN, USA
| | - Adam Bress
- Department of Population Health Sciences, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Shreya Rao
- Division of Cardiology, Department of Internal Medicine, UT Southwestern Medical Center, Dallas, TX, USA
| | - Ambarish Pandey
- Division of Cardiology, Department of Internal Medicine, UT Southwestern Medical Center, Dallas, TX, USA
| | - Barbara Gower
- Department of Nutrition Sciences, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Marguerite R. Irvin
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Merry-Lynn N. McDonald
- Division of Pulmonary, Allergy, and Critical Care, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Peng Li
- School of Nursing, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Garima Arora
- Division of Cardiovascular Disease, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Pankaj Arora
- Division of Cardiovascular Disease, University of Alabama at Birmingham, Birmingham, AL, USA
- Section of Cardiology, Birmingham Veterans Affairs Medical Center, Birmingham, AL, USA
| |
|
23
|
Karim MR, Cochez M, Zappa A, Sahay R, Rebholz-Schuhmann D, Beyan O, Decker S. Convolutional Embedded Networks for Population Scale Clustering and Bio-Ancestry Inferencing. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:369-382. [PMID: 32750845 DOI: 10.1109/tcbb.2020.2994649] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
The study of genetic variants (GVs) can help find correlating population groups and to identify cohorts that are predisposed to common diseases and explain differences in disease susceptibility and how patients react to drugs. Machine learning techniques are increasingly being applied to identify interacting GVs to understand their complex phenotypic traits. Since the performance of a learning algorithm not only depends on the size and nature of the data but also on the quality of underlying representation, deep neural networks (DNNs) can learn non-linear mappings that allow transforming GVs data into more clustering and classification friendly representations than manual feature selection. In this paper, we propose convolutional embedded networks (CEN) in which we combine two DNN architectures called convolutional embedded clustering (CEC) and convolutional autoencoder (CAE) classifier for clustering individuals and predicting geographic ethnicity based on GVs, respectively. We employed CAE-based representation learning to 95 million GVs from the '1000 genomes' (covering 2,504 individuals from 26 ethnic origins) and 'Simons genome diversity' (covering 279 individuals from 130 ethnic origins) projects. Quantitative and qualitative analyses with a focus on accuracy and scalability show that our approach outperforms state-of-the-art approaches such as VariantSpark and ADMIXTURE. In particular, CEC can cluster targeted population groups in 22 hours with an adjusted Rand index (ARI) of 0.915, the normalized mutual information (NMI) of 0.92, and the clustering accuracy (ACC) of 89 percent. Contrarily, the CAE classifier can predict the geographic ethnicity of unknown samples with an F1 and Matthews correlation coefficient (MCC) score of 0.9004 and 0.8245, respectively. Further, to provide interpretations of the predictions, we identify significant biomarkers using gradient boosted trees (GBT) and SHapley Additive exPlanations (SHAP). 
Overall, our approach is transparent and faster than the baseline methods, and scalable for 5 to 100 percent of the full human genome.
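The adjusted Rand index (ARI) reported in this abstract compares a clustering against reference labels while correcting for chance agreement. Below is a minimal pure-Python sketch of the standard pair-counting formula; it is an illustration only, not the authors' evaluation code, and the function name and toy labels are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index from two equal-length label sequences.

    Uses the pair-counting (contingency table) formulation.
    Illustrative sketch; not taken from the cited paper.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one sample: trivially identical partitions

    # Pairs of samples that fall in the same cell, row, or column.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / total_pairs  # chance-level agreement
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster names differ, but the partition is the same.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the index depends only on which samples are grouped together, relabeling clusters does not change the score.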
|
24
|
Motazedi E, Cheng W, Thomassen JQ, Frei O, Rongve A, Athanasiu L, Bahrami S, Shadrin A, Ulstein I, Stordal E, Brækhus A, Saltvedt I, Sando SB, O’Connell KS, Hindley G, van der Meer D, Bergh S, Nordestgaard BG, Tybjærg-Hansen A, Bråthen G, Pihlstrøm L, Djurovic S, Frikke-Schmidt R, Fladby T, Aarsland D, Selbæk G, Seibert TM, Dale AM, Fan CC, Andreassen OA. Using Polygenic Hazard Scores to Predict Age at Onset of Alzheimer's Disease in Nordic Populations. J Alzheimers Dis 2022; 88:1533-1544. [PMID: 35848024 PMCID: PMC10022308 DOI: 10.3233/jad-220174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
BACKGROUND Polygenic hazard scores (PHS) estimate age-dependent genetic risk of late-onset Alzheimer's disease (AD), but there is limited information about the performance of PHS on real-world data where the population of interest differs from the model development population and part of the model genotypes are missing or need to be imputed. OBJECTIVE The aim of this study was to estimate age-dependent risk of late-onset AD using polygenic predictors in Nordic populations. METHODS We used the Desikan PHS model, based on the Cox proportional hazards assumption, to obtain age-dependent hazard scores for AD from individual genotypes in the Norwegian DemGene cohort (n = 2,772). We assessed the risk discrimination and calibration of the Desikan model and extended it by adding new genotype markers (the Desikan Nordic model). Finally, we evaluated both the Desikan and Desikan Nordic models in two independent Danish cohorts: The Copenhagen City Heart Study (CCHS) cohort (n = 7,643) and The Copenhagen General Population Study (CGPS) cohort (n = 10,886). RESULTS We showed a robust prediction efficiency of the Desikan model in stratifying AD risk groups in Nordic populations, even when some of the model SNPs were missing or imputed. We attempted to improve the Desikan PHS model by adding new SNPs to it, but we still achieved similar risk discrimination and calibration with the extended model. CONCLUSION PHS modeling has the potential to guide the timing of treatment initiation based on individual risk profiles and can help enrich clinical trials with people at high risk of AD in Nordic populations.
Affiliation(s)
- Ehsan Motazedi
- NORMENT Centre, Institute of Clinical Medicine, University of Oslo and Division of Mental Health and Addiction, Oslo University Hospital, 0407 Oslo, Norway
| | - Weiqiu Cheng
- NORMENT Centre, Institute of Clinical Medicine, University of Oslo and Division of Mental Health and Addiction, Oslo University Hospital, 0407 Oslo, Norway
| | - Jesper Q. Thomassen
- Department of Clinical Biochemistry, Copenhagen University Hospital – Rigshospitalet, 2100 Copenhagen, Denmark
| | - Oleksandr Frei
- NORMENT Centre, Institute of Clinical Medicine, University of Oslo and Division of Mental Health and Addiction, Oslo University Hospital, 0407 Oslo, Norway
- Center for Bioinformatics, Department of Informatics, University of Oslo, PO box 1080, Blindern, 0316 Oslo, Norway
| | - Arvid Rongve
- Department of Clinical Medicine, University of Bergen, 5020 Bergen, Norway
| | - Lavinia Athanasiu
- NORMENT Centre, Institute of Clinical Medicine, University of Oslo and Division of Mental Health and Addiction, Oslo University Hospital, 0407 Oslo, Norway
| | - Shahram Bahrami
- NORMENT Centre, Institute of Clinical Medicine, University of Oslo and Division of Mental Health and Addiction, Oslo University Hospital, 0407 Oslo, Norway
| | - Alexey Shadrin
- NORMENT Centre, Institute of Clinical Medicine, University of Oslo and Division of Mental Health and Addiction, Oslo University Hospital, 0407 Oslo, Norway
| | - Ingun Ulstein
- Department of Geriatric Medicine, Oslo University Hospital, Ullevål, 0424 Oslo, Norway
| | - Eystein Stordal
- Department of Neuromedicine and Movement Science (INB), NTNU, Faculty of Medicine and Health Sciences, N-7491 Trondheim, Norway
- Clinic of Psychiatry, Namsos Hospital, 7801 Namsos, Norway
| | - Anne Brækhus
- Department of Geriatric Medicine, Oslo University Hospital, Ullevål, 0424 Oslo, Norway
- Department of Neurology, Oslo University Hospital, 0424 Oslo, Norway
| | - Ingvild Saltvedt
- Department of Neuromedicine and Movement Science (INB), NTNU, Faculty of Medicine and Health Sciences, N-7491 Trondheim, Norway
- Department of geriatric medicine, Clinic of Medicine, St. Olavs Hospital, Trondheim university hospital, Trondheim, Norway
| | - Sigrid B. Sando
- Department of Neuromedicine and Movement Science (INB), NTNU, Faculty of Medicine and Health Sciences, N-7491 Trondheim, Norway
- University Hospital of Trondheim, Department of Neurology and Clinical Neurophysiology, Postboks 3250 Torgarden, N-7006 Trondheim, Norway
| | - Kevin S. O’Connell
- NORMENT Centre, Institute of Clinical Medicine, University of Oslo and Division of Mental Health and Addiction, Oslo University Hospital, 0407 Oslo, Norway
| | - Guy Hindley
- NORMENT Centre, Institute of Clinical Medicine, University of Oslo and Division of Mental Health and Addiction, Oslo University Hospital, 0407 Oslo, Norway
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 16 De Crespigny Park, London, SE5 8AB
| | - Dennis van der Meer
- NORMENT Centre, Institute of Clinical Medicine, University of Oslo and Division of Mental Health and Addiction, Oslo University Hospital, 0407 Oslo, Norway
- School for Mental Health and Neuroscience, Maastricht University, the Netherlands
| | - Sverre Bergh
- Research center for Age-related Functional Decline and Disease, Innlandet Hospital Trust, 2381 Brumunddal, Norway
- Norwegian National Centre for Ageing and Health, Vestfold Hospital Trust, 3103 Tønsberg, Norway
| | - Børge G. Nordestgaard
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Department of Clinical Biochemistry, Copenhagen University Hospital – Herlev Gentofte, 2730 Herlev, Denmark
| | - Anne Tybjærg-Hansen
- Department of Clinical Biochemistry, Copenhagen University Hospital – Rigshospitalet, 2100 Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Geir Bråthen
- Department of Neuromedicine and Movement Science (INB), NTNU, Faculty of Medicine and Health Sciences, N-7491 Trondheim, Norway
- University Hospital of Trondheim, Department of Neurology and Clinical Neurophysiology, Postboks 3250 Torgarden, N-7006 Trondheim, Norway
| | - Lasse Pihlstrøm
- Department of Neurology, Oslo University Hospital, 0424 Oslo, Norway
| | - Srdjan Djurovic
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
- NORMENT Centre, Department of Clinical Science, University of Bergen, Bergen, Norway
| | - Ruth Frikke-Schmidt
- Department of Clinical Biochemistry, Copenhagen University Hospital – Rigshospitalet, 2100 Copenhagen, Denmark
- Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
| | - Tormod Fladby
- Department of Neuromedicine and Movement Science (INB), NTNU, Faculty of Medicine and Health Sciences, N-7491 Trondheim, Norway
- Klinikk for indremedisin og lab fag (AHUSKIL), Akershus University Hospital, 1478 Lørenskog, Norway
| | - Dag Aarsland
- Department of Old-Age Psychiatry, Stavanger University Hospital, 4011 Stavanger, Norway
- Department of Old Age Psychiatry, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, PO Box P070, De Crespigny Park, London SE5 8AF
| | - Geir Selbæk
- Department of Geriatric Medicine, Oslo University Hospital, Ullevål, 0424 Oslo, Norway
- Norwegian National Centre for Ageing and Health, Vestfold Hospital Trust, 3103 Tønsberg, Norway
- Faculty of Medicine, University of Oslo, PO BOX 1078 Blindern, 0316 Oslo, Norway
| | - Tyler M. Seibert
- NORMENT Centre, Institute of Clinical Medicine, University of Oslo and Division of Mental Health and Addiction, Oslo University Hospital, 0407 Oslo, Norway
- Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Department of Radiology, University of California San Diego, La Jolla, CA, USA
- Department of Radiation Medicine, University of California San Diego, La Jolla, CA, USA
- Department of Bioengineering, University of California San Diego, La Jolla, CA
| | - Anders M. Dale
- Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Department of Radiology, University of California San Diego, La Jolla, CA, USA
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Department of Neurosciences, University of California San Diego, La Jolla, CA, USA
| | - Chun C. Fan
- Center for Multimodal Imaging and Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Department of Cognitive Science, University of California San Diego, La Jolla, CA, USA
- Population Neuroscience and Genetics Lab, University of California San Diego, La Jolla, CA, USA
| | - Ole A. Andreassen
- NORMENT Centre, Institute of Clinical Medicine, University of Oslo and Division of Mental Health and Addiction, Oslo University Hospital, 0407 Oslo, Norway
| |
|
25
|
Kun Á. Is there still evolution in the human population? Biol Futur 2022; 73:359-374. [PMID: 36592324 PMCID: PMC9806833 DOI: 10.1007/s42977-022-00146-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2021] [Accepted: 12/08/2022] [Indexed: 01/03/2023]
Abstract
It is often claimed that humanity has stopped evolving because modern medicine erased all selection on survival. Even if that were true, and it is not, there would be other mechanisms of evolution which could still lead to changes in allelic frequencies. Here I show, by applying basic evolutionary genetics knowledge, that we expect humanity to evolve. The results from genome sequencing projects have repeatedly affirmed that there are still recent signs of selection in our genomes. I give some examples of such adaptation. Then I briefly discuss what our evolutionary future has in store for us.
Affiliation(s)
- Ádám Kun
- Department of Plant Systematics, Ecology and Theoretical Biology, Eötvös University, Budapest, Hungary
- Parmenides Center for the Conceptual Foundations of Science, Pöcking, Germany
- Institute of Evolution, Centre for Ecological Research, Budapest, Hungary
- MTA-ELTE Theoretical Biology and Evolutionary Ecology Research Group, Budapest, Hungary
- MTA-ELTE-MTM Ecology Research Group, Budapest, Hungary
| |
|
26
|
Flores-Bello A, Font-Porterias N, Aizpurua-Iraola J, Duarri-Redondo S, Comas D. The genetic scenario of Mercheros: an under-represented group within the Iberian Peninsula. BMC Genomics 2021; 22:897. [PMID: 34911433 PMCID: PMC8672588 DOI: 10.1186/s12864-021-08203-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2021] [Accepted: 11/18/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The general picture of human genetic variation has been vastly depicted in the last years, yet many populations remain broadly understudied. In this work, we analyze for the first time the Merchero population, a Spanish minority ethnic group that has been scarcely studied and historically persecuted. Mercheros have been roughly characterised by an itinerant history, common traditional occupations, and the usage of their own language. RESULTS Here, we examine the demographic history and genetic scenario of Mercheros, by using genome-wide array data, whole mitochondrial sequences, and Y chromosome STR markers from 25 individuals. These samples have been complemented with a wide-range of present-day populations from Western Eurasia and North Africa. Our results show that the genetic diversity of Mercheros is explained within the context of the Iberian Peninsula, evidencing a modest signal of Roma admixture. In addition, Mercheros present low genetic isolation and intrapopulation heterogeneity. CONCLUSIONS This study represents the first genetic characterisation of the Merchero population, depicting their fine-scale ancestry components and genetic scenario within the Iberian Peninsula. Since ethnicity is not only influenced by genetic ancestry but also cultural factors, other studies from multiple disciplines are needed to further explore the Merchero population. As with Mercheros, there is a considerable gap of underrepresented populations and ethnic groups in publicly available genetic data. Thus, we encourage the consideration of more ethnically diverse population panels in human genetic studies, as an attempt to improve the representation of human populations and better reconstruct their fine-scale history.
Affiliation(s)
- André Flores-Bello
- Departament de Ciències de la Salut i de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, 08003, Barcelona, Spain
| | - Neus Font-Porterias
- Departament de Ciències de la Salut i de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, 08003, Barcelona, Spain
| | - Julen Aizpurua-Iraola
- Departament de Ciències de la Salut i de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, 08003, Barcelona, Spain
| | - Sara Duarri-Redondo
- Departament de Ciències de la Salut i de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, 08003, Barcelona, Spain
| | - David Comas
- Departament de Ciències de la Salut i de la Vida, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, 08003, Barcelona, Spain.
| |
|
27
|
Li Y, Liu Q, Zeng Z, Luo Y. Unsupervised clustering analysis of SARS-Cov-2 population structure reveals six major subtypes at early stage across the world. bioRxiv: the preprint server for biology 2021:2020.09.04.283358. [PMID: 34845455 PMCID: PMC8629198 DOI: 10.1101/2020.09.04.283358] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/14/2023]
Abstract
Identifying the population structure of the newly emerged coronavirus SARS-CoV-2 has significant potential to inform public health management and diagnosis. As SARS-CoV-2 sequencing data accrued, grouping them into clusters is important for organizing the landscape of the population structure of the virus. Due to the limited prior information on the newly emerged coronavirus, we utilized four different clustering algorithms to group 16,873 SARS-CoV-2 strains, which automatically enables the identification of spatial structure for SARS-CoV-2. A total of six distinct genomic clusters were identified using mutation profiles as input features. Comparison of the clustering results reveals that the four algorithms produced highly consistent results, but the state-of-the-art unsupervised deep learning clustering algorithm performed best and produced the smallest intra-cluster pairwise genetic distances. The varied proportions of the six clusters within different continents revealed specific geographical distributions. In particular, our analysis found that Oceania was the only continent on which the strains were dispersively distributed into six clusters. In summary, this study provides a concrete framework for the use of clustering methods to study the global population structure of SARS-CoV-2. In addition, clustering methods can be used for future studies of variant population structures in specific regions of these fast-growing viruses.
Affiliation(s)
- Yawei Li
- Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Qingyun Liu
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
| | - Zexian Zeng
- Department of Data Science, Dana Farber Cancer Institute, Harvard T. H. Chan School of Public Health, Boston, MA 02115, USA
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, Chicago, IL 60611, USA
| |
|
28
|
Hateley S, Lopez-Izquierdo A, Jou CJ, Cho S, Schraiber JG, Song S, Maguire CT, Torres N, Riedel M, Bowles NE, Arrington CB, Kennedy BJ, Etheridge SP, Lai S, Pribble C, Meyers L, Lundahl D, Byrnes J, Granka JM, Kauffman CA, Lemmon G, Boyden S, Scott Watkins W, Karren MA, Knight S, Brent Muhlestein J, Carlquist JF, Anderson JL, Chahine KG, Shah KU, Ball CA, Benjamin IJ, Yandell M, Tristani-Firouzi M. The history and geographic distribution of a KCNQ1 atrial fibrillation risk allele. Nat Commun 2021; 12:6442. [PMID: 34750360 PMCID: PMC8575962 DOI: 10.1038/s41467-021-26741-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/13/2020] [Accepted: 10/20/2021] [Indexed: 11/08/2022] Open
Abstract
The genetic architecture of atrial fibrillation (AF) encompasses low impact, common genetic variants and high impact, rare variants. Here, we characterize a high impact AF-susceptibility allele, KCNQ1 R231H, and describe its transcontinental geographic distribution and history. Induced pluripotent stem cell-derived cardiomyocytes procured from risk allele carriers exhibit abbreviated action potential duration, consistent with a gain-of-function effect. Using identity-by-descent (IBD) networks, we estimate the broad- and fine-scale population ancestry of risk allele carriers and their relatives. Analysis of ancestral migration routes reveals ancestors who inhabited Denmark in the 1700s, migrated to the Northeastern United States in the early 1800s, and traveled across the Midwest to arrive in Utah in the late 1800s. IBD/coalescent-based allele dating analysis reveals a relatively recent origin of the AF risk allele (~5000 years). Thus, our approach broadens the scope of study for disease susceptibility alleles to the context of human migration and ancestral origins.
Affiliation(s)
- Chuanchau J Jou
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Scott Cho
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Colin T Maguire
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Natalia Torres
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Michael Riedel
- Cardiovascular Center, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Neil E Bowles
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Cammon B Arrington
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Brett J Kennedy
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Susan P Etheridge
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Shuping Lai
- Cardiovascular Center, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Chase Pribble
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Lindsay Meyers
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Derek Lundahl
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Christopher A Kauffman
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Gordon Lemmon
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Steven Boyden
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - W Scott Watkins
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Mary Anne Karren
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Khushi U Shah
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
| | - Ivor J Benjamin
- Cardiovascular Center, Medical College of Wisconsin, Milwaukee, WI, USA
| | - Mark Yandell
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Martin Tristani-Firouzi
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA.
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA.
| |
|
29
|
Zimmerman KD, Schurr TG, Chen W, Nayak U, Mychaleckyj JC, Quet Q, Moultrie LH, Divers J, Keene KL, Kamen DL, Gilkeson GS, Hunt KJ, Spruill IJ, Fernandes JK, Aldrich MC, Reich D, Garvey WT, Langefeld CD, Sale MM, Ramos PS. Genetic landscape of Gullah African Americans. American Journal of Physical Anthropology 2021; 175:905-919. [PMID: 34008864 PMCID: PMC8286328 DOI: 10.1002/ajpa.24333] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/23/2020] [Revised: 03/30/2021] [Accepted: 04/17/2021] [Indexed: 01/20/2023]
Abstract
OBJECTIVES Gullah African Americans are descendants of formerly enslaved Africans living in the Sea Islands along the coast of the southeastern U.S., from North Carolina to Florida. Their relatively high numbers and geographic isolation were conducive to the development and preservation of a unique culture that retains deep African features. Although historical evidence supports a West-Central African ancestry for the Gullah, linguistic and cultural evidence of a connection to Sierra Leone has led to the suggestion of this country/region as their ancestral home. This study sought to elucidate the genetic structure and ancestry of the Gullah. MATERIALS AND METHODS We leveraged whole-genome genotype data from Gullah, African Americans from Jackson, Mississippi, African populations from Sierra Leone, and population reference panels from Africa and Europe to infer population structure, ancestry proportions, and global estimates of admixture. RESULTS Relative to non-Gullah African Americans from the Southeast US, the Gullah exhibited higher mean African ancestry, lower European admixture, a similarly small Native American contribution, and increased male-biased European admixture. A slightly tighter bottleneck in the Gullah 13 generations ago suggests a largely shared demographic history with non-Gullah African Americans. Despite a slightly higher relatedness to populations from Sierra Leone, our data demonstrate that the Gullah are genetically related to many West African populations. DISCUSSION This study confirms that subtle differences in African American population structure exist at finer regional levels. Such observations can help to inform medical genetics research in African Americans, and guide the interpretation of genetic data used by African Americans seeking to explore ancestral identities.
Affiliation(s)
- Kip D. Zimmerman
- Center for Precision Medicine, Wake Forest School of Medicine, Winston‐Salem, North Carolina, USA
| | - Theodore G. Schurr
- Department of Anthropology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Wei‐Min Chen
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
| | - Uma Nayak
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
| | - Josyf C. Mychaleckyj
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
| | - Queen Quet
- Gullah/Geechee Nation, St. Helena Island, South Carolina, USA
| | - Lee H. Moultrie
- Lee H. Moultrie & Associates, North Charleston, South Carolina, USA
| | - Jasmin Divers
- Department of Health Services Research, New York University Winthrop Hospital, Mineola, New York, USA
| | - Keith L. Keene
- Department of Biology, East Carolina University, Greenville, North Carolina, USA
- Center for Health Disparities, East Carolina University Brody School of Medicine, Greenville, North Carolina, USA
| | - Diane L. Kamen
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Gary S. Gilkeson
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Kelly J. Hunt
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Ida J. Spruill
- College of Nursing, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Jyotika K. Fernandes
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
| | - Melinda C. Aldrich
- Department of Thoracic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - David Reich
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
| | - W. Timothy Garvey
- Department of Nutrition Science, University of Alabama at Birmingham, Birmingham, Alabama, USA
| | - Carl D. Langefeld
- Center for Precision Medicine, Wake Forest School of Medicine, Winston‐Salem, North Carolina, USA
| | - Michèle M. Sale
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
| | - Paula S. Ramos
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA
| |
|
30
|
Rapid detection of identity-by-descent tracts for mega-scale datasets. Nat Commun 2021; 12:3546. [PMID: 34112768 PMCID: PMC8192555 DOI: 10.1038/s41467-021-22910-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/03/2019] [Accepted: 04/01/2021] [Indexed: 01/08/2023] Open
Abstract
The ability to identify segments of genomes identical-by-descent (IBD) is a part of standard workflows in both statistical and population genetics. However, traditional methods for finding local IBD across all pairs of individuals scale poorly leading to a lack of adoption in very large-scale datasets. Here, we present iLASH, an algorithm based on similarity detection techniques that shows equal or improved accuracy in simulations compared to current leading methods and speeds up analysis by several orders of magnitude on genomic datasets, making IBD estimation tractable for millions of individuals. We apply iLASH to the PAGE dataset of ~52,000 multi-ethnic participants, including several founder populations with elevated IBD sharing, identifying IBD segments in ~3 minutes per chromosome compared to over 6 days for a state-of-the-art algorithm. iLASH enables efficient analysis of very large-scale datasets, as we demonstrate by computing IBD across the UK Biobank (~500,000 individuals), detecting 12.9 billion pairwise connections.
|
31
|
Pal M, Lace B, Labrie Y, Laflamme N, Rioux N, Setty ST, Dugas M, Gosselin L, Droit A, Chrestian N, Rivest S. A founder mutation in the PLPBP gene in families from Saguenay-Lac-St-Jean region affected by a pyridoxine-dependent epilepsy. JIMD Rep 2021; 59:32-41. [PMID: 33977028 PMCID: PMC8100403 DOI: 10.1002/jmd2.12196] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/24/2020] [Revised: 11/19/2020] [Accepted: 12/16/2020] [Indexed: 11/24/2022] Open
Abstract
Pyridoxine-dependent epilepsy (PDE) is a relatively rare subgroup of epileptic disorders. These disorders generally present in infancy as an early-onset epileptic encephalopathy or as seizures refractory to standard treatments, with rapid and variable responses to vitamin B6 treatment. Whole exome sequencing of three unrelated families identified the homozygous pathogenic mutation c.370_373del, p.Asp124fs in the PLPBP gene in five persons. Haplotype analysis showed a single shared profile for the affected persons and their parents, leading to the hypothesis of a founder effect for the mutation among French Canadians of the Saguenay-Lac-St-Jean region. All affected probands also shared a single mitochondrial haplotype, T2b3, and two rare variations in the mitochondrial genome, m.801A>G and m.5166A>G, suggesting that a single female individual introduced the PLPBP mutation c.370_373del, p.Asp124fs in Quebec. The mutation p.Asp124fs causes a severe disease phenotype with delayed myelination and cortical/subcortical brain atrophy. The most noteworthy radiological finding in this Quebec founder mutation is the presence of temporal cysts, which can be used as a marker of the disease. Also, both patients who are alive had a history of prenatal supplements with high doses of pyridoxine, taken by their mothers as antiemetic medication. In the context of suspected PDE in patients with neonatal refractory seizures, treatment with pyridoxine and/or pyridoxal-5-phosphate has to be started immediately and continued until the results of genetic analysis are received. Even with early appropriate treatment, the neurological outcome of our patient is still poor.
Affiliation(s)
- Maitou Pal
- Faculty of Medicine, Laval University, Québec, Québec, Canada
| | - Baiba Lace
- Department of Medical Genetics, Centre Mère Enfant Soleil, Laval University, Québec, Québec, Canada
| | - Yvan Labrie
- Centre de recherche CHU de Québec‐Université Laval, Laval University, Québec, Québec, Canada
| | - Nathalie Laflamme
- Centre de recherche CHU de Québec‐Université Laval, Laval University, Québec, Québec, Canada
| | - Nadie Rioux
- Centre de recherche CHU de Québec‐Université Laval, Laval University, Québec, Québec, Canada
| | - Samarth Thonta Setty
- Centre de recherche CHU de Québec‐Université Laval, Laval University, Québec, Québec, Canada
| | - Marc‐Andre Dugas
- Department of Pediatrics, Centre Mère Enfant Soleil, Laval University, Québec, Québec, Canada
| | - Louise Gosselin
- Centre de recherche CHU de Québec‐Université Laval, Laval University, Québec, Québec, Canada
| | - Arnaud Droit
- Centre de recherche CHU de Québec‐Université Laval, Laval University, Québec, Québec, Canada
| | - Nicolas Chrestian
- Department of Pediatric Neurology, Pediatric Neuromuscular Disorder, Centre Mère Enfant Soleil, Laval University, Québec, Québec, Canada
| | - Serge Rivest
- Centre de recherche CHU de Québec‐Université Laval, Laval University, Québec, Québec, Canada
| |
|
32
|
Belbin GM, Cullina S, Wenric S, Soper ER, Glicksberg BS, Torre D, Moscati A, Wojcik GL, Shemirani R, Beckmann ND, Cohain A, Sorokin EP, Park DS, Ambite JL, Ellis S, Auton A, Bottinger EP, Cho JH, Loos RJF, Abul-Husn NS, Zaitlen NA, Gignoux CR, Kenny EE. Toward a fine-scale population health monitoring system. Cell 2021; 184:2068-2083.e11. [PMID: 33861964 DOI: 10.1016/j.cell.2021.03.034] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Received: 09/23/2019] [Revised: 11/18/2020] [Accepted: 03/12/2021] [Indexed: 12/22/2022]
Abstract
Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.
Affiliation(s)
- Gillian M Belbin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Sinead Cullina
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Stephane Wenric
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Emily R Soper
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Benjamin S Glicksberg
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Denis Torre
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Arden Moscati
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Genevieve L Wojcik
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| | - Ruhollah Shemirani
- Information Science Institute, University of Southern California, Marina del Rey, CA 90089, USA
| | - Noam D Beckmann
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Ariella Cohain
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Elena P Sorokin
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| | - Danny S Park
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | - Jose-Luis Ambite
- Information Science Institute, University of Southern California, Marina del Rey, CA 90089, USA
| | - Steve Ellis
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Adam Auton
- Department of Genetics, Albert Einstein College of Medicine, New York, NY 10461, USA
| | -
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | -
- Regeneron Genetics Center, Tarrytown, New York, NY 10591, USA
| | - Erwin P Bottinger
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Judy H Cho
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Ruth J F Loos
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Noura S Abul-Husn
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Noah A Zaitlen
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA 90033, USA
| | - Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
| |
|
33
|
Fockler J, Kwang W, Ashford MT, Flenniken D, Hwang J, Truran D, Mackin RS, Jin C, O'Hara R, Hallmayer JF, Yesavage JA, Weiner MW, Nosheny RL. Brain health registry GenePool study: A novel approach to online genetics research. ALZHEIMERS & DEMENTIA-TRANSLATIONAL RESEARCH & CLINICAL INTERVENTIONS 2021; 7:e12118. [PMID: 33614891 PMCID: PMC7882536 DOI: 10.1002/trc2.12118] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/18/2020] [Revised: 10/07/2020] [Accepted: 11/04/2020] [Indexed: 12/29/2022]
Abstract
Introduction Remote data collection, including the establishment of online registries, is a novel approach to efficiently identify risk for cognitive decline and Alzheimer's disease (AD) in older adults, with growing evidence for feasibility and validity. Addition of genetic data to online registries has the potential to facilitate identification of older adults at risk and to advance the understanding of genetic contributions to AD. Methods 573 older adult participants with longitudinal online Brain Health Registry (BHR) data underwent apolipoprotein E (APOE) genotyping using remotely collected saliva samples and a novel, automated Biofluid Collection Management Portal. We evaluated acceptability of genetic sample collection and estimated associations between (1) sociodemographic variables and willingness to participate in genetics research and (2) APOE results and online cognitive and functional assessments. We also assessed acceptance of hypothetical genetics research participation by surveying a larger sample of 25,888 BHR participants. Results 51% of invited participants enrolled in the BHR genetics study, BHR‐GenePool Study (BHR‐GPS); 27% of participants had at least one APOE ε4 allele. Older participants and those with higher educational attainment were more likely to participate. In the remotely administered Cogstate Brief Battery, APOE ε4/ε4 homozygotes (HM) had worse online learning scores, and greater decline in processing speed and attention, compared to ε3/ε4 heterozygotes (HT) and ε4 non‐carriers (NC). Discussion APOE genotyping of more than 500 older adults enrolled in BHR supports the feasibility and validity of a novel, remote biofluids collection approach from a large cohort of older adults, with data linkage to longitudinal online cognitive data. This approach can be expanded for efficient collection of genetic data and other information from biofluids in the future.
Affiliation(s)
- Juliet Fockler
- VA Advanced Imaging Research Center, San Francisco Veteran's Administration Medical Center, San Francisco, California, USA; San Francisco Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, USA
| | - Winnie Kwang
- VA Advanced Imaging Research Center, San Francisco Veteran's Administration Medical Center, San Francisco, California, USA; San Francisco Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, USA
| | - Miriam T Ashford
- VA Advanced Imaging Research Center, San Francisco Veteran's Administration Medical Center, San Francisco, California, USA; San Francisco Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, USA
| | - Derek Flenniken
- VA Advanced Imaging Research Center, San Francisco Veteran's Administration Medical Center, San Francisco, California, USA; San Francisco Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, USA
| | - Joshua Hwang
- VA Advanced Imaging Research Center, San Francisco Veteran's Administration Medical Center, San Francisco, California, USA; San Francisco Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, USA
| | - Diana Truran
- VA Advanced Imaging Research Center, San Francisco Veteran's Administration Medical Center, San Francisco, California, USA; San Francisco Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, USA
| | - R Scott Mackin
- VA Advanced Imaging Research Center, San Francisco Veteran's Administration Medical Center, San Francisco, California, USA; San Francisco Department of Psychiatry, University of California, San Francisco, San Francisco, California, USA
| | - Chengshi Jin
- San Francisco Department of Biostatistics and Epidemiology, University of California, San Francisco, San Francisco, California, USA
| | - Ruth O'Hara
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, California, USA
| | - Joachim F Hallmayer
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, California, USA
| | - Jerome A Yesavage
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, California, USA
| | - Michael W Weiner
- VA Advanced Imaging Research Center, San Francisco Veteran's Administration Medical Center, San Francisco, California, USA; San Francisco Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, USA
| | - Rachel L Nosheny
- VA Advanced Imaging Research Center, San Francisco Veteran's Administration Medical Center, San Francisco, California, USA; San Francisco Department of Psychiatry, University of California, San Francisco, San Francisco, California, USA
| |
|
34
|
Naseri A, Tang K, Geng X, Shi J, Zhang J, Shakya P, Liu X, Zhang S, Zhi D. Personalized genealogical history of UK individuals inferred from biobank-scale IBD segments. BMC Biol 2021; 19:32. [PMID: 33593342 PMCID: PMC7888130 DOI: 10.1186/s12915-021-00964-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/06/2020] [Accepted: 01/19/2021] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND The genealogical histories of individuals within populations are of interest to studies aiming both to uncover detailed pedigree information and overall quantitative population demographic histories. However, the analysis of quantitative details of individual genealogical histories has faced challenges from incomplete available pedigree records and an absence of objective and quantitative details in pedigree information. Although complete pedigree information for most individuals is difficult to track beyond a few generations, it is possible to describe a person's genealogical history using their genetic relatives revealed by identity by descent (IBD) segments-long genomic segments shared by two individuals within a population, which are identical due to inheritance from common ancestors. When modern biobanks collect genotype information for a significant fraction of a population, dense genetic connections of a person can be traced using such IBD segments, offering opportunities to characterize individuals in the context of the underlying populations. Here, we conducted an individual-centric analysis of IBD segments among the UK Biobank participants that represent 0.7% of the UK population. RESULTS We made a high-quality call set of IBD segments over 5 cM among all 500,000 UK Biobank participants. On average, one UK individual shares IBD segments with 14,000 UK Biobank participants, which we refer to as "relatives." Using these segments, approximately 80% of a person's genome can be imputed. We subsequently propose genealogical descriptors based on the genetic connections of relative cohorts of individuals sharing at least one IBD segment and show that such descriptors offer important information about one's genetic makeup, personal genealogical history, and social behavior. Through analysis of relative counts sharing segments at different lengths, we identified a group, potentially British Jews, who has a distinct pattern of familial expansion history. 
Finally, using the enrichment of relatives in one's neighborhood, we identified regional variations of personal preference favoring living closer to one's extended families. CONCLUSIONS Our analysis revealed genetic makeup, personal genealogical history, and social behaviors at the population scale, opening possibilities for further studies of individual's genetic connections in biobank data.
Affiliation(s)
- Ardalan Naseri
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Kecong Tang
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
| | - Xin Geng
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Junjie Shi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Jing Zhang
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
| | - Pramesh Shakya
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
| | - Xiaoming Liu
- USF Genomics, College of Public Health, University of South Florida, Tampa, FL, 33612, USA
| | - Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA.
| | - Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
- Center for Precision Health, School of Biomedical Informatics, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
| |
|
35
|
Debortoli G, de Araujo GS, Fortes-Lima C, Parra EJ, Suarez-Kurtz G. Identification of ancestry proportions in admixed groups across the Americas using clinical pharmacogenomic SNP panels. Sci Rep 2021; 11:1007. [PMID: 33441860 PMCID: PMC7806998 DOI: 10.1038/s41598-020-80389-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/07/2020] [Accepted: 12/14/2020] [Indexed: 11/09/2022] Open
Abstract
We evaluated the performance of three PGx panels to estimate biogeographical ancestry: the DMET panel, and the VIP and Preemptive PGx panels described in the literature. Our analyses indicate that the three panels capture the individual variation in admixture proportions observed in recently admixed populations throughout the Americas quite well, with the Preemptive PGx and DMET panels performing better than the VIP panel. We show that these panels provide reliable information about biogeographic ancestry and can be used to guide the implementation of PGx clinical decision-support (CDS) tools. We also report that, using these panels, it is possible to control for the effects of population stratification in association studies in recently admixed populations, as exemplified with a warfarin dosing GWA study in a sample from Brazil.
Affiliation(s)
- Guilherme Debortoli
- Department of Anthropology, University of Toronto at Mississauga, Mississauga, ON, Canada
| | | | - Cesar Fortes-Lima
- Sub-Department of Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
| | - Esteban J Parra
- Department of Anthropology, University of Toronto at Mississauga, Mississauga, ON, Canada.
| | - Guilherme Suarez-Kurtz
- Instituto Nacional de Câncer and Rede Nacional de Farmacogenética, Rio de Janeiro, Brazil.
| |
|
36
|
Chen NC, Solomon B, Mun T, Iyer S, Langmead B. Reference flow: reducing reference bias using multiple population genomes. Genome Biol 2021. [PMID: 33397413 DOI: 10.1101/2020.03.03.975219] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/15/2023] Open
Abstract
Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.
Affiliation(s)
- Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Brad Solomon
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Taher Mun
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Sheila Iyer
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, USA.
| |
|
37
|
Chen NC, Solomon B, Mun T, Iyer S, Langmead B. Reference flow: reducing reference bias using multiple population genomes. Genome Biol 2021; 22:8. [PMID: 33397413 PMCID: PMC7780692 DOI: 10.1186/s13059-020-02229-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 03/09/2020] [Accepted: 12/08/2020] [Indexed: 12/30/2022] Open
Abstract
Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.
Affiliation(s)
- Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Brad Solomon
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Taher Mun
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Sheila Iyer
- Department of Computer Science, Johns Hopkins University, Baltimore, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, USA.
| |
|
38
|
Deconvolving Human Evolutionary History: Using Network-Based Approaches to Better Understand Our Past. SYSTEMS MEDICINE 2021. [DOI: 10.1016/b978-0-12-801238-3.11468-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022] Open
|
39
|
Spear ML, Diaz-Papkovich A, Ziv E, Yracheta JM, Gravel S, Torgerson DG, Hernandez RD. Recent shifts in the genomic ancestry of Mexican Americans may alter the genetic architecture of biomedical traits. eLife 2020; 9:e56029. [PMID: 33372659 PMCID: PMC7771964 DOI: 10.7554/elife.56029] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/13/2020] [Accepted: 12/13/2020] [Indexed: 11/13/2022] Open
Abstract
People in the Americas represent a diverse continuum of populations with varying degrees of admixture among African, European, and Amerindigenous ancestries. In the United States, populations with non-European ancestry remain understudied, and thus little is known about the genetic architecture of phenotypic variation in these populations. Using genotype data from the Hispanic Community Health Study/Study of Latinos, we find that Amerindigenous ancestry in Mexican Americans increased by an average of ~20% over the 1940s-1990s. These patterns result from complex interactions between several population and cultural factors, which shaped patterns of genetic variation and influenced the genetic architecture of complex traits in Mexican Americans. Using height as an example, we show that polygenic risk scores based on summary statistics from a European-based genome-wide association study perform poorly in Mexican Americans. Our findings reveal temporal changes in population structure within Hispanics/Latinos that may influence biomedical traits, demonstrating a need to improve our understanding of admixed populations.
Affiliation(s)
- Melissa L Spear
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, United States
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- McGill Genome Centre, McGill University, Montreal, Canada
- Department of Human Genetics, McGill University, Montreal, Canada
| | - Alex Diaz-Papkovich
- McGill Genome Centre, McGill University, Montreal, Canada
- Quantitative Life Sciences Program, McGill University, Montreal, Canada
| | - Elad Ziv
- Division of General Internal Medicine, University of California, San Francisco, San Francisco, United States
- Department of Medicine, University of California, San Francisco, San Francisco, United States
- Institute of Human Genetics, University of California, San Francisco, San Francisco, United States
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, United States
| | - Joseph M Yracheta
- Native BioData Consortium, Eagle Butte, United States
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, United States
| | - Simon Gravel
- McGill Genome Centre, McGill University, Montreal, Canada
- Department of Human Genetics, McGill University, Montreal, Canada
| | - Dara G Torgerson
- McGill Genome Centre, McGill University, Montreal, Canada
- Department of Human Genetics, McGill University, Montreal, Canada
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, United States
| | - Ryan D Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, United States
- McGill Genome Centre, McGill University, Montreal, Canada
- Department of Human Genetics, McGill University, Montreal, Canada
- Institute of Human Genetics, University of California, San Francisco, San Francisco, United States
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, United States
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, United States
| |
Collapse
|
40
|
Fortes-Lima C, Verdu P. Anthropological genetics perspectives on the transatlantic slave trade. Hum Mol Genet 2020; 30:R79-R87. [PMID: 33331897 DOI: 10.1093/hmg/ddaa271] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/18/2020] [Revised: 12/07/2020] [Accepted: 12/11/2020] [Indexed: 01/07/2023] Open
Abstract
During the Trans-Atlantic Slave Trade (TAST), around twelve million Africans were enslaved and forcibly moved from Africa to the Americas and Europe, durably influencing the genetic and cultural landscape of a large part of humanity since the 15th century. Following historians, archaeologists, and anthropologists, population geneticists have, mainly since the 1950s, extensively investigated the genetic diversity of populations on both sides of the Atlantic. These studies shed new light on the largely unknown genetic origins of numerous enslaved-African descendant communities in the Americas, by inferring their genetic relationships with extant African, European, and Native American populations. Furthermore, exploring genome-wide data with novel statistical and bioinformatics methods, population geneticists have been increasingly able to infer the last 500 years of admixture histories of these populations. These inferences have highlighted the diversity of histories experienced by enslaved-African descendants, and the complex influences of socioeconomic, political, and historical contexts on human genetic diversity patterns during and after the slave trade. Finally, the recent advances of paleogenomics unveiled crucial aspects of the life and health of the first generation of enslaved-Africans in the Americas. Altogether, human population genetics approaches in the genomic and paleogenomic era need to be coupled with history, archaeology, anthropology, and demography in interdisciplinary research, to reconstruct the multifaceted and largely unknown history of the TAST and its influence on human biological and cultural diversities today. Here, we review anthropological genomics studies published over the past 15 years and focusing on the history of enslaved-African descendant populations in the Americas.
Affiliation(s)
- Cesar Fortes-Lima
  - Sub-department of Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, 75236, Sweden
- Paul Verdu
  - Unité Mixte de Recherche 7206 Eco-Anthropology, CNRS-MNHN-Université de Paris, Musée de l'Homme, Paris, 75016, France
41
Samuels DC, Below JE, Ness S, Yu H, Leng S, Guo Y. Alternative Applications of Genotyping Array Data Using Multivariant Methods. Trends Genet 2020; 36:857-867. [PMID: 32773169 PMCID: PMC7572808 DOI: 10.1016/j.tig.2020.07.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/30/2020] [Revised: 07/08/2020] [Accepted: 07/09/2020] [Indexed: 10/23/2022]
Abstract
One of the forerunners that pioneered the revolution of high-throughput genomic technologies is the genotyping microarray technology, which can genotype millions of single-nucleotide variants simultaneously. Owing to apparent benefits, such as high speed, low cost, and high throughput, the genotyping array has gained lasting applications in genome-wide association studies (GWAS) and thus accumulated an enormous amount of data. Empowered by continuous manufactural upgrades and analytical innovation, unconventional applications of genotyping array data have emerged to address more diverse genetic problems, holding promise of boosting genetic research into human diseases through the re-mining of the rich accumulated data. Here, we review several unconventional genotyping array analysis techniques that have been built on the idea of large-scale multivariant analysis and provide empirical application examples. These unconventional outcomes of genotyping arrays include polygenic score, runs of homozygosity (ROH)/heterozygosity ratio, distant pedigree computation, and mitochondrial DNA (mtDNA) copy number inference.
Affiliation(s)
- David C Samuels
  - Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN 37232, USA
- Jennifer E Below
  - Division of Genetic Medicine, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Scott Ness
  - Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
- Hui Yu
  - Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
- Shuguang Leng
  - Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
- Yan Guo
  - Department of Internal Medicine, Comprehensive Cancer Center, University of New Mexico, Albuquerque, NM 87109, USA
42
Abstract
Understanding the influence of genetics on human disease is among the primary goals for biology and medicine. To this end, the direct study of natural human genetic variation has provided valuable insights into human physiology and disease as well as into the origins and migrations of humans. In this review, we discuss the foundations of population genetics, which provide a crucial context to the study of human genes and traits. In particular, genome-wide association studies and similar methods have revealed thousands of genetic loci associated with diseases and traits, providing invaluable information into the biology of these traits. Simultaneously, as the study of rare genetic variation has expanded, so-called human knockouts have elucidated the function of human genes and the therapeutic potential of targeting them.
Affiliation(s)
- Konrad J. Karczewski
  - Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
- Alicia R. Martin
  - Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA
43
Native American gene flow into Polynesia predating Easter Island settlement. Nature 2020; 583:572-577. [PMID: 32641827 PMCID: PMC8939867 DOI: 10.1038/s41586-020-2487-2] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 05/27/2019] [Accepted: 05/22/2020] [Indexed: 11/08/2022]
Abstract
The possibility of voyaging contact between prehistoric Polynesian and Native American populations has long intrigued researchers. Proponents have pointed to the existence of New World crops, such as the sweet potato and bottle gourd, in the Polynesian archaeological record, but nowhere else outside the pre-Columbian Americas [1-6], while critics have argued that these botanical dispersals need not have been human mediated [7]. The Norwegian explorer Thor Heyerdahl controversially suggested that prehistoric South American populations had an important role in the settlement of east Polynesia and particularly of Easter Island (Rapa Nui) [2]. Several limited molecular genetic studies have reached opposing conclusions, and the possibility continues to be as hotly contested today as it was when first suggested [8-12]. Here we analyse genome-wide variation in individuals from islands across Polynesia for signs of Native American admixture, analysing 807 individuals from 17 island populations and 15 Pacific coast Native American groups. We find conclusive evidence for prehistoric contact of Polynesian individuals with Native American individuals (around AD 1200) contemporaneous with the settlement of remote Oceania [13-15]. Our analyses suggest strongly that a single contact event occurred in eastern Polynesia, before the settlement of Rapa Nui, between Polynesian individuals and a Native American group most closely related to the indigenous inhabitants of present-day Colombia.
44
A Geometric Clustering Tool (AGCT) to robustly unravel the inner cluster structures of time-series gene expressions. PLoS One 2020; 15:e0233755. [PMID: 32628677 PMCID: PMC7337352 DOI: 10.1371/journal.pone.0233755] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/15/2019] [Accepted: 05/12/2020] [Indexed: 11/19/2022] Open
Abstract
Systems biology aims at holistically understanding the complexity of biological systems. In particular, nowadays with the broad availability of gene expression measurements, systems biology challenges the deciphering of the genetic cell machinery from them. In order to help researchers reverse engineer the genetic cell machinery from these noisy datasets, interactive exploratory clustering methods, pipelines and gene clustering tools have to be specifically developed. Prior methods/tools for time series data, however, do not have the following four major ingredients from an analytic and methodological viewpoint: (i) principled time-series feature extraction methods, (ii) a variety of manifold learning methods for capturing a high-level view of the dataset, (iii) high-end automatic structure extraction, and (iv) friendliness to the biological user community. With a view to meeting these requirements, we present AGCT (A Geometric Clustering Tool), a software package used to unravel the complex architecture of large-scale, not necessarily synchronized time-series gene expression data. AGCT captures signals on exhaustive wavelet expansions of the data, which are then embedded on a low-dimensional non-linear map using manifold learning algorithms, where geometric proximity captures potential interactions. Post-processing techniques, including hard and soft information geometric clustering algorithms, facilitate the summarizing of the complete map as a smaller number of principal factors which can then be formally identified using embedded statistical inference techniques. Three-dimensional interactive visualization and scenario recording over the processing help to reproduce data analysis results without additional time. Analysis of the whole-cell Yeast Metabolic Cycle (YMC) and Yeast Cell Cycle (YCC) datasets demonstrates AGCT's ability to accurately dissect all stages of metabolism and the cell cycle progression, independently of the time course and the number of patterns related to the signal. Analysis of a Pentachlorophenol-induced dataset demonstrates how AGCT dissects data to identify two networks: Interferon signaling and NRF2-signaling networks.
45
Mszar R, Buscher S, Taylor HL, Rice-DeFosse MT, McCann D. Familial Hypercholesterolemia and the Founder Effect Among Franco-Americans: A Brief History and Call to Action. CJC Open 2020; 2:161-167. [PMID: 32462130 PMCID: PMC7242505 DOI: 10.1016/j.cjco.2020.01.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/28/2019] [Accepted: 01/19/2020] [Indexed: 01/01/2023] Open
Abstract
Familial hypercholesterolemia (FH) is an inherited disorder characterized by chronically elevated low-density lipoprotein cholesterol levels and an increased risk of premature atherosclerotic cardiovascular disease. FH has been shown to disproportionately affect French Canadians and other ethnic populations due to the presence of a founder effect characterized by reduced genetic diversity resulting from relatively few individuals with FH-causing genetic mutations establishing self-contained populations. Beginning in the mid-1800s, approximately 1 million French Canadians immigrated to the Northeastern United States and largely remained in these small, tight-knit communities. Despite extensive genetic- and population-based research involving the French-Canadian founder population, primarily in the Province of Quebec, little is known regarding Franco-Americans in the United States. Concurrent with addressing the underdiagnosis rate of FH in the general population, we propose the following steps to leverage this founder effect and meet the cardiovascular needs of Franco-Americans: (1) increase cascade screening in regions of the United States with a high proportion of individuals of French-Canadian descent; (2) promote registry-based, epidemiological research to elucidate accurate prevalence estimates as well as diagnostic and treatment gaps in Franco-Americans; and (3) validate contemporary risk stratification strategies such as the Montreal-FH-SCORE to enable optimal lipid management and prevention of premature atherosclerotic cardiovascular disease among French-Canadian descendants.
Affiliation(s)
- Reed Mszar
  - Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut, USA
- Sara Buscher
  - Division of General Pediatrics, Boston Children’s Hospital, Boston, Massachusetts
- Heidi L. Taylor
  - Department of Sociology, Bates College, Lewiston, Maine, USA
- Mary T. Rice-DeFosse
  - Department of French and Francophone Studies, Bates College, Lewiston, Maine, USA
- Dervilla McCann
  - Department of Cardiology, Central Maine Medical Center, Lewiston, Maine, USA
46
Seidman DN, Shenoy SA, Kim M, Babu R, Woods IG, Dyer TD, Lehman DM, Curran JE, Duggirala R, Blangero J, Williams AL. Rapid, Phase-free Detection of Long Identity-by-Descent Segments Enables Effective Relationship Classification. Am J Hum Genet 2020; 106:453-466. [PMID: 32197076 PMCID: PMC7118564 DOI: 10.1016/j.ajhg.2020.02.012] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 05/29/2019] [Accepted: 02/18/2020] [Indexed: 01/29/2023] Open
Abstract
Identity-by-descent (IBD) segments are a useful tool for applications ranging from demographic inference to relationship classification, but most detection methods rely on phasing information and therefore require substantial computation time. As genetic datasets grow, methods for inferring IBD segments that scale well will be critical. We developed IBIS, an IBD detector that locates long regions of allele sharing between unphased individuals, and benchmarked it with Refined IBD, GERMLINE, and TRUFFLE on 3,000 simulated individuals. Phasing these with Beagle 5 takes 4.3 CPU days, followed by either Refined IBD or GERMLINE segment detection in 2.9 or 1.1 h, respectively. By comparison, IBIS finishes in 6.8 min or 7.8 min with IBD2 functionality enabled: speedups of 805-946× including phasing time. TRUFFLE takes 2.6 h, corresponding to IBIS speedups of 20.2-23.3×. IBIS is also accurate, inferring ≥7 cM IBD segments at quality comparable to Refined IBD and GERMLINE. With these segments, IBIS classifies first through third degree relatives in real Mexican American samples at rates meeting or exceeding other methods tested and identifies fourth through sixth degree pairs at rates within 0.0%-2.0% of the top method. While allele frequency-based approaches that do not detect segments can infer relationship degrees faster than IBIS, the fastest are biased in admixed samples, with KING inferring 30.8% fewer fifth degree Mexican American relatives correctly compared with IBIS. Finally, we ran IBIS on chromosome 2 of the UK Biobank dataset and estimate its runtime on the autosomes to be 3.3 days parallelized across 128 cores.
Affiliation(s)
- Daniel N Seidman
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Sushila A Shenoy
  - Department of Genetic Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Minsoo Kim
  - Department of Genetic Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Ramya Babu
  - Department of Computer Science, Cornell University, Ithaca, NY 14853, USA
- Ian G Woods
  - Department of Biology, Ithaca College, Ithaca, NY 14850, USA
- Thomas D Dyer
  - South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Donna M Lehman
  - Department of Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Joanne E Curran
  - South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Ravindranath Duggirala
  - South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- John Blangero
  - South Texas Diabetes and Obesity Institute and Department of Human Genetics, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX 78520, USA
- Amy L Williams
  - Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
47
Zhou Y, Browning SR, Browning BL. A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data. Am J Hum Genet 2020; 106:426-437. [PMID: 32169169 PMCID: PMC7118582 DOI: 10.1016/j.ajhg.2020.02.010] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Received: 12/11/2019] [Accepted: 02/12/2020] [Indexed: 12/24/2022] Open
Abstract
Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
Affiliation(s)
- Ying Zhou
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Sharon R Browning
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Brian L Browning
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
48
Dai CL, Vazifeh MM, Yeang CH, Tachet R, Wells RS, Vilar MG, Daly MJ, Ratti C, Martin AR. Population Histories of the United States Revealed through Fine-Scale Migration and Haplotype Analysis. Am J Hum Genet 2020; 106:371-388. [PMID: 32142644 PMCID: PMC7058830 DOI: 10.1016/j.ajhg.2020.02.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 09/22/2019] [Accepted: 02/05/2020] [Indexed: 12/11/2022] Open
Abstract
The population of the United States is shaped by centuries of migration, isolation, growth, and admixture between ancestors of global origins. Here, we assemble a comprehensive view of recent population history by studying the ancestry and population structure of more than 32,000 individuals in the US using genetic, ancestral birth origin, and geographic data from the National Geographic Genographic Project. We identify migration routes and barriers that reflect historical demographic events. We also uncover the spatial patterns of relatedness in subpopulations through the combination of haplotype clustering, ancestral birth origin analysis, and local ancestry inference. Examples of these patterns include substantial substructure and heterogeneity in Hispanics/Latinos, isolation-by-distance in African Americans, elevated levels of relatedness and homozygosity in Asian immigrants, and fine-scale structure in European descents. Taken together, our results provide detailed insights into the genetic structure and demographic history of the diverse US population.
Affiliation(s)
- Chengzhen L Dai
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Mohammad M Vazifeh
  - Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Chen-Hsiang Yeang
  - Institute of Statistical Science, Academia Sinica, Nankang, Taipei, Taiwan
- Remi Tachet
  - Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Miguel G Vilar
  - Genographic Project, National Geographic Society, Washington, DC 20036, USA
- Mark J Daly
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Carlo Ratti
  - Senseable City Lab, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Alicia R Martin
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
49
Edge MD, Coop G. Attacks on genetic privacy via uploads to genealogical databases. eLife 2020; 9:51810. [PMID: 31908268 PMCID: PMC6992384 DOI: 10.7554/elife.51810] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/12/2019] [Accepted: 12/23/2019] [Indexed: 02/06/2023] Open
Abstract
Direct-to-consumer (DTC) genetics services are increasingly popular, with tens of millions of customers. Several DTC genealogy services allow users to upload genetic data to search for relatives, identified as people with genomes that share identical by state (IBS) regions. Here, we describe methods by which an adversary can learn database genotypes by uploading multiple datasets. For example, an adversary who uploads approximately 900 genomes could recover at least one allele at SNP sites across up to 82% of the genome of a median person of European ancestries. In databases that detect IBS segments using unphased genotypes, approximately 100 falsified uploads can reveal enough genetic information to allow genome-wide genetic imputation. We provide a proof-of-concept demonstration in the GEDmatch database, and we suggest countermeasures that will prevent the exploits we describe.
Affiliation(s)
- Michael D Edge
  - Center for Population Biology, University of California, Davis, Davis, United States
  - Department of Evolution and Ecology, University of California, Davis, Davis, United States
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, United States
- Graham Coop
  - Center for Population Biology, University of California, Davis, Davis, United States
  - Department of Evolution and Ecology, University of California, Davis, Davis, United States
50
Greenbaum G, Rubin A, Templeton AR, Rosenberg NA. Network-based hierarchical population structure analysis for large genomic data sets. Genome Res 2019; 29:2020-2033. [PMID: 31694865 PMCID: PMC6886512 DOI: 10.1101/gr.250092.119] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/06/2019] [Accepted: 11/01/2019] [Indexed: 01/24/2023]
Abstract
Analysis of population structure in natural populations using genetic data is a common practice in ecological and evolutionary studies. With large genomic data sets of populations now appearing more frequently across the taxonomic spectrum, it is becoming increasingly possible to reveal many hierarchical levels of structure, including fine-scale genetic clusters. To analyze these data sets, methods need to be appropriately suited to the challenges of extracting multilevel structure from whole-genome data. Here, we present a network-based approach for constructing population structure representations from genetic data. The use of community-detection algorithms from network theory generates a natural hierarchical perspective on the representation that the method produces. The method is computationally efficient, and it requires relatively few assumptions regarding the biological processes that underlie the data. We show the approach by analyzing population structure in the model plant species Arabidopsis thaliana and in human populations. These examples illustrate how network-based approaches for population structure analysis are well-suited to extracting valuable ecological and evolutionary information in the era of large genomic data sets.
Affiliation(s)
- Gili Greenbaum
  - Department of Biology, Stanford University, Stanford, California 94305, USA
- Amir Rubin
  - Department of Computer Science, Ben-Gurion University of the Negev, Be'er-Sheva, 8410501, Israel
- Alan R Templeton
  - Department of Biology, Washington University, St. Louis, Missouri 63130, USA
  - Department of Evolutionary and Environmental Ecology, University of Haifa, Haifa, 31905, Israel
- Noah A Rosenberg
  - Department of Biology, Stanford University, Stanford, California 94305, USA