1
Andrawus M, David GB, Terziyska I, Sharvit L, Bergman A, Barzilai N, Raj SM, Govindaraju DR, Atzmon G. Genome integrity as a potential index of longevity in Ashkenazi Centenarian's families. GeroScience 2024:10.1007/s11357-024-01178-0. [PMID: 38724875 DOI: 10.1007/s11357-024-01178-0] [Received: 12/01/2023] [Accepted: 04/24/2024] [Indexed: 06/19/2024]
Abstract
The aging process, or senescence, is characterized by age-specific decline in physical and physiological function, increased frailty, and genomic changes, including mutation accumulation. However, the mechanisms through which changes in genomic architecture influence human longevity have remained obscure. Copy number variants (CNVs), an abundant class of genomic variants, offer unique opportunities for understanding age-related genomic changes. Here we report the spectrum of CNVs in a cohort of 670 Ashkenazi Jewish centenarians, their progeny, and unrelated controls. The average ages of these groups were 97.4 ± 2.8, 69.2 ± 9.2, and 66.5 ± 7.0 years, respectively. For the first time, we compared different size classes of CNVs, from 1 kb to 100 Mb in size. Using a high-resolution custom Affymetrix array targeting 44,639 genomic regions, we identified a total of 12,166, 22,188, and 10,285 CNVs in centenarians, their progeny, and controls, respectively. Interestingly, the offspring group showed the highest number of unique CNVs, followed by controls and centenarians. While both gains and losses were found in all three groups, centenarians showed a significantly higher average number of both total gains and losses relative to their controls (p < 0.0327 and p < 0.0182, respectively). Moreover, centenarians showed a lower total length of genomic material lost, suggesting that they may maintain superior genomic integrity over time. We also observe a significant fold increase of CNVs among the offspring, implying greater genomic integrity and a putative mechanism for longevity preservation. Genomic regions that experienced losses or gains appear to be distributed across many sites in the genome and contain genes involved in DNA transcription, cellular transport, developmental pathways, and metabolic functions. Our findings suggest that the exceptional longevity observed in centenarians may be attributed to the prolonged maintenance of functionally important genes. These genes are intrinsic to specific genomic regions as well as to the overall integrity of the genomic architecture. Additionally, a strong association between longer CNVs and differential gene expression observed in this study supports the notion that genomic integrity could positively influence longevity.
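The group comparison described in this abstract (average numbers of gains and losses per individual, and total length of genomic material lost) comes down to per-sample tallies followed by per-group averages. The Python sketch below is a hypothetical illustration, not the authors' pipeline: the `(sample, group, type, start, end)` tuple format and the toy calls are assumptions, and samples with no CNV calls at all would have to be added explicitly before averaging.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical CNV calls: (sample_id, group, cnv_type, start_bp, end_bp).
calls = [
    ("c1", "centenarian", "gain", 1_000, 6_000),
    ("c1", "centenarian", "loss", 20_000, 22_000),
    ("c2", "centenarian", "loss", 50_000, 51_000),
    ("k1", "control", "loss", 10_000, 40_000),
    ("k1", "control", "gain", 70_000, 71_500),
]


def per_sample_summary(calls):
    """Tally gains, losses, and base pairs lost for each sample that has calls."""
    summary = {}
    for sample, group, cnv_type, start, end in calls:
        stats = summary.setdefault(
            sample, {"group": group, "gains": 0, "losses": 0, "bp_lost": 0}
        )
        if cnv_type == "gain":
            stats["gains"] += 1
        elif cnv_type == "loss":
            stats["losses"] += 1
            stats["bp_lost"] += end - start
    return summary


def group_means(summary):
    """Average each per-sample statistic within each group."""
    by_group = defaultdict(list)
    for stats in summary.values():
        by_group[stats["group"]].append(stats)
    return {
        group: {key: mean(s[key] for s in rows) for key in ("gains", "losses", "bp_lost")}
        for group, rows in by_group.items()
    }


print(group_means(per_sample_summary(calls)))
```

A real comparison would follow these summaries with a significance test between groups; the sketch only shows the bookkeeping.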
Affiliation(s)
- Gil Ben David
- Sagol Department of Neurobiology, Faculty of Natural Sciences, University of Haifa, 199 Aba Khoushy Ave., 3498838, Mount Carmel, Haifa, Israel
- Lital Sharvit
- Faculty of Natural Sciences, University of Haifa, Haifa, Israel
- Aviv Bergman
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Nir Barzilai
- Departments of Medicine and Genetics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Srilakshmi M Raj
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Gil Atzmon
- Faculty of Natural Sciences, University of Haifa, Haifa, Israel.
- Departments of Medicine and Genetics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA.
2
Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. Am J Hum Genet 2024; 111:691-700. [PMID: 38513668 PMCID: PMC11023918 DOI: 10.1016/j.ajhg.2024.02.015] [Received: 11/06/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/23/2024]
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.
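The reasoning behind using IBD clusters to find converted alleles can be illustrated with a deliberately simplified rule. Haplotypes in the same locus-specific IBD cluster inherited that stretch of DNA from a common ancestor, so an allele that disagrees with the cluster's majority is a candidate for gene conversion (or for a new mutation or a genotype error). This Python sketch is not the ibd-cluster algorithm: the input format, the `min_cluster_size` parameter, and the strict-majority rule are assumptions made for illustration.

```python
from collections import Counter


def discordant_alleles(cluster_alleles, min_cluster_size=3):
    """Flag haplotypes whose allele disagrees with the strict majority of their IBD cluster.

    cluster_alleles maps a cluster ID to {haplotype_id: allele} at a single marker.
    Returns a list of (cluster_id, haplotype_id, allele) candidates.
    """
    flagged = []
    for cluster_id, alleles in cluster_alleles.items():
        if len(alleles) < min_cluster_size:
            continue  # too few haplotypes to define a consensus allele
        consensus, n_consensus = Counter(alleles.values()).most_common(1)[0]
        if 2 * n_consensus <= len(alleles):
            continue  # no strict majority: skip ambiguous clusters
        for haplotype, allele in alleles.items():
            if allele != consensus:
                flagged.append((cluster_id, haplotype, allele))
    return flagged


# Toy example: alleles coded 0/1 at one marker for three IBD clusters.
clusters = {
    "A": {"h1": 0, "h2": 0, "h3": 1, "h4": 0},
    "B": {"h5": 1, "h6": 1},
    "C": {"h7": 0, "h8": 1, "h9": 0},
}
print(discordant_alleles(clusters))
```

Cluster "B" is skipped because it is smaller than `min_cluster_size`; distinguishing conversion from mutation or error would need additional evidence that this sketch does not model.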
Affiliation(s)
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA.
3
Ji Q, Yao Y, Li Z, Zhou Z, Qian J, Tang Q, Xie J. Characterizing identity by descent segments in Chinese interpopulation unrelated individual pairs. Mol Genet Genomics 2024; 299:37. [PMID: 38494535 DOI: 10.1007/s00438-024-02132-7] [Received: 08/05/2023] [Accepted: 02/22/2024] [Indexed: 03/19/2024]
Abstract
Identity by descent (IBD) segments, uninterrupted DNA segments derived from the same ancestral chromosomes, are widely used as indicators of relationships in genetics. Much research focuses on IBD segments between related pairs, whereas statistical analyses of segments shared by unrelated individuals are rare. In this study, we investigated the basic characteristics of IBD segments in unrelated pairs from Chinese populations in the 1000 Genomes Project. A total of 5922 IBD segments in Chinese interpopulation unrelated individual pairs were detected via IBIS, with an average length of 3.71 Mb. We found that 17.86% of unrelated pairs in the Chinese cohort shared at least one IBD segment. Furthermore, we identified 49 chromosomal regions where IBD segments clustered in high abundance, which may be sharing hotspots in the human genome. Such regions were also observed in populations of other ancestries, implying that similar IBD backgrounds exist there as well. Altogether, these results describe the distribution of common background IBD segments, which helps improve the accuracy of pedigree studies based on IBD analysis.
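One simple way to look for regions where IBD segments cluster is to count how many segments overlap each fixed-size window along a chromosome and flag windows with unusually high counts. The sketch below assumes segment coordinates in Mb, 1-Mb windows, and a mean + k·SD cutoff; these choices are hypothetical and are not necessarily the criteria Ji et al. used to define their 49 regions.

```python
import math
from statistics import mean, pstdev


def window_counts(segments, chrom_length_mb, window_mb=1.0):
    """Count IBD segments overlapping each fixed-size window of one chromosome.

    segments: (start_mb, end_mb) tuples for IBD segments on this chromosome.
    Windows are half-open intervals [i * window_mb, (i + 1) * window_mb).
    """
    n_windows = math.ceil(chrom_length_mb / window_mb)
    counts = [0] * n_windows
    for start, end in segments:
        first = int(start // window_mb)
        last = min(math.ceil(end / window_mb) - 1, n_windows - 1)
        for w in range(first, last + 1):
            counts[w] += 1
    return counts


def hotspot_windows(counts, k=2.0):
    """Indices of windows whose count exceeds mean + k * SD (hypothetical cutoff)."""
    mu, sd = mean(counts), pstdev(counts)
    return [i for i, c in enumerate(counts) if c > mu + k * sd]


# Toy segments on a hypothetical 10-Mb chromosome.
segments = [(2.1, 4.0), (2.5, 3.9), (3.0, 3.8), (2.2, 3.5), (7.0, 8.2)]
counts = window_counts(segments, chrom_length_mb=10)
print(counts, hotspot_windows(counts))
```

In practice, counts would be pooled over all pairs and chromosomes, and a permutation or model-based null would be a sturdier cutoff than a fixed number of standard deviations.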
Affiliation(s)
- Qiqi Ji
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Yining Yao
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Zhimin Li
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Zhihan Zhou
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Jinglei Qian
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Qiqun Tang
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
- Jianhui Xie
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China.
4
Zhang W, Yuan K, Wen R, Li H, Ni X. Reconstruct recent multi-population migration history by using identical-by-descent sharing. J Genet Genomics 2024:S1673-8527(24)00035-3. [PMID: 38423503 DOI: 10.1016/j.jgg.2024.02.006] [Received: 10/24/2023] [Revised: 02/19/2024] [Accepted: 02/20/2024] [Indexed: 03/02/2024]
Abstract
Identical-by-descent (IBD) is a fundamental genomic characteristic in population genetics and has been widely used for population history reconstruction. However, because IBD captures only the relationship between two individuals or haplotypes, existing IBD-based history inference is constrained to two populations. In this study, we propose a novel framework that leverages IBD sharing among multiple populations and develop a method, MatrixIBD, to reconstruct recent multi-population migration history. Specifically, we employ structured coalescent theory to model the genealogical process and then estimate IBD sharing across multiple populations. Within our model, we establish a theoretical connection between migration history and IBD sharing. The method is rigorously evaluated through simulations, which show high accuracy and robustness. Furthermore, we apply MatrixIBD to Central and South Asian populations in the Human Genome Diversity Project and reconstruct the recent migration history of three closely related populations in South Asia. By taking IBD sharing across multiple populations into account simultaneously, MatrixIBD gives clearer and more comprehensive insight into the history of regions characterized by complex migration dynamics, providing a holistic perspective on the intricate patterns of recent population migration.
Affiliation(s)
- Wenxiao Zhang
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
- Kai Yuan
- The Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ru Wen
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China
- Haifang Li
- Baidu Incorporated, Beijing 100085, China
- Xumin Ni
- School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China.
5
Freudiger A, Jovanovic VM, Huang Y, Snyder-Mackler N, Conrad DF, Miller B, Montague MJ, Westphal H, Stadler PF, Bley S, Horvath JE, Brent LJN, Platt ML, Ruiz-Lambides A, Tung J, Nowick K, Ringbauer H, Widdig A. Taking identity-by-descent analysis into the wild: Estimating realized relatedness in free-ranging macaques. bioRxiv 2024:2024.01.09.574911. [PMID: 38260273 PMCID: PMC10802400 DOI: 10.1101/2024.01.09.574911] [Indexed: 01/24/2024]
Abstract
Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness. Here, we leverage novel methods for estimating locus-specific IBD from low coverage whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4-6× coverage data from a rhesus macaque (Macaca mulatta) population with available long-term pedigree data, we show that we can call the number and length of IBD segments across the genome with high accuracy even at 0.5× coverage. The resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. They identify cryptic genetic relatives that are not represented in the pedigree and reveal elevated recombination rates in females relative to males, which allows us to discriminate maternal and paternal kin using genotype data alone. Our findings represent a breakthrough in the ability to understand the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.
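Realized relatedness can be computed from the fractions of the genome covered by IBD segments. If k1 and k2 are the fractions shared IBD on one haplotype (IBD1) and on both haplotypes (IBD2), the coefficient of relatedness is r = k1/2 + k2, which is twice the kinship coefficient. The sketch below assumes segment lengths in cM and a hypothetical total map length of 3000 cM; it illustrates the arithmetic only and is not the pipeline used in the macaque study.

```python
def realized_relatedness(ibd1_cm, ibd2_cm, genome_length_cm):
    """Coefficient of relatedness r = k1/2 + k2 from IBD segment lengths.

    ibd1_cm: lengths (cM) of segments shared on one haplotype (IBD1).
    ibd2_cm: lengths (cM) of segments shared on both haplotypes (IBD2).
    Returns (r, kinship) where kinship = r / 2.
    """
    k1 = sum(ibd1_cm) / genome_length_cm
    k2 = sum(ibd2_cm) / genome_length_cm
    r = k1 / 2 + k2
    return r, r / 2


GENOME_CM = 3000.0  # hypothetical total map length, not the macaque value
print(realized_relatedness([3000.0], [], GENOME_CM))  # parent-offspring-like sharing
print(realized_relatedness([900.0, 600.0], [750.0], GENOME_CM))  # full-sib-like sharing
```

Because the inputs are observed segment lengths rather than pedigree expectations, two pairs in the same kin class can yield different values of r, which is the within-class variation the abstract describes.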
Affiliation(s)
- Annika Freudiger
- Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Vladimir M Jovanovic
- Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
- Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Noah Snyder-Mackler
- Center for Evolution & Medicine, School of Life Sciences, Arizona State University, Tempe, USA
- Donald F Conrad
- Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Brian Miller
- Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Michael J Montague
- Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Hendrikje Westphal
- Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Peter F Stadler
- Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
- Santa Fe Institute, Santa Fe, NM, USA
- Stefanie Bley
- Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Julie E Horvath
- Department of Biological and Biomedical Sciences, North Carolina Central University, North Carolina, Durham, USA
- Research and Collections Section, North Carolina Museum of Natural Sciences, North Carolina, Raleigh, USA
- Department of Biological Sciences, North Carolina State University, North Carolina, Raleigh, USA
- Department of Evolutionary Anthropology, Duke University, North Carolina, Durham, USA
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Lauren J N Brent
- Centre for Research in Animal Behaviour, University of Exeter, Exeter, UK
- Michael L Platt
- Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Marketing Department, the Wharton School of Business, University of Pennsylvania, Philadelphia, PA, USA
- Department of Psychology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Angelina Ruiz-Lambides
- Cayo Santiago Field Station, Caribbean Primate Research Center, University of Puerto Rico, Punta Santiago, Puerto Rico
- Jenny Tung
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Evolutionary Anthropology, Duke University, North Carolina, Durham, USA
- Department of Biology, Duke University, Durham, North Carolina, USA
- Duke University Population Research Institute, Durham, North Carolina, USA
- Katja Nowick
- Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
- Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Anja Widdig
- Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
- Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
6
He S, Wang Y, Luo Y, Xue M, Wu M, Tan H, Peng Y, Wang K, Fang M. Integrated analysis strategy of genome-wide functional gene mining reveals DKK2 gene underlying meat quality in Shaziling synthesized pigs. BMC Genomics 2024; 25:30. [PMID: 38178019 PMCID: PMC10765619 DOI: 10.1186/s12864-023-09925-x] [Received: 07/09/2023] [Accepted: 12/18/2023] [Indexed: 01/06/2024]
Abstract
BACKGROUND The Shaziling pig is a well-known indigenous Chinese breed with superior meat quality traits. However, the genetic mechanisms and genomic evidence underlying the meat quality characteristics of Shaziling pigs remain unclear. To investigate the germplasm characteristics of Shaziling pigs, we analyzed whole-genome sequencing data from 67 individuals for the first time (20 Shaziling pigs [S], 20 Dabasha pigs [DBS], 11 Yorkshire pigs [Y], 10 Berkshire pigs [BKX], 5 Basha pigs [BS], and 1 Warthog). RESULTS A total of 2,538,577 high-quality SNPs were detected, and 9 candidate genes that were specifically selected in S and shared between S and DBS were identified using an integrated analysis strategy combining identity-by-descent (IBD) and selective sweep analyses. Of these, dickkopf WNT signaling pathway inhibitor 2 (DKK2), an antagonist of the Wnt signaling pathway, was the most promising candidate gene: it was not only associated with a palmitic acid and palmitoleic acid quantitative trait locus in PigQTLdb, but was also specifically selected in S compared with 48 other Chinese local pigs from 12 populations and 39 foreign pigs from 4 populations. Subsequently, PCR-RFLP genotyping in 21 pig populations identified a mutation at position 12,726 bp of DKK2 intron 1 (g.114874954 A > C) that was associated with intramuscular fat content. We observed that DKK2 was specifically expressed in adipose tissue. During adipogenic differentiation of porcine preadipocytes and 3T3-L1 cells, DKK2 overexpression decreased triglyceride content, fatty acid synthase, and the expression of genes in the adipogenic and Wnt signaling pathways, whereas DKK2 interference had the opposite effect. CONCLUSIONS Our findings provide an analysis strategy for mining functional genes underlying important economic traits, as well as fundamental data and molecular evidence for improving pig meat quality traits and molecular breeding.
Affiliation(s)
- Shuaihan He
- State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Yubei Wang
- Sanya Institute of China Agricultural University, Sanya, 572025, China
- Yabiao Luo
- State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Mingming Xue
- State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Maisheng Wu
- Xiangtan Bureau of Animal Husbandry and Veterinary Medicine and Aquatic Product, Xiangtan, 411102, China
- Hong Tan
- Xiangtan Bureau of Animal Husbandry and Veterinary Medicine and Aquatic Product, Xiangtan, 411102, China
- Yinglin Peng
- Hunan Institute of Animal & Veterinary Science, Changsha, 410131, China
- Kejun Wang
- College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, 450002, China.
- Meiying Fang
- State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
- Sanya Institute of China Agricultural University, Sanya, 572025, China.
7
Ringbauer H, Huang Y, Akbari A, Mallick S, Olalde I, Patterson N, Reich D. Accurate detection of identity-by-descent segments in human ancient DNA. Nat Genet 2024; 56:143-151. [PMID: 38123640 PMCID: PMC10786714 DOI: 10.1038/s41588-023-01582-w] [Received: 03/21/2023] [Accepted: 10/20/2023] [Indexed: 12/23/2023]
Abstract
Long DNA segments shared between two individuals, known as identity-by-descent (IBD), reveal recent genealogical connections. Here we introduce ancIBD, a method for identifying IBD segments in ancient human DNA (aDNA) using a hidden Markov model and imputed genotype probabilities. We demonstrate that ancIBD accurately identifies IBD segments >8 cM for aDNA data with an average depth of >0.25× for whole-genome sequencing or >1× for 1240k single nucleotide polymorphism capture data. Applying ancIBD to 4,248 ancient Eurasian individuals, we identify relatives up to the sixth degree and genealogical connections between archaeological groups. Notably, we reveal long IBD sharing between Corded Ware and Yamnaya groups, indicating that the Yamnaya herders of the Pontic-Caspian Steppe and the Steppe-related ancestry in various European Corded Ware groups share substantial co-ancestry within only a few hundred years. These results show that detecting IBD segments can generate powerful insights into the growing aDNA record, both on a small scale relevant to life stories and on a large scale relevant to major cultural-historical events.
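Downstream of a segment caller such as ancIBD, relative detection typically starts from per-pair counts and total lengths of long segments. This sketch assumes a hypothetical `(individual_1, individual_2, length_cm)` tuple format rather than ancIBD's actual output format, and keeps only segments longer than the 8 cM cutoff mentioned in the abstract.

```python
from collections import defaultdict


def summarize_pairs(segments, min_cm=8.0):
    """Count and sum IBD segments longer than min_cm for each unordered pair.

    segments: iterable of (individual_1, individual_2, length_cm) tuples.
    Returns {(ind_a, ind_b): (n_segments, total_cm)} with ind_a < ind_b.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for ind1, ind2, length_cm in segments:
        if length_cm <= min_cm:
            continue  # keep only segments longer than the cutoff
        pair = tuple(sorted((ind1, ind2)))  # (a, b) and (b, a) are the same pair
        totals[pair][0] += 1
        totals[pair][1] += length_cm
    return {pair: (n, total) for pair, (n, total) in totals.items()}


segments = [
    ("I1", "I2", 25.0),
    ("I2", "I1", 12.5),
    ("I1", "I3", 6.0),
    ("I3", "I4", 9.0),
]
print(summarize_pairs(segments))
```

The pair I1–I3 drops out because its only segment is shorter than the cutoff; the resulting counts and totals are the kind of summary that can then be compared with expectations for each degree of relatedness.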
Affiliation(s)
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA.
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Iñigo Olalde
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- BIOMICs Research Group, University of the Basque Country, Vitoria-Gasteiz, Spain
- Ikerbasque-Basque Foundation of Science, Bilbao, Spain
- Nick Patterson
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA.
- Department of Genetics, Harvard Medical School, Boston, MA, USA.
- Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA.
8
Fournier R, Tsangalidou Z, Reich D, Palamara PF. Haplotype-based inference of recent effective population size in modern and ancient DNA samples. Nat Commun 2023; 14:7945. [PMID: 38040695 PMCID: PMC10692198 DOI: 10.1038/s41467-023-43522-6] [Received: 07/03/2022] [Accepted: 11/10/2023] [Indexed: 12/03/2023]
Abstract
Individuals sharing recent ancestors are likely to co-inherit large identical-by-descent (IBD) genomic regions. The distribution of these IBD segments in a population may be used to reconstruct past demographic events such as effective population size variation, but accurate IBD detection is difficult in ancient DNA data and in underrepresented populations with limited reference data. In this work, we introduce HapNe, an accurate method for inferring effective population size variation during the past ~2000 years in both modern and ancient DNA data. HapNe infers recent population size fluctuations using either IBD sharing (HapNe-IBD) or linkage disequilibrium (HapNe-LD); the LD-based version does not require phasing and can be computed in low-coverage data, including data sets with heterogeneous sampling times. In a range of simulated demographic scenarios, HapNe shows improved accuracy compared with currently available methods for IBD-based and LD-based inference of recent effective population size, while requiring fewer computational resources. We apply HapNe to several populations from the 1000 Genomes Project, the UK Biobank, and the Allen Ancient DNA Resource, as well as to recently published samples from Iron Age Britain, detecting multiple instances of recent effective population size variation across these groups.
Affiliation(s)
- David Reich
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK.
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
9
Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. bioRxiv 2023:2023.11.03.565574. [PMID: 37961601 PMCID: PMC10635131 DOI: 10.1101/2023.11.03.565574] [Indexed: 11/15/2023]
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more efficient collection and storage of identity by descent (IBD) information than approaches that detect and store pairwise IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach.
Affiliation(s)
- Brian L. Browning
- Department of Biostatistics, University of Washington, Seattle, WA
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
10
Smaragdov MG. Identification of homozygosity-rich regions in the Holstein genome. Vavilovskii Zhurnal Genet Selektsii 2023; 27:471-479. [PMID: 37808215 PMCID: PMC10556852 DOI: 10.18699/vjgb-23-57] [Received: 12/04/2022] [Revised: 02/03/2023] [Accepted: 02/27/2023] [Indexed: 10/10/2023]
Abstract
In this study, 371 Holstein cows from six herds and 26 Holstein bulls used in these herds were genotyped with the Illumina BovineSNP50 array. For runs of homozygosity (ROH) identification, consecutive and sliding runs were performed with the detectRUNS and PLINK software. Missing calls did not significantly affect the ROH data. The mean number of ROH per cow was 95.4 ± 2.7 with consecutive runs and 86.0 ± 2.6 with sliding runs, whereas bulls had fewer ROH (58.9 ± 1.9). ROH segments ranged in length from 1 Mb to over 16 Mb, with most ROH being 1-2 Mb long. Of the 29 chromosomes, BTA 14, BTA 16, and BTA 7 were the most covered by ROH. The mean inbreeding coefficient across the herds was 0.111 ± 0.003 and 0.104 ± 0.004 based on consecutive and sliding runs, respectively, and 0.078 ± 0.005 for bulls based on consecutive runs. These values do not exceed those reported for Holstein cattle in North America. The results of this study confirmed that consecutive runs identify ROH more accurately, and that the number of allowed heterozygous SNPs may have a significant effect on ROH data.
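The consecutive-runs idea and the ROH-based inbreeding coefficient can be sketched as follows. This is a simplified illustration, not the detectRUNS or PLINK implementation: genotypes are assumed to be coded 0/2 (homozygous) and 1 (heterozygous), missing calls are not handled, and the thresholds (`min_snps`, `max_het`, `max_gap_bp`, `min_length_bp`) are hypothetical defaults. F_ROH is computed in the usual way, as total ROH length divided by the length of the autosomal genome considered.

```python
def consecutive_roh(positions, genotypes, min_snps=15, max_het=1,
                    max_gap_bp=1_000_000, min_length_bp=1_000_000):
    """Simplified consecutive-runs ROH scan on one chromosome.

    positions: sorted SNP positions in bp.
    genotypes: 0 or 2 for homozygous calls, 1 for heterozygous calls.
    Returns a list of (start_bp, end_bp) runs.
    """
    runs = []
    start = last_hom = None  # indices of the run's first and last homozygous SNPs
    n_het = 0

    def close_run():
        # Record the current run if it has enough SNPs and is long enough.
        if start is not None:
            n_snps = last_hom - start + 1
            length = positions[last_hom] - positions[start]
            if n_snps >= min_snps and length >= min_length_bp:
                runs.append((positions[start], positions[last_hom]))

    for i, g in enumerate(genotypes):
        # A large gap between adjacent SNPs ends the current run.
        if start is not None and positions[i] - positions[i - 1] > max_gap_bp:
            close_run()
            start = last_hom = None
            n_het = 0
        if g == 1:
            if start is None:
                continue  # runs never begin on a heterozygous SNP
            n_het += 1
            if n_het > max_het:  # too many heterozygous SNPs: end the run
                close_run()
                start = last_hom = None
                n_het = 0
        else:
            if start is None:
                start = i
            last_hom = i
    close_run()
    return runs


def f_roh(runs, autosome_length_bp):
    """ROH-based inbreeding coefficient: total ROH length / autosomal length covered."""
    return sum(end - start for start, end in runs) / autosome_length_bp


positions = [i * 100_000 for i in range(40)]
genotypes = [0] * 20 + [1] + [2] * 5 + [1] + [0] * 13
runs = consecutive_roh(positions, genotypes)
print(runs, f_roh(runs, autosome_length_bp=10_000_000))
```

Here the first heterozygous SNP is tolerated (`max_het=1`), the second one closes a 2.5-Mb run, and the trailing 13-SNP stretch is discarded for having fewer than `min_snps` SNPs; this is the kind of sensitivity to the allowed number of heterozygous SNPs that the abstract reports.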
Affiliation(s)
- M G Smaragdov
- Russian Research Institute of Farm Animal Genetics and Breeding - Branch of the L.K. Ernst Federal Science Center for Animal Husbandry, St. Petersburg, Pushkin, Russia
11
Liu J, Wei YL, Yang L, Jiang L, Zhao WT, Li CX. Testing of two SNP array-based genealogy algorithms using extended Han Chinese pedigrees and recommendations for improved performances in forensic practice. Electrophoresis 2023; 44:1435-1445. [PMID: 37501329 DOI: 10.1002/elps.202200237] [Received: 09/29/2022] [Revised: 05/16/2023] [Accepted: 07/02/2023] [Indexed: 07/29/2023]
Abstract
Distant genetic relatives can be linked to a crime scene sample by computing the identity-by-state (IBS) and identity-by-descent (IBD) shared by individuals. To test genetic genealogy estimation methods and optimize their parameters for forensic investigation, a family-based genetic genealogy analysis was performed using a dataset of 262 Han Chinese individuals from 11 families. The dataset covered relative pairs from the 1st to the 14th degree, but the 7th degree was the most distant relationship that could be fully investigated, with each individual having ∼200 relatives within the 7th degree. The KING algorithm, which calculates IBS and IBD statistics, correctly discriminated the first-degree relationships of monozygotic twins, parent-offspring, and full siblings. Its inferred relationships were reliable up to the fifth degree (false positive rate <1.8%). The IBD segment algorithm GERMLINE + ERSA provided reliable inference up to the eighth degree. Within eighth-degree inferences, errors in the IBD segment analysis were false negatives (<27.4%) rather than false positives (0%). Varying the minimum IBD segment threshold from >0 to 6 cM made little difference to the inferred results. In distant relative analysis, genetically undetectable relationships begin to occur from the sixth degree (second cousin once removed), meaning that relatives separated by seven meioses may share no IBD segment from their common ancestor at all. KING and GERMLINE + ERSA worked complementarily to ensure accurate inference from the first to the eighth degree. On simulated low call rate data, KING showed better tolerance to marker loss than the GERMLINE + ERSA segment algorithm.
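KING assigns relationship degrees by comparing its kinship estimate with cutoffs spaced by powers of two: above 2^-1.5 (≈0.354) for duplicates or monozygotic twins, then 2^-2.5 (≈0.177), 2^-3.5 (≈0.088), and 2^-4.5 (≈0.044) for first-, second-, and third-degree relatives. Each cutoff sits halfway, on a log scale, between the expected kinship values 2^-(d+1) of adjacent degrees. The sketch below reproduces that rule; extending it past the third degree with the same geometric pattern (`max_degree` > 3) is an assumption for illustration, not a documented KING option.

```python
def king_degree(kinship, max_degree=3):
    """Assign a relationship degree from a kinship coefficient using KING-style cutoffs.

    Degree d is returned when kinship > 2 ** -(d + 1.5); degree 0 means a
    duplicate sample or monozygotic twin. Returns None ("unrelated") if the
    estimate falls below the cutoff for max_degree.
    """
    for degree in range(max_degree + 1):
        if kinship > 2 ** -(degree + 1.5):
            return degree
    return None


def expected_kinship(degree):
    """Expected kinship coefficient of a degree-d relative: 2 ** -(d + 1)."""
    return 2 ** -(degree + 1)


for d in range(1, 4):
    print(d, expected_kinship(d), king_degree(expected_kinship(d)))
```

With the default cutoffs, an estimate of 0.02 is reported as unrelated; allowing `max_degree=5` would call it fifth degree, which illustrates why inference from kinship estimates alone becomes fragile for distant relatives.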
Affiliation(s)
- Jing Liu
- National Engineering Laboratory for Forensic Science, Key Laboratory of Forensic Genetics of Ministry of Public Security, Beijing Engineering Research Center of Crime Scene Evidence Examination, Institute of Forensic Science, Beijing, P. R. China
- Key Laboratory of Evidence Science, China University of Political Science and Law, Beijing, P. R. China
- Yi-Liang Wei
- Jiangsu Key Laboratory of Phylogenomics and Comparative Genomics, School of Life Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, P. R. China
- Lan Yang
- School of Forensic Science, Shanxi Medical University, Taiyuan, Shanxi, P. R. China
- Li Jiang
- National Engineering Laboratory for Forensic Science, Key Laboratory of Forensic Genetics of Ministry of Public Security, Beijing Engineering Research Center of Crime Scene Evidence Examination, Institute of Forensic Science, Beijing, P. R. China
- Wen-Ting Zhao
- National Engineering Laboratory for Forensic Science, Key Laboratory of Forensic Genetics of Ministry of Public Security, Beijing Engineering Research Center of Crime Scene Evidence Examination, Institute of Forensic Science, Beijing, P. R. China
- Cai-Xia Li
- National Engineering Laboratory for Forensic Science, Key Laboratory of Forensic Genetics of Ministry of Public Security, Beijing Engineering Research Center of Crime Scene Evidence Examination, Institute of Forensic Science, Beijing, P. R. China
12
Nguyen R, Kapp JD, Sacco S, Myers SP, Green RE. A computational approach for positive genetic identification and relatedness detection from low-coverage shotgun sequencing data. J Hered 2023; 114:504-512. [PMID: 37381815 PMCID: PMC10445519 DOI: 10.1093/jhered/esad041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 06/28/2023] [Indexed: 06/30/2023]
Abstract
Several methods exist for detecting genetic relatedness or identity by comparing DNA information. These methods generally require genotype calls, either single-nucleotide polymorphisms or short tandem repeats, at the sites used for comparison. For some DNA samples, like those obtained from bone fragments or single rootless hairs, there is often not enough DNA present to generate genotype calls that are accurate and complete enough for these comparisons. Here, we describe IBDGem, a fast and robust computational procedure for detecting genomic regions of identity-by-descent by comparing low-coverage shotgun sequence data against genotype calls from a known query individual. At less than 1× genome coverage, IBDGem reliably detects segments of relatedness and can make high-confidence identity detections with as little as 0.01× genome coverage.
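To make the idea of comparing low-coverage reads against a known genotype concrete, the sketch below computes a summed per-site log-likelihood ratio between two hypotheses: the reads come from the query individual, or from an unrelated person whose genotype follows Hardy-Weinberg proportions. This is a simplified identity test written for illustration; it is not IBDGem's model (which also detects segments of relatedness), and the symmetric per-read error rate `err`, the function names, and the input layout are all assumptions.

```python
import math
from typing import Sequence, Tuple


def read_lik(ref: int, alt: int, g: int, err: float) -> float:
    """P(read counts | genotype g), treating reads as independent with a symmetric error rate."""
    p_alt = (g / 2) * (1 - err) + (1 - g / 2) * err  # chance that one read shows the alt allele
    return (p_alt ** alt) * ((1 - p_alt) ** ref)


def identity_llr(
    reads: Sequence[Tuple[int, int]],  # (ref_count, alt_count) per site in the low-coverage sample
    query: Sequence[int],              # known genotypes of the query individual, coded 0/1/2
    freqs: Sequence[float],            # alt-allele frequencies in a reference population
    err: float = 0.01,                 # assumed per-read error rate
) -> float:
    """Log-likelihood ratio of 'sample is the query' versus 'sample is an unrelated person'."""
    if not 0 < err < 0.5:
        raise ValueError("err must lie strictly between 0 and 0.5")
    total = 0.0
    for (ref, alt), g, p in zip(reads, query, freqs):
        if ref + alt == 0:
            continue  # uncovered sites carry no information (most sites at <1x coverage)
        hwe = ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)  # Hardy-Weinberg genotype priors
        same = read_lik(ref, alt, g, err)
        unrelated = sum(hwe[k] * read_lik(ref, alt, k, err) for k in range(3))
        total += math.log(same) - math.log(unrelated)
    return total
```

Positive totals favour identity and negative totals favour an unrelated source. Because each covered site contributes independently, even a very sparse set of single-read sites can accumulate a decisive ratio, which is the intuition behind identity calls at extremely low coverage.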
Affiliation(s)
- Remy Nguyen
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, United States
- Joshua D Kapp
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, United States
- Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, United States
- Steven P Myers
- California Department of Justice Jan Bashinski DNA Laboratory, Richmond, CA, United States
- Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, United States
13
Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, Palamara PF. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat Genet 2023; 55:768-776. [PMID: 37127670 PMCID: PMC10181934 DOI: 10.1038/s41588-023-01379-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/10/2021] [Accepted: 03/22/2023] [Indexed: 05/03/2023]
Abstract
Genome-wide genealogies compactly represent the evolutionary history of a set of genomes, and inferring them from genetic data has the potential to facilitate a wide range of analyses. We introduce a method, ARG-Needle, for accurately inferring biobank-scale genealogies from sequencing or genotyping array data, as well as strategies to use genealogies to perform association and other complex trait analyses. We use these methods to build genome-wide genealogies using genotyping data for 337,464 UK Biobank individuals and test for association across seven complex traits. Genealogy-based association detects more rare and ultra-rare signals (N = 134, frequency range 0.0007-0.1%) than genotype imputation using ~65,000 sequenced haplotypes (N = 64). In a subset of 138,039 exome sequencing samples, these associations strongly tag (average r = 0.72) underlying sequencing variants enriched (4.8×) for loss-of-function variation. These results demonstrate that inferred genome-wide genealogies may be leveraged in the analysis of complex traits, complementing approaches that require the availability of large, population-specific sequencing panels.
Affiliation(s)
- Brian C Zhang
- Department of Statistics, University of Oxford, Oxford, UK
- Arjun Biddanda
- Department of Statistics, University of Oxford, Oxford, UK
- Árni Freyr Gunnarsson
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Fergus Cooper
- Department of Computer Science, University of Oxford, Oxford, UK
- Pier Francesco Palamara
- Department of Statistics, University of Oxford, Oxford, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
14
Medvedev A, Lebedev M, Ponomarev A, Kosaretskiy M, Osipenko D, Tischenko A, Kosaretskiy E, Wang H, Kolobkov D, Chamberlain-Evans V, Vakhitov R, Nikonorov P. GRAPE: genomic relatedness detection pipeline. F1000Res 2023; 11:589. [PMID: 37224332 PMCID: PMC10182380 DOI: 10.12688/f1000research.111658.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/22/2023] [Indexed: 05/26/2023]
Abstract
Classifying the degree of relatedness between pairs of individuals has both scientific and commercial applications. For example, genome-wide association studies (GWAS) may suffer from high rates of false positive results due to unrecognized population structure, a problem that becomes especially relevant with the recent increase in large-cohort studies. Accurate relationship classification is also required for genetic linkage analysis to identify disease-associated loci. Additionally, DNA relative-matching services are one of the leading drivers of the direct-to-consumer genetic testing market. Although scientific information on kinship determination methods and the relevant tools are readily available, assembling a pipeline that operates stably on real-world genotype data requires significant research and development resources. Currently, there is no open-source, end-to-end solution for relatedness detection in genomic data that is fast, reliable, and accurate for both close and distant degrees of kinship, combines all the processing steps needed to work on real data, and is ready for production integration. To address this, we developed GRAPE: Genomic RelAtedness detection PipelinE. It combines data preprocessing, identity-by-descent (IBD) segment detection, and accurate relationship estimation. The project follows software development best practices, as well as Global Alliance for Genomics and Health (GA4GH) standards and tools. Pipeline efficiency is demonstrated on both simulated and real-world datasets. GRAPE is available from: https://github.com/genxnetwork/grape.
Affiliation(s)
- Alexander Medvedev
- Skolkovo Institute of Science and Technology, Moscow, Russian Federation
- GENXT, Hinxton, UK
- Hui Wang
- GENXT, Hinxton, UK
- Huazhong Agricultural University, Wuhan, China
15
Ringbauer H, Huang Y, Akbari A, Mallick S, Patterson N, Reich D. ancIBD - Screening for identity by descent segments in human ancient DNA. bioRxiv 2023:2023.03.08.531671. [PMID: 36945531 PMCID: PMC10028887 DOI: 10.1101/2023.03.08.531671] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 06/18/2023]
Abstract
Long DNA sequences shared between two individuals, known as identical-by-descent (IBD) segments, are a powerful signal for identifying close and distant biological relatives because they arise only when the pair shares a recent common ancestor. Existing methods to call IBD segments between present-day genomes cannot be straightforwardly applied to ancient DNA (aDNA) data because of its typically low coverage and high genotyping error rates. We present ancIBD, a method to identify IBD segments in human aDNA data, implemented as a Python package. Our approach is based on a hidden Markov model that takes as input genotype probabilities imputed using a modern reference panel of genomic variation. Through simulation and downsampling experiments, we demonstrate that ancIBD robustly identifies IBD segments longer than 8 centimorgans in aDNA data with either at least 0.25x average whole-genome sequencing (WGS) coverage depth or at least 1x average depth for in-solution enrichment experiments targeting a widely used aDNA SNP set ('1240k'). This application range allows us to screen a substantial fraction of the aDNA record for IBD segments, and we showcase two downstream applications. First, leveraging the fact that biological relatives up to the sixth degree are expected to share multiple long IBD segments, we identify relatives among 10,156 ancient Eurasian individuals and document evidence of long-distance migration, for example by identifying a pair of approximately fifth-degree relatives who were buried 1410 km apart in Central Asia 5000 years ago. Second, by applying ancIBD, we reveal new details regarding the spread of ancestry related to Steppe pastoralists into Europe starting 5000 years ago.
We find that the first individuals in Central and Northern Europe carrying high amounts of Steppe ancestry, associated with the Corded Ware culture, share high rates of long IBD (12-25 cM) with Yamnaya herders of the Pontic-Caspian steppe. This signals a strong bottleneck and a recent biological connection on the order of only a few hundred years, providing evidence that the Yamnaya themselves are a main source of Steppe ancestry in Corded Ware people. We also detect elevated sharing of long IBD segments between Corded Ware individuals and people associated with the Globular Amphora culture (GAC) from Poland and Ukraine, who were Copper Age farmers not yet carrying Steppe-like ancestry. These IBD links appear for all Corded Ware groups in our analysis, indicating that individuals related to GAC contexts must have had a major demographic impact early on in the genetic admixtures giving rise to various Corded Ware groups across Europe. These results show that detecting IBD segments in aDNA can generate new insights both on a small scale, relevant to understanding the life stories of people, and on the macroscale, relevant to large-scale cultural-historical events.
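ancIBD is described as a hidden Markov model over imputed genotype probabilities. The generic two-state Viterbi decoder below sketches how such a model turns noisy per-site evidence into contiguous IBD segments. It assumes the per-site emission likelihoods for the non-IBD and IBD states have already been computed and uses a constant per-site switch probability; ancIBD's actual emission and transition models are not reproduced here, and all names and default values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def viterbi_ibd(
    emis: Sequence[Tuple[float, float]],  # per-site (P(data | non-IBD), P(data | IBD)), all > 0
    p_switch: float = 0.01,               # assumed constant per-site probability of changing state
    p_start_ibd: float = 0.05,            # assumed prior probability that the first site is IBD
) -> List[int]:
    """Most likely state path (0 = non-IBD, 1 = IBD) under a two-state HMM."""
    if not emis:
        return []
    stay, switch = math.log(1 - p_switch), math.log(p_switch)
    log_t = [[stay, switch], [switch, stay]]  # log_t[from][to]
    score = [math.log(1 - p_start_ibd) + math.log(emis[0][0]),
             math.log(p_start_ibd) + math.log(emis[0][1])]
    back: List[List[int]] = []  # back[i][s] = best predecessor of state s at site i + 1
    for e_non, e_ibd in emis[1:]:
        ptr, new = [], []
        for s, e in ((0, e_non), (1, e_ibd)):
            cands = [score[r] + log_t[r][s] for r in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new.append(cands[best] + math.log(e))
        back.append(ptr)
        score = new
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):  # trace back from the last site to the first
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Contiguous runs of 1s in the returned path are the candidate IBD segments. The switch penalty keeps isolated noisy sites from fragmenting a segment; a length filter (such as the 8 cM minimum mentioned above) would then be applied after converting site indices to genetic positions.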
Affiliation(s)
- Harald Ringbauer
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Yilei Huang
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Bioinformatics Group, Institute of Computer Science, Universität Leipzig, Leipzig, Germany
- Ali Akbari
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Swapan Mallick
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
- Nick Patterson
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- David Reich
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, MA, USA
16
Kurant DE. Opportunities and Challenges with Artificial Intelligence in Genomics. Clin Lab Med 2023; 43:87-97. [PMID: 36764810 DOI: 10.1016/j.cll.2022.09.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
Abstract
The development of artificial intelligence and machine learning algorithms may allow for advances in patient care. There are existing and potential applications in cancer diagnosis and monitoring, identification of at-risk groups of individuals, classification of genetic variants, and even prediction of patient ancestry. This article provides an overview of some current and future applications of artificial intelligence in genomic medicine, in addition to discussing challenges and considerations when bringing these tools into clinical practice.
Affiliation(s)
- Danielle E Kurant
- Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
17
Nyerki E, Kalmár T, Schütz O, Lima RM, Neparáczki E, Török T, Maróti Z. correctKin: an optimized method to infer relatedness up to the 4th degree from low-coverage ancient human genomes. Genome Biol 2023; 24:38. [PMID: 36855115 PMCID: PMC9972692 DOI: 10.1186/s13059-023-02882-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Accepted: 02/17/2023] [Indexed: 03/02/2023]
Abstract
Kinship analysis from very low-coverage ancient sequences has been possible up to the second degree with large uncertainties. We propose a new, accurate, and fast method, correctKin, to estimate the kinship coefficient and the confidence interval using low-coverage ancient data. We perform simulations and also validate correctKin on experimental modern and ancient data with widely different genome coverages (0.12×-11.9×) using samples with known family relations and known/unknown population structure. Based on our results, correctKin allows for the reliable identification of relatedness up to the 4th degree from variable/low-coverage ancient or badly degraded forensic whole genome sequencing data.
Affiliation(s)
- Emil Nyerki
- Department of Pediatrics, University of Szeged Albert Szent-Györgyi Medical Center Faculty of Medicine, Szeged, Hungary
- Department of Archaeogenetics, Institute of Hungarian Research, Budapest, Hungary
- Tibor Kalmár
- Department of Pediatrics, University of Szeged Albert Szent-Györgyi Medical Center Faculty of Medicine, Szeged, Hungary
- Oszkár Schütz
- Department of Genetics, University of Szeged, Szeged, Hungary
- Rui M Lima
- Institute of Plant Biology, Biological Research Centre, Szeged, Hungary
- Endre Neparáczki
- Department of Archaeogenetics, Institute of Hungarian Research, Budapest, Hungary
- Department of Genetics, University of Szeged, Szeged, Hungary
- Tibor Török
- Department of Archaeogenetics, Institute of Hungarian Research, Budapest, Hungary
- Department of Genetics, University of Szeged, Szeged, Hungary
- Zoltán Maróti
- Department of Pediatrics, University of Szeged Albert Szent-Györgyi Medical Center Faculty of Medicine, Szeged, Hungary
- Department of Archaeogenetics, Institute of Hungarian Research, Budapest, Hungary
18
Yasmin T, Andres EM, Ashraf K, Basra MAR, Raza MH. Genome-wide analysis of runs of homozygosity in Pakistani controls with no history of speech or language-related developmental phenotypes. Ann Hum Biol 2023; 50:100-107. [PMID: 36786444 PMCID: PMC10284496 DOI: 10.1080/03014460.2023.2180087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 01/27/2023] [Indexed: 02/15/2023]
Abstract
BACKGROUND Analysis of runs of homozygosity (ROHs) in controls provides a convenient resource for minimizing false-positive findings of disease-associated ROHs and genetic variants for simple and complex disorders in individuals from the same population. Evidence for the relevance of ROHs to speech- or language-related traits is limited by the absence of population-matched, behaviourally defined controls and by the scarcity of family-based studies. AIM This study aims to identify common ROHs in the Pakistani population, focussing on the total length and frequency of ROHs of variable sizes, shared ROHs, and their genomic distribution. SUBJECTS AND METHODS We performed homozygosity analysis (in PLINK) of 86 individuals (39 males, 47 females) with no history of speech- or language-related phenotypes (controls) who had been genotyped with the Illumina Infinium QC Array-24. RESULTS ROHs of 1 to <4 megabases (Mb) were frequent in unrelated individuals. We observed ROHs over 20 Mb in six individuals. Over 30 percent of the identified ROHs were shared among several individuals, indicating the effect of consanguinity on the Pakistani population. CONCLUSION Our findings serve as a foundation for family-based genetic studies of consanguineous families with speech- or language-related disorders, which can ultimately narrow the homozygosity regions of interest for identifying pathogenic variants.
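The homozygosity analysis above was run in PLINK, whose `--homozyg` scan uses sliding windows and tolerates a limited number of heterozygous or missing calls. The sketch below is a deliberately stricter simplification for illustration: it reports runs of consecutive homozygous calls on one chromosome that pass both a minimum SNP count and a minimum physical length. The thresholds, the 0/1/2 genotype coding, and the function name are assumptions, not the study's settings.

```python
from typing import List, Optional, Sequence, Tuple


def find_roh(
    pos_bp: Sequence[int],        # sorted SNP positions on one chromosome
    geno: Sequence[int],          # genotypes coded 0/1/2; 1 = heterozygous
    min_snps: int = 50,           # assumed minimum number of SNPs in a run
    min_len_bp: int = 1_000_000,  # assumed minimum physical length of a run
) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) for runs of consecutive homozygous calls meeting both thresholds."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index where the current homozygous run began
    for i, g in enumerate(list(geno) + [1]):  # a sentinel heterozygote closes any trailing run
        if g != 1:
            if start is None:
                start = i
            continue
        if start is not None:
            end = i - 1
            if end - start + 1 >= min_snps and pos_bp[end] - pos_bp[start] >= min_len_bp:
                runs.append((pos_bp[start], pos_bp[end]))
            start = None
    return runs
```

Summing `end - start` over the returned runs gives an individual's total ROH length, and binning run lengths (for example, 1 to <4 Mb, or over 20 Mb) gives size-class frequencies of the kind reported in the abstract.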
Affiliation(s)
- Tahira Yasmin
- Centre for Clinical and Nutritional Chemistry, School of Chemistry, University of The Punjab, Lahore, Pakistan
- Erin M. Andres
- Thompson Center for Autism & Neurodevelopment, University of Missouri, Columbia, MO, USA
- Child Language Doctoral Program (CLDP), University of Kansas, Lawrence, KS, 66045, USA
- Komal Ashraf
- Centre for Clinical and Nutritional Chemistry, School of Chemistry, University of The Punjab, Lahore, Pakistan
- Muhammad Asim Raza Basra
- Centre for Clinical and Nutritional Chemistry, School of Chemistry, University of The Punjab, Lahore, Pakistan
- Muhammad Hashim Raza
- Child Language Doctoral Program (CLDP), University of Kansas, Lawrence, KS, 66045, USA
19
Popli D, Peyrégne S, Peter BM. KIN: a method to infer relatedness from low-coverage ancient DNA. Genome Biol 2023; 24:10. [PMID: 36650598 PMCID: PMC9843908 DOI: 10.1186/s13059-023-02847-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/18/2022] [Accepted: 01/04/2023] [Indexed: 01/19/2023]
Abstract
Genetic kinship of ancient individuals can provide insights into their culture and social hierarchy, and is relevant for downstream genetic analyses. However, estimating relatedness from ancient DNA is difficult due to low coverage, ascertainment bias, or contamination from various sources. Here, we present KIN, a method to estimate the relatedness of a pair of individuals from the identical-by-descent segments they share. KIN accurately classifies up to 3rd-degree relatives using at least 0.05x sequence coverage and differentiates siblings from parent-child pairs. It incorporates additional models to adjust for contamination and detect inbreeding, which improves classification accuracy.
Affiliation(s)
- Divyaratan Popli
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Stéphane Peyrégne
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Benjamin M. Peter
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
20
Tang K, Naseri A, Wei Y, Zhang S, Zhi D. Open-source benchmarking of IBD segment detection methods for biobank-scale cohorts. Gigascience 2022; 11:giac111. [PMID: 36472573 PMCID: PMC9724555 DOI: 10.1093/gigascience/giac111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 08/04/2022] [Accepted: 09/28/2022] [Indexed: 12/12/2022]
Abstract
In the recent biobank era of genetics, the problem of identical-by-descent (IBD) segment detection has received renewed interest, as IBD segments in large cohorts offer unprecedented opportunities for studying population and genealogical history, as well as for genetic association of long haplotypes. While a new generation of efficient IBD segment detection methods is becoming available, direct comparison of these methods is difficult: existing benchmarks were often evaluated on different datasets, some of which are not openly accessible; the methods benchmarked were run under suboptimal parameters; and benchmark performance metrics were not defined consistently. Here, we developed a comprehensive and completely open-source evaluation of the power, accuracy, and resource consumption of these IBD segment detection methods using realistic population genetic simulations with various settings. Our results pave the way for fair evaluation of IBD segment detection methods and provide a practical guide for users.
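Because the abstract notes that benchmark metrics have not been defined consistently, it may help to see one possible definition written out. In the sketch below, which is an assumption and not necessarily the paper's definition, power is the fraction of true IBD length covered by called segments and accuracy is the fraction of called length that overlaps true IBD, both computed for a single pair. Segment coordinates may be in bp or cM; the function names are hypothetical.

```python
from typing import List, Tuple

Seg = Tuple[float, float]  # (start, end) with end > start, in bp or cM


def merge(segs: List[Seg]) -> List[Seg]:
    """Sort segments and merge any that overlap or touch."""
    out: List[Seg] = []
    for s, e in sorted(segs):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def length(segs: List[Seg]) -> float:
    return sum(e - s for s, e in merge(segs))


def overlap_len(a: List[Seg], b: List[Seg]) -> float:
    """Total length covered by both segment sets (two-pointer sweep over merged lists)."""
    a, b = merge(a), merge(b)
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        if a[i][1] < b[j][1]:  # advance whichever segment ends first
            i += 1
        else:
            j += 1
    return total


def power_accuracy(true: List[Seg], called: List[Seg]) -> Tuple[float, float]:
    shared = overlap_len(true, called)
    power = shared / length(true) if true else float("nan")
    accuracy = shared / length(called) if called else float("nan")
    return power, accuracy
```

Defining both metrics by overlapping length rather than by segment counts avoids rewarding a method that splits one true segment into many short calls; other definitions (per-segment recovery, length-binned power) are equally reasonable, which is exactly why a shared, open benchmark matters.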
Affiliation(s)
- Kecong Tang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Ardalan Naseri
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Yuan Wei
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Shaojie Zhang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Degui Zhi
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
21
Alva O, Leroy A, Heiske M, Pereda-Loth V, Tisseyre L, Boland A, Deleuze JF, Rocha J, Schlebusch C, Fortes-Lima C, Stoneking M, Radimilahy C, Rakotoarisoa JA, Letellier T, Pierron D. The loss of biodiversity in Madagascar is contemporaneous with major demographic events. Curr Biol 2022; 32:4997-5007.e5. [PMID: 36334586 DOI: 10.1016/j.cub.2022.09.060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Revised: 07/13/2022] [Accepted: 09/28/2022] [Indexed: 11/06/2022]
Abstract
Only 400 km off the coast of East Africa, the island of Madagascar is one of the last large land masses to have been colonized by humans. While many questions surround the human occupation of Madagascar, recent studies have raised the question of human impact on endemic biodiversity and landscape transformation. Previous genetic and linguistic analyses have shown that the Malagasy population emerged from an admixture, during the last millennium, between Bantu-speaking African populations and Austronesian-speaking Asian populations. By studying the sharing of chromosome segments between individuals (IBD determination), local ancestry information, and simulated genetic data, we inferred that the ancestral Asian population of the Malagasy was isolated for more than 1,000 years with an effective size of just a few hundred individuals. This isolation ended around 1,000 years before present (BP) through admixture with a small African population. Around the time of admixture, there was a rapid demographic expansion due to intrinsic growth of the newly admixed population, which coincides with extensive changes in Madagascar's landscape and the extinction of all endemic large-bodied vertebrates. Our approach can therefore provide new insights into past human demography and its associated impacts on ecosystems.
Affiliation(s)
- Omar Alva
- Équipe de Médecine Evolutive, EVOLSAN faculté de chirurgie dentaire, Université Toulouse III, Toulouse, France
- Anaïs Leroy
- Équipe de Médecine Evolutive, EVOLSAN faculté de chirurgie dentaire, Université Toulouse III, Toulouse, France
- Margit Heiske
- Équipe de Médecine Evolutive, EVOLSAN faculté de chirurgie dentaire, Université Toulouse III, Toulouse, France
- Veronica Pereda-Loth
- Équipe de Médecine Evolutive, EVOLSAN faculté de chirurgie dentaire, Université Toulouse III, Toulouse, France
- Lenka Tisseyre
- Équipe de Médecine Evolutive, EVOLSAN faculté de chirurgie dentaire, Université Toulouse III, Toulouse, France
- Anne Boland
- Commissariat à l'Energie Atomique, Institut Génomique, Centre National de Génotypage, 91000 Evry, France
- Jean-François Deleuze
- Commissariat à l'Energie Atomique, Institut Génomique, Centre National de Génotypage, 91000 Evry, France
- Jorge Rocha
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Campus de Vairão, Universidade do Porto, 4485-661 Vairão, Portugal; Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, 4099-002 Porto, Portugal; BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Campus de Vairão, 4485-661 Vairão, Portugal
- Carina Schlebusch
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18C, 75236 Uppsala, Sweden
- Cesar Fortes-Lima
- Human Evolution, Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18C, 75236 Uppsala, Sweden
- Mark Stoneking
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, D-04103 Leipzig, Germany; Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, Villeurbanne, France
- Chantal Radimilahy
- Musée d'Art et d'Archéologie, University of Antananarivo, Antananarivo, Madagascar
- Thierry Letellier
- Équipe de Médecine Evolutive, EVOLSAN faculté de chirurgie dentaire, Université Toulouse III, Toulouse, France
- Denis Pierron
- Équipe de Médecine Evolutive, EVOLSAN faculté de chirurgie dentaire, Université Toulouse III, Toulouse, France
22
Noto K, Ruiz L. Accurate genome-wide phasing from IBD data. BMC Bioinformatics 2022; 23:502. [PMID: 36424541 PMCID: PMC9686111 DOI: 10.1186/s12859-022-05066-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 11/17/2022] [Indexed: 11/25/2022]
Abstract
As genotype databases increase in size, so too does the number of detectable segments of identity by descent (IBD): segments of the genome where two individuals share an identical copy of one of their two parental haplotypes due to shared ancestry. We show that, given a large enough genotype database, these IBD segments collectively overlap entire chromosomes, including instances of IBD that span multiple chromosomes, and can be used to accurately separate the alleles inherited from each parent across the entire genome. The resulting phase is not an improvement over state-of-the-art local phasing methods, but it provides accurate long-range phasing that indicates which of two haplotypes in different regions of the genome, including on different chromosomes, was inherited from the same parent. We are able to separate the DNA inherited from each parent completely, across the entire genome, with 98% median accuracy in a test set of 30,000 individuals. We estimate the IBD data requirements for accurate genome-wide phasing, and we propose a method for estimating confidence in the resulting phase. We show that our methods do not require the genotypes of close family and that they are robust to genotype errors and missing data. In fact, our method can accurately impute missing data and correct genotype errors.
23
Huang M, Liu M, Li H, King J, Smuts A, Budowle B, Ge J. A machine learning approach for missing persons cases with high genotyping errors. Front Genet 2022; 13:971242. [PMID: 36263419 PMCID: PMC9573995 DOI: 10.3389/fgene.2022.971242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 09/16/2022] [Indexed: 11/22/2022]
Abstract
Estimating the relationships between individuals is one of the fundamental challenges in many fields. In particular, relationship estimation could provide valuable information for missing persons cases. The recently developed investigative genetic genealogy approach uses high-density single nucleotide polymorphisms (SNPs) to determine close and more distant relationships, with hundreds of thousands to tens of millions of SNPs generated by either microarray genotyping or whole-genome sequencing. Current studies usually assume that SNP profiles were generated with minimal errors. However, in missing persons cases, DNA samples can be highly degraded, and the SNP profiles generated from them usually contain many errors. In this study, a machine learning approach was developed for estimating relationships from SNP profiles with high error rates. The approach uses a hierarchical classification strategy, first classifying relationships by degree and then classifying relationship types within each degree separately. Feature selection was applied at each classification step to improve performance. Both simulated and real data sets with various genotyping error rates were used to evaluate the approach, and its accuracy was higher than that of the individual measures; that is, the approach was more accurate and robust than individual measures for SNP profiles with genotyping errors. In addition, the highest accuracy was obtained when the training and test sets had the same genotyping error rates; estimating the genotyping error rate of SNP profiles is therefore critical to achieving high accuracy in relationship estimation.
Affiliation(s)
- Meng Huang
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Muyi Liu
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Hongmin Li
- Department of Computer Science, College of Science, California State University, East Bay, Hayward, CA, United States
- Jonathan King
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Amy Smuts
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Bruce Budowle
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, TX, United States
- Jianye Ge
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, TX, United States
- *Correspondence: Jianye Ge
24
Oriol Sabat B, Mas Montserrat D, Giro-i-Nieto X, Ioannidis AG. SALAI-Net: species-agnostic local ancestry inference network. Bioinformatics 2022; 38:ii27-ii33. [PMID: 36124792 PMCID: PMC9486591 DOI: 10.1093/bioinformatics/btac464] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/25/2022]
Abstract
MOTIVATION Local ancestry inference (LAI) is the high-resolution prediction of ancestry labels along a DNA sequence. LAI is important in the study of human history and migrations, and it is beginning to play a role in precision medicine applications including ancestry-adjusted genome-wide association studies (GWASs) and polygenic risk scores (PRSs). Existing LAI models do not generalize well between species, chromosomes or even ancestry groups, requiring re-training for each different setting. Furthermore, such methods can lack interpretability, which is an important element in each of these applications. RESULTS We present SALAI-Net, a portable statistical LAI method that can be applied to any set of species and ancestries (species-agnostic), requiring only haplotype data and no other biological parameters. Inspired by identity-by-descent methods, SALAI-Net estimates population labels for each segment of DNA using a reference matching approach, which leads to an interpretable and fast technique. We benchmark our models on human whole-genome data and test their ability to generalize to dog breeds when trained on human data. SALAI-Net outperforms previous methods in terms of balanced accuracy, while generalizing between different settings, species and datasets. Moreover, it is up to two orders of magnitude faster and uses considerably less RAM than competing methods. AVAILABILITY AND IMPLEMENTATION We provide an open source implementation and links to publicly available data at github.com/AI-sandbox/SALAI-Net. Data are publicly available as follows: https://www.internationalgenome.org (1000 Genomes), https://www.simonsfoundation.org/simons-genome-diversity-project (Simons Genome Diversity Project), https://www.sanger.ac.uk/resources/downloads/human/hapmap3.html (HapMap), ftp://ngs.sanger.ac.uk/production/hgdp/hgdp_wgs.20190516 (Human Genome Diversity Project) and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA448733 (Canid genomes).
SUPPLEMENTARY INFORMATION Supplementary data are available from Bioinformatics online.
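Reference matching in its simplest form can be sketched as follows: split a query haplotype into fixed windows and, in each window, assign the ancestry label of the best-matching reference haplotype. This is a naive illustration under stated assumptions (biallelic 0/1 haplotypes, aligned sites, a hypothetical fixed `window` size, illustrative labels), not SALAI-Net itself, which the abstract describes as a portable, interpretable model.

```python
from collections import defaultdict


def window_lai(query, references, labels, window=4):
    """Naive local ancestry: label each window of `query` with the population
    of its best-matching reference haplotype.

    query: list of 0/1 alleles; references: list of 0/1 haplotypes aligned to
    the same sites; labels: population label for each reference haplotype.
    """
    calls = []
    for start in range(0, len(query), window):
        end = min(start + window, len(query))
        best = defaultdict(float)  # best match fraction seen per population
        for hap, pop in zip(references, labels):
            matches = sum(q == r for q, r in zip(query[start:end], hap[start:end]))
            best[pop] = max(best[pop], matches / (end - start))
        # Ties resolve to the alphabetically first label, for determinism.
        calls.append(max(sorted(best), key=lambda pop: best[pop]))
    return calls


# Example: the first half of the query matches "AFR", the second half "EUR".
calls = window_lai([0, 0, 0, 0, 1, 1, 1, 1], [[0] * 8, [1] * 8], ["AFR", "EUR"])
```

Practical LAI methods typically smooth such per-window calls along the chromosome; this sketch does not.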
Affiliation(s)
- Benet Oriol Sabat
- Department of Signal Theory and Communications, Universitat Politecnica de Catalunya, Barcelona 08034, Spain
- Department of Biomedical Data Science, Stanford Medical School
- Xavier Giro-i-Nieto
- Department of Signal Theory and Communications, Universitat Politecnica de Catalunya, Barcelona 08034, Spain
- Alexander G Ioannidis
- Department of Biomedical Data Science, Stanford Medical School
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA
25
da Cruz PRS, Ananina G, Secolin R, Gil-da-Silva-Lopes VL, Lima CSP, de França PHC, Donatti A, Lourenço GJ, de Araujo TK, Simioni M, Lopes-Cendes I, Costa FF, de Melo MB. Demographic history differences between Hispanics and Brazilians imprint haplotype features. G3 GENES|GENOMES|GENETICS 2022; 12:6576632. [PMID: 35511163 PMCID: PMC9258545 DOI: 10.1093/g3journal/jkac111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Accepted: 04/27/2022] [Indexed: 11/24/2022]
Abstract
Admixture is known to greatly impact the genetic landscape of a population, and while genetic variation underlying human phenotypes has been shown to differ among populations, studies of admixed subjects are still scarce. Latin American populations are the result of a complex demographic history, including two- or three-way admixture events, bottlenecks and/or expansions, and adaptive events unique to the American continent. To explore the impact of these events on the genetic structure of Latino populations, we evaluated the following haplotype features: linkage disequilibrium, shared identity-by-descent segments, runs of homozygosity, and extended haplotype homozygosity (integrated haplotype score) in Latinos represented in the 1000 Genomes Project, along with array data from 171 Brazilians sampled in the South and Southeast regions of Brazil. We found that linkage disequilibrium decay relates to the amount of American and African ancestry. The extent of identity-by-descent sharing correlates positively with historical effective population sizes, which we found to be steady or growing, except for Puerto Ricans and Colombians. Long runs of homozygosity, a particular instance of autozygosity, were enriched only in Peruvians and Native Americans. We used simulations to account for random sampling and linkage disequilibrium when filtering positive selection indexes and found 244 unique markers under selection, 26 of which are common to two or more populations. Some markers exhibiting positive selection signals had an estimated time to the most recent common ancestor consistent with human adaptation to the American continent. In conclusion, Latino populations present highly divergent haplotype characteristics that impact genetic architecture and underlie complex phenotypes.
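Linkage disequilibrium decay, one of the haplotype features evaluated here, is commonly summarized by averaging the r² statistic between SNP pairs binned by physical distance. A minimal sketch, assuming phased 0/1 haplotypes stored per SNP and positions sorted in ascending order; the function names and binning scheme are illustrative, not the authors' pipeline.

```python
import math
from collections import defaultdict


def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs from paired 0/1 haplotypes."""
    n = len(hap_a)
    pa = sum(hap_a) / n
    pb = sum(hap_b) / n
    pab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = pab - pa * pb  # the LD coefficient D
    denom = pa * (1 - pa) * pb * (1 - pb)
    # r^2 is undefined when either SNP is monomorphic.
    return d * d / denom if denom > 0 else float("nan")


def ld_decay(positions, snp_haps, bin_width):
    """Mean r^2 per distance bin; positions must be sorted ascending."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r2 = r_squared(snp_haps[i], snp_haps[j])
            if math.isnan(r2):
                continue
            b = (positions[j] - positions[i]) // bin_width
            sums[b] += r2
            counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

Plotting the returned bin means against distance gives the decay curve; faster decay reflects more historical recombination since the haplotypes' common ancestry.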
Affiliation(s)
- Pedro Rodrigues Sousa da Cruz
- Laboratory of Human Genetics, Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas—UNICAMP, Campinas, SP 13083-875, Brazil
- Galina Ananina
- Laboratory of Human Genetics, Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas—UNICAMP, Campinas, SP 13083-875, Brazil
- Rodrigo Secolin
- Department of Medical Genetics and Genomic Medicine, School of Medical Sciences, University of Campinas—UNICAMP, Campinas, SP 13083-887, Brazil
- The Brazilian Institute of Neuroscience and Neurotechnology (BRAINN), Campinas, SP 13083-887, Brazil
- Vera Lúcia Gil-da-Silva-Lopes
- Department of Medical Genetics and Genomic Medicine, School of Medical Sciences, University of Campinas—UNICAMP, Campinas, SP 13083-887, Brazil
- Carmen Silvia Passos Lima
- Clinical Oncology Service, Department of Internal Medicine, School of Medical Sciences, University of Campinas—UNICAMP, Campinas, SP 13083-887, Brazil
- Amanda Donatti
- Department of Medical Genetics and Genomic Medicine, School of Medical Sciences, University of Campinas—UNICAMP, Campinas, SP 13083-887, Brazil
- The Brazilian Institute of Neuroscience and Neurotechnology (BRAINN), Campinas, SP 13083-887, Brazil
- Gustavo Jacob Lourenço
- Laboratory of Cancer Genetics, School of Medical Sciences, University of Campinas—UNICAMP, Campinas, SP 13083-887, Brazil
- Tânia Kawasaki de Araujo
- Department of Medical Genetics and Genomic Medicine, School of Medical Sciences, University of Campinas—UNICAMP, Campinas, SP 13083-887, Brazil
- Milena Simioni
- Department of Medical Genetics and Genomic Medicine, School of Medical Sciences, University of Campinas—UNICAMP, Campinas, SP 13083-887, Brazil
- Iscia Lopes-Cendes
- Department of Medical Genetics and Genomic Medicine, School of Medical Sciences, University of Campinas—UNICAMP, Campinas, SP 13083-887, Brazil
- The Brazilian Institute of Neuroscience and Neurotechnology (BRAINN), Campinas, SP 13083-887, Brazil
- Fernando Ferreira Costa
- Hematology and Hemotherapy Center, University of Campinas—UNICAMP, Campinas, SP, 13083-878, Brazil
- Mônica Barbosa de Melo
- Laboratory of Human Genetics, Center for Molecular Biology and Genetic Engineering (CBMEG), University of Campinas—UNICAMP, Campinas, SP 13083-875, Brazil
26
Turner SD, Nagraj V, Scholz M, Jessa S, Acevedo C, Ge J, Woerner AE, Budowle B. Evaluating the Impact of Dropout and Genotyping Error on SNP-Based Kinship Analysis With Forensic Samples. Front Genet 2022; 13:882268. [PMID: 35846115 PMCID: PMC9282869 DOI: 10.3389/fgene.2022.882268] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/23/2022] [Accepted: 05/16/2022] [Indexed: 11/13/2022]
Abstract
Technological advances in sequencing and single nucleotide polymorphism (SNP) genotyping microarray technology have facilitated advances in forensic analysis beyond short tandem repeat (STR) profiling, enabling the identification of unknown DNA samples and distant relationships. Forensic genetic genealogy (FGG) has facilitated the identification of distant relatives of both unidentified remains and unknown donors of crime scene DNA, invigorating the use of biological samples to resolve open cases. Forensic samples are often degraded or contain only trace amounts of DNA. In this study, the accuracy of genome-wide relatedness methods and identity-by-descent (IBD) segment approaches was evaluated in the presence of challenges commonly encountered with forensic data: missing data and genotyping error. Pedigree whole-genome simulations were used to generate the genotypes of thousands of individuals with known relationships, using multiple populations with different biogeographic ancestral origins. Simulations were also performed with varying error rates and types. Using these data, the performance of different methods for quantifying relatedness was benchmarked across these scenarios. When the genotyping error was low (<1%), IBD segment methods outperformed genome-wide relatedness methods for close relationships and were more accurate at distant relationship inference. However, with increasing genotyping error (1–5%), methods that do not rely on IBD segment detection were more robust and outperformed IBD segment methods. A reduced call rate had little impact on either class of methods. These results have implications for the use of dense SNP data in forensic genomics for distant kinship analysis and FGG, especially when sample quality is low.
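Simulation studies of this kind start from clean genotypes and degrade them. A minimal sketch of one possible degradation model, assuming 0/1/2 genotype codes, independent per-site dropout (a no-call, which lowers the call rate) and a uniform miscall to one of the other two genotypes; the error types used in the study may differ, and `corrupt_genotypes` is a hypothetical helper.

```python
import random


def corrupt_genotypes(genotypes, error_rate, dropout_rate, seed=None):
    """Return a degraded copy of 0/1/2 genotype calls.

    Each site is first dropped (None, i.e. a no-call) with probability
    dropout_rate; otherwise it is miscalled as one of the two other genotypes,
    chosen uniformly, with probability error_rate.
    """
    rng = random.Random(seed)
    out = []
    for g in genotypes:
        if rng.random() < dropout_rate:
            out.append(None)
        elif rng.random() < error_rate:
            out.append(rng.choice([x for x in (0, 1, 2) if x != g]))
        else:
            out.append(g)
    return out
```

Running the same relatedness method on profiles degraded at, say, 0.5%, 1% and 5% error is one way to reproduce the kind of robustness comparison the abstract describes.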
Affiliation(s)
- Stephen D. Turner
- Signature Science, LLC., Austin, TX, United States
- *Correspondence: Stephen D. Turner
- V.P. Nagraj
- Signature Science, LLC., Austin, TX, United States
- Jianye Ge
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, United States
- August E. Woerner
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, United States
- Bruce Budowle
- Center for Human Identification, University of North Texas Health Science Center, Fort Worth, TX, United States
- Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, United States
27
Tournebize R, Chu G, Moorjani P. Reconstructing the history of founder events using genome-wide patterns of allele sharing across individuals. PLoS Genet 2022; 18:e1010243. [PMID: 35737729 PMCID: PMC9223333 DOI: 10.1371/journal.pgen.1010243] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/06/2022] [Accepted: 05/08/2022] [Indexed: 11/30/2022]
Abstract
Founder events play a critical role in shaping genetic diversity, fitness and disease risk in a population. Yet our understanding of the prevalence and distribution of founder events in humans and other species remains incomplete, as most existing methods require large sample sizes or phased genomes. We therefore developed ASCEND, which measures the correlation in allele sharing between pairs of individuals across the genome to infer the age and strength of founder events. We show that ASCEND can reliably estimate the parameters of founder events under a range of demographic scenarios. We then apply ASCEND to two species with contrasting evolutionary histories: ~460 worldwide human populations and ~40 modern dog breeds. In humans, we find that over half of the analyzed populations have evidence of recent founder events, associated with geographic isolation, modes of sustenance, or cultural practices such as endogamy. Notably, island populations have lower population sizes than continental groups, and most hunter-gatherer, nomadic and indigenous groups have evidence of recent founder events. Many present-day groups, including Native Americans, Oceanians and South Asians, have experienced more extreme founder events than Ashkenazi Jews, who have high rates of recessive diseases due to their known history of founder events. Using ancient genomes, we show that the strength of founder events differs markedly across geographic regions and time, with three major founder events related to the peopling of the Americas and a trend of decreasing strength of founder events in Europe following the Neolithic transition and steppe migrations. In dogs, we estimate extreme founder events in most breeds that occurred in the last 25 generations, concordant with the establishment of many dog breeds during the Victorian era. Our analysis highlights a widespread history of founder events in humans and dogs and elucidates some of the demographic and cultural practices related to these events.
A founder event occurs when a small number of ancestral individuals give rise to a large fraction of the population. Founder events reduce genetic variation and increase the risk of recessive diseases. Despite their importance in evolutionary and disease studies, we have only a limited understanding of their prevalence and properties in humans and other species, as most existing methods require large sample sizes or phased genomes. Here, we present a flexible method, ASCEND, to infer the timing and strength of founder events that is suitable for sparse datasets with few samples or limited coverage. ASCEND provides reliable estimates across a wide range of demographic scenarios. By applying it to data from two species (humans and dogs), we document a widespread history of recent founder events in both species and provide insights into the demographic processes related to these events. Our analysis helps to identify groups with strong founder events that should be prioritized in future studies, as they offer a unique opportunity for biological discovery and for reducing disease burden through the mapping of recessive disease-causing genes and pathways, as previously shown in studies of Ashkenazi Jews and Finns.
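The age and strength estimates come from how the allele-sharing correlation decays with genetic distance. A minimal sketch, assuming an exponential-with-offset model, corr(d) ≈ A·exp(-2dT) + c, with d in Morgans and T in generations, fitted by a grid search over integer T with ordinary least squares for A and c at each T. This illustrates the curve-fitting step only; it is not the ASCEND implementation, and `fit_founder_decay` is a hypothetical name.

```python
import math


def fit_founder_decay(dists_morgans, corr, t_grid=range(1, 501)):
    """Fit corr(d) ~ A * exp(-2 * d * T) + c.

    For each candidate T (generations) in t_grid the model is linear in A and c,
    so they are solved by ordinary least squares; the T with the smallest
    residual sum of squares is returned together with its A and c.
    """
    n = len(corr)
    mean_y = sum(corr) / n
    best = None
    for t in t_grid:
        xs = [math.exp(-2 * d * t) for d in dists_morgans]
        mean_x = sum(xs) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            continue
        a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, corr)) / sxx
        c = mean_y - a * mean_x
        sse = sum((y - (a * x + c)) ** 2 for x, y in zip(xs, corr))
        if best is None or sse < best[0]:
            best = (sse, t, a, c)
    _, t, a, c = best
    return t, a, c
```

Under this assumed model, T plays the role of the founder event's age and A its strength; a recent, strong event produces a high correlation that decays slowly with distance.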
Affiliation(s)
- Rémi Tournebize
- Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
- Center for Computational Biology, University of California, Berkeley, California, United States of America
- *Correspondence: Rémi Tournebize (RT); Priya Moorjani (PM)
- Gillian Chu
- Department of Electrical Engineering and Computer Science, University of California, Berkeley, California, United States of America
- Priya Moorjani
- Department of Molecular and Cell Biology, University of California, Berkeley, California, United States of America
- Center for Computational Biology, University of California, Berkeley, California, United States of America
- *Correspondence: Rémi Tournebize (RT); Priya Moorjani (PM)
28
Smith J, Qiao Y, Williams AL. Evaluating the utility of identity-by-descent segment numbers for relatedness inference via information theory and classification. G3 (BETHESDA, MD.) 2022; 12:jkac072. [PMID: 35348675 PMCID: PMC9157175 DOI: 10.1093/g3journal/jkac072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2021] [Accepted: 03/07/2022] [Indexed: 11/29/2022]
Abstract
Despite decades of methods development for classifying relatives in genetic studies, the recall of pairwise relatedness methods exceeds 90% only for first- through third-degree relatives. The top-performing approaches, which leverage identity-by-descent segments, often use only kinship coefficients, while others, including estimation of recent shared ancestry (ERSA), use the number of segments relatives share. To quantify the potential for using segment numbers in relatedness inference, we used information-theoretic measures to analyze exact (i.e. produced by a simulator) identity-by-descent segments from simulated relatives. Over a range of settings, we found that the mutual information between the relatives' degree of relatedness and the tuple of their kinship coefficient and segment number is on average 4.6% larger than that between the degree and the kinship coefficient alone. We further evaluated the utility of identity-by-descent segment numbers by building a Bayes classifier to predict first- through sixth-degree relationships using different feature sets. When trained and tested with exact segments, including segment numbers improves recall by between 0.28% and 3% for second- through sixth-degree relatives. However, recall improves by less than 1.8% per degree when using inferred segments, suggesting limitations due to identity-by-descent detection accuracy. Last, we compared our Bayes classifier that includes segment numbers with both ERSA and IBIS and found comparable recalls, with the Bayes classifier and ERSA each slightly outperforming the other at different degrees. Overall, this study shows that identity-by-descent segment numbers can improve relatedness inference, but errors from current SNP array-based detection methods dampen the signal in practice.
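The comparison between the mutual information of degree with kinship alone and of degree with a (kinship, segment number) tuple can be reproduced on discretized toy data with a plug-in estimator. A minimal sketch, assuming the features have already been binned into discrete values; the estimator is the simple empirical one, not necessarily the one used in the study, and the toy values are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


# Toy example: relationship degree, binned kinship, and IBD segment count.
degree = [2, 2, 3, 3]
kinship_bin = ["a", "a", "a", "b"]
segments = [5, 5, 3, 2]
mi_kinship = mutual_information(degree, kinship_bin)
mi_tuple = mutual_information(degree, list(zip(kinship_bin, segments)))
```

Adding a feature to the tuple can never lower the plug-in estimate, so the informative quantity, as in the abstract, is how much it raises it.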
Affiliation(s)
- Jesse Smith
- School of Applied and Engineering Physics, Cornell University, Ithaca, NY 14853, USA
- Ying Qiao
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Amy L Williams
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
29
Fan C, Mancuso N, Chiang CW. A genealogical estimate of genetic relationships. Am J Hum Genet 2022; 109:812-824. [PMID: 35417677 DOI: 10.1016/j.ajhg.2022.03.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/24/2021] [Accepted: 03/25/2022] [Indexed: 12/23/2022]
Abstract
The application of genetic relationships among individuals, characterized by a genetic relationship matrix (GRM), has far-reaching effects in human genetics. However, the current standard for calculating the GRM treats linked markers as independent and does not explicitly model the underlying genealogical history of the study sample. Here, we propose a coalescent-informed framework, the expected GRM (eGRM), to infer the expected relatedness between pairs of individuals given an ancestral recombination graph (ARG) of the sample. Through extensive simulations, we show that the eGRM is an unbiased estimate of latent pairwise genome-wide relatedness and is robust when computed with an ARG inferred from incomplete genetic data. As a result, the eGRM better captures the structure of a population than the canonical GRM, even when using the same genetic information. More importantly, our framework allows a principled approach to estimating the eGRM at different time depths of the ARG, thereby revealing the time-varying nature of population structure in a sample. When applied to SNP array genotypes from a population sample from Northern and Eastern Finland, we find that clustering analysis with the eGRM reveals population structure driven by subpopulations that would not be apparent via the canonical GRM and that, temporally, the population model is consistent with recent divergence and expansion. Taken together, our proposed eGRM provides a robust, tree-centric estimate of relatedness with wide application to genetic studies.
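The canonical GRM that the eGRM is compared against is commonly computed from standardized genotypes, in the form used by tools such as GCTA. A minimal sketch, assuming complete 0/1/2 genotypes and allele frequencies estimated from the sample itself; it is not the eGRM, which instead works from an ARG.

```python
import math


def canonical_grm(genotypes):
    """Standardized-genotype GRM: A[j][k] = (1/M) * sum_i z_ij * z_ik.

    genotypes: one list of 0/1/2 allele counts per individual, no missing data.
    Each SNP i is standardized as (g - 2p_i) / sqrt(2p_i(1 - p_i)), with p_i
    estimated from the sample. Every SNP contributes independently, which is
    the 'linked markers treated as independent' property the eGRM addresses.
    """
    n = len(genotypes)
    z_cols = []
    for i in range(len(genotypes[0])):
        col = [ind[i] for ind in genotypes]
        p = sum(col) / (2 * n)
        var = 2 * p * (1 - p)
        if var == 0:
            continue  # monomorphic SNPs carry no relatedness information
        sd = math.sqrt(var)
        z_cols.append([(g - 2 * p) / sd for g in col])
    m = len(z_cols)
    return [[sum(z[j] * z[k] for z in z_cols) / m for k in range(n)] for j in range(n)]
```

Because the sum runs over SNPs one at a time, two tightly linked SNPs simply count twice; the eGRM instead averages over the genealogy those SNPs share.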
30
Lett BM, Kirkpatrick BW. Identifying genetic variants and pathways influencing daughter averages for twinning in North American Holstein cattle and evaluating the potential for genomic selection. J Dairy Sci 2022; 105:5972-5984. [PMID: 35525609 DOI: 10.3168/jds.2021-21238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Accepted: 03/04/2022] [Indexed: 11/19/2022]
Abstract
Multiple births in dairy cattle are detrimental both economically for producers and for animal health. The genetics of twinning is complex, and several quantitative trait loci regions have been associated with increased twinning. To identify variants associated with this trait, calving records from 2 time periods were used to estimate daughter averages for twinning for Holstein bulls. Multiple analyses were conducted and compared, including GWAS, genomic prediction, and gene set enrichment analysis for pathway detection. Although pathway analysis did not yield many congruent pathways of interest between data sets, it did indicate two pathways of interest. Both have ties to the strong candidate region on BTA11 identified by the genome-wide association analysis across data sets. This region does not overlap with previously identified quantitative trait loci regions for twinning or ovulation rate in cattle. The most strongly associated SNPs were upstream of 2 candidate genes, LHCGR and FSHR, which are involved in folliculogenesis. Genomic prediction showed a moderate correlation accuracy (0.43) when predicting genomic breeding values for bulls with estimates from calving records from 2010 to 2016. Future analysis of the region on BTA11 and of the candidate genes could improve this accuracy.
Affiliation(s)
- Beth M Lett
- Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison 53706
- Brian W Kirkpatrick
- Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison 53706
31
Turner SD, Nagraj VP, Scholz M, Jessa S, Acevedo C, Ge J, Woerner AE, Budowle B. skater: an R package for SNP-based kinship analysis, testing, and evaluation. F1000Res 2022; 11:18. [PMID: 35222994 PMCID: PMC8844523 DOI: 10.12688/f1000research.76004.1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Accepted: 12/20/2021] [Indexed: 11/20/2022]
Abstract
Motivation: SNP-based kinship analysis with genome-wide relationship estimation and IBD segment analysis methods produces results that often require further downstream processing and manipulation. A dedicated software package that consistently and intuitively implements this analysis functionality is needed. Results: Here we present the skater R package for SNP-based kinship analysis, testing, and evaluation with R. The skater package contains a suite of well-documented tools for importing, parsing, and analyzing pedigree data, performing relationship degree inference, benchmarking relationship degree classification, and summarizing IBD segment data. Availability: The skater package is implemented as an R package and is released under the MIT license at https://github.com/signaturescience/skater. Documentation is available at https://signaturescience.github.io/skater.
Affiliation(s)
- V P Nagraj
- Signature Science, LLC., Austin, TX, 78759, USA
- Jianye Ge
- Center for Human Identification, Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, 76107, USA
- August E Woerner
- Center for Human Identification, Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, 76107, USA
- Bruce Budowle
- Center for Human Identification, Department of Microbiology, Immunology, and Genetics, University of North Texas Health Science Center, Fort Worth, TX, 76107, USA
32
Balagué-Dobón L, Cáceres A, González JR. Fully exploiting SNP arrays: a systematic review on the tools to extract underlying genomic structure. Brief Bioinform 2022; 23:6535682. [PMID: 35211719 PMCID: PMC8921734 DOI: 10.1093/bib/bbac043] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/19/2021] [Revised: 01/25/2022] [Accepted: 01/28/2022] [Indexed: 12/12/2022]
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant type of genomic variation and the most accessible to genotype in large cohorts. However, individually they explain only a small proportion of phenotypic differences between individuals. Ancestry, collective SNP effects, structural variants, somatic mutations or even differences in historic recombination can potentially explain a high percentage of genomic divergence. These genetic differences can be infrequent or laborious to characterize; however, many of them leave distinctive marks on SNPs across the genome, allowing their study in large population samples. Consequently, several methods have been developed over the last decade to detect and analyze different genomic structures using SNP arrays, to complement genome-wide association studies and to determine how much these structures contribute to phenotypic differences between individuals. We present an up-to-date collection of available bioinformatics tools that can be used to extract relevant genomic information from SNP array data, including population structure and ancestry; polygenic risk scores; identity-by-descent fragments; linkage disequilibrium; heritability; and structural variants such as inversions, copy number variants, genetic mosaicisms and recombination histories. From a systematic review of recently published applications of the methods, we describe the main characteristics of R packages, command-line tools and desktop applications, both free and commercial, to help make the most of the large amount of publicly available SNP data.
33
Benchmarking phasing software with a whole-genome sequenced cattle pedigree. BMC Genomics 2022; 23:130. [PMID: 35164677 PMCID: PMC8845340 DOI: 10.1186/s12864-022-08354-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/28/2021] [Accepted: 01/24/2022] [Indexed: 12/30/2022]
Abstract
Background Accurate haplotype reconstruction is required in many applications in quantitative and population genomics. Different phasing methods are available, but their accuracy must be evaluated for samples with different properties (population structure, marker density, etc.). Here we took advantage of whole-genome sequence data available for a Holstein cattle pedigree containing 264 individuals, including 98 trios, to evaluate several population-based phasing methods. These data represent a typical livestock population, with low effective population size, high levels of relatedness and long-range linkage disequilibrium. Results After stringent filtering of our sequence data, we evaluated several population-based phasing programs, including one or more versions of AlphaPhase, ShapeIT, Beagle, Eagle and FImpute. To that end, we used the 98 individuals with both parents sequenced for validation. Their haplotypes, reconstructed from Mendelian segregation rules, were considered the gold standard for assessing the performance of population-based methods in two scenarios. In the first, only these 98 individuals were phased, while in the second, all 264 sequenced individuals were phased simultaneously, ignoring the pedigree relationships. We assessed phasing accuracy based on switch error counts (SEC) and rates (SER), lengths of correctly phased haplotypes and the probability that there is no phasing error between a pair of SNPs as a function of their distance. For most evaluated metrics or scenarios, the best software was either ShapeIT4.1 or Beagle5.2, both resulting in particularly high phasing accuracies. For instance, ShapeIT4.1 achieved a median SEC of 50 per individual and a mean haplotype block length of 24.1 Mb (scenario 2). These statistics are remarkable because the methods were evaluated with a map of 8,400,000 SNPs; this corresponds to only one switch error every 40,000 phased informative markers.
When more relatives were included in the data (scenario 2), FImpute3.0 reconstructed extremely long segments without errors. Conclusions We report extremely high phasing accuracies in a typical livestock sample. ShapeIT4.1 and Beagle5.2 proved to be the most accurate, particularly for phasing long segments and in the first scenario. Nevertheless, most tools achieved high accuracy at short distances and would be suitable for applications requiring only local haplotypes. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08354-6.
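Switch error counts and rates can be computed once true and inferred haplotypes are aligned at heterozygous sites: a switch error is a change in phase orientation between consecutive heterozygous sites. A minimal sketch, assuming one haplotype per individual coded 0/1 and restricted to phase-informative (heterozygous) sites, with SER taken as SEC divided by the number of consecutive site pairs; the evaluated tools and the study may define the denominator slightly differently.

```python
def switch_errors(true_hap, inferred_hap):
    """Return (SEC, SER) for one individual.

    Both inputs hold one haplotype (0/1) at the individual's heterozygous sites
    only, so the other haplotype is the complement. Phase orientation is 'same'
    where the inferred allele equals the true one and 'flipped' otherwise; each
    change of orientation between consecutive sites is one switch error.
    """
    orientation = [t == i for t, i in zip(true_hap, inferred_hap)]
    sec = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    pairs = len(orientation) - 1
    ser = sec / pairs if pairs > 0 else 0.0
    return sec, ser
```

A haplotype that is flipped along its entire length has no switch errors, because only changes in orientation count. By this definition, one switch error every 40,000 informative markers corresponds to an SER of about 2.5 × 10⁻⁵.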
34
Çelik G, Tuncalı T. ROHMM-A flexible hidden Markov model framework to detect runs of homozygosity from genotyping data. Hum Mutat 2021; 43:158-168. [PMID: 34923717 DOI: 10.1002/humu.24316] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2021] [Revised: 11/29/2021] [Accepted: 12/15/2021] [Indexed: 11/05/2022]
Abstract
Runs of homozygosity (ROHs), long homozygous stretches of the genome, are considered to be the result of consanguinity and usually contain recessive deleterious disease-causing mutations. Several algorithms have been developed to detect ROHs. Here, we developed a simple alternative strategy, ROHMM, which examines the non-pseudoautosomal region of the X chromosome to detect ROHs from next-generation sequencing data, using genotype probabilities and a hidden Markov model. It is implemented purely in Java and provides both a command-line and a graphical user interface. We tested ROHMM on simulated data as well as on real population data from the 1000 Genomes Project and a clinical sample. Our results show that ROHMM performs robustly, producing highly accurate homozygosity estimates under all conditions and matching or even exceeding the performance of its natural competitors.
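The HMM idea behind ROH detection can be illustrated with a two-state model (inside vs outside an ROH) decoded with the Viterbi algorithm. A minimal sketch under stated assumptions: it uses hard heterozygous/homozygous calls rather than the genotype probabilities ROHMM works with, and the emission and transition parameters (`p_het_roh`, `p_het_out`, `p_switch`) are hypothetical fixed values rather than estimated ones. It is written in Python for illustration and is unrelated to ROHMM's Java code.

```python
import math


def viterbi_roh(is_het, p_het_roh=0.01, p_het_out=0.3, p_switch=0.01):
    """Label each site as inside an ROH (True) or not (False) with a 2-state HMM.

    is_het: list of booleans, True where the genotype call is heterozygous.
    Inside an ROH a heterozygous call is rare (genotyping error); outside it
    occurs at a background rate. All probabilities are hypothetical defaults.
    """
    states = (True, False)  # True = ROH

    def emit(state, het):
        p = p_het_roh if state else p_het_out
        return math.log(p if het else 1 - p)

    trans = {(a, b): math.log(p_switch if a != b else 1 - p_switch)
             for a in states for b in states}
    # Uniform initial state probabilities, log-space scores.
    score = {s: math.log(0.5) + emit(s, is_het[0]) for s in states}
    back = []
    for het in is_het[1:]:
        new_score, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + trans[(r, s)])
            new_score[s] = score[prev] + trans[(prev, s)] + emit(s, het)
            ptr[s] = prev
        score = new_score
        back.append(ptr)
    # Trace back the most likely state path.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Example: a 50-site homozygous stretch between two heterozygous flanks.
flank = [True, False, False] * 5
calls = flank + [False] * 50 + flank
path = viterbi_roh(calls)
```

The switch penalty sets the minimum length a homozygous stretch needs before the model prefers calling it an ROH, which is why short homozygous gaps inside the flanks stay outside.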
Affiliation(s)
- Gökalp Çelik
- Health Sciences Institute, Department of Medical Genetics, Ankara Yildirim Beyazit University, Ankara, Turkey
- Timur Tuncalı
- Department of Medical Genetics, Ankara University School of Medicine, Ankara, Turkey
35
Arciero E, Dogra SA, Malawsky DS, Mezzavilla M, Tsismentzoglou T, Huang QQ, Hunt KA, Mason D, Sharif SM, van Heel DA, Sheridan E, Wright J, Small N, Carmi S, Iles MM, Martin HC. Fine-scale population structure and demographic history of British Pakistanis. Nat Commun 2021; 12:7189. [PMID: 34893604 PMCID: PMC8664933 DOI: 10.1038/s41467-021-27394-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/05/2021] [Accepted: 11/09/2021] [Indexed: 02/08/2023]
Abstract
Previous genetic and public health research in the Pakistani population has focused on the role of consanguinity in increasing recessive disease risk, but little is known about its recent population history or the effects of endogamy. Here, we investigate fine-scale population structure, history and consanguinity patterns using genotype chip data from 2,200 British Pakistanis. We reveal strong recent population structure driven by the biraderi social stratification system. We find that all subgroups have had low recent effective population sizes (Ne), with some showing a decrease 15‒20 generations ago that has resulted in extensive identity-by-descent sharing and homozygosity, increasing the risk of recessive disorders. Our results from two orthogonal methods (one using machine learning and the other coalescent-based) suggest that the detailed reporting of parental relatedness for mothers in the cohort under-represents the true levels of consanguinity. These results demonstrate the impact of cultural practices on population structure and genomic diversity in Pakistanis, and have important implications for medical genetic studies.
Affiliation(s)
- Elena Arciero
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
- Sufyan A. Dogra
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Daniel S. Malawsky
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Massimo Mezzavilla
- Department of Medical Sciences, University of Trieste, Trieste, Italy
- Theofanis Tsismentzoglou
- Leeds Institute for Data Analytics, University of Leeds, Leeds, UK; Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Qin Qin Huang
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Karen A. Hunt
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Dan Mason
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Saghira Malik Sharif
- Yorkshire Regional Genetics Service, Leeds Teaching Hospitals NHS Trust, Leeds, UK
- David A. van Heel
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Eamonn Sheridan
- Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- John Wright
- Bradford Institute for Health Research, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Neil Small
- Faculty of Health Studies, University of Bradford, Richmond Road, Bradford, UK
- Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Mark M. Iles
- Leeds Institute for Data Analytics, University of Leeds, Leeds, UK; Leeds Institute of Medical Research, University of Leeds, Leeds, UK
- Hilary C. Martin
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
36
de Vries JH, Kling D, Vidaki A, Arp P, Kalamara V, Verbiest MMPJ, Piniewska-Róg D, Parsons TJ, Uitterlinden AG, Kayser M. Impact of SNP microarray analysis of compromised DNA on kinship classification success in the context of investigative genetic genealogy. Forensic Sci Int Genet 2021; 56:102625. [PMID: 34753062 DOI: 10.1016/j.fsigen.2021.102625] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 07/27/2021] [Revised: 10/25/2021] [Accepted: 10/27/2021] [Indexed: 11/04/2022]
Abstract
Single nucleotide polymorphism (SNP) data generated with microarray technologies have been used to solve murder cases via investigative leads obtained from identifying relatives of the unknown perpetrator included in accessible genomic databases, an approach referred to as investigative genetic genealogy (IGG). However, SNP microarrays were developed for relatively high input DNA quantity and quality, while DNA typically obtainable from crime scene stains is of low DNA quantity and quality, and SNP microarray data obtained from compromised DNA are largely missing. By applying the Illumina Global Screening Array (GSA) to 264 DNA samples with systematically altered quantity and quality, we empirically tested the impact of SNP microarray analysis of compromised DNA on kinship classification success, as relevant in IGG. Reference data from manufacturer-recommended input DNA quality and quantity were used to estimate genotype accuracy in the compromised DNA samples and for simulating data of different degree relatives. Although stepwise decrease of input DNA amount from 200 ng to 6.25 pg led to decreased SNP call rates and increased genotyping errors, kinship classification success did not decrease down to 250 pg for siblings and 1st cousins, 1 ng for 2nd cousins, while at 25 pg and below kinship classification success was zero. Stepwise decrease of input DNA quality via increased DNA fragmentation resulted in the decrease of genotyping accuracy as well as kinship classification success, which went down to zero at the average DNA fragment size of 150 base pairs. Combining decreased DNA quantity and quality in mock casework and skeletal samples further highlighted possibilities and limitations. 
Overall, GSA analysis achieved maximal kinship classification success with 200 to 800 times less input DNA than the manufacturer recommends, although DNA quality also plays a key role; moreover, compromised DNA produced false-negative rather than false-positive kinship classifications.
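The 800- and 200-fold figures follow from the input amounts reported above: the manufacturer-recommended 200 ng compared with the lowest amounts at which kinship classification success did not decrease (250 pg for siblings and 1st cousins, 1 ng for 2nd cousins). A small Python check of that arithmetic, with the helper name `fold_reduction` chosen purely for illustration:

```python
RECOMMENDED_INPUT_NG = 200.0  # manufacturer-recommended input quoted in the abstract


def fold_reduction(input_pg: float, reference_ng: float = RECOMMENDED_INPUT_NG) -> float:
    """Return how many times less DNA `input_pg` is than `reference_ng` (1 ng = 1000 pg)."""
    return reference_ng * 1000.0 / input_pg


# Lowest inputs at which classification success did not decrease, per the abstract.
lowest_unaffected_pg = {"siblings and 1st cousins": 250.0, "2nd cousins": 1000.0}

for relationship, pg in lowest_unaffected_pg.items():
    print(f"{relationship}: {fold_reduction(pg):.0f}-fold less than {RECOMMENDED_INPUT_NG:.0f} ng")
```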
Affiliation(s)
- Jard H de Vries
- Erasmus MC, University Medical Center Rotterdam, Department of Internal Medicine, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Daniel Kling
- Department of Forensic Genetics and Toxicology, National Board of Forensic Medicine, Artillerigatan 12, 587 58 Linköping, Sweden
- Athina Vidaki
- Erasmus MC, University Medical Center Rotterdam, Department of Genetic Identification, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Pascal Arp
- Erasmus MC, University Medical Center Rotterdam, Department of Internal Medicine, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Vivian Kalamara
- Erasmus MC, University Medical Center Rotterdam, Department of Genetic Identification, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Michael M P J Verbiest
- Erasmus MC, University Medical Center Rotterdam, Department of Internal Medicine, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Danuta Piniewska-Róg
- Malopolska Centre of Biotechnology, Jagiellonian University, 30-387 Krakow, Poland; Department of Forensic Medicine, Jagiellonian University Medical College, 31-531 Krakow, Poland
- Thomas J Parsons
- International Commission on Missing Persons, Koninginnegracht 12a, 2514 AA Den Haag, the Netherlands
- André G Uitterlinden
- Erasmus MC, University Medical Center Rotterdam, Department of Internal Medicine, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands; Erasmus MC, University Medical Center Rotterdam, Department of Epidemiology, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands
- Manfred Kayser
- Erasmus MC, University Medical Center Rotterdam, Department of Genetic Identification, Dr. Molewaterplein 40, 3015 GD Rotterdam, the Netherlands.
37
Hateley S, Lopez-Izquierdo A, Jou CJ, Cho S, Schraiber JG, Song S, Maguire CT, Torres N, Riedel M, Bowles NE, Arrington CB, Kennedy BJ, Etheridge SP, Lai S, Pribble C, Meyers L, Lundahl D, Byrnes J, Granka JM, Kauffman CA, Lemmon G, Boyden S, Scott Watkins W, Karren MA, Knight S, Brent Muhlestein J, Carlquist JF, Anderson JL, Chahine KG, Shah KU, Ball CA, Benjamin IJ, Yandell M, Tristani-Firouzi M. The history and geographic distribution of a KCNQ1 atrial fibrillation risk allele. Nat Commun 2021; 12:6442. [PMID: 34750360 PMCID: PMC8575962 DOI: 10.1038/s41467-021-26741-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/13/2020] [Accepted: 10/20/2021] [Indexed: 11/08/2022]
Abstract
The genetic architecture of atrial fibrillation (AF) encompasses low impact, common genetic variants and high impact, rare variants. Here, we characterize a high impact AF-susceptibility allele, KCNQ1 R231H, and describe its transcontinental geographic distribution and history. Induced pluripotent stem cell-derived cardiomyocytes procured from risk allele carriers exhibit abbreviated action potential duration, consistent with a gain-of-function effect. Using identity-by-descent (IBD) networks, we estimate the broad- and fine-scale population ancestry of risk allele carriers and their relatives. Analysis of ancestral migration routes reveals ancestors who inhabited Denmark in the 1700s, migrated to the Northeastern United States in the early 1800s, and traveled across the Midwest to arrive in Utah in the late 1800s. IBD/coalescent-based allele dating analysis reveals a relatively recent origin of the AF risk allele (~5000 years). Thus, our approach broadens the scope of study for disease susceptibility alleles to the context of human migration and ancestral origins.
Affiliation(s)
- Chuanchau J Jou
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Scott Cho
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
- Colin T Maguire
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
- Natalia Torres
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
- Michael Riedel
- Cardiovascular Center, Medical College of Wisconsin, Milwaukee, WI, USA
- Neil E Bowles
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Cammon B Arrington
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Brett J Kennedy
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Susan P Etheridge
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Shuping Lai
- Cardiovascular Center, Medical College of Wisconsin, Milwaukee, WI, USA
- Chase Pribble
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
- Lindsay Meyers
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Derek Lundahl
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Christopher A Kauffman
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
- Gordon Lemmon
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Steven Boyden
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- W Scott Watkins
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Mary Anne Karren
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Khushi U Shah
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA
- Ivor J Benjamin
- Cardiovascular Center, Medical College of Wisconsin, Milwaukee, WI, USA
- Mark Yandell
- Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Martin Tristani-Firouzi
- Nora Eccles Harrison CVRTI, University of Utah School of Medicine, Salt Lake City, UT, USA.
- Division of Pediatric Cardiology, University of Utah School of Medicine, Salt Lake City, UT, USA.
38
Belbin GM, Rutledge S, Dodatko T, Cullina S, Turchin MC, Kohli S, Torre D, Yee MC, Gignoux CR, Abul-Husn NS, Houten SM, Kenny EE. Leveraging health systems data to characterize a large effect variant conferring risk for liver disease in Puerto Ricans. Am J Hum Genet 2021; 108:2099-2111. [PMID: 34678161 PMCID: PMC8595966 DOI: 10.1016/j.ajhg.2021.09.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/06/2021] [Accepted: 09/28/2021] [Indexed: 12/22/2022]
Abstract
The integration of genomic data into health systems offers opportunities to identify genomic factors underlying the continuum of rare and common disease. We applied a population-scale haplotype association approach based on identity-by-descent (IBD) in a large multi-ethnic biobank to a spectrum of disease outcomes derived from electronic health records (EHRs) and uncovered a risk locus for liver disease. We used genome sequencing and in silico approaches to fine-map the signal to a non-coding variant (c.2784-12T>C) in the gene ABCB4. In vitro analysis confirmed the variant disrupted splicing of the ABCB4 pre-mRNA. Four of five homozygotes had evidence of advanced liver disease, and there was a significant association with liver disease among heterozygotes, suggesting the variant is linked to increased risk of liver disease in an allele dose-dependent manner. Population-level screening revealed the variant to be at a carrier rate of 1.95% in Puerto Rican individuals, likely as the result of a Puerto Rican founder effect. This work demonstrates that integrating EHR and genomic data at a population scale can facilitate strategies for understanding the continuum of genomic risk for common diseases, particularly in populations underrepresented in genomic medicine.
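To put a 1.95% carrier rate in perspective, one can back-calculate the allele frequency and the expected homozygote frequency. This sketch assumes that "carrier rate" means the heterozygote frequency and that genotypes follow Hardy-Weinberg proportions; founder effects and consanguinity can violate that assumption, so the numbers are rough orientation only, not estimates from the study.

```python
import math


def allele_freq_from_carrier_rate(carrier_rate: float) -> float:
    """Solve 2q(1 - q) = carrier_rate for the minor allele frequency q (Hardy-Weinberg assumed)."""
    if not 0.0 <= carrier_rate <= 0.5:
        raise ValueError("heterozygote frequency must lie in [0, 0.5]")
    # Rearranged as 2q^2 - 2q + h = 0; the smaller root is the minor allele frequency.
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_rate)) / 2.0


q = allele_freq_from_carrier_rate(0.0195)
expected_homozygotes = q * q
print(f"allele frequency ~ {q:.4f}")
print(f"expected homozygote frequency ~ {expected_homozygotes:.2e} "
      f"(about 1 in {1 / expected_homozygotes:,.0f})")
```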
Affiliation(s)
- Gillian M Belbin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
- Stephanie Rutledge
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Tetyana Dodatko
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sinead Cullina
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Michael C Turchin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sumita Kohli
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Denis Torre
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Muh-Ching Yee
- Stanford Functional Genomics Facility, Stanford University, Stanford, CA 94305, USA
- Christopher R Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Noura S Abul-Husn
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sander M Houten
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
39
Wang W, Chen L, Wang X, Duan J, Flynn RD, Wang Y, Clark CB, Sun L, Zhang D, Wang DR, Kessler SA, Ma J. A transposon-mediated reciprocal translocation promotes environmental adaptation but compromises domesticability of wild soybeans. New Phytol 2021; 232:1765-1777. [PMID: 34363228 DOI: 10.1111/nph.17671] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/08/2021] [Accepted: 08/04/2021] [Indexed: 06/13/2023]
Abstract
Large structural variations frequently occur in higher plants; however, the impact of such variations on plant diversification, adaptation and domestication remains elusive. Here, we mapped and characterised a reciprocal chromosomal translocation in soybeans and assessed its effects on the diversification and adaptation of wild (Glycine soja) and semiwild (Glycine gracilis) soybeans, and on the domestication of cultivated soybean (Glycine max), by tracing the distribution of the translocation in the USDA Soybean Germplasm Collection and by population genetics analysis. We demonstrate that the translocation arose through CACTA transposon-mediated chromosomal breakage in wild soybean c. 0.34 Ma, causes semisterility in translocation heterozygotes and reduces their reproductive fitness. The translocation has differentiated Continental (i.e. China and Russia) populations of G. soja from Maritime (i.e. Korea and Japan) populations and is predominantly associated with adaptation to cold and dry climates. Further analysis revealed that the divergence of G. max from G. soja predates the translocation event and that G. gracilis is an evolutionary intermediate between G. soja and G. max. Our results highlight the effects of a chromosome rearrangement on the processes leading to plant divergence and adaptation, and provide evidence suggesting that G. gracilis, rather than G. soja, is the ancestor of cultivated soybean.
Affiliation(s)
- Weidong Wang
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
- Liyang Chen
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
- Xutong Wang
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
- Jingbo Duan
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
- Rachel D Flynn
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN, 47907, USA
- Ying Wang
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
- College of Plant Science, Jilin University, Changchun, Jilin, 130062, China
- Chancelor B Clark
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
- Lianjun Sun
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
- College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100083, China
- Dajian Zhang
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
- College of Agronomy, Shandong Agricultural University, Tai'an, Shandong, 271018, China
- Diane R Wang
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
- Center for Plant Biology, Purdue University, West Lafayette, IN, 47907, USA
- Sharon A Kessler
- Department of Botany and Plant Pathology, Purdue University, West Lafayette, IN, 47907, USA
- Center for Plant Biology, Purdue University, West Lafayette, IN, 47907, USA
- Jianxin Ma
- Department of Agronomy, Purdue University, West Lafayette, IN, 47907, USA
- Center for Plant Biology, Purdue University, West Lafayette, IN, 47907, USA
40
Fast two-stage phasing of large-scale sequence data. Am J Hum Genet 2021; 108:1880-1890. [PMID: 34478634 DOI: 10.1016/j.ajhg.2021.08.005] [Citation(s) in RCA: 172] [Impact Index Per Article: 57.3] [Received: 05/21/2021] [Accepted: 08/10/2021] [Indexed: 01/02/2023]
Abstract
Haplotype phasing is the estimation of haplotypes from genotype data. We present a fast, accurate, and memory-efficient haplotype phasing method that scales to large-scale SNP array and sequence data. The method uses marker windowing and composite reference haplotypes to reduce memory usage and computation time. It incorporates a progressive phasing algorithm that identifies confidently phased heterozygotes in each iteration and fixes the phase of these heterozygotes in subsequent iterations. For data with many low-frequency variants, such as whole-genome sequence data, the method employs a two-stage phasing algorithm that phases high-frequency markers via progressive phasing in the first stage and phases low-frequency markers via genotype imputation in the second stage. This haplotype phasing method is implemented in the open-source Beagle 5.2 software package. We compare Beagle 5.2 and SHAPEIT 4.2.1 by using expanding subsets of 485,301 UK Biobank samples and 38,387 TOPMed samples. Both methods have very similar accuracy and computation time for UK Biobank SNP array data. However, for TOPMed sequence data, Beagle is more than 20 times faster than SHAPEIT, achieves similar accuracy, and scales to larger sample sizes.
42
Sesia M, Bates S, Candès E, Marchini J, Sabatti C. False discovery rate control in genome-wide association studies with population structure. Proc Natl Acad Sci U S A 2021; 118:e2105841118. [PMID: 34580220 PMCID: PMC8501795 DOI: 10.1073/pnas.2105841118] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Accepted: 08/18/2021] [Indexed: 12/25/2022]
Abstract
We present a comprehensive statistical framework to analyze data from genome-wide association studies of polygenic traits, producing interpretable findings while controlling the false discovery rate. In contrast with standard approaches, our method can leverage sophisticated multivariate algorithms but makes no parametric assumptions about the unknown relation between genotypes and phenotype. Instead, we recognize that genotypes can be considered as a random sample from an appropriate model, encapsulating our knowledge of genetic inheritance and human populations. This allows the generation of imperfect copies (knockoffs) of these variables that serve as ideal negative controls, correcting for linkage disequilibrium and accounting for unknown population structure, which may be due to diverse ancestries or familial relatedness. The validity and effectiveness of our method are demonstrated by extensive simulations and by applications to the UK Biobank data. These analyses confirm our method is powerful relative to state-of-the-art alternatives, while comparisons with other studies validate most of our discoveries. Finally, fast software is made available for researchers to analyze Biobank-scale datasets.
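The final selection step of a knockoff procedure can be illustrated with the knockoff+ threshold: given one statistic W_j per variable (large positive values favour the original variable over its knockoff), choose the smallest threshold t at which the estimated false discovery proportion (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) falls at or below the target level. This is a minimal sketch of that generic step only, assuming the W values are already computed; it does not generate knockoffs or reproduce the structure-aware construction described in the abstract, and the example statistics are hypothetical.

```python
import math
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], fdr: float) -> float:
    """Smallest t > 0 with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= fdr; inf if none exists."""
    for t in sorted({abs(x) for x in w if x != 0}):
        estimated_false = 1 + sum(1 for x in w if x <= -t)  # "+1" makes this the knockoff+ variant
        selected = sum(1 for x in w if x >= t)
        if estimated_false / max(1, selected) <= fdr:
            return t
    return math.inf


def knockoff_select(w: Sequence[float], fdr: float) -> List[int]:
    """Indices of variables whose statistic clears the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, fdr)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical statistics for ten variables (illustrative only).
w = [5.0, 4.0, 3.5, 3.0, 2.5, -0.5, 0.2, -0.1, 2.0, 1.5]
print(knockoff_plus_threshold(w, fdr=0.2), knockoff_select(w, fdr=0.2))
```

Negative statistics act as the built-in negative controls: each one estimates how many null variables would cross the threshold by chance, which is why no parametric model of the genotype-phenotype relation is needed at this step.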
Affiliation(s)
- Matteo Sesia
- Department of Data Sciences and Operations, University of Southern California, Los Angeles, CA 90089
- Stephen Bates
- Department of Statistics, University of California, Berkeley, CA 94720
- Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720
- Emmanuel Candès
- Department of Statistics, Stanford University, Stanford, CA 94305
- Department of Mathematics, Stanford University, Stanford, CA 94305
- Chiara Sabatti
- Department of Statistics, Stanford University, Stanford, CA 94305
- Department of Biomedical Data Sciences, Stanford University, Stanford, CA 94305
43
Sticca EL, Belbin GM, Gignoux CR. Current Developments in Detection of Identity-by-Descent Methods and Applications. Front Genet 2021; 12:722602. [PMID: 34567074 PMCID: PMC8461052 DOI: 10.3389/fgene.2021.722602] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/09/2021] [Accepted: 08/24/2021] [Indexed: 01/23/2023]
Abstract
Identity-by-descent (IBD), the sharing of genomic segments inherited from a common ancestor, is a fundamental concept in genomics with broad applications in the characterization and analysis of genomes. While the concept of IBD was historically used extensively in linkage analyses and in studies of founder populations, applications of IBD-based methods subsided during the genome-wide association study era. This was primarily due to the computational expense of IBD detection, a barrier that becomes increasingly relevant as the field moves toward the analysis of biobank-scale datasets that encompass individuals from highly diverse backgrounds. To address these computational barriers, the past several years have seen methodological advances that enable IBD detection in datasets of hundreds of thousands to millions of individuals, making novel analyses possible at an unprecedented scale. Here, we describe the latest innovations in IBD detection and outline opportunities for applying IBD-based methods across a broad range of questions in genomics.
Affiliation(s)
- Evan L Sticca
- Human Medical Genetics and Genomics Program and Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Gillian M Belbin
- Institute for Genomic Health, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Christopher R Gignoux
- Human Medical Genetics and Genomics Program and Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
44
Kivisild T, Saag L, Hui R, Biagini SA, Pankratov V, D'Atanasio E, Pagani L, Saag L, Rootsi S, Mägi R, Metspalu E, Valk H, Malve M, Irdt K, Reisberg T, Solnik A, Scheib CL, Seidman DN, Williams AL, Tambets K, Metspalu M. Patterns of genetic connectedness between modern and medieval Estonian genomes reveal the origins of a major ancestry component of the Finnish population. Am J Hum Genet 2021; 108:1792-1806. [PMID: 34411538 DOI: 10.1016/j.ajhg.2021.07.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/29/2021] [Accepted: 07/23/2021] [Indexed: 11/20/2022]
Abstract
The Finnish population is a unique example of a genetic isolate affected by a recent founder event. Previous studies have suggested that the ancestors of Finnic-speaking Finns and Estonians reached the circum-Baltic region by the 1st millennium BC. However, high linguistic similarity points to a more recent split of their languages. To study genetic connectedness between Finns and Estonians directly, we first assessed the efficacy of imputation of low-coverage ancient genomes by sequencing a medieval Estonian genome to high depth (23×) and evaluated the performance of its down-sampled replicas. We find that ancient genomes imputed from >0.1× coverage can be reliably used in principal-component analyses without projection. By searching for long shared allele intervals (LSAIs; similar to identity-by-descent segments) in unphased data for >143,000 present-day Estonians, 99 Finns, and 14 imputed ancient genomes from Estonia, we find unexpectedly high levels of individual connectedness between Estonians and Finns for the last eight centuries in contrast to their clear differentiation by allele frequencies. High levels of sharing of these segments between Estonians and Finns predate the demographic expansion and late settlement process of Finland. One plausible source of this extensive sharing is the 8th-10th centuries AD migration event from North Estonia to Finland that has been proposed to explain uniquely shared linguistic features between the Finnish language and the northern dialect of Estonian and shared Christianity-related loanwords from Slavic. These results suggest that LSAI detection provides a computationally tractable way to detect fine-scale structure in large cohorts.
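The core idea of searching unphased genotypes for long shared allele intervals can be sketched as follows: two individuals can share an allele at every site except where they are opposite homozygotes (0 vs 2 copies of the alternate allele), so long runs free of opposite homozygotes are candidate IBD-like segments. This is a simplified illustration under that assumption only; the exact LSAI definition, error tolerance and length cut-offs used in the study may differ, and `min_length` and the position units are hypothetical.

```python
from typing import List, Sequence, Tuple


def _keep_if_long(intervals: List[Tuple[int, int]], positions: Sequence[int],
                  first: int, last: int, min_length: int) -> None:
    """Record the run [first, last] (site indices) if it spans at least `min_length`."""
    if positions[last] - positions[first] >= min_length:
        intervals.append((positions[first], positions[last]))


def shared_allele_intervals(g1: Sequence[int], g2: Sequence[int],
                            positions: Sequence[int], min_length: int) -> List[Tuple[int, int]]:
    """Maximal runs of sites where g1 and g2 (alt-allele counts) are never opposite homozygotes."""
    if not (len(g1) == len(g2) == len(positions)):
        raise ValueError("genotype and position vectors must have equal length")
    intervals: List[Tuple[int, int]] = []
    start = None  # index of the first site in the current run, or None between runs
    for i, (a, b) in enumerate(zip(g1, g2)):
        if abs(a - b) == 2:  # 0 vs 2: the two individuals cannot share any allele here
            if start is not None:
                _keep_if_long(intervals, positions, start, i - 1, min_length)
                start = None
        elif start is None:
            start = i
    if start is not None:  # close a run that reaches the last site
        _keep_if_long(intervals, positions, start, len(g1) - 1, min_length)
    return intervals
```

Because a single genotyping error that creates a spurious opposite homozygote splits an interval, a practical method would likely need some tolerance for such sites.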
Affiliation(s)
- Toomas Kivisild
- Department of Human Genetics, KU Leuven, Leuven 3000, Belgium; Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia; McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK.
- Lehti Saag
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia; Research Department of Genetics, Evolution, and Environment, University College London, London WC1E 6BT, UK
- Ruoyun Hui
- McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK; The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB, UK
- Vasili Pankratov
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Eugenia D'Atanasio
- Istituto di Biologia e Patologia Molecolari, Consiglio Nazionale delle Ricerche, Rome, Italy
- Luca Pagani
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia; Department of Biology, University of Padova, 35131 Padova, Italy
- Lauri Saag
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Siiri Rootsi
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Reedik Mägi
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Ene Metspalu
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Heiki Valk
- Department of Archaeology, Institute of History and Archaeology, University of Tartu, Tartu 51014, Estonia
- Martin Malve
- Department of Archaeology, Institute of History and Archaeology, University of Tartu, Tartu 51014, Estonia
- Kadri Irdt
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Tuuli Reisberg
- Core Facility, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Anu Solnik
- Core Facility, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Christiana L Scheib
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia; McDonald Institute for Archaeological Research, University of Cambridge, Cambridge CB2 3ER, UK; St John's College, University of Cambridge, Cambridge CB2 1TP, UK
- Daniel N Seidman
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Amy L Williams
- Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA
- Kristiina Tambets
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Mait Metspalu
- Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
45
Ioannidis AG, Blanco-Portillo J, Sandoval K, Hagelberg E, Barberena-Jonas C, Hill AVS, Rodríguez-Rodríguez JE, Fox K, Robson K, Haoa-Cardinali S, Quinto-Cortés CD, Miquel-Poblete JF, Auckland K, Parks T, Sofro ASM, Ávila-Arcos MC, Sockell A, Homburger JR, Eng C, Huntsman S, Burchard EG, Gignoux CR, Verdugo RA, Moraga M, Bustamante CD, Mentzer AJ, Moreno-Estrada A. Paths and timings of the peopling of Polynesia inferred from genomic networks. Nature 2021; 597:522-526. [PMID: 34552258 PMCID: PMC9710236 DOI: 10.1038/s41586-021-03902-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 01/09/2020] [Accepted: 08/12/2021] [Indexed: 02/08/2023]
Abstract
Polynesia was settled in a series of extraordinary voyages across an ocean spanning one third of the Earth (ref. 1), but the sequences of islands settled remain unknown and their timings disputed. Currently, several centuries separate the dates suggested by different archaeological surveys (refs. 2-4). Here, using genome-wide data from 430 modern individuals from 21 key Pacific island populations and novel ancestry-specific computational analyses, we unravel the detailed genetic history of this vast, dispersed island network. Our reconstruction of the branching Polynesian migration sequence reveals a serial founder expansion, characterized by directional loss of variants, that originated in Samoa and spread first through the Cook Islands (Rarotonga), then to the Society (Tōtaiete mā) Islands (11th century), the western Austral (Tuha'a Pae) Islands and Tuāmotu Archipelago (12th century), and finally to the widely separated, but genetically connected, megalithic statue-building cultures of the Marquesas (Te Henua 'Enana) Islands in the north, Raivavae in the south, and Easter Island (Rapa Nui), the easternmost of the Polynesian islands, settled in approximately AD 1200 via Mangareva.
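The "directional loss of variants" in a serial founder expansion can be illustrated with the textbook Wright-Fisher result. Drift at a constant size of N individuals for t generations reduces expected heterozygosity as H_t = H_0 (1 - 1/(2N))^t. The sketch below applies that decay step by step along a chain of founder events that follows the order given in the abstract (Samoa as source, then Rarotonga, the Society Islands and the Tuāmotu). The starting heterozygosity, founder sizes and durations are hypothetical placeholders, not estimates from the study. The study itself used ancestry-specific genomic network analyses rather than this formula.

```python
from typing import List, Tuple


def heterozygosity_along_chain(
    h0: float, steps: List[Tuple[str, int, int]]
) -> List[Tuple[str, float]]:
    """Expected heterozygosity after each founder event in a serial chain.

    h0: expected heterozygosity of the source population (hypothetical value).
    steps: (label, n_founders, generations) per step; each step applies the
    Wright-Fisher drift decay H_t = H_0 * (1 - 1/(2N))**t to the running value.
    """
    h = h0
    result = []
    for label, n_founders, generations in steps:
        if n_founders < 1:
            raise ValueError("founder size must be at least 1")
        h *= (1.0 - 1.0 / (2.0 * n_founders)) ** generations
        result.append((label, h))
    return result


# Hypothetical chain: 50 founders held for 5 generations at each step.
chain = [("Rarotonga", 50, 5), ("Society Islands", 50, 5), ("Tuamotu", 50, 5)]
for island, h in heterozygosity_along_chain(0.30, chain):
    print(f"{island}: {h:.4f}")
```

Because each step multiplies by a factor below 1, heterozygosity can only fall along the chain. This is the directional signal that a serial founder model predicts.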
Affiliation(s)
- Alexander G Ioannidis
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA, USA
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
- Javier Blanco-Portillo
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
- Karla Sandoval
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
- Carmina Barberena-Jonas
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
- Adrian V S Hill
- Wellcome Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK
- The Jenner Institute, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Juan Esteban Rodríguez-Rodríguez
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
- Keolu Fox
- Department of Anthropology, University of California San Diego, La Jolla, CA, USA
- Kathryn Robson
- MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Consuelo D Quinto-Cortés
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
- Kathryn Auckland
- Wellcome Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK
- Tom Parks
- Wellcome Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK
- Abdul Salam M Sofro
- Department of Biochemistry, Faculty of Medicine, Yayasan Rumah Sakit Islam (YARSI) University, Cempaka Putih, Jakarta, Indonesia
- María C Ávila-Arcos
- International Laboratory for Human Genome Research (LIIGH), UNAM Juriquilla, Queretaro, Mexico
- Alexandra Sockell
- Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, USA
- Julian R Homburger
- Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, USA
- Celeste Eng
- Program in Pharmaceutical Sciences and Pharmacogenomics, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Scott Huntsman
- Program in Pharmaceutical Sciences and Pharmacogenomics, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Esteban G Burchard
- Program in Pharmaceutical Sciences and Pharmacogenomics, Department of Medicine, University of California San Francisco, San Francisco, CA, USA
- Christopher R Gignoux
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO, USA
- Ricardo A Verdugo
- Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago, Chile
- Translational Oncology Department, Faculty of Medicine, University of Chile, Santiago, Chile
- Mauricio Moraga
- Human Genetics Program, Institute of Biomedical Sciences, Faculty of Medicine, University of Chile, Santiago, Chile
- Department of Anthropology, Faculty of Social Sciences, University of Chile, Santiago, Chile
- Carlos D Bustamante
- Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Alexander J Mentzer
- Wellcome Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Andrés Moreno-Estrada
- National Laboratory of Genomics for Biodiversity (LANGEBIO)-Advanced Genomics Unit (UGA), CINVESTAV, Irapuato, Guanajuato, Mexico
46
Hendry JA, Kwiatkowski D, McVean G. Elucidating relationships between P.falciparum prevalence and measures of genetic diversity with a combined genetic-epidemiological model of malaria. PLoS Comput Biol 2021; 17:e1009287. [PMID: 34411093 PMCID: PMC8407561 DOI: 10.1371/journal.pcbi.1009287] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/07/2020] [Revised: 08/31/2021] [Accepted: 07/19/2021] [Indexed: 12/05/2022]
Abstract
There is an abundance of malaria genetic data being collected from the field, yet using these data to understand the drivers of regional epidemiology remains a challenge. A key issue is the lack of models that relate parasite genetic diversity to epidemiological parameters. Classical models in population genetics characterize changes in genetic diversity in relation to demographic parameters but fail to account for the unique features of the malaria life cycle. In contrast, epidemiological models, such as the Ross-Macdonald model, capture malaria transmission dynamics but do not consider genetics. Here, we have developed an integrated model encompassing both parasite evolution and regional epidemiology. We achieve this by combining the Ross-Macdonald model with an intra-host continuous-time Moran model, thus explicitly representing the evolution of individual parasite genomes in a traditional epidemiological framework. We implement the model as a stochastic simulation and use it to explore relationships between measures of parasite genetic diversity and parasite prevalence, a widely used metric of transmission intensity. First, we explore how varying parasite prevalence influences genetic diversity at equilibrium. We find that multiple genetic diversity statistics are correlated with prevalence, but the strength of the relationships depends on whether variation in prevalence is driven by host- or vector-related factors. Next, we assess the responsiveness of a variety of statistics to malaria control interventions, finding that those related to mixed infections respond quickly (∼months) whereas other statistics, such as nucleotide diversity, may take decades to respond. These findings provide insights into the opportunities and challenges associated with using genetic data to monitor malaria epidemiology.
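The epidemiological half of the model described above builds on the Ross-Macdonald framework. For orientation, the sketch below integrates one common simplified deterministic form of it, which ignores the extrinsic incubation period. Here x is the fraction of infected humans (parasite prevalence) and z the fraction of infected mosquitoes:

dx/dt = m a b z (1 - x) - r x
dz/dt = a c x (1 - z) - g z

The parameters are: m, mosquitoes per human; a, biting rate; b and c, the mosquito-to-human and human-to-mosquito transmission probabilities; r, the human recovery rate; and g, the mosquito death rate. The basic reproduction number is R0 = m a^2 b c / (r g). This is not the authors' model, which is stochastic and adds intra-host Moran dynamics for parasite genomes. The function names and parameter values are illustrative assumptions.

```python
from typing import Tuple


def ross_macdonald_equilibrium(
    m: float, a: float, b: float, c: float, r: float, g: float
) -> Tuple[float, float]:
    """Analytic endemic equilibrium (x*, z*) of the simplified Ross-Macdonald ODEs."""
    r0 = m * a * a * b * c / (r * g)
    if r0 <= 1.0:
        return 0.0, 0.0  # infection cannot persist when R0 <= 1
    x = (r0 - 1.0) / (r0 + a * c / g)  # human prevalence at equilibrium
    z = a * c * x / (a * c * x + g)    # mosquito infection prevalence
    return x, z


def simulate_ross_macdonald(
    m: float, a: float, b: float, c: float, r: float, g: float,
    x0: float = 0.01, z0: float = 0.0, dt: float = 0.1, days: float = 10_000.0,
) -> Tuple[float, float]:
    """Forward-Euler integration of the same ODEs; returns (x, z) at the end."""
    x, z = x0, z0
    for _ in range(round(days / dt)):
        dx = m * a * b * z * (1.0 - x) - r * x  # new human infections minus recoveries
        dz = a * c * x * (1.0 - z) - g * z      # new mosquito infections minus deaths
        x += dt * dx
        z += dt * dz
    return x, z


params = dict(m=0.5, a=0.3, b=0.5, c=0.5, r=0.01, g=0.1)  # hypothetical values
print(ross_macdonald_equilibrium(**params), simulate_ross_macdonald(**params))
```

With these illustrative values R0 = 11.25, and the simulated prevalence settles at the analytic equilibrium x* = (R0 - 1)/(R0 + ac/g), about 0.80. Lowering m far enough to push R0 below 1 drives prevalence to zero. In the published model, prevalence is the quantity that the genetic diversity statistics are then related to.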
Affiliation(s)
- Jason A. Hendry
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Dominic Kwiatkowski
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
- Wellcome Sanger Institute, Cambridge, United Kingdom
- Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Medical Research Council Centre for Genomics and Global Health, University of Oxford, Oxford, United Kingdom
47
Zimmerman KD, Schurr TG, Chen W, Nayak U, Mychaleckyj JC, Quet Q, Moultrie LH, Divers J, Keene KL, Kamen DL, Gilkeson GS, Hunt KJ, Spruill IJ, Fernandes JK, Aldrich MC, Reich D, Garvey WT, Langefeld CD, Sale MM, Ramos PS. Genetic landscape of Gullah African Americans. Am J Phys Anthropol 2021; 175:905-919. [PMID: 34008864 PMCID: PMC8286328 DOI: 10.1002/ajpa.24333] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/23/2020] [Revised: 03/30/2021] [Accepted: 04/17/2021] [Indexed: 01/20/2023]
Abstract
OBJECTIVES Gullah African Americans, who live in the Sea Islands along the coast of the southeastern U.S. from North Carolina to Florida, are descendants of formerly enslaved Africans. Their relatively high numbers and geographic isolation were conducive to the development and preservation of a unique culture that retains deep African features. Although historical evidence supports a West-Central African ancestry for the Gullah, linguistic and cultural evidence of a connection to Sierra Leone has led to the suggestion of this country/region as their ancestral home. This study sought to elucidate the genetic structure and ancestry of the Gullah. MATERIALS AND METHODS We leveraged whole-genome genotype data from Gullah, African Americans from Jackson, Mississippi, African populations from Sierra Leone, and population reference panels from Africa and Europe to infer population structure, ancestry proportions, and global estimates of admixture. RESULTS Relative to non-Gullah African Americans from the Southeast US, the Gullah exhibited higher mean African ancestry, lower European admixture, a similarly small Native American contribution, and increased male-biased European admixture. A slightly tighter bottleneck in the Gullah 13 generations ago suggests a largely shared demographic history with non-Gullah African Americans. Despite a slightly higher relatedness to populations from Sierra Leone, our data demonstrate that the Gullah are genetically related to many West African populations. DISCUSSION This study confirms that subtle differences in African American population structure exist at finer regional levels. Such observations can help inform medical genetics research in African Americans and guide the interpretation of genetic data used by African Americans seeking to explore ancestral identities.
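The result of "male-biased European admixture" can be made concrete with a simple textbook model. Sex bias is commonly inferred by comparing ancestry on the X chromosome with autosomal ancestry. Assume a single admixture pulse (this is an assumption, not necessarily the model used in the study). Then a source's autosomal ancestry is A = (s_m + s_f)/2, and its X-chromosome ancestry is X = (s_m + 2 s_f)/3, because females carry two of every three X chromosomes. Solving gives s_f = 3X - 2A and s_m = 4A - 3X. The function name and example proportions below are hypothetical.

```python
from typing import Tuple


def sex_specific_contributions(autosomal: float, x_chrom: float) -> Tuple[float, float]:
    """Solve a single-pulse admixture model for one source's male (s_m) and
    female (s_f) contributions from its autosomal and X-chromosome ancestry.

    Assumes autosomal = (s_m + s_f) / 2 and x_chrom = (s_m + 2 * s_f) / 3.
    """
    s_f = 3.0 * x_chrom - 2.0 * autosomal
    s_m = 4.0 * autosomal - 3.0 * x_chrom
    if not (0.0 <= s_m <= 1.0 and 0.0 <= s_f <= 1.0):
        # Proportions outside [0, 1] mean the single-pulse model cannot fit.
        raise ValueError("single-pulse model does not fit these ancestry proportions")
    return s_m, s_f


# Hypothetical European ancestry: 10% autosomal, 8% on the X chromosome.
s_m, s_f = sex_specific_contributions(0.10, 0.08)
print(f"male contribution {s_m:.2f}, female contribution {s_f:.2f}")
```

With these illustrative inputs the male contribution (0.16) exceeds the female one (0.04). A deficit of source ancestry on the X relative to the autosomes is the signature of male-biased admixture. Continuous or multi-pulse admixture would require a more detailed model.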
Affiliation(s)
- Kip D. Zimmerman
- Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Theodore G. Schurr
- Department of Anthropology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Wei-Min Chen
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
- Uma Nayak
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
- Josyf C. Mychaleckyj
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
- Queen Quet
- Gullah/Geechee Nation, St. Helena Island, South Carolina, USA
- Lee H. Moultrie
- Lee H. Moultrie & Associates, North Charleston, South Carolina, USA
- Jasmin Divers
- Department of Health Services Research, New York University Winthrop Hospital, Mineola, New York, USA
- Keith L. Keene
- Department of Biology, East Carolina University, Greenville, North Carolina, USA
- Center for Health Disparities, East Carolina University Brody School of Medicine, Greenville, North Carolina, USA
- Diane L. Kamen
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
- Gary S. Gilkeson
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
- Kelly J. Hunt
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA
- Ida J. Spruill
- College of Nursing, Medical University of South Carolina, Charleston, South Carolina, USA
- Jyotika K. Fernandes
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
- Melinda C. Aldrich
- Department of Thoracic Surgery, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- David Reich
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, USA
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA
- W. Timothy Garvey
- Department of Nutrition Science, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Carl D. Langefeld
- Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Michèle M. Sale
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
- Paula S. Ramos
- Department of Medicine, Medical University of South Carolina, Charleston, South Carolina, USA
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, USA
48
Matsunami M, Koganebuchi K, Imamura M, Ishida H, Kimura R, Maeda S. Fine-Scale Genetic Structure and Demographic History in the Miyako Islands of the Ryukyu Archipelago. Mol Biol Evol 2021; 38:2045-2056. [PMID: 33432348 PMCID: PMC8097307 DOI: 10.1093/molbev/msab005] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]
Abstract
The Ryukyu Archipelago is located in the southwest of the Japanese islands and is composed of dozens of islands, grouped into the Miyako Islands, Yaeyama Islands, and Okinawa Islands. Based on the results of principal component analysis on genome-wide single-nucleotide polymorphisms, genetic differentiation was observed among the island groups of the Ryukyu Archipelago. However, a detailed population structure analysis of the Ryukyu Archipelago has not yet been completed. We obtained genomic DNA samples from 1,240 individuals living in the Miyako Islands, and we genotyped 665,326 single-nucleotide polymorphisms to infer population history within the Miyako Islands, including Miyakojima, Irabu, and Ikema islands. The haplotype-based analysis showed that populations in the Miyako Islands were divided into three subpopulations located on Miyakojima northeast, Miyakojima southwest, and Irabu/Ikema. The results of haplotype sharing and the D statistics analyses showed that the Irabu/Ikema subpopulation received gene flows different from those of the Miyakojima subpopulations, which may be related to the historically attested immigration during the Gusuku period (900-500 BP). A coalescent-based demographic inference suggests that the Irabu/Ikema population first split away from the ancestral Ryukyu population about 41 generations ago, followed by a split of the Miyako southwest population from the ancestral Ryukyu population (about 16 generations ago), and the differentiation of the ancestral Ryukyu population into two populations (Miyako northeast and Okinawajima populations) about seven generations ago. Such genetic information is useful for explaining the population history of modern Miyako people and must be taken into account when performing disease association studies.
Affiliation(s)
- Masatoshi Matsunami
- Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Kae Koganebuchi
- Advanced Medical Research Center, Faculty of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Minako Imamura
- Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Nishihara-Cho, Japan
- Hajime Ishida
- Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Ryosuke Kimura
- Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Shiro Maeda
- Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
- Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Nishihara-Cho, Japan
49
Fang L, Zhao T, Hu Y, Si Z, Zhu X, Han Z, Liu G, Wang S, Ju L, Guo M, Mei H, Wang L, Qi B, Wang H, Guan X, Zhang T. Divergent improvement of two cultivated allotetraploid cotton species. Plant Biotechnol J 2021; 19:1325-1336. [PMID: 33448110 PMCID: PMC8313128 DOI: 10.1111/pbi.13547] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/03/2020] [Revised: 12/24/2020] [Accepted: 01/03/2021] [Indexed: 05/21/2023]
Abstract
Interspecific genomic variation can provide a genetic basis for local adaptation and domestication. A series of studies have described the role of interspecific haplotypes and introgressions in adaptive traits, but few studies have addressed their role in improving agronomic traits. Two allotetraploid Gossypium species originating from the Americas, Gossypium barbadense (Gb) and G. hirsutum (Gh), are cultivated independently. Here, by sequencing and comparing one GWAS panel of 229 Gb accessions and two GWAS panels of 491 Gh accessions, we found that most associated loci or functional haplotypes for agronomic traits were highly divergent, reflecting strong divergent improvement between Gb and Gh. Using a comprehensive interspecific haplotype map, we revealed that six interspecific introgressions from Gh to Gb were significantly associated with the phenotypic performance of Gb and could explain 5%-40% of phenotypic variation in yield and fibre qualities. In addition, three introgressions overlapped with six associated loci in Gb, indicating that these introgression regions were under further selection and stabilized during improvement. A single interspecific introgression often carried yield-increasing potential but decreased fibre qualities, or the opposite, making it difficult to improve yield and fibre qualities simultaneously. Our study not only demonstrates the importance of interspecific functional haplotypes or introgressions in the divergent improvement of Gb and Gh, but also supports their potential value in further human-mediated hybridization or precision breeding.
Affiliation(s)
- Lei Fang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Ting Zhao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Yan Hu
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Zhanfeng Si
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Xiefei Zhu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Zegang Han
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Guizhen Liu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Henan Province Seed Station, Zhengzhou, China
- Sen Wang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Institute of Food Crops, Jiangsu Academy of Agricultural Sciences, Nanjing, China
- Longzhen Ju
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Menglan Guo
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Huan Mei
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Luyao Wang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Bowen Qi
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Heng Wang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Xueying Guan
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Tianzhen Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
50
Rapid detection of identity-by-descent tracts for mega-scale datasets. Nat Commun 2021; 12:3546. [PMID: 34112768 PMCID: PMC8192555 DOI: 10.1038/s41467-021-22910-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/03/2019] [Accepted: 04/01/2021] [Indexed: 01/08/2023]
Abstract
The ability to identify segments of genomes identical-by-descent (IBD) is a part of standard workflows in both statistical and population genetics. However, traditional methods for finding local IBD across all pairs of individuals scale poorly, leading to a lack of adoption in very large-scale datasets. Here, we present iLASH, an algorithm based on similarity detection techniques that shows equal or improved accuracy in simulations compared to current leading methods and speeds up analysis by several orders of magnitude on genomic datasets, making IBD estimation tractable for millions of individuals. We apply iLASH to the PAGE dataset of ~52,000 multi-ethnic participants, including several founder populations with elevated IBD sharing, identifying IBD segments in ~3 minutes per chromosome compared to over 6 days for a state-of-the-art algorithm. iLASH enables efficient analysis of very large-scale datasets, as we demonstrate by computing IBD across the UK Biobank (~500,000 individuals), detecting 12.9 billion pairwise connections.
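The abstract says only that iLASH is "based on similarity detection techniques". A standard technique of that kind is MinHash, a locality-sensitive hashing scheme that estimates Jaccard similarity between sets without comparing them element by element. The sketch below is a generic MinHash illustration, not iLASH's implementation. It assumes each haplotype window is represented as a set of position-tagged allele k-mers ("shingles"). The shingle size, the number of hash functions and the use of seeded BLAKE2b hashes are illustrative choices.

```python
import hashlib
from typing import List, Sequence, Set, Tuple


def shingles(haplotype: Sequence[int], k: int) -> Set[Tuple[int, Tuple[int, ...]]]:
    """Position-tagged k-mers of alleles (0/1) from one haplotype window."""
    return {(i, tuple(haplotype[i:i + k])) for i in range(len(haplotype) - k + 1)}


def minhash_signature(items: Set[tuple], num_hashes: int) -> List[int]:
    """For each seeded hash function, keep the minimum 64-bit hash over the set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(repr((seed, item)).encode(), digest_size=8).digest(),
                    "big",
                )
                for item in items
            )
        )
    return signature


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of hash functions whose minima agree estimates |A & B| / |A | B|."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

Identical windows always give an estimate of 1.0. A window differing at one site changes only the k shingles that overlap that site, so its estimate stays high. A full LSH pipeline would also group signature entries into bands and hash each band into buckets, so that only pairs of individuals colliding in some bucket are examined further. That banding step, which is what avoids the all-pairs comparison, is omitted here.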