1. Shanmugam A, Merrigan M, O'Reilly S, Molloy AM, Brody L, Hardiman O, Bodmer W, McLaughlin RL, Cavalleri GL, Byrne RP, Gilbert EH. A genetic perspective on the recent demographic history of Ireland and Britain. Eur J Hum Genet 2025; 33:538-545. [PMID: 39910328; PMCID: PMC11986122; DOI: 10.1038/s41431-025-01794-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Revised: 11/10/2024] [Accepted: 01/17/2025] [Indexed: 02/07/2025] Open
Abstract
While subtle yet discrete clusters of genetic identity across Ireland and Britain have been identified, their recent demographic history is unclear. Using genotype data from 6574 individuals with associated regional Irish or British ancestry, we identified genetic communities by applying Leiden community detection. Using haplotype segments segregated by length as proxy for time, we inferred regional Irish and British demographic histories. Using a subset of Irish participants, we provide genealogical context by estimating the enrichment/depletion of surnames within the Irish genetic communities. Through patterns of haplotype sharing, we find evidence of recent population bottlenecks in Orcadian, Manx and Welsh genetic communities. We observed temporal changes in genetic affinities within and between genetic communities in Ireland and Britain. Structure in Ireland is subtler compared to neighbouring British communities, with the Irish groups sharing relatively more short haplotype segments. In addition, we detected varying degrees of genetic isolation in peripheral Irish and British genetic communities across different time periods. Further, we observe a stable migration corridor between north-east Ireland and south-west Scotland while there is a recent migration barrier between south-east and west Ireland. Genealogical analysis of surnames in Ireland reflects history: Anglo-Norman surnames are enriched in the Wexford community, while Scottish and Gallowglass surnames are enriched in the Ulster community. Using these new insights into the regional demographic history of Ireland and Britain across different time periods, we hope to understand the driving forces of rare allele frequencies and disease risk association within these populations.
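The abstract treats haplotype segment length as a proxy for time. The sketch below illustrates the textbook approximation behind that idea and is not the authors' pipeline: a segment inherited from a common ancestor g generations ago is separated by 2g meioses, so its genetic length is roughly exponential with mean 100/(2g) cM. The function names and the length bins are hypothetical, and inverting the mean gives only a crude age, since segments of a given length come from a spread of ancestor ages.

```python
def mean_segment_length_cm(generations: float) -> float:
    """Approximate mean shared-segment length (cM) for a common ancestor
    `generations` ago, assuming 2 * generations meioses separate the pair."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / (2.0 * generations)


def generations_for_mean_length(length_cm: float) -> float:
    """Ancestor age (generations) whose mean segment length equals `length_cm`.
    This inverts the mean only; it is a rough guide, not an estimator."""
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return 100.0 / (2.0 * length_cm)


if __name__ == "__main__":
    # Hypothetical length bins in cM; the bins used in the study are not given here.
    for lower, upper in [(1.0, 3.0), (3.0, 5.0), (5.0, 10.0)]:
        oldest = generations_for_mean_length(lower)
        youngest = generations_for_mean_length(upper)
        print(f"{lower:g}-{upper:g} cM: roughly {youngest:g} to {oldest:g} generations ago")
```

Under this approximation, shorter segments point to older shared ancestry, which is why binning segments by length gives a coarse time axis.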
Affiliation(s)
- Ashwini Shanmugam
  - School of Pharmacy and Biomolecular Sciences, Royal College of Surgeons in Ireland, Dublin, Ireland
  - The SFI Research Ireland Centre for Research Training in Genomics Data Science, School of Mathematics, Statistics and Applied Mathematics, University of Galway, Galway, Ireland
  - FutureNeuro Research Ireland Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
- Anne M Molloy
  - School of Medicine, Trinity College Dublin, Dublin 2, Ireland
- Lawrence Brody
  - Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Orla Hardiman
  - FutureNeuro Research Ireland Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
  - The Academic Unit of Neurology, School of Medicine, Trinity College Dublin, Dublin 2, Ireland
- Walter Bodmer
  - Weatherall Institute of Molecular Medicine and Department of Oncology, University of Oxford, Oxford, UK
- Russell L McLaughlin
  - The SFI Research Ireland Centre for Research Training in Genomics Data Science, School of Mathematics, Statistics and Applied Mathematics, University of Galway, Galway, Ireland
  - FutureNeuro Research Ireland Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
  - Complex Trait Genomics Laboratory, Smurfit Institute of Genetics, School of Genetics and Microbiology, Trinity College Dublin, Dublin 2, Ireland
- Gianpiero L Cavalleri
  - School of Pharmacy and Biomolecular Sciences, Royal College of Surgeons in Ireland, Dublin, Ireland
  - The SFI Research Ireland Centre for Research Training in Genomics Data Science, School of Mathematics, Statistics and Applied Mathematics, University of Galway, Galway, Ireland
  - FutureNeuro Research Ireland Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
- Ross P Byrne
  - Complex Trait Genomics Laboratory, Smurfit Institute of Genetics, School of Genetics and Microbiology, Trinity College Dublin, Dublin 2, Ireland
- Edmund H Gilbert
  - School of Pharmacy and Biomolecular Sciences, Royal College of Surgeons in Ireland, Dublin, Ireland
  - FutureNeuro Research Ireland Centre, Royal College of Surgeons in Ireland, Dublin, Ireland
2. Huang Z, Kelleher J, Chan YB, Balding D. Estimating evolutionary and demographic parameters via ARG-derived IBD. PLoS Genet 2025; 21:e1011537. [PMID: 39778081; PMCID: PMC11750106; DOI: 10.1371/journal.pgen.1011537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Revised: 01/21/2025] [Accepted: 12/11/2024] [Indexed: 01/11/2025] Open
Abstract
Inference of evolutionary and demographic parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian computation inference algorithm and use it to show that even poorly-inferred short IBD segments can improve estimation. Our mutation-rate estimator achieves precision similar to a previously-published method despite a 4 000-fold reduction in data used for inference, and we identify significant differences between human populations. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.
Affiliation(s)
- Zhendong Huang
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Victoria, Australia
- Jerome Kelleher
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Yao-ban Chan
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Victoria, Australia
- David Balding
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Victoria, Australia
3. Tahir M, Ren Y, Wu B, Li M, Refaiy M, Cao M, Kong D, Pang X. Parental Reconstruction from a Half-Sib Population of Stoneless Jujube Ziziphus jujuba Mill. Based on Individual Specific SNP Markers Using Multiplex PCR. Plants (Basel) 2024; 13:3163. [PMID: 39599373; PMCID: PMC11598090; DOI: 10.3390/plants13223163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Revised: 11/01/2024] [Accepted: 11/07/2024] [Indexed: 11/29/2024]
Abstract
Selecting unique, individual-specific SNPs is more important than selecting universal SNPs for individual identification. The main contribution of this research is therefore the selection of SNPs specific to the male parent and the identification of offspring carrying these SNPs by multiplex PCR, which was used to genotype 332 half-sib plants of Ziziphus jujuba. This cost-effective method uses, as far as possible, the same amount of each primer pair for the various targeted loci. After PCR amplification of the targeted genome regions, the mixed products can be used directly on a next-generation sequencing platform. We simultaneously amplified 10 unique SNP loci, located on 10 different chromosomes of male JingZao 39 plants, in 332 half-sib plants and sequenced them on the Illumina NovaSeq 6000 platform. Analysis of SNP genotyping accuracy showed that all 10 unique SNPs were correctly amplified in all 332 plants by this multiplex PCR method. Furthermore, based on Mendelian inheritance, we identified 124 full-sib plants that have the 10 unique SNPs in their genomes. These results were further confirmed by whole-genome resequencing of 82 randomly selected half-sib plants, and the identity-by-descent values of all full-sib plants were between 0.4399 and 0.5652. This study presents a cost-effective multiplex PCR method, demonstrates reliable identification of the pollen parent through specific SNPs in half-sib progenies, and obtains for the first time a full-sib population between 'Wuhezao' and 'JingZao 39' that segregates for stone and stoneless fruit.
Affiliation(s)
- Muhammad Tahir
  - State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing 100083, China; (M.T.); (Y.R.); (B.W.); (M.L.); (M.R.)
- Yue Ren
  - State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing 100083, China; (M.T.); (Y.R.); (B.W.); (M.L.); (M.R.)
- Bo Wu
  - State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing 100083, China; (M.T.); (Y.R.); (B.W.); (M.L.); (M.R.)
- Meiyu Li
  - State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing 100083, China; (M.T.); (Y.R.); (B.W.); (M.L.); (M.R.)
- Mohamed Refaiy
  - State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing 100083, China; (M.T.); (Y.R.); (B.W.); (M.L.); (M.R.)
- Ming Cao
  - National Foundation for Improved Cultivars of Chinese Jujube, Cangzhou 061000, China; (M.C.); (D.K.)
- Decang Kong
  - National Foundation for Improved Cultivars of Chinese Jujube, Cangzhou 061000, China; (M.C.); (D.K.)
- Xiaoming Pang
  - State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Biotechnology, Beijing Forestry University, Beijing 100083, China; (M.T.); (Y.R.); (B.W.); (M.L.); (M.R.)
4. Broschewitz L, Reim S, Flachowsky H, Höfer M. Pomological and Molecular Characterization of Apple Cultivars in the German Fruit Genebank. Plants (Basel) 2024; 13:2699. [PMID: 39409569; PMCID: PMC11478905; DOI: 10.3390/plants13192699] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2024] [Revised: 09/18/2024] [Accepted: 09/24/2024] [Indexed: 10/20/2024]
Abstract
Traditional varieties are a valuable tool in modern apple breeding. However, the use of synonyms and missing source documentation hinder effective identification and conservation of relevant cultivars. During several projects, the authenticity and diversity of the apple cultivar collection of the German Fruit Genebank (GFG) were evaluated extensively. The trueness-to-type of 7890 apple trees was assessed on a pomological and molecular level. Pomological evaluations were performed by at least two experienced experts to identify the original cultivar names. On the molecular level, a set of 17 SSR markers was used to determine a unique genetic profile for each apple cultivar. The pomological and molecular characterization was expressed in terms of a comprehensive trueness-to-type criterion and the results were previously published as a well-curated dataset. In this study, the published dataset was analyzed to evaluate the quality and diversity of the apple collection of the GFG and highlight new findings based on phylogenetic and parentage analysis. The dataset contains 1404 unique genetic profiles corresponding to unambiguous cultivar names. Of these 1404 cultivars, 74% were assessed as true-to-type. The collection of diploid apple cultivars showed a high degree of expected heterozygosity (Hexp = 0.84). Genetic diversity in terms of year and location of origin was investigated with a STRUCTURE analysis. It was hypothesized that genetic diversity might decline over time due to restrictive breeding programs. The results showed a shift between older and newer cultivars in one specific cluster, but no significant decrease in genetic diversity was observed in this study. Lastly, a parentage analysis was performed to check parental relationships based on historical research. Out of 128 parent-child trios, 110 trios resulted in significant relationships and reconfirmed the information from the literature. In some cases, the information from the literature was disproven. This analysis also allowed for readjusting the trueness-to-type criteria for previously undetermined cultivars. Overall, the importance of authenticity evaluations for gene bank cultivars was highlighted. Furthermore, the direct use of the dataset was shown by relevant investigations on the genetic diversity and structure of the apple cultivar collections of the GFG.
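The abstract reports expected heterozygosity (Hexp = 0.84) across SSR markers. As a hedged illustration of how such a value is commonly computed, the sketch below uses the standard definition Hexp = 1 - Σ p_i² per locus, averaged over loci. The marker names and allele sizes are invented toy data, and the small-sample correction that some software applies is deliberately left out; this is not the authors' analysis.

```python
from collections import Counter
from typing import Dict, List


def expected_heterozygosity(alleles: List[str]) -> float:
    """Expected heterozygosity at one locus: 1 minus the sum of squared
    allele frequencies, computed from the observed allele copies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alleles supplied")
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def mean_expected_heterozygosity(loci: Dict[str, List[str]]) -> float:
    """Average the per-locus values across all loci."""
    if not loci:
        raise ValueError("no loci supplied")
    return sum(expected_heterozygosity(a) for a in loci.values()) / len(loci)


if __name__ == "__main__":
    # Hypothetical SSR allele sizes (bp) pooled over diploid individuals.
    toy_loci = {
        "marker_A": ["120", "120", "124", "128", "132", "124"],
        "marker_B": ["201", "205", "209", "213", "201", "217"],
    }
    print(round(mean_expected_heterozygosity(toy_loci), 3))
```

Values near 1 mean that many alleles occur at similar frequencies, which is typical of highly polymorphic SSR markers.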
Affiliation(s)
- Lea Broschewitz
  - Julius Kühn Institute (JKI), Federal Research Centre for Cultivated Plants, Institute of Breeding Research on Fruit Crops, Pillnitzer Platz 3a, 01326 Dresden, Germany
5. Jordan B. [Ancient DNA speaks]. Med Sci (Paris) 2024; 40:563-565. [PMID: 38986104; DOI: 10.1051/medsci/2024070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024] Open
Abstract
Many human DNA sequences have been obtained from ancient remains dating back several millennia. However, these have low coverage and may contain many errors, which has limited their usefulness for many analyses, in particular the search for identical-by-descent (IBD) segments, a very powerful approach for detecting kinship. A new method, using imputation from database data and sophisticated statistical analysis, can detect IBD segments (and thus kinship) in low-quality DNA sequences from individuals linked only by sixth-degree relatedness, opening a whole new field of investigation using ancient DNA.
Affiliation(s)
- Bertrand Jordan
  - Biologist, geneticist and immunologist, President of Aprogène (Association pour la promotion de la Génomique), 13007 Marseille, France
6. Wang X, Muenzler M, King J, Liu M, Li H, Budowle B, Ge J. A complete pipeline enables haplotyping and phasing macrohaplotype in long sequencing reads for polyploidy samples and a multi-source DNA mixture. Electrophoresis 2024; 45:877-884. [PMID: 38196015; DOI: 10.1002/elps.202300143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 11/19/2023] [Accepted: 11/30/2023] [Indexed: 01/11/2024]
Abstract
A macrohaplotype combines multiple types of phased DNA variants, increasing forensic discrimination power. High-quality long-sequencing reads, for example, PacBio HiFi reads, provide data to detect macrohaplotypes in polyploid samples and DNA mixtures. However, bioinformatics tools for detecting macrohaplotypes are lacking. In this study, we developed a bioinformatics tool, MacroHapCaller, in which targeted loci (i.e., short TRs [STRs], single nucleotide polymorphisms, and insertions and deletions) are genotyped and combined with novel algorithms to call macrohaplotypes from long reads. MacroHapCaller uses physical phasing (i.e., read-backed phasing) to identify macrohaplotypes, and thus it can detect multi-allelic macrohaplotypes for a given sample. MacroHapCaller was validated with data generated from our designed targeted PacBio HiFi sequencing pipeline, which sequenced ∼8-kb amplicon regions harboring 20 core forensic STR loci in human benchmark samples HG002 and HG003. MacroHapCaller was also validated on whole-genome long-read sequencing data. Robust and accurate genotyping and phased macrohaplotypes were obtained with MacroHapCaller compared with the known ground truth. MacroHapCaller achieved higher or comparable genotyping accuracy and faster speed than the existing tools HipSTR and DeepVar. MacroHapCaller enables efficient macrohaplotype analysis from high-throughput sequencing data and supports applications using discriminating macrohaplotypes.
Affiliation(s)
- Xuewen Wang
  - Health Science Center, University of North Texas, Fort Worth, Texas, USA
- Melissa Muenzler
  - Health Science Center, University of North Texas, Fort Worth, Texas, USA
- Jonathan King
  - Health Science Center, University of North Texas, Fort Worth, Texas, USA
- Muyi Liu
  - Health Science Center, University of North Texas, Fort Worth, Texas, USA
- Hongmin Li
  - College of Science, Cal State East Bay, Hayward, California, USA
- Bruce Budowle
  - Department of Forensic Medicine, University of Helsinki, Helsinki, Finland
  - Forensic Science Institute, Radford University, Radford, Virginia, USA
- Jianye Ge
  - Health Science Center, University of North Texas, Fort Worth, Texas, USA
7. Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. Am J Hum Genet 2024; 111:691-700. [PMID: 38513668; PMCID: PMC11023918; DOI: 10.1016/j.ajhg.2024.02.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/23/2024] Open
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.
Affiliation(s)
- Sharon R Browning
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Brian L Browning
  - Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA, USA
8. Huang Z, Kelleher J, Chan YB, Balding DJ. Estimating evolutionary and demographic parameters via ARG-derived IBD. bioRxiv 2024:2024.03.07.583855. [PMID: 38559261; PMCID: PMC10979897; DOI: 10.1101/2024.03.07.583855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Inference of demographic and evolutionary parameters from a sample of genome sequences often proceeds by first inferring identical-by-descent (IBD) genome segments. By exploiting efficient data encoding based on the ancestral recombination graph (ARG), we obtain three major advantages over current approaches: (i) no need to impose a length threshold on IBD segments, (ii) IBD can be defined without the hard-to-verify requirement of no recombination, and (iii) computation time can be reduced with little loss of statistical efficiency using only the IBD segments from a set of sequence pairs that scales linearly with sample size. We first demonstrate powerful inferences when true IBD information is available from simulated data. For IBD inferred from real data, we propose an approximate Bayesian computation inference algorithm and use it to show that poorly-inferred short IBD segments can improve estimation precision. We show estimation precision similar to a previously-published estimator despite a 4 000-fold reduction in data used for inference. Computational cost limits model complexity in our approach, but we are able to incorporate unknown nuisance parameters and model misspecification, still finding improved parameter inference.
Affiliation(s)
- Zhendong Huang
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
- Jerome Kelleher
  - Oxford Big Data Institute, University of Oxford, United Kingdom
- Yao-ban Chan
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
- David J. Balding
  - Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Australia
9. Freudiger A, Jovanovic VM, Huang Y, Snyder-Mackler N, Conrad DF, Miller B, Montague MJ, Westphal H, Stadler PF, Bley S, Horvath JE, Brent LJN, Platt ML, Ruiz-Lambides A, Tung J, Nowick K, Ringbauer H, Widdig A. Taking identity-by-descent analysis into the wild: Estimating realized relatedness in free-ranging macaques. bioRxiv 2024:2024.01.09.574911. [PMID: 38260273; PMCID: PMC10802400; DOI: 10.1101/2024.01.09.574911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness. Here, we leverage novel methods for estimating locus-specific IBD from low coverage whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4-6× coverage data from a rhesus macaque (Macaca mulatta) population with available long-term pedigree data, we show that we can call the number and length of IBD segments across the genome with high accuracy even at 0.5× coverage. The resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. They identify cryptic genetic relatives that are not represented in the pedigree and reveal elevated recombination rates in females relative to males, which allows us to discriminate maternal and paternal kin using genotype data alone. Our findings represent a breakthrough in the ability to understand the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.
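The abstract describes estimating realized relatedness from the number and length of IBD segments. A minimal sketch of one common summary, given here as an assumption rather than the authors' estimator: the relatedness coefficient is taken as half the genome fraction shared on one haplotype (IBD1) plus the fraction shared on both (IBD2). The function name, the segment lengths, and the 3000 cM map length are hypothetical.

```python
from typing import Iterable


def realized_relatedness(ibd1_cm: Iterable[float],
                         ibd2_cm: Iterable[float],
                         genome_cm: float) -> float:
    """Realized relatedness from IBD segment lengths (cM).

    ibd1_cm: lengths of segments where the pair shares one haplotype
    ibd2_cm: lengths of segments where the pair shares both haplotypes
    genome_cm: total genetic map length covered by the analysis
    """
    if genome_cm <= 0:
        raise ValueError("genome_cm must be positive")
    return (0.5 * sum(ibd1_cm) + sum(ibd2_cm)) / genome_cm


if __name__ == "__main__":
    genome = 3000.0  # hypothetical map length in cM
    # A parent-offspring pair shares one haplotype across the whole genome.
    print(realized_relatedness([genome], [], genome))
    # Hypothetical half-sibling pair with scattered IBD1 segments.
    print(realized_relatedness([400.0, 350.0, 300.0, 450.0], [], genome))
```

Because recombination is random, pairs in the same pedigree class share different segment totals, which is the within-class variation the abstract highlights.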
Affiliation(s)
- Annika Freudiger
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Vladimir M Jovanovic
  - Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
  - Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Yilei Huang
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Noah Snyder-Mackler
  - Center for Evolution & Medicine, School of Life Sciences, Arizona State University, Tempe, USA
- Donald F Conrad
  - Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Brian Miller
  - Division of Genetics, Oregon National Primate Research Center, Portland, Oregon, USA
- Michael J Montague
  - Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Hendrikje Westphal
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
- Peter F Stadler
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
  - Santa Fe Institute, Santa Fe, NM, USA
- Stefanie Bley
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Julie E Horvath
  - Department of Biological and Biomedical Sciences, North Carolina Central University, North Carolina, Durham, USA
  - Research and Collections Section, North Carolina Museum of Natural Sciences, North Carolina, Raleigh, USA
  - Department of Biological Sciences, North Carolina State University, North Carolina, Raleigh, USA
  - Department of Evolutionary Anthropology, Duke University, North Carolina, Durham, USA
  - Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Lauren J N Brent
  - Centre for Research in Animal Behaviour, University of Exeter, Exeter, UK
- Michael L Platt
  - Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Marketing Department, the Wharton School of Business, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Psychology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Angelina Ruiz-Lambides
  - Cayo Santiago Field Station, Caribbean Primate Research Center, University of Puerto Rico, Punta Santiago, Puerto Rico
- Jenny Tung
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - Department of Evolutionary Anthropology, Duke University, North Carolina, Durham, USA
  - Department of Biology, Duke University, Durham, North Carolina, USA
  - Duke University Population Research Institute, Durham, North Carolina, USA
- Katja Nowick
  - Human Biology and Primate Evolution, Institut für Zoologie, Freie Universität Berlin, Berlin, Germany
  - Bioinformatics Solution Center, Freie Universität Berlin, Berlin, Germany
- Harald Ringbauer
  - Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Anja Widdig
  - Behavioral Ecology Research Group, Faculty of Life Sciences, Institute of Biology, Leipzig University, Leipzig, Germany
  - Department of Primate Behavior and Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
  - German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
10. He S, Wang Y, Luo Y, Xue M, Wu M, Tan H, Peng Y, Wang K, Fang M. Integrated analysis strategy of genome-wide functional gene mining reveals DKK2 gene underlying meat quality in Shaziling synthesized pigs. BMC Genomics 2024; 25:30. [PMID: 38178019; PMCID: PMC10765619; DOI: 10.1186/s12864-023-09925-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/09/2023] [Accepted: 12/18/2023] [Indexed: 01/06/2024] Open
Abstract
BACKGROUND The Shaziling pig is a well-known indigenous Chinese breed with superior meat quality traits. However, the genetic mechanism and genomic evidence underlying the meat quality characteristics of Shaziling pigs are still unclear. To investigate the germplasm characteristics of Shaziling pigs, we analyzed whole-genome sequencing data from 67 individuals for the first time (20 Shaziling pigs [S], 20 Dabasha pigs [DBS], 11 Yorkshire pigs [Y], 10 Berkshire pigs [BKX], 5 Basha pigs [BS] and 1 Warthog). RESULTS A total of 2,538,577 high-quality SNPs were detected, and 9 candidate genes that were specifically selected in S and shared between S and DBS were identified using an integrated analysis strategy of identity-by-descent (IBD) and selective sweep. Of these, dickkopf WNT signaling pathway inhibitor 2 (DKK2), an antagonist of the Wnt signaling pathway, was the most promising candidate gene: it was not only associated with a palmitic acid and palmitoleic acid quantitative trait locus in PigQTLdb, but was also specifically selected in S compared with 48 other Chinese local pigs from 12 populations and 39 foreign pigs from 4 populations. Subsequently, a mutation at 12,726 bp of DKK2 intron 1 (g.114874954 A > C) was found to be associated with intramuscular fat content by PCR-RFLP in 21 different pig populations. We observed that DKK2 was specifically expressed in adipose tissues. Overexpression of DKK2 decreased the content of triglyceride and fatty acid synthase and the expression of genes related to adipogenesis and the Wnt signaling pathway, whereas interference with DKK2 had the opposite effect during adipogenic differentiation of porcine preadipocytes and 3T3-L1 cells. CONCLUSIONS Our findings provide an analysis strategy for mining functional genes for important economic traits and provide fundamental data and molecular evidence for improving pig meat quality traits and molecular breeding.
Affiliation(s)
- Shuaihan He: State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Yubei Wang: Sanya Institute of China Agricultural University, Sanya, 572025, China
- Yabiao Luo: State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Mingming Xue: State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Maisheng Wu: Xiangtan Bureau of Animal Husbandry and Veterinary Medicine and Aquatic Product, Xiangtan, 411102, China
- Hong Tan: Xiangtan Bureau of Animal Husbandry and Veterinary Medicine and Aquatic Product, Xiangtan, 411102, China
- Yinglin Peng: Hunan Institute of Animal & Veterinary Science, Changsha, 410131, China
- Kejun Wang: College of Animal Science and Technology, Henan Agricultural University, Zhengzhou, 450002, China
- Meiying Fang: State Key Laboratory of Animal Biotech Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China; Sanya Institute of China Agricultural University, Sanya, 572025, China
|
11
|
Browning SR, Browning BL. Biobank-scale inference of multi-individual identity by descent and gene conversion. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.03.565574. [PMID: 37961601 PMCID: PMC10635131 DOI: 10.1101/2023.11.03.565574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more efficient collection and storage of identity by descent (IBD) information than approaches that detect and store pairwise IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach.
Affiliation(s)
- Brian L. Browning: Department of Biostatistics, University of Washington, Seattle, WA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA
|
12
|
Wharrie S, Yang Z, Raj V, Monti R, Gupta R, Wang Y, Martin A, O’Connor LJ, Kaski S, Marttinen P, Palamara PF, Lippert C, Ganna A. HAPNEST: efficient, large-scale generation and evaluation of synthetic datasets for genotypes and phenotypes. Bioinformatics 2023; 39:btad535. [PMID: 37647640 PMCID: PMC10493177 DOI: 10.1093/bioinformatics/btad535] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/23/2022] [Revised: 08/23/2023] [Accepted: 08/29/2023] [Indexed: 09/01/2023] Open
Abstract
MOTIVATION Existing methods for simulating synthetic genotype and phenotype datasets have limited scalability, constraining their usability for large-scale analyses. Moreover, a systematic approach for evaluating synthetic data quality and a benchmark synthetic dataset for developing and evaluating methods for polygenic risk scores are lacking. RESULTS We present HAPNEST, a novel approach for efficiently generating diverse individual-level genotypic and phenotypic data. In comparison to alternative methods, HAPNEST shows faster computational speed and a lower degree of relatedness with reference panels, while generating datasets that preserve key statistical properties of real data. These desirable synthetic data properties enabled us to generate 6.8 million common variants and nine phenotypes with varying degrees of heritability and polygenicity across 1 million individuals. We demonstrate how HAPNEST can facilitate biobank-scale analyses through the comparison of seven methods to generate polygenic risk scoring across multiple ancestry groups and different genetic architectures. AVAILABILITY AND IMPLEMENTATION A synthetic dataset of 1 008 000 individuals and nine traits for 6.8 million common variants is available at https://www.ebi.ac.uk/biostudies/studies/S-BSST936. The HAPNEST software for generating synthetic datasets is available as Docker/Singularity containers and open source Julia and C code at https://github.com/intervene-EU-H2020/synthetic_data.
Affiliation(s)
- Sophie Wharrie: Department of Computer Science, Aalto University, Espoo 02150, Finland
- Zhiyu Yang: Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki 00014, Finland
- Vishnu Raj: Department of Computer Science, Aalto University, Espoo 02150, Finland
- Remo Monti: Hasso Plattner Institute, University of Potsdam, Digital Engineering Faculty, Potsdam 14469, Germany
- Rahul Gupta: Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Ying Wang: Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Alicia Martin: Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Luke J O’Connor: Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
- Samuel Kaski: Department of Computer Science, Aalto University, Espoo 02150, Finland; Department of Computer Science, University of Manchester, Manchester M13 9PL, United Kingdom
- Pekka Marttinen: Department of Computer Science, Aalto University, Espoo 02150, Finland
- Christoph Lippert: Hasso Plattner Institute, University of Potsdam, Digital Engineering Faculty, Potsdam 14469, Germany; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York 10065, United States
- Andrea Ganna: Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki 00014, Finland; Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, United States
|
13
|
Ahmad O, Sutter C, Hirsch S, Pfister SM, Schaaf CP. BRCA1/2 potential founder variants in the Jordanian population: an opportunity for a customized screening panel. Hered Cancer Clin Pract 2023; 21:11. [PMID: 37400873 DOI: 10.1186/s13053-023-00256-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 06/22/2023] [Indexed: 07/05/2023] Open
Abstract
A founder variant is a genetic alteration that is inherited from a common ancestor together with a surrounding chromosomal segment, and is observed at a high frequency in a defined population. This founder effect occurs as a consequence of long-standing inbreeding in isolated populations. For high-risk cancer predisposition genes, such as BRCA1/2, the identification of founder variants in a given population could help design customized, cost-effective cancer screening panels. This advantage has been best utilized in designing a customized breast cancer BRCA screening panel for the Ashkenazi Jewish (AJ) population, composed of the three BRCA founder variants that account for approximately 90% of identified BRCA alterations. Indeed, the high prevalence of pathogenic BRCA1/2 variants among AJ (~2%) has additionally contributed to making population-based screening cost-effective in comparison to family-history-based screening. In Jordan, multiple demographic characteristics support the proposal of a founder effect. A high consanguinity rate of ~57% in the 1990s and ~30% more recently is a prominent factor, in addition to inbreeding, which is often practiced by different sub-populations of the country. This review explains the concept of the founder effect, then applies it to analyze published Jordanian BRCA variants, and concludes that nine pathogenic (P) and likely pathogenic (LP) BRCA2 variants together with one pathogenic BRCA1 variant are potential founder variants. Together they make up 43% and 55% of all identified BRCA1/2 alterations in the two largest studied cohorts of young patients and high-risk patients, respectively. These variants were identified based on being recurrent and either specific to ethnic groups or novel.
In addition, the report highlights the required testing methodologies to validate these findings, and proposes a health economic evaluation model to test cost-effectiveness of a population-based customized BRCA screening panel for the Jordanian population. The aim of this report is to highlight the potential utilization of founder variants in establishing customized cancer predisposition services, in order to encourage more population-based genomic studies in Jordan and similar populations.
Affiliation(s)
- Olfat Ahmad: Division of Pediatric Neurooncology, Hopp Children's Cancer Center (KiTZ), Heidelberg, Germany; Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), German Cancer Consortium (DKTK), Heidelberg, Germany; Institute of Human Genetics, Heidelberg University, Heidelberg, Germany; University of Oxford, Oxford, UK; King Hussein Cancer Center (KHCC), Amman, Jordan
- Christian Sutter: Institute of Human Genetics, Heidelberg University, Heidelberg, Germany
- Steffen Hirsch: Division of Pediatric Neurooncology, Hopp Children's Cancer Center (KiTZ), Heidelberg, Germany; Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), German Cancer Consortium (DKTK), Heidelberg, Germany; Institute of Human Genetics, Heidelberg University, Heidelberg, Germany
- Stefan M Pfister: Division of Pediatric Neurooncology, Hopp Children's Cancer Center (KiTZ), Heidelberg, Germany; Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), German Cancer Consortium (DKTK), Heidelberg, Germany; Department of Pediatric Hematology and Oncology, Heidelberg University Hospital, Heidelberg, Germany
- Christian P Schaaf: Institute of Human Genetics, Heidelberg University, Heidelberg, Germany
|
14
|
Wei Y, Naseri A, Zhi D, Zhang S. RaPID-Query for fast identity by descent search and genealogical analysis. Bioinformatics 2023; 39:btad312. [PMID: 37166451 PMCID: PMC10244210 DOI: 10.1093/bioinformatics/btad312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 04/26/2023] [Accepted: 05/09/2023] [Indexed: 05/12/2023] Open
Abstract
MOTIVATION Due to the rapid growth in the size of genetic databases, genealogical search, a process of inferring familial relatedness by identifying DNA matches, has become a viable approach to help individuals find missing family members or law enforcement agencies locate suspects. A fast and accurate method is needed to search an out-of-database individual against millions of individuals. Most existing approaches only offer all-versus-all within-panel matching. Some prototype algorithms offer one-versus-all queries from an out-of-panel individual, but they do not tolerate errors. RESULTS A new method, random projection-based identity-by-descent (IBD) detection (RaPID) query, is introduced to make fast genealogical search possible. RaPID-Query identifies IBD segments between a query haplotype and a panel of haplotypes. By integrating matches over multiple PBWT indexes, RaPID-Query locates IBD segments quickly with a given cutoff length while allowing mismatched sites. A single query against all UK Biobank autosomal chromosomes was completed within 2.76 seconds on average, with a minimum length of 7 cM and 700 markers. RaPID-Query achieved a 0.016 false negative rate and a 0.012 false positive rate simultaneously on a chromosome 20 sequencing panel having 86 265 sites. This is comparable to the state-of-the-art IBD detection methods TPBWT (out-of-sample) and Hap-IBD. The high-quality IBD segments yielded by RaPID-Query were able to distinguish up to the fourth degree of familial relatedness for a given individual pair, and the area under the receiver operating characteristic curve values are at least 97.28%. AVAILABILITY AND IMPLEMENTATION The RaPID-Query program is available at https://github.com/ucfcbb/RaPID-Query.
Affiliation(s)
- Yuan Wei: Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
- Ardalan Naseri: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Degui Zhi: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Shaojie Zhang: Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
|
15
|
Zhao Q, Li Y, Liang Q, Zhao J, Kang K, Hou M, Zhang X, Du R, Kong L, Liang B, Huang W. The infertile individual analysis based on whole-exome sequencing in chinese multi-ethnic groups. Genes Genomics 2023; 45:531-542. [PMID: 36115009 DOI: 10.1007/s13258-022-01307-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Accepted: 08/10/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND Infertility is a common and rapidly growing health issue around the world. Genetic analysis of the infertile population is crucial for intervention and treatment. OBJECTIVE To find candidate gene loci leading to azoospermia in Chinese multi-ethnic groups and to provide theoretical guidance for the diagnosis of genetic diseases in patients with progressively aggravated infertility and in sterile offspring with ART. METHODS A study based on whole-exome sequencing (WES) was carried out to analyze the genetic characteristics of multiple ethnic groups and to identify variants related to infertility in the Xinjiang area of China. RESULTS The frequency of pathogenic variants showed significant differences among the four main ethnic groups in Xinjiang. Population structure analysis confirmed that the Hui were close to the Han population, the Kazak were close to the Uygur population, and that there are three ancestry components in the four ethnic groups. In addition, ten candidate variants potentially regulating azoospermia were detected, and KNTC1 (rs7968222: G > T) was chosen to validate the association. In the validation group, the frequency of rs7968222 (G > T) differed significantly between the azoospermia population (11.76%, 8/68) and the normospermia population (4.63%, 35/756) (P < 0.001). Interestingly, the proportion of people with abnormal follicle-stimulating hormone (FSH) levels was significantly higher among carriers of rs7968222 (G > T) than among non-carriers (P < 0.05). Therefore, rs7968222 may regulate spermatogenesis by affecting hormone levels. CONCLUSION Our study establishes a genetic analysis of Northwest China and finds a candidate gene locus, KNTC1 (rs7968222: G > T), which is one of the genetic susceptibility factors for male azoospermia.
Affiliation(s)
- Qiongzhen Zhao: Tanzhi Stem Cell Bank of Xinjiang, 844000, Tumshuk, Xinjiang, China
- Yanqi Li: Tanzhi Stem Cell Bank of Xinjiang, 844000, Tumshuk, Xinjiang, China
- Qi Liang: Xinjiang Jiayin hospital, 830000, Urumqi, Xinjiang, China
- Jie Zhao: Xinjiang Jiayin hospital, 830000, Urumqi, Xinjiang, China
- Kai Kang: Basecare Medical Device Co., Ltd, 215001, Suzhou, Jiangsu, China
- Meiling Hou: Suzhou BioX Research Institute, 215001, Suzhou, Jiangsu, China
- Xin Zhang: Basecare Medical Device Co., Ltd, 215001, Suzhou, Jiangsu, China
- Renqian Du: Basecare Medical Device Co., Ltd, 215001, Suzhou, Jiangsu, China
- Lingyin Kong: Basecare Medical Device Co., Ltd, 215001, Suzhou, Jiangsu, China
- Bo Liang: State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200020, Shanghai, China
- Weidong Huang: Tanzhi Stem Cell Bank of Xinjiang, 844000, Tumshuk, Xinjiang, China; Xinjiang Jiayin hospital, 830000, Urumqi, Xinjiang, China
|
16
|
Cui B, Guo Z, Cao H, Calus M, Zhang Q. The computational implementation of a platform of relative identity-by-descent scores algorithm for introgressive mapping. Front Genet 2023; 13:1028662. [PMID: 36761695 PMCID: PMC9903072 DOI: 10.3389/fgene.2022.1028662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2022] [Accepted: 11/15/2022] [Indexed: 01/25/2023] Open
Abstract
With the development of genotyping and sequencing technology, researchers working in the area of conservation genetics are able to obtain the genotypes or even the sequences of a representative sample of individuals from the population. It is of great importance to examine the genomic variants and genes that are highly preferred or pruned during the process of adaptive introgression or long-term hybridization. To the best of our knowledge, we are the first to develop a platform with computational integration of a relative identity-by-descent (rIBD) scores algorithm for introgressive mapping. The rIBD algorithm is designed for mapping the fine-scaled genomic regions under adaptive introgression between the source breeds and the admixed breed. Our rIBD calculation platform provides compact functions including reading input information and uploading files, rIBD calculation, and presentation of the rIBD scores. We analyzed the simulated data using the rIBD calculation platform and obtained an average rIBD score of 0.061 with a standard deviation of 0.124. The rIBD scores generally follow a normal distribution, and cut-offs of 0.432 and -0.310 for positive and negative rIBD scores, respectively, are derived to enable the identification of genomic regions showing significant introgression signals from the source breed to the admixed breed. A list of genomic regions with detailed calculated rIBD scores is reported, and the rIBD scores for each of the considered windows are presented in plots on the rIBD calculation platform. Our rIBD calculation platform provides a user-friendly tool for the calculation of fine-scaled rIBD scores for each of the genomic regions to map possible functional genomic variants due to adaptive introgression or long-term hybridization.
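The abstract does not say how the two cut-offs were obtained from the score distribution. As a rough sanity check only, the sketch below assumes they are the mean plus or minus three standard deviations; the factor `k = 3` and all variable names are assumptions made here for illustration, not details taken from the paper.

```python
# Summary statistics quoted in the abstract (simulated data).
mean_ribd = 0.061
sd_ribd = 0.124

# Assumption (not stated in the abstract): cut-offs sit at mean +/- k * SD.
k = 3
upper = mean_ribd + k * sd_ribd
lower = mean_ribd - k * sd_ribd

# Cut-offs as reported in the abstract.
reported_upper, reported_lower = 0.432, -0.310

print(f"upper: {upper:.3f} (reported {reported_upper})")
print(f"lower: {lower:.3f} (reported {reported_lower})")
```

Under this assumption the computed values differ from the reported cut-offs by about 0.001 on each side, which rounding of the published mean and standard deviation could account for. The authors may have used a different rule.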
Affiliation(s)
- Bo Cui: School of Chemistry and Biological Engineering, University of Science and Technology, Beijing, China; College of Water Resources and Civil Engineering, China Agricultural University, Beijing, China; Institute of Biotechnology, Beijing Academy of Agricultural and Forestry Sciences, Beijing, China
- Zhongxu Guo: School of Chemistry and Biological Engineering, University of Science and Technology, Beijing, China; College of Water Resources and Civil Engineering, China Agricultural University, Beijing, China; Institute of Biotechnology, Beijing Academy of Agricultural and Forestry Sciences, Beijing, China
- Hongbo Cao: School of Chemistry and Biological Engineering, University of Science and Technology, Beijing, China; College of Water Resources and Civil Engineering, China Agricultural University, Beijing, China; Institute of Biotechnology, Beijing Academy of Agricultural and Forestry Sciences, Beijing, China
- Mario Calus: Department of Animal Science, Animal Breeding and Genetics Group, Wageningen University, Wageningen, Netherlands
- Qianqian Zhang: School of Chemistry and Biological Engineering, University of Science and Technology, Beijing, China; Institute of Biotechnology, Beijing Academy of Agricultural and Forestry Sciences, Beijing, China
|
17
|
Yu Z, Abdel-Azim S, Duggal P, Vergara C. Identity by descent mapping of HCV spontaneous clearance in populations of diverse ancestry. RESEARCH SQUARE 2023:rs.3.rs-2433454. [PMID: 36712049 PMCID: PMC9882640 DOI: 10.21203/rs.3.rs-2433454/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
Background Acute infection with hepatitis C virus (HCV) affects millions of individuals worldwide. Host genetics plays a role in spontaneous clearance of the acute infection, which occurs in approximately 30% of individuals. Common variants in GPR158, genes in the interferon lambda (IFNL) cluster, and the MHC region have been associated with HCV clearance in populations of diverse ancestry. Fine mapping of those regions has identified some key variants and amino acids as potential causal variants, but the role of rare variants in those regions, and in the genome in general, has not been explored. We aimed to detect haplotypes containing rare variants related to HCV clearance using identity-by-descent (IBD) haplotype sharing between unrelated case/case pairs and case/control pairs in 3,608 individuals with European and African ancestry. Results We detected 1,711,832 and 5,678,043 individual pairs of IBD segments in the European and African ancestry individuals, respectively. As expected, individuals of African descent had more, and shorter, segments compared to Europeans. We did not detect any significant IBD signals in the known associated gene regions. Conclusions IBD is based on sharing of haplotypes and is most powerful in populations with a shared founder or recent common ancestor. For the complex trait of HCV clearance, we used two outbred, global populations, which limited our power to detect IBD associations. Overall, in this population-based sample we failed to detect rare variants associated with HCV clearance in individuals of European and African ancestry.
Affiliation(s)
- Zixuan Yu: Johns Hopkins University, Bloomberg School of Public Health
- Priya Duggal: Johns Hopkins University, Bloomberg School of Public Health
|
18
|
Noto K, Ruiz L. Accurate genome-wide phasing from IBD data. BMC Bioinformatics 2022; 23:502. [PMID: 36424541 PMCID: PMC9686111 DOI: 10.1186/s12859-022-05066-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 11/17/2022] [Indexed: 11/25/2022] Open
Abstract
As genotype databases increase in size, so too does the number of detectable segments of identity by descent (IBD): segments of the genome where two individuals share an identical copy of one of their two parental haplotypes, due to shared ancestry. We show that given a large enough genotype database, these segments of IBD collectively overlap entire chromosomes, including instances of IBD that span multiple chromosomes, and can be used to accurately separate the alleles inherited from each parent across the entire genome. The resulting phase is not an improvement over state-of-the-art local phasing methods, but provides accurate long-range phasing that indicates which of two haplotypes in different regions of the genome, including different chromosomes, was inherited from the same parent. We are able to separate the DNA inherited from each parent completely, across the entire genome, with 98% median accuracy in a test set of 30,000 individuals. We estimate the IBD data requirements for accurate genome-wide phasing, and we propose a method for estimating confidence in the resulting phase. We show that our methods do not require the genotypes of close family, and that they are robust to genotype errors and missing data. In fact, our method can impute missing data accurately and correct genotype errors.
|