1
Zhang N, Shi S, Lin S, Bai Z, Ling X, Gao J, Yan R, Ou X. Application of SNPs with low minor allele frequencies in missing person identification (MPI) through kinship analysis of DNA mixtures. Electrophoresis 2023; 44:1569-1578. [PMID: 37454302 DOI: 10.1002/elps.202300111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 06/18/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023]
Abstract
The need to identify a missing person (MP) through kinship analysis of DNA samples found at a crime scene has become increasingly prevalent. DNA samples from MPs can be severely degraded, contain little DNA, and be mixed with DNA from other contributors, which often makes it difficult to apply conventional methods in practice. This study developed a massively parallel sequencing-based panel that contains 1661 single-nucleotide polymorphisms (SNPs) with low minor allele frequencies (MAFs) (averaged at 0.0613) in the Chinese Han population, together with a strategy for relationship inference from DNA mixtures comprising different numbers of contributors (NOCs) and varying allele dropout probabilities. Based on the simulated dataset and genotyping results of 42 artificial DNA mixtures (NOC = 2-4), it was observed that the present SNP panel was sufficient for balanced mixtures when referenced to the closest relatives (parents/offspring and full siblings). When the mixture profiles suffered from dropout, incorrect assignments were markedly associated with relatedness, NOC and the dropout level. We therefore indicate that SNPs with low MAFs could be reliably interpreted for MP identification through the kinship analysis of complex DNA mixtures. Further studies should be extended to more scenarios to test the feasibility of the present approach.
Affiliation(s)
- Nan Zhang
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, P. R. China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, P. R. China
- Shanshan Shi
  - Fetal Medicine Department, The First Affiliated Hospital of Jinan University, Guangzhou, P. R. China
- Shaobin Lin
  - Fetal Medicine Center, Department of Obstetrics and Gynecology, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, P. R. China
- Zhaochen Bai
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, P. R. China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, P. R. China
- Xiaohua Ling
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, P. R. China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, P. R. China
- Jun Gao
  - Reproductive Medicine Center, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, P. R. China
- Ruiling Yan
  - Fetal Medicine Department, The First Affiliated Hospital of Jinan University, Guangzhou, P. R. China
- Xueling Ou
  - Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, P. R. China
  - Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-sen University, Guangzhou, P. R. China
2
Hauser S, Galla SJ, Putnam AS, Steeves TE, Latch EK. Comparing genome-based estimates of relatedness for use in pedigree-based conservation management. Mol Ecol Resour 2022; 22:2546-2558. [PMID: 35510790 DOI: 10.1111/1755-0998.13630] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/20/2021] [Revised: 02/28/2022] [Accepted: 03/30/2022] [Indexed: 12/01/2022]
Abstract
Researchers have long debated which estimator of relatedness best captures the degree of relationship between two individuals. In the genomics era, this debate continues, with relatedness estimates being sensitive to the methods used to generate markers, marker quality, and levels of diversity in sampled individuals. Here, we compare six commonly used genome-based relatedness estimators (kinship genetic distance (KGD), Wang Maximum Likelihood (TrioML), Queller and Goodnight (Rxy), Kinship INference for Genome-wide association studies (KING-robust), Pairwise Relatedness (RAB), and allele-sharing co-ancestry (AS)) across five species bred in captivity (three birds and two mammals) with varying degrees of reliable pedigree data, using reduced-representation and whole genome resequencing data. Genome-based relatedness estimates varied widely across estimators, sequencing methods, and species, yet the most consistent results for known first-order relationships were found using Rxy, RAB, and AS. However, AS was found to be less consistently correlated with known pedigree relatedness than either Rxy or RAB. Our combined results indicate there is not a single genome-based estimator that is ideal across different species and data types. To determine the most appropriate genome-based relatedness estimator for each new dataset, we recommend assessing the relative: (1) correlation of candidate estimators with known relationships in the pedigree and (2) precision of candidate estimators with known first-order relationships. These recommendations are broadly applicable to conservation breeding programs, particularly where genome-based estimates of relatedness can complement and complete poorly pedigreed populations. Given a growing interest in the application of wild pedigrees, our results are also applicable to in-situ wildlife management.
Affiliation(s)
- Samantha Hauser
  - Department of Biological Sciences, University of Wisconsin, Milwaukee, Wisconsin, USA
  - Embark Veterinary, Inc., Boston, Massachusetts, United States of America
- Stephanie J Galla
  - School of Biological Sciences, University of Canterbury, New Zealand
  - Department of Biological Sciences, Boise State University, Boise, Idaho, USA
- Andrea S Putnam
  - Department of Exhibit-Curators, San Diego Zoo Wildlife Alliance, San Diego, California, USA
- Tammy E Steeves
  - School of Biological Sciences, University of Canterbury, New Zealand
- Emily K Latch
  - Department of Biological Sciences, University of Wisconsin, Milwaukee, Wisconsin, USA
3
Identification of missing persons through kinship analysis by microhaplotype sequencing of single-source DNA and two-person DNA mixtures. Forensic Sci Int Genet 2022; 58:102689. [DOI: 10.1016/j.fsigen.2022.102689] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/11/2021] [Revised: 02/22/2022] [Accepted: 03/14/2022] [Indexed: 11/04/2022]
4
Kling D, Phillips C, Kennett D, Tillmar A. Investigative genetic genealogy: Current methods, knowledge and practice. Forensic Sci Int Genet 2021; 52:102474. [PMID: 33592389 DOI: 10.1016/j.fsigen.2021.102474] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 10/02/2020] [Revised: 01/12/2021] [Accepted: 01/27/2021] [Indexed: 12/15/2022]
Abstract
Investigative genetic genealogy (IGG) has emerged as a new, rapidly growing field of forensic science. We describe the process whereby dense SNP data, commonly comprising more than half a million markers, are employed to infer distant relationships. By distant we refer to degrees of relatedness exceeding that of first cousins. We review how methods of relationship matching and SNP analysis on an enlarged scale are used in a forensic setting to identify a suspect in a criminal investigation or a missing person. There is currently a strong need in forensic genetics not only to understand the underlying models to infer relatedness but also to fully explore the DNA technologies and data used in IGG. This review brings together many of the topics and examines their effectiveness and operational limits, while suggesting future directions for their forensic validation. We further investigated the methods used by the major direct-to-consumer (DTC) genetic ancestry testing companies and submitted a questionnaire in which providers of forensic genetic genealogy summarized their operation/services. Although most of the DTC market, and genetic genealogy in general, has undisclosed, proprietary algorithms, we review the current knowledge where information has been discussed and published more openly.
Affiliation(s)
- Daniel Kling
  - Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden
  - Department of Forensic Sciences, Oslo University Hospital, Oslo, Norway
- Christopher Phillips
  - Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Santiago de Compostela, Spain
- Debbie Kennett
  - Research Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, United Kingdom
- Andreas Tillmar
  - Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden
  - Department of Biomedical and Clinical Sciences, Faculty of Medicine and Health Sciences, Linköping University, Linköping, Sweden
5
DeVogel N, Auer PL, Manansala R, Rau A, Wang T. A unified linear mixed model for familial relatedness and population structure in genetic association studies. Genet Epidemiol 2020; 45:305-315. [PMID: 33175443 DOI: 10.1002/gepi.22371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2020] [Revised: 09/14/2020] [Accepted: 10/20/2020] [Indexed: 11/10/2022]
Abstract
Familial relatedness (FR) and population structure (PS) are two major sources for genetic correlation. In the human population, both FR and PS can further break down into additive and dominant components to account for potential additive and dominant genetic effects. In this study, besides the classical additive genomic relationship matrix, a dominant genomic relationship matrix is introduced. A link between the additive/dominant genomic relationship matrices and the coancestry (or kinship)/double coancestry coefficients is also established. In addition, a way to separate the FR and PS correlations based on the estimates of coancestry and double coancestry coefficients from the genomic relationship matrices is proposed. A unified linear mixed model is also developed, which can account for both the additive and dominance effects of FR and PS correlations as well as their possible random interactions. Finally, this unified linear mixed model is applied to analyze two study cohorts from UK Biobank.
Affiliation(s)
- Nicholas DeVogel
  - Division of Biostatistics, Institute for Health and Equity, Milwaukee, Wisconsin, USA
- Paul L Auer
  - Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
- Regina Manansala
  - Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
- Andrea Rau
  - Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, USA
  - INRAE, AgroParisTech, GABI, Université Paris-Saclay, Jouy-en-Josas, France
- Tao Wang
  - Division of Biostatistics, Institute for Health and Equity, Milwaukee, Wisconsin, USA
6
Albers PK, McVean G. Dating genomic variants and shared ancestry in population-scale sequencing data. PLoS Biol 2020; 18:e3000586. [PMID: 31951611 PMCID: PMC6992231 DOI: 10.1371/journal.pbio.3000586] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 07/16/2019] [Revised: 01/30/2020] [Accepted: 01/02/2020] [Indexed: 12/31/2022]
Abstract
The origin and fate of new mutations within species is the fundamental process underlying evolution. However, while much attention has been focused on characterizing the presence, frequency, and phenotypic impact of genetic variation, the evolutionary histories of most variants are largely unexplored. We have developed a nonparametric approach for estimating the date of origin of genetic variants in large-scale sequencing data sets. The accuracy and robustness of the approach is demonstrated through simulation. Using data from two publicly available human genomic diversity resources, we estimated the age of more than 45 million single-nucleotide polymorphisms (SNPs) in the human genome and release the Atlas of Variant Age as a public online database. We characterize the relationship between variant age and frequency in different geographical regions and demonstrate the value of age information in interpreting variants of functional and selective importance. Finally, we use allele age estimates to power a rapid approach for inferring the ancestry shared between individual genomes and to quantify genealogical relationships at different points in the past, as well as to describe and explore the evolutionary history of modern human populations.
Affiliation(s)
- Patrick K. Albers
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
- Gil McVean
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, United Kingdom
7
Hanghøj K, Moltke I, Andersen PA, Manica A, Korneliussen TS. Fast and accurate relatedness estimation from high-throughput sequencing data in the presence of inbreeding. Gigascience 2019; 8:giz034. [PMID: 31042285 PMCID: PMC6488770 DOI: 10.1093/gigascience/giz034] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 09/01/2018] [Revised: 01/08/2019] [Accepted: 03/11/2019] [Indexed: 01/08/2023]
Abstract
BACKGROUND: The estimation of relatedness between pairs of possibly inbred individuals from high-throughput sequencing (HTS) data has previously not been possible for samples where we cannot obtain reliable genotype calls, as in the case of low-coverage data. RESULTS: We introduce ngsRelateV2, a major revision of ngsRelateV1, a program that originally allowed for estimation of relatedness from HTS data among non-inbred individuals only. The new revised version takes into account the possibility of individuals being inbred by estimating the 9 condensed Jacquard coefficients along with various other relatedness statistics. The program is threaded and scales linearly with the number of cores allocated to the process. CONCLUSION: The program is available as an open source C/C++ program under the GPL license and hosted at https://github.com/ANGSD/ngsRelate. To facilitate easy analysis, the program is able to work directly on the most commonly used container formats for raw sequence (BAM/CRAM) and summary data (VCF/BCF).
Affiliation(s)
- Kristian Hanghøj
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, 1350 Copenhagen K, Denmark
  - Université de Toulouse, University Paul Sabatier (UPS), Laboratoire AMIS, CNRS UMR 5288, Toulouse, France
- Ida Moltke
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Philip Alstrup Andersen
  - Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen, Denmark
- Andrea Manica
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- Thorfinn Sand Korneliussen
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, 1350 Copenhagen K, Denmark
  - Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
8
Attard CRM, Beheregaray LB, Möller LM. Genotyping-by-sequencing for estimating relatedness in nonmodel organisms: Avoiding the trap of precise bias. Mol Ecol Resour 2018; 18:381-390. [DOI: 10.1111/1755-0998.12739] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 06/29/2017] [Revised: 11/02/2017] [Accepted: 11/02/2017] [Indexed: 12/29/2022]
Affiliation(s)
- Catherine R. M. Attard
  - Molecular Ecology Lab, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
- Luciano B. Beheregaray
  - Molecular Ecology Lab, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
- Luciana M. Möller
  - Molecular Ecology Lab, College of Science and Engineering, Flinders University, Adelaide, SA, Australia
9
Ko A, Nielsen R. Composite likelihood method for inferring local pedigrees. PLoS Genet 2017; 13:e1006963. [PMID: 28827797 PMCID: PMC5578687 DOI: 10.1371/journal.pgen.1006963] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 01/10/2017] [Revised: 08/31/2017] [Accepted: 08/07/2017] [Indexed: 12/21/2022]
Abstract
Pedigrees contain information about the genealogical relationships among individuals and are of fundamental importance in many areas of genetic studies. However, pedigrees are often unknown and must be inferred from genetic data. Despite the importance of pedigree inference, existing methods are limited to inferring only close relationships or analyzing a small number of individuals or loci. We present a simulated annealing method for estimating pedigrees in large samples of otherwise seemingly unrelated individuals using genome-wide SNP data. The method supports complex pedigree structures such as polygamous families, multi-generational families, and pedigrees in which many of the member individuals are missing. Computational speed is greatly enhanced by the use of a composite likelihood function which approximates the full likelihood. We validate our method on simulated data and show that it can infer distant relatives more accurately than existing methods. Furthermore, we illustrate the utility of the method on a sample of Greenlandic Inuit.
Author summary
Pedigrees contain information about the genealogical relationships among individuals. This information can be used in many areas of genetic studies such as disease association studies, conservation efforts, and for inferences about the demographic history and social structure of a population. Despite their importance, pedigrees are often unknown and must be estimated from genetic information. However, pedigree inference remains a difficult problem due to the high cost of likelihood computation and the enormous number of possible pedigrees that must be considered. These difficulties limit existing methods in their ability to infer pedigrees when the sample size or the number of markers is large, or when the sample contains only distant relatives. In this report, we present a method that circumvents these computational challenges in order to infer pedigrees of complex structure for a large number of individuals. Using simulations, we find that the method can infer distant relatives much more accurately than existing methods. Furthermore, we show that even pairwise inferences of relatedness can be improved substantially by consideration of the pedigree structure with other related individuals in the sample.
Affiliation(s)
- Amy Ko
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
- Rasmus Nielsen
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
  - Department of Statistics, University of California, Berkeley, Berkeley, California, United States of America
  - Museum of Natural History, University of Copenhagen, Copenhagen, Denmark
10
Morimoto C, Manabe S, Kawaguchi T, Kawai C, Fujimoto S, Hamano Y, Yamada R, Matsuda F, Tamaki K. Pairwise Kinship Analysis by the Index of Chromosome Sharing Using High-Density Single Nucleotide Polymorphisms. PLoS One 2016; 11:e0160287. [PMID: 27472558 PMCID: PMC4966930 DOI: 10.1371/journal.pone.0160287] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/15/2016] [Accepted: 07/15/2016] [Indexed: 11/18/2022]
Abstract
We developed a new approach for pairwise kinship analysis in forensic genetics based on chromosomal sharing between two individuals. Here, we defined "index of chromosome sharing" (ICS) calculated using 174,254 single nucleotide polymorphism (SNP) loci typed by SNP microarray and genetic length of the shared segments from the genotypes of two individuals. To investigate the expected ICS distributions from first- to fifth-degree relatives and unrelated pairs, we used computationally generated genotypes to consider the effect of linkage disequilibrium and recombination. The distributions were used for probabilistic evaluation of the pairwise kinship analysis, such as likelihood ratio (LR) or posterior probability, without allele frequencies and haplotype frequencies. Using our method, all actual sample pairs from volunteers showed significantly high LR values (i.e., ≥ 10^8); therefore, we can distinguish distant relationships (up to the fifth-degree) from unrelated pairs based on LR. Moreover, we can determine accurate degrees of kinship in up to third-degree relationships with a probability of > 80% using the criterion of posterior probability ≥ 0.90, even if the kinship of the pair is totally unpredictable. This approach greatly improves pairwise kinship analysis of distant relationships, specifically in cases involving identification of disaster victims or missing persons.
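The relation between likelihood ratios and the posterior-probability criterion mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' ICS method: it only applies Bayes' rule to a set of likelihood ratios, each taken against the "unrelated" hypothesis, and it assumes equal prior probabilities unless others are supplied. The function name `posterior_probabilities`, the hypothesis labels, and all LR values are hypothetical and chosen for illustration only.

```python
def posterior_probabilities(likelihood_ratios, priors=None):
    """Turn likelihood ratios (each hypothesis vs. 'unrelated') into posteriors.

    likelihood_ratios: dict mapping hypothesis name -> LR against 'unrelated'
                       (the 'unrelated' hypothesis itself has LR = 1).
    priors: optional dict of prior probabilities; equal priors are assumed
            when omitted (an assumption of this sketch).
    """
    names = list(likelihood_ratios)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    # Bayes' rule: posterior is proportional to LR * prior.
    weighted = {name: likelihood_ratios[name] * priors[name] for name in names}
    total = sum(weighted.values())
    return {name: weight / total for name, weight in weighted.items()}


# Hypothetical LR values for one pair of individuals (illustration only).
lrs = {
    "unrelated": 1.0,
    "first_degree": 2.0e3,
    "second_degree": 1.0e8,
    "third_degree": 5.0e6,
}

post = posterior_probabilities(lrs)
best = max(post, key=post.get)
# Decision rule quoted in the abstract: accept only if posterior >= 0.90.
accepted = post[best] >= 0.90
print(best, round(post[best], 4), accepted)
```

With equal priors the posterior of each hypothesis is simply its LR divided by the sum of all LRs, so a single dominant LR is needed to pass a 0.90 threshold.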
Affiliation(s)
- Chie Morimoto
  - Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Sho Manabe
  - Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Takahisa Kawaguchi
  - Unit of Human Disease Genomics, Center for Genomic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Chihiro Kawai
  - Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Shuntaro Fujimoto
  - Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Yuya Hamano
  - Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
  - Forensic Science Laboratory, Kyoto Prefectural Police Headquarters, Kyoto, Japan
- Ryo Yamada
  - Unit of Statistical Genetics, Center for Genomic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Fumihiko Matsuda
  - Unit of Human Disease Genomics, Center for Genomic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Keiji Tamaki
  - Department of Forensic Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan