1
Strategies for pairwise searches in forensic kinship analysis. Forensic Sci Int Genet 2021; 54:102562. [PMID: 34274795 DOI: 10.1016/j.fsigen.2021.102562] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/30/2021] [Revised: 06/29/2021] [Accepted: 07/03/2021] [Indexed: 11/20/2022]
Abstract
Testing kinship between pairs of individuals is central to a wide range of applications. We focus on cases where many tests are done jointly. Typical examples include cases where DNA profiles are available from a burial site, a plane crash or a database of convicted offenders. The task is to determine the relationships between DNA profiles or individuals. Our approach generalises previous methods and implementations in several respects. We model general, possibly inbred, pairwise relationships, which is important for non-human applications and in archaeological studies of ancient inbred populations. Furthermore, we do not restrict attention to autosomal markers. Some cases, such as distinguishing between maternal and paternal half siblings, can be solved using X-chromosomal markers. When many tests are done, the risk of errors increases. We address this problem by building on the theory of multiple testing and show how optimal thresholds for tests can be determined. We point out that the likelihood ratios in a blind search may be dependent, so multiple testing methods and interpretation need to account for this. In addition, we show how a Bayesian approach can be helpful. Our examples, using simulated and real data, demonstrate the practical importance of the methods. The implementation is based on freely available software.
2
Sun M, Jobling MA, Taliun D, Pramstaller PP, Egeland T, Sheehan NA. On the use of dense SNP marker data for the identification of distant relative pairs. Theor Popul Biol 2015; 107:14-25. [PMID: 26474828 DOI: 10.1016/j.tpb.2015.10.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/25/2015] [Revised: 10/02/2015] [Accepted: 10/05/2015] [Indexed: 01/05/2023]
Abstract
There has been recent interest in the exploitation of readily available dense genome scan marker data for the identification of relatives. However, there are conflicting findings on how informative these data are in practical situations and, in particular, sets of thinned markers are often used with no concrete justification for the chosen spacing. We explore the potential usefulness of dense single nucleotide polymorphism (SNP) arrays for this application with a focus on inferring distant relative pairs. We distinguish between relationship estimation, as defined by a pedigree connecting the two individuals of interest, and estimation of general relatedness as would be provided by a kinship coefficient or a coefficient of relatedness. Since our primary interest is in the former case, we adopt a pedigree likelihood approach. We consider the effect of additional SNPs and data on an additional typed relative, together with choice of that relative, on relationship inference. We also consider the effect of linkage disequilibrium. When overall relatedness, rather than the specific relationship, would suffice, we propose an approximate approach that is easy to implement and appears to compete well with a popular moment-based estimator and a recent maximum likelihood approach based on chromosomal sharing. We conclude that denser marker data are more informative for distant relatives. However, linkage disequilibrium cannot be ignored and will be the main limiting factor for applications to real data.
Affiliation(s)
- M Sun
  - Department of Health Sciences, University of Leicester, UK
- M A Jobling
  - Department of Genetics, University of Leicester, UK
- D Taliun
  - Center for Biomedicine, European Academy of Bolzano (EURAC), Bolzano, Italy; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- P P Pramstaller
  - Center for Biomedicine, European Academy of Bolzano (EURAC), Bolzano, Italy
- T Egeland
  - IKBM Norwegian University of Life Sciences, Norway
- N A Sheehan
  - Department of Health Sciences, University of Leicester, UK; Department of Genetics, University of Leicester, UK
3
Anderson EC, Ng TC. Bayesian pedigree inference with small numbers of single nucleotide polymorphisms via a factor-graph representation. Theor Popul Biol 2015; 107:39-51. [PMID: 26450523 DOI: 10.1016/j.tpb.2015.09.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 03/04/2015] [Revised: 09/23/2015] [Accepted: 09/24/2015] [Indexed: 10/22/2022]
Abstract
We develop a computational framework for addressing pedigree inference problems using small numbers (80-400) of single nucleotide polymorphisms (SNPs). Our approach relaxes the assumptions, which are commonly made, that sampling is complete with respect to the pedigree and that there is no genotyping error. It relies on representing the inferred pedigree as a factor graph and invoking the Sum-Product algorithm to compute and store quantities that allow the joint probability of the data to be rapidly computed under a large class of rearrangements of the pedigree structure. This allows efficient MCMC sampling over the space of pedigrees, and, hence, Bayesian inference of pedigree structure. In this paper we restrict ourselves to inference of pedigrees without loops using SNPs assumed to be unlinked. We present the methodology in general for multigenerational inference, and we illustrate the method by applying it to the inference of full sibling groups in a large sample (n=1157) of Chinook salmon typed at 95 SNPs. The results show that our method provides a better point estimate and estimate of uncertainty than the currently best-available maximum-likelihood sibling reconstruction method. Extensions of this work to more complex scenarios are briefly discussed.
Affiliation(s)
- Eric C Anderson
  - Fisheries Ecology Division, Southwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, 110 Shaffer Road, Santa Cruz, CA 95060, USA
- Thomas C Ng
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
4
Egeland T, Dørum G, Vigeland MD, Sheehan NA. Mixtures with relatives: A pedigree perspective. Forensic Sci Int Genet 2014; 10:49-54. [DOI: 10.1016/j.fsigen.2014.01.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 06/22/2013] [Revised: 01/13/2014] [Accepted: 01/22/2014] [Indexed: 10/25/2022]
5
Cowell RG. A simple greedy algorithm for reconstructing pedigrees. Theor Popul Biol 2012; 83:55-63. [PMID: 23164633 DOI: 10.1016/j.tpb.2012.11.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 06/14/2012] [Revised: 10/29/2012] [Accepted: 11/01/2012] [Indexed: 11/18/2022]
Abstract
This paper introduces a simple greedy algorithm for searching for high likelihood pedigrees using micro-satellite (STR) genotype information on a complete sample of related individuals. The core idea behind the algorithm is not new, but it is believed that putting it into a greedy search setting, and specifically the application to pedigree learning, is novel. The algorithm does not require age or sex information, but this information can be incorporated if desired. The algorithm is applied to human and non-human genetic data and in a simulation study.
6
Cussens J, Bartlett M, Jones EM, Sheehan NA. Maximum Likelihood Pedigree Reconstruction Using Integer Linear Programming. Genet Epidemiol 2012; 37:69-83. [DOI: 10.1002/gepi.21686] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 06/02/2012] [Revised: 08/30/2012] [Accepted: 09/07/2012] [Indexed: 11/10/2022]
Affiliation(s)
- James Cussens
  - Department of Computer Science, University of York, York, North Yorkshire, United Kingdom
- Mark Bartlett
  - Department of Computer Science, University of York, York, North Yorkshire, United Kingdom
- Elinor M. Jones
  - Department of Health Sciences, University of Leicester, Leicester, Leicestershire, United Kingdom
- Nuala A. Sheehan
  - Department of Health Sciences, University of Leicester, Leicester, Leicestershire, United Kingdom
7
Abstract
MOTIVATION: Family relationships can be estimated from DNA marker data. Applications arise in a large number of areas including evolution and conservation research, genealogical research in human, plant and animal populations, forensic problems and genetic mapping via linkage and association analyses. Traditionally, likelihood-based approaches to relationship estimation have used unlinked genetic markers. Due to the fact that some relationships cannot be distinguished from data at unlinked markers, and given the limited number of such markers available, there are considerable constraints on the type of identification problem that can be satisfactorily addressed with such approaches. The aim of this article is to explore the potential of linked autosomal single nucleotide polymorphism markers in this context. Throughout, we will view the problem of relationship estimation as one of pedigree identification rather than identity-by-descent, and thus focus on applications where determination of the exact relationship is important.
RESULTS: We show that the increase in information obtained by exploiting large sets of linked markers substantially increases the number of problems that can be solved. Results are presented based on simulations as well as on real data.
AVAILABILITY: The R library FEST is freely available from http://folk.uio.no/thoree/FEST.
Affiliation(s)
- Øivind Skare
  - Norwegian Institute of Public Health, 0403 Oslo, Norway
8
Abstract
SUMMARY: We present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov Chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical dataset with known pedigree. The parentage inference is robust even in the presence of genotyping errors.
AVAILABILITY: The C source code of FRANz can be obtained under the GPL from http://www.bioinf.uni-leipzig.de/Software/FRANz/.
Affiliation(s)
- Markus Riester
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany
9
Egeland T, Sheehan N. On identification problems requiring linked autosomal markers. Forensic Sci Int Genet 2008; 2:219-25. [PMID: 19083824 DOI: 10.1016/j.fsigen.2008.02.006] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Received: 10/05/2007] [Revised: 01/22/2008] [Accepted: 02/21/2008] [Indexed: 11/18/2022]
Affiliation(s)
- Thore Egeland
  - Department of Medical Genetics, Ulleval University Hospital, 0407 Oslo, Norway
10
Sheehan NA, Egeland T. Adjusting for founder relatedness in a linkage analysis using prior information. Hum Hered 2007; 65:221-31. [PMID: 18073492 DOI: 10.1159/000112369] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 04/03/2007] [Accepted: 07/31/2007] [Indexed: 11/19/2022] Open
Abstract
In genetic linkage studies, while the pedigrees are generally known, background relatedness between the founding individuals, assumed by definition to be unrelated, can seriously affect the results of the analysis. Likelihood approaches to relationship estimation from genetic marker data can all be expressed in terms of finding the most likely pedigree connecting the individuals of interest. When the true relationship is the main focus, the set of all possible alternative pedigrees can be too large to consider. However, prior information is often available which, when incorporated in a formal and structured way, can restrict this set to a manageable size thus enabling the calculation of a posterior distribution from which inferences can be drawn. Here, the unknown relationships are more of a nuisance factor than of interest in their own right, so the focus is on adjusting the results of the analysis rather than on direct estimation. In this paper, we show how prior information on founder relationships can be exploited in some applications to generate a set of candidate extended pedigrees. We then weight the relevant pedigree-specific likelihoods by their posterior probabilities to adjust the lod score statistics.
Affiliation(s)
- N A Sheehan
  - Department of Health Sciences and Department of Genetics, University of Leicester, Leicester, UK
11
Barrett JH, Sheehan NA, Cox A, Worthington J, Cannings C, Teare MD. Family based studies and genetic epidemiology: theory and practice. Hum Hered 2007; 64:146-8. [PMID: 17476114 DOI: 10.1159/000101993] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 01/19/2007] [Accepted: 02/19/2007] [Indexed: 11/19/2022] Open
Abstract
Family based studies have underpinned many successes in uncovering the causes of monogenic and oligogenic diseases. Now research is focussing on the identification and characterisation of genes underlying common diseases and it is widely accepted that these studies will require large population based samples. Population based family study designs have the potential to facilitate the analysis of the effects of both genes and environment. These types of studies integrate the population based approaches of classic epidemiology and the methods enabling the analysis of correlations between relatives sharing both genes and environment. The extent to which such studies are feasible will depend upon population- and disease-specific factors. To review this topic, a symposium was held to present and discuss the costs, requirements and advantages of population based family study designs. This article summarises the features of the meeting held at The University of Sheffield, August 2006.
Affiliation(s)
- J H Barrett
  - Genetic Epidemiology Division, Leeds Institute of Molecular Medicine, University of Leeds, Leeds, UK