1. Baldrighi GN, Nova A, Bernardinelli L, Fazia T. A Pipeline for Phasing and Genotype Imputation on Mixed Human Data (Parents-Offspring Trios and Unrelated Subjects) by Reviewing Current Methods and Software. Life (Basel, Switzerland) 2022; 12:life12122030. [PMID: 36556394 PMCID: PMC9781110 DOI: 10.3390/life12122030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 12/01/2022] [Accepted: 12/02/2022] [Indexed: 12/09/2022]
Abstract
Genotype imputation has become an essential prerequisite when performing association analysis. It is a computational technique that allows us to infer genetic markers that have not been directly genotyped, thereby increasing statistical power in subsequent association studies, which consequently has a crucial impact on the identification of causal variants. Many features need to be considered when choosing the proper algorithm for imputation, including the target sample on which it is performed, i.e., related individuals, unrelated individuals, or both. Problems could arise when dealing with a target sample made up of mixed data, composed of both related and unrelated individuals, especially since the scientific literature on this topic is not sufficiently clear. To shed light on this issue, we examined existing algorithms and software for performing phasing and imputation on mixed human data from SNP arrays, specifically when related subjects belong to trios. By discussing the advantages and limitations of the current algorithms, we identified LD-based methods as being the most suitable for reconstruction of haplotypes in this specific context, and we proposed a feasible pipeline that can be used for imputing genotypes in both phased and unphased human data.
2. Kothiyal P, Wong WSW, Bodian DL, Niederhuber JE. Mendelian Inconsistent Signatures from 1314 Ancestrally Diverse Family Trios Distinguish Biological Variation from Sequencing Error. J Comput Biol 2019; 26:405-419. [PMID: 30942611 PMCID: PMC6533806 DOI: 10.1089/cmb.2018.0253] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Abstract
Next-generation sequencing enables advances in the clinical application of genomics by providing high-throughput detection of genomic variation. However, next-generation sequencing technologies, especially whole-genome sequencing (WGS), are often associated with a high false-positive rate. Trio-based WGS can contribute significantly towards improved quality control methods. Mendelian-inconsistent calls (MIC) in parent–child trios are commonly attributed to erroneous sequencing calls, as the true de novo mutation rate is extremely low compared with MIC incidence. Here, we analyzed WGS data from 1314 mother, father, and child trios across ethnically diverse populations with the goal of characterizing MIC. Genotype calls in a trio can be used to assign different signatures to MIC. MIC occur more frequently within repeats but show varying distribution and error mechanisms across repeat types. MIC are enriched within poly-A/T runs in short interspersed nuclear elements. Alignability scores, allele balance, and relative parental read depth vary among MIC signatures and these differences should be considered when designing filters for MIC reduction. MIC cluster in germline deletions and these MIC also segregate with population. Our results provide a basis for making decisions on how each MIC type should be evaluated before discarding them as errors or including them in alternative applications. With the reduction of sequencing cost, family trio whole genome and exome analysis are being performed more routinely in clinical practice. We provide a reference that can be used for annotating MIC with their frequencies in a larger population to aid in the filtering of candidate de novo mutations.
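As a rough illustration of what a Mendelian-inconsistent call is, the sketch below checks whether a child's genotype can be formed from one allele of each parent. It assumes autosomal, diploid sites with no missing calls, and the `mic_signature` label format is a hypothetical convenience for grouping calls, not the signature scheme used in the paper.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child genotype can arise from one allele per parent.

    Each genotype is a pair of allele codes, e.g. (0, 1) for a heterozygote.
    Assumes an autosomal diploid site with no missing alleles.
    """
    # Every combination of one paternal and one maternal allele, order-free.
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


def mic_signature(father, mother, child):
    """Build a hypothetical text label for a trio's genotype combination."""
    def fmt(genotype):
        return "/".join(str(allele) for allele in sorted(genotype))

    return f"{fmt(father)}x{fmt(mother)}->{fmt(child)}"


if __name__ == "__main__":
    trio = ((0, 0), (0, 0), (0, 1))  # both parents homozygous reference
    if not is_mendelian_consistent(*trio):
        print("MIC with signature", mic_signature(*trio))
```

Grouping inconsistent calls by a label of this kind is one simple way to tabulate how often each parent-child genotype combination occurs before deciding how to filter it.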
Affiliation(s)
- Prachi Kothiyal: Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia
- Wendy S W Wong: Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia
- Dale L Bodian: Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia
- John E Niederhuber: Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia; Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, Virginia
3. Watson CM, Crinnion LA, Gurgel-Gianetti J, Harrison SM, Daly C, Antanavicuite A, Lascelles C, Markham AF, Pena SDJ, Bonthron DT, Carr IM. Rapid Detection of Rare Deleterious Variants by Next Generation Sequencing with Optional Microarray SNP Genotype Data. Hum Mutat 2015; 36:823-30. [PMID: 26037133 PMCID: PMC4744743 DOI: 10.1002/humu.22818] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/30/2015] [Accepted: 05/27/2015] [Indexed: 11/25/2022]
Abstract
Autozygosity mapping is a powerful technique for the identification of rare, autosomal recessive, disease‐causing genes. The ease with which this category of disease gene can be identified has greatly increased through the availability of genome‐wide SNP genotyping microarrays and subsequently of exome sequencing. Although these methods have simplified the generation of experimental data, its analysis, particularly when disparate data types must be integrated, remains time consuming. Moreover, the huge volume of sequence variant data generated from next generation sequencing experiments opens up the possibility of using these data instead of microarray genotype data to identify disease loci. To allow these two types of data to be used in an integrated fashion, we have developed AgileVCFMapper, a program that performs both the mapping of disease loci by SNP genotyping and the analysis of potentially deleterious variants using exome sequence variant data, in a single step. This method does not require microarray SNP genotype data, although analysis with a combination of microarray and exome genotype data enables more precise delineation of disease loci, due to superior marker density and distribution.
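To make the idea behind autozygosity mapping concrete, the sketch below scans position-ordered genotypes for long runs of consecutive homozygous markers, which are candidate autozygous regions. It is a minimal illustration under stated assumptions, not the algorithm implemented in AgileVCFMapper: the function name, the `min_markers` threshold, and the zero tolerance for heterozygous calls are all hypothetical choices.

```python
def homozygous_runs(genotypes, min_markers=3):
    """Return (start, end) index pairs of runs of homozygous genotypes.

    genotypes: list of (allele1, allele2) pairs ordered by genomic position.
    min_markers: minimum number of consecutive homozygous markers to report
                 (an illustrative threshold, not a recommended value).
    Indices are inclusive.
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, (a, b) in enumerate(genotypes):
        if a == b:
            if start is None:
                start = i
        else:
            # A heterozygous call ends the current run.
            if start is not None and i - start >= min_markers:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and len(genotypes) - start >= min_markers:
        runs.append((start, len(genotypes) - 1))
    return runs


if __name__ == "__main__":
    calls = [(0, 0), (1, 1), (0, 0), (0, 1), (1, 1), (1, 1),
             (0, 1), (0, 0), (0, 0), (0, 0), (1, 1)]
    print(homozygous_runs(calls, min_markers=3))
```

Real data would normally call for thresholds in physical or genetic distance and some tolerance for genotyping errors, which this sketch deliberately leaves out.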
Affiliation(s)
- Christopher M Watson: School of Medicine, University of Leeds, Leeds, United Kingdom; Yorkshire Regional Genetics Service, St James's University Hospital, Leeds, United Kingdom
- Laura A Crinnion: School of Medicine, University of Leeds, Leeds, United Kingdom; Yorkshire Regional Genetics Service, St James's University Hospital, Leeds, United Kingdom
- Juliana Gurgel-Gianetti: Department of Pediatrics, Faculty of Medicine, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Catherine Daly: School of Medicine, University of Leeds, Leeds, United Kingdom
- Sergio D J Pena: Laboratory of Clinical Genomics, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; GENE-Nucleo de Genetica Medica de Minas Gerais, Belo Horizonte, Brazil
- David T Bonthron: School of Medicine, University of Leeds, Leeds, United Kingdom; Yorkshire Regional Genetics Service, St James's University Hospital, Leeds, United Kingdom
- Ian M Carr: School of Medicine, University of Leeds, Leeds, United Kingdom