1
Abstract
For many years, linkage analysis was the primary tool used for the genetic mapping of Mendelian and complex traits with familial aggregation. Linkage analysis was largely supplanted by the wide adoption of genome-wide association studies (GWASs). However, with the recent increased use of whole-genome sequencing (WGS), linkage analysis is again emerging as an important and powerful analysis method for the identification of genes involved in disease aetiology, often in conjunction with WGS filtering approaches. Here, we review the principles of linkage analysis and provide practical guidelines for carrying out linkage studies using WGS data.
Affiliation(s)
- Jurg Ott
- [1] Key Laboratory of Mental Health, Institute of Psychology, Chinese Academy of Sciences, 16 Lincui Road, Beijing 100101, China. [2] Laboratory of Statistical Genetics, Rockefeller University, 1230 York Avenue, New York, New York 10065, USA
- Jing Wang
- Key Laboratory of Mental Health, Institute of Psychology, Chinese Academy of Sciences, 16 Lincui Road, Beijing 100101, China
- Suzanne M Leal
- Center for Statistical Genetics, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
2
Abstract
Restriction site-associated DNA sequencing or genotyping-by-sequencing (GBS) approaches allow for rapid and cost-effective discovery and genotyping of thousands of single-nucleotide polymorphisms (SNPs) in multiple individuals. However, rigorous quality control practices are needed to avoid high levels of error and bias with these reduced representation methods. We developed a formal statistical framework for filtering spurious loci, using Mendelian inheritance patterns in nuclear families, that accommodates variable-quality genotype calls and missing data--both rampant issues with GBS data--and for identifying sex-linked SNPs. Simulations predict excellent performance of both the Mendelian filter and the sex-linkage assignment under a variety of conditions. We further evaluate our method by applying it to real GBS data and validating a subset of high-quality SNPs. These results demonstrate that our metric of Mendelian inheritance is a powerful quality filter for GBS loci that is complementary to standard coverage and Hardy-Weinberg filters. The described method, implemented in the software MendelChecker, will improve quality control during SNP discovery in nonmodel as well as model organisms.
3
Markus B, Birk OS, Geiger D. Integration of SNP genotyping confidence scores in IBD inference. Bioinformatics 2011; 27:2880-7. [PMID: 21862568 DOI: 10.1093/bioinformatics/btr486] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/22/2023]
Abstract
MOTIVATION High-throughput single nucleotide polymorphism (SNP) arrays have become the standard platform for linkage and association analyses. The high SNP density of these platforms allows high-resolution identification of ancestral recombination events even for distant relatives many generations apart. However, such inference is sensitive to marker mistyping and current error detection methods rely on the genotyping of additional close relatives. Genotyping algorithms provide a confidence score for each marker call that is currently not integrated in existing methods. There is a need for a model that incorporates this prior information within the standard identical by descent (IBD) and association analyses. RESULTS We propose a novel model that incorporates marker confidence scores within IBD methods based on the Lander-Green Hidden Markov Model. The novel parameter of this model is the joint distribution of confidence scores and error status per array. We estimate this probability distribution by applying a modified expectation-maximization (EM) procedure on data from nuclear families genotyped with Affymetrix 250K SNP arrays. The converged tables from two different genotyping algorithms are shown for a wide range of error rates. We demonstrate the efficacy of our method in refining the detection of IBD signals using nuclear pedigrees and distant relatives. AVAILABILITY Plinke, a new version of Plink with an extended pairwise IBD inference model allowing per marker error probabilities is freely available at: http://bioinfo.bgu.ac.il/bsu/software/plinke. CONTACT obirk@bgu.ac.il; markusb@bgu.ac.il SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Barak Markus
- The Morris Kahn Laboratory of Human Genetics, Department of Virology and Developmental Genetics, NIBN, Ben Gurion University, Israel.
4
Burton PR, Hansell AL, Fortier I, Manolio TA, Khoury MJ, Little J, Elliott P. Size matters: just how big is BIG?: Quantifying realistic sample size requirements for human genome epidemiology. Int J Epidemiol 2009; 38:263-73. [PMID: 18676414 PMCID: PMC2639365 DOI: 10.1093/ije/dyn147] [Citation(s) in RCA: 168] [Impact Index Per Article: 11.2] [Accepted: 06/08/2008] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Despite earlier doubts, a string of recent successes indicates that if sample sizes are large enough, it is possible, both in theory and in practice, to identify and replicate genetic associations with common complex diseases. But human genome epidemiology is expensive and, from a strategic perspective, it is still unclear what 'large enough' really means. This question has critical implications for governments, funding agencies, bioscientists and the tax-paying public. Difficult strategic decisions with imposing price tags and important opportunity costs must be taken. METHODS Conventional power calculations for case-control studies disregard many basic elements of analytic complexity (e.g. errors in clinical assessment, and the impact of unmeasured aetiological determinants) and can seriously underestimate true sample size requirements. This article describes, and applies, a rigorous simulation-based approach to power calculation that deals more comprehensively with analytic complexity and has been implemented on the web as ESPRESSO (www.p3gobservatory.org/powercalculator.htm). RESULTS Using this approach, the article explores the realistic power profile of stand-alone and nested case-control studies in a variety of settings and provides a robust quantitative foundation for determining the required sample size both of individual biobanks and of large disease-based consortia. Despite universal acknowledgment of the importance of large sample sizes, our results suggest that contemporary initiatives are still, at best, at the lower end of the range of desirable sample size. Insufficient power remains particularly problematic for studies exploring gene-gene or gene-environment interactions. DISCUSSION Sample size calculation must be both accurate and realistic, and we must continue to strengthen national and international cooperation in the design, conduct, harmonization and integration of studies in human genome epidemiology.
Affiliation(s)
- Paul R Burton
- Department of Health Sciences, University of Leicester, Leicester LE1 7RH, UK.
5
Li B, Leal SM. Deviations from Hardy-Weinberg equilibrium in parental and unaffected sibling genotype data. Hum Hered 2008; 67:104-15. [PMID: 19077427 DOI: 10.1159/000179558] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 12/11/2008] [Accepted: 04/24/2008] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Genotyping error can increase both type I and II errors. In order to elucidate potential genotyping errors, data quality control often includes testing genotype data for deviations from Hardy-Weinberg Equilibrium (HWE). METHODS The Hardy-Weinberg Disequilibrium (HWD) coefficient and the ability to reject the null hypothesis of HWE were calculated analytically for genotype data from parents and unaffected siblings of affected probands. RESULTS Genotype data from parents and unaffected siblings display deviations from HWE when functional loci, or markers in LD with a functional locus, are tested. For the parental genotype data, all deviations from HWE are negative, indicating an excess of heterozygous genotypes, with the strongest deviations from HWE observed for the multiplicative model. In contrast, for affected proband genotype data, there is no deviation from HWE under the multiplicative model, and the deviations from HWE for the recessive model are positive. For the unaffected sibling data, patterns of deviation from HWE are similar to those observed in the proband data, with the exception of the multiplicative model, where the HWD coefficient, although close to 0, can be either positive or negative depending on the allele frequency. CONCLUSION Deviations from HWE in parental and unaffected sibling genotype data could be due to an association with the functional locus. However, these deviations for genotypic relative risk ≤2.0 are not large, and therefore the power to detect them is usually low. Testing for deviations from HWE in parental and unaffected sibling genotype data is still beneficial for quality control even though functional loci, in parental and unaffected sibling genotype data, can produce an association signal.
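The sign convention in this abstract (a negative HWD coefficient meaning an excess of heterozygotes) can be illustrated with a minimal Python sketch. It assumes the common definition D = P(AA) - p_A^2 for a biallelic marker; the abstract does not state the formula the authors used, and the function name `hwd_coefficient` is hypothetical.

```python
def hwd_coefficient(n_aa, n_ab, n_bb):
    """Hardy-Weinberg disequilibrium coefficient for a biallelic marker.

    Assumed definition (not taken from the abstract): D = P(AA) - p_A**2,
    where p_A is the sample frequency of allele A. With this definition the
    heterozygote frequency equals 2*p_A*p_B - 2*D, so D < 0 means more
    heterozygotes than expected under HWE.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p_a = (2 * n_aa + n_ab) / (2 * n)  # allele A frequency from genotype counts
    return n_aa / n - p_a ** 2


# Excess heterozygotes (60 observed vs 50 expected) gives a negative coefficient.
d_excess_het = hwd_coefficient(20, 60, 20)
# Deficit of heterozygotes (40 observed vs 50 expected) gives a positive one.
d_deficit_het = hwd_coefficient(30, 40, 30)
```

Under this assumed definition, the first call returns about -0.05 and the second about +0.05.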
Affiliation(s)
- Bingshan Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Tex., USA
6
Leal SM. Detection of genotyping errors and pseudo-SNPs via deviations from Hardy-Weinberg equilibrium. Genet Epidemiol 2005; 29:204-14. [PMID: 16080207 PMCID: PMC6192426 DOI: 10.1002/gepi.20086] [Citation(s) in RCA: 86] [Impact Index Per Article: 4.5] [Indexed: 11/08/2022]
Abstract
Genotype error can greatly reduce the power of a genetic study. For family data, genotype error can be assessed by examining marker data for non-Mendelian inconsistencies, closely linked markers for double recombination events, and consistency of duplicate genotypes. For case-control data, duplicate samples are genotyped, and controls are tested for deviations from Hardy-Weinberg equilibrium (HWE). Duplicate samples can provide accurate estimates of genotyping error rates, unless systematic genotyping errors have occurred. Although genotyping errors can cause deviations from HWE, these deviations are usually small, and the power to detect them is low except for high rates of genotyping error and/or large sample sizes. An additional problem is that even when deviations from HWE are detected for marker loci, without additional experimentation it is not possible to unequivocally implicate genotyping error as the cause. The power and sample sizes necessary to detect deviations from HWE for single-nucleotide polymorphism (SNP) data are examined for a variety of genotyping error and pseudo-SNP models. For the majority of genotyping models examined, the power is poor to detect deviations from HWE. For example, for 1,000 controls, if an allele with a frequency of 0.1 fails to amplify for 28% of the heterozygous genotypes producing a sample error rate of 0.05, the power is 0.51 to detect a deviation from HWE at an alpha level of 0.05. On the other hand, the detection of deviations from HWE for pseudo-SNPs (paralogous and ectopic sequence variants) for the majority of models examined produces a power of >0.8 for sample sizes as small as 50 individuals.
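As an illustration of the kind of test discussed in this abstract, the following Python sketch runs a 1-df Pearson chi-square test of HWE on genotype counts and reproduces the arithmetic behind the quoted sample error rate. It is a sketch under stated assumptions, not the paper's power calculation: the function names are hypothetical, the goodness-of-fit test is the textbook one, and the error-rate helper assumes the true genotypes are in HWE.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, p_value) for a 1-df chi-square test of HWE at a biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # sample frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic marker).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def sample_error_rate(allele_freq, dropout_fraction):
    """Share of all genotypes miscalled when a fraction of heterozygotes
    loses one allele, assuming the true genotypes are in HWE."""
    het_freq = 2 * allele_freq * (1 - allele_freq)
    return het_freq * dropout_fraction


# Counts in exact HWE proportions for allele frequency 0.9 show no deviation.
chi2, p_value = hwe_chi_square(810, 180, 10)
# The abstract's example: allele frequency 0.1, 28% of heterozygotes affected.
rate = sample_error_rate(0.1, 0.28)
```

Here `rate` evaluates to 2 × 0.1 × 0.9 × 0.28 = 0.0504, consistent with the sample error rate of 0.05 quoted above.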
Affiliation(s)
- Suzanne M Leal
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.
7
Gordon D, Finch SJ. Factors affecting statistical power in the detection of genetic association. J Clin Invest 2005; 115:1408-18. [PMID: 15931375 PMCID: PMC1137002 DOI: 10.1172/jci24756] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.5] [Indexed: 01/08/2023] Open
Abstract
The mapping of disease genes to specific loci has received a great deal of attention in the last decade, and many advances in therapeutics have resulted. Here we review family-based and population-based methods for association analysis. We define the factors that determine statistical power and show how study design and analysis should be designed to maximize the probability of localizing disease genes.
Affiliation(s)
- Derek Gordon
- Laboratory of Statistical Genetics, Rockefeller University, New York, New York 10021, USA.
8
Lalani SR, Safiullah AM, Fernbach SD, Phillips M, Bacino CA, Molinari LM, Glass NL, Towbin JA, Craigen WJ, Belmont JW. SNP genotyping to screen for a common deletion in CHARGE syndrome. BMC Med Genet 2005; 6:8. [PMID: 15710038 PMCID: PMC550653 DOI: 10.1186/1471-2350-6-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 07/16/2004] [Accepted: 02/14/2005] [Indexed: 11/24/2022]
Abstract
BACKGROUND CHARGE syndrome is a complex of birth defects including coloboma, choanal atresia, ear malformations and deafness, cardiac defects, and growth delay. We have previously hypothesized that CHARGE syndrome could be caused by an unidentified genomic microdeletion, but no such deletion was detected using short tandem repeat (STR) markers spaced an average of 5 cM apart. Recently, a microdeletion at the 8q12 locus was reported in two patients with CHARGE, although point mutation in CHD7 on chromosome 8 was the underlying etiology in most of the affected patients. METHODS We have extended our previous study by employing a much higher density of SNP markers (3258) with an average spacing of approximately 800 kb. These SNP markers are diallelic and, therefore, have much different properties for detection of deletions than STRs. RESULTS A global error rate estimate was produced based on Mendelian inconsistency. One marker, rs431722, exceeded the expected frequency of inconsistencies, but no deletion could be demonstrated after retesting the 4 inconsistent pedigrees with local flanking markers or by FISH with the corresponding BAC clone. Expected deletion detection (EDD) was used to assess the coverage of specific intervals over the genome by deriving the probability of detecting a common loss of heterozygosity event over each genomic interval. This analysis estimated the fraction of unobserved deletions, taking into account the allele frequencies at the SNPs, the known marker spacing and sample size. CONCLUSIONS The results of our genotyping indicate that more than 35% of the genome is included in regions with very low probability of a deletion of at least 2 Mb.
Affiliation(s)
- Seema R Lalani
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Arsalan M Safiullah
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Susan D Fernbach
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Department of Pediatrics (Cardiology), Baylor College of Medicine, Houston, Texas, USA
- Michael Phillips
- Genome Quebec and McGill University Innovation Centre, McGill University, Montreal, Quebec, Canada
- Carlos A Bacino
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Laura M Molinari
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Nancy L Glass
- Department of Anesthesiology, Baylor College of Medicine, Houston, Texas, USA
- Jeffrey A Towbin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Department of Pediatrics (Cardiology), Baylor College of Medicine, Houston, Texas, USA
- William J Craigen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Department of Pediatrics, Baylor College of Medicine, Houston, Texas, USA
- John W Belmont
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
9
Wiltshire S, Cardon LR, McCarthy MI. Evaluating the results of genomewide linkage scans of complex traits by locus counting. Am J Hum Genet 2002; 71:1175-82. [PMID: 12355401 PMCID: PMC385093 DOI: 10.1086/342976] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.8] [Received: 05/22/2002] [Accepted: 07/16/2002] [Indexed: 11/03/2022] Open
Abstract
The evaluation of results from primary genomewide linkage scans of complex human traits remains an area of importance and considerable debate. Apart from the usual assessment of statistical significance by use of asymptotic and empirical calculations, an additional means of evaluation--based on counting the number of distinct regions showing evidence of linkage--is possible. We have explored the characteristics of such a locus-counting method over a range of experimental conditions typically encountered during genomewide scans for complex trait loci. Under the null hypothesis, factors that have an impact on the informativeness of the data--such as map density, availability of parental data, and completeness of genotyping--are seen to markedly influence the number of regions of excess allele sharing and the empirically derived genomewide significance of the associated LOD score thresholds. In some circumstances, the expected number of regions is less than one-quarter of that predicted under the assumption of a dense map and complete extraction of inheritance information. We have applied this method to a previously analyzed data set--the Warren 2 genome scan for type 2-diabetes susceptibility--and demonstrate that more regions showing evidence for linkage were observed in the primary genome scan than would be expected by chance, across the whole range of LOD scores, even though no single linkage result achieved empirical genomewide statistical significance. Locus counting may be useful in assessing the results from genome scans for complex traits in general, especially because relatively few scans generate evidence for linkage reaching genomewide significance by dense-map criteria. By taking account of the effects of reduced data informativeness on the expected number of regions showing evidence for linkage, a more meaningful, and less conservative, evaluation of the results from such linkage studies is possible.
Affiliation(s)
- Steven Wiltshire
- Imperial College Genetics and Genomics Research Institute, Imperial College, London, United Kingdom.
10
Douglas JA, Skol AD, Boehnke M. Probability of detection of genotyping errors and mutations as inheritance inconsistencies in nuclear-family data. Am J Hum Genet 2002; 70:487-95. [PMID: 11791214 PMCID: PMC419989 DOI: 10.1086/338919] [Citation(s) in RCA: 123] [Impact Index Per Article: 5.6] [Received: 08/09/2001] [Accepted: 11/20/2001] [Indexed: 11/03/2022] Open
Abstract
Gene-mapping studies routinely rely on checking for Mendelian transmission of marker alleles in a pedigree, as a means of screening for genotyping errors and mutations, with the implicit assumption that, if a pedigree is consistent with Mendel's laws of inheritance, then there are no genotyping errors. However, the occurrence of inheritance inconsistencies alone is an inadequate measure of the number of genotyping errors, since the rate of occurrence depends on the number and relationships of genotyped pedigree members, the type of errors, and the distribution of marker-allele frequencies. In this article, we calculate the expected probability of detection of a genotyping error or mutation as an inheritance inconsistency in nuclear-family data, as a function of both the number of genotyped parents and offspring and the marker-allele frequency distribution. Through computer simulation, we explore the sensitivity of our analytic calculations to the underlying error model. Under a random-allele-error model, we find that detection rates are 51%-77% for multiallelic markers and 13%-75% for biallelic markers; detection rates are generally lower when the error occurs in a parent than in an offspring, unless a large number of offspring are genotyped. Errors are especially difficult to detect for biallelic markers with equally frequent alleles, even when both parents are genotyped; in this case, the maximum detection rate is 34% for four-person nuclear families. Error detection in families in which parents are not genotyped is limited, even with multiallelic markers. Given these results, we recommend that additional error checking (e.g., on the basis of multipoint analysis) be performed, beyond routine checking for Mendelian consistency. Furthermore, our results permit assessment of the plausibility of an observed number of inheritance inconsistencies for a family, allowing the detection of likely pedigree errors, rather than genotyping errors, in the early stages of a genome scan. Such early assessments are valuable in either the targeting of families for resampling or discontinued genotyping.
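The Mendelian consistency check that this abstract evaluates can be sketched for a single parent-parent-child trio as follows. This is a minimal illustration, not the authors' detection-probability calculation: the function name `mendel_consistent` and the tuple encoding of genotypes are assumptions made here, and only autosomal markers with both parents genotyped are handled.

```python
from itertools import product


def mendel_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking one allele
    from each parent. Genotypes are unordered pairs of allele labels."""
    # Every genotype obtainable from one paternal and one maternal allele.
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible


# An inconsistency is flagged: an AA father cannot transmit a B allele.
flagged = not mendel_consistent(("A", "A"), ("A", "B"), ("B", "B"))

# With two heterozygous parents at a biallelic marker, every child genotype is
# consistent, so a miscalled child genotype can never be detected by this check.
all_pass = all(
    mendel_consistent(("A", "B"), ("A", "B"), child)
    for child in (("A", "A"), ("A", "B"), ("B", "B"))
)
```

The second case shows one reason the check misses many errors at biallelic markers: some parental genotype combinations are compatible with every possible offspring genotype.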
Affiliation(s)
- Julie A Douglas
- Department of Human Genetics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109-0618, USA.
11
Palmer LJ, Cookson WO. Using single nucleotide polymorphisms as a means to understanding the pathophysiology of asthma. Respir Res 2001; 2:102-12. [PMID: 11686872 PMCID: PMC59575 DOI: 10.1186/rr45] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Received: 01/09/2001] [Revised: 02/01/2001] [Accepted: 02/09/2001] [Indexed: 11/10/2022] Open
Abstract
Asthma is the most common chronic childhood disease in the developed nations, and is a complex disease that has high social and economic costs. Studies of the genetic etiology of asthma offer a way of improving our understanding of its pathogenesis, with the goal of improving preventive strategies, diagnostic tools, and therapies. Considerable effort and expense have been expended in attempts to detect specific polymorphisms in genetic loci contributing to asthma susceptibility. Concomitantly, the technology for detecting single nucleotide polymorphisms (SNPs) has undergone rapid development, extensive catalogues of SNPs across the genome have been constructed, and SNPs have been increasingly used as a method of investigating the genetic etiology of complex human diseases. This paper reviews both current and potential future contributions of SNPs to our understanding of asthma pathophysiology.
Affiliation(s)
- L J Palmer
- Channing Laboratory, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA.