1
|
Xu W, Ma J, Greenwood CMT, Paterson AD, Bull SB. Model-Free Linkage Analysis of a Binary Trait. Methods Mol Biol 2017; 1666:343-373. [PMID: 28980254 DOI: 10.1007/978-1-4939-7274-6_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Genetic linkage analysis aims to detect chromosomal regions containing genetic variants that influence risk of specific inherited diseases. The presence of linkage is indicated when a disease or trait cosegregates through the families with genetic markers at a particular region of the genome. Two main types of genetic linkage analysis are in common use, namely model-based linkage analysis and model-free linkage analysis. In this chapter, we focus solely on the latter type and specifically on binary traits or phenotypes, such as the presence or absence of a specific disease. Model-free linkage analysis is based on allele-sharing, where patterns of genetic similarity among affected relatives are compared to chance expectations. Because the model-free methods do not require the specification of the inheritance parameters of a genetic model, they are preferred by many researchers at early stages in the study of a complex disease. We introduce the history of model-free linkage analysis in Subheading 1. Table 1 describes a standard model-free linkage analysis workflow. We describe three popular model-free linkage analysis methods, the nonparametric linkage (NPL) statistic, the affected sib-pair (ASP) likelihood ratio test, and a likelihood approach for pedigrees. The theory behind each linkage test is described in this section together with a simple example of the relevant calculations. Table 4 provides a summary of popular genetic analysis software packages that implement model-free linkage models. In Subheading 2, we work through the methods on a rich example providing sample software code and output. Subheading 3 contains notes with additional details on various topics that may need further consideration during analysis.
Collapse
Affiliation(s)
- Wei Xu
- Department of Biostatistics, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada, M5G 2M9
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada, M5T 3M7
| | - Jin Ma
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
| | - Celia M T Greenwood
- Lady Davis Research Institute, Jewish General Hospital, Montréal, QC, Canada, H3T 1E2
- Department of Oncology and Department of Epidemiology, Biostatistics & Occupational Health, McGill University, Montréal, QC, Canada, H4A 3T2 and H3A 1A2
| | - Andrew D Paterson
- Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON, Canada, M5G 0A4
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada, M5T 3M7
| | - Shelley B Bull
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, 60 Murray St., Box 18, Toronto, ON, Canada, M5T 3L9.
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada, M5T 3M7.
| |
Collapse
|
2
|
Chung RH, Tsai WY, Kang CY, Yao PJ, Tsai HJ, Chen CH. FamPipe: An Automatic Analysis Pipeline for Analyzing Sequencing Data in Families for Disease Studies. PLoS Comput Biol 2016; 12:e1004980. [PMID: 27272119 PMCID: PMC4894624 DOI: 10.1371/journal.pcbi.1004980] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2015] [Accepted: 05/12/2016] [Indexed: 11/18/2022] Open
Abstract
In disease studies, family-based designs have become an attractive approach to analyzing next-generation sequencing (NGS) data for the identification of rare mutations enriched in families. Substantial research effort has been devoted to developing pipelines for automating sequence alignment, variant calling, and annotation. However, fewer pipelines have been designed specifically for disease studies. Most of the current analysis pipelines for family-based disease studies using NGS data focus on a specific function, such as identifying variants with Mendelian inheritance or identifying shared chromosomal regions among affected family members. Consequently, some other useful family-based analysis tools, such as imputation, linkage, and association tools, have yet to be integrated and automated. We developed FamPipe, a comprehensive analysis pipeline, which includes several family-specific analysis modules, including the identification of shared chromosomal regions among affected family members, prioritizing variants assuming a disease model, imputation of untyped variants, and linkage and association tests. We used simulation studies to compare properties of some modules implemented in FamPipe, and based on the results, we provided suggestions for the selection of modules to achieve an optimal analysis strategy. The pipeline is under the GNU GPL License and can be downloaded for free at http://fampipe.sourceforge.net.
Collapse
Affiliation(s)
- Ren-Hua Chung
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- * E-mail:
| | - Wei-Yun Tsai
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
| | - Chen-Yu Kang
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
| | - Po-Ju Yao
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
| | - Hui-Ju Tsai
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Department of Public Health, China Medical University, Taichung, Taiwan
- Department of Pediatrics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
| | - Chia-Hsiang Chen
- Department of Psychiatry, Chang Gung Memorial Hospital-Linkou, Gueishan, Taoyuan, Taiwan
- Department and Graduate Institute of Biomedical Sciences, Chang Gung University, Taoyuan, Taiwan
| |
Collapse
|
3
|
Ray D, Li X, Pan W, Pankow JS, Basu S. A Bayesian Partitioning Model for the Detection of Multilocus Effects in Case-Control Studies. Hum Hered 2015; 79:69-79. [PMID: 26044550 DOI: 10.1159/000369858] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2014] [Accepted: 11/12/2014] [Indexed: 01/21/2023] Open
Abstract
BACKGROUND Genome-wide association studies (GWASs) have identified hundreds of genetic variants associated with complex diseases, but these variants appear to explain very little of the disease heritability. The typical single-locus association analysis in a GWAS fails to detect variants with small effect sizes and to capture higher-order interaction among these variants. Multilocus association analysis provides a powerful alternative by jointly modeling the variants within a gene or a pathway and by reducing the burden of multiple hypothesis testing in a GWAS. METHODS Here, we propose a powerful and flexible dimension reduction approach to model multilocus association. We use a Bayesian partitioning model which clusters SNPs according to their direction of association, models higher-order interactions using a flexible scoring scheme and uses posterior marginal probabilities to detect association between the SNP set and the disease. RESULTS We illustrate our method using extensive simulation studies and applying it to detect multilocus interaction in Atherosclerosis Risk in Communities (ARIC) GWAS with type 2 diabetes. CONCLUSION We demonstrate that our approach has better power to detect multilocus interactions than several existing approaches. When applied to the ARIC study dataset with 9,328 individuals to study gene-based associations for type 2 diabetes, our method identified some novel variants not detected by conventional single-locus association analyses.
Collapse
Affiliation(s)
- Debashree Ray
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minn., USA
| | | | | | | | | |
Collapse
|
4
|
Abstract
For many years, linkage analysis was the primary tool used for the genetic mapping of Mendelian and complex traits with familial aggregation. Linkage analysis was largely supplanted by the wide adoption of genome-wide association studies (GWASs). However, with the recent increased use of whole-genome sequencing (WGS), linkage analysis is again emerging as an important and powerful analysis method for the identification of genes involved in disease aetiology, often in conjunction with WGS filtering approaches. Here, we review the principles of linkage analysis and provide practical guidelines for carrying out linkage studies using WGS data.
Collapse
Affiliation(s)
- Jurg Ott
- 1] Key Laboratory of Mental Health, Institute of Psychology, Chinese Academy of Sciences, 16 Lincui Road, Beijing 100101, China. [2] Laboratory of Statistical Genetics, Rockefeller University, 1230 York Avenue, New York, New York 10065, USA
| | - Jing Wang
- Key Laboratory of Mental Health, Institute of Psychology, Chinese Academy of Sciences, 16 Lincui Road, Beijing 100101, China
| | - Suzanne M Leal
- Center for Statistical Genetics, Department of Human and Molecular Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
| |
Collapse
|
5
|
Abstract
Genetic linkage analysis aims to detect chromosomal regions containing genes that influence risk of specific inherited diseases. The presence of linkage is indicated when a disease or trait cosegregates through the families with genetic markers at a particular region of the genome. Two main types of genetic linkage analysis are in common use, namely model-based linkage analysis and model-free linkage analysis. In this chapter, we focus solely on the latter type and specifically on binary traits or phenotypes, such as the presence or absence of a specific disease. Model-free linkage analysis is based on allele-sharing, where patterns of genetic similarity among affected relatives are compared to chance expectations. Because the model-free methods do not require the specification of the inheritance parameters of a genetic model, they are preferred by many researchers at early stages in the study of a complex disease. We introduce the history of model-free linkage analysis in Subheading 1. Table 1 describes a standard model-free linkage analysis workflow. We describe three popular model-free linkage analysis methods, the nonparametric linkage (NPL) statistic, the affected sib-pair (ASP) likelihood ratio test, and a likelihood approach for pedigrees. The theory behind each linkage test is described in this section, together with a simple example of the relevant calculations. Table 4 provides a summary of popular genetic analysis software packages that implement model-free linkage models. In Subheading 2, we work through the methods on a rich example providing sample software code and output. Subheading 3 contains notes with additional details on various topics that may need further consideration during analysis.
Collapse
Affiliation(s)
- Wei Xu
- Department of Biostatistics, Princess Margaret Hospital, Toronto, ON, Canada
| | | | | | | |
Collapse
|
6
|
Basu S, Pan W. Comparison of statistical tests for disease association with rare variants. Genet Epidemiol 2011; 35:606-19. [PMID: 21769936 DOI: 10.1002/gepi.20609] [Citation(s) in RCA: 188] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2010] [Revised: 03/23/2011] [Accepted: 06/03/2011] [Indexed: 01/31/2023]
Abstract
In anticipation of the availability of next-generation sequencing data, there is increasing interest in investigating association between complex traits and rare variants (RVs). In contrast to association studies for common variants (CVs), due to the low frequencies of RVs, common wisdom suggests that existing statistical tests for CVs might not work, motivating the recent development of several new tests for analyzing RVs, most of which are based on the idea of pooling/collapsing RVs. However, there is a lack of evaluations of, and thus guidance on the use of, existing tests. Here we provide a comprehensive comparison of various statistical tests using simulated data. We consider both independent and correlated rare mutations, and representative tests for both CVs and RVs. As expected, if there are no or few non-causal (i.e. neutral or non-associated) RVs in a locus of interest while the effects of causal RVs on the trait are all (or mostly) in the same direction (i.e. either protective or deleterious, but not both), then the simple pooled association tests (without selecting RVs and their association directions) and a new test called kernel-based adaptive clustering (KBAC) perform similarly and are most powerful; KBAC is more robust than simple pooled association tests in the presence of non-causal RVs; however, as the number of non-causal CVs increases and/or in the presence of opposite association directions, the winners are two methods originally proposed for CVs and a new test called C-alpha test proposed for RVs, each of which can be regarded as testing on a variance component in a random-effects model. Interestingly, several methods based on sequential model selection (i.e. selecting causal RVs and their association directions), including two new methods proposed here, perform robustly and often have statistical power between those of the above two classes.
Collapse
Affiliation(s)
- Saonli Basu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455-0392, USA
| | | |
Collapse
|
7
|
Basu S, Pan W, Oetting WS. A dimension reduction approach for modeling multi-locus interaction in case-control studies. Hum Hered 2011; 71:234-45. [PMID: 21734407 DOI: 10.1159/000328842] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2010] [Accepted: 04/12/2011] [Indexed: 01/01/2023] Open
Abstract
Studying one locus or one single nucleotide polymorphism (SNP) at a time may not be sufficient to understand complex diseases because they are unlikely to result from the effect of only one SNP. Each SNP alone may have little or no effect on the risk of the disease, but together they may increase the risk substantially. Analyses focusing on individual SNPs ignore the possibility of interaction among SNPs. In this paper, we propose a parsimonious model to assess the joint effect of a group of SNPs in a case-control study. The model implements a data reduction strategy within a likelihood framework and uses a test to assess the statistical significance of the effect of the group of SNPs on the binary trait. The primary advantage of the proposed approach is that the dimension reduction technique produces a test statistic with degrees of freedom significantly lower than a multiple logistic regression with only main effects of the SNPs, and our parsimonious model can incorporate the possibility of interaction among the SNPs. Moreover, the proposed approach estimates the direction of association of each SNP with the disease and provides an estimate of the average effect of the group of SNPs positively and negatively associated with the disease in the given SNP set. We illustrate the proposed model on simulated and real data, and compare its performance with a few other existing approaches. Our proposed approach appeared to outperform the other approaches for independent SNPs in our simulation studies.
Collapse
Affiliation(s)
- Saonli Basu
- Division of Biostatistics, University of Minnesota, Minneapolis, USA. saonli @ umn.edu
| | | | | |
Collapse
|
8
|
Basu S, Pankow JS. An Alternative Model for Quantitative Trait Loci (QTL) Analysis in General Pedigrees. Ann Hum Genet 2010; 75:292-304. [DOI: 10.1111/j.1469-1809.2010.00619.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
|