1
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023]
Abstract
Over the past years, progress in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have proven effective at identifying disease associations with common genetic variants. Yet rare variants can contribute additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting associations with such variants, numerous statistical methods have recently been proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. a gene, gene set or genomic locus) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanisms. In this review, we compare existing aggregation tests, their statistical features and their scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
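As a rough illustration of the simplest class named in this abstract, the burden test, the sketch below collapses each individual's rare-variant genotypes into a single carrier flag and compares carrier counts between cases and controls with a one-sided Fisher's exact test. The function names, the toy genotype matrices and the carrier-indicator collapse are assumptions made for illustration; the review does not prescribe this implementation.

```python
from math import comb

def carrier_flags(genotypes):
    """Collapse each row (one individual, one column per rare variant)
    into 1 if the individual carries any minor allele, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]

def fisher_one_sided(a, b, c, d):
    """Upper-tail hypergeometric p-value for the 2x2 table
    [[a, b], [c, d]] = [[case carriers, case non-carriers],
                        [control carriers, control non-carriers]]."""
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    upper = min(n_cases, n_carriers)
    numerator = sum(
        comb(n_cases, k) * comb(n_total - n_cases, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n_total, n_carriers)

def burden_test(case_genotypes, control_genotypes):
    """Carrier-based burden test: are carriers enriched among cases?"""
    case_flags = carrier_flags(case_genotypes)
    control_flags = carrier_flags(control_genotypes)
    a = sum(case_flags)
    b = len(case_flags) - a
    c = sum(control_flags)
    d = len(control_flags) - c
    return fisher_one_sided(a, b, c, d)

# Hypothetical toy data: 4 cases and 4 controls, 3 rare variants in one gene.
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 2], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
p_value = burden_test(cases, controls)
```

Collapsing to a carrier flag assumes that all variants in the region act in the same direction, which is the situation in which burden-type tests are usually expected to perform best.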
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
  - WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
2
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023]
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies has produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, no single test fits all needs, and each suffers from specific drawbacks. Selecting the right aggregation test while the underlying genetic model of the disease is unknown remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests, created after an in-depth study of the limitations of each test and their impact on the quality of the results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in whole-exome and whole-genome sequencing association studies: it can handle a wide range of association models and provides researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
  - Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
  - Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
  - Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
  - WELBIO department, WEL Research Institute, Wavre, Belgium
3
Fan HY, Lin WY, Lu TP, Chen YY, Hsu JB, Yu SL, Su TC, Lin HJ, Chen YC, Chien KL. Targeted next-generation sequencing for genetic variants of left ventricular mass status among community-based adults in Taiwan. Front Genet 2023; 13:1064980. [PMID: 36712865 PMCID: PMC9879005 DOI: 10.3389/fgene.2022.1064980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Accepted: 12/19/2022] [Indexed: 01/13/2023]
Abstract
Background: Left ventricular mass is a highly heritable trait. Previous studies have suggested that common genetic variants are associated with left ventricular mass; however, the roles of rare variants are still unknown. We performed targeted next-generation sequencing using the TruSight Cardio panel, which provides comprehensive coverage of 175 genes with known associations to 17 inherited cardiac conditions. Methods: We conducted next-generation sequencing on the Illumina TruSight Cardiomyopathy Target Genes platform, using the 5% and 95% extreme values of left ventricular mass from community-based participants. After removing subjects with poor-quality sequencing data, including call rate <98% and Mendelian errors, 144 participants were used for the analysis. We performed downstream analysis, including quality control, alignment, coverage length, and annotation; after filtering for depths of more than 60, 144 samples and 165 target genes remained for further analysis. Results: Of the 12,287 autosomal variants, most had minor allele frequencies of <1% (rare frequency), and variants had minor allele frequencies ranging from 1% to 5%. In the multi-allele variant analyses, 16 loci in 15 genes were significant at a false discovery rate below 0.1. In addition, gene-based analyses using continuous and binary outcomes showed that three genes (CASQ2, COL5A1, and FXN) remained associated with left ventricular mass status. One single-nucleotide polymorphism (rs7538337) was enriched for the CASQ2 gene expressed in aorta artery (p = 4.6 × 10⁻¹⁸), as was another single-nucleotide polymorphism (rs11103536) for the COL5A1 gene expressed in aorta artery (p = 2.0 × 10⁻⁹). Among the novel genes discovered, CASQ2, COL5A1, and FXN are within a protein-protein interaction network with known cardiovascular genes. Conclusion: We identified candidate genes associated with left ventricular mass. Further studies to characterize the target genes and variants for their functional mechanisms are warranted.
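The abstract reports significance at a false discovery rate below 0.1 without naming the procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up adjustment (an assumption, not something the abstract states), adjusted p-values can be computed as follows. The function name and the example p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-locus p-values (not taken from the study).
example_p = [0.01, 0.04, 0.03, 0.5]
example_q = bh_adjust(example_p)
# Loci whose adjusted value falls below the 0.1 threshold used in the abstract.
significant = [i for i, q in enumerate(example_q) if q < 0.1]
```

In this toy input the first three loci pass the 0.1 threshold and the fourth does not.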
Affiliation(s)
- Hsien-Yu Fan
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Wan-Yu Lin
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Tzu-Pin Lu
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Yun-Yu Chen
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
  - Cardiovascular Center, Taichung Veterans General Hospital, Taichung, Taiwan
  - Heart Rhythm Center, Division of Cardiology, Department of Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
  - Cardiovascular Research Center, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Justin BoKai Hsu
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Sung-Liang Yu
  - Department of Clinical Laboratory Sciences and Medical Biotechnology, College of Medicine, Taipei, Taiwan
- Ta-Chen Su
  - Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Hung-Ju Lin
  - Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Yang-Ching Chen
  - Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei Medical University, Taipei, Taiwan
  - School of Nutrition and Health Sciences, College of Nutrition, Taipei Medical University, Taipei, Taiwan
  - Graduate Institute of Metabolism and Obesity Sciences, Taipei Medical University, Taipei, Taiwan
- Kuo-Liong Chien
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
  - Correspondence: Kuo-Liong Chien
4
Zhou J, Li S, Zhou Y, Sheng X. A two-stage testing strategy for detecting genes×environment interactions in association studies. G3-GENES GENOMES GENETICS 2021; 11:6312559. [PMID: 34568910 PMCID: PMC8496220 DOI: 10.1093/g3journal/jkab220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2021] [Accepted: 06/22/2021] [Indexed: 11/15/2022]
Abstract
Identifying gene×environment (G×E) interactions, especially when rare variants are included in genome-wide association studies, is a major challenge in statistical genetics, yet detecting G×E interactions is very important for understanding the etiology of complex diseases. Although some statistical methods have been developed to detect interactions between genes and environment, methods for rare variants are still limited, so a new method for this case is needed. In this study, we extend an existing method, the adaptive combination of P-values (ADA), and design a novel strategy (called iSADA) for testing the effects of G×E interactions for rare variants. We propose a new two-stage test to detect interactions between genes and environment in a given region of a chromosome or even across the whole genome. First, the score statistic is used to test the associations between the trait value and the gene-environment interaction terms and to obtain the original P-values. Then, following the idea of the ADA method, we construct a full test statistic from the P-values of the first-stage tests, so that we can comprehensively test the interactions between genes and environment in the genomic region considered. Simulation studies are conducted to compare our proposed method with existing methods. The results show that iSADA has higher power than the other methods in each case. The method is also applied to a GAW17 data set to illustrate its applicability.
Affiliation(s)
- Jiabin Zhou
  - Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China
- Shitao Li
  - Department of Basic Course, Shenyang University of Technology, Liaoyang 111000, China
- Ying Zhou
  - Department of Statistics, School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China
- Xiaona Sheng
  - School of Information Engineering, Harbin University, Harbin 150086, China
5
Lim E, Chen H, Dupuis J, Liu CT. A unified method for rare variant analysis of gene-environment interactions. Stat Med 2020; 39:801-813. [PMID: 31799744 PMCID: PMC7261513 DOI: 10.1002/sim.8446] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/12/2018] [Revised: 11/19/2019] [Accepted: 11/21/2019] [Indexed: 01/17/2023]
Abstract
Advanced technology in whole-genome sequencing has offered the opportunity to comprehensively investigate the genetic contribution, particularly of rare variants, to complex traits. Several region-based tests have been developed to jointly model the marginal effect of rare variants, but methods to detect gene-environment (GE) interactions are underdeveloped. Identifying the modification effects of environmental factors on genetic risk poses a considerable challenge. To tackle this challenge, we develop a method to detect GE interactions for rare variants using a generalized linear mixed-effects model. The proposed method can accommodate either binary or continuous traits in related or unrelated samples. Under this model, genetic main effects, GE interactions, and sample relatedness are modeled as random effects. We adopt a kernel-based method to leverage the joint information across rare variants and implement variance component score tests to reduce the computational burden. Our simulation studies of continuous and binary traits show that the proposed method maintains correct type I error rates and appropriate power under various scenarios, such as genotype main effects and GE interaction effects in opposite directions and varying proportions of causal variants in the model. We apply our method in the Framingham Heart Study to test GE interaction of smoking on body mass index or overweight status and replicate the Cholinergic Receptor Nicotinic Beta 4 gene association reported in a previous large consortium meta-analysis of single nucleotide polymorphism-smoking interaction. Our proposed set-based GE test is computationally efficient and is applicable to both binary and continuous phenotypes, while appropriately accounting for familial or cryptic relatedness.
Affiliation(s)
- Elise Lim
  - Department of Biostatistics, Boston University, Boston, Massachusetts
- Han Chen
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas
  - Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
- Josée Dupuis
  - Department of Biostatistics, Boston University, Boston, Massachusetts
- Ching-Ti Liu
  - Department of Biostatistics, Boston University, Boston, Massachusetts
6
Leongamornlert DA, Saunders EJ, Wakerell S, Whitmore I, Dadaev T, Cieza-Borrella C, Benafif S, Brook MN, Donovan JL, Hamdy FC, Neal DE, Muir K, Govindasami K, Conti DV, Kote-Jarai Z, Eeles RA. Germline DNA Repair Gene Mutations in Young-onset Prostate Cancer Cases in the UK: Evidence for a More Extensive Genetic Panel. Eur Urol 2019; 76:329-337. [PMID: 30777372 PMCID: PMC6695475 DOI: 10.1016/j.eururo.2019.01.050] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 08/03/2018] [Accepted: 01/31/2019] [Indexed: 12/30/2022]
Abstract
BACKGROUND Rare germline mutations in DNA repair genes are associated with prostate cancer (PCa) predisposition and prognosis. OBJECTIVE To quantify the frequency of germline DNA repair gene mutations in UK PCa cases and controls, in order to more comprehensively evaluate the contribution of individual genes to overall PCa risk and likelihood of aggressive disease. DESIGN, SETTING, AND PARTICIPANTS We sequenced 167 DNA repair and eight PCa candidate genes in a UK-based cohort of 1281 young-onset PCa cases (diagnosed at ≤60 yr) and 1160 selected controls. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS Gene-level SKAT-O and gene-set adaptive combination of p values (ADA) analyses were performed separately for cases versus controls, and aggressive (Gleason score ≥8, n=201) versus nonaggressive (Gleason score ≤7, n=1048) cases. RESULTS AND LIMITATIONS We identified 233 unique protein truncating variants (PTVs) with minor allele frequency <0.5% in controls in 97 genes. The total proportion of PTV carriers was higher in cases than in controls (15% vs 12%, odds ratio [OR]=1.29, 95% confidence interval [CI] 1.01-1.64, p=0.036). Gene-level analyses selected NBN (pSKAT-O=2.4×10⁻⁴) for overall risk and XPC (pSKAT-O=1.6×10⁻⁴) for aggressive disease, both at candidate-level significance (p<3.1×10⁻⁴ and p<3.4×10⁻⁴, respectively). Gene-set analysis identified a subset of 20 genes associated with increased PCa risk (OR=3.2, 95% CI 2.1-4.8, pADA=4.1×10⁻³) and four genes that increased risk of aggressive disease (OR=11.2, 95% CI 4.6-27.7, pADA=5.6×10⁻³), three of which overlap the predisposition gene set. CONCLUSIONS The union of the gene-level and gene-set-level analyses identified 23 unique DNA repair genes associated with PCa predisposition or risk of aggressive disease. These findings will help facilitate the development of a PCa-specific sequencing panel with both predictive and prognostic potential.
PATIENT SUMMARY This large sequencing study assessed the rate of inherited DNA repair gene mutations between prostate cancer patients and disease-free men. A panel of 23 genes was identified, which may improve risk prediction or treatment pathways in future clinical practice.
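As a quick arithmetic check on the carrier comparison quoted in the abstract (15% of cases vs 12% of controls, OR=1.29), the sketch below recomputes an odds ratio from the two rounded proportions. The helper name is hypothetical, and because the published percentages are rounded and the exact carrier counts are not given here, the result only approximates the reported value. The confidence interval is not reproduced, since it would require the counts.

```python
def odds_ratio(p_cases, p_controls):
    """Odds ratio from the carrier proportion in cases and in controls."""
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls

# Rounded carrier proportions quoted in the abstract.
approx_or = odds_ratio(0.15, 0.12)
```

With these inputs the ratio is 0.132 / 0.102, which rounds to the 1.29 reported in the abstract.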
Affiliation(s)
- Daniel A Leongamornlert
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Edward J Saunders
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Sarah Wakerell
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Ian Whitmore
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Tokhir Dadaev
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Clara Cieza-Borrella
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Sarah Benafif
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Mark N Brook
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Jenny L Donovan
  - School of Social and Community Medicine, University of Bristol, Bristol, UK
- Freddie C Hamdy
  - Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK
  - Faculty of Medical Science, John Radcliffe Hospital, University of Oxford, Oxford, UK
- David E Neal
  - Department of Oncology, Addenbrooke's Hospital, University of Cambridge, Cambridge, UK
  - Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre, Cambridge, UK
- Kenneth Muir
  - Division of Population Health, University of Manchester, Manchester, UK
- Koveela Govindasami
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- David V Conti
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, CA, USA
- Zsofia Kote-Jarai
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
- Rosalind A Eeles
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, UK
  - The Royal Marsden NHS Foundation Trust, London, UK
7
Yan Q, Liu N, Forno E, Canino G, Celedón JC, Chen W. An integrative association method for omics data based on a modified Fisher's method with application to childhood asthma. PLoS Genet 2019; 15:e1008142. [PMID: 31063461 PMCID: PMC6524814 DOI: 10.1371/journal.pgen.1008142] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/18/2019] [Revised: 05/17/2019] [Accepted: 04/16/2019] [Indexed: 02/07/2023]
Abstract
The development of high-throughput biotechnologies allows the collection of omics data to study the biological mechanisms underlying complex diseases at different levels, such as genomics, epigenomics, and transcriptomics. However, each technology is designed to collect a specific type of omics data. Thus, the association between a disease and one type of omics data is usually tested individually, but this strategy is suboptimal. To better articulate biological processes and increase the consistency of variant identification, omics data from various platforms need to be integrated. In this report, we introduce an approach that uses a modified Fisher's method (denoted as Omnibus-Fisher) to combine separate p-values of association testing for a trait and SNPs, DNA methylation markers, and RNA sequencing, calculated by kernel machine regression into an overall gene-level p-value to account for correlation between omics data. To consider all possible disease models, we extend Omnibus-Fisher to an optimal test by using perturbations. In our simulations, a usual Fisher's method has inflated type I error rates when directly applied to correlated omics data. In contrast, Omnibus-Fisher preserves the expected type I error rates. Moreover, Omnibus-Fisher has increased power compared to its optimal version when the true disease model involves all types of omics data. On the other hand, the optimal Omnibus-Fisher is more powerful than its regular version when only one type of data is causal. Finally, we illustrate our proposed method by analyzing whole-genome genotyping, DNA methylation data, and RNA sequencing data from a study of childhood asthma in Puerto Ricans.
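For orientation, the usual Fisher's method that this abstract contrasts with Omnibus-Fisher can be sketched in a few lines. The code below is the classical version, which assumes independent p-values; it is not the authors' modified method, and the function name and example values are hypothetical. It uses the closed-form upper tail of a chi-square distribution with an even number of degrees of freedom, so no external library is needed.

```python
from math import exp, factorial, log

def fisher_combine(p_values):
    """Classical Fisher combination of independent p-values.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p)
    follows a chi-square distribution with 2k degrees of freedom under
    the null. All p-values must be strictly greater than 0.
    """
    k = len(p_values)
    half_stat = -sum(log(p) for p in p_values)  # statistic / 2
    # Upper tail of chi-square with 2k df: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    tail = exp(-half_stat) * sum(half_stat ** i / factorial(i) for i in range(k))
    return 2 * half_stat, tail

# Hypothetical gene-level p-values from three omics layers.
statistic, combined_p = fisher_combine([0.04, 0.20, 0.01])
```

The independence assumption is exactly what fails for correlated omics data, which is the source of the inflated type I error rates that the abstract reports for the unmodified method.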
Affiliation(s)
- Qi Yan
  - Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
- Nianjun Liu
  - Department of Epidemiology and Biostatistics, School of Public Health, Indiana University Bloomington, Bloomington, IN
- Erick Forno
  - Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
- Glorisa Canino
  - Behavioral Sciences Research Institute, University of Puerto Rico, San Juan, PR
- Juan C. Celedón
  - Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
- Wei Chen
  - Division of Pediatric Pulmonary Medicine, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, PA
8
Persani L, de Filippis T, Colombo C, Gentilini D. GENETICS IN ENDOCRINOLOGY: Genetic diagnosis of endocrine diseases by NGS: novel scenarios and unpredictable results and risks. Eur J Endocrinol 2018; 179:R111-R123. [PMID: 29880707 DOI: 10.1530/eje-18-0379] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/02/2018] [Accepted: 06/06/2018] [Indexed: 12/17/2022]
Abstract
Technological advances in genetics have had a profound impact on the research and diagnostics of non-communicable diseases. The availability of next-generation sequencing (NGS) has allowed the identification of novel candidate genes and has also profoundly changed the understanding of the architecture of several endocrine diseases. Several NGS approaches are available, allowing the sequencing of selected regions of interest, the whole exome or the whole genome (targeted NGS, WES or WGS), with highly variable costs, potential and limitations that should be clearly understood before designing an experiment. Here, we illustrate the NGS scenario, describe the advantages and limitations of the different protocols and review some of the NGS results obtained in different endocrine conditions. Finally, we give insights into the terminology and requirements for the implementation of NGS in research and diagnostic labs.
Affiliation(s)
- Luca Persani
  - Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
  - Labs of Endocrine and Metabolic Research, IRCCS Istituto Auxologico Italiano, Milan, Italy
- Tiziana de Filippis
  - Labs of Endocrine and Metabolic Research, IRCCS Istituto Auxologico Italiano, Milan, Italy
- Carla Colombo
  - Labs of Endocrine and Metabolic Research, IRCCS Istituto Auxologico Italiano, Milan, Italy
- Davide Gentilini
  - Labs of Molecular Biology Research, IRCCS Istituto Auxologico Italiano, Milan, Italy
  - Labs of University of Pavia, Pavia, Italy
9
Novel Methods for Family-Based Genetic Studies. Methods Mol Biol 2018. [PMID: 29876895 DOI: 10.1007/978-1-4939-7868-7_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
The recent development of microarray and sequencing technology allows the identification of disease susceptibility genes. Although genome-wide association studies (GWAS) have successfully identified many genetic markers related to human diseases, traditional statistical methods lack the power to detect rare genetic markers. Rare genetic markers are therefore usually grouped together and tested at the set level. One such method is the sequence kernel association test (SKAT), which is commonly used in rare genetic marker analysis. In recent publications, SKAT has been extended to family-based rare variant analysis. Here, I present three published statistical approaches for family-based rare variant analysis of (1) continuous traits, (2) binary traits, and (3) multiple correlated traits.
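To make the set-level idea concrete, the sketch below computes a SKAT-style variance-component score statistic, Q = Σ_j w_j S_j², where S_j is the score of variant j against the phenotype residuals. It assumes unrelated individuals and an intercept-only null model, so it does not cover the family-based extensions the chapter describes, and it stops at the statistic without computing a p-value. The function name, the flat default weights and the toy data are all assumptions for illustration.

```python
def skat_q(genotypes, phenotypes, weights=None):
    """SKAT-style score statistic for one variant set.

    genotypes: list of rows, one per individual, one column per variant.
    phenotypes: list of trait values, one per individual.
    weights: optional per-variant weights (flat weights if omitted).
    """
    n = len(phenotypes)
    n_variants = len(genotypes[0])
    if weights is None:
        weights = [1.0] * n_variants
    # Intercept-only null model: residuals are deviations from the mean.
    mean_y = sum(phenotypes) / n
    residuals = [y - mean_y for y in phenotypes]
    q = 0.0
    for j in range(n_variants):
        # Per-variant score: genotype column times residual vector.
        score_j = sum(genotypes[i][j] * residuals[i] for i in range(n))
        q += weights[j] * score_j ** 2
    return q

# Hypothetical toy data: 4 individuals, 2 rare variants, binary trait.
toy_genotypes = [[1, 0], [0, 1], [0, 0], [1, 0]]
toy_phenotypes = [1, 0, 0, 1]
toy_q = skat_q(toy_genotypes, toy_phenotypes)
```

Because each per-variant score is squared before summing, variants with opposite directions of effect do not cancel out, which is the main contrast with burden-type collapsing.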
10
Chen L, Wang Y, Zhou Y. Association analysis of multiple traits by an approach of combining P values. J Genet 2018; 97:79-85. [PMID: 29666327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Increasing evidence shows that one variant can affect multiple traits, which is a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase the statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most of them are suited to detecting common variants associated with multiple traits. Because of the low minor allele frequency of rare variants, these methods are not optimal for rare variant association analysis. In this paper, we extend the adaptive combination of P values method (termed ADA) for a single trait to test association between multiple traits and rare variants in a given region. For a given region, we use a reverse regression model to test each rare variant for association with multiple traits and obtain the P value of the single-variant test. We then take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and to different directions of effect among causal variants.
Affiliation(s)
- Lili Chen
  - Department of Mathematics, School of Sciences, Harbin Institute of Technology, Harbin 150001, People's Republic of China
11
Chen L, Wang Y, Zhou Y. Association analysis of multiple traits by an approach of combining P values. J Genet 2018. [DOI: 10.1007/s12041-018-0885-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/01/2022]
12
Sarnowski C, Satizabal CL, DeCarli C, Pitsillides AN, Cupples LA, Vasan RS, Wilson JG, Bis JC, Fornage M, Beiser AS, DeStefano AL, Dupuis J, Seshadri S. Whole genome sequence analyses of brain imaging measures in the Framingham Study. Neurology 2017; 90:e188-e196. [PMID: 29282330 PMCID: PMC5772158 DOI: 10.1212/wnl.0000000000004820] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 03/14/2017] [Accepted: 09/22/2017] [Indexed: 11/15/2022]
Abstract
Objective We sought to identify rare variants influencing brain imaging phenotypes in the Framingham Heart Study by performing whole genome sequence association analyses within the Trans-Omics for Precision Medicine Program. Methods We performed association analyses of cerebral and hippocampal volumes and white matter hyperintensity (WMH) in up to 2,180 individuals by testing the association of rank-normalized residuals from mixed-effect linear regression models adjusted for sex, age, and total intracranial volume with individual variants while accounting for familial relatedness. We conducted gene-based tests for rare variants using (1) a sliding-window approach, (2) a selection of functional exonic variants, or (3) all variants. Results We detected new loci in 1p21 for cerebral volume (minor allele frequency [MAF] 0.005, p = 10⁻⁸) and in 16q23 for hippocampal volume (MAF 0.05, p = 2.7 × 10⁻⁸). Previously identified associations in 12q24 for hippocampal volume (rs7294919, p = 4.4 × 10⁻⁴) and in 17q25 for WMH (rs7214628, p = 2.0 × 10⁻³) were confirmed. Gene-based tests detected associations (p ≤ 2.3 × 10⁻⁶) in new loci for cerebral (5q13, 8p12, 9q31, 13q12-q13, 15q24, 17q12, 19q13) and hippocampal volumes (2p12) and WMH (3q13, 4p15) including Alzheimer disease– (UNC5D) and Parkinson disease–associated genes (GBA). Pathway analyses evidenced enrichment of associated genes in immunity, inflammation, and Alzheimer disease and Parkinson disease pathways. Conclusions Whole genome sequence–wide search reveals intriguing new loci associated with brain measures. Replication of novel loci is needed to confirm these findings.
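The Methods mention rank-normalized residuals. One common way to do this is a rank-based inverse normal transform, sketched below with the Python standard library. The abstract does not say which variant the authors used, so the `(rank - 0.5) / n` offset, the average-rank handling of ties and the function name are assumptions for illustration.

```python
from statistics import NormalDist

def rank_normalize(values):
    """Map values to standard-normal quantiles of their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank for the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]

# Hypothetical residuals from a regression model.
example_z = rank_normalize([3.0, 1.0, 2.0])
```

The transformed values keep the ordering of the residuals but follow an approximately normal distribution, which limits the influence of outlying volumes on the association tests.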
Affiliation(s)
- Chloé Sarnowski, Claudia L Satizabal, Charles DeCarli, Achilleas N Pitsillides, L Adrienne Cupples, Ramachandran S Vasan, James G Wilson, Joshua C Bis, Myriam Fornage, Alexa S Beiser, Anita L DeStefano, Josée Dupuis, Sudha Seshadri
- From the Department of Epidemiology (C.S., L.A.C., A.S.B., A.L.D., J.D.), Boston University School of Public Health; Boston University and the NHLBI's Framingham Heart Study (C.L.S., A.N.P., L.A.C., R.S.V., A.S.B., A.L.D., J.D., S.S.); Departments of Neurology (C.L.S., A.S.B., A.L.D., S.S.) and Cardiology, Preventive Medicine & Epidemiology (R.S.V.), Boston University School of Medicine, Boston, MA; Department of Neurology and Center for Neuroscience (C.D.), University of California at Davis; Department of Physiology and Biophysics (J.G.W.), University of Mississippi Medical Center, Jackson; Cardiovascular Health Research Unit (J.C.B.), Department of Medicine, University of Washington, Seattle; and Institute of Molecular Medicine (M.F.), University of Texas Health Science Center, Houston.
13
Association detection between ordinal trait and rare variants based on adaptive combination of P values. J Hum Genet 2017; 63:37-45. [PMID: 29215083 DOI: 10.1038/s10038-017-0354-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/27/2017] [Revised: 08/19/2017] [Accepted: 09/06/2017] [Indexed: 12/31/2022]
Abstract
Next-generation sequencing technology not only presents a new method for detecting human genomic structural variation, but also provides a large amount of genetic data on rare variants. How to detect associations between human complex diseases and rare variants using these genetic data has attracted extensive attention. In medicine, many health and disease conditions are measured by ordinal response variables; that is, the trait value reflects the developmental stage or severity of a condition. However, most existing methods for testing association between rare variants and complex diseases are designed for dichotomous or quantitative traits. Association analysis methods for ordinal traits are relatively few, and treating ordinal traits as dichotomous or quantitative traits inevitably loses some valuable information in the original data. Therefore, in this paper, we extend an existing method, the adaptive combination of P values (ADA), and propose a new association analysis method for ordinal traits based on it (called OR-ADA) to test for possible association between an ordinal trait and rare variants. In our method, we establish a cumulative logistic regression model, in which the regression coefficients are estimated by the Newton-Raphson algorithm and the likelihood ratio test is used to test the association. Through a large number of simulation studies and an example, we demonstrate the performance of the new method and compare it with several other methods. The analysis results show that the OR-ADA strategy is robust to the signs of the effects of causal variants and is more powerful under many scenarios.
14
Adaptive combination of Bayes factors as a powerful method for the joint analysis of rare and common variants. Sci Rep 2017; 7:13858. [PMID: 29066733 PMCID: PMC5654754 DOI: 10.1038/s41598-017-13177-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/03/2017] [Accepted: 09/21/2017] [Indexed: 11/30/2022]
Abstract
Multi-marker association tests can be more powerful than single-locus analyses because they aggregate the variant information within a gene/region. However, combining the association signals of multiple markers within a gene/region may cause noise due to the inclusion of neutral variants, which usually compromises the power of a test. To reduce noise, the “adaptive combination of P-values” (ADA) method removes variants with larger P-values. However, when both rare and common variants are considered, it is not optimal to truncate variants according to their P-values. An alternative summary measure, the Bayes factor (BF), is defined as the ratio of the probability of the data under the alternative hypothesis to that under the null hypothesis. The BF quantifies the “relative” evidence supporting the alternative hypothesis. Here, we propose an “adaptive combination of Bayes factors” (ADABF) method that can be directly applied to variants with a wide spectrum of minor allele frequencies. The simulations show that ADABF is more powerful than single-nucleotide polymorphism (SNP)-set kernel association tests and burden tests. We also analyzed 1,109 case-parent trios from the Schizophrenia Trio Genomic Research in Taiwan. Three genes on chromosome 19p13.2 were found to be associated with schizophrenia at the suggestive significance level of 5 × 10⁻⁵.
15
Persyn E, Karakachoff M, Le Scouarnec S, Le Clézio C, Campion D, Consortium FE, Schott JJ, Redon R, Bellanger L, Dina C. DoEstRare: A statistical test to identify local enrichments in rare genomic variants associated with disease. PLoS One 2017; 12:e0179364. [PMID: 28742119 PMCID: PMC5524342 DOI: 10.1371/journal.pone.0179364] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/26/2017] [Accepted: 05/29/2017] [Indexed: 01/01/2023]
Abstract
Next-generation sequencing technologies have made it possible to assay the effect of rare variants on complex diseases. As an extension of the "common disease-common variant" paradigm, rare variant studies are necessary to get a more complete insight into the genetic architecture of human traits. Association studies of these rare variations pose new challenges in terms of statistical analysis. Due to their low frequency, rare variants must be tested in groups. This approach is then hindered by the fact that an unknown proportion of the variants could be neutral. The risk level of a rare variation may be determined by its impact but also by its position in the protein sequence. More generally, the molecular mechanisms underlying the disease architecture may involve specific protein domains or inter-genic regulatory regions. While a large variety of methods optimize functionality weights for each single marker, few evaluate variant position differences between cases and controls. Here, we propose a test called DoEstRare, which aims to simultaneously detect clusters of disease risk variants and global allele frequency differences in genomic regions. This test estimates, for cases and controls, variant position densities in the genetic region by a kernel method, weighted by a function of allele frequencies. We compared DoEstRare with previously published strategies through simulation studies as well as re-analysis of real datasets. Based on simulation under various scenarios, DoEstRare was the only test to consistently show the highest performance in terms of type I error and power, whether or not variants were clustered. DoEstRare was also applied to Brugada syndrome and early-onset Alzheimer's disease data and provided complementary results to other existing tests. DoEstRare, by integrating variant position information, gives new opportunities to explain disease susceptibility. DoEstRare is implemented in a user-friendly R package.
Affiliation(s)
- Elodie Persyn
- INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
- Matilde Karakachoff
- INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
- CHU Nantes, l’institut du thorax, Nantes, France
- Camille Le Clézio
- Inserm U1079, Rouen University, Normandy Center for Genomic Medicine and Personalized Medicine, Normandy University, Rouen, France
- Dominique Campion
- Inserm U1079, Rouen University, Normandy Center for Genomic Medicine and Personalized Medicine, Normandy University, Rouen, France
- Jean-Jacques Schott
- INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
- CHU Nantes, l’institut du thorax, Nantes, France
- Richard Redon
- INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
- CHU Nantes, l’institut du thorax, Nantes, France
- Lise Bellanger
- Laboratoire de Mathématiques Jean Leray, UMR CNRS 6629, Nantes, France
- * E-mail: (LB); (CD)
- Christian Dina
- INSERM, CNRS, UNIV Nantes, l’institut du thorax, Nantes, France
- CHU Nantes, l’institut du thorax, Nantes, France
- * E-mail: (LB); (CD)
16
Abstract
Combining statistical significances (P-values) from a set of single-locus association tests in genome-wide association studies is a proof-of-principle method for identifying disease-associated genomic segments, functional genes and biological pathways. We review P-value combinations for genome-wide association studies and introduce an integrated analysis tool, Omnibus P-value Association Tests (OPATs), which provides popular analysis methods of P-value combinations. OPATs is programmed in R and provides a user-friendly R graphical user interface. In addition to analysis modules for data quality control and single-locus association tests, OPATs provides three types of set-based association test: window-, gene- and biopathway-based association tests. P-value combinations with or without threshold and rank truncation are provided. The significance of a set-based association test is evaluated by using resampling procedures. Performance of the set-based association tests in OPATs has been evaluated by simulation studies and real data analyses. These set-based association tests help boost the statistical power, alleviate the multiple-testing problem, reduce the impact of genetic heterogeneity, increase the replication efficiency of association tests and facilitate the interpretation of association signals by streamlining the testing procedures and integrating the genetic effects of multiple variants in genomic regions of biological relevance. In summary, P-value combinations facilitate the identification of marker sets associated with disease susceptibility and uncover missing heritability in association studies, thereby establishing a foundation for the genetic dissection of complex diseases and traits. OPATs provides an easy-to-use and statistically powerful analysis tool for P-value combinations. OPATs, examples, and user guide can be downloaded from http://www.stat.sinica.edu.tw/hsinchou/genetics/association/OPATs.htm.
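To illustrate what "threshold" and "rank" truncation of a P-value combination can mean, the following is a minimal Python sketch, not the OPATs implementation (OPATs itself is written in R). The function names and the default values of `tau` and `k` are hypothetical choices for illustration. Only the test statistics are computed; as the abstract notes, their significance would be assessed by a resampling procedure, which is not shown here.

```python
import math


def truncated_product(p_values, tau=0.05):
    """Log of a threshold-truncated product: sum of ln(p) over P-values <= tau.

    Assumes every P-value is strictly positive. Returns 0.0 if none pass.
    """
    return sum(math.log(p) for p in p_values if p <= tau)


def rank_truncated_product(p_values, k=2):
    """Log of a rank-truncated product: sum of ln(p) over the k smallest P-values."""
    return sum(math.log(p) for p in sorted(p_values)[:k])


if __name__ == "__main__":
    # Hypothetical single-locus P-values for one gene or window.
    pvals = [0.01, 0.20, 0.04, 0.65]
    print(truncated_product(pvals, tau=0.05))
    print(rank_truncated_product(pvals, k=2))
```

More negative values indicate stronger combined evidence; the two statistics coincide whenever exactly `k` P-values fall at or below `tau`.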
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica
- Corresponding author: Hsin-Chou Yang, Institute of Statistical Science, Academia Sinica, No 128, Academia Road, Section 2, Nankang, Taipei 115, Taiwan. Tel.: 886-2-27835611 ext. 113; Fax: 886-2-27831523; E-mail:
17
Lin WY, Liang YC. Conditioning adaptive combination of P-values method to analyze case-parent trios with or without population controls. Sci Rep 2016; 6:28389. [PMID: 27341039 PMCID: PMC4920030 DOI: 10.1038/srep28389] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/02/2016] [Accepted: 06/02/2016] [Indexed: 11/24/2022]
Abstract
Detection of rare causal variants can help uncover the etiology of complex diseases. Recruiting case-parent trios is a popular study design in family-based studies. If researchers can obtain data from population controls, utilizing them in trio analyses can improve the power of methods. The transmission disequilibrium test (TDT) is a well-known method to analyze case-parent trio data. It has been extended to rare-variant association testing (abbreviated as "rvTDT"), with the flexibility to incorporate population controls. The rvTDT method is robust to population stratification. However, power loss may occur in the conditioning process. Here we propose a "conditioning adaptive combination of P-values method" (abbreviated as "conADA"), to analyze trios with/without unrelated controls. By first truncating the variants with larger P-values, we decrease the vulnerability of conADA to the inclusion of neutral variants. Moreover, because the test statistic is developed by conditioning on parental genotypes, conADA generates valid statistical inference in the presence of population stratification. With regard to statistical methods for next-generation sequencing data analyses, validity may be hampered by population stratification, whereas power may be affected by the inclusion of neutral variants. We recommend conADA for its robustness to these two factors (population stratification and the inclusion of neutral variants).
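For readers unfamiliar with the transmission disequilibrium test mentioned above, here is a minimal Python sketch of the classical single-marker TDT statistic. It is neither rvTDT nor conADA, which extend the idea to sets of rare variants. The function name `tdt` and the example counts are hypothetical; `b` and `c` denote the numbers of heterozygous parents who did and did not transmit the allele of interest to the affected child.

```python
import math


def tdt(b, c):
    """Classical TDT: McNemar-type chi-square (1 df) on transmission counts.

    b: heterozygous parents transmitting the allele of interest.
    c: heterozygous parents not transmitting it.
    Returns (chi_square, p_value).
    """
    if b + c == 0:
        return 0.0, 1.0  # no informative transmissions
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = tdt(30, 10)  # hypothetical counts
    print(chi2, p)
```

Because only transmissions from heterozygous parents enter the statistic, the test conditions on parental genotypes, which is the source of the robustness to population stratification described in the abstract.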
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Department of Public Health, College of Public Health, National Taiwan University, Taipei, Taiwan
- Yun-Chieh Liang
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
18
Yan Q, Weeks DE, Tiwari HK, Yi N, Zhang K, Gao G, Lin WY, Lou XY, Chen W, Liu N. Rare-Variant Kernel Machine Test for Longitudinal Data from Population and Family Samples. Hum Hered 2016; 80:126-38. [PMID: 27161037 DOI: 10.1159/000445057] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/26/2015] [Accepted: 02/24/2016] [Indexed: 01/12/2023]
Abstract
OBJECTIVE The kernel machine (KM) test reportedly performs well in the set-based association test of rare variants. Many studies have been conducted to measure phenotypes at multiple time points, but the standard KM methodology has only been available for phenotypes at a single time point. In addition, family-based designs have been widely used in genetic association studies; therefore, the data analysis method used must appropriately handle familial relatedness. A rare-variant test does not currently exist for longitudinal data from family samples. Therefore, in this paper, we aim to introduce an association test for rare variants, which includes multiple longitudinal phenotype measurements for either population or family samples. METHODS This approach uses KM regression based on the linear mixed model framework and is applicable to longitudinal data from either population (L-KM) or family samples (LF-KM). RESULTS In our population-based simulation studies, L-KM has good control of Type I error rate and increased power in all the scenarios we considered compared with other competing methods. Conversely, in the family-based simulation studies, we found an inflated Type I error rate when L-KM was applied directly to the family samples, whereas LF-KM retained the desired Type I error rate and had the best power performance overall. Finally, we illustrate the utility of our proposed LF-KM approach by analyzing data from an association study between rare variants and blood pressure from the Genetic Analysis Workshop 18 (GAW18). CONCLUSION We propose a method for rare-variant association testing in population and family samples using phenotypes measured at multiple time points for each subject. The proposed method has the best power performance compared to competing approaches in our simulation study.
Affiliation(s)
- Qi Yan
- Division of Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pa., USA
19
Lin WY. Beyond Rare-Variant Association Testing: Pinpointing Rare Causal Variants in Case-Control Sequencing Study. Sci Rep 2016; 6:21824. [PMID: 26903168 PMCID: PMC4763184 DOI: 10.1038/srep21824] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 09/16/2015] [Accepted: 02/01/2016] [Indexed: 12/31/2022]
Abstract
Rare-variant association testing usually requires some method of aggregation. The next important step is to pinpoint individual rare causal variants among a large number of variants within a genetic region. Recently, Ionita-Laza et al. proposed a backward elimination (BE) procedure that can identify individual causal variants among the many variants in a gene. The BE procedure removes a variant if excluding it leads to a smaller P-value for the BURDEN test (referred to as "BE-BURDEN") or the SKAT test (referred to as "BE-SKAT"). Here, we use the adaptive combination of P-values (ADA) method to pinpoint causal variants. Unlike most gene-based association tests, the ADA statistic is built upon per-site P-values of individual variants. It is straightforward to select important variants given the optimal P-value truncation threshold found by ADA. We performed comprehensive simulations to compare ADA with BE-SKAT and BE-BURDEN. Ranking these three approaches according to positive predictive values (PPVs), the percentage of truly causal variants among the total selected variants, we found ADA > BE-SKAT > BE-BURDEN across all simulation scenarios. We therefore recommend using ADA to pinpoint plausible rare causal variants in a gene.
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Department of Public Health, College of Public Health, National Taiwan University, Taipei, Taiwan
20
Zhou YJ, Wang Y, Chen LL. Detecting the Common and Individual Effects of Rare Variants on Quantitative Traits by Using Extreme Phenotype Sampling. Genes (Basel) 2016; 7:genes7010002. [PMID: 26784232 PMCID: PMC4728382 DOI: 10.3390/genes7010002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/29/2015] [Revised: 12/21/2015] [Accepted: 01/05/2016] [Indexed: 12/19/2022]
Abstract
Next-generation sequencing technology has made it possible to detect rare genetic variants associated with complex human traits. In recent literature, various methods specifically designed for rare variants are proposed. These tests can be broadly classified into burden and nonburden tests. In this paper, we take advantage of the burden and nonburden tests, and consider the common effect and the individual deviations from the common effect. To achieve robustness, we use two methods of combining p-values, Fisher's method and the minimum-p method. In rare variant association studies, to improve the power of the tests, we explore the advantage of the extreme phenotype sampling. At first, we dichotomize the continuous phenotypes before analysis, and the two extremes are treated as two different groups representing a dichotomous phenotype. We next compare the powers of several methods based on extreme phenotype sampling and random sampling. Extensive simulation studies show that our proposed methods by using extreme phenotype sampling are the most powerful or very close to the most powerful one in various settings of true models when the same sample size is used.
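The two P-value combination rules named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and both formulas assume the combined P-values are independent and strictly positive, which may not hold for a burden and a nonburden test computed on the same data. Fisher's statistic is -2 Σ ln p, referred to a chi-square distribution with 2k degrees of freedom; for even degrees of freedom its tail probability has a closed form, so no external library is needed.

```python
import math


def fisher_combined(p_values):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under H0.

    Returns (statistic, combined_p). Assumes independent P-values in (0, 1].
    """
    k = len(p_values)
    x = -2.0 * sum(math.log(p) for p in p_values)
    half = x / 2.0
    # Closed-form chi-square tail for 2k df: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total


def min_p_combined(p_values):
    """Minimum-p method under independence: 1 - (1 - min p)^k."""
    k = len(p_values)
    return 1.0 - (1.0 - min(p_values)) ** k


if __name__ == "__main__":
    pvals = [0.05, 0.05]  # hypothetical burden and nonburden P-values
    print(fisher_combined(pvals))
    print(min_p_combined(pvals))
```

Fisher's method tends to reward evidence that is spread across the component tests, whereas the minimum-p rule is driven by the single strongest signal, which is one reason to consider both when the true model is unknown.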
Affiliation(s)
- Ya-Jing Zhou
- Department of Mathematics, School of Science, Harbin Institute of Technology, Harbin 150001, China.
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China.
- Yong Wang
- Department of Mathematics, School of Science, Harbin Institute of Technology, Harbin 150001, China.
- Li-Li Chen
- Department of Mathematics, School of Science, Harbin Institute of Technology, Harbin 150001, China.
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China.
21
Braga TT, Agudelo JSH, Camara NOS. Macrophages During the Fibrotic Process: M2 as Friend and Foe. Front Immunol 2015; 6:602. [PMID: 26635814 PMCID: PMC4658431 DOI: 10.3389/fimmu.2015.00602] [Citation(s) in RCA: 342] [Impact Index Per Article: 34.2] [Received: 08/13/2015] [Accepted: 11/09/2015] [Indexed: 01/07/2023]
Abstract
Macrophages play essential roles in maintaining homeostasis under different conditions of the organism. They may be polarized by various stimuli, which subdivide them into distinct populations. Macrophages with inflammatory activity function mainly in pathological contexts, while those with regulatory activity control inflammation and also remodel the repair process. Here, we review and concisely discuss the role of different components during tissue repair, including those related to innate immune receptors and metabolic modifications. Scar formation is directly related to the degree of inflammation, but also to the appearance of M2 macrophages. Despite the greater numbers of macrophages in the fibrotic phase, regulatory macrophages present some characteristics related to the promotion of fibrosis but also to the control of scar formation. These regulatory macrophages present an oxidative metabolism and differ from the initial inflammatory macrophages, which in turn present a glycolytic profile; this difference allows regulatory macrophages to optimize oxygen consumption and minimize their ROS production. We will emphasize the differences among macrophage subpopulations and the origin and plasticity of these cells during fibrotic processes.
Affiliation(s)
- Tarcio Teodoro Braga
- Nephrology Division, Medicine Department, Federal University of São Paulo, São Paulo, Brazil
- Niels Olsen Saraiva Camara
- Nephrology Division, Medicine Department, Federal University of São Paulo, São Paulo, Brazil; Immunology Department, University of São Paulo, São Paulo, Brazil; Renal Physiology Laboratory, Faculty of Medicine, University of São Paulo, São Paulo, Brazil
22
Associating Multivariate Quantitative Phenotypes with Genetic Variants in Family Samples with a Novel Kernel Machine Regression Method. Genetics 2015; 201:1329-39. [PMID: 26482791 DOI: 10.1534/genetics.115.178590] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 05/27/2015] [Accepted: 10/04/2015] [Indexed: 11/18/2022]
Abstract
The recent development of sequencing technology allows identification of association between the whole spectrum of genetic variants and complex diseases. Over the past few years, a number of association tests for rare variants have been developed. Jointly testing for association between genetic variants and multiple correlated phenotypes may increase the power to detect causal genes in family-based studies, but familial correlation needs to be appropriately handled to avoid an inflated type I error rate. Here we propose a novel approach for multivariate family data using kernel machine regression (denoted as MF-KM) that is based on a linear mixed-model framework and can be applied to a large range of studies with different types of traits. In our simulation studies, the usual kernel machine test has inflated type I error rates when applied directly to familial data, while our proposed MF-KM method preserves the expected type I error rates. Moreover, the MF-KM method has increased power compared to methods that either analyze each phenotype separately while considering family structure or use only unrelated founders from the families. Finally, we illustrate our proposed methodology by analyzing whole-genome genotyping data from a lung function study.
23
Detecting association of rare and common variants by adaptive combination of P-values. Genet Res (Camb) 2015; 97:e20. [PMID: 26440553 DOI: 10.1017/s0016672315000208] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Abstract
Genome-wide association studies (GWAS) can detect common variants associated with diseases. Next-generation sequencing technology has made it possible to detect rare variants. Most association tests, including burden tests and nonburden tests, mainly target rare variants by upweighting rare variant effects and downweighting common variant effects. But there is increasing evidence that complex diseases are caused by both common and rare variants. In this paper, we extend the ADA method (adaptive combination of P-values; Lin et al., 2014), which targets rare variants only, and propose the RC-ADA method (common and rare variants by adaptive combination of P-values). Our proposed method combines the per-site P-values with weights based on minor allele frequencies (MAFs). RC-ADA is robust to the directions of effects of causal variants and to the inclusion of a high proportion of neutral variants. The performance of the RC-ADA method is compared with several other association methods. Extensive simulation studies show that the RC-ADA method is more powerful than other association methods over a wide range of models.
24
Liu G, Liu Y, Jiang Q, Jiang Y, Feng R, Zhang L, Chen Z, Li K, Liu J. Convergent Genetic and Expression Datasets Highlight TREM2 in Parkinson’s Disease Susceptibility. Mol Neurobiol 2015; 53:4931-8. [PMID: 26365049 DOI: 10.1007/s12035-015-9416-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 07/15/2015] [Accepted: 09/01/2015] [Indexed: 10/23/2022]
25
Yan Q, Tiwari HK, Yi N, Gao G, Zhang K, Lin WY, Lou XY, Cui X, Liu N. A Sequence Kernel Association Test for Dichotomous Traits in Family Samples under a Generalized Linear Mixed Model. Hum Hered 2015; 79:60-8. [PMID: 25791389 DOI: 10.1159/000375409] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 08/28/2014] [Accepted: 01/21/2015] [Indexed: 01/15/2023]
Abstract
OBJECTIVE: The existing methods for identifying multiple rare variants underlying complex diseases in family samples are underpowered. Therefore, we aim to develop a new set-based method for an association study of dichotomous traits in family samples. METHODS: We introduce a framework for testing the association of genetic variants with diseases in family samples based on a generalized linear mixed model. Our proposed method is based on kernel machine regression and can be viewed as an extension of the sequence kernel association test (SKAT and famSKAT) for application to family data with dichotomous traits (F-SKAT). RESULTS: Our simulation studies show that the original SKAT has inflated type I error rates when applied directly to family data. By contrast, our proposed F-SKAT has the correct type I error rate. Furthermore, in all of the considered scenarios, F-SKAT, which uses all family data, has higher power than both SKAT, which uses only unrelated individuals from the family data, and another method, which uses all family data. CONCLUSION: We propose a set-based association test that can be used to analyze family data with dichotomous phenotypes while handling genetic variants with the same or opposite directions of effects as well as any type of family relationship.
Affiliation(s)
- Qi Yan
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Ala., USA
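For readers unfamiliar with the test that F-SKAT extends, the following is a minimal Python sketch of the basic SKAT score statistic for unrelated individuals with no covariates. It is not F-SKAT: it ignores family structure, which the abstract above reports leads to inflated type I error on family data. The function names `beta_1_25_weight` and `skat_q`, the intercept-only null model, and the Beta(1, 25) weight are assumptions chosen for illustration.

```python
def beta_1_25_weight(maf):
    """Beta(1, 25) density at the minor allele frequency (upweights rare variants)."""
    return 25.0 * (1.0 - maf) ** 24


def skat_q(genotypes, phenotype):
    """Variance-component score statistic Q = sum_j (w_j * S_j)^2.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i
    at variant j; phenotype[i] is 0/1. The null model here is
    intercept-only (an assumption), so residuals are y_i - mean(y).
    """
    n = len(phenotype)
    n_variants = len(genotypes[0])
    mean_y = sum(phenotype) / n
    residuals = [y - mean_y for y in phenotype]
    q = 0.0
    for j in range(n_variants):
        maf = sum(row[j] for row in genotypes) / (2.0 * n)
        # Per-variant score: genotype column times null-model residuals.
        score = sum(row[j] * r for row, r in zip(genotypes, residuals))
        # Squaring means risk and protective variants do not cancel.
        q += (beta_1_25_weight(maf) * score) ** 2
    return q
```

Because each per-variant score is squared before summing, variants with opposite directions of effect both add to Q, which is the property the abstract highlights. A p-value would still require a null distribution, for example by permuting phenotypes among unrelated individuals.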
26
Lin WY. Adaptive combination of P-values for family-based association testing with sequence data. PLoS One 2014; 9:e115971. [PMID: 25541952 PMCID: PMC4277421 DOI: 10.1371/journal.pone.0115971] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/19/2014] [Accepted: 12/01/2014] [Indexed: 12/24/2022]
Abstract
Family-based study design will play a key role in identifying rare causal variants, because rare causal variants can be enriched in families with multiple affected subjects. Furthermore, different from population-based studies, family studies are robust to bias induced by population substructure. It is well known that rare causal variants are difficult to detect from single-locus tests. Therefore, burden tests and non-burden tests have been developed, by combining signals of multiple variants in a chromosomal region or a functional unit. This inevitably incorporates some neutral variants into the test statistics, which can dilute the power of statistical methods. To guard against the noise caused by neutral variants, we here propose an 'adaptive combination of P-values method' (abbreviated as 'ADA'). This method combines per-site P-values of variants that are more likely to be causal. Variants with large P-values (which are more likely to be neutral variants) are discarded from the combined statistic. In addition to performing extensive simulation studies, we applied these tests to the Genetic Analysis Workshop 17 data sets, where real sequence data were generated according to the 1000 Genomes Project. Compared with some existing methods, ADA is more robust to the inclusion of neutral variants. This is a merit especially when dichotomous traits are analyzed. However, there are some limitations for ADA. First, it is more computationally intensive. Second, pedigree structures and founders' sequence data are required for the permutation procedure. Third, unrelated controls cannot be included. We here show that, for family-based studies, the application of ADA is limited to dichotomous trait analyses with full pedigree information.
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
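The idea of combining only the per-site P-values that fall below a cutoff can be illustrated with a short Python sketch. This is a simplified, single-threshold version for unrelated case-control data, not the ADA procedure described above (which is family-based and whose exact details are not given here). The one-sided hypergeometric per-site test, the threshold of 0.2, the label-shuffling permutation scheme, and all function names are assumptions made for illustration.

```python
import math
import random


def hypergeom_upper_tail(a, n_cases, n_carriers, n_total):
    """P(X >= a), X = number of carriers that are cases under the null."""
    denom = math.comb(n_total, n_cases)
    upper = min(n_cases, n_carriers)
    total = sum(
        math.comb(n_carriers, k) * math.comb(n_total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return total / denom


def per_site_pvalues(carriers, is_case):
    """carriers[j][i] is True if individual i carries variant j."""
    n_total = len(is_case)
    n_cases = sum(is_case)
    pvals = []
    for site in carriers:
        n_carriers = sum(site)
        a = sum(1 for c, y in zip(site, is_case) if c and y)
        pvals.append(hypergeom_upper_tail(a, n_cases, n_carriers, n_total))
    return pvals


def truncated_statistic(pvals, threshold):
    """Fisher-style sum of -log(p), keeping only sites with p <= threshold."""
    return -sum(math.log(p) for p in pvals if p <= threshold)


def truncated_combination_test(carriers, is_case, threshold=0.2,
                               n_perm=500, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    observed = truncated_statistic(per_site_pvalues(carriers, is_case), threshold)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the genotype-phenotype link
        stat = truncated_statistic(per_site_pvalues(carriers, labels), threshold)
        if stat >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Sites with large P-values contribute nothing to the statistic, so likely-neutral variants do not dilute the signal. Shuffling labels is only valid for unrelated individuals; the abstract notes that the family-based method instead needs pedigree structures and founders' sequence data for its permutation procedure.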
27
Yan Q, Tiwari HK, Yi N, Lin WY, Gao G, Lou XY, Cui X, Liu N. Kernel-machine testing coupled with a rank-truncation method for genetic pathway analysis. Genet Epidemiol 2014; 38:447-56. [PMID: 24849109 DOI: 10.1002/gepi.21813] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 10/31/2013] [Revised: 04/09/2014] [Accepted: 04/10/2014] [Indexed: 01/09/2023]
Abstract
Traditional genome-wide association studies (GWASs) usually focus on single-marker analysis, which only assesses marginal effects. Pathway analysis, on the other hand, considers the hierarchical structure of biological pathways, genes and markers, and therefore provides additional insights into the genetic architecture underlying complex diseases. Recently, a number of methods for pathway analysis have been proposed to assess the significance of a biological pathway from a collection of single-nucleotide polymorphisms. In this study, we propose a novel approach for pathway analysis that assesses the effects of genes using the sequence kernel association test and the effects of pathways using an extended adaptive rank truncated product statistic. It has been increasingly recognized that complex diseases are caused by both common and rare variants. We propose a new weighting scheme that allows genetic variants across the whole allelic frequency spectrum to be analyzed together without any form of frequency cutoff for defining rare variants. The proposed approach is flexible. It is applicable to both binary and continuous traits, and incorporating covariates is easy. Furthermore, it can be readily applied to GWAS data, exome-sequencing data, and deep resequencing data. We evaluate the new approach on data simulated under comprehensive scenarios and show that it has the highest power in most of the scenarios while maintaining the correct type I error rate. We also apply our proposed methodology to data from a study of the association between bipolar disorder and candidate pathways from the Wellcome Trust Case Control Consortium (WTCCC) to show its utility.
Affiliation(s)
- Qi Yan
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
28
Lin WY. Association testing of clustered rare causal variants in case-control studies. PLoS One 2014; 9:e94337. [PMID: 24736372 PMCID: PMC3988195 DOI: 10.1371/journal.pone.0094337] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 01/29/2014] [Accepted: 03/12/2014] [Indexed: 11/18/2022]
Abstract
Biological evidence suggests that multiple causal variants in a gene may cluster physically. Variants within the same protein functional domain or gene regulatory element would be located in close proximity on the DNA sequence. However, the spatial information of variants is usually not used in current rare variant association analyses. We here propose a clustering method (abbreviated as "CLUSTER"), which is extended from the adaptive combination of P-values. Our method combines the association signals of variants that are more likely to be causal. Furthermore, the statistic incorporates the spatial information of variants. With extensive simulations, we show that our method outperforms several commonly used methods in many scenarios. To demonstrate its use in real data analyses, we also apply this CLUSTER test to the Dallas Heart Study data. CLUSTER is among the best methods when the effects of causal variants are all in the same direction. As variants located in close proximity are more likely to have a similar impact on disease risk, CLUSTER is recommended for association testing of clustered rare causal variants in case-control studies.
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan