1
Sun NA, Wang YU, Chu J, Han Q, Shen Y. Bayesian Approaches in Exploring Gene-environment and Gene-gene Interactions: A Comprehensive Review. Cancer Genomics Proteomics 2023; 20:669-678. [PMID: 38035701 PMCID: PMC10687732 DOI: 10.21873/cgp.20414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023] Open
Abstract
Rapid advancements in high-throughput biological techniques have facilitated the generation of high-dimensional omics datasets, which have provided a solid foundation for precision medicine and prognosis prediction. Nonetheless, the problem of missing heritability persists. To address this problem, it is essential to explain the genetic structure of disease incidence risk and prognosis by incorporating interactions. Developments in Bayesian theory have provided new approaches for building models for interaction identification and estimation. Several Bayesian models have been developed to improve model accuracy and to identify main effects, gene-environment (G×E) interactions, and gene-gene (G×G) interactions. Studies based on single-nucleotide polymorphisms (SNPs) are important for the exploration of rare and common variants. Models based on the effect heredity principle and group-based models are relatively flexible and do not require strict constraints when dealing with the hierarchical structure between main effects and interactions (M-I). These models offer good interpretability of biological mechanisms. Machine learning-based Bayesian approaches are highly competitive in improving prediction accuracy. These models provide insights into the mechanisms underlying the occurrence and progression of complex diseases, identify more reliable biomarkers, and achieve higher predictive accuracy. In this paper, we provide a comprehensive review of these Bayesian approaches.
Affiliation(s)
- N A Sun
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Y U Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Jiadong Chu
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Qiang Han
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Yueping Shen
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
2
Mridha MF, Prodeep AR, Hoque ASMM, Islam MR, Lima AA, Kabir MM, Hamid MA, Watanobe Y. A Comprehensive Survey on the Progress, Process, and Challenges of Lung Cancer Detection and Classification. Journal of Healthcare Engineering 2022; 2022:5905230. [PMID: 36569180 PMCID: PMC9788902 DOI: 10.1155/2022/5905230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/17/2022] [Accepted: 11/09/2022] [Indexed: 12/23/2022]
Abstract
Lung cancer is the leading cause of cancer deaths worldwide, and the death rate is steadily increasing. Early detection improves the chances of recovering from lung cancer. However, because the number of radiologists is limited and they are already working overtime, the growing volume of image data makes it hard for them to evaluate images accurately. As a result, many researchers have proposed automated methods that use medical imaging to predict the growth of cancer cells quickly and accurately. Previously, a lot of work was done on computer-aided detection (CADe) and computer-aided diagnosis (CADx) in computed tomography (CT), magnetic resonance imaging (MRI), and X-ray, with the goal of effective detection and segmentation of pulmonary nodules, as well as classifying nodules as malignant or benign. Still, no comprehensive review that covers all aspects of lung cancer has been done. In this paper, every aspect of lung cancer is discussed in detail, including datasets, image preprocessing, segmentation methods, optimal feature extraction and selection methods, evaluation metrics, and classifiers. Finally, the study looks into several lung cancer-related issues with possible solutions.
Affiliation(s)
- M. F. Mridha
  - Department of Computer Science and Engineering, American International University Bangladesh, Dhaka 1229, Bangladesh
- Akibur Rahman Prodeep
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka 1216, Bangladesh
- A. S. M. Morshedul Hoque
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka 1216, Bangladesh
- Md. Rashedul Islam
  - Department of Computer Science and Engineering, University of Asia Pacific, Dhaka 1216, Bangladesh
- Aklima Akter Lima
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka 1216, Bangladesh
- Muhammad Mohsin Kabir
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka 1216, Bangladesh
- Md. Abdul Hamid
  - Department of Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Yutaka Watanobe
  - Department of Computer Science and Engineering, University of Aizu, Aizuwakamatsu 965-8580, Japan
3
Zhu Z, Yang G, Pang Z, Liang J, Wang W, Zhou Y. Establishment of a regression model of bone metabolism markers for the diagnosis of bone metastases in lung cancer. World J Surg Oncol 2021; 19:27. [PMID: 33487166 PMCID: PMC7830744 DOI: 10.1186/s12957-021-02141-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/23/2020] [Accepted: 01/19/2021] [Indexed: 01/16/2023] Open
Abstract
Background: The aim of this study was to establish a regression equation model of serum bone metabolism markers. We analyzed the diagnostic value of these markers for bone metastases in lung cancer and provided laboratory evidence for the early clinical treatment of bone metastases in lung cancer. Methods: A total of 339 patients who were treated in our hospital from July 2012 to October 2015 were included: 103 patients with non-metastatic lung cancer, 128 patients with lung cancer combined with bone metastasis, and 108 patients with benign lung diseases who had nontumor and nonbone metabolism-related diseases and served as the control group. Detection and analysis of type I collagen carboxyl terminal peptide β-special sequence (β-CTX), total type I procollagen amino terminal propeptide (TPINP), N-terminal-mid fragment of osteocalcin (N-MID), parathyroid hormone (PTH), vitamin D (VitD3), alkaline phosphatase (ALP), calcium (CA), phosphorus (P), cytokeratin 19 fragment (F211), and other indicators were performed. Four multiple regression models were established to determine the best diagnostic model for lung cancer with bone metastasis. Results: Analysis of single indicators of bone metabolism markers in lung cancer was performed, among which F211, β-CTX, TPINP, and ALP were significantly different (P < 0.05). The area under the ROC curve (AUC) of each single indicator was less than 0.712. Among the multiple regression models, the fourth model was the best and was much better than any single indicator, with an AUC of 0.856, a sensitivity of 70.0%, a specificity of 91.0%, a positive predictive value of 82.5%, and a negative predictive value of 72.0%. Conclusion: Multiple regression models of bone metabolism markers were established. These models can be used to evaluate the progression of lung cancer and provide a basis for the early treatment of bone metastases.
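The diagnostic measures reported in this abstract (sensitivity, specificity, predictive values, and AUC) follow standard definitions. The sketch below shows how they are computed in general. The confusion counts and marker scores are hypothetical values chosen for illustration and are not the study's data; the names `ConfusionCounts`, `diagnostic_metrics`, and `auc` are likewise assumptions of this example.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing a diagnostic test against the true disease status."""
    tp: int  # diseased, test positive
    fp: int  # not diseased, test positive
    tn: int  # not diseased, test negative
    fn: int  # diseased, test negative


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV, and NPV as fractions in [0, 1]."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def auc(pos_scores, neg_scores) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen diseased subject scores
    higher than a randomly chosen non-diseased subject (ties count as 0.5).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, for illustration only.
counts = ConfusionCounts(tp=70, fp=9, tn=91, fn=30)
metrics = diagnostic_metrics(counts)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

Unlike sensitivity and specificity, the predictive values depend on the proportion of diseased subjects in the sample, so they do not transfer directly to populations with a different prevalence.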
Affiliation(s)
- Zhongliang Zhu
  - Department of Clinical Laboratory, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, 310014, Zhejiang, China
- Guangyu Yang
  - Department of Clinical Laboratory, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, 310014, Zhejiang, China
- Zhenzhen Pang
  - Department of Clinical Laboratory, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, 310014, Zhejiang, China
- Jiawei Liang
  - Department of Clinical Laboratory, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, 310014, Zhejiang, China
- Weizhong Wang
  - Department of Clinical Laboratory, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, 310014, Zhejiang, China
- Yonglie Zhou
  - Department of Clinical Laboratory, Zhejiang Provincial People's Hospital, People's Hospital of Hangzhou Medical College, Hangzhou, 310014, Zhejiang, China
4
Goncalves A, Soper B, Nygård M, Nygård JF, Ray P, Widemann D, Sales AP. Improving five-year survival prediction via multitask learning across HPV-related cancers. PLoS One 2020; 15:e0241225. [PMID: 33196642 PMCID: PMC7668590 DOI: 10.1371/journal.pone.0241225] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/26/2020] [Accepted: 10/11/2020] [Indexed: 12/12/2022] Open
Abstract
Oncology is a highly siloed field of research in which sub-disciplinary specialization has limited the amount of information shared between researchers of distinct cancer types. This can be attributed to legitimate differences in the physiology and carcinogenesis of cancers affecting distinct anatomical sites. However, underlying processes that are shared across seemingly disparate cancers probably affect prognosis. The objective of the current study is to investigate whether multitask learning improves 5-year survival prediction for cancer patients by leveraging information across anatomically distinct HPV-related cancers. Data were obtained from the Surveillance, Epidemiology, and End Results (SEER) program database. The study cohort consisted of 29,768 primary cancer cases diagnosed in the United States between 2004 and 2015. Ten different cancer diagnoses were selected, all with a known association with HPV risk. In the analysis, the cancer diagnoses were categorized into three distinct topography groupings of varying specificity. The most specific topography grouping consisted of the 10 original cancer diagnoses differentiated by the first two digits of the ICD-O-3 topography code. The second topography grouping consisted of cancer diagnoses categorized into six distinct organ groups. Finally, the third topography grouping consisted of just two groups, head-neck cancers and ano-genital cancers. The tasks were to predict 5-year survival for patients within the different topography groups using 14 predictive features, which were selected from the descriptive variables available in the SEER database.
The information from the predictive features was shared between tasks in three different ways, resulting in three distinct predictive models: 1) Information was not shared between patients assigned to different tasks (single task learning); 2) Information was shared between all patients, regardless of task (pooled model); 3) Only relevant information was shared between patients grouped to different tasks (multitask learning). Prediction performance was evaluated with Brier scores. All three models were evaluated against one another on each of the three distinct topography-defined tasks. The results showed that multitask classifiers achieved relative improvement in the majority of the scenarios studied compared to the single task learning and pooled baseline methods. In this study, we have demonstrated that sharing information among anatomically distinct cancer types can lead to improved predictive survival models.
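The abstract evaluates prediction performance with Brier scores. As a minimal sketch of that metric under its standard definition for binary outcomes (the mean squared difference between predicted probability and observed outcome), the example below uses hypothetical predictions and outcomes; the function name `brier_score` and all values are assumptions of this illustration, not taken from the study.

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecast, and a constant forecast
    of 0.5 always scores 0.25.
    """
    if len(predicted_probs) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    if not predicted_probs:
        raise ValueError("at least one prediction is required")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical 5-year survival predictions (probability of surviving)
# and observed outcomes (1 = survived, 0 = died), for illustration only.
probs = [0.9, 0.2, 0.6, 0.1]
observed = [1, 0, 1, 0]
score = brier_score(probs, observed)  # approximately 0.055
```

Because the Brier score penalizes both poor discrimination and poor calibration, it is a reasonable single measure for comparing probabilistic classifiers such as the three information-sharing strategies described above.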
Affiliation(s)
- Andre Goncalves
  - Lawrence Livermore National Laboratory, Livermore, CA, United States of America
- Braden Soper
  - Lawrence Livermore National Laboratory, Livermore, CA, United States of America
- Priyadip Ray
  - Lawrence Livermore National Laboratory, Livermore, CA, United States of America
- David Widemann
  - Lawrence Livermore National Laboratory, Livermore, CA, United States of America
- Ana Paula Sales
  - Lawrence Livermore National Laboratory, Livermore, CA, United States of America
5
Yuan X, Biswas S. Bivariate logistic Bayesian LASSO for detecting rare haplotype association with two correlated phenotypes. Genet Epidemiol 2019; 43:996-1017. [PMID: 31544985 DOI: 10.1002/gepi.22258] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/12/2019] [Revised: 07/31/2019] [Accepted: 08/09/2019] [Indexed: 11/08/2022]
Abstract
In genetic association studies, joint modeling of related traits/phenotypes can utilize the correlation between them and thereby provide more power and uncover additional information about genetic etiology. Moreover, detecting rare genetic variants is of current scientific interest as a key to missing heritability. Logistic Bayesian LASSO (LBL) has been proposed recently to detect rare haplotype variants using case-control data, that is, a single binary phenotype. As there is currently no haplotype association method that can handle multiple binary phenotypes, we extend LBL to fill this gap. We develop a bivariate model by using a latent variable to induce correlation between the two outcomes. We carry out extensive simulations to investigate the bivariate LBL and compare it with the univariate LBL. The bivariate LBL performs better than or similar to the univariate LBL in most settings. It has the highest gain in power when a haplotype is associated with both traits and it affects at least one trait in a direction opposite to the direction of the correlation between the traits. We analyze two data sets, Genetic Analysis Workshop 19 sequence data on systolic and diastolic blood pressures and a genome-wide association data set on lung cancer and smoking, and detect several associated rare haplotypes.
Affiliation(s)
- Xiaochen Yuan
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
- Swati Biswas
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
6
Papachristou C, Biswas S. Comparison of haplotype-based tests for detecting gene-environment interactions with rare variants. Brief Bioinform 2019; 21:851-862. [PMID: 31329820 DOI: 10.1093/bib/bbz031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/16/2018] [Revised: 02/06/2019] [Accepted: 02/28/2019] [Indexed: 11/13/2022] Open
Abstract
Dissecting the genetic mechanism underlying a complex disease hinges on discovering gene-environment interactions (GXE). However, detecting GXE is a challenging problem especially when the genetic variants under study are rare. Haplotype-based tests have several advantages over the so-called collapsing tests for detecting rare variants as highlighted in recent literature. Thus, it is of practical interest to compare haplotype-based tests for detecting GXE including the recent ones developed specifically for rare haplotypes. We compare the following methods: haplo.glm, hapassoc, HapReg, Bayesian hierarchical generalized linear model (BhGLM) and logistic Bayesian LASSO (LBL). We simulate data under different types of association scenarios and levels of gene-environment dependence. We find that when the type I error rates are controlled to be the same for all methods, LBL is the most powerful method for detecting GXE. We applied the methods to a lung cancer data set, in particular, in region 15q25.1 as it has been suggested in the literature that it interacts with smoking to affect the lung cancer susceptibility and that it is associated with smoking behavior. LBL and BhGLM were able to detect a rare haplotype-smoking interaction in this region. We also analyzed the sequence data from the Dallas Heart Study, a population-based multi-ethnic study. Specifically, we considered haplotype blocks in the gene ANGPTL4 for association with trait serum triglyceride and used ethnicity as a covariate. Only LBL found interactions of haplotypes with race (Hispanic). Thus, in general, LBL seems to be the best method for detecting GXE among the ones we studied here. Nonetheless, it requires the most computation time.
Affiliation(s)
- Swati Biswas
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
7
Lin WY, Huang CC, Liu YL, Tsai SJ, Kuo PH. Genome-Wide Gene-Environment Interaction Analysis Using Set-Based Association Tests. Front Genet 2019; 9:715. [PMID: 30693016 PMCID: PMC6339974 DOI: 10.3389/fgene.2018.00715] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/22/2018] [Accepted: 12/20/2018] [Indexed: 12/22/2022] Open
Abstract
The identification of gene-environment interactions (G × E) may eventually guide health-related choices and medical interventions for complex diseases. More powerful methods must be developed to identify G × E. The “adaptive combination of Bayes factors method” (ADABF) has been proposed as a powerful genome-wide polygenic approach to detect G × E. In this work, we evaluate its performance when serving as a gene-based G × E test. We compare ADABF with six tests including the “Set-Based gene-EnviRonment InterAction test” (SBERIA), “gene-environment set association test” (GESAT), etc. With extensive simulations, SBERIA and ADABF are found to be more powerful than other G × E tests. However, SBERIA suffers from a power loss when 50% of the SNP main effects are in the same direction as the SNP × E interaction effects while 50% are in the opposite direction. We further applied these seven G × E methods to the Taiwan Biobank data to explore gene × alcohol interactions on blood pressure levels. The ADAMTS7P1 gene at chromosome 15q25.2 was detected to interact with alcohol consumption on diastolic blood pressure (p = 9.5 × 10⁻⁷, according to the GESAT test). At this gene, the P-values provided by the other six tests all reached the suggestive significance level (p < 5 × 10⁻⁵). Regarding the computation time required for a genome-wide G × E analysis, SBERIA is the fastest method, followed by ADABF. Considering the validity, power performance, robustness, and computation time, ADABF is recommended for genome-wide G × E analyses.
Affiliation(s)
- Wan-Yu Lin
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan; Department of Public Health, College of Public Health, National Taiwan University, Taipei, Taiwan
- Ching-Chieh Huang
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Yu-Li Liu
  - Center for Neuropsychiatric Research, National Health Research Institutes, Zhunan, Taiwan
- Shih-Jen Tsai
  - Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan; Division of Psychiatry, National Yang-Ming University, Taipei, Taiwan
- Po-Hsiu Kuo
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan; Department of Public Health, College of Public Health, National Taiwan University, Taipei, Taiwan
8
Zhang Y, Hofmann JN, Purdue MP, Lin S, Biswas S. Logistic Bayesian LASSO for genetic association analysis of data from complex sampling designs. J Hum Genet 2017; 62:819-829. [PMID: 28424482 PMCID: PMC5572548 DOI: 10.1038/jhg.2017.43] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 08/11/2016] [Revised: 03/21/2017] [Accepted: 03/22/2017] [Indexed: 01/20/2023]
Abstract
Detecting gene-environment interactions with rare variants is critical in dissecting the etiology of common diseases. Interactions with rare haplotype variants (rHTVs) are of particular interest. At the same time, complex sampling designs, such as stratified random sampling, are becoming increasingly popular for designing case-control studies, especially for recruiting controls. The US Kidney Cancer Study (KCS) is an example, wherein all available cases were included while the controls at each site were randomly selected from the population by frequency matching with cases based on age, sex and race. There is currently no rHTV association method that can account for such a complex sampling design. To fill this gap, we consider logistic Bayesian LASSO (LBL), an existing rHTV approach for case-control data, and show that its model can easily accommodate the complex sampling design. We study two extensions that include stratifying variables either as main effects only or with additional modeling of their interactions with haplotypes. We conduct extensive simulation studies to compare the complex sampling methods with the original LBL methods. We find that, when there is no interaction between haplotype and stratifying variables, both extensions perform well while the original LBL methods lead to inflated type I error rates. However, when such an interaction exists, it is necessary to include the interaction effect in the model to control the type I error rate. Finally, we analyze the KCS data and find a significant interaction between (current) smoking and a specific rHTV in the N-acetyltransferase 2 gene.
Affiliation(s)
- Yuan Zhang
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
- Jonathan N Hofmann
  - Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Mark P Purdue
  - Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Shili Lin
  - Department of Statistics, The Ohio State University, Columbus, OH, USA
- Swati Biswas
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX, USA
9
Zhang Y, Lin S, Biswas S. Detecting rare and common haplotype-environment interaction under uncertainty of gene-environment independence assumption. Biometrics 2016; 73:344-355. [PMID: 27478935 DOI: 10.1111/biom.12567] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/01/2015] [Revised: 05/01/2016] [Accepted: 06/01/2016] [Indexed: 11/28/2022]
Abstract
Finding rare variants and gene-environment interactions (GXE) is critical in dissecting complex diseases. We consider the problem of detecting GXE where G is a rare haplotype and E is a nongenetic factor. Such methods typically assume G-E independence, which may not hold in many applications. A pertinent example is lung cancer: there is evidence that variants on Chromosome 15q25.1 interact with smoking to affect the risk. However, these variants are associated with smoking behavior, rendering the assumption of G-E independence inappropriate. With the motivation of detecting GXE under G-E dependence, we extend an existing approach, logistic Bayesian LASSO, which assumes G-E independence (LBL-GXE-I), by modeling G-E dependence through a multinomial logistic regression (referred to as LBL-GXE-D). Unlike LBL-GXE-I, LBL-GXE-D controls type I error rates in all situations; however, it has reduced power when G-E independence holds. To control type I error without sacrificing power, we further propose a unified approach, LBL-GXE, to incorporate uncertainty in the G-E independence assumption by employing a reversible jump Markov chain Monte Carlo method. Our simulations show that LBL-GXE has power similar to that of LBL-GXE-I when G-E independence holds, yet has well-controlled type I errors in all situations. To illustrate the utility of LBL-GXE, we analyzed a lung cancer dataset and found several significant interactions in the 15q25.1 region, including one between a specific rare haplotype and smoking.
Affiliation(s)
- Yuan Zhang
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75080, U.S.A
- Shili Lin
  - Department of Statistics, The Ohio State University, Columbus, Ohio 43210, U.S.A
- Swati Biswas
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75080, U.S.A
10
Van Poucke S, Thomeer M, Heath J, Vukicevic M. Are Randomized Controlled Trials the (G)old Standard? From Clinical Intelligence to Prescriptive Analytics. J Med Internet Res 2016; 18:e185. [PMID: 27383622 PMCID: PMC4954919 DOI: 10.2196/jmir.5549] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 01/21/2016] [Revised: 06/01/2016] [Accepted: 06/21/2016] [Indexed: 12/11/2022] Open
Abstract
Despite the accelerating pace of scientific discovery, the current clinical research enterprise does not sufficiently address pressing clinical questions. Given the constraints on clinical trials, for a majority of clinical questions, the only relevant data available to aid in decision making are based on observation and experience. Our purpose here is 3-fold. First, we describe the classic context of medical research guided by Popper's scientific epistemology of "falsificationism." Second, we discuss challenges and shortcomings of randomized controlled trials and present the potential of observational studies based on big data. Third, we cover several obstacles related to the use of observational (retrospective) data in clinical studies. We conclude that randomized controlled trials are not at risk of extinction, but innovations in statistics, machine learning, and big data analytics may generate a completely new ecosystem for exploration and validation.
Affiliation(s)
- Sven Van Poucke
  - Department of Anesthesiology, Critical Care, Emergency Medicine, Pain Therapy, Ziekenhuis Oost-Limburg, Genk, Belgium
11
Datta AS, Biswas S. Comparison of haplotype-based statistical tests for disease association with rare and common variants. Brief Bioinform 2015; 17:657-71. [PMID: 26338417 DOI: 10.1093/bib/bbv072] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 05/26/2015] [Indexed: 01/26/2023] Open
Abstract
Recent literature has highlighted the advantages of haplotype association methods for detecting rare variants associated with common diseases. As several new haplotype association methods have been proposed in the past few years, a comparison of new and standard methods is important and timely for guidance to practitioners. We consider nine methods: Haplo.score, Haplo.glm, Hapassoc, Bayesian hierarchical Generalized Linear Model (BhGLM), Logistic Bayesian LASSO (LBL), regularized GLM (rGLM), Haplotype Kernel Association Test, wei-SIMc-matching and Weighted Haplotype and Imputation-based Tests. These can be divided into two types, individual haplotype-specific tests and global tests, depending on whether there is just one overall test for a haplotype region (global) or there is an individual test for each haplotype in the region. Haplo.score is the only method that tests for both; Haplo.glm, Hapassoc, BhGLM and LBL are individual haplotype-specific, while the rest are global tests. For comparison, we also apply a popular collapsing method, the Sequence Kernel Association Test (SKAT), and its two variants, SKAT-O (Optimal) and SKAT-C (Combined). We carry out an extensive comparison on our simulated data sets as well as on the Genetic Analysis Workshop (GAW) 18 simulated data. Further, we apply the methods to GAW18 real hypertension data and Dallas Heart Study sequence data. We find that LBL, Haplo.score (global test) and rGLM perform well over the scenarios considered here. Also, haplotype methods are more powerful (albeit more computationally intensive) than SKAT and its variants in scenarios where multiple causal variants act interactively to produce haplotype effects.