1
Liu Y, Chakraborty N, Qin ZS, Kundu S, The Alzheimer’s Disease Neuroimaging Initiative. Integrative Bayesian tensor regression for imaging genetics applications. Front Neurosci 2023; 17:1212218. [PMID: 37680967 PMCID: PMC10481528 DOI: 10.3389/fnins.2023.1212218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 07/17/2023] [Indexed: 09/09/2023] Open
Abstract
Identifying biomarkers for Alzheimer's disease with the goal of early detection is a fundamental problem in clinical research. Both medical imaging and genetics have contributed informative biomarkers in the literature. To further improve performance, there has recently been increasing interest in developing analytic approaches that combine data across modalities such as imaging and genetics. However, few methods in the literature can systematically combine high-dimensional voxel-level imaging and genetic data for accurate prediction of clinical outcomes of interest. Existing prediction models that integrate imaging and genetic features often use region-level imaging summaries, and they typically do not consider the spatial configurations of the voxels in the image or incorporate the dependence between genes, which may compromise prediction ability. We propose a novel integrative Bayesian scalar-on-image regression model for predicting cognitive outcomes based on high-dimensional spatially distributed voxel-level imaging data, along with correlated transcriptomic features. We account for the spatial dependencies in the imaging voxels via a tensor approach that also enables massive dimension reduction to address the curse of dimensionality, and we model the dependencies between the transcriptomic features via a Graph-Laplacian prior. We implement this approach via an efficient Markov chain Monte Carlo (MCMC) computation strategy. We apply the proposed method to the analysis of longitudinal ADNI data for predicting cognitive scores at different visits by integrating voxel-level cortical thickness measurements derived from T1w-MRI scans and transcriptomics data. We illustrate that the proposed imaging transcriptomics approach yields significant improvements in prediction compared to prediction using a subset of features from only one modality (imaging or genetics), as well as when using imaging and transcriptomics features but ignoring the inherent dependencies between the features. Our analysis is one of the first to conclusively demonstrate the advantages of prediction based on combining voxel-level cortical thickness measurements along with transcriptomics features, while accounting for inherent structural information.
Affiliation(s)
- Yajie Liu
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States
- Nilanjana Chakraborty
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Zhaohui S. Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
- Suprateek Kundu
- Department of Biostatistics, Division of Basic Science Research, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
2
Hossain MI, Rahman A, Uddin MSG, Zinia FA. Double burden of malnutrition among women of reproductive age in Bangladesh: A comparative study of classical and Bayesian logistic regression approach. Food Sci Nutr 2023; 11:1785-1796. [PMID: 37051361 PMCID: PMC10084956 DOI: 10.1002/fsn3.3209] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/20/2022] [Revised: 12/04/2022] [Accepted: 12/19/2022] [Indexed: 01/08/2023] Open
Abstract
Although the prevalence of undernutrition among women of reproductive age has declined in Bangladesh, the increase in the prevalence of overnutrition remains a major challenge. To achieve Sustainable Development Goal 2.2, it is important to identify the drivers of the double burden of malnutrition on women in Bangladesh. The Bangladesh Demographic and Health Survey, 2017-2018 was used to model the relationship between the double burden of malnutrition among women and the risk factors using a logistic regression model under the classical and Bayesian frameworks, and the regression models were compared on the basis of the narrowest confidence interval. Regarding the Bayesian application, the Metropolis-Hastings algorithm with two types of prior information (historical and noninformative prior) was used to simulate parameter estimates from the posterior distributions. The Boruta algorithm was used to determine the significant predictors. Almost half of reproductive-aged women experienced a form of malnutrition (12% were underweight, 26.1% were overweight, and 6.8% were obese). In terms of the narrowest interval estimate, it was found that Bayesian logistic regression with informative priors performs better than both Bayesian logistic regression with noninformative priors and the classical logistic regression model. Women who were older, highly educated, from rich families, unemployed, and from urban residences were more likely to experience the double burden of malnutrition. This study recommended using the historical prior as the informative prior, rather than the flat/noninformative prior, when estimating parameter uncertainty if historical data are available. The double burden of malnutrition among women is a major public health challenge in Bangladesh. This study aimed to determine the impact of effective risk factors on the double burden of malnutrition among women by applying the Bayesian framework. Using both informative and noninformative priors, the "historical prior" was proposed as informative prior information. The main strength is that the proposed prior (historical prior) provided improved estimation as compared to the flat prior distribution.
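The comparison described in this abstract (Metropolis-Hastings sampling under an informative versus a near-flat prior, judged by interval width) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the study: the simulated data, the single-slope model without intercept, the prior means and standard deviations, and the function names `log_posterior`, `metropolis_hastings`, and `interval_width`.

```python
import math
import random


def log_posterior(beta, xs, ys, prior_mean, prior_sd):
    """Log posterior (up to a constant) for logit P(y=1) = beta * x with a normal prior."""
    log_lik = 0.0
    for x, y in zip(xs, ys):
        eta = beta * x
        # Bernoulli log-likelihood; log(1 + exp(eta)) in a numerically stable form
        log_lik += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    log_prior = -0.5 * ((beta - prior_mean) / prior_sd) ** 2
    return log_lik + log_prior


def metropolis_hastings(xs, ys, prior_mean, prior_sd,
                        n_iter=6000, burn_in=1000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings for the slope; returns post-burn-in draws."""
    rng = random.Random(seed)
    beta = 0.0
    current = log_posterior(beta, xs, ys, prior_mean, prior_sd)
    draws = []
    for i in range(n_iter):
        proposal = beta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, xs, ys, prior_mean, prior_sd)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if i >= burn_in:
            draws.append(beta)
    return draws


def interval_width(draws, level=0.95):
    """Width of the central credible interval from sorted posterior draws."""
    s = sorted(draws)
    lower = s[int((1 - level) / 2 * len(s))]
    upper = s[int((1 + level) / 2 * len(s)) - 1]
    return upper - lower


# Simulated data (hypothetical): 30 observations, true slope 1.0
data_rng = random.Random(1)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(30)]
ys = [1 if data_rng.random() < 1 / (1 + math.exp(-x)) else 0 for x in xs]

# Near-flat prior versus a hypothetical informative prior centred near the truth
flat_draws = metropolis_hastings(xs, ys, prior_mean=0.0, prior_sd=100.0)
informative_draws = metropolis_hastings(xs, ys, prior_mean=1.0, prior_sd=0.2)

width_flat = interval_width(flat_draws)
width_informative = interval_width(informative_draws)
print(f"95% interval width, flat prior:        {width_flat:.2f}")
print(f"95% interval width, informative prior: {width_informative:.2f}")
```

A narrower interval under the informative prior reflects the extra information that prior carries. It is not evidence that the prior is correct, so in practice the source of the historical information still has to be justified.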
Affiliation(s)
- Azizur Rahman
- Department of Statistics, Jahangirnagar University, Savar, Dhaka, Bangladesh
3
Fornacon-Wood I, Mistry H, Johnson-Hart C, Faivre-Finn C, O'Connor JPB, Price GJ. Bayesian methods provide a practical real-world evidence framework for evaluating the impact of changes in radiotherapy. Radiother Oncol 2022; 176:53-58. [PMID: 36184998 DOI: 10.1016/j.radonc.2022.09.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2021] [Revised: 08/11/2022] [Accepted: 09/12/2022] [Indexed: 12/14/2022]
Abstract
PURPOSE Retrospective studies have identified a link between the average set-up error of lung cancer patients treated with image-guided radiotherapy (IGRT) and survival. The IGRT protocol was subsequently changed to reduce the action threshold. In this study, we use a Bayesian approach to evaluate the clinical impact of this change to practice using routine 'real-world' patient data. METHODS AND MATERIALS Two cohorts of NSCLC patients treated with IGRT were compared: pre-protocol change (N = 780, 5 mm action threshold) and post-protocol change (N = 411, 2 mm action threshold). Survival models were fitted to each cohort and changes in the hazard ratios (HR) associated with residual set-up errors were assessed. The influence of using an uninformative and a skeptical prior in the model was investigated. RESULTS Following the reduction of the action threshold, the HR for residual set-up error towards the heart was reduced by up to 10%. Median patient survival increased for patients with set-up errors towards the heart, and remained similar for patients with set-up errors away from the heart. Depending on the prior used, a residual hazard ratio may remain. CONCLUSIONS Our analysis found a reduced hazard of death and increased survival for patients with residual set-up errors towards versus away from the heart post-protocol change. This study demonstrates the value of a Bayesian approach in the assessment of technical changes in radiotherapy practice and supports the consideration of adopting this approach in further prospective evaluations of changes to clinical practice.
Affiliation(s)
- Hitesh Mistry
- Division of Cancer Sciences, University of Manchester, Manchester, UK
- Corinne Johnson-Hart
- Department of Medical Physics, The Christie Hospital NHS Foundation Trust, Manchester, UK
- Corinne Faivre-Finn
- Division of Cancer Sciences, University of Manchester, Manchester, UK; Department of Clinical Oncology, The Christie Hospital NHS Foundation Trust, Manchester, UK
- James P B O'Connor
- Division of Cancer Sciences, University of Manchester, Manchester, UK; Department of Diagnostic Radiology, The Christie Hospital NHS Foundation Trust, Manchester, UK
- Gareth J Price
- Division of Cancer Sciences, University of Manchester, Manchester, UK
4
Ou D, Wu Y. The prognostic and clinical significance of IFI44L aberrant downregulation in patients with oral squamous cell carcinoma. BMC Cancer 2021; 21:1327. [PMID: 34903206 PMCID: PMC8667451 DOI: 10.1186/s12885-021-09058-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/18/2021] [Accepted: 11/22/2021] [Indexed: 12/16/2022] Open
Abstract
Background It is a basic task in high-throughput gene expression profiling studies to identify differentially expressed genes (DEGs) between two phenotypes. The RankComp algorithm can analyze the highly stable within-sample relative expression orderings (REOs) of gene pairs in a particular type of human normal tissue that are widely reversed in the cancer condition, thereby detecting DEGs for individual disease samples measured by a particular platform. Methods In the present study, the Gene Expression Omnibus (GEO) Series (GSE) datasets GSE75540 and GSE138206 were downloaded from GEO. DEGs in oral squamous cell carcinoma were analyzed in these online datasets using the RankComp algorithm; Kaplan-Meier survival analysis and Cox regression analysis were used for survival analysis, and Gene Set Enrichment Analysis (GSEA) was used to explore the potential underlying molecular mechanisms. Results We identified 6 reverse gene pairs with stable REOs. All the 12 genes in these 6 reverse gene pairs have been reported to be associated with cancers. Notably, lower Interferon Induced Protein 44 Like (IFI44L) expression was associated with poorer overall survival (OS) and disease-free survival (DFS) in oral squamous cell carcinoma patients, and IFI44L expression showed satisfactory predictive efficiency by receiver operating characteristic (ROC) curve. Moreover, low IFI44L expression was identified as a risk factor for oral squamous cell carcinoma patients’ OS. IFI44L downregulation would lead to the activation of the FRS-mediated FGFR1, FGFR3, and downstream signaling pathways, and might play a role in the PI3K-FGFR cascades. Conclusions Collectively, we identified 6 reverse gene pairs with stable REOs in oral squamous cell carcinoma, which might serve as gene signatures playing a role in the diagnosis of oral squamous cell carcinoma. Moreover, high expression of IFI44L, one of the DEGs in the 6 reverse gene pairs, might be associated with favorable prognosis in oral squamous cell carcinoma patients and serve as a tumor suppressor by acting on the FRS-mediated FGFR signaling. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-09058-y.
Affiliation(s)
- Deming Ou
- Department of Stomatology, Panyu Central Hospital, Guangzhou, 511400, China
- Ying Wu
- Department of Stomatology, Foshan Hospital of Traditional Chinese Medicine, Foshan, 528000, China
5
Tercan B, Acar AC. The Use of Informed Priors in Biclustering of Gene Expression with the Hierarchical Dirichlet Process. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1810-1821. [PMID: 30835228 DOI: 10.1109/tcbb.2019.2901676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
We motivate and describe the application of Hierarchical Dirichlet Process (HDP) models to the "soft" biclustering of gene expression data, in which we obtain modules (biclusters) where the affiliations of genes and samples with the modules are weighted, instead of being hard memberships. As a distinct contribution, we propose a method in which the HDP is informed with prior beliefs, significantly increasing the quality of the biclustering in terms of both the correctness of the number of modules inferred and the precision of these modules, especially when evidence is sparse. We outline two such informed priors: one based on co-expression relationships inherent in the data, the other based on an externally provided regulatory network. We validate these results and compare the performance of our approach to Weighted Gene Correlation Network Analysis (WGCNA), another model that features weighted modules. We have, to this end, performed experiments on semi-synthetic data. The results show that HDP, with the addition of a well-informed prior, is able to capture the correct number of modules with increased accuracy. Furthermore, the model becomes robust to changes in the strength of the prior. We conclude by discussing these results and the benefits provided by our approach for gene expression analysis and network validation.
6
Chang C, Jang JH, Manatunga A, Taylor AT, Long Q. A Bayesian Latent Class Model to Predict Kidney Obstruction in the Absence of Gold Standard. J Am Stat Assoc 2020; 115:1645-1663. [PMID: 34113054 DOI: 10.1080/01621459.2019.1689983] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
Abstract
Kidney obstruction, if untreated in a timely manner, can lead to irreversible loss of renal function. A widely used technology for evaluations of kidneys with suspected obstruction is diuresis renography. However, it is generally very challenging for radiologists who typically interpret renography data in practice to build a high level of competency due to the low volume of renography studies and insufficient training. Another challenge is that there is currently no gold standard for detection of kidney obstruction. Seeking to develop a computer-aided diagnostic (CAD) tool that can assist practicing radiologists to reduce errors in the interpretation of kidney obstruction, a recent study collected data from diuresis renography, interpretations of the renography data from highly experienced nuclear medicine experts, and clinical data. To achieve the objective, we develop a statistical model that can be used as a CAD tool for assisting radiologists in kidney interpretation. We use a Bayesian latent class modeling approach for predicting kidney obstruction through the integrative analysis of time-series renogram data, expert ratings, and clinical variables. A nonparametric Bayesian latent factor regression approach is adopted for modeling renogram curves in which the coefficients of the basis functions are parameterized via the factor loadings dependent on the latent disease status and the extended latent factors that can also adjust for clinical variables. A hierarchical probit model is used for expert ratings, allowing for training with rating data from multiple experts while predicting with at most one expert, which makes the proposed model operable in practice. An efficient MCMC algorithm is developed to train the model and predict kidney obstruction with associated uncertainty. We demonstrate the superiority of the proposed method over several existing methods through extensive simulations. Analysis of the renal study also lends support to the usefulness of our model as a CAD tool to assist less experienced radiologists in the field.
Affiliation(s)
- Changgee Chang
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania
- Jeong Hoon Jang
- Department of Biostatistics and Bioinformatics, Emory University
- Amita Manatunga
- Department of Biostatistics and Bioinformatics, Emory University
- Andrew T Taylor
- Department of Radiology and Imaging Sciences, Emory University
- Qi Long
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania
7
La Gamba F, Jacobs T, Geys H, Jaki T, Serroyen J, Ursino M, Russu A, Faes C. Bayesian sequential integration within a preclinical pharmacokinetic and pharmacodynamic modeling framework: Lessons learned. Pharm Stat 2019; 18:486-506. [PMID: 30932327 DOI: 10.1002/pst.1941] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/19/2018] [Revised: 11/09/2018] [Accepted: 02/02/2019] [Indexed: 12/25/2022]
Abstract
The present manuscript aims to discuss the implications of sequential knowledge integration of small preclinical trials in a Bayesian pharmacokinetic and pharmacodynamic (PK-PD) framework. While, at first sight, a Bayesian PK-PD framework seems to be a natural framework to allow for sequential knowledge integration, the scope of this paper is to highlight some often-overlooked challenges while at the same time providing some guidance on the many and overwhelming choices that need to be made. Challenges as well as opportunities will be discussed that are related to the impact of (1) the prior specification, (2) the choice of random effects, and (3) the type of sequential integration method. In addition, it will be shown how the success of a sequential integration strategy is highly dependent on a carefully chosen experimental design when small trials are analyzed.
Affiliation(s)
- Fabiola La Gamba
- Department of Quantitative Sciences, Janssen Research & Development, a Division of Janssen Pharmaceutica NV, Beerse, Belgium; Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium
- Tom Jacobs
- Department of Quantitative Sciences, Janssen Research & Development, a Division of Janssen Pharmaceutica NV, Beerse, Belgium
- Helena Geys
- Department of Quantitative Sciences, Janssen Research & Development, a Division of Janssen Pharmaceutica NV, Beerse, Belgium; Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium
- Thomas Jaki
- Department of Mathematics and Statistics, Lancaster University, Lancaster, England
- Jan Serroyen
- Department of Quantitative Sciences, Janssen Research & Development, a Division of Janssen Pharmaceutica NV, Beerse, Belgium
- Moreno Ursino
- Centre de Recherche des Cordeliers, INSERM, Sorbonne Université, USPC, Université Paris Descartes, Université Paris Diderot, Paris, France
- Alberto Russu
- Department of Quantitative Sciences, Janssen Research & Development, a Division of Janssen Pharmaceutica NV, Beerse, Belgium
- Christel Faes
- Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium
8
Ma C, Ji T. Detecting differentially expressed genes for syndromes by considering change in mean and dispersion simultaneously. BMC Bioinformatics 2018; 19:330. [PMID: 30236056 PMCID: PMC6148965 DOI: 10.1186/s12859-018-2354-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/02/2018] [Accepted: 08/30/2018] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Using next-generation sequencing technology to measure gene expression, an empirically intriguing question concerns the identification of differentially expressed genes across treatment groups. Existing methods aim to identify genes whose mean expressions differ among treatment groups by assuming equal dispersion across all groups. For syndromes, however, various combinations of gene expression alterations can result in the same disease, leading to greater heteroscedasticity in the biological replicates in the disease group compared to the normal group. Traditional methods that only consider changes in the mean will fail to fully analyze gene expression in such a scenario. In addition, sequencing technology is relatively expensive; most labs can only afford a few replicates per treatment group, which poses further challenges to reliably estimating the mean and dispersion under each treatment condition. RESULTS We designed an empirical Bayes method and a pooled permutation test to simultaneously consider the change in mean and dispersion across treatment groups. We further computed confidence intervals based on Bayes estimates to identify differentially expressed genes that are unique to each disease sample as well as those that are common across all disease samples. We illustrated our method by applying it to gene expression data from a large offspring syndrome experiment, which motivated this study. We compared our method to competing approaches through simulation studies that mimicked the real datasets to demonstrate the effectiveness of our proposed method. CONCLUSIONS We will show that, compared to popular methods that only aim to find the difference in the mean, our method can capture greater variation in the disease group to effectively identify differentially expressed genes for syndromes.
Affiliation(s)
- Chenchen Ma
- Department of Statistics, University of Missouri at Columbia, Columbia, 65211, MO, USA
- Tieming Ji
- Department of Statistics, University of Missouri at Columbia, Columbia, 65211, MO, USA
9
Cai H, Li X, Li J, Liang Q, Zheng W, Guan Q, Guo Z, Wang X. Identifying differentially expressed genes from cross-site integrated data based on relative expression orderings. Int J Biol Sci 2018; 14:892-900. [PMID: 29989020 PMCID: PMC6036750 DOI: 10.7150/ijbs.24548] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 12/25/2017] [Accepted: 02/02/2018] [Indexed: 12/13/2022] Open
Abstract
It is a basic task in high-throughput gene expression profiling studies to identify differentially expressed genes (DEGs) between two phenotypes. However, weak differential expression signals between two phenotypes are hard to detect with limited sample sizes. To solve this problem, many researchers have tried to combine multiple independent datasets using meta-analysis or batch effect adjustment algorithms. However, these algorithms may distort true biological differences between two phenotypes and introduce unacceptably high false rates, as demonstrated in this study. These problems pose critical obstacles for analyzing the transcriptional data in The Cancer Genome Atlas, where there are many small-scale batches of data. Previously, we developed RankComp to detect DEGs for individual disease samples by exploiting the incongruous relative expression orderings between two phenotypes, and here we further improved it to identify DEGs using multiple independent datasets. We demonstrated that the improved RankComp can directly analyze integrated cross-site data to detect DEGs between two phenotypes without the need for batch effect adjustments. Its usage was illustrated in detecting weak differential expression signals of breast cancer drug-response data using combined datasets from multiple experiments.
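The idea of relative expression orderings mentioned in this abstract can be sketched as follows. This is only the pair-ordering step under simplifying assumptions, not the RankComp algorithm itself, which involves further statistical testing. The gene names, expression values, stability threshold, and the helper names `stable_pairs` and `reversed_pairs` are all hypothetical.

```python
from itertools import combinations


def stable_pairs(normal_samples, threshold=0.95):
    """Return ordered pairs (higher, lower) whose ordering holds in at least
    `threshold` of the normal samples."""
    genes = sorted(normal_samples[0])
    n = len(normal_samples)
    pairs = []
    for a, b in combinations(genes, 2):
        a_higher = sum(1 for s in normal_samples if s[a] > s[b])
        b_higher = sum(1 for s in normal_samples if s[b] > s[a])
        if a_higher / n >= threshold:
            pairs.append((a, b))
        elif b_higher / n >= threshold:
            pairs.append((b, a))
    return pairs


def reversed_pairs(sample, pairs):
    """Return the stable pairs whose ordering is flipped in a single sample."""
    return [(hi, lo) for hi, lo in pairs if sample[hi] < sample[lo]]


# Toy expression profiles (hypothetical values), one dict per sample
normal = [
    {"G1": 5.0, "G2": 3.0, "G3": 1.0},
    {"G1": 6.0, "G2": 2.0, "G3": 1.5},
    {"G1": 4.5, "G2": 3.5, "G3": 0.5},
]
disease_sample = {"G1": 2.0, "G2": 4.0, "G3": 1.0}

pairs = stable_pairs(normal)
flipped = reversed_pairs(disease_sample, pairs)
print(pairs)    # [('G1', 'G2'), ('G1', 'G3'), ('G2', 'G3')]
print(flipped)  # [('G1', 'G2')]
```

Because only within-sample orderings are compared, a per-sample shift or scaling that preserves ranks leaves the result unchanged, which is the intuition behind analyzing cross-site data without batch effect adjustment.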
Affiliation(s)
- Hao Cai
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
- Xiangyu Li
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
- Jing Li
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
- Qirui Liang
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
- Weicheng Zheng
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China; Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, China
- Qingzhou Guan
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
- Zheng Guo
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China; Fujian Key Laboratory of Tumor Microbiology, Fujian Medical University, Fuzhou 350122, China; Department of Systems Biology, College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150086, China
- Xianlong Wang
- Fujian Key Laboratory of Medical Bioinformatics, Key Laboratory of Ministry of Education for Gastrointestinal Cancer, Fujian Medical University, Fuzhou, 350122, China
10
Li B, Li Y, Qin ZS. Improving Hierarchical Models Using Historical Data with Applications in High-Throughput Genomics Data Analysis. Stat Biosci 2017; 9:73-90. [PMID: 28919931 DOI: 10.1007/s12561-016-9156-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
Modern high-throughput biotechnologies such as microarray and next generation sequencing produce a massive amount of information for each sample assayed. However, in a typical high-throughput experiment, only a limited amount of data is observed for each individual feature, giving rise to the classical 'large p, small n' problem. The Bayesian hierarchical model, capable of borrowing strength across features within the same dataset, has been recognized as an effective tool for analyzing such data. However, the shrinkage effect, the most prominent feature of hierarchical models, can lead to undesirable over-correction for some features. In this work, we discuss possible causes of the over-correction problem and propose several alternative solutions. Our strategy is rooted in the fact that in the Big Data era, large amounts of historical data are available and should be taken advantage of. Our strategy presents a new framework to enhance the Bayesian hierarchical model. Through simulation and real data analysis, we demonstrated superior performance of the proposed strategy. Our new strategy also enables borrowing information across different platforms, which could be extremely useful with the emergence of new technologies and the accumulation of data from different platforms in the Big Data era. Our method has been implemented in the R package "adaptiveHM", which is freely available from https://github.com/benliemory/adaptiveHM.
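The shrinkage effect and the over-correction it can cause are easiest to see in the textbook normal-normal hierarchical model, sketched below. This is a generic illustration and not the adaptiveHM method. The function name `shrunk_mean` and all numeric values are assumptions chosen for the example.

```python
def shrunk_mean(feature_mean, n, sigma2, prior_mean, tau2):
    """Posterior mean of a feature's true mean under a normal-normal model.

    feature_mean: observed average of n replicates with sampling variance sigma2
    prior_mean, tau2: mean and variance of the across-feature (prior) distribution
    """
    data_precision = n / sigma2
    prior_precision = 1.0 / tau2
    weight = data_precision / (data_precision + prior_precision)
    # Weighted average of the feature's own data and the shared prior mean
    return weight * feature_mean + (1.0 - weight) * prior_mean


# Hypothetical feature: observed mean 6.0 from only 2 noisy replicates,
# while most features cluster around 0 with unit variance
estimate = shrunk_mean(feature_mean=6.0, n=2, sigma2=4.0, prior_mean=0.0, tau2=1.0)
print(round(estimate, 3))  # 2.0
```

With so few replicates the data receive a weight of only one third, so an observed mean of 6.0 is pulled down to 2.0. If the feature is genuinely extreme, that pull is the kind of over-correction the abstract refers to, and more replicates or a better-informed prior would reduce it.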
Affiliation(s)
- Ben Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Yunxiao Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Zhaohui S Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA 30322, USA