1
Hai Y, Ma J, Yang K, Wen Y. Bayesian linear mixed model with multiple random effects for prediction analysis on high-dimensional multi-omics data. Bioinformatics 2023; 39:btad647. [PMID: 37882747 PMCID: PMC10627352 DOI: 10.1093/bioinformatics/btad647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 09/24/2023] [Accepted: 10/24/2023] [Indexed: 10/27/2023] Open
Abstract
MOTIVATION Accurate disease risk prediction is an essential step in the modern quest for precision medicine. While high-dimensional multi-omics data have provided unprecedented data resources for prediction studies, their high dimensionality and complex inter- and intra-relationships have posed significant analytical challenges. RESULTS We proposed a two-step Bayesian linear mixed model framework (TBLMM) for risk prediction analysis on multi-omics data. TBLMM models the predictive effects from multi-omics data using a hybrid of sparsity regression and a linear mixed model with multiple random effects. It can resemble the shape of the true effect size distributions and account for non-linear effects, including interactions, among multi-omics data via kernel fusion. It infers its parameters via a computationally efficient variational Bayes algorithm. Through extensive simulation studies and prediction analyses of positron emission tomography imaging outcomes using data obtained from the Alzheimer's Disease Neuroimaging Initiative, we have demonstrated that TBLMM can consistently outperform the existing method in predicting the risk of complex traits. AVAILABILITY AND IMPLEMENTATION The corresponding R package is available on GitHub (https://github.com/YaluWen/TBLMM).
Affiliation(s)
- Yang Hai
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
  - Department of Statistics, University of Auckland, Auckland 1010, New Zealand
- Jixiang Ma
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Kaixin Yang
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Yalu Wen
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
  - Department of Statistics, University of Auckland, Auckland 1010, New Zealand
2
Yin W, Zhao SD, Liang F. Bayesian penalized Buckley-James method for high dimensional bivariate censored regression models. LIFETIME DATA ANALYSIS 2022; 28:282-318. [PMID: 35239126 DOI: 10.1007/s10985-022-09549-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2020] [Accepted: 01/22/2022] [Indexed: 06/14/2023]
Abstract
For high dimensional gene expression data, one important goal is to identify a small number of genes that are associated with progression of the disease or survival of the patients. In this paper, we consider the problem of variable selection for multivariate survival data. We propose an estimation procedure for high dimensional accelerated failure time (AFT) models with bivariate censored data. The method extends the Buckley-James method by minimizing a penalized [Formula: see text] loss function with a penalty function induced from a bivariate spike-and-slab prior specification. In the proposed algorithm, censored observations are imputed using the Kaplan-Meier estimator, which avoids a parametric assumption on the error terms. Our empirical studies demonstrate that the proposed method provides better performance compared to the alternative procedures designed for univariate survival data regardless of whether the true events are correlated or not, and conceptualizes a formal way of handling bivariate survival data for AFT models. Findings from the analysis of a myeloma clinical trial using the proposed method are also presented.
Affiliation(s)
- Wenjing Yin
  - Department of Statistics, University of Illinois, Urbana-Champaign, Champaign, IL, USA
- Sihai Dave Zhao
  - Department of Statistics, University of Illinois, Urbana-Champaign, Champaign, IL, USA
- Feng Liang
  - Department of Statistics, University of Illinois, Urbana-Champaign, Champaign, IL, USA
3
Yan D, Cai S, Bai L, Du Z, Li H, Sun P, Cao J, Yi N, Liu SB, Tang Z. Integration of immune and hypoxia gene signatures improves the prediction of radiosensitivity in breast cancer. Am J Cancer Res 2022; 12:1222-1240. [PMID: 35411250 PMCID: PMC8984882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2021] [Accepted: 02/22/2022] [Indexed: 06/14/2023] Open
Abstract
Immunity and hypoxia are two important factors that affect the response of cancer patients to radiotherapy. Considering the limited predictive value of a single predictive model and the uncertainty of grouping patients near the cutoff value, we developed and validated a combined model based on immune- and hypoxia-related gene expression profiles to predict the radiosensitivity of breast cancer patients. This study was based on breast cancer data from The Cancer Genome Atlas (TCGA). Spike-and-slab Lasso regression analysis was performed to select three immune-related genes and develop a radiosensitivity model. Lasso Cox regression modeling selected 11 hypoxia-related genes for the development of a radiosensitivity model. Three independent datasets (Molecular Taxonomy of Breast Cancer International Consortium [METABRIC], E-TABM-158, GSE103746) were used to validate the predictive value of the radiosensitivity signatures. In the TCGA dataset, the 10-year survival probabilities of the immune radioresistant (IRR) and hypoxia radioresistant (HRR) groups were 0.189 (0.037, 0.973) and 0.477 (0.293, 0.776), respectively. The 10-year survival probabilities of the immune radiosensitive (IRS) and hypoxia radiosensitive (HRS) groups were 0.778 (0.676, 0.895) and 0.824 (0.723, 0.939), respectively. Based on these two gene signatures, we further constructed a combined model and divided all patients into three groups (IRS/HRS, mixed, IRR/HRR). We identified the IRS/HRS patients as those most likely to benefit from radiotherapy; their 10-year survival probability was 0.886 (0.806, 0.976). The 10-year survival probability of the IRR/HRR group was 0. In conclusion, a combined model integrating immune- and hypoxia-related gene signatures could effectively predict the radiosensitivity of breast cancer and more accurately identify radiosensitive and radioresistant patients than a single model.
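The three-group split described in this abstract can be illustrated with a minimal Python sketch. It assumes each patient has already been labelled radiosensitive or radioresistant by each of the two signatures; the abstract does not give the cutoffs or risk-score formulas, so the boolean inputs and the function name `combined_group` are hypothetical and this is not the authors' implementation.

```python
from typing import Literal

Group = Literal["IRS/HRS", "mixed", "IRR/HRR"]


def combined_group(immune_sensitive: bool, hypoxia_sensitive: bool) -> Group:
    """Combine two signature-level labels into one of three groups.

    Both inputs are assumed to be precomputed by the immune and hypoxia
    signatures (hypothetical inputs; the cutoffs are not in the abstract).
    """
    if immune_sensitive and hypoxia_sensitive:
        return "IRS/HRS"   # radiosensitive under both signatures
    if not immune_sensitive and not hypoxia_sensitive:
        return "IRR/HRR"   # radioresistant under both signatures
    return "mixed"         # the two signatures disagree


if __name__ == "__main__":
    for labels in [(True, True), (True, False), (False, False)]:
        print(labels, combined_group(*labels))
```

Patients on whom the two signatures disagree fall into the "mixed" group, which is one plausible reading of how the combined model separates uncertain cases from the two concordant groups.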
Affiliation(s)
- Derui Yan
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, Jiangsu, China
  - Suzhou Key Laboratory of Medical Biotechnology, Suzhou Vocational Health College, Suzhou 215009, Jiangsu, China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou 215123, Jiangsu, China
- Shang Cai
  - Department of Radiotherapy & Oncology, The Second Affiliated Hospital of Soochow University, Suzhou 215004, Jiangsu, China
- Lu Bai
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, Jiangsu, China
  - Suzhou Key Laboratory of Medical Biotechnology, Suzhou Vocational Health College, Suzhou 215009, Jiangsu, China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou 215123, Jiangsu, China
- Zixuan Du
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, Jiangsu, China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou 215123, Jiangsu, China
- Huijun Li
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, Jiangsu, China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou 215123, Jiangsu, China
- Peng Sun
  - Department of Otolaryngology, The First Affiliated Hospital of Soochow University, Suzhou 215006, Jiangsu, China
- Jianping Cao
  - School of Radiation Medicine and Protection and Collaborative Innovation Center of Radiation Medicine of Jiangsu Higher Education Institutions, Soochow University, Suzhou 215031, Jiangsu, China
- Nengjun Yi
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Song-Bai Liu
  - Suzhou Key Laboratory of Medical Biotechnology, Suzhou Vocational Health College, Suzhou 215009, Jiangsu, China
- Zaixiang Tang
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou 215123, Jiangsu, China
  - Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Medical College of Soochow University, Suzhou 215123, Jiangsu, China
4
Chu J, Sun NA, Hu W, Chen X, Yi N, Shen Y. The Application of Bayesian Methods in Cancer Prognosis and Prediction. Cancer Genomics Proteomics 2022; 19:1-11. [PMID: 34949654 PMCID: PMC8717957 DOI: 10.21873/cgp.20298] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/18/2021] [Revised: 11/24/2021] [Accepted: 11/30/2021] [Indexed: 11/10/2022] Open
Abstract
With the development of high-throughput biological techniques, high-dimensional omics data have emerged. These molecular data provide a solid foundation for precision medicine and prognostic prediction of cancer. Bayesian methods help construct prognostic models that capture complex relationships in omics data and improve performance by introducing different prior distributions, which makes them suitable for modelling the high-dimensional data involved. Using different omics, several Bayesian hierarchical approaches have been proposed for variable selection and model construction. In particular, Bayesian methods for multi-omics integration have been proposed steadily in recent years. Compared with single-omics modelling, multi-omics integration modelling can improve predictive performance, give insight into the underlying mechanisms of tumour occurrence and development, and support the discovery of more reliable biomarkers. In this work, we present a review of currently proposed Bayesian approaches to prognostic prediction modelling in cancer.
Affiliation(s)
- Jiadong Chu
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- N A Sun
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Wei Hu
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Xuanli Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Nengjun Yi
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, U.S.A.
- Yueping Shen
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
5
Xiong J, He W. Identification of survival relevant genes with measurement error in gene expression incorporated. COMMUN STAT-THEOR M 2021. [DOI: 10.1080/03610926.2021.2004424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Juan Xiong
  - Health Science Center, Shenzhen University, Shenzhen, Guangdong, P. R. China
- Wenqing He
  - University of Western Ontario, London, Ontario, Canada
6
Ojavee SE, Kousathanas A, Trejo Banos D, Orliac EJ, Patxot M, Läll K, Mägi R, Fischer K, Kutalik Z, Robinson MR. Genomic architecture and prediction of censored time-to-event phenotypes with a Bayesian genome-wide analysis. Nat Commun 2021; 12:2337. [PMID: 33879782 PMCID: PMC8058085 DOI: 10.1038/s41467-021-22538-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/03/2020] [Accepted: 03/17/2021] [Indexed: 01/18/2023] Open
Abstract
While recent advancements in computation and modelling have improved the analysis of complex traits, our understanding of the genetic basis of the time at symptom onset remains limited. Here, we develop a Bayesian approach (BayesW) that provides probabilistic inference of the genetic architecture of age-at-onset phenotypes in a sampling scheme that facilitates biobank-scale time-to-event analyses. We show in extensive simulation work the benefits BayesW provides in terms of number of discoveries, model performance and genomic prediction. In the UK Biobank, we find many thousands of common genomic regions underlying the age-at-onset of high blood pressure (HBP), cardiac disease (CAD), and type-2 diabetes (T2D), and for the genetic basis of onset reflecting the underlying genetic liability to disease. Age-at-menopause and age-at-menarche are also highly polygenic, but with higher variance contributed by low frequency variants. Genomic prediction into the Estonian Biobank data shows that BayesW gives higher prediction accuracy than other approaches.
Affiliation(s)
- Sven E Ojavee
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Daniel Trejo Banos
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Etienne J Orliac
  - Scientific Computing and Research Support Unit, University of Lausanne, Lausanne, Switzerland
- Marion Patxot
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Kristi Läll
  - Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Reedik Mägi
  - Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Krista Fischer
  - Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
  - Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia
- Zoltan Kutalik
  - University Center for Primary Care and Public Health, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
7
Hernaez M, Blatti C, Gevaert O. Comparison of single and module-based methods for modeling gene regulatory networks. Bioinformatics 2020; 36:558-567. [PMID: 31287491 DOI: 10.1093/bioinformatics/btz549] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/05/2018] [Revised: 06/11/2019] [Accepted: 07/06/2019] [Indexed: 01/02/2023] Open
Abstract
MOTIVATION Gene regulatory networks describe the regulatory relationships among genes, and developing methods for reverse engineering these networks is an ongoing challenge in computational biology. The majority of the initially proposed methods for gene regulatory network discovery create a network of genes and then mine it in order to uncover previously unknown regulatory processes. More recent approaches have focused on inferring modules of co-regulated genes, linking these modules with regulatory genes and then mining them to discover new molecular biology. RESULTS In this work we analyze module-based network approaches to build gene regulatory networks, and compare their performance to single gene network approaches. In the process, we propose a novel approach to estimate gene regulatory networks drawing from the module-based methods. We show that generating modules of co-expressed genes which are predicted by a sparse set of regulators using a variational Bayes method, and then building a bipartite graph on the generated modules using sparse regression, yields more informative networks than previous single and module-based network approaches as measured by: (i) the rate of enriched gene sets, (ii) a network topology assessment, (iii) ChIP-Seq evidence and (iv) the KnowEnG Knowledge Network collection of previously characterized gene-gene interactions. AVAILABILITY AND IMPLEMENTATION The code is written in R and can be downloaded from https://github.com/mikelhernaez/linker. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mikel Hernaez
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Charles Blatti
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Olivier Gevaert
  - The Stanford Center of Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
8
Teng J, Abdygametova A, Du J, Ma B, Zhou R, Shyr Y, Ye F. Bayesian Inference of Lymph Node Ratio Estimation and Survival Prognosis for Breast Cancer Patients. IEEE J Biomed Health Inform 2019; 24:354-364. [PMID: 31562112 DOI: 10.1109/jbhi.2019.2943401] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/24/2022]
Abstract
OBJECTIVE We evaluated the prognostic value of lymph node ratio (LNR) for the survival of breast cancer patients using Bayesian inference. METHODS Data on 5,279 women with infiltrating duct and lobular carcinoma breast cancer, diagnosed from 2006 to 2010, were obtained from the NCI SEER Cancer Registry. A prognostic modeling framework was proposed using Bayesian inference to estimate the impact of LNR on breast cancer survival. Based on the proposed model, we then developed a web application for estimating LNR and predicting overall survival. RESULTS The final survival model with LNR outperformed the other models considered (C-statistic 0.71). Compared to directly measured LNR, estimated LNR slightly increased the accuracy of the prognostic model. Model diagnostics and predictive performance confirmed the effectiveness of Bayesian modeling and the prognostic value of the LNR in predicting breast cancer survival. CONCLUSION The estimated LNR was found to have a significant predictive value for the overall survival of breast cancer patients. SIGNIFICANCE We used Bayesian inference to estimate LNR, which was then used to predict overall survival. The models were developed from a large population-based cancer registry. We also built a user-friendly web application for individual patient survival prognosis. The diagnostic value of the LNR and the effectiveness of the proposed model were evaluated by comparisons with existing prediction models.
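For orientation, the directly measured LNR mentioned in this abstract is conventionally the number of positive lymph nodes divided by the number of nodes examined. The abstract does not state this definition, and the sketch below does not reproduce the paper's Bayesian estimate of LNR; the function name and arguments are hypothetical.

```python
def lymph_node_ratio(positive_nodes: int, examined_nodes: int) -> float:
    """Directly measured LNR: positive nodes / examined nodes.

    Conventional definition, assumed here; not the Bayesian estimator
    described in the abstract.
    """
    if examined_nodes <= 0:
        raise ValueError("examined_nodes must be positive")
    if not 0 <= positive_nodes <= examined_nodes:
        raise ValueError("positive_nodes must be between 0 and examined_nodes")
    return positive_nodes / examined_nodes


if __name__ == "__main__":
    # 3 positive nodes out of 12 examined
    print(lymph_node_ratio(3, 12))
```

The input checks reject an examined-node count of zero, the case in which the direct ratio is undefined.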