1
Zhang Y, Qiu L, Ren Y, Cheng Z, Li L, Yao S, Zhang C, Luo Z, Lu H. A meta-learning approach to improving radiation response prediction in cancers. Comput Biol Med 2022; 150:106163. [PMID: 37070625 DOI: 10.1016/j.compbiomed.2022.106163] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/25/2022] [Revised: 09/18/2022] [Accepted: 10/01/2022] [Indexed: 11/03/2022]
Abstract
PURPOSE Predicting the efficacy of radiotherapy in individual patients has drawn widespread attention, but limited sample size remains a bottleneck for using high-dimensional multi-omics data to guide personalized radiotherapy. We hypothesized that the recently developed meta-learning framework could address this limitation.
METHODS AND MATERIALS Combining gene expression, DNA methylation, and clinical data of 806 patients from The Cancer Genome Atlas (TCGA) who had received radiotherapy, we applied the Model-Agnostic Meta-Learning (MAML) framework to tasks consisting of pan-cancer data to obtain the best initial parameters of a neural network for a specific cancer with a smaller number of samples. The performance of the meta-learning framework was compared with that of four traditional machine learning methods under two training schemes, and tested on the Cancer Cell Line Encyclopedia (CCLE) and Chinese Glioma Genome Atlas (CGGA) datasets. Moreover, the biological significance of the models was investigated by survival analysis and feature interpretation.
RESULTS The mean AUC (area under the ROC curve) [95% confidence interval] of our models across nine cancer types was 0.702 [0.691-0.713], an average improvement of 0.166 over the other four machine learning methods under the two training schemes. Our models performed significantly better (p < 0.05) in seven cancer types and comparably to the other predictors in the remaining two. The more pan-cancer samples were used to transfer meta-knowledge, the more performance improved (p < 0.05). The predicted response scores generated by our models were negatively correlated with the cell radiosensitivity index in four cancer types (p < 0.05), but the correlation was not statistically significant in the other three cancer types. Moreover, the predicted response scores were shown to be prognostic factors in seven cancer types, and eight potential radiosensitivity-related genes were identified.
CONCLUSIONS For the first time, we established a meta-learning approach that improves individual radiation response prediction by transferring common knowledge from pan-cancer data with the MAML framework. The results demonstrated the superiority, generalizability, and biological significance of our approach.
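The idea of learning "the best initial parameters" from many related tasks can be illustrated with a small sketch. This is not the authors' model: it assumes a linear regressor in place of their neural network, uses the first-order approximation of MAML rather than the full second-order update, and the names `fomaml`, `mse`, and `mse_grad` as well as the learning rates are hypothetical choices made for illustration.

```python
import numpy as np

def mse(w, X, y):
    """Mean squared error of the linear model y ~ X @ w."""
    return float(np.mean((X @ w - y) ** 2))

def mse_grad(w, X, y):
    """Gradient of the mean squared error with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def fomaml(tasks, dim, inner_lr=0.05, outer_lr=0.05, steps=200):
    """First-order MAML (an approximation, assumed here for brevity).

    tasks: list of (X_support, y_support, X_query, y_query) tuples,
           one per task (e.g. one per cancer type in a pan-cancer setting).
    Returns an initialisation w0 from which one gradient step on a
    task's support set should already fit that task's query set well.
    """
    w0 = np.zeros(dim)
    for _ in range(steps):
        meta_grad = np.zeros(dim)
        for Xs, ys, Xq, yq in tasks:
            # Inner loop: adapt the shared initialisation to this task.
            w_task = w0 - inner_lr * mse_grad(w0, Xs, ys)
            # Outer loop (first-order): query-set gradient at the adapted weights.
            meta_grad += mse_grad(w_task, Xq, yq)
        w0 = w0 - outer_lr * meta_grad / len(tasks)
    return w0
```

For a new task with few samples, one would start from the returned `w0` and take a small number of gradient steps on that task's own data instead of training from scratch.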
2
Inferring Time-Lagged Causality Using the Derivative of Single-Cell Expression. Int J Mol Sci 2022; 23:ijms23063348. [PMID: 35328768 PMCID: PMC8948830 DOI: 10.3390/ijms23063348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2021] [Revised: 01/07/2022] [Accepted: 01/11/2022] [Indexed: 12/10/2022] [Open Access]
Abstract
Many computational methods have been developed to infer causality among genes using cross-sectional gene expression data, such as single-cell RNA sequencing (scRNA-seq) data. However, due to the limitations of scRNA-seq technologies, time-lagged causal relationships may be missed by existing methods. In this work, we propose a method, called causal inference with time-lagged information (CITL), to infer time-lagged causal relationships from scRNA-seq data by assessing the conditional independence between the changing and current expression levels of genes. CITL estimates the changing expression levels of genes by “RNA velocity”. We demonstrate the accuracy and stability of CITL for inferring time-lagged causality on simulated data against other leading approaches. We applied CITL to real scRNA-seq data and inferred 878 pairs of time-lagged causal relationships. Furthermore, we showed that the number of regulatory relationships identified by CITL was significantly greater than expected by chance. We provide an R package and a command-line tool of CITL for different usage scenarios.
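A conditional independence check between one gene's current expression and another gene's changing expression can be sketched as follows. The abstract does not say which test CITL uses, so this example assumes a partial-correlation test with a Fisher z-transform, which is a common choice for Gaussian-like data; `partial_corr` and `fisher_z_pvalue` are hypothetical helper names and are not part of the CITL package.

```python
import numpy as np
from math import erfc, log, sqrt

def partial_corr(x, y, Z):
    """Correlation of x and y after regressing out Z (plus an intercept).

    x: current expression of a candidate regulator across cells.
    y: changing expression (e.g. an RNA velocity estimate) of a target gene.
    Z: conditioning variables, shape (n_cells,) or (n_cells, k).
    """
    design = np.column_stack([np.ones(len(x)), Z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(res_x @ res_y / sqrt((res_x @ res_x) * (res_y @ res_y)))

def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for a partial correlation r.

    n: number of cells, k: number of conditioning variables.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep the log finite
    z = 0.5 * log((1 + r) / (1 - r)) * sqrt(n - k - 3)
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

A small p-value would argue against conditional independence, that is, for keeping the candidate time-lagged edge; a large one would argue for removing it.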
3
Angeloni M, Thievessen I, Engel FB, Magni P, Ferrazzi F. Functional genomics meta-analysis to identify gene set enrichment networks in cardiac hypertrophy. Biol Chem 2021; 402:953-972. [PMID: 33951759 DOI: 10.1515/hsz-2020-0378] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/30/2020] [Accepted: 04/19/2021] [Indexed: 12/28/2022]
Abstract
To take advantage of the continuously increasing number of transcriptome studies, it is important to develop strategies that integrate multiple expression datasets addressing the same biological question to allow a robust analysis. Here, we propose a meta-analysis framework that integrates enriched pathways identified through the Gene Set Enrichment Analysis (GSEA) approach and calculates for each meta-pathway an empirical p-value. Validation of our approach on benchmark datasets showed comparable or even better performance than existing methods and an increase in robustness with an increasing number of integrated datasets. We then applied the meta-analysis framework to 15 functional genomics datasets of physiological and pathological cardiac hypertrophy. Within these datasets we grouped expression sets measured at time points that represent the same hallmarks of heart tissue remodeling ('aggregated time points') and performed meta-analysis on the expression sets assigned to each aggregated time point. To facilitate biological interpretation, results were visualized as gene set enrichment networks. Our meta-analysis framework identified well-known biological mechanisms associated with pathological cardiac hypertrophy (e.g., cardiomyocyte apoptosis, cardiac contractile dysfunction, and alteration in energy metabolism). In addition, results highlighted novel, potentially cardioprotective mechanisms in physiological cardiac hypertrophy involving the down-regulation of immune cell response, which are worth further investigation.
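An empirical p-value of the kind mentioned above is usually obtained by comparing an observed score with scores from a null distribution. The abstract does not describe how the null scores for a meta-pathway are generated, so the sketch below takes them as given and assumes the standard add-one estimator, which never returns exactly zero; `empirical_p_value` is a hypothetical name.

```python
import numpy as np

def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as extreme as the observed score.

    Uses the add-one correction (1 + hits) / (1 + N), so the smallest
    attainable p-value is 1 / (N + 1) rather than 0.
    """
    null_scores = np.asarray(null_scores, dtype=float)
    hits = int(np.sum(null_scores >= observed))
    return (1 + hits) / (1 + null_scores.size)
```

For example, with null scores `[1, 2, 3, 4]` and an observed score of `3.0`, two null scores are at least as large, giving (1 + 2) / (1 + 4) = 0.6.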
Affiliation(s)
- Miriam Angeloni
  - Department of Nephropathology, Institute of Pathology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Krankenhausstr. 8-10, D-91054 Erlangen, Germany
  - Institute of Pathology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Krankenhausstr. 8-10, D-91054 Erlangen, Germany
- Ingo Thievessen
  - Biophysics Group, Department of Physics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Henkestraße 91, D-91052 Erlangen, Germany
  - Muscle Research Center Erlangen (MURCE), D-91052 Erlangen, Germany
- Felix B Engel
  - Experimental Renal and Cardiovascular Research, Department of Nephropathology, Institute of Pathology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Schwabachanlage 12, D-91054 Erlangen, Germany
  - Muscle Research Center Erlangen (MURCE), D-91052 Erlangen, Germany
- Paolo Magni
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, via Ferrata 5, I-27100 Pavia, Italy
- Fulvia Ferrazzi
  - Department of Nephropathology, Institute of Pathology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Krankenhausstr. 8-10, D-91054 Erlangen, Germany
  - Muscle Research Center Erlangen (MURCE), D-91052 Erlangen, Germany
4
Wang W, Langlois R, Langlois M, Genchev GZ, Wang X, Lu H. Functional Site Discovery From Incomplete Training Data: A Case Study With Nucleic Acid-Binding Proteins. Front Genet 2019; 10:729. [PMID: 31543893 PMCID: PMC6729729 DOI: 10.3389/fgene.2019.00729] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/21/2018] [Accepted: 07/11/2019] [Indexed: 12/27/2022] [Open Access]
Abstract
Function annotation efforts provide a foundation for our understanding of cellular processes and the functioning of the living cell. This motivates high-throughput computational methods to characterize new protein members of a particular function. Research work has focused on discriminative machine-learning methods, which promise to make efficient, de novo predictions of protein function. Furthermore, available function annotation exists predominantly for individual proteins rather than for residues, of which only a subset is necessary for the conveyance of a particular function. This limits discriminative approaches to predicting functions for which there is sufficient residue-level annotation, e.g., identification of DNA-binding proteins, or where an excellent global representation can be divined. Complete understanding of the various functions of proteins requires discovery and functional annotation at the residue level. Herein, we cast this problem into the setting of multiple-instance learning, which only requires knowledge of the protein’s function yet identifies functionally relevant residues and need not rely on homology. We developed a new multiple-instance learning algorithm derived from AdaBoost and benchmarked this algorithm against two well-studied protein function prediction tasks: annotating proteins that bind DNA and RNA. This algorithm outperforms certain previous approaches in annotating protein function while identifying functionally relevant residues involved in binding both DNA and RNA, and on one protein-DNA benchmark, it achieves near-perfect classification.
Affiliation(s)
- Wenchuan Wang
  - SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, College of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Robert Langlois
  - Department of Bioengineering and Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
- Marina Langlois
  - Department of Bioengineering and Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
- Georgi Z Genchev
  - SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, College of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - Department of Bioengineering and Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
  - Bulgarian Institute for Genomics and Precision Medicine, Sofia, Bulgaria
- Xiaolei Wang
  - SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, College of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
- Hui Lu
  - SJTU-Yale Joint Center for Biostatistics and Data Science, Department of Bioinformatics and Biostatistics, College of Life Science and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - Department of Bioengineering and Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
  - Center for Biomedical Informatics, Shanghai Children's Hospital, Shanghai, China
5
Qin W, Wang X, Zhao H, Lu H. A Novel Joint Gene Set Analysis Framework Improves Identification of Enriched Pathways in Cross Disease Transcriptomic Analysis. Front Genet 2019; 10:293. [PMID: 31031796 PMCID: PMC6473067 DOI: 10.3389/fgene.2019.00293] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/21/2018] [Accepted: 03/19/2019] [Indexed: 12/25/2022] [Open Access]
Abstract
Motivation: Gene set enrichment analysis is a widely accepted expression analysis tool that aims to detect coordinated expression changes within pre-defined gene sets rather than in individual genes. The benefits of gene set analysis over individual differentially expressed (DE) gene analysis include more reproducible and interpretable results and the detection of small but consistent changes within a gene set that could not be detected by DE gene analysis. There have been many successful gene set analysis applications in human diseases. However, when the sample size of a disease study is small and no other public data sets of the same disease are available, there is a lack of power to detect pathways of importance to the disease.
Results: We have developed a novel joint gene set analysis statistical framework that aims to improve the power of identifying enriched gene sets by integrating multiple similar disease data sets. Through comprehensive simulation studies, we demonstrated that our proposed framework obtained much better AUC scores than single data set analysis and another meta-analysis method in the identification of enriched pathways. When applied to two real data sets, the proposed framework retained the enriched gene sets identified by single data set analysis and exclusively obtained up to 200% more disease-related gene sets, demonstrating the improved identification power gained from information shared between similar diseases. We expect that the proposed framework will enable researchers to better explore public data sets when the sample size of their study is limited.
Affiliation(s)
- Wenyi Qin
  - Center for Biomedical Informatics, Shanghai Children's Hospital, Shanghai Jiaotong University, Shanghai, China
  - Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States
  - Department of Genetics, School of Medicine, Yale University, New Haven, CT, United States
- Xujun Wang
  - Department of Bioinformatics and Biostatistics, SJTU-Yale Joint Center for Biostatistics, Shanghai Jiaotong University, Shanghai, China
- Hongyu Zhao
  - Department of Bioinformatics and Biostatistics, SJTU-Yale Joint Center for Biostatistics, Shanghai Jiaotong University, Shanghai, China
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, United States
- Hui Lu
  - Center for Biomedical Informatics, Shanghai Children's Hospital, Shanghai Jiaotong University, Shanghai, China
  - Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, United States
  - Department of Bioinformatics and Biostatistics, SJTU-Yale Joint Center for Biostatistics, Shanghai Jiaotong University, Shanghai, China
  - Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, United States
6
Gong Z, Ma Q, Wang X, Cai Q, Gong X, Genchev GZ, Lu H, Zeng F. A Herpes Simplex Virus Thymidine Kinase-Induced Mouse Model of Hepatocellular Carcinoma Associated with Up-Regulated Immune-Inflammatory-Related Signals. Genes (Basel) 2018; 9:E380. [PMID: 30060537 PMCID: PMC6115908 DOI: 10.3390/genes9080380] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/07/2018] [Revised: 07/19/2018] [Accepted: 07/23/2018] [Indexed: 12/11/2022] [Open Access]
Abstract
Inflammation and fibrosis in human liver are often precursors to hepatocellular carcinoma (HCC), yet neither is easily modeled in animals. We previously generated transgenic mice with hepatocyte-specific expression of herpes simplex virus thymidine kinase (HSV-tk). These mice develop hepatitis upon administration of ganciclovir (GCV) (Zhang, 2005). However, our HSV-tk transgenic mice developed hepatitis and HCC tumors as early as six months of age even without GCV administration. We analyzed the transcriptome of the HSV-tk HCC tumor and hepatitis tissue using microarray analysis to investigate the possible causes of HCC. Gene Ontology (GO) enrichment analysis showed that the up-regulated genes in the HCC tissue mainly include immune-inflammatory and cell cycle genes. The down-regulated genes in HCC tumors are mainly concentrated in the regions related to lipid metabolism. Gene set enrichment analysis (GSEA) showed that immune-inflammatory-related signals in the HSV-tk mice are up-regulated compared to those in Notch mice. Our study suggests that the immune system and inflammation play an important role in HCC development in HSV-tk mice. Specifically, increased expression of immune-inflammatory-related genes is characteristic of HSV-tk mice, and inflammation-induced cell cycle activation may be a precursory step to cancer. The HSV-tk mouse provides a suitable model for studying the relationship between immune-inflammation and HCC and its underlying mechanisms, with a view to developing therapeutic applications in the future.
Affiliation(s)
- Zhijuan Gong
  - Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai 200040, China.
  - Department of Histo-Embryology, Genetics and Developmental Biology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China.
  - Key Laboratory of Embryo Molecular Biology, Ministry of Health & Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai 200040, China.
- Qingwen Ma
  - Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai 200040, China.
  - Key Laboratory of Embryo Molecular Biology, Ministry of Health & Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai 200040, China.
- Xujun Wang
  - SJTU-Yale Joint Center for Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China.
- Qin Cai
  - Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai 200040, China.
  - Key Laboratory of Embryo Molecular Biology, Ministry of Health & Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai 200040, China.
- Xiuli Gong
  - Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai 200040, China.
  - Key Laboratory of Embryo Molecular Biology, Ministry of Health & Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai 200040, China.
- Georgi Z Genchev
  - SJTU-Yale Joint Center for Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China.
- Hui Lu
  - Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai 200040, China.
  - Key Laboratory of Embryo Molecular Biology, Ministry of Health & Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai 200040, China.
  - SJTU-Yale Joint Center for Biostatistics, School of Life Science and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China.
- Fanyi Zeng
  - Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai 200040, China.
  - Department of Histo-Embryology, Genetics and Developmental Biology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China.
  - Key Laboratory of Embryo Molecular Biology, Ministry of Health & Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai 200040, China.