1. Nikolaou N, Salazar D, RaviPrakash H, Gonçalves M, Mulla R, Burlutskiy N, Markuzon N, Jacob E. A machine learning approach for multimodal data fusion for survival prediction in cancer patients. NPJ Precis Oncol 2025; 9:128. [PMID: 40325104 PMCID: PMC12053085 DOI: 10.1038/s41698-025-00917-6] [Received: 11/12/2024] [Accepted: 04/19/2025] [Indexed: 05/07/2025]
Abstract
Technological advancements of the past decade have transformed cancer research, improving patient survival predictions through genotyping and multimodal data analysis. However, there is no comprehensive machine-learning pipeline for comparing methods to enhance these predictions. To address this, a versatile pipeline using The Cancer Genome Atlas (TCGA) data was developed, incorporating various data modalities such as transcripts, proteins, metabolites, and clinical factors. This approach manages challenges like high dimensionality, small sample sizes, and data heterogeneity. By applying different feature extraction and fusion strategies, notably late fusion models, the effectiveness of integrating diverse data types was demonstrated. Late fusion models consistently outperformed single-modality approaches in TCGA lung, breast, and pan-cancer datasets, offering higher accuracy and robustness. This research highlights the potential of comprehensive multimodal data integration in precision oncology to improve survival predictions for cancer patients. The study provides a reusable pipeline for the research community, suggesting future work on larger cohorts.
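The late-fusion idea described above can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: each modality-specific model has already produced one risk score per patient, fusion is a plain weighted mean of those scores, and models are compared with Harrell's concordance index (C-index), with tied survival times skipped for simplicity. The function names and the toy data are hypothetical, and this is not the authors' pipeline.

```python
from itertools import combinations


def late_fusion(risk_by_modality, weights=None):
    """Fuse per-modality risk scores patient by patient with a weighted mean.

    risk_by_modality maps a modality name to a list of risk scores, one per
    patient, in the same patient order for every modality.
    """
    modalities = list(risk_by_modality)
    if weights is None:
        weights = {m: 1.0 for m in modalities}  # equal weights by default
    total = sum(weights[m] for m in modalities)
    n_patients = len(risk_by_modality[modalities[0]])
    return [
        sum(weights[m] * risk_by_modality[m][i] for m in modalities) / total
        for i in range(n_patients)
    ]


def concordance_index(times, events, risks):
    """Harrell's C-index; pairs with tied times are skipped for simplicity."""
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # make i the patient with the shorter time
        # A pair is comparable only if the shorter time ends in an observed event.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risks[i] > risks[j]:
            concordant += 1.0  # higher risk, earlier event: correct order
        elif risks[i] == risks[j]:
            concordant += 0.5  # tied risk counts as half
    return concordant / comparable if comparable else float("nan")


# Toy, made-up data: survival times and event flags (1 = event observed).
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk_rna = [0.9, 0.4, 0.5, 0.1]
risk_clin = [0.7, 0.8, 0.3, 0.2]

fused = late_fusion({"rna": risk_rna, "clinical": risk_clin})
for name, risks in [("rna", risk_rna), ("clinical", risk_clin), ("fused", fused)]:
    print(name, concordance_index(times, events, risks))
```

On this made-up data each single modality misorders one comparable pair, while the fused score orders all of them correctly. With real cohorts such a gain has to be demonstrated by cross-validation, not assumed.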
Affiliation(s)
- Nikolaos Nikolaou
- Oncology Data Science, Oncology R&D, AstraZeneca, Cambridge, UK
- Department of Physics & Astronomy, University College London, London, UK
- Domingo Salazar
- Oncology Data Science, Oncology R&D, AstraZeneca, Cambridge, UK
- Rob Mulla
- Oncology Data Science, Oncology R&D, AstraZeneca, Waltham, MA, USA
- Natasha Markuzon
- Oncology Data Science, Oncology R&D, AstraZeneca, Waltham, MA, USA
- Etai Jacob
- Oncology Data Science, Oncology R&D, AstraZeneca, Waltham, MA, USA
2. Magateshvaren Saras MA, Mitra MK, Tyagi S. Navigating the Multiverse: a Hitchhiker's guide to selecting harmonization methods for multimodal biomedical data. Biol Methods Protoc 2025; 10:bpaf028. [PMID: 40308831 PMCID: PMC12043205 DOI: 10.1093/biomethods/bpaf028] [Received: 01/21/2025] [Revised: 03/20/2025] [Accepted: 04/15/2025] [Indexed: 05/02/2025]
Abstract
The application of machine learning (ML) techniques in predictive modelling has greatly advanced our comprehension of biological systems. There is a notable shift towards integration methods that specifically target the simultaneous analysis of multiple modes or types of data, which have shown superior results compared with individual analyses. Despite the availability of diverse ML architectures for researchers interested in embracing a multimodal approach, the current literature lacks a comprehensive taxonomy that includes the pros and cons of these methods to guide the entire process. Closing this gap is imperative and necessitates the creation of a robust framework. This framework should not only categorize the diverse ML architectures suitable for multimodal analysis but also offer insights into their respective advantages and limitations. Additionally, such a framework can serve as a valuable guide for selecting an appropriate workflow for multimodal analysis. This comprehensive taxonomy would provide clear guidance and support informed decision-making within the increasingly intricate landscape of biomedical and clinical data analysis. This is an essential step towards advancing personalized medicine. The aims of the work are to comprehensively study and describe the harmonization processes that are performed and reported in the literature and to present a working guide that enables planning and selecting an appropriate integrative model. We present harmonization as a dual process of representation and integration, each with multiple methods and categories. The various representation and integration methods are classified into six broad categories, each detailed with its advantages, disadvantages and examples. A guide flowchart describing the step-by-step processes needed to adopt a multimodal approach is also presented, along with examples and references.
This review provides a thorough taxonomy of methods for harmonizing multimodal data and introduces a foundational 10-step guide for newcomers to implement a multimodal workflow.
Affiliation(s)
- Murali Aadhitya Magateshvaren Saras
- IITB-Monash Research Academy, Mumbai, Maharashtra 400076, India
- Department of Physics, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India
- School of Translational Medicine, Monash University, Melbourne, Victoria 3181, Australia
- Mithun K Mitra
- Department of Physics, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India
- Sonika Tyagi
- School of Translational Medicine, Monash University, Melbourne, Victoria 3181, Australia
- School of Computing Technologies, RMIT University, Melbourne, Victoria 3001, Australia
3. Mora A, Schmidt C, Balderson B, Frezza C, Bodén M. SiRCle (Signature Regulatory Clustering) model integration reveals mechanisms of phenotype regulation in renal cancer. Genome Med 2024; 16:144. [PMID: 39633487 PMCID: PMC11616309 DOI: 10.1186/s13073-024-01415-3] [Received: 01/31/2024] [Accepted: 11/18/2024] [Indexed: 12/07/2024]
Abstract
BACKGROUND Clear cell renal cell carcinoma (ccRCC) tumours develop and progress via complex remodelling of the kidney epigenome, transcriptome, proteome and metabolome. Given the resulting intra-tumour and inter-patient heterogeneity, drug-based treatments have had limited success, calling for multi-omics studies to extract regulatory relationships and, ultimately, to develop targeted therapies. Yet methods for multi-omics integration that reveal mechanisms of phenotype regulation are lacking. METHODS Here, we present SiRCle (Signature Regulatory Clustering), a method to integrate DNA methylation, RNA-seq and proteomics data at the gene level by following the central dogma of biology, i.e. genetic information proceeds from DNA to RNA to protein. To identify regulatory clusters across the different omics layers, we group genes based on the layer where each gene's dysregulation first occurred. We combine the SiRCle clusters with a variational autoencoder (VAE) to reveal key features from omics data for each SiRCle cluster and compare patient subpopulations in a ccRCC and a PanCan cohort. RESULTS Applying SiRCle to a ccRCC cohort, we showed that glycolysis is upregulated by DNA hypomethylation, whilst mitochondrial enzymes and respiratory chain complexes are translationally suppressed. Additionally, we identify metabolic enzymes associated with survival, along with the possible molecular driver behind each gene's perturbation. By using the VAE to integrate omics data, followed by statistical comparisons between tumour stages on the integrated space, we found a stage-dependent downregulation of proximal renal tubule genes, hinting at a loss of cellular identity in cancer cells. We also identified the regulatory layers responsible for their suppression. Lastly, we applied SiRCle to a PanCan cohort and found common signatures across ccRCC and PanCan, in addition to the regulatory layer that defines tissue identity.
CONCLUSIONS Our results highlight SiRCle's ability to reveal mechanisms of phenotype regulation in cancer, both specifically in ccRCC and broadly in a PanCan context. SiRCle ranks genes according to biological features. https://github.com/ArianeMora/SiRCle_multiomics_integration .
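The core grouping step, assigning each gene to the omics layer where its dysregulation first appears when the layers are read in DNA methylation, RNA, protein order, can be sketched as follows. This is a deliberately simplified illustration and not the SiRCle implementation in the linked repository: it assumes each gene's change in each layer has already been discretised to -1, 0 or +1 (the thresholds for doing so are not shown), and it ignores the finer direction-specific rules and the VAE step described in the abstract. The gene names are placeholders.

```python
# Central-dogma order assumed for the layers: DNA methylation -> RNA -> protein.
LAYERS = ("methylation", "rna", "protein")


def first_dysregulated_layer(changes):
    """Return the first layer, in central-dogma order, with a nonzero change.

    changes maps a layer name to -1 (down), 0 (no change) or +1 (up);
    a missing layer is treated as unchanged. Returns None if nothing changed.
    """
    for layer in LAYERS:
        if changes.get(layer, 0) != 0:
            return layer
    return None


def group_genes(gene_changes):
    """Group gene names by the layer where their dysregulation first appears."""
    groups = {}
    for gene, changes in gene_changes.items():
        groups.setdefault(first_dysregulated_layer(changes), []).append(gene)
    return groups


# Placeholder genes with pre-discretised changes per layer.
example = {
    "geneA": {"methylation": -1, "rna": 1, "protein": 1},  # starts at DNA level
    "geneB": {"methylation": 0, "rna": 0, "protein": -1},  # protein level only
    "geneC": {"methylation": 0, "rna": 1, "protein": 1},   # starts at RNA level
    "geneD": {"methylation": 0, "rna": 0, "protein": 0},   # unchanged
}
print(group_genes(example))
```

Reading the layers in a fixed order is what lets a downstream change (for example, in protein) be attributed to the earliest layer where the gene already differs.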
Affiliation(s)
- Ariane Mora
- School of Chemistry and Molecular Biosciences, University of Queensland, Molecular Biosciences Building 76, St Lucia, QLD, 4072, Australia
- Christina Schmidt
- Medical Research Council Cancer Unit, Hutchison/MRC Research Centre, University of Cambridge, Cambridge Biomedical Campus, Box 197, Cambridge, CB2 0XZ, UK
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Institute for Metabolomics in Ageing, Cluster of Excellence Cellular Stress Responses in Aging-associated Diseases (CECAD), Joseph-Stelzmann-Str. 26, Cologne, 50931, Germany
- Brad Balderson
- School of Chemistry and Molecular Biosciences, University of Queensland, Molecular Biosciences Building 76, St Lucia, QLD, 4072, Australia
- Christian Frezza
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Institute for Metabolomics in Ageing, Cluster of Excellence Cellular Stress Responses in Aging-associated Diseases (CECAD), Joseph-Stelzmann-Str. 26, Cologne, 50931, Germany
- University of Cologne, Faculty of Mathematics and Natural Sciences, Institute of Genetics, Cluster of Excellence Cellular Stress Responses in Aging-associated Diseases (CECAD), Cologne, Germany
- Mikael Bodén
- School of Chemistry and Molecular Biosciences, University of Queensland, Molecular Biosciences Building 76, St Lucia, QLD, 4072, Australia
4. Liu X, Shi J, Jiao Y, An J, Tian J, Yang Y, Zhuo L. Integrated multi-omics with machine learning to uncover the intricacies of kidney disease. Brief Bioinform 2024; 25:bbae364. [PMID: 39082652 PMCID: PMC11289682 DOI: 10.1093/bib/bbae364] [Received: 03/22/2024] [Revised: 06/20/2024] [Accepted: 07/17/2024] [Indexed: 08/03/2024]
Abstract
The development of omics technologies has driven a profound expansion in the scale of biological data and increased its internal dimensional complexity, prompting the use of machine learning (ML) as a powerful toolkit for extracting knowledge and understanding underlying biological patterns. Kidney disease is one of the major and growing global health threats, with intricate pathogenic mechanisms and a lack of precise, molecular pathology-based therapeutic modalities. Accordingly, advanced high-throughput approaches are needed to capture implicit molecular features and complement current experiments and statistics. This review aims to delineate strategies for integrating multi-omics data with appropriate ML methods, highlighting key clinical translational scenarios, including predicting disease progression risks to improve medical decision-making, comprehensively understanding disease molecular mechanisms, and practical applications of image recognition in renal digital pathology. Examining the benefits and challenges of current integration efforts is expected to shed light on the complexity of kidney disease and advance clinical practice.
Affiliation(s)
- Li Zhuo
- Corresponding author. Department of Nephrology, China-Japan Friendship Hospital, Beijing 100029, China; China-Japan Friendship Clinic Medical College, Beijing University of Chinese Medicine, 100029 Beijing, China. E-mail:
5. Mora A, Rakar J, Cobeta IM, Salmani BY, Starkenberg A, Thor S, Bodén M. Variational autoencoding of gene landscapes during mouse CNS development uncovers layered roles of Polycomb Repressor Complex 2. Nucleic Acids Res 2022; 50:1280-1296. [PMID: 35048973 PMCID: PMC8860581 DOI: 10.1093/nar/gkac006] [Received: 11/15/2021] [Revised: 12/22/2021] [Accepted: 01/05/2022] [Indexed: 12/13/2022]
Abstract
A prominent aspect of most, if not all, central nervous systems (CNSs) is that anterior regions (brain) are larger than posterior ones (spinal cord). Studies in Drosophila and mouse have revealed that Polycomb Repressor Complex 2 (PRC2), a protein complex responsible for applying key repressive histone modifications, acts by several mechanisms to promote anterior CNS expansion. However, it is unclear what the full spectrum of PRC2 action is during embryonic CNS development and how PRC2 intersects with the epigenetic landscape. We removed PRC2 function from the developing mouse CNS, by mutating the key gene Eed, and generated spatio-temporal transcriptomic data. To decode the role of PRC2, we developed a method that incorporates standard statistical analyses with probabilistic deep learning to integrate the transcriptomic response to PRC2 inactivation with epigenetic data. This multi-variate analysis corroborates the central involvement of PRC2 in anterior CNS expansion, and also identifies several unanticipated cohorts of genes, such as proliferation and immune response genes. Furthermore, the analysis reveals specific profiles of regulation via PRC2 upon these gene cohorts. These findings uncover a differential logic for the role of PRC2 upon functionally distinct gene cohorts that drive CNS anterior expansion. To support the analysis of emerging multi-modal datasets, we provide a novel bioinformatics package that integrates transcriptomic and epigenetic datasets to identify regulatory underpinnings of heterogeneous biological processes.
Affiliation(s)
- Ariane Mora
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
- Jonathan Rakar
- Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Ignacio Monedero Cobeta
- Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Department of Physiology, Universidad Autonoma de Madrid, Madrid, Spain
- Behzad Yaghmaeian Salmani
- Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Department of Cell and Molecular Biology, Karolinska Institute, SE-171 65 Stockholm, Sweden
- Annika Starkenberg
- Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- Stefan Thor
- Department of Clinical and Experimental Medicine, Linköping University, SE-58185 Linköping, Sweden
- School of Biomedical Sciences, University of Queensland, St Lucia, QLD 4072, Australia
- Mikael Bodén
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD 4072, Australia
6. Trapotsi MA, Hosseini-Gerami L, Bender A. Computational analyses of mechanism of action (MoA): data, methods and integration. RSC Chem Biol 2022; 3:170-200. [PMID: 35360890 PMCID: PMC8827085 DOI: 10.1039/d1cb00069a] [Received: 03/30/2021] [Accepted: 12/09/2021] [Indexed: 12/15/2022]
Abstract
The elucidation of a compound's Mechanism of Action (MoA) is a challenging task in the drug discovery process, but it is important in order to rationalise phenotypic findings and to anticipate potential side-effects. Bioinformatic approaches, advances in machine learning techniques and the increasing deposition of high-throughput data in public databases have significantly contributed to recent advances in the field, but it is not straightforward to decide which data and methods are most suitable to use in a given case. In this review, we focus on these methods and data and their applications in generating MoA hypotheses for subsequent experimental validation. We discuss compound-specific data such as -omics, cell morphology and bioactivity data, as well as commonly used supplementary prior knowledge such as network and pathway data, and provide information on databases where these data can be accessed. In terms of methodologies, we discuss both well-established methods (connectivity mapping, pathway enrichment) and emerging methods (neural networks and multi-omics integration). Finally, we review case studies where the MoA of a compound was successfully suggested from computational analysis by incorporating multiple data modalities and/or methodologies. Our aim for this review is to provide researchers with insights into the benefits and drawbacks of both the data and methods in terms of level of understanding, biases and interpretation, and to highlight future avenues of investigation which we foresee will improve the field of MoA elucidation, including greater public access to -omics data and methodologies capable of data integration.
Affiliation(s)
- Maria-Anna Trapotsi
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
- Layla Hosseini-Gerami
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
- Andreas Bender
- Centre for Molecular Informatics, Yusuf Hamied Department of Chemistry, University of Cambridge UK
7. Arslan E, Schulz J, Rai K. Machine Learning in Epigenomics: Insights into Cancer Biology and Medicine. Biochim Biophys Acta Rev Cancer 2021; 1876:188588. [PMID: 34245839 PMCID: PMC8595561 DOI: 10.1016/j.bbcan.2021.188588] [Received: 03/09/2021] [Revised: 05/29/2021] [Accepted: 07/02/2021] [Indexed: 02/01/2023]
Abstract
The recent deluge of genome-wide technologies for mapping the epigenome, and of the resulting data from cancer samples, has provided the opportunity to gain insight into the roles of epigenetic processes in cancer. However, the complexity, high dimensionality, sparsity and noise associated with these data pose challenges for extensive integrative analyses. Machine Learning (ML) algorithms are particularly suited to epigenomic data analyses owing to their flexibility and ability to learn underlying hidden structures. We discuss four overlapping but distinct major categories of ML: dimensionality reduction, unsupervised methods, supervised methods and deep learning (DL). We review the preferred use cases of these algorithms in analyses of cancer epigenomics data, in the hope of providing an overview of how ML approaches can be used to explore fundamental questions on the roles of the epigenome in cancer biology and medicine.
Affiliation(s)
- Emre Arslan
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
- Jonathan Schulz
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
- Kunal Rai
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
8. Zuo Y, Wei D, Zhu C, Naveed O, Hong W, Yang X. Unveiling the Pathogenesis of Psychiatric Disorders Using Network Models. Genes (Basel) 2021; 12:1101. [PMID: 34356117 PMCID: PMC8304351 DOI: 10.3390/genes12071101] [Received: 06/17/2021] [Revised: 07/15/2021] [Accepted: 07/16/2021] [Indexed: 01/13/2023]
Abstract
Psychiatric disorders are complex brain disorders with a high degree of genetic heterogeneity, affecting millions of people worldwide. Despite advances in psychiatric genetics, the underlying pathogenic mechanisms of psychiatric disorders are still largely elusive, which impedes the development of novel rational therapies. There has been accumulating evidence suggesting that the genetics of complex disorders can be viewed through an omnigenic lens, which involves contextualizing genes in highly interconnected networks. Thus, applying network-based multi-omics integration methods could cast new light on the pathophysiology of psychiatric disorders. In this review, we first provide an overview of the recent advances in psychiatric genetics and highlight gaps in translating molecular associations into mechanistic insights. We then present an overview of network methodologies and review previous applications of network methods in the study of psychiatric disorders. Lastly, we describe the potential of such methodologies within a multi-tissue, multi-omics approach, and summarize the future directions in adopting diverse network approaches.
Affiliation(s)
- Yanning Zuo
- Department of Biological Chemistry, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Department of Neurobiology, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Department of Integrative Biology and Physiology, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Don Wei
- Department of Biological Chemistry, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Department of Neurobiology, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Department of Psychiatry, Semel Institute, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Carissa Zhu
- Department of Integrative Biology and Physiology, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Ormina Naveed
- Department of Integrative Biology and Physiology, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Weizhe Hong
- Department of Biological Chemistry, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Department of Neurobiology, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Brain Research Institute, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Xia Yang
- Department of Integrative Biology and Physiology, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Brain Research Institute, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Institute for Quantitative and Computational Biosciences, University of California at Los Angeles, Los Angeles, CA 90095, USA
9. Philip M, Chen T, Tyagi S. A Survey of Current Resources to Study lncRNA-Protein Interactions. Noncoding RNA 2021; 7:ncrna7020033. [PMID: 34201302 PMCID: PMC8293367 DOI: 10.3390/ncrna7020033] [Received: 05/03/2021] [Revised: 05/28/2021] [Accepted: 06/07/2021] [Indexed: 12/15/2022]
Abstract
Phenotypes are driven by regulated gene expression, which in turn is mediated by complex interactions between diverse biological molecules. Protein-DNA interactions such as histone and transcription factor binding are well studied, along with RNA-RNA interactions in short RNA silencing of genes. In contrast, lncRNA-protein interaction (LPI) mechanisms are comparatively poorly understood, likely owing to the difficulties of studying LPI. However, LPI are emerging as key interactions in epigenetic mechanisms, playing a role in development and disease. Their importance is further highlighted by their conservation across kingdoms. Hence, interest in LPI research is increasing. We therefore review the current state of the art in lncRNA-protein interactions, specifically surveying recent computational methods and databases that researchers can exploit for LPI investigation. We found that algorithm development is heavily reliant on a few generic databases containing curated LPI information. Additionally, these databases house information at the gene level rather than as transcript-level annotations. We show that early methods, which predict LPI using molecular docking, have limited scope and are slow, creating a data processing bottleneck. Recently, machine learning has become the strategy of choice in LPI prediction, likely due to the rapid growth in machine learning infrastructure and expertise. While many of these methods have notable limitations, machine learning is expected to be the basis of modern LPI prediction algorithms.
Affiliation(s)
- Melcy Philip
- School of Biological Sciences, Monash University, 25 Rainforest Walk, Clayton, VIC 3800, Australia
- Tyrone Chen
- School of Biological Sciences, Monash University, 25 Rainforest Walk, Clayton, VIC 3800, Australia
- Sonika Tyagi
- School of Biological Sciences, Monash University, 25 Rainforest Walk, Clayton, VIC 3800, Australia
- Monash eResearch Centre, Monash University, Clayton, VIC 3800, Australia
- Department of Infectious Disease, Monash University (Alfred Campus), 85 Commercial Road, Melbourne, VIC 3004, Australia
- Correspondence:
10. Chen T, Philip M, Lê Cao KA, Tyagi S. A multi-modal data harmonisation approach for discovery of COVID-19 drug targets. Brief Bioinform 2021; 22:6279836. [PMID: 34036326 PMCID: PMC8194516 DOI: 10.1093/bib/bbab185] [Received: 10/26/2020] [Revised: 03/09/2021] [Accepted: 04/22/2021] [Indexed: 12/27/2022]
Abstract
Despite the volume of experiments performed and data available, the complex biology of coronavirus SARS-CoV-2 is not yet fully understood. Existing molecular profiling studies have focused on analysing functional omics data of a single type, which captures changes in only a small subset of the molecular perturbations caused by the virus. As the logical next step, results from multiple such omics analyses may be aggregated to comprehensively interpret the molecular mechanisms of SARS-CoV-2. An alternative approach is to integrate data simultaneously in a parallel fashion to highlight the inter-relationships of disease-driving biomolecules, in contrast to comparing processed information from each omics level separately. We demonstrate that valuable information may be masked by using the former, fragmented views in analysis, and that biomarkers resulting from such an approach cannot provide a systematic understanding of the disease aetiology. Hence, we present a generic, reproducible and flexible open-access data harmonisation framework that can be scaled out to future multi-omics analyses to study a phenotype in a holistic manner. The pipeline source code, detailed documentation and an automated version as an R package are accessible. To demonstrate the effectiveness of our pipeline, we applied it to a drug screening task. We integrated multi-omics data to find the lowest level of statistical associations between data features in two case studies. Strongly correlated features within each of these two datasets were used for drug-target analysis, resulting in a list of 84 drug-target candidates. Further computational docking and toxicity analyses revealed seven high-confidence targets, amsacrine, bosutinib, ceritinib, crizotinib, nintedanib and sunitinib, as potential starting points for drug therapy and development.
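A very small sketch of the "strongly correlated features" idea: given two omics blocks measured on the same samples, it keeps cross-block feature pairs whose Pearson correlation reaches a threshold in absolute value. This swaps in plain pairwise correlation purely for illustration; it is not the authors' harmonisation pipeline or their R package, and the 0.8 threshold, feature names and values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlated_pairs(block_a, block_b, threshold=0.8):
    """Cross-block feature pairs with |r| >= threshold, strongest first.

    Each block maps a feature name to its values over the same samples,
    listed in the same sample order in both blocks.
    """
    hits = []
    for name_a, values_a in block_a.items():
        for name_b, values_b in block_b.items():
            r = pearson(values_a, values_b)
            if abs(r) >= threshold:
                hits.append((name_a, name_b, r))
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)


# Hypothetical blocks: five samples each, same sample order.
transcripts = {"tx1": [2, 4, 6, 8, 10], "tx2": [5, 4, 3, 2, 1]}
proteins = {"protA": [1, 2, 3, 4, 5], "protB": [2, 1, 2, 1, 2]}

for name_a, name_b, r in correlated_pairs(transcripts, proteins):
    print(f"{name_a} ~ {name_b}: r = {r:+.2f}")
```

Exhaustive pairwise testing scales with the product of the two feature counts and needs multiple-testing control on real data, which is one reason multivariate integration methods are usually preferred over this naive approach.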
Affiliation(s)
- Tyrone Chen
- School of Biological Sciences, Monash University, 25 Rainforest Walk, 3800, VIC, Australia
- Melcy Philip
- School of Biological Sciences, Monash University, 25 Rainforest Walk, 3800, VIC, Australia
- Kim-Anh Lê Cao
- Melbourne Integrative Genomics, University of Melbourne, Building 184, Royal Parade, 3010, VIC, Australia
- School of Mathematics and Statistics, University of Melbourne, 813 Swanston Street, 3010, VIC, Australia
- Sonika Tyagi
- School of Biological Sciences, Monash University, 25 Rainforest Walk, 3800, VIC, Australia
- Monash eResearch Centre, Monash University, 15 Innovation Walk, 3800, VIC, Australia
- Department of Infectious Disease, Monash University, 85 Commercial Road, 3004, VIC, Australia
11. Pu Z, Xu M, Yuan X, Xie H, Zhao J. Circular RNA circCUL3 Accelerates the Warburg Effect Progression of Gastric Cancer through Regulating the STAT3/HK2 Axis. Mol Ther Nucleic Acids 2020; 22:310-318. [PMID: 33230436 PMCID: PMC7527579 DOI: 10.1016/j.omtn.2020.08.023] [Received: 06/30/2020] [Accepted: 08/20/2020] [Indexed: 02/06/2023]
Abstract
The Warburg effect is a significant hallmark of gastric cancer (GC), and increasing evidence emphasizes the crucial role of circular RNAs (circRNAs) in GC tumorigenesis. However, the precise molecular mechanisms by which circRNAs drive the GC Warburg effect are still elusive. The present study was designed to unveil the roles of circRNAs and the corresponding potential mechanism. Upregulated expression of circCUL3 was observed in both GC tissues and cell lines. Clinically, high circCUL3 expression was closely associated with advanced clinical stage and with overall survival in GC patients. Functionally, cellular experiments demonstrated that circCUL3 promoted the proliferation, glucose consumption, lactate production, ATP levels and extracellular acidification rate (ECAR) of GC cells. In vivo, circCUL3 knockdown repressed tumor growth. Mechanistic analysis demonstrated that circCUL3 promoted signal transducer and activator of transcription 3 (STAT3) expression by sponging miR-515-5p; moreover, the transcription factor STAT3 enhanced the transcription of hexokinase 2 (HK2). In summary, these findings provide mechanistic insight into how the circCUL3/miR-515-5p/STAT3/HK2 axis regulates the GC Warburg effect, offering a new perspective on GC pathogenesis.
Affiliation(s)
- Zhichen Pu
- Department of Drug Clinical Evaluation Center, Yijishan Hospital of Wannan Medical College, Wuhu, Anhui 241001, China
- Maodi Xu
- Department of Drug Clinical Evaluation Center, Yijishan Hospital of Wannan Medical College, Wuhu, Anhui 241001, China
- Xiaolong Yuan
- Department of Pharmacy, Second Affiliated Hospital of Wannan Medical College, Wuhu, Anhui 241001, China
- Vascular Diseases Research Center of Wannan Medical College, Wuhu, Anhui 241001, China
- Haitang Xie
- Department of Drug Clinical Evaluation Center, Yijishan Hospital of Wannan Medical College, Wuhu, Anhui 241001, China
- Jun Zhao
- Department of Gastrointestinal Surgery, Yijishan Hospital of Wannan Medical College, Wuhu, Anhui 241001, China