1
Nikolaou N, Salazar D, RaviPrakash H, Gonçalves M, Mulla R, Burlutskiy N, Markuzon N, Jacob E. A machine learning approach for multimodal data fusion for survival prediction in cancer patients. NPJ Precis Oncol 2025; 9:128. [PMID: 40325104 PMCID: PMC12053085 DOI: 10.1038/s41698-025-00917-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2024] [Accepted: 04/19/2025] [Indexed: 05/07/2025] Open
Abstract
Technological advancements of the past decade have transformed cancer research, improving patient survival predictions through genotyping and multimodal data analysis. However, there is no comprehensive machine-learning pipeline for comparing methods to enhance these predictions. To address this, a versatile pipeline using The Cancer Genome Atlas (TCGA) data was developed, incorporating various data modalities such as transcripts, proteins, metabolites, and clinical factors. This approach manages challenges like high dimensionality, small sample sizes, and data heterogeneity. By applying different feature extraction and fusion strategies, notably late fusion models, the effectiveness of integrating diverse data types was demonstrated. Late fusion models consistently outperformed single-modality approaches in TCGA lung, breast, and pan-cancer datasets, offering higher accuracy and robustness. This research highlights the potential of comprehensive multimodal data integration in precision oncology to improve survival predictions for cancer patients. The study provides a reusable pipeline for the research community, suggesting future work on larger cohorts.
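The late fusion idea mentioned in this abstract can be sketched as follows: each modality gets its own model, and only the per-patient risk scores are combined afterwards. This is a minimal illustration under stated assumptions, not the authors' pipeline. The function names `late_fusion` and `concordance_index`, the unweighted averaging rule, and all numbers are hypothetical; the abstract does not say which fusion rule or evaluation metric was used.

```python
from itertools import combinations


def late_fusion(risk_by_modality, weights=None):
    """Combine per-modality risk scores into one score per patient (weighted mean)."""
    names = sorted(risk_by_modality)
    if weights is None:
        weights = {name: 1.0 for name in names}  # assumption: equal weights
    total = sum(weights[name] for name in names)
    n_patients = len(risk_by_modality[names[0]])
    return [
        sum(weights[name] * risk_by_modality[name][i] for name in names) / total
        for i in range(n_patients)
    ]


def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ordered correctly.

    A pair is comparable when the patient with the shorter follow-up time
    had an observed event (event flag 1). Ties in risk count as 0.5.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped to keep the sketch short
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier time is censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    return concordant / comparable


# Made-up toy data (not from TCGA): higher risk should mean shorter survival.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]  # 1 = death observed, 0 = censored
risk_by_modality = {
    "clinical": [0.9, 0.4, 0.6, 0.3, 0.1],
    "transcripts": [0.7, 0.8, 0.2, 0.4, 0.2],
}
fused = late_fusion(risk_by_modality)

for name, risks in risk_by_modality.items():
    print(name, concordance_index(times, events, risks))
print("late fusion", concordance_index(times, events, fused))
```

The toy scores are contrived so that each modality misorders a different pair of patients, which is the situation in which averaging the scores can help. Real gains depend on the data and should be checked with cross-validation.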
Affiliation(s)
- Nikolaos Nikolaou
  - Oncology Data Science, Oncology R&D, AstraZeneca, Cambridge, UK
  - Department of Physics & Astronomy, University College London, London, UK
- Domingo Salazar
  - Oncology Data Science, Oncology R&D, AstraZeneca, Cambridge, UK
- Rob Mulla
  - Oncology Data Science, Oncology R&D, AstraZeneca, Waltham, MA, USA
- Natasha Markuzon
  - Oncology Data Science, Oncology R&D, AstraZeneca, Waltham, MA, USA
- Etai Jacob
  - Oncology Data Science, Oncology R&D, AstraZeneca, Waltham, MA, USA
2
Su Q, Liu W, Liu X, Su P, Xie B. Bioinformatics-focused identification of metabolomic markers in coronary microvascular disease. Comput Biol Med 2025; 189:109992. [PMID: 40068493 DOI: 10.1016/j.compbiomed.2025.109992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2024] [Revised: 03/02/2025] [Accepted: 03/04/2025] [Indexed: 04/01/2025]
Abstract
BACKGROUND: Coronary microvascular disease (CMVD), marked by dysfunction of the small coronary vessels, poses significant diagnostic challenges because current procedures such as the index of microcirculatory resistance (IMR) are complex and costly. This study aimed to identify metabolomic biomarkers from coronary artery samples to facilitate CMVD diagnosis, using random forest algorithms and generalized linear models (GLMs), with the goal of developing more cost-effective blood-based diagnostics.
METHODS: In this prospective study, 68 patients scheduled for coronary angiography and IMR assessment were enrolled. Plasma samples obtained from their coronary arteries were analyzed using untargeted metabolomics with liquid chromatography-mass spectrometry. Random forest algorithms were used for feature selection to identify significant metabolites, and GLMs were constructed for predictive modeling. The diagnostic performance of the models was evaluated through receiver operating characteristic (ROC) curve analysis.
RESULTS: The random forest analysis identified the top 10 metabolites that contributed most to the classification of CMVD. The GLM built using these metabolites achieved area under the ROC curve (AUC) values of 0.984 in the initial (discovery) cohort and 0.938 in the subsequent (validation) cohort. The use of mathematical modeling enhanced the robustness and interpretability of the biomarker selection process.
CONCLUSIONS: Random forest algorithms and GLMs effectively identified key metabolites associated with CMVD. Although collecting coronary artery blood samples is invasive because it requires coronary angiography, this method offers a more practical and cost-effective alternative to IMR measurement, potentially improving the diagnostic approach for CMVD.
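For readers less familiar with the AUC figures quoted in this abstract, the sketch below shows one standard way to compute an area under the ROC curve: as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. It is illustrative only. The function name `roc_auc` is hypothetical, and the labels and scores are invented rather than taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied scores count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities from some fitted classifier (not study data).
labels = [1, 1, 1, 0, 0, 0, 0]  # 1 = disease present, 0 = absent
scores = [0.92, 0.81, 0.40, 0.55, 0.30, 0.22, 0.10]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of cases, which is fine for a cohort of a few dozen patients; a rank-based implementation would be preferable for large datasets.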
Affiliation(s)
- Qing Su
  - Department of Cardiology, Cardiovascular Imaging Center, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Wenting Liu
  - Department of Cardiology, Cardiovascular Imaging Center, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Xiaoyan Liu
  - Department of Cardiology, Cardiovascular Imaging Center, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Pixiong Su
  - Department of Cardiac Surgery, Cardiovascular Imaging Center, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
- Boqia Xie
  - Department of Cardiology, Cardiovascular Imaging Center, Beijing Chaoyang Hospital, Capital Medical University, Beijing, China
3
Mukherjee A, Abraham S, Singh A, Balaji S, Mukunthan KS. From Data to Cure: A Comprehensive Exploration of Multi-omics Data Analysis for Targeted Therapies. Mol Biotechnol 2025; 67:1269-1289. [PMID: 38565775 PMCID: PMC11928429 DOI: 10.1007/s12033-024-01133-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
In the dynamic landscape of targeted therapeutics, drug discovery has pivoted towards understanding underlying disease mechanisms, placing a strong emphasis on molecular perturbations and target identification. This paradigm shift, crucial for drug discovery, is underpinned by big data, a transformative force in the current era. Omics data, characterized by its heterogeneity and enormity, has ushered biological and biomedical research into the big data domain. Acknowledging the significance of integrating diverse omics data strata, known as multi-omics studies, researchers delve into the intricate interrelationships among various omics layers. This review navigates the expansive omics landscape, showcasing tailored assays for each molecular layer through genomes to metabolomes. The sheer volume of data generated necessitates sophisticated informatics techniques, with machine-learning (ML) algorithms emerging as robust tools. These datasets not only refine disease classification but also enhance diagnostics and foster the development of targeted therapeutic strategies. Through the integration of high-throughput data, the review focuses on targeting and modeling multiple disease-regulated networks, validating interactions with multiple targets, and enhancing therapeutic potential using network pharmacology approaches. Ultimately, this exploration aims to illuminate the transformative impact of multi-omics in the big data era, shaping the future of biological research.
Affiliation(s)
- Arnab Mukherjee
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Suzanna Abraham
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Akshita Singh
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- S Balaji
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- K S Mukunthan
  - Department of Biotechnology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
4
Llinas-Bertran A, Butjosa-Espín M, Barberi V, Seoane JA. Multimodal data integration in early-stage breast cancer. Breast 2025; 80:103892. [PMID: 39922065 PMCID: PMC11973824 DOI: 10.1016/j.breast.2025.103892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 12/13/2024] [Accepted: 01/27/2025] [Indexed: 02/10/2025] Open
Abstract
The use of biomarkers in breast cancer has significantly improved patient outcomes through targeted therapies, such as hormone therapy, anti-HER2 therapy, and CDK4/6 or PARP inhibitors. However, existing knowledge does not fully encompass the diverse nature of breast cancer, particularly in triple-negative tumors. The integration of multi-omics and multimodal data has the potential to provide new insights into biological processes, improve breast cancer patient stratification, enhance prognosis and response prediction, and identify new biomarkers. This review presents a comprehensive overview of state-of-the-art multimodal (including molecular and image) data integration algorithms that were developed for, or are applicable to, breast cancer stratification, prognosis, or biomarker identification. We examined the primary challenges and opportunities of these multimodal data integration algorithms, including their advantages, limitations, and critical considerations for future research. We aimed to describe models that are not only academically and preclinically relevant, but also applicable to clinical settings.
Affiliation(s)
- Arnau Llinas-Bertran
  - Cancer Computational Biology Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Maria Butjosa-Espín
  - Cancer Computational Biology Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Vittoria Barberi
  - Breast Cancer Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Jose A Seoane
  - Cancer Computational Biology Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
5
Lu Y, Liu C, Pang X, Chen X, Wang C, Huang H. Bioinformatic identification of signature miRNAs associated with fetoplacental vascular dysfunction in gestational diabetes mellitus. Biochem Biophys Rep 2025; 41:101888. [PMID: 39802395 PMCID: PMC11720096 DOI: 10.1016/j.bbrep.2024.101888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2024] [Revised: 11/09/2024] [Accepted: 11/20/2024] [Indexed: 01/16/2025] Open
Abstract
Background: Intrauterine exposure to gestational diabetes mellitus (GDM) poses significant risks to fetal development and future metabolic health. Despite its clinical importance, the role of microRNAs (miRNAs) in fetoplacental vascular endothelial cell (VEC) programming in the context of GDM remains elusive. This study aims to identify signature miRNA genes involved in this process using bioinformatics analysis with multiple algorithms.
Methods: The dataset used in this study was acquired from the Gene Expression Omnibus (GEO). First, differentially expressed miRNA genes (DEMGs) were identified using the limma package, and an enrichment analysis of the DEMGs was performed. Then, the least absolute shrinkage and selection operator (LASSO) and a support vector machine (SVM) were used as additional algorithms for screening candidate signature miRNA genes. Genes in the intersection of the limma, LASSO, and SVM results were taken as the final signature miRNA genes. Receiver operating characteristic (ROC) curves, a nomogram, gene set enrichment analysis (GSEA), and a signature miRNA-target gene interaction network were used to further explore the features and functions of the signature genes.
Results: A total of 32 DEMGs, with 21 upregulated and 11 downregulated miRNA genes, were obtained from the limma analysis. LASSO and SVM analyses identified 15 and 12 candidate signature miRNA genes, respectively. After intersecting the genes from the limma, LASSO, and SVM analyses, MIR34A and MIR186 were found to be the final signature genes related to fetoplacental VEC programming. MIR34A and MIR186 were highly expressed and were associated with an increased risk of fetoplacental VEC programming in GDM mothers. The areas under the ROC curve (AUC) for MIR34A and MIR186 were 0.960 and 0.935, respectively. GSEA revealed that these signature genes positively participate in cellular processes related to VEC migration, cell differentiation, angiogenesis, programmed cell death, and inflammatory response. Finally, the miRNA-target gene interaction network analysis describes the interactions between the signature miRNAs and their critical target genes, which may help further studies of miR-34a and miR-186 in GDM.
Conclusions: MIR34A and MIR186 are novel signature miRNA genes related to fetoplacental VEC programming that may represent critical genes associated with placental function and fetal programming under GDM conditions.
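The consensus step described in this abstract, keeping only the genes selected by all three methods, amounts to a set intersection. The sketch below shows that step alone, under stated assumptions. The helper `consensus_features` is hypothetical, and every gene name other than MIR34A and MIR186 is an invented placeholder; the toy sets do not reproduce the 32, 15, and 12 candidates reported by the authors.

```python
def consensus_features(*selections):
    """Return the features chosen by every selection method, sorted by name."""
    if not selections:
        return []
    common = set(selections[0])
    for selected in selections[1:]:
        common &= set(selected)  # keep only features present in every set so far
    return sorted(common)


# Placeholder candidate lists; MIR_A ... MIR_E are invented names for illustration.
limma_hits = {"MIR34A", "MIR186", "MIR_A", "MIR_B", "MIR_C"}
lasso_hits = {"MIR34A", "MIR186", "MIR_A", "MIR_D"}
svm_hits = {"MIR34A", "MIR186", "MIR_B", "MIR_E"}

signature = consensus_features(limma_hits, lasso_hits, svm_hits)
print(signature)
```

Requiring agreement across methods is a conservative design choice: it lowers the chance that a gene is kept because of one method's quirks, at the cost of possibly discarding true signals that only one method detects.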
Affiliation(s)
- Yulan Lu
  - Center of Reproduction Medical, Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, Guangxi, 533000, China
- Chunhong Liu
  - Center for Medical Laboratory Science, Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, Guangxi, 533000, China
  - Key Laboratory of Research and Development on Clinical Molecular Diagnosis for High-Incidence Diseases of Baise, Guangxi, 533000, China
  - Key Laboratory of Research on Clinical Molecular Diagnosis for High Incidence Diseases in Western Guangxi of Guangxi Higher Education Institutions, Guangxi, 533000, China
- Xiaoxia Pang
  - Center for Medical Laboratory Science, Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, Guangxi, 533000, China
  - Key Laboratory of Research and Development on Clinical Molecular Diagnosis for High-Incidence Diseases of Baise, Guangxi, 533000, China
  - Key Laboratory of Research on Clinical Molecular Diagnosis for High Incidence Diseases in Western Guangxi of Guangxi Higher Education Institutions, Guangxi, 533000, China
- Xinghong Chen
  - Center of Reproduction Medical, Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, Guangxi, 533000, China
- Chunfang Wang
  - Center for Medical Laboratory Science, Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, Guangxi, 533000, China
  - Key Laboratory of Research and Development on Clinical Molecular Diagnosis for High-Incidence Diseases of Baise, Guangxi, 533000, China
  - Key Laboratory of Research on Clinical Molecular Diagnosis for High Incidence Diseases in Western Guangxi of Guangxi Higher Education Institutions, Guangxi, 533000, China
- Huatuo Huang
  - Center for Medical Laboratory Science, Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, Guangxi, 533000, China
  - Key Laboratory of Research and Development on Clinical Molecular Diagnosis for High-Incidence Diseases of Baise, Guangxi, 533000, China
  - Key Laboratory of Research on Clinical Molecular Diagnosis for High Incidence Diseases in Western Guangxi of Guangxi Higher Education Institutions, Guangxi, 533000, China
6
Cai Y, Zhou N, Zhao J, Li W, Wang S. CSSEC: An adaptive approach integrating consensus and specific self-expressive coefficients for multi-omics cancer subtyping. Methods 2025; 235:26-33. [PMID: 39880224 DOI: 10.1016/j.ymeth.2025.01.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2024] [Revised: 01/05/2025] [Accepted: 01/16/2025] [Indexed: 01/31/2025] Open
Abstract
Cancer is a complex and heterogeneous disease, and accurate cancer subtyping can significantly improve patient survival rates. The complexity of cancer spans multiple omics levels, and analyzing multi-omics data for cancer subtyping has become a major focus of research. However, extracting complementary information from different omics data sources and adaptively integrating them remains a major challenge. To address this, we proposed an adaptive approach integrating consensus and specific self-expressive coefficients for multi-omics cancer subtyping (CSSEC). First, independent self-expressive networks are applied to each omics layer to calculate coefficient matrices that measure patient similarity. Then, two feature graph convolutional network modules capture consensus and specific similarity features using the top-K relevant features. Finally, the multi-omics self-expression coefficient matrix is constructed from the consensus and specific similarity features. Furthermore, joint consistency and disparity constraints are applied to regularize the fusion of the self-expressive coefficients. Experimental results demonstrate that CSSEC outperforms existing state-of-the-art methods in survival analysis. Moreover, case studies on kidney cancer confirm that the cancer subtypes identified by CSSEC are biologically significant. The complete code is available at https://github.com/ykxhs/CSSEC.
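For orientation, the self-expressive idea can be written in the generic form used in the subspace-clustering literature. This is not the CSSEC objective, which the abstract does not state. The symbols are assumptions introduced here: $X_v \in \mathbb{R}^{n \times d_v}$ holds one row per patient for omics layer $v$, $C_v \in \mathbb{R}^{n \times n}$ is the coefficient matrix, $\Omega$ is an unspecified regularizer with weight $\lambda$, and $W_v$ is a symmetric patient-similarity matrix derived from $C_v$.

```latex
\min_{C_v}\; \lVert X_v - C_v X_v \rVert_F^2 + \lambda\, \Omega(C_v)
\quad \text{s.t.} \quad \operatorname{diag}(C_v) = 0,
\qquad
W_v = \tfrac{1}{2}\left( \lvert C_v \rvert + \lvert C_v \rvert^{\top} \right)
```

Each patient is reconstructed as a combination of the other patients; the zero-diagonal constraint rules out the trivial solution in which every patient reconstructs itself, and large entries of $W_v$ indicate patients that are likely to belong to the same subtype.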
Affiliation(s)
- Yueyi Cai
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Nan Zhou
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Junran Zhao
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Weihua Li
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
7
Taunk K, Jajula S, Bhavsar PP, Choudhari M, Bhanuse S, Tamhankar A, Naiya T, Kalita B, Rapole S. The prowess of metabolomics in cancer research: current trends, challenges and future perspectives. Mol Cell Biochem 2025; 480:693-720. [PMID: 38814423 DOI: 10.1007/s11010-024-05041-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 05/18/2024] [Indexed: 05/31/2024]
Abstract
Cancer, due to its heterogeneous nature and high prevalence, has tremendous socioeconomic impacts on populations across the world. Therefore, it is crucial to discover effective panels of biomarkers for diagnosing cancer at an early stage. Cancer leads to alterations in cell growth and differentiation at the molecular level, some of which are unique. Comprehending these alterations can aid in a better understanding of the disease pathology and in identifying biomolecules that can serve as effective biomarkers for cancer diagnosis. Metabolites, among other biomolecules of interest, play a key role in the pathophysiology of cancer: their levels are significantly altered during 'reprogramming of energy metabolism', a cellular condition favored in cancer cells and one of the hallmarks of cancer. Metabolomics, an emerging omics technology, has tremendous potential to contribute towards the goal of investigating cancer metabolites and the metabolic alterations that occur during the development of cancer. Diverse metabolites can be screened in a variety of biofluids and tumor tissues sampled from cancer patients and compared against healthy controls to capture the altered metabolism. In this review, we provide an overview of different metabolomics approaches employed in cancer research and the potential of metabolites as biomarkers for cancer diagnosis. In addition, we discuss the challenges associated with metabolomics-driven cancer research and consider the prospects of this emerging field.
Affiliation(s)
- Khushman Taunk
  - Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
  - Department of Biotechnology, Maulana Abul Kalam Azad University of Technology, West Bengal, NH12 Simhat, Haringhata, Nadia, West Bengal, 741249, India
- Saikiran Jajula
  - Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
- Praneeta Pradip Bhavsar
  - Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
- Mahima Choudhari
  - Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
- Sadanand Bhanuse
  - Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
- Anup Tamhankar
  - Department of Surgical Oncology, Deenanath Mangeshkar Hospital and Research Centre, Erandawne, Pune, Maharashtra, 411004, India
- Tufan Naiya
  - Department of Biotechnology, Maulana Abul Kalam Azad University of Technology, West Bengal, NH12 Simhat, Haringhata, Nadia, West Bengal, 741249, India
- Bhargab Kalita
  - Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
  - Amrita School of Nanosciences and Molecular Medicine, Amrita Institute of Medical Sciences and Research Centre, Amrita Vishwa Vidyapeetham, Ponekkara, Kochi, Kerala, 682041, India
- Srikanth Rapole
  - Proteomics Lab, National Centre for Cell Science, Ganeshkhind, Pune, Maharashtra, 411007, India
8
Dong X, Zhang K, Yi S, Wang L, Wang X, Li M, Liang S, Wang Y, Zeng Y. Multi-omics profiling combined with molecular docking reveals immune-inflammatory proteins as potential drug targets in colorectal cancer. Biochem Biophys Res Commun 2024; 739:150598. [PMID: 39213754 DOI: 10.1016/j.bbrc.2024.150598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2024] [Revised: 08/20/2024] [Accepted: 08/22/2024] [Indexed: 09/04/2024]
Abstract
Colorectal cancer (CRC) is globally ranked as the third most common malignant tumor. Its development involves a complex biological process driven by various genetic and epigenetic alterations. To elucidate the biological significance of the extensive omics data, we conducted comparative multi-omics studies on colorectal cancer patients at different clinical stages. Bioinformatics methods were applied to analyze multi-omics datasets and explore the molecular landscape. Drug prediction and molecular docking were also conducted to assess potential therapeutic interventions. In vitro experiments were used to validate the inhibitory effect on the migration and proliferation of cell lines. The results indicate that proteins involved in immune-inflammatory pathways are up-regulated, while biomarkers related to muscular contraction and cell adhesion are significantly down-regulated. Drug prediction, coupled with in vitro experiments, suggests that AZ-628 may act as a potential drug to inhibit the proliferation and migration of CRC cell lines HCT-116 and HT-29 by regulating the aforementioned key biological pathways or proteins. Complementing these findings, metabolomics analysis unveiled a down-regulation of key carbon metabolism pathways, alongside an up-regulation in amino acid metabolism, particularly proline metabolism. This metabolic shift may reflect an adaptive response in cancer cells, favoring specific amino acids to support their growth. Together, these integrated results provide valuable insights into the intricate landscape of tumor development and cancer pathology, highlighting the crossroads of immune regulation, cellular structure, and metabolic reprogramming in the tumorigenic process.
Affiliation(s)
- Xiaoping Dong
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, Hunan, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, China
- Kun Zhang
  - The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha, 410081, Hunan, China
- Siwei Yi
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, Hunan, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, China
- Lingxiang Wang
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, Hunan, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, China; The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha, 410081, Hunan, China
- Xingyao Wang
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, Hunan, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, China
- Mengtuo Li
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, Hunan, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, China
- Songping Liang
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, Hunan, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, China
- YongJun Wang
  - Department of Gastroenterology, The Second Xiangya Hospital, Central South University, Changsha, 410011, Hunan, China
- Yong Zeng
  - National & Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, Hunan, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, China; The State Key Laboratory of Developmental Biology of Freshwater Fish, College of Life Science, Hunan Normal University, Changsha, 410081, Hunan, China
9
Zhou Z, Yang X. An update review of the application of single-cell RNA sequencing in pregnancy-related diseases. Front Endocrinol (Lausanne) 2024; 15:1415173. [PMID: 39717096 PMCID: PMC11663665 DOI: 10.3389/fendo.2024.1415173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Accepted: 11/22/2024] [Indexed: 12/25/2024] Open
Abstract
Reproductive success hinges on the presence of a robust and functional placenta. Examining the placenta provides insight into the progression of pregnancy and valuable information about the normal developmental trajectory of the fetus. The current limitations of bulk RNA-sequencing (RNA-seq) analysis stem from the diverse cellular composition of the placenta, which hinders a comprehensive description of how distinct trophoblast cell expression patterns contribute to the establishment and sustenance of a successful pregnancy. At present, the study of the transcriptional landscape of intricate tissues increasingly relies on single-cell RNA sequencing (scRNA-seq). A few investigations have utilized scRNA-seq technology to examine the codes governing transcriptome regulation in cells at the maternal-fetal interface. In this review, we explore the fundamental principles of scRNA-seq technology, offering the latest overview of human placental studies utilizing this method across various gestational weeks in both normal pregnancies and pregnancy-related diseases, including recurrent pregnancy loss (RPL), preeclampsia (PE), preterm birth, and gestational diabetes mellitus (GDM). Furthermore, we discuss the limitations and future perspectives of scRNA-seq technology within the realm of reproduction. scRNA-seq appears to be one of the crucial tools for studying the etiology of pregnancy complications. The future direction of scRNA-seq applications may involve delving into functional biology, with a primary focus on understanding variations in transcriptional activity among highly specific cell populations. Our goal is to provide obstetricians with an updated and comprehensive understanding of scRNA-seq technology as it relates to pregnancy complications, to aid in the diagnosis and treatment of these conditions and ultimately improve maternal and fetal prognosis.
Affiliation(s)
- Xiuhua Yang
  - Department of Obstetrics, The First Hospital of China Medical University, Shenyang, China
10
Lee JY, Park W, Kim H, Lee HS, Kang TW, Shin DH, Kim KS, Lee YK, Kim SY, Park JH, Kim YJ. Multi-omics analysis sandbox toolkit for swift derivations of clinically relevant genesets and biomarkers. BMB Rep 2024; 57:521-526. [PMID: 38919019 PMCID: PMC11693602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 10/04/2023] [Accepted: 02/05/2024] [Indexed: 06/27/2024] Open
Abstract
The utilization of multi-omics research has gained popularity in clinical investigations. However, effectively managing and merging extensive and diverse datasets presents a challenge due to its intricacy. This research introduces a Multi-Omics Analysis Sandbox Toolkit, an online platform designed to facilitate the exploration, integration, and visualization of datasets ranging from single-omics to multi-omics. This platform establishes connections between clinical data and omics information, allowing for versatile analysis and storage of both single and multi-omics data. Additionally, users can repeatedly utilize and exchange their findings within the platform. This toolkit offers diverse alternatives for data selection and gene set analysis. It also presents visualization outputs, potential candidates, and annotations. Furthermore, this platform empowers users to collaborate by sharing their datasets, analyses, and conclusions with others, thus enhancing its utility as a collaborative research tool. This Multi-Omics Analysis Sandbox Toolkit stands as a valuable asset in comprehensively grasping the influence of diverse factors in diseases and pinpointing potential biomarkers. [BMB Reports 2024; 57(12): 521-526].
Affiliation(s)
- Jin-Young Lee
  - Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Won Park
  - The Moagen, Inc., Daejeon 35368, Korea
- Hong Seok Lee
  - Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Seon-Young Kim
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Korea
- Ji Hwan Park
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Korea
- Young-Joon Kim
  - Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
11
|
Oróstica K, Mardones F, Bernal YA, Molina S, Orchard M, Verdugo RA, Carvajal-Hausdorf D, Marcelain K, Contreras S, Armisen R. Advances in machine learning for tumour classification in cancer of unknown primary: A mini-review. Cancer Lett 2024; 611:217348. [PMID: 39613220 DOI: 10.1016/j.canlet.2024.217348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 11/20/2024] [Accepted: 11/21/2024] [Indexed: 12/01/2024]
Abstract
Cancers of unknown primary (CUP) are a heterogeneous group of aggressive metastatic cancers in which standardised diagnostic techniques fail to identify the organ of origin, resulting in a poor prognosis and resistance to treatment. Recent advances in large-scale sequencing techniques have enabled the identification of mutational signatures specific to particular tumour subtypes, even from liquid biopsy samples such as blood. This breakthrough paves the way for the development of new cost-effective diagnostic strategies. This mini-review explores recent advancements in Machine Learning (ML) and its application to tumour classification methods for CUP patients, identifying its weaknesses and strengths when classifying the tumour type. In the era of multi-omics, integrating several sources of information (e.g., imaging, molecular biomarkers, and family history) requires important theoretical advancements: increasing the dimensionality of the problem can lower predictive accuracy and robustness when data are scarce. Here, we review and discuss different architectures and strategies for incorporating cutting-edge machine learning into CUP diagnosis, aiming to bridge the gap between theory and clinical practice.
Affiliation(s)
- Karen Oróstica
  - Facultad de Medicina, Universidad de Talca, Talca, Chile
- Yanara A Bernal
  - Centro de Genética y Genómica, Instituto de Ciencias e Innovación en Medicina, Facultad de Medicina Clínica Alemana Universidad del Desarrollo, Santiago, Chile
- Samuel Molina
  - Department of Electrical Engineering, Faculty of Physical and Mathematical Sciences, University of Chile, Av. Tupper 2007, Casilla 412-3, Santiago, 8370451, Chile
- Marcos Orchard
  - Department of Electrical Engineering, Faculty of Physical and Mathematical Sciences, University of Chile, Av. Tupper 2007, Casilla 412-3, Santiago, 8370451, Chile
- Ricardo A Verdugo
  - Facultad de Medicina, Universidad de Talca, Talca, Chile; Departamento de Oncología Básico Clínica, Facultad de Medicina, Universidad de Chile, Santiago, Chile
- Daniel Carvajal-Hausdorf
  - Anatomia Patológica, Clinica Alemana, Facultad de Medicina Universidad del Desarrollo, Santiago, Chile
- Katherine Marcelain
  - Departamento de Oncología Básico Clínica, Facultad de Medicina, Universidad de Chile, Santiago, Chile; Centro Para La Prevención y el Control del Cáncer, Universidad de Chile, Santiago, Chile
- Seba Contreras
  - Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany
- Ricardo Armisen
  - Centro de Genética y Genómica, Instituto de Ciencias e Innovación en Medicina, Facultad de Medicina Clínica Alemana Universidad del Desarrollo, Santiago, Chile

12
Briscik M, Tazza G, Vidács L, Dillies MA, Déjean S. Supervised multiple kernel learning approaches for multi-omics data integration. BioData Min 2024; 17:53. [PMID: 39580456 PMCID: PMC11585117 DOI: 10.1186/s13040-024-00406-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 11/14/2024] [Indexed: 11/25/2024] Open
Abstract
BACKGROUND Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently an issue for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach to consider the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining. RESULTS We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can outperform more complex, state-of-the-art, supervised multi-omics integrative approaches. CONCLUSION Multiple kernel learning offers a natural framework for predictive models in multi-omics data. It proved to be a fast and reliable solution that can compete with and outperform more complex architectures. Our results offer a direction for bio-data mining research, biomarker discovery and further development of methods for heterogeneous data integration.
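The kernel-fusion idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes one RBF kernel per omics block and a fixed, uniform convex combination as the "meta-kernel", whereas the paper describes several fusion strategies. The data, block names (`expr`, `meth`) and `gamma` values are hypothetical.

```python
import math

def rbf_kernel(X, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) over the rows of X."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K

def fuse_kernels(kernels, weights=None):
    """Convex combination of same-sized Gram matrices (a simple meta-kernel)."""
    m = len(kernels)
    if weights is None:
        weights = [1.0 / m] * m  # assumption: uniform weights
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]

# Toy data: 3 samples measured in two hypothetical omics blocks.
expr = [[0.0, 1.0], [0.1, 0.9], [2.0, 3.0]]   # e.g. expression features
meth = [[1.0], [1.2], [0.0]]                  # e.g. methylation features

K_fused = fuse_kernels([rbf_kernel(expr, gamma=0.5), rbf_kernel(meth, gamma=1.0)])
```

A fused Gram matrix of this kind could then be handed to any kernel method that accepts a precomputed kernel, for example a support vector machine; that step is omitted here to keep the sketch dependency-free.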
Affiliation(s)
- Mitja Briscik
  - Institut de Mathématiques de Toulouse, UMR5219, CNRS, UPS, Université de Toulouse, Cedex 9, Toulouse, 31062, France
- Gabriele Tazza
  - Department of Computer Science, Applied Artificial Intelligence Group, University of Szeged, Szeged, 6720, Hungary
- László Vidács
  - Department of Computer Science, Applied Artificial Intelligence Group, University of Szeged, Szeged, 6720, Hungary
- Marie-Agnès Dillies
  - Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, F-75015, Paris, France
- Sébastien Déjean
  - Institut de Mathématiques de Toulouse, UMR5219, CNRS, UPS, Université de Toulouse, Cedex 9, Toulouse, 31062, France

13
Sakagianni A, Koufopoulou C, Koufopoulos P, Kalantzi S, Theodorakis N, Nikolaou M, Paxinou E, Kalles D, Verykios VS, Myrianthefs P, Feretzakis G. Data-Driven Approaches in Antimicrobial Resistance: Machine Learning Solutions. Antibiotics (Basel) 2024; 13:1052. [PMID: 39596745 PMCID: PMC11590962 DOI: 10.3390/antibiotics13111052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Revised: 10/25/2024] [Accepted: 10/29/2024] [Indexed: 11/29/2024] Open
Abstract
Background/Objectives: The emergence of antimicrobial resistance (AMR) due to the misuse and overuse of antibiotics has become a critical threat to global public health. There is a dire need to forecast AMR and to understand the underlying mechanisms of resistance in order to develop effective interventions. This paper explores the capability of machine learning (ML) methods, particularly unsupervised learning methods, to enhance the understanding and prediction of AMR. It aims to identify patterns in AMR gene data that are clinically relevant and capable of informing public health strategies. Methods: We analyzed AMR gene data in the PanRes dataset by applying unsupervised learning techniques, namely K-means clustering and Principal Component Analysis (PCA). These techniques were applied to identify clusters based on gene length and distribution according to resistance class, offering insights into the resistance genes' structural and functional properties. Data preprocessing, such as filtering and normalization, was conducted prior to applying machine learning methods to ensure consistency and accuracy. Our methodology included the preprocessing of data and reduction of dimensionality to ensure that our models were both accurate and interpretable. Results: The unsupervised learning models highlighted distinct clusters of AMR genes, with significant patterns in gene length and associated resistance classes. Further dimensionality reduction by PCA allowed for clearer visualizations of relationships among gene groupings. These patterns provide novel insights into the potential mechanisms of resistance, particularly the role of gene length in different resistance pathways. Conclusions: This study demonstrates the potential of ML, specifically unsupervised approaches, to enhance the understanding of AMR. The identified patterns in resistance genes could support clinical decision-making and inform public health interventions. 
However, challenges remain, particularly in integrating genomic data and ensuring model interpretability. Further research is needed to advance ML applications in AMR prediction and management.
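As an illustration of the clustering step described in this abstract, the following is a minimal sketch of K-means (Lloyd's algorithm) on a single feature, gene length. It is not the authors' pipeline: the lengths are made-up values, not PanRes data, the deterministic initialisation is an assumption chosen for reproducibility, and the PCA step is not shown.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: start from k evenly spaced order statistics (deterministic).
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Recompute each centroid as its cluster mean (keep it if the cluster is empty).
        updated = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
    return centroids, labels

# Hypothetical gene lengths in base pairs.
gene_lengths = [780, 810, 795, 1200, 1185, 1230, 2400, 2450]
centroids, labels = kmeans_1d(gene_lengths, k=3)
```

In practice the resulting cluster labels would be cross-tabulated against resistance class to look for the length patterns the abstract reports.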
Affiliation(s)
- Aikaterini Sakagianni
  - Intensive Care Unit, Sismanogleio General Hospital, 37 Sismanogleiou Str., 15126 Marousi, Greece
- Christina Koufopoulou
  - Anesthesiology Department, Aretaieio University Hospital, National and Kapodistrian University of Athens, Vass. Sofias 76, 11528 Athens, Greece
- Petros Koufopoulos
  - Department of Internal Medicine, Sismanogleio General Hospital, 15126 Marousi, Greece
- Sofia Kalantzi
  - Department of Internal Medicine & 65+ Clinic, Amalia Fleming General Hospital, 14, 25th Martiou Str., 15127 Athens, Greece
- Nikolaos Theodorakis
  - Department of Cardiology & 65+ Clinic, Amalia Fleming General Hospital, 14, 25th Martiou Str., 15127 Athens, Greece
- Maria Nikolaou
  - Department of Cardiology & 65+ Clinic, Amalia Fleming General Hospital, 14, 25th Martiou Str., 15127 Athens, Greece
- Evgenia Paxinou
  - School of Science and Technology, Hellenic Open University, 18 Aristotelous Str., 26335 Patras, Greece
- Dimitris Kalles
  - School of Science and Technology, Hellenic Open University, 18 Aristotelous Str., 26335 Patras, Greece
- Vassilios S. Verykios
  - School of Science and Technology, Hellenic Open University, 18 Aristotelous Str., 26335 Patras, Greece
- Pavlos Myrianthefs
  - Faculty of Nursing, School of Health Sciences, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Georgios Feretzakis
  - School of Science and Technology, Hellenic Open University, 18 Aristotelous Str., 26335 Patras, Greece

14
Shi C, Cheng L, Yu Y, Chen S, Dai Y, Yang J, Zhang H, Chen J, Geng N. Multi-omics integration analysis: Tools and applications in environmental toxicology. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2024; 360:124675. [PMID: 39103035 DOI: 10.1016/j.envpol.2024.124675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 07/08/2024] [Accepted: 08/03/2024] [Indexed: 08/07/2024]
Abstract
Traditional single-omics studies are no longer sufficient to explain the causality between molecular alterations and toxicity endpoints for environmental pollutants. With the development of high-throughput sequencing technology and high-resolution mass spectrometry technology, the integrative analysis of multi-omics has become an efficient strategy to understand holistic biological mechanisms and to uncover the regulation network in specific biological processes. This review summarizes sample preparation methods, integration analysis tools and the application of multi-omics integration analyses in the field of environmental toxicology. Currently, omics methods are widely applied because of their sensitivity to early biological responses, especially for low-dose and long-term exposure to environmental pollutants. Integrative omics can reveal the overall changes of genes, proteins, and/or metabolites in cells, tissues or organisms, which provides new insights for revealing overall toxicity effects, screening toxic targets, and exploring the underlying molecular mechanisms of pollutants.
Affiliation(s)
- Chengcheng Shi
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China; College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, China
- Lin Cheng
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Ying Yu
  - College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, China
- Shuangshuang Chen
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China; College of Environmental Science and Engineering, Dalian Maritime University, Dalian, 116026, China
- Yubing Dai
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Jiajia Yang
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China; College of Materials Science and Engineering, Hebei University of Engineering, Handan, 056038, China
- Haijun Zhang
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Jiping Chen
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
- Ningbo Geng
  - CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China

15
Musib L, Coletti R, Lopes MB, Mouriño H, Carrasquinha E. Priority-Elastic net for binary disease outcome prediction based on multi-omics data. BioData Min 2024; 17:45. [PMID: 39472942 PMCID: PMC11523883 DOI: 10.1186/s13040-024-00401-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Accepted: 10/15/2024] [Indexed: 11/02/2024] Open
Abstract
BACKGROUND High-dimensional omics data integration has emerged as a prominent avenue within the healthcare industry, presenting substantial potential to improve predictive models. However, the data integration process faces several challenges, including data heterogeneity, the priority sequence in which data blocks are used when predictive information is contained in multiple blocks, assessment of the flow of information from one omics level to the other, and multicollinearity. METHODS We propose the Priority-Elastic net algorithm, a hierarchical regression method extending Priority-Lasso for the binary logistic regression model by incorporating a priority order for blocks of variables while fitting Elastic-net models sequentially for each block. The fitted values from each step are then used as an offset in the subsequent step. Additionally, we considered the adaptive elastic-net penalty within our priority framework to compare the results. RESULTS The Priority-Elastic net and Priority-Adaptive Elastic net algorithms were evaluated on a brain tumor dataset available from The Cancer Genome Atlas (TCGA), accounting for transcriptomics, proteomics, and clinical information measured over two glioma types: Lower-grade glioma (LGG) and glioblastoma (GBM). CONCLUSION Our findings suggest that the Priority-Elastic net is a highly advantageous choice for a wide range of applications. It offers moderate computational complexity, flexibility in integrating prior knowledge while introducing a hierarchical modeling perspective, and, importantly, improved stability and accuracy in predictions, making it superior to the other methods discussed. This evolution marks a significant step forward in predictive modeling, offering a sophisticated tool for navigating the complexities of multi-omics datasets in pursuit of precision medicine's ultimate goal: personalized treatment optimization based on a comprehensive array of patient-specific data. 
This framework can be generalized to time-to-event, Cox proportional hazards regression and multicategorical outcomes. A practical implementation of this method is available upon request in R script, complete with an example to facilitate its application.
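The sequential offset scheme described in the METHODS can be sketched as follows. This is an illustration under stated assumptions, not the authors' R implementation: the optimiser is a plain proximal-gradient loop (soft-thresholding for the L1 part), the hyperparameters `alpha`, `lam`, `lr` and `epochs` are arbitrary, and the two blocks (`clinical`, `omics`) are toy data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_block(X, y, offset, alpha=0.5, lam=0.01, lr=0.1, epochs=500):
    """Logistic regression on one block with a fixed per-sample offset.

    Penalty: lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        resid = [
            sigmoid(offset[i] + b + sum(wj * xj for wj, xj in zip(w, X[i]))) - y[i]
            for i in range(n)
        ]
        b -= lr * sum(resid) / n  # intercept is not penalised
        for j in range(p):
            grad = sum(resid[i] * X[i][j] for i in range(n)) / n
            grad += lam * (1 - alpha) * w[j]          # ridge part
            step = w[j] - lr * grad
            shrunk = max(abs(step) - lr * lam * alpha, 0.0)  # lasso part
            w[j] = math.copysign(shrunk, step)
    return w, b

def priority_fit(blocks, y, **kwargs):
    """Fit blocks in priority order; each fit's linear predictor becomes the next offset."""
    offset = [0.0] * len(y)
    models = []
    for X in blocks:
        w, b = fit_block(X, y, offset, **kwargs)
        models.append((w, b))
        offset = [
            offset[i] + b + sum(wj * xj for wj, xj in zip(w, X[i]))
            for i in range(len(y))
        ]
    return models, offset

# Toy example: a clinical block is given priority over an omics block.
y = [0, 0, 1, 1, 0, 1]
clinical = [[-1.0], [-0.8], [0.9], [1.1], [-0.5], [0.6]]
omics = [[0.2, 1.0], [-0.1, 0.8], [0.3, -0.9], [0.0, -1.1], [0.1, 0.7], [-0.2, -0.6]]

models, final_offset = priority_fit([clinical, omics], y)
probs = [sigmoid(z) for z in final_offset]
```

Because each later block only models what the higher-priority blocks left unexplained, the order of `blocks` encodes the prior knowledge about which data source should be trusted first.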
Affiliation(s)
- Laila Musib
  - Departamento de Estatística e Investigação Operacional, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisboa, 1749-016, Portugal
  - CEAUL - Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisbon, 1749-016, Portugal
- Roberta Coletti
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology (NOVA FCT), Caparica, 2829-516, Portugal
- Marta B Lopes
  - Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology (NOVA FCT), Caparica, 2829-516, Portugal
  - UNIDEMI, Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology (NOVA FCT), Caparica, 2829-516, Portugal
- Helena Mouriño
  - Departamento de Estatística e Investigação Operacional, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisboa, 1749-016, Portugal
  - CEAUL - Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisbon, 1749-016, Portugal
- Eunice Carrasquinha
  - Departamento de Estatística e Investigação Operacional, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisboa, 1749-016, Portugal
  - CEAUL - Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Lisbon, 1749-016, Portugal

16
Khan MZ, Chen W, Wang X, Liang H, Wei L, Huang B, Kou X, Liu X, Zhang Z, Chai W, Khan A, Peng Y, Wang C. A review of genetic resources and trends of omics applications in donkey research: focus on China. Front Vet Sci 2024; 11:1366128. [PMID: 39464628 PMCID: PMC11502298 DOI: 10.3389/fvets.2024.1366128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Accepted: 09/12/2024] [Indexed: 10/29/2024] Open
Abstract
Omics methodologies, such as genomics, transcriptomics, proteomics, metabolomics, lipidomics and microbiomics, have revolutionized biological research by allowing comprehensive molecular analysis in livestock animals. However, despite being widely used in various animal species, research on donkeys has been notably scarce. China, renowned for its rich history in donkey husbandry, plays a pivotal role in their conservation and utilization. China boasts 24 distinct donkey breeds, necessitating conservation efforts, especially for smaller breeds facing extinction threats. So far, omics approaches have been employed in studies of donkey milk and meat, shedding light on their composition and quality. Similarly, omics methods have been utilized to explore the molecular basis associated with donkey growth, meat production, and quality traits. Omics analysis has also unraveled the critical role of donkey microbiota in health and nutrition, with gut microbiome studies revealing associations with factors such as pregnancy, age, transportation stress, and altitude. Furthermore, omics applications have addressed donkey health issues, including infectious diseases and reproductive problems. In addition, these applications have also provided insights into the improvement of donkey reproductive efficiency research. In conclusion, omics methodologies are essential for advancing knowledge about donkeys, their genetic diversity, and their applications across various domains. However, omics research in donkeys is still in its infancy, and there is a need for continued research to enhance donkey breeding, production, and welfare in China and beyond.
Affiliation(s)
- Muhammad Zahoor Khan
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Wenting Chen
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Xinrui Wang
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Huili Liang
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Lin Wei
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Bingjian Huang
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Xiyan Kou
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Xiaotong Liu
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Zhenwei Zhang
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Wenqiong Chai
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Adnan Khan
  - Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Yongdong Peng
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China
- Changfa Wang
  - Liaocheng Research Institute of Donkey High-Efficiency Breeding and Ecological Feeding, Liaocheng University, Liaocheng, China

17
Vitorino R. Transforming Clinical Research: The Power of High-Throughput Omics Integration. Proteomes 2024; 12:25. [PMID: 39311198 PMCID: PMC11417901 DOI: 10.3390/proteomes12030025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 08/31/2024] [Accepted: 09/02/2024] [Indexed: 09/26/2024] Open
Abstract
High-throughput omics technologies have dramatically changed biological research, providing unprecedented insights into the complexity of living systems. This review presents a comprehensive examination of the current landscape of high-throughput omics pipelines, covering key technologies, data integration techniques and their diverse applications. It looks at advances in next-generation sequencing, mass spectrometry and microarray platforms and highlights their contribution to data volume and precision. In addition, this review looks at the critical role of bioinformatics tools and statistical methods in managing the large datasets generated by these technologies. By integrating multi-omics data, researchers can gain a holistic understanding of biological systems, leading to the identification of new biomarkers and therapeutic targets, particularly in complex diseases such as cancer. The review also looks at the integration of omics data into electronic health records (EHRs) and the potential for cloud computing and big data analytics to improve data storage, analysis and sharing. Despite significant advances, there are still challenges such as data complexity, technical limitations and ethical issues. Future directions include the development of more sophisticated computational tools and the application of advanced machine learning techniques, which are critical for addressing the complexity and heterogeneity of omics datasets. This review aims to serve as a valuable resource for researchers and practitioners, highlighting the transformative potential of high-throughput omics technologies in advancing personalized medicine and improving clinical outcomes.
Affiliation(s)
- Rui Vitorino
  - iBiMED, Department of Medical Sciences, University of Aveiro, 3810-193 Aveiro, Portugal
  - Department of Surgery and Physiology, Cardiovascular R&D Centre—UnIC@RISE, Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal

18
Li M, Cai Y, Zhang M, Deng S, Wang L. NNBGWO-BRCA marker: Neural Network and binary grey wolf optimization based Breast cancer biomarker discovery framework using multi-omics dataset. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 254:108291. [PMID: 38909399 DOI: 10.1016/j.cmpb.2024.108291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 05/09/2024] [Accepted: 06/16/2024] [Indexed: 06/25/2024]
Abstract
BACKGROUND AND OBJECTIVE Breast cancer is a multifaceted condition characterized by diverse features and a substantial mortality rate, underscoring the imperative for timely detection and intervention. The utilization of multi-omics data has gained significant traction in recent years to identify biomarkers and classify subtypes in breast cancer. This part-to-whole research approach will also be an inevitable trend in future life science research. Deep learning can integrate and analyze multi-omics data to predict cancer subtypes, which can further drive targeted therapies. However, few articles leverage the nature of deep learning for feature selection. Therefore, this paper proposes a Neural Network and Binary grey Wolf Optimization based BReast CAncer bioMarker (NNBGWO-BRCAMarker) discovery framework using multi-omics data to obtain a series of biomarkers for precise classification of breast cancer subtypes. METHODS NNBGWO-BRCAMarker consists of two phases: in the first phase, relevant genes are selected using the weights obtained from a trained feedforward neural network; in the second phase, the binary grey wolf optimization algorithm is leveraged to further screen the selected genes, resulting in a set of potential breast cancer biomarkers. RESULTS In the experiments, the SVM classifier with RBF kernel achieved a classification accuracy of 0.9242 ± 0.03 when trained using the 80 biomarkers identified by NNBGWO-BRCAMarker. We conducted a comprehensive gene set analysis, prognostic analysis, and druggability analysis, unveiling 25 druggable genes, 16 enriched pathways strongly linked to specific subtypes of breast cancer, and 8 genes linked to prognostic outcomes. CONCLUSIONS The proposed framework successfully identified 80 biomarkers from the multi-omics data, enabling accurate classification of breast cancer subtypes. This discovery may offer novel insights for clinicians to pursue in further studies.
Affiliation(s)
- Min Li
  - School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
- Yuheng Cai
  - School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
- Mingzhuang Zhang
  - School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
- Shaobo Deng
  - School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China
- Lei Wang
  - School of Information Engineering, Nanchang Institute of Technology, No. 289 Tianxiang Road, Nanchang Jiangxi, PR China

19
Zhang H, Huang D, Chen E, Cao D, Xu T, Dizdar B, Li G, Chen Y, Payne P, Province M, Li F. mosGraphGPT: a foundation model for multi-omic signaling graphs using generative AI. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.01.606222. [PMID: 39149314 PMCID: PMC11326168 DOI: 10.1101/2024.08.01.606222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Generative pretrained models represent a significant advancement in natural language processing and computer vision; they can generate coherent and contextually relevant content based on pre-training on large general datasets and fine-tuning for specific tasks. Building foundation models using large-scale omic data is a promising way to decode and understand the complex signaling language patterns within cells. Unlike existing foundation models of omic data, we build a foundation model, mosGraphGPT, for multi-omic signaling (mos) graphs, in which the multi-omic data are integrated and interpreted using a multi-level signaling graph. The model was pretrained using multi-omic data of cancers in The Cancer Genome Atlas (TCGA) and fine-tuned on multi-omic data of Alzheimer's Disease (AD). The experimental evaluation showed that the model not only improves disease classification accuracy but is also interpretable, uncovering disease targets and signaling interactions. The model code is available on GitHub: https://github.com/mosGraph/mosGraphGPT.
Affiliation(s)
- Heming Zhang
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
- Di Huang
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
- Emily Chen
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
  - Department of Pediatrics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
  - School of Arts and Sciences, University of Rochester, Rochester, NY, 14627, USA
- Dekang Cao
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
  - Department of Computer Science and Engineering
- Tim Xu
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
  - Department of Computer Science and Engineering
- Ben Dizdar
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
  - Department of Computer Science and Engineering
- Guangfu Li
  - Department of Surgery, School of Medicine, University of Connecticut, CT, 06032, USA
- Yixin Chen
  - Department of Computer Science and Engineering
- Philip Payne
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
- Fuhai Li
  - Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine
  - Department of Pediatrics, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA

20
Jeyananthan P. Performance comparison between multi-level gene expression data in cancer subgroup classification. Pathol Res Pract 2024; 260:155419. [PMID: 38955118 DOI: 10.1016/j.prp.2024.155419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 06/06/2024] [Accepted: 06/19/2024] [Indexed: 07/04/2024]
Abstract
Cancer is a serious disease that can affect various parts of the body such as the breast, colon, lung or stomach. Each of these cancers has its own treatment-dependent historical subgroups. Hence, correct identification of the cancer subgroup is almost as important as timely diagnosis of the cancer. This is still a challenging task, and a system with the highest possible accuracy is essential. Current research is moving towards analyzing the gene expression data of cancer patients for various purposes, including biomarker identification and the study of differentially expressed genes, using gene expression data measured at a single level (selected from different gene levels including genome, transcriptome or translation). However, previous studies showed that the information carried by one level of gene expression is not similar to that carried by another. This shows the importance of integrating multi-level omics data in these studies. Hence, this study uses tumor gene expression data measured at various gene levels, along with the integration of those data, in the subgroup classification of nine different cancers. This is a comprehensive analysis in which four different types of gene expression data (transcriptome, miRNA, methylation and proteome) are used for subgrouping, and the performance of the models is compared to reveal the best model.
|
21
|
Lu Z, Xiao X, Zheng Q, Wang X, Xu L. Assessing next-generation sequencing-based computational methods for predicting transcriptional regulators with query gene sets. Brief Bioinform 2024; 25:bbae366. [PMID: 39082650 PMCID: PMC11289684 DOI: 10.1093/bib/bbae366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 06/21/2024] [Accepted: 07/18/2024] [Indexed: 08/03/2024] Open
Abstract
This article provides an in-depth review of computational methods for predicting transcriptional regulators (TRs) with query gene sets. Identification of TRs is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. Besides, we point out the limitations of NGS-based methods and explore potential directions for further improvement.
Affiliation(s)
- Zeyu Lu
- Department of Statistics and Data Science, Moody School of Graduate and Advanced Studies, Southern Methodist University, 3225 Daniel Ave., P.O. Box 750332, Dallas, TX, United States
| | - Xue Xiao
- Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, United States
| | - Qiang Zheng
- Division of Data Science, College of Science, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, United States
| | - Xinlei Wang
- Division of Data Science, College of Science, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, United States
- Department of Mathematics, University of Texas at Arlington, 411 S. Nedderman Dr., Arlington, TX 76019, United States
| | - Lin Xu
- Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, United States
- Department of Pediatrics, Division of Hematology/Oncology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, United States
| |
|
22
|
Yang H, Zhao L, Li D, An C, Fang X, Chen Y, Liu J, Xiao T, Wang Z. Subtype-WGME enables whole-genome-wide multi-omics cancer subtyping. Cell Rep Methods 2024; 4:100781. [PMID: 38761803 PMCID: PMC11228280 DOI: 10.1016/j.crmeth.2024.100781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 01/05/2024] [Accepted: 04/26/2024] [Indexed: 05/20/2024]
Abstract
We present an innovative strategy for integrating whole-genome-wide multi-omics data, which facilitates adaptive amalgamation by leveraging hidden layer features derived from high-dimensional omics data through a multi-task encoder. Empirical evaluations on eight benchmark cancer datasets substantiated that our proposed framework outstripped the comparative algorithms in cancer subtyping, delivering superior subtyping outcomes. Building upon these subtyping results, we establish a robust pipeline for identifying whole-genome-wide biomarkers, unearthing 195 significant biomarkers. Furthermore, we conduct an exhaustive analysis to assess the importance of each omic and non-coding region features at the whole-genome-wide level during cancer subtyping. Our investigation shows that both omics and non-coding region features substantially impact cancer development and survival prognosis. This study emphasizes the potential and practical implications of integrating genome-wide data in cancer research, demonstrating the potency of comprehensive genomic characterization. Additionally, our findings offer insightful perspectives for multi-omics analysis employing deep learning methodologies.
Affiliation(s)
- Hai Yang
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Liang Zhao
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Dongdong Li
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Congcong An
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Xiaoyang Fang
- Cornell Tech, Cornell University, New York, NY 14853, USA
| | - Yiwen Chen
- Center for Continuing and Lifelong Education, National University of Singapore, Singapore 119077, Singapore
| | - Jingping Liu
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Ting Xiao
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
| | - Zhe Wang
- Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China.
| |
|
23
|
Esquivel Gaytan A, Bomer N, Grote Beverborg N, van der Meer P. 404-error "Disease not found": Unleashing the translational potential of -omics approaches beyond traditional disease classification in heart failure research. Eur J Heart Fail 2024; 26:1313-1323. [PMID: 38741225 DOI: 10.1002/ejhf.3268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 03/15/2024] [Accepted: 04/14/2024] [Indexed: 05/16/2024] Open
Abstract
The emergence of personalized medicine, facilitated by the progress in -omics technologies, has initiated a new era in medical diagnostics and treatment. This review examines the potential of -omics approaches in heart failure, a condition that has not yet fully capitalized on personalized strategies compared to other medical fields like cancer therapy. Here, we argue that integrating multi-omics technology with systems medicine approaches could fundamentally transform heart failure management, moving away from the traditional paradigm of 'one size fits all'. Our review examines how omics can enhance understanding of heart failure's molecular foundations and contribute to a more comprehensive disease classification. We draw attention to the current state of medical practice that only relies on clinical evidence and a number of standard laboratory tests. At the same time, we propose a shift towards a universal approach that uses quantitative data from multi-omics to unravel complex molecular interactions. The discussion centres around the potential of the transition as a means to enhance individual risk assessment and emphasizes management within clinical settings. While the use of omics in cardiovascular research is not recent, many past studies have focused only on a single omics approach. In order to achieve a better understanding of disease mechanisms, we explore more holistic approaches using genomics, transcriptomics, epigenomics, and proteomics. This review concludes with a call to action to adopt multi-omics in clinical trials and practice to pave the way for more personalized disease management and more effective heart failure interventions.
Affiliation(s)
- Antonio Esquivel Gaytan
- Department of Cardiology, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands
| | - Nils Bomer
- Department of Cardiology, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands
| | - Niels Grote Beverborg
- Department of Cardiology, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands
| | - Peter van der Meer
- Department of Cardiology, University Medical Centre Groningen, University of Groningen, Groningen, The Netherlands
| |
|
24
|
Chakraborty S, Sharma G, Karmakar S, Banerjee S. Multi-OMICS approaches in cancer biology: New era in cancer therapy. Biochim Biophys Acta Mol Basis Dis 2024; 1870:167120. [PMID: 38484941 DOI: 10.1016/j.bbadis.2024.167120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Revised: 03/06/2024] [Accepted: 03/06/2024] [Indexed: 04/01/2024]
Abstract
Innovative multi-omics frameworks integrate diverse datasets from the same patients to enhance our understanding of the molecular and clinical aspects of cancers. Advanced omics and multi-view clustering algorithms present unprecedented opportunities for classifying cancers into subtypes, refining survival predictions and treatment outcomes, and unravelling key pathophysiological processes across various molecular layers. However, with the increasing availability of cost-effective high-throughput technologies (HTT) that generate vast amounts of data, analyzing single layers often falls short of establishing causal relations. Integrating multi-omics data spanning genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes offers unique prospects to comprehend the underlying biology of complex diseases like cancer. This discussion explores algorithmic frameworks designed to uncover cancer subtypes, disease mechanisms, and methods for identifying pivotal genomic alterations. It also underscores the significance of multi-omics in tumor classifications, diagnostics, and prognostications. Despite its unparalleled advantages, the integration of multi-omics data has been slow to find its way into everyday clinics. A major hurdle is the uneven maturity of different omics approaches and the widening gap between the generation of large datasets and the capacity to process this data. Initiatives promoting the standardization of sample processing and analytical pipelines, as well as multidisciplinary training for experts in data analysis and interpretation, are crucial for translating theoretical findings into practical applications.
Affiliation(s)
- Sohini Chakraborty
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
| | - Gaurav Sharma
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
| | - Sricheta Karmakar
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India
| | - Satarupa Banerjee
- Department of Biotechnology, School of Biosciences and Technology, Vellore Institute of Technology, Vellore 632014, Tamil Nadu, India.
| |
|
25
|
Sibilio P, Conte F, Huang Y, Castaldi PJ, Hersh CP, DeMeo DL, Silverman EK, Paci P. Correlation-based network integration of lung RNA sequencing and DNA methylation data in chronic obstructive pulmonary disease. Heliyon 2024; 10:e31301. [PMID: 38807864 PMCID: PMC11130701 DOI: 10.1016/j.heliyon.2024.e31301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 05/08/2024] [Accepted: 05/14/2024] [Indexed: 05/30/2024] Open
Abstract
Chronic Obstructive Pulmonary Disease (COPD) is a heterogeneous, chronic inflammatory process of the lungs and, like other complex diseases, is caused by both genetic and environmental factors. Detailed understanding of the molecular mechanisms of complex diseases requires the study of the interplay among different biomolecular layers, and thus the integration of different omics data types. In this study, we investigated COPD-associated molecular mechanisms through a correlation-based network integration of lung tissue RNA-seq and DNA methylation data of COPD cases (n = 446) and controls (n = 346) derived from the Lung Tissue Research Consortium. First, we performed a SWIM-network based analysis to build separate correlation networks for RNA-seq and DNA methylation data for our case-control study population. Then, we developed a method to integrate the results into a coupled network of differentially expressed and differentially methylated genes to investigate their relationships across both molecular layers. The functional enrichment analysis of the nodes of the coupled network revealed a strikingly significant enrichment in Immune System components, both innate and adaptive, as well as immune-system component communication (interleukin and cytokine-cytokine signaling). Our analysis allowed us to reveal novel putative COPD-associated genes and to analyze their relationships, both at the transcriptomics and epigenomics levels, thus contributing to an improved understanding of COPD pathogenesis.
Affiliation(s)
- Pasquale Sibilio
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- Institute for Systems Analysis and Computer Science “Antonio Ruberti”, National Research Council, Rome, Italy
| | - Federica Conte
- Institute for Systems Analysis and Computer Science “Antonio Ruberti”, National Research Council, Rome, Italy
| | - Yichen Huang
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Peter J. Castaldi
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Craig P. Hersh
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Dawn L. DeMeo
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Edwin K. Silverman
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Paola Paci
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
- Institute for Systems Analysis and Computer Science “Antonio Ruberti”, National Research Council, Rome, Italy
- Karolinska Institutet, 17177, Stockholm, Sweden
| |
|
26
|
Novoloaca A, Broc C, Beloeil L, Yu WH, Becker J. Comparative analysis of integrative classification methods for multi-omics data. Brief Bioinform 2024; 25:bbae331. [PMID: 38985929 PMCID: PMC11234228 DOI: 10.1093/bib/bbae331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 05/31/2024] [Indexed: 07/12/2024] Open
Abstract
Recent advances in sequencing, mass spectrometry, and cytometry technologies have enabled researchers to collect multiple 'omics data types from a single sample. These large datasets have led to a growing consensus that a holistic approach is needed to identify new candidate biomarkers and unveil mechanisms underlying disease etiology, a key to precision medicine. While many reviews and benchmarks have been conducted on unsupervised approaches, their supervised counterparts have received less attention in the literature and no gold standard has emerged yet. In this work, we present a thorough comparison of a selection of six methods, representative of the main families of intermediate integrative approaches (matrix factorization, multiple kernel methods, ensemble learning, and graph-based methods). As non-integrative control, random forest was performed on concatenated and separated data types. Methods were evaluated for classification performance on both simulated and real-world datasets, the latter being carefully selected to cover different medical applications (infectious diseases, oncology, and vaccines) and data modalities. A total of 15 simulation scenarios were designed from the real-world datasets to explore a large and realistic parameter space (e.g. sample size, dimensionality, class imbalance, effect size). On real data, the method comparison showed that integrative approaches performed better or equally well than their non-integrative counterpart. By contrast, DIABLO and the four random forest alternatives outperform the others across the majority of simulation scenarios. The strengths and limitations of these methods are discussed in detail as well as guidelines for future applications.
Affiliation(s)
- Alexei Novoloaca
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
| | - Camilo Broc
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
| | - Laurent Beloeil
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
| | - Wen-Han Yu
- Bill & Melinda Gates Medical Research Institute, Cambridge, Massachusetts, MA 02139, United States
| | - Jérémie Becker
- BIOASTER Research Institute, 40 avenue Tony Garnier, F-69007 Lyon, France
| |
|
27
|
Banerjee J, Tiwari AK, Banerjee S. Drug repurposing for cancer. Prog Mol Biol Transl Sci 2024; 207:123-150. [PMID: 38942535 DOI: 10.1016/bs.pmbts.2024.03.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/30/2024]
Abstract
In the dynamic landscape of cancer therapeutics, the innovative strategy of drug repurposing emerges as a transformative paradigm, heralding a new era in the fight against malignancies. This book chapter aims to embark on the comprehension of the strategic deployment of approved drugs for repurposing and the meticulous journey of drug repurposing from earlier times to the current era. Moreover, the chapter underscores the multifaceted and complex nature of cancer biology, and the evolving field of cancer drug therapeutics while emphasizing the mandate of drug repurposing to advance cancer therapeutics. Importantly, the narrative explores the latest tools, technologies, and cutting-edge methodologies including high-throughput screening, omics technologies, and artificial intelligence-driven approaches, for shaping and accelerating the pace of drug repurposing to uncover novel cancer therapeutic avenues. The chapter critically assesses the breakthroughs, expanding the repertoire of repurposing drug candidates in cancer, and their major categories. Another focal point of this book chapter is that it addresses the emergence of combination therapies involving repurposed drugs, reflecting a shift towards personalized and synergistic treatment approaches. The expert analysis delves into the intricacies of combinatorial regimens, elucidating their potential to target heterogeneous cancer populations and overcome resistance mechanisms, thereby enhancing treatment efficacy. Therefore, this chapter provides in-depth insights into the potential of repurposing towards bringing the much-needed big leap in the field of cancer therapeutics.
Affiliation(s)
- Juni Banerjee
- Department of Biotechnology and Bioengineering, Institute of Advanced Research (IAR), Gandhinagar, Gujarat, India
| | - Anand Krishna Tiwari
- Department of Biotechnology and Bioengineering, Institute of Advanced Research (IAR), Gandhinagar, Gujarat, India
| | - Shuvomoy Banerjee
- Department of Biotechnology and Bioengineering, Institute of Advanced Research (IAR), Gandhinagar, Gujarat, India.
| |
Collapse
|
28
|
Drouard G, Mykkänen J, Heiskanen J, Pohjonen J, Ruohonen S, Pahkala K, Lehtimäki T, Wang X, Ollikainen M, Ripatti S, Pirinen M, Raitakari O, Kaprio J. Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data. BMC Med Inform Decis Mak 2024; 24:116. [PMID: 38698395 PMCID: PMC11064347 DOI: 10.1186/s12911-024-02521-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 04/29/2024] [Indexed: 05/05/2024] Open
Abstract
BACKGROUND Machine learning (ML) classifiers are increasingly used for predicting cardiovascular disease (CVD) and related risk factors using omics data, although these outcomes often exhibit categorical nature and class imbalances. However, little is known about which ML classifier, omics data, or upstream dimension reduction strategy has the strongest influence on prediction quality in such settings. Our study aimed to illustrate and compare different machine learning strategies to predict CVD risk factors under different scenarios. METHODS We compared the use of six ML classifiers in predicting CVD risk factors using blood-derived metabolomics, epigenetics and transcriptomics data. Upstream omic dimension reduction was performed using either unsupervised or semi-supervised autoencoders, whose downstream ML classifier performance we compared. CVD risk factors included systolic and diastolic blood pressure measurements and ultrasound-based biomarkers of left ventricular diastolic dysfunction (LVDD; E/e' ratio, E/A ratio, LAVI) collected from 1,249 Finnish participants, of which 80% were used for model fitting. We predicted individuals with low, high or average levels of CVD risk factors, the latter class being the most common. We constructed multi-omic predictions using a meta-learner that weighted single-omic predictions. Model performance comparisons were based on the F1 score. Finally, we investigated whether learned omic representations from pre-trained semi-supervised autoencoders could improve outcome prediction in an external cohort using transfer learning. RESULTS Depending on the ML classifier or omic used, the quality of single-omic predictions varied. Multi-omics predictions outperformed single-omics predictions in most cases, particularly in the prediction of individuals with high or low CVD risk factor levels. Semi-supervised autoencoders improved downstream predictions compared to the use of unsupervised autoencoders. In addition, median gains in Area Under the Curve by transfer learning compared to modelling from scratch ranged from 0.09 to 0.14 and 0.07 to 0.11 units for transcriptomic and metabolomic data, respectively. CONCLUSIONS By illustrating the use of different machine learning strategies in different scenarios, our study provides a platform for researchers to evaluate how the choice of omics, ML classifiers, and dimension reduction can influence the quality of CVD risk factor predictions.
Affiliation(s)
- Gabin Drouard
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
| | - Juha Mykkänen
- Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
| | - Jarkko Heiskanen
- Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
| | - Joona Pohjonen
- Research Program in Systems Oncology, University of Helsinki, Helsinki, Finland
| | - Saku Ruohonen
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
| | - Katja Pahkala
- Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Paavo Nurmi Centre & Unit for Health and Physical Activity, University of Turku, Turku, Finland
| | - Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories, and Finnish Cardiovascular Research Center - Tampere, Faculty of Medicine and Health Technology, Tampere University, 33520, Tampere, Finland
| | - Xiaoling Wang
- Georgia Prevention Institute, Medical College of Georgia, Augusta University, Augusta, GA, USA
| | - Miina Ollikainen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Minerva Foundation Institute for Medical Research, Helsinki, Finland
| | - Samuli Ripatti
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Matti Pirinen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Olli Raitakari
- Centre for Population Health Research, University of Turku and Turku University Hospital, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
| | - Jaakko Kaprio
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
| |
|
29
|
Wang H, Liu Z, Ma X. Learning Consistency and Specificity of Cells From Single-Cell Multi-Omic Data. IEEE J Biomed Health Inform 2024; 28:3134-3145. [PMID: 38709615 DOI: 10.1109/jbhi.2024.3370868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Advances in single-cell technologies now make it possible to measure epigenomic and transcriptomic profiles concomitantly at the cell level, providing opportunities to explore potential biological mechanisms. Despite significant effort, integrative analysis of single-cell multi-omic data remains challenging because of the heterogeneity, complicated coupling, and limited interpretability of the data. To handle these issues, we propose a novel self-representation Learning-based Multi-omics data Integrative Clustering algorithm (sLMIC) for the integration of single-cell epigenomic (DNA methylation or scATAC-seq) and transcriptomic (scRNA-seq) profiles, in which the consistent and specific features of cells are explicitly extracted to facilitate cell clustering. Specifically, sLMIC constructs a graph for each type of single-cell data, thereby transforming omics data into multi-layer networks, which effectively removes heterogeneity of omic data. Then, sLMIC employs low-rank and exclusivity constraints to separate the self-representation of cells into two parts, i.e., the shared and specific features, which explicitly characterize the consistency and diversity of omic data, providing an effective strategy to model the structure of cell types. Feature extraction and cell clustering are jointly formulated as an overall objective function, where latent features of data are obtained under the guidance of cell clustering. Extensive experimental results on 13 single-cell multi-omics datasets from diverse organisms and tissues indicate that sLMIC clearly outperforms advanced algorithms on various measures.
|
30
|
Ewald JD, Zhou G, Lu Y, Kolic J, Ellis C, Johnson JD, Macdonald PE, Xia J. Web-based multi-omics integration using the Analyst software suite. Nat Protoc 2024; 19:1467-1497. [PMID: 38355833 DOI: 10.1038/s41596-023-00950-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 11/21/2023] [Indexed: 02/16/2024]
Abstract
The growing number of multi-omics studies demands clear conceptual workflows coupled with easy-to-use software tools to facilitate data analysis and interpretation. This protocol covers three key components involved in multi-omics analysis, including single-omics data analysis, knowledge-driven integration using biological networks and data-driven integration through joint dimensionality reduction. Using the dataset from a recent multi-omics study of human pancreatic islet tissue and plasma samples, the first section introduces how to perform transcriptomics/proteomics data analysis using ExpressAnalyst and lipidomics data analysis using MetaboAnalyst. On the basis of significant features detected in these workflows, the second section demonstrates how to perform knowledge-driven integration using OmicsNet. The last section illustrates how to perform data-driven integration from the normalized omics data and metadata using OmicsAnalyst. The complete protocol can be executed in ~2 h. Compared with other available options for multi-omics integration, the Analyst software suite described in this protocol enables researchers to perform a wide range of omics data analysis tasks via a user-friendly web interface.
Affiliation(s)
- Jessica D Ewald
  - Institute of Parasitology, McGill University, Montreal, Quebec, Canada
- Guangyan Zhou
  - Institute of Parasitology, McGill University, Montreal, Quebec, Canada
- Yao Lu
  - Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada
- Jelena Kolic
  - Life Sciences Institute, Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
- Cara Ellis
  - Department of Pharmacology, University of Alberta, Edmonton, Alberta, Canada
- James D Johnson
  - Life Sciences Institute, Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
- Patrick E Macdonald
  - Department of Pharmacology, University of Alberta, Edmonton, Alberta, Canada
- Jianguo Xia
  - Institute of Parasitology, McGill University, Montreal, Quebec, Canada
  - Department of Microbiology and Immunology, McGill University, Montreal, Quebec, Canada
|
31
|
Williams A. Multiomics data integration, limitations, and prospects to reveal the metabolic activity of the coral holobiont. FEMS Microbiol Ecol 2024; 100:fiae058. [PMID: 38653719 PMCID: PMC11067971 DOI: 10.1093/femsec/fiae058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 03/25/2024] [Accepted: 04/22/2024] [Indexed: 04/25/2024] Open
Abstract
Since their radiation in the Middle Triassic period ∼240 million years ago, stony corals have survived past climate fluctuations and five mass extinctions. Their long-term survival underscores the inherent resilience of corals, particularly when considering the nutrient-poor marine environments in which they have thrived. However, coral bleaching has emerged as a global threat to coral survival, requiring rapid advancements in coral research to understand holobiont stress responses and allow for interventions before extensive bleaching occurs. This review encompasses the potential, as well as the limits, of multiomics data applications when applied to the coral holobiont. Synopses for how different omics tools have been applied to date and their current restrictions are discussed, in addition to ways these restrictions may be overcome, such as recruiting new technology to studies, utilizing novel bioinformatics approaches, and generally integrating omics data. Lastly, this review presents considerations for the design of holobiont multiomics studies to support lab-to-field advancements of coral stress marker monitoring systems. Although much of the bleaching mechanism has eluded investigation to date, multiomic studies have already produced key findings regarding the holobiont's stress response, and have the potential to advance the field further.
Affiliation(s)
- Amanda Williams
  - Microbial Biology Graduate Program, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08901, United States
  - Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Drive, New Brunswick, NJ 08901, United States
|
32
|
Lundy DJ, Szomolay B, Liao CT. Systems Approaches to Cell Culture-Derived Extracellular Vesicles for Acute Kidney Injury Therapy: Prospects and Challenges. Function 2024; 5:zqae012. [PMID: 38706963 PMCID: PMC11065115 DOI: 10.1093/function/zqae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Revised: 03/02/2024] [Accepted: 03/05/2024] [Indexed: 05/07/2024] Open
Abstract
Acute kidney injury (AKI) is a heterogeneous syndrome, comprising diverse etiologies of kidney insults that result in high mortality and morbidity if not well managed. Although great efforts have been made to investigate underlying pathogenic mechanisms of AKI, there are limited therapeutic strategies available. Extracellular vesicles (EV) are membrane-bound vesicles secreted by various cell types, which can serve as cell-free therapy through transfer of bioactive molecules. In this review, we first overview the AKI syndrome and EV biology, with a particular focus on the technical aspects and therapeutic application of cell culture-derived EVs. Second, we illustrate how multi-omic approaches to EV miRNA, protein, and genomic cargo analysis can yield new insights into their mechanisms of action and address unresolved questions in the field. We then summarize major experimental evidence regarding the therapeutic potential of EVs in AKI, which we subdivide into stem cell and non-stem cell-derived EVs. Finally, we highlight the challenges and opportunities related to the clinical translation of animal studies into human patients.
Affiliation(s)
- David J Lundy
  - Graduate Institute of Biomedical Materials & Tissue Engineering, Taipei Medical University, Taipei 235603, Taiwan
  - International PhD Program in Biomedical Engineering, Taipei Medical University, Taipei 235603, Taiwan
  - Center for Cell Therapy, Taipei Medical University Hospital, Taipei 110301, Taiwan
- Barbara Szomolay
  - Systems Immunity Research Institute, Cardiff University School of Medicine, Cardiff CF14 4XN, UK
  - Division of Infection and Immunity, Cardiff University School of Medicine, Cardiff CF14 4XN, UK
- Chia-Te Liao
  - Division of Nephrology, Department of Internal Medicine, Shuang Ho Hospital, Taipei Medical University, New Taipei City 23561, Taiwan
  - Division of Nephrology, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
  - Research Center of Urology and Kidney, Taipei Medical University, Taipei 110, Taiwan
|
33
|
Lu Z, Xiao X, Zheng Q, Wang X, Xu L. Assessing NGS-based computational methods for predicting transcriptional regulators with query gene sets. bioRxiv 2024:2024.02.01.578316. [PMID: 38562775 PMCID: PMC10983863 DOI: 10.1101/2024.02.01.578316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
This article provides an in-depth review of computational methods for predicting transcriptional regulators with query gene sets. Identification of transcriptional regulators is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. Besides, we point out the limitations of NGS-based methods and explore potential directions for further improvement.
Key points
- An introduction to available computational methods for predicting functional TRs from a query gene set.
- A detailed walk-through along with practical concerns and limitations.
- A systematic benchmark of NGS-based methods in terms of accuracy, sensitivity, coverage, and usability, using 570 TR perturbation-derived gene sets.
- NGS-based methods outperform motif-based methods. Among NGS methods, those utilizing larger databases and adopting region-centric approaches demonstrate favorable performance. BART, ChIP-Atlas, and Lisa are recommended as these methods have overall better performance in evaluated scenarios.
|
34
|
Wieder C, Cooke J, Frainay C, Poupin N, Bowler R, Jourdan F, Kechris KJ, Lai RPJ, Ebbels T. PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration. PLoS Comput Biol 2024; 20:e1011814. [PMID: 38527092 PMCID: PMC10994553 DOI: 10.1371/journal.pcbi.1011814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 04/04/2024] [Accepted: 03/11/2024] [Indexed: 03/27/2024] Open
Abstract
As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular to the pathway-level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. PathIntegrate is available as an open-source Python package.
Affiliation(s)
- Cecilia Wieder
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
- Juliette Cooke
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Clement Frainay
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Nathalie Poupin
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Russell Bowler
  - National Jewish Health, Denver, Colorado, United States of America
- Fabien Jourdan
  - MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France
- Katerina J Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Rachel PJ Lai
  - Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom
- Timothy Ebbels
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
|
35
|
Quinn TP, Hess JL, Marshe VS, Barnett MM, Hauschild AC, Maciukiewicz M, Elsheikh SSM, Men X, Schwarz E, Trakadis YJ, Breen MS, Barnett EJ, Zhang-James Y, Ahsen ME, Cao H, Chen J, Hou J, Salekin A, Lin PI, Nicodemus KK, Meyer-Lindenberg A, Bichindaritz I, Faraone SV, Cairns MJ, Pandey G, Müller DJ, Glatt SJ. A primer on the use of machine learning to distil knowledge from data in biological psychiatry. Mol Psychiatry 2024; 29:387-401. [PMID: 38177352 PMCID: PMC11228968 DOI: 10.1038/s41380-023-02334-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 09/21/2023] [Accepted: 11/17/2023] [Indexed: 01/06/2024]
Abstract
Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers.
Affiliation(s)
- Thomas P Quinn
  - Applied Artificial Intelligence Institute (A2I2), Burwood, VIC, 3125, Australia
- Jonathan L Hess
  - Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Victoria S Marshe
  - Institute of Medical Science, University of Toronto, Toronto, ON, M5S 1A1, Canada
  - Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Michelle M Barnett
  - School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
  - Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
- Anne-Christin Hauschild
  - Department of Medical Informatics, Medical University Center Göttingen, Göttingen, Lower Saxony, 37075, Germany
- Malgorzata Maciukiewicz
  - Hospital Zurich, University of Zurich, Zurich, 8091, Switzerland
  - Department of Rheumatology and Immunology, University Hospital Bern, Bern, 3010, Switzerland
  - Department for Biomedical Research (DBMR), University of Bern, Bern, 3010, Switzerland
- Samar S M Elsheikh
  - Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Xiaoyu Men
  - Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
  - Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Emanuel Schwarz
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Yannis J Trakadis
  - Department Human Genetics, McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
- Michael S Breen
  - Psychiatry, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Eric J Barnett
  - Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Yanli Zhang-James
  - Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Mehmet Eren Ahsen
  - Department of Business Administration, Gies College of Business, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
  - Department of Biomedical and Translational Sciences, Carle-Illinois School of Medicine, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
- Han Cao
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Junfang Chen
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Jiahui Hou
  - Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
  - Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Asif Salekin
  - Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, 13244, USA
- Ping-I Lin
  - Discipline of Psychiatry and Mental Health, University of New South Wales, Sydney, NSW, 2052, Australia
  - Mental Health Research Unit, South Western Sydney Local Health District, Liverpool, NSW, 2170, Australia
- Andreas Meyer-Lindenberg
  - Clinical Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
- Isabelle Bichindaritz
  - Biomedical and Health Informatics/Computer Science Department, State University of New York at Oswego, Oswego, NY, 13126, USA
  - Intelligent Bio Systems Lab, State University of New York at Oswego, Oswego, NY, 13126, USA
- Stephen V Faraone
  - Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
  - Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Murray J Cairns
  - School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
  - Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
- Gaurav Pandey
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Daniel J Müller
  - Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
  - Department of Psychiatry, University of Toronto, Toronto, ON, M5S 1A1, Canada
  - Department of Psychiatry, Psychosomatics and Psychotherapy, Center of Mental Health, University Hospital of Würzburg, Würzburg, 97080, Germany
- Stephen J Glatt
  - Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
  - Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
  - Department of Public Health and Preventive Medicine, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
|
36
|
Cai Y, Wang S. Deeply integrating latent consistent representations in high-noise multi-omics data for cancer subtyping. Brief Bioinform 2024; 25:bbae061. [PMID: 38426322 PMCID: PMC10939425 DOI: 10.1093/bib/bbae061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 01/13/2024] [Accepted: 01/29/2024] [Indexed: 03/02/2024] Open
Abstract
Cancer is a complex and high-mortality disease regulated by multiple factors. Accurate cancer subtyping is crucial for formulating personalized treatment plans and improving patient survival rates. The underlying mechanisms that drive cancer progression can be comprehensively understood by analyzing multi-omics data. However, the high noise levels in omics data often pose challenges in capturing consistent representations and adequately integrating their information. This paper proposed a novel variational autoencoder-based deep learning model, named Deeply Integrating Latent Consistent Representations (DILCR). Firstly, multiple independent variational autoencoders and contrastive loss functions were designed to separate noise from omics data and capture latent consistent representations. Subsequently, an Attention Deep Integration Network was proposed to integrate consistent representations across different omics levels effectively. Additionally, we introduced the Improved Deep Embedded Clustering algorithm to make the integrated variables clustering-friendly. The effectiveness of DILCR was evaluated using 10 typical cancer datasets from The Cancer Genome Atlas and compared with 14 state-of-the-art integration methods. The results demonstrated that DILCR effectively captures the consistent representations in omics data and outperforms other integration methods in cancer subtyping. In the Kidney Renal Clear Cell Carcinoma case study, the cancer subtypes identified by DILCR were biologically significant and interpretable.
Affiliation(s)
- Yueyi Cai
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
|
37
|
Mardoc E, Sow MD, Déjean S, Salse J. Genomic data integration tutorial, a plant case study. BMC Genomics 2024; 25:66. [PMID: 38233804 PMCID: PMC10792847 DOI: 10.1186/s12864-023-09833-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 11/22/2023] [Indexed: 01/19/2024] Open
Abstract
BACKGROUND: The ongoing evolution of Next Generation Sequencing (NGS) technologies has led to the production of genomic data on a massive scale. While tools for genomic data integration and analysis are becoming increasingly available, the conceptual and analytical complexities still represent a great challenge in many biological contexts. RESULTS: To address this issue, we describe a six-step tutorial for best practices in genomic data integration, consisting of (1) designing a data matrix; (2) formulating a specific biological question toward data description, selection and prediction; (3) selecting a tool adapted to the targeted questions; (4) preprocessing the data; (5) conducting preliminary analysis; and finally (6) executing genomic data integration. CONCLUSION: The tutorial has been tested and demonstrated on publicly available genomic data generated from poplar (Populus L.), a woody plant model. We also developed a new graphical output for the unsupervised multi-block analysis, cimDiablo_v2, available at https://forgemia.inra.fr/umr-gdec/omics-integration-on-poplar, which allows the selection of master drivers in genomic data variation and interplay.
Affiliation(s)
- Emile Mardoc
  - UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Mamadou Dia Sow
  - UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France
- Sébastien Déjean
  - Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS, Université Paul Sabatier, Toulouse, France
- Jérôme Salse
  - UCA-INRAE UMR 1095 Genetics, Diversity and Ecophysiology of Cereals (GDEC), 5 Chemin de Beaulieu, 63000, Clermont-Ferrand, France
|
38
|
Wieder C, Cooke J, Frainay C, Poupin N, Bowler R, Jourdan F, Kechris KJ, Lai RP, Ebbels T. PathIntegrate: Multivariate modelling approaches for pathway-based multi-omics data integration. bioRxiv 2024:2024.01.09.574780. [PMID: 38260498 PMCID: PMC10802464 DOI: 10.1101/2024.01.09.574780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular to the pathway-level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. The PathIntegrate Python package is available at https://github.com/cwieder/PathIntegrate.
Affiliation(s)
- Cecilia Wieder
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
- Juliette Cooke
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Clement Frainay
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Nathalie Poupin
  - Toxalim (Research Centre in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, Toulouse, France
- Russell Bowler
  - National Jewish Health, 1400 Jackson Street, Denver, CO, 80206, USA
- Fabien Jourdan
  - MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, France
- Katerina J Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America
- Rachel PJ Lai
  - Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom
- Timothy Ebbels
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Faculty of Medicine, Imperial College London, London, United Kingdom
|
39
|
Amente LD, Mills NT, Le TD, Hyppönen E, Lee SH. Unraveling phenotypic variance in metabolic syndrome through multi-omics. Hum Genet 2024; 143:35-47. [PMID: 38095720 DOI: 10.1007/s00439-023-02619-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 11/18/2023] [Indexed: 01/19/2024]
Abstract
Complex multi-omics effects drive the clustering of cardiometabolic risk factors, underscoring the imperative to comprehend how individual and combined omics shape phenotypic variation. Our study partitions phenotypic variance in metabolic syndrome (MetS), blood glucose (GLU), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), and blood pressure through genome, transcriptome, metabolome, and exposome (i.e., lifestyle exposome) analyses. Our analysis included a cohort of 62,822 unrelated individuals of white British ancestry, sourced from the UK Biobank. We employed linear mixed models to partition phenotypic variance using the restricted maximum likelihood (REML) method, implemented in MTG2 (v2.22). We began by modeling each omics layer individually, then integrated pairs of omics layers in a joint model that also accounted for the covariance and interaction between them. Finally, we estimated the correlations of various omics effects between the phenotypes using bivariate REML. Significant proportions of the MetS variance were attributed to distinct data sources: genome (9.47%), transcriptome (4.24%), metabolome (14.34%), and exposome (3.77%). The phenotypic variances explained by the genome, transcriptome, metabolome, and exposome ranged from 3.28% for GLU to 25.35% for HDL-C, 0% for GLU to 19.34% for HDL-C, 4.29% for systolic blood pressure (SBP) to 35.75% for TG, and 0.89% for GLU to 10.17% for HDL-C, respectively. Significant correlations were found between genomic and transcriptomic effects for TG and HDL-C. Furthermore, significant interaction effects between omics data were detected for both MetS and its components. Interestingly, significant correlations of omics effects between the phenotypes were also found. This study underscores omics' roles, interaction effects, and random-effects covariance in unveiling phenotypic variation in multi-omics domains.
Affiliation(s)
- Lamessa Dube Amente
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia
- Natalie T Mills
  - Discipline of Psychiatry, University of Adelaide, Adelaide, SA, 5000, Australia
- Thuc Duy Le
  - UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Elina Hyppönen
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia
  - UniSA Clinical and Health Sciences, University of South Australia, Adelaide, SA, 5000, Australia
- S Hong Lee
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia
|
40
|
Sharma V, Singh A, Chauhan S, Sharma PK, Chaudhary S, Sharma A, Porwal O, Fuloria NK. Role of Artificial Intelligence in Drug Discovery and Target Identification in Cancer. Curr Drug Deliv 2024; 21:870-886. [PMID: 37670704 DOI: 10.2174/1567201821666230905090621] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/31/2022] [Revised: 03/08/2023] [Accepted: 03/24/2023] [Indexed: 09/07/2023]
Abstract
Drug discovery and development (DDD) is a highly complex process that necessitates precise monitoring and extensive data analysis at each stage. Furthermore, the DDD process is both timeconsuming and costly. To tackle these concerns, artificial intelligence (AI) technology can be used, which facilitates rapid and precise analysis of extensive datasets within a limited timeframe. The pathophysiology of cancer disease is complicated and requires extensive research for novel drug discovery and development. The first stage in the process of drug discovery and development involves identifying targets. Cell structure and molecular functioning are complex due to the vast number of molecules that function constantly, performing various roles. Furthermore, scientists are continually discovering novel cellular mechanisms and molecules, expanding the range of potential targets. Accurately identifying the correct target is a crucial step in the preparation of a treatment strategy. Various forms of AI, such as machine learning, neural-based learning, deep learning, and network-based learning, are currently being utilised in applications, online services, and databases. These technologies facilitate the identification and validation of targets, ultimately contributing to the success of projects. This review focuses on the different types and subcategories of AI databases utilised in the field of drug discovery and target identification for cancer.
Affiliation(s)
- Vishal Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Amit Singh
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Sanjana Chauhan
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Pramod Kumar Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Shubham Chaudhary
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Astha Sharma
  - Department of Pharmacy, Galgotias University, Greater Noida, Uttar Pradesh, 201310, India
- Omji Porwal
  - Department of Pharmacognosy, Faculty of Pharmacy, Tishk International University, Erbil 44001, Iraq
|
41
|
Na AY, Lee H, Min EK, Paudel S, Choi SY, Sim H, Liu KH, Kim KT, Bae JS, Lee S. Novel Time-dependent Multi-omics Integration in Sepsis-associated Liver Dysfunction. Genomics, Proteomics & Bioinformatics 2023; 21:1101-1116. [PMID: 37084954 PMCID: PMC11082264 DOI: 10.1016/j.gpb.2023.04.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2022] [Revised: 02/03/2023] [Accepted: 04/11/2023] [Indexed: 04/23/2023]
Abstract
Recently developed technologies that allow the analysis of individual omics layers have provided unbiased insight into ongoing disease processes. However, it remains challenging to specify the study design for the subsequent integration strategies that can associate sepsis pathophysiology and clinical outcomes. Here, we conducted a time-dependent multi-omics integration (TDMI) in a sepsis-associated liver dysfunction (SALD) model. We successfully deduced the relation of the Toll-like receptor 4 (TLR4) pathway with SALD. Although TLR4 is a critical factor in sepsis progression, it was identified only in the TDMI analysis and not in the single-omics analyses. This finding indicates that the TDMI-based approach is more advantageous than single-omics analyses for exploring the underlying pathophysiological mechanism of SALD. Furthermore, the TDMI-based approach can be an ideal paradigm for insightful biological interpretations of multi-omics datasets that will potentially reveal novel insights into basic biology, health, and diseases, thus allowing the identification of promising candidates for therapeutic strategies.
Affiliation(s)
- Ann-Yae Na
  - Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea
- Hyojin Lee
  - Department of Environmental Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
- Eun Ki Min
  - Department of Environmental Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
- Sanjita Paudel
  - Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea
  - BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea
- So Young Choi
  - Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea
  - BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea
- HyunChae Sim
  - Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea
  - BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea
- Kwang-Hyeon Liu
  - Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea
  - BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea
- Ki-Tae Kim
  - Department of Environmental Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
- Jong-Sup Bae
  - Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea
  - BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea
- Sangkyu Lee
  - Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu 41566, Republic of Korea
  - BK21 FOUR Community-Based Intelligent Novel Drug Discovery Education Unit, College of Pharmacy, Kyungpook National University, Daegu 41566, Republic of Korea
  - School of Pharmacy, Sungkyunkwan University, Suwon 16419, Republic of Korea
|
42
|
Fernandez ME, Martinez-Romero J, Aon MA, Bernier M, Price NL, de Cabo R. How is Big Data reshaping preclinical aging research? Lab Anim (NY) 2023; 52:289-314. [PMID: 38017182 DOI: 10.1038/s41684-023-01286-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 10/10/2023] [Indexed: 11/30/2023]
Abstract
The exponential scientific and technological progress during the past 30 years has favored the comprehensive characterization of aging processes with their multivariate nature, leading to the advent of Big Data in preclinical aging research. Spanning from molecular omics to organism-level deep phenotyping, Big Data demands large computational resources for storage and analysis, as well as new analytical tools and conceptual frameworks to gain novel insights leading to discovery. Systems biology has emerged as a paradigm that utilizes Big Data to gain insightful information enabling a better understanding of living organisms, visualized as multilayered networks of interacting molecules, cells, tissues and organs at different spatiotemporal scales. In this framework, where aging, health and disease represent emergent states from an evolving dynamic complex system, context given by, for example, strain, sex and feeding times, becomes paramount for defining the biological trajectory of an organism. Using bioinformatics and artificial intelligence, the systems biology approach is leading to remarkable advances in our understanding of the underlying mechanism of aging biology and assisting in creative experimental study designs in animal models. Future in-depth knowledge acquisition will depend on the ability to fully integrate information from different spatiotemporal scales in organisms, which will probably require the adoption of theories and methods from the field of complex systems. Here we review state-of-the-art approaches in preclinical research, with a focus on rodent models, that are leading to conceptual and/or technical advances in leveraging Big Data to understand basic aging biology and its full translational potential.
Affiliation(s)
- Maria Emilia Fernandez
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Jorge Martinez-Romero
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
  - Laboratory of Epidemiology and Population Science, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Miguel A Aon
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
  - Laboratory of Cardiovascular Science, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Michel Bernier
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Nathan L Price
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Rafael de Cabo
  - Experimental Gerontology Section, Translational Gerontology Branch, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
|
43
|
Li CX, Chen H, Zounemat-Kermani N, Adcock IM, Sköld CM, Zhou M, Wheelock ÅM. Consensus clustering with missing labels (ccml): a consensus clustering tool for multi-omics integrative prediction in cohorts with unequal sample coverage. Brief Bioinform 2023; 25:bbad501. [PMID: 38205966 PMCID: PMC10782800 DOI: 10.1093/bib/bbad501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 11/14/2023] [Accepted: 12/01/2023] [Indexed: 01/12/2024] Open
Abstract
Multi-omics data integration is a complex and challenging task in biomedical research. Consensus clustering, also known as meta-clustering or cluster ensembles, has become an increasingly popular downstream tool for phenotyping and endotyping using multiple omics and clinical data. However, current consensus clustering methods typically rely on ensembling clustering outputs with similar sample coverages (mathematical replicates), which may not reflect real-world data with varying sample coverages (biological replicates). To address this issue, we propose consensus clustering with missing labels (ccml), an R protocol for two-step consensus clustering that can handle unequal missing labels (i.e. multiple predictive labels with different sample coverages). First, the regular consensus weights are adjusted (normalized) by sample coverage; then a regular consensus clustering is performed to predict the optimal final clustering. We applied the ccml method to predict molecularly distinct groups based on 9-omics integration in the Karolinska COSMIC cohort, which investigates chronic obstructive pulmonary disease, and 24-omics handprint integrative subgrouping of adult asthma patients of the U-BIOPRED cohort. We propose ccml as a downstream toolkit for multi-omics integration analysis algorithms such as Similarity Network Fusion and robust clustering of clinical data to overcome the limitations posed by missing data, which is inevitable in human cohorts consisting of multiple data modalities. The ccml tool is available in the R language (https://CRAN.R-project.org/package=ccml, https://github.com/pulmonomics-lab/ccml, or https://github.com/ZhoulabCPH/ccml).
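The coverage adjustment described in this abstract can be illustrated with a short sketch. It is one plausible reading of "consensus weights normalized by sample coverage", not the ccml package's implementation: the function name `coverage_normalized_consensus`, the use of `None` for a sample missing from a label set, and the toy labels are all assumptions. For each pair of samples, the count of label sets that put them in the same cluster is divided by the number of label sets that cover both samples, so pairs seen in few modalities are not penalized for the modalities where they are missing. The final clustering of the resulting matrix is left out.

```python
from itertools import combinations

def coverage_normalized_consensus(label_sets, n_samples):
    """Pairwise consensus, normalized by how many label sets cover both samples.

    label_sets: list of label lists, one per clustering; None marks a sample
    that the clustering does not cover (hypothetical encoding).
    Returns an n x n matrix; entries are None when no label set covers the pair.
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    covered = [[0] * n_samples for _ in range(n_samples)]
    for labels in label_sets:
        for i, j in combinations(range(n_samples), 2):
            if labels[i] is None or labels[j] is None:
                continue  # this clustering says nothing about the pair
            covered[i][j] += 1
            covered[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    consensus = [[None] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(n_samples):
            if i == j:
                consensus[i][j] = 1.0
            elif covered[i][j] > 0:
                consensus[i][j] = together[i][j] / covered[i][j]
    return consensus

# Three toy clusterings of four samples with unequal coverage (illustrative only).
label_sets = [
    [0, 0, 1, 1],
    [0, 0, None, 1],
    [None, 1, 1, 1],
]
consensus = coverage_normalized_consensus(label_sets, 4)
```

In this toy input, samples 0 and 1 are co-clustered in both label sets that cover them, so their weight is 2/2 rather than 2/3, which is what an unnormalized count over all three label sets would give.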
Affiliation(s)
- Chuan-Xing Li
  - Respiratory Medicine Unit, Department of Medicine Solna & Centre for Molecular Medicine, Karolinska Institutet
- Hongyan Chen
  - School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China
- Nazanin Zounemat-Kermani
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, London, United Kingdom
  - Data Science Institute, Imperial College London, London, United Kingdom
- Ian M Adcock
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, London, United Kingdom
  - Data Science Institute, Imperial College London, London, United Kingdom
- C Magnus Sköld
  - Respiratory Medicine Unit, Department of Medicine Solna & Centre for Molecular Medicine, Karolinska Institutet
  - Department of Respiratory Medicine and Allergy, Karolinska University Hospital Solna, Stockholm, Sweden
- Meng Zhou
  - School of Biomedical Engineering, Wenzhou Medical University, Wenzhou, China
- Åsa M Wheelock
  - Respiratory Medicine Unit, Department of Medicine Solna & Centre for Molecular Medicine, Karolinska Institutet
  - Department of Respiratory Medicine and Allergy, Karolinska University Hospital Solna, Stockholm, Sweden
|
44
|
Xing Z, Lin D, Hong Y, Ma Z, Jiang H, Lu Y, Sun J, Song J, Xie L, Yang M, Xie X, Wang T, Zhou H, Chen X, Wang X, Gao J. Construction of a prognostic 6-gene signature for breast cancer based on multi-omics and single-cell data. Front Oncol 2023; 13:1186858. [PMID: 38074669 PMCID: PMC10698552 DOI: 10.3389/fonc.2023.1186858] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/15/2023] [Accepted: 10/25/2023] [Indexed: 03/05/2025] Open
Abstract
BACKGROUND Breast cancer (BC) is one of the most common malignant tumors in women, and its prognosis varies widely between individuals. We aimed to uncover novel genetic biomarkers and a risk signature for BC to help determine clinical strategies. METHODS A combined significance (p combined) was calculated for each gene by Fisher's method based on the RNA-seq, CNV, and DNA methylation data from TCGA-BRCA. Genes with p combined < 0.01 were subjected to univariate Cox and Lasso regression, from which an RS signature was established. The predictive performance of the RS signature was assessed in GSE7390 and GSE20685 and analyzed in particular depth in triple-negative breast cancer (TNBC) patients; the expression of immune checkpoints and drug sensitivity were also examined. GSE176078, a single-cell dataset, was used to validate the differences in tumor cellular composition between TNBC patients with different RS. RESULTS The RS signature, consisting of C15orf52, C1orf228, CEL, FUZ, PAK6, and SIRPG, showed good performance. It distinguished patient prognosis well, even when stratified by disease stage or subtype, and showed stronger predictive ability than traditional clinical indicators. Down-regulated expression of many immune checkpoints and decreased sensitivity to many antitumor drugs were observed in TNBC patients with higher RS. The overall cell and lymphocyte composition differed between patients with different RS, which could facilitate more personalized treatment. CONCLUSION The six-gene RS signature established from multi-omics data performed well in predicting the prognosis of BC patients, regardless of disease stage or subtype. By contributing to more personalized treatment, our signature might benefit the outcome of BC patients.
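The per-gene combination step named in METHODS can be sketched in a few lines. This is a minimal illustration of Fisher's method and not the authors' code: the function name `fisher_combined_p` and the example p-values are hypothetical, and the three inputs stand in for one gene's RNA-seq, CNV, and DNA methylation p-values. Fisher's statistic is X = -2 * sum(ln p_i), which follows a chi-square distribution with 2k degrees of freedom when the k p-values are independent and the null holds; for even degrees of freedom the tail probability has a closed form, so only the standard library is needed.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Statistic: X = -2 * sum(ln p_i), chi-square with 2k degrees of freedom.
    For even degrees of freedom the survival function is
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0    # running value of (X/2)^i / i!
    series = 1.0  # partial sum, starting with the i = 0 term
    for i in range(1, k):
        term *= half_x / i
        series += term
    return math.exp(-half_x) * series

# Hypothetical per-gene p-values from three omics layers (illustrative only).
gene_p = {"rna_seq": 0.05, "cnv": 0.05, "methylation": 0.05}
p_combined = fisher_combined_p(list(gene_p.values()))
passes_filter = p_combined < 0.01  # threshold quoted in the abstract
```

With these toy inputs, three individually borderline p-values combine to a value below the 0.01 cut-off, which is the behavior that makes the method useful for pooling evidence across omics layers. The independence assumption is a property of Fisher's method itself; whether it holds across omics layers for a given gene is not addressed here.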
Affiliation(s)
- Zeyu Xing
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Dongcai Lin
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Yuting Hong
  - Department of Scientific Research Projects, Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing, China
- Zihuan Ma
  - Department of Scientific Research Projects, Beijing ChosenMed Clinical Laboratory Co. Ltd., Beijing, China
- Hongnan Jiang
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Ye Lu
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Jiale Sun
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Jiarui Song
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Li Xie
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Man Yang
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Xintong Xie
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Tianyu Wang
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Hong Zhou
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Xiaoqi Chen
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
- Xiang Wang
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jidong Gao
  - National Cancer Center, National Clinical Research Center for Cancer, Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Shenzhen, China
|
45
|
Guo W, Lv C, Guo M, Zhao Q, Yin X, Zhang L. Innovative applications of artificial intelligence in zoonotic disease management. Science in One Health 2023; 2:100045. [PMID: 39077042 PMCID: PMC11262289 DOI: 10.1016/j.soh.2023.100045] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/27/2023] [Accepted: 10/22/2023] [Indexed: 07/31/2024]
Abstract
Zoonotic diseases, transmitted between humans and animals, pose a substantial threat to global public health. In recent years, artificial intelligence (AI) has emerged as a transformative tool in the fight against diseases. This comprehensive review discusses the innovative applications of AI in the management of zoonotic diseases, including disease prediction, early diagnosis, drug development, and future prospects. AI-driven predictive models leverage extensive datasets to predict disease outbreaks and transmission patterns, thereby facilitating proactive public health responses. Early diagnosis benefits from AI-powered diagnostic tools that expedite pathogen identification and containment. Furthermore, AI technologies have accelerated drug discovery by identifying potential drug targets and optimizing candidate drugs. This review addresses these advancements, while also examining the promising future of AI in zoonotic disease control. We emphasize the pivotal role of AI in revolutionizing our approach to managing zoonotic diseases and highlight its potential to safeguard the health of both humans and animals on a global scale.
Affiliation(s)
- Wenqiang Guo
  - Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Chenrui Lv
  - Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Meng Guo
  - College of Veterinary Medicine, Henan Agricultural University, Zhengzhou 450046, China
- Qiwei Zhao
  - Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Xinyi Yin
  - Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Li Zhang
  - Department of Animal Nutrition and Feed Science, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
|
46
|
Ranjbari S, Arslanturk S. Integration of incomplete multi-omics data using Knowledge Distillation and Supervised Variational Autoencoders for disease progression prediction. J Biomed Inform 2023; 147:104512. [PMID: 37813325 DOI: 10.1016/j.jbi.2023.104512] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/10/2023] [Revised: 08/31/2023] [Accepted: 10/03/2023] [Indexed: 10/11/2023]
Abstract
OBJECTIVE The rapid advancement of high-throughput technologies in the biomedical field has resulted in the accumulation of diverse omics data types, such as mRNA expression, DNA methylation, and microRNA expression, for studying various diseases. Integrating these multi-omics datasets enables a comprehensive understanding of the molecular basis of cancer and facilitates accurate prediction of disease progression. METHODS However, conventional approaches face challenges due to the curse of dimensionality. This paper introduces a novel framework called Knowledge Distillation and Supervised Variational AutoEncoders utilizing View Correlation Discovery Network (KD-SVAE-VCDN) to address the integration of high-dimensional multi-omics data with limited common samples. Through our experimental evaluation, we demonstrate that the proposed KD-SVAE-VCDN architecture accurately predicts the progression of breast and kidney carcinoma by effectively classifying patients as long- or short-term survivors. Furthermore, our approach outperforms other state-of-the-art multi-omics integration models. RESULTS Our findings highlight the efficacy of the KD-SVAE-VCDN architecture in predicting the disease progression of breast and kidney carcinoma. By enabling the classification of patients based on survival outcomes, our model contributes to personalized and targeted treatments. The favorable performance of our approach in comparison to several existing models suggests its potential to contribute to the advancement of cancer understanding and management. CONCLUSION The development of a robust predictive model capable of accurately forecasting disease progression at the time of diagnosis holds immense promise for advancing personalized medicine. By leveraging multi-omics data integration, our proposed KD-SVAE-VCDN framework offers an effective solution to this challenge, paving the way for more precise and tailored treatment strategies for patients with different types of cancer.
Affiliation(s)
- Sima Ranjbari
  - Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA
- Suzan Arslanturk
  - Department of Computer Science, Wayne State University, Detroit, 48202, MI, USA
|
47
|
Hai Y, Ma J, Yang K, Wen Y. Bayesian linear mixed model with multiple random effects for prediction analysis on high-dimensional multi-omics data. Bioinformatics 2023; 39:btad647. [PMID: 37882747 PMCID: PMC10627352 DOI: 10.1093/bioinformatics/btad647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 09/24/2023] [Accepted: 10/24/2023] [Indexed: 10/27/2023] Open
Abstract
MOTIVATION Accurate disease risk prediction is an essential step in the modern quest for precision medicine. While high-dimensional multi-omics data have provided unprecedented data resources for prediction studies, their high dimensionality and complex inter/intra-relationships have posed significant analytical challenges. RESULTS We proposed a two-step Bayesian linear mixed model framework (TBLMM) for risk prediction analysis on multi-omics data. TBLMM models the predictive effects from multi-omics data using a hybrid of sparsity regression and a linear mixed model with multiple random effects. It can approximate the shape of the true effect size distributions and accounts for non-linear effects, including interactions, among multi-omics data via kernel fusion. It infers its parameters via a computationally efficient variational Bayes algorithm. Through extensive simulation studies and prediction analyses of positron emission tomography imaging outcomes using data obtained from the Alzheimer's Disease Neuroimaging Initiative, we have demonstrated that TBLMM can consistently outperform the existing method in predicting the risk of complex traits. AVAILABILITY AND IMPLEMENTATION The corresponding R package is available on GitHub (https://github.com/YaluWen/TBLMM).
Affiliation(s)
- Yang Hai
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
  - Department of Statistics, University of Auckland, Auckland 1010, New Zealand
- Jixiang Ma
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Kaixin Yang
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
- Yalu Wen
  - Department of Health Statistics, Shanxi Medical University, Taiyuan, Shanxi Province 030000, China
  - Department of Statistics, University of Auckland, Auckland 1010, New Zealand
|
48
|
Fiocchi C. Omics and Multi-Omics in IBD: No Integration, No Breakthroughs. Int J Mol Sci 2023; 24:14912. [PMID: 37834360 PMCID: PMC10573814 DOI: 10.3390/ijms241914912] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/16/2023] [Revised: 09/27/2023] [Accepted: 10/02/2023] [Indexed: 10/15/2023] Open
Abstract
The recent advent of sophisticated technologies like sequencing and mass spectroscopy platforms combined with artificial intelligence-powered analytic tools has initiated a new era of "big data" research in various complex diseases of still-undetermined cause and mechanisms. The investigation of these diseases was, until recently, limited to traditional in vitro and in vivo biological experimentation, but a clear switch to in silico methodologies is now under way. This review tries to provide a comprehensive assessment of state-of-the-art knowledge on omes, omics and multi-omics in inflammatory bowel disease (IBD). The notion and importance of omes, omics and multi-omics in both health and complex diseases like IBD is introduced, followed by a discussion of the various omics believed to be relevant to IBD pathogenesis, and how multi-omics "big data" can generate new insights translatable into useful clinical tools in IBD such as biomarker identification, prediction of remission and relapse, response to therapy, and precision medicine. The pitfalls and limitations of current IBD multi-omics studies are critically analyzed, revealing that, regardless of the types of omes being analyzed, the majority of current reports are still based on simple associations of descriptive retrospective data from cross-sectional patient cohorts rather than more powerful longitudinally collected prospective datasets. Given this limitation, some suggestions are provided on how IBD multi-omics data may be optimized for greater clinical and therapeutic benefit. The review concludes by forecasting the upcoming incorporation of multi-omics analyses in the routine management of IBD.
Affiliation(s)
- Claudio Fiocchi
  - Department of Inflammation & Immunity, Lerner Research Institute, Cleveland, OH 44195, USA
  - Department of Gastroenterology, Hepatology and Nutrition, Digestive Disease and Surgery Institute, Cleveland Clinic, Cleveland, OH 44195, USA
|
49
|
Way GP, Sailem H, Shave S, Kasprowicz R, Carragher NO. Evolution and impact of high content imaging. SLAS Discovery: Advancing Life Sciences R&D 2023; 28:292-305. [PMID: 37666456 DOI: 10.1016/j.slasd.2023.08.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 08/09/2023] [Accepted: 08/29/2023] [Indexed: 09/06/2023]
Abstract
The field of high content imaging has steadily evolved and expanded substantially across many industry and academic research institutions since it was first described in the early 1990s. High content imaging refers to the automated acquisition and analysis of microscopic images from a variety of biological sample types. Integration of high content imaging microscopes with multiwell plate handling robotics enables high content imaging to be performed at scale and to support medium- to high-throughput screening of pharmacological, genetic and diverse environmental perturbations upon complex biological systems ranging from 2D cell cultures to 3D tissue organoids to small model organisms. In this perspective article the authors provide a collective view on the following key discussion points relevant to the evolution of high content imaging:
- Evolution and impact of high content imaging: An academic perspective
- Evolution and impact of high content imaging: An industry perspective
- Evolution of high content image analysis
- Evolution of high content data analysis pipelines towards multiparametric and phenotypic profiling applications
- The role of data integration and multiomics
- The role and evolution of image data repositories and sharing standards
- Future perspective of high content imaging hardware and software
Affiliation(s)
- Gregory P Way
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Heba Sailem
  - School of Cancer and Pharmaceutical Sciences, King's College London, UK
- Steven Shave
  - GlaxoSmithKline Medicines Research Centre, Gunnels Wood Rd, Stevenage SG1 2NY, UK
  - Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, UK
- Richard Kasprowicz
  - GlaxoSmithKline Medicines Research Centre, Gunnels Wood Rd, Stevenage SG1 2NY, UK
- Neil O Carragher
  - Edinburgh Cancer Research, Cancer Research UK Scotland Centre, Institute of Genetics and Cancer, University of Edinburgh, UK
|
50
|
Ye X, Shang Y, Shi T, Zhang W, Sakurai T. Multi-omics clustering for cancer subtyping based on latent subspace learning. Comput Biol Med 2023; 164:107223. [PMID: 37490833 DOI: 10.1016/j.compbiomed.2023.107223] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/30/2023] [Indexed: 07/27/2023]
Abstract
The increased availability of high-throughput technologies has enabled biomedical researchers to learn about disease etiology across multiple omics layers, which shows promise for improving cancer subtype identification. Many computational methods have been developed to perform clustering on multi-omics data; however, only a few of them are applicable to partial multi-omics data, in which some samples lack data for some omics types. In this study, we propose a novel multi-omics clustering method based on latent subspace learning (MCLS), which can handle missing multi-omics data for clustering. We utilize the data with complete omics to construct a latent subspace using PCA-based feature extraction and singular value decomposition (SVD). The data with incomplete multi-omics are then projected to the latent subspace, and spectral clustering is performed to find the clusters. The proposed MCLS method is evaluated on seven different cancer datasets at three omics levels, in both full and partial cases, against several state-of-the-art methods. The experimental results show that the proposed MCLS method is more efficient and effective than the compared methods for cancer subtype identification in multi-omics data analysis, which provides important references for a comprehensive understanding of cancer and biological mechanisms. AVAILABILITY: The proposed method is freely accessible at https://github.com/ShangCS/MCLS.
Affiliation(s)
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
- Yifan Shang
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tianyi Shi
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
- Weihang Zhang
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
  - Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
|