1. De Filippis GM, Amalfitano D, Russo C, Tommasino C, Rinaldi AM. A systematic mapping study of semantic technologies in multi-omics data integration. J Biomed Inform 2025; 165:104809. [PMID: 40154721 DOI: 10.1016/j.jbi.2025.104809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 02/03/2025] [Accepted: 03/07/2025] [Indexed: 04/01/2025]
Abstract
OBJECTIVE: The integration of multi-omics data is essential for understanding complex biological systems, providing insights beyond single-omics approaches. However, challenges related to data heterogeneity, standardization, and computational scalability persist. This study explores the interdisciplinary application of semantic technologies to enhance data integration, standardization, and analysis in multi-omics research.
METHODS: We performed a systematic mapping study assessing literature from 2014 to 2024, focusing on the utilization of ontologies, knowledge graphs, and graph-based methods for multi-omics integration.
RESULTS: Our findings indicate a growing number of publications in this field, predominantly appearing in high-impact journals. The deployment of semantic technologies has notably improved data visualization, querying, and management, thus enhancing gene and pathway discovery, and providing deeper disease insights and more accurate predictive modeling.
CONCLUSION: The study underscores the significance of semantic technologies in overcoming multi-omics integration challenges. Future research should focus on integrating diverse data types, developing advanced computational tools, and incorporating AI and machine learning to foster personalized medicine applications.
Affiliation(s)
- Giovanni Maria De Filippis
  - Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Via Claudio, 21, Naples, 80125, Italy.
- Domenico Amalfitano
  - Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Via Claudio, 21, Naples, 80125, Italy.
- Cristiano Russo
  - Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Via Claudio, 21, Naples, 80125, Italy.
- Cristian Tommasino
  - Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Via Claudio, 21, Naples, 80125, Italy.
- Antonio Maria Rinaldi
  - Department of Electrical Engineering and Information Technology (DIETI), University of Naples Federico II, Via Claudio, 21, Naples, 80125, Italy.
2. Oğuztüzün Ç, Gao Z, Li H, Xu R. Precision Drug Repurposing (PDR): Patient-level modeling and prediction combining foundational knowledge graph with biobank data. J Biomed Inform 2025; 163:104786. [PMID: 39952626 DOI: 10.1016/j.jbi.2025.104786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2024] [Revised: 12/16/2024] [Accepted: 01/31/2025] [Indexed: 02/17/2025]
Abstract
OBJECTIVE: Drug repurposing accelerates therapeutic development by finding new indications for approved drugs. However, accounting for individual patient differences is challenging. This study introduces a Precision Drug Repurposing (PDR) framework at single-patient resolution, integrating individual-level data with a foundational biomedical knowledge graph to enable personalized drug discovery.
METHODS: We developed a framework integrating patient-specific data from the UK Biobank (Polygenic Risk Scores, biomarker expressions, and medical history) with a comprehensive biomedical knowledge graph (61,146 entities, 1,246,726 relations). Using Alzheimer's Disease as a case study, we compared three diverse patient-specific models with a foundational model through standard link prediction metrics. We evaluated the top predicted candidate drugs using patient medication history and literature review.
RESULTS: Our framework maintained the robust prediction capabilities of the foundational model. The integration of patient data, particularly Polygenic Risk Scores (PRS), significantly influenced drug prioritization (Cohen's d = 1.05 for scoring differences). Ablation studies demonstrated the crucial role of PRS, with the effect size decreasing to 0.77 upon its removal. Each patient model identified novel drug candidates that were missed by the foundational model but showed therapeutic relevance when evaluated against the patient's own medication history. These candidates were further supported by literature evidence aligned with the patient-level genetic risk profiles based on PRS.
CONCLUSION: This exploratory study demonstrates a promising approach to precision drug repurposing by integrating patient-specific data with a foundational knowledge graph.
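The effect size quoted in the results (Cohen's d) can be illustrated with a minimal sketch. The abstract does not say which variant of Cohen's d was computed, so the pooled-standard-deviation form below is an assumption, and the two score lists are invented toy values rather than data from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical drug scores from a patient-specific model and a foundational model.
patient_scores = [2.0, 4.0, 6.0]
foundational_scores = [1.0, 3.0, 5.0]
d = cohens_d(patient_scores, foundational_scores)
```

By the usual rule of thumb, values of d around 0.8 or higher are read as large effects, which is the range of the 1.05 and 0.77 figures reported in the abstract.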
Affiliation(s)
- Çerağ Oğuztüzün
  - Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University, 10900 Euclid Ave, Cleveland, 44106, OH, USA; Department of Computer Science, Case Western Reserve University, 10900 Euclid Ave, Cleveland, 44106, OH, USA
- Zhenxiang Gao
  - Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University, 10900 Euclid Ave, Cleveland, 44106, OH, USA
- Hui Li
  - Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University, 10900 Euclid Ave, Cleveland, 44106, OH, USA
- Rong Xu
  - Center for Artificial Intelligence in Drug Discovery, Case Western Reserve University, 10900 Euclid Ave, Cleveland, 44106, OH, USA.
3. Prasanna S, Kumar A, Rao D, Simoes EJ, Rao P. A scalable tool for analyzing genomic variants of humans using knowledge graphs and graph machine learning. Front Big Data 2025; 7:1466391. [PMID: 39906190 PMCID: PMC11790625 DOI: 10.3389/fdata.2024.1466391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Accepted: 12/12/2024] [Indexed: 02/06/2025] Open
Abstract
Advances in high-throughput genome sequencing have enabled large-scale genome sequencing in clinical practice and research studies. By analyzing genomic variants of humans, scientists can gain a better understanding of the risk factors of complex diseases such as cancer and COVID-19. To model and analyze the rich genomic data, knowledge graphs (KGs) and graph machine learning (GML) can be regarded as enabling technologies. In this article, we present a scalable tool called VariantKG for analyzing genomic variants of humans modeled using KGs and GML. Specifically, we used publicly available genome sequencing data from patients with COVID-19. VariantKG extracts variant-level genetic information output by a variant calling pipeline, annotates the variant data with additional metadata, and converts the annotated variant information into a KG represented using the Resource Description Framework (RDF). The resulting KG is further enhanced with patient metadata and stored in a scalable graph database that enables efficient RDF indexing and query processing. VariantKG employs the Deep Graph Library (DGL) to perform GML tasks such as node classification. A user can extract a subset of the KG and perform inference tasks using DGL. The user can monitor the training and testing performance and hardware utilization. We tested VariantKG for KG construction by using 1,508 genome sequences, leading to 4 billion RDF statements. We evaluated GML tasks using VariantKG by selecting a subset of 500 sequences from the KG and performing node classification using well-known GML techniques such as GraphSAGE, Graph Convolutional Network (GCN), and Graph Transformer. VariantKG has intuitive user interfaces and features enabling a low barrier to entry for KG construction, model inference, and model interpretation on genomic variants of humans.
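The step of converting annotated variant records into RDF can be sketched as follows. This is only an illustration under stated assumptions: the `http://example.org/variantkg/` namespace, the field names, and the sample record are all hypothetical, and the abstract does not describe VariantKG's actual ontology or code. The sketch emits N-Triples, one of the standard RDF serializations, using plain string formatting.

```python
# Hypothetical namespace; VariantKG's real vocabulary is not given in the abstract.
BASE = "http://example.org/variantkg/"


def variant_to_ntriples(variant):
    """Turn one annotated variant (a plain dict) into a list of N-Triples lines."""
    subject = "<{}variant/{}_{}_{}_{}>".format(
        BASE, variant["chrom"], variant["pos"], variant["ref"], variant["alt"]
    )
    lines = []
    for field, value in variant.items():
        predicate = "<{}prop/{}>".format(BASE, field)
        # Every value is emitted as a plain string literal for simplicity;
        # backslashes and double quotes are escaped as N-Triples requires.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append('{} {} "{}" .'.format(subject, predicate, text))
    return lines


# Made-up record standing in for one annotated output row of a variant caller.
example = {"chrom": "1", "pos": 12345, "ref": "A", "alt": "G", "gene": "GENE1"}
triples = variant_to_ntriples(example)
```

Each record yields one triple per field, which gives a sense of how roughly 1,500 genomes can expand into billions of RDF statements.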
Affiliation(s)
- Shivika Prasanna
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
- Ajay Kumar
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
- Deepthi Rao
  - Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, MO, United States
- Eduardo J. Simoes
  - Department of Biomedical Informatics, Biostatistics and Medical Epidemiology, University of Missouri, Columbia, MO, United States
- Praveen Rao
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
4. Yang X, Huang K, Yang D, Zhao W, Zhou X. Biomedical Big Data Technologies, Applications, and Challenges for Precision Medicine: A Review. Global Challenges (Hoboken, NJ) 2024; 8:2300163. [PMID: 38223896 PMCID: PMC10784210 DOI: 10.1002/gch2.202300163] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 07/02/2023] [Revised: 09/20/2023] [Indexed: 01/16/2024]
Abstract
The explosive growth of biomedical Big Data presents both significant opportunities and challenges in the realm of knowledge discovery and translational applications within precision medicine. Efficient management, analysis, and interpretation of big data can pave the way for groundbreaking advancements in precision medicine. However, the unprecedented strides in the automated collection of large-scale molecular and clinical data have also introduced formidable challenges in terms of data analysis and interpretation, necessitating the development of novel computational approaches. Some potential challenges include the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues. This overview article focuses on the recent progress and breakthroughs in the application of big data within precision medicine. Key aspects are summarized, including content, data sources, technologies, tools, challenges, and existing gaps. Nine fields are discussed: data warehouse and data management, electronic medical records, biomedical imaging informatics, artificial intelligence-aided surgical design and surgery optimization, omics data, health monitoring data, knowledge graphs, public health informatics, and security and privacy.
Affiliation(s)
- Xue Yang
  - Department of Pancreatic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Kexin Huang
  - Department of Pancreatic Surgery and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Dewei Yang
  - College of Advanced Manufacturing Engineering, Chongqing University of Posts and Telecommunications, Chongqing, Chongqing 400000, China
- Weiling Zhao
  - Center for Systems Medicine, School of Biomedical Informatics, UTHealth at Houston, Houston, TX 77030, USA
- Xiaobo Zhou
  - Center for Systems Medicine, School of Biomedical Informatics, UTHealth at Houston, Houston, TX 77030, USA
5. Zhu C, Xia X, Li N, Zhong F, Yang Z, Liu L. RDKG-115: Assisting drug repurposing and discovery for rare diseases by trimodal knowledge graph embedding. Comput Biol Med 2023; 164:107262. [PMID: 37481946 DOI: 10.1016/j.compbiomed.2023.107262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 07/07/2023] [Accepted: 07/16/2023] [Indexed: 07/25/2023]
Abstract
Rare diseases (RDs) may affect individuals in small numbers, but they have a significant impact on a global scale. Accurate diagnosis of RDs is challenging, and there is a severe lack of drugs available for treatment. Pharmaceutical companies have shown a preference for repurposing existing drugs developed for other diseases because of the high investment, high risk, and long cycle involved in RD drug development. Compared to traditional approaches, knowledge graph embedding (KGE) based methods are more efficient and convenient, as they treat drug repurposing as a link prediction task. KGE models allow for the enrichment of existing knowledge by incorporating multimodal information from various sources. In this study, we constructed RDKG-115, a rare disease knowledge graph involving 115 RDs, composed of 35,643 entities, 25 relations, and 5,539,839 refined triplets, based on 372,384 high-quality publications and 4 biomedical datasets: DRKG, Pathway Commons, PharmKG, and PMapp. Subsequently, we developed a trimodal KGE model containing structure, category, and description embeddings using reverse-hyperplane projection. We used this model to infer 4,199 reliable new triplets from RDKG-115. Finally, we calculated potential drugs and small molecules for each of the 115 RDs, taking multiple sclerosis as a case study. This study provides a paradigm for large-scale screening of drug repurposing and discovery for RDs, which will speed up the drug development process and ultimately benefit patients with RDs. The source code and data are available at https://github.com/ZhuChaoY/RDKG-115.
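The idea of treating drug repurposing as link prediction can be illustrated with a small sketch. It does not reproduce the trimodal model with reverse-hyperplane projection described in the abstract; the simpler, well-known TransE scoring function is swapped in instead, and the two-dimensional embeddings and entity names are invented toy values.

```python
def transe_score(head, relation, tail):
    """TransE plausibility: negative L1 distance between head + relation and tail."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def rank_drugs(drug_embeddings, treats, disease):
    """Sort drug names from most to least plausible (drug, treats, disease) triplet."""
    return sorted(
        drug_embeddings,
        key=lambda name: transe_score(drug_embeddings[name], treats, disease),
        reverse=True,
    )


# Toy embeddings, invented for illustration only.
drugs = {"drug_a": [1.0, 0.0], "drug_b": [0.0, 2.0]}
treats = [0.5, 0.5]
disease = [1.5, 0.5]
ranking = rank_drugs(drugs, treats, disease)
```

In a trained model the embeddings would be learned from the known triplets, and high-scoring triplets absent from the graph would be the candidate new links.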
Affiliation(s)
- Chaoyu Zhu
  - Intelligent Medicine Institute, Shanghai Medical College, Fudan University, Shanghai, 200032, China
- Xiaoqiong Xia
  - Intelligent Medicine Institute, Shanghai Medical College, Fudan University, Shanghai, 200032, China
- Nan Li
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
- Fan Zhong
  - Intelligent Medicine Institute, Shanghai Medical College, Fudan University, Shanghai, 200032, China.
- Zhihao Yang
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.
- Lei Liu
  - Intelligent Medicine Institute, Shanghai Medical College, Fudan University, Shanghai, 200032, China; Shanghai Institute of Stem Cell Research and Clinical Translation, Shanghai, 200120, China.