1
Asim MN, Ibrahim MA, Zaib A, Dengel A. DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models. Front Med (Lausanne) 2025; 12:1503229. [PMID: 40265190; PMCID: PMC12011883; DOI: 10.3389/fmed.2025.1503229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Accepted: 03/10/2025] [Indexed: 04/24/2025] Open
Abstract
Deoxyribonucleic acid (DNA) serves as the fundamental genetic blueprint that governs the development, functioning, growth, and reproduction of all living organisms. DNA can be altered through germline and somatic mutations. Germline mutations underlie hereditary conditions, while somatic mutations can be induced by various factors, including environmental influences, chemicals, lifestyle choices, and errors in DNA replication and repair mechanisms, which can lead to cancer. DNA sequence analysis plays a pivotal role in uncovering the intricate information embedded within an organism's genetic blueprint and in understanding the factors that can modify it. This analysis helps in the early detection of genetic diseases and the design of targeted therapies. DNA sequence analysis through traditional wet-lab experimental methods is costly, time-consuming, and prone to errors. To accelerate large-scale DNA sequence analysis, researchers are developing AI applications that complement wet-lab experimental methods. These AI approaches can help generate hypotheses, prioritize experiments, and interpret results by identifying patterns in large genomic datasets. Effective integration of AI methods with experimental validation requires scientists to understand both fields. Considering the need for a comprehensive review that bridges the gap between both fields, the contributions of this paper are manifold: It presents a diverse range of DNA sequence analysis tasks and AI methodologies. It equips AI researchers with essential biological knowledge of 44 distinct DNA sequence analysis tasks and aligns these tasks with 3 distinct AI paradigms, namely classification, regression, and clustering. It streamlines the integration of AI into DNA sequence analysis tasks by consolidating information on 36 diverse biological databases that can be used to develop benchmark datasets for 44 different DNA sequence analysis tasks. To support performance comparisons between new and existing AI predictors, it provides insights into 140 benchmark datasets related to 44 distinct DNA sequence analysis tasks. It presents applications of word embeddings and language models across 44 distinct DNA sequence analysis tasks. It streamlines the development of new predictors by providing a comprehensive survey of the performance values of predictive pipelines based on 39 word embeddings and 67 language models, as well as the top-performing traditional sequence encoding-based predictors and their performance across 44 DNA sequence analysis tasks.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- Arooj Zaib
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- Andreas Dengel
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
2
Asim MN, Asif T, Mehmood F, Dengel A. Peptide classification landscape: An in-depth systematic literature review on peptide types, databases, datasets, predictors architectures and performance. Comput Biol Med 2025; 188:109821. [PMID: 39987697; DOI: 10.1016/j.compbiomed.2025.109821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 02/03/2025] [Accepted: 02/05/2025] [Indexed: 02/25/2025]
Abstract
Peptides are gaining significant attention in diverse fields; the pharmaceutical market, for example, has seen a steady rise in peptide-based therapeutics over the past six decades. Peptides have been utilized in the development of distinct applications, including inhibitors of SARS-CoV-2 and treatments for conditions like cancer and diabetes. Distinct types of peptides possess unique characteristics, and the development of peptide-specific applications requires discriminating one peptide type from others. To the best of our knowledge, approximately 230 Artificial Intelligence (AI) driven applications have been developed for 22 distinct types of peptides, yet there remains significant room for the development of new predictors. This comprehensive review addresses that gap by providing a consolidated platform for the development of AI-driven peptide classification applications. This paper offers several key contributions. It presents the biological foundations of 22 unique peptide types and categorizes them into four main classes: Regulatory, Therapeutic, Nutritional, and Delivery Peptides. It offers an in-depth overview of 47 databases that have been used to develop peptide classification benchmark datasets. It summarizes details of 288 benchmark datasets that are used in the development of diverse types of AI-driven peptide classification applications. It provides a detailed summary of 197 sequence representation learning methods and 94 classifiers that have been used to develop 230 distinct AI-driven peptide classification applications. It reports the performance values of 230 AI-driven peptide classification applications across 22 distinct types of peptide classification tasks and 288 benchmark datasets. It summarizes the experimental settings and evaluation measures that have been employed to assess the performance of AI-driven peptide classification applications. The primary focus of this manuscript is to consolidate scattered information into a single comprehensive platform. This resource will greatly assist researchers who are interested in developing new AI-driven peptide classification applications.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
- Tayyaba Asif
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Faiza Mehmood
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
  - Institute of Data Sciences, University of Engineering and Technology, Lahore, Pakistan
- Andreas Dengel
  - German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
3
Wang S, Wang M, Sun S, Liu X, Li D. Effect of miR-654-3p targeting EMP1 on osteoblast activity and differentiation in delayed fracture healing. J Orthop Surg Res 2025; 20:322. [PMID: 40156038; PMCID: PMC11951503; DOI: 10.1186/s13018-025-05736-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2024] [Accepted: 03/18/2025] [Indexed: 04/01/2025] Open
Abstract
BACKGROUND: Delayed fracture healing (DFH) is a common postoperative complication in fracture patients, and a validated serum marker may aid clinical management and improve the prognosis of fracture patients. In this study, we investigated the diagnostic role and potential regulatory mechanisms of miR-654-3p in DFH. METHODS: 73 patients with DFH and 75 patients with normal fracture healing (NFH) were included. Expression of miR-654-3p, EMP1, and several mRNA markers of osteogenic differentiation was evaluated by RT-qPCR. The diagnostic value of miR-654-3p and EMP1, alone and in combination, was assessed using ROC curves. Cell proliferation capacity was assessed by CCK-8 and apoptosis rate by flow cytometry. Dual-luciferase reporter (DLR) experiments demonstrated the targeting relationship between miR-654-3p and EMP1. RESULTS: Levels of miR-654-3p were significantly lower in DFH than in NFH. Following cell differentiation treatment, miR-654-3p levels increased and EMP1 levels decreased. Furthermore, a negative correlation was identified between miR-654-3p and EMP1 target binding and expression levels. The combination of miR-654-3p and EMP1 holds significant diagnostic value for DFH. High expression of miR-654-3p can reduce EMP1 levels, which promotes cell proliferation, increases osteoblast activity and levels of differentiation markers, and decreases the rate of apoptosis. CONCLUSION: miR-654-3p and EMP1 are aberrantly expressed in DFH, and both have high diagnostic value for DFH. miR-654-3p is involved in the proliferation, differentiation, and apoptotic activities of osteoblasts by regulating the level of EMP1, thus affecting the progression of DFH.
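The abstract assesses diagnostic value with ROC curves but gives no computational detail. As a rough illustration only, the sketch below sweeps a threshold over marker scores and integrates the resulting curve with the trapezoidal rule. The function names and the toy expression values are hypothetical and are not taken from the study; because miR-654-3p is reported as lower in DFH, the negated expression is used as the score so that higher scores point to DFH.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping a threshold over the scores.

    labels: 1 = positive class (here, DFH), 0 = negative class (here, NFH).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes are needed to draw a ROC curve")
    points = [(0.0, 0.0)]
    # Each distinct score is used once as a threshold, from strict to lenient.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical relative expression values, for illustration only.
dfh_expression = [0.4, 0.5, 0.7]
nfh_expression = [0.9, 1.1, 0.6]
scores = [-value for value in dfh_expression + nfh_expression]
labels = [1] * len(dfh_expression) + [0] * len(nfh_expression)
print("AUC on toy data:", auc(roc_points(scores, labels)))
```

A combined marker, as in the abstract, would need a single score per patient (for example from a fitted logistic model); how the authors combined miR-654-3p and EMP1 is not stated here.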
Affiliation(s)
- Shantao Wang
  - Spinal Trauma Orthopedics, Yidu Central Hospital of Weifang, No.5168, Jiangjunshan Road, Qingzhou, Weifang, 262500, China
- Mingwei Wang
  - Department of Pediatric, Yidu Central Hospital of Weifang, Weifang, 262500, China
- Shengliang Sun
  - Hand, Foot and Ankle Surgery, Yidu Central Hospital of Weifang, Weifang, 262500, China
- Xinsheng Liu
  - Spinal Trauma Orthopedics, Yidu Central Hospital of Weifang, No.5168, Jiangjunshan Road, Qingzhou, Weifang, 262500, China
- Danzhi Li
  - Spinal Trauma Orthopedics, Yidu Central Hospital of Weifang, No.5168, Jiangjunshan Road, Qingzhou, Weifang, 262500, China
4
Ha J. DeepWalk-Based Graph Embeddings for miRNA-Disease Association Prediction Using Deep Neural Network. Biomedicines 2025; 13:536. [PMID: 40149513; PMCID: PMC11940379; DOI: 10.3390/biomedicines13030536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2025] [Revised: 02/17/2025] [Accepted: 02/17/2025] [Indexed: 03/29/2025] Open
Abstract
Background: In recent years, micro ribonucleic acids (miRNAs) have been recognized as key regulators in numerous biological processes, particularly in the development and progression of diseases. As a result, extensive research has focused on uncovering the critical involvement of miRNAs in disease mechanisms to better comprehend the underlying causes of human diseases. Despite these efforts, relying solely on biological experiments to identify miRNA-disease associations is both time-consuming and costly, making it an impractical approach for large-scale studies. Methods: In this paper, we propose a novel DeepWalk-based graph embedding method for predicting miRNA-disease association (DWMDA). Using DeepWalk, we extracted meaningful low-dimensional vectors from the miRNA and disease networks. Then, we applied a deep neural network to identify miRNA-disease associations using the low-dimensional vectors of miRNAs and diseases extracted via DeepWalk. Results: An ablation study was conducted to assess the proposed graph embedding modules. Furthermore, DWMDA demonstrates exceptional performance in two major cancer case studies (breast and lung), with results based on statistically robust measures, further emphasizing its reliability as a method for identifying associations between miRNAs and diseases. Conclusions: We expect that our model will not only facilitate the accurate prediction of disease-associated miRNAs but also serve as a generalizable framework for exploring interactions among various biological entities.
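DeepWalk, in general, builds node embeddings by running truncated random walks over a graph and feeding the walks to a skip-gram model as if they were sentences. The sketch below covers only the first half of that recipe, walk generation and extraction of (center, context) training pairs, on a hypothetical four-node miRNA similarity network. The node names, walk parameters, and function names are assumptions for illustration; the abstract does not give DWMDA's actual settings, and the embedding training and the downstream deep neural network are omitted.

```python
import random


def random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    """Generate truncated random walks that start from every node."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def skipgram_pairs(walks, window=2):
    """Turn walks into (center, context) pairs, the input a skip-gram model trains on."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical miRNA similarity network (undirected, stored as adjacency lists).
toy_network = {
    "mirna_1": ["mirna_2", "mirna_3"],
    "mirna_2": ["mirna_1", "mirna_3"],
    "mirna_3": ["mirna_1", "mirna_2", "mirna_4"],
    "mirna_4": ["mirna_3"],
}

walks = random_walks(toy_network)
pairs = skipgram_pairs(walks)
print(len(walks), "walks and", len(pairs), "skip-gram training pairs")
```

In a full pipeline the pairs (or the walks themselves) would be passed to a skip-gram implementation to obtain the low-dimensional vectors, and the same procedure would be repeated on the disease network.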
Affiliation(s)
- Jihwan Ha
  - Major of Big Data Convergence, Division of Data Information Science, Pukyong National University, Busan 48513, Republic of Korea
5
Abbasi AF, Asim MN, Dengel A. Transitioning from wet lab to artificial intelligence: a systematic review of AI predictors in CRISPR. J Transl Med 2025; 23:153. [PMID: 39905452; PMCID: PMC11796103; DOI: 10.1186/s12967-024-06013-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2024] [Accepted: 12/18/2024] [Indexed: 02/06/2025] Open
Abstract
The revolutionary CRISPR-Cas9 system leverages a programmable guide RNA (gRNA) and Cas9 proteins to precisely cleave problematic regions within DNA sequences. This groundbreaking technology holds immense potential for the development of targeted therapies for a wide range of diseases, including cancers, genetic disorders, and hereditary diseases. CRISPR-Cas9 based genome editing is a multi-step process that includes designing a precise gRNA, selecting the appropriate Cas protein, and thoroughly evaluating both on-target and off-target activity of the Cas9-gRNA complex. To ensure the accuracy and effectiveness of the CRISPR-Cas9 system, the process requires careful analysis of the outcomes that follow targeted DNA cleavage, such as indels and deletions. Following the success of artificial intelligence (AI) in various fields, researchers are now leveraging AI algorithms to catalyze and optimize the multi-step process of the CRISPR-Cas9 system. To achieve this goal, AI-driven applications are being integrated into each step, but existing AI predictors have limited performance and many steps still rely on expensive and time-consuming wet-lab experiments. The primary reason behind the low performance of AI predictors is the gap between the CRISPR and AI fields. Effective integration of AI into the multi-step CRISPR-Cas9 system demands comprehensive knowledge of both domains. This paper bridges the knowledge gap between AI and CRISPR-Cas9 research. It offers a unique platform for AI researchers to gain a deep understanding of the biological foundations behind each step in the CRISPR-Cas9 multi-step process. Furthermore, it provides details of 80 available CRISPR-Cas9 system-related datasets that can be utilized to develop AI-driven applications. Within the landscape of AI predictors in the CRISPR-Cas9 multi-step process, it provides insights into representation learning methods, trends in machine and deep learning methods, and the performance values of 50 existing predictive pipelines. In the context of representation learning methods and classifiers/regressors, a thorough analysis of existing predictive pipelines is used to make recommendations for developing more robust and precise predictive pipelines.
Affiliation(s)
- Ahtisham Fazeel Abbasi
  - Smart Data and Knowledge Services, German Research Center for Artificial Intelligence, 67663, Kaiserslautern, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University Kaiserslautern-Landau, 67663, Kaiserslautern, Germany
- Muhammad Nabeel Asim
  - Department of Computer Science, Rhineland-Palatinate Technical University Kaiserslautern-Landau, 67663, Kaiserslautern, Germany
- Andreas Dengel
  - Smart Data and Knowledge Services, German Research Center for Artificial Intelligence, 67663, Kaiserslautern, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University Kaiserslautern-Landau, 67663, Kaiserslautern, Germany
6
Asim MN, Ibrahim MA, Asif T, Dengel A. RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models. Heliyon 2025; 11:e41488. [PMID: 39897847; PMCID: PMC11783440; DOI: 10.1016/j.heliyon.2024.e41488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 12/23/2024] [Accepted: 12/24/2024] [Indexed: 02/04/2025] Open
Abstract
Deciphering the information in RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequences, such as dysregulation and mutations, can drive a diverse spectrum of diseases, including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential to transform traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and gene therapies. To gain insights into biological functions, detect diseases at early stages, and develop potent therapeutics, researchers are performing diverse types of RNA sequence analysis tasks. RNA sequence analysis through conventional wet-lab methods is expensive, time-consuming, and error-prone. To enable large-scale RNA sequence analysis, empowering wet-lab experimental methods with Artificial Intelligence (AI) applications requires scientists to have comprehensive knowledge of both the RNA and AI fields. While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack the basic foundations of RNA sequence analysis tasks. Considering the absence of a comprehensive review that bridges this research gap and promotes the development of AI-driven RNA sequence analysis applications, the contributions of this manuscript are manifold: It equips AI researchers with the biological foundations of 47 distinct RNA sequence analysis tasks. It sets the stage for the development of benchmark datasets related to 47 distinct RNA sequence analysis tasks by summarizing the key details of 64 different biological databases. It presents applications of word embeddings and language models across 47 distinct RNA sequence analysis tasks. It streamlines the development of new predictors by providing a comprehensive survey of the performance values of predictive pipelines based on 58 word embeddings and 70 language models, as well as the top-performing traditional sequence encoding-based predictors and their performance across 47 RNA sequence analysis tasks.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Tayyaba Asif
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Andreas Dengel
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
7
Bereczki Z, Benczik B, Balogh OM, Marton S, Puhl E, Pétervári M, Váczy-Földi M, Papp ZT, Makkos A, Glass K, Locquet F, Euler G, Schulz R, Ferdinandy P, Ágg B. Mitigating off-target effects of small RNAs: conventional approaches, network theory and artificial intelligence. Br J Pharmacol 2025; 182:340-379. [PMID: 39293936; DOI: 10.1111/bph.17302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 05/07/2024] [Accepted: 06/17/2024] [Indexed: 09/20/2024] Open
Abstract
Three types of highly promising small RNA therapeutics, namely, small interfering RNAs (siRNAs), microRNAs (miRNAs) and the RNA subtype of antisense oligonucleotides (ASOs), offer advantages over small-molecule drugs. These small RNAs can target any gene product, opening up new avenues of effective and safe therapeutic approaches for a wide range of diseases. In preclinical research, synthetic small RNAs play an essential role in the investigation of physiological and pathological pathways as silencers of specific genes, facilitating discovery and validation of drug targets in different conditions. Off-target effects of small RNAs, however, could make it difficult to interpret experimental results in the preclinical phase and may contribute to adverse events of small RNA therapeutics. Of the two major types of off-target effects, we focused on the hybridization-dependent type, especially miRNA-like off-target effects. Our main aim was to discuss several approaches, including sequence design, chemical modifications and target prediction, to reduce hybridization-dependent off-target effects that should be considered even at the early development phase of small RNA therapy. Because there is no standard way of predicting hybridization-dependent off-target effects, this review provides an overview of all major state-of-the-art computational methods and proposes new approaches, such as the possible inclusion of network theory and artificial intelligence (AI) in the prediction workflows. Case studies and a concise survey of experimental methods for validating in silico predictions are also presented. These methods could help to interpret experimental results, minimize off-target effects and, hopefully, avoid off-target-related adverse events of small RNA therapeutics. LINKED ARTICLES: This article is part of a themed issue Non-coding RNA Therapeutics. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v182.2/issuetoc.
Affiliation(s)
- Zoltán Bereczki
  - Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
  - HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Bettina Benczik
  - Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
  - HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Pharmahungary Group, Szeged, Hungary
- Olivér M Balogh
  - Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
  - HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Szandra Marton
  - Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- Eszter Puhl
  - Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
- Mátyás Pétervári
  - Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
  - HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Sanovigado Kft, Budapest, Hungary
- Máté Váczy-Földi
  - Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
  - HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- Zsolt Tamás Papp
  - Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
  - HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
- András Makkos
  - Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
  - HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Pharmahungary Group, Szeged, Hungary
- Kimberly Glass
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Fabian Locquet
  - Physiologisches Institut, Justus-Liebig-Universität Gießen, Giessen, Germany
- Gerhild Euler
  - Physiologisches Institut, Justus-Liebig-Universität Gießen, Giessen, Germany
- Rainer Schulz
  - Physiologisches Institut, Justus-Liebig-Universität Gießen, Giessen, Germany
- Péter Ferdinandy
  - Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
  - HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Pharmahungary Group, Szeged, Hungary
- Bence Ágg
  - Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Center for Pharmacology and Drug Research & Development, Semmelweis University, Budapest, Hungary
  - HUN-REN-SU System Pharmacology Research Group, Department of Pharmacology and Pharmacotherapy, Semmelweis University, Budapest, Hungary
  - Pharmahungary Group, Szeged, Hungary
8
Guan YJ, Yu CQ, Li LP, You ZH, Wei MM, Wang XF, Yang C, Guo LX. MHESMMR: a multilevel model for predicting the regulation of miRNAs expression by small molecules. BMC Bioinformatics 2024; 25:6. [PMID: 38166644; PMCID: PMC10763044; DOI: 10.1186/s12859-023-05629-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 12/21/2023] [Indexed: 01/05/2024] Open
Abstract
According to their expression in pathological processes, miRNAs can be divided into oncogenes or tumor suppressors. Predicting the regulatory relations between miRNAs and small molecules (SMs) has become a vital goal for miRNA-targeted therapy. However, traditional biological approaches are laborious and expensive, so there is an urgent need for a computational model. In this study, we propose a computational model to predict whether the regulatory relationship between a miRNA and an SM is up-regulation or down-regulation. Specifically, we first use the Large-scale Information Network Embedding (LINE) algorithm to construct the node features from the self-similarity networks, then use the General Attributed Multiplex Heterogeneous Network Embedding (GATNE) algorithm to extract the topological information from the attribute network, and finally utilize the Light Gradient Boosting Machine (LightGBM) algorithm to predict the regulatory relationship between miRNAs and SMs. In the fivefold cross-validation experiment, the average accuracies of the proposed model on the SM2miR dataset reached 79.59% and 80.37% for up-regulation pairs and down-regulation pairs, respectively. In addition, we compared our model with another published model. Moreover, in the case study for 5-FU, 7 of 10 candidate miRNAs are confirmed by related literature. Therefore, we believe that our model can promote the research of miRNA-targeted therapy.
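The fivefold cross-validation protocol mentioned above can be sketched in a few lines. The example below is a generic illustration and not the authors' pipeline: a trivial majority-class predictor stands in for the LINE, GATNE and LightGBM stages, and the feature vectors, labels, and function names are all hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    if n_samples < k:
        raise ValueError("need at least one sample per fold")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Return the mean held-out accuracy over k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        # Fit on the other k-1 folds only, then score on the held-out fold.
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


# Placeholder model (an assumption): always predicts the most common training label.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, feature_vector):
    return model


# Hypothetical miRNA-SM pair features and regulation labels.
features = [[float(i)] for i in range(20)]
labels = ["up"] * 12 + ["down"] * 8
print("mean accuracy:", cross_validate(features, labels, fit_majority, predict_majority))
```

Swapping `fit_majority` and `predict_majority` for a real classifier leaves the splitting and averaging logic unchanged, which is the point of separating the protocol from the model.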
Affiliation(s)
- Yong-Jian Guan
  - School of Information Engineering, Xijing University, Xi'an, China
- Chang-Qing Yu
  - School of Information Engineering, Xijing University, Xi'an, China
- Li-Ping Li
  - College of Grassland and Environment Sciences, Xinjiang Agricultural University, Urumqi, China
  - College of Agriculture and Forestry, Longdong University, Qingyang, China
- Zhu-Hong You
  - School of Computer Science, North-Western Polytechnical University, Xi'an, China
- Meng-Meng Wei
  - School of Information Engineering, Xijing University, Xi'an, China
- Xin-Fei Wang
  - School of Information Engineering, Xijing University, Xi'an, China
- Chen Yang
  - School of Information Engineering, Xijing University, Xi'an, China
- Lu-Xiang Guo
  - School of Information Engineering, Xijing University, Xi'an, China
9
Wang S, Wang F, Qiao S, Zhuang Y, Zhang K, Pang S, Nowak R, Lv Z. MSHGANMDA: Meta-Subgraphs Heterogeneous Graph Attention Network for miRNA-Disease Association Prediction. IEEE J Biomed Health Inform 2023; 27:4639-4648. [PMID: 35759606; DOI: 10.1109/jbhi.2022.3186534] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 11/10/2022]
Abstract
MicroRNAs (miRNAs) influence several biological processes involved in human disease. Biological experiments for verifying the association between miRNA and disease are costly in terms of both money and time. Although numerous biological experiments have identified multiple types of associations between miRNAs and diseases, existing computational methods are unable to sufficiently mine the knowledge in these associations to predict unknown associations. In this study, we propose a heterogeneous graph attention network model based on meta-subgraphs (MSHGANMDA) to predict potential miRNA-disease associations. First, we define five types of meta-subgraph from the known miRNA-disease associations. Then, we use meta-subgraph attention and meta-subgraph semantic attention to extract features of miRNA-disease pairs within and between these five meta-subgraphs, respectively. Finally, we apply a fully-connected layer (FCL) to predict the scores of unknown miRNA-disease associations and cross-entropy loss to train our model end-to-end. To evaluate the effectiveness of MSHGANMDA, we apply five-fold cross-validation; the mean values of Accuracy, Precision, Recall, and F1-score are 0.8595, 0.8601, 0.8596, and 0.8595, respectively. Experiments show that our model, which primarily utilizes multiple types of miRNA-disease association data, achieves the highest ROC-AUC value, 0.934, among the compared state-of-the-art approaches. Furthermore, through case studies, we further confirm the effectiveness of MSHGANMDA in predicting unknown diseases.
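For readers less familiar with the four reported metrics, the sketch below computes Accuracy, Precision, Recall, and F1-score from binary predictions using their standard definitions. The function name and the toy labels are hypothetical, and the abstract does not say whether the published figures were averaged per class or per fold, so this is only the plain binary form.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels coded as 1 (association) or 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for eight miRNA-disease pairs.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Under five-fold cross-validation, a function like this would be applied to each held-out fold and the five results averaged.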
10
Predicting miRNA-Disease Association Based on Neural Inductive Matrix Completion with Graph Autoencoders and Self-Attention Mechanism. Biomolecules 2022; 12:biom12010064. [PMID: 35053212; PMCID: PMC8774034; DOI: 10.3390/biom12010064] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/14/2021] [Revised: 12/29/2021] [Accepted: 12/31/2021] [Indexed: 02/06/2023] Open
Abstract
Many studies have shown that microRNAs (miRNAs) are associated with many human diseases. Therefore, it is essential to predict potential miRNA-disease associations to understand disease pathogenesis and treatment. Numerous machine learning and deep learning approaches have been applied to this problem. In this paper, we propose a Neural Inductive Matrix completion-based method with Graph Autoencoders (GAE) and Self-Attention mechanism for miRNA-disease association prediction (NIMGSA). Some previous works based on matrix completion ignore the importance of the label propagation procedure for inferring miRNA-disease associations, while others cannot integrate matrix completion and label propagation effectively. Unlike previous studies, NIMGSA unifies inductive matrix completion and label propagation via a neural network architecture, through the collaborative training of two graph autoencoders. This neural inductive matrix completion-based method also implements a self-attention mechanism for miRNA-disease association prediction. This end-to-end framework can strengthen the robustness and precision of both matrix completion and label propagation. Cross-validation indicates that NIMGSA outperforms current miRNA-disease prediction methods. Case studies demonstrate that NIMGSA is competent in detecting potential miRNA-disease associations.