1
Zhu L, Zhang Z, Yang S. BioSeq_Ksite: Multi-perspective feature-driven prediction of protein succinylation based on an adaptive attention module with SSBCE loss strategy. Int J Biol Macromol 2025; 310:143601. [PMID: 40306513 DOI: 10.1016/j.ijbiomac.2025.143601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2024] [Revised: 04/23/2025] [Accepted: 04/26/2025] [Indexed: 05/02/2025]
Abstract
Succinylation is a post-translational modification in which a succinyl group is transferred to the lysine residue of a protein, playing a crucial role in regulating both protein structure and cellular function. This paper introduces a novel sequential model, BioSeq_Ksite, designed to enhance succinylation prediction accuracy by integrating an adaptive attention mechanism and a joint loss function. This study first presents a new hybrid feature, ProtFusion, which combines the physicochemical properties of amino acids with pretrained models. Next, this paper introduces an adaptive attention module that enables the model to autonomously identify important features during training. Additionally, a gated network architecture is adopted to create a dual-branch sequential model. Finally, by combining sensitivity, specificity, and cross-entropy loss, a new joint loss function is proposed, which is used for succinylation prediction for the first time and significantly enhances the model's ability to handle class-imbalanced data. Evaluation on the test dataset shows that BioSeq_Ksite outperforms other models in MCC, Sn, AUC, and F1-Score, with a 7.68 % improvement in MCC over the second-best model. It provides an efficient and reliable tool for succinylation research and application. BioSeq_Ksite can be accessed at https://github.com/zzq1124ZHZ/BioSeq_Ksite.
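The abstract does not state the formula of the joint loss, so the following is only a minimal sketch of one plausible form: binary cross-entropy plus penalties on differentiable ("soft") sensitivity and specificity. The function name `joint_loss`, the weights `alpha` and `beta`, and the soft-count formulation are assumptions for illustration, not the BioSeq_Ksite implementation.

```python
import math


def joint_loss(y_true, y_prob, alpha=1.0, beta=1.0, eps=1e-7):
    """Hypothetical joint loss: BCE + alpha*(1 - soft Sn) + beta*(1 - soft Sp).

    y_true: list of 0/1 labels; y_prob: list of predicted probabilities.
    This form is an assumption; the paper's exact definition may differ.
    """
    n = len(y_true)
    # Clip probabilities so that log() stays finite.
    p = [min(max(q, eps), 1.0 - eps) for q in y_prob]

    # Standard binary cross-entropy, averaged over samples.
    bce = -sum(
        t * math.log(q) + (1 - t) * math.log(1 - q) for t, q in zip(y_true, p)
    ) / n

    # "Soft" confusion-matrix counts: probabilities replace hard decisions.
    tp = sum(t * q for t, q in zip(y_true, p))
    fn = sum(t * (1 - q) for t, q in zip(y_true, p))
    tn = sum((1 - t) * (1 - q) for t, q in zip(y_true, p))
    fp = sum((1 - t) * q for t, q in zip(y_true, p))

    # If a class is absent, its soft rate evaluates to 0 and the full penalty applies.
    soft_sn = tp / (tp + fn + eps)
    soft_sp = tn / (tn + fp + eps)

    return bce + alpha * (1.0 - soft_sn) + beta * (1.0 - soft_sp)


labels = [1, 0, 0, 0]  # imbalanced toy batch: one positive, three negatives
good = joint_loss(labels, [0.9, 0.1, 0.1, 0.1])
bad = joint_loss(labels, [0.1, 0.1, 0.1, 0.1])  # misses the only positive
```

In this toy batch, missing the single positive raises the sensitivity penalty sharply, which is the kind of behavior a loss designed for class-imbalanced data is meant to have.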
Affiliation(s)
- Lun Zhu
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China; The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou 213164, China
- Ziqi Zhang
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China
- Sen Yang
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou 213164, China; The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou 213164, China
2
Zhao K, Ji Z, Zhang L, Quan N, Li Y, Yu G, Bi X. HPOseq: a deep ensemble model for predicting the protein-phenotype relationships based on protein sequences. BMC Bioinformatics 2025; 26:110. [PMID: 40263997 PMCID: PMC12013097 DOI: 10.1186/s12859-025-06122-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Accepted: 03/27/2025] [Indexed: 04/24/2025] Open
Abstract
BACKGROUND: Understanding the relationships between proteins and specific disease phenotypes contributes to the early detection of diseases and advances the development of personalized medicine. The acquisition of a large amount of proteomics data has facilitated this process. To improve discovery efficiency and reduce the time and financial costs of biological experiments, various computational methods have been developed and have yielded promising results. However, the lack of rich and reliable protein-related information still presents challenges.
RESULTS: In this paper, we propose an ensemble prediction model, named HPOseq, which predicts human protein-phenotype relationships based only on sequence information. HPOseq builds two base models. One directly extracts internal information from amino acid sequences as protein features to predict the associated phenotypes. The other builds a protein-protein network based on sequence similarity and extracts information between proteins for phenotype prediction. Finally, an ensemble module integrates the predictions from both base models to produce the final prediction.
CONCLUSION: The results of 5-fold cross-validation show that HPOseq outperforms seven baseline methods for predicting protein-phenotype relationships. Moreover, we conduct case studies from the perspectives of phenotype annotation and protein analysis to verify the practical significance of HPOseq.
Affiliation(s)
- Kai Zhao
  - School of Computer Science and Technology, Xinjiang University, Urumqi, 830011, China
- Zhuocheng Ji
  - School of Computer Science and Technology, Xinjiang University, Urumqi, 830011, China
- Linlin Zhang
  - School of Software, Xinjiang University, Urumqi, 830011, China
- Na Quan
  - School of Computer Science and Technology, Xinjiang University, Urumqi, 830011, China
- Yuheng Li
  - School of Computer Science and Technology, Xinjiang University, Urumqi, 830011, China
- Guanglei Yu
  - College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, 830011, China
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Xuehua Bi
  - College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, 830011, China
  - School of Computer Science and Engineering, Central South University, Changsha, 410083, China
3
Wen X, Liu H, Long W, Wei S, Zhu R. Consistent semantic representation learning for out-of-distribution molecular property prediction. Brief Bioinform 2025; 26:bbaf147. [PMID: 40205853 PMCID: PMC11982020 DOI: 10.1093/bib/bbaf147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2024] [Revised: 02/01/2025] [Accepted: 03/14/2025] [Indexed: 04/11/2025] Open
Abstract
Invariant molecular representation models offer a potential way to guarantee accurate prediction of molecular properties under out-of-distribution (OOD) distribution shifts by identifying and leveraging invariant substructures inherent to the molecules. However, because molecular functional groups are complexly entangled and molecular properties frequently display activity cliffs, separating molecules becomes inaccurate and difficult. This results in inconsistent semantics among the invariant substructures identified by existing models, which means that molecules sharing identical invariant structures may exhibit drastically different properties. To address these challenges, this paper explores, in the semantic space, the potential correlation between consistent semantics (the same information expressed in different molecular representation forms) and the molecular property prediction problem. To enhance the performance of OOD molecular property prediction, this paper proposes a consistent semantic representation learning (CSRL) framework that does not separate molecules and comprises two modules: a semantic uni-code (SUC) module and a consistent semantic extractor (CSE). To address the inconsistent mapping of semantics across molecular representation forms, SUC adjusts incorrect embeddings into the correct embeddings of the two molecular representation forms. Then, CSE uses non-semantic information as training labels to guide the discriminator's learning, thereby suppressing the reliance of CSE on non-semantic information in the different molecular representation embeddings. Extensive experiments demonstrate that consistent semantics can guarantee model performance. Overall, CSRL improves the average area under the receiver operating characteristic curve (ROC-AUC) by 6.43% compared with 11 state-of-the-art models on 12 datasets.
Affiliation(s)
- Xinlong Wen
  - College of Informatics, Huazhong Agricultural University, No.1 Shizishan Street, Hongshan District, Wuhan, 430070, Hubei, People’s Republic of China
- Hao Liu
  - College of Informatics, Huazhong Agricultural University, No.1 Shizishan Street, Hongshan District, Wuhan, 430070, Hubei, People’s Republic of China
- Wenhan Long
  - College of Informatics, Huazhong Agricultural University, No.1 Shizishan Street, Hongshan District, Wuhan, 430070, Hubei, People’s Republic of China
- Shuoying Wei
  - College of Informatics, Huazhong Agricultural University, No.1 Shizishan Street, Hongshan District, Wuhan, 430070, Hubei, People’s Republic of China
- Rongbo Zhu
  - College of Informatics, Huazhong Agricultural University, No.1 Shizishan Street, Hongshan District, Wuhan, 430070, Hubei, People’s Republic of China
4
Xie Y, Gao J, Bi X, Zhao J. Unsupervised cervical cell instance segmentation method integrating cellular characteristics. Med Biol Eng Comput 2025; 63:773-791. [PMID: 39489855 DOI: 10.1007/s11517-024-03222-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 10/09/2024] [Indexed: 11/05/2024]
Abstract
Cell instance segmentation is a key technology for cervical cancer auxiliary diagnosis systems. However, pixel-level annotation is time-consuming and labor-intensive, making it difficult to obtain a large amount of annotated data. This results in the model not being fully trained. In response to these problems, this paper proposes an unsupervised cervical cell instance segmentation method that integrates cell characteristics. Cervical cells have a clear corresponding structure between the nucleus and cytoplasm. This method fully takes this feature into account by building a dual-flow framework to locate the nucleus and cytoplasm and generate high-quality pseudo-labels. In the nucleus segmentation stage, the position and range of the nucleus are determined using the standard cell-restricted nucleus segmentation method. In the cytoplasm segmentation stage, a multi-angle collaborative segmentation method is used to achieve the positioning of the cytoplasm. First, taking advantage of the self-similarity characteristics of pixel blocks in cells, a cytoplasmic segmentation method based on self-similarity map iteration is proposed. The pixel blocks are mapped from the perspective of local details, and the iterative segmentation is repeated. Secondly, using low-level features such as cell color and shape, a self-supervised heatmap-aware cytoplasm segmentation method is proposed to obtain the activation map of the cytoplasm from the perspective of global attention. The two methods are fused to determine cytoplasmic regions, and combined with nuclear locations, high-quality pseudo-labels are generated. These pseudo-labels are used to train the model cyclically, and the loss strategy is used to encourage the model to discover new object masks, thereby obtaining a segmentation model with better performance. Experimental results show that this method achieves good results in cytoplasm segmentation. 
On the three datasets of ISBI, MS_CellSeg, and Cx22, 54.32%, 44.64%, and 66.52% AJI were obtained, respectively, which is better than other typical unsupervised methods selected in this article.
Affiliation(s)
- Yining Xie
  - College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin, 150040, China
- Jingling Gao
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin, 150040, China
- Xueyan Bi
  - Heilongjiang Institute for Drug Control, Harbin, 150088, China
- Jing Zhao
  - College of Mechanical and Electrical Engineering, Northeast Forestry University, Harbin, 150040, China
5
Malik M, Chang YY, Liu YC, Le VT, Ou YY. MCNN_MC: Computational Prediction of Mitochondrial Carriers and Investigation of Bongkrekic Acid Toxicity Using Protein Language Models and Convolutional Neural Networks. J Chem Inf Model 2024; 64:9125-9134. [PMID: 39133248 PMCID: PMC11683872 DOI: 10.1021/acs.jcim.4c00961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/26/2024] [Accepted: 07/29/2024] [Indexed: 08/13/2024]
Abstract
Mitochondrial carriers (MCs) are essential proteins that transport metabolites across mitochondrial membranes and play a critical role in cellular metabolism. ADP/ATP (adenosine diphosphate/adenosine triphosphate) is one of the most important carriers as it contributes to cellular energy production and is susceptible to the powerful toxin bongkrekic acid. This toxin has claimed several lives; for example, a recent foodborne outbreak in Taipei, Taiwan, has caused four deaths and sickened 30 people. The issue of bongkrekic acid poisoning has been a long-standing problem in Indonesia, with reports as early as 1895 detailing numerous deaths from contaminated coconut fermented cakes. In bioinformatics, significant advances have been made in understanding biological processes through computational methods; however, no established computational method has been developed for identifying mitochondrial carriers. We propose a computational bioinformatics approach for predicting MCs from a broader class of secondary active transporters with a focus on the ADP/ATP carrier and its interaction with bongkrekic acid. The proposed model combines protein language models (PLMs) with multiwindow scanning convolutional neural networks (mCNNs). While PLM embeddings capture contextual information within proteins, mCNN scans multiple windows to identify potential binding sites and extract local features. Our results show 96.66% sensitivity, 95.76% specificity, 96.12% accuracy, 91.83% Matthews correlation coefficient (MCC), 94.63% F1-Score, and 98.55% area under the curve (AUC). The results demonstrate the effectiveness of the proposed approach in predicting MCs and elucidating their functions, particularly in the context of bongkrekic acid toxicity. This study presents a valuable approach for identifying novel mitochondrial complexes, characterizing their functional roles, and understanding mitochondrial toxicology mechanisms. 
Our findings, which use computational methods to improve the understanding of cellular processes and drug-target interactions, contribute to the development of therapeutic strategies for mitochondrial disorders and to reducing the devastating effects of bongkrekic acid poisoning.
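The metrics reported in this abstract (sensitivity, specificity, accuracy, MCC, F1-score) are all derived from the four counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the example counts are illustrative and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Returns 0.0 for any metric whose denominator is zero.
    """
    def safe_div(num, den):
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)          # true positive rate (recall)
    specificity = safe_div(tn, tn + fp)          # true negative rate
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * tp, 2 * tp + fp + fn)      # harmonic mean of precision and recall
    # Matthews correlation coefficient: balanced even when classes are skewed.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Illustrative counts only (not the paper's data).
m = binary_metrics(tp=50, fp=10, tn=30, fn=10)
```

MCC uses all four counts, which is why it is often reported alongside accuracy when the positive and negative classes differ in size.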
Affiliation(s)
- Muhammad Shahid Malik
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
  - Department of Computer Sciences, Karakoram International University, Gilgit-Baltistan 15100, Pakistan
- Yan-Yun Chang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Chen Liu
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Van The Le
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
  - Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li 32003, Taiwan
6
Liu Q, He D, Fan M, Wang J, Cui Z, Wang H, Mi Y, Li N, Meng Q, Hou Y. Prediction and Interpretation Microglia Cytotoxicity by Machine Learning. J Chem Inf Model 2024; 64:9306-9326. [PMID: 38949724 DOI: 10.1021/acs.jcim.4c00366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Ameliorating microglia-mediated neuroinflammation is a crucial strategy in developing new drugs for neurodegenerative diseases. Plant compounds are an important screening target for the discovery of drugs for the treatment of neurodegenerative diseases. However, due to the spatial complexity of phytochemicals, it is particularly important to evaluate the effectiveness of compounds while avoiding the inclusion of cytotoxic substances in the early stages of compound screening. Traditional high-throughput screening methods suffer from high cost and low efficiency. A computational model based on machine learning provides a novel avenue for cytotoxicity determination. In this study, a microglia cytotoxicity classifier was developed using a machine learning approach. First, we proposed a data splitting strategy based on the molecular Murcko generic scaffold. Under this split, three machine learning approaches were coupled with three kinds of molecular representation methods to construct microglia cytotoxicity classifiers, which were then compared and assessed by predictive accuracy, balanced accuracy, F1-score, and Matthews correlation coefficient. Then, a dimension reduction method that integrates recursive feature elimination with a support vector machine (RFE-SVC) was applied to high-dimensional molecular fingerprints to further improve model performance. Among all the microglia cytotoxicity classifiers, the SVM coupled with the ECFP4 fingerprint after feature selection (ECFP4-RFE-SVM) obtained the most accurate classification on the test set (ACC of 0.99, BA of 0.99, F1-score of 0.99, MCC of 0.97). Finally, the Shapley additive explanations (SHAP) method was used to interpret the microglia cytotoxicity classifier, and key substructure SMARTS patterns were identified as structural alerts.
Experimental results show that ECFP4-RFE-SVM has reliable classification capability for microglia cytotoxicity, and that SHAP can not only provide a rational explanation for microglia cytotoxicity predictions but also offer a guideline for subsequent modifications that reduce molecular cytotoxicity.
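A scaffold-based data split keeps all molecules that share a scaffold on the same side of the train/test boundary, so the test set probes generalization to unseen chemotypes. The sketch below assumes the scaffold string of each molecule has already been computed (deriving Murcko scaffolds requires a cheminformatics toolkit and is not shown); the function name `scaffold_split`, the greedy largest-group-first assignment, and the toy labels are assumptions for illustration, not the authors' procedure.

```python
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2):
    """Split sample indices so that no scaffold appears in both train and test.

    scaffolds: list of precomputed scaffold identifiers, one per molecule.
    Groups are assigned greedily, largest first, to train until it is full.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    train_cutoff = (1.0 - test_fraction) * len(scaffolds)
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


# Toy scaffold labels standing in for real scaffold strings.
toy = ["A", "A", "A", "B", "B", "C", "D", "A", "B", "E"]
train_idx, test_idx = scaffold_split(toy, test_fraction=0.2)
```

Because whole groups are moved at once, the realized test fraction can deviate from `test_fraction` when a few scaffolds dominate the data set.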
Affiliation(s)
- Qing Liu
  - College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Dakuo He
  - College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Mengmeng Fan
  - College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Jinpeng Wang
  - College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Zeyu Cui
  - College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Hao Wang
  - College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, P. R. China
- Yan Mi
  - Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
- Ning Li
  - School of Traditional Chinese Materia Medica, Key Laboratory for TCM Material Basis Study and Innovative Drug Development of Shenyang City, Shenyang Pharmaceutical University, Shenyang 110016, P. R. China
- Qingqi Meng
  - Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
- Yue Hou
  - Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, P. R. China
7
Zheng X, Zhang F, Wang L, Fan H, Yu B, Qi X, Liang B. Association between serum calcium and in-hospital mortality in critically ill atrial fibrillation patients from the MIMIC IV database. Sci Rep 2024; 14:27954. [PMID: 39543197 PMCID: PMC11564696 DOI: 10.1038/s41598-024-79015-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Accepted: 11/05/2024] [Indexed: 11/17/2024] Open
Abstract
Thongprayoon et al. found in a study of 12,599 non-dialysis adult hospitalized patients that serum calcium (SC) disturbances affected more than half of the patients and were associated with increased in-hospital mortality. Similar impacts of SC disturbances on in-hospital mortality have been observed in patients with acute myocardial infarction and the general hospitalized population. Atrial fibrillation (AF), the most common arrhythmia in the intensive care unit (ICU), affects around 6% of critically ill patients. However, the significance of the relationship between SC levels and in-hospital mortality in these patients remains unclear. This study aimed to explore the correlation between SC levels and in-hospital mortality in ICU patients diagnosed with AF. Data from the MIMIC-IV database included 11,621 AF patients (average age 75.59 ± 11.74 years; 42.56% male), with an in-hospital mortality rate of 8.90%. A nonlinear relationship between SC levels and in-hospital mortality was observed. Effect sizes on either side of the inflection point were 0.79 (HR: 0.79, 95% CI 0.67-0.94, P = 0.006) and 1.12 (HR: 1.12, 95% CI 1.01-1.25, P = 0.029). Sensitivity analyses confirmed these results. SC levels around 8.56 mg/dL were associated with the lowest risk of in-hospital mortality, with risks increasing as SC levels deviated from this point. SC levels below this inflection point were linked to more pronounced clinical impacts. This finding has significant clinical implications for clinicians. Therefore, in the treatment of ICU patients with AF, clinicians should closely monitor SC levels, with a focus on maintaining them around 8.56 mg/dL.
Affiliation(s)
- Xin Zheng
  - Department of Cardiology, Second Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
- Fenfang Zhang
  - Department of Cardiology, Yangquan First People's Hospital, Yangquan, Shanxi, China
- Leigang Wang
  - Department of Cardiology, Second Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
- Hongxuan Fan
  - Department of Cardiology, Second Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
- Bing Yu
  - Department of Cardiology, Second Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
- Xiaogang Qi
  - Department of Cardiology, Second Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
  - Orthopedics Department, Yangquan First People's Hospital, Yangquan, Shanxi, China
- Bin Liang
  - Department of Cardiology, Second Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
8
Yin W, Wang S, Zhang Y, Qiao S, Wu W, Li H. Multirelational Hypergraph Representation Learning for Predicting circRNA-miRNA Associations. J Chem Inf Model 2024; 64:8349-8360. [PMID: 39432249 DOI: 10.1021/acs.jcim.4c01436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2024]
Abstract
One of the principal functions of circular RNA (circRNA) is to participate in gene regulation by sponging microRNAs (miRNAs). Using accumulated circRNA-miRNA associations (CMAs) to construct computational models for predicting potential associations provides a crucial tool for accelerating the validation of reliable associations through traditional experiments. Nevertheless, the current prediction models are constrained in their capacity to represent the higher-order relationships of CMAs and thus require further enhancement in terms of their predictive efficacy. In order to address this issue, we propose a new model based on multirelational hypergraph representation learning (MRHRL). This model employs hypergraphs to capture various higher-order relationships among RNAs and aggregates complementary information through a view attention mechanism. Furthermore, MRHRL introduces a hyperedge-level reconstruction task, jointly optimizing the prediction and reconstruction tasks within a unified framework to uncover potential information, thereby enhancing the model's predictive and generalization capabilities. Experiments conducted on three real-world data sets demonstrate that MRHRL achieves satisfactory results in CMAs prediction, significantly outperforming existing prediction models.
Affiliation(s)
- Wenjing Yin
  - College of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong 266580, China
- Shudong Wang
  - College of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong 266580, China
- Yuanyuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, Shandong 266520, China
- Sibo Qiao
  - College of Software, Tiangong University, Tianjin 300387, China
- Wenhao Wu
  - College of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong 266580, China
- Hengxiao Li
  - College of Computer Science and Technology, China University of Petroleum, Qingdao, Shandong 266580, China
9
Malebary SJ, Alromema N. iDLB-Pred: identification of disordered lipid binding residues in protein sequences using convolutional neural network. Sci Rep 2024; 14:24724. [PMID: 39433833 PMCID: PMC11494137 DOI: 10.1038/s41598-024-75700-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Accepted: 10/08/2024] [Indexed: 10/23/2024] Open
Abstract
Proteins, nucleic acids, and lipids all interact with intrinsically disordered protein regions. Lipid-binding regions are involved in a variety of biological processes as well as a number of human illnesses. The expanding body of experimental evidence for these interactions and the dearth of techniques to predict them from the protein sequence motivate this work. Although large-scale laboratory techniques are considered essential for studying binding residues, they are time-consuming and costly, making it challenging for researchers to identify lipid-binding residues. As a result, computational techniques are being explored as an alternative strategy to overcome this difficulty. To predict disordered lipid-binding residues (DLBRs), we propose the iDLB-Pred predictor, which uses a benchmark dataset and feature extraction techniques to identify relevant patterns and information. Various classification techniques, including deep learning methods such as Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), Multilayer Perceptrons (MLPs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRUs), were employed for model training. The proposed model, iDLB-Pred, was validated using metrics such as accuracy, sensitivity, specificity, and Matthews correlation coefficient. The predictor achieved accuracy rates of 81% on an independent dataset and 86% in 10-fold cross-validation.
Affiliation(s)
- Sharaf J Malebary
  - Department of Information Technology, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, 21911, Rabigh, Saudi Arabia
- Nashwan Alromema
  - Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, 21911, Rabigh, Saudi Arabia
10
Chang F, Liu L, Hu F, Sun X, Zhao Y, Zhang N, Li C. RNAfcg: RNA Flexibility Prediction Based on Topological Centrality and Global Features. J Chem Inf Model 2024; 64:7786-7792. [PMID: 39276067 DOI: 10.1021/acs.jcim.4c00848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/16/2024]
Abstract
The dynamics of RNAs are intimately related to their functions. Molecular flexibility, as a starting point for understanding their dynamics, has been used to predict many characteristics associated with their functions. Since experimental measurement methods are time-consuming and labor-intensive, there is an urgent need to develop reliable theoretical methods to predict RNA flexibility. In this work, we develop an effective machine learning method, RNAfcg, to predict RNA flexibility, in which a Random Forest (RF) is trained on features including the topological centralities, flexibility-rigidity index, and global characteristics first introduced by us, as well as some traditional sequence and structural features. The analyses show that the three newly introduced types of features contribute significantly to RNA flexibility prediction, among which the topological type contributes the most, indicating the importance of structural topology in determining RNA flexibility. The performance comparison indicates that RNAfcg outperforms the state-of-the-art machine learning methods and the commonly used Gaussian Network Model (GNM), achieving a much higher Pearson correlation coefficient (PCC) of 0.6619 on the test data set. This work is helpful for understanding RNA dynamics and can be used to predict RNA function information. The source code is available at https://github.com/ChunhuaLab/RNAfcg/.
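The Pearson correlation coefficient (PCC) used here as the evaluation metric measures the linear agreement between predicted and reference flexibility values. A minimal sketch of the standard definition follows; the function name `pearson_correlation` and the example values are illustrative and are not taken from the RNAfcg code base.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have the same length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values: a perfectly linear relationship and its reverse.
r_pos = pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_neg = pearson_correlation([1, 2, 3], [3, 2, 1])
```

PCC is invariant to the scale and offset of either sequence, so it rewards predictions that rank and space residues correctly even when absolute values differ.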
Affiliation(s)
- Fubin Chang
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Lamei Liu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fangrui Hu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Xiaohan Sun
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Yingchun Zhao
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Na Zhang
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
11
Han X, Zhang A, Meng Z, Wang Q, Liu S, Wang Y, Tan J, Guo L, Li F. Bioinformatics analysis based on extracted ingredients combined with network pharmacology, molecular docking and molecular dynamics simulation to explore the mechanism of Jinbei oral liquid in the therapy of idiopathic pulmonary fibrosis. Heliyon 2024; 10:e38173. [PMID: 39364246 PMCID: PMC11447332 DOI: 10.1016/j.heliyon.2024.e38173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Accepted: 09/19/2024] [Indexed: 10/05/2024] Open
Abstract
Objective Jinbei oral liquid (JBOL), which is derived from a traditional hospital preparation, is frequently used to treat idiopathic pulmonary fibrosis (IPF) and has shown efficacy in clinical therapy. However, investigation of its mechanism still faces several obstacles, including the identification of target proteins and active components and the binding affinity between crucial compounds and target proteins. To gain additional insight into the anti-IPF mechanisms of JBOL, this study used bioinformatics techniques, including network pharmacology, molecular docking, and molecular dynamics simulation, with a substantial amount of data based on the extracted constituents.
Methods Using network pharmacology, we submitted 118 extracted compounds to the SwissTargetPrediction and SwissADME databases and screened the active compounds and target proteins. IPF-related targets were collected from the OMIM, DisGeNET, and GeneCards databases, and the network of IPF-active constituents was built with Cytoscape 3.10.1. GO and KEGG pathway enrichment analyses were carried out using Metascape, and a protein-protein interaction (PPI) network was constructed with the STRING database to screen the key targets. Finally, the binding affinity between the active molecules and the crucial targets was assessed by molecular docking and molecular dynamics simulation.
Results A total of 122 targets and 34 tested active compounds were identified. Among these, kaempferol, apigenin, and baicalein had high degree values. Topological analysis of the PPI network identified eight key target proteins. The AGE-RAGE, EGFR, and PI3K-Akt signaling pathways were found to be regulated during cell senescence, inflammatory response, autophagy, and immune response in the anti-IPF action of JBOL. Molecular docking and molecular dynamics simulation verified the binding modes and binding energies between the active ingredients and the selected targets.
Conclusions This work predicts the prospective core ingredients, targets, and signaling pathways of JBOL against IPF, confirms that JBOL acts on multiple targets and pathways, and provides the first comprehensive assessment with bioinformatic approaches. With empirical backing and an innovative approach to the molecular mechanism, JBOL is being considered as a potential new medication.
Affiliation(s)
- Xinru Han
- Shandong University of Traditional Chinese Medicine, Jinan, China
- Department of Pharmacy, Central Hospital Affiliated to Shandong First Medical University, Jinan, China
- Aijun Zhang
- Shandong University of Traditional Chinese Medicine, Jinan, China
- Institute of Chinese Materia Medica, Shandong Hongji-tang Pharmaceutical Group Co., Ltd., Jinan, China
- Zhaoqing Meng
- Institute of Chinese Materia Medica, Shandong Hongji-tang Pharmaceutical Group Co., Ltd., Jinan, China
- Qian Wang
- Department of Pharmacy, Central Hospital Affiliated to Shandong First Medical University, Jinan, China
- Song Liu
- Shandong University of Traditional Chinese Medicine, Jinan, China
- Yunjia Wang
- Shandong University of Traditional Chinese Medicine, Jinan, China
- Jiaxin Tan
- Shandong University of Traditional Chinese Medicine, Jinan, China
- Lubo Guo
- Department of Pharmacy, Central Hospital Affiliated to Shandong First Medical University, Jinan, China
- Feng Li
- Shandong University of Traditional Chinese Medicine, Jinan, China
|
12
|
Khan S, AlQahtani SA, Noor S, Ahmad N. PSSM-Sumo: deep learning based intelligent model for prediction of sumoylation sites using discriminative features. BMC Bioinformatics 2024; 25:284. [PMID: 39215231 PMCID: PMC11363370 DOI: 10.1186/s12859-024-05917-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Accepted: 08/27/2024] [Indexed: 09/04/2024] Open
Abstract
Post-translational modifications (PTMs) are fundamental to essential biological processes, exerting significant influence over gene expression, protein localization, stability, and genome replication. Sumoylation, a PTM involving the covalent addition of a chemical group to a specific protein sequence, profoundly impacts the functional diversity of proteins. Identifying sumoylation sites has attracted significant attention because of their crucial roles in proteomic functions and their implications in various diseases, including Parkinson's and Alzheimer's. Although several computational models for identifying sumoylation sites have been proposed, their effectiveness is constrained by the limitations of conventional learning methodologies. In this study, we introduce PSSM-Sumo, a computational model designed to predict sumoylation sites accurately using pseudo-position-specific scoring matrix (PsePSSM) feature extraction and an optimized deep learning algorithm. Moreover, to streamline computation and eliminate irrelevant and noisy features, sequential forward selection using a support vector machine (SFS-SVM) is implemented to identify optimal features. A multi-layer Deep Neural Network (DNN) serves as the classifier for sumoylation site prediction. We assess the performance of PSSM-Sumo through tenfold cross-validation, using statistical metrics such as the Matthews Correlation Coefficient (MCC), accuracy, sensitivity, specificity, and the Area under the ROC Curve (AUC). Comparative analyses reveal that PSSM-Sumo achieves an average prediction accuracy of 98.71%, surpassing existing models. The robustness and accuracy of the proposed model position it as a promising tool for advancing drug discovery and the diagnosis of diverse diseases linked to sumoylation sites.
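The threshold-based metrics named in the abstract above (accuracy, sensitivity, specificity, MCC) all follow from the binary confusion matrix. The following is a minimal sketch of their standard definitions, not code from PSSM-Sumo; the function name and the toy labels are illustrative assumptions, and AUC is omitted because it needs continuous scores rather than hard predictions.

```python
import math

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and MCC for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn
    # Guard every ratio against an empty class.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": accuracy, "sn": sensitivity, "sp": specificity, "mcc": mcc}

# Toy labels: 3 positive sites, 5 negative sites (made-up data).
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1])
```

On imbalanced site-prediction data, MCC is usually the more informative of these numbers, because accuracy can stay high even when most positive sites are missed.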
Affiliation(s)
- Salman Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, KPK, Pakistan
- Salman A AlQahtani
- New Emerging Technologies and 5G Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Sumaiya Noor
- Business and Management Sciences Department, Purdue University, West Lafayette, IN, USA
- Nijad Ahmad
- Department of Computer Science, Khurasan University Jalalabad, Jalalabad, Afghanistan.
|
13
|
Arif M, Musleh S, Fida H, Alam T. PLMACPred prediction of anticancer peptides based on protein language model and wavelet denoising transformation. Sci Rep 2024; 14:16992. [PMID: 39043738 PMCID: PMC11266708 DOI: 10.1038/s41598-024-67433-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Accepted: 07/11/2024] [Indexed: 07/25/2024] Open
Abstract
Anticancer peptides (ACPs) play a promising role in the discovery of anticancer drugs. Research on ACPs as therapeutic agents is growing because of their minimal side effects. However, identifying novel ACPs through wet-lab experiments is generally time-consuming, labor-intensive, and expensive. Computational methods for fast and accurate prediction of ACPs would accelerate the drug discovery process. Herein, a machine learning-based predictor, called PLMACPred, is developed for identifying ACPs from the peptide sequence alone. PLMACPred adopts a set of encoding schemes representing evolutionary properties, compositional properties, and protein language model (PLM) embeddings, i.e., evolutionary scale modeling (ESM-2)- and ProtT5-based embeddings, to encode peptides. Then, two-dimensional (2D) wavelet denoising (WD) is employed to remove noise from the extracted features. Finally, an ensemble-based cascade deep forest (CDF) model is developed to identify ACPs. PLMACPred attained superior performance on all three benchmark datasets, namely ACPmain, ACPAlter, and ACP740, in tenfold cross-validation and on independent datasets. It outperformed the existing models and improved the prediction accuracy by 18.53%, 2.4%, and 7.59% on the ACPmain, ACPAlter, and ACP740 datasets, respectively. We showed that embeddings from ProtT5 and ESM-2 capture better contextual information from the entire sequence than the other encoding schemes for ACP prediction. To explain the proposed model, the SHAP (SHapley Additive exPlanations) method was used to analyze the effect of features on ACP prediction. A list of novel sequence motifs was derived from the ACP sequences using the MEME suite. We believe PLMACPred will help accelerate the discovery of novel ACPs as well as other activities of microbial peptides.
Affiliation(s)
- Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Saleh Musleh
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Huma Fida
- Department of Microbiology, Abdul Wali Khan University, Mardan, KPK, Pakistan
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.
|
14
|
Zhuang J, Huang X, Liu S, Gao W, Su R, Feng K. MulTFBS: A Spatial-Temporal Network with Multichannels for Predicting Transcription Factor Binding Sites. J Chem Inf Model 2024; 64:4322-4333. [PMID: 38733561 DOI: 10.1021/acs.jcim.3c02088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2024]
Abstract
Revealing the mechanisms that influence transcription factor binding specificity is key to understanding gene regulation. In previous studies, DNA double helix structure and one-hot embedding have been used successfully to design computational methods for predicting transcription factor binding sites (TFBSs). However, although a DNA sequence can be regarded as a kind of biological language, word embedding representations from natural language processing have not been properly considered in TFBS prediction models. In our work, we integrate different types of DNA sequence features to design a multichannel deep learning framework, MulTFBS, in which one-hot encoding, word embedding encoding (which can incorporate contextual information and extract global sequence features), and three-dimensional double-helix structural features are trained in separate channels. To extract high-level sequence information effectively, the framework uses a spatial-temporal network that combines convolutional neural networks and bidirectional long short-term memory networks with an attention mechanism. Compared with six state-of-the-art methods on 66 universal protein-binding microarray data sets of different transcription factors, MulTFBS performs best on all data sets in the regression tasks, with an average R² of 0.698 and an average PCC of 0.833, which are 5.4% and 3.2% higher, respectively, than those of the second-best method, CRPTS. In addition, we evaluate the classification performance of MulTFBS for distinguishing bound from unbound regions on TF ChIP-seq data. The results show that our framework also performs well in the TFBS classification tasks.
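Two of the input channels described in the abstract above, one-hot encoding and word-embedding encoding, start from simple transformations of the DNA string. The sketch below shows one common way to produce them; it is not the MulTFBS implementation. The helper names, the choice of k = 3, and the convention of reserving id 0 for unknown k-mers are assumptions made for illustration.

```python
from itertools import product

import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """(len(seq), 4) matrix; ambiguous bases such as N stay all-zero."""
    index = {base: i for i, base in enumerate(BASES)}
    mat = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            mat[pos, index[base]] = 1.0
    return mat

def kmer_tokens(seq, k=3):
    """Overlapping k-mer 'words' used as input to an embedding layer."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_ids(seq, k=3):
    """Integer ids for each k-mer; 0 is reserved for unknown k-mers."""
    vocab = {"".join(p): i + 1 for i, p in enumerate(product(BASES, repeat=k))}
    return [vocab.get(tok, 0) for tok in kmer_tokens(seq, k)]

encoded = one_hot("ACGTN")
tokens = kmer_tokens("ACGTA")
ids = kmer_ids("ACGTA")
```

The integer ids would typically index a learned or pretrained embedding table, while the one-hot matrix feeds a convolutional channel directly.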
Affiliation(s)
- Jujuan Zhuang
- The School of Science, Dalian Maritime University, Dalian 116026, China
- Xinru Huang
- The School of Science, Dalian Maritime University, Dalian 116026, China
- Shuhan Liu
- The School of Science, Dalian Maritime University, Dalian 116026, China
- Wanquan Gao
- The School of Science, Dalian Maritime University, Dalian 116026, China
- Rui Su
- The School of Science, Dalian Maritime University, Dalian 116026, China
- Kexin Feng
- The School of Science, Dalian Maritime University, Dalian 116026, China
|
15
|
Pan L, Wang H, Yang B, Li W. A protein network refinement method based on module discovery and biological information. BMC Bioinformatics 2024; 25:157. [PMID: 38643108 PMCID: PMC11031909 DOI: 10.1186/s12859-024-05772-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 04/10/2024] [Indexed: 04/22/2024] Open
Abstract
BACKGROUND The identification of essential proteins can help in understanding the minimum requirements for cell survival and development, in discovering drug targets, and in preventing disease. Node ranking methods are now a common way to identify essential proteins, but the poor data quality of the underlying protein interaction network (PIN) limits their identification accuracy. Researchers have therefore constructed refined networks by considering certain biological properties of interacting protein pairs to improve the performance of node ranking methods in the PIN. Studies show that proteins in a complex are more likely to be essential than proteins not present in a complex. However, modularity is usually ignored by PIN refinement methods.
METHODS We propose a network refinement method based on module discovery and biological information. The idea is, first, to extract the maximal connected subgraph of the PIN and divide it into modules using the Fast-unfolding algorithm; then, to detect critical modules according to the orthologous information, subcellular localization information, and topology information within each module; and finally, to construct a more refined network (CM-PIN) from the identified critical modules.
RESULTS To evaluate the effectiveness of the proposed method, we used 12 typical node ranking methods (LAC, DC, DMNC, NC, TP, LID, CC, BC, PR, LR, PeC, WDC) to compare their overall performance on the CM-PIN with that on the S-PIN, D-PIN, and RD-PIN. The experimental results showed that the CM-PIN was optimal in terms of the number of essential proteins identified, the precision-recall curve, the jackknife method, and other criteria, and that it can help to identify essential proteins more accurately.
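The first step of the method above (extracting the maximal connected subgraph) and the simplest of the listed node ranking methods (degree centrality, DC) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names and the toy edge list are hypothetical, and the Fast-unfolding module detection and the biological scoring of modules are not reproduced here.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Undirected adjacency sets from (protein, protein) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    return adj

def largest_component(adj):
    """Node set of the largest connected subgraph, found by BFS."""
    seen, best = set(), set()
    for start in list(adj):
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in comp:
                    comp.add(nb)
                    queue.append(nb)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best

def rank_by_degree(adj, nodes):
    """Rank nodes by degree within `nodes`, ties broken by name."""
    degree = {n: len(adj[n] & nodes) for n in nodes}
    return sorted(nodes, key=lambda n: (-degree[n], n))

# Toy network with made-up protein names.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("E", "F")]
graph = build_graph(toy_edges)
core = largest_component(graph)
ranking = rank_by_degree(graph, core)
```

Comparing such rankings on the raw and refined networks, against a list of known essential proteins, is the kind of evaluation the RESULTS paragraph describes.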
Affiliation(s)
- Li Pan
- Hunan Institute of Science and Technology, Yueyang, 414006, China
- Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Haoyue Wang
- Hunan Institute of Science and Technology, Yueyang, 414006, China.
- Bo Yang
- Hunan Institute of Science and Technology, Yueyang, 414006, China
- Hunan Engineering Research Center of Multimodal Health Sensing and Intelligent Analysis, Yueyang, 414006, China
- Wenbin Li
- Hunan Institute of Science and Technology, Yueyang, 414006, China.
|
16
|
Khan S, Uddin I, Khan M, Iqbal N, Alshanbari HM, Ahmad B, Khan DM. Sequence based model using deep neural network and hybrid features for identification of 5-hydroxymethylcytosine modification. Sci Rep 2024; 14:9116. [PMID: 38643305 PMCID: PMC11551160 DOI: 10.1038/s41598-024-59777-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 04/15/2024] [Indexed: 04/22/2024] Open
Abstract
RNA modifications are pivotal in the development of newly synthesized structures, showcasing a vast array of alterations across various RNA classes. Among these, 5-hydroxymethylcytosine (5HMC) stands out, playing a crucial role in gene regulation and epigenetic changes, yet its detection through conventional methods proves cumbersome and costly. To address this, we propose Deep5HMC, a robust learning model leveraging machine learning algorithms and discriminative feature extraction techniques for accurate 5HMC sample identification. Our approach integrates seven feature extraction methods and various machine learning algorithms, including Random Forest, Naive Bayes, Decision Tree, and Support Vector Machine. Through K-fold cross-validation, our model achieved a notable 84.07% accuracy rate, surpassing previous models by 7.59%, signifying its potential in early cancer and cardiovascular disease diagnosis. This study underscores the promise of Deep5HMC in offering insights for improved medical assessment and treatment protocols, marking a significant advancement in RNA modification analysis.
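The accuracy quoted above comes from K-fold cross-validation. The sketch below shows the generic procedure only; it is not the Deep5HMC code. The abstract does not state K, so the default `k=5` is an assumption, and `fit_predict` and the majority-class baseline are hypothetical stand-ins for a real classifier.

```python
import random
from collections import Counter

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k shuffled folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def cross_validated_accuracy(fit_predict, X, y, k=5, seed=0):
    """Mean test accuracy over k folds.

    fit_predict(X_train, y_train, X_test) must return predicted labels.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        preds = fit_predict([X[i] for i in train],
                            [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

def majority_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in model: always predict the commonest label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)

# Toy data: 15 negatives and 5 positives with dummy features.
X_toy = [[i] for i in range(20)]
y_toy = [0] * 15 + [1] * 5
baseline_acc = cross_validated_accuracy(majority_baseline, X_toy, y_toy)
```

A majority-class baseline like this gives a floor against which a reported accuracy can be read, which matters when the classes are unbalanced.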
Affiliation(s)
- Salman Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Islam Uddin
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Mukhtaj Khan
- Department of Information Technology, The University of Haripur, Haripur, Pakistan
- Nadeem Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Huda M Alshanbari
- Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia
- Bakhtiyar Ahmad
- Higher Education Department Afghanistan, Kabul, Afghanistan.
- Dost Muhammad Khan
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
|
17
|
Sinha S. Machine learning ranking of plausible (un)explored synergistic gene combinations using sensitivity indices of time series measurements of Wnt signaling pathway. Integr Biol (Camb) 2024; 16:zyae020. [PMID: 39606798 DOI: 10.1093/intbio/zyae020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2024] [Revised: 09/25/2024] [Accepted: 11/14/2024] [Indexed: 11/29/2024]
Abstract
Combinations of genes or proteins work in synergy at different times and durations in a signaling pathway. However, which combinations are prevalent at a particular time point or duration is mostly not known. Sensitivity analysis plays a major role in computing the strength of the influence of the factors involved in any phenomenon under investigation. When applied to expression profiles of various intra/extracellular factors that work in a signaling pathway, variance- and density-based analysis yields a range of sensitivity indices for individual factors and for various combinations of factors. These combinations denote the higher order interactions among the involved factors, which might be of interest. In this work, after estimating the individual effects of the factors in a higher order combination, the individual indices are treated as discriminative features. By analogy with prioritizing webpages using ranking algorithms, for a particular order, the full set of gene combinations can be prioritized based on these features using a powerful support vector ranking algorithm. Recording the changing rankings of the combinations over time points and durations reveals which higher order combinations influence the pathway and when and where an intervention might be necessary to affect the pathway.
Integration, innovation, and insight Combinations of genes or proteins work in synergy at different times and durations in a signaling pathway. However, which combinations are prevalent at a particular time point or duration is mostly not known. This work develops a search engine that reveals ground-breaking results in the form of higher order (un)explored/(un)tested combinations (as biological hypotheses), based on sensitivity indices. These indices capture the strength of influence of factors (here genes/proteins) that affect a signaling pathway. Recording the changing rankings of these combinations over time points and durations reveals how higher order combinations behave within the pathway.
Significance The manuscript develops a search engine that reveals ground-breaking results in the form of higher order (un)explored/(un)tested combinations of genes/proteins (as biological hypotheses), based on sensitivity indices that capture the strength of influence of factors (here genes/proteins) that affect the Wnt signaling pathway. The pipeline uses kernel-based sensitivity indices to capture the influence of the factors in a pathway and employs a powerful support vector ranking algorithm. As a result, biologists/oncologists will be able to narrow down their search to particular ranked combinations and, if synergistic functioning is confirmed, to study the mechanism between the components of a combination in the Wnt pathway. The search engine design is not limited to one dataset or one range of combinations of genes/proteins. The framework can be applied or modified for any problem where one is interested in searching for particular combinations of factors involved in a particular phenomenon. Recording the changing rankings of the combinations over time points and durations reveals how higher order interactions behave within the pathway and when and where an intervention might be necessary to influence the pathway for therapeutic purposes. It reveals various unexplored FZD-WNT combinations that have so far been untested in the Wnt pathway.
Affiliation(s)
- Shriprakash Sinha
- Independent Researcher, 104 Madhurisha Heights Phase 1, Risali 490006, Chhattisgarh, India
|