1
Gul G. In silico screening of peptide inhibitors targeting α-synuclein for Parkinson's disease. J Mol Graph Model 2025; 139:109079. [PMID: 40381333 DOI: 10.1016/j.jmgm.2025.109079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2024] [Revised: 05/05/2025] [Accepted: 05/13/2025] [Indexed: 05/20/2025]
Abstract
Parkinson's disease affects cognitive, motor, and autonomic functions as a result of nervous system degeneration. Although no cure exists, medications and therapies can alleviate symptoms; however, their effectiveness diminishes as the disease progresses, increasing the need for alternative treatments. α-Synuclein has long been one of the main targets in drug design studies addressing Parkinson's disease, but no drugs against α-synuclein aggregation have yet been approved. This study therefore aims to develop potential inhibitors of fibrillization by screening thousands of peptides for their binding ability via molecular docking and molecular dynamics simulations. Our results show that peptides with lysine and arginine at their terminal groups bind the C-terminal domain with higher affinity. Among the heptapeptides examined, RWRRKRL shows the most favorable binding free energy to the protein, while KKRHKWR exhibits a superior stabilizing effect, interacting with both the N- and C-terminal regions of α-synuclein. The inhibitory potential of the peptides on the fibrillar structure of the protein varies with concentration, and RWRRKRL at a 1:3 protein-to-peptide monomer ratio shows promise as an inhibitor by reducing the protein's internal H-bonds and increasing its RMSD values. These results reveal that short-chain peptides can be designed against α-synuclein oligomerization, offering a potential therapeutic approach for preventing Parkinson's disease.
Affiliation(s)
- Gulsah Gul
- Department of Chemical and Biological Engineering, Koç University, İstanbul, Turkey.
2
Cai J, Zhao J, Bin Y, Xia J, Zheng C. iAmyP: A Multi-view Learning for Amyloidogenic Hexapeptides Identification Based on Sequence Least Squares Programming. Interdiscip Sci 2025; 17:277-292. [PMID: 39546159 DOI: 10.1007/s12539-024-00666-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 10/07/2024] [Accepted: 10/09/2024] [Indexed: 11/17/2024]
Abstract
The development of peptide drugs is hindered by the risk of amyloidogenic aggregation: peptides prone to this type of aggregation may be unsuitable for drug design. Computational methods for predicting amyloidogenic sequences often struggle to extract high-quality features, and their predictive performance leaves room for improvement. To address these challenges, iAmyP was introduced as a specialized computational tool for predicting amyloidogenic hexapeptides. Using multi-view learning, iAmyP incorporates sequence, structural, and evolutionary features, performing feature selection through recursive feature elimination and feature fusion through attention mechanisms. This combination of features, followed by feature selection and fusion, leads to optimal performance, facilitated by an optimization algorithm based on sequence least squares programming. Notably, iAmyP exhibited robust generalization to peptides 7-10 amino acids in length. Hydrophobic amino acids play a critical role in the aggregation process, and a thorough analysis has significantly enhanced our insight into their significance in amyloidogenic hexapeptides. This tool represents an advancement in the development of peptide therapeutics by providing an understanding of amyloidogenic aggregation, establishing itself as a valuable framework for assessing amyloidogenic sequences. The data and code can be freely accessed at https://github.com/xialab-ahu/iAmyP .
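Attention-based fusion of several feature views can be pictured with a minimal sketch. It assumes each view (sequence, structural, evolutionary) has already been projected to vectors of the same length and that a hypothetical learned `query` vector scores the views; iAmyP's actual attention layer and its least-squares-programming optimization step are not reproduced here.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(views, query):
    """Fuse equal-length view vectors with softmax(query . view) weights.

    `query` is a hypothetical learned scoring vector (an assumption for illustration).
    Returns the fused vector and the per-view attention weights.
    """
    scores = [sum(q * v for q, v in zip(query, view)) for view in views]
    weights = softmax(scores)
    dim = len(views[0])
    fused = [sum(w * view[j] for w, view in zip(weights, views)) for j in range(dim)]
    return fused, weights

# Toy views for one hexapeptide (made-up numbers): sequence, structural, evolutionary.
views = [[0.2, 0.8, 0.1], [0.5, 0.4, 0.9], [0.7, 0.1, 0.3]]
query = [1.0, 0.5, -0.5]
fused, weights = attention_fuse(views, query)
print(weights, fused)
```

Because the softmax weights sum to one, the fused vector is a convex combination of the views, so a view that scores higher contributes proportionally more without any view being discarded.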
Affiliation(s)
- Jinling Cai
- College of Mathematics and System Science, Xinjiang University, Urumqi, 830046, China
- Jianping Zhao
- College of Mathematics and System Science, Xinjiang University, Urumqi, 830046, China.
- Yannan Bin
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, and School of Artificial Intelligence, Anhui University, Hefei, 230601, China.
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, China.
- Junfeng Xia
- College of Mathematics and System Science, Xinjiang University, Urumqi, 830046, China.
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, China.
- Chunhou Zheng
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, and School of Artificial Intelligence, Anhui University, Hefei, 230601, China.
3
Cho M, Been N, Son HS. Analysis of protein determinants of genotype-specific properties of group A rotaviruses using machine learning. Comput Biol Med 2025; 191:110143. [PMID: 40203739 DOI: 10.1016/j.compbiomed.2025.110143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 04/01/2025] [Accepted: 04/03/2025] [Indexed: 04/11/2025]
Abstract
Group A rotaviruses (RVAs) are the leading cause of viral diarrhoea across various host species, including mammals and birds. The VP7 and VP4 proteins of these viruses play critical roles in determining genotype specificity, influencing viral infectivity and host adaptation. This study employed machine-learning techniques to classify RVA genotypes based on the molecular and physicochemical properties of these proteins. A dataset of 94 VP7 and 68 VP4 protein sequences was collected from various host species. Seven machine-learning algorithms were used for genotype classification: Naïve Bayes (NB), logistic regression (LR), decision tree (DT), random forest (RF), k-nearest neighbour (kNN), support vector machine (SVM), and artificial neural network (ANN). Feature subsets were configured using ranking-based attribute selection, and classification performance was evaluated using accuracy (ACC), precision, recall, Matthews correlation coefficient (MCC), and the area under the curve (AUC). kNN achieved the highest classification accuracy for both VP7 (ACC = 97.87%) and VP4 (ACC = 100%), outperforming NB, LR, DT, RF, SVM, and ANN. For VP7 sequences, key properties influencing genotype classification included hydrophobicity, normalised van der Waals volume, and leucine composition. For VP4, polarity, normalised van der Waals volume, and polarizability were the most significant factors. In summary, the genotype-specific molecular features of VP7 and VP4 proteins served as reliable markers for RVA classification. Our findings highlight the potential of machine-learning approaches to predict RVA genotypes from the physicochemical properties of amino acids, providing valuable insights into the molecular mechanisms that drive viral evolution, host specificity, and immune evasion.
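The evaluation metrics listed in this abstract have standard definitions on a binary confusion matrix. The sketch below computes ACC, precision, recall, and MCC for one genotype treated one-vs-rest; that binarization is an assumption for illustration, since the abstract does not state how the study averaged these metrics over genotypes. AUC is left out because it needs ranked scores rather than hard labels.

```python
import math

def binary_metrics(y_true, y_pred):
    """ACC, precision, recall and MCC for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # MCC uses all four cells, so it stays informative when classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "precision": precision, "recall": recall, "MCC": mcc}

# One-vs-rest toy labels: 1 = a given genotype, 0 = any other genotype.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
```

With small datasets such as 94 VP7 and 68 VP4 sequences, reporting MCC alongside ACC helps, because a single misclassified sequence shifts ACC by roughly one percentage point.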
Affiliation(s)
- Myeongji Cho
- Laboratory of Computational Virology & Viroinformatics, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; Public Health AI Lab, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; Institute of Health and Environment, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea
- Nara Been
- Laboratory of Computational Virology & Viroinformatics, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; Public Health AI Lab, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea
- Hyeon S Son
- Laboratory of Computational Virology & Viroinformatics, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; Public Health AI Lab, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; Institute of Health and Environment, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; Interdisciplinary Graduate Program in Bioinformatics, College of Natural Science, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea.
4
Yao Y, Zhang D, Fan H, Wu T, Su Y, Bin Y. Prediction of Chemically Modified Antimicrobial Peptides and Their Sub-functional Activities Using Hybrid Features. Probiotics Antimicrob Proteins 2025:10.1007/s12602-025-10575-6. [PMID: 40397268 DOI: 10.1007/s12602-025-10575-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/29/2025] [Indexed: 05/22/2025]
Abstract
Antimicrobial peptides (AMPs) demonstrate a broad spectrum of activity against various pathogens, offering a promising strategy to mitigate the urgent challenge of antimicrobial resistance. Recent studies indicate that chemically modified AMPs (cmAMPs), which contain chemically modified amino acids, have the potential to alleviate the adverse effects commonly associated with conventional AMPs. Nevertheless, computational methods specifically designed to analyze and predict cmAMPs and their sub-functional activities remain scarce. In this study, we proposed a two-layer model, termed iCMAMP, for identifying cmAMPs and their sub-functional activities. The first layer, iCMAMP-1L, integrates seven distinct groups of features spanning three categories with an ensemble method designed to enhance predictive accuracy for cmAMPs. This ensemble approach effectively extracts relevant information from a heterogeneous array of feature sets while addressing potential dimensionality challenges. On the test dataset, iCMAMP-1L achieved an ACC of 0.934 and an MCC of 0.868, representing improvements of 3.4% and 6.8%, respectively, over AntiMPmod, the only existing method for predicting cmAMPs. A comparative analysis between cmAMPs and their corresponding AMPs revealed that chemical modifications can significantly reduce the hemolysis and toxicity associated with AMPs, while the functional characteristics of the peptides are primarily determined by their sequences. The second layer, iCMAMP-2L, employs a multi-label classification approach to predict the sub-functional activities of cmAMPs, focusing specifically on dipeptide composition-based features. On the test dataset, iCMAMP-2L achieved an Accuracy of 0.390 and an Absolute true of 0.621.
The data and Python code used in the iCMAMP model are available at https://github.com/swicher123/iCMAMP/tree/master .
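Dipeptide composition, the feature family that iCMAMP-2L focuses on, is conventionally a 400-dimensional vector of ordered residue-pair frequencies. A minimal sketch, assuming the 20 standard residues and simply skipping any pair that contains another symbol; how iCMAMP actually encodes chemically modified residues is not stated in the abstract, so that handling is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def dipeptide_composition(seq):
    """Return a 400-dim vector of ordered residue-pair frequencies.

    Pairs containing a non-standard symbol (e.g. a placeholder for a modified
    residue) are skipped; frequencies are normalised over the remaining pairs.
    """
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    n = sum(counts.values())
    return [counts[d] / n if n else 0.0 for d in DIPEPTIDES]

vec = dipeptide_composition("KKRW")  # hypothetical peptide: pairs KK, KR, RW
print(len(vec), vec[DIPEPTIDES.index("KR")])
```

Keeping the pair order (KR and RK are different features) is what distinguishes dipeptide composition from a plain amino acid composition, at the cost of a sparse vector for short peptides.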
Affiliation(s)
- Yujie Yao
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Daijun Zhang
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Henghui Fan
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Ting Wu
- Department of Infectious Diseases & Anhui Province Key Laboratory of Infectious Diseases, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, Anhui, China.
- Institute of Bacterial Resistance & Anhui Center for Surveillance of Bacterial Resistance, Anhui Medical University, Hefei, 230022, Anhui, China.
- Yansen Su
- School of Artificial Intelligence, Anhui University, Hefei, 230601, Anhui, China.
- Yannan Bin
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.
5
Gaffar S, Chong KT, Tayara H. TFProtBert: Detection of Transcription Factors Binding to Methylated DNA Using ProtBert Latent Space Representation. Int J Mol Sci 2025; 26:4234. [PMID: 40362469 PMCID: PMC12071566 DOI: 10.3390/ijms26094234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2025] [Revised: 04/22/2025] [Accepted: 04/24/2025] [Indexed: 05/15/2025] [Open access]
Abstract
Transcription factors (TFs) are fundamental regulators of gene expression and perform diverse functions in cellular processes. The management of three-dimensional (3D) genome conformation and gene expression relies primarily on TFs. They recruit transcriptional machinery to the enhancers or promoters of specific genes, thereby activating or inhibiting transcription. Identifying TFs is therefore a significant step towards understanding cellular gene expression mechanisms. Because experimental methods are time-consuming and labor-intensive, the development of computational models is essential. In this work, we introduced a two-layer prediction framework based on a support vector machine (SVM) that uses the latent space representation of the protein language model ProtBert. The first layer reliably predicts and identifies TFs, and the second layer predicts and identifies transcription factors that preferentially bind methylated deoxyribonucleic acid (TFPMs). We also tested the proposed method on an imbalanced dataset. In detecting TFs and TFPMs, the proposed model consistently outperformed state-of-the-art approaches, as demonstrated by performance comparisons in cross-validation and independent tests.
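The two-layer scheme described here is a cascade: the second classifier only sees proteins the first one accepts as TFs. A minimal sketch, assuming each trained layer is exposed as a linear decision function f(x) = w.x + b; the SVM kernel, the weights, and the toy three-dimensional "embeddings" are hypothetical stand-ins, since computing real ProtBert embeddings is out of scope here.

```python
def linear_decision(weights, bias):
    """Return f(x) = w.x + b, the decision function of a linear classifier such as a linear SVM."""
    def f(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return f

def two_layer_predict(x, layer1, layer2):
    """Layer 1 separates TFs from non-TFs; layer 2 runs only on predicted TFs (TFPM vs non-TFPM)."""
    if layer1(x) <= 0:
        return "non-TF"
    return "TFPM" if layer2(x) > 0 else "TF, non-TFPM"

# Hypothetical weights on toy 3-dim vectors (real ProtBert embeddings are much higher-dimensional).
layer1 = linear_decision([1.0, 0.0, 0.0], -0.5)
layer2 = linear_decision([0.0, 1.0, -1.0], 0.0)
for emb in ([0.2, 0.9, 0.1], [0.8, 0.9, 0.1], [0.8, 0.1, 0.9]):
    print(emb, two_layer_predict(emb, layer1, layer2))
```

A cascade like this means layer-1 errors propagate: a TFPM rejected as non-TF in the first layer is never seen by the second.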
Affiliation(s)
- Saima Gaffar
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Advanced Electronics and Information Research Centre, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
6
Feng H, Nie Q, Yang S. SORFPP: Enhancing rich sequence-driven information to identify SEPs based on fused framework on validation datasets. PLoS One 2025; 20:e0320314. [PMID: 40294059 PMCID: PMC12036913 DOI: 10.1371/journal.pone.0320314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Accepted: 02/17/2025] [Indexed: 04/30/2025] [Open access]
Abstract
BACKGROUND Genome sequencing has enabled the discovery of functional peptides encoded by short open reading frames (sORFs) in long non-coding RNAs (lncRNAs). Unlike common peptides, sORF-encoded peptides (SEPs) play significant roles in regulating gene expression, signaling, and other processes. Various computational methods have been proposed; however, they lack contributive features and effective models. A high-throughput computational method to predict SEPs is therefore needed. RESULTS We propose a computational method, SORFPP, that predicts SEPs by mining feature information from multiple perspectives in an experimentally validated dataset from TranLnc. SORFPP extracts SEP sequence information using the protein language model ESM-2 together with curated traditional encodings such as QSOrder and k-mer. SORFPP uses CatBoost to handle the sparsity of the traditional encodings and analyzes the ESM-2 pre-trained representations with a self-attention model. Finally, an ensemble learning framework combines the two models, and their outputs are fed into a logistic regression model for accurate and robust predictions. In comparisons, SORFPP outperforms other state-of-the-art models in Matthews correlation coefficient by 12.2%-24.2% on three benchmark datasets. CONCLUSION Integrating an ensemble learning strategy with contributive traditional features and protein language encodings yields better performance. Datasets and code are accessible at https://doi.org/10.6084/m9.figshare.28079897 and http://111.229.198.94:5000/.
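The sparsity problem mentioned for traditional encodings is easy to see with k-mer features: a fixed-length vector over all 20^k possible k-mers is almost entirely zeros for a short peptide. A minimal sketch, assuming relative-frequency normalisation and the 20 standard residues; the k values and normalisation SORFPP actually uses are not given in the abstract, and the sequence below is hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kmer_frequencies(seq, k):
    """Sparse k-mer encoding: {k-mer: relative frequency} for k-mers present in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return {}
    counts = Counter(kmers)
    return {kmer: c / len(kmers) for kmer, c in counts.items()}

def dense_kmer_vector(freqs, k):
    """Expand the sparse encoding to a fixed 20**k-dim vector (mostly zeros)."""
    return [freqs.get("".join(p), 0.0) for p in product(AMINO_ACIDS, repeat=k)]

seq = "MKKLLAKKL"  # hypothetical SEP fragment
freqs = kmer_frequencies(seq, 3)
dense = dense_kmer_vector(freqs, 3)
print(len(dense), sum(1 for v in dense if v))  # vector length vs non-zero entries
```

For this 9-residue fragment only 6 of the 8,000 3-mer slots are non-zero, which is why a learner that copes well with sparse, high-dimensional inputs is attractive for this branch of the model.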
Affiliation(s)
- Hongqi Feng
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, China
- Qi Nie
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, China
- Sen Yang
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, China
- The Affiliated Changzhou No.2 People’s Hospital of Nanjing Medical University, Changzhou, China
7
Shao Y, Liu T. iNClassSec-ESM: Discovering potential non-classical secreted proteins through a novel protein language model. Comput Struct Biotechnol J 2025; 27:1350-1358. [PMID: 40235638 PMCID: PMC11999076 DOI: 10.1016/j.csbj.2025.03.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2024] [Revised: 03/15/2025] [Accepted: 03/26/2025] [Indexed: 04/17/2025] [Open access]
Abstract
Non-classical secreted proteins (NCSPs) are a class of proteins that lack signal peptides and are secreted by Gram-positive bacteria through non-classical secretion pathways. With the growing demand for highly secreted proteins in recent years, non-classical secretion pathways have received more attention because of their advantages over classical secretion pathways (Sec/Tat). However, because the mechanisms of non-classical secretion are not yet clear, identifying NCSPs through biological experiments is expensive and time-consuming, making the development of computational methods imperative. Existing NCSP prediction methods mainly use traditional handcrafted features to represent proteins from sequence information, which limits their ability to capture complex protein characteristics. In this study, we proposed a novel NCSP predictor, iNClassSec-ESM, which combines deep learning with traditional classifiers to enhance prediction performance. iNClassSec-ESM integrates an XGBoost model trained on comprehensive handcrafted features and a deep neural network (DNN) trained on hidden layer embeddings from the protein language model (PLM) ESM3. ESM3 is a recently proposed multimodal PLM whose protein representations have not yet been fully explored. We therefore extracted hidden layer embeddings from ESM3 as inputs for multiple classifiers and deep learning networks and compared them with those of existing PLMs. Benchmark experiments indicate that iNClassSec-ESM outperforms most existing methods across multiple performance metrics and could serve as an effective tool for discovering potential NCSPs. Additionally, ESM3 hidden layer embeddings, as an innovative protein representation method, show great potential for broader protein-related classification tasks. The source code of iNClassSec-ESM and the ESM3 embedding extraction script are publicly available at https://github.com/AmamiyaHoshie/iNClassSec-ESM/.
Affiliation(s)
- Yizhou Shao
- College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
- Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
8
Ferrari ÁJR, Dixit SM, Thibeault J, Garcia M, Houliston S, Ludwig RW, Notin P, Phoumyvong CM, Martell CM, Jung MD, Tsuboyama K, Carter L, Arrowsmith CH, Guttman M, Rocklin GJ. Large-scale discovery, analysis, and design of protein energy landscapes. bioRxiv: The Preprint Server for Biology 2025:2025.03.20.644235. [PMID: 40196533 PMCID: PMC11974690 DOI: 10.1101/2025.03.20.644235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2025]
Abstract
All folded proteins continuously fluctuate between their low-energy native structures and higher-energy conformations that can be partially or fully unfolded. These rare states influence protein function, interactions, aggregation, and immunogenicity, yet they remain far less understood than protein native states. Although native protein structures are now often predictable with impressive accuracy, conformational fluctuations and their energies remain largely invisible and unpredictable, and experimental challenges have prevented large-scale measurements that could improve machine learning and physics-based modeling. Here, we introduce a multiplexed experimental approach to analyze the energies of conformational fluctuations for hundreds of protein domains in parallel using intact protein hydrogen-deuterium exchange mass spectrometry. We analyzed 5,778 domains 28-64 amino acids in length, revealing hidden variation in conformational fluctuations even between sequences sharing the same fold and global folding stability. Site-resolved hydrogen exchange NMR analysis of 13 domains showed that these fluctuations often involve entire secondary structural elements with lower stability than the overall fold. Computational modeling of our domains identified structural features that correlated with the experimentally observed fluctuations, enabling us to design mutations that stabilized low-stability structural segments. Our dataset enables new machine learning-based analysis of protein energy landscapes, and our experimental approach promises to reveal these landscapes at unprecedented scale.
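Hydrogen-exchange rates are commonly converted into the free energy of the opening fluctuation that exposes an amide, using the EX2-regime relation ΔG_op = RT ln(k_int / k_obs), where k_int is the intrinsic exchange rate of an unstructured segment and k_obs the measured rate. A minimal sketch of that textbook conversion; the abstract does not say which exchange regime or fitting procedure the study used, and the rates below are hypothetical.

```python
import math

R_KCAL = 1.98720425864083e-3  # gas constant in kcal/(mol*K)

def protection_factor(k_int, k_obs):
    """PF = intrinsic (unstructured) exchange rate / observed exchange rate."""
    return k_int / k_obs

def opening_free_energy(k_int, k_obs, temperature_k=298.15):
    """Opening free energy under the EX2 assumption: dG = R*T*ln(k_int / k_obs), in kcal/mol."""
    return R_KCAL * temperature_k * math.log(protection_factor(k_int, k_obs))

# Hypothetical rates for one amide site (per second); not data from the study.
dg = opening_free_energy(k_int=10.0, k_obs=0.01)
print(round(dg, 2))
```

Here a protection factor of 1,000 corresponds to roughly 4.1 kcal/mol at 25 °C; each further tenfold slowdown of exchange adds about 1.4 kcal/mol.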
Affiliation(s)
- Állan J. R. Ferrari
- Department of Pharmacology & Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Sugyan M. Dixit
- Department of Pharmacology & Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Jane Thibeault
- Department of Pharmacology & Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Mario Garcia
- Department of Pharmacology & Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Scott Houliston
- Structural Genomics Consortium, University of Toronto, Toronto, ON M5G 1L7, Canada; Princess Margaret Cancer Centre, University of Toronto, Toronto, ON M5G 2M9, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 2M9, Canada
- Robert W. Ludwig
- Department of Pharmacology & Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Pascal Notin
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Claire M. Phoumyvong
- Department of Pharmacology & Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Cydney M. Martell
- Department of Pharmacology & Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Michelle D. Jung
- Department of Pharmacology & Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Kotaro Tsuboyama
- Department of Pharmacology & Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Current address: Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
- Lauren Carter
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Current address: Bill & Melinda Gates Medical Research Institute
- Cheryl H. Arrowsmith
- Structural Genomics Consortium, University of Toronto, Toronto, ON M5G 1L7, Canada; Princess Margaret Cancer Centre, University of Toronto, Toronto, ON M5G 2M9, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 2M9, Canada
- Miklos Guttman
- Department of Medicinal Chemistry, University of Washington, Seattle, WA, USA
- Gabriel J. Rocklin
- Department of Pharmacology & Center for Synthetic Biology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Robert H. Lurie Comprehensive Cancer Center, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
9
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Stack-AVP: A Stacked Ensemble Predictor Based on Multi-view Information for Fast and Accurate Discovery of Antiviral Peptides. J Mol Biol 2025; 437:168853. [PMID: 39510347 DOI: 10.1016/j.jmb.2024.168853] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/11/2024] [Revised: 10/22/2024] [Accepted: 10/31/2024] [Indexed: 11/15/2024]
Abstract
AVPs, or antiviral peptides, are short chains of amino acids capable of inhibiting viral replication, preventing viral entry, or disrupting viral membranes. They represent a promising area of research for developing new antiviral therapies because they can potentially target a broad spectrum of viruses, including those resistant to traditional antiviral drugs. However, traditional experimental methods for identifying AVPs are often costly and labour-intensive. Multiple computational methods have been introduced for the in silico identification of AVPs, but they still have certain shortcomings. In this study, we propose a novel stacked ensemble learning framework, termed Stack-AVP, for fast and accurate AVP identification. In Stack-AVP, we investigated heterogeneous prediction models trained with 12 commonly used machine learning algorithms coupled with a wide range of feature encoding schemes. These prediction models were then used to generate multi-view features providing class information and probability information. Finally, we applied our feature selection method to determine the best feature subset for constructing the final stacked model. Comparative assessments on the independent test dataset revealed that Stack-AVP surpassed current state-of-the-art methods, with an accuracy of 0.930, MCC of 0.860, and AUC of 0.975. Furthermore, our multi-view features were found to be crucial for improving AVP prediction performance. To help experimental scientists perform high-throughput identification of AVPs, the Stack-AVP prediction server is publicly accessible at https://pmlabqsar.pythonanywhere.com/Stack-AVP.
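The stacking idea described here, in which base models emit class and probability information that becomes the input of a final model, can be sketched in a few lines. This is a toy illustration under stated assumptions: two hypothetical base models stand in for the 12 algorithms, a plain gradient-descent logistic regression stands in for the meta-learner, and Stack-AVP's feature selection step is not reproduced.

```python
import math

def multi_view_features(x, base_models, threshold=0.5):
    """Concatenate each base model's probability and its hard class label (0/1)."""
    feats = []
    for model in base_models:
        p = model(x)
        feats.extend([p, 1.0 if p >= threshold else 0.0])
    return feats

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Meta-learner: logistic regression fitted by plain batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Two hypothetical base models that each read one precomputed score from the input.
base_models = [lambda x: x[0], lambda x: x[1]]
raw = [[0.9, 0.8], [0.8, 0.6], [0.7, 0.9], [0.2, 0.1], [0.3, 0.4], [0.1, 0.3]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = AVP, 0 = non-AVP (toy data)
meta_X = [multi_view_features(x, base_models) for x in raw]
w, b = train_logistic(meta_X, labels)
print(predict_proba(w, b, multi_view_features([0.85, 0.75], base_models)))
```

In a real stacked model the meta-features for training should come from out-of-fold (cross-validated) base-model predictions; fitting the meta-learner on in-sample base predictions tends to overestimate performance.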
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; Kasetsart University International College (KUIC), Kasetsart University, Bangkok 10900, Thailand
- Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
10
Ho CH, Chu YW, Huang LY, Chen CW. SUMO-LMNet: Lossless mapping network for predicting SUMOylation sites in SUMO1 and SUMO2 using high-dimensional features. Comput Struct Biotechnol J 2025; 27:1048-1059. [PMID: 40143924 PMCID: PMC11937687 DOI: 10.1016/j.csbj.2025.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2024] [Revised: 03/02/2025] [Accepted: 03/04/2025] [Indexed: 03/28/2025] [Open access]
Abstract
Accurate SUMOylation site prediction is crucial for deciphering gene regulation and disease mechanisms. However, distinguishing SUMO1 and SUMO2 modifications remains a major challenge due to their structural similarities. Conventional prediction models often struggle to differentiate between these paralogues, limiting their applicability in biological research. To address this, we introduce SUMO-LMNet, a deep learning-based framework for the precise prediction of SUMO1 and SUMO2 sites. Unlike previous models, SUMO-LMNet integrates a lossless mapping strategy and deep learning architectures to enhance both prediction accuracy and interpretability. Our model extracts high-dimensional features from sequences and transforms them into two-dimensional feature maps, enabling convolutional neural networks (CNNs) to effectively capture both local and global dependencies within the data. By leveraging a Lossless Mapping Network (LM-Net), this approach preserves the original feature space, ensuring that feature integrity is retained without loss of spatial information. While Grad-CAM highlights key features in individual predictions, it lacks consistency across samples and does not provide a dataset-wide evaluation of feature importance. To address this, we introduce Combined Heatmap Feature Analysis (CHFA), which systematically aggregates feature importance across multiple samples, providing a more reliable and interpretable dataset-wide assessment. Experimental results reveal distinct feature dependencies between SUMO1 and SUMO2, underscoring the necessity of paralogue-specific predictive models. Through a systematic comparison of multiple neural network architectures, we demonstrate that our model achieves over 80% accuracy in distinguishing SUMO1 and SUMO2 modification sites. By prioritizing candidate sites for further study, our model aids experimental design and accelerates the discovery of biologically relevant SUMOylation targets.
SUMO-LMNet is publicly available at https://predictor.isu.edu.tw/sumo-lmnet.
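One simple way to turn a one-dimensional feature vector into a two-dimensional map that a CNN can consume, without discarding any values, is to zero-pad it to the next perfect square and reshape it row by row. This is a minimal sketch of that general idea only; the abstract does not specify LM-Net's actual mapping, so the padding and row-major layout here are assumptions.

```python
import math

def to_feature_map(vector):
    """Zero-pad a 1-D feature vector to the next perfect square, then reshape it row-major."""
    n = len(vector)
    side = math.isqrt(n)
    if side * side < n:
        side += 1
    padded = list(vector) + [0.0] * (side * side - n)
    return [padded[r * side:(r + 1) * side] for r in range(side)]

def from_feature_map(grid, n):
    """Invert to_feature_map: flatten row-major and drop the padding."""
    return [v for row in grid for v in row][:n]

features = [float(i) for i in range(10)]  # hypothetical 10-dim feature vector
grid = to_feature_map(features)           # 4 x 4 map, last 6 cells are padding
assert from_feature_map(grid, len(features)) == features  # the round trip loses nothing
```

The mapping is lossless in the sense that the original vector is exactly recoverable; whether neighbouring cells in the map carry related features depends entirely on the feature ordering, which this sketch leaves unchanged.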
Affiliation(s)
- Cheng-Hsun Ho
- Department of Medical Laboratory Science, College of Medical Science and Technology, I-Shou University, Kaohsiung City, Taiwan
- Yen-Wei Chu
- Graduate Institute of Genomics and Bioinformatics, National Chung Hsing University, Taichung City, Taiwan
- Doctoral Program in Medical Biotechnology, National Chung Hsing University, Taichung City, Taiwan
- Institute of Molecular Biology, National Chung Hsing University, Taichung City, Taiwan
- Smart Sustainable New Agriculture Research Center (SMARTer), Taichung City, Taiwan
- Lan-Ying Huang
- Doctoral Program in Medical Biotechnology, National Chung Hsing University, Taichung City, Taiwan
- Chi-Wei Chen
- Graduate Degree Program of Smart Healthcare & Bioinformatics, I-Shou University, Kaohsiung City, Taiwan
- Department of Biomedical Engineering, I-Shou University, Kaohsiung City, Taiwan
11
Sun J, Ru J, Cribbs AP, Xiong D. PyPropel: a Python-based tool for efficiently processing and characterising protein data. BMC Bioinformatics 2025; 26:70. [PMID: 40025421 PMCID: PMC11871610 DOI: 10.1186/s12859-025-06079-3] [Received: 10/21/2024] [Accepted: 02/10/2025] [Indexed: 03/04/2025]
Abstract
BACKGROUND: The volume of protein sequence data has grown exponentially in recent years, driven by advancements in metagenomics. Despite this, a substantial proportion of these sequences remain poorly annotated, underscoring the need for robust bioinformatics tools to facilitate efficient characterisation and annotation for functional studies.
RESULTS: We present PyPropel, a Python-based computational tool developed to streamline the large-scale analysis of protein data, with a particular focus on applications in machine learning. PyPropel integrates sequence and structural data pre-processing, feature generation, and post-processing for model performance evaluation and visualisation, offering a comprehensive solution for handling complex protein datasets.
CONCLUSION: PyPropel provides added value over existing tools by offering a unified workflow that encompasses the full spectrum of protein research, from raw data pre-processing to functional annotation and model performance analysis, thereby supporting efficient protein function studies.
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, University of Oxford, Headington, Oxford, OX3 7LD, UK.
- Jinlong Ru
- Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, 85354, Freising, Germany
- Adam P Cribbs
- Botnar Research Centre, University of Oxford, Headington, Oxford, OX3 7LD, UK
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, 14853, USA.
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, USA.
12
Ochoa R, Deibler K. PepFuNN: Novo Nordisk Open-Source Toolkit to Enable Peptide in Silico Analysis. J Pept Sci 2025; 31:e3666. [PMID: 39777768 PMCID: PMC11706630 DOI: 10.1002/psc.3666] [Received: 10/17/2024] [Revised: 12/04/2024] [Accepted: 12/09/2024] [Indexed: 01/11/2025]
Abstract
We present PepFuNN, a new open-source version of the PepFun package with functions to study the chemical space of peptide libraries and perform structure-activity relationship analyses. PepFuNN is a Python package comprising five modules for studying peptides with natural amino acids and, in some cases, sequences with non-natural amino acids, depending on the availability of a public monomer dictionary. The modules calculate physicochemical properties, perform similarity analysis using different peptide representations, cluster peptides using molecular fingerprints or calculated descriptors, and design peptide libraries based on specific requirements; a dedicated module extracts matched pairs from experimental campaigns to guide the selection of the most relevant mutations when designing new rounds. The code and tutorials are available at https://github.com/novonordisk-research/pepfunn.
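Matched-pair extraction can be illustrated without relying on PepFuNN's own API, whose function names are not given here. The sketch assumes a dictionary mapping equal-length sequences of natural amino acids to one measured activity value, and treats a matched pair as two sequences that differ by a single substitution. The function name and the example sequences and activities are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def single_substitution_pairs(
    activities: Dict[str, float]
) -> List[Tuple[str, str, str, float]]:
    """Return (seq_a, seq_b, mutation, delta) for every pair of equal-length
    sequences that differ at exactly one position. The mutation is written as
    <residue in seq_a><1-based position><residue in seq_b>, and delta is
    activity(seq_b) - activity(seq_a)."""
    pairs = []
    for a, b in combinations(sorted(activities), 2):
        if len(a) != len(b):
            continue  # only same-length sequences can be single substitutions
        diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        if len(diffs) == 1:
            i = diffs[0]
            mutation = f"{a[i]}{i + 1}{b[i]}"
            pairs.append((a, b, mutation, activities[b] - activities[a]))
    return pairs
```

For example, `single_substitution_pairs({"KLAKLAK": 1.0, "KLAKLAR": 2.5, "KLGKLAR": 0.5})` returns `[("KLAKLAK", "KLAKLAR", "K7R", 1.5), ("KLAKLAR", "KLGKLAR", "A3G", -2.0)]`; the first and third sequences differ at two positions and are not paired.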
Affiliation(s)
- Kristine Deibler
- Novo Nordisk Research Center Seattle, Novo Nordisk A/S, Seattle, Washington, USA
13
Yue J, Li T, Xu J, Chen Z, Li Y, Liang S, Liu Z, Wang Y. Discovery of anticancer peptides from natural and generated sequences using deep learning. Int J Biol Macromol 2025; 290:138880. [PMID: 39706427 DOI: 10.1016/j.ijbiomac.2024.138880] [Received: 10/23/2024] [Revised: 12/10/2024] [Accepted: 12/16/2024] [Indexed: 12/23/2024]
Abstract
Anticancer peptides (ACPs) show significant potential for clinical cancer treatment because they can selectively target and kill cancer cells. In recent years, numerous artificial intelligence (AI) algorithms have been developed to predict ACPs. However, many of these predictive methods lack sufficient wet-lab validation, which constrains model development and impedes the discovery of novel ACPs. This study proposes a comprehensive research strategy built around CNBT-ACPred, an ACP prediction model based on a three-channel deep learning architecture and supported by extensive in vitro and in vivo experiments. CNBT-ACPred achieved an accuracy of 0.9554 and a Matthews correlation coefficient (MCC) of 0.8602. Compared with existing leading models, CNBT-ACPred increased accuracy by at least 5% and improved MCC by 15%. Predictions were run on over 3.8 million sequences from UniProt, along with 100,000 sequences generated by a deep generative model, ultimately identifying 41 candidate peptides from more than 30 species, 37 of which exhibited effective in vitro tumor inhibitory activity. Among these, tPep14 demonstrated significant anticancer effects in two mouse xenograft models without detectable toxicity. Finally, the study revealed correlations between the amino acid composition, structure, and function of the identified ACP candidates.
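The accuracy and MCC figures quoted above follow the standard binary-classification definitions, with MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch of both metrics; any confusion-matrix counts passed to these functions are illustrative and not taken from the study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.
    Returns 0.0 when any marginal total is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the positive and negative classes are imbalanced.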
Affiliation(s)
- Jianda Yue
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Tingting Li
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Jiawei Xu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Zihui Chen
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Yaqi Li
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Songping Liang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Zhonghua Liu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Ying Wang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D platform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
14
Gao Q, Xu T, Li X, Gao W, Shi H, Zhang Y, Chen J, Yue Z. Interpretable Dynamic Directed Graph Convolutional Network for Multi-Relational Prediction of Missense Mutation and Drug Response. IEEE J Biomed Health Inform 2025; 29:1514-1524. [PMID: 39423073 DOI: 10.1109/jbhi.2024.3483316] [Indexed: 10/21/2024]
Abstract
Tumor heterogeneity presents a significant challenge in predicting drug responses, especially because missense mutations within the same gene can lead to varied outcomes such as drug resistance, enhanced sensitivity, or therapeutic ineffectiveness. These complex relationships highlight the need for advanced analytical approaches in oncology. Because of their ability to handle heterogeneous data, graph convolutional networks (GCNs) are a promising approach for predicting drug responses. However, simple bipartite graphs cannot accurately capture the complex relationships between missense mutations and drug response. Furthermore, deep learning models for drug response are often considered "black boxes", and their interpretability remains a widely discussed issue. To address these challenges, we propose an Interpretable Dynamic Directed Graph Convolutional Network (IDDGCN) framework with four key features: 1) directed graphs that distinguish sensitivity relationships from resistance relationships, 2) dynamic updating of node weights based on node-specific interactions, 3) exploration of associations between different mutations within the same gene and drug response, and 4) enhanced interpretability through a weighting mechanism that accounts for biological significance, together with a ground-truth construction method for evaluating prediction transparency. The experimental results demonstrate that IDDGCN outperforms existing state-of-the-art models and exhibits excellent predictive power. Both qualitative and quantitative evaluations of its interpretability further highlight its ability to explain predictions, offering a fresh perspective for precision oncology and targeted drug development.
15
Viesi E, Perricone U, Aloy P, Giugno R. APBIO: bioactive profiling of air pollutants through inferred bioactivity signatures and prediction of novel target interactions. J Cheminform 2025; 17:13. [PMID: 39891207 PMCID: PMC11786462 DOI: 10.1186/s13321-025-00961-1] [Received: 09/19/2024] [Accepted: 01/20/2025] [Indexed: 02/03/2025]
Abstract
More sophisticated representations of compounds attempt to incorporate not only information on the structure and physicochemical properties of molecules but also knowledge about their biological traits, yielding the so-called bioactivity profile. Bioactive profiling of air pollutants is both challenging and crucial: their biological activity and toxicological effects have not yet been thoroughly investigated, and further exploration could shed light on the impact of air pollution on complex disorders. A biological signature that simultaneously captures the chemistry and the biology of small molecules may therefore help predict the behaviour of such ligands towards a protein target. Moreover, interactions between biological entities can be represented through combined feature vectors that are given as input to a machine learning (ML) model to capture the underlying interaction. To this end, we propose a chemogenomic approach, called Air Pollutant Bioactivity (APBIO), which integrates compound bioactivity signatures and target sequence descriptors to train ML classifiers that are subsequently used to predict potential compound-target interactions (CTIs). We report the performance of the proposed methodology and, using external validation sets, show that it outperforms existing molecular representations in terms of model generalizability. We have also developed a publicly available Streamlit application for APBIO at ap-bio.streamlit.app, allowing users to predict associations between investigated compounds and protein targets.
Scientific contribution: We derived ex novo bioactivity signatures for air pollutant molecules to capture their biological behaviour and associations with protein targets. The proposed chemogenomic methodology enables the prediction of novel CTIs for known or similar compounds and targets through well-established and efficient ML models, deepening our insight into the molecular interactions and mechanisms that may have a deleterious impact on human biological systems.
Affiliation(s)
- Eva Viesi
- Department of Computer Science, University of Verona, Verona, Italy.
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain.
- NBFC, National Biodiversity Future Center, Palermo, Italy.
- Ugo Perricone
- Molecular Informatics Unit, Ri.MED Foundation, Palermo, Italy
- NBFC, National Biodiversity Future Center, Palermo, Italy
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
- Rosalba Giugno
- Department of Computer Science, University of Verona, Verona, Italy
- NBFC, National Biodiversity Future Center, Palermo, Italy
16
Emmanuel J, Isewon I, Oyelade J. An optimized deep-forest algorithm using a modified differential evolution optimization algorithm: A case of host-pathogen protein-protein interaction prediction. Comput Struct Biotechnol J 2025; 27:595-611. [PMID: 39995682 PMCID: PMC11849198 DOI: 10.1016/j.csbj.2025.01.020] [Received: 11/11/2024] [Revised: 01/21/2025] [Accepted: 01/21/2025] [Indexed: 02/26/2025]
Abstract
Deep Forest employs forest structures and leverages a deep architecture to learn feature vector information adaptively. However, deep forest-based models have limitations such as manual hyperparameter optimization and inefficient time and memory usage. Bayesian optimization is a widely used model-based hyperparameter optimization method, and evolutionary algorithms such as Differential Evolution (DE) have recently been introduced to improve its acquisition function. Despite its effectiveness, DE has a significant drawback: it relies on randomly selecting indices from the population of target vectors to construct donor vectors in search of optimal solutions. This randomness is inefficient, as suboptimal or redundant indices may be selected. In this research, we therefore developed a modified DE acquisition function for improved host-pathogen protein-protein interaction prediction. The modified DE introduces a weighted and adaptive donor vector technique that selects the best-fitted donor vectors instead of choosing them at random. This modified optimization approach was implemented in a deep forest model for automatic hyperparameter optimization. The performance of the optimized deep forest model was evaluated on human-Plasmodium falciparum protein sequence datasets using 10-fold cross-validation, and the results were compared with standard optimization methods such as traditional Bayesian optimization, genetic algorithms, and evolutionary strategies, as well as with other machine learning models. The optimized model achieved an accuracy of 89.3%, a sensitivity of 85.4%, and a precision of 91.6%, outperforming the other models across all metrics. Additionally, the optimized model predicted seven novel host-pathogen interactions. Finally, the model was implemented as a web application that is accessible at http://dfh3pi.covenantuniversity.edu.ng.
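The abstract describes replacing DE's random choice of donor indices with a weighted, adaptive choice that favours better-fitted vectors, but it gives no formula. The sketch below assumes one simple interpretation, sampling the three donor indices with probability proportional to fitness rank, and applies it to a toy minimization problem rather than to deep-forest hyperparameters or a Bayesian-optimization acquisition function. It is not the authors' implementation; `weighted_de` and its defaults are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def weighted_de(
    f: Callable[[Vector], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    generations: int = 200,
    F: float = 0.5,
    CR: float = 0.9,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimize f with DE/x/1/bin, where the three donor indices are drawn
    with probability proportional to fitness rank (assumed weighting) instead
    of uniformly at random as in classic DE/rand/1."""
    if pop_size < 4:
        raise ValueError("pop_size must be at least 4")
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        # Rank-based weights: best individual gets pop_size, worst gets 1.
        order = sorted(range(pop_size), key=lambda i: fit[i])
        weight = [0.0] * pop_size
        for rank, idx in enumerate(order):
            weight[idx] = float(pop_size - rank)
        for i in range(pop_size):
            candidates = [j for j in range(pop_size) if j != i]
            cand_w = [weight[j] for j in candidates]
            donors: List[int] = []
            while len(donors) < 3:  # three distinct, fitness-biased donors
                j = rng.choices(candidates, weights=cand_w)[0]
                if j not in donors:
                    donors.append(j)
            a, b, c = (pop[j] for j in donors)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                v = a[d] + F * (b[d] - c[d]) if (rng.random() < CR or d == j_rand) else pop[i][d]
                lo, hi = bounds[d]
                trial.append(min(max(v, lo), hi))  # clip to the search box
            ft = f(trial)
            if ft <= fit[i]:  # greedy one-to-one replacement
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]
```

For example, `weighted_de(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)` minimizes the 3-D sphere function. Rank-based weights, rather than raw fitness values, keep the selection pressure bounded regardless of the objective's scale.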
Affiliation(s)
- Jerry Emmanuel
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
- Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Nigeria
- Covenant University Bioinformatics Research (CUBRe), Nigeria
- Itunuoluwa Isewon
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
- Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Nigeria
- Covenant University Bioinformatics Research (CUBRe), Nigeria
- Jelili Oyelade
- Department of Computer and Information Sciences, Covenant University, Ota, Nigeria
- Covenant Applied Informatics and Communication African Centre of Excellence (CApIC-ACE), Nigeria
- Covenant University Bioinformatics Research (CUBRe), Nigeria
17
Luo Z, Wang Q, Xia Y, Zhu X, Yang S, Xu Z, Gu L. DLBWE-Cys: a deep-learning-based tool for identifying cysteine S-carboxyethylation sites using binary-weight encoding. Front Genet 2025; 15:1464976. [PMID: 39845187 PMCID: PMC11751040 DOI: 10.3389/fgene.2024.1464976] [Received: 07/15/2024] [Accepted: 12/23/2024] [Indexed: 01/24/2025]
Abstract
Cysteine S-carboxyethylation, a novel post-translational modification (PTM), plays a critical role in the pathogenesis of autoimmune diseases, particularly ankylosing spondylitis. Accurate identification of S-carboxyethylation sites is essential for elucidating their functional mechanisms. Unfortunately, no existing computational tool can accurately predict these sites, which poses a significant challenge to research in this area. In this study, we developed a new deep learning model, DLBWE-Cys, which integrates a CNN, a BiLSTM, a Bahdanau attention mechanism, and a fully connected neural network (FNN), and uses a Binary-Weight encoding specifically designed for the accurate identification of cysteine S-carboxyethylation sites. Our experimental results show that this architecture outperforms other machine learning and deep learning models in both 5-fold cross-validation and independent testing. Feature comparison experiments confirmed the superiority of the proposed Binary-Weight encoding over other encoding techniques, and t-SNE visualization further supported the model's classification capability. We also confirmed that the distribution of positional weights in the Binary-Weight encoding resembles the allocation of weights in attention mechanisms, and further experiments supported the effectiveness of the Binary-Weight encoding approach. This model thus paves the way for predicting cysteine S-carboxyethylation sites in protein sequences. The source code of DLBWE-Cys and the experimental data are available at: https://github.com/ztLuo-bioinfo/DLBWE-Cys.
Affiliation(s)
- Zhengtao Luo
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
- Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
- Qingyong Wang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
- Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
- Yingchun Xia
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
- Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
- Xiaolei Zhu
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
- Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
- Shuai Yang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
- Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
- Zhaochun Xu
- Computer Department, Jingdezhen Ceramic University, Jingdezhen, China
- School for Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, China
- Lichuan Gu
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, China
- Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, China
- Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Anhui Agricultural University, Hefei, Anhui, China
18
Hassan MT, Tayara H, Chong KT. Possum: identification and interpretation of potassium ion inhibitors using probabilistic feature vectors. Arch Toxicol 2025; 99:225-235. [PMID: 39438319 DOI: 10.1007/s00204-024-03888-y] [Received: 07/01/2024] [Accepted: 10/09/2024] [Indexed: 10/25/2024]
Abstract
The flow of potassium ions through cell membranes plays a crucial role in various cellular processes such as hormone secretion, epithelial function, maintenance of electrochemical gradients, and electrical impulse formation. Potassium ion inhibitors are considered promising options for treating cancer, muscle weakness, renal dysfunction, endocrine disorders, impaired cellular function, and cardiac arrhythmia. It is therefore essential to identify and understand potassium ion inhibitors in order to regulate ion flow across ion channels. In this study, we created a meta-model, POSSUM, for the identification of potassium ion inhibitors. Two distinct datasets were used for training, testing, and evaluating the meta-model. We combined seven feature descriptors with five distinct classifiers to construct 35 baseline models, and used the mean Gini index score to select the optimal base models and classifiers. POSSUM was then trained on the optimal probabilistic feature vectors. The proposed model outperforms the baseline models and existing methods on both datasets. We anticipate that POSSUM will be a useful tool for finding and screening candidate potassium ion inhibitors.
Affiliation(s)
- Mir Tanveerul Hassan
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, Jeollabuk-do, South Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, Jeollabuk-do, South Korea.
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, Jeollabuk-do, South Korea.
- Advanced Electronics and Information Research Centre, Jeonbuk National University, Jeonju, 54896, Jeollabuk-do, South Korea.
19
Liang Y, Ma X, Li J, Zhang S. iACVP-MR: Accurate Identification of Anti-coronavirus Peptide based on Multiple Features Information and Recurrent Neural Network. Curr Med Chem 2025; 32:2055-2067. [PMID: 38549527 DOI: 10.2174/0109298673277663240101111507] [Received: 09/30/2023] [Revised: 11/26/2023] [Accepted: 11/30/2023] [Indexed: 05/14/2024]
Abstract
BACKGROUND: Viruses have long caused human illness and threatened human health, so there is a pressing need for anti-coronavirus drugs with clear function, low cost, and high safety. Anti-coronavirus peptides (ACVPs) are key therapeutic agents against coronaviruses. Traditional methods for finding ACVPs are costly and labor-intensive, so establishing intelligent computational tools that enable rapid, efficient, and accurate identification of ACVPs is an important task.
METHODS: In this paper, we construct a model named iACVP-MR to identify ACVPs based on multiple features and recurrent neural networks. Features are extracted using reduced amino acid composition and dipeptide composition, the composition of k-spaced amino acid pairs, a BLOSUM62 encoder applied to the N5C5 sequence, and a second-order moving average approach based on 16 physicochemical properties. Two recurrent neural networks, a long short-term memory (LSTM) network and a bidirectional gated recurrent unit (BiGRU) combined with an attention mechanism, are then used for feature fusion and classification, respectively.
RESULTS: Under 10-fold cross-validation, the accuracies on the ENNAVIA-C and ENNAVIA-D datasets are 99.15% and 98.92%, respectively, and the other evaluation indexes are also satisfactory. The experimental results show that our model is superior to other existing models.
CONCLUSION: The iACVP-MR model can be viewed as a powerful and intelligent tool for the accurate identification of ACVPs. The datasets and source code for iACVP-MR are freely available at https://github.com/yunyunliang88/iACVP-MR.
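Two of the listed encodings have standard definitions that can be sketched. The code below is a generic illustration, not the iACVP-MR source: it assumes the 20 standard residues, an arbitrary maximum gap `k_max = 3`, and that peptides of ten residues or fewer are kept whole by the N5C5 step. CKSAAP counts ordered residue pairs separated by k intervening residues and normalizes by the number of such pairs in the sequence.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n5c5(seq: str, n: int = 5) -> str:
    """Concatenate the first n and last n residues (N5C5 when n = 5).
    Assumption: sequences of length <= 2n are returned unchanged."""
    if len(seq) <= 2 * n:
        return seq
    return seq[:n] + seq[-n:]


def cksaap(seq: str, k_max: int = 3) -> List[float]:
    """Composition of k-spaced amino-acid pairs for gaps k = 0..k_max.
    For each k, count pairs (seq[i], seq[i + k + 1]) over the 400 ordered
    pairs of standard residues and divide by len(seq) - k - 1.
    Returns 400 * (k_max + 1) values, grouped by k."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features: List[float] = []
    for k in range(k_max + 1):
        total = len(seq) - k - 1  # number of k-spaced pairs in the sequence
        counts: Dict[str, int] = dict.fromkeys(pairs, 0)
        for i in range(max(total, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # pairs with non-standard residues are skipped
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in pairs)
    return features
```

With `k_max = 3` each peptide yields a fixed-length vector of 1,600 values regardless of its length, which is what makes this encoding convenient as model input.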
Affiliation(s)
- Yunyun Liang
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
- Xinyan Ma
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
- Jin Li
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P.R. China
20
Zhu L, Chen Z, Yang S. EnDM-CPP: A Multi-view Explainable Framework Based on Deep Learning and Machine Learning for Identifying Cell-Penetrating Peptides with Transformers and Analyzing Sequence Information. Interdiscip Sci 2024:10.1007/s12539-024-00673-4. [PMID: 39714579 DOI: 10.1007/s12539-024-00673-4] [Received: 06/04/2024] [Revised: 10/28/2024] [Accepted: 11/01/2024] [Indexed: 12/24/2024]
Abstract
Cell-penetrating peptides (CPPs) are crucial carriers for drug delivery. Because synthesizing new CPPs in the laboratory is both time- and resource-consuming, computational methods that predict potential CPPs can help accelerate their development for therapy. In this study, we propose EnDM-CPP, which combines machine learning algorithms (SVM and CatBoost) with convolutional neural networks (CNN and TextCNN). For dataset construction, three previous CPP benchmark datasets, CPPsite 2.0, MLCPP 2.0, and CPP924, are merged to improve diversity and reduce homology. For feature generation, two language-model-based features obtained from the Transformer architecture, ProtT5 and ESM-2, are fed to the CNN and TextCNN, while sequence features such as CPRS, Hybrid PseAAC, and KSC are input to SVM and CatBoost. A logistic regression (LR) model built on the outputs of these predictors makes the final decision. The experimental results indicate that the fused ProtT5 and ESM-2 features contribute significantly to CPP prediction and that combining the employed features and models improves performance. In a comparison on an independent test dataset, EnDM-CPP achieved an accuracy of 0.9495 and a Matthews correlation coefficient of 0.9008, improvements of 2.23%-9.48% and 4.32%-19.02%, respectively, over other state-of-the-art methods. Code and data are available at https://github.com/tudou1231/EnDM-CPP.git.
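The final logistic-regression stage is a form of stacking: the base predictors' outputs become the input features of a meta-classifier. A minimal stdlib sketch, assuming each base model (for example SVM, CatBoost, CNN, and TextCNN) emits one CPP probability per peptide; the function names, the batch gradient-descent training loop, and its hyperparameters are illustrative choices, not EnDM-CPP's code.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stacker(base_probs: List[List[float]], labels: List[int],
                lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit a logistic-regression meta-classifier by batch gradient descent
    on the log-loss. Each row of base_probs holds one peptide's predicted
    probabilities from the base models; returns [bias, w_1, ..., w_m]."""
    m = len(base_probs[0])
    n = len(labels)
    w = [0.0] * (m + 1)
    for _ in range(epochs):
        grad = [0.0] * (m + 1)
        for x, y in zip(base_probs, labels):
            err = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) - y
            grad[0] += err
            for j in range(m):
                grad[j + 1] += err * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def predict(w: List[float], x: List[float], threshold: float = 0.5) -> int:
    """Final CPP / non-CPP decision from one peptide's base-model outputs."""
    return int(sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) >= threshold)
```

In practice the meta-classifier should be fit on out-of-fold base-model predictions, so that it does not learn from probabilities the base models produced on their own training data.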
Affiliation(s)
- Lun Zhu
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
- Zehua Chen
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
- Sen Yang
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China.
- The Affiliated Changzhou No. 2 People's Hospital of Nanjing Medical University, Changzhou, 213164, China.
21
Wang Z, Wu J, Zheng M, Geng C, Zhen B, Zhang W, Wu H, Xu Z, Xu G, Chen S, Li X. StaPep: An Open-Source Toolkit for Structure Prediction, Feature Extraction, and Rational Design of Hydrocarbon-Stapled Peptides. J Chem Inf Model 2024; 64:9361-9373. [PMID: 39503524 DOI: 10.1021/acs.jcim.4c01718] [Indexed: 12/11/2024]
Abstract
All-hydrocarbon stapled peptides, with their covalent side-chain constraints, offer greater proteolytic stability and membrane permeability than linear peptides. However, tools for extracting structural and physicochemical descriptors to predict the properties of hydrocarbon-stapled peptides are lacking. To address this, we present StaPep, a Python-based toolkit for generating 3D structures and calculating 21 features for hydrocarbon-stapled peptides. StaPep supports peptides containing two non-standard amino acids (norleucine and 2-aminoisobutyric acid) and six non-natural anchoring residues (S3, S5, S8, R3, R5, and R8), with customization options for other non-standard amino acids. We showcase StaPep's utility through three case studies. The first generates 3D structures of these peptides with a mean RMSD of 1.62 ± 0.86, offering essential structural insights for drug design and biological activity prediction. The second develops machine learning models based on the calculated molecular features to distinguish membrane-permeable from non-permeable stapled peptides, achieving an AUC of 0.93. The third constructs regression models to predict the antimicrobial activity of stapled peptides against Escherichia coli, with a Pearson correlation of 0.84. StaPep's pipeline spans data retrieval, structure generation, feature calculation, and machine learning modeling for hydrocarbon-stapled peptides. The source code and data set are freely available on GitHub: https://github.com/dahuilangda/stapep_package.
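The permeability result is reported as an AUC, the area under the ROC curve. As a generic illustration (not StaPep code, and with a hypothetical function name), the AUC equals the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one, with ties counted as one half:

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC as the normalized Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs in which the positive scores higher, with
    ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is O(P·N); for large test sets, a sort-based rank computation gives the same value more efficiently.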
Affiliation(s)
- Zhe Wang
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
- Hangzhou VicrobX Biotech Co., Ltd., Hangzhou 310018, China
- Jianping Wu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou 311215, China
- Mengjun Zheng
- School of Pharmacy, Second Military Medical University, Shanghai 200433, China
- Chenchen Geng
- School of Pharmacy, Second Military Medical University, Shanghai 200433, China
- Borui Zhen
- School of Pharmacy, Second Military Medical University, Shanghai 200433, China
- Wei Zhang
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
- Hangzhou VicrobX Biotech Co., Ltd., Hangzhou 310018, China
- Hui Wu
- Huadong Medicine Co., Ltd., Hangzhou 310015, China
- Zhengyang Xu
- School of Pharmacy, Second Military Medical University, Shanghai 200433, China
- Gang Xu
- Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China
- Si Chen
- School of Medicine, Shanghai University, Shanghai 200444, China
- Xiang Li
- School of Pharmacy, Second Military Medical University, Shanghai 200433, China

22
Conte A, Gulmini N, Costa F, Cartura M, Bröhl F, Patanè F, Filippini F. NERVE 2.0: boosting the new enhanced reverse vaccinology environment via artificial intelligence and a user-friendly web interface. BMC Bioinformatics 2024; 25:378. [PMID: 39695945 PMCID: PMC11654298 DOI: 10.1186/s12859-024-06004-0] [Received: 07/14/2024] [Accepted: 12/03/2024] [Indexed: 12/20/2024]
Abstract
BACKGROUND Vaccine development in this millennium began with the milestone work on Neisseria meningitidis B that introduced Reverse Vaccinology (RV), which identifies vaccine candidates (VCs) by computationally screening the genome or proteome of bacterial pathogens. The release of NERVE (New Enhanced RV Environment), the first RV software integrating tools to perform the selection of VCs, prompted further development in the field. However, the problem-solving potential of most, if not all, RV programs remains largely unexploited by experimental vaccinologists, who are hampered by somewhat difficult interfaces that require bioinformatic skills. RESULTS We report here on the development and release of NERVE 2.0 (available at: https://nerve-bio.org ), which keeps the original integrative and modular approach of NERVE while showing higher predictive performance than its previous version and other web-RV programs (Vaxign and Vaxijen). We renewed some of its modules and added innovative ones, such as Loop-Razor, which recovers fragments of promising vaccine candidates, and Epitope Prediction, which predicts epitope binding affinities and population coverage, along with two newly built AI (Artificial Intelligence)-based models: ESPAAN and Virulent. To improve user-friendliness, NERVE was moved to a tutored, web-based interface with a NoSQL database that allows users to submit, obtain and retrieve analysis results at any time. CONCLUSIONS With its redesigned and updated environment, NERVE 2.0 makes customisable and refinable bacterial protein vaccine analyses accessible to all kinds of users.
Affiliation(s)
- Andrea Conte
- Synthetic Biology and Biotechnology Unit, Department of Biology, University of Padua, Padua, Italy
- Nicola Gulmini
- Synthetic Biology and Biotechnology Unit, Department of Biology, University of Padua, Padua, Italy
- Francesco Costa
- Synthetic Biology and Biotechnology Unit, Department of Biology, University of Padua, Padua, Italy
- EMBL-European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
- Matteo Cartura
- Synthetic Biology and Biotechnology Unit, Department of Biology, University of Padua, Padua, Italy
- Francesco Patanè
- Synthetic Biology and Biotechnology Unit, Department of Biology, University of Padua, Padua, Italy
- Francesco Filippini
- Synthetic Biology and Biotechnology Unit, Department of Biology, University of Padua, Padua, Italy

23
Contreras-Torres E, Marrero-Ponce Y. MD-LAIs Software: Computing Whole-Sequence and Amino Acid-Level "Embeddings" for Peptides and Proteins. J Chem Inf Model 2024; 64:8665-8672. [PMID: 39552512 DOI: 10.1021/acs.jcim.3c01189] [Indexed: 11/19/2024]
Abstract
Several computational tools have been developed to calculate sequence-based molecular descriptors (MDs) for peptides and proteins. However, these tools have certain limitations: 1) They generally lack capabilities for curating input data. 2) Their outputs often exhibit significant overlap. 3) There is limited availability of MDs at the amino acid (aa) level. 4) They lack flexibility in computing specific MDs. To address these issues, we developed MD-LAIs (Molecular Descriptors from Local Amino acid Invariants), Java-based software designed to compute both whole-sequence and aa-level MDs for peptides and proteins. These MDs are generated by applying aggregation operators (AOs) to macromolecular vectors containing the chemical-physical and structural properties of aas. The set of AOs includes both nonclassical (e.g., Minkowski norms) and classical AOs (e.g., Radial Distribution Function). Classical AOs capture neighborhood structural information at different k levels, while nonclassical AOs are applied using a sliding window to generalize the aa-level output. A weighting system based on fuzzy membership functions is also included to account for the contributions of individual aas. MD-LAIs features: 1) a module for data curation tasks, 2) a feature selection module, 3) projects of highly relevant MDs, and 4) low-dimensional lists of informative global and aa-level MDs. Overall, we expect that MD-LAIs will be a valuable tool for encoding protein or peptide sequences. The software is freely available as a stand-alone system on GitHub (https://github.com/Grupo-Medicina-Molecular-y-Traslacional/MD_LAIS).
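To make the idea of an aggregation operator concrete, here is a minimal sketch, not MD-LAIs' own (Java) implementation. It maps a sequence to one property vector, using the Kyte-Doolittle hydropathy scale purely as an illustrative property, aggregates the whole vector with a Minkowski norm to obtain a whole-sequence descriptor, and applies the same norm over a sliding window to obtain one descriptor per residue. The function names, the window size and the rule of truncating windows at the sequence ends are all assumptions.

```python
# Kyte-Doolittle hydropathy values: one example amino-acid property (illustrative choice).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def minkowski_norm(values, p=2.0):
    """Minkowski p-norm (sum |v|^p)^(1/p), used here as an aggregation operator."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def global_descriptor(seq, scale=KD, p=2.0):
    """Whole-sequence descriptor: aggregate the property vector of the full sequence."""
    return minkowski_norm([scale[aa] for aa in seq], p)


def residue_descriptors(seq, scale=KD, p=2.0, half_window=1):
    """Residue-level descriptors: aggregate a window centred on each residue.

    Windows are truncated at the sequence ends (a hypothetical boundary rule).
    """
    vec = [scale[aa] for aa in seq]
    out = []
    for i in range(len(vec)):
        start = max(0, i - half_window)
        stop = min(len(vec), i + half_window + 1)
        out.append(minkowski_norm(vec[start:stop], p))
    return out
```

Swapping the property scale or the value of `p` yields a different descriptor from the same sequence, which is roughly how a descriptor family grows from a few property vectors and operators.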
Affiliation(s)
- Ernesto Contreras-Torres
- Norwegian Cruise Line Holdings Limited, Corporate Center Drive, Miami, Florida 33216, United States
- Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México 03920, México
- Yovani Marrero-Ponce
- Facultad de Ingeniería, Universidad Panamericana, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México 03920, México
- Universidad San Francisco de Quito (USFQ), Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Edificio de Especialidades Médicas, Quito 170157 Pichincha, Ecuador

24
Shukla R, Singh TR. AlzGenPred - CatBoost-based gene classifier for predicting Alzheimer's disease using high-throughput sequencing data. Sci Rep 2024; 14:30294. [PMID: 39639110 PMCID: PMC11621786 DOI: 10.1038/s41598-024-82208-x] [Received: 08/02/2024] [Accepted: 12/03/2024] [Indexed: 12/07/2024]
Abstract
Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by memory loss. Owing to advances in next-generation sequencing, an enormous amount of AD-associated genomics data is available. However, how these genes are involved in AD is still an open research question. Therefore, AlzGenPred was developed to identify AD-associated genes using machine learning. A total of 13,504 features derived from eight sequence-encoding schemes were generated and evaluated using 16 machine learning algorithms. Network-based features significantly outperformed sequence-based features and effectively distinguished AD-associated genes, whereas sequence-based features failed to classify accurately. To improve performance, we generated 24 fused features (6,020 dimensions) from sequence-based encodings, increasing accuracy by 5-7% using a two-step LightGBM-based recursive feature selection method. However, accuracy remained below 70% even after hyperparameter tuning. Therefore, network-based features were used to build AlzGenPred, a CatBoost-based ML method with 96.55% accuracy and 98.99% AUROC. The method was tested on the AlzGene dataset, where it showed 96.43% accuracy, and was then validated using a transcriptomics dataset. AlzGenPred provides a reliable and user-friendly tool for identifying potential AD biomarkers, accelerating biomarker discovery, and advancing our understanding of AD. It is available at https://www.bioinfoindia.org/alzgenpred/ and https://github.com/shuklarohit815/AlzGenPred .
Affiliation(s)
- Rohit Shukla
- Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology (JUIT), Waknaghat, Solan, 173234, H.P., India
- Center of Excellence for Aging and Brain Repair, Morsani College of Medicine, University of South Florida, Tampa, 33613, FL, USA
- Tiratha Raj Singh
- Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology (JUIT), Waknaghat, Solan, 173234, H.P., India
- Centre of Healthcare Technologies and Informatics (CEHTI), Jaypee University of Information Technology (JUIT), Waknaghat, Solan, 173234, H.P., India

25
Julian W, Sergeeva O, Cao W, Wu C, Erokwu B, Flask C, Zhang L, Wang X, Basilion J, Yang S, Lee Z. Searching for Protein Off-Targets of Prostate-Specific Membrane Antigen-Targeting Radioligands in the Salivary Glands. Cancer Biother Radiopharm 2024; 39:721-732. [PMID: 39268679 PMCID: PMC11824224 DOI: 10.1089/cbr.2024.0066] [Indexed: 09/17/2024]
Abstract
Background: Prostate-specific membrane antigen (PSMA)-targeted radioligand therapies are a highly effective treatment for metastatic prostate cancer. However, high and sustained uptake of PSMA ligands in the salivary glands leads to dose-limiting dry mouth (xerostomia), especially with α-emitters. PSMA expression and histologic analysis could not directly explain this toxicity, suggesting a potential off-target mediator of uptake. In this study, we searched for possible off-target non-PSMA protein(s) in the salivary glands. Methods: A machine-learning-based quantitative structure-activity relationship (QSAR) model was built to search for possible off-target(s). The target candidates predicted by the model were further analyzed for salivary protein expression and for structural homology at key regions required for PSMA-ligand binding. Cellular binding assays were then performed in multiple cell lines with high expression of the candidate proteins and low expression of PSMA. Finally, PSMA knockout (PSMA-/-) mice were scanned by small-animal PET/MR using [68Ga]Ga-PSMA-11 for in vivo validation. Results: Screening with the trained QSAR model did not yield a convincing off-target protein, a finding partly corroborated by the cellular binding assays. Imaging of PSMA-/- mice further demonstrated markedly reduced PSMA-radioligand uptake in the salivary glands. Conclusion: Uptake of PSMA-targeted radioligands in the salivary glands remains primarily PSMA-mediated. Further investigation is needed to elucidate why uptake and retention in the salivary glands appear to differ from those in prostate cancer.
Affiliation(s)
- William Julian
- Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Olga Sergeeva
- Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Wei Cao
- Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Chunying Wu
- Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Bernadette Erokwu
- Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Chris Flask
- Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Lifang Zhang
- Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Xinning Wang
- Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Biomedical Engineering Department, Case Western Reserve University, Cleveland, Ohio, USA
- James Basilion
- Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Biomedical Engineering Department, Case Western Reserve University, Cleveland, Ohio, USA
- Sichun Yang
- Nutrition Department, Case Western Reserve University, Cleveland, Ohio, USA
- Zhenghong Lee
- Radiology Department, Case Western Reserve University, Cleveland, Ohio, USA
- Biomedical Engineering Department, Case Western Reserve University, Cleveland, Ohio, USA

26
Uthayopas K, de Sá AG, Alavi A, Pires DE, Ascher DB. PRIMITI: A computational approach for accurate prediction of miRNA-target mRNA interaction. Comput Struct Biotechnol J 2024; 23:3030-3039. [PMID: 39175797 PMCID: PMC11340604 DOI: 10.1016/j.csbj.2024.06.030] [Received: 04/16/2024] [Revised: 06/20/2024] [Accepted: 06/23/2024] [Indexed: 08/24/2024]
Abstract
Current medical research has demonstrated the roles of miRNAs in a variety of cellular mechanisms, lending credence to the association between miRNA dysregulation and multiple diseases. Understanding the mechanisms of miRNA is critical for developing effective diagnostic and therapeutic strategies. miRNA-mRNA interactions have emerged as the most important mechanism to understand, yet they are difficult to validate experimentally. Accordingly, several computational models have been developed to predict miRNA-mRNA interactions, but they offer limited predictive capability, poor characterisation of miRNA-mRNA interactions, and low usability. To address these drawbacks, we developed PRIMITI, a PRedictive model for the Identification of novel miRNA-Target mRNA Interactions. PRIMITI is a novel machine learning model that utilises CLIP-seq and expression data to characterise functional target sites in 3'-untranslated regions (3'-UTRs) and predict miRNA-target mRNA repression activity. The model was trained using a reliable negative sample selection approach and the robust extreme gradient boosting (XGBoost) model, coupled with newly introduced features, including sequence and genetic variation information. PRIMITI achieved an area under the receiver operating characteristic (ROC) curve (AUC) of up to 0.96 both for predicting functional miRNA-target site binding and for predicting miRNA-target mRNA repression activity, on cross-validation and an independent blind test. Additionally, the model outperformed state-of-the-art methods in recovering miRNA-target repressions in an unseen microarray dataset and in a collection of validated miRNA-mRNA interactions, highlighting its utility for preliminary screening. PRIMITI is available on a reliable, scalable, and user-friendly web server at https://biosig.lab.uq.edu.au/primiti.
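An AUC such as the one quoted above can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise (Mann-Whitney) formulation follows, with ties counted as one half. The function name `roc_auc` is hypothetical, and this is not PRIMITI's evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels; scores: predicted scores (higher = more positive).
    A positive ranked above a negative counts 1, a tie counts 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The double loop is O(n·m); a rank-based formulation gives the same value more efficiently on large test sets, but the pairwise form makes the probabilistic meaning explicit.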
Affiliation(s)
- Korawich Uthayopas
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Alex G.C. de Sá
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, VIC 3010, Australia
- Azadeh Alavi
- School of Computational Technology, RMIT University, Melbourne, VIC 3000, Australia
- Douglas E.V. Pires
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville, VIC 3052, Australia
- David B. Ascher
- The Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Parkville, VIC 3010, Australia

27
Li M, Wu Y, Li B, Lu C, Jian G, Shang X, Chen H, Huang J, He B. ACVPICPred: Inhibitory activity prediction of anti-coronavirus peptides based on artificial neural network. Comput Struct Biotechnol J 2024; 23:3625-3633. [PMID: 39469670 PMCID: PMC11513478 DOI: 10.1016/j.csbj.2024.09.015] [Received: 06/08/2024] [Revised: 09/18/2024] [Accepted: 09/24/2024] [Indexed: 10/30/2024]
Abstract
Peptides, as small molecular compounds, exhibit prominent advantages in inhibiting coronaviruses owing to their safety, efficacy, and specificity, and hold great promise as drugs against coronaviruses. Rapid and efficient determination of the activity of anti-coronavirus peptides (ACovPs) can greatly accelerate the development of drugs for treating coronavirus-related diseases. Hence, we present ACVPICPred, a computational model designed to predict the inhibitory activity of ACovPs from their sequence and structural information. Leveraging the bioinformatics tool AlphaFold3 for structure prediction together with several feature extraction methods, the model integrates both sequence and structural features to enhance prediction accuracy. To address the limitations of existing datasets, we employed data augmentation techniques, including the introduction of noise and SMOGN, to improve model robustness. The model's performance was evaluated through five-fold cross-validation, achieving a Pearson correlation coefficient of 0.7668 (p < 0.05) and an R² of 0.5880 on the training dataset. Overall, compared with models that use only sequence features, models that also incorporate structural features achieved more robust results across various evaluation metrics. ACVPICPred is freely accessible at the following URL: http://i.uestc.edu.cn/acvpICPred/main/Main.php.
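The abstract reports both a Pearson correlation coefficient and an R². The two are not interchangeable: Pearson's r measures only linear association, whereas the coefficient of determination, 1 - SS_res/SS_tot, also penalizes systematic offsets between predicted and measured activity. The sketch below computes both with hypothetical helper names. Which R² definition ACVPICPred used is not stated in the abstract, so treat the choice here as an assumption.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((a - mean_t) * (b - mean_p) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((b - mean_p) ** 2 for b in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_t) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Predictions that are all exactly one unit too high: perfectly correlated, poorly calibrated.
y_true = [1, 2, 3]
y_pred = [2, 3, 4]
print(pearson_r(y_true, y_pred), r_squared(y_true, y_pred))  # 1.0 -0.5
```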
Affiliation(s)
- Min Li
- Medical College, Guizhou University, Huaxi District, Guiyang 550025, Guizhou, China
- Yifei Wu
- Medical College, Guizhou University, Huaxi District, Guiyang 550025, Guizhou, China
- Bowen Li
- Medical College, Guizhou University, Huaxi District, Guiyang 550025, Guizhou, China
- Chunying Lu
- Medical College, Guizhou University, Huaxi District, Guiyang 550025, Guizhou, China
- Guifen Jian
- Medical College, Guizhou University, Huaxi District, Guiyang 550025, Guizhou, China
- Xing Shang
- Medical College, Guizhou University, Huaxi District, Guiyang 550025, Guizhou, China
- Heng Chen
- Medical College, Guizhou University, Huaxi District, Guiyang 550025, Guizhou, China
- Jian Huang
- School of Life Science and Technology, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi‑Tech Zone, Chengdu 6173001, Sichuan, China
- Bifang He
- Medical College, Guizhou University, Huaxi District, Guiyang 550025, Guizhou, China
- State Key Laboratory of Public Big Data, Guizhou University, Huaxi District, Guiyang 550025, Guizhou, China

28
Parvez A, Ali SD, Tayara H, Chong KT. Stacking based ensemble learning framework for identification of nitrotyrosine sites. Comput Biol Med 2024; 183:109200. [PMID: 39366143 DOI: 10.1016/j.compbiomed.2024.109200] [Received: 05/10/2024] [Revised: 09/02/2024] [Accepted: 09/22/2024] [Indexed: 10/06/2024]
Abstract
Protein nitrotyrosine is an essential post-translational modification that results from the nitration of tyrosine residues. This modification is known to be associated with the regulation of several biological functions and with several diseases. Therefore, accurate identification of nitrotyrosine sites is important for elucidating the associated biological processes. In this regard, we report an accurate computational tool, iNTyro-Stack, for the identification of protein nitrotyrosine sites. iNTyro-Stack is a machine-learning model based on a stacking algorithm, whose base classifiers are selected for the highest performance. The feature map is a linear combination of amino acid composition encoding schemes, including the composition of k-spaced amino acid pairs and tri-peptide composition. Recursive feature elimination is used to select significant features. The performance of the proposed method is evaluated using k-fold cross-validation and independent testing. iNTyro-Stack achieved an accuracy of 86.3% and a Matthews correlation coefficient (MCC) of 0.726 in cross-validation. Its generalization capability was further validated on an imbalanced independent test set, where it attained an accuracy of 69.32%. iNTyro-Stack outperforms existing state-of-the-art methods under both evaluation approaches. A GitHub repository for reproducing the method and results of iNTyro-Stack is available at: https://github.com/waleed551/iNTyro-Stack/.
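The two composition encodings named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not iNTyro-Stack's code. The composition of k-spaced amino acid pairs (CKSAAP) counts residue pairs separated by k intervening residues (k = 0..k_max) and normalizes by the number of valid pairs at each gap. Tri-peptide composition (TPC) gives the frequency of each of the 8,000 possible 3-mers. The `k_max` default, the skipping of non-standard residues, simple concatenation as the "combination", and all function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]      # 400 pairs
TRIPLETS = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]   # 8,000 tri-peptides


def cksaap(seq, k_max=3):
    """Composition of k-spaced amino-acid pairs for gaps k = 0..k_max.

    Returns 400 * (k_max + 1) frequencies; pairs containing non-standard
    residues are skipped (a hypothetical handling rule).
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


def tpc(seq):
    """Tri-peptide composition: frequency of each 3-mer among all valid 3-mers."""
    counts = dict.fromkeys(TRIPLETS, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in TRIPLETS]
```

With the default `k_max=3`, `cksaap(seq) + tpc(seq)` yields 1,600 + 8,000 = 9,600 values per sequence. A vector this large motivates a feature-selection step such as the recursive feature elimination described in the abstract.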
Affiliation(s)
- Aiman Parvez
- Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, 54896, South Korea
- Syed Danish Ali
- Department of Electrical Engineering, The University of Azad Jammu and Kashmir, Muzaffarabad, 13100, Pakistan
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Hilal Tayara
- Department of International Science and Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Kil To Chong
- Department of International Science and Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju, 54896, South Korea

29
Xia H, Ji B, Qiao D, Peng S. CellMsg: graph convolutional networks for ligand-receptor-mediated cell-cell communication analysis. Brief Bioinform 2024; 26:bbae716. [PMID: 39800874 PMCID: PMC11725396 DOI: 10.1093/bib/bbae716] [Received: 09/09/2024] [Revised: 12/04/2024] [Accepted: 12/27/2024] [Indexed: 01/16/2025]
Abstract
Cell-cell communication (CCC) is increasingly recognized as important in differentiation, invasion, metastasis, and drug resistance in tumor tissues. Inferring CCCs with traditional experimental methods is time-consuming and labor-intensive, and these methods cannot handle large amounts of data. To facilitate CCC inference, we propose a computational framework, CellMsg, which involves two primary steps: identifying ligand-receptor interactions (LRIs) and measuring the strength of LRI-mediated CCCs. Specifically, CellMsg first identifies high-confidence LRIs based on multimodal features of ligands and receptors and graph convolutional networks. CellMsg then measures the strength of intercellular communication by combining the identified LRIs with single-cell RNA-seq data using a three-point estimation method. Performance evaluation on four benchmark LRI datasets by five-fold cross-validation demonstrated that CellMsg accurately captured the relationships between ligands and receptors, yielding high-confidence LRIs. Compared with other LRI identification methods, CellMsg showed better prediction performance and robustness. Furthermore, the LRIs identified by CellMsg were successfully validated through molecular docking. Finally, we examined the overlap of LRIs between CellMsg and five other classical CCC databases, as well as the intercellular crosstalk among seven cell types within a human melanoma tissue. In summary, CellMsg establishes a complete, reliable, and well-organized LRI database and an effective CCC strength evaluation method for single-cell RNA-seq datasets. It provides a computational tool that allows researchers to decipher intercellular communication. CellMsg is freely available at https://github.com/pengsl-lab/CellMsg.
Affiliation(s)
- Hong Xia
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Boya Ji
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
- Debin Qiao
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China

30
Li J, He S, Zhang J, Zhang F, Zou Q, Ni F. T4Seeker: a hybrid model for type IV secretion effectors identification. BMC Biol 2024; 22:259. [PMID: 39543674 PMCID: PMC11566746 DOI: 10.1186/s12915-024-02064-z] [Received: 06/26/2024] [Accepted: 11/06/2024] [Indexed: 11/17/2024]
Abstract
BACKGROUND The type IV secretion system is widely present in various bacteria, such as Salmonella, Escherichia coli, and Helicobacter pylori. These bacteria use the type IV secretion system to secrete type IV secretion effectors (T4SEs), infect host cells, and disrupt or modulate communication pathways. In this study, type III and type VI secretion effectors were used as negative samples to train a robust model. RESULTS The area under the curve (AUC) of T4Seeker on the validation and independent test sets was 0.947 and 0.970, respectively, demonstrating the strong predictive capacity and robustness of T4Seeker. Compared with classic and state-of-the-art T4SE identification models, T4Seeker, which is based on traditional features and large language model features, showed higher predictive ability. CONCLUSION The T4Seeker model proposed in this study demonstrates superior performance in T4SE prediction. By integrating features at multiple levels, it achieves higher predictive accuracy and strong generalization capability, providing an effective tool for future T4SE research.
Affiliation(s)
- Jing Li
- Department of Microbiology, University of Hong Kong, Hong Kong, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang, China
- School of Biomedical Sciences, University of Hong Kong, Hong Kong, China
- Shida He
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang, China
- The Joint Innovation Center for Engineering in Medicine, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, Quzhou, 324000, China
- Department of Respiratory and Critical Care, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, China
- Jian Zhang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang, China
- Feng Zhang
- The Joint Innovation Center for Engineering in Medicine, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, Quzhou, 324000, China
- Department of Respiratory and Critical Care, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang, China
- Fengming Ni
- Department of Gastroenterology, The First Hospital of Jilin University, Changchun, 130021, China

31
Kang Y, Wang H, Qin Y, Liu G, Yu Y, Zhang Y. PSATF-6mA: an integrated learning fusion feature-encoded DNA-6 mA methylcytosine modification site recognition model based on attentional mechanisms. Front Genet 2024; 15:1498884. [PMID: 39600317 PMCID: PMC11588721 DOI: 10.3389/fgene.2024.1498884] [Received: 09/21/2024] [Accepted: 10/30/2024] [Indexed: 11/29/2024]
Abstract
DNA methylation is crucially important for gene expression and for biological processes such as cell differentiation and tumour development. Identifying DNA-6mA sites with traditional biological experimental methods involves cumbersome steps and a large amount of time. The advent of neural network technology has made the identification of 6mA sites in cross-species DNA more efficient. Nevertheless, most contemporary neural network models for identifying 6mA sites prioritize the design of the identification model, with comparatively little research on the statistically significant properties of the DNA sequence itself. Consequently, this paper focuses on a statistical strategy for DNA double-stranded features, applying the multi-head self-attention mechanism of neural networks to the positional probability relationships in DNA. Furthermore, a new recognition model, PSATF-6mA, is constructed by continually adjusting the attentional tendency of feature fusion through an integrated learning framework. The experimental results, obtained through cross-validation with cross-species data, demonstrate that the PSATF-6mA model outperforms the baseline model. The Matthews correlation coefficient (MCC) for the cross-species dataset of rice and M. musculus genomes reaches 0.982. The model is expected to help biologists identify 6mA loci more accurately and formulate new testable biological hypotheses.
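The MCC quoted above is computed from the four cells of a binary confusion matrix and ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), which is why it is normally reported as a decimal rather than a percentage. A minimal sketch follows, with a hypothetical function name and the common convention of returning 0 when any marginal total is zero. It is not PSATF-6mA's evaluation code.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the confusion matrix is empty
    return (tp * tn - fp * fn) / denom


print(mcc(tp=45, tn=45, fp=5, fn=5))  # 0.8
```

Unlike accuracy, MCC stays informative on imbalanced test sets because every cell of the confusion matrix enters both the numerator and the denominator.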
Affiliation(s)
- Yanmei Kang
- School of Cyber Science and Engineering, University of International Relations, Beijing, China
- Hongyuan Wang
- School of Cyber Science and Engineering, University of International Relations, Beijing, China
- Yubo Qin
- School of Cyber Science and Engineering, University of International Relations, Beijing, China
- Guanlin Liu
- School of Cyber Science and Engineering, University of International Relations, Beijing, China
- Yi Yu
- College of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
- Yongjian Zhang
- School of Cyber Science and Engineering, University of International Relations, Beijing, China

32
Zhao C, Yan S, Li J. TPGPred: A Mixed-Feature-Driven Approach for Identifying Thermophilic Proteins Based on GradientBoosting. Int J Mol Sci 2024; 25:11866. [PMID: 39595936 PMCID: PMC11594102 DOI: 10.3390/ijms252211866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 11/01/2024] [Accepted: 11/03/2024] [Indexed: 11/28/2024]
Abstract
Thermophilic proteins maintain their stability and functionality under extreme high-temperature conditions, making them of significant importance in both fundamental biological research and biotechnological applications. In this study, we developed TPGPred, a machine learning model based on GradientBoosting, to predict thermophilic proteins by leveraging a large-scale dataset of thermophilic and non-thermophilic protein sequences. By combining various machine learning algorithms with feature-engineering methods, we systematically evaluated classification performance and identified the optimal feature combinations and classification models. Trained on a large public dataset of 5652 samples, TPGPred achieved an accuracy greater than 0.95 and an area under the receiver operating characteristic curve (AUROC) greater than 0.98 on an independent test set of 627 samples. Our findings offer new insights into the identification and classification of thermophilic proteins and provide a solid foundation for developing their industrial applications.
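Accuracy depends on a decision threshold, whereas the AUROC reported above is threshold-free: it equals the probability that a randomly chosen positive (thermophilic) example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition (the Mann-Whitney formulation, counting ties as one half). It is an illustration, not TPGPred's evaluation code, and the function name `auroc` is hypothetical. The quadratic loop is fine for small test sets; library routines such as scikit-learn's `roc_auc_score` scale better.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Labels are 1 (positive) or 0 (negative); ties count as half a correct ordering.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```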
Affiliation(s)
- Cuihuan Zhao
- Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Shuan Yan
- Institute of Public Safety Research, Department of Engineering Physics, Tsinghua University, Beijing 100084, China
- Jiahang Li
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
33
Alban TJ, Riaz N, Parthasarathy P, Makarov V, Kendall S, Yoo SK, Shah R, Weinhold N, Srivastava R, Ma X, Krishna C, Mok JY, van Esch WJE, Garon E, Akerley W, Creelan B, Aanur N, Chowell D, Geese WJ, Rizvi NA, Chan TA. Neoantigen immunogenicity landscapes and evolution of tumor ecosystems during immunotherapy with nivolumab. Nat Med 2024; 30:3209-3222. [PMID: 39349627 PMCID: PMC12066197 DOI: 10.1038/s41591-024-03240-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 08/08/2024] [Indexed: 11/16/2024]
Abstract
Neoantigen immunoediting drives the efficacy of immune checkpoint blockade, yet the molecular features of neoantigens and how neoantigen immunogenicity shapes treatment response remain poorly understood. To address these questions, 80 patients with non-small cell lung cancer were enrolled in the biomarker cohort of CheckMate 153 (CA209-153), which collected radiographically guided biopsy samples before and during treatment with nivolumab. Early loss of both mutations and neoantigens during therapy was associated with clinical benefit. We examined 1,453 candidate neoantigens, many of which had a reduced cancer cell fraction after treatment with nivolumab, and identified 196 neopeptides that were recognized by T cells. Mapping these neoantigens to clonal dynamics, evolutionary trajectories and clinical response revealed strong selection against clones harboring immunogenic neoantigens. We identified position-specific amino acid and physicochemical features related to immunogenicity and developed an immunogenicity score. Nivolumab-induced microenvironmental evolution in non-small cell lung cancer shared some similarities with melanoma, yet critical differences were apparent. This study provides unprecedented molecular portraits of the neoantigen landscapes underlying nivolumab's mechanism of action.
Affiliation(s)
- Tyler J Alban
- Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH, USA
- Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Nadeem Riaz
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Prerana Parthasarathy
- Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH, USA
- Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Vladimir Makarov
- Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH, USA
- Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Sviatoslav Kendall
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Seong-Keun Yoo
- Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH, USA
- Rachna Shah
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Nils Weinhold
- Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Raghvendra Srivastava
- Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH, USA
- Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Xiaoxiao Ma
- Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH, USA
- Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Edward Garon
- Department of Thoracic Medical Oncology, University of California Los Angeles, Los Angeles, CA, USA
- Wallace Akerley
- Department of Internal Medicine, University of Utah, Salt Lake City, UT, USA
- Benjamin Creelan
- Department of Thoracic Oncology, Moffitt Cancer Center, Tampa, FL, USA
- Diego Chowell
- Precision Immunology Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Naiyer A Rizvi
- Synthekine, Menlo Park, CA, USA
- Thoracic Oncology, Columbia University, New York, NY, USA
- Timothy A Chan
- Center for Immunotherapy and Precision Immuno-Oncology, Cleveland Clinic, Cleveland, OH, USA
- Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
- Taussig Cancer Institute, Cleveland Clinic, Cleveland, OH, USA
- National Center for Regenerative Medicine, Cleveland Clinic, Cleveland, OH, USA
34
Fu X, Duan H, Zang X, Liu C, Li X, Zhang Q, Zhang Z, Zou Q, Cui F. Hyb_SEnc: An Antituberculosis Peptide Predictor Based on a Hybrid Feature Vector and Stacked Ensemble Learning. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1897-1910. [PMID: 39083393 DOI: 10.1109/tcbb.2024.3425644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/02/2024]
Abstract
Tuberculosis has plagued humankind since ancient times, and the struggle against it continues. Mycobacterium tuberculosis is the leading cause of tuberculosis, infecting nearly one-third of the world's population. The rise of peptide drugs has opened a new direction in tuberculosis treatment, which makes the prediction of anti-tuberculosis peptides crucial. This paper proposes an anti-tuberculosis peptide prediction method based on hybrid features and stacked ensemble learning. First, a random forest (RF) and an extremely randomized trees (ERT) model are selected as the first-level learners of the stacked ensemble. Then, the five best-performing feature encoding methods are combined into a hybrid feature vector, which is refined using decision tree-based recursive feature elimination (DT-RFE). The resulting optimal feature subset is used as the input to the stacked ensemble model, and logistic regression (LR) serves as the second-level learner of the final stacked ensemble model, Hyb_SEnc. Hyb_SEnc achieved prediction accuracies of 94.68% and 95.74% on the independent test sets of AntiTb_MD and AntiTb_RD, respectively.
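The pipeline described above (DT-RFE feature refinement feeding a stacked ensemble with RF and ERT as first-level learners and LR as the second-level learner) maps closely onto standard scikit-learn components. The sketch below is an assumption-laden illustration, not the Hyb_SEnc implementation. Synthetic data from `make_classification` stands in for the hybrid peptide feature vectors, and the number of retained features (15), the tree counts and the 5-fold stacking are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a hybrid peptide feature matrix (hypothetical data).
X, y = make_classification(n_samples=300, n_features=40, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = make_pipeline(
    # DT-RFE: repeatedly drop the feature a decision tree ranks least important.
    RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=15),
    StackingClassifier(
        # First-level learners: random forest and extremely randomized trees.
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("ert", ExtraTreesClassifier(n_estimators=100, random_state=0)),
        ],
        # Second-level learner, fitted on out-of-fold first-level predictions.
        final_estimator=LogisticRegression(),
        cv=5,
    ),
)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
```

`StackingClassifier` trains the logistic-regression meta-learner on cross-validated predictions of the base learners (controlled by `cv`). This keeps the second level from simply learning the first-level models' fits to their own training data.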
35
Ahmed Z, Shahzadi K, Jin Y, Li R, Momanyi BM, Zulfiqar H, Ning L, Lin H. Identification of RNA-dependent liquid-liquid phase separation proteins using an artificial intelligence strategy. Proteomics 2024; 24:e2400044. [PMID: 38824664 DOI: 10.1002/pmic.202400044] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/26/2024] [Revised: 05/03/2024] [Accepted: 05/21/2024] [Indexed: 06/04/2024]
Abstract
RNA-dependent liquid-liquid phase separation (LLPS) proteins play critical roles in cellular processes such as stress granule formation, DNA repair, RNA metabolism, germ cell development, and protein translation regulation. The abnormal behavior of these proteins is associated with various diseases, particularly neurodegenerative disorders like amyotrophic lateral sclerosis and frontotemporal dementia, making their identification crucial. However, conventional biochemistry-based methods for identifying these proteins are time-consuming and costly. Addressing this challenge, our study developed a robust computational model for their identification. We constructed a comprehensive dataset containing 137 RNA-dependent and 606 non-RNA-dependent LLPS protein sequences, which were then encoded using amino acid composition, composition of K-spaced amino acid pairs, Geary autocorrelation, and conjoined triad methods. Through a combination of correlation analysis, mutual information scoring, and incremental feature selection, we identified an optimal feature subset. This subset was used to train a random forest model, which achieved an accuracy of 90% when tested against an independent dataset. This study demonstrates the potential of computational methods as efficient alternatives for the identification of RNA-dependent LLPS proteins. To enhance the accessibility of the model, a user-centric web server has been established and can be accessed via the link: http://rpp.lin-group.cn.
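Two of the four encodings named above, amino acid composition (AAC) and the composition of k-spaced amino acid pairs (CKSAAP), can be written in a few lines. The Python sketch below follows their commonly used definitions. It is not the authors' code: the choice of gaps k = 0 to 2 is an assumption, and the Geary autocorrelation and conjoined triad encodings are omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq: str) -> list[float]:
    """Amino acid composition: frequency of each of the 20 standard residues.

    Assumes a non-empty sequence.
    """
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def cksaap(seq: str, k_max: int = 2) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each gap k, count residue pairs (seq[i], seq[i + k + 1]) and divide by
    the number of such positions, giving 400 values per gap.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = len(seq) - k - 1  # number of pair positions for this gap
        for i in range(total):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
        denom = max(total, 1)  # sequences too short for this gap give zeros
        features.extend(counts[p] / denom for p in pairs)
    return features
```

Concatenating `aac(seq)` and `cksaap(seq)` gives one fixed-length vector per protein, regardless of sequence length, which is what downstream feature selection and the random forest require.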
Affiliation(s)
- Zahoor Ahmed
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
- Kiran Shahzadi
- Department of Biotechnology, Women University of Azad Jammu and Kashmir Bagh, Bagh, Azad Kashmir, Pakistan
- Yanting Jin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Rui Li
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Biffon Manyura Momanyi
- School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hasan Zulfiqar
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
- Lin Ning
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Hao Lin
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, China
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
36
Nielsen SDH, Liang N, Rathish H, Kim BJ, Lueangsakulthai J, Koh J, Qu Y, Schulz HJ, Dallas DC. Bioactive milk peptides: an updated comprehensive overview and database. Crit Rev Food Sci Nutr 2024; 64:11510-11529. [PMID: 37504497 PMCID: PMC10822030 DOI: 10.1080/10408398.2023.2240396] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Indexed: 07/29/2023]
Abstract
Partial digestion of milk proteins leads to the formation of numerous bioactive peptides. Previously, our research team thoroughly examined decades of existing literature on milk bioactive peptides across species to construct the milk bioactive peptide database (MBPDB). Herein, we provide a comprehensive update to the data within the MBPDB and a review of the current state of research, from in vitro to animal and clinical studies, for each functional category: angiotensin-converting enzyme (ACE)-inhibitory, antimicrobial, antioxidant, dipeptidyl peptidase (DPP)-IV inhibitory, opioid, anti-inflammatory, immunomodulatory, calcium absorption and bone health, and anticancer activity. This information will help drive future research on the bioactivities of milk peptides.
Affiliation(s)
- Ningjian Liang
- Nutrition Program, College of Health, Oregon State University, Corvallis, Oregon, USA
- Harith Rathish
- Department of Computer Science, Aarhus University, Aarhus, Denmark
- Bum Jin Kim
- Nutrition Program, College of Health, Oregon State University, Corvallis, Oregon, USA
- Jeewon Koh
- Nutrition Program, College of Health, Oregon State University, Corvallis, Oregon, USA
- Yunyao Qu
- Nutrition Program, College of Health, Oregon State University, Corvallis, Oregon, USA
- Hans-Jörg Schulz
- Department of Computer Science, Aarhus University, Aarhus, Denmark
- David C. Dallas
- Nutrition Program, College of Health, Oregon State University, Corvallis, Oregon, USA
37
Breimann S, Frishman D. AAclust: k-optimized clustering for selecting redundancy-reduced sets of amino acid scales. Bioinform Adv 2024; 4:vbae165. [PMID: 39544628 PMCID: PMC11562964 DOI: 10.1093/bioadv/vbae165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 09/10/2024] [Accepted: 10/23/2024] [Indexed: 11/17/2024]
Abstract
Summary: Amino acid scales are crucial for sequence-based protein prediction tasks, yet no gold-standard scale set or simple scale selection method exists. We developed AAclust, a wrapper for clustering models that require a predefined number of clusters k, such as k-means. AAclust obtains redundancy-reduced scale sets by clustering scales and selecting one representative scale per cluster, where k can either be optimized by AAclust or defined by the user. The utility of AAclust scale selections was assessed by applying machine learning models to 24 protein benchmark datasets. We found that the top-performing scale sets differed for each benchmark dataset and significantly outperformed scale sets used in previous studies. Notably, model performance depended strongly on the size of the scale set. AAclust enables a systematic optimization of scale-based feature engineering in machine learning applications.
Availability and implementation: The AAclust algorithm is part of AAanalysis, a Python-based framework for interpretable sequence-based protein prediction, which is documented and accessible at https://aaanalysis.readthedocs.io/en/latest and https://github.com/breimanntools/aaanalysis.
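The core idea of AAclust, clustering scales and keeping one representative per cluster, can be sketched in plain Python. This is not the AAanalysis API. The sketch uses k-means with a deterministic farthest-point initialization and Euclidean distance, keeps k fixed rather than optimizing it, and picks as representative the scale closest to its cluster centroid; all of these are illustrative assumptions. The toy "scales" below have three values each instead of one per amino acid, and their names are hypothetical.

```python
import math
from typing import Sequence


def farthest_point_init(points: list[Sequence[float]], k: int) -> list[list[float]]:
    """Deterministic seeding: start from the first point, then repeatedly add
    the point whose nearest existing centroid is farthest away."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: list[Sequence[float]], k: int, max_iter: int = 100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_representatives(scales: dict[str, Sequence[float]], k: int) -> list[str]:
    """Cluster the scales and keep the one closest to each cluster centroid."""
    names = list(scales)
    points = [scales[name] for name in names]
    labels, centroids = kmeans(points, k)
    reps = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = min(members, key=lambda i: math.dist(points[i], centroids[c]))
            reps.append(names[best])
    return reps


# Hypothetical toy scales: three hydrophobicity-like and three charge-like profiles.
toy_scales = {
    "hydro_a": [1.0, 0.9, 0.1],
    "hydro_b": [0.9, 1.0, 0.0],
    "hydro_c": [1.0, 1.0, 0.05],
    "charge_a": [0.0, 0.1, 1.0],
    "charge_b": [0.1, 0.0, 0.9],
    "charge_c": [0.05, 0.05, 0.95],
}
representatives = select_representatives(toy_scales, k=2)
```

Keeping one scale per cluster removes near-duplicate scales while preserving one member of each group of similar scales.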
Affiliation(s)
- Stephan Breimann
- Department of Bioinformatics, School of Life Sciences, Technical University of Munich (TUM), Freising, 85354, Germany
- Division of Metabolic Biochemistry, Biomedical Center (BMC), LMU Munich, Munich, 81377, Germany
- Biochemistry of γ-Secretase, German Center for Neurodegenerative Diseases (DZNE), Munich, 81377, Germany
- Dmitrij Frishman
- Department of Bioinformatics, School of Life Sciences, Technical University of Munich (TUM), Freising, 85354, Germany
38
Breimann S, Kamp F, Steiner H, Frishman D. AAontology: An Ontology of Amino Acid Scales for Interpretable Machine Learning. J Mol Biol 2024; 436:168717. [PMID: 39053689 DOI: 10.1016/j.jmb.2024.168717] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/03/2024] [Revised: 07/15/2024] [Accepted: 07/19/2024] [Indexed: 07/27/2024]
Abstract
Amino acid scales are crucial for protein prediction tasks, and many of them are curated in the AAindex database. Despite various clustering attempts to organize them and better understand their relationships, these approaches lack the fine-grained classification necessary for satisfactory interpretability in many protein prediction problems. To address this issue, we developed AAontology, a two-level classification of 586 amino acid scales (mainly from AAindex), together with an in-depth analysis of their relations, using bag-of-words-based classification, clustering, and manual refinement over multiple iterations. AAontology organizes physicochemical scales into 8 categories and 67 subcategories, enhancing the interpretability of scale-based machine learning methods in protein bioinformatics and thereby enabling researchers to gain deeper biological insight. We anticipate that AAontology will be a building block for linking amino acid properties with protein function and dysfunction, as well as aid informed decision-making in mutation analysis or protein drug design.
Affiliation(s)
- Stephan Breimann
- Department of Bioinformatics, School of Life Sciences, Technical University of Munich, Freising, Germany
- Ludwig-Maximilians-University Munich, Biomedical Center, Division of Metabolic Biochemistry, Munich, Germany
- German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
- Frits Kamp
- Ludwig-Maximilians-University Munich, Biomedical Center, Division of Metabolic Biochemistry, Munich, Germany
- Harald Steiner
- Ludwig-Maximilians-University Munich, Biomedical Center, Division of Metabolic Biochemistry, Munich, Germany
- German Center for Neurodegenerative Diseases (DZNE), Munich, Germany
- Dmitrij Frishman
- Department of Bioinformatics, School of Life Sciences, Technical University of Munich, Freising, Germany
39
Feng C, Wei H, Xu C, Feng B, Zhu X, Liu J, Zou Q. iProps: A Comprehensive Software Tool for Protein Classification and Analysis With Automatic Machine Learning Capabilities and Model Interpretation Capabilities. IEEE J Biomed Health Inform 2024; 28:6237-6247. [PMID: 39008396 DOI: 10.1109/jbhi.2024.3425716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
Protein classification is a crucial field in bioinformatics. A comprehensive tool that can perform feature evaluation, visualization, automated machine learning, and model interpretation would significantly advance research in protein classification. However, there is a significant gap in the literature regarding tools that integrate all of these essential functionalities. This paper presents iProps, a novel Python-based software package designed to meet these requirements. iProps supports feature extraction, feature evaluation, automated machine learning, and interpretation of classification models. First, iProps leverages evolutionary information and amino acid reduction information to propose or extend several numerical protein features that are independent of sequence length, including SC-PSSM, ORDip, TRC, CTDC-E, and CKSAAGP-E, among others; it also computes 17 other numerical features. iProps provides feature combination operations that generate hybrid features from these, and adds data-balancing sampling and built-in classifier settings, among other functionalities. It can thus identify the most effective protein class recognition features from a multitude of candidates, using three automated machine learning algorithms to find the optimal classifiers and parameter settings. Furthermore, iProps generates a detailed explanatory report that includes 23 informative graphs derived from three interpretable models. To assess the performance of iProps, a series of numerical experiments was conducted on two well-established datasets. The results demonstrated that the software achieved superior recognition performance in every case.
Beyond its contributions to bioinformatics, iProps broadens its applicability by offering robust data analysis tools that are beneficial across various disciplines, capitalizing on its automated machine learning and model interpretation capabilities. As an open-source platform, iProps is readily accessible and features an intuitive user interface, ensuring ease of use for individuals, even those without a background in programming.
40
Wen J, Ding Z, Wei Z, Xia H, Zhang Y, Zhu X. NeuroPpred-SHE: An interpretable neuropeptides prediction model based on selected features from hand-crafted features and embeddings of T5 model. Comput Biol Med 2024; 181:109048. [PMID: 39182368 DOI: 10.1016/j.compbiomed.2024.109048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 08/13/2024] [Accepted: 08/18/2024] [Indexed: 08/27/2024]
Abstract
Neuropeptides are the most ubiquitous neurotransmitters in the immune system, regulating various biological processes. Neuropeptides play a significant role in the discovery of new drugs and targets for nervous system disorders. Traditional experimental methods for identifying neuropeptides are time-consuming and costly. Although several computational methods have been developed to predict neuropeptides, their accuracy is still unsatisfactory because of the limited representability of the extracted features. In this work, we propose NeuroPpred-SHE, an efficient and interpretable model that predicts neuropeptides by selecting an optimal feature subset from both hand-crafted features and embeddings of a protein language model. Specifically, we first employed a pre-trained T5 protein language model to extract embedding features, and twelve other encoding methods to extract hand-crafted features from peptide sequences. Second, we fused the embedding features and hand-crafted features to enhance feature representability. Third, we used random forest (RF), Max-Relevance and Min-Redundancy (mRMR) and eXtreme Gradient Boosting (XGBoost) methods to select the optimal feature subset from the fused features. Finally, we employed five machine learning methods (GBDT, XGBoost, SVM, MLP, and LightGBM) to build the models. The GBDT-based model achieved the best performance. When compared with other state-of-the-art methods on an independent test set, our final model achieved an AUROC of 97.8%, higher than all the other state-of-the-art predictors. Our model is available at: https://github.com/wenjean/NeuroPpred-SHE.
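Of the three selectors listed above, mRMR is the easiest to show compactly. The sketch below implements the greedy difference form of mRMR (relevance to the label minus mean redundancy with already-selected features) using mutual information on discrete feature values. It is a hedged illustration, not the NeuroPpred-SHE code: continuous embeddings would first need discretization or a different mutual-information estimator, and the toy features and their names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    # Sum over observed pairs of p(a, b) * log(p(a, b) / (p(a) * p(b))),
    # written with raw counts: (c / n) * log(c * n / (n_a * n_b)).
    return sum(
        (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr(features: dict[str, Sequence[Hashable]], target: Sequence[Hashable], n_select: int) -> list[str]:
    """Greedy mRMR (difference criterion): relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected: list[str] = []
    remaining = list(features)
    while remaining and len(selected) < n_select:
        def score(name: str) -> float:
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: the label is bit_a OR bit_b; bit_a_copy duplicates bit_a.
bit_a = [0, 0, 1, 1, 0, 0, 1, 1]
bit_b = [0, 1, 0, 1, 0, 1, 0, 1]
label = [a | b for a, b in zip(bit_a, bit_b)]
chosen = mrmr({"bit_a": bit_a, "bit_a_copy": list(bit_a), "bit_b": bit_b}, label, n_select=2)
```

Because `bit_a_copy` is fully redundant with `bit_a`, its redundancy penalty outweighs its relevance once `bit_a` is selected. The selector therefore keeps `bit_b` and only one copy of `bit_a`.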
Affiliation(s)
- Jian Wen
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Zhijie Ding
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Zhuoyu Wei
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Hongwei Xia
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Yong Zhang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Xiaolei Zhu
- School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
41
Qin Z, Ren H, Zhao P, Wang K, Liu H, Miao C, Du Y, Li J, Wu L, Chen Z. Current computational tools for protein lysine acylation site prediction. Brief Bioinform 2024; 25:bbae469. [PMID: 39316944 PMCID: PMC11421846 DOI: 10.1093/bib/bbae469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 08/20/2024] [Accepted: 09/07/2024] [Indexed: 09/26/2024]
Abstract
As a main subtype of post-translational modification (PTM), protein lysine acylations (PLAs) play crucial roles in regulating diverse protein functions. With recent advances in proteomics technology, PTM identification is becoming a data-rich field, and the large amount of experimentally verified data urgently needs to be translated into valuable biological insights. With computational approaches, PLA sites can be accurately detected across the whole proteome, even for organisms with small-scale datasets. Herein, we present a comprehensive summary of 166 in silico PLA prediction methods, covering predictors for a single type of PLA site as well as for multiple types. This review covers aspects critical to developing a robust predictor, including data collection and preparation, sample selection, feature representation, classification algorithm design, model evaluation, and method availability. Notably, we discuss the application of protein language models and transfer learning to address the small-sample learning problem. We also highlight prediction methods developed for functionally relevant PLA sites and for species-, substrate-, and cell-type-specific PLA sites. In conclusion, this systematic review could facilitate the development of novel PLA predictors and offer useful insights to researchers from various disciplines.
Affiliation(s)
- Zhaohui Qin
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Haoran Ren
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
- Kaiyuan Wang
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Huixia Liu
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Chunbo Miao
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Yanxiu Du
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Junzhou Li
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Liuji Wu
- National Key Laboratory of Wheat and Maize Crop Science, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
42
Wei T, Lu C, Du H, Yang Q, Qi X, Liu Y, Zhang Y, Chen C, Li Y, Tang Y, Zhang WH, Tao X, Jiang N. DeepPBI-KG: a deep learning method for the prediction of phage-bacteria interactions based on key genes. Brief Bioinform 2024; 25:bbae484. [PMID: 39344712 PMCID: PMC11440089 DOI: 10.1093/bib/bbae484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/18/2024] [Accepted: 09/13/2024] [Indexed: 10/01/2024]
Abstract
Phages, the natural predators of bacteria, were discovered more than 100 years ago. However, increasing antimicrobial resistance rates have revitalized phage research. Methods that are less time-consuming and more efficient than wet-laboratory experiments are needed to screen phages quickly for therapeutic use. Traditional computational methods usually ignore the fact that phage-bacteria interactions are mediated by key genes and proteins. Methods for intraspecific prediction are rare, since almost all existing methods consider only interactions at the species and genus levels. Moreover, most strains in existing databases contain only partial genome information because whole-genome information for species is difficult to obtain. Here, we propose a new approach for interaction prediction that constructs new features from key genes and proteins and applies K-means sampling to select high-quality negative samples. Finally, we develop DeepPBI-KG, a corresponding prediction tool based on feature selection and a deep neural network. The average area under the curve (AUC) for prediction reached 0.93 for each strain, and the overall AUC and area under the precision-recall curve reached 0.89 and 0.92, respectively, on the independent test set; these values are higher than those of other existing prediction tools. The forward and reverse validation results indicate that key genes and key proteins regulate and influence the interaction, which supports the reliability of the model. In addition, intraspecific prediction experiments based on Klebsiella pneumoniae data demonstrate the potential applicability of DeepPBI-KG for intraspecific prediction.
In summary, the feature engineering and interaction prediction approaches proposed in this study can effectively improve the robustness and stability of interaction prediction, can achieve high generalizability, and may provide new directions and insights for rapid phage screening for therapy.
Collapse
Affiliation(s)
- Tongqing Wei
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
| | - Chenqi Lu
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
| | - Hanxiao Du
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
| | - Qianru Yang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Xin Qi
- Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Shanghai, China
- Yankun Liu
- Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Shanghai, China
- Yi Zhang
- Department of Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, No. 12 Wulumuqi Zhong Road, Shanghai, China
- Chen Chen
- Department of Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, No. 12 Wulumuqi Zhong Road, Shanghai, China
- Yutong Li
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Yuanhao Tang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Wen-Hong Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Shanghai, China
- Department of Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, No. 12 Wulumuqi Zhong Road, Shanghai, China
- Xu Tao
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Shanghai, China
- Department of Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, No. 12 Wulumuqi Zhong Road, Shanghai, China
- Ning Jiang
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, No. 2005 Songhu Road, Shanghai, 200433, China
- Shanghai Sci-Tech Inno Center for Infection & Immunity, No. 1688 Guoquan Bei Road, Shanghai, China
- Department of Infectious Diseases, Huashan Hospital, Shanghai Medical College, Fudan University, No. 12 Wulumuqi Zhong Road, Shanghai, China
43
Xu J, Gao Y, Lu Q, Zhang R, Gui J, Liu X, Yue Z. RiceSNP-BST: a deep learning framework for predicting biotic stress-associated SNPs in rice. Brief Bioinform 2024; 25:bbae599. [PMID: 39562160 PMCID: PMC11576077 DOI: 10.1093/bib/bbae599] [Received: 08/08/2024] [Revised: 10/07/2024] [Accepted: 11/04/2024] [Indexed: 11/21/2024]
Abstract
Rice consistently faces significant threats from biotic stresses, such as fungi, bacteria, pests, and viruses. Consequently, accurately and rapidly identifying previously unknown single-nucleotide polymorphisms (SNPs) in the rice genome is a critical challenge for rice research and the development of resistant varieties. However, the limited availability of high-quality rice genotype data has hindered this research. Deep learning has transformed biological research by facilitating the prediction and analysis of SNPs in biological sequence data. Convolutional neural networks are especially effective at extracting structural and local features from DNA sequences, leading to significant advances in genomics. Meanwhile, the expanding catalog of genome-wide association studies provides valuable biological insights for rice research. Building on this idea, we introduce RiceSNP-BST, an automatic architecture search framework designed to predict SNPs associated with rice biotic stress traits (BST-associated SNPs) by integrating multidimensional features. Notably, the study constructs new datasets, and the model is more precise than state-of-the-art methods while performing well on an independent test set and on cross-species datasets. Additionally, we extracted features from the original DNA sequences and employed causal inference to enhance the biological interpretability of the model. This study highlights the potential of RiceSNP-BST to advance genome prediction in rice. Furthermore, a user-friendly web server for RiceSNP-BST (http://rice-snp-bst.aielab.cc) has been developed to support broader genome research.
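Convolutional networks like those mentioned here usually take DNA as a one-hot matrix with one row per base and one column per nucleotide. The following is a minimal sketch of that encoding. The A/C/G/T column order and the all-zero rows for ambiguous bases such as N are assumptions; the abstract does not describe RiceSNP-BST's actual input encoding.

```python
def one_hot_dna(sequence):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order.

    Ambiguous bases (e.g. 'N') become all-zero rows; this is an assumed
    convention, not a documented RiceSNP-BST choice.
    """
    alphabet = "ACGT"
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == letter else 0 for letter in alphabet])
    return rows


# A short hypothetical window around a candidate SNP position.
for base, row in zip("ACGTN", one_hot_dna("ACGTN")):
    print(base, row)
```

A CNN then slides learned filters along the length axis of this matrix, which is how it picks up the local sequence motifs the abstract refers to.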
Affiliation(s)
- Jiajun Xu
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
- Yujia Gao
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
- Quan Lu
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
- Renyi Zhang
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
- Jianfeng Gui
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
- Xiaoshuang Liu
- Research Center for Biological Breeding Technology, Advance Academy, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
- Zhenyu Yue
- School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
- Research Center for Biological Breeding Technology, Advance Academy, Anhui Agricultural University, 130, Changjiang West Road, Hefei, Anhui Province 230036, China
44
Chung CR, Chien CY, Tang Y, Wu LC, Hsu JBK, Lu JJ, Lee TY, Bai C, Horng JT. An ensemble deep learning model for predicting minimum inhibitory concentrations of antimicrobial peptides against pathogenic bacteria. iScience 2024; 27:110718. [PMID: 39262770 PMCID: PMC11388163 DOI: 10.1016/j.isci.2024.110718] [Received: 12/29/2023] [Revised: 07/09/2024] [Accepted: 08/08/2024] [Indexed: 09/13/2024]
Abstract
The rise of antibiotic resistance necessitates effective alternative therapies. Antimicrobial peptides (AMPs) are promising because of their broad inhibitory effects. This study focuses on predicting the minimum inhibitory concentration (MIC) of AMPs against WHO-priority pathogens: Staphylococcus aureus ATCC 25923, Escherichia coli ATCC 25922, and Pseudomonas aeruginosa ATCC 27853. We developed a comprehensive regression model integrating AMP sequence-based and genomic features. Using eight AI-based architectures, including deep learning with protein language model embeddings, we created an ensemble model combining a bidirectional long short-term memory (BiLSTM) network, a convolutional neural network (CNN), and a multi-branch model (MBM). The ensemble model showed superior performance, with Pearson correlation coefficients of 0.756, 0.781, and 0.802 for the three bacterial strains, demonstrating its accuracy in predicting MIC values. This work lays a foundation for future studies to improve model performance and advance AMP applications in combating antibiotic resistance.
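The Pearson correlation coefficient reported above measures how closely predicted and measured MIC values follow a straight-line relationship, independent of scale. The following is a minimal sketch of the standard formula, r = cov(x, y) / (sd(x) · sd(y)). The input values are hypothetical and not data from the study.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sum of co-deviations (covariance numerator) and per-variable sums of squares.
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_m)


# Hypothetical predicted vs. measured values for four peptides.
print(round(pearson_r([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]), 3))
```

Because r depends only on linear agreement, a model can reach a high r while its predictions are systematically offset. Error metrics such as mean squared error are therefore a useful complement when judging MIC regressors.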
Affiliation(s)
- Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
- Chung-Yu Chien
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
- Yun Tang
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Li-Ching Wu
- Department of Biomedical Sciences and Engineering, National Central University, Taoyuan, Taiwan
- Justin Bo-Kai Hsu
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Jang-Jih Lu
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan City, Taiwan
- School of Medicine, Chang Gung University, Taoyuan City, Taiwan
- Department of Medical Biotechnology and Laboratory Science, Chang Gung University, Taoyuan City, Taiwan
- Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Center for Intelligent Drug Systems and Smart Biodevices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu City, Taiwan
- Chen Bai
- Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), Shenzhen 518172, China
- Jorng-Tzong Horng
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan City, Taiwan
45
Nasir S, Anwer F, Ishaq Z, Saeed MT, Ali A. VacSol-ML(ESKAPE): Machine learning empowering vaccine antigen prediction for ESKAPE pathogens. Vaccine 2024; 42:126204. [PMID: 39126830 DOI: 10.1016/j.vaccine.2024.126204] [Received: 12/08/2023] [Revised: 07/29/2024] [Accepted: 08/01/2024] [Indexed: 08/12/2024]
Abstract
The ESKAPE group, comprising Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter spp., poses a significant global threat because of its members' heightened virulence and extensive antibiotic resistance. These pathogens are major contributors to the prevalence of nosocomial (hospital-acquired) infections, resulting in high morbidity and mortality rates. Tackling this healthcare problem requires urgent measures, including the development of innovative vaccines and therapeutic strategies. Designing vaccines involves a complex and resource-intensive process of identifying protective antigens and potential vaccine candidates (PVCs) from pathogens. Reverse vaccinology (RV), a genomics-based approach, has made this process more efficient by leveraging bioinformatics tools to identify PVCs. In recent years, artificial intelligence and machine learning (ML) techniques have shown promise in enhancing the accuracy and efficiency of reverse vaccinology. This study introduces a supervised ML classification framework to predict PVCs specifically against ESKAPE pathogens. The model was trained on biological and physicochemical properties from a dataset containing protective antigens and non-protective proteins of ESKAPE pathogens. A conventional autoencoder-based strategy was employed for feature encoding and selection. Seven machine learning algorithms were trained and evaluated with stratified 5-fold cross-validation. Random Forest and Logistic Regression performed best across metrics including accuracy, precision, recall, weighted F1 (WF1) score, and area under the curve (AUC), so an ensemble model was developed to combine the strengths of both algorithms. The final ensemble model was then assessed on a high-quality benchmark dataset.
VacSol-ML(ESKAPE) demonstrated outstanding discrimination between PVCs and non-protective antigens and proves to be an invaluable tool for expediting vaccine development against these pathogens. To encourage collaborative research, it is publicly accessible as both a web server and a standalone version, available at http://vacsolml.mgbio.tech/.
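Stratified 5-fold cross-validation, used above to compare the seven algorithms, keeps each fold's ratio of protective to non-protective proteins close to that of the full dataset. The following is a minimal pure-Python sketch of the fold assignment. It assumes the samples of each class are shuffled and then dealt round-robin across folds. This is an illustration, not the authors' code. Library implementations such as scikit-learn's `StratifiedKFold` follow the same idea.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Return k lists of sample indices whose class proportions match the data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's samples round-robin across folds; carrying the
        # offset between classes keeps total fold sizes within one of each other.
        for position, index in enumerate(indices):
            folds[(offset + position) % k].append(index)
        offset += len(indices)
    return folds


# Hypothetical labels: 1 = protective antigen, 0 = non-protective protein.
labels = [1] * 10 + [0] * 40
for fold_number, test_indices in enumerate(stratified_folds(labels)):
    positives = sum(labels[i] for i in test_indices)
    print(fold_number, len(test_indices), positives)
```

Each fold serves once as the held-out test set while the other four train the model. Stratification matters most when, as here, one class is much rarer than the other: an unstratified split could leave a fold with almost no protective antigens.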
Affiliation(s)
- Samavi Nasir
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Farha Anwer
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Zaara Ishaq
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Muhammad Tariq Saeed
- School of Interdisciplinary Engineering & Science (SINES), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Amjad Ali
- Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan; MGBIO (SMC Private) Ltd, National Science & Technology Park (NSTP), NUST Campus Sector H-12, Islamabad, Pakistan.
46
Qin Z, Liu H, Zhao P, Wang K, Ren H, Miao C, Li J, Chen YZ, Chen Z. SLAM: Structure-aware lysine β-hydroxybutyrylation prediction with protein language model. Int J Biol Macromol 2024; 280:135741. [PMID: 39293623 DOI: 10.1016/j.ijbiomac.2024.135741] [Received: 07/04/2024] [Revised: 09/13/2024] [Accepted: 09/15/2024] [Indexed: 09/20/2024]
Abstract
Post-translational modifications (PTMs) diversify protein functions by adding chemical groups to, or removing them from, specific amino acids. As a newly reported PTM, lysine β-hydroxybutyrylation (Kbhb) opens a new avenue for functional proteomics. Therefore, accurate and efficient prediction of Kbhb sites is imperative. However, current experimental methods for identifying PTM sites are often expensive and time-consuming, and to date no computational method has been proposed for detecting Kbhb sites. To this end, we present SLAM, the first deep learning-based method for identifying lysine β-hydroxybutyrylation sites in silico. SLAM was evaluated with both 5-fold cross-validation and independent tests, achieving AUROC values of 0.890, 0.899, 0.907, and 0.923 on the general and species-specific independent test sets, respectively. As one example, we predicted potential Kbhb sites in human S-adenosyl-L-homocysteine hydrolase, and the predictions agree with experimentally verified Kbhb sites. In summary, our method could enable accurate and efficient characterization of novel Kbhb sites that are crucial for protein function and stability, and could be applied to the structure-guided identification of other important PTM sites. The SLAM online service and source code are available at https://ai4bio.online/SLAM and https://github.com/Gabriel-QIN/SLAM, respectively.
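AUROC, the metric reported for SLAM and several other tools in this list, equals the probability that a randomly chosen positive site scores higher than a randomly chosen negative one, with ties counting one half. The following is a minimal sketch that computes it directly from that pairwise definition. The scores are hypothetical, and the quadratic loop is meant for clarity rather than for large datasets.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: 1 for a positive (e.g. a modified lysine), 0 for a negative.
    Runs in O(P * N) time, which suits small illustrative inputs.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for five candidate lysines.
print(auroc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 0, 1, 0, 0]))
```

In this example, five of the six positive-negative pairs are ranked correctly, giving 5/6. An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation.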
Affiliation(s)
- Zhaohui Qin
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Huixia Liu
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
- Kaiyuan Wang
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Haoran Ren
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Chunbo Miao
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
- Junzhou Li
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China.
- Yong-Zi Chen
- Key Laboratory of Cancer Prevention and Therapy, Tianjin 300060, China; Laboratory of Tumor Cell Biology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Tianjin's Clinical Research Center for Cancer, Tianjin 300060, China.
- Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China.
47
Yue J, Xu J, Li T, Li Y, Chen Z, Liang S, Liu Z, Wang Y. Discovery of potential antidiabetic peptides using deep learning. Comput Biol Med 2024; 180:109013. [PMID: 39137670 DOI: 10.1016/j.compbiomed.2024.109013] [Received: 04/10/2024] [Revised: 07/01/2024] [Accepted: 08/08/2024] [Indexed: 08/15/2024]
Abstract
Antidiabetic peptides (ADPs), peptides with potential antidiabetic activity, are of significant importance in the treatment and control of diabetes. Despite their therapeutic potential, the discovery and prediction of ADPs remain challenging because of limited data, the complex nature of peptide functions, and the cost and time required by traditional wet-lab experiments. This study addresses these challenges by exploring deep learning methods for the discovery and prediction of ADPs. Specifically, we developed two models: a single-channel CNN and a three-channel neural network (CNN + RNN + Bi-LSTM). ADPs were primarily gathered from the BioDADPep database, alongside thousands of non-ADPs sourced from anticancer, antibacterial, and antiviral peptide datasets. The data were then preprocessed with the evolutionary scale model ESM-2, followed by model training and evaluation through 10-fold cross-validation. In addition, a series of newly published ADPs was collected from the literature as an independent test set, on which the CNN model achieved the highest accuracy (90.48%), surpassing existing ADP prediction tools. Finally, to apply the model, SeqGAN was used to generate new candidate ADPs, which were then screened with the CNN model. Selected peptides were further evaluated for pharmaceutical potential using physicochemical property prediction and structure prediction. In summary, this study not only established robust ADP prediction models but also used them to screen a batch of potential ADPs, addressing a critical need in peptide-based antidiabetic research.
Affiliation(s)
- Jianda Yue
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Jiawei Xu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Tingting Li
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Yaqi Li
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Zihui Chen
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Songping Liang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Zhonghua Liu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China.
- Ying Wang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Platform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China.
48
Wang Q, Ge R, Wang C, Elazab A, Fang Q, Zhang R. TDFFM: Transformer and Deep Forest Fusion Model for Predicting Coronavirus 3C-Like Protease Cleavage Sites. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:1231-1241. [PMID: 38498765 DOI: 10.1109/tcbb.2024.3378470] [Indexed: 03/20/2024]
Abstract
COVID-19 is caused by the highly contagious SARS-CoV-2 virus, which has a positive-sense, single-stranded RNA genome. A thorough understanding of SARS-CoV-2 pathogenesis is crucial for halting its spread. Notably, the coronavirus 3C-like protease (3CLpro) plays a key role in viral replication. Precise delineation of 3CLpro cleavage sites is imperative for elucidating the transmission dynamics of SARS-CoV-2. Machine learning tools have been deployed to identify potential 3CLpro cleavage sites, but existing methods often fall short in accuracy. To improve prediction performance, we propose a novel analytical framework, the Transformer and Deep Forest Fusion Model (TDFFM). TDFFM encodes protein sequences with the AAindex and the BLOSUM62 matrix. The encoded features are then fed into two distinct components: a Deep Forest, an effective decision-tree ensemble method, and a Transformer equipped with a Multi-Level Attention Model (TMLAM). The attention mechanism allows the model to identify positive samples more accurately, enhancing overall predictive performance. Evaluation on a test set shows that TDFFM achieves an accuracy of 0.955, an AUC of 0.980, and an F1-score of 0.367, substantiating the model's superior prediction capabilities.
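An accuracy near 0.96 alongside an F1-score near 0.37 can occur together when positives (cleavage sites) are rare. Accuracy is dominated by the many correctly rejected negatives, whereas F1 depends only on how well the positive class is recovered. The abstract does not state the test set's class balance. The confusion-matrix counts below are hypothetical, chosen only to illustrate the effect, and are not TDFFM results.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy and positive-class F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Hypothetical imbalanced test set: 30 positives among 1,000 samples.
acc, f1 = accuracy_and_f1(tp=10, fp=15, fn=20, tn=955)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Because true negatives never enter the F1 formula, F1 (or precision and recall reported separately) is usually the more informative figure for rare-site prediction tasks.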
49
Zhang J, Qian J. Advances in Computational Intelligence-Based Methods of Structure and Function Prediction of Proteins. Biomolecules 2024; 14:1083. [PMID: 39334850 PMCID: PMC11430421 DOI: 10.3390/biom14091083] [Received: 08/22/2024] [Accepted: 08/26/2024] [Indexed: 09/30/2024]
Abstract
Proteins serve as the building blocks of life and play essential roles in almost every cellular process [...].
Affiliation(s)
- Jian Zhang
- School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China;
50
Guevara-Barrientos D, Kaundal R. Malivhu: A Comprehensive Bioinformatics Resource for Filtering SARS and MERS Virus Proteins by Their Classification, Family and Species, and Prediction of Their Interactions Against Human Proteins. Bioinform Biol Insights 2024; 18:11779322241263671. [PMID: 39148721 PMCID: PMC11325310 DOI: 10.1177/11779322241263671] [Received: 01/28/2024] [Accepted: 06/04/2024] [Indexed: 08/17/2024]
Abstract
The COVID-19 pandemic is still ongoing, having claimed more than 6 million human lives, and it seems that the world will have to learn to live with the virus. Consequently, there is a need to develop treatments against it, not only vaccines but also new medicines. Human-virus protein-protein interactions (PPIs) play a key part in such drug-target discovery, but identifying them experimentally can be costly and sometimes unreliable. Computational methods have therefore emerged as a powerful alternative for predicting these interactions, reducing costs and allowing researchers to confirm only selected interactions instead of testing all possible combinations in the laboratory. Malivhu is a tool that predicts human-virus PPIs through a 4-phase process using machine learning models: phase 1 filters ssRNA(+) class virus proteins, phase 2 filters Coronaviridae family proteins, phase 3 filters severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) species proteins, and phase 4 predicts human-SARS-CoV/SARS-CoV-2/MERS protein-protein interactions. Model performance was measured with the Matthews correlation coefficient, F1-score, specificity, sensitivity, and accuracy, yielding accuracies of 99.07%, 99.83%, and 100% for the first 3 phases, respectively, and 94.24% for human-SARS-CoV PPI, 94.50% for human-SARS-CoV-2 PPI, and 95.45% for human-MERS PPI on independent testing. All the prediction models developed for the 4 phases were implemented as a web server, which is freely available at https://kaabil.net/malivhu/.
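Of the metrics listed for Malivhu, the Matthews correlation coefficient (MCC) is the least self-explanatory. It combines all four confusion-matrix cells into one value between -1 and 1, where 0 means no better than chance. The following is a minimal sketch of the standard formula. Returning 0 when any row or column of the confusion matrix is empty is a common convention that is assumed here. The function name and counts are hypothetical; this is not the scikit-learn function of the same name.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is 0 when a row or column sum is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for one prediction phase.
print(round(matthews_corrcoef(tp=90, fp=5, fn=10, tn=95), 3))
```

Unlike accuracy, MCC stays low when a classifier does well on only one class. This makes it a useful companion to the accuracy figures above, particularly for the imbalanced human-virus PPI phase.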
Affiliation(s)
- David Guevara-Barrientos
- Department of Computer Science, College of Science, Utah State University, Logan, UT, USA
- Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, UT, USA
- Rakesh Kaundal
- Department of Computer Science, College of Science, Utah State University, Logan, UT, USA
- Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, UT, USA
- Department of Plants, Soils & Climate, College of Agriculture and Applied Sciences, Utah State University, Logan, UT, USA