1
Le VT, Yuune JPT, Vu TTP, Malik MS, Ou YY. DeepCR: predicting cytokine receptor proteins through pretrained language models and deep learning networks. J Biomol Struct Dyn 2025:1-18. [PMID: 40448687 DOI: 10.1080/07391102.2025.2512448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2025] [Accepted: 05/21/2025] [Indexed: 06/02/2025]
Abstract
Cytokine receptors play a pivotal role in mediating the immune response and are critical in cytokine storms, which underlie the pathogenesis of conditions such as acute respiratory distress syndrome (ARDS) and autoimmune disorders. Identifying cytokine receptors is essential for understanding their biological functions, exploring therapeutic targets, and guiding clinical interventions. Traditional biochemical methods to identify cytokine receptors are labor-intensive, costly, and time-consuming, prompting the need for more efficient alternatives. Recent advances in computational biology have enabled the use of machine learning to classify cytokine receptor proteins. Most existing approaches focused on homologous features and protein composition to classify cytokine families, but no dedicated studies have been conducted on cytokine receptor proteins. This gap presents an opportunity to develop a method specifically for classifying cytokine receptors among other membrane proteins. In this study, we present a novel classification framework combining pre-trained language models (PLMs) with a multi-window convolutional neural network (mCNN) architecture for the fast and accurate identification of cytokine receptor proteins. PLMs, such as ProtTrans and ESM variants, capture biochemical context directly from raw protein sequences, while mCNN efficiently extracts local and global sequence patterns using convolutional layers with varying window sizes. Our model achieved an AUC of 0.96 in the training as well as 0.97 and 0.93 in two independent tests, demonstrating its effectiveness in distinguishing cytokine receptors from non-cytokine receptor proteins. By eliminating the need for manual feature extraction, this approach offers a robust and scalable solution for protein classification, paving the way for its application in drug discovery and understanding cytokine-mediated diseases.
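The multi-window idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the filter weights are random rather than learned, the embeddings stand in for PLM output, and the names `conv1d_max`, `multi_window_features`, and `random_filters` are hypothetical. It only shows how filters of several window sizes can scan one sequence of per-residue vectors and how their max-pooled responses can be concatenated into a fixed-length feature vector.

```python
import random


def conv1d_max(embeddings, kernel):
    """Slide one filter over the sequence and return its maximum ReLU response.

    embeddings: list of L vectors (length D each), one per residue.
    kernel: list of w vectors (length D each), i.e. a filter with window size w.
    """
    w = len(kernel)
    best = 0.0  # ReLU floor: negative responses are clipped to zero
    for start in range(len(embeddings) - w + 1):
        response = sum(
            e * k
            for row, krow in zip(embeddings[start:start + w], kernel)
            for e, k in zip(row, krow)
        )
        best = max(best, response)
    return best


def multi_window_features(embeddings, filters_by_window):
    """Concatenate the max-pooled responses of the filters of every window size."""
    features = []
    for window in sorted(filters_by_window):
        for kernel in filters_by_window[window]:
            features.append(conv1d_max(embeddings, kernel))
    return features


def random_filters(window_sizes, n_filters, dim, seed=0):
    """Hypothetical stand-in for learned filters: random weights per window size."""
    rng = random.Random(seed)
    return {
        w: [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(w)]
            for _ in range(n_filters)]
        for w in window_sizes
    }


# Toy input: 10 residues with 4-dimensional embeddings (real PLM vectors are far larger).
rng = random.Random(1)
toy_embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
toy_filters = random_filters(window_sizes=(2, 4, 8), n_filters=3, dim=4)
toy_features = multi_window_features(toy_embeddings, toy_filters)
```

In a trained model the concatenated vector would feed a classification layer; the window sizes and filter counts here are arbitrary choices for illustration.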
Affiliation(s)
- Van The Le
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan
- Thi Thu Phuong Vu
  - Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li, Taiwan
- Muhammad Shahid Malik
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan
  - Department of Computer Sciences, Karakoram International University, Gilgit-Baltistan, Pakistan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan
  - Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li, Taiwan

2
Arslan N, Eggeling R, Reuter B, Van Leathem K, Pingarilho M, Gomes P, Sönnerborg A, Kaiser R, Zazzi M, Pfeifer N. HIV multidrug class resistance prediction with a time sliding anchor approach. Bioinformatics Advances 2025; 5:vbaf099. [PMID: 40421422 PMCID: PMC12104520 DOI: 10.1093/bioadv/vbaf099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2025] [Revised: 04/16/2025] [Accepted: 04/25/2025] [Indexed: 05/28/2025]
Abstract
Motivation The emergence of multidrug class resistance (MDR) in Human Immunodeficiency Virus (HIV) is a rare but significant challenge in antiretroviral therapy (ART). MDR, which may arise from prolonged drug exposure, treatment failures, or transmission of resistant strains, accelerates disease progression and poses particular challenges in resource-limited settings with restricted access to resistance testing and advanced therapies. Early prediction of future MDR development is important to inform therapeutic decisions and mitigate its occurrence. Results In this study, we employ various machine learning classifiers to predict future resistance to all four major antiretroviral drug classes using features extracted from clinical HIV sequence data. We systematically explore several variations of the problem that differ in the pre-existing resistance level and the temporal gap between sample collection and observed MDR occurrence. Our models show the ability to predict multidrug class resistance even in the most challenging variations, albeit at a reduced accuracy. Feature importance analysis reveals that our models primarily utilize known drug resistance mutations for easier classification tasks, but rely on new mutations for the difficult task of distinguishing four class drug resistance from three class drug resistance. Availability and implementation All analysis was performed using the Euresist Integrated DataBase (EIDB). Researchers wishing to reproduce, validate or extend these findings can request access to the latest EIDB release via the Euresist Network.
Affiliation(s)
- Nurhan Arslan
  - Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Tuebingen 72076, Germany
  - Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen 72076, Germany
- Ralf Eggeling
  - Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Tuebingen 72076, Germany
  - Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen 72076, Germany
- Bernhard Reuter
  - Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Tuebingen 72076, Germany
  - Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen 72076, Germany
- Kristel Van Leathem
  - Laboratory of Clinical and Epidemiological Virology, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, KU Leuven, Leuven 3000, Belgium
- Marta Pingarilho
  - Global Health and Tropical Medicine, GHTM, Associate Laboratory in Translation and Innovation towards Global Health, LA-REAL, Instituto de Higiene e Medicina Tropical, IHMT, Universidade NOVA de Lisboa, Lisbon 1349-008, Portugal
- Perpétua Gomes
  - Laboratório de Biologia Molecular, LMCBM, SPC, Unidade Local de Saúde Lisboa Ocidental, Hospital Egas Moniz, Caparica 2829-511, Portugal
  - Egas Moniz Center for Interdisciplinary Research (CiiEM), Egas Moniz School of Health and Science, Lisbon, Almada 1349-019, Portugal
- Anders Sönnerborg
  - Department of Medicine Huddinge, Karolinska University Hospital, Stockholm 14186, Sweden
  - Division of Infectious Diseases, Department of Clinical Microbiology, Karolinska Institutet, Stockholm 14152, Sweden
- Rolf Kaiser
  - Institute of Virology, Faculty of Medicine, University Hospital Cologne, University of Cologne, Cologne 50935, Germany
- Maurizio Zazzi
  - Department of Medical Biotechnology, University of Siena, Siena 53100, Italy
- Nico Pfeifer
  - Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Tuebingen 72076, Germany
  - Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen 72076, Germany

3
Malik M, Le VT, Ou YY. NA_mCNN: Classification of Sodium Transporters in Membrane Proteins by Integrating Multi-Window Deep Learning and ProtTrans for Their Therapeutic Potential. J Proteome Res 2025; 24:2324-2335. [PMID: 40193588 PMCID: PMC12053934 DOI: 10.1021/acs.jproteome.4c00884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 01/01/2025] [Accepted: 03/19/2025] [Indexed: 04/09/2025]
Abstract
Sodium transporters maintain cellular homeostasis by transporting ions, minerals, and nutrients across the membrane, and Na+/K+ ATPases facilitate the cotransport of solutes in neurons, muscle cells, and epithelial cells. Sodium transporters are important for many physiological processes, and their dysfunction leads to diseases such as hypertension, diabetes, neurological disorders, and cancer. The NA_mCNN computational method highlights the functional diversity and significance of sodium transporters in membrane proteins using protein language model (PLM) embeddings and multiple-window scanning deep learning models. This work evaluates PLMs including TAPE, ProtTrans, ESM-1b-1280, and ESM-2-128 to improve the accuracy of sodium transporter classification. Five-fold cross-validation and independent testing demonstrate the robustness of ProtTrans embeddings. In cross-validation, ProtTrans achieved an AUC of 0.9939, a sensitivity of 0.9829, and a specificity of 0.9889, demonstrating its ability to distinguish positive and negative samples. In independent testing, ProtTrans maintained a sensitivity of 0.9765, a specificity of 0.9991, and an AUC of 0.9975, which indicates its high level of discrimination. This study advances the understanding of sodium transporter diversity and function, as well as their role in human pathophysiology. Our goal is to use deep learning techniques and protein language models to accelerate the identification of sodium transporters and support the development of new therapeutic interventions.
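For readers less familiar with the metrics reported in this abstract, the following sketch shows how sensitivity, specificity, and AUC are commonly computed for a binary classifier. The labels and scores are invented toy values, not data from the paper, and the function names are hypothetical. AUC is computed here as the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 3 positives, 4 negatives, decision threshold 0.5 (all values invented).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]
sens, spec = sensitivity_specificity(labels, predictions)
auc = roc_auc(labels, scores)
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why abstracts like this one report all three.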
Affiliation(s)
- Muhammad Shahid Malik
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
  - Department of Computer Sciences, Karakoram International University, Gilgit-Baltistan 15100, Pakistan
- Van The Le
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
  - Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li 32003, Taiwan

4
Chuang CC, Liu YC, Ou YY. DeepEpiIL13: Deep Learning for Rapid and Accurate Prediction of IL-13-Inducing Epitopes Using Pretrained Language Models and Multiwindow Convolutional Neural Networks. ACS Omega 2025; 10:9675-9683. [PMID: 40092768 PMCID: PMC11904640 DOI: 10.1021/acsomega.4c10960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2024] [Revised: 02/12/2025] [Accepted: 02/14/2025] [Indexed: 03/19/2025]
Abstract
Accurate prediction of interleukin-13 (IL-13)-inducing epitopes is crucial for advancing targeted therapies against allergic inflammation, the cytokine storm associated with severe COVID-19, and related disorders. Current epitope prediction methods, however, often exhibit limitations in efficiency and accuracy. To address this, we introduce DeepEpiIL13, a novel deep learning framework that uniquely synergizes pretrained language models with multiwindow convolutional neural networks (CNNs) for the rapid and accurate identification of IL-13-inducing epitopes from protein sequences. DeepEpiIL13 leverages high-dimensional embeddings generated by the pretrained language model, which capture rich contextual information from protein sequences. These embeddings are then processed by a multiwindow CNN architecture, enabling the effective exploration of both local and global sequence patterns pertinent to IL-13 induction. The proposed DeepEpiIL13 approach underwent rigorous evaluation using both benchmark data sets and an independent SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) data set. Results demonstrate that DeepEpiIL13 achieves superior performance compared with traditional methods. On the benchmark data set, DeepEpiIL13 attained a Matthews correlation coefficient (MCC) of 0.52 and an area under the receiver operating characteristic curve (AUC) of 0.86. Notably, when assessed on the independent SARS-CoV-2 data set, DeepEpiIL13 exhibited remarkable robustness, achieving an MCC of 0.63 and an AUC of 0.92. These metrics underscore the enhanced predictive capability and robust applicability of DeepEpiIL13, particularly within the context of COVID-19 research and related viral infections. This study presents DeepEpiIL13 as a powerful and efficient deep learning framework for accurate epitope prediction. By offering significant improvement in performance and robustness, DeepEpiIL13 provides new and promising avenues for the development of epitope-based vaccines and immunotherapies specifically targeting IL-13-mediated disorders. The successful and rapid identification of IL-13-inducing epitopes using DeepEpiIL13 paves the way for novel therapeutic interventions against a range of conditions, including allergic diseases, inflammatory conditions, and severe viral infections such as COVID-19, with potential for a significant impact on public health outcomes.
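The Matthews correlation coefficient reported in this abstract is defined from the four confusion-matrix counts. The sketch below computes it with the standard formula; the counts in the example are invented for illustration and are not taken from the paper, and returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal count is zero (assumed convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Invented counts: 2 true positives, 3 true negatives, 1 false positive, 1 false negative.
example_mcc = matthews_corrcoef(tp=2, tn=3, fp=1, fn=1)  # (6 - 1) / sqrt(144) = 5/12
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it stays informative on imbalanced data sets where accuracy alone can mislead.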
Affiliation(s)
- Cheng-Che Chuang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Chen Liu
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
  - Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li 32003, Taiwan

5
Chuang CC, Liu YC, Jhang WE, Wei SS, Ou YY. RAG_MCNNIL6: A Retrieval-Augmented Multi-Window Convolutional Network for Accurate Prediction of IL-6 Inducing Epitopes. J Chem Inf Model 2025; 65:2685-2694. [PMID: 39967508 PMCID: PMC11898070 DOI: 10.1021/acs.jcim.4c02144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2024] [Revised: 01/20/2025] [Accepted: 02/11/2025] [Indexed: 02/20/2025]
Abstract
Interleukin-6 (IL-6) is a critical cytokine involved in immune regulation, inflammation, and the pathogenesis of various diseases, including autoimmune disorders, cancer, and the cytokine storm associated with severe COVID-19. Identifying IL-6 inducing epitopes, the short peptide fragments that trigger IL-6 production, is crucial for developing epitope-based vaccines and immunotherapies. However, traditional methods for epitope prediction often lack accuracy and efficiency. This study presents RAG_MCNNIL6, a novel deep learning framework that integrates Retrieval-augmented generation (RAG) with multiwindow convolutional neural networks (MCNNs) for accurate and rapid prediction of IL-6 inducing epitopes. RAG_MCNNIL6 leverages ProtTrans, a state-of-the-art pretrained protein language model, to generate rich embedding representations of peptide sequences. By incorporating a RAG-based similarity retrieval and embedding augmentation strategy, RAG_MCNNIL6 effectively captures both local and global sequence patterns relevant for IL-6 induction, significantly improving prediction performance compared to existing methods. We demonstrate the superior performance of RAG_MCNNIL6 on benchmark data sets, highlighting its potential for advancing research and therapeutic development for IL-6-mediated diseases.
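The abstract does not spell out how the similarity retrieval and embedding augmentation are performed, so the following is only one plausible reading, offered as a sketch under stated assumptions: each query embedding is compared by cosine similarity against a reference set (for example, training peptides), the top-k neighbours are averaged, and that mean is concatenated to the query. The functions `cosine`, `retrieve_top_k`, and `augment`, and the choice of averaging plus concatenation, are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query, reference, k):
    """Indices of the k reference vectors most similar to the query."""
    ranked = sorted(
        range(len(reference)),
        key=lambda i: cosine(query, reference[i]),
        reverse=True,
    )
    return ranked[:k]


def augment(query, reference, k=2):
    """Concatenate the query with the mean of its top-k neighbours.

    Assumes a non-empty reference set and k >= 1.
    """
    neighbours = retrieve_top_k(query, reference, k)
    mean = [
        sum(reference[i][d] for i in neighbours) / len(neighbours)
        for d in range(len(query))
    ]
    return query + mean


# Toy 2-dimensional "embeddings" (invented values).
reference_set = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
augmented = augment([1.0, 0.0], reference_set, k=2)
```

The augmented vector has twice the original dimensionality, so any downstream network would need its input size adjusted accordingly.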
Affiliation(s)
- Cheng-Che Chuang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Chen Liu
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Wei-En Jhang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Sin-Siang Wei
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
  - Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li 32003, Taiwan

6
Shah SMA, Rafi M, Malik MS, Malik SA, Ou YY. mCNN-glucose: Identifying families of glucose transporters using a deep convolutional neural network based on multiple-scanning windows. Int J Biol Macromol 2025; 294:139522. [PMID: 39761890 DOI: 10.1016/j.ijbiomac.2025.139522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2024] [Revised: 01/01/2025] [Accepted: 01/03/2025] [Indexed: 01/11/2025]
Abstract
Glucose transporters are essential carrier proteins that function on the phospholipid bilayer to facilitate glucose diffusion across cell membranes. The transporters play many physiological and pathological roles in addition to absorption and metabolism of fructose in food and the pathogenesis of gastrointestinal diseases. These carrier proteins play an important role in diseases of the nervous system, cardiovascular system, digestive system, and urinary system. These essential transporters have been extensively studied as potential therapeutic targets for cancers such as pancreatic, prostate, and hepatocellular carcinoma, where they also serve as diagnostic and prognostic indicators. The method uses position-specific scoring matrices (PSSM) with multiple-scanning-window convolutional neural networks to classify glucose transport proteins based on their functional significance and crucial role in therapy. Convolutional neural networks with multiple window scanning are employed to capture biologically meaningful and significant features from PSSM evolutionary profiles. Our proposed method obtained a Matthews correlation coefficient (MCC) of 0.99 and an accuracy (AC) of 99.46 for facilitative glucose transporters (GLUT), 0.99 and 99.46 for sodium-coupled glucose transporters (SGLT), and 0.92 and 97.3 for Sugars Will Eventually be Exported Transporters (SWEET). This study shows significantly higher performance than our previous study, and the method could be used to accurately classify novel glucose transporters.
Affiliation(s)
- Syed Muazzam Ali Shah
  - Department of Software Engineering, National University of Computer and Emerging Sciences, Shah Latif Town, 75030 Karachi, Pakistan
- Muhammad Rafi
  - Artificial Intelligence and Data Science Department, National University of Computer and Emerging Sciences, Shah Latif Town, 75030 Karachi, Pakistan
- Muhammad Shahid Malik
  - Department of Computer Science and Engineering, Yuan Ze University, Zhongli, Taoyuan 320315, Taiwan
  - Department of Computer Sciences, Karakoram International University, Gilgit-Baltistan 15100, Pakistan
- Sohail Ahmed Malik
  - Artificial Intelligence and Data Science Department, National University of Computer and Emerging Sciences, Shah Latif Town, 75030 Karachi, Pakistan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Zhongli, Taoyuan 320315, Taiwan
  - Graduate program for Biomedical Informatics, Yuan Ze University, Zhongli, Taoyuan 320315, Taiwan

7
Le VT, Malik MS, Lin YJ, Liu YC, Chang YY, Ou YY. ATP_mCNN: Predicting ATP binding sites through pretrained language models and multi-window neural networks. Comput Biol Med 2025; 185:109541. [PMID: 39653625 DOI: 10.1016/j.compbiomed.2024.109541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2024] [Revised: 11/20/2024] [Accepted: 12/05/2024] [Indexed: 01/26/2025]
Abstract
Adenosine triphosphate plays a vital role in providing energy and enabling key cellular processes through interactions with binding proteins. The increasing amount of protein sequence data necessitates computational methods for identifying binding sites. However, experimental identification of adenosine triphosphate-binding residues remains challenging. To address the challenge, we developed a multi-window convolutional neural network architecture taking pre-trained protein language model embeddings as input features. In particular, multiple parallel convolutional layers scan for motifs localized to different window sizes. Max pooling extracts salient features concatenated across windows into a final multi-scale representation for residue-level classification. On benchmark datasets, our model achieves an area under the ROC curve of 0.95, significantly improving on prior sequence-based models and outperforming convolutional neural network baselines. This demonstrates the utility of pre-trained language models and multi-window convolutional neural networks for advanced sequence-based prediction of adenosine triphosphate-binding residues. Our approach provides a promising new direction for elucidating binding mechanisms and interactions from primary structure.
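Residue-level prediction, as mentioned in this abstract, needs one input sample per residue. The abstract does not say how those samples are built, so the sketch below shows one common approach as an assumption: a fixed-size window of per-residue embeddings centered on each position, zero-padded at the sequence ends. The function `residue_windows` and the parameter `half_width` are hypothetical names.

```python
def residue_windows(embeddings, half_width):
    """Build one zero-padded context window per residue.

    embeddings: list of L vectors (length D each), one per residue.
    half_width: number of neighbours taken on each side of the center residue.
    Returns L windows, each holding 2 * half_width + 1 vectors.
    """
    dim = len(embeddings[0]) if embeddings else 0
    pad = [0.0] * dim  # stands in for positions beyond the sequence ends
    windows = []
    for center in range(len(embeddings)):
        window = []
        for pos in range(center - half_width, center + half_width + 1):
            if 0 <= pos < len(embeddings):
                window.append(embeddings[pos])
            else:
                window.append(pad)
        windows.append(window)
    return windows


# Toy sequence of 3 residues with 1-dimensional embeddings (invented values).
toy_windows = residue_windows([[1.0], [2.0], [3.0]], half_width=1)
```

Each window could then be passed to parallel convolutional filters of different sizes, with the label of the center residue (binding or non-binding) as the training target.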
Affiliation(s)
- Van-The Le
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Muhammad-Shahid Malik
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
  - Department of Computer Sciences, Karakoram International University, Gilgit-Baltistan, 15100, Pakistan
- Yi-Jing Lin
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Yu-Chen Liu
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Yan-Yun Chang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
  - Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li, 32003, Taiwan

8
Zhang H, Wei Y, Saravanan KM. Artificial intelligence and computer-aided drug discovery: Methods development and application. Methods 2025; 234:294-295. [PMID: 39826658 DOI: 10.1016/j.ymeth.2025.01.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Affiliation(s)
- Haiping Zhang
  - Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yanjie Wei
  - Shenzhen Key Laboratory of Intelligent Bioinformatics & Center for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Konda Mani Saravanan
  - Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai 600073, Tamil Nadu, India

9
Malik M, Chang YY, Liu YC, Le VT, Ou YY. MCNN_MC: Computational Prediction of Mitochondrial Carriers and Investigation of Bongkrekic Acid Toxicity Using Protein Language Models and Convolutional Neural Networks. J Chem Inf Model 2024; 64:9125-9134. [PMID: 39133248 PMCID: PMC11683872 DOI: 10.1021/acs.jcim.4c00961] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 07/26/2024] [Accepted: 07/29/2024] [Indexed: 08/13/2024]
Abstract
Mitochondrial carriers (MCs) are essential proteins that transport metabolites across mitochondrial membranes and play a critical role in cellular metabolism. The ADP/ATP (adenosine diphosphate/adenosine triphosphate) carrier is one of the most important carriers, as it contributes to cellular energy production and is susceptible to the powerful toxin bongkrekic acid. This toxin has claimed several lives; for example, a recent foodborne outbreak in Taipei, Taiwan, caused four deaths and sickened 30 people. Bongkrekic acid poisoning has been a long-standing problem in Indonesia, with reports as early as 1895 detailing numerous deaths from contaminated fermented coconut cakes. In bioinformatics, significant advances have been made in understanding biological processes through computational methods; however, no established computational method has been developed for identifying mitochondrial carriers. We propose a computational bioinformatics approach for predicting MCs from a broader class of secondary active transporters with a focus on the ADP/ATP carrier and its interaction with bongkrekic acid. The proposed model combines protein language models (PLMs) with multiwindow scanning convolutional neural networks (mCNNs). While PLM embeddings capture contextual information within proteins, mCNN scans multiple windows to identify potential binding sites and extract local features. Our results show 96.66% sensitivity, 95.76% specificity, 96.12% accuracy, 91.83% Matthews correlation coefficient (MCC), 94.63% F1-Score, and 98.55% area under the curve (AUC). The results demonstrate the effectiveness of the proposed approach in predicting MCs and elucidating their functions, particularly in the context of bongkrekic acid toxicity. This study presents a valuable approach for identifying novel mitochondrial complexes, characterizing their functional roles, and understanding mitochondrial toxicology mechanisms. Our findings, which use computational methods to improve our understanding of cellular processes and drug-target interactions, contribute to the development of therapeutic strategies for mitochondrial disorders, reducing the devastating effects of bongkrekic acid poisoning.
Affiliation(s)
- Muhammad Shahid Malik
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
  - Department of Computer Sciences, Karakoram International University, Gilgit-Baltistan 15100, Pakistan
- Yan-Yun Chang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Chen Liu
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Van The Le
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
  - Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li 32003, Taiwan

10
Malik MS, Le VT, Shah SMA, Ou YY. MCNN-AAPT: accurate classification and functional prediction of amino acid and peptide transporters in secondary active transporters using protein language models and multi-window deep learning. J Biomol Struct Dyn 2024:1-10. [PMID: 39576667 DOI: 10.1080/07391102.2024.2431664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 04/23/2024] [Indexed: 02/28/2025]
Abstract
Secondary active transporters play a crucial role in cellular physiology by facilitating the movement of molecules across cell membranes. Identifying the functional classes of these transporters, particularly amino acid and peptide transporters, is essential for understanding their involvement in various physiological processes and disease pathways, including cancer. This study aims to develop a robust computational framework that integrates pre-trained protein language models and deep learning techniques to classify amino acid and peptide transporters within the secondary active transporter (SAT) family and predict their functional association with solute carrier (SLC) proteins. The study leverages a comprehensive dataset of 448 secondary active transporters, including 36 solute carrier proteins, obtained from UniProt and the Transporter Classification Database (TCDB). Three state-of-the-art protein language models, ProtTrans, ESM-1b, and ESM-2, are evaluated within a deep learning neural network architecture that employs a multi-window scanning technique to capture local and global sequence patterns. The ProtTrans-based feature set demonstrates exceptional performance, achieving a classification accuracy of 98.21% with 87.32% sensitivity and 99.76% specificity for distinguishing amino acid and peptide transporters from other SATs. Furthermore, the model maintains strong predictive ability for SLC proteins, with an overall accuracy of 88.89% and a Matthews Correlation Coefficient (MCC) of 0.7750. This study showcases the power of integrating pre-trained protein language models and deep learning techniques for the functional classification of secondary active transporters and the prediction of associated solute carrier proteins. The findings have significant implications for drug development, disease research, and the broader understanding of cellular transport mechanisms.
Affiliation(s)
- Muhammad Shahid Malik
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan
  - Department of Computer Sciences, Karakoram International University, Gilgit-Baltistan, Pakistan
- Van The Le
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan
  - Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li, Taiwan

11
Zhang H, Wei Y, Saravanan KM. Artificial intelligence and computer-aided drug discovery: Methods development and application. Methods 2024; 231:55-56. [PMID: 39265960 DOI: 10.1016/j.ymeth.2024.09.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2024]
Affiliation(s)
- Haiping Zhang
  - Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yanjie Wei
  - Shenzhen Key Laboratory of Intelligent Bioinformatics & Center for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Konda Mani Saravanan
  - Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai 600073, Tamil Nadu, India

12
Le VT, Tseng YH, Liu YC, Malik MS, Ou YY. VesiMCNN: Using pre-trained protein language models and multiple window scanning convolutional neural networks to identify vesicular transport proteins. Int J Biol Macromol 2024; 280:136048. [PMID: 39332561 DOI: 10.1016/j.ijbiomac.2024.136048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Revised: 09/16/2024] [Accepted: 09/25/2024] [Indexed: 09/29/2024]
Abstract
Vesicular transport is a critical cellular process responsible for the proper organization and functioning of eukaryotic cells. This mechanism relies on specialized vesicles that shuttle macromolecules, such as proteins, across the cellular landscape, a process pivotal to maintaining cellular homeostasis. Disruptions in vesicular transport have been linked to various disease mechanisms, including cancer and neurodegenerative disorders. In this study, we present vesiMCNN, a novel computational approach that integrates pre-trained protein language models with a multi-window scanning convolutional neural network architecture to accurately identify vesicular transport proteins. To the best of our knowledge, this is the first study to leverage the power of pre-trained language models in combination with the multi-window scanning technique for this task. Our method achieved a Matthews Correlation Coefficient (MCC) of 0.558 and an Area Under the Receiver Operating Characteristic (AUC-ROC) of 0.933, outperforming existing state-of-the-art approaches. Additionally, we have curated a comprehensive benchmark dataset for the study of vesicular transport proteins, which can facilitate further research in this field. The remarkable performance of our model, combined with the comprehensive dataset and novel deep learning model, marks a significant advancement in the field of vesicular transport protein research.
Affiliation(s)
- Van The Le
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yi-Hsuan Tseng
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Chen Liu
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Muhammad Shahid Malik
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan; Department of Computer Sciences, Karakoram International University, Gilgit-Baltistan 15100, Pakistan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan; Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li 32003, Taiwan
|
13
|
Le VT, Malik MS, Tseng YH, Lee YC, Huang CI, Ou YY. DeepPLM_mCNN: An approach for enhancing ion channel and ion transporter recognition by multi-window CNN based on features from pre-trained language models. Comput Biol Chem 2024; 110:108055. [PMID: 38555810 DOI: 10.1016/j.compbiolchem.2024.108055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 02/28/2024] [Accepted: 03/19/2024] [Indexed: 04/02/2024]
Abstract
Accurate classification of membrane proteins such as ion channels and ion transporters is critical for elucidating cellular processes and for drug development. We present DeepPLM_mCNN, a framework combining Pretrained Language Models (PLMs) and multi-window convolutional neural networks (mCNNs) to classify membrane proteins into ion channels and ion transporters. Our approach extracts informative features from protein sequences using various PLMs, including TAPE, ProtT5_XL_U50, ESM-1b, ESM-2_480, and ESM-2_1280. These PLM-derived features are then fed into an mCNN architecture to learn conserved motifs important for classification. When evaluated on ion transporters, our best-performing model, which uses ProtT5 features, achieved 90% sensitivity, 95.8% specificity, and 95.4% overall accuracy. For ion channels, we obtained 88.3% sensitivity, 95.7% specificity, and 95.2% overall accuracy using ESM-1b features. The proposed DeepPLM_mCNN framework demonstrates significant improvements over previous methods on unseen test data. This study illustrates the potential of combining PLMs and deep learning for accurate computational identification of membrane proteins from sequence data alone. Our findings have important implications for membrane protein research and for drug development targeting ion channels and transporters. The data and source code for this study are publicly available at https://github.com/s1129108/DeepPLM_mCNN.
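The multi-window idea described above can be illustrated with a small sketch: kernels of several window sizes slide over per-residue embeddings, each response is passed through ReLU and global max-pooling, and the pooled values are concatenated into one feature vector. This is not the authors' implementation (their code is at the link above); the function names, window sizes, kernel counts, and random weights are assumptions made for illustration.

```python
import random

def conv_max_pool(embeddings, kernel):
    """Slide one (window x dim) kernel along the sequence, apply ReLU, then global max-pool."""
    window = len(kernel)
    best = 0.0  # ReLU output is never below zero, so 0.0 is a safe starting maximum
    for start in range(len(embeddings) - window + 1):
        total = sum(
            embeddings[start + i][d] * kernel[i][d]
            for i in range(window)
            for d in range(len(kernel[i]))
        )
        best = max(best, total)
    return best

def multi_window_features(embeddings, kernels_by_window):
    """Concatenate max-pooled responses from kernels of several window sizes."""
    features = []
    for window in sorted(kernels_by_window):
        for kernel in kernels_by_window[window]:
            features.append(conv_max_pool(embeddings, kernel))
    return features

rng = random.Random(0)
seq_len, dim = 12, 4  # toy sizes; real PLM embeddings are far wider
embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]
# Hypothetical window sizes (2, 4, 8) with two randomly initialised kernels each.
kernels = {
    w: [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(w)] for _ in range(2)]
    for w in (2, 4, 8)
}
features = multi_window_features(embeddings, kernels)
print(len(features))  # 3 window sizes x 2 kernels = 6 features
```

In a trained model the kernel weights would be learned and the concatenated vector passed to a classifier; short windows respond to local motifs, while longer windows cover wider sequence context.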
Affiliation(s)
- Van-The Le
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Muhammad-Shahid Malik
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan; Department of Computer Science and Engineering, Karakoram International University, Pakistan
- Yi-Hsuan Tseng
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Yu-Cheng Lee
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Cheng-I Huang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan; Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li, 32003, Taiwan
|