1
Shao Y, Liu T. iNClassSec-ESM: Discovering potential non-classical secreted proteins through a novel protein language model. Comput Struct Biotechnol J 2025; 27:1350-1358. [PMID: 40235638 PMCID: PMC11999076 DOI: 10.1016/j.csbj.2025.03.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2024] [Revised: 03/15/2025] [Accepted: 03/26/2025] [Indexed: 04/17/2025] Open
Abstract
Non-classical secreted proteins (NCSPs) are a class of proteins lacking signal peptides, secreted by Gram-positive bacteria through non-classical secretion pathways. With the increasing demand for highly secreted proteins in recent years, non-classical secretion pathways have received more attention due to their advantages over the classical secretion pathways (Sec/Tat). However, because the mechanisms of non-classical secretion pathways are not yet clear, identifying NCSPs through biological experiments is expensive and time-consuming, making it imperative to develop computational methods to address this issue. Existing NCSP prediction methods mainly use traditional handcrafted features to represent proteins from sequence information, which limits the models' ability to capture complex protein characteristics. In this study, we proposed a novel NCSP predictor, iNClassSec-ESM, which combined deep learning with traditional classifiers to enhance prediction performance. iNClassSec-ESM integrates an XGBoost model trained on comprehensive handcrafted features and a deep neural network (DNN) trained on hidden layer embeddings from the protein language model (PLM) ESM3. ESM3 is a recently proposed multimodal PLM that has not yet been fully explored in terms of protein representation. Therefore, we extracted hidden layer embeddings from ESM3 as inputs for multiple classifiers and deep learning networks, and compared them with embeddings from existing PLMs. Benchmark experiments indicate that iNClassSec-ESM outperforms most existing methods across multiple performance metrics and could serve as an effective tool for discovering potential NCSPs. Additionally, the ESM3 hidden layer embeddings, as an innovative protein representation method, show great potential for application in broader protein-related classification tasks. The source code of iNClassSec-ESM and the ESM3 embeddings extraction script are publicly available at https://github.com/AmamiyaHoshie/iNClassSec-ESM/.
Affiliation(s)
- Yizhou Shao
  - College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
- Taigang Liu
  - College of Information Technology, Shanghai Ocean University, Shanghai, 201306, China
2
Foltran BB, Teixeira AF, Romero EC, Fernandes LGV, Nascimento ALTO. Leucine-rich repeat proteins of Leptospira interrogans that interact to host glycosaminoglycans and integrins. Front Microbiol 2024; 15:1497712. [PMID: 39659425 PMCID: PMC11629876 DOI: 10.3389/fmicb.2024.1497712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2024] [Accepted: 11/05/2024] [Indexed: 12/12/2024] Open
Abstract
Pathogenic spirochaetes of the genus Leptospira are the etiological agents of leptospirosis, a zoonotic infection found worldwide. The disease is considered an emerging and re-emerging threat because global warming brings heavy rainfall and flooding, after which outbreaks of leptospirosis occur. Adhesion to host tissues is mediated by surface/extracellular proteins expressed by pathogens during infection. Leucine-rich repeat (LRR) domain-containing proteins seem to be important for the virulence of pathogenic Leptospira, and their role has recently been examined. Here, we report the characterization of two LRR-proteins encoded by LIC11051 and LIC11505. They present 7 and 17 LRR motifs, respectively. LIC11051 was found mainly in the P1 subclade, whereas LIC11505 was identified with higher identity in subclade P1 but was also found in subclades P2, S1, and S2. The recombinant proteins were recognized by antibodies in leptospirosis serum samples, suggesting their expression during infection. rLIC11505 bound a broad spectrum of ligands, including GAG and integrin receptors, whereas rLIC11051 showed limited binding activity. The attachment of proteins to ligands was specific, dose-dependent, and saturable. In addition to their role in adhesion, both proteins were shown to be secreted, with the ability to reassociate with the bacteria. Taken together, our data suggest that LIC11051 and LIC11505 participate in leptospiral pathogenesis. To the best of our knowledge, this is the first report showing leptospiral LRR-proteins exhibiting GAG and integrin receptor-binding properties.
Affiliation(s)
- Bruno B. Foltran
  - Laboratório de Desenvolvimento de Vacinas, Instituto Butantan, São Paulo, Brazil
  - Programa de Pós-Graduação Interunidades em Biotecnologia, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brazil
- Aline F. Teixeira
  - Laboratório de Desenvolvimento de Vacinas, Instituto Butantan, São Paulo, Brazil
- Eliete C. Romero
  - Centro de Bacteriologia, Instituto Adolfo Lutz, São Paulo, Brazil
- Luis G. V. Fernandes
  - Infectious Bacterial Disease Research Unit, U.S. Department of Agriculture (USDA) Agricultural Research Service, National Animal Disease Center, Ames, IA, United States
3
Zhao Y, Zhang S, Liang Y. HemoFuse: multi-feature fusion based on multi-head cross-attention for identification of hemolytic peptides. Sci Rep 2024; 14:22518. [PMID: 39342017 PMCID: PMC11438874 DOI: 10.1038/s41598-024-74326-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Accepted: 09/25/2024] [Indexed: 10/01/2024] Open
Abstract
Hemolytic peptides are therapeutic peptides that damage red blood cells. However, therapeutic peptides used in medical treatment must exhibit low toxicity to red blood cells to achieve the desired therapeutic effect. Therefore, accurate prediction of the hemolytic activity of therapeutic peptides is essential for the development of peptide therapies. In this study, a multi-feature cross-fusion model, HemoFuse, is proposed for hemolytic peptide identification. The feature vectors of peptide sequences are obtained with a word embedding technique and four hand-crafted feature extraction methods. We apply the multi-head cross-attention mechanism to hemolytic peptide identification for the first time. It captures the interaction between word embedding features and hand-crafted features by calculating attention over all positions in them, so that multiple features can be deeply fused. Moreover, we visualize the features obtained by this module to enhance its interpretability. On the comprehensive integrated dataset, HemoFuse achieves ACC, SP, SN, MCC, F1, AUC, and AP of 0.7575, 0.8814, 0.5793, 0.4909, 0.6620, 0.8387, and 0.7118, respectively. Compared with HemoDL proposed by Yang et al., these values are 3.32%, 3.89%, 5.93%, 10.6%, 8.17%, 5.88%, and 2.72% higher, respectively. Ablation experiments also indicate that our model is reasonable and efficient. The codes and datasets are accessible at https://github.com/z11code/Hemo .
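The cross-attention idea in this abstract can be illustrated with a minimal sketch. The Python below computes single-head scaled dot-product cross-attention, treating word-embedding rows as queries and hand-crafted feature rows as keys and values. That role assignment, the function names, and the toy vectors are assumptions made for illustration; the learned projections and multiple heads of HemoFuse are omitted, and this is not the authors' implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: rows from one feature view (assumed: word embeddings)
    keys, values: rows from the other view (assumed: hand-crafted features)
    Returns one fused row per query.
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key position, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value rows.
        fused.append([
            sum(w * v[col] for w, v in zip(weights, values))
            for col in range(len(values[0]))
        ])
    return fused


# Toy inputs (hypothetical): 2 query positions, 3 key/value positions, dim 2.
word_embedding = [[1.0, 0.0], [0.0, 1.0]]
handcrafted = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(word_embedding, handcrafted, handcrafted)
print(len(fused), len(fused[0]))
```

Each output row has the width of the value rows, so the fused representation can be passed to a downstream classifier in place of either input view.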
Affiliation(s)
- Ya Zhao
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P. R. China
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P. R. China
- Yunyun Liang
  - School of Science, Xi'an Polytechnic University, Xi'an, 710048, P. R. China
4
Zhang Y, Yu L, Yang M, Han B, Luo J, Jing R. Model fusion for predicting unconventional proteins secreted by exosomes using deep learning. Proteomics 2024; 24:e2300184. [PMID: 38643383 DOI: 10.1002/pmic.202300184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/22/2024]
Abstract
Unconventional secretory proteins (USPs) are vital for cell-to-cell communication and are necessary for proper physiological processes. Unlike classical proteins that follow the conventional secretory pathway via the Golgi apparatus, these proteins are released using unconventional pathways. The primary modes of secretion for USPs are exosomes and ectosomes, which originate from the endoplasmic reticulum. Accurate and rapid identification of exosome-mediated secretory proteins is crucial for gaining valuable insights into the regulation of non-classical protein secretion and intercellular communication, as well as for the advancement of novel therapeutic approaches. Although sequence-based computational methods exist for predicting unconventional proteins secreted by exosomes (UPSEs), they suffer from significant limitations in accuracy. In this study, we propose a novel approach to predict UPSEs by combining multiple deep learning models that incorporate both protein sequences and evolutionary information. Our approach utilizes a convolutional neural network (CNN) to extract protein sequence information, while various densely connected neural networks (DNNs) are employed to capture evolutionary conservation patterns. By combining six distinct deep learning models, we have created a framework that surpasses previous approaches, achieving an ACC score of 77.46% and an MCC score of 0.5406 on an independent test dataset.
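For readers unfamiliar with the two metrics reported here, the following Python sketch shows how accuracy (ACC) and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The counts in the example are hypothetical and are not taken from the paper; the function names are likewise chosen for illustration.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal count is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
tp, tn, fp, fn = 3, 3, 1, 1
print(accuracy(tp, tn, fp, fn))  # 0.75
print(mcc(tp, tn, fp, fn))       # 0.5
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside ACC when classes may be imbalanced.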
Affiliation(s)
- Yonglin Zhang
  - Department of Clinical Pharmacy and Pharmacy Management, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan, China
- Lezheng Yu
  - School of Chemistry and Materials Science, Guizhou Education University, Guiyang, Guizhou, China
- Ming Yang
  - Department of Clinical Pharmacy and Pharmacy Management, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan, China
- Bin Han
  - GCP Center/Institute of Drug Clinical Trials, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- Jiesi Luo
  - Basic Medical College, Southwest Medical University, Luzhou, Sichuan, China
- Runyu Jing
  - School of Cyber Science and Engineering, Sichuan University, Chengdu, Sichuan, China
5
Lee CN, Hall BA, Sanford L, Molehin AJ. Molecular Characterization and Functional Analysis of a Schistosoma mansoni Serine Protease Inhibitor, Smserpin-p46. Microorganisms 2024; 12:1164. [PMID: 38930546 PMCID: PMC11205507 DOI: 10.3390/microorganisms12061164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Revised: 05/30/2024] [Accepted: 06/05/2024] [Indexed: 06/28/2024] Open
Abstract
Serine protease inhibitors (serpins) are a superfamily of proteins that regulate various physiological processes including fibrinolysis, inflammation and immune responses. In parasite systems, serpins are believed to play important roles in parasite colonization, inhibition of host immune serine proteases and penetration of defensive barriers. However, serpins are less well characterized in schistosomes. In this study, a Schistosoma mansoni serpin (Smserpin-p46) containing a 1360 base pair open reading frame was cloned, expressed and functionally characterized. Bioinformatics analysis revealed that Smserpin-p46 contains the key residues, structural domains and motifs characteristic of inhibitory serpins. Gene expression profiling demonstrated stage-specific expression of Smserpin-p46, with the highest expression in adult male worms. Recombinant Smserpin-p46 (rSmserpin-p46) inhibited both human neutrophil cathepsin G and elastase, key serine proteases involved in NETosis, a program for the formation of neutrophil extracellular traps. Using specific rabbit antiserum, Smserpin-p46 was detected in soluble worm antigen preparation and was localized to the adult worm tegument. Cumulatively, the expression of Smserpin-p46 on the parasite tegument and its ability to inhibit proteases involved in NETosis highlight the importance of this serpin in parasite-host interactions and encourage its further investigation as a candidate vaccine antigen for the control of schistosomiasis.
Affiliation(s)
- Christine N. Lee
  - Biomedical Sciences Program, College of Graduate Studies, Midwestern University, Glendale, AZ 85308, USA
- Brooke Ashlyn Hall
  - Department of Microbiology and Immunology, College of Graduate Studies, Midwestern University, Glendale, AZ 85308, USA
- Leah Sanford
  - Arizona College of Osteopathic Medicine, Midwestern University, Glendale, AZ 85308, USA
- Adebayo J. Molehin
  - Department of Microbiology and Immunology, College of Graduate Studies, Midwestern University, Glendale, AZ 85308, USA
  - Arizona College of Osteopathic Medicine, Midwestern University, Glendale, AZ 85308, USA
6
Liu T, Song C, Wang C. NCSP-PLM: An ensemble learning framework for predicting non-classical secreted proteins based on protein language models and deep learning. Math Biosci Eng 2024; 21:1472-1488. [PMID: 38303473 DOI: 10.3934/mbe.2024063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Non-classical secreted proteins (NCSPs) refer to a group of proteins that are located in the extracellular environment despite the absence of signal peptides and motifs. They usually play different roles in intercellular communication. Therefore, the accurate prediction of NCSPs is a critical step toward an in-depth understanding of their associated secretion mechanisms. Since the experimental recognition of NCSPs is often costly and time-consuming, computational methods are desired. In this study, we proposed an ensemble learning framework, termed NCSP-PLM, for the identification of NCSPs by extracting feature embeddings from pre-trained protein language models (PLMs) as input to several fine-tuned deep learning models. First, we compared the performance of nine PLM embeddings by training three neural networks, namely a multi-layer perceptron (MLP), an attention mechanism and a bidirectional long short-term memory network (BiLSTM), and selected the best network model for each PLM embedding. Then, four models were excluded due to their below-average accuracies, and the remaining five models were integrated to predict NCSPs by weighted voting. Finally, 5-fold cross validation and an independent test were conducted to evaluate the performance of NCSP-PLM on the benchmark datasets. On the same independent dataset, the sensitivity and specificity of NCSP-PLM were 91.18% and 97.06%, respectively. In particular, the overall accuracy of our model reached 94.12%, which was 7–16% higher than that of the existing state-of-the-art predictors. These results indicate that NCSP-PLM could serve as a useful tool for the annotation of NCSPs.
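The weighted-voting step mentioned in this abstract can be sketched as follows. This Python example averages the positive-class probabilities of several models using per-model weights and then thresholds the result. The abstract does not say how the weights are derived or whether votes are hard or soft, so the soft-voting form, the 0.5 threshold, the function name, and all numbers below are assumptions for illustration rather than the published method.

```python
def weighted_vote(probabilities, weights, threshold=0.5):
    """Combine per-model positive-class probabilities by weighted averaging.

    probabilities: one predicted probability per model
    weights: one non-negative weight per model (assumed, e.g. validation accuracy)
    Returns (combined_score, predicted_label).
    """
    if len(probabilities) != len(weights):
        raise ValueError("probabilities and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(probabilities, weights)) / total
    return score, int(score >= threshold)


# Five hypothetical models, matching the number retained in the abstract.
model_probs = [0.9, 0.2, 0.7, 0.6, 0.8]
model_weights = [2.0, 1.0, 1.0, 1.0, 1.0]
score, label = weighted_vote(model_probs, model_weights)
print(round(score, 4), label)
```

Giving a larger weight to a more reliable model lets it outvote weaker ones without discarding their predictions entirely.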
Affiliation(s)
- Taigang Liu
  - College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
- Chen Song
  - College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
- Chunhua Wang
  - College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
7
Parthiban S, Vijeesh T, Gayathri T, Shanmugaraj B, Sharma A, Sathishkumar R. Artificial intelligence-driven systems engineering for next-generation plant-derived biopharmaceuticals. Front Plant Sci 2023; 14:1252166. [PMID: 38034587 PMCID: PMC10684705 DOI: 10.3389/fpls.2023.1252166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 10/17/2023] [Indexed: 12/02/2023]
Abstract
Recombinant biopharmaceuticals including antigens, antibodies, hormones, cytokines, single-chain variable fragments, and peptides have been used as vaccines, diagnostics and therapeutics. Plant molecular pharming is a robust platform that uses plants as an expression system to produce simple and complex recombinant biopharmaceuticals on a large scale. The plant system has several advantages over other host systems, such as humanized expression, glycosylation, scalability, reduced risk of human or animal pathogenic contaminants, and rapid, cost-effective production. Despite these advantages, the expression of recombinant proteins in plant systems is hindered by factors such as non-human post-translational modifications, protein misfolding, conformational changes and instability. Artificial intelligence (AI) plays a vital role in various fields of biotechnology, and in plant molecular pharming a significant increase in yield and stability can be achieved with AI-based multi-approach interventions that overcome these hindrances. Current limitations of plant-based recombinant biopharmaceutical production can be circumvented with the aid of synthetic biology tools and AI algorithms in plant-based glycan engineering for protein folding, stability, viability, catalytic activity and organelle targeting. AI models, including but not limited to neural networks, support vector machines, linear regression, Gaussian processes and regressor ensembles, use training and experimental data sets to design and validate protein structures, thereby optimizing properties such as thermostability, catalytic activity, antibody affinity, and protein folding. This review focuses on integrating systems engineering approaches with AI-based machine learning and deep learning algorithms in protein engineering and host engineering to augment protein production in plant systems and meet the ever-expanding therapeutics market.
Affiliation(s)
- Subramanian Parthiban
  - Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Thandarvalli Vijeesh
  - Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Thashanamoorthi Gayathri
  - Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Balamurugan Shanmugaraj
  - Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
- Ashutosh Sharma
  - Tecnologico de Monterrey, School of Engineering and Sciences, Centre of Bioengineering, Queretaro, Mexico
- Ramalingam Sathishkumar
  - Plant Genetic Engineering Laboratory, Department of Biotechnology, Bharathiar University, Coimbatore, India
8
Li F, Guo X, Bi Y, Jia R, Pitt ME, Pan S, Li S, Gasser RB, Coin LJ, Song J. Digerati - A multipath parallel hybrid deep learning framework for the identification of mycobacterial PE/PPE proteins. Comput Biol Med 2023; 163:107155. [PMID: 37356289 DOI: 10.1016/j.compbiomed.2023.107155] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Revised: 06/05/2023] [Accepted: 06/07/2023] [Indexed: 06/27/2023]
Abstract
The genome of Mycobacterium tuberculosis contains a relatively high percentage (10%) of genes that are poorly characterised because of their highly repetitive nature and high GC content. Some of these genes encode proteins of the PE/PPE family, which are thought to be involved in host-pathogen interactions, virulence, and disease pathogenicity. Members of this family are genetically divergent and challenging to both identify and classify using conventional computational tools. Thus, advanced in silico methods are needed to efficiently identify proteins of this family for subsequent functional annotation. In this study, we developed the first deep learning-based approach, termed Digerati, for the rapid and accurate identification of PE and PPE family proteins. Digerati was built upon a multipath parallel hybrid deep learning framework, which combines multi-layer convolutional neural networks with a bidirectional long short-term memory network and a self-attention module to effectively learn the higher-order feature representations of PE/PPE proteins. Empirical studies demonstrated that Digerati achieved a significantly better performance (∼18-20%) than alignment-based approaches, including BLASTP, PHMMER, and HHsuite, in both prediction accuracy and speed. Digerati is anticipated to facilitate community-wide efforts to conduct high-throughput identification and analysis of PE/PPE family members. The webserver and source codes of Digerati are publicly available at http://web.unimelb-bioinfortools.cloud.edu.au/Digerati/.
Affiliation(s)
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria, 3000, Australia
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Yue Bi
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, 3800, Australia
- Runchang Jia
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Miranda E Pitt
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria, 3000, Australia
- Shirui Pan
  - School of Information and Communication Technology, Griffith University, QLD, 4222, Australia
- Shuqin Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Robin B Gasser
  - Melbourne Veterinary School, Faculty of Science, The University of Melbourne, VIC, 3010, Australia
- Lachlan Jm Coin
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria, 3000, Australia
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, 3800, Australia
9
Guo X, Li F, Song J. Predicting Pseudouridine Sites with Porpoise. Methods Mol Biol 2023; 2624:139-151. [PMID: 36723814 DOI: 10.1007/978-1-0716-2962-8_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Pseudouridine is a ubiquitous RNA modification and plays a crucial role in many biological processes. However, identifying pseudouridine sites experimentally remains challenging because the experiments are expensive and time-consuming. To this end, we present Porpoise, a computational approach to identify pseudouridine sites from RNA sequence data. Porpoise builds on a stacking ensemble learning framework with several informative features and achieves competitive performance compared with state-of-the-art approaches. This protocol elaborates on step-by-step use and execution of the local stand-alone version and the webserver of Porpoise. In addition, we provide a general machine learning framework that can help identify the optimal stacking ensemble learning model using different combinations of features. This general machine learning framework can help users build their own pseudouridine predictors using their in-house datasets.
Affiliation(s)
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Yangling, China
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Yangling, China
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC, Australia
- Jiangning Song
  - Biomedicine Discovery Institute, Monash University, Melbourne, VIC, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC, Australia
10
Zhang H, Wang Y, Pan Z, Sun X, Mou M, Zhang B, Li Z, Li H, Zhu F. ncRNAInter: a novel strategy based on graph neural network to discover interactions between lncRNA and miRNA. Brief Bioinform 2022; 23:6747810. [PMID: 36198065 DOI: 10.1093/bib/bbac411] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/07/2022] [Revised: 08/04/2022] [Accepted: 08/23/2022] [Indexed: 12/14/2022] Open
Abstract
In recent years, many studies have illustrated the significant role that non-coding RNA (ncRNA) plays in biological activities, in which lncRNA, miRNA and especially their interactions have been shown to affect many biological processes. Some in silico methods have been proposed and applied to identify novel lncRNA-miRNA interactions (LMIs), but there are still imperfections in their RNA representation and information extraction approaches, which implies that there is room to improve their performance. Meanwhile, only a few of them are accessible at present, which limits their practical application. The construction of a new tool for LMI prediction is thus imperative for a better understanding of the relevant biological mechanisms. This study proposed a novel method, ncRNAInter, for LMI prediction. A comprehensive strategy for RNA representation and an optimized graph neural network deep learning algorithm were utilized in this study. ncRNAInter was robust and achieved a Matthews correlation coefficient 26.7% higher than that of existing reputable methods for human LMI prediction. In addition, ncRNAInter proved its universal applicability in dealing with LMIs from various species and successfully identified novel LMIs associated with various diseases, which further verified its effectiveness and usability. All source code and datasets are freely available at https://github.com/idrblab/ncRNAInter.
Affiliation(s)
- Hanyu Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Minjie Mou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Bing Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Zhaorong Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Honglin Li
  - School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  - Shanghai Key Laboratory of New Drug Design, East China University of Science and Technology, Shanghai 200237, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
11
Zhu L, Wang X, Li F, Song J. PreAcrs: a machine learning framework for identifying anti-CRISPR proteins. BMC Bioinformatics 2022; 23:444. [PMID: 36284264 PMCID: PMC9597991 DOI: 10.1186/s12859-022-04986-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/26/2022] [Accepted: 10/14/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Anti-CRISPR proteins are potent modulators that inhibit the CRISPR-Cas immunity system and have huge potential in gene editing and gene therapy as a genome-editing tool. Extensive studies have shown that anti-CRISPR proteins are essential for modifying endogenous genes, promoting the RNA-guided binding and cleavage of DNA or RNA substrates. In recent years, identifying and characterizing anti-CRISPR proteins has become a hot and significant research topic in bioinformatics. However, as most anti-CRISPR proteins share little similarity with those currently known, traditional screening methods are time-consuming and inefficient. Machine learning methods could fill this gap with powerful predictive capability and provide a new perspective for anti-CRISPR protein identification. RESULTS: Here, we present a novel machine learning ensemble predictor, called PreAcrs, to identify anti-CRISPR proteins directly from protein sequences. Three features and eight different machine learning algorithms were used to train PreAcrs. PreAcrs outperformed other existing methods and significantly improved the prediction accuracy for identifying anti-CRISPR proteins. CONCLUSIONS: In summary, the PreAcrs predictor achieved a competitive performance for predicting new anti-CRISPR proteins in terms of accuracy and robustness. We anticipate PreAcrs will be a valuable tool for researchers to speed up the research process. The source code is available at: https://github.com/Lyn-666/anti_CRISPR.git .
Affiliation(s)
- Lin Zhu
  - Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Xiaoyu Wang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Fuyi Li
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC Australia
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800 Australia