1. Lara-Ramírez D, Santacruz-Tinoco CE, Ramón-Gallegos E, Muñoz-Medina JE. In silico design of Ebola virus Glycoprotein antigenic peptides as vaccine candidates. PLoS One 2025; 20:e0319496. [PMID: 40153397 PMCID: PMC11952221 DOI: 10.1371/journal.pone.0319496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2024] [Accepted: 02/03/2025] [Indexed: 03/30/2025] Open
Abstract
Ebola virus (EBOV) is a filovirus that causes severe hemorrhagic fever and has a fatality rate between 50 and 90%. The vaccines were developed against the Ebola Zaire species; therefore, it is necessary to develop vaccines against other species to control future outbreaks. The objective of this work was to obtain vaccine candidate peptides against different EBOV species through the use of bioinformatics programs and servers that allow glycoprotein (GP) to be analyzed. GP sequences of various EBOV species that did not present gaps or unspecified amino acids or that were repeated (same year, region and laboratory) were downloaded from the NCBI database. A consensus sequence was generated and used to determine vaccine candidate peptides, which were evaluated, through a combination of servers and molecular dynamics, for their ability to interact with B and T lymphocytes, toxicity, allergenicity, solvent exposure, glycosylation, antigenicity, and presence in mature GP. Five vaccine candidate peptides were identified, of which PEP4 had the best characteristics evaluated in this study. PEP4 may be a potential candidate for the development of an EBOV vaccine.
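The consensus step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and uses a simple per-column majority vote; the abstract does not say which tool or rule the authors used, so the function name `consensus`, the tie-breaking rule, and the toy sequences below are all hypothetical.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Return a majority-vote consensus of equal-length, pre-aligned sequences.

    Assumption: ties are broken alphabetically so the result is deterministic.
    """
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    residues = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        top = max(counts.values())
        # Pick the most frequent residue; alphabetical order resolves ties.
        residues.append(min(r for r, c in counts.items() if c == top))
    return "".join(residues)


# Toy, made-up sequences (not real GP sequences).
toy = ["MKTAYL", "MKTAYL", "MRTAYL", "MKSAYL"]
print(consensus(toy))  # MKTAYL
```

A majority vote is only one possible rule; threshold-based or ambiguity-code consensus schemes would give different results at variable positions.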
Affiliation(s)
- David Lara-Ramírez
- Environmental Cytopathology Laboratory, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City, Mexico
- División de Laboratorios Especializados, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Eva Ramón-Gallegos
- Environmental Cytopathology Laboratory, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City, Mexico
2. Chatzimiltis S, Agathocleous M, Promponas VJ, Christodoulou C. Post-processing enhances protein secondary structure prediction with second order deep learning and embeddings. Comput Struct Biotechnol J 2025; 27:243-251. [PMID: 39866664 PMCID: PMC11764030 DOI: 10.1016/j.csbj.2024.12.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 12/20/2024] [Accepted: 12/21/2024] [Indexed: 01/28/2025] Open
Abstract
Protein Secondary Structure Prediction (PSSP) is regarded as a challenging task in bioinformatics, and numerous approaches to achieve a more accurate prediction have been proposed. Accurate PSSP can be instrumental in inferring protein tertiary structures and their functions. Machine Learning and in particular Deep Learning approaches show promising results for the PSSP problem. In this paper, we deploy a Convolutional Neural Network (CNN) trained with the Subsampled Hessian Newton (SHN) method (a Hessian Free Optimisation variant), with a two-dimensional input representation of embeddings extracted from a language model pretrained with protein sequences. Utilising a CNN trained with the SHN method and the input embeddings, we achieved on average a 79.96% per residue (Q3) accuracy on the CB513 dataset and 81.45% Q3 accuracy on the PISCES dataset (without any post-processing techniques applied). The application of ensembles and filtering techniques to the results of the CNN improved the overall prediction performance. The Q3 accuracy on the CB513 dataset increased to 93.65% and on the PISCES dataset to 87.13%. Moreover, our method was evaluated using the CASP13 dataset, where we showed that as the post-processing window size increased, the prediction performance increased as well. In fact, with the biggest post-processing window size (limited by the smallest CASP13 protein), we achieved a Q3 accuracy of 98.12% and a Segment Overlap (SOV) score of 96.98 on the CASP13 dataset when the CNNs were trained with the PISCES dataset. Finally, we showed that input representations from embeddings can perform as well as representations extracted from multiple sequence alignments.
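For readers unfamiliar with the Q3 metric reported above, the following is a minimal sketch of how a per-residue three-state accuracy is commonly computed: the percentage of residues whose predicted state (helix H, strand E, coil C) matches the observed one. The function name `q3_accuracy` and the example strings are hypothetical, and this is not the authors' evaluation code.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label (H, E or C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    if set(predicted + observed) - set("HEC"):
        raise ValueError("only the states H, E and C are allowed")

    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up ten-residue example with one mismatch at the sixth position.
pred = "CHHHHEEECC"
obs = "CHHHHHEECC"
print(f"{q3_accuracy(pred, obs):.1f}%")  # 90.0%
```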
Affiliation(s)
- Sotiris Chatzimiltis
- University of Cyprus, Department of Computer Science, Nicosia, Cyprus
- 5G/6GIC, Institute for Communication Systems (ICS), University of Surrey, Guildford, United Kingdom
- Michalis Agathocleous
- University of Cyprus, Department of Computer Science, Nicosia, Cyprus
- University of Nicosia, Department of Computer Science, Nicosia, Cyprus
3. Balakrishnan A, Mishra SK, Georrge JJ. Insight into Protein Engineering: From In silico Modelling to In vitro Synthesis. Curr Pharm Des 2025; 31:179-202. [PMID: 39354773 DOI: 10.2174/0113816128349577240927071706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Revised: 09/12/2024] [Accepted: 09/13/2024] [Indexed: 10/03/2024]
Abstract
Protein engineering alters the polypeptide chain to obtain a novel protein with improved functional properties. This field constantly evolves with advanced in silico tools and techniques to design novel proteins and peptides. Rationally incorporating mutations, unnatural amino acids, and post-translational modifications increases the applications of engineered proteins and peptides. It aids in developing drugs with maximum efficacy and minimum side effects. Currently, the engineering of peptides is gaining attention due to their high stability, binding specificity, low immunogenicity, and reduced toxicity. Engineered peptides are potent candidates for drug development due to their high specificity and low cost of production compared with other biologics, including proteins and antibodies. Therefore, understanding the current state of peptide design and engineering with the help of currently available in silico tools is crucial. This review extensively covers the in silico tools available for protein engineering with a view to designing peptides as therapeutics, followed by in vitro aspects. Moreover, a discussion on the chemical synthesis and purification of peptides, a case study, and challenges are also incorporated.
Affiliation(s)
- Anagha Balakrishnan
- Department of Bioinformatics, University of North Bengal, Siliguri, District-Darjeeling, West Bengal 734013, India
- Saurav K Mishra
- Department of Bioinformatics, University of North Bengal, Siliguri, District-Darjeeling, West Bengal 734013, India
- John J Georrge
- Department of Bioinformatics, University of North Bengal, Siliguri, District-Darjeeling, West Bengal 734013, India
4. Manfredi M, Savojardo C, Martelli PL, Casadio R. E-pRSA: Embeddings Improve the Prediction of Residue Relative Solvent Accessibility in Protein Sequence. J Mol Biol 2024; 436:168494. [PMID: 39237207 DOI: 10.1016/j.jmb.2024.168494] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/10/2023] [Revised: 02/09/2024] [Accepted: 02/10/2024] [Indexed: 09/07/2024]
Abstract
Knowledge of the solvent accessibility of residues in a protein is essential for different applications, including the identification of interacting surfaces in protein-protein interactions and the characterization of variations. We describe E-pRSA, a novel web server to estimate Relative Solvent Accessibility values (RSAs) of residues directly from a protein sequence. The method exploits two complementary Protein Language Models to provide fast and accurate predictions. When benchmarked on different blind test sets, E-pRSA scores at the state-of-the-art, and outperforms a previous method we developed, DeepREx, which was based on sequence profiles after Multiple Sequence Alignments. The E-pRSA web server is freely available at https://e-prsa.biocomp.unibo.it/main/ where users can submit single-sequence and batch jobs.
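As background for the quantity E-pRSA predicts, relative solvent accessibility is conventionally defined as a residue's accessible surface area (ASA) divided by a reference maximum ASA for that residue type. The sketch below shows only that definition, not the E-pRSA model; the function names, the 0.2 exposure cutoff, and the numeric inputs are assumptions chosen purely for illustration.

```python
def relative_solvent_accessibility(asa, max_asa):
    """RSA = ASA / reference maximum ASA, clipped to the range [0, 1]."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    if asa < 0:
        raise ValueError("asa must be non-negative")
    return min(asa / max_asa, 1.0)


def is_exposed(rsa, threshold=0.2):
    """Binary buried/exposed call; the 0.2 cutoff is an assumed, adjustable value."""
    return rsa >= threshold


# Placeholder numbers in square angstroms, not taken from any reference scale.
rsa = relative_solvent_accessibility(asa=43.0, max_asa=129.0)
print(f"RSA = {rsa:.2f}, exposed = {is_exposed(rsa)}")
```

Clipping at 1.0 guards against measured ASA values that slightly exceed the chosen reference maximum.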
Affiliation(s)
- Matteo Manfredi
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Castrense Savojardo
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Rita Casadio
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
5. Liang Y, Lv D, Liu K, Yang L, Shu H, Wen L, Lv C, Sun Q, Yin J, Liu H, Xu J, Liu Z, Ding N. MicroProteinDB: A database to provide knowledge on sequences, structures and function of ncRNA-derived microproteins. Comput Biol Med 2024; 177:108660. [PMID: 38820774 DOI: 10.1016/j.compbiomed.2024.108660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 05/08/2024] [Accepted: 05/26/2024] [Indexed: 06/02/2024]
Abstract
Omics-based technologies have revolutionized our comprehension of microproteins encoded by ncRNAs, revealing their abundant presence and pivotal roles within complex functional landscapes. Here, we developed MicroProteinDB (http://bio-bigdata.hrbmu.edu.cn/MicroProteinDB), which provides and visualizes extensive knowledge to aid retrieval and analysis of computationally predicted and experimentally validated microproteins originating from various ncRNA types. Employing prediction algorithms grounded in diverse deep learning approaches, MicroProteinDB comprehensively documents the fundamental physicochemical properties, secondary and tertiary structures, interactions with functional proteins, family domains, and inter-species conservation of microproteins. With five major analytical modules, it will serve as a valuable knowledge resource for investigating ncRNA-derived microproteins.
Affiliation(s)
- Yinan Liang
- The First Affiliated Hospital, Harbin Medical University, Harbin, 150001, China
- Dezhong Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Kefan Liu
- School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150081, China
- Liting Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Huan Shu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Luan Wen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Chongwen Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Qisen Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Jiaqi Yin
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Hui Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Juan Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Zhigang Liu
- Affiliated Foshan Maternity & Child Healthcare Hospital, Southern Medical University, Guangzhou, 510000, China
- Na Ding
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
6. Liang Y, Yin X, Zhang Y, Guo Y, Wang Y. Predicting lncRNA-protein interactions through deep learning framework employing multiple features and random forest algorithm. BMC Bioinformatics 2024; 25:108. [PMID: 38475723 DOI: 10.1186/s12859-024-05727-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Accepted: 03/01/2024] [Indexed: 03/14/2024] Open
Abstract
RNA-protein interaction (RPI) is crucial to the life processes of diverse organisms. Various researchers have identified RPI through long-term and high-cost biological experiments. Although numerous machine learning and deep learning-based methods for predicting RPI currently exist, their robustness and generalizability have significant room for improvement. This study proposes LPI-MFF, an RPI prediction model based on multi-source information fusion, to address these issues. LPI-MFF employed protein-protein interaction features, sequence features, secondary structure features, and physical and chemical properties as the information sources with the corresponding coding scheme, followed by the random forest algorithm for feature screening. Finally, all information was combined and a classification method based on convolutional neural networks was used. The experimental results of fivefold cross-validation demonstrated that the accuracy of LPI-MFF on RPI1807 and NPInter was 97.60% and 97.67%, respectively. In addition, the accuracy rate on the independent test set RPI1168 was 84.9%, and the accuracy rate on the Mus musculus dataset was 90.91%. Accordingly, LPI-MFF demonstrated greater robustness and generalization than other prevalent RPI prediction methods.
Affiliation(s)
- Ying Liang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Zhimin Avenue, Nanchang, China
- XingRui Yin
- College of Computer and Information Engineering, Jiangxi Agricultural University, Zhimin Avenue, Nanchang, China
- YangSen Zhang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Zhimin Avenue, Nanchang, China
- You Guo
- First Affiliated Hospital, Gannan Medical University, Medical College Road, Ganzhou, China
- YingLong Wang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Zhimin Avenue, Nanchang, China
7. Li S, Tian T, Zhang Z, Zou Z, Zhao D, Zeng J. PocketAnchor: Learning structure-based pocket representations for protein-ligand interaction prediction. Cell Syst 2023; 14:692-705.e6. [PMID: 37516103 DOI: 10.1016/j.cels.2023.05.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/07/2022] [Revised: 11/25/2022] [Accepted: 05/19/2023] [Indexed: 07/31/2023]
Abstract
Protein-ligand interactions are essential for cellular activities and drug discovery processes. Appropriately and effectively representing protein features is of vital importance for developing computational approaches, especially data-driven methods, for predicting protein-ligand interactions. However, existing approaches may not fully investigate the features of the ligand-occupying regions in the protein pockets. Here, we design a structure-based protein representation method, named PocketAnchor, for capturing the local environmental and spatial features of protein pockets to facilitate protein-ligand interaction-related learning tasks. We define "anchors" as probe points reaching into the cavities and those located near the surface of proteins, and we design a specific message passing strategy for gathering local information from the atoms and surface neighboring these anchors. Comprehensive evaluation of our method demonstrated its successful applications in pocket detection and binding affinity prediction, which indicated that our anchor-based approach can provide effective protein feature representations for improving the prediction of protein-ligand interactions.
Affiliation(s)
- Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Ziting Zhang
- Department of Automation, Tsinghua University, Beijing 100084, China; MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
- Ziheng Zou
- Silexon AI Technology, Nanjing, Jiangsu Province 210023, China
- Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
8. Eicher JE, Brom JA, Wang S, Sheiko SS, Atkin JM, Pielak GJ. Secondary structure and stability of a gel-forming tardigrade desiccation-tolerance protein. Protein Sci 2022; 31:e4495. [PMID: 36335581 PMCID: PMC9679978 DOI: 10.1002/pro.4495] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/09/2022] [Revised: 10/26/2022] [Accepted: 11/02/2022] [Indexed: 11/08/2022]
Abstract
Protein-based pharmaceuticals are increasingly important, but their inherent instability necessitates a "cold chain" requiring costly refrigeration during production, shipment, and storage. Drying can overcome this problem, but most proteins need the addition of stabilizers, and some cannot be successfully formulated. Thus, there is a need for new, more effective protective molecules. Cytosolically abundant heat-soluble proteins from tardigrades are both fundamentally interesting and a promising source of inspiration; these disordered, monodisperse polymers form hydrogels whose structure may protect client proteins during drying. We used attenuated total reflectance Fourier transform infrared spectroscopy, differential scanning calorimetry, and small-amplitude oscillatory shear rheometry to characterize gelation. A 5% (wt/vol) gel has a strength comparable with human skin, and melts cooperatively and reversibly near body temperature with an enthalpy comparable with globular proteins. We suggest that the dilute protein forms α-helical coiled coils and that increasing its concentration drives gelation via intermolecular β-sheet formation.
Affiliation(s)
- Jonathan E. Eicher
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Julia A. Brom
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Shikun Wang
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Sergei S. Sheiko
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Joanna M. Atkin
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Gary J. Pielak
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
9. Johansson-Åkhe I, Wallner B. Improving peptide-protein docking with AlphaFold-Multimer using forced sampling. Frontiers in Bioinformatics 2022; 2:959160. [PMID: 36304330 PMCID: PMC9580857 DOI: 10.3389/fbinf.2022.959160] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 06/01/2022] [Accepted: 08/16/2022] [Indexed: 12/02/2022] Open
Abstract
Protein interactions are key in vital biological processes. In many cases, particularly in regulation, this interaction is between a protein and a shorter peptide fragment. Such peptides are often part of larger disordered regions in other proteins. The flexible nature of peptides enables the rapid yet specific regulation of important functions in cells, such as their life cycle. Consequently, knowledge of the molecular details of peptide-protein interactions is crucial for understanding and altering their function, and many specialized computational methods have been developed to study them. The recent release of AlphaFold and AlphaFold-Multimer has led to a leap in accuracy for the computational modeling of proteins. In this study, the ability of AlphaFold to predict which peptides and proteins interact, as well as its accuracy in modeling the resulting interaction complexes, are benchmarked against established methods. We find that AlphaFold-Multimer predicts the structure of peptide-protein complexes with acceptable or better quality (DockQ ≥0.23) for 66 of the 112 complexes investigated, 25 of which were high quality (DockQ ≥0.8). This is a massive improvement on previous methods with 23 or 47 acceptable models and only four or eight high quality models, when using energy-based docking or interaction templates, respectively. In addition, AlphaFold-Multimer can be used to predict whether a peptide and a protein will interact. At 1% false positives, AlphaFold-Multimer found 26% of the possible interactions with a precision of 85%, the best among the methods benchmarked. However, the most interesting result is the possibility of improving AlphaFold by randomly perturbing the neural network weights to force the network to sample more of the conformational space. This increases the number of acceptable models from 66 to 75 and improves the median DockQ from 0.47 to 0.55 (17%) for first ranked models. The best possible DockQ improves from 0.58 to 0.72 (24%), indicating that selecting the best possible model is still a challenge. This scheme of generating more structures with AlphaFold should be generally useful for many applications involving multiple states, flexible regions, and disorder.
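The quality bands and percentage gains quoted in this abstract can be reproduced with a short sketch. It uses only the two cutoffs stated above (acceptable or better at DockQ ≥0.23, high quality at DockQ ≥0.8); the function names and band labels are hypothetical, and any intermediate band that the DockQ scale may define is deliberately not separated out because the abstract gives no cutoff for it.

```python
def dockq_band(score):
    """Label a DockQ score using only the two cutoffs quoted in the abstract."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("DockQ scores lie between 0 and 1")
    if score >= 0.8:
        return "high"
    if score >= 0.23:
        return "acceptable"
    return "below acceptable"


def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


print(dockq_band(0.55))                   # acceptable
print(round(relative_gain(0.47, 0.55)))   # 17
print(round(relative_gain(0.58, 0.72)))   # 24
```

The two printed gains match the 17% and 24% figures reported for the median and best possible DockQ, respectively.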
Affiliation(s)
- Björn Wallner
- Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, Linköping, Sweden