1
Gillani M, Pollastri G. Impact of Alignments on the Accuracy of Protein Subcellular Localization Predictions. Proteins 2025; 93:745-759. [PMID: 39575640 PMCID: PMC11809130 DOI: 10.1002/prot.26767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 10/01/2024] [Accepted: 11/01/2024] [Indexed: 02/11/2025]
Abstract
Alignments in bioinformatics refer to the arrangement of sequences to identify regions of similarity that can indicate functional, structural, or evolutionary relationships. They are crucial for bioinformaticians as they enable accurate predictions and analyses in various applications, including protein subcellular localization. The predictive model used in this article is based on a deep convolutional architecture. We tested configurations of Deep N-to-1 convolutional neural networks of various depths and widths to identify the better-performing values across a diverse set of eight classes. For the assessment without alignments, sequences are encoded using one-hot encoding, which converts each character into a numerical representation; this is straightforward for non-numerical data and useful for machine learning models. For the assessment with alignments, multiple sequence alignments (MSAs) are created using PSI-BLAST, capturing evolutionary information by calculating frequencies of residues and gaps. The average difference in peak performance between models with alignments and without alignments is approximately 15.82%. The average difference in the highest accuracy achieved with alignments compared with without alignments is approximately 15.16%. Thus, extensive experimentation indicates that higher alignment accuracy implies a more reliable model and improved prediction accuracy, which can be trusted to deliver consistent performance across different layers and classes of subcellular localization predictions. This research provides valuable insights into prediction accuracies with and without alignments, offering bioinformaticians an effective tool for better understanding while potentially reducing the need for extensive experimental validations. The source code and datasets are available at http://distilldeep.ucd.ie/SCL8/.
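The two input encodings mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the 20-letter alphabet ordering, the function names `one_hot` and `msa_profile`, and the choice to leave unknown residues as all-zero rows are assumptions made here for illustration, and the alignment is taken as a plain list of equal-length strings rather than real PSI-BLAST output.

```python
# Illustrative sketch only; names and alphabet ordering are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        if residue in index:  # unknown residues stay all-zero (assumption)
            row[index[residue]] = 1.0
        encoded.append(row)
    return encoded


def msa_profile(alignment):
    """Per-column frequencies of the 20 residues plus the gap symbol '-'."""
    symbols = AMINO_ACIDS + "-"
    index = {s: i for i, s in enumerate(symbols)}
    profile = []
    for col in range(len(alignment[0])):
        counts = [0.0] * len(symbols)
        for row in alignment:
            symbol = row[col]
            if symbol in index:
                counts[index[symbol]] += 1.0
        total = sum(counts)
        # Normalise counts to frequencies; an empty column stays all-zero.
        profile.append([c / total if total else 0.0 for c in counts])
    return profile


if __name__ == "__main__":
    print(one_hot("AC")[0][:3])
    print(msa_profile(["AC-", "AD-", "AC-"])[1][:3])
```

The one-hot input carries only the identity of each residue, whereas each profile row summarises what the aligned sequences show at that position, which is the evolutionary information the abstract refers to.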
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), Dublin, Ireland
2
Basmenj ER, Pajhouh SR, Ebrahimi Fallah A, naijian R, Rahimi E, Atighy H, Ghiabi S, Ghiabi S. Computational epitope-based vaccine design with bioinformatics approach; a review. Heliyon 2025; 11:e41714. [PMID: 39866399 PMCID: PMC11761309 DOI: 10.1016/j.heliyon.2025.e41714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2024] [Accepted: 01/03/2025] [Indexed: 01/28/2025] Open
Abstract
The significance of vaccine development has gained heightened importance in light of the COVID-19 pandemic. In such critical circumstances, global citizens anticipate researchers in this field to swiftly identify a vaccine candidate to combat the pandemic's root cause. It is widely recognized that the vaccine design process is traditionally both time-consuming and costly. However, a specialized subfield within bioinformatics, known as "multi-epitope vaccine design" or "reverse vaccinology," has significantly decreased the time and costs of the vaccine design process. The methodology reverses itself in this subfield and finds a potential vaccine candidate by analyzing the pathogen's genome. Leveraging the tools available in this domain, we strive to pinpoint the most suitable antigen for crafting a vaccine against our target. Once the optimal antigen is identified, the next step involves uncovering epitopes within this antigen. The immune system recognizes particular areas of an antigen as epitopes. By characterizing these crucial segments, we gain the opportunity to design a vaccine centered around these epitopes. Subsequently, after identifying and assembling the vital epitopes with the assistance of linkers and adjuvants, our vaccine candidate can be formulated. Finally, employing computational techniques, we can thoroughly evaluate the designed vaccine. This review article comprehensively covers the entire multi-epitope vaccine development process, starting from obtaining the pathogen's genome to identifying the relevant vaccine candidate and concluding with an evaluation. Furthermore, we will delve into the essential tools needed at each stage, comparing and introducing them.
Affiliation(s)
- Rafe naijian
- Student Research Committee, Faculty of Pharmacy, Mazandaran University of Medical Sciences, Sari, Iran
- Elmira Rahimi
- Department of Biology, Central Tehran Branch, Islamic Azad University, Tehran, Iran
- Hossein Atighy
- School of Pharmacy, Centro Escolar University, Manila, Philippines
- Shadan Ghiabi
- Faculty of Veterinary Medicine, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Shamim Ghiabi
- Tehran Azad University of Medical Sciences, Faculty of Pharmaceutical Sciences, Iran
3
Pitarch B, Pazos F. Deep Learning Approaches for the Prediction of Protein Functional Sites. Molecules 2025; 30:214. [PMID: 39860084 PMCID: PMC11767512 DOI: 10.3390/molecules30020214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2024] [Revised: 12/20/2024] [Accepted: 01/01/2025] [Indexed: 01/27/2025] Open
Abstract
Knowing which residues of a protein are important for its function is of paramount importance for understanding the molecular basis of this function and devising ways of modifying it for medical or biotechnological applications. Due to the difficulty in detecting these residues experimentally, prediction methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. Deep learning approaches are especially well suited for this task due to the large amounts of protein sequences for training them, the trivial codification of this sequence data to feed into these systems, and the intrinsic sequential nature of the data that makes them suitable for language models. As a consequence, deep learning-based approaches are being applied to the prediction of different types of functional sites and regions in proteins. This review aims to give an overview of the current landscape of methodologies so that interested users can have an idea of which kind of approaches are available for their proteins of interest. We also try to give an idea of how these systems work, as well as explain their limitations and high dependence on the training set so that users are aware of the quality of expected results.
Affiliation(s)
- Florencio Pazos
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), 28049 Madrid, Spain
4
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins do not have experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its functionalities, the functioning of the cell, and other possible interactions with proteins. Fast, reliable, and accurate predictors provide platforms to harness the abundance of sequence data to predict subcellular locations accordingly. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by prediction tools taxonomies that highlight their relevant area and examples for uncomplicated categorization and ease of understandability. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools to facilitate readers and can be considered a quick guide for researchers to identify and explore the recent literature advancements.
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
5
Taha K. Employing Machine Learning Techniques to Detect Protein Function: A Survey, Experimental, and Empirical Evaluations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1965-1986. [PMID: 39008392 DOI: 10.1109/tcbb.2024.3427381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
This review article delves deeply into the various machine learning (ML) methods and algorithms employed in discerning protein functions. Each method discussed is assessed for its efficacy, limitations, potential improvements, and future prospects. We present an innovative hierarchical classification system that arranges algorithms into intricate categories and unique techniques. This taxonomy is based on a tri-level hierarchy, starting with the methodology category and narrowing down to specific techniques. Such a framework allows for a structured and comprehensive classification of algorithms, assisting researchers in understanding the interrelationships among diverse algorithms and techniques. The study incorporates both empirical and experimental evaluations to differentiate between the techniques. The empirical evaluation ranks the techniques based on four criteria. The experimental assessments rank: (1) individual techniques under the same methodology sub-category, (2) different sub-categories within the same category, and (3) the broad categories themselves. Integrating the innovative methodological classification, empirical findings, and experimental assessments, the article offers a well-rounded understanding of ML strategies in protein function identification. The paper also explores techniques for multi-task and multi-label detection of protein functions, in addition to focusing on single-task methods. Moreover, the paper sheds light on the future avenues of ML in protein function determination.
6
Bai P, Li G, Luo J, Liang C. Deep learning model for protein multi-label subcellular localization and function prediction based on multi-task collaborative training. Brief Bioinform 2024; 25:bbae568. [PMID: 39489606 PMCID: PMC11531862 DOI: 10.1093/bib/bbae568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Revised: 09/24/2024] [Accepted: 10/22/2024] [Indexed: 11/05/2024] Open
Abstract
The functional study of proteins is a critical task in modern biology, playing a pivotal role in understanding the mechanisms of pathogenesis, developing new drugs, and discovering novel drug targets. However, existing computational models for subcellular localization face significant challenges, such as reliance on known Gene Ontology (GO) annotation databases or overlooking the relationship between GO annotations and subcellular localization. To address these issues, we propose DeepMTC, an end-to-end deep learning-based multi-task collaborative training model. DeepMTC integrates the interrelationship between subcellular localization and the functional annotation of proteins, leveraging multi-task collaborative training to eliminate dependence on known GO databases. This strategy gives DeepMTC a distinct advantage in predicting newly discovered proteins without prior functional annotations. First, DeepMTC leverages a pre-trained language model with high accuracy to obtain the 3D structure and sequence features of proteins. Additionally, it employs a graph transformer module to encode protein sequence features, addressing the problem of long-range dependencies in graph neural networks. Finally, DeepMTC uses a functional cross-attention mechanism to efficiently combine upstream learned functional features to perform the subcellular localization task. The experimental results demonstrate that DeepMTC outperforms state-of-the-art models in both protein function prediction and subcellular localization. Moreover, interpretability experiments revealed that DeepMTC can accurately identify the key residues and functional domains of proteins, confirming its superior performance. The code and dataset of DeepMTC are freely available at https://github.com/ghli16/DeepMTC.
Affiliation(s)
- Peihao Bai
- School of Information and Software Engineering, East China Jiaotong University, No. 808 Shuanggang East Road, Nanchang 330013, China
- Guanghui Li
- School of Information and Software Engineering, East China Jiaotong University, No. 808 Shuanggang East Road, Nanchang 330013, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, No. 2 Lushan Road, Changsha 410082, China
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, No. 1 University Road, Jinan 250358, China
- Shandong Key Laboratory of Biophysics, Dezhou University, No. 566 University Road, Dezhou 253023, China
7
Bhushan V, Nita-Lazar A. Recent Advancements in Subcellular Proteomics: Growing Impact of Organellar Protein Niches on the Understanding of Cell Biology. J Proteome Res 2024; 23:2700-2722. [PMID: 38451675 PMCID: PMC11296931 DOI: 10.1021/acs.jproteome.3c00839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
The mammalian cell is a complex entity, with membrane-bound and membrane-less organelles playing vital roles in regulating cellular homeostasis. Organellar protein niches drive discrete biological processes and cell functions, thus maintaining cell equilibrium. Cellular processes such as signaling, growth, proliferation, motility, and programmed cell death require dynamic protein movements between cell compartments. Aberrant protein localization is associated with a wide range of diseases. Therefore, analyzing the subcellular proteome of the cell can provide a comprehensive overview of cellular biology. With recent advancements in mass spectrometry, imaging technology, computational tools, and deep machine learning algorithms, studies pertaining to subcellular protein localization and their dynamic distributions are gaining momentum. These studies reveal changing interaction networks because of "moonlighting proteins" and serve as a discovery tool for disease network mechanisms. Consequently, this review aims to provide a comprehensive repository for recent advancements in subcellular proteomics subcontexting methods, challenges, and future perspectives for method developers. In summary, subcellular proteomics is crucial to the understanding of the fundamental cellular mechanisms and the associated diseases.
Affiliation(s)
- Vanya Bhushan
- Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
- Aleksandra Nita-Lazar
- Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
8
Gillani M, Pollastri G. SCLpred-ECL: Subcellular Localization Prediction by Deep N-to-1 Convolutional Neural Networks. Int J Mol Sci 2024; 25:5440. [PMID: 38791479 PMCID: PMC11121631 DOI: 10.3390/ijms25105440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 05/09/2024] [Accepted: 05/11/2024] [Indexed: 05/26/2024] Open
Abstract
The subcellular location of a protein provides valuable insights to bioinformaticians in terms of drug designs and discovery, genomics, and various other aspects of medical research. Experimental methods for protein subcellular localization determination are time-consuming and expensive, whereas computational methods, if accurate, would represent a much more efficient alternative. This article introduces an ab initio protein subcellular localization predictor based on an ensemble of Deep N-to-1 Convolutional Neural Networks. Our predictor is trained and tested on strict redundancy-reduced datasets and achieves 63% accuracy for the diverse number of classes. This predictor is a step towards bridging the gap between a protein sequence and the protein's function. It can potentially provide information about protein-protein interaction to facilitate drug design and processes like vaccine production that are essential to disease prevention.
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), D04 V1W8 Dublin, Ireland
9
Xiao H, Zou Y, Wang J, Wan S. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules 2024; 14:409. [PMID: 38672426 PMCID: PMC11048326 DOI: 10.3390/biom14040409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024] Open
Abstract
Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer's disease. Knowing where a target protein resides within a cell will give insights into tailored drug design for a disease. As the gold validation standard, the conventional wet lab uses fluorescent microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags for protein subcellular location identification. However, the booming era of proteomics and high-throughput sequencing generates tons of newly discovered proteins, making protein subcellular localization by wet-lab experiments a mission impossible. To tackle this concern, in the past decades, artificial intelligence (AI) and machine learning (ML), especially deep learning methods, have made significant progress in this research area. In this article, we review the latest advances in AI-based method development in three typical types of approaches, including sequence-based, knowledge-based, and image-based methods. We also elaborately discuss existing challenges and future directions in AI-based method development in this research field.
Affiliation(s)
- Hanyu Xiao
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Yijin Zou
- College of Veterinary Medicine, China Agricultural University, Beijing 100193, China
- Jieqiong Wang
- Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Shibiao Wan
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
10
Sharma L, Deepak A, Ranjan A, Krishnasamy G. A CNN-CBAM-BIGRU model for protein function prediction. Stat Appl Genet Mol Biol 2024; 23:sagmb-2024-0004. [PMID: 38943434 DOI: 10.1515/sagmb-2024-0004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 06/07/2024] [Indexed: 07/01/2024]
Abstract
Understanding a protein's function based solely on its amino acid sequence is a crucial but intricate task in bioinformatics. Traditionally, this challenge has proven difficult. However, recent years have witnessed the rise of deep learning as a powerful tool, achieving significant success in protein function prediction. Their strength lies in their ability to automatically learn informative features from protein sequences, which can then be used to predict the protein's function. This study builds upon these advancements by proposing a novel model: CNN-CBAM+BiGRU. It incorporates a Convolutional Block Attention Module (CBAM) alongside BiGRUs. CBAM acts as a spotlight, guiding the CNN to focus on the most informative parts of the protein data, leading to more accurate feature extraction. BiGRUs, a type of Recurrent Neural Network (RNN), excel at capturing long-range dependencies within the protein sequence, which are essential for accurate function prediction. The proposed model integrates the strengths of both CNN-CBAM and BiGRU. This study's findings, validated through experimentation, showcase the effectiveness of this combined approach. For the human dataset, the suggested method outperforms the CNN-BIGRU+ATT model by +1.0 % for cellular components, +1.1 % for molecular functions, and +0.5 % for biological processes. For the yeast dataset, the suggested method outperforms the CNN-BIGRU+ATT model by +2.4 % for the cellular component, +1.2 % for molecular functions, and +0.6 % for biological processes.
Affiliation(s)
- Lavkush Sharma
- Department of Computer Science and Engineering, National Institute of Technology Patna, Patna, Bihar, India
- Akshay Deepak
- Department of Computer Science and Engineering, National Institute of Technology Patna, Patna, Bihar, India
- Ashish Ranjan
- Department of Computer Science and Engineering, C.V. Raman Global University, Bhubaneswar, Odisha, India
11
Savojardo C, Martelli PL, Casadio R. Finding functional motifs in protein sequences with deep learning and natural language models. Curr Opin Struct Biol 2023; 81:102641. [PMID: 37385080 DOI: 10.1016/j.sbi.2023.102641] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/17/2023] [Revised: 04/17/2023] [Accepted: 05/24/2023] [Indexed: 07/01/2023]
Abstract
Recently, the prediction of structural/functional motifs in protein sequences has taken advantage of powerful machine learning-based approaches. Protein encoding adopts protein language models, surpassing standard procedures. Different combinations of machine learning and encoding schemas are available for predicting different structural/functional motifs. Particularly interesting is the adoption of protein language models to encode proteins in addition to evolutionary information and physicochemical parameters. A thorough analysis of recent predictors developed for annotating transmembrane regions, sorting signals, lipidation and phosphorylation sites allows us to investigate the state of the art, focusing on the relevance of protein language models for the different tasks. This highlights that more experimental data are necessary to exploit available powerful machine learning methods.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
- Rita Casadio
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
12
Yan TC, Yue ZX, Xu HQ, Liu YH, Hong YF, Chen GX, Tao L, Xie T. A systematic review of state-of-the-art strategies for machine learning-based protein function prediction. Comput Biol Med 2023; 154:106446. [PMID: 36680931 DOI: 10.1016/j.compbiomed.2022.106446] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2022] [Revised: 12/07/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]
Abstract
New drug discovery is inseparable from the discovery of drug targets, and the vast majority of the known targets are proteins. At the same time, proteins are essential structural and functional elements of living cells necessary for the maintenance of all forms of life. Therefore, protein functions have become the focus of many pharmacological and biological studies. Traditional experimental techniques are no longer adequate for rapidly growing annotation of protein sequences, and approaches to protein function prediction using computational methods have emerged and flourished. A significant trend has been to use machine learning to achieve this goal. In this review, approaches to protein function prediction based on the sequence, structure, protein-protein interaction (PPI) networks, and fusion of multi-information sources are discussed. The current status of research on protein function prediction using machine learning is considered, and existing challenges and prominent breakthroughs are discussed to provide ideas and methods for future studies.
Affiliation(s)
- Tian-Ci Yan
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Zi-Xuan Yue
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Hong-Quan Xu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yu-Hong Liu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Yan-Feng Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Gong-Xing Chen
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
- Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicines, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
13
Nakai K, Wei L. Recent Advances in the Prediction of Subcellular Localization of Proteins and Related Topics. Frontiers in Bioinformatics 2022; 2:910531. [PMID: 36304291 PMCID: PMC9580943 DOI: 10.3389/fbinf.2022.910531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Accepted: 04/25/2022] [Indexed: 11/13/2022] Open
Abstract
Prediction of subcellular localization of proteins from their amino acid sequences has a long history in bioinformatics and is still actively developing, incorporating the latest advances in machine learning and proteomics. Notably, deep learning-based methods for natural language processing have made great contributions. Here, we review recent advances in the field as well as its related fields, such as subcellular proteomics and the prediction/recognition of subcellular localization from image data.
Affiliation(s)
- Kenta Nakai
- Institute of Medical Science, The University of Tokyo, Minato-Ku, Japan
- Leyi Wei
- School of Software, Shandong University, Jinan, China
14
Ranjan P, Das P. Understanding the impact of missense mutations on the structure and function of the EDA gene in X-linked hypohidrotic ectodermal dysplasia: A bioinformatics approach. J Cell Biochem 2021; 123:431-449. [PMID: 34817077 DOI: 10.1002/jcb.30186] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/15/2021] [Revised: 11/05/2021] [Accepted: 11/10/2021] [Indexed: 12/19/2022]
Abstract
X-linked hypohidrotic dysplasia (XLHED), caused by mutations in the EDA gene, is a rare genetic disease that affects the development and function of the teeth, hair, nails, and sweat glands. The structural and functional consequences of ectodysplasin-A (EDA) mutations on protein phenotype, stability, and posttranslational modifications (PTMs) have not been well investigated. The present investigation involves five missense mutations that cause XLHED (L56P, R155C, P220L, V251M, and V322A) in different domains of EDA (TM, furin, collagen, and tumor necrosis factor [TNF]) from previously published papers. The deleterious nature of EDA mutant variants was identified using several computational algorithm tools. The point mutations induce major drifts in the structural flexibility of EDA mutant variants and have a negative impact on their stability, according to the 3D protein modeling tool assay. Using the molecular docking technique, EDA/EDA variants were docked to 10 EDA interacting partners, retrieved from the STRING database. We found a novel biomarker, CD68, by molecular docking analysis, which suggested that all five EDA variants had lower affinity for EDAR, EDA2R, and CD68, implying that they would affect embryonic signaling between the ectodermal and mesodermal cell layers. In silico analyses such as gene ontology, subcellular localization, protein-protein interaction, and PTM investigations indicate that major functional alterations would occur in EDA variants. According to molecular simulations, EDA variants influence the structural conformation, compactness, stiffness, and function of the EDA protein. Further studies on cell line and animal models might be useful in determining their specific roles in functional annotations.
Affiliation(s)
- Prashant Ranjan
- Centre for Genetic Disorders, Institute of Science, Banaras Hindu University, Varanasi, Uttar Pradesh, India
- Parimal Das
- Centre for Genetic Disorders, Institute of Science, Banaras Hindu University, Varanasi, Uttar Pradesh, India
15
Timmons PB, Hewage CM. APPTEST is a novel protocol for the automatic prediction of peptide tertiary structures. Brief Bioinform 2021; 22:bbab308. [PMID: 34396417 PMCID: PMC8575040 DOI: 10.1093/bib/bbab308] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 06/01/2021] [Revised: 07/05/2021] [Accepted: 07/16/2021] [Indexed: 01/29/2023] Open
Abstract
Good knowledge of a peptide's tertiary structure is important for understanding its function and its interactions with its biological targets. APPTEST is a novel computational protocol that employs a neural network architecture and simulated annealing methods for the prediction of peptide tertiary structure from the primary sequence. APPTEST works for both linear and cyclic peptides of 5-40 natural amino acids. APPTEST is computationally efficient, returning predicted structures within a number of minutes. APPTEST performance was evaluated on a set of 356 test peptides; the best structure predicted for each peptide deviated by an average of 1.9Å from its experimentally determined backbone conformation, and a native or near-native structure was predicted for 97% of the target sequences. A comparison of APPTEST performance with PEP-FOLD, PEPstrMOD and PepLook across benchmark datasets of short, long and cyclic peptides shows that on average APPTEST produces structures more native than the existing methods in all three categories. This innovative, cutting-edge peptide structure prediction method is available as an online web server at https://research.timmons.eu/apptest, facilitating in silico study and design of peptides by the wider research community.
Affiliation(s)
- Patrick Brendan Timmons
  - UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
- Chandralal M Hewage
  - UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
|
16
|
Timmons PB, Hewage CM. ENNAVIA is a novel method which employs neural networks for antiviral and anti-coronavirus activity prediction for therapeutic peptides. Brief Bioinform 2021; 22:bbab258. [PMID: 34297817 PMCID: PMC8575049 DOI: 10.1093/bib/bbab258] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 04/02/2021] [Revised: 06/09/2021] [Accepted: 06/18/2021] [Indexed: 11/14/2022] Open
Abstract
Viruses represent one of the greatest threats to human health, necessitating the development of new antiviral drug candidates. Antiviral peptides often possess excellent biological activity and a favourable toxicity profile, and therefore represent a promising field of novel antiviral drugs. As the quantity of sequencing data grows annually, the development of an accurate in silico method for the prediction of peptide antiviral activities is important. This study leverages advances in deep learning and cheminformatics to produce a novel sequence-based deep neural network classifier for the prediction of antiviral peptide activity. The method outperforms the existent best-in-class, with an external test accuracy of 93.9%, Matthews correlation coefficient of 0.87 and an Area Under the Curve of 0.93 on the dataset of experimentally validated peptide activities. This cutting-edge classifier is available as an online web server at https://research.timmons.eu/ennavia, facilitating in silico screening and design of peptide antiviral drugs by the wider research community.
Affiliation(s)
- Patrick Brendan Timmons
  - UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
- Chandralal M Hewage
  - UCD School of Biomolecular and Biomedical Science, UCD Centre for Synthesis and Chemical Biology, UCD Conway Institute, University College Dublin, Dublin 4, Ireland
|
17
|
Jiang Y, Wang D, Wang W, Xu D. Computational methods for protein localization prediction. Comput Struct Biotechnol J 2021; 19:5834-5844. [PMID: 34765098 PMCID: PMC8564054 DOI: 10.1016/j.csbj.2021.10.023] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/31/2021] [Revised: 10/12/2021] [Accepted: 10/13/2021] [Indexed: 12/16/2022] Open
Abstract
The accurate annotation of protein localization is crucial in understanding protein function in tandem with a broad range of applications such as pathological analysis and drug design. Since most proteins do not have experimentally-determined localization information, the computational prediction of protein localization has been an active research area for more than two decades. In particular, recent machine-learning advancements have fueled the development of new methods in protein localization prediction. In this review paper, we first categorize the main features and algorithms used for protein localization prediction. Then, we summarize a list of protein localization prediction tools in terms of their coverage, characteristics, and accessibility to help users find suitable tools based on their needs. Next, we evaluate some of these tools on a benchmark dataset. Finally, we provide an outlook on the future exploration of protein localization methods.
Affiliation(s)
- Yuexu Jiang
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Duolin Wang
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Weiwei Wang
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Dong Xu
  - Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
|
18
|
Kaleel M, Ellinger L, Lalor C, Pollastri G, Mooney C. SCLpred-MEM: Subcellular localization prediction of membrane proteins by deep N-to-1 convolutional neural networks. Proteins 2021; 89:1233-1239. [PMID: 33983651 DOI: 10.1002/prot.26144] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/15/2020] [Revised: 02/22/2021] [Accepted: 05/06/2021] [Indexed: 11/11/2022]
Abstract
The knowledge of the subcellular location of a protein is a valuable source of information in genomics, drug design, and various other theoretical and analytical perspectives of bioinformatics. Due to the expensive and time-consuming nature of experimental methods of protein subcellular location determination, various computational methods have been developed for subcellular localization prediction. We introduce "SCLpred-MEM," an ab initio protein subcellular localization predictor, powered by an ensemble of Deep N-to-1 Convolutional Neural Networks (N1-NN) trained and tested on strict redundancy reduced datasets. SCLpred-MEM is available as a web-server predicting query proteins into two classes, membrane and non-membrane proteins. SCLpred-MEM achieves a Matthews correlation coefficient of 0.52 on a strictly homology-reduced independent test set and 0.62 on a less strict homology reduced independent test set, surpassing or matching other state-of-the-art subcellular localization predictors.
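The Matthews correlation coefficient quoted above is computed from the four cells of a binary confusion matrix. The following is a minimal sketch of that calculation for a two-class (membrane versus non-membrane) setting; the function name and the example counts are illustrative assumptions and are not taken from the SCLpred-MEM paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns a value in [-1, 1]; 1 is perfect agreement, 0 is chance level.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not results from the paper).
example_mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(round(example_mcc, 3))
```

Unlike plain accuracy, this score stays informative when the two classes differ greatly in size, which is one reason it is often reported for localization predictors.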
Affiliation(s)
- Manaz Kaleel
  - School of Computer Science, University College Dublin, Dublin, Ireland
  - UCD Institute for Discovery, University College Dublin, Dublin, Ireland
- Liam Ellinger
  - Whitacre College of Engineering, Texas Tech University, Lubbock, Texas, USA
- Clodagh Lalor
  - School of Computer Science, University College Dublin, Dublin, Ireland
- Gianluca Pollastri
  - School of Computer Science, University College Dublin, Dublin, Ireland
  - UCD Institute for Discovery, University College Dublin, Dublin, Ireland
- Catherine Mooney
  - School of Computer Science, University College Dublin, Dublin, Ireland
|
19
|
Sudhakar P, Machiels K, Verstockt B, Korcsmaros T, Vermeire S. Computational Biology and Machine Learning Approaches to Understand Mechanistic Microbiome-Host Interactions. Front Microbiol 2021; 12:618856. [PMID: 34046017 PMCID: PMC8148342 DOI: 10.3389/fmicb.2021.618856] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/18/2020] [Accepted: 03/19/2021] [Indexed: 12/11/2022] Open
Abstract
The microbiome, by virtue of its interactions with the host, is implicated in various host functions including its influence on nutrition and homeostasis. Many chronic diseases such as diabetes, cancer, and inflammatory bowel diseases are characterized by a disruption of microbial communities in at least one biological niche/organ system. Various molecular mechanisms between microbial and host components such as proteins, RNAs, and metabolites have recently been identified, thus filling many gaps in our understanding of how the microbiome modulates host processes. Concurrently, high-throughput technologies have enabled the profiling of heterogeneous datasets capturing community-level changes in the microbiome as well as the host responses. However, due to limitations in parallel sampling and analytical procedures, big gaps still exist in terms of how the microbiome mechanistically influences host functions at a system and community level. In the past decade, computational biology and machine learning methodologies have been developed with the aim of filling the existing gaps. Due to the agnostic nature of the tools, they have been applied in diverse disease contexts to analyze and infer the interactions between the microbiome and host molecular components. Some of these approaches allow the identification and analysis of affected downstream host processes. Most of the tools statistically or mechanistically integrate different types of -omic and meta-omic datasets, followed by functional/biological interpretation. In this review, we provide an overview of the landscape of computational approaches for investigating mechanistic interactions between individual microbes/microbiome and the host and the opportunities for basic and clinical research. These could include but are not limited to the development of activity- and mechanism-based biomarkers, uncovering mechanisms for therapeutic interventions and generating integrated signatures to stratify patients.
Affiliation(s)
- Padhmanand Sudhakar
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
  - Earlham Institute, Norwich, United Kingdom
  - Quadram Institute Bioscience, Norwich, United Kingdom
- Kathleen Machiels
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
- Bram Verstockt
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
  - Department of Gastroenterology and Hepatology, University Hospitals Leuven, KU Leuven, Leuven, Belgium
- Tamas Korcsmaros
  - Earlham Institute, Norwich, United Kingdom
  - Quadram Institute Bioscience, Norwich, United Kingdom
- Séverine Vermeire
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
  - Department of Gastroenterology and Hepatology, University Hospitals Leuven, KU Leuven, Leuven, Belgium
|
20
|
Abstract
The elucidation of the subcellular localization of proteins is very important in order to deeply understand their functions. In fact, protein activities are strictly correlated to the cellular compartment and microenvironment in which they are present. In recent years, several effective and reliable proteomics techniques and computational methods have been developed and implemented in order to identify protein subcellular localization. This process is often time-consuming and expensive, but recent technological and bioinformatics progress has allowed the development of more accurate and simpler workflows to determine the localization, interactions, and functions of proteins. In the following chapter, a brief introduction on the importance of knowing the subcellular localization of proteins will be presented. Then, sample preparation protocols, proteomic methods, data analysis strategies, and software for the prediction of protein localization will be presented and discussed. Finally, the more recent and advanced spatial proteomics techniques will be shown.
Affiliation(s)
- Elettra Barberis
  - Department of Translational Medicine, University of Piemonte Orientale, Novara, Italy
  - Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
- Emilio Marengo
  - Department of Sciences and Technological Innovation, University of Piemonte Orientale, Alessandria, Italy
  - Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
- Marcello Manfredi
  - Department of Translational Medicine, University of Piemonte Orientale, Novara, Italy
  - Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
|
21
|
Oh J, Wilson M, Hill K, Leftley N, Hodgman C, Bennett MJ, Swarup R. Arabidopsis antibody resources for functional studies in plants. Sci Rep 2020; 10:21945. [PMID: 33319797 PMCID: PMC7738516 DOI: 10.1038/s41598-020-78689-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/31/2020] [Accepted: 11/20/2020] [Indexed: 11/29/2022] Open
Abstract
Here we report the creation of a unique and very valuable resource for the plant science community worldwide. In this era of post-genomics and modelling of multi-cellular systems using an integrative systems biology approach, better understanding of protein localization at subcellular, cellular and tissue levels is likely to result in better understanding of protein function and role in cell and tissue dynamics, protein–protein interactions and protein regulatory networks. We have raised 94 antibodies against key Arabidopsis root proteins, using either small peptides or recombinant proteins. The success rate with the peptide antibodies was very low. We show that affinity purification of antibodies massively improved the detection rate. Of 70 protein antibodies, 38 (55%) could detect a signal with high confidence, and 22 of these antibodies are of immunocytochemistry grade. The targets include key proteins involved in hormone synthesis, transport and perception, membrane trafficking related proteins and several subcellular marker proteins. These antibodies are available from the Nottingham Arabidopsis Stock Centre.
Affiliation(s)
- Jaesung Oh
  - School of Biosciences and Centre for Plant Integrative Biology, University of Nottingham, Nottingham, UK
  - Plasma Technology Research Center, National Fusion Research Institute, Gunsan, Jeollabuk-do, 573-540, Republic of Korea
- Michael Wilson
  - School of Biosciences and Centre for Plant Integrative Biology, University of Nottingham, Nottingham, UK
- Kristine Hill
  - School of Biosciences and Centre for Plant Integrative Biology, University of Nottingham, Nottingham, UK
- Nicola Leftley
  - School of Biosciences and Centre for Plant Integrative Biology, University of Nottingham, Nottingham, UK
- Charlie Hodgman
  - School of Biosciences and Centre for Plant Integrative Biology, University of Nottingham, Nottingham, UK
- Malcolm J Bennett
  - School of Biosciences and Centre for Plant Integrative Biology, University of Nottingham, Nottingham, UK
- Ranjan Swarup
  - School of Biosciences and Centre for Plant Integrative Biology, University of Nottingham, Nottingham, UK
|
22
|
Kumar R, Dhanda SK. Bird Eye View of Protein Subcellular Localization Prediction. Life (Basel) 2020; 10:E347. [PMID: 33327400 PMCID: PMC7764902 DOI: 10.3390/life10120347] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/24/2020] [Revised: 12/11/2020] [Accepted: 12/11/2020] [Indexed: 12/12/2022] Open
Abstract
Proteins are made up of long chains of amino acids and perform a variety of functions in different organisms. The activity of proteins is determined by the nucleotide sequence of their genes and by their 3D structure. In addition, proteins must be targeted to their specific locations or compartments to perform their functions. The challenge of computational prediction of subcellular localization of proteins is addressed by various in silico methods. In this review, we survey the progress in this field and offer a bird's-eye view consisting of a comprehensive listing of tools, types of input features explored, machine learning approaches employed, and evaluation metrics applied. We hope the review will be useful for researchers working in the field of protein localization prediction.
Affiliation(s)
- Ravindra Kumar
  - Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, NIH, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Sandeep Kumar Dhanda
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
|
23
|
DeepPred-SubMito: A Novel Submitochondrial Localization Predictor Based on Multi-Channel Convolutional Neural Network and Dataset Balancing Treatment. Int J Mol Sci 2020; 21:ijms21165710. [PMID: 32784927 PMCID: PMC7460811 DOI: 10.3390/ijms21165710] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/05/2020] [Revised: 08/05/2020] [Accepted: 08/07/2020] [Indexed: 12/18/2022] Open
Abstract
Mitochondrial proteins are physiologically active in different compartments, and their abnormal location will trigger the pathogenesis of human mitochondrial pathologies. Correctly identifying submitochondrial locations can provide information for disease pathogenesis and drug design. A mitochondrion has four submitochondrial compartments, the matrix, the outer membrane, the inner membrane, and the intermembrane space, but various existing studies ignored the intermembrane space. The majority of researchers used traditional machine learning methods for predicting mitochondrial protein localization. Those predictors required expert-level knowledge of biology to be encoded as features rather than allowing the underlying predictor to extract features through a data-driven procedure. Besides, few researchers have considered the imbalance in datasets. In this paper, we propose a novel end-to-end predictor employing deep neural networks, DeepPred-SubMito, for protein submitochondrial location prediction. First, we utilize random over-sampling to decrease the influence caused by unbalanced datasets. Next, we train a multi-channel bilayer convolutional neural network for multiple subsequences to learn high-level features. Third, the prediction result is outputted through the fully connected layer. The performance of the predictor is measured by 10-fold cross-validation and 5-fold cross-validation on the SM424-18 dataset and the SubMitoPred dataset, respectively. Experimental results show that the predictor outperforms state-of-the-art predictors. In addition, the prediction of results in the M983 dataset also confirmed its effectiveness in predicting submitochondrial locations.
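The first step described above, random over-sampling, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact procedure, so the sketch simply duplicates randomly chosen minority-class samples until every class matches the largest one. The function name, the toy sequences, and the labels are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all
    classes have as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        # Draw with replacement only the number of extra samples needed.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for x in items + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy, made-up sequences and compartment labels (not from any dataset).
seqs = ["MKT", "MLS", "MAA", "MGG", "MPP", "MQQ"]
labs = ["matrix", "matrix", "matrix", "matrix", "inner", "outer"]
xs, ys = random_oversample(seqs, labs)
print(Counter(ys))
```

In a cross-validation setting, over-sampling is normally applied to the training folds only, so that duplicated sequences do not leak into the held-out fold.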
|