1
Bhushan V, Nita-Lazar A. Recent Advancements in Subcellular Proteomics: Growing Impact of Organellar Protein Niches on the Understanding of Cell Biology. J Proteome Res 2024; 23:2700-2722. [PMID: 38451675 PMCID: PMC11296931 DOI: 10.1021/acs.jproteome.3c00839]
Abstract
The mammalian cell is a complex entity, with membrane-bound and membrane-less organelles playing vital roles in regulating cellular homeostasis. Organellar protein niches drive discrete biological processes and cell functions, thus maintaining cell equilibrium. Cellular processes such as signaling, growth, proliferation, motility, and programmed cell death require dynamic protein movements between cell compartments. Aberrant protein localization is associated with a wide range of diseases. Therefore, analyzing the subcellular proteome of the cell can provide a comprehensive overview of cellular biology. With recent advancements in mass spectrometry, imaging technology, computational tools, and deep machine learning algorithms, studies pertaining to subcellular protein localization and their dynamic distributions are gaining momentum. These studies reveal changing interaction networks because of "moonlighting proteins" and serve as a discovery tool for disease network mechanisms. Consequently, this review aims to provide a comprehensive repository for recent advancements in subcellular proteomics subcontexting methods, challenges, and future perspectives for method developers. In summary, subcellular proteomics is crucial to the understanding of the fundamental cellular mechanisms and the associated diseases.
Affiliation(s)
- Vanya Bhushan
- Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
- Aleksandra Nita-Lazar
- Functional Cellular Networks Section, Laboratory of Immune System Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
2
Wen JW, Zhang HL, Du PF. Vislocas: Vision transformers for identifying protein subcellular mis-localization signatures of different cancer subtypes from immunohistochemistry images. Comput Biol Med 2024; 174:108392. [PMID: 38608321 DOI: 10.1016/j.compbiomed.2024.108392] [Received: 02/08/2024] [Revised: 03/22/2024] [Accepted: 04/01/2024]
Abstract
Proteins must be sorted to specific subcellular compartments to perform their functions. Abnormal protein subcellular localizations are related to many diseases. Although many efforts have been made in predicting protein subcellular localization from various kinds of static information, including sequences, structures and interactions, such static information cannot predict protein mis-localization events in diseases. In contrast, IHC (immunohistochemistry) images, which have been widely applied in clinical diagnosis, contain information that can be used to find protein mis-localization events in disease states. In this study, we create the Vislocas method, which is capable of finding mis-localized proteins from IHC images as markers of cancer subtypes. By combining CNNs and vision transformer encoders, Vislocas can automatically extract image features at both global and local levels. Vislocas can be trained with full-sized IHC images from scratch. It is the first attempt to create an end-to-end IHC image-based protein subcellular location predictor. Vislocas achieved performance comparable to or better than state-of-the-art methods. We applied Vislocas to find significant protein mis-localization events in different subtypes of glioma, melanoma and skin cancer. The mis-localized proteins, which were found purely from IHC images by Vislocas, are consistent with clinical or experimental results reported in the literature. All code for Vislocas has been deposited in a GitHub repository (https://github.com/JingwenWen99/Vislocas). All datasets of Vislocas have been deposited in Zenodo (https://zenodo.org/records/10632698).
Affiliation(s)
- Jing-Wen Wen
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Han-Lin Zhang
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
- Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China.
3
Wang J, Zhou H, Wang Y, Xu M, Yu Y, Wang J, Liu Y. Prediction of submitochondrial proteins localization based on Gene Ontology. Comput Biol Med 2023; 167:107589. [PMID: 37883850 DOI: 10.1016/j.compbiomed.2023.107589] [Received: 08/10/2023] [Revised: 09/28/2023] [Accepted: 10/17/2023]
Abstract
Mitochondria, which are double-membrane-bound organelles commonly found in eukaryotic cells, play a fundamental role as sites of cellular energy production. Within mitochondria there are distinct submitochondrial compartments, and specific proteins associated with these compartments have been implicated in various human diseases. Therefore, understanding the precise localization of these submitochondrial proteins is of utmost importance. Such knowledge not only helps unravel their role in disease pathogenesis but also facilitates the development of therapeutic drugs and diagnostic methods. In this study, we proposed a novel method based on Gene Ontology (GO), called GO-Submito, to predict the localization of submitochondrial proteins. More specifically, GO-Submito fine-tuned pre-trained Bidirectional Encoder Representations from Transformers (BERT) models to encode GO annotations into vectors. Subsequently, a multi-head attention mechanism was employed to fuse these encoded vectors of GO annotations, enabling precise localization prediction. Through comprehensive evaluation, our results demonstrated that GO-Submito outperforms existing methods, offering a reliable and efficient tool for precisely localizing submitochondrial proteins.
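To make the fusion step in this abstract concrete, the following is a minimal single-head attention-pooling sketch in plain Python. It assumes each GO annotation has already been encoded as a fixed-length vector (GO-Submito does this with a fine-tuned BERT model); the vectors and the query here are made-up toy values, and this is an illustration of the general technique, not the authors' multi-head implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query: Vector, go_vectors: List[Vector]) -> Vector:
    """Fuse a variable number of GO-term vectors into one fixed-length vector.

    Single-head scaled dot-product attention with a (hypothetical) learned
    query q: w_i = softmax(q . v_i / sqrt(d)), output = sum_i w_i * v_i.
    Keys and values are the GO vectors themselves here; a real model would
    apply learned projections and run several heads in parallel.
    """
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, vec)) / math.sqrt(d) for vec in go_vectors]
    weights = softmax(scores)
    return [sum(w * vec[j] for w, vec in zip(weights, go_vectors)) for j in range(d)]


# Toy usage: three made-up 2-D "GO embeddings" and a made-up query.
go_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
fused = attention_pool(query, go_vecs)
print([round(x, 3) for x in fused])
```

Because the weights sum to one, the fused vector is a convex combination of the GO vectors, so proteins with different numbers of GO annotations still map to vectors of the same length.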
Affiliation(s)
- Jingyu Wang
- Department of Epidemiology, School of Public Health, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China.
- Haihang Zhou
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China.
- Yuxiang Wang
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China.
- Mengdie Xu
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China.
- Yun Yu
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China; Institute of Medical Informatics and Management, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 210029, Jiangsu, China.
- Junjie Wang
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China; Institute of Medical Informatics and Management, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 210029, Jiangsu, China.
- Yun Liu
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 211166, Jiangsu, China; Department of Information, the First Affiliated Hospital, Nanjing Medical University, No. 300 Guang Zhou Road, Nanjing, 210029, Jiangsu, China; Institute of Medical Informatics and Management, Nanjing Medical University, 101 Longmian Avenue, Nanjing, 210029, Jiangsu, China.
4
Cong H, Liu H, Cao Y, Chen Y, Liang C. Multiple Protein Subcellular Locations Prediction Based on Deep Convolutional Neural Networks with Self-Attention Mechanism. Interdiscip Sci 2022; 14:421-438. [PMID: 35066812 DOI: 10.1007/s12539-021-00496-7] [Received: 08/28/2021] [Revised: 12/06/2021] [Accepted: 12/13/2021]
Abstract
As an important research field in bioinformatics, protein subcellular location prediction is critical to reveal the protein functions and provide insightful information for disease diagnosis and drug development. Predicting protein subcellular locations remains a challenging task due to the difficulty of finding representative features and robust classifiers. Many feature fusion methods have been widely applied to tackle the above issues. However, they still suffer from accuracy loss due to feature redundancy. Furthermore, multiple protein subcellular locations prediction is more complicated since it is fundamentally a multi-label classification problem. The traditional binary classifiers or even multi-class classifiers cannot achieve satisfactory results. This paper proposes a novel method for protein subcellular location prediction with both single and multiple sites based on deep convolutional neural networks. Specifically, we first obtain the integrated features by simultaneously considering the pseudo amino acid, amino acid index distribution, and physicochemical property. We then adopt deep convolutional neural networks to extract high-dimensional features from the fused feature, removing the redundant preliminary features and gaining better representations of the raw sequences. Moreover, we use the self-attention mechanism and a customized loss function to ensure that the model is more inclined to positive data. In addition, we use random k-label sets to reduce the number of prediction labels. Meanwhile, we employ a hybrid strategy of over-sampling and under-sampling to tackle the data imbalance problem. We compare our model with three representative classification alternatives. The experiment results show that our model achieves the best performance in terms of accuracy, demonstrating the efficacy of the proposed model.
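The abstract mentions random k-labelsets (RAkEL) for the multi-label setting. The sketch below shows the usual RAkEL ensemble mechanics under stated assumptions: draw m distinct random k-subsets of the location labels, let one base model per subset predict which of its labels are present, and assign a label when its average vote across the subsets that contain it exceeds a threshold (0.5 is the common default). The base label-powerset classifiers are not implemented here; their predictions are passed in directly, and the compartment names are illustrative.

```python
import math
import random
from typing import Dict, List, Sequence, Set, Tuple


def sample_labelsets(labels: Sequence[str], k: int, m: int, seed: int = 0) -> List[Tuple[str, ...]]:
    """Draw m distinct random k-label subsets (the 'random k-labelsets')."""
    if m > math.comb(len(labels), k):
        raise ValueError("m exceeds the number of distinct k-subsets")
    rng = random.Random(seed)
    chosen: List[Tuple[str, ...]] = []
    seen: Set[Tuple[str, ...]] = set()
    while len(chosen) < m:
        subset = tuple(sorted(rng.sample(list(labels), k)))
        if subset not in seen:
            seen.add(subset)
            chosen.append(subset)
    return chosen


def rakel_vote(labelsets: Sequence[Tuple[str, ...]],
               subset_predictions: Sequence[Set[str]],
               threshold: float = 0.5) -> Set[str]:
    """Combine per-labelset predictions by averaging votes.

    subset_predictions[i] holds the labels (within labelsets[i]) that the
    i-th base model predicted as present. A label is kept when the fraction
    of its labelsets voting for it is strictly above the threshold.
    """
    votes: Dict[str, int] = {}
    counts: Dict[str, int] = {}
    for subset, predicted in zip(labelsets, subset_predictions):
        for label in subset:
            counts[label] = counts.get(label, 0) + 1
            if label in predicted:
                votes[label] = votes.get(label, 0) + 1
    return {label for label, c in counts.items() if votes.get(label, 0) / c > threshold}


# Toy usage with three hand-picked 2-labelsets and made-up base-model outputs.
labelsets = [("cytoplasm", "nucleus"), ("mitochondrion", "nucleus"), ("cytoplasm", "mitochondrion")]
preds = [{"nucleus"}, {"nucleus", "mitochondrion"}, {"cytoplasm"}]
print(rakel_vote(labelsets, preds))  # {'nucleus'}
```

Here "nucleus" wins both of its votes, while "mitochondrion" and "cytoplasm" each get exactly half, which does not clear the strict 0.5 threshold.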
Affiliation(s)
- Hanhan Cong
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Hong Liu
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Yi Cao
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
5
Jiang Y, Wang D, Wang W, Xu D. Computational methods for protein localization prediction. Comput Struct Biotechnol J 2021; 19:5834-5844. [PMID: 34765098 PMCID: PMC8564054 DOI: 10.1016/j.csbj.2021.10.023] [Received: 03/31/2021] [Revised: 10/12/2021] [Accepted: 10/13/2021]
Abstract
The accurate annotation of protein localization is crucial in understanding protein function in tandem with a broad range of applications such as pathological analysis and drug design. Since most proteins do not have experimentally-determined localization information, the computational prediction of protein localization has been an active research area for more than two decades. In particular, recent machine-learning advancements have fueled the development of new methods in protein localization prediction. In this review paper, we first categorize the main features and algorithms used for protein localization prediction. Then, we summarize a list of protein localization prediction tools in terms of their coverage, characteristics, and accessibility to help users find suitable tools based on their needs. Next, we evaluate some of these tools on a benchmark dataset. Finally, we provide an outlook on the future exploration of protein localization methods.
Affiliation(s)
- Yuexu Jiang
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Duolin Wang
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Weiwei Wang
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Dong Xu
- Department of Electrical Engineering and Computer Science, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
6
Computer-Aided Prediction of Protein Mitochondrial Localization. Methods Mol Biol 2021; 2275:433-452. [PMID: 34118055 DOI: 10.1007/978-1-0716-1262-0_28]
Abstract
Protein sequences, directly translated from genomic data, need functional and structural annotation. Together with molecular function and biological process, subcellular localization is an important feature necessary for understanding the protein's role and the compartment where the mature protein is active. In the case of mitochondrial proteins, the precursor sequences translated by the ribosome machinery include specific patterns from which it is possible to recognize not only their final destination within the organelle but also the mitochondrial subcompartment for which the protein is intended. Four compartments are routinely discriminated: the inner and outer membranes, the intermembrane space, and the matrix. Here we discuss to what extent it is feasible to develop computational methods for detecting mitochondrial targeting peptides in the precursor sequence and for discriminating their final destination in the organelle. We benchmark two of our methods on the general task of recognizing human mitochondrial proteins endowed with an experimentally characterized targeting peptide (TPpred3) and predicting which submitochondrial compartment is the final destination (DeepMito). We describe how to use our web servers to determine which human proteins are endowed with mitochondrial targeting peptides, the positions of their cleavage sites, and the submitochondrial compartment for which they are intended. In this way, we add 1788 human proteins to the 450 already manually annotated in UniProt with a mitochondrial targeting peptide, also providing the suborganellar localization of each of them.
7
Hou Z, Yang Y, Li H, Wong KC, Li X. iDeepSubMito: identification of protein submitochondrial localization with deep learning. Brief Bioinform 2021; 22:6332322. [PMID: 34337657 DOI: 10.1093/bib/bbab288] [Received: 04/25/2021] [Revised: 05/25/2021] [Accepted: 06/05/2021]
Abstract
Mitochondria are membrane-bound organelles containing over 1000 different proteins involved in mitochondrial function, gene expression and metabolic processes. Accurate localization of those proteins in the mitochondrial compartments is critical to their operation. A few computational methods have been developed for predicting submitochondrial localization from protein sequences. Unfortunately, most of these methods rely on biological features or evolutionary information to extract sequence features, which greatly limits the performance of subsequent identification. Moreover, the efficiency of most computational models is still underexplored, especially for deep learning features, which are promising but require improvement. To address these limitations, we propose a novel computational method called iDeepSubMito to predict the location of mitochondrial proteins in submitochondrial compartments. First, we adopted a coding scheme using ProteinELMo to model the probability distribution over protein sequences and then represent the protein sequences as continuous vectors. Then, we proposed and implemented a convolutional neural network architecture based on a bidirectional LSTM with a self-attention mechanism to effectively explore contextual information and the semantic features of protein sequences. To demonstrate the effectiveness of iDeepSubMito, we performed cross-validation on two datasets containing 424 and 570 proteins, respectively, covering four mitochondrial compartments (matrix, inner membrane, outer membrane and intermembrane regions). Experimental results revealed that our method outperformed other computational methods. In addition, we tested iDeepSubMito on M187, M983 and MitoCarta3.0 to further verify its efficiency. Finally, motif analysis and interpretability analysis were conducted to reveal novel insights into the subcellular biological functions of mitochondrial proteins. iDeepSubMito source code is available on GitHub at https://github.com/houzl3416/iDeepSubMito.
Affiliation(s)
- Zilong Hou
- School of Artificial Intelligence, Jilin University, Jilin, China
- Yuning Yang
- Information Science and Technology, Northeast Normal University, Jilin, China
- Hui Li
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR
- Xiangtao Li
- School of Artificial Intelligence, Jilin University, Jilin, China
8
Anteghini M, Martins dos Santos V, Saccenti E. In-Pero: Exploiting Deep Learning Embeddings of Protein Sequences to Predict the Localisation of Peroxisomal Proteins. Int J Mol Sci 2021; 22:6409. [PMID: 34203866 PMCID: PMC8232616 DOI: 10.3390/ijms22126409] [Received: 05/13/2021] [Revised: 05/31/2021] [Accepted: 06/09/2021]
Abstract
Peroxisomes are ubiquitous membrane-bound organelles, and aberrant localisation of peroxisomal proteins contributes to the pathogenesis of several disorders. Many computational methods focus on assigning protein sequences to subcellular compartments, but there are no specific tools tailored for the sub-localisation (matrix vs. membrane) of peroxisome proteins. We present here In-Pero, a new method for predicting protein sub-peroxisomal cellular localisation. In-Pero combines standard machine learning approaches with recently proposed multi-dimensional deep-learning representations of the protein amino-acid sequence. It showed a classification accuracy above 0.9 in predicting peroxisomal matrix and membrane proteins. The method is trained and tested using a double cross-validation approach on a curated data set comprising 160 peroxisomal proteins with experimental evidence for sub-peroxisomal localisation. We further show that the proposed approach can be easily adapted (In-Mito) to the prediction of mitochondrial protein localisation obtaining performances for certain classes of proteins (matrix and inner-membrane) superior to existing tools.
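The "double cross-validation" in this abstract is read here as nested cross-validation: an outer loop estimates performance while an inner loop, run only on each outer training split, selects hyperparameters. The sketch below illustrates that protocol under loud assumptions: a k-nearest-neighbour classifier stands in for In-Pero's actual learners, its `n_neighbors` is the hypothetical hyperparameter being tuned, and the 1-D "embeddings" are made-up toy values. Stratification by class is omitted for brevity.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], str]  # (feature vector, compartment label)


def kfold_indices(n: int, k: int, seed: int) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train: Sequence[Sample], x: List[float], n_neighbors: int) -> str:
    """Majority label among the n_neighbors closest points (squared Euclidean)."""
    ranked = sorted(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


def accuracy(train: Sequence[Sample], test: Sequence[Sample], n_neighbors: int) -> float:
    hits = sum(knn_predict(train, x, n_neighbors) == y for x, y in test)
    return hits / len(test)


def split(data: Sequence[Sample], fold: List[int]) -> Tuple[List[Sample], List[Sample]]:
    """Return (training part, held-out part) for one fold."""
    held = set(fold)
    train = [s for i, s in enumerate(data) if i not in held]
    return train, [data[i] for i in fold]


def nested_cv(data: Sequence[Sample], grid: Sequence[int],
              outer_k: int = 5, inner_k: int = 3, seed: int = 0) -> float:
    """Outer loop estimates accuracy; the inner loop picks n_neighbors using
    only the outer-training portion, so the outer test fold never leaks
    into model selection."""
    outer_scores = []
    for fold in kfold_indices(len(data), outer_k, seed):
        outer_train, outer_test = split(data, fold)
        inner_folds = kfold_indices(len(outer_train), inner_k, seed + 1)

        def inner_score(h: int) -> float:
            scores = [accuracy(*split(outer_train, f), h) for f in inner_folds]
            return sum(scores) / len(scores)

        best = max(grid, key=inner_score)  # ties go to the first value in grid
        outer_scores.append(accuracy(outer_train, outer_test, best))
    return sum(outer_scores) / len(outer_scores)


# Toy usage: two well-separated hypothetical classes.
data = ([([0.1 * (i % 5)], "matrix") for i in range(20)]
        + [([5.0 + 0.1 * (i % 5)], "membrane") for i in range(20)])
print(nested_cv(data, grid=(1, 3, 5)))
```

The key design point is that `inner_score` never sees `outer_test`; reusing the outer test fold for tuning would give an optimistically biased estimate.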
Affiliation(s)
- Marco Anteghini
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands
- LifeGlimmer GmbH, 12163 Berlin, Germany
- Vitor Martins dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands
- LifeGlimmer GmbH, 12163 Berlin, Germany
- Edoardo Saccenti
- Laboratory of Systems and Synthetic Biology, Wageningen University & Research, Stippeneng 4, 6708 WE Wageningen, The Netherlands
9
Wattanapornprom W, Thammarongtham C, Hongsthong A, Lertampaiporn S. Ensemble of Multiple Classifiers for Multilabel Classification of Plant Protein Subcellular Localization. Life (Basel) 2021; 11:life11040293. [PMID: 33808227 PMCID: PMC8066735 DOI: 10.3390/life11040293] [Received: 01/29/2021] [Revised: 03/16/2021] [Accepted: 03/25/2021]
Abstract
The accurate prediction of protein localization is a critical step in any functional genome annotation process. This paper proposes an improved strategy for protein subcellular localization prediction in plants based on multiple classifiers, to improve prediction results in terms of both accuracy and reliability. The prediction of plant protein subcellular localization is challenging because the underlying problem is not only multiclass but also multilabel. Generally, plant proteins can be found in 10–14 locations/compartments. The number of proteins in some compartments (nucleus, cytoplasm, and mitochondria) is generally much greater than that in other compartments (vacuole, peroxisome, Golgi, and cell wall), so the data are usually imbalanced. We therefore propose an ensemble machine learning method based on average voting among heterogeneous classifiers. We first extracted various types of features suitable for each type of protein localization to form a total of 479 feature spaces. Then, feature selection methods were used to reduce the features to smaller, informative subsets. This reduced feature subset was then used to train three different individual models, whose outputs were combined by average voting to return the final probability prediction. Based on the voting probability, the method could predict both single- and multilabel subcellular localizations. Experimental results indicated that the proposed ensemble method could achieve an overall accuracy of 84.58% for 11 compartments on the testing dataset.
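Average (soft) voting, as described in this abstract, can be sketched in a few lines: average each compartment's probability across the classifiers, then report every compartment that reaches a threshold, which naturally yields either one or several locations. The classifier names, the probability values and the 0.5 threshold below are assumptions for illustration; the abstract does not specify which three models or which cutoff were used.

```python
from typing import Dict, List, Set


def average_vote(prob_tables: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each compartment's probability over all classifiers (missing = 0)."""
    if not prob_tables:
        return {}
    compartments = set().union(*prob_tables)
    n = len(prob_tables)
    return {c: sum(t.get(c, 0.0) for t in prob_tables) / n for c in compartments}


def assign_locations(avg: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Return every compartment whose averaged probability reaches the threshold;
    if none does, fall back to the single most probable compartment.
    Expects a non-empty dict of averaged probabilities."""
    hits = {c for c, p in avg.items() if p >= threshold}
    return hits or {max(avg, key=avg.get)}


# Made-up outputs from three hypothetical heterogeneous classifiers.
svm = {"nucleus": 0.8, "cytoplasm": 0.6, "golgi": 0.1}
forest = {"nucleus": 0.7, "cytoplasm": 0.5, "golgi": 0.2}
knn = {"nucleus": 0.9, "cytoplasm": 0.7, "golgi": 0.0}
avg = average_vote([svm, forest, knn])
print(sorted(assign_locations(avg)))  # ['cytoplasm', 'nucleus']
```

The fallback keeps every protein assigned to at least one compartment, which matches the single-label case when no location is confidently multi-labelled.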
Affiliation(s)
- Warin Wattanapornprom
- Applied Computer Science Program, Department of Mathematics, Faculty of Science, King Mongkut’s University of Technology Thonburi, Bangkok 10140, Thailand
- Chinae Thammarongtham
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
- Apiradee Hongsthong
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
- Supatcha Lertampaiporn
- Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency at King Mongkut’s University of Technology Thonburi, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
10
Towards a systems-level understanding of mitochondrial biology. Cell Calcium 2021; 95:102364. [PMID: 33601101 DOI: 10.1016/j.ceca.2021.102364] [Received: 11/22/2020] [Revised: 01/22/2021] [Accepted: 01/23/2021]
Abstract
Human mitochondria are complex and highly dynamic biological systems, composed of over a thousand parts and evolved to fully integrate into the specialized intracellular signaling networks and metabolic requirements of each cell and organ. Over the last two decades, several complementary, top-down computational and experimental approaches have been developed to identify, characterize and modulate the human mitochondrial system, demonstrating the power of integrating classical reductionist and discovery-driven analyses in order to de-orphanize hitherto unknown molecular components of mitochondrial machineries and pathways. To this end, systematic, multiomics-based surveys of proteome composition, protein networks, and phenotype-to-pathway associations at the tissue, cell and organellar level have been widely exploited to predict the full complement of mitochondrial proteins and their functional interactions, thereby catalyzing data-driven hypotheses. Collectively, these multidisciplinary and integrative research approaches hold the potential to propel our understanding of mitochondrial biology and provide a systems-level framework for unraveling mitochondria-mediated and disease-spanning pathomechanisms.
11
Kumar R, Dhanda SK. Bird Eye View of Protein Subcellular Localization Prediction. Life (Basel) 2020; 10:E347. [PMID: 33327400 PMCID: PMC7764902 DOI: 10.3390/life10120347] [Received: 11/24/2020] [Revised: 12/11/2020] [Accepted: 12/11/2020]
Abstract
Proteins are made up of long chains of amino acids and perform a variety of functions in different organisms. The activity of a protein is determined by the nucleotide sequence of its gene and by its 3D structure. In addition, proteins must be delivered to their specific locations or compartments to carry out their functions. The challenge of computationally predicting protein subcellular localization has been addressed by various in silico methods. In this review, we survey the progress in this field and offer a bird's-eye view consisting of a comprehensive listing of tools, the types of input features explored, the machine learning approaches employed, and the evaluation metrics applied. We hope the review will be useful for researchers working in the field of protein localization prediction.
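Localization predictors are commonly scored per compartment in a one-vs-rest fashion. The sketch below computes four metrics that are widely used for this purpose (sensitivity, specificity, accuracy and the Matthews correlation coefficient) from boolean labels for a single compartment. Returning 0.0 when a denominator is zero is a convention assumed here, not something stated in the review.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """One-vs-rest metrics for a single compartment (True = protein is located there)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Toy example: 3 true mitochondrial proteins out of 8; the predictor gets 2 of them
# and raises 1 false alarm (TP=2, FN=1, FP=1, TN=4).
y_true = [True, True, True, False, False, False, False, False]
y_pred = [True, True, False, False, False, False, True, False]
print(binary_metrics(y_true, y_pred))
```

For this example MCC = (2·4 − 1·1) / √(3·3·5·5) = 7/15 ≈ 0.467, noticeably lower than the 0.75 accuracy, which is why MCC is often preferred on imbalanced compartments.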
Affiliation(s)
- Ravindra Kumar
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, NIH, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Sandeep Kumar Dhanda
- Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
12
Large-scale prediction and analysis of protein sub-mitochondrial localization with DeepMito. BMC Bioinformatics 2020; 21:266. [PMID: 32938368 PMCID: PMC7493403 DOI: 10.1186/s12859-020-03617-z] [Received: 06/15/2020] [Accepted: 06/18/2020]
Abstract
Background The prediction of protein subcellular localization is a key step in the broader effort toward protein functional annotation. Many computational methods exist to identify high-level protein subcellular compartments such as the nucleus, cytoplasm or organelles. However, many organelles, like mitochondria, have their own internal compartmentalization. Knowing the precise location of a protein inside mitochondria is crucial for its accurate functional characterization. We recently developed DeepMito, a new method based on a 1-Dimensional Convolutional Neural Network (1D-CNN) architecture that outperforms other similar approaches available in the literature. Results Here, we explore the adoption of DeepMito for the large-scale annotation of four sub-mitochondrial localizations on the mitochondrial proteomes of five different species: human, mouse, fly, yeast and Arabidopsis thaliana. A significant fraction of the proteins from these organisms lacked experimental information about sub-mitochondrial localization. We adopted DeepMito to fill the gap, providing complete characterization of protein localization at the sub-mitochondrial level for each protein of the five proteomes. Moreover, we identified novel mitochondrial proteins by screening the set of proteins lacking any subcellular localization annotation with available state-of-the-art subcellular localization predictors. We finally performed additional functional characterization of proteins predicted by DeepMito as localized into the four different sub-mitochondrial compartments using both available experimental and predicted GO terms. All data generated in this study were collected into a database called DeepMitoDB (available at http://busca.biocomp.unibo.it/deepmitodb), providing complete functional characterization of 4307 mitochondrial proteins from the five species.
Conclusions DeepMitoDB offers a comprehensive view of mitochondrial proteins, including experimental and predicted fine-grain sub-cellular localization and annotated and predicted functional annotations. The database complements other similar resources providing characterization of new proteins. Furthermore, it is also unique in including localization information at the sub-mitochondrial level. For this reason, we believe that DeepMitoDB can be a valuable resource for mitochondrial research.
13
Bian H, Guo M, Wang J. Recognition of Mitochondrial Proteins in Plasmodium Based on the Tripeptide Composition. Front Cell Dev Biol 2020; 8:578901. [PMID: 33043014 PMCID: PMC7525148 DOI: 10.3389/fcell.2020.578901] [Received: 07/01/2020] [Accepted: 08/13/2020]
Abstract
Mitochondria play essential roles in eukaryotic cells, especially in Plasmodium cells. They have several unusual evolutionary and functional features that are highly relevant to disease diagnosis and drug design. Thus, predicting the mitochondrial proteins of Plasmodium is a worthwhile task. However, existing computational methods can only predict mitochondrial proteins of Plasmodium falciparum (P. falciparum for short), and these methods have low accuracy. It is highly desirable to design a classifier with high accuracy for predicting mitochondrial proteins for all Plasmodium species, not only P. falciparum. We proposed a novel method, named PM-OTC, for predicting mitochondrial proteins in Plasmodium. PM-OTC uses a Support Vector Machine (SVM) as the classifier and the selected tripeptide composition as the features. We adopted 5-fold cross-validation to train and test PM-OTC. Results demonstrate that PM-OTC achieves an accuracy of 94.91% and outperforms other methods.
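Tripeptide composition, the feature family PM-OTC builds on, is the normalized frequency of each of the 20³ = 8000 possible amino-acid triplets in overlapping windows of a sequence. The sketch below computes the full 8000-dimensional vector; the feature-selection step that produces PM-OTC's "selected" subset and the SVM itself are not reproduced, and skipping windows that contain non-standard residues is an assumption of this sketch.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20**3 = 8000 tripeptides in a fixed order, plus a lookup table.
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]
INDEX = {tp: i for i, tp in enumerate(TRIPEPTIDES)}


def tripeptide_composition(seq: str) -> List[float]:
    """Normalized counts of overlapping tripeptides (length-8000 vector)."""
    seq = seq.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        j = INDEX.get(seq[i:i + 3])
        if j is not None:  # skip windows containing non-standard residues (e.g. X, U)
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(counts)


# "MAAAK" has three windows: MAA, AAA, AAK, each with frequency 1/3.
v = tripeptide_composition("MAAAK")
print(len(v), v[INDEX["AAA"]])
```

The resulting vectors are very sparse for typical protein lengths, which is one reason a feature-selection step is usually applied before training the classifier.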
Affiliation(s)
- Haodong Bian
- School of Computer Science, Inner Mongolia University, Hohhot, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
- Juan Wang
- School of Computer Science, Inner Mongolia University, Hohhot, China; State Key Laboratory of Reproductive Regulation & Breeding of Grassland Livestock, Hohhot, China
14
Savojardo C, Bruciaferri N, Tartari G, Martelli PL, Casadio R. DeepMito: accurate prediction of protein sub-mitochondrial localization using convolutional neural networks. Bioinformatics 2020; 36:56-64. [PMID: 31218353 PMCID: PMC6956790 DOI: 10.1093/bioinformatics/btz512] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Received: 03/29/2019] [Revised: 05/31/2019] [Accepted: 06/17/2019] [Indexed: 11/18/2022]
Abstract
Motivation: The correct localization of proteins in cell compartments is a key issue for their function. In particular, mitochondrial proteins are physiologically active in different compartments, and their aberrant localization contributes to the pathogenesis of human mitochondrial pathologies. Many computational methods exist to assign protein sequences to subcellular compartments such as the nucleus, cytoplasm and organelles. However, a substantial lack of experimental evidence in public sequence databases has so far hampered finer-grained discrimination, including discrimination among intra-organelle compartments.
Results: We describe DeepMito, a novel method for predicting protein sub-mitochondrial localization. Taking advantage of powerful deep-learning approaches, such as convolutional neural networks, our method achieves very high prediction performance when discriminating among four different mitochondrial compartments (matrix, outer, inner and intermembrane regions). The method is trained and tested in cross-validation on a newly generated, high-quality dataset comprising 424 mitochondrial proteins with experimental evidence for sub-organelle localization. We benchmark DeepMito against the only other recent approach developed for the same task, and the results indicate that DeepMito performs better. Finally, genome-scale prediction on a highly curated dataset of human mitochondrial proteins further confirms the effectiveness of our approach and suggests that DeepMito is a good candidate for genome-scale annotation of mitochondrial protein subcellular localization.
Availability and implementation: The DeepMito web server as well as all datasets used in this study are available at http://busca.biocomp.unibo.it/deepmito. A standalone version of DeepMito is available on DockerHub at https://hub.docker.com/r/bolognabiocomp/deepmito. DeepMito source code is available on GitHub at https://github.com/BolognaBiocomp/deepmito.
Supplementary information: Supplementary data are available at Bioinformatics online.
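DeepMito's actual input encodings and network architecture are specified in the paper and are not reproduced here. As a hedged illustration of the generic building block that convolutional sequence classifiers share, the sketch below one-hot encodes a protein, slides each filter along the sequence (a 1D convolution), applies ReLU, and global-max-pools so that sequences of any length yield one value per filter. The filters are hand-set, hypothetical motif detectors rather than trained weights, and all function names are illustrative.

```python
# Generic CNN building block for protein sequences (illustrative only;
# not DeepMito's actual encoding, architecture, or weights).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein as one 20-dim one-hot row per residue (unknown residues -> all zeros)."""
    rows = []
    for aa in sequence.upper():
        row = [0.0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:
            row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


def motif_filter(motif):
    """Hypothetical hand-set filter: weight 1.0 for the residue expected at each position of `motif`."""
    return [[1.0 if aa == m else 0.0 for aa in AMINO_ACIDS] for m in motif]


def conv1d_relu_maxpool(encoded, filters, biases):
    """Slide each filter along the sequence, apply ReLU, then take the global maximum.

    Returns one value per filter, so the output size does not depend on sequence length.
    """
    pooled = []
    for weights, bias in zip(filters, biases):
        width = len(weights)
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(encoded) - width + 1):
            activation = bias
            for offset in range(width):
                residue = encoded[start + offset]
                activation += sum(w * x for w, x in zip(weights[offset], residue))
            best = max(best, max(0.0, activation))
        pooled.append(best)
    return pooled


features = conv1d_relu_maxpool(
    one_hot("MAKKLA"),
    filters=[motif_filter("KK"), motif_filter("LL")],
    biases=[-1.0, -1.0],
)
print(features)  # [1.0, 0.0]: a "KK" window fires, no "LL" window does
```

In a trained network the filter weights would be learned from data, and the pooled vector would be passed to further layers that score the four compartments (matrix, outer, inner and intermembrane regions).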
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Niccolò Bruciaferri
- Biocomputing Group, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Giacomo Tartari
- Biocomputing Group, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
15
DeepPred-SubMito: A Novel Submitochondrial Localization Predictor Based on Multi-Channel Convolutional Neural Network and Dataset Balancing Treatment. Int J Mol Sci 2020; 21:ijms21165710. [PMID: 32784927 PMCID: PMC7460811 DOI: 10.3390/ijms21165710] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/05/2020] [Revised: 08/05/2020] [Accepted: 08/07/2020] [Indexed: 12/18/2022]
Abstract
Mitochondrial proteins are physiologically active in different compartments, and their abnormal localization can trigger the pathogenesis of human mitochondrial pathologies. Correctly identifying submitochondrial locations can provide information for disease pathogenesis and drug design. A mitochondrion has four submitochondrial compartments (the matrix, the outer membrane, the inner membrane, and the intermembrane space), but several existing studies ignored the intermembrane space. Most researchers have used traditional machine learning methods to predict mitochondrial protein localization. Those predictors required expert-level biological knowledge to be encoded as features, rather than allowing the predictor to extract features through a data-driven procedure. In addition, few researchers have considered class imbalance in the datasets. In this paper, we propose DeepPred-SubMito, a novel end-to-end predictor based on deep neural networks for protein submitochondrial location prediction. First, we use random over-sampling to reduce the influence of imbalanced datasets. Next, we train a multi-channel bilayer convolutional neural network on multiple subsequences to learn high-level features. Finally, the prediction is output through a fully connected layer. The performance of the predictor is measured by 10-fold cross-validation on the SM424-18 dataset and 5-fold cross-validation on the SubMitoPred dataset. Experimental results show that the predictor outperforms state-of-the-art predictors. Predictions on the M983 dataset further confirmed its effectiveness in predicting submitochondrial locations.
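Random over-sampling, the balancing step that DeepPred-SubMito applies first, can be sketched as follows: minority-class examples are duplicated at random (with replacement) until every class matches the size of the largest one. This is a generic version assuming simple per-class duplication; the function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class is as large as the biggest one.

    Returns new (samples, labels) lists; the input lists are not modified.
    """
    if not labels:
        return [], []
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(members) for members in by_class.values())

    rng = random.Random(seed)  # fixed seed for reproducible resampling
    out_samples, out_labels = list(samples), list(labels)
    for label, members in by_class.items():
        # Draw, with replacement, the number of extra copies this class needs.
        for _ in range(target - len(members)):
            out_samples.append(rng.choice(members))
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: placeholder protein IDs with imbalanced compartment labels.
seqs = ["p1", "p2", "p3", "p4", "p5", "p6"]
labs = ["matrix", "matrix", "matrix", "matrix", "outer", "inner"]
X, y = random_oversample(seqs, labs)
print(Counter(y))  # every class now has 4 samples
```

A general caution, independent of this paper: if over-sampling is applied before the data are split for cross-validation, copies of the same protein can end up in both training and test folds and inflate the measured performance, so it is usually applied to the training portion of each fold only.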
16
MPPIF-Net: Identification of Plasmodium Falciparum Parasite Mitochondrial Proteins Using Deep Features with Multilayer Bi-directional LSTM. Processes (Basel) 2020. [DOI: 10.3390/pr8060725] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 12/29/2022]
Abstract
Mitochondrial proteins of Plasmodium falciparum (MPPF) are an important target for anti-malarial drugs, but identifying them through manual experimentation is costly, and the production of related drugs by pharmaceutical institutions takes a long time. It is therefore highly desirable for pharmaceutical companies to develop a computationally automated and reliable approach to identify these proteins precisely, enabling appropriate drug production in a timely manner. In this direction, several computationally intelligent techniques have been developed that extract local features from biological sequences using machine learning methods, followed by various classifiers to discriminate the nature of proteins. Unfortunately, these techniques perform poorly at capturing contextual features from sequence patterns, yielding non-representative classifiers. In this paper, we propose a sequence-based framework that extracts deep, representative features that are trustworthy for identifying Plasmodium mitochondrial proteins. The backbone of the proposed framework is the MPPF identification net (MPPIF-Net), which is based on a convolutional neural network (CNN) with a multilayer bi-directional long short-term memory (MBD-LSTM). MPPIF-Net takes protein sequences as input and passes them through several convolution and pooling layers to extract learned features. These features are then passed to our sequence learning mechanism, the MBD-LSTM, which is trained to classify them into their relevant classes. The proposed model is experimentally evaluated on a newly prepared dataset, PF2095, and two existing benchmark datasets, PF175 and MPD, using the holdout method. The proposed method achieved 97.6%, 97.1%, and 99.5% testing accuracy on the PF2095, PF175, and MPD datasets, respectively, outperforming state-of-the-art approaches.
17
Gao J, Miao Z, Zhang Z, Wei H, Kurgan L. Prediction of Ion Channels and their Types from Protein Sequences: Comprehensive Review and Comparative Assessment. Curr Drug Targets 2020; 20:579-592. [PMID: 30360734 DOI: 10.2174/1389450119666181022153942] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/25/2018] [Revised: 10/03/2018] [Accepted: 10/04/2018] [Indexed: 12/20/2022]
Abstract
BACKGROUND: Ion channels are a large and growing protein family. Many of them are associated with diseases and, consequently, they are targets for over 700 drugs. Discovery of new ion channels is facilitated by computational methods that predict ion channels and their types from protein sequences. However, these methods have never been comprehensively compared and evaluated.
OBJECTIVE: We offer a first-of-its-kind comprehensive survey of sequence-based predictors of ion channels. We describe eight predictors, which include five methods that predict ion channels, their types, and four classes of voltage-gated channels. We also develop and use a new benchmark dataset to perform a comparative empirical analysis of the three currently available predictors.
RESULTS: While several methods that rely on different designs have been published, only a few of them are currently available and offer a broad scope of predictions. Support and availability after publication should be required when new methods are considered for publication. Empirical analysis shows strong performance for the prediction of ion channels and modest performance for the prediction of ion channel types and voltage-gated channel classes. We identify a substantial weakness of current methods: they cannot accurately predict ion channels that are categorized into multiple classes/types.
CONCLUSION: Several predictors of ion channels are available to end users, and they offer practical levels of predictive quality. Methods that rely on a larger and more diverse set of predictive inputs (such as PSIONplus) are more accurate. New tools that address multi-label prediction of ion channels should be developed.
Affiliation(s)
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Zhen Miao
- College of Life Sciences, Nankai University, Tianjin, China
- Zhaopeng Zhang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Hong Wei
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, United States
18
Muthye V, Kandoi G, Lavrov DV. MMPdb and MitoPredictor: Tools for facilitating comparative analysis of animal mitochondrial proteomes. Mitochondrion 2020; 51:118-125. [PMID: 31972373 DOI: 10.1016/j.mito.2020.01.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/02/2019] [Revised: 12/09/2019] [Accepted: 01/02/2020] [Indexed: 11/24/2022]
Abstract
Data on experimentally characterized animal mitochondrial proteomes (mt-proteomes) are limited to a few model organisms and are scattered across multiple databases, impeding comparative analysis. We developed two resources to address these problems. First, we re-analyzed proteomic data from six species with experimentally characterized mt-proteomes, namely animals (Homo sapiens, Mus musculus, Caenorhabditis elegans, and Drosophila melanogaster) and outgroups (Acanthamoeba castellanii and Saccharomyces cerevisiae), and created the Metazoan Mitochondrial Proteome Database (MMPdb) to host the results. Second, we developed a novel pipeline, "MitoPredictor", that uses a Random Forest classifier to infer the mitochondrial localization of proteins based on orthology, mitochondrial targeting signal prediction, and protein domain analyses. Both tools generate an R Shiny applet that can be run on a personal computer to visualize and interact with the results. MMPdb is also available online at https://mmpdb.eeob.iastate.edu/.
Affiliation(s)
- Viraj Muthye
- Bioinformatics and Computational Biology Program, Iowa State University, 2014 Molecular Biology Building, Ames, Iowa 50011, USA; Department of Ecology, Evolution and Organismal Biology, Iowa State University, 251 Bessey Hall, 2200 Osborne Drive, Ames, Iowa 50011, USA.
- Gaurav Kandoi
- Bioinformatics and Computational Biology Program, Iowa State University, 2014 Molecular Biology Building, Ames, Iowa 50011, USA; Department of Electrical and Computer Engineering, Iowa State University, 2520 Osborn Drive, Ames, IA 50011, USA
- Dennis V Lavrov
- Bioinformatics and Computational Biology Program, Iowa State University, 2014 Molecular Biology Building, Ames, Iowa 50011, USA; Department of Ecology, Evolution and Organismal Biology, Iowa State University, 251 Bessey Hall, 2200 Osborne Drive, Ames, Iowa 50011, USA
19
Malhotra D, Casey JR. Molecular Mechanisms of Fuchs and Congenital Hereditary Endothelial Corneal Dystrophies. Rev Physiol Biochem Pharmacol 2020; 178:41-81. [PMID: 32789790 DOI: 10.1007/112_2020_39] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
Abstract
The cornea, the eye's outermost layer, protects the eye from the environment. The cornea's innermost layer is an endothelium that separates the stromal layer from the aqueous humor. A central role of the endothelium is to maintain the hydration state of the stroma. Defects in maintaining this hydration can impair corneal clarity and thus visual acuity. Two endothelial corneal dystrophies, Fuchs Endothelial Corneal Dystrophy (FECD) and Congenital Hereditary Endothelial Dystrophy (CHED), are blinding corneal diseases with varied clinical presentation in patients across different age groups. Recessive CHED, with an early onset (typically age 0-3 years), and dominantly inherited FECD, with a late onset (age 40-50 years), have similar phenotypes, although they are caused by defects in several different genes. Given the involvement of multiple causative genes, a range of molecular mechanisms has been proposed to explain FECD and CHED pathology. This critical review provides insight into the proposed molecular mechanisms underlying FECD and CHED pathology, along with common pathways that may link the defective gene products, and offers a new perspective on these genetic blinding diseases.
Affiliation(s)
- Darpan Malhotra
- Department of Biochemistry, University of Alberta, Edmonton, AB, Canada
- Membrane Protein Disease Research Group, University of Alberta, Edmonton, AB, Canada
- Joseph R Casey
- Department of Biochemistry, University of Alberta, Edmonton, AB, Canada
- Membrane Protein Disease Research Group, University of Alberta, Edmonton, AB, Canada
- Department of Physiology, University of Alberta, Edmonton, AB, Canada
- Department of Ophthalmology and Visual Science, University of Alberta, Edmonton, AB, Canada
20
Nithya V. SubmitoLoc: Identification of mitochondrial sub cellular locations of proteins using support vector machine. Bioinformation 2019; 15:863-868. [PMID: 32256006 PMCID: PMC7088428 DOI: 10.6026/97320630015863] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/28/2019] [Revised: 12/31/2019] [Accepted: 12/31/2019] [Indexed: 11/23/2022]
Abstract
Mitochondria are important subcellular organelles in eukaryotes. Defects in the mitochondrial system lead to a variety of diseases. Detailed knowledge of the mitochondrial proteome is therefore vital to understanding the mitochondrial system and its function. Sequence databases contain a large number of mitochondrial proteins, but most of them are not annotated. In this study, we developed a support vector machine approach, SubmitoLoc, to predict the mitochondrial subcellular locations of proteins based on various sequence-derived properties. We evaluated the predictor using 10-fold cross-validation. Our method achieved 88.56% accuracy using all features. The average sensitivity and specificity for four-subclass prediction are 85.37% and 87.25%, respectively. This high prediction accuracy suggests that SubmitoLoc will be useful for researchers studying mitochondrial biology and drug discovery.
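SubmitoLoc reports an average sensitivity and specificity over its four sub-mitochondrial classes. The abstract does not state how these averages are computed, so the sketch below assumes the common one-vs-rest definition for each class followed by an unweighted (macro) mean. The function name `one_vs_rest_metrics` and the confusion-matrix counts are hypothetical and are not SubmitoLoc's results.

```python
def one_vs_rest_metrics(confusion):
    """Per-class and macro-averaged sensitivity/specificity from a square confusion matrix.

    confusion[i][j] counts proteins whose true class is i and predicted class is j.
    Each class k is scored one-vs-rest: k is "positive", all other classes are "negative".
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    sensitivity, specificity = [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # true k, predicted another class
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, truly another class
        tn = total - tp - fn - fp
        sensitivity.append(tp / (tp + fn) if tp + fn else 0.0)
        specificity.append(tn / (tn + fp) if tn + fp else 0.0)
    accuracy = sum(confusion[k][k] for k in range(n)) / total
    return {
        "accuracy": accuracy,
        "macro_sensitivity": sum(sensitivity) / n,
        "macro_specificity": sum(specificity) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical counts for four classes (e.g. matrix, inner membrane,
# outer membrane, intermembrane space); rows = true class, columns = predicted.
confusion = [
    [8, 1, 1, 0],
    [2, 7, 0, 1],
    [0, 0, 9, 1],
    [1, 1, 0, 8],
]
metrics = one_vs_rest_metrics(confusion)
print(metrics["accuracy"], metrics["macro_sensitivity"], metrics["macro_specificity"])
```

Because each class is scored against the other three, specificity draws on many more negatives than sensitivity does on positives, so the two averages can differ noticeably even when overall accuracy is high.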
Affiliation(s)
- Varadharaju Nithya
- Department of Animal Health Management, Alagappa University, Karaikudi-630003, India
21
Nightingale DJ, Geladaki A, Breckels LM, Oliver SG, Lilley KS. The subcellular organisation of Saccharomyces cerevisiae. Curr Opin Chem Biol 2019; 48:86-95. [PMID: 30503867 PMCID: PMC6391909 DOI: 10.1016/j.cbpa.2018.10.026] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 08/14/2018] [Revised: 10/29/2018] [Accepted: 10/31/2018] [Indexed: 01/06/2023]
Abstract
Subcellular protein localisation is essential for the mechanisms that govern cellular homeostasis. The ability to understand the processes leading to this phenomenon will therefore enhance our understanding of cellular function. Here we review recent developments in this field with regard to mass spectrometry, fluorescence microscopy and computational prediction methods. We highlight the relative strengths and limitations of current methodologies, focussing particularly on studies in the yeast Saccharomyces cerevisiae. We further present the first cell-wide spatial proteome map of S. cerevisiae, generated using hyperLOPIT, a mass spectrometry-based protein correlation profiling technique. We compare protein subcellular localisation assignments from this map with two published fluorescence microscopy studies and show that confidence in localisation assignment is attained by using multiple orthogonal methods that provide complementary data.
Affiliation(s)
- Daniel Jh Nightingale
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, United Kingdom; Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1GA, United Kingdom
- Aikaterini Geladaki
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, United Kingdom; Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1GA, United Kingdom; Department of Genetics, University of Cambridge, Downing Street, Cambridge, CB2 3EH, United Kingdom
- Lisa M Breckels
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, United Kingdom; Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1GA, United Kingdom
- Stephen G Oliver
- Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1GA, United Kingdom
- Kathryn S Lilley
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QR, United Kingdom; Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge, CB2 1GA, United Kingdom