1
|
Faiz M, Khan SJ, Azim F, Ejaz N, Shamim F. Deciphering Membrane Proteins Through Deep Learning Models by Revealing Their Locale Within the Cell. Bioengineering (Basel) 2024; 11:1150. [PMID: 39593811 PMCID: PMC11592231 DOI: 10.3390/bioengineering11111150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 11/09/2024] [Accepted: 11/11/2024] [Indexed: 11/28/2024] Open
Abstract
Membrane proteins constitute essential biomolecules attached to or integrated into cellular and organelle membranes, playing diverse roles in cellular processes. Their precise localization is crucial for understanding their functions. Existing protein subcellular localization predictors are predominantly trained on globular proteins; their performance diminishes for membrane proteins, explicitly via deep learning models. To address this challenge, the proposed study segregates membrane proteins into three distinct locations, including the plasma membrane, internal membrane, and membrane of the organelle, using deep learning algorithms including recurrent neural networks (RNN) and Long Short-Term Memory (LSTM). A redundancy-curtailed dataset of 3000 proteins from the MemLoci approach is selected for the investigation, along with incorporating pseudo amino acid composition (PseAAC). PseAAC is an exemplary technique for extracting protein information hidden in the amino acid sequences. After extensive testing, the results show that the accuracy for LSTM and RNN is 83.4% and 80.5%, respectively. The results show that the LSTM model outperforms the RNN and is most commonly employed in proteomics.
Affiliation(s)
- Mehwish Faiz
  - Department of Electrical Engineering, Faculty of Engineering, Science, Technology and Management, Ziauddin University, Karachi 74200, Pakistan
  - Department of Biomedical Engineering, Faculty of Engineering, Science, Technology and Management, Ziauddin University, Karachi 74200, Pakistan
- Saad Jawaid Khan
  - Department of Biomedical Engineering, Faculty of Engineering, Science, Technology and Management, Ziauddin University, Karachi 74200, Pakistan
- Fahad Azim
  - Department of Electrical Engineering, Faculty of Engineering, Science, Technology and Management, Ziauddin University, Karachi 74200, Pakistan
- Nazia Ejaz
  - Department of Biomedical Engineering, Balochistan University of Engineering and Technology, Khuzdar 89100, Pakistan
- Fahad Shamim
  - Institute of Biomedical Engineering & Technology (IBET), Liaquat University of Medical and Health Sciences, Jamshoro 76060, Pakistan
|
2
|
Fu X, Chen Y, Tian S. DlncRNALoc: A discrete wavelet transform-based model for predicting lncRNA subcellular localization. Mathematical Biosciences and Engineering: MBE 2023; 20:20648-20667. [PMID: 38124569 DOI: 10.3934/mbe.2023913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
The prediction of long non-coding RNA (lncRNA) subcellular localization is essential to the understanding of its function and involvement in cellular regulation. Traditional biological experimental methods are costly and time-consuming, making computational methods the preferred approach for predicting lncRNA subcellular localization (LSL). However, existing computational methods have limitations due to the structural characteristics of lncRNAs and the uneven distribution of data across subcellular compartments. We propose a discrete wavelet transform (DWT)-based model for predicting LSL, called DlncRNALoc. We construct a physicochemical property matrix of a 2-tuple bases based on lncRNA sequences, and we introduce a DWT lncRNA feature extraction method. We use the Synthetic Minority Over-sampling Technique (SMOTE) for oversampling and the local fisher discriminant analysis (LFDA) algorithm to optimize feature information. The optimized feature vectors are fed into support vector machine (SVM) to construct a predictive model. DlncRNALoc has been applied for a five-fold cross-validation on the three sets of benchmark datasets. Extensive experiments have demonstrated the superiority and effectiveness of the DlncRNALoc model in predicting LSL.
Affiliation(s)
- Xiangzheng Fu
  - Neher's Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao, China
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
  - Department of Basic Biology, Changsha Medical College, Changsha, Hunan, China
- Yifan Chen
  - College of Information Science and Engineering, Hunan University, Changsha, Hunan, China
  - Department of Basic Biology, Changsha Medical College, Changsha, Hunan, China
- Sha Tian
  - Department of Internal Medicine, College of Integrated Chinese and Western Medicine, Hunan University of Chinese Medicine, Changsha, Hunan, China
|
3
|
Faiz M, Khan SJ, Azim F, Ejaz N. Disclosing the locale of transmembrane proteins within cellular alcove by machine learning approach: systematic review and meta analysis. J Biomol Struct Dyn 2023; 42:11133-11148. [PMID: 37768108 DOI: 10.1080/07391102.2023.2260490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023]
Abstract
Protein subcellular localization is a promising research question in Proteomics and associated fields, including Biological Sciences, Biomedical Engineering, Computational Biology, Bioinformatics, Proteomics, Artificial Intelligence, and Biophysics. However, computational techniques are preferred to explore this attribute for a massive number of proteins. The byproduct of this conjunction yields diversified location identifiers of proteins. These protein subcellular localization identifiers are unique regarding the database used, organisms, Machine Learning Technique, and accuracy. Despite the availability of these identifiers, the majority of the work has been done on the subcellular localization of proteins and, less work has been done specifically on locations of transmembrane proteins. This systematic review accounts for computational techniques implemented on transmembrane protein localization. Moreover, a literature search on PubMed, Science Direct, and IEEE Databases disclosed no systematic review or meta-analysis on the cell's transmembrane protein locale. A Systematic review was formed under the guidelines of PRISMA by using Science Direct, PubMed, and IEEE Databases. Journal publications from 2000 to 2023 were taken into consideration and screened. This review has focused only on computational studies rather than experimental techniques. 1004 studies were reviewed and were categorized as relevant and non-relevant according to inclusion and exclusion criteria. All the screening was done through Endnote after importing citations. This systematic review characterizes the gap in targeting the locale of the transmembrane protein and will aid researchers in exploring its new horizons. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Mehwish Faiz
  - Department of Biomedical Engineering, Ziauddin University (FESTM), Karachi, Pakistan
  - Department of Electrical Engineering, Ziauddin University (FESTM), Karachi, Pakistan
- Saad Jawaid Khan
  - Department of Biomedical Engineering, Ziauddin University (FESTM), Karachi, Pakistan
- Fahad Azim
  - Department of Electrical Engineering, Ziauddin University (FESTM), Karachi, Pakistan
- Nazia Ejaz
  - Balochistan University of Engineering and Technology, Khuzdar, Pakistan
|
4
|
Grass GD, Ercan D, Obermayer AN, Shaw T, Stewart PA, Chahoud J, Dhillon J, Lopez A, Johnstone PAS, Rogatto SR, Spiess PE, Eschrich SA. An Assessment of the Penile Squamous Cell Carcinoma Surfaceome for Biomarker and Therapeutic Target Discovery. Cancers (Basel) 2023; 15:3636. [PMID: 37509297 PMCID: PMC10377392 DOI: 10.3390/cancers15143636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 07/01/2023] [Accepted: 07/11/2023] [Indexed: 07/30/2023] Open
Abstract
Penile squamous cell carcinoma (PSCC) is a rare malignancy in most parts of the world and the underlying mechanisms of this disease have not been fully investigated. About 30-50% of cases are associated with high-risk human papillomavirus (HPV) infection, which may have prognostic value. When PSCC becomes resistant to upfront therapies there are limited options, thus further research is needed in this venue. The extracellular domain-facing protein profile on the cell surface (i.e., the surfaceome) is a key area for biomarker and drug target discovery. This research employs computational methods combined with cell line translatomic (n = 5) and RNA-seq transcriptomic data from patient-derived tumors (n = 18) to characterize the PSCC surfaceome, evaluate the composition dependency on HPV infection, and explore the prognostic impact of identified surfaceome candidates. Immunohistochemistry (IHC) was used to validate the localization of select surfaceome markers. This analysis characterized a diverse surfaceome within patient tumors with 25% and 18% of the surfaceome represented by the functional classes of receptors and transporters, respectively. Significant differences in protein classes were noted by HPV status, with the most change being seen in transporter proteins (25%). IHC confirmed the robust surface expression of select surfaceome targets in the top 85% of expression and a superfamily immunoglobulin protein called BSG/CD147 was prognostic of survival. This study provides the first description of the PSCC surfaceome and its relation to HPV infection and sets a foundation for novel biomarker and drug target discovery in this rare cancer.
Affiliation(s)
- George Daniel Grass
  - Department of Radiation Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Dalia Ercan
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Alyssa N Obermayer
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Timothy Shaw
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Paul A Stewart
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Jad Chahoud
  - Department of Genitourinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Jasreman Dhillon
  - Department of Anatomic Pathology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Alex Lopez
  - Department of Anatomic Pathology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Peter A S Johnstone
  - Department of Radiation Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Silvia Regina Rogatto
  - Department of Clinical Genetics, University Hospital of Southern Denmark-Vejle, Beriderbakken 4, 7100 Vejle, Denmark
- Philippe E Spiess
  - Department of Genitourinary Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
- Steven A Eschrich
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL 33612, USA
|
5
|
Role of C-terminal domain of Mycobacterium tuberculosis PE6 (Rv0335c) protein in host mitochondrial stress and macrophage apoptosis. Apoptosis 2023; 28:136-165. [PMID: 36258102 PMCID: PMC9579591 DOI: 10.1007/s10495-022-01778-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Accepted: 09/27/2022] [Indexed: 11/02/2022]
Abstract
PE/PPE proteins of Mycobacterium tuberculosis (Mtb) target the host organelles to dictate the outcome of infection. This study investigated the significance of PE6/Rv0335c protein's unique C-terminal in causing host mitochondrial perturbations and apoptosis. In-silico analysis revealed that similar to eukaryotic apoptotic Bcl2 proteins, Rv0335c had disordered, hydrophobic C-terminal and two BH3-like motifs in which one was located at C-terminal. Also, Rv0335c's N terminal had mitochondrial targeting sequence. Since, C-terminal of Bcl2 proteins are crucial for mitochondria targeting and apoptosis; it became relevant to evaluate the role of Rv0335c's C-terminal domain in modulating host mitochondrial functions and apoptosis. To confirm this, in-vitro experiments were conducted with Rv0335c whole protein and Rv0335c∆Cterm (C-terminal domain deleted Rv0335c) protein. Rv0335c∆Cterm caused significant reduction in mitochondrial perturbations and Caspase-mediated apoptosis of THP1 macrophages in comparison to Rv0335c. However, the deletion of C-terminal domain didn't affect Rv0335c's ability to localize to mitochondria. Nine Ca2+ binding residues were predicted within Rv0335c and four of them were at the C-terminal. In-vitro studies confirmed that Rv0335c caused significant increase in intracellular calcium influx whereas Rv0335c∆Cterm had insignificant effect on Ca2+ influx. Rv0335c has been reported to be a TLR4 agonist and, we observed a significant reduction in the expression of TLR4-HLA-DR-TNF-α in response to Rv0335c∆Cterm protein also suggesting the role of Rv0335c's C-terminal domain in host-pathogen interaction. These findings indicate the possibility of Rv0335c as a molecular mimic of eukaryotic Bcl2 proteins which equips it to cause host mitochondrial perturbations and apoptosis that may facilitate pathogen persistence.
|
6
|
Cai J, Wang T, Deng X, Tang L, Liu L. GM-lncLoc: LncRNAs subcellular localization prediction based on graph neural network with meta-learning. BMC Genomics 2023; 24:52. [PMID: 36709266 PMCID: PMC9883864 DOI: 10.1186/s12864-022-09034-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/29/2022] [Accepted: 11/21/2022] [Indexed: 01/29/2023] Open
Abstract
In recent years, a large number of studies have shown that the subcellular localization of long non-coding RNAs (lncRNAs) can bring crucial information to the recognition of lncRNAs function. Therefore, it is of great significance to establish a computational method to accurately predict the subcellular localization of lncRNA. Previous prediction models are based on low-level sequences information and are troubled by the few samples problem. In this study, we propose a new prediction model, GM-lncLoc, which is based on the initial information extracted from the lncRNA sequence, and also combines the graph structure information to extract high level features of lncRNA. In addition, the training mode of meta-learning is introduced to obtain meta-parameters by training a series of tasks. With the meta-parameters, the final parameters of other similar tasks can be learned quickly, so as to solve the problem of few samples in lncRNA subcellular localization. Compared with the previous methods, GM-lncLoc achieved the best results with an accuracy of 93.4 and 94.2% in the benchmark datasets of 5 and 4 subcellular compartments, respectively. Furthermore, the prediction performance of GM-lncLoc was also better on the independent dataset. It shows the effectiveness and great potential of our proposed method for lncRNA subcellular localization prediction. The datasets and source code are freely available at https://github.com/JunzheCai/GM-lncLoc .
Affiliation(s)
- Junzhe Cai
  - School of Information, Yunnan Normal University, Kunming, Yunnan, China
- Ting Wang
  - School of Information, Yunnan Normal University, Kunming, Yunnan, China
- Xi Deng
  - School of Information, Yunnan Normal University, Kunming, Yunnan, China
- Lin Tang
  - Key Laboratory of Educational Information for Nationalities Ministry of Education, Yunnan Normal University, Kunming, Yunnan, China
- Lin Liu
  - School of Information, Yunnan Normal University, Kunming, Yunnan, China
|
7
|
Mercatelli D, Cabrelle C, Veltri P, Giorgi FM, Guzzi PH. Detection of pan-cancer surface protein biomarkers via a network-based approach on transcriptomics data. Brief Bioinform 2022; 23:6695270. [DOI: 10.1093/bib/bbac400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 07/28/2022] [Accepted: 08/17/2022] [Indexed: 11/13/2022] Open
Abstract
Cell surface proteins have been used as diagnostic and prognostic markers in cancer research and as targets for the development of anticancer agents. Many of these proteins lie at the top of signaling cascades regulating cell responses and gene expression, therefore acting as ‘signaling hubs’. It has been previously demonstrated that the integrated network analysis on transcriptomic data is able to infer cell surface protein activity in breast cancer. Such an approach has been implemented in a publicly available method called ‘SURFACER’. SURFACER implements a network-based analysis of transcriptomic data focusing on the overall activity of curated surface proteins, with the final aim to identify those proteins driving major phenotypic changes at a network level, named surface signaling hubs. Here, we show the ability of SURFACER to discover relevant knowledge within and across cancer datasets. We also show how different cancers can be stratified in surface-activity-specific groups. Our strategy may identify cancer-wide markers to design targeted therapies and biomarker-based diagnostic approaches.
Affiliation(s)
- Daniele Mercatelli
  - Department of Pharmacy and Biotechnology, University of Bologna, 40138 Bologna, Italy
- Chiara Cabrelle
  - Department of Pharmacy and Biotechnology, University of Bologna, 40138 Bologna, Italy
- Pierangelo Veltri
  - Department of Surgical and Medical Sciences, Magna Graecia University, 88100 Catanzaro, Italy
- Federico M Giorgi
  - Department of Pharmacy and Biotechnology, University of Bologna, 40138 Bologna, Italy
- Pietro H Guzzi
  - Department of Surgical and Medical Sciences, Magna Graecia University, 88100 Catanzaro, Italy
|
8
|
Xie X, Cao P, Wang Z, Gao J, Wu M, Li X, Zhang J, Wang Y, Gong D, Yang J. Genome-wide characterization and expression profiling of the PDR gene family in tobacco (Nicotiana tabacum). Gene 2021; 788:145637. [PMID: 33848571 DOI: 10.1016/j.gene.2021.145637] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/12/2020] [Revised: 03/13/2021] [Accepted: 04/07/2021] [Indexed: 11/18/2022]
Abstract
The pleiotropic drug resistance (PDR) proteins of the ATP-binding cassette (ABC) family play essential roles in physiological processes and have been characterized in many plant species. However, no comprehensive investigation of tobacco (Nicotiana tabacum), an important economic crop and a useful model plant for scientific research, has been presented. We identified 32 PDR genes in the tobacco genome and explored their domain organization, chromosomal distribution and evolution, promoter cis-elements, and expression profiles. A phylogenetic analysis revealed that tobacco has a significantly expanded number of PDR genes involved in plant defense. It also revealed that two tobacco PDR proteins may function as strigolactone transporters to regulate shoot branching, and several NtPDR genes may be involved in cadmium transport. Moreover, tissue expression profiles of NtPDR genes and their responses to several hormones and abiotic stresses were assessed using quantitative real-time PCR. Most of the NtPDR genes were regulated by jasmonate or salicylic acid, suggesting the important regulatory roles of NtPDRs in plant defense and secondary metabolism. They were also responsive to abiotic stresses, like drought and cold, and there was a strong correlation between the presence of promoter cis-elements and abiotic/biotic stress responses. These results provide useful clues for further in-depth studies on the functions of the tobacco PDR genes.
Affiliation(s)
- Xiaodong Xie
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
- Peijian Cao
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
- Zhong Wang
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
- Junping Gao
  - China Tobacco Hunan Industrial Co., Ltd., Changsha 410007, China
- Mingzhu Wu
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
- Xiaoxu Li
  - China Tobacco Hunan Industrial Co., Ltd., Changsha 410007, China
- Jianfeng Zhang
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
- Yaofu Wang
  - China Tobacco Hunan Industrial Co., Ltd., Changsha 410007, China
- Daping Gong
  - Tobacco Research Institute, Chinese Academy of Agricultural Sciences, Qingdao, China
- Jun Yang
  - China Tobacco Gene Research Center, Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China
|
9
|
Imai K, Nakai K. Tools for the Recognition of Sorting Signals and the Prediction of Subcellular Localization of Proteins From Their Amino Acid Sequences. Front Genet 2020; 11:607812. [PMID: 33324450 PMCID: PMC7723863 DOI: 10.3389/fgene.2020.607812] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 09/18/2020] [Accepted: 11/03/2020] [Indexed: 12/13/2022] Open
Abstract
At the time of translation, nascent proteins are thought to be sorted into their final subcellular localization sites, based on the part of their amino acid sequences (i.e., sorting or targeting signals). Thus, it is interesting to computationally recognize these signals from the amino acid sequences of any given proteins and to predict their final subcellular localization with such information, supplemented with additional information (e.g., k-mer frequency). This field has a long history and many prediction tools have been released. Even in this era of proteomic atlas at the single-cell level, researchers continue to develop new algorithms, aiming at accessing the impact of disease-causing mutations/cell type-specific alternative splicing, for example. In this article, we overview the entire field and discuss its future direction.
Affiliation(s)
- Kenichiro Imai
  - Cellular and Molecular Biotechnology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Kenta Nakai
  - The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
|
10
|
Kaleel M, Zheng Y, Chen J, Feng X, Simpson JC, Pollastri G, Mooney C. SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks. Bioinformatics 2020; 36:3343-3349. [PMID: 32142105 DOI: 10.1093/bioinformatics/btaa156] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 12/10/2019] [Revised: 02/25/2020] [Accepted: 03/02/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is an expensive and time-consuming task. Therefore, various computer-based tools have been developed, mostly using machine learning algorithms, to predict the subcellular location of proteins. RESULTS: Here, we present a neural network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS predicts the subcellular location of a protein into two classes, the endomembrane system and secretory pathway versus all others, with a Matthews correlation coefficient of 0.75-0.86 outperforming the other state-of-the-art web servers we tested. AVAILABILITY AND IMPLEMENTATION: SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/. CONTACT: catherine.mooney@ucd.ie.
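The Matthews correlation coefficient quoted in this abstract summarizes a two-class confusion matrix in a single value between -1 and 1. The following is a minimal sketch of that standard formula, not code from SCLpred-EMS; the function name and the example counts are hypothetical and chosen only for illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, not taken from the paper.
example_mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(round(example_mcc, 3))
```

Unlike plain accuracy, the coefficient stays informative when one class (for example "all others") is much larger than the other, which is presumably why it is reported for a two-class split like this one.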
Affiliation(s)
- Manaz Kaleel
  - School of Computer Science
  - UCD Institute for Discovery, University College Dublin, Dublin, Ireland
- Yandan Zheng
  - Beijing-Dublin International College, Beijing University of Technology, Chaoyang, China
- Jialiang Chen
  - Beijing-Dublin International College, Beijing University of Technology, Chaoyang, China
- Xuanming Feng
  - Beijing-Dublin International College, Beijing University of Technology, Chaoyang, China
- Jeremy C Simpson
  - Conway Institute of Biomolecular and Biomedical Research
  - School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
- Gianluca Pollastri
  - School of Computer Science
  - UCD Institute for Discovery, University College Dublin, Dublin, Ireland
- Catherine Mooney
  - School of Computer Science
  - Beijing-Dublin International College, Beijing University of Technology, Chaoyang, China
|
11
|
Bouziane H, Chouarfia A. Use of Chou's 5-steps rule to predict the subcellular localization of gram-negative and gram-positive bacterial proteins by multi-label learning based on gene ontology annotation and profile alignment. J Integr Bioinform 2020; 18:51-79. [PMID: 32598314 PMCID: PMC8035964 DOI: 10.1515/jib-2019-0091] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/08/2019] [Accepted: 04/08/2020] [Indexed: 12/31/2022] Open
Abstract
To date, many proteins generated by large-scale genome sequencing projects are still uncharacterized and subject to intensive investigations by both experimental and computational means. Knowledge of protein subcellular localization (SCL) is of key importance for protein function elucidation. However, it remains a challenging task, especially for multiple sites proteins known to shuttle between cell compartments to perform their proper biological functions and proteins which do not have significant homology to proteins of known subcellular locations. Due to their low-cost and reasonable accuracy, machine learning-based methods have gained much attention in this context with the availability of a plethora of biological databases and annotated proteins for analysis and benchmarking. Various predictive models have been proposed to tackle the SCL problem, using different protein sequence features pertaining to the subcellular localization, however, the overwhelming majority of them focuses on single localization and cover very limited cellular locations. The prediction was basically established on sorting signals, amino acids compositions, and homology. To improve the prediction quality, focus is actually on knowledge information extracted from annotation databases, such as protein-protein interactions and Gene Ontology (GO) functional domains annotation which has been recently a widely adopted and essential information for learning systems. To deal with such problem, in the present study, we considered SCL prediction task as a multi-label learning problem and tried to label both single site and multiple sites unannotated bacterial protein sequences by mining proteins homology relationships using both GO terms of protein homologs and PSI-BLAST profiles. 
The experiments using 5-fold cross-validation tests on the benchmark datasets showed a significant improvement on the results obtained by the proposed consensus multi-label prediction model which discriminates six compartments for Gram-negative and five compartments for Gram-positive bacterial proteins.
Affiliation(s)
- Hafida Bouziane
  - Département d’Informatique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, USTO-MB BP 1505, El M’Naouer, 31000, Oran, Algeria
- Abdallah Chouarfia
  - Département d’Informatique, Université des Sciences et de la Technologie d’Oran Mohamed Boudiaf, USTO-MB BP 1505, El M’Naouer, 31000, Oran, Algeria
|
12
|
Zhang G, Zhang W. Direct protein-protein interaction network for insecticide resistance based on subcellular localization analysis in Drosophila melanogaster. Journal of Environmental Science and Health. Part B, Pesticides, Food Contaminants, and Agricultural Wastes 2020; 55:732-748. [PMID: 32567974 DOI: 10.1080/03601234.2020.1782114] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/11/2023]
Abstract
The problem of resistance has not been solved fundamentally, because the development of new insecticides can't keep pace with the development speed of resistance, and the lack of understanding of molecular mechanism of resistance. As the further analysis to reduce data noise, we constructed the direct protein-protein interaction network of insecticide resistance based on subcellular localization analysis. The interaction between proteins located at the same subcellular location belongs to direct interactions, thus eliminating indirect interaction. Totally 177 of 528 resistance proteins were identified and they were located in 11 subcellular localizations. We further analyzed topological properties of the network and the biological characteristics of resistance proteins, such as k-core, neighborhood connectivity, instability index and aliphatic index. They can be used to predict the hub proteins and potential mechanisms from macro-perspective. This is the first study to explore the insecticide resistance molecular mechanism of Drosophila melanogaster based on subcellular localization analysis. It can provide the bioinformatics foundation for further understanding the mechanisms of insecticide resistance. It also provides a reference for the study of molecular mechanism of insecticide resistance of other insects.
Affiliation(s)
- Guilu Zhang
  - School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Wenjun Zhang
  - School of Life Sciences, Sun Yat-sen University, Guangzhou, China
13
Cao Z, Pan X, Yang Y, Huang Y, Shen HB. The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier. Bioinformatics 2019; 34:2185-2194. [PMID: 29462250 DOI: 10.1093/bioinformatics/bty085] [Citation(s) in RCA: 290] [Impact Index Per Article: 48.3] [Received: 09/27/2017] [Accepted: 02/14/2018] [Indexed: 01/01/2023] Open
Abstract
Motivation: Long non-coding RNAs (lncRNAs) have been a hot topic in RNA biology. Recent studies have shown that their subcellular localizations carry important information for understanding their complex biological functions. Because experiments for identifying the subcellular localization of lncRNAs are costly and time-consuming, computational methods are urgently needed. However, to the best of our knowledge, there are no computational tools for predicting lncRNA subcellular locations to date. Results: In this study, we report an ensemble classifier-based predictor, lncLocator, for predicting lncRNA subcellular localizations. To fully exploit lncRNA sequence information, we adopt both k-mer features and high-level abstraction features generated by unsupervised deep models, and construct four classifiers by feeding these two types of features to a support vector machine (SVM) and a random forest (RF), respectively. We then use a stacked ensemble strategy to combine the four classifiers and obtain the final prediction. The current lncLocator can predict five subcellular localizations of lncRNAs (cytoplasm, nucleus, cytosol, ribosome and exosome) and yields an overall accuracy of 0.59 on the constructed benchmark dataset. Availability and implementation: The lncLocator is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator. Supplementary information: Supplementary data are available at Bioinformatics online.
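As an illustration of the k-mer features mentioned in this abstract, the sketch below turns an RNA sequence into a vector of normalized k-mer frequencies. It covers only the k-mer part, not the deep features or the stacked ensemble. The function name `kmer_frequencies`, the choice of k, the U-to-T conversion and the frequency normalization are assumptions; the abstract does not specify them.

```python
from itertools import product
from typing import List


def kmer_frequencies(seq: str, k: int = 3, alphabet: str = "ACGT") -> List[float]:
    """Return frequencies of all len(alphabet)**k k-mers, in lexicographic order."""
    # Assumption: RNA input is mapped onto the DNA alphabet (U -> T).
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Windows containing ambiguous symbols (e.g. N) are skipped.
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


# Toy sequence (made up); with k=2 the vector has 4**2 = 16 entries.
features = kmer_frequencies("ACGUACGU", k=2)
```

A vector like this could then be passed to any standard classifier; which k values and classifiers settings lncLocator actually uses is not stated here.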
Affiliation(s)
- Zhen Cao
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Xiaoyong Pan
  - Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands
- Yang Yang
  - Department of Computer Science, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China
- Yan Huang
  - State Key Laboratory of Infrared Physics, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
14
Orioli T, Vihinen M. Benchmarking subcellular localization and variant tolerance predictors on membrane proteins. BMC Genomics 2019; 20:547. [PMID: 31307390 PMCID: PMC6631444 DOI: 10.1186/s12864-019-5865-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 02/01/2023] Open
Abstract
Background: Membrane proteins constitute up to 30% of the human proteome. These proteins have special properties because the transmembrane segments are embedded in the lipid bilayer while the extramembranous parts are in different environments. Membrane proteins have several functions and are involved in numerous diseases. A large number of methods have been introduced to predict protein subcellular localization as well as the tolerance or pathogenicity of amino acid substitutions. Results: We tested the performance of 22 tolerance predictors by collecting information on membrane proteins and their variants. The analysis indicated that the best tools had similar prediction performance on the transmembrane, inside and outside regions of transmembrane proteins, comparable to their overall performance for all types of proteins. PON-P2 had the highest performance, followed by REVEL, MetaSVM and VEST3. Further, using the high-quality dataset, we also tested the performance of seven subcellular localization predictors on membrane proteins. We assessed the performance separately for single-pass and multi-pass membrane proteins. Predictions for multi-pass proteins were more reliable than those for single-pass proteins. Conclusions: The predictors for variant effects had better performance than the subcellular localization tools. The best tolerance predictors are highly reliable. As there are large differences in the performance of tools, end-users have to be cautious in method selection. Electronic supplementary material: The online version of this article (10.1186/s12864-019-5865-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Tommaso Orioli
  - International Master in Bioinformatics, School of Science, University of Bologna, Bologna, Italy
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22184, Lund, Sweden
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22184, Lund, Sweden
15
Bausch-Fluck D, Goldmann U, Müller S, van Oostrum M, Müller M, Schubert OT, Wollscheid B. The in silico human surfaceome. Proc Natl Acad Sci U S A 2018; 115:E10988-E10997. [PMID: 30373828 PMCID: PMC6243280 DOI: 10.1073/pnas.1808790115] [Citation(s) in RCA: 228] [Impact Index Per Article: 32.6] [Indexed: 01/15/2023] Open
Abstract
Cell-surface proteins are of great biomedical importance, as demonstrated by the fact that 66% of approved human drugs listed in the DrugBank database target a cell-surface protein. Despite this biomedical relevance, there has been no comprehensive assessment of the human surfaceome, and only a fraction of the predicted 5,000 human transmembrane proteins have been shown to be located at the plasma membrane. To enable analysis of the human surfaceome, we developed the surfaceome predictor SURFY, based on machine learning. As a training set, we used experimentally verified high-confidence cell-surface proteins from the Cell Surface Protein Atlas (CSPA) and trained a random forest classifier on 131 features per protein and, specifically, per topological domain. SURFY was used to predict a human surfaceome of 2,886 proteins with an accuracy of 93.5%, which shows excellent overlap with known cell-surface protein classes (i.e., receptors). In deposited mRNA data, we found that between 543 and 1,100 surfaceome genes were expressed in cancer cell lines and maximally 1,700 surfaceome genes were expressed in embryonic stem cells and derivative lines. Thus, the surfaceome diversity depends on cell type and appears to be more dynamic than the nonsurface proteome. To make the predicted surfaceome readily accessible to the research community, we provide visualization tools for intuitive interrogation (wlab.ethz.ch/surfaceome). The in silico surfaceome enables the filtering of data generated by multiomics screens and supports the elucidation of the surfaceome nanoscale organization.
Affiliation(s)
- Damaris Bausch-Fluck
  - Institute of Molecular Systems Biology at the Department of Biology, ETH Zurich, 8093 Zurich, Switzerland
  - Biomedical Proteomics Platform, Department of Health Sciences and Technology, ETH Zurich, 8093 Zurich, Switzerland
- Ulrich Goldmann
  - Institute of Molecular Systems Biology at the Department of Biology, ETH Zurich, 8093 Zurich, Switzerland
- Sebastian Müller
  - Institute of Molecular Systems Biology at the Department of Biology, ETH Zurich, 8093 Zurich, Switzerland
- Marc van Oostrum
  - Institute of Molecular Systems Biology at the Department of Biology, ETH Zurich, 8093 Zurich, Switzerland
  - Biomedical Proteomics Platform, Department of Health Sciences and Technology, ETH Zurich, 8093 Zurich, Switzerland
- Maik Müller
  - Institute of Molecular Systems Biology at the Department of Biology, ETH Zurich, 8093 Zurich, Switzerland
  - Biomedical Proteomics Platform, Department of Health Sciences and Technology, ETH Zurich, 8093 Zurich, Switzerland
- Olga T Schubert
  - Institute of Molecular Systems Biology at the Department of Biology, ETH Zurich, 8093 Zurich, Switzerland
- Bernd Wollscheid
  - Institute of Molecular Systems Biology at the Department of Biology, ETH Zurich, 8093 Zurich, Switzerland
  - Biomedical Proteomics Platform, Department of Health Sciences and Technology, ETH Zurich, 8093 Zurich, Switzerland
16
Savojardo C, Martelli P, Fariselli P, Profiti G, Casadio R. BUSCA: an integrative web server to predict subcellular localization of proteins. Nucleic Acids Res 2018; 46:W459-W466. [PMID: 29718411 PMCID: PMC6031068 DOI: 10.1093/nar/gky320] [Citation(s) in RCA: 281] [Impact Index Per Article: 40.1] [Received: 01/24/2018] [Revised: 04/12/2018] [Accepted: 04/17/2018] [Indexed: 12/28/2022] Open
Abstract
Here, we present BUSCA (http://busca.biocomp.unibo.it), a novel web server that integrates different computational tools for predicting protein subcellular localization. BUSCA combines methods for identifying signal and transit peptides (DeepSig and TPpred3), GPI-anchors (PredGPI) and transmembrane domains (ENSEMBLE3.0 and BetAware) with tools for discriminating subcellular localization of both globular and membrane proteins (BaCelLo, MemLoci and SChloro). Outcomes from the different tools are processed and integrated for annotating subcellular localization of both eukaryotic and bacterial protein sequences. We benchmark BUSCA against protein targets derived from recent CAFA experiments and other specific data sets, reporting state-of-the-art performance. BUSCA scores better than all other evaluated methods on 2732 targets from CAFA2, with an F1 value of 0.49, and is among the best methods when predicting targets from CAFA3. We propose BUSCA as an integrated and accurate resource for the annotation of protein subcellular localization.
Affiliation(s)
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40100, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40100, Italy
- Piero Fariselli
  - Department of Comparative Biomedicine and Food Science, University of Padova, Padova 35020, Italy
- Giuseppe Profiti
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40100, Italy
  - Institute of Biomembrane, Bioenergetics and Molecular Biotechnologies, Italian National Research Council (CNR), Bari 70126, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40100, Italy
  - Institute of Biomembrane, Bioenergetics and Molecular Biotechnologies, Italian National Research Council (CNR), Bari 70126, Italy
17
Shao W, Liu M, Xu YY, Shen HB, Zhang D. An Organelle Correlation-Guided Feature Selection Approach for Classifying Multi-Label Subcellular Bio-Images. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:828-838. [PMID: 28278481 DOI: 10.1109/tcbb.2017.2677907] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 06/06/2023]
Abstract
Nowadays, with the advances in microscopic imaging, accurate classification of bioimage-based protein subcellular location pattern has attracted as much attention as ever. One of the basic challenging problems is how to select the useful feature components among thousands of potential features to describe the images. This is not an easy task especially considering there is a high ratio of multi-location proteins. Existing feature selection methods seldom take the correlation among different cellular compartments into consideration, and thus may miss some features that will be co-important for several subcellular locations. To deal with this problem, we make use of the important structural correlation among different cellular compartments and propose an organelle structural correlation regularized feature selection method CSF (Common-Sets of Features) in this paper. We formulate the multi-label classification problem by adopting a group-sparsity regularizer to select common subsets of relevant features from different cellular compartments. In addition, we also add a cell structural correlation regularized Laplacian term, which utilizes the prior biological structural information to capture the intrinsic dependency among different cellular compartments. The CSF provides a new feature selection strategy for multi-label bio-image subcellular pattern classifications, and the experimental results also show its superiority when comparing with several existing algorithms.
18
Du PF. Predicting Protein Submitochondrial Locations: The 10th Anniversary. Curr Genomics 2017; 18:316-321. [PMID: 29081687 PMCID: PMC5635615 DOI: 10.2174/1389202918666170228143256] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/01/2016] [Revised: 10/16/2016] [Accepted: 11/02/2016] [Indexed: 12/16/2022] Open
Abstract
Predicting protein submitochondrial location has been studied for about ten years. A number of methods have been developed. The prediction performances have been improved to an almost perfect level. In this review, we introduce the background of this research topic. We also compare the methods, the performances and the datasets that have been used by these studies. Towards the end, we provide hints for the future directions of this research topic.
Affiliation(s)
- Pu-Feng Du
  - School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
19
Xu YY, Yang F, Shen HB. Incorporating organelle correlations into semi-supervised learning for protein subcellular localization prediction. Bioinformatics 2016; 32:2184-92. [DOI: 10.1093/bioinformatics/btw219] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 08/02/2015] [Accepted: 04/18/2016] [Indexed: 01/08/2023] Open
20
Yu DJ, Hu J, Li QM, Tang ZM, Yang JY, Shen HB. Constructing query-driven dynamic machine learning model with application to protein-ligand binding sites prediction. IEEE Trans Nanobioscience 2015; 14:45-58. [PMID: 25730499 DOI: 10.1109/tnb.2015.2394328] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/11/2022]
Abstract
We are facing an era in which annotated biological data are generated rapidly and continuously. How to effectively incorporate newly annotated data into the learning step is crucial for enhancing the performance of a bioinformatics prediction model. Although machine-learning-based methods have been extensively used for various biological problems, existing approaches usually train static prediction models on fixed training datasets. These static approaches have several disadvantages, such as low scalability and impracticality when the training dataset is huge. In view of this, we propose a dynamic learning framework for constructing query-driven prediction models. The key difference between the proposed framework and existing approaches is that the training set for the machine learning algorithm is dynamically generated according to the query input, as opposed to training a general model regardless of queries as in traditional static methods. Accordingly, a query-driven predictor trained on a smaller set of data specifically selected from the entire annotated base dataset is applied to the query. This way of constructing the dynamic model allows the annotated base dataset to be updated flexibly, and using the most relevant core subset as the training set gives the constructed model better generalization ability on the query, showing a "part could be better than all" phenomenon. Following the new framework, we have implemented a dynamic protein-ligand binding sites predictor called OSML (On-site model for ligand binding sites prediction). Computer experiments on 10 different ligand types of three hierarchically organized levels show that OSML outperforms most existing predictors. The results indicate that the dynamic framework is a promising direction for bridging the gap between the rapidly accumulating annotated biological data and effective machine-learning-based predictors. The OSML web server and datasets are freely available for academic use at http://www.csbio.sjtu.edu.cn/bioinf/OSML/.
21
Wei ZS, Yang JY, Shen HB, Yu DJ. A Cascade Random Forests Algorithm for Predicting Protein-Protein Interaction Sites. IEEE Trans Nanobioscience 2015; 14:746-60. [DOI: 10.1109/tnb.2015.2475359] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 01/25/2023]
22
Shao W, Liu M, Zhang D. Human cell structure-driven model construction for predicting protein subcellular location from biological images. Bioinformatics 2015; 32:114-21. [PMID: 26363175 DOI: 10.1093/bioinformatics/btv521] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/21/2015] [Accepted: 08/31/2015] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: The systematic study of subcellular location pattern is very important for fully characterizing the human proteome. Nowadays, with the great advances in automated microscopic imaging, accurate bioimage-based classification methods to predict protein subcellular locations are highly desired. All existing models were constructed on the independent parallel hypothesis, where the cellular component classes are positioned independently in a multi-class classification engine. The important structural information of cellular compartments is missed. To deal with this problem for developing more accurate models, we proposed a novel cell structure-driven classifier construction approach (SC-PSorter) by employing the prior biological structural information in the learning model. Specifically, the structural relationship among the cellular components is reflected by a new codeword matrix under the error correcting output coding framework. Then, we construct multiple SC-PSorter-based classifiers corresponding to the columns of the error correcting output coding codeword matrix using a multi-kernel support vector machine classification approach. Finally, we perform the classifier ensemble by combining those multiple SC-PSorter-based classifiers via majority voting. RESULTS: We evaluate our method on a collection of 1636 immunohistochemistry images from the Human Protein Atlas database. The experimental results show that our method achieves an overall accuracy of 89.0%, which is 6.4% higher than the state-of-the-art method. AVAILABILITY AND IMPLEMENTATION: The dataset and code can be downloaded from https://github.com/shaoweinuaa/. CONTACT: dqzhang@nuaa.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wei Shao
  - School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Mingxia Liu
  - School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
- Daoqiang Zhang
  - School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
23
Automated training for algorithms that learn from genomic data. BioMed Research International 2015; 2015:234236. [PMID: 25695053 PMCID: PMC4324891 DOI: 10.1155/2015/234236] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/16/2014] [Revised: 11/09/2014] [Accepted: 11/21/2014] [Indexed: 11/20/2022]
Abstract
Supervised machine learning algorithms are used by life scientists for a variety of objectives. Expert-curated public gene and protein databases are major resources for gathering data to train these algorithms. While these data resources are continuously updated, generally, these updates are not incorporated into published machine learning algorithms which thereby can become outdated soon after their introduction. In this paper, we propose a new model of operation for supervised machine learning algorithms that learn from genomic data. By defining these algorithms in a pipeline in which the training data gathering procedure and the learning process are automated, one can create a system that generates a classifier or predictor using information available from public resources. The proposed model is explained using three case studies on SignalP, MemLoci, and ApicoAP in which existing machine learning models are utilized in pipelines. Given that the vast majority of the procedures described for gathering training data can easily be automated, it is possible to transform valuable machine learning algorithms into self-evolving learners that benefit from the ever-changing data available for gene products and to develop new machine learning algorithms that are similarly capable.
24
Dehzangi A, Heffernan R, Sharma A, Lyons J, Paliwal K, Sattar A. Gram-positive and Gram-negative protein subcellular localization by incorporating evolutionary-based descriptors into Chou's general PseAAC. J Theor Biol 2014; 364:284-94. [PMID: 25264267 DOI: 10.1016/j.jtbi.2014.09.029] [Citation(s) in RCA: 179] [Impact Index Per Article: 16.3] [Received: 05/15/2014] [Revised: 08/11/2014] [Accepted: 09/17/2014] [Indexed: 11/17/2022]
Abstract
Protein subcellular localization prediction is the task of predicting the location in the cell where a given protein functions. It is considered an important step towards protein function prediction and drug design. Recent studies have shown that relying on Gene Ontology (GO) for feature extraction can improve protein subcellular localization prediction performance. However, the problem remains unsolved when relying solely on GO. At the same time, the impact of other sources of features, especially evolutionary-based features, has not been explored adequately for this task. In this study, we aim to extract discriminative evolutionary features to tackle this problem. To do this, we propose two segmentation-based feature extraction methods to explore potential local evolutionary-based information for Gram-positive and Gram-negative subcellular localizations. We show that by applying a Support Vector Machine (SVM) classifier to our extracted features, we are able to improve Gram-positive and Gram-negative subcellular localization prediction accuracies by up to 6.4% over previous studies, including studies that used GO for feature extraction.
Affiliation(s)
- Abdollah Dehzangi
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
  - National ICT Australia (NICTA), Brisbane, Australia
- Rhys Heffernan
  - School of Engineering, Griffith University, Brisbane, Australia
- Alok Sharma
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
  - School of Engineering and Physics, University of the South Pacific, Fiji
- James Lyons
  - School of Engineering, Griffith University, Brisbane, Australia
- Kuldip Paliwal
  - School of Engineering, Griffith University, Brisbane, Australia
- Abdul Sattar
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
  - National ICT Australia (NICTA), Brisbane, Australia
25
Characterization of human gene locus CYYR1: a complex multi-transcript system. Mol Biol Rep 2014; 41:6025-38. [PMID: 24981926 DOI: 10.1007/s11033-014-3480-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/07/2014] [Accepted: 06/17/2014] [Indexed: 01/19/2023]
Abstract
Cysteine/tyrosine-rich 1 (CYYR1) is a gene we previously identified on human chromosome 21 starting from an in-depth bioinformatics analysis of chromosome 21 segment 40/105 (21q21.3), where no coding region had previously been predicted. CYYR1 was initially characterized as a four-exon gene, whose brain-derived cDNA sequencing predicts a 154-amino acid product. In this study we provide, with in silico and in vitro analyses, the first detailed description of the human CYYR1 locus. The analysis of this locus revealed that it is composed of a multi-transcript system, which includes at least seven CYYR1 alternative spliced isoforms and a new CYYR1 antisense gene (named CYYR1-AS1). In particular, we cloned, for the first time, the following isoforms: CYYR1-1,2,3,4b and CYYR1-1,2,3b, which present a different 3' transcribed region, with a consequent different carboxy-terminus of the predicted proteins; CYYR1-1,2,4 lacks exon 3; CYYR1-1,2,2bis,3,4 presents an additional exon between exon 2 and exon 3; CYYR1-1b,2,3,4 presents a different 5' untranslated region when compared to CYYR1. The complexity of the locus is enriched by the presence of an antisense transcript. We have cloned a long transcript overlapping with CYYR1 as an antisense RNA, probably a non-coding RNA. Expression analysis performed in different normal tissues, tumour cell lines as well as in trisomy 21 and euploid fibroblasts has confirmed a quantitative and qualitative variability in the expression pattern of the multi-transcript locus, suggesting a possible role in complex diseases that should be further investigated.
26
Miller LC, Jiang Z, Sang Y, Harhay GP, Lager KM. Evolutionary characterization of pig interferon-inducible transmembrane gene family and member expression dynamics in tracheobronchial lymph nodes of pigs infected with swine respiratory disease viruses. Vet Immunol Immunopathol 2014; 159:180-91. [PMID: 24656980 DOI: 10.1016/j.vetimm.2014.02.015] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/17/2022]
Abstract
Studies have found that a cluster of duplicated gene loci encoding the interferon-inducible transmembrane protein (IFITM) family has antiviral activity against several viruses, including influenza A virus. The gene family has 5 and 7 members in humans and mice, respectively. Here, we confirm the current annotation of pig IFITM1, IFITM2, IFITM3, IFITM5, IFITM1L1 and IFITM1L4, manually annotate IFITM1L2, IFITM1L3, IFITM5L, IFITM3L1 and IFITM3L2, and provide expressed sequence tag (EST) and/or mRNA evidence, not contained within the NCBI Reference Sequence database (RefSeq), for the existence of IFITM6, IFITM7 and a new IFITM1-like (IFITM1LN) gene in pigs. Phylogenetic analyses showed seven porcine IFITM genes with highly conserved human/mouse orthologs known to have antiviral activity. Digital Gene Expression Tag Profiling (DGETP) of swine tracheobronchial lymph nodes (TBLN) of pigs infected with swine influenza virus (SIV), porcine pseudorabies virus, porcine reproductive and respiratory syndrome virus or porcine circovirus type 2 over 14 days post-inoculation (dpi) showed that gene expression abundance differs dramatically among pig IFITM family members, ranging from 0 to over 3000 tags per million. In particular, SIV up-regulated IFITM1 by 5.9-fold at 3 dpi. A Bayesian framework further identified pig IFITM1 and IFITM3 as differentially expressed genes in the overall transcriptome analysis. In addition to being a component of protein complexes involved in homotypic adhesion, IFITM1 is also associated with pathways related to regulation of cell proliferation, and IFITM3 is involved in immune responses.
Affiliation(s)
- Laura C Miller
  - USDA, Agricultural Research Service, National Animal Disease Center, Virus and Prion Research Unit, 1920 Dayton Avenue, Ames, IA 50010, USA
- Zhihua Jiang
  - Department of Animal Sciences, Washington State University, Pullman, WA 99164, USA
- Yongming Sang
  - Department of Anatomy and Physiology, College of Veterinary Medicine, Kansas State University, Manhattan, KS 66506, USA
- Gregory P Harhay
  - Animal Health Research Unit, United States Meat Animal Research Center-USDA-ARS, Clay Center, NE 68933, USA
- Kelly M Lager
  - USDA, Agricultural Research Service, National Animal Disease Center, Virus and Prion Research Unit, 1920 Dayton Avenue, Ames, IA 50010, USA
27
Du P, Gu S, Jiao Y. PseAAC-General: fast building various modes of general form of Chou's pseudo-amino acid composition for large-scale protein datasets. Int J Mol Sci 2014; 15:3495-506. [PMID: 24577312 PMCID: PMC3975349 DOI: 10.3390/ijms15033495] [Citation(s) in RCA: 211] [Impact Index Per Article: 19.2] [Received: 01/20/2014] [Revised: 02/13/2014] [Accepted: 02/14/2014] [Indexed: 11/16/2022] Open
Abstract
The general form pseudo-amino acid composition (PseAAC) has been widely used to represent protein sequences in predicting protein structural and functional attributes. We developed the program PseAAC-General to generate various different modes of Chou’s general PseAAC, such as the gene ontology mode, the functional domain mode, and the sequential evolution mode. This program allows the users to define their own desired modes. In every mode, 544 physicochemical properties of the amino acids are available for choosing. The computing efficiency is at least 100 times that of existing programs, which makes it able to facilitate the extensive studies on proteins and peptides. The PseAAC-General is freely available via SourceForge. It runs on both Linux and Windows.
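For readers unfamiliar with the representation, here is a minimal sketch of the classic (type 1) pseudo-amino acid composition: 20 amino acid frequencies plus `lam` sequence-order correlation terms, all divided by a common denominator. It is not the interface or the algorithm of the PseAAC-General program, and it does not implement the general-form modes the abstract lists. The function names, the default `lam` and `w`, and the use of a single Kyte-Doolittle hydropathy scale as the only property are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here as a stand-in property
# (an assumption; any per-residue physicochemical scale could be supplied).
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop: Dict[str, float]) -> Dict[str, float]:
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    mu = mean(prop.values())
    sd = pstdev(prop.values())
    return {aa: (v - mu) / sd for aa, v in prop.items()}


def pseaac(seq: str, lam: int = 2, w: float = 0.05,
           props: Sequence[Dict[str, float]] = (KD,)) -> List[float]:
    """Return a (20 + lam)-dimensional type 1 PseAAC vector."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    norm = [standardize(p) for p in props]
    freqs = [residues.count(aa) / n for aa in AMINO_ACIDS]
    thetas = []
    for k in range(1, lam + 1):
        # Mean squared property difference between residues k positions apart.
        total = 0.0
        for i in range(n - k):
            total += mean((p[residues[i]] - p[residues[i + k]]) ** 2 for p in norm)
        thetas.append(total / (n - k))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Toy peptide (made up); the vector has 20 + 2 = 22 components.
vec = pseaac("MKTAYIAKQR", lam=2)
```

Because every component shares the same denominator, the vector sums to 1, which makes a convenient sanity check.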
Affiliation(s)
- Pufeng Du
  - School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
- Shuwang Gu
  - School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
- Yasen Jiao
  - School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
28
Li X, Wu X, Wu G. Robust feature generation for protein subchloroplast location prediction with a weighted GO transfer model. J Theor Biol 2014; 347:84-94. [PMID: 24423409 DOI: 10.1016/j.jtbi.2014.01.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/12/2013] [Revised: 10/17/2013] [Accepted: 01/03/2014] [Indexed: 10/25/2022]
Abstract
Chloroplasts are crucial organelles of green plants and eukaryotic algae since they conduct photosynthesis. Predicting the subchloroplast location of a protein can provide important insights for understanding its biological functions. The performance of subchloroplast location prediction algorithms often depends on deriving predictive and succinct features from genomic and proteomic data. In this work, a novel weighted Gene Ontology (GO) transfer model is proposed to generate discriminating features from sequence data and GO Categories. This model contains two components. First, we transfer the GO terms of the homologous protein, and then assign the bit-score as weights to GO features. Second, we employ term-selection methods to determine weights for GO terms. This model is capable of improving prediction accuracy due to the tolerance of the noise derived from homolog knowledge transfer. The proposed weighted GO transfer method based on bit-score and a logarithmic transformation of CHI-square (WS-LCHI) performs better than the baseline models, and also outperforms the four off-the-shelf subchloroplast prediction methods.
Affiliation(s)
- Xiaomei Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China.
- Xindong Wu
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China; Department of Computer Science, University of Vermont, Burlington, VT 50405, USA.
- Gongqing Wu
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230009, PR China.
|
29
|
Du P, Xu C. Predicting multisite protein subcellular locations: progress and challenges. Expert Rev Proteomics 2014; 10:227-37. [DOI: 10.1586/epr.13.16] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 11/08/2022]
|
30
|
Cilingir G, Lau AO, Broschat SL. ApicoAMP: The first computational model for identifying apicoplast-targeted transmembrane proteins in Apicomplexa. J Microbiol Methods 2013; 95:313-9. [DOI: 10.1016/j.mimet.2013.09.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/07/2013] [Revised: 09/22/2013] [Accepted: 09/23/2013] [Indexed: 10/26/2022]
|
31
|
Mei S. SVM ensemble based transfer learning for large-scale membrane proteins discrimination. J Theor Biol 2013; 340:105-10. [PMID: 24050851 DOI: 10.1016/j.jtbi.2013.09.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 03/28/2013] [Revised: 09/04/2013] [Accepted: 09/06/2013] [Indexed: 11/16/2022]
Abstract
Membrane proteins play important roles in molecular trans-membrane transport, ligand-receptor recognition, cell-cell interaction, enzyme catalysis, host immune defense response and infectious disease pathways. Up to present, discriminating membrane proteins remains a challenging problem from the viewpoints of biological experimental determination and computational modeling. This work presents SVM ensemble based transfer learning model for membrane proteins discrimination (SVM-TLM). To reduce the data constraints on computational modeling, this method investigates the effectiveness of transferring the homolog knowledge to the target membrane proteins under the framework of probability weighted ensemble learning. As compared to multiple kernel learning based transfer learning model, the method takes the advantages of sparseness based SVM optimization on large data, thus more computationally efficient for large protein data analysis. The experiments on large membrane protein benchmark dataset show that SVM-TLM achieves significantly better cross validation performance than the baseline model.
Affiliation(s)
- Suyu Mei
- Software College, Shenyang Normal University, Shenyang, China.
|
32
|
SubMito-PSPCP: predicting protein submitochondrial locations by hybridizing positional specific physicochemical properties with pseudoamino acid compositions. Biomed Res Int 2013; 2013:263829. [PMID: 24027753 PMCID: PMC3763570 DOI: 10.1155/2013/263829] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 05/12/2013] [Revised: 07/10/2013] [Accepted: 07/20/2013] [Indexed: 11/17/2022]
Abstract
Knowing the submitochondrial location of a mitochondrial protein is an important step in understanding its function. We developed a new method for predicting protein submitochondrial locations by introducing a new concept: positional specific physicochemical properties. With the framework of general form pseudoamino acid compositions, our method used only about 100 features to represent protein sequences, which is much simpler than the existing methods. On the dataset of SubMito, our method achieved over 93% overall accuracy, with 98.60% for inner membrane, 93.90% for matrix, and 70.70% for outer membrane, which are comparable to all state-of-the-art methods. As our method can be used as a general method to upgrade all pseudoamino-acid-composition-based methods, it should be very useful in future studies. We implement our method as an online service: SubMito-PSPCP.
|
33
|
Yu DJ, Hu J, Yang J, Shen HB, Tang J, Yang JY. Designing template-free predictor for targeting protein-ligand binding sites with classifier ensemble and spatial clustering. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:994-1008. [PMID: 24334392 DOI: 10.1109/tcbb.2013.104] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.2] [Indexed: 06/03/2023]
Abstract
Accurately identifying the protein-ligand binding sites or pockets is of significant importance for both protein function analysis and drug design. Although much progress has been made, challenges remain, especially when the 3D structures of target proteins are not available or no homology templates can be found in the library, where the template-based methods are hard to be applied. In this paper, we report a new ligand-specific template-free predictor called TargetS for targeting protein-ligand binding sites from primary sequences. TargetS first predicts the binding residues along the sequence with ligand-specific strategy and then further identifies the binding sites from the predicted binding residues through a recursive spatial clustering algorithm. Protein evolutionary information, predicted protein secondary structure, and ligand-specific binding propensities of residues are combined to construct discriminative features; an improved AdaBoost classifier ensemble scheme based on random undersampling is proposed to deal with the serious imbalance problem between positive (binding) and negative (nonbinding) samples. Experimental results demonstrate that TargetS achieves high performances and outperforms many existing predictors. TargetS web server and data sets are freely available at: http://www.csbio.sjtu.edu.cn/bioinf/TargetS/ for academic use.
Affiliation(s)
- Dong-Jun Yu
- Nanjing University of Science and Technology, Nanjing
- Jun Hu
- Nanjing University of Science and Technology, Nanjing
- Jing Yang
- Shanghai Jiao Tong University, Shanghai and Ministry of Education of China, Shanghai
- Hong-Bin Shen
- Shanghai Jiao Tong University, Shanghai and Ministry of Education of China, Shanghai
- Jinhui Tang
- Nanjing University of Science and Technology, Nanjing
- Jing-Yu Yang
- Nanjing University of Science and Technology, Nanjing
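The TargetS abstract above mentions random undersampling as the way it handles the imbalance between binding and nonbinding residues. The following is only a minimal sketch of that general idea, not the authors' implementation: the function name `random_undersample`, the 0/1 label encoding, and the `seed` parameter are all assumptions made for illustration.

```python
import random


def random_undersample(samples, labels, seed=0):
    """Return a class-balanced subset by randomly discarding majority-class samples.

    Hypothetical helper: labels are assumed to be 1 (binding) or 0 (nonbinding).
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # The shorter index list is the minority class; it is kept in full.
    minority, majority = sorted((positives, negatives), key=len)
    # Draw as many majority-class indices as there are minority samples.
    kept = minority + rng.sample(majority, len(minority))
    kept.sort()  # preserve the original sample order
    return [samples[i] for i in kept], [labels[i] for i in kept]
```

In an ensemble setting, one would typically call such a helper with a different seed for each base classifier, so that every member is trained on a different balanced subset.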
|
34
|
Mooney C, Cessieux A, Shields DC, Pollastri G. SCL-Epred: a generalised de novo eukaryotic protein subcellular localisation predictor. Amino Acids 2013; 45:291-9. [PMID: 23568340 DOI: 10.1007/s00726-013-1491-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/23/2012] [Accepted: 03/26/2013] [Indexed: 11/26/2022]
Abstract
Knowledge of the subcellular location of a protein provides valuable information about its function, possible interaction with other proteins and drug targetability, among other things. The experimental determination of a protein's location in the cell is expensive, time consuming and open to human error. Fast and accurate predictors of subcellular location have an important role to play if the abundance of sequence data which is now available is to be fully exploited. In the post-genomic era, genomes in many diverse organisms are available. Many of these organisms are important in human and veterinary disease and fall outside of the well-studied plant, animal and fungi groups. We have developed a general eukaryotic subcellular localisation predictor (SCL-Epred) which predicts the location of eukaryotic proteins into three classes which are important, in particular, for determining the drug targetability of a protein: secreted proteins, membrane proteins and proteins that are neither secreted nor membrane. The algorithm powering SCL-Epred is an N-to-1 neural network and is trained on very large non-redundant sets of protein sequences. SCL-Epred performs well on training data achieving a Q of 86% and a generalised correlation of 0.75 when tested in tenfold cross-validation on a set of 15,202 redundancy reduced protein sequences. The three class accuracy of SCL-Epred and LocTree2, and in particular a consensus predictor comprising both methods, surpasses that of other widely used predictors when benchmarked using a large redundancy reduced independent test set of 562 proteins. SCL-Epred is publicly available at http://distillf.ucd.ie/distill/.
Affiliation(s)
- Catherine Mooney
- Complex and Adaptive Systems Laboratory, Conway Institute of Biomolecular and Biomedical Science, School of Medicine and Medical Science, University College Dublin, Ireland.
|
35
|
Yu DJ, Hu J, Tang ZM, Shen HB, Yang J, Yang JY. Improving protein-ATP binding residues prediction by boosting SVMs with random under-sampling. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2012.10.012] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Indexed: 01/07/2023]
|
36
|
Learning protein multi-view features in complex space. Amino Acids 2013; 44:1365-79. [DOI: 10.1007/s00726-013-1472-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 06/13/2012] [Accepted: 02/13/2013] [Indexed: 12/11/2022]
|
37
|
Yu D, Wu X, Shen H, Yang J, Tang Z, Qi Y, Yang J. Enhancing Membrane Protein Subcellular Localization Prediction by Parallel Fusion of Multi-View Features. IEEE Trans Nanobioscience 2012; 11:375-85. [PMID: 22875262 DOI: 10.1109/tnb.2012.2208473] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/06/2022]
Affiliation(s)
- Dongjun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
|
38
|
Subcellular localization prediction for human internal and organelle membrane proteins with projected gene ontology scores. J Theor Biol 2012; 313:61-7. [DOI: 10.1016/j.jtbi.2012.08.016] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 03/30/2012] [Revised: 07/05/2012] [Accepted: 08/15/2012] [Indexed: 11/15/2022]
|
39
|
Li L, Zhang Y, Zou L, Li C, Yu B, Zheng X, Zhou Y. An ensemble classifier for eukaryotic protein subcellular location prediction using gene ontology categories and amino acid hydrophobicity. PLoS One 2012; 7:e31057. [PMID: 22303481 PMCID: PMC3268814 DOI: 10.1371/journal.pone.0031057] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 10/31/2011] [Accepted: 12/31/2011] [Indexed: 02/05/2023] Open
Abstract
With the rapid increase of protein sequences in the post-genomic age, it is challenging to develop accurate and automated methods for reliably and quickly predicting their subcellular localizations. Till now, many efforts have been tried, but most of which used only a single algorithm. In this paper, we proposed an ensemble classifier of KNN (k-nearest neighbor) and SVM (support vector machine) algorithms to predict the subcellular localization of eukaryotic proteins based on a voting system. The overall prediction accuracies by the one-versus-one strategy are 78.17%, 89.94% and 75.55% for three benchmark datasets of eukaryotic proteins. The improved prediction accuracies reveal that GO annotations and hydrophobicity of amino acids help to predict subcellular locations of eukaryotic proteins.
Affiliation(s)
- Liqi Li
- Department of Orthopedics, Xinqiao Hospital, Third Military Medical University, Chongqing, China
- Yuan Zhang
- Department of Orthopedics, Xinqiao Hospital, Third Military Medical University, Chongqing, China
- Lingyun Zou
- Department of Microbiology, College of Basic Medical Sciences, Third Military Medical University, Chongqing, China
- Changqing Li
- Department of Orthopedics, Xinqiao Hospital, Third Military Medical University, Chongqing, China
- Bo Yu
- Department of Orthopedics, Yichun People's Hospital, Yichun, China
- Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai, China
- Scientific Computing Key Laboratory of Shanghai Universities, Shanghai, China
- Yue Zhou
- Department of Orthopedics, Xinqiao Hospital, Third Military Medical University, Chongqing, China
|
40
|
Grifantini R, Pagani M, Pierleoni A, Grandi A, Parri M, Campagnoli S, Pileri P, Cattaneo D, Canidio E, Pontillo A, De Camilli E, Bresciani A, Marinoni F, Pedrazzoli E, Nogarotto R, Abrignani S, Viale G, Sarmientos P, Grandi G. A novel polyclonal antibody library for expression profiling of poorly characterized, membrane and secreted human proteins. J Proteomics 2011; 75:532-47. [PMID: 21920474 DOI: 10.1016/j.jprot.2011.08.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 05/25/2011] [Revised: 08/22/2011] [Accepted: 08/23/2011] [Indexed: 11/19/2022]
Abstract
The YOMICS™ antibody library (http://www.yomics.com/) presented in this article is a new collection of 1559 murine polyclonal antibodies specific for 1287 distinct human proteins. This antibody library is designed to target marginally characterized membrane-associated and secreted proteins. It was generated against human proteins annotated as transmembrane or secreted in GenBank, EnsEMBL, Vega and Uniprot databases, described in no or very few dedicated PubMed-linked publications. The selected proteins/protein regions were expressed in E. coli, purified and used to raise antibodies in the mouse. The capability of YOMICS™ antibodies to specifically recognize their target proteins either as recombinant form or as expressed in cells and tissues was confirmed through several experimental approaches, including Western blot, confocal microscopy and immunohistochemistry (IHC). Moreover, to show the applicability of the library for biomarker investigation by IHC, five antibodies against proteins either known to be expressed in some cancers or homologous to tumor-associated proteins were tested on tissue microarrays carrying tumor and normal tissues from breast, colon, lung, ovary and prostate. A consistent differential expression in cancer was observed. Our results indicate that the YOMICS™ antibody library is a tool for systematic protein expression profile analysis that nicely complements the already available commercial antibody collections.
|
41
|
Pierleoni A, Indio V, Savojardo C, Fariselli P, Martelli PL, Casadio R. MemPype: a pipeline for the annotation of eukaryotic membrane proteins. Nucleic Acids Res 2011; 39:W375-80. [PMID: 21543452 PMCID: PMC3125734 DOI: 10.1093/nar/gkr282] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 12/23/2022] Open
Abstract
MemPype is a Python-based pipeline including previously published methods for the prediction of signal peptides (SPEP), glycophosphatidylinositol (GPI) anchors (PredGPI), all-alpha membrane topology (ENSEMBLE), and a recent method (MemLoci) that specifically discriminates the localization of eukaryotic membrane proteins in: ‘cell membrane’, ‘internal membranes’, ‘organelle membranes’. MemLoci scores with accuracy of 70% and generalized correlation coefficient (GCC) of 0.50 on a rigorous homology-unbiased validation set and overpasses other predictors for subcellular localization. The annotation process is based both on inheritance through homology and computational methods. Each submitted protein first retrieves, when available, up to 25 similar proteins (with sequence identity ≥50% and alignment coverage ≥50% on both sequences). This helps the identification of membrane-associated proteins and detailed localization tags. Each protein is also filtered for the presence of a GPI anchor [0.8% false positive rate (FPR)]. A positive score of GPI anchor prediction labels the sequence as exposed to ‘Cell surface’. Concomitantly the sequence is analysed for the presence of a signal peptide and classified with MemLoci into one of three discriminated classes. Finally the sequence is filtered for predicting its putative all-alpha protein membrane topology (FPR <1%). The web server is available at: http://mu2py.biocomp.unibo.it/mempype.
Affiliation(s)
- Andrea Pierleoni
- Externautics s.p.a.-Bioinformatics, Via Fiorentina 1, 53100 Siena, Italy.
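The homology step described in the MemPype abstract above (up to 25 similar proteins, each with sequence identity ≥50% and alignment coverage ≥50% on both sequences) can be sketched as a simple filter. This is an illustrative sketch only, not MemPype's code: the `Hit` record, the `select_homologs` function, and the choice to rank passing hits by identity are assumptions, since the abstract does not say how the retained proteins are ordered.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise alignment between the query and a database protein (hypothetical record)."""
    subject_id: str
    identity: float          # fraction of identical aligned positions, 0..1
    query_coverage: float    # aligned fraction of the query sequence, 0..1
    subject_coverage: float  # aligned fraction of the subject sequence, 0..1


def select_homologs(hits, min_identity=0.5, min_coverage=0.5, max_hits=25):
    """Keep hits that meet the identity and two-sided coverage thresholds."""
    passing = [
        h for h in hits
        if h.identity >= min_identity
        and h.query_coverage >= min_coverage    # coverage on the query
        and h.subject_coverage >= min_coverage  # coverage on the subject
    ]
    # Assumption: prefer the most similar proteins when more than max_hits pass.
    passing.sort(key=lambda h: h.identity, reverse=True)
    return passing[:max_hits]
```

Requiring coverage on both sequences excludes hits where only a short domain of a much longer protein aligns, which would otherwise pass an identity threshold on its own.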
|