1
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins lack experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its function, the functioning of the cell, and its possible interactions with other proteins. Fast, reliable, and accurate predictors provide platforms to harness the abundance of sequence data to predict subcellular locations. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories, followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by taxonomies of prediction tools that highlight their relevant areas and examples for straightforward categorization and ease of understanding. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools and can serve as a quick guide for researchers to identify and explore recent advances in the literature.
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
2
Kumar R, Dhanda SK. Bird Eye View of Protein Subcellular Localization Prediction. Life (Basel) 2020; 10:E347. [PMID: 33327400 PMCID: PMC7764902 DOI: 10.3390/life10120347] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/24/2020] [Revised: 12/11/2020] [Accepted: 12/11/2020] [Indexed: 12/12/2022] Open
Abstract
Proteins are made up of long chains of amino acids and perform a variety of functions in different organisms. The activity of a protein is determined by the nucleotide sequence of its gene and by its 3D structure. In addition, proteins must be targeted to their specific locations or compartments to perform their functions. The challenge of computationally predicting the subcellular localization of proteins is addressed by various in silico methods. In this review, we survey the progress in this field and offer a bird's-eye view consisting of a comprehensive listing of tools, types of input features explored, machine learning approaches employed, and evaluation metrics applied. We hope the review will be useful for researchers working in the field of protein localization prediction.
Affiliation(s)
- Ravindra Kumar
- Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, NIH, 9609 Medical Center Drive, Rockville, MD 20850, USA
- Sandeep Kumar Dhanda
- Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
3
Han GS, Yu ZG. ML-rRBF-ECOC: A Multi-Label Learning Classifier for Predicting Protein Subcellular Localization with Both Single and Multiple Sites. CURR PROTEOMICS 2019. [DOI: 10.2174/1570164616666190103143945] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Background:
The subcellular localization of a protein is closely related to its functions and interactions. Growing evidence shows that proteins may simultaneously exist at, or move between, two or more different subcellular localizations. Therefore, predicting protein subcellular localization is an important but challenging problem.
Observation:
Most existing methods for predicting protein subcellular localization assume that a protein is located at a single site. Although a few methods have been proposed to deal with proteins with multiple sites, correlations between subcellular localizations are not efficiently taken into account. In this paper, we propose an integrated method for predicting protein subcellular localizations with both single and multiple sites.
Methods:
First, we extend the Multi-Label Radial Basis Function (ML-RBF) method to a regularized version and augment the first layer of ML-RBF to take local correlations between subcellular localizations into account. Second, we embed the modified ML-RBF into a multi-label Error-Correcting Output Codes (ECOC) method in order to further consider the dependency between subcellular localizations. We name our method ML-rRBF-ECOC. Finally, the performance of ML-rRBF-ECOC is evaluated on three benchmark datasets.
Results:
The results demonstrate that ML-rRBF-ECOC is highly competitive with the related multi-label learning method and with some state-of-the-art methods for predicting protein subcellular localizations with multiple sites. Considering the dependency between subcellular localizations can improve prediction performance.
Conclusion:
This also indicates that correlations between different subcellular localizations really exist. Our method at least plays a complementary role to existing methods for predicting protein subcellular localizations with multiple sites.
Affiliation(s)
- Guo-Sheng Han
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan 411105, China
- Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan 411105, China
4
Chase EE, Robicheau BM, Veinot S, Breton S, Stewart DT. The complete mitochondrial genome of the hermaphroditic freshwater mussel Anodonta cygnea (Bivalvia: Unionidae): in silico analyses of sex-specific ORFs across order Unionoida. BMC Genomics 2018; 19:221. [PMID: 29587633 PMCID: PMC5870820 DOI: 10.1186/s12864-018-4583-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/23/2017] [Accepted: 03/07/2018] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Doubly uniparental inheritance (DUI) of mitochondrial DNA in bivalves is a fascinating exception to strictly maternal inheritance as practiced by all other animals. Recent work on DUI suggests that there may be unique regions of the mitochondrial genomes that play a role in sex determination and/or sexual development in freshwater mussels (order Unionoida). In this study, one complete mitochondrial genome of the hermaphroditic swan mussel, Anodonta cygnea, is sequenced and compared to the complete mitochondrial genome of the gonochoric duck mussel, Anodonta anatina. An in silico assessment of novel proteins found within freshwater bivalve species (known as F-, H-, and M-open reading frames or ORFs) is conducted, with special attention to putative transmembrane domains (TMs), signal peptides (SPs), signal cleavage sites (SCS), subcellular localization, and potential control regions. Characteristics of TMs are also examined across freshwater mussel lineages. RESULTS In silico analyses suggest the presence of SPs and SCSs and provide some insight into possible function(s) of these novel ORFs. The assessed confidence in these structures and functions was highly variable, possibly due to the novelty of these proteins. The number and topology of putative TMs appear to be maintained among both F- and H-ORFs; however, this is not the case for M-ORFs. There does not appear to be a typical control region in H-type mitochondrial DNA, especially given the loss of tandem repeats in unassigned regions when compared to F-type mtDNA. CONCLUSION In silico analyses provide a useful tool to discover patterns in DUI and to navigate further in situ analyses related to DUI in freshwater mussels. In situ analysis will be necessary to further explore the intracellular localizations and possible role of these open reading frames in the process of sex determination in freshwater mussels.
Affiliation(s)
- E. E. Chase
- Department of Biology, Acadia University, Wolfville, NS Canada
- B. M. Robicheau
- Department of Biology, Dalhousie University, Halifax, NS Canada
- S. Veinot
- Department of Biology, Dalhousie University, Halifax, NS Canada
- S. Breton
- Département de Sciences Biologiques, Université de Montréal, Montréal, QC, Canada
- D. T. Stewart
- Department of Biology, Acadia University, Wolfville, NS Canada
5
Kunze M. Predicting Peroxisomal Targeting Signals to Elucidate the Peroxisomal Proteome of Mammals. Subcell Biochem 2018; 89:157-199. [PMID: 30378023 DOI: 10.1007/978-981-13-2233-4_7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
Peroxisomes harbor a plethora of proteins, but the peroxisomal proteome as the entirety of all peroxisomal proteins is still unknown for mammalian species. Computational algorithms can be used to predict the subcellular localization of proteins based on their amino acid sequence, and this method has been amply used to forecast the intracellular fate of individual proteins. However, when applying such algorithms systematically to all proteins of an organism, the prediction of its peroxisomal proteome in silico should be possible. Therefore, a reliable detection of peroxisomal targeting signals (PTS) acting as postal codes for the intracellular distribution of the encoding protein is crucial. Peroxisomal proteins can utilize different routes to reach their destination depending on the type of PTS. Accordingly, independent prediction algorithms have been developed for each type of PTS, but only those for type-1 motifs (PTS1) have so far reached a satisfying predictive performance. This is partially due to the low number of peroxisomal proteins limiting the power of statistical analyses and partially due to specific properties of peroxisomal protein import, which render functional PTS motifs inactive in specific contexts. Moreover, the prediction of the peroxisomal proteome is limited by the high number of proteins encoded in mammalian genomes, which causes numerous false positive predictions even when using reliable algorithms and buries the few yet unidentified peroxisomal proteins. Thus, the application of prediction algorithms to identify all peroxisomal proteins is currently ineffective as a stand-alone method, but can display its full potential when combined with other methods.
Affiliation(s)
- Markus Kunze
- Department of Pathobiology of the Nervous System, Center for Brain Research, Medical University of Vienna, Vienna, Austria.
6
Huang G, Ulrich PN, Storey M, Johnson D, Tischer J, Tovar JA, Moreno SNJ, Orlando R, Docampo R. Proteomic analysis of the acidocalcisome, an organelle conserved from bacteria to human cells. PLoS Pathog 2014; 10:e1004555. [PMID: 25503798 PMCID: PMC4263762 DOI: 10.1371/journal.ppat.1004555] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Received: 08/15/2014] [Accepted: 11/05/2014] [Indexed: 01/12/2023] Open
Abstract
Acidocalcisomes are acidic organelles present in a diverse range of organisms from bacteria to human cells. In this study acidocalcisomes were purified from the model organism Trypanosoma brucei, and their protein composition was determined by mass spectrometry. The results, along with those that we previously reported, show that acidocalcisomes are rich in pumps and transporters, involved in phosphate and cation homeostasis, and calcium signaling. We validated the acidocalcisome localization of seven new, putative, acidocalcisome proteins (phosphate transporter, vacuolar H+-ATPase subunits a and d, vacuolar iron transporter, zinc transporter, polyamine transporter, and acid phosphatase), confirmed the presence of six previously characterized acidocalcisome proteins, and validated the localization of five novel proteins to different subcellular compartments by expressing them fused to epitope tags in their endogenous loci or by immunofluorescence microscopy with specific antibodies. Knockdown of several newly identified acidocalcisome proteins by RNA interference (RNAi) revealed that they are essential for the survival of the parasites. These results provide a comprehensive insight into the unique composition of acidocalcisomes of T. brucei, an important eukaryotic pathogen, and direct evidence that acidocalcisomes are especially adapted for the accumulation of polyphosphate.
Affiliation(s)
- Guozhong Huang
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Paul N Ulrich
- Department of Biology, Georgia State University, Atlanta, Georgia, United States of America
- Melissa Storey
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Darryl Johnson
- Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia, United States of America
- Julie Tischer
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Javier A Tovar
- Department of Biology, Georgia State University, Atlanta, Georgia, United States of America
- Silvia N J Moreno
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Ron Orlando
- Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia, United States of America
- Roberto Docampo
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
7
Céspedes N, Habel C, Lopez-Perez M, Castellanos A, Kajava AV, Servis C, Felger I, Moret R, Arévalo-Herrera M, Corradin G, Herrera S. Plasmodium vivax antigen discovery based on alpha-helical coiled coil protein motif. PLoS One 2014; 9:e100440. [PMID: 24959747 PMCID: PMC4069070 DOI: 10.1371/journal.pone.0100440] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/06/2014] [Accepted: 05/23/2014] [Indexed: 01/08/2023] Open
Abstract
Protein α-helical coiled coil structures that elicit antibody responses, which block critical functions of medically important microorganisms, represent a means for vaccine development. By using bioinformatics algorithms, a total of 50 antigens with α-helical coiled coil motifs orthologous to Plasmodium falciparum were identified in the P. vivax genome. The peptides identified in silico were chemically synthesized; circular dichroism studies indicated partial or high α-helical content. Antigenicity was evaluated using human sera samples from malaria-endemic areas of Colombia and Papua New Guinea. Eight of these fragments were selected and used to assess immunogenicity in BALB/c mice. ELISA assays indicated strong reactivity of serum samples from individuals residing in malaria-endemic regions and sera of immunized mice, with the α-helical coiled coil structures. In addition, ex vivo production of IFN-γ by murine mononuclear cells confirmed the immunogenicity of these structures and the presence of T-cell epitopes in the peptide sequences. Moreover, sera of mice immunized with four of the eight antigens recognized native proteins on blood-stage P. vivax parasites, and antigenic cross-reactivity with three of the peptides was observed when reacted with both the P. falciparum orthologous fragments and whole parasites. Results here point to the α-helical coiled coil peptides as possible P. vivax malaria vaccine candidates as were observed for P. falciparum. Fragments selected here warrant further study in humans and non-human primate models to assess their protective efficacy as single components or assembled as hybrid linear epitopes.
MESH Headings
- Amino Acid Motifs
- Animals
- Antibodies, Protozoan/immunology
- Antigens, Protozoan/chemistry
- Antigens, Protozoan/genetics
- Antigens, Protozoan/immunology
- Circular Dichroism
- Computational Biology
- Cross Reactions/immunology
- Databases, Genetic
- Epitopes, T-Lymphocyte/chemistry
- Epitopes, T-Lymphocyte/immunology
- Female
- Genome, Protozoan
- Histocompatibility Antigens Class II/immunology
- Humans
- Immunity, Cellular
- Immunoglobulin G/blood
- Immunoglobulin G/immunology
- Mice
- Peptides/chemistry
- Peptides/immunology
- Plasmodium vivax/genetics
- Plasmodium vivax/immunology
- Protein Structure, Secondary
Affiliation(s)
- Nora Céspedes
- Malaria Vaccine and Drug Development Center (MVDC), Cali, Colombia
- School of Health, University of Valle, Cali, Colombia
- Catherine Habel
- Biochemistry Department, University of Lausanne, Epalinges, Switzerland
- Angélica Castellanos
- Malaria Vaccine and Drug Development Center (MVDC), Cali, Colombia
- Fundación Centro de Primates, Cali, Colombia
- Andrey V. Kajava
- Centre de Recherches de Biochimie Macromoleculaire (CRBM) and Institut de Biologie Computationnelle (IBC), CNRS, University of Montpellier, Montpellier, France
- University ITMO, St. Petersburg, Russia
- Catherine Servis
- Biochemistry Department, University of Lausanne, Epalinges, Switzerland
- Ingrid Felger
- Swiss Tropical and Public Health Institute, Basel, Switzerland
- Remy Moret
- Hôpital Saint Camille, Ouagadougou, Burkina Faso
- Myriam Arévalo-Herrera
- Malaria Vaccine and Drug Development Center (MVDC), Cali, Colombia
- School of Health, University of Valle, Cali, Colombia
- Sócrates Herrera
- Malaria Vaccine and Drug Development Center (MVDC), Cali, Colombia
- Caucaseco Scientific Research Center, Cali, Colombia
8
Computational and experimental approaches to reveal the effects of single nucleotide polymorphisms with respect to disease diagnostics. Int J Mol Sci 2014; 15:9670-717. [PMID: 24886813 PMCID: PMC4100115 DOI: 10.3390/ijms15069670] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 04/08/2014] [Revised: 05/15/2014] [Accepted: 05/16/2014] [Indexed: 12/25/2022] Open
Abstract
DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules.
9
Fredericks WJ, McGarvey T, Wang H, Zheng Y, Fredericks NJ, Yin H, Wang LP, Hsiao W, Lee R, Weiss JS, Nickerson ML, Kruth HS, Rauscher FJ, Malkowicz SB. The TERE1 protein interacts with mitochondrial TBL2: regulation of trans-membrane potential, ROS/RNS and SXR target genes. J Cell Biochem 2013; 114:2170-87. [PMID: 23564352 DOI: 10.1002/jcb.24567] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 03/30/2013] [Accepted: 04/02/2013] [Indexed: 12/12/2022]
Abstract
We originally discovered TERE1 as a potential tumor suppressor protein based upon reduced expression in bladder and prostate cancer specimens and growth inhibition of tumor cell lines/xenografts upon ectopic expression. Analysis of TERE1 (aka UBIAD1) has shown it is a prenyltransferase enzyme in the natural biosynthetic pathways for both vitamin K-2 and COQ10 production and exhibits multiple subcellular localizations including mitochondria, endoplasmic reticulum, and Golgi. Vitamin K-2 is involved in mitochondrial electron transport, SXR nuclear hormone receptor signaling and redox cycling: together these functions may form the basis for tumor suppressor function. To gain further insight into mechanisms of growth suppression and enzymatic regulation of TERE1, we isolated TERE1-associated proteins and identified the WD40 repeat, mitochondrial protein TBL2. We examined whether disease-specific mutations in TERE1 affected interactions with TBL2 and the role of each protein in altering mitochondrial function, ROS/RNS production and SXR target gene regulation. Biochemical binding assays demonstrated a direct, high-affinity interaction between TERE1 and TBL2 proteins; TERE1 was localized to both mitochondrial and non-mitochondrial membranes whereas TBL2 was predominantly mitochondrial; multiple independent single amino acid substitutions in TERE1 which cause a human hereditary corneal disease reduced binding to TBL2, strongly suggesting the relevance of this interaction. Ectopic TERE1 expression elevated mitochondrial trans-membrane potential, oxidative stress, NO production, and activated SXR targets. A TERE1-TBL2 complex likely functions in oxidative/nitrosative stress, lipid metabolism, and SXR signaling pathways in its role as a tumor suppressor.
Affiliation(s)
- William J Fredericks
- Division of Urology, Department of Surgery, University of Pennsylvania and Veterans Affairs Medical Center Philadelphia, Philadelphia, Pennsylvania 19104, USA.
10
Zhang Y, Fang C, Bao H, Fan H, Shen H, Yang P. Nuclear proteome profile of C57BL/6J mouse liver. SCIENCE CHINA-LIFE SCIENCES 2013; 56:513-23. [PMID: 23737002 DOI: 10.1007/s11427-013-4488-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/29/2012] [Accepted: 04/19/2013] [Indexed: 11/29/2022]
Abstract
The liver proteome can serve as a reference to better understand both disease mechanisms and possible therapeutics, since the liver is an important organ in the body that performs a large number of tasks. Here we identify the organelle proteome of C57BL/6J mouse liver nuclei as a promising strategy to enrich low abundance proteins, in the sense that analysis of whole liver cells is rather complex for current techniques and may not be suitable for proteins with low abundance. Evaluation of nucleus integrity and purity was performed to demonstrate the effectiveness of the optimized isolation procedure. The extracted nuclear proteins were identified by 2-DE MS analyses, and a total of 748 proteins were identified. Bioinformatic analyses were performed to demonstrate the physicochemical properties, cellular locations and functions of the proteins.
Affiliation(s)
- Yang Zhang
- School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai 200433, China
11
Bhowmick P, Pancsa R, Guharoy M, Tompa P. Functional diversity and structural disorder in the human ubiquitination pathway. PLoS One 2013; 8:e65443. [PMID: 23734257 PMCID: PMC3667038 DOI: 10.1371/journal.pone.0065443] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 01/07/2013] [Accepted: 04/24/2013] [Indexed: 02/04/2023] Open
Abstract
The ubiquitin-proteasome system plays a central role in cellular regulation and protein quality control (PQC). The system is built as a pyramid of increasing complexity, with two E1 (ubiquitin activating), few dozen E2 (ubiquitin conjugating) and several hundred E3 (ubiquitin ligase) enzymes. By collecting and analyzing E3 sequences from the KEGG BRITE database and literature, we assembled a coherent dataset of 563 human E3s and analyzed their various physical features. We found an increase in structural disorder of the system with multiple disorder predictors (IUPred – E1: 5.97%, E2: 17.74%, E3: 20.03%). E3s that can bind E2 and substrate simultaneously (single subunit E3, ssE3) have significantly higher disorder (22.98%) than E3s in which E2 binding (multi RING-finger, mRF, 0.62%), scaffolding (6.01%) and substrate binding (adaptor/substrate recognition subunits, 17.33%) functions are separated. In ssE3s, the disorder was localized in the substrate/adaptor binding domains, whereas the E2-binding RING/HECT-domains were structured. To demonstrate the involvement of disorder in E3 function, we applied normal modes and molecular dynamics analyses to show how a disordered and highly flexible linker in human CBL (an E3 that acts as a regulator of several tyrosine kinase-mediated signalling pathways) facilitates long-range conformational changes bringing substrate and E2-binding domains towards each other and thus assisting in ubiquitin transfer. E3s with multiple interaction partners (as evidenced by data in STRING) also possess elevated levels of disorder (hubs, 22.90% vs. non-hubs, 18.36%). Furthermore, a search in PDB uncovered 21 distinct human E3 interactions, in 7 of which the disordered region of E3s undergoes induced folding (or mutual induced folding) in the presence of the partner. 
In conclusion, our data highlight the primary role of structural disorder in the functions of E3 ligases, which manifests itself in the substrate/adaptor binding functions as well as in the mechanism of ubiquitin transfer by long-range conformational transitions.
Affiliation(s)
- Pallab Bhowmick
- VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Rita Pancsa
- VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Mainak Guharoy
- VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Peter Tompa
- VIB Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
12
MXL-3 and HLH-30 transcriptionally link lipolysis and autophagy to nutrient availability. Nat Cell Biol 2013; 15:668-76. [PMID: 23604316 DOI: 10.1038/ncb2741] [Citation(s) in RCA: 258] [Impact Index Per Article: 21.5] [Received: 11/17/2012] [Accepted: 03/22/2013] [Indexed: 11/08/2022]
Abstract
Fat is stored or mobilized according to food availability. Malfunction of the mechanisms that ensure this coordination underlies metabolic diseases in humans. In mammals, lysosomal and autophagic function is required for normal fat storage and mobilization in the presence or absence of food. Autophagy is tightly linked to nutrients. However, if and how lysosomal lipolysis is coupled to nutritional status remains to be determined. Here we identify MXL-3 and HLH-30 (TFEB orthologue) [corrected] as transcriptional switches coupling lysosomal lipolysis and autophagy to nutrient availability and controlling fat storage and ageing in Caenorhabditis elegans. Transcriptional coupling of lysosomal lipolysis and autophagy to nutrients is also observed in mammals. Thus, MXL-3 and HLH-30 orchestrate an adaptive and conserved cellular response to nutritional status and regulate lifespan.
13
The RAVEN toolbox and its use for generating a genome-scale metabolic model for Penicillium chrysogenum. PLoS Comput Biol 2013; 9:e1002980. [PMID: 23555215 PMCID: PMC3605104 DOI: 10.1371/journal.pcbi.1002980] [Citation(s) in RCA: 275] [Impact Index Per Article: 22.9] [Received: 09/11/2012] [Accepted: 01/24/2013] [Indexed: 01/06/2023] Open
Abstract
We present the RAVEN (Reconstruction, Analysis and Visualization of Metabolic Networks) Toolbox: a software suite that allows for semi-automated reconstruction of genome-scale models. It makes use of published models and/or the KEGG database, coupled with extensive gap-filling and quality control features. The software suite also contains methods for visualizing simulation results and omics data, as well as a range of methods for performing simulations and analyzing the results. The software is a useful tool for system-wide data analysis in a metabolic context and for streamlined reconstruction of metabolic networks based on protein homology. The RAVEN Toolbox workflow was applied in order to reconstruct a genome-scale metabolic model for the important microbial cell factory Penicillium chrysogenum Wisconsin54-1255. The model was validated in a bibliomic study covering a total of 440 references, and it comprises 1471 unique biochemical reactions and 1006 ORFs. It was then used to study the roles of ATP and NADPH in the biosynthesis of penicillin, and to identify potential metabolic engineering targets for maximization of penicillin production.
Genome-scale models (GEMs) are large stoichiometric models of cell metabolism, where the goal is to incorporate every metabolic transformation that an organism can perform. Such models have been extensively used for the study of bacterial metabolism, in particular for metabolic engineering purposes. More recently, the use of GEMs for eukaryotic organisms has become increasingly widespread. Since these models typically involve thousands of metabolic reactions, reconstructing and validating them can be a very complex task. We have developed a software suite, RAVEN Toolbox, which aims at automating parts of the reconstruction process in order to allow for faster reconstruction of high-quality GEMs. The software is particularly well suited for reconstruction of models for eukaryotic organisms, due to how it deals with sub-cellular localization of reactions. We used the software for reconstructing a model of the filamentous fungus Penicillium chrysogenum, the organism used in penicillin production and an important microbial cell factory. The resulting model was validated through an extensive literature survey and by comparison with published fermentation data. The model was used for the identification of transcriptionally regulated metabolic bottlenecks in order to increase the yield in penicillin fermentations. In this paper we present the RAVEN Toolbox and the GEM for P. chrysogenum.
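To make the term "stoichiometric model" concrete, the following minimal sketch builds a toy network and checks whether a flux vector satisfies the steady-state mass balance S·v = 0. It is not RAVEN Toolbox code; the metabolite names, reaction names, and matrix below are hypothetical and are not taken from the P. chrysogenum model.

```python
# Toy stoichiometric model. All names and coefficients are hypothetical.
METABOLITES = ["A", "B"]
REACTIONS = ["uptake_A", "A_to_B", "export_B"]

# Rows are metabolites, columns are reactions.
# Negative coefficient = consumed, positive coefficient = produced.
S = [
    [1, -1, 0],   # A: produced by uptake_A, consumed by A_to_B
    [0, 1, -1],   # B: produced by A_to_B, consumed by export_B
]


def mass_balance(S, v):
    """Return the net production rate of each metabolite (the product S . v)."""
    return [sum(coef * flux for coef, flux in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True if no metabolite accumulates or is depleted under fluxes v."""
    return all(abs(rate) <= tol for rate in mass_balance(S, v))
```

A flux vector such as `[2.0, 2.0, 2.0]` balances both metabolites, whereas `[2.0, 1.0, 1.0]` leaves metabolite A accumulating. A real genome-scale model applies the same constraint to thousands of reactions.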
|
14
|
Han GS, Yu ZG, Anh V, Krishnajith APD, Tian YC. An ensemble method for predicting subnuclear localizations from primary protein structures. PLoS One 2013; 8:e57225. [PMID: 23460833 PMCID: PMC3584121 DOI: 10.1371/journal.pone.0057225] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 07/18/2012] [Accepted: 01/18/2013] [Indexed: 12/04/2022] Open
Abstract
Background: Predicting protein subnuclear localization is a challenging problem. Some previous works based on non-sequence information, including Gene Ontology annotations and kernel fusion, have their respective limitations. The aim of this work is twofold: first, to propose a novel individual feature extraction method; second, to develop an ensemble method that improves prediction performance using comprehensive information represented as a high-dimensional feature vector obtained by 11 feature extraction methods.
Methodology/Principal Findings: A novel two-stage multiclass support vector machine is proposed to predict protein subnuclear localizations. It only considers those feature extraction methods based on amino acid classifications and physicochemical properties. In order to speed up our system, an automatic search method for the kernel parameter is used. The prediction performance of our method is evaluated on four datasets: the Lei dataset, the multi-localization dataset, the SNL9 dataset and a new independent dataset. The overall accuracy of prediction for 6 localizations on the Lei dataset is 75.2% and that for 9 localizations on the SNL9 dataset is 72.1% in the leave-one-out cross validation, 71.7% for the multi-localization dataset and 69.8% for the new independent dataset, respectively. Comparisons with existing methods show that our method performs better for both single-localization and multi-localization proteins and achieves more balanced sensitivities and specificities on large-size and small-size subcellular localizations. The overall accuracy improvements are 4.0% and 4.7% for single-localization proteins and 6.5% for multi-localization proteins. The reliability and stability of our classification model are further confirmed by permutation analysis.
Conclusions: It can be concluded that our method is effective and valuable for predicting protein subnuclear localizations. A web server has been designed to implement the proposed method. It is freely available at http://bioinformatics.awowshop.com/snlpred_page.php.
Affiliation(s)
- Guo Sheng Han
- School of Mathematics and Computational Science, Xiangtan University, Xiangtan City, Hunan, China
- Zu Guo Yu
- School of Mathematics and Computational Science, Xiangtan University, Xiangtan City, Hunan, China
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- * E-mail:
- Vo Anh
- School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
- Anaththa P. D. Krishnajith
- School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Queensland, Australia
- Yu-Chu Tian
- School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Queensland, Australia
|
15
|
Lin Q, Tan HT, Lim HSR, Chung MCM. Sieving through the cancer secretome. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1834:2360-71. [PMID: 23376431 DOI: 10.1016/j.bbapap.2013.01.030] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 11/14/2012] [Revised: 01/03/2013] [Accepted: 01/24/2013] [Indexed: 12/22/2022]
Abstract
Cancer is among the most prevalent and serious health problems worldwide. Therefore, there is an urgent need for novel cancer biomarkers with high sensitivity and specificity for early detection and management of the disease. The cancer secretome, encompassing all the proteins that are secreted by cancer cells, is a promising source of biomarkers as the secreted proteins are most likely to enter the blood circulation. Moreover, since secreted proteins are responsible for signaling and communication with the tumor microenvironment, studying the cancer secretome would further the understanding of cancer biology. Latest developments in proteomics technologies have significantly advanced the study of the cancer secretome. In this review, we will present an overview of the secretome sample preparation process and summarize the data from recent secretome studies of six common cancers with high mortality (breast, colorectal, gastric, liver, lung and prostate cancers). In particular, we will focus on the various platforms that were employed and discuss the clinical applicability of the key findings in these studies. This article is part of a Special Issue entitled: An Updated Secretome.
Affiliation(s)
- Qifeng Lin
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, 117597 Singapore
|
16
|
Yu D, Wu X, Shen H, Yang J, Tang Z, Qi Y, Yang J. Enhancing Membrane Protein Subcellular Localization Prediction by Parallel Fusion of Multi-View Features. IEEE Trans Nanobioscience 2012; 11:375-85. [PMID: 22875262 DOI: 10.1109/tnb.2012.2208473] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/06/2022]
Affiliation(s)
- Dongjun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
|
17
|
A novel algorithm combining support vector machine with the discrete wavelet transform for the prediction of protein subcellular localization. Comput Biol Med 2012; 42:180-7. [DOI: 10.1016/j.compbiomed.2011.11.006] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 01/19/2011] [Revised: 09/29/2011] [Accepted: 11/15/2011] [Indexed: 02/03/2023]
|
18
|
Mooney C, Wang YH, Pollastri G. SCLpred: protein subcellular localization prediction by N-to-1 neural networks. Bioinformatics 2011; 27:2812-9. [PMID: 21873639 DOI: 10.1093/bioinformatics/btr494] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Indexed: 11/13/2022] Open
Abstract
Summary: Knowledge of the subcellular location of a protein provides valuable information about its function and possible interaction with other proteins. In the post-genomic era, fast and accurate predictors of subcellular location are required if this abundance of sequence data is to be fully exploited. We have developed a subcellular localization predictor (SCLpred), which predicts the location of a protein into four classes for animals and fungi and five classes for plants (secreted, cytoplasm, nucleus, mitochondrion and chloroplast) using machine learning models trained on large non-redundant sets of protein sequences. The algorithm powering SCLpred is a novel Neural Network (N-to-1 Neural Network, or N1-NN) we have developed, which is capable of mapping whole sequences into single properties (a functional class, in this work) without resorting to predefined transformations, but rather by adaptively compressing the sequence into a hidden feature vector. We benchmark SCLpred against other publicly available predictors using two benchmarks including a new subset of Swiss-Prot Release 2010_06. We show that SCLpred surpasses the state of the art. The N1-NN algorithm is fully general and may be applied to a host of problems of similar shape, that is, in which a whole sequence needs to be mapped into a fixed-size array of properties, and the adaptive compression it operates may shed light on the space of protein sequences.
Availability: The predictive systems described in this article are publicly available as a web server at http://distill.ucd.ie/distill/.
Contact: gianluca.pollastri@ucd.ie.
Affiliation(s)
- Catherine Mooney
- School of Computer Science and Informatics, University College Dublin, Belfield, Ireland
|
19
|
Sheng JJ, Acquaah-Mensah GK. Subcellular location and molecular mobility of human cytosolic sulfotransferase 1C1 in living human embryonic kidney 293 cells. Drug Metab Dispos 2011; 39:1334-7. [PMID: 21546557 DOI: 10.1124/dmd.111.039537] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022] Open
Abstract
Cytosolic sulfotransferases were first isolated from the hepatic cytosol, and they have been localized in the cytoplasm of formaldehyde-fixed human cell samples. The current work was carried out to determine the subcellular localization and molecular mobility of cytosolic sulfotransferases in living human embryonic kidney (HEK) 293 cells. In this work, the subcellular location of human cytosolic sulfotransferase 1C1 (SULT1C1) was studied in cultured HEK293 cells using confocal laser-scanning microscopy. A green fluorescent protein (GFP)-tagged SULT1C1 protein was localized in the cytoplasm of living HEK293 cells. This is consistent with results from previous studies on several other cytosolic sulfotransferase isoforms. Fluorescence recovery after photobleaching microscopy was performed to assess the molecular mobility of the expressed GFP-SULT1C1 molecules. The results suggested that the expressed recombinant GFP-SULT1C1 molecules in living HEK293 cells may include both mobile and immobile populations. To obtain additional insights into the subcellular location of SULT1C1, two machine learning algorithms, Sequential Minimal Optimization and Multilayer Perceptron, were used to compute the probability distribution for the localization of SULT1C1 in nine selected cellular compartments. The resulting probability distribution suggested that the most likely subcellular location of SULT1C1 is the cytosol.
Affiliation(s)
- Jonathan J Sheng
- Department of Pharmaceutical Sciences, School of Pharmacy-Worcester/Manchester, Massachusetts College of Pharmacy and Health Sciences, Worcester, MA 01608, USA.
|
20
|
Ulrich PN, Jimenez V, Park M, Martins VP, Atwood J, Moles K, Collins D, Rohloff P, Tarleton R, Moreno SNJ, Orlando R, Docampo R. Identification of contractile vacuole proteins in Trypanosoma cruzi. PLoS One 2011; 6:e18013. [PMID: 21437209 PMCID: PMC3060929 DOI: 10.1371/journal.pone.0018013] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Received: 10/28/2010] [Accepted: 02/22/2011] [Indexed: 11/19/2022] Open
Abstract
Contractile vacuole complexes are critical components of cell volume regulation and have been shown to have other functional roles in several free-living protists. However, very little is known about the functions of the contractile vacuole complex of the parasite Trypanosoma cruzi, the etiologic agent of Chagas disease, other than a role in osmoregulation. Identification of the protein composition of these organelles is important for understanding their physiological roles. We applied a combined proteomic and bioinformatic approach to identify proteins localized to the contractile vacuole. Proteomic analysis of a T. cruzi fraction enriched for contractile vacuoles and analyzed by one-dimensional gel electrophoresis and LC-MS/MS resulted in the addition of 109 newly detected proteins to the group of expressed proteins of epimastigotes. We also identified different peptides that map to at least 39 members of the dispersed gene family 1 (DGF-1), providing evidence that many members of this family are simultaneously expressed in epimastigotes. Of the proteins present in the fraction, we selected several homologues with known localizations in contractile vacuoles of other organisms and others that we expected to be present in these vacuoles on the basis of their potential roles. We determined the localization of each by expression as GFP-fusion proteins or with specific antibodies. Six of these putative proteins (Rab11, Rab32, AP180, ATPase subunit B, VAMP1, and phosphate transporter) predominantly localized to the vacuole bladder. TcSNARE2.1, TcSNARE2.2, and calmodulin localized to the spongiome. Calmodulin was also cytosolic. Our results demonstrate the utility of combining subcellular fractionation, proteomic analysis, and bioinformatic approaches for localization of organellar proteins that are difficult to detect with whole cell methodologies. The CV localization of the proteins investigated revealed potential novel roles of these organelles in phosphate metabolism and provided information on the potential participation of adaptor protein complexes in their biogenesis.
Affiliation(s)
- Paul N. Ulrich
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Veronica Jimenez
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Miyoung Park
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Vicente P. Martins
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- James Atwood
- Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia, United States of America
- Kristen Moles
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Dalis Collins
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Peter Rohloff
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Rick Tarleton
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Silvia N. J. Moreno
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- Ron Orlando
- Complex Carbohydrate Research Center, University of Georgia, Athens, Georgia, United States of America
- Roberto Docampo
- Center for Tropical and Emerging Global Diseases and Department of Cellular Biology, University of Georgia, Athens, Georgia, United States of America
- * E-mail:
|
21
|
Pierleoni A, Martelli PL, Casadio R. MemLoci: predicting subcellular localization of membrane proteins in eukaryotes. Bioinformatics 2011; 27:1224-30. [DOI: 10.1093/bioinformatics/btr108] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 12/31/2022] Open
|
22
|
Laurila K, Vihinen M. PROlocalizer: integrated web service for protein subcellular localization prediction. Amino Acids 2011; 40:975-80. [PMID: 20811800 PMCID: PMC3040813 DOI: 10.1007/s00726-010-0724-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 06/16/2010] [Accepted: 08/10/2010] [Indexed: 12/01/2022]
Abstract
Subcellular localization is an important protein property, which is related to function, interactions and other features. As experimental determination of the localization can be tedious, especially for large numbers of proteins, a number of prediction tools have been developed. We developed the PROlocalizer service, which integrates 11 individual methods to predict 12 localizations in total for animal proteins. The method allows the submission of a number of proteins and mutations and generates a detailed, informative document of the predictions and the results obtained. PROlocalizer is available at http://bioinf.uta.fi/PROlocalizer/.
Affiliation(s)
- Kirsti Laurila
- Department of Signal Processing, Tampere University of Technology, P.O. Box 527, 33101 Tampere, Finland
- Institute of Medical Technology, University of Tampere, 33014 Tampere, Finland
- Mauno Vihinen
- Institute of Medical Technology, University of Tampere, 33014 Tampere, Finland
- Science Center, Tampere University Hospital, 33520 Tampere, Finland
|
23
|
Vertommen A, Panis B, Swennen R, Carpentier SC. Challenges and solutions for the identification of membrane proteins in non-model plants. J Proteomics 2011; 74:1165-81. [PMID: 21354347 DOI: 10.1016/j.jprot.2011.02.016] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 12/14/2010] [Revised: 02/04/2011] [Accepted: 02/16/2011] [Indexed: 01/27/2023]
Abstract
The workhorse for proteomics in non-model plants is classical two-dimensional electrophoresis, a combination of iso-electric focusing and SDS-PAGE. However, membrane proteins with multiple membrane spanning domains are hardly detected on classical 2-DE gels because of their low abundance and poor solubility in aqueous media. In the current review, solutions that have been proposed to handle these two problems in non-model plants are discussed. An overview of alternative techniques developed for membrane proteomics is provided together with a comparison of their strong and weak points. Subsequently, strengths and weaknesses of the different techniques and methods to evaluate the identification of membrane proteins are discussed. Finally, an overview of recent plant membrane proteome studies is provided with the used separation technique and the number of identified membrane proteins listed.
Affiliation(s)
- A Vertommen
- Laboratory of Tropical Crop Improvement, Department of Biosystems, K.U. Leuven, Kasteelpark Arenberg 13, B-3001 Heverlee, Belgium
|
24
|
Gianazza E, Eberini I, Sensi C, Barile M, Vergani L, Vanoni MA. Energy matters: mitochondrial proteomics for biomedicine. Proteomics 2011; 11:657-74. [PMID: 21241019 DOI: 10.1002/pmic.201000412] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 07/05/2010] [Revised: 09/22/2010] [Accepted: 11/03/2010] [Indexed: 12/16/2022]
Abstract
This review compiles results of medical relevance from mitochondrial proteomics, grouped either by the type of disease (genetic or degenerative) or by the mechanism involved (oxidative stress or apoptosis). The findings are discussed in light of our current understanding of uniformity/variability in cell responses to different stimuli. Specificities in the conceptual and technical approaches to human mitochondrial proteomics are also outlined.
Affiliation(s)
- Elisabetta Gianazza
- Dipartimento di Scienze Farmacologiche, Università degli Studi di Milano, Milano, Italy.
|
25
|
Mooney C, Wang YH, Pollastri G. De Novo Protein Subcellular Localization Prediction by N-to-1 Neural Networks. Computational Intelligence Methods for Bioinformatics and Biostatistics 2011. [DOI: 10.1007/978-3-642-21946-7_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/26/2023]
|
26
|
Abstract
Membrane proteins are key molecules in the cell and are important targets for drug development. Much effort has, therefore, been directed towards research of this group of proteins, but their hydrophobic nature can make working with them challenging. Here we discuss methodologies used in the study of the membrane proteome, specifically discussing approaches that circumvent technical issues specific to the membrane. In addition, we review several techniques used for visualization, qualification, quantitation and localization of membrane proteins. The combination of the techniques we describe holds great promise to allow full characterization of the membrane proteome and to map the dynamic changes within it essential for cellular function.
Affiliation(s)
- Arnoud J Groen
- Cambridge Centre for Proteomics, Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Cambridge, UK
|
27
|
Oikawa A, Joshi HJ, Rennie EA, Ebert B, Manisseri C, Heazlewood JL, Scheller HV. An integrative approach to the identification of Arabidopsis and rice genes involved in xylan and secondary wall development. PLoS One 2010; 5:e15481. [PMID: 21124849 PMCID: PMC2990762 DOI: 10.1371/journal.pone.0015481] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.1] [Received: 08/20/2010] [Accepted: 09/24/2010] [Indexed: 11/19/2022] Open
Abstract
Xylans constitute the major non-cellulosic component of plant biomass. Xylan biosynthesis is particularly pronounced in cells with secondary walls, implying that the synthesis network consists of a set of highly expressed genes in such cells. To improve the understanding of xylan biosynthesis, we performed a comparative analysis of co-expression networks between Arabidopsis and rice as reference species with different wall types. Many co-expressed genes were represented by orthologs in both species, which implies common biological features, while some gene families were only found in one of the species, and therefore likely to be related to differences in their cell walls. To predict the subcellular location of the identified proteins, we developed a new method, PFANTOM (plant protein family information-based predictor for endomembrane), which was shown to perform better for proteins in the endomembrane system than other available prediction methods. Based on the combined approach of co-expression and predicted cellular localization, we propose a model for Arabidopsis and rice xylan synthesis in the Golgi apparatus and signaling from plasma membrane to nucleus for secondary cell wall differentiation. As an experimental validation of the model, we show that an Arabidopsis mutant in the PGSIP1 gene encoding one of the Golgi localized candidate proteins has a highly decreased content of glucuronic acid in secondary cell walls and substantially reduced xylan glucuronosyltransferase activity.
Affiliation(s)
- Ai Oikawa
- Feedstocks Division, Joint BioEnergy Institute, Emeryville, California, USA
|
28
|
Shen YQ, Burger G. TESTLoc: protein subcellular localization prediction from EST data. BMC Bioinformatics 2010; 11:563. [PMID: 21078192 PMCID: PMC3000424 DOI: 10.1186/1471-2105-11-563] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 07/01/2010] [Accepted: 11/15/2010] [Indexed: 11/25/2022] Open
Abstract
Background: The eukaryotic cell has an intricate architecture with compartments and substructures dedicated to particular biological processes. Knowing the subcellular location of proteins not only indicates how bio-processes are organized in different cellular compartments, but also contributes to unravelling the function of individual proteins. Computational localization prediction is possible based on sequence information alone, and has been successfully applied to proteins from virtually all subcellular compartments and all domains of life. However, we realized that current prediction tools do not perform well on partial protein sequences such as those inferred from Expressed Sequence Tag (EST) data, limiting the exploitation of the large and taxonomically most comprehensive body of sequence information from eukaryotes.
Results: We developed a new predictor, TESTLoc, suited for subcellular localization prediction of proteins based on their partial sequence conceptually translated from ESTs (EST-peptides). A Support Vector Machine (SVM) is used as the computational method, and EST-peptides are represented by different features such as amino acid composition and physicochemical properties. When TESTLoc was applied to the most challenging test case (plant data), it yielded high accuracy (~85%).
Conclusions: TESTLoc is a localization prediction tool tailored for EST data. It provides a variety of models for the users to choose from, and is available for download at http://megasun.bch.umontreal.ca/~shenyq/TESTLoc/TESTLoc.html
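Amino acid composition, one of the feature types named in the abstract, can be illustrated with the short sketch below. It is a generic textbook formulation, not TESTLoc's own code: the function name, the residue ordering, and the choice to ignore unknown residues are assumptions made for this example.

```python
from collections import Counter

# The 20 standard amino acids in a fixed (alphabetical one-letter) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return the relative frequency of each standard amino acid.

    Residues outside the 20 standard letters (for example X) are ignored,
    which is an assumption of this sketch. An empty or fully unknown
    peptide yields an all-zero vector.
    """
    counts = Counter(res for res in peptide.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

The resulting 20-dimensional vector has the same length for any peptide, which is what lets a fixed-input classifier such as an SVM handle partial sequences of varying length.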
Affiliation(s)
- Yao-Qing Shen
- Robert-Cedergren Center for Bioinformatics and Genomics; Biochemistry Department, Université de Montréal, 2900 Edouard-Montpetit, Montreal, QC, H3T 1J4, Canada.
|
29
|
Lee YH, Tan HT, Chung MCM. Subcellular fractionation methods and strategies for proteomics. Proteomics 2010; 10:3935-56. [DOI: 10.1002/pmic.201000289] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.9] [Indexed: 12/21/2022]
|
30
|
Gudimella R, Nallapeta S, Varadwaj P, Suravajhala P. Fungome: Annotating proteins implicated in fungal pathogenesis. Bioinformation 2010; 5:202-207. [PMID: 21364798 PMCID: PMC3040500 DOI: 10.6026/97320630005202] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/29/2010] [Accepted: 08/25/2010] [Indexed: 12/03/2022] Open
Abstract
Sequencing the genomes of different pathogenic fungi has produced a plethora of genetic information. These "omics" data might be of great interest for probing strain diversity, identifying virulence factors and complementary genes in other fungal species, and, importantly, predicting the role of proteins specific to pathogenesis in humans. We propose a component called "fungome" for those fungal proteins implicated in pathogenesis which, we believe, will allow researchers to improve the annotation of fungal proteins.
Affiliation(s)
- Pritish Varadwaj
- Bioinformatics division, Indian Institute of Information Technology, Allahabad 211012, UP, India
- Prashanth Suravajhala
- Department of Science, Systems and Models, Roskilde University, 4000 Roskilde, Denmark
|
31
|
Zakeri P, Moshiri B, Sadeghi M. Prediction of protein submitochondria locations based on data fusion of various features of sequences. J Theor Biol 2010; 269:208-16. [PMID: 21040732 DOI: 10.1016/j.jtbi.2010.10.026] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Received: 08/10/2010] [Revised: 10/16/2010] [Accepted: 10/22/2010] [Indexed: 01/16/2023]
Abstract
In this study, predictors of protein submitochondrial location are developed based on various sequence features. Information about the submitochondrial location of a mitochondrial protein can provide a much better understanding of its function. We use ten representative models of protein samples, such as pseudo amino acid composition, dipeptide composition, functional domain composition, a combined discrete model based on predicted solvent accessibility and secondary structure elements, and a discrete model of pairwise sequence similarity. We construct a support vector machine (SVM) predictor for each representative model. In the leave-one-out cross-validation test, the overall accuracy of the predictor based on the discrete model of pairwise sequence similarity is 1% better than that of the best existing computational system for this problem. Moreover, we develop a method based on ordered weighted averaging (OWA), a data fusion operator, and apply it to the 11 best SVM-based classifiers constructed from various sequence features. This method is called Mito-Loc. The overall leave-one-out cross-validation accuracy obtained by Mito-Loc is about 95%, which indicates that the proposed approach is superior to the best previously reported approach.
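The ordered weighted averaging operator mentioned in the abstract can be sketched as follows. This is the generic OWA definition (sort the inputs in descending order, then take a weighted sum by rank), not the Mito-Loc implementation: the abstract does not say how the weights were chosen, and the function names, class labels, and score format here are hypothetical.

```python
def owa(values, weights):
    """Ordered weighted average: weights apply to ranks, not to sources.

    The largest value receives weights[0], the second largest weights[1],
    and so on. Weights are required to sum to 1.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * x for w, x in zip(weights, ordered))


def fuse_classifiers(scores_per_classifier, weights):
    """Fuse per-class scores from several classifiers with OWA.

    scores_per_classifier is a list of dicts mapping class label to score,
    one dict per classifier (a hypothetical format for this sketch).
    Returns the winning label and the fused score of every class.
    """
    classes = list(scores_per_classifier[0])
    fused = {
        label: owa([scores[label] for scores in scores_per_classifier], weights)
        for label in classes
    }
    best = max(fused, key=fused.get)
    return best, fused
```

With weights `[1, 0, 0]` the operator reduces to the maximum, and with equal weights it reduces to the arithmetic mean, so the weight vector controls how optimistic the fusion is.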
Affiliation(s)
- Pooya Zakeri
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
|
32
|
Prediction of midbody, centrosome and kinetochore proteins based on gene ontology information. Biochem Biophys Res Commun 2010; 401:382-4. [PMID: 20854791 DOI: 10.1016/j.bbrc.2010.09.061] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 09/02/2010] [Accepted: 09/14/2010] [Indexed: 01/21/2023]
Abstract
During cell division, a large number of proteins are assembled into three distinct organelles: the midbody, centrosome and kinetochore. Knowing the localization of microkit (midbody, centrosome and kinetochore) proteins will facilitate drug target discovery and provide novel insights into their functions. In this study, a support vector machine (SVM) model, MicekiPred, is presented to predict the localization of microkit proteins based on gene ontology (GO) information. A total accuracy of 77.51% was achieved using jackknife cross-validation. This result shows that the model will be an effective complementary tool for future experimental study. The prediction model and dataset used in this article can be freely downloaded from http://cobi.uestc.edu.cn/people/hlin/tools/MicekiPred/.
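Jackknife (leave-one-out) cross-validation, the evaluation protocol cited in the abstract, can be sketched as below. Each sample is held out once, the classifier is trained on the rest, and accuracy is the fraction of held-out samples predicted correctly. The nearest-neighbour classifier is only a dependency-free stand-in for the SVM, and all function names are hypothetical.

```python
def jackknife_accuracy(samples, labels, fit_predict):
    """Leave-one-out accuracy.

    fit_predict(train_x, train_y, query) must train on the given data and
    return the predicted label for query.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    correct = 0
    for i in range(len(samples)):
        # Hold out sample i and train on everything else.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_neighbour(train_x, train_y, query):
    """Stand-in classifier (not an SVM): label of the closest training vector."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    best = min(range(len(train_x)),
               key=lambda j: squared_distance(train_x[j], query))
    return train_y[best]
```

Because every sample serves as the test case exactly once, the estimate is deterministic for a given dataset, which is one reason the protocol is common on small benchmark sets.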
|
33
|
Yu L, Guo Y, Zhang Z, Li Y, Li M, Li G, Xiong W, Zeng Y. SecretP: a new method for predicting mammalian secreted proteins. Peptides 2010; 31:574-8. [PMID: 20045033 DOI: 10.1016/j.peptides.2009.12.026] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 11/13/2009] [Revised: 12/17/2009] [Accepted: 12/17/2009] [Indexed: 11/19/2022]
Abstract
In contrast to the large number of classically secreted proteins (CSPs) and non-secreted proteins (NSPs), only a few proteins have been experimentally proven to enter non-classical secretory pathways. It is therefore difficult to identify non-classically secreted proteins (NCSPs), and no methods are available for distinguishing the three types of proteins simultaneously. To solve this problem, data mining was performed first, and mammalian proteins exported via ER-Golgi-independent pathways were collected through extensive literature searches. In this paper, a support vector machine (SVM)-based ternary classifier named SecretP is proposed to predict mammalian secreted proteins by using pseudo-amino acid composition (PseAA) and five additional features. When distinguishing the three types of proteins, SecretP yielded an accuracy of 88.79%. When the method was evaluated on an independent test set of 92 human proteins, 76 of them were correctly predicted as NCSPs. On another public independent data set, the prediction results of SecretP were comparable to those of other existing computational methods. Therefore, SecretP can be a useful supplementary tool for future secretome studies. The web server SecretP and all supplementary tables listed in this paper are freely available at http://cic.scu.edu.cn/bioinformatics/secretp/index.htm.
Affiliation(s)
- Lezheng Yu
- College of Chemistry, Sichuan University, Chengdu 610064, PR China
|
34
|
Vallesi A, Di Pretoro B, Ballarini P, Apone F, Luporini P. A Novel Protein Kinase from the Ciliate Euplotes raikovi with Close Structural Identity to the Mammalian Intestinal and Male-Germ Cell Kinases: Characterization and Functional Implications in the Autocrine Pheromone Signaling Loop. Protist 2010; 161:250-63. [DOI: 10.1016/j.protis.2009.12.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2009] [Accepted: 11/21/2009] [Indexed: 12/01/2022]
|
35
|
Nevo I, Oberthuer A, Botzer E, Sagi-Assif O, Maman S, Pasmanik-Chor M, Kariv N, Fischer M, Yron I, Witz IP. Gene-expression-based analysis of local and metastatic neuroblastoma variants reveals a set of genes associated with tumor progression in neuroblastoma patients. Int J Cancer 2010; 126:1570-81. [PMID: 19739072 DOI: 10.1002/ijc.24889] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
Abstract
Metastasis is the primary cause of mortality in Neuroblastoma (NB) patients, but the metastatic process in NB is poorly understood. Metastasis is a multistep process that requires the coordinated action of many genes. The identification of genes that promote or suppress tumor metastasis can advance our understanding of this process. In the present study, we utilized a human NB xenograft model comprising local and metastatic NB variants, which was recently developed in our laboratory. We set out to identify molecular correlates of NB metastasis and to determine the clinical relevance of these molecules. We first performed genome-wide expression profiling of metastatic and nonmetastatic NB variants that have an identical genetic background. We found that some of the proteins highly expressed in the metastatic NB variants are localized in the cytoplasm and endoplasmic reticulum. Other proteins are linked to metabolic processes and signaling pathways, thereby supporting the invasive and metastatic state of the cells. Subsequently, we intersected the differentially expressed genes in the human xenografted variants with genes differentially expressed in Stage 1 and Stage 4 primary tumors of NB patients. By using the same gene-expression platform, molecular correlates associated with metastatic progression in primary NB tumors were identified. The resulting smaller gene set was clinically relevant as it discriminated between high- and low-risk NB patients.
Affiliation(s)
- Ido Nevo
- Department of Cell Research and Immunology, The George S. Wise Faculty of Life Science, Tel-Aviv University, Tel-Aviv, Israel
|
36
|
Hundsrucker C, Skroblin P, Christian F, Zenn HM, Popara V, Joshi M, Eichhorst J, Wiesner B, Herberg FW, Reif B, Rosenthal W, Klussmann E. Glycogen synthase kinase 3beta interaction protein functions as an A-kinase anchoring protein. J Biol Chem 2009; 285:5507-21. [PMID: 20007971 DOI: 10.1074/jbc.m109.047944] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open
Abstract
A-kinase anchoring proteins (AKAPs) include a family of scaffolding proteins that target protein kinase A (PKA) and other signaling proteins to cellular compartments and thereby confine the activities of the associated proteins to distinct regions within cells. AKAPs bind PKA directly. The interaction is mediated by the dimerization and docking domain of regulatory subunits of PKA and the PKA-binding domain of AKAPs. Analysis of the interactions between the dimerization and docking domain and various PKA-binding domains yielded a generalized motif allowing the identification of AKAPs. Our bioinformatics and peptide array screening approaches based on this signature motif identified GSKIP (glycogen synthase kinase 3beta interaction protein) as an AKAP. GSKIP directly interacts with PKA and GSK3beta (glycogen synthase kinase 3beta). It is widely expressed and facilitates phosphorylation and thus inactivation of GSK3beta by PKA. GSKIP contains the evolutionarily conserved domain of unknown function 727. We show here that this domain of GSKIP and its vertebrate orthologues binds both PKA and GSK3beta and thereby provides a mechanism for the integration of PKA and GSK3beta signaling pathways.
Affiliation(s)
- Christian Hundsrucker
- Leibniz Institute for Molecular Pharmacology, Robert-Rössle-Strasse 10, 13125 Berlin, Germany
|
37
|
Lin HN, Chen CT, Sung TY, Ho SY, Hsu WL. Protein subcellular localization prediction of eukaryotes using a knowledge-based approach. BMC Bioinformatics 2009; 10 Suppl 15:S8. [PMID: 19958518 PMCID: PMC2788359 DOI: 10.1186/1471-2105-10-s15-s8] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The study of protein subcellular localization (PSL) is important for elucidating protein functions involved in various cellular processes. However, determining the localization sites of a protein through wet-lab experiments can be time-consuming and labor-intensive. Thus, computational approaches become highly desirable. Most of the PSL prediction systems are established for single-localized proteins. However, a significant number of eukaryotic proteins are known to be localized into multiple subcellular organelles. Many studies have shown that proteins may simultaneously locate or move between different cellular compartments and be involved in different biological processes with different roles. RESULTS In this study, we propose a knowledge-based method, called KnowPredsite, to predict the localization site(s) of both single-localized and multi-localized proteins. Based on the local similarity, we can identify the "related sequences" for prediction. We construct a knowledge base to record the possible sequence variations for protein sequences. When predicting the localization annotation of a query protein, we search against the knowledge base and use a scoring mechanism to determine the predicted sites. We downloaded the dataset from ngLOC, which consisted of ten distinct subcellular organelles from 1923 species, and performed ten-fold cross-validation experiments to evaluate KnowPredsite's performance. The experimental results show that KnowPredsite achieves higher prediction accuracy than ngLOC and the Blast-hit method. For single-localized proteins, the overall accuracy of KnowPredsite is 91.7%. For multi-localized proteins, the overall accuracy of KnowPredsite is 72.1%, which is significantly higher than that of ngLOC by 12.4%. Notably, half of the proteins in the dataset that cannot find any Blast hit sequence above a specified threshold can still be correctly predicted by KnowPredsite.
CONCLUSION KnowPredsite demonstrates the power of identifying related sequences in the knowledge base. The experimental results show that even when the sequence similarity is low, the local similarity is effective for prediction. They also show that KnowPredsite is a highly accurate prediction method for both single- and multi-localized proteins. It is worth mentioning that the prediction process of KnowPredsite is transparent and biologically interpretable, and that it shows the set of template sequences used to generate the prediction result. The KnowPredsite prediction server is available at http://bio-cluster.iis.sinica.edu.tw/kbloc/.
Affiliation(s)
- Hsin-Nan Lin
- Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan, Republic of China.
|
38
|
McQuade LR, Schmidt U, Pascovici D, Stojanov T, Baker MS. Improved Membrane Proteomics Coverage of Human Embryonic Stem Cells by Peptide IPG-IEF. J Proteome Res 2009; 8:5642-9. [DOI: 10.1021/pr900597s] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
Affiliation(s)
- Leon R. McQuade
- Australian Proteome Analysis Facility, Faculty of Science, Macquarie University, NSW 2109, Australia, and Sydney IVF Stem Cells, Sydney, NSW 2000, Australia
- Uli Schmidt
- Australian Proteome Analysis Facility, Faculty of Science, Macquarie University, NSW 2109, Australia, and Sydney IVF Stem Cells, Sydney, NSW 2000, Australia
- Dana Pascovici
- Australian Proteome Analysis Facility, Faculty of Science, Macquarie University, NSW 2109, Australia, and Sydney IVF Stem Cells, Sydney, NSW 2000, Australia
- Tomas Stojanov
- Australian Proteome Analysis Facility, Faculty of Science, Macquarie University, NSW 2109, Australia, and Sydney IVF Stem Cells, Sydney, NSW 2000, Australia
- Mark S. Baker
- Australian Proteome Analysis Facility, Faculty of Science, Macquarie University, NSW 2109, Australia, and Sydney IVF Stem Cells, Sydney, NSW 2000, Australia
|
39
|
Park S, Yang JS, Jang SK, Kim S. Construction of functional interaction networks through consensus localization predictions of the human proteome. J Proteome Res 2009; 8:3367-76. [PMID: 19415893 DOI: 10.1021/pr900018z] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Characterizing the subcellular localization of a protein provides a key clue for understanding protein function. However, different protein localization prediction programs often deliver conflicting results regarding the localization of the same protein. As the number of available localization prediction programs continues to grow, there is a need for a consensus prediction approach. To address this need, we developed a consensus localization prediction method called ConLoc based on a large-scale, systematic integration of 13 available programs that make predictions for five major subcellular localizations (cytosol, extracellular, mitochondria, nucleus, and plasma membrane). The ability of ConLoc to accurately predict protein localization was substantially better than that of existing programs. Using ConLoc predictions, we built a localization-guided functional interaction network of the human proteome and mapped known disease associations within this network. We found a high degree of shared disease associations among functionally interacting proteins that are localized to the same cellular compartment. Thus, the use of consensus localization prediction, such as ConLoc, is a new approach for the identification of novel disease-associated genes.
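The general idea of consensus prediction described in this abstract can be sketched in Python. The abstract does not state how ConLoc integrates its 13 programs, so the plurality vote below is an assumed stand-in rather than ConLoc's method, and the tool names and the `min_support` threshold are hypothetical.

```python
from collections import Counter


def consensus_localization(predictions, min_support=0.5):
    """Plurality vote over the outputs of several localization predictors.

    predictions: dict mapping predictor name -> predicted compartment,
    or None when a predictor gives no call (hypothetical format).
    min_support: assumed minimum fraction of votes needed for a call.
    Returns (label or None, support fraction of the top label).
    """
    votes = Counter(p for p in predictions.values() if p is not None)
    total = sum(votes.values())
    if total == 0:
        return None, 0.0
    label, count = votes.most_common(1)[0]
    support = count / total
    return (label if support >= min_support else None), support


# Hypothetical outputs of four predictors for one protein.
calls = {
    "toolA": "nucleus",
    "toolB": "nucleus",
    "toolC": "cytosol",
    "toolD": None,  # this predictor abstained
}
print(consensus_localization(calls))
```

Returning the support fraction alongside the label lets a caller treat weakly supported calls differently from near-unanimous ones, which matters when the individual programs disagree as often as the abstract suggests.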
Affiliation(s)
- Solip Park
- School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, Pohang 790-784, Korea
|
40
|
Desler C, Suravajhala P, Sanderhoff M, Rasmussen M, Rasmussen LJ. In Silico screening for functional candidates amongst hypothetical proteins. BMC Bioinformatics 2009; 10:289. [PMID: 19754976 PMCID: PMC2758874 DOI: 10.1186/1471-2105-10-289] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2008] [Accepted: 09/16/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A hypothetical protein is defined as a protein that is predicted to be expressed from an open reading frame, but for which there is no experimental evidence of translation. Hypothetical proteins constitute a substantial fraction of the proteomes of human as well as of other eukaryotes. With the general belief that the majority of hypothetical proteins are the product of pseudogenes, it is essential to have a tool capable of pinpointing the minority of hypothetical proteins with a high probability of being expressed. RESULTS Here, we present an in silico selection strategy where eukaryotic hypothetical proteins are sorted according to two criteria that can be reliably identified in silico: the presence of subcellular targeting signals and the presence of characterized protein domains. To validate the selection strategy, we applied it to a database of human hypothetical proteins dating from 2006 and compared the proteins predicted to be expressed by our selection strategy with their status in 2008. For the comparison we focused on mitochondrial proteins, since a considerable amount of research focused on this field between 2006 and 2008. Therefore, many proteins defined as hypothetical in 2006 have later been characterized as mitochondrial. CONCLUSION Among the total number of human proteins hypothetical in 2006, 21% have later been experimentally characterized and 6% of those have been shown to have a role in a mitochondrial context. In contrast, among the selected hypothetical proteins from the 2006 dataset, predicted by our strategy to have a mitochondrial role, 53-62% have later been experimentally characterized, and 85% of these have actually been assigned a role in mitochondria by 2008. Therefore, our in silico selection strategy can be used to select the most promising candidates for subsequent in vitro and in vivo analyses.
Affiliation(s)
- Claus Desler
- Department of Science, Systems and Models, Roskilde University, DK-4000 Roskilde, Denmark.
|
41
|
Blum T, Briesemeister S, Kohlbacher O. MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction. BMC Bioinformatics 2009; 10:274. [PMID: 19723330 PMCID: PMC2745392 DOI: 10.1186/1471-2105-10-274] [Citation(s) in RCA: 212] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2009] [Accepted: 09/01/2009] [Indexed: 11/10/2022] Open
Abstract
Background Knowledge of subcellular localization of proteins is crucial to proteomics, drug target discovery and systems biology since localization and biological function are highly correlated. In recent years, numerous computational prediction methods have been developed. Nevertheless, there is still a need for prediction methods that show more robustness and higher accuracy. Results We extended our previous MultiLoc predictor by incorporating phylogenetic profiles and Gene Ontology terms. Two different datasets were used for training the system, resulting in two versions of this high-accuracy prediction method. One version is specialized for globular proteins and predicts up to five localizations, whereas a second version covers all eleven main eukaryotic subcellular localizations. In a benchmark study with five localizations, MultiLoc2 performs considerably better than other methods for animal and plant proteins and comparably for fungal proteins. Furthermore, MultiLoc2 performs clearly better when using a second dataset that extends the benchmark study to all eleven main eukaryotic subcellular localizations. Conclusion MultiLoc2 is an extensive high-performance subcellular protein localization prediction system. By incorporating phylogenetic profiles and Gene Ontology terms, MultiLoc2 yields higher accuracies compared to its previous version. Moreover, it outperforms other prediction systems in two benchmark studies. MultiLoc2 is available as a user-friendly and free web service, available at: .
Affiliation(s)
- Torsten Blum
- Division for Simulation of Biological Systems, ZBIT/WSI, Eberhard-Karls-Universität Tübingen, Germany.
|
42
|
Mori S, Chang JT, Andrechek ER, Matsumura N, Baba T, Yao G, Kim JW, Gatza M, Murphy S, Nevins JR. Anchorage-independent cell growth signature identifies tumors with metastatic potential. Oncogene 2009; 28:2796-805. [PMID: 19483725 PMCID: PMC3008357 DOI: 10.1038/onc.2009.139] [Citation(s) in RCA: 255] [Impact Index Per Article: 15.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2008] [Revised: 03/12/2009] [Accepted: 04/08/2009] [Indexed: 12/15/2022]
Abstract
The oncogenic phenotype is complex, resulting from the accumulation of multiple somatic mutations that lead to the deregulation of growth regulatory and cell fate controlling activities and pathways. The ability to dissect this complexity, so as to reveal discrete aspects of the biology underlying the oncogenic phenotype, is critical to understanding the various mechanisms of disease as well as to reveal opportunities for novel therapeutic strategies. Previous work has characterized the process of anchorage-independent growth of cancer cells in vitro as a key aspect of the tumor phenotype, particularly with respect to metastatic potential. Nevertheless, it remains a major challenge to translate these cell biology findings into the context of human tumors. We previously used DNA microarray assays to develop expression signatures, which have the capacity to identify subtle distinctions in biological states and can be used to connect in vitro and in vivo states. Here we describe the development of a signature of anchorage-independent growth, show that the signature exhibits characteristics of deregulated mitochondrial function and then demonstrate that the signature identifies human tumors with the potential for metastasis.
Affiliation(s)
- S Mori
- Duke Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, NC 27708, USA
|
43
|
Murray CI, Barrett M, Van Eyk JE. Assessment of ProteoExtract subcellular fractionation kit reveals limited and incomplete enrichment of nuclear subproteome from frozen liver and heart tissue. Proteomics 2009; 9:3934-8. [PMID: 19637233 PMCID: PMC2761665 DOI: 10.1002/pmic.200701170] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2007] [Accepted: 04/28/2009] [Indexed: 11/08/2022]
Abstract
The nuclear fraction of the ProteoExtract subcellular fractionation kit was assessed using frozen rat liver and heart tissue. Fractionation was evaluated by Western blot using protein markers for various subcellular compartments and followed up with LC/MS/MS analysis of the nuclear fractions. Of the proteins identified, nuclear proteins were in the minority (less than 15%) and there was poor representation of the various nuclear substructures when compared with liver nuclear isolations using a classical density-based centrifugation protocol. The ProteoExtract kit demonstrated poor specificity for the nucleus and offers limited promise for proteomics investigations of the nuclear subproteome in frozen tissue samples.
|
44
|
Guda C, King BR, Pal LR, Guda P. A top-down approach to infer and compare domain-domain interactions across eight model organisms. PLoS One 2009; 4:e5096. [PMID: 19333396 PMCID: PMC2659750 DOI: 10.1371/journal.pone.0005096] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2008] [Accepted: 02/10/2009] [Indexed: 11/22/2022] Open
Abstract
Knowledge of specific domain-domain interactions (DDIs) is essential to understand the functional significance of protein interaction networks. Despite the availability of an enormous amount of data on protein-protein interactions (PPIs), very little is known about specific DDIs occurring in them. Here, we present a top-down approach to accurately infer functionally relevant DDIs from PPI data. We created a comprehensive, non-redundant dataset of 209,165 experimentally-derived PPIs by combining datasets from five major interaction databases. We introduced an integrated scoring system that uses a novel combination of a set of five orthogonal scoring features covering the probabilistic, evolutionary, evidence-based, spatial and functional properties of interacting domains, which can map the interacting propensity of two domains in many dimensions. This method outperforms similar existing methods both in the accuracy of prediction and in the coverage of domain interaction space. We predicted a set of 52,492 high-confidence DDIs to carry out cross-species comparison of DDI conservation in eight model species including human, mouse, Drosophila, C. elegans, yeast, Plasmodium, E. coli and Arabidopsis. Our results show that only 23% of these DDIs are conserved in at least two species and only 3.8% in at least 4 species, indicating a rather low conservation across species. Pair-wise analysis of DDI conservation revealed a ‘sliding conservation’ pattern between the evolutionarily neighboring species. Our methodology and the high-confidence DDI predictions generated in this study can help to better understand the functional significance of PPIs at the modular level, thus can significantly impact further experimental investigations in systems biology research.
Affiliation(s)
- Chittibabu Guda
- GenNYsis Center for Excellence in Cancer Genomics and Department of Epidemiology & Biostatistics, State University of New York at Albany, Rensselaer, NY, USA.
|
45
|
Shin CJ, Wong S, Davis MJ, Ragan MA. Protein-protein interaction as a predictor of subcellular location. BMC SYSTEMS BIOLOGY 2009; 3:28. [PMID: 19243629 PMCID: PMC2663780 DOI: 10.1186/1752-0509-3-28] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/31/2008] [Accepted: 02/25/2009] [Indexed: 11/10/2022]
Abstract
Background Many biological processes are mediated by dynamic interactions between and among proteins. In order to interact, two proteins must co-occur spatially and temporally. As protein-protein interactions (PPIs) and subcellular location (SCL) are discovered via separate empirical approaches, PPI and SCL annotations are independent and might complement each other in helping us to understand the role of individual proteins in cellular networks. We expect reliable PPI annotations to show that proteins interacting in vivo are co-located in the same cellular compartment. Our goal here is to evaluate the potential of using PPI annotation in determining SCL of proteins in human, mouse, fly and yeast, and to identify and quantify the factors that contribute to this complementarity. Results Using publicly available data, we evaluate the hypothesis that interacting proteins must be co-located within the same subcellular compartment. Based on a large, manually curated PPI dataset, we demonstrate that a substantial proportion of interacting proteins are in fact co-located. We develop an approach to predict the SCL of a protein based on the SCL of its interaction partners, given sufficient confidence in the interaction itself. The frequency of false positive PPIs can be reduced by use of six lines of supporting evidence, three based on type of recorded evidence (empirical approach, multiplicity of databases, and multiplicity of literature citations) and three based on type of biological evidence (inferred biological process, domain-domain interactions, and orthology relationships), with biological evidence more effective than recorded evidence. Our approach performs better than four existing prediction methods in identifying the SCL of membrane proteins, and as well as or better for soluble proteins. Conclusion Understanding cellular systems requires knowledge of the SCL of interacting proteins. We show how PPI data can be used more effectively to yield reliable SCL predictions for both soluble and membrane proteins. Scope exists for further improvement in our understanding of cellular function through consideration of the biological context of molecular interactions.
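The partner-based prediction described in this abstract can be sketched in Python as a confidence-weighted vote over interaction partners. This is an illustrative assumption, not the authors' scoring scheme: the protein identifiers, confidence values, and the `min_confidence` cut-off are hypothetical, and the six lines of supporting evidence are collapsed here into a single confidence number per interaction.

```python
from collections import defaultdict


def predict_scl_from_partners(protein, interactions, known_scl, min_confidence=0.5):
    """Predict a protein's subcellular location (SCL) from its partners.

    interactions: iterable of (protein_a, protein_b, confidence) tuples.
    known_scl: dict mapping protein -> annotated compartment.
    min_confidence: assumed threshold below which an interaction is ignored.
    Returns the compartment with the highest summed confidence, or None.
    """
    scores = defaultdict(float)
    for a, b, confidence in interactions:
        if confidence < min_confidence:
            continue  # drop interactions that are likely false positives
        if a == protein:
            partner = b
        elif b == protein:
            partner = a
        else:
            continue  # interaction does not involve the query protein
        location = known_scl.get(partner)
        if location is not None:
            scores[location] += confidence
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical interaction list and annotations.
ppi = [
    ("P1", "P2", 0.9),
    ("P3", "P1", 0.8),
    ("P1", "P4", 0.3),  # below the threshold, ignored
    ("P1", "P5", 0.6),
]
annotations = {"P2": "nucleus", "P3": "nucleus", "P4": "cytosol", "P5": "cytosol"}
print(predict_scl_from_partners("P1", ppi, annotations))
```

Filtering on confidence before voting mirrors the abstract's condition that a prediction is made only "given sufficient confidence in the interaction itself".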
Affiliation(s)
- Chang Jin Shin
- The University of Queensland, Institute for Molecular Bioscience, and ARC Centre of Excellence in Bioinformatics, QLD, Australia.
|
46
|
Abstract
Current classification of medical diagnosis derives from observational correlation between clinical syndromes and pathologic analysis. Limited understanding of the molecular determinants of diseases encountered in the critically ill remains a major obstacle to the rational selection of therapeutic targets. Indeed, many human diseases reflect a disorder in physiologic processes that are known to involve the interaction of many complex control loops and to respond to a variety of pharmacologic agents and environmental factors. The advent of whole-genome sequencing and other high-throughput technologies has changed biomedical research into a data-rich discipline. "Omics" data sets that describe virtually all biomolecules in the cell are now publicly available. One of the challenges faced by investigators now lies in the interpretation of vast amounts of biological data to derive fundamental and applied biological information about whole systems. As mechanistic understanding of disease requires more than an agglomeration of information on the expression and activities of disease-associated molecules, network analysis has been applied to biological problems. Network analysis of the biological integratome promises to identify factors that influence disease phenotype, providing unique insight into disease mechanism. Network analysis also provides a mechanistic basis for defining phenotypic differences through consideration of unique genetic and environmental factors that govern intermediate phenotypes contributing to disease expression. Lastly, network analysis offers a unique method for identifying therapeutic targets that can alter disease expression.
|
47
|
Mizuno Y, Kurochkin IV, Herberth M, Okazaki Y, Schönbach C. Predicted mouse peroxisome-targeted proteins and their actual subcellular locations. BMC Bioinformatics 2008; 9 Suppl 12:S16. [PMID: 19091015 PMCID: PMC2638156 DOI: 10.1186/1471-2105-9-s12-s16] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The import of most intraperoxisomal proteins is mediated by peroxisome targeting signals at their C-termini (PTS1) or N-terminal regions (PTS2). Both signals have been integrated in subcellular location prediction programs. However, their present performance, particularly for PTS2 targeting, did not seem suitable for large-scale screening of sequences. RESULTS We modified an earlier reported PTS1 screening method to identify PTS2-containing mouse candidates using a combination of computational and manual annotation. For rapid confirmation of five new PTS2- and two previously identified PTS1-containing candidates, we developed the new cell line CHO-perRed, which stably expresses the peroxisomal marker dsRed-PTS1. Using CHO-perRed we confirmed the peroxisomal localization of the PTS1-targeted candidate Zadh2. Preliminary characterization of Zadh2 expression suggested non-PPARalpha mediated activation. Notably, none of the PTS2 candidates located to peroxisomes. CONCLUSION In a few cases the PTS may oscillate from "silent" to "functional" depending on its surface accessibility, indicating the potential for context-dependent conditional subcellular sorting. Overall, PTS2-targeting predictions are unlikely to improve without generation and integration of new experimental data from location proteomics, protein structures and quantitative Pex7 PTS2 peptide binding assays.
Affiliation(s)
- Yumi Mizuno
- Division of Functional Genomics and Systems Medicine, Research Center for Genomic Medicine, Saitama Medical University, Hidaka, Saitama 350-1241, Japan.
|
48
|
Aiyar RS, Gagneur J, Steinmetz LM. Identification of mitochondrial disease genes through integrative analysis of multiple datasets. Methods 2008; 46:248-55. [PMID: 18930150 PMCID: PMC2774125 DOI: 10.1016/j.ymeth.2008.10.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2008] [Revised: 10/03/2008] [Accepted: 10/08/2008] [Indexed: 11/24/2022] Open
Abstract
Determining the genetic factors in a disease is crucial to elucidating its molecular basis. This task is challenging due to a lack of information on gene function. The integration of large-scale functional genomics data has proven to be an effective strategy to prioritize candidate disease genes. Mitochondrial disorders are a prevalent and heterogeneous class of diseases that are particularly amenable to this approach. Here we explain the application of integrative approaches to the identification of mitochondrial disease genes. We first examine various datasets that can be used to evaluate the involvement of each gene in mitochondrial function. The data integration methodology is then described, accompanied by examples of common implementations. Finally, we discuss how gene networks are constructed using integrative techniques and applied to candidate gene prioritization. Relevant public data resources are indicated. This report highlights the success and potential of data integration as well as its applicability to the search for mitochondrial disease genes.
Affiliation(s)
- Raeka S. Aiyar
- European Molecular Biology Laboratory, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Julien Gagneur
- European Molecular Biology Laboratory, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Lars M. Steinmetz
- European Molecular Biology Laboratory, Meyerhofstraße 1, 69117 Heidelberg, Germany
|
49
|
Sadowski PG, Groen AJ, Dupree P, Lilley KS. Sub-cellular localization of membrane proteins. Proteomics 2008; 8:3991-4011. [PMID: 18780351 DOI: 10.1002/pmic.200800217] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Received: 03/07/2008] [Indexed: 05/30/2025]
Abstract
In eukaryotes, numerous complex sub-cellular structures exist. The majority of these are delineated by membranes. Many proteins are trafficked to these in order to be able to carry out their correct physiological function. Assigning the sub-cellular location of a protein is of paramount importance to biologists in the elucidation of its role and in the refinement of knowledge of cellular processes by tracing certain activities to specific organelles. Membrane proteins are a key set of proteins as these form part of the boundary of the organelles and represent many important functions such as transporters, receptors, and trafficking. They are, however, some of the most challenging proteins to work with due to poor solubility, a wide concentration range within the cell and inaccessibility to many of the tools employed in proteomics studies. This review focuses on membrane proteins with particular emphasis on sub-cellular localization in terms of methodologies that can be used to determine the accurate location of membrane proteins to organelles. We also discuss what is known about the membrane protein cohorts of major organelles.
Affiliation(s)
- Pawel G Sadowski
- Cambridge Centre for Proteomics, Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Cambridge, UK
|
50
|
Sui S, Wang J, Yang B, Song L, Zhang J, Chen M, Liu J, Lu Z, Cai Y, Chen S, Bi W, Zhu Y, He F, Qian X. Phosphoproteome analysis of the human Chang liver cells using SCX and a complementary mass spectrometric strategy. Proteomics 2008; 8:2024-34. [PMID: 18491316 DOI: 10.1002/pmic.200700896] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Indexed: 11/10/2022]
Abstract
The liver is the largest organ in the body, with many complex, essential functions, such as metabolism, detoxification, and secretion, often regulated via post-translational modifications, especially phosphorylation. Thus, the detection of phosphoproteins and phosphorylation sites is important to comprehensively explore human liver biological function. The human Chang liver cell line is among the first derived from non-malignant tissue, and its phosphoproteome profile has never been globally analyzed. To develop the complete phosphoproteome and probe the roles of protein phosphorylation in normal human liver, we adopted a shotgun strategy based on strong cation exchange chromatography, titanium dioxide and LC-MS/MS to isolate and identify phosphorylated proteins. Two types of MS approach, Q-TOF and IT, were used and compared to identify phosphosites from complex protein mixtures of these cells. A total of 1035 phosphorylation sites and 686 phosphorylated peptides were identified from 607 phosphoproteins. A search using the public database of PhosphoSite showed that approximately 344 phosphoproteins and 760 phosphorylation sites appeared to be novel. In addition, N-terminal phosphorylated peptides were a greater fraction of all identified phosphopeptides. With GOfact analysis, we found that most of the identified phosphoproteins are involved in regulating metabolism, consistent with the liver's role as a key metabolic organ.
Affiliation(s)
- Shaohui Sui
- State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing, People's Republic of China
|