1. Teimouri H, Medvedeva A, Kolomeisky AB. Unraveling the role of physicochemical differences in predicting protein-protein interactions. J Chem Phys 2024; 161:045102. [PMID: 39051836 DOI: 10.1063/5.0219501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 07/09/2024] [Indexed: 07/27/2024]
Abstract
The ability to accurately predict protein-protein interactions is critically important for understanding major cellular processes. However, current experimental and computational approaches for identifying them are technically very challenging and still have limited success. We propose a new computational method for predicting protein-protein interactions using only primary sequence information. It uses the concept of physicochemical similarity to determine which interactions are most likely to occur. In our approach, the physicochemical features of proteins from different organisms are extracted using bioinformatics tools. These features are then used in a machine-learning method to identify successful protein-protein interactions via correlation analysis. The property found to correlate most strongly with protein-protein interactions in all studied organisms is dipeptide amino acid composition (the frequency of specific amino acid pairs in a protein sequence). While current approaches often overlook how the specificity of protein-protein interactions differs across organisms, our method yields context-specific features that determine protein-protein interactions. The analysis is applied specifically to the bacterial two-component system, which includes histidine kinases and transcriptional response regulators, and to the barnase-barstar complex, demonstrating the method's versatility across different biological systems. Our approach can be applied to predict protein-protein interactions in any biological system, providing an important tool for investigating the mechanisms of complex biological processes.
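The abstract defines dipeptide composition as the frequency of specific amino acid pairs in a protein sequence. A minimal Python sketch of that feature, and of one possible way to compare two proteins by it, follows. It assumes the 20 standard amino acids, overlapping adjacent pairs, and normalization by the number of valid pairs; the helper names `dipeptide_composition` and `composition_difference` are hypothetical, and the Euclidean distance is an illustrative choice, since the abstract does not state which difference or correlation measure the authors use.

```python
import math
from itertools import product

# The 20 standard amino acids; other letters (X, B, U, ...) are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> dict:
    """Frequency of each ordered amino acid pair among overlapping adjacent pairs.

    Pairs containing a non-standard residue are ignored, and counts are
    normalized by the number of valid pairs (an assumption; some tools
    divide by len(sequence) - 1 instead).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Too short, or only non-standard residues: return an all-zero vector.
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: n / total for dp, n in counts.items()}


def composition_difference(seq_a: str, seq_b: str) -> float:
    """Euclidean distance between two 400-dimensional dipeptide vectors (illustrative)."""
    comp_a = dipeptide_composition(seq_a)
    comp_b = dipeptide_composition(seq_b)
    return math.sqrt(sum((comp_a[dp] - comp_b[dp]) ** 2 for dp in DIPEPTIDES))
```

For example, `dipeptide_composition("ACAC")` counts the pairs AC, CA, AC, so it assigns 2/3 to "AC", 1/3 to "CA", and 0 to the other 398 dipeptides.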
Affiliation(s)
- Hamid Teimouri
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, USA
- Angela Medvedeva
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, USA
- Anatoly B Kolomeisky
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, USA
2. Bohutínská M, Peichel CL. Divergence time shapes gene reuse during repeated adaptation. Trends Ecol Evol 2024; 39:396-407. [PMID: 38155043 DOI: 10.1016/j.tree.2023.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 11/15/2023] [Accepted: 11/20/2023] [Indexed: 12/30/2023]
Abstract
When diverse lineages repeatedly adapt to similar environmental challenges, the extent to which the same genes are involved (gene reuse) varies across systems. We propose that divergence time among lineages is a key factor driving this variability: as lineages diverge, the extent of gene reuse should decrease due to reductions in allele sharing, functional differentiation among genes, and restructuring of genome architecture. Indeed, we show that many genomic studies of repeated adaptation find that more recently diverged lineages exhibit higher gene reuse during repeated adaptation, but the relationship becomes less clear at older divergence time scales. Thus, future research should explore the factors shaping gene reuse and their interplay across broad divergence time scales for a deeper understanding of evolutionary repeatability.
Affiliation(s)
- Magdalena Bohutínská
- Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, 3012, Switzerland; Department of Botany, Faculty of Science, Charles University, Prague, 12800, Czech Republic.
- Catherine L Peichel
- Division of Evolutionary Ecology, Institute of Ecology and Evolution, University of Bern, Bern, 3012, Switzerland
3. Teimouri H, Medvedeva A, Kolomeisky AB. Physical-Chemical Features Selection Reveals That Differences in Dipeptide Compositions Correlate Most with Protein-Protein Interactions. bioRxiv 2024:2024.02.27.582345. [PMID: 38464064 PMCID: PMC10925282 DOI: 10.1101/2024.02.27.582345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
The ability to accurately predict protein-protein interactions is critically important for our understanding of major cellular processes. However, current experimental and computational approaches for identifying them are technically very challenging and still have limited success. We propose a new computational method for predicting protein-protein interactions using only primary sequence information. It uses the concept of physical-chemical similarity to determine which interactions are most likely to occur. In our approach, the physical-chemical features of proteins from different organisms are extracted using bioinformatics tools and then used in a machine-learning method to identify successful protein-protein interactions via correlation analysis. The property found to correlate most strongly with protein-protein interactions in all studied organisms is dipeptide amino acid composition. The analysis is applied specifically to the bacterial two-component system, which includes histidine kinases and transcriptional response regulators. Our theoretical approach provides a simple and robust method for quantifying important details of the complex mechanisms of biological processes.
Affiliation(s)
- Hamid Teimouri
- Department of Chemistry, Rice University, Houston, Texas, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States
- Angela Medvedeva
- Department of Chemistry, Rice University, Houston, Texas, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States
- Anatoly B. Kolomeisky
- Department of Chemistry, Rice University, Houston, Texas, United States
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, United States
- Department of Physics and Astronomy, Rice University, Houston, Texas, United States
4. Xie S, Xie X, Zhao X, Liu F, Wang Y, Ping J, Ji Z. HNSPPI: a hybrid computational model combing network and sequence information for predicting protein-protein interaction. Brief Bioinform 2023; 24:bbad261. [PMID: 37480553 DOI: 10.1093/bib/bbad261] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/05/2023] [Revised: 06/24/2023] [Accepted: 06/26/2023] [Indexed: 07/24/2023]
Abstract
Most life activities in organisms are regulated through protein complexes, which are mainly controlled via Protein-Protein Interactions (PPIs). Discovering new interactions between proteins and revealing their biological functions are of great significance for understanding the molecular mechanisms of biological processes and for identifying potential targets in drug discovery. Current experimental methods capture only stable protein interactions, which leads to limited coverage. In addition, they are expensive and time-consuming. In recent years, various computational methods have been successfully developed for predicting PPIs based only on protein homology, primary protein sequences, or gene ontology information. Computational efficiency and data complexity remain the main bottlenecks for algorithm generalization. In this study, we propose a novel computational framework, HNSPPI, to predict PPIs. As a hybrid supervised learning model, HNSPPI comprehensively characterizes the intrinsic relationship between two proteins by integrating amino acid sequence information and the connection properties of the PPI network. The experimental results show that HNSPPI performs well on six benchmark datasets. Moreover, the comparison analysis showed that our model significantly outperforms five other existing algorithms. Finally, we used the HNSPPI model to explore the SARS-CoV-2-Human interaction system and found several potential regulatory relationships. In summary, HNSPPI is a promising model for predicting new protein interactions from known PPI data.
Affiliation(s)
- Shijie Xie
- College of Artificial Intelligence, Nanjing Agricultural University, No. 1 Weigang Rd, Nanjing, Jiangsu 210095, China
- Xiaojun Xie
- College of Artificial Intelligence, Nanjing Agricultural University, No. 1 Weigang Rd, Nanjing, Jiangsu 210095, China
- Xin Zhao
- Department of Hepatobiliary Surgery, Beijing Chaoyang Hospital affiliated to Capital Medical University, Beijing 100020, China
- Fei Liu
- Joint International Research Laboratory of Animal Health and Food Safety of Ministry of Education & Single Molecule Nanometry Laboratory (Sinmolab), Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Yiming Wang
- Key Laboratory of Biological Interactions and Crop Health, Department of Plant Pathology, Nanjing Agricultural University, 210095, Nanjing, China
- Jihui Ping
- MOE International Joint Collaborative Research Laboratory for Animal Health and Food Safety & Jiangsu Engineering Laboratory of Animal Immunology, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhiwei Ji
- College of Artificial Intelligence, Nanjing Agricultural University, No. 1 Weigang Rd, Nanjing, Jiangsu 210095, China
5. Gibson BG, Cox TE, Marchbank KJ. Contribution of animal models to the mechanistic understanding of Alternative Pathway and Amplification Loop (AP/AL)-driven Complement-mediated Diseases. Immunol Rev 2023; 313:194-216. [PMID: 36203396 PMCID: PMC10092198 DOI: 10.1111/imr.13141] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/20/2023]
Abstract
This review aimed to capture the key findings that animal models have provided around the role of the alternative pathway and amplification loop (AP/AL) in disease. Animal models, particularly mouse models, have been incredibly useful to define the role of complement and the alternative pathway in health and disease; for instance, the use of cobra venom factor and depletion of C3 provided the initial insight that complement was essential to generate an appropriate adaptive immune response. The development of knockout mice has further underlined the importance of the AP/AL in disease, with the FH knockout mouse paving the way for the first anti-complement drugs. The impact from the development of FB, properdin, and C3 knockout mice closely follows this in terms of mechanistic understanding in disease. Indeed, our current understanding that complement plays a role in most conditions at one level or another is rooted in many of these in vivo studies. That C3, in particular, has roles beyond the obvious in innate and adaptive immunity, normal physiology, and cellular functions, with or without other recognized AP components, we would argue, only extends the reach of this arm of the complement system. Humanized mouse models also continue to play their part. Here, we argue that the animal models developed over the last few decades have truly helped define the role of the AP/AL in disease.
Affiliation(s)
- Beth G. Gibson
- Complement Therapeutics Research Group and Newcastle University Translational and Clinical Research Institute, Faculty of Medical Science, Newcastle upon Tyne, UK
- National Renal Complement Therapeutics Centre, aHUS Service, Newcastle upon Tyne, UK
- Thomas E. Cox
- Complement Therapeutics Research Group and Newcastle University Translational and Clinical Research Institute, Faculty of Medical Science, Newcastle upon Tyne, UK
- National Renal Complement Therapeutics Centre, aHUS Service, Newcastle upon Tyne, UK
- Kevin J. Marchbank
- Complement Therapeutics Research Group and Newcastle University Translational and Clinical Research Institute, Faculty of Medical Science, Newcastle upon Tyne, UK
- National Renal Complement Therapeutics Centre, aHUS Service, Newcastle upon Tyne, UK
6. Littmann M, Heinzinger M, Dallago C, Olenyi T, Rost B. Embeddings from deep learning transfer GO annotations beyond homology. Sci Rep 2021; 11:1160. [PMID: 33441905 PMCID: PMC7806674 DOI: 10.1038/s41598-020-80786-0] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Received: 09/07/2020] [Accepted: 12/24/2020] [Indexed: 11/09/2022]
Abstract
Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with < 20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
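The Fmax values quoted in this abstract follow the CAFA convention, where Fmax is a protein-centric maximum F-measure: at each score threshold t, precision is averaged over benchmark proteins with at least one prediction scoring at least t, recall is averaged over all benchmark proteins, and Fmax is the largest harmonic mean of the two across thresholds. The Python sketch below follows that definition under simplifying assumptions: GO terms are treated as flat labels (the official CAFA evaluation also propagates annotations up the ontology), the 0.01-step threshold grid is a choice made here, and `fmax` is a hypothetical helper, not the CAFA assessment code.

```python
def fmax(predictions: dict, truth: dict, thresholds=None) -> float:
    """Protein-centric maximum F-measure in the style of CAFA.

    predictions: protein id -> {GO term: score in [0, 1]}
    truth:       protein id -> set of experimentally annotated GO terms
    Ontology propagation is NOT performed here (simplifying assumption).
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]  # 0.01, 0.02, ..., 1.00
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            if not true_terms:
                continue  # benchmark proteins are expected to carry annotations
            scores = predictions.get(protein, {})
            predicted = {term for term, score in scores.items() if score >= t}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision only averages over proteins with a prediction >= t.
                precisions.append(hits / len(predicted))
            # Recall averages over every benchmark protein (unpredicted ones add 0).
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue  # no predictions survive this threshold
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

With `truth = {"P1": {"GO:0001", "GO:0002"}, "P2": {"GO:0003"}}` and predictions only for P1 (`{"GO:0001": 0.9, "GO:0004": 0.4}`), the best thresholds lie above 0.4 and up to 0.9, giving precision 1.0, recall 0.25, and an Fmax of 0.4. The asymmetry between the two averages is deliberate: a method is not rewarded in precision for proteins it declines to annotate, but it is penalized in recall for them.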
Affiliation(s)
- Maria Littmann
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany.
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
- Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Christian Dallago
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Tobias Olenyi
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology, i12, TUM (Technical University of Munich), Boltzmannstr. 3, Garching, 85748, Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany
- School of Life Sciences Weihenstephan (TUM-WZW), TUM (Technical University of Munich), Alte Akademie 8, Freising, Germany
- Department of Biochemistry and Molecular Biophysics, Columbia University, 701 West, 168th Street, New York, NY, 10032, USA
7. Stamboulian M, Guerrero RF, Hahn MW, Radivojac P. The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction. Bioinformatics 2020; 36:i219-i226. [PMID: 32657391 PMCID: PMC7355290 DOI: 10.1093/bioinformatics/btaa468] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 11/18/2022]
Abstract
MOTIVATION The computational prediction of gene function is a key step in making full use of newly sequenced genomes. Function is generally predicted by transferring annotations from homologous genes or proteins for which experimental evidence exists. The 'ortholog conjecture' proposes that orthologous genes should be preferred when making such predictions, as they evolve functions more slowly than paralogous genes. Previous research has provided little support for the ortholog conjecture, though the incomplete nature of the data cast doubt on the conclusions. RESULTS We use experimental annotations from over 40 000 proteins, drawn from over 80 000 publications, to revisit the ortholog conjecture in two pairs of species: (i) Homo sapiens and Mus musculus and (ii) Saccharomyces cerevisiae and Schizosaccharomyces pombe. By making a distinction between questions about the evolution of function versus questions about the prediction of function, we find strong evidence against the ortholog conjecture in the context of function prediction, though questions about the evolution of function remain difficult to address. In both pairs of species, we quantify the amount of information that would be ignored if paralogs are discarded, as well as the resulting loss in prediction accuracy. Taken as a whole, our results support the view that the types of homologs used for function transfer are largely irrelevant to the task of function prediction. Maximizing the amount of data used for this task, regardless of whether it comes from orthologs or paralogs, is most likely to lead to higher prediction accuracy. AVAILABILITY AND IMPLEMENTATION https://github.com/predragradivojac/oc. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Moses Stamboulian
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Rafael F Guerrero
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
- Matthew W Hahn
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
8. Jamil IN, Remali J, Azizan KA, Nor Muhammad NA, Arita M, Goh HH, Aizat WM. Systematic Multi-Omics Integration (MOI) Approach in Plant Systems Biology. Front Plant Sci 2020; 11:944. [PMID: 32754171 PMCID: PMC7371031 DOI: 10.3389/fpls.2020.00944] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 03/05/2020] [Accepted: 06/10/2020] [Indexed: 05/03/2023]
Abstract
Across all facets of biology, the rapid progress in high-throughput data generation has enabled us to perform multi-omics systems biology research. Transcriptomics, proteomics, and metabolomics data can independently answer targeted biological questions regarding the expression of transcripts, proteins, and metabolites, but a systematic multi-omics integration (MOI) can comprehensively assimilate, annotate, and model these large data sets. Previous MOI studies and reviews have detailed its usage and practicality in various organisms, including humans, animals, microbes, and plants. Plants are especially challenging due to their large, poorly annotated genomes, multiple organelles, and diverse secondary metabolites. Hence, constructive and methodological guidelines on how to perform MOI for plants are needed, particularly for researchers newly embarking on this topic. In this review, we thoroughly classify multi-omics studies on plants and verify workflows to ensure successful omics integration with accurate data representation. We also propose three levels of MOI, namely element-based (level 1), pathway-based (level 2), and mathematical-based integration (level 3). These MOI levels are described in relation to recent publications and tools, to highlight their practicality and function. The drawbacks and limitations of these MOI levels are also discussed for future improvement toward more amenable strategies in plant systems biology.
Affiliation(s)
- Ili Nadhirah Jamil
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Juwairiah Remali
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Kamalrul Azlan Azizan
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Nor Azlan Nor Muhammad
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Masanori Arita
- Bioinformation & DDBJ Center, National Institute of Genetics (NIG), Mishima, Japan
- Metabolome Informatics Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan
- Hoe-Han Goh
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
- Wan Mohd Aizat
- Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia (UKM), Bangi, Malaysia
9. Schafferhans A, O'Donoghue SI, Heinzinger M, Rost B. Dark Proteins Important for Cellular Function. Proteomics 2019; 18:e1800227. [PMID: 30318701 DOI: 10.1002/pmic.201800227] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/19/2018] [Revised: 09/14/2018] [Indexed: 01/08/2023]
Abstract
Despite substantial and successful projects for structural genomics, many proteins remain for which neither experimental structures nor homology-based models are known for any part of the amino acid sequence. These have been called "dark proteins," in contrast to non-dark proteins, in which at least part of the sequence has a known or inferred structure. It has been hypothesized that non-dark proteins may be more abundantly expressed than dark proteins, which are known to have far fewer sequence relatives. Surprisingly, this was not observed: human dark and non-dark proteins had quite similar levels of expression, in terms of both mRNA and protein abundance. Such high levels of expression strongly indicate that dark proteins, as a group, are important for cellular function. This is remarkable, given how carefully structural biologists have focused on proteins crucial for function, and highlights the important challenge posed by dark proteins in future research.
Affiliation(s)
- Andrea Schafferhans
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748 Garching, Germany
- Department of Bioengineering Sciences, University of Applied Sciences, Freising, Germany
- Seán I O'Donoghue
- CSIRO Data61, Sydney, Australia
- Division of Genomics & Epigenetics, Garvan Institute of Medical Research, Sydney, Australia
- School of Biotechnology & Biomolecular Sciences, University of New South Wales (UNSW), Sydney, NSW, Australia
- Michael Heinzinger
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748 Garching, Germany
- Burkhard Rost
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748 Garching, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching, Germany
- TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
10. Understanding Human-Virus Protein-Protein Interactions Using a Human Protein Complex-Based Analysis Framework. mSystems 2019; 4:mSystems00303-18. [PMID: 30984872 PMCID: PMC6456672 DOI: 10.1128/msystems.00303-18] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 11/26/2018] [Accepted: 03/20/2019] [Indexed: 12/29/2022]
Abstract
Computational analysis of human-virus protein-protein interaction (PPI) data is an effective route toward a systems-level understanding of the molecular mechanisms of viral infection. Previous work has mainly focused on characterizing the global properties of viral targets within the entire human PPI network. In comparison, how viruses manipulate host local networks (e.g., human protein complexes) has rarely been addressed from a computational perspective. By mainly integrating information about human-virus PPIs, human protein complexes, and gene expression profiles, we performed a large-scale analysis of virally targeted complexes (VTCs) related to five common human-pathogenic viruses: influenza A virus subtype H1N1, human immunodeficiency virus type 1, Epstein-Barr virus, human papillomavirus, and hepatitis C virus. We found that viral targets are enriched within human protein complexes. We observed in the context of VTCs that viral targets tended to have a high within-complex degree and to be scaffold and housekeeping proteins. Complexes that are essential for viral propagation were simultaneously targeted by multiple viruses. We characterized the periodic expression patterns of VTCs and provided the corresponding candidates that may be involved in the manipulation of the host cell cycle. As a potential application of the current analysis, we proposed a VTC-based antiviral drug target discovery strategy. Finally, we developed an online VTC-related platform known as VTcomplex (http://zzdlab.com/vtcomplex/index.php or http://systbio.cau.edu.cn/vtcomplex/index.php). We hope that the current analysis can provide new insights into the global landscape of human-virus PPIs at the VTC level and that the developed VTcomplex will become a vital resource for the community.
IMPORTANCE Although human protein complexes have been reported to be directly related to viral infection, previous studies have not systematically investigated human-virus PPIs from the perspective of human protein complexes. To the best of our knowledge, we have presented here the most comprehensive and in-depth analysis of human-virus PPIs in the context of VTCs. Our findings confirm that human protein complexes are heavily involved in viral infection. The observed preferences of virally targeted subunits within complexes reflect the mechanisms used by viruses to manipulate host protein complexes. The identified periodic expression patterns of the VTCs and the corresponding candidates could increase our understanding of how viruses manipulate the host cell cycle. Finally, our proposed conceptual application framework of VTCs and the developed VTcomplex could provide new hints to develop antiviral drugs for the clinical treatment of viral infections.
11. Ngounou Wetie AG, Sokolowska I, Channaveerappa D, Dupree EJ, Jayathirtha M, Woods AG, Darie CC. Proteomics and Non-proteomics Approaches to Study Stable and Transient Protein-Protein Interactions. Adv Exp Med Biol 2019; 1140:121-142. [DOI: 10.1007/978-3-030-15950-4_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
12. Exploring the interactions of the RAS family in the human protein network and their potential implications in RAS-directed therapies. Oncotarget 2018; 7:75810-75826. [PMID: 27713118 PMCID: PMC5342780 DOI: 10.18632/oncotarget.12416] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/12/2016] [Accepted: 09/15/2016] [Indexed: 12/14/2022]
Abstract
RAS proteins are the founding members of the RAS superfamily of GTPases. They are involved in key signaling pathways regulating essential cellular functions such as cell growth and differentiation. As a result, their deregulation by activating mutations often results in aberrant cell proliferation and cancer. With the exception of the relatively well-known KRAS, HRAS, and NRAS proteins, little is known about how the interactions of the other human RAS paralogs affect cancer evolution and response to treatment. In this study, we performed a comprehensive analysis of the relationship between the phylogeny of RAS proteins and their location in the protein interaction network. This analysis was integrated with the structural analysis of conserved positions in available 3D structures of RAS complexes. Our results show that many RAS proteins with divergent sequences are found close together in the human interactome. We found specific conserved amino acid positions in this group that map to the binding sites of RAS with many of their signaling effectors, suggesting that these pairs could share interacting partners. These results underscore the potential relevance of cross-talk in the RAS signaling network, which should be taken into account when considering the inhibitory activity of drugs targeting specific RAS oncoproteins. This study broadens our understanding of the human RAS signaling network and stresses the importance of considering its potential cross-talk in future therapies.
13
Kotelnikova E, Kalinin A, Yuryev A, Maslov S. Prediction of Protein-protein Interactions on the Basis of Evolutionary Conservation of Protein Functions. Evol Bioinform Online 2017. [DOI: 10.1177/117693430700300029] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
MOTIVATION Although a great deal of progress is being made in the development of fast and reliable experimental techniques for extracting genome-wide networks of protein-protein and protein-DNA interactions, new genomes are being sequenced at an even faster rate. There is therefore a considerable need for reliable in-silico methods that predict protein interactions based solely on sequence similarity and on known interactions from well-studied organisms. This problem can be solved if there is a dependency between sequence similarity and the conservation of the proteins' functions. RESULTS In this paper, we introduce a novel probabilistic method for predicting protein-protein interactions using a new empirical probabilistic formula that describes the loss of interactions between homologous proteins during the course of evolution. This formula describes an evolutionary process quite similar to the growth of the Earth's population. In addition, our method favors predictions confirmed by several interacting pairs over predictions coming from a single interacting pair. Our approach is useful for working with “noisy” data such as those coming from high-throughput experiments. We have generated predictions for five “model” organisms: H. sapiens, D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae, and evaluated the quality of these predictions.
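The abstract states that the method favors predictions supported by several interacting template pairs over those supported by a single pair. The paper's own empirical formula is not reproduced here, so the sketch below swaps in a standard noisy-OR combination purely to illustrate that effect. Each input value is a hypothetical probability that one known interacting pair in a model organism implies the predicted interaction, and the values are assumed to be independent.

```python
from math import prod


def combine_evidence(template_probs):
    """Noisy-OR (an assumption, not the paper's formula): probability that at
    least one supporting template pair reflects a conserved interaction,
    treating the templates as independent evidence."""
    return 1.0 - prod(1.0 - p for p in template_probs)


# Hypothetical per-template probabilities, not taken from the paper.
single = combine_evidence([0.6])             # one fairly strong template
several = combine_evidence([0.4, 0.4, 0.4])  # three weaker templates
print(f"{single:.3f} {several:.3f}")
```

With these made-up inputs, one strong template scores 0.600 while three weaker ones score about 0.784, which is the qualitative preference the abstract describes; how the authors actually weight multiple templates may differ.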
Affiliation(s)
Andrey Kalinin
- Ariadne Genomics Inc. 9430 Key West Ave., Suite 113, Rockville, MD 20850, U.S.A
Anton Yuryev
- Ariadne Genomics Inc. 9430 Key West Ave., Suite 113, Rockville, MD 20850, U.S.A
Sergei Maslov
- Department of Physics, Brookhaven National Laboratory, Upton, New York 11973, U.S.A
14
Zhang A, He L, Wang Y. Prediction of GCRV virus-host protein interactome based on structural motif-domain interactions. BMC Bioinformatics 2017; 18:145. [PMID: 28253857 PMCID: PMC5335770 DOI: 10.1186/s12859-017-1500-8] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 09/24/2016] [Accepted: 01/27/2017] [Indexed: 01/13/2023]
Abstract
BACKGROUND Grass carp reovirus (GCRV), the causative agent of grass carp hemorrhagic disease, is the most lethal pathogen in grass carp aquaculture. Protein-protein interactions between virus and host are one avenue through which GCRV can trigger infection and induce disease. Experimental approaches for detecting the host-virus interactome have many inherent limitations, and studies of protein-protein interactions between GCRV and its host remain rare. RESULTS In this study, based on known motif-domain interaction information, we systematically predicted the GCRV virus-host protein interactome using a motif-domain interaction pair searching strategy. The predicted host proteins derive from different domain families and are predicted to interact with different motif patterns in GCRV. The JAM-A protein was successfully predicted to interact with motifs of the GCRV Sigma1-like protein, with a binding mode similar to that of orthoreovirus. Genes differentially expressed during GCRV infection were extracted and mapped to our predicted interactome. The overlapping genes displayed different tissue expression distributions, and their overall expression level in the intestine was higher than in the other three tissues, which may suggest that these genes are more functionally active in the intestine. Function annotation and pathway enrichment analysis revealed that the host targets were largely involved in signaling and immune pathways, such as the interferon-gamma signaling pathway, VEGF signaling pathway, EGF receptor signaling pathway, B cell activation, and T cell activation. CONCLUSIONS Although the predicted PPIs may contain some false positives due to limited data resources and the sparse research background in non-model species, the computational method still provides a reasonable number of interactions, which can be further validated by high-throughput experiments. The findings of this work will contribute to the development of systems biology for GCRV infectious diseases and help guide the identification of novel receptors of GCRV in its host.
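The motif-domain interaction pair searching strategy can be illustrated with a small sketch: scan each viral sequence for motifs taken from a table of known motif-domain interactions, then pair that viral protein with every host protein annotated with the matching domain. The table, sequences and protein names below are hypothetical. The two regular expressions are simplified stand-ins for the SH3-binding PxxP core and a class I PDZ-binding C-terminus; a real pipeline would draw motifs and domain annotations from curated resources instead of hard-coding them.

```python
import re

# Hypothetical motif-domain table: (motif regex, host domain it binds).
MOTIF_DOMAIN_PAIRS = [
    (r"P..P", "SH3"),         # simplified PxxP core recognized by SH3 domains
    (r"[ST].[VIL]$", "PDZ"),  # simplified class I PDZ-binding C-terminus
]


def predict_pairs(virus_proteins, host_domains):
    """Return (virus protein, host protein, motif, domain) candidates: the viral
    sequence contains the motif and the host protein carries the domain."""
    predictions = []
    for v_name, seq in virus_proteins.items():
        for motif, domain in MOTIF_DOMAIN_PAIRS:
            if re.search(motif, seq) is None:
                continue  # motif absent from this viral protein
            for h_name, domains in host_domains.items():
                if domain in domains:
                    predictions.append((v_name, h_name, motif, domain))
    return predictions


# Hypothetical sequences and domain annotations.
virus = {"viral_A": "MKTPAAPLLGE", "viral_B": "MGGSEKSAV"}
host = {"host_1": {"SH3", "Kinase"}, "host_2": {"PDZ"}, "host_3": {"Ig"}}
for candidate in predict_pairs(virus, host):
    print(candidate)
```

Short degenerate motifs like these match many sequences by chance, which is one reason such predictions carry false positives and need experimental follow-up.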
Affiliation(s)
Aidi Zhang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, China
Libo He
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, China
Yaping Wang
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, 430072, China
15
Kshirsagar M, Murugesan K, Carbonell JG, Klein-Seetharaman J. Multitask Matrix Completion for Learning Protein Interactions Across Diseases. J Comput Biol 2017; 24:501-514. [PMID: 28128642 DOI: 10.1089/cmb.2016.0201] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/29/2022]
Abstract
Disease-causing pathogens such as viruses introduce their proteins into host cells, where they interact with the host's proteins and enable the virus to replicate inside the host. These interactions between pathogen and host proteins are key to understanding infectious diseases. Multiple diseases often involve phylogenetically related or biologically similar pathogens. Here we present a multitask learning method to jointly model interactions between human proteins and three different but related viruses: Hepatitis C, Ebola virus, and Influenza A. Our multitask matrix completion-based model uses a shared low-rank structure in addition to a task-specific sparse structure to incorporate the various interactions. We obtain an improvement of 7 to 39 percentage points in predictive performance over prior state-of-the-art models. We show how our model's parameters can be interpreted to reveal both general and specific interaction-relevant characteristics of the viruses. Our code is available online.
Affiliation(s)
Keerthiram Murugesan
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania
Jaime G Carbonell
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania
Judith Klein-Seetharaman
- Metabolic & Vascular Health, Warwick Medical School, University of Warwick, Coventry, United Kingdom
16
Kuo TH, Li KB. Predicting Protein-Protein Interaction Sites Using Sequence Descriptors and Site Propensity of Neighboring Amino Acids. Int J Mol Sci 2016; 17:1788. [PMID: 27792167 PMCID: PMC5133789 DOI: 10.3390/ijms17111788] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/08/2016] [Revised: 10/14/2016] [Accepted: 10/18/2016] [Indexed: 12/17/2022]
Abstract
Information about the interface sites of protein–protein interactions (PPIs) is useful for much biological research. However, despite advances in experimental techniques, identifying PPI sites remains a challenging task. Using a statistical learning technique, we propose a computational tool for predicting PPI sites. Unlike similar approaches that require structural information, the proposed method takes all of its input from protein sequences. In addition to typical sequence features, our method takes into account that interaction sites are not randomly distributed over the protein sequence. We characterized this positional preference using protein complexes with known structures, proposed a numerical index to estimate the propensity, and then incorporated the index into a learning system. The resulting predictor, without using structural information, yields an area under the ROC curve (AUC) of 0.675, recall of 0.597, precision of 0.311 and accuracy of 0.583 in ten-fold cross-validation. This performance is comparable to that of a previous approach that used structural information. Upon adding B-factor data to our predictor, we showed that the AUC can be further improved to 0.750. The tool is accessible at http://bsaltools.ym.edu.tw/predppis.
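The four figures of merit quoted above have standard definitions. The sketch below computes precision, recall and accuracy from confusion-matrix counts, and AUC as the probability that a randomly chosen positive (interface) residue is scored higher than a randomly chosen negative one, with ties counting one half. The counts and scores are made up for illustration and are unrelated to the paper's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall and accuracy from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),  # fraction of predicted sites that are real
        "recall": tp / (tp + fn),     # fraction of real sites that are found
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def roc_auc(scores, labels):
    """AUC via the Mann-Whitney view: P(score of positive > score of negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical residue-level counts and predictor scores.
print(classification_metrics(tp=30, fp=20, tn=40, fn=10))
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

The pairwise loop in `roc_auc` is quadratic in the number of residues; for real datasets a rank-based formula or scikit-learn's `roc_auc_score` would be the practical choice.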
Affiliation(s)
Tzu-Hao Kuo
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan.
Kuo-Bin Li
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan.
- Office of Information Management, National Yang-Ming University Hospital, Yilan 260, Taiwan.
17
Goncearenco A, Shaytan AK, Shoemaker BA, Panchenko AR. Structural Perspectives on the Evolutionary Expansion of Unique Protein-Protein Binding Sites. Biophys J 2015. [PMID: 26213149 DOI: 10.1016/j.bpj.2015.06.056] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
Abstract
Structures of protein complexes provide atomistic insights into protein interactions. Human proteins represent a quarter of all structures in the Protein Data Bank; however, available protein complexes cover less than 10% of the human proteome. Although it is theoretically possible to infer interactions in human proteins based on structures of homologous protein complexes, it is still unclear to what extent protein interactions and binding sites are conserved, and whether protein complexes from remotely related species can be used to infer interactions and binding sites. We considered biological units of protein complexes and clustered protein-protein binding sites into similarity groups based on their structure and sequence, which allowed us to identify unique binding sites. We showed that the growth rate of the number of unique binding sites in the Protein Data Bank was much slower than the growth rate of the number of structural complexes. Next, we investigated the evolutionary roots of unique binding sites and identified the major phyletic branches with the largest expansion in the number of novel binding sites. We found that many binding sites could be traced to the universal common ancestor of all cellular organisms, whereas relatively few binding sites emerged at the major evolutionary branching points. We analyzed the physicochemical properties of unique binding sites and found that the most ancient sites were the largest in size, involved many salt bridges, and were the most compact and least planar. In contrast, binding sites that appeared more recently in the evolution of eukaryotes were characterized by a larger fraction of polar and aromatic residues, and were less compact and more planar, possibly due to their more transient nature and roles in signaling processes.
Affiliation(s)
Alexander Goncearenco
- Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland
Alexey K Shaytan
- Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland
Benjamin A Shoemaker
- Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland
Anna R Panchenko
- Computational Biology Branch of the National Center for Biotechnology Information, Bethesda, Maryland.
18
Hamp T, Rost B. Evolutionary profiles improve protein–protein interaction prediction from sequence. Bioinformatics 2015; 31:1945-50. [DOI: 10.1093/bioinformatics/btv077] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 11/05/2014] [Accepted: 02/02/2015] [Indexed: 11/14/2022]
19
Hamp T, Rost B. More challenges for machine-learning protein interactions. Bioinformatics 2015; 31:1521-5. [PMID: 25586513 DOI: 10.1093/bioinformatics/btu857] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 09/09/2014] [Accepted: 12/23/2014] [Indexed: 01/21/2023]
Abstract
MOTIVATION Machine learning may be the most popular computational tool in molecular biology. Providing sustained performance estimates is challenging, and the standard cross-validation protocols usually fail in biology. Park and Marcotte found that even refined protocols fail for protein-protein interactions (PPIs). RESULTS Here, we sketch additional problems for the prediction of PPIs from sequence alone. First, it matters not only whether proteins A or B of a target interaction A-B are similar to proteins of training interactions (positives), but also whether A or B are similar to proteins of non-interactions (negatives). Second, training on multiple interaction partners per protein did not improve performance for new proteins (not used in training). On the contrary, a strictly non-redundant training set that ignored good data slightly improved the prediction of difficult cases. Third, which prediction method appears to be best depends crucially on the sequence similarity between the test and training sets, on how many true interactions should be found, and on the expected ratio of negatives to positives. Correctly assessing performance is the most complicated task in the development of prediction methods. Our analyses suggest that PPIs square the challenge for this task.
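The first point, that performance depends on whether the proteins of a test pair were seen during training, is commonly handled by stratifying test pairs as Park and Marcotte did: C1 when both proteins occur in training pairs, C2 when exactly one does, and C3 when neither does. The sketch below assumes exact protein identity as the criterion. Hamp and Rost argue that sequence similarity matters, including similarity to negatives as well as positives, so a realistic version would replace the set-membership test with an alignment-based similarity threshold. Protein names are hypothetical.

```python
def partition_test_pairs(train_pairs, test_pairs):
    """Assign each test pair to C1/C2/C3 by how many of its two proteins
    appear anywhere in the training pairs (exact identity, a simplification)."""
    seen = {protein for pair in train_pairs for protein in pair}
    label_for = {2: "C1", 1: "C2", 0: "C3"}
    classes = {"C1": [], "C2": [], "C3": []}
    for a, b in test_pairs:
        n_seen = int(a in seen) + int(b in seen)
        classes[label_for[n_seen]].append((a, b))
    return classes


# Hypothetical training and test pairs.
train = [("A", "B"), ("C", "D")]
test = [("A", "C"), ("A", "E"), ("E", "F")]
for name, pairs in partition_test_pairs(train, test).items():
    print(name, pairs)
```

Reporting metrics separately for each class keeps the typically easier C1 pairs from dominating a single overall estimate.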
Affiliation(s)
Tobias Hamp
- Department of Informatics, Bioinformatics and Computational Biology I12, Technische Universität München, 85748 Garching/Munich, Germany
Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology I12, Technische Universität München, 85748 Garching/Munich, Germany
20
Sudha G, Nussinov R, Srinivasan N. An overview of recent advances in structural bioinformatics of protein-protein interactions and a guide to their principles. Prog Biophys Mol Biol 2014; 116:141-50. [PMID: 25077409 DOI: 10.1016/j.pbiomolbio.2014.07.004] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 05/16/2014] [Accepted: 07/13/2014] [Indexed: 12/20/2022]
Abstract
Rich data bearing on the structural and evolutionary principles of protein-protein interactions are paving the way to a better understanding of the regulation of function in the cell. This is particularly the case when these interactions are considered in the framework of key pathways. Knowledge of the interactions may provide insights into the mechanisms of crucial 'driver' mutations in oncogenesis. They also provide the foundation toward the design of protein-protein interfaces and inhibitors that can abrogate their formation or enhance them. The main features to learn from known 3-D structures of protein-protein complexes and the extensive literature which analyzes them computationally and experimentally include the interaction details which permit undertaking structure-based drug discovery, the evolution of complexes and their interactions, the consequences of alterations such as post-translational modifications, ligand binding, disease causing mutations, host pathogen interactions, oligomerization, aggregation and the roles of disorder, dynamics, allostery and more to the protein and the cell. This review highlights some of the recent advances in these areas, including design, inhibition and prediction of protein-protein complexes. The field is broad, and much work has been carried out in these areas, making it challenging to cover it in its entirety. Much of this is due to the fast increase in the number of molecules whose structures have been determined experimentally and the vast increase in computational power. Here we provide a concise overview.
Affiliation(s)
Govindarajan Sudha
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.
Ruth Nussinov
- Cancer and Inflammation Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., National Cancer Institute, Frederick, MD 21702, USA; Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel.
21
Andreani J, Guerois R. Evolution of protein interactions: From interactomes to interfaces. Arch Biochem Biophys 2014; 554:65-75. [DOI: 10.1016/j.abb.2014.05.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 02/10/2014] [Revised: 04/28/2014] [Accepted: 05/12/2014] [Indexed: 12/16/2022]
22
Breker M, Schuldiner M. The emergence of proteome-wide technologies: systematic analysis of proteins comes of age. Nat Rev Mol Cell Biol 2014; 15:453-64. [PMID: 24938631 DOI: 10.1038/nrm3821] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Indexed: 12/14/2022]
Abstract
During the lifetime of a cell, proteins can change their localization, alter their abundance and undergo modifications, none of which can be assayed by tracking mRNAs alone. Methods to study proteomes directly are coming of age, opening new perspectives on the role of post-translational regulation in stabilizing the cellular milieu. Proteomics has undergone a revolution, and novel technologies for the systematic analysis of proteins have emerged. These methods can expand our ability to acquire information from single proteins to proteomes, from static to dynamic measures and from the population level to the level of single cells. Such approaches promise that proteomes will soon be studied at a similar level of dynamic resolution as has been the norm for transcriptomes.
Affiliation(s)
Michal Breker
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
Maya Schuldiner
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 7610001, Israel
23
Ngounou Wetie AG, Sokolowska I, Woods AG, Roy U, Deinhardt K, Darie CC. Protein-protein interactions: switch from classical methods to proteomics and bioinformatics-based approaches. Cell Mol Life Sci 2014; 71:205-28. [PMID: 23579629 PMCID: PMC11113707 DOI: 10.1007/s00018-013-1333-1] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Received: 12/08/2012] [Revised: 03/25/2013] [Accepted: 03/26/2013] [Indexed: 11/28/2022]
Abstract
Following the sequencing of the genomes of humans and many other organisms, research on protein-coding genes and their functions (functional genomics) has intensified. Subsequently, with the observation that proteins are indeed the molecular effectors of most cellular processes, the discipline of proteomics was born. Clearly, proteins do not function as single entities but rather as a dynamic network of team players that have to communicate. Although genetic (yeast two-hybrid, Y2H) and biochemical methods (co-immunoprecipitation, Co-IP; affinity purification, AP) were the methods of choice at the beginning of the study of protein-protein interactions (PPIs), in recent years there has been a shift towards proteomics-based methods and bioinformatics-based approaches. In this review, we first describe PPIs in depth and make a strong case for why unraveling the interactome is the next challenge in the field of proteomics. We then present classical methods for investigating PPIs and structure-based bioinformatics approaches. The greatest emphasis is placed on proteomic methods, especially recently developed native techniques that have been shown to be reliable. Finally, we point out the limitations of these methods and the need to set up a standard for the validation of PPI experiments.
Affiliation(s)
Armand G. Ngounou Wetie
- Department of Chemistry and Biomolecular Science, Biochemistry and Proteomics Group, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810 USA
Izabela Sokolowska
- Department of Chemistry and Biomolecular Science, Biochemistry and Proteomics Group, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810 USA
Alisa G. Woods
- Department of Chemistry and Biomolecular Science, Biochemistry and Proteomics Group, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810 USA
Urmi Roy
- Department of Chemistry and Biomolecular Science, Biochemistry and Proteomics Group, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810 USA
Katrin Deinhardt
- Centre for Biological Sciences, University of Southampton, Life Sciences Building 85, Southampton, SO17 1BJ UK
- Institute for Life Sciences, University of Southampton, Life Sciences Building 85, Southampton, SO17 1BJ UK
Costel C. Darie
- Department of Chemistry and Biomolecular Science, Biochemistry and Proteomics Group, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699-5810 USA
24
Shoemaker B, Wuchty S, Panchenko AR. Computational large-scale mapping of protein-protein interactions using structural complexes. Curr Protoc Protein Sci 2013; 73:3.9.1-3.9.9. [PMID: 24510594 PMCID: PMC3920302 DOI: 10.1002/0471140864.ps0309s73] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/30/2022]
Abstract
Although the identification of protein interactions by high-throughput methods progresses at a fast pace, "interactome" datasets still suffer from high rates of false positives and low coverage. To map the interactome of any organism, this unit presents a computational framework to predict protein-protein or gene-gene interactions utilizing experimentally determined evidence of structural complexes, atomic details of binding interfaces and evolutionary conservation.
Affiliation(s)
Benjamin Shoemaker
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland
Stefan Wuchty
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland
Anna R Panchenko
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland
25
Abstract
Protein interaction networks are important for understanding regulatory mechanisms, explaining experimental data and predicting protein functions. Unfortunately, most interaction data are available only for model organisms. As a possible remedy, interactions are commonly transferred to organisms of interest, but it is not clear when interactions can be transferred from one organism to another, and confidence in the derived interactions is therefore low. Here, we propose to use a rich set of features to train Random Forests to score transferred interactions. We evaluated the transfer from a range of eukaryotic organisms to S. cerevisiae using orthologs. Interactions transferred directly to S. cerevisiae are on average only 24% consistent with the current S. cerevisiae interaction network. Commonly applied filter approaches can improve the transfer precision, but at the cost of a large decrease in the number of transferred interactions. Our Random Forest approach uses various features derived from both the target and the source network, as well as the ortholog annotations, to assign confidence values to transferred interactions. In this way, we increased the average transfer consistency to 85% while still transferring almost 70% of all correctly transferable interactions. We tested our approach on the transfer of interactions to other species and showed that it outperforms competing methods for transferring interactions to species for which no experimental knowledge is available. Finally, we applied our predictor to score interactions transferred to 83 target species and were able to extend the available interactomes of B. taurus, M. musculus and G. gallus by over 40,000 interactions each. Our transferred interaction networks are publicly available via our web interface, which allows users to inspect and download transferred interaction sets of different sizes, for various species, and at specified expected precision levels. AVAILABILITY http://services.bio.ifi.lmu.de/coin-db/.
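The baseline this abstract starts from, transferring interactions through orthologs and measuring how many agree with the known target network, can be sketched as follows. The sketch assumes a simple one-to-one ortholog map (real orthology is often one-to-many), uses hypothetical gene names, and omits the Random Forest scoring that is the paper's actual contribution; it only computes the raw transfer and its consistency with known target interactions.

```python
def transfer_interactions(source_edges, orthologs):
    """Map source-organism interactions onto the target organism via a
    one-to-one ortholog dictionary (a simplifying assumption)."""
    transferred = set()
    for a, b in source_edges:
        if a in orthologs and b in orthologs:  # both partners need an ortholog
            transferred.add(frozenset((orthologs[a], orthologs[b])))
    return transferred


def transfer_consistency(transferred, target_edges):
    """Fraction of transferred interactions already present in the target network."""
    if not transferred:
        return 0.0
    target = {frozenset(edge) for edge in target_edges}  # undirected edges
    return len(transferred & target) / len(transferred)


# Hypothetical source network, ortholog map and target network.
source = [("a1", "a2"), ("a2", "a3"), ("a3", "a4")]
orthologs = {"a1": "y1", "a2": "y2", "a3": "y3"}  # a4 has no ortholog
target = [("y2", "y1"), ("y3", "y5")]

moved = transfer_interactions(source, orthologs)
print(len(moved), f"{transfer_consistency(moved, target):.2f}")
```

Using `frozenset` edges makes the comparison independent of the order in which the two partners are listed.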
Affiliation(s)
Robert Pesch
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
Ralf Zimmer
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
26
Kumar S, Biswal DK, Tandon V. In-silico analysis of caspase-3 and -7 proteases from blood-parasitic Schistosoma species (Trematoda) and their human host. Bioinformation 2013; 9:456-63. [PMID: 23847399 PMCID: PMC3705615 DOI: 10.6026/97320630009456] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/14/2012] [Revised: 10/21/2012] [Accepted: 04/17/2013] [Indexed: 12/24/2022]
Abstract
Proteolytic enzymes of the caspase family, which reside as latent precursors in most nucleated metazoan cells, are core effectors of apoptosis. Of them, the executioner caspases-3 and -7 exist within the cytosol as inactive dimers and are activated by a process called dimerization. Caspase inhibition is regarded as a promising approach for treating multiple diseases. Although caspases have been extensively studied in the human system, their role in eukaryotic pathogens and parasites of human hosts has not drawn enough attention. Protein sequence analysis revealed that caspases of blood flukes (Schistosoma spp.) have low sequence identity with their counterparts in human and other mammalian hosts, which encouraged us to analyse the interacting domains that participate in caspase dimerization in the parasite and to reveal differences, if any, between the host and parasite systems. Significant differences in the molecular surface arrangement of the dimer interfaces reveal that in schistosomal caspases only eight out of forty dimer conformations are similar to human caspase structures. Thus, the parasite-specific dimer conformations (which differ from those of the host caspases) may emerge as potential drug targets of therapeutic value against schistosomal infections. Three important factors, namely the size of the amino acids, the secondary structures and the geometrical arrangement of the interacting domains, influence the pattern of caspase dimer formation, which, in turn, is manifested in the varied structural conformations of caspases in the parasite and its human hosts.
Affiliation(s)
Shakti Kumar
- Bioinformatics Centre, North-Eastern Hill University, Shillong 793022, Meghalaya, India
- Department of Zoology, North-Eastern Hill University, Shillong 793022, Meghalaya, India
Devendra Kumar Biswal
- Bioinformatics Centre, North-Eastern Hill University, Shillong 793022, Meghalaya, India
Veena Tandon
- Bioinformatics Centre, North-Eastern Hill University, Shillong 793022, Meghalaya, India
- Department of Zoology, North-Eastern Hill University, Shillong 793022, Meghalaya, India
27
Jin Y, Turaev D, Weinmaier T, Rattei T, Makse HA. The evolutionary dynamics of protein-protein interaction networks inferred from the reconstruction of ancient networks. PLoS One 2013; 8:e58134. [PMID: 23526967 PMCID: PMC3603955 DOI: 10.1371/journal.pone.0058134] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 10/11/2012] [Accepted: 01/30/2013] [Indexed: 11/18/2022]
Abstract
Cellular functions are based on the complex interplay of proteins; therefore, the structure and dynamics of protein-protein interaction (PPI) networks are key to the functional understanding of cells. In recent years, large-scale PPI networks of several model organisms have been investigated. A number of theoretical models have been developed to explain both network formation and the current network structure. Models based on the duplication and divergence of genes are favored, as they most closely represent the biological basis of network evolution. However, studies are often based on simulated rather than empirical data, or they cover only single organisms. Methodological improvements now allow the analysis of PPI networks of multiple organisms simultaneously, as well as the direct modeling of ancestral networks. This provides the opportunity to challenge existing assumptions about network evolution. We used present-day PPI networks from integrated datasets of seven model organisms and developed a theoretical and bioinformatic framework for studying the evolutionary dynamics of PPI networks. A novel filtering approach using percolation analysis was developed to remove low-confidence interactions based on topological constraints. We then reconstructed the ancient PPI networks of different ancestors, for which both the ancestral proteomes and the ancestral interactions were inferred. Ancestral proteins were reconstructed using orthologous groups at different evolutionary levels. A stochastic approach based on the duplication-divergence model was developed to estimate the probabilities of ancient interactions from today's PPI networks. The growth rates of the nodes, edges, sizes and modularities of the networks indicate multiplicative growth and are consistent with the results of independent static analysis. Our results support the duplication-divergence model of evolution and indicate that fractality and multiplicative growth are general properties of PPI network structure and dynamics.
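For readers unfamiliar with the duplication-divergence model, the sketch below grows a network using one common textbook form of it: a randomly chosen protein is duplicated, the copy keeps each of the original's interactions with probability `retain_prob`, and the copy interacts with its original with probability `link_prob`. The parameter names and values are illustrative assumptions, and this forward simulation does not reproduce the paper's stochastic reconstruction of ancestral networks.

```python
import random


def duplication_divergence(n_nodes, retain_prob, link_prob=0.0, seed=0):
    """Grow an undirected network (dict of adjacency sets) from one linked pair."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        new = len(adj)
        original = rng.randrange(new)      # protein whose gene is duplicated
        adj[new] = set()
        for nbr in sorted(adj[original]):  # divergence: drop inherited edges at random
            if rng.random() < retain_prob:
                adj[new].add(nbr)
                adj[nbr].add(new)
        if rng.random() < link_prob:       # optional interaction between paralogs
            adj[new].add(original)
            adj[original].add(new)
    return adj


network = duplication_divergence(500, retain_prob=0.4, link_prob=0.1, seed=1)
n_edges = sum(len(nbrs) for nbrs in network.values()) // 2
isolated = sum(1 for nbrs in network.values() if not nbrs)
print(len(network), n_edges, isolated)
```

Many variants discard proteins left without interactions after divergence; counting them, as the last lines do, makes the effect of `retain_prob` on network sparsity easy to inspect.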
Affiliation(s)
Yuliang Jin
- Levich Institute and Physics Department, City College of New York, New York, New York, United States of America
Dmitrij Turaev
- Department of Computational Systems Biology, University of Vienna, Vienna, Austria
Thomas Weinmaier
- Department of Computational Systems Biology, University of Vienna, Vienna, Austria
Thomas Rattei
- Department of Computational Systems Biology, University of Vienna, Vienna, Austria
Hernán A. Makse
- Levich Institute and Physics Department, City College of New York, New York, New York, United States of America
28
Xin X, Gfeller D, Cheng J, Tonikian R, Sun L, Guo A, Lopez L, Pavlenco A, Akintobi A, Zhang Y, Rual JF, Currell B, Seshagiri S, Hao T, Yang X, Shen YA, Salehi-Ashtiani K, Li J, Cheng AT, Bouamalay D, Lugari A, Hill DE, Grimes ML, Drubin DG, Grant BD, Vidal M, Boone C, Sidhu SS, Bader GD. SH3 interactome conserves general function over specific form. Mol Syst Biol 2013; 9:652. [PMID: 23549480 PMCID: PMC3658277 DOI: 10.1038/msb.2013.9] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.8] [Received: 09/27/2012] [Accepted: 02/20/2013] [Indexed: 12/20/2022]
Abstract
Src homology 3 (SH3) domains bind peptides to mediate protein-protein interactions that assemble and regulate dynamic biological processes. We surveyed the repertoire of SH3 binding specificity using peptide phage display in a metazoan, the worm Caenorhabditis elegans, and discovered that it structurally mirrors that of the budding yeast Saccharomyces cerevisiae. We then mapped the worm SH3 interactome using stringent yeast two-hybrid and compared it with the equivalent map for yeast. We found that the worm SH3 interactome resembles the analogous yeast network because it is significantly enriched for proteins with roles in endocytosis. Nevertheless, orthologous SH3 domain-mediated interactions are highly rewired. Our results suggest a model of network evolution where general function of the SH3 domain network is conserved over its specific form.
Affiliation(s)
- Xiaofeng Xin
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- David Gfeller
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Jackie Cheng
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA, USA
- Raffi Tonikian
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Lin Sun
- Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, NJ, USA
- Ailan Guo
- Cell Signaling Technology, Danvers, MA, USA
- Lianet Lopez
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Alevtina Pavlenco
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Adenrele Akintobi
- Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, NJ, USA
- Yingnan Zhang
- Department of Early Discovery Biochemistry, Genentech, South San Francisco, CA, USA
- Jean-François Rual
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Bridget Currell
- Department of Molecular Biology, Genentech, South San Francisco, CA, USA
- Tong Hao
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Xinping Yang
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Yun A Shen
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Kourosh Salehi-Ashtiani
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Jingjing Li
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Aaron T Cheng
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA, USA
- Dryden Bouamalay
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA, USA
- Adrien Lugari
- IMR Laboratory, UPR 3243, Institut de Microbiologie de la Méditerranée, CNRS and Aix-Marseille Université, Marseille Cedex 20, France
- David E Hill
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Mark L Grimes
- Division of Biological Sciences, Center for Structural and Functional Neuroscience, The University of Montana, Missoula, MT, USA
- David G Drubin
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA, USA
- Barth D Grant
- Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, NJ, USA
- Marc Vidal
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Genetics, Harvard Medical School, Boston, MA, USA
- Charles Boone
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Sachdev S Sidhu
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Gary D Bader
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
29
The ortholog conjecture is untestable by the current gene ontology but is supported by RNA sequencing data. PLoS Comput Biol 2012; 8:e1002784. [PMID: 23209392 PMCID: PMC3510086 DOI: 10.1371/journal.pcbi.1002784] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Received: 05/14/2012] [Accepted: 10/02/2012] [Indexed: 11/19/2022]
Abstract
The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression. 
We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny using more and better functional data is needed.
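The orthologs-versus-paralogs comparison of expression similarity can be sketched as an average correlation of expression profiles over matched tissues. This is a minimal sketch, assuming profiles are already normalized (for example, log-transformed) and listed in the same tissue order; Pearson correlation is chosen here for simplicity and is not necessarily the statistic the study used, and the gene names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (ZeroDivisionError) for a constant profile


def mean_pair_similarity(pairs, expr):
    """Average expression correlation over a list of gene pairs.

    pairs: (gene_a, gene_b) tuples, e.g. all ortholog pairs or all paralog pairs.
    expr: dict gene -> expression values over the same, identically ordered tissues.
    """
    scores = [pearson(expr[a], expr[b]) for a, b in pairs]
    return sum(scores) / len(scores)


# Toy profiles over four tissues (hypothetical genes hA, hB in human, mA in mouse).
expr = {"hA": [1.0, 2.0, 3.0, 4.0], "mA": [2.0, 4.0, 6.0, 8.0], "hB": [4.0, 3.0, 2.0, 1.0]}
ortholog_score = mean_pair_similarity([("hA", "mA")], expr)
paralog_score = mean_pair_similarity([("hA", "hB")], expr)
```

Comparing the two averages over many pairs, rather than single pairs as in this toy, is what would support or contradict the conjecture.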
30
Leal Valentim F, Neven F, Boyen P, van Dijk ADJ. Interactome-wide prediction of protein-protein binding sites reveals effects of protein sequence variation in Arabidopsis thaliana. PLoS One 2012; 7:e47022. [PMID: 23077539 PMCID: PMC3471968 DOI: 10.1371/journal.pone.0047022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/30/2012] [Accepted: 09/07/2012] [Indexed: 11/18/2022]
Abstract
The specificity of protein-protein interactions is encoded in those parts of the sequence that compose the binding interface. Therefore, understanding how changes in protein sequence influence interaction specificity, and possibly the phenotype, requires knowing the location of binding sites in those sequences. However, large-scale detection of protein interfaces remains a challenge. Here, we present a sequence- and interactome-based approach to mine interaction motifs from the recently published Arabidopsis thaliana interactome. The resultant proteome-wide predictions are available via www.ab.wur.nl/sliderbio and set the stage for further investigations of protein-protein binding sites. To assess our method, we first show that, by using a priori information calculated from protein sequences, such as evolutionary conservation and residue surface accessibility, we improve the performance of interface prediction compared to using only interactome data. Next, we present evidence for the functional importance of the predicted sites, which are under stronger selective pressure than the rest of protein sequence. We also observe a tendency for compensatory mutations in the binding sites of interacting proteins. Subsequently, we interrogated the interactome data to formulate testable hypotheses for the molecular mechanisms underlying effects of protein sequence mutations. Examples include proteins relevant for various developmental processes. Finally, we observed, by analysing pairs of paralogs, a correlation between functional divergence and sequence divergence in interaction sites. This analysis suggests that large-scale prediction of binding sites can cast light on evolutionary processes that shape protein-protein interaction networks.
Affiliation(s)
- Frank Neven
- Hasselt University and Transnational University of Limburg, Hasselt, Belgium
- Peter Boyen
- Hasselt University and Transnational University of Limburg, Hasselt, Belgium
- Aalt D. J. van Dijk
- Plant Research International, Bioscience, Wageningen, The Netherlands
31
Diversity in genetic in vivo methods for protein-protein interaction studies: from the yeast two-hybrid system to the mammalian split-luciferase system. Microbiol Mol Biol Rev 2012; 76:331-82. [PMID: 22688816 DOI: 10.1128/mmbr.05021-11] [Citation(s) in RCA: 135] [Impact Index Per Article: 10.4] [Indexed: 12/14/2022]
Abstract
The yeast two-hybrid system pioneered the field of in vivo protein-protein interaction methods and undisputedly gave rise to a palette of ingenious techniques that are constantly pushing further the limits of the original method. Sensitivity and selectivity have improved because of various technical tricks and experimental designs. Here we present an exhaustive overview of the genetic approaches available to study in vivo binary protein interactions, based on two-hybrid and protein fragment complementation assays. These methods have been engineered and employed successfully in microorganisms such as Saccharomyces cerevisiae and Escherichia coli, but also in higher eukaryotes. From single binary pairwise interactions to whole-genome interactome mapping, the self-reassembly concept has been employed widely. Innovative studies report the use of proteins such as ubiquitin, dihydrofolate reductase, and adenylate cyclase as reconstituted reporters. Protein fragment complementation assays have extended the possibilities in protein-protein interaction studies, with technologies that enable spatial and temporal analyses of protein complexes. In addition, one-hybrid and three-hybrid systems have broadened the types of interactions that can be studied and the findings that can be obtained. Applications of these technologies are discussed, together with the advantages and limitations of the available assays.
32
Lewis ACF, Jones NS, Porter MA, Deane CM. What evidence is there for the homology of protein-protein interactions? PLoS Comput Biol 2012; 8:e1002645. [PMID: 23028270 PMCID: PMC3447968 DOI: 10.1371/journal.pcbi.1002645] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 10/10/2011] [Accepted: 06/21/2012] [Indexed: 12/17/2022]
Abstract
The notion that sequence homology implies functional similarity underlies much of computational biology. In the case of protein-protein interactions, an interaction can be inferred between two proteins on the basis that sequence-similar proteins have been observed to interact. The use of transferred interactions is common, but the legitimacy of such inferred interactions is not clear. Here we investigate transferred interactions and whether data incompleteness explains the lack of evidence found for them. Using definitions of homology associated with functional annotation transfer, we estimate that conservation rates of interactions are low even after taking interactome incompleteness into account. For example, at a blastp E-value threshold of 10^-70, we estimate the conservation rate to be about 11% between S. cerevisiae and H. sapiens. Our method also produces estimates of interactome sizes (which are similar to those previously proposed). Using our estimates of interaction conservation we estimate the rate at which protein-protein interactions are lost across species. To our knowledge, this is the first such study based on large-scale data. Previous work has suggested that interactions transferred within species are more reliable than interactions transferred across species. By controlling for factors that are specific to within-species interaction prediction, we propose that the transfer of interactions within species might be less reliable than transfers between species. Protein-protein interactions appear to be very rarely conserved unless very high sequence similarity is observed. Consequently, inferred interactions should be used with care.
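A naive version of a conservation-rate calculation can be sketched as follows: among source-species interactions whose partners both have homologs in the target species, count the fraction whose homolog pair is also observed to interact. This is an illustrative assumption, not the authors' estimator; the optional `coverage` division is only a crude stand-in for their correction for interactome incompleteness, and the homolog map is assumed to have been built beforehand (for example, from blastp hits passing an E-value cutoff).

```python
def naive_conservation_rate(source_edges, target_edges, homologs, coverage=1.0):
    """Naive fraction of transferable source interactions seen in the target species.

    source_edges, target_edges: iterables of (protein, protein) pairs, undirected.
    homologs: dict source protein -> set of target-species homologs.
    coverage: assumed fraction of the true target interactome already observed;
        dividing by it is a crude incompleteness correction, not the paper's method.
    """
    observed = {frozenset(e) for e in target_edges}
    transferable = conserved = 0
    for a, b in source_edges:
        candidates = {frozenset((x, y))
                      for x in homologs.get(a, ())
                      for y in homologs.get(b, ())
                      if x != y}
        if not candidates:
            continue  # no homolog pair in the target: the interaction cannot be transferred
        transferable += 1
        if candidates & observed:
            conserved += 1
    if transferable == 0:
        return 0.0
    return min(1.0, conserved / transferable / coverage)


# Toy data (hypothetical protein ids): a-b is conserved as A-B, a-c is not, d-e has no homologs.
rate = naive_conservation_rate([("a", "b"), ("a", "c"), ("d", "e")],
                               [("B", "A")],
                               {"a": {"A"}, "b": {"B"}, "c": {"C"}})
```

The sketch makes visible why incompleteness matters: with a partially observed target interactome, the raw fraction understates true conservation, which is the effect the study controls for more carefully.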
Affiliation(s)
- Anna C. F. Lewis
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Systems Biology Doctoral Training Centre, University of Oxford, Oxford, United Kingdom
- Nick S. Jones
- Department of Mathematics, Imperial College, London, United Kingdom
- Department of Physics, University of Oxford, Oxford, United Kingdom
- CABDyN Complexity Centre, University of Oxford, Oxford, United Kingdom
- Oxford Centre for Integrative Systems Biology, University of Oxford, Oxford, United Kingdom
- Mason A. Porter
- CABDyN Complexity Centre, University of Oxford, Oxford, United Kingdom
- Oxford Centre for Industrial and Applied Mathematics, Mathematical Institute, University of Oxford, Oxford, United Kingdom
- Charlotte M. Deane
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Oxford Centre for Integrative Systems Biology, University of Oxford, Oxford, United Kingdom
33
Gomes M, Hamer R, Reinert G, Deane CM. Mutual information and variants for protein domain-domain contact prediction. BMC Res Notes 2012; 5:472. [PMID: 23244412 PMCID: PMC3532072 DOI: 10.1186/1756-0500-5-472] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/15/2012] [Accepted: 08/10/2012] [Indexed: 01/20/2023]
Abstract
BACKGROUND Predicting protein contacts solely based on sequence information remains a challenging problem, despite the huge amount of sequence data at our disposal. Mutual Information (MI), an information theory measure, has been extensively employed and modified to identify residues within a protein (intra-protein) that are in contact. More recently MI and its variants have also been used in the prediction of contacts between proteins (inter-protein). METHODS Here we assess the predictive power of MI and variants for domain-domain contact prediction. We test original MI and these variants, which are called MIp, MIc and ZNMI, on 40 domain-domain test cases containing 10,753 sequences. We also propose and evaluate two new versions of MI that consider triangles of residues and the physicochemical properties of the amino acids, respectively. RESULTS We found that all versions of MI are skewed towards predicting surface residues. Since domain-domain contacts are on the surface of each domain, we considered only surface residues when attempting to predict contacts. Our analysis shows that MIc is the best current MI domain-domain contact predictor. At 20% recall MIc achieved a precision of 44.9% when only surface residues were considered. Our triangle and reduced alphabet variants of MI highlight the delicate trade-off between signal and noise in the use of MI for domain-domain contact prediction. We also examine a specific "successful" case study and demonstrate that here, when considering surface residues, even the most accurate domain-domain contact predictor, MIc, performs no better than random. CONCLUSIONS All tested variants of MI are skewed towards predicting surface residues. When considering surface residues only, we find MIc to be the best current MI domain-domain contact predictor. Its performance, however, is not as good as a non-MI based contact predictor, i-Patch.
Additionally, the intra-protein contact prediction capabilities of MIc outperform its domain-domain contact prediction abilities.
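The plain MI score and its average-product-corrected variant MIp can be sketched for a pair of paired alignments, where row k of each alignment comes from the same organism. This is a minimal sketch, assuming gaps are treated as an ordinary symbol and that the average product correction is applied using row and column means of the inter-domain MI matrix; MIc, ZNMI and the triangle and reduced-alphabet variants are not implemented.

```python
import math
from collections import Counter


def column_mi(col_x, col_y):
    """Mutual information (in nats) between two paired alignment columns."""
    n = len(col_x)
    count_x, count_y = Counter(col_x), Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def inter_domain_mi(msa_1, msa_2):
    """MI and APC-corrected MIp for every column pair across two paired alignments.

    msa_1, msa_2: lists of equal-length aligned sequences; row k of msa_1 is
    paired with row k of msa_2 (e.g. both domains taken from the same organism).
    """
    cols_1, cols_2 = list(zip(*msa_1)), list(zip(*msa_2))
    mi = [[column_mi(c1, c2) for c2 in cols_2] for c1 in cols_1]
    row_mean = [sum(row) / len(row) for row in mi]
    col_mean = [sum(col) / len(col) for col in zip(*mi)]
    overall = sum(row_mean) / len(row_mean)
    # average product correction: APC(i, j) = row_mean[i] * col_mean[j] / overall
    mip = [[mi[i][j] - (row_mean[i] * col_mean[j] / overall if overall > 0 else 0.0)
            for j in range(len(cols_2))]
           for i in range(len(cols_1))]
    return mi, mip
```

Restricting the alignment columns to surface positions before calling `inter_domain_mi` would mirror the surface-only evaluation described in the abstract; the top-ranked column pairs are then the predicted contacts.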
Affiliation(s)
- Mireille Gomes
- Department of Statistics, University of Oxford, Oxford, UK
34
Hamp T, Rost B. Alternative protein-protein interfaces are frequent exceptions. PLoS Comput Biol 2012; 8:e1002623. [PMID: 22876170 PMCID: PMC3410849 DOI: 10.1371/journal.pcbi.1002623] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 10/28/2011] [Accepted: 06/11/2012] [Indexed: 11/18/2022]
Abstract
The intricate molecular details of protein-protein interactions (PPIs) are crucial for function. Therefore, measuring the same interacting protein pair again, we expect the same result. This work measured the similarity in the molecular details of interaction for the same and for homologous protein pairs between different experiments. All scores analyzed suggested that different experiments often find exceptions in the interfaces of similar PPIs: up to 22% of all comparisons revealed some differences even for sequence-identical pairs of proteins. The corresponding number for pairs of close homologs reached 68%. Conversely, the interfaces differed entirely for 12-29% of all comparisons. All these estimates were calculated after redundancy reduction. The magnitude of interface differences ranged from subtle to the extreme, as illustrated by a few examples. An extreme case was a change of the interacting domains between two observations of the same biological interaction. One reason for different interfaces was the number of copies of an interaction in the same complex: the probability of observing alternative binding modes increases with the number of copies. Even after removing the special cases with alternative hetero-interfaces to the same homomer, a substantial variability remained. Our results strongly support the surprising notion that there are many alternative solutions to make the intricate molecular details of PPIs crucial for function.
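One simple way to quantify whether two observations of the same (or a homologous) interaction use the same interface is a set-overlap score on interface residues. The Jaccard index used here is an assumption for illustration, not necessarily one of the scores the study analyzed, and residues are assumed to be mapped to a common numbering beforehand (for homologs, via a sequence alignment).

```python
def interface_jaccard(site_1, site_2):
    """Jaccard overlap of two interface residue sets in a shared numbering."""
    s1, s2 = set(site_1), set(site_2)
    if not s1 and not s2:
        return 1.0  # two empty interfaces are treated as identical
    return len(s1 & s2) / len(s1 | s2)


def compare_interfaces(site_1, site_2):
    """Bucket two observations as identical, partly different or entirely different."""
    score = interface_jaccard(site_1, site_2)
    if score == 1.0:
        return "identical"
    if score == 0.0:
        return "entirely different"
    return "partly different"
```

For example, residues {10, 11, 12, 13} and {12, 13, 14, 15} share two of six residues, so the pair counts as "partly different"; the three buckets loosely correspond to the "some differences" and "differed entirely" categories reported in the abstract.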
Affiliation(s)
- Tobias Hamp
- TUM, Bioinformatik - I12, Informatik, Garching, Germany
- Burkhard Rost
- TUM, Bioinformatik - I12, Informatik, Garching, Germany
- Institute of Advanced Study (IAS), TUM, Garching, Germany
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
35
Som A, Luštrek M, Singh NK, Fuellen G. Derivation of an interaction/regulation network describing pluripotency in human. Gene 2012; 502:99-107. [PMID: 22548825 DOI: 10.1016/j.gene.2012.04.025] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/22/2012] [Revised: 03/21/2012] [Accepted: 04/09/2012] [Indexed: 01/08/2023]
Abstract
Identification of the key genes/proteins of pluripotency and their interrelationships is an important step in understanding the induction and maintenance of pluripotency. Experimental approaches have accumulated large amounts of interaction/regulation data in mouse. We investigate how far such information can be transferred to human, the species of maximum interest, for which experimental data are much more limited. To address this issue, we mapped an existing mouse pluripotency network (the PluriNetWork) to human. We transferred interaction and regulation links between genes/proteins from mouse to human on the basis of orthologous relationship of the genes/proteins (called interolog mapping). To reduce the number of false positives, we used four different methods: phylogenetic profiling, Gene Ontology semantic similarity, gene co-expression, and RNA interference (RNAi) data. The methods and the resulting networks were evaluated by a novel approach using the information about the genes known to be involved in pluripotency from the literature. The RNAi method proved best for filtering out unlikely interactions, so it was used to construct the final human pluripotency network. The RNAi data are based on human embryonic stem cells (hESCs) that are generally considered to be in a (primed) epiblast stem cell state. Therefore, we assume that the final human network may reflect the (primed) epiblast stem cell state more closely, while the mouse network reflects the (unprimed/naïve) embryonic stem cell state more closely.
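Interolog mapping as described above amounts to relabeling each source-species link through an ortholog table and then filtering the candidates. This is a minimal sketch, assuming an ortholog dictionary is already available and treating all links as undirected (the regulation links in the PluriNetWork are directed, so this is a simplification); the `keep` callback is a hypothetical stand-in for any of the four filters (phylogenetic profiling, GO semantic similarity, co-expression, RNAi), and the toy data are illustrative only.

```python
def interolog_map(source_links, orthologs, keep=None):
    """Transfer links from a source species to a target species via orthology.

    source_links: iterable of (gene, gene) pairs in the source species (e.g. mouse).
    orthologs: dict source gene -> iterable of target-species orthologs (e.g. human).
    keep: optional callable (x, y) -> bool acting as a filter on candidate links.
    Returns a set of frozensets, one per transferred undirected link.
    """
    transferred = set()
    for a, b in source_links:
        for x in orthologs.get(a, ()):
            for y in orthologs.get(b, ()):
                if x == y:
                    continue  # both partners collapse onto one target gene
                if keep is None or keep(x, y):
                    transferred.add(frozenset((x, y)))
    return transferred


# Toy example: "GeneX" is a hypothetical mouse gene with no entry in the ortholog table.
orthologs = {"Pou5f1": ["POU5F1"], "Sox2": ["SOX2"], "Nanog": ["NANOG"]}
mouse_links = [("Pou5f1", "Sox2"), ("Sox2", "Nanog"), ("GeneX", "Nanog")]
supported = {"POU5F1", "SOX2"}  # hypothetical set of genes passing a filter
all_links = interolog_map(mouse_links, orthologs)
filtered = interolog_map(mouse_links, orthologs,
                         keep=lambda x, y: x in supported and y in supported)
```

Links whose partners lack orthologs are silently dropped, and the filter can only remove candidates, never add them, so the filtered network is always a subset of the unfiltered one.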
Affiliation(s)
- Anup Som
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, Ernst-Heydemann-Str. 8, 18057, Rostock, Germany
36
Tyagi M, Thangudu RR, Zhang D, Bryant SH, Madej T, Panchenko AR. Homology inference of protein-protein interactions via conserved binding sites. PLoS One 2012; 7:e28896. [PMID: 22303436 PMCID: PMC3269416 DOI: 10.1371/journal.pone.0028896] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 08/15/2011] [Accepted: 11/16/2011] [Indexed: 11/18/2022]
Abstract
The coverage and reliability of protein-protein interactions determined by high-throughput experiments still need to be improved, especially for higher organisms; therefore, the question persists of how interactions can be verified and predicted by computational approaches using available data on protein structural complexes. Recently we developed an approach called IBIS (Inferred Biomolecular Interaction Server) to predict and annotate protein-protein binding sites and interaction partners, which is based on the assumption that the structural location and sequence patterns of protein-protein binding sites are conserved between close homologs. In this study, we first confirmed the high accuracy of our method and found that its accuracy depends critically on the usage of all available data on structures of homologous complexes, compared to the approaches where only a non-redundant set of complexes is employed. Second, we showed that there exists a trade-off between specificity and sensitivity if we employ in the prediction only evolutionarily conserved binding site clusters or clusters supported by only one observation (singletons). Finally, we addressed the question of identifying the biologically relevant interactions using the homology inference approach and demonstrated that a large majority of crystal packing interactions can be correctly identified and filtered by our algorithm. At the same time, about half of biological interfaces that are not present in the protein crystallographic asymmetric unit can be reconstructed by IBIS from homologous complexes without the prior knowledge of crystal parameters of the query protein.
Affiliation(s)
- Manoj Tyagi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Ratna R. Thangudu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Dachuan Zhang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Stephen H. Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Thomas Madej
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Anna R. Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
37
Herman D, Ochoa D, Juan D, Lopez D, Valencia A, Pazos F. Selection of organisms for the co-evolution-based study of protein interactions. BMC Bioinformatics 2011; 12:363. [PMID: 21910884 PMCID: PMC3179974 DOI: 10.1186/1471-2105-12-363] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/11/2011] [Accepted: 09/12/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND The prediction and study of protein interactions and functional relationships based on similarity of phylogenetic trees, exemplified by the mirrortree and related methodologies, is being widely used. Although dependence between the performance of these methods and the set of organisms used to build the trees was suspected, so far nobody had assessed it in an exhaustive way, and, in general, previous works used as many organisms as possible. In this work we assess the effect of using different sets of organisms (chosen according to various phylogenetic criteria) on the performance of this methodology in detecting protein interactions of different nature. RESULTS We show that the performance of three mirrortree-related methodologies depends on the set of organisms used for building the trees, and it is not always directly related to the number of organisms in a simple way. Certain subsets of organisms seem to be more suitable for the predictions of certain types of interactions. This relationship between type of interaction and optimal set of organisms for detecting them makes sense in the light of the phylogenetic distribution of the organisms and the nature of the interactions. CONCLUSIONS In order to obtain an optimal performance when predicting protein interactions, it is recommended to use different sets of organisms depending on the available computational resources and data, as well as the type of interactions of interest.
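The core mirrortree score can be sketched as the correlation between two inter-organism distance matrices built from the orthologs of each protein family. This is a minimal sketch, assuming the distance matrices are already computed (for example, from multiple sequence alignments) and share the same organism order; the optional `organisms` argument restricts the score to a subset of organisms, which is the choice whose effect the study examines. The toy matrices are illustrative only.

```python
import math


def mirrortree_score(dist_a, dist_b, organisms=None):
    """Pearson correlation of two inter-organism distance matrices (mirrortree-style).

    dist_a, dist_b: symmetric square matrices (lists of lists) of distances between
        the orthologs of protein family A (resp. B), rows/columns in the same organism order.
    organisms: optional list of row indices restricting the comparison to a subset.
    """
    idx = list(range(len(dist_a))) if organisms is None else list(organisms)
    xs, ys = [], []
    for k, i in enumerate(idx):
        for j in idx[k + 1:]:  # upper triangle only; the matrices are symmetric
            xs.append(dist_a[i][j])
            ys.append(dist_b[i][j])
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined if either set of distances is constant


# Toy distance matrices over four organisms (illustrative values only).
dist_a = [[0, 1, 2, 5], [1, 0, 3, 1], [2, 3, 0, 4], [5, 1, 4, 0]]
dist_b = [[0, 2, 4, 1], [2, 0, 6, 5], [4, 6, 0, 2], [1, 5, 2, 0]]
full_score = mirrortree_score(dist_a, dist_b)
subset_score = mirrortree_score(dist_a, dist_b, organisms=[0, 1, 2])
```

In this toy case the two families co-vary perfectly over the first three organisms but not over all four, which illustrates how the choice of organism set can change the score.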
Affiliation(s)
- Dorota Herman
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), Cantoblanco, 28049 Madrid, Spain
38
Testing the ortholog conjecture with comparative functional genomic data from mammals. PLoS Comput Biol 2011; 7:e1002073. [PMID: 21695233 PMCID: PMC3111532 DOI: 10.1371/journal.pcbi.1002073] [Citation(s) in RCA: 130] [Impact Index Per Article: 9.3] [Received: 02/01/2011] [Accepted: 04/18/2011] [Indexed: 01/23/2023]
Abstract
A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the “ortholog conjecture”). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act. The use of model organisms in biological research rests upon the assumption that gene and protein functions discovered in one organism are likely to be the same or similar in another organism. Hence, the assumption that experiments in mouse will tell us about the function of genes in humans. 
A guiding principle in the assignment of function from one organism to another is that single-copy genes (“orthologs”) are statistically more likely to provide functional information than are multi-copy genes, whether in the same organism or different organisms. Here we have tested this idea by examining genes with known functions in human and mouse. Surprisingly, we find that multi-copy genes are equally or more likely to provide accurate functional information than are single-copy genes. Our results suggest that the organism itself plays at least as large a role in determining the function of genes as does the particular sequence of the gene alone. This insight will benefit the assignment of function to genes whose roles are not yet known by widening the pool of appropriate genes from which function can be inferred.
39
Lees JG, Heriche JK, Morilla I, Ranea JA, Orengo CA. Systematic computational prediction of protein interaction networks. Phys Biol 2011; 8:035008. [PMID: 21572181 DOI: 10.1088/1478-3975/8/3/035008] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 01/29/2023]
Abstract
Determining the network of physical protein associations is an important first step in developing mechanistic evidence for elucidating biological pathways. Despite rapid advances in the field of high-throughput experiments to determine protein interactions, the majority of associations remain unknown. Here we describe computational methods for significantly expanding protein association networks. We describe methods for integrating multiple independent sources of evidence to obtain higher quality predictions and we compare the major publicly available resources for experimentalists to use.
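Integration of independent evidence sources is often done with a naive Bayes scheme, in which each source contributes a likelihood ratio and the ratios are multiplied into the prior odds. This sketch assumes conditional independence between sources and pre-computed likelihood ratios (estimated, for example, against gold-standard positive and negative interaction sets); it is one common scheme, not necessarily one of the methods the review compares, and the numbers in the usage line are hypothetical.

```python
def posterior_odds(prior_odds, likelihood_ratios):
    """Naive Bayes evidence integration: multiply the prior odds by each source's LR.

    Assumes the evidence sources are conditionally independent given the true
    interaction status; each LR = P(evidence | interacting) / P(evidence | not interacting).
    """
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds


def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


# Hypothetical example: prior odds 0.01 and two sources with LRs 10 and 5.
odds = posterior_odds(0.01, [10.0, 5.0])
probability = odds_to_probability(odds)
```

Because the ratios multiply, correlated sources (for example, two co-expression datasets from the same platform) would be double-counted under this scheme, which is why the independence assumption matters.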
Affiliation(s)
- J G Lees
- Research Department of Structural & Molecular Biology, University College London, London, UK.
|
40
|
Clark WT, Radivojac P. Analysis of protein function and its prediction from amino acid sequence. Proteins 2011; 79:2086-96. [PMID: 21671271 DOI: 10.1002/prot.23029] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.6] [Received: 10/19/2010] [Revised: 02/15/2011] [Accepted: 03/03/2011] [Indexed: 01/02/2023]
Abstract
Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in the context of human disease because many conditions arise as a consequence of alterations of protein function. The recent availability of relatively inexpensive sequencing technology has resulted in thousands of complete or partially sequenced genomes with millions of functionally uncharacterized proteins. Such a large volume of data, combined with the lack of high-throughput experimental assays to functionally annotate proteins, contributes to the growing importance of automated function prediction. Here, we study proteins annotated by Gene Ontology (GO) terms and estimate the accuracy of functional transfer from protein sequence only. We find that the transfer of GO terms by pairwise sequence alignments is only moderately accurate, showing a surprisingly small influence of sequence identity (SID) across a broad range (30-100%). We developed and evaluated a new predictor of protein function from amino acid sequence, functional annotator (FANN). The predictor exploits a multioutput neural network framework, which is well suited to simultaneously modeling dependencies between functional terms. Experiments provide evidence that FANN-GO (predictor of GO terms; available from http://www.informatics.indiana.edu/predrag) outperforms standard methods such as transfer by global or local SID as well as GOtcha, a method that incorporates the structure of GO.
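Transfer of GO terms by pairwise alignment, the baseline that FANN-GO is compared against, can be sketched as copying the annotations of the annotated homolog with the highest sequence identity. This minimal sketch assumes alignments and SID values were computed elsewhere; the 30% threshold, protein identifiers and GO terms are illustrative only and not taken from the paper.

```python
"""Sketch: transfer GO terms from the annotated homolog with the highest
sequence identity (SID), then score the prediction against known terms.
Hit lists, SID values and GO IDs are hypothetical."""


def transfer_best_hit(hits, annotations, min_sid=30.0):
    """hits: list of (homolog_id, sid_percent). Returns the best hit's GO set."""
    eligible = [h for h in hits if h[1] >= min_sid]
    if not eligible:
        return set()
    best_id, _ = max(eligible, key=lambda h: h[1])
    return set(annotations[best_id])


def precision_recall(predicted, true):
    """Term-level precision and recall of a predicted GO set."""
    tp = len(predicted & true)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true) if true else 0.0
    return precision, recall


annotations = {"P1": {"GO:0005524", "GO:0004672"}, "P2": {"GO:0003677"}}
hits = [("P1", 45.0), ("P2", 62.0)]  # made-up SID values (%)
pred = transfer_best_hit(hits, annotations)
print(pred, precision_recall(pred, {"GO:0003677", "GO:0006355"}))
# {'GO:0003677'} (1.0, 0.5)
```

Sweeping `min_sid` and averaging precision and recall over many query proteins is one way to reproduce the kind of SID-dependence analysis the abstract describes.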
Affiliation(s)
- Wyatt T Clark
- School of Informatics and Computing, Indiana University, Bloomington, Indiana 47405, USA
|
41
|
Lateral acquisition of genes is affected by the friendliness of their products. Proc Natl Acad Sci U S A 2010; 108:343-8. [PMID: 21149709 DOI: 10.1073/pnas.1009775108] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
A major factor in the evolution of microbial genomes is the lateral acquisition of genes that evolved under the functional constraints of other species. Integration of foreign genes into a genome that has different components and circuits poses an evolutionary challenge. Moreover, genes belonging to complex modules in the pretransfer species are unlikely to maintain their functionality when transferred alone to new species. Thus, it is widely accepted that lateral gene transfer favors proteins with only a few protein-protein interactions. The propensity of proteins to participate in protein-protein interactions can be assessed using computational methods that identify putative interaction sites on the protein. Here we report that laterally acquired proteins contain significantly more putative interaction sites than native proteins. Thus, genes encoding proteins with multiple protein-protein interactions may in fact be more prone to transfer than genes with fewer interactions. We suggest that these proteins have a greater chance of forming new interactions in new species, thus integrating into existing modules. These results reveal basic principles for the incorporation of novel genes into existing systems.
|
42
|
van Dijk ADJ, Morabito G, Fiers M, van Ham RCHJ, Angenent GC, Immink RGH. Sequence motifs in MADS transcription factors responsible for specificity and diversification of protein-protein interaction. PLoS Comput Biol 2010; 6:e1001017. [PMID: 21124869 PMCID: PMC2991254 DOI: 10.1371/journal.pcbi.1001017] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Received: 07/10/2010] [Accepted: 10/27/2010] [Indexed: 11/18/2022]
Abstract
Protein sequences encompass tertiary structures and contain information about specific molecular interactions, which in turn determine biological functions of proteins. Knowledge about how protein sequences define interaction specificity is largely missing, in particular for paralogous protein families with high sequence similarity, such as the plant MADS domain transcription factor family. In comparison to the situation in mammalian species, this important family of transcription regulators has expanded enormously in plant species and contains over 100 members in the model plant species Arabidopsis thaliana. Here, we provide insight into the mechanisms that determine protein-protein interaction specificity for the Arabidopsis MADS domain transcription factor family, using an integrated computational and experimental approach. Plant MADS proteins have highly similar amino acid sequences, but their dimerization patterns vary substantially. Our computational analysis uncovered small sequence regions that explain observed differences in dimerization patterns with reasonable accuracy. Furthermore, we show the usefulness of the method for prediction of MADS domain transcription factor interaction networks in other plant species. Introduction of mutations in the predicted interaction motifs demonstrated that single amino acid mutations can have a large effect and lead to loss or gain of specific interactions. In addition, various bioinformatics analyses shed light on the way evolution has shaped MADS domain transcription factor interaction specificity. Identified protein-protein interaction motifs appeared to be strongly conserved among orthologs, indicating their evolutionary importance. We also provide evidence that mutations in these motifs can be a source of sub- or neo-functionalization. The analyses presented here take us a step forward in understanding protein-protein interactions and the interplay between protein sequences and network evolution.
Affiliation(s)
- Martijn Fiers
- Plant Research International, Bioscience, Wageningen, The Netherlands
- Gerco C. Angenent
- Plant Research International, Bioscience, Wageningen, The Netherlands
- Centre for BioSystems Genomics (CBSG), Wageningen, The Netherlands
- Richard G. H. Immink
- Plant Research International, Bioscience, Wageningen, The Netherlands
- Centre for BioSystems Genomics (CBSG), Wageningen, The Netherlands
|
43
|
Levy ED. A Simple Definition of Structural Regions in Proteins and Its Use in Analyzing Interface Evolution. J Mol Biol 2010; 403:660-70. [DOI: 10.1016/j.jmb.2010.09.028] [Citation(s) in RCA: 133] [Impact Index Per Article: 8.9] [Received: 06/24/2010] [Revised: 08/19/2010] [Accepted: 09/13/2010] [Indexed: 10/19/2022]
|
44
|
van Dijk ADJ, van Ham RCHJ. Conserved and variable correlated mutations in the plant MADS protein network. BMC Genomics 2010; 11:607. [PMID: 20979667 PMCID: PMC3017862 DOI: 10.1186/1471-2164-11-607] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 03/18/2010] [Accepted: 10/28/2010] [Indexed: 11/29/2022]
Abstract
BACKGROUND Plant MADS domain proteins are involved in a variety of developmental processes for which their ability to form various interactions is a key requisite. However, not much is known about the structure of these proteins or their complexes, whereas such knowledge would be valuable for a better understanding of their function. Here, we analyze those proteins and the complexes they form using a correlated mutation approach in combination with available structural, bioinformatics and experimental data. RESULTS Correlated mutations are affected by several types of noise, which is difficult to disentangle from the real signal. In our analysis of the MADS domain proteins, we apply for the first time a correlated mutation analysis to a family of interacting proteins. This provides a unique way to investigate the amount of signal that is present in correlated mutations because it allows direct comparison of mutations in various family members and assessment of their conservation. We show that correlated mutations in general are conserved within the various family members, and if not, the variability at the respective positions is lower in the proteins in which the correlated mutation does not occur. Also, intermolecular correlated mutation signals for interacting pairs of proteins display clear overlap with other bioinformatics data, which is not the case for non-interacting protein pairs, an observation that validates the intermolecular correlated mutations. Having validated the correlated mutation results, we apply them to infer the structural organization of the MADS domain proteins. CONCLUSION Our analysis enables understanding of the structural organization of the MADS domain proteins, including support for predicted helices based on correlated mutation patterns, and evidence for a specific interaction site in those proteins.
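The basic quantity behind many correlated-mutation analyses is the mutual information (MI) between two alignment columns. The sketch below computes plain MI on a made-up alignment; published methods typically add corrections (such as sequence weighting or the average product correction), and the paper's exact scoring may differ. For intermolecular signals, the same function could be applied to an alignment of concatenated sequences of paired interacting proteins.

```python
"""Sketch: mutual information (MI) between two columns of a multiple sequence
alignment, a basic correlated-mutation score. Plain MI without corrections;
the toy alignment is made up."""
import math
from collections import Counter


def column_mi(alignment, i, j):
    """MI (in nats) between alignment columns i and j."""
    n = len(alignment)
    pi = Counter(seq[i] for seq in alignment)            # residue counts, column i
    pj = Counter(seq[j] for seq in alignment)            # residue counts, column j
    pij = Counter((seq[i], seq[j]) for seq in alignment)  # joint pair counts
    mi = 0.0
    for (a, b), count in pij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi


msa = ["AKLE", "AKLD", "GRLE", "GRLD"]
print(column_mi(msa, 0, 1))  # ~0.693 (= ln 2): columns 0 and 1 co-vary perfectly
print(column_mi(msa, 0, 3))  # 0.0: columns 0 and 3 vary independently
```

A fully conserved column (column 2 here) always yields zero MI, which is one reason conserved positions carry no correlated-mutation signal.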
Affiliation(s)
- Aalt DJ van Dijk
- Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- Roeland CHJ van Ham
- Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
|
45
|
Ranea JAG, Morilla I, Lees JG, Reid AJ, Yeats C, Clegg AB, Sanchez-Jimenez F, Orengo C. Finding the "dark matter" in human and yeast protein network prediction and modelling. PLoS Comput Biol 2010; 6:e1000945. [PMID: 20885791 PMCID: PMC2944794 DOI: 10.1371/journal.pcbi.1000945] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 12/14/2009] [Accepted: 08/30/2010] [Indexed: 11/17/2022]
Abstract
Accurate modelling of biological systems requires a deeper and more complete knowledge about the molecular components and their functional associations than we currently have. Traditionally, new knowledge on protein associations generated by experiments has played a central role in systems modelling, in contrast to generally less trusted bio-computational predictions. However, we will not achieve realistic modelling of complex molecular systems if the current experimental designs lead to biased screenings of real protein networks and leave large, functionally important areas poorly characterised. To assess the likelihood of this, we have built comprehensive network models of the yeast and human proteomes by using a meta-statistical integration of diverse computationally predicted protein association datasets. We have compared these predicted networks against combined experimental datasets from seven biological resources at different levels of statistical significance. These eukaryotic predicted networks resemble all the topological and noise features of the experimentally inferred networks in both species, and we also show that this observation is not due to random behaviour. In addition, the topology of the predicted networks contains information on true protein associations, beyond the constitutive first-order binary predictions. We also observe that most of the reliable predicted protein associations are experimentally uncharacterised in our models, constituting the hidden or "dark matter" of networks by analogy to astronomical systems. Some of this dark matter shows enrichment of particular functions and contains key functional elements of protein networks, such as hubs associated with important functional areas like the regulation of Ras protein signal transduction in human cells. Thus, characterising this large and functionally important dark matter, elusive to established experimental designs, may be crucial for modelling biological systems.
In any case, these predictions provide a valuable guide to these experimentally elusive regions.
Affiliation(s)
- Juan A. G. Ranea
- Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
- Department of Molecular Biology and Biochemistry-CIBER de Enfermedades Raras, University of Malaga, Malaga, Spain
- Ian Morilla
- Department of Molecular Biology and Biochemistry-CIBER de Enfermedades Raras, University of Malaga, Malaga, Spain
- Jon G. Lees
- Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
- Adam J. Reid
- Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
- Corin Yeats
- Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
- Andrew B. Clegg
- Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
- Francisca Sanchez-Jimenez
- Department of Molecular Biology and Biochemistry-CIBER de Enfermedades Raras, University of Malaga, Malaga, Spain
- Christine Orengo
- Research Department of Structural & Molecular Biology, University College London, London, United Kingdom
|
46
|
Kundrotas PJ, Vakser IA. Accuracy of protein-protein binding sites in high-throughput template-based modeling. PLoS Comput Biol 2010; 6:e1000727. [PMID: 20369011 PMCID: PMC2848539 DOI: 10.1371/journal.pcbi.1000727] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 05/26/2009] [Accepted: 03/01/2010] [Indexed: 11/18/2022]
Abstract
The accuracy of protein structures, particularly their binding sites, is essential for the success of modeling protein complexes. Computationally inexpensive methodology is required for genome-wide modeling of such structures. For systematic evaluation of potential accuracy in high-throughput modeling of binding sites, a statistical analysis of target-template sequence alignments was performed for a representative set of protein complexes. For most of the complexes, alignments containing all residues of the interface were found. The full interface alignments were obtained even in the case of poor alignments where a relatively small part of the target sequence (as low as 40%) aligned to the template sequence, with a low overall alignment identity (<30%). Although such poor overall alignments might be considered inadequate for modeling of whole proteins, the alignment of the interfaces was strong enough for docking. In the set of homology models built on these alignments, one third of those ranked 1 by a simple sequence identity criterion had RMSD<5 Å, the accuracy suitable for low-resolution template-free docking. Such models corresponded to multi-domain target proteins, whereas for single-domain proteins the best models had 5 Å<RMSD<10 Å, the accuracy suitable for less sensitive structure-alignment methods. Overall, ∼50% of complexes with the interfaces modeled by high-throughput techniques had accuracy suitable for meaningful docking experiments. This percentage will grow with the increasing availability of co-crystallized protein-protein complexes.
Protein-protein interactions play a central role in life processes at the molecular level. The structural information on these interactions is essential for our understanding of these processes and our ability to design drugs to cure diseases. Limitations of experimental techniques to determine the structure of protein-protein complexes leave the vast majority of these complexes to be determined by computational modeling. The modeling is also important for revealing the mechanisms of complex formation. The 3D modeling of protein complexes (protein docking) relies on the structure of the individual proteins for the prediction of their assembly. Thus, the structural accuracy of the individual proteins, which often are models themselves, is critical for the docking. For docking purposes, the accuracy of the binding sites is obviously essential, whereas the accuracy of the non-binding regions is less critical. In our study, we systematically analyze the accuracy of the binding sites in protein models produced by high-throughput techniques suitable for large-scale (e.g., genome-wide) studies. The results indicate that this accuracy is adequate for the low- to medium-resolution docking of a significant part of known protein-protein complexes.
Affiliation(s)
- Petras J. Kundrotas
- Center for Bioinformatics and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, United States of America
- Ilya A. Vakser
- Center for Bioinformatics and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, United States of America
|
47
|
Raman K. Construction and analysis of protein-protein interaction networks. Automated Experimentation 2010; 2:2. [PMID: 20334628 PMCID: PMC2834675 DOI: 10.1186/1759-4499-2-2] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.4] [Received: 11/25/2009] [Accepted: 02/15/2010] [Indexed: 12/28/2022]
Abstract
Protein–protein interactions form the basis for a vast majority of cellular events, including signal transduction and transcriptional regulation. It is now understood that the study of interactions between cellular macromolecules is fundamental to the understanding of biological systems. Interactions between proteins have been studied through a number of high-throughput experiments and have also been predicted through an array of computational methods that leverage the vast amount of sequence data generated in the last decade. In this review, I discuss some of the important computational methods for the prediction of functional linkages between proteins. I then give a brief overview of some of the databases and tools that are useful for a study of protein–protein interactions. I also present an introduction to network theory, followed by a discussion of the parameters commonly used in analysing networks, important network topologies, as well as methods to identify important network components, based on perturbations.
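Two of the standard network parameters mentioned in the review, node degree and the local clustering coefficient, can be computed from an adjacency-set representation as in the minimal sketch below; the edge list and protein names are hypothetical, and a dedicated graph library would normally be used for real interactomes.

```python
"""Sketch: node degree and local clustering coefficient on a small undirected
PPI graph stored as an adjacency dict. Protein names are hypothetical."""
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets; self-interactions are ignored."""
    graph = defaultdict(set)
    for u, v in edges:
        if u != v:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = list(graph.get(node, ()))
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k)
                if neighbours[b] in graph[neighbours[a]])
    return 2.0 * links / (k * (k - 1))


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
g = build_graph(edges)
degrees = {n: len(nbrs) for n, nbrs in g.items()}
print(degrees)                         # {'A': 3, 'B': 2, 'C': 2, 'D': 1}
print(clustering_coefficient(g, "A"))  # 0.333... (1 of 3 neighbour pairs linked)
```

Using `graph.get` in `clustering_coefficient` avoids silently inserting unknown nodes into the `defaultdict` when querying them.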
Affiliation(s)
- Karthik Raman
- Department of Biochemistry, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland.
|
48
|
Reid AJ, Ranea JA, Orengo CA. Comparative evolutionary analysis of protein complexes in E. coli and yeast. BMC Genomics 2010; 11:79. [PMID: 20122144 PMCID: PMC2837643 DOI: 10.1186/1471-2164-11-79] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 04/16/2009] [Accepted: 02/01/2010] [Indexed: 11/17/2022]
Abstract
Background Proteins do not act in isolation; they frequently act together in protein complexes to carry out concerted cellular functions. The evolution of complexes is poorly understood, especially in organisms other than yeast, where little experimental data has been available. Results We generated accurate, high-coverage datasets of protein complexes for E. coli and yeast in order to study differences in the evolution of complexes between these two species. We show that substantial differences exist in how complexes have evolved between these organisms. A previously proposed model of complex evolution identified complexes with cores of interacting homologues. We support findings of the relative importance of this mode of evolution in yeast, but find that it is much less common in E. coli. Additionally, it is shown that those homologues which do cluster in complexes are involved in eukaryote-specific functions. Furthermore, we identify correlated pairs of non-homologous domains which occur in multiple protein complexes. These were identified in both yeast and E. coli, and we present evidence that these too may represent complex cores in yeast but not in E. coli. Conclusions Our results suggest that there are differences in the way protein complexes have evolved in E. coli and yeast. Whereas some yeast complexes have evolved by recruiting paralogues, this is not apparent in E. coli. Furthermore, such complexes are involved in eukaryote-specific functions. This implies that the increase in gene family sizes seen in eukaryotes in part reflects multiple family members being used within complexes. However, in general, in both E. coli and yeast, homologous domains are used in different complexes.
Affiliation(s)
- Adam J Reid
- Research Department of Structural & Molecular Biology, University College London, London, WC1E 6BT, UK.
|
49
|
Terentiev AA, Moldogazieva NT, Shaitan KV. Dynamic proteomics in modeling of the living cell. Protein-protein interactions. Biochemistry (Moscow) 2010; 74:1586-607. [DOI: 10.1134/s0006297909130112] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 11/23/2022]
|
50
|
Shin CJ, Davis MJ, Ragan MA. Towards the mammalian interactome: Inference of a core mammalian interaction set in mouse. Proteomics 2009; 9:5256-66. [DOI: 10.1002/pmic.200900262] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/05/2022]
|