1. Melkonian M, Juigné C, Dameron O, Rabut G, Becker E. Towards a reproducible interactome: semantic-based detection of redundancies to unify protein-protein interaction databases. Bioinformatics 2022; 38:1685-1691. [PMID: 35015827 DOI: 10.1093/bioinformatics/btac013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/03/2021] [Revised: 11/29/2021] [Accepted: 01/06/2022] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION Information on protein-protein interactions is collected in numerous primary databases with their own curation process. Several meta-databases aggregate primary databases to provide more exhaustive datasets. In addition to exhaustivity, aggregation contributes to reliability by providing an overview of the various studies and detection methods supporting an interaction. However, interactions listed in different primary databases are partly redundant because some publications reporting protein-protein interactions have been curated by multiple primary databases. Mere aggregation can thus introduce a bias if these redundancies are not identified and eliminated. To overcome this bias, meta-databases rely on the Molecular Interaction ontology that describes interaction detection methods, but they do not fully take advantage of the ontology's rich semantics, which leads to systematically overestimating interaction reproducibility. RESULTS We propose a precise definition of explicit and implicit redundancy and show that both can be easily detected using Semantic Web technologies. We apply this process to a dataset from the Agile Protein Interactomes DataServer (APID) meta-database and show that while explicit redundancies were detected by the APID aggregation process, about 15% of APID entries are implicitly redundant and should not be taken into account when presenting confidence-related metrics. More than 90% of implicit redundancies result from the aggregation of distinct primary databases, whereas the remainder occur between entries of a single database. Finally, we build a 'reproducible interactome' with interactions that have been reproduced by multiple methods or publications. The size of the reproducible interactome is drastically impacted by removing redundancies for both yeast (-59%) and human (-56%), and we show that this is largely due to implicit redundancies.
AVAILABILITY AND IMPLEMENTATION Software, data and results are available at https://gitlab.com/nnet56/reproducible-interactome, https://reproducible-interactome.genouest.org/, Zenodo (https://doi.org/10.5281/zenodo.5595037) and NDEx (https://doi.org/10.18119/N94302 and https://doi.org/10.18119/N97S4D). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
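The distinction between explicit and implicit redundancy can be illustrated with a minimal sketch. It assumes, as a reading of the abstract rather than the authors' exact definition, that two entries are explicitly redundant when they report the same protein pair from the same publication with the same detection method, and implicitly redundant when the methods differ but one is an ancestor of the other in the method hierarchy. The hierarchy fragment, the identifiers `P1`, `P2` and `pub1`, and the function names are hypothetical; the paper itself uses Semantic Web technologies rather than plain Python.

```python
# Hypothetical toy fragment of a detection-method hierarchy (child -> parent).
# The real Molecular Interaction ontology is far larger; names are illustrative.
PARENT = {
    "two hybrid array": "two hybrid",
    "two hybrid": "experimental interaction detection",
    "pull down": "experimental interaction detection",
}


def ancestors(term):
    """Return all strict ancestors of a term in the toy hierarchy."""
    found = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def classify(entry_a, entry_b):
    """Label a pair of entries as 'explicit', 'implicit' or 'distinct'.

    Each entry is (protein pair, publication id, detection method).
    The rules below are an assumed reading of the abstract.
    """
    pair_a, pub_a, method_a = entry_a
    pair_b, pub_b, method_b = entry_b
    if pair_a != pair_b or pub_a != pub_b:
        return "distinct"
    if method_a == method_b:
        return "explicit"
    if method_a in ancestors(method_b) or method_b in ancestors(method_a):
        return "implicit"
    return "distinct"


# Hypothetical entries for one protein pair reported in one publication.
pair = frozenset({"P1", "P2"})
e1 = (pair, "pub1", "two hybrid")
e2 = (pair, "pub1", "two hybrid array")
e3 = (pair, "pub1", "pull down")

labels = [classify(e1, e1), classify(e1, e2), classify(e1, e3)]
print(labels)  # ['explicit', 'implicit', 'distinct']
```

In this sketch only the third comparison would count as independent support for the interaction, since the two methods are unrelated in the toy hierarchy.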
Affiliation(s)
- Marc Melkonian
  - Univ Rennes, Inria, CNRS, IRISA - UMR 6074, F-35000 Rennes, France
  - Univ Rennes, CNRS, IGDR - UMR 6290, F-35000 Rennes, France
- Camille Juigné
  - Univ Rennes, Inria, CNRS, IRISA - UMR 6074, F-35000 Rennes, France
  - Pegase, Inrae, Institut Agro, 35590 Saint-Gilles, France
- Olivier Dameron
  - Univ Rennes, Inria, CNRS, IRISA - UMR 6074, F-35000 Rennes, France
- Gwenaël Rabut
  - Univ Rennes, CNRS, IGDR - UMR 6290, F-35000 Rennes, France
2. OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022] Open
3. Khazen G, Gyulkhandanian A, Issa T, Maroun RC. Getting to know each other: PPIMem, a novel approach for predicting transmembrane protein-protein complexes. Comput Struct Biotechnol J 2021; 19:5184-5197. [PMID: 34630938 PMCID: PMC8476896 DOI: 10.1016/j.csbj.2021.09.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/07/2021] [Revised: 08/23/2021] [Accepted: 09/12/2021] [Indexed: 02/03/2023] Open
Abstract
Because of their considerable number and diversity, membrane proteins and their macromolecular complexes represent the functional units of cells. Their quaternary structure may be stabilized by interactions between the α-helices of different proteins in the hydrophobic region of the cell membrane. Membrane proteins are also potential pharmacological targets par excellence for various diseases. Unfortunately, experimental 3D structures of these proteins and of their complexes with other intramembrane protein partners are scarce due to technical difficulties. To overcome this key problem, we devised PPIMem, a computational approach for the specific prediction of higher-order structures of α-helical transmembrane proteins. The novel approach involves proper identification of the amino acid residues at the interface of molecular complexes with a 3D structure. The identified residues then compose nonlinear interaction motifs that are conveniently expressed as mathematical regular expressions. These are efficiently implemented for motif search in amino acid sequence databases, and for the accurate prediction of intramembrane protein-protein complexes. Our template interface-based approach predicted 21,544 binary complexes between 1,504 eukaryotic plasma membrane proteins across 39 species. We compare our predictions to experimental datasets of protein-protein interactions as a first validation method. The online database that results from the PPIMem algorithm, with the annotated predicted interactions, is implemented as a web server and can be accessed directly at https://transint.univ-evry.fr.
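As a rough illustration of motif search with regular expressions, the sketch below scans a few sequences for a gapped pattern in which fixed residues are separated by positions that accept any amino acid. The motif, the sequences and the function name are invented for the example and are not taken from PPIMem.

```python
import re

# Hypothetical gapped motif: L, any 3 residues, G, any 2 residues, then A or S.
MOTIF = re.compile(r"L.{3}G.{2}[AS]")

# Hypothetical one-letter amino acid sequences.
SEQUENCES = {
    "protA": "MKTLAVLGIIAWW",
    "protB": "MGGSPLLQWERTY",
}


def find_motif(motif, sequences):
    """Return {sequence name: [0-based start positions]} for sequences with hits.

    finditer reports non-overlapping matches only.
    """
    hits = {}
    for name, seq in sequences.items():
        positions = [m.start() for m in motif.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits


hits = find_motif(MOTIF, SEQUENCES)
print(hits)  # {'protA': [3]}
```

Two proteins that each carry one side of a known interface motif would then become candidates for a predicted complex; that pairing step is not shown here.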
Affiliation(s)
- Georges Khazen
  - Computer Science and Mathematics Department, Lebanese American University, Byblos, Lebanon
- Aram Gyulkhandanian
  - Inserm U1204/Université d'Evry/Université Paris-Saclay, Structure-Activité des Biomolécules Normales et Pathologiques, 91025 Evry, France
- Tina Issa
  - Computer Science and Mathematics Department, Lebanese American University, Byblos, Lebanon
- Rachid C Maroun
  - Inserm U1204/Université d'Evry/Université Paris-Saclay, Structure-Activité des Biomolécules Normales et Pathologiques, 91025 Evry, France
4. Farahi N, Lazar T, Wodak SJ, Tompa P, Pancsa R. Integration of Data from Liquid-Liquid Phase Separation Databases Highlights Concentration and Dosage Sensitivity of LLPS Drivers. Int J Mol Sci 2021; 22:ijms22063017. [PMID: 33809541 PMCID: PMC8002189 DOI: 10.3390/ijms22063017] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 02/27/2021] [Revised: 03/12/2021] [Accepted: 03/13/2021] [Indexed: 12/13/2022] Open
Abstract
Liquid–liquid phase separation (LLPS) is a molecular process that leads to the formation of membraneless organelles, representing functionally specialized liquid-like cellular condensates formed by proteins and nucleic acids. Integrating the data on LLPS-associated proteins from dedicated databases revealed only modest agreement between them and yielded a high-confidence dataset of 89 human LLPS drivers. Analysis of the supporting evidence for our dataset uncovered a systematic and potentially concerning difference between protein concentrations used in a good fraction of the in vitro LLPS experiments, a key parameter that governs the phase behavior, and the proteomics-derived cellular abundance levels of the corresponding proteins. Closer scrutiny of the underlying experimental data enabled us to offer a sound rationale for this systematic difference, which draws on our current understanding of the cellular organization of the proteome and the LLPS process. In support of this rationale, we find that genes coding for our human LLPS drivers tend to be dosage-sensitive, suggesting that their cellular availability is tightly regulated to preserve their functional role in direct or indirect relation to condensate formation. Our analysis offers guideposts for increasing agreement between in vitro and in vivo studies, probing the roles of proteins in LLPS.
Affiliation(s)
- Nazanin Farahi
  - VIB-VUB Center for Structural Biology, Flemish Institute for Biotechnology, 1050 Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, 1050 Brussels, Belgium
  - Department of Biology, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Tamas Lazar
  - VIB-VUB Center for Structural Biology, Flemish Institute for Biotechnology, 1050 Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Shoshana J. Wodak
  - VIB-VUB Center for Structural Biology, Flemish Institute for Biotechnology, 1050 Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Peter Tompa
  - VIB-VUB Center for Structural Biology, Flemish Institute for Biotechnology, 1050 Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, 1050 Brussels, Belgium
  - Institute of Enzymology, Research Centre for Natural Sciences, 1117 Budapest, Hungary
  - Corresponding author
- Rita Pancsa
  - Institute of Enzymology, Research Centre for Natural Sciences, 1117 Budapest, Hungary
  - Corresponding author
5.
Abstract
iRefWeb is a resource that provides a web interface to a large collection of protein-protein interactions aggregated from major primary databases. The underlying data-consolidation process, called iRefIndex, implements a rigorous methodology for identifying redundant protein sequences and integrating disparate data records that reference the same peptide sequences, despite many potential differences in data identifiers across various source databases. iRefWeb offers a unified user interface to all interaction records and associated information collected by iRefIndex, in addition to a number of data filters and visual features that present the supporting evidence. Users of iRefWeb can explore the consolidated landscape of protein-protein interactions, establish the provenance and reliability of each data record, and compare annotations performed by different data curator teams. The iRefWeb portal is freely available at http://wodaklab.org/iRefWeb.
6. Bajpai AK, Davuluri S, Tiwary K, Narayanan S, Oguru S, Basavaraju K, Dayalan D, Thirumurugan K, Acharya KK. Systematic comparison of the protein-protein interaction databases from a user's perspective. J Biomed Inform 2020; 103:103380. [PMID: 32001390 DOI: 10.1016/j.jbi.2020.103380] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 04/08/2019] [Revised: 11/08/2019] [Accepted: 01/27/2020] [Indexed: 01/08/2023]
7. Gemovic B, Sumonja N, Davidovic R, Perovic V, Veljkovic N. Mapping of Protein-Protein Interactions: Web-Based Resources for Revealing Interactomes. Curr Med Chem 2019; 26:3890-3910. [PMID: 29446725 DOI: 10.2174/0929867325666180214113704] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 04/26/2017] [Revised: 09/14/2017] [Accepted: 01/29/2018] [Indexed: 01/04/2023]
Abstract
BACKGROUND The significant number of protein-protein interactions (PPIs) discovered by harnessing concomitant advances in the fields of sequencing, crystallography, spectrometry and two-hybrid screening suggests astonishing prospects for remodelling drug discovery. The PPI space, which includes up to 650 000 entities, is a remarkable reservoir of potential therapeutic targets for every human disease. In order to allow modern drug discovery programs to leverage this, we should be able to discern complete PPI maps associated with a specific disorder and corresponding normal physiology. OBJECTIVE Here, we will review community-available computational programs for predicting PPIs and web-based resources for storing experimentally annotated interactions. METHODS We compared the capacities of prediction tools: iLoops, Struct2Net, HOMCOS, COTH, PrePPI, InterPreTS and PRISM to predict recently discovered protein interactions. RESULTS We described sequence-based and structure-based PPI prediction tools and addressed their peculiarities. Additionally, since the usefulness of prediction algorithms critically depends on the quality and quantity of the experimental data they are built on, we extensively discussed community resources for protein interactions. We focused on the active and recently updated primary and secondary PPI databases, repositories specialized to the subject or species, as well as databases that include both experimental and predicted PPIs. CONCLUSION PPI complexes are the basis of important physiological processes and therefore, possible targets for cell-penetrating ligands. Reliable computational PPI predictions can speed up new target discoveries through prioritization of therapeutically relevant protein-protein complexes for experimental studies.
Affiliation(s)
- Branislava Gemovic
  - Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia
- Neven Sumonja
  - Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia
- Radoslav Davidovic
  - Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia
- Vladimir Perovic
  - Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia
- Nevena Veljkovic
  - Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia
8. Badal VD, Kundrotas PJ, Vakser IA. Natural language processing in text mining for structural modeling of protein complexes. BMC Bioinformatics 2018; 19:84. [PMID: 29506465 PMCID: PMC5838950 DOI: 10.1186/s12859-018-2079-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 10/24/2017] [Accepted: 02/20/2018] [Indexed: 12/04/2022] Open
Abstract
Background Structural modeling of protein-protein interactions produces a large number of putative configurations of the protein complexes. Identification of the near-native models among them is a serious challenge. Publicly available results of biomedical research may provide constraints on the binding mode, which can be essential for the docking. Our text-mining (TM) tool, which extracts binding site residues from the PubMed abstracts, was successfully applied to protein docking (Badal et al., PLoS Comput Biol, 2015; 11: e1004630). Still, many extracted residues were not relevant to the docking. Results We present an extension of the TM tool, which utilizes natural language processing (NLP) for analyzing the context of the residue occurrence. The procedure was tested using generic and specialized dictionaries. The results showed that the keyword dictionaries designed for identification of protein interactions are not adequate for the TM prediction of the binding mode. However, our dictionary designed to distinguish keywords relevant to the protein binding sites led to considerable improvement in the TM performance. We investigated the utility of several methods of context analysis, based on dissection of the sentence parse trees. The machine learning-based NLP filtered the pool of the mined residues significantly more efficiently than the rule-based NLP. Constraints generated by NLP were tested in docking of unbound proteins from the DOCKGROUND X-ray benchmark set 4. The output of the global low-resolution docking scan was post-processed, separately, by constraints from the basic TM, constraints re-ranked by NLP, and the reference constraints. The quality of a match was assessed by the interface root-mean-square deviation. The results showed significant improvement of the docking output when using the constraints generated by the advanced TM with NLP. 
Conclusions The basic TM procedure for extracting protein-protein binding site residues from the PubMed abstracts was significantly advanced by the deep parsing (NLP techniques for contextual analysis) in purging of the initial pool of the extracted residues. Benchmarking showed a substantial increase of the docking success rate based on the constraints generated by the advanced TM with NLP. Electronic supplementary material The online version of this article (10.1186/s12859-018-2079-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Varsha D Badal
  - Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66047, USA
- Petras J Kundrotas
  - Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66047, USA
- Ilya A Vakser
  - Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66047, USA
9. Felgueiras J, Silva JV, Fardilha M. Adding biological meaning to human protein-protein interactions identified by yeast two-hybrid screenings: A guide through bioinformatics tools. J Proteomics 2018; 171:127-140. [PMID: 28526529 DOI: 10.1016/j.jprot.2017.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/07/2017] [Revised: 04/26/2017] [Accepted: 05/13/2017] [Indexed: 02/02/2023]
10. Aguilar D, Pinart M, Koppelman GH, Saeys Y, Nawijn MC, Postma DS, Akdis M, Auffray C, Ballereau S, Benet M, García-Aymerich J, González JR, Guerra S, Keil T, Kogevinas M, Lambrecht B, Lemonnier N, Melen E, Sunyer J, Valenta R, Valverde S, Wickman M, Bousquet J, Oliva B, Antó JM. Computational analysis of multimorbidity between asthma, eczema and rhinitis. PLoS One 2017; 12:e0179125. [PMID: 28598986 PMCID: PMC5466323 DOI: 10.1371/journal.pone.0179125] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 07/08/2016] [Accepted: 05/24/2017] [Indexed: 12/11/2022] Open
Abstract
Background The mechanisms explaining the co-existence of asthma, eczema and rhinitis (allergic multimorbidity) are largely unknown. We investigated the mechanisms underlying multimorbidity between three main allergic diseases at a molecular level by identifying the proteins and cellular processes that are common to them. Methods An in silico study based on computational analysis of the topology of the protein interaction network was performed in order to characterize the molecular mechanisms of multimorbidity of asthma, eczema and rhinitis. As a first step, proteins associated with each disease were identified using data mining approaches, and their overlap was calculated. Secondly, a functional interaction network was built, allowing the identification of cellular pathways involved in allergic multimorbidity. Finally, a network-based algorithm generated a ranked list of newly predicted multimorbidity-associated proteins. Results Asthma, eczema and rhinitis shared a larger number of associated proteins than expected by chance, and their associated proteins exhibited a significant degree of interconnectedness in the interaction network. There were 15 pathways involved in the multimorbidity of asthma, eczema and rhinitis, including IL4 signaling and GATA3-related pathways. A number of proteins potentially associated with these multimorbidity processes were also obtained. Conclusions These results strongly support the existence of an allergic multimorbidity cluster between asthma, eczema and rhinitis, and suggest that type 2 signaling pathways represent a relevant multimorbidity mechanism of allergic diseases. Furthermore, we identified new candidates contributing to multimorbidity that may assist in identifying new targets for multimorbid allergic diseases.
Affiliation(s)
- Daniel Aguilar
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - Structural Bioinformatics Group, Departament de Ciencies Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
  - CIBER Epidemiologia y Salud Pública (CIBERESP), Barcelona, Spain
  - Corresponding author
- Mariona Pinart
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - CIBER Epidemiologia y Salud Pública (CIBERESP), Barcelona, Spain
  - Institut Municipal d'Investigació Mèdica (IMIM), Barcelona, Spain
- Gerard H. Koppelman
  - University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD, Groningen, The Netherlands
  - University of Groningen, University Medical Center Groningen, Beatrix Children's Hospital, Department of Pediatric Pulmonology and Pediatric Allergology, Groningen, The Netherlands
- Yvan Saeys
  - Inflammation Research Center, VIB, Ghent, Belgium
  - Department of Respiratory Medicine, Ghent University Hospital, Ghent, Belgium
- Martijn C. Nawijn
  - University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD, Groningen, The Netherlands
  - University of Groningen, Laboratory of Allergology and Pulmonary Diseases, Department of Pathology and Medical Biology, University Medical Center Groningen, Groningen, The Netherlands
- Dirkje S. Postma
  - University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD, Groningen, The Netherlands
  - University of Groningen, Laboratory of Allergology and Pulmonary Diseases, Department of Pathology and Medical Biology, University Medical Center Groningen, Groningen, The Netherlands
- Mübeccel Akdis
  - Swiss Institute of Allergy and Asthma Research (SIAF), Davos, Switzerland
  - Christine Kühne–Center for Allergy Research and Education, Davos, Switzerland
- Charles Auffray
  - European Institute for Systems Biology and Medicine (EISBM), CNRS, Lyon, France
- Stéphane Ballereau
  - European Institute for Systems Biology and Medicine (EISBM), CNRS, Lyon, France
- Marta Benet
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - CIBER Epidemiologia y Salud Pública (CIBERESP), Barcelona, Spain
- Judith García-Aymerich
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - CIBER Epidemiologia y Salud Pública (CIBERESP), Barcelona, Spain
- Juan Ramón González
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - CIBER Epidemiologia y Salud Pública (CIBERESP), Barcelona, Spain
- Stefano Guerra
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - CIBER Epidemiologia y Salud Pública (CIBERESP), Barcelona, Spain
  - Arizona Respiratory Center, Tucson, Arizona, United States of America
- Thomas Keil
  - Institute of Social Medicine, Epidemiology and Health Economics, Charité University Medical Centre, Berlin, Germany
- Manolis Kogevinas
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - CIBER Epidemiologia y Salud Pública (CIBERESP), Barcelona, Spain
  - Institut Municipal d'Investigació Mèdica (IMIM), Barcelona, Spain
  - National School of Public Health, Athens, Greece
- Bart Lambrecht
  - University of Groningen, University Medical Center Groningen, Groningen Research Institute for Asthma and COPD, Groningen, The Netherlands
  - Department of Pulmonary Medicine, Erasmus MC, Rotterdam, the Netherlands
- Nathanael Lemonnier
  - European Institute for Systems Biology and Medicine (EISBM), CNRS, Lyon, France
- Erik Melen
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
  - Sach's Children's Hospital, Stockholm, Sweden
- Jordi Sunyer
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - Structural Bioinformatics Group, Departament de Ciencies Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
  - CIBER Epidemiologia y Salud Pública (CIBERESP), Barcelona, Spain
  - Institut Municipal d'Investigació Mèdica (IMIM), Barcelona, Spain
- Rudolf Valenta
  - Division of Immunopathology, Department of Pathophysiology and Allergy Research, Center of Pathophysiology, Infectiology and Immunology, Medical University of Vienna, Vienna, Austria
  - Christian Doppler Laboratory for Allergy Research, Medical University of Vienna, Vienna, Austria
- Sergi Valverde
  - ICREA-Complex Systems Lab, Universitat Pompeu Fabra, Barcelona, Spain
  - Institut de Biologia Evolutiva, CSIC-UPF, Barcelona, Spain
- Magnus Wickman
  - Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
  - Sach's Children's Hospital, Stockholm, Sweden
- Jean Bousquet
  - Hopital Arnaud de Villeneuve University Hospital and Inserm, Montpellier, France
- Baldo Oliva
  - Structural Bioinformatics Group, Departament de Ciencies Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
- Josep M. Antó
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - CIBER Epidemiologia y Salud Pública (CIBERESP), Barcelona, Spain
  - Institut Municipal d'Investigació Mèdica (IMIM), Barcelona, Spain
11. Badal VD, Kundrotas PJ, Vakser IA. Text Mining for Protein Docking. PLoS Comput Biol 2015; 11:e1004630. [PMID: 26650466 PMCID: PMC4674139 DOI: 10.1371/journal.pcbi.1004630] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 06/08/2015] [Accepted: 10/29/2015] [Indexed: 11/18/2022] Open
Abstract
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate.
Protein interactions are central for many cellular processes. Physical characterization of these interactions is essential for understanding of life processes and applications in biology and medicine. Because of the inherent limitations of experimental techniques and rapid development of computational power and methodology, computer modeling is a tool of choice in many studies. Publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for modeling of proteins and protein complexes. A major paradigm shift in modeling of protein complexes is emerging due to the rapidly expanding amount of such information, which can be used as modeling constraints. Text mining has been widely used in recreating networks of protein interactions, as well as in detecting small molecule binding sites on proteins. Combining and expanding these two well-developed areas of research, we applied text mining to physical modeling of protein complexes (protein docking). Our procedure retrieves published abstracts on a protein-protein interaction and extracts the relevant information. The results show that correct information on binding can be obtained for about half of protein complexes. The extracted constraints were incorporated in a modeling procedure, significantly improving its performance.
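One simple way to picture the extraction step is a pattern that pulls residue mentions such as "Arg45" out of abstract text. The sketch below is an assumption made for illustration, not the authors' procedure: the regular expression, the function name and the example sentence are all invented, and it handles only three-letter residue codes followed by a position number.

```python
import re

# Three-letter codes of the 20 standard amino acids.
THREE_LETTER = (
    "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
    "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
)

# Residue code, optional hyphen or space, then the position number.
RESIDUE = re.compile(r"\b(" + THREE_LETTER + r")[- ]?(\d+)\b")


def extract_residues(text):
    """Return unique (residue, position) mentions sorted by position, then name."""
    mentions = {(name, int(num)) for name, num in RESIDUE.findall(text)}
    return sorted(mentions, key=lambda r: (r[1], r[0]))


# Hypothetical sentence of the kind found in an abstract.
sentence = (
    "Mutation of Arg45 and Tyr-102 abolished binding, "
    "whereas Gly 7 was dispensable."
)
residues = extract_residues(sentence)
print(residues)  # [('Gly', 7), ('Arg', 45), ('Tyr', 102)]
```

A pattern this permissive also picks up residues that have nothing to do with the binding site, which is the kind of irrelevant information the abstract says was reduced by filtering the retrieved abstracts.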
Affiliation(s)
- Varsha D. Badal
  - Center for Computational Biology, The University of Kansas, Lawrence, Kansas, United States of America
- Petras J. Kundrotas
  - Center for Computational Biology, The University of Kansas, Lawrence, Kansas, United States of America
  - Corresponding author
- Ilya A. Vakser
  - Center for Computational Biology, The University of Kansas, Lawrence, Kansas, United States of America
  - Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, United States of America
  - Corresponding author
12.
Abstract
The prediction of protein-protein interactions and kinase-specific phosphorylation sites on individual proteins is critical for correctly placing proteins within signaling pathways and networks. The importance of this type of annotation continues to increase with the continued explosion of genomic and proteomic data, particularly with emerging data categorizing posttranslational modifications on a large scale. A variety of computational tools are available for this purpose. In this chapter, we review the general methodologies for these types of computational predictions and present a detailed user-focused tutorial of one such method and computational tool, Scansite, which is freely available to the entire scientific community over the Internet.
Affiliation(s)
- Tobias Ehrenberger
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
13. Kelder T, Verschuren L, van Ommen B, van Gool AJ, Radonjic M. Network signatures link hepatic effects of anti-diabetic interventions with systemic disease parameters. BMC Syst Biol 2014; 8:108. [PMID: 25204982 PMCID: PMC4363943 DOI: 10.1186/s12918-014-0108-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/29/2014] [Accepted: 08/29/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Multifactorial diseases such as type 2 diabetes mellitus (T2DM) are driven by a complex network of interconnected mechanisms that translate to a diverse range of complications at the physiological level. To optimally treat T2DM, pharmacological interventions should, ideally, target key nodes in this network that act as determinants of disease progression. RESULTS We set out to discover key nodes in molecular networks based on the hepatic transcriptome dataset from a preclinical study in obese LDLR-/- mice recently published by Radonjic et al. Here, we focus on comparing the efficacy of an anti-diabetic dietary intervention (DLI) and two drug treatments, namely the PPARA agonist fenofibrate and the LXR agonist T0901317. By combining knowledge-based and data-driven networks with a random-walk-based algorithm, we extracted network signatures that link the DLI and the two drug interventions to dyslipidemia-related disease parameters. CONCLUSIONS This study identified specific and prioritized sets of key nodes in hepatic molecular networks underlying T2DM, uncovering pathways that are to be modulated by targeted T2DM drug interventions in order to modulate the complex disease phenotype.
Affiliation(s)
- Thomas Kelder
- TNO, Research Group Microbiology & Systems Biology, Zeist, The Netherlands. Current address: EdgeLeap B.V., Utrecht, The Netherlands.
- Lars Verschuren
- TNO, Research Group Microbiology & Systems Biology, Zeist, The Netherlands.
- Ben van Ommen
- TNO, Research Group Microbiology & Systems Biology, Zeist, The Netherlands.
- Alain J van Gool
- TNO, Research Group Microbiology & Systems Biology, Zeist, The Netherlands; Department of Laboratory Medicine, Radboud University Nijmegen Medical Center, Nijmegen, The Netherlands; Faculty of Physics, Mathematics and Informatics, Radboud University Nijmegen, Nijmegen, The Netherlands.
- Marijana Radonjic
- TNO, Research Group Microbiology & Systems Biology, Zeist, The Netherlands. Current address: EdgeLeap B.V., Utrecht, The Netherlands.
|
14
|
Abstract
The past decade has seen a dramatic expansion in the number and range of techniques available to obtain genome-wide information and to analyze this information so as to infer both the functions of individual molecules and how they interact to modulate the behavior of biological systems. Here, we review these techniques, focusing on the construction of physical protein-protein interaction networks, and highlighting approaches that incorporate protein structure, which is becoming an increasingly important component of systems-level computational techniques. We also discuss how network analyses are being applied to enhance our basic understanding of biological systems and their dysregulation, as well as how these networks are being used in drug development.
Affiliation(s)
- Donald Petrey
- Center for Computational Biology and Bioinformatics, Department of Systems Biology
|
15
|
Schramm SJ, Jayaswal V, Goel A, Li SS, Yang YH, Mann GJ, Wilkins MR. Molecular interaction networks for the analysis of human disease: utility, limitations, and considerations. Proteomics 2014; 13:3393-405. [PMID: 24166987 DOI: 10.1002/pmic.201200570] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 12/18/2012] [Revised: 09/11/2013] [Accepted: 10/07/2013] [Indexed: 01/01/2023]
Abstract
High-throughput '-omics' data can be combined with large-scale molecular interaction networks, for example, protein-protein interaction networks, to provide a unique framework for the investigation of human molecular biology. Interest in these integrative '-omics' methods is growing rapidly because of their potential to understand complexity and association with disease; such approaches have a focus on associations between phenotype and "network-type." The potential of this research is enticing, yet there remain a series of important considerations. Here, we discuss interaction data selection, data quality, the relative merits of using data from large high-throughput studies versus a meta-database of smaller literature-curated studies, and possible issues of sociological or inspection bias in interaction data. Other work underway, especially international consortia to establish data formats, quality standards and address data redundancy, and the improvements these efforts are making to the field, is also evaluated. We present options for researchers intending to use large-scale molecular interaction networks as a functional context for protein or gene expression data, including microRNAs, especially in the context of human disease.
Affiliation(s)
- Sarah-Jane Schramm
- Sydney Medical School, Westmead Millennium Institute for Medical Research, The University of Sydney, Sydney, NSW, Australia; Melanoma Institute Australia, Sydney, NSW, Australia
|
16
|
Keseler IM, Skrzypek M, Weerasinghe D, Chen AY, Fulcher C, Li GW, Lemmer KC, Mladinich KM, Chow ED, Sherlock G, Karp PD. Curation accuracy of model organism databases. Database (Oxford) 2014; 2014:bau058. [PMID: 24923819 PMCID: PMC4207230 DOI: 10.1093/database/bau058] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022]
Abstract
Manual extraction of information from the biomedical literature, or biocuration, is the central methodology used to construct many biological databases. For example, the UniProt protein database, the EcoCyc Escherichia coli database and the Candida Genome Database (CGD) are all based on biocuration. Biological databases are used extensively by life science researchers, as online encyclopedias, as aids in the interpretation of new experimental data and as gold standards for the development of new bioinformatics algorithms. Although manual curation has been assumed to be highly accurate, we are aware of only one previous study of biocuration accuracy. We assessed the accuracy of EcoCyc and CGD by manually selecting curated assertions within randomly chosen EcoCyc and CGD gene pages and by then validating that the data found in the referenced publications supported those assertions. A database assertion is considered to be in error if that assertion could not be found in the publication cited for that assertion. We identified 10 errors in the 633 facts that we validated across the two databases, for an overall error rate of 1.58%, and individual error rates of 1.82% for CGD and 1.40% for EcoCyc. These data suggest that manual curation of the experimental literature by PhD-level scientists is highly accurate. Database URL: http://ecocyc.org/, http://www.candidagenome.org/
Affiliation(s)
- Ingrid M Keseler, Marek Skrzypek, Deepika Weerasinghe, Albert Y Chen, Carol Fulcher, Gene-Wei Li, Kimberly C Lemmer, Katherine M Mladinich, Edmond D Chow, Gavin Sherlock, Peter D Karp
- Bioinformatics Research Group, Artificial Intelligence Center, SRI International, CA, USA, Department of Genetics, Stanford University, CA 94305, USA, Department of Bacteriology, University of Wisconsin, WI 53706-1521, USA, Department of Cellular and Molecular Pharmacology, University of California at San Francisco, CA 94158-2140, USA, DOE Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, WI 53726, USA and Department of Medical Microbiology and Immunology, University of Wisconsin, WI 53706-1521, USA
|
17
|
Lage K. Protein-protein interactions and genetic diseases: The interactome. Biochim Biophys Acta Mol Basis Dis 2014; 1842:1971-1980. [PMID: 24892209 DOI: 10.1016/j.bbadis.2014.05.028] [Citation(s) in RCA: 83] [Impact Index Per Article: 7.5] [Received: 09/06/2013] [Revised: 05/07/2014] [Accepted: 05/24/2014] [Indexed: 12/27/2022]
Abstract
Protein-protein interactions mediate essentially all biological processes. Despite the quality of these data being widely questioned a decade ago, the reproducibility of large-scale protein interaction data is now much improved and there is little question that the latest screens are of high quality. Moreover, common data standards and coordinated curation practices between the databases that collect the interactions have made these valuable data available to a wide group of researchers. Here, I will review how protein-protein interactions are measured, collected and quality controlled. I discuss how the architecture of molecular protein networks has informed disease biology, and how these data are now being computationally integrated with the newest genomic technologies, in particular genome-wide association studies and exome-sequencing projects, to improve our understanding of molecular processes perturbed by genetics in human diseases. This article is part of a Special Issue entitled: From Genome to Function.
Affiliation(s)
- Kasper Lage
- Department of Surgery and Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; Harvard Medical School, Boston, MA, USA; The Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
18
|
Rid R, Strasser W, Siegl D, Frech C, Kommenda M, Kern T, Hintner H, Bauer JW, Önder K. PRIMOS: an integrated database of reassessed protein-protein interactions providing web-based access to in silico validation of experimentally derived data. Assay Drug Dev Technol 2014; 11:333-46. [PMID: 23772554 DOI: 10.1089/adt.2013.506] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/18/2022] Open
Abstract
Steady improvements in proteomics present a bioinformatic challenge to retrieve, store, and process the accumulating and often redundant amount of information. In particular, a large-scale comparison and analysis of protein-protein interaction (PPI) data requires tools for data interpretation as well as validation. At this juncture, the Protein Interaction and Molecule Search (PRIMOS) platform represents a novel web portal that unifies six primary PPI databases (BIND, Biomolecular Interaction Network Database; DIP, Database of Interacting Proteins; HPRD, Human Protein Reference Database; IntAct; MINT, Molecular Interaction Database; and MIPS, Munich Information Center for Protein Sequences) into a single consistent repository, which currently includes more than 196,700 redundancy-removed PPIs. PRIMOS supports three advanced search strategies centering on disease-relevant PPIs, on inter- and intra-organismal crosstalk relations (e.g., pathogen-host interactions), and on highly connected protein nodes analysis ("hub" identification). The main novelties distinguishing PRIMOS from other secondary PPI databases are the reassessment of known PPIs, and the capacity to validate personal experimental data by our peer-reviewed, homology-based validation. This article focuses on definite PRIMOS use cases (presentation of embedded biological concepts, example applications) to demonstrate its broad functionality and practical value. PRIMOS is publicly available at http://primos.fh-hagenberg.at.
Affiliation(s)
- Raphaela Rid
- Division of Molecular Dermatology, Department of Dermatology, Paracelsus Medical University Salzburg, Salzburg, Austria
|
19
|
Turinsky AL, Razick S, Turner B, Donaldson IM, Wodak SJ. Navigating the global protein-protein interaction landscape using iRefWeb. Methods Mol Biol 2014; 1091:315-31. [PMID: 24203342 DOI: 10.1007/978-1-62703-691-7_22] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 12/24/2022]
Abstract
iRefWeb is a bioinformatics resource that offers access to a large collection of data on protein-protein interactions in over a thousand organisms. This collection is consolidated from 14 major public databases that curate the scientific literature. The collection is enhanced with a range of versatile data filters and search options that categorize various types of protein-protein interactions and protein complexes. Users of iRefWeb are able to retrieve all curated interactions for a given organism or those involving a given protein (or a list of proteins), narrow down their search results based on different supporting evidence, and assess the reliability of these interactions using various criteria. They may also examine all data and annotations related to any publication that described the interaction-detection experiments. iRefWeb is freely available to the research community worldwide at http://wodaklab.org/iRefWeb.
Affiliation(s)
- Andrei L Turinsky
- Molecular Structure and Function program, Hospital for Sick Children, Toronto, ON, Canada
|
20
|
Wodak SJ, Vlasblom J, Turinsky AL, Pu S. Protein–protein interaction networks: the puzzling riches. Curr Opin Struct Biol 2013; 23:941-53. [DOI: 10.1016/j.sbi.2013.08.002] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Received: 06/30/2013] [Revised: 07/14/2013] [Accepted: 08/08/2013] [Indexed: 12/13/2022]
|
21
|
Mosca R, Pons T, Céol A, Valencia A, Aloy P. Towards a detailed atlas of protein–protein interactions. Curr Opin Struct Biol 2013; 23:929-40. [PMID: 23896349 DOI: 10.1016/j.sbi.2013.07.005] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.3] [Received: 06/27/2013] [Revised: 07/04/2013] [Accepted: 07/08/2013] [Indexed: 12/30/2022]
|
22
|
Klapa MI, Tsafou K, Theodoridis E, Tsakalidis A, Moschonas NK. Reconstruction of the experimentally supported human protein interactome: what can we learn? BMC Syst Biol 2013; 7:96. [PMID: 24088582 PMCID: PMC4015887 DOI: 10.1186/1752-0509-7-96] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 02/26/2013] [Accepted: 09/25/2013] [Indexed: 02/02/2023]
Abstract
BACKGROUND Understanding the topology and dynamics of the human protein-protein interaction (PPI) network will significantly contribute to biomedical research, therefore its systematic reconstruction is required. Several meta-databases integrate source PPI datasets, but the protein node sets of their networks vary depending on the PPI data combined. Due to this inherent heterogeneity, the way in which the human PPI network expands via multiple dataset integration has not been comprehensively analyzed. We aim at assembling the human interactome in a global structured way and exploring it to gain insights of biological relevance. RESULTS First, we defined the UniProtKB manually reviewed human "complete" proteome as the reference protein-node set and then we mined five major source PPI datasets for direct PPIs exclusively between the reference proteins. We updated the protein and publication identifiers and normalized all PPIs to the UniProt identifier level. The reconstructed interactome covers approximately 60% of the human proteome and has a scale-free structure. No apparent differentiating gene functional classification characteristics were identified for the unrepresented proteins. The source dataset integration augments the network mainly in PPIs. Polyubiquitin emerged as the highest-degree node, but the inclusion of most of its identified PPIs may be reconsidered. The high number (>300) of connections of the subsequent fifteen proteins correlates well with their essential biological role. According to the power-law network structure, the unrepresented proteins should mainly have up to four connections with equally poorly-connected interactors. CONCLUSIONS Reconstructing the human interactome based on the a priori definition of the protein nodes enabled us to identify the currently included part of the human "complete" proteome, and discuss the role of the proteins within the network topology with respect to their function. 
As the network expansion has to comply with the scale-free theory, we suggest that the core of the human interactome has essentially emerged. Thus, it could be employed in systems biology and biomedical research, despite the considerable number of currently unrepresented proteins. The latter are probably involved in specialized physiological conditions, justifying the scarcity of related PPI information, and their identification can assist in designing relevant functional experiments and targeted text mining algorithms.
Affiliation(s)
- Maria I Klapa
- Department of General Biology, School of Medicine, University of Patras, Rio, Patras, Greece.
|
23
|
Wang X, Thijssen B, Yu H. Target essentiality and centrality characterize drug side effects. PLoS Comput Biol 2013; 9:e1003119. [PMID: 23874169 PMCID: PMC3708859 DOI: 10.1371/journal.pcbi.1003119] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 12/02/2012] [Accepted: 05/15/2013] [Indexed: 01/19/2023] Open
Abstract
To investigate factors contributing to drug side effects, we systematically examine relationships between 4,199 side effects associated with 996 drugs and their 647 human protein targets. We find that it is the number of essential targets, not the number of total targets, that determines the side effects of corresponding drugs. Furthermore, within the context of a three-dimensional interaction network with atomic-resolution interaction interfaces, we find that drugs causing more side effects are also characterized by high degree and betweenness of their targets and highly shared interaction interfaces on these targets. Our findings suggest that both essentiality and centrality of a drug target are key factors contributing to side effects and should be taken into consideration in rational drug design. The ultimate goal of medical research is to develop effective treatments for disease with minimal side effects. Currently, about 20% of drug candidates fail at clinical trial phases II and III due to safety issues. Therefore, understanding the determining factors of drug side effects is of paramount importance to human health and the pharmaceutical industry. Here, we present the first systematic study to uncover key factors leading to drug side effects within the framework of the human protein interactome network. Our results show that it is the number of essential targets, not the number of total targets, of a drug that determines the occurrence of its side effects. Furthermore, we find that the centrality, both degree and betweenness, of the drug targets is also an important determining factor of drug side effects. Our findings will shed light on new factors to be incorporated into the drug development pipeline.
Affiliation(s)
- Xiujuan Wang
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, United States of America
- Bram Thijssen
- Department of Bioinformatics, Maastricht University, Maastricht, The Netherlands
- Haiyuan Yu
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, United States of America
|
24
|
Chen WM, Danziger SA, Chiang JH, Aitchison JD. PhosphoChain: a novel algorithm to predict kinase and phosphatase networks from high-throughput expression data. Bioinformatics 2013; 29:2435-44. [PMID: 23832245 DOI: 10.1093/bioinformatics/btt387] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/05/2023]
Abstract
MOTIVATION Protein phosphorylation is critical for regulating cellular activities by controlling protein activities, localization and turnover, and by transmitting information within cells through signaling networks. However, predictions of protein phosphorylation and signaling networks remain a significant challenge, lagging behind predictions of transcriptional regulatory networks into which they often feed. RESULTS We developed PhosphoChain to predict kinases, phosphatases and chains of phosphorylation events in signaling networks by combining mRNA expression levels of regulators and targets with a motif detection algorithm and optional prior information. PhosphoChain correctly reconstructed ∼78% of the yeast mitogen-activated protein kinase pathway from publicly available data. When tested on yeast phosphoproteomic data from large-scale mass spectrometry experiments, PhosphoChain correctly identified ∼27% more phosphorylation sites than existing motif detection tools (NetPhosYeast and GPS2.0), and predictions of kinase-phosphatase interactions overlapped with ∼59% of known interactions present in yeast databases. PhosphoChain provides a valuable framework for predicting condition-specific phosphorylation events from high-throughput data. AVAILABILITY PhosphoChain is implemented in Java and available at http://virgo.csie.ncku.edu.tw/PhosphoChain/ or http://aitchisonlab.com/PhosphoChain
Affiliation(s)
- Wei-Ming Chen
- Institute for Systems Biology, Seattle, WA 98109-5234, USA, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan and Seattle Biomedical Research Institute, Seattle, WA 98109-5219, USA
|
25
|
Abstract
Protein interaction networks are important for the understanding of regulatory mechanisms, for the explanation of experimental data and for the prediction of protein functions. Unfortunately, most interaction data is available only for model organisms. As a possible remedy, the transfer of interactions to organisms of interest is common practice, but it is not clear when interactions can be transferred from one organism to another and, thus, the confidence in the derived interactions is low. Here, we propose to use a rich set of features to train Random Forests in order to score transferred interactions. We evaluated the transfer from a range of eukaryotic organisms to S. cerevisiae using orthologs. Directly transferred interactions to S. cerevisiae are on average only 24% consistent with the current S. cerevisiae interaction network. By using commonly applied filter approaches the transfer precision can be improved, but at the cost of a large decrease in the number of transferred interactions. Our Random Forest approach uses various features derived from both the target and the source network as well as the ortholog annotations to assign confidence values to transferred interactions. Thereby, we could increase the average transfer consistency to 85%, while still transferring almost 70% of all correctly transferable interactions. We tested our approach for the transfer of interactions to other species and showed that our approach outperforms competing methods for the transfer of interactions to species where no experimental knowledge is available. Finally, we applied our predictor to score transferred interactions to 83 target species and we were able to extend the available interactome of B. taurus, M. musculus and G. gallus with over 40,000 interactions each.
Our transferred interaction networks are publicly available via our web interface, which allows users to inspect and download transferred interaction sets of different sizes, for various species, and at specified expected precision levels. AVAILABILITY http://services.bio.ifi.lmu.de/coin-db/.
Affiliation(s)
- Robert Pesch
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
- Ralf Zimmer
- Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
|
26
|
iPPI-DB: a manually curated and interactive database of small non-peptide inhibitors of protein-protein interactions. Drug Discov Today 2013; 18:958-68. [PMID: 23688585 DOI: 10.1016/j.drudis.2013.05.003] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.9] [Received: 02/28/2013] [Revised: 05/06/2013] [Accepted: 05/10/2013] [Indexed: 01/05/2023]
Abstract
The development of small molecule drugs targeting protein-protein interactions (PPI) represents a major challenge, in part owing to the misunderstanding of the PPI chemical space. To this end, we have manually collected the structures, the physicochemical and pharmacological profiles of 1650 PPI inhibitors across 13 families of PPI targets in a database named iPPI-DB. To access iPPI-DB, we propose a user-friendly web application (www.ippidb.cdithem.fr) with customizable queries and intuitive visualizing functionalities for associated properties of the compounds. This could assist scientists to design the next generation of PPI drugs. In this review, we describe iPPI-DB in the context of other low molecular weight molecule databases.
|
27
|
Maynard SM, Mungall CJ, Lewis SE, Imam FT, Martone ME. A knowledge based approach to matching human neurodegenerative disease and animal models. Front Neuroinform 2013; 7:7. [PMID: 23717278 PMCID: PMC3653101 DOI: 10.3389/fninf.2013.00007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/22/2012] [Accepted: 04/09/2013] [Indexed: 12/19/2022] Open
Abstract
Neurodegenerative diseases present a wide and complex range of biological and clinical features. Animal models are key to translational research, yet typically only exhibit a subset of disease features rather than being precise replicas of the disease. Consequently, connecting animal to human conditions using direct data-mining strategies has proven challenging, particularly for diseases of the nervous system, with its complicated anatomy and physiology. To address this challenge we have explored the use of ontologies to create formal descriptions of structural phenotypes across scales that are machine processable and amenable to logical inference. As proof of concept, we built a Neurodegenerative Disease Phenotype Ontology (NDPO) and an associated Phenotype Knowledge Base (PKB) using an entity-quality model that incorporates descriptions for both human disease phenotypes and those of animal models. Entities are drawn from community ontologies made available through the Neuroscience Information Framework (NIF) and qualities are drawn from the Phenotype and Trait Ontology (PATO). We generated ~1200 structured phenotype statements describing structural alterations at the subcellular, cellular and gross anatomical levels observed in 11 human neurodegenerative conditions and associated animal models. PhenoSim, an open source tool for comparing phenotypes, was used to issue a series of competency questions to compare individual phenotypes among organisms and to determine which animal models recapitulate phenotypic aspects of the human disease in aggregate. Overall, the system was able to use relationships within the ontology to bridge phenotypes across scales, returning non-trivial matches based on common subsumers that were meaningful to a neuroscientist with an advanced knowledge of neuroanatomy. The system can be used both to compare individual phenotypes and also phenotypes in aggregate. 
This proof of concept suggests that expressing complex phenotypes using formal ontologies provides considerable benefit for comparing phenotypes across scales and species.
Affiliation(s)
- Sarah M Maynard
- Department of Neurosciences, Center for Research in Biological Systems, University of California San Diego, San Diego, CA, USA
|
28
|
Neves M, Damaschun A, Mah N, Lekschas F, Seltmann S, Stachelscheid H, Fontaine JF, Kurtz A, Leser U. Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts. Database: The Journal of Biological Databases and Curation 2013; 2013:bat020. [PMID: 23599415 PMCID: PMC3629873 DOI: 10.1093/database/bat020] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023]
Abstract
Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experiences have proven that text mining can assist in its many phases, especially, in triage of relevant documents and extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as the manual validation of extracted data. Preliminary results are presented for a data set of 2376 full texts from which >4500 gene expression events in cell or anatomical part have been extracted. Validation of half of this data resulted in a precision of ∼50% of the extracted data, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve a better performance for event extraction. Database URL: http://www.cellfinder.org/
Affiliation(s)
- Mariana Neves
- Humboldt-Universität zu Berlin, Knowledge Management in Bioinformatics, Berlin, 10099, Germany.
|
29
|
A survey of protein interaction data and multigenic inherited disorders. BMC Bioinformatics 2013; 14:47. [PMID: 23398688 PMCID: PMC3598893 DOI: 10.1186/1471-2105-14-47] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/20/2012] [Accepted: 02/05/2013] [Indexed: 11/15/2022] Open
Abstract
Background Multigenic diseases are often associated with protein complexes or interactions involved in the same pathway. We wanted to estimate to what extent this is true given a consolidated protein interaction data set. The study stresses data integration and data representation issues. Results We constructed 497 multigenic disease groups from OMIM and tested for overlaps with interaction and pathway data. A total of 159 disease groups had significant overlaps with protein interaction data consolidated by iRefIndex. A further 68 disease overlaps were found only in the KEGG pathway database. No single database contained all significant overlaps thus stressing the importance of data integration. We also found that disease groups overlapped with all three interaction data types: n-ary, spoke-represented complexes and binary data – thus stressing the importance of considering each of these data types separately. Conclusions Almost half of our multigenic disease groups could potentially be explained by protein complexes and pathways. However, the fact that no database or data type was able to cover all disease groups suggests that no single database has systematically covered all disease groups for potential related complex and pathway data. This survey provides a basis for further curation efforts to confirm and search for overlaps between diseases and interaction data. The accompanying R script can be used to reproduce the work and track progress in this area as databases change. Disease group overlaps can be further explored using the iRefscape plugin for Cytoscape.
|
30
|
Zhang QC, Petrey D, Garzón JI, Deng L, Honig B. PrePPI: a structure-informed database of protein-protein interactions. Nucleic Acids Res 2013; 41:D828-33. [PMID: 23193263 PMCID: PMC3531098 DOI: 10.1093/nar/gks1231] [Citation(s) in RCA: 192] [Impact Index Per Article: 16.0] [Indexed: 01/26/2023] Open
Abstract
PrePPI (http://bhapp.c2b2.columbia.edu/PrePPI) is a database that combines predicted and experimentally determined protein-protein interactions (PPIs) using a Bayesian framework. Predicted interactions are assigned probabilities of being correct, which are derived from calculated likelihood ratios (LRs) by combining structural, functional, evolutionary and expression information, with the most important contribution coming from structure. Experimentally determined interactions are compiled from a set of public databases that manually collect PPIs from the literature and are also assigned LRs. A final probability is then assigned to every interaction by combining the LRs for both predicted and experimentally determined interactions. The current version of PrePPI contains ∼2 million PPIs that have a probability more than ∼0.1 of which ∼60 000 PPIs for yeast and ∼370 000 PPIs for human are considered high confidence (probability > 0.5). The PrePPI database constitutes an integrated resource that enables users to examine aggregate information on PPIs, including both known and potentially novel interactions, and that provides structural models for many of the PPIs.
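The combination step described in this abstract can be illustrated with a minimal sketch. It assumes the evidence sources are conditionally independent, so that their likelihood ratios simply multiply (a naive Bayes simplification), and it uses a hypothetical prior odds value. The function name and all numbers are illustrative assumptions, not taken from PrePPI itself.

```python
from math import prod


def posterior_probability(likelihood_ratios, prior_odds):
    """Turn per-evidence likelihood ratios (LRs) into a posterior probability.

    Assumption (not stated in the abstract): evidence sources are
    conditionally independent, so their LRs can be multiplied.
    """
    if prior_odds <= 0:
        raise ValueError("prior_odds must be positive")
    # Posterior odds = prior odds x product of the likelihood ratios.
    posterior_odds = prior_odds * prod(likelihood_ratios)
    # Convert odds back to a probability in (0, 1).
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: a structural LR of 40, a co-expression LR of 5,
# and prior odds of 1 true interaction per 1000 candidate pairs.
p = posterior_probability([40.0, 5.0], prior_odds=1 / 1000)
```

With such a scheme, a cutoff like the abstract's "probability > 0.5" corresponds to posterior odds above 1, which is why a small prior requires a large combined LR before a pair counts as high confidence.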
Affiliation(s)
- Qiangfeng Cliff Zhang
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- Donald Petrey
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- José Ignacio Garzón
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- Lei Deng
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- Barry Honig
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia Initiative in Systems Biology, Columbia University, New York, NY 10032, USA and School of Software, Central South University, Changsha 410083, China
- *To whom correspondence should be addressed. Tel: +1 212 851 4651; Fax: +1 212 851 4650,
|
31
|
Bosley AD, Das S, Andresson T. A Role for Protein–Protein Interaction Networks in the Identification and Characterization of Potential Biomarkers. Proteomic and Metabolomic Approaches to Biomarker Discovery 2013:333-347. [DOI: 10.1016/b978-0-12-394446-7.00021-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2025]
|
32
|
Armean IM, Lilley KS, Trotter MWB. Popular computational methods to assess multiprotein complexes derived from label-free affinity purification and mass spectrometry (AP-MS) experiments. Mol Cell Proteomics 2012; 12:1-13. [PMID: 23071097 DOI: 10.1074/mcp.r112.019554] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 11/06/2022] Open
Abstract
Advances in sensitivity, resolution, mass accuracy, and throughput have considerably increased the number of protein identifications made via mass spectrometry. Despite these advances, state-of-the-art experimental methods for the study of protein-protein interactions yield more candidate interactions than may be expected biologically owing to biases and limitations in the experimental methodology. In silico methods, which distinguish between true and false interactions, have been developed and applied successfully to reduce the number of false positive results yielded by physical interaction assays. Such methods may be grouped according to: (1) the type of data used: methods based on experiment-specific measurements (e.g., spectral counts or identification scores) versus methods that extract knowledge encoded in external annotations (e.g., public interaction and functional categorisation databases); (2) the type of algorithm applied: the statistical description and estimation of physical protein properties versus predictive supervised machine learning or text-mining algorithms; (3) the type of protein relation evaluated: direct (binary) interaction of two proteins in a cocomplex versus probability of any functional relationship between two proteins (e.g., co-occurrence in a pathway, sub cellular compartment); and (4) initial motivation: elucidation of experimental data by evaluation versus prediction of novel protein-protein interaction, to be experimentally validated a posteriori. This work reviews several popular computational scoring methods and software platforms for protein-protein interactions evaluation according to their methodology, comparative strengths and weaknesses, data representation, accessibility, and availability. The scoring methods and platforms described include: CompPASS, SAINT, Decontaminator, MINT, IntAct, STRING, and FunCoup. 
References to related work are provided throughout in order to provide a concise but thorough introduction to a rapidly growing interdisciplinary field of investigation.
Affiliation(s)
- Irina M Armean
- Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, CB2 1GA, UK
|
33
|
Becnel LB, McKenna NJ. Minireview: progress and challenges in proteomics data management, sharing, and integration. Mol Endocrinol 2012; 26:1660-74. [PMID: 22902541 DOI: 10.1210/me.2012-1180] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 02/05/2023] Open
Abstract
The proteome represents the identity, expression levels, interacting partners, and posttranslational modifications of proteins expressed within any given cell. Proteomic studies aim to census the quantitative and qualitative factors regulating the biological relationships of proteins acting in concert as functional cellular networks. In the field of endocrinology, proteomics has been of considerable value in determining the function and mechanism of action of endocrine signaling molecules in the cell membrane, cytoplasm, and nucleus and for the discovery of proteins as candidates for clinical biomarkers. The volume of data that can be generated by proteomics methodologies, up to gigabytes of data within a few hours, brings with it its own logistical hurdles and presents significant challenges to realizing the full potential of these datasets. In this minireview, we describe selected current proteomics methodologies and their application in basic and translational endocrinology before focusing on mass spectrometry as a model for current progress and challenges in data analysis, management, sharing, and integration.
Affiliation(s)
- Lauren B Becnel
- Department of Medicine, Hematology and Oncology, Baylor College of Medicine, 1 Baylor Plaza MS-BCM305, Houston, Texas 77030, USA.
|
34
|
Das J, Yu H. HINT: High-quality protein interactomes and their applications in understanding human disease. BMC Systems Biology 2012; 6:92. [PMID: 22846459 PMCID: PMC3483187 DOI: 10.1186/1752-0509-6-92] [Citation(s) in RCA: 313] [Impact Index Per Article: 24.1] [Received: 07/22/2011] [Accepted: 06/30/2012] [Indexed: 12/22/2022]
Abstract
Background A global map of protein-protein interactions in cellular systems provides key insights into the workings of an organism. A repository of well-validated high-quality protein-protein interactions can be used in both large- and small-scale studies to generate and validate a wide range of functional hypotheses. Results We develop HINT (http://hint.yulab.org) - a database of high-quality protein-protein interactomes for human, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Oryza sativa. These were collected from several databases and filtered both systematically and manually to remove low-quality/erroneous interactions. The resulting datasets are classified by type (binary physical interactions vs. co-complex associations) and data source (high-throughput systematic setups vs. literature-curated small-scale experiments). We find strong sociological sampling biases in literature-curated datasets of small-scale interactions. An interactome without such sampling biases was used to understand network properties of human disease-genes - hubs are unlikely to cause disease, but if they do, they usually cause multiple disorders. Conclusions HINT is of significant interest to researchers in all fields of biology as it addresses the ubiquitous need of having a repository of high-quality protein-protein interactions. These datasets can be utilized to generate specific hypotheses about specific proteins and/or pathways, as well as analyzing global properties of cellular networks. HINT will be regularly updated and all versions will be tracked.
Affiliation(s)
- Jishnu Das
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA.
|
35
|
Orchard S, Kerrien S, Abbani S, Aranda B, Bhate J, Bidwell S, Bridge A, Briganti L, Brinkman FSL, Cesareni G, Chatr-aryamontri A, Chautard E, Chen C, Dumousseau M, Goll J, Hancock REW, Hannick LI, Jurisica I, Khadake J, Lynn DJ, Mahadevan U, Perfetto L, Raghunath A, Ricard-Blum S, Roechert B, Salwinski L, Stümpflen V, Tyers M, Uetz P, Xenarios I, Hermjakob H. Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat Methods 2012; 9:345-50. [PMID: 22453911 DOI: 10.1038/nmeth.1931] [Citation(s) in RCA: 402] [Impact Index Per Article: 30.9] [Indexed: 01/18/2023]
Abstract
The International Molecular Exchange (IMEx) consortium is an international collaboration between major public interaction data providers to share literature-curation efforts and make a nonredundant set of protein interactions available in a single search interface on a common website (http://www.imexconsortium.org/). Common curation rules have been developed, and a central registry is used to manage the selection of articles to enter into the dataset. We discuss the advantages of such a service to the user, our quality-control measures and our data-distribution practices.
Affiliation(s)
- Sandra Orchard
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK.
|
36
|
Gingras AC, Raught B. Beyond hairballs: The use of quantitative mass spectrometry data to understand protein-protein interactions. FEBS Lett 2012; 586:2723-31. [PMID: 22710165 DOI: 10.1016/j.febslet.2012.03.065] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 03/18/2012] [Revised: 03/30/2012] [Accepted: 03/30/2012] [Indexed: 10/28/2022]
Abstract
The past 10 years have witnessed a dramatic proliferation in the availability of protein interaction data. However, for interaction mapping based on affinity purification coupled with mass spectrometry (AP-MS), there is a wealth of information present in the datasets that often goes unrecorded in public repositories, and as such remains largely unexplored. Further, how this type of data is represented and used by bioinformaticians has not been well established. Here, we point out some common mistakes in how AP-MS data are handled, and describe how protein complex organization and interaction dynamics can be inferred using quantitative AP-MS approaches.
Affiliation(s)
- Anne-Claude Gingras
- Centre for Systems Biology, Samuel Lunenfeld Research Institute at Mount Sinai Hospital, Department of Molecular Genetics, University of Toronto, Canada.
|
37
|
Wang X, Wei X, Thijssen B, Das J, Lipkin SM, Yu H. Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nat Biotechnol 2012; 30:159-64. [PMID: 22252508 DOI: 10.1038/nbt.2106] [Citation(s) in RCA: 290] [Impact Index Per Article: 22.3] [Received: 10/11/2011] [Accepted: 12/19/2011] [Indexed: 01/13/2023]
Abstract
To better understand the molecular mechanisms and genetic basis of human disease, we systematically examine relationships between 3,949 genes, 62,663 mutations and 3,453 associated disorders by generating a three-dimensional, structurally resolved human interactome. This network consists of 4,222 high-quality binary protein-protein interactions with their atomic-resolution interfaces. We find that in-frame mutations (missense point mutations and in-frame insertions and deletions) are enriched on the interaction interfaces of proteins associated with the corresponding disorders, and that the disease specificity for different mutations of the same gene can be explained by their location within an interface. We also predict 292 candidate genes for 694 unknown disease-to-gene associations with proposed molecular mechanism hypotheses. This work indicates that knowledge of how in-frame disease mutations alter specific interactions is critical to understanding pathogenesis. Structurally resolved interaction networks should be valuable tools for interpreting the wealth of data being generated by large-scale structural genomics and disease association studies.
Affiliation(s)
- Xiujuan Wang
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
|
38
|
Kwofie SK, Schaefer U, Sundararajan VS, Bajic VB, Christoffels A. HCVpro: Hepatitis C virus protein interaction database. Infection, Genetics and Evolution 2011; 11:1971-7. [PMID: 21930248 DOI: 10.1016/j.meegid.2011.09.001] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Received: 05/13/2011] [Revised: 08/24/2011] [Accepted: 09/02/2011] [Indexed: 02/07/2023]
|
39
|
Mora A, Donaldson IM. iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database. BMC Bioinformatics 2011; 12:455. [PMID: 22115179 PMCID: PMC3282787 DOI: 10.1186/1471-2105-12-455] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Received: 07/22/2011] [Accepted: 11/24/2011] [Indexed: 11/19/2022] Open
Abstract
Background The iRefIndex addresses the need to consolidate protein interaction data into a single uniform data resource. iRefR provides the user with access to this data source from an R environment. Results The iRefR package includes tools for selecting specific subsets of interest from the iRefIndex by criteria such as organism, source database, experimental method, protein accessions and publication identifier. Data may be converted between three representations (MITAB, edgeList and graph) for use with other R packages such as igraph, graph and RBGL. The user may choose between different methods for resolving redundancies in interaction data and how n-ary data is represented. In addition, we describe a function to identify binary interaction records that possibly represent protein complexes. We show that the user choice of data selection, redundancy resolution and n-ary data representation all have an impact on graphical analysis. Conclusions The package allows the user to control how these issues are dealt with and communicate them via an R-script written using the iRefR package - this will facilitate communication of methods, reproducibility of network analyses and further modification and comparison of methods by researchers.
Affiliation(s)
- Antonio Mora
- Department for Molecular Biosciences, University of Oslo, P.O. Box 1041 Blindern, 0316 Oslo, Norway
|
40
|
Razick S, Mora A, Michalickova K, Boddie P, Donaldson IM. iRefScape. A Cytoscape plug-in for visualization and data mining of protein interaction data from iRefIndex. BMC Bioinformatics 2011; 12:388. [PMID: 21975162 PMCID: PMC3228863 DOI: 10.1186/1471-2105-12-388] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 07/22/2011] [Accepted: 10/05/2011] [Indexed: 11/10/2022] Open
Abstract
Background The iRefIndex consolidates protein interaction data from ten databases in a rigorous manner using sequence-based hash keys. Working with consolidated interaction data comes with distinct challenges: data are redundant, overlapping, highly interconnected and may be collected and represented using different curation practices. These phenomena were quantified in our previous studies. Results The iRefScape plug-in for the Cytoscape graphical viewer addresses these challenges. We show how these factors impact on data-mining tasks and how our solutions resolve them in a simple and efficient manner. A uniform accession space is used to limit redundancy and support search expansion and searching on multiple accession types. Multiple node and edge features support data filtering and mining. Node colours and features supply information about search result provenance. Overlapping evidence is presented using a multi-graph and a bi-partite representation is used to distinguish binary and n-ary source data. Searching for interactions between sets of proteins is supported and specifically includes searches on disease-related genes found in OMIM. Finally, a synchronized adjacency-matrix view facilitates visualization of relationships between sets of user defined groups. Conclusions The iRefScape plug-in will be of interest to advanced users of interaction data. The plug-in provides access to a consolidated data set in a uniform accession space while remaining faithful to the underlying source data. Tools are provided to facilitate a range of tasks from a simple search to knowledge discovery. The plug-in uses a number of strategies that will be of interest to other plug-in developers.
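The idea of a sequence-based hash key mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how the key is built, so the choice of SHA-1, base64 encoding, the whitespace and case normalization, and the appended taxonomy identifier are all assumptions, and `sequence_key` is a hypothetical name.

```python
import base64
import hashlib


def sequence_key(sequence, taxon_id):
    """Derive a database-independent key from a protein sequence.

    Assumed recipe (illustrative only): strip whitespace, upper-case,
    SHA-1 hash, base64-encode, then append the taxonomy identifier.
    """
    normalized = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalized.encode("ascii")).digest()
    encoded = base64.b64encode(digest).decode("ascii").rstrip("=")
    return f"{encoded}{taxon_id}"


# Two records for the same protein, formatted differently by two
# hypothetical source databases, collapse onto one key.
key_a = sequence_key("mktl vaaw\nLLS", 9606)
key_b = sequence_key("MKTLVAAWLLS", 9606)
same_protein = key_a == key_b
```

Keys of this kind let records be grouped by the protein they describe rather than by whichever accession a given source database happened to use.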
Affiliation(s)
- Sabry Razick
- The Biotechnology Centre of Oslo, University of Oslo, P.O. Box 1125 Blindern, 0317 Oslo, Norway
|
42
|
Stojmirović A, Yu YK. ppiTrim: constructing non-redundant and up-to-date interactomes. Database: The Journal of Biological Databases and Curation 2011; 2011:bar036. [PMID: 21873645 PMCID: PMC3162744 DOI: 10.1093/database/bar036] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/30/2022]
Abstract
Robust advances in interactome analysis demand comprehensive, non-redundant and consistently annotated data sets. By non-redundant, we mean that the accounting of evidence for every interaction should be faithful: each independent experimental support is counted exactly once, no more, no less. While many interactions are shared among public repositories, none of them contains the complete known interactome for any model organism. In addition, the annotations of the same experimental result by different repositories often disagree. This brings up the issue of which annotation to keep while consolidating evidences that are the same. The iRefIndex database, including interactions from most popular repositories with a standardized protein nomenclature, represents a significant advance in all aspects, especially in comprehensiveness. However, iRefIndex aims to maintain all information/annotation from original sources and requires users to perform additional processing to fully achieve the aforementioned goals. Another issue has to do with protein complexes. Some databases represent experimentally observed complexes as interactions with more than two participants, while others expand them into binary interactions using spoke or matrix model. To avoid untested interaction information buildup, it is preferable to replace the expanded protein complexes, either from spoke or matrix models, with a flat list of complex members. To address these issues and to achieve our goals, we have developed ppiTrim, a script that processes iRefIndex to produce non-redundant, consistently annotated data sets of physical interactions. Our script proceeds in three stages: mapping all interactants to gene identifiers and removing all undesired raw interactions, deflating potentially expanded complexes, and reconciling for each interaction the annotation labels among different source databases. 
As an illustration, we have processed the three largest organismal data sets: yeast, human and fruitfly. While ppiTrim can resolve most apparent conflicts between different labelings, we also discovered some unresolvable disagreements mostly resulting from different annotation policies among repositories. Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/ppiTrim.html
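The spoke and matrix expansions mentioned in this abstract can be sketched briefly. The example below is an illustration, not the ppiTrim script: the function names and protein identifiers are hypothetical, and it only shows how many binary pairs each model generates from one observed complex.

```python
from itertools import combinations


def spoke_expand(bait, preys):
    """Spoke model: pair the bait with each prey, and nothing else."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_expand(members):
    """Matrix model: pair every complex member with every other member."""
    return {frozenset(pair) for pair in combinations(sorted(set(members)), 2)}


# One hypothetical purification: bait A pulled down together with B, C and D.
complex_members = ["A", "B", "C", "D"]
spoke_pairs = spoke_expand("A", complex_members[1:])
matrix_pairs = matrix_expand(complex_members)
```

For a complex of n members the spoke model yields n - 1 pairs and the matrix model n(n - 1)/2, all derived from a single experiment. That growth in pairs without new experimental support is the "untested interaction information buildup" that keeping a flat list of complex members avoids.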
Affiliation(s)
- Aleksandar Stojmirović
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
43
|
Lopes TJS, Schaefer M, Shoemaker J, Matsuoka Y, Fontaine JF, Neumann G, Andrade-Navarro MA, Kawaoka Y, Kitano H. Tissue-specific subnetworks and characteristics of publicly available human protein interaction databases. Bioinformatics 2011; 27:2414-21. [PMID: 21798963 DOI: 10.1093/bioinformatics/btr414] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION Protein-protein interaction (PPI) databases are widely used tools to study cellular pathways and networks; however, there are several databases available that still do not account for cell type-specific differences. Here, we evaluated the characteristics of six interaction databases, incorporated tissue-specific gene expression information and finally, investigated if the most popular proteins of scientific literature are involved in good quality interactions. RESULTS We found that the evaluated databases are comparable in terms of node connectivity (i.e. proteins with few interaction partners also have few interaction partners in other databases), but may differ in the identity of interaction partners. We also observed that the incorporation of tissue-specific expression information significantly altered the interaction landscape and finally, we demonstrated that many of the most intensively studied proteins are engaged in interactions associated with low confidence scores. In summary, interaction databases are valuable research tools but may lead to different predictions on interactions or pathways. The accuracy of predictions can be improved by incorporating datasets on organ- and cell type-specific gene expression, and by obtaining additional interaction evidence for the most 'popular' proteins. CONTACT kitano@sbi.jp SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tiago J S Lopes
- JST ERATO KAWAOKA Infection-induced Host Responses Project, Tokyo, Japan
|
44
|
Perrakis A, Musacchio A, Cusack S, Petosa C. Investigating a macromolecular complex: the toolkit of methods. J Struct Biol 2011; 175:106-12. [PMID: 21620973 DOI: 10.1016/j.jsb.2011.05.014] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 03/23/2011] [Revised: 05/11/2011] [Accepted: 05/12/2011] [Indexed: 02/08/2023]
Abstract
Structural biologists studying macromolecular complexes spend considerable effort doing strictly "non-structural" work: investigating the physiological relevance and biochemical properties of a complex, preparing homogeneous samples for structural analysis, and experimentally validating structure-based hypotheses regarding function or mechanism. Familiarity with the diverse perspectives and techniques available for studying complexes helps in the critical assessment of non-structural data, expedites the pre-structural characterization of a complex and facilitates the investigation of function. Here we survey the approaches and techniques used to study macromolecular complexes from various viewpoints, including genetics, cell and molecular biology, biochemistry/biophysics, structural biology, and systems biology/bioinformatics. The aim of this overview is to heighten awareness of the diversity of perspectives and experimental tools available for investigating complexes and of their usefulness for the structural biologist.
Affiliation(s)
- Anastassis Perrakis
- Department of Biochemistry, NKI, Plesmanlaan 121, 1066 CX, Amsterdam, The Netherlands.
|
45
|
Vidal M, Cusick ME, Barabási AL. Interactome networks and human disease. Cell 2011; 144:986-98. [PMID: 21414488 PMCID: PMC3102045 DOI: 10.1016/j.cell.2011.02.016] [Citation(s) in RCA: 1189] [Impact Index Per Article: 84.9] [Received: 12/08/2010] [Revised: 02/07/2011] [Accepted: 02/09/2011] [Indexed: 02/06/2023]
Abstract
Complex biological systems and cellular networks may underlie most genotype to phenotype relationships. Here, we review basic concepts in network biology, discussing different types of interactome networks and the insights that can come from analyzing them. We elaborate on why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease.
Affiliation(s)
- Marc Vidal
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Michael E. Cusick
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Albert-László Barabási
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Center for Complex Network Research (CCNR) and Departments of Physics, Biology and Computer Science, Northeastern University, Boston, MA 02115, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
|