1
Shakibania T, Arabfard M, Najafi A. A predictive approach for host-pathogen interactions using deep learning and protein sequences. Virusdisease 2024; 35:434-445. [PMID: 39464732 PMCID: PMC11502655 DOI: 10.1007/s13337-024-00882-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 07/03/2024] [Indexed: 10/29/2024] Open
Abstract
Research on host-pathogen interactions (HPIs) has evolved rapidly over the past decades. The more pathogens are discovered, the more challenging it becomes to find cures and prevent the infections they cause. Many experimental techniques have been proposed to identify these interactions, but most of them are costly and time-consuming. Computational methods have proven efficient in overcoming such limitations. In this study, we propose using deep learning methods to predict HPIs from protein sequences. We use the monoMonoKGap (mMKGap) algorithm with K = 2 to extract features from the sequences, and the Negatome Database to generate negative interactions. The proposed method was evaluated on three separate balanced human-pathogen datasets with 10-fold cross-validation and yielded accuracies of 99.65%, 99.52%, and 99.66% (mean accuracy of 99.61%). To further evaluate the performance of the deep network, we compared it with other classification methods: Random Forest (RF), an ensemble of multiple decision trees; the Support Vector Machine (SVM); and a Convolutional Neural Network (CNN). We also tested the Dipeptide Composition algorithm as an alternative feature extraction method for comparison with mMKGap. The experimental results show that the proposed method is accurate, robust, and practical and could be used as a reliable framework in HPI research.
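The sequence encoding mentioned in this abstract can be illustrated with a small sketch. The code below computes the composition of residue pairs separated by a fixed gap; with `gap=0` it reduces to the standard dipeptide composition (400 adjacent-pair frequencies). It illustrates the general idea of gapped pair features and is not the authors' mMKGap implementation: the function names, the choice to concatenate gaps 1 through K, and the skipping of non-standard residues are all assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def gapped_pair_composition(sequence, gap=0):
    """Frequencies of residue pairs separated by `gap` positions.

    gap=0 gives the ordinary dipeptide composition.
    Pairs containing non-standard residues are skipped (an assumption).
    """
    counts = dict.fromkeys(PAIRS, 0)
    step = gap + 1
    total = 0
    for i in range(len(sequence) - step):
        pair = sequence[i] + sequence[i + step]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {p: 0.0 for p in PAIRS}
    return {p: c / total for p, c in counts.items()}


def encode_interaction(host_seq, pathogen_seq, max_gap=2):
    """Concatenate gapped-pair features of both proteins into one vector.

    Using gaps 1..max_gap is a hypothetical reading of "K = 2".
    """
    vector = []
    for seq in (host_seq, pathogen_seq):
        for gap in range(1, max_gap + 1):
            features = gapped_pair_composition(seq, gap)
            vector.extend(features[p] for p in PAIRS)
    return vector
```

Each protein contributes 400 values per gap, so with `max_gap=2` a host-pathogen pair is encoded as a fixed-length vector of 1600 values regardless of sequence length, which is what makes such encodings convenient inputs for a classifier.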
Affiliation(s)
- Taha Shakibania
  - Molecular Biology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Masoud Arabfard
  - Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Ali Najafi
  - Molecular Biology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
2
Martins YC, Ziviani A, Cerqueira e Costa MDO, Cavalcanti MCR, Nicolás MF, de Vasconcelos ATR. PPIntegrator: semantic integrative system for protein-protein interaction and application for host-pathogen datasets. Bioinformatics Advances 2023; 3:vbad067. [PMID: 37359724 PMCID: PMC10290227 DOI: 10.1093/bioadv/vbad067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 04/28/2023] [Accepted: 05/30/2023] [Indexed: 06/28/2023]
Abstract
Summary: Semantic web standards have shown importance in the last 20 years in promoting data formalization and interlinking between the existing knowledge graphs. In this context, several ontologies and data integration initiatives have emerged in recent years for the biological area, such as the broadly used Gene Ontology that contains metadata to annotate gene function and subcellular location. Another important subject in the biological area is protein-protein interactions (PPIs), which have applications like protein function inference. Current PPI databases have heterogeneous exportation methods that challenge their integration and analysis. Presently, several initiatives of ontologies covering some concepts of the PPI domain are available to promote interoperability across datasets. However, the efforts to stimulate guidelines for automatic semantic data integration and analysis for PPIs in these datasets are limited. Here, we present PPIntegrator, a system that semantically describes data related to protein interactions. We also introduce an enrichment pipeline to generate, predict and validate new potential host-pathogen datasets by transitivity analysis. PPIntegrator contains a data preparation module to organize data from three reference databases and a triplification and data fusion module to describe the provenance information and results. This work provides an overview of the PPIntegrator system applied to integrate and compare host-pathogen PPI datasets from four bacterial species using our proposed transitivity analysis pipeline. We also demonstrate some critical queries to analyze this kind of data and highlight the importance and usage of the semantic data generated by our system. Availability and implementation: https://github.com/YasCoMa/ppintegrator, https://github.com/YasCoMa/ppi_validation_process and https://github.com/YasCoMa/predprin.
Affiliation(s)
- Yasmmin Côrtes Martins
  - Bioinformatics Laboratory, National Laboratory for Scientific Computing, Petrópolis 25651-076, Brazil
- Artur Ziviani
  - Data Extreme Laboratory (DEXL), National Laboratory for Scientific Computing, Petrópolis 25651-076, Brazil
- Marisa Fabiana Nicolás
  - Bioinformatics Laboratory, National Laboratory for Scientific Computing, Petrópolis 25651-076, Brazil
3
Ibrahim AH, Karabulut OC, Karpuzcu BA, Türk E, Süzek BE. A correlation coefficient-based feature selection approach for virus-host protein-protein interaction prediction. PLoS One 2023; 18:e0285168. [PMID: 37130110 PMCID: PMC10153705 DOI: 10.1371/journal.pone.0285168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 04/17/2023] [Indexed: 05/03/2023] Open
Abstract
Prediction of virus-host protein-protein interactions (PPIs) is a broad research area in which various machine-learning-based classifiers have been developed. Transforming biological data into machine-usable features is a preliminary step in constructing these virus-host PPI prediction tools. In this study, we adopted a virus-host PPI dataset and a reduced amino acid alphabet to create tripeptide features and introduced a correlation coefficient-based feature selection. We applied feature selection across several correlation coefficient metrics and statistically tested their relevance in a structural context. We compared the performance of feature-selection models against that of the baseline virus-host PPI prediction models created using different classification algorithms without the feature selection. We also tested the performance of these baseline models against previously available tools to ensure their predictive power is acceptable. Here, the Pearson coefficient provides the best performance with respect to the baseline model as measured by AUPR: a drop of 0.003 in AUPR while achieving a 73.3% reduction (from 686 to 183) in the number of tripeptide features for random forest. The results suggest that our correlation coefficient-based feature selection approach, while decreasing the computation time and space complexity, has a limited impact on the prediction performance of virus-host PPI prediction tools.
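As a rough illustration of correlation-based feature selection, the sketch below scores each feature by the absolute Pearson correlation between its column and the class label and keeps the highest-scoring ones. The abstract does not specify how the correlation coefficients are applied, so this ranking scheme, the function names, and the `keep` parameter are assumptions and should not be read as the authors' procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(rows, labels, keep):
    """Return the indices of the `keep` features most correlated with the label.

    rows:   list of feature vectors (one per protein pair)
    labels: list of 0/1 class labels (non-interacting / interacting)
    """
    n_features = len(rows[0])
    scored = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scored.append((abs(pearson(column, labels)), j))
    # Highest absolute correlation first; ties broken by feature index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return sorted(j for _, j in scored[:keep])
```

For scale, the reduction reported in the abstract corresponds to keeping 183 of 686 tripeptide features, that is, (686 − 183) / 686 ≈ 73.3% of the features removed.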
Affiliation(s)
- Ahmed Hassan Ibrahim
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Onur Can Karabulut
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Betül Asiye Karpuzcu
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Erdem Türk
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
  - Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Barış Ethem Süzek
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
  - Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
  - Georgetown University Medical Center, Biochemistry and Molecular & Cellular Biology, Washington DC, United States of America
4
Khan T, Raza S. Exploration of Computational Aids for Effective Drug Designing and Management of Viral Diseases: A Comprehensive Review. Curr Top Med Chem 2023; 23:1640-1663. [PMID: 36725827 DOI: 10.2174/1568026623666230201144522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 11/14/2022] [Accepted: 12/19/2022] [Indexed: 02/03/2023]
Abstract
BACKGROUND: Microbial diseases, specifically those originating from viruses, are a major cause of human mortality all over the world. The current COVID-19 pandemic is a case in point: the dynamics of viral-human interactions are still not completely understood, making treatment a matter of trial and error. Scientists have been struggling for over a year to devise a strategy to contain the pandemic, which brings to light the lack of understanding of how the virus grows and multiplies in the human body. METHODS: This paper presents the authors' perspective on the applicability of computational tools for deep learning and understanding of host-microbe interaction, disease progression and management, drug resistance, and immune modulation through in silico methodologies that can aid in effective and selective drug development. The paper summarizes advances from the last five years, drawing on studies published and indexed in leading databases. RESULTS: Computational systems biology works at the interface of biology and mathematics and aims to unravel the complex mechanisms between biological systems and the inter- and intra-species dynamics using computational tools and high-throughput technologies built on algorithms, networks and complex connections to simulate cellular biological processes. CONCLUSION: Computational strategies and modelling integrate and prioritize microbial-host interactions and may predict the conditions in which the fine-tuning attenuates. These microbial-host interactions and working mechanisms are important for effective drug design and for fine-tuning therapeutic interventions.
Affiliation(s)
- Tahmeena Khan
  - Department of Chemistry, Integral University, Lucknow, 226026, U.P., India
- Saman Raza
  - Department of Chemistry, Isabella Thoburn College, Lucknow, 226007, U.P., India
5
Karpuzcu BA, Türk E, Ibrahim AH, Karabulut OC, Süzek BE. Machine Learning Methods for Virus-Host Protein-Protein Interaction Prediction. Methods Mol Biol 2023; 2690:401-417. [PMID: 37450162 DOI: 10.1007/978-1-0716-3327-4_31] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2023]
Abstract
The attachment of a virion to its cellular receptor on the host organism, which occurs through virus-host protein-protein interactions (PPIs), is a decisive step for viral pathogenicity and infectivity. Therefore, a vast number of wet-lab experimental techniques are used to study virus-host PPIs. However, given the great number and enormous variety of virus-host PPIs, as well as the cost and labor of laboratory work, computational approaches to analyzing the available interaction data and predicting previously unidentified interactions have been on the rise. Among them, machine-learning-based models are receiving increasing attention, with a large body of resources and tools proposed recently. In this chapter, we first describe the methodology and the major steps in developing a virus-host PPI prediction tool. Next, we discuss the challenges involved and evaluate several existing machine-learning-based virus-host PPI prediction tools. Finally, we describe our experience with several ensemble techniques applied to prediction results retrieved from individual PPI prediction tools. Overall, based on our experience, we recognize there is still room for the development of new individual and/or ensemble virus-host PPI prediction tools that leverage existing tools.
Affiliation(s)
- Betül Asiye Karpuzcu
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Erdem Türk
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
  - Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Ahmad Hassan Ibrahim
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Onur Can Karabulut
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Barış Ethem Süzek
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
  - Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
6
Hu RS, Hesham AEL, Zou Q. Machine Learning and Its Applications for Protozoal Pathogens and Protozoal Infectious Diseases. Front Cell Infect Microbiol 2022; 12:882995. [PMID: 35573796 PMCID: PMC9097758 DOI: 10.3389/fcimb.2022.882995] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/24/2022] [Accepted: 03/28/2022] [Indexed: 12/24/2022] Open
Abstract
In recent years, the development and application of machine learning (ML) in the field of infectious diseases has attracted massive attention, not only serving as a catalyst for academic studies but also as a key means of detecting pathogenic microorganisms, implementing public health surveillance, exploring host-pathogen interactions, discovering drug and vaccine candidates, and so forth. These applications also include the management of infectious diseases caused by protozoal pathogens, such as Plasmodium, Trypanosoma, Toxoplasma, Cryptosporidium, and Giardia, a class of fatal or life-threatening causative agents capable of infecting humans and a wide range of animals. With the reduction of computational cost, the availability of effective ML algorithms, the popularization of ML tools, and the accumulation of high-throughput data, it is now possible to integrate ML applications into a growing body of scientific research on protozoal infection. Here, we present a brief overview of important concepts in ML as background knowledge, with a focus on basic workflows, popular algorithms (e.g., support vector machine, random forest, and neural networks), feature extraction and selection, and model evaluation metrics. We then review current ML applications and major advances concerning protozoal pathogens and protozoal infectious diseases in combination with relevant biological expertise, and provide forward-looking insights into perspectives and opportunities for future advances in ML techniques in this field.
Affiliation(s)
- Rui-Si Hu
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Abd El-Latif Hesham
  - Genetics Department, Faculty of Agriculture, Beni-Suef University, Beni-Suef, Egypt
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - *Correspondence: Quan Zou,
7
Zhou H, Beltrán JF, Brito IL. Host-microbiome protein-protein interactions capture disease-relevant pathways. Genome Biol 2022; 23:72. [PMID: 35246229 PMCID: PMC8895870 DOI: 10.1186/s13059-022-02643-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/13/2021] [Accepted: 02/22/2022] [Indexed: 01/02/2023] Open
Abstract
Background: Host-microbe interactions are crucial for normal physiological and immune system development and are implicated in a variety of diseases, including inflammatory bowel disease (IBD), colorectal cancer (CRC), obesity, and type 2 diabetes (T2D). Despite large-scale case-control studies aimed at identifying microbial taxa or genes involved in pathogeneses, the mechanisms linking them to disease have thus far remained elusive. Results: To identify potential pathways through which human-associated bacteria impact host health, we leverage publicly-available interspecies protein-protein interaction (PPI) data to find clusters of microbiome-derived proteins with high sequence identity to known human-protein interactors. We observe differential targeting of putative human-interacting bacterial genes in nine independent metagenomic studies, finding evidence that the microbiome broadly targets human proteins involved in immune, oncogenic, apoptotic, and endocrine signaling pathways in relation to IBD, CRC, obesity, and T2D diagnoses. Conclusions: This host-centric analysis provides a mechanistic hypothesis-generating platform and extensively adds human functional annotation to commensal bacterial proteins. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02643-9.
Affiliation(s)
- Hao Zhou
  - Department of Microbiology, Cornell University, Ithaca, NY, USA
- Juan Felipe Beltrán
  - Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
- Ilana Lauren Brito
  - Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY, USA
8
Panditrao G, Ganguli P, Sarkar RR. Delineating infection strategies of Leishmania donovani secretory proteins in Human through host-pathogen protein Interactome prediction. Pathog Dis 2021; 79:6408463. [PMID: 34677584 DOI: 10.1093/femspd/ftab051] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/05/2021] [Accepted: 10/20/2021] [Indexed: 12/11/2022] Open
Abstract
Interactions of Leishmania donovani secretory virulence factors with host proteins, and their interplay during the infection process in humans, are poorly studied in Visceral Leishmaniasis. The lack of a holistic study of the pathway-level deregulation caused by these virulence factors leads to a poor understanding of the parasite's strategies to subvert host immune responses, secure its survival inside the host, and further the spread of infection to the visceral organs. In this study, we propose a computational workflow to predict the host-pathogen protein interactome of L. donovani secretory virulence factors with human proteins, combining sequence-based Interolog mapping and structure-based Domain Interaction mapping techniques. We further employ graph-theoretical approaches and shortest-path methods to analyze the interactome. Our study deciphers the infection paths involving some unique and understudied disease-associated signaling pathways influencing the cellular phenotypic responses in the host. Our in silico knockout study, based on statistical analysis, unveils for the first time the mediator proteins UBC, 1433Z and HS90A as potential immunomodulatory candidates through which the virulence factors employ the infection paths. These identified pathways and novel mediator proteins can be used as possible targets to control and modulate the infection process, further aiding the treatment of Visceral Leishmaniasis.
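Interolog mapping, as mentioned in this abstract, transfers a known interaction between two template proteins to their homologs in the species of interest. The sketch below shows only that transfer step, assuming homology assignments have already been computed elsewhere (for example by sequence similarity search). All identifiers, the function name, and the data layout are hypothetical and do not reflect the authors' workflow.

```python
def predict_interologs(template_ppis, host_homologs, pathogen_homologs):
    """Transfer template interactions to host-pathogen protein pairs.

    template_ppis:     iterable of (protein_a, protein_b) known interactions
    host_homologs:     dict mapping template protein -> set of host proteins
    pathogen_homologs: dict mapping template protein -> set of pathogen proteins

    A pair (host, pathogen) is predicted when the host protein is homologous
    to one partner of a template interaction and the pathogen protein is
    homologous to the other partner.
    """
    predicted = set()
    for a, b in template_ppis:
        # Interactions are undirected, so try both orientations.
        for host_side, pathogen_side in ((a, b), (b, a)):
            for host in host_homologs.get(host_side, ()):
                for pathogen in pathogen_homologs.get(pathogen_side, ()):
                    predicted.add((host, pathogen))
    return predicted


# Hypothetical identifiers, for illustration only.
example = predict_interologs(
    template_ppis=[("T1", "T2")],
    host_homologs={"T1": {"HUMAN_A"}},
    pathogen_homologs={"T2": {"LDON_X"}},
)
```

In practice the predicted set would be filtered further, for instance by the domain-interaction evidence the abstract describes, since homology alone tends to over-predict.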
Affiliation(s)
- Gauri Panditrao
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune 411008, Maharashtra, India
- Piyali Ganguli
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune 411008, Maharashtra, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, Uttar Pradesh, India
- Ram Rup Sarkar
  - Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Pune 411008, Maharashtra, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, Uttar Pradesh, India
9
Makepeace BL, Psifidi A, Robledo D, Xia D. Editorial: Genetics Architecture and Underlying Molecular Mechanisms in Host-Pathogen Interactions. Front Genet 2021; 12:695109. [PMID: 34490034 PMCID: PMC8418152 DOI: 10.3389/fgene.2021.695109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2021] [Accepted: 07/29/2021] [Indexed: 11/26/2022] Open
Affiliation(s)
- Benjamin L Makepeace
  - Department of Infection Biology and Microbiomes, Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
- Androniki Psifidi
  - Department of Clinical Science and Services, Royal Veterinary College, University of London, London, United Kingdom
- Diego Robledo
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Dong Xia
  - Department of Comparative Biomedical Science, Royal Veterinary College, University of London, London, United Kingdom
10
Application and Perspectives of MALDI-TOF Mass Spectrometry in Clinical Microbiology Laboratories. Microorganisms 2021; 9:microorganisms9071539. [PMID: 34361974 PMCID: PMC8307939 DOI: 10.3390/microorganisms9071539] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 06/02/2021] [Revised: 07/06/2021] [Accepted: 07/18/2021] [Indexed: 12/11/2022] Open
Abstract
Severe infections require a rapid and reliable diagnosis to initiate appropriate treatment while avoiding unnecessary antimicrobial use and reducing associated morbidities and healthcare costs. Conventional methods usually require more than 24–48 h to culture and profile bacterial species. Mass spectrometry (MS) is an analytical technique that has emerged as a powerful tool in clinical microbiology for identifying peptides and proteins, which makes it promising for microbial identification. Matrix-assisted laser desorption ionization–time of flight MS (MALDI–TOF MS) offers a cost- and time-effective alternative to conventional methods, such as bacterial culture and even 16S rRNA gene sequencing, for identifying viruses, bacteria and fungi and detecting virulence factors and mechanisms of resistance. This review provides an overview of the potential applications and perspectives of MS in clinical microbiology laboratories and proposes its use as a first-line method for microbial identification and diagnosis.
11
Sudhakar P, Machiels K, Verstockt B, Korcsmaros T, Vermeire S. Computational Biology and Machine Learning Approaches to Understand Mechanistic Microbiome-Host Interactions. Front Microbiol 2021; 12:618856. [PMID: 34046017 PMCID: PMC8148342 DOI: 10.3389/fmicb.2021.618856] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/18/2020] [Accepted: 03/19/2021] [Indexed: 12/11/2022] Open
Abstract
The microbiome, by virtue of its interactions with the host, is implicated in various host functions, including nutrition and homeostasis. Many chronic diseases, such as diabetes, cancer, and inflammatory bowel diseases, are characterized by a disruption of microbial communities in at least one biological niche/organ system. Various molecular mechanisms between microbial and host components, such as proteins, RNAs, and metabolites, have recently been identified, thus filling many gaps in our understanding of how the microbiome modulates host processes. Concurrently, high-throughput technologies have enabled the profiling of heterogeneous datasets capturing community-level changes in the microbiome as well as the host responses. However, due to limitations in parallel sampling and analytical procedures, big gaps still exist in terms of how the microbiome mechanistically influences host functions at a system and community level. In the past decade, computational biology and machine learning methodologies have been developed with the aim of filling the existing gaps. Due to the agnostic nature of the tools, they have been applied in diverse disease contexts to analyze and infer the interactions between the microbiome and host molecular components. Some of these approaches allow the identification and analysis of affected downstream host processes. Most of the tools statistically or mechanistically integrate different types of -omic and meta-omic datasets, followed by functional/biological interpretation. In this review, we provide an overview of the landscape of computational approaches for investigating mechanistic interactions between individual microbes/microbiome and the host and the opportunities for basic and clinical research. These could include but are not limited to the development of activity- and mechanism-based biomarkers, uncovering mechanisms for therapeutic interventions and generating integrated signatures to stratify patients.
Affiliation(s)
- Padhmanand Sudhakar
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
  - Earlham Institute, Norwich, United Kingdom
  - Quadram Institute Bioscience, Norwich, United Kingdom
- Kathleen Machiels
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
- Bram Verstockt
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
  - Department of Gastroenterology and Hepatology, University Hospitals Leuven, KU Leuven, Leuven, Belgium
- Tamas Korcsmaros
  - Earlham Institute, Norwich, United Kingdom
  - Quadram Institute Bioscience, Norwich, United Kingdom
- Séverine Vermeire
  - Department of Chronic Diseases, Metabolism and Ageing, Translational Research Center for Gastrointestinal Disorders (TARGID), KU Leuven, Leuven, Belgium
  - Department of Gastroenterology and Hepatology, University Hospitals Leuven, KU Leuven, Leuven, Belgium
12
Prasasty VD, Hutagalung RA, Gunadi R, Sofia DY, Rosmalena R, Yazid F, Sinaga E. Prediction of human-Streptococcus pneumoniae protein-protein interactions using logistic regression. Comput Biol Chem 2021; 92:107492. [PMID: 33964803 DOI: 10.1016/j.compbiolchem.2021.107492] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/02/2021] [Accepted: 04/21/2021] [Indexed: 02/07/2023]
Abstract
Streptococcus pneumoniae is a major cause of mortality in children under five years old. In recent years, the emergence of antibiotic-resistant strains of S. pneumoniae has increased the threat level of this pathogen. For that reason, the exploration of S. pneumoniae protein virulence factors should be considered in developing new drugs or vaccines, for instance through the analysis of host-pathogen protein-protein interactions (HP-PPIs). In this research, protein-protein interactions were predicted with a logistic regression model that uses the number of protein domain occurrences as features. Using HP-PPIs of three different pathogens as training data, the model achieved 57-77% precision, 64-75% recall, and 96-98% specificity. Prediction of human-S. pneumoniae protein-protein interactions using the model yielded 5823 interactions involving thirty S. pneumoniae proteins and 324 human proteins. Pathway enrichment analysis showed that most of the pathways involved in the predicted interactions are immune system pathways. Network topology analysis revealed β-galactosidase (BgaA) as the most central among the S. pneumoniae proteins in the predicted HP-PPI networks, with a degree centrality of 1.0 and a betweenness centrality of 0.451853. Further experimental studies are required to validate the predicted interactions and examine their roles in S. pneumoniae infection.
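To make the degree centrality figure in this abstract more concrete, the sketch below computes a degree centrality for pathogen proteins in a host-pathogen interaction network. It assumes a bipartite normalization, in which a pathogen protein's degree is divided by the number of human proteins in the network, so a value of 1.0 means the protein is predicted to interact with every human protein present. The abstract does not state which normalization the authors used, so this choice, the function name, and the identifiers are assumptions.

```python
def pathogen_degree_centrality(interactions):
    """Degree centrality of each pathogen protein in a bipartite network.

    interactions: iterable of (human_protein, pathogen_protein) pairs.
    Centrality = number of distinct human partners / number of human
    proteins in the network (bipartite normalization, an assumption).
    """
    interactions = list(interactions)
    humans = {human for human, _ in interactions}
    partners = {}
    for human, pathogen in interactions:
        partners.setdefault(pathogen, set()).add(human)
    return {
        pathogen: len(human_set) / len(humans)
        for pathogen, human_set in partners.items()
    }


# Hypothetical toy network: P1 touches every human protein, P2 only one.
toy = pathogen_degree_centrality(
    [("H1", "P1"), ("H2", "P1"), ("H1", "P2")]
)
```

With the more common single-set normalization (degree divided by the total number of other nodes), a pathogen protein in a network with several pathogen proteins could not reach 1.0, which is why the bipartite form is used in this illustration.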
Affiliation(s)
- Vivitri Dewi Prasasty
  - Faculty of Biotechnology, Atma Jaya Catholic University of Indonesia, Jakarta, 12930, Indonesia
- Rory Anthony Hutagalung
  - Faculty of Biotechnology, Atma Jaya Catholic University of Indonesia, Jakarta, 12930, Indonesia
- Reinhart Gunadi
  - Department of Biology, Faculty of Life Sciences, Universitas Surya, Tangerang, Banten, 15143, Indonesia
- Dewi Yustika Sofia
  - Department of Biology, Faculty of Life Sciences, Universitas Surya, Tangerang, Banten, 15143, Indonesia
- Rosmalena Rosmalena
  - Department of Medical Chemistry, Faculty of Medicine, Universitas Indonesia, Jakarta, 10430, Indonesia
- Fatmawaty Yazid
  - Department of Medical Chemistry, Faculty of Medicine, Universitas Indonesia, Jakarta, 10430, Indonesia
- Ernawati Sinaga
  - Faculty of Biology, Universitas Nasional, Jakarta, 12520, Indonesia
13
Kösesoy İ, Gök M, Kahveci T. Prediction of host-pathogen protein interactions by extended network model. Turk J Biol 2021; 45:138-148. [PMID: 33907496 PMCID: PMC8068772 DOI: 10.3906/biy-2009-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/01/2020] [Accepted: 01/04/2021] [Indexed: 11/26/2022] Open
Abstract
Knowledge of the pathogen-host interactions between species is essential in order to develop a solution strategy against infectious diseases. In vitro methods take extended periods of time to detect interactions and provide very few of the possible interaction pairs. Hence, modelling interactions between proteins has necessitated the development of computational methods. The main scope of this paper is integrating the known protein interactions between the host and pathogen organisms to improve the prediction success rate of unknown pathogen-host interactions. Thus, the true positive rate of the predictions was expected to increase. In order to perform this study extensively, several protein encoding methods and learning algorithms were tested. Along with human as the host organism, two different pathogen organisms were used in the experiments. For each combination of protein-encoding and prediction method, the original prediction algorithm was first tested using only pathogen-host interactions, and the same method was then tested again after integrating the known protein interactions within each organism. The effect of merging the networks of pathogen-host interactions of different species on the prediction performance of state-of-the-art methods was also observed. Success was measured in terms of Matthews correlation coefficient, precision, recall, F1 score, and accuracy metrics. Empirical results showed that integrating the host and pathogen interactions yields better performance consistently in almost all experiments.
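The evaluation metrics named in this abstract all derive from the four counts of a binary confusion matrix. As a brief reference, the sketch below computes them from true/false positive and negative counts using their standard definitions. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Convention (an assumption): undefined ratios are reported as 0.0.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }
```

Unlike accuracy, the Matthews correlation coefficient uses all four counts symmetrically, which makes it more informative when interacting and non-interacting pairs are imbalanced.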
Affiliation(s)
- İrfan KÖSESOY
  - Department of Computer Engineering, Faculty of Engineering, Yalova University, Yalova, Turkey
- Murat GÖK
  - Department of Computer Engineering, Faculty of Engineering, Yalova University, Yalova, Turkey
- Tamer KAHVECİ
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, USA
|
14
|
Wang Y, Zhou M, Zou Q, Xu L. Machine learning for phytopathology: from the molecular scale towards the network scale. Brief Bioinform 2021; 22:6204793. [PMID: 33787847 DOI: 10.1093/bib/bbab037] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/15/2020] [Revised: 01/09/2021] [Accepted: 01/26/2021] [Indexed: 01/16/2023] Open
Abstract
With the increasing volume of high-throughput sequencing data from a variety of omics techniques in the field of plant-pathogen interactions, sorting, retrieving, processing and visualizing biological information have become a great challenge. Within the explosion of data, machine learning offers powerful tools to process these complex omics data by various algorithms, such as Bayesian reasoning, support vector machine and random forest. Here, we introduce the basic frameworks of machine learning in dissecting plant-pathogen interactions and discuss the applications and advances of machine learning in plant-pathogen interactions from molecular to network biology, including the prediction of pathogen effectors, plant disease resistance protein monitoring and the discovery of protein-protein networks. The aim of this review is to provide a summary of advances in plant defense and pathogen infection and to indicate the important developments of machine learning in phytopathology.
Affiliation(s)
- Yansu Wang
  - Postdoctoral Innovation Practice Base, Shenzhen Polytechnic, China
- Quan Zou
  - University of Electronic Science and Technology of China
- Lei Xu
  - Shenzhen Polytechnic, China
|
15
|
Chen H, Shen J, Wang L, Chi C. APEX2S: A two‐layer machine learning model for discovery of host‐pathogen protein‐protein interactions on cloud‐based multiomics data. CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE 2020; 32. [DOI: 10.1002/cpe.5846] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/14/2020] [Accepted: 04/30/2020] [Indexed: 01/03/2025]
Abstract
Summary: Faced with an avalanche of biological interaction data, computational biology now confronts greater challenges in big data analysis and calls for more studies to mine and integrate cloud‐based multiomics data, especially when the data are related to infectious diseases. Meanwhile, machine learning techniques have recently succeeded in a range of computational biology tasks. In this article, we focus on the study of host‐pathogen protein‐protein interactions, aiming to apply machine learning techniques to learn from the interaction data and make predictions. A comprehensive and practical workflow to harness different cloud‐based multiomics data is discussed. In particular, a novel two‐layer machine learning model, namely APEX2S, is proposed for the discovery of protein‐protein interactions. The results show that our model can better learn and predict from the accumulated host‐pathogen protein‐protein interactions.
Affiliation(s)
- Huaming Chen
  - School of Computing and Information Technology, University of Wollongong, Wollongong, New South Wales, Australia
- Jun Shen
  - School of Computing and Information Technology, University of Wollongong, Wollongong, New South Wales, Australia
- Lei Wang
  - School of Computing and Information Technology, University of Wollongong, Wollongong, New South Wales, Australia
|
16
|
Chen H, Li F, Wang L, Jin Y, Chi CH, Kurgan L, Song J, Shen J. Systematic evaluation of machine learning methods for identifying human-pathogen protein-protein interactions. Brief Bioinform 2020; 22:5847611. [PMID: 32459334 DOI: 10.1093/bib/bbaa068] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/31/2020] [Revised: 03/31/2020] [Accepted: 04/01/2020] [Indexed: 12/11/2022] Open
Abstract
In recent years, high-throughput experimental techniques have significantly enhanced the accuracy and coverage of protein-protein interaction identification, including human-pathogen protein-protein interactions (HP-PPIs). Despite this progress, experimental methods are, in general, expensive in terms of both time and labour costs, especially considering the enormous number of potential protein-interacting partners. Developing computational methods to predict interactions between humans and bacterial pathogens has thus become critical and meaningful, both in facilitating the detection of interactions and in mining incomplete interaction maps. In this paper, we present a systematic evaluation of machine learning-based computational methods for human-bacterium protein-protein interactions (HB-PPIs). We first review a vast number of publicly available databases of HP-PPIs and then critically evaluate the availability of these databases. Benefitting from their well-structured nature, we subsequently preprocess the data and identify six bacterial pathogens that can be used to study bacterium subjects in which a human is the host. Additionally, we thoroughly review the literature on 'host-pathogen interactions' and summarize the existing models, which we use to jointly study the impact of different feature representation algorithms and evaluate the performance of existing machine learning computational models. Owing to the abundance of sequence information and the limited scale of other protein-related information, we adopt the primary protocol from the literature and dedicate our analysis to a comprehensive assessment of sequence information and machine learning models. A systematic evaluation of machine learning models and a wide range of feature representation algorithms based on sequence information is presented as a comparison survey of the prediction performance for HB-PPIs.
|
17
|
Verma M, Shukla K. Convergence analysis of accelerated proximal extra-gradient method with applications. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.049] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
|
18
|
Barman RK, Mukhopadhyay A, Maulik U, Das S. Identification of infectious disease-associated host genes using machine learning techniques. BMC Bioinformatics 2019; 20:736. [PMID: 31881961 PMCID: PMC6935192 DOI: 10.1186/s12859-019-3317-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 10/31/2019] [Accepted: 12/16/2019] [Indexed: 02/06/2023] Open
Abstract
Background With the global spread of multidrug resistance in pathogenic microbes, infectious diseases emerge as a key public health concern of the recent time. Identification of host genes associated with infectious diseases will improve our understanding about the mechanisms behind their development and help to identify novel therapeutic targets. Results We developed a machine learning techniques-based classification approach to identify infectious disease-associated host genes by integrating sequence and protein interaction network features. Among different methods, Deep Neural Networks (DNN) model with 16 selected features for pseudo-amino acid composition (PAAC) and network properties achieved the highest accuracy of 86.33% with sensitivity of 85.61% and specificity of 86.57%. The DNN classifier also attained an accuracy of 83.33% on a blind dataset and a sensitivity of 83.1% on an independent dataset. Furthermore, to predict unknown infectious disease-associated host genes, we applied the proposed DNN model to all reviewed proteins from the database. Seventy-six out of 100 highly-predicted infectious disease-associated genes from our study were also found in experimentally-verified human-pathogen protein-protein interactions (PPIs). Finally, we validated the highly-predicted infectious disease-associated genes by disease and gene ontology enrichment analysis and found that many of them are shared by one or more of the other diseases, such as cancer, metabolic and immune related diseases. Conclusions To the best of our knowledge, this is the first computational method to identify infectious disease-associated host genes. The proposed method will help large-scale prediction of host genes associated with infectious-diseases. 
However, our results indicated that for small datasets, advanced DNN-based method does not offer significant advantage over the simpler supervised machine learning techniques, such as Support Vector Machine (SVM) or Random Forest (RF) for the prediction of infectious disease-associated host genes. Significant overlap of infectious disease with cancer and metabolic disease on disease and gene ontology enrichment analysis suggests that these diseases perturb the functions of the same cellular signaling pathways and may be treated by drugs that tend to reverse these perturbations. Moreover, identification of novel candidate genes associated with infectious diseases would help us to explain disease pathogenesis further and develop novel therapeutics.
Affiliation(s)
- Ranjan Kumar Barman
  - Biomedical Informatics Centre, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Anirban Mukhopadhyay
  - Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
- Ujjwal Maulik
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India
- Santasabuj Das
  - Biomedical Informatics Centre, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
  - Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, P-33, C.I.T. Road Scheme XM, Beliaghata-700010, Kolkata, West Bengal, India
|
19
|
Mei S, Zhang K. In silico unravelling pathogen-host signaling cross-talks via pathogen mimicry and human protein-protein interaction networks. Comput Struct Biotechnol J 2019; 18:100-113. [PMID: 31956393 PMCID: PMC6956678 DOI: 10.1016/j.csbj.2019.12.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/26/2019] [Revised: 12/07/2019] [Accepted: 12/14/2019] [Indexed: 01/08/2023] Open
Abstract
Pathogen-host protein interactions are fundamental for pathogens to manipulate host signaling pathways and subvert host immune defense. For most pathogens, very few or no experimental studies have been conducted to investigate their signaling cross-talks with the host. In this study, we propose a computational framework to validate the biological assumption that human protein-protein interaction (PPI) networks alone are sufficient to infer pathogen-host PPIs via pathogen functional mimicry. Pathogen functional mimicry assumes that a pathogen functionally mimics and substitutes host counterpart proteins in order to get involved in or hijack the host cellular processes. Through pathogen functional mimicry defined via gene ontology (GO) semantic similarity, we first use the known human PPIs as templates to infer pathogen-host PPIs, and the inferred PPIs are further used as training data to build an l2-regularized logistic regression model for novel pathogen-host PPI prediction. Independent tests on the experimental data from human immunodeficiency virus and Francisella tularensis validate the effectiveness of the proposed pathogen functional mimicry technique. Performance comparisons also show that the proposed technique excels the existing pathogen sequence mimicry approaches and transfer learning methods. The proposed framework provides a new avenue to study experimentally less-studied pathogens in worst-case scenarios where very few or no experimental pathogen-host PPIs are available. As two case studies, we apply the proposed framework to Salmonella typhimurium and Human respiratory syncytial virus to reconstruct the pathogen-host PPI networks and further investigate the interference of these two pathogens with human immune signaling and the transcription regulatory system.
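To make the classifier named in this abstract concrete, here is a minimal pure-Python sketch of l2-regularized logistic regression trained by batch gradient descent. It is not the authors' implementation: the two features per protein pair, the toy labels, and the hyperparameters `lam`, `lr` and `epochs` are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_l2_logreg(X, y, lam=0.01, lr=0.5, epochs=2000):
    """Batch gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The bias term b is left unregularized, as is conventional.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            # The l2 penalty contributes lam * w[j] to the gradient.
            w[j] -= lr * (grad_w[j] / n + lam * w[j])
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: two similarity-style features per protein pair,
# label 1 = interacting, 0 = non-interacting.
X = [[0.9, 0.8], [0.8, 0.7], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_l2_logreg(X, y)
print([round(predict_proba(w, b, x), 3) for x in X])
```

The penalty strength `lam` trades off fit against weight size; with inferred rather than experimentally verified training pairs, a stronger penalty is one plausible way to limit overfitting to template noise.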
Affiliation(s)
- Suyu Mei
  - Software College, Shenyang Normal University, Shenyang 110034, China
- Kun Zhang
  - Bioinformatics Core of Xavier RCMI Center for Cancer Research, Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
|
20
|
Zheng N, Wang K, Zhan W, Deng L. Targeting Virus-host Protein Interactions: Feature Extraction and Machine Learning Approaches. Curr Drug Metab 2019; 20:177-184. [PMID: 30156155 DOI: 10.2174/1389200219666180829121038] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/19/2018] [Revised: 05/21/2018] [Accepted: 08/02/2018] [Indexed: 01/15/2023]
Abstract
BACKGROUND Targeting critical virus-host Protein-Protein Interactions (PPIs) has enormous application prospects for therapeutics. Using experimental methods to evaluate all possible virus-host PPIs is labor-intensive and time-consuming. Recent growth in computational identification of virus-host PPIs provides new opportunities for gaining biological insights, including applications in disease control. We provide an overview of recent computational approaches for studying virus-host PPIs. METHODS In this review, a variety of computational methods for virus-host PPI prediction are surveyed. These methods are categorized based on the features they utilize and the machine learning algorithms they apply, including classical and novel methods. RESULTS We describe the pivotal and representative features extracted from relevant sources of biological data, mainly including sequence signatures, known domain interactions, protein motifs and protein structure information. We focus on state-of-the-art machine learning algorithms that are used to build binary prediction models for the classification of virus-host protein pairs and discuss their abilities, weaknesses and future directions. CONCLUSION The findings of this review confirm the importance of computational methods for finding potential protein-protein interactions between virus and host. Although there has been significant progress in the prediction of virus-host PPIs in recent years, there is still considerable room for improvement.
Affiliation(s)
- Nantao Zheng
  - School of Software, Central South University, Changsha, 410075, China
- Kairou Wang
  - School of Software, Central South University, Changsha, 410075, China
- Weihua Zhan
  - School of Electronics and Computer Science, Zhejiang Wanli University, Ningbo 315100, China
- Lei Deng
  - School of Software, Central South University, Changsha, 410075, China
  - Shanghai Key Lab of Intelligent Information Processing, Shanghai 200433, China
|
21
|
Ahmed I, Witbooi P, Christoffels A. Prediction of human-Bacillus anthracis protein-protein interactions using multi-layer neural network. Bioinformatics 2019; 34:4159-4164. [PMID: 29945178 PMCID: PMC6289132 DOI: 10.1093/bioinformatics/bty504] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/18/2018] [Accepted: 06/24/2018] [Indexed: 12/22/2022] Open
Abstract
Motivation: Triplet amino acids have successfully been included in feature selection to predict human-HPV protein-protein interactions (PPI). The utility of supervised learning methods is curtailed because experimental data are not available in sufficient quantities. Improvements in machine learning techniques and feature selection will enhance the study of PPI between host and pathogen. Results: We present a comparison of a neural network model versus SVM for prediction of host-pathogen PPI based on a combination of features including amino acid quadruplets, pairwise sequence similarity, and human interactome properties. The neural network and SVM were implemented using the Python Sklearn library. The neural network model using quadruplet features and other network features outperforms the SVM model. The models are tested against published predictors and then applied to the human-B. anthracis case. Gene ontology term enrichment analysis identifies immunology response and regulation as functions of interacting proteins. For prediction of human-viral PPI, our model (neural network) is a significant improvement in overall performance compared to a predictor using the triplets feature and achieves good accuracy in predicting human-B. anthracis PPI. Availability and implementation: All code can be downloaded from ftp://ftp.sanbi.ac.za/machine_learning/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ibrahim Ahmed
  - South African National Bioinformatics Institute, South African MRC Bioinformatics Unit
- Peter Witbooi
  - Department of Mathematics and Applied Mathematics, University of the Western Cape, Bellville, South Africa
- Alan Christoffels
  - South African National Bioinformatics Institute, South African MRC Bioinformatics Unit
|
22
|
Saha S, Sengupta K, Chatterjee P, Basu S, Nasipuri M. Analysis of protein targets in pathogen-host interaction in infectious diseases: a case study on Plasmodium falciparum and Homo sapiens interaction network. Brief Funct Genomics 2019; 17:441-450. [PMID: 29028886 DOI: 10.1093/bfgp/elx024] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/11/2022] Open
Abstract
Infection and disease progression are the outcome of protein interactions between pathogen and host. Pathogens, the key players in infection, are becoming a severe threat to life because of their adaptability to drugs and their evolutionary dynamism. Identifying protein targets by analyzing protein interactions between host and pathogen is key. Proteins with higher degree and with certain topologically significant graph-theoretical measures are found to be drug targets. On the other hand, exceptional nodes may be involved in the infection mechanism because of pathway processes and biologically unknown factors. In this article, we investigate the characteristics of host-pathogen protein interactions by presenting a comprehensive review of computational approaches applied to different infectious diseases. As an illustration, we analyze a case study on the infectious disease malaria, with its causative agent Plasmodium falciparum acting as 'Bait' and the host, Homo sapiens (human), acting as 'Prey'. In this pathogen-host interaction network, based on interconnectivity and centrality properties, proteins are viewed as central, peripheral, hub and non-hub nodes, and their significance for the infection process is considered. Besides, it is observed that because of the sparseness of the pathogen-host interaction network, there may be some topologically unimportant but biologically significant proteins, which can also act as Bait/Prey. Functional similarity or gene ontology mapping can therefore help to identify these proteins.
Affiliation(s)
- Sovan Saha
  - Department of Computer Science and Engineering at Dr Sudhir Chandra Sur Degree Engineering College, India
- Kaustav Sengupta
  - Department of Computer Science and Engineering, Jadavpur University, India
- Piyali Chatterjee
  - Department of Computer Science and Engineering, Netaji Subhash Engineering College, Garia, India
- Subhadip Basu
  - Department of Computer Science and Engineering, Jadavpur University, India
- Mita Nasipuri
  - Department of Computer Science and Engineering, Jadavpur University, India
|
23
|
Lian X, Yang S, Li H, Fu C, Zhang Z. Machine-Learning-Based Predictor of Human–Bacteria Protein–Protein Interactions by Incorporating Comprehensive Host-Network Properties. J Proteome Res 2019; 18:2195-2205. [DOI: 10.1021/acs.jproteome.9b00074] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 02/08/2023]
Affiliation(s)
- Xianyi Lian
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Shiping Yang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Hong Li
  - Key Laboratory of Tropical Biological Resources of Ministry of Education, Hainan University, Haikou, 570228, China
- Chen Fu
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Ziding Zhang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
|
24
|
Ivan FX, Kwoh CK, Chow VT, Zheng J. Genome Analysis – Identification of Genes Involved in Host-Pathogen Protein-Protein Interaction Networks. ENCYCLOPEDIA OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY 2019:410-424. [DOI: 10.1016/b978-0-12-809633-8.20124-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]
|
25
|
A new sequence based encoding for prediction of host-pathogen protein interactions. Comput Biol Chem 2018; 78:170-177. [PMID: 30553999 DOI: 10.1016/j.compbiolchem.2018.12.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 12/23/2017] [Revised: 08/23/2018] [Accepted: 12/01/2018] [Indexed: 12/22/2022]
Abstract
Pathogen-host interactions are very important for understanding the infection process at the molecular level, where pathogen proteins physically bind to human proteins to manipulate critical biological processes in the host cell. Data scarcity and data unavailability are two major problems for computational approaches in the prediction of pathogen-host interactions. Developing a computational method to predict pathogen-host interactions with high accuracy, based on protein sequences alone, is of great importance because it can eliminate these problems. In this study, we propose a novel and robust sequence-based feature extraction method, named Location Based Encoding, to predict pathogen-host interactions with machine learning-based algorithms. In this context, we use Bacillus anthracis and Yersinia pestis data sets as the pathogen organisms and human proteins as the host model to compare our method with sequence-based protein encoding methods that are widely used in the literature, namely amino acid composition, amino acid pair, and conjoint triad. We use these encoding methods with decision tree-based (Random Forest, J48), statistical (Bayesian Networks, Naive Bayes), and instance-based (kNN) classifiers to predict pathogen-host interactions. We conduct different experiments to evaluate the effectiveness of our method. We obtain the best results among all the experiments with the RF classifier in terms of F1, accuracy, MCC, and AUC.
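Two of the baseline encodings mentioned in this abstract, amino acid composition and amino acid pair composition, can be sketched in a few lines. This is an illustrative reading of those encodings, not the paper's code: the function names, the choice to drop non-standard residues, and the example sequences are assumptions, and the proposed Location Based Encoding itself is not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def _clean(seq):
    # Assumption: residues outside the standard alphabet (e.g. X, B) are dropped.
    return "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)


def amino_acid_composition(seq):
    """20-dimensional vector of residue frequencies."""
    seq = _clean(seq)
    n = len(seq)
    return [seq.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def amino_acid_pair_composition(seq):
    """400-dimensional vector of adjacent-residue pair frequencies."""
    seq = _clean(seq)
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = max(len(seq) - 1, 1)
    return [counts[p] / total for p in PAIRS]


def encode_protein_pair(host_seq, pathogen_seq):
    """Concatenate the host and pathogen composition vectors (40 features)."""
    return amino_acid_composition(host_seq) + amino_acid_composition(pathogen_seq)


# Hypothetical short sequences, for illustration only.
features = encode_protein_pair("MKVLAAGIVGL", "MSTNPKPQRK")
print(len(features), round(sum(features), 6))
```

Both encodings yield fixed-length vectors regardless of sequence length, which is what allows them to be fed directly to classifiers such as Random Forest or kNN.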
|
26
|
Halder AK, Dutta P, Kundu M, Basu S, Nasipuri M. Review of computational methods for virus-host protein interaction prediction: a case study on novel Ebola-human interactions. Brief Funct Genomics 2018; 17:381-391. [PMID: 29028879 PMCID: PMC7109800 DOI: 10.1093/bfgp/elx026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022] Open
Abstract
Identification of potential virus-host interactions is vital for controlling highly infectious virus-caused diseases and may contribute toward the development of new drugs to treat viral infections. Recently, database records of clinically and experimentally validated interactions between a small set of human proteins and Ebola virus (EBOV) have been published. Using the information on the known human interaction partners of EBOV, our main objective is to identify a set of proteins that may interact with EBOV proteins. Here, we first review the state-of-the-art computational methods used for prediction of novel virus-host interactions for infectious diseases, followed by a case study on EBOV-human interactions. The assessment result shows that the predicted human host proteins are highly similar to known human interaction partners of EBOV in the context of structure and semantics and are responsible for similar biochemical activities, pathways and host-pathogen relationships.
Affiliation(s)
- Anup Kumar Halder
  - Department of Computer Science and Engineering, Jadavpur University, India
- Pritha Dutta
  - Department of Computer Science and Engineering, Jadavpur University, India
- Mahantapas Kundu
  - Department of Computer Science and Engineering, Jadavpur University, India
- Subhadip Basu
  - Department of Computer Science and Engineering, Jadavpur University, India
- Mita Nasipuri
  - Department of Computer Science and Engineering, Jadavpur University, India
|
27
|
Zhang L, Liu JY, Gu H, Du Y, Zuo JF, Zhang Z, Zhang M, Li P, Dunwell JM, Cao Y, Zhang Z, Zhang YM. Bradyrhizobium diazoefficiens USDA 110- Glycine max Interactome Provides Candidate Proteins Associated with Symbiosis. J Proteome Res 2018; 17:3061-3074. [PMID: 30091610 DOI: 10.1021/acs.jproteome.8b00209] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
Although the legume-rhizobium symbiosis is a highly important biological process, there is limited knowledge about the protein interaction network between host and symbiont. Using interolog- and domain-based approaches, we constructed an interspecies protein interactome containing 5115 protein-protein interactions between 2291 Glycine max and 290 Bradyrhizobium diazoefficiens USDA 110 proteins. The interactome was further validated by expression pattern analysis in nodules, gene ontology term semantic similarity, co-expression analysis, and luciferase complementation image assay. In the G. max-B. diazoefficiens interactome, bacterial proteins are mainly ion channels and transporters of carbohydrates and cations, while G. max proteins are mainly involved in the processes of metabolism, signal transduction, and transport. We also identified the top 10 highly interacting proteins (hubs) for each species. Kyoto Encyclopedia of Genes and Genomes pathway analysis for each hub showed that a pair of 14-3-3 proteins (SGF14g and SGF14k) and 5 heat shock proteins in G. max are possibly involved in symbiosis, and 10 hubs in B. diazoefficiens may be important symbiotic effectors. Subnetwork analysis showed that 18 symbiosis-related soluble N-ethylmaleimide sensitive factor attachment protein receptor proteins may play roles in regulating bacterial ion channels, and SGF14g and SGF14k possibly regulate the rhizobium dicarboxylate transport protein DctA. The predicted interactome provides a valuable basis for understanding the molecular mechanism of nodulation in soybean.
Affiliation(s)
- Li Zhang
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
  - School of Public Health, Xinxiang Medical University, Xinxiang 453003, China
- Jin-Yang Liu
  - College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Huan Gu
  - College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Yanfang Du
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Jian-Fang Zuo
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Zhibin Zhang
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Menglin Zhang
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Pan Li
  - School of Public Health, Xinxiang Medical University, Xinxiang 453003, China
- Jim M Dunwell
  - School of Agriculture, Policy and Development, University of Reading, Reading RG6 6AR, United Kingdom
- Yangrong Cao
  - College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Zuxin Zhang
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Yuan-Ming Zhang
  - Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
|
28
|
Soyemi J, Isewon I, Oyelade J, Adebiyi E. Inter-Species/Host-Parasite Protein Interaction Predictions Reviewed. Curr Bioinform 2018; 13:396-406. [PMID: 31496926 PMCID: PMC6691774 DOI: 10.2174/1574893613666180108155851] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/13/2017] [Revised: 12/31/2017] [Accepted: 01/02/2018] [Indexed: 01/01/2023]
Abstract
BACKGROUND Host-parasite protein interactions (HPPI) are interactions occurring between a parasite and its host. Studying them enhances the understanding of how a parasite can infect its host. These interactions play an important role in initiating infections, although not all host-parasite interactions result in infection. Identifying the protein-protein interactions (PPIs) that allow a parasite to infect its host contributes greatly to discovering possible drug targets. Such PPIs, when altered, would prevent the host from being infected by the parasite and, in some cases, leave the parasite unable to complete specific stages of its life cycle, invariably leading to its death. It therefore becomes important to understand the workings of host-parasite interactions, which are the major causes of most infectious diseases. OBJECTIVE Many studies in the literature have sought to predict HPPI, mostly using computational methods, with few using experimental methods. Computational methods have proved to be faster and more efficient in manipulating and analyzing real-life data. This study looks at various computational methods used in the literature for host-parasite/inter-species protein-protein interaction predictions, with the hope of gaining better insight into the computational methods used and identifying whether machine learning approaches have been extensively used for the same purpose. METHODS The various methods involved in host-parasite protein interaction prediction were reviewed along with their individual strengths. Studies that carried out host-parasite/inter-species protein interaction predictions were tabulated, analyzing their predictive methods, the filters used, the potential protein-protein interactions discovered and the validation measurements used, as the case may be. The commonly used measurement indexes for such studies were highlighted, together with their formulas. Finally, future prospects of studies specific to human-Plasmodium falciparum PPI predictions were proposed. RESULT We discovered that quite a few of the studies reviewed implemented machine learning approaches for HPPI predictions when compared with methods such as sequence homology search and protein structure and domain-motif. The key challenge noted in HPPI predictions is obtaining relevant information. CONCLUSION This review presents useful knowledge and future directions on the subject matter.
Affiliation(s)
- Jumoke Soyemi
- Department of Computer Science, The Federal Polytechnic, Ilaro, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Ota, Nigeria
- Itunnuoluwa Isewon
- Department of Computer & Information Sciences, Covenant University, Ota, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Ota, Nigeria
- Jelili Oyelade
- Department of Computer & Information Sciences, Covenant University, Ota, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Ota, Nigeria
- Ezekiel Adebiyi
- Department of Computer & Information Sciences, Covenant University, Ota, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Ota, Nigeria
|
29
|
Devkota P, Danzi MC, Wuchty S. Beyond degree and betweenness centrality: Alternative topological measures to predict viral targets. PLoS One 2018; 13:e0197595. [PMID: 29795705 PMCID: PMC5967884 DOI: 10.1371/journal.pone.0197595] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/28/2017] [Accepted: 05/04/2018] [Indexed: 11/18/2022] Open
Abstract
The availability of large-scale screens of host-virus interaction interfaces enabled the topological analysis of viral protein targets of the host. In particular, host proteins that bind viral proteins are generally hubs and proteins with high betweenness centrality. Recently, other topological measures were introduced that a virus may tap to infect a host cell. Utilizing experimentally determined sets of human protein targets from Herpes, Hepatitis, HIV and Influenza, we pooled molecular interactions between proteins from different pathway databases. Apart from a protein's degree and betweenness centrality, we considered a protein's pathway participation, ability to topologically control a network and protein PageRank index. In particular, we found that proteins with increasing values of such measures tend to accumulate viral targets and distinguish viral targets from non-targets. Furthermore, all such topological measures strongly correlate with the occurrence of a given protein in different pathways. Building a random forest classifier that is based on such topological measures, we found that protein PageRank index had the highest impact on the classification of viral (non-)targets while proteins' ability to topologically control an interaction network played the least important role.
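As a rough illustration of two of the measures named in this abstract, the sketch below computes node degree and a PageRank score on a small undirected toy network in plain Python. The network, the node names (P1 to P5), the damping factor of 0.85, and the iteration count are all assumptions made for illustration; none of them come from the paper, and betweenness, network controllability, pathway participation, and the random forest classifier are not covered.

```python
def degree(adj):
    """Number of interaction partners of each node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def pagerank(adj, damping=0.85, iterations=100):
    """PageRank by power iteration on an undirected graph.

    Assumes every node has at least one neighbor (no dangling nodes),
    so the scores keep summing to 1 at every step.
    """
    nodes = sorted(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the uniform "teleport" share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # Node v splits its damped score evenly among its neighbors.
            share = damping * rank[v] / len(adj[v])
            for u in adj[v]:
                new_rank[u] += share
        rank = new_rank
    return rank


# Hypothetical toy interaction network (symmetric adjacency lists).
adj = {
    "P1": ["P2", "P3", "P4"],
    "P2": ["P1"],
    "P3": ["P1"],
    "P4": ["P1", "P5"],
    "P5": ["P4"],
}

deg = degree(adj)
rank = pagerank(adj)
for node in sorted(adj, key=rank.get, reverse=True):
    print(node, deg[node], round(rank[node], 3))
```

In this toy network the hub P1 ranks first on both measures; on real interaction data the two rankings can diverge, which is one reason to consider several measures side by side.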
Affiliation(s)
- Prajwal Devkota
- Dept. of Computer Science, Univ. of Miami, Coral Gables, FL, United States of America
- Matt C. Danzi
- The Miami Project to Cure Paralysis, Miller School of Medicine, University of Miami, Miami, FL, United States of America
- Center for Computational Science, Univ. of Miami, Coral Gables, FL, United States of America
- Stefan Wuchty
- Dept. of Computer Science, Univ. of Miami, Coral Gables, FL, United States of America
- Center for Computational Science, Univ. of Miami, Coral Gables, FL, United States of America
- Dept. of Biology, Univ. of Miami, Coral Gables, FL, United States of America
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, United States of America
|
30
|
Valleron AJ. Data Science Priorities for a University Hospital-Based Institute of Infectious Diseases: A Viewpoint. Clin Infect Dis 2018; 65:S84-S88. [PMID: 28859346 DOI: 10.1093/cid/cix351] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
Automation of laboratory tests, bioinformatic analysis of biological sequences, and professional data management are used routinely in a modern university hospital-based infectious diseases institute. This dates back to at least the 1980s. However, the scientific methods of this 21st century are changing with the increased power and speed of computers, with the "big data" revolution having already happened in genomics and environment, and eventually arriving in medical informatics. The research will be increasingly "data driven," and the powerful machine learning methods whose efficiency is demonstrated in daily life will also revolutionize medical research. A university-based institute of infectious diseases must therefore not only gather excellent computer scientists and statisticians (as in the past, and as in any medical discipline), but also fully integrate the biologists and clinicians with these computer scientists, statisticians, and mathematical modelers having a broad culture in machine learning, knowledge representation, and knowledge discovery.
|
31
|
A Machine Learning Approach for Air Quality Prediction: Model Regularization and Optimization. BIG DATA AND COGNITIVE COMPUTING 2018. [DOI: 10.3390/bdcc2010005] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Indexed: 11/16/2022]
Abstract
In this paper, we tackle air quality forecasting by using machine learning approaches to predict the hourly concentration of air pollutants (e.g., ozone, particle matter (PM2.5) and sulfur dioxide). Machine learning, as one of the most popular techniques, is able to efficiently train a model on big data by using large-scale optimization algorithms. Although there exist some works applying machine learning to air quality prediction, most of the prior studies are restricted to several-year data and simply train standard regression models (linear or nonlinear) to predict the hourly air pollution concentration. In this work, we propose refined models to predict the hourly air pollution concentration on the basis of meteorological data of previous days by formulating the prediction over 24 h as a multi-task learning (MTL) problem. This enables us to select a good model with different regularization techniques. We propose a useful regularization by enforcing the prediction models of consecutive hours to be close to each other and compare it with several typical regularizations for MTL, including standard Frobenius norm regularization, nuclear norm regularization, and ℓ2,1-norm regularization. Our experiments have shown that the proposed parameter-reducing formulations and consecutive-hour-related regularizations achieve better performance than existing standard regression models and existing regularizations.
|
32
|
Yang S, Li H, He H, Zhou Y, Zhang Z. Critical assessment and performance improvement of plant–pathogen protein–protein interaction prediction methods. Brief Bioinform 2017; 20:274-287. [DOI: 10.1093/bib/bbx123] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 07/26/2017] [Indexed: 01/15/2023] Open
Affiliation(s)
- Shiping Yang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University
- Hong Li
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University
- Huaqin He
- College of Life Sciences, Fujian Agriculture and Forestry University
- Yuan Zhou
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University
|
33
|
Kshirsagar M, Murugesan K, Carbonell JG, Klein-Seetharaman J. Multitask Matrix Completion for Learning Protein Interactions Across Diseases. J Comput Biol 2017; 24:501-514. [PMID: 28128642 DOI: 10.1089/cmb.2016.0201] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/29/2022] Open
Abstract
Disease-causing pathogens such as viruses introduce their proteins into the host cells in which they interact with the host's proteins, enabling the virus to replicate inside the host. These interactions between pathogen and host proteins are key to understanding infectious diseases. Often multiple diseases involve phylogenetically related or biologically similar pathogens. Here we present a multitask learning method to jointly model interactions between human proteins and three different but related viruses: Hepatitis C, Ebola virus, and Influenza A. Our multitask matrix completion-based model uses a shared low-rank structure in addition to a task-specific sparse structure to incorporate the various interactions. We obtain between 7 and 39 percentage points improvement in predictive performance over prior state-of-the-art models. We show how our model's parameters can be interpreted to reveal both general and specific interaction-relevant characteristics of the viruses. Our code is available online.
Affiliation(s)
- Keerthiram Murugesan
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Jaime G Carbonell
- Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Judith Klein-Seetharaman
- Metabolic & Vascular Health, Warwick Medical School, University of Warwick, Coventry, United Kingdom
|
34
|
Durmuş S, Ülgen KÖ. Comparative interactomics for virus-human protein-protein interactions: DNA viruses versus RNA viruses. FEBS Open Bio 2017; 7:96-107. [PMID: 28097092 PMCID: PMC5221455 DOI: 10.1002/2211-5463.12167] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 08/25/2016] [Revised: 11/06/2016] [Accepted: 11/16/2016] [Indexed: 01/01/2023] Open
Abstract
Viruses are obligatory intracellular pathogens and completely depend on their hosts for survival and reproduction. The strategies adopted by viruses to exploit host cell processes and to evade host immune systems during infections may differ largely with the type of the viral genetic material. An improved understanding of these viral infection mechanisms is only possible through a better understanding of the pathogen-host interactions (PHIs) that enable viruses to enter into the host cells and manipulate the cellular mechanisms to their own advantage. Experimentally-verified protein-protein interaction (PPI) data of pathogen-host systems only became available at large scale within the last decade. In this study, we comparatively analyzed the current PHI networks belonging to DNA and RNA viruses and their human host, to get insights into the infection strategies used by these viral groups. We investigated the functional properties of human proteins in the PHI networks, to observe and compare the attack strategies of DNA and RNA viruses. We observed that DNA viruses are able to attack both human cellular and metabolic processes simultaneously during infections. On the other hand, RNA viruses preferentially interact with human proteins functioning in specific cellular processes as well as in intracellular transport and localization within the cell. Observing virus-targeted human proteins, we propose heterogeneous nuclear ribonucleoproteins and transporter proteins as potential antiviral therapeutic targets. The observed common and specific infection mechanisms in terms of viral strategies to attack human proteins may provide crucial information for further design of broad and specific next-generation antiviral therapeutics.
Affiliation(s)
- Saliha Durmuş
- Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
- Kutlu Ö. Ülgen
- Department of Chemical Engineering, Boğaziçi University, İstanbul, Turkey
|
35
|
Jindalertudomdee J, Hayashida M, Song J, Akutsu T. Host-Pathogen Protein Interaction Prediction Based on Local Topology Structures of a Protein Interaction Network. 2016 IEEE 16TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE) 2016:7-12. [DOI: 10.1109/bibe.2016.26] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2025]
|
36
|
Mei S, Zhang K. Computational discovery of Epstein-Barr virus targeted human genes and signalling pathways. Sci Rep 2016; 6:30612. [PMID: 27470517 PMCID: PMC4965740 DOI: 10.1038/srep30612] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/07/2016] [Accepted: 07/05/2016] [Indexed: 12/22/2022] Open
Abstract
Epstein-Barr virus (EBV) plays important roles in the origin and the progression of human carcinomas, e.g. diffuse large B cell tumors, T cell lymphomas, etc. Discovering EBV-targeted human genes and signaling pathways is vital to understanding EBV tumorigenesis. In this study we propose a noise-tolerant homolog knowledge transfer method to reconstruct functional protein-protein interaction (PPI) networks between Epstein-Barr virus and Homo sapiens. The training set is augmented via homolog instances and the homolog noise is counteracted by a support vector machine (SVM). Additionally, we propose two methods to define subcellular co-localization (i.e. stringent and relaxed), from which physical PPI networks are further derived. Computational results show that the proposed method achieves sound performance in cross-validation and independent tests. In the space of 648,672 EBV-human protein pairs, we obtain 51,485 functional interactions (7.94%), 869 stringent physical PPIs and 46,050 relaxed physical PPIs. Fifty-eight pieces of evidence are found in the latest database and recent literature to validate the model. This study reveals that Epstein-Barr virus interferes with normal human cell life, such as cholesterol homeostasis, blood coagulation, EGFR binding, p53 binding, Notch signaling, Hedgehog signaling, etc. The proteome-wide predictions are provided in the supplementary file for further biomedical research.
Affiliation(s)
- Suyu Mei
- Software College, Shenyang Normal University, Shenyang, 110034, China
- Kun Zhang
- Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
|
37
|
Sen R, Nayak L, De RK. A review on host-pathogen interactions: classification and prediction. Eur J Clin Microbiol Infect Dis 2016; 35:1581-99. [PMID: 27470504 DOI: 10.1007/s10096-016-2716-7] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 04/27/2016] [Accepted: 06/22/2016] [Indexed: 01/01/2023]
Abstract
The research on host-pathogen interactions is an ever-emerging and evolving field. Every other day a new pathogen is discovered, and with it comes the challenge of its prevention and cure. As prevention is better than cure, understanding the mechanisms of host-pathogen interactions takes priority. Many mechanisms are involved on both the pathogen and the host sides when an interaction happens. It is a head-to-head fight between the opposing genes and proteins of both sides. Whether a host gets an infection or not depends on who wins. Moreover, a higher level of complexity arises when the pathogens evolve and become resistant to a host's defense mechanisms. Such pathogens pose serious challenges for treatment. The entire human population is in danger of such long-lasting persistent infections. Some of these infections even increase the rate of mortality. Hence there is an urgent need to understand how the pathogens interact with their host for successful invasion. This may lead to the discovery of appropriate preventive measures, and the development of rational therapeutic measures and medication against such infections and diseases. This review, an up-to-date overview of host-pathogen interaction research, was written with this urgency in mind. It covers the biological and computational aspects of host-pathogen interactions, classification of the methods by which the pathogens interact with their hosts, different machine learning techniques for prediction of host-pathogen interactions, and the future scope of this research field.
Affiliation(s)
- R Sen
- Machine Intelligence Unit, Indian Statistical Institute, 203, Barrackpore Trunk Road, Kolkata, 700108, India
- L Nayak
- Machine Intelligence Unit, Indian Statistical Institute, 203, Barrackpore Trunk Road, Kolkata, 700108, India
- R K De
- Machine Intelligence Unit, Indian Statistical Institute, 203, Barrackpore Trunk Road, Kolkata, 700108, India
|
38
|
Chen H, Shen J, Wang L, Song J. Towards Data Analytics of Pathogen-Host Protein-Protein Interaction: A Survey. 2016 IEEE INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS) 2016:377-388. [DOI: 10.1109/bigdatacongress.2016.60] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2025]
|
39
|
Abbasi WA, Minhas FUAA. Issues in performance evaluation for host-pathogen protein interaction prediction. J Bioinform Comput Biol 2016; 14:1650011. [PMID: 26932275 DOI: 10.1142/s0219720016500116] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 12/18/2022]
Abstract
The study of interactions between host and pathogen proteins is important for understanding the underlying mechanisms of infectious diseases and for developing novel therapeutic solutions. Wet-lab techniques for detecting protein-protein interactions (PPIs) can benefit from computational predictions. Machine learning is one of the computational approaches that can assist biologists by predicting promising PPIs. A number of machine learning based methods for predicting host-pathogen interactions (HPI) have been proposed in the literature. The techniques used for assessing the accuracy of such predictors are of critical importance in this domain. In this paper, we question the effectiveness of K-fold cross-validation for estimating the generalization ability of HPI prediction for proteins with no known interactions. K-fold cross-validation does not model this scenario, and we demonstrate a sizable difference between its performance and the performance of an alternative evaluation scheme called leave one pathogen protein out (LOPO) cross-validation. LOPO is more effective in modeling the real world use of HPI predictors, specifically for cases in which no information about the interacting partners of a pathogen protein is available during training. We also point out that currently used metrics such as areas under the precision-recall or receiver operating characteristic curves are not intuitive to biologists and propose simpler and more directly interpretable metrics for this purpose.
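The difference between the two evaluation schemes described in this abstract can be sketched in a few lines of plain Python. The interaction pairs, the protein names (vA, vB, vC, h1 to h6), and the helper functions below are hypothetical and written only for illustration; they are not the authors' code. The sketch counts how many test folds contain a pathogen protein that also appears in the training data, which is the situation LOPO is designed to rule out.

```python
from collections import defaultdict


def k_fold_splits(pairs, k):
    """Plain K-fold: pairs go to folds round-robin, ignoring protein identity."""
    for fold in range(k):
        test = [p for i, p in enumerate(pairs) if i % k == fold]
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        yield train, test


def lopo_splits(pairs):
    """Leave-one-pathogen-protein-out: each test fold holds every pair
    of exactly one pathogen protein, and none of them is in training."""
    by_pathogen = defaultdict(list)
    for pathogen, host in pairs:
        by_pathogen[pathogen].append((pathogen, host))
    for held_out in sorted(by_pathogen):
        train = [p for p in pairs if p[0] != held_out]
        yield held_out, train, by_pathogen[held_out]


def leaks(train, test):
    """True if a pathogen protein in the test fold was also seen in training."""
    train_pathogens = {pathogen for pathogen, _ in train}
    return any(pathogen in train_pathogens for pathogen, _ in test)


# Hypothetical (pathogen protein, host protein) interaction pairs.
pairs = [
    ("vA", "h1"), ("vA", "h2"), ("vA", "h3"),
    ("vB", "h1"), ("vB", "h4"),
    ("vC", "h2"), ("vC", "h5"), ("vC", "h6"),
]

kfold_leaky = sum(leaks(train, test) for train, test in k_fold_splits(pairs, 4))
lopo_leaky = sum(leaks(train, test) for _, train, test in lopo_splits(pairs))
print("K-fold folds with a seen pathogen protein:", kfold_leaky)
print("LOPO folds with a seen pathogen protein:", lopo_leaky)
```

On this toy data every K-fold test fold shares a pathogen protein with its training set, while no LOPO fold does, so only LOPO mimics prediction for a pathogen protein with no known partners.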
Affiliation(s)
- Wajid Arshad Abbasi
- Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad, Pakistan
- Fayyaz Ul Amir Afsar Minhas
- Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences (PIEAS), Nilore, Islamabad, Pakistan
|
40
|
Proactive Transfer Learning for Heterogeneous Feature and Label Spaces. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES 2016. [DOI: 10.1007/978-3-319-46227-1_44] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/07/2023]
|
41
|
Rai AN, Epperson WB, Nanduri B. Application of Functional Genomics for Bovine Respiratory Disease Diagnostics. Bioinform Biol Insights 2015; 9:13-23. [PMID: 26526746 PMCID: PMC4620937 DOI: 10.4137/bbi.s30525] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/09/2015] [Revised: 08/24/2015] [Accepted: 08/26/2015] [Indexed: 12/27/2022] Open
Abstract
Bovine respiratory disease (BRD) is the most common economically important disease affecting cattle. For developing accurate diagnostics that can predict disease susceptibility/resistance and stratification, it is necessary to identify the molecular mechanisms that underlie BRD. To study the complex interactions among the bovine host and the multitude of viral and bacterial pathogens, as well as the environmental factors associated with BRD etiology, genome-scale high-throughput functional genomics methods such as microarrays, RNA-seq, and proteomics are helpful. In this review, we summarize the progress made in our understanding of BRD using functional genomics approaches. We also discuss some of the available bioinformatics resources for analyzing high-throughput data, in the context of biological pathways and molecular interactions. Although resources for studying host response to infection are available, the corresponding information is lacking for the majority of BRD pathogens, impeding progress in identifying diagnostic signatures for BRD using functional genomics approaches.
Affiliation(s)
- Aswathy N Rai
- Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, MS, USA
- William B Epperson
- Department of Pathobiology and Population Medicine, College of Veterinary Medicine, Mississippi State University, MS, USA
- Bindu Nanduri
- Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, MS, USA
- Institute for Genomics, Biocomputing, and Biotechnology, Mississippi State University, MS, USA
|
42
|
Memišević V, Zavaljevski N, Rajagopala SV, Kwon K, Pieper R, DeShazer D, Reifman J, Wallqvist A. Mining host-pathogen protein interactions to characterize Burkholderia mallei infectivity mechanisms. PLoS Comput Biol 2015; 11:e1004088. [PMID: 25738731 PMCID: PMC4349708 DOI: 10.1371/journal.pcbi.1004088] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 08/14/2014] [Accepted: 12/15/2014] [Indexed: 01/01/2023] Open
Abstract
Burkholderia pathogenicity relies on protein virulence factors to control and promote bacterial internalization, survival, and replication within eukaryotic host cells. We recently used yeast two-hybrid (Y2H) screening to identify a small set of novel Burkholderia proteins that were shown to attenuate disease progression in an aerosol infection animal model using the virulent Burkholderia mallei ATCC 23344 strain. Here, we performed an extended analysis of primarily nine B. mallei virulence factors and their interactions with human proteins to map out how the bacteria can influence and alter host processes and pathways. Specifically, we employed topological analyses to assess the connectivity patterns of targeted host proteins, identify modules of pathogen-interacting host proteins linked to processes promoting infectivity, and evaluate the effect of crosstalk among the identified host protein modules. Overall, our analysis showed that the targeted host proteins generally had a large number of interacting partners and interacted with other host proteins that were also targeted by B. mallei proteins. We also introduced a novel Host-Pathogen Interaction Alignment (HPIA) algorithm and used it to explore similarities between host-pathogen interactions of B. mallei, Yersinia pestis, and Salmonella enterica. We inferred putative roles of B. mallei proteins based on the roles of their aligned Y. pestis and S. enterica partners and showed that up to 73% of the predicted roles matched existing annotations. A key insight into Burkholderia pathogenicity derived from these analyses of Y2H host-pathogen interactions is the identification of eukaryotic-specific targeted cellular mechanisms, including the ubiquitination degradation system and the use of the focal adhesion pathway as a fulcrum for transmitting mechanical forces and regulatory signals. This provides the mechanisms to modulate and adapt the host-cell environment for the successful establishment of host infections and intracellular spread.
Burkholderia species need to manipulate many host processes and pathways in order to establish a successful intracellular infection in eukaryotic host organisms. Burkholderia mallei uses secreted virulence factor proteins as a means to execute host-pathogen interactions and promote pathogenesis. While validated virulence factor proteins have been shown to attenuate infection in animal models, their actual roles in modifying and influencing host processes are not well understood. Here, we used host-pathogen protein-protein interactions derived from yeast two-hybrid screens to study nine known B. mallei virulence factors and map out potential virulence mechanisms. From the data, we derived both general and specific insights into Burkholderia host-pathogen infectivity pathways. We showed that B. mallei virulence factors tended to target multifunctional host proteins, proteins that interacted with each other, and host proteins with a large number of interacting partners. We also identified similarities between host-pathogen interactions of B. mallei, Yersinia pestis, and Salmonella enterica using a novel host-pathogen interactions alignment algorithm. Importantly, our data are compatible with a framework in which multiple B. mallei virulence factors broadly influence key host processes related to ubiquitin-mediated proteolysis and focal adhesion. This provides B. mallei the means to modulate and adapt the host-cell environment to advance infection.
Affiliation(s)
- Vesna Memišević
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, United States of America
- Nela Zavaljevski
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, United States of America
- Keehwan Kwon
- J. Craig Venter Institute, Rockville, Maryland, United States of America
- Rembert Pieper
- J. Craig Venter Institute, Rockville, Maryland, United States of America
- David DeShazer
- Bacteriology Division, U.S. Army Medical Research Institute of Infectious Diseases, Fort Detrick, Maryland, United States of America
- Jaques Reifman
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, United States of America
- Anders Wallqvist
- Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, Maryland, United States of America
|
43
|
Nourani E, Khunjush F, Durmuş S. Computational approaches for prediction of pathogen-host protein-protein interactions. Front Microbiol 2015; 6:94. [PMID: 25759684 PMCID: PMC4338785 DOI: 10.3389/fmicb.2015.00094] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 12/01/2014] [Accepted: 01/26/2015] [Indexed: 12/25/2022] Open
Abstract
Infectious diseases are still among the major and prevalent health problems, mostly because of the drug resistance of novel variants of pathogens. Molecular interactions between pathogens and their hosts are the key parts of the infection mechanisms. Novel antimicrobial therapeutics to fight drug resistance are only possible given a thorough understanding of pathogen-host interaction (PHI) systems. Existing databases, which contain experimentally verified PHI data, suffer from a scarcity of reported interactions due to the technically challenging and time-consuming process of experiments. This has motivated many researchers to address the problem by proposing computational approaches for analysis and prediction of PHIs. The computational methods primarily utilize sequence information, protein structure and known interactions. Classic machine learning techniques are used when there are sufficient known interactions to be used as training data. In the opposite case, transfer and multitask learning methods are preferred. Here, we present an overview of these computational approaches for predicting PHI systems, discussing their weaknesses and abilities, with future directions.
Affiliation(s)
- Esmaeil Nourani
- Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
- Farshad Khunjush
- Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
- School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Saliha Durmuş
- Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
|
44
|
Mei S, Zhu H. A novel one-class SVM based negative data sampling method for reconstructing proteome-wide HTLV-human protein interaction networks. Sci Rep 2015; 5:8034. [PMID: 25620466 PMCID: PMC5379509 DOI: 10.1038/srep08034] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 09/02/2014] [Accepted: 12/22/2014] [Indexed: 11/09/2022] Open
Abstract
Protein-protein interaction (PPI) prediction is generally treated as a problem of binary classification wherein negative data sampling is still an open problem to be addressed. The commonly used random sampling is prone to yield less representative negative data with considerable false negatives. Meanwhile rational constraints are seldom exerted on model selection to reduce the risk of false positive predictions for most of the existing computational methods. In this work, we propose a novel negative data sampling method based on one-class SVM (support vector machine, SVM) to predict proteome-wide protein interactions between HTLV retrovirus and Homo sapiens, wherein one-class SVM is used to choose reliable and representative negative data, and two-class SVM is used to yield proteome-wide outcomes as predictive feedback for rational model selection. Computational results suggest that one-class SVM is more suited to be used as negative data sampling method than two-class PPI predictor, and the predictive feedback constrained model selection helps to yield a rational predictive model that reduces the risk of false positive predictions. Some predictions have been validated by the recent literature. Lastly, gene ontology based clustering of the predicted PPI networks is conducted to provide valuable cues for the pathogenesis of HTLV retrovirus.
Affiliation(s)
- Suyu Mei
- [1] Software College, Shenyang Normal University, Shenyang, 110034, China; [2] Bioinformatics Section, School of Biomedical Sciences, Southern Medical University, Guangzhou, 510515, China
- Hao Zhu
- Bioinformatics Section, School of Biomedical Sciences, Southern Medical University, Guangzhou, 510515, China
45
Jain S, Gitter A, Bar-Joseph Z. Multitask learning of signaling and regulatory networks with application to studying human response to flu. PLoS Comput Biol 2014; 10:e1003943. [PMID: 25522349 PMCID: PMC4270428 DOI: 10.1371/journal.pcbi.1003943] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 06/03/2014] [Accepted: 09/28/2014] [Indexed: 01/04/2023]
Abstract
Reconstructing regulatory and signaling response networks is one of the major goals of systems biology. While several successful methods have been suggested for this task, some integrating large and diverse datasets, these methods have so far been applied to reconstruct a single response network at a time, even when studying and modeling related conditions. To improve network reconstruction we developed MT-SDREM, a multi-task learning method which jointly models networks for several related conditions. In MT-SDREM, parameters are jointly constrained across the networks while still allowing for condition-specific pathways and regulation. We formulate the multi-task learning problem and discuss methods for optimizing the joint target function. We applied MT-SDREM to reconstruct dynamic human response networks for three flu strains: H1N1, H5N1 and H3N2. Our multi-task learning method was able to identify known and novel factors and genes, improving upon prior methods that model each condition independently. The MT-SDREM networks were also better at identifying proteins whose removal affects viral load, indicating that joint learning can still lead to accurate, condition-specific networks. Supporting website with MT-SDREM implementation: http://sb.cs.cmu.edu/mtsdrem
To understand why some flu strains are more virulent than others, researchers attempt to profile and model the molecular human response to these strains and identify similarities and differences between the resulting models. So far, the modeling and analysis part has been done independently for each strain and the results contrasted in a post-processing step. Here we present a new method, termed MT-SDREM, that simultaneously models the response to all strains, allowing us to identify both the core response elements that are shared among the strains and the factors that are uniquely activated or repressed by individual strains. We applied this method to study the human response to three flu strains: H1N1, H3N2 and H5N1. As we show, the method was able to correctly identify several common and known factors regulating immune response to such strains and also identified unique factors for each of the strains. The models reconstructed by the simultaneous analysis method improved upon those generated by methods that model each strain response separately. Our joint models can be used to identify strain-specific treatments as well as treatments that are likely to be effective against all three strains.
Affiliation(s)
- Siddhartha Jain
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Anthony Gitter
- Microsoft Research, Cambridge, Massachusetts, United States of America
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Ziv Bar-Joseph
- Lane Center for Computational Biology and Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
46
AdaBoost based multi-instance transfer learning for predicting proteome-wide interactions between Salmonella and human proteins. PLoS One 2014; 9:e110488. [PMID: 25330226 PMCID: PMC4212833 DOI: 10.1371/journal.pone.0110488] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 06/01/2014] [Accepted: 09/19/2014] [Indexed: 11/23/2022]
Abstract
Pathogen-host protein-protein interaction (PPI) plays an important role in revealing the underlying pathogenesis of viruses and bacteria. The need to rapidly map the proteome-wide pathogen-host interactome opens avenues for and imposes burdens on computational modeling. For Salmonella typhimurium, only 62 interactions with human proteins are reported to date, and computational modeling based on such a small training set is prone to overfitting. In this work, we propose a multi-instance transfer learning method to reconstruct the proteome-wide Salmonella-human PPI networks, wherein the training data is augmented by homolog knowledge transfer in the form of independent homolog instances. We use AdaBoost instance reweighting to counteract the noise from homolog instances, and deliberately design three experimental settings to validate the assumption that the homolog instances are effective in addressing the problems of data scarcity and data unavailability. The experimental results show that the proposed method outperforms the existing models and some predictions are validated by the findings from recent literature. Lastly, we conduct gene ontology based clustering analysis of the predicted networks to provide insights into the pathogenesis of Salmonella.
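The "AdaBoost instance reweighting" named in this abstract can be illustrated with one round of the standard discrete AdaBoost update. This is a minimal sketch, not the authors' scheme: the abstract does not say how homolog instances are weighted relative to target instances, and the function name `adaboost_reweight` and the toy labels are hypothetical.

```python
import math

def adaboost_reweight(weights, y_true, y_pred):
    """One round of standard discrete AdaBoost (labels are -1 or +1).

    Returns the weak learner's vote alpha and the renormalised instance
    weights. Illustrative only; not the paper's exact procedure.
    """
    total = sum(weights)
    # Weighted error of the current weak learner.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    # Clamp to avoid log(0) or division by zero for perfect/useless learners.
    err = min(max(err, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is +1 for a correct prediction and -1 for a wrong one, so
    # correct instances are scaled down and wrong ones scaled up.
    new = [w * math.exp(-alpha * t * p)
           for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(new)
    return alpha, [w / z for w in new]

# Toy example (hypothetical data): four instances, the last one misclassified.
alpha, new_weights = adaboost_reweight(
    [0.25, 0.25, 0.25, 0.25], [1, 1, -1, -1], [1, 1, -1, 1]
)
```

After the update the single misclassified instance carries half of the total weight, so the next weak learner concentrates on it. Transfer-learning variants may instead shrink the weight of auxiliary instances that are repeatedly misclassified; which rule applies here is not stated in the abstract.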
47
Abstract
MOTIVATION: Most existing methods for predicting causal disease genes rely on a specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies; for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies; for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. RESULTS: Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has a close to one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best) that has a <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes that are previously not linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature. AVAILABILITY: Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease.
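The inductive property described in this abstract follows from the bilinear form that inductive matrix completion uses for a score: gene features x and disease features y are mapped through learned low-rank factors W and H, giving score = x^T W H^T y. The sketch below only evaluates that form; the factor values, feature vectors, and the name `imc_score` are hypothetical, and learning W and H from observed associations is not shown.

```python
def imc_score(x, W, H, y):
    """Score a (gene, disease) pair as x^T W H^T y.

    x: gene feature vector (length f_g); W: f_g x k matrix.
    y: disease feature vector (length f_d); H: f_d x k matrix.
    """
    k = len(W[0])
    # Project each feature vector into the shared k-dimensional latent space.
    u = [sum(W[r][c] * x[r] for r in range(len(x))) for c in range(k)]
    v = [sum(H[r][c] * y[r] for r in range(len(y))) for c in range(k)]
    return sum(a * b for a, b in zip(u, v))

# Hypothetical "learned" factors: 3 gene features, 2 disease features, rank 2.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H = [[1.0, 0.0], [0.0, 2.0]]

# A disease never seen in training still gets a score, because only its
# feature vector is needed, not a row or column of the association matrix.
gene_features = [1.0, 0.0, 2.0]
new_disease_features = [0.5, 1.0]
score = imc_score(gene_features, W, H, new_disease_features)
```

Ranking all genes for a query disease amounts to computing this score for every gene feature vector and sorting, which is how a top-100 list like the one evaluated in the abstract could be produced.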
Affiliation(s)
- Nagarajan Natarajan
- Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
- Inderjit S Dhillon
- Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
48
Mulder NJ, Akinola RO, Mazandu GK, Rapanoel H. Using biological networks to improve our understanding of infectious diseases. Comput Struct Biotechnol J 2014; 11:1-10. [PMID: 25379138 PMCID: PMC4212278 DOI: 10.1016/j.csbj.2014.08.006] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 11/17/2022]
Abstract
Infectious diseases are the leading cause of death, particularly in developing countries. Although many drugs are available for treating the most common infectious diseases, in many cases the mechanism of action of these drugs or even their targets in the pathogen remain unknown. In addition, the key factors or processes in pathogens that facilitate infection and disease progression are often not well understood. Since proteins do not work in isolation, understanding biological systems requires a better understanding of the interconnectivity between proteins in different pathways and processes, which includes both physical and other functional interactions. Such biological networks can be generated within organisms or between organisms sharing a common environment using experimental data and computational predictions. Though different data sources provide different levels of accuracy, confidence in interactions can be measured using interaction scores. Connections between interacting proteins in biological networks can be represented as graphs and edges, and thus studied using existing algorithms and tools from graph theory. There are many different applications of biological networks, and here we discuss three such applications, specifically applied to the infectious disease tuberculosis, with its causative agent Mycobacterium tuberculosis and host, Homo sapiens. The applications include the use of the networks for function prediction, comparison of networks for evolutionary studies, and the generation and use of host–pathogen interaction networks.
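The graph view described in this abstract, with proteins as nodes, interactions as edges, and an interaction score as a confidence measure, can be sketched with a plain adjacency map. The protein names, scores, threshold, and function names below are hypothetical and serve only to show the data structure.

```python
from collections import defaultdict

def build_network(scored_edges, min_score=0.0):
    """Build an undirected weighted graph as {node: {neighbour: score}},
    keeping only interactions whose score is at least min_score."""
    adj = defaultdict(dict)
    for a, b, score in scored_edges:
        if score >= min_score:
            adj[a][b] = score
            adj[b][a] = score
    return adj

def degrees(adj):
    """Number of retained interaction partners per protein."""
    return {node: len(neighbours) for node, neighbours in adj.items()}

# Hypothetical scored interactions within the host and across host/pathogen.
edges = [
    ("host_A", "pathogen_X", 0.90),
    ("host_A", "host_B", 0.40),
    ("host_B", "pathogen_X", 0.75),
]
network = build_network(edges, min_score=0.7)
partner_counts = degrees(network)
```

Raising `min_score` trades coverage for confidence: here the low-scoring host_A/host_B edge is dropped, leaving pathogen_X as the only node with two partners. Standard graph algorithms can then be run on the filtered structure.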
Affiliation(s)
- Nicola J Mulder
- Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Richard O Akinola
- Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Gaston K Mazandu
- Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
- Holifidy Rapanoel
- Computational Biology Group, Department of Clinical Laboratory Sciences, IDM, University of Cape Town Faculty of Health Sciences, Anzio Road, Observatory, Cape Town, South Africa
49
Mei S, Zhu H. Computational reconstruction of proteome-wide protein interaction networks between HTLV retroviruses and Homo sapiens. BMC Bioinformatics 2014; 15:245. [PMID: 25037487 PMCID: PMC4133621 DOI: 10.1186/1471-2105-15-245] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 02/08/2014] [Accepted: 07/14/2014] [Indexed: 11/15/2022]
Abstract
Background: Human T-cell leukemia viruses (HTLV) tend to induce some fatal human diseases like Adult T-cell Leukemia (ATL) by targeting human T lymphocytes. Identifying the protein-protein interactions (PPI) between HTLV viruses and Homo sapiens is one of the significant approaches to reveal the underlying mechanism of HTLV infection and host defence. At present, as biological experiments are labor-intensive and expensive, the identified part of the HTLV-human PPI networks is rather small. Although recent years have witnessed much progress in computational modeling for reconstructing pathogen-host PPI networks, data scarcity and data unavailability are two major challenges still to be effectively addressed. To our knowledge, no computational method for proteome-wide HTLV-human PPI network reconstruction has been reported. Results: In this work we develop a Multi-instance AdaBoost method to conduct homolog knowledge transfer for computationally reconstructing proteome-wide HTLV-human PPI networks. In this method, the homolog knowledge in the form of gene ontology (GO) is treated as an auxiliary homolog instance to address the problems of data scarcity and data unavailability, while the potential negative knowledge transfer is automatically attenuated by AdaBoost instance reweighting. The cross validation experiments show that the homolog knowledge transfer in the form of independent homolog instances can effectively enrich the feature information and substitute for the missing GO information. Moreover, the independent tests show that the method can validate 70.3% of the recently curated interactions, significantly exceeding the 2.1% recognition rate of the HT-Y2H experiment. We have used the method to reconstruct the proteome-wide HTLV-human PPI networks and further conducted gene ontology based clustering of the predicted networks for further biomedical research. The gene ontology based clustering analysis of the predictions provides much biological insight into the pathogenesis of HTLV retroviruses. Conclusions: The Multi-instance AdaBoost method can effectively address the problems of data scarcity and data unavailability for proteome-wide HTLV-human PPI network reconstruction. The gene ontology based clustering analysis of the predictions reveals some important signaling pathways and biological modules that HTLV retroviruses are likely to target. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-245) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Suyu Mei
- Bioinformatics Section, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China.
50
Amaya M, Baer A, Voss K, Campbell C, Mueller C, Bailey C, Kehn-Hall K, Petricoin E, Narayanan A. Proteomic strategies for the discovery of novel diagnostic and therapeutic targets for infectious diseases. Pathog Dis 2014; 71:177-89. [PMID: 24488789 PMCID: PMC7108530 DOI: 10.1111/2049-632x.12150] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 10/04/2013] [Revised: 01/18/2014] [Accepted: 01/23/2014] [Indexed: 12/14/2022]
Abstract
Viruses have developed numerous and elegant strategies to manipulate the host cell machinery to establish a productive infectious cycle. The interaction of viral proteins with host proteins plays an important role in infection and pathogenesis, often bypassing traditional host defenses such as the interferon response and apoptosis. Host–viral protein interactions can be studied using a variety of proteomic approaches ranging from genetic and biochemical to large‐scale high‐throughput technologies. Protein interactions between host and viral proteins are greatly influenced by host signal transduction pathways. In this review, we will focus on comparing proteomic information obtained through differing technologies and how their integration can be used to determine the functional aspect of the host response to infection. We will briefly review and evaluate techniques employed to elucidate viral–host interactions with a primary focus on Protein Microarrays (PMA) and Mass Spectrometry (MS) as potential tools in the discovery of novel therapeutic targets. As many potential molecular markers and targets are proteins, proteomic profiling is expected to yield both clearer and more direct answers to functional and pharmacologic questions.
Affiliation(s)
- Moushimi Amaya
- National Center for Biodefense and Infectious Diseases, George Mason University, Manassas, VA, USA