1
Sahragard R, Arabfard M, Ahmadi A, Najafi A. VHI-Pred: A Multi-Feature-Based Tool for Predicting Human-Virus Protein-Protein Interactions. Mol Biotechnol 2025:10.1007/s12033-025-01417-5. [PMID: 40186829 DOI: 10.1007/s12033-025-01417-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2024] [Accepted: 03/05/2025] [Indexed: 04/07/2025]
Abstract
Viral diseases pose a significant threat to public health, highlighting the importance of understanding protein-protein interactions between hosts and viruses for therapeutic development. However, this process is often expensive and time-consuming, especially given the rapid evolution of viruses. Machine learning algorithms and artificial intelligence have emerged as powerful tools for efficiently identifying these interactions. This study aims to develop a machine learning-based model to predict protein interactions between viral pathogens and human hosts while analyzing the factors influencing these interactions. The prediction model was constructed using three machine learning algorithms: Random Forest (RF), XGBoost (XGB), and Artificial Neural Networks (ANN). Each algorithm underwent rigorous testing. The modeling features included physicochemical properties, motifs, and amino acid sequences. Model performance was evaluated using fitness, accuracy, precision, sensitivity, and specificity metrics, with validation conducted via the K-fold method. The accuracy of the RF, XGB, and ANN models was 87%, 86%, and 86%, respectively. By integrating dimensionality reduction and clustering techniques, the accuracy of the RF model improved to 90%. Traditionally, studying host-pathogen interactions is labor-intensive and costly. The integration of machine learning algorithms into this field significantly enhances the efficiency of analyzing viral pathogen-human host interactions. This study demonstrates the effectiveness of such an approach and provides valuable insights for future research. The results are accessible to researchers through a web application at http://vhi.sysbiomed.ir.
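As a rough illustration of the evaluation described in this abstract, the sketch below shows how K-fold index splits and the reported metrics (accuracy, precision, sensitivity, specificity) can be computed for a binary interacting/non-interacting classifier. It is not the VHI-Pred code: the function names `k_fold_indices` and `binary_metrics`, the contiguous unshuffled folds, and the toy labels are assumptions made purely for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous (train, test) index folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, sensitivity and specificity (1 = interacting)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy labels (hypothetical): 3 TP, 3 TN, 1 FP, 1 FN, so every metric is 0.75.
toy_metrics = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0, 1, 0])
```

In practice the per-fold metrics would be averaged over all K test folds; shuffling or stratifying the folds is a common refinement that this sketch leaves out.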
Affiliation(s)
- Rasool Sahragard
- Molecular Biology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Masoud Arabfard
- Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Ali Ahmadi
- Molecular Biology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Ali Najafi
- Molecular Biology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran.
2
Ying H, Wu X, Jia X, Yang Q, Liu H, Zhao H, Chen Z, Xu M, Wang T, Li M, Zhao Z, Zheng R, Wang S, Lin H, Xu Y, Lu J, Wang W, Ning G, Zheng J, Bi Y. Single-cell transcriptome-wide Mendelian randomization and colocalization reveals immune-mediated regulatory mechanisms and drug targets for COVID-19. EBioMedicine 2025; 113:105596. [PMID: 39933264 PMCID: PMC11867302 DOI: 10.1016/j.ebiom.2025.105596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 01/24/2025] [Accepted: 01/27/2025] [Indexed: 02/13/2025] Open
Abstract
BACKGROUND: COVID-19 continues to show long-term impacts on our health. Few effective immune-mediated antiviral drugs have been launched. METHODS: We conducted a Mendelian randomization (MR) and colocalization analysis using 26,597 single-cell expression quantitative trait loci (sc-eQTL) to proxy the effects of expression of 16,597 genes in 14 peripheral blood immune cell types and tested them against four COVID-19 outcomes from the COVID-19 Host Genetics Initiative GWAS meta-analysis Round 7. We also carried out additional validations including colocalization, linkage disequilibrium checks, and host-pathogen interactome predictions. We integrated MR findings with clinical trial evidence from several drug-gene databases to identify drugs with repurposing potential. Finally, we developed a tier system and identified immune-cell-based prioritized drug targets for COVID-19. FINDINGS: We identified 132 putative causal genes in 14 immune cell types (343 MR associations) for COVID-19, with 58 genes that were not reported previously. 145 (73%) gene-COVID-19 pairs showed effects on COVID-19 in only one immune cell type, which implied widespread immune-cell-specific effects. For pathway analyses, we found the putative causal genes were enriched in natural killer (NK) recruiting cells but de-enriched in NK cells. Using a deep learning model, we found 107 (81%) of the putative causal genes (41 novel genes) were predicted to interact with SARS-CoV-2 proteins. Integrating the above evidence with drug trial information, we developed a tier system and prioritized 37 drug targets for COVID-19. INTERPRETATION: Our study showcased the central role of immune-mediated regulatory mechanisms for COVID-19 and prioritized drug targets that might inform interventions for viral infectious diseases. FUNDING: This work was supported by grants from the National Key Research and Development Program of China (2022YFC2505203).
Affiliation(s)
- Hui Ying
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xueyan Wu
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xiaojing Jia
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Qianqian Yang
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Haoyu Liu
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Huiling Zhao
- MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK
- Zhihe Chen
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Min Xu
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Tiange Wang
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Mian Li
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhiyun Zhao
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Ruizhi Zheng
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shuangyuan Wang
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Hong Lin
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yu Xu
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jieli Lu
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Weiqing Wang
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Guang Ning
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jie Zheng
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; MRC Integrative Epidemiology Unit (IEU), Bristol Medical School, University of Bristol, Oakfield House, Oakfield Grove, Bristol, BS8 2BN, UK.
- Yufang Bi
- Department of Endocrine and Metabolic Diseases, Shanghai Institute of Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Shanghai National Clinical Research Center for Endocrine and Metabolic Diseases, Key Laboratory for Endocrine and Metabolic Diseases of the National Health Commission of the PR China, Shanghai National Center for Translational Medicine, Shanghai Digital Medicine Innovation Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
3
Duhan N, Kaundal R. AtSubP-2.0: An integrated web server for the annotation of Arabidopsis proteome subcellular localization using deep learning. The Plant Genome 2025; 18:e20536. [PMID: 39924294 PMCID: PMC11807733 DOI: 10.1002/tpg2.20536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 09/22/2024] [Accepted: 10/17/2024] [Indexed: 02/11/2025]
Abstract
The organization of subcellular components in a cell is critical for its function and for studying cellular processes, protein-protein interactions, potential drug targets, network analysis, and other systems biology mechanisms. Determining protein localization experimentally is time-consuming and expensive because it requires meticulous experimentation, validation, and data analysis; computational methods provide a quick and accurate alternative. Arabidopsis thaliana, a beneficial model organism in plant biology, facilitates experimentation and applies to other plants. Predicting its proteins' subcellular localization can improve our understanding of cellular processes and have applications in crop improvement and biotechnology. We propose AtSubP-2.0, an extension of our previously developed and widely used AtSubP v1.0 tool for annotating the Arabidopsis proteome. For precise protein subcellular localization prediction, AtSubP-2.0 employs a four-phase strategy. The first phase differentiates between single and dual localization with high accuracy (97.66% in fivefold training/testing, 98.10% on independent data) and a high Matthews correlation coefficient (0.88 training, 0.90 independent). In the second phase, single-localized proteins are classified into 12 locations with an accuracy of 98.37% in fivefold training/testing and 97.43% on independent data, and a Matthews correlation coefficient of 0.94 (training) and 0.91 (independent). The third phase categorizes dual-location proteins into nine classes with an accuracy of 99.65% in fivefold training/testing and 98.16% on independent data, and a Matthews correlation coefficient of 0.92 (training) and 0.87 (independent). We also employed a fourth phase that classifies the membrane-type proteins predicted in phase I into single-pass and multi-pass membrane proteins with an accuracy of 98% in fivefold training/testing and 98.55% on independent data, and a high Matthews correlation coefficient (0.95 training, 0.97 independent). A web-based prediction server has been implemented for community use and is freely available at https://kaabil.net/AtSubP2/, including a standalone version. AtSubP-2.0 will help researchers to better understand organelle-specific functions, cellular processes, and regulatory mechanisms important for plant growth, development, and response to environmental stimuli.
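Because this abstract reports the Matthews correlation coefficient (MCC) alongside accuracy, a small sketch of the binary MCC may help readers interpret the two numbers together. This is only the standard two-class formula, not the authors' evaluation code; the multi-class phases in the abstract would need the multi-class generalization, and the function name `binary_mcc` and the confusion-matrix counts are hypothetical.

```python
import math


def binary_mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventionally reported as 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced example: accuracy is 95/100 = 0.95,
# yet the MCC is only about 0.66 because the minority class is small.
example_mcc = binary_mcc(tp=90, tn=5, fp=4, fn=1)
```

The example shows why the MCC is often reported next to accuracy: on imbalanced data it penalizes errors on the rare class that accuracy largely hides.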
Affiliation(s)
- Naveen Duhan
- Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, Utah, USA
- Department of Plants, Soils, and Climate, College of Agriculture and Applied Science, Utah State University, Logan, Utah, USA
- Rakesh Kaundal
- Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, Utah, USA
- Department of Plants, Soils, and Climate, College of Agriculture and Applied Science, Utah State University, Logan, Utah, USA
- Department of Computer Science, College of Science, Utah State University, Logan, Utah, USA
4
Chen H, Liu J, Tang G, Hao G, Yang G. Bioinformatic Resources for Exploring Human-virus Protein-protein Interactions Based on Binding Modes. Genomics, Proteomics & Bioinformatics 2024; 22:qzae075. [PMID: 39404802 PMCID: PMC11658832 DOI: 10.1093/gpbjnl/qzae075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Revised: 10/05/2024] [Accepted: 10/11/2024] [Indexed: 12/21/2024]
Abstract
Historically, there have been many outbreaks of viral diseases that have continued to claim millions of lives. Research on human-virus protein-protein interactions (PPIs) is vital to understanding the principles of human-virus relationships, providing an essential foundation for developing virus control strategies to combat diseases. The rapidly accumulating data on human-virus PPIs offer unprecedented opportunities for bioinformatics research around human-virus PPIs. However, available detailed analyses and summaries to help use these resources systematically and efficiently are lacking. Here, we comprehensively review the bioinformatic resources used in human-virus PPI research, and discuss and compare their functions, performance, and limitations. This review aims to provide researchers with a bioinformatic toolbox that will hopefully better facilitate the exploration of human-virus PPIs based on binding modes.
Affiliation(s)
- Huimin Chen
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
- Jiaxin Liu
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
- Gege Tang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
- Gefei Hao
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
- State Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Guangfu Yang
- State Key Laboratory of Green Pesticide, International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan 430079, China
5
Ahmad EM, Abdelsamad A, El-Shabrawi HM, El-Awady MAM, Aly MAM, El-Soda M. In-silico identification of putatively functional intergenic small open reading frames in the cucumber genome and their predicted response to biotic and abiotic stresses. Plant, Cell & Environment 2024; 47:5330-5342. [PMID: 39189930 DOI: 10.1111/pce.15104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Revised: 07/13/2024] [Accepted: 08/10/2024] [Indexed: 08/28/2024]
Abstract
The availability of high-throughput sequencing technologies has increased our understanding of different genomes. However, the genomes of all living organisms still have many unidentified coding sequences. The increased number of missing small open reading frames (sORFs) is due to the length threshold used in most gene identification tools, which is true in the genic and, more importantly and surprisingly, in the intergenic regions. Scanning the cucumber genome intergenic regions revealed 420 723 sORFs. We excluded 3850 sORFs with similarities to annotated cucumber proteins. To propose the functionality of the remaining 416 873 sORFs, we calculated their codon adaptation index (CAI). We found 398 937 novel sORFs (nsORFs) with CAI ≥ 0.7 that were further used for downstream analysis. Searching against the Rfam database revealed 109 nsORFs similar to multiple RNA families. Using SignalP-5.0 and NLS, we identified 11 592 signal peptides. Five predicted proteins interacting with Meloidogyne incognita and powdery mildew proteins were selected using published transcriptome data of host-pathogen interactions. Gene ontology enrichment interpreted the function of those proteins, illustrating that nsORFs' expression could contribute to the cucumber's response to biotic and abiotic stresses. This research highlights the importance of previously overlooked nsORFs in the cucumber genome and provides novel insights into their potential functions.
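The codon adaptation index (CAI) filter mentioned in this abstract can be sketched as follows. CAI is the geometric mean of per-codon relative adaptiveness weights; the function name `codon_adaptation_index`, the tiny weight table, and the example sequences are hypothetical, and a real analysis would derive the weights from a reference set of highly expressed genes of the organism rather than from these toy values. This is not the authors' pipeline.

```python
import math
from typing import Dict


def codon_adaptation_index(cds: str, weights: Dict[str, float]) -> float:
    """Geometric mean of relative adaptiveness weights over scorable codons.

    Codons missing from `weights` (here, for example, ATG) are skipped.
    All weights must be strictly positive.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("sequence contains no codons present in the weight table")
    return math.exp(sum(logs) / len(logs))


# Toy weight table (hypothetical values): 1.0 marks the preferred synonymous codon.
TOY_WEIGHTS = {"GCT": 1.0, "GCC": 0.5, "AAA": 1.0, "AAG": 0.25}

cai_mixed = codon_adaptation_index("ATGGCTGCCAAAAAG", TOY_WEIGHTS)  # about 0.59
cai_optimal = codon_adaptation_index("GCTAAA", TOY_WEIGHTS)         # exactly 1.0

# Applying the abstract's CAI >= 0.7 cutoff to the two toy sequences.
passes_cutoff = [name for name, value in
                 [("mixed", cai_mixed), ("optimal", cai_optimal)] if value >= 0.7]
```

With these toy weights only the sequence built from preferred codons passes the 0.7 cutoff, which mirrors how a CAI threshold favors sORFs whose codon usage resembles that of the reference genes.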
Affiliation(s)
- Esraa M Ahmad
- Department of Genetics, Faculty of Agriculture, Cairo University, Giza, Egypt
- Ahmed Abdelsamad
- Department of Genetics, Faculty of Agriculture, Cairo University, Giza, Egypt
- Hattem M El-Shabrawi
- Plant Biotechnology Department, Genetic Engineering & Biotechnology Division, National Research Center, Giza, Egypt
- Mohammed A M Aly
- Department of Genetics, Faculty of Agriculture, Cairo University, Giza, Egypt
- Mohamed El-Soda
- Department of Genetics, Faculty of Agriculture, Cairo University, Giza, Egypt
6
Tahir ul Qamar M, Noor F, Guo YX, Zhu XT, Chen LL. Deep-HPI-pred: An R-Shiny applet for network-based classification and prediction of Host-Pathogen protein-protein interactions. Comput Struct Biotechnol J 2024; 23:316-329. [PMID: 38192372 PMCID: PMC10772389 DOI: 10.1016/j.csbj.2023.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Revised: 12/11/2023] [Accepted: 12/12/2023] [Indexed: 01/10/2024] Open
Abstract
Host-pathogen interactions (HPIs) are vital in numerous biological activities and are intrinsically linked to the onset and progression of infectious diseases. HPIs are pivotal in the entire lifecycle of diseases: from the onset of pathogen introduction, navigating through the mechanisms that bypass host cellular defenses, to its subsequent proliferation inside the host. At the heart of these stages lies the synergy of proteins from both the host and the pathogen. By understanding these interlinking protein dynamics, we can gain crucial insights into how diseases progress and pave the way for stronger plant defenses and the swift formulation of countermeasures. In the framework of the current study, we developed a web-based R/Shiny app, Deep-HPI-pred, that uses a network-driven feature learning method to predict the as-yet unmapped interactions between pathogen and host proteins. Leveraging citrus and CLas bacteria training datasets as a case study, we spotlight the effectiveness of Deep-HPI-pred in discerning protein-protein interactions (PPIs) between them. Deep-HPI-pred uses Multilayer Perceptron (MLP) models for HPI prediction, based on a comprehensive evaluation of topological features and neural network architectures. When subjected to independent validation datasets, the models consistently surpassed a Matthews correlation coefficient (MCC) of 0.80 in host-pathogen interactions. Remarkably, the use of Eigenvector Centrality as the leading topological feature further enhanced this performance. Further, Deep-HPI-pred also offers relevant gene ontology (GO) term information for each pathogen and host protein within the system. This protein annotation data contributes an additional layer to our understanding of the intricate dynamics within host-pathogen interactions. In the additional benchmarking studies, the Deep-HPI-pred model has proven its robustness by consistently delivering reliable results across different host-pathogen systems, including plant-pathogen (accuracy of 98.4% and 97.9%), human-virus (accuracy of 94.3%), and animal-bacteria (accuracy of 96.6%) interactomes. These results not only demonstrate the model's versatility but also pave the way for gaining comprehensive insights into the molecular underpinnings of complex host-pathogen interactions. Taken together, the Deep-HPI-pred applet offers a unified web service for both identifying and illustrating interaction networks. The Deep-HPI-pred applet is freely accessible at its homepage: https://cbi.gxu.edu.cn/shiny-apps/Deep-HPI-pred/ and on GitHub: https://github.com/tahirulqamar/Deep-HPI-pred.
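To make the topological feature named in this abstract more concrete, the sketch below computes eigenvector centrality on a tiny undirected interaction network. It is written in Python for illustration only (the tool itself is an R/Shiny app) and is not taken from Deep-HPI-pred; the function name, the node labels, and the star-shaped toy network are hypothetical. The iteration adds each node's own score to the neighbor sum (a shifted power iteration), an assumption made here so that it also converges on bipartite host-pathogen graphs.

```python
import math
from typing import Dict, Hashable, Iterable, Set, Tuple


def eigenvector_centrality(
    edges: Iterable[Tuple[Hashable, Hashable]],
    iterations: int = 100,
    tol: float = 1e-9,
) -> Dict[Hashable, float]:
    """Eigenvector centrality of an undirected graph via shifted power iteration."""
    neighbors: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    score = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # (A + I) x: own score plus the scores of all neighbors.
        new = {n: score[n] + sum(score[m] for m in neighbors[n]) for n in neighbors}
        norm = math.sqrt(sum(x * x for x in new.values()))
        if norm == 0:
            return score  # empty graph
        new = {n: x / norm for n, x in new.items()}
        if max(abs(new[n] - score[n]) for n in neighbors) < tol:
            return new
        score = new
    return score


# Toy network (hypothetical): one host protein "H" bound by three pathogen proteins.
centrality = eigenvector_centrality([("H", "P1"), ("H", "P2"), ("H", "P3")])
most_central = max(centrality, key=centrality.get)  # the hub "H"
```

In a feature-learning setting, such per-protein scores would be one column of a feature table fed to the classifier; how the tool combines them with other features is not specified in the abstract.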
Affiliation(s)
- Muhammad Tahir ul Qamar
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
- Fatima Noor
- Integrative Omics and Molecular Modeling Laboratory, Department of Bioinformatics and Biotechnology, Government College University Faisalabad (GCUF), Faisalabad 38000, Pakistan
- Yi-Xiong Guo
- National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xi-Tong Zhu
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
- Ling-Ling Chen
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
7
Hu Y, Wang Y, Hu X, Chao H, Li S, Ni Q, Zhu Y, Hu Y, Zhao Z, Chen M. T4SEpp: A pipeline integrating protein language models to predict bacterial type IV secreted effectors. Comput Struct Biotechnol J 2024; 23:801-812. [PMID: 38328004 PMCID: PMC10847861 DOI: 10.1016/j.csbj.2024.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 01/20/2024] [Accepted: 01/20/2024] [Indexed: 02/09/2024] Open
Abstract
Many pathogenic bacteria use type IV secretion systems (T4SSs) to deliver effectors (T4SEs) into the cytoplasm of eukaryotic cells, causing diseases. The identification of effectors is a crucial step in understanding the mechanisms of bacterial pathogenicity, but this remains a major challenge. In this study, we used the full-length embedding features generated by six pre-trained protein language models to train classifiers predicting T4SEs and compared their performance. We integrated three modules into a model called T4SEpp. The first module searched for full-length homologs of known T4SEs, signal sequences, and effector domains; the second module fine-tuned a machine learning model using data for a signal sequence feature; and the third module used the three best-performing pre-trained protein language models. T4SEpp outperformed other state-of-the-art (SOTA) software tools, achieving ∼0.98 accuracy at a high specificity of ∼0.99, based on the assessment of an independent validation dataset. T4SEpp predicted 13 T4SEs from Helicobacter pylori, including the well-known CagA and 12 other potential ones, among which eleven could potentially interact with human proteins. This suggests that these potential T4SEs may be associated with the pathogenicity of H. pylori. Overall, T4SEpp provides a better solution to assist in the identification of bacterial T4SEs and facilitates studies of bacterial pathogenicity. T4SEpp is freely accessible at https://bis.zju.edu.cn/T4SEpp.
Affiliation(s)
- Yueming Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Yejun Wang
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
- Department of Cell Biology and Genetics, College of Basic Medicine, Shenzhen University Medical School, Shenzhen, China
| | - Xiaotian Hu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
| | - Haoyu Chao
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
| | - Sida Li
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
| | - Qinyang Ni
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
| | - Yanyan Zhu
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
| | - Yixue Hu
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
| | - Ziyi Zhao
- Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen, China
| | - Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Institute of Hematology, Zhejiang University School of Medicine, The First Affiliated Hospital, Zhejiang University, Hangzhou 310058, China
| |
8
Zhang Y, Thomas JP, Korcsmaros T, Gul L. Integrating multi-omics to unravel host-microbiome interactions in inflammatory bowel disease. Cell Rep Med 2024; 5:101738. [PMID: 39293401 PMCID: PMC11525031 DOI: 10.1016/j.xcrm.2024.101738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 08/11/2024] [Accepted: 08/21/2024] [Indexed: 09/20/2024]
Abstract
The gut microbiome is crucial for nutrient metabolism, immune regulation, and intestinal homeostasis, with changes in its composition linked to complex diseases like inflammatory bowel disease (IBD). Although the precise host-microbial mechanisms in disease pathogenesis remain unclear, high-throughput sequencing has opened new ways to unravel the role of interspecies interactions in IBD. Systems biology, a holistic computational framework for modeling complex biological systems, is critical for leveraging multi-omics datasets to identify disease mechanisms. This review highlights the significance of multi-omics data in IBD research and provides an overview of state-of-the-art systems biology resources and computational tools for data integration. We explore gaps, challenges, and future directions in the research field, aiming to uncover novel biomarkers and therapeutic targets and ultimately advance personalized treatment strategies. While focusing on IBD, the proposed approaches are applicable to other complex diseases, such as cancer and neurodegenerative diseases, where the microbiome has also been implicated.
Affiliation(s)
- Yiran Zhang
  - Department of Surgery & Cancer, Imperial College London, London W12 0NN, UK; Department of Metabolism, Digestion and Reproduction, Imperial College London, London W12 0NN, UK
- John P Thomas
  - Department of Metabolism, Digestion and Reproduction, Imperial College London, London W12 0NN, UK; UKRI MRC Laboratory of Medical Sciences, Hammersmith Hospital Campus, London W12 0HS, UK
- Tamas Korcsmaros
  - Department of Metabolism, Digestion and Reproduction, Imperial College London, London W12 0NN, UK; NIHR Imperial BRC Organoid Facility, Imperial College London, London W12 0NN, UK; Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK
- Lejla Gul
  - Department of Metabolism, Digestion and Reproduction, Imperial College London, London W12 0NN, UK; Quadram Institute Bioscience, Norwich Research Park, Norwich NR4 7UQ, UK
9
Shakibania T, Arabfard M, Najafi A. A predictive approach for host-pathogen interactions using deep learning and protein sequences. Virusdisease 2024; 35:434-445. [PMID: 39464732 PMCID: PMC11502655 DOI: 10.1007/s13337-024-00882-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2023] [Accepted: 07/03/2024] [Indexed: 10/29/2024] Open
Abstract
Research on host-pathogen interactions (HPIs) has evolved rapidly during the past decades. The more new pathogens humans discover, the more challenging it becomes to find cures and prevent the infections those pathogens cause. Many experimental techniques have been proposed to predict the interactions, but most of them are costly and time-consuming. Fortunately, computational methods have proven efficient in overcoming such limitations. In this study, we propose utilizing Deep Learning methods to predict HPIs using protein sequences. We used the monoMonoKGap (mMKGap) algorithm with K = 2 to extract features from the sequences. We also used the Negatome Database to generate negative interactions. The proposed method was evaluated on three separate balanced human-pathogen datasets with 10-fold cross-validation. Our method yielded very high accuracies of 99.65%, 99.52%, and 99.66% (mean accuracy of 99.61%). To further evaluate the performance of the deep network, we compared it with other classification methods: Random Forest (RF), an ensemble of multiple decision trees, the Support Vector Machine (SVM), and the Convolutional Neural Network (CNN). We also tested the Dipeptide Composition algorithm as another feature extraction method to compare the results with the mMKGap method. The experimental results show that the proposed method is very accurate, robust, and practical and could be used as a reliable framework in HPI research.
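The two sequence encodings named in this abstract can be illustrated with a short sketch. Dipeptide composition is the normalized frequency of each of the 400 ordered pairs of adjacent residues. The abstract does not spell out the mMKGap definition, so the gapped variant below rests on an assumption: that the method counts residue pairs separated by 1 to K intervening positions. The function names, the normalization, and the concatenation of host and pathogen vectors are likewise illustrative choices and should be checked against the original tool.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature layout.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def gapped_pair_composition(sequence, gap=0):
    """Frequency of each ordered residue pair separated by `gap` positions.

    gap=0 gives the classical dipeptide composition (adjacent residues).
    Pairs containing non-standard residues (e.g. 'X') are skipped.
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    step = gap + 1
    for i in range(len(sequence) - step):
        pair = sequence[i] + sequence[i + step]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def encode_pair(host_seq, pathogen_seq, max_gap=2):
    """Concatenate gapped-pair features (gaps 1..max_gap) for both proteins.

    Assumption: this mirrors an mMKGap-style encoding with K = max_gap.
    """
    features = []
    for seq in (host_seq, pathogen_seq):
        for gap in range(1, max_gap + 1):
            features.extend(gapped_pair_composition(seq, gap))
    return features


dipeptides = gapped_pair_composition("ACAC")   # adjacent pairs: AC, CA, AC
vector = encode_pair("ACAC", "ACAC", max_gap=2)
print(len(vector))  # 2 proteins x 2 gaps x 400 pairs
```

A fixed-length vector like this can be fed to any of the classifiers the abstract compares, regardless of the lengths of the two input proteins.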
Affiliation(s)
- Taha Shakibania
  - Molecular Biology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Masoud Arabfard
  - Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Ali Najafi
  - Molecular Biology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
10
Yang X, Wuchty S, Liang Z, Ji L, Wang B, Zhu J, Zhang Z, Dong Y. Multi-modal features-based human-herpesvirus protein-protein interaction prediction by using LightGBM. Brief Bioinform 2024; 25:bbae005. [PMID: 38279649 PMCID: PMC10818167 DOI: 10.1093/bib/bbae005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/25/2023] [Accepted: 01/01/2021] [Indexed: 01/28/2024] Open
Abstract
The identification of human-herpesvirus protein-protein interactions (PPIs) is an essential and important entry point to understand the mechanisms of viral infection, especially in malignant tumor patients with common herpesvirus infection. While natural language processing (NLP)-based embedding techniques have emerged as powerful approaches, the application of multi-modal embedding feature fusion to predict human-herpesvirus PPIs is still limited. Here, we established a multi-modal embedding feature fusion-based LightGBM method to predict human-herpesvirus PPIs. In particular, we applied document and graph embedding approaches to represent sequence, network and function modal features of human and herpesviral proteins. Training our LightGBM models through our compiled non-rigorous and rigorous benchmarking datasets, we obtained significantly better performance compared to individual-modal features. Furthermore, our model outperformed traditional feature encodings-based machine learning methods and state-of-the-art deep learning-based methods using various benchmarking datasets. In a transfer learning step, we show that our model that was trained on human-herpesvirus PPI dataset without cytomegalovirus data can reliably predict human-cytomegalovirus PPIs, indicating that our method can comprehensively capture multi-modal fusion features of protein interactions across various herpesvirus subtypes. The implementation of our method is available at https://github.com/XiaodiYangpku/MultimodalPPI/.
Affiliation(s)
- Xiaodi Yang
  - Department of Hematology, Peking University First Hospital, Beijing, China
- Stefan Wuchty
  - Department of Computer Science, University of Miami, Miami, FL 33146, USA
  - Department of Biology, University of Miami, Miami, FL 33146, USA
  - Institute of Data Science and Computation, University of Miami, Miami, FL 33146, USA
  - Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136, USA
- Zeyin Liang
  - Department of Hematology, Peking University First Hospital, Beijing, China
- Li Ji
  - Department of Hematology, Peking University First Hospital, Beijing, China
- Bingjie Wang
  - Department of Hematology, Peking University First Hospital, Beijing, China
- Jialin Zhu
  - Department of Hematology, Peking University First Hospital, Beijing, China
- Ziding Zhang
  - State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yujun Dong
  - Department of Hematology, Peking University First Hospital, Beijing, China
11
Zhao Z, Hu Y, Hu Y, White AP, Wang Y. Features and algorithms: facilitating investigation of secreted effectors in Gram-negative bacteria. Trends Microbiol 2023; 31:1162-1178. [PMID: 37349207 DOI: 10.1016/j.tim.2023.05.011] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/15/2023] [Revised: 05/22/2023] [Accepted: 05/22/2023] [Indexed: 06/24/2023]
Abstract
Gram-negative bacteria deliver effector proteins through type III, IV, or VI secretion systems (T3SSs, T4SSs, and T6SSs) into host cells, causing infections and diseases. In general, effector proteins for each of these distinct secretion systems lack homology and are difficult to identify. Sequence analysis has disclosed many common features, helping us to understand the evolution, function, and secretion mechanisms of the effectors. In combination with various algorithms, the known common features have facilitated accurate prediction of new effectors. Ensemblers or integrated pipelines, which combine multiple computational models or modules with multidimensional features, achieve better prediction performance. Natural language processing (NLP) models also show merit: they could enable discovery of novel features and, in turn, facilitate more precise effector prediction, extending our knowledge about each secretion mechanism.
Affiliation(s)
- Ziyi Zhao
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen 518060, China
- Yixue Hu
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen 518060, China
- Yueming Hu
  - Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China
- Aaron P White
  - Vaccine and Infectious Disease Organization (VIDO), University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Yejun Wang
  - Youth Innovation Team of Medical Bioinformatics, Shenzhen University Medical School, Shenzhen 518060, China; Department of Cell Biology and Genetics, College of Basic Medicine, Shenzhen University Medical School, Shenzhen 518060, China
12
Macho Rendón J, Rebollido-Ríos R, Torrent Burgas M. HPIPred: Host-pathogen interactome prediction with phenotypic scoring. Comput Struct Biotechnol J 2022; 20:6534-6542. [PMID: 36514317 PMCID: PMC9718936 DOI: 10.1016/j.csbj.2022.11.026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/15/2022] [Revised: 11/09/2022] [Accepted: 11/10/2022] [Indexed: 11/22/2022] Open
Abstract
Protein-protein interactions (PPIs) are involved in most cellular processes. Unfortunately, current knowledge of host-pathogen interactomes is still very limited. Experimental methods used to detect PPIs have several limitations, including increasing complexity and economic cost in large-scale screenings. Hence, computational methods are commonly used to support experimental data, although they generally suffer from high false-positive rates. To address this issue, we have created HPIPred, a host-pathogen PPI prediction tool based on numerical encoding of physicochemical properties. Unlike other available methods, HPIPred integrates phenotypic data to prioritize biologically meaningful results. We used HPIPred to screen the entire Homo sapiens and Pseudomonas aeruginosa PAO1 proteomes to generate a host-pathogen interactome with 763 interactions displaying a highly connected network topology. Our predictive model can be used to prioritize protein-protein interactions as potential targets for antibacterial drug development. Available at: https://github.com/SysBioUAB/hpi_predictor.
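To make "numerical encoding of physicochemical properties" concrete, the sketch below maps each residue to a value on a property scale and smooths the resulting profile. The abstract does not say which scales or post-processing HPIPred uses. The Kyte-Doolittle hydropathy scale, the function names, and the sliding-window mean are assumptions chosen purely for illustration.

```python
# Kyte-Doolittle hydropathy index, used here only as an example property scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_sequence(sequence, scale=KYTE_DOOLITTLE):
    """Map each residue to its property value, skipping unknown residues."""
    return [scale[aa] for aa in sequence if aa in scale]


def sliding_mean(values, window=3):
    """Average the encoded profile over a sliding window of fixed width."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


profile = encode_sequence("MKTAYIAKQR")
print(sliding_mean(profile, window=3))
```

Encodings of this kind turn a protein into a numeric signal, so that a host protein and a pathogen protein can be compared or passed to a predictive model without relying on sequence homology.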
13
Kumar S, Kumar GS, Maitra SS, Malý P, Bharadwaj S, Sharma P, Dwivedi VD. Viral informatics: bioinformatics-based solution for managing viral infections. Brief Bioinform 2022; 23:6659740. [PMID: 35947964 DOI: 10.1093/bib/bbac326] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/27/2022] [Revised: 06/26/2022] [Accepted: 07/18/2022] [Indexed: 11/13/2022] Open
Abstract
Several new viral infections have emerged in the human population and are establishing themselves as global pandemics. With advancements in translation research, the scientific community has developed potential therapeutics to eradicate or control certain viral infections, such as smallpox and polio, responsible for billions of disabilities and deaths in the past. Unfortunately, some viral infections, such as dengue virus (DENV) and human immunodeficiency virus-1 (HIV-1), still prevail due to a lack of specific therapeutics, while new pathogenic viral strains or variants are emerging because of high genetic recombination or cross-species transmission. Consequently, to combat emerging viral infections, bioinformatics-based strategies have been developed for characterizing viruses and for developing new, effective therapeutics for their eradication or management. This review attempts to provide a single platform for the wide range of available bioinformatics-based approaches, including bioinformatics methods for the identification and management of emerging or evolved viral strains, genome analysis concerning pathogenicity and epidemiological analysis, computational methods for designing viral therapeutics, and consolidated information in the form of databases on the known pathogenic viruses. This enriched review of the generally applicable viral informatics approaches aims to provide an overview of available resources capable of carrying out the desired task and may be utilized to expand additional strategies to improve the quality of translation viral informatics research.
Affiliation(s)
- Sanjay Kumar
  - School of Biotechnology, Jawaharlal Nehru University, New Delhi, India; Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
- Geethu S Kumar
  - Department of Life Science, School of Basic Science and Research, Sharda University, Greater Noida, Uttar Pradesh, India; Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India
- Petr Malý
  - Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
- Shiv Bharadwaj
  - Laboratory of Ligand Engineering, Institute of Biotechnology of the Czech Academy of Sciences v.v.i., BIOCEV Research Center, Vestec, Czech Republic
- Pradeep Sharma
  - Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
- Vivek Dhar Dwivedi
  - Center for Bioinformatics, Computational and Systems Biology, Pathfinder Research and Training Foundation, Greater Noida, India; Institute of Advanced Materials, IAAM, 59053 Ulrika, Sweden