1
Yılmaz S, Yorgancioglu K, Koyutürk M. Bias-aware training and evaluation of link prediction algorithms in network biology. Proc Natl Acad Sci U S A 2025; 122:e2416646122. [PMID: 40493194 DOI: 10.1073/pnas.2416646122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Accepted: 05/02/2025] [Indexed: 06/12/2025] Open
Abstract
For biomedical applications, new link prediction algorithms are continuously being developed. These algorithms are typically evaluated computationally, using test sets generated by sampling the edges uniformly at random. However, as we demonstrate, this evaluation approach introduces a bias toward "rich nodes," i.e., those with higher degrees in the network. More concerningly, this bias persists even when different network snapshots are used for evaluation, as recommended in the machine learning community. This creates a cycle in research where newly developed algorithms generate more knowledge on well-studied biological entities while understudied entities are commonly overlooked. To overcome this issue, we propose a weighted validation setting specifically focusing on low-degree nodes and present AWARE strategies to facilitate bias-aware training and evaluation of link prediction algorithms. These strategies can help researchers gain better insights from computational evaluations and promote the development of new algorithms focusing on novel findings and understudied proteins.
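To make the degree bias described in this abstract concrete, the following minimal Python sketch contrasts a uniform recall over test edges with a degree-weighted recall that emphasizes edges between low-degree nodes. The inverse-degree-product weighting, the function names, and the toy network are illustrative assumptions; this is not the AWARE procedure from the paper, and for simplicity degrees are computed on the same edges that are scored.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many edges touch each node."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def inverse_degree_weights(edges, degree):
    """Assumed weighting: 1 / (deg(u) * deg(v)), normalized to sum to 1."""
    raw = [1.0 / (degree[u] * degree[v]) for u, v in edges]
    total = sum(raw)
    return [w / total for w in raw]


def recall(test_edges, predicted, weights=None):
    """Fraction of test edges recovered; weighted if weights are given."""
    if weights is None:
        weights = [1.0 / len(test_edges)] * len(test_edges)
    return sum(w for edge, w in zip(test_edges, weights) if edge in predicted)


# Toy network: hub "h" with four neighbors, plus one edge between two low-degree nodes.
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("e", "f")]
degree = node_degrees(edges)

# A hypothetical predictor that only ever recovers edges touching the hub.
predicted = {("h", "a"), ("h", "b"), ("h", "c"), ("h", "d")}

uniform_recall = recall(edges, predicted)
weighted_recall = recall(edges, predicted, inverse_degree_weights(edges, degree))
print(f"uniform recall:  {uniform_recall:.2f}")
print(f"weighted recall: {weighted_recall:.2f}")
```

In this toy case the hub-only predictor scores 0.80 under uniform edge sampling but 0.50 under the weighted score, because the single edge between low-degree nodes carries half of the total weight.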
Affiliation(s)
- Serhan Yılmaz
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106
- Kaan Yorgancioglu
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106
- Mehmet Koyutürk
- Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH 44106
2
Xia R, Li W, Cheng Y, Xie L, Xu X. Molecular surfaces modeling: Advancements in deep learning for molecular interactions and predictions. Biochem Biophys Res Commun 2025; 763:151799. [PMID: 40239539 DOI: 10.1016/j.bbrc.2025.151799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2025] [Revised: 03/20/2025] [Accepted: 04/10/2025] [Indexed: 04/18/2025]
Abstract
Molecular surface analysis can provide a high-dimensional, rich representation of molecular properties and interactions, which is crucial for enabling powerful predictive modeling and rational molecular design across diverse scientific and technological domains. With remarkable successes achieved by artificial intelligence (AI) in different fields such as computer vision and natural language processing, there is a growing imperative to harness AI's potential in accelerating molecular discovery and innovation. The integration of AI techniques with molecular surface analysis has opened up new frontiers, allowing researchers to uncover hidden patterns, relationships, and design principles that were previously elusive. By leveraging the complementary strengths of molecular surface representations and advanced AI algorithms, scientists can now explore chemical space more efficiently, optimize molecular properties with greater precision, and drive transformative advancements in areas like drug development, materials engineering, and catalysis. In this review, we aim to provide an overview of recent advancements in the field of molecular surface analysis and its integration with AI techniques. These AI-driven approaches have led to significant advancements in various downstream tasks, including interface site prediction, protein-protein interaction prediction, surface-centric molecular generation and design.
Affiliation(s)
- Renjie Xia
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Wei Li
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Yi Cheng
- College of Engineering, Lishui University, Lishui, 323000, China
- Liangxu Xie
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China
- Xiaojun Xu
- Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, China
3
Mei H, Wang Z, Yang H, Li X, Xu Y. Network analysis of multivariate time series data in biological systems: methods and applications. Brief Bioinform 2025; 26:bbaf223. [PMID: 40401349 PMCID: PMC12096012 DOI: 10.1093/bib/bbaf223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2025] [Revised: 04/17/2025] [Accepted: 04/30/2025] [Indexed: 05/23/2025] Open
Abstract
Network analysis has become an essential tool in biological and biomedical research, providing insights into complex biological mechanisms. Since biological systems are inherently time-dependent, incorporating time-varying methods is crucial for capturing temporal changes, adaptive interactions, and evolving dependencies within networks. Our study explores key time-varying methodologies for network structure estimation and network inference based on observed structures. We begin by discussing approaches for estimating network structures from data, focusing on the time-varying Gaussian graphical model, dynamic Bayesian network, and vector autoregression-based causal analysis. Next, we examine analytical techniques that leverage pre-specified or observed networks, including other autoregression-based methods and latent variable models. Furthermore, we explore practical applications and computational tools designed for these methods. By synthesizing these approaches, our study provides a comprehensive evaluation of their strengths and limitations in the context of biological data analysis.
Affiliation(s)
- Hao Mei
- Center for Applied Statistics, School of Statistics, Institute of Health Data Science, Renmin University of China, 59 Zhongguancun Street, 100872 Beijing, China
- Zhiyuan Wang
- Center for Applied Statistics, School of Statistics, Institute of Health Data Science, Renmin University of China, 59 Zhongguancun Street, 100872 Beijing, China
- Hang Yang
- Center for Applied Statistics, School of Statistics, Institute of Health Data Science, Renmin University of China, 59 Zhongguancun Street, 100872 Beijing, China
- Xiaoke Li
- Center for Applied Statistics, School of Statistics, Institute of Health Data Science, Renmin University of China, 59 Zhongguancun Street, 100872 Beijing, China
- Yaqing Xu
- Department of Epidemiology and Biostatistics, School of Public Health, Shanghai Jiao Tong University School of Medicine, 227 South Chongqing Road, 200025 Shanghai, China
4
Cao MY, Zainudin S, Daud KM. Feature fusion with attributed deepwalk for protein-protein interaction prediction. Sci Rep 2025; 15:12255. [PMID: 40210917 PMCID: PMC11985984 DOI: 10.1038/s41598-025-96510-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Accepted: 03/28/2025] [Indexed: 04/12/2025] Open
Abstract
Protein-protein interactions (PPIs) are crucial for understanding cellular processes and disease mechanisms. While experimental methods for detecting PPIs exist, computational approaches offer a more efficient alternative. However, current computational methods often rely on single feature types or simple feature concatenation, potentially missing the complex nature of protein interactions. This study proposes FFADW (Feature Fusion Method with Attributed DeepWalk), a novel approach that integrates sequence and network features using a weighted fusion strategy controlled by an adjustable α parameter. Specifically, sequence similarity is computed using Levenshtein distance, while network similarity is measured via a Gaussian kernel-based approach. These complementary features are fused through the weighting mechanism before being processed by the Attributed DeepWalk algorithm, which enhances protein representations by learning low-dimensional embeddings. The fused representations are then used to train classifiers for PPI prediction. Evaluation across three datasets using multiple classifiers demonstrated that FFADW significantly improves sample clustering and performs better than existing approaches, with the XGBoost classifier showing the best results. The weighted fusion approach effectively combines different aspects of protein data while reducing noise and redundancy, offering an improved technique for computational PPI prediction.
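The α-weighted fusion of a Levenshtein-based sequence similarity with a Gaussian-kernel network similarity can be sketched as follows. The abstract does not give the exact formulas, so the length normalization, the kernel width `gamma`, the convex combination `alpha * seq + (1 - alpha) * net`, and all function names and example inputs are assumptions made for illustration, not FFADW's actual implementation.

```python
import math


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (char_a != char_b),   # substitution or match
            ))
        previous = current
    return previous[-1]


def sequence_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def gaussian_similarity(x, y, gamma=1.0):
    """Gaussian kernel on two interaction-profile vectors of equal length."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)


def fused_similarity(seq_sim, net_sim, alpha):
    """Assumed fusion: convex combination controlled by alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * seq_sim + (1.0 - alpha) * net_sim


# Hypothetical inputs: two short sequences and their binary interaction profiles.
seq_sim = sequence_similarity("MKTAYIAK", "MKTAHIAK")
net_sim = gaussian_similarity([1, 0, 1], [1, 1, 1], gamma=0.5)
print(fused_similarity(seq_sim, net_sim, alpha=0.5))
```

Setting `alpha` to 1 uses only the sequence similarity and setting it to 0 uses only the network similarity, which is one simple way to read "adjustable" in the abstract.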
Affiliation(s)
- Mei-Yuan Cao
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
- Suhaila Zainudin
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
- Kauthar Mohd Daud
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
5
Meng L, Wei L, Wu R. MVGNN-PPIS: A novel multi-view graph neural network for protein-protein interaction sites prediction based on Alphafold3-predicted structures and transfer learning. Int J Biol Macromol 2025; 300:140096. [PMID: 39848362 DOI: 10.1016/j.ijbiomac.2025.140096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2024] [Revised: 01/04/2025] [Accepted: 01/17/2025] [Indexed: 01/25/2025]
Abstract
Protein-protein interactions (PPI) are crucial for understanding numerous biological processes and pathogenic mechanisms. Identifying interaction sites is essential for biomedical research and targeted drug development. Compared to experimental methods, accurate computational approaches for protein-protein interaction sites (PPIS) prediction can save significant time and costs. In this study, we propose a novel model named MVGNN-PPIS. To the best of our knowledge, it is the first to utilize predicted structures generated by AlphaFold3, and combined with transfer learning techniques, for predicting PPIS. This approach addresses the limitations of traditional methods that depend on native protein structures and multiple sequence alignments (MSA). Additionally, we introduced a multi-view graph framework based on two types of graph structures: the k-nearest neighbor graph and the adjacency matrix. By alternately employing a Graph Transformer and Graph Convolutional Networks (GCN) to aggregate node information, this framework effectively captures both local and global dependencies of each residue in the predicted structures, thereby significantly enhancing the model's sensitivity to binding sites. This framework further integrates direction, distances and angular information between the 3D coordinates of side-chain atom centroids to construct a relative coordinate system, generating enhanced edge features that ensure the model's equivariance to molecular translations and rotations in space. During training, the Focal Loss function is employed to effectively address the class imbalance in the dataset. Experimental results demonstrate that MVGNN outperforms the current state-of-the-art methods across multiple PPIS benchmark datasets. To further validate the model's generalization capability, we extended MVGNN to the domain of predicting protein-nucleic acid interaction sites, where it also achieved superior performance.
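For readers unfamiliar with the Focal Loss mentioned above, here is a minimal per-residue sketch of the standard binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The default `gamma=2.0`, the optional `alpha`, and the function name are assumptions; the abstract does not state which hyperparameters MVGNN-PPIS uses.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=None, eps=1e-12):
    """Focal loss for one prediction.

    p:     predicted probability of the positive class (binding site)
    y:     true label, 1 for positive and 0 for negative
    gamma: focusing parameter; gamma = 0 recovers plain cross-entropy
    alpha: optional class weight for positives (negatives get 1 - alpha)
    """
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    weight = 1.0
    if alpha is not None:
        weight = alpha if y == 1 else 1.0 - alpha
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


# Well-classified examples are down-weighted far more than hard ones.
print(binary_focal_loss(0.9, 1))   # easy positive, small loss
print(binary_focal_loss(0.1, 1))   # hard positive, large loss
```

Because the factor (1 - p_t)^γ shrinks the loss of confidently correct predictions, the many easy non-binding residues contribute less to the total than the rarer, harder binding-site residues.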
Affiliation(s)
- Lu Meng
- College of Information Science and Engineering, Northeastern University, China
- Lishuai Wei
- College of Information Science and Engineering, Northeastern University, China
- Rina Wu
- College of Information Science and Engineering, Northeastern University, China
6
Dey L, Chakraborty S. Supervised learning approaches for predicting Ebola-Human Protein-Protein interactions. Gene 2025; 942:149228. [PMID: 39828063 DOI: 10.1016/j.gene.2025.149228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Revised: 12/04/2024] [Accepted: 01/07/2025] [Indexed: 01/22/2025]
Abstract
The goal of this work is to predict protein-protein interactions (PPIs) between the Ebola virus and the human host at risk of infection. Because very few databases are available for the Ebola virus, we prepared a comprehensive database of all the PPIs between Ebola virus and human proteins (EbolaInt). Our work focuses on finding new PPIs between humans and the Ebola virus using state-of-the-art machine learning techniques. The task is a two-class problem with a positive (interacting) dataset and a negative (non-interacting) dataset. These datasets contain various sequence-based human protein features, such as amino acid structure, conjoint triad, and domain-related features. We briefly discuss and apply several well-known supervised learning approaches to predict PPIs between human proteins and Ebola virus proteins, including K-nearest neighbours (KNN), random forest (RF), support vector machine (SVM), and deep feed-forward multi-layer perceptron (DMLP). We validated our prediction results using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. We compare the models' accuracy, precision, recall, and F1-score for predicting these PPIs. DMLP gave the highest accuracy and predicted 2655 potential human target proteins.
Affiliation(s)
- Lopamudra Dey
- Department of Biomedical and Clinical Sciences, Linköping University, Sweden; Department of Computer Science & Engineering, Meghnad Saha Institute of Technology, Kolkata, India
- Sanjay Chakraborty
- Department of Computer and Information Science (IDA), REAL, AIICS, Linköping University, Sweden; Department of Computer Science & Engineering, Techno International New Town, Kolkata, India
7
Yang G, Liu Y, Wen S, Chen W, Zhu X, Wang Y. DTI-MHAPR: optimized drug-target interaction prediction via PCA-enhanced features and heterogeneous graph attention networks. BMC Bioinformatics 2025; 26:11. [PMID: 39800678 PMCID: PMC11726937 DOI: 10.1186/s12859-024-06021-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 12/20/2024] [Indexed: 01/16/2025] Open
Abstract
Drug-target interactions (DTIs) are pivotal in drug discovery and development, and their accurate identification can significantly expedite the process. Numerous DTI prediction methods have emerged, yet many fail to fully harness the feature information of drugs and targets or address the issue of feature redundancy. We aim to refine DTI prediction accuracy by eliminating redundant features and capitalizing on the node topological structure to enhance feature extraction. To achieve this, we introduce a PCA-augmented multi-layer heterogeneous graph-based network that concentrates on key features throughout the encoding-decoding phase. Our approach initiates with the construction of a heterogeneous graph from various similarity metrics, which is then encoded via a graph neural network. We concatenate and integrate the resultant representation vectors to merge multi-level information. Subsequently, principal component analysis is applied to distill the most informative features, with the random forest algorithm employed for the final decoding of the integrated data. Our method outperforms six baseline models in terms of accuracy, as demonstrated by extensive experimentation. Comprehensive ablation studies, visualization of results, and in-depth case analyses further validate our framework's efficacy and interpretability, providing a novel tool for drug discovery that integrates multimodal features.
Affiliation(s)
- Guang Yang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Yinbo Liu
- School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Sijian Wen
- School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Wenxi Chen
- School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Xiaolei Zhu
- School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
- Yongmei Wang
- School of Information and Artificial Intelligence, Anhui Agricultural University, Changjiang West Road, Hefei, 230036, Anhui, China
8
Zhao BW, Su XR, Yang Y, Li DX, Li GD, Hu PW, Luo X, Hu L. A heterogeneous information network learning model with neighborhood-level structural representation for predicting lncRNA-miRNA interactions. Comput Struct Biotechnol J 2024; 23:2924-2933. [PMID: 39963422 PMCID: PMC11832017 DOI: 10.1016/j.csbj.2024.06.032] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/31/2024] [Revised: 06/13/2024] [Accepted: 06/23/2024] [Indexed: 02/20/2025] Open
Abstract
Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) are closely related to the treatment of human diseases. Traditional biological experiments are often time-consuming and labor-intensive in their search for mechanisms of disease. Computational methods are regarded as an effective way to predict unknown lncRNA-miRNA interactions (LMIs). However, most of them focus mainly on a single lncRNA-miRNA network without considering the complex mechanisms among biomolecules in life activities, which are believed to be useful for improving the accuracy of LMI prediction. To address this, we propose a heterogeneous information network (HIN) learning model with neighborhood-level structural representation, called HINLMI, to precisely identify LMIs. In particular, HINLMI first constructs a HIN by integrating nine interactions of five biomolecules. After that, different representation learning strategies are applied to learn the biological and network representations of lncRNAs and miRNAs in the HIN from different perspectives. Finally, HINLMI incorporates the XGBoost classifier to predict unknown LMIs using the final embeddings of lncRNAs and miRNAs. Experimental results show that HINLMI yields the best performance on the real dataset when compared with state-of-the-art computational models. Moreover, several analysis experiments indicate that the simultaneous consideration of biological knowledge and network topology of lncRNAs and miRNAs allows HINLMI to accurately predict LMIs from a more comprehensive perspective. The promising performance of HINLMI also reveals that the utilization of rich heterogeneous information can provide an alternative insight for HINLMI to identify novel interactions between lncRNAs and miRNAs.
Affiliation(s)
- Bo-Wei Zhao
- College of Computer and Information Science, School of Software, Southwest University, Chongqing 400715, China
- Xiao-Rui Su
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Yue Yang
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Dong-Xu Li
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Guo-Dong Li
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Peng-Wei Hu
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Xin Luo
- College of Computer and Information Science, School of Software, Southwest University, Chongqing 400715, China
- Lun Hu
- The Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
9
Ge F, Li CF, Zhang CM, Zhang M, Yu DJ. PRITrans: A Transformer-Based Approach for the Prediction of the Effects of Missense Mutation on Protein-RNA Interactions. Int J Mol Sci 2024; 25:12348. [PMID: 39596413 PMCID: PMC11594650 DOI: 10.3390/ijms252212348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Revised: 11/13/2024] [Accepted: 11/15/2024] [Indexed: 11/28/2024] Open
Abstract
Protein-RNA interactions are essential to many cellular functions, and missense mutations in RNA-binding proteins can disrupt these interactions, often leading to disease. To address this, we developed PRITrans, a novel deep learning model for predicting the effects of missense mutations on protein-RNA interactions, which is vital for understanding disease mechanisms and advancing molecular biology research. PRITrans employs a Transformer architecture enhanced with multiscale convolution modules for comprehensive feature extraction. Its primary innovation lies in integrating protein language model embeddings with a deep feature fusion strategy, effectively handling high-dimensional feature representations. By utilizing multi-layer self-attention mechanisms, PRITrans captures nuanced, high-level sequence information, while multiscale convolutions extract features across various depths, thereby enhancing predictive accuracy. Consequently, this architecture enables significant improvements in ΔΔG prediction compared to traditional approaches. We validated PRITrans using three different cross-validation strategies on two newly reconstructed mutation datasets, S315 and S630 (containing 315 forward and 315 reverse mutations). PRITrans performed consistently well on both datasets, achieving a Pearson correlation coefficient of 0.741 and a root mean square error (RMSE) of 1.168 kcal/mol on the S630 dataset. Moreover, its robust performance extended to independent test sets, achieving a Pearson correlation of 0.699 and an RMSE of 1.592 kcal/mol. These results underscore PRITrans's potential as a powerful tool for protein-RNA interaction studies. Moreover, when tested against existing prediction methods on an independent dataset, PRITrans showed improved predictive accuracy and robustness.
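The two metrics reported in this abstract, the Pearson correlation coefficient and the RMSE between predicted and measured ΔΔG values, can be computed as in the sketch below. The function names and the example ΔΔG values are hypothetical and are not taken from the S315 or S630 datasets.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)


def rmse(x, y):
    """Root mean square error, in the same units as the inputs (here kcal/mol)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical measured and predicted ddG values in kcal/mol.
measured = [0.5, -1.2, 2.0, 0.0, 1.1]
predicted = [0.7, -0.8, 1.5, 0.3, 0.9]
print(f"Pearson r = {pearson(measured, predicted):.3f}")
print(f"RMSE      = {rmse(measured, predicted):.3f} kcal/mol")
```

The two metrics answer different questions: Pearson correlation measures whether predictions rank and scale with the measurements, while RMSE measures the typical size of the error in kcal/mol, so a model can do well on one and poorly on the other.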
Affiliation(s)
- Fang Ge
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts & Telecommunications, 9 Wenyuan, Nanjing 210023, China
- Cui-Feng Li
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Chao-Ming Zhang
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Ming Zhang
- School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
10
Panahi B, Khalilpour Shadbad R. Navigating the microalgal maze: a comprehensive review of recent advances and future perspectives in biological networks. Planta 2024; 260:114. [PMID: 39367989 DOI: 10.1007/s00425-024-04543-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 09/28/2024] [Indexed: 10/07/2024]
Abstract
MAIN CONCLUSION: PPI analysis deepens our knowledge of critical processes like carbon fixation and nutrient sensing. Moreover, signaling networks, including pathways like MAPK/ERK and TOR, provide valuable information on how microalgae respond to environmental changes and stress. Additionally, species-species interaction networks for microalgae provide a comprehensive understanding of how different species interact within their environments.
This review examines recent advancements in the study of biological networks within microalgae, with a focus on the intricate interactions that define these organisms. It emphasizes how network biology, an interdisciplinary field, offers valuable insights into microalgae functions through various methodologies. Crucial approaches, such as protein-protein interaction (PPI) mapping utilizing yeast two-hybrid screening and mass spectrometry, are essential for comprehending cellular processes and optimizing functions, such as photosynthesis and fatty acid biosynthesis. The application of advanced computational methods and information mining has significantly improved PPI analysis, revealing networks involved in critical processes like carbon fixation and nutrient sensing. The review also encompasses transcriptional networks, which play a role in gene regulation and stress responses, as well as metabolic networks represented by genome-scale metabolic models (GEMs), which aid in strain optimization and the prediction of metabolic outcomes. Furthermore, signaling networks, including pathways like MAPK/ERK and TOR, are crucial for understanding how microalgae respond to environmental changes and stress. Additionally, species-species interaction networks for microalgae provide a comprehensive understanding of how different species interact within their environments. The integration of these network biology approaches has deepened our understanding of microalgal interactions, paving the way for more efficient cultivation and new industrial applications.
Affiliation(s)
- Bahman Panahi
- Department of Genomics, Branch for Northwest & West Region, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research, Education and Extension Organization (AREEO), Tabriz, 5156915-598, Iran
- Robab Khalilpour Shadbad
- Department of Cellular and Molecular Biology, Faculty of Science, Azarbaijan Shahid Madani University, Tabriz, Iran
11
Li M, Su F, Wu O, Zhang J. Class-Level Logit Perturbation. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:13926-13940. [PMID: 37220052 DOI: 10.1109/tnnls.2023.3273355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Features, logits, and labels are the three primary types of data produced when a sample passes through a deep neural network (DNN). Feature perturbation and label perturbation have received increasing attention in recent years. They have been proven to be useful in various deep learning approaches. For example, (adversarial) feature perturbation can improve the robustness or even generalization capability of learned models. However, few studies have explicitly explored the perturbation of logit vectors. This work discusses several existing methods related to class-level logit perturbation. A unified viewpoint between regular/irregular data augmentation and loss variations incurred by logit perturbation is established. A theoretical analysis is provided to illuminate why class-level logit perturbation is useful. Accordingly, new methodologies are proposed to explicitly learn to perturb logits for both the single-label and multilabel classification tasks. Meta-learning is also leveraged to determine the regular or irregular augmentation for each class. Extensive experiments on benchmark image classification datasets and their long-tail versions indicated the competitive performance of our learning method. As it perturbs only the logits, it can be used as a plug-in to fuse with any existing classification algorithms. All the codes are available at https://github.com/limengyang1992/lpl.
12
Gong F, Cao D, Sun X, Li Z, Qu C, Fan Y, Cao Z, Zhao K, Zhao K, Qiu D, Li Z, Ren R, Ma X, Zhang X, Yin D. Homologous mapping yielded a comprehensive predicted protein-protein interaction network for peanut (Arachis hypogaea L.). BMC Plant Biology 2024; 24:873. [PMID: 39304811 DOI: 10.1186/s12870-024-05580-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 09/09/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND: Protein-protein interactions are the primary means through which proteins carry out their functions. These interactions thus have crucial roles in life activities. The wide availability of fully sequenced animal and plant genomes has facilitated establishment of relatively complete global protein interaction networks for some model species. The genomes of cultivated and wild peanut (Arachis hypogaea L.) have also been sequenced, but the functions of most of the encoded proteins remain unclear. RESULTS: Here, we used homologous mapping of validated protein interaction data from model species to generate complete peanut protein interaction networks for A. hypogaea cv. 'Tifrunner' (282,619 pairs), A. hypogaea cv. 'Shitouqi' (256,441 pairs), A. monticola (440,470 pairs), A. duranensis (136,363 pairs), and A. ipaensis (172,813 pairs). A detailed analysis was conducted for a putative disease-resistance subnetwork in the Tifrunner network to identify candidate genes and validate functional interactions. The network suggested that DX2UEH and its interacting partners may participate in peanut resistance to bacterial wilt; this was preliminarily validated with overexpression experiments in peanut. CONCLUSION: Our results provide valuable new information for future analyses of gene and protein functions and regulatory networks in peanut.
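The idea behind homologous mapping, transferring a validated interaction from a model species to the target species when both partners have homologs there, can be sketched in a few lines of Python. The function name, the dictionary-based homolog table, the choice to skip self-pairs, and all protein identifiers are hypothetical; the abstract does not describe the authors' actual pipeline or its homology criteria.

```python
from itertools import product


def map_interactions(model_pairs, homologs):
    """Transfer interactions from a model species to a target species.

    model_pairs: iterable of (protein_a, protein_b) validated in the model species
    homologs:    dict mapping a model protein to a list of target-species homologs
    Returns a set of unordered target-species pairs (stored as sorted tuples).
    """
    predicted = set()
    for protein_a, protein_b in model_pairs:
        targets_a = homologs.get(protein_a, ())
        targets_b = homologs.get(protein_b, ())
        # Every combination of homologs inherits the interaction.
        for target_a, target_b in product(targets_a, targets_b):
            if target_a != target_b:  # assumption: self-pairs are skipped
                predicted.add(tuple(sorted((target_a, target_b))))
    return predicted


# Hypothetical identifiers for a model species (M*) and peanut (Ah*).
model_pairs = [("M1", "M2"), ("M2", "M3")]
homologs = {"M1": ["Ah01"], "M2": ["Ah02", "Ah03"]}  # M3 has no peanut homolog

print(sorted(map_interactions(model_pairs, homologs)))
```

Here the M1-M2 interaction yields two predicted peanut pairs because M2 has two homologs, while the M2-M3 interaction is dropped because M3 has none, which illustrates why the size of a mapped network depends heavily on homolog coverage.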
Collapse
Affiliation(s)
- Fangping Gong
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Di Cao
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Xiaojian Sun
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Zhuo Li
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Chengxin Qu
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Yi Fan
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Zenghui Cao
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Kai Zhao
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Kunkun Zhao
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Ding Qiu
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Zhongfeng Li
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Rui Ren
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Xingli Ma
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Xingguo Zhang
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China
| | - Dongmei Yin
- College of Agronomy, Henan Agricultural University, Zhengzhou, 450000, People's Republic of China.
| |
|
13
|
Pak M, Bang D, Sung I, Kim S, Lee S. DGDRP: drug-specific gene selection for drug response prediction via re-ranking through propagating and learning biological network. Front Genet 2024; 15:1441558. [PMID: 39371421 PMCID: PMC11450864 DOI: 10.3389/fgene.2024.1441558] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/03/2024] [Accepted: 09/03/2024] [Indexed: 10/08/2024] Open
Abstract
Introduction: Drug response prediction, especially in terms of cell viability prediction, is a well-studied research problem with significant implications for personalized medicine. It enables the identification of the most effective drugs based on individual genetic profiles, aids in selecting potential drug candidates, and helps identify biomarkers that predict drug efficacy and toxicity. A deeper investigation on drug response prediction reveals that drugs exert their effects by targeting specific proteins, which in turn perturb related genes in cascading ways. This perturbation affects cellular pathways and regulatory networks, ultimately influencing the cellular response to the drug. Identifying which genes are perturbed and how they interact can provide critical insights into the mechanisms of drug action. Hence, the problem of predicting drug response can be framed as a dual problem involving both the prediction of drug efficacy and the selection of drug-specific genes. Identifying these drug-specific genes (biomarkers) is crucial because they serve as indicators of how the drug will affect the biological system, thereby facilitating both drug response prediction and biomarker discovery. Methods: In this study, we propose DGDRP (Drug-specific Gene selection for Drug Response Prediction), a graph neural network (GNN)-based model that uses a novel rank-and-re-rank process for drug-specific gene selection. DGDRP first ranks genes using a pathway knowledge-enhanced network propagation algorithm based on drug target information, ensuring biological relevance. It then re-ranks genes based on the similarity between gene and drug target embeddings learned from the GNN, incorporating semantic relationships. Thus, our model adaptively learns to select drug mechanism-associated genes that contribute to drug response prediction.
This integrated approach not only improves drug response predictions compared to other gene selection methods but also allows for effective biomarker discovery. Discussion: As a result, our approach demonstrates improved drug response predictions compared to other gene selection methods and demonstrates comparability with state-of-the-art deep learning models. Case studies further support our method by showing alignment of selected gene sets with the mechanisms of action of input drugs. Conclusion: Overall, DGDRP represents a deep learning based re-ranking strategy, offering a robust gene selection framework for more accurate drug response prediction. The source code for DGDRP can be found at: https://github.com/minwoopak/heteronet.
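The rank-and-re-rank process described in this abstract can be sketched generically. The example below is not the DGDRP implementation: it swaps in a plain random walk with restart for the pathway knowledge-enhanced propagation, and uses fixed toy vectors in place of GNN-learned embeddings. The function names, the `restart` value, and all data are assumptions chosen for illustration.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.5, iters=100):
    """Propagate scores from seed nodes (e.g., drug targets) over a gene network."""
    col_sums = adj.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0           # avoid division by zero for isolated nodes
    transition = adj / col_sums              # column-normalized adjacency
    p0 = np.zeros(adj.shape[0])
    p0[seeds] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(iters):
        p = (1.0 - restart) * transition @ p + restart * p0
    return p


def rerank_by_similarity(candidates, gene_emb, target_emb):
    """Re-order candidate genes by cosine similarity to a drug-target embedding."""
    cand = np.asarray(candidates)
    vecs = gene_emb[cand]
    sims = vecs @ target_emb / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(target_emb))
    return cand[np.argsort(-sims)]


# Toy path network 0-1-2-3 with node 0 as the only (hypothetical) drug target.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
scores = random_walk_with_restart(adj, seeds=[0])
first_ranking = np.argsort(-scores)          # step 1: rank by propagated score

# Step 2: re-rank the non-seed genes using stand-in embeddings.
gene_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
target_emb = np.array([1.0, 0.0])
second_ranking = rerank_by_similarity(first_ranking[1:], gene_emb, target_emb)
```

The two stages use different evidence: the first depends only on network proximity to the targets, while the second depends only on the embedding space, so a gene far from the targets in the network can still move up in the second pass.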
Affiliation(s)
- Minwoo Pak
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
| | - Dongmin Bang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Aigendrug Co., Ltd., Seoul, Republic of Korea
| | - Inyoung Sung
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Republic of Korea
| | - Sunho Lee
- Aigendrug Co., Ltd., Seoul, Republic of Korea
| |
|
14
|
Thareja P, Chhillar RS, Dalal S, Simaiya S, Lilhore UK, Alroobaea R, Alsafyani M, Baqasah AM, Algarni S. Intelligence model on sequence-based prediction of PPI using AISSO deep concept with hyperparameter tuning process. Sci Rep 2024; 14:21797. [PMID: 39294330 PMCID: PMC11410825 DOI: 10.1038/s41598-024-72558-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 09/09/2024] [Indexed: 09/20/2024] Open
Abstract
Protein-protein interaction (PPI) prediction is vital for interpreting biological activities. Even though many diverse sorts of data and machine learning approaches have been employed in PPI prediction, performance still has to be enhanced. As a result, we adopted an Aquila Influenced Shark Smell (AISSO)-based hybrid prediction technique to construct a sequence-dependent PPI prediction model. This model has two stages of operation: feature extraction and prediction. Along with sequence-based and Gene Ontology features, unique features were produced in the feature extraction stage utilizing the improved semantic similarity technique, which may deliver reliable findings. These collected characteristics were then sent to the prediction step, and hybrid neural networks, such as the Improved Recurrent Neural Network and Deep Belief Networks, were used to predict the PPI using modified score level fusion. These neural networks' weight variables were adjusted utilizing a unique optimal methodology called Aquila Influenced Shark Smell (AISSO), and the outcomes showed that the developed model attained an accuracy of around 88%, which is much better than the traditional methods; the AISSO-based PPI prediction model can provide precise and effective predictions.
Affiliation(s)
- Preeti Thareja
- DCSA, Maharshi Dayanand University, Rohtak, Haryana, India
| | | | - Sandeep Dalal
- DCSA, Maharshi Dayanand University, Rohtak, Haryana, India
| | - Sarita Simaiya
- Arba Minch University, Arba Minch, Ethiopia.
- Department of Computer Science and Engineering, Galgotias University, Greater Noida, UP, India.
| | - Umesh Kumar Lilhore
- Department of Computer Science and Engineering, Galgotias University, Greater Noida, UP, India
| | - Roobaea Alroobaea
- Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, 21944, Taif, Saudi Arabia
| | - Majed Alsafyani
- Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, 21944, Taif, Saudi Arabia
| | - Abdullah M Baqasah
- Department of Information Technology, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif, 21944, Saudi Arabia
| | - Sultan Algarni
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, 21589, Jeddah, Saudi Arabia
| |
|
15
|
Yang S, Cheng P, Liu Y, Feng D, Wang S. Exploring the Knowledge of an Outstanding Protein to Protein Interaction Transformer. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1287-1298. [PMID: 38536676 DOI: 10.1109/tcbb.2024.3381825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2024]
Abstract
Protein-to-protein interaction (PPI) prediction aims to predict whether two given proteins interact or not. Compared with traditional experimental methods, which are costly and inefficient, current deep learning based approaches make it possible to discover massive numbers of potential PPIs from large-scale databases. However, deep PPI prediction models perform poorly on unseen species, as their proteins are not in the training set. To address this issue, the paper first proposes PPITrans, a Transformer based PPI prediction model that exploits a language model pre-trained on proteins to conduct binary PPI prediction. To validate the effectiveness on unseen species, PPITrans is trained with Human PPIs and tested on PPIs of other species. Experimental results show that PPITrans significantly outperforms the previous state-of-the-art on various metrics, especially on PPIs of unseen species. For example, the AUPR improves by 0.339 in absolute terms on Fly PPIs. Aiming to explore the knowledge learned by PPITrans from PPI data, this paper also designs a series of probes belonging to three categories. Their results reveal several interesting findings; for example, although PPITrans cannot capture the spatial structure of proteins, it can obtain knowledge of PPI type and binding affinity, learning more than binary PPI.
|
16
|
Cai W, Liu P, Wang Z, Jiang H, Liu C, Fei Z, Yang Z. Link prediction in protein-protein interaction network: A similarity multiplied similarity algorithm with paths of length three. J Theor Biol 2024; 589:111850. [PMID: 38740126 DOI: 10.1016/j.jtbi.2024.111850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 03/26/2024] [Accepted: 05/03/2024] [Indexed: 05/16/2024]
Abstract
Protein-protein interactions (PPIs) are crucial for various biological processes, and predicting PPIs is a major challenge. To solve this issue, the most common method is link prediction. Currently, the link prediction methods based on network Paths of Length Three (L3) have been proven to be highly effective. In this paper, we propose a novel link prediction algorithm, named SMS, which is based on L3 and protein similarities. We first design a mixed similarity that combines the topological structure and attribute features of nodes. Then, we compute the predicted value by summing the product of all similarities along the L3. Furthermore, we propose the Max Similarity Multiplied Similarity (maxSMS) algorithm from the perspective of maximum impact. Our computational prediction results show that on six datasets, including S. cerevisiae, H. sapiens, and others, the maxSMS algorithm improves the precision of the top 500, area under the precision-recall curve, and normalized discounted cumulative gain by an average of 26.99%, 53.67%, and 6.7%, respectively, compared to other optimal methods.
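The scoring step described here, summing the product of similarities along every path of length three between two proteins, can be written as a matrix product. The sketch below is an illustration under assumptions, not the published SMS or maxSMS algorithm: the abstract does not specify the mixed similarity or any normalization, so `sim` is a hypothetical precomputed similarity matrix and the function name is made up.

```python
import numpy as np


def l3_similarity_scores(adj, sim):
    """Score non-adjacent pairs by summing similarity products over length-3 paths.

    adj: symmetric 0/1 adjacency matrix with no self-loops.
    sim: symmetric node-similarity matrix (hypothetical stand-in for a mixed similarity).
    """
    weighted = adj * sim                      # keep similarities only on existing edges
    # Entry (x, y) of weighted^3 sums sim(x,u) * sim(u,v) * sim(v,y) over walks x-u-v-y.
    # For non-adjacent x and y in a graph without self-loops, these walks are simple paths.
    scores = weighted @ weighted @ weighted
    scores[adj > 0] = 0.0                     # existing edges are not candidates
    np.fill_diagonal(scores, 0.0)
    return scores


# Toy path network 0-1-2-3 with a uniform similarity of 0.5 on every pair.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
sim = np.full((4, 4), 0.5)
scores = l3_similarity_scores(adj, sim)
```

In this toy network the only pair joined by a path of length three is (0, 3), so it is the only candidate with a non-zero score.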
Affiliation(s)
- Wangmin Cai
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
| | - Peiqiang Liu
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China.
| | - Zunfang Wang
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
| | - Hong Jiang
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
| | - Chang Liu
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
| | - Zhaojie Fei
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
| | - Zhuang Yang
- School of Computer Science and Technology, Shandong Technology and Business University, Yantai, China
| |
|
17
|
Tang T, Zhang X, Li W, Wang Q, Liu Y, Cao X. Co-training based prediction of multi-label protein-protein interactions. Comput Biol Med 2024; 177:108623. [PMID: 38788374 DOI: 10.1016/j.compbiomed.2024.108623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 05/01/2024] [Accepted: 05/16/2024] [Indexed: 05/26/2024]
Abstract
Prediction of protein-protein interaction (PPI) types enhances the comprehension of the underlying structural characteristics and functions of proteins, which gives rise to a multi-label classification problem. The nominal features describe the physicochemical characteristics of proteins directly, establishing a more robust correlation with the interaction types between proteins than ordered features. Motivated by this, we propose a multi-label PPI prediction model referred to as CoMPPI (Co-training based Multi-Label prediction of Protein-Protein Interaction). This approach aims to maximize the utility of both ordered and nominal features extracted from protein sequences. Specifically, CoMPPI incorporates a graph convolutional network (GCN) and a 1D convolution operation to process the complementary subsets of features individually, leveraging both local and contextualized information in a more efficient way. In addition, two multi-type PPI datasets were constructed to eliminate the duplication in previous datasets. We compare the performance of CoMPPI with three state-of-the-art methods on three datasets partitioned using distinct schemes (Breadth-first search, Depth-first search, and Random). CoMPPI consistently outperforms the other methods across all cases, demonstrating improvements ranging from 3.81% to 32.40% in Micro-F1. The subsequent ablation experiment confirms the efficacy of employing the co-training framework for multi-label PPI prediction, indicating promising avenues for future advancements in this domain.
Affiliation(s)
- Tao Tang
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Nanjing, 210023, Jiangsu, China
| | - Xiaocai Zhang
- Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, Singapore, 138632, Singapore
| | - Weizhuo Li
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Nanjing, 210023, Jiangsu, China
| | - Qing Wang
- School of Management, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Nanjing, 210023, Jiangsu, China
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, 2 Lushan Rd, Changsha, 410086, Hunan, China; Key Laboratory of Intelligent Computing & Signal Processing of Ministry of Education, Anhui University, 111 Jiulong Road, Hefei, 230601, Anhui, China.
| | - Xiaofeng Cao
- School of Artificial Intelligence, Jilin University, 2699 Qianjin St, Jilin, 130012, Changchun, China
| |
|
18
|
Cao MY, Zainudin S, Daud KM. Protein features fusion using attributed network embedding for predicting protein-protein interaction. BMC Genomics 2024; 25:466. [PMID: 38741045 DOI: 10.1186/s12864-024-10361-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 04/29/2024] [Indexed: 05/16/2024] Open
Abstract
BACKGROUND Protein-protein interactions (PPIs) hold significant importance in biology, with precise PPI prediction as a pivotal factor in comprehending cellular processes and facilitating drug design. However, experimental determination of PPIs is laborious, time-consuming, and often constrained by technical limitations. METHODS We introduce a new node representation method based on initial information fusion, called FFANE, which amalgamates PPI networks and protein sequence data to enhance the precision of PPIs' prediction. A Gaussian kernel similarity matrix is initially established by leveraging protein structural resemblances. Concurrently, protein sequence similarities are gauged using the Levenshtein distance, enabling the capture of diverse protein attributes. Subsequently, to construct an initial information matrix, these two feature matrices are merged by employing weighted fusion to achieve an organic amalgamation of structural and sequence details. To gain a more profound understanding of the amalgamated features, a Stacked Autoencoder (SAE) is employed for encoding learning, thereby yielding more representative feature representations. Ultimately, classification models are trained to predict PPIs by using the well-learned fusion feature. RESULTS When employing 5-fold cross-validation experiments on SVM, our proposed method achieved average accuracies of 94.28%, 97.69%, and 84.05% in terms of Saccharomyces cerevisiae, Homo sapiens, and Helicobacter pylori datasets, respectively. CONCLUSION Experimental findings across various authentic datasets validate the efficacy and superiority of this fusion feature representation approach, underscoring its potential value in bioinformatics.
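The initial information matrix described in the METHODS portion of this abstract can be sketched as follows. This is a hedged illustration rather than the FFANE code: it assumes the "structural" similarity is a Gaussian kernel over each protein's row of the PPI adjacency matrix, that Levenshtein distance is normalized by the longer sequence length, and that the fusion weight `alpha` and kernel width `gamma` are free parameters. None of these choices, nor the function names, are confirmed by the abstract.

```python
import numpy as np


def levenshtein(a, b):
    """Edit distance between two strings via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def sequence_similarity(seqs):
    """Pairwise similarity = 1 - normalized edit distance (assumes non-empty sequences)."""
    n = len(seqs)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            dist = levenshtein(seqs[i], seqs[j])
            sim[i, j] = sim[j, i] = 1.0 - dist / max(len(seqs[i]), len(seqs[j]))
    return sim


def gaussian_kernel(profiles, gamma=1.0):
    """Gaussian kernel similarity between row vectors (here, interaction profiles)."""
    diffs = profiles[:, None, :] - profiles[None, :, :]
    return np.exp(-gamma * np.sum(diffs ** 2, axis=-1))


def fuse(net_sim, seq_sim, alpha=0.5):
    """Weighted fusion of the two similarity matrices."""
    return alpha * net_sim + (1.0 - alpha) * seq_sim


# Toy network and made-up sequences, for illustration only.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
seqs = ["MKTAY", "MKTAV", "GGSLL"]
fused = fuse(gaussian_kernel(adj), sequence_similarity(seqs))
```

The fused matrix would then be the input to the encoding step (a Stacked Autoencoder in the abstract), which is not shown here.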
Affiliation(s)
- Mei-Yuan Cao
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia.
| | - Suhaila Zainudin
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia
| | - Kauthar Mohd Daud
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia
| |
|
19
|
Zeng X, Meng FF, Wen ML, Li SJ, Li Y. GNNGL-PPI: multi-category prediction of protein-protein interactions using graph neural networks based on global graphs and local subgraphs. BMC Genomics 2024; 25:406. [PMID: 38724906 PMCID: PMC11080243 DOI: 10.1186/s12864-024-10299-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 04/10/2024] [Indexed: 05/13/2024] Open
Abstract
Most proteins exert their functions by interacting with other proteins, making the identification of protein-protein interactions (PPI) crucial for understanding biological activities, pathological mechanisms, and clinical therapies. Developing effective and reliable computational methods for predicting PPI can significantly reduce the time and labor associated with traditional biological experiments. However, accurately identifying the specific categories of protein-protein interactions and improving the prediction accuracy of the computational methods remain dual challenges. To tackle these challenges, we proposed a novel graph neural network method called GNNGL-PPI for multi-category prediction of PPI based on global graphs and local subgraphs. GNNGL-PPI consisted of two main components: using Graph Isomorphism Network (GIN) to extract global graph features from the PPI network graph, and employing GIN As Kernel (GIN-AK) to extract local subgraph features from the subgraphs of protein vertices. Additionally, considering the imbalanced distribution of samples in each category within the benchmark datasets, we introduced an Asymmetric Loss (ASL) function to further enhance the predictive performance of the method. Through evaluations on six benchmark test sets formed by three different dataset partitioning algorithms (Random, BFS, DFS), GNNGL-PPI outperformed the state-of-the-art multi-category prediction methods of PPI, as measured by the comprehensive performance evaluation metric F1-measure. Furthermore, interpretability analysis confirmed the effectiveness of GNNGL-PPI as a reliable multi-category prediction method for predicting protein-protein interactions.
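For readers unfamiliar with the Asymmetric Loss (ASL) mentioned here, the following NumPy sketch shows one common formulation for multi-label classification: positives and negatives get separate focusing exponents, and negative probabilities are shifted down by a margin so that easy negatives contribute nothing. The hyperparameter values and the function name are illustrative assumptions, and this is not claimed to be the exact configuration used in GNNGL-PPI.

```python
import numpy as np


def asymmetric_loss(probs, targets, gamma_pos=1.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Asymmetric loss summed over all samples and labels.

    probs:   predicted probabilities in [0, 1].
    targets: binary labels with the same shape as probs.
    """
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # Shift negative-class probabilities by the margin; values below it become zero.
    shifted = np.clip(probs - margin, 0.0, 1.0)
    pos_term = targets * (1.0 - probs) ** gamma_pos * np.log(np.clip(probs, eps, 1.0))
    neg_term = (1.0 - targets) * shifted ** gamma_neg * np.log(np.clip(1.0 - shifted, eps, 1.0))
    return float(-(pos_term + neg_term).sum())


# Two proteins pairs, three hypothetical interaction categories each.
probs = np.array([[0.9, 0.2, 0.03],
                  [0.4, 0.7, 0.10]])
targets = np.array([[1, 0, 0],
                    [0, 1, 0]])
loss = asymmetric_loss(probs, targets)
```

With both exponents and the margin set to zero, this expression reduces to ordinary binary cross-entropy, which is a convenient sanity check when tuning the parameters for an imbalanced dataset.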
Affiliation(s)
- Xin Zeng
- College of Mathematics and Computer Science, Dali University, 671003, Dali, China
| | - Fan-Fang Meng
- College of Mathematics and Computer Science, Dali University, 671003, Dali, China
| | - Meng-Liang Wen
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, Yunnan University, 650000, Kunming, China
| | - Shu-Juan Li
- Yunnan Institute of Endemic Diseases Control & Prevention, 671000, Dali, China
| | - Yi Li
- College of Mathematics and Computer Science, Dali University, 671003, Dali, China.
| |
|
20
|
Lu C, Jiang J, Chen Q, Liu H, Ju X, Wang H. Analysis and prediction of interactions between transmembrane and non-transmembrane proteins. BMC Genomics 2024; 25:401. [PMID: 38658824 PMCID: PMC11040819 DOI: 10.1186/s12864-024-10251-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2022] [Accepted: 03/25/2024] [Indexed: 04/26/2024] Open
Abstract
BACKGROUND Most of the important biological mechanisms and functions of transmembrane proteins (TMPs) are realized through their interactions with non-transmembrane proteins (nonTMPs). The interactions between TMPs and nonTMPs in cells play vital roles in intracellular signaling, energy metabolism, membrane-crossing mechanisms, and the relationships between diseases and drugs. RESULTS Despite the importance of TMP-nonTMP interactions, their study remains at the wet-lab experimental stage, lacking specific and comprehensive studies in the field of bioinformatics. To fill this gap, we performed a comprehensive statistical analysis of known TMP-nonTMP interactions and constructed a deep learning-based predictor to identify potential interactions. The statistical analysis describes known TMP-nonTMP interactions from various perspectives, such as distributions of species and protein families, enrichment of GO and KEGG pathways, as well as hub proteins and subnetwork modules in the PPI network. The predictor, implemented as an end-to-end deep learning model, can identify potential interactions from protein primary sequence information. The experimental results on the independent validation set demonstrated considerable prediction performance, with an MCC of 0.541. CONCLUSIONS To our knowledge, we were the first to focus on TMP-nonTMP interactions. We comprehensively analyzed them using bioinformatics methods and predicted them via deep learning based solely on their sequences. This research completes a key link in the protein network, benefits the understanding of protein functions, and helps in pathogenesis studies of diseases and associated drug development.
Affiliation(s)
- Chang Lu
- School of Psychology, School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
| | - Jiuhong Jiang
- School of Psychology, School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
| | - Qiufen Chen
- School of Psychology, School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
| | - Huanhuan Liu
- School of Psychology, School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China
| | - Xingda Ju
- School of Psychology, School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China.
| | - Han Wang
- School of Psychology, School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China.
| |
|
21
|
Liu W, Wang Z, You R, Xie C, Wei H, Xiong Y, Yang J, Zhu S. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 2024; 15:2775. [PMID: 38555371 PMCID: PMC10981738 DOI: 10.1038/s41467-024-46808-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 03/08/2024] [Indexed: 04/02/2024] Open
Abstract
Homologous protein search is one of the most commonly used methods for protein annotation and analysis. Compared to structure search, detecting distant evolutionary relationships from sequences alone remains challenging. Here we propose PLMSearch (Protein Language Model), a homologous protein search method with only sequences as input. PLMSearch uses deep representations from a pre-trained protein language model and trains the similarity prediction model with a large number of real structure similarities. This enables PLMSearch to capture the remote homology information concealed behind the sequences. Extensive experimental results show that PLMSearch can search millions of query-target protein pairs in seconds like MMseqs2 while increasing the sensitivity by more than threefold, and is comparable to state-of-the-art structure search methods. In particular, unlike traditional sequence search methods, PLMSearch can recall most remote homology pairs with dissimilar sequences but similar structures. PLMSearch is freely available at https://dmiip.sjtu.edu.cn/PLMSearch.
Affiliation(s)
- Wei Liu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Ziye Wang
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Ronghui You
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China
| | - Chenghan Xie
- School of Mathematical Sciences, Fudan University, 200433, Shanghai, China
| | - Hong Wei
- School of Mathematical Sciences, Nankai University, 300071, Tianjin, China
| | - Yi Xiong
- Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, 200240, Shanghai, China
| | - Jianyi Yang
- Ministry of Education Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Science, Shandong University, 266237, Qingdao, China.
| | - Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China.
- Shanghai Qi Zhi Institute, Shanghai, China.
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China.
- Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai, China.
- Zhangjiang Fudan International Innovation Center, Shanghai, China.
| |
|
22
|
Dou Y, Ren Y, Zhao X, Jin J, Xiong S, Luo L, Xu X, Yang X, Yu J, Guo L, Liang T. CSSLdb: Discovery of cancer-specific synthetic lethal interactions based on machine learning and statistic inference. Comput Biol Med 2024; 170:108066. [PMID: 38310806 DOI: 10.1016/j.compbiomed.2024.108066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 12/22/2023] [Accepted: 01/27/2024] [Indexed: 02/06/2024]
Abstract
Synthetic lethality (SL) occurs when the inactivation of two genes results in cell death while the inactivation of either gene alone is non-lethal. SL-based therapy has become a promising anti-cancer treatment option with increasing research and applications in clinical practice, but the specific therapeutic opportunities for various cancers have not yet been comprehensively investigated. Herein, we describe a computational approach based on machine learning and statistical inference to discover cancer-specific synthetic lethal interactions. First, Random Forest and One-Class SVM were used to perform cancer-unbiased prediction of synthetic lethality. Then, two strategies, mutual exclusivity and differential expression, were used to screen cancer-specific synthetic lethal interactions, resulting in 14,582 SL gene pairs in 33 cancer types. Finally, we developed a freely available database, CSSLdb (Cancer Specific Synthetic Lethality Database, http://www.tmliang.cn/CSSL/), to present cancer-specific synthetic lethal genetic interactions, which should enrich the relevant research and contribute to therapy strategies based on synthetic lethality.
Affiliation(s)
- Yuyang Dou
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Yujie Ren
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Xinmiao Zhao
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Jiaming Jin
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Shizheng Xiong
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Lulu Luo
- Jiangsu Key Laboratory for Molecular and Medical Biotechnology, School of Life Science, Nanjing Normal University, Nanjing, 210023, China
| | - Xinru Xu
- Jiangsu Key Laboratory for Molecular and Medical Biotechnology, School of Life Science, Nanjing Normal University, Nanjing, 210023, China
| | - Xueni Yang
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
| | - Jiafeng Yu
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, 253023, China
| | - Li Guo
- State Key Laboratory of Organic Electronics and Information Displays & Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, Nanjing, 210023, China.
| | - Tingming Liang
- Jiangsu Key Laboratory for Molecular and Medical Biotechnology, School of Life Science, Nanjing Normal University, Nanjing, 210023, China.
| |
|
23
|
Pan X, Li Y, Huang P, Staecker H, He M. Extracellular vesicles for developing targeted hearing loss therapy. J Control Release 2024; 366:460-478. [PMID: 38182057 DOI: 10.1016/j.jconrel.2023.12.050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 12/19/2023] [Accepted: 12/28/2023] [Indexed: 01/07/2024]
Abstract
Substantial efforts have been made toward local administration of small molecules or biologics to treat hearing loss caused by trauma, genetic mutations, or drug ototoxicity. Recently, extracellular vesicles (EVs) naturally secreted from cells have drawn increasing attention for attenuating hearing impairment in both preclinical and clinical studies. A rapidly emerging field that uses diverse bioengineering technologies to develop EVs as bioderived therapeutic materials, along with artificial intelligence (AI)-based targeting toolkits, sheds light on the unique properties of EVs specific to inner ear delivery. This review illuminates this research field, from the fundamentals of the hearing-protective functions of EVs to biotechnology advances and the potential clinical translation of functionalized EVs. Specifically, advances in assessing targeting ligands using AI algorithms are systematically discussed. The overall translational potential of EVs is reviewed in the context of the auditory sensing system for developing next-generation gene therapy.
Affiliation(s)
- Xiaoshu Pan
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida 32610, United States
| | - Yanjun Li
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, Florida 32610, United States
| | - Peixin Huang
- Department of Otolaryngology, Head and Neck Surgery, University of Kansas School of Medicine, Kansas City, Kansas 66160, United States
| | - Hinrich Staecker
- Department of Otolaryngology, Head and Neck Surgery, University of Kansas School of Medicine, Kansas City, Kansas 66160, United States.
| | - Mei He
- Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, Florida 32610, United States.
|
24
|
Ghosh S, Mitra P. MaTPIP: A deep-learning architecture with eXplainable AI for sequence-driven, feature mixed protein-protein interaction prediction. Comput Methods Programs Biomed 2024; 244:107955. [PMID: 38064959 DOI: 10.1016/j.cmpb.2023.107955] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/07/2023] [Revised: 11/09/2023] [Accepted: 11/26/2023] [Indexed: 01/26/2024]
Abstract
BACKGROUND AND OBJECTIVE Protein-protein interaction (PPI) is a vital process in all living cells, controlling essential cell functions such as cell cycle regulation, signal transduction, and metabolic processes with broad applications that include antibody therapeutics, vaccines, and drug discovery. The problem of sequence-based PPI prediction has been a long-standing issue in computational biology. METHODS We introduce MaTPIP, a cutting-edge deep-learning framework for predicting PPI. MaTPIP stands out due to its innovative design, fusing pre-trained Protein Language Model (PLM)-based features with manually curated protein sequence attributes, emphasizing the part-whole relationship by incorporating two-dimensional granular part (amino-acid) level features and one-dimensional whole-level (protein) features. What sets MaTPIP apart is its ability to integrate these features across three different input terminals seamlessly. MaTPIP also includes a distinctive configuration of Convolutional Neural Network (CNN) with Transformer components for concurrent utilization of CNN and sequential characteristics in each iteration and a one-dimensional to two-dimensional converter followed by a unified embedding. The statistical significance of this classifier is validated using McNemar's test. RESULTS MaTPIP outperformed the existing methods on both the Human PPI benchmark and cross-species PPI testing datasets, demonstrating its immense generalization capability for PPI prediction. We used seven diverse datasets with varying PPI target class distributions. Notably, within the novel PPI scenario, the most challenging category for the Human PPI Benchmark, MaTPIP improves the existing state-of-the-art score from 74.1% to 78.6% (measured in Area under ROC Curve), from 23.2% to 32.8% (in average precision) and from 4.9% to 9.5% (in precision at 3% recall) for 50%, 10% and 0.3% target class distributions, respectively.
In cross-species PPI evaluation, hybrid MaTPIP establishes a new benchmark score (measured in Area Under precision-recall curve) of 81.1% from the previous 60.9% for Mouse, 80.9% from 56.2% for Fly, 78.1% from 55.9% for Worm, 59.9% from 41.7% for Yeast, and 66.2% from 58.8% for E. coli. Our eXplainable AI-based assessment reveals an average contribution of different feature families per prediction on these datasets. CONCLUSIONS MaTPIP mixes manually curated features with the features extracted from the pre-trained PLM to predict sequence-based protein-protein association. Furthermore, MaTPIP demonstrates strong generalization capabilities for cross-species PPI predictions.
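The abstract above mentions validating a classifier with McNemar's test. As a rough illustration only, the sketch below shows the standard continuity-corrected form of that test for two classifiers scored on the same examples. The function name `mcnemar_test` and the toy labels are hypothetical and are not taken from the MaTPIP paper, which may use a different variant of the test.

```python
import math


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's chi-square test (with continuity correction).

    Compares two classifiers on the same labelled examples using only
    the discordant pairs. Returns (statistic, p_value).
    """
    # b: classifier A correct, classifier B wrong
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    # c: classifier A wrong, classifier B correct
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        # No discordant pairs: the classifiers are indistinguishable here.
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Toy example (hypothetical data): A is right on all 20 pairs, B on only 10.
labels = [1] * 20
model_a = [1] * 20
model_b = [0] * 10 + [1] * 10
statistic, p = mcnemar_test(labels, model_a, model_b)
```

Only the pairs on which the two classifiers disagree enter the statistic, which is why the test suits paired comparisons on a shared test set.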
Affiliation(s)
- Shubhrangshu Ghosh
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India; TCS Research, Tata Consultancy Services Limited, Kolkata, West Bengal, India
| | - Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal, India.
|
25
|
Lannelongue L, Inouye M. Pitfalls of machine learning models for protein-protein interaction networks. Bioinformatics 2024; 40:btae012. [PMID: 38200587 PMCID: PMC10868344 DOI: 10.1093/bioinformatics/btae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 11/24/2023] [Accepted: 01/09/2024] [Indexed: 01/12/2024] Open
Abstract
MOTIVATION Protein-protein interactions (PPIs) are essential to understanding biological pathways as well as their roles in development and disease. Computational tools, based on classic machine learning, have been successful at predicting PPIs in silico, but the lack of consistent and reliable frameworks for this task has led to network models that are difficult to compare and discrepancies between algorithms that remain unexplained. RESULTS To better understand the underlying inference mechanisms that underpin these models, we designed an open-source framework for benchmarking that accounts for a range of biological and statistical pitfalls while facilitating reproducibility. We use it to shed light on the impact of network topology and how different algorithms deal with highly connected proteins. By studying functional genomics-based and sequence-based models on human PPIs, we show their complementarity as the former performs best on lone proteins while the latter specializes in interactions involving hubs. We also show that algorithm design has little impact on performance with functional genomic data. We replicate our results between both human and S. cerevisiae data and demonstrate that models using functional genomics are better suited to PPI prediction across species. With rapidly increasing amounts of sequence and functional genomics data, our study provides a principled foundation for future construction, comparison, and application of PPI networks. AVAILABILITY AND IMPLEMENTATION The code and data are available on GitHub: https://github.com/Llannelongue/B4PPI.
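The abstract above contrasts model performance on hubs and on lone proteins. One way to make such a comparison concrete is to split test pairs by node degree before scoring. The sketch below is a minimal illustration under stated assumptions: the `hub_threshold` value, the function names, and the toy network are all hypothetical, and this is not the B4PPI framework's actual procedure.

```python
from collections import Counter


def degree_table(edges):
    """Count how many known interactions each protein has."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def stratified_accuracy(train_edges, test_pairs, predictions, hub_threshold=3):
    """Accuracy reported separately for pairs that do or do not touch a hub.

    test_pairs: iterable of (protein_u, protein_v, true_label)
    predictions: predicted labels, aligned with test_pairs
    hub_threshold: hypothetical cut-off; a pair counts as "hub" if either
    protein has at least this many training interactions.
    """
    deg = degree_table(train_edges)
    hits = {"hub": 0, "non_hub": 0}
    totals = {"hub": 0, "non_hub": 0}
    for (u, v, label), pred in zip(test_pairs, predictions):
        group = "hub" if max(deg[u], deg[v]) >= hub_threshold else "non_hub"
        totals[group] += 1
        hits[group] += int(pred == label)
    # None marks a group with no test pairs.
    return {g: (hits[g] / totals[g] if totals[g] else None) for g in totals}


# Toy network (hypothetical): protein "A" is the only hub.
train = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
test = [("A", "E", 1), ("E", "G", 0), ("F", "G", 1)]
result = stratified_accuracy(train, test, predictions=[1, 0, 0])
```

Reporting the two groups separately keeps a strong score on hub pairs from masking weak performance on poorly connected proteins.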
Affiliation(s)
- Loïc Lannelongue
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
| | - Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, CB2 0BB Cambridge, United Kingdom
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, United Kingdom
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, 3004 Victoria, Australia
- British Heart Foundation Centre of Research Excellence, University of Cambridge, CB2 0BB Cambridge, United Kingdom
|
26
|
Giulini M, Honorato RV, Rivera JL, Bonvin AMJJ. ARCTIC-3D: automatic retrieval and clustering of interfaces in complexes from 3D structural information. Commun Biol 2024; 7:49. [PMID: 38184711 PMCID: PMC10771469 DOI: 10.1038/s42003-023-05718-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 12/18/2023] [Indexed: 01/08/2024] Open
Abstract
The formation of a stable complex between proteins lies at the core of a wide variety of biological processes and has been the focus of countless experiments. The huge amount of information contained in the protein structural interactome in the Protein Data Bank can now be used to characterise and classify the existing biological interfaces. We here introduce ARCTIC-3D, a fast and user-friendly data mining and clustering software to retrieve data and rationalise the interface information associated with the protein input data. We demonstrate its use by various examples ranging from showing the increased interaction complexity of eukaryotic proteins, 20% of which on average have more than 3 different interfaces compared to only 10% for prokaryotes, to associating different functions to different interfaces. In the context of modelling biomolecular assemblies, we introduce the concept of "recognition entropy", related to the number of possible interfaces of the components of a protein-protein complex, which we demonstrate to correlate with the modelling difficulty in classical docking approaches. The identified interface clusters can also be used to generate various combinations of interface-specific restraints for integrative modelling. The ARCTIC-3D software is freely available at github.com/haddocking/arctic3d and can be accessed as a web-service at wenmr.science.uu.nl/arctic3d.
Affiliation(s)
- Marco Giulini
- Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Rodrigo V Honorato
- Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Jesús L Rivera
- Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
| | - Alexandre M J J Bonvin
- Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands.
|
27
|
Li D, Xiao Z, Sun H, Jiang X, Zhao W, Shen X. Prediction of Drug-Disease Associations Based on Multi-Kernel Deep Learning Method in Heterogeneous Graph Embedding. IEEE/ACM Trans Comput Biol Bioinform 2024; 21:120-128. [PMID: 38051617 DOI: 10.1109/tcbb.2023.3339189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Computational drug repositioning can identify potential associations between drugs and diseases. This technology has been shown to be effective in accelerating drug development and reducing experimental costs. Although there has been plenty of research for this task, existing methods are deficient in utilizing complex relationships among biological entities, which may not be conducive to subsequent simulation of drug treatment processes. In this article, we propose a heterogeneous graph embedding method called HMLKGAT to infer novel potential drugs for diseases. More specifically, we first construct a heterogeneous information network by combining drug-disease, drug-protein and disease-protein biological networks. Then, a multi-layer graph attention model is utilized to capture the complex associations in the network to derive representations for drugs and diseases. Finally, to maintain the relationship of nodes in different feature spaces, we propose a multi-kernel learning method to transform and combine the representations. Experimental results demonstrate that HMLKGAT outperforms six state-of-the-art methods in drug-related disease prediction, and case studies of five classical drugs further demonstrate the effectiveness of HMLKGAT.
|
28
|
Chen HM, Liu JX, Liu D, Hao GF, Yang GF. Human-virus protein-protein interactions maps assist in revealing the pathogenesis of viral infection. Rev Med Virol 2024; 34:e2517. [PMID: 38282401 DOI: 10.1002/rmv.2517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 09/12/2023] [Accepted: 01/16/2024] [Indexed: 01/30/2024]
Abstract
Many significant viral infections have been recorded in human history, causing enormous negative impacts worldwide. Human-virus protein-protein interactions (PPIs) mediate viral infection and immune processes in the host. The identification, quantification, localization, and construction of human-virus PPI maps are critical prerequisites for understanding the biophysical basis of the viral invasion process and characterising the framework for all protein functions. With the technological revolution and the introduction of artificial intelligence, human-virus PPI maps have expanded rapidly in the past decade and shed light on complicated biomedical problems. However, the field still lacks prospective insight. In this work, we comprehensively review and compare the effectiveness, potential, and limitations of diverse approaches for constructing large-scale human-virus PPI maps, including experimental methods based on biophysics and biochemistry, databases of human-virus PPIs, computational methods based on artificial intelligence, and tools for visualising PPI maps. The work aims to provide a toolbox for researchers and to help decipher the relationship between humans and viruses.
Affiliation(s)
- Hui-Min Chen
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
| | - Jia-Xin Liu
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
| | - Di Liu
- CAS Key Laboratory of Special Pathogens and Biosafety, Wuhan Institute of Virology, Center for Biosafety Mega-Science, Chinese Academy of Sciences, Wuhan, China
| | - Ge-Fei Hao
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
| | - Guang-Fu Yang
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
|
29
|
Yao D, Mei S, Tang W, Xu X, Lu Q, Shi Z. AAAKB: A manually curated database for tracking and predicting genes of Abdominal aortic aneurysm (AAA). PLoS One 2023; 18:e0289966. [PMID: 38100461 PMCID: PMC10723669 DOI: 10.1371/journal.pone.0289966] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/15/2023] [Accepted: 07/31/2023] [Indexed: 12/17/2023] Open
Abstract
Abdominal aortic aneurysm (AAA), an extremely dangerous vascular disease with high mortality, causes massive internal bleeding due to aneurysm rupture. To boost research on AAA, efforts should be made to organize and link the information about AAA-related genes and their functions. Currently, most researchers screen genetic databases manually, which is cumbersome and time-consuming. Here, we developed "AAAKB", a manually curated knowledgebase containing genes, SNPs and pathways associated with AAA. To help researchers further explore the mechanism network of AAA, AAAKB provides predicted genes that are potentially associated with AAA. The prediction is based on the protein interaction information of genes collected in the database, and the random forest (RF) algorithm is used to build the prediction model. Some of these predicted genes are differentially expressed in patients with AAA, and some have been reported to play a role in other cardiovascular diseases, illustrating the utility of the knowledgebase in predicting novel genes. AAAKB also integrates a protein interaction visualization tool to quickly determine the shortest paths between target proteins. As the first knowledgebase to provide a comprehensive catalog of AAA-related genes, AAAKB will be an ideal research platform for AAA. Database URL: http://www.lqlgroup.cn:3838/AAAKB/.
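The abstract above describes a tool that determines shortest paths between target proteins in an interaction network. On an unweighted network this is commonly done with breadth-first search, as in the minimal sketch below. The function name `shortest_path` and the toy protein identifiers are hypothetical, and this is not AAAKB's own implementation.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search for one shortest path in an undirected network.

    edges: iterable of (protein_u, protein_v) interaction pairs
    Returns the path as a list of proteins, or None if no path exists.
    """
    # Build an undirected adjacency map.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if source not in adj or target not in adj:
        return None

    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):  # sorted for deterministic output
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Toy network (hypothetical identifiers).
ppi = [("P1", "P2"), ("P2", "P3"), ("P1", "P4"),
       ("P4", "P5"), ("P5", "P3"), ("P6", "P7")]
path = shortest_path(ppi, "P1", "P3")
```

Breadth-first search visits proteins in order of hop distance, so the first time the target is dequeued the recovered path has the fewest possible interactions.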
Affiliation(s)
- Di Yao
- Institute of Industrial Internet and Internet of Things, China Academy of Information and Communications Technology (CAICT), China
| | - Shuyuan Mei
- Key Laboratory of Cardiovascular and Cerebrovascular Medicine, School of Pharmacy, Nanjing Medical University, Nanjing, China
| | - Wangyang Tang
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
| | - Xingyu Xu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
| | - Qiulun Lu
- Key Laboratory of Cardiovascular and Cerebrovascular Medicine, School of Pharmacy, Nanjing Medical University, Nanjing, China
| | - Zhiguang Shi
- Key Laboratory of Cardiovascular and Cerebrovascular Medicine, School of Pharmacy, Nanjing Medical University, Nanjing, China
|
30
|
Wu J, Liu B, Zhang J, Wang Z, Li J. DL-PPI: a method on prediction of sequenced protein-protein interaction based on deep learning. BMC Bioinformatics 2023; 24:473. [PMID: 38097937 PMCID: PMC10722729 DOI: 10.1186/s12859-023-05594-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/08/2023] [Accepted: 12/01/2023] [Indexed: 12/17/2023] Open
Abstract
PURPOSE Sequenced Protein-Protein Interaction (PPI) prediction represents a pivotal area of study in biology, playing a crucial role in elucidating the mechanistic underpinnings of diseases and facilitating the design of novel therapeutic interventions. Conventional methods for extracting features through experimental processes have proven to be both costly and exceedingly complex. In light of these challenges, the scientific community has turned to computational approaches, particularly those grounded in deep learning methodologies. Despite the progress achieved by current deep learning technologies, their effectiveness diminishes when applied to larger, unfamiliar datasets. RESULTS This study introduces a novel deep learning framework, termed DL-PPI, for predicting PPIs based on sequence data. The proposed framework comprises two key components aimed at improving the accuracy of feature extraction from individual protein sequences and capturing relationships between proteins in unfamiliar datasets. 1. Protein Node Feature Extraction Module: To enhance the accuracy of feature extraction from individual protein sequences and facilitate the understanding of relationships between proteins in unknown datasets, we devised a novel protein node feature extraction module utilizing the Inception method. This module efficiently captures relevant patterns and representations within protein sequences, enabling more informative feature extraction. 2. Feature-Relational Reasoning Network (FRN): In the Global Feature Extraction module of our model, we developed a novel FRN that leverages Graph Neural Networks to determine interactions between pairs of input proteins. The FRN effectively captures the underlying relational information between proteins, contributing to improved PPI predictions. The DL-PPI framework demonstrates state-of-the-art performance in the realm of sequence-based PPI prediction.
Affiliation(s)
- Jiahui Wu
- Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
| | - Bo Liu
- School of Mathematical and Computational Sciences, Massey University, Auckland, 0745, New Zealand.
| | - Jidong Zhang
- Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
| | - Zhihan Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
| | - Jianqiang Li
- Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
|
31
|
Cong H, Liu H, Cao Y, Liang C, Chen Y. Protein-protein interaction site prediction by model ensembling with hybrid feature and self-attention. BMC Bioinformatics 2023; 24:456. [PMID: 38053020 DOI: 10.1186/s12859-023-05592-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Accepted: 11/30/2023] [Indexed: 12/07/2023] Open
Abstract
BACKGROUND Protein-protein interactions (PPIs) are crucial in various biological functions and cellular processes. Thus, many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in sequences. Many feature extraction methods rely on the sliding window technique, which simply merges all the features of residues into a vector. The importance of some key residues may be weakened in the feature vector, leading to poor performance. RESULTS We propose a novel sequence-based method for PPI site prediction. The new network model, PPINet, contains multiple feature processing paths. For a residue, PPINet extracts the features of the targeted residue and its context separately. These two types of features are processed by two paths in the network and combined to form a protein representation, where the two types of features are of relatively equal importance. The model ensembling technique is applied to make use of more features. The base models are trained with different features and then ensembled via stacking. In addition, a data balancing strategy is presented, by which our model achieves a significant improvement on highly unbalanced data. CONCLUSION The proposed method is evaluated on a fused dataset constructed from Dset_186, Dset_72, and PDBset_164, as well as the public Dset_448 dataset. Our method outperforms current state-of-the-art methods. On the most important metrics, AUPRC and recall, it surpasses the second-best method on the latter dataset by 6.9% and 4.7%, respectively. We also demonstrate that the improvement is essentially due to the ensemble model and, in particular, the hybrid feature. We share our code for reproducibility and future research at https://github.com/CandiceCong/StackingPPINet.
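The abstract above contrasts the classic sliding window, which merges all residue features into one vector, with extracting the target residue and its context separately. The sketch below illustrates that difference on toy per-residue feature vectors. The function names, the zero padding at sequence ends, and the feature values are assumptions made for illustration, not PPINet's actual encoding.

```python
def merged_window(residue_features, index, half_width, pad=0.0):
    """Classic sliding window: concatenate every residue in the window."""
    dim = len(residue_features[0])
    merged = []
    for pos in range(index - half_width, index + half_width + 1):
        if 0 <= pos < len(residue_features):
            merged.extend(residue_features[pos])
        else:
            merged.extend([pad] * dim)  # pad positions beyond the sequence ends
    return merged


def split_window(residue_features, index, half_width, pad=0.0):
    """Return the target residue and its context as two separate vectors."""
    dim = len(residue_features[0])
    target = list(residue_features[index])
    context = []
    for pos in range(index - half_width, index + half_width + 1):
        if pos == index:
            continue  # the target is kept out of the context vector
        if 0 <= pos < len(residue_features):
            context.extend(residue_features[pos])
        else:
            context.extend([pad] * dim)
    return target, context


# Toy sequence of three residues with two features each (hypothetical values).
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
flat = merged_window(feats, index=0, half_width=1)
target_vec, context_vec = split_window(feats, index=0, half_width=1)
```

Keeping the target vector separate lets a downstream model give it its own processing path, so it is not diluted among its neighbours as the window grows.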
Affiliation(s)
- Hanhan Cong
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
| | - Hong Liu
- School of Information Science and Engineering, Shandong Normal University, Jinan, China.
- Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China.
| | - Yi Cao
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
| | - Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, China
| | - Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
|
32
|
Sun S, Zheng Z, Wang J, Li F, He A, Lai K, Zhang S, Lu JH, Tian R, Tan CSH. Improved in situ characterization of protein complex dynamics at scale with thermal proximity co-aggregation. Nat Commun 2023; 14:7697. [PMID: 38001062 PMCID: PMC10673876 DOI: 10.1038/s41467-023-43526-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/23/2023] [Accepted: 11/13/2023] [Indexed: 11/26/2023] Open
Abstract
Cellular activities are largely carried out by protein complexes, but a large repertoire of protein complexes remains functionally uncharacterized, which necessitates new strategies to delineate their roles in various cellular processes and diseases. Thermal proximity co-aggregation (TPCA) is readily deployable to characterize protein complex dynamics in situ and at scale. We develop a version termed Slim-TPCA that uses fewer temperatures, increasing throughput by over 3X, with new scoring metrics and statistical evaluation that result in minimal compromise in coverage and detect more relevant complexes. Fewer samples are needed and batch effects are minimized, while the cost of statistical evaluation is reduced by two orders of magnitude. We applied Slim-TPCA to profile K562 cells under different durations of glucose deprivation. More protein complexes are found dissociated, in accordance with the expected downregulation of most cellular activities; these include the 55S ribosome and respiratory complexes in mitochondria, revealing the utility of TPCA for studying protein complexes in organelles. Protein complexes in protein transport and degradation are found increasingly assembled, unveiling their involvement in metabolic reprogramming during glucose deprivation. In summary, Slim-TPCA is an efficient strategy for characterization of protein complexes at scale across cellular conditions, and is available as a Python package at https://pypi.org/project/Slim-TPCA/.
Affiliation(s)
- Siyuan Sun
- Department of Chemistry and Research Center for Chemical Biology and Omics Analysis, College of Science, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - Zhenxiang Zheng
- Department of Chemistry and Research Center for Chemical Biology and Omics Analysis, College of Science, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - Jun Wang
- Department of Chemistry and Research Center for Chemical Biology and Omics Analysis, College of Science, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - Fengming Li
- Department of Chemistry and Research Center for Chemical Biology and Omics Analysis, College of Science, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - An He
- Department of Chemistry and Research Center for Chemical Biology and Omics Analysis, College of Science, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - Kunjia Lai
- Department of Chemistry and Research Center for Chemical Biology and Omics Analysis, College of Science, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - Shuang Zhang
- Department of Chemistry and Research Center for Chemical Biology and Omics Analysis, College of Science, Southern University of Science and Technology, Shenzhen, Guangdong, China
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Zhuhai, Macau SAR, China
| | - Jia-Hong Lu
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Zhuhai, Macau SAR, China
| | - Ruijun Tian
- Department of Chemistry and Research Center for Chemical Biology and Omics Analysis, College of Science, Southern University of Science and Technology, Shenzhen, Guangdong, China
| | - Chris Soon Heng Tan
- Department of Chemistry and Research Center for Chemical Biology and Omics Analysis, College of Science, Southern University of Science and Technology, Shenzhen, Guangdong, China.
|
33
|
Bi X, Liang W, Zhao Q, Wang J. SSLpheno: a self-supervised learning approach for gene-phenotype association prediction using protein-protein interactions and gene ontology data. Bioinformatics 2023; 39:btad662. [PMID: 37941450 PMCID: PMC10666204 DOI: 10.1093/bioinformatics/btad662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 10/17/2023] [Accepted: 11/03/2023] [Indexed: 11/10/2023] Open
Abstract
MOTIVATION Medical genomics faces significant challenges in interpreting disease phenotype and genetic heterogeneity. Despite the establishment of standardized disease phenotype databases, computational methods for predicting gene-phenotype associations still suffer from imbalanced category distribution and a lack of labeled data in small categories. RESULTS To address the problem of labeled-data scarcity, we propose a self-supervised learning strategy for gene-phenotype association prediction, called SSLpheno. Our approach utilizes an attributed network that integrates protein-protein interactions and gene ontology data. We apply a Laplacian-based filter to ensure feature smoothness and use self-supervised training to optimize node feature representation. Specifically, we calculate the cosine similarity of feature vectors and select positive and negative sample nodes for reconstruction training labels. We employ a deep neural network for multi-label classification of phenotypes in the downstream task. Our experimental results demonstrate that SSLpheno outperforms state-of-the-art methods, especially in categories with fewer annotations. Moreover, our case studies illustrate the potential of SSLpheno as an effective prescreening tool for gene-phenotype association identification. AVAILABILITY AND IMPLEMENTATION https://github.com/bixuehua/SSLpheno.
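The abstract above describes computing cosine similarity between feature vectors and selecting positive and negative sample nodes as reconstruction labels. The sketch below shows one plausible reading of that step: pairs above a high similarity cut-off become positives and pairs below a low cut-off become negatives. Both thresholds, the function names, and the toy vectors are hypothetical; the abstract does not give the paper's actual selection rule.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_pairs(features, pos_threshold=0.9, neg_threshold=0.1):
    """Pick pseudo-labelled node pairs from pairwise feature similarity.

    pos_threshold / neg_threshold are assumed cut-offs for illustration.
    Pairs with similarity between the two thresholds are left unlabelled.
    """
    positives, negatives = [], []
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(features[i], features[j])
            if sim >= pos_threshold:
                positives.append((i, j))
            elif sim <= neg_threshold:
                negatives.append((i, j))
    return positives, negatives


# Toy smoothed node features (hypothetical values).
node_features = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [1.0, 1.0]]
pos_pairs, neg_pairs = select_pairs(node_features)
```

Leaving the ambiguous middle band unlabelled is a design choice of this sketch; it keeps only the pairs whose similarity gives a confident training signal.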
Affiliation(s)
- Xuehua Bi
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi 830017, China
| | - Weiyang Liang
- College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
| | - Qichang Zhao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
|
34
|
Li ZW, Wang QK, Yuan CA, Han PY, You ZH, Wang L. Predicting MiRNA-Disease Associations by Graph Representation Learning Based on Jumping Knowledge Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2629-2638. [PMID: 35925844 DOI: 10.1109/tcbb.2022.3196394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Growing studies have shown that miRNAs are inextricably linked with many human diseases, and a great deal of effort has been spent on identifying their potential associations. Compared with traditional experimental methods, computational approaches have achieved promising results. In this article, we propose a graph representation learning method to predict miRNA-disease associations. Specifically, we first integrate the verified miRNA-disease associations with the similarity information of miRNA and disease to construct a miRNA-disease heterogeneous graph. Then, we apply a graph attention network to aggregate the neighbor information of nodes in each layer, and then feed the representation of the hidden layer into the structure-aware jumping knowledge network to obtain the global features of nodes. The output features of miRNAs and diseases are then concatenated and fed into a fully connected layer to score the potential associations. Through five-fold cross-validation, the average AUC, accuracy and precision values of our model are 93.30%, 85.18% and 88.90%, respectively. In addition, for three case studies of the esophageal tumor, lymphoma and prostate tumor, 46, 45 and 45 of the top 50 miRNAs predicted by our model were confirmed by relevant databases. Overall, our method could provide a reliable alternative for miRNA-disease association prediction.
|
35
|
Luo X, Wang L, Hu P, Hu L. Predicting Protein-Protein Interactions Using Sequence and Network Information via Variational Graph Autoencoder. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3182-3194. [PMID: 37155405 DOI: 10.1109/tcbb.2023.3273567] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/10/2023]
Abstract
Protein-protein interactions (PPIs) play a critical role in proteomics, and a variety of computational algorithms have been developed to predict PPIs. Though effective, their performance is constrained by high false-positive and false-negative rates observed in PPI data. To overcome this problem, a novel PPI prediction algorithm, namely PASNVGA, is proposed in this work by combining the sequence and network information of proteins via a variational graph autoencoder. To do so, PASNVGA first applies different strategies to extract the features of proteins from their sequence and network information, and obtains a more compact form of these features using principal component analysis. In addition, PASNVGA designs a scoring function to measure the higher-order connectivity between proteins so as to obtain a higher-order adjacency matrix. With all these features and adjacency matrices, PASNVGA trains a variational graph autoencoder model to further learn the integrated embeddings of proteins. The prediction task is then completed by using a simple feedforward neural network. Extensive experiments have been conducted on five PPI datasets collected from different species. Compared with several state-of-the-art algorithms, PASNVGA has been demonstrated to be a promising PPI prediction algorithm.
|
36
|
Hasman M, Mayr M, Theofilatos K. Uncovering Protein Networks in Cardiovascular Proteomics. Mol Cell Proteomics 2023; 22:100607. [PMID: 37356494 PMCID: PMC10460687 DOI: 10.1016/j.mcpro.2023.100607] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/14/2022] [Revised: 05/01/2023] [Accepted: 06/20/2023] [Indexed: 06/27/2023] Open
Abstract
Biological networks have been widely used in many different diseases to identify potential biomarkers and design drug targets. In the present review, we describe the main computational techniques for reconstructing and analyzing different types of protein networks and summarize the previous applications of such techniques in cardiovascular diseases. Existing tools are critically compared, discussing when each method is preferred, such as the use of co-expression networks for functional annotation of protein clusters and the use of directed networks for inferring regulatory associations. We also present examples of reconstructing protein networks of different types (regulatory, co-expression, and protein-protein interaction networks). We demonstrate the necessity to reconstruct networks separately for each cardiovascular tissue type and disease entity and provide illustrative examples of the importance of taking into consideration relevant post-translational modifications. Finally, we demonstrate and discuss how the findings of protein networks could be interpreted using single-cell RNA-sequencing data.
Affiliation(s)
- Maria Hasman
- King's British Heart Foundation Centre, Kings College London, London, United Kingdom
| | - Manuel Mayr
- King's British Heart Foundation Centre, Kings College London, London, United Kingdom
| | | |
|
37
|
Kuo KM, Talley PC, Chang CS. The accuracy of artificial intelligence used for non-melanoma skin cancer diagnoses: a meta-analysis. BMC Med Inform Decis Mak 2023; 23:138. [PMID: 37501114 PMCID: PMC10375663 DOI: 10.1186/s12911-023-02229-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 07/07/2023] [Indexed: 07/29/2023] Open
Abstract
BACKGROUND With rising incidence of skin cancer and relatively increased mortality rates, an improved diagnosis of such a potentially fatal disease is of vital importance. Although frequently curable, it nevertheless places a considerable burden upon healthcare systems. Among the various types of skin cancers, non-melanoma skin cancer is most prevalent. Despite such prevalence and its associated cost, scant proof concerning the diagnostic accuracy via Artificial Intelligence (AI) for non-melanoma skin cancer exists. This study meta-analyzes the diagnostic test accuracy of AI used to diagnose non-melanoma forms of skin cancer, and it identifies potential covariates that account for heterogeneity between extant studies. METHODS Various electronic databases (Scopus, PubMed, ScienceDirect, SpringerLink, and Dimensions) were examined to discern eligible studies beginning from March 2022. Those AI studies predictive of non-melanoma skin cancer were included. Summary estimates of sensitivity, specificity, and area under receiver operating characteristic curves were used to evaluate diagnostic accuracy. The revised Quality Assessment of Diagnostic Studies served to assess any risk of bias. RESULTS A literature search produced 39 eligible articles for meta-analysis. The summary sensitivity, specificity, and area under receiver operating characteristic curve of AI for diagnosing non-melanoma skin cancer were 0.78, 0.98, and 0.97, respectively. Skin cancer typology, data sources, cross validation, ensemble models, types of techniques, pre-trained models, and image augmentation were significant covariates accounting for heterogeneity in terms of sensitivity and/or specificity. CONCLUSIONS Meta-analysis results revealed that AI is predictive of non-melanoma skin cancer with an acceptable performance, but sensitivity could be improved. Further, ensemble models and pre-trained models can be employed to improve the true positive rate.
Affiliation(s)
- Kuang Ming Kuo
- Department of Business Management, National United University, No.1, Miaoli, 360301, Lienda, Taiwan, Republic of China
| | - Paul C Talley
- Department of Applied English, I-Shou University, No. 1, Sec. 1, Syuecheng Rd., Dashu District, 84001, Kaohsiung City, Taiwan, Republic of China
| | - Chao-Sheng Chang
- Department of Occupational Therapy, I-Shou University, No. 1, Yida Rd., Yanchao District, 82445, Kaohsiung City, Taiwan, Republic of China.
- Department of Emergency Medicine, E-Da Hospital, I-Shou University, Kaohsiung, Taiwan, Republic of China.
| |
|
38
|
Lee M. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 2023; 28:5169. [PMID: 37446831 DOI: 10.3390/molecules28135169] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 05/30/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 07/15/2023] Open
Abstract
Deep learning, a potent branch of artificial intelligence, is steadily leaving its transformative imprint across multiple disciplines. Within computational biology, it is expediting progress in the understanding of Protein-Protein Interactions (PPIs), key components governing a wide array of biological functionalities. Hence, an in-depth exploration of PPIs is crucial for decoding the intricate biological system dynamics and unveiling potential avenues for therapeutic interventions. As the deployment of deep learning techniques in PPI analysis proliferates at an accelerated pace, there exists an immediate demand for an exhaustive review that encapsulates and critically assesses these novel developments. Addressing this requirement, this review offers a detailed analysis of the literature from 2021 to 2023, highlighting the cutting-edge deep learning methodologies harnessed for PPI analysis. Thus, this review stands as a crucial reference for researchers in the discipline, presenting an overview of the recent studies in the field. This consolidation helps elucidate the dynamic paradigm of PPI analysis, the evolution of deep learning techniques, and their interdependent dynamics. This scrutiny is expected to serve as a vital aid for researchers, both well-established and newcomers, assisting them in maneuvering the rapidly shifting terrain of deep learning applications in PPI analysis.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
| |
|
39
|
Zhang J, Zhou P, Zheng Y, Wu H. Predicting influenza with pandemic-awareness via Dynamic Virtual Graph Significance Networks. Comput Biol Med 2023; 158:106807. [PMID: 37001208 DOI: 10.1016/j.compbiomed.2023.106807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 02/20/2023] [Accepted: 03/20/2023] [Indexed: 03/30/2023]
Abstract
Every year, influenza spreads worldwide and burdens people's health substantially. We need a reliable model to help hospitals, pharmaceutical companies, and governments better prepare for influenza outbreaks in a timely manner. However, the domain knowledge for such public health events, such as the variable influenza seasonality and occasional pandemics, poses significant challenges in predicting influenza outbreaks. The existing methods use current and historical values in a user-defined time window as input to predict future values but do not consider situations outside the window. To address these limitations, we proposed Dynamic Virtual Graph Significance Networks (DVGSN). The graph-based algorithm can learn, dynamically and in a supervised manner, the implied knowledge from similar "infection situations" at all historical timepoints without the limitation of a time window. Furthermore, representation learning on the dynamic virtual graph can tackle the varied seasonality with pandemic-awareness without requiring domain knowledge input. The extensive experiments on real-world influenza data demonstrate that DVGSN significantly outperforms the state-of-the-art methods. To the best of our knowledge, this is the first attempt to learn a dynamic virtual graph in a supervised manner for time-series prediction tasks. Moreover, the proposed method is highly interpretable, which makes it more acceptable in the fields of public health, life sciences, and so on. Our source code and dataset are available at https://github.com/aI-area/DVGSN.
|
40
|
Subbaroyan A, Sil P, Martin OC, Samal A. Leveraging developmental landscapes for model selection in Boolean gene regulatory networks. Brief Bioinform 2023; 24:7145905. [PMID: 37114653 DOI: 10.1093/bib/bbad160] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/25/2023] [Revised: 03/26/2023] [Accepted: 04/03/2023] [Indexed: 04/29/2023] Open
Abstract
Boolean models are a well-established framework to model developmental gene regulatory networks (DGRNs) for acquisition of cellular identities. During the reconstruction of Boolean DGRNs, even if the network structure is given, there is generally a large number of combinations of Boolean functions that will reproduce the different cell fates (biological attractors). Here we leverage the developmental landscape to enable model selection on such ensembles using the relative stability of the attractors. First we show that previously proposed measures of relative stability are strongly correlated and we stress the usefulness of the one that captures best the cell state transitions via the mean first passage time (MFPT) as it also allows the construction of a cellular lineage tree. A property of great computational importance is the insensitivity of the different stability measures to changes in noise intensities. That allows us to use stochastic approaches to estimate the MFPT and thereby scale up the computations to large networks. Given this methodology, we revisit different Boolean models of Arabidopsis thaliana root development, showing that a most recent one does not respect the biologically expected hierarchy of cell states based on relative stabilities. We therefore developed an iterative greedy algorithm that searches for models which satisfy the expected hierarchy of cell states and found that its application to the root development model yields many models that meet this expectation. Our methodology thus provides new tools that can enable reconstruction of more realistic and accurate Boolean models of DGRNs.
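To make the notion of attractors in a Boolean network concrete, the sketch below enumerates all states of a tiny network and follows each one until it repeats. It assumes deterministic synchronous updates and a hypothetical two-gene toggle switch; it does not compute the mean first passage time or any relative stability measure discussed in the abstract, which rely on stochastic dynamics.

```python
from itertools import product

def find_attractors(update_fns):
    """Return all attractors (fixed points and cycles) of a Boolean network
    under synchronous updates, found by exhaustive state enumeration."""
    n = len(update_fns)
    attractors = set()
    for state in product((0, 1), repeat=n):
        seen = {}
        trajectory = []
        # Follow the trajectory until a state repeats.
        while state not in seen:
            seen[state] = len(trajectory)
            trajectory.append(state)
            state = tuple(int(f(state)) for f in update_fns)
        cycle = trajectory[seen[state]:]
        # Rotate the cycle so its smallest state comes first, giving
        # every attractor a single canonical representation.
        start = cycle.index(min(cycle))
        attractors.add(tuple(cycle[start:] + cycle[:start]))
    return sorted(attractors)

# Hypothetical toggle switch: each gene is repressed by the other.
toggle = [
    lambda s: not s[1],  # gene 0 is ON when gene 1 is OFF
    lambda s: not s[0],  # gene 1 is ON when gene 0 is OFF
]
toggle_attractors = find_attractors(toggle)
```

For this toy network the search finds two fixed points, `(0, 1)` and `(1, 0)`, plus a two-state cycle between `(0, 0)` and `(1, 1)` that exists only because of the synchronous update assumption. Exhaustive enumeration is exponential in the number of genes, so it is suitable only for small networks.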
Affiliation(s)
- Ajay Subbaroyan
- The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India
- Homi Bhabha National Institute (HBNI), Mumbai, 400094, India
| | - Priyotosh Sil
- The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India
- Homi Bhabha National Institute (HBNI), Mumbai, 400094, India
| | - Olivier C Martin
- Université Paris-Saclay, CNRS, INRAE, Univ Evry, Institute of Plant Sciences Paris-Saclay (IPS2), 91405, Orsay, France
- Université de Paris, CNRS, INRAE, Institute of Plant Sciences Paris-Saclay (IPS2), 91405, Orsay, France
| | - Areejit Samal
- The Institute of Mathematical Sciences (IMSc), Chennai, 600113, India
- Homi Bhabha National Institute (HBNI), Mumbai, 400094, India
| |
|
41
|
Sousa DF, Couto FM. K-RET: knowledgeable biomedical relation extraction system. Bioinformatics (Oxford, England) 2023; 39:7108769. [PMID: 37018156 PMCID: PMC10112952 DOI: 10.1093/bioinformatics/btad174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 02/25/2023] [Accepted: 03/29/2023] [Indexed: 04/20/2023]
Abstract
MOTIVATION Relation extraction (RE) is a crucial process to deal with the amount of text published daily, e.g. to find missing associations in a database. RE is a text mining task for which the state-of-the-art approaches use bidirectional encoders, namely, BERT. However, state-of-the-art performance may be limited by the lack of efficient external knowledge injection approaches, with a larger impact in the biomedical area given the widespread usage and high quality of biomedical ontologies. This knowledge can propel these systems forward by aiding them in predicting more explainable biomedical associations. With this in mind, we developed K-RET, a novel, knowledgeable biomedical RE system that, for the first time, injects knowledge by handling different types of associations, multiple sources and where to apply it, and multi-token entities. RESULTS We tested K-RET on three independent and open-access corpora (DDI, BC5CDR, and PGR) using four biomedical ontologies handling different entities. K-RET improved state-of-the-art results by 2.68% on average, with the DDI Corpus yielding the most significant boost in performance, from 79.30% to 87.19% in F-measure, representing a P-value of 2.91 × 10⁻¹². AVAILABILITY AND IMPLEMENTATION https://github.com/lasigeBioTM/K-RET.
Affiliation(s)
- Diana F Sousa
- Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
| | - Francisco M Couto
- Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
| |
|
42
|
Huang Y, Wuchty S, Zhou Y, Zhang Z. SGPPI: structure-aware prediction of protein-protein interactions in rigorous conditions with graph convolutional network. Brief Bioinform 2023; 24:6995378. [PMID: 36682013 DOI: 10.1093/bib/bbad020] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/09/2022] [Revised: 11/17/2022] [Accepted: 01/05/2023] [Indexed: 01/23/2023] Open
Abstract
While deep learning (DL)-based models have emerged as powerful approaches to predict protein-protein interactions (PPIs), the reliance on explicit similarity measures (e.g. sequence similarity and network neighborhood) to known interacting proteins makes these methods ineffective in dealing with novel proteins. The advent of AlphaFold2 presents a significant opportunity and also a challenge to predict PPIs in a straightforward way based on monomer structures while controlling bias from protein sequences. In this work, we established Structure and Graph-based Predictions of Protein Interactions (SGPPI), a structure-based DL framework for predicting PPIs, using the graph convolutional network. In particular, SGPPI focused on protein patches on the protein-protein binding interfaces and extracted the structural, geometric and evolutionary features from the residue contact map to predict PPIs. We demonstrated that our model outperforms traditional machine learning methods and state-of-the-art DL-based methods using non-representation-bias benchmark datasets. Moreover, our model trained on human dataset can be reliably transferred to predict yeast PPIs, indicating that SGPPI can capture converging structural features of protein interactions across various species. The implementation of SGPPI is available at https://github.com/emerson106/SGPPI.
Affiliation(s)
- Yan Huang
- State Key Laboratory of Livestock and Poultry Biotechnology Breeding, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Department of Biomedical Informatics, Ministry of Education Key Laboratory of Molecular Cardiovascular Sciences, Center for Non-Coding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing 100191, China
| | - Stefan Wuchty
- Department of Computer Science, University of Miami, Coral Gables, FL 33146, USA
- Department of Biology, University of Miami, Coral Gables, FL 33146, USA
- Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL 33136, USA
- Institute of Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
| | - Yuan Zhou
- Department of Biomedical Informatics, Ministry of Education Key Laboratory of Molecular Cardiovascular Sciences, Center for Non-Coding RNA Medicine, School of Basic Medical Sciences, Peking University, Beijing 100191, China
| | - Ziding Zhang
- State Key Laboratory of Livestock and Poultry Biotechnology Breeding, College of Biological Sciences, China Agricultural University, Beijing 100193, China
| |
|
43
|
Gao R, Luo J, Ding H, Zhai H. INSnet: a method for detecting insertions based on deep learning network. BMC Bioinformatics 2023; 24:80. [PMID: 36879189 PMCID: PMC9990265 DOI: 10.1186/s12859-023-05216-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/27/2022] [Accepted: 03/01/2023] [Indexed: 03/08/2023] Open
Abstract
BACKGROUND Many studies have shown that structural variations (SVs) strongly impact human disease. As a common type of SV, insertions are usually associated with genetic diseases. Therefore, accurately detecting insertions is of great significance. Although many methods for detecting insertions have been proposed, these methods often generate some errors and miss some variants. Hence, accurately detecting insertions remains a challenging task. RESULTS In this paper, we propose a method named INSnet to detect insertions using a deep learning network. First, INSnet divides the reference genome into continuous sub-regions and takes five features for each locus through alignments between long reads and the reference genome. Next, INSnet uses a depthwise separable convolutional network. The convolution operation extracts informative features through spatial information and channel information. INSnet uses two attention mechanisms, the convolutional block attention module (CBAM) and efficient channel attention (ECA), to extract key alignment features in each sub-region. In order to capture the relationship between adjacent sub-regions, INSnet uses a gated recurrent unit (GRU) network to further extract more important SV signatures. After predicting whether a sub-region contains an insertion through the previous steps, INSnet determines the precise site and length of the insertion. The source code is available from GitHub at https://github.com/eioyuou/INSnet. CONCLUSION Experimental results show that INSnet can achieve better performance than other methods in terms of F1 score on real datasets.
Affiliation(s)
- Runtian Gao
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.
| | - Hongyu Ding
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
| | - Haixia Zhai
- School of Software, Henan Polytechnic University, Jiaozuo, 454003, China
| |
|
44
|
Wang X, Yang W, Yang Y, He Y, Zhang J, Wang L, Hu L. PPISB: A Novel Network-Based Algorithm of Predicting Protein-Protein Interactions With Mixed Membership Stochastic Blockmodel. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1606-1612. [PMID: 35939453 DOI: 10.1109/tcbb.2022.3196336] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
Protein-protein interactions (PPIs) play an essential role in most biological processes in cells. Many computational algorithms have thus been proposed to predict PPIs. However, most of them rely heavily on the biological information of proteins while ignoring the latent structural features of proteins presented in a PPI network. In this paper, we propose an efficient network-based prediction algorithm, namely PPISB, based on a mixed membership stochastic blockmodel. By simulating the generative process of a PPI network, PPISB is able to capture the latent community structures. The inference procedure adopted by PPISB further optimizes the membership distributions of proteins over different complexes. After that, a distance measure is designed to compute the similarity between two proteins in terms of their likelihoods of being in the same complex, thus verifying whether they interact with each other or not. To evaluate the performance of PPISB, a series of extensive experiments have been conducted with five PPI networks collected from different species, and the results demonstrate that PPISB achieves promising performance in predicting PPIs in terms of several evaluation metrics. Hence, we reason that PPISB is preferred over state-of-the-art network-based prediction algorithms, especially for predicting potential PPIs.
|
45
|
Yuen HY, Jansson J. Normalized L3-based link prediction in protein-protein interaction networks. BMC Bioinformatics 2023; 24:59. [PMID: 36814208 PMCID: PMC9945744 DOI: 10.1186/s12859-023-05178-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/26/2021] [Accepted: 02/08/2023] [Indexed: 02/24/2023] Open
Abstract
BACKGROUND Protein-protein interaction (PPI) data is an important type of data used in functional genomics. However, high-throughput experiments are often insufficient to complete the PPI interactome of different organisms. Computational techniques are thus used to infer missing data, with link prediction being one such approach that uses the structure of the network of PPIs known so far to identify non-edges whose addition to the network would make it more sound, according to some underlying assumptions. Recently, a new idea called the L3 principle introduced biological motivation into PPI link predictions, yielding predictors that are superior to general-purpose link predictors for complex networks. Interestingly, the L3 principle can be interpreted in another way, so that other signatures of PPI networks can also be characterized for PPI predictions. This alternative interpretation uncovers candidate PPIs that the current L3-based link predictors may not be able to fully capture, underutilizing the L3 principle. RESULTS In this article, we propose a formulation of link predictors that we call NormalizedL3 (L3N) which addresses certain missing elements within L3 predictors in the perspective of network modeling. Our computational validations show that the L3N predictors are able to find missing PPIs more accurately (in terms of true positives among the predicted PPIs) than the previously proposed methods on several datasets from the literature, including BioGRID, STRING, MINT, and HuRI, at the cost of using more computation time in some of the cases. In addition, we found that L3-based link predictors (including L3N) ranked a different pool of PPIs higher than the general-purpose link predictors did. This suggests that different types of PPIs can be predicted based on different topological assumptions, and that even better PPI link predictors may be obtained in the future by improved network modeling.
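For orientation, the baseline L3 principle that this abstract builds on scores a candidate pair by the number of paths of length three between the two proteins, with each path down-weighted by the degrees of its two intermediate nodes. The sketch below implements that earlier degree-normalized L3 score as it is commonly stated, not the NormalizedL3 (L3N) formulation proposed in the article, whose definition is not given in the abstract. The function name and toy graph are illustrative assumptions.

```python
import math

def l3_scores(edges):
    """Score non-adjacent node pairs (x, y) by summing 1 / sqrt(k_u * k_v)
    over all paths x - u - v - y, where k is the node degree."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)

    scores = {}
    for x in nbrs:
        for u in nbrs[x]:
            for v in nbrs[u]:
                for y in nbrs[v]:
                    # Keep each unordered pair once (x < y) and skip pairs
                    # that are already connected. In a graph without
                    # self-loops this also excludes degenerate walks
                    # such as x - u - x - y.
                    if not x < y or y in nbrs[x]:
                        continue
                    weight = 1.0 / math.sqrt(len(nbrs[u]) * len(nbrs[v]))
                    scores[(x, y)] = scores.get((x, y), 0.0) + weight
    return scores

# Toy chain a - b - c - d: the only length-3 path links a and d.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]
toy_scores = l3_scores(toy_edges)
```

Candidate pairs are then ranked by score, highest first. The nested loops are adequate for small graphs; large interactomes would normally use sparse matrix products instead.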
Affiliation(s)
- Ho Yin Yuen
- Department of Biomedical Engineering, The Hong Kong Polytechnic University, Hong Kong, China.
| | - Jesper Jansson
- Graduate School of Informatics, Kyoto University, Kyoto, 606-8501, Japan.
| |
|
46
|
Duran-Frigola M, Cigler M, Winter GE. Advancing Targeted Protein Degradation via Multiomics Profiling and Artificial Intelligence. J Am Chem Soc 2023; 145:2711-2732. [PMID: 36706315 PMCID: PMC9912273 DOI: 10.1021/jacs.2c11098] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 10/19/2022] [Indexed: 01/28/2023]
Abstract
Only around 20% of the human proteome is considered to be druggable with small-molecule antagonists. This leaves some of the most compelling therapeutic targets outside the reach of ligand discovery. The concept of targeted protein degradation (TPD) promises to overcome some of these limitations. In brief, TPD is dependent on small molecules that induce the proximity between a protein of interest (POI) and an E3 ubiquitin ligase, causing ubiquitination and degradation of the POI. In this perspective, we want to reflect on current challenges in the field, and discuss how advances in multiomics profiling, artificial intelligence, and machine learning (AI/ML) will be vital in overcoming them. The presented roadmap is discussed in the context of small-molecule degraders but is equally applicable for other emerging proximity-inducing modalities.
Affiliation(s)
- Miquel Duran-Frigola
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Ersilia Open Source Initiative, 28 Belgrave Road, CB1 3DE, Cambridge, United Kingdom
| | - Marko Cigler
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
| | - Georg E. Winter
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
| |
|
47
|
Pratyush P, Pokharel S, Saigo H, KC DB. pLMSNOSite: an ensemble-based approach for predicting protein S-nitrosylation sites by integrating supervised word embedding and embedding from pre-trained protein language model. BMC Bioinformatics 2023; 24:41. [PMID: 36755242 PMCID: PMC9909867 DOI: 10.1186/s12859-023-05164-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/14/2022] [Accepted: 01/30/2023] [Indexed: 02/10/2023] Open
Abstract
BACKGROUND Protein S-nitrosylation (SNO) plays a key role in transferring nitric oxide-mediated signals in both animals and plants and has emerged as an important mechanism for regulating protein functions and cell signaling of all main classes of protein. It is involved in several biological processes including immune response, protein stability, transcription regulation, post-translational regulation, DNA damage repair, redox regulation, and is an emerging paradigm of redox signaling for protection against oxidative stress. The development of robust computational tools to predict protein SNO sites would contribute to further interpretation of the pathological and physiological mechanisms of SNO. RESULTS Using an intermediate fusion-based stacked generalization approach, we integrated embeddings from a supervised embedding layer and a contextualized protein language model (ProtT5) and developed a tool called pLMSNOSite (protein language model-based SNO site predictor). On an independent test set of experimentally identified SNO sites, pLMSNOSite achieved values of 0.340, 0.735 and 0.773 for MCC, sensitivity and specificity, respectively. These results show that pLMSNOSite performs better than the compared approaches for the prediction of S-nitrosylation sites. CONCLUSION Together, the experimental results suggest that pLMSNOSite achieves significant improvement in the prediction performance of S-nitrosylation sites and represents a robust computational approach for predicting protein S-nitrosylation sites. pLMSNOSite could be a useful resource for further elucidation of SNO and is publicly available at https://github.com/KCLabMTU/pLMSNOSite.
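The three metrics reported in this abstract (MCC, sensitivity, and specificity) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from the article.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and Matthews correlation coefficient (MCC)
    from confusion-matrix counts. Assumes at least one actual positive
    and one actual negative, so the two rates are defined."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}

# Hypothetical counts, for illustration only.
example = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

Unlike sensitivity and specificity, MCC uses all four cells at once, which is why it can stay modest on imbalanced test sets even when both rates look reasonable.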
Affiliation(s)
- Pawel Pratyush
- Department of Computer Science, Michigan Technological University, Houghton, MI USA
| | - Suresh Pokharel
- Department of Computer Science, Michigan Technological University, Houghton, MI, USA
| | - Hiroto Saigo
- Department of Electrical Engineering and Computer Science, Kyushu University, 744 Motooka, Nishi-Ku, 819-0395, Japan
| | - Dukka B. KC
- Department of Computer Science, Michigan Technological University, Houghton, MI, USA
| |
|
48
|
A graph neural network model for deciphering the biological mechanisms of plant electrical signal classification. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110153] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 03/06/2023]
|
49
|
Hou Z, Yang Y, Ma Z, Wong KC, Li X. Learning the protein language of proteome-wide protein-protein binding sites via explainable ensemble deep learning. Commun Biol 2023; 6:73. [PMID: 36653447 PMCID: PMC9849350 DOI: 10.1038/s42003-023-04462-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 06/20/2022] [Accepted: 01/11/2023] [Indexed: 01/20/2023] Open
Abstract
Protein-protein interactions (PPIs) govern cellular pathways and processes by significantly influencing the functional expression of proteins. Therefore, accurate identification of protein-protein interaction binding sites has become a key step in the functional analysis of proteins. However, since most computational methods are designed based on biological features, there are no available protein language models to directly encode amino acid sequences into distributed vector representations to model their characteristics for protein-protein binding events. Moreover, the number of experimentally detected protein interaction sites is much smaller than that of protein-protein interactions or protein sites in protein complexes, resulting in unbalanced data sets that leave room for improvement in the performance of these methods. To address these problems, we develop an ensemble deep learning model (EDLM)-based PPI site identification method (EDLMPPI). Evaluation results show that EDLMPPI outperforms state-of-the-art techniques, including several PPI site prediction models, on three widely used benchmark datasets (Dset_448, Dset_72, and Dset_164), exceeding those PPI site prediction models by nearly 10% in terms of average precision. In addition, the biological and interpretable analyses provide new insights into protein binding site identification and characterization mechanisms from different perspectives. The EDLMPPI webserver is available at http://www.edlmppi.top:5002/.
Affiliation(s)
- Zilong Hou
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Yuning Yang
- Information Science and Technology, Northeast Normal University, Jilin, China
| | - Zhiqiang Ma
- Information Science and Technology, Northeast Normal University, Jilin, China
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
| | - Xiangtao Li
- School of Artificial Intelligence, Jilin University, Jilin, China
| |
|
50
|
Peng W, Wu R, Dai W, Yu N. Identifying cancer driver genes based on multi-view heterogeneous graph convolutional network and self-attention mechanism. BMC Bioinformatics 2023; 24:16. [PMID: 36639646 PMCID: PMC9838012 DOI: 10.1186/s12859-023-05140-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Accepted: 01/06/2023] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND: Correctly identifying the driver genes that promote cell growth can significantly assist drug design, cancer diagnosis, and treatment. Recent large-scale cancer genomics projects have produced multi-omics data from thousands of cancer patients, which calls for effective models to unlock the hidden knowledge within these valuable data and discover cancer drivers contributing to tumorigenesis. RESULTS: In this work, we propose a graph convolution network-based method called MRNGCN that integrates multiple gene relationship networks to identify cancer driver genes. First, we constructed three gene relationship networks: the gene-gene, gene-outlying gene, and gene-miRNA networks. Then, genes learned feature representations from the three networks through three parameter-sharing heterogeneous graph convolution network (HGCN) models with the self-attention mechanism. After that, these gene features pass through a convolution layer to generate fused features. Finally, we utilized the fused features and the original features to optimize the model by minimizing the node and link prediction losses. Meanwhile, we combined the fused features, the original features, and the three features learned from every network through a logistic regression model to predict cancer driver genes. CONCLUSIONS: We applied MRNGCN to predict pan-cancer and cancer type-specific driver genes. Experimental results show that our model performs well in terms of the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPRC) compared to state-of-the-art methods. Ablation experimental results show that our model successfully improved cancer driver identification by integrating multiple gene relationship networks.
Affiliation(s)
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050, China
| | - Rong Wu
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China
| | - Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650050, China; Computer Technology Application Key Lab of Yunnan Province, Kunming University of Science and Technology, Kunming, 650050, China
| | - Ning Yu
- Department of Computing Sciences, The College at Brockport, State University of New York, 350 New Campus Drive, Brockport, NY 14422, USA
| |
|