1
Anteghini M, Gualdi F, Oliva B. How did we get there? AI applications to biological networks and sequences. Comput Biol Med 2025; 190:110064. [PMID: 40184941 DOI: 10.1016/j.compbiomed.2025.110064] [Received: 10/28/2024] [Revised: 03/18/2025] [Accepted: 03/20/2025] [Indexed: 04/07/2025]
Abstract
The rapidly advancing field of artificial intelligence (AI) has transformed numerous scientific domains, including biology, where a vast and complex volume of data is available for analysis. This paper provides a comprehensive overview of the current state of AI-driven methodologies in genomics, proteomics, and systems biology. We discuss how machine learning algorithms, particularly deep learning models, have enhanced the accuracy and efficiency of embedding sequences, motif discovery, and the prediction of gene expression and protein structure. Additionally, we explore the integration of AI in the embedding and analysis of biological networks, including protein-protein interaction networks and multi-layered networks. By leveraging large-scale biological data, AI techniques have enabled unprecedented insights into complex biological processes and disease mechanisms. This work underlines the potential of applying AI to complex biological data, highlighting current applications and suggesting directions for future research to further explore AI in this rapidly evolving field.
Affiliation(s)
- Marco Anteghini
  - BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy; Visual and Data-Centric Computing, Zuse Institut Berlin, Berlin, Germany.
- Francesco Gualdi
  - Structural Bioinformatics Lab, Universitat Pompeu Fabra, Barcelona, Spain; Istituto dalle Molle di Studi sull'Intelligenza Artificiale, USI/SUPSI (Università Svizzera Italiana/Scuola Universitaria Professionale Svizzera Italiana) Lugano, Switzerland.
- Baldo Oliva
  - Structural Bioinformatics Lab, Universitat Pompeu Fabra, Barcelona, Spain.

2
Lee CM, Nguyen J, Pope B, Imami AS, Ryan VWG, Sahay S, Mathis V, Pulvender P, Eby HM, Arvay T, Alganem K, Wegman-Points L, McCullunsmith R, Yuan LL. Functional kinome profiling reveals brain protein kinase signaling pathways and gene networks altered by acute voluntary exercise in rats. PLoS One 2025; 20:e0321596. [PMID: 40233052 PMCID: PMC11999169 DOI: 10.1371/journal.pone.0321596] [Received: 09/18/2024] [Accepted: 03/07/2025] [Indexed: 04/17/2025]
Abstract
Regular exercise confers numerous physical and mental health benefits, yet individual variability in exercise participation and outcomes is still poorly understood. Uncovering the neurobiological mechanisms governing exercise behavior is essential for promoting physical activity and developing targeted interventions for related disorders. While genetic studies have provided insights, they often cannot account for protein-level alterations, such as changes in kinase activity. Here, we employ protein kinase activity profiling to delineate brain protein kinase activity and signaling networks modulated by acute voluntary exercise in rats. Focusing on the dorsal striatum, which governs voluntary exercise, as well as the hippocampus, which is susceptible to modulation by physical activity, we aim to understand the molecular basis of exercise behavior. Utilizing high-throughput kinome array profiling and advanced pathway analyses, we identified protein kinase signaling pathways implicated in regulating voluntary exercise. Pathway analysis using Gene Ontology (GO) revealed significant alterations in 155 GO terms in the dorsal striatum and 206 GO terms in the hippocampus. Changes in kinase activity were observed in the striatum and hippocampus between the exercise (voluntary wheel running, VWR) and sedentary control rats. In both regions, global serine-threonine kinase (STK) activity was decreased, while global phospho-tyrosine kinase (PTK) activity was increased in VWR rats compared to control rats. We also identified specific kinases altered in VWR rats, including the IκB kinase (IKK) and protein kinase D (PKD) families. C-terminal Src kinase (CSK), epidermal growth factor receptor (EGFR), and vascular endothelial growth factor receptor (VEGFR) tyrosine kinases were also enriched. These findings suggest regional heterogeneity of kinase activity following voluntary exercise, emphasizing potential molecular mechanisms underlying exercise behavior.
This exploratory study lays the groundwork for future investigations into the causality of variations in exercise outcomes among individuals and different sexes, as well as the development of targeted interventions to promote physical activity and combat associated chronic diseases.
Affiliation(s)
- Chia-Ming Lee
  - Department of Physiology and Pharmacology, College of Osteopathic Medicine, Des Moines University, Des Moines, Iowa, United States of America
- Jennifer Nguyen
  - Department of Neurosciences and Psychiatry, College of Medicine and Life Sciences, University of Toledo, Toledo, Ohio, United States of America
- Brock Pope
  - Department of Physiology and Pharmacology, College of Osteopathic Medicine, Des Moines University, Des Moines, Iowa, United States of America
- Ali Sajid Imami
  - Department of Neurosciences and Psychiatry, College of Medicine and Life Sciences, University of Toledo, Toledo, Ohio, United States of America
- V. William George Ryan
  - Department of Neurosciences and Psychiatry, College of Medicine and Life Sciences, University of Toledo, Toledo, Ohio, United States of America
- Smita Sahay
  - Department of Neurosciences and Psychiatry, College of Medicine and Life Sciences, University of Toledo, Toledo, Ohio, United States of America
- Victoria Mathis
  - Department of Physiology and Pharmacology, College of Osteopathic Medicine, Des Moines University, Des Moines, Iowa, United States of America
- Priyanka Pulvender
  - Department of Neurosciences and Psychiatry, College of Medicine and Life Sciences, University of Toledo, Toledo, Ohio, United States of America
- Hunter Michael Eby
  - Department of Neurosciences and Psychiatry, College of Medicine and Life Sciences, University of Toledo, Toledo, Ohio, United States of America
- Taylen Arvay
  - Department of Neurosciences and Psychiatry, College of Medicine and Life Sciences, University of Toledo, Toledo, Ohio, United States of America
- Khaled Alganem
  - Department of Neurosciences and Psychiatry, College of Medicine and Life Sciences, University of Toledo, Toledo, Ohio, United States of America
- Lauren Wegman-Points
  - Department of Physiology and Pharmacology, College of Osteopathic Medicine, Des Moines University, Des Moines, Iowa, United States of America
- Robert McCullunsmith
  - Department of Neurosciences and Psychiatry, College of Medicine and Life Sciences, University of Toledo, Toledo, Ohio, United States of America
  - ProMedica, Neurosciences Institute, Toledo, Ohio, United States of America
- Li-Lian Yuan
  - Department of Physiology and Pharmacology, College of Osteopathic Medicine, Des Moines University, Des Moines, Iowa, United States of America

3
Wang J, Chen J, Hu Y, Song C, Li X, Qian Y, Deng L. DeepMFFGO: A Protein Function Prediction Method for Large-Scale Multifeature Fusion. J Chem Inf Model 2025; 65:3841-3853. [PMID: 40116538 DOI: 10.1021/acs.jcim.5c00062] [Indexed: 03/23/2025]
Abstract
Protein functional studies are crucial in the fields of drug target discovery and drug design. However, existing methods face significant bottlenecks in fusing multisource data and exploiting the Gene Ontology (GO) hierarchy. To address this, this study proposes DeepMFFGO, a model for protein function prediction based on large-scale multifeature fusion. A fine-tuning strategy using intermediate-level feature selection is proposed to reduce redundancy in protein sequences and mitigate distortion of the top-level features. A hierarchical progressive fusion structure is designed to explore feature connections, optimize complementarity through dynamic weight allocation, and reduce redundant interference. On the CAFA3 data set, DeepMFFGO reaches Fmax values of 0.702, 0.599, and 0.704 on the MF, BP, and CC ontologies, respectively, improvements of 4.2%, 2.4%, and 0.07% over state-of-the-art multisource methods.
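The Fmax values quoted above refer to the CAFA protein-centric metric: the maximum, over decision thresholds t, of the harmonic mean of average precision and average recall. A minimal sketch of that metric as it is commonly defined, assuming GO annotations have already been propagated to ancestor terms and thresholds are scanned in steps of 0.01; `fmax` is a hypothetical helper for illustration, not DeepMFFGO's evaluation code.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Protein-centric Fmax in the CAFA style (illustrative sketch).

    predictions: protein ID -> {GO term: score in [0, 1]}
    truth: protein ID -> set of true GO terms (assumed already propagated)
    """
    # Only proteins with at least one true annotation count as benchmark proteins.
    bench = {p: terms for p, terms in truth.items() if terms}
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps  # decision threshold
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in bench.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & true_terms)
            # Precision is averaged only over proteins with a prediction at t.
            if predicted:
                precisions.append(tp / len(predicted))
            # Recall is averaged over all benchmark proteins.
            recall_sum += tp / len(true_terms)
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(bench)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

The asymmetry between the two averages is deliberate in CAFA: a method is not penalized in precision for proteins it declines to annotate, but it still loses recall on them.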
Affiliation(s)
- Jingfu Wang
  - School of Software, Xinjiang University, Urumqi 830091, China
  - Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Jiaying Chen
  - School of Software, Xinjiang University, Urumqi 830091, China
  - Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Yue Hu
  - School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
  - Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
- Chaolin Song
  - School of Software, Xinjiang University, Urumqi 830091, China
  - Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
- Xinhui Li
  - School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
  - Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
- Yurong Qian
  - Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China
  - Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
  - School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China
  - Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
- Lei Deng
  - School of Software, Xinjiang University, Urumqi 830091, China
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China

4
Nguyen JH, Curtis MA, Imami AS, Ryan WG, Alganem K, Neifer KL, Saferin N, Nawor CN, Kistler BP, Miller GW, Shukla R, McCullumsmith RE, Burkett JP. Developmental pyrethroid exposure disrupts molecular pathways for MAP kinase and circadian rhythms in mouse brain. Physiol Genomics 2025; 57:240-253. [PMID: 39961078 DOI: 10.1152/physiolgenomics.00033.2024] [Received: 03/27/2024] [Revised: 05/07/2024] [Accepted: 02/10/2025] [Indexed: 02/26/2025]
Abstract
Neurodevelopmental disorders (NDDs) are a category of pervasive disorders of the developing nervous system with few or no recognized biomarkers. A significant portion of the risk for NDDs, including attention deficit hyperactivity disorder (ADHD), is contributed by the environment, and exposure to pyrethroid pesticides during pregnancy has been identified as a potential risk factor for NDD in the unborn child. We recently showed that low-dose developmental exposure to the pyrethroid pesticide deltamethrin in mice causes male-biased changes to ADHD- and NDD-relevant behaviors as well as the striatal dopamine system. Here, we used an integrated multiomics approach to determine the broadest possible set of biological changes in the mouse brain caused by developmental pyrethroid exposure (DPE). Using a litter-based, split-sample design, we exposed mouse dams during pregnancy and lactation to deltamethrin (3 mg/kg or vehicle every 3 days) at a concentration well below the EPA-determined benchmark dose used for regulatory guidance. We raised male offspring to adulthood, euthanized them, and pulverized and divided whole brain samples for split-sample transcriptomics, kinomics, and multiomics integration. Transcriptome analysis revealed alterations to multiple canonical clock genes, and kinome analysis revealed changes in the activity of multiple kinases involved in synaptic plasticity, including the mitogen-activated protein (MAP) kinase ERK. Multiomics integration revealed a dysregulated protein-protein interaction network containing primary clusters for MAP kinase cascades, regulation of apoptosis, and synaptic function. 
These results demonstrate that DPE causes a multimodal biophenotype in the brain relevant to ADHD and identify new potential mechanisms of action.
NEW & NOTEWORTHY
Here, we provide the first evidence that low-dose developmental exposure to a pyrethroid pesticide, deltamethrin, results in molecular disruptions in the adult mouse brain in pathways regulating circadian rhythms and neuronal growth (MAP kinase). This same exposure causes a neurodevelopmental disorder (NDD)-relevant behavioral change in adult mice, making these findings relevant to the prevention of NDDs.
Affiliation(s)
- Jennifer H Nguyen
  - Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States
- Melissa A Curtis
  - Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States
- Ali S Imami
  - Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States
- William G Ryan
  - Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States
- Khaled Alganem
  - Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States
- Kari L Neifer
  - Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States
- Nilanjana Saferin
  - Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States
- Charlotte N Nawor
  - Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States
- Brian P Kistler
  - Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States
- Gary W Miller
  - Department of Environmental Health, Emory Rollins School of Public Health, Atlanta, Georgia, United States
- Rammohan Shukla
  - Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States
- Robert E McCullumsmith
  - Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States
  - Neurosciences Institute, ProMedica, Toledo, Ohio, United States
- James P Burkett
  - Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, Ohio, United States

5
Jeong CU, Kim J, Kim D, Sohn KA. GeOKG: geometry-aware knowledge graph embedding for Gene Ontology and genes. Bioinformatics 2025; 41:btaf160. [PMID: 40217132 PMCID: PMC12036960 DOI: 10.1093/bioinformatics/btaf160] [Received: 07/20/2024] [Revised: 03/03/2025] [Accepted: 04/08/2025] [Indexed: 04/30/2025]
Abstract
MOTIVATION
Leveraging deep learning for the representation learning of Gene Ontology (GO) and Gene Ontology Annotation (GOA) holds significant promise for enhancing downstream biological tasks such as protein-protein interaction prediction. Prior approaches have predominantly used text- and graph-based methods, embedding GO and GOA in a single geometric space (e.g. Euclidean or hyperbolic). However, since the GO graph exhibits a complex and nonmonotonic hierarchy, single-space embeddings are insufficient to fully capture its structural nuances.
RESULTS
In this study, we address this limitation by exploiting geometric interaction to better reflect the intricate hierarchical structure of GO. Our proposed method, Geometry-Aware Knowledge Graph Embeddings for GO and Genes (GeOKG), leverages interactions among various geometric representations during training, thereby modeling the complex hierarchy of GO more effectively. Experiments at the GO level demonstrate the benefits of incorporating these geometric interactions, while gene-level tests reveal that GeOKG outperforms existing methods in protein-protein interaction prediction. These findings highlight the potential of using geometric interaction for embedding heterogeneous biomedical networks.
AVAILABILITY AND IMPLEMENTATION
https://github.com/ukjung21/GeOKG.
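Hyperbolic spaces, one of the single-space options mentioned above, are often chosen for hierarchies because distances grow rapidly toward the boundary of the Poincaré ball, leaving room for tree-like branching. A minimal illustration of the standard Poincaré-ball distance, d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))); this is a generic textbook formula, not GeOKG's model, which combines interactions among several geometries, and `poincare_distance` is a hypothetical helper.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Usage: a "root" at the origin and two sibling "leaves" near the boundary.
root, leaf_a, leaf_b = [0.0, 0.0], [0.9, 0.0], [0.0, 0.9]
hyperbolic_siblings = poincare_distance(leaf_a, leaf_b)
euclidean_siblings = math.dist(leaf_a, leaf_b)
```

In Euclidean space the two leaves are closer to each other than either path through the root, whereas in the Poincaré ball their distance approaches the sum of the two root-to-leaf distances, which is the tree-like behavior that makes hyperbolic embeddings attractive for ontologies such as GO.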
Affiliation(s)
- Chang-Uk Jeong
  - Department of Software and Computer Engineering, Ajou University, Suwon, 16499, South Korea
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jaesik Kim
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104, USA
- Dokyoon Kim
  - Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Kyung-Ah Sohn
  - Department of Software and Computer Engineering, Ajou University, Suwon, 16499, South Korea
  - Department of Artificial Intelligence, Ajou University, Suwon, 16499, South Korea

6
Edera AA, Stegmayer G, Milone DH. gGN: Representing the Gene Ontology as low-rank Gaussian distributions. Comput Biol Med 2024; 183:109234. [PMID: 39395345 DOI: 10.1016/j.compbiomed.2024.109234] [Received: 07/19/2024] [Revised: 09/06/2024] [Accepted: 09/30/2024] [Indexed: 10/14/2024]
Abstract
Computational representations of knowledge graphs are critical for several tasks in bioinformatics, including large-scale graph analysis and gene function characterization. In this study, we introduce gGN, an unsupervised neural network for learning node representations as Gaussian distributions. Unlike prior efforts, where the covariance matrices of these distributions are simplified to diagonal, we propose representing them with a low-rank approximation. This representation not only maintains manageable learning complexity, allowing for scaling to large graphs, but is also more effective for modeling the structural features of knowledge graphs, such as their hierarchical and directional relationships between nodes. To learn the low-rank Gaussian distributions, we introduce a semantic-based loss function that effectively preserves these structural features. Systematic experiments reveal that gGN preserves structural features more effectively than existing approaches and scales efficiently on large knowledge graphs. Furthermore, applying gGN to represent the Gene Ontology, a widely used knowledge graph in bioinformatics, outperformed multiple baseline methods in ubiquitous gene characterization tasks. Altogether, the proposed low-rank Gaussian distributions not only effectively represent knowledge graphs but also open new avenues for enhancing bioinformatics tasks. gGN is publicly available as an easily installable package at https://github.com/aedera/ggn.
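The trade-off between diagonal, low-rank, and full covariances described above can be made concrete with a small sketch. It assumes a low-rank-plus-diagonal form, Sigma = diag(d) + U U^T with d > 0 and U of size n × k, which is a common way to parameterize such covariances; the abstract does not say whether gGN adds the diagonal term, and both helper names are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def low_rank_covariance(diag: List[float], factor: Matrix) -> Matrix:
    """Build Sigma = diag(d) + U U^T from a length-n diagonal d and an n x k factor U.

    Hypothetical helper for illustration only.
    """
    n = len(diag)
    if len(factor) != n or any(len(row) != len(factor[0]) for row in factor):
        raise ValueError("factor must be an n x k matrix")
    sigma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # (U U^T)_{ij} is the dot product of rows i and j of U.
            sigma[i][j] = sum(a * b for a, b in zip(factor[i], factor[j]))
        sigma[i][i] += diag[i]  # add the diagonal term
    return sigma


def parameter_counts(n: int, k: int) -> Dict[str, int]:
    """Free covariance parameters per node for three choices of covariance."""
    return {
        "diagonal": n,
        "low_rank_plus_diagonal": n + n * k,
        "full": n * (n + 1) // 2,
    }
```

A positive diagonal keeps Sigma positive definite, since U U^T is positive semidefinite, and the parameter count grows as n(k + 1) rather than n(n + 1)/2; with k much smaller than n, this is what lets such a representation scale to large graphs while still encoding correlations between dimensions that a purely diagonal covariance cannot.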
Affiliation(s)
- Alejandro A Edera
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL 3000, Santa Fe, Argentina
- Georgina Stegmayer
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL 3000, Santa Fe, Argentina
- Diego H Milone
  - Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL 3000, Santa Fe, Argentina

7
Guan J, Ji Y, Peng C, Zou W, Tang X, Shang J, Sun Y. GOPhage: protein function annotation for bacteriophages by integrating the genomic context. Brief Bioinform 2024; 26:bbaf014. [PMID: 39838963 PMCID: PMC11751364 DOI: 10.1093/bib/bbaf014] [Received: 08/28/2024] [Revised: 12/15/2024] [Accepted: 01/06/2025] [Indexed: 01/23/2025]
Abstract
Bacteriophages are viruses that target bacteria, playing a crucial role in microbial ecology. Phage proteins are key to understanding phage biology, including virus infection, replication, and evolution. Although a large number of new phages have been identified via metagenomic sequencing, many of them have limited protein function annotation. Accurate function annotation of phage proteins presents several challenges, including their inherent diversity and the scarcity of annotated proteins. Existing tools have yet to fully leverage the unique properties of phages in annotating protein functions. In this work, we propose a new protein function annotation tool for phages that leverages the modular structure of phage genomes. By employing embeddings from the latest protein foundation models and a Transformer to capture contextual information between proteins in phage genomes, GOPhage outperforms state-of-the-art methods by 6.78% in annotating diverged proteins and by 13.05% in annotating proteins with uncommon functions. GOPhage can annotate proteins lacking homology search results, which is critical for characterizing the rapidly accumulating phage genomes. We demonstrate the utility of GOPhage by identifying 688 potential holins in phages, which exhibit high structural conservation with known holins. The results show the potential of GOPhage to extend our understanding of newly discovered phages.
Affiliation(s)
- Jiaojiao Guan
  - Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong (SAR), China
- Yongxin Ji
  - Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong (SAR), China
- Cheng Peng
  - Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong (SAR), China
- Wei Zou
  - Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong (SAR), China
- Xubo Tang
  - Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong (SAR), China
- Jiayu Shang
  - Department of Information Engineering, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong (SAR), China
- Yanni Sun
  - Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Ave, Kowloon Tong, Hong Kong (SAR), China

8
Vu TTD, Kim J, Jung J. An experimental analysis of graph representation learning for Gene Ontology based protein function prediction. PeerJ 2024; 12:e18509. [PMID: 39553733 PMCID: PMC11569786 DOI: 10.7717/peerj.18509] [Received: 09/05/2024] [Accepted: 10/21/2024] [Indexed: 11/19/2024]
Abstract
Understanding protein function is crucial for deciphering biological systems and facilitating various biomedical applications. Computational methods for predicting Gene Ontology functions of proteins emerged in the 2000s to bridge the gap between the number of annotated proteins and the rapidly growing number of newly discovered amino acid sequences. Recently, there has been a surge in studies applying graph representation learning techniques to biological networks to enhance protein function prediction tools. In this review, we provide fundamental concepts in graph embedding algorithms. This study describes graph representation learning methods for protein function prediction based on four principal data categories, namely PPI network, protein structure, Gene Ontology graph, and integrated graph. The commonly used approaches for each category are summarized and diagrammed, and the specific results of each method are explained in detail. Finally, existing limitations and potential solutions are discussed, and directions for future research within the protein research community are suggested.
Affiliation(s)
- Thi Thuy Duong Vu
  - Faculty of Fundamental Sciences, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam
- Jeongho Kim
  - Department of Information and Communication Engineering, Myongji University, Yongin, Republic of South Korea
- Jaehee Jung
  - Department of Information and Communication Engineering, Myongji University, Yongin, Republic of South Korea

9
Lin B, Luo X, Liu Y, Jin X. A comprehensive review and comparison of existing computational methods for protein function prediction. Brief Bioinform 2024; 25:bbae289. [PMID: 39003530 PMCID: PMC11246557 DOI: 10.1093/bib/bbae289] [Received: 04/02/2024] [Revised: 05/18/2024] [Indexed: 07/15/2024]
Abstract
Protein function prediction is critical for understanding the cellular physiological and biochemical processes, and it opens up new possibilities for advancements in fields such as disease research and drug discovery. During the past decades, with the exponential growth of protein sequence data, many computational methods for predicting protein function have been proposed. Therefore, a systematic review and comparison of these methods are necessary. In this study, we divide these methods into four different categories, including sequence-based methods, 3D structure-based methods, PPI network-based methods and hybrid information-based methods. Furthermore, their advantages and disadvantages are discussed, and then their performance is comprehensively evaluated and compared. Finally, we discuss the challenges and opportunities present in this field.
Affiliation(s)
- Baohui Lin
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, Guangdong 518118, China
- Xiaoling Luo
  - Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Shenzhen, Guangdong, China
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518061, China
- Yumeng Liu
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, Guangdong 518118, China
- Xiaopeng Jin
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, Guangdong 518118, China

10
Nguyen JH, Curtis MA, Imami AS, Ryan WG, Alganem K, Neifer KL, Saferin N, Nawor CN, Kistler BP, Miller GW, Shukla R, McCullumsmith RE, Burkett JP. Developmental pyrethroid exposure disrupts molecular pathways for MAP kinase and circadian rhythms in mouse brain. bioRxiv 2024:2023.08.28.555113. [PMID: 37745438 PMCID: PMC10515776 DOI: 10.1101/2023.08.28.555113] [Indexed: 09/26/2023]
Abstract
Neurodevelopmental disorders (NDDs) are a category of pervasive disorders of the developing nervous system with few or no recognized biomarkers. A significant portion of the risk for NDDs, including attention deficit hyperactivity disorder (ADHD), is contributed by the environment, and exposure to pyrethroid pesticides during pregnancy has been identified as a potential risk factor for NDD in the unborn child. We recently showed that low-dose developmental exposure to the pyrethroid pesticide deltamethrin in mice causes male-biased changes to ADHD- and NDD-relevant behaviors as well as the striatal dopamine system. Here, we used an integrated multiomics approach to determine the broadest possible set of biological changes in the mouse brain caused by developmental pyrethroid exposure (DPE). Using a litter-based, split-sample design, we exposed mouse dams during pregnancy and lactation to deltamethrin (3 mg/kg or vehicle every 3 days) at a concentration well below the EPA-determined benchmark dose used for regulatory guidance. We raised male offspring to adulthood, euthanized them, and pulverized and divided whole brain samples for split-sample transcriptomics, kinomics, and multiomics integration. Transcriptome analysis revealed alterations to multiple canonical clock genes, and kinome analysis revealed changes in the activity of multiple kinases involved in synaptic plasticity, including the mitogen-activated protein (MAP) kinase ERK. Multiomics integration revealed a dysregulated protein-protein interaction network containing primary clusters for MAP kinase cascades, regulation of apoptosis, and synaptic function. These results demonstrate that DPE causes a multimodal biophenotype in the brain relevant to ADHD and identify new potential mechanisms of action.
Collapse
Affiliation(s)
- Jennifer H. Nguyen
- Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
- Melissa A. Curtis
- Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
- Ali S. Imami
- Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
- William G. Ryan
- Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
- Khaled Alganem
- Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
- The Medical Cities at the Ministry of Interior, Riyadh, Saudi Arabia (current)
- Kari L. Neifer
- Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
- Nilanjana Saferin
- Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
- Charlotte N. Nawor
- Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
- Brian P. Kistler
- Department of Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
- Gary W. Miller
- Department of Environmental Health, Emory Rollins School of Public Health, Atlanta, GA 30322
- Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, NY 10032 (current)
- Rammohan Shukla
- Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
- Department of Zoology and Physiology, University of Wyoming, Laramie, WY 82071 (current)
- Robert E. McCullumsmith
- Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
- Neurosciences Institute, Promedica, Toledo, OH 43606
- James P. Burkett
- Department of Neurosciences, University of Toledo College of Medicine and Life Sciences, Toledo, OH 43614
11
Liu M, Srivastava G, Ramanujam J, Brylinski M. SynerGNet: A Graph Neural Network Model to Predict Anticancer Drug Synergy. Biomolecules 2024; 14:253. [PMID: 38540674 PMCID: PMC10967862 DOI: 10.3390/biom14030253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 02/16/2024] [Accepted: 02/19/2024] [Indexed: 01/03/2025]
Abstract
Drug combination therapy shows promise in cancer treatment by addressing drug resistance, reducing toxicity, and enhancing therapeutic efficacy. However, the intricate and dynamic nature of biological systems makes identifying potentially synergistic drug pairs costly and time-consuming. To facilitate the development of combination therapy, artificial intelligence techniques have emerged as a transformative solution, providing a sophisticated avenue for advancing existing therapeutic approaches. In this study, we developed SynerGNet, a graph neural network model designed to accurately predict the synergistic effect of drug pairs against cancer cell lines. SynerGNet uses cancer-specific featured graphs, created by integrating heterogeneous biological features into the human protein-protein interaction network and then applying a reduction process to enhance topological diversity. Using synergy data provided by the AZ-DREAM Challenges, the model yields a balanced accuracy of 0.68, significantly outperforming traditional machine learning methods. Encouragingly, augmenting the training data with carefully constructed synthetic instances improved the balanced accuracy of SynerGNet to 0.73. Finally, an independent validation against DrugCombDB showed that the model performs strongly on unseen data. SynerGNet shows great potential for detecting drug synergy and could become a valuable tool for advancing combination therapy for cancer treatment.
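Balanced accuracy, the metric reported for SynerGNet, is the mean of sensitivity (recall on the positive class) and specificity (recall on the negative class), so it is less flattered by class imbalance than plain accuracy. A minimal Python sketch of the metric follows; the 0/1 label encoding (1 = synergistic pair) and the toy labels are assumptions for illustration, not SynerGNet data.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0).

    Assumes binary labels encoded as 0/1, with both classes present in y_true.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical labels: 1 = synergistic drug pair, 0 = non-synergistic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
always_negative = [0] * len(y_true)
print(balanced_accuracy(y_true, always_negative))  # 0.5
```

With these toy labels, a model that calls every pair non-synergistic reaches 60% plain accuracy but only 0.5 balanced accuracy, which is chance level for two classes.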
Affiliation(s)
- Mengmeng Liu
- Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
- Gopal Srivastava
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- J. Ramanujam
- Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
- Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA
- Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA
12
Li W, Wang B, Dai J, Kou Y, Chen X, Pan Y, Hu S, Xu ZZ. Partial order relation-based gene ontology embedding improves protein function prediction. Brief Bioinform 2024; 25:bbae077. [PMID: 38446740 PMCID: PMC10917077 DOI: 10.1093/bib/bbae077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 01/22/2024] [Indexed: 03/08/2024]
Abstract
Protein annotation has long been a challenging task in computational biology. Gene Ontology (GO) has become one of the most popular frameworks for describing protein functions and their relationships. Annotating a protein with the proper GO terms demands high-quality GO term representation learning, which aims to learn, for each functional label, a low-dimensional dense vector representation (an embedding) that carries its semantic meaning. However, existing GO term embedding methods, which mainly consider ancestral co-occurrence information, have yet to capture the full topological information in the GO directed acyclic graph (DAG). In this study, we propose PO2Vec, a novel GO term representation learning method that uses partial order relationships to improve GO term representations. Extensive evaluations show that PO2Vec outperforms existing embedding methods in a variety of downstream biological tasks. Building on PO2Vec, we further developed PO2GO, a new protein function prediction method that shows superior performance across multiple metrics and in annotation specificity, as well as few-shot prediction capability, in the benchmarks. These results suggest that a high-quality representation of the GO structure is critical for diverse biological tasks, including computational protein annotation.
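The abstract does not spell out PO2Vec's training objective, so the sketch below only illustrates the structure it builds on. In a GO-style DAG, "a <= b when b is a itself or one of a's ancestors" is a partial order: it separates ordered pairs (a term and its more general ancestors) from incomparable pairs such as siblings, a distinction that ancestral co-occurrence alone does not make explicit. The toy graph and term names are hypothetical placeholders, not real GO accessions.

```python
from typing import Dict, List, Set

# Hypothetical toy GO-like DAG: each term maps to its direct parents (is_a edges).
# Term names are placeholders, not real GO accessions.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],  # a term may have several parents in a DAG
    "D": ["C"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable by following parent edges upward (term itself excluded)."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def precedes(a: str, b: str, parents: Dict[str, List[str]]) -> bool:
    """Partial order a <= b: a equals b, or b is an ancestor of (more general than) a."""
    return a == b or b in ancestors(a, parents)


def comparable(a: str, b: str, parents: Dict[str, List[str]]) -> bool:
    """True if the two terms are ordered one way or the other."""
    return precedes(a, b, parents) or precedes(b, a, parents)


print(sorted(ancestors("D", PARENTS)))  # ['A', 'B', 'C', 'root']
print(precedes("D", "A", PARENTS))      # True
print(comparable("A", "B", PARENTS))    # False: siblings are unordered
```

Recomputing ancestors on every query is fine for a toy graph; for the full ontology one would typically precompute the transitive closure once.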
Affiliation(s)
- Wenjing Li
- College of Computer Science and Software, Shenzhen University, Shenzhen, China
- Bin Wang
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China
- Jin Dai
- Center for Quantum Technology Research and School of Physics, Beijing Institute of Technology, Beijing, China
- Yan Kou
- Xbiome, Scientific Research Building, Tsinghua High-Tech Park, Shenzhen, China
- Xiaojun Chen
- College of Computer Science and Software, Shenzhen University, Shenzhen, China
- Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, China
- Shuangwei Hu
- Xbiome, Scientific Research Building, Tsinghua High-Tech Park, Shenzhen, China
- Zhenjiang Zech Xu
- School of Mathematics and Computer Sciences, Nanchang University, Nanchang, China
- State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, China
13
Manfredi M, Savojardo C, Martelli PL, Casadio R. E-SNPs&GO: embedding of protein sequence and function improves the annotation of human pathogenic variants. Bioinformatics 2022; 38:5168-5174. [PMID: 36227117 PMCID: PMC9710551 DOI: 10.1093/bioinformatics/btac678] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/16/2022] [Revised: 09/14/2022] [Accepted: 10/10/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION: The advent of massive DNA sequencing technologies is producing a huge number of human single-nucleotide polymorphisms that occur in protein-coding regions and may change the encoded protein sequences. Discriminating harmful protein variations from neutral ones is one of the crucial challenges in precision medicine. Computational tools based on artificial intelligence provide models for protein sequence encoding that bypass database searches for evolutionary information. We leverage these new encoding schemes for efficient annotation of protein variants.
RESULTS: E-SNPs&GO is a novel method that, given an input protein sequence and a single amino acid variation, predicts whether or not the variation is disease-related. The method adopts an input encoding based entirely on protein language models and embedding techniques specifically devised to encode protein sequences and GO functional annotations. We trained our model on a newly generated dataset of 101,146 human protein single amino acid variants in 13,661 proteins, derived from public resources. When tested on a blind set of 10,266 variants, our method compares well with recent approaches in the literature for the same task, reaching a Matthews Correlation Coefficient (MCC) of 0.72. We propose E-SNPs&GO as a suitable, efficient and accurate large-scale annotator of protein variant datasets.
AVAILABILITY AND IMPLEMENTATION: The method is available as a webserver at https://esnpsandgo.biocomp.unibo.it. Datasets and predictions are available at https://esnpsandgo.biocomp.unibo.it/datasets.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
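The Matthews Correlation Coefficient used to score E-SNPs&GO combines all four cells of the binary confusion matrix into a single value between -1 and +1, which is why it is commonly used when classes are imbalanced. A minimal sketch of the standard formula follows; the confusion-matrix counts are illustrative only, and treating "disease-related" as the positive class is an assumption.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from -1 to +1.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (positive class = disease-related variant), not E-SNPs&GO results.
print(matthews_corrcoef(tp=50, tn=50, fp=0, fn=0))    # 1.0: perfect predictions
print(matthews_corrcoef(tp=25, tn=25, fp=25, fn=25))  # 0.0: no better than chance
```

A score of +1 means perfect agreement, 0 means chance-level prediction, and -1 means every prediction is inverted.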
Affiliation(s)
- Matteo Manfredi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy
14
Zhao L, Sun H, Cao X, Wen N, Wang J, Wang C. Learning representations for gene ontology terms by jointly encoding graph structure and textual node descriptors. Brief Bioinform 2022; 23:6651302. [PMID: 35901452 DOI: 10.1093/bib/bbac318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 06/06/2022] [Accepted: 07/13/2022] [Indexed: 11/14/2022]
Abstract
Measuring the semantic similarity between Gene Ontology (GO) terms is a fundamental step in numerous functional bioinformatics applications. To fully exploit the metadata of GO terms, word embedding-based methods have recently been proposed to map GO terms to low-dimensional feature vectors. However, these representation methods commonly overlook the key information hidden in the overall GO structure and in the relationships between GO terms. In this paper, we propose GT2Vec, a novel representation model for GO terms that jointly considers the GO graph structure, learned through graph contrastive learning, and the semantic descriptions of GO terms, encoded with BERT encoders. We evaluate our method on a protein similarity task over a collection of benchmark datasets. The experimental results demonstrate the effectiveness of jointly encoding graph structure and textual node descriptors to learn vector representations of GO terms.
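Once GO terms are embedded as vectors, term-to-term similarity is commonly scored with cosine similarity, and a protein-level score can be obtained by aggregating term scores over the two proteins' annotation sets. The sketch below uses a best-match average, one common aggregation; the abstract does not state which similarity or aggregation GT2Vec's protein similarity evaluation uses, and the two-dimensional term embeddings are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def best_match_average(
    terms_p: Sequence[str], terms_q: Sequence[str], emb: Dict[str, Sequence[float]]
) -> float:
    """Protein-level similarity: each term is matched to its most similar term
    in the other protein's annotation set, and all best-match scores are averaged."""
    best_p = [max(cosine_similarity(emb[a], emb[b]) for b in terms_q) for a in terms_p]
    best_q = [max(cosine_similarity(emb[b], emb[a]) for a in terms_p) for b in terms_q]
    return (sum(best_p) + sum(best_q)) / (len(best_p) + len(best_q))


# Hypothetical 2-D term embeddings; real GO term vectors are much higher-dimensional.
EMB = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "t3": [1.0, 1.0]}
print(best_match_average(["t1"], ["t1", "t2"], EMB))  # 2/3: t2 has no close match
```

Pooling both directions makes the score symmetric in the two proteins; other variants average the two directional means instead.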
Affiliation(s)
- Lingling Zhao
- Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
- Huiting Sun
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Xinyi Cao
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Naifeng Wen
- College of Electromechanical and Information Engineering, Dalian Minzu University, Dalian 116600, China
- Junjie Wang
- Department of Medical Informatics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China