1
van der Weg K, Merdivan E, Piraud M, Gohlke H. TopEC: prediction of Enzyme Commission classes by 3D graph neural networks and localized 3D protein descriptor. Nat Commun 2025; 16:2737. [PMID: 40108108 PMCID: PMC11923149 DOI: 10.1038/s41467-025-57324-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 02/11/2025] [Indexed: 03/22/2025] [Open Access]
Abstract
Tools available for inferring enzyme function from general sequence, fold, or evolutionary information are generally successful. However, they can lead to misclassification if a deviation in local structural features influences the function. Here, we present TopEC, a 3D graph neural network based on a localized 3D descriptor to learn chemical reactions of enzymes from enzyme structures and predict Enzyme Commission (EC) classes. Using message-passing frameworks, we include distance and angle information to significantly improve the predictive performance for EC classification (F-score: 0.72) compared to regular 2D graph neural networks. We trained networks without fold bias that can classify enzyme structures for a vast functional space (>800 ECs). Our model is robust to uncertainties in binding site locations and similar functions in distinct binding sites. We observe that TopEC networks learn from an interplay between biochemical features and local shape-dependent features. TopEC is available as a repository on GitHub: https://github.com/IBG4-CBCLab/TopEC and https://doi.org/10.25838/d5p-66 .
Affiliation(s)
- Karel van der Weg
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425, Jülich, Germany
- Erinc Merdivan
  - Helmholtz AI Central Unit, Ingolstädter Landstraße 1, 85764, Oberschleißheim, Germany
- Marie Piraud
  - Helmholtz AI Central Unit, Ingolstädter Landstraße 1, 85764, Oberschleißheim, Germany
- Holger Gohlke
  - Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425, Jülich, Germany
  - Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany
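The TopEC abstract in entry 1 says that distance and angle information is passed through a message-passing network. As a rough illustration of what such geometric inputs look like, the sketch below builds distance-cutoff edges and a bond-style angle from 3D coordinates. It is not the TopEC implementation; the function names, the cutoff value, and the toy coordinates are all assumptions made for this example.

```python
import math

def build_edges(coords, cutoff):
    """Return (i, j, distance) for every pair of points within `cutoff`.

    Hypothetical helper: real 3D GNNs typically use neighbor lists,
    but an all-pairs loop is enough to show the idea.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:
                edges.append((i, j, d))
    return edges

def angle(a, b, c):
    """Angle in radians at vertex b, formed by the points a-b-c."""
    v1 = [a[k] - b[k] for k in range(3)]
    v2 = [c[k] - b[k] for k in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.acos(cos_theta)

# Toy coordinates (assumed, in arbitrary units); the last point is far away.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 10.0, 10.0)]
edges = build_edges(coords, cutoff=2.0)
right_angle = angle(coords[1], coords[0], coords[2])
```

A 2D graph network would see only which nodes are connected; the distances and angles computed here are the kind of extra geometric signal the abstract credits for the improved classification.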
2
Zou Y, Guo T, Fu Z, Guo Z, Bo W, Yan D, Wang Q, Zeng J, Xu D, Wang T, Chen L. A structure-based framework for selective inhibitor design and optimization. Commun Biol 2025; 8:422. [PMID: 40075154 PMCID: PMC11903766 DOI: 10.1038/s42003-025-07840-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Accepted: 02/27/2025] [Indexed: 03/14/2025] [Open Access]
Abstract
Structure-based drug design aims to create active compounds with favorable properties by analyzing target structures. Recently, deep generative models have facilitated structure-specific molecular generation. However, many methods are limited by inadequate pharmaceutical data, resulting in suboptimal molecular properties and unstable conformations. Additionally, these approaches often overlook binding pocket interactions and struggle with selective inhibitor design. To address these challenges, we developed a framework called Coarse-grained and Multi-dimensional Data-driven molecular generation (CMD-GEN). CMD-GEN bridges ligand-protein complexes with drug-like molecules by utilizing coarse-grained pharmacophore points sampled from diffusion model, enriching training data. Through a hierarchical architecture, it decomposes three-dimensional molecule generation within the pocket into pharmacophore point sampling, chemical structure generation, and conformation alignment, mitigating instability issues. CMD-GEN outperforms other methods in benchmark tests and controls drug-likeness effectively. Furthermore, CMD-GEN excels in cases across three synthetic lethal targets, and wet-lab validation with PARP1/2 inhibitors confirms its potential in selective inhibitor design.
Affiliation(s)
- Yurong Zou
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Tao Guo
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Zhiyuan Fu
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Zhongning Guo
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Weichen Bo
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Dengjie Yan
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, West China School of Pharmacy, Sichuan University, Chengdu, China
- Qiantao Wang
  - Key Laboratory of Drug-Targeting and Drug Delivery System of the Education Ministry and Sichuan Province, West China School of Pharmacy, Sichuan University, Chengdu, China
- Jun Zeng
  - Western Health, Faculty of Medicine Dentistry and Health Sciences, University of Melbourne, Carlton, VIC, Australia
- Dingguo Xu
  - MOE Key Laboratory of Green Chemistry and Technology, College of Chemistry, Sichuan University, Chengdu, China
- Taijin Wang
  - Chengdu Zenitar Biomedical Technology Co., Ltd., Chengdu, China
- Lijuan Chen
  - State Key Laboratory of Biotherapy and Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, China
  - Chengdu Zenitar Biomedical Technology Co., Ltd., Chengdu, China
3
Sim J, Kim D, Kim B, Choi J, Lee J. Recent advances in AI-driven protein-ligand interaction predictions. Curr Opin Struct Biol 2025; 92:103020. [PMID: 39999605 DOI: 10.1016/j.sbi.2025.103020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Revised: 01/23/2025] [Accepted: 01/31/2025] [Indexed: 02/27/2025]
Abstract
Structure-based drug discovery is a fundamental approach in modern drug development, leveraging computational models to predict protein-ligand interactions. AI-driven methodologies are significantly improving key aspects of the field, including ligand binding site prediction, protein-ligand binding pose estimation, scoring function development, and virtual screening. In this review, we summarize the recent AI-driven advances in various protein-ligand interaction prediction tasks. Traditional docking methods based on empirical scoring functions often lack accuracy, whereas AI models, including graph neural networks, mixture density networks, transformers, and diffusion models, have enhanced predictive performance. Ligand binding site prediction has been refined using geometric deep learning and sequence-based embeddings, aiding in the identification of potential druggable target sites. Binding pose prediction has evolved with sampling-based and regression-based models, as well as protein-ligand co-generation frameworks. AI-powered scoring functions now integrate physical constraints and deep learning techniques to improve binding affinity estimation, leading to more robust virtual screening strategies. Despite these advances, generalization across diverse protein-ligand pairs remains a challenge. As AI technologies continue to evolve, they are expected to revolutionize molecular docking and affinity prediction, increasing both the accuracy and efficiency of structure-based drug discovery.
Affiliation(s)
- Jaemin Sim
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea
- Dongwoo Kim
  - College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
- Bomin Kim
  - College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
- Jieun Choi
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea
- Juyong Lee
  - Department of Molecular Medicine and Biopharmaceutical Sciences, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, 08826, Republic of Korea
  - College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
  - Research Institute of Pharmaceutical Science, College of Pharmacy, Seoul National University, Seoul, 08826, Republic of Korea
  - Arontier Co., Seoul, 06735, Republic of Korea
4
Brouwer EM, Medipally HKR, Schwab S, Song S, Nowaczyk MM, Hagemann M. Characterization of the oxygen-tolerant formate dehydrogenase from Clostridium carboxidivorans. Front Microbiol 2025; 15:1527626. [PMID: 39872818 PMCID: PMC11770034 DOI: 10.3389/fmicb.2024.1527626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2024] [Accepted: 12/24/2024] [Indexed: 01/30/2025] [Open Access]
Abstract
Fixation of CO2 into the organic compound formate by formate dehydrogenases (FDHs) is regarded as the oldest autotrophic process on Earth. It has been proposed that an FDH-dependent CO2 fixation module could support CO2 assimilation even in photoautotrophic organisms. In the present study, we characterized FDH from Clostridium carboxidivorans (ccFDH) due to its ability to reduce CO2 under aerobic conditions. During the production of recombinant ccFDH, in which the selenocysteine codon was replaced by Cys, we were able to replace the W with Mo as the transition metal in the ccFDH metal cofactor, resulting in a two-fold increase of 6 μmol formate min⁻¹ in enzyme activity. Then, we generated ccFDH variants in which the strict NADH preference of the enzyme was changed to NADPH, as this reducing agent is produced in high amounts during the photosynthetic light process. Finally, we showed that the native ccFDH can also directly use ferredoxin as a reducing agent, which is produced by the photosynthetic light reactions at photosystem I. These data collectively suggest that ccFDH and, particularly, its optimized variants can be regarded as suitable enzymes to couple formate production to photosynthesis in photoautotrophic organisms, which could potentially support CO2 assimilation via the Calvin-Benson-Bassham (CBB) cycle and minimize CO2 losses due to photorespiration.
Affiliation(s)
- Eva-Maria Brouwer
  - Department of Plant Physiology, Institute of Biosciences, University of Rostock, Rostock, Germany
- Hitesh K. R. Medipally
  - Department of Plant Biochemistry, Faculty of Biology and Biotechnology, Ruhr-University Bochum, Bochum, Germany
  - Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, Stockholm, Sweden
- Saskia Schwab
  - Department of Plant Physiology, Institute of Biosciences, University of Rostock, Rostock, Germany
- Shanshan Song
  - Department of Plant Physiology, Institute of Biosciences, University of Rostock, Rostock, Germany
- Marc M. Nowaczyk
  - Department of Plant Biochemistry, Faculty of Biology and Biotechnology, Ruhr-University Bochum, Bochum, Germany
  - Department of Biochemistry, Institute of Biosciences, University of Rostock, Rostock, Germany
  - Department of Life, Light and Matter, Interdisciplinary Faculty, University of Rostock, Rostock, Germany
- Martin Hagemann
  - Department of Plant Physiology, Institute of Biosciences, University of Rostock, Rostock, Germany
  - Department of Life, Light and Matter, Interdisciplinary Faculty, University of Rostock, Rostock, Germany
5
Kim J, Woo J, Park JY, Kim KJ, Kim D. Deep learning for NAD/NADP cofactor prediction and engineering using transformer attention analysis in enzymes. Metab Eng 2025; 87:86-94. [PMID: 39571721 DOI: 10.1016/j.ymben.2024.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 09/25/2024] [Accepted: 11/17/2024] [Indexed: 12/13/2024]
Abstract
Understanding and manipulating the cofactor preferences of NAD(P)-dependent oxidoreductases, the most widely distributed enzyme group in nature, is increasingly crucial in bioengineering. However, large-scale identification of the cofactor preferences and the design of mutants to switch cofactor specificity remain complex tasks. Here, we introduce DISCODE (Deep learning-based Iterative pipeline to analyze Specificity of COfactors and to Design Enzyme), a novel transformer-based deep learning model to predict NAD(P) cofactor preferences. For model training, a total of 7,132 NAD(P)-dependent enzyme sequences were collected. Leveraging whole-length sequence information, DISCODE classifies the cofactor preferences of NAD(P)-dependent oxidoreductase protein sequences without structural or taxonomic limitation. The model showed an accuracy of 97.4% and an F1 score of 97.3%. A notable feature of DISCODE is the interpretability of its transformer layers. Analysis of attention layers in the model enables identification of several residues that showed significantly higher attention weights. They were well aligned with structurally important residues that closely interact with NAD(P), facilitating the identification of key residues for determining cofactor specificities. These key residues showed high consistency with verified cofactor switching mutants. Integrated into an enzyme design pipeline, DISCODE, coupled with attention analysis, enables a fully automated approach to redesign cofactor specificity.
Affiliation(s)
- Jaehyung Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
- Jihoon Woo
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
- Joon Young Park
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
- Kyung-Jin Kim
  - School of Life Sciences, BK21 FOUR KNU Creative BioResearch Group, KNU Institute of Microbiology, Kyungpook National University, Daegu, 41566, Republic of Korea
- Donghyuk Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, 44919, Republic of Korea
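The DISCODE abstract in entry 5 reports both accuracy and F1 score for a two-class cofactor prediction (NAD versus NADP). The sketch below shows how those two metrics are conventionally computed from true and predicted labels. The label strings and the toy data are assumptions for illustration and have no connection to the DISCODE dataset or code.

```python
def accuracy_and_f1(y_true, y_pred, positive):
    """Return (accuracy, F1) treating `positive` as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    tp = fp = fn = correct = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct += 1
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return correct / len(y_true), f1

# Toy labels (assumed): one NAD-preferring enzyme is mispredicted as NADP.
y_true = ["NAD", "NAD", "NADP", "NADP"]
y_pred = ["NAD", "NADP", "NADP", "NADP"]
acc, f1 = accuracy_and_f1(y_true, y_pred, positive="NAD")
```

Reporting both numbers is useful because accuracy alone can look high on an imbalanced dataset, while F1 penalizes missed or spurious positives.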
6
Ye Y, Jiang H, Xu R, Wang S, Zheng L, Guo J. The INSIGHT platform: Enhancing NAD(P)-dependent specificity prediction for co-factor specificity engineering. Int J Biol Macromol 2024; 278:135064. [PMID: 39182884 DOI: 10.1016/j.ijbiomac.2024.135064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Revised: 08/22/2024] [Accepted: 08/23/2024] [Indexed: 08/27/2024]
Abstract
Enzyme specificity towards cofactors like NAD(P)H is crucial for applications in bioremediation and eco-friendly chemical synthesis. Despite their role in converting pollutants and creating sustainable products, predicting enzyme specificity faces challenges due to sparse data and inadequate models. To bridge this gap, we developed the cutting-edge INSIGHT platform to enhance the prediction of coenzyme specificity in NAD(P)-dependent enzymes. INSIGHT integrates extensive data from principal bioinformatics resources, concentrating on both NADH and NADPH specificities, and utilizes advanced protein language models to refine the predictions. This integration not only strengthens computational predictions but also meets the practical demands of high-throughput screening and optimization. Experimental validation confirms INSIGHT's effectiveness, boosting our ability to engineer enzymes for efficient, sustainable industrial and environmental processes. This work advances the practical use of computational tools in enzyme research, addressing industrial needs and offering scalable solutions for environmental challenges.
Affiliation(s)
- Yilin Ye
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao
- Ran Xu
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao
- Sheng Wang
  - Shanghai Zelixir Biotech Company Ltd., China
- Jingjing Guo
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macao
7
Filgueiras JPC, Zámocký M, Turchetto-Zolet AC. Unraveling the evolutionary origin of the P5CS gene: a story of gene fusion and horizontal transfer. Front Mol Biosci 2024; 11:1341684. [PMID: 38693917 PMCID: PMC11061531 DOI: 10.3389/fmolb.2024.1341684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 03/25/2024] [Indexed: 05/03/2024] [Open Access]
Abstract
The accumulation of proline in response to the most diverse types of stress is a widespread defense mechanism. In prokaryotes, fungi, and certain unicellular eukaryotes (green algae), the first two reactions of proline biosynthesis occur through two distinct enzymes, γ-glutamyl kinase (GK E.C. 2.7.2.11) and γ-glutamyl phosphate reductase (GPR E.C. 1.2.1.41), encoded by two different genes, ProB and ProA, respectively. Plants, animals, and a few unicellular eukaryotes carry out these reactions through a single bifunctional enzyme, the Δ1-pyrroline-5-carboxylate synthase (P5CS), which has the GK and GPR domains fused. To better understand the origin and diversification of the P5CS gene, we use a robust phylogenetic approach with a broad sampling of the P5CS, ProB and ProA genes, including species from all three domains of life. Our results suggest that the collected P5CS genes have arisen from a single fusion event between the ProA and ProB gene paralogs. A peculiar fusion event occurred in an ancestral eukaryotic lineage and was spread to other lineages through horizontal gene transfer. As for the diversification of this gene family, the phylogeny of the P5CS gene in plants shows that there have been multiple independent processes of duplication and loss of this gene, with the duplications being related to old polyploidy events.
Affiliation(s)
- João Pedro Carmo Filgueiras
  - Graduate Program in Genetics and Molecular Biology, Department of Genetics, Institute of Biosciences, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
- Marcel Zámocký
  - Laboratory of Phylogenomic Ecology, Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovakia
- Andreia Carina Turchetto-Zolet
  - Graduate Program in Genetics and Molecular Biology, Department of Genetics, Institute of Biosciences, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
8
Vadaddi SM, Zhao Q, Savoie BM. Graph to Activation Energy Models Easily Reach Irreducible Errors but Show Limited Transferability. J Phys Chem A 2024; 128:2543-2555. [PMID: 38517281 DOI: 10.1021/acs.jpca.3c07240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Activation energy characterization of competing reactions is a costly but crucial step for understanding the kinetic relevance of distinct reaction pathways, product yields, and myriad other properties of reacting systems. The standard methodology for activation energy characterization has historically been a transition state search using the highest level of theory that can be afforded. However, recently, several groups have popularized the idea of predicting activation energies directly based on nothing more than the reactant and product graphs, a sufficiently complex neural network, and a broad enough data set. Here, we have revisited this task using the recently developed Reaction Graph Depth 1 (RGD1) transition state data set and several newly developed graph attention architectures. All of these new architectures achieve similar state-of-the-art results of ∼4 kcal/mol mean absolute error on withheld testing sets of reactions but poor performance on external testing sets composed of reactions with differing mechanisms, reaction molecularity, or reactant size distribution. Limited transferability is also shown to be shared by other contemporary graph to activation energy architectures through a series of case studies. We conclude that an array of standard graph architectures can already achieve results comparable to the irreducible error of available reaction data sets but that out-of-distribution performance remains poor.
Affiliation(s)
- Sai Mahit Vadaddi
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
- Qiyuan Zhao
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Brett M Savoie
  - Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana 47906, United States
9
Wang Z, Wang S, Li Y, Guo J, Wei Y, Mu Y, Zheng L, Li W. A new paradigm for applying deep learning to protein-ligand interaction prediction. Brief Bioinform 2024; 25:bbae145. [PMID: 38581420 PMCID: PMC10998640 DOI: 10.1093/bib/bbae145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 03/18/2024] [Indexed: 04/08/2024] [Open Access]
Abstract
Protein-ligand interaction prediction presents a significant challenge in drug design. Numerous machine learning and deep learning (DL) models have been developed to accurately identify docking poses of ligands and active compounds against specific targets. However, current models often suffer from inadequate accuracy or lack practical physical significance in their scoring systems. In this research paper, we introduce IGModel, a novel approach that utilizes the geometric information of protein-ligand complexes as input for predicting the root mean square deviation of docking poses and the binding strength (pKd, the negative value of the logarithm of binding affinity) within the same prediction framework. This ensures that the output scores carry intuitive meaning. We extensively evaluate the performance of IGModel on various docking power test sets, including the CASF-2016 benchmark, PDBbind-CrossDocked-Core and DISCO set, consistently achieving state-of-the-art accuracies. Furthermore, we assess IGModel's generalizability and robustness by evaluating it on unbiased test sets and sets containing target structures generated by AlphaFold2. The exceptional performance of IGModel on these sets demonstrates its efficacy. Additionally, we visualize the latent space of protein-ligand interactions encoded by IGModel and conduct interpretability analysis, providing valuable insights. This study presents a novel framework for DL-based prediction of protein-ligand interactions, contributing to the advancement of this field. The IGModel is available at GitHub repository https://github.com/zchwang/IGModel.
Affiliation(s)
- Zechen Wang
  - School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
- Sheng Wang
  - Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
- Yangyang Li
  - School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
- Jingjing Guo
  - Centre in Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Rua de Luís Gonzaga Gomes, Macao, China
- Yanjie Wei
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, Singapore
- Liangzhen Zheng
  - Shanghai Zelixir Biotech, Xiangke Road, 200030, Shanghai, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Xueyuan Road 1068, Shenzhen, 518055 Guang Dong, China
- Weifeng Li
  - School of Physics, Shandong University, South Shanda Road, 250100 Shandong, China
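The IGModel abstract in entry 9 names two prediction targets: the root mean square deviation (RMSD) of a docking pose and the binding strength pKd. The sketch below shows the standard definitions of both quantities. It assumes Kd is given in mol/L and that the two poses list the same atoms in the same order, with no superposition step; the function names and sample values are illustrative and are not taken from the IGModel repository.

```python
import math

def pkd(kd_molar):
    """pKd = -log10(Kd), with Kd assumed to be in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)

def rmsd(pose_a, pose_b):
    """Root mean square deviation between two poses with matched atom order.

    No alignment is performed, on the assumption that both poses already
    sit in the same receptor coordinate frame.
    """
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))

# A 1 nM binder corresponds to a pKd of about 9.
strength = pkd(1e-9)

# Toy two-atom ligand shifted by 1 unit along z (assumed coordinates).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(reference, docked)
```

Both outputs have a direct physical reading: a lower RMSD means a pose closer to the reference, and a higher pKd means tighter binding, which is the "intuitive meaning" the abstract refers to.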
10
Han S, Lee JE, Kang S, So M, Jin H, Lee JH, Baek S, Jun H, Kim TY, Lee YS. Standigm ASK™: knowledge graph and artificial intelligence platform applied to target discovery in idiopathic pulmonary fibrosis. Brief Bioinform 2024; 25:bbae035. [PMID: 38349059 PMCID: PMC10862655 DOI: 10.1093/bib/bbae035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 12/28/2023] [Indexed: 02/15/2024] [Open Access]
Abstract
Standigm ASK™ revolutionizes healthcare by addressing the critical challenge of identifying pivotal target genes in disease mechanisms-a fundamental aspect of drug development success. Standigm ASK™ integrates a unique combination of a heterogeneous knowledge graph (KG) database and an attention-based neural network model, providing interpretable subgraph evidence. Empowering users through an interactive interface, Standigm ASK™ facilitates the exploration of predicted results. Applying Standigm ASK™ to idiopathic pulmonary fibrosis (IPF), a complex lung disease, we focused on genes (AMFR, MDFIC and NR5A2) identified through KG evidence. In vitro experiments demonstrated their relevance, as TGFβ treatment induced gene expression changes associated with epithelial-mesenchymal transition characteristics. Gene knockdown reversed these changes, identifying AMFR, MDFIC and NR5A2 as potential therapeutic targets for IPF. In summary, Standigm ASK™ emerges as an innovative KG and artificial intelligence platform driving insights in drug target discovery, exemplified by the identification and validation of therapeutic targets for IPF.
Affiliation(s)
- Seokjin Han
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Ji Eun Lee
  - College of Pharmacy, Ewha Womans University, Ewhayeodae-gil, 03760, Seoul, Republic of Korea
- Seolhee Kang
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Minyoung So
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Hee Jin
  - College of Pharmacy, Ewha Womans University, Ewhayeodae-gil, 03760, Seoul, Republic of Korea
- Jang Ho Lee
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Sunghyeob Baek
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Hyungjin Jun
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Tae Yong Kim
  - Standigm Inc., Nonhyeon-ro 85-gil, 06234, Seoul, Republic of Korea
- Yun-Sil Lee
  - College of Pharmacy, Ewha Womans University, Ewhayeodae-gil, 03760, Seoul, Republic of Korea
11
Lin X, Chen J, Ma W, Tang W, Wang Y. EEG emotion recognition using improved graph neural network with channel selection. Comput Methods Programs Biomed 2023; 231:107380. [PMID: 36745954 DOI: 10.1016/j.cmpb.2023.107380] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 11/10/2022] [Revised: 01/11/2023] [Accepted: 01/26/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND AND OBJECTIVE: Emotion classification tasks based on electroencephalography (EEG) are an essential part of artificial intelligence, with promising applications in healthcare areas such as autism research and emotion detection in pregnant women. However, the complex data acquisition environment provides a variable number of EEG channels, which interferes with the model to simulate the process of information transfer in the human brain. Therefore, this paper proposes an improved graph convolution model with dynamic channel selection.
METHODS: The proposed model combines the advantages of 1D convolution and graph convolution to capture the intra- and inter-channel EEG features, respectively. We add functional connectivity in the graph structure that helps to simulate the relationship between brain regions further. In addition, an adjustable scale of channel selection can be performed based on the attention distribution in the graph structure.
RESULTS: We conducted various experiments on the DEAP-Twente, DEAP-Geneva, and SEED datasets and achieved average accuracies of 90.74%, 91%, and 90.22%, respectively, which exceeded most existing models. Meanwhile, with only 20% of the EEG channels retained, the models achieved average accuracies of 82.78%, 84%, and 83.93% on the above three datasets, respectively.
CONCLUSIONS: The experimental results show that the proposed model can achieve effective emotion classification in complex dataset environments. Also, the proposed channel selection method is informative for reducing the cost of affective computing.
Affiliation(s)
- Xuefen Lin
  - School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, PR China
- Jielin Chen
  - School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, PR China
- Weifeng Ma
  - School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, PR China
- Wei Tang
  - School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, PR China
- Yuchen Wang
  - School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, PR China
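The abstract in entry 11 describes selecting an adjustable share of EEG channels from the attention distribution, for example retaining only 20% of them. One simple way to picture this is a top-k selection over per-channel attention scores, sketched below. This is an assumed simplification rather than the authors' method; the function name, the channel labels, and the scores are hypothetical.

```python
import math

def select_channels(attention, keep_fraction):
    """Return the highest-scoring channels, keeping `keep_fraction` of them.

    `attention` maps a channel name to a non-negative score. At least one
    channel is always kept.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, math.ceil(len(attention) * keep_fraction))
    ranked = sorted(attention, key=attention.get, reverse=True)
    return ranked[:k]

# Hypothetical attention scores for five electrodes.
scores = {"Fp1": 0.30, "Fz": 0.05, "Cz": 0.10, "Pz": 0.40, "O1": 0.15}
kept = select_channels(scores, keep_fraction=0.2)
```

With five channels and a 20% budget, only the single highest-scoring electrode survives. In practice the retained subset would then be used to retrain or re-evaluate the classifier, which is where the accuracy drop reported in the abstract would show up.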
12
Li G, Buric F, Zrimec J, Viknander S, Nielsen J, Zelezniak A, Engqvist MKM. Learning deep representations of enzyme thermal adaptation. Protein Sci 2022; 31:e4480. [PMID: 36261883 PMCID: PMC9679980 DOI: 10.1002/pro.4480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/28/2022] [Revised: 09/02/2022] [Accepted: 10/15/2022] [Indexed: 12/14/2022]
Abstract
Temperature is a fundamental environmental factor that shapes the evolution of organisms. Learning thermal determinants of protein sequences in evolution thus has profound significance for basic biology, drug discovery, and protein engineering. Here, we use a data set of over 3 million BRENDA enzymes labeled with optimal growth temperatures (OGTs) of their source organisms to train a deep neural network model (DeepET). The protein-temperature representations learned by DeepET provide a temperature-related statistical summary of protein sequences and capture structural properties that affect thermal stability. For prediction of enzyme optimal catalytic temperatures and protein melting temperatures via a transfer learning approach, our DeepET model outperforms classical regression models trained on rationally designed features and other deep-learning-based representations. DeepET thus holds promise for understanding enzyme thermal adaptation and guiding the engineering of thermostable enzymes.
Affiliation(s)
- Gang Li
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Sandra Viknander
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Jens Nielsen
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - BioInnovation Institute, Copenhagen N, Denmark
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Life Sciences Centre, Institute of Biotechnology, Vilnius University, Vilnius, Lithuania
  - Randall Centre for Cell & Molecular Biophysics, King's College London, New Hunt's House, Guy's Campus, SE1 1UL, London, UK
- Martin K. M. Engqvist
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Enginzyme AB, Stockholm, Sweden
13
Ludwiczak J, Winski A, Dunin-Horkawicz S. Localpdb - a Python package to manage protein structures and their annotations. Bioinformatics 2022; 38:2633-2635. [PMID: 35199148 PMCID: PMC9048648 DOI: 10.1093/bioinformatics/btac121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2021] [Revised: 01/07/2022] [Accepted: 02/21/2022] [Indexed: 12/02/2022] [Open Access]
Abstract
Motivation: The wealth of protein structures collected in the Protein Data Bank enabled large-scale studies of their function and evolution. Such studies, however, require the generation of customized datasets combining the structural data with miscellaneous accessory resources providing functional, taxonomic and other annotations. Unfortunately, the functionality of currently available tools for the creation of such datasets is limited and their usage frequently requires laborious surveying of various data sources and resolving inconsistencies between their versions.
Results: To address this problem, we developed localpdb, a versatile Python library for the management of protein structures and their annotations. The library features a flexible plugin system enabling seamless unification of the structural data with diverse auxiliary resources, full version control and powerful functionality of creating highly customized datasets. The localpdb can be used in a wide range of bioinformatic tasks, in particular those involving large-scale protein structural analyses and machine learning.
Availability and implementation: localpdb is freely available at https://github.com/labstructbioinf/localpdb. Documentation along with the usage examples can be accessed at https://labstructbioinf.github.io/localpdb/.
Affiliation(s)
- Jan Ludwiczak
  - Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, 02-097, Poland
- Aleksander Winski
  - Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, 02-097, Poland
- Stanislaw Dunin-Horkawicz
  - Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, 02-097, Poland