Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kulmanov M, Khan MA, Hoehndorf R, Wren J. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics 2018;34:660-668. [PMID: 29028931 PMCID: PMC5860606 DOI: 10.1093/bioinformatics/btx624] [Citation(s) in RCA: 254] [Impact Index Per Article: 36.3] [Received: 05/10/2017] [Accepted: 09/27/2017] [Indexed: 12/29/2022] Open

Cited by Other Article(s)

Liu J, Li K, Tang X, Zhang Y, Guan X. Grain protein function prediction based on improved FCN and bidirectional LSTM. Food Chem 2025;482:143955. [PMID: 40209386 DOI: 10.1016/j.foodchem.2025.143955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 03/10/2025] [Accepted: 03/17/2025] [Indexed: 04/12/2025]

Cross MCG, Aboulnaga E, TerAvest MA. A small number of point mutations confer formate tolerance in Shewanella oneidensis. Appl Environ Microbiol 2025;91:e0196824. [PMID: 40207971 DOI: 10.1128/aem.01968-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Accepted: 01/18/2025] [Indexed: 04/11/2025] Open

Abstract

Microbial electrosynthesis (MES) is a sustainable approach to chemical production from CO2 and clean electricity. However, limitations in electron transfer efficiency and gaps in understanding of electron transfer pathways in MES systems prevent full realization of this technology. Shewanella oneidensis could serve as an MES biocatalyst because it has a well-studied, efficient transmembrane electron transfer pathway. A key first step in MES in this organism could be CO2 reduction to formate. However, we report that wild-type S. oneidensis does not tolerate high levels of formate. In this work, we created and characterized formate-tolerant strains of S. oneidensis for further engineering and future use in MES systems through adaptive laboratory evolution. Two different point mutations in a gene encoding a predicted sodium-dependent bicarbonate transporter and a DUF2721-containing protein separately confer formate tolerance to S. oneidensis. The mutations were further evaluated to understand their role in improving formate tolerance. We also show that the wild-type and mutant versions of the putative sodium-dependent bicarbonate transporter improve formate tolerance of Zymomonas mobilis, indicating the potential of transferring this formate tolerance phenotype to other organisms.

IMPORTANCE

Shewanella oneidensis is a bacterium with a well-studied, efficient extracellular electron transfer pathway. This capability could make this organism a suitable host for microbial electrosynthesis using CO2 or formate as feedstocks. However, we report here that formate is toxic to S. oneidensis, limiting the potential for its use in these systems. In this work, we evolve several strains of S. oneidensis that have improved formate tolerance, and we investigate some mutations that confer this phenotype. The phenotype is confirmed to be attributed to several single point mutations by transferring the wild-type and mutant versions of each gene to the wild-type strain. Finally, the formate tolerance mechanism of one variant is studied using structural modeling and expression in another host. This study, therefore, presents a simple method for conferring formate tolerance to bacterial hosts.

Kong D, Qian J, Gao C, Wang Y, Shi T, Ye C. Machine Learning Empowering Microbial Cell Factory: A Comprehensive Review. Appl Biochem Biotechnol 2025:10.1007/s12010-025-05260-x. [PMID: 40397295 DOI: 10.1007/s12010-025-05260-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/02/2025] [Indexed: 05/22/2025]

Percudani R, De Rito C. Predicting Protein Function in the AI and Big Data Era. Biochemistry 2025. [PMID: 40380914 DOI: 10.1021/acs.biochem.5c00186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2025]

Shao J, Chen J, Liu B. ProFun-SOM: Protein Function Prediction for Specific Ontology Based on Multiple Sequence Alignment Reconstruction. IEEE Trans Neural Netw Learn Syst 2025;36:8060-8071. [PMID: 38980781 DOI: 10.1109/tnnls.2024.3419250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2024]

de Oliveira GB, Pedrini H, Dias Z. SUPERMAGO: Protein Function Prediction Based on Transformer Embeddings. Proteins 2025;93:981-996. [PMID: 39711079 DOI: 10.1002/prot.26782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 11/28/2024] [Accepted: 12/09/2024] [Indexed: 12/24/2024]

Kong Y, Chen H, Huang X, Chang L, Yang B, Chen W. Precise metabolic modeling in post-omics era: accomplishments and perspectives. Crit Rev Biotechnol 2025;45:683-701. [PMID: 39198033 DOI: 10.1080/07388551.2024.2390089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 07/18/2024] [Accepted: 07/23/2024] [Indexed: 09/01/2024]

Kelly T, Xia S, Lu J, Zhang Y. Unified Deep Learning of Molecular and Protein Language Representations with T5ProtChem. J Chem Inf Model 2025;65:3990-3998. [PMID: 40197028 PMCID: PMC12042257 DOI: 10.1021/acs.jcim.5c00051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2025] [Revised: 03/19/2025] [Accepted: 03/31/2025] [Indexed: 04/09/2025]

Zhang H, Sun Y, Wang Y, Luo X, Liu Y, Chen B, Jin X, Zhu D. GTPLM-GO: Enhancing Protein Function Prediction Through Dual-Branch Graph Transformer and Protein Language Model Fusing Sequence and Local-Global PPI Information. Int J Mol Sci 2025;26:4088. [PMID: 40362328 PMCID: PMC12072039 DOI: 10.3390/ijms26094088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2025] [Revised: 04/21/2025] [Accepted: 04/23/2025] [Indexed: 05/15/2025] Open

Chen S, Zheng P, Zheng L, Yao Q, Meng Z, Lin L, Chen X, Liu R. BERT-DomainAFP: Antifreeze protein recognition and classification model based on BERT and structural domain annotation. iScience 2025;28:112077. [PMID: 40241758 PMCID: PMC12002629 DOI: 10.1016/j.isci.2025.112077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Revised: 01/03/2025] [Accepted: 02/17/2025] [Indexed: 04/18/2025] Open

Affiliation(s)

Shengzhen Chen: State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Ping Zheng: State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Lele Zheng: State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Qinglong Yao: State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Ziyu Meng: State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Longshan Lin: Laboratory of Marine Biodiversity Research, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen 361005, China
Xinhua Chen: State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Ruoyu Liu: State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China

Xiong Y, Yuan S, Xiong Y, Li L, Peng J, Zhang J, Fan X, Jiang C, Sha LN, Wang Z, Peng X, Zhang Z, Yu Q, Lei X, Dong Z, Liu Y, Zhao J, Li G, Yang Z, Jia S, Li D, Sun M, Bai S, Liu J, Yang Y, Ma X. Analysis of allohexaploid wheatgrass genome reveals its Y haplome origin in Triticeae and high-altitude adaptation. Nat Commun 2025;16:3104. [PMID: 40164609 PMCID: PMC11958778 DOI: 10.1038/s41467-025-58341-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Accepted: 03/19/2025] [Indexed: 04/02/2025] Open

Affiliation(s)

Yi Xiong: College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
Shuai Yuan: State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
Yanli Xiong: College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China; Sichuan Academy of Grassland Sciences, Chengdu, Sichuan, 611700, China
Lizuiyue Li: National Plateau Wetlands Research Center, Southwest Forestry University, Kunming, 650224, China; Yunnan Key Laboratory of Plateau Wetland Conservation Restoration and Ecological Services, Southwest Forestry University, Kunming, 650224, China
Jinghan Peng: College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
Jin Zhang: State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
Xing Fan: Triticeae Research Institute, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
Chengzhi Jiang: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
Li-Na Sha: College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
Zhaoting Wang: College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
Xue Peng: College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
Zecheng Zhang: College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
Qingqing Yu: Sichuan Academy of Grassland Sciences, Chengdu, Sichuan, 611700, China
Xiong Lei: Sichuan Academy of Grassland Sciences, Chengdu, Sichuan, 611700, China
Zhixiao Dong: College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
Yingjie Liu: College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
Junming Zhao: College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China
Guangrong Li: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
Zujun Yang: School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, 610054, China
Shangang Jia: College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
Daxu Li: Sichuan Academy of Grassland Sciences, Chengdu, Sichuan, 611700, China
Ming Sun: School of Life Science and Engineering, Southwest University of Science and Technology, Mianyang, Sichuan, 621010, China
Shiqie Bai: School of Life Science and Engineering, Southwest University of Science and Technology, Mianyang, Sichuan, 621010, China
Jianquan Liu: State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
Yongzhi Yang: State Key Laboratory of Herbage Improvement and Grassland Agro-Ecosystem, College of Ecology, Lanzhou University, Lanzhou, 730000, China
Xiao Ma: College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu, Sichuan, 611130, China

Kim HR, Ji H, Kim GB, Lee SY. Enzyme functional classification using artificial intelligence. Trends Biotechnol 2025:S0167-7799(25)00088-5. [PMID: 40155269 DOI: 10.1016/j.tibtech.2025.03.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2025] [Revised: 02/27/2025] [Accepted: 03/06/2025] [Indexed: 04/01/2025]

Affiliation(s)

Ha Rim Kim: Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
Hongkeun Ji: Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
Gi Bae Kim: Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea; BioProcess Engineering Research Center, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
Sang Yup Lee: Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 four), KAIST Institute for BioCentury, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea; Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea; Graduate School of Engineering Biology, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea; BioProcess Engineering Research Center, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea; Center for Synthetic Biology, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea

Mao Y, Xu W, Shun Y, Chai L, Xue L, Yang Y, Li M. A multimodal model for protein function prediction. Sci Rep 2025;15:10465. [PMID: 40140535 PMCID: PMC11947276 DOI: 10.1038/s41598-025-94612-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2025] [Accepted: 03/14/2025] [Indexed: 03/28/2025] Open

Li J, Chen X, Huang H, Zeng M, Yu J, Gong X, Ye Q. $\mathcal{S}$able: bridging the gap in protein structure understanding with an empowering and versatile pre-training paradigm. Brief Bioinform 2025;26:bbaf120. [PMID: 40163822 PMCID: PMC11957296 DOI: 10.1093/bib/bbaf120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 01/23/2025] [Accepted: 02/23/2025] [Indexed: 04/02/2025] Open

Wijaya AJ, Anžel A, Richard H, Hattab G. Current state and future prospects of Horizontal Gene Transfer detection. NAR Genom Bioinform 2025;7:lqaf005. [PMID: 39935761 PMCID: PMC11811736 DOI: 10.1093/nargab/lqaf005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Revised: 12/26/2024] [Accepted: 02/04/2025] [Indexed: 02/13/2025] Open

Rosati R, Romeo L, Vargas VM, Gutierrez PA, Frontoni E, Hervas-Martinez C. Learning Ordinal-Hierarchical Constraints for Deep Learning Classifiers. IEEE Trans Neural Netw Learn Syst 2025;36:4765-4778. [PMID: 38347692 DOI: 10.1109/tnnls.2024.3360641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2024]

Luo J, Luo Y. Learning maximally spanning representations improves protein function annotation. bioRxiv 2025:2025.02.13.638156. [PMID: 40027840 PMCID: PMC11870436 DOI: 10.1101/2025.02.13.638156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]

Abstract

Automated protein function annotation is a fundamental problem in computational biology, crucial for understanding the functional roles of proteins in biological processes, with broad implications in medicine and biotechnology. A persistent challenge in this problem is the imbalanced, long-tail distribution of available function annotations: a small set of well-studied function classes account for most annotated proteins, while many other classes have few annotated proteins, often due to investigative bias, experimental limitations, or intrinsic biases in protein evolution. As a result, existing machine learning models for protein function prediction tend to only optimize the prediction accuracy for well-studied function classes overrepresented in the training data, leading to poor accuracy for understudied functions. In this work, we develop MSRep, a novel deep learning-based protein function annotation framework designed to address this imbalance issue and improve annotation accuracy. MSRep is inspired by an intriguing phenomenon, called neural collapse (NC), commonly observed in high-accuracy deep neural networks used for classification tasks, where hidden representations in the final layer collapse to class-specific mean embeddings, while maintaining maximal inter-class separation. Given that NC consistently emerges across diverse architectures and tasks for high-accuracy models, we hypothesize that inducing NC structure in models trained on imbalanced data can enhance both prediction accuracy and generalizability. To achieve this, MSRep refines a pre-trained protein language model to produce NC-like representations by optimizing an NC-inspired loss function, which ensures that minority functions are equally represented in the embedding space as majority functions, in contrast to conventional classification methods whose embedding spaces are dominated by overrepresented classes. 
In evaluations across four protein function annotation tasks on the prediction of Enzyme Commission numbers, Gene3D codes, Pfam families, and Gene Ontology terms, MSRep demonstrates superior predictive performance for both well- and underrepresented classes, outperforming several state-of-the-art annotation tools. We anticipate that MSRep will enhance the annotation of understudied functions and novel, uncharacterized proteins, advancing future protein function studies and accelerating the discovery of new functional proteins. The source code of MSRep is available at https://github.com/luo-group/MSRep.

Lee Y, Gao P, Xu Y, Wang Z, Li S, Chen J. MEGA-GO: functions prediction of diverse protein sequence length using Multi-scalE Graph Adaptive neural network. Bioinformatics 2025;41:btaf032. [PMID: 39847542 PMCID: PMC11810639 DOI: 10.1093/bioinformatics/btaf032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Revised: 01/13/2025] [Accepted: 01/21/2025] [Indexed: 01/25/2025] Open

Chen JY, Wang JF, Hu Y, Li XH, Qian YR, Song CL. Evaluating the advancements in protein language models for encoding strategies in protein function prediction: a comprehensive review. Front Bioeng Biotechnol 2025;13:1506508. [PMID: 39906415 PMCID: PMC11790633 DOI: 10.3389/fbioe.2025.1506508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2024] [Accepted: 01/02/2025] [Indexed: 02/06/2025] Open

Chou JC, Dassama LMK. Lipid Trafficking in Diverse Bacteria. Acc Chem Res 2025;58:36-46. [PMID: 39680024 PMCID: PMC11713862 DOI: 10.1021/acs.accounts.4c00540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 11/27/2024] [Accepted: 12/02/2024] [Indexed: 12/17/2024]

Abstract

Lipids are essential for life and serve as cell envelope components, signaling molecules, and nutrients. For lipids to achieve their required functions, they need to be correctly localized. This requires the action of transporter proteins and an energy source. The current understanding of bacterial lipid transporters is limited to a few classes. Given the diversity of lipid species and the predicted existence of specific lipid transporters, many more transporters await discovery and characterization. These proteins could be prime targets for modulators that control bacterial cell proliferation and pathogenesis. One overarching goal of our research is to understand the molecular mechanisms of bacterial metabolite trafficking, including lipids, and to leverage that understanding to identify or engineer inhibitory ligands. In recent years, our work has revealed two novel lipid transport systems in bacteria: bacterial sterol transporters (Bst) A, B, and C in Methylococcus capsulatus and the TatT proteins in Enhygromyxa salina and Treponema pallidum. Both systems are composed of transporters bioinformatically identified as being involved in the transport of other metabolites, but substrates were never revealed. However, the genetic colocalization of the genes encoding BstABC with sterol biosynthetic enzymes in M. capsulatus suggested that they might recognize sterols as substrates. Also, homologues of TatTs are present in diverse bacteria but are overrepresented in bacteria deficient in de novo lipid synthesis or residing in nutrient-poor environments; we reasoned that these proteins might facilitate the transport of lipids. Our efforts to reveal the substrate scope of two TatT proteins revealed their engagement with long-chain fatty acids. 
Enabling the discovery of the BstABC system and the TatT proteins were bioinformatic analyses, quantitative measurements of protein-ligand equilibrium affinities, and high-resolution structural studies that provided remarkable insights into ligand binding cavities and the structural basis for ligand interaction. These approaches, in particular our bioinformatics and structural work, highlighted the diversity of protein sequence and structures amenable to lipid engagement. These observations allowed the hypothesis that lipid handling proteins, in general and especially so in the bacterial domain, can have diverse amino acid compositions and three-dimensional structures. As such, bioinformatics geared at identifying them in poorly characterized genomes is likely to miss many candidates that diverge from well-characterized family members. This realization spurred efforts to understand the unifying features in all of the lipid handling proteins we have characterized to date. To do this, we inspected the ligand binding sites of the proteins: they were remarkably hydrophobic and sometimes displayed a dichotomy of hydrophobic and hydrophilic amino acids, akin to the ligands that they accommodate in those cavities. Because of this, we reasoned that the physicochemical features of ligand binding cavities could be accurate predictors of a protein's propensity to bind lipids. This finding was leveraged to create structure-based lipid-interacting pocket predictor (SLiPP), a machine-learning algorithm capable of identifying ligand cavities with physico-chemical features consistent with those of known lipid binding sites. SLiPP is especially useful in poorly annotated genomes (such as with bacterial pathogens), where it could reveal candidate proteins to be targeted for the development of antimicrobials.

Wang W, Shuai Y, Zeng M, Fan W, Li M. DPFunc: accurately predicting protein function via deep learning with domain-guided structure information. Nat Commun 2025;16:70. [PMID: 39746897 PMCID: PMC11697396 DOI: 10.1038/s41467-024-54816-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Accepted: 11/21/2024] [Indexed: 01/04/2025] Open

Boadu F, Lee A, Cheng J. Deep learning methods for protein function prediction. Proteomics 2025;25:e2300471. [PMID: 38996351 PMCID: PMC11735672 DOI: 10.1002/pmic.202300471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 06/15/2024] [Accepted: 06/18/2024] [Indexed: 07/14/2024]

Yan R, Islam MT, Xing L. Deep representation learning of protein-protein interaction networks for enhanced pattern discovery. Sci Adv 2024;10:eadq4324. [PMID: 39693438 DOI: 10.1126/sciadv.adq4324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 11/14/2024] [Indexed: 12/20/2024]

Crawford J, Chikina M, Greene CS. Best holdout assessment is sufficient for cancer transcriptomic model selection. Patterns (N Y) 2024;5:101115. [PMID: 39776849 PMCID: PMC11701843 DOI: 10.1016/j.patter.2024.101115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 08/01/2024] [Accepted: 11/13/2024] [Indexed: 01/11/2025]

Soleymani F, Paquet E, Viktor HL, Michalowski W. Structure-based protein and small molecule generation using EGNN and diffusion models: A comprehensive review. Comput Struct Biotechnol J 2024;23:2779-2797. [PMID: 39050782 PMCID: PMC11268121 DOI: 10.1016/j.csbj.2024.06.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 06/13/2024] [Accepted: 06/18/2024] [Indexed: 07/27/2024] Open

Shi W, Zhang Y, Sun Y, Lin Z. Function-Genes and Disease-Genes Prediction Based on Network Embedding and One-Class Classification. Interdiscip Sci 2024;16:781-801. [PMID: 39230798 DOI: 10.1007/s12539-024-00638-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 05/14/2024] [Accepted: 05/21/2024] [Indexed: 09/05/2024]

Xiang W, Xiong Z, Chen H, Xiong J, Zhang W, Fu Z, Zheng M, Liu B, Shi Q. FAPM: functional annotation of proteins using multimodal models beyond structural modeling. Bioinformatics 2024;40:btae680. [PMID: 39540736 PMCID: PMC11630832 DOI: 10.1093/bioinformatics/btae680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Revised: 10/12/2024] [Accepted: 11/12/2024] [Indexed: 11/16/2024] Open

Vu TTD, Kim J, Jung J. An experimental analysis of graph representation learning for Gene Ontology based protein function prediction. PeerJ 2024;12:e18509. [PMID: 39553733 PMCID: PMC11569786 DOI: 10.7717/peerj.18509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Accepted: 10/21/2024] [Indexed: 11/19/2024] Open

Liu Q, Zhang C, Freddolino L. InterLabelGO+: unraveling label correlations in protein function prediction. Bioinformatics 2024;40:btae655. [PMID: 39499152 PMCID: PMC11568131 DOI: 10.1093/bioinformatics/btae655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2024] [Revised: 10/07/2024] [Accepted: 11/01/2024] [Indexed: 11/07/2024] Open

Kumar V, Deepak A, Ranjan A, Prakash A. CrossPredGO: A Novel Light-Weight Cross-Modal Multi-Attention Framework for Protein Function Prediction. IEEE/ACM Trans Comput Biol Bioinform 2024;21:1709-1720. [PMID: 38843056 DOI: 10.1109/tcbb.2024.3410696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/27/2024]

Kumar V, Deepak A, Ranjan A, Prakash A. Bi-SeqCNN: A Novel Light-Weight Bi-Directional CNN Architecture for Protein Function Prediction. IEEE/ACM Trans Comput Biol Bioinform 2024;21:1922-1933. [PMID: 38990747 DOI: 10.1109/tcbb.2024.3426491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]

Taha K. Employing Machine Learning Techniques to Detect Protein Function: A Survey, Experimental, and Empirical Evaluations. IEEE/ACM Trans Comput Biol Bioinform 2024;21:1965-1986. [PMID: 39008392 DOI: 10.1109/tcbb.2024.3427381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]

Li L, Dannenfelser R, Cruz C, Yao V. A best-match approach for gene set analyses in embedding spaces. Genome Res 2024;34:1421-1433. [PMID: 39231608 PMCID: PMC11529866 DOI: 10.1101/gr.279141.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 08/29/2024] [Indexed: 09/06/2024]

Langschied F, Bordin N, Cosentino S, Fuentes-Palacios D, Glover N, Hiller M, Hu Y, Huerta-Cepas J, Coelho LP, Iwasaki W, Majidian S, Manzano-Morales S, Persson E, Richards TA, Gabaldón T, Sonnhammer E, Thomas PD, Dessimoz C, Ebersberger I. Quest for Orthologs in the Era of Biodiversity Genomics. Genome Biol Evol 2024;16:evae224. [PMID: 39404012 PMCID: PMC11523110 DOI: 10.1093/gbe/evae224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/11/2024] [Indexed: 11/01/2024] Open

Affiliation(s)

Felix Langschied: Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
Nicola Bordin: Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK
Salvatore Cosentino: Department of Integrated Biosciences, The University of Tokyo, 277-0882 Tokyo, Japan
Diego Fuentes-Palacios: Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
Natasha Glover: SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
Michael Hiller: Department of Comparative Genomics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
Yanhui Hu: Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Drosophila RNAi Screening Center, Harvard Medical School, Boston, MA 02115, USA
Jaime Huerta-Cepas: Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), Campus de Montegancedo-UPM, Madrid, Spain
Luis Pedro Coelho: Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Translational Research Institute, Woolloongabba, Queensland, Australia
Wataru Iwasaki: Department of Integrated Biosciences, University of Tokyo, 277-0882 Tokyo, Japan
Sina Majidian: SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
Saioa Manzano-Morales: Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
Emma Persson: Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
Thomas A Richards: Department of Biology, University of Oxford, Oxford, OX1 3SZ, UK
Toni Gabaldón: Barcelona Supercomputing Center (BSC-CNS), 08034 Barcelona, Spain; Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain; Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain; CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Madrid, Spain
Erik Sonnhammer: Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Solna, Sweden
Paul D Thomas: Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
Christophe Dessimoz: SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland; Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
Ingo Ebersberger: Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany; LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany; Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany

Meng L, Wang X. TAWFN: a deep learning framework for protein function prediction. Bioinformatics 2024;40:btae571. [PMID: 39312678 PMCID: PMC11639667 DOI: 10.1093/bioinformatics/btae571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 08/27/2024] [Accepted: 09/19/2024] [Indexed: 09/25/2024] Open

Abstract

MOTIVATION

Proteins play pivotal roles in biological systems, and precise prediction of their functions is indispensable for practical applications. Despite the surge in protein sequence data facilitated by high-throughput techniques, unraveling the exact functionalities of proteins still demands considerable time and resources. Currently, numerous methods rely on protein sequences for prediction, while methods targeting protein structures are scarce, often employing convolutional neural networks (CNN) or graph convolutional networks (GCNs) individually.

RESULTS

To address these challenges, our approach starts from protein structures and proposes a method that combines CNN and GCN into a unified framework called the two-model adaptive weight fusion network (TAWFN) for protein function prediction. First, amino acid contact maps and sequences are extracted from the protein structure. Then, the sequence is used to generate one-hot encoded features and deep semantic features. These features, along with the constructed graph, are fed into the adaptive graph convolutional networks (AGCN) module and the multi-layer convolutional neural network (MCNN) module as needed, resulting in preliminary classification outcomes. Finally, the preliminary classification results are inputted into the adaptive weight computation network, where adaptive weights are calculated to fuse the initial predictions from both networks, yielding the final prediction result. To evaluate the effectiveness of our method, experiments were conducted on the PDBset and AFset datasets. For molecular function, biological process, and cellular component tasks, TAWFN achieved area under the precision-recall curve (AUPR) values of 0.718, 0.385, and 0.488 respectively, with corresponding Fmax scores of 0.762, 0.628, and 0.693, and Smin scores of 0.326, 0.483, and 0.454. The experimental results demonstrate that TAWFN exhibits promising performance, outperforming existing methods.

AVAILABILITY AND IMPLEMENTATION

The TAWFN source code can be found at: https://github.com/ss0830/TAWFN.
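
The Fmax score reported in the abstract above is commonly computed in the protein-centric way used in CAFA-style evaluations: for each decision threshold, precision is averaged over proteins with at least one predicted term, recall is averaged over all annotated proteins, and Fmax is the best resulting F-measure. The following is a minimal sketch under that assumption, not the TAWFN authors' evaluation code; the function name `fmax`, the 0.01 threshold grid, and the toy inputs are illustrative choices, and ontology-based propagation of annotations is not included.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax over a grid of thresholds (illustrative sketch).

    y_true:  list of rows of 0/1 labels, one row per protein, one column per GO term
    y_score: list of rows of predicted scores in [0, 1], same shape as y_true
    """
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    # Keep only proteins that have at least one true term.
    proteins = []
    for labels, scores in zip(y_true, y_score):
        true_terms = {i for i, v in enumerate(labels) if v}
        if true_terms:
            proteins.append((true_terms, scores))

    best = 0.0
    for thr in thresholds:
        precisions, recalls = [], []
        for true_terms, scores in proteins:
            predicted = {i for i, s in enumerate(scores) if s >= thr}
            tp = len(predicted & true_terms)
            recalls.append(tp / len(true_terms))
            # Precision is only defined for proteins with a prediction.
            if predicted:
                precisions.append(tp / len(predicted))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy example: both proteins are predicted perfectly at any threshold in (0.3, 0.7].
print(fmax([[1, 0, 1], [0, 1, 0]], [[0.9, 0.2, 0.8], [0.1, 0.7, 0.3]]))  # 1.0
```

Because precision and recall are averaged per protein before being combined, a method that makes confident predictions for only a few proteins can still score high precision at strict thresholds; the maximum over thresholds balances this against recall.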

Qiao B, Wang S, Hou M, Chen H, Zhou Z, Xie X, Pang S, Yang C, Yang F, Zou Q, Sun S. Identifying nucleotide-binding leucine-rich repeat receptor and pathogen effector pairing using transfer-learning and bilinear attention network. Bioinformatics 2024;40:btae581. [PMID: 39331576 PMCID: PMC11969219 DOI: 10.1093/bioinformatics/btae581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 08/24/2024] [Accepted: 09/25/2024] [Indexed: 09/29/2024] Open

Affiliation(s)

Baixue Qiao: Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China; State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150001, China
Shuda Wang: Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China; State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150001, China
Mingjun Hou: Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
Haodi Chen: Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
Zhengwenyang Zhou: Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
Xueying Xie: Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
Shaozi Pang: Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China
Chunxue Yang: College of Landscape Architecture, Northeast Forestry University, Harbin 150001, China
Fenglong Yang: Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Medical Technology and Engineering, Fujian Medical University, Fuzhou 350122, China; Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350122, China
Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
Shanwen Sun: Key Laboratory of Saline-Alkali Vegetation Ecology Restoration, Ministry of Education (Northeast Forestry University), Harbin 150001, China; State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150001, China

Meher PK, Pradhan UK, Sethi PL, Naha S, Gupta A, Parsad R. PredPSP: a novel computational tool to discover pathway-specific photosynthetic proteins in plants. Plant Mol Biol 2024;114:106. [PMID: 39316155 DOI: 10.1007/s11103-024-01500-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 09/04/2024] [Indexed: 09/25/2024]

Abstract

Photosynthetic proteins play a crucial role in agricultural productivity by harnessing light energy for plant growth. Understanding these proteins, especially within C3 and C4 pathways, holds promise for improving crops in challenging environments. Despite existing models, a comprehensive computational framework specifically targeting plant photosynthetic proteins is lacking. The underutilization of plant datasets in computational algorithms accentuates the gap this study aims to fill by introducing a novel sequence-based computational method for identifying these proteins. The scope of this study encompassed diverse plant species, ensuring comprehensive representation across C3 and C4 pathways. Utilizing six deep learning models and seven shallow learning algorithms, paired with six sequence-derived feature sets followed by a feature selection strategy, this study developed a comprehensive model for prediction of plant-specific photosynthetic proteins. Following 5-fold cross-validation analysis, LightGBM with 65 and 90 LGBM-VIM selected features, respectively, emerged as the best models for C3 (auROC: 91.78%, auPRC: 92.55%) and C4 (auROC: 99.05%, auPRC: 99.18%) plants. Validation using an independent dataset confirmed the robustness of the proposed model for both C3 (auROC: 87.23%, auPRC: 88.40%) and C4 (auROC: 92.83%, auPRC: 92.29%) categories. Comparison with existing methods demonstrated the superiority of the proposed model in predicting plant-specific photosynthetic proteins. This study further established a free online prediction server, PredPSP (https://iasri-sg.icar.gov.in/predpsp/), to facilitate ongoing efforts for identifying photosynthetic proteins in C3 and C4 plants. Being the first of its kind, this study offers valuable insights into predicting plant-specific photosynthetic proteins, which hold significant implications for plant biology.

Bai P, Li G, Luo J, Liang C. Deep learning model for protein multi-label subcellular localization and function prediction based on multi-task collaborative training. Brief Bioinform 2024;25:bbae568. [PMID: 39489606 PMCID: PMC11531862 DOI: 10.1093/bib/bbae568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2024] [Revised: 09/24/2024] [Accepted: 10/22/2024] [Indexed: 11/05/2024] Open

Mi J, Wang H, Li J, Sun J, Li C, Wan J, Zeng Y, Gao J. GGN-GO: geometric graph networks for predicting protein function by multi-scale structure features. Brief Bioinform 2024;25:bbae559. [PMID: 39487084 PMCID: PMC11530295 DOI: 10.1093/bib/bbae559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 10/03/2024] [Accepted: 10/17/2024] [Indexed: 11/04/2024] Open

Abstract

Recent advances in high-throughput sequencing have led to an explosion of genomic and transcriptomic data, offering a wealth of protein sequence information. However, the functions of most proteins remain unannotated. Traditional experimental methods for annotation of protein functions are costly and time-consuming. Current deep learning methods typically rely on Graph Convolutional Networks to propagate features between protein residues. However, these methods fail to capture fine atomic-level geometric structural features and cannot directly compute or propagate structural features (such as distances, directions, and angles) when transmitting features, often simplifying them to scalars. Additionally, difficulties in capturing long-range dependencies limit the model's ability to identify key nodes (residues). To address these challenges, we propose a geometric graph network (GGN-GO) for predicting protein function that enriches feature extraction by capturing multi-scale geometric structural features at the atomic and residue levels. We use a geometric vector perceptron to convert these features into vector representations and aggregate them with node features for better understanding and propagation in the network. Moreover, we introduce a graph attention pooling layer that captures key node information by adaptively aggregating local functional motifs, while contrastive learning enhances graph representation discriminability through random noise and different views. The experimental results show that GGN-GO outperforms six comparative methods in tasks with the most labels for both experimentally validated and predicted protein structures. Furthermore, GGN-GO identifies functional residues corresponding to those experimentally confirmed, showcasing its interpretability and the ability to pinpoint key protein regions. The code and data are available at: https://github.com/MiJia-ID/GGN-GO.

Barrios-Núñez I, Martínez-Redondo G, Medina-Burgos P, Cases I, Fernández R, Rojas A. Decoding functional proteome information in model organisms using protein language models. NAR Genom Bioinform 2024;6:lqae078. [PMID: 38962255 PMCID: PMC11217674 DOI: 10.1093/nargab/lqae078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Revised: 05/31/2024] [Accepted: 06/26/2024] [Indexed: 07/05/2024] Open

Jang YJ, Qin QQ, Huang SY, Peter ATJ, Ding XM, Kornmann B. Accurate prediction of protein function using statistics-informed graph networks. Nat Commun 2024;15:6601. [PMID: 39097570 PMCID: PMC11297950 DOI: 10.1038/s41467-024-50955-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 07/15/2024] [Indexed: 08/05/2024] Open

Khandelwal M, Kumar Rout R. DeepPRMS: advanced deep learning model to predict protein arginine methylation sites. Brief Funct Genomics 2024;23:452-463. [PMID: 38267081 DOI: 10.1093/bfgp/elae001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 11/17/2023] [Accepted: 01/03/2024] [Indexed: 01/26/2024] Open

Truong-Quoc C, Lee JY, Kim KS, Kim DN. Prediction of DNA origami shape using graph neural network. Nat Mater 2024;23:984-992. [PMID: 38486095 DOI: 10.1038/s41563-024-01846-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 02/22/2024] [Indexed: 07/10/2024]

Chen Z, Luo Q. DualNetGO: a dual network model for protein function prediction via effective feature selection. Bioinformatics 2024;40:btae437. [PMID: 38963311 PMCID: PMC11538015 DOI: 10.1093/bioinformatics/btae437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 06/05/2024] [Accepted: 07/03/2024] [Indexed: 07/05/2024] Open

Abstract

MOTIVATION

Protein-protein interaction (PPI) networks are crucial for automatically annotating protein functions. As multiple PPI networks exist for the same set of proteins that capture properties from different aspects, it is a challenging task to effectively utilize these heterogeneous networks. Recently, several deep learning models have combined PPI networks from all evidence, or concatenated all graph embeddings for protein function prediction. However, the lack of a judicious selection procedure prevents effective harnessing of information from different PPI networks, as these networks vary in densities, structures, and noise levels. Consequently, combining protein features indiscriminately could increase the noise level, leading to decreased model performance.

RESULTS

We develop DualNetGO, a dual-network model comprised of a Classifier and a Selector, to predict protein functions by effectively selecting features from different sources including graph embeddings of PPI networks, protein domain, and subcellular location information. Evaluation of DualNetGO on human and mouse datasets in comparison with other network-based models shows at least 4.5%, 6.2%, and 14.2% improvement on Fmax in BP, MF, and CC gene ontology categories, respectively, for human, and 3.3%, 10.6%, and 7.7% improvement on Fmax for mouse. We demonstrate the generalization capability of our model by training and testing on the CAFA3 data, and show its versatility by incorporating Esm2 embeddings. We further show that our model is insensitive to the choice of graph embedding method and is time- and memory-saving. These results demonstrate that combining a subset of features including PPI networks and protein attributes selected by our model is more effective in utilizing PPI network information than only using one kind of or concatenating graph embeddings from all kinds of PPI networks.

AVAILABILITY AND IMPLEMENTATION

The source code of DualNetGO and some of the experiment data are available at: https://github.com/georgedashen/DualNetGO.

de Oliveira GB, Pedrini H, Dias Z. Integrating Transformers and AutoML for Protein Function Prediction. Annu Int Conf IEEE Eng Med Biol Soc 2024;2024:1-5. [PMID: 40039729 DOI: 10.1109/embc53108.2024.10782139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2025]

Zhapa-Camacho F, Tang Z, Kulmanov M, Hoehndorf R. Predicting protein functions using positive-unlabeled ranking with ontology-based priors. Bioinformatics 2024;40:i401-i409. [PMID: 38940168 PMCID: PMC11211813 DOI: 10.1093/bioinformatics/btae237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open

Curion F, Theis FJ. Machine learning integrative approaches to advance computational immunology. Genome Med 2024;16:80. [PMID: 38862979 PMCID: PMC11165829 DOI: 10.1186/s13073-024-01350-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 05/23/2024] [Indexed: 06/13/2024] Open

Liu Y, Zhang Y, Chen Z, Peng J. POLAT: Protein function prediction based on soft mask graph network and residue-Label ATtention. Comput Biol Chem 2024;110:108064. [PMID: 38677014 DOI: 10.1016/j.compbiolchem.2024.108064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 01/19/2024] [Accepted: 03/26/2024] [Indexed: 04/29/2024]

Lin B, Luo X, Liu Y, Jin X. A comprehensive review and comparison of existing computational methods for protein function prediction. Brief Bioinform 2024;25:bbae289. [PMID: 39003530 PMCID: PMC11246557 DOI: 10.1093/bib/bbae289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 05/18/2024] [Indexed: 07/15/2024] Open

Ansari M, White AD. Learning peptide properties with positive examples only. Digit Discov 2024;3:977-986. [PMID: 38756224 PMCID: PMC11094695 DOI: 10.1039/d3dd00218g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2023] [Accepted: 03/30/2024] [Indexed: 05/18/2024]