Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhu YH, Zhang C, Yu DJ, Zhang Y. Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction. PLoS Comput Biol 2022;18:e1010793. [PMID: 36548439 PMCID: PMC9822105 DOI: 10.1371/journal.pcbi.1010793] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2022] [Revised: 01/06/2023] [Accepted: 12/05/2022] [Indexed: 12/24/2022] Open

For:	Zhu YH, Zhang C, Yu DJ, Zhang Y. Integrating unsupervised language model with triplet neural networks for protein gene ontology prediction. PLoS Comput Biol 2022;18:e1010793. [PMID: 36548439 PMCID: PMC9822105 DOI: 10.1371/journal.pcbi.1010793] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2022] [Revised: 01/06/2023] [Accepted: 12/05/2022] [Indexed: 12/24/2022] Open

Number

Cited by Other Article(s)

Liu Z, Qiu WR, Liu Y, Yan H, Pei W, Zhu YH, Qiu J. A comprehensive review of computational methods for Protein-DNA binding site prediction. Anal Biochem 2025;703:115862. [PMID: 40209920 DOI: 10.1016/j.ab.2025.115862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2024] [Revised: 03/20/2025] [Accepted: 04/06/2025] [Indexed: 04/12/2025]

de Oliveira GB, Pedrini H, Dias Z. SUPERMAGO: Protein Function Prediction Based on Transformer Embeddings. Proteins 2025;93:981-996. [PMID: 39711079 DOI: 10.1002/prot.26782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2024] [Revised: 11/28/2024] [Accepted: 12/09/2024] [Indexed: 12/24/2024]

Wang J, Chen J, Hu Y, Song C, Li X, Qian Y, Deng L. DeepMFFGO: A Protein Function Prediction Method for Large-Scale Multifeature Fusion. J Chem Inf Model 2025;65:3841-3853. [PMID: 40116538 DOI: 10.1021/acs.jcim.5c00062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/23/2025]

Affiliation(s)

Jingfu Wang School of Software, Xinjiang University, Urumqi 830091, China Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
Jiaying Chen School of Software, Xinjiang University, Urumqi 830091, China Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
Yue Hu School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
Chaolin Song School of Software, Xinjiang University, Urumqi 830091, China Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
Xinhui Li School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
Yurong Qian Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
Lei Deng School of Software, Xinjiang University, Urumqi 830091, China School of Computer Science and Engineering, Central South University, Changsha 410083, China

Collapse

Mao Y, Xu W, Shun Y, Chai L, Xue L, Yang Y, Li M. A multimodal model for protein function prediction. Sci Rep 2025;15:10465. [PMID: 40140535 PMCID: PMC11947276 DOI: 10.1038/s41598-025-94612-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2025] [Accepted: 03/14/2025] [Indexed: 03/28/2025] Open

Song C, He S, Qian Y, Li X, Hu Y, Chen J, Wang J, Deng L. DeepMVD: A Novel Multiview Dynamic Feature Fusion Model for Accurate Protein Function Prediction. J Chem Inf Model 2025;65:3077-3089. [PMID: 40053671 DOI: 10.1021/acs.jcim.4c02216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/09/2025]

Affiliation(s)

Chaolin Song School of Software, Xinjiang University, Urumqi 830091, China Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
Shiwen He School of Software, Xinjiang University, Urumqi 830091, China School of Computer Science and Engineering, Central South University, Changsha 410083, China
Yurong Qian Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
Xinhui Li School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
Yue Hu School of Computer Science and Technology, Xinjiang University, Urumqi 830046, China Joint International Research Laboratory of Silk Road Multilingual Cognitive Computing, Xinjiang University, Urumqi, Xinjiang 830046, China
Jiaying Chen School of Software, Xinjiang University, Urumqi 830091, China Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
Jingfu Wang School of Software, Xinjiang University, Urumqi 830091, China Xinjiang Engineering Research Center of Big Data and Intelligent Software, School of Software, Xinjiang University, Urumqi 830091, China Key Laboratory of Software Engineering, Xinjiang University, Urumqi 830091, China
Lei Deng School of Software, Xinjiang University, Urumqi 830091, China School of Computer Science and Engineering, Central South University, Changsha 410083, China

Collapse

Feng T, Chen X, Wu S, Tang W, Zhou H, Fang Z. Predicting the bacterial host range of plasmid genomes using the language model-based one-class support vector machine algorithm. Microb Genom 2025;11. [PMID: 39932495 DOI: 10.1099/mgen.0.001355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/08/2025] Open

Abstract

The prediction of the plasmid host range is crucial for investigating the dissemination of plasmids and the transfer of resistance and virulence genes mediated by plasmids. Several machine learning-based tools have been developed to predict plasmid host ranges. These tools have been trained and tested based on the bacterial host records of plasmids in related databases. Typically, a plasmid genome in databases such as the National Center for Biotechnology Information is annotated with only one or a few bacterial hosts, which does not encompass all possible hosts. Consequently, existing methods may significantly underestimate the host ranges of mobile plasmids. In this work, we propose a novel method named HRPredict, which employs a word vector model to digitally represent the encoded proteins on plasmid genomes. Since it is difficult to confirm which host a particular plasmid definitely cannot enter, we developed a machine learning approach for predicting whether a plasmid can enter a specific bacterium as a no-negative samples learning task. Using multiple one-class support vector machine (SVM) models that do not require negative samples for training, HRPredict predicts the host range of plasmids across 45 families, 56 genera and 56 species. In the benchmark test set, we constructed reliable negative samples for each host taxonomic unit via two indirect methods, and we found that the area under the curve (AUC), F1-score, recall, precision and accuracy of most taxonomic unit prediction models exceeded 0.9. Among the 13 broad-host-range plasmid types, HRPredict demonstrated greater coverage than HOTSPOT and PlasmidHostFinder, thus successfully predicting the majority of hosts previously reported. Through feature importance calculation for each SVM model, we found that genes closely related to the plasmid host range are involved in functions such as bacterial adaptability, pathogenicity and survival. These findings provide significant insight into the mechanisms through which bacteria adjust to diverse environments through plasmids. The HRPredict algorithm is expected to facilitate in-depth research on the spread of broad-host-range plasmids and enable host-range predictions for novel plasmids reconstructed from microbiome sequencing data.

Collapse

Chen JY, Wang JF, Hu Y, Li XH, Qian YR, Song CL. Evaluating the advancements in protein language models for encoding strategies in protein function prediction: a comprehensive review. Front Bioeng Biotechnol 2025;13:1506508. [PMID: 39906415 PMCID: PMC11790633 DOI: 10.3389/fbioe.2025.1506508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2024] [Accepted: 01/02/2025] [Indexed: 02/06/2025] Open

Wang W, Shuai Y, Zeng M, Fan W, Li M. DPFunc: accurately predicting protein function via deep learning with domain-guided structure information. Nat Commun 2025;16:70. [PMID: 39746897 PMCID: PMC11697396 DOI: 10.1038/s41467-024-54816-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2024] [Accepted: 11/21/2024] [Indexed: 01/04/2025] Open

Guan J, Ji Y, Peng C, Zou W, Tang X, Shang J, Sun Y. GOPhage: protein function annotation for bacteriophages by integrating the genomic context. Brief Bioinform 2024;26:bbaf014. [PMID: 39838963 PMCID: PMC11751364 DOI: 10.1093/bib/bbaf014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2024] [Revised: 12/15/2024] [Accepted: 01/06/2025] [Indexed: 01/23/2025] Open

Vu TTD, Kim J, Jung J. An experimental analysis of graph representation learning for Gene Ontology based protein function prediction. PeerJ 2024;12:e18509. [PMID: 39553733 PMCID: PMC11569786 DOI: 10.7717/peerj.18509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2024] [Accepted: 10/21/2024] [Indexed: 11/19/2024] Open

Liu Q, Zhang C, Freddolino L. InterLabelGO+: unraveling label correlations in protein function prediction. Bioinformatics 2024;40:btae655. [PMID: 39499152 PMCID: PMC11568131 DOI: 10.1093/bioinformatics/btae655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2024] [Revised: 10/07/2024] [Accepted: 11/01/2024] [Indexed: 11/07/2024] Open

Wu J, Liu Y, Zhu Y, Yu DJ. Improving Antifreeze Proteins Prediction With Protein Language Models and Hybrid Feature Extraction Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024;21:2349-2358. [PMID: 39316498 DOI: 10.1109/tcbb.2024.3467261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/26/2024]

de Crécy-Lagard V, Dias R, Friedberg I, Yuan Y, Swairjo MA. Limitations of Current Machine-Learning Models in Predicting Enzymatic Functions for Uncharacterized Proteins. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.01.601547. [PMID: 39005379 PMCID: PMC11244979 DOI: 10.1101/2024.07.01.601547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/16/2024]

Meng L, Wang X. TAWFN: a deep learning framework for protein function prediction. Bioinformatics 2024;40:btae571. [PMID: 39312678 PMCID: PMC11639667 DOI: 10.1093/bioinformatics/btae571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2024] [Revised: 08/27/2024] [Accepted: 09/19/2024] [Indexed: 09/25/2024] Open

Abstract

MOTIVATION

Proteins play pivotal roles in biological systems, and precise prediction of their functions is indispensable for practical applications. Despite the surge in protein sequence data facilitated by high-throughput techniques, unraveling the exact functionalities of proteins still demands considerable time and resources. Currently, numerous methods rely on protein sequences for prediction, while methods targeting protein structures are scarce, often employing convolutional neural networks (CNN) or graph convolutional networks (GCNs) individually.

RESULTS

To address these challenges, our approach starts from protein structures and proposes a method that combines CNN and GCN into a unified framework called the two-model adaptive weight fusion network (TAWFN) for protein function prediction. First, amino acid contact maps and sequences are extracted from the protein structure. Then, the sequence is used to generate one-hot encoded features and deep semantic features. These features, along with the constructed graph, are fed into the adaptive graph convolutional networks (AGCN) module and the multi-layer convolutional neural network (MCNN) module as needed, resulting in preliminary classification outcomes. Finally, the preliminary classification results are inputted into the adaptive weight computation network, where adaptive weights are calculated to fuse the initial predictions from both networks, yielding the final prediction result. To evaluate the effectiveness of our method, experiments were conducted on the PDBset and AFset datasets. For molecular function, biological process, and cellular component tasks, TAWFN achieved area under the precision-recall curve (AUPR) values of 0.718, 0.385, and 0.488 respectively, with corresponding Fmax scores of 0.762, 0.628, and 0.693, and Smin scores of 0.326, 0.483, and 0.454. The experimental results demonstrate that TAWFN exhibits promising performance, outperforming existing methods.

AVAILABILITY AND IMPLEMENTATION

The TAWFN source code can be found at: https://github.com/ss0830/TAWFN.

Collapse

Kugic A, Martin I, Modersohn L, Pallaoro P, Kreuzthaler M, Schulz S, Boeker M. Processing of Short-Form Content in Clinical Narratives: Systematic Scoping Review. J Med Internet Res 2024;26:e57852. [PMID: 39325515 PMCID: PMC11467596 DOI: 10.2196/57852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2024] [Revised: 05/24/2024] [Accepted: 07/25/2024] [Indexed: 09/27/2024] Open

Abstract

BACKGROUND

Clinical narratives are essential components of electronic health records. The adoption of electronic health records has increased documentation time for hospital staff, leading to the use of abbreviations and acronyms more frequently. This brevity can potentially hinder comprehension for both professionals and patients.

OBJECTIVE

This review aims to provide an overview of the types of short forms found in clinical narratives, as well as the natural language processing (NLP) techniques used for their identification, expansion, and disambiguation.

METHODS

In the databases Web of Science, Embase, MEDLINE, EBMR (Evidence-Based Medicine Reviews), and ACL Anthology, publications that met the inclusion criteria were searched according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for a systematic scoping review. Original, peer-reviewed publications focusing on short-form processing in human clinical narratives were included, covering the period from January 2018 to February 2023. Short-form types were extracted, and multidimensional research methodologies were assigned to each target objective (identification, expansion, and disambiguation). NLP study recommendations and study characteristics were systematically assigned occurrence rates for evaluation.

RESULTS

Out of a total of 6639 records, only 19 articles were included in the final analysis. Rule-based approaches were predominantly used for identifying short forms, while string similarity and vector representations were applied for expansion. Embeddings and deep learning approaches were used for disambiguation.

CONCLUSIONS

The scope and types of what constitutes a clinical short form were often not explicitly defined by the authors. This lack of definition poses challenges for reproducibility and for determining whether specific methodologies are suitable for different types of short forms. Analysis of a subset of NLP recommendations for assessing quality and reproducibility revealed only partial adherence to these recommendations. Single-character abbreviations were underrepresented in studies on clinical narrative processing, as were investigations in languages other than English. Future research should focus on these 2 areas, and each paper should include descriptions of the types of content analyzed.

Collapse

Bai P, Li G, Luo J, Liang C. Deep learning model for protein multi-label subcellular localization and function prediction based on multi-task collaborative training. Brief Bioinform 2024;25:bbae568. [PMID: 39489606 PMCID: PMC11531862 DOI: 10.1093/bib/bbae568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2024] [Revised: 09/24/2024] [Accepted: 10/22/2024] [Indexed: 11/05/2024] Open

Politano G, Benso A, Rehman HU, Re A. PRONTO-TK: a user-friendly PROtein Neural neTwOrk tool-kit for accessible protein function prediction. NAR Genom Bioinform 2024;6:lqae112. [PMID: 39193069 PMCID: PMC11348006 DOI: 10.1093/nargab/lqae112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2024] [Revised: 08/01/2024] [Accepted: 08/15/2024] [Indexed: 08/29/2024] Open

Yan H, Wang S, Liu H, Mamitsuka H, Zhu S. GORetriever: reranking protein-description-based GO candidates by literature-driven deep information retrieval for protein function annotation. Bioinformatics 2024;40:ii53-ii61. [PMID: 39230707 PMCID: PMC11520413 DOI: 10.1093/bioinformatics/btae401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/05/2024] Open

Wang B, Li W. Advances in the Application of Protein Language Modeling for Nucleic Acid Protein Binding Site Prediction. Genes (Basel) 2024;15:1090. [PMID: 39202449 PMCID: PMC11353971 DOI: 10.3390/genes15081090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2024] [Revised: 08/13/2024] [Accepted: 08/14/2024] [Indexed: 09/03/2024] Open

Jang YJ, Qin QQ, Huang SY, Peter ATJ, Ding XM, Kornmann B. Accurate prediction of protein function using statistics-informed graph networks. Nat Commun 2024;15:6601. [PMID: 39097570 PMCID: PMC11297950 DOI: 10.1038/s41467-024-50955-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2023] [Accepted: 07/15/2024] [Indexed: 08/05/2024] Open

Rakibova Y, Dunham DT, Seed KD, Freddolino L. Nucleoid-associated proteins shape the global protein occupancy and transcriptional landscape of a clinical isolate of Vibrio cholerae. mSphere 2024;9:e0001124. [PMID: 38920383 PMCID: PMC11288032 DOI: 10.1128/msphere.00011-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2024] [Accepted: 05/21/2024] [Indexed: 06/27/2024] Open

Abstract

Vibrio cholerae, the causative agent of the diarrheal disease cholera, poses an ongoing health threat due to its wide repertoire of horizontally acquired elements (HAEs) and virulence factors. New clinical isolates of the bacterium with improved fitness abilities, often associated with HAEs, frequently emerge. The appropriate control and expression of such genetic elements is critical for the bacteria to thrive in the different environmental niches they occupy. H-NS, the histone-like nucleoid structuring protein, is the best-studied xenogeneic silencer of HAEs in gamma-proteobacteria. Although H-NS and other highly abundant nucleoid-associated proteins (NAPs) have been shown to play important roles in regulating HAEs and virulence in model bacteria, we still lack a comprehensive understanding of how different NAPs modulate transcription in V. cholerae. By obtaining genome-wide measurements of protein occupancy and active transcription in a clinical isolate of V. cholerae, harboring recently discovered HAEs encoding for phage defense systems, we show that a lack of H-NS causes a robust increase in the expression of genes found in many HAEs. We further found that TsrA, a protein with partial homology to H-NS, regulates virulence genes primarily through modulation of H-NS activity. We also identified few sites that are affected by TsrA independently of H-NS, suggesting TsrA may act with diverse regulatory mechanisms. Our results demonstrate how the combinatorial activity of NAPs is employed by a clinical isolate of an important pathogen to regulate recently discovered HAEs.

IMPORTANCE

New strains of the bacterial pathogen Vibrio cholerae, bearing novel horizontally acquired elements (HAEs), frequently emerge. HAEs provide beneficial traits to the bacterium, such as antibiotic resistance and defense against invading bacteriophages. Xenogeneic silencers are proteins that help bacteria harness new HAEs and silence those HAEs until they are needed. H-NS is the best-studied xenogeneic silencer; it is one of the nucleoid-associated proteins (NAPs) in gamma-proteobacteria and is responsible for the proper regulation of HAEs within the bacterial transcriptional network. We studied the effects of H-NS and other NAPs on the HAEs of a clinical isolate of V. cholerae. Importantly, we found that H-NS partners with a small and poorly characterized protein, TsrA, to help domesticate new HAEs involved in bacterial survival and in causing disease. A proper understanding of the regulatory state in emerging isolates of V. cholerae will provide improved therapies against new isolates of the pathogen.

Collapse

de Oliveira GB, Pedrini H, Dias Z. Integrating Transformers and AutoML for Protein Function Prediction. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2024;2024:1-5. [PMID: 40039729 DOI: 10.1109/embc53108.2024.10782139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2025]

Zhapa-Camacho F, Tang Z, Kulmanov M, Hoehndorf R. Predicting protein functions using positive-unlabeled ranking with ontology-based priors. Bioinformatics 2024;40:i401-i409. [PMID: 38940168 PMCID: PMC11211813 DOI: 10.1093/bioinformatics/btae237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/29/2024] Open

Le VT, Malik MS, Tseng YH, Lee YC, Huang CI, Ou YY. DeepPLM_mCNN: An approach for enhancing ion channel and ion transporter recognition by multi-window CNN based on features from pre-trained language models. Comput Biol Chem 2024;110:108055. [PMID: 38555810 DOI: 10.1016/j.compbiolchem.2024.108055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2023] [Revised: 02/28/2024] [Accepted: 03/19/2024] [Indexed: 04/02/2024]

Zhang C, Freddolino L. A large-scale assessment of sequence database search tools for homology-based protein function prediction. Brief Bioinform 2024;25:bbae349. [PMID: 39038936 PMCID: PMC11262835 DOI: 10.1093/bib/bbae349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2024] [Revised: 06/03/2024] [Accepted: 07/05/2024] [Indexed: 07/24/2024] Open

Rakibova Y, Dunham DT, Seed KD, Freddolino PL. Nucleoid-associated proteins shape the global protein occupancy and transcriptional landscape of a clinical isolate of Vibrio cholerae. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.12.30.573743. [PMID: 38260642 PMCID: PMC10802314 DOI: 10.1101/2023.12.30.573743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/24/2024]

Abstract

Vibrio cholerae, the causative agent of the diarrheal disease cholera, poses an ongoing health threat due to its wide repertoire of horizontally acquired elements (HAEs) and virulence factors. New clinical isolates of the bacterium with improved fitness abilities, often associated with HAEs, frequently emerge. The appropriate control and expression of such genetic elements is critical for the bacteria to thrive in the different environmental niches it occupies. H-NS, the histone-like nucleoid structuring protein, is the best studied xenogeneic silencer of HAEs in gamma-proteobacteria. Although H-NS and other highly abundant nucleoid-associated proteins (NAPs) have been shown to play important roles in regulating HAEs and virulence in model bacteria, we still lack a comprehensive understanding of how different NAPs modulate transcription in V. cholerae. By obtaining genome-wide measurements of protein occupancy and active transcription in a clinical isolate of V. cholerae, harboring recently discovered HAEs encoding for phage defense systems, we show that a lack of H-NS causes a robust increase in the expression of genes found in many HAEs. We further found that TsrA, a protein with partial homology to H-NS, regulates virulence genes primarily through modulation of H-NS activity. We also identified a few sites that are affected by TsrA independently of H-NS, suggesting TsrA may act with diverse regulatory mechanisms. Our results demonstrate how the combinatorial activity of NAPs is employed by a clinical isolate of an important pathogen to regulate recently discovered HAEs.

Importance

New strains of the bacterial pathogen Vibrio cholerae, bearing novel horizontally acquired elements (HAEs), frequently emerge. HAEs provide beneficial traits to the bacterium, such as antibiotic resistance and defense against invading bacteriophages. Xenogeneic silencers are proteins that help bacteria harness new HAEs and silence those HAEs until they are needed. H-NS is the best-studied xenogeneic silencer; it is one of the nucleoid-associated proteins (NAPs) in gamma-proteobacteria and is responsible for the proper regulation of HAEs within the bacterial transcriptional network. We studied the effects of H-NS and other NAPs on the HAEs of a clinical isolate of V. cholerae. Importantly, we found that H-NS partners with a small and poorly characterized protein, TsrA, to help domesticate new HAEs involved in bacterial survival and in causing disease. Proper understanding of the regulatory state in emerging isolates of V. cholerae will provide improved therapies against new isolates of the pathogen.

Collapse

Wang W, Shuai Y, Yang Q, Zhang F, Zeng M, Li M. A comprehensive computational benchmark for evaluating deep learning-based protein function prediction approaches. Brief Bioinform 2024;25:bbae050. [PMID: 38388682 PMCID: PMC10883809 DOI: 10.1093/bib/bbae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2023] [Revised: 01/17/2024] [Accepted: 01/26/2024] [Indexed: 02/24/2024] Open

Zhu YH, Liu Z, Liu Y, Ji Z, Yu DJ. ULDNA: integrating unsupervised multi-source language models with LSTM-attention network for high-accuracy protein-DNA binding site prediction. Brief Bioinform 2024;25:bbae040. [PMID: 38349057 PMCID: PMC10939370 DOI: 10.1093/bib/bbae040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2023] [Revised: 01/02/2024] [Accepted: 01/22/2024] [Indexed: 02/15/2024] Open

Zhang C, Lydia Freddolino P. A large-scale assessment of sequence database search tools for homology-based protein function prediction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.14.567021. [PMID: 38013998 PMCID: PMC10680702 DOI: 10.1101/2023.11.14.567021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/29/2023]

Oliveira GB, Pedrini H, Dias Z. TEMPROT: protein function annotation using transformers embeddings and homology search. BMC Bioinformatics 2023;24:242. [PMID: 37291492 DOI: 10.1186/s12859-023-05375-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2022] [Accepted: 06/02/2023] [Indexed: 06/10/2023] Open

Abstract

BACKGROUND

Although the development of sequencing technologies has provided a large number of protein sequences, the analysis of functions that each one plays is still difficult due to the efforts of laboratorial methods, making necessary the usage of computational methods to decrease this gap. As the main source of information available about proteins is their sequences, approaches that can use this information, such as classification based on the patterns of the amino acids and the inference based on sequence similarity using alignment tools, are able to predict a large collection of proteins. The methods available in the literature that use this type of feature can achieve good results, however, they present restrictions of protein length as input to their models. In this work, we present a new method, called TEMPROT, based on the fine-tuning and extraction of embeddings from an available architecture pre-trained on protein sequences. We also describe TEMPROT+, an ensemble between TEMPROT and BLASTp, a local alignment tool that analyzes sequence similarity, which improves the results of our former approach.

RESULTS

The evaluation of our proposed classifiers with the literature approaches has been conducted on our dataset, which was derived from CAFA3 challenge database. Both TEMPROT and TEMPROT+ achieved competitive results on [Formula: see text], [Formula: see text], AuPRC and IAuPRC metrics on Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) ontologies compared to state-of-the-art models, with the main results equal to 0.581, 0.692 and 0.662 of [Formula: see text] on BP, CC and MF, respectively.

CONCLUSIONS

The comparison with the literature showed that our model presented competitive results compared the state-of-the-art approaches considering the amino acid sequence pattern recognition and homology analysis. Our model also presented improvements related to the input size that the model can use to train compared to the literature methods.

Collapse