Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Asgari E, Mofrad MRK. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS One 2015;10:e0141287. [PMID: 26555596 PMCID: PMC4640716 DOI: 10.1371/journal.pone.0141287] [Citation(s) in RCA: 399] [Impact Index Per Article: 39.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2015] [Accepted: 10/05/2015] [Indexed: 12/22/2022] Open

For:	Asgari E, Mofrad MRK. Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics. PLoS One 2015;10:e0141287. [PMID: 26555596 PMCID: PMC4640716 DOI: 10.1371/journal.pone.0141287] [Citation(s) in RCA: 399] [Impact Index Per Article: 39.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2015] [Accepted: 10/05/2015] [Indexed: 12/22/2022] Open

Number

Cited by Other Article(s)

Dražić E, Jelušić D, Janković Bevandić P, Mauša G, Kalafatovic D. Using Machine Learning to Fast-Track Peptide Nanomaterial Discovery. ACS NANO 2025;19:20295-20320. [PMID: 40440125 DOI: 10.1021/acsnano.5c00670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2025]

Wei Y, Tan Z, Liu L. CR-deal: Explainable Neural Network for circRNA-RBP Binding Site Recognition and Interpretation. Interdiscip Sci 2025;17:463-476. [PMID: 40146403 DOI: 10.1007/s12539-025-00694-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2024] [Revised: 02/01/2025] [Accepted: 02/06/2025] [Indexed: 03/28/2025]

Chatterjee A, Ravandi B, Haddadi P, Philip NH, Abdelmessih M, Mowrey WR, Ricchiuto P, Liang Y, Ding W, Mobarec JC, Eliassi-Rad T. Topology-driven negative sampling enhances generalizability in protein-protein interaction prediction. Bioinformatics 2025;41:btaf148. [PMID: 40193392 PMCID: PMC12080959 DOI: 10.1093/bioinformatics/btaf148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2024] [Revised: 03/03/2025] [Accepted: 04/04/2025] [Indexed: 04/09/2025] Open

Feng M, Liu L, Xian ZN, Wei X, Li K, Yan W, Lu Q, Shi Y, He G. PSTP: accurate residue-level phase separation prediction using protein conformational and language model embeddings. Brief Bioinform 2025;26:bbaf171. [PMID: 40315433 PMCID: PMC12047702 DOI: 10.1093/bib/bbaf171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2025] [Revised: 03/07/2025] [Accepted: 03/19/2025] [Indexed: 05/04/2025] Open

Affiliation(s)

Mofan Feng Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, No. 1954 Huashan Road, Xuhui District, Shanghai 200030, China Shanghai Institute of Medical Genetics, Shanghai Children’s Hospital, Shanghai Jiao Tong University School of Medicine, No. 24 Lane 1400 West Beijing Road, Jing’an District, Shanghai 200040, China
Liangjie Liu Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, No. 1954 Huashan Road, Xuhui District, Shanghai 200030, China Shanghai Institute of Medical Genetics, Shanghai Children’s Hospital, Shanghai Jiao Tong University School of Medicine, No. 24 Lane 1400 West Beijing Road, Jing’an District, Shanghai 200040, China
Zhuo-Ning Xian School of Environmental Science & Engineering, Shanghai Jiao Tong University, No. 800 Dongchuan Road, Minhang District, Shanghai 200240, China
Xiaoxi Wei Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, No. 1954 Huashan Road, Xuhui District, Shanghai 200030, China
Keyi Li Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, No. 1954 Huashan Road, Xuhui District, Shanghai 200030, China Shanghai Institute of Medical Genetics, Shanghai Children’s Hospital, Shanghai Jiao Tong University School of Medicine, No. 24 Lane 1400 West Beijing Road, Jing’an District, Shanghai 200040, China
Wenqian Yan Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, No. 1954 Huashan Road, Xuhui District, Shanghai 200030, China Shanghai Institute of Medical Genetics, Shanghai Children’s Hospital, Shanghai Jiao Tong University School of Medicine, No. 24 Lane 1400 West Beijing Road, Jing’an District, Shanghai 200040, China
Qing Lu Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, No. 1954 Huashan Road, Xuhui District, Shanghai 200030, China
Yi Shi Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, No. 1954 Huashan Road, Xuhui District, Shanghai 200030, China Shanghai Institute of Medical Genetics, Shanghai Children’s Hospital, Shanghai Jiao Tong University School of Medicine, No. 24 Lane 1400 West Beijing Road, Jing’an District, Shanghai 200040, China
Guang He Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, No. 1954 Huashan Road, Xuhui District, Shanghai 200030, China Shanghai Institute of Medical Genetics, Shanghai Children’s Hospital, Shanghai Jiao Tong University School of Medicine, No. 24 Lane 1400 West Beijing Road, Jing’an District, Shanghai 200040, China

Collapse

Kim S, Kim MA, Kim B, Lee J, Jung SK, Kim J, Chung HY, Lee CY, Jeong S. Machine learning assessment of zoonotic potential in avian influenza viruses using PB2 segment. BMC Genomics 2025;26:395. [PMID: 40269678 PMCID: PMC12020041 DOI: 10.1186/s12864-025-11589-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2025] [Accepted: 04/09/2025] [Indexed: 04/25/2025] Open

Abstract

BACKGROUND

Influenza A virus (IAV) is a major global health threat, causing seasonal epidemics and occasional pandemics. Particularly, Influenza A viruses from avian species pose significant zoonotic threats, with PB2 adaptation serving as a critical first step in cross-species transmission. A comprehensive risk assessment framework based on PB2 sequences is necessary, which should encompass detailed analyses of specific residues and mutations while maintaining sufficient generality for application to non-PB2 segments.

RESULTS

In this study, we developed two complementary approaches: a regression-based model for accurately distinguishing among risk groups, and a SHAP-based risk assessment model for more meaningful risk analyses. For the regression-based risk models, we compared various methodologies, including tree ensemble methods, conventional regression models, and deep learning architectures. The optimized regression model, combined with SHAP value analysis, identified and ranked individual residues contributing to zoonotic potential. The SHAP-based risk model enabled intra-class analyses within the zoonotic risk assessment framework and quantified risk yields from specific mutations.

CONCLUSION

Experimental analyses demonstrated that the Random Forest regression model outperformed other models in most cases, and we validated the target value settings for risk regression through ablation studies. Our SHAP-based analysis identified key residues (271A, 627K, 591R, 588A, 292I, 684S, 684A, 81M, 199S, and 368Q) and mutations (T271A, Q368R/K, E627K, Q591R, A588T/I/V, and I292V/T) critical for zoonotic risk assessment. Using the SHAP-based risk assessment model, we found that influenza A viruses from Phasianidae showed elevated zoonotic risk scores compared to those from other avian species. Additionally, mutations I292V/T, Q368R, A588T/I, V598A/I/T, and E/V627K were identified as significant mutations in the Phasianidae. These PB2-focused quantitative methods provide a robust and generalizable framework for both rapid screening of avians' zoonotic potential and analytical quantification of risks associated with specific residues or mutations.

Collapse

Kallergis G, Asgari E, Empting M, Hirsch AKH, Klawonn F, McHardy AC. Domain adaptable language modeling of chemical compounds identifies potent pathoblockers for Pseudomonas aeruginosa. Commun Chem 2025;8:114. [PMID: 40216964 PMCID: PMC11992043 DOI: 10.1038/s42004-025-01484-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2024] [Accepted: 03/05/2025] [Indexed: 04/14/2025] Open

Wang Y, Wang C. PLM-ATG: Identification of Autophagy Proteins by Integrating Protein Language Model Embeddings with PSSM-Based Features. Molecules 2025;30:1704. [PMID: 40333592 PMCID: PMC12029579 DOI: 10.3390/molecules30081704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2025] [Revised: 04/05/2025] [Accepted: 04/06/2025] [Indexed: 05/09/2025] Open

D'Hondt S, Oramas J, De Winter H. A beginner's approach to deep learning applied to VS and MD techniques. J Cheminform 2025;17:47. [PMID: 40200329 PMCID: PMC11980327 DOI: 10.1186/s13321-025-00985-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2024] [Accepted: 03/12/2025] [Indexed: 04/10/2025] Open

Abstract

It has become impossible to imagine the fields of biochemistry and medicinal chemistry without computational chemistry and molecular modelling techniques. In many steps of the drug development process in silico methods have become indispensable. Virtual screening (VS) can tremendously expedite the early discovery phase, whilst the use of molecular dynamics (MD) simulations forms a powerful additional tool to in vitro methods throughout the entire drug discovery process. In the field of biochemistry, MD has also become a compelling method for studying biophysical systems (e.g., protein folding) complementary to experimental techniques. However, both VS and MD come with their own limitations and methodological difficulties, from hardware limitations to restrictions in algorithmic capabilities. One solution to overcoming these difficulties lies in the field of machine learning (ML), and more specifically deep learning (DL). There are many ways in which DL can be applied to these molecular modelling techniques to achieve more accurate results in a more efficient manner or expedite the data analysis of the acquired results. Despite steadily increasing interest in DL amidst computational chemists, knowledge is still limited and scattered over different resources. This review is aimed at computational chemists with knowledge of molecular modelling, who wish to possibly integrate DL approaches in their research and already have a basic understanding of the fundamentals of DL. This review focusses on a survey of recent applications of DL in molecular modelling techniques. The different sections are logically subdivided, based on where DL is integrated in the research: (1) for the improvement of VS workflows, (2) for the improvement of certain workflows in MD simulations, (3) for aiding in the calculations of interatomic forces, or (4) for data analysis of MD trajectories. It will become clear that DL has the capacity to completely transform the way molecular modelling is carried out.

Collapse

Heinzinger M, Rost B. Teaching AI to speak protein. Curr Opin Struct Biol 2025;91:102986. [PMID: 39985945 DOI: 10.1016/j.sbi.2025.102986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2024] [Revised: 12/30/2024] [Accepted: 01/02/2025] [Indexed: 02/24/2025]

Chaturvedi M, Rashid MA, Paliwal KK. RNA structure prediction using deep learning - A comprehensive review. Comput Biol Med 2025;188:109845. [PMID: 39983363 DOI: 10.1016/j.compbiomed.2025.109845] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2024] [Revised: 02/09/2025] [Accepted: 02/10/2025] [Indexed: 02/23/2025]

Bjerregaard A, Groth PM, Hauberg S, Krogh A, Boomsma W. Foundation models of protein sequences: A brief overview. Curr Opin Struct Biol 2025;91:103004. [PMID: 39983412 DOI: 10.1016/j.sbi.2025.103004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2024] [Revised: 01/24/2025] [Accepted: 01/26/2025] [Indexed: 02/23/2025]

Abbas Z, Kim S, Lee N, Kazmi SAW, Lee SW. A robust ensemble framework for anticancer peptide classification using multi-model voting approach. Comput Biol Med 2025;188:109750. [PMID: 40032410 DOI: 10.1016/j.compbiomed.2025.109750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2024] [Revised: 01/14/2025] [Accepted: 01/22/2025] [Indexed: 03/05/2025]

Refahi M, Sokhansanj BA, Mell JC, Brown JR, Yoo H, Hearne G, Rosen GL. Enhancing nucleotide sequence representations in genomic analysis with contrastive optimization. Commun Biol 2025;8:517. [PMID: 40155693 PMCID: PMC11953366 DOI: 10.1038/s42003-025-07902-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2024] [Accepted: 03/07/2025] [Indexed: 04/01/2025] Open

Mao Y, Xu W, Shun Y, Chai L, Xue L, Yang Y, Li M. A multimodal model for protein function prediction. Sci Rep 2025;15:10465. [PMID: 40140535 PMCID: PMC11947276 DOI: 10.1038/s41598-025-94612-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2025] [Accepted: 03/14/2025] [Indexed: 03/28/2025] Open

Tyagi N, Vahab N, Tyagi S. Genome language modeling (GLM): a beginner's cheat sheet. Biol Methods Protoc 2025;10:bpaf022. [PMID: 40370585 PMCID: PMC12077296 DOI: 10.1093/biomethods/bpaf022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2025] [Revised: 02/17/2025] [Accepted: 03/23/2025] [Indexed: 05/16/2025] Open

Zheng S. Navigating the unstructured by evaluating alphafold's efficacy in predicting missing residues and structural disorder in proteins. PLoS One 2025;20:e0313812. [PMID: 40131945 PMCID: PMC11936262 DOI: 10.1371/journal.pone.0313812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2024] [Accepted: 02/18/2025] [Indexed: 03/27/2025] Open

Michels J, Bandarupalli R, Ahangar Akbari A, Le T, Xiao H, Li J, Hom EFY. Natural Language Processing Methods for the Study of Protein-Ligand Interactions. J Chem Inf Model 2025;65:2191-2213. [PMID: 39993834 PMCID: PMC11898065 DOI: 10.1021/acs.jcim.4c01907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2024] [Revised: 02/05/2025] [Accepted: 02/06/2025] [Indexed: 02/26/2025]

Wang J, Liu Z, Zhang C, Cao Y, Liu B, Shu Y, Thum Y, Zhang J. A deep learning approach to understanding controlled ovarian stimulation and in vitro fertilization dynamics. Sci Rep 2025;15:7821. [PMID: 40050418 PMCID: PMC11885538 DOI: 10.1038/s41598-025-92186-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2024] [Accepted: 02/25/2025] [Indexed: 03/09/2025] Open

Mao Y, Wu J, Weng J, Li M, Xiong Y, Gu W, Jiang R, Pang R, Lin X, Tang D. Inter-view contrastive learning and miRNA fusion for lncRNA-protein interaction prediction in heterogeneous graphs. Brief Bioinform 2025;26:bbaf148. [PMID: 40194558 PMCID: PMC11975365 DOI: 10.1093/bib/bbaf148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2024] [Revised: 02/19/2025] [Accepted: 03/16/2025] [Indexed: 04/09/2025] Open

Kalemati M, Zamani Emani M, Koohi S. InceptionDTA: Predicting drug-target binding affinity with biological context features and inception networks. Heliyon 2025;11:e42476. [PMID: 40007773 PMCID: PMC11850134 DOI: 10.1016/j.heliyon.2025.e42476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2024] [Revised: 01/23/2025] [Accepted: 02/04/2025] [Indexed: 02/27/2025] Open

Abstract

Predicting drug-target binding affinity via in silico methods is crucial in drug discovery. Traditional machine learning relies on manually engineered features from limited data, leading to suboptimal performance. In contrast, deep learning excels at extracting features from raw sequences but often overlooks essential biological context features, hindering effective binding prediction. Additionally, these models struggle to capture global and local feature distributions efficiently in protein sequences and drug SMILES. Previous state-of-the-art models, like transformers and graph-based approaches, face scalability and resource efficiency challenges. Transformers struggle with scalability, while graph-based methods have difficulty handling large datasets and complex molecular structures. In this paper, we introduce InceptionDTA, a novel drug-target binding affinity prediction model that leverages CharVec, an enhanced variant of Prot2Vec, to incorporate both biological context and categorical features into protein sequence encoding. InceptionDTA utilizes a multi-scale convolutional architecture based on the Inception network to capture features at various spatial resolutions, enabling the extraction of both local and global features from protein sequences and drug SMILES. We evaluate InceptionDTA across a range of benchmark datasets commonly used in drug-target binding affinity prediction. Our results demonstrate that InceptionDTA outperforms various sequence-based, transformer-based, and graph-based deep learning approaches across warm-start, refined, and cold-start splitting settings. In addition to using CharVec, which demonstrates greater accuracy in absolute predictions, InceptionDTA also includes a version that employs simple label encoding and excels in ranking and predicting relative binding affinities. This versatility highlights how InceptionDTA can effectively adapt to various predictive requirements. These results emphasize the promise of our approach in expediting drug repurposing initiatives, enabling the discovery of new drugs, and contributing to advancements in disease treatment.

Collapse

Vural O, Jololian L. Machine learning approaches for predicting protein-ligand binding sites from sequence data. FRONTIERS IN BIOINFORMATICS 2025;5:1520382. [PMID: 39963299 PMCID: PMC11830693 DOI: 10.3389/fbinf.2025.1520382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2024] [Accepted: 01/10/2025] [Indexed: 02/20/2025] Open

Feng T, Chen X, Wu S, Tang W, Zhou H, Fang Z. Predicting the bacterial host range of plasmid genomes using the language model-based one-class support vector machine algorithm. Microb Genom 2025;11. [PMID: 39932495 DOI: 10.1099/mgen.0.001355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/08/2025] Open

Abstract

The prediction of the plasmid host range is crucial for investigating the dissemination of plasmids and the transfer of resistance and virulence genes mediated by plasmids. Several machine learning-based tools have been developed to predict plasmid host ranges. These tools have been trained and tested based on the bacterial host records of plasmids in related databases. Typically, a plasmid genome in databases such as the National Center for Biotechnology Information is annotated with only one or a few bacterial hosts, which does not encompass all possible hosts. Consequently, existing methods may significantly underestimate the host ranges of mobile plasmids. In this work, we propose a novel method named HRPredict, which employs a word vector model to digitally represent the encoded proteins on plasmid genomes. Since it is difficult to confirm which host a particular plasmid definitely cannot enter, we developed a machine learning approach for predicting whether a plasmid can enter a specific bacterium as a no-negative samples learning task. Using multiple one-class support vector machine (SVM) models that do not require negative samples for training, HRPredict predicts the host range of plasmids across 45 families, 56 genera and 56 species. In the benchmark test set, we constructed reliable negative samples for each host taxonomic unit via two indirect methods, and we found that the area under the curve (AUC), F1-score, recall, precision and accuracy of most taxonomic unit prediction models exceeded 0.9. Among the 13 broad-host-range plasmid types, HRPredict demonstrated greater coverage than HOTSPOT and PlasmidHostFinder, thus successfully predicting the majority of hosts previously reported. Through feature importance calculation for each SVM model, we found that genes closely related to the plasmid host range are involved in functions such as bacterial adaptability, pathogenicity and survival. These findings provide significant insight into the mechanisms through which bacteria adjust to diverse environments through plasmids. The HRPredict algorithm is expected to facilitate in-depth research on the spread of broad-host-range plasmids and enable host-range predictions for novel plasmids reconstructed from microbiome sequencing data.

Collapse

Liu J, Yang M, Yu Y, Xu H, Wang T, Li K, Zhou X. Advancing bioinformatics with large language models: components, applications and perspectives. ARXIV 2025:arXiv:2401.04155v2. [PMID: 38259343 PMCID: PMC10802675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/24/2024]

Bhargav P, Mukherjee A. AlphaMut: A Deep Reinforcement Learning Model to Suggest Helix-Disrupting Mutations. J Chem Theory Comput 2025;21:463-473. [PMID: 39702999 DOI: 10.1021/acs.jctc.4c01387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2024]

Hu J, Hu S, Xia M, Zheng K, Zhang X. Drug-target binding affinity prediction based on power graph and word2vec. BMC Med Genomics 2025;18:9. [PMID: 39806396 PMCID: PMC11730168 DOI: 10.1186/s12920-024-02073-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2023] [Accepted: 12/13/2024] [Indexed: 01/16/2025] Open

Zeng R, Li Z, Li J, Zhang Q. DNA promoter task-oriented dictionary mining and prediction model based on natural language technology. Sci Rep 2025;15:153. [PMID: 39747934 PMCID: PMC11697570 DOI: 10.1038/s41598-024-84105-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2024] [Accepted: 12/19/2024] [Indexed: 01/04/2025] Open

Bowyer S, Allen DJ, Furnham N. Unveiling the ghost: machine learning's impact on the landscape of virology. J Gen Virol 2025;106. [PMID: 39804261 DOI: 10.1099/jgv.0.002067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/02/2025] Open

Santa S, Kwofie SK, Agyenkwa-Mawuli K, Quaye O, Brown CA, Tagoe EA. Prediction of Human Papillomavirus-Host Oncoprotein Interactions Using Deep Learning. Bioinform Biol Insights 2024;18:11779322241304666. [PMID: 39664297 PMCID: PMC11632871 DOI: 10.1177/11779322241304666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Accepted: 11/16/2024] [Indexed: 12/13/2024] Open

Liang L, Duan Y, Zeng C, Wan B, Yao H, Liu H, Lu T, Zhang Y, Chen Y, Shen J. CPIScore: A Deep Learning Approach for Rapid Scoring and Interpretation of Protein-Ligand Binding Interactions. J Chem Inf Model 2024;64:8809-8823. [PMID: 39563077 DOI: 10.1021/acs.jcim.4c01175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2024]

Ghazikhani H, Butler G. Ion channel classification through machine learning and protein language model embeddings. J Integr Bioinform 2024;21:jib-2023-0047. [PMID: 39572876 PMCID: PMC11698620 DOI: 10.1515/jib-2023-0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2023] [Accepted: 09/04/2024] [Indexed: 01/06/2025] Open

Basu S, Yu J, Kihara D, Kurgan L. Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences. Brief Bioinform 2024;26:bbaf016. [PMID: 39833102 PMCID: PMC11745544 DOI: 10.1093/bib/bbaf016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2024] [Revised: 12/24/2024] [Accepted: 01/06/2025] [Indexed: 01/22/2025] Open

Chang Y, Wu L. CapHLA: a comprehensive tool to predict peptide presentation and binding to HLA class I and class II. Brief Bioinform 2024;26:bbae595. [PMID: 39688477 DOI: 10.1093/bib/bbae595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2024] [Revised: 09/13/2024] [Accepted: 12/14/2024] [Indexed: 12/18/2024] Open

Vu TTD, Kim J, Jung J. An experimental analysis of graph representation learning for Gene Ontology based protein function prediction. PeerJ 2024;12:e18509. [PMID: 39553733 PMCID: PMC11569786 DOI: 10.7717/peerj.18509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2024] [Accepted: 10/21/2024] [Indexed: 11/19/2024] Open

Abdollahi S, Schaub DP, Barroso M, Laubach NC, Hutwelker W, Panzer U, Gersting SØW, Bonn S. A comprehensive comparison of deep learning-based compound-target interaction prediction models to unveil guiding design principles. J Cheminform 2024;16:118. [PMID: 39468635 PMCID: PMC11520803 DOI: 10.1186/s13321-024-00913-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2024] [Accepted: 10/10/2024] [Indexed: 10/30/2024] Open

Abstract

The evaluation of compound-target interactions (CTIs) is at the heart of drug discovery efforts. Given the substantial time and monetary costs of classical experimental screening, significant efforts have been dedicated to develop deep learning-based models that can accurately predict CTIs. A comprehensive comparison of these models on a large, curated CTI dataset is, however, still lacking. Here, we perform an in-depth comparison of 12 state-of-the-art deep learning architectures that use different protein and compound representations. The models were selected for their reported performance and architectures. To reliably compare model performance, we curated over 300 thousand binding and non-binding CTIs and established several gold-standard datasets of varying size and information. Based on our findings, DeepConv-DTI consistently outperforms other models in CTI prediction performance across the majority of datasets. It achieves an MCC of 0.6 or higher for most of the datasets and is one of the fastest models in training and inference. These results indicate that utilizing convolutional-based windows as in DeepConv-DTI to traverse trainable embeddings is a highly effective approach for capturing informative protein features. We also observed that physicochemical embeddings of targets increased model performance. We therefore modified DeepConv-DTI to include normalized physicochemical properties, which resulted in the overall best performing model Phys-DeepConv-DTI. This work highlights how the systematic evaluation of input features of compounds and targets, as well as their corresponding neural network architectures, can serve as a roadmap for the future development of improved CTI models.Scientific contributionThis work features comprehensive CTI datasets to allow for the objective comparison and benchmarking of CTI prediction algorithms. Based on this dataset, we gained insights into which embeddings of compounds and targets and which deep learning-based algorithms perform best, providing a blueprint for the future development of CTI algorithms. Using the insights gained from this screen, we provide a novel CTI algorithm with state-of-the-art performance.

Collapse

Michels J, Bandarupalli R, Akbari AA, Le T, Xiao H, Li J, Hom EFY. Natural Language Processing Methods for the Study of Protein-Ligand Interactions. ARXIV 2024:arXiv:2409.13057v2. [PMID: 39483353 PMCID: PMC11527106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/03/2024]

Flamholz ZN, Li C, Kelly L. Improving viral annotation with artificial intelligence. mBio 2024;15:e0320623. [PMID: 39230289 PMCID: PMC11481560 DOI: 10.1128/mbio.03206-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/05/2024] Open

Mera-Banguero C, Orduz S, Cardona P, Orrego A, Muñoz-Pérez J, Branch-Bedoya JW. AmpClass: an Antimicrobial Peptide Predictor Based on Supervised Machine Learning. AN ACAD BRAS CIENC 2024;96:e20230756. [PMID: 39383429 DOI: 10.1590/0001-3765202420230756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Accepted: 04/07/2024] [Indexed: 10/11/2024] Open

Susanty M, Mursalim MKN, Hertadi R, Purwarianti A, LE Rajab T. Leveraging protein language model embeddings and logistic regression for efficient and accurate in-silico acidophilic proteins classification. Comput Biol Chem 2024;112:108163. [PMID: 39098138 DOI: 10.1016/j.compbiolchem.2024.108163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2024] [Revised: 07/02/2024] [Accepted: 07/24/2024] [Indexed: 08/06/2024]

Yang S, Cheng P, Liu Y, Feng D, Wang S. Exploring the Knowledge of an Outstanding Protein to Protein Interaction Transformer. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024;21:1287-1298. [PMID: 38536676 DOI: 10.1109/tcbb.2024.3381825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/10/2024]

Cai C, Li J, Xia Y, Li W. FluPMT: Prediction of Predominant Strains of Influenza A Viruses via Multi-Task Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024;21:1254-1263. [PMID: 38498763 DOI: 10.1109/tcbb.2024.3378468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/20/2024]

Ji M, Kan Y, Kim D, Lee S, Yi G. DeepPI: Alignment-Free Analysis of Flexible Length Proteins Based on Deep Learning and Image Generator. Interdiscip Sci 2024;16:1-12. [PMID: 38568406 DOI: 10.1007/s12539-024-00618-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2023] [Revised: 02/01/2024] [Accepted: 02/03/2024] [Indexed: 09/19/2024]

Zhou J, Huang M. Navigating the landscape of enzyme design: from molecular simulations to machine learning. Chem Soc Rev 2024;53:8202-8239. [PMID: 38990263 DOI: 10.1039/d4cs00196f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/12/2024]

Nambiar A, Forsyth JM, Liu S, Maslov S. DR-BERT: A protein language model to annotate disordered regions. Structure 2024;32:1260-1268.e3. [PMID: 38701796 DOI: 10.1016/j.str.2024.04.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2023] [Revised: 06/16/2023] [Accepted: 04/08/2024] [Indexed: 05/05/2024]

Ghazikhani H, Butler G. Exploiting protein language models for the precise classification of ion channels and ion transporters. Proteins 2024;92:998-1055. [PMID: 38656743 DOI: 10.1002/prot.26694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2023] [Revised: 03/26/2024] [Accepted: 04/08/2024] [Indexed: 04/26/2024]

Ryan V WG, Imami AS, Ali Sajid H, Vergis J, Zhang X, Meller J, Shukla R, McCullumsmith R. Interpreting and visualizing pathway analyses using embedding representations with PAVER. Bioinformation 2024;20:700-704. [PMID: 39309552 PMCID: PMC11414338 DOI: 10.6026/973206300200700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2024] [Revised: 07/31/2024] [Accepted: 07/31/2024] [Indexed: 09/25/2024] Open

Cosentino S, Sriswasdi S, Iwasaki W. SonicParanoid2: fast, accurate, and comprehensive orthology inference with machine learning and language models. Genome Biol 2024;25:195. [PMID: 39054525 PMCID: PMC11270883 DOI: 10.1186/s13059-024-03298-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Accepted: 06/04/2024] [Indexed: 07/27/2024] Open

Jia Q, Xia Y, Dong F, Li W. MetaFluAD: meta-learning for predicting antigenic distances among influenza viruses. Brief Bioinform 2024;25:bbae395. [PMID: 39129362 PMCID: PMC11317534 DOI: 10.1093/bib/bbae395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2024] [Revised: 06/24/2024] [Accepted: 07/27/2024] [Indexed: 08/13/2024] Open

Ranjan A, Bess A, Alvin C, Mukhopadhyay S. MDF-DTA: A Multi-Dimensional Fusion Approach for Drug-Target Binding Affinity Prediction. J Chem Inf Model 2024;64:4980-4990. [PMID: 38888163 PMCID: PMC11234358 DOI: 10.1021/acs.jcim.4c00310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2024] [Revised: 05/15/2024] [Accepted: 05/29/2024] [Indexed: 06/20/2024]

Banerjee P, Eulenstein O, Friedberg I. Discovering genomic islands in unannotated bacterial genomes using sequence embedding. BIOINFORMATICS ADVANCES 2024;4:vbae089. [PMID: 38911822 PMCID: PMC11193100 DOI: 10.1093/bioadv/vbae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/19/2024] [Revised: 05/26/2024] [Accepted: 06/11/2024] [Indexed: 06/25/2024]

Zhang B, Hou Z, Yang Y, Wong KC, Zhu H, Li X. SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues. Commun Biol 2024;7:679. [PMID: 38830995 PMCID: PMC11148103 DOI: 10.1038/s42003-024-06332-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2023] [Accepted: 05/15/2024] [Indexed: 06/05/2024] Open

Abstract

Proteins and nucleic-acids are essential components of living organisms that interact in critical cellular processes. Accurate prediction of nucleic acid-binding residues in proteins can contribute to a better understanding of protein function. However, the discrepancy between protein sequence information and obtained structural and functional data renders most current computational models ineffective. Therefore, it is vital to design computational models based on protein sequence information to identify nucleic acid binding sites in proteins. Here, we implement an ensemble deep learning model-based nucleic-acid-binding residues on proteins identification method, called SOFB, which characterizes protein sequences by learning the semantics of biological dynamics contexts, and then develop an ensemble deep learning-based sequence network to learn feature representation and classification by explicitly modeling dynamic semantic information. Among them, the language learning model, which is constructed from natural language to biological language, captures the underlying relationships of protein sequences, and the ensemble deep learning-based sequence network consisting of different convolutional layers together with Bi-LSTM refines various features for optimal performance. Meanwhile, to address the imbalanced issue, we adopt ensemble learning to train multiple models and then incorporate them. Our experimental results on several DNA/RNA nucleic-acid-binding residue datasets demonstrate that our proposed model outperforms other state-of-the-art methods. In addition, we conduct an interpretability analysis of the identified nucleic acid binding residue sequences based on the attention weights of the language learning model, revealing novel insights into the dynamic semantic information that supports the identified nucleic acid binding residues. SOFB is available at https://github.com/Encryptional/SOFB and https://figshare.com/articles/online_resource/SOFB_figshare_rar/25499452 .

Collapse