Reference Citation Analysis

For: Kulmanov M, Khan MA, Hoehndorf R, Wren J. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics 2018;34:660-668. [PMID: 29028931 PMCID: PMC5860606 DOI: 10.1093/bioinformatics/btx624] [Citation(s) in RCA: 254] [Impact Index Per Article: 36.3] [Received: 05/10/2017] [Accepted: 09/27/2017] [Indexed: 12/29/2022]
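
The Impact Index Per Article figures on this page fit one simple rule. Each value equals the article's RCA citation count divided by the number of calendar years from its publication year through 2024, counting both ends. For DeepGO this gives 254 / 7 ≈ 36.3, and for AnnoPRO it gives 33 / 1 = 33.0. This rule is inferred from the listed numbers and is not a documented RCA formula. The sketch below uses a hypothetical helper `impact_index_per_article` with an assumed reference year of 2024 and reproduces several of the listed values.

```python
def impact_index_per_article(citations: int, pub_year: int, ref_year: int = 2024) -> float:
    """Citations per calendar year since publication, rounded to one decimal.

    ASSUMPTION: the formula and ref_year=2024 are inferred from the values
    listed on this page; they are not taken from RCA documentation.
    """
    if ref_year < pub_year:
        raise ValueError("ref_year must not precede pub_year")
    years_available = ref_year - pub_year + 1  # counts both the first and last year
    return round(citations / years_available, 1)


# (article, citations in RCA, publication year, value listed on this page)
listed = [
    ("DeepGO", 254, 2018, 36.3),
    ("AnnoPRO", 33, 2024, 33.0),
    ("Cagiada et al.", 29, 2023, 14.5),
    ("BioSeq-Diabolo", 55, 2023, 27.5),
]
for name, cites, year, shown in listed:
    print(f"{name}: computed {impact_index_per_article(cites, year)}, listed {shown}")
```

The same rule also matches the one-citation 2023 entries, which are listed at 0.5.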

Cited by Other Article(s)

Pan H, Wu Z, Liu W, Zhang G. AlphaFun: Structural-Alignment-Based Proteome Annotation Reveals why the Functionally Unknown Proteins (uPE1) Are So Understudied. J Proteome Res 2024;23:1593-1602. [PMID: 38626392 PMCID: PMC11078154 DOI: 10.1021/acs.jproteome.3c00678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 03/27/2024] [Accepted: 04/03/2024] [Indexed: 04/18/2024]

Armah-Sekum RE, Szedmak S, Rousu J. Protein function prediction through multi-view multi-label latent tensor reconstruction. BMC Bioinformatics 2024;25:174. [PMID: 38698340 PMCID: PMC11067221 DOI: 10.1186/s12859-024-05789-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Accepted: 04/17/2024] [Indexed: 05/05/2024]

Saha SS, Sandha SS, Aggarwal M, Wang B, Han L, de Gortari Briseno J, Srivastava M. TinyNS: Platform-Aware Neurosymbolic Auto Tiny Machine Learning. ACM Trans Embed Comput Syst 2024;23:43. [PMID: 38933471 PMCID: PMC11200268 DOI: 10.1145/3603171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 05/28/2023] [Indexed: 06/28/2024]

Dotan E, Jaschek G, Pupko T, Belinkov Y. Effect of tokenization on transformers for biological sequences. Bioinformatics 2024;40:btae196. [PMID: 38608190 PMCID: PMC11055402 DOI: 10.1093/bioinformatics/btae196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 02/20/2024] [Accepted: 04/11/2024] [Indexed: 04/14/2024]

Abstract

MOTIVATION

Deep-learning models are transforming biological research, including many bioinformatics and comparative genomics algorithms, such as sequence alignment, phylogenetic tree inference, and automatic classification of protein functions. Among these deep-learning algorithms, models developed in the natural language processing (NLP) community were recently applied to biological sequences. However, biological sequences differ from natural languages such as English and French, in which segmenting text into separate words is relatively straightforward. Moreover, biological sequences are characterized by extremely long "sentences", which hamper their processing by current machine-learning models, notably the transformer architecture. In NLP, one of the first processing steps is to transform the raw text into a list of tokens. Deep-learning applications to biological sequence data mostly segment proteins and DNA into single characters. In this work, we study the effect of alternative tokenization algorithms on eight different tasks in biology, from predicting the function of proteins and their stability, through nucleotide sequence alignment, to classifying proteins into specific families.

RESULTS

We demonstrate that applying alternative tokenization algorithms can increase accuracy and, at the same time, substantially reduce the input length compared to the trivial tokenizer in which each character is a token. Furthermore, these tokenization algorithms allow trained models to be interpreted while taking into account dependencies among positions. Finally, we trained these tokenizers on a large dataset of protein sequences containing more than 400 billion amino acids, which resulted in a more than 3-fold decrease in the number of tokens. We then tested these tokenizers, trained on large-scale data, on the above specific tasks and showed that for some tasks it is highly beneficial to train database-specific tokenizers. Our study suggests that tokenizers are likely to be a critical component in future deep-network analysis of biological sequence data.

AVAILABILITY AND IMPLEMENTATION

Code, data, and trained tokenizers are available at https://github.com/technion-cs-nlp/BiologicalTokenizers.
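
The alternative tokenizers in this study learn a subword vocabulary instead of using one token per residue. The sketch below shows the general idea with a plain byte-pair encoding (BPE) learner. This is an assumed, generic technique and not the authors' implementation, which is in their repository. The learner repeatedly merges the most frequent adjacent token pair in a few toy peptides, then re-encodes a query. Merged tokens make the query shorter than its character-level form. The peptides and the functions `merge_pair`, `learn_bpe` and `encode` are hypothetical.

```python
from collections import Counter


def merge_pair(tokens, a, b):
    """Replace every non-overlapping adjacent occurrence of (a, b) with a + b."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            merged.append(a + b)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe(sequences, num_merges):
    """Learn up to num_merges BPE merge rules, starting from single residues."""
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for tokens in corpus:
            pair_counts.update(zip(tokens, tokens[1:]))
        if not pair_counts:
            break
        (a, b), count = pair_counts.most_common(1)[0]
        if count < 2:  # stop once no adjacent pair repeats
            break
        merges.append((a, b))
        corpus = [merge_pair(tokens, a, b) for tokens in corpus]
    return merges


def encode(sequence, merges):
    """Apply the learned merges in the order they were learned."""
    tokens = list(sequence)
    for a, b in merges:
        tokens = merge_pair(tokens, a, b)
    return tokens


# Hypothetical toy peptides, not data from the study.
train = ["MKTAYIAKQR", "MKTAYIAKQL", "MKTLLILAVA", "AKQRMKTAYI"]
merges = learn_bpe(train, num_merges=10)
query = "MKTAYIAKQRQISFVK"
tokens = encode(query, merges)
print(f"{len(query)} residues -> {len(tokens)} tokens: {tokens}")
```

Joining the tokens always gives back the original sequence. Only the segmentation changes, and that is what shortens the transformer input.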

Chen Z, Ain NU, Zhao Q, Zhang X. From tradition to innovation: conventional and deep learning frameworks in genome annotation. Brief Bioinform 2024;25:bbae138. [PMID: 38581418 PMCID: PMC10998533 DOI: 10.1093/bib/bbae138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 03/08/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024]

Wenzel M, Grüner E, Strodthoff N. Insights into the inner workings of transformer models for protein function prediction. Bioinformatics 2024;40:btae031. [PMID: 38244570 PMCID: PMC10950482 DOI: 10.1093/bioinformatics/btae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 12/14/2023] [Accepted: 01/16/2024] [Indexed: 01/22/2024]

Mickael M, Łazarczyk M, Kubick N, Gurba A, Kocki T, Horbańczuk JO, Atanasov AG, Sacharczuk M, Religa P. FEZF2 and AIRE1: An Evolutionary Trade-off in the Elimination of Auto-reactive T Cells in the Thymus. J Mol Evol 2024;92:72-86. [PMID: 38285197 DOI: 10.1007/s00239-024-10157-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 01/15/2024] [Indexed: 01/30/2024]

Abstract

Autoimmune Regulator 1 (AIRE1) and Forebrain Embryonic Zinc Finger-Like Protein 2 (FEZF2) play pivotal roles in orchestrating the expression of tissue-restricted antigens (TRA) to facilitate the elimination of autoreactive T cells. The presence of AIRE1 in the gonads of various vertebrates has raised questions about its potential involvement in controlling gene expression for germline cell selection. Nevertheless, the evolutionary history of these genes has remained enigmatic, as has the rationale behind their apparent redundancy in vertebrates, and the origin of the elimination process itself has remained elusive. To address these questions, we conducted a comprehensive evolutionary analysis using a range of tools, including multiple sequence alignment, phylogenetic tree construction, ancestral sequence reconstruction, and positive selection assessment. AIRE1 homologs emerged during the divergence of T cells in higher vertebrates, signifying its role in this context. Conversely, FEZF2 exhibited multiple homologs spanning invertebrates, lampreys, and higher vertebrates. Ancestral sequence reconstruction demonstrated distinct origins for AIRE1 and FEZF2, indicating that their roles in regulating TRA evolved through disparate pathways. Both FEZF2 and AIRE1 govern a diverse repertoire of genes, encompassing ancient and more recently diverged targets. Notably, FEZF2 is expressed in both vertebrate and invertebrate embryos and germlines, underscoring its widespread role. FEZF2 also harbors motifs associated with autophagy, such as DKFPHP, SYSELWKSSL, and SYSEL; autophagy is a process integral to cell selection in invertebrates. Our findings suggest that FEZF2 initially emerged to regulate self-elimination in the gonads of invertebrates. As organisms evolved toward greater complexity, AIRE1 likely emerged to complement FEZF2's role, participating in the regulation of cell selection for elimination in both gonads and the thymus. This dynamic interplay between AIRE1 and FEZF2 underscores their multifaceted contributions to TRA expression regulation across diverse evolutionary contexts.
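
The autophagy-associated motifs named in this abstract (DKFPHP, SYSELWKSSL and SYSEL) are short exact strings. Because SYSEL is a prefix of SYSELWKSSL, a scan must report overlapping matches. The sketch below does this with Python's `re` module and a zero-width lookahead. The input is a hypothetical toy string, not the FEZF2 sequence, and the function name `find_motifs` is illustrative. The authors' motif analysis may have used different tools.

```python
import re

# Motifs named in the abstract.
AUTOPHAGY_MOTIFS = ["DKFPHP", "SYSELWKSSL", "SYSEL"]


def find_motifs(sequence, motifs=AUTOPHAGY_MOTIFS):
    """Return (motif, 0-based start) for every occurrence, including overlaps."""
    hits = []
    for motif in motifs:
        # A lookahead consumes no characters, so overlapping hits are all reported.
        pattern = re.compile(f"(?={re.escape(motif)})")
        hits.extend((motif, m.start()) for m in pattern.finditer(sequence))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


toy_sequence = "MSYSELWKSSLAADKFPHPG"  # hypothetical, for illustration only
print(find_motifs(toy_sequence))
```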

Zheng L, Shi S, Lu M, Fang P, Pan Z, Zhang H, Zhou Z, Zhang H, Mou M, Huang S, Tao L, Xia W, Li H, Zeng Z, Zhang S, Chen Y, Li Z, Zhu F. AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding. Genome Biol 2024;25:41. [PMID: 38303023 PMCID: PMC10832132 DOI: 10.1186/s13059-024-03166-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 33.0] [Received: 05/06/2023] [Accepted: 01/05/2024] [Indexed: 02/03/2024]

Affiliation(s)

Lingyan Zheng: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China; Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China
Shuiyang Shi: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Mingkun Lu: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Pan Fang: Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
Ziqi Pan: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Hongning Zhang: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Zhimeng Zhou: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Hanyu Zhang: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Minjie Mou: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Shijie Huang: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Lin Tao: Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
Weiqi Xia: Pharmaceutical Department, Zhejiang Provincial People's Hospital, Hangzhou, 310014, China
Honglin Li: School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
Zhenyu Zeng: Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
Shun Zhang: Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
Yuzong Chen: State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen, 518055, China
Zhaorong Li: Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
Feng Zhu: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China; Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China

Chou JCC, Decosto CM, Chatterjee P, Dassama LMK. Rapid proteome-wide prediction of lipid-interacting proteins through ligand-guided structural genomics. bioRxiv 2024:2024.01.26.577452. [PMID: 38352308 PMCID: PMC10862712 DOI: 10.1101/2024.01.26.577452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]

Wang W, Shuai Y, Yang Q, Zhang F, Zeng M, Li M. A comprehensive computational benchmark for evaluating deep learning-based protein function prediction approaches. Brief Bioinform 2024;25:bbae050. [PMID: 38388682 PMCID: PMC10883809 DOI: 10.1093/bib/bbae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/17/2024] [Accepted: 01/26/2024] [Indexed: 02/24/2024]

Tepeli YI, Seale C, Gonçalves JP. ELISL: early-late integrated synthetic lethality prediction in cancer. Bioinformatics 2024;40:btad764. [PMID: 38113447 PMCID: PMC11616771 DOI: 10.1093/bioinformatics/btad764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/06/2023] [Accepted: 12/18/2023] [Indexed: 12/21/2023]

Fu Y, Gu Z, Luo X, Guo Q, Lai L, Deng M. Learning a generalized graph transformer for protein function prediction in dissimilar sequences. Gigascience 2024;13:giae093. [PMID: 39657158 PMCID: PMC11734293 DOI: 10.1093/gigascience/giae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 07/04/2024] [Accepted: 10/25/2024] [Indexed: 12/17/2024]

Abstract

BACKGROUND

In the face of a growing disparity between high-throughput sequence data and low-throughput experimental studies, the emerging field of deep learning offers a promising alternative. Many data-driven approaches can make fast and accurate predictions of protein functions. Nevertheless, the inherent statistical nature of deep learning techniques may limit their ability to generalize to novel nonhomologous proteins that diverge significantly from existing ones.

RESULTS

In this work, we propose a novel, generalized approach named Graph Adversarial Learning with Alignment (GALA) for protein function prediction. GALA integrates a graph transformer architecture with an attention pooling module to extract embeddings from both protein sequences and structures, enabling unified learning of protein representations. Notably, GALA incorporates a domain discriminator conditioned on both learnable representations and predicted probabilities, which undergoes adversarial learning to ensure representation invariance across diverse environments. To exploit the abundant label information, we generate label embeddings in the hidden space and explicitly align them with protein representations. Benchmarked on datasets derived from the PDB and Swiss-Prot databases, GALA achieves performance comparable to several state-of-the-art methods. Moreover, GALA offers strong biological interpretability, identifying significant functional residues associated with Gene Ontology terms through class activation mapping.

CONCLUSIONS

GALA, which leverages adversarial learning and label-embedding alignment to acquire domain-invariant protein representations, exhibits outstanding generalizability in function prediction for proteins from previously unseen sequence space. By incorporating structures predicted by AlphaFold2, GALA shows significant potential for the functional annotation of newly discovered sequences. An implementation of GALA is available at https://github.com/fuyw-aisw/GALA.
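
GALA is described as combining three ideas: attention pooling over residue-level embeddings, alignment of label embeddings with protein representations, and class activation mapping to locate functional residues. The sketch below illustrates only those three ideas on plain Python lists, under three stated assumptions:

- a single query vector drives attention pooling;
- a GO term's probability is the sigmoid of the dot product between the pooled representation and that term's label embedding;
- a residue's CAM-style score is a_i (e · h_i), which is this sketch's own adaptation of class activation mapping to attention pooling.

It omits the graph transformer and the adversarial domain discriminator. All names and numbers are hypothetical, and this is not GALA's implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(residue_embs, query):
    """Pool per-residue embeddings h_i with weights a_i = softmax(query . h_i)."""
    weights = softmax([dot(query, h) for h in residue_embs])
    dim = len(residue_embs[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, residue_embs)) for d in range(dim)]
    return pooled, weights


def term_probability(protein_repr, label_emb):
    """Sigmoid of the alignment score between protein and GO-term embeddings."""
    return 1.0 / (1.0 + math.exp(-dot(protein_repr, label_emb)))


def residue_scores(residue_embs, weights, label_emb):
    """Per-residue contributions a_i * (e . h_i); they sum to the term's logit."""
    return [w * dot(label_emb, h) for w, h in zip(weights, residue_embs)]


# Hypothetical 3-dimensional embeddings for a 4-residue protein.
residues = [[0.2, -0.1, 0.5], [1.0, 0.3, -0.2], [0.0, 0.8, 0.1], [-0.4, 0.2, 0.9]]
query = [0.5, 0.5, 0.0]
go_term = [1.2, -0.3, 0.4]

pooled, weights = attention_pool(residues, query)
print("P(term):", round(term_probability(pooled, go_term), 3))
print("residue scores:", [round(s, 3) for s in residue_scores(residues, weights, go_term)])
```

Pooling is linear in the residue embeddings, so the per-residue scores add up exactly to the logit. That makes them a direct attribution of the prediction to individual residues.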

Sharma L, Deepak A, Ranjan A, Krishnasamy G. A CNN-CBAM-BIGRU model for protein function prediction. Stat Appl Genet Mol Biol 2024;23:sagmb-2024-0004. [PMID: 38943434 DOI: 10.1515/sagmb-2024-0004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 06/07/2024] [Indexed: 07/01/2024]

Chen J, Gu Z, Lai L, Pei J. In silico protein function prediction: the rise of machine learning-based approaches. Med Rev (2021) 2023;3:487-510. [PMID: 38282798 PMCID: PMC10808870 DOI: 10.1515/mr-2023-0038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/11/2023] [Indexed: 01/30/2024]

Ribeiro AJM, Riziotis IG, Borkakoti N, Thornton JM. Enzyme function and evolution through the lens of bioinformatics. Biochem J 2023;480:1845-1863. [PMID: 37991346 PMCID: PMC10754289 DOI: 10.1042/bcj20220405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/23/2023]

Hamamsy T, Barot M, Morton JT, Steinegger M, Bonneau R, Cho K. Learning sequence, structure, and function representations of proteins with language models. bioRxiv 2023:2023.11.26.568742. [PMID: 38045331 PMCID: PMC10690258 DOI: 10.1101/2023.11.26.568742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]

Jiao P, Wang B, Wang X, Liu B, Wang Y, Li J. Struct2GO: protein function prediction based on graph pooling algorithm and AlphaFold2 structure information. Bioinformatics 2023;39:btad637. [PMID: 37847755 PMCID: PMC10612405 DOI: 10.1093/bioinformatics/btad637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 10/05/2023] [Accepted: 10/16/2023] [Indexed: 10/19/2023]

Zhang X, Guo H, Zhang F, Wang X, Wu K, Qiu S, Liu B, Wang Y, Hu Y, Li J. HNetGO: protein function prediction via heterogeneous network transformer. Brief Bioinform 2023;24:bbab556. [PMID: 37861172 PMCID: PMC10588005 DOI: 10.1093/bib/bbab556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2021] [Revised: 11/18/2021] [Accepted: 12/04/2021] [Indexed: 10/21/2023]

Wu J, Qing H, Ouyang J, Zhou J, Gao Z, Mason CE, Liu Z, Shi T. HiFun: homology independent protein function prediction by a novel protein-language self-attention model. Brief Bioinform 2023;24:bbad311. [PMID: 37649370 DOI: 10.1093/bib/bbad311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/13/2023] [Revised: 07/31/2023] [Accepted: 08/08/2023] [Indexed: 09/01/2023]

Zhang Y, Yao S, Chen P. Prediction of hot spots towards drug discovery by protein sequence embedding with 1D convolutional neural network. PLoS One 2023;18:e0290899. [PMID: 37721924 PMCID: PMC10506709 DOI: 10.1371/journal.pone.0290899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2023] [Accepted: 08/18/2023] [Indexed: 09/20/2023]

Sahoo BR, Bardwell JCA. SERF, a family of tiny highly conserved, highly charged proteins with enigmatic functions. FEBS J 2023;290:4150-4162. [PMID: 35694898 DOI: 10.1111/febs.16555] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/15/2022] [Revised: 06/07/2022] [Accepted: 06/10/2022] [Indexed: 11/27/2022]

Zhang X, Wang L, Liu H, Zhang X, Liu B, Wang Y, Li J. Prot2GO: Predicting GO Annotations From Protein Sequences and Interactions. IEEE/ACM Trans Comput Biol Bioinform 2023;20:2772-2780. [PMID: 34971539 DOI: 10.1109/tcbb.2021.3139841] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]

Zhang F, Zhang Y, Zhu X, Chen X, Lu F, Zhang X. DeepSG2PPI: A Protein-Protein Interaction Prediction Method Based on Deep Learning. IEEE/ACM Trans Comput Biol Bioinform 2023;20:2907-2919. [PMID: 37079417 DOI: 10.1109/tcbb.2023.3268661] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/03/2023]

Koelsch N, Manjili MH. From Reductionistic Approach to Systems Immunology Approach for the Understanding of Tumor Microenvironment. Int J Mol Sci 2023;24:12086. [PMID: 37569461 PMCID: PMC10419122 DOI: 10.3390/ijms241512086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/04/2023] [Revised: 07/23/2023] [Accepted: 07/27/2023] [Indexed: 08/13/2023]

Chandra O, Sharma M, Pandey N, Jha IP, Mishra S, Kong SL, Kumar V. Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes. Comput Struct Biotechnol J 2023;21:3590-3603. [PMID: 37520281 PMCID: PMC10371796 DOI: 10.1016/j.csbj.2023.07.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 07/05/2023] [Accepted: 07/11/2023] [Indexed: 08/01/2023]

Cagiada M, Bottaro S, Lindemose S, Schenstrøm SM, Stein A, Hartmann-Petersen R, Lindorff-Larsen K. Discovering functionally important sites in proteins. Nat Commun 2023;14:4175. [PMID: 37443362 PMCID: PMC10345196 DOI: 10.1038/s41467-023-39909-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 05/19/2023] [Accepted: 07/02/2023] [Indexed: 07/15/2023]

Zheng R, Huang Z, Deng L. Large-scale predicting protein functions through heterogeneous feature fusion. Brief Bioinform 2023:bbad243. [PMID: 37401369 DOI: 10.1093/bib/bbad243] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/12/2023] [Revised: 05/18/2023] [Accepted: 06/12/2023] [Indexed: 07/05/2023]

Li H, Liu B. BioSeq-Diabolo: Biological sequence similarity analysis using Diabolo. PLoS Comput Biol 2023;19:e1011214. [PMID: 37339155 DOI: 10.1371/journal.pcbi.1011214] [Citation(s) in RCA: 55] [Impact Index Per Article: 27.5] [Received: 10/03/2022] [Accepted: 05/24/2023] [Indexed: 06/22/2023]

Abstract

As the key to biological sequence structure and function prediction and to disease diagnosis and treatment, biological sequence similarity analysis has attracted increasing attention. However, existing computational methods fail to accurately analyse biological sequence similarities because of the various data types (DNA, RNA, protein, disease, etc.) and their low sequence similarities (remote homology). Therefore, new concepts and techniques are needed to solve this challenging problem. Biological sequences (DNA, RNA and protein sequences) can be considered as the sentences of "the book of life", and their similarities can be considered as the biological language semantics (BLS). In this study, we apply semantic analysis techniques derived from natural language processing (NLP) to comprehensively and accurately analyse biological sequence similarities. Twenty-seven semantic analysis methods derived from NLP were introduced to analyse biological sequence similarities, bringing new concepts and techniques to the field. Experimental results show that these semantic analysis methods facilitate protein remote homology detection, circRNA-disease association identification and protein function annotation, achieving better performance than other state-of-the-art predictors in these fields. Based on these methods, a platform called BioSeq-Diabolo has been constructed, named after a popular traditional sport in China. Users only need to input the embeddings of their biological sequence data. BioSeq-Diabolo then intelligently identifies the task and analyses the biological sequence similarities based on biological language semantics. BioSeq-Diabolo integrates different biological sequence similarities in a supervised manner using Learning to Rank (LTR), and evaluates and analyses the performance of the constructed methods in order to recommend the best methods to users. The web server and stand-alone package of BioSeq-Diabolo can be accessed at http://bliulab.net/BioSeq-Diabolo/server/.
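
Treating a sequence as a sentence suggests a simple bag-of-words baseline. Each sequence is split into overlapping k-mers ("words"), and sequences are compared by the cosine similarity of their k-mer count vectors. The sketch below shows only that general idea. It is not claimed to be one of the 27 methods in BioSeq-Diabolo, and it does not implement the platform's Learning to Rank integration. The function names, the choice k = 3 and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers, the 'words' of a biological sentence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse k-mer count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_by_similarity(query, database, k=3):
    """Return (identifier, similarity) pairs sorted from most to least similar."""
    q = kmer_counts(query, k)
    scored = [(name, cosine_similarity(q, kmer_counts(seq, k))) for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy sequences.
db = {"seqA": "MKTAYIAKQRQISFVK", "seqB": "MKTAYIAKQLGG", "seqC": "GGGSGGGSGGGS"}
print(rank_by_similarity("MKTAYIAKQR", db))
```

A fixed similarity like this one is the kind of score a supervised ranking step could learn to weight against others.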

Oliveira GB, Pedrini H, Dias Z. TEMPROT: protein function annotation using transformers embeddings and homology search. BMC Bioinformatics 2023;24:242. [PMID: 37291492 DOI: 10.1186/s12859-023-05375-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/20/2022] [Accepted: 06/02/2023] [Indexed: 06/10/2023]

Abstract

BACKGROUND

Although the development of sequencing technologies has provided a large number of protein sequences, determining the function of each one is still difficult because of the effort that laboratory methods require, which makes computational methods necessary to narrow this gap. As the main source of information available about proteins is their sequences, approaches that use this information, such as classification based on amino acid patterns and inference based on sequence similarity using alignment tools, can predict functions for a large collection of proteins. The methods in the literature that use this type of feature can achieve good results; however, they restrict the length of the protein sequences used as input to their models. In this work, we present a new method, called TEMPROT, based on fine-tuning and extracting embeddings from an available architecture pre-trained on protein sequences. We also describe TEMPROT+, an ensemble of TEMPROT and BLASTp, a local alignment tool that analyzes sequence similarity, which improves on the results of our former approach.
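
TEMPROT+ is described as an ensemble of the transformer-based TEMPROT and BLASTp, but the abstract does not give the combination rule. The sketch below rests on two labeled assumptions:

- BLASTp hits transfer each GO term with a score equal to the share of total bit score carried by the hits annotated with that term;
- the classifier and homology scores are blended linearly with a weight `alpha`.

The hit identifiers and scores are hypothetical. The GO identifiers are real terms, used only as labels.

```python
def blast_term_scores(hits, annotations):
    """Homology transfer (assumed rule): score(term) = bit scores of hits carrying
    the term divided by the total bit score of all hits."""
    total = sum(bits for _, bits in hits)
    scores = {}
    if total <= 0:
        return scores
    for subject, bits in hits:
        for term in annotations.get(subject, set()):
            scores[term] = scores.get(term, 0.0) + bits / total
    return scores


def ensemble_scores(model_scores, blast_scores, alpha=0.5):
    """Linear blend: alpha * model + (1 - alpha) * homology, term by term."""
    terms = model_scores.keys() | blast_scores.keys()
    return {t: alpha * model_scores.get(t, 0.0) + (1 - alpha) * blast_scores.get(t, 0.0)
            for t in terms}


# Hypothetical BLASTp hits for one query: (subject id, bit score).
hits = [("subject_1", 200.0), ("subject_2", 100.0)]
annotations = {"subject_1": {"GO:0003677", "GO:0005634"}, "subject_2": {"GO:0003677"}}
model = {"GO:0003677": 0.8, "GO:0016301": 0.4}  # hypothetical classifier outputs

combined = ensemble_scores(model, blast_term_scores(hits, annotations), alpha=0.5)
for term, score in sorted(combined.items()):
    print(term, round(score, 3))
```

A term supported only by homology, or only by the classifier, still gets a nonzero score. In practice `alpha` would be tuned on a validation set.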

RESULTS

We compared our proposed classifiers with approaches from the literature on our own dataset, derived from the CAFA3 challenge database. Both TEMPROT and TEMPROT+ achieved competitive results on the Fmax, Smin, AuPRC and IAuPRC metrics for the Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) ontologies compared to state-of-the-art models, with main Fmax results of 0.581, 0.692 and 0.662 on BP, CC and MF, respectively.
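
CAFA-style evaluations such as this one are usually summarized with Fmax, the protein-centric maximum F-measure. At each threshold t:

- precision is averaged over proteins with at least one term scored at or above t;
- recall is averaged over all benchmark proteins;
- the F-measure is the harmonic mean of the two.

Fmax is the largest F-measure over all thresholds. The sketch below follows that standard CAFA definition. The threshold grid (0.01 to 1.00) and the toy data are assumptions, and this is not the evaluation code the TEMPROT authors used.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax (CAFA-style).

    predictions: {protein: {go_term: score}}
    truth: {protein: set of true go_terms}; every benchmark protein is assumed
    to have at least one true term.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    n = len(truth)
    best = 0.0
    for t in thresholds:
        precisions, recall_sum = [], 0.0
        for protein, true_terms in truth.items():
            predicted = {term for term, s in predictions.get(protein, {}).items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:  # precision only over proteins with a prediction at t
                precisions.append(tp / len(predicted))
            recall_sum += tp / len(true_terms)
        if not precisions:
            continue
        pr, rc = sum(precisions) / len(precisions), recall_sum / n
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Hypothetical toy benchmark; "GO:1" etc. are placeholder labels, not real GO IDs.
truth = {"A": {"GO:1", "GO:2"}, "B": {"GO:3"}}
preds = {"A": {"GO:1": 0.9, "GO:2": 0.4, "GO:9": 0.6}, "B": {"GO:3": 0.7}}
print(round(fmax(preds, truth), 3))
```

Averaging precision only over proteins that have predictions keeps conservative thresholds from being penalized twice. Averaging recall over every benchmark protein still penalizes proteins that get no predictions.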

CONCLUSIONS

The comparison with the literature showed that our model achieved results competitive with state-of-the-art approaches based on amino acid sequence pattern recognition and homology analysis. Our model also improves on the literature methods in the input size it can use for training.

Ansari M, White AD. Learning Peptide Properties with Positive Examples Only. bioRxiv 2023:2023.06.01.543289. [PMID: 37333233 PMCID: PMC10274696 DOI: 10.1101/2023.06.01.543289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]

Wang Z, Deng Z, Zhang W, Lou Q, Choi KS, Wei Z, Wang L, Wu J. MMSMAPlus: a multi-view multi-scale multi-attention embedding model for protein function prediction. Brief Bioinform 2023:7187109. [PMID: 37258453 DOI: 10.1093/bib/bbad201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 04/16/2023] [Accepted: 05/08/2023] [Indexed: 06/02/2023]

Maranga M, Szczerbiak P, Bezshapkin V, Gligorijevic V, Chandler C, Bonneau R, Xavier RJ, Vatanen T, Kosciolek T. Comprehensive Functional Annotation of Metagenomes and Microbial Genomes Using a Deep Learning-Based Method. mSystems 2023;8:e0117822. [PMID: 37010293 PMCID: PMC10134832 DOI: 10.1128/msystems.01178-22] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/08/2022] [Accepted: 02/06/2023] [Indexed: 04/04/2023]

Abstract

Comprehensive protein function annotation is essential for understanding microbiome-related disease mechanisms in the host organisms. However, a large portion of human gut microbial proteins lack functional annotation. Here, we have developed a new metagenome analysis workflow integrating de novo genome reconstruction, taxonomic profiling, and deep learning-based functional annotations from DeepFRI. This is the first approach to apply deep learning-based functional annotations in metagenomics. We validate DeepFRI functional annotations by comparing them to orthology-based annotations from eggNOG on a set of 1,070 infant metagenomes from the DIABIMMUNE cohort. Using this workflow, we generated a sequence catalogue of 1.9 million nonredundant microbial genes. The functional annotations revealed 70% concordance between Gene Ontology annotations predicted by DeepFRI and eggNOG. DeepFRI improved the annotation coverage, with 99% of the gene catalogue obtaining Gene Ontology molecular function annotations, although they are less specific than those from eggNOG. Additionally, we constructed pangenomes in a reference-free manner using high-quality metagenome-assembled genomes (MAGs) and analyzed the associated annotations. eggNOG annotated more genes on well-studied organisms, such as Escherichia coli, while DeepFRI was less sensitive to taxa. Further, we show that DeepFRI provides additional annotations in comparison to the previous DIABIMMUNE studies. This workflow will contribute to a novel understanding of the functional signature of the human gut microbiome in health and disease, as well as guiding future metagenomics studies.

IMPORTANCE

The past decade has seen advances in high-throughput sequencing technologies, resulting in rapid accumulation of genomic data from microbial communities. While this growth in sequence data and gene discovery is impressive, the majority of microbial gene functions remain uncharacterized. The coverage of functional information coming from either experimental sources or inferences is low. To address these challenges, we have developed a new workflow to computationally assemble microbial genomes and annotate the genes using the deep learning-based model DeepFRI. This improved microbial gene annotation coverage to 1.9 million metagenome-assembled genes, representing 99% of the assembled genes, which is a significant improvement over the 12% Gene Ontology term annotation coverage of commonly used orthology-based approaches. Importantly, the workflow supports pangenome reconstruction in a reference-free manner, allowing us to analyze the functional potential of individual bacterial species. We therefore propose this alternative approach, combining deep learning functional predictions with the commonly used orthology-based annotations, as one that could help us uncover novel functions observed in metagenomic microbiome studies.


Hoang VT, Jeon HJ, You ES, Yoon Y, Jung S, Lee OJ. Graph Representation Learning and Its Applications: A Survey. Sensors (Basel) 2023;23:4168. [PMID: 37112507 PMCID: PMC10144941 DOI: 10.3390/s23084168] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/08/2023] [Revised: 04/16/2023] [Accepted: 04/17/2023] [Indexed: 06/19/2023]

Thafar MA, Albaradei S, Uludag M, Alshahrani M, Gojobori T, Essack M, Gao X. OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features. Front Genet 2023;14:1139626. [PMID: 37091791 PMCID: PMC10117673 DOI: 10.3389/fgene.2023.1139626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Accepted: 03/24/2023] [Indexed: 04/08/2023]

Affiliation(s)

Maha A. Thafar Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Saudi Arabia
Somayah Albaradei Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Mahmut Uludag Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Mona Alshahrani National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
Takashi Gojobori Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Magbubah Essack Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia (corresponding author)
Xin Gao Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia (corresponding author)


Wu Z, Guo M, Jin X, Chen J, Liu B. CFAGO: cross-fusion of network and attributes based on attention mechanism for protein function prediction. Bioinformatics 2023;39:7072461. [PMID: 36883697 PMCID: PMC10032634 DOI: 10.1093/bioinformatics/btad123] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/30/2022] [Revised: 02/28/2023] [Accepted: 03/05/2023] [Indexed: 03/09/2023]

Abstract

MOTIVATION

Protein function annotation is fundamental to understanding biological mechanisms. Abundant genome-scale protein-protein interaction (PPI) networks, together with other protein biological attributes, provide rich information for annotating protein functions. Because PPI networks and biological attributes describe protein functions from different perspectives, cross-fusing them for protein function prediction is highly challenging. Several recent methods combine PPI networks and protein attributes via graph neural networks (GNNs). However, GNNs may inherit or even magnify the bias caused by noisy edges in PPI networks. Moreover, stacking many GNN layers may cause over-smoothing of node representations.
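The over-smoothing problem mentioned above can be seen in a toy experiment. The sketch below is not part of CFAGO; it applies a parameter-free mean-aggregation layer (each node averages its own feature with those of its neighbors) repeatedly on a small hypothetical PPI graph. Because each layer replaces every feature by a convex combination of neighboring features, the gap between the largest and smallest node feature can only shrink, and on a connected graph it tends to zero as layers are stacked. Real GNN layers add learned weights and nonlinearities, so this only illustrates the tendency; the graph, the features and all function names are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical 5-protein PPI graph (undirected edges); the graph is connected.
EDGES: List[Tuple[int, int]] = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]


def build_neighbors(n: int, edges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    # Each node lists itself (a self-loop), as in GCN-style aggregation.
    nbrs = {i: [i] for i in range(n)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return nbrs


def mean_aggregate(features: List[float], nbrs: Dict[int, List[int]]) -> List[float]:
    # One parameter-free message-passing layer: average over self and neighbors.
    return [sum(features[j] for j in nbrs[i]) / len(nbrs[i]) for i in range(len(features))]


def smooth(features: List[float], nbrs: Dict[int, List[int]], layers: int) -> List[float]:
    # Stack `layers` aggregation layers.
    for _ in range(layers):
        features = mean_aggregate(features, nbrs)
    return features


def spread(features: List[float]) -> float:
    # Gap between the largest and smallest node feature; 0 means nodes are indistinguishable.
    return max(features) - min(features)


if __name__ == "__main__":
    nbrs = build_neighbors(5, EDGES)
    x0 = [1.0, 0.0, 0.0, 0.0, -1.0]  # initially distinct 1-D node features
    for layers in (0, 1, 4, 16, 64):
        print(layers, spread(smooth(x0, nbrs, layers)))
```

Running it, the spread falls from 2.0 with no layers to 1.0 after one layer and becomes negligible after 64 layers, at which point the five proteins carry essentially the same representation.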

RESULTS

We develop a novel protein function prediction method, CFAGO, which integrates single-species PPI networks and protein biological attributes via a multi-head attention mechanism. CFAGO is first pre-trained with an encoder-decoder architecture to capture a universal protein representation of the two sources, and is then fine-tuned to learn more effective protein representations for protein function prediction. Benchmark experiments on human and mouse datasets show that CFAGO outperforms state-of-the-art single-species network-based methods by at least 7.59%, 6.90% and 11.68% in terms of m-AUPR, M-AUPR and Fmax, respectively, demonstrating that cross-fusion by the multi-head attention mechanism can greatly improve protein function prediction. We further evaluate the quality of the captured protein representations using the Davies-Bouldin score; the results show that the representations cross-fused by multi-head attention are at least 2.7% better than the original and concatenated representations. We believe CFAGO is an effective tool for protein function prediction.
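CFAGO's gains are reported in m-AUPR, M-AUPR and Fmax. As a reference point for the last of these, the sketch below computes protein-centric Fmax as it is defined in the CAFA challenges: at each score threshold t, precision is averaged over proteins with at least one prediction at or above t, recall is averaged over all benchmark proteins, and Fmax is the best harmonic mean over all thresholds. The function name, the 0.01 threshold grid and the toy data are assumptions for illustration; CFAGO's own evaluation code may differ in details such as the threshold grid.

```python
from typing import Dict, Set


def protein_centric_fmax(
    scores: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    steps: int = 100,
) -> float:
    """CAFA-style protein-centric Fmax over thresholds 1/steps, 2/steps, ..., 1.0.

    scores: protein ID -> {GO term: predicted score in [0, 1]}
    truth:  protein ID -> set of annotated (true) GO terms
    """
    # Only proteins with at least one true annotation are benchmark targets.
    proteins = [p for p in truth if truth[p]]
    best = 0.0
    for k in range(1, steps + 1):
        t = k / steps
        precisions = []   # one value per protein with >= 1 prediction at this threshold
        recall_sum = 0.0  # summed over all benchmark proteins
        for p in proteins:
            predicted = {go for go, s in scores.get(p, {}).items() if s >= t}
            tp = len(predicted & truth[p])
            if predicted:
                precisions.append(tp / len(predicted))
            recall_sum += tp / len(truth[p])
        if not precisions:
            continue  # precision is undefined when nothing is predicted
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(proteins)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


if __name__ == "__main__":
    truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}
    scores = {
        "P1": {"GO:A": 0.9, "GO:B": 0.4, "GO:D": 0.6},
        "P2": {"GO:C": 0.8, "GO:A": 0.3},
    }
    print(round(protein_centric_fmax(scores, truth), 4))  # 0.9091
```

In this toy case the maximum (10/11) is reached for thresholds between 0.31 and 0.40, where P1 keeps one false positive but P2's false positive is filtered out. Averaging precision only over proteins that received predictions is the CAFA convention; it keeps a method from being penalized at high thresholds for proteins it declines to annotate, which recall already accounts for.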

AVAILABILITY AND IMPLEMENTATION

The source code of CFAGO and experiments data are available at: http://bliulab.net/CFAGO/.


Li M, Shi W, Zhang F, Zeng M, Li Y. A Deep Learning Framework for Predicting Protein Functions With Co-Occurrence of GO Terms. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023;20:833-842. [PMID: 35476573 DOI: 10.1109/tcbb.2022.3170719] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]

Computational prediction of disordered binding regions. Comput Struct Biotechnol J 2023;21:1487-1497. [PMID: 36851914 PMCID: PMC9957716 DOI: 10.1016/j.csbj.2023.02.018] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/26/2022] [Revised: 02/08/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023]

Ranjan A, Fahad MS, Fernandez-Baca D, Tripathi S, Deepak A. MCWS-Transformers: Towards an Efficient Modeling of Protein Sequences via Multi Context-Window Based Scaled Self-Attention. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023;20:1188-1199. [PMID: 35536815 DOI: 10.1109/tcbb.2022.3173789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]

Pan T, Li C, Bi Y, Wang Z, Gasser RB, Purcell AW, Akutsu T, Webb GI, Imoto S, Song J. PFresGO: an attention mechanism-based deep-learning approach for protein annotation by integrating gene ontology inter-relationships. Bioinformatics 2023;39:7043095. [PMID: 36794913 PMCID: PMC9978587 DOI: 10.1093/bioinformatics/btad094] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 10/21/2022] [Revised: 02/10/2023] [Accepted: 02/15/2023] [Indexed: 02/17/2023]

Yan TC, Yue ZX, Xu HQ, Liu YH, Hong YF, Chen GX, Tao L, Xie T. A systematic review of state-of-the-art strategies for machine learning-based protein function prediction. Comput Biol Med 2023;154:106446. [PMID: 36680931 DOI: 10.1016/j.compbiomed.2022.106446] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2022] [Revised: 12/07/2022] [Accepted: 12/19/2022] [Indexed: 12/24/2022]

Sanderson T, Bileschi ML, Belanger D, Colwell LJ. ProteInfer, deep neural networks for protein functional inference. eLife 2023;12:e80942. [PMID: 36847334 PMCID: PMC10063232 DOI: 10.7554/elife.80942] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 06/10/2022] [Accepted: 02/24/2023] [Indexed: 03/01/2023]

Gao Z, Jiang C, Zhang J, Jiang X, Li L, Zhao P, Yang H, Huang Y, Li J. Hierarchical graph learning for protein-protein interaction. Nat Commun 2023;14:1093. [PMID: 36841846 PMCID: PMC9968329 DOI: 10.1038/s41467-023-36736-1] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 10/18/2022] [Accepted: 02/14/2023] [Indexed: 02/27/2023]

Jagodnik KM, Shvili Y, Bartal A. HetIG-PreDiG: A Heterogeneous Integrated Graph Model for Predicting Human Disease Genes based on gene expression. PLoS One 2023;18:e0280839. [PMID: 36791052 PMCID: PMC9931161 DOI: 10.1371/journal.pone.0280839] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/12/2022] [Accepted: 01/10/2023] [Indexed: 02/16/2023]

Liu J, Tang X, Guan X. Grain protein function prediction based on self-attention mechanism and bidirectional LSTM. Brief Bioinform 2023;24:6886418. [PMID: 36567619 DOI: 10.1093/bib/bbac493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Revised: 10/13/2022] [Accepted: 10/18/2022] [Indexed: 12/27/2022]

Investigation of the Molecular Evolution of Treg Suppression Mechanisms Indicates a Convergent Origin. Curr Issues Mol Biol 2023;45:628-648. [PMID: 36661528 PMCID: PMC9857879 DOI: 10.3390/cimb45010042] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/06/2022] [Revised: 01/05/2023] [Accepted: 01/06/2023] [Indexed: 01/12/2023]

Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, Gable AL, Fang T, Doncheva N, Pyysalo S, Bork P, Jensen L, von Mering C. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res 2023;51:D638-D646. [PMID: 36370105 PMCID: PMC9825434 DOI: 10.1093/nar/gkac1000] [Citation(s) in RCA: 3009] [Impact Index Per Article: 1504.5] [Received: 09/15/2022] [Revised: 10/10/2022] [Accepted: 10/19/2022] [Indexed: 11/13/2022]

Affiliation(s)

Damian Szklarczyk Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Rebecca Kirsch Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
Mikaela Koutrouli Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
Katerina Nastou Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
Farrokh Mehryary TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
Radja Hachilif Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Annika L Gable Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Tao Fang Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Nadezhda T Doncheva Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
Sampo Pyysalo TurkuNLP lab, Department of Computing, University of Turku, 20014 Turku, Finland
Peer Bork Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; Yonsei Frontier Lab (YFL), Yonsei University, Seoul 03722, South Korea; Max Delbrück Centre for Molecular Medicine, 13125 Berlin, Germany; Department of Bioinformatics, Biozentrum, University of Würzburg, 97074 Würzburg, Germany
Lars J Jensen Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen N, Denmark
Christian von Mering Department of Molecular Life Sciences, University of Zurich, 8057 Zurich, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland


Ranjan A, Tiwari A, Deepak A. A Sub-Sequence Based Approach to Protein Function Prediction via Multi-Attention Based Multi-Aspect Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023;20:94-105. [PMID: 34826296 DOI: 10.1109/tcbb.2021.3130923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]

Tharmakulasingam M, Gardner B, La Ragione R, Fernando A. Rectified Classifier Chains for Prediction of Antibiotic Resistance From Multi-Labelled Data With Missing Labels. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023;20:625-636. [PMID: 35130168 DOI: 10.1109/tcbb.2022.3148577] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]

Sharma L, Deepak A, Ranjan A, Krishnasamy G. A novel hybrid CNN and BiGRU-Attention based deep learning model for protein function prediction. Stat Appl Genet Mol Biol 2023;22:sagmb-2022-0057. [PMID: 37658681 DOI: 10.1515/sagmb-2022-0057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 04/20/2023] [Indexed: 09/03/2023]


Sarker B, Khare N, Devignes MD, Aridhi S. Improving automatic GO annotation with semantic similarity. BMC Bioinformatics 2022;23:433. [PMID: 36510133 PMCID: PMC9743508 DOI: 10.1186/s12859-022-04958-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/18/2022] [Accepted: 09/19/2022] [Indexed: 12/14/2022]

Abstract

BACKGROUND

Automatic functional annotation of proteins is an open research problem in bioinformatics. The growing number of protein entries in public databases, for example in UniProtKB, poses challenges for manual functional annotation. Manual annotation requires expert human curators to search and read related research articles, interpret the results, and assign the annotations to the proteins, which makes it a time-consuming and expensive process. Designing computational tools that perform automatic annotation by leveraging the high-quality manual annotations that already exist in UniProtKB/SwissProt is therefore an important research problem.

RESULTS

In this paper, we extend and adapt the GrAPFI (graph-based automatic protein function inference) method (Sarker et al. in BMC Bioinform 21, 2020; Sarker et al., in: Proceedings of 7th international conference on complex networks and their applications, Cambridge, 2018) for automatic annotation of proteins with gene ontology (GO) terms, renaming it GrAPFI-GO. The original GrAPFI method uses label propagation in a similarity graph where proteins are linked through the domains, families, and superfamilies that they share. Here, we also explore various types of similarity measures based on common neighbors in the graph. Moreover, GO terms are arranged hierarchically according to semantic parent-child relations. We therefore propose an efficient pruning and post-processing technique that integrates both semantic similarity and hierarchical relations between the GO terms. We report experimental results comparing the GrAPFI-GO method with and without the common-neighbors similarity. We also test the performance of GrAPFI-GO and other annotation tools for GO annotation on a benchmark of proteins with and without the proposed pruning and post-processing procedure.
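The two ideas described above, scoring GO terms for a query protein from annotated proteins that share domains, families or superfamilies with it, and then making the scores consistent with the GO hierarchy, can be sketched as follows. This is a simplified neighbor-voting illustration, not the published GrAPFI-GO algorithm: the Jaccard edge weight, the normalization by total edge weight, and the rule that a parent term scores at least as high as any child are assumptions, and all protein IDs, domain IDs and GO terms are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Assumed edge weight between two proteins: overlap of their domain/family IDs.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propagate_labels(
    query_domains: Set[str],
    annotated: Dict[str, Tuple[Set[str], Set[str]]],
) -> Dict[str, float]:
    """Score GO terms for a query from its weighted neighbors in the similarity graph.

    annotated: protein ID -> (domain/family IDs, known GO terms)
    Returns GO term -> weighted fraction of neighbor evidence, in [0, 1].
    """
    votes: Dict[str, float] = {}
    total = 0.0
    for domains, terms in annotated.values():
        w = jaccard(query_domains, domains)
        if w == 0.0:
            continue  # no shared domain/family: not a neighbor in the graph
        total += w
        for go in terms:
            votes[go] = votes.get(go, 0.0) + w
    return {go: v / total for go, v in votes.items()} if total else {}


def enforce_hierarchy(scores: Dict[str, float], parents: Dict[str, List[str]]) -> Dict[str, float]:
    # Assumed post-processing rule: a parent term scores at least as high as any child,
    # mirroring the GO "true path rule". Repeat until no score changes.
    out = dict(scores)
    changed = True
    while changed:
        changed = False
        for term, s in list(out.items()):
            for parent in parents.get(term, []):
                if out.get(parent, 0.0) < s:
                    out[parent] = s
                    changed = True
    return out


if __name__ == "__main__":
    annotated = {
        "P1": ({"PF1", "PF2"}, {"GO:B"}),
        "P2": ({"PF2", "PF3"}, {"GO:B", "GO:C"}),
        "P3": ({"PF9"}, {"GO:D"}),
    }
    parents = {"GO:B": ["GO:A"], "GO:C": ["GO:A"]}  # hypothetical GO fragment
    raw = propagate_labels({"PF1", "PF2"}, annotated)
    print(enforce_hierarchy(raw, parents))
```

Here the query shares all its domains with P1 and one of three with P2, so GO:B (supported by both) scores 1.0, GO:C scores about 0.25, GO:D from the unconnected P3 is never proposed, and the hierarchy step lifts the common parent GO:A to 1.0. The published method additionally uses semantic similarity between GO terms for pruning, which this sketch omits.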

CONCLUSION

Our results show that the proposed semantic hierarchical post-processing potentially improves the performance of GrAPFI-GO and of other annotation tools as well. Thus, GrAPFI-GO provides an original, efficient and reusable procedure for exploiting the semantic relations among GO terms to improve the automatic annotation of protein functions.
