Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhou N, Jiang Y, Bergquist TR, Lee AJ, Kacsoh BZ, Crocker AW, Lewis KA, Georghiou G, Nguyen HN, Hamid MN, Davis L, Dogan T, Atalay V, Rifaioglu AS, Dalkıran A, Cetin Atalay R, Zhang C, Hurto RL, Freddolino PL, Zhang Y, Bhat P, Supek F, Fernández JM, Gemovic B, Perovic VR, Davidović RS, Sumonja N, Veljkovic N, Asgari E, Mofrad MRK, Profiti G, Savojardo C, Martelli PL, Casadio R, Boecker F, Schoof H, Kahanda I, Thurlby N, McHardy AC, Renaux A, Saidi R, Gough J, Freitas AA, Antczak M, Fabris F, Wass MN, Hou J, Cheng J, Wang Z, Romero AE, Paccanaro A, Yang H, Goldberg T, Zhao C, Holm L, Törönen P, Medlar AJ, Zosa E, Borukhov I, Novikov I, Wilkins A, Lichtarge O, Chi PH, Tseng WC, Linial M, Rose PW, Dessimoz C, Vidulin V, Dzeroski S, Sillitoe I, Das S, Lees JG, Jones DT, Wan C, Cozzetto D, Fa R, Torres M, Warwick Vesztrocy A, Rodriguez JM, Tress ML, Frasca M, Notaro M, Grossi G, Petrini A, Re M, Valentini G, Mesiti M, Roche DB, Reeb J, Ritchie DW, Aridhi S, Alborzi SZ, Devignes MD, Koo DCE, Bonneau R, Gligorijević V, Barot M, Fang H, Toppo S, Lavezzo E, Falda M, Berselli M, Tosatto SCE, Carraro M, Piovesan D, Ur Rehman H, Mao Q, Zhang S, Vucetic S, Black GS, Jo D, Suh E, Dayton JB, Larsen DJ, Omdahl AR, McGuffin LJ, Brackenridge DA, Babbitt PC, Yunes JM, Fontana P, Zhang F, Zhu S, You R, Zhang Z, Dai S, Yao S, Tian W, Cao R, Chandler C, Amezola M, Johnson D, Chang JM, Liao WH, Liu YW, Pascarelli S, Frank Y, Hoehndorf R, Kulmanov M, Boudellioua I, Politano G, Di Carlo S, Benso A, Hakala K, Ginter F, Mehryary F, Kaewphan S, Björne J, Moen H, Tolvanen MEE, Salakoski T, Kihara D, Jain A, Šmuc T, Altenhoff A, Ben-Hur A, Rost B, Brenner SE, Orengo CA, Jeffery CJ, Bosco G, Hogan DA, Martin MJ, O'Donovan C, Mooney SD, Greene CS, Radivojac P, Friedberg I. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biol 2019;20:244. 
[PMID: 31744546 PMCID: PMC6864930 DOI: 10.1186/s13059-019-1835-8] [Citation(s) in RCA: 184] [Impact Index Per Article: 36.8] [Received: 05/16/2019] [Accepted: 09/24/2019] [Indexed: 12/23/2022] Open

Cited by Other Article(s)

Zhang F, Naeem M, Yu B, Liu F, Ju J. Improving the enzymatic activity and stability of N-carbamoyl hydrolase using deep learning approach. Microb Cell Fact 2024;23:164. [PMID: 38834993 DOI: 10.1186/s12934-024-02439-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Accepted: 05/24/2024] [Indexed: 06/06/2024] Open

Bennett JJR, Stern AD, Zhang X, Birtwistle MR, Pandey G. Low-frequency ERK and Akt activity dynamics are predictive of stochastic cell division events. NPJ Syst Biol Appl 2024;10:65. [PMID: 38834572 PMCID: PMC11150372 DOI: 10.1038/s41540-024-00389-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 05/20/2024] [Indexed: 06/06/2024] Open

Abstract

Understanding the dynamics of intracellular signaling pathways, such as ERK1/2 (ERK) and Akt1/2 (Akt), in the context of cell fate decisions is important for advancing our knowledge of cellular processes and diseases, particularly cancer. While previous studies have established associations between ERK and Akt activities and proliferative cell fate, the heterogeneity of single-cell responses adds complexity to this understanding. This study employed a data-driven approach to address this challenge, developing machine learning models trained on a dataset of growth factor-induced ERK and Akt activity time courses in single cells, to predict cell division events. The most predictive models were developed by applying discrete wavelet transforms (DWTs) to extract low-frequency features from the time courses, followed by using Ensemble Integration, a data integration and predictive modeling framework. The results demonstrated that these models effectively predicted cell division events in MCF10A cells (F-measure=0.524, AUC=0.726). ERK dynamics were found to be more predictive than Akt, but the combination of both measurements further enhanced predictive performance. The ERK model's performance also generalized to predicting division events in RPE cells, indicating the potential applicability of these models and our data-driven methodology for predicting cell division across different biological contexts. Interpretation of these models suggested that ERK dynamics throughout the cell cycle, rather than immediately after growth factor stimulation, were associated with the likelihood of cell division. Overall, this work contributes insights into the predictive power of intracellular signaling dynamics for cell fate decisions, and highlights the potential of machine learning approaches in unraveling complex cellular behaviors.
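
As an illustration of what "low-frequency features" from a discrete wavelet transform can look like, the following is a minimal sketch only. The abstract does not state which wavelet family or decomposition depth the authors used; the Haar wavelet, the `levels` parameter, and the function name `haar_lowpass` are assumptions made here for illustration.

```python
import math
from typing import List, Sequence


def haar_lowpass(signal: Sequence[float], levels: int = 3) -> List[float]:
    """Return the Haar DWT approximation (low-frequency) coefficients.

    Hypothetical helper: the Haar wavelet and the default depth are
    assumptions, not details taken from the cited study.
    """
    x = [float(v) for v in signal]
    for _ in range(levels):
        if len(x) < 2:
            break  # nothing left to decompose
        if len(x) % 2:
            x.append(x[-1])  # pad odd lengths by repeating the last sample
        # Each approximation coefficient is a scaled sum of two neighbours.
        x = [(x[i] + x[i + 1]) / math.sqrt(2.0) for i in range(0, len(x), 2)]
    return x


# Example: a 16-point time course reduced to 4 low-frequency coefficients.
time_course = [0.1 * t for t in range(16)]
features = haar_lowpass(time_course, levels=2)
```

Each level halves the number of coefficients, so deeper decompositions keep only slower trends in the time course; the resulting coefficients could then be passed to any classifier as features.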

Urhan A, Cosma BM, Earl AM, Manson AL, Abeel T. SAFPred: synteny-aware gene function prediction for bacteria using protein embeddings. Bioinformatics 2024;40:btae328. [PMID: 38775729 PMCID: PMC11147799 DOI: 10.1093/bioinformatics/btae328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Revised: 04/08/2024] [Accepted: 05/21/2024] [Indexed: 06/04/2024] Open

Abstract

MOTIVATION

Today, we know the function of only a small fraction of the protein sequences predicted from genomic data. This problem is even more salient for bacteria, which represent some of the most phylogenetically and metabolically diverse taxa on Earth. This low rate of bacterial gene annotation is compounded by the fact that most function prediction algorithms have focused on eukaryotes, and conventional annotation approaches rely on the presence of similar sequences in existing databases. However, often there are no such sequences for novel bacterial proteins. Thus, we need improved gene function prediction methods tailored for bacteria. Recently, transformer-based language models, adopted from the natural language processing field, have been used to obtain new representations of proteins, to replace amino acid sequences. These representations, referred to as protein embeddings, have shown promise for improving annotation of eukaryotes, but there have been only limited applications on bacterial genomes.

RESULTS

To predict gene functions in bacteria, we developed SAFPred, a novel synteny-aware gene function prediction tool based on protein embeddings from state-of-the-art protein language models. SAFPred also leverages the unique operon structure of bacteria through conserved synteny. SAFPred outperformed both conventional sequence-based annotation methods and state-of-the-art methods on multiple bacterial species, including for distant homolog detection, where the sequence similarity to the proteins in the training set was as low as 40%. Using SAFPred to identify gene functions across diverse enterococci, of which some species are major clinical threats, we identified 11 previously unrecognized putative novel toxins, with potential significance to human and animal health.

AVAILABILITY AND IMPLEMENTATION

https://github.com/AbeelLab/safpred.

Liu Y, Zhang Y, Chen Z, Peng J. POLAT: Protein function prediction based on soft mask graph network and residue-Label ATtention. Comput Biol Chem 2024;110:108064. [PMID: 38677014 DOI: 10.1016/j.compbiolchem.2024.108064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Revised: 01/19/2024] [Accepted: 03/26/2024] [Indexed: 04/29/2024]

Ulusoy E, Doğan T. Mutual annotation-based prediction of protein domain functions with Domain2GO. Protein Sci 2024;33:e4988. [PMID: 38757367 PMCID: PMC11099699 DOI: 10.1002/pro.4988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/25/2024] [Accepted: 03/30/2024] [Indexed: 05/18/2024]

Abstract

Identifying unknown functional properties of proteins is essential for understanding their roles in both health and disease states. The domain composition of a protein can reveal critical information in this context, as domains are structural and functional units that dictate how the protein should act at the molecular level. The expensive and time-consuming nature of wet-lab experimental approaches prompted researchers to develop computational strategies for predicting the functions of proteins. In this study, we proposed a new method called Domain2GO that infers associations between protein domains and function-defining gene ontology (GO) terms, thus redefining the problem as domain function prediction. Domain2GO uses documented protein-level GO annotations together with proteins' domain annotations. Co-annotation patterns of domains and GO terms in the same proteins are examined using statistical resampling to obtain reliable associations. As a use-case study, we evaluated the biological relevance of examples selected from the Domain2GO-generated domain-GO term mappings via literature review. Then, we applied Domain2GO to predict unknown protein functions by propagating domain-associated GO terms to proteins annotated with these domains. For function prediction performance evaluation and comparison against other methods, we employed Critical Assessment of Function Annotation 3 (CAFA3) challenge datasets. The results demonstrated the high potential of Domain2GO, particularly for predicting molecular function and biological process terms, along with advantages such as producing interpretable results and having an exceptionally low computational cost. The approach presented here can be extended to other ontologies and biological entities to investigate unknown relationships in complex and large-scale biological data. The source code, datasets, results, and user instructions for Domain2GO are available at https://github.com/HUBioDataLab/Domain2GO. 
Additionally, we offer a user-friendly online tool at https://huggingface.co/spaces/HUBioDataLab/Domain2GO, which simplifies the prediction of functions of previously unannotated proteins solely using amino acid sequences.

Ofer D, Linial M. Automated annotation of disease subtypes. J Biomed Inform 2024;154:104650. [PMID: 38701887 DOI: 10.1016/j.jbi.2024.104650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 03/28/2024] [Accepted: 04/29/2024] [Indexed: 05/05/2024]

Ding K, Luo J, Luo Y. Leveraging conformal prediction to annotate enzyme function space with limited false positives. PLoS Comput Biol 2024;20:e1012135. [PMID: 38809942 DOI: 10.1371/journal.pcbi.1012135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 05/03/2024] [Indexed: 05/31/2024] Open

Althagafi A, Zhapa-Camacho F, Hoehndorf R. Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning. Bioinformatics 2024;40:btae301. [PMID: 38696757 PMCID: PMC11132820 DOI: 10.1093/bioinformatics/btae301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 04/05/2024] [Accepted: 04/30/2024] [Indexed: 05/04/2024] Open

Abstract

MOTIVATION

Whole-exome and genome sequencing have become common tools in diagnosing patients with rare diseases. Despite their success, this approach leaves many patients undiagnosed. A common argument is that more disease variants still await discovery, or the novelty of disease phenotypes results from a combination of variants in multiple disease-related genes. Interpreting the phenotypic consequences of genomic variants relies on information about gene functions, gene expression, physiology, and other genomic features. Phenotype-based methods to identify variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been successfully applied to prioritizing variants, such methods are based on known gene-disease or gene-phenotype associations as training data and are applicable to genes that have phenotypes associated, thereby limiting their scope. In addition, phenotypes are not assigned uniformly by different clinicians, and phenotype-based methods need to account for this variability.

RESULTS

We developed an Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method to prioritize variants involved in genetic diseases by combining genomic information and clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from human and model organisms about molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical site of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and genomes matched with clinical information.

AVAILABILITY AND IMPLEMENTATION

EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.

Armah-Sekum RE, Szedmak S, Rousu J. Protein function prediction through multi-view multi-label latent tensor reconstruction. BMC Bioinformatics 2024;25:174. [PMID: 38698340 PMCID: PMC11067221 DOI: 10.1186/s12859-024-05789-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Accepted: 04/17/2024] [Indexed: 05/05/2024] Open

Huang Z, Chen S, He K, Yu T, Fu J, Gao S, Li H. Exploring salt tolerance mechanisms using machine learning for transcriptomic insights: case study in Spartina alterniflora. Hortic Res 2024;11:uhae082. [PMID: 38766535 PMCID: PMC11101319 DOI: 10.1093/hr/uhae082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 03/12/2024] [Indexed: 05/22/2024]

Abstract

Salt stress poses a significant threat to global cereal crop production, emphasizing the need for a comprehensive understanding of salt tolerance mechanisms. Accurate functional annotations of differentially expressed genes are crucial for gaining insights into the salt tolerance mechanism. The challenge of predicting gene functions in under-studied species, especially when excluding infrequent GO terms, persists. Therefore, we proposed the use of NetGO 3.0, a machine learning-based annotation method that does not rely on homology information between species, to predict the functions of differentially expressed genes under salt stress. Spartina alterniflora, a halophyte with salt glands, exhibits remarkable salt tolerance, making it an excellent candidate for in-depth transcriptomic analysis. However, current research on the S. alterniflora transcriptome under salt stress is limited. In this study we used S. alterniflora as an example to investigate its transcriptional responses to various salt concentrations, with a focus on understanding its salt tolerance mechanisms. Transcriptomic analysis revealed substantial changes impacting key pathways, such as gene transcription, ion transport, and ROS metabolism. Notably, we identified a member of the SWEET gene family in S. alterniflora, SA_12G129900.m1, showing convergent selection with the rice ortholog SWEET15. Additionally, our genome-wide analyses explored alternative splicing responses to salt stress, providing insights into the parallel functions of alternative splicing and transcriptional regulation in enhancing salt tolerance in S. alterniflora. Surprisingly, there was minimal overlap between differentially expressed and differentially spliced genes following salt exposure. 
This innovative approach, combining transcriptomic analysis with machine learning-based annotation, avoids the reliance on homology information and facilitates the discovery of unknown gene functions, and is applicable across all sequenced species.

Giri SJ, Ibtehaz N, Kihara D. GO2Sum: generating human-readable functional summary of proteins from GO terms. NPJ Syst Biol Appl 2024;10:29. [PMID: 38491038 PMCID: PMC10943200 DOI: 10.1038/s41540-024-00358-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 03/05/2024] [Indexed: 03/18/2024] Open

Tavis S, Hettich RL. Multi-Omics integration can be used to rescue metabolic information for some of the dark region of the Pseudomonas putida proteome. BMC Genomics 2024;25:267. [PMID: 38468234 PMCID: PMC10926591 DOI: 10.1186/s12864-024-10082-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 02/02/2024] [Indexed: 03/13/2024] Open

Wenzel M, Grüner E, Strodthoff N. Insights into the inner workings of transformer models for protein function prediction. Bioinformatics 2024;40:btae031. [PMID: 38244570 PMCID: PMC10950482 DOI: 10.1093/bioinformatics/btae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 12/14/2023] [Accepted: 01/16/2024] [Indexed: 01/22/2024] Open

Koutsandreas T, Felden B, Chevet E, Chatziioannou A. Protein homeostasis imprinting across evolution. NAR Genom Bioinform 2024;6:lqae014. [PMID: 38486886 PMCID: PMC10939379 DOI: 10.1093/nargab/lqae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 10/07/2023] [Accepted: 01/24/2024] [Indexed: 03/17/2024] Open

Zheng L, Shi S, Lu M, Fang P, Pan Z, Zhang H, Zhou Z, Zhang H, Mou M, Huang S, Tao L, Xia W, Li H, Zeng Z, Zhang S, Chen Y, Li Z, Zhu F. AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding. Genome Biol 2024;25:41. [PMID: 38303023 PMCID: PMC10832132 DOI: 10.1186/s13059-024-03166-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Accepted: 01/05/2024] [Indexed: 02/03/2024] Open

Affiliation(s)

Lingyan Zheng: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China; Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China
Shuiyang Shi: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Mingkun Lu: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Pan Fang: Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
Ziqi Pan: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Hongning Zhang: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Zhimeng Zhou: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Hanyu Zhang: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Minjie Mou: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Shijie Huang: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
Lin Tao: Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, China
Weiqi Xia: Pharmaceutical Department, Zhejiang Provincial People's Hospital, Hangzhou, 310014, China
Honglin Li: School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
Zhenyu Zeng: Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
Shun Zhang: Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
Yuzong Chen: State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen, 518055, China
Zhaorong Li: Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
Feng Zhu: College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China; Industry Solutions Research and Development, Alibaba Cloud Computing, Hangzhou, 330110, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China

Bonello J, Orengo C. FunPredCATH: An ensemble method for predicting protein function using CATH. Biochim Biophys Acta Proteins Proteom 2024;1872:140985. [PMID: 38122964 DOI: 10.1016/j.bbapap.2023.140985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 12/05/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023]

Abstract

MOTIVATION

The number of unannotated proteins in UniProt grows at a very high rate every year due to more efficient sequencing methods. However, the experimental annotation of proteins is a lengthy and expensive process. Using computational techniques to narrow the search can speed up the process by providing highly specific Gene Ontology (GO) terms.

METHODOLOGY

We propose an ensemble approach that combines three generic base predictors that predict Gene Ontology (BP, CC and MF) terms from sequences across different species. We train our models on UniProtGOA annotation data and use the CATH domain resources to identify the protein families. We then calculate a score based on the prevalence of individual GO terms in the functional families that is then used as an indicator of confidence when assigning the GO term to an uncharacterised protein.

METHODS

In the ensemble, we use a statistics-based method that scores the occurrence of GO terms in a CATH FunFam against a background set of proteins annotated by the same GO term. We also developed a set-based method that uses Set Intersection and Set Union to score the occurrence of GO terms within the same CATH FunFam. Finally, we also use FunFams-Plus, a predictor method developed by the Orengo Group at UCL to predict GO terms for uncharacterised proteins in the CAFA3 challenge.

EVALUATION

We evaluated the methods against the CAFA3 benchmark and DomFun. We used the Precision, Recall and Fmax metrics and the benchmark datasets that are used in CAFA3 to evaluate our models and compare them to the CAFA3 results. Our results show that FunPredCATH compares well with top CAFA methods in the different ontologies and benchmarks.

CONTRIBUTIONS

FunPredCATH compares well with other prediction methods on CAFA3, and the ensemble approach outperforms the base methods. We show that non-IEA models obtain higher Fmax scores than the IEA counterparts, while the models including IEA annotations have higher coverage at the expense of a lower Fmax score.
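
For readers unfamiliar with the Fmax metric mentioned in this abstract, the following is a minimal sketch of a protein-centric Fmax in the style used by CAFA: precision is averaged over proteins with at least one prediction at or above the threshold, recall is averaged over all benchmark proteins, and the best F-measure over thresholds is reported. The function name `fmax`, the data layout, and the threshold grid are assumptions made here; the official evaluation also propagates terms through the ontology, which this sketch omits.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Best F-measure over score thresholds (hypothetical helper).

    predictions: protein -> {GO term -> confidence score in [0, 1]}
    truth:       protein -> non-empty set of experimentally annotated GO terms
    """
    best = 0.0
    for i in range(1, steps + 1):
        tau = i / steps
        precisions = []
        recall_sum = 0.0
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {t for t, s in scores.items() if s >= tau}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision only counts proteins with a prediction at tau.
                precisions.append(hits / len(predicted))
            # Recall is averaged over every benchmark protein.
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = recall_sum / len(truth)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Toy example with made-up term identifiers.
toy_predictions = {"P1": {"GO:1": 0.9, "GO:2": 0.4}, "P2": {"GO:3": 0.8}}
toy_truth = {"P1": {"GO:1"}, "P2": {"GO:3"}}
toy_score = fmax(toy_predictions, toy_truth)
```

In the toy example, any threshold between 0.41 and 0.80 removes the one incorrect term while keeping both correct ones, so a perfect score is reachable.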

O'Meara MJ, Rapala JR, Nichols CB, Alexandre AC, Billmyre RB, Steenwyk JL, Alspaugh JA, O'Meara TR. CryptoCEN: A Co-Expression Network for Cryptococcus neoformans reveals novel proteins involved in DNA damage repair. PLoS Genet 2024;20:e1011158. [PMID: 38359090 PMCID: PMC10901339 DOI: 10.1371/journal.pgen.1011158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 02/28/2024] [Accepted: 01/30/2024] [Indexed: 02/17/2024] Open

Chica RA, Ferruz N. What does it take for an 'AlphaFold Moment' in functional protein engineering and design? Nat Biotechnol 2024;42:173-174. [PMID: 38361055 DOI: 10.1038/s41587-023-02120-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/17/2024]

Gonzalez Pepe I, Chatelain Y, Kiar G, Glatard T. Numerical stability of DeepGOPlus inference. PLoS One 2024;19:e0296725. [PMID: 38285635 PMCID: PMC10824456 DOI: 10.1371/journal.pone.0296725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 12/16/2023] [Indexed: 01/31/2024] Open

Abstract

Convolutional neural networks (CNNs) are currently among the most widely used deep neural network (DNN) architectures available and achieve state-of-the-art performance for many problems. Originally applied to computer vision tasks, CNNs work well with any data with a spatial relationship, besides images, and have been applied to different fields. However, recent works have highlighted numerical stability challenges in DNNs, which also relate to their known sensitivity to noise injection. These challenges can jeopardise their performance and reliability. This paper investigates DeepGOPlus, a CNN that predicts protein function. DeepGOPlus has achieved state-of-the-art performance and can successfully take advantage of and annotate the abundant protein sequences emerging in proteomics. We determine the numerical stability of the model's inference stage by quantifying the numerical uncertainty resulting from perturbations of the underlying floating-point data. In addition, we explore the opportunity to use reduced-precision floating-point formats for DeepGOPlus inference, to reduce memory consumption and latency. This is achieved by instrumenting DeepGOPlus' execution using Monte Carlo Arithmetic, a technique that experimentally quantifies floating-point operation errors, and VPREC, a tool that emulates results with customizable floating-point precision formats. Focus is placed on the inference stage as it is the primary deliverable of the DeepGOPlus model, widely applicable across different environments. All in all, our results show that although the DeepGOPlus CNN is very stable numerically, it can only be selectively implemented with lower-precision floating-point formats. We conclude that predictions obtained from the pre-trained DeepGOPlus model are very reliable numerically, and use existing floating-point formats efficiently.

Wang W, Shuai Y, Yang Q, Zhang F, Zeng M, Li M. A comprehensive computational benchmark for evaluating deep learning-based protein function prediction approaches. Brief Bioinform 2024;25:bbae050. [PMID: 38388682 PMCID: PMC10883809 DOI: 10.1093/bib/bbae050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/17/2024] [Accepted: 01/26/2024] [Indexed: 02/24/2024]

Li W, Wang B, Dai J, Kou Y, Chen X, Pan Y, Hu S, Xu ZZ. Partial order relation-based gene ontology embedding improves protein function prediction. Brief Bioinform 2024;25:bbae077. [PMID: 38446740 PMCID: PMC10917077 DOI: 10.1093/bib/bbae077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 01/22/2024] [Indexed: 03/08/2024]

Andrade B, Chen A, Gilson MK. Host-guest systems for the SAMPL9 blinded prediction challenge: phenothiazine as a privileged scaffold for binding to cyclodextrins. Phys Chem Chem Phys 2024;26:2035-2043. [PMID: 38126539 PMCID: PMC10832227 DOI: 10.1039/d3cp05347d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]

Abstract

Model systems are widely used in biology and chemistry to gain insight into more complex systems. In the field of computational chemistry, researchers use host-guest systems, relatively simple exemplars of noncovalent binding, to train and test the computational methods used in drug discovery. Indeed, host-guest systems have been developed to support the community-wide blinded SAMPL prediction challenges for over a decade. While seeking new host-guest systems for the recent SAMPL9 binding prediction challenge, which is the focus of the present PCCP Themed Collection, we identified phenothiazine as a privileged scaffold for guests of β-cyclodextrin (βCD) and its derivatives. Building on this observation, we used calorimetry and NMR spectroscopy to characterize the noncovalent association of native βCD and three methylated derivatives of βCD with five phenothiazine drugs. The strongest association observed, that of thioridazine and one of the methyl derivatives, exceeds the well-known high affinity of rimantadine with βCD. Intriguingly, however, methylation of βCD at the 3 position abolished detectable binding for all of the drugs studied. The dataset has a clear pattern of entropy-enthalpy compensation. The NMR data show that all of the drugs position at least one aromatic proton at the secondary face of the CD, and most also show evidence of deep penetration of the binding site. The results of this study were used in the SAMPL9 blinded binding affinity-prediction challenge, which is detailed in accompanying papers of the present Themed Collection. These data also open the phenothiazines and, potentially, chemically similar drugs, such as the tricyclic antidepressants, as relatively potent binders of βCD, setting the stage for future SAMPL challenge datasets and for possible applications as drug reversal agents.

Aspromonte MC, Nugnes MV, Quaglia F, Bouharoua A, Tosatto SCE, Piovesan D. DisProt in 2024: improving function annotation of intrinsically disordered proteins. Nucleic Acids Res 2024;52:D434-D441. [PMID: 37904585 PMCID: PMC10767923 DOI: 10.1093/nar/gkad928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/05/2023] [Accepted: 10/10/2023] [Indexed: 11/01/2023]

Bergquist T, Schaffter T, Yan Y, Yu T, Prosser J, Gao J, Chen G, Charzewski Ł, Nawalany Z, Brugere I, Retkute R, Prusokas A, Prusokas A, Choi Y, Lee S, Choe J, Lee I, Kim S, Kang J, Mooney SD, Guinney J. Evaluation of crowdsourced mortality prediction models as a framework for assessing artificial intelligence in medicine. J Am Med Inform Assoc 2023;31:35-44. [PMID: 37604111 PMCID: PMC10746301 DOI: 10.1093/jamia/ocad159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 07/05/2023] [Accepted: 08/08/2023] [Indexed: 08/23/2023]

Affiliation(s)

Timothy Bergquist Sage Bionetworks, Seattle, WA, United States Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
Thomas Schaffter Sage Bionetworks, Seattle, WA, United States
Yao Yan Sage Bionetworks, Seattle, WA, United States Molecular Engineering and Sciences Institute, University of Washington, Seattle, WA, United States
Thomas Yu Sage Bionetworks, Seattle, WA, United States
Justin Prosser Institute of Translational Health Sciences, University of Washington, Seattle, WA, United States
Jifan Gao Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
Guanhua Chen Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, United States
Łukasz Charzewski Proacta, Warsaw, Poland Division of Biophysics, University of Warsaw, Warsaw, Poland
Zofia Nawalany Proacta, Warsaw, Poland
Ivan Brugere Department of Computer Science, University of Illinois at Chicago, Chicago, IL, United States
Renata Retkute Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
Alidivinas Prusokas Plant and Molecular Sciences, School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom
Augustinas Prusokas Department of Life Sciences, Imperial College London, London, United Kingdom
Yonghwa Choi Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
Sanghoon Lee Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
Junseok Choe Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
Inggeol Lee Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
Sunkyu Kim Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea
Jaewoo Kang Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul, Republic of Korea Department of Interdisciplinary Program in Bioinformatics, College of Informatics, Korea University, Seoul, Republic of Korea
Sean D Mooney Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
Justin Guinney Sage Bionetworks, Seattle, WA, United States Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States

Notin P, Kollasch AW, Ritter D, van Niekerk L, Paul S, Spinner H, Rollins N, Shaw A, Weitzman R, Frazer J, Dias M, Franceschi D, Orenbuch R, Gal Y, Marks DS. ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. bioRxiv 2023:2023.12.07.570727. [PMID: 38106144 PMCID: PMC10723403 DOI: 10.1101/2023.12.07.570727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]

Chen J, Gu Z, Lai L, Pei J. In silico protein function prediction: the rise of machine learning-based approaches. Med Rev (2021) 2023;3:487-510. [PMID: 38282798 PMCID: PMC10808870 DOI: 10.1515/mr-2023-0038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 10/11/2023] [Indexed: 01/30/2024]

Ancarola ME, Maldonado LL, García LCA, Franchini GR, Mourglia-Ettlin G, Kamenetzky L, Cucher MA. A Comparative Analysis of the Protein Cargo of Extracellular Vesicles from Helminth Parasites. Life (Basel) 2023;13:2286. [PMID: 38137887 PMCID: PMC10744797 DOI: 10.3390/life13122286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 11/15/2023] [Accepted: 11/23/2023] [Indexed: 12/24/2023]

Affiliation(s)

María Eugenia Ancarola Department of Microbiology, School of Medicine, University of Buenos Aires, Buenos Aires C1121, Argentina Institute of Research on Microbiology and Medical Parasitology (IMPaM, UBA-CONICET), University of Buenos Aires, Buenos Aires C1121, Argentina
Lucas L. Maldonado Department of Microbiology, School of Medicine, University of Buenos Aires, Buenos Aires C1121, Argentina Institute of Research on Microbiology and Medical Parasitology (IMPaM, UBA-CONICET), University of Buenos Aires, Buenos Aires C1121, Argentina Instituto de Tecnología (INTEC), Universidad Argentina de la Empresa (UADE), Buenos Aires C1073, Argentina
Lucía C. A. García Department of Microbiology, School of Medicine, University of Buenos Aires, Buenos Aires C1121, Argentina Institute of Research on Microbiology and Medical Parasitology (IMPaM, UBA-CONICET), University of Buenos Aires, Buenos Aires C1121, Argentina
Gisela R. Franchini Instituto de Investigaciones Bioquímicas de La Plata (INIBIOLP), Facultad de Ciencias Médicas, Universidad Nacional de La Plata (UNLP)-Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), La Plata B1900, Argentina Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas, Universidad Nacional de La Plata (UNLP), La Plata B1900, Argentina
Gustavo Mourglia-Ettlin Área Inmunología, Departamento de Biociencias, Facultad de Química, Universidad de la República, Montevideo 11800, Uruguay
Laura Kamenetzky Instituto de Biociencias, Biotecnología y Biología Traslacional, Departamento de Fisiología y Biología Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires C1428, Argentina
Marcela A. Cucher Department of Microbiology, School of Medicine, University of Buenos Aires, Buenos Aires C1121, Argentina Institute of Research on Microbiology and Medical Parasitology (IMPaM, UBA-CONICET), University of Buenos Aires, Buenos Aires C1121, Argentina

Ribeiro AJM, Riziotis IG, Borkakoti N, Thornton JM. Enzyme function and evolution through the lens of bioinformatics. Biochem J 2023;480:1845-1863. [PMID: 37991346 PMCID: PMC10754289 DOI: 10.1042/bcj20220405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/23/2023]

Hamamsy T, Barot M, Morton JT, Steinegger M, Bonneau R, Cho K. Learning sequence, structure, and function representations of proteins with language models. bioRxiv 2023:2023.11.26.568742. [PMID: 38045331 PMCID: PMC10690258 DOI: 10.1101/2023.11.26.568742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]

Urhan A, Cosma BM, Earl AM, Manson AL, Abeel T. SAP: Synteny-aware gene function prediction for bacteria using protein embeddings. bioRxiv 2023:2023.05.02.539034. [PMID: 37205418 PMCID: PMC10187222 DOI: 10.1101/2023.05.02.539034] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/21/2023]

Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP allows protein function prediction using function-aware domain embedding representations. Commun Biol 2023;6:1103. [PMID: 37907681 PMCID: PMC10618451 DOI: 10.1038/s42003-023-05476-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 10/17/2023] [Indexed: 11/02/2023]

Xu D, Yang Y, Gong D, Chen X, Jin K, Jiang H, Yu W, Li J, Zhang J, Pan W. GFAP: ultrafast and accurate gene functional annotation software for plants. Plant Physiol 2023;193:1745-1748. [PMID: 37403633 DOI: 10.1093/plphys/kiad393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 06/20/2023] [Accepted: 06/20/2023] [Indexed: 07/06/2023]

Affiliation(s)

Dong Xu Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, Zhejiang 311300, China
Yingxue Yang Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
Desheng Gong Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
Xiaojian Chen School of Computer and Electronic Information/School of Artificial Intelligence, Nanjing Normal University, Nanjing 210023, China
Kangming Jin State Key Laboratory of Plant Physiology and Biochemistry, College of Life Science, Zhejiang University, Hangzhou 310058, China
Heling Jiang Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
Wenjuan Yu Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
Jihong Li College of Forestry, Shandong Agricultural University, Tai'an, Shandong 271018, China
Jin Zhang State Key Laboratory of Subtropical Silviculture, College of Forestry and Biotechnology, Zhejiang A&F University, Hangzhou, Zhejiang 311300, China
Weihua Pan Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China

Buton N, Coste F, Le Cunff Y. Predicting enzymatic function of protein sequences with attention. Bioinformatics 2023;39:btad620. [PMID: 37874958 PMCID: PMC10612403 DOI: 10.1093/bioinformatics/btad620] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/10/2022] [Revised: 09/11/2023] [Accepted: 10/22/2023] [Indexed: 10/26/2023]

Rodríguez-López M, Bordin N, Lees J, Scholes H, Hassan S, Saintain Q, Kamrad S, Orengo C, Bähler J. Broad functional profiling of fission yeast proteins using phenomics and machine learning. eLife 2023;12:RP88229. [PMID: 37787768 PMCID: PMC10547477 DOI: 10.7554/elife.88229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2023]

Zhang X, Guo H, Zhang F, Wang X, Wu K, Qiu S, Liu B, Wang Y, Hu Y, Li J. HNetGO: protein function prediction via heterogeneous network transformer. Brief Bioinform 2023;24:bbab556. [PMID: 37861172 PMCID: PMC10588005 DOI: 10.1093/bib/bbab556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2021] [Revised: 11/18/2021] [Accepted: 12/04/2021] [Indexed: 10/21/2023]

Wu J, Qing H, Ouyang J, Zhou J, Gao Z, Mason CE, Liu Z, Shi T. HiFun: homology independent protein function prediction by a novel protein-language self-attention model. Brief Bioinform 2023;24:bbad311. [PMID: 37649370 DOI: 10.1093/bib/bbad311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 07/31/2023] [Accepted: 08/08/2023] [Indexed: 09/01/2023]

Pividori M, Lu S, Li B, Su C, Johnson ME, Wei WQ, Feng Q, Namjou B, Kiryluk K, Kullo IJ, Luo Y, Sullivan BD, Voight BF, Skarke C, Ritchie MD, Grant SFA, Greene CS. Projecting genetic associations through gene expression patterns highlights disease etiology and drug mechanisms. Nat Commun 2023;14:5562. [PMID: 37689782 PMCID: PMC10492839 DOI: 10.1038/s41467-023-41057-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/05/2021] [Accepted: 08/18/2023] [Indexed: 09/11/2023]

Affiliation(s)

Milton Pividori Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA
Sumei Lu Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
Binglan Li Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
Chun Su Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
Matthew E Johnson Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
Wei-Qi Wei Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Qiping Feng Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Bahram Namjou Cincinnati Children's Hospital Medical Center, Cincinnati, OH, 45229, USA
Krzysztof Kiryluk Department of Medicine, Division of Nephrology, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, 10032, USA
Iftikhar J Kullo Mayo Clinic, Rochester, MN, 55905, USA
Yuan Luo Northwestern University, Chicago, IL, 60611, USA
Blair D Sullivan Kahlert School of Computing, University of Utah, Salt Lake City, UT, 84112, USA
Benjamin F Voight Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
Carsten Skarke Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
Marylyn D Ritchie Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
Struan F A Grant Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA Division of Endocrinology and Diabetes, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
Casey S Greene Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, 80045, USA Center for Health AI, University of Colorado School of Medicine, Aurora, CO, 80045, USA

Bianca F, Ispano E, Gazzola E, Lavezzo E, Fontana P, Toppo S. FunTaxIS-lite: a simple and light solution to investigate protein functions in all living organisms. Bioinformatics 2023;39:btad549. [PMID: 37672040 PMCID: PMC10500080 DOI: 10.1093/bioinformatics/btad549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 07/27/2023] [Accepted: 09/05/2023] [Indexed: 09/07/2023]

Zhang X, Wang L, Liu H, Zhang X, Liu B, Wang Y, Li J. Prot2GO: Predicting GO Annotations From Protein Sequences and Interactions. IEEE/ACM Trans Comput Biol Bioinform 2023;20:2772-2780. [PMID: 34971539 DOI: 10.1109/tcbb.2021.3139841] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]

Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP: Protein Function Prediction Using Function-Aware Domain Embedding Representations. bioRxiv 2023:2023.08.23.554486. [PMID: 37662252 PMCID: PMC10473699 DOI: 10.1101/2023.08.23.554486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]

O’Meara MJ, Rapala JR, Nichols CB, Alexandre C, Billmyre RB, Steenwyk JL, Alspaugh JA, O’Meara TR. CryptoCEN: A Co-Expression Network for Cryptococcus neoformans reveals novel proteins involved in DNA damage repair. bioRxiv 2023:2023.08.17.553567. [PMID: 37645941 PMCID: PMC10462067 DOI: 10.1101/2023.08.17.553567] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]

Dennler O, Coste F, Blanquart S, Belleannée C, Théret N. Phylogenetic inference of the emergence of sequence modules and protein-protein interactions in the ADAMTS-TSL family. PLoS Comput Biol 2023;19:e1011404. [PMID: 37651409 PMCID: PMC10499240 DOI: 10.1371/journal.pcbi.1011404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 09/13/2023] [Accepted: 08/01/2023] [Indexed: 09/02/2023]

Jeffery CJ. Current successes and remaining challenges in protein function prediction. Front Bioinform 2023;3:1222182. [PMID: 37576715 PMCID: PMC10415035 DOI: 10.3389/fbinf.2023.1222182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 07/03/2023] [Indexed: 08/15/2023]

Ribone AI, Fass M, Gonzalez S, Lia V, Paniego N, Rivarola M. Co-Expression Networks in Sunflower: Harnessing the Power of Multi-Study Transcriptomic Public Data to Identify and Categorize Candidate Genes for Fungal Resistance. Plants (Basel) 2023;12:2767. [PMID: 37570920 PMCID: PMC10421300 DOI: 10.3390/plants12152767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 07/19/2023] [Accepted: 07/21/2023] [Indexed: 08/13/2023]

Abstract

Fungal plant diseases are a major threat to food security worldwide. Current efforts to identify and list loci involved in different biological processes are more complicated than originally thought, even when complete genome assemblies are available. Despite numerous experimental and computational efforts to characterize gene functions in plants, about 40% of protein-coding genes in the model plant Arabidopsis thaliana L. are still not categorized in the Gene Ontology (GO) Biological Process (BP) annotation. In non-model organisms, such as sunflower (Helianthus annuus L.), the proportion of genes with BP term annotations is far lower, ~22%. In the current study, we performed gene co-expression network analysis using eight terabytes of public transcriptome datasets and expression-based functional prediction to categorize and identify loci involved in the response to fungal pathogens. We were able to construct a reference gene network of healthy green tissue (GreenGCN) and a gene network of healthy and stressed root tissues (RootGCN). Both networks achieved robust, high-quality scores on the metrics of guilt-by-association and selective constraints versus gene connectivity. We were able to identify eight modules enriched in defense functions, of which two out of the three modules in the RootGCN were also conserved in the GreenGCN, suggesting similar defense-related expression patterns. We identified 16 WRKY genes involved in defense-related functions and 65 previously uncharacterized loci now linked to defense response. In addition, we identified and classified 122 loci previously identified within QTLs or near candidate loci reported in GWAS studies of disease resistance in sunflower linked to defense response. All in all, we have implemented a valuable strategy to better describe genes within specific biological processes.

Chandra O, Sharma M, Pandey N, Jha IP, Mishra S, Kong SL, Kumar V. Patterns of transcription factor binding and epigenome at promoters allow interpretable predictability of multiple functions of non-coding and coding genes. Comput Struct Biotechnol J 2023;21:3590-3603. [PMID: 37520281 PMCID: PMC10371796 DOI: 10.1016/j.csbj.2023.07.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 07/05/2023] [Accepted: 07/11/2023] [Indexed: 08/01/2023]

Zheng R, Huang Z, Deng L. Large-scale predicting protein functions through heterogeneous feature fusion. Brief Bioinform 2023:bbad243. [PMID: 37401369 DOI: 10.1093/bib/bbad243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Revised: 05/18/2023] [Accepted: 06/12/2023] [Indexed: 07/05/2023]

Boadu F, Cao H, Cheng J. Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function. Bioinformatics 2023;39:i318-i325. [PMID: 37387145 DOI: 10.1093/bioinformatics/btad208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]

Li H, Liu B. BioSeq-Diabolo: Biological sequence similarity analysis using Diabolo. PLoS Comput Biol 2023;19:e1011214. [PMID: 37339155 DOI: 10.1371/journal.pcbi.1011214] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 10/03/2022] [Accepted: 05/24/2023] [Indexed: 06/22/2023]

Abstract

As the key to biological sequence structure and function prediction, disease diagnosis and treatment, biological sequence similarity analysis has attracted more and more attention. However, the existing computational methods have failed to accurately analyse biological sequence similarities because of the various data types (DNA, RNA, protein, disease, etc.) and their low sequence similarities (remote homology). Therefore, new concepts and techniques are needed to solve this challenging problem. Biological sequences (DNA, RNA and protein sequences) can be considered as the sentences of "the book of life", and their similarities can be considered as the biological language semantics (BLS). In this study, we seek semantics analysis techniques derived from natural language processing (NLP) to comprehensively and accurately analyse biological sequence similarities. 27 semantics analysis methods derived from NLP were introduced to analyse biological sequence similarities, bringing new concepts and techniques to biological sequence similarity analysis. Experimental results show that these semantics analysis methods are able to facilitate the development of protein remote homology detection, circRNA-disease association identification and protein function annotation, achieving better performance than the other state-of-the-art predictors in the related fields. Based on these semantics analysis methods, a platform called BioSeq-Diabolo has been constructed, which is named after a popular traditional sport in China. Users only need to input the embeddings of the biological sequence data. BioSeq-Diabolo will intelligently identify the task, and then accurately analyse the biological sequence similarities based on biological language semantics. BioSeq-Diabolo will integrate different biological sequence similarities in a supervised manner by using Learning to Rank (LTR), and the performance of the constructed methods will be evaluated and analysed so as to recommend the best methods for the users. The web server and stand-alone package of BioSeq-Diabolo can be accessed at http://bliulab.net/BioSeq-Diabolo/server/.

Oliveira GB, Pedrini H, Dias Z. TEMPROT: protein function annotation using transformers embeddings and homology search. BMC Bioinformatics 2023;24:242. [PMID: 37291492 DOI: 10.1186/s12859-023-05375-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 06/02/2023] [Indexed: 06/10/2023]

Abstract

BACKGROUND

Although the development of sequencing technologies has provided a large number of protein sequences, the analysis of the function that each one performs is still difficult because of the effort required by laboratory methods, making computational methods necessary to narrow this gap. As the main source of information available about proteins is their sequences, approaches that can use this information, such as classification based on the patterns of the amino acids and inference based on sequence similarity using alignment tools, are able to predict a large collection of proteins. The methods available in the literature that use this type of feature can achieve good results; however, they restrict the protein length accepted as input to their models. In this work, we present a new method, called TEMPROT, based on the fine-tuning and extraction of embeddings from an available architecture pre-trained on protein sequences. We also describe TEMPROT+, an ensemble between TEMPROT and BLASTp, a local alignment tool that analyzes sequence similarity, which improves the results of our former approach.

RESULTS

We evaluated our proposed classifiers against approaches from the literature on our dataset, which was derived from the CAFA3 challenge database. Both TEMPROT and TEMPROT+ achieved competitive results on the [Formula: see text], [Formula: see text], AuPRC and IAuPRC metrics for the Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) ontologies compared to state-of-the-art models, with main results of 0.581, 0.692 and 0.662 for [Formula: see text] on BP, CC and MF, respectively.

CONCLUSIONS

The comparison with the literature showed that our model achieved competitive results compared with the state-of-the-art approaches, considering amino acid sequence pattern recognition and homology analysis. Our model also improved on the literature methods in the input size that it can use for training.


Spiers AJ, Dorfmueller HC, Jerdan R, McGregor J, Nicoll A, Steel K, Cameron S. Bioinformatics characterization of BcsA-like orphan proteins suggest they form a novel family of pseudomonad cyclic-β-glucan synthases. PLoS One 2023;18:e0286540. [PMID: 37267309 DOI: 10.1371/journal.pone.0286540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 05/18/2023] [Indexed: 06/04/2023]

Abstract

Bacteria produce a variety of polysaccharides with functional roles in cell surface coating, surface and host interactions, and biofilms. We have identified an 'Orphan' bacterial cellulose synthase catalytic subunit (BcsA)-like protein found in four model pseudomonads, P. aeruginosa PA01, P. fluorescens SBW25, P. putida KT2440 and P. syringae pv. tomato DC3000. Pairwise alignments indicated that the Orphan and BcsA proteins shared less than 41% sequence identity, suggesting they may not have the same structural folds or function. We identified 112 Orphans among soil and plant-associated pseudomonads as well as in phytopathogenic and human opportunistic pathogenic strains. The wide distribution of these highly conserved proteins suggests they form a novel family of synthases producing a different polysaccharide. In silico analysis, including sequence comparisons, secondary structure and topology predictions, and protein structural modelling, revealed a two-domain transmembrane ovoid-like structure for the Orphan protein, with a periplasmic glycosyl hydrolase family GH17 domain linked via a transmembrane region to a cytoplasmic glycosyltransferase family GT2 domain. We suggest the GT2 domain synthesises β-(1,3)-glucan that is transferred to the GH17 domain, where it is cleaved and cyclised to produce cyclic-β-(1,3)-glucan (CβG). Our structural models are consistent with enzymatic characterisation and recent molecular simulations of the PaPA01 and PpKT2440 GH17 domains. They also provide a functional explanation linking PaPAK and PaPA14 Orphan (also known as NdvB) transposon mutants with CβG production and biofilm-associated antibiotic resistance. Importantly, cyclic glucans are also involved in osmoregulation, plant infection and induced systemic suppression, and our findings suggest this novel family of CβG synthases may provide a similar range of adaptive responses for pseudomonads.
