1
Erckert K, Rost B. Assessing the role of evolutionary information for enhancing protein language model embeddings. Sci Rep 2024; 14:20692. [PMID: 39237735 PMCID: PMC11377704 DOI: 10.1038/s41598-024-71783-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 08/30/2024] [Indexed: 09/07/2024] Open
Abstract
Embeddings from protein Language Models (pLMs) are replacing evolutionary information from multiple sequence alignments (MSAs) as the most successful input for protein prediction. Is this because embeddings capture evolutionary information? We tested various approaches to explicitly incorporate evolutionary information into embeddings on various protein prediction tasks. While older pLMs (SeqVec, ProtBert) significantly improved through MSAs, the more recent pLM ProtT5 did not benefit. For most tasks, pLM-based outperformed MSA-based methods, and the combination of both even decreased performance for some (intrinsic disorder). We highlight the effectiveness of pLM-based methods and find limited benefits from integrating MSAs.
Affiliation(s)
- Kyra Erckert
  - TUM School of Computation, Information and Technology, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Burkhard Rost
  - TUM School of Computation, Information and Technology, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748, Garching/Munich, Germany
  - Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
  - TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany
2
Rojano E, Jabato FM, Perkins JR, Córdoba-Caballero J, García-Criado F, Sillitoe I, Orengo C, Ranea JAG, Seoane-Zonjic P. Assigning protein function from domain-function associations using DomFun. BMC Bioinformatics 2022; 23:43. [PMID: 35033002 PMCID: PMC8761305 DOI: 10.1186/s12859-022-04565-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/05/2020] [Accepted: 01/05/2022] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND Protein function prediction remains a key challenge. Domain composition affects protein function. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated using multiple indices based on tripartite network analysis. These domain-function associations are combined at the protein level to generate protein-function predictions. RESULTS We analysed 16 tripartite networks connecting homologous superfamily and FunFam domains from CATH-Gene3D with functional annotations from the three Gene Ontology (GO) sub-ontologies, KEGG, and Reactome. We validated the results using the CAFA 3 benchmark platform for GO annotation, finding that, of the multiple association metrics and domain datasets tested, the Simpson index for FunFam domain-function associations combined with Stouffer's method leads to the best performance in almost all scenarios. We also found that using FunFams led to better performance than superfamilies, and that results were better for GO molecular function than for GO biological process terms. DomFun performed as well as the highest-performing method in certain CAFA 3 evaluation procedures in terms of [Formula: see text] and [Formula: see text]. We also implemented our own benchmark procedure, Pathway Prediction Performance (PPP), which can be used to validate function prediction for additional annotation sources, such as KEGG and Reactome. Using PPP, we found results similar to those found with CAFA 3 for GO; moreover, we found good performance for the other annotation sources. As with CAFA 3, the Simpson index with Stouffer's method led to the top performance in almost all scenarios. CONCLUSIONS DomFun shows competitive performance with other methods evaluated in CAFA 3 when predicting protein function with GO, although results vary depending on the evaluation procedure. Through our own benchmark procedure, PPP, we have shown it can also make accurate predictions for KEGG and Reactome. It performs best when using FunFams, combining Simpson index derived domain-function associations using Stouffer's method. The tool has been implemented so that it can be easily adapted to incorporate other protein features, such as domain data from other sources, amino acid k-mers and motifs. The DomFun Ruby gem is available from https://rubygems.org/gems/DomFun. Code is maintained at https://github.com/ElenaRojano/DomFun. Validation procedure scripts can be found at https://github.com/ElenaRojano/DomFun_project.
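The abstract above names Stouffer's method as the step that combines per-domain associations into one protein-level score. The sketch below illustrates the general unweighted Stouffer Z-score combination in Python; it is not DomFun's Ruby code. It assumes each domain-function association has already been expressed as a one-sided p-value strictly between 0 and 1, and the function name `stouffer_combine` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combine(p_values):
    """Combine one-sided p-values with the unweighted Stouffer Z-score method.

    Assumption: every p-value lies strictly between 0 and 1
    (NormalDist.inv_cdf raises StatisticsError otherwise).
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Map each p-value to a Z-score: a small p gives a large positive Z.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    # Sum the Z-scores and rescale so the result is again standard normal
    # under the null hypothesis.
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    # Convert the combined Z-score back to a one-sided p-value.
    combined_p = 1.0 - std_normal.cdf(combined_z)
    return combined_z, combined_p


# Hypothetical example: two domains of one protein, each weakly associated
# with the same function term.
z, p = stouffer_combine([0.05, 0.05])
print(f"combined Z = {z:.3f}, combined p = {p:.4f}")
```

In this illustration, two moderately significant associations (p = 0.05 each) reinforce one another and yield a combined p-value of roughly 0.01, which is the behaviour one would want when several domains of a protein point to the same function.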
Affiliation(s)
- Elena Rojano
  - Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
  - Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
- Fernando M. Jabato
  - Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
  - Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
- James R. Perkins
  - Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
  - CIBER of Rare Diseases, Av. Monforte de Lemos, 3-5. Pabellon 11. Planta 0, 28029 Madrid, Spain
  - Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
- José Córdoba-Caballero
  - Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- Federico García-Criado
  - Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- Ian Sillitoe
  - Department of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT UK
- Christine Orengo
  - Department of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT UK
- Juan A. G. Ranea
  - Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
  - CIBER of Rare Diseases, Av. Monforte de Lemos, 3-5. Pabellon 11. Planta 0, 28029 Madrid, Spain
  - Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
- Pedro Seoane-Zonjic
  - Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
  - CIBER of Rare Diseases, Av. Monforte de Lemos, 3-5. Pabellon 11. Planta 0, 28029 Madrid, Spain
  - Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
3
Wang L, Quan Y, Zhu Y, Xie X, Wang Z, Wang L, Wei X, Che F. The regenerating protein 3A: a crucial molecular with dual roles in cancer. Mol Biol Rep 2021; 49:1491-1500. [PMID: 34811636 PMCID: PMC8825409 DOI: 10.1007/s11033-021-06904-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/18/2021] [Accepted: 10/29/2021] [Indexed: 12/20/2022]
Abstract
Introduction REG3A, a member of the third subclass of the Reg family, has been found in a variety of tissues but is not detected in immune cells. In the past decade, it has been determined that REG3A expression is regulated by injury, infection, inflammatory stimuli, and pro-cytokines via different signaling pathways, and that it acts as a tissue-repair, bactericidal, and anti-inflammatory molecule in human diseases. Recently, the role of REG3A in cancer has received increasing attention. The present article aims to investigate the structure, expression, regulation, and function of REG3A, and to highlight the potential role of REG3A in tumors. Methods A detailed literature search and data organization were conducted to find information about the role of REG3A in a variety of physiological functions and tumors. Results Contradictory roles of REG3A have been reported in different tumor models. Some studies have demonstrated that high expression of REG3A in cancers can be oncogenic. Other studies have shown decreased REG3A expression in cancer cells as well as suppressed tumor growth. Conclusions Taken together, a better understanding of REG3A may lead to new insights that make it a potentially useful target for cancer therapy.
Affiliation(s)
- Liying Wang
  - Department of Clinical Medicine, Weifang Medical College, Weifang, China
  - Department of Neurology, Linyi People's Hospital, Linyi, China
- Yanchun Quan
  - Central Laboratory, Linyi People's Hospital, Linyi, China
  - Key Laboratory of Neurophysiology, Linyi People's Hospital, Linyi, Shandong, China
  - Key Laboratory of Tumor Biology, Linyi People's Hospital, Linyi, Shandong, China
- Yanxi Zhu
  - Central Laboratory, Linyi People's Hospital, Linyi, China
  - Key Laboratory of Neurophysiology, Linyi People's Hospital, Linyi, Shandong, China
  - Key Laboratory of Tumor Biology, Linyi People's Hospital, Linyi, Shandong, China
- Xiaoli Xie
  - Central Laboratory, Linyi People's Hospital, Linyi, China
  - Key Laboratory of Neurophysiology, Linyi People's Hospital, Linyi, Shandong, China
  - Key Laboratory of Tumor Biology, Linyi People's Hospital, Linyi, Shandong, China
- Zhiqiang Wang
  - Central Laboratory, Linyi People's Hospital, Linyi, China
  - Key Laboratory of Neurophysiology, Linyi People's Hospital, Linyi, Shandong, China
  - Key Laboratory of Tumor Biology, Linyi People's Hospital, Linyi, Shandong, China
- Long Wang
  - Central Laboratory, Linyi People's Hospital, Linyi, China
  - Key Laboratory of Neurophysiology, Linyi People's Hospital, Linyi, Shandong, China
  - Key Laboratory of Tumor Biology, Linyi People's Hospital, Linyi, Shandong, China
- Xiuhong Wei
  - Shandong First Medical University & Shandong Academy of Medical Sciences, Taian, Shandong, China
- Fengyuan Che
  - Department of Neurology, Linyi People's Hospital, Linyi, China
  - Central Laboratory, Linyi People's Hospital, Linyi, China
  - Key Laboratory of Neurophysiology, Linyi People's Hospital, Linyi, Shandong, China
  - Key Laboratory of Tumor Biology, Linyi People's Hospital, Linyi, Shandong, China
4
Zhou S, Yu Z, Chu W. Effect of quorum-quenching bacterium Bacillus sp. QSI-1 on protein profiles and extracellular enzymatic activities of Aeromonas hydrophila YJ-1. BMC Microbiol 2019; 19:135. [PMID: 31226935 PMCID: PMC6588933 DOI: 10.1186/s12866-019-1515-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/15/2018] [Accepted: 06/17/2019] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND In natural environments, bacteria always live in communities with others, where their physiological characteristics are influenced by each other. Bacteria can communicate with one another by using autoinducers. The current knowledge on the effect of quenching bacteria on others is limited. This study assessed the impact of the quorum-quenching bacterium Bacillus sp. QSI-1 on the protein patterns and virulence factor production of Aeromonas hydrophila YJ-1. Proteomic analysis was performed to identify changes in proteins and virulence factors after 24 h of co-culture. RESULTS Results showed that several proteins of A. hydrophila YJ-1 were altered. Seventy-two differentially expressed protein spots were excised from 2-DE gels and analyzed by MALDI-TOF/TOF MS, resulting in 63 individual proteins being clearly identified from 70 spots. Among these proteins, 50 were divided into 22 classes and mapped onto 18 biological pathways. Mixed-culture growth with Bacillus sp. QSI-1 resulted in an increase of A. hydrophila proteins involved in RNA polymerase activity, biosynthesis of secondary metabolites, flagellar assembly, and two-component systems. In contrast, mixed culture resulted in a decreased level of proteins involved in thiamine metabolism; valine, leucine and isoleucine biosynthesis; and pantothenate and CoA biosynthesis. In addition, the two extracellular virulence factors, proteases and hemolysin, were significantly reduced when A. hydrophila was co-cultured with QSI-1, while only lipase activity was observed to increase. CONCLUSIONS The information gathered from our experiment showed that Bacillus sp. QSI-1 has a major impact on the expression of proteins, including virulence factors, of A. hydrophila.
Affiliation(s)
- Shuxin Zhou
  - Department of Pharmaceutical Microbiology, School of Life Science and Technology, China Pharmaceutical University, Nanjing, 210009, China
- Zixun Yu
  - School of Pharmacy, China Pharmaceutical University, Nanjing, 210009, China
- Weihua Chu
  - Department of Pharmaceutical Microbiology, School of Life Science and Technology, China Pharmaceutical University, Nanjing, 210009, China
5
Möncke-Buchner E, Szczepek M, Bokelmann M, Heinemann P, Raftery MJ, Krüger DH, Reuter M. Sin Nombre hantavirus nucleocapsid protein exhibits a metal-dependent DNA-specific endonucleolytic activity. Virology 2016; 496:67-76. [PMID: 27261891 DOI: 10.1016/j.virol.2016.05.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/24/2016] [Revised: 05/10/2016] [Accepted: 05/12/2016] [Indexed: 01/09/2023]
Abstract
We demonstrate that the nucleocapsid protein of Sin Nombre hantavirus (SNV-N) has a DNA-specific endonuclease activity. Upon incubation of SNV-N with DNA in the presence of magnesium or manganese, we observed DNA digestion in a sequence-unspecific manner. In contrast, RNA was not affected under the same conditions. Moreover, pre-treatment of SNV-N with RNase before DNA cleavage increased the endonucleolytic activity. Structure-based protein fold prediction using known structures from the PDB database revealed that Asp residues in positions 88 and 103 of SNV-N show sequence similarity with the active site of the restriction endonuclease HindIII. The crystal structure of HindIII predicts that residues Asp93 and Asp108 are essential for coordination of the metal ions required for HindIII DNA cleavage. Therefore, we hypothesized that the homologous residues in SNV-N, Asp88 and Asp103, may have a similar function. Replacing Asp88 and Asp103 with alanine led to an SNV-N protein almost completely abrogated for endonuclease activity.
Affiliation(s)
- Elisabeth Möncke-Buchner
  - Institute of Medical Virology, Helmut-Ruska-Haus, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Michal Szczepek
  - Institute of Medical Physics and Biophysics, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Marcel Bokelmann
  - Institute of Medical Virology, Helmut-Ruska-Haus, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Patrick Heinemann
  - Institute of Medical Virology, Helmut-Ruska-Haus, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Martin J Raftery
  - Institute of Medical Virology, Helmut-Ruska-Haus, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Detlev H Krüger
  - Institute of Medical Virology, Helmut-Ruska-Haus, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Monika Reuter
  - Institute of Medical Virology, Helmut-Ruska-Haus, Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
6
Li L, Li D, Chen H, Han JG. Studies on the binding modes of Lassa nucleoprotein complexed with m7GpppG and dTTP by molecular dynamic simulations and free energy calculations. J Biomol Struct Dyn 2012; 31:299-315. [PMID: 22871039 DOI: 10.1080/07391102.2012.703061] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 10/28/2022]
Abstract
Lassa virus can cause dreadful human hemorrhagic disease, for which there is no effective therapy. A recent study points out that the amino (N)-terminal domain of Lassa virus nucleoprotein (NP) plays an important role in viral RNA synthesis and solved, for the first time, the X-ray crystal structures of NP complexed with the capped deoxythymidine triphosphate (dTTP) analog, but the binding mode of m7GpppG to the N domain of NP, which is required for viral RNA transcription, has not been studied. In this study, molecular dynamics (MD) simulations have been carried out to investigate the characteristics of dTTP binding to two forms of NP, i.e. the NP without the C domain and the full-length NP model, using two different force fields, ff03 and ff99SB, respectively. Our calculated results show that the truncated model is reasonable and can replace the full protein model in the following MD simulations, and that ff99SB combined with the general AMBER force field is more suitable for sampling the structure of the small molecule-NP complex. From the comparisons of the stability of hydrogen bonds between small molecule and protein in the dTTP and uridine 5'-triphosphate complexes, one finds that the stable hydrogen bonds between the second phosphate group of the small molecules and two residues, Thr178 and Arg323, are critical for cap analogs binding to the N domain of NP. Additionally, a docking method combined with MD simulations has been applied to predict the binding mode of m7GpppG to NP, and hydrogen bond analysis and the binding free energy decomposition method (MM/GBSA) are used to study the interactions in the putative binding mode. The calculated results are expected to provide guidance for drug development.
Affiliation(s)
- Liang Li
  - National Synchrotron Radiation Laboratory, University of Science and Technology of China, 230029 Hefei, People's Republic of China