1. Kaminski K, Ludwiczak J, Pawlicki K, Alva V, Dunin-Horkawicz S. pLM-BLAST: distant homology detection based on direct comparison of sequence representations from protein language models. Bioinformatics 2023; 39:btad579. [PMID: 37725369 PMCID: PMC10576641 DOI: 10.1093/bioinformatics/btad579] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 11/27/2022] [Revised: 07/09/2023] [Accepted: 09/15/2023] [Indexed: 09/21/2023]
Abstract
MOTIVATION: The detection of homology through sequence comparison is a typical first step in the study of protein function and evolution. In this work, we explore the applicability of protein language models to this task. RESULTS: We introduce pLM-BLAST, a tool inspired by BLAST, that detects distant homology by comparing single-sequence representations (embeddings) derived from a protein language model, ProtT5. Our benchmarks reveal that pLM-BLAST maintains a level of accuracy on par with HHsearch for both highly similar sequences (with >50% identity) and markedly divergent sequences (with <30% identity), while being significantly faster. Additionally, pLM-BLAST stands out among other embedding-based tools due to its ability to compute local alignments. We show that these local alignments, produced by pLM-BLAST, often connect highly divergent proteins, thereby highlighting its potential to uncover previously undiscovered homologous relationships and improve protein annotation. AVAILABILITY AND IMPLEMENTATION: pLM-BLAST is accessible via the MPI Bioinformatics Toolkit as a web server for searching precomputed databases (https://toolkit.tuebingen.mpg.de/tools/plmblast). It is also available as a standalone tool for building custom databases and performing batch searches (https://github.com/labstructbioinf/pLM-BLAST).
Affiliation(s)
- Kamil Kaminski
- Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-089, Poland
- Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw 02-097, Poland
- Jan Ludwiczak
- Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-089, Poland
- Kamil Pawlicki
- Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-089, Poland
- Vikram Alva
- Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Tübingen 72076, Germany
- Stanislaw Dunin-Horkawicz
- Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw 02-089, Poland
- Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Tübingen 72076, Germany
2. Malik SS, Masood N, Fatima I, Kazmi Z. Microbial-Based Cancer Therapy: Diagnostic Tools and Therapeutic Strategies. Microorganisms for Sustainability 2019:53-82. [DOI: 10.1007/978-981-13-8844-6_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2025]
3. Fialho AM, Bernardes N, Chakrabarty AM. Exploring the anticancer potential of the bacterial protein azurin. AIMS Microbiol 2016. [DOI: 10.3934/microbiol.2016.3.292] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
4. Liu J, Chakraborty S, Hosseinzadeh P, Yu Y, Tian S, Petrik I, Bhagi A, Lu Y. Metalloproteins containing cytochrome, iron-sulfur, or copper redox centers. Chem Rev 2014; 114:4366-469. [PMID: 24758379 PMCID: PMC4002152 DOI: 10.1021/cr400479b] [Citation(s) in RCA: 624] [Impact Index Per Article: 56.7] [Received: 09/02/2013] [Indexed: 02/07/2023]
Affiliation(s)
- Jing Liu
- Department of Chemistry, Department of Biochemistry, and Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Saumen Chakraborty
- Department of Chemistry, Department of Biochemistry, and Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Parisa Hosseinzadeh
- Department of Chemistry, Department of Biochemistry, and Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Yang Yu
- Department of Chemistry, Department of Biochemistry, and Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Shiliang Tian
- Department of Chemistry, Department of Biochemistry, and Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Igor Petrik
- Department of Chemistry, Department of Biochemistry, and Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Ambika Bhagi
- Department of Chemistry, Department of Biochemistry, and Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Yi Lu
- Department of Chemistry, Department of Biochemistry, and Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
5. Karunwi O, Baldwin C, Griesheimer G, Sarupria S, Guiseppi-Elie A. Molecular dynamics simulations of peptide–SWCNT interactions related to enzyme conjugates for biosensors and biofuel cells. ACTA ACUST UNITED AC 2014. [DOI: 10.1142/s1793984413430071] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
With the demonstration of direct electron transfer between the redox active prosthetic group, flavin adenine dinucleotide (FAD), of glucose oxidase (GOx) and single-walled carbon nanotubes (SWCNT), there has been growing interest in the fabrication of CNT-enzyme supramolecular constructs that control the placement of SWCNTs within the tunneling distance of co-factors for enhanced electron transfer efficiency in generation-3 biosensors and advanced biofuel cells. These conjugate systems raise a series of questions such as: which peptide sequences within the enzymes have high affinity for the SWCNTs? And, are these high affinity sequences likely to be in the vicinity of the redox-active co-factor to allow for direct electron transfer? Phage display has recently been used to identify specific peptide sequences that have high affinity for SWCNTs. Molecular dynamics simulations were performed to study the interactions of five discrete peptides with (16,0) SWCNT in explicit water as well as with graphene. From the progression of the radius of gyration, Rg, the peptides studied were concertedly adsorbed to both the SWCNT and graphene. Peptide properties calculated using individual amino acid values, such as hydrophobicity indices, did not correlate with the observed adsorption behavior as quantified by Rg, indicating that the adsorption behavior of the peptide was not based on the individual amino acid residues. However, the Rg values, reflective of the physicochemical embrace of the surface (SWCNT or graphene) had a strong positive correlation with the solubility parameter, indicating concerted, cooperative interaction of peptide segments with the materials. The end residues appear to dominate the progression of adsorption regardless of character. Sequences identified by phage display share some homology with key enzymes (GOx, lactate oxidase and laccase) used in biosensors and enzyme-based biofuel cells. 
These analogous sequences appear to be buried deep within the shell of fully folded proteins and as such are expected to be close to the redox-active prosthetic group.
Affiliation(s)
- Olukayode Karunwi
- Center for Bioelectronics, Biosensors and Biochips (C3B), Clemson University Advanced Materials Center, 100 Technology Drive, Anderson, South Carolina 29625, USA
- Department of Bioengineering, Clemson University, Clemson, SC 29634, USA
- Cassidy Baldwin
- SC Governor's School for Science & Mathematics, Hartsville, SC 29550, USA
- Gisela Griesheimer
- SC Governor's School for Science & Mathematics, Hartsville, SC 29550, USA
- Sapna Sarupria
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, SC 29634, USA
- Anthony Guiseppi-Elie
- Center for Bioelectronics, Biosensors and Biochips (C3B), Clemson University Advanced Materials Center, 100 Technology Drive, Anderson, South Carolina 29625, USA
- Department of Bioengineering, Department of Chemical and Biomolecular Engineering, Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634, USA
6. Mulligan VK, Chakrabartty A. Protein misfolding in the late-onset neurodegenerative diseases: Common themes and the unique case of amyotrophic lateral sclerosis. Proteins 2013; 81:1285-303. [DOI: 10.1002/prot.24285] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Received: 11/01/2012] [Revised: 02/27/2013] [Accepted: 02/28/2013] [Indexed: 12/12/2022]
Affiliation(s)
- Avijit Chakrabartty
- Department of Biochemistry; Toronto Ontario M5G 1L7 Canada
- Department of Medical Biophysics; University of Toronto; Toronto Ontario M5G 1L7 Canada
- Campbell Family Institute for Cancer Research, Ontario Cancer Institute/University Health Network; Toronto Ontario M5G 1L7 Canada
7. Bodelón G, Palomino C, Fernández LÁ. Immunoglobulin domains in Escherichia coli and other enterobacteria: from pathogenesis to applications in antibody technologies. FEMS Microbiol Rev 2013; 37:204-50. [DOI: 10.1111/j.1574-6976.2012.00347.x] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 01/30/2012] [Revised: 06/07/2012] [Accepted: 06/14/2012] [Indexed: 11/28/2022]
8. Ji HF, Chen L, Jiang YY, Zhang HY. Evolutionary formation of new protein folds is linked to metallic cofactor recruitment. Bioessays 2009; 31:975-80. [PMID: 19644916 DOI: 10.1002/bies.200800201] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
To explore whether the generation of new protein folds could be linked to metallic cofactor recruitment, we identified the oldest examples of folds for manganese, iron, zinc, and copper proteins by analyzing their fold-domain mapping patterns. We discovered that the generation of these folds was tightly coupled to corresponding metals. We found that the emerging order for these folds, i.e., manganese and iron protein folds appeared earlier than zinc and copper counterparts, coincides with the putative bioavailability of the corresponding metals in the ancient anoxic ocean. Therefore, we conclude that metallic cofactors, like organic cofactors, play an evolutionary role in the formation of new protein folds. This link could be explained by the emergence of protein structures with novel folds that could fulfill the new protein functions introduced by the metallic cofactors. These findings not only have important implications for understanding the evolutionary mechanisms of protein architectures, but also provide a further interpretation for the evolutionary story of superoxide dismutases.
Affiliation(s)
- Hong-Fang Ji
- Shandong Provincial Research Center for Bioinformatic Engineering and Technique, Center for Advanced Study, Shandong University of Technology, Zibo 255049, P. R. China
9. Sato K, Li C, Salard I, Thompson AJ, Banfield MJ, Dennison C. Metal-binding loop length and not sequence dictates structure. Proc Natl Acad Sci U S A 2009; 106:5616-21. [PMID: 19299503 PMCID: PMC2666997 DOI: 10.1073/pnas.0811324106] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 11/10/2008] [Indexed: 11/18/2022]
Abstract
The C-terminal copper-binding loop in the beta-barrel fold of the cupredoxin azurin has been replaced with a range of sequences containing alanine, glycine, and valine residues to assess the importance of amino acid composition and the length of this region. The introduction of 2 and 4 alanines between the coordinating Cys, His, and Met results in loop structures matching those in naturally occurring proteins with the same loop lengths. A loop with 4 alanines between the Cys and His and 3 between the His and Met ligands has a structure identical to that of the WT protein, whose loop is the same length. Loop structure is dictated by length and not sequence allowing the properties of the main surface patch for interactions with partners, to which the loop is a major contributor, to be optimized. Loops with 2 amino acids between the ligands using glycine, alanine, and valine residues have been compared. An empirical relationship is found between copper site protection by the loop and reduction potential. A loop adorned with 4 methyl groups is sufficient to protect the copper ion, enabling most sequences to adequately perform this task. The mutant with 3 alanine residues between the ligands forms a strand-swapped dimer in the crystal structure, an arrangement that has not, to our knowledge, been seen previously for this family of proteins. Cupredoxins function as redox shuttles and are required to be monomeric; therefore, none have evolved with a metal-binding loop of this length.
Affiliation(s)
- Katsuko Sato
- Institute for Cell and Molecular Biosciences, Medical School, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
- Chan Li
- Institute for Cell and Molecular Biosciences, Medical School, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
- Isabelle Salard
- Institute for Cell and Molecular Biosciences, Medical School, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
- Andrew J. Thompson
- Institute for Cell and Molecular Biosciences, Medical School, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
- Mark J. Banfield
- Department of Biological Chemistry, John Innes Centre, Norwich, NR4 7UH, United Kingdom
- Christopher Dennison
- Institute for Cell and Molecular Biosciences, Medical School, Newcastle University, Newcastle upon Tyne NE2 4HH, United Kingdom
10. Stevens FJ. Possible evolutionary links between immunoglobulin light chains and other proteins involved in amyloidosis. Amyloid 2008; 15:96-107. [PMID: 18484336 DOI: 10.1080/13506120802005973] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/22/2022]
Abstract
With limited exceptions, proteins that account for the amyloidoses appear to be evolutionarily unrelated. Transthyretin is classified as having an "immunoglobulin-like" fold as found in light chain variable and constant domains. Thus, these amyloidogenic proteins have significant conformational similarity. In the absence of primary structure similarity sufficient to justify an inference of an evolutionary relationship, transthyretin is considered an analog of immunoglobulin domains having accrued the immunoglobulin-like fold by some form of convergent evolution of structure. Improvements in sequence comparison tools and strategies, coupled with recent logarithmic increases in the availability of primary structure data, now make it possible to suggest that transthyretin and immunoglobulins may have a common evolutionary origin. In addition, lactadherin, the medin fragment of which accounts for the most common form of human amyloid, also appears to be evolutionarily linked to transthyretin and immunoglobulins.
Affiliation(s)
- Fred J Stevens
- Biosciences Division, Argonne National Laboratory, Argonne, IL 60439, USA.