1
Riziotis IG, Kafas JC, Ong G, Borkakoti N, Ribeiro AJM, Thornton JM. Paradigms of convergent evolution in enzymes. FEBS J 2025; 292:537-555. [PMID: 39578229 PMCID: PMC11796326 DOI: 10.1111/febs.17332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 09/10/2024] [Accepted: 11/12/2024] [Indexed: 11/24/2024]
Abstract
There are many occurrences of enzymes catalysing the same reaction but having significantly different structures. Leveraging the comprehensive information on enzymes stored in the Mechanism and Catalytic Site Atlas (M-CSA), we present a collection of 34 cases for which there is sufficient evidence of functional convergence without an evolutionary link. For each case, we compare enzymes which have identical Enzyme Commission numbers (i.e. catalyse the same reaction), but different identifiers in the CATH data resource (i.e. different folds). We focus on similarities between their sequences, structures, active site geometries, cofactors and catalytic mechanisms. These features are then assessed to evaluate whether all the evidence for these structurally diverse proteins supports their independent evolution to catalyse the same chemical reaction. Our approach combines published literature information with knowledge-based computational resources from, amongst others, M-CSA, PDBe and PDBsum, supported by tailor-made software to explore active site structures and assess similarities in mechanism. We find that there are multiple types of convergent functional evolution observed to date, and it is necessary to investigate sequence, structure, active site geometry and enzyme mechanisms to describe such convergence accurately.
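The screening criterion described in this abstract (same Enzyme Commission number, different CATH identifier) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `convergence_candidates`, the record layout, and the toy records are all assumptions made for illustration, and the EC/CATH pairings below are invented, not real annotations. A hit is only a candidate; as the abstract notes, sequence, structure, active site geometry and mechanism still have to be examined.

```python
from collections import defaultdict


def convergence_candidates(records):
    """Group records by EC number and keep EC numbers seen with 2+ distinct CATH ids.

    `records` is an iterable of (label, ec_number, cath_id) tuples.
    Returns {ec_number: sorted list of distinct CATH ids}.
    """
    cath_by_ec = defaultdict(set)
    for _label, ec_number, cath_id in records:
        cath_by_ec[ec_number].add(cath_id)
    return {ec: sorted(ids) for ec, ids in cath_by_ec.items() if len(ids) > 1}


# Hypothetical toy records; the pairings are invented for illustration only.
records = [
    ("enzyme_A", "1.1.1.1", "3.40.50.720"),
    ("enzyme_B", "1.1.1.1", "3.20.20.70"),
    ("enzyme_C", "2.7.1.1", "3.30.420.40"),
    ("enzyme_D", "2.7.1.1", "3.30.420.40"),
]

candidates = convergence_candidates(records)
print(candidates)
```

Only the first EC number is reported here, because its two toy entries carry different CATH identifiers; the second pair shares one identifier and is therefore not flagged.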
Affiliation(s)
- Gabriel Ong
- European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
2
Lai JS, Burley SK, Duarte JM. ZMPY3D: accelerating protein structure volume analysis through vectorized 3D Zernike moments and Python-based GPU integration. Bioinformatics Advances 2024; 4:vbae111. [PMID: 39100546 PMCID: PMC11297494 DOI: 10.1093/bioadv/vbae111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 07/12/2024] [Accepted: 07/25/2024] [Indexed: 08/06/2024]
Abstract
Motivation: Volumetric 3D object analyses are being applied in research fields such as structural bioinformatics, biophysics, and structural biology, with potential integration of artificial intelligence/machine learning (AI/ML) techniques. One such method, 3D Zernike moments, has proven valuable in analyzing protein structures (e.g., protein fold classification, protein-protein interaction analysis, and molecular dynamics simulations). Their compactness and efficiency make them amenable to large-scale analyses. Established methods for deriving 3D Zernike moments, however, can be inefficient, particularly when higher order terms are required, hindering broader applications. As the volume of experimental and computationally-predicted protein structure information continues to increase, structural biology has become a "big data" science requiring more efficient analysis tools.
Results: This application note presents a Python-based software package, ZMPY3D, to accelerate computation of 3D Zernike moments by vectorizing the mathematical formulae and using graphical processing units (GPUs). The package offers popular GPU-supported libraries such as CuPy and TensorFlow together with NumPy implementations, aiming to improve computational efficiency, adaptability, and flexibility in future algorithm development. The ZMPY3D package can be installed via PyPI, and the source code is available from GitHub. Volumetric-based protein 3D structural similarity scores and transform matrix of superposition functionalities have both been implemented, creating a powerful computational tool that will allow the research community to amalgamate 3D Zernike moments with existing AI/ML tools, to advance research and education in protein structure bioinformatics.
Availability and implementation: ZMPY3D, implemented in Python, is available on GitHub (https://github.com/tawssie/ZMPY3D) and PyPI, released under the GPL License.
Affiliation(s)
- Jhih-Siang Lai
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, United States
- Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, United States
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States
- Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States
- Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, United States
- Jose M Duarte
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, United States
3
Zheng Z, Goncearenco A, Berezovsky IN. Back in time to the Gly-rich prototype of the phosphate binding elementary function. Curr Res Struct Biol 2024; 7:100142. [PMID: 38655428 PMCID: PMC11035071 DOI: 10.1016/j.crstbi.2024.100142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 03/31/2024] [Accepted: 04/03/2024] [Indexed: 04/26/2024] Open
Abstract
Binding of nucleotides and their derivatives is one of the most ancient elementary functions dating back to the Origin of Life. We review here the works considering one of the key elements in binding of (di)nucleotide-containing ligands - phosphate binding. We start from a brief discussion of major participants, conditions, and events in prebiotic evolution that resulted in the Origin of Life. Tracing back to the basic functions, including metal and phosphate binding, and, potentially, formation of primitive protein-protein interactions, we focus here on the phosphate binding. Critically assessing works on the structural, functional, and evolutionary aspects of phosphate binding, we perform a simple computational experiment reconstructing its most ancient and generic sequence prototype. The profiles of the phosphate binding signatures have been derived in form of position-specific scoring matrices (PSSMs), their peculiarities depending on the type of the ligands have been analyzed, and evolutionary connections between them have been delineated. Then, the apparent prototype that gave rise to all relevant phosphate-binding signatures had also been reconstructed. We show that two major signatures of the phosphate binding that discriminate between the binding of dinucleotide- and nucleotide-containing ligands are GxGxxG and GxxGxG, respectively. It appears that the signature archetypal for dinucleotide-containing ligands is more generic, and it can frequently bind phosphate groups in nucleotide-containing ligands as well. The reconstructed prototype's key signature GxGGxG underlies the role of glycine residues in providing flexibility and interactions necessary for binding the phosphate groups. The prototype also contains other ancient amino acids, valine, and alanine, showing versatility towards evolutionary design and functional diversification.
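The two signatures named in this abstract can be written as simple patterns in which `x` stands for any residue. The sketch below is a deliberate simplification: the abstract derives position-specific scoring matrices, whereas this uses plain regular expressions, and the helper name `find_signatures` and the toy sequence are assumptions made for illustration. It also shows that a segment of the form GxGGxG, the reconstructed prototype's key signature, satisfies both patterns at once.

```python
import re

# 'x' in the signatures means "any residue", i.e. '.' in a regular expression.
# Lookahead groups are used so that overlapping occurrences are all reported.
SIGNATURES = {
    "GxGxxG": re.compile(r"(?=(G.G..G))"),
    "GxxGxG": re.compile(r"(?=(G..G.G))"),
}


def find_signatures(sequence):
    """Return {signature name: list of 0-based start positions} for a sequence."""
    return {
        name: [match.start() for match in pattern.finditer(sequence)]
        for name, pattern in SIGNATURES.items()
    }


# Hypothetical toy sequence containing a GxGGxG segment (GAGGAG) at index 6.
toy_sequence = "MKVAVIGAGGAGLA"
hits = find_signatures(toy_sequence)
print(hits)
```

Both patterns report a hit at the same position because GAGGAG reads as G-x-G-x-x-G and as G-x-x-G-x-G simultaneously. A regular expression treats every non-glycine position as equally permissive, which a PSSM does not.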
Affiliation(s)
- Zejun Zheng
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Igor N. Berezovsky
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
4
Ribeiro AJM, Riziotis IG, Borkakoti N, Thornton JM. Enzyme function and evolution through the lens of bioinformatics. Biochem J 2023; 480:1845-1863. [PMID: 37991346 PMCID: PMC10754289 DOI: 10.1042/bcj20220405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/23/2023]
Abstract
Enzymes have been shaped by evolution over billions of years to catalyse the chemical reactions that support life on earth. Dispersed in the literature, or organised in online databases, knowledge about enzymes can be structured in distinct dimensions, either related to their quality as biological macromolecules, such as their sequence and structure, or related to their chemical functions, such as the catalytic site, kinetics, mechanism, and overall reaction. The evolution of enzymes can only be understood when each of these dimensions is considered. In addition, many of the properties of enzymes only make sense in the light of evolution. We start this review by outlining the main paradigms of enzyme evolution, including gene duplication and divergence, convergent evolution, and evolution by recombination of domains. In the second part, we overview the current collective knowledge about enzymes, as organised by different types of data and collected in several databases. We also highlight some increasingly powerful computational tools that can be used to close gaps in understanding, in particular for types of data that require laborious experimental protocols. We believe that recent advances in protein structure prediction will be a powerful catalyst for the prediction of binding, mechanism, and ultimately, chemical reactions. A comprehensive mapping of enzyme function and evolution may be attainable in the near future.
Affiliation(s)
- Antonio J. M. Ribeiro
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Ioannis G. Riziotis
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Neera Borkakoti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Janet M. Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
5
Riziotis IG, Ribeiro AJM, Borkakoti N, Thornton JM. The 3D Modules of Enzyme Catalysis: Deconstructing Active Sites into Distinct Functional Entities. J Mol Biol 2023; 435:168254. [PMID: 37652131 DOI: 10.1016/j.jmb.2023.168254] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/01/2023] [Revised: 08/20/2023] [Accepted: 08/22/2023] [Indexed: 09/02/2023]
Abstract
Enzyme catalysis is governed by a limited toolkit of residues and organic or inorganic co-factors. Therefore, it is expected that recurring residue arrangements will be found across the enzyme space, which perform a defined catalytic function, are structurally similar and occur in unrelated enzymes. Leveraging the integrated information in the Mechanism and Catalytic Site Atlas (M-CSA) (enzyme structure, sequence, catalytic residue annotations, catalysed reaction, detailed mechanism description), 3D templates were derived to represent compact groups of catalytic residues. A fuzzy template-template search, allowed us to identify those recurring motifs, which are conserved or convergent, that we define as the "modules of enzyme catalysis". We show that a large fraction of these modules facilitate binding of metal ions, co-factors and substrates, and are frequently the result of convergent evolution. A smaller number of convergent modules perform a well-defined catalytic role, such as the variants of the catalytic triad (i.e. Ser-His-Asp/Cys-His-Asp) and the saccharide-cleaving Asp/Glu triad. It is also shown that enzymes whose functions have diverged during evolution preserve regions of their active site unaltered, as shown by modules performing similar or identical steps of the catalytic mechanism. We have compiled a comprehensive library of catalytic modules, that characterise a broad spectrum of enzymes. These modules can be used as templates in enzyme design and for better understanding catalysis in 3D.
Affiliation(s)
- Ioannis G Riziotis
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
- António J M Ribeiro
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
- Neera Borkakoti
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
- Janet M Thornton
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
6
Jeffery CJ. Current successes and remaining challenges in protein function prediction. Frontiers in Bioinformatics 2023; 3:1222182. [PMID: 37576715 PMCID: PMC10415035 DOI: 10.3389/fbinf.2023.1222182] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/13/2023] [Accepted: 07/03/2023] [Indexed: 08/15/2023] Open
Abstract
In recent years, improvements in protein function prediction methods have led to increased success in annotating protein sequences. However, the functions of over 30% of protein-coding genes remain unknown for many sequenced genomes. Protein functions vary widely, from catalyzing chemical reactions to binding DNA or RNA or forming structures in the cell, and some types of functions are challenging to predict due to the physical features associated with those functions. Other complications in understanding protein functions arise due to the fact that many proteins have more than one function or very small differences in sequence or structure that correspond to different functions. We will discuss some of the recent developments in predicting protein functions and some of the remaining challenges.
Affiliation(s)
- Constance J. Jeffery
- Department of Biological Sciences, University of Illinois at Chicago, Chicago, IL, United States
7
Borkakoti N, Thornton JM. AlphaFold2 protein structure prediction: Implications for drug discovery. Curr Opin Struct Biol 2023; 78:102526. [PMID: 36621153 PMCID: PMC7614146 DOI: 10.1016/j.sbi.2022.102526] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 09/22/2022] [Revised: 12/01/2022] [Accepted: 12/03/2022] [Indexed: 01/09/2023]
Abstract
The drug discovery process involves designing compounds to selectively interact with their targets. The majority of therapeutic targets for low molecular weight (small molecule) drugs are proteins. The outstanding accuracy with which recent artificial intelligence methods compile the three-dimensional structure of proteins has made protein targets more accessible to the drug design process. Here, we present our perspective of the significance of accurate protein structure prediction on various stages of the small molecule drug discovery life cycle focusing on current capabilities and assessing how further evolution of such predictive procedures can have a more decisive impact in the discovery of new medicines.
Affiliation(s)
- Neera Borkakoti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Janet M Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK