1
Andreini C. Twenty years in metalloprotein bioinformatics: A short history of a long journey. J Inorg Biochem 2025; 266:112854. [PMID: 39961171 DOI: 10.1016/j.jinorgbio.2025.112854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2024] [Revised: 01/31/2025] [Accepted: 02/07/2025] [Indexed: 03/01/2025]
Abstract
The study of the structure and function of metalloproteins is a central subject of inorganic biochemistry. Starting from the 2000s, computational methods have flanked experimental research by exploiting the ever-increasing computing power and the huge amount of data produced by omics technologies. In this article, we retrace the major advancements that brought bioinformatics from being of minor relevance to being an essential tool for today's inorganic biochemists, focusing on the contributions coming from the Magnetic Resonance Center (CERM) of Florence, where we have been developing for twenty years methods and resources to investigate metalloproteins with computational approaches.
Affiliation(s)
- Claudia Andreini
- Magnetic Resonance Center, University of Florence, 50019 Sesto Fiorentino, Italy; Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy.
2
Li Q, Liu X, Liu K, Ren H, Jian S, Lu H, Cheng Y, Zou Q, Huang Y. The invasion of Cassytha filiformis accelerated the litter decomposition of native plant communities in small tropical coral islands. BMC Plant Biol 2025; 25:504. [PMID: 40259227 PMCID: PMC12010556 DOI: 10.1186/s12870-025-06556-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2024] [Accepted: 04/15/2025] [Indexed: 04/23/2025]
Abstract
BACKGROUND Plant invasion affects plant community composition, biodiversity, and nutrient cycling in terrestrial ecosystems, particularly in vulnerable ecosystems. As an invasive parasitic plant, Cassytha filiformis has caused extensive damage to the native vegetation of the Paracel Islands. However, the effects of C. filiformis invasion on litter decomposition and nutrient release in native plant communities remain unclear. We conducted an in-situ decomposition experiment in native plant communities on a coral island to explore how litter decomposition dynamics vary with enzyme activities, soil properties, and the degree of C. filiformis invasion. RESULTS The mass loss of litter was determined during the decomposition process. The data showed that litter mass loss under severe invasion was significantly higher than in uninvaded sites after nine months of decomposition. The invasion of C. filiformis accelerated nitrogen release and lignin decomposition, with increased litter quality and polyphenol oxidase activity. In addition, soil phosphorus availability and potassium content induced the oxidase activity. Meanwhile, the decomposition of litter organic carbon was delayed because β-1,4-glucosidase activity was low in the first six months. Peroxidase activity also remained high in invaded plots, indicating that the residues of C. filiformis may have allelopathic effects. CONCLUSION Our results suggest that the invasion of C. filiformis accelerated litter mass loss and element release on coral islands by regulating litter quality and enzyme activity. However, the short-term rapid litter decomposition may result in nutrient loss, which is not conducive to the growth of native plants.
Affiliation(s)
- Qiang Li
- School of Tropical Medicine, Hainan Medical University, Haikou, 571199, China
- Xiao Liu
- School of Geography and Tourism, Qilu Normal University, Jinan, 250200, China
- Ke Liu
- CAS Engineering Laboratory for Vegetation Ecosystem Restoration On Islands and Coastal Zones & Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
- Hai Ren
- CAS Engineering Laboratory for Vegetation Ecosystem Restoration On Islands and Coastal Zones & Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
- Shuguang Jian
- CAS Engineering Laboratory for Vegetation Ecosystem Restoration On Islands and Coastal Zones & Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
- Hongfang Lu
- CAS Engineering Laboratory for Vegetation Ecosystem Restoration On Islands and Coastal Zones & Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
- Yuanhao Cheng
- CAS Engineering Laboratory for Vegetation Ecosystem Restoration On Islands and Coastal Zones & Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
- Qingchi Zou
- Liaoning Natural Forest Protection Center, Shenyang, 110036, China
- Yao Huang
- Ministry of Education Key Laboratory for Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants, School of Ecology, Hainan University, Haikou, 570228, China.
3
Ahsan M, Pindi C, Palermo G. Emerging Mechanisms of Metal-Catalyzed RNA and DNA Modifications. Annu Rev Phys Chem 2025; 76:497-518. [PMID: 39952635 DOI: 10.1146/annurev-physchem-082423-030241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2025]
Abstract
Metal ions play a critical role in various chemical, biological, and environmental processes. This review reports on emerging chemical mechanisms in the catalysis of DNA and RNA. We provide an overview of the metal-dependent mechanisms of DNA cleavage in CRISPR (clustered regularly interspaced short palindromic repeats)-Cas systems that are transforming life sciences through genome editing technologies, and showcase intriguing metal-dependent mechanisms of RNA cleavages. We show that newly discovered CRISPR-Cas complexes operate as protein-assisted ribozymes, highlighting RNA's versatility and the enhancement of CRISPR-Cas functions through strategic metal ion use. We demonstrate the power of computer simulations in observing chemical processes as they unfold and in advancing structural biology through innovative approaches for refining cryo-electron microscopy maps. Understanding metal ion involvement in nucleic acid catalysis is crucial for advancing genome editing, aiding therapeutic interventions for genetic disorders, and improving the editing tools' specificity and efficiency.
Affiliation(s)
- Mohd Ahsan
- Department of Bioengineering, University of California, Riverside, California, USA
- Chinmai Pindi
- Department of Bioengineering, University of California, Riverside, California, USA
- Giulia Palermo
- Department of Bioengineering, University of California, Riverside, California, USA
- Department of Chemistry, University of California, Riverside, California, USA
4
Kim S, Lee W, Kim HI, Kim MK, Choi TS. Recent advances and future challenges in predictive modeling of metalloproteins by artificial intelligence. Mol Cells 2025; 48:100191. [PMID: 39938866 PMCID: PMC11919430 DOI: 10.1016/j.mocell.2025.100191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2024] [Revised: 01/31/2025] [Accepted: 02/01/2025] [Indexed: 02/14/2025] Open
Abstract
Metal coordination is essential for structural/catalytic functions of metalloproteins that mediate a wide range of biological processes in living organisms. Advances in bioinformatics have significantly enhanced our understanding of metal-binding sites and their functional roles in metalloproteins. State-of-the-art computational models developed for metal-binding sites seamlessly integrate protein sequence and structural data to unravel the complexities of metal coordination environments. Our goal in this mini-review is to give an overview of these tools and highlight the current challenges (predicting dynamic metal-binding sites, determining functional metalation states, and designing intricate coordination networks) remaining in the predictive models of metal-binding sites. Addressing these challenges will not only deepen our knowledge of natural metalloproteins but also accelerate the development of artificial metalloproteins with novel and precisely engineered functionalities.
Affiliation(s)
- Soohyeong Kim
- Departments of Chemistry, Korea University, Seoul 02841, Republic of Korea
- Wonseok Lee
- Division of Life Sciences, Korea University, Seoul 02841, Republic of Korea
- Hugh I. Kim
- Departments of Chemistry, Korea University, Seoul 02841, Republic of Korea
- Min Kyung Kim
- College of Pharmacy, Gachon University, Incheon 21936, Republic of Korea
- Tae Su Choi
- Division of Life Sciences, Korea University, Seoul 02841, Republic of Korea
5
van der Weg K, Merdivan E, Piraud M, Gohlke H. TopEC: prediction of Enzyme Commission classes by 3D graph neural networks and localized 3D protein descriptor. Nat Commun 2025; 16:2737. [PMID: 40108108 PMCID: PMC11923149 DOI: 10.1038/s41467-025-57324-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 02/11/2025] [Indexed: 03/22/2025] Open
Abstract
Tools available for inferring enzyme function from general sequence, fold, or evolutionary information are generally successful. However, they can lead to misclassification if a deviation in local structural features influences the function. Here, we present TopEC, a 3D graph neural network based on a localized 3D descriptor to learn chemical reactions of enzymes from enzyme structures and predict Enzyme Commission (EC) classes. Using message-passing frameworks, we include distance and angle information to significantly improve the predictive performance for EC classification (F-score: 0.72) compared to regular 2D graph neural networks. We trained networks without fold bias that can classify enzyme structures for a vast functional space (>800 ECs). Our model is robust to uncertainties in binding site locations and similar functions in distinct binding sites. We observe that TopEC networks learn from an interplay between biochemical features and local shape-dependent features. TopEC is available as a repository on GitHub: https://github.com/IBG4-CBCLab/TopEC and https://doi.org/10.25838/d5p-66 .
Affiliation(s)
- Karel van der Weg
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425, Jülich, Germany
- Erinc Merdivan
- Helmholtz AI Central Unit, Ingolstädter Landstraße 1, 85764, Oberschleißheim, Germany
- Marie Piraud
- Helmholtz AI Central Unit, Ingolstädter Landstraße 1, 85764, Oberschleißheim, Germany
- Holger Gohlke
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425, Jülich, Germany.
- Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, 40225, Düsseldorf, Germany.
6
Xiong G, Xiao Z. Computational approaches for the identification of novel metal-binding pharmacophores: advances and challenges. Drug Discov Today 2025; 30:104293. [PMID: 39805538 DOI: 10.1016/j.drudis.2025.104293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2024] [Revised: 01/02/2025] [Accepted: 01/08/2025] [Indexed: 01/16/2025]
Abstract
Metalloenzymes are important therapeutic targets for a variety of human diseases. Computational approaches have recently emerged as effective tools to understand metal-ligand interactions and expand the structural diversity of both metalloenzyme inhibitors (MIs) and metal-binding pharmacophores (MBPs). In this review, we highlight key advances in currently available fine-tuning modeling methods and data-driven cheminformatic approaches. We also discuss major challenges to the recognition of novel MBPs and MIs. The evidence provided herein could expedite future computational efforts to guide metalloenzyme-based drug discovery.
Affiliation(s)
- Guoli Xiong
- State Key Laboratory of Digestive Health, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Zhiyan Xiao
- State Key Laboratory of Digestive Health, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China; Beijing Key Laboratory of Active Substance Discovery and Druggability Evaluation, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China.
7
Capdevila DA, Rondón JJ, Edmonds KA, Rocchio JS, Dujovne MV, Giedroc DP. Bacterial Metallostasis: Metal Sensing, Metalloproteome Remodeling, and Metal Trafficking. Chem Rev 2024; 124:13574-13659. [PMID: 39658019 DOI: 10.1021/acs.chemrev.4c00264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2024]
Abstract
Transition metals function as structural and catalytic cofactors for a large diversity of proteins and enzymes that collectively comprise the metalloproteome. Metallostasis considers all cellular processes, notably metal sensing, metalloproteome remodeling, and trafficking (or allocation) of metals that collectively ensure the functional integrity and adaptability of the metalloproteome. Bacteria employ both protein and RNA-based mechanisms that sense intracellular transition metal bioavailability and orchestrate systems-level outputs that maintain metallostasis. In this review, we contextualize metallostasis by briefly discussing the metalloproteome and specialized roles that metals play in biology. We then offer a comprehensive perspective on the diversity of metalloregulatory proteins and metal-sensing riboswitches, defining general principles within each sensor superfamily that capture how specificity is encoded in the sequence, and how selectivity can be leveraged in downstream synthetic biology and biotechnology applications. This is followed by a discussion of recent work that highlights selected metalloregulatory outputs, including metalloproteome remodeling and metal allocation by metallochaperones to both client proteins and compartments. We close by briefly discussing places where more work is needed to fill in gaps in our understanding of metallostasis.
Affiliation(s)
- Daiana A Capdevila
- Fundación Instituto Leloir, Instituto de Investigaciones Bioquímicas de Buenos Aires (IIBBA-CONICET), C1405 BWE Buenos Aires, Argentina
- Johnma J Rondón
- Fundación Instituto Leloir, Instituto de Investigaciones Bioquímicas de Buenos Aires (IIBBA-CONICET), C1405 BWE Buenos Aires, Argentina
- Katherine A Edmonds
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405-7102, United States
- Joseph S Rocchio
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405-7102, United States
- Matias Villarruel Dujovne
- Fundación Instituto Leloir, Instituto de Investigaciones Bioquímicas de Buenos Aires (IIBBA-CONICET), C1405 BWE Buenos Aires, Argentina
- David P Giedroc
- Department of Chemistry, Indiana University, Bloomington, Indiana 47405-7102, United States
8
Peng X, Tang H, Zhao Z, Zheng Y, Gui X, Jiang A, He P, Wen X, Zhang Q, Mei Z, Shi Y, Chu C, Zhang Y, Liu G. Intelligent Generic High-Throughput Oscillatory Shear Technology Fabricates Programmable Microrobots for Real-Time Visual Guidance During Embolization. Small 2024:e2408613. [PMID: 39676403 DOI: 10.1002/smll.202408613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2024] [Revised: 11/07/2024] [Indexed: 12/17/2024]
Abstract
Microrobots for endovascular embolization face challenges in precise delivery within dynamic blood vessels. Here, an intelligent generic high-throughput oscillatory shear technology (iGHOST) is proposed to fabricate diversely programmable, multifunctional microrobots capable of real-time visual guidance for in vivo endovascular embolization. Leveraging machine learning (ML), key synthesis parameters affecting the success and sphericity of the microrobots are identified. Therefore, the ML-optimized iGHOST enables continuous production of uniform microrobots with programmable sizes (400-1000 µm) at an ultrahigh rate exceeding 240 mL h⁻¹ by oscillatory segmenting fluid into droplets before ionic cross-linking, and without requiring purification. Particularly, the iGHOST-fabricated magnetically responsive lipiodol-calcium alginate (MagLiCA) microrobots are highly distinguishable under X-ray imaging, which allows for precise navigation in fluid flows of up to 4 mL min⁻¹ and accurate embolization in liver and kidney blood vessels, thus addressing the current issues. Crucially, MagLiCA microrobots possess drug-loading capabilities, enabling simultaneous embolization and site-specific treatment. The iGHOST process is an intelligent, rapid, and green manufacturing method, which can produce size-controllable, multifunctional microrobots with the potential for precise drug delivery and treatment under real-time imaging across various medical applications.
Affiliation(s)
- Xuqi Peng
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, State Key Laboratory of Vaccines for Infectious Diseases, Center for Molecular Imaging and Translational Medicine, Xiang An Biomedicine Laboratory, School of Public Health, Xiamen University, Xiamen, 361102, China
- Haitian Tang
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, State Key Laboratory of Vaccines for Infectious Diseases, Center for Molecular Imaging and Translational Medicine, Xiang An Biomedicine Laboratory, School of Public Health, Xiamen University, Xiamen, 361102, China
- Zhenwen Zhao
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, State Key Laboratory of Vaccines for Infectious Diseases, Center for Molecular Imaging and Translational Medicine, Xiang An Biomedicine Laboratory, School of Public Health, Xiamen University, Xiamen, 361102, China
- Yating Zheng
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, State Key Laboratory of Vaccines for Infectious Diseases, Center for Molecular Imaging and Translational Medicine, Xiang An Biomedicine Laboratory, School of Public Health, Xiamen University, Xiamen, 361102, China
- Xiran Gui
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, State Key Laboratory of Vaccines for Infectious Diseases, Center for Molecular Imaging and Translational Medicine, Xiang An Biomedicine Laboratory, School of Public Health, Xiamen University, Xiamen, 361102, China
- Aijun Jiang
- Department of General Surgery, Naval Medical Center of PLA, Naval Medical University, Shanghai, 200052, China
- Pan He
- Department of General Surgery, Institute of Hepatobiliary-Pancreatic-Intestinal Diseases, Affiliated Hospital of North Sichuan Medical College, Nanchong, 637000, China
- Xiaofei Wen
- Department of Vascular & Tumor Interventional Radiology, The First Affiliated Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen, 361000, China
- Qian Zhang
- Institute of Artificial Intelligence, Xiamen University, Xiamen, 361102, China
- Ziyang Mei
- Institute of Artificial Intelligence, Xiamen University, Xiamen, 361102, China
- Yesi Shi
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, State Key Laboratory of Vaccines for Infectious Diseases, Center for Molecular Imaging and Translational Medicine, Xiang An Biomedicine Laboratory, School of Public Health, Xiamen University, Xiamen, 361102, China
- Chengchao Chu
- Eye Institute of Xiamen University, Fujian Provincial Key Laboratory of Ophthalmology and Visual Science, School of Medicine, Xiamen University, Xiamen, 361102, China
- Yang Zhang
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, State Key Laboratory of Vaccines for Infectious Diseases, Center for Molecular Imaging and Translational Medicine, Xiang An Biomedicine Laboratory, School of Public Health, Xiamen University, Xiamen, 361102, China
- Shen Zhen Research Institute of Xiamen University, Shenzhen, 518057, China
- Gang Liu
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, State Key Laboratory of Vaccines for Infectious Diseases, Center for Molecular Imaging and Translational Medicine, Xiang An Biomedicine Laboratory, School of Public Health, Xiamen University, Xiamen, 361102, China
9
Kamps JJAG, Bosman R, Orville AM, Aller P. Sample efficient approaches in time-resolved X-ray serial crystallography and complementary X-ray emission spectroscopy using drop-on-demand tape-drive systems. Methods Enzymol 2024; 709:57-103. [PMID: 39608948 DOI: 10.1016/bs.mie.2024.10.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2024]
Abstract
Dynamic structural biology enables the study of biological events at the atomic scale over durations ranging from tens of femtoseconds to a few seconds. With the advent of X-ray Free Electron Lasers (XFELs) and 4th generation synchrotrons, serial crystallography is becoming a major player for time-resolved experiments in structural biology. Despite significant progress, challenges remain, such as obtaining sufficient amounts of protein to produce a homogeneous microcrystal slurry. Given this, it has been paramount to develop instrumentation that reduces the amount of microcrystal slurry required for experiments. Tape-drive systems use a conveyor belt made of X-ray transparent material as a motorized solid support to steer deposited microcrystals into the beam. For efficient sample consumption, on-demand ejectors can be synchronized with the X-ray pulses to expose crystals contained in droplets deposited on the tape. Reactions in the crystals can be triggered via various strategies, including pump-probe, substrate/ligand mixing, or gas incubation in the space between droplet ejection and X-ray illumination. Another challenge in time-resolved serial crystallography is interpreting the resulting electron density maps. This is especially difficult for metalloproteins, where the active site metal is intimately involved in catalysis and often proceeds through multiple oxidation states during enzymatic catalysis. The unrestricted space around tape-drive systems can be used to accommodate complementary spectroscopic equipment. Here, we highlight tape-drive sample delivery systems for complementary and simultaneous X-ray diffraction (XRD) and X-ray emission spectroscopy (XES) measurements. We describe how the combination of both XRD and XES is a powerful tool for time-resolved experiments at XFELs and synchrotrons.
Affiliation(s)
- Jos J A G Kamps
- Diamond Light Source, Harwell Science & Innovation Campus, Didcot, United Kingdom; Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, United Kingdom
- Robert Bosman
- Diamond Light Source, Harwell Science & Innovation Campus, Didcot, United Kingdom; University Medical Center Hamburg-Eppendorf (UKE), Hamburg, Germany
- Allen M Orville
- Diamond Light Source, Harwell Science & Innovation Campus, Didcot, United Kingdom; Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, United Kingdom
- Pierre Aller
- Diamond Light Source, Harwell Science & Innovation Campus, Didcot, United Kingdom; Research Complex at Harwell, Rutherford Appleton Laboratory, Didcot, United Kingdom.
10
Zhou L, Tao C, Shen X, Sun X, Wang J, Yuan Q. Unlocking the potential of enzyme engineering via rational computational design strategies. Biotechnol Adv 2024; 73:108376. [PMID: 38740355 DOI: 10.1016/j.biotechadv.2024.108376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Revised: 04/27/2024] [Accepted: 05/08/2024] [Indexed: 05/16/2024]
Abstract
Enzymes play a pivotal role in various industries by enabling efficient, eco-friendly, and sustainable chemical processes. However, the low turnover rates and poor substrate selectivity of enzymes limit their large-scale applications. Rational computational enzyme design, facilitated by computational algorithms, offers a more targeted and less labor-intensive approach. Although there has been notable advancement in employing rational computational protein engineering strategies to overcome these issues, this progress has not been comprehensively reviewed so far. This article reviews recent developments in rational computational enzyme design, categorizing them into three types: structure-based, sequence-based, and data-driven machine learning computational design. Case studies are presented to demonstrate successful enhancements in catalytic activity, stability, and substrate selectivity. Lastly, the article provides a thorough analysis of these approaches, highlights existing challenges and potential solutions, and offers insights into future development directions.
Affiliation(s)
- Lei Zhou
- State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing 100029, China
- Chunmeng Tao
- State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing 100029, China
- Xiaolin Shen
- State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing 100029, China
- Xinxiao Sun
- State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing 100029, China
- Jia Wang
- State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing 100029, China.
- Qipeng Yuan
- State Key Laboratory of Chemical Resource Engineering, Beijing University of Chemical Technology, Beijing 100029, China.
11
Sgueglia G, Vrettas MD, Chino M, De Simone A, Lombardi A. MetalHawk: Enhanced Classification of Metal Coordination Geometries by Artificial Neural Networks. J Chem Inf Model 2024; 64:2356-2367. [PMID: 37956388 PMCID: PMC11005052 DOI: 10.1021/acs.jcim.3c00873] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/08/2023] [Revised: 09/29/2023] [Accepted: 10/26/2023] [Indexed: 11/15/2023]
Abstract
The chemical properties of metal complexes are strongly dependent on the number and geometrical arrangement of ligands coordinated to the metal center. Existing methods for determining either coordination number or geometry rely on a trade-off between accuracy and computational costs, which hinders their application to the study of large structure data sets. Here, we propose MetalHawk (https://github.com/vrettasm/MetalHawk), a machine learning-based approach to perform simultaneous classification of metal site coordination number and geometry through artificial neural networks (ANNs), which were trained using the Cambridge Structural Database (CSD) and Metal Protein Data Bank (MetalPDB). We demonstrate that the CSD-trained model can be used to classify sites belonging to the most common coordination numbers and geometry classes with balanced accuracy equal to 96.51% for CSD-deposited metal sites. The CSD-trained model was also found to be capable of classifying bioinorganic metal sites from the MetalPDB database, with balanced accuracy equal to 84.29% on the whole PDB data set and to 91.66% on manually reviewed sites in the PDB validation set. Moreover, we report evidence that the output vectors of the CSD-trained model can be considered as a proxy indicator of metal-site distortions, showing that these can be interpreted as a low-dimensional representation of subtle geometrical features present in metal site structures.
Affiliation(s)
- Gianmattia Sgueglia
- Department of Chemical Sciences, University of Naples Federico II, Via Cintia 21, 80126 Napoli, Italy
- Michail D. Vrettas
- Department of Pharmacy, University of Naples Federico II, Via Domenico Montesano 49, 80131 Napoli, Italy
- Marco Chino
- Department of Chemical Sciences, University of Naples Federico II, Via Cintia 21, 80126 Napoli, Italy
- Alfonso De Simone
- Department of Pharmacy, University of Naples Federico II, Via Domenico Montesano 49, 80131 Napoli, Italy
- Angela Lombardi
- Department of Chemical Sciences, University of Naples Federico II, Via Cintia 21, 80126 Napoli, Italy
12
Bell EL, Hutton AE, Burke AJ, O'Connell A, Barry A, O'Reilly E, Green AP. Strategies for designing biocatalysts with new functions. Chem Soc Rev 2024; 53:2851-2862. [PMID: 38353665 PMCID: PMC10946311 DOI: 10.1039/d3cs00972f] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/17/2023] [Indexed: 03/19/2024]
Abstract
The engineering of natural enzymes has led to the availability of a broad range of biocatalysts that can be used for the sustainable manufacturing of a variety of chemicals and pharmaceuticals. However, for many important chemical transformations there are no known enzymes that can serve as starting templates for biocatalyst development. These limitations have fuelled efforts to build entirely new catalytic sites into proteins in order to generate enzymes with functions beyond those found in Nature. This bottom-up approach to enzyme development can also reveal new fundamental insights into the molecular origins of efficient protein catalysis. In this tutorial review, we will survey the different strategies that have been explored for designing new protein catalysts. These methods will be illustrated through key selected examples, which demonstrate how highly proficient and selective biocatalysts can be developed through experimental protein engineering and/or computational design. Given the rapid pace of development in the field, we are optimistic that designer enzymes will begin to play an increasingly prominent role as industrial biocatalysts in the coming years.
Collapse
Affiliation(s)
- Elizabeth L Bell
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO 80401, USA
- Manchester Institute of Biotechnology, School of Chemistry, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK.
| | - Amy E Hutton
- Manchester Institute of Biotechnology, School of Chemistry, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK.
| | - Ashleigh J Burke
- Manchester Institute of Biotechnology, School of Chemistry, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK.
- Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093, USA
| | - Adam O'Connell
- School of Chemistry, University College Dublin, Belfield, Dublin 4, Ireland.
| | - Amber Barry
- School of Chemistry, University College Dublin, Belfield, Dublin 4, Ireland.
| | - Elaine O'Reilly
- School of Chemistry, University College Dublin, Belfield, Dublin 4, Ireland.
| | - Anthony P Green
- Manchester Institute of Biotechnology, School of Chemistry, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK.
| |
|
13
|
Zhuang J, Midgley AC, Wei Y, Liu Q, Kong D, Huang X. Machine-Learning-Assisted Nanozyme Design: Lessons from Materials and Engineered Enzymes. Adv Mater 2024; 36:e2210848. [PMID: 36701424 DOI: 10.1002/adma.202210848] [Citation(s) in RCA: 46] [Impact Index Per Article: 46.0] [Received: 11/21/2022] [Revised: 01/03/2023] [Indexed: 05/11/2023]
Abstract
Nanozymes are nanomaterials that exhibit enzyme-like biomimicry. In combination with intrinsic characteristics of nanomaterials, nanozymes have broad applicability in materials science, chemical engineering, bioengineering, biochemistry, and disease theranostics. Recently, the heterogeneity of published results has highlighted the complexity and diversity of nanozymes in terms of consistency of catalytic capacity. Machine learning (ML) shows promising potential for discovering new materials, yet it remains challenging for the design of new nanozymes based on ML approaches. Alternatively, ML is employed to promote optimization of intelligent design and application of catalytic materials and engineered enzymes. Incorporation of the successful ML algorithms used in the intelligent design of catalytic materials and engineered enzymes can concomitantly facilitate the guided development of next-generation nanozymes with desirable properties. Here, recent progress in ML, its utilization in the design of catalytic materials and enzymes, and how emergent ML applications serve as promising strategies to circumvent challenges associated with time-expensive and laborious testing in nanozyme research and development are summarized. The potential applications of successful examples of ML-aided catalytic materials and engineered enzymes in nanozyme design are also highlighted, with special focus on the unified aims in enhancing design and recapitulation of substrate selectivity and catalytic activity.
Affiliation(s)
- Jie Zhuang
- School of Medicine and State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300071, China
| | - Adam C Midgley
- Key Laboratory of Bioactive Materials for the Ministry of Education, College of Life Sciences, State Key Laboratory of Medicinal Chemical Biology, and Frontiers Science Center for Cell Responses, Nankai University, Tianjin 300071, China
| | - Yonghua Wei
- Key Laboratory of Bioactive Materials for the Ministry of Education, College of Life Sciences, State Key Laboratory of Medicinal Chemical Biology, and Frontiers Science Center for Cell Responses, Nankai University, Tianjin 300071, China
| | - Qiqi Liu
- Key Laboratory of Bioactive Materials for the Ministry of Education, College of Life Sciences, State Key Laboratory of Medicinal Chemical Biology, and Frontiers Science Center for Cell Responses, Nankai University, Tianjin 300071, China
| | - Deling Kong
- Key Laboratory of Bioactive Materials for the Ministry of Education, College of Life Sciences, State Key Laboratory of Medicinal Chemical Biology, and Frontiers Science Center for Cell Responses, Nankai University, Tianjin 300071, China
| | - Xinglu Huang
- Key Laboratory of Bioactive Materials for the Ministry of Education, College of Life Sciences, State Key Laboratory of Medicinal Chemical Biology, and Frontiers Science Center for Cell Responses, Nankai University, Tianjin 300071, China
| |
|
14
|
Yang J, Li FZ, Arnold FH. Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering. ACS Cent Sci 2024; 10:226-241. [PMID: 38435522 PMCID: PMC10906252 DOI: 10.1021/acscentsci.3c01275] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 10/17/2023] [Revised: 12/26/2023] [Accepted: 01/16/2024] [Indexed: 03/05/2024]
Abstract
Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
Affiliation(s)
- Jason Yang
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Francesca-Zhoufan Li
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
| | - Frances H. Arnold
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
| |
|
15
|
McGuinness KN, Fehon N, Feehan R, Miller M, Mutter AC, Rybak LA, Nam J, AbuSalim JE, Atkinson JT, Heidari H, Losada N, Kim JD, Koder RL, Lu Y, Silberg JJ, Slusky JSG, Falkowski PG, Nanda V. The energetics and evolution of oxidoreductases in deep time. Proteins 2024; 92:52-59. [PMID: 37596815 DOI: 10.1002/prot.26563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 07/06/2023] [Indexed: 08/20/2023]
Abstract
The core metabolic reactions of life drive electrons through a class of redox protein enzymes, the oxidoreductases. The energetics of electron flow is determined by the redox potentials of organic and inorganic cofactors as tuned by the protein environment. Understanding how protein structure affects oxidation-reduction energetics is crucial for studying metabolism, creating bioelectronic systems, and tracing the history of biological energy utilization on Earth. We constructed ProtReDox (https://protein-redox-potential.web.app), a manually curated database of experimentally determined redox potentials. With over 500 measurements, we can begin to identify how proteins modulate oxidation-reduction energetics across the tree of life. By mapping redox potentials onto networks of oxidoreductase fold evolution, we can infer the evolution of electron transfer energetics over deep time. ProtReDox is designed to include user-contributed submissions with the intention of making it a valuable resource for researchers in this field.
Affiliation(s)
- Kenneth N McGuinness
- Department of Natural Sciences, Caldwell University, Caldwell, New Jersey, USA
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
| | - Nolan Fehon
- Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
| | - Ryan Feehan
- Computational Biology Program, The University of Kansas, Lawrence, Kansas, USA
| | - Michelle Miller
- Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
| | - Andrew C Mutter
- Department of Physics, The City College of New York, New York, New York, USA
| | - Laryssa A Rybak
- Department of Physics, The City College of New York, New York, New York, USA
| | - Justin Nam
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
| | - Jenna E AbuSalim
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
| | - Joshua T Atkinson
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
| | - Hirbod Heidari
- Department of Chemistry, University of Texas at Austin, Austin, Texas, USA
| | - Natalie Losada
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
| | - J Dongun Kim
- Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
| | - Ronald L Koder
- Department of Physics, The City College of New York, New York, New York, USA
| | - Yi Lu
- Department of Chemistry, University of Texas at Austin, Austin, Texas, USA
| | - Jonathan J Silberg
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas, USA
| | - Joanna S G Slusky
- Computational Biology Program, The University of Kansas, Lawrence, Kansas, USA
- Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, USA
| | - Paul G Falkowski
- Environmental Biophysics and Molecular Ecology Program, Department of Marine and Coastal Sciences, Rutgers University, New Brunswick, New Jersey, USA
- Department of Earth and Planetary Sciences, Rutgers University, New Brunswick, New Jersey, USA
| | - Vikas Nanda
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, New Jersey, USA
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers University, Piscataway, New Jersey, USA
| |
|
16
|
Ribeiro AJM, Riziotis IG, Borkakoti N, Thornton JM. Enzyme function and evolution through the lens of bioinformatics. Biochem J 2023; 480:1845-1863. [PMID: 37991346 PMCID: PMC10754289 DOI: 10.1042/bcj20220405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/23/2023]
Abstract
Enzymes have been shaped by evolution over billions of years to catalyse the chemical reactions that support life on earth. Dispersed in the literature, or organised in online databases, knowledge about enzymes can be structured in distinct dimensions, either related to their quality as biological macromolecules, such as their sequence and structure, or related to their chemical functions, such as the catalytic site, kinetics, mechanism, and overall reaction. The evolution of enzymes can only be understood when each of these dimensions is considered. In addition, many of the properties of enzymes only make sense in the light of evolution. We start this review by outlining the main paradigms of enzyme evolution, including gene duplication and divergence, convergent evolution, and evolution by recombination of domains. In the second part, we overview the current collective knowledge about enzymes, as organised by different types of data and collected in several databases. We also highlight some increasingly powerful computational tools that can be used to close gaps in understanding, in particular for types of data that require laborious experimental protocols. We believe that recent advances in protein structure prediction will be a powerful catalyst for the prediction of binding, mechanism, and ultimately, chemical reactions. A comprehensive mapping of enzyme function and evolution may be attainable in the near future.
Affiliation(s)
- Antonio J. M. Ribeiro
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
| | - Ioannis G. Riziotis
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
| | - Neera Borkakoti
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
| | - Janet M. Thornton
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
| |
|
17
|
Laveglia V, Bazayeva M, Andreini C, Rosato A. Hunting down zinc(II)-binding sites in proteins with distance matrices. Bioinformatics 2023; 39:btad653. [PMID: 37878807 PMCID: PMC10630175 DOI: 10.1093/bioinformatics/btad653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 10/17/2023] [Accepted: 10/23/2023] [Indexed: 10/27/2023]
Abstract
MOTIVATION In recent years, high-throughput sequencing technologies have made available the genome sequences of a huge variety of organisms. However, the functional annotation of the encoded proteins often still relies on low-throughput and costly experimental studies. Bioinformatics approaches offer a promising alternative to accelerate this process. In this work, we focus on the binding of zinc(II) ions, which is needed for 5%-10% of any organism's proteins to achieve their physiologically relevant form. RESULTS To implement a predictor of zinc(II)-binding sites in the 3D structures of proteins, we used a neural network, followed by a filter of the network output against the local structure of all known sites. The latter was implemented as a function comparing the distance matrices of the Cα and Cβ atoms of the sites. We called the resulting tool Master of Metals (MOM). The structural models for the entire proteome of an organism generated by AlphaFold can be used as input to our tool in order to achieve annotation at the whole organism level within a few hours. To demonstrate this, we applied MOM to the yeast proteome, obtaining a precision of about 76%, based on data for homologous proteins. AVAILABILITY AND IMPLEMENTATION Master of Metals has been implemented in Python and is available at https://github.com/cerm-cirmmp/Master-of-metals.
Affiliation(s)
- Vincenzo Laveglia
- Department of Chemistry, University of Florence, Sesto Fiorentino 50019, Italy
| | - Milana Bazayeva
- Department of Chemistry, University of Florence, Sesto Fiorentino 50019, Italy
- Magnetic Resonance Center (CERM), University of Florence, Sesto Fiorentino 50019, Italy
| | - Claudia Andreini
- Department of Chemistry, University of Florence, Sesto Fiorentino 50019, Italy
- Magnetic Resonance Center (CERM), University of Florence, Sesto Fiorentino 50019, Italy
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Sesto Fiorentino 50019, Italy
| | - Antonio Rosato
- Department of Chemistry, University of Florence, Sesto Fiorentino 50019, Italy
- Magnetic Resonance Center (CERM), University of Florence, Sesto Fiorentino 50019, Italy
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Sesto Fiorentino 50019, Italy
| |
|
18
|
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and design of protease enzyme specificity using a structure-aware graph convolutional network. Proc Natl Acad Sci U S A 2023; 120:e2303590120. [PMID: 37729196 PMCID: PMC10523478 DOI: 10.1073/pnas.2303590120] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/05/2023] [Accepted: 08/14/2023] [Indexed: 09/22/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key posttranslational modification involved in physiology and disease. The ability to robustly and rapidly predict protease-substrate specificity would also enable targeted proteolytic cleavage by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pretrained PGCN model to guide the design of protease libraries for cleaving two noncanonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| | - Joseph H. Lubin
- Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| | - Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| | | | - Guanyang Wang
- Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| | - Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Department of Statistics, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| | - Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
- Department of Chemistry and Chemical Biology, Rutgers–The State University of New Jersey, Piscataway, NJ 08854
| |
|
19
|
Dürr SL, Levy A, Rothlisberger U. Metal3D: a general deep learning framework for accurate metal ion location prediction in proteins. Nat Commun 2023; 14:2713. [PMID: 37169763 PMCID: PMC10175565 DOI: 10.1038/s41467-023-37870-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/30/2022] [Accepted: 03/29/2023] [Indexed: 05/13/2023]
Abstract
Metal ions are essential cofactors for many proteins and play a crucial role in many applications such as enzyme design or design of protein-protein interactions because they are biologically abundant, tether to the protein using strong interactions, and have favorable catalytic properties. Computational design of metalloproteins is, however, hampered by the complex electronic structure of many biologically relevant metals such as zinc. In this work, we develop two tools, Metal3D (based on 3D convolutional neural networks) and Metal1D (solely based on geometric criteria), to improve the location prediction of zinc ions in protein structures. Comparison with other currently available tools shows that Metal3D is the most accurate zinc ion location predictor to date with predictions within 0.70 ± 0.64 Å of experimental locations. Metal3D outputs a confidence metric for each predicted site and works on proteins with few homologs in the Protein Data Bank. Metal3D predicts a global zinc density that can be used for annotation of computationally predicted structures and a per residue zinc density that can be used in protein design workflows. Currently trained on zinc, the framework of Metal3D is readily extensible to other metals by modifying the training data.
Affiliation(s)
- Simon L Dürr
- Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
| | - Andrea Levy
- Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
| | - Ursula Rothlisberger
- Laboratory of Computational Chemistry and Biochemistry, Institute of Chemical Sciences and Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
| |
|
20
|
Feehan R, Copeland M, Franklin MW, Slusky JSG. MAHOMES II: A webserver for predicting if a metal binding site is enzymatic. Protein Sci 2023; 32:e4626. [PMID: 36916762 PMCID: PMC10044107 DOI: 10.1002/pro.4626] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/30/2022] [Revised: 03/08/2023] [Accepted: 03/10/2023] [Indexed: 03/15/2023]
Abstract
Recent advances have enabled high-quality computationally generated structures for proteins with no solved crystal structures. However, protein function data remains largely limited to experimental methods and homology mapping. Since structure determines function, it is natural that methods capable of using computationally generated structures for functional annotations need to be advanced. Our laboratory recently developed a method to distinguish between metalloenzyme and nonenzyme sites. Here we report improvements to this method by upgrading our physicochemical features to alleviate the need for structures with sub-angstrom precision and using machine learning to reduce training data labeling error. Our improved classifier identifies protein bound metal sites as enzymatic or nonenzymatic with 94% precision and 92% recall. We demonstrate that both adjustments increased predictive performance and reliability on sites with sub-angstrom variations. We constructed a set of predicted metalloprotein structures with no solved crystal structures and no detectable homology to our training data. Our model had an accuracy of 90%-97.5% depending on the quality of the predicted structures included in our test. Finally, we found the physicochemical trends that drove this model's successful performance were local protein density, second shell ionizable residue burial, and the pocket's accessibility to the site. We anticipate that our model's ability to correctly identify catalytic metal sites could enable identification of new enzymatic mechanisms and improve de novo metalloenzyme design success rates.
Affiliation(s)
- Ryan Feehan
- Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, Kansas 66047, USA
| | - Matthew Copeland
- Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, Kansas 66047, USA
| | - Meghan W. Franklin
- Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, Kansas 66047, USA
| | - Joanna S. G. Slusky
- Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, Kansas 66047, USA
- Department of Molecular Biosciences, The University of Kansas, 1200 Sunnyside Ave., Lawrence, Kansas 66045-3101, USA
- Present address: Generate Biomedicines, Somerville, Massachusetts, USA
| |
|
21
|
Feehan R, Copeland M, Franklin MW, Slusky JSG. MAHOMES II: A webserver for predicting if a metal binding site is enzymatic. bioRxiv 2023:2023.03.08.531790. [PMID: 36945603 PMCID: PMC10028950 DOI: 10.1101/2023.03.08.531790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Recent advances have enabled high-quality computationally generated structures for proteins with no solved crystal structures. However, protein function data remains largely limited to experimental methods and homology mapping. Since structure determines function, it is natural that methods capable of using computationally generated structures for functional annotations need to be advanced. Our laboratory recently developed a method to distinguish between metalloenzyme and non-enzyme sites. Here we report improvements to this method by upgrading our physicochemical features to alleviate the need for structures with sub-angstrom precision and using machine learning to reduce training data labeling error. Our improved classifier identifies protein bound metal sites as enzymatic or non-enzymatic with 94% precision and 92% recall. We demonstrate that both adjustments increased predictive performance and reliability on sites with sub-angstrom variations. We constructed a set of predicted metalloprotein structures with no solved crystal structures and no detectable homology to our training data. Our model had an accuracy of 90-97.5% depending on the quality of the predicted structures included in our test. Finally, we found the physicochemical trends that drove this model's successful performance were local protein density, second shell ionizable residue burial, and the pocket's accessibility to the site. We anticipate that our model's ability to correctly identify catalytic metal sites could enable identification of new enzymatic mechanisms and improve de novo metalloenzyme design success rates. Significance statement: Identification of enzyme active sites on proteins with unsolved crystallographic structures can accelerate discovery of novel biochemical reactions, which can impact healthcare, industrial processes, and environmental remediation. Our lab has developed an ML tool for predicting sites on computationally generated protein structures as enzymatic and non-enzymatic. We have made our tool available on a webserver, allowing the scientific community to rapidly search previously unknown protein function space.
Affiliation(s)
- Ryan Feehan
- Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, KS 66047
| | - Matthew Copeland
- Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, KS 66047
| | - Meghan W. Franklin
- Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, KS 66047
| | - Joanna S. G. Slusky
- Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, KS 66047
- Department of Molecular Biosciences, The University of Kansas, 1200 Sunnyside Ave. Lawrence KS 66045-3101
| |
|
22
|
Lu C, Lubin JH, Sarma VV, Stentz SZ, Wang G, Wang S, Khare SD. Prediction and Design of Protease Enzyme Specificity Using a Structure-Aware Graph Convolutional Network. bioRxiv 2023:2023.02.16.528728. [PMID: 36824945 PMCID: PMC9949123 DOI: 10.1101/2023.02.16.528728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
Site-specific proteolysis by the enzymatic cleavage of small linear sequence motifs is a key post-translational modification involved in physiology and disease. The ability to robustly and rapidly predict protease substrate specificity would also enable targeted proteolytic cleavage (editing) of a target protein by designed proteases. Current methods for predicting protease specificity are limited to sequence pattern recognition in experimentally-derived cleavage data obtained for libraries of potential substrates and generated separately for each protease variant. We reasoned that a more semantically rich and robust model of protease specificity could be developed by incorporating the three-dimensional structure and energetics of molecular interactions between protease and substrates into machine learning workflows. We present Protein Graph Convolutional Network (PGCN), which develops a physically-grounded, structure-based molecular interaction graph representation that describes molecular topology and interaction energetics to predict enzyme specificity. We show that PGCN accurately predicts the specificity landscapes of several variants of two model proteases: the NS3/4 protease from the Hepatitis C virus (HCV) and the Tobacco Etch Virus (TEV) proteases. Node and edge ablation tests identified key graph elements for specificity prediction, some of which are consistent with known biochemical constraints for protease:substrate recognition. We used a pre-trained PGCN model to guide the design of TEV protease libraries for cleaving two non-canonical substrates, and found good agreement with experimental cleavage results. Importantly, the model can accurately assess designs featuring diversity at positions not present in the training data. The described methodology should enable the structure-based prediction of specificity landscapes of a wide variety of proteases and the construction of tailor-made protease editors for site-selectively and irreversibly modifying chosen target proteins.
Affiliation(s)
- Changpeng Lu
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Joseph H. Lubin
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Vidur V. Sarma
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
| | | | - Guanyang Wang
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Sijian Wang
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Statistics, Rutgers - The State University of New Jersey, Piscataway, NJ
| | - Sagar D. Khare
- Institute for Quantitative Biomedicine, Rutgers - The State University of New Jersey, Piscataway, NJ
- Department of Chemistry & Chemical Biology, Rutgers - The State University of New Jersey, Piscataway, NJ
| |
|
23
|
Fan R, Suo B, Ding Y. Identification of Vesicle Transport Proteins via Hypergraph Regularized K-Local Hyperplane Distance Nearest Neighbour Model. Front Genet 2022; 13:960388. [PMID: 35910197 PMCID: PMC9326258 DOI: 10.3389/fgene.2022.960388] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Accepted: 06/22/2022] [Indexed: 12/04/2022]
Abstract
The prediction of protein function is a common topic in the field of bioinformatics. In recent years, advances in machine learning have inspired a growing number of algorithms for predicting protein function. A large number of parameters and fairly complex neural networks are often used to improve the prediction performance, an approach that is time-consuming and costly. In this study, we leveraged traditional features and machine learning classifiers to boost the performance of vesicle transport protein identification and make the prediction process faster. We adopt the pseudo position-specific scoring matrix (PsePSSM) feature and our proposed new classifier hypergraph regularized k-local hyperplane distance nearest neighbour (HG-HKNN) to classify vesicular transport proteins. We address dataset imbalances with random undersampling. The results show that our strategy has an area under the receiver operating characteristic curve (AUC) of 0.870 and a Matthews correlation coefficient (MCC) of 0.53 on the benchmark dataset, outperforming all state-of-the-art methods on the same dataset, and other metrics of our model are also comparable to existing methods.
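The two evaluation ingredients named in this abstract, random undersampling and the Matthews correlation coefficient (MCC), can be illustrated with a minimal sketch. It is not the authors' code: the function names `random_undersample` and `matthews_corrcoef`, the 0/1 label encoding, and the fixed seed are assumptions made for illustration only.

```python
import math
import random


def random_undersample(labels, seed=0):
    """Return sorted indices of a class-balanced subset of a binary dataset.

    All samples of the smaller class are kept, and an equally sized random
    draw (without replacement) is taken from the larger class.
    Labels are assumed to be encoded as 0 and 1 (illustrative assumption).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return sorted(minority + rng.sample(majority, len(minority)))


def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

In a workflow of this kind, undersampling would typically be applied to the training split only, so that MCC and AUC on the held-out data still reflect the original class ratio.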
Affiliation(s)
- Rui Fan
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Bing Suo
- Beidahuang Industry Group General Hospital, Harbin, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
24
Andreini C, Rosato A. Structural Bioinformatics and Deep Learning of Metalloproteins: Recent Advances and Applications. Int J Mol Sci 2022; 23:7684. [PMID: 35887033 PMCID: PMC9323969 DOI: 10.3390/ijms23147684] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/16/2022] [Revised: 07/04/2022] [Accepted: 07/06/2022] [Indexed: 02/04/2023] Open
Abstract
All living organisms require metal ions for their energy production and metabolic and biosynthetic processes. Within cells, the metal ions involved in the formation of adducts interact with metabolites and macromolecules (proteins and nucleic acids). The proteins that require binding to one or more metal ions in order to be able to carry out their physiological function are called metalloproteins. About one third of all protein structures in the Protein Data Bank involve metalloproteins. Over the past few years there has been tremendous progress in the number of computational tools and techniques making use of 3D structural information to support the investigation of metalloproteins. This trend has been boosted by the successful applications of neural networks and machine/deep learning approaches in molecular and structural biology at large. In this review, we discuss recent advances in the development and availability of resources dealing with metalloproteins from a structure-based perspective. We start by addressing tools for the prediction of metal-binding sites (MBSs) using structural information on apo-proteins. Then, we provide an overview of the methods for and lessons learned from the structural comparison of MBSs in a fold-independent manner. We then move to describing databases of metalloprotein/MBS structures. Finally, we summarize recent ML/DL applications enhancing the functional interpretation of metalloprotein structures.
Affiliation(s)
- Claudia Andreini
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Magnetic Resonance Center (CERM), Department of Chemistry, University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Antonio Rosato
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Magnetic Resonance Center (CERM), Department of Chemistry, University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
25
Laveglia V, Giachetti A, Sala D, Andreini C, Rosato A. Learning to Identify Physiological and Adventitious Metal-Binding Sites in the Three-Dimensional Structures of Proteins by Following the Hints of a Deep Neural Network. J Chem Inf Model 2022; 62:2951-2960. [PMID: 35679182 PMCID: PMC9241070 DOI: 10.1021/acs.jcim.2c00522] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/18/2023]
Abstract
Thirty-eight percent of protein structures in the Protein Data Bank contain at least one metal ion. However, not all these metal sites are biologically relevant. Cations present as impurities during sample preparation or in the crystallization buffer can cause the formation of protein-metal complexes that do not exist in vivo. We implemented a deep learning approach to build a classifier able to distinguish between physiological and adventitious zinc-binding sites in the 3D structures of metalloproteins. We trained the classifier using manually annotated sites extracted from the MetalPDB database. Using a 10-fold cross-validation procedure, the classifier achieved an accuracy of about 90%. The same neural classifier could predict the physiological relevance of non-heme mononuclear iron sites with an accuracy of nearly 80%, suggesting that the rules learned on zinc sites have general relevance. By quantifying the relative importance of the features describing the input zinc sites from the network perspective and by analyzing the characteristics of the MetalPDB datasets, we inferred some common principles. Physiological sites present a low solvent accessibility of the amino acids forming coordination bonds with the metal ion (the metal ligands), a relatively large number of residues in the metal environment (≥20), and a distinct pattern of conservation of Cys and His residues in the site. Adventitious sites, on the other hand, tend to have a low number of donor atoms from the polypeptide chain (often one or two). These observations support the evaluation of the physiological relevance of novel metal-binding sites in protein structures.
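The 10-fold cross-validation procedure mentioned in this abstract can be sketched generically as follows. This is only an illustration of the evaluation scheme, not the authors' pipeline: the helper names `k_fold_indices` and `cross_validated_accuracy`, the `fit`/`predict` callables, and the choice to pool correct predictions over all folds are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for k shuffled, disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Striding assigns every sample to exactly one fold.
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Accuracy pooled over all held-out folds.

    `fit(X, y)` returns a model; `predict(model, X)` returns predicted labels.
    Both are hypothetical callables supplied by the caller.
    """
    correct = 0
    for train, test in k_fold_indices(len(labels), k, seed):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        preds = predict(model, [features[i] for i in test])
        correct += sum(1 for i, p in zip(test, preds) if labels[i] == p)
    return correct / len(labels)
```

Each site is used for testing exactly once, so the pooled accuracy is an estimate on data the model did not see during training in that fold.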
Affiliation(s)
- Vincenzo Laveglia
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Andrea Giachetti
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Davide Sala
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy; Institute for Drug Discovery, Leipzig University, Brüderstr. 34, 04103 Leipzig, Germany; Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Claudia Andreini
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy; Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy; Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Antonio Rosato
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy; Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy; Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
26
Barnsley KK, Ondrechen MJ. Enzyme active sites: Identification and prediction of function using computational chemistry. Curr Opin Struct Biol 2022; 74:102384. [DOI: 10.1016/j.sbi.2022.102384] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/08/2022] [Revised: 03/20/2022] [Accepted: 03/28/2022] [Indexed: 11/03/2022]
27
Tatta ER, Imchen M, Moopantakath J, Kumavath R. Bioprospecting of microbial enzymes: current trends in industry and healthcare. Appl Microbiol Biotechnol 2022; 106:1813-1835. [PMID: 35254498 DOI: 10.1007/s00253-022-11859-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/15/2022] [Revised: 02/15/2022] [Accepted: 02/26/2022] [Indexed: 12/13/2022]
Abstract
Microbial enzymes have an indispensable role in producing foods, pharmaceuticals, and other commercial goods. Many novel enzymes have been reported from all domains of life, such as plants, microbes, and animals. Nonetheless, industrially desirable enzymes of microbial origin are limited. This review article discusses the classifications, applications, sources, and challenges of most demanded industrial enzymes such as pectinases, cellulase, lipase, and protease. In addition, the production of novel enzymes through protein engineering technologies such as directed evolution, rational, and de novo design, for the improvement of existing industrial enzymes is also explored. We have also explored the role of metagenomics, nanotechnology, OMICs, and machine learning approaches in the bioprospecting of novel enzymes. Overall, this review covers the basics of biocatalysts in industrial and healthcare applications and provides an overview of existing microbial enzyme optimization tools. KEY POINTS: • Microbial bioactive molecules are vital for therapeutic and industrial applications. • High-throughput OMIC is the most proficient approach for novel enzyme discovery. • Comprehensive databases and efficient machine learning models are the need of the hour to fast forward de novo enzyme design and discovery.
Affiliation(s)
- Eswar Rao Tatta
- Department of Genomic Science, School of Biological Sciences, Central University of Kerala, Tejaswini Hills, Periya (PO.), Kasaragod, Kerala, 671320, India
- Madangchanok Imchen
- Department of Genomic Science, School of Biological Sciences, Central University of Kerala, Tejaswini Hills, Periya (PO.), Kasaragod, Kerala, 671320, India
- Jamseel Moopantakath
- Department of Genomic Science, School of Biological Sciences, Central University of Kerala, Tejaswini Hills, Periya (PO.), Kasaragod, Kerala, 671320, India
- Ranjith Kumavath
- Department of Genomic Science, School of Biological Sciences, Central University of Kerala, Tejaswini Hills, Periya (PO.), Kasaragod, Kerala, 671320, India
28
Yu Y, Wang R, Teo RD. Machine Learning Approaches for Metalloproteins. Molecules 2022; 27:1277. [PMID: 35209064 PMCID: PMC8878495 DOI: 10.3390/molecules27041277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/02/2022] [Revised: 02/10/2022] [Accepted: 02/11/2022] [Indexed: 01/10/2023] Open
Abstract
Metalloproteins are a family of proteins characterized by metal ion binding, whereby the presence of these ions confers key catalytic and ligand-binding properties. Due to their ubiquity among biological systems, researchers have made immense efforts to predict the structural and functional roles of metalloproteins. Ultimately, having a comprehensive understanding of metalloproteins will lead to tangible applications, such as designing potent inhibitors in drug discovery. Recently, there has been an acceleration in the number of studies applying machine learning to predict metalloprotein properties, primarily driven by the advent of more sophisticated machine learning algorithms. This review covers how machine learning tools have consolidated and expanded our comprehension of various aspects of metalloproteins (structure, function, stability, ligand-binding interactions, and inhibitors). Future avenues of exploration are also discussed.
Affiliation(s)
- Yue Yu
- Division of Natural and Applied Sciences, Duke Kunshan University, Kunshan, Jiangsu 215316, China
- Department of Physics, Duke University, Durham, NC 27708, USA
- Ruobing Wang
- Department of Chemistry, Duke University, Durham, NC 27708, USA
- Ruijie D. Teo
- Department of Chemistry, Duke University, Durham, NC 27708, USA
- UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
29
Wehrspan ZJ, McDonnell RT, Elcock AH. Identification of Iron-Sulfur (Fe-S) Cluster and Zinc (Zn) Binding Sites Within Proteomes Predicted by DeepMind's AlphaFold2 Program Dramatically Expands the Metalloproteome. J Mol Biol 2022; 434:167377. [PMID: 34838520 PMCID: PMC8785651 DOI: 10.1016/j.jmb.2021.167377] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 10/12/2021] [Revised: 11/17/2021] [Accepted: 11/18/2021] [Indexed: 02/01/2023]
Abstract
DeepMind's AlphaFold2 software has ushered in a revolution in high quality, 3D protein structure prediction. In very recent work by the DeepMind team, structure predictions have been made for entire proteomes of twenty-one organisms, with >360,000 structures made available for download. Here we show that thousands of novel binding sites for iron-sulfur (Fe-S) clusters and zinc (Zn) ions can be identified within these predicted structures by exhaustive enumeration of all potential ligand-binding orientations. We demonstrate that AlphaFold2 routinely makes highly specific predictions of ligand binding sites: for example, binding sites that are comprised exclusively of four cysteine sidechains fall into three clusters, representing binding sites for 4Fe-4S clusters, 2Fe-2S clusters, or individual Zn ions. We show further: (a) that the majority of known Fe-S cluster and Zn binding sites documented in UniProt are recovered by the AlphaFold2 structures, (b) that there are occasional disputes between AlphaFold2 and UniProt with AlphaFold2 predicting highly plausible alternative binding sites, (c) that the Fe-S cluster binding sites that we identify in E. coli agree well with previous bioinformatics predictions, (d) that cysteines predicted here to be part of ligand binding sites show little overlap with those shown via chemoproteomics techniques to be highly reactive, and (e) that AlphaFold2 occasionally appears to build erroneous disulfide bonds between cysteines that should instead coordinate a ligand. These results suggest that AlphaFold2 could be an important tool for the functional annotation of proteomes, and the methodology presented here is likely to be useful for predicting other ligand-binding sites.
Affiliation(s)
- Adrian H Elcock
- Department of Biochemistry, University of Iowa, Iowa City, IA, USA
30
Han K, Shen LC, Zhu YH, Xu J, Song J, Yu DJ. MAResNet: predicting transcription factor binding sites by combining multi-scale bottom-up and top-down attention and residual network. Brief Bioinform 2021; 23:6399874. [PMID: 34664074 DOI: 10.1093/bib/bbab445] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/25/2021] [Revised: 09/06/2021] [Accepted: 09/28/2021] [Indexed: 11/14/2022] Open
Abstract
Accurate identification of transcription factor binding sites is of great significance in understanding gene expression, biological development and drug design. Although a variety of methods based on deep-learning models and large-scale data have been developed to predict transcription factor binding sites in DNA sequences, there is room for further improvement in prediction performance. In addition, effective interpretation of deep-learning models is greatly desirable. Here we present MAResNet, a new deep-learning method, for predicting transcription factor binding sites on 690 ChIP-seq datasets. More specifically, MAResNet combines the bottom-up and top-down attention mechanisms and a state-of-the-art feed-forward network (ResNet), which is constructed by stacking attention modules that generate attention-aware features. In particular, the multi-scale attention mechanism is utilized at the first stage to extract rich and representative sequence features. We further discuss the attention-aware features learned from different attention modules in accordance with the changes as the layers go deeper. The features learned by MAResNet are also visualized through the TMAP tool to illustrate that the method can extract the unique characteristics of transcription factor binding sites. The performance of MAResNet is extensively tested on 690 test subsets with an average AUC of 0.927, which is higher than that of the current state-of-the-art methods. Overall, this study provides a new and useful framework for the prediction of transcription factor binding sites by combining the funnel attention modules with the residual network.
Affiliation(s)
- Ke Han
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Long-Chen Shen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Yi-Heng Zhu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Jian Xu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing, 210094, China
31
Feehan R, Montezano D, Slusky JSG. Machine learning for enzyme engineering, selection and design. Protein Eng Des Sel 2021; 34:gzab019. [PMID: 34296736 PMCID: PMC8299298 DOI: 10.1093/protein/gzab019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/14/2020] [Revised: 06/18/2021] [Accepted: 06/23/2021] [Indexed: 11/15/2022] Open
Abstract
Machine learning is a useful computational tool for large and complex tasks such as those in the field of enzyme engineering, selection and design. In this review, we examine enzyme-related applications of machine learning. We start by comparing tools that can identify the function of an enzyme and the site responsible for that function. Then we detail methods for optimizing important experimental properties, such as the enzyme environment and enzyme reactants. We describe recent advances in enzyme systems design and enzyme design itself. Throughout we compare and contrast the data and algorithms used for these tasks to illustrate how the algorithms and data can be best used by future designers.
Affiliation(s)
- Ryan Feehan
- Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, KS 66047-1620, USA
- Daniel Montezano
- Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, KS 66047-1620, USA
- Joanna S G Slusky
- Center for Computational Biology, The University of Kansas, 2030 Becker Dr., Lawrence, KS 66047-1620, USA
- Department of Molecular Biosciences, The University of Kansas, 1200 Sunnyside Ave., Lawrence, KS 66045-7600, USA