1
Zhou R, Fan J, Li S, Zeng W, Chen Y, Zheng X, Chen H, Liao J. LVPocket: integrated 3D global-local information to protein binding pockets prediction with transfer learning of protein structure classification. J Cheminform 2024; 16:79. [PMID: 38972994 PMCID: PMC11229186 DOI: 10.1186/s13321-024-00871-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 06/12/2024] [Indexed: 07/09/2024] Open
Abstract
BACKGROUND Previous deep learning methods for predicting protein binding pockets mainly employed 3D convolution, yet an abundance of convolution operations may lead the model to prioritize local information excessively and overlook global information. Moreover, it is essential to account for the influence of diverse protein folding structural classes, because proteins in different structural classes exhibit varying biological functions, whereas those within the same structural class share similar functional attributes. RESULTS We proposed LVPocket, a novel method that captures both local and global information of protein structure through the integration of Transformer encoders, which helps the model achieve better performance in binding pocket prediction. We then tailored prediction models for four distinct structural classes of proteins using transfer learning. The four fine-tuned models were fine-tuned from the baseline LVPocket model, which was trained on the sc-PDB dataset. LVPocket exhibits superior performance on three independent datasets compared to current state-of-the-art methods. Additionally, the fine-tuned models outperform the baseline model. SCIENTIFIC CONTRIBUTION We present a novel model structure for predicting protein binding pockets that addresses the problem of relying on extensive convolutional computation while neglecting global information about protein structures. Furthermore, we tackle the impact of different protein folding structures on binding pocket prediction tasks through the application of transfer learning methods.
Affiliation(s)
- Ruifeng Zhou
  - School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
- Jing Fan
  - School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
- Sishu Li
  - School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
- Wenjie Zeng
  - School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
- Yilun Chen
  - School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
- Xiaoshan Zheng
  - School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
- Hongyang Chen
  - Research Center for Graph Computing, Zhejiang Lab, Hangzhou, 311121, Zhejiang, People's Republic of China
- Jun Liao
  - School of Science, China Pharmaceutical University, Nanjing, 210009, Jiangsu, People's Republic of China
  - Zhejiang Lab, Hangzhou, 311121, Zhejiang, People's Republic of China
2
Yu Y, Rué Casamajo A, Finnigan W, Schnepel C, Barker R, Morrill C, Heath RS, De Maria L, Turner NJ, Scrutton NS. Structure-Based Design of Small Imine Reductase Panels for Target Substrates. ACS Catal 2023; 13:12310-12321. [PMID: 37736118 PMCID: PMC10510103 DOI: 10.1021/acscatal.3c02278] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/19/2023] [Revised: 08/20/2023] [Indexed: 09/23/2023]
Abstract
Biocatalysis is important in the discovery, development, and manufacture of pharmaceuticals. However, the identification of enzymes for target transformations of interest requires major screening efforts. Here, we report a structure-based computational workflow to prioritize protein sequences by a score based on predicted activities on substrates, thereby reducing a resource-intensive laboratory-based biocatalyst screening. We selected imine reductases (IREDs) as a class of biocatalysts to illustrate the application of the computational workflow termed IREDFisher. Validation by using published data showed that IREDFisher can retrieve the best enzymes and increase the hit rate by identifying the top 20 ranked sequences. The power of IREDFisher is confirmed by computationally screening 1400 sequences for chosen reductive amination reactions with different levels of complexity. Highly active IREDs were identified by only testing 20 samples in vitro. Our speed test shows that it only takes 90 min to rank 85 sequences from user input and 30 min for the established IREDFisher database containing 591 IRED sequences. IREDFisher is available as a user-friendly web interface (https://enzymeevolver.com/IREDFisher). IREDFisher enables the rapid discovery of IREDs for applications in synthesis and directed evolution studies, with minimal time and resource expenditure. Future use of the workflow with other enzyme families could be implemented following the modification of the workflow scoring function.
Affiliation(s)
- Yuqi Yu
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
  - Augmented Biologics Discovery & Design, Department of Biologics Engineering, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB21 6GH, U.K.
- Arnau Rué Casamajo
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- William Finnigan
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Christian Schnepel
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Rhys Barker
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Charlotte Morrill
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Rachel S. Heath
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Leonardo De Maria
  - Medicinal Chemistry, Research and Early Development, Respiratory and Immunology (RI), BioPharmaceuticals R&D, AstraZeneca, Gothenburg 43150, Sweden
- Nicholas J. Turner
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
- Nigel S. Scrutton
  - Department of Chemistry, The University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, U.K.
3
Chang Y, Hawkins BA, Du JJ, Groundwater PW, Hibbs DE, Lai F. A Guide to In Silico Drug Design. Pharmaceutics 2022; 15:pharmaceutics15010049. [PMID: 36678678 PMCID: PMC9867171 DOI: 10.3390/pharmaceutics15010049] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 11/09/2022] [Revised: 12/16/2022] [Accepted: 12/17/2022] [Indexed: 12/28/2022] Open
Abstract
The drug discovery process is a rocky path that is full of challenges, with the result that very few candidates progress from hit compound to a commercially available product, often due to factors, such as poor binding affinity, off-target effects, or physicochemical properties, such as solubility or stability. This process is further complicated by high research and development costs and time requirements. It is thus important to optimise every step of the process in order to maximise the chances of success. As a result of the recent advancements in computer power and technology, computer-aided drug design (CADD) has become an integral part of modern drug discovery to guide and accelerate the process. In this review, we present an overview of the important CADD methods and applications, such as in silico structure prediction, refinement, modelling and target validation, that are commonly used in this area.
Affiliation(s)
- Yiqun Chang
  - Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
- Bryson A. Hawkins
  - Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
- Jonathan J. Du
  - Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA
- Paul W. Groundwater
  - Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
- David E. Hibbs
  - Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
- Felcia Lai
  - Sydney Pharmacy School, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
4
Liao J, Wang Q, Wu F, Huang Z. In Silico Methods for Identification of Potential Active Sites of Therapeutic Targets. Molecules 2022; 27:7103. [PMID: 36296697 PMCID: PMC9609013 DOI: 10.3390/molecules27207103] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 06/30/2022] [Revised: 08/12/2022] [Accepted: 08/25/2022] [Indexed: 07/30/2023] Open
Abstract
Target identification is an important step in drug discovery, and computer-aided drug target identification methods are attracting more attention compared with traditional drug target identification methods, which are time-consuming and costly. Computer-aided drug target identification methods can greatly reduce the searching scope of experimental targets and associated costs by identifying the disease-related targets and their binding sites and evaluating the druggability of the predicted active sites for clinical trials. In this review, we introduce the principles of computer-based active site identification methods, including the identification of binding sites and assessment of druggability. We provide some guidelines for selecting methods for the identification of binding sites and assessment of druggability. In addition, we list the databases and tools commonly used with these methods, present examples of individual and combined applications, and compare the methods and tools. Finally, we discuss the challenges and limitations of binding site identification and druggability assessment at the current stage and provide some recommendations and future perspectives.
Affiliation(s)
- Jianbo Liao
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Key Laboratory of Computer-Aided Drug Design of Dongguan City, Key Laboratory for Research and Development of Natural Drugs of Guangdong Province, School of Pharmacy, Guangdong Medical University, Dongguan 523808, China
  - The Second School of Clinical Medicine, Guangdong Medical University, Dongguan 523808, China
- Qinyu Wang
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Key Laboratory of Computer-Aided Drug Design of Dongguan City, Key Laboratory for Research and Development of Natural Drugs of Guangdong Province, School of Pharmacy, Guangdong Medical University, Dongguan 523808, China
- Fengxu Wu
  - Hubei Key Laboratory of Wudang Local Chinese Medicine Research, School of Pharmaceutical Sciences, Hubei University of Medicine, Shiyan 442000, China
- Zunnan Huang
  - Key Laboratory of Big Data Mining and Precision Drug Design of Guangdong Medical University, Key Laboratory of Computer-Aided Drug Design of Dongguan City, Key Laboratory for Research and Development of Natural Drugs of Guangdong Province, School of Pharmacy, Guangdong Medical University, Dongguan 523808, China
  - Marine Biomedical Research Institute of Guangdong Zhanjiang, Zhanjiang 524023, China
5
Toti D, Macari G, Barbierato E, Polticelli F. FGDB: a comprehensive graph database of ligand fragments from the Protein Data Bank. Database (Oxford) 2022; 2022:6619197. [PMID: 35763362 PMCID: PMC9239314 DOI: 10.1093/database/baac044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2022] [Revised: 05/06/2022] [Accepted: 05/31/2022] [Indexed: 11/22/2022]
Abstract
This work presents Fragment Graph DataBase (FGDB), a graph database of ligand fragments extracted and generated from the protein entries available in the Protein Data Bank (PDB). FGDB is meant to support and elicit campaigns of fragment-based drug design, by enabling users to query it in order to construct ad hoc, target-specific libraries. In this regard, the database features more than 17 000 fragments, typically small, highly soluble and chemically stable molecules expressed via their canonical Simplified Molecular Input Line Entry System (SMILES) representation. For these fragments, the database provides information related to their contact frequencies with the amino acids, the ligands they are contained in and the proteins the latter bind to. The graph database can be queried via standard web forms and textual searches by a number of identifiers (SMILES, ligand and protein PDB ids) as well as via graphical queries that can be performed against the graph itself, providing users with an intuitive and effective view upon the underlying biological entities. Further search mechanisms via advanced conjunctive/disjunctive/negated textual queries are also possible, in order to allow scientists to look for specific relationships and export their results for further studies. This work also presents two sample use cases where maternal embryonic leucine zipper kinase and mesotrypsin are used as a target, being proteins of high biomedical relevance for the development of cancer therapies. Database URL: http://biochimica3.bio.uniroma3.it/fragments-web/
Affiliation(s)
- Daniele Toti
  - Department of Mathematics and Physics, Catholic University of the Sacred Heart, Faculty of Mathematical, Physical and Natural Sciences, via della Garzetta 48, Brescia 25133, Italy
- Gabriele Macari
  - Department of Sciences, Roma Tre University, viale Marconi 446, Roma, Lazio 00146, Italy
- Enrico Barbierato
  - Department of Mathematics and Physics, Catholic University of the Sacred Heart, Faculty of Mathematical, Physical and Natural Sciences, via della Garzetta 48, Brescia 25133, Italy
- Fabio Polticelli
  - Department of Sciences, Roma Tre University, viale Marconi 446, Roma, Lazio 00146, Italy
  - Roma Tre Section, National Institute of Nuclear Physics, via della Vasca Navale 84, Roma 00146, Italy
6
Pazos F. Computational prediction of protein functional sites-Applications in biotechnology and biomedicine. Advances in Protein Chemistry and Structural Biology 2022; 130:39-57. [PMID: 35534114 DOI: 10.1016/bs.apcsb.2021.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/14/2023]
Abstract
There are many computational approaches for predicting protein functional sites based on different sequence and structural features. These methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. They complement the more expensive and time-consuming experimental approaches by pointing them to possible candidate positions. In many cases they are jointly used to characterize the functional sites in proteins of biotechnological and biomedical interest and eventually modify them for different purposes. There is a clear trend towards approaches based on machine learning and those using structural information, due to the recent developments in these areas. Nevertheless, "classic" methods based on sequence and evolutionary features are still playing an important role as these features are strongly related to functionality. In this review, the main approaches for predicting general functional sites in a protein are discussed, with a focus on sequence-based approaches.
Affiliation(s)
- Florencio Pazos
  - Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Madrid, Spain
7
Guterres H, Park SJ, Zhang H, Im W. CHARMM-GUI LBS Finder & Refiner for Ligand Binding Site Prediction and Refinement. J Chem Inf Model 2021; 61:3744-3751. [PMID: 34296608 DOI: 10.1021/acs.jcim.1c00561] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/25/2022]
Abstract
A protein performs its task by binding a variety of ligands in its local region that is also known as the ligand-binding-site (LBS). Therefore, accurate prediction, characterization, and refinement of LBS can facilitate protein functional annotations and structure-based drug design. In this work, we present CHARMM-GUI LBS Finder & Refiner (https://www.charmm-gui.org/input/lbsfinder) that predicts potential LBS, offers interactive features for local LBS structure analysis, and prepares various molecular dynamics (MD) systems and inputs by setting up distance restraint potentials for LBS structure refinement. LBS Finder & Refiner supports 5 different commonly used simulation programs, such as NAMD, AMBER, GROMACS, GENESIS, and OpenMM, for LBS structure refinement together with hydrogen mass repartitioning. The capability of LBS Finder & Refiner is illustrated through LBS structure predictions and refinements of 48 modeled and 20 apo benchmark target proteins. Overall, successful LBS structure predictions and refinements are seen in our benchmark tests. We hope that LBS Finder & Refiner is useful to predict, characterize, and refine potential LBS on any given protein of interest.
Affiliation(s)
- Hugo Guterres
  - Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Sang-Jun Park
  - Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Han Zhang
  - Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
- Wonpil Im
  - Departments of Biological Sciences, Chemistry, Bioengineering, and Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania 18015, United States
8
Rauer C, Sen N, Waman VP, Abbasian M, Orengo CA. Computational approaches to predict protein functional families and functional sites. Curr Opin Struct Biol 2021; 70:108-122. [PMID: 34225010 DOI: 10.1016/j.sbi.2021.05.012] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/29/2021] [Revised: 05/13/2021] [Accepted: 05/25/2021] [Indexed: 01/06/2023]
Abstract
Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features.
Affiliation(s)
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Neeladri Sen
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Mahnaz Abbasian
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
9
Marques SM, Planas-Iglesias J, Damborsky J. Web-based tools for computational enzyme design. Curr Opin Struct Biol 2021; 69:19-34. [PMID: 33667757 DOI: 10.1016/j.sbi.2021.01.010] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 12/01/2020] [Revised: 01/14/2021] [Accepted: 01/27/2021] [Indexed: 12/30/2022]
Abstract
Enzymes are in high demand for very diverse biotechnological applications. However, natural biocatalysts often need to be engineered for fine-tuning their properties towards the end applications, such as the activity, selectivity, stability to temperature or co-solvents, and solubility. Computational methods are increasingly used in this task, providing predictions that narrow down the space of possible mutations significantly and can enormously reduce the experimental burden. Many computational tools are available as web-based platforms, making them accessible to non-expert users. These platforms are typically user-friendly, contain walk-throughs, and do not require deep expertise and installations. Here we describe some of the most recent outstanding web-tools for enzyme engineering and formulate future perspectives in this field.
Affiliation(s)
- Sérgio M Marques
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic
  - International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Joan Planas-Iglesias
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic
  - International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Damborsky
  - Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic
  - International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
10
Mylonas SK, Axenopoulos A, Daras P. DeepSurf: A surface-based deep learning approach for the prediction of ligand binding sites on proteins. Bioinformatics 2021; 37:1681-1690. [PMID: 33471069 DOI: 10.1093/bioinformatics/btab009] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 04/17/2020] [Revised: 12/16/2020] [Accepted: 01/05/2021] [Indexed: 12/29/2022] Open
Abstract
MOTIVATION The knowledge of potentially druggable binding sites on proteins is an important preliminary step towards the discovery of novel drugs. The computational prediction of such areas can be boosted by following the recent major advances in the deep learning field and by exploiting the increasing availability of proper data. RESULTS In this paper, a novel computational method for the prediction of potential binding sites is proposed, called DeepSurf. DeepSurf combines a surface-based representation, where a number of 3D voxelized grids are placed on the protein's surface, with state-of-the-art deep learning architectures. After being trained on the large database of scPDB, DeepSurf demonstrates superior results on three diverse testing datasets, by surpassing all its main deep learning-based competitors, while attaining competitive performance to a set of traditional non-data-driven approaches. AVAILABILITY The source code of the method along with trained models are freely available at https://github.com/stemylonas/DeepSurf.git. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Stelios K Mylonas
  - Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, 57001, Greece
- Apostolos Axenopoulos
  - Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, 57001, Greece
- Petros Daras
  - Information Technologies Institute, Centre for Research and Technology Hellas, Thessaloniki, 57001, Greece
11
Macari G, Toti D, Pasquadibisceglie A, Polticelli F. DockingApp RF: A State-of-the-Art Novel Scoring Function for Molecular Docking in a User-Friendly Interface to AutoDock Vina. Int J Mol Sci 2020; 21:ijms21249548. [PMID: 33333976 PMCID: PMC7765429 DOI: 10.3390/ijms21249548] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/26/2020] [Revised: 12/11/2020] [Accepted: 12/11/2020] [Indexed: 11/28/2022] Open
Abstract
Motivation: Bringing a new drug to the market is expensive and time-consuming. To cut the costs and time, computer-aided drug design (CADD) approaches have been increasingly included in the drug discovery pipeline. However, although traditional docking tools show a good conformational space sampling ability, they are still unable to produce accurate binding affinity predictions. This work presents a novel scoring function for molecular docking seamlessly integrated into DockingApp, a user-friendly graphical interface for AutoDock Vina. The proposed function is based on a random forest model and a selection of specific features to overcome the existing limits of Vina’s original scoring mechanism. A novel version of DockingApp, named DockingApp RF, has been developed to host the proposed scoring function and to automate the rescoring of AutoDock Vina output, even for nonexpert users. Results: By coupling intermolecular interaction, solvent accessible surface area features and Vina’s energy terms, DockingApp RF’s new scoring function is able to improve the binding affinity prediction of AutoDock Vina. Furthermore, comparison tests carried out on the CASF-2013 and CASF-2016 datasets demonstrate that DockingApp RF’s performance is comparable to other state-of-the-art machine-learning- and deep-learning-based scoring functions. The new scoring function thus represents a significant advancement in terms of the reliability and effectiveness of docking compared to AutoDock Vina’s scoring function. At the same time, the characteristics that made DockingApp appealing to a wide range of users are retained in this new version and have been complemented with additional features.
Affiliation(s)
- Gabriele Macari
  - Department of Sciences, Roma Tre University, 00146 Rome, Italy
- Daniele Toti
  - Faculty of Mathematical, Physical and Natural Sciences, Catholic University of the Sacred Heart, 25121 Brescia, Italy
- Fabio Polticelli
  - Department of Sciences, Roma Tre University, 00146 Rome, Italy
  - National Institute of Nuclear Physics, Roma Tre Section, 00146 Rome, Italy
12
Caprari S, Brandi V, Pasquadibisceglie A, Polticelli F. Uncovering the structure and function of Pseudomonas aeruginosa periplasmic proteins by an in silico approach. J Biomol Struct Dyn 2019; 38:4508-4520. [PMID: 31631799 DOI: 10.1080/07391102.2019.1683468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Abstract
Pseudomonas aeruginosa is an opportunistic human pathogen highly relevant from a biomedical viewpoint. It is one of the main causes of infection in hospitalized patients and a major cause of mortality of cystic fibrosis patients. This is also due to its ability to develop resistance to antibiotics by various mechanisms. Therefore, it is urgent and desirable to identify novel targets for the development of new antibacterial drugs against Pseudomonas aeruginosa. In this work this problem was tackled by an in silico approach aimed at providing a reliable structural model and functional annotation for the Pseudomonas aeruginosa periplasmic proteins for which these data are not available yet. A total of 83 protein sequences were analyzed, and the corresponding structural models were built, leading to the identification of 32 periplasmic 'substrate-binding proteins', 14 enzymes and 4 proteins with different functions, including lipids and metals binding. The most interesting cases were found within the 'enzymes' group with the identification of a lipase, which can be regarded as a virulence factor, a protease involved in the assembly of β-barrel membrane proteins and a l,d-transpeptidase, which could contribute to confer resistance to β-lactam antibiotics to the bacterium. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Silvia Caprari
  - Department of Sciences, Roma Tre University, Rome, Italy
- Fabio Polticelli
  - Department of Sciences, Roma Tre University, Rome, Italy
  - National Institute of Nuclear Physics, Roma Tre Section, Rome, Italy
13
Computational methods and tools for binding site recognition between proteins and small molecules: from classical geometrical approaches to modern machine learning strategies. J Comput Aided Mol Des 2019; 33:887-903. [PMID: 31628659 DOI: 10.1007/s10822-019-00235-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/01/2019] [Accepted: 10/11/2019] [Indexed: 10/25/2022]
Abstract
In the current "genomic era" the number of identified genes is growing exponentially. However, the biological function of a large number of the corresponding proteins is still unknown. Recognition of small molecule ligands (e.g., substrates, inhibitors, allosteric regulators, etc.) is pivotal for protein functions in the vast majority of the cases and knowledge of the region where these processes take place is essential for protein function prediction and drug design. In this regard, computational methods represent essential tools to tackle this problem. A significant number of software tools have been developed in the last few years which exploit either protein sequence information, structure information or both. This review describes the most recent developments in protein function recognition and binding site prediction, in terms of both freely-available and commercial solutions and tools, detailing the main characteristics of the considered tools and providing a comparative analysis of their performance.
|
14
|
Jendele L, Krivak R, Skoda P, Novotny M, Hoksza D. PrankWeb: a web server for ligand binding site prediction and visualization. Nucleic Acids Res 2019; 47:W345-W349. [PMID: 31114880 PMCID: PMC6602436 DOI: 10.1093/nar/gkz424] [Citation(s) in RCA: 215] [Impact Index Per Article: 35.8] [Received: 03/18/2019] [Revised: 04/27/2019] [Accepted: 05/09/2019] [Indexed: 11/12/2022] Open
Abstract
PrankWeb is an online resource providing an interface to P2Rank, a state-of-the-art method for ligand binding site prediction. P2Rank is a template-free machine learning method based on the prediction of local chemical neighborhood ligandability centered on points placed on a solvent-accessible protein surface. Points with a high ligandability score are then clustered to form the resulting ligand binding sites. In addition, PrankWeb provides a web interface enabling users to easily carry out the prediction and visually inspect the predicted binding sites via an integrated sequence-structure view. Moreover, PrankWeb can determine sequence conservation for the input molecule and use this in both the prediction and result visualization steps. Alongside its online visualization options, PrankWeb also offers the possibility of exporting the results as a PyMOL script for offline visualization. The web frontend communicates with the server side via a REST API. In high-throughput scenarios, therefore, users can utilize the server API directly, bypassing the need for a web-based frontend or installation of the P2Rank application. PrankWeb is available at http://prankweb.cz/, while the web application source code and the P2Rank method can be accessed at https://github.com/jendelel/PrankWebApp and https://github.com/rdk/p2rank, respectively.
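The score-then-cluster step described in this abstract can be illustrated with a minimal sketch. It is not P2Rank's implementation: the function name, the score threshold of 0.5, the 3.0 Å linkage cutoff, and ranking pockets by summed score are all assumptions chosen for illustration, and the per-point ligandability scores are taken as given rather than predicted.

```python
from math import dist

def cluster_ligandable_points(points, scores, score_threshold=0.5, link_cutoff=3.0):
    """Group high-scoring surface points into candidate pockets.

    points: list of (x, y, z) coordinates on the solvent-accessible surface.
    scores: per-point ligandability scores in [0, 1] (assumed precomputed).
    Threshold and cutoff values are illustrative assumptions.
    """
    # Keep only points whose score passes the threshold.
    unvisited = {i for i, s in enumerate(scores) if s >= score_threshold}
    pockets = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        # Single-linkage growth: add any kept point within link_cutoff
        # of a point that is already in the cluster.
        while frontier:
            current = frontier.pop()
            near = [j for j in unvisited
                    if dist(points[current], points[j]) <= link_cutoff]
            for j in near:
                unvisited.remove(j)
                cluster.append(j)
                frontier.append(j)
        pockets.append(sorted(cluster))
    # Rank candidate pockets by summed score (an assumed ranking rule).
    pockets.sort(key=lambda c: sum(scores[i] for i in c), reverse=True)
    return pockets

if __name__ == "__main__":
    pts = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (50, 0, 0)]
    scr = [0.9, 0.8, 0.7, 0.6, 0.9, 0.1]
    print(cluster_ligandable_points(pts, scr))
```

Each returned pocket is a list of point indices; the low-scoring point at index 5 is discarded before clustering.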
Affiliation(s)
- Lukas Jendele
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Czech Republic
| | - Radoslav Krivak
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Czech Republic
| | - Petr Skoda
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Czech Republic
| | - Marian Novotny
- Department of Cell Biology, Faculty of Science, Charles University, Czech Republic
| | - David Hoksza
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Czech Republic
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg
| |
|
15
|
Fragment-Based Ligand-Protein Contact Statistics: Application to Docking Simulations. Int J Mol Sci 2019; 20:2499. [PMID: 31117183 PMCID: PMC6567162 DOI: 10.3390/ijms20102499] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/02/2019] [Revised: 05/16/2019] [Accepted: 05/17/2019] [Indexed: 01/26/2023] Open
Abstract
In this work, the information contained in the contacts between fragments of small-molecule ligands and protein residues has been collected and its exploitability has been verified by using the scoring of docking simulations as a test case for bringing about a proof of concept. Contact statistics between small-molecule fragments and binding site residues were collected and analyzed using a dataset composed of 200,000+ binding sites and associated ligands, derived from the database of the LIBRA ligand binding site recognition software, as a starting point. The fragments were generated by applying the decomposition algorithm implemented in BRICS. A simple "potential" based on the contact frequencies was tested against the CASF-2013 benchmark; its performance was then evaluated through the rescoring of docking poses generated for the DUD-E dataset. The results obtained indicate that this approach, its simplicity notwithstanding, yields promising results that are comparable, and in some cases, superior, to those obtained with other, more complex scoring functions.
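The abstract does not give the functional form of the contact-frequency "potential". As a hedged illustration only, the sketch below assumes a common inverse-Boltzmann-style form: the negative log ratio of the observed frequency of a fragment-residue contact to the frequency expected if fragments and residues paired independently. The function names, the pseudocount, and the toy fragment and residue labels are hypothetical.

```python
from collections import Counter
from math import log

def contact_potential(contacts, pseudocount=1.0):
    """Build a log-odds contact score from observed (fragment, residue) pairs.

    contacts: list of (fragment_type, residue_type) tuples, one per contact.
    Returns a dict mapping each pair to -ln(observed / expected); negative
    values mark contacts seen more often than chance. The functional form
    is an assumption, not the one used in the cited paper.
    """
    pair_counts = Counter(contacts)
    frag_counts = Counter(f for f, _ in contacts)
    res_counts = Counter(r for _, r in contacts)
    total = len(contacts)
    n_pairs = len(frag_counts) * len(res_counts)
    potential = {}
    for f in frag_counts:
        for r in res_counts:
            # Smoothed observed frequency, so unseen pairs stay finite.
            observed = (pair_counts[(f, r)] + pseudocount) / (total + pseudocount * n_pairs)
            # Frequency expected if fragment and residue were independent.
            expected = (frag_counts[f] / total) * (res_counts[r] / total)
            potential[(f, r)] = -log(observed / expected)
    return potential

def score_pose(pose_contacts, potential):
    """Sum the contact scores of a docking pose; lower means more favorable."""
    return sum(potential.get(c, 0.0) for c in pose_contacts)

if __name__ == "__main__":
    training = ([("phenyl", "PHE")] * 8 + [("phenyl", "ASP")] * 2
                + [("amine", "ASP")] * 8 + [("amine", "PHE")] * 2)
    pot = contact_potential(training)
    print(score_pose([("phenyl", "PHE"), ("amine", "ASP")], pot))
    print(score_pose([("phenyl", "ASP"), ("amine", "PHE")], pot))
```

Rescoring then amounts to calling `score_pose` on the contacts of each docking pose and sorting the poses by the result.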
|
16
|
Toti D, Macari G, Polticelli F. Protein-ligand binding site detection as an alternative route to molecular docking and drug repurposing. Bio-Algorithms and Med-Systems 2018. [DOI: 10.1515/bams-2018-0004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/23/2022]
Abstract
Since the onset of the genomic era, the detection of ligand binding sites in proteins has emerged as a powerful tool for protein function prediction. Several approaches, both sequence and structure based, have been developed, but the full potential of the corresponding tools has not yet been exploited. Here, we describe the development and classification of a large, almost exhaustive, collection of protein-ligand binding sites to be used, in conjunction with the Ligand Binding Site Recognition Application Web Application developed in our laboratory, as an alternative to virtual screening through molecular docking simulations to identify novel lead compounds for known targets. Ligand binding sites derived from the Protein Data Bank have been clustered according to ligand similarity so that, given a known ligand, the binding mode of related ligands to the same target can be predicted. The collection contains more than 200,000 sites corresponding to more than 20,000 different ligands. Furthermore, the ligand binding sites of all Food and Drug Administration-approved drugs have been classified as well, making it possible to investigate the binding of each of them (and of related compounds) to a given target for drug repurposing and redesign initiatives. Sample use cases are also described to demonstrate the effectiveness of this approach.
|
17
|
Han M, Song Y, Qian J, Ming D. Sequence-based prediction of physicochemical interactions at protein functional sites using a function-and-interaction-annotated domain profile database. BMC Bioinformatics 2018; 19:204. [PMID: 29859055 PMCID: PMC5984826 DOI: 10.1186/s12859-018-2206-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/19/2017] [Accepted: 05/15/2018] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND Identifying protein functional sites (PFSs) and, particularly, the physicochemical interactions at these sites is critical to understanding protein functions and the biochemical reactions involved. Several knowledge-based methods have been developed for the prediction of PFSs; however, accurate methods for predicting the physicochemical interactions associated with PFSs are still lacking. RESULTS In this paper, we present a sequence-based method for the prediction of physicochemical interactions at PFSs. The method is based on a functional site and physicochemical interaction-annotated domain profile database, called fiDPD, which was built using protein domains found in the Protein Data Bank. This method was applied to 13 target proteins from the very recent Critical Assessment of Structure Prediction (CASP10/11), and our calculations gave a Matthews correlation coefficient (MCC) value of 0.66 for PFS prediction and an 80% recall in the prediction of the associated physicochemical interactions. CONCLUSIONS Our results show that, in addition to the PFSs, the physical interactions at these sites are also conserved in the evolution of proteins. This work provides a valuable sequence-based tool for rational drug design and side-effect assessment. The method is freely available and can be accessed at http://202.119.249.49.
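For readers unfamiliar with the two metrics quoted in this abstract, the sketch below computes the Matthews correlation coefficient and recall from per-residue confusion-matrix counts using their standard definitions. The function names and the example counts are hypothetical and are not taken from the cited study.

```python
from math import sqrt

def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from confusion-matrix counts; ranges from -1 to +1."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0

def recall(tp, fn):
    """Fraction of true functional-site residues that were predicted."""
    return tp / (tp + fn) if (tp + fn) else 0.0

if __name__ == "__main__":
    # Hypothetical counts for one protein: residues predicted vs. annotated
    # as belonging to a functional site.
    tp, tn, fp, fn = 8, 85, 3, 4
    print(f"MCC = {matthews_corrcoef(tp, tn, fp, fn):.2f}")
    print(f"recall = {recall(tp, fn):.2f}")
```

MCC is often preferred over accuracy for this task because functional-site residues are a small minority of each protein, so a predictor that labels nothing as a site can still score high accuracy while its MCC is zero.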
Affiliation(s)
- Min Han
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
| | - Yifan Song
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
| | - Jiaqiang Qian
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, 200438, People's Republic of China
| | - Dengming Ming
- College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Biotech Building Room B1-404, 30 South Puzhu Road, Jiangsu, 211816, Nanjing, People's Republic of China.
| |
|