1
|
Sciabola S, Torella R, Nagata A, Boehm M. Critical Assessment of State‐of‐the‐Art Ligand‐Based Virtual Screening Methods. Mol Inform 2022; 41:e2200103. [DOI: 10.1002/minf.202200103] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/06/2022] [Accepted: 07/24/2022] [Indexed: 11/10/2022]
|
2
|
Li M, Hu J, Wang Y, Li Y, Zhang L, Liu Z. Challenging Reverse Screening: A Benchmark Study for Comprehensive Evaluation. Mol Inform 2021; 41:e2100063. [PMID: 34787366 DOI: 10.1002/minf.202100063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Accepted: 10/15/2021] [Indexed: 11/08/2022]
Abstract
As an efficient way of computational target prediction, reverse docking can find not only potential targets but also binding modes for a query ligand. Though the number of available docking tools keeps expanding, there is still no comprehensive evaluation study that uncovers the advantages and limitations of these strategies in the research field of computational target-fishing. In this study, we propose a brand-new evaluation dataset tailor-made for reverse docking, which is composed of a true positive set (the core set) and two negative sets (the similar decoy set and the dissimilar decoy set). The proposed evaluation dataset can assess the prediction performance of docking tools as various values affected by varying degrees of inter-target ranking bias. The performance of four classical docking programs (AutoDock, AutoDock Vina, Glide and GOLD) was evaluated utilizing our dataset, and a biased prediction performance was observed regarding binding site properties. The results demonstrated that Glide (SP) and Glide (XP) had the best capacity to find true targets whether there was inter-target ranking bias or not.
Affiliation(s)
- Mingna Li
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, 100191, Beijing, P.R. China
- Jianxing Hu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, 100191, Beijing, P.R. China
- Yanxing Wang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, 100191, Beijing, P.R. China
- Yibo Li
- Academy for Advanced Interdisciplinary Studies, Peking University, Yiheyuan Road 5, Haidian District, Beijing, P.R. China
- Liangren Zhang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, 100191, Beijing, P.R. China
- Zhenming Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, 100191, Beijing, P.R. China
|
3
|
Šribar D, Grabowski M, Murgueitio MS, Bermudez M, Weindl G, Wolber G. Identification and characterization of a novel chemotype for human TLR8 inhibitors. Eur J Med Chem 2019; 179:744-752. [DOI: 10.1016/j.ejmech.2019.06.084] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 04/05/2019] [Revised: 06/27/2019] [Accepted: 06/28/2019] [Indexed: 10/26/2022]
|
4
|
Evaluation of different virtual screening strategies on the basis of compound sets with characteristic core distributions and dissimilarity relationships. J Comput Aided Mol Des 2019; 33:729-743. [DOI: 10.1007/s10822-019-00218-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/11/2019] [Accepted: 08/13/2019] [Indexed: 02/07/2023]
|
5
|
Tutone M, Perricone U, Almerico AM. Conf-VLKA: A structure-based revisitation of the Virtual Lock-and-key Approach. J Mol Graph Model 2016; 71:50-57. [PMID: 27842227 DOI: 10.1016/j.jmgm.2016.11.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/14/2016] [Revised: 11/03/2016] [Accepted: 11/07/2016] [Indexed: 02/02/2023]
Abstract
In a previous work, we developed the in-house Virtual Lock-and-Key Approach (VLKA) in order to evaluate target assignment starting from molecular descriptors calculated on known inhibitors used as an information source. This protocol was able to predict the correct biological target for the whole dataset with a good degree of reliability (80%), and proved experimentally, which was useful for the target fishing of unknown compounds. In this paper, we tried to remodel the previous in-house VLKA into a more sophisticated one in order to evaluate the influence of the 3D conformation of ligands on the accuracy of the prediction. We applied the same scoring and ranking algorithm as before but, this time, combined it with a structure-based approach, namely docking. For this reason, we retrieved from the RCSB Protein Data Bank (PDB) the available 3D structures of the biological targets included in the previous work, and we used them to calculate poses of the 7352 dataset compounds in the VLKA biological targets. First, a docking protocol was used to retrieve docking scores; then, from the docked poses of each molecule, 3D descriptors were calculated (Conf-VLKA). While the use of the simple docking scores proved to be inadequate to improve compound classification, the Conf-VLKA showed some interesting variations compared to the original VLKA, especially for targets whose ligands present a high number of rotamers. This work represents a first preliminary study to be completed using other techniques such as induced fit docking or molecular dynamics structure clustering to take into account the adaptation of protein side chains to ligand structures.
Affiliation(s)
- Marco Tutone
- Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università di Palermo, Via Archirafi 28, Palermo, Italy
- Ugo Perricone
- Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università di Palermo, Via Archirafi 28, Palermo, Italy
- Anna Maria Almerico
- Dipartimento di Scienze e Tecnologie Biologiche Chimiche e Farmaceutiche (STEBICEF), Università di Palermo, Via Archirafi 28, Palermo, Italy
|
6
|
Abstract
INTRODUCTION: With the emergence of the 'big data' era, the biomedical research community has great interest in exploiting publicly available chemical information for drug discovery. PubChem is one example of a public database that provides a large amount of chemical information free of charge. AREAS COVERED: This article provides an overview of how PubChem's data, tools, and services can be used for virtual screening and reviews recent publications that discuss important aspects of exploiting PubChem for drug discovery. EXPERT OPINION: PubChem offers comprehensive chemical information useful for drug discovery. It also provides multiple programmatic access routes, which are essential to build automated virtual screening pipelines that exploit PubChem data. In addition, PubChemRDF allows users to download PubChem data and load them into a local computing facility, facilitating data integration between PubChem and other resources. PubChem resources have been used in many studies for developing bioactivity and toxicity prediction models, discovering polypharmacologic (multi-target) ligands, and identifying new macromolecule targets of compounds (for drug-repurposing or off-target side effect prediction). These studies demonstrate the usefulness of PubChem as a key resource for computer-aided drug discovery and related areas.
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Bethesda, MD, USA
|
7
|
Ibrahim TM, Bauer MR, Dörr A, Veyisoglu E, Boeckler FM. pROC-Chemotype Plots Enhance the Interpretability of Benchmarking Results in Structure-Based Virtual Screening. J Chem Inf Model 2015; 55:2297-307. [DOI: 10.1021/acs.jcim.5b00475] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Affiliation(s)
- Tamer M. Ibrahim
- Laboratory for Molecular Design and Pharmaceutical Biophysics, Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmaceutical Sciences, Eberhard Karls Universität Tübingen, Auf der Morgenstelle 8, 72076 Tübingen, Germany
- Matthias R. Bauer
- Laboratory for Molecular Design and Pharmaceutical Biophysics, Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmaceutical Sciences, Eberhard Karls Universität Tübingen, Auf der Morgenstelle 8, 72076 Tübingen, Germany
- Alexander Dörr
- Center for Bioinformatics Tübingen (ZBIT), Eberhard Karls University Tübingen, Sand 1, 72076 Tübingen, Germany
- Erdem Veyisoglu
- Laboratory for Molecular Design and Pharmaceutical Biophysics, Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmaceutical Sciences, Eberhard Karls Universität Tübingen, Auf der Morgenstelle 8, 72076 Tübingen, Germany
- Frank M. Boeckler
- Laboratory for Molecular Design and Pharmaceutical Biophysics, Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmaceutical Sciences, Eberhard Karls Universität Tübingen, Auf der Morgenstelle 8, 72076 Tübingen, Germany
- Center for Bioinformatics Tübingen (ZBIT), Eberhard Karls University Tübingen, Sand 1, 72076 Tübingen, Germany
|
8
|
Lagarde N, Zagury JF, Montes M. Benchmarking Data Sets for the Evaluation of Virtual Ligand Screening Methods: Review and Perspectives. J Chem Inf Model 2015; 55:1297-307. [PMID: 26038804 DOI: 10.1021/acs.jcim.5b00090] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Indexed: 12/11/2022]
Abstract
Virtual screening methods are commonly used nowadays in drug discovery processes. However, to ensure their reliability, they have to be carefully evaluated. The evaluation of these methods is often realized in a retrospective way, notably by studying the enrichment of benchmarking data sets. To this purpose, numerous benchmarking data sets were developed over the years, and the resulting improvements led to the availability of high quality benchmarking data sets. However, some points still have to be considered in the selection of the active compounds, decoys, and protein structures to obtain optimal benchmarking data sets.
Affiliation(s)
- Nathalie Lagarde
- Laboratoire Génomique, Bioinformatique et Applications, EA 4627, Conservatoire National des Arts et Métiers, 292 rue Saint Martin, 75003 Paris, France
- Jean-François Zagury
- Laboratoire Génomique, Bioinformatique et Applications, EA 4627, Conservatoire National des Arts et Métiers, 292 rue Saint Martin, 75003 Paris, France
- Matthieu Montes
- Laboratoire Génomique, Bioinformatique et Applications, EA 4627, Conservatoire National des Arts et Métiers, 292 rue Saint Martin, 75003 Paris, France
|
9
|
Lindh M, Svensson F, Schaal W, Zhang J, Sköld C, Brandt P, Karlén A. Toward a Benchmarking Data Set Able to Evaluate Ligand- and Structure-based Virtual Screening Using Public HTS Data. J Chem Inf Model 2015; 55:343-53. [DOI: 10.1021/ci5005465] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/15/2022]
Affiliation(s)
- Martin Lindh
- Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, Uppsala University, Biomedical Centre, Box 574, SE-751 23 Uppsala, Sweden
- Fredrik Svensson
- Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, Uppsala University, Biomedical Centre, Box 574, SE-751 23 Uppsala, Sweden
- Wesley Schaal
- Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, Uppsala University, Biomedical Centre, Box 574, SE-751 23 Uppsala, Sweden
- Jin Zhang
- Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, Uppsala University, Biomedical Centre, Box 574, SE-751 23 Uppsala, Sweden
- Christian Sköld
- Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, Uppsala University, Biomedical Centre, Box 574, SE-751 23 Uppsala, Sweden
- Peter Brandt
- Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, Uppsala University, Biomedical Centre, Box 574, SE-751 23 Uppsala, Sweden
- Anders Karlén
- Organic Pharmaceutical Chemistry, Department of Medicinal Chemistry, Uppsala University, Biomedical Centre, Box 574, SE-751 23 Uppsala, Sweden
|
10
|
Hamza A, Wagner JM, Wei NN, Kwiatkowski S, Zhan CG, Watt DS, Korotkov KV. Application of the 4D fingerprint method with a robust scoring function for scaffold-hopping and drug repurposing strategies. J Chem Inf Model 2014; 54:2834-45. [PMID: 25229183 PMCID: PMC4210175 DOI: 10.1021/ci5003872] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022]
Abstract
Two factors contribute to the inefficiency associated with screening pharmaceutical library collections as a means of identifying new drugs: [1] the limited success of virtual screening (VS) methods in identifying new scaffolds; [2] the limited accuracy of computational methods in predicting off-target effects. We recently introduced a 3D shape-based similarity algorithm of the SABRE program, which encodes a consensus molecular shape pattern of a set of active ligands into a 4D fingerprint descriptor. Here, we report a mathematical model for shape similarity comparisons and ligand database filtering using this 4D fingerprint method and benchmarked the scoring function HWK (Hamza–Wei–Korotkov), using the 81 targets of the DEKOIS database. Subsequently, we applied our combined 4D fingerprint and HWK scoring function VS approach in scaffold-hopping and drug repurposing using the National Cancer Institute (NCI) and Food and Drug Administration (FDA) databases, and we identified new inhibitors with different scaffolds of MycP1 protease from the mycobacterial ESX-1 secretion system. Experimental evaluation of nine compounds from the NCI database and three from the FDA database displayed IC50 values ranging from 70 to 100 μM against MycP1 and possessed high structural diversity, which provides departure points for further structure–activity relationship (SAR) optimization. In addition, this study demonstrates that the combination of our 4D fingerprint algorithm and the HWK scoring function may provide a means for identifying repurposed drugs for the treatment of infectious diseases and may be used in the drug-target profile strategy.
Affiliation(s)
- Adel Hamza
- Department of Molecular and Cellular Biochemistry, ‡Center for Structural Biology, §Center for Pharmaceutical Research and Innovation, College of Pharmacy, ∥Molecular Modeling and Biopharmaceutical Center, and ⊥Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky , Lexington, Kentucky 40536, United States
|
11
|
Rosenbaum L, Jahn A, Dörr A, Zell A. Optimization and visualization of the edge weights in optimal assignment methods for virtual screening. BioData Min 2013; 6:7. [PMID: 23531368 PMCID: PMC3639874 DOI: 10.1186/1756-0381-6-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/15/2012] [Accepted: 03/10/2013] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Ligand-based virtual screening plays a fundamental part in the early drug discovery stage. In a virtual screening, a chemical library is searched for molecules with similar properties to a query molecule by means of a similarity function. The optimal assignment of chemical graphs has proven to be a valuable similarity function for many cheminformatic tasks, such as virtual screening. The optimal assignment assumes all atoms of a query molecule to be equally important, which is not realistic depending on the binding mode of a ligand. The importance of a query molecule's atoms can be integrated in the optimal assignment by weighting the assignment edges. We optimized the edge weights with respect to the virtual screening performance by means of evolutionary algorithms. Furthermore, we propose a visualization approach for the interpretation of the edge weights. RESULTS We evaluated two different evolutionary algorithms, differential evolution and particle swarm optimization, for their suitability for optimizing the assignment edge weights. The results showed that both optimization methods are suited to optimize the edge weights. Furthermore, we compared our approach to the optimal assignment with equal edge weights and two literature similarity functions on a subset of the Directory of Useful Decoys using sophisticated virtual screening performance metrics. Our approach achieved a considerably better overall and early enrichment performance. The visualization of the edge weights enables the identification of substructures that are important for a good retrieval of ligands and for the binding to the protein target. CONCLUSIONS The optimization of the edge weights in optimal assignment methods is a valuable approach for ligand-based virtual screening experiments. 
The approach can be applied to any similarity function that employs the optimal assignment method, which includes a variety of similarity measures that have proven to be valuable in various cheminformatic tasks. The proposed visualization helps to get a better understanding of the binding mode of the analyzed query molecule.
Affiliation(s)
- Lars Rosenbaum
- University of Tübingen, Center for Bioinformatics (ZBIT), Sand 1, 72076 Tübingen, Germany.
|
12
|
Kim S, Bolton EE, Bryant SH. Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis. J Cheminform 2012; 4:28. [PMID: 23134593 PMCID: PMC3537644 DOI: 10.1186/1758-2946-4-28] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 06/28/2012] [Accepted: 10/03/2012] [Indexed: 01/08/2023] Open
Abstract
Background To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally-derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. Therefore, the efficient use of PubChem3D resources requires an understanding of the statistical and biological meaning of computed 3-D molecular similarity scores between molecules. Results The present study investigated effects of employing multiple conformers per compound upon the 3-D similarity scores between ten thousand randomly selected biologically-tested compounds (10-K set) and between non-inactive compounds in a given biological assay (156-K set). When the “best-conformer-pair” approach, in which a 3-D similarity score between two compounds is represented by the greatest similarity score among all possible conformer pairs arising from a compound pair, was employed with ten diverse conformers per compound, the average 3-D similarity scores for the 10-K set increased by 0.11, 0.09, 0.15, 0.16, 0.07, and 0.18 for STST-opt, CTST-opt, ComboTST-opt, STCT-opt, CTCT-opt, and ComboTCT-opt, respectively, relative to the corresponding averages computed using a single conformer per compound. Interestingly, the best-conformer-pair approach also increased the average 3-D similarity scores for the non-inactive–non-inactive (NN) pairs for a given assay, by comparable amounts to those for the random compound pairs, although some assays showed a pronounced increase in the per-assay NN-pair 3-D similarity scores, compared to the average increase for the random compound pairs. 
Conclusion These results suggest that the use of ten diverse conformers per compound in PubChem bioassay data analysis using 3-D molecular similarity is not expected to increase the separation of non-inactive from random and inactive spaces “on average”, although some assays show a noticeable separation between the non-inactive and random spaces when multiple conformers are used for each compound. The present study is a critical next step to understand effects of conformational diversity of the molecules upon the 3-D molecular similarity and its application to biological activity data analysis in PubChem. The results of this study may be helpful to build search and analysis tools that exploit 3-D molecular similarity between compounds archived in PubChem and other molecular libraries in a more efficient way.
Affiliation(s)
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, 20894, MD, USA.
|
13
|
Ripphausen P, Wassermann AM, Bajorath J. REPROVIS-DB: A Benchmark System for Ligand-Based Virtual Screening Derived from Reproducible Prospective Applications. J Chem Inf Model 2011; 51:2467-73. [DOI: 10.1021/ci200309j] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 01/22/2023]
Affiliation(s)
- Peter Ripphausen
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
- Anne Mai Wassermann
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
|
14
|
Vogel SM, Bauer MR, Boeckler FM. DEKOIS: Demanding Evaluation Kits for Objective in Silico Screening — A Versatile Tool for Benchmarking Docking Programs and Scoring Functions. J Chem Inf Model 2011; 51:2650-65. [DOI: 10.1021/ci2001549] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Indexed: 11/28/2022]
Affiliation(s)
- Simon M. Vogel
- Laboratory for Molecular Design and Pharmaceutical Biophysics, Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmacy, Eberhard Karls University Tuebingen, Auf der Morgenstelle 8, 72076 Tuebingen, Germany
- Matthias R. Bauer
- Laboratory for Molecular Design and Pharmaceutical Biophysics, Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmacy, Eberhard Karls University Tuebingen, Auf der Morgenstelle 8, 72076 Tuebingen, Germany
- Frank M. Boeckler
- Laboratory for Molecular Design and Pharmaceutical Biophysics, Department of Pharmaceutical and Medicinal Chemistry, Institute of Pharmacy, Eberhard Karls University Tuebingen, Auf der Morgenstelle 8, 72076 Tuebingen, Germany
|
15
|
Jahn A, Rosenbaum L, Hinselmann G, Zell A. 4D Flexible Atom-Pairs: An efficient probabilistic conformational space comparison for ligand-based virtual screening. J Cheminform 2011; 3:23. [PMID: 21733172 PMCID: PMC3156737 DOI: 10.1186/1758-2946-3-23] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 04/29/2011] [Accepted: 07/06/2011] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND The performance of 3D-based virtual screening similarity functions is affected by the applied conformations of compounds. Therefore, the results of 3D approaches are often less robust than 2D approaches. The application of 3D methods on multiple conformer data sets normally reduces this weakness, but entails a significant computational overhead. Therefore, we developed a special conformational space encoding by means of Gaussian mixture models and a similarity function that operates on these models. The application of a model-based encoding allows an efficient comparison of the conformational space of compounds. RESULTS Comparisons of our 4D flexible atom-pair approach with over 15 state-of-the-art 2D- and 3D-based virtual screening similarity functions on the 40 data sets of the Directory of Useful Decoys show a robust performance of our approach. Even 3D-based approaches that operate on multiple conformers yield inferior results. The 4D flexible atom-pair method achieves an averaged AUC value of 0.78 on the filtered Directory of Useful Decoys data sets. The best 2D- and 3D-based approaches of this study yield an AUC value of 0.74 and 0.72, respectively. As a result, the 4D flexible atom-pair approach achieves an average rank of 1.25 with respect to 15 other state-of-the-art similarity functions and four different evaluation metrics. CONCLUSIONS Our 4D method yields a robust performance on 40 pharmaceutically relevant targets. The conformational space encoding enables an efficient comparison of the conformational space. Therefore, the weakness of the 3D-based approaches on single conformations is circumvented. With over 100,000 similarity calculations on a single desktop CPU, the utilization of the 4D flexible atom-pair in real-world applications is feasible.
Affiliation(s)
- Andreas Jahn
- University of Tübingen, Center for Bioinformatics Tübingen (ZBIT), Sand 1, 72076 Tübingen, Germany
- Lars Rosenbaum
- University of Tübingen, Center for Bioinformatics Tübingen (ZBIT), Sand 1, 72076 Tübingen, Germany
- Georg Hinselmann
- University of Tübingen, Center for Bioinformatics Tübingen (ZBIT), Sand 1, 72076 Tübingen, Germany
- Andreas Zell
- University of Tübingen, Center for Bioinformatics Tübingen (ZBIT), Sand 1, 72076 Tübingen, Germany
|
16
|
Koeppen H, Kriegl J, Lessel U, Tautermann CS, Wellenzohn B. Ligand-Based Virtual Screening. METHODS AND PRINCIPLES IN MEDICINAL CHEMISTRY 2011. [DOI: 10.1002/9783527633326.ch3] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 12/24/2022]
|
17
|
|
18
|
Khanna V, Ranganathan S. Molecular similarity and diversity approaches in chemoinformatics. Drug Dev Res 2010. [DOI: 10.1002/ddr.20404] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022]
Affiliation(s)
- Varun Khanna
- Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University, Sydney, Australia
- Shoba Ranganathan
- Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University, Sydney, Australia
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
|
19
|
Giganti D, Guillemain H, Spadoni JL, Nilges M, Zagury JF, Montes M. Comparative evaluation of 3D virtual ligand screening methods: impact of the molecular alignment on enrichment. J Chem Inf Model 2010; 50:992-1004. [PMID: 20527883 DOI: 10.1021/ci900507g] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 11/28/2022]
Abstract
In the early stage of drug discovery programs, when the structure of a complex involving a target and a small molecule is available, structure-based virtual ligand screening methods are generally preferred. However, ligand-based strategies like shape-similarity search methods can also be applied. Shape-similarity search methods consist in exploring a pseudo-binding-site derived from the known small molecule used as a reference. Several of these methods use conformational sampling algorithms which are also shared by corresponding docking methods: for example Surflex-dock/Surflex-sim, FlexX/FlexS, ICM, and OMEGA-FRED/OMEGA-ROCS. Using 11 systems issued from the challenging "own" subsets of the Directory of Useful Decoys (DUD-own), we evaluated and compared the performance of the above-cited programs in terms of molecular alignment accuracy, enrichment in active compounds, and enrichment in different chemotypes (scaffold-hopping). Since molecular alignment is a crucial aspect of performance for the different methods, we have assessed its impact on enrichment. We have also illustrated the paradox of retrieving active compounds with good scores even if they are inaccurately positioned. Finally, we have highlighted possible positive aspects of using shape-based approaches in drug-discovery protocols when the structure of the target in complex with a small molecule is known.
Affiliation(s)
- David Giganti
- Unite de Bioinformatique Structurale, Institut Pasteur, 26 rue du Dr Roux, 75015 Paris, France
|
20
|
Bender A. How similar are those molecules after all? Use two descriptors and you will have three different answers. Expert Opin Drug Discov 2010; 5:1141-51. [DOI: 10.1517/17460441.2010.517832] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 11/05/2022]
|
21
|
Murata K, Nagata N, Nakanishi I, Kitaura K. SDOVS: A solvent dipole ordering-based method for virtual screening. J Comput Chem 2010; 31:2714-22. [DOI: 10.1002/jcc.21565] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
|
22
|
Geppert H, Vogt M, Bajorath J. Current trends in ligand-based virtual screening: molecular representations, data mining methods, new application areas, and performance evaluation. J Chem Inf Model 2010; 50:205-16. [PMID: 20088575 DOI: 10.1021/ci900419k] [Citation(s) in RCA: 231] [Impact Index Per Article: 16.5] [Indexed: 12/19/2022]
Affiliation(s)
- Hanna Geppert
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universitat, Dahlmannstrasse 2, D-53113 Bonn, Germany
|
23
|
Krüger DM, Evers A. Comparison of structure- and ligand-based virtual screening protocols considering hit list complementarity and enrichment factors. ChemMedChem 2010; 5:148-58. [PMID: 19908272 DOI: 10.1002/cmdc.200900314] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.2] [Indexed: 11/08/2022]
Abstract
Structure- and ligand-based virtual-screening methods (docking, 2D- and 3D-similarity searching) were analysed for their effectiveness in virtual screening against four different targets: angiotensin-converting enzyme (ACE), cyclooxygenase 2 (COX-2), thrombin and human immunodeficiency virus 1 (HIV-1) protease. The relative performance of the tools was compared by examining their ability to recognise known active compounds from a set of actives and nonactives. Furthermore, we investigated whether the application of different virtual-screening methods in parallel provides complementary or redundant hit lists. Docking was performed with GOLD, Glide, FlexX and Surflex. The obtained docking poses were rescored by using nine different scoring functions in addition to the scoring functions implemented as objective functions in the docking algorithms. Ligand-based virtual screening was done with ROCS (3D-similarity searching), Feature Trees and Scitegic Functional Fingerprints (2D-similarity searching). The results show that structure- and ligand-based virtual-screening methods provide comparable enrichments in detecting active compounds. Interestingly, the hit lists that are obtained from different virtual-screening methods are generally highly complementary. These results suggest that a parallel application of different structure- and ligand-based virtual-screening methods increases the chance of identifying more (and more diverse) active compounds from a virtual-screening campaign.
Affiliation(s)
- Dennis M Krüger
- Institut für pharmazeutische und medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, Universitätsstrasse 1, 40225 Düsseldorf, Germany
|
24
|
Fechner N, Jahn A, Hinselmann G, Zell A. Estimation of the applicability domain of kernel-based machine learning models for virtual screening. J Cheminform 2010; 2:2. [PMID: 20222949 PMCID: PMC2851576 DOI: 10.1186/1758-2946-2-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 12/15/2009] [Accepted: 03/11/2010] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND The virtual screening of large compound databases is an important application of structural-activity relationship models. Due to the high structural diversity of these data sets, it is impossible for machine learning based QSAR models, which rely on a specific training set, to give reliable results for all compounds. Thus, it is important to consider the subset of the chemical space in which the model is applicable. The approaches to this problem that have been published so far mostly use vectorial descriptor representations to define this domain of applicability of the model. Unfortunately, these cannot be extended easily to structured kernel-based machine learning models. For this reason, we propose three approaches to estimate the domain of applicability of a kernel-based QSAR model. RESULTS We evaluated three kernel-based applicability domain estimations using three different structured kernels on three virtual screening tasks. Each experiment consisted of the training of a kernel-based QSAR model using support vector regression and the ranking of a disjoint screening data set according to the predicted activity. For each prediction, the applicability of the model for the respective compound is quantitatively described using a score obtained by an applicability domain formulation. The suitability of the applicability domain estimation is evaluated by comparing the model performance on the subsets of the screening data sets obtained by different thresholds for the applicability scores. This comparison indicates that it is possible to separate the part of the chemspace, in which the model gives reliable predictions, from the part consisting of structures too dissimilar to the training set to apply the model successfully. A closer inspection reveals that the virtual screening performance of the model is considerably improved if half of the molecules, those with the lowest applicability scores, are omitted from the screening. 
CONCLUSION The proposed applicability domain formulations for kernel-based QSAR models can successfully identify compounds for which no reliable predictions can be expected from the model. The resulting reduction of the search space and the elimination of some of the active compounds should not be considered as a drawback, because the results indicate that, in most cases, these omitted ligands would not be found by the model anyway.
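As a rough illustration of the thresholding experiment described in this abstract, the sketch below discards the compounds with the lowest applicability scores and ranks the remainder by predicted activity. The function name, the tuple layout, and the `keep_fraction` parameter are assumptions made for this example; the paper's actual applicability domain formulations for structured kernels are not reproduced here.

```python
from typing import List, Tuple

# (name, predicted_activity, applicability_score); layout is hypothetical
Compound = Tuple[str, float, float]


def screen_within_domain(compounds: List[Compound],
                         keep_fraction: float = 0.5) -> List[Compound]:
    """Keep the most 'applicable' compounds, then rank them by prediction."""
    if not compounds:
        return []
    # Sort by applicability score, most reliable predictions first.
    by_applicability = sorted(compounds, key=lambda c: c[2], reverse=True)
    n_keep = max(1, int(len(by_applicability) * keep_fraction))
    kept = by_applicability[:n_keep]
    # Rank the retained compounds by predicted activity, best first.
    return sorted(kept, key=lambda c: c[1], reverse=True)


if __name__ == "__main__":
    library = [("a", 0.9, 0.1), ("b", 0.5, 0.8), ("c", 0.7, 0.9), ("d", 0.2, 0.3)]
    for name, activity, score in screen_within_domain(library):
        print(name, activity, score)
```

With `keep_fraction=0.5`, this mirrors the abstract's observation about omitting the half of the molecules with the lowest applicability scores.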
Affiliation(s)
- Nikolas Fechner
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
- Andreas Jahn
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
- Georg Hinselmann
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
- Andreas Zell
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
|
25
|
Leach AR, Gillet VJ, Lewis RA, Taylor R. Three-dimensional pharmacophore methods in drug discovery. J Med Chem 2010; 53:539-58. [PMID: 19831387 DOI: 10.1021/jm900817u] [Citation(s) in RCA: 264] [Impact Index Per Article: 18.9] [Indexed: 12/12/2022]
Affiliation(s)
- Andrew R Leach
- Computational and Structural Chemistry, GlaxoSmithKline Research & Development, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, UK.
|
26
|
Hessler G, Baringhaus KH. The scaffold hopping potential of pharmacophores. Drug Discov Today Technol 2010; 7:e203-e270. [PMID: 24103802 DOI: 10.1016/j.ddtec.2010.09.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 06/02/2023]
|
27
|
Abstract
This chapter reviews the use of molecular fingerprints for chemical similarity searching. The fingerprints encode the presence of 2D substructural fragments in a molecule, and the similarity between a pair of molecules is a function of the number of fragments that they have in common. Although this provides a very simple way of estimating the degree of structural similarity between two molecules, it has been found to provide an effective and an efficient tool for searching large chemical databases. The review describes the historical development of similarity searching since it was first described in the mid-1980s, reviews the many different coefficients, representations, and weightings that can be combined to form a similarity measure, describes quantitative measures of the effectiveness of similarity searching, and concludes by looking at current developments based on the use of data fusion and machine learning techniques.
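To make the idea of fragment-based similarity concrete, here is a minimal sketch that treats each fingerprint as a set of fragment identifiers and ranks a small database against a query. The Tanimoto coefficient is used as one common choice among the many coefficients the chapter reviews; the abstract does not single it out, and the function names and toy fragment labels are assumptions for illustration only.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto coefficient: shared fragments / fragments present in either."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def rank_by_similarity(query: Set[str],
                       database: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = {"c1ccccc1", "C=O", "N"}
    database = {
        "mol_1": {"c1ccccc1", "C=O", "O"},
        "mol_2": {"Cl", "S"},
    }
    for name, score in rank_by_similarity(query, database):
        print(f"{name}: {score:.2f}")
```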
Affiliation(s)
- Peter Willett
- Department of Information Studies, The University of Sheffield, Sheffield, UK
|
28
|
Medina-Franco J, Martínez-Mayorga K, Bender A, Scior T. Scaffold Diversity Analysis of Compound Data Sets Using an Entropy-Based Measure. QSAR Comb Sci 2009. [DOI: 10.1002/qsar.200960069] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Indexed: 02/04/2023]
|
29
|
Jahn A, Hinselmann G, Fechner N, Zell A. Optimal assignment methods for ligand-based virtual screening. J Cheminform 2009; 1:14. [PMID: 20150995 PMCID: PMC2820492 DOI: 10.1186/1758-2946-1-14] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.7] [Received: 05/15/2009] [Accepted: 08/25/2009] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Ligand-based virtual screening experiments are an important task in the early drug discovery stage. An ambitious aim in each experiment is to disclose active structures based on new scaffolds. To perform these "scaffold-hoppings" for individual problems and targets, a plethora of different similarity methods based on diverse techniques were published in the last years. The optimal assignment approach on molecular graphs, a successful method in the field of quantitative structure-activity relationships, has not been tested as a ligand-based virtual screening method so far. RESULTS We evaluated two already published and two new optimal assignment methods on various data sets. To emphasize the "scaffold-hopping" ability, we used the information of chemotype clustering analyses in our evaluation metrics. Comparisons with literature results show an improved early recognition performance and comparable results over the complete data set. A new method based on two different assignment steps shows an increased "scaffold-hopping" behavior together with a good early recognition performance. CONCLUSION The presented methods show a good combination of chemotype discovery and enrichment of active structures. Additionally, the optimal assignment on molecular graphs has the advantage to investigate and interpret the mappings, allowing precise modifications of internal parameters of the similarity measure for specific targets. All methods have low computation times which make them applicable to screen large data sets.
Affiliation(s)
- Andreas Jahn
- University of Tübingen, Center for Bioinformatics Tübingen (ZBIT), Sand 1, 72076 Tübingen, Germany
|
30
|
Hammerling U, Tallsjö A, Grafström R, Ilbäck NG. Comparative Hazard Characterization in Food Toxicology. Crit Rev Food Sci Nutr 2009; 49:626-69. [DOI: 10.1080/10408390802145617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/22/2023]
|
31
|
Rohrer SG, Baumann K. Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data. J Chem Inf Model 2009; 49:169-84. [PMID: 19434821 DOI: 10.1021/ci8002649] [Citation(s) in RCA: 223] [Impact Index Per Article: 14.9] [Indexed: 11/30/2022]
Abstract
Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that purges data sets of compounds active against pharmaceutically relevant targets from unselective hits. Topological optimization using experimental design strategies monitored by refined nearest neighbor analysis functions is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
Affiliation(s)
- Sebastian G Rohrer
- Institute of Pharmaceutical Chemistry, Beethovenstrasse 55, Braunschweig University of Technology, 38106 Braunschweig, Germany
|
32
|
Wong WW, Burkowski FJ. A constructive approach for discovering new drug leads: Using a kernel methodology for the inverse-QSAR problem. J Cheminform 2009; 1:4. [PMID: 20142987 PMCID: PMC2816860 DOI: 10.1186/1758-2946-1-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 02/15/2009] [Accepted: 04/28/2009] [Indexed: 12/04/2022] [Open Access]
Abstract
BACKGROUND The inverse-QSAR problem seeks to find a new molecular descriptor from which one can recover the structure of a molecule that possesses a desired activity or property. Surprisingly, there are very few papers providing solutions to this problem. It is a difficult problem because the molecular descriptors involved with the inverse-QSAR algorithm must adequately address the forward QSAR problem for a given biological activity if the subsequent recovery phase is to be meaningful. In addition, one should be able to construct a feasible molecule from such a descriptor. The difficulty of recovering the molecule from its descriptor is the major limitation of most inverse-QSAR methods. RESULTS In this paper, we describe the reversibility of our previously reported descriptor, the vector space model molecular descriptor (VSMMD) based on a vector space model that is suitable for kernel studies in QSAR modeling. Our inverse-QSAR approach can be described using five steps: (1) generate the VSMMD for the compounds in the training set; (2) map the VSMMD in the input space to the kernel feature space using an appropriate kernel function; (3) design or generate a new point in the kernel feature space using a kernel feature space algorithm; (4) map the feature space point back to the input space of descriptors using a pre-image approximation algorithm; (5) build the molecular structure template using our VSMMD molecule recovery algorithm. CONCLUSION The empirical results reported in this paper show that our strategy of using kernel methodology for an inverse-Quantitative Structure-Activity Relationship is sufficiently powerful to find a meaningful solution for practical problems.
Affiliation(s)
- William Wl Wong
- The David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
|
33
|
Mackey MD, Melville JL. Better than Random? The Chemotype Enrichment Problem. J Chem Inf Model 2009; 49:1154-62. [DOI: 10.1021/ci8003978] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Indexed: 11/29/2022]
Affiliation(s)
- Mark D. Mackey
- Cresset BioMolecular Discovery Ltd., BioPark Hertfordshire, Broadwater Road, Welwyn Garden City, Hertfordshire AL7 3AX, United Kingdom
- James L. Melville
- Cresset BioMolecular Discovery Ltd., BioPark Hertfordshire, Broadwater Road, Welwyn Garden City, Hertfordshire AL7 3AX, United Kingdom
|
34
|
von Korff M, Freyss J, Sander T. Comparison of Ligand- and Structure-Based Virtual Screening on the DUD Data Set. J Chem Inf Model 2009; 49:209-31. [DOI: 10.1021/ci800303k] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.9] [Indexed: 11/28/2022]
Affiliation(s)
- Modest von Korff
- Department of Research Informatics, Actelion Ltd., Gewerbestrasse 16, CH-4123 Allschwil, Switzerland
- Joel Freyss
- Department of Research Informatics, Actelion Ltd., Gewerbestrasse 16, CH-4123 Allschwil, Switzerland
- Thomas Sander
- Department of Research Informatics, Actelion Ltd., Gewerbestrasse 16, CH-4123 Allschwil, Switzerland
|
35
|
Lessel U, Wellenzohn B, Lilienthal M, Claussen H. Searching Fragment Spaces with Feature Trees. J Chem Inf Model 2009; 49:270-9. [DOI: 10.1021/ci800272a] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.4] [Indexed: 11/28/2022]
Affiliation(s)
- Uta Lessel
- Department of Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach an der Riss, Germany, and BioSolveIT, An der Ziegelei 75, 53757 St. Augustin, Germany
- Bernd Wellenzohn
- Department of Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach an der Riss, Germany, and BioSolveIT, An der Ziegelei 75, 53757 St. Augustin, Germany
- Markus Lilienthal
- Department of Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach an der Riss, Germany, and BioSolveIT, An der Ziegelei 75, 53757 St. Augustin, Germany
- Holger Claussen
- Department of Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, 88397 Biberach an der Riss, Germany, and BioSolveIT, An der Ziegelei 75, 53757 St. Augustin, Germany
|
36
|
Cheeseright TJ, Mackey MD, Melville JL, Vinter JG. FieldScreen: Virtual Screening Using Molecular Fields. Application to the DUD Data Set. J Chem Inf Model 2008; 48:2108-17. [DOI: 10.1021/ci800110p] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.4] [Indexed: 11/28/2022]
Affiliation(s)
- Timothy J. Cheeseright
- Cresset BioMolecular Discovery Ltd., BioPark Hertfordshire, Broadwater Road, Welwyn Garden City, Hertfordshire AL7 3AX, United Kingdom
- Mark D. Mackey
- Cresset BioMolecular Discovery Ltd., BioPark Hertfordshire, Broadwater Road, Welwyn Garden City, Hertfordshire AL7 3AX, United Kingdom
- James L. Melville
- Cresset BioMolecular Discovery Ltd., BioPark Hertfordshire, Broadwater Road, Welwyn Garden City, Hertfordshire AL7 3AX, United Kingdom
- Jeremy G. Vinter
- Cresset BioMolecular Discovery Ltd., BioPark Hertfordshire, Broadwater Road, Welwyn Garden City, Hertfordshire AL7 3AX, United Kingdom
|
37
|
Boehm M, Wu TY, Claussen H, Lemmen C. Similarity Searching and Scaffold Hopping in Synthetically Accessible Combinatorial Chemistry Spaces. J Med Chem 2008; 51:2468-80. [DOI: 10.1021/jm0707727] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Indexed: 11/29/2022]
Affiliation(s)
- Markus Boehm
- Pfizer Global Research and Development, Eastern Point Road, Groton, Connecticut 06340, University of North Carolina, Chapel Hill, North Carolina 27599, and BioSolveIT GmbH, An der Ziegelei 75, D-53757 Sankt Augustin, Germany
- Tong-Ying Wu
- Pfizer Global Research and Development, Eastern Point Road, Groton, Connecticut 06340, University of North Carolina, Chapel Hill, North Carolina 27599, and BioSolveIT GmbH, An der Ziegelei 75, D-53757 Sankt Augustin, Germany
- Holger Claussen
- Pfizer Global Research and Development, Eastern Point Road, Groton, Connecticut 06340, University of North Carolina, Chapel Hill, North Carolina 27599, and BioSolveIT GmbH, An der Ziegelei 75, D-53757 Sankt Augustin, Germany
- Christian Lemmen
- Pfizer Global Research and Development, Eastern Point Road, Groton, Connecticut 06340, University of North Carolina, Chapel Hill, North Carolina 27599, and BioSolveIT GmbH, An der Ziegelei 75, D-53757 Sankt Augustin, Germany
|
38
|
Rohrer SG, Baumann K. Impact of Benchmark Data Set Topology on the Validation of Virtual Screening Methods: Exploration and Quantification by Spatial Statistics. J Chem Inf Model 2008; 48:704-18. [DOI: 10.1021/ci700099u] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Indexed: 11/30/2022]
Affiliation(s)
- Sebastian G. Rohrer
- Institute of Pharmaceutical Chemistry, Beethovenstrasse 55, Braunschweig University of Technology, 38106 Braunschweig, Germany
- Knut Baumann
- Institute of Pharmaceutical Chemistry, Beethovenstrasse 55, Braunschweig University of Technology, 38106 Braunschweig, Germany
|
39
|
Clark RD, Webster-Clark DJ. Managing bias in ROC curves. J Comput Aided Mol Des 2008; 22:141-6. [PMID: 18256892 DOI: 10.1007/s10822-008-9181-z] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Received: 11/19/2007] [Accepted: 01/14/2008] [Indexed: 10/22/2022]
Abstract
Two modifications to the standard use of receiver operating characteristic (ROC) curves for evaluating virtual screening methods are proposed. The first is to replace the linear plots usually used with semi-logarithmic ones (pROC plots), including when doing "area under the curve" (AUC) calculations. Doing so is a simple way to bias the statistic to favor identification of "hits" early in the recovery curve rather than late. A second suggested modification entails weighting each active based on the size of the lead series to which it belongs. Two weighting schemes are described: arithmetic, in which the weight for each active is inversely proportional to the size of the cluster from which it comes; and harmonic, in which weights are inversely proportional to the rank of each active within its class. Either scheme is able to distinguish biased from unbiased screening statistics, but the harmonically weighted AUC in particular emphasizes the ability to place representatives of each class of active early in the recovery curve.
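The two weighting schemes in this abstract can be sketched as follows. This is one reading of the verbal description, not the authors' published formulas: arithmetic weights are taken as 1/(cluster size), harmonic weights as 1/(rank of the active within its own cluster), and the weighted AUC is computed on a linear ROC scale rather than the semi-logarithmic pROC scale. All function names and the input layout are assumptions for this example.

```python
from collections import defaultdict
from typing import List, Sequence


def arithmetic_weights(clusters: Sequence[str]) -> List[float]:
    """Weight each active by 1 / (number of actives in its cluster)."""
    sizes = defaultdict(int)
    for cluster in clusters:
        sizes[cluster] += 1
    return [1.0 / sizes[cluster] for cluster in clusters]


def harmonic_weights(clusters: Sequence[str]) -> List[float]:
    """Weight each active by 1 / (its rank within its own cluster).

    `clusters` lists the cluster label of each active in ranked order.
    """
    seen = defaultdict(int)
    weights = []
    for cluster in clusters:
        seen[cluster] += 1
        weights.append(1.0 / seen[cluster])
    return weights


def weighted_roc_auc(is_active: Sequence[bool], weights: Sequence[float]) -> float:
    """Weighted linear ROC AUC for a ranked list (best-scored compound first).

    `weights` holds one weight per active, in the order actives appear.
    """
    n_decoys = sum(1 for flag in is_active if not flag)
    total_weight = sum(weights)
    if n_decoys == 0 or total_weight == 0:
        raise ValueError("need at least one decoy and a positive total weight")
    decoys_seen = 0
    area = 0.0
    weight_iter = iter(weights)
    for flag in is_active:
        if flag:
            # Each active contributes its weight times the decoys ranked below it.
            area += next(weight_iter) * (n_decoys - decoys_seen)
        else:
            decoys_seen += 1
    return area / (total_weight * n_decoys)


if __name__ == "__main__":
    ranking = [True, False, True, False]  # active, decoy, active, decoy
    clusters = ["A", "A"]                 # both actives share one lead series
    print(weighted_roc_auc(ranking, arithmetic_weights(clusters)))
    print(weighted_roc_auc(ranking, harmonic_weights(clusters)))
```

In this toy ranking, harmonic weighting rewards the early placement of the first representative of series A more than arithmetic weighting does, which is the behavior the abstract attributes to the harmonically weighted AUC.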
Affiliation(s)
- Robert D Clark
- Tripos Informatics Research Center, 1699 South Hanley Road, Saint Louis, MO 63144, USA.
|
40
|
Good AC, Oprea TI. Optimization of CAMD techniques 3. Virtual screening enrichment studies: a help or hindrance in tool selection? J Comput Aided Mol Des 2008; 22:169-78. [DOI: 10.1007/s10822-007-9167-2] [Citation(s) in RCA: 153] [Impact Index Per Article: 9.6] [Received: 10/30/2007] [Accepted: 12/19/2007] [Indexed: 11/28/2022]
|
41
|
Lin HH, Han LY, Yap CW, Xue Y, Liu XH, Zhu F, Chen YZ. Prediction of factor Xa inhibitors by machine learning methods. J Mol Graph Model 2007; 26:505-18. [PMID: 17418603 DOI: 10.1016/j.jmgm.2007.03.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 08/22/2006] [Revised: 02/04/2007] [Accepted: 03/07/2007] [Indexed: 01/04/2023]
Abstract
Factor Xa (FXa) inhibitors have been explored as anticoagulants for treatment and prevention of thrombotic diseases. Molecular docking, pharmacophore, quantitative structure-activity relationships, and support vector machines (SVM) have been used for computer prediction of FXa inhibitors. These methods achieve promising prediction accuracies of 69-80% for FXa inhibitors and 85-99% for non-inhibitors. Prediction performance, particularly for inhibitors, may be further improved by exploring methods applicable to a more diverse range of compounds and by using a more appropriate set of molecular descriptors. We tested the capability of several machine learning methods (C4.5 decision tree, k-nearest neighbor, probabilistic neural network, and support vector machine) by using a much more diverse set of 1098 compounds (360 inhibitors and 738 non-inhibitors) than those in other studies. A feature selection method was used for selecting molecular descriptors appropriate for distinguishing FXa inhibitors and non-inhibitors. The prediction accuracies of these methods are 89.1-97.5% for FXa inhibitors and 92.3-98.1% for non-inhibitors. In particular, compared to other studies, support vector machine gives a substantially improved accuracy of 94.6% for FXa non-inhibitors and maintains a comparable accuracy of 98.1% for inhibitors, based on a more rigorous test with a more diverse range of compounds. Our study suggests that machine learning methods such as SVM are useful for facilitating the prediction of FXa inhibitors.
Affiliation(s)
- H H Lin
- Bioinformatics and Drug Design Group, Department of Pharmacy, National University of Singapore, Blk SOC1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore
|
42
|
Fontaine F, Bolton E, Borodina Y, Bryant SH. Fast 3D shape screening of large chemical databases through alignment-recycling. Chem Cent J 2007; 1:12. [PMID: 17880744 PMCID: PMC1994057 DOI: 10.1186/1752-153x-1-12] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 04/04/2007] [Accepted: 06/06/2007] [Indexed: 11/25/2022] [Open Access]
Abstract
BACKGROUND Large chemical databases require fast, efficient, and simple ways of looking for similar structures. Although such tasks are now fairly well resolved for graph-based similarity queries, they remain an issue for 3D approaches, particularly for those based on 3D shape overlays. Inspired by a recent technique developed to compare molecular shapes, we designed a hybrid methodology, alignment-recycling, that enables efficient retrieval and alignment of structures with similar 3D shapes. RESULTS Using a dataset of more than one million PubChem compounds of limited size (< 28 heavy atoms) and flexibility (< 6 rotatable bonds), we obtained a set of a few thousand diverse structures covering entirely the 3D shape space of the conformers of the dataset. Transformation matrices gathered from the overlays between these diverse structures and the 3D conformer dataset allowed us to drastically (100-fold) reduce the CPU time required for shape overlay. The alignment-recycling heuristic produces results consistent with de novo alignment calculation, with better than 80% hit list overlap on average. CONCLUSION Overlay-based 3D methods are computationally demanding when searching large databases. Alignment-recycling reduces the CPU time to perform shape similarity searches by breaking the alignment problem into three steps: selection of diverse shapes to describe the database shape-space; overlay of the database conformers to the diverse shapes; and non-optimized overlay of query and database conformers using common reference shapes. The precomputation, required by the first two steps, is a significant cost of the method; however, once performed, querying is two orders of magnitude faster. Extensions and variations of this methodology, for example, to handle more flexible and larger small-molecules are discussed.
Affiliation(s)
- Fabien Fontaine
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Evan Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Yulia Borodina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Stephen H Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, 8600 Rockville Pike, Bethesda, MD 20894, USA
|
43
|
Perekhodtsev G. Neighborhood Behavior: Validation of Two-Dimensional Molecular Similarity as a Predictor of Similar Biological Activities and Docking Scores. QSAR Comb Sci 2007. [DOI: 10.1002/qsar.200610052] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
|
44
|
Williams C. Reverse fingerprinting, similarity searching by group fusion and fingerprint bit importance. Mol Divers 2006; 10:311-32. [PMID: 17031535 DOI: 10.1007/s11030-006-9039-z] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Received: 10/28/2005] [Accepted: 01/25/2006] [Indexed: 11/29/2022]
Abstract
Recent research has shown that using data fusion rules in fingerprint-based similarity searching can improve results over traditional searches. Group fusion scores, which use multiple reference compounds, have in particular been shown to be quite effective in increasing enrichment rates over single reference structure based searches. In this paper, the effectiveness of using data fusion with multiple reference compounds to increase similarity search recall rates was investigated using 44 biological targets and four different 2D fingerprinting systems, including a new 2D typed triangle fingerprinting system introduced here. Scaffold-hopping abilities using data fusion rules were investigated using eight (8) different classes of scaffolds active against cGMP phosphodiesterase isoform 5 (PDE5). An approach to using the reference group for ranking and visualizing important fingerprint bits, or reverse fingerprinting, was presented, and used to score and visualize important pharmacophore features within sample active molecules. Finally, similarity statistics within the reference groups were investigated and compared to recall rates.
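Group fusion with multiple reference compounds can be illustrated with a short sketch. Each database compound receives the maximum similarity it reaches against any reference, and the database is ranked by that fused score. The MAX rule and the Tanimoto coefficient are assumptions chosen for this example, since the abstract does not state which fusion rule or coefficient the paper uses; the function names and toy fingerprints are likewise hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient on sets of 'on' fingerprint bit positions."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def group_fusion_max(references: Sequence[Set[int]],
                     database: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Score each compound by its best similarity to any reference (MAX rule)."""
    if not references:
        raise ValueError("at least one reference compound is required")
    fused = [
        (name, max(tanimoto(ref, fp) for ref in references))
        for name, fp in database.items()
    ]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    references = [{1, 2}, {3, 4}]
    database = {"x": {1, 2}, "y": {3, 5}, "z": {9}}
    for name, score in group_fusion_max(references, database):
        print(f"{name}: {score:.3f}")
```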
MESH Headings
- 3',5'-Cyclic-GMP Phosphodiesterases/chemistry
- Chemical Phenomena
- Chemistry, Pharmaceutical/methods
- Chemistry, Pharmaceutical/statistics & numerical data
- Chemistry, Physical
- Combinatorial Chemistry Techniques
- Cyclic Nucleotide Phosphodiesterases, Type 5
- Databases, Factual/statistics & numerical data
- Databases, Factual/trends
- Drug Design
- Humans
- Models, Biological
- Molecular Structure
- Receptors, Cell Surface/genetics
- Receptors, Cell Surface/physiology
- Reference Values
- Structure-Activity Relationship
|
45
|
Davies JW, Glick M, Jenkins JL. Streamlining lead discovery by aligning in silico and high-throughput screening. Curr Opin Chem Biol 2006; 10:343-51. [PMID: 16822701 DOI: 10.1016/j.cbpa.2006.06.022] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.3] [Received: 05/20/2006] [Accepted: 06/21/2006] [Indexed: 12/01/2022]
Abstract
Lead discovery in the pharmaceutical environment is largely an industrial-scale process in which it is typical to screen 1-5 million compounds in a matter of weeks using High Throughput Screening (HTS). This process is a very costly endeavor. Typically a HTS campaign of 1 million compounds will cost anywhere from $500,000 to $1,000,000. There is consequently a great deal of pressure to maximize the return on investment by finding fast and more effective ways to screen. A panacea that has emerged over the past few years to help address this issue is in silico screening. In silico screening is now incorporated in all areas of lead discovery; from target identification and library design, to hit analysis and compound profiling. However, as lead discovery has evolved over the past few years, so has the role of in silico screening.
Affiliation(s)
- John W Davies
- Lead Discovery Center, Novartis Institutes for Biomedical Research Inc, 250 Massachusetts Avenue, Cambridge, MA 02139, USA.
|