1
Lambrinidis G, Tsantili-Kakoulidou A. Multi-objective optimization methods in novel drug design. Expert Opin Drug Discov 2020; 16:647-658. [PMID: 33353441 DOI: 10.1080/17460441.2021.1867095] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/23/2022]
Abstract
Introduction: In multi-objective drug design, optimization gains importance, being upgraded to a discipline that attracts its own research. Current strategies are broadly classified into single-objective optimization (SOO) and multi-objective optimization (MOO). Areas covered: Starting with SOO and the ways used to incorporate multiple criteria into it, the present review focuses on MOO techniques, their comparison, advantages, and restrictions. Pareto analysis and the concept of dominance lie at the core of MOO. The Pareto front, Pareto ranking, and limitations of Pareto-based methods, due to high dimensions and data uncertainty, are outlined. Desirability functions and the weighted sum approaches are described as stand-alone techniques to transform the MOO problem to SOO or in combination with Pareto analysis and evolutionary algorithms. Representative applications in different drug research areas are also discussed. Expert opinion: Despite their limitations, the use of combined MOO techniques, as well as being complementary to SOO or in conjunction with artificial intelligence, contributes dramatically to efficient drug design, assisting decisions and increasing success probabilities. For multi-target drug design, optimization is supported by network approaches, while applicability of MOO to other fields like drug technology or biological complexity opens new perspectives in the interrelated fields of medicinal chemistry and molecular biology.
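As an illustration of the dominance concept and the weighted sum approach named in the abstract above, the following is a minimal Python sketch, not code from the review. The function names, the candidate scores, and the weights are all hypothetical, and both objectives are assumed to be minimized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def weighted_sum(point: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse several objectives into one score (MOO turned into SOO)."""
    return sum(w * x for w, x in zip(weights, point))


# Hypothetical candidates scored as (predicted toxicity, synthesis cost).
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.0, 3.5)]
front = pareto_front(candidates)

# Hypothetical weights; a different choice can select a different compound.
weights = (0.7, 0.3)
best = min(range(len(candidates)), key=lambda i: weighted_sum(candidates[i], weights))
```

The Pareto front keeps every non-dominated trade-off, whereas the weighted sum commits to one trade-off in advance through the chosen weights.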
Affiliation(s)
- George Lambrinidis
- Division of Pharmaceutical Chemistry, Department of Pharmacy, National and Kapodistrian University of Athens, Panepistimiopolis, Zografou, Athens, Greece
- Anna Tsantili-Kakoulidou
- Division of Pharmaceutical Chemistry, Department of Pharmacy, National and Kapodistrian University of Athens, Panepistimiopolis, Zografou, Athens, Greece
2
Tomberg A, Boström J. Can easy chemistry produce complex, diverse, and novel molecules? Drug Discov Today 2020; 25:2174-2181. [DOI: 10.1016/j.drudis.2020.09.027] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/05/2020] [Revised: 08/27/2020] [Accepted: 09/25/2020] [Indexed: 11/24/2022]
6
Quo vadis G protein-coupled receptor ligands? A tool for analysis of the emergence of new groups of compounds over time. Bioorg Med Chem Lett 2016; 27:626-631. [PMID: 27993519 DOI: 10.1016/j.bmcl.2016.12.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/25/2016] [Revised: 11/30/2016] [Accepted: 12/01/2016] [Indexed: 11/24/2022]
Abstract
Exponential growth in the number of compounds with experimentally verified activity towards particular targets has led to the emergence of various databases gathering data on biological activity. In this study, the ligands of family A of the G Protein-Coupled Receptors that are collected in the ChEMBL database were examined, and special attention was given to serotonin receptors. Sets of compounds were examined in terms of their appearance over time and mapped to the chemical space of drugs deposited in DrugBank, and the emergence of structurally new clusters of compounds was indicated. In addition, a tool for detailed analysis of the obtained visualizations was prepared and made available online at http://chem.gmum.net/vischem; it enables the investigation of chemical structures while referring to particular data points depicted in the figures and to changes in compound datasets over time.
7
Colliandre L, Le Guilloux V, Bourg S, Morin-Allory L. Visual characterization and diversity quantification of chemical libraries: 2. Analysis and selection of size-independent, subspace-specific diversity indices. J Chem Inf Model 2012; 52:327-42. [PMID: 22181665 DOI: 10.1021/ci200535y] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
High Throughput Screening (HTS) is a standard technique widely used to find hit compounds in drug discovery projects. The high costs associated with such experiments have highlighted the need to carefully design screening libraries in order to avoid wasting resources. Molecular diversity is an established concept that has been used to this end for many years. In this article, a new approach to quantify the molecular diversity of screening libraries is presented. The approach is based on the Delimited Reference Chemical Subspace (DRCS) methodology, a new method that can be used to delimit the densest subspace spanned by a reference library in a reduced 2D continuous space. A total of 22 diversity indices were implemented or adapted to this methodology, which is used here to remove outliers and obtain a relevant cell-based partition of the subspace. The behavior of these indices was assessed and compared in various extreme situations and with respect to a set of theoretical rules that a diversity function should satisfy when libraries of different sizes have to be compared. Some gold standard indices are found inappropriate in such a context, while none of the tested indices behave perfectly in all cases. Five DRCS-based indices accounting for different aspects of diversity were finally selected, and a simple framework is proposed to use them effectively. Various libraries have been profiled with respect to more specific subspaces, which further illustrate the interest of the method.
Affiliation(s)
- Lionel Colliandre
- Institut de Chimie Organique et Analytique (ICOA), Université d'Orléans-CNRS, UMR 7311 B.P. 6759 Rue de Chartres, 45067 Orléans Cedex 2, France
8
Hack MD, Rassokhin DN, Buyck C, Seierstad M, Skalkin A, ten Holte P, Jones TK, Mirzadegan T, Agrafiotis DK. Library Enhancement through the Wisdom of Crowds. J Chem Inf Model 2011; 51:3275-86. [DOI: 10.1021/ci200446y] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Indexed: 11/28/2022]
Affiliation(s)
- Michael D. Hack
- Johnson & Johnson Pharmaceutical Research & Development, L.L.C., 3210 Merryfield Row, San Diego, California 92121, United States
- Dmitrii N. Rassokhin
- Johnson & Johnson Pharmaceutical Research & Development, L.L.C., Welsh & McKean Roads, Spring House, Pennsylvania 19477, United States
- Christophe Buyck
- Janssen Research & Development, Division of Janssen Pharmaceutica NV, Turnhoutseweg 30, B-2340 Beerse, Belgium
- Mark Seierstad
- Johnson & Johnson Pharmaceutical Research & Development, L.L.C., 3210 Merryfield Row, San Diego, California 92121, United States
- Andrew Skalkin
- Johnson & Johnson Pharmaceutical Research & Development, L.L.C., Welsh & McKean Roads, Spring House, Pennsylvania 19477, United States
- Peter ten Holte
- Janssen Research & Development, Division of Janssen Pharmaceutica NV, Turnhoutseweg 30, B-2340 Beerse, Belgium
- Todd K. Jones
- Johnson & Johnson Pharmaceutical Research & Development, L.L.C., 3210 Merryfield Row, San Diego, California 92121, United States
- Todd Jones Consulting, San Diego, California
- Taraneh Mirzadegan
- Johnson & Johnson Pharmaceutical Research & Development, L.L.C., 3210 Merryfield Row, San Diego, California 92121, United States
- Dimitris K. Agrafiotis
- Johnson & Johnson Pharmaceutical Research & Development, L.L.C., Welsh & McKean Roads, Spring House, Pennsylvania 19477, United States
9
Schamberger J, Grimm M, Steinmeyer A, Hillisch A. Rendezvous in chemical space? Comparing the small molecule compound libraries of Bayer and Schering. Drug Discov Today 2011; 16:636-41. [DOI: 10.1016/j.drudis.2011.04.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 09/28/2010] [Revised: 03/28/2011] [Accepted: 04/19/2011] [Indexed: 11/28/2022]
10
Ma C, Wang L, Xie XQ. GPU accelerated chemical similarity calculation for compound library comparison. J Chem Inf Model 2011; 51:1521-7. [PMID: 21692447 DOI: 10.1021/ci1004948] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.8] [Indexed: 01/14/2023]
Abstract
Chemical similarity calculation plays an important role in compound library design, virtual screening, and "lead" optimization. In this manuscript, we present a novel GPU-accelerated algorithm for all-vs-all Tanimoto matrix calculation and nearest neighbor search. By taking advantage of multicore GPU architecture and CUDA parallel programming technology, the algorithm is up to 39 times superior to the existing commercial software that runs on CPUs. Because of the utilization of intrinsic GPU instructions, this approach is nearly 10 times faster than existing GPU-accelerated sparse vector algorithm, when Unity fingerprints are used for Tanimoto calculation. The GPU program that implements this new method takes about 20 min to complete the calculation of Tanimoto coefficients between 32 M PubChem compounds and 10K Active Probes compounds, i.e., 324G Tanimoto coefficients, on a 128-CUDA-core GPU.
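The two operations the abstract above accelerates, all-vs-all Tanimoto calculation and nearest neighbor search, can be sketched on the CPU in a few lines of Python. This is a plain reference sketch and not the authors' CUDA algorithm; the fingerprints are hypothetical 8-bit values stored as integers, and returning 0.0 for two empty fingerprints is an assumed convention.

```python
from typing import List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two bit-vector fingerprints held as ints:
    (bits set in both) / (bits set in either)."""
    common = bin(a & b).count("1")
    union = bin(a | b).count("1")
    # Assumed convention: two all-zero fingerprints have similarity 0.0.
    return common / union if union else 0.0


def nearest_neighbors(queries: List[int], library: List[int]) -> List[Tuple[int, float]]:
    """For each query, return (index of the most similar library entry, similarity)."""
    hits = []
    for q in queries:
        sims = [tanimoto(q, f) for f in library]  # one row of the all-vs-all matrix
        best = max(range(len(library)), key=sims.__getitem__)
        hits.append((best, sims[best]))
    return hits


# Hypothetical 8-bit fingerprints; real ones have hundreds or thousands of bits.
library = [0b11110000, 0b11001100, 0b00001111]
queries = [0b11100000, 0b00000111]
hits = nearest_neighbors(queries, library)
```

Each row of the similarity matrix is independent of the others, which is what makes the problem a natural fit for the massively parallel hardware the abstract describes.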
Affiliation(s)
- Chao Ma
- Department of Computational and Systems Biology, Joint Pitt/CMU Computational Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA
11
Gillet VJ. Diversity selection algorithms. Wiley Interdisciplinary Reviews: Computational Molecular Science 2011. [DOI: 10.1002/wcms.33] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
12
Meinl T, Ostermann C, Berthold MR. Maximum-Score Diversity Selection for Early Drug Discovery. J Chem Inf Model 2011; 51:237-47. [DOI: 10.1021/ci100426r] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Affiliation(s)
- Thorsten Meinl
- Nycomed Chair for Bioinformatics and Information Mining, University of Konstanz, Konstanz, Germany
- Michael R. Berthold
- Nycomed Chair for Bioinformatics and Information Mining, University of Konstanz, Konstanz, Germany
13
Liu Y, Verducci JS. Review of statistical analyses in drug discovery and chemogenomics. Stat Anal Data Min 2009. [DOI: 10.1002/sam.10041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
14
Walters WP, Murcko MA. Library Filtering Systems and Prediction of Drug‐Like Properties. 2008. [DOI: 10.1002/9783527613083.ch2] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/07/2022]
15
Agrafiotis DK, Lobanov VS, Rassokhin DN, Izrailev S. The Measurement of Molecular Diversity. 2008. [DOI: 10.1002/9783527613083.ch12] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
16
Lewis RA, Pickett SD, Clark DE. Computer-Aided Molecular Diversity Analysis and Combinatorial Library Design. Reviews in Computational Chemistry 2007. [DOI: 10.1002/9780470125939.ch1] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 02/11/2023]
17
Papp A, Gulyas-Forró A, Gulyas Z, Dorman G, Urge L, Darvas F. Explicit Diversity Index (EDI): a novel measure for assessing the diversity of compound databases. J Chem Inf Model 2006; 46:1898-904. [PMID: 16995719 DOI: 10.1021/ci060074f] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
A novel diversity assessment method, the Explicit Diversity Index (EDI), is introduced for druglike molecules. EDI combines structural and synthesis-related dissimilarity values and expresses them as a single number. As an easily interpretable measure, it facilitates the decision making in the design of combinatorial libraries, and it might assist in the comparison of compound sets provided by different manufacturers. Because of its rapid calculation algorithm, EDI enables the diversity assessment of in-house or commercial compound collections.
Affiliation(s)
- Akos Papp
- AMRI Hungary, Zahony u. 7, H-1031 Budapest, Hungary, ComGrid Ltd., Zahony u. 7, H-1031 Budapest, Hungary
18
Engels MFM, Gibbs AC, Jaeger EP, Verbinnen D, Lobanov VS, Agrafiotis DK. A Cluster-Based Strategy for Assessing the Overlap between Large Chemical Libraries and Its Application to a Recent Acquisition. J Chem Inf Model 2006; 46:2651-60. [PMID: 17125205 DOI: 10.1021/ci600219n] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
We report on the structural comparison of the corporate collections of Johnson & Johnson Pharmaceutical Research & Development (JNJPRD) and 3-Dimensional Pharmaceuticals (3DP), performed in the context of the recent acquisition of 3DP by JNJPRD. The main objective of the study was to assess the druglikeness of the 3DP library and the extent to which it enriched the chemical diversity of the JNJPRD corporate collection. The two databases, at the time of acquisition, collectively contained more than 1.1 million compounds with a clearly defined structural description. The analysis was based on a clustering approach and aimed at providing an intuitive quantitative estimate and visual representation of this enrichment. A novel hierarchical clustering algorithm called divisive k-means was employed in combination with Kelley's cluster-level selection method to partition the combined data set into clusters, and the diversity contribution of each library was evaluated as a function of the relative occupancy of these clusters. Typical 3DP chemotypes enriching the diversity of the JNJPRD collection were catalogued and visualized using a modified maximum common substructure algorithm. The joint collection of JNJPRD and 3DP compounds was also compared to other databases of known medicinally active or druglike compounds. The potential of the methodology for the analysis of very large chemical databases is discussed.
Affiliation(s)
- Michael F M Engels
- Johnson and Johnson Pharmaceutical Research and Development, Division of Janssen Pharmaceutica, Turnhoutsweg 30, 2340 Beerse, Belgium.
19
Monge A, Arrault A, Marot C, Morin-Allory L. Managing, profiling and analyzing a library of 2.6 million compounds gathered from 32 chemical providers. Mol Divers 2006; 10:389-403. [PMID: 17031540 DOI: 10.1007/s11030-006-9033-5] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.8] [Received: 11/24/2005] [Accepted: 04/07/2006] [Indexed: 01/04/2023]
Abstract
The data for 3.8 million compounds from structural databases of 32 providers were gathered and stored in a single chemical database. Duplicates were removed using the IUPAC International Chemical Identifier, after which 2.6 million compounds remain. Each database and the final one were studied in terms of uniqueness, diversity, frameworks, 'drug-like' and 'lead-like' properties. This study also shows that there are more than 87 000 frameworks in the database. It contains 2.1 million 'drug-like' molecules, of which more than one million are 'lead-like'. This study has been carried out using 'ScreeningAssistant', a software dedicated to chemical databases management and screening sets generation. Compounds are stored in a MySQL database and all the operations on this database are carried out by Java code. The druglikeness and leadlikeness are estimated with 'in-house' scores using functions to estimate convenience to properties; unicity using the InChI code and diversity using molecular frameworks and fingerprints. The software has been conceived in order to facilitate the update of the database. 'ScreeningAssistant' is freely available under the GPL license.
Affiliation(s)
- Aurélien Monge
- Institut de Chimie Organique et Analytique, UMR CNRS 6005, Université d'Orléans, Orléans Cedex 2, France.
20
Affiliation(s)
- K. M. Eskridge
- Department of Statistics, University of Nebraska, Lincoln, NE, 68583-0712
- S. G. Gilmour
- School of Mathematical Sciences, Queen Mary, University of London, E1 4NS, UK
- R. Mead
- Department of Applied Statistics, University of Reading, RG6 6FN, UK
- N. A. Butler
- Department of Applied Statistics, University of Reading, RG6 6FN, UK
- D. A. Travnicek
- Department of Statistics, University of Nebraska, Lincoln, NE, 68583-0712
21
Guha R, Dutta D, Jurs PC, Chen T. R-NN Curves: An Intuitive Approach to Outlier Detection Using a Distance Based Method. J Chem Inf Model 2006; 46:1713-22. [PMID: 16859303 DOI: 10.1021/ci060013h] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
Libraries of chemical structures are used in a variety of cheminformatics tasks such as virtual screening and QSAR modeling and are generally characterized using molecular descriptors. When working with libraries it is useful to understand the distribution of compounds in the space defined by a set of descriptors. We present a simple approach to the analysis of the spatial distribution of the compounds in a library in general and outlier detection in particular based on counts of neighbors within a series of increasing radii. The resultant curves, termed R-NN curves, appear to follow a logistic model for any given descriptor space, which we justify theoretically for the 2D case. The method can be applied to data sets of arbitrary dimensions. The R-NN curves provide a visual method to easily detect compounds lying in a sparse region of a given descriptor space. We also present a method to numerically characterize the R-NN curves thus allowing identification of outliers in a single plot.
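The neighbor-counting idea in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Euclidean distance, the 2D descriptor values, and the radii are all hypothetical choices, and the curve is expressed as the fraction of other compounds within each radius.

```python
import math
from typing import List, Sequence


def rnn_curve(points: List[Sequence[float]], index: int, radii: Sequence[float]) -> List[float]:
    """Fraction of the other points lying within each radius of points[index]."""
    center = points[index]
    dists = [math.dist(center, p) for j, p in enumerate(points) if j != index]
    return [sum(d <= r for d in dists) / len(dists) for r in radii]


# Hypothetical 2D descriptor space: four clustered compounds and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
radii = [1.0, 2.0, 15.0]
curves = [rnn_curve(points, i, radii) for i in range(len(points))]
```

A compound in a dense region reaches a high neighbor fraction at small radii, while the curve of a compound in a sparse region stays near zero until the radius becomes large, which is the visual signature of an outlier.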
Affiliation(s)
- Rajarshi Guha
- Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania 16802, USA
22
Abstract
Medicinal chemists have traditionally realized assessments of chemical diversity and subsequent compound acquisition, although a recent study suggests that experts are usually inconsistent in reviewing large data sets. To analyze the scaffold diversity of commercially available screening collections, we have developed a general workflow aimed at (1) identifying druglike compounds, (2) clustering them by maximum common substructures (scaffolds), (3) measuring the scaffold diversity encoded by each screening collection independently of its size, and finally (4) merging all common substructures in a nonredundant scaffold library that can easily be browsed by structural and topological queries. Starting from 2.4 million compounds out of 12 commercial sources, four categories of libraries could be identified: large- and medium-sized combinatorial libraries (low scaffold diversity), diverse libraries (medium diversity, medium size), and highly diverse libraries (high diversity, low size). The chemical space covered by the scaffold library can be searched to prioritize scaffold-focused libraries.
Affiliation(s)
- Mireille Krier
- CNRS UMR7175-LC1, Institut Gilbert Laustriat, 74 route du Rhin, F-67401 Illkirch Cédex, France
23
Remlinger KS, Hughes-Oliver JM, Young SS, Lam RL. Statistical Design of Pools Using Optimal Coverage and Minimal Collision. Technometrics 2006. [DOI: 10.1198/004017005000000481] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Indexed: 11/21/2022]
24
Abstract
Twenty years ago, drug discovery was a somewhat plodding and scholastic endeavor; those days are gone. The intellectual challenges are greater than ever but the pace has changed. Although there are greater opportunities for therapeutic targets than ever before, the costs and risks are great and the increasingly competitive environment makes the pace of pharmaceutical drug hunting range from exciting to overwhelming. These changes are catalyzed by major changes to drug discovery processes through application of rapid parallel synthesis of large chemical libraries and high-throughput screening. These techniques result in huge volumes of data for use in decision making. Besides the size and complex nature of biological and chemical data sets and the many sources of data “noise”, the needs of business produce many, often conflicting, decision criteria and constraints such as time, cost, and patent caveats. The drive is still to find potent and selective molecules but, in recent years, key aspects of drug discovery are being shifted to earlier in the process. Discovery scientists are now concerned with building molecules that have good stability but also reasonable properties of absorption into the bloodstream, distribution and binding to tissues, metabolism and excretion, low toxicity, and reasonable cost of production. These requirements result in a high-dimensional decision problem with conflicting criteria and limited resources. An overview of the broad range of issues and activities involved in pharmaceutical screening is given along with references for further reading.
25
Fabricant DS, Nikolic D, Lankin DC, Chen SN, Jaki BU, Krunic A, van Breemen RB, Fong HHS, Farnsworth NR, Pauli GF. Cimipronidine, a cyclic guanidine alkaloid from Cimicifuga racemosa. Journal of Natural Products 2005; 68:1266-70. [PMID: 16124775 DOI: 10.1021/np050066d] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 05/04/2023]
Abstract
A new cyclic guanidine alkaloid, cimipronidine (1), together with the known compound fukinolic acid (2), was isolated from the n-BuOH-soluble fraction of Cimicifuga racemosa roots that showed 5-HT7 receptor binding activity. Structure elucidation of 1, a minor constituent, presented unique challenges based on its polarity, but was accomplished with the use of a combination of one- and two-dimensional NMR as well as MS analyses. The relative configuration was established by analyzing the H,H-coupling constants and the results of the 2-D gradient NOESY spectrum. The previously reported serotonergic (5-HT7), highly polar, n-BuOH-soluble fraction was characterized by HPLC-ELSD and was shown to be a mixture containing the following compounds: cimicifugic acids A, B, and F, fukinolic acid, ferulic acid, isoferulic acid, and compound 1, potentially significant as a marker compound of C. racemosa.
Affiliation(s)
- Daniel S Fabricant
- Program for Collaborative Research in the Pharmaceutical Sciences, College of Pharmacy, University of Illinois at Chicago, Chicago, Illinois 60612, USA
26
Le Bailly de Tilleghem C, Beck B, Boulanger B, Govaerts B. A Fast Exchange Algorithm for Designing Focused Libraries in Lead Optimization. J Chem Inf Model 2005; 45:758-67. [PMID: 15921465 DOI: 10.1021/ci049787t] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Combinatorial chemistry is widely used in drug discovery. Once a lead compound has been identified, a series of R-groups and reagents can be selected and combined to generate new potential drugs. The combinatorial nature of this problem leads to chemical libraries containing usually a very large number of virtual compounds, far too large to permit their chemical synthesis. Therefore, one often wants to select a subset of "good" reagents for each R-group of reagents and synthesize all their possible combinations. In this research, one encounters some difficulties. First, the selection of reagents has to be done such that the compounds of the resulting sublibrary simultaneously optimize a series of chemical properties. For each compound, a desirability index, a concept proposed by Harrington,(20) is used to summarize those properties in one fitness value. Then a loss function is used as objective criteria to globally quantify the quality of a sublibrary. Second, there are a huge number of possible sublibraries, and the solutions space has to be explored as fast as possible. The WEALD algorithm proposed in this paper starts with a random solution and iterates by applying exchanges, a simple method proposed by Fedorov(13) and often used in the generation of optimal designs. Those exchanges are guided by a weighting of the reagents adapted recursively as the solutions space is explored. The algorithm is applied on a real database and reveals to converge rapidly. It is compared to results given by two other algorithms presented in the combinatorial chemistry literature: the Ultrafast algorithm of D. Agrafiotis and V. Lobanov and the Piccolo algorithm of W. Zheng et al.
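The desirability index mentioned in the abstract above summarizes several properties in one fitness value. The sketch below uses the commonly cited form of Harrington's one-sided desirability and the geometric mean as the overall index; it is an illustration only, not the paper's code, and it covers neither the loss function nor the WEALD exchange steps. The linear scaling coefficients `b0` and `b1` and the example values are hypothetical.

```python
import math
from typing import Sequence


def harrington_one_sided(y: float, b0: float = 0.0, b1: float = 1.0) -> float:
    """One-sided desirability d = exp(-exp(-(b0 + b1 * y))), a value in (0, 1).
    b0 and b1 are hypothetical scaling coefficients chosen per property."""
    return math.exp(-math.exp(-(b0 + b1 * y)))


def overall_desirability(ds: Sequence[float]) -> float:
    """Geometric mean of the individual desirabilities.
    A single fully undesirable property (d = 0) drives the index to 0."""
    if any(d == 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))


# Hypothetical compound with two already-scaled property values.
d_values = [harrington_one_sided(0.0), harrington_one_sided(2.0)]
fitness = overall_desirability(d_values)
```

The geometric mean is what makes the index strict: a compound cannot compensate for one unacceptable property with excellent values elsewhere.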
Affiliation(s)
- Céline Le Bailly de Tilleghem
- Institute of Statistics from the Université catholique de Louvain - 20, voie du roman pays, 1348 Louvain-la-Neuve, Belgium.
27
Brown RD, Clark DE. Genetic diversity: applications of evolutionary algorithms to combinatorial library design. Expert Opin Ther Pat 2005. [DOI: 10.1517/13543776.8.11.1447] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/05/2022]
28
Jónsdóttir SO, Jørgensen FS, Brunak S. Prediction methods and databases within chemoinformatics: emphasis on drugs and drug candidates. Bioinformatics 2005; 21:2145-60. [PMID: 15713739 DOI: 10.1093/bioinformatics/bti314] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022]
Abstract
Motivation: To gather information about available databases and chemoinformatics methods for prediction of properties relevant to the drug discovery and optimization process. Results: We present an overview of the most important databases with 2-dimensional and 3-dimensional structural information about drugs and drug candidates, and of databases with relevant properties. Access to experimental data and numerical methods for selecting and utilizing these data is crucial for developing accurate predictive in silico models. Many interesting predictive methods for classifying the suitability of chemical compounds as potential drugs, as well as for predicting their physico-chemical and ADMET properties have been proposed in recent years. These methods are discussed, and some possible future directions in this rapidly developing field are described.
Affiliation(s)
- Svava Osk Jónsdóttir
- Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark.
29
Young SS, Wang M, Gu F. Design of Diverse and Focused Combinatorial Libraries Using an Alternating Algorithm. J Chem Inf Comput Sci 2003; 43:1916-21. [PMID: 14632440 DOI: 10.1021/ci034125+] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
There is considerable research in chemistry to develop reaction conditions so that any of a very large number of reactants will successfully form new compounds, e.g. for two components, A(i) + B(j) --> A-B(ij). The numbers of A's and B's usually make it impossible to make all the possible products; with multicomponent reactions, there could easily be millions to billions of possible products. There is a need to identify subsets of reagents so that the resulting products have desirable predicted properties. Our idea is to select reactants sequentially and iteratively to optimize the evolving candidate library. The new Alternating Algorithm, AA, can be used for diversity, a space-filling design, or for a focused design, using either a near neighborhood or structure-activity relationship, SAR. A diversity design seeks to select compounds different from one another; a focused design seeks to find compounds similar to an active compound or compounds that follow a structure activity relationship. The benefit of the method is rapid computation of diversity or focused combinatorial chemical libraries.
Affiliation(s)
- S Stanley Young
- National Institute of Statistical Sciences, RTP, North Carolina 27709, USA.
30
Affiliation(s)
- Mary P Bradley
- Pfizer Global Research and Development, 2800 Plymouth Road, Ann Arbor, MI 48105, USA.
31
Abstract
Combinatorial chemistry and high-throughput screening have caused a fundamental shift in the way chemists contemplate experiments. Designing a combinatorial library is a controversial art that involves a heterogeneous mix of chemistry, mathematics, economics, experience, and intuition. Although there seems to be little agreement as to what constitutes an ideal library, one thing is certain: only one property or measure seldom defines the quality of the design. In most real-world applications, a good experiment requires the simultaneous optimization of several, often conflicting, design objectives, some of which may be vague and uncertain. In this paper, we discuss a class of algorithms for subset selection rooted in the principles of multiobjective optimization. Our approach is to employ an objective function that encodes all of the desired selection criteria, and then use a simulated annealing or evolutionary approach to identify the optimal (or a nearly optimal) subset from among the vast number of possibilities. Many design criteria can be accommodated, including diversity, similarity to known actives, predicted activity and/or selectivity determined by quantitative structure-activity relationship (QSAR) models or receptor binding models, enforcement of certain property distributions, reagent cost and availability, and many others. The method is robust, convergent, and extensible, offers the user full control over the relative significance of the various objectives in the final design, and permits the simultaneous selection of compounds from multiple libraries in full- or sparse-array format.
Affiliation(s)
- D K Agrafiotis
- 3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Suite 104, Exton, Pennsylvania 19341, USA.
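The abstract above describes encoding all selection criteria in one objective function and searching for a subset by simulated annealing. A minimal sketch of that idea follows, under stated assumptions: the two criteria (mean nearest-neighbour distance as diversity, mean predicted activity), the weights `w_div` and `w_act`, the geometric cooling schedule, and the function names are all hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def objective(subset, coords, activity, w_div=1.0, w_act=1.0):
    """Weighted sum of two illustrative criteria (assumed, not the paper's)."""
    # Diversity term: mean distance from each selected compound to its
    # nearest selected neighbour.
    nearest = [
        min(math.dist(coords[i], coords[j]) for j in subset if j != i)
        for i in subset
    ]
    diversity = sum(nearest) / len(nearest)
    # Activity term: mean predicted activity of the selected compounds.
    mean_activity = sum(activity[i] for i in subset) / len(subset)
    return w_div * diversity + w_act * mean_activity


def anneal_subset(coords, activity, k, steps=2000, t_start=1.0, t_end=0.01,
                  seed=0, **weights):
    """Pick k of n compounds (2 <= k < n) by simulated annealing."""
    rng = random.Random(seed)
    n = len(coords)
    current = rng.sample(range(n), k)
    cur_score = objective(current, coords, activity, **weights)
    best, best_score = list(current), cur_score
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        # Move: swap one selected compound for one unselected compound.
        pos = rng.randrange(k)
        candidate = rng.choice([i for i in range(n) if i not in current])
        trial = list(current)
        trial[pos] = candidate
        score = objective(trial, coords, activity, **weights)
        # Always accept improvements; accept worse moves with
        # Boltzmann probability exp(delta / t).
        if score >= cur_score or rng.random() < math.exp((score - cur_score) / t):
            current, cur_score = trial, score
            if cur_score > best_score:
                best, best_score = list(current), cur_score
    return sorted(best), best_score
```

Changing `w_div` and `w_act` shifts the balance between the two criteria, which is one simple way to give the user control over their relative significance.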
|
32
|
Abstract
The fast identification of quality lead compounds in the pharmaceutical industry through a combination of high throughput synthesis and screening has become more challenging in recent years. Although the number of available compounds for high throughput screening (HTS) has dramatically increased, large-scale random combinatorial libraries have contributed proportionally less to identify novel leads for drug discovery projects. Therefore, the concept of 'drug-likeness' of compound selections has become a focus in recent years. In parallel, the low success rate of converting lead compounds into drugs often due to unfavorable pharmacokinetic parameters has sparked a renewed interest in understanding more clearly what makes a compound drug-like. Various approaches have been devised to address the drug-likeness of molecules employing retrospective analyses of known drug collections as well as attempting to capture 'chemical wisdom' in algorithms. For example, simple property counting schemes, machine learning methods, regression models, and clustering methods have been employed to distinguish between drugs and non-drugs. Here we review computational techniques to address the drug-likeness of compound selections and offer an outlook for the further development of the field.
Affiliation(s)
- Ingo Muegge
- Bayer Research Center, 400 Morgan Lane, West Haven, Connecticut 06516, USA.
|
33
|
Matter H. Computational approaches towards the quantification of molecular diversity and design of compound libraries. EXS 2003:125-56. [PMID: 12613175 DOI: 10.1007/978-3-0348-7997-2_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 10/18/2022]
Affiliation(s)
- Hans Matter
- Aventis Pharma Deutschland GmbH, DI&A Chemistry, Molecular Modelling, Building G878, D-65926 Frankfurt am Main, Germany
|
34
|
Godden JW, Xue L, Kitchen DB, Stahura FL, Schermerhorn EJ, Bajorath J. Median Partitioning: a novel method for the selection of representative subsets from large compound pools. J Chem Inf Comput Sci 2002; 42:885-93. [PMID: 12132890 DOI: 10.1021/ci0203693] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
A method termed Median Partitioning (MP) has been developed to select diverse sets of molecules from large compound pools. Unlike many other methods for subset selection, the MP approach does not depend on pairwise comparison of molecules and can therefore be applied to very large compound collections. The only time limiting step is the calculation of molecular descriptors for database compounds. MP employs arrays of property descriptors with little correlation to divide large compound pools into partitions from which representative molecules can be selected. In each of n subsequent steps, a population of molecules is divided into subpopulations above and below the median value of a property descriptor until a desired number of 2^n partitions are obtained. For descriptor evaluation and selection, an entropy formulation was embedded in a genetic algorithm. MP has been applied here to generate a subset of the Available Chemicals Directory, and the results have been compared with cell-based partitioning.
Affiliation(s)
- Jeffrey W Godden
- Department of Computer-Aided Drug Discovery, Albany Molecular Research, Inc, 21 Corporate Circle, Albany, New York 12212-5098, USA
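The partitioning step in the Median Partitioning abstract above (n successive median splits giving up to 2^n partitions) can be sketched as follows. This is an illustration only: it assumes the median is recomputed inside each subpopulation and that ties go to the upper half, neither of which the abstract specifies, and the descriptor names and function name are hypothetical. The descriptor selection by entropy and genetic algorithm is not covered.

```python
from statistics import median


def median_partition(compounds, descriptors):
    """Split compounds into up to 2**n partitions using n descriptors.

    compounds: list of dicts mapping descriptor name -> numeric value.
    descriptors: descriptor names, applied one per step.
    """
    partitions = [list(compounds)]
    for name in descriptors:
        next_partitions = []
        for part in partitions:
            # Assumption: the median is taken within each subpopulation.
            m = median(c[name] for c in part)
            below = [c for c in part if c[name] < m]
            above = [c for c in part if c[name] >= m]
            # Drop empty halves (possible when many values tie).
            next_partitions.extend(p for p in (below, above) if p)
        partitions = next_partitions
    return partitions


# Toy data with two made-up descriptors.
toy = [
    {"id": i, "mw": mw, "logp": lp}
    for i, (mw, lp) in enumerate(
        [(100, 1.0), (150, 3.0), (200, 2.0), (250, 4.0),
         (300, 0.5), (350, 2.5), (400, 1.5), (450, 3.5)]
    )
]
parts = median_partition(toy, ["mw", "logp"])
```

No pairwise comparison is needed: each step costs one median per partition, which is why such a scheme scales to large pools. A representative could then be drawn from each partition.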
|
35
|
Tounge BA, Pfahler LB, Reynolds CH. Chemical information based scaling of molecular descriptors: a universal chemical scale for library design and analysis. J Chem Inf Comput Sci 2002; 42:879-84. [PMID: 12132889 DOI: 10.1021/ci025503y] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
Scaling is a difficult issue for any analysis of chemical properties or molecular topology when disparate descriptors are involved. To compare properties across different data sets, a common scale must be defined. Using several publicly available databases (ACD, CMC, MDDR, and NCI) as a basis, we propose to define chemically meaningful scales for a number of molecular properties and topology descriptors. These chemically derived scaling functions have several advantages. First, it is possible to define chemically relevant scales, greatly simplifying similarity and diversity analyses across data sets. Second, this approach provides a convenient method for setting descriptor boundaries that define chemically reasonable topology spaces. For example, descriptors can be scaled so that compounds with little potential for biological activity, bioavailability, or other drug-like characteristics are easily identified as outliers. We have compiled scaling values for 314 molecular descriptors. In addition the 10th and 90th percentile values for each descriptor have been calculated for use in outlier filtering.
Affiliation(s)
- Brett A Tounge
- Johnson & Johnson Pharmaceutical Research and Development, L.L.C., P.O. Box 776, Welsh and McKean Roads, Spring House, Pennsylvania 19477-0776, USA.
|
36
|
Agrafiotis DK, Lobanov VS, Salemme FR. Combinatorial informatics in the post-genomics era. Nat Rev Drug Discov 2002; 1:337-46. [PMID: 12120409 DOI: 10.1038/nrd791] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.2] [Indexed: 11/09/2022]
Abstract
The multitude of potential drug targets emerging from genome sequencing demands new approaches to drug discovery. A chemogenomics strategy, which involves the generation of small-molecule compounds that can be used both as tools to probe biological mechanisms and as leads for drug-property optimization, provides a highly parallel, industrialized solution. Key to the success of this strategy is an integrated suite of chemi-informatics applications that can allow the rapid and directed optimization of chemical compounds with drug-like properties using 'just-in-time' combinatorial chemical synthesis. An effective embodiment of this process requires new computational and data-mining tools that cover all aspects of library generation, compound selection and experimental design, and work effectively on a massive scale.
Affiliation(s)
- Dimitris K Agrafiotis
- 3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Exton, Pennsylvania 19341, USA.
|
37
|
|
38
|
|
39
|
Bradley MP. An overview of the diversity represented in commercially-available databases. J Comput Aided Mol Des 2002; 16:301-9. [PMID: 12489680 DOI: 10.1023/a:1020811805001] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Affiliation(s)
- Mary P Bradley
- Pfizer Global Research and Development, 2800 Plymouth Road, Ann Arbor, MI 48105, USA.
|
40
|
Abstract
Recent developments in combinatorial chemistry and high-throughput screening have dramatically increased the scale on which drug discovery programs are carried out. Along with these advances has come a need for automated methods of determining which compounds from a library should be synthesized and screened. These methods range from simple counting schemes to sophisticated machine learning techniques such as neural networks. While many of these methods have performed well in validation studies, the field is still in its formative stage. This paper reviews a number of computational techniques for identifying drug-like molecules and examines challenges facing the field.
Affiliation(s)
- W Patrick Walters
- Vertex Pharmaceuticals, 130 Waverly Street, Cambridge, MA 02139, USA.
|
41
|
Agrafiotis DK, Rassokhin DN. A fractal approach for selecting an appropriate bin size for cell-based diversity estimation. J Chem Inf Comput Sci 2002; 42:117-22. [PMID: 11855975 DOI: 10.1021/ci010314l] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
A novel approach for selecting an appropriate bin size for cell-based diversity assessment is presented. The method measures the sensitivity of the diversity index as a function of grid resolution, using a box-counting algorithm that is reminiscent of those used in fractal analysis. It is shown that the relative variance of the diversity score (sum of squared cell occupancies) of several commonly used molecular descriptor sets exhibits a bell-shaped distribution, whose exact characteristics depend on the distribution of the data set, the number of points considered, and the dimensionality of the feature space. The peak of this distribution represents the optimal bin size for a given data set and sample size. Although box counting can be performed in an algorithmically efficient manner, the ability of cell-based methods to distinguish between subsets of different spread falls sharply with dimensionality, and the method becomes useless beyond a few dimensions.
Affiliation(s)
- Dimitris K Agrafiotis
- 3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Exton, Pennsylvania 19341, USA.
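The diversity score named in the abstract above, the sum of squared cell occupancies, can be computed at several grid resolutions with a few lines of code. The sketch below assumes every descriptor has already been scaled to the interval [0, 1]; the function name and the toy points are hypothetical, and the relative-variance analysis used in the paper to pick the bin size is not reproduced.

```python
from collections import Counter


def cell_score(points, bins):
    """Sum of squared cell occupancies on a regular grid.

    points: sequences of coordinates, each assumed to lie in [0, 1].
    bins: number of bins per dimension.
    Lower scores mean the points are spread over more cells.
    """
    occupancy = Counter(
        # Clamp x == 1.0 into the last bin.
        tuple(min(int(x * bins), bins - 1) for x in p)
        for p in points
    )
    return sum(count * count for count in occupancy.values())


# Scan a few grid resolutions for a toy set of three points.
toy_points = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9)]
scores = {bins: cell_score(toy_points, bins) for bins in (1, 2, 10)}
```

With one bin per dimension every point shares a cell and the score is N^2; with a very fine grid every point sits alone and the score is N. Both extremes carry no information about spread, which is why an intermediate bin size has to be chosen.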
|
42
|
Anzali S, Barnickel G, Cezanne B, Krug M, Filimonov D, Poroikov V. Discriminating between drugs and nondrugs by prediction of activity spectra for substances (PASS). J Med Chem 2001; 44:2432-7. [PMID: 11448225 DOI: 10.1021/jm0010670] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.3] [Indexed: 11/28/2022]
Abstract
Using the computer system PASS (prediction of activity spectra for substances), which predicts simultaneously several hundreds of biological activities, a training set for discriminating between drugs and nondrugs is created. For the training set, two subsets of databases of drugs and nondrugs (a subset of the World Drug Index, WDI, vs the Available Chemicals Directory, ACD) are used. The high value of prediction accuracy shows that the chemical descriptors and algorithms used in PASS provide highly robust structure-activity relationships and reliable predictions. Compared to other methods applied in this field, the direct benchmark undertaken with this paper showed that the results obtained with PASS are in good accordance with these approaches. In addition, it has been shown that the more specific drug information used in the training set of PASS, the more specific discrimination between drug and nondrug can be obtained.
Affiliation(s)
- S Anzali
- Bio- and Chemoinformatics Department, Merck KGaA, Darmstadt D-64271, Germany.
|
43
|
Agrafiotis DK, Rassokhin DN. Design and prioritization of plates for high-throughput screening. J Chem Inf Comput Sci 2001; 41:798-805. [PMID: 11410060 DOI: 10.1021/ci000313d] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
Abstract
A general algorithm for the prioritization and selection of plates for high-throughput screening is presented. The method uses a simulated annealing algorithm to search through the space of plate combinations for the one that maximizes some user-defined objective function. The algorithm is robust and convergent, and permits the simultaneous optimization of multiple design objectives, including molecular diversity, similarity to known actives, predicted activity or binding affinity, and many others. It is shown that the arrangement of compounds among the plates may have important consequences on the ability to design a well-targeted and cost-effective experiment. To that end, two simple and effective schemes for the construction of homogeneous and heterogeneous plates are outlined, using a novel similarity sorting algorithm based on one-dimensional nonlinear mapping.
Affiliation(s)
- D K Agrafiotis
- 3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Exton, Pennsylvania 19341, USA.
|
44
|
Voigt JH, Bienfait B, Wang S, Nicklaus MC. Comparison of the NCI open database with seven large chemical structural databases. J Chem Inf Comput Sci 2001; 41:702-12. [PMID: 11410049 DOI: 10.1021/ci000150t] [Citation(s) in RCA: 176] [Impact Index Per Article: 7.7] [Indexed: 11/29/2022]
Abstract
Eight large chemical databases have been analyzed and compared to each other. Central to this comparison is the open National Cancer Institute (NCI) database, consisting of approximately 250 000 structures. The other databases analyzed are the Available Chemicals Directory ("ACD," from MDL, release 1.99, 3D-version); the ChemACX ("ACX," from CamSoft, Version 4.5); the Maybridge Catalog and the Asinex database (both as distributed by CamSoft as part of ChemInfo 4.5); the Sigma-Aldrich Catalog (CD-ROM, 1999 Version); the World Drug Index ("WDI," Derwent, version 1999.03); and the organic part of the Cambridge Crystallographic Database ("CSD," from Cambridge Crystallographic Data Center, 1999 Version 5.18). The database properties analyzed are internal duplication rates; compounds unique to each database; cumulative occurrence of compounds in an increasing number of databases; overlap of identical compounds between two databases; similarity overlap; diversity; and others. The crystallographic database CSD and the WDI show somewhat less overlap with the other databases than those with each other. In particular the collections of commercial compounds and compilations of vendor catalogs have a substantial degree of overlap among each other. Still, no database is completely a subset of any other, and each appears to have its own niche and thus "raison d'être". The NCI database has by far the highest number of compounds that are unique to it. Approximately 200 000 of the NCI structures were not found in any of the other analyzed databases.
Affiliation(s)
- J H Voigt
- Laboratory of Medicinal Chemistry, Center for Cancer Research, National Cancer Institute, National Institutes of Health, NCI at Frederick, 376 Boyles Street, Frederick, Maryland 21702, USA
|
45
|
Abstract
Recent advances in NMR-based screening methods have made it possible to screen larger libraries of molecules with higher throughput. However, experience shows that intelligent library design is important if NMR screening is to succeed in aiding our discovery of potent and useful lead compounds. This review presents the current state-of-the-art methodologies for designing primary and follow-up libraries for NMR screening. Diversity, drug-likeness and combinatorial libraries are discussed, and the inherent pitfalls of the NMR approach are addressed.
Affiliation(s)
- C A Lepre
- Vertex Pharmaceuticals, 130 Waverly Street, Cambridge, MA 02139-4242, USA
|
46
|
Su AI, Lorber DM, Weston GS, Baase WA, Matthews BW, Shoichet BK. Docking molecules by families to increase the diversity of hits in database screens: computational strategy and experimental evaluation. Proteins 2001; 42:279-93. [PMID: 11119652 DOI: 10.1002/1097-0134(20010201)42:2<279::aid-prot150>3.0.co;2-u] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 11/07/2022]
Abstract
Molecular docking programs screen chemical databases for novel ligands that fit protein binding sites. When one compound fits the site well, close analogs typically do the same. Therefore, many of the compounds that are found in such screens resemble one another. This reduces the variety and novelty of the compounds suggested. In an attempt to increase the diversity of docking hit lists, the Available Chemicals Directory was grouped into families of related structures. All members of every family were docked and scored, but only the best scoring molecule of a high-ranking family was allowed in the hit list. The identity and scores of the other members of these families were recorded as annotations to the best family member, but they were not independently ranked. This family-based docking method was compared with molecule-by-molecule docking in screens against the structures of thymidylate synthase, dihydrofolate reductase (DHFR), and the cavity site of the mutant T4 lysozyme Leu99 → Ala (L99A). In each case, the diversity of the hit list increased, and more families of known ligands were found. To investigate whether the newly identified hits were sensible, we tested representative examples experimentally for binding to L99A and DHFR. Of the six compounds tested against L99A, five bound to the internal cavity. Of the seven compounds tested against DHFR, six inhibited the enzyme with apparent K(i) values between 0.26 and 100 µM. The segregation of potential ligands into families of related molecules is a simple technique to increase the diversity of candidates suggested by database screens. The general approach should be applicable to most docking methods. Proteins 2001;42:279-293.
Affiliation(s)
- A I Su
- Department of Molecular Pharmacology & Biological Chemistry, Northwestern University, Chicago, Illinois 60611-3008, USA
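The family-based hit list described in the abstract above (one representative per family, with the remaining members kept as annotations) can be sketched as a simple grouping step applied after docking. The sketch assumes that family assignments and docking scores are already available and that a lower score is better; the record layout and function name are hypothetical, and no docking is performed.

```python
def family_hit_list(scored, top_n):
    """Build a hit list with one representative per family.

    scored: iterable of (molecule_id, family_id, score) tuples.
    Assumption: lower (more negative) scores are better.
    """
    families = {}
    for mol, family, score in scored:
        families.setdefault(family, []).append((score, mol))

    hits = []
    for family, members in families.items():
        members.sort()  # best (lowest) score first
        best_score, best_mol = members[0]
        hits.append({
            "family": family,
            "representative": best_mol,
            "score": best_score,
            # Other members are recorded but not ranked on their own.
            "annotations": [mol for _, mol in members[1:]],
        })

    # Rank families by the score of their best member.
    hits.sort(key=lambda hit: hit["score"])
    return hits[:top_n]


# Toy scores for three made-up families.
toy_scores = [
    ("a1", "A", -9.0), ("a2", "A", -8.5),
    ("b1", "B", -7.0),
    ("c1", "C", -8.0), ("c2", "C", -8.8),
]
top_hits = family_hit_list(toy_scores, top_n=2)
```

In a molecule-by-molecule ranking of the same toy data the top two places would go to "a1" and "c2" as well, but the third place would go to the analog "a2"; grouping by family keeps such close analogs from filling further slots.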
|
47
|
Agrafiotis DK. A constant time algorithm for estimating the diversity of large chemical libraries. J Chem Inf Comput Sci 2001; 41:159-67. [PMID: 11206368 DOI: 10.1021/ci000091j] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
We describe a novel diversity metric for use in the design of combinatorial chemistry and high-throughput screening experiments. The method estimates the cumulative probability distribution of intermolecular dissimilarities in the collection of interest and then measures the deviation of that distribution from the respective distribution of a uniform sample using the Kolmogorov-Smirnov statistic. The distinct advantage of this approach is that the cumulative distribution can be easily estimated using probability sampling and does not require exhaustive enumeration of all pairwise distances in the data set. The function is intuitive, very fast to compute, does not depend on the size of the collection, and can be used to perform diversity estimates on both global and local scale. More importantly, it allows meaningful comparison of data sets of different cardinality and is not affected by the curse of dimensionality, which plagues many other diversity indices. The advantages of this approach are demonstrated using examples from the combinatorial chemistry literature.
Affiliation(s)
- D K Agrafiotis
- 3-Dimensional Pharmaceuticals, Inc., Exton, Pennsylvania 19341, USA.
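The metric in the abstract above compares the sampled distribution of intermolecular distances with that of a uniform sample, using the Kolmogorov-Smirnov statistic. The sketch below is one possible reading, under assumptions the abstract does not settle: descriptors are scaled to the unit hypercube, the uniform reference is drawn from that same hypercube, and a fixed number of random pairs is sampled from each set. All function names are hypothetical.

```python
import math
import random


def sample_distances(points, n_pairs, rng):
    """Sorted Euclidean distances of n_pairs randomly drawn point pairs."""
    dists = []
    for _ in range(n_pairs):
        a, b = rng.sample(points, 2)
        dists.append(math.dist(a, b))
    return sorted(dists)


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic for two sorted samples."""
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Step both empirical CDFs past the value v.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d


def ks_diversity(points, n_pairs=2000, seed=0):
    """Deviation of the distance distribution from a uniform reference.

    Smaller values mean the set looks more like a uniform sample.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Assumed reference: uniform points in the unit hypercube.
    reference = [tuple(rng.random() for _ in range(dim)) for _ in points]
    return ks_statistic(
        sample_distances(points, n_pairs, rng),
        sample_distances(reference, n_pairs, rng),
    )
```

The cost depends on `n_pairs` rather than on the number of compounds, which matches the constant-time claim: no exhaustive enumeration of pairwise distances is needed.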
|
48
|
Cordell GA, Quinn-Beattie ML, Farnsworth NR. The potential of alkaloids in drug discovery. Phytother Res 2001; 15:183-205. [PMID: 11351353 DOI: 10.1002/ptr.890] [Citation(s) in RCA: 193] [Impact Index Per Article: 8.4] [Indexed: 11/11/2022]
Abstract
Alkaloids are an important group of diversely distributed, chemically, biologically and commercially significant natural products. This article suggests why now, with the presently available technology, and the remaining biome available and reasonably accessible, is an opportune moment to consciously focus on the discovery of further alkaloids with pharmacophoric utility.
Affiliation(s)
- G A Cordell
- Program for Collaborative Research in the Pharmaceutical Sciences, College of Pharmacy, University of Illinois at Chicago, Chicago, IL 60612, USA.
|
49
|
Roberts G, Myatt GJ, Johnson WP, Cross KP, Blower PE. LeadScope: software for exploring large sets of screening data. J Chem Inf Comput Sci 2000; 40:1302-14. [PMID: 11128088 DOI: 10.1021/ci0000631] [Citation(s) in RCA: 125] [Impact Index Per Article: 5.2] [Indexed: 11/30/2022]
Abstract
Modern approaches to drug discovery have dramatically increased the speed and quantity of compounds that are made and tested for potential potency. The task of collecting, organizing, and assimilating this information is a major bottleneck in the discovery of new drugs. We have developed LeadScope, a novel, interactive computer program for visualizing, browsing, and interpreting chemical and biological screening data that can assist pharmaceutical scientists in finding promising drug candidates. The software organizes the chemical data by structural features familiar to medicinal chemists. Graphs are used to summarize the data, and structural classes are highlighted that are statistically correlated with biological activity.
Affiliation(s)
- G Roberts
- LeadScope Inc, Columbus, Ohio 43212, USA
|
50
|
Pogliani L. From molecular connectivity indices to semiempirical connectivity terms: recent trends in graph theoretical descriptors. Chem Rev 2000; 100:3827-58. [PMID: 11749329 DOI: 10.1021/cr0004456] [Citation(s) in RCA: 140] [Impact Index Per Article: 5.8] [Indexed: 11/29/2022]
Affiliation(s)
- L Pogliani
- Dipartimento di Chimica, Università della Calabria, 87030 Rende, Italy
|