1
Uzundurukan A, Nelson M, Teske C, Islam MS, Mohamed E, Christy JV, Martin HJ, Muratov E, Glover S, Fuoco D. Meta-analysis and review of in silico methods in drug discovery - part 1: technological evolution and trends from big data to chemical space. The Pharmacogenomics Journal 2025; 25:8. [PMID: 40204715 DOI: 10.1038/s41397-025-00368-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 03/13/2025] [Accepted: 04/01/2025] [Indexed: 04/11/2025]
Abstract
This review offers an overview of advanced in silico methods crucial for drug discovery, emphasizing their integration with data science, and investigates the effectiveness of data science, machine learning, and artificial intelligence through a thorough meta-analysis of existing technologies. The meta-analysis aims to rank these technologies by their applications and the accessibility of knowledge. The initial search strategy yielded 900 papers, which were then refined into two subsets: the 300 most-cited papers published since 2000 and papers selected for systematic review on the basis of high impact. From these, 97 articles were identified for discussion and categorized by their influence on society. The focus is on the qualitative impact of these disciplines rather than solely on metrics such as new drug approvals. Ultimately, the review underscores the role of big data, from information stored in publicly available databases to chemical space, in improving our understanding of drug-candidate trajectories from development to commercialization. Graphical abstract: the evolution over time of some keywords discussed in this article (Drug Discovery; Big Data; Database; Metadata), expressed as the absolute number of available items.
Affiliation(s)
- Arife Uzundurukan
- Centre de Recherche Acoustique-Signal-Humain, Université de Sherbrooke, 2500 Bd de l'Université, Sherbrooke, J1K 2R1, QC, Canada
- Department of Chemical Engineering, École Polytechnique de Montréal, 2500 Chem. de Polytechnique, Montréal, H3T 1J4, QC, Canada
- Mark Nelson
- Piramal Pharma Solutions, Inc., 18655 Krause St., Riverview, MI 48193, USA; Altoris, Inc., San Diego, CA, USA
- Mohamed Shahidul Islam
- Quality and Compliance Department, BIOVANTEK Global, 10149 chemin de la Côte-de-Liesse, Montréal, QC, Canada
- Elzagheid Mohamed
- Royal Commission for Jubail and Yanbu, Jubail Industrial City, Kingdom of Saudi Arabia
- Holli-Joi Martin
- Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, 27599, USA
- Eugene Muratov
- UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC, USA
- Samantha Glover
- Quantum Business Solution, Beverly Hills, Los Angeles, CA, USA
- Domenico Fuoco
- Department of Chemical Engineering, École Polytechnique de Montréal, 2500 Chem. de Polytechnique, Montréal, H3T 1J4, QC, Canada
2
Chevillard F, Hell S, Liberatore E. BB-SAR: An Application for Data-driven Analysis and Rational Design of Medicinal Chemistry Series. J Chem Inf Model 2025; 65:2845-2853. [PMID: 40042356 DOI: 10.1021/acs.jcim.4c02121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2025]
Abstract
In drug discovery, medicinal chemists face the challenge of generating and analyzing large data sets, often exceeding a thousand molecules with numerous physicochemical and biological properties. To address this, we introduced BB-SAR, an interpolative methodology that tackles both data complexity and interpretability by breaking down molecules into their constituent building blocks (BBs). Establishing a direct correlation between molecules and their constituent BBs makes it possible to associate these BBs with the corresponding biological and physicochemical properties. This facilitates more intuitive data analysis and enables the identification of critical trends between molecular features and their associated properties. While individual BBs rarely dictate property behavior, their combinations do; BB-SAR identifies impactful combinations for designing new, improved compounds. Additionally, it simplifies traditional medicinal chemistry analysis strategies and enhances the efficiency of drug discovery by providing a clearer understanding of complex data sets within a concise framework.
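The idea of associating building-block combinations with properties can be illustrated with a minimal Python sketch. It assumes each compound has already been split into building blocks (the decomposition step, which would normally need a cheminformatics toolkit, is not shown) and uses hypothetical compound IDs, BB labels, and pIC50 values. The function `property_by_bb_combination` is a hypothetical name; it simply averages a property over every compound containing a given BB pair, a simplification rather than the published BB-SAR implementation.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

# Hypothetical toy data: each compound is already decomposed into its
# building blocks (BBs) and annotated with one measured property.
compounds = {
    "cpd1": {"bbs": ("A1", "B1", "C1"), "pIC50": 6.2},
    "cpd2": {"bbs": ("A1", "B2", "C1"), "pIC50": 7.8},
    "cpd3": {"bbs": ("A2", "B2", "C1"), "pIC50": 7.5},
    "cpd4": {"bbs": ("A2", "B1", "C2"), "pIC50": 5.9},
    "cpd5": {"bbs": ("A1", "B2", "C2"), "pIC50": 7.1},
}


def property_by_bb_combination(data, prop, size=2):
    """Map each combination of `size` BBs to the mean value of `prop`
    over all compounds that contain that combination."""
    values = defaultdict(list)
    for record in data.values():
        for combo in combinations(sorted(record["bbs"]), size):
            values[combo].append(record[prop])
    return {combo: mean(vals) for combo, vals in values.items()}


pair_means = property_by_bb_combination(compounds, "pIC50")

# Rank BB pairs by mean activity, most active first.
for combo, value in sorted(pair_means.items(), key=lambda kv: -kv[1])[:3]:
    print(combo, round(value, 2))
```

Pairs supported by a single compound carry little evidence; in practice one would also report how many compounds back each combination.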
Affiliation(s)
- Florent Chevillard
- Idorsia Pharmaceuticals Ltd, Hegenheimermattweg 91, Allschwil 4123, Switzerland
- Sandrine Hell
- Idorsia Pharmaceuticals Ltd, Hegenheimermattweg 91, Allschwil 4123, Switzerland
- Elisa Liberatore
- Idorsia Pharmaceuticals Ltd, Hegenheimermattweg 91, Allschwil 4123, Switzerland
3
Ronkowski C, Deshpande D, Sharma N, Vahed M, Patel YM, Gukasyan HJ, Wu M, Peng K, Church TD, Kim RE, Mirzaian E, Padula WV, Tomaszewski D, Ng TMH, Wong-Beringer A, Zaro J, Qato DM, Davies DL, Papadopoulos V, Mangul S. Pioneering Computational Culture Within Pharmacy Schools by Empowering Students With Data Science and Bioinformatics Skills. American Journal of Pharmaceutical Education 2025; 89:101341. [PMID: 39674347 DOI: 10.1016/j.ajpe.2024.101341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/03/2024] [Accepted: 12/06/2024] [Indexed: 12/16/2024]
Abstract
As advancements in digital health lead to the generation of increasingly diverse and voluminous pharmaceutical data, it is increasingly critical that we teach trainee pharmaceutical scientists how to leverage this data to lead future innovations in health care and pharmaceutical research. To address this need, the University of Southern California Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences is incorporating data science and bioinformatics into the graduate and undergraduate curricula through introductory courses tailored for students without prior programming experience. These courses feature a teaching framework designed to make the fundamentals of data science and bioinformatics accessible to pharmacy students through step-by-step, Jupyter-based coding assignments with examples relevant to the pharmaceutical sciences. The framework supports Doctor of Pharmacy students by focusing on the practical applications of data science in clinical settings, while for Doctor of Philosophy (PhD) and Master's (MS) students, the emphasis is on research methodologies and advanced data analysis techniques. Here, we outline the design of this framework, highlighting the strategies we developed and the opportunities it provides to cultivate a computational culture within our institution and beyond.
Affiliation(s)
- Cynthia Ronkowski
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Dhrithi Deshpande
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Nitesh Sharma
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Mohammad Vahed
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Yesha M Patel
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Hovhannes J Gukasyan
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Maryann Wu
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Kerui Peng
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Terry David Church
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Rory E Kim
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Edith Mirzaian
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- William Vincent Padula
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Daniel Tomaszewski
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Tien M H Ng
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Annie Wong-Beringer
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Jennica Zaro
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Dima M Qato
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA; Department of Clinical Pharmacy, Schaeffer Center for Health Policy and Economics, Los Angeles, CA, USA
- Daryl L Davies
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
- Vassilios Papadopoulos
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA; USC Suzanne Dworak-Peck School of Social Work, University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Department of Pharmacology and Pharmaceutical Sciences, Los Angeles, CA, USA; John Stauffer Decanal Chair in Pharmaceutical Sciences, University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Department of Pharmacology and Pharmaceutical Sciences, Los Angeles, CA, USA
- Serghei Mangul
- University of Southern California, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Titus Family Department of Clinical Pharmacy, Los Angeles, CA, USA
4
Gao X, Zhang F, Guo X, Yao M, Wang X, Chen D, Zhang G, Wang X, Lai L. Attention-based deep learning for accurate cell image analysis. Sci Rep 2025; 15:1265. [PMID: 39779905 PMCID: PMC11711278 DOI: 10.1038/s41598-025-85608-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Accepted: 01/03/2025] [Indexed: 01/11/2025]
Abstract
High-content analysis (HCA) holds enormous potential for drug discovery and research, but widely used methods can be cumbersome and yield inaccurate results. Noisy and redundant signals in cell images impede accurate deep learning-based image analysis. To address these issues, we introduce X-Profiler, a novel HCA method that combines cellular experiments, image processing, and deep learning modeling. X-Profiler combines a convolutional neural network with a Transformer to encode high-content images, effectively filtering out noisy signals and precisely characterizing cell phenotypes. In comparative tests on drug-induced cardiotoxicity, mitochondrial toxicity classification, and compound classification, X-Profiler outperformed both DeepProfiler and CellProfiler, two widely recognized and representative methods in this field. Our results demonstrate the utility and versatility of X-Profiler, and we anticipate its wide application in HCA for advancing drug development and disease research.
Affiliation(s)
- Xiangrui Gao
- XtalPi Innovation Center, 706 Block B, Dongsheng Building, Haidian District, Beijing, China
- Fan Zhang
- XtalPi Innovation Center, 706 Block B, Dongsheng Building, Haidian District, Beijing, China
- Xueyu Guo
- XtalPi Innovation Center, 706 Block B, Dongsheng Building, Haidian District, Beijing, China
- Mengcheng Yao
- XtalPi Innovation Center, 706 Block B, Dongsheng Building, Haidian District, Beijing, China
- Xiaoxiao Wang
- XtalPi Innovation Center, 706 Block B, Dongsheng Building, Haidian District, Beijing, China
- Dong Chen
- XtalPi Innovation Center, 706 Block B, Dongsheng Building, Haidian District, Beijing, China
- Genwei Zhang
- XtalPi Innovation Center, 706 Block B, Dongsheng Building, Haidian District, Beijing, China
- Xiaodong Wang
- XtalPi Innovation Center, 706 Block B, Dongsheng Building, Haidian District, Beijing, China
- Lipeng Lai
- XtalPi Innovation Center, 706 Block B, Dongsheng Building, Haidian District, Beijing, China
5
Foadian E, Sanchez S, Kalinin SV, Ahmadi M. From Sunlight to Solutions: Closing the Loop on Halide Perovskites. ACS Materials Au 2025; 5:11-23. [PMID: 39802149 PMCID: PMC11718539 DOI: 10.1021/acsmaterialsau.4c00096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2024] [Revised: 10/20/2024] [Accepted: 10/21/2024] [Indexed: 01/16/2025]
Abstract
Halide perovskites (HPs) are emerging as key materials in the fight against global warming, with well-recognized applications, such as photovoltaics, and emergent opportunities, such as photocatalysis for methane removal and environmental remediation. These current and emergent applications are enabled by a unique combination of high absorption coefficients, tunable band gaps, and long carrier diffusion lengths, which make HPs highly efficient for solar energy conversion. To address the challenge of discovering and optimizing HPs in the huge chemical and compositional spaces of possible candidates, this perspective discusses a comprehensive strategy for screening HPs through automated high-throughput and combinatorial synthesis techniques. A critical aspect of this approach is closing the characterization loop, where machine learning (ML) and human collaboration play pivotal roles. By leveraging human creativity and domain knowledge for hypothesis generation and employing ML to test and refine these hypotheses efficiently, we aim to accelerate the discovery and optimization of HPs under specific environmental conditions. This synergy enables rapid identification of the most promising materials, advancing from fundamental discovery to scalable manufacturability. The ultimate goal of this work is to move from laboratory-scale innovations to real-world applications, ensuring that HPs can be deployed effectively in technologies that mitigate global warming, such as solar energy harvesting and methane removal systems.
Affiliation(s)
- Elham Foadian
- Institute for Advanced Materials and Manufacturing, Department of Materials Science and Engineering, Knoxville, Tennessee 37996, United States
- Sheryl Sanchez
- Institute for Advanced Materials and Manufacturing, Department of Materials Science and Engineering, Knoxville, Tennessee 37996, United States
- Sergei V. Kalinin
- Institute for Advanced Materials and Manufacturing, Department of Materials Science and Engineering, Knoxville, Tennessee 37996, United States
- Physical Science Directorate, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Mahshid Ahmadi
- Institute for Advanced Materials and Manufacturing, Department of Materials Science and Engineering, Knoxville, Tennessee 37996, United States
6
Nguyen-Vo TH, Do TTT, Nguyen BP. Multitask Learning on Graph Convolutional Residual Neural Networks for Screening of Multitarget Anticancer Compounds. J Chem Inf Model 2024. [PMID: 39197175 DOI: 10.1021/acs.jcim.4c00643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/30/2024]
Abstract
Recently, various modern experimental screening pipelines and assays have been developed to find promising anticancer drug candidates. However, screening an immense number of compounds for anticancer activity experimentally is time-consuming and almost infeasible. To partially address this issue, several computational approaches have been proposed. In this study, we present iACP-GCR, a model based on multitask learning on graph convolutional residual neural networks with two types of shortcut connections, to identify multitarget anticancer compounds. In our architecture, the graph convolutional residual neural networks are shared by all prediction tasks before branching into task-specific layers. The NCI-60 data set, one of the most reliable and well-known sources of experimentally verified compounds, was used to develop the model. From that data set, we collected and refined data on compounds screened across nine cancer types (panels), namely breast, central nervous system, colon, leukemia, non-small cell lung, melanoma, ovarian, prostate, and renal, for model training and evaluation. Evaluation on an independent test set shows that iACP-GCR outperforms three advanced computational methods for multitask learning. Integrating the two shortcut connection types in the shared networks also improves prediction efficiency. We also deployed the model as a public web server to help the research community screen potential anticancer compounds.
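The shared-trunk, task-specific-head design described above can be sketched as a forward pass in plain Python. This is an illustrative sketch under stated assumptions, not iACP-GCR: the weights are random and untrained, the molecule is a hypothetical three-atom graph, only an identity shortcut is shown (the paper combines two shortcut types), and the nine outputs merely mirror the nine NCI-60 panels.

```python
import math
import random


def graph_conv(adj, h, weight):
    """One graph convolution: average each atom's features with its
    neighbours' (self-loop included), apply a linear map, then ReLU."""
    n = len(adj)
    out = []
    for i in range(n):
        neigh = [j for j in range(n) if adj[i][j] or i == j]
        agg = [sum(h[j][k] for j in neigh) / len(neigh) for k in range(len(h[0]))]
        out.append([max(0.0, sum(agg[k] * weight[k][m] for k in range(len(agg))))
                    for m in range(len(weight[0]))])
    return out


def residual_block(adj, h, weight):
    """Graph convolution plus an identity shortcut (square weight matrix)."""
    conv = graph_conv(adj, h, weight)
    return [[c + x for c, x in zip(crow, hrow)] for crow, hrow in zip(conv, h)]


def multitask_forward(adj, h, shared_weights, task_heads):
    """Shared residual graph layers, mean-pool readout, then one
    logistic output per task (hypothetically, one per cancer panel)."""
    for w in shared_weights:
        h = residual_block(adj, h, w)
    graph_vec = [sum(col) / len(h) for col in zip(*h)]
    return [1.0 / (1.0 + math.exp(-(sum(g * w for g, w in zip(graph_vec, head_w)) + b)))
            for head_w, b in task_heads]


random.seed(0)
dim, n_tasks = 4, 9
# Hypothetical 3-atom chain molecule with random atom features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[random.random() for _ in range(dim)] for _ in range(3)]
shared = [[[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
          for _ in range(2)]
heads = [([random.uniform(-0.5, 0.5) for _ in range(dim)], 0.0) for _ in range(n_tasks)]

probs = multitask_forward(adj, features, shared, heads)
print([round(p, 3) for p in probs])
```

All tasks reuse the same `shared` layers and differ only in their head weights, which is the hard parameter sharing the abstract describes; training would require a framework with automatic differentiation.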
Affiliation(s)
- Thanh-Hoang Nguyen-Vo
- Ho Chi Minh City Open University, 97 Vo Van Tan, District 3, Ho Chi Minh City 70000, Vietnam
- Trang T T Do
- Ho Chi Minh City Open University, 97 Vo Van Tan, District 3, Ho Chi Minh City 70000, Vietnam
- Binh P Nguyen
- Victoria University of Wellington, Kelburn Parade, Wellington 6012, New Zealand
7
Olmedo DA, Durant-Archibold AA, López-Pérez JL, Medina-Franco JL. Design and Diversity Analysis of Chemical Libraries in Drug Discovery. Comb Chem High Throughput Screen 2024; 27:502-515. [PMID: 37409545 DOI: 10.2174/1386207326666230705150110] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/05/2023] [Revised: 05/30/2023] [Accepted: 05/30/2023] [Indexed: 07/07/2023]
Abstract
Chemical libraries and compound data sets are among the main inputs for starting the drug discovery process at universities, research institutes, and pharmaceutical companies. The approach used to design compound libraries, the chemical information they contain, and the way structures are represented play a fundamental role in studies in chemoinformatics, food informatics, in silico pharmacokinetics, computational toxicology, bioinformatics, and molecular modeling that generate computational hits for further optimization into drug candidates. A few years ago, chemical, biotechnological, and pharmaceutical companies began integrating computational tools with artificial intelligence methodologies, opening new prospects for growth in drug discovery and development. This integration is anticipated to increase the number of drugs approved by regulatory agencies in the near future.
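Library diversity is commonly quantified with pairwise fingerprint similarity. The sketch below assumes binary fingerprints stored as sets of on-bit indices and uses the mean pairwise Tanimoto similarity as a simple diversity score (lower means more diverse). The fingerprints are hypothetical, and the abstract does not say which diversity metrics the authors apply.

```python
from itertools import combinations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints
    represented as sets of 'on' bit positions."""
    if not fp_a and not fp_b:
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def mean_pairwise_similarity(library: dict[str, set[int]]) -> float:
    """Average Tanimoto similarity over all compound pairs in the library."""
    pairs = list(combinations(library.values(), 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical fingerprints (on-bit indices), not computed from real structures.
library = {
    "cmpd_1": {1, 4, 7, 9},
    "cmpd_2": {1, 4, 8, 9},
    "cmpd_3": {2, 3, 5, 11},
}
print(round(mean_pairwise_similarity(library), 3))
```

With real molecules, the fingerprints would come from a cheminformatics toolkit, and the all-pairs loop grows quadratically, so large libraries are usually sampled or compared against centroids instead.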
Affiliation(s)
- Dionisio A Olmedo
- Centro de Investigaciones Farmacognósticas de la Flora Panameña (CIFLORPAN), Facultad de Farmacia, Universidad de Panamá, Ciudad de Panamá, Apartado, 0824-00178, Panamá
- Sistema Nacional de Investigación (SNI), Secretaria Nacional de Ciencia, Tecnología e Innovación (SENACYT), Ciudad del Saber, Clayton, Panamá
- Armando A Durant-Archibold
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Apartado, 0843-01103, Panamá
- Departamento de Bioquímica, Facultad de Ciencias Naturales, Exactas y Tecnología, Universidad de Panamá, Ciudad de Panamá, Panamá
- José Luis López-Pérez
- CESIFAR, Departamento de Farmacología, Facultad de Medicina, Universidad de Panamá, Ciudad de Panamá, Panamá
- Departamento de Ciencias Farmacéuticas, Facultad de Farmacia, Universidad de Salamanca, Avda. Campo Charro s/n, 37071 Salamanca, España
- José Luis Medina-Franco
- DIFACQUIM Grupo de Investigación, Departamento de Farmacia, Escuela de Química, Universidad Nacional Autónoma de México, Ciudad de México, Apartado, 04510, México
8
Niazi SK, Mariam Z. Computer-Aided Drug Design and Drug Discovery: A Prospective Analysis. Pharmaceuticals (Basel) 2023; 17:22. [PMID: 38256856 PMCID: PMC10819513 DOI: 10.3390/ph17010022] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 11/10/2023] [Revised: 12/13/2023] [Accepted: 12/20/2023] [Indexed: 01/24/2024]
Abstract
In the dynamic landscape of drug discovery, Computer-Aided Drug Design (CADD) has emerged as a transformative force, bridging biology and technology. This paper reviews CADD's historical evolution, its categorization into structure-based and ligand-based approaches, and its crucial role in rationalizing and expediting drug discovery. As CADD advances, incorporating diverse biological data and ensuring data privacy become paramount. Challenges persist, demanding the optimization of algorithms and robust ethical frameworks. Integrating Machine Learning and Artificial Intelligence amplifies CADD's predictive capabilities, yet ethical considerations and scalability challenges linger. Collaborative efforts and global initiatives, exemplified by platforms like Open-Source Malaria, underscore the democratization of drug discovery. The convergence of CADD with personalized medicine offers tailored therapeutic solutions, though ethical dilemmas and accessibility concerns must be navigated. Emerging technologies such as quantum computing, immersive technologies, and green chemistry promise to redefine the future of CADD. The trajectory of CADD, marked by rapid advancements, anticipates challenges in ensuring accuracy, addressing biases in AI, and incorporating sustainability metrics. This paper concludes by highlighting the need for proactive measures in navigating the ethical, technological, and educational frontiers of CADD to shape a healthier, brighter future in drug discovery.
Affiliation(s)
- Zamara Mariam
- Centre for Health and Life Sciences, Coventry University, Coventry City CV1 5FB, UK
9
Handa K, Sakamoto S, Kageyama M, Iijima T. Development of a 2D-QSAR Model for Tissue-to-Plasma Partition Coefficient Value with High Accuracy Using Machine Learning Method, Minimum Required Experimental Values, and Physicochemical Descriptors. Eur J Drug Metab Pharmacokinet 2023:10.1007/s13318-023-00832-w. [PMID: 37266860 DOI: 10.1007/s13318-023-00832-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Accepted: 05/09/2023] [Indexed: 06/03/2023]
Abstract
BACKGROUND The demand for physiologically based pharmacokinetic (PBPK) models is increasing. New drug applications (NDAs) for many compounds are now submitted with PBPK models to make drug development more efficient. The tissue-to-plasma partition coefficient (Kp) is a key parameter in the differential equations of a PBPK model. However, Kp is difficult to obtain experimentally because measuring drug concentrations in tissue is much harder than measuring them in plasma. OBJECTIVE Many researchers have therefore turned to in silico methods instead of experiments. Today, most Kp prediction models use in vitro and in vivo parameters as explanatory variables. We hypothesized that physicochemical descriptors could improve predictability, and therefore aimed to develop a two-dimensional quantitative structure-activity relationship (2D-QSAR) model for Kp that uses physicochemical descriptors instead of in vivo experimental data as explanatory variables. METHODS We compared our model with conventional models using 20-fold cross-validation according to a published method (Yun et al. J Pharmacokinet Pharmacodyn 41:1-14, 2014). We used the random forest algorithm, which is regarded as one of the best predictors for 2D-QSAR models. Finally, we combined a minimal set of in vitro experimental values with physicochemical descriptors, yielding a multimodal model that predicts Kp from a few in vitro parameters and physicochemical descriptors. RESULTS The multimodal model was more accurate than the conventional models, suggesting that multimodality is useful for 2D-QSAR modeling [RMSE and % within two-fold error: 0.66 and 42.2% (Berezhkovskiy), 0.52 and 52.2% (Rodgers), 0.65 and 34.6% (Schmitt), 0.44 and 61.1% (published model), 0.41 and 62.1% (traditional model), 0.39 and 64.5% (multimodal model)].
CONCLUSION We developed a 2D-QSAR model that predicts Kp with the highest accuracy among the compared models, using a few in vitro experimental values and physicochemical descriptors.
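The two accuracy measures quoted in the results can be computed as in the sketch below. It assumes that RMSE is taken on log10-transformed Kp values (the abstract does not state the scale) and counts a prediction as within two-fold when 0.5 ≤ predicted/observed ≤ 2. The Kp values are hypothetical, and the function names are illustrative.

```python
import math


def rmse_log10(observed, predicted):
    """Root-mean-square error computed on log10-transformed values."""
    errors = [math.log10(p) - math.log10(o) for o, p in zip(observed, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def percent_within_twofold(observed, predicted):
    """Percentage of predictions within two-fold of the observed value,
    i.e. 0.5 <= predicted / observed <= 2."""
    hits = sum(1 for o, p in zip(observed, predicted) if 0.5 <= p / o <= 2.0)
    return 100.0 * hits / len(observed)


# Hypothetical Kp values for four tissues (illustrative only).
kp_obs = [1.2, 0.8, 5.0, 0.3]
kp_pred = [1.5, 0.3, 4.0, 0.5]

print(round(rmse_log10(kp_obs, kp_pred), 3))
print(percent_within_twofold(kp_obs, kp_pred))
```

Here three of the four predictions fall within two-fold (only 0.3 versus 0.8 misses), so the second line reports 75.0; under 20-fold cross-validation these metrics would be computed on the pooled held-out predictions.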
Affiliation(s)
- Koichi Handa
- Toxicology & DMPK Research Department, Teijin Institute for Bio-Medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-shi, Tokyo, 191-8512, Japan
- Seishiro Sakamoto
- Pharmaceutical Development Coordination Department, Teijin Pharma Limited, 3-2-1, Kasumigaseki Common Gate West Tower, Kasumigaseki Chiyoda-ku, Tokyo, 100-8585, Japan
- Michiharu Kageyama
- Toxicology & DMPK Research Department, Teijin Institute for Bio-Medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-shi, Tokyo, 191-8512, Japan
- Takeshi Iijima
- Toxicology & DMPK Research Department, Teijin Institute for Bio-Medical Research, Teijin Pharma Limited, 4-3-2 Asahigaoka, Hino-shi, Tokyo, 191-8512, Japan
10
Parastar H, Tauler R. Big (Bio)Chemical Data Mining Using Chemometric Methods: A Need for Chemists. Angew Chem Int Ed Engl 2022. [DOI: 10.1002/ange.201801134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Hadi Parastar
- Department of Chemistry, Sharif University of Technology, Tehran, Iran
- Roma Tauler
- Department of Environmental Chemistry, IDAEA-CSIC, 08034 Barcelona, Spain
11
Zabolotna Y, Bonachera F, Horvath D, Lin A, Marcou G, Klimchuk O, Varnek A. Chemspace Atlas: Multiscale Chemography of Ultralarge Libraries for Drug Discovery. J Chem Inf Model 2022; 62:4537-4548. [DOI: 10.1021/acs.jcim.2c00509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Yuliana Zabolotna
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Fanny Bonachera
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Dragos Horvath
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Arkadii Lin
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Gilles Marcou
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Olga Klimchuk
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
- Alexandre Varnek
- University of Strasbourg, Laboratoire de Chemoinformatique, 4, rue B. Pascal, Strasbourg 67081, France
12
From traditional to data-driven medicinal chemistry: a case study. Drug Discov Today 2022; 27:2065-2070. [PMID: 35452790 DOI: 10.1016/j.drudis.2022.04.017] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/09/2022] [Revised: 04/08/2022] [Accepted: 04/13/2022] [Indexed: 12/20/2022]
Abstract
Artificial intelligence (AI) and data science are beginning to impact drug discovery. It usually takes considerable time and effort for new scientific concepts or technologies to move from the conceptual stage to practical applicability and for practical experience to accumulate. Especially for computational approaches, demonstrating measurable impact on drug discovery projects is not a trivial task. A pilot study at Daiichi Sankyo Company attempted to integrate data-driven approaches into practical medicinal chemistry and to quantify their impact, as reported herein. Although the organization and focal points of early-phase drug discovery naturally vary across pharmaceutical companies, the results of this pilot study indicate the significant potential of data-driven medicinal chemistry and suggest new models for the internal training of next-generation medicinal chemists. Keywords: medicinal chemistry; drug discovery; chemoinformatics; data science; data-driven R&D.
13
Abstract
Artificial intelligence (AI) tools are increasingly applied in drug discovery, supporting every stage of the Design-Make-Test-Analyse (DMTA) cycle. The main focus of this chapter is their application to molecular generation with deep neural networks (DNNs). We present a historical overview of the main advances in the field. We analyze the concepts of distribution learning and goal-directed learning and then highlight some recent applications of generative models in drug design, focusing on research from the biopharmaceutical industry. We present in more detail REINVENT, an open-source software developed by our group at AstraZeneca that serves as the main platform for AI-supported molecular design in a number of the company's medicinal chemistry projects, and we also demonstrate some of our work in library design. Finally, we present some of the main challenges in applying AI to drug discovery and different approaches to address them, which define areas for current and future work.
14
Weber JM, Guo Z, Zhang C, Schweidtmann AM, Lapkin AA. Chemical data intelligence for sustainable chemistry. Chem Soc Rev 2021; 50:12013-12036. [PMID: 34520507 DOI: 10.1039/d1cs00477h] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 02/05/2023]
Abstract
This study highlights new opportunities for optimal reaction route selection from large chemical databases brought about by the rapid digitalisation of chemical data. The chemical industry requires a transformation towards more sustainable practices, eliminating its dependencies on fossil fuels and limiting its impact on the environment. However, identifying more sustainable process alternatives is, at present, a cumbersome, manual, iterative process, based on chemical intuition and modelling. We give a perspective on methods for automated discovery and assessment of competitive sustainable reaction routes based on renewable or waste feedstocks. Three key areas of transition are outlined and reviewed based on their state-of-the-art as well as bottlenecks: (i) data, (ii) evaluation metrics, and (iii) decision-making. We elucidate their synergies and interfaces, since these areas bring the most benefit only when combined. The field of chemical data intelligence offers the opportunity to identify inherently more sustainable reaction pathways and to identify opportunities for a circular chemical economy. Our review shows that, at present, data is the area with the most bottlenecks, such as data completion and data linkage, but it also offers the principal opportunity for advancement.
Affiliation(s)
- Jana M Weber
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road, #02-00, 068898, Singapore
- Zhen Guo
- Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road, #02-00, 068898, Singapore; Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd., 1 CREATE Way, CREATE Tower #05-05, 138602, Singapore
- Chonghuan Zhang
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
- Artur M Schweidtmann
- Department of Chemical Engineering, Delft University of Technology, Van der Maasweg 9, Delft 2629 HZ, The Netherlands
- Alexei A Lapkin
- Department of Chemical Engineering and Biotechnology, University of Cambridge, West Cambridge Site, Philippa Fawcett Drive, Cambridge CB3 0AS, UK; Chemical Data Intelligence (CDI) Pte Ltd, Robinson Road, #02-00, 068898, Singapore; Cambridge Centre for Advanced Research and Education in Singapore, CARES Ltd., 1 CREATE Way, CREATE Tower #05-05, 138602, Singapore
|
15
|
Williams W, Zeng L, Gensch T, Sigman MS, Doyle AG, Anslyn EV. The Evolution of Data-Driven Modeling in Organic Chemistry. ACS CENTRAL SCIENCE 2021; 7:1622-1637. [PMID: 34729406 PMCID: PMC8554870 DOI: 10.1021/acscentsci.1c00535] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 05/03/2021] [Indexed: 05/14/2023]
Abstract
Organic chemistry is replete with complex relationships: for example, how a reactant's structure relates to the resulting product formed; how reaction conditions relate to yield; how a catalyst's structure relates to enantioselectivity. Questions like these are at the foundation of understanding reactivity and developing novel and improved reactions. An approach to probing these questions that is both longstanding and contemporary is data-driven modeling. Here, we provide a synopsis of the history of data-driven modeling in organic chemistry and the terms used to describe these endeavors. We include a timeline of the steps that led to its current state. The case studies included highlight how, as a community, we have advanced physical organic chemistry tools with the aid of computers and data to augment the intuition of expert chemists and to facilitate the prediction of structure-activity and structure-property relationships.
Affiliation(s)
- Wendy L. Williams
- Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095, United States
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Lingyu Zeng
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
- Tobias Gensch
- Department of Chemistry, TU Berlin, Straße des 17. Juni 135, Sekr. C2, 10623 Berlin, Germany
- Matthew S. Sigman
- Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, United States
- Abigail G. Doyle
- Department of Chemistry and Biochemistry, University of California, Los Angeles, California 90095, United States
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Eric V. Anslyn
- Department of Chemistry, The University of Texas at Austin, Austin, Texas 78712, United States
|
16
|
|
17
|
Papadopoulos K, Giblin KA, Janet JP, Patronov A, Engkvist O. De novo design with deep generative models based on 3D similarity scoring. Bioorg Med Chem 2021; 44:116308. [PMID: 34280849 DOI: 10.1016/j.bmc.2021.116308] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/10/2021] [Revised: 07/01/2021] [Accepted: 07/05/2021] [Indexed: 01/25/2023]
Abstract
We have demonstrated the utility of a 3D shape and pharmacophore similarity scoring component in molecular design with a deep generative model trained with reinforcement learning. Using dopamine receptor type 2 (DRD2) as an example and its antagonist haloperidol (1) as a starting point in a ligand-based design context, we have shown in a retrospective study that a 3D similarity-enabled generative model can discover new leads in the absence of any other information. It can be used efficiently for scaffold hopping and the generation of novel series. 3D similarity-based models were compared against 2D QSAR-based models, indicating a significant degree of orthogonality between the generated outputs, with the former producing more diverse output. In addition, when the two scoring components are combined for training the generative model, the result is more efficient exploration of desirable chemical space than with either component alone.
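The abstract reports that combining the 3D similarity and 2D QSAR scoring components explores chemical space more efficiently than either alone, but it does not state how the components are aggregated. A minimal sketch, assuming a weighted geometric mean (a common choice for multi-component rewards in RL-based molecular design), is shown below. The component names `shape_3d` and `qsar_2d`, the weights and the scores are hypothetical; in practice each score would come from a 3D shape/pharmacophore overlay and a trained QSAR model, respectively.

```python
import math
from typing import Dict


def combined_score(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregate per-component scores in [0, 1] into a single reward.

    Uses a weighted geometric mean (an assumption for illustration): a molecule
    must score reasonably on every component, and a zero anywhere gives zero.
    """
    total_weight = sum(weights[name] for name in components)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    log_sum = 0.0
    for name, score in components.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} score out of range: {score}")
        if score == 0.0:
            return 0.0  # the geometric mean collapses to zero
        log_sum += weights[name] * math.log(score)
    return math.exp(log_sum / total_weight)


if __name__ == "__main__":
    # Hypothetical component scores for one generated molecule
    scores = {"shape_3d": 0.8, "qsar_2d": 0.5}
    weights = {"shape_3d": 1.0, "qsar_2d": 1.0}
    print(f"{combined_score(scores, weights):.3f}")  # prints 0.632
```

A weighted arithmetic mean is the obvious alternative; it lets a high 3D score compensate for a poor QSAR score, which may or may not be desirable for a given project.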
Affiliation(s)
- Kathryn A Giblin
- Medicinal Chemistry, Research and Early Development, Oncology R&D, AstraZeneca, Cambridge, UK
- Jon Paul Janet
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Atanas Patronov
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
- Ola Engkvist
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden
|
18
|
Systematic risk identification and assessment using a new risk map in pharmaceutical R&D. Drug Discov Today 2021; 26:2786-2793. [PMID: 34229082 DOI: 10.1016/j.drudis.2021.06.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/15/2020] [Revised: 05/21/2021] [Accepted: 06/29/2021] [Indexed: 11/20/2022]
Abstract
Delivering transformative therapies to patients while maintaining growth in the pharmaceutical industry requires an efficient use of research and development (R&D) resources and technologies to develop high-impact new molecular entities (NMEs). However, increasing global R&D competition in the pharmaceutical industry, growing impact of generics and biosimilars, more stringent regulatory requirements, as well as cost-constrained reimbursement frameworks challenge current business models of leading pharmaceutical companies. Big data-based analytics and artificial intelligence (AI) approaches have disrupted various industries and are having an increasing impact in the biopharmaceutical industry, with the promise to improve and accelerate biopharmaceutical R&D processes. Here, we systematically analyze, identify, assess, and categorize key risks across the drug discovery and development value chain using a new risk map approach, providing a comprehensive risk-reward analysis for pharmaceutical R&D.
|
19
|
Rodrigues JF, Florea L, de Oliveira MCF, Diamond D, Oliveira ON. Big data and machine learning for materials science. DISCOVER MATERIALS 2021; 1:12. [PMID: 33899049 PMCID: PMC8054236 DOI: 10.1007/s43939-021-00012-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 02/09/2021] [Accepted: 04/01/2021] [Indexed: 05/11/2023]
Abstract
Herein, we review aspects of leading-edge research and innovation in materials science that exploit big data and machine learning (ML), two computer science concepts that combine to yield computational intelligence. ML can accelerate the solution of intricate chemical problems and even solve problems that otherwise would not be tractable. However, the potential benefits of ML come at the cost of big data production; that is, the algorithms demand large volumes of data of various natures and from different sources, from material properties to sensor data. In the survey, we propose a roadmap for future developments with emphasis on computer-aided discovery of new materials and analysis of chemical sensing compounds, both prominent research fields for ML in the context of materials science. In addition to providing an overview of recent advances, we elaborate upon the conceptual and practical limitations of big data and ML applied to materials science, outlining processes, discussing pitfalls, and reviewing cases of success and failure.
Affiliation(s)
- Jose F. Rodrigues
- Institute of Mathematical Sciences and Computing, University of São Paulo (USP), São Carlos, SP, Brazil
- Larisa Florea
- SFI Research Centre for Advanced Materials and BioEngineering Research, Trinity College Dublin, The University of Dublin, Dublin, Ireland
- Maria C. F. de Oliveira
- Institute of Mathematical Sciences and Computing, University of São Paulo (USP), São Carlos, SP, Brazil
- Dermot Diamond
- Insight Centre for Data Analytics, National Centre for Sensor Research, Dublin City University, Dublin 9, Dublin, Ireland
- Osvaldo N. Oliveira
- São Carlos Institute of Physics, University of São Paulo (USP), São Carlos, SP, Brazil
|
20
|
Achary PGR. Applications of Quantitative Structure-Activity Relationships (QSAR) based Virtual Screening in Drug Design: A Review. Mini Rev Med Chem 2020; 20:1375-1388. [DOI: 10.2174/1389557520666200429102334] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/10/2019] [Revised: 11/07/2019] [Accepted: 11/08/2019] [Indexed: 12/18/2022]
Abstract
Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules have so far been registered in Chemical Abstracts Service. According to a recent study, there are currently around 10^60 molecules that can be classified as new drug-like molecules. The library of such molecules is now considered 'dark chemical space' or 'dark chemistry.' To explore such hidden molecules scientifically, a good number of live and updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, 'genomics', 'proteomics' and 'in silico simulation', will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge and must be handled efficiently. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques extensively used in VS. QSAR is well known for enabling fast, high-throughput screening with a satisfactory hit rate. QSAR model building involves (i) collection of chemogenomics data from a database or the literature, (ii) calculation of appropriate descriptors from the molecular representation, (iii) establishing a relationship (model) between biological activity and the selected descriptors, and (iv) application of the QSAR model to predict the biological property of the molecules. All hits obtained by the VS technique need to be experimentally verified. The present mini-review highlights web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
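Steps (ii) to (iv) of QSAR model building, followed by screening of a virtual library, can be sketched in a few lines. This is a minimal illustration assuming a multiple linear regression model fitted by ordinary least squares; the two descriptor columns, the training values (generated from a known linear relation so the fit can be checked) and the compound names are all hypothetical. A real workflow would compute descriptors with a cheminformatics toolkit and validate the model (for example, cross-validation and an applicability-domain check) before screening.

```python
from typing import Dict, List, Sequence

# Hypothetical training set: rows of two descriptors (d1, d2) and measured activity.
# Values were generated from activity = 1 + 2*d1 - 0.5*d2 so the fit is checkable.
TRAIN_X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
TRAIN_Y = [2.0, 4.5, 5.0, 7.5, 8.5]


def fit_linear_qsar(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Step (iii): fit activity = b0 + b1*d1 + ... + bk*dk by ordinary least squares.

    Solves the normal equations (A^T A) b = A^T y with Gauss-Jordan elimination.
    """
    rows = [[1.0, *map(float, x)] for x in X]  # prepend the intercept column
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; model cannot be fitted")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def predict(coef: Sequence[float], descriptors: Sequence[float]) -> float:
    """Step (iv): apply the fitted model to a new molecule's descriptors."""
    return coef[0] + sum(c * d for c, d in zip(coef[1:], descriptors))


def screen(coef: Sequence[float], library: Dict[str, Sequence[float]],
           threshold: float) -> List[str]:
    """Virtual screening: keep compounds predicted at or above `threshold`, best first."""
    scored = sorted(((predict(coef, d), name) for name, d in library.items()), reverse=True)
    return [name for pred, name in scored if pred >= threshold]


if __name__ == "__main__":
    coef = fit_linear_qsar(TRAIN_X, TRAIN_Y)
    print([round(c, 3) for c in coef])
    # Hits from this step would still need experimental verification.
    print(screen(coef, {"cpd_A": [6.0, 2.0], "cpd_B": [0.5, 3.0]}, threshold=5.0))
```

Linear regression is used here only because it keeps the example dependency-free; the review covers QSAR as a family of machine learning techniques, and any regressor could fill step (iii).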
Affiliation(s)
- Patnala Ganga Raju Achary
- Department of Chemistry, Faculty of Engineering & Technology (ITER), Siksha ‘O’ Anusandhan (Deemed to be University), Khandagiri Square, Bhubaneswar 751030, India
|
21
|
Zhao L, Ciallella HL, Aleksunes LM, Zhu H. Advancing computer-aided drug discovery (CADD) by big data and data-driven machine learning modeling. Drug Discov Today 2020; 25:1624-1638. [PMID: 32663517 PMCID: PMC7572559 DOI: 10.1016/j.drudis.2020.07.005] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 02/19/2020] [Revised: 06/26/2020] [Accepted: 07/06/2020] [Indexed: 02/06/2023]
Abstract
Advancing a new drug to market requires substantial investments in time as well as financial resources. Crucial bioactivities for drug candidates, including their efficacy, pharmacokinetics (PK), and adverse effects, need to be investigated during drug development. With advancements in chemical synthesis and biological screening technologies over the past decade, a large number of biological data points for millions of small molecules have been generated and stored in various databases. These accumulated data, combined with new machine learning (ML) approaches such as deep learning, have shown great potential to provide insights into relevant chemical structures and to predict in vitro, in vivo, and clinical outcomes, thereby advancing drug discovery and development in the big data era.
Affiliation(s)
- Linlin Zhao
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
- Heather L Ciallella
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA
- Lauren M Aleksunes
- Department of Pharmacology and Toxicology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ 08854, USA
- Hao Zhu
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ 08102, USA; Department of Chemistry, Rutgers University, Camden, NJ 08102, USA
|
22
|
Baker CM, Kidley NJ, Papachristos K, Hotson M, Carson R, Gravestock D, Pouliot M, Harrison J, Dowling A. Tautomer Standardization in Chemical Databases: Deriving Business Rules from Quantum Chemistry. J Chem Inf Model 2020; 60:3781-3791. [PMID: 32644790 DOI: 10.1021/acs.jcim.0c00232] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/18/2023]
Abstract
Databases of small, potentially bioactive molecules are ubiquitous across industry and academia. Although such databases are designed so that each unique compound appears only once, the multiplicity of ways in which many compounds can be represented means that they require methods for standardizing the representation of chemistry. This is commonly achieved through the use of "Chemistry Business Rules", sets of predefined rules that describe the "house style" of the database in question. At Syngenta, the historical approach to the design of chemistry business rules has been to focus on consistency of representation, with chemical relevance given secondary consideration. In this work, we overturn that convention. Through the use of quantum chemistry calculations, we define a set of chemistry business rules for tautomer standardization that reproduces gas-phase energetic preferences. We go on to show that, compared to our historic approach, this method yields tautomers that are in better agreement with those observed experimentally in condensed phases and that are better suited for use in predictive models.
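The core idea, that a tautomer business rule should follow computed energetic preference rather than an arbitrary drawing convention, can be illustrated with a toy selector. This sketch is a simplification and not Syngenta's rule set: it assumes relative gas-phase energies (kcal/mol) have already been computed for each enumerated tautomer of a compound, picks the lowest-energy form, and breaks near-ties deterministically so that every registration of the same compound yields the same representation. The tautomer labels and energies are hypothetical.

```python
from typing import Dict


def pick_canonical_tautomer(energies: Dict[str, float], tol: float = 0.0) -> str:
    """Return the identifier of the preferred tautomer.

    energies: tautomer identifier (e.g. a SMILES string) -> relative energy in kcal/mol.
    tol: tautomers within `tol` of the minimum are treated as tied; ties are
         broken by lexicographic order so the choice is reproducible.
    """
    if not energies:
        raise ValueError("no tautomers supplied")
    e_min = min(energies.values())
    tied = sorted(t for t, e in energies.items() if e - e_min <= tol)
    return tied[0]


if __name__ == "__main__":
    # Hypothetical relative energies for two tautomers of one compound
    print(pick_canonical_tautomer({"taut_A": 0.0, "taut_B": 1.2}))  # prints taut_A
```

The deterministic tie-break keeps the consistency that the historical approach prioritized, while the energy criterion supplies the chemical relevance the abstract argues for.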
Affiliation(s)
- Christopher M Baker
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
- Nathan J Kidley
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
- Matthew Hotson
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
- Rob Carson
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
- David Gravestock
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
- Martin Pouliot
- Syngenta Crop Protection, Schaffhauserstrasse, Stein CH-4332, Switzerland
- Jim Harrison
- Datacraft Technologies, 110 Parkwood Place, Anstead, QLD 4070, Australia
- Alan Dowling
- Syngenta, Jealott's Hill International Research Centre, Bracknell, Berkshire RG42 6EY, U.K.
|
23
|
Winter R, Retel J, Noé F, Clevert DA, Steffen A. grünifai: interactive multiparameter optimization of molecules in a continuous vector space. Bioinformatics 2020; 36:4093-4094. [PMID: 32369561 DOI: 10.1093/bioinformatics/btaa271] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/11/2019] [Revised: 03/09/2020] [Accepted: 04/27/2020] [Indexed: 11/14/2022]
Abstract
SUMMARY Optimizing small molecules in a drug discovery project is a notoriously difficult task as multiple molecular properties have to be considered and balanced at the same time. In this work, we present our novel interactive in silico compound optimization platform termed grünifai to support the ideation of the next generation of compounds under the constraints of a multiparameter objective. grünifai integrates adjustable in silico models, a continuous representation of the chemical space, a scalable particle swarm optimization algorithm and the possibility to actively steer the compound optimization through providing feedback on generated intermediate structures. AVAILABILITY AND IMPLEMENTATION Source code and documentation are freely available under an MIT license and are openly available on GitHub (https://github.com/jrwnter/gruenifai). The backend, including the optimization method and distribution on multiple GPU nodes is written in Python 3. The frontend is written in ReactJS.
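grünifai searches a continuous representation of chemical space with particle swarm optimization (PSO). The following is a generic, single-process PSO sketch, not grünifai's implementation (the linked repository holds that): each particle is a point in a latent vector space, and `score` stands in for the multiparameter objective that, in the real platform, would decode the vector to a molecule and evaluate it with in silico models. The hyperparameter values (inertia `w`, attraction coefficients `c1`, `c2`) are common textbook defaults assumed for illustration, and the toy objective with a known optimum is hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_maximize(
    score: Callable[[Sequence[float]], float],
    dim: int,
    n_particles: int = 20,
    n_iter: int = 100,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    bounds: Tuple[float, float] = (-1.0, 1.0),
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Maximize `score` over a box in R^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # best position seen by each particle
    pbest_val = [score(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))  # stay in the box
            val = score(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    target = [0.3, -0.2]  # hypothetical optimum of a toy objective
    best, best_val = pso_maximize(
        lambda x: -sum((xi - ti) ** 2 for xi, ti in zip(x, target)), dim=2)
    print([round(v, 3) for v in best], round(best_val, 6))
```

This sketch omits the two features that distinguish grünifai in the abstract: distribution of the optimization across multiple GPU nodes, and user feedback on intermediate structures to steer the search.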
Affiliation(s)
- Robin Winter
- Department of Digital Technologies, Bayer AG, Berlin 13353, Germany; Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
- Joren Retel
- Department of Digital Technologies, Bayer AG, Berlin 13353, Germany
- Frank Noé
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
- Andreas Steffen
- Department of Digital Technologies, Bayer AG, Berlin 13353, Germany
|
24
|
Affiliation(s)
- Thomas Lengauer
- Max Planck Institute for Informatics, Saarland Informatics Campus, Campus E1 4, 66123 Saarbrücken, Germany
|
25
|
Bagchi A. Latest trends in structure based drug design with protein targets. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2019; 121:1-23. [PMID: 32312418 DOI: 10.1016/bs.apcsb.2019.11.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
Structure-based drug design is an important endeavor in the field of structural bioinformatics. Previously, the entire process depended on wet-lab experiments to build libraries of ligand molecules, which were then tested to determine their binding efficacy against the protein target. However, this process is very lengthy and, above all, highly expensive. With the advent of supercomputers and increasing computational power, the search for suitable ligand molecules against target proteins has become more streamlined and cost-effective. Now the entire ligand search process is performed in silico with the help of virtual screening, molecular docking simulations and molecular dynamics studies. In the present chapter, a brief overview of the computational techniques involved in structure-based drug design is presented, with special emphasis on the thermodynamic principles behind molecular interactions.
Affiliation(s)
- Angshuman Bagchi
- Department of Biochemistry and Biophysics, University of Kalyani, Kalyani, West Bengal, India
|
26
|
Schneider P, Walters WP, Plowright AT, Sieroka N, Listgarten J, Goodnow RA, Fisher J, Jansen JM, Duca JS, Rush TS, Zentgraf M, Hill JE, Krutoholow E, Kohler M, Blaney J, Funatsu K, Luebkemann C, Schneider G. Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov 2019. [DOI: 10.1038/s41573-019-0050-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/29/2022]
|
27
|
Rethinking drug design in the artificial intelligence era. Nat Rev Drug Discov 2019; 19:353-364. [PMID: 31801986 DOI: 10.1038/s41573-019-0050-3] [Citation(s) in RCA: 355] [Impact Index Per Article: 59.2] [Accepted: 10/28/2019] [Indexed: 12/17/2022]
|
28
|
Joshi P, Kawade V, Dhulap S, Goel M. Unlocking the concealed targets using system biology mapping for Alzheimer's disease. Pharmacol Rep 2019; 71:1104-1107. [PMID: 31634797 DOI: 10.1016/j.pharep.2019.06.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2019] [Revised: 06/14/2019] [Accepted: 06/28/2019] [Indexed: 11/16/2022]
Abstract
BACKGROUND Alzheimer's disease (AD) is characterized histologically by neuronal loss in the brain and involves complex genomic and environmental factors. Accumulation of amyloid beta (Aβ) peptide and phosphorylated tau is indicative of progression and cognitive decline. Hence, an understanding of the underlying biological pathways and targets, along with the associated mechanisms, would be useful for developing improved therapeutics for AD. In the present work, we aim to identify concealed targets for developing first-line therapeutics and to reposition validated targets as well as FDA-approved drugs using a systems biology approach. METHODS We collated information pertaining to biological targets and approved drugs from the scientific literature and patents. RESULTS In all, an imbalance in the functioning of around 79 proteins and genes was identified as involved in the Alzheimer's cascade. Among them, around 21 targets were found to be under therapeutic consideration for AD. Of the remainder, around 17 targets were reported as potential targets for AD, although they are under researchers' attention for other pathophysiological conditions. The analysis further revealed that ~41 therapeutic targets are pharmacologically concealed but structurally validated and may constitute potential therapeutic candidates for future drug discovery for AD. CONCLUSION The biological pathway vs. drug mapping provides a complete overview of the underlying biological pathways, therapeutic targets (explored and concealed), associated mechanisms, existing therapeutics, and molecules currently under active drug development, supporting further drug discovery and drug repositioning/repurposing approaches for AD management.
Affiliation(s)
- Poorvashree Joshi
- CSIR - Unit for Research and Development of Information Products (URDIP), Pune, Maharashtra, India
- Vikram Kawade
- CSIR - Unit for Research and Development of Information Products (URDIP), Pune, Maharashtra, India
- Sivakami Dhulap
- CSIR - Unit for Research and Development of Information Products (URDIP), Pune, Maharashtra, India
- Mandakini Goel
- Solution Consultant, Clarivate Analytics, Bengaluru, Karnataka, India
|
29
|
Naveja JJ, Pilón-Jiménez BA, Bajorath J, Medina-Franco JL. A general approach for retrosynthetic molecular core analysis. J Cheminform 2019; 11:61. [PMID: 33430974 PMCID: PMC6760108 DOI: 10.1186/s13321-019-0380-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/04/2019] [Accepted: 08/04/2019] [Indexed: 11/13/2022]
Abstract
Scaffold analysis of compound data sets has reemerged as a chemically interpretable alternative to machine learning for chemical space and structure–activity relationships analysis. In this context, analog series-based scaffolds (ASBS) are synthetically relevant core structures that represent individual series of analogs. As an extension to ASBS, we herein introduce the development of a general conceptual framework that considers all putative cores of molecules in a compound data set, thus softening the often applied “single molecule–single scaffold” correspondence. A putative core is here defined as any substructure of a molecule complying with two basic rules: (a) the size of the core is a significant proportion of the whole molecule size and (b) the substructure can be reached from the original molecule through a succession of retrosynthesis rules. Thereafter, a bipartite network consisting of molecules and cores can be constructed for a database of chemical structures. Compounds linked to the same cores are considered analogs. We present case studies illustrating the potential of the general framework. The applications range from inter- and intra-core diversity analysis of compound data sets, structure–property relationships, and identification of analog series and ASBS. The molecule–core network herein presented is a general methodology with multiple applications in scaffold analysis. New statistical methods are envisioned that will be able to draw quantitative conclusions from these data. The code to use the method presented in this work is freely available as an additional file. Follow-up applications include analog searching and core structure–property relationships analyses.
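The molecule-core bipartite network, and the rule that compounds linked to the same core are analogs, can be sketched as follows. This is not the authors' published code: it assumes putative cores have already been enumerated by applying retrosynthetic rules (rule (b)) and implements rule (a) as a heavy-atom-count ratio. The 2/3 threshold, the identifiers and the atom counts are hypothetical choices for illustration, not values taken from the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical input: molecule id -> (heavy-atom count, {core id: core heavy-atom count}).
# Cores are assumed to come from applying retrosynthetic rules to each molecule.
MoleculeCores = Dict[str, Tuple[int, Dict[str, int]]]


def build_core_network(molecules: MoleculeCores,
                       min_fraction: float = 2 / 3) -> Dict[str, Set[str]]:
    """Build the core side of the molecule-core bipartite network.

    A substructure counts as a putative core only if it covers at least
    `min_fraction` of the molecule's heavy atoms (rule (a); threshold assumed).
    Returns core id -> set of molecule ids linked to it.
    """
    core_to_mols: Dict[str, Set[str]] = defaultdict(set)
    for mol_id, (size, cores) in molecules.items():
        for core_id, core_size in cores.items():
            if core_size >= min_fraction * size:
                core_to_mols[core_id].add(mol_id)
    return dict(core_to_mols)


def analog_pairs(core_to_mols: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Molecules linked to the same core are considered analogs."""
    pairs: Set[Tuple[str, str]] = set()
    for mols in core_to_mols.values():
        pairs.update(combinations(sorted(mols), 2))
    return pairs


if __name__ == "__main__":
    mols = {
        "m1": (20, {"c1": 15, "c2": 8}),  # c2 is too small to count as a core of m1
        "m2": (18, {"c1": 15}),
        "m3": (30, {"c2": 25}),
    }
    network = build_core_network(mols)
    print(sorted(analog_pairs(network)))  # prints [('m1', 'm2')]
```

Because a molecule may pass the size test for several cores, it can appear under more than one core, which is how the framework relaxes the single molecule-single scaffold correspondence.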
Affiliation(s)
- J Jesús Naveja
- PECEM, School of Medicine, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico; Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
- B Angélica Pilón-Jiménez
- Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, 53115, Bonn, Germany
- José L Medina-Franco
- Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
|
30
|
de Almeida AF, Moreira R, Rodrigues T. Synthetic organic chemistry driven by artificial intelligence. Nat Rev Chem 2019. [DOI: 10.1038/s41570-019-0124-0] [Citation(s) in RCA: 111] [Impact Index Per Article: 18.5] [Indexed: 12/14/2022]
|
31
|
Strang KD, Sun Z. Hidden big data analytics issues in the healthcare industry. Health Informatics J 2019; 26:981-998. [DOI: 10.1177/1460458219854603] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/16/2022]
Abstract
The goal of the study was to identify big data analysis issues that can impact empirical research in the healthcare industry. To accomplish that the author analyzed big data related keywords from a literature review of peer reviewed journal articles published since 2011. Topics, methods and techniques were summarized along with strengths and weaknesses. A panel of subject matter experts was interviewed to validate the intermediate results and synthesize the key problems that would likely impact researchers conducting quantitative big data analysis in healthcare studies. The systems thinking action research method was applied to identify and describe the hidden issues. The findings were similar to the extant literature but three hidden fatal issues were detected. Methodical and statistical control solutions were proposed to overcome the three fatal healthcare big data analysis issues.
Affiliation(s)
- Zhaohao Sun
- PNG University of Technology, Papua New Guinea
|
32
|
Abstract
The Chemical Information Science Gateway (CISG) of F1000Research was originally conceptualized as a forum for high-quality publications in chemical information science (CIS) including chemoinformatics. Adding a publication venue with open access and open peer review to the CIS field was a prime motivation for the introduction of CISG, aiming to support open science in this area. Herein, the CISG concept is revisited and the development of the gateway over the past four years is reviewed. In addition, opportunities are discussed to better position CISG within the publication spectrum of F1000Research and further increase its visibility and attractiveness for scientific contributions.
|
33
|
Batool M, Ahmad B, Choi S. A Structure-Based Drug Discovery Paradigm. Int J Mol Sci 2019; 20:ijms20112783. [PMID: 31174387 PMCID: PMC6601033 DOI: 10.3390/ijms20112783] [Citation(s) in RCA: 320] [Impact Index Per Article: 53.3] [Received: 05/10/2019] [Revised: 05/31/2019] [Accepted: 06/04/2019] [Indexed: 12/14/2022]
Abstract
Structure-based drug design is becoming an essential tool for faster and more cost-efficient lead discovery relative to traditional methods. Genomic, proteomic, and structural studies have provided hundreds of new targets and opportunities for future drug discovery. This situation poses a major problem: the necessity to handle the “big data” generated by combinatorial chemistry. Artificial intelligence (AI) and deep learning play a pivotal role in the analysis and systemization of larger data sets by statistical machine learning methods. Advanced AI-based machine learning tools have a significant impact on the drug discovery process, including medicinal chemistry. In this review, we focus on the currently available methods and algorithms for structure-based drug design, including virtual screening and de novo drug design, with special emphasis on AI- and deep-learning-based methods used for drug discovery.
Affiliation(s)
- Maria Batool
- Department of Molecular Science and Technology, Ajou University, Suwon 16499, Korea.
- Bilal Ahmad
- Department of Molecular Science and Technology, Ajou University, Suwon 16499, Korea.
- Sangdun Choi
- Department of Molecular Science and Technology, Ajou University, Suwon 16499, Korea.
34
Koulouridi E, Valli M, Ntie-Kang F, Bolzani VDS. A primer on natural product-based virtual screening. PHYSICAL SCIENCES REVIEWS 2019. [DOI: 10.1515/psr-2018-0105] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022]
Abstract
Databases play an important role in various computational techniques, including virtual screening (VS) and molecular modeling in general. These collections of molecules can contain a large amount of information, making them suitable for several drug discovery applications. For example, vendor information, bioactivity data, or target type can be found when searching a database. An introduction to these data resources and their characteristics can inform the design of an experiment. A description of how a database was constructed can also be a useful guide when creating a new one. There are freely available databases and commercial virtual libraries of molecules. Furthermore, a computational chemist can find databases for a general purpose or for a specific subset such as natural products (NPs). In this chapter, NP database resources are presented, along with some guidelines for preparing an NP database for drug discovery purposes.
35
Abstract
The objective of this review is to survey the development of lyophilization optimization. Optimization studies of the lyophilizer have developed roughly in the following order: (i) a trial-and-error approach, (ii) process modeling using mathematical models, (iii) scalability, and (iv) quality-by-design. Conventional trial-and-error lyophilization studies identified the key parameters for optimizing lyophilization, i.e., critical material attributes (CMAs), critical process parameters (CPPs), and critical quality attributes (CQAs). Mathematical models using these key parameters have been constructed from the viewpoint of heat and mass transfer. In many cases, control of the primary drying stage, rather than the freezing or secondary drying stage, has been shown to determine the outcome of product lyophilization. Thus, understanding of the lyophilization process has advanced. To further reduce time and economic cost, the design space is a promising method for defining the feasible operating range when optimizing lyophilization. This method searches for optimized conditions by reducing the number of key parameters among the CMAs, CPPs, and CQAs. Alternatively, the transfer of lyophilization recipes among lab-, pilot-, and production-scale lyophilizers (scale-up) has been examined. Notably, scale-up of lyophilization requires preserving the lyophilization dynamics between the two scales, i.e., operating the lab- or pilot-scale lyophilizer under HEPA-filtered airflow conditions. The design space determined by focusing on the primary drying stage is large and involves undesired variations in final product quality due to the heterogeneous size distribution of ice crystals.
Accordingly, controlling the formation of large ice crystals affected both product quality and productivity, although the high water content of the final product still needs to be improved. Therefore, lyophilization should take quality by design (QbD) into account. Methods for monitoring product quality during the lyophilization process are termed “process analytical technology (PAT).” Recent PAT tools can reveal lyophilization dynamics to some extent. Combining PAT tools with a model/scale-up theory is expected to achieve QbD, i.e., quality/risk management and in situ optimization of lyophilization operation.
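For the primary drying stage singled out above, heat- and mass-transfer models are often written in a quasi-steady-state form. The equations below are a standard textbook formulation offered as an illustration, not necessarily the specific models surveyed in this review, and the symbols are introduced here: m is the mass of ice sublimed, A_p the product cross-sectional area, P_ice the vapor pressure of ice at the sublimation front, P_c the chamber pressure, R_p the area-normalized resistance of the dried layer, A_v the vial cross-sectional area, K_v the vial heat-transfer coefficient, T_s the shelf temperature, T_b the product temperature at the vial bottom, and ΔH_s the heat of sublimation.

```latex
\begin{aligned}
\frac{dm}{dt} &= \frac{A_p\,(P_{\text{ice}} - P_c)}{R_p} && \text{(mass transfer through the dried layer)}\\
\frac{dQ}{dt} &= A_v\,K_v\,(T_s - T_b) && \text{(heat transfer from the shelf)}\\
\frac{dQ}{dt} &= \Delta H_s\,\frac{dm}{dt} && \text{(quasi-steady energy balance)}\\
\Rightarrow\quad A_v\,K_v\,(T_s - T_b) &= \Delta H_s\,\frac{A_p\,(P_{\text{ice}} - P_c)}{R_p}
\end{aligned}
```

If the temperature drop across the frozen layer is assumed negligible, so that P_ice can be evaluated at T_b, the last relation fixes the product temperature for a given shelf temperature and chamber pressure. A design space can then be read as the set of (T_s, P_c) pairs that keep T_b below the product's collapse temperature.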
36
Romano JD, Tatonetti NP. Informatics and Computational Methods in Natural Product Drug Discovery: A Review and Perspectives. Front Genet 2019; 10:368. [PMID: 31114606 PMCID: PMC6503039 DOI: 10.3389/fgene.2019.00368] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 12/12/2018] [Accepted: 04/05/2019] [Indexed: 12/17/2022] [Open Access]
Abstract
The discovery of new pharmaceutical drugs is one of the preeminent tasks (scientifically, economically, and socially) in biomedical research. Advances in informatics and computational biology have increased productivity at many stages of the drug discovery pipeline. Nevertheless, drug discovery has slowed, largely due to the reliance on small molecules as the primary source of novel hypotheses. Natural products (such as plant metabolites, animal toxins, and immunological components) comprise a vast and diverse source of bioactive compounds, some of which are supported by thousands of years of traditional medicine, and are largely disjoint from the set of small molecules used commonly for discovery. However, natural products possess unique characteristics that distinguish them from traditional small molecule drug candidates, requiring new methods and approaches for assessing their therapeutic potential. In this review, we investigate a number of state-of-the-art techniques in bioinformatics, cheminformatics, and knowledge engineering for data-driven drug discovery from natural products. We focus on methods that aim to bridge the gap between traditional small-molecule drug candidates and different classes of natural products. We also explore the current informatics knowledge gaps and other barriers that need to be overcome to fully leverage these compounds for drug discovery. Finally, we conclude with a "road map" of research priorities that seeks to realize this goal.
Affiliation(s)
- Joseph D. Romano
- Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Department of Systems Biology, Columbia University, New York, NY, United States
- Department of Medicine, Columbia University, New York, NY, United States
- Data Science Institute, Columbia University, New York, NY, United States
- Nicholas P. Tatonetti
- Department of Biomedical Informatics, Columbia University, New York, NY, United States
- Department of Systems Biology, Columbia University, New York, NY, United States
- Department of Medicine, Columbia University, New York, NY, United States
- Data Science Institute, Columbia University, New York, NY, United States
37
Hu Y, Bajorath J. SAR Matrix Method for Large-Scale Analysis of Compound Structure-Activity Relationships and Exploration of Multitarget Activity Spaces. Methods Mol Biol 2019; 1825:339-352. [PMID: 30334212 DOI: 10.1007/978-1-4939-8639-2_11] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 04/16/2023]
Abstract
As the number of compounds and the volume of bioactivity data rapidly grow, advanced computational methods are required to study structure-activity relationships (SARs) on a large scale. Herein, the SAR matrix (SARM) methodology is described that was designed to systematically extract structural relationships between bioactive compounds from large databases, explore structure-activity relationships, and navigate multitarget activity spaces, which is one of the core tasks in chemogenomics. In addition, the SARM approach was designed to visualize structural and structure-activity relationships, which is often of critical importance for making this information available in an intuitive form for practical applications.
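The grid-style organization behind a SAR matrix can be sketched roughly as follows. This is a simplified illustration, not the published SARM implementation: it assumes compounds have already been fragmented into a core and a single substituent upstream, and that the cores placed in one matrix have already been established as structurally analogous. The core and substituent labels and the potency values are hypothetical. Empty cells correspond to unexplored core/substituent combinations, which the SARM approach presents as candidate virtual compounds.

```python
from collections import defaultdict

def build_sar_matrix(records):
    """Arrange (core, substituent, potency) records into a core x substituent grid.

    Fragmentation into core and substituent is assumed to have been done upstream.
    Returns the grid (core -> {substituent: potency}) and the sorted substituent list.
    """
    matrix = defaultdict(dict)
    substituents = set()
    for core, substituent, potency in records:
        matrix[core][substituent] = potency
        substituents.add(substituent)
    return dict(matrix), sorted(substituents)

def virtual_cells(matrix, substituents):
    """List unexplored core/substituent combinations (the empty cells of the grid)."""
    return [
        (core, sub)
        for core in sorted(matrix)
        for sub in substituents
        if sub not in matrix[core]
    ]

# Hypothetical decomposed series: two analogous cores sharing substituents.
records = [
    ("core_A", "R-Cl", 7.2),
    ("core_A", "R-OMe", 6.5),
    ("core_A", "R-F", 6.9),
    ("core_B", "R-Cl", 7.8),
    ("core_B", "R-OMe", 6.1),
]
matrix, subs = build_sar_matrix(records)
for core in sorted(matrix):
    # None marks an empty cell (a combination not yet made or tested).
    print(core, [matrix[core].get(s) for s in subs])
print(virtual_cells(matrix, subs))
```

Rows sharing a column let potency changes be attributed to the core, and columns sharing a row attribute them to the substituent, which is what makes the layout useful for reading SAR at a glance.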
Affiliation(s)
- Ye Hu
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
38
Lenci E, Trabocchi A. Smart Design of Small‐Molecule Libraries: When Organic Synthesis Meets Cheminformatics. Chembiochem 2019; 20:1115-1123. [DOI: 10.1002/cbic.201800751] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/30/2018] [Indexed: 12/25/2022]
Affiliation(s)
- Elena Lenci
- Department of Chemistry “Ugo Schiff”, University of Florence, Via della Lastruccia 13, 50019 Sesto Fiorentino, Florence, Italy
- Andrea Trabocchi
- Department of Chemistry “Ugo Schiff”, University of Florence, Via della Lastruccia 13, 50019 Sesto Fiorentino, Florence, Italy
39
Strang KD. Problems with research methods in medical device big data analytics. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2019. [DOI: 10.1007/s41060-019-00176-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
40
Tuvi-Arad I, Blonder R. Technology in the Service of Pedagogy: Teaching with Chemistry Databases. Isr J Chem 2018. [DOI: 10.1002/ijch.201800076] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022]
Affiliation(s)
- Inbal Tuvi-Arad
- Department of Natural Sciences, The Open University of Israel, Israel
- Ron Blonder
- Department of Science Education, The Weizmann Institute of Science, Israel
41
Abstract
Advances in computer processing speed and storage capacity have enabled researchers to generate virtual chemical libraries containing billions of molecules. While these numbers appear large, they are only a small fraction of the number of organic molecules that could potentially be synthesized. This review provides an overview of recent advances in the generation and use of virtual chemical libraries in medicinal chemistry. We also consider the practical implications of these libraries in drug discovery programs and highlight a number of current and future challenges.
Affiliation(s)
- W Patrick Walters
- Relay Therapeutics, 215 First Street, Cambridge, Massachusetts 02142, United States
42
Foundations of data-driven medicinal chemistry. Future Sci OA 2018; 4:FSO320. [PMID: 30271612 PMCID: PMC6153455 DOI: 10.4155/fsoa-2018-0057] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/17/2018] [Accepted: 05/22/2018] [Indexed: 11/17/2022] [Open Access]
43
Sosnina EA, Osolodkin DI, Radchenko EV, Sosnin S, Palyulin VA. Influence of Descriptor Implementation on Compound Ranking Based on Multiparameter Assessment. J Chem Inf Model 2018; 58:1083-1093. [PMID: 29689160 DOI: 10.1021/acs.jcim.7b00734] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
Most of the common molecular descriptors have numerous different implementations. This can influence the results of compound prioritization based on the multiparameter assessment (MPA) approach that allows a medicinal chemist to simultaneously analyze and achieve the desired balance of the diverse and often conflicting molecular and pharmacological properties. In this study, we analyzed the feasibility of using different implementations of common descriptors (logP, logS, TPSA, logBB, hERG, nHBA) interchangeably in predesigned sets of requirements in the course of multiparameter compound optimization. The influence of methods of descriptor calculation, continuity or discreteness of their values, their applicability domains, as well as of the nature of desirability functions in an MPA profile were examined in terms of the stability of MPA compound ranking. It was shown that the interchangeable use of different methods of descriptor calculation is reliably acceptable only for continuously distributed parameters transformed by a smooth desirability function. If a descriptor in an MPA scheme is discretely distributed, only the implementation that was used for building the scoring profile may be used for assessment. An inconsistency of assessment due to different applicability domains of descriptors was also demonstrated.
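Why a smooth desirability function tolerates implementation differences better than a hard cutoff can be illustrated with a small sketch. The double-sigmoid and step functions, the logP window, the geometric-mean aggregation, and the two logP values (standing in for two implementations of the same descriptor for one compound) are all illustrative assumptions, not the profiles used in the study.

```python
import math

def smooth_desirability(x: float, lo: float, hi: float, width: float = 0.5) -> float:
    """Smooth window: close to 1 well inside [lo, hi], decaying sigmoidally outside.

    The double-sigmoid form and the `width` parameter are illustrative assumptions.
    """
    rise = 1.0 / (1.0 + math.exp(-(x - lo) / width))
    fall = 1.0 / (1.0 + math.exp((x - hi) / width))
    return rise * fall

def step_desirability(x: float, lo: float, hi: float) -> float:
    """Hard pass/fail window: 1 inside [lo, hi], 0 outside."""
    return 1.0 if lo <= x <= hi else 0.0

def mpa_score(desirabilities: list[float]) -> float:
    """Aggregate per-descriptor desirabilities with a geometric mean (one common choice)."""
    if any(d <= 0.0 for d in desirabilities):
        return 0.0  # any fully undesirable property zeroes the overall score
    return math.exp(sum(math.log(d) for d in desirabilities) / len(desirabilities))

# Hypothetical logP of one compound as computed by two different implementations.
logp_impl_a, logp_impl_b = 2.95, 3.05
window = (1.0, 3.0)  # hypothetical acceptable logP range

step_shift = abs(step_desirability(logp_impl_a, *window) - step_desirability(logp_impl_b, *window))
smooth_shift = abs(smooth_desirability(logp_impl_a, *window) - smooth_desirability(logp_impl_b, *window))
print(f"step shift = {step_shift:.3f}, smooth shift = {smooth_shift:.3f}")
```

For a compound sitting next to a threshold, switching implementations moves the step desirability by a full unit, which can reorder an MPA ranking, while the smooth desirability changes only marginally.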
Affiliation(s)
- Ekaterina A Sosnina
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Institute of Physiologically Active Compounds RAS, Chernogolovka 142432, Russia
- Dmitry I Osolodkin
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- Chumakov Institute of Poliomyelitis and Viral Encephalitides, Chumakov FSC R&D IBP RAS, Moscow 108819, Russia
- Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Eugene V Radchenko
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- Institute of Physiologically Active Compounds RAS, Chernogolovka 142432, Russia
- Sergey Sosnin
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Institute of Physiologically Active Compounds RAS, Chernogolovka 142432, Russia
- Vladimir A Palyulin
- Department of Chemistry, Lomonosov Moscow State University, Moscow 119991, Russia
- Institute of Physiologically Active Compounds RAS, Chernogolovka 142432, Russia
44
Tauler R, Parastar H. Big (Bio)Chemical Data Mining Using Chemometric Methods: A Need for Chemists. Angew Chem Int Ed Engl 2018; 61:e201801134. [DOI: 10.1002/anie.201801134] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/27/2018] [Indexed: 11/08/2022]
Affiliation(s)
- Roma Tauler
- IDAEA-CSIC, Environmental Chemistry, Jordi Girona 18-26, 08034 Barcelona, Spain
45
Kooistra AJ, Vass M, McGuire R, Leurs R, de Esch IJP, Vriend G, Verhoeven S, de Graaf C. 3D-e-Chem: Structural Cheminformatics Workflows for Computer-Aided Drug Discovery. ChemMedChem 2018; 13:614-626. [PMID: 29337438 PMCID: PMC5900740 DOI: 10.1002/cmdc.201700754] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 12/05/2017] [Revised: 01/11/2018] [Indexed: 01/06/2023]
Abstract
eScience technologies are needed to process the information available in many heterogeneous types of protein-ligand interaction data and to capture these data into models that enable the design of efficacious and safe medicines. Here we present scientific KNIME tools and workflows that enable the integration of chemical, pharmacological, and structural information for: i) structure-based bioactivity data mapping, ii) structure-based identification of scaffold replacement strategies for ligand design, iii) ligand-based target prediction, iv) protein sequence-based binding site identification and ligand repurposing, and v) structure-based pharmacophore comparison for ligand repurposing across protein families. The modular setup of the workflows and the use of well-established standards allow the re-use of these protocols and facilitate the design of customized computer-aided drug discovery workflows.
Affiliation(s)
- Albert J. Kooistra
- Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center (RadboudUMC), Nijmegen, The Netherlands
- Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Márton Vass
- Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Ross McGuire
- Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center (RadboudUMC), Nijmegen, The Netherlands
- BioAxis Research, Pivot Park, Oss, The Netherlands
- Rob Leurs
- Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Iwan J. P. de Esch
- Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Gert Vriend
- Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center (RadboudUMC), Nijmegen, The Netherlands
- Chris de Graaf
- Division of Medicinal Chemistry, Faculty of Science, Amsterdam Institute for Molecules, Medicines and Systems (AIMMS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
46
Humbeck L, Weigang S, Schäfer T, Mutzel P, Koch O. CHIPMUNK: A Virtual Synthesizable Small-Molecule Library for Medicinal Chemistry, Exploitable for Protein-Protein Interaction Modulators. ChemMedChem 2018; 13:532-539. [PMID: 29392860 DOI: 10.1002/cmdc.201700689] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 11/03/2017] [Revised: 01/27/2018] [Indexed: 02/05/2023]
Abstract
A common issue during drug design and development is the discovery of novel scaffolds for protein targets. On the one hand the chemical space of purchasable compounds is rather limited; on the other hand artificially generated molecules suffer from a grave lack of accessibility in practice. Therefore, we generated a novel virtual library of small molecules which are synthesizable from purchasable educts, called CHIPMUNK (CHemically feasible In silico Public Molecular UNiverse Knowledge base). Altogether, CHIPMUNK covers over 95 million compounds and encompasses regions of the chemical space that are not covered by existing databases. The coverage of CHIPMUNK exceeds the chemical space spanned by the Lipinski rule of five to foster the exploration of novel and difficult target classes. The analysis of the generated property space reveals that CHIPMUNK is well suited for the design of protein-protein interaction inhibitors (PPIIs). Furthermore, a recently developed structural clustering algorithm (StruClus) for big data was used to partition the sub-libraries into meaningful subsets and assist scientists to process the large amount of data. These clustered subsets also contain the target space based on ChEMBL data which was included during clustering.
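To make the rule-of-five boundary mentioned above concrete, the sketch below counts Lipinski violations (molecular weight above 500 Da, logP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors) on precomputed descriptors and reports what fraction of a library lies beyond Ro5 space. The compound names and descriptor values are hypothetical, the descriptors are assumed to have been computed upstream with a cheminformatics toolkit, and the one-violation tolerance is a common convention rather than anything specified for CHIPMUNK.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (values below are hypothetical)."""
    name: str
    mol_weight: float  # Da
    logp: float        # calculated octanol/water logP
    h_donors: int
    h_acceptors: int

def ro5_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four rule-of-five thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])

def outside_ro5_fraction(library: list[Descriptors], max_violations: int = 1) -> float:
    """Fraction of a library beyond Ro5 space under the chosen violation tolerance."""
    if not library:
        return 0.0
    outside = sum(ro5_violations(d) > max_violations for d in library)
    return outside / len(library)

library = [
    Descriptors("cpd_1", 320.4, 2.1, 1, 4),  # fully Ro5-compliant
    Descriptors("cpd_2", 610.7, 5.8, 3, 9),  # hypothetical large, lipophilic compound
    Descriptors("cpd_3", 540.2, 4.2, 2, 8),  # a single violation (molecular weight)
]
print(outside_ro5_fraction(library))
```

Reporting the fraction outside Ro5 space, rather than filtering those compounds out, fits the stated goal of a library that deliberately extends beyond the rule of five toward harder target classes.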
Affiliation(s)
- Lina Humbeck
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, Dortmund, 44227, Germany
- Sebastian Weigang
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, Dortmund, 44227, Germany
- Till Schäfer
- Department of Computer Science, TU Dortmund University, Otto-Hahn-Straße 14, Dortmund, 44227, Germany
- Petra Mutzel
- Department of Computer Science, TU Dortmund University, Otto-Hahn-Straße 14, Dortmund, 44227, Germany
- Oliver Koch
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, Dortmund, 44227, Germany
47
Lin A, Horvath D, Afonina V, Marcou G, Reymond JL, Varnek A. Mapping of the Available Chemical Space versus the Chemical Universe of Lead-Like Compounds. ChemMedChem 2018; 13:540-554. [PMID: 29154440 DOI: 10.1002/cmdc.201700561] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 09/15/2017] [Revised: 11/07/2017] [Indexed: 12/15/2022]
Abstract
This is, to our knowledge, the most comprehensive analysis to date based on generative topographic mapping (GTM) of fragment-like chemical space (40 million molecules with no more than 17 heavy atoms, from both the theoretically enumerated GDB-17 and the real-world PubChem/ChEMBL databases). The challenge was to prove that a robust map of fragment-like chemical space can actually be built, in spite of the limited (≪10⁵) maximal number of compounds ("frame set") usable for fitting the GTM manifold. An evolutionary map-building strategy has been updated with a "coverage check" step, which discards manifolds that fail to accommodate compounds outside the frame set. The evolved map has a good propensity to separate actives from inactives for more than 20 external structure-activity sets. It was proven to properly accommodate the entire collection of 40 million compounds. Next, it served as a library comparison tool to highlight biases of real-world molecules (PubChem and ChEMBL) versus the universe of all possible species represented by FDB-17, a fragment-like subset of GDB-17 containing 10 million molecules. Specific patterns, proper to some libraries and absent from others (diversity holes), were highlighted.
Affiliation(s)
- Arkadii Lin
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
- Dragos Horvath
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
- Valentina Afonina
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
- Laboratory of Chemoinformatics and Molecular Modeling, Department of Organic Chemistry, A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18 Kremlyovskaya str., 420008, Kazan, Russia
- Gilles Marcou
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Berne, 3 Freiestrasse, 3012, Berne, Switzerland
- Alexandre Varnek
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
48
Kaur P, Sharma M, Mittal M. Big Data and Machine Learning Based Secure Healthcare Framework. Procedia Comput Sci 2018. [DOI: 10.1016/j.procs.2018.05.020] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Indexed: 12/11/2022]
49
Kaur D, Mathew S, Nair CGS, Begum A, Jainanarayan AK, Sharma M, Brahmachari SK. Structure based drug discovery for designing leads for the non-toxic metabolic targets in multi drug resistant Mycobacterium tuberculosis. J Transl Med 2017; 15:261. [PMID: 29268770 PMCID: PMC5740895 DOI: 10.1186/s12967-017-1363-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/06/2017] [Accepted: 12/08/2017] [Indexed: 01/09/2023] [Open Access]
Abstract
Background: The problem of drug resistance and bacterial persistence in tuberculosis is a cause of global alarm. Although the UN’s Sustainable Development Goals for 2030 target a TB-free world, a treatment gap remains and only a few new drug candidates are in the pipeline. Despite the large amount of information available, from medicinal chemistry to ‘omics’ data, there has been little effort from pharmaceutical companies to build pipelines for the development of novel drug candidates against multidrug-resistant Mycobacterium tuberculosis.
Methods: In the present study, we describe an integrated methodology that uses systems-level information to optimize ligand selection and lower failure rates at the preclinical and clinical levels. Metabolic targets (Rv2763c, Rv3247c, Rv1094, Rv3607c, Rv3048c, Rv2965c, Rv2361c, Rv0865, Rv0321, Rv0098, Rv0390, Rv3588c, Rv2244, Rv2465c and Rv2607) in M. tuberculosis, identified in our previous systems biology and data-intensive genome-level analysis, were used to design potential lead molecules that are likely to be non-toxic. Various in silico drug discovery tools were used to generate small-molecule leads for each of the 15 targets with available crystal structures.
Results: The study identified 20 novel lead molecules, including 4 FDA-approved drugs (droxidopa, tetroxoprim, domperidone and nemonapride) that can be taken forward for drug repurposing. This comprehensive integrated methodology, combining experimental and in silico approaches, has the potential to tackle not only the MDR form of Mtb but also the most important persister population of the bacterium, and to reduce failures in TB drug discovery.
Conclusion: We propose an integrated approach of systems and structural biology for identifying targets that addresses the high attrition rate in lead identification and drug development. We expect that this system-level analysis will also be applicable to identifying drug candidates in other pathogenic organisms.
Affiliation(s)
- Divneet Kaur
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Shalu Mathew
- Centre for Open Innovation-Indian Centre for Social Transformation, Bengaluru, Karnataka, India
- Chinchu G S Nair
- Centre for Open Innovation-Indian Centre for Social Transformation, Bengaluru, Karnataka, India
- Azitha Begum
- Centre for Open Innovation-Indian Centre for Social Transformation, Bengaluru, Karnataka, India
- Ashwin K Jainanarayan
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Indian Institute of Science Education and Research (IISER), Mohali, India
- Mukta Sharma
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Samir K Brahmachari
- CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Centre for Open Innovation-Indian Centre for Social Transformation, Bengaluru, Karnataka, India
- Academy of Scientific and Innovative Research, New Delhi, India
- CSIR-Open Source Drug Discovery Unit, New Delhi, India
50
Richter L. Topliss Batchwise Schemes Reviewed in the Era of Open Data Reveal Significant Differences between Enzymes and Membrane Receptors. J Chem Inf Model 2017; 57:2575-2583. [PMID: 28934851 DOI: 10.1021/acs.jcim.7b00195] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
In 1977, John G. Topliss introduced the Topliss Batchwise Scheme, a straightforward nonmathematical procedure to assist medicinal chemists in optimizing the substitution pattern of a phenyl ring. Despite its long period of application, a thorough validation of this method has been missing so far. Here, we address this issue by gathering 129 congeneric series from the ChEMBL database, suitable to retrospectively assess the approach. Frequency analysis of Topliss' schemes showed that the π, Es, σ, and -σ scheme occurred in 17, 20, 6, and 4 congeneric series, respectively. We observed a significant difference of π scheme frequency in enzymes versus membrane receptors, with 12 versus only 2 occurrences. Validation of Topliss schemes in potency optimization showed a remarkable performance increase after restricting the data set to analogue series tested solely against enzymes. In this setting, the Es and the π scheme were successful in 50% and 56% of the analogue series, respectively.
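The core idea of the batchwise scheme can be sketched as follows: rank a first batch of phenyl analogues (H, 4-Cl, 3,4-Cl2, 4-CH3 and 4-OCH3) by potency and compare that order with the order each physicochemical parameter would predict. Topliss compared orders by inspection; the Spearman rank correlation used here is a substitution made for automation. Only the π, σ and −σ orders are represented (the Es scheme and the mixed schemes are omitted), the substituent constants are approximate textbook values with additive estimates for 3,4-Cl2 that should be checked against a reference table, and the potency values are hypothetical.

```python
# Approximate textbook substituent constants for the five first-batch analogues.
# The 3,4-Cl2 values are additive approximations (2 x pi(Cl); sigma_m + sigma_p).
PI = {"H": 0.00, "4-Cl": 0.71, "3,4-Cl2": 1.42, "4-CH3": 0.56, "4-OCH3": -0.02}
SIGMA = {"H": 0.00, "4-Cl": 0.23, "3,4-Cl2": 0.60, "4-CH3": -0.17, "4-OCH3": -0.27}

def ranks(values: dict[str, float]) -> dict[str, int]:
    """Rank keys by ascending value (1 = lowest); assumes no ties."""
    ordered = sorted(values, key=values.get)
    return {key: i + 1 for i, key in enumerate(ordered)}

def spearman(a: dict[str, float], b: dict[str, float]) -> float:
    """Spearman rank correlation for two tie-free mappings over the same keys."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in ra)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def best_scheme(potency: dict[str, float]) -> tuple[str, float]:
    """Return the parameter whose predicted order best matches the observed potency order."""
    schemes = {
        "pi": PI,
        "sigma": SIGMA,
        "-sigma": {k: -v for k, v in SIGMA.items()},  # potency rising as sigma falls
    }
    scored = {name: spearman(potency, params) for name, params in schemes.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]

# Hypothetical pIC50 values for one analogue series.
observed = {"H": 5.1, "4-Cl": 5.9, "3,4-Cl2": 6.4, "4-CH3": 5.6, "4-OCH3": 5.0}
print(best_scheme(observed))
```

Once the best-matching parameter is identified, the scheme suggests which substituents to make next; a retrospective validation like the one described above then asks how often that suggestion actually led to more potent analogues.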
Affiliation(s)
- Lars Richter
- Pharmacoinformatics Research Group, Department of Pharmaceutical Chemistry, University of Vienna, Althanstrasse 14, 1090 Vienna, Austria