1
Stefan SM, Pahnke J, Namasivayam V. HD_BPMDS: a curated binary pattern multitarget dataset of Huntington's disease-targeting agents. J Cheminform 2023; 15:109. [PMID: 37978560 PMCID: PMC10655317 DOI: 10.1186/s13321-023-00775-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/21/2023] [Accepted: 10/25/2023] [Indexed: 11/19/2023]
Abstract
The discovery of both distinctive lead molecules and novel drug targets is a great challenge in drug discovery, which particularly accounts for orphan diseases. Huntington's disease (HD) is an orphan, neurodegenerative disease of which the pathology is well-described. However, its pathophysiological background and molecular mechanisms are poorly understood. To date, only 2 drugs have been approved on the US and European markets, both of which address symptomatic aspects of this disease only. Although several hundreds of agents were described with efficacy against the HD phenotype in in vitro and/or in vivo models, a successful translation into clinical use is rarely achieved. Two major impediments are, first, the lack of awareness and understanding of the interactome (the sum of key proteins, cascades, and mediators) that contributes to HD initiation and progression; and second, the translation of the little gained knowledge into useful model systems. To counteract this lack of data awareness, we manually compiled and curated the entire modulator landscape of successfully evaluated pre-clinical small-molecule HD-targeting agents which are annotated with substructural molecular patterns, physicochemical properties, as well as drug targets, and which were linked to benchmark databases such as PubChem, ChEMBL, or UniProt. Particularly, the annotation with substructural molecular patterns expressed as binary code allowed for the generation of target-specific and -unspecific fingerprints which could be used to determine the (poly)pharmacological profile of molecular-structurally distinct molecules.
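The binary pattern idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the pattern names are hypothetical placeholders rather than the dataset's actual pattern list, and the Tanimoto comparison and the per-target frequency profile are common cheminformatics conventions, not methods confirmed by the abstract.

```python
from typing import List, Set

# Hypothetical substructure patterns; the real HD_BPMDS pattern list is not given here.
PATTERNS: List[str] = ["phenyl", "amide", "indole", "carboxylic_acid", "sulfonamide"]


def encode(present: Set[str]) -> List[int]:
    """Encode a compound as a binary vector: 1 if the pattern is present, else 0."""
    return [1 if p in present else 0 for p in PATTERNS]


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto similarity of two binary vectors (assumed metric, not from the source)."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


def target_fingerprint(rows: List[List[int]]) -> List[float]:
    """Per-pattern frequency among the compounds annotated to one target."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


compound_a = encode({"phenyl", "amide"})
compound_b = encode({"phenyl", "amide", "indole"})
similarity = tanimoto(compound_a, compound_b)
profile = target_fingerprint([compound_a, compound_b])
```

In this sketch, a pattern with a frequency near 1.0 in a target's profile would be a candidate target-specific feature, while a pattern that is frequent across all targets would be unspecific.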
Affiliation(s)
- Sven Marcel Stefan
- Drug Development and Chemical Biology, Lübeck Institute of Experimental Dermatology (LIED), University of Lübeck and University Medical Center Schleswig-Holstein, Ratzeburger Allee 160, 23538, Lübeck, Germany
- Department of Pathology, Section of Neuropathology, Translational Neurodegeneration Research and Neuropathology Lab, University of Oslo and Oslo University Hospital, Sognsvannsveien 20, 0372, Oslo, Norway
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW, 2006, Australia
- Jens Pahnke
- Drug Development and Chemical Biology, Lübeck Institute of Experimental Dermatology (LIED), University of Lübeck and University Medical Center Schleswig-Holstein, Ratzeburger Allee 160, 23538, Lübeck, Germany
- Department of Pathology, Section of Neuropathology, Translational Neurodegeneration Research and Neuropathology Lab, University of Oslo and Oslo University Hospital, Sognsvannsveien 20, 0372, Oslo, Norway
- Department of Pharmacology, Faculty of Medicine, University of Latvia, Jelgavas Iela 4, Rīga, 1004, Latvia
- Department of Neurobiology, The Georg S. Wise Faculty of Life Sciences, Tel Aviv University, 6997801, Tel Aviv, Israel
- Vigneshwaran Namasivayam
- Drug Development and Chemical Biology, Lübeck Institute of Experimental Dermatology (LIED), University of Lübeck and University Medical Center Schleswig-Holstein, Ratzeburger Allee 160, 23538, Lübeck, Germany.
- Department of Pharmaceutical and Cellbiological Chemistry, Pharmaceutical Institute, University of Bonn, An Der Immenburg 4, 53121, Bonn, Germany.
2
Southan C. Opening up connectivity between documents, structures and bioactivity. Beilstein J Org Chem 2020; 16:596-606. [PMID: 32280387 PMCID: PMC7136548 DOI: 10.3762/bjoc.16.54] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/22/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022]
Abstract
Bioscientists reading papers or patents strive to discern the key relationships reported within a document "D" where a bioactivity "A" with a quantitative result "R" (e.g., an IC50) is reported for chemical structure "C" that modulates (e.g., inhibits) a protein target "P". A useful shorthand for this connectivity thus becomes DARCP. The problem at the core of this article is that the community has spent millions effectively burying these relationships in PDFs over many decades but must now spend millions more trying to get them back out. The key imperative for this is to increase the flow into structured open databases. The positive impacts will include expanded data mining opportunities for drug discovery and chemical biology. Over the last decade commercial sources have manually extracted DARCP from ≈300,000 documents encompassing ≈7 million compounds interacting with ≈10,000 targets. Over a similar time, the Guide to Pharmacology, BindingDB and ChEMBL have carried out analogous DARCP extractions. Although their expert-curated numbers are lower (i.e., ≈2 million compounds against ≈3700 human proteins), these open sources have the great advantage of being merged within PubChem. Parallel efforts have focused on the extraction of document-to-compound (D-C-only) connectivity. In the absence of molecular mechanism of action (mmoa) annotation, this is of less value but can be automatically extracted. This has been significantly accomplished for patents (e.g., by IBM, SureChEMBL and WIPO) for over 30 million compounds in PubChem. These have recently been joined by 1.4 million D-C submissions from three major chemistry publishers. In addition, both the European and US PubMed Central portals now add chemistry look-ups from abstracts and full-text papers. However, the fully automated extraction of DARCLP has not yet been achieved. This stands in contrast to the ability of biocurators to discern these relationships in minutes.
Unfortunately, no journals have yet instigated a flow of author-specified DARCP directly into open databases. Progress may come from trends such as open science, open access (OA), findable, accessible, interoperable and reusable (FAIR), resource description framework (RDF) and WikiData. However, we will need to await the technical applicability in respect to DARCP capture to see if this opens up connectivity.
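The DARCP shorthand described in this abstract maps naturally onto a small record type. The sketch below is only an illustration under stated assumptions: the class name, field names, helper function, and all identifier strings are hypothetical placeholders, not a schema used by any of the databases mentioned.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class DARCP:
    """One curated relationship; all field names are illustrative assumptions."""
    document: str  # D: the paper or patent reporting the data
    assay: str     # A: the bioactivity type, e.g. inhibition
    result: str    # R: the quantitative result, e.g. an IC50 value
    compound: str  # C: the chemical structure identifier
    protein: str   # P: the protein target identifier


def compounds_by_target(records: Iterable[DARCP]) -> Dict[str, Set[str]]:
    """Index compounds by the protein target they are reported to modulate."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        index[record.protein].add(record.compound)
    return dict(index)


# Placeholder identifiers, invented purely for the example.
records = [
    DARCP("DOC-1", "inhibition", "IC50 = 12 nM", "CMPD-1", "PROT-1"),
    DARCP("DOC-1", "inhibition", "IC50 = 340 nM", "CMPD-2", "PROT-1"),
    DARCP("DOC-2", "inhibition", "IC50 = 5 uM", "CMPD-1", "PROT-2"),
]
by_target = compounds_by_target(records)
```

A document-to-compound (D-C-only) link, by contrast, would carry only the first and fourth fields, which is why the abstract rates it as less valuable for mechanism-level data mining.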
Affiliation(s)
- Christopher Southan
- Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, EH8 9XD, UK; TW2Informatics Ltd, Västra Frölunda, Gothenburg, 42166, Sweden
3
Southan C, Sharman JL, Faccenda E, Pawson AJ, Harding SD, Davies JA. Challenges of Connecting Chemistry to Pharmacology: Perspectives from Curating the IUPHAR/BPS Guide to PHARMACOLOGY. ACS Omega 2018; 3:8408-8420. [PMID: 30087946 PMCID: PMC6070956 DOI: 10.1021/acsomega.8b00884] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/02/2018] [Accepted: 07/12/2018] [Indexed: 06/08/2023]
Abstract
Connecting chemistry to pharmacology has been an objective of Guide to PHARMACOLOGY (GtoPdb) and its precursor the International Union of Basic and Clinical Pharmacology Database (IUPHAR-DB) since 2003. This has been achieved by populating our database with expert-curated relationships between documents, assays, quantitative results, chemical structures, their locations within the documents, and the protein targets in the assays (D-A-R-C-P). A wide range of challenges associated with this are described in this perspective, using illustrative examples from GtoPdb entries. Our selection process begins with judgments of pharmacological relevance and scientific quality. Even though we have a stringent focus for our small-data extraction, we note that assessing the quality of papers has become more difficult over the last 15 years. We discuss ambiguity issues with the resolution of authors' descriptions of A-R-C-P entities to standardized identifiers. We also describe developments that have made this somewhat easier over the same period both in the publication ecosystem and recent enhancements of our internal processes. This perspective concludes with a look at challenges for the future, including the wider capture of mechanistic nuances and possible impacts of text mining on automated entity extraction.
4
Dashti H, Westler WM, Markley JL, Eghbalnia HR. Unique identifiers for small molecules enable rigorous labeling of their atoms. Sci Data 2017; 4:170073. [PMID: 28534867 PMCID: PMC5441290 DOI: 10.1038/sdata.2017.73] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/25/2017] [Accepted: 04/28/2017] [Indexed: 11/09/2022]
Abstract
Rigorous characterization of small organic molecules in terms of their structural and biological properties is vital to biomedical research. The three-dimensional structure of a molecule, its 'photo ID', is inefficient for searching and matching tasks. Instead, identifiers play a key role in accessing compound data. Unique and reproducible molecule and atom identifiers are required to ensure the correct cross-referencing of properties associated with compounds archived in databases. The best approach to this requirement is the International Chemical Identifier (InChI). However, the current implementation of InChI fails to provide a complete standard for atom nomenclature, and incorrect use of the InChI standard has resulted in the proliferation of non-unique identifiers. We propose a methodology and associated software tools, named ALATIS, that overcomes these shortcomings. ALATIS is an adaptation of InChI, which operates fully within the InChI convention to provide unique and reproducible molecule and all atom identifiers. ALATIS includes an InChI extension for unique atom labeling of symmetric molecules. ALATIS forms the basis for improving reproducibility and unifying cross-referencing across databases.
Affiliation(s)
- Hesam Dashti
- National Magnetic Resonance Facility at Madison, Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- William M Westler
- National Magnetic Resonance Facility at Madison, Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- John L Markley
- National Magnetic Resonance Facility at Madison, Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
- Hamid R Eghbalnia
- National Magnetic Resonance Facility at Madison, Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
5
Krallinger M, Rabal O, Lourenço A, Oyarzabal J, Valencia A. Information Retrieval and Text Mining Technologies for Chemistry. Chem Rev 2017; 117:7673-7761. [PMID: 28475312 DOI: 10.1021/acs.chemrev.6b00851] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Indexed: 01/03/2023]
Abstract
Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field.
Affiliation(s)
- Martin Krallinger
- Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, C/Melchor Fernández Almagro 3, Madrid E-28029, Spain
- Obdulia Rabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Anália Lourenço
- ESEI - Department of Computer Science, University of Vigo, Edificio Politécnico, Campus Universitario As Lagoas s/n, Ourense E-32004, Spain; Centro de Investigaciones Biomédicas (Centro Singular de Investigación de Galicia), Campus Universitario Lagoas-Marcosende, Vigo E-36310, Spain; CEB-Centre of Biological Engineering, University of Minho, Campus de Gualtar, Braga 4710-057, Portugal
- Julen Oyarzabal
- Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Avenida Pio XII 55, Pamplona E-31008, Spain
- Alfonso Valencia
- Life Science Department, Barcelona Supercomputing Centre (BSC-CNS), C/Jordi Girona, 29-31, Barcelona E-08034, Spain; Joint BSC-IRB-CRG Program in Computational Biology, Parc Científic de Barcelona, C/ Baldiri Reixac 10, Barcelona E-08028, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig de Lluís Companys 23, Barcelona E-08010, Spain
6
Abstract
Nonspecific bioactivity and assay artifacts have gained increasing attention in recent years. This focus has arisen primarily from the publication of a set of chemical substructures, termed pan assay interference compounds (PAINS), which are associated with promiscuous bioactivity and assay interference in real and virtual high-throughput screening (HTS) campaigns. Despite an increasing awareness in the HTS and medicinal chemistry communities about the liabilities of these compounds, articles featuring PAINS and PAINS-like compounds are still being published. In this perspective, we describe some of the factors we believe are driving this resource-sapping trend. We also provide what we hope are helpful insights that may lead to the earlier recognition of these generally nontranslatable compounds, thus preventing the propagation of PAINS-full costly research.
Affiliation(s)
- Jayme L Dahlin
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic College of Medicine, Rochester, Minnesota; Medical Scientist Training Program, Mayo Clinic College of Medicine, Rochester, Minnesota
- Michael A Walters
- Institute for Therapeutics Discovery and Development, University of Minnesota, Minneapolis, Minnesota
7
Warr WA. Many InChIs and quite some feat. J Comput Aided Mol Des 2015; 29:681-94. [PMID: 26081259 DOI: 10.1007/s10822-015-9854-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/28/2015] [Accepted: 06/10/2015] [Indexed: 12/14/2022]
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, Holmes Chapel, Crewe, Cheshire, CW4 7HZ, UK
8
Abstract
It is increasingly clear that academic high-throughput screening (HTS) and virtual HTS triage suffers from a lack of scientists trained in the art and science of early drug discovery chemistry. Many recent publications report the discovery of compounds by screening that are most likely artifacts or promiscuous bioactive compounds, and these results are not placed into the context of previous studies. For HTS to be most successful, it is our contention that there must exist an early partnership between biologists and medicinal chemists. Their combined skill sets are necessary to design robust assays and efficient workflows that will weed out assay artifacts, false positives, promiscuous bioactive compounds and intractable screening hits, efforts that ultimately give projects a better chance at identifying truly useful chemical matter. Expertise in medicinal chemistry, cheminformatics and purification sciences (analytical chemistry) can enhance the post-HTS triage process by quickly removing these problematic chemotypes from consideration, while simultaneously prioritizing the more promising chemical matter for follow-up testing. It is only when biologists and chemists collaborate effectively that HTS can manifest its full promise.
9
Lipinski CA, Litterman NK, Southan C, Williams AJ, Clark AM, Ekins S. Parallel worlds of public and commercial bioactive chemistry data. J Med Chem 2014; 58:2068-76. [PMID: 25415348 PMCID: PMC4360371 DOI: 10.1021/jm5011308] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 12/27/2022]
Abstract
The availability of structures and linked bioactivity data in databases is powerfully enabling for drug discovery and chemical biology. However, we now review some confounding issues with the divergent expansions of public and commercial sources of chemical structures. These are associated with not only expanding patent extraction but also increasingly large vendor collections amassed via different selection criteria between SciFinder from Chemical Abstracts Service (CAS) and major public sources such as PubChem, ChemSpider, UniChem, and others. These increasingly massive collections may include both real and virtual compounds, as well as so-called prophetic compounds from patents. We address a range of issues raised by the challenges faced resolving the NIH probe compounds. In addition we highlight the confounding of prior-art searching by virtual compounds that could impact the composition of matter patentability of a new medicinal chemistry lead. Finally, we propose some potential solutions.
Affiliation(s)
- Christopher A Lipinski
- Christopher A. Lipinski, Ph.D., LLC, 10 Connshire Drive, Waterford, Connecticut 06385-4122, United States