1
Qiu X, Wang H, Tan X, Fang Z. G-K BertDTA: A graph representation learning and semantic embedding-based framework for drug-target affinity prediction. Comput Biol Med 2024; 173:108376. [PMID: 38552281 DOI: 10.1016/j.compbiomed.2024.108376] [Received: 12/21/2023] [Revised: 03/21/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]
Abstract
Developing new drugs is costly, time-consuming, and risky. Drug-target affinity (DTA), which indicates the binding capability between drugs and target proteins, is a crucial indicator in drug development. Accurately predicting the interaction strength of new drug-target pairs by analyzing previous experiments aids in screening and repurposing potential drug molecules and in developing safe and effective medicines. Existing computational models for DTA prediction rely on strings or single-graph neural networks and do not consider protein structure or molecular semantic information, which limits their accuracy. Our experiments demonstrate that string-based methods may overlook protein conformations, causing a high root mean square error (RMSE) of 3.584 in affinity due to a lack of spatial context. Single-graph networks also underperform on topology features, with a 6% lower confidence interval (CI) for activity classification. The absence of semantic information also limits generalization across diverse compounds, resulting in an 18% increase in RMSE and a 5% increase in misclassifications in a quantification study, which restricts potential drug discovery. To address these limitations, we propose G-K BertDTA, a novel framework for accurate DTA prediction that incorporates protein features, molecular semantic features, and molecular structural information. In the proposed model, drugs are represented as graphs, and a graph isomorphism network (GIN) is employed to learn molecular topological information. Protein structural features are extracted with a DenseNet architecture. A knowledge-based BERT semantic model is incorporated to obtain rich pre-trained semantic embeddings, thereby enhancing the feature information. We extensively evaluated the proposed approach on publicly available benchmark datasets (KIBA and Davis), and the experimental results demonstrate promising performance: our method consistently outperforms previous state-of-the-art approaches.
Code is available at https://github.com/AmbitYuki/G-K-BertDTA.
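The abstract reports performance as RMSE and CI. The abstract expands CI as "confidence interval", but in the KIBA/Davis DTA benchmark literature CI conventionally denotes the concordance index, a pairwise ranking measure. The sketch below assumes that reading. It is a minimal, quadratic-time illustration of both metrics and is not taken from the G-K BertDTA repository.

```python
import math
from itertools import combinations


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted affinities."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with tied true affinities are not comparable and are skipped;
    a tie in the predictions for a comparable pair counts as half-concordant.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue
        comparable += 1
        # Orient the pair so that `hi` has the larger measured affinity.
        hi, lo = (i, j) if y_true[i] > y_true[j] else (j, i)
        if y_pred[hi] > y_pred[lo]:
            concordant += 1.0
        elif y_pred[hi] == y_pred[lo]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For hypothetical measured affinities `[5.0, 6.2, 7.1, 8.4]` and predictions `[5.3, 6.0, 7.5, 8.0]`, the predicted ranking is perfect, so `concordance_index` returns 1.0 while `rmse` is about 0.335. This shows why the two metrics are usually reported together: one scores ranking and the other scores absolute error.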
Affiliation(s)
- Xihe Qiu
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
- Haoyu Wang
- School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai, China
- Xiaoyu Tan
- INF Technology (Shanghai) Co., Ltd., Shanghai, China
- Zhijun Fang
- School of Computer Science and Technology, Donghua University, Shanghai, China
2
Barakat A, Munro G, Heegaard AM. Finding new analgesics: Computational pharmacology faces drug discovery challenges. Biochem Pharmacol 2024; 222:116091. [PMID: 38412924 DOI: 10.1016/j.bcp.2024.116091] [Received: 10/02/2023] [Revised: 01/10/2024] [Accepted: 02/23/2024] [Indexed: 02/29/2024]
Abstract
Despite its worldwide prevalence and huge burden, pain remains undertreated. Currently used analgesics have several limitations regarding their efficacy and safety. The discovery of analgesics with a novel mechanism of action has faced multiple challenges, including a limited understanding of the biological processes underpinning pain and analgesia and poor animal-to-human translation. Computational pharmacology is currently employed to address these challenges. In this review, we discuss the theory, methods, and applications of computational pharmacology in pain research. Computational pharmacology encompasses a wide variety of theoretical concepts and practical methodological approaches, with the overall aim of gaining biological insight through data acquisition and analysis. Data are acquired from patients or animal models with pain or under analgesic treatment, at different levels of biological organization (molecular, cellular, physiological, and behavioral). Distinct methodological algorithms can then be used to analyze and integrate the data. This helps to identify biological molecules and processes associated with the pain phenotype, build quantitative models of pain signaling, and extract translatable features between humans and animals. However, computational pharmacology has several limitations, and its predictions can yield false-positive and false-negative findings. Therefore, computational predictions must be validated experimentally before solid conclusions are drawn. In this review, we discuss several case studies that combine and integrate computational tools with experimental pain research tools to meet drug discovery challenges.
Affiliation(s)
- Ahmed Barakat
- Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Department of Pharmacology and Toxicology, Faculty of Pharmacy, Assiut University, Assiut, Egypt.
- Anne-Marie Heegaard
- Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
3
Guo L, Kong D, Liu J, Luo L, Zheng W, Chen C, Sun S. Searching for Essential Genes and Targeted Drugs Common to Breast Cancer and Osteoarthritis. Comb Chem High Throughput Screen 2024; 27:238-255. [PMID: 37157194 DOI: 10.2174/1386207326666230508113036] [Received: 08/29/2022] [Revised: 03/07/2023] [Accepted: 03/17/2023] [Indexed: 05/10/2023]
Abstract
BACKGROUND It is documented that osteoarthritis can promote the progression of breast cancer (BC). OBJECTIVE This study aims to search for the essential genes associated with breast cancer (BC) and osteoarthritis (OA), explore the relationship between epithelial-mesenchymal transition (EMT)-related genes and the two diseases, and identify candidate drugs. METHODS Genes related to both BC and OA were determined by text mining. Protein-protein interaction (PPI) analysis was carried out, and the resulting genes were found to be related to EMT. The PPIs and mRNA correlations of these genes were also analyzed. Several kinds of enrichment analyses were performed on these genes. A prognostic analysis was performed, and the expression levels of these genes were examined at different pathological stages, in different tissues, and in different immune cells. A drug-gene interaction database was used for potential drug discovery. RESULTS A total of 1422 genes were identified as common to BC and OA, and 58 genes were found to be related to EMT. HDAC2 and TGFBR1 were significantly associated with poor overall survival. High expression of HDAC2 plays a vital role in the progression of pathological stage. Four immune cell types might play a role in this process. Fifty-seven drugs were identified that could potentially have therapeutic effects. CONCLUSION EMT may be one of the mechanisms by which OA affects BC. The identified drugs may have therapeutic effects that benefit patients with both diseases and broaden the indications for drug use.
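The abstract mentions "different kinds of enrichment analyses" without naming a method. A common choice is over-representation analysis with a one-sided hypergeometric test: given a background of N genes, K of which belong to a term (pathway or GO category), it asks how surprising it is that k of the n query genes fall in that term. The sketch below assumes that test and uses hypothetical gene identifiers. It is not necessarily what the authors ran, and it omits multiple-testing correction.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability of every overlap size from k up to the maximum possible.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrichment(query, term_genes, background):
    """Overlap size and one-sided over-representation p-value of a term in a query list."""
    background = set(background)
    query = set(query) & background
    term = set(term_genes) & background
    overlap = len(query & term)
    return overlap, hypergeom_sf(overlap, len(background), len(term), len(query))
```

For example, with a hypothetical background of 10 genes, a term containing 5 of them, and a query list equal to that term, `enrichment` returns an overlap of 5 and a p-value of 1/252. In a real analysis this would be repeated for every term and the p-values adjusted, for example by Benjamini-Hochberg.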
Affiliation(s)
- Liantao Guo
- Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, No. 238 Jiefang Road, Wuhan, Hubei 430060, People's Republic of China
- Deguang Kong
- Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, No. 238 Jiefang Road, Wuhan, Hubei 430060, People's Republic of China
- Jianhua Liu
- Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, No. 238 Jiefang Road, Wuhan, Hubei 430060, People's Republic of China
- Lan Luo
- Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, No. 238 Jiefang Road, Wuhan, Hubei 430060, People's Republic of China
- Weijie Zheng
- Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, No. 238 Jiefang Road, Wuhan, Hubei 430060, People's Republic of China
- Chuang Chen
- Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, No. 238 Jiefang Road, Wuhan, Hubei 430060, People's Republic of China
- Shengrong Sun
- Department of Breast and Thyroid Surgery, Renmin Hospital of Wuhan University, No. 238 Jiefang Road, Wuhan, Hubei 430060, People's Republic of China
4
Fuenteslópez CV, McKitrick A, Corvi J, Ginebra MP, Hakimi O. Biomaterials text mining: A hands-on comparative study of methods on polydioxanone biocompatibility. N Biotechnol 2023; 77:161-175. [PMID: 37673372 DOI: 10.1016/j.nbt.2023.09.001] [Received: 06/01/2023] [Revised: 08/14/2023] [Accepted: 09/02/2023] [Indexed: 09/08/2023]
Abstract
Scientific information extraction is fundamental for research and innovation, but it is currently a mostly manual, time-consuming process. Text mining tools (TMTs) enable automated, accurate, and quick information extraction from text, but there is little precedent for their use in the biomaterials field. Here, we compare the ability of various TMTs to extract useful information from biomaterials abstracts. Focusing on the biocompatibility of polydioxanone, a biodegradable polymer for which there are relatively few scientific publications, we tested several tools ranging from machine learning approaches and statistical text analysis to MeSH indexing and domain-specific semantic tools for named entity recognition. We also evaluated their output alongside a manual review of systematic reviews and meta-analyses. The findings show that TMTs can be highly efficient and powerful for mapping biomaterials texts and can rapidly yield up-to-date information. Here, TMTs enable one to identify dominant themes, follow the evolution of specific terms and topics, and learn about key medical applications in the biomaterials literature over the years. The analysis also shows that ambiguity in biomaterials nomenclature is a significant challenge in mining the biomedical literature that has yet to be tackled. This research showcases the potential value of using natural language processing and domain-specific tools to extract and organize biomaterials data.
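One of the analyses described above, following how often a term appears across publication years, can be sketched with plain statistical text counting. The abstract does not say how the tools matched terms. This minimal sketch assumes (year, abstract) pairs and a hand-written synonym list, which also illustrates the nomenclature problem: "polydioxanone" and its abbreviation must be listed explicitly. The synonyms and sample texts are hypothetical.

```python
import re
from collections import Counter


def term_trend(abstracts, synonyms):
    """Count, per publication year, the abstracts that mention any synonym.

    `abstracts` is an iterable of (year, text) pairs. Matching is
    case-insensitive and rejects hits touched by a word character, so
    "PDO" matches "PDO threads" but not "PDOs".
    """
    alternatives = "|".join(re.escape(s) for s in synonyms)
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
    counts = Counter()
    for year, text in abstracts:
        if pattern.search(text):
            counts[year] += 1  # each abstract counts once, however many hits
    return dict(sorted(counts.items()))
```

With the hypothetical synonym list `["polydioxanone", "PDO"]`, an abstract that only mentions "PDOs" is not counted. Whether that is desirable depends on the corpus, which is exactly the kind of ambiguity the study reports.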
Affiliation(s)
- Carla V Fuenteslópez
- Institute of Biomedical Engineering, Botnar Research Centre, Nuffield Orthopaedic Centre, University of Oxford, Oxford OX3 7LD, UK.
- Austin McKitrick
- Institute of Social Research, University of Michigan, MI 48104, USA
- Javier Corvi
- Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
- Maria-Pau Ginebra
- Department of Materials Science and Engineering, Universitat Politècnica de Catalunya, Barcelona 08019, Spain
- Osnat Hakimi
- Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain; Department of Materials Science and Engineering, Universitat Politècnica de Catalunya, Barcelona 08019, Spain; Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Barcelona 08017, Spain.
5
Huang Q, Zhang H, Zhang L, Xu B. Bacterial microbiota in different types of processed meat products: diversity, adaptation, and co-occurrence. Crit Rev Food Sci Nutr 2023:1-16. [PMID: 37905560 DOI: 10.1080/10408398.2023.2272770] [Indexed: 11/02/2023]
Abstract
Bacterial microbes are a double-edged sword: some can improve the quality and shelf life of meat products, while others are mainly responsible for deterioration of their safety and quality. This review aims to present a landscape of the bacterial microbiota in different types of processed meat products. After a panoramic view of the bacterial genera in meat products is presented, the diversity of the bacterial microbiota is evaluated in two dimensions, namely different types of processed meat products and different meats. Then, the influence of environmental factors on bacterial communities is evaluated according to storage temperature, packaging conditions, and sterilization methods. Furthermore, microbes do not live independently; to explore interactions among genera, co-occurrence patterns were examined. In these respects, this review highlights recent advances in the fundamental principles that underlie environmental adaptation strategies and explain why some species frequently occur together, such as metabolic cross-feeding, co-aggregation at the microscale, and intercellular signaling systems. Further investigations are required to unveil the molecular mechanisms that govern microbial community systems, ultimately contributing to new strategies to harness beneficial microorganisms and control harmful ones.
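The abstract does not say which statistic underlies the co-occurrence patterns it reviews. Abundance-correlation methods are frequently used in microbiome work. The sketch below swaps in a simpler presence/absence approach instead: each pair of genera is scored by the Jaccard index of the samples in which they are detected, and pairs above a threshold become co-occurrence edges. Sample IDs, genus names, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def cooccurrence(samples, min_jaccard=0.5):
    """Score genus pairs by how often they are detected together.

    `samples` maps a sample ID to the set of genera detected in it.
    Returns (genus_a, genus_b, jaccard) for pairs at or above `min_jaccard`,
    where jaccard = |samples with both| / |samples with either|.
    """
    occurs = {}
    for sample_id, genera in samples.items():
        for genus in genera:
            occurs.setdefault(genus, set()).add(sample_id)
    edges = []
    for a, b in combinations(sorted(occurs), 2):
        both = len(occurs[a] & occurs[b])
        either = len(occurs[a] | occurs[b])  # never 0: each genus occurs somewhere
        score = both / either
        if score >= min_jaccard:
            edges.append((a, b, score))
    return edges
```

Presence/absence scoring ignores abundance, so it cannot separate a dominant genus from a trace one. The correlation-based methods used in many of the reviewed studies address this at the cost of needing quantitative data.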
Affiliation(s)
- Qianli Huang
- Engineering Research Center of Bio-process, Ministry of Education, Hefei University of Technology, Hefei, China
- School of Food and Biological Engineering, Hefei University of Technology, Hefei, China
- Huijuan Zhang
- Engineering Research Center of Bio-process, Ministry of Education, Hefei University of Technology, Hefei, China
- School of Food and Biological Engineering, Hefei University of Technology, Hefei, China
- Li Zhang
- Engineering Research Center of Bio-process, Ministry of Education, Hefei University of Technology, Hefei, China
- School of Food and Biological Engineering, Hefei University of Technology, Hefei, China
- Baocai Xu
- Engineering Research Center of Bio-process, Ministry of Education, Hefei University of Technology, Hefei, China
- School of Food and Biological Engineering, Hefei University of Technology, Hefei, China
6
Evaluation of the extraction of methodological study characteristics with JATSdecoder. Sci Rep 2023; 13:139. [PMID: 36599903 DOI: 10.1038/s41598-022-27085-y] [Received: 12/20/2021] [Accepted: 12/26/2022] [Indexed: 01/06/2023]
Abstract
This paper introduces and evaluates the study.character module from the JATSdecoder package, which extracts several key methodological study characteristics from NISO-JATS-coded scientific articles. study.character splits the text into sections and applies its heuristic-driven extraction procedures to the text of the methods and results section(s). When used individually, study.character's functions can also be applied to any textual input. An externally coded data set of 288 PDF articles serves as an indicator of study.character's capabilities in extracting the number of sub-studies reported per article, the statistical methods applied, and the software solutions used. Its precision in extracting the reported α-level, power, correction procedures for multiple testing, use of interactions, definition of outliers, and mentions of statistical assumptions is evaluated by comparison with a manually curated data set of the same collection of articles. Sensitivity, specificity, and accuracy measures are reported for each of the evaluated functions. study.character reliably extracts the targeted methodological study characteristics from psychological research articles. Most extractions have very low false-positive rates and high accuracy ([Formula: see text]). Most non-detections are due to PDF-specific conversion errors and complex text structures that cannot yet be handled. study.character can be applied to large text resources to examine methodological trends over time, by journal, and/or by topic. It also enables a new way of identifying study sets for meta-analyses and systematic reviews.
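The evaluation above compares tool output against a manually curated data set and reports sensitivity, specificity, and accuracy. The sketch below shows how those three measures follow from a 2x2 table. It assumes one boolean per article, for example the hypothetical question "does the article report a power analysis?". It is written in Python for illustration and does not reflect JATSdecoder's own code or API.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # extracted by the tool and present in the manual coding
    fp: int  # extracted by the tool but absent in the manual coding
    fn: int  # present in the manual coding but missed by the tool
    tn: int  # absent in both

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)  # share of true mentions found

    def specificity(self):
        return self.tn / (self.tn + self.fp)  # share of true absences kept clean

    def accuracy(self):
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


def confusion_from_labels(extracted, manual):
    """Tally a 2x2 table from paired booleans (tool output vs. manual coding)."""
    c = Confusion(0, 0, 0, 0)
    for e, m in zip(extracted, manual):
        if e and m:
            c.tp += 1
        elif e:
            c.fp += 1
        elif m:
            c.fn += 1
        else:
            c.tn += 1
    return c
```

A characteristic that is rare in the corpus can have high accuracy and still low sensitivity, which is why the three measures are reported side by side. Note that sensitivity or specificity is undefined (division by zero) when the manual coding contains no positives or no negatives.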
7
Weber L, Sänger M, Garda S, Barth F, Alt C, Leser U. Chemical-protein relation extraction with ensembles of carefully tuned pretrained language models. Database (Oxford) 2022; 2022:6833204. [PMID: 36399413 PMCID: PMC9674024 DOI: 10.1093/database/baac098] [Received: 03/10/2022] [Revised: 10/18/2022] [Accepted: 10/21/2022] [Indexed: 11/19/2022]
Abstract
The identification of chemical-protein interactions described in the literature is an important task with applications in drug design, precision medicine and biotechnology. Manual extraction of such relationships from the biomedical literature is costly and often prohibitively time-consuming. The BioCreative VII DrugProt shared task provides a benchmark for methods for the automated extraction of chemical-protein relations from scientific text. Here we describe our contribution to the shared task and report on the achieved results. We define the task as a relation classification problem, which we approach with pretrained transformer language models. Upon this basic architecture, we experiment with utilizing textual and embedded side information from knowledge bases as well as additional training data to improve extraction performance. We perform a comprehensive evaluation of the proposed model and the individual extensions including an extensive hyperparameter search leading to 2647 different runs. We find that ensembling and choosing the right pretrained language model are crucial for optimal performance, whereas adding additional data and embedded side information did not improve results. Our best model is based on an ensemble of 10 pretrained transformers and additional textual descriptions of chemicals taken from the Comparative Toxicogenomics Database. The model reaches an F1 score of 79.73% on the hidden DrugProt test set and achieves the first rank out of 107 submitted runs in the official evaluation. Database URL: https://github.com/leonweber/drugprot.
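The best model above is an ensemble of 10 pretrained transformers, scored by F1 on the DrugProt test set. The abstract does not state how member predictions are combined. The sketch below assumes the common choice of averaging per-class probabilities and taking the argmax, and it scores the result with micro-averaged F1 that ignores a "no relation" class. The class indices (0 = no relation, other integers = relation types) and the probabilities are hypothetical.

```python
def ensemble_predict(prob_lists):
    """Average per-class probabilities across models and take the argmax.

    `prob_lists[m][i][c]` is model m's probability that example i has class c.
    """
    n_models = len(prob_lists)
    preds = []
    for i in range(len(prob_lists[0])):
        n_classes = len(prob_lists[0][i])
        avg = [sum(model[i][c] for model in prob_lists) / n_models for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def micro_f1(gold, pred, negative=0):
    """Micro-averaged F1 over relation classes, ignoring the `negative` class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != negative)
    fp = sum(1 for g, p in zip(gold, pred) if p != negative and p != g)
    fn = sum(1 for g, p in zip(gold, pred) if g != negative and p != g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


models = [  # three hypothetical models, four examples, three classes
    [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.5, 0.4, 0.1]],
    [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5], [0.3, 0.6, 0.1]],
    [[0.5, 0.4, 0.1], [0.1, 0.8, 0.1], [0.4, 0.4, 0.2], [0.4, 0.5, 0.1]],
]
```

Here `ensemble_predict(models)` yields `[0, 1, 2, 1]`. Against hypothetical gold labels `[0, 1, 2, 0]`, the last example is a false positive, giving precision 2/3, recall 1, and micro-F1 0.8. A wrong relation type counts as both a false positive and a false negative, as micro-averaging requires.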
Affiliation(s)
- Leon Weber
- Corresponding authors (Tel: +49 30 209341293)
- Mario Sänger
- Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin 10099, Germany
- Samuele Garda
- Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin 10099, Germany
- Fabio Barth
- Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin 10099, Germany
- Christoph Alt
- Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, Berlin 10099, Germany; Research Cluster of Excellence, Science of Intelligence, Marchstr. 23, Berlin 10587, Germany
- Ulf Leser
- Corresponding authors (Tel: +49 30 209341293)
8
An automatic hypothesis generation for plausible linkage between xanthium and diabetes. Sci Rep 2022; 12:17547. [PMID: 36266295 PMCID: PMC9585073 DOI: 10.1038/s41598-022-20752-0] [Received: 12/07/2021] [Accepted: 09/19/2022] [Indexed: 01/13/2023]
Abstract
There has been a significant increase in the application of text mining to the biomedical literature in recent years. Previous studies introduced text mining and literature-based discovery to generate hypotheses about potential candidates for drug development. By conducting a hypothesis-generation step and using evidence from published journal articles or proceedings, these studies managed to reduce experimental time and costs. First, we applied the closed discovery approach from Swanson's ABC model to collect publications related to 36 Xanthium compounds or diabetes. Second, we extracted biomedical entities and relations using a knowledge extraction engine, the Public Knowledge Discovery Engine for Java (PKDE4J). Third, we built a knowledge graph from the obtained bio-entities and relations and then generated paths with Xanthium compounds as source nodes and diabetes as the target node. Lastly, we employed graph embeddings to rank each path and evaluated the results against domain experts' opinions and the literature. Of the 36 Xanthium compounds, 35 had direct paths to five diabetes-related nodes. We ranked 2,740,314 paths in total between 35 Xanthium compounds and three diabetes-related phrases: type 1 diabetes, type 2 diabetes, and diabetes mellitus. Based on the top 5% of ranked paths, we concluded that adenosine, choline, beta-sitosterol, rhamnose, and scopoletin are potential candidates for diabetes drug development using natural products. Our hypothesis-generation framework employs closed discovery from Swanson's ABC model, which has proven very helpful in discovering biological linkages between bio-entities. The PKDE4J tool we used to capture bio-entities from our document collection labels entities in five categories: genes, compounds, phenotypes, biological processes, and molecular functions.
Using the BioPREP model, we were able to interpret the semantic relatedness between two nodes and provide paths containing valuable hypotheses. Finally, by using a graph-embedding algorithm in our path-ranking analysis, we exploited semantic relatedness while preserving graph structure properties.
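Swanson's closed discovery starts from a known A term (a compound) and a known C term (a disease) and looks for intermediate B terms linked to both. The sketch below reduces that step to document-level co-occurrence. The study instead extracted typed relations with PKDE4J and ranked full paths with graph embeddings, and neither is reproduced here. Entity names are hypothetical placeholders.

```python
from collections import defaultdict


def build_cooccurrence(documents):
    """Undirected co-occurrence graph: entities are linked if they share a document.

    `documents` is an iterable of sets of entity names found in each document.
    """
    graph = defaultdict(set)
    for entities in documents:
        for a in entities:
            for b in entities:
                if a != b:
                    graph[a].add(b)
    return graph


def closed_discovery(graph, a_term, c_term):
    """B terms linking A and C: neighbours of both, excluding A and C themselves."""
    b_terms = graph.get(a_term, set()) & graph.get(c_term, set())
    return sorted(b_terms - {a_term, c_term})


docs = [
    {"compound_X", "gene_B1"},
    {"gene_B1", "diabetes"},
    {"compound_X", "process_B2"},
    {"process_B2", "diabetes"},
    {"compound_X", "gene_B3"},  # gene_B3 never co-occurs with diabetes
]
```

On these hypothetical documents, `closed_discovery(build_cooccurrence(docs), "compound_X", "diabetes")` returns `["gene_B1", "process_B2"]`. Each B term corresponds to a candidate A-B-C path that a ranking step, such as the embedding-based scoring described above, would then order.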
9
Gonzalez-Hernandez G, Krallinger M, Muñoz M, Rodriguez-Esteban R, Uzuner Ö, Hirschman L. Challenges and opportunities for mining adverse drug reactions: perspectives from pharma, regulatory agencies, healthcare providers and consumers. Database (Oxford) 2022; 2022:6682867. [PMID: 36050787 PMCID: PMC9436770 DOI: 10.1093/database/baac071] [Received: 05/13/2022] [Revised: 07/08/2022] [Accepted: 08/25/2022] [Indexed: 11/17/2022]
Abstract
Monitoring drug safety is a central concern throughout the drug life cycle. Information about toxicity and adverse events is generated at every stage of this life cycle, and stakeholders have a strong interest in applying text mining and artificial intelligence (AI) methods to manage the ever-increasing volume of this information. Recognizing the importance of these applications and the role of challenge evaluations to drive progress in text mining, the organizers of BioCreative VII (Critical Assessment of Information Extraction in Biology) convened a panel of experts to explore ‘Challenges in Mining Drug Adverse Reactions’. This article is an outgrowth of the panel; each panelist has highlighted specific text mining application(s), based on their research and their experiences in organizing text mining challenge evaluations. While these highlighted applications only sample the complexity of this problem space, they reveal both opportunities and challenges for text mining to aid in the complex process of drug discovery, testing, marketing and post-market surveillance. Stakeholders are eager to embrace natural language processing and AI tools to help in this process, provided that these tools can be demonstrated to add value to stakeholder workflows. This creates an opportunity for the BioCreative community to work in partnership with regulatory agencies, pharma and the text mining community to identify next steps for future challenge evaluations.
Affiliation(s)
- Graciela Gonzalez-Hernandez
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, 700 N. San Vicente Blvd., West Hollywood, CA 90069, USA
- Martin Krallinger
- Life Sciences—Text Mining, Barcelona Supercomputing Center, Plaça Eusebi Güell, 1-3, Barcelona 08034, Spain
- Monica Muñoz
- Division of Pharmacovigilance, Office of Surveillance and Epidemiology, Center of Drug Evaluation and Research, FDA, 10903 New Hampshire Ave, Silver Spring, MD 20993, USA
- Raul Rodriguez-Esteban
- Roche Innovation Center Basel, Roche Pharmaceuticals, Grenzacherstrasse 124, Basel 4070, Switzerland
- Özlem Uzuner
- Information Sciences and Technology, George Mason University, 4400 University Dr, Fairfax, VA 22030, USA
- Lynette Hirschman
- MITRE Labs, The MITRE Corporation, 202 Burlington Rd., Bedford, MA 01730, USA
10
A Survey on Deep Networks Approaches in Prediction of Sequence-Based Protein–Protein Interactions. SN COMPUTER SCIENCE 2022; 3:298. [PMID: 35611239 PMCID: PMC9119573 DOI: 10.1007/s42979-022-01197-8] [Received: 11/09/2021] [Accepted: 05/06/2022] [Indexed: 12/03/2022]
Abstract
Protein–protein interactions (PPIs) are prominent in systems biology and take part in diverse biological processes. They have become a major topic of discussion because they are fundamental to predicting the function of a target protein and the druggability of molecules. Numerous studies have been published on predicting PPIs computationally, because such methods offer an alternative to laboratory experiments and a cost-effective way of predicting the most likely set of interactions at the scale of the entire proteome. Among recent computational methods, deep learning has been the focus of numerous studies. This paper presents, for the first time, a comprehensive survey of sequence-based PPI prediction using three popular deep learning architectures, i.e., deep neural networks, convolutional neural networks, and recurrent neural networks and their variants. This thorough survey, which carefully gathers the available information, can help researchers further explore this area.
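Sequence-based PPI models of the kinds surveyed above need each protein sequence turned into fixed-size numeric input before it reaches a DNN, CNN, or RNN. One common representation is one-hot encoding over the 20 standard amino acids. The sketch below assumes truncation or zero-padding to a fixed length and all-zero rows for non-standard residues. These choices, the default length of 1000, and the helper `encode_pair` are illustrative assumptions, not taken from the survey.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence, max_len):
    """Encode a protein sequence as a max_len x 20 matrix of 0/1 values.

    Sequences longer than max_len are truncated and shorter ones are
    zero-padded; non-standard residues (e.g. 'X', 'U') become all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence[:max_len].upper()):
        col = INDEX.get(residue)
        if col is not None:
            matrix[pos][col] = 1
    return matrix


def encode_pair(seq_a, seq_b, max_len=1000):
    """Hypothetical helper: encode both partners for a two-branch (siamese) network."""
    return one_hot(seq_a, max_len), one_hot(seq_b, max_len)
```

Two-branch architectures typically process each partner with shared weights and then merge the two representations. Because the order of the pair should not matter, some models symmetrize the merge, for example by element-wise product or sum.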
11
Computational drug repurposing based on electronic health records: a scoping review. NPJ Digit Med 2022; 5:77. [PMID: 35701544 PMCID: PMC9198008 DOI: 10.1038/s41746-022-00617-6] [Received: 10/12/2021] [Accepted: 05/19/2022] [Indexed: 11/30/2022]
Abstract
Computational drug repurposing methods adapt artificial intelligence (AI) algorithms to discover new applications of approved or investigational drugs. Among heterogeneous datasets, electronic health record (EHR) datasets provide rich longitudinal and pathophysiological data that facilitate the generation and validation of drug repurposing hypotheses. Here, we present an appraisal of recently published research on computational drug repurposing using EHRs. Thirty-three research articles, retrieved from Embase, Medline, Scopus, and Web of Science between January 2000 and January 2022, were included in the final review. Four themes were presented: (1) publication venue, (2) data types and sources, (3) methods for data processing and prediction, and (4) targeted disease, validation, and released tools. The review summarizes the contribution of EHRs to drug repurposing and reveals that their use is hindered by issues of validation, accessibility, and understanding of EHRs. These findings can support researchers in using medical data resources and developing computational methods for drug repurposing.
12
Bhatnagar R, Sardar S, Beheshti M, Podichetty JT. How can natural language processing help model informed drug development?: a review. JAMIA Open 2022; 5:ooac043. [PMID: 35702625 PMCID: PMC9188322 DOI: 10.1093/jamiaopen/ooac043] [Received: 03/23/2022] [Revised: 04/28/2022] [Accepted: 05/26/2022] [Indexed: 01/20/2023]
Abstract
Objective To summarize applications of natural language processing (NLP) in model informed drug development (MIDD) and identify potential areas of improvement. Materials and Methods Sources included publications found on PubMed and Google Scholar, as well as websites and GitHub repositories for NLP libraries and models. Publications describing applications of NLP in MIDD were reviewed. The applications were stratified into 3 stages: drug discovery, clinical trials, and pharmacovigilance. Key NLP functionalities used for these applications were assessed. Programming libraries and open-source resources for implementing NLP functionalities in MIDD were identified. Results NLP has been used to aid various processes in the drug development lifecycle, such as gene-disease mapping, biomarker discovery, patient-trial matching, and adverse drug event detection. These applications commonly use the NLP functionalities of named entity recognition, word embeddings, entity resolution, assertion status detection, relation extraction, and topic modeling. The current state of the art for implementing these functionalities in MIDD applications is transformer models that use transfer learning for enhanced performance. Various libraries in Python, R, and Java, such as huggingface, sparkNLP, and KoRpus, as well as open-source platforms such as DisGeNet, DeepEnroll, and Transmol, have enabled convenient implementation of NLP models in MIDD applications. Discussion Challenges such as reproducibility, explainability, fairness, limited data, limited language support, and security need to be overcome to ensure wider adoption of NLP in the MIDD landscape. There are opportunities to improve the performance of existing models and expand the use of NLP in newer areas of MIDD. Conclusions This review provides an overview of the potential and pitfalls of current NLP approaches in MIDD.
Affiliation(s)
- Roopal Bhatnagar
- Data Science, Data Collaboration Center, Critical Path Institute, Tucson, Arizona, USA
- Sakshi Sardar
- Quantitative Medicine, Critical Path Institute, Tucson, Arizona, USA
- Maedeh Beheshti
- Quantitative Medicine, Critical Path Institute, Tucson, Arizona, USA
13
Merging data curation and machine learning to improve nanomedicines. Adv Drug Deliv Rev 2022; 183:114172. [PMID: 35189266 PMCID: PMC9233944 DOI: 10.1016/j.addr.2022.114172] [Received: 11/29/2021] [Revised: 01/28/2022] [Accepted: 02/16/2022] [Indexed: 12/12/2022]
Abstract
Nanomedicine design is often a trial-and-error process, and the optimization of formulations and in vivo properties requires tremendous benchwork. To expedite nanomedicine research, data science is steadily gaining importance in the field. Recent efforts have explored the potential to predict nanomaterial synthesis and biological behaviors via advanced data analytics. Machine learning algorithms process large datasets to understand and predict various material properties in nanomedicine synthesis, pharmacologic parameters, and efficacy. "Big data" approaches may enable even larger advances, especially if researchers capitalize on data curation methods. However, the data curation processes needed to facilitate the acquisition and standardization of large, heterogeneous data sets, and thereby to support advanced data analytics methods such as machine learning, have yet to be leveraged alongside them. Currently, the data curation and data analytics areas of nanotechnology-focused data science, or "nanoinformatics", have been proceeding largely independently. This review highlights current efforts in both areas and the potential opportunities for coordination to advance the capabilities of data analytics in nanomedicine.

14
Đuriš J, Pilović J, Džunić M, Cvijić S, Ibrić S. Application of text-mining techniques for extraction and analysis of paracetamol and ibuprofen marketed products' qualitative composition. ARHIV ZA FARMACIJU 2022. [DOI: 10.5937/arhfarm72-40397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023] [Open access]
Abstract
Text mining (TM) applications in biomedicine are attracting growing interest. TM tools can facilitate formulation development by analyzing textual information from patent databases, scientific articles, summaries of product characteristics, etc. The aim of this study was to use TM tools to perform a qualitative analysis of paracetamol (PAR) and ibuprofen (IBU) formulations, identifying and evaluating the presence of excipients specific to the active pharmaceutical ingredient (API) and/or dosage form. A total of 152 products were analyzed. Web scraping was used to retrieve the data, and the Python-based open-source software Orange 3.31.1 was used for TM and for statistical analysis (ANOVA) of the results. The majority of marketed products for both APIs were tablets. The predominant excipients in all tablet formulations were povidone, starch, microcrystalline cellulose and hypromellose. Povidone, stearic acid, potassium sorbate, maize starch and pregelatinized starch occurred more frequently in PAR tablets, whereas titanium dioxide, lactose, shellac, sucrose and ammonium hydroxide were specific to IBU tablets. PAR oral suspensions more frequently contained dispersible cellulose, liquid sorbitol, methyl and propyl parahydroxybenzoate, glycerol and acesulfame potassium. Specific excipients in other PAR dosage forms, such as effervescent tablets, hard capsules, oral powders, solutions and suspensions, as well as in IBU gels and soft capsules, were also evaluated.
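The excipient-frequency analysis described above can be sketched in plain Python. This is not the Orange 3.31.1 workflow the authors used; it is a minimal illustration assuming each scraped product has been reduced to a dict with hypothetical keys `api`, `form` and `composition`, and that the excipient vocabulary is a short list drawn from the names in the abstract. Whole-word matching keeps, for example, "crospovidone" from being counted as "povidone".

```python
import re
from collections import Counter, defaultdict

# Illustrative vocabulary taken from excipient names in the abstract;
# the study's actual term list is not reproduced here.
EXCIPIENTS = [
    "povidone", "starch", "microcrystalline cellulose", "hypromellose",
    "stearic acid", "potassium sorbate", "titanium dioxide", "lactose",
    "shellac", "sucrose", "ammonium hydroxide",
]


def excipient_frequencies(products, excipients=EXCIPIENTS):
    """Return, per (API, dosage form), the fraction of products mentioning each excipient.

    `products` is a list of dicts with hypothetical keys "api", "form" and
    "composition" (free text scraped from a product listing). Each excipient
    is counted at most once per product.
    """
    # Whole-word, case-insensitive patterns ("crospovidone" is not "povidone").
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in excipients
    }
    hits = defaultdict(Counter)  # (api, form) -> excipient -> product count
    totals = Counter()           # (api, form) -> number of products
    for product in products:
        key = (product["api"], product["form"])
        totals[key] += 1
        text = product["composition"]
        hits[key].update(name for name, pat in patterns.items() if pat.search(text))
    return {
        key: {name: n / totals[key] for name, n in hits[key].items()}
        for key in totals
    }
```

The per-group fractions are the kind of table a statistical test such as ANOVA would then compare across APIs and dosage forms; that step is omitted here.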

15
Manoharan S, Iyyappan OR. A Hybrid Protocol for Finding Novel Gene Targets for Various Diseases Using Microarray Expression Data Analysis and Text Mining. Methods Mol Biol 2022; 2496:41-70. [PMID: 35713858 DOI: 10.1007/978-1-0716-2305-3_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Advances in technology for scientific experiments have produced an enormous amount of raw data, giving rise to subsets of biologists working with genome, proteome, transcriptome, expression, and pathway data, among others. This has led to exponential growth in the scientific literature, which is now beyond the means of manual curation and annotation for extracting important information. Microarray data are expression data whose analysis yields lists of up- and downregulated genes; these genes are functionally annotated to ascertain their biological meaning. Represented as vocabularies and/or Gene Ontology terms and associated with pathways through enrichment analysis, the genes then require relational and conceptual understanding with respect to a disease. This chapter presents a hybrid approach we designed for identifying novel drug-disease targets. Microarray data for muscular dystrophy are explored as an example, and text mining approaches are used to identify promising novel drug targets. Our main objective is to give a basic overview from the perspective of a biologist for whom text mining approaches to data mining and information retrieval are a fairly new concept. The chapter aims to bridge the gap between biologists and computational text miners and to bring them together for more informative, time-efficient research.
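A minimal sketch of the first step in such a protocol: turning microarray expression values into up- and downregulated gene lists. The cut-offs (|log2 fold change| ≥ 1, p < 0.05) and the input layout are assumptions for illustration; the chapter's actual thresholds and statistics are not given in the abstract.

```python
import math


def classify_genes(expression, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into sorted up- and downregulated lists.

    `expression` maps gene -> (mean_case, mean_control, p_value), with both
    means on a linear scale and strictly positive. A gene is called only if
    p_value < p_cutoff and |log2(case / control)| >= lfc_cutoff.
    """
    up, down = [], []
    for gene, (case, control, p_value) in expression.items():
        if p_value >= p_cutoff:
            continue  # not statistically significant
        lfc = math.log2(case / control)
        if lfc >= lfc_cutoff:
            up.append(gene)
        elif lfc <= -lfc_cutoff:
            down.append(gene)
    return sorted(up), sorted(down)
```

The resulting lists would then be annotated (for example with Gene Ontology terms) and cross-referenced with text-mining results, as the chapter describes.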
Affiliation(s)
- Sharanya Manoharan
- Department of Bioinformatics, Stella Maris College (Autonomous), Chennai, Tamilnadu, India.
- Oviya Ramalakshmi Iyyappan
- Department of Sciences, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Chennai, Tamilnadu, India

16
Dehghan Z, Mohammadi-Yeganeh S, Sameni M, Mirmotalebisohi SA, Zali H, Salehi M. Repurposing new drug candidates and identifying crucial molecules underlying PCOS Pathogenesis Based On Bioinformatics Analysis. Daru 2021; 29:353-366. [PMID: 34480296 PMCID: PMC8416576 DOI: 10.1007/s40199-021-00413-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2021] [Accepted: 08/16/2021] [Indexed: 11/30/2022] [Open access]
Abstract
BACKGROUND Polycystic ovary syndrome (PCOS) affects 7% of women of reproductive age. Poor-quality oocytes, along with lower cleavage and implantation rates, reduce fertilization. OBJECTIVE This study aimed to determine crucial molecular mechanisms behind PCOS pathogenesis and to repurpose new drug candidates that interact with them. To gain more in-depth insight, we applied a novel bioinformatics approach to analyze interactions between drug-related and PCOS proteins in PCOS patients. METHODS The most recent proteomics data were retrieved from 16 proteomics datasets and used to construct the PCOS protein-protein interaction (PPI) network in Cytoscape. Topological network analysis determined hubs and bottlenecks. The MCODE plugin was used to identify highly connected regions, and associations between PCOS clusters and drug-related proteins were evaluated using the chi-squared/Fisher's exact test. The crucial PPI hub-bottlenecks and the molecules shared between the PCOS clusters and drug-related proteins were then investigated for drug-protein interactions with previously US FDA-approved drugs to predict new drug candidates. RESULTS The PI3K/AKT pathway was significantly related to one PCOS subnetwork and to most of the drugs (metformin, letrozole, pioglitazone, and spironolactone); moreover, VEGF, EGF, TGFB1, AGT, AMBP, and RBP4 were identified as proteins shared between the PCOS subnetwork and the drugs. The top biochemical pathways shared between another PCOS subnetwork and rosiglitazone included metabolic pathways, carbon metabolism, and the citrate cycle, while the shared proteins included HSPB1, HSPD1, ACO2, TALDO1, VDAC1, and MDH2. We proposed several new candidate medicines for further PCOS treatment investigations, such as copper and zinc compounds, reteplase, alteplase, and gliclazide. CONCLUSION Some of the crucial molecules suggested by our model have already been experimentally reported as critical molecules in PCOS pathogenesis. Moreover, some repurposed medications have already shown beneficial effects in infertility treatment. These previous experimental reports support our suggestion to investigate our other repurposed drugs in vitro and in vivo.
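The hub-bottleneck step in the METHODS can be illustrated with a small, dependency-free sketch: hubs are taken as the highest-degree nodes, bottlenecks as the highest-betweenness nodes (betweenness computed with Brandes' algorithm), and hub-bottlenecks as nodes in both sets. The `top_fraction` cut-off of 20% is an assumption; the study used Cytoscape, and its exact thresholds are not stated in the abstract.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    `adj` maps each node to the set of its neighbours in an unweighted,
    undirected graph.
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def hub_bottlenecks(edges, top_fraction=0.2):
    """Nodes ranking in the top `top_fraction` by both degree and betweenness."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    bc = betweenness(adj)
    k = max(1, round(len(adj) * top_fraction))
    # Ties at the cut-off are broken by node insertion order.
    hubs = set(sorted(adj, key=degree.get, reverse=True)[:k])
    bottlenecks = set(sorted(adj, key=bc.get, reverse=True)[:k])
    return hubs & bottlenecks
```

In a real PPI network the node set would come from the proteomics-derived interactions, and the association between clusters and drug-related proteins would be tested separately (the abstract names the chi-squared/Fisher's exact test for that).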
Affiliation(s)
- Zeinab Dehghan
- Student Research Committee, Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Cellular & Molecular Biology Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Samira Mohammadi-Yeganeh
- Cellular & Molecular Biology Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Marzieh Sameni
- Student Research Committee, Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Cellular & Molecular Biology Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed Amir Mirmotalebisohi
- Student Research Committee, Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Cellular & Molecular Biology Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hakimeh Zali
- Department of Tissue Engineering and Applied Cell Sciences, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mohammad Salehi
- Cellular & Molecular Biology Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Department of Medical Biotechnology, School of Advanced Technologies in Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran

17
Software review: The JATSdecoder package-extract metadata, abstract and sectioned text from NISO-JATS coded XML documents; Insights to PubMed central's open access database. Scientometrics 2021; 126:9585-9601. [PMID: 34720253 PMCID: PMC8542361 DOI: 10.1007/s11192-021-04162-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2021] [Accepted: 09/08/2021] [Indexed: 11/17/2022]
Abstract
JATSdecoder is a general toolbox that facilitates text extraction and analytical tasks on NISO-JATS coded XML documents. Its function JATSdecoder() outputs the metadata, abstract, sectioned text and reference list as easily selectable elements. One of the largest repositories of open access full texts covering biology and the medical and health sciences is PubMed Central (PMC), with more than 3.2 million files. This report provides an overview of the PMC document collection processed with JATSdecoder(). The evolution of the extracted tags over time is displayed for the full corpus, and in greater detail for some meta tags. Possibilities and limitations for text miners working with scientific literature are outlined. The NISO-JATS tags are used quite consistently nowadays and allow reliable extraction of metadata and text elements. International collaborations are more present than ever. There are obvious errors in the date stamps of some documents. Only about half of all articles from 2020 list at least one author with an author identification code. Since many authors share the same name, identifying person-related content is problematic, especially for authors with Asian names. JATSdecoder() reliably extracts key metadata and text elements from NISO-JATS coded XML files. Combined with the rich, publicly available content of the PMC database, it makes new monitoring and text mining approaches easy to carry out. Any selection of article subsets should be performed carefully, with inclusion and exclusion criteria on several NISO-JATS tags, as both the subject and keyword tags are used quite inconsistently.
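JATSdecoder is an R package, so the following is not its API. It is a hypothetical Python sketch, using only the standard library's `xml.etree.ElementTree`, of the same idea: pulling the title, DOI, abstract and sectioned body text out of a NISO-JATS XML document via the standard JATS element names (`article-meta`, `article-title`, `article-id`, `abstract`, `sec`, `p`). The function names `decode_jats` and `text_of` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def text_of(elem):
    """All text inside `elem` (including child tags), whitespace collapsed."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def decode_jats(xml_string):
    """Extract title, DOI, abstract and sectioned body text from a JATS article."""
    root = ET.fromstring(xml_string)
    meta = root.find("front/article-meta")
    if meta is None:
        meta = ET.Element("article-meta")  # empty placeholder so lookups return None
    sections = []
    for sec in root.iterfind("body/sec"):
        heading = text_of(sec.find("title"))
        # ".//p" also collects paragraphs from nested sub-sections.
        body = " ".join(text_of(p) for p in sec.iterfind(".//p"))
        sections.append((heading, body))
    return {
        "title": text_of(meta.find("title-group/article-title")),
        "doi": text_of(meta.find("article-id[@pub-id-type='doi']")),
        "abstract": text_of(meta.find("abstract")),
        "sections": sections,
    }
```

Real PMC files may reference named entities defined only in their DTD, which `ElementTree` does not load, so a production pipeline would likely need a more tolerant parser.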

18
Mak KK, Balijepalli MK, Pichika MR. Success stories of AI in drug discovery - where do things stand? Expert Opin Drug Discov 2021; 17:79-92. [PMID: 34553659 DOI: 10.1080/17460441.2022.1985108] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/20/2023]
Abstract
INTRODUCTION Artificial intelligence (AI) in drug discovery and development (DDD) has gained traction in the past few years, and many scientific reviews of the area are already available. In this review, the authors therefore focus on the success stories of AI-driven drug candidates and on a scientometric analysis of the literature in this field. AREAS COVERED The authors survey the literature to compile the success stories of AI-driven drug candidates that are currently being assessed in clinical trials or have investigational new drug (IND) status. The authors also offer their expert perspectives on future developments and their opinions on the field. EXPERT OPINION Partnerships between AI companies and the pharma industry are booming. The early signs of AI's impact on DDD are encouraging, and the pharma industry is hoping for breakthroughs. AI is a promising technology that may deliver great successes, but this has yet to be proven, as AI is still at an embryonic stage.
Affiliation(s)
- Kit-Kay Mak
- School of Postgraduate Studies and Research, International Medical University, Bukit Jalil, Malaysia
- Department of Pharmaceutical Chemistry, School of Pharmacy, International Medical University, Bukit Jalil, Malaysia
- Centre for Bioactive Molecules and Drug Delivery, Institute for Research, Development, and Innovation (Irdi), International Medical University, Bukit Jalil, Malaysia
- Mallikarjuna Rao Pichika
- Department of Pharmaceutical Chemistry, School of Pharmacy, International Medical University, Bukit Jalil, Malaysia
- Centre for Bioactive Molecules and Drug Delivery, Institute for Research, Development, and Innovation (Irdi), International Medical University, Bukit Jalil, Malaysia

19
Shaker B, Ahmad S, Lee J, Jung C, Na D. In silico methods and tools for drug discovery. Comput Biol Med 2021; 137:104851. [PMID: 34520990 DOI: 10.1016/j.compbiomed.2021.104851] [Citation(s) in RCA: 123] [Impact Index Per Article: 41.0] [Received: 07/06/2021] [Revised: 09/05/2021] [Accepted: 09/05/2021] [Indexed: 12/28/2022]
Abstract
In the past, conventional drug discovery strategies have been successfully employed to develop new drugs, but the process from lead identification to clinical trials takes more than 12 years and costs approximately US$1.8 billion on average. Recently, in silico approaches have attracted considerable interest because of their potential to reduce the time, labor, and cost of drug discovery. Many new drug compounds have been successfully developed using computational methods. In this review, we briefly introduce computational drug discovery strategies and outline up-to-date tools for performing them, as well as the knowledge bases available to those who develop their own computational models. Finally, we present successful examples of antibacterial, antiviral, and anticancer drug discoveries made using computational methods.
Affiliation(s)
- Bilal Shaker
- Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Sajjad Ahmad
- Department of Health and Biological Sciences, Abasyn University, Peshawar, 25000, Pakistan
- Jingyu Lee
- Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Chanjin Jung
- Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea
- Dokyun Na
- Department of Biomedical Engineering, Chung-Ang University, 84 Heukseok-ro, Dongjak-gu, Seoul, 06974, Republic of Korea

20
Parolo S, Tomasoni D, Bora P, Ramponi A, Kaddi C, Azer K, Domenici E, Neves-Zaph S, Lombardo R. Reconstruction of the Cytokine Signaling in Lysosomal Storage Diseases by Literature Mining and Network Analysis. Front Cell Dev Biol 2021; 9:703489. [PMID: 34490253 PMCID: PMC8417786 DOI: 10.3389/fcell.2021.703489] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/30/2021] [Accepted: 07/30/2021] [Indexed: 11/13/2022] [Open access]
Abstract
Lysosomal storage diseases (LSDs) are characterized by the abnormal accumulation of substrates in tissues due to the deficiency of lysosomal proteins. Among the numerous clinical manifestations, chronic inflammation has been consistently reported for several LSDs. However, the molecular mechanisms involved in the inflammatory response are still not completely understood. In this study, we performed text-mining and systems biology analyses to investigate the inflammatory signals in three LSDs characterized by sphingolipid accumulation: Gaucher disease, Acid Sphingomyelinase Deficiency (ASMD), and Fabry disease. We first identified the cytokines linked to the LSDs and then built on the extracted knowledge to investigate the inflammatory signals. We found numerous transcription factors that are putative regulators of cytokine expression in a cell-specific context; for example, the signaling axes controlled by STAT2, JUN, and NR4A2 emerged as candidate regulators of the monocyte cytokine network in Gaucher disease. Overall, our results suggest the presence of complex inflammatory signaling in LSDs, involving many cellular and molecular players that could be further investigated as putative targets of anti-inflammatory therapies.
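A common first pass for linking cytokines to diseases in the literature is sentence-level co-occurrence counting. The sketch below is a simple stand-in for that idea, not the authors' text-mining pipeline (which the abstract does not detail); the term lists and the naive sentence splitter are assumptions. Co-occurrence only flags candidate links: a sentence reporting normal cytokine levels is counted just like one reporting elevated levels.

```python
import re
from collections import Counter


def cooccurrences(abstracts, cytokines, diseases):
    """Count sentences in which each (cytokine, disease) pair is mentioned together."""
    def compile_terms(terms):
        # Whole-word, case-insensitive pattern per term.
        return {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}

    cyto_patterns = compile_terms(cytokines)
    disease_patterns = compile_terms(diseases)
    pairs = Counter()
    for text in abstracts:
        # Naive splitter: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            found_c = [c for c, p in cyto_patterns.items() if p.search(sentence)]
            found_d = [d for d, p in disease_patterns.items() if p.search(sentence)]
            pairs.update((c, d) for c in found_c for d in found_d)
    return pairs
```

Pairs with high counts would be candidates for the kind of curated cytokine-disease links that a downstream network analysis can build on.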
Affiliation(s)
- Silvia Parolo
- Fondazione the Microsoft Research-University of Trento Centre for Computational and Systems Biology, Rovereto, Italy
- Danilo Tomasoni
- Fondazione the Microsoft Research-University of Trento Centre for Computational and Systems Biology, Rovereto, Italy
- Pranami Bora
- Fondazione the Microsoft Research-University of Trento Centre for Computational and Systems Biology, Rovereto, Italy
- Alan Ramponi
- Fondazione the Microsoft Research-University of Trento Centre for Computational and Systems Biology, Rovereto, Italy
- Chanchala Kaddi
- Data and Data Science - Translational Disease Modeling, Sanofi, Bridgewater, NJ, United States
- Karim Azer
- Data and Data Science - Translational Disease Modeling, Sanofi, Bridgewater, NJ, United States
- Enrico Domenici
- Fondazione the Microsoft Research-University of Trento Centre for Computational and Systems Biology, Rovereto, Italy
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento, Italy
- Susana Neves-Zaph
- Data and Data Science - Translational Disease Modeling, Sanofi, Bridgewater, NJ, United States
- Rosario Lombardo
- Fondazione the Microsoft Research-University of Trento Centre for Computational and Systems Biology, Rovereto, Italy

21
Shukla R, Henkel ND, Alganem K, Hamoud AR, Reigle J, Alnafisah RS, Eby HM, Imami AS, Creeden JF, Miruzzi SA, Meller J, Mccullumsmith RE. Signature-based approaches for informed drug repurposing: targeting CNS disorders. Neuropsychopharmacology 2021; 46:116-130. [PMID: 32604402 PMCID: PMC7688959 DOI: 10.1038/s41386-020-0752-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 04/08/2020] [Revised: 05/30/2020] [Accepted: 06/22/2020] [Indexed: 12/15/2022]
Abstract
CNS disorders, and psychiatric illnesses in particular, lack definitive disease-altering therapeutics. The limited understanding of the mechanisms driving these illnesses, together with the slow pace and high cost of drug development, exacerbates this issue. For these reasons, drug repurposing, which is both less expensive and more time-efficient than de novo drug development, has been a promising strategy to overcome the paucity of treatments for these debilitating disorders. While empirical drug repurposing has been routine practice in clinical psychiatry, innovative, informed, and cost-effective repurposing efforts using big data ("omics") have been designed to characterize drugs by their structural and transcriptomic signatures. These strategies, in conjunction with ontological integration, provide an important opportunity to address knowledge-based challenges associated with drug development for CNS disorders. In this review, we discuss various signature-based in silico approaches to drug repurposing, their integration with multiple omics platforms, and how these data can be used for clinically relevant, evidence-based drug repurposing. These tools provide an exciting translational avenue to merge omics-based drug discovery platforms with patient-specific disease signatures, ultimately facilitating the identification of new therapies for numerous psychiatric disorders.
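Signature-based repurposing typically looks for drugs whose transcriptomic signature is anti-correlated with a disease signature. As a simplified stand-in for published scores such as the Connectivity Map's rank-based connectivity score, the sketch below uses cosine similarity over shared genes, assuming each signature is a hypothetical dict mapping gene to log fold change; the most negative scores mark the strongest reversal candidates.

```python
import math


def reversal_score(disease, drug):
    """Cosine similarity of two signatures over the genes they share.

    Signatures map gene -> log fold change. Values near -1 mean the drug
    pushes the shared genes in the opposite direction to the disease.
    """
    shared = disease.keys() & drug.keys()
    dot = sum(disease[g] * drug[g] for g in shared)
    norm_disease = math.sqrt(sum(disease[g] ** 2 for g in shared))
    norm_drug = math.sqrt(sum(drug[g] ** 2 for g in shared))
    if norm_disease == 0 or norm_drug == 0:
        return 0.0  # no overlapping signal to compare
    return dot / (norm_disease * norm_drug)


def rank_candidates(disease, drugs):
    """Drug names ordered from most reversing (most negative score) upward."""
    return sorted(drugs, key=lambda name: reversal_score(disease, drugs[name]))
```

Cosine similarity uses effect sizes directly; rank-based scores are usually preferred when signatures come from different platforms with different scales.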
Affiliation(s)
- Rammohan Shukla
- Department of Neurosciences, University of Toledo, Toledo, OH, USA
- Khaled Alganem
- Department of Neurosciences, University of Toledo, Toledo, OH, USA
- James Reigle
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Hunter M Eby
- Department of Neurosciences, University of Toledo, Toledo, OH, USA
- Ali S Imami
- Department of Neurosciences, University of Toledo, Toledo, OH, USA
- Justin F Creeden
- Department of Neurosciences, University of Toledo, Toledo, OH, USA
- Scott A Miruzzi
- Department of Neurosciences, University of Toledo, Toledo, OH, USA
- Jaroslaw Meller
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Department of Cancer Biology, University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA
- Department of Environmental Health, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Department of Electrical Engineering and Computing Systems, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Department of Informatics, Nicolaus Copernicus University, Torun, Poland
- Robert E Mccullumsmith
- Department of Neurosciences, University of Toledo, Toledo, OH, USA
- Neurosciences Institute, ProMedica, Toledo, OH, USA

22
Hansson LK, Hansen RB, Pletscher-Frankild S, Berzins R, Hansen DH, Madsen D, Christensen SB, Christiansen MR, Boulund U, Wolf XA, Kjærulff SK, van de Bunt M, Tulin S, Jensen TS, Wernersson R, Jensen JN. Semantic text mining in early drug discovery for type 2 diabetes. PLoS One 2020; 15:e0233956. [PMID: 32542027 PMCID: PMC7295186 DOI: 10.1371/journal.pone.0233956] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/19/2019] [Accepted: 05/15/2020] [Indexed: 11/18/2022] [Open access]
Abstract
Background Surveying the scientific literature is an important part of early drug discovery, and with the ever-increasing number of biomedical publications, it is imperative to focus on the most interesting articles. Here we present a project that highlights new understanding (e.g. recently discovered modes of action) and identifies potential drug targets via a novel, data-driven text mining approach to scoring type 2 diabetes (T2D) relevance. We focused on monitoring trends and jumps in T2D relevance so that we would be informed of important breakthroughs in a timely manner. Methods We extracted over 7 million n-grams from PubMed abstracts and then clustered the roughly 240,000 n-grams linked to T2D into almost 50,000 T2D-relevant ‘semantic concepts’. To score papers, we weighted the concepts based on co-mentioning with core T2D proteins. A protein’s T2D relevance was determined by combining the scores of the papers mentioning it in the five preceding years. Each week, all proteins were ranked according to their T2D relevance. Furthermore, the historical distribution of week-to-week rank changes was used to calculate the significance of each protein's change in T2D-relevance rank. Results We show that T2D-relevant papers, even those not mentioning T2D explicitly, were prioritised by relevant semantic concepts. Well-known T2D proteins were therefore enriched among the top-scoring proteins. Our ‘high jumpers’ identified important past developments in the understanding of how certain key proteins relate to T2D, indicating that our method will make us aware of future breakthroughs. In summary, this project facilitated keeping up with current T2D research by repeatedly feeding short lists of potential novel targets into our early drug discovery pipeline.
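The scoring scheme in the Methods can be sketched as follows. Everything here is an illustrative assumption rather than the authors' implementation: concept weights are a plain dict, a paper's score is the sum of the weights of the concepts it mentions, a protein's relevance is the sum of the scores of papers mentioning it within the preceding five years (approximated as 5 × 365 days), and the significance of a week-to-week rank jump is an empirical tail probability against previously observed jumps.

```python
from datetime import date, timedelta


def paper_score(concepts, weights):
    """Sum of the weights of the distinct semantic concepts a paper mentions."""
    return sum(weights.get(c, 0.0) for c in set(concepts))


def protein_relevance(papers, weights, as_of: date, window_years=5):
    """Sum paper scores per protein over the window preceding `as_of`.

    Each paper is a dict with "date" (datetime.date), "concepts" and "proteins".
    The window is approximated as 365 days per year (leap days ignored).
    """
    start = as_of - timedelta(days=365 * window_years)
    relevance = {}
    for paper in papers:
        if start <= paper["date"] < as_of:
            score = paper_score(paper["concepts"], weights)
            for protein in set(paper["proteins"]):
                relevance[protein] = relevance.get(protein, 0.0) + score
    return relevance


def rank_proteins(relevance):
    """Rank proteins by relevance, 1 = most relevant (ties broken by insertion order)."""
    ordered = sorted(relevance, key=relevance.get, reverse=True)
    return {protein: i + 1 for i, protein in enumerate(ordered)}


def jump_p_value(rank_gain, historical_gains):
    """Empirical probability of a week-to-week rank gain at least this large.

    A rank gain is previous rank minus current rank (positive = moved up).
    """
    if not historical_gains:
        return 1.0
    return sum(g >= rank_gain for g in historical_gains) / len(historical_gains)
```

Running `protein_relevance` and `rank_proteins` once per week and comparing consecutive rankings yields the rank gains that `jump_p_value` evaluates; proteins with small p-values would be the "high jumpers".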
Affiliation(s)
- Lena K. Hansson
- Novo Nordisk Research Centre Oxford, Novo Nordisk Ltd., Oxford, United Kingdom
- Rasmus Wernersson
- Intomics A/S, Kgs. Lyngby, Denmark
- DTU Health Tech, Technical University of Denmark, Kgs. Lyngby, Denmark
- Jan Nygaard Jensen
- Novo Nordisk Research Centre Oxford, Novo Nordisk Ltd., Oxford, United Kingdom

23
Jiang M, Li Z, Bian Y, Wei Z. A novel protein descriptor for the prediction of drug binding sites. BMC Bioinformatics 2019; 20:478. [PMID: 31533611 PMCID: PMC6749706 DOI: 10.1186/s12859-019-3058-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/02/2019] [Accepted: 08/27/2019] [Indexed: 12/23/2022] [Open access]
Abstract
Background Binding sites are the pockets of proteins that can bind drugs; discovering these pockets is a critical step in drug design. Computational prediction of protein pockets can save manpower and financial resources. Results In this paper, a novel protein descriptor for the prediction of binding sites is proposed. Information on non-bonded interactions in the three-dimensional structure of a protein is captured by a combination of geometry-based and energy-based methods. Moreover, taking advantage of the rapid development of deep learning, all binding features are extracted to generate three-dimensional grids that are fed into a convolutional neural network. Two datasets were used in the experiments: the sc-PDB dataset for descriptor extraction and binding site prediction, and the PDBbind dataset only for testing and verifying the generalization of the method. Comparison with previous methods shows that the proposed descriptor is effective in predicting binding sites. Conclusions A new protein descriptor is proposed for predicting the drug binding sites of proteins. The method combines the three-dimensional structure of a protein and its non-bonded interactions with small molecules to incorporate important factors influencing binding site formation. Analysis of the experiments indicates that the descriptor is robust for site prediction. Electronic supplementary material: The online version of this article (10.1186/s12859-019-3058-0) contains supplementary material, which is available to authorized users.
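The grid step can be illustrated with a simplified voxelizer. The paper's descriptor combines geometry- and energy-based features; this sketch substitutes per-element atom counts as channels, and the box size, a resolution of 1.0 coordinate unit (typically ångströms), and the channel list are all assumptions. The output is a channel-first nested list of the shape a 3D CNN would typically consume.

```python
def voxelize(atoms, center, size=16, resolution=1.0, channels=("C", "N", "O", "S")):
    """Count atoms per element channel in a cubic grid centred on `center`.

    `atoms` is an iterable of (element, x, y, z). Returns a channel-first nested
    list grid[channel][i][j][k] of shape (len(channels), size, size, size).
    Atoms whose element is not a channel, or that fall outside the box, are skipped.
    """
    channel_of = {element: c for c, element in enumerate(channels)}
    grid = [[[[0] * size for _ in range(size)] for _ in range(size)] for _ in channels]
    half_width = size * resolution / 2
    for element, x, y, z in atoms:
        if element not in channel_of:
            continue
        # Shift so the box spans [0, size * resolution) and bin each coordinate.
        i, j, k = (int((coord - c0 + half_width) // resolution)
                   for coord, c0 in zip((x, y, z), center))
        if 0 <= i < size and 0 <= j < size and 0 <= k < size:
            grid[channel_of[element]][i][j][k] += 1
    return grid
```

Replacing the per-element counts with per-voxel interaction-energy or geometric terms would bring the channels closer to the kind of descriptor the paper describes.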
Affiliation(s)
- Mingjian Jiang
- Department of Computer Science and Technology, Ocean University of China, 238 Songling Road, Qingdao, 266100, China
- Zhen Li
- Department of Computer Science and Technology, Ocean University of China, 238 Songling Road, Qingdao, 266100, China
- Pilot National Laboratory for Marine Science and Technology (Qingdao), 1 Wenhai Road Aoshanwei, Qingdao, 266237, China
- Yujie Bian
- Department of Computer Science and Technology, Ocean University of China, 238 Songling Road, Qingdao, 266100, China
- Zhiqiang Wei
- Department of Computer Science and Technology, Ocean University of China, 238 Songling Road, Qingdao, 266100, China
- Pilot National Laboratory for Marine Science and Technology (Qingdao), 1 Wenhai Road Aoshanwei, Qingdao, 266237, China