1. Katritsis NM, Liu A, Youssef G, Rathee S, MacMahon M, Hwang W, Wollman L, Han N. dialogi: Utilising NLP With Chemical and Disease Similarities to Drive the Identification of Drug-Induced Liver Injury Literature. Front Genet 2022; 13:894209. [PMID: 36017500 PMCID: PMC9395939 DOI: 10.3389/fgene.2022.894209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Accepted: 06/17/2022] [Indexed: 11/13/2022]
Abstract
Drug-Induced Liver Injury (DILI), despite its low occurrence rate, can cause severe side effects or even lead to death. Thus, it is one of the leading causes for terminating the development of new, and restricting the use of already-circulating, drugs. Moreover, its multifactorial nature, combined with a clinical presentation that often mimics other liver diseases, complicates the identification of DILI-related (or “positive”) literature, which remains the main medium for sourcing results from clinical practice and experimental studies. This work, contributing to the “Literature AI for DILI Challenge” of the Critical Assessment of Massive Data Analysis (CAMDA) 2021, presents an automated pipeline for distinguishing between DILI-positive and DILI-negative publications. We used Natural Language Processing (NLP) to filter out the uninformative parts of a text, and to identify and extract mentions of chemicals and diseases. We combined that information with small-molecule and disease embeddings, which are capable of capturing chemical and disease similarities, to improve classification performance. The former were directly sourced from the Chemical Checker (CC). For the latter, we collected data that encode different aspects of disease similarity from the National Library of Medicine’s (NLM) Medical Subject Headings (MeSH) thesaurus and the Comparative Toxicogenomics Database (CTD). Following a procedure similar to the one used in the CC, vector representations for diseases were learnt and evaluated. Two Neural Network (NN) classifiers were developed: a baseline model that accepts texts as input and an extended model that also utilises chemical and disease embeddings. We trained, validated, and tested the classifiers through a Nested Cross-Validation (NCV) scheme with 10 outer and 5 inner folds. Under this scheme, the baseline and extended models performed virtually identically, with F1-scores of 95.04 ± 0.61% and 94.80 ± 0.41%, respectively. Upon validation on an external, withheld dataset meant to assess classifier generalisability, the extended model achieved an F1-score of 91.14 ± 1.62%, outperforming its baseline counterpart, which scored 88.30 ± 2.44%. We make further comparisons between the classifiers and discuss future improvements and directions, including utilising chemical and disease embeddings for visualisation and exploratory analysis of the DILI-positive literature.
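The nested cross-validation scheme mentioned in the abstract (10 outer and 5 inner folds) can be illustrated with a minimal, standard-library sketch. It only shows how sample indices might be partitioned; the function names, the shuffling seed, and the sample count are assumptions made for illustration and are not taken from the paper, whose actual implementation is not shown here.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle `indices` and split them into k near-equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=10, inner_k=5, seed=0):
    """Yield (train, validation, test) index lists for every outer/inner fold pair."""
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for o, test in enumerate(outer_folds):
        # Everything outside the held-out outer fold is available for model selection.
        development = [i for j, fold in enumerate(outer_folds) if j != o for i in fold]
        # Hypothetical choice: reseed per outer fold so inner splits differ between folds.
        inner_folds = k_fold_indices(development, inner_k, seed + o + 1)
        for v, validation in enumerate(inner_folds):
            train = [i for j, fold in enumerate(inner_folds) if j != v for i in fold]
            yield train, validation, test


# Illustrative sample count only; 10 outer folds x 5 inner folds gives 50 splits.
splits = list(nested_cv_splits(n_samples=200))
print(len(splits))
```

In such a scheme the inner folds are typically used for model selection, while each outer test fold is touched only once for the final performance estimate.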
Affiliation(s)
- Nicholas M. Katritsis
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
- Anika Liu
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
- Gehad Youssef
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Sanjay Rathee
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Méabh MacMahon
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Centre for Therapeutics Discovery, LifeArc, Stevenage, United Kingdom
- Woochang Hwang
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Lilly Wollman
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Namshik Han
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Cambridge Centre for AI in Medicine, University of Cambridge, Cambridge, United Kingdom

*Correspondence: Nicholas M. Katritsis; Namshik Han
2. Rathee S, MacMahon M, Liu A, Katritsis NM, Youssef G, Hwang W, Wollman L, Han N. DILI C: An AI-Based Classifier to Search for Drug-Induced Liver Injury Literature. Front Genet 2022; 13:867946. [PMID: 35846129 PMCID: PMC9277181 DOI: 10.3389/fgene.2022.867946] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/01/2022] [Accepted: 04/11/2022] [Indexed: 01/15/2023]
Abstract
Drug-induced liver injury (DILI) is a class of adverse drug reactions (ADRs) that causes problems in both clinical and research settings. It is the most frequent cause of acute liver failure in the majority of Western countries and is a major cause of attrition of novel drug candidates. Manual trawling of the literature is the main route of deriving information on DILI from research studies, which makes the process inefficient and prone to human error. Therefore, an automated AI model capable of retrieving DILI-related articles from the vast body of literature could be invaluable for the drug discovery community. In this study, we built an artificial intelligence (AI) model combining natural language processing (NLP) and machine learning (ML) to address this problem. The model uses NLP to filter out meaningless text (e.g., stop words) and uses customized functions to extract relevant keywords in the form of singletons, pairs, and triplets. These keywords are processed by an apriori pattern mining algorithm to extract relevant patterns, which are used to estimate initial weightings for an ML classifier. Along with pattern importance and frequency, an FDA-approved drug list mentioning DILI adds extra confidence in classification. Together, these methods yield a DILI classifier (DILI C), with 94.91% cross-validation and 94.14% external validation accuracy. To make DILI C as accessible as possible, including to researchers without coding experience, an R Shiny app capable of classifying single or multiple entries for DILI was developed and made available at https://researchmind.co.uk/diliclassifier/. Additionally, a GitHub link (https://github.com/sanjaysinghrathi/DILI-Classifier) for the app source code and an extended ISMB video talk (https://www.youtube.com/watch?v=j305yIVi_f8) are available as supplementary materials.
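To make the keyword and pattern-mining step in the abstract more concrete, the following is a minimal sketch of level-wise (Apriori-style) frequent pattern mining over singletons, pairs, and triplets of keywords. It is not the authors' implementation (their app is written in R); the stop-word list, function names, support threshold, and example sentences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy stop-word list (assumption); a real pipeline would use a fuller list.
STOP_WORDS = {"the", "of", "a", "in", "and", "was", "with", "to", "is"}


def keywords(text):
    """Lower-case, strip punctuation, drop stop words; return the set of remaining tokens."""
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return {t for t in tokens if t and t not in STOP_WORDS}


def frequent_patterns(documents, min_support=2, max_size=3):
    """Return keyword sets (size 1 to max_size) found in at least min_support documents."""
    transactions = [keywords(d) for d in documents]
    counts = Counter(t for tx in transactions for t in tx)
    current = {frozenset([t]) for t, c in counts.items() if c >= min_support}
    result = {p: counts[next(iter(p))] for p in current}
    size = 2
    while current and size <= max_size:
        items = sorted({t for p in current for t in p})
        # Apriori pruning: keep a candidate only if all of its smaller subsets are frequent.
        candidates = [
            frozenset(c)
            for c in combinations(items, size)
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        ]
        level = {}
        for c in candidates:
            support = sum(1 for tx in transactions if c <= tx)
            if support >= min_support:
                level[c] = support
        result.update(level)
        current = set(level)
        size += 1
    return result


# Hypothetical example sentences, not taken from the DILI dataset.
docs = [
    "Acetaminophen overdose caused acute liver injury.",
    "Liver injury was reported with acetaminophen.",
    "The drug improved kidney function.",
]
patterns = frequent_patterns(docs)
for pattern, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pattern), support)
```

In a classifier of the kind described, the support of each pattern in DILI-positive versus DILI-negative texts could then serve as a starting point for feature weights; how the paper derives its weightings is not specified in the abstract.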
Affiliation(s)
- Sanjay Rathee
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Meabh MacMahon
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - LifeArc, Stevenage, United Kingdom
- Anika Liu
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Department of Chemistry, Centre for Molecular Informatics, University of Cambridge, Cambridge, United Kingdom
- Nicholas M Katritsis
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, United Kingdom
- Gehad Youssef
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Woochang Hwang
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Lilly Wollman
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
- Namshik Han
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, United Kingdom
  - Cambridge Centre for AI in Medicine, University of Cambridge, Cambridge, United Kingdom
3. Han N, Hwang W, Tzelepis K, Schmerer P, Yankova E, MacMahon M, Lei W, Katritsis NM, Liu A, Felgenhauer U, Schuldt A, Harris R, Chapman K, McCaughan F, Weber F, Kouzarides T. Identification of SARS-CoV-2-induced pathways reveals drug repurposing strategies. Sci Adv 2021; 7:eabh3032. [PMID: 34193418 PMCID: PMC8245040 DOI: 10.1126/sciadv.abh3032] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 02/28/2021] [Accepted: 05/14/2021] [Indexed: 05/02/2023]
Abstract
The global outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) necessitates the rapid development of new therapies against coronavirus disease 2019 (COVID-19) infection. Here, we present the identification of 200 approved drugs, appropriate for repurposing against COVID-19. We constructed a SARS-CoV-2-induced protein network, based on disease signatures defined by COVID-19 multiomics datasets, and cross-examined these pathways against approved drugs. This analysis identified 200 drugs predicted to target SARS-CoV-2-induced pathways, 40 of which are already in COVID-19 clinical trials, testifying to the validity of the approach. Using artificial neural network analysis, we classified these 200 drugs into nine distinct pathways, within two overarching mechanisms of action (MoAs): viral replication (126) and immune response (74). Two drugs (proguanil and sulfasalazine) implicated in viral replication were shown to inhibit replication in cell assays. This unbiased and validated analysis opens new avenues for the rapid repurposing of approved drugs into clinical trials.
Affiliation(s)
- Namshik Han
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
- Woochang Hwang
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
- Patrick Schmerer
  - Institute for Virology, FB10-Veterinary Medicine, Justus-Liebig University, Gießen 35392, Germany
- Eliza Yankova
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
- Méabh MacMahon
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
  - Centre for Therapeutics Discovery, LifeArc, Stevenage, UK
- Winnie Lei
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
  - Department of Surgery, University of Cambridge, Cambridge, UK
- Nicholas M Katritsis
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Anika Liu
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
  - Department of Chemistry, University of Cambridge, Cambridge, UK
  - Data and Computational Sciences, GSK, London, UK
- Ulrike Felgenhauer
  - Institute for Virology, FB10-Veterinary Medicine, Justus-Liebig University, Gießen 35392, Germany
- Alison Schuldt
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
- Rebecca Harris
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
- Kathryn Chapman
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
- Frank McCaughan
  - Department of Medicine, University of Cambridge, Cambridge, UK
- Friedemann Weber
  - Institute for Virology, FB10-Veterinary Medicine, Justus-Liebig University, Gießen 35392, Germany
- Tony Kouzarides
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
  - The Gurdon Institute and Department of Pathology, University of Cambridge, Cambridge, UK
4. Hwang W, Lei W, Katritsis NM, MacMahon M, Chapman K, Han N. Current and prospective computational approaches and challenges for developing COVID-19 vaccines. Adv Drug Deliv Rev 2021; 172:249-274. [PMID: 33561453 PMCID: PMC7871111 DOI: 10.1016/j.addr.2021.02.004] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/23/2020] [Revised: 02/01/2021] [Accepted: 02/03/2021] [Indexed: 12/23/2022]
Abstract
SARS-CoV-2, the zoonotic coronavirus that causes COVID-19, was first identified in humans in late 2019. As it has spread around the world, there has been an unprecedented effort to develop effective vaccines. Computational methods can be used to speed up the long and costly process of vaccine development. Antigen selection, epitope prediction, and toxicity and allergenicity prediction are areas in which computational tools have already been applied as part of reverse vaccinology for SARS-CoV-2 vaccine development. However, there is potential for computational methods to assist further. We review the approaches that have been used and highlight additional bioinformatic approaches and PK modelling as in silico methods that may be useful for SARS-CoV-2 vaccine design but currently remain unexplored. As more novel viruses with pandemic potential are expected to arise in the future, these techniques are not limited to SARS-CoV-2 but are also useful for responding rapidly to novel emerging viruses.
Affiliation(s)
- Woochang Hwang
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
- Winnie Lei
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
  - Department of Surgery, University of Cambridge, Cambridge, UK
- Nicholas M Katritsis
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
  - Department of Chemical Engineering and Biotechnology, University of Cambridge, Cambridge, UK
- Méabh MacMahon
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
  - Centre for Therapeutics Discovery, LifeArc, Stevenage, UK
- Kathryn Chapman
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK
- Namshik Han
  - Milner Therapeutics Institute, University of Cambridge, Cambridge, UK