1
Tsubota T, Bollegala D, Zhao Y, Jin Y, Kozu T. Improvement of intervention information detection for automated clinical literature screening during systematic review. J Biomed Inform 2022; 134:104185. [PMID: 36038066 DOI: 10.1016/j.jbi.2022.104185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Revised: 08/19/2022] [Accepted: 08/21/2022] [Indexed: 11/19/2022]
Abstract
Systematic literature review (SLR) is a crucial method that helps clinicians and policymakers make decisions amid a flood of new clinical studies. Because manual literature screening in SLR is highly laborious, its automation by natural language processing (NLP) has been welcomed. Although the intervention is a key piece of information for literature screening, NLP models for its detection in previous work have not shown adequate performance. In this work, we first design an algorithm for automated construction of high-quality intervention labels by utilizing information retrieved from a clinical trial database. We then design another algorithm for improving the model's recall and F1 score by imposing adaptive weights on training instances in the loss function. The intervention detection model trained on the weighted datasets is tested with the Evidence-Based Medicine NLP (EBM-NLP) corpus, and shows improvements of 9.7% and 4.0% in recall and F1 score, respectively, compared to the previous state-of-the-art model on the corpus. The proposed algorithms can boost automation of literature screening during SLR in the clinical domain.
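The idea of weighting training instances inside the loss can be illustrated with a minimal sketch. The abstract does not say how the adaptive weights are computed, so the weights below are hand-picked, and the function name `weighted_bce` and the example numbers are hypothetical; this shows only a generic per-instance weighted binary cross-entropy, not the authors' algorithm.

```python
import math

def weighted_bce(probs, labels, weights):
    """Mean binary cross-entropy where instance i is scaled by weights[i]."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# Hypothetical predictions for three training instances.
probs = [0.9, 0.2, 0.4]
labels = [1, 0, 1]

uniform = weighted_bce(probs, labels, [1.0, 1.0, 1.0])
# Up-weight the poorly predicted positive (assumed weight of 3.0).
upweighted = weighted_bce(probs, labels, [1.0, 1.0, 3.0])
```

Raising the weight of a missed positive increases its share of the loss, which is one plausible way such a scheme could push a model toward higher recall.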
Affiliation(s)
- Tadashi Tsubota
  - Deloitte Analytics R&D, Deloitte Touche Tohmatsu LLC, 3-2-3 Marunouchi, Chiyoda-ku, Tokyo, 100-8360, Japan
- Danushka Bollegala
  - Department of Computer Science, University of Liverpool, Liverpool, L69 3BX, UK
- Yang Zhao
  - Deloitte Analytics R&D, Deloitte Touche Tohmatsu LLC, 3-2-3 Marunouchi, Chiyoda-ku, Tokyo, 100-8360, Japan
- Yingzi Jin
  - Deloitte Analytics R&D, Deloitte Touche Tohmatsu LLC, 3-2-3 Marunouchi, Chiyoda-ku, Tokyo, 100-8360, Japan
- Tomotake Kozu
  - Deloitte Analytics R&D, Deloitte Touche Tohmatsu LLC, 3-2-3 Marunouchi, Chiyoda-ku, Tokyo, 100-8360, Japan
2
Walker LE, Abuzour AS, Bollegala D, Clegg A, Gabbay M, Griffiths A, Kullu C, Leeming G, Mair FS, Maskell S, Relton S, Ruddle RA, Shantsila E, Sperrin M, Van Staa T, Woodall A, Buchan I. The DynAIRx Project Protocol: Artificial Intelligence for dynamic prescribing optimisation and care integration in multimorbidity. J Multimorb Comorb 2022; 12:26335565221145493. [PMID: 36545235 PMCID: PMC9761229 DOI: 10.1177/26335565221145493] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/17/2023]
Abstract
BACKGROUND: Structured Medication Reviews (SMRs) are intended to help deliver the NHS Long Term Plan for medicines optimisation in people living with multiple long-term conditions and polypharmacy. It is challenging to gather the information needed for these reviews due to poor integration of health records across providers and there is little guidance on how to identify those patients most urgently requiring review.
OBJECTIVE: To extract information from scattered clinical records on how health and medications change over time, apply interpretable artificial intelligence (AI) approaches to predict risks of poor outcomes and overlay this information on care records to inform SMRs. We will pilot this approach in primary care prescribing audit and feedback systems, and co-design future medicines optimisation decision support systems.
DESIGN: DynAIRx will target potentially problematic polypharmacy in three key multimorbidity groups, namely, people with (a) mental and physical health problems, (b) four or more long-term conditions taking ten or more drugs and (c) older age and frailty. Structured clinical data will be drawn from integrated care records (general practice, hospital, and social care) covering an ∼11m population supplemented with Natural Language Processing (NLP) of unstructured clinical text. AI systems will be trained to identify patterns of conditions, medications, tests, and clinical contacts preceding adverse events in order to identify individuals who might benefit most from an SMR.
DISCUSSION: By implementing and evaluating an AI-augmented visualisation of care records in an existing prescribing audit and feedback system we will create a learning system for medicines optimisation, co-designed throughout with end-users and patients.
Affiliation(s)
- Lauren E Walker
  - Wolfson Centre for Personalized Medicine, University of Liverpool, Liverpool, UK
- Aseel S Abuzour
  - Academic Unit for Ageing & Stroke Research, University of Leeds, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Andrew Clegg
  - Academic Unit for Ageing & Stroke Research, University of Leeds, Bradford Teaching Hospitals NHS Foundation Trust, Bradford, UK
- Mark Gabbay
  - Institute of Population Health, University of Liverpool, Liverpool, UK
- Cecil Kullu
  - Mersey Care NHS Foundation Trust, Liverpool, UK
- Gary Leeming
  - Civic Data Cooperative, University of Liverpool, Liverpool, UK
- Frances S Mair
  - General Practice and Primary Care, School of Health and Wellbeing, University of Glasgow, UK
- Simon Maskell
  - School of Electrical Engineering, Electronics and Computer Science, University of Liverpool, UK
- Samuel Relton
  - Institute of Health Sciences, University of Leeds, UK
- Roy A Ruddle
  - School of Computing and Leeds Institute for Data Analytics, University of Leeds, UK
- Eduard Shantsila
  - Institute of Population Health, University of Liverpool, Liverpool, UK
- Matthew Sperrin
  - Division of Informatics, Imaging & Data Sciences, University of Manchester, Manchester, UK
- Tjeerd Van Staa
  - Division of Informatics, Imaging & Data Sciences, University of Manchester, Manchester, UK
- Alan Woodall
  - Directorate of Mental Health and Learning Disabilities, Powys Teaching Health Board, Bronllys, UK
- Iain Buchan
  - Institute of Population Health, University of Liverpool, Liverpool, UK
3
Alsuhaibani M, Bollegala D. Fine-Tuning Word Embeddings for Hierarchical Representation of Data Using a Corpus and a Knowledge Base for Various Machine Learning Applications. Comput Math Methods Med 2021; 2021:9761163. [PMID: 34824601 PMCID: PMC8610673 DOI: 10.1155/2021/9761163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2021] [Revised: 10/09/2021] [Accepted: 10/20/2021] [Indexed: 11/28/2022]
Abstract
Word embedding models have recently shown some capability to encode hierarchical information that exists in textual data. However, such models do not explicitly encode the hierarchical structure that exists among words. In this work, we propose a method to learn hierarchical word embeddings (HWEs) in a specific order to encode the hierarchical information of a knowledge base (KB) in a vector space. To learn the word embeddings, our proposed method considers not only the hypernym relations that exist between words in a KB but also contextual information in a text corpus. The experimental results on various applications, such as supervised and unsupervised hypernymy detection, graded lexical entailment prediction, hierarchical path prediction, and word reconstruction tasks, show the ability of the proposed method to encode the hierarchy. Moreover, the proposed method outperforms previously proposed methods for learning nonspecialised, hypernym-specific, and hierarchical word embeddings on multiple benchmarks.
Affiliation(s)
- Mohammed Alsuhaibani
  - Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
5
Khemchandani Y, O'Hagan S, Samanta S, Swainston N, Roberts TJ, Bollegala D, Kell DB. DeepGraphMolGen, a multi-objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach. J Cheminform 2020; 12:53. [PMID: 33431037 PMCID: PMC7487898 DOI: 10.1186/s13321-020-00454-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/29/2020] [Accepted: 08/18/2020] [Indexed: 02/03/2023] Open
Abstract
We address the problem of generating novel molecules with desired interaction properties as a multi-objective optimization problem. Interaction binding models are learned from binding data using graph convolution networks (GCNs). Since the experimentally obtained property scores are recognised as having potentially gross errors, we adopted a robust loss for the model. Combinations of these terms, including drug likeness and synthetic accessibility, are then optimized using reinforcement learning based on a graph convolution policy approach. Some of the molecules generated, while legitimate chemically, can have excellent drug-likeness scores but appear unusual. We provide an example based on the binding potency of small molecules to dopamine transporters. We extend our method successfully to use a multi-objective reward function, in this case for generating novel molecules that bind with dopamine transporters but not with those for norepinephrine. Our method should be generally applicable to the generation in silico of molecules with desirable properties.
Affiliation(s)
- Yash Khemchandani
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
  - Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra, 400 076, India
- Stephen O'Hagan
  - Dept of Chemistry, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester, M1 7DN, UK
- Soumitra Samanta
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
- Neil Swainston
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
- Timothy J Roberts
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
- Danushka Bollegala
  - Dept of Computer Science, University of Liverpool, Ashton Building, Ashton Street, Liverpool, L69 3BX, UK
- Douglas B Kell
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 200, Kgs, 2800, Lyngby, Denmark
6
Bollegala D, Maskell S, Sloane R, Hajne J, Pirmohamed M. Causality Patterns for Detecting Adverse Drug Reactions From Social Media: Text Mining Approach. JMIR Public Health Surveill 2018; 4:e51. [PMID: 29743155 PMCID: PMC5966656 DOI: 10.2196/publichealth.8214] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/15/2017] [Revised: 10/25/2017] [Accepted: 03/14/2018] [Indexed: 11/15/2022] Open
Abstract
Background: Detecting adverse drug reactions (ADRs) is an important task that has direct implications for the use of that drug. If we can detect previously unknown ADRs as quickly as possible, then this information can be provided to the regulators, pharmaceutical companies, and health care organizations, thereby potentially reducing drug-related morbidity and saving lives of many patients. A promising approach for detecting ADRs is to use social media platforms such as Twitter and Facebook. A high level of correlation between a drug name and an event may be an indication of a potential adverse reaction associated with that drug. Although numerous association measures have been proposed by the signal detection community for identifying ADRs, these measures are limited in that they detect correlations but often ignore causality.
Objective: This study aimed to propose a causality measure that can detect an adverse reaction that is caused by a drug rather than merely being a correlated signal.
Methods: To the best of our knowledge, this was the first causality-sensitive approach for detecting ADRs from social media. Specifically, the relationship between a drug and an event was represented using a set of automatically extracted lexical patterns. We then learned the weights for the extracted lexical patterns that indicate their reliability for expressing an adverse reaction of a given drug.
Results: Our proposed method obtains an ADR detection accuracy of 74% on a large-scale manually annotated dataset of tweets, covering a standard set of drugs and adverse reactions.
Conclusions: By using lexical patterns, we can accurately detect the causality between drugs and adverse reaction–related events.
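As a rough illustration of representing a drug-event relationship by lexical patterns and scoring it with learned weights, the sketch below extracts the token sequence linking two mentions and sums the weights of the observed patterns. The function names, the single-token mentions, the placeholder symbols `DRUG` and `EVENT`, and the weight values are all assumptions made for this example; the abstract does not describe the actual pattern extraction or learning procedure.

```python
def extract_pattern(tokens, drug, event):
    """Return the tokens linking a drug and an event mention, with the
    mentions replaced by placeholders, or None if either is absent."""
    if drug not in tokens or event not in tokens:
        return None
    i, j = tokens.index(drug), tokens.index(event)
    lo, hi = min(i, j), max(i, j)
    middle = tokens[lo + 1:hi]
    first, last = ("DRUG", "EVENT") if i < j else ("EVENT", "DRUG")
    return " ".join([first, *middle, last])

def causality_score(patterns, weights):
    """Sum the weights of the observed patterns; unseen patterns count as 0."""
    return sum(weights.get(p, 0.0) for p in patterns)

tweet = "aspirin gave me a headache".split()
pattern = extract_pattern(tweet, drug="aspirin", event="headache")

# Hypothetical weights; in practice these would be learned from labelled data.
weights = {"DRUG gave me a EVENT": 1.5, "EVENT so I took DRUG": -1.0}
score = causality_score([pattern], weights)
```

A positive weight marks a pattern that suggests the drug caused the event, while a negative weight marks one that suggests the event came first.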
Affiliation(s)
- Danushka Bollegala
  - Department of Computer Science, University of Liverpool, Liverpool, United Kingdom
- Simon Maskell
  - Department of Computer Science, University of Liverpool, Liverpool, United Kingdom
- Richard Sloane
  - Department of Computer Science, University of Liverpool, Liverpool, United Kingdom
- Joanna Hajne
  - Department of Computer Science, University of Liverpool, Liverpool, United Kingdom
- Munir Pirmohamed
  - Department of Translational Medicine, University of Liverpool, Liverpool, United Kingdom
7
Abstract
Methods for representing the meaning of words in vector spaces purely using the information distributed in text corpora have proved to be very valuable in various text mining and natural language processing (NLP) tasks. However, these methods still disregard the valuable semantic relational structure between words in co-occurring contexts. These beneficial semantic relational structures are contained in manually created knowledge bases (KBs) such as ontologies and semantic lexicons, where the meanings of words are represented by defining the various relationships that exist among those words. We combine the knowledge in both a corpus and a KB to learn better word embeddings. Specifically, we propose a joint word representation learning method that uses the knowledge in the KBs, and simultaneously predicts the co-occurrences of two words in a corpus context. In particular, we use the corpus to define our objective function subject to the relational constraints derived from the KB. We further utilise the corpus co-occurrence statistics to propose two novel approaches, Nearest Neighbour Expansion (NNE) and Hedged Nearest Neighbour Expansion (HNE), that dynamically expand the KB and therefore derive more constraints that guide the optimisation process. Our experimental results over a wide range of benchmark tasks demonstrate that the proposed method statistically significantly improves the accuracy of the word embeddings learnt. It outperforms a corpus-only baseline and improves on a number of previously proposed methods that incorporate corpora and KBs in both semantic similarity prediction and word analogy detection tasks.
Affiliation(s)
- Mohammed Alsuhaibani
  - Department of Computer Science, University of Liverpool, Liverpool, United Kingdom
- Danushka Bollegala
  - Department of Computer Science, University of Liverpool, Liverpool, United Kingdom
  - Kawarabayashi ERATO Large Graph Project, Tokyo, Japan
- Takanori Maehara
  - RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Kawarabayashi ERATO Large Graph Project, Tokyo, Japan
- Ken-ichi Kawarabayashi
  - National Institute of Informatics, Tokyo, Japan
  - Kawarabayashi ERATO Large Graph Project, Tokyo, Japan
9
Kajiwara T, Bollegala D, Yoshida Y, Kawarabayashi KI. An iterative approach for the global estimation of sentence similarity. PLoS One 2017; 12:e0180885. [PMID: 28898242 PMCID: PMC5595307 DOI: 10.1371/journal.pone.0180885] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 12/29/2016] [Accepted: 06/22/2017] [Indexed: 11/21/2022] Open
Abstract
Measuring the similarity between two sentences is often difficult due to their small lexical overlap. Instead of focusing on the sets of features in two given sentences between which we must measure similarity, we propose a sentence similarity method that considers two types of constraints that must be satisfied by all pairs of sentences in a given corpus. Namely, (a) if two sentences share many features in common, then it is likely that the remaining features in each sentence are also related, and (b) if two sentences contain many related features, then those two sentences are themselves similar. The two constraints are utilized in an iterative bootstrapping procedure that simultaneously updates both word and sentence similarity scores. Experimental results on the SemEval 2015 Task 2 dataset show that the proposed iterative approach for measuring sentence semantic similarity is significantly better than its non-iterative counterparts.
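The mutual reinforcement between word and sentence similarity can be pictured with a toy loop. This is not the authors' update rule, which the abstract does not give: the best-match averaging, the mean over sentence pairs, the fixed number of rounds, and the names `sentence_sim` and `iterate` are all assumptions chosen to keep the sketch short.

```python
from itertools import product

def sentence_sim(s1, s2, wsim):
    """Symmetric best-match average of word similarities between two sentences."""
    def directed(a, b):
        return sum(max(wsim[(u, v)] for v in b) for u in a) / len(a)
    return 0.5 * (directed(s1, s2) + directed(s2, s1))

def iterate(sentences, rounds=3):
    vocab = sorted(set().union(*sentences))
    # Start from exact match: 1.0 for identical words, 0.0 otherwise.
    wsim = {(u, v): float(u == v) for u, v in product(vocab, repeat=2)}
    ssim = {}
    for _ in range(rounds):
        # Constraint (b): sentences containing related words are similar.
        ssim = {(i, j): sentence_sim(a, b, wsim)
                for (i, a), (j, b) in product(enumerate(sentences), repeat=2)}
        # Constraint (a): words found in similar sentences become related.
        updated = {}
        for u, v in wsim:
            if u == v:
                updated[(u, v)] = 1.0
                continue
            pairs = [(i, j)
                     for i, a in enumerate(sentences) if u in a
                     for j, b in enumerate(sentences) if v in b and i != j]
            updated[(u, v)] = (sum(ssim[p] for p in pairs) / len(pairs)
                               if pairs else 0.0)
        wsim = updated
    return wsim, ssim

sentences = [{"cat", "sat"}, {"cat", "slept"}, {"dog", "barked"}]
wsim, ssim = iterate(sentences)
```

In this toy corpus, "sat" and "slept" never co-occur but become related because their sentences share "cat", which in turn raises the similarity of those two sentences above what word overlap alone would give. A practical version would need damping or normalisation, since this naive update keeps pushing connected scores upward.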
10
Sloane R, Osanlou O, Lewis D, Bollegala D, Maskell S, Pirmohamed M. Social media and pharmacovigilance: A review of the opportunities and challenges. Br J Clin Pharmacol 2015; 80:910-20. [PMID: 26147850 PMCID: PMC4594734 DOI: 10.1111/bcp.12717] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Received: 06/15/2015] [Revised: 06/29/2015] [Accepted: 07/03/2015] [Indexed: 01/23/2023] Open
Abstract
Adverse drug reactions come at a considerable cost to society. Social media are a potentially invaluable reservoir of information for pharmacovigilance, yet their true value remains to be fully understood. In order to realize the benefits social media hold, a number of technical, regulatory and ethical challenges remain to be addressed. We outline these key challenges, identify relevant current research, and present possible solutions.
Affiliation(s)
- Richard Sloane
  - Department of Electrical Engineering and Electronics, University of Liverpool, L69 3GJ, UK
  - Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
  - Department of Computer Science, University of Liverpool, L69 3BX, UK
- Orod Osanlou
  - Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
  - Royal Liverpool and Broadgreen University Hospital NHS Trust, Liverpool, L7 8XP, UK
- David Lewis
  - Drug Safety & Epidemiology, Novartis Pharma AG, Postfach, CH-4002, Basel, Switzerland
- Simon Maskell
  - Department of Electrical Engineering and Electronics, University of Liverpool, L69 3GJ, UK
  - Department of Computer Science, University of Liverpool, L69 3BX, UK
- Munir Pirmohamed
  - Department of Molecular and Clinical Pharmacology, University of Liverpool, L69 3GL, UK
  - Royal Liverpool and Broadgreen University Hospital NHS Trust, Liverpool, L7 8XP, UK
11
Bollegala D, Kontonatsios G, Ananiadou S. A cross-lingual similarity measure for detecting biomedical term translations. PLoS One 2015; 10:e0126196. [PMID: 26030738 PMCID: PMC4452086 DOI: 10.1371/journal.pone.0126196] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/15/2014] [Accepted: 03/30/2015] [Indexed: 12/03/2022] Open
Abstract
Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and later it is manually translated to other languages. Despite the fact that there are large monolingual lexicons of biomedical terms, only a fraction of those term lexicons are translated to other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting most similar translation candidates for a biomedical term specified in one language (source) from another language (target). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP)—a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target language using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as the singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. 
Moreover, our experimental results covering several language pairs such as English–French, English–Spanish, English–Greek, and English–Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks.
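The two feature types described in this abstract, intrinsic character n-grams and extrinsic context unigrams and bigrams, can be sketched as follows. The n-gram length, the window size, the absence of boundary markers, and the function names are assumptions for illustration only; the projection (PVP) and mapping (PLSR) steps are not shown.

```python
def char_ngrams(term, n):
    """Intrinsic features: all contiguous character n-grams of the term."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]

def context_features(tokens, index, window=2):
    """Extrinsic features: unigrams and bigrams from the words surrounding
    tokens[index], taken separately from the left and right windows."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    unigrams = left + right
    bigrams = [" ".join(pair)
               for side in (left, right)
               for pair in zip(side, side[1:])]
    return unigrams + bigrams

intrinsic = char_ngrams("hepatitis", 3)

sentence = "the patient developed acute hepatitis after treatment".split()
extrinsic = context_features(sentence, sentence.index("hepatitis"))
```

Character n-grams capture spelling regularities that often survive translation between related languages, whereas context features describe how the term is used; keeping the two lists separate mirrors the abstract's statement that a similarity measure is built from each feature type.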
Affiliation(s)
- Danushka Bollegala
  - Department of Computer Science, University of Liverpool, United Kingdom
- Georgios Kontonatsios
  - School of Computer Science, University of Manchester, Manchester, United Kingdom
  - National Centre for Text Mining, University of Manchester, Manchester, United Kingdom
- Sophia Ananiadou
  - School of Computer Science, University of Manchester, Manchester, United Kingdom
  - National Centre for Text Mining, University of Manchester, Manchester, United Kingdom
12
Abstract
Interpreting metaphor is a hard but important problem in natural language processing that has numerous applications. One way to address this task is by finding a paraphrase that can replace the metaphorically used word in a given context. This approach has been previously implemented only within supervised frameworks, relying on manually constructed lexical resources, such as WordNet. In contrast, we present a fully unsupervised metaphor interpretation method that extracts literal paraphrases for metaphorical expressions from the Web. It achieves a precision of [Formula: see text], which is high for an unsupervised paraphrasing approach. Moreover, the method significantly outperforms both the baseline and the selectional preference-based method of Shutova employed in an unsupervised setting.
Affiliation(s)
- Danushka Bollegala
  - Department of Information and Communication Engineering, the University of Tokyo, Tokyo, Japan
- Ekaterina Shutova
  - Institute for Cognitive and Brain Sciences, University of California, Berkeley, California, United States of America