1.
Cajamarca G, Proust V, Herskovic V, Cádiz RF, Verdezoto N, Fernández FJ. Technologies for Managing the Health of Older Adults with Multiple Chronic Conditions: A Systematic Literature Review. Healthcare (Basel) 2023; 11:2897. [PMID: 37958041; PMCID: PMC10648176; DOI: 10.3390/healthcare11212897]
Abstract
Multimorbidity is defined as the presence of two or more chronic medical conditions in a person, whether physical, mental, or long-term infectious. It is especially common in older populations, affecting their quality of life and emotionally impacting their caregivers and families. Technology can allow for monitoring, managing, and motivating older adults in their self-care, as well as supporting their caregivers. However, when several conditions are present at once, it may be necessary to manage several types of technologies, or for technology to manage the interaction between conditions. This work aims to understand and describe the technologies used to support the management of multimorbidity in older adults. We conducted a systematic review of ten years of scientific literature from four online databases, screening a corpus of 681 research papers and ultimately including 25 in our review. The technologies used most frequently by older adults with multimorbidity are mobile applications and websites, and they are mostly focused on communication and connectivity. We then propose opportunities for future research on addressing the challenges of managing several simultaneous health conditions, potentially creating a better approach than treating each condition as if it were independent.
Affiliation(s)
- Gabriela Cajamarca
- School of Mathematical and Computational Sciences, Yachay Tech University, San Miguel de Urcuquí 100119, Ecuador
- Valentina Proust
- Annenberg School for Communication, University of Pennsylvania, Philadelphia, PA 19104, USA
- Valeria Herskovic
- Department of Computer Science, School of Engineering, Pontificia Universidad Católica de Chile, Santiago 7820436, Chile
- Rodrigo F. Cádiz
- Department of Electrical Engineering, School of Engineering, and Music Institute, Faculty of Arts, Pontificia Universidad Católica de Chile, Santiago 7820436, Chile
- Nervo Verdezoto
- School of Computer Science and Informatics, Cardiff University, Cardiff CF24 4AG, UK
- Francisco J. Fernández
- Faculty of Communication, Pontificia Universidad Católica de Chile, Santiago 8320000, Chile
2.
Perlman-Arrow S, Loo N, Bobrovitz N, Yan T, Arora RK. A real-world evaluation of the implementation of NLP technology in abstract screening of a systematic review. Res Synth Methods 2023. [PMID: 37230483; DOI: 10.1002/jrsm.1636]
Abstract
The laborious and time-consuming nature of systematic review production hinders the dissemination of up-to-date evidence synthesis. Well-performing natural language processing (NLP) tools for systematic reviews have been developed, showing promise to improve efficiency. However, the feasibility and value of these technologies have not been comprehensively demonstrated in a real-world review. We developed an NLP-assisted abstract screening tool that provides text inclusion recommendations, keyword highlights, and visual context cues. We evaluated this tool in a living systematic review on SARS-CoV-2 seroprevalence, conducting a quality improvement assessment of screening with and without the tool. We evaluated changes to abstract screening speed, screening accuracy, characteristics of included texts, and user satisfaction. The tool improved efficiency, reducing screening time per abstract by 45.9% and decreasing inter-reviewer conflict rates. The tool conserved precision of article inclusion (positive predictive value; 0.92 with tool vs. 0.88 without) and recall (sensitivity; 0.90 vs. 0.81). The summary statistics of included studies were similar with and without the tool. Users were satisfied with the tool (mean satisfaction score of 4.2/5). We evaluated an abstract screening process where one human reviewer was replaced with the tool's votes, finding that this maintained recall (0.92 one-person, one-tool vs. 0.90 two tool-assisted humans) and precision (0.91 vs. 0.92) while reducing screening time by 70%. Implementing an NLP tool in this living systematic review improved efficiency, maintained accuracy, and was well-received by researchers, demonstrating the real-world effectiveness of NLP in expediting evidence synthesis.
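The precision (positive predictive value) and recall (sensitivity) figures reported above are standard screening metrics. As a quick illustrative sketch (the confusion-matrix counts below are hypothetical, not taken from the study), they can be computed from the tool's inclusion decisions against a gold standard:

```python
def screening_metrics(tp, fp, fn):
    """Precision (PPV) and recall (sensitivity) for abstract inclusion decisions.

    tp: relevant abstracts correctly included
    fp: irrelevant abstracts wrongly included
    fn: relevant abstracts wrongly excluded
    """
    precision = tp / (tp + fp)  # fraction of included abstracts that were truly relevant
    recall = tp / (tp + fn)     # fraction of truly relevant abstracts that were included
    return precision, recall

# Hypothetical counts for a tool-assisted screen
precision, recall = screening_metrics(tp=90, fp=8, fn=10)
print(f"PPV={precision:.2f}, sensitivity={recall:.2f}")  # PPV=0.92, sensitivity=0.90
```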
Affiliation(s)
- Sara Perlman-Arrow
- School of Population and Global Health, McGill University, Quebec, Canada
- Noel Loo
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Niklas Bobrovitz
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Tingting Yan
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Rahul K Arora
- Centre for Health Informatics, University of Calgary, Calgary, Alberta, Canada
- Institute of Biomedical Engineering, University of Oxford, Oxford, UK
3.
Oliveira Dos Santos Á, Sergio da Silva E, Machado Couto L, Valadares Labanca Reis G, Silva Belo V. The use of artificial intelligence for automating or semi-automating biomedical literature analyses: a scoping review. J Biomed Inform 2023; 142:104389. [PMID: 37187321; DOI: 10.1016/j.jbi.2023.104389]
Abstract
OBJECTIVE Evidence-based medicine (EBM) is a decision-making process based on the conscious and judicious use of the best available scientific evidence. However, the exponential increase in the amount of information currently available likely exceeds the capacity of human-only analysis. In this context, artificial intelligence (AI) and its branches such as machine learning (ML) can be used to facilitate human efforts in analyzing the literature to foster EBM. The present scoping review aimed to examine the use of AI in the automation of biomedical literature survey and analysis with a view to establishing the state-of-the-art and identifying knowledge gaps. MATERIALS AND METHODS Comprehensive searches of the main databases were performed for articles published up to June 2022 and studies were selected according to inclusion and exclusion criteria. Data were extracted from the included articles and the findings categorized. RESULTS The total number of records retrieved from the databases was 12,145, of which 273 were included in the review. Classification of the studies according to the use of AI in evaluating the biomedical literature revealed three main application groups, namely assembly of scientific evidence (n=127; 47%), mining the biomedical literature (n=112; 41%) and quality analysis (n=34; 12%). Most studies addressed the preparation of systematic reviews, while articles focusing on the development of guidelines and evidence synthesis were the least frequent. The biggest knowledge gap was identified within the quality analysis group, particularly regarding methods and tools that assess the strength of recommendation and consistency of evidence. 
CONCLUSION Our review shows that, despite significant progress in the automation of biomedical literature surveys and analyses in recent years, intense research is needed to fill knowledge gaps on more difficult aspects of ML, deep learning and natural language processing, and to consolidate the use of automation by end-users (biomedical researchers and healthcare professionals).
Affiliation(s)
- Eduardo Sergio da Silva
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
- Letícia Machado Couto
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
- Vinícius Silva Belo
- Federal University of São João del-Rei, Campus Centro-Oeste Dona Lindu, Divinópolis, Minas Gerais, Brazil
4.
Fortunato Costa K, Almeida Araújo F, Morais J, Lisboa Frances CR, Ramos RTJ. Text mining for identification of biological entities related to antibiotic resistant organisms. PeerJ 2022; 10:e13351. [PMID: 35539017; PMCID: PMC9080439; DOI: 10.7717/peerj.13351]
Abstract
Antimicrobial resistance is a significant public health problem worldwide. In recent years, the scientific community has been intensifying efforts to combat this problem; many experiments have been developed, and many articles are published in this area. However, the growing volume of biological literature increases the difficulty of the biocuration process due to the cost and time required. Modern text mining tools that adopt artificial intelligence technology can help research evolve. In this article, we propose a text mining model capable of identifying and prioritizing scientific articles in the context of antimicrobial resistance. We retrieved scientific articles from the PubMed database, adopted machine learning techniques to generate vector representations of the retrieved articles, and measured their similarity with the context. As a result of this process, we obtained a dataset labeled "Relevant" and "Irrelevant" and used it to train a supervised learning algorithm to classify new records. The model's overall accuracy reached 90%, and the F-measure (the harmonic mean of precision and recall) reached 82% for the positive class and 93% for the negative class, showing quality in the identification of scientific articles relevant to the context. The dataset, scripts and models are available at https://github.com/engbiopct/TextMiningAMR.
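The core idea of the pipeline described above (vectorize retrieved abstracts, then score their similarity to the antimicrobial-resistance context) can be sketched with plain bag-of-words cosine similarity; this is a simplification of the authors' machine-learning representation, and the texts below are invented:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Context description of the topic of interest
context = bow("antimicrobial resistance antibiotic resistant bacteria")

# Invented stand-ins for retrieved abstracts
abstracts = {
    "a1": "antibiotic resistance genes found in resistant bacteria isolates",
    "a2": "plant growth responses under prolonged drought stress",
}
scores = {pmid: cosine(context, bow(text)) for pmid, text in abstracts.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # most context-similar first
```

In the actual pipeline, similarity scores like these seeded the "Relevant"/"Irrelevant" labels that were then used to train the supervised classifier.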
Affiliation(s)
- Kelle Fortunato Costa
- Programa de pós-graduação em Engenharia Elétrica, Universidade Federal do Pará, Belém, Pará, Brazil
- Fabrício Almeida Araújo
- Biological Science Institute, Universidade Federal do Pará, Belém, Pará, Brazil
- Universidade Federal Rural da Amazônia, Belém, Pará, Brazil
- Rommel T. J. Ramos
- Biological Science Institute, Universidade Federal do Pará, Belém, Pará, Brazil
5.
Schmidt L, Finnerty Mutlu AN, Elmore R, Olorisade BK, Thomas J, Higgins JPT. Data extraction methods for systematic review (semi)automation: Update of a living systematic review. F1000Res 2021; 10:401. [PMID: 34408850; PMCID: PMC8361807; DOI: 10.12688/f1000research.51117.2]
Abstract
Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies. Methods: We systematically and continually search PubMed, ACL Anthology, arXiv, OpenAlex via EPPI-Reviewer, and the dblp computer science bibliography. Full text screening and data extraction are conducted within an open-source living systematic review application created for the purpose of this review. This living review update includes publications up to December 2022 and OpenAlex content up to March 2023. Results: 76 publications are included in this review. Of these, 64 (84%) of the publications addressed extraction of data from abstracts, while 19 (25%) used full texts. A total of 71 (93%) publications developed classifiers for randomised controlled trials. Over 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) being the most frequently extracted. Data are available from 25 (33%) publications, and code from 30 (39%). Six (8%) implemented publicly available tools. Conclusions: This living systematic review presents an overview of (semi)automated data-extraction literature of interest to different types of literature review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting epidemiological or diagnostic accuracy data. Between review updates, trends for sharing data and code increased strongly: in the base-review, data and code were available for 13% and 19% respectively; these numbers increased to 78% and 87% within the 23 new publications.
Compared with the base-review, we observed another research trend, away from straightforward data extraction and towards additionally extracting relations between entities or automatic text summarisation. With this living review we aim to review the literature continually.
Affiliation(s)
- Lena Schmidt
- NIHR Innovation Observatory, Newcastle University, Newcastle upon Tyne, NE4 5TG, UK
- Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Rebecca Elmore
- Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Babatunde K. Olorisade
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Evaluate Ltd, London, SE1 2RE, UK
- Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, CF5 2YB, UK
- James Thomas
- UCL Social Research Institute, University College London, London, WC1H 0AL, UK
6.
Schmidt L, Finnerty Mutlu AN, Elmore R, Olorisade BK, Thomas J, Higgins JPT. Data extraction methods for systematic review (semi)automation: A living systematic review. F1000Res 2021; 10:401. [PMID: 34408850; PMCID: PMC8361807; DOI: 10.12688/f1000research.51117.1]
Abstract
Background: The reliable and usable (semi)automation of data extraction can support the field of systematic review by reducing the workload required to gather information about the conduct and results of the included studies. This living systematic review examines published approaches for data extraction from reports of clinical studies. Methods: We systematically and continually search MEDLINE, Institute of Electrical and Electronics Engineers (IEEE), arXiv, and the dblp computer science bibliography databases. Full text screening and data extraction are conducted within an open-source living systematic review application created for the purpose of this review. This iteration of the living review includes publications up to a cut-off date of 22 April 2020. Results: In total, 53 publications are included in this version of our review. Of these, 41 (77%) of the publications addressed extraction of data from abstracts, while 14 (26%) used full texts. A total of 48 (90%) publications developed and evaluated classifiers that used randomised controlled trials as the main target texts. Over 30 entities were extracted, with PICOs (population, intervention, comparator, outcome) being the most frequently extracted. A description of their datasets was provided by 49 publications (94%), but only seven (13%) made the data publicly available. Code was made available by 10 (19%) publications, and five (9%) implemented publicly available tools. Conclusions: This living systematic review presents an overview of (semi)automated data-extraction literature of interest to different types of systematic review. We identified a broad evidence base of publications describing data extraction for interventional reviews and a small number of publications extracting epidemiological or diagnostic accuracy data. 
The lack of publicly available gold-standard data for evaluation, and lack of application thereof, makes it difficult to draw conclusions on which is the best-performing system for each data extraction target. With this living review we aim to review the literature continually.
Affiliation(s)
- Lena Schmidt
- NIHR Innovation Observatory, Newcastle University, Newcastle upon Tyne, NE4 5TG, UK
- Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Rebecca Elmore
- Sciome LLC, Research Triangle Park, North Carolina, 27713, USA
- Babatunde K. Olorisade
- Bristol Medical School, University of Bristol, Bristol, BS8 2PS, UK
- Evaluate Ltd, London, SE1 2RE, UK
- Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, CF5 2YB, UK
- James Thomas
- UCL Social Research Institute, University College London, London, WC1H 0AL, UK
7.
Afzal M, Alam F, Malik KM, Malik GM. Clinical Context-Aware Biomedical Text Summarization Using Deep Neural Network: Model Development and Validation. J Med Internet Res 2020; 22:e19810. [PMID: 33095174; PMCID: PMC7647812; DOI: 10.2196/19810]
Abstract
Background Automatic text summarization (ATS) enables users to retrieve meaningful evidence from big data of biomedical repositories to make complex clinical decisions. Deep neural and recurrent networks outperform traditional machine-learning techniques in areas of natural language processing and computer vision; however, they are yet to be explored in the ATS domain, particularly for medical text summarization. Objective Traditional approaches in ATS for biomedical text suffer from fundamental issues such as an inability to capture clinical context, quality of evidence, and purpose-driven selection of passages for the summary. We aimed to circumvent these limitations through achieving precise, succinct, and coherent information extraction from credible published biomedical resources, and to construct a simplified summary containing the most informative content that can offer a review particular to clinical needs. Methods In our proposed approach, we introduce a novel framework, termed Biomed-Summarizer, that provides quality-aware Patient/Problem, Intervention, Comparison, and Outcome (PICO)-based intelligent and context-enabled summarization of biomedical text. Biomed-Summarizer integrates the prognosis quality recognition model with a clinical context-aware model to locate text sequences in the body of a biomedical article for use in the final summary. First, we developed a deep neural network binary classifier for quality recognition to acquire scientifically sound studies and filter out others. Second, we developed a bidirectional long short-term memory (bi-LSTM) recurrent neural network as a clinical context-aware classifier, which was trained on semantically enriched features generated using a word-embedding tokenizer for identification of meaningful sentences representing PICO text sequences.
Third, we calculated the similarity between query and PICO text sequences using Jaccard similarity with semantic enrichments, where the semantic enrichments are obtained using medical ontologies. Last, we generated a representative summary from the high-scoring PICO sequences aggregated by study type, publication credibility, and freshness score. Results Evaluation of the prognosis quality recognition model using a large dataset of biomedical literature related to intracranial aneurysm showed an accuracy of 95.41% (2562/2686) in terms of recognizing quality articles. The clinical context–aware multiclass classifier outperformed the traditional machine-learning algorithms, including support vector machine, gradient boosted tree, linear regression, K-nearest neighbor, and naïve Bayes, by achieving 93% (16127/17341) accuracy for classifying five categories: aim, population, intervention, results, and outcome. The semantic similarity algorithm achieved a significant Pearson correlation coefficient of 0.61 (0-1 scale) on a well-known BIOSSES dataset (with 100 pair sentences) after semantic enrichment, representing an improvement of 8.9% over baseline Jaccard similarity. Finally, we found a highly positive correlation among the evaluations performed by three domain experts concerning different metrics, suggesting that the automated summarization is satisfactory. Conclusions By employing the proposed method Biomed-Summarizer, high accuracy in ATS was achieved, enabling seamless curation of research evidence from the biomedical literature to use for clinical decision-making.
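The semantic-enrichment step can be pictured with a toy example: expand each token set with canonical concept identifiers from an ontology lookup, then take the Jaccard overlap. The two-entry concept map below is a made-up stand-in for a UMLS-style lookup, not the paper's actual resource:

```python
def jaccard(a, b):
    """Jaccard similarity: |intersection| / |union| of two sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def enrich(tokens, concept_map):
    """Union each token with its mapped concept IDs (ontology stand-in)."""
    out = set(tokens)
    for t in tokens:
        out |= concept_map.get(t, set())
    return out

# Hypothetical map: both surface forms point to the same CUI-style concept ID
CONCEPTS = {"mi": {"C0027051"}, "infarction": {"C0027051"}}

query = {"mi", "risk"}
pico = {"infarction", "risk"}
plain = jaccard(query, pico)                                         # 1/3: only "risk" overlaps
enriched = jaccard(enrich(query, CONCEPTS), enrich(pico, CONCEPTS))  # 1/2: shared concept ID now counts
```

The enriched score rises because synonymous surface forms are bridged through the shared concept, which is the effect the 8.9% improvement over baseline Jaccard reflects.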
Affiliation(s)
- Muhammad Afzal
- Department of Software, Sejong University, Seoul, Republic of Korea
- Department of Computer Science & Engineering, School of Engineering and Computer Science, Oakland University, Rochester, MI, United States
- Fakhare Alam
- Department of Computer Science & Engineering, School of Engineering and Computer Science, Oakland University, Rochester, MI, United States
- Khalid Mahmood Malik
- Department of Computer Science & Engineering, School of Engineering and Computer Science, Oakland University, Rochester, MI, United States
- Ghaus M Malik
- Department of Neurosurgery, Henry Ford Hospital, Detroit, MI, United States
8.
Cajamarca G, Herskovic V, Rossel PO. Enabling Older Adults' Health Self-Management through Self-Report and Visualization-A Systematic Literature Review. Sensors (Basel) 2020; 20:E4348. [PMID: 32759801; PMCID: PMC7436010; DOI: 10.3390/s20154348]
Abstract
Aging is associated with a progressive decline in health, resulting in increased medical care and costs. Mobile technology may facilitate health self-management, thus increasing the quality of care and reducing costs. Although the development of technology offers opportunities in monitoring the health of older adults, it is not clear whether these technologies allow older adults to manage their health data themselves. This paper presents a review of the literature on mobile health technologies for older adults, focusing on whether these technologies enable the visualization of monitored data and the self-reporting of additional information by the older adults. The systematic search considered studies published between 2009 and 2019 in five online databases. We screened 609 articles and identified 95 that met our inclusion and exclusion criteria. Smartphones and tablets are the most frequently reported technologies that allow older adults to enter data in addition to what is monitored automatically. The recorded information is displayed on the monitoring device and on the screens of external devices such as computers. Future designs of mobile health technology should allow older users to enter additional information and visualize data; this could enable them to understand their own data as well as improve their experience with technology.
Affiliation(s)
- Gabriela Cajamarca
- Department of Computer Science, Pontificia Universidad Católica de Chile, Santiago 7820436, Chile
- Valeria Herskovic
- Department of Computer Science, Pontificia Universidad Católica de Chile, Santiago 7820436, Chile
- Pedro O. Rossel
- Department of Computer Science, Universidad Católica de la Santísima Concepción, Concepción 4090541, Chile
- Centro de Investigación en Biodiversidad y Ambientes Sustentables (CIBAS), Universidad Católica de la Santísima Concepción, Concepción 4090541, Chile
9.
Jin D, Szolovits P. Advancing PICO element detection in biomedical text via deep neural networks. Bioinformatics 2020; 36:3856-3862. [PMID: 32311009; DOI: 10.1093/bioinformatics/btaa256]
Abstract
MOTIVATION In evidence-based medicine, defining a clinical question in terms of the specific patient problem helps physicians efficiently identify appropriate resources and search for the best available evidence for medical treatment. To formulate a well-defined, focused clinical question, a framework called PICO is widely used, which identifies the sentences in a given medical text that belong to the four components typically reported in clinical trials: Participants/Problem (P), Intervention (I), Comparison (C) and Outcome (O). In this work, we propose a novel deep learning model for recognizing PICO elements in biomedical abstracts. Based on the previous state-of-the-art bidirectional long short-term memory (bi-LSTM) plus conditional random field architecture, we add another layer of bi-LSTM upon the sentence representation vectors so that contextual information from surrounding sentences can be gathered to help infer the interpretation of the current one. In addition, we propose two methods to further generalize and improve the model: adversarial training and unsupervised pre-training over large corpora. RESULTS We tested our proposed approach on two benchmark datasets. On the PubMed-PICO dataset, our best results outperform the previous best by 5.5%, 7.9% and 5.8% in F1 score for the P, I and O elements, respectively. On the NICTA-PIBOSO dataset, the improvements for the P/I/O elements are 3.9%, 15.6% and 1.3% in F1 score, respectively. Overall, our proposed deep learning model obtains unprecedented PICO element detection accuracy while avoiding the need for any manual feature selection. AVAILABILITY AND IMPLEMENTATION Code is available at https://github.com/jind11/Deep-PICO-Detection.
Affiliation(s)
- Di Jin
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Peter Szolovits
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
10.
Zhang X, Geng P, Zhang T, Lu Q, Gao P, Mei J. Aceso: PICO-guided Evidence Summarization on Medical Literature. IEEE J Biomed Health Inform 2020; PP:2663-2670. [PMID: 32275627; DOI: 10.1109/jbhi.2020.2984704]
Abstract
Evidence-Based Medicine (EBM) aims to apply the best available evidence gained from scientific methods to clinical decision making. A generally accepted criterion to formulate evidence is to use the PICO framework, where PICO stands for Problem/Population, Intervention, Comparison, and Outcome. Automatic extraction of PICO-related sentences from medical literature is crucial to the success of many EBM applications. In this work, we present our Aceso system, which automatically generates PICO-based evidence summaries from medical literature. In Aceso, we adopt an active learning paradigm, which helps to minimize the cost of manual labeling and to optimize the quality of summarization with limited labeled data. A UMLS2Vec model is proposed to learn a vector representation of medical concepts in UMLS, and we fuse the embedding of medical knowledge with textual features in summarization. The evaluation shows that our approach outperforms state-of-the-art methods at identifying PICO sentences and baseline methods at producing high-quality evidence summaries.
11.
Afzal M, Hussain M, Malik KM, Lee S. Impact of Automatic Query Generation and Quality Recognition Using Deep Learning to Curate Evidence From Biomedical Literature: Empirical Study. JMIR Med Inform 2019; 7:e13430. [PMID: 31815673; PMCID: PMC6928703; DOI: 10.2196/13430]
Abstract
BACKGROUND The quality of health care is continuously improving and is expected to improve further because of the advancement of machine learning and knowledge-based techniques along with innovation and availability of wearable sensors. With these advancements, health care professionals are now becoming more interested and involved in seeking scientific research evidence from external sources for decision making relevant to medical diagnosis, treatments, and prognosis. Not much work has been done to develop methods for unobtrusive and seamless curation of data from the biomedical literature. OBJECTIVE This study aimed to design a framework that can enable bringing quality publications intelligently to the users' desk to assist medical practitioners in answering clinical questions and fulfilling their informational needs. METHODS The proposed framework consists of methods for efficient biomedical literature curation, including the automatic construction of a well-built question, the recognition of evidence quality by proposing extended quality recognition model (E-QRM), and the ranking and summarization of the extracted evidence. RESULTS Unlike previous works, the proposed framework systematically integrates the echelons of biomedical literature curation by including methods for searching queries, content quality assessments, and ranking and summarization. Using an ensemble approach, our high-impact classifier E-QRM obtained significantly improved accuracy than the existing quality recognition model (1723/1894, 90.97% vs 1462/1894, 77.21%). CONCLUSIONS Our proposed methods and evaluation demonstrate the validity and rigorousness of the results, which can be used in different applications, including evidence-based medicine, precision medicine, and medical education.
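The "well-built question" step can be pictured as assembling structured PICO components into a boolean search string. This sketch is purely illustrative of the idea; the combinators and the example terms are assumptions, not the paper's actual construction method:

```python
def build_pico_query(population, intervention, comparison=None, outcome=None):
    """Combine PICO components into a boolean query string.

    Components are ANDed together; a missing comparison or outcome
    is simply skipped rather than left as an empty clause.
    """
    parts = [f'"{population}"', f'"{intervention}"']
    if comparison:
        parts.append(f'"{comparison}"')
    if outcome:
        parts.append(f'"{outcome}"')
    return " AND ".join(parts)

# Example clinical question with no explicit comparison arm
q = build_pico_query("adults with type 2 diabetes", "metformin",
                     outcome="HbA1c reduction")
# '"adults with type 2 diabetes" AND "metformin" AND "HbA1c reduction"'
```

A real system would additionally map each phrase to controlled vocabulary terms and database-specific field tags before submitting the query.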
Affiliation(s)
- Muhammad Afzal
- Department of Software, Sejong University, Seoul, Republic of Korea; Department of Computer Science and Engineering, Oakland University, Rochester, MI, United States
- Maqbool Hussain
- Department of Software, Sejong University, Seoul, Republic of Korea
- Khalid Mahmood Malik
- Department of Computer Science and Engineering, Oakland University, Rochester, MI, United States
- Sungyoung Lee
- Department of Computer Science and Engineering, Kyung Hee University, Yongin, Republic of Korea
12
Bashir R, Surian D, Dunn AG. The risk of conclusion change in systematic review updates can be estimated by learning from a database of published examples. J Clin Epidemiol 2019; 110:42-49. [PMID: 30849512 DOI: 10.1016/j.jclinepi.2019.02.015] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2018] [Revised: 01/25/2019] [Accepted: 02/26/2019] [Indexed: 01/11/2023]
Abstract
OBJECTIVES To determine which systematic review characteristics are needed to estimate the risk of conclusion change in systematic review updates. STUDY DESIGN AND SETTING We applied classification trees (a machine learning method) to model the risk of conclusion change in systematic review updates, using pairs of systematic reviews and their updates as samples. The classifiers were constructed using a set of features extracted from systematic reviews and the relevant trials added in published updates. Model performance was measured by recall, precision, and area under the receiver operating characteristic curve (AUC). RESULTS We identified 63 pairs of systematic reviews and updates, of which 20 (32%) exhibited a change in conclusion in their updates. A classifier using information about new trials exhibited the highest performance (AUC: 0.71; recall: 0.75; precision: 0.43) compared to a classifier that used fewer features (AUC: 0.65; recall: 0.75; precision: 0.39). CONCLUSION When estimating the risk of conclusion change in systematic review updates, information about the sizes of trials that will be added in an update is most useful. Future tools aimed at signaling conclusion change risks would benefit from complementary tools that automate the screening of relevant trials.
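The recall, precision, and AUC figures reported above can be computed from raw predictions with a few lines of stdlib Python. This is a generic sketch of the metrics themselves, not of the paper's classification-tree models:

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = conclusion changed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The rank-based AUC formulation used here is equivalent to the area under the ROC curve and avoids explicitly sweeping thresholds.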
Affiliation(s)
- Rabia Bashir
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, New South Wales 2109, Australia
- Didi Surian
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, New South Wales 2109, Australia
- Adam G Dunn
- Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, New South Wales 2109, Australia; Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02115, USA
13
Fast and scalable neural embedding models for biomedical sentence classification. BMC Bioinformatics 2018; 19:541. [PMID: 30577747 PMCID: PMC6303852 DOI: 10.1186/s12859-018-2496-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2018] [Accepted: 11/16/2018] [Indexed: 11/24/2022] Open
Abstract
Background Biomedical literature is expanding rapidly, and tools that help locate information of interest are needed. To this end, a multitude of different approaches for classifying sentences in biomedical publications according to their coarse semantic and rhetoric categories (e.g., Background, Methods, Results, Conclusions) have been devised, with recent state-of-the-art results reported for a complex deep learning model. Recent evidence showed that shallow and wide neural models such as fastText can provide results that are competitive with or superior to complex deep learning models while requiring drastically lower training times and having better scalability. We analyze the efficacy of the fastText model in the classification of biomedical sentences in the PubMed 200k RCT benchmark, and introduce a simple pre-processing step that enables the application of fastText to sentence sequences. Furthermore, we explore the utility of two unsupervised pre-training approaches in scenarios where labeled training data are limited. Results Our fastText-based methodology yields a state-of-the-art F1 score of 0.917 on the PubMed 200k benchmark when sentence ordering is taken into account, with a training time of only 73 s on standard hardware. Applying fastText to single sentences, without taking sentence ordering into account, yielded an F1 score of 0.852 (training time 13 s). Unsupervised pre-training of N-gram vectors greatly improved the results for small training set sizes, with an increase in F1 score from 0.21 to 0.74 when trained on only 1000 randomly picked sentences without taking sentence ordering into account. Conclusions Because of its ease of use and performance, fastText should be among the first choices of tools when tackling biomedical text classification problems with large corpora. Unsupervised pre-training of N-gram vectors on domain-specific corpora also makes it possible to apply fastText when labeled training data are limited.
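A core reason fastText is fast is that it represents text as a bag of words plus word n-grams mapped to a fixed number of hash buckets, whose embeddings are then averaged and fed to a linear classifier. The featurization step alone can be sketched in stdlib Python (the bucket count is illustrative; a CRC32 hash stands in for fastText's internal hash function):

```python
import zlib

def ngram_buckets(text, n=2, num_buckets=1000):
    """Map words and word n-grams to hash buckets, as in fastText's hashing trick."""
    words = text.lower().split()
    grams = words + [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    # CRC32 gives a deterministic hash across runs, unlike Python's built-in hash().
    return sorted({zlib.crc32(g.encode()) % num_buckets for g in grams})
```

The fixed bucket count bounds the model size regardless of vocabulary, which is what makes training on large corpora like PubMed 200k RCT cheap.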
14
Giaquinta Aranda A, Fernández Araque A, Curbelo Rodriguez R, Rojo Aragues A. Glaucoma y antioxidantes: revisión sistemática [Glaucoma and antioxidants: a systematic review]. Revista Mexicana de Oftalmología 2017. [DOI: 10.1016/j.mexoft.2016.03.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
15
Bui DDA, Del Fiol G, Hurdle JF, Jonnalagadda S. Extractive text summarization system to aid data extraction from full text in systematic review development. J Biomed Inform 2016; 64:265-272. [PMID: 27989816 PMCID: PMC5362293 DOI: 10.1016/j.jbi.2016.10.014] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2016] [Revised: 10/24/2016] [Accepted: 10/25/2016] [Indexed: 01/30/2023]
Abstract
OBJECTIVES Extracting data from publication reports is a standard process in systematic review (SR) development. However, the data extraction process still relies heavily on manual effort, which is slow, costly, and subject to human error. In this study, we developed a text summarization system aimed at enhancing productivity and reducing errors in the traditional data extraction process. METHODS We developed a computer system that used machine learning and natural language processing approaches to automatically generate summaries of full-text scientific publications. The summaries at the sentence and fragment levels were evaluated for finding common clinical SR data elements such as sample size, group size, and PICO values. We compared the computer-generated summaries with human-written summaries (title and abstract) in terms of the presence of the information necessary for data extraction, as presented in Cochrane reviews' study characteristics tables. RESULTS At the sentence level, the computer-generated summaries covered more of this information than the human-written summaries (recall 91.2% vs. 83.8%, p<0.001). They also had a higher density of relevant sentences (precision 59% vs. 39%, p<0.001). At the fragment level, the ensemble approach combining rule-based, concept mapping, and dictionary-based methods performed better than the individual methods alone, achieving an 84.7% F-measure. CONCLUSION Computer-generated summaries are a potential alternative information source for data extraction in systematic review development. Machine learning and natural language processing are promising approaches to the development of such an extractive summarization system.
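Extractive summarization, as described above, selects existing sentences from the full text rather than generating new ones. The paper's system uses trained models; a deliberately simplified frequency-based scorer illustrates only the general select-and-rank idea:

```python
import re
from collections import Counter

def extractive_summary(text, k=2):
    """Score sentences by average word frequency; return the top-k in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    def score(s):
        toks = re.findall(r"[a-z]+", s.lower())
        return sum(freq[t] for t in toks) / (len(toks) or 1)
    top = sorted(sentences, key=score, reverse=True)[:k]
    return [s for s in sentences if s in top]
```

A real system for SR data extraction would score sentences with features targeting sample size, group size, and PICO mentions instead of raw word frequency.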
Affiliation(s)
- Duy Duc An Bui
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Division of Health and Biomedical Informatics, Northwestern University, Chicago, IL, USA
- Guilherme Del Fiol
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- John F Hurdle
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
16
Bui DDA, Del Fiol G, Jonnalagadda S. PDF text classification to leverage information extraction from publication reports. J Biomed Inform 2016; 61:141-8. [PMID: 27044929 DOI: 10.1016/j.jbi.2016.03.026] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2015] [Revised: 03/22/2016] [Accepted: 03/31/2016] [Indexed: 11/19/2022]
Abstract
OBJECTIVES Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges for the underlying natural language processing algorithms. Our goal is to categorize PDF texts for strategic use by IE systems. METHODS We used an open-source tool to extract raw text from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, comparing it with machine learning classifiers, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. RESULTS The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best-performing machine learning classifier, which used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and METADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced the number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). CONCLUSIONS The rule-based multi-pass sieve framework can be used effectively to categorize texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents.
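A multi-pass sieve applies rule sets in priority order: each pass handles the snippets its rule matches, and the remainder falls through to the next pass, with a default label at the end. The structure can be sketched as follows; these rules are illustrative only, not the paper's actual sieve passes:

```python
import re

# Passes run in priority order; the first matching rule assigns the label.
SIEVE = [
    ("METADATA", lambda t: re.search(r"doi:|©|\bpmid\b", t, re.I)),
    ("TITLE", lambda t: len(t.split()) < 15 and t.istitle()),
    ("ABSTRACT", lambda t: t.lower().startswith("abstract")),
    ("SEMISTRUCTURE", lambda t: t.count("\t") > 1 or t.count("  ") > 2),
]

def classify(text):
    """Classify a PDF text snippet; unmatched snippets default to BODYTEXT."""
    for label, rule in SIEVE:
        if rule(text):
            return label
    return "BODYTEXT"
```

Ordering the passes from most to least reliable is what gives the sieve its precision: high-confidence rules claim their snippets before looser rules can misfire on them.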
Affiliation(s)
- Duy Duc An Bui
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Department of Preventive Medicine-Health and Biomedical Informatics, Northwestern University, Chicago, IL, USA
- Guilherme Del Fiol
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Siddhartha Jonnalagadda
- Department of Preventive Medicine-Health and Biomedical Informatics, Northwestern University, Chicago, IL, USA
17
Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev 2015; 4:78. [PMID: 26073888 PMCID: PMC4514954 DOI: 10.1186/s13643-015-0066-7] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/20/2015] [Accepted: 05/21/2015] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Automation of parts of the systematic review process, specifically the data extraction step, may be an important strategy to reduce the time necessary to complete a systematic review. However, the state of the science of automatically extracting data elements from full texts has not been well described. This paper presents a systematic review of published and unpublished methods to automate data extraction for systematic reviews. METHODS We systematically searched PubMed, IEEEXplore, and the ACM Digital Library to identify potentially relevant articles. We included reports that met the following criteria: 1) the methods or results section described which entities were or needed to be extracted, and 2) at least one entity was automatically extracted, with evaluation results presented for that entity. We also reviewed the citations of included reports. RESULTS Out of a total of 1190 unique citations that met our search criteria, we found 26 published reports describing automatic extraction of at least one of more than 52 potential data elements used in systematic reviews. For 25 (48 %) of these data elements, various researchers had attempted to extract the information automatically from the publication text. Of these, 14 (27 %) data elements were completely extracted, but the highest number of data elements extracted automatically by a single study was 7. Most of the data elements were extracted with F-scores (the harmonic mean of sensitivity and positive predictive value) of over 70 %. CONCLUSIONS We found no unified information extraction framework tailored to the systematic review process, and published reports focused on a limited number (1-7) of data elements. Biomedical natural language processing techniques have not been fully utilized to automate, even partially, the data extraction step of systematic reviews.
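Many of the data elements this review catalogs, such as sample size, are often extractable directly from surface text. A minimal regex-based extractor illustrates the idea; the patterns are assumptions for illustration, not drawn from any of the reviewed systems:

```python
import re

def extract_sample_size(text):
    """Pull candidate sample sizes like 'n = 120' or '120 patients' from text."""
    patterns = [
        r"\bn\s*=\s*(\d+)",                             # e.g., "n = 120"
        r"\b(\d+)\s+(?:patients|participants|subjects)\b",  # e.g., "120 patients"
    ]
    hits = []
    for p in patterns:
        hits += [int(m) for m in re.findall(p, text, re.I)]
    return hits
```

Pattern-based extraction like this typically yields high precision but limited recall, which is one reason the review finds statistical NLP approaches reported for the harder data elements.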
Affiliation(s)
- Siddhartha R Jonnalagadda
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 North Lake Shore Drive, 11th Floor, Chicago, IL, 60611, USA
- Pawan Goyal
- Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, 721302, West Bengal, India
- Mark D Huffman
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, USA
18
Application of text mining in the biomedical domain. Methods 2015; 74:97-106. [PMID: 25641519 DOI: 10.1016/j.ymeth.2015.01.015] [Citation(s) in RCA: 76] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2014] [Revised: 01/21/2015] [Accepted: 01/23/2015] [Indexed: 12/12/2022] Open
Abstract
In recent years, the amount of experimental data produced in biomedical research and the number of papers published in this field have grown rapidly. To keep up to date with developments in their field of interest and to interpret the outcomes of experiments in light of all available literature, researchers increasingly turn to automated literature mining. As a consequence, text mining tools have evolved considerably in number and quality and can nowadays be used to address a variety of research questions, ranging from de novo drug target discovery to enhanced biological interpretation of the results from high-throughput experiments. In this paper, we introduce the most important text mining techniques and give an overview of the text mining tools currently in use and the types of problems to which they are typically applied.