1
Balasubramanian V, Vivekanandhan S, Mahadevan V. Pandemic tele-smart: a contactless tele-health system for efficient monitoring of remotely located COVID-19 quarantine wards in India using near-field communication and natural language processing system. Med Biol Eng Comput 2021;60:61-79. PMID: 34705163; PMCID: PMC8548353; DOI: 10.1007/s11517-021-02456-1.
Abstract
Efficient remote monitoring of patients infected with coronavirus, without spreading the infection to healthcare workers, is the need of the hour. An effective and faster communication system must be established so that healthcare workers at a remote quarantine ward can communicate with healthcare professionals at specialty hospitals. Accordingly, there is a need to establish a contactless, smart, cloud-based connection between a specialty hospital and quarantine wards during a pandemic. This paper proposes an initial contactless web-based tele-health clinical decision support system that integrates near-field communication (NFC) tags with a smart cloud-based structuring tool, enabling quick diagnosis of patients with COVID-19 symptoms and monitoring of remotely located quarantine wards during the recent pandemic. The proposed framework consists of three stages: (i) contactless extraction of health parameters from the patient using an NFC tag; (ii) conversion of the medical report into digital text using an optical character recognition algorithm and extraction of relevant medical parameter values using natural language processing; and (iii) smart visualization of key medical parameters. The accuracy of the proposed system, from the NFC reader through analysis by a novel structuring algorithm deployed in the cloud, is more than 94%. Several capabilities of the proposed web-based system were compared with those of similar systems and tested in an authentic mock clinical setup, and physicians found the system reliable and user-friendly.
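As an illustration of stage (ii), the following minimal sketch shows how parameter values could be pulled from OCR'd report text with simple pattern matching. The parameter names, patterns, and sample string are hypothetical; the paper's actual structuring algorithm is not reproduced here.

```python
import re

# Hypothetical patterns for a few vital signs that an OCR'd quarantine-ward
# report might contain; the real system's parameter set and rules may differ.
PATTERNS = {
    "temperature_f": re.compile(r"temp(?:erature)?\D{0,10}(\d{2,3}(?:\.\d)?)", re.I),
    "spo2_pct":      re.compile(r"spo2\D{0,10}(\d{2,3})", re.I),
    "pulse_bpm":     re.compile(r"pulse(?: rate)?\D{0,10}(\d{2,3})", re.I),
}

def extract_parameters(ocr_text: str) -> dict:
    """Return {parameter: value} for every pattern found in the OCR output."""
    values = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            values[name] = float(match.group(1))
    return values

sample = "Temperature: 99.1 F  SpO2 - 96 %  Pulse rate 84/min"
print(extract_parameters(sample))
# {'temperature_f': 99.1, 'spo2_pct': 96.0, 'pulse_bpm': 84.0}
```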
Affiliation(s)
- Vishal Balasubramanian
- Department of Electronics & Communication Engineering, Rajalakshmi Engineering College, Chennai, 602105, India
- Sapthagirivasan Vivekanandhan
- Department of Biomedical Engineering, Rajalakshmi Engineering College, Chennai, 602105, India; Medical Devices and Healthcare Technologies Department, Engineering R&D Division, IT Service Company, Bengaluru, 560066, India
2
Li P, Jiang X, Zhang G, Trabucco JT, Raciti D, Smith C, Ringwald M, Marai GE, Arighi C, Shatkay H. Utilizing image and caption information for biomedical document classification. Bioinformatics 2021;37:i468-i476. PMID: 34252939; PMCID: PMC8346654; DOI: 10.1093/bioinformatics/btab331.
Abstract
Motivation: Biomedical research findings are typically disseminated through publications. To simplify access to domain-specific knowledge while supporting the research community, several biomedical databases devote significant effort to manual curation of the literature, a labor-intensive process. The first step toward biocuration requires identifying articles relevant to the specific area on which the database focuses. Thus, automatically identifying publications relevant to a specific topic within a large volume of publications is an important task toward expediting the biocuration process and, in turn, biomedical research. Current methods focus on textual contents, typically extracted from the title-and-abstract. Notably, images and captions are often used in publications to convey pivotal evidence about processes, experiments and results. Results: We present a new document classification scheme that uses image and caption information in addition to titles-and-abstracts. To use the image information, we introduce a new image representation, the Figure-word, based on the class labels of subfigures. We use word embeddings to represent captions and titles-and-abstracts. To utilize all three types of information, we introduce two information integration methods. The first combines Figure-words and textual features obtained from captions and titles-and-abstracts into a single larger vector for document representation; the second employs a meta-classification scheme. Our experiments and results demonstrate the usefulness of the newly proposed Figure-words for representing images. Moreover, the results showcase the value of Figure-words, captions and titles-and-abstracts in providing complementary information for document classification; when combined, these three sources of information lead to overall improved classification performance. Availability and implementation: Source code and the list of PMIDs of the publications in our datasets are available upon request.
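To make the first integration method concrete, here is a minimal sketch under assumptions of my own: a toy subfigure-label vocabulary and stand-in embeddings. Figure-words are modeled as a bag of subfigure class labels and concatenated with the caption and title-and-abstract vectors, as the abstract describes.

```python
import numpy as np

# Toy subfigure class-label vocabulary; the paper's actual Figure-word
# vocabulary is not reproduced here.
FIGURE_VOCAB = ["gel image", "fluorescence microscopy", "line chart", "diagram"]

def figure_word_vector(subfigure_labels: list) -> np.ndarray:
    """Bag of Figure-words: count each subfigure class label."""
    vec = np.zeros(len(FIGURE_VOCAB))
    for label in subfigure_labels:
        if label in FIGURE_VOCAB:
            vec[FIGURE_VOCAB.index(label)] += 1
    return vec

def combined_representation(subfigure_labels, caption_emb, title_abs_emb):
    """First integration method: one concatenated document vector."""
    return np.concatenate([figure_word_vector(subfigure_labels),
                           caption_emb, title_abs_emb])

doc_vec = combined_representation(
    ["gel image", "line chart"],  # predicted subfigure classes
    np.random.rand(300),          # stand-in caption embedding
    np.random.rand(300),          # stand-in title-and-abstract embedding
)
print(doc_vec.shape)  # (604,)
```

The second method, meta-classification, would instead train one classifier per information source and combine their outputs.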
Affiliation(s)
- Pengyuan Li
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
- Xiangying Jiang
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA; Amazon, Seattle, WA 98109, USA
- Gongbo Zhang
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA; Google, Mountain View, CA 94043, USA
- Juan Trelles Trabucco
- Department of Computer Science, The University of Illinois at Chicago, Chicago, IL 60612, USA
- Daniela Raciti
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- G Elisabeta Marai
- Department of Computer Science, The University of Illinois at Chicago, Chicago, IL 60612, USA
- Cecilia Arighi
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
- Hagit Shatkay
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
3
Leveraging Wikipedia knowledge to classify multilingual biomedical documents. Artif Intell Med 2018;88:37-57. PMID: 29730047; DOI: 10.1016/j.artmed.2018.04.007.
Abstract
This article presents a classifier that leverages Wikipedia knowledge to represent documents as vectors of concept weights, and analyses its suitability for classifying biomedical documents written in any language when it is trained only on English documents. We propose the cross-language concept matching technique, which relies on Wikipedia interlanguage links to convert concept vectors between languages. The performance of the classifier is compared with that of a classifier based on machine translation and two classifiers based on MetaMap. To perform the experiments, we created two multilingual corpora. The first, Multi-Lingual UVigoMED (ML-UVigoMED), is composed of 23,647 Wikipedia documents on biomedical topics written in English, German, French, Spanish, Italian, Galician, Romanian, and Icelandic. The second, English-French-Spanish-German UVigoMED (EFSG-UVigoMED), is composed of 19,210 biomedical abstracts extracted from MEDLINE written in English, French, Spanish, and German. The performance of the proposed approach is superior to that of every state-of-the-art classifier in the benchmark. We conclude that leveraging Wikipedia knowledge is of great advantage in multilingual classification of biomedical documents.
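The cross-language concept matching step lends itself to a compact sketch: a document is a sparse vector of Wikipedia concept weights, and interlanguage links re-key it into the English concept space. The link table and weights below are toy stand-ins, not data from the paper.

```python
# Illustrative interlanguage links: source-language article title -> English title.
INTERLANGUAGE_LINKS = {
    "Diabetes mellitus": "Diabetes",
    "Hipertensión arterial": "Hypertension",
    "Insulina": "Insulin",
}

def to_english_space(concept_vector: dict) -> dict:
    """Re-key a concept-weight vector via interlanguage links; concepts
    without an English counterpart are dropped."""
    converted = {}
    for concept, weight in concept_vector.items():
        english = INTERLANGUAGE_LINKS.get(concept)
        if english is not None:
            converted[english] = converted.get(english, 0.0) + weight
    return converted

spanish_doc = {"Diabetes mellitus": 0.7, "Insulina": 0.4, "Páncreas": 0.2}
print(to_english_space(spanish_doc))
# {'Diabetes': 0.7, 'Insulin': 0.4}  -- 'Páncreas' has no link in this toy table
```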
4
5
Mouriño-García MA, Pérez-Rodríguez R, Anido-Rifón LE. A Bag of Concepts Approach for Biomedical Document Classification Using Wikipedia Knowledge: Spanish-English Cross-language Case Study. Methods Inf Med 2017;56:370-376. PMID: 28816337; DOI: 10.3414/me17-01-0028.
Abstract
OBJECTIVES: The ability to efficiently review the existing literature is essential for the rapid progress of research. This paper describes a classifier of text documents, represented as vectors in spaces of Wikipedia concepts, and analyses its suitability for the classification of Spanish biomedical documents when only English documents are available for training. We propose the cross-language concept matching (CLCM) technique, which relies on Wikipedia interlanguage links to convert concept vectors from the Spanish to the English space. METHODS: The performance of the classifier is compared with several baselines: a classifier based on machine translation, a classifier that represents documents using Explicit Semantic Analysis (ESA), and a classifier that uses a domain-specific semantic annotator (MetaMap). The corpus used for the experiments (Cross-Language UVigoMED) was purpose-built for this study and is composed of 12,832 English and 2,184 Spanish MEDLINE abstracts. RESULTS: The performance of our approach is superior to that of every other state-of-the-art classifier in the benchmark, with increases of up to 124% over classical machine translation, 332% over MetaMap, and a factor of 60 over the ESA-based classifier. The results are statistically significant, with p-values < 0.0001. CONCLUSION: By using knowledge mined from Wikipedia to represent documents as vectors in a space of Wikipedia concepts and translating vectors between language-specific concept spaces, a cross-language classifier can be built, and it performs better than several state-of-the-art classifiers.
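Once CLCM has moved a Spanish document into the English concept space, classification can proceed entirely in that space. The nearest-centroid classifier below is a stand-in of my own (the abstract does not specify this particular classifier); it only illustrates that training and test documents become comparable in one shared space.

```python
import math

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(doc_vector: dict, centroids: dict) -> str:
    """Assign the class whose centroid (built from English training
    documents) is closest in the shared concept space."""
    return max(centroids, key=lambda c: cosine(doc_vector, centroids[c]))

centroids = {  # toy centroids, not derived from the UVigoMED corpus
    "cardiology":    {"Hypertension": 0.9, "Heart": 0.8},
    "endocrinology": {"Diabetes": 0.9, "Insulin": 0.7},
}
converted_doc = {"Diabetes": 0.7, "Insulin": 0.4}  # output of CLCM
print(classify(converted_doc, centroids))          # endocrinology
```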
6
7
Rybinski M, Aldana-Montes JF. tESA: a distributional measure for calculating semantic relatedness. J Biomed Semantics 2016;7:67. PMID: 28031037; PMCID: PMC5192592; DOI: 10.1186/s13326-016-0109-6.
Abstract
BACKGROUND: Semantic relatedness is a measure that quantifies the strength of a semantic link between two concepts. Often, it can be efficiently approximated with methods that operate on the words that represent these concepts. Approximating the semantic relatedness between texts and the concepts they represent is an important part of many text and knowledge processing tasks of crucial importance in the ever-growing domain of biomedical informatics. The problem with most state-of-the-art methods for calculating semantic relatedness is their dependence on highly specialized, structured knowledge resources, which makes these methods poorly adaptable to many usage scenarios. On the other hand, domain knowledge in the Life Sciences has become more and more accessible, but mostly in its unstructured form, as texts in large document collections, which makes its use more challenging for automated processing. In this paper we present tESA, an extension of the well-known Explicit Semantic Analysis (ESA) method. RESULTS: In our extension we use two separate sets of vectors, corresponding to different sections of the articles from the underlying corpus of documents, as opposed to the original method, which uses only a single vector space. We present an evaluation of the applicability of both tESA and domain-adapted Explicit Semantic Analysis to the Life Sciences domain. The methods are tested against a set of standard benchmarks established for the evaluation of biomedical semantic relatedness quality. Our experiments show that the proposed method achieves results comparable with or superior to the current state-of-the-art methods. Additionally, a comparative discussion of the results obtained with tESA and ESA is presented, together with a study of the adaptability of the methods to different corpora and their performance with different input parameters. CONCLUSIONS: Our findings suggest that the combined use of the semantics from different sections of the documents of scientific corpora (i.e., extending the original ESA methodology with the use of title vectors) can enhance the performance of distributional semantic relatedness measures, as observed on the largest reference datasets. We also present the impact of the proposed extension on the size of the distributional representations.
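The two-space idea behind tESA can be sketched as follows: each term gets one distributional vector built from article bodies and one from article titles, and the final relatedness combines similarities from both spaces. The inverted indexes below are toy data, and the plain average is my own stand-in for the combination actually used by tESA.

```python
import numpy as np

def esa_vector(term: str, index: dict, n_docs: int) -> np.ndarray:
    """ESA-style representation: the term's weight in each corpus document
    (here taken from a toy inverted index {term: per-document weights})."""
    return np.array(index.get(term, [0.0] * n_docs))

def tesa_relatedness(t1, t2, body_index, title_index, n_docs):
    """Combine similarities from the body-section space and the title space."""
    def cos(u, v):
        d = np.linalg.norm(u) * np.linalg.norm(v)
        return float(u @ v / d) if d else 0.0
    body  = cos(esa_vector(t1, body_index, n_docs),
                esa_vector(t2, body_index, n_docs))
    title = cos(esa_vector(t1, title_index, n_docs),
                esa_vector(t2, title_index, n_docs))
    return (body + title) / 2  # stand-in combination, not the paper's formula

body_idx  = {"insulin": [0.9, 0.1, 0.0], "glucose": [0.8, 0.2, 0.1]}
title_idx = {"insulin": [1.0, 0.0, 0.0], "glucose": [1.0, 0.0, 0.0]}
print(tesa_relatedness("insulin", "glucose", body_idx, title_idx, 3))
```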
Affiliation(s)
- Maciej Rybinski
- Departamento LCC, University of Malaga, Campus Teatinos, Malaga, 29010, Spain
8
Lachiany M, Louzoun Y. Effects of distribution of infection rate on epidemic models. Phys Rev E 2016;94:022409. PMID: 27627337; PMCID: PMC7088461; DOI: 10.1103/physreve.94.022409.
Abstract
A goal of many epidemic models is to compute the outcome of an epidemic from the observed early dynamics of infection. However, the total number of infected individuals at the end of an epidemic is often much lower than predicted from the early dynamics. This discrepancy is argued to result from human intervention or from nonlinear dynamics not incorporated in standard models. We show that when variability in infection rates is included in standard susceptible-infected-susceptible (SIS) and susceptible-infected-recovered (SIR) models, the total number of infected individuals in the late dynamics can be orders of magnitude lower than predicted from the early dynamics. This discrepancy holds for SIS and SIR models in which the assumption that all individuals have the same sensitivity is eliminated. In contrast with network models, fixed partnerships are not assumed. We derive a moment closure scheme that captures the distribution of sensitivities. We find that the shape of the sensitivity distribution does not affect R0 or the number of infected individuals in the early phases of the epidemic. However, a wide distribution of sensitivities reduces the total number of removed individuals in the SIR model and the steady-state infected fraction in the SIS model. The difference between the early and late dynamics implies that, in order to extrapolate the expected effect of an epidemic from its initial phase, the rate of change in the average infectivity should be computed. These results are supported by a comparison of the theoretical model with the Ebola epidemic and by numerical simulations.
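The central observation, that heterogeneity in susceptibility lowers the final epidemic size at a fixed initial R0, is easy to reproduce numerically. The forward-Euler simulation below is a toy illustration with an arbitrary two-group susceptibility distribution, not the paper's moment-closure derivation.

```python
import numpy as np

def sir_final_size(betas, weights, gamma=1.0, dt=0.01, t_max=200.0):
    """SIR with group-specific infection rates betas[i]; weights[i] is the
    initial population fraction of group i. Returns the removed fraction."""
    betas = np.asarray(betas, dtype=float)
    S = np.asarray(weights, dtype=float) * 0.999  # susceptibles per group
    I, R = 0.001, 0.0                             # initial infected fraction
    for _ in range(int(t_max / dt)):
        new_inf = np.sum(betas * S) * I * dt      # infections summed over groups
        recov = gamma * I * dt
        S -= betas * S * I * dt
        I += new_inf - recov
        R += recov
    return R

# Same mean infection rate (mean beta = 2, so initial R0 = 2) in both runs:
print(sir_final_size(betas=[2.0],      weights=[1.0]))       # ~0.80
print(sir_final_size(betas=[0.2, 3.8], weights=[0.5, 0.5]))  # ~0.46, much lower
```

The early growth rate is identical in both runs; only the late-time outcome differs, which is exactly the discrepancy the abstract describes.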
Affiliation(s)
- Yoram Louzoun
- Gonda Brain Research Center and Department of Mathematics, Bar-Ilan University, Ramat Gan 52900, Israel
9
Bui DDA, Del Fiol G, Jonnalagadda S. PDF text classification to leverage information extraction from publication reports. J Biomed Inform 2016;61:141-8. PMID: 27044929; DOI: 10.1016/j.jbi.2016.03.026.
Abstract
OBJECTIVES: Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges to the underlying natural language processing algorithm. Our goal is to categorize PDF texts for strategic use by IE systems. METHODS: We used an open-source tool to extract raw text from PDF documents and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, compared with that of a machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. RESULTS: The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than that of the best-performing machine learning classifier, which used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and METADATA (+14.2%). In addition, using the algorithm to filter semi-structured texts and publication metadata improved the performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced the number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing-time reduction of 50% (p=0.005). CONCLUSIONS: The rule-based multi-pass sieve framework can be used effectively to categorize texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents.
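A minimal sketch of the multi-pass sieve idea: rule passes run in a fixed order, the first pass that fires assigns the category, and BODYTEXT is the fallback. The individual rules below are illustrative guesses, not the sieves from the paper.

```python
def looks_like_metadata(text, position):
    return any(k in text.lower() for k in ("doi:", "copyright", "received:"))

def looks_like_title(text, position):
    return position == 0 and len(text.split()) < 30

def looks_like_abstract(text, position):
    return text.lower().lstrip().startswith("abstract")

def looks_like_semistructure(text, position):
    return text.count("\t") > 2 or text.count("  ") > 4

SIEVES = [  # order matters: higher-precision passes run first
    ("METADATA",      looks_like_metadata),
    ("TITLE",         looks_like_title),
    ("ABSTRACT",      looks_like_abstract),
    ("SEMISTRUCTURE", looks_like_semistructure),
]

def classify_snippet(text: str, position: int) -> str:
    """Classify one PDF text snippet by its content and position on the page."""
    for category, rule in SIEVES:
        if rule(text, position):
            return category
    return "BODYTEXT"  # default when no sieve fires

print(classify_snippet("PDF text classification to leverage ...", 0))  # TITLE
print(classify_snippet("Patients were randomized into two arms.", 7))  # BODYTEXT
```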
Affiliation(s)
- Duy Duc An Bui
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; Department of Preventive Medicine-Health and Biomedical Informatics, Northwestern University, Chicago, IL, USA
- Guilherme Del Fiol
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
- Siddhartha Jonnalagadda
- Department of Preventive Medicine-Health and Biomedical Informatics, Northwestern University, Chicago, IL, USA