1
Kirmani M, Kour G, Mohd M, Sheikh N, Khan DA, Maqbool Z, Wani MA, Wani AH. Biomedical semantic text summarizer. BMC Bioinformatics 2024; 25:152. [PMID: 38627652 PMCID: PMC11022460 DOI: 10.1186/s12859-024-05712-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2023] [Accepted: 02/19/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND Text summarization is a challenging problem in Natural Language Processing, which involves condensing the content of textual documents without losing their overall meaning and information content. In the domain of bio-medical research, summaries are critical for efficient data analysis and information retrieval. While several bio-medical text summarizers exist in the literature, they often miss out on an essential text aspect: text semantics. RESULTS This paper proposes a novel extractive summarizer that preserves text semantics by utilizing bio-semantic models. We evaluate our approach using ROUGE on a standard dataset and compare it with three state-of-the-art summarizers. Our results show that our approach outperforms existing summarizers. CONCLUSION The usage of semantics can improve summarizer performance and lead to better summaries. Our summarizer has the potential to aid in efficient data analysis and information retrieval in the field of biomedical research.
Affiliation(s)
- Mahira Kirmani
  - University Institute of Computing, Chandigarh University, NH-05-Chandigarh-Ludhiana, Mohali, Punjab, India
- Gagandeep Kour
  - University Institute of Computing, Chandigarh University, NH-05-Chandigarh-Ludhiana, Mohali, Punjab, India
- Mudasir Mohd
  - Department of Computer Science, University of Kashmir, South Campus, Anantnag, Jammu and Kashmir, India
- Zahid Maqbool
  - Department of Computer Science, Government Degree College Bemina, Srinagar, Jammu and Kashmir, India
- Mohsin Altaf Wani
  - Department of Computer Science, University of Kashmir, South Campus, Anantnag, Jammu and Kashmir, India
- Abid Hussain Wani
  - Department of Computer Science, University of Kashmir, South Campus, Anantnag, Jammu and Kashmir, India
2
Guo Y, Qiu W, Leroy G, Wang S, Cohen T. Retrieval augmentation of large language models for lay language generation. J Biomed Inform 2024; 149:104580. [PMID: 38163514 PMCID: PMC10874606 DOI: 10.1016/j.jbi.2023.104580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Revised: 12/05/2023] [Accepted: 12/17/2023] [Indexed: 01/03/2024]
Abstract
The complex linguistic structures and specialized terminology of expert-authored content limit the accessibility of biomedical literature to the general public. Automated methods have the potential to render this literature more interpretable to readers with different educational backgrounds. Prior work has framed such lay language generation as a summarization or simplification task. However, adapting biomedical text for the lay public includes the additional and distinct task of background explanation: adding external content in the form of definitions, motivation, or examples to enhance comprehensibility. This task is especially challenging because the source document may not include the required background knowledge. Furthermore, background explanation capabilities have yet to be formally evaluated, and little is known about how best to enhance them. To address this problem, we introduce Retrieval-Augmented Lay Language (RALL) generation, which intuitively fits the need for external knowledge beyond that in expert-authored source documents. In addition, we introduce CELLS, the largest (63k pairs) and broadest-ranging (12 journals) parallel corpus for lay language generation. To evaluate RALL, we augmented state-of-the-art text generation models with information retrieval of either term definitions from the UMLS and Wikipedia, or embeddings of explanations from Wikipedia documents. Of these, embedding-based RALL models improved summary quality and simplicity while maintaining factual correctness, suggesting that Wikipedia is a helpful source for background explanation in this context. We also evaluated the ability of both an open-source Large Language Model (Llama 2) and a closed-source Large Language Model (GPT-4) in background explanation, with and without retrieval augmentation. Results indicate that these LLMs can generate simplified content, but that the summary quality is not ideal. 
Taken together, this work presents the first comprehensive study of background explanation for lay language generation, paving the path for disseminating scientific knowledge to a broader audience. Our code and data are publicly available at: https://github.com/LinguisticAnomalies/pls_retrieval.
Affiliation(s)
- Yue Guo
  - Biomedical and Health Informatics, University of Washington, United States of America
- Wei Qiu
  - Paul G. Allen School of Computer Science, University of Washington, United States of America
- Gondy Leroy
  - Management Information Systems, University of Arizona, United States of America
- Sheng Wang
  - Paul G. Allen School of Computer Science, University of Washington, United States of America
- Trevor Cohen
  - Biomedical and Health Informatics, University of Washington, United States of America
3
Keszthelyi D, Gaudet-Blavignac C, Bjelogrlic M, Lovis C. Patient Information Summarization in Clinical Settings: Scoping Review. JMIR Med Inform 2023; 11:e44639. [PMID: 38015588 DOI: 10.2196/44639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 03/15/2023] [Accepted: 07/25/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND Information overflow, a common problem in the present clinical environment, can be mitigated by summarizing clinical data. Although there are several solutions for clinical summarization, there is a lack of a complete overview of the research relevant to this field. OBJECTIVE This study aims to identify state-of-the-art solutions for clinical summarization, to analyze their capabilities, and to identify their properties. METHODS A scoping review of articles published between 2005 and 2022 was conducted. With a clinical focus, PubMed and Web of Science were queried to find an initial set of reports, later extended by articles found through a chain of citations. The included reports were analyzed to answer the questions of where, what, and how medical information is summarized; whether summarization conserves temporality, uncertainty, and medical pertinence; and how the propositions are evaluated and deployed. To answer how information is summarized, methods were compared through a new framework "collect-synthesize-communicate" referring to information gathering from data, its synthesis, and communication to the end user. RESULTS Overall, 128 articles were included, representing various medical fields. Exclusively structured data were used as input in 46.1% (59/128) of papers, text in 41.4% (53/128) of articles, and both in 10.2% (13/128) of papers. Using the proposed framework, 42.2% (54/128) of the records contributed to information collection, 27.3% (35/128) contributed to information synthesis, and 46.1% (59/128) presented solutions for summary communication. Numerous summarization approaches have been presented, including extractive (n=13) and abstractive summarization (n=19); topic modeling (n=5); summary specification (n=11); concept and relation extraction (n=30); visual design considerations (n=59); and complete pipelines (n=7) using information extraction, synthesis, and communication. 
Graphical displays (n=53), short texts (n=41), static reports (n=7), and problem-oriented views (n=7) were the most common types in terms of summary communication. Although temporality and uncertainty information were usually not conserved in most studies (74/128, 57.8% and 113/128, 88.3%, respectively), some studies presented solutions to treat this information. Overall, 115 (89.8%) articles showed results of an evaluation, and methods included evaluations with human participants (median 15, IQR 24 participants): measurements in experiments with human participants (n=31), real situations (n=8), and usability studies (n=28). Methods without human involvement included intrinsic evaluation (n=24), performance on a proxy (n=10), or domain-specific tasks (n=11). Overall, 11 (8.6%) reports described a system deployed in clinical settings. CONCLUSIONS The scientific literature contains many propositions for summarizing patient information but reports very few comparisons of these proposals. This work proposes to compare these algorithms through how they conserve essential aspects of clinical information and through the "collect-synthesize-communicate" framework. We found that current propositions usually address these 3 steps only partially. Moreover, they conserve and use temporality, uncertainty, and pertinent medical aspects to varying extents, and solutions are often preliminary.
Affiliation(s)
- Daniel Keszthelyi
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christophe Gaudet-Blavignac
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Mina Bjelogrlic
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Christian Lovis
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
4
Rani S, Jain A. Optimizing healthcare system by amalgamation of text processing and deep learning: a systematic review. Multimedia Tools and Applications 2023:1-25. [PMID: 37362695 PMCID: PMC10183315 DOI: 10.1007/s11042-023-15539-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Revised: 05/18/2022] [Accepted: 04/19/2023] [Indexed: 06/28/2023]
Abstract
The explosion of clinical textual data has drawn the attention of researchers. Owing to the abundance of clinical data, it is becoming difficult for healthcare professionals to take real-time measures. The tools and methods are lacking when compared to the amount of clinical data generated every day. This review aims to survey the text processing pipeline with deep learning methods such as CNN, RNN, LSTM, and GRU in the healthcare domain and discuss various applications such as clinical concept detection and extraction, medically aware dialogue systems, sentiment analysis of drug reviews shared online, clinical trial matching, and pharmacovigilance. In addition, we highlighted the major challenges in deploying text processing with deep learning to clinical textual data and identified the scope of research in this domain. Furthermore, we have discussed various resources that can be used in the future to optimize the healthcare domain by amalgamating text processing and deep learning.
Affiliation(s)
- Somiya Rani
  - Department of Computer Science and Engineering, NSUT East Campus (erstwhile AIACTR), Affiliated to Guru Gobind Singh Indraprastha University, Delhi, India
- Amita Jain
  - Department of Computer Science and Engineering, Netaji Subhas University of Technology, Delhi, India
5
Luo M, Li S, Pang Y, Yao L, Ma R, Huang HY, Huang HD, Lee TY. Extraction of microRNA-target interaction sentences from biomedical literature by deep learning approach. Brief Bioinform 2023; 24:6847797. [PMID: 36440972 DOI: 10.1093/bib/bbac497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Revised: 10/16/2022] [Accepted: 10/19/2022] [Indexed: 11/29/2022] Open
Abstract
MicroRNA (miRNA)-target interaction (MTI) plays a substantial role in various cell activities, molecular regulations and physiological processes. Published biomedical literature is the carrier of high-confidence MTI knowledge. However, digging out this knowledge in an efficient manner from large-scale published articles remains challenging. To address this issue, we were motivated to construct a deep learning-based model. We applied the pre-trained language models to biomedical text to obtain the representation, and subsequently fed them into a deep neural network with gate mechanism layers and a fully connected layer for the extraction of MTI information sentences. Performances of the proposed models were evaluated using two datasets constructed on the basis of text data obtained from miRTarBase. The validation and test results revealed that incorporating both PubMedBERT and SciBERT for sentence level encoding with the long short-term memory (LSTM)-based deep neural network can yield an outstanding performance, with both F1 and accuracy being higher than 80% on validation data and test data. Additionally, the proposed deep learning method outperformed the following machine learning methods: random forest, support vector machine, logistic regression and bidirectional LSTM. This work would greatly facilitate studies on MTI analysis and regulations. It is anticipated that this work can assist in large-scale screening of miRNAs, thereby revealing their functional roles in various diseases, which is important for the development of highly specific drugs with fewer side effects. Source code and corpus are publicly available at https://github.com/qi29.
Affiliation(s)
- Mengqi Luo
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China; School of Life Sciences, University of Science and Technology of China, Hefei, China
- Shangfu Li
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen
- Yuxuan Pang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, PR China, and also in the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, PR China
- Lantian Yao
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, PR China, and also in the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, PR China
- Renfei Ma
  - Warshel Institute for Computational Biology, Chinese University of Hong Kong, Shenzhen; School of Life Sciences, University of Science and Technology of China, Hefei, China
- Hsi-Yuan Huang
  - School of Medicine and the Warshel Institute of Computational Biology, The Chinese University of Hong Kong, Shenzhen
- Hsien-Da Huang
  - School of Medicine, and the executive director of Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen
- Tzong-Yi Lee
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
6
Abstract
In recent years, the evolution of technology has led to an increase in text data obtained from many sources. In the biomedical domain, text information has also evidenced this accelerated growth, and automatic text summarization systems play an essential role in optimizing physicians’ time resources and identifying relevant information. In this paper, we present a systematic review in recent research of text summarization for biomedical textual data, focusing mainly on the methods employed, type of input data text, areas of application, and evaluation metrics used to assess systems. The survey was limited to the period between 1st January 2014 and 15th March 2022. The data collected was obtained from WoS, IEEE, and ACM digital libraries, while the search strategies were developed with the help of experts in NLP techniques and previous systematic reviews. The four phases of a systematic review by PRISMA methodology were conducted, and five summarization factors were determined to assess the studies included: Input, Purpose, Output, Method, and Evaluation metric. Results showed that 3.5% of 801 studies met the inclusion criteria. Moreover, Single-document, Biomedical Literature, Generic, and Extractive summarization proved to be the most common approaches employed, while techniques based on Machine Learning were performed in 16 studies and Rouge (Recall-Oriented Understudy for Gisting Evaluation) was reported as the evaluation metric in 26 studies. This review found that in recent years, more transformer-based methodologies for summarization purposes have been implemented compared to a previous survey. Additionally, there are still some challenges in text summarization in different domains, especially in the biomedical field in terms of demand for further research.
7
Du Y, Zhao Y, Yan J, Li Q. UGDAS: Unsupervised Graph-Network based Denoiser for Abstractive Summarization in biomedical domain. Methods 2022; 203:160-166. [DOI: 10.1016/j.ymeth.2022.03.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Revised: 12/14/2021] [Accepted: 03/20/2022] [Indexed: 10/18/2022] Open
8
Ozyegen O, Kabe D, Cevik M. Word-level text highlighting of medical texts for telehealth services. Artif Intell Med 2022; 127:102284. [DOI: 10.1016/j.artmed.2022.102284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/02/2022]
9
Kalyan KS, Rajasekharan A, Sangeetha S. AMMU: A survey of transformer-based biomedical pretrained language models. J Biomed Inform 2021; 126:103982. [PMID: 34974190 DOI: 10.1016/j.jbi.2021.103982] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 05/14/2021] [Revised: 12/12/2021] [Accepted: 12/20/2021] [Indexed: 01/04/2023]
Abstract
Transformer-based pretrained language models (PLMs) have started a new era in modern natural language processing (NLP). These models combine the power of transformers, transfer learning, and self-supervised learning (SSL). Following the success of these models in the general domain, the biomedical research community has developed various in-domain PLMs starting from BioBERT to the latest BioELECTRA and BioALBERT models. We strongly believe there is a need for a survey paper that can provide a comprehensive survey of various transformer-based biomedical pretrained language models (BPLMs). In this survey, we start with a brief overview of foundational concepts like self-supervised learning, embedding layer and transformer encoder layers. We discuss core concepts of transformer-based PLMs like pretraining methods, pretraining tasks, fine-tuning methods, and various embedding types specific to biomedical domain. We introduce a taxonomy for transformer-based BPLMs and then discuss all the models. We discuss various challenges and present possible solutions. We conclude by highlighting some of the open issues which will drive the research community to further improve transformer-based BPLMs. The list of all the publicly available transformer-based BPLMs along with their links is provided at https://mr-nlp.github.io/posts/2021/05/transformer-based-biomedical-pretrained-language-models-list/.
10
Jing X. The Unified Medical Language System at 30 Years and How It Is Used and Published: Systematic Review and Content Analysis. JMIR Med Inform 2021; 9:e20675. [PMID: 34236337 PMCID: PMC8433943 DOI: 10.2196/20675] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/25/2020] [Revised: 11/25/2020] [Accepted: 07/02/2021] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND The Unified Medical Language System (UMLS) has been a critical tool in biomedical and health informatics, and the year 2021 marks its 30th anniversary. The UMLS brings together many broadly used vocabularies and standards in the biomedical field to facilitate interoperability among different computer systems and applications. OBJECTIVE Despite its longevity, there is no comprehensive publication analysis of the use of the UMLS. Thus, this review and analysis is conducted to provide an overview of the UMLS and its use in English-language peer-reviewed publications, with the objective of providing a comprehensive understanding of how the UMLS has been used in English-language peer-reviewed publications over the last 30 years. METHODS PubMed, ACM Digital Library, and the Nursing & Allied Health Database were used to search for studies. The primary search strategy was as follows: UMLS was used as a Medical Subject Headings term or a keyword or appeared in the title or abstract. Only English-language publications were considered. The publications were screened first, then coded and categorized iteratively, following the grounded theory. The review process followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. RESULTS A total of 943 publications were included in the final analysis. Moreover, 32 publications were categorized into 2 categories; hence the total number of publications before duplicates are removed is 975. 
After analysis and categorization of the publications, UMLS was found to be used in the following emerging themes or areas (the number of publications and their respective percentages are given in parentheses): natural language processing (230/975, 23.6%), information retrieval (125/975, 12.8%), terminology study (90/975, 9.2%), ontology and modeling (80/975, 8.2%), medical subdomains (76/975, 7.8%), other language studies (53/975, 5.4%), artificial intelligence tools and applications (46/975, 4.7%), patient care (35/975, 3.6%), data mining and knowledge discovery (25/975, 2.6%), medical education (20/975, 2.1%), degree-related theses (13/975, 1.3%), digital library (5/975, 0.5%), and the UMLS itself (150/975, 15.4%), as well as the UMLS for other purposes (27/975, 2.8%). CONCLUSIONS The UMLS has been used successfully in patient care, medical education, digital libraries, and software development, as originally planned, as well as in degree-related theses, the building of artificial intelligence tools, data mining and knowledge discovery, foundational work in methodology, and middle layers that may lead to advanced products. Natural language processing, the UMLS itself, and information retrieval are the 3 most common themes that emerged among the included publications. The results, although largely related to academia, demonstrate that UMLS achieves its intended uses successfully, in addition to achieving uses broadly beyond its original intentions.
Affiliation(s)
- Xia Jing
  - Department of Public Health Sciences, College of Behavioral, Social and Health Sciences, Clemson University, Clemson, SC, United States
11
Wang M, Wang M, Yu F, Yang Y, Walker J, Mostafa J. A systematic review of automatic text summarization for biomedical literature and EHRs. J Am Med Inform Assoc 2021; 28:2287-2297. [PMID: 34338801 DOI: 10.1093/jamia/ocab143] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/06/2021] [Revised: 06/21/2021] [Accepted: 06/24/2021] [Indexed: 01/09/2023] Open
Abstract
OBJECTIVE Biomedical text summarization helps biomedical information seekers avoid information overload by reducing the length of a document while preserving the contents' essence. Our systematic review investigates the most recent biomedical text summarization researches on biomedical literature and electronic health records by analyzing their techniques, areas of application, and evaluation methods. We identify gaps and propose potential directions for future research. MATERIALS AND METHODS This review followed the PRISMA methodology and replicated the approaches adopted by the previous systematic review published on the same topic. We searched 4 databases (PubMed, ACM Digital Library, Scopus, and Web of Science) from January 1, 2013 to April 8, 2021. Two reviewers independently screened title, abstract, and full-text for all retrieved articles. The conflicts were resolved by the third reviewer. The data extraction of the included articles was in 5 dimensions: input, purpose, output, method, and evaluation. RESULTS Fifty-eight out of 7235 retrieved articles met the inclusion criteria. Thirty-nine systems used single-document biomedical research literature as their input, 17 systems were explicitly designed for clinical support, 47 systems generated extractive summaries, and 53 systems adopted hybrid methods combining computational linguistics, machine learning, and statistical approaches. As for the assessment, 51 studies conducted an intrinsic evaluation using predefined metrics. DISCUSSION AND CONCLUSION This study found that current biomedical text summarization systems have achieved good performance using hybrid methods. Studies on electronic health records summarization have been increasing compared to a previous survey. However, the majority of the works still focus on summarizing literature.
Affiliation(s)
- Mengqian Wang
  - Carolina Health Informatics Program, University of North Carolina, Chapel Hill, North Carolina, USA
- Manhua Wang
  - iSchool, University of North Carolina, Chapel Hill, North Carolina, USA
- Fei Yu
  - iSchool, University of North Carolina, Chapel Hill, North Carolina, USA
  - Health Sciences Library, University of North Carolina, Chapel Hill, North Carolina, USA
- Yue Yang
  - Carolina Health Informatics Program, University of North Carolina, Chapel Hill, North Carolina, USA
- Jennifer Walker
  - Health Sciences Library, University of North Carolina, Chapel Hill, North Carolina, USA
- Javed Mostafa
  - Carolina Health Informatics Program, University of North Carolina, Chapel Hill, North Carolina, USA
  - iSchool, University of North Carolina, Chapel Hill, North Carolina, USA
  - Biomedical Research Imaging Center, the School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA
12
Exploring and predicting mortality among patients with end-stage liver disease without cancer: a machine learning approach. Eur J Gastroenterol Hepatol 2021; 33:1117-1123. [PMID: 33905216 DOI: 10.1097/meg.0000000000002169] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]
Abstract
OBJECTIVE End-stage liver disease is a global public health problem with a high mortality rate. Early identification of people at risk of poor prognosis is fundamental for decision-making in clinical settings. This study created a machine learning prediction system that provides several related models with visualized graphs, including decision trees, ensemble learning and clustering, to predict mortality in patients with end-stage liver disease. METHODS A retrospective cohort study was conducted: the training data were from patients enrolled from January 2009 to December 2010 and followed up to December 2014; validation data were from patients enrolled from January 2015 to December 2016 and followed up to January 2019. Hospitalized patients with noncancer-related chronic liver disease were identified from the hospital's electronic medical records. RESULTS In the traditional multivariable logistic regression and Cox proportional hazards models, the prothrombin time of international normalized ratio was significant (P value = 0.002, odds ratio = 2.790, hazard ratio = 1.363). Blood urea nitrogen and C-reactive protein were also significant, with P values of <0.001 and 0.026, respectively. The area under the receiver operating characteristic curve was 0.771. In machine learning, blood urea nitrogen and age were regarded as the primary factors for predicting mortality. Creatinine, prothrombin time of international normalized ratio and bilirubin were also significant mortality predictors. The areas under the curve of the random forest and AdaBoost models were 0.838 and 0.792, respectively. CONCLUSION The machine learning techniques provided a comprehensive assessment of patient conditions; they could help physicians make an accurate diagnosis of chronic liver disease and improve healthcare management.
13
Mallick C, Das AK, Ding W, Nayak J. Ensemble summarization of bio-medical articles integrating clustering and multi-objective evolutionary algorithms. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107347] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022]
14
Mallick C, Das AK, Nayak J, Pelusi D, Vimal S. Evolutionary Algorithm based Ensemble Extractive Summarization for Developing Smart Medical System. Interdiscip Sci 2021; 13:229-259. [PMID: 33576956 DOI: 10.1007/s12539-020-00412-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/05/2020] [Revised: 12/17/2020] [Accepted: 12/21/2020] [Indexed: 11/25/2022]
Abstract
The amount of information in the scientific literature of the bio-medical domain is growing exponentially, which makes it difficult to develop a smart medical system. Summarization techniques help in efficiently searching and understanding relevant information from medical documents. In the paper, an evolutionary algorithm based ensemble extractive summarization technique is devised as a smart medical application with the idea of hybrid artificial intelligence on natural language processing. We have considered the abstracts of the target article and its cited articles as the base summaries and a multi-objective evolutionary algorithm is applied for generating the ensemble summary of the target article. Each sentence of the base summaries is represented by a concept vector of the medical terms contained in it with the help of the Unified Medical Language System (UMLS) tool, which is widely used in various smart medical applications. These terms carry the key information of the sentence, which is very useful for finding the semantic similarity among the sentences. Fitness functions of the evolutionary algorithm are mainly defined using the clustering coefficient and sparsity index, concepts from graph theory. After the convergence of the algorithm, the best solution of the final population gives the ensemble summary. Next, the semantic similarity of each sentence in the target article with the ensemble summary is calculated and the sentences which are most similar to the ensemble summary are considered as the summary of the target article. The method is applied to the articles available in the PubMed MEDLINE database system and experimental results are compared with some state of the art methods applied in the Bio-medical domain. 
Experimental results and comparative study based on the performance evaluation show that the method competes with some recently proposed summarization methods and outperforms others, which express the effectiveness of the proposed methodology. Different statistical tests have also been made to observe that the method is statistically significant.
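The final selection step described in this abstract (scoring each sentence of the target article against the ensemble summary and keeping the most similar ones) can be sketched roughly as follows. This is an illustrative sketch only: the abstract does not name the similarity measure, so cosine similarity over bag-of-concept counts is an assumption, and `concept_vector`, `cosine`, `select_sentences`, and the toy concept lists are hypothetical names and data, not the authors' implementation.

```python
import math
from collections import Counter


def concept_vector(concepts):
    """Bag-of-concepts vector: concept identifier -> occurrence count."""
    return Counter(concepts)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_sentences(article, ensemble_concepts, k):
    """Indices of the k sentences most similar to the ensemble summary, in document order."""
    target = concept_vector(ensemble_concepts)
    scored = [(cosine(concept_vector(c), target), i) for i, c in enumerate(article)]
    # Highest similarity first; ties broken by earlier position in the article.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return sorted(i for _, i in top)


# Toy input: each sentence is already reduced to its medical concepts
# (in the paper this mapping comes from UMLS; here it is hand-written).
article = [
    ["diabetes", "insulin"],
    ["hospital", "funding"],
    ["insulin", "glucose", "diabetes"],
    ["weather"],
]
ensemble = ["diabetes", "insulin", "glucose"]
chosen = select_sentences(article, ensemble, k=2)
```

Returning the selected indices in document order keeps the extracted summary readable; the ranking itself only decides which sentences are kept.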
Affiliation(s)
- Chirantana Mallick
  - Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, 711103, India
- Asit Kumar Das
  - Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, 711103, India
- Janmenjoy Nayak
  - Department of Computer Science and Engineering, Aditya Institute of Technology and Management (AITAM), Tekkali, Andhra Pradesh, 532201, India
- Danilo Pelusi
  - Department of Communications Sciences, University of Teramo, Teramo, Italy
- S Vimal
  - Department of Information Technology, National Engineering College, K.R.Nagar, Kovilpatti, Thoothukudi District, Tamilnadu, 628503, India
|
15
|
Yu CS, Chang SS, Lin CH, Lin YJ, Wu JL, Chen RJ. Identify the Characteristics of Metabolic Syndrome and Non-obese Phenotype: Data Visualization and a Machine Learning Approach. Front Med (Lausanne) 2021; 8:626580. [PMID: 33898478 PMCID: PMC8058220 DOI: 10.3389/fmed.2021.626580] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/06/2020] [Accepted: 03/08/2021] [Indexed: 12/16/2022] Open
Abstract
Introduction: A third of the world's population is classified as having Metabolic Syndrome (MetS). Traditional diagnostic criteria for MetS are based on three or more of five components. However, the outcomes of patients with different combinations of specific metabolic components are undefined. Because the related research is still insufficient, it remains challenging to identify such patients and to introduce treatment in advance for intervention. Methods: This retrospective cohort study attempted to establish a method of visualizing metabolic components by using unsupervised machine learning and treemap technology to discover the relations between predicting factors and different metabolic components. Several supervised machine-learning models were used to explore significant predictors of MetS and to construct a powerful prediction model for preventive medicine. Results: The random forest had the best performance, with accuracy and c-statistic of 0.947 and 0.921, respectively, and identified body mass index, glycated hemoglobin, and controlled attenuation parameter (CAP) score as the optimal primary predictors of MetS. In the treemap, the group with a high triglyceride level plus high fasting blood glucose or large waist circumference had higher CAP scores (>260) than other groups. Moreover, 32.2% of patients with high CAP scores were observed to have metabolic diseases during 3 years of follow-up. This reveals that the CAP score may be used for detecting MetS, especially for the non-obese MetS phenotype. Conclusions: Machine learning and data visualization can illustrate the complicated relationships between metabolic components and potential risk factors for MetS.
Affiliation(s)
- Cheng-Sheng Yu
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan; Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan; Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Shy-Shin Chang
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Chang-Hsien Lin
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Yu-Jiun Lin
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Jenny L Wu
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan; Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Ray-Jade Chen
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan; Division of General Surgery, Department of Surgery, Taipei Medical University Hospital, Taipei, Taiwan; Department of Surgery, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
|
16
|
Drobne D. Adding Toxicological Context to Nanotoxicity Study Reporting Using the NanoTox Metadata List. Small (Weinheim an der Bergstrasse, Germany) 2021; 17:e2005622. [PMID: 33605049 DOI: 10.1002/smll.202005622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2020] [Revised: 11/08/2020] [Indexed: 06/12/2023]
Abstract
This paper proposes a list of specifications (the NanoTox metadata list) to be reported about nanotoxicity experiments (metadata) together with the resultant data, in order to add toxicological context to reported studies. In areas involving nanomaterials (NMs), existing metadata reporting standards include the reporting of experimental conditions and protocols (MIRIBEL) and material characteristics (MINChar and MIAN), as well as reporting focused on specific experiments (MINBE). NanoCRED is a similarly transparent and structured framework; however, it was developed to guide risk assessors in evaluating the reliability and relevance of NM ecotoxicity studies. There is no reporting standard that includes interpretation of the aims and outcomes of nanotoxicity studies beyond regulatory purposes. The proposed NanoTox metadata reporting checklist is elaborated to extend reporting toward describing nanotoxicological context and is thus a logical complement to technology/material-assay focused reporting checklists. It is further designed to allow for NM toxicity data and knowledge integration, reuse, and communication. Its ultimate goal is to adhere to the basic rules of toxicology when taking a stand on the toxicity of NMs and to limit speculation on safety. As nanotoxicology becomes more interdisciplinary with the advent of new tools and new materials to be tested, reporting standards will contribute to cross-disciplinary communication.
Affiliation(s)
- Damjana Drobne
  - Department of Biology, Biotechnical Faculty, University of Ljubljana, Večna pot 111, Ljubljana, 1000, Slovenia
|
17
|
Summarization of biomedical articles using domain-specific word embeddings and graph ranking. J Biomed Inform 2020; 107:103452. [DOI: 10.1016/j.jbi.2020.103452] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 01/13/2020] [Revised: 05/06/2020] [Accepted: 05/09/2020] [Indexed: 12/21/2022]
|
18
|
Du Y, Li Q, Wang L, He Y. Biomedical-domain pre-trained language model for extractive summarization. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105964] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 10/24/2022]
|
19
|
Ma TH, Wang HM, Zhao YW, Tian Y, Al-Nabhan N. Topic-based automatic summarization algorithm for Chinese short text. Mathematical Biosciences and Engineering: MBE 2020; 17:3582-3600. [PMID: 32987545 DOI: 10.3934/mbe.2020202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Most current automatic summarization methods are for English texts. The distinction between words in Chinese text is large, the types of parts of speech are many and complex, and polysemous or ambiguous words appear frequently. Therefore, compared with English text, it is more difficult to extract useful feature words from Chinese text. Due to the complex syntax of Chinese, there are currently relatively few automatic summarization methods for Chinese text. In the past, only the important sentences in the original text could be selected and simply arranged, yielding a summary with disordered sentences and insufficient coherence. Meanwhile, because Chinese short text usually contains more redundant information and its sentence structure is irregular, we propose a topic-based automatic summarization method for Chinese short text. First, a key sentence selection method combining topic words and TF-IDF is proposed to obtain the score of each text corresponding to the topic in the original text data. Then the sentence with the highest score is selected as the topic sentence of the topic. Considering that the short text of Weibo may contain a lot of irrelevant information and sometimes even lack some important components of the topic, three retouching mechanisms are proposed to improve the conciseness, richness, and readability of the topic sentence extraction results. We validate our approach on natural disaster and social hot event datasets from Sina Weibo. The experimental results show that the polished topic summary not only reflects the exact relationship between topic sentences and natural disasters or social hot events, but also has rich semantic information. More importantly, the basic elements of a natural disaster or social hot event can largely be grasped from the topic sentence, which can help the government guide disaster relief or meet users' needs for quickly obtaining information about social hot events.
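The key sentence selection step in this abstract (combining topic words with TF-IDF and taking the highest-scoring text as the topic sentence) might look roughly like the sketch below. The exact scoring formula is not given in the abstract, so summing the TF-IDF weights of the topic words found in each text is an assumption; `tfidf_weights`, `topic_sentence`, and the English toy posts are hypothetical, and real Weibo text would first need Chinese word segmentation.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Per-document TF-IDF weights for already-tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency counts each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {term: (count / len(doc)) * math.log(n / df[term]) for term, count in tf.items()}
        )
    return weights


def topic_sentence(docs, topic_words):
    """Score each text by the TF-IDF mass of its topic words; return the best index and all scores."""
    weights = tfidf_weights(docs)
    scores = [sum(w.get(term, 0.0) for term in topic_words) for w in weights]
    best = max(range(len(docs)), key=lambda i: scores[i])
    return best, scores


# Toy posts, pre-tokenized on whitespace (an assumption made for illustration).
posts = [
    "earthquake hits city rescue teams arrive".split(),
    "city concert tonight".split(),
    "rescue teams search earthquake rubble".split(),
    "traffic update".split(),
]
topic_words = {"earthquake", "rescue", "rubble"}
best, scores = topic_sentence(posts, topic_words)
```

In this toy run the third post scores highest because it contains all three topic words, including the rare term "rubble", which carries the largest inverse document frequency.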
Affiliation(s)
- Ting Huai Ma
  - Nanjing University of Information Science and Technology, Nanjing 210044, China
- Hong Mei Wang
  - Nanjing University of Information Science and Technology, Nanjing 210044, China
- Yu Wei Zhao
  - Nanjing University of Information Science and Technology, Nanjing 210044, China
- Yuan Tian
  - Nanjing Institute of Technology, Nanjing 211167, China
|
20
|
Zhang X, Geng P, Zhang T, Lu Q, Gao P, Mei J. Aceso: PICO-guided Evidence Summarization on Medical Literature. IEEE J Biomed Health Inform 2020; PP:2663-2670. [PMID: 32275627 DOI: 10.1109/jbhi.2020.2984704] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
Evidence-Based Medicine (EBM) aims to apply the best available evidence gained from scientific methods to clinical decision making. A generally accepted criterion for formulating evidence is the PICO framework, where PICO stands for Problem/Population, Intervention, Comparison, and Outcome. Automatic extraction of PICO-related sentences from medical literature is crucial to the success of many EBM applications. In this work, we present our Aceso system, which automatically generates PICO-based evidence summaries from medical literature. In Aceso, we adopt an active learning paradigm, which helps to minimize the cost of manual labeling and to optimize the quality of summarization with limited labeled data. A UMLS2Vec model is proposed to learn a vector representation of medical concepts in UMLS, and we fuse the embedding of medical knowledge with textual features in summarization. The evaluation shows that our approach is better at identifying PICO sentences than state-of-the-art studies and outperforms baseline methods in producing high-quality evidence summaries.
|
21
|
Yu CS, Lin YJ, Lin CH, Wang ST, Lin SY, Lin SH, Wu JL, Chang SS. Predicting Metabolic Syndrome With Machine Learning Models Using a Decision Tree Algorithm: Retrospective Cohort Study. JMIR Med Inform 2020; 8:e17110. [PMID: 32202504 PMCID: PMC7136841 DOI: 10.2196/17110] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/19/2019] [Revised: 02/07/2020] [Accepted: 03/05/2020] [Indexed: 12/18/2022] Open
Abstract
Background: Metabolic syndrome is a cluster of disorders that significantly influence the development and deterioration of numerous diseases. FibroScan is an ultrasound device that was recently shown to predict metabolic syndrome with moderate accuracy. However, previous research regarding prediction of metabolic syndrome in subjects examined with FibroScan has been mainly based on conventional statistical models. Alternatively, machine learning, whereby a computer algorithm learns from prior experience, has better predictive performance than conventional statistical modeling. Objective: We aimed to evaluate the accuracy of different decision tree machine learning algorithms in predicting the state of metabolic syndrome in self-paid health examination subjects who were examined with FibroScan. Methods: Multivariate logistic regression was conducted for every known risk factor of metabolic syndrome. Principal components analysis was used to visualize the distribution of metabolic syndrome patients. We further applied various statistical machine learning techniques to visualize and investigate the pattern and relationship between metabolic syndrome and several risk variables. Results: Obesity, serum glutamic-oxaloacetic transaminase, serum glutamic pyruvic transaminase, controlled attenuation parameter score, and glycated hemoglobin emerged as significant risk factors in multivariate logistic regression. The area under the receiver operating characteristic curve values for classification and regression trees and for the random forest were 0.831 and 0.904, respectively. Conclusions: Machine learning technology facilitates the identification of metabolic syndrome in self-paid health examination subjects with high accuracy.
Affiliation(s)
- Cheng-Sheng Yu
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Yu-Jiun Lin
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Chang-Hsien Lin
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Sen-Te Wang
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Shiyng-Yu Lin
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Sanders H Lin
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan
- Jenny L Wu
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Shy-Shin Chang
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan; Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
|
22
|
Moradi M, Dorffner G, Samwald M. Deep contextualized embeddings for quantifying the informative content in biomedical text summarization. Computer Methods and Programs in Biomedicine 2020; 184:105117. [PMID: 31627150 DOI: 10.1016/j.cmpb.2019.105117] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 07/14/2019] [Revised: 09/19/2019] [Accepted: 10/03/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND AND OBJECTIVE Capturing the context of text is a challenging task in biomedical text summarization. The objective of this research is to show how contextualized embeddings produced by a deep bidirectional language model can be utilized to quantify the informative content of sentences in biomedical text summarization. METHODS We propose a novel summarization method that utilizes contextualized embeddings generated by the Bidirectional Encoder Representations from Transformers (BERT) model, a deep learning model that recently demonstrated state-of-the-art results in several natural language processing tasks. We combine different versions of BERT with a clustering method to identify the most relevant and informative sentences of input documents. Using the ROUGE toolkit, we evaluate the summarizer against several methods previously described in literature. RESULTS The summarizer obtains state-of-the-art results and significantly improves the performance of biomedical text summarization in comparison to a set of domain-specific and domain-independent methods. The largest language model not specifically pretrained on biomedical text outperformed other models. However, among language models of the same size, the one further pretrained on biomedical text obtained best results. CONCLUSIONS We demonstrate that a hybrid system combining a deep bidirectional language model and a clustering method yields state-of-the-art results without requiring labor-intensive creation of annotated features or knowledge bases or computationally demanding domain-specific pretraining. This study provides a starting point towards investigating deep contextualized language models for biomedical text summarization.
Affiliation(s)
- Milad Moradi
  - Institute for Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Georg Dorffner
  - Institute for Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Matthias Samwald
  - Institute for Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
|
23
|
Mufti HN, Hirsch GM, Abidi SR, Abidi SSR. Exploiting Machine Learning Algorithms and Methods for the Prediction of Agitated Delirium After Cardiac Surgery: Models Development and Validation Study. JMIR Med Inform 2019; 7:e14993. [PMID: 31558433 PMCID: PMC6913743 DOI: 10.2196/14993] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 06/11/2019] [Revised: 09/02/2019] [Accepted: 09/24/2019] [Indexed: 12/28/2022] Open
Abstract
Background: Delirium is a temporary mental disorder that occasionally affects patients undergoing surgery, especially cardiac surgery. It is strongly associated with major adverse events, which in turn lead to increased cost and poor outcomes (eg, need for nursing home due to cognitive impairment, stroke, and death). The ability to foresee patients at risk of delirium will guide the timely initiation of multimodal preventive interventions, which will aid in reducing the burden and negative consequences associated with delirium. Several studies have focused on the prediction of delirium. However, the number of studies in cardiac surgical patients that have used machine learning methods is very limited. Objective: This study aimed to explore the application of several machine learning predictive models that can pre-emptively predict delirium in patients undergoing cardiac surgery and to compare their performance. Methods: We investigated a number of machine learning methods to develop models that can predict delirium after cardiac surgery. A clinical dataset comprising over 5000 actual patients who underwent cardiac surgery in a single center was used to develop the models using logistic regression, artificial neural networks (ANN), support vector machines (SVM), Bayesian belief networks (BBN), naïve Bayesian, random forest, and decision trees. Results: Only 507 out of 5584 patients (11.4%) developed delirium. We addressed the underlying class imbalance, using random undersampling, in the training dataset. The final prediction performance was validated on a separate test dataset. Owing to the target class imbalance, several measures were used to evaluate each algorithm’s performance for the delirium class on the test dataset. Out of the selected algorithms, the SVM algorithm had the best F1 score for positive cases, kappa, and positive predictive value (40.2%, 29.3%, and 29.7%, respectively), with P=.01, .03, and .02, respectively. The ANN had the best receiver-operator area-under the curve (78.2%; P=.03). The BBN had the best precision-recall area-under the curve for detecting positive cases (30.4%; P=.03). Conclusions: Although delirium is inherently complex, preventive measures to mitigate its negative effect can be applied proactively if patients at risk are prospectively identified. Our results highlight 2 important points: (1) addressing class imbalance on the training dataset will augment machine learning models’ performance in identifying patients likely to develop postoperative delirium, and (2) as the prediction of postoperative delirium is difficult because it is multifactorial and has complex pathophysiology, applying machine learning methods (complex or simple) may improve the prediction by revealing hidden patterns, which will lead to cost reduction by prevention of complications and will optimize patients’ outcomes.
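The two evaluation ideas in this abstract, random undersampling of the training set and positive-class metrics such as positive predictive value and F1, can be illustrated with a small sketch. It is not the study's pipeline: `undersample`, `positive_class_metrics`, the fixed seed, and the toy labels are hypothetical, and balancing to an exact 1:1 ratio is an assumption, since the abstract does not state the target ratio.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have the same size (assumed 1:1)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def positive_class_metrics(y_true, y_pred):
    """Positive predictive value (precision), recall, and F1 for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy training set: 10 positive and 90 negative cases (not the study's data).
train_rows = list(range(100))
train_labels = [1] * 10 + [0] * 90
balanced_rows, balanced_labels = undersample(train_rows, train_labels)

# Toy predictions on an untouched test set; only training data is resampled.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
ppv, recall, f1 = positive_class_metrics(y_true, y_pred)
```

Resampling only the training data matters because the test set then keeps the real class ratio, so positive predictive value and F1 reflect how the model would behave on the imbalanced population.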
Affiliation(s)
- Hani Nabeel Mufti
  - Division of Cardiac Surgery, Department of Cardiac Sciences, King Faisal Cardiac Center, King Abdulaziz Medical City, Ministry of National Guard Health Affairs - Western Region, Jeddah, Saudi Arabia; College of Medicine-Jeddah, King Saud bin Abdulaziz University for Health, Ministry of National Guard Health Affairs, Jeddah, Saudi Arabia; King Abdullah International Medical Research Center, Jeddah, Saudi Arabia
- Samina Raza Abidi
  - Department of Community Health and Epidemiology, Faculty of Medicine, Dalhousie University, Halifax, NS, Canada
- Syed Sibte Raza Abidi
  - kNowledge Intensive Computing for Healthcare Enterprise Research Group, Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
|
24
|
A Novel Hybrid Genetic-Whale Optimization Model for Ontology Learning from Arabic Text. Algorithms 2019. [DOI: 10.3390/a12090182] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]
Abstract
Ontologies are used to model knowledge in several domains of interest, such as the biomedical domain. Conceptualization is the basic task for ontology building. Concepts are identified, and then they are linked through their semantic relationships. Recently, ontologies have constituted a crucial part of modern semantic webs because they can convert a web of documents into a web of things. Although ontology learning generally occupies a large space in computer science, Arabic ontology learning, in particular, is underdeveloped due to the Arabic language’s nature as well as the profundity required in this domain. The previously published research on Arabic ontology learning from text falls into three categories: developing manually hand-crafted rules, using ordinary supervised/unsupervised machine learning algorithms, or a hybrid of these two approaches. The model proposed in this work contributes to Arabic ontology learning in two ways. First, a text mining algorithm is proposed for extracting concepts and their semantic relations from text documents. The algorithm calculates the concept frequency weights using the term frequency weights. Then, it calculates the weights of concept similarity using the information of the ontology structure, involving (1) the concept’s path distance, (2) the concept’s distribution layer, and (3) the mutual parent concept’s distribution layer. Then, feature mapping is performed by assigning the concepts’ similarities to the concept features. Second, a hybrid genetic-whale optimization algorithm was proposed to optimize ontology learning from Arabic text. The operator of the G-WOA is a hybrid operator integrating GA’s mutation, crossover, and selection processes with the WOA’s processes (encircling prey, attacking of bubble-net, and searching for prey) to fulfill the balance between both exploitation and exploration, and to find the solutions that exhibit the highest fitness. 
For evaluating the performance of the ontology learning approach, extensive comparisons are conducted using different Arabic corpora and bio-inspired optimization algorithms. Furthermore, two publicly available non-Arabic corpora are used to compare the efficiency of the proposed approach with those of other languages. The results reveal that the proposed genetic-whale optimization algorithm outperforms the other compared algorithms across all the Arabic corpora in terms of precision, recall, and F-score measures. Moreover, the proposed approach outperforms the state-of-the-art methods of ontology learning from Arabic and non-Arabic texts in terms of these three measures.
|
25
|
Moradi M. CIBS: A biomedical text summarizer using topic-based sentence clustering. J Biomed Inform 2018; 88:53-61. [DOI: 10.1016/j.jbi.2018.11.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 02/10/2018] [Revised: 09/26/2018] [Accepted: 11/12/2018] [Indexed: 12/21/2022]
|
26
|
Walczak S. The Role of Artificial Intelligence in Clinical Decision Support Systems and a Classification Framework. International Journal of Computers in Clinical Practice 2018. [DOI: 10.4018/ijccp.2018070103] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 01/04/2023]
Abstract
Clinical decision support systems are meant to improve the quality of decision-making in healthcare. Artificial intelligence is the science of creating intelligent systems that solve complex problems at the level of or better than human experts. Combining artificial intelligence methods into clinical decision support will enable the utilization of large quantities of data to produce relevant decision-making information to practitioners. This article examines various artificial intelligence methodologies and shows how they may be incorporated into clinical decision-making systems. A framework for describing artificial intelligence applications in clinical decision support systems is presented.
|