1
Menotti L, Silvello G, Atzori M, Boytcheva S, Ciompi F, Di Nunzio GM, Fraggetta F, Giachelle F, Irrera O, Marchesin S, Marini N, Müller H, Primov T. Modelling digital health data: The ExaMode ontology for computational pathology. J Pathol Inform 2023; 14:100332. [PMID: 37705689 PMCID: PMC10495665 DOI: 10.1016/j.jpi.2023.100332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 07/14/2023] [Accepted: 08/16/2023] [Indexed: 09/15/2023] Open
Abstract
Computational pathology can significantly benefit from ontologies to standardize the employed nomenclature and help with knowledge extraction processes for high-quality annotated image datasets. The end goal is to reach a shared model for digital pathology to overcome data variability and integration problems. Indeed, data annotation in such a specific domain is still an unsolved challenge, and datasets cannot be readily reused in diverse contexts due to heterogeneity issues of the adopted labels, multilingualism, and different clinical practices. Material and methods: This paper presents the ExaMode ontology, modeling the histopathology process by considering 3 key cancer diseases (colon, cervical, and lung tumors) and celiac disease. The ExaMode ontology has been designed bottom-up in an iterative fashion with continuous feedback and validation from pathologists and clinicians. The ontology is organized into 5 semantic areas that define an ontological template to model any disease of interest in histopathology. Results: The ExaMode ontology is currently being used as a common semantic layer in: (i) an entity linking tool for the automatic annotation of medical records; (ii) a web-based collaborative annotation tool for histopathology text reports; and (iii) a software platform for building holistic solutions integrating multimodal histopathology data. Discussion: The ExaMode ontology is a key means to store data in a graph database according to the RDF data model. The creation of an RDF dataset can help develop more accurate algorithms for image analysis, especially in the field of digital pathology. This approach allows for seamless data integration and a unified query access point, from which we can extract relevant clinical insights about the considered diseases using SPARQL queries.
Affiliation(s)
- Laura Menotti
  - Department of Information Engineering, University of Padua, Padova, Italy
- Gianmaria Silvello
  - Department of Information Engineering, University of Padua, Padova, Italy
- Manfredo Atzori
  - Information Systems Institute, University of Applied Sciences Western Switzerland, Delémont, Switzerland
  - Department of Neuroscience, University of Padua, Padova, Italy
- Francesco Ciompi
  - Department of Pathology, Radboud University Medical Center, Nijmegen, The Netherlands
- Fabio Giachelle
  - Department of Information Engineering, University of Padua, Padova, Italy
- Ornella Irrera
  - Department of Information Engineering, University of Padua, Padova, Italy
- Stefano Marchesin
  - Department of Information Engineering, University of Padua, Padova, Italy
- Niccolò Marini
  - Information Systems Institute, University of Applied Sciences Western Switzerland, Delémont, Switzerland
- Henning Müller
  - Information Systems Institute, University of Applied Sciences Western Switzerland, Delémont, Switzerland
2
Marini N, Marchesin S, Otálora S, Wodzinski M, Caputo A, van Rijthoven M, Aswolinskiy W, Bokhorst JM, Podareanu D, Petters E, Boytcheva S, Buttafuoco G, Vatrano S, Fraggetta F, van der Laak J, Agosti M, Ciompi F, Silvello G, Muller H, Atzori M. Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations. NPJ Digit Med 2022; 5:102. [PMID: 35869179 PMCID: PMC9307641 DOI: 10.1038/s41746-022-00635-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/11/2021] [Accepted: 06/24/2022] [Indexed: 01/02/2023] Open
Abstract
The digitalization of clinical workflows and the increasing performance of deep learning algorithms are paving the way towards new methods for tackling cancer diagnosis. However, the availability of medical specialists to annotate digitized images and free-text diagnostic reports does not scale with the need for large datasets required to train robust computer-aided diagnosis methods that can target the high variability of clinical cases and data produced. This work proposes and evaluates an approach to eliminate the need for manual annotations to train computer-aided diagnosis tools in digital pathology. The approach includes two components: one automatically extracts semantically meaningful concepts from diagnostic reports, and the other uses them as weak labels to train convolutional neural networks (CNNs) for histopathology diagnosis. The approach is trained (through 10-fold cross-validation) on 3’769 clinical images and reports provided by two hospitals, and tested on over 11’000 images from private and publicly available datasets. The CNN, trained with automatically generated labels, is compared with the same architecture trained with manual labels. Results show that combining text analysis and end-to-end deep neural networks allows building computer-aided diagnosis tools that reach solid performance (micro-accuracy = 0.908 at image-level) based only on existing clinical data without the need for manual annotations.
3
Marchesin S, Giachelle F, Marini N, Atzori M, Boytcheva S, Buttafuoco G, Ciompi F, Di Nunzio GM, Fraggetta F, Irrera O, Müller H, Primov T, Vatrano S, Silvello G. Empowering Digital Pathology Applications through Explainable Knowledge Extraction Tools. J Pathol Inform 2022; 13:100139. [PMID: 36268087 PMCID: PMC9577130 DOI: 10.1016/j.jpi.2022.100139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Revised: 09/06/2022] [Accepted: 09/07/2022] [Indexed: 11/25/2022] Open
4
Boytcheva S, Velichkov B, Velchev G, Koychev I. Automatic Generation of Annotated Corpora of Diagnoses with ICD-10 codes based on Open Data and Linked Open Data. Proceedings of the 2020 Federated Conference on Computer Science and Information Systems 2020. [DOI: 10.15439/2020f192] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022] Open
5
Boytcheva S, Angelova G, Angelov Z, Tcharaktchiev D. Mining comorbidity patterns using retrospective analysis of big collection of outpatient records. Health Inf Sci Syst 2017; 5:3. [PMID: 29038733 PMCID: PMC5622010 DOI: 10.1007/s13755-017-0024-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 08/10/2017] [Accepted: 09/22/2017] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Studying comorbidities of disorders is important for detection and prevention. To discover frequent patterns of diseases, we can use retrospective analysis of population data, filtering events with common properties and similar significance. Most frequent pattern mining methods do not consider contextual information about extracted patterns. Further data mining developments might enable more efficient applications in specific tasks like comorbidity identification. METHODS We propose a cascade data mining approach for frequent pattern mining enriched with context information, including a new algorithm, MIxCO, for maximal frequent pattern mining. Text mining tools extract entities from free text and deliver additional context attributes beyond the structured information about the patients. RESULTS The proposed approach was tested using pseudonymised reimbursement requests (outpatient records) submitted to the Bulgarian National Health Insurance Fund in 2010-2016 for more than 5 million citizens yearly. Experiments were run on 3 data collections. Some known comorbidities of Schizophrenia, Hyperprolactinemia and Diabetes Mellitus Type 2 are confirmed; novel hypotheses about stable comorbidities are generated. The evaluation shows that MIxCO is efficient for big dense datasets. CONCLUSION Explicating maximal frequent itemsets makes it possible to build hypotheses concerning the relationships between the exogenous and endogenous factors triggering the formation of these sets. MIxCO will help to identify risk groups of patients with a predisposition to develop socially-significant disorders like diabetes. This will turn static archives like the Diabetes Register in Bulgaria into a powerful alerting and predictive framework.
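As an illustration of the term "maximal frequent itemset" used in the abstract above, the following is a minimal brute-force sketch in Python. It is not the MIxCO algorithm, whose details are not given here; the function name `maximal_frequent_itemsets`, the toy diagnosis records, and the support threshold are all hypothetical and chosen only for illustration. An itemset is frequent if it occurs in at least `min_support` records, and maximal if no proper superset of it is also frequent.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force illustration (NOT MIxCO): return the itemsets that are
    frequent and have no frequent proper superset."""
    # Collect every distinct item that appears in any transaction.
    items = sorted(set().union(*transactions))
    frequent = []
    # Enumerate all non-empty candidate itemsets (exponential; toy data only).
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = number of transactions containing the candidate.
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                frequent.append(cset)
    # Keep only itemsets that are not strictly contained in another frequent one.
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical toy records: each set holds the diagnoses of one patient.
records = [
    frozenset({"diabetes", "hypertension", "obesity"}),
    frozenset({"diabetes", "hypertension"}),
    frozenset({"diabetes", "obesity"}),
    frozenset({"hypertension"}),
]

result = maximal_frequent_itemsets(records, min_support=2)
for itemset in result:
    print(sorted(itemset))
```

With these toy records and a threshold of 2, the single diagnoses are frequent but not maximal, because each is contained in a frequent pair. Exhaustive enumeration like this does not scale beyond a handful of items, which is why dedicated algorithms are needed for large, dense collections.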
Affiliation(s)
- Svetla Boytcheva
  - Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Galia Angelova
  - Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
- Dimitar Tcharaktchiev
  - Medical University Sofia, University Specialised Hospital for Active Treatment of Endocrinology, Sofia, Bulgaria
6
Boytcheva S, Angelova G, Angelov Z, Tcharaktchiev D. Text Mining and Big Data Analytics for Retrospective Analysis of Clinical Texts from Outpatient Care. Cybernetics and Information Technologies 2015. [DOI: 10.1515/cait-2015-0055] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Abstract
This paper presents the results of an ongoing research project for knowledge extraction from large corpora of clinical narratives in the Bulgarian language, comprising approximately 100 million outpatient care notes. Entities with numerical values are mined in the free text, and the extracted information is stored in a structured format. Algorithms for retrospective analyses and big data analytics are applied for studying the treatment and evaluating the diabetes compensation and control of arterial blood pressure.
Affiliation(s)
- Svetla Boytcheva
  - Institute of Information and Communication Technologies, BAS, Sofia, Bulgaria
- Galia Angelova
  - Institute of Information and Communication Technologies, BAS, Sofia, Bulgaria
- Dimitar Tcharaktchiev
  - Medical University Sofia, University Specialised Hospital for Active Treatment of Endocrinology, Sofia, Bulgaria
7
Boytcheva S. Shallow medication extraction from hospital patient records. Stud Health Technol Inform 2011; 166:119-128. [PMID: 21685617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
This paper presents methods for shallow Information Extraction (IE) from the free text zones of hospital Patient Records (PRs) in the Bulgarian language in the Patient Safety through Intelligent Procedures in medication (PSIP) project. We automatically extract information about drug names, dosage, modes and frequency and assign the corresponding ATC code to each medication event. Using various modules for rule-based text analysis, our IE components in PSIP perform a significant amount of symbolic computations. We try to address negative statements, elliptical constructions, typical conjunctive phrases, and simple inferences concerning temporal constraints, and finally aim at the assignment of the drug ATC code to the extracted medication events, which additionally complicates the extraction algorithm. The prototype of the system was used for experiments with a training corpus containing 1,300 PRs, and the evaluation results were obtained using a test corpus containing 6,200 PRs. The extraction accuracy (f-score) is 98.42% for drug names and 93.85% for dose.
Affiliation(s)
- Svetla Boytcheva
  - Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, 25A Acad. G. Bonchev Str. Sofia, Bulgaria.
8
Tchraktchiev D, Angelova G, Boytcheva S, Angelov Z, Zacharieva S. Completion of structured patient descriptions by semantic mining. Stud Health Technol Inform 2011; 166:260-269. [PMID: 21685632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
This paper presents experiments in automatic Information Extraction of medication events, diagnoses, and laboratory tests from hospital patient records, in order to increase the completeness of the description of the episode of care. Each patient record in our hospital information system contains structured data and text descriptions, including full discharge letters. From these letters, we automatically extract information about the medication just before and during hospitalization, especially for the drugs prescribed to the patient but not delivered by the hospital pharmacy; we also extract values of lab tests not performed and not registered in our laboratory, as well as all non-encoded diagnoses described only in the free text of discharge letters. Thus we increase the availability of suitable and accurate information about the hospital stay and the outpatient segment of care before the hospitalization. Information Extraction also helps to understand the clinical and organizational decisions concerning the patient without increasing the complexity of the structured health record.
Affiliation(s)
- Dimitar Tchraktchiev
  - University Specialized Hospital for Active Treatment of Endocrinology (USHATE), Medical University - Sofia.
9
Boytcheva S, Tcharaktchiev D, Angelova G. Contextualization in automatic extraction of drugs from hospital patient records. Stud Health Technol Inform 2011; 169:527-531. [PMID: 21893805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Information Extraction (IE) from medical texts aims at the automatic recognition of entities and relations of interest. IE is based on shallow analysis and considers only sentences containing important words. Thus IE of drugs from discharge letters can identify some past or future medication events as 'current'. This article presents heuristic observations that make it possible to filter drugs that are taken by the patients during the hospitalization. These heuristics are based on the default PR structure and linguistic expressions signaling temporal and conditional markers. They are integrated in a system for drug extraction from hospital Patient Records (PRs) in the Bulgarian language. Current evaluation results are summarized as well.
Affiliation(s)
- Svetla Boytcheva
  - Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
10
Ivanov I, Boytcheva S, Mihailova G. Parallel study of thermal resistance and permeability barrier stability of Enterococcus faecalis as affected by salt composition, growth temperature and pre-incubation temperature. J Therm Biol 1999. [DOI: 10.1016/s0306-4565(99)00012-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]