1
Newton AJH, Chartash D, Kleinstein SH, McDougal RA. A pipeline for the retrieval and extraction of domain-specific information with application to COVID-19 immune signatures. BMC Bioinformatics 2023; 24:292. [PMID: 37474900 PMCID: PMC10357743 DOI: 10.1186/s12859-023-05397-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 06/23/2023] [Indexed: 07/22/2023]
Abstract
BACKGROUND The accelerating pace of biomedical publication has made it impractical to identify, manually and systematically, papers containing specific information and to extract that information. This is especially challenging when the information itself resides beyond titles or abstracts. For emerging science, with a limited set of known papers of interest and an incomplete information model, this is of pressing concern. A timely example in retrospect is the identification of immune signatures (coherent sets of biomarkers) driving differential SARS-CoV-2 infection outcomes. IMPLEMENTATION We built a classifier to identify papers containing domain-specific information from the document embeddings of the title and abstract. To train this classifier with limited data, we developed an iterative process that leverages pre-trained SPECTER document embeddings, SVM classifiers and web-enabled expert review to augment the training set. The augmented training set was then used to create the final classifier for identifying papers containing domain-specific information. Finally, information was extracted from these papers through a semi-automated system that directly asked the papers' authors to respond via a web-based form. RESULTS We demonstrate a classifier that retrieves papers with human COVID-19 immune signatures with a positive predictive value of 86%. The type of immune signature (e.g., gene expression vs. other types of profiling) was also identified with a positive predictive value of 74%. Semi-automated queries to the corresponding authors of these publications requesting signature information achieved a 31% response rate. CONCLUSIONS Our results demonstrate the efficacy of using an SVM classifier with document embeddings of the title and abstract to retrieve papers with domain-specific information, even when that information is rarely present in the abstract. 
Targeted author engagement based on classifier predictions offers a promising pathway to build a semi-structured representation of such information. Through this approach, partially automated literature mining can help rapidly create semi-structured knowledge repositories for automatic analysis of emerging health threats.
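The retrieval step described above pairs document embeddings with a linear SVM and reports positive predictive value (PPV). The following is a minimal pure-Python sketch of that idea, not the authors' pipeline: it assumes toy two-dimensional vectors in place of real SPECTER embeddings and a Pegasos-style sub-gradient solver in place of whatever SVM implementation the authors used. `rank_for_review` is a hypothetical stand-in for choosing unlabeled papers to send to expert review during training-set augmentation.

```python
import random


def augment(x):
    """Append a constant 1.0 so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos-style sub-gradient descent for a linear SVM (hinge loss).

    X: feature vectors (toy stand-ins for title/abstract embeddings).
    y: labels in {-1, +1}.
    """
    rng = random.Random(seed)
    data = [augment(x) for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Regularisation step: shrink all weights toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active for this example: step toward it.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w, x):
    """Signed SVM score; larger means 'more likely to contain the information'."""
    return sum(wj * xj for wj, xj in zip(w, augment(x)))


def predict(w, X):
    return [1 if decision(w, x) > 0 else -1 for x in X]


def rank_for_review(w, candidates, k):
    """Hypothetical augmentation step: the k unlabeled papers scored most
    likely positive, to be shown to an expert reviewer for labeling."""
    return sorted(candidates, key=lambda x: decision(w, x), reverse=True)[:k]


def ppv(y_true, y_pred):
    """Positive predictive value: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == -1)
    return tp / (tp + fp) if tp + fp else 0.0


if __name__ == "__main__":
    X = [(2.0, 1.5), (1.5, 2.5), (2.5, 2.0), (-2.0, -1.0), (-1.5, -2.5), (-2.5, -2.0)]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print(predict(w, [(3.0, 3.0), (-3.0, -3.0)]))  # [1, -1]
    print(rank_for_review(w, [(0.5, 0.2), (2.8, 2.6), (-1.0, -0.4)], k=1))  # [(2.8, 2.6)]
    print(ppv([1, 1, -1, -1], [1, 1, 1, -1]))  # 2 of 3 flagged are true positives
```

In an iterative loop, the reviewed papers returned by `rank_for_review` would be labeled and appended to `X` and `y` before retraining.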
Affiliation(s)
- Adam J H Newton
- Department of Physiology and Pharmacology, SUNY Downstate Health Sciences University, Brooklyn, NY, 11203, USA
- Yale Center for Medical Informatics, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, 06511, USA
- Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA
- David Chartash
- Yale Center for Medical Informatics, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, 06511, USA
- School of Medicine, University College Dublin - National University of Ireland, Dublin, Co. Dublin, Republic of Ireland
- Steven H Kleinstein
- Department of Pathology, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA
- Department of Immunobiology, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Robert A McDougal
- Yale Center for Medical Informatics, Yale School of Medicine, Yale University, New Haven, CT, 06511, USA.
- Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, 06511, USA.
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA.
2
Abstract
Pretrained language models (PLMs) have demonstrated strong performance on many natural language processing (NLP) tasks. Despite their great success, these PLMs are typically pretrained only on unstructured free text without leveraging existing structured knowledge bases that are readily available for many domains, especially scientific domains. As a result, these PLMs may not achieve satisfactory performance on knowledge-intensive tasks such as biomedical NLP. Comprehending a complex biomedical document without domain-specific knowledge is challenging, even for humans. Inspired by this observation, we propose a general framework for incorporating various types of domain knowledge from multiple sources into biomedical PLMs. We encode domain knowledge using lightweight adapter modules, bottleneck feed-forward networks that are inserted into different locations of a backbone PLM. For each knowledge source of interest, we pretrain an adapter module to capture the knowledge in a self-supervised way. We design a wide range of self-supervised objectives to accommodate diverse types of knowledge, ranging from entity relations to description sentences. Once a set of pretrained adapters is available, we employ fusion layers to combine the knowledge encoded within these adapters for downstream tasks. Each fusion layer is a parameterized mixer of the available trained adapters that can identify and activate the most useful adapters for a given input. Our method diverges from prior work by including a knowledge consolidation phase, during which we teach the fusion layers to effectively combine knowledge from both the original PLM and newly acquired external knowledge using a large collection of unannotated text. After the consolidation phase, the complete knowledge-enhanced model can be fine-tuned for any downstream task of interest to achieve optimal performance. 
Extensive experiments on many biomedical NLP datasets show that our proposed framework consistently improves the performance of the underlying PLMs on various downstream tasks such as natural language inference, question answering, and entity linking. These results demonstrate the benefits of using multiple sources of external knowledge to enhance PLMs and the effectiveness of the framework for incorporating knowledge into PLMs. While primarily focused on the biomedical domain in this work, our framework is highly adaptable and can be easily applied to other domains, such as the bioenergy sector.
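A bottleneck adapter and a fusion layer, as described above, reduce to small pieces of matrix arithmetic. This pure-Python sketch assumes a ReLU bottleneck with a residual connection and an attention-style mixer in the spirit of AdapterFusion: the backbone hidden state supplies the query, each adapter output supplies a key and is used directly as the value. The dimensions, weights, and function names are illustrative assumptions, not the paper's parameterization.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def relu(v):
    return [max(0.0, x) for x in v]


def adapter(h, W_down, W_up):
    """Bottleneck adapter: down-project d -> r, ReLU, up-project r -> d,
    then add the input back (residual connection)."""
    z = relu(matvec(W_down, h))
    up = matvec(W_up, z)
    return [a + b for a, b in zip(h, up)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(h, adapter_outputs, W_q, W_k):
    """Attention-style fusion: score each adapter output against the
    backbone state h, then mix the outputs with softmax weights."""
    q = matvec(W_q, h)
    scores = [sum(a * b for a, b in zip(q, matvec(W_k, o))) for o in adapter_outputs]
    weights = softmax(scores)
    mixed = [
        sum(w * o[i] for w, o in zip(weights, adapter_outputs))
        for i in range(len(h))
    ]
    return mixed, weights


if __name__ == "__main__":
    d = 4
    h = [1.0, -2.0, 0.5, 3.0]
    W_down = [[0.1, 0.2, 0.0, -0.1], [0.0, 0.3, 0.1, 0.2]]  # bottleneck r = 2
    W_up_a = [[0.5, 0.0], [0.0, 0.5], [0.2, 0.2], [0.0, -0.3]]
    W_up_b = [[0.0, 0.0] for _ in range(d)]  # zero up-projection: identity adapter
    eye = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    outputs = [adapter(h, W_down, W_up_a), adapter(h, W_down, W_up_b)]
    mixed, weights = fuse(h, outputs, eye, eye)
    print("fusion weights:", [round(w, 3) for w in weights])
    print("fused state:", [round(x, 3) for x in mixed])
```

The residual connection means an adapter with a zero up-projection passes the backbone state through unchanged, which is one common reason adapters can be inserted without disturbing a pretrained model at initialization.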
Affiliation(s)
- Tuan Manh Lai
- Computer Science Department, University of Illinois Urbana-Champaign, 201 N. Goodwin Ave, Urbana, 61801, IL, United States.
- ChengXiang Zhai
- Computer Science Department, University of Illinois Urbana-Champaign, 201 N. Goodwin Ave, Urbana, 61801, IL, United States
- Heng Ji
- Computer Science Department, University of Illinois Urbana-Champaign, 201 N. Goodwin Ave, Urbana, 61801, IL, United States
3
Pereira A, Almeida JR, Lopes RP, Oliveira JL. Querying semantic catalogues of biomedical databases. J Biomed Inform 2023; 137:104272. [PMID: 36563828 DOI: 10.1016/j.jbi.2022.104272] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/10/2022] [Revised: 11/03/2022] [Accepted: 12/12/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND Secondary use of health data is a valuable source of knowledge that boosts observational studies, leading to important discoveries in the medical and biomedical sciences. The fundamental guiding principle for a successful observational study is to define the research question and the approach before the study is executed. However, in multi-centre studies, finding suitable datasets to support the study is challenging, time-consuming, and sometimes impossible without a deep understanding of each dataset. METHODS We propose a strategy for retrieving semantically annotated biomedical datasets of interest, using an interface built by applying a methodology for transforming natural language questions into formal language queries. The advantages of creating semantic biomedical data are enhanced by natural language interfaces, which allow complex queries to be issued without manipulating a logical query language. RESULTS Our methodology was validated using Alzheimer's disease datasets published in a European platform for sharing and reusing biomedical data. We converted the data to a semantic information format using ontologies widely used in the biomedical community and published it as a FAIR endpoint. We considered natural language questions of three types: single-concept questions, questions with exclusion criteria, and multi-concept questions. Finally, we analysed the performance and limitations of the question-answering module we used. The source code is publicly available at https://bioinformatics-ua.github.io/BioKBQA/. CONCLUSION We propose a strategy for using information extracted from biomedical data and transformed into a semantic format using open biomedical ontologies. Our method uses natural language to formulate questions that are answered by this semantic data without the direct use of formal query languages.
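The three question types listed above map naturally onto query patterns. The sketch below is a deliberately simple keyword-and-template translator, not the BioKBQA module: it assumes SPARQL as the formal query language, and the term-to-concept dictionary, the `ex:` prefix, and the `ex:hasConcept` predicate are hypothetical placeholders. It shows how single-concept, multi-concept, and exclusion questions can each become a query, with exclusions expressed as `FILTER NOT EXISTS`.

```python
import re

# Hypothetical term -> concept lookup; a real system would resolve terms
# against established biomedical ontologies.
CONCEPTS = {
    "mri": "ex:MRI",
    "amyloid pet": "ex:AmyloidPET",
    "apoe genotype": "ex:APOEGenotype",
    "csf biomarkers": "ex:CSFBiomarkers",
}


def find_concepts(text):
    """Return the concept IRIs whose surface term occurs as whole words."""
    text = text.lower()
    return [
        iri
        for term, iri in CONCEPTS.items()
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    ]


def to_sparql(question):
    """Translate a question into a SPARQL SELECT over annotated datasets.

    Single- and multi-concept questions require every concept found;
    text after 'but not' or 'without' is treated as an exclusion criterion.
    """
    parts = re.split(r"\bbut not\b|\bwithout\b", question.lower(), maxsplit=1)
    include = find_concepts(parts[0])
    exclude = find_concepts(parts[1]) if len(parts) > 1 else []
    if not include:
        raise ValueError("no known concept found in the question")
    lines = ["PREFIX ex: <http://example.org/>", "SELECT DISTINCT ?dataset WHERE {"]
    for iri in include:
        lines.append(f"  ?dataset ex:hasConcept {iri} .")
    for iri in exclude:
        lines.append(f"  FILTER NOT EXISTS {{ ?dataset ex:hasConcept {iri} . }}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_sparql("Which datasets have MRI and APOE genotype but not amyloid PET?"))
```

Template matching of this kind is brittle outside a small vocabulary; it is shown only to make the question-to-query mapping concrete.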
Affiliation(s)
- João Rafael Almeida
- DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain.
- Rui Pedro Lopes
- CeDRI, Polytechnic Institute of Bragança, Bragança, Portugal.
4
Denton N, Mulberg AE, Molloy M, Charleston S, Fajgenbaum DC, Marsh ED, Howard P. Sharing is caring: a call for a new era of rare disease research and development. Orphanet J Rare Dis 2022; 17:389. [PMID: 36303170 PMCID: PMC9612604 DOI: 10.1186/s13023-022-02529-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/16/2022] [Revised: 08/05/2022] [Accepted: 10/02/2022] [Indexed: 01/25/2023]
Abstract
Scientific advances in the understanding of the genetics and mechanisms of many rare diseases with previously unknown etiologies are inspiring optimism in the patient, clinical, and research communities, and there is hope that disease-specific treatments are on the way. However, the rare disease community has reached a critical point at which its increasingly fragmented structure and operating models threaten its ability to harness the full potential of advancing genomic and computational technologies. Changes are therefore needed to overcome the issues plaguing many rare diseases while also supporting economically viable therapy development. In "Data silos are undermining drug development and failing rare disease patients" (Orphanet Journal of Rare Diseases, Apr 2021), we outlined many of the broad issues underpinning the increasingly fragmented and siloed nature of the rare disease space, as well as how the issues encountered by this community are representative of biomedical research more generally. Here, we propose several initiatives for key stakeholders, including regulators, private and public foundations, and research institutions, to reorient the rare disease ecosystem and its incentives in a way that we believe would cultivate and accelerate innovation. Specifically, we propose supporting non-proprietary patient registries, greater data standardization, global regulatory harmonization, and new business models that encourage data sharing and research collaboration as the default mode. Leadership needs to be integrated across sectors to drive meaningful change among patients, industry, sponsors, and academic medical centers. To transform the research and development landscape and unlock its vast healthcare, economic, and scientific potential for rare disease patients, a new model is ultimately the goal for all.
Affiliation(s)
- Nathan Denton
- Gene Therapy Program, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- Amicus Therapeutics, Philadelphia, PA 19104 USA
- Monique Molloy
- Department of Medicine, Orphan Disease Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- Samantha Charleston
- Department of Medicine, Orphan Disease Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- David C. Fajgenbaum
- Department of Medicine, Orphan Disease Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- Translational Medicine & Human Genetics, Perelman School of Medicine, University of Pennsylvania, Pennsylvania, PA 19104 USA
- Eric D. Marsh
- Department of Medicine, Orphan Disease Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104 USA
- Departments of Neurology and Pediatrics, Perelman School of Medicine, Children’s Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA 19104 USA
- Division of Neurology, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Amicus Therapeutics, Philadelphia, PA 19104 USA
- Paul Howard
- Amicus Therapeutics, Philadelphia, PA 19104 USA
5
Abstract
Background Data quality assessment is important but complex and task-dependent. Identifying suitable measurement methods and reference ranges for assessing their results is challenging. Manual inspection of measurement results and current data-driven approaches for learning which results indicate data quality issues both have considerable limitations, e.g. in identifying task-dependent thresholds for measurement results that indicate data quality issues. Objectives To explore the applicability and potential benefits of a data-driven approach to learning task-dependent knowledge about suitable measurement methods and the assessment of their results. Such knowledge could help others determine whether a local data stock is suitable for a given task. Methods We started by creating artificial data with previously defined data quality issues and applied a set of generic measurement methods to these data (e.g. a method that counts the number of values in a variable or computes their mean). We trained decision trees on the exported measurement methods’ results and the corresponding outcome data (data indicating the data’s suitability for a use case). For evaluation, we derived rules for potential measurement methods and reference values from the decision trees and compared them with respect to their coverage of the data quality issues artificially introduced into the dataset. Three researchers derived these rules independently: one with knowledge of the data quality issues present and two without. Results Our self-trained decision trees indicated rules for 12 of 19 previously defined data quality issues. Learned knowledge about measurement methods and their assessment complemented manual interpretation of the measurement methods’ results. Conclusions Our data-driven approach derives sensible knowledge for task-dependent data quality assessment and complements other current approaches. 
Using labeled measurement methods’ results as training data, our approach successfully suggested applicable rules for checking the data quality characteristics that determine whether a dataset is suitable for a given task. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-021-01656-x.
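The Methods step above (generic measurement methods applied to data, then a decision tree trained on their results against a suitability label) can be illustrated at its smallest scale. This sketch assumes two measurement methods, the count of non-missing values and their mean, and fits a single-split decision stump by Gini impurity rather than a full decision tree. The datasets and labels in the usage example are invented for illustration and are not the study's data.

```python
from statistics import mean


def measure(values):
    """Two generic measurement methods: count of non-missing values and
    their mean (NaN when nothing is present)."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "mean": mean(present) if present else float("nan"),
    }


def gini(labels):
    """Gini impurity of binary labels (1 = suitable, 0 = not suitable)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_stump(rows, labels, feature):
    """Depth-1 decision tree: try midpoints between distinct values of one
    measurement result and keep the split with the lowest weighted Gini.
    Returns (impurity, threshold), or None if the feature never varies."""
    values = sorted({r[feature] for r in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[feature] <= thr]
        right = [y for r, y in zip(rows, labels) if r[feature] > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, thr)
    return best


def stump_rule(rows, labels, feature):
    """Express the learned split as a human-readable candidate rule."""
    best = best_stump(rows, labels, feature)
    if best is None:
        return None
    _, thr = best
    left = [y for r, y in zip(rows, labels) if r[feature] <= thr]
    verdict = "suitable" if 2 * sum(left) > len(left) else "not suitable"
    return f"{feature} <= {thr} -> {verdict}"


if __name__ == "__main__":
    # Invented toy data: one variable per dataset, None marks a missing value.
    datasets = [
        [5.1, 4.9, None, 5.0, 5.2],
        [None, None, 5.0, None, 4.8],
        [5.0, 5.1, 4.9, 5.2, 5.0],
        [None, 5.3, None, None, None],
    ]
    labels = [1, 0, 1, 0]  # hypothetical outcome data: suitable for the task?
    rows = [measure(d) for d in datasets]
    print(stump_rule(rows, labels, "count"))  # count <= 3.0 -> not suitable
```

A split such as `count <= 3.0 -> not suitable` is the kind of candidate rule (a measurement method plus a reference value) that a reviewer would then compare against the known data quality issues.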
Affiliation(s)
- Erik Tute
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Carl-Neuberg-Str. 1, 30625, Hannover, Germany.
- Nagarajan Ganapathy
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Carl-Neuberg-Str. 1, 30625, Hannover, Germany
- Antje Wulff
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Carl-Neuberg-Str. 1, 30625, Hannover, Germany
6
Steiner B, Saalfeld B, Elgert L, Haux R, Wolf KH. OnTARi: an ontology for factors influencing therapy adherence to rehabilitation. BMC Med Inform Decis Mak 2021; 21:153. [PMID: 33975585 PMCID: PMC8111729 DOI: 10.1186/s12911-021-01512-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2021] [Accepted: 04/28/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Adherence and motivation are key factors for successful treatment of patients with chronic diseases, especially in long-term care processes like rehabilitation. However, only a few patients achieve good treatment adherence. The causes are manifold. Adherence-influencing factors vary depending on indications, therapies, and individuals. Their positive and negative effects are rarely confirmed and are sometimes contradictory. An ontology seems well suited to represent existing knowledge in this domain and to make it available for information retrieval. METHODS First, current knowledge in the domain of treatment adherence in rehabilitation was extracted manually. Data were retrieved from various sources, including basic literature, scientific publications, and health behavior models. Second, all adherence and motivation factors identified were formalized according to the ontology development methodology METHONTOLOGY. This comprises the specification, conceptualization, formalization, and implementation of the ontology "Ontology for factors influencing therapy adherence to rehabilitation" (OnTARi) in Protégé. Two domain experts conducted a taxonomy-oriented evaluation. RESULTS OnTARi is implemented in the Web Ontology Language (OWL) and includes 281 classes, ten object properties, 22 data properties, 1440 logical axioms, 244 individuals, and 1023 annotations. Six higher-level classes are differentiated: (1) Adherence, (2) AdherenceFactors, (3) AdherenceFactorCategory, (4) Rehabilitation, (5) RehabilitationForm, and (6) RehabilitationType. The class AdherenceFactors represents 227 adherence factors, 49 of which are hard factors. Each factor includes a description, synonyms, any existing acronyms, and a German translation. OnTARi represents links between adherence factors through 160 "influences" relations. 
Description logic queries implemented in Protégé allow multiple targeted requests, e.g., for the extraction of adherence factors in a specific rehabilitation area. CONCLUSIONS With OnTARi, a generic reference model was built to represent potential adherence and motivation factors and their interrelations in rehabilitation of patients with chronic diseases. In terms of information retrieval, this formalization can serve as a basis for implementation and adaptation of conventional rehabilitative measures, taking into account (patient-specific) adherence factors. OnTARi also enables the development of medical assistance systems to increase motivation and adherence in rehabilitation processes.
Affiliation(s)
- Bianca Steiner
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Braunschweig, Germany.
- Birgit Saalfeld
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover, Germany
- Lena Elgert
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover, Germany
- Reinhold Haux
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Braunschweig, Germany
- Klaus-Hendrik Wolf
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover, Germany
7
Denton N, Molloy M, Charleston S, Lipset C, Hirsch J, Mulberg AE, Howard P, Marsh ED. Data silos are undermining drug development and failing rare disease patients. Orphanet J Rare Dis 2021; 16:161. [PMID: 33827602 PMCID: PMC8025897 DOI: 10.1186/s13023-021-01806-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/05/2021] [Accepted: 03/30/2021] [Indexed: 11/10/2022]
Abstract
Data silos are proliferating while research and development activity explodes following genetic and immunological advances for many clinically described disorders with previously unknown etiologies. The latter event has inspired optimism in the patient, clinical, and research communities that disease-specific treatments are on the way. However, we fear the tendency of various stakeholders to balkanize databases in proprietary formats, driven by current economic and academic incentives, will inevitably fragment the expanding knowledge base and undermine current and future research efforts to develop much-needed treatments. The proliferation of proprietary databases, compounded by a paucity of meaningful outcome measures and/or good natural history data, slows our ability to generate scalable solutions to benefit chronically underserved patient populations in ways that would translate to more common diseases. The current research and development landscape sets too many projects up for unnecessary failure, particularly in the rare disease sphere, and does a grave disservice to highly vulnerable patients. This system also encourages the collection of redundant data in uncoordinated parallel studies and registries to ultimately delay or deny potential treatments for ostensibly tractable diseases; it also promotes the waste of precious time, energy, and resources. Groups at the National Institutes of Health and Food and Drug Administration have started programs to address these issues. However, we and many others feel there should be significantly more discussion of how to coordinate and scale registry efforts. Such discourse aims to reduce needless complexity and duplication of efforts, as well as promote a pre-competitive knowledge ecosystem for rare disease drug development that cultivates and accelerates innovation.
Affiliation(s)
- Nathan Denton
- Gene Therapy Program, Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Monique Molloy
- Department of Medicine, Orphan Disease Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Samantha Charleston
- Department of Medicine, Orphan Disease Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Jonathan Hirsch
- Syapse, San Francisco, CA, USA
- Bios Ventures, San Francisco, CA, USA
- Paul Howard
- Amicus Therapeutics, Philadelphia, PA, 19104, USA.
- Eric D Marsh
- Department of Medicine, Orphan Disease Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Departments of Neurology and Pediatrics, Children's Hospital of Philadelphia, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
8
Denton N, Molloy M, Charleston S, Lipset C, Hirsch J, Mulberg AE, Howard P, Marsh ED. Data silos are undermining drug development and failing rare disease patients. Orphanet J Rare Dis 2021. [PMID: 33827602 DOI: 10.1186/s13023-021-01806-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 04/30/2023]
9
Abstract
BACKGROUND Assessing the quality of healthcare data is a complex task that includes selecting suitable measurement methods (MMs) and adequately assessing their results. OBJECTIVES To present an interoperable data quality (DQ) assessment method that formalizes MMs based on standardized data definitions and is intended to support collaborative governance of DQ-assessment knowledge, e.g. which MMs to apply and how to assess their results in different situations. METHODS We describe and explain the central concepts of our method using the example of its first real-world application, in a study on predictive biomarkers for rejection and other injuries of kidney transplants. We applied our open-source tool, openCQA, which implements our method using the openEHR specifications. Collaborative governance of DQ-assessment knowledge is supported by the version-control system git and by openEHR clinical information models. RESULTS Applying the method to the study's dataset showed that the described concepts are practicable and produced useful results for DQ-assessment. CONCLUSIONS The main contribution of our work is a set of applicable concepts and a tested, exemplary open-source implementation for interoperable, knowledge-based DQ-assessment in healthcare that accommodates flexible task- and domain-specific requirements.
Affiliation(s)
- Erik Tute
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
- Irina Scheffner
- Department of Nephrology, Hannover Medical School, Hannover, Germany
- Michael Marschollek
- Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany
10
Gerdesköld C, Toth-Pal E, Wårdh I, Nilsson GH, Nager A. Use of online knowledge base in primary health care and correlation to health care quality: an observational study. BMC Med Inform Decis Mak 2020; 20:294. [PMID: 33198720 PMCID: PMC7670813 DOI: 10.1186/s12911-020-01313-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/09/2020] [Accepted: 10/30/2020] [Indexed: 11/10/2022]
Abstract
Background Evidence-based information available at the point of care improves patient care outcomes. Online knowledge bases can increase the application of evidence-based medicine and influence patient outcome data, which may be captured in quality registries. The aim of this study was to explore the effect of using an online knowledge base on patient experiences and health care quality. Methods The study was conducted as a retrospective, observational study of 24 primary health care centers in Sweden, exploring their use of an online knowledge base. Frequency of use was compared with patient outcomes in two national quality registries. A socio-economic Care Need Index was applied to assess whether the burden of care influenced the results from those quality registries. Non-parametric statistical methods and linear regression were used. Results Frequency of knowledge base use separated the centers into two groups, frequent and non-frequent users, with a significant difference in use between the groups (p < 0.001). Outcome data showed significantly higher values for all seven National Primary Care Patient Survey dimensions in frequent compared with non-frequent knowledge base users (p < 0.001), whereas 10 of 11 parameters in the National Diabetes Register showed no differences between the groups (p > 0.05). Adjusting for the Care Need Index had almost no effect on the outcomes for the groups. Conclusions Frequent users of a national online knowledge base received higher ratings on patient experiences, but figures on health care quality in diabetes showed almost no correlation. The findings indicate that some effects may be attributable to knowledge base use, which warrants a controlled evaluation.
Affiliation(s)
- Christian Gerdesköld
- Division of Family Medicine and Primary Care, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Alfred Nobels Allé 23 D2, 141 83, Stockholm, Sweden
- Eva Toth-Pal
- Division of Family Medicine and Primary Care, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Alfred Nobels Allé 23 D2, 141 83, Stockholm, Sweden; Academic Primary Health Care Centre, Region Stockholm, Sweden
- Inger Wårdh
- Department of Dental Medicine, Academic Centre of Geriatric Dentistry, Karolinska Institutet, Stockholm, Sweden
- Gunnar H Nilsson
- Division of Family Medicine and Primary Care, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Alfred Nobels Allé 23 D2, 141 83, Stockholm, Sweden; Academic Primary Health Care Centre, Region Stockholm, Sweden
- Anna Nager
- Division of Family Medicine and Primary Care, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Alfred Nobels Allé 23 D2, 141 83, Stockholm, Sweden; Medibas, Bonnier Healthcare Sweden, Stockholm, Sweden
11
Abstract
In this paper, we tell the story of efforts currently underway, on diverse fronts, to build digital knowledge repositories ('knowledge bases') to support research in the life sciences. If successful, knowledge bases will be part of a new knowledge infrastructure capable of facilitating ever more comprehensive computational models of biological systems. Such an infrastructure would, however, represent a sea change in the technological management and manipulation of complex data, inducing a generational shift in how questions are asked and answered and how results are published and circulated. Integrating such knowledge bases into the daily workflow of the lab thus destabilizes a number of well-established habits that biologists rely on to ensure the quality of the knowledge they produce, evaluate, communicate and exploit. As the story we tell here shows, such destabilization introduces a situation of unfamiliarity, one that carries with it epistemic risks. It should elicit, to use Niklas Luhmann's terms, the question of trust: a shared recognition that the reliability of research practices is being risked, but that such a risk is worth taking in view of what may be gained. And yet, the problem of trust is being unexpectedly silenced. How that silencing has come about, why it matters, and what might yet be done form the heart of this paper.
Affiliation(s)
- Rune Nydal
- Programme for Applied Ethics, Department of Philosophy and Religious Studies, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- Gaymon Bennett
- School of Historical, Philosophical, and Religious Studies, Arizona State University, Tempe, AZ 85287-4302, USA
- Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
- Astrid Lægreid
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway
12
Silsand L, Severinsen GH, Pedersen R, Ellingsen G. Preconditions for Enabling Advanced Patient-Centered Decision Support on a National Knowledge Information Infrastructure. Stud Health Technol Inform 2019; 264:1773-1774. [PMID: 31438337 DOI: 10.3233/shti190641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In Western healthcare, an important goal is to provide clinical decision support "for the right healthcare personnel, in the right situation, at the right time". In this poster, we use a qualitative approach to outline the preconditions for enabling such advanced patient-centered decision support. This study indicates that establishing a national knowledge information infrastructure demands well-defined national standards, codes, and terminologies, as well as structured clinical data. An extensive governance structure is also required.
Affiliation(s)
- Line Silsand
- Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway
- Gro-Hilde Severinsen
- Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway
- Rune Pedersen
- Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway; Telemedicine and E-health Research Group, The Arctic University of Norway, Tromsø, Norway
- Gunnar Ellingsen
- Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway; Telemedicine and E-health Research Group, The Arctic University of Norway, Tromsø, Norway
13
Zhu R, Han S, Su Y, Zhang C, Yu Q, Duan Z. The application of big data and the development of nursing science: A discussion paper. Int J Nurs Sci 2019; 6:229-34. [PMID: 31406897 DOI: 10.1016/j.ijnss.2019.03.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/15/2018] [Revised: 01/20/2019] [Accepted: 03/04/2019] [Indexed: 11/23/2022]
Abstract
Drawing on the concept and current research status of big data, we examine the importance of constructing a knowledge system of nursing science for the development of the nursing discipline in the context of big data. We propose establishing big data centers for nursing science in order to share resources, unify language standards, improve professional nursing databases, and establish a knowledge system structure.
14
Lenert MC, Walsh CG, Miller RA. Discovering hidden knowledge through auditing clinical diagnostic knowledge bases. J Biomed Inform 2018; 84:75-81. [PMID: 29940263 DOI: 10.1016/j.jbi.2018.06.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/09/2018] [Revised: 06/19/2018] [Accepted: 06/21/2018] [Indexed: 11/21/2022]
Abstract
OBJECTIVE To evaluate the potential of data mining auditing techniques to identify hidden concepts in diagnostic knowledge bases (KBs). Improving completeness enhances KB applications such as differential diagnosis and patient case simulation. MATERIALS AND METHODS The authors used unsupervised methods (Pearson's correlation, PC; Kendall's correlation, KC; and a heuristic algorithm, HA) to identify existing and discover new finding-finding interrelationships ("properties") in the INTERNIST-1/QMR KB. The authors also estimated the KB maintenance efficiency gains (effort reduction) of these approaches. RESULTS The methods discovered new properties at rates with 95% confidence intervals of [0.1%, 5.4%] (PC), [2.8%, 12.5%] (KC), and [5.6%, 18.8%] (HA). The estimated manual effort reduction for HA-assisted determination of new properties was approximately 50-fold. CONCLUSION Data mining can provide an efficient supplement for ensuring the completeness of finding-finding interdependencies in diagnostic knowledge bases. The authors' findings should be applicable to other diagnostic systems that record finding frequencies within diseases (e.g., DXplain, ISABEL).
15
Maiella S, Olry A, Hanauer M, Lanneau V, Lourghi H, Donadille B, Rodwell C, Köhler S, Seelow D, Jupp S, Parkinson H, Groza T, Brudno M, Robinson PN, Rath A. Harmonising phenomics information for a better interoperability in the rare disease field. Eur J Med Genet 2018; 61:706-714. [PMID: 29425702 DOI: 10.1016/j.ejmg.2018.01.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/01/2017] [Revised: 11/30/2017] [Accepted: 01/27/2018] [Indexed: 01/30/2023]
Abstract
HIPBI-RD (Harmonising phenomics information for a better interoperability in the rare disease field) is a three-year project, started in 2016 and funded via the E-Rare 3 ERA-NET program. The project builds on three resources widely adopted by the rare disease (RD) community: Orphanet and its ontology ORDO (the Orphanet Rare Disease Ontology); HPO (the Human Phenotype Ontology); and the PhenoTips software for the capture and sharing of structured phenotypic data for RD patients. The project is further supported by resources developed by the European Bioinformatics Institute and the Garvan Institute. HIPBI-RD aims to provide the community with an integrated, RD-specific bioinformatics ecosystem that will harmonise the way phenomics information is stored in databases and patient files worldwide, and thereby contribute to interoperability. This ecosystem will consist of a suite of tools and ontologies, optimized to work together and made available through commonly used software repositories. The project workplan follows three main objectives: The HIPBI-RD ecosystem will contribute to the interpretation of variants identified through exome and full genome sequencing by harmonising the way phenotypic information is collected, thus improving diagnostics and delineation of RD. The ultimate goal of HIPBI-RD is to provide a resource that will contribute to bridging genome-scale biology and a disease-centered view on human pathobiology. Achievements in Year 1.
Affiliation(s)
- Sylvie Maiella
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
- Annie Olry
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
- Marc Hanauer
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
- Valérie Lanneau
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
- Halima Lourghi
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
- Bruno Donadille
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
- Charlotte Rodwell
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
- Sebastian Köhler
- NeuroCure Cluster of Excellence, Charité Universitätsklinikum, Charitéplatz 1, 10117 Berlin, Germany
- Dominik Seelow
- NeuroCure Cluster of Excellence, Charité Universitätsklinikum, Charitéplatz 1, 10117 Berlin, Germany
- Simon Jupp
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Helen Parkinson
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
- Tudor Groza
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Michael Brudno
- Department of Computer Science, University of Toronto, Toronto M5S 1A1, Canada
- Peter N Robinson
- Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Ana Rath
- INSERM, US14 - Orphanet, Plateforme Maladies Rares, 75014 Paris, France
16
Abstract
Background The provision of health and wellness care is undergoing an enormous transformation. A key element of this revolution consists in prioritizing prevention and proactivity based on the analysis of people’s conduct and the empowerment of individuals in their self-management. Digital technologies are unquestionably destined to be the main engine of this change, with an increasing number of domain-specific applications and devices commercialized every year; however, there is an apparent lack of frameworks capable of orchestrating and intelligently leveraging all the data, information and knowledge generated through these systems. Methods This work presents Mining Minds, a novel framework that builds on the core ideas of the digital health and wellness paradigms to enable the provision of personalized support. Mining Minds embraces some of the most prominent digital technologies, ranging from Big Data and Cloud Computing to Wearables and the Internet of Things, as well as modern concepts and methods, such as context-awareness, knowledge bases or analytics, to holistically and continuously investigate people’s lifestyles and provide a variety of smart coaching and support services. Results This paper comprehensively describes the efficient and rational combination and interoperation of these technologies and methods through Mining Minds, while meeting the essential requirements posed by a framework for personalized health and wellness support. Moreover, this work presents a realization of the key architectural components of Mining Minds, as well as various exemplary user applications and expert tools that illustrate some of the potential services supported by the proposed framework. Conclusions Mining Minds constitutes an innovative, holistic means to inspect human behavior and provide personalized health and wellness support. The principles behind this framework uncover new research ideas and may serve as a reference for similar initiatives.
Affiliation(s)
- Oresti Banos
- Department of Computer Engineering, Kyung Hee University, 1732 Deokyoungdae-ro, Giheung-gu, Yongin-si, 446-701, Korea
- Muhammad Bilal Amin
- Department of Computer Engineering, Kyung Hee University, 1732 Deokyoungdae-ro, Giheung-gu, Yongin-si, 446-701, Korea
- Wajahat Ali Khan
- Department of Computer Engineering, Kyung Hee University, 1732 Deokyoungdae-ro, Giheung-gu, Yongin-si, 446-701, Korea
- Muhammad Afzal
- Department of Computer Engineering, Kyung Hee University, 1732 Deokyoungdae-ro, Giheung-gu, Yongin-si, 446-701, Korea
- Maqbool Hussain
- Department of Computer Engineering, Kyung Hee University, 1732 Deokyoungdae-ro, Giheung-gu, Yongin-si, 446-701, Korea
- Byeong Ho Kang
- School of Computing and Information Systems, University of Tasmania, Churchill Avenue, Hobart, Tasmania, 7005, Australia
- Sungyong Lee
- Department of Computer Engineering, Kyung Hee University, 1732 Deokyoungdae-ro, Giheung-gu, Yongin-si, 446-701, Korea
17
Abstract
The invention of fictional ideas (ideation) is often a central process in the creative production of artefacts such as poems, music and paintings, but has barely been studied in the computational creativity community.
We present here a general approach to automated fictional ideation that works by manipulating facts specified in knowledge bases. More specifically, we specify a number of constructions which, by altering and combining facts from a knowledge base, result in the generation of fictions. Moreover, we present an instantiation of these constructions through the use of ConceptNet, a database of common sense knowledge. To evaluate the success of these constructions, we present a curation analysis that calculates the proportion of ideas which pass a typicality judgement. We further evaluate the output of this approach through a crowd-sourcing experiment in which participants were asked to rank ideas. We found a positive correlation between the participants’ rankings and a chaining inference technique that automatically assesses the value of the fictions generated through our approach. We believe that these results show that this approach constitutes a firm basis for automated fictional ideation with evaluative capacity.
Affiliation(s)
- Simon Colton
- Department of Computing, Goldsmiths, University of London, London, UK
- Rose Hepworth
- Department of Computing, Goldsmiths, University of London, London, UK
- Jeremy Gow
- Department of Computing, Goldsmiths, University of London, London, UK
18
McCoy AB, Wright A, Rogith D, Fathiamini S, Ottenbacher AJ, Sittig DF. Development of a clinician reputation metric to identify appropriate problem-medication pairs in a crowdsourced knowledge base. J Biomed Inform 2013; 48:66-72. [PMID: 24321170 DOI: 10.1016/j.jbi.2013.11.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/05/2013] [Revised: 10/23/2013] [Accepted: 11/29/2013] [Indexed: 02/08/2023]
Abstract
BACKGROUND Correlation of data within electronic health records is necessary for the implementation of various clinical decision support functions, including patient summarization. A key type of correlation is linking medications to clinical problems; while some databases of problem-medication links are available, they are not robust and depend on problems and medications being encoded in particular terminologies. Crowdsourcing represents one approach to generating robust knowledge bases across a variety of terminologies, but more sophisticated approaches are necessary to improve accuracy and reduce manual data review requirements. OBJECTIVE We sought to develop and evaluate a clinician reputation metric to facilitate the identification of appropriate problem-medication pairs through crowdsourcing without requiring extensive manual review. APPROACH We retrieved medications from our clinical data warehouse that had been prescribed and manually linked to one or more problems by clinicians during e-prescribing between June 1, 2010 and May 31, 2011. We identified measures likely to be associated with the percentage of accurate problem-medication links made by clinicians. Using logistic regression, we created a metric for identifying clinicians who had made at least 95% appropriate links. We evaluated the accuracy of the approach by comparing links made by those physicians identified as having appropriate links with a previously manually validated subset of problem-medication pairs. RESULTS Of 867 clinicians who asserted a total of 237,748 problem-medication links during the study period, 125 had a reputation metric that predicted a percentage of appropriate links of at least 95%. These clinicians asserted a total of 2464 linked problem-medication pairs (983 distinct pairs). Compared with a previously validated set of problem-medication pairs, the reputation metric achieved a specificity of 99.5% and marginally improved the sensitivity of previously described knowledge bases. CONCLUSION A reputation metric may be a valuable measure for identifying high-quality, clinician-entered, crowdsourced data.
Affiliation(s)
- Allison B McCoy
- The University of Texas School of Biomedical Informatics at Houston, 7000 Fannin St., Ste. 600, Houston, TX 77030, USA
- Adam Wright
- Brigham and Women's Hospital, Harvard Medical School, 1620 Tremont St., Boston, MA 02115, USA
- Deevakar Rogith
- The University of Texas School of Biomedical Informatics at Houston, 7000 Fannin St., Ste. 600, Houston, TX 77030, USA
- Safa Fathiamini
- The University of Texas School of Biomedical Informatics at Houston, 7000 Fannin St., Ste. 600, Houston, TX 77030, USA
- Allison J Ottenbacher
- The University of Texas Medical School at Houston, 6410 Fannin St., Ste. 1100, Houston, TX 77030, USA
- Dean F Sittig
- The University of Texas School of Biomedical Informatics at Houston, 7000 Fannin St., Ste. 600, Houston, TX 77030, USA