1
|
Ferrão JC, Oliveira MD, Janela F, Martins HMG. Preprocessing structured clinical data for predictive modeling and decision support. A roadmap to tackle the challenges. Appl Clin Inform 2016; 7:1135-1153. [PMID: 27924347 PMCID: PMC5228148 DOI: 10.4338/aci-2016-03-soa-0035] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/06/2016] [Accepted: 10/01/2016] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND EHR systems have high potential to improve healthcare delivery and management. Although structured EHR data provide information in machine-readable formats, their use for decision support still poses technical challenges for researchers, because the data must be preprocessed and converted into a matrix format. During our research, we observed that the clinical informatics literature does not provide guidance for researchers on how to build this matrix while avoiding potential pitfalls. OBJECTIVES This article aims to provide researchers with a roadmap of the main technical challenges of preprocessing structured EHR data and possible strategies to overcome them. METHODS Following the standard data processing stages (extracting database entries, defining features, processing data, assessing feature values and integrating data elements, which together form the EDPAI framework), we identified the main challenges faced by researchers and reflected on how to address them, based on lessons learned from our research experience and on best practices from the related literature. We highlight the main potential sources of error, present strategies to approach those challenges and discuss the implications of these strategies. RESULTS Following the EDPAI framework, researchers face five key challenges: (1) gathering and integrating data, (2) identifying and handling different feature types, (3) combining features to handle redundancy and granularity, (4) addressing data missingness, and (5) handling multiple feature values. Strategies to address these challenges include: cross-checking identifiers for robust data retrieval and integration; applying clinical knowledge in identifying feature types, in addressing redundancy and granularity, and in accommodating multiple feature values; and investigating missing patterns adequately. CONCLUSIONS This article contributes to the literature by providing a roadmap to inform structured EHR data preprocessing. It may advise researchers on potential pitfalls and implications of methodological decisions in handling structured data, so as to avoid biases and help realize the benefits of the secondary use of EHR data.
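As a rough illustration of the matrix-building step this abstract refers to, the sketch below pivots long-format entries into a patient-by-feature matrix, collapses multiple values per feature, and reports missingness per feature. It is a minimal sketch and not the authors' method: the sample entries, the function names `build_matrix` and `missing_fraction`, and the choice of `max` as the aggregation rule are all assumptions made for illustration.

```python
# Hypothetical long-format entries: (patient_id, feature, value).
entries = [
    ("p1", "creatinine", 1.1),
    ("p1", "creatinine", 1.4),  # repeated measurement for the same patient
    ("p1", "age", 63),
    ("p2", "age", 71),          # p2 has no creatinine value at all
    ("p3", "creatinine", 0.9),
    ("p3", "age", 58),
]


def build_matrix(entries, aggregate=max):
    """Pivot (patient, feature, value) rows into a patient-by-feature matrix.

    Multiple values for one patient and feature are collapsed with
    `aggregate` (an assumed rule); absent values become None.
    """
    values = {}
    for patient, feature, value in entries:
        values.setdefault(patient, {}).setdefault(feature, []).append(value)
    patients = sorted(values)
    features = sorted({feature for _, feature, _ in entries})
    matrix = [
        [aggregate(values[p][f]) if f in values[p] else None for f in features]
        for p in patients
    ]
    return patients, features, matrix


def missing_fraction(features, matrix):
    """Return the fraction of patients with no value, per feature."""
    n_rows = len(matrix)
    return {
        f: sum(1 for row in matrix if row[j] is None) / n_rows
        for j, f in enumerate(features)
    }


patients, features, matrix = build_matrix(entries)
missing = missing_fraction(features, matrix)
```

Which aggregation rule is appropriate (latest value, maximum, mean) is a clinical decision, which is why it is left as a parameter here.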
Affiliation(s)
- José Carlos Ferrão
- Rua Irmãos Siemens 1, Ed. 3 Piso 3, 2720-093 Amadora, Portugal, Telephone: (+351) 214 178 668, Fax: (+351) 214 178 030
|
2
|
Basole RC, Braunstein ML, Sun J. Data and Analytics Challenges for a Learning Healthcare System. ACM Journal of Data and Information Quality 2015. [DOI: 10.1145/2755489] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
Affiliation(s)
- Jimeng Sun
- Georgia Institute of Technology, Atlanta, Georgia
|
3
|
Bettencourt-Silva JH, Clark J, Cooper CS, Mills R, Rayward-Smith VJ, de la Iglesia B. Building Data-Driven Pathways From Routinely Collected Hospital Data: A Case Study on Prostate Cancer. JMIR Med Inform 2015; 3:e26. [PMID: 26162314 PMCID: PMC4526987 DOI: 10.2196/medinform.4221] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/15/2015] [Revised: 04/25/2015] [Accepted: 04/27/2015] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Routinely collected data in hospitals is complex, typically heterogeneous, and scattered across multiple Hospital Information Systems (HIS). This big data, created as a byproduct of health care activities, has the potential to provide a better understanding of diseases, unearth hidden patterns, and improve services and cost. The extent and uses of such data rely on its quality, which is not consistently checked, nor fully understood. Nevertheless, using routine data for the construction of data-driven clinical pathways, describing processes and trends, is a key topic receiving increasing attention in the literature. Traditional algorithms do not cope well with unstructured processes or data, and do not produce clinically meaningful visualizations. Supporting systems that provide additional information, context, and quality assurance inspection are needed. OBJECTIVE The objective of the study is to explore how routine hospital data can be used to develop data-driven pathways that describe the journeys that patients take through care, and their potential uses in biomedical research; it proposes a framework for the construction, quality assessment, and visualization of patient pathways for clinical studies and decision support using a case study on prostate cancer. METHODS Data pertaining to prostate cancer patients were extracted from a large UK hospital from eight different HIS, validated, and complemented with information from the local cancer registry. Data-driven pathways were built for each of the 1904 patients and an expert knowledge base, containing rules on the prostate cancer biomarker, was used to assess the completeness and utility of the pathways for a specific clinical study. Software components were built to provide meaningful visualizations for the constructed pathways. 
RESULTS The proposed framework and pathway formalism enable the summarization, visualization, and querying of complex patient-centric clinical information, as well as the computation of quality indicators and dimensions. A novel graphical representation of the pathways allows the synthesis of such information. CONCLUSIONS Clinical pathways built from routinely collected hospital data can unearth information about patients and diseases that may otherwise be unavailable or overlooked in hospitals. Data-driven clinical pathways allow for heterogeneous data (ie, semistructured and unstructured data) to be collated over a unified data model and for data quality dimensions to be assessed. This work has enabled further research on prostate cancer and its biomarkers, and on the development and application of methods to mine, compare, analyze, and visualize pathways constructed from routine data. This is an important development for the reuse of big data in hospitals.
|
4
|
Abstract
The exponential growth of experimental and clinical data generated from systematic studies, the complexity of health and disease, and the demand for systems models are bringing bioinformatics to the center stage of pharmacogenomics and systems biology. Bioinformatics plays an essential role in bridging the gap among different knowledge domains for the translation of the voluminous data into better diagnosis, prognosis, prevention, and treatment. Bioinformatics is essential in finding the spatiotemporal patterns in pharmacogenomics, including the time-series analyses of the associations between genetic structural variations and functional alterations such as drug responses. Elucidating the cross talk among different systems levels and time scales can contribute to the discovery of accurate and robust biomarkers at various disease stages for the development of systems and dynamical medicine. Various resources are available for such purposes, including databases and tools supporting "omics" studies such as genomics, proteomics, epigenomics, transcriptomics, metabolomics, lipidomics, pharmacogenomics, and chronomics. The combination of bioinformatics and health informatics methods would provide powerful decision support in both scientific and clinical environments. Data integration, data mining, and knowledge discovery (KD) methods would enable the simulation of complex systems and dynamical networks to establish predictive models for achieving predictive, preventive, and personalized medicine.
Affiliation(s)
- Qing Yan
- PharmTao, 5672, 4601 Lafayette Street, Santa Clara, CA, 95056-5672, USA
|
5
|
Sengupta D, Sood M, Vijayvargia P, Hota S, Naik PK. Association rule mining based study for identification of clinical parameters akin to occurrence of brain tumor. Bioinformation 2013; 9:555-9. [PMID: 23888095 PMCID: PMC3717182 DOI: 10.6026/97320630009555] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/13/2013] [Accepted: 06/16/2013] [Indexed: 11/23/2022] Open
Abstract
The healthcare sector generates a large amount of information on the diagnosis, disease identification and treatment of individuals. Mining knowledge from clinical datasets to support scientific decision-making in the diagnosis and treatment of disease is therefore becoming increasingly necessary. The aim of this study was to assess the applicability of knowledge discovery in a brain tumor data warehouse, applying data mining techniques to investigate clinical parameters that can be associated with the occurrence of brain tumor. In this study, a brain tumor warehouse was developed comprising clinical data for 550 patients. The Apriori association rule algorithm was applied to discover association rules among the clinical parameters. The rules discovered in the study suggest that high values of creatinine, blood urea nitrogen (BUN), SGOT and SGPT are directly associated with tumor occurrence for patients in the primary stage, with at least 85% confidence and more than 50% support. A normalized regression model is proposed based on these parameters along with haemoglobin content, alkaline phosphatase and serum bilirubin for prediction of the occurrence of STATE (brain tumor) as 0 (absent) or 1 (present). The results indicate that the methodology followed will be of good value for the diagnostic procedure of brain tumor, especially when large data volumes are involved, and that screening based on the discovered parameters would allow clinicians to detect tumors at an early stage of development.
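The support and confidence thresholds quoted in this abstract are the two standard measures used to score association rules. The sketch below shows how they are computed for a single rule; it is an illustration only, using made-up toy records and hypothetical item names (`high_creatinine`, `high_bun`, `tumor`), and it does not reproduce the study's data, its Apriori candidate generation, or its results.

```python
# Toy records, each a set of observed items (hypothetical, not study data).
records = [
    {"high_creatinine", "high_bun", "tumor"},
    {"high_creatinine", "high_bun", "tumor"},
    {"high_creatinine", "tumor"},
    {"high_bun"},
    {"high_creatinine", "high_bun"},
]


def support(records, itemset):
    """Fraction of records that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for record in records if itemset <= record)
    return hits / len(records)


def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the records."""
    joint = support(records, set(antecedent) | set(consequent))
    base = support(records, antecedent)
    return joint / base if base else 0.0


# Rule under test: {high_creatinine, high_bun} -> {tumor}
rule_support = support(records, {"high_creatinine", "high_bun", "tumor"})
rule_confidence = confidence(records, {"high_creatinine", "high_bun"}, {"tumor"})
```

On this toy data the rule has support 0.4 (2 of 5 records) and confidence of about 0.67 (2 of the 3 records matching the antecedent), so it would not pass thresholds like the ones reported above.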
Affiliation(s)
- Dipankar Sengupta
- Dept. of Biotechnology & Bioinformatics, Jaypee University of Information Technology, Waknaghat, Solan, H.P., India
|
6
|
Bettencourt-Silva J, De La Iglesia B, Donell S, Rayward-Smith V. On creating a patient-centric database from multiple Hospital Information Systems. Methods Inf Med 2011; 51:210-20. [PMID: 21818520 DOI: 10.3414/me10-01-0069] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 09/16/2010] [Accepted: 05/16/2011] [Indexed: 11/09/2022]
Abstract
BACKGROUND The information present in Hospital Information Systems (HIS) is heterogeneous and is used primarily by health practitioners to support and improve patient care. Conducting clinical research, data analyses or knowledge discovery projects using electronic patient data in secondary care centres relies on accurate data collection, which is often an ad-hoc process poorly described in the literature. OBJECTIVES This paper aims at facilitating and expanding on the process of retrieving and collating patient-centric data from multiple HIS for the purpose of creating a research database. The development of a process roadmap for this purpose illustrates and exposes the constraints and drawbacks of undertaking such work in secondary care centres. METHODS A data collection exercise was carried out using a combined approach based on segments of well-established data mining and knowledge discovery methodologies, previous work on clinical data integration and local expert consultation. A case study on prostate cancer was carried out at an English regional National Health Service (NHS) hospital. RESULTS The process for data retrieval described in this paper allowed patient-centric data, pertaining to the case study on prostate cancer, to be successfully collected from multiple heterogeneous hospital sources, and collated in a format suitable for further clinical research. CONCLUSIONS The data collection exercise described in this paper exposes the lengthy and difficult journey of retrieving and collating patient-centric, multi-source data from a hospital, which is a non-trivial task, and one which will greatly benefit from further attention from researchers and hospital IT management.
Affiliation(s)
- J Bettencourt-Silva
- School of Computing Sciences, University of East Anglia, Norwich, United Kingdom.
|
7
|
Sharov AA. Functional Information: Towards Synthesis of Biosemiotics and Cybernetics. Entropy (Basel, Switzerland) 2010; 12:1050-1070. [PMID: 22368439 PMCID: PMC3285384 DOI: 10.3390/e12051050] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 01/01/2023]
Abstract
Biosemiotics and cybernetics are closely related, yet they are separated by the boundary between life and non-life: biosemiotics is focused on living organisms, whereas cybernetics is applied mostly to non-living artificial devices. However, both classes of systems are agents that perform functions necessary for reaching their goals. I propose to shift the focus of biosemiotics from living organisms to agents in general, which all belong to a pragmasphere or functional universe. Agents should be considered in the context of their hierarchy and origin because their semiosis can be inherited or induced by higher-level agents. To preserve and disseminate their functions, agents use functional information - a set of signs that encode and control their functions. It includes stable memory signs, transient messengers, and natural signs. The origin and evolution of functional information is discussed in terms of transitions between vegetative, animal, and social levels of semiosis, defined by Kull. Vegetative semiosis differs substantially from higher levels of semiosis, because signs are recognized and interpreted via direct code-based matching and are not associated with ideal representations of objects. Thus, I consider a separate classification of signs at the vegetative level that includes proto-icons, proto-indexes, and proto-symbols. Animal and social semiosis are based on classification, and modeling of objects, which represent the knowledge of agents about their body (Innenwelt) and environment (Umwelt).
Affiliation(s)
- Alexei A. Sharov
- National Institute on Aging, 251 Bayview Boulevard, Baltimore, MD 21224, USA
|
8
|
Zhou X, Chen S, Liu B, Zhang R, Wang Y, Li P, Guo Y, Zhang H, Gao Z, Yan X. Development of traditional Chinese medicine clinical data warehouse for medical knowledge discovery and decision support. Artif Intell Med 2010; 48:139-52. [PMID: 20122820 DOI: 10.1016/j.artmed.2009.07.012] [Citation(s) in RCA: 154] [Impact Index Per Article: 11.0] [Received: 08/16/2008] [Revised: 07/22/2009] [Accepted: 07/23/2009] [Indexed: 01/14/2023]
Abstract
OBJECTIVE Traditional Chinese medicine (TCM) is a scientific discipline, which develops the related theories from the long-term clinical practices. The large-scale clinical data are the core empirical knowledge source for TCM research. This paper introduces a clinical data warehouse (CDW) system, which incorporates the structured electronic medical record (SEMR) data for medical knowledge discovery and TCM clinical decision support (CDS). MATERIALS AND METHODS We have developed the clinical reference information model (RIM) and physical data model to manage the various information entities and their relationships in TCM clinical data. An extraction-transformation-loading (ETL) tool is implemented to integrate and normalize the clinical data from different operational data sources. The CDW includes online analytical processing (OLAP) and complex network analysis (CNA) components to explore the various clinical relationships. Furthermore, the data mining and CNA methods are used to discover the valuable clinical knowledge from the data. RESULTS The CDW has integrated 20,000 TCM inpatient data and 20,000 outpatient data, which contains manifestations (e.g. symptoms, physical examinations and laboratory test results), diagnoses and prescriptions as the main information components. We propose a practical solution to accomplish the large-scale clinical data integration and preprocessing tasks. Meanwhile, we have developed over 400 OLAP reports to enable the multidimensional analysis of clinical data and the case-based CDS. We have successfully conducted several interesting data mining applications. Particularly, we use various classification methods, namely support vector machine, decision tree and Bayesian network, to discover the knowledge of syndrome differentiation. Furthermore, we have applied association rule and CNA to extract the useful acupuncture point and herb combination patterns from the clinical prescriptions. 
CONCLUSION A CDW system consisting of TCM clinical RIM, ETL, OLAP and data mining as the core components has been developed to facilitate the tasks of TCM knowledge discovery and CDS. We have conducted several OLAP and data mining tasks to explore the empirical knowledge from the TCM clinical data. The CDW platform would be a promising infrastructure to make full use of the TCM clinical data for scientific hypothesis generation, and promote the development of TCM from individualized empirical knowledge to large-scale evidence-based medicine.
Affiliation(s)
- Xuezhong Zhou
- School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
|
9
|
Colombo G, Merico D, Boncoraglio G, De Paoli F, Ellul J, Frisoni G, Nagy Z, van der Lugt A, Vassányi I, Antoniotti M. An ontological modeling approach to cerebrovascular disease studies: the NEUROWEB case. J Biomed Inform 2010; 43:469-84. [PMID: 20074662 DOI: 10.1016/j.jbi.2009.12.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/12/2008] [Revised: 10/29/2009] [Accepted: 12/21/2009] [Indexed: 10/20/2022]
Abstract
The NEUROWEB project supports cerebrovascular researchers' association studies, intended as the search for statistical correlations between a feature (e.g., a genotype) and a phenotype. In this project the phenotype refers to the patients' pathological state, and thus it is formulated on the basis of the clinical data collected during the diagnostic activity. In order to enhance the statistical robustness of the association inquiries, the project involves four European Union clinical institutions. Each institution provides its proprietary repository, storing patients' data. Although all sites comply with common diagnostic guidelines, they also adopt specific protocols, resulting in partially discrepant repository contents. Therefore, in order to effectively exploit NEUROWEB data for association studies, it is necessary to provide a framework for the phenotype formulation, grounded on the clinical repository content which explicitly addresses the inherent integration problem. To that end, we developed an ontological model for cerebrovascular phenotypes, the NEUROWEB Reference Ontology, composed of three layers. The top-layer (Top Phenotypes) is an expert-based cerebrovascular disease taxonomy. The middle-layer deconstructs the Top Phenotypes into more elementary phenotypes (Low Phenotypes) and general-use medical concepts such as anatomical parts and topological concepts. The bottom-layer (Core Data Set, or CDS) comprises the clinical indicators required for cerebrovascular disorder diagnosis. Low Phenotypes are connected to the bottom-layer (CDS) by specifying what combination of CDS values is required for their existence. Finally, CDS elements are mapped to the local repositories of clinical data. The NEUROWEB system exploits the Reference Ontology to query the different repositories and to retrieve patients characterized by a common phenotype.
Affiliation(s)
- Gianluca Colombo
- Dipartimento di Informatica, Sistemistica e Comunicazione (DISCo), Università degli Studi di Milano Bicocca, U14 Viale Sarca 336, I-20126 Milan, Italy
|
10
|
Yan Q. Translational bioinformatics and systems biology approaches for personalized medicine. Methods Mol Biol 2010; 662:167-178. [PMID: 20824471 DOI: 10.1007/978-1-60761-800-3_8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/29/2023]
Abstract
Systems biology and pharmacogenomics are emerging and promising fields that will provide a thorough understanding of diseases and enable personalized therapy. However, one of the most significant obstacles in the practice of personalized medicine is the translation of scientific discoveries into better therapeutic outcomes. Translational bioinformatics is a powerful method to bridge the gap between systems biology research and clinical practice. This goal can be achieved by providing integrative methods that enable predictive models for therapeutic responses. As a medium between bench and bedside, translational bioinformatics has the mission to meet challenges in the development of personalized medicine. On the biomedical side, translational bioinformatics would enable the identification of biomarkers based on systemic analyses. It can improve the understanding of the correlations between genotypes and phenotypes. It would enable novel insights into the interactions and interrelationships among different parts of a whole system. On the informatics side, methods based on data integration, data mining, and knowledge representation can provide decision support for both researchers and clinicians. Data integration is not only for better data access, but also for knowledge discovery. Decision support based on translational bioinformatics means better information and workflow management, efficient literature and resource retrieval, and communication improvement. These approaches are crucial for understanding diseases and applying personalized therapeutics at systems levels.
|
11
|
Sinha A, Hripcsak G, Markatou M. Large datasets in biomedicine: a discussion of salient analytic issues. J Am Med Inform Assoc 2009; 16:759-67. [PMID: 19717808 PMCID: PMC3002128 DOI: 10.1197/jamia.m2780] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 03/03/2008] [Accepted: 08/02/2009] [Indexed: 11/10/2022] Open
Abstract
Advances in high-throughput and mass-storage technologies have led to an information explosion in both biology and medicine, presenting novel challenges for analysis and modeling. With regard to multivariate analysis techniques such as clustering, classification, and regression, large datasets present unique and often misunderstood challenges. The authors' goal is to provide a discussion of the salient problems encountered in the analysis of large datasets as they relate to modeling and inference, to inform a principled and generalizable analysis, and to highlight the interdisciplinary nature of these challenges. The authors present a detailed study of germane issues including high dimensionality, multiple testing, scientific significance, dependence, information measurement, and information management with a focus on appropriate methodologies available to address these concerns. A firm understanding of the challenges and statistical technology involved ultimately contributes to better science. The authors further suggest that the community consider facilitating discussion through interdisciplinary panels, invited papers and curriculum enhancement to establish guidelines for analysis and reporting.
Affiliation(s)
- Anshu Sinha
- Department of Biomedical Informatics, Columbia University, New York, NY
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY
|
12
|
Maldonado JA, Moner D, Boscá D, Fernández-Breis JT, Angulo C, Robles M. LinkEHR-Ed: A multi-reference model archetype editor based on formal semantics. Int J Med Inform 2009; 78:559-70. [DOI: 10.1016/j.ijmedinf.2009.03.006] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 05/30/2008] [Revised: 03/09/2009] [Accepted: 03/15/2009] [Indexed: 10/20/2022]
|
13
|
Viangteeravat T, Brooks IM, Smith EJ, Furlotte N, Vuthipadadon S, Reynolds R, McDonald CS. Slim-prim: a biomedical informatics database to promote translational research. Perspectives in Health Information Management 2009; 6:6. [PMID: 19471646 PMCID: PMC2682606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]
Abstract
With the current national emphasis on translational research, data-exchange systems that can bridge the basic and clinical sciences are vital. To meet this challenge, we have developed Slim-Prim, an integrated data system (IDS) for collecting, processing, archiving, and distributing basic and clinical research data. Slim-Prim is accessed via user-friendly Web-based applications, thus increasing data accessibility and eliminating the security risks inherent with office or laboratory servers. Slim-Prim serves as a laboratory management interface and archival data repository for institutional projects. Importantly, multiple levels of controlled access allow HIPAA-compliant sharing of de-identified information to facilitate data sharing and analysis across research domains; thus Slim-Prim encourages collaboration between researchers and clinicians, an essential factor in the development of translational research. Slim-Prim is an example of utilizing an IDS to improve organizational efficiency and to bridge the gap between laboratory discovery and practice.
|
14
|
Manning M, Aggarwal A, Gao K, Tucker-Kellogg G. Scaling the walls of discovery: using semantic metadata for integrative problem solving. Brief Bioinform 2009; 10:164-76. [DOI: 10.1093/bib/bbp007] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 01/19/2023] Open
|
15
|
Wang X, Liu L, Fackenthal J, Cummings S, Cook M, Hope K, Silverstein JC, Olopade OI. Translational integrity and continuity: personalized biomedical data integration. J Biomed Inform 2009; 42:100-12. [PMID: 18760382 PMCID: PMC2675887 DOI: 10.1016/j.jbi.2008.08.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 02/02/2008] [Revised: 08/04/2008] [Accepted: 08/05/2008] [Indexed: 12/18/2022]
Abstract
Translational research data are generated in multiple research domains from the bedside to experimental laboratories. These data are typically stored in heterogeneous databases, held by segregated research domains, and described with inconsistent terminologies. Such inconsistency and fragmentation of data significantly impedes the efficiency of tracking and analyzing human-centered records. To address this problem, we have developed a data repository and management system named TraM (http://tram.uchicago.edu), based on a domain ontology integrated entity relationship model. The TraM system has the flexibility to recruit dynamically evolving domain concepts and the ability to support data integration for a broad range of translational research. The web-based application interfaces of TraM allow curators to improve data quality and provide robust and user-friendly cross-domain query functions. In its current stage, TraM relies on a semi-automated mechanism to standardize and restructure source data for data integration and thus does not support real-time data application.
Affiliation(s)
- Xiaoming Wang
- Biomedical Informatics Core, Computation Institute, University of Chicago
- Computation Institute, University of Chicago
- Lili Liu
- Biomedical Informatics Core, Computation Institute, University of Chicago
- Computation Institute, University of Chicago
- James Fackenthal
- Center for Clinical Cancer Genetics, Department of Medicine, University of Chicago
- Shelly Cummings
- Center for Clinical Cancer Genetics, Department of Medicine, University of Chicago
- Maggie Cook
- Center for Clinical Cancer Genetics, Department of Medicine, University of Chicago
- Kisha Hope
- Center for Clinical Cancer Genetics, Department of Medicine, University of Chicago
- Jonathan C. Silverstein
- Biomedical Informatics Core, Computation Institute, University of Chicago
- Computation Institute, University of Chicago
|
16
|
Campillo Artero C. [Integration of information for health interventions: from data to information and from information to action. 2008 SESPAS Report]. Gaceta Sanitaria 2008; 22 Suppl 1:14-8. [PMID: 18405548 DOI: 10.1016/s0213-9111(08)76070-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022]
Abstract
Spanish public health and health services information systems (HIS) have improved, but are still fragmented by areas of interest and have evolved independently from one another. Their format, structure, integrity and data quality vary widely, as do the programs, platforms and databases that support them. The latest innovations focus on isolated HIS and are hampered by obsolete models, tools, functionalities, and the inertial demand of information. Transfer of responsibilities without minimal agreements on HIS has eroded their national cohesion and, along with the absence of exchanges on experiences of computerization on a national basis, has weakened us, given the supply of immature computer applications. The evolution of HIS must be governed by integration. We have to redefine their strategic and operational objectives, review existing data and information, and determine the single identification of specific persons and patients. Variables, indicators, services and control panels should be reviewed and systematized through a single shared nomenclature. Personal health records and administrative and clinical registries should become the primary sources of health information data. Data collection, mechanization, registration and exploitation, and their quality control and maintenance, should be redefined regardless of setting. A national agreement is urgently required on the minimal functionalities of HIS, while respecting their technical nature and management by autonomous governments.
|
17
|
Mohammed SL, Lehmann HP, Kim GR. A proposed taxonomy for characterization and assessment of avian influenza outbreaks. Int J Med Inform 2008; 78:182-92. [PMID: 18805050 DOI: 10.1016/j.ijmedinf.2008.06.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2007] [Revised: 06/28/2008] [Accepted: 06/30/2008] [Indexed: 11/19/2022]
Abstract
PURPOSE The speed and high potential impact of avian influenza (AI) on local bird populations, poultry economies and human health make timely and coordinated characterization, assessment and response to possible threats essential. To collaborate effectively, stakeholders (public health, medical, veterinary, and agricultural professionals) must be able to communicate and record findings, assessments, and actions in a standard fashion. We seek to discern a taxonomy of concepts and relationships that are important to the stakeholder community when sharing information about the characterization and assessment of an AI outbreak, according to a consistent and common perspective, interpretation, and level of detail. METHODS To derive concepts relevant to AI characterization and assessment, we reviewed selected journal articles, reporting and laboratory forms, and public health websites associated with AI case reporting. We mapped concepts to existing medical terminologies within the Unified Medical Language System when possible, using the National Library of Medicine's MetaMap program. RESULTS From 54 distinct information sources, we extracted 1113 concepts, of which 533 mapped to 15 medical terminologies; 580 did not map to specific terminologies. Using a combination of semantic type-relationship matching and expert consensus, we constructed the proposed taxonomy, with linkages to existing terminologies where pragmatic. CONCLUSION The proposed taxonomy describes core knowledge, data and communication needs for the characterization and assessment of AI outbreaks in the context of existing medical terminologies across different domains. We also describe areas for further work.
Affiliation(s)
- Sule L Mohammed
- Division of Health Sciences Informatics, School of Medicine, Johns Hopkins University, 2024 East Monument Street, Suite 1-201, Baltimore, MD 21205, USA.
|
18
|
Deus HF, Stanislaus R, Veiga DF, Behrens C, Wistuba II, Minna JD, Garner HR, Swisher SG, Roth JA, Correa AM, Broom B, Coombes K, Chang A, Vogel LH, Almeida JS. A Semantic Web management model for integrative biomedical informatics. PLoS One 2008; 3:e2946. [PMID: 18698353 PMCID: PMC2491554 DOI: 10.1371/journal.pone.0002946] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 03/25/2008] [Accepted: 07/12/2008] [Indexed: 11/19/2022] Open
Abstract
Background Data, data everywhere. The diversity and magnitude of the data generated in the Life Sciences defy automated articulation among complementary efforts. The additional need in this field for managing property and access permissions compounds the difficulty very significantly. This is particularly the case when the integration involves multiple domains and disciplines, even more so when it includes clinical and high throughput molecular data. Methodology/Principal Findings The emergence of Semantic Web technologies brings the promise of meaningful interoperation between data and analysis resources. In this report we identify a core model for biomedical Knowledge Engineering applications and demonstrate how this new technology can be used to weave a management model where multiple intertwined data structures can be hosted and managed by multiple authorities in a distributed management infrastructure. Specifically, the demonstration is performed by linking data sources associated with the Lung Cancer SPORE awarded to The University of Texas M.D. Anderson Cancer Center at Houston and the Southwestern Medical Center at Dallas. A software prototype, available with open source at www.s3db.org, was developed and its proposed design has been made publicly available as an open source instrument for shared, distributed data management. Conclusions/Significance The Semantic Web technologies have the potential to address the need for distributed and evolvable representations that are critical for systems biology and translational biomedical research. As this technology is incorporated into application development we can expect that both general purpose productivity software and domain specific software installed on our personal computers will become increasingly integrated with the relevant remote resources. In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis.
Affiliation(s)
- Helena F. Deus
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Lisboa, Portugal
- Romesh Stanislaus
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Diogo F. Veiga
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Carmen Behrens
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Ignacio I. Wistuba
- Department of Thoracic/Head and Neck Medical Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Department of Pathology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- John D. Minna
- Hamon Center for Therapeutic Oncology Research, Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Harold R. Garner
- Hamon Center for Therapeutic Oncology Research, Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Center for Biomedical Inventions, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Stephen G. Swisher
- Department of Thoracic and Cardiovascular Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Jack A. Roth
- Department of Thoracic and Cardiovascular Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Arlene M. Correa
- Department of Thoracic and Cardiovascular Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Bradley Broom
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Kevin Coombes
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Allen Chang
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Lynn H. Vogel
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
- Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
- Jonas S. Almeida
- Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
|
19
|
Burgun A, Bodenreider O. Accessing and integrating data and knowledge for biomedical research. Yearb Med Inform 2008:91-101. [PMID: 18660883 PMCID: PMC2553094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023] Open
Abstract
OBJECTIVES To review the issues that have arisen with the advent of translational research in terms of integration of data and knowledge, and survey current efforts to address these issues. METHODS Using examples from the biomedical literature, we identified new trends in biomedical research and their impact on bioinformatics. We analyzed the requirements for effective knowledge repositories and studied issues in the integration of biomedical knowledge. RESULTS New diagnostic and therapeutic approaches based on gene expression patterns have brought about new issues in the statistical analysis of data, and new workflows are needed to support translational research. Interoperable data repositories based on standard annotations, infrastructures and services are needed to support the pooling and meta-analysis of data, as well as their comparison to earlier experiments. High-quality, integrated ontologies and knowledge bases serve as a source of prior knowledge used in combination with traditional data mining techniques and contribute to the development of more effective data analysis strategies. CONCLUSION As biomedical research evolves from traditional clinical and biological investigations towards omics sciences and translational research, specific needs have emerged, including integrating data collected in research studies with patient clinical data, linking omics knowledge with medical knowledge, modeling the molecular basis of diseases, and developing tools that support in-depth analysis of research data. As such, translational research illustrates the need to bridge the gap between bioinformatics and medical informatics, and opens new avenues for biomedical informatics research.
Affiliation(s)
- A Burgun
- Département d'Information Médicale, CHU Pontchaillou, rue Henri Le Guilloux, F-35033 Rennes Cedex, France.
|