Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Brazhnik O, Jones JF. Anatomy of data integration. J Biomed Inform 2007;40:252-69. [PMID: 17071142 PMCID: PMC2094006 DOI: 10.1016/j.jbi.2006.09.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 11/18/2005] [Revised: 09/11/2006] [Accepted: 09/19/2006] [Indexed: 01/23/2023]

Cited by Other Article(s)

Ferrão JC, Oliveira MD, Janela F, Martins HMG. Preprocessing structured clinical data for predictive modeling and decision support. A roadmap to tackle the challenges. Appl Clin Inform 2016;7:1135-1153. [PMID: 27924347 PMCID: PMC5228148 DOI: 10.4338/aci-2016-03-soa-0035] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/06/2016] [Accepted: 10/01/2016] [Indexed: 11/23/2022] Open

Abstract

BACKGROUND

EHR systems have high potential to improve healthcare delivery and management. Although structured EHR data provide information in machine-readable formats, their use for decision support still poses technical challenges for researchers because the data must be preprocessed and converted into a matrix format. During our research, we observed that the clinical informatics literature does not provide guidance for researchers on how to build this matrix while avoiding potential pitfalls.

OBJECTIVES

This article aims to provide researchers with a roadmap of the main technical challenges of preprocessing structured EHR data and possible strategies to overcome them.

METHODS

Along the standard data processing stages (extracting database entries, defining features, processing data, assessing feature values and integrating data elements) of an EDPAI framework, we identified the main challenges faced by researchers and reflected on how to address them, based on lessons learned from our research experience and on best practices from related literature. We highlight the main potential sources of error, present strategies to approach those challenges and discuss the implications of these strategies.

RESULTS

Following the EDPAI framework, researchers face five key challenges: (1) gathering and integrating data, (2) identifying and handling different feature types, (3) combining features to handle redundancy and granularity, (4) addressing data missingness, and (5) handling multiple feature values. Strategies to address these challenges include: cross-checking identifiers for robust data retrieval and integration; applying clinical knowledge in identifying feature types, in addressing redundancy and granularity, and in accommodating multiple feature values; and adequately investigating missingness patterns.

CONCLUSIONS

This article contributes to the literature by providing a roadmap to inform structured EHR data preprocessing. It may advise researchers on potential pitfalls and implications of methodological decisions in handling structured data, so as to avoid biases and help realize the benefits of the secondary use of EHR data.

Basole RC, Braunstein ML, Sun J. Data and Analytics Challenges for a Learning Healthcare System. ACM Journal of Data and Information Quality 2015. [DOI: 10.1145/2755489] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]

Bettencourt-Silva JH, Clark J, Cooper CS, Mills R, Rayward-Smith VJ, de la Iglesia B. Building Data-Driven Pathways From Routinely Collected Hospital Data: A Case Study on Prostate Cancer. JMIR Med Inform 2015;3:e26. [PMID: 26162314 PMCID: PMC4526987 DOI: 10.2196/medinform.4221] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 01/15/2015] [Revised: 04/25/2015] [Accepted: 04/27/2015] [Indexed: 11/13/2022] Open

Abstract

BACKGROUND

Routinely collected data in hospitals is complex, typically heterogeneous, and scattered across multiple Hospital Information Systems (HIS). This big data, created as a byproduct of health care activities, has the potential to provide a better understanding of diseases, unearth hidden patterns, and improve services and cost. The extent and uses of such data rely on its quality, which is not consistently checked, nor fully understood. Nevertheless, using routine data for the construction of data-driven clinical pathways, describing processes and trends, is a key topic receiving increasing attention in the literature. Traditional algorithms do not cope well with unstructured processes or data, and do not produce clinically meaningful visualizations. Supporting systems that provide additional information, context, and quality assurance inspection are needed.

OBJECTIVE

The objective of the study is to explore how routine hospital data can be used to develop data-driven pathways that describe the journeys that patients take through care, and their potential uses in biomedical research; it proposes a framework for the construction, quality assessment, and visualization of patient pathways for clinical studies and decision support using a case study on prostate cancer.

METHODS

Data pertaining to prostate cancer patients were extracted from a large UK hospital from eight different HIS, validated, and complemented with information from the local cancer registry. Data-driven pathways were built for each of the 1904 patients and an expert knowledge base, containing rules on the prostate cancer biomarker, was used to assess the completeness and utility of the pathways for a specific clinical study. Software components were built to provide meaningful visualizations for the constructed pathways.

RESULTS

The proposed framework and pathway formalism enable the summarization, visualization, and querying of complex patient-centric clinical information, as well as the computation of quality indicators and dimensions. A novel graphical representation of the pathways allows the synthesis of such information.

CONCLUSIONS

Clinical pathways built from routinely collected hospital data can unearth information about patients and diseases that may otherwise be unavailable or overlooked in hospitals. Data-driven clinical pathways allow for heterogeneous data (ie, semistructured and unstructured data) to be collated over a unified data model and for data quality dimensions to be assessed. This work has enabled further research on prostate cancer and its biomarkers, and on the development and application of methods to mine, compare, analyze, and visualize pathways constructed from routine data. This is an important development for the reuse of big data in hospitals.

Yan Q. Translational bioinformatics approaches for systems and dynamical medicine. Methods Mol Biol 2014;1175:19-34. [PMID: 25150864 DOI: 10.1007/978-1-4939-0956-8_2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/15/2022]

Sengupta D, Sood M, Vijayvargia P, Hota S, Naik PK. Association rule mining based study for identification of clinical parameters akin to occurrence of brain tumor. Bioinformation 2013;9:555-9. [PMID: 23888095 PMCID: PMC3717182 DOI: 10.6026/97320630009555] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/13/2013] [Accepted: 06/16/2013] [Indexed: 11/23/2022] Open

Bettencourt-Silva J, De La Iglesia B, Donell S, Rayward-Smith V. On creating a patient-centric database from multiple Hospital Information Systems. Methods Inf Med 2011;51:210-20. [PMID: 21818520 DOI: 10.3414/me10-01-0069] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 09/16/2010] [Accepted: 05/16/2011] [Indexed: 11/09/2022]

Sharov AA. Functional Information: Towards Synthesis of Biosemiotics and Cybernetics. Entropy (Basel, Switzerland) 2010;12:1050-1070. [PMID: 22368439 PMCID: PMC3285384 DOI: 10.3390/e12051050] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 01/01/2023]

Zhou X, Chen S, Liu B, Zhang R, Wang Y, Li P, Guo Y, Zhang H, Gao Z, Yan X. Development of traditional Chinese medicine clinical data warehouse for medical knowledge discovery and decision support. Artif Intell Med 2010;48:139-52. [PMID: 20122820 DOI: 10.1016/j.artmed.2009.07.012] [Citation(s) in RCA: 154] [Impact Index Per Article: 11.0] [Received: 08/16/2008] [Revised: 07/22/2009] [Accepted: 07/23/2009] [Indexed: 01/14/2023]

Abstract

OBJECTIVE

Traditional Chinese medicine (TCM) is a scientific discipline that develops its theories from long-term clinical practice. Large-scale clinical data are the core empirical knowledge source for TCM research. This paper introduces a clinical data warehouse (CDW) system, which incorporates structured electronic medical record (SEMR) data for medical knowledge discovery and TCM clinical decision support (CDS).

MATERIALS AND METHODS

We have developed a clinical reference information model (RIM) and a physical data model to manage the various information entities and their relationships in TCM clinical data. An extraction-transformation-loading (ETL) tool is implemented to integrate and normalize the clinical data from different operational data sources. The CDW includes online analytical processing (OLAP) and complex network analysis (CNA) components to explore the various clinical relationships. Furthermore, data mining and CNA methods are used to discover valuable clinical knowledge from the data.

RESULTS

The CDW has integrated 20,000 TCM inpatient data and 20,000 outpatient data, which contain manifestations (e.g. symptoms, physical examinations and laboratory test results), diagnoses and prescriptions as the main information components. We propose a practical solution to accomplish large-scale clinical data integration and preprocessing tasks. Meanwhile, we have developed over 400 OLAP reports to enable multidimensional analysis of clinical data and case-based CDS. We have successfully conducted several interesting data mining applications. In particular, we use various classification methods, namely support vector machine, decision tree and Bayesian network, to discover knowledge about syndrome differentiation. Furthermore, we have applied association rules and CNA to extract useful acupuncture point and herb combination patterns from the clinical prescriptions.

CONCLUSION

A CDW system consisting of TCM clinical RIM, ETL, OLAP and data mining as the core components has been developed to facilitate the tasks of TCM knowledge discovery and CDS. We have conducted several OLAP and data mining tasks to explore the empirical knowledge from the TCM clinical data. The CDW platform would be a promising infrastructure to make full use of the TCM clinical data for scientific hypothesis generation, and promote the development of TCM from individualized empirical knowledge to large-scale evidence-based medicine.

Colombo G, Merico D, Boncoraglio G, De Paoli F, Ellul J, Frisoni G, Nagy Z, van der Lugt A, Vassányi I, Antoniotti M. An ontological modeling approach to cerebrovascular disease studies: the NEUROWEB case. J Biomed Inform 2010;43:469-84. [PMID: 20074662 DOI: 10.1016/j.jbi.2009.12.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/12/2008] [Revised: 10/29/2009] [Accepted: 12/21/2009] [Indexed: 10/20/2022]

Abstract

The NEUROWEB project supports cerebrovascular researchers' association studies, intended as the search for statistical correlations between a feature (e.g., a genotype) and a phenotype. In this project the phenotype refers to the patients' pathological state, and thus it is formulated on the basis of the clinical data collected during the diagnostic activity. In order to enhance the statistical robustness of the association inquiries, the project involves four European Union clinical institutions. Each institution provides its proprietary repository, storing patients' data. Although all sites comply with common diagnostic guidelines, they also adopt specific protocols, resulting in partially discrepant repository contents. Therefore, in order to effectively exploit NEUROWEB data for association studies, it is necessary to provide a framework for phenotype formulation that is grounded in the clinical repository content and explicitly addresses the inherent integration problem. To that end, we developed an ontological model for cerebrovascular phenotypes, the NEUROWEB Reference Ontology, composed of three layers. The top layer (Top Phenotypes) is an expert-based cerebrovascular disease taxonomy. The middle layer deconstructs the Top Phenotypes into more elementary phenotypes (Low Phenotypes) and general-use medical concepts such as anatomical parts and topological concepts. The bottom layer (Core Data Set, or CDS) comprises the clinical indicators required for cerebrovascular disorder diagnosis. Low Phenotypes are connected to the bottom layer (CDS) by specifying what combination of CDS values is required for their existence. Finally, CDS elements are mapped to the local repositories of clinical data. The NEUROWEB system exploits the Reference Ontology to query the different repositories and to retrieve patients characterized by a common phenotype.

Yan Q. Translational bioinformatics and systems biology approaches for personalized medicine. Methods Mol Biol 2010;662:167-178. [PMID: 20824471 DOI: 10.1007/978-1-60761-800-3_8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/29/2023]

Sinha A, Hripcsak G, Markatou M. Large datasets in biomedicine: a discussion of salient analytic issues. J Am Med Inform Assoc 2009;16:759-67. [PMID: 19717808 PMCID: PMC3002128 DOI: 10.1197/jamia.m2780] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 03/03/2008] [Accepted: 08/02/2009] [Indexed: 11/10/2022] Open

Maldonado JA, Moner D, Boscá D, Fernández-Breis JT, Angulo C, Robles M. LinkEHR-Ed: A multi-reference model archetype editor based on formal semantics. Int J Med Inform 2009;78:559-70. [DOI: 10.1016/j.ijmedinf.2009.03.006] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 05/30/2008] [Revised: 03/09/2009] [Accepted: 03/15/2009] [Indexed: 10/20/2022]

Viangteeravat T, Brooks IM, Smith EJ, Furlotte N, Vuthipadadon S, Reynolds R, McDonald CS. Slim-prim: a biomedical informatics database to promote translational research. Perspectives in Health Information Management 2009;6:6. [PMID: 19471646 PMCID: PMC2682606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2023]

Manning M, Aggarwal A, Gao K, Tucker-Kellogg G. Scaling the walls of discovery: using semantic metadata for integrative problem solving. Brief Bioinform 2009;10:164-76. [DOI: 10.1093/bib/bbp007] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 01/19/2023] Open

Wang X, Liu L, Fackenthal J, Cummings S, Cook M, Hope K, Silverstein JC, Olopade OI. Translational integrity and continuity: personalized biomedical data integration. J Biomed Inform 2009;42:100-12. [PMID: 18760382 PMCID: PMC2675887 DOI: 10.1016/j.jbi.2008.08.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 02/02/2008] [Revised: 08/04/2008] [Accepted: 08/05/2008] [Indexed: 12/18/2022]

Campillo Artero C. [Integration of information for health interventions: from data to information and from information to action. 2008 SESPAS Report]. Gaceta Sanitaria 2008;22 Suppl 1:14-8. [PMID: 18405548 DOI: 10.1016/s0213-9111(08)76070-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022]

Mohammed SL, Lehmann HP, Kim GR. A proposed taxonomy for characterization and assessment of avian influenza outbreaks. Int J Med Inform 2008;78:182-92. [PMID: 18805050 DOI: 10.1016/j.ijmedinf.2008.06.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2007] [Revised: 06/28/2008] [Accepted: 06/30/2008] [Indexed: 11/19/2022]

Deus HF, Stanislaus R, Veiga DF, Behrens C, Wistuba II, Minna JD, Garner HR, Swisher SG, Roth JA, Correa AM, Broom B, Coombes K, Chang A, Vogel LH, Almeida JS. A Semantic Web management model for integrative biomedical informatics. PLoS One 2008;3:e2946. [PMID: 18698353 PMCID: PMC2491554 DOI: 10.1371/journal.pone.0002946] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 03/25/2008] [Accepted: 07/12/2008] [Indexed: 11/19/2022] Open

Abstract

Background

Data, data everywhere. The diversity and magnitude of the data generated in the Life Sciences defy automated articulation among complementary efforts. The additional need in this field for managing property and access permissions compounds the difficulty very significantly. This is particularly the case when the integration involves multiple domains and disciplines, even more so when it includes clinical and high throughput molecular data.

Methodology/Principal Findings

The emergence of Semantic Web technologies brings the promise of meaningful interoperation between data and analysis resources. In this report we identify a core model for biomedical Knowledge Engineering applications and demonstrate how this new technology can be used to weave a management model where multiple intertwined data structures can be hosted and managed by multiple authorities in a distributed management infrastructure. Specifically, the demonstration is performed by linking data sources associated with the Lung Cancer SPORE awarded to The University of Texas M.D. Anderson Cancer Center at Houston and the Southwestern Medical Center at Dallas. A software prototype, available as open source at www.s3db.org, was developed and its proposed design has been made publicly available as an open source instrument for shared, distributed data management.

Conclusions/Significance

Semantic Web technologies have the potential to address the need for distributed and evolvable representations that are critical for systems biology and translational biomedical research. As this technology is incorporated into application development we can expect that both general purpose productivity software and domain specific software installed on our personal computers will become increasingly integrated with the relevant remote resources. In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis.

Affiliation(s)

Helena F. Deus: Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America; Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Lisboa, Portugal
Romesh Stanislaus: Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
Diogo F. Veiga: Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
Carmen Behrens: Department of Thoracic/Head and Neck Medical Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
Ignacio I. Wistuba: Department of Thoracic/Head and Neck Medical Oncology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America; Department of Pathology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
John D. Minna: Hamon Center for Therapeutic Oncology Research, Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
Harold R. Garner: Hamon Center for Therapeutic Oncology Research, Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America; Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America; Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America; Center for Biomedical Inventions, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America; Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
Stephen G. Swisher: Department of Thoracic and Cardiovascular Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
Jack A. Roth: Department of Thoracic and Cardiovascular Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
Arlene M. Correa: Department of Thoracic and Cardiovascular Surgery, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
Bradley Broom: Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
Kevin Coombes: Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
Allen Chang: Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America
Lynn H. Vogel: Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America; Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
Jonas S. Almeida: Department of Bioinformatics and Computational Biology, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, United States of America * E-mail:

Burgun A, Bodenreider O. Accessing and integrating data and knowledge for biomedical research. Yearb Med Inform 2008:91-101. [PMID: 18660883 PMCID: PMC2553094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023] Open