1
Taira RK, Garlid AO, Speier W. Design considerations for a hierarchical semantic compositional framework for medical natural language understanding. PLoS One 2023; 18:e0282882. [PMID: 36928721 PMCID: PMC10019629 DOI: 10.1371/journal.pone.0282882] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2020] [Accepted: 02/24/2023] [Indexed: 03/18/2023] Open
Abstract
Medical natural language processing (NLP) systems are a key enabling technology for transforming Big Data from clinical report repositories to information used to support disease models and validate intervention methods. However, current medical NLP systems fall considerably short when faced with the task of logically interpreting clinical text. In this paper, we describe a framework inspired by mechanisms of human cognition in an attempt to jump the NLP performance curve. The design centers on a hierarchical semantic compositional model (HSCM), which provides an internal substrate for guiding the interpretation process. The paper describes insights from four key cognitive aspects: semantic memory, semantic composition, semantic activation, and hierarchical predictive coding. We discuss the design of a generative semantic model and an associated semantic parser used to transform a free-text sentence into a logical representation of its meaning. The paper discusses supportive and antagonistic arguments for the key features of the architecture as a long-term foundational framework.
Affiliation(s)
- Ricky K. Taira
- Medical and Imaging Informatics (MII) Group, Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, California, United States of America
- Anders O. Garlid
- Medical and Imaging Informatics (MII) Group, Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, California, United States of America
- William Speier
- Medical and Imaging Informatics (MII) Group, Department of Radiological Sciences, University of California, Los Angeles, Los Angeles, California, United States of America
- Department of Bioengineering, University of California, Los Angeles, Los Angeles, California, United States of America
2
Deng L, Zhang X, Yang T, Liu M, Chen L, Jiang T. PIAT: an evolutionarily intelligent system for deep phenotyping of Chinese electronic health records. IEEE J Biomed Health Inform 2022; 26:4142-4152. PMID: 35609107; DOI: 10.1109/jbhi.2022.3177421.
Abstract
Electronic health record (EHR) resources are valuable but remain underexplored because most clinical information, especially phenotype information, is buried in the free text of EHRs. An intelligent annotation tool plays an important role in unlocking the full potential of EHRs by transforming free-text phenotype information into a computer-readable form. Deep phenotyping has shown its advantage in representing phenotype information in EHRs with high fidelity; however, most existing annotation tools are not suitable for the deep phenotyping task. Here, we developed an intelligent annotation tool named PIAT with a major focus on the deep phenotyping of Chinese EHRs. PIAT improves annotation efficiency for EHR-based deep phenotyping through a simple but effective interactive interface, automatic preannotation support, and a learning mechanism. Specifically, experts proofread automatic annotation results from the annotation algorithm in the web-based interactive interface, and the expert-reviewed EHRs are then used to evolve the underlying annotation algorithm. In this way, annotating EHRs for deep phenotyping becomes progressively easier. In conclusion, we created a powerful intelligent system for the deep phenotyping of Chinese EHRs. We hope that our work will inspire further studies on intelligent systems for the deep phenotyping of both English and non-English EHRs.
3
Li S, Deng L, Zhang X, Chen L, Yang T, Qi Y, Jiang T. Deep Phenotyping on Chinese Electronic Health Records by Recognizing Linguistic Patterns of Phenotypic Narratives with a Sequence Motif Discovery Tool: Algorithm Development and Validation. J Med Internet Res 2022; 24:e37213. PMID: 35657661; PMCID: PMC9206202; DOI: 10.2196/37213.
Abstract
Background Phenotype information in electronic health records (EHRs) is mainly recorded in unstructured free text, which cannot be directly used for clinical research. EHR-based deep-phenotyping methods can structure phenotype information in EHRs with high fidelity, making it the focus of medical informatics. However, developing a deep-phenotyping method for non-English EHRs (ie, Chinese EHRs) is challenging. Although numerous EHR resources exist in China, fine-grained annotation data that are suitable for developing deep-phenotyping methods are limited, making this a low-resource scenario. Objective In this study, we aimed to develop a deep-phenotyping method with good generalization ability for Chinese EHRs based on limited fine-grained annotation data. Methods The core of the methodology was to identify linguistic patterns of phenotype descriptions in Chinese EHRs with a sequence motif discovery tool and perform deep phenotyping of Chinese EHRs by recognizing linguistic patterns in free text. Specifically, 1000 Chinese EHRs were manually annotated based on a fine-grained information model, PhenoSSU (Semantic Structured Unit of Phenotypes). The annotation data set was randomly divided into a training set (n=700, 70%) and a testing set (n=300, 30%). The process for mining linguistic patterns was divided into three steps. First, free text in the training set was encoded as single-letter sequences (P: phenotype, A: attribute). Second, a biological sequence analysis tool, MEME (Multiple Expectation Maximization for Motif Elicitation), was used to identify motifs in the single-letter sequences. Finally, the identified motifs were reduced to a series of regular expressions representing linguistic patterns of PhenoSSU instances in Chinese EHRs.
Based on the discovered linguistic patterns, we developed a deep-phenotyping method for Chinese EHRs, including a deep learning–based method for named entity recognition and a pattern recognition–based method for attribute prediction. Results In total, 51 sequence motifs with statistical significance were mined from 700 Chinese EHRs in the training set and were combined into six regular expressions. These six regular expressions could be learned from a mean of 134 (SD 9.7) annotated EHRs in the training set. The deep-phenotyping algorithm for Chinese EHRs recognized PhenoSSU instances with an overall accuracy of 0.844 on the test set. For the subtask of entity recognition, the algorithm achieved an F1 score of 0.898 with the Bidirectional Encoder Representations from Transformers–bidirectional long short-term memory and conditional random field model; for the subtask of attribute prediction, it achieved a weighted accuracy of 0.940 with the linguistic pattern–based method. Conclusions We developed a simple but effective strategy for the deep phenotyping of Chinese EHRs with limited fine-grained annotation data. Our work will promote the secondary use of Chinese EHRs and may inspire similar efforts in other non-English-speaking countries.
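The encode-then-match pipeline described in this abstract can be illustrated with a toy sketch; the tokens, tags, and the single hand-written pattern below are invented for illustration (the paper's actual patterns were mined with MEME from annotated Chinese text, not written by hand):

```python
import re

# Toy tagged sentence: each token is (text, tag); the tags are hypothetical.
# 'P' = phenotype entity, 'A' = attribute (severity, duration, ...), 'O' = other.
tokens = [
    ("fever", "P"), ("mild", "A"), (",", "O"),
    ("cough", "P"), ("severe", "A"), ("persistent", "A"),
]

# Step 1: encode the token stream as a single-letter sequence.
sequence = "".join(tag for _, tag in tokens)  # "PAOPAA"

# Step 2: a regular expression standing in for one mined linguistic pattern:
# a phenotype followed by one or more attributes.
pattern = re.compile(r"PA+")

# Step 3: map regex matches back to token spans (candidate PhenoSSU instances).
instances = []
for m in pattern.finditer(sequence):
    span = tokens[m.start():m.end()]
    phenotype = span[0][0]
    attributes = [text for text, tag in span[1:] if tag == "A"]
    instances.append((phenotype, attributes))

print(instances)
```

Running this yields one candidate instance per phenotype, each paired with its trailing attributes.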
Affiliation(s)
- Shicheng Li
- Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
- Lizong Deng
- Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
- Xu Zhang
- Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
- Luming Chen
- Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
- Guangzhou Laboratory, Guangzhou, China
- Tao Yang
- Guangzhou Laboratory, Guangzhou, China
- Guangzhou Medical University, Guangzhou, China
- Yifan Qi
- Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
- Taijiao Jiang
- Institute of Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Suzhou Institute of Systems Medicine, Suzhou, China
- Guangzhou Laboratory, Guangzhou, China
4
He W, Kirchoff KG, Sampson RR, McGhee KK, Cates AM, Obeid JS, Lenert LA. Research Integrated Network of Systems (RINS): a virtual data warehouse for the acceleration of translational research. J Am Med Inform Assoc 2021; 28:1440-1450. PMID: 33729486; DOI: 10.1093/jamia/ocab023.
Abstract
OBJECTIVE Integrated, real-time data are crucial to evaluate translational efforts to accelerate innovation into care. Too often, however, the needed data are fragmented across disparate systems. The South Carolina Clinical & Translational Research Institute at the Medical University of South Carolina (MUSC) developed and implemented a universal study identifier, the Research Master Identifier (RMID), for tracking research studies across disparate systems, and a data warehouse-inspired model, the Research Integrated Network of Systems (RINS), for integrating data from those systems. MATERIALS AND METHODS In 2017, MUSC began requiring the use of RMIDs in informatics systems that support human subject studies. We developed a web-based tool to create RMIDs and application programming interfaces to synchronize research records and visualize linkages to protocols across systems. Selected data from these disparate systems were extracted and merged nightly into an enterprise data mart, and performance dashboards were created to monitor key translational processes. RESULTS Within 4 years, 5513 RMIDs were created. Of these, 726 (13%) bridged systems needed to evaluate research study performance, and 982 (18%) were linked to the electronic health record, enabling patient-level reporting. DISCUSSION Barriers posed by data fragmentation to assessing program impact have largely been eliminated at MUSC through the requirement for an RMID, its distribution via RINS to disparate systems, and the mapping of system-level data to a single integrated data mart. CONCLUSION By applying data warehousing principles to federate data at the "study" level, the RINS project reduced data fragmentation and promoted research systems integration.
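The study-level federation idea can be sketched as follows; the system names, field names, and `federate` helper are hypothetical, and the real RINS uses web APIs and a nightly extract into an enterprise data mart rather than in-memory dictionaries:

```python
# Hypothetical records from three disparate research systems, each carrying
# the same universal study identifier (RMID). All field names are invented.
irb_system = [{"rmid": "R001", "protocol": "IRB-2020-17"}]
ctms_system = [{"rmid": "R001", "enrolled": 42}]
ehr_system = [
    {"rmid": "R001", "linked_patients": 42},
    {"rmid": "R002", "linked_patients": 7},
]

def federate(*sources):
    """Merge study-level records from disparate systems, keyed on the RMID."""
    mart = {}
    for source in sources:
        for record in source:
            mart.setdefault(record["rmid"], {}).update(
                {k: v for k, v in record.items() if k != "rmid"}
            )
    return mart

mart = federate(irb_system, ctms_system, ehr_system)
print(mart["R001"])  # fields from all three systems, joined on the shared RMID
```

Because every system carries the same identifier, no patient-level linkage is needed to assemble a study-level view.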
Affiliation(s)
- Wenjun He
- College of Medicine, South Carolina Clinical & Translational Research Institute, Medical University of South Carolina, Charleston, SC, USA
- Katie G Kirchoff
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Royce R Sampson
- College of Medicine, South Carolina Clinical & Translational Research Institute, Medical University of South Carolina, Charleston, SC, USA
- Department of Psychiatry & Behavioral Sciences, Medical University of South Carolina, Charleston, SC, USA
- Kimberly K McGhee
- College of Medicine, South Carolina Clinical & Translational Research Institute, Medical University of South Carolina, Charleston, SC, USA
- Academic Affairs Faculty, Medical University of South Carolina, Charleston, SC, USA
- Andrew M Cates
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Jihad S Obeid
- College of Medicine, South Carolina Clinical & Translational Research Institute, Medical University of South Carolina, Charleston, SC, USA
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Leslie A Lenert
- College of Medicine, South Carolina Clinical & Translational Research Institute, Medical University of South Carolina, Charleston, SC, USA
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, USA
- Department of Medicine, Medical University of South Carolina, Charleston, SC, USA
5
Park JA, Sung MD, Kim HH, Park YR. Weight-Based Framework for Predictive Modeling of Multiple Databases With Noniterative Communication Without Data Sharing: Privacy-Protecting Analytic Method for Multi-Institutional Studies. JMIR Med Inform 2021; 9:e21043. PMID: 33818396; PMCID: PMC8056295; DOI: 10.2196/21043.
Abstract
Background Securing the representativeness of study populations is crucial in biomedical research to ensure high generalizability. In this regard, using multi-institutional data has advantages in medicine. However, combining data physically is difficult because the confidential nature of biomedical data raises privacy issues. Therefore, a methodological approach is needed for developing models from multi-institutional medical data without sharing data between institutions. Objective This study aims to develop a weight-based integrated predictive model for multi-institutional data, which does not require iterative communication between institutions, to improve average predictive performance by increasing the generalizability of the model under privacy-preserving conditions without sharing patient-level data. Methods The weight-based integrated model generates a weight for each institutional model and builds an integrated model for multi-institutional data based on these weights. We performed 3 simulations to show the weight characteristics and to determine the number of repetitions of weight generation required to obtain stable values. We also conducted an experiment using real multi-institutional data to verify the developed weight-based integrated model. We selected 10 hospitals (2845 intensive care unit [ICU] stays in total) from the eICU Collaborative Research Database to predict ICU mortality with 11 features. To evaluate the validity of our model against a centralized model developed by combining the data of all 10 hospitals, we used proportional overlap (ie, 0.5 or less indicates a significant difference at a level of .05, and 2 indicates 2 CIs overlapping completely). Standard and Firth logistic regression models were applied for the 2 simulations and the experiment.
Results The results of these simulations indicate that the weight of each institution is determined by 2 factors (ie, the data size of each institution and how well each institutional model fits into the overall institutional data) and that repeatedly generating 200 weights is necessary per institution. In the experiment, the estimated area under the receiver operating characteristic curve (AUC) and 95% CIs were 81.36% (79.37%-83.36%) and 81.95% (80.03%-83.87%) in the centralized model and weight-based integrated model, respectively. The proportional overlap of the CIs for AUC in both the weight-based integrated model and the centralized model was approximately 1.70, and that of overlap of the 11 estimated odds ratios was over 1, except for 1 case. Conclusions In the experiment where real multi-institutional data were used, our model showed similar results to the centralized model without iterative communication between institutions. In addition, our weight-based integrated model provided a weighted average model by integrating 10 models overfitted or underfitted, compared with the centralized model. The proposed weight-based integrated model is expected to provide an efficient distributed research approach as it increases the generalizability of the model and does not require iterative communication.
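A minimal sketch of the integration step, assuming each institution shares only its fitted coefficients and a weight. The coefficient vectors and weight values below are invented; in the paper, weights depend on each institution's data size and on how well its model fits the overall institutional data, and weight generation is repeated about 200 times per institution:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficient vectors (intercept + 2 features) fitted locally
# at three institutions; only coefficients and weights leave each site.
local_coefs = {
    "hospital_A": [-1.2, 0.8, 0.3],
    "hospital_B": [-0.9, 1.1, 0.2],
    "hospital_C": [-1.5, 0.7, 0.4],
}
# Illustrative weights standing in for the paper's weighting scheme.
weights = {"hospital_A": 0.5, "hospital_B": 0.3, "hospital_C": 0.2}

def integrate(coefs, weights):
    """Weighted average of per-institution coefficient vectors."""
    total = sum(weights.values())
    n = len(next(iter(coefs.values())))
    return [
        sum(weights[h] * coefs[h][j] for h in coefs) / total
        for j in range(n)
    ]

integrated = integrate(local_coefs, weights)
# Predict ICU mortality risk for one patient with feature values (1.0, 0.5).
risk = sigmoid(integrated[0] + integrated[1] * 1.0 + integrated[2] * 0.5)
```

No patient-level data and no iterative rounds are exchanged: one pass of coefficients and weights suffices to build the integrated model.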
Affiliation(s)
- Ji Ae Park
- Department of Biomedical System Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
- Min Dong Sung
- Department of Biomedical System Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
- Ho Heon Kim
- Department of Biomedical System Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
- Yu Rang Park
- Department of Biomedical System Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
6
Sáez C, Gutiérrez-Sacristán A, Kohane I, García-Gómez JM, Avillach P. EHRtemporalVariability: delineating temporal data-set shifts in electronic health records. Gigascience 2020; 9:giaa079. PMID: 32729900; PMCID: PMC7391413; DOI: 10.1093/gigascience/giaa079.
Abstract
BACKGROUND Temporal variability in health-care processes or protocols is intrinsic to medicine. Such variability can potentially introduce dataset shifts, a data quality issue when reusing electronic health records (EHRs) for secondary purposes. Temporal dataset shifts can present as trends, as well as abrupt or seasonal changes in the statistical distributions of data over time. The latter are particularly complicated to address in multimodal and highly coded data. These changes, if not delineated, can harm population- and data-driven research, such as machine learning. Given that biomedical research repositories are increasingly being populated with large sets of historical data from EHRs, there is a need for specific software methods to help delineate temporal dataset shifts and ensure reliable data reuse. RESULTS EHRtemporalVariability is an open-source R package and Shiny app designed to explore and identify temporal dataset shifts. EHRtemporalVariability estimates the statistical distributions of coded and numerical data over time; projects their temporal evolution through non-parametric information geometric temporal plots; and enables the exploration of changes in variables through data temporal heat maps. We demonstrate the capability of EHRtemporalVariability to delineate dataset shifts in three impact case studies, one of which is available for reproducibility. CONCLUSIONS EHRtemporalVariability enables the exploration and identification of dataset shifts, contributing to the broad examination and repurposing of large, longitudinal data sets. Our goal is to help ensure reliable data reuse for a wide range of biomedical data users.
EHRtemporalVariability is designed for technical users working programmatically with the R package, as well as for users unfamiliar with programming via the Shiny user interface.
Availability: https://github.com/hms-dbmi/EHRtemporalVariability/
Reproducible vignette: https://cran.r-project.org/web/packages/EHRtemporalVariability/vignettes/EHRtemporalVariability.html
Online demo: http://ehrtemporalvariability.upv.es/
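The kind of temporal shift the package delineates can be illustrated with a small sketch that compares code distributions from two time windows using the Jensen-Shannon divergence; the code batches are invented, and this is a conceptual stand-in rather than the package's actual API:

```python
import math
from collections import Counter

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    keys = set(p) | set(q)
    def kl(a, b):
        return sum(a.get(k, 0.0) * math.log2(a[k] / b[k])
                   for k in keys if a.get(k, 0.0) > 0)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def distribution(codes):
    """Empirical probability distribution of a batch of codes."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}

# Hypothetical diagnosis-code batches from two time windows; the coding
# protocol "changed" between them (ICD-9-style vs ICD-10-style codes).
january = ["250.0", "250.0", "401.9", "250.0"]
june = ["E11.9", "E11.9", "I10", "E11.9"]

shift = jensen_shannon(distribution(january), distribution(june))
```

Completely disjoint code sets yield the maximal divergence of 1, while identical distributions yield 0; tracking this value across consecutive windows exposes abrupt or seasonal dataset shifts.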
Affiliation(s)
- Carlos Sáez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, España
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Isaac Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Juan M García-Gómez
- Biomedical Data Science Lab, Instituto Universitario de Tecnologías de la Información y Comunicaciones, Universitat Politècnica de València, Camino de Vera s/n, Valencia 46022, España
- Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
7
Platt JE, Raj M, Wienroth M. An Analysis of the Learning Health System in Its First Decade in Practice: Scoping Review. J Med Internet Res 2020; 22:e17026. PMID: 32191214; PMCID: PMC7118548; DOI: 10.2196/17026.
Abstract
Background In the past decade, Lynn Etheredge presented a vision for the Learning Health System (LHS) as an opportunity for increasing the value of health care via rapid learning from data and immediate translation to practice and policy. An LHS is defined in the literature as a system that seeks to continuously generate and apply evidence, innovation, quality, and value in health care. Objective This review aimed to examine themes in the literature and rhetoric on the LHS in the past decade to understand efforts to realize the LHS in practice and to identify gaps and opportunities to continue to take the LHS forward. Methods We conducted a thematic analysis in 2018 to analyze progress and opportunities over time as compared with the initial Knowledge Gaps and Uncertainties proposed in 2007. Results We found that the literature on the LHS has increased over the past decade, with most articles focused on theory and implementation; articles have been increasingly concerned with policy. Conclusions There is a need for attention to understanding the ethical and social implications of the LHS and for exploring opportunities to ensure that these implications are salient in implementation, practice, and policy efforts.
Affiliation(s)
- Jodyn E Platt
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, United States
- Minakshi Raj
- Department of Health Management and Policy, University of Michigan School of Public Health, Ann Arbor, MI, United States
- Matthias Wienroth
- School of Geography, Politics & Sociology, Newcastle University, Newcastle upon Tyne, United Kingdom
8
Lowes LP, Noritz GH, Newmeyer A, Embi PJ, Yin H, Smoyer WE. 'Learn From Every Patient': implementation and early results of a learning health system. Dev Med Child Neurol 2017; 59:183-191. PMID: 27545839; DOI: 10.1111/dmcn.13227.
Abstract
AIM The convergence of three major trends in medicine, namely conversion to electronic health records (EHRs), prioritization of translational research, and the need to control healthcare expenditures, has created unprecedented interest and opportunities to develop systems that improve care while reducing costs. However, operationalizing a 'learning health system' requires systematic changes that have not yet been widely demonstrated in clinical practice. METHOD We developed, implemented, and evaluated a model of EHR-supported care in a cohort of 131 children with cerebral palsy that integrated clinical care, quality improvement, and research, entitled 'Learn From Every Patient' (LFEP). RESULTS Children treated in the LFEP Program for a 12-month period experienced a 43% reduction in total inpatient days (p=0.030 vs prior 12mo period), a 27% reduction in inpatient admissions, a 30% reduction in emergency department visits (p=0.001), and a 29% reduction in urgent care visits (p=0.046). LFEP Program implementation also resulted in reductions in healthcare costs of 210% (US$7014/child) versus a Time control group, and reductions of 176% ($6596/child) versus a Program Activities control group. Importantly, clinical implementation of the LFEP Program has also driven the continuous accumulation of robust research-quality data for both publication and implementation of evidence-based improvements in clinical care. INTERPRETATION These results demonstrate that a learning health system can be developed and implemented in a cost-effective manner, and can integrate clinical care and research to systematically drive simultaneous clinical quality improvement and reduced healthcare costs.
Affiliation(s)
- Garey H Noritz
- Nationwide Children's Hospital, Columbus, OH, USA
- Department of Pediatrics, The Ohio State University, Columbus, OH, USA
- Amy Newmeyer
- Children's Hospital of the King's Daughters, Norfolk, VA, USA
- Peter J Embi
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
- Han Yin
- Nationwide Children's Hospital, Columbus, OH, USA
- William E Smoyer
- Nationwide Children's Hospital, Columbus, OH, USA
- Department of Pediatrics, The Ohio State University, Columbus, OH, USA
9
Fedorov A, Clunie D, Ulrich E, Bauer C, Wahle A, Brown B, Onken M, Riesmeier J, Pieper S, Kikinis R, Buatti J, Beichel RR. DICOM for quantitative imaging biomarker development: a standards based approach to sharing clinical data and structured PET/CT analysis results in head and neck cancer research. PeerJ 2016; 4:e2057. PMID: 27257542; PMCID: PMC4888317; DOI: 10.7717/peerj.2057.
Abstract
Background. Imaging biomarkers hold tremendous promise for precision medicine clinical applications. Development of such biomarkers relies heavily on image post-processing tools for automated image quantitation. Their deployment in the context of clinical research necessitates interoperability with the clinical systems. Comparison with the established outcomes and evaluation tasks motivate integration of the clinical and imaging data, and the use of standardized approaches to support annotation and sharing of the analysis results and semantics. We developed the methodology and tools to support these tasks in Positron Emission Tomography and Computed Tomography (PET/CT) quantitative imaging (QI) biomarker development applied to head and neck cancer (HNC) treatment response assessment, using the Digital Imaging and Communications in Medicine (DICOM®) international standard and free open-source software. Methods. Quantitative analysis of PET/CT imaging data collected on patients undergoing treatment for HNC was conducted. Processing steps included Standardized Uptake Value (SUV) normalization of the images, segmentation of the tumor using manual and semi-automatic approaches, automatic segmentation of the reference regions, and extraction of the volumetric segmentation-based measurements. Suitable components of the DICOM standard were identified to model the various types of data produced by the analysis. A developer toolkit of conversion routines and an Application Programming Interface (API) were contributed and applied to create a standards-based representation of the data. Results. DICOM Real World Value Mapping, Segmentation and Structured Reporting objects were utilized for standards-compliant representation of the PET/CT QI analysis results and relevant clinical data. A number of correction proposals to the standard were developed. The open-source DICOM toolkit (DCMTK) was improved to simplify the task of DICOM encoding by introducing new API abstractions.
Conversion and visualization tools utilizing this toolkit were developed. The encoded objects were validated for consistency and interoperability. The resulting dataset was deposited in the QIN-HEADNECK collection of The Cancer Imaging Archive (TCIA). Supporting tools for data analysis and DICOM conversion were made available as free open-source software. Discussion. We presented a detailed investigation of the development and application of the DICOM model, as well as the supporting open-source tools and toolkits, to accommodate representation of the research data in QI biomarker development. We demonstrated that the DICOM standard can be used to represent the types of data relevant in HNC QI biomarker development, and encode their complex relationships. The resulting annotated objects are amenable to data mining applications, and are interoperable with a variety of systems that support the DICOM standard.
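One of the processing steps named above, SUV normalization, follows the standard body-weight formula. The helper below is a sketch: the acquisition values are invented, a tissue density of 1 g/mL is implied, and the injected dose is assumed to be already decay-corrected to scan start:

```python
def suv_bw(voxel_bq_per_ml, injected_dose_bq, body_weight_kg):
    """Body-weight Standardized Uptake Value (SUVbw) for one voxel.

    SUVbw = tissue activity concentration / (injected dose / body weight).
    Units: Bq/mL, Bq, kg; with 1 g/mL tissue density the result is
    dimensionless.
    """
    body_weight_g = body_weight_kg * 1000.0
    return voxel_bq_per_ml / (injected_dose_bq / body_weight_g)

# Hypothetical acquisition: 5 kBq/mL voxel, 370 MBq injected, 70 kg patient.
suv = suv_bw(5_000.0, 370_000_000.0, 70.0)
print(round(suv, 3))  # 0.946
```

In a DICOM workflow, a Real World Value Mapping object can record exactly this linear transformation so that stored pixel values remain traceable to SUV.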
Affiliation(s)
- Andriy Fedorov
- Department of Radiology, Brigham and Women’s Hospital, Boston, MA, United States of America
- Harvard Medical School, Harvard University, Boston, MA, United States of America
- David Clunie
- PixelMed Publishing, LLC, Bangor, PA, United States of America
- Ethan Ulrich
- Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA, United States of America
- Iowa Institute for Biomedical Imaging, University of Iowa, Iowa City, IA, United States of America
- Christian Bauer
- Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA, United States of America
- Iowa Institute for Biomedical Imaging, University of Iowa, Iowa City, IA, United States of America
- Andreas Wahle
- Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA, United States of America
- Iowa Institute for Biomedical Imaging, University of Iowa, Iowa City, IA, United States of America
- Bartley Brown
- Center for Bioinformatics and Computational Biology, University of Iowa, Iowa City, IA, United States of America
- Steve Pieper
- Isomics, Inc., Cambridge, MA, United States of America
- Ron Kikinis
- Department of Radiology, Brigham and Women’s Hospital, Boston, MA, United States of America
- Harvard Medical School, Harvard University, Boston, MA, United States of America
- Fraunhofer MEVIS, Bremen, Germany
- Mathematics/Computer Science Faculty, University of Bremen, Bremen, Germany
- John Buatti
- Department of Radiation Oncology, University of Iowa Carver College of Medicine, Iowa City, IA, United States of America
- Reinhard R. Beichel
- Department of Electrical and Computer Engineering, University of Iowa, Iowa City, IA, United States of America
- Iowa Institute for Biomedical Imaging, University of Iowa, Iowa City, IA, United States of America
- Department of Internal Medicine, University of Iowa Carver College of Medicine, Iowa City, IA, United States of America
10
Abstract
The published biomedical research literature encompasses most of our understanding of how drugs interact with gene products to produce physiological responses (phenotypes). Unfortunately, this information is distributed throughout the unstructured text of over 23 million articles. The creation of structured resources that catalog the relationships between drugs and genes would accelerate the translation of basic molecular knowledge into discoveries of genomic biomarkers for drug response and prediction of unexpected drug-drug interactions. Extracting these relationships from natural language sentences on such a large scale, however, requires text mining algorithms that can recognize when different-looking statements are expressing similar ideas. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of biomedical relationships automatically from text, overcoming differences in word choice and sentence structure. We validate EBC's performance against manually curated sets of (1) pharmacogenomic relationships from PharmGKB and (2) drug-target relationships from DrugBank, and use it to discover new drug-gene relationships for both knowledge bases. We then apply EBC to map the complete universe of drug-gene relationships based on their descriptions in Medline, revealing unexpected structure that challenges current notions about how these relationships are expressed in text. For instance, we learn that newer experimental findings are described in consistently different ways than established knowledge, and that seemingly pure classes of relationships can exhibit interesting chimeric structure. The EBC algorithm is flexible and adaptable to a wide range of problems in biomedical text mining.
Virtually all important biomedical knowledge is described in the published research literature, but Medline currently contains over 23 million articles and is growing at the rate of several hundred thousand new articles each year. In this environment, we need computational algorithms that can efficiently extract, aggregate, annotate and store information from the raw text. Because authors describe their results using natural language, descriptions of similar phenomena vary considerably with respect to both word choice and sentence structure. Any algorithm capable of mining the biomedical literature on a large scale must be able to overcome these differences and recognize when two different-looking statements are saying the same thing. Here we describe a novel algorithm, Ensemble Biclustering for Classification (EBC), that learns the structure of drug-gene relationships automatically from the unstructured text of biomedical research abstracts. By applying EBC to the entirety of Medline, we learn from the structure of the text itself approximately 20 key ways that drugs and genes can interact, discover new facts for two biomedical knowledge bases, and reveal rich and unexpected structure in how scientists describe drug-gene relationships.
Collapse
Affiliation(s)
- Bethany Percha
- Biomedical Informatics Training Program, Stanford University, Stanford, California, United States of America
| | - Russ B. Altman
- Departments of Medicine, Genetics and Bioengineering, Stanford University, Stanford, California, United States of America
- * E-mail:
| |
Collapse
|
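The core intuition behind EBC-style approaches — that textual patterns connecting similar sets of drug-gene pairs tend to express the same kind of relationship — can be illustrated with a toy sketch. This is not the authors' actual biclustering algorithm; the dependency-path strings, drug-gene pairs, and similarity threshold below are all hypothetical, chosen only to make the grouping idea concrete.

```python
# Toy illustration: group dependency paths by the overlap of the
# drug-gene pairs they co-occur with (illustrative data, not EBC itself).

# Each path maps to the set of (drug, gene) pairs it was observed with.
paths = {
    "DRUG inhibits GENE":          {("warfarin", "VKORC1"), ("imatinib", "ABL1")},
    "DRUG blocks GENE activity":   {("warfarin", "VKORC1"), ("imatinib", "ABL1")},
    "GENE metabolizes DRUG":       {("warfarin", "CYP2C9"), ("codeine", "CYP2D6")},
    "DRUG is a substrate of GENE": {("codeine", "CYP2D6")},
}

def jaccard(a, b):
    """Jaccard similarity between two sets of drug-gene pairs."""
    return len(a & b) / len(a | b)

def group_paths(paths, threshold=0.3):
    """Greedy single-link grouping: a path joins an existing group if it
    is similar enough to any member; otherwise it starts a new group."""
    groups = []
    for name, pairs in paths.items():
        for group in groups:
            if any(jaccard(pairs, paths[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups

groups = group_paths(paths)
for g in groups:
    print(g)
```

With these toy inputs the two "inhibition"-flavored paths fall into one group and the two "metabolism"-flavored paths into another, despite having no words in common — which is the recognition problem the abstract describes, solved here only in miniature.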
11
|
Knowledge retrieval from PubMed abstracts and electronic medical records with the Multiple Sclerosis Ontology. PLoS One 2015; 10:e0116718. [PMID: 25665127 PMCID: PMC4321837 DOI: 10.1371/journal.pone.0116718] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2014] [Accepted: 12/13/2014] [Indexed: 12/03/2022] Open
Abstract
Background In order to retrieve useful information from scientific literature and electronic medical records (EMR) we developed an ontology specific for Multiple Sclerosis (MS). Methods The MS Ontology was created using scientific literature and expert review under the Protégé OWL environment. We developed a dictionary with semantic synonyms and translations to different languages for mining EMR. The MS Ontology was integrated with other ontologies and dictionaries (diseases/comorbidities, gene/protein, pathways, drug) into the text-mining tool SCAIView. We analyzed the EMRs from 624 patients with MS using the MS Ontology dictionary in order to identify drug usage and comorbidities in MS. Testing competency questions and functional evaluation using F statistics further validated the usefulness of the MS Ontology. Results Validation of the lexicalized ontology by means of named entity recognition-based methods showed an adequate performance (F score = 0.73). The MS Ontology retrieved 80% of the genes associated with MS from scientific abstracts and identified additional pathways targeted by approved disease-modifying drugs (e.g. apoptosis pathways associated with mitoxantrone, rituximab and fingolimod). The analysis of the EMR from patients with MS identified current usage of disease modifying drugs and symptomatic therapy as well as comorbidities, which are in agreement with recent reports. Conclusion The MS Ontology provides a semantic framework that is able to automatically extract information from both scientific literature and EMR from patients with MS, revealing new pathogenesis insights as well as new clinical information.
Collapse
|
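An F score like the 0.73 reported above is the harmonic mean of precision and recall over the recognizer's output. A minimal computation, using hypothetical true-positive/false-positive/false-negative counts chosen only to illustrate the arithmetic:

```python
# Precision, recall, and F1 for a named-entity-recognition evaluation.
# The counts below are hypothetical, picked to yield F1 = 0.73.
tp, fp, fn = 146, 54, 54   # true positives, false positives, false negatives

precision = tp / (tp + fp)   # fraction of predicted entities that are correct
recall = tp / (tp + fn)      # fraction of gold entities that were found
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two components: a system with high precision but poor recall (or vice versa) cannot score well.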
12
|
Papoulias C, Robotham D, Drake G, Rose D, Wykes T. Staff and service users' views on a 'Consent for Contact' research register within psychosis services: a qualitative study. BMC Psychiatry 2014; 14:377. [PMID: 25539869 PMCID: PMC4296527 DOI: 10.1186/s12888-014-0377-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/01/2014] [Accepted: 12/19/2014] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND Recruitment to mental health research can be challenging. 'Consent for Contact' (C4C) is a novel framework which may expedite recruitment and contribute to equitable access to research. This paper discusses stakeholder perspectives on using a C4C model in services for people with psychosis. METHOD This is a cross-sectional study investigating the views of service users and staff using qualitative methods. Eight focus groups were recruited: five with service users (n = 26) and three with clinicians (n = 17). Purposive sampling was applied in order to reflect the local population in terms of ethnicity, experience of psychiatric services and attitudes towards research. RESULTS Staff and service users alike associated the principle of 'consent for contact' with greater service user autonomy and favourable conditions for research recruitment. Fears around coercion and inappropriate uses of clinical records were common and most marked in service users identifying as having a negative view of research participation. Staff working in inpatient services reported that consenting for future contact might contribute to paranoid ideation. All groups agreed that implementation should highlight safeguards and the opt-in nature of the register. CONCLUSIONS Staff and service users responded positively to C4C. Clinicians explaining C4C to service users should allay anxieties around coercion, degree of commitment, and use of records. For some service users, researcher access to records is likely to be the most challenging aspect of the consultation.
Collapse
Affiliation(s)
- Constantina Papoulias
- Department of Psychology, Institute of Psychiatry, King's College London, London, UK.
| | - Dan Robotham
- Department of Psychology, Institute of Psychiatry, King's College London, London, UK.
| | - Gareth Drake
- Department of Clinical, Educational & Health Psychology, University College London, London, UK.
| | - Diana Rose
- Service User Research Enterprise, Institute of Psychiatry, King's College London, London, UK.
| | - Til Wykes
- Department of Psychology, Institute of Psychiatry, King's College London, London, UK; Service User Research Enterprise, Institute of Psychiatry, King's College London, London, UK.
| |
Collapse
|
13
|
Sim I, Tu SW, Carini S, Lehmann HP, Pollock BH, Peleg M, Wittkowski KM. The Ontology of Clinical Research (OCRe): an informatics foundation for the science of clinical research. J Biomed Inform 2013; 52:78-91. [PMID: 24239612 DOI: 10.1016/j.jbi.2013.11.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2013] [Revised: 10/11/2013] [Accepted: 11/03/2013] [Indexed: 11/25/2022]
Abstract
To date, the scientific process for generating, interpreting, and applying knowledge has received less informatics attention than operational processes for conducting clinical studies. The activities of these scientific processes - the science of clinical research - are centered on the study protocol, which is the abstract representation of the scientific design of a clinical study. The Ontology of Clinical Research (OCRe) is an OWL 2 model of the entities and relationships of study design protocols for the purpose of computationally supporting the design and analysis of human studies. OCRe's modeling is independent of any specific study design or clinical domain. It includes a study design typology and a specialized module called ERGO Annotation for capturing the meaning of eligibility criteria. In this paper, we describe the key informatics use cases of each phase of a study's scientific lifecycle, present OCRe and the principles behind its modeling, and describe applications of OCRe and associated technologies to a range of clinical research use cases. OCRe captures the central semantics that underlies the scientific processes of clinical research and can serve as an informatics foundation for supporting the entire range of knowledge activities that constitute the science of clinical research.
Collapse
Affiliation(s)
- Ida Sim
- Department of Medicine, University of California, San Francisco, CA, United States.
| | - Samson W Tu
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States
| | - Simona Carini
- Department of Medicine, University of California, San Francisco, CA, United States
| | - Harold P Lehmann
- Division of Health Sciences Informatics, Johns Hopkins University, Baltimore, MD, United States
| | - Brad H Pollock
- Department of Epidemiology and Biostatistics, University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
| | - Mor Peleg
- Department of Information Systems, University of Haifa, Haifa, Israel
| | - Knut M Wittkowski
- Department of Research Design and Biostatistics, The Rockefeller University, New York, NY, United States
| |
Collapse
|
14
|
McDonald CJ, Vreeman DJ, Abhyankar S. Comment on "time to integrate clinical and research informatics". Sci Transl Med 2013; 5:179le1. [PMID: 23552367 DOI: 10.1126/scitranslmed.3005700] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
The same code standards should be used in both research and clinical care to facilitate data integration across domains.
Collapse
|
15
|
Witt CM. Clinical research on traditional drugs and food items--the potential of comparative effectiveness research for interdisciplinary research. JOURNAL OF ETHNOPHARMACOLOGY 2013; 147:254-258. [PMID: 23458921 DOI: 10.1016/j.jep.2013.02.024] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/22/2012] [Revised: 02/14/2013] [Accepted: 02/16/2013] [Indexed: 06/01/2023]
Abstract
ETHNOPHARMACOLOGICAL RELEVANCE In the traditional context, herbs are often used as herbal whole system therapies; however, most clinical trials included highly selected patients and applied standardized treatment protocols with the aim to exclude as much bias as possible. These studies have contributed important information on the efficacy of herbal medicine extracts; however, their results are only marginally helpful to understand the value of herbal medicine and food items in a more traditional usual care context. METHODS The new development of comparative effectiveness research (CER) will be introduced and synergies with ethnopharmacology will be outlined. RESULTS CER provides great opportunities for guiding researchers and clinicians in improving management of disease. CER compares two or more health interventions in order to determine which of these options works best for which types of patients in settings that are similar to those in which the intervention will be used in practice. CER uses a broad spectrum of methodologies including randomized pragmatic trials that can also be applied to herbal whole system therapies. Ethnopharmacological research can provide highly relevant information for CER including data on characteristics of typical patients as well as traditional usage including methods of collection, extraction, and preparation. Recommendations for future research on traditional herbal medicine and food items are (1) a systematic cooperation between ethnopharmacology and clinical researchers and (2) a call for more CER on traditional herbal medicines and food items. CONCLUSION Multiple stakeholders, including ethnopharmacologists, should cooperate to identify relevant study questions as well as share their knowledge to determine the optimal placement of a clinical trial in the efficacy-effectiveness-continuum.
Collapse
Affiliation(s)
- Claudia M Witt
- Institute for Social Medicine, Epidemiology and Health Economics, Charité-Universitätsmedizin Berlin, Berlin, Germany.
| |
Collapse
|
16
|
Katzan IL, Rudick RA. Author Response to Comment on “Time to Integrate Clinical and Research Informatics”. Sci Transl Med 2013; 5:179lr1. [DOI: 10.1126/scitranslmed.3006031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
Lack of structured clinical data limits research potential of EHRs, and efforts to establish clinical data standards should be a priority.
Collapse
Affiliation(s)
- Irene L. Katzan
- Neurological Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | | |
Collapse
|