1
|
Carrell DS, Floyd JS, Gruber S, Hazlehurst BL, Heagerty PJ, Nelson JC, Williamson BD, Ball R. A general framework for developing computable clinical phenotype algorithms. J Am Med Inform Assoc 2024; 31:1785-1796. [PMID: 38748991 PMCID: PMC11258420 DOI: 10.1093/jamia/ocae121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 05/07/2024] [Accepted: 05/14/2024] [Indexed: 07/20/2024]
Abstract
OBJECTIVE To present a general framework providing high-level guidance to developers of computable algorithms for identifying patients with specific clinical conditions (phenotypes) through a variety of approaches, including but not limited to machine learning and natural language processing methods to incorporate rich electronic health record data. MATERIALS AND METHODS Drawing on extensive prior phenotyping experiences and insights derived from 3 algorithm development projects conducted specifically for this purpose, our team with expertise in clinical medicine, statistics, informatics, pharmacoepidemiology, and healthcare data science methods conceptualized stages of development and corresponding sets of principles, strategies, and practical guidelines for improving the algorithm development process. RESULTS We propose 5 stages of algorithm development and corresponding principles, strategies, and guidelines: (1) assessing fitness-for-purpose, (2) creating gold standard data, (3) feature engineering, (4) model development, and (5) model evaluation. DISCUSSION AND CONCLUSION This framework is intended to provide practical guidance and serve as a basis for future elaboration and extension.
Affiliation(s)
- David S Carrell
- Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
| | - James S Floyd
- Department of Medicine, School of Medicine, University of Washington, Seattle, WA 98195, United States
- Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA 98195, United States
| | - Susan Gruber
- Putnam Data Sciences, LLC, Cambridge, MA 02139, United States
| | - Brian L Hazlehurst
- Center for Health Research, Kaiser Permanente Northwest, Portland, OR 97227, United States
| | - Patrick J Heagerty
- Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA 98195, United States
| | - Jennifer C Nelson
- Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
| | - Brian D Williamson
- Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
| | - Robert Ball
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD 20993, United States
| |
|
2
|
Huang X, Kleiman R, Page D, Hebbring S. Automated Family Histories Significantly Improve Risk Prediction in an EHR. AMIA Joint Summits on Translational Science Proceedings 2024; 2024:221-229. [PMID: 38827091 PMCID: PMC11141855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
We recently demonstrated that electronically constructed family pedigrees (e-pedigrees) have great value in epidemiologic research using electronic health record (EHR) data. Prior to this work, it has been well accepted that family health history is a major predictor for a wide spectrum of diseases, reflecting shared effects of genetics, environment, and lifestyle. With the widespread digitalization of patient data via EHRs, there is an unprecedented opportunity to use machine learning algorithms to better predict disease risk. Although predictive models have previously been constructed for a few important diseases, we currently know very little about how accurately the risk for most diseases can be predicted. It is further unknown if the incorporation of e-pedigrees in machine learning can improve the value of these models. In this study, we devised a family pedigree-driven high-throughput machine learning pipeline to simultaneously predict risks for thousands of diagnosis codes using thousands of input features. Models were built to predict future disease risk for three time windows using both Logistic Regression and XGBoost. For example, we achieved average areas under the receiver operating characteristic curves (AUCs) of 0.82, 0.77 and 0.71 for 1, 6, and 24 months, respectively using XGBoost and without e-pedigrees. When adding e-pedigree features to the XGBoost pipeline, AUCs increased to 0.83, 0.79 and 0.74 for the same three time periods, respectively. E-pedigrees similarly improved the predictions when using Logistic Regression. These results emphasize the potential value of incorporating family health history via e-pedigrees into machine learning with no further human time.
Affiliation(s)
- Xiayuan Huang
- University of Wisconsin-Madison, Madison, Wisconsin, United States
| | - Ross Kleiman
- University of Wisconsin-Madison, Madison, Wisconsin, United States
| | - David Page
- Duke University, Durham, North Carolina, United States
| | - Scott Hebbring
- Marshfield Clinic Research Institute, Marshfield, Wisconsin, United States
| |
|
3
|
Deng Y, Pacheco JA, Ghosh A, Chung A, Mao C, Smith JC, Zhao J, Wei WQ, Barnado A, Dorn C, Weng C, Liu C, Cordon A, Yu J, Tedla Y, Kho A, Ramsey-Goldman R, Walunas T, Luo Y. Natural language processing to identify lupus nephritis phenotype in electronic health records. BMC Med Inform Decis Mak 2024; 22:348. [PMID: 38433189 PMCID: PMC10910523 DOI: 10.1186/s12911-024-02420-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2021] [Accepted: 01/09/2024] [Indexed: 03/05/2024]
Abstract
BACKGROUND Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major disease manifestations of SLE for organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to mine information from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW). METHODS We developed five algorithms: a rule-based algorithm using only structured data (baseline algorithm) and four algorithms using different NLP models. The first NLP model applied a simple regular expression keyword search combined with structured data. The other three NLP models were based on regularized logistic regression and used different sets of features: positive mentions of concept unique identifiers (CUIs), the number of appearances of CUIs, and a mixture of three components (i.e. a curated list of CUIs, regular expression concepts, and structured data), respectively. The baseline algorithm and the best performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC). RESULTS Our best performing NLP model incorporated features from structured data, regular expression concepts, and mapped CUIs and showed an improved F-measure in both the NMEDW (0.41 vs 0.79) and VUMC (0.52 vs 0.93) datasets compared to the baseline lupus nephritis algorithm. CONCLUSION Our NLP MetaMap mixed model improved the F-measure greatly compared to the structured data only algorithm in both internal and external validation datasets. The NLP algorithms can serve as powerful tools to accurately identify the lupus nephritis phenotype in EHRs for clinical research and better targeted therapies.
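The F-measure comparison reported in this abstract can be illustrated with a minimal Python sketch. It assumes the reported "F measure" is the balanced F1 score (the abstract does not state the weighting), and the counts in the example are hypothetical rather than taken from the study.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F measure" is F1; it does not say so explicitly.
    Uses the equivalent count form F1 = 2*TP / (2*TP + FP + FN).
    """
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        # No predicted or actual positives: F1 is undefined, report 0.0 by convention.
        return 0.0
    return 2 * true_positives / denominator


# Hypothetical chart-review counts for one algorithm (not from the paper).
score = f_measure(true_positives=8, false_positives=2, false_negatives=2)
print(f"F-measure: {score:.2f}")
```

Because F1 ignores true negatives, it suits a rare phenotype such as lupus nephritis, where a classifier could reach high accuracy while missing most cases.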
Affiliation(s)
- Yu Deng
- Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
| | - Jennifer A Pacheco
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA
| | - Anika Ghosh
- Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
| | - Anh Chung
- Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
- Department of Medicine/Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, USA
| | - Chengsheng Mao
- Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
| | - Joshua C Smith
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA
| | - Juan Zhao
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA
| | - April Barnado
- Department of Medicine, Vanderbilt University Medical Center, Nashville, USA
| | - Chad Dorn
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York City, USA
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University, New York City, USA
| | - Adam Cordon
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA
| | - Jingzhi Yu
- Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
| | - Yacob Tedla
- Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
| | - Abel Kho
- Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
| | - Rosalind Ramsey-Goldman
- Department of Medicine/Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, USA
| | - Theresa Walunas
- Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
| | - Yuan Luo
- Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA.
| |
|
4
|
Brandt PS, Kho A, Luo Y, Pacheco JA, Walunas TL, Hakonarson H, Hripcsak G, Liu C, Shang N, Weng C, Walton N, Carrell DS, Crane PK, Larson EB, Chute CG, Kullo IJ, Carroll R, Denny J, Ramirez A, Wei WQ, Pathak J, Wiley LK, Richesson R, Starren JB, Rasmussen LV. Characterizing variability of electronic health record-driven phenotype definitions. J Am Med Inform Assoc 2023; 30:427-437. [PMID: 36474423 PMCID: PMC9933077 DOI: 10.1093/jamia/ocac235] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/21/2022] [Revised: 10/19/2022] [Accepted: 11/23/2022] [Indexed: 12/12/2022]
Abstract
OBJECTIVE The aim of this study was to analyze a publicly available sample of rule-based phenotype definitions to characterize and evaluate the variability of logical constructs used. MATERIALS AND METHODS A sample of 33 preexisting phenotype definitions used in research that are represented using Fast Healthcare Interoperability Resources and Clinical Quality Language (CQL) was analyzed using automated analysis of the computable representation of the CQL libraries. RESULTS Most of the phenotype definitions include narrative descriptions and flowcharts, while few provide pseudocode or executable artifacts. Most use 4 or fewer medical terminologies. The number of codes used ranges from 5 to 6865, and value sets from 1 to 19. We found that the most common expressions used were literal, data, and logical expressions. Aggregate and arithmetic expressions are the least common. Expression depth ranges from 4 to 27. DISCUSSION Despite the range of conditions, we found that all of the phenotype definitions consisted of logical criteria, representing both clinical and operational logic, and tabular data, consisting of codes from standard terminologies and keywords for natural language processing. The total number and variety of expressions are low, which may be to simplify implementation, or authors may limit complexity due to data availability constraints. CONCLUSIONS The phenotype definitions analyzed show significant variation in specific logical, arithmetic, and other operators but are all composed of the same high-level components, namely tabular data and logical expressions. A standard representation for phenotype definitions should support these formats and be modular to support localization and shared logic.
Affiliation(s)
- Pascal S Brandt
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
| | - Abel Kho
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Jennifer A Pacheco
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Theresa L Walunas
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
| | - George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Ning Shang
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Nephi Walton
- Intermountain Precision Genomics, Intermountain Healthcare, St George, Utah, USA
| | - David S Carrell
- Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
| | - Paul K Crane
- Department of Medicine, University of Washington, Seattle, Washington, USA
| | - Eric B Larson
- Department of Medicine, University of Washington, Seattle, Washington, USA
- Department of Health Services, University of Washington, Seattle, Washington, USA
| | - Christopher G Chute
- Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota, USA
| | - Robert Carroll
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Josh Denny
- All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA
| | - Andrea Ramirez
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
| | - Jyoti Pathak
- Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
| | - Laura K Wiley
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
| | - Rachel Richesson
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Justin B Starren
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Luke V Rasmussen
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| |
|
5
|
Performance of EHR classifiers for patient eligibility in a clinical trial of precision screening. Contemp Clin Trials 2022; 121:106926. [PMID: 36115637 DOI: 10.1016/j.cct.2022.106926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 09/07/2022] [Accepted: 09/09/2022] [Indexed: 01/27/2023]
Abstract
BACKGROUND Validated computable eligibility criteria use real-world data and facilitate the conduct of clinical trials. The Genomic Medicine at VA (GenoVA) Study is a pragmatic trial of polygenic risk score testing enrolling patients without known diagnoses of 6 common diseases: atrial fibrillation, coronary artery disease, type 2 diabetes, breast cancer, colorectal cancer, and prostate cancer. We describe the validation of computable disease classifiers as eligibility criteria and their performance in the first 16 months of trial enrollment. METHODS We identified well-performing published computable classifiers for the 6 target diseases and validated these in the target population using blinded physician review. If needed, classifiers were refined and then underwent a subsequent round of blinded review until true positive and true negative rates ≥80% were achieved. The optimized classifiers were then implemented as pre-screening exclusion criteria; telephone screens enabled an assessment of their real-world negative predictive value (NPV-RW). RESULTS Published classifiers for type 2 diabetes and breast and prostate cancer achieved desired performance in blinded chart review without modification; the classifier for atrial fibrillation required two rounds of refinement before achieving desired performance. Among the 1077 potential participants screened in the first 16 months of enrollment, NPV-RW of the classifiers ranged from 98.4% for coronary artery disease to 99.9% for colorectal cancer. Performance did not differ by gender or race/ethnicity. CONCLUSIONS Computable disease classifiers can serve as efficient and accurate pre-screening classifiers for clinical trials, although performance will depend on the trial objectives and diseases under study.
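The real-world negative predictive value described in this abstract can be sketched in a few lines of Python. This is an illustrative calculation only: the function name and the counts are hypothetical, and it assumes NPV-RW is the share of classifier-negative patients whom the telephone screen confirmed to be free of the disease.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """NPV = TN / (TN + FN), computed among patients the classifier labeled negative.

    Assumption: a "false negative" is a patient who passed the classifier
    pre-screen but reported the disease during the telephone screen.
    """
    classifier_negative = true_negatives + false_negatives
    if classifier_negative == 0:
        raise ValueError("NPV is undefined when no patients were classified negative")
    return true_negatives / classifier_negative


# Hypothetical counts for one disease classifier (not from the GenoVA study).
npv = negative_predictive_value(true_negatives=985, false_negatives=15)
print(f"NPV: {npv:.1%}")
```

NPV depends on disease prevalence in the screened population, which is one reason the abstract notes that performance will vary with the trial objectives and the diseases under study.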
|
6
|
Partogi M, Gaviria-Valencia S, Alzate Aguirre M, Pick NJ, Bhopalwala HM, Barry BA, Kaggal VC, Scott CG, Kessler ME, Moore MM, Mitchell JD, Chaudhry R, Bonacci RP, Arruda-Olson AM. Sociotechnical Intervention for Improved Delivery of Preventive Cardiovascular Care to Rural Communities: Participatory Design Approach. J Med Internet Res 2022; 24:e27333. [PMID: 35994324 PMCID: PMC9446142 DOI: 10.2196/27333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2021] [Revised: 12/30/2021] [Accepted: 06/27/2022] [Indexed: 11/15/2022]
Abstract
Background Clinical practice guidelines recommend antiplatelet and statin therapies as well as blood pressure control and tobacco cessation for secondary prevention in patients with established atherosclerotic cardiovascular diseases (ASCVDs). However, these strategies for risk modification are underused, especially in rural communities. Moreover, resources to support the delivery of preventive care to rural patients are fewer than those for their urban counterparts. Transformative interventions for the delivery of tailored preventive cardiovascular care to rural patients are needed. Objective A multidisciplinary team developed a rural-specific, team-based model of care intervention assisted by clinical decision support (CDS) technology using participatory design in a sociotechnical conceptual framework. The model of care intervention included redesigned workflows and a novel CDS technology for the coordination and delivery of guideline recommendations by primary care teams in a rural clinic. Methods The design of the model of care intervention comprised 3 phases: problem identification, experimentation, and testing. Input from team members (n=35) required 150 hours, including observations of clinical encounters, provider workshops, and interviews with patients and health care professionals. The intervention was prototyped, iteratively refined, and tested with user feedback. In a 3-month pilot trial, 369 patients with ASCVDs were randomized into the control or intervention arm. Results New workflows and a novel CDS tool were created to identify patients with ASCVDs who had gaps in preventive care and assign the right care team member for delivery of tailored recommendations. During the pilot, the intervention prototype was iteratively refined and tested. 
The pilot demonstrated feasibility for successful implementation of the sociotechnical intervention as the proportion of patients who had encounters with advanced practice providers (nurse practitioners and physician assistants), pharmacists, or tobacco cessation coaches for the delivery of guideline recommendations in the intervention arm was greater than that in the control arm. Conclusions Participatory design and a sociotechnical conceptual framework enabled the development of a rural-specific, team-based model of care intervention assisted by CDS technology for the transformation of preventive health care delivery for ASCVDs.
|
7
|
Movaghar A, Page D, Brilliant M, Mailick M. Advancing artificial intelligence-assisted pre-screening for fragile X syndrome. BMC Med Inform Decis Mak 2022; 22:152. [PMID: 35689224 PMCID: PMC9185893 DOI: 10.1186/s12911-022-01896-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/07/2022] [Accepted: 06/01/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Fragile X syndrome (FXS), the most common inherited cause of intellectual disability and autism, is significantly underdiagnosed in the general population. Diagnosing FXS is challenging due to the heterogeneity of the condition, subtle physical characteristics at the time of birth and similarity of phenotypes to other conditions. The medical complexity of FXS underscores an urgent need to develop more efficient and effective screening methods to identify individuals with FXS. In this study, we evaluate the effectiveness of using artificial intelligence (AI) and electronic health records (EHRs) to accelerate FXS diagnosis. METHODS The EHRs of 2.1 million patients served by the University of Wisconsin Health System (UW Health) were the main data source for this retrospective study. UW Health includes patients from south central Wisconsin, with approximately 33 years (1988-2021) of digitized health data. We identified all participants who received a code for FXS in the form of International Classification of Diseases (ICD), Ninth or Tenth Revision (ICD9 = 759.83, ICD10 = Q99.2). Only individuals who received the FXS code on at least two occasions ("Rule of 2") were classified as clinically diagnosed cases. To ensure the availability of sufficient data prior to clinical diagnosis to test the model, only individuals who were diagnosed after age 10 were included in the analysis. A supervised random forest classifier was used to create an AI-assisted pre-screening tool to identify cases with FXS, 5 years earlier than the time of clinical diagnosis based on their medical records. The area under receiver operating characteristic curve (AUROC) was reported. The AUROC shows the level of success in identification of cases and controls (AUROC = 1 represents perfect classification). RESULTS 52 individuals were identified as target cases and matched with 5200 controls. 
The AI-assisted pre-screening tool successfully identified cases with FXS 5 years earlier than the time of clinical diagnosis, with an AUROC of 0.717. A separate model trained and tested on UW Health cases achieved an AUROC of 0.798. CONCLUSIONS This result shows the potential utility of our tool in accelerating FXS diagnosis in real clinical settings. Earlier diagnosis can lead to more timely intervention and access to services with the goal of improving patients' health outcomes.
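The "Rule of 2" case definition in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout is hypothetical, and it assumes that "two occasions" means two distinct encounter dates, which the abstract does not specify. The two ICD codes are the ones quoted in the abstract.

```python
from collections import defaultdict
from datetime import date

# ICD-9 and ICD-10 codes for fragile X syndrome as quoted in the abstract.
FXS_CODES = {"759.83", "Q99.2"}


def rule_of_two_cases(records, codes=FXS_CODES, min_occasions=2):
    """Return the IDs of patients who received a target code on >= min_occasions dates.

    `records` is an iterable of (patient_id, icd_code, encounter_date) tuples.
    This layout is a hypothetical stand-in, not the UW Health schema.
    """
    coded_dates = defaultdict(set)
    for patient_id, icd_code, encounter_date in records:
        if icd_code in codes:
            # A set keeps repeated codes on the same date from counting twice.
            coded_dates[patient_id].add(encounter_date)
    return {pid for pid, dates in coded_dates.items() if len(dates) >= min_occasions}


# Hypothetical example records.
records = [
    ("p1", "759.83", date(2010, 3, 1)),
    ("p1", "Q99.2", date(2016, 7, 9)),   # second occasion: p1 qualifies
    ("p2", "Q99.2", date(2018, 1, 5)),
    ("p2", "Q99.2", date(2018, 1, 5)),   # same date: counted as one occasion
    ("p3", "E11.9", date(2019, 2, 2)),   # unrelated code
]
print(rule_of_two_cases(records))
```

Requiring a repeated code is a common way to reduce false positives from one-off rule-out or miscoded diagnoses, at the cost of missing patients with a single coded encounter.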
Affiliation(s)
- Arezoo Movaghar
- Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, WI, 53705, USA.
| | - David Page
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - Murray Brilliant
- Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, WI, 53705, USA
| | - Marsha Mailick
- Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, WI, 53705, USA
| |
|
8
|
Link NB, Huang S, Cai T, Sun J, Dahal K, Costa L, Cho K, Liao K, Cai T, Hong C. Binary acronym disambiguation in clinical notes from electronic health records with an application in computational phenotyping. Int J Med Inform 2022; 162:104753. [PMID: 35405530 DOI: 10.1016/j.ijmedinf.2022.104753] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/18/2022] [Revised: 03/11/2022] [Accepted: 03/27/2022] [Indexed: 01/05/2023]
Abstract
OBJECTIVE The use of electronic health records (EHR) systems has grown over the past decade, and with it, the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings) and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce a semi-supervised method for binary acronym disambiguation, the task of classifying a target sense for acronyms in the clinical EHR notes. METHODS We developed a semi-supervised ensemble machine learning (CASEml) algorithm to automatically identify when an acronym means a target sense by leveraging semantic embeddings, visit-level text and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard semi-supervised method and a baseline metric selecting the most frequent acronym sense. Along with evaluating the performance of these methods for specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis. RESULTS CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than a standard baseline metric and (on average) higher than a state-of-the-art semi-supervised method. As well, we demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis. CONCLUSION CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and semi-supervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.
Affiliation(s)
- Nicholas B Link
- VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States.
| | - Sicong Huang
- VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
| | - Tianrun Cai
- VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
| | - Jiehuan Sun
- VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
| | - Kumar Dahal
- VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
| | - Lauren Costa
- VA Boston Healthcare System, Boston, MA, United States
| | - Kelly Cho
- VA Boston Healthcare System, Boston, MA, United States
| | - Katherine Liao
- VA Boston Healthcare System, Boston, MA, United States; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, United States
| | - Tianxi Cai
- VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
| | - Chuan Hong
- VA Boston Healthcare System, Boston, MA, United States; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
| |
|
9
|
Spilseth B, McKnight CD, Li MD, Park CJ, Fried JG, Yi PH, Brian JM, Lehman CD, Wang XJ, Phalke V, Pakkal M, Baruah D, Khine PP, Fajardo LL. AUR-RRA Review: Logistics of Academic-Industry Partnerships in Artificial Intelligence. Acad Radiol 2022; 29:119-128. [PMID: 34561163 DOI: 10.1016/j.acra.2021.08.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/23/2021] [Revised: 07/29/2021] [Accepted: 08/07/2021] [Indexed: 12/27/2022]
Abstract
The Radiology Research Alliance (RRA) of the Association of University Radiologists (AUR) convenes Task Forces to address current topics in radiology. In this article, the AUR-RRA Task Force on Academic-Industry Partnerships for Artificial Intelligence considered issues of importance to academic radiology departments contemplating industry partnerships in artificial intelligence (AI) development, testing, and evaluation. Our goal was to create a framework encompassing the clinical, technical, regulatory, legal, and financial considerations that affect the arrangement and success of such partnerships.
Affiliation(s)
- Benjamin Spilseth
- Department of Radiology, University of Minnesota Medical School, Minneapolis, Minnesota
| | - Colin D McKnight
- Department of Radiology and Radiological Sciences, Vanderbilt University Medical Center, Nashville, Tennessee
| | - Matthew D Li
- Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
| | - Christian J Park
- Department of Radiology, Penn State Health, Milton S. Hershey Center, Hershey, Pennsylvania
| | - Jessica G Fried
- Department of Radiology, University of Michigan, Ann Arbor, Michigan
| | - Paul H Yi
- Department of Radiology and Diagnostic Imaging, University of Maryland Intelligent Imaging (UMII) Center, University of Maryland School of Medicine & Malone Center for Engineering in Healthcare, Johns Hopkins University Whiting School of Engineering, Baltimore, Maryland
| | - James M Brian
- Department of Radiology, Penn State Health, Penn State Children's Hospital, Penn State Milton S. Hershey Medical Center, Hershey, Pennsylvania
| | - Constance D Lehman
- Department of Radiology, Massachusetts General Hospital, Boston, Massachusetts
| | | | - Vaishali Phalke
- Department of Radiology, University of Florida, Gainesville, Florida
| | - Mini Pakkal
- Department of Radiology, University of Toronto, Toronto, Canada
| | - Dhiraj Baruah
- Department of Radiology and Radiological Science, Medical University of South Carolina, Charleston, South Carolina
| | - Pwint Phyu Khine
- Penn State Health Milton S. Hershey Medical Center, Hershey, PA, USA
| | - Laurie L Fajardo
- Department of Radiology and Radiological Sciences, University of Utah, 1950 Circle of Hope - 3rd floor Breast Imaging Clinic, Salt Lake City, UT 84112.
|
10
|
Movaghar A, Page D, Brilliant M, Mailick M. Response to Timothé Ménard. Genet Med 2021; 24:752-753. [PMID: 34906516 DOI: 10.1016/j.gim.2021.10.023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/22/2021] [Revised: 10/27/2021] [Accepted: 10/28/2021] [Indexed: 10/19/2022] Open
Affiliation(s)
- Arezoo Movaghar
- Waisman Center, University of Wisconsin-Madison, Madison, WI
| | - David Page
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC
| | | | - Marsha Mailick
- Waisman Center, University of Wisconsin-Madison, Madison, WI.
|
11
|
Somani S, Yoffie S, Teng S, Havaldar S, Nadkarni GN, Zhao S, Glicksberg BS. Development and validation of techniques for phenotyping ST-elevation myocardial infarction encounters from electronic health records. JAMIA Open 2021; 4:ooab068. [PMID: 34423260 PMCID: PMC8374370 DOI: 10.1093/jamiaopen/ooab068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Revised: 06/07/2021] [Accepted: 07/29/2021] [Indexed: 11/12/2022] Open
Abstract
Objectives Classifying hospital admissions into various acute myocardial infarction phenotypes in electronic health records (EHRs) is a challenging task with strong research implications that remains unsolved. To our knowledge, this study is the first to design and validate phenotyping algorithms using cardiac catheterizations to identify not only patients with an ST-elevation myocardial infarction (STEMI), but also the specific encounter when it occurred. Materials and Methods We design and validate multi-modal algorithms to phenotype STEMI on a multicenter EHR containing 5.1 million patients and 115 million patient encounters by using discharge summaries, diagnosis codes, electrocardiography readings, and the presence of cardiac catheterizations on the encounter. Results We demonstrate that robustly phenotyping STEMIs by selecting discharge summaries containing “STEM” has the potential to capture the largest number of STEMIs (positive predictive value [PPV] = 0.36, N = 2110), but that adding a STEMI-related International Classification of Disease (ICD) code and cardiac catheterizations to these summaries yields the highest precision (PPV = 0.94, N = 952). Discussion and Conclusion In this study, we demonstrate that the incorporation of percutaneous coronary intervention increases the PPV for detecting STEMI-related patient encounters from the EHR.
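The trade-off reported in this abstract, broad capture at low PPV versus narrower capture at high PPV, can be illustrated with a minimal Python sketch. The encounter records, field names, and chart-review labels are hypothetical and are not the study's data or code; only the rule structure (note text containing "STEM", optionally combined with a STEMI-related ICD code and a cardiac catheterization) follows the abstract. Case-insensitive matching is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Encounter:
    """Hypothetical encounter record; field names are illustrative only."""
    note: str                  # discharge summary text
    has_stemi_icd: bool        # STEMI-related ICD code on the encounter
    has_cath: bool             # cardiac catheterization on the encounter
    chart_review_stemi: bool   # reference label from manual review


def rule_broad(e: Encounter) -> bool:
    # Broad rule: the discharge summary contains the string "STEM".
    return "STEM" in e.note.upper()


def rule_strict(e: Encounter) -> bool:
    # Strict rule: broad rule plus ICD code plus catheterization.
    return rule_broad(e) and e.has_stemi_icd and e.has_cath


def ppv(encounters: List[Encounter],
        rule: Callable[[Encounter], bool]) -> Optional[float]:
    # PPV = true positives / all encounters flagged by the rule.
    flagged = [e for e in encounters if rule(e)]
    if not flagged:
        return None
    return sum(e.chart_review_stemi for e in flagged) / len(flagged)


# Toy data, invented for illustration.
toy = [
    Encounter("Admitted with STEMI, taken to cath lab", True, True, True),
    Encounter("No evidence of STEMI", False, False, False),
    Encounter("History of NSTEMI", False, True, False),
    Encounter("Stem cell transplant follow-up", False, False, False),
    Encounter("Chest pain, ruled out", False, False, False),
]

broad_ppv = ppv(toy, rule_broad)
strict_ppv = ppv(toy, rule_strict)
print(broad_ppv, strict_ppv)
```

On this toy data the broad rule flags four encounters, including negated and unrelated mentions, while the strict rule flags only the confirmed case, which mirrors the direction of the effect the abstract describes.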
Affiliation(s)
- Sulaiman Somani
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Stephen Yoffie
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Shelly Teng
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Shreyas Havaldar
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Girish N Nadkarni
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Shan Zhao
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Benjamin S Glicksberg
- The Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York, USA
|
12
|
Klann JG, Estiri H, Weber GM, Moal B, Avillach P, Hong C, Tan ALM, Beaulieu-Jones BK, Castro V, Maulhardt T, Geva A, Malovini A, South AM, Visweswaran S, Morris M, Samayamuthu MJ, Omenn GS, Ngiam KY, Mandl KD, Boeker M, Olson KL, Mowery DL, Follett RW, Hanauer DA, Bellazzi R, Moore JH, Loh NHW, Bell DS, Wagholikar KB, Chiovato L, Tibollo V, Rieg S, Li ALLJ, Jouhet V, Schriver E, Xia Z, Hutch M, Luo Y, Kohane IS, Brat GA, Murphy SN. Validation of an internationally derived patient severity phenotype to support COVID-19 analytics from electronic health record data. J Am Med Inform Assoc 2021; 28:1411-1420. [PMID: 33566082 PMCID: PMC7928835 DOI: 10.1093/jamia/ocab018] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 10/08/2020] [Revised: 01/14/2021] [Accepted: 01/29/2021] [Indexed: 12/21/2022] Open
Abstract
OBJECTIVE The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing coronavirus disease 2019 (COVID-19) with federated analyses of electronic health record (EHR) data. We sought to develop and validate a computable phenotype for COVID-19 severity. MATERIALS AND METHODS Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of 6 code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of intensive care unit (ICU) admission and/or death. We also piloted an alternative machine learning approach and compared selected predictors of severity with the 4CE phenotype at 1 site. RESULTS The full 4CE severity phenotype had a pooled sensitivity of 0.73 and specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity had high variability, up to 0.65 across sites. At one pilot site, the expert-derived phenotype had a mean area under the curve of 0.903 (95% confidence interval, 0.886-0.921), compared with an area under the curve of 0.956 (95% confidence interval, 0.952-0.959) for the machine learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared with chart review. DISCUSSION We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly owing to heterogeneous pandemic conditions. CONCLUSIONS We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
Affiliation(s)
- Jeffrey G Klann
- Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Hossein Estiri
- Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Griffin M Weber
- Department of Biomedical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
| | - Bertrand Moal
- IAM Unit, Public Health Department, Bordeaux University Hospital, Bordeaux, France
| | - Paul Avillach
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Chuan Hong
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Amelia L M Tan
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Brett K Beaulieu-Jones
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Victor Castro
- Research Information Science and Computing, Mass General Brigham, Boston, Massachusetts, USA
| | - Thomas Maulhardt
- Institute of Medical Biometry and Statistics, Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Alon Geva
- Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA; Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Alberto Malovini
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy
| | - Andrew M South
- Section of Nephrology, Department of Pediatrics, Brenner Children's Hospital, Wake Forest School of Medicine, Winston Salem, North Carolina, USA
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Malarkodi J Samayamuthu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
| | - Kee Yuan Ngiam
- Department of Biomedical Informatics-WisDM, National University Health System, Singapore
| | - Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Martin Boeker
- Institute of Medical Biometry and Statistics, Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Karen L Olson
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
| | - Danielle L Mowery
- Department of Biostatistics, Epidemiology, and Informatics, Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Robert W Follett
- Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California, USA
| | - David A Hanauer
- Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Riccardo Bellazzi
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy; Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy
| | - Jason H Moore
- Department of Biostatistics, Epidemiology, and Informatics, Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Ne-Hooi Will Loh
- Division of Critical Care, National University Health System, Singapore
| | - Douglas S Bell
- Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California, USA
| | | | - Luca Chiovato
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy; Department of Internal Medicine and Medical Therapy, University of Pavia, Pavia, Italy
| | - Valentina Tibollo
- Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy
| | - Siegbert Rieg
- Division of Infectious Diseases, Department of Medicine II, Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Anthony L L J Li
- National Center for Infectious Diseases, Tan Tock Seng Hospital, Singapore
| | - Vianney Jouhet
- ERIAS-INSERM U1219 BPH, Bordeaux University Hospital, Bordeaux, France
| | - Emily Schriver
- Data Analytics Center, Penn Medicine, Philadelphia, Pennsylvania, USA
| | - Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Meghan Hutch
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | | | - Gabriel A Brat
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
| | - Shawn N Murphy
- Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA; Research Information Science and Computing, Mass General Brigham, Boston, Massachusetts, USA
|
13
|
Movaghar A, Page D, Scholze D, Hong J, DaWalt LS, Kuusisto F, Stewart R, Brilliant M, Mailick M. Artificial intelligence-assisted phenotype discovery of fragile X syndrome in a population-based sample. Genet Med 2021; 23:1273-1280. [PMID: 33772223 PMCID: PMC8257481 DOI: 10.1038/s41436-021-01144-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/09/2020] [Revised: 03/01/2021] [Accepted: 03/02/2021] [Indexed: 11/09/2022] Open
Abstract
PURPOSE Fragile X syndrome (FXS), the most prevalent inherited cause of intellectual disability, remains underdiagnosed in the general population. Clinical studies have shown that individuals with FXS have a complex health profile leading to unique clinical needs. However, the full impact of this X-linked disorder on the health of affected individuals is unclear and the prevalence of co-occurring conditions is unknown. METHODS We mined the longitudinal electronic health records from more than one million individuals to investigate the health characteristics of patients who have been clinically diagnosed with FXS. Additionally, using machine-learning approaches, we created predictive models to identify individuals with FXS in the general population. RESULTS Our discovery-oriented approach identified the associations of FXS with a wide range of medical conditions including circulatory, endocrine, digestive, and genitourinary, in addition to mental and neurological disorders. We successfully created predictive models to identify cases five years prior to clinical diagnosis of FXS without relying on any genetic or familial data. CONCLUSION Although FXS is often thought of primarily as a neurological disorder, it is in fact a multisystem syndrome involving many co-occurring conditions, some primary and some secondary, and they are associated with a considerable burden on patients and their families.
Affiliation(s)
- Arezoo Movaghar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
| | - David Page
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - Danielle Scholze
- Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
- Department of Pediatrics, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
| | - Jinkuk Hong
- Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
| | | | | | - Ron Stewart
- Morgridge Institute for Research, Madison, WI, USA
| | - Murray Brilliant
- Waisman Center, University of Wisconsin-Madison, Madison, WI, USA
- Marshfield Clinic Research Institute, Marshfield, WI, USA
| | - Marsha Mailick
- Waisman Center, University of Wisconsin-Madison, Madison, WI, USA.
|
14
|
Walunas TL, Ghosh AS, Pacheco JA, Mitrovic V, Wu A, Jackson KL, Schusler R, Chung A, Erickson D, Mancera-Cuevas K, Luo Y, Kho AN, Ramsey-Goldman R. Evaluation of structured data from electronic health records to identify clinical classification criteria attributes for systemic lupus erythematosus. Lupus Sci Med 2021; 8:8/1/e000488. [PMID: 33903204 PMCID: PMC8076919 DOI: 10.1136/lupus-2021-000488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2021] [Revised: 04/12/2021] [Accepted: 04/13/2021] [Indexed: 11/10/2022]
Abstract
Objective Our objective was to develop algorithms to identify lupus clinical classification criteria attributes using structured data found in the electronic health record (EHR) and determine whether they could be used to describe a cohort of people with lupus and discriminate them from a defined healthy control cohort. Methods We created gold standard lupus and healthy patient cohorts that were fully adjudicated for the American College of Rheumatology (ACR), Systemic Lupus International Collaborating Clinics (SLICC) and European League Against Rheumatism/ACR (EULAR/ACR) classification criteria and had matched EHR data. We implemented rule-based algorithms using structured data within the EHR system for each attribute of the three classification criteria. Individual criteria attribute and classification criteria algorithms as a whole were assessed over our combined cohorts and the overall performance of the algorithms was measured through sensitivity and specificity. Results Individual classification criteria attributes had a wide range of sensitivities, 7% (oral ulcers) to 97% (haematological disorders) and specificities, 56% (haematological disorders) to 98% (photosensitivity), but all could be identified in EHR data. In general, algorithms based on laboratory results performed better than those primarily based on diagnosis codes. All three classification criteria systems effectively distinguished members of our case and control cohorts, but the SLICC criteria-based algorithm had the highest overall performance (76% sensitivity, 99% specificity). Conclusions It is possible to characterise disease manifestations in people with lupus using classification criteria-based algorithms that assess structured EHR data. These algorithms may reduce chart review burden and are a foundation for identifying subpopulations of patients with lupus based on disease presentation to support precision medicine applications.
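For readers less familiar with the two metrics reported in this abstract, the following minimal Python sketch shows how sensitivity and specificity are computed when an algorithm's output is compared against an adjudicated case/control cohort. The function name and the toy labels are hypothetical; they do not reproduce the study's cohorts or results.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(predicted: Sequence[bool],
                            actual: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for paired boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    sensitivity = tp / (tp + fn)   # share of true cases the algorithm finds
    specificity = tn / (tn + fp)   # share of controls it correctly rejects
    return sensitivity, specificity


# Toy cohort: 4 adjudicated cases followed by 4 adjudicated controls.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True]

sens, spec = sensitivity_specificity(predicted, actual)
print(sens, spec)
```

The sketch assumes both classes are present in the cohort; an empty case or control group would need explicit handling.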
Affiliation(s)
- Theresa L Walunas
- Division of General Internal Medicine and Geriatrics, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA; Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Anika S Ghosh
- Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Jennifer A Pacheco
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Vesna Mitrovic
- Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Andy Wu
- Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Kathryn L Jackson
- Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Ryan Schusler
- Division of Rheumatology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Anh Chung
- Division of Rheumatology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Daniel Erickson
- Division of Rheumatology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Karen Mancera-Cuevas
- Division of Rheumatology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Abel N Kho
- Division of General Internal Medicine and Geriatrics, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA; Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
| | - Rosalind Ramsey-Goldman
- Division of Rheumatology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
|
15
|
Shang N, Khan A, Polubriaginof F, Zanoni F, Mehl K, Fasel D, Drawz PE, Carrol RJ, Denny JC, Hathcock MA, Arruda-Olson AM, Peissig PL, Dart RA, Brilliant MH, Larson EB, Carrell DS, Pendergrass S, Verma SS, Ritchie MD, Benoit B, Gainer VS, Karlson EW, Gordon AS, Jarvik GP, Stanaway IB, Crosslin DR, Mohan S, Ionita-Laza I, Tatonetti NP, Gharavi AG, Hripcsak G, Weng C, Kiryluk K. Medical records-based chronic kidney disease phenotype for clinical care and "big data" observational and genetic studies. NPJ Digit Med 2021; 4:70. [PMID: 33850243 PMCID: PMC8044136 DOI: 10.1038/s41746-021-00428-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 08/07/2020] [Accepted: 02/25/2021] [Indexed: 12/19/2022] Open
Abstract
Chronic Kidney Disease (CKD) represents a slowly progressive disorder that is typically silent until late stages, but early intervention can significantly delay its progression. We designed a portable and scalable electronic CKD phenotype to facilitate early disease recognition and empower large-scale observational and genetic studies of kidney traits. The algorithm uses a combination of rule-based and machine-learning methods to automatically place patients on the staging grid of albuminuria by glomerular filtration rate ("A-by-G" grid). We manually validated the algorithm by 451 chart reviews across three medical systems, demonstrating overall positive predictive value of 95% for CKD cases and 97% for healthy controls. Independent case-control validation using 2350 patient records demonstrated diagnostic specificity of 97% and sensitivity of 87%. Application of the phenotype to 1.3 million patients demonstrated that over 80% of CKD cases are undetected using ICD codes alone. We also demonstrated several large-scale applications of the phenotype, including identifying stage-specific kidney disease comorbidities, in silico estimation of kidney trait heritability in thousands of pedigrees reconstructed from medical records, and biobank-based multicenter genome-wide and phenome-wide association studies.
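The "A-by-G" grid mentioned in this abstract crosses a glomerular filtration rate category with an albuminuria category. A minimal Python sketch of such a placement is given below. It assumes the standard KDIGO cut-offs (eGFR in mL/min/1.73 m² and urine albumin-to-creatinine ratio in mg/g); the function names are hypothetical, and the sketch does not reproduce the authors' algorithm, which also uses machine-learning components and additional record data.

```python
from typing import Tuple


def gfr_category(egfr: float) -> str:
    """Map eGFR (mL/min/1.73 m^2) to a KDIGO G category (assumed cut-offs)."""
    if egfr >= 90:
        return "G1"
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


def albuminuria_category(uacr_mg_per_g: float) -> str:
    """Map urine albumin-to-creatinine ratio (mg/g) to a KDIGO A category."""
    if uacr_mg_per_g < 30:
        return "A1"
    if uacr_mg_per_g <= 300:
        return "A2"
    return "A3"


def a_by_g(egfr: float, uacr_mg_per_g: float) -> Tuple[str, str]:
    """Return the (G, A) cell of the staging grid for one pair of lab values."""
    return gfr_category(egfr), albuminuria_category(uacr_mg_per_g)


print(a_by_g(52, 120))
```

A real phenotype would also have to decide which laboratory values to use over time (for example, requiring persistence across repeated measurements), which this single-measurement sketch leaves out.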
Affiliation(s)
- Ning Shang
- Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - Atlas Khan
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - Fernanda Polubriaginof
- Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - Francesca Zanoni
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - Karla Mehl
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - David Fasel
- Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - Paul E Drawz
- Department of Medicine, University of Minnesota, Minnesota, MN, USA
| | - Robert J Carrol
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Departments of Medicine, Vanderbilt University, Nashville, TN, USA
| | | | | | | | - Richard A Dart
- Marshfield Clinic Research Institute, Marshfield, WI, USA
| | | | - Eric B Larson
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
| | - David S Carrell
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
| | | | | | | | | | | | | | - Adam S Gordon
- Center for Genetic Medicine, Northwestern University, Chicago, IL, USA
| | - Gail P Jarvik
- Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ian B Stanaway
- Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David R Crosslin
- Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
| | - Sumit Mohan
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - Iuliana Ionita-Laza
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
| | - Nicholas P Tatonetti
- Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - Ali G Gharavi
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - George Hripcsak
- Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - Chunhua Weng
- Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
| | - Krzysztof Kiryluk
- Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA.
|
16
|
Floris-Moore M, Edmonds A, Napravnik S, Adimora AA. Computerized Adjudication of Coronary Heart Disease Events Using the Electronic Medical Record in HIV Clinical Research: Possibilities and Challenges Ahead. AIDS Res Hum Retroviruses 2020; 36:306-313. [PMID: 31407587 DOI: 10.1089/aid.2019.0036] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022] Open
Abstract
This pilot study assessed the feasibility of computer-assisted electronic medical record (EMR) abstraction to ascertain coronary heart disease (CHD) event hospitalizations. We included a sample of 87 hospitalization records from participants at the University of North Carolina (UNC) site of the Women's Interagency HIV Study (WIHS) and UNC Center for AIDS Research (CFAR) HIV Clinical Cohort who were hospitalized within UNC Healthcare System from July 2004 to July 2015. We compared a computer algorithm utilizing diagnosis/procedure codes, medications, and cardiac enzyme levels to adjudicate CHD events [myocardial infarction (MI)/coronary revascularization] from the EMR to standardized manual chart adjudication. Of 87 hospitalizations, 42 were classified as definite, 25 probable, and 20 non-CHD events by manual chart adjudication. A computer algorithm requiring presence of ≥1 CHD-related International Classification of Diseases, 9th Revision (ICD-9)/Current Procedural Terminology (CPT) code correctly identified 24 of 42 definite (57%), 29 of 67 probable/definite CHD (43%), and 95% of non-CHD events; additionally requiring clinically defined cardiac enzyme levels or administration of MI-related medications correctly identified 55%, 42%, and 95% of such events, respectively. Requiring any one of the ICD-9/CPT or cardiac enzyme criteria correctly identified 98% of definite, 97% of probable/definite CHD, and 85% of non-CHD events. Challenges included difficulty matching hospitalization dates, incomplete diagnosis code data, and multiple field names/locations of laboratory/medication data. Computer algorithms comprising only ICD-9/CPT codes failed to identify a sizable proportion of CHD events. Using a less restrictive algorithm yielded fewer missed events but increased the false-positive rate. Despite potential benefits of EMR-based research, there remain several challenges to fully computerized adjudication of CHD events.
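The proportions quoted for the code-only algorithm can be checked directly from the counts given in this abstract. The short Python sketch below does that arithmetic; the helper name `pct` is ours, and only the counts (24 of 42, 29 of 67, and the 42/25/20 split of 87 hospitalizations) come from the text.

```python
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


# Counts from manual chart adjudication, as reported in the abstract.
definite, probable, non_chd = 42, 25, 20
total = definite + probable + non_chd            # all hospitalizations
probable_or_definite = definite + probable       # denominator for the 43% figure

definite_pct = pct(24, definite)                 # code-only rule, definite events
prob_def_pct = pct(29, probable_or_definite)     # code-only rule, probable/definite
missed_definite = definite - 24                  # definite events the rule missed

print(total, probable_or_definite, definite_pct, prob_def_pct, missed_definite)
```

The counts are internally consistent: the three adjudication classes sum to 87, and the code-only rule misses 18 of the 42 definite events, which is the "sizable proportion" the abstract refers to.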
Affiliation(s)
- Michelle Floris-Moore
- Division of Infectious Diseases, Department of Medicine, School of Medicine, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Department of Medicine, Center for AIDS Research, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Andrew Edmonds
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Sonia Napravnik
- Division of Infectious Diseases, Department of Medicine, School of Medicine, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Department of Medicine, Center for AIDS Research, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
| | - Adaora A. Adimora
- Division of Infectious Diseases, Department of Medicine, School of Medicine, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Department of Medicine, Center for AIDS Research, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
|
17
|
Feller DJ, Zucker J, Walk OBD, Yin MT, Gordon P, Elhadad N. Longitudinal analysis of social and behavioral determinants of health in the EHR: exploring the impact of patient trajectories and documentation practices. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2020; 2019:399-407. [PMID: 32308833 PMCID: PMC7153098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Social and behavioral determinants of health (SBDH) are environmental and behavioral factors that impede disease self-management and can exacerbate clinical conditions. While recent research in the informatics community has focused on building systems that can automatically infer SBDH from the patient record, it is unclear how such determinants change over time. This study analyzes the longitudinal characteristics of 4 common SBDH as expressed in the patient record and compares the rates of change among distinct SBDH. In addition, manual review of patient notes was undertaken to establish whether changes in patient SBDH status reflected legitimate changes in patient status or rather potential data quality issues. Our findings suggest that a patient's SBDH status is liable to change over time and that some changes reflect poor social history taking by clinicians.
Affiliation(s)
- Daniel J Feller
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Jason Zucker
- Division of Infectious Diseases, Department of Medicine, Columbia University, New York, NY USA
| | | | - Michael T Yin
- Division of Infectious Diseases, Department of Medicine, Columbia University, New York, NY USA
| | - Peter Gordon
- Division of Infectious Diseases, Department of Medicine, Columbia University, New York, NY USA
| | - Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
|
18
|
High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nat Protoc 2019; 14:3426-3444. [PMID: 31748751 DOI: 10.1038/s41596-019-0227-6] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 10/19/2018] [Accepted: 07/22/2019] [Indexed: 01/12/2023]
Abstract
Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
|
19
|
Chartash D, Paek H, Dziura JD, Ross BK, Nogee DP, Boccio E, Hines C, Schott AM, Jeffery MM, Patel MD, Platts-Mills TF, Ahmed O, Brandt C, Couturier K, Melnick E. Identifying Opioid Use Disorder in the Emergency Department: Multi-System Electronic Health Record-Based Computable Phenotype Derivation and Validation Study. JMIR Med Inform 2019; 7:e15794. [PMID: 31674913 PMCID: PMC6913746 DOI: 10.2196/15794] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/08/2019] [Revised: 09/27/2019] [Accepted: 10/01/2019] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Deploying accurate computable phenotypes in pragmatic trials requires a trade-off between precise and clinically sensical variable selection. In particular, evaluating the medical encounter to assess a pattern leading to clinically significant impairment or distress indicative of disease is a difficult modeling challenge for the emergency department. OBJECTIVE This study aimed to derive and validate an electronic health record-based computable phenotype to identify emergency department patients with opioid use disorder using physician chart review as a reference standard. METHODS A two-algorithm computable phenotype was developed and evaluated using structured clinical data across 13 emergency departments in two large health care systems. Algorithm 1 combined clinician and billing codes. Algorithm 2 used chief complaint structured data suggestive of opioid use disorder. To evaluate the algorithms in both internal and external validation phases, two emergency medicine physicians, with a third acting as adjudicator, reviewed a pragmatic sample of 231 charts: 125 internal validation (75 positive and 50 negative), 106 external validation (56 positive and 50 negative). RESULTS Cohen kappa, measuring agreement between reviewers, for the internal and external validation cohorts was 0.95 and 0.93, respectively. In the internal validation phase, Algorithm 1 had a positive predictive value (PPV) of 0.96 (95% CI 0.863-0.995) and a negative predictive value (NPV) of 0.98 (95% CI 0.893-0.999), and Algorithm 2 had a PPV of 0.8 (95% CI 0.593-0.932) and an NPV of 1.0 (one-sided 97.5% CI 0.863-1). In the external validation phase, the phenotype had a PPV of 0.95 (95% CI 0.851-0.989) and an NPV of 0.92 (95% CI 0.807-0.978). CONCLUSIONS This phenotype detected emergency department patients with opioid use disorder with high predictive values and reliability. 
Its algorithms were transportable across health care systems and have potential value for both clinical and research purposes.
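The predictive values and confidence intervals reported above can be illustrated with a minimal sketch. The abstract does not state the underlying counts or the interval method, so both are assumptions here: the counts 48 of 50 are chosen only because they are consistent with a PPV of 0.96, and the exact (Clopper-Pearson) binomial interval is one common choice that should land close to the reported bounds. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target: float, iters: int = 60) -> float:
    """Bisection on [0, 1] for a function f that decreases in p: find f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for a binomial proportion x / n."""
    # Lower bound solves P(X >= x | p) = alpha / 2, i.e. cdf(x - 1) = 1 - alpha / 2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2
    )
    # Upper bound solves P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2
    )
    return lower, upper


# Assumed counts (not given in the abstract): 48 true positives among 50 flagged charts.
true_positives, flagged = 48, 50
ppv = true_positives / flagged
ci_low, ci_high = clopper_pearson(true_positives, flagged)
print(f"PPV = {ppv:.2f}, 95% CI {ci_low:.3f}-{ci_high:.3f}")
```

When every reviewed chart agrees with the algorithm (x equal to n), the upper bound is 1 and the lower bound is the same as a one-sided 97.5% limit, which is how a result such as an NPV of 1.0 is usually reported.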
Affiliation(s)
- David Chartash
- Yale Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, United States
| | - Hyung Paek
- Information Technology Services, Yale New Haven Health, New Haven, CT, United States
| | - James D Dziura
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
| | - Bill K Ross
- North Carolina Translational and Clinical Sciences Institute, University of North Carolina School of Medicine, Chapel Hill, NC, United States
| | - Daniel P Nogee
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
| | - Eric Boccio
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
| | - Cory Hines
- Department of Emergency Medicine, University of North Carolina School of Medicine, Chapel Hill, NC, United States
| | - Aaron M Schott
- Department of Emergency Medicine, University of North Carolina School of Medicine, Chapel Hill, NC, United States
| | - Molly M Jeffery
- Department of Emergency Medicine, Mayo Clinic, Rochester, MN, United States
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Mehul D Patel
- Department of Emergency Medicine, University of North Carolina School of Medicine, Chapel Hill, NC, United States
| | - Timothy F Platts-Mills
- Department of Emergency Medicine, University of North Carolina School of Medicine, Chapel Hill, NC, United States
| | - Osama Ahmed
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
| | - Cynthia Brandt
- Yale Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, United States
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
| | - Katherine Couturier
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
| | - Edward Melnick
- Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
|
20
|
Making work visible for electronic phenotype implementation: Lessons learned from the eMERGE network. J Biomed Inform 2019; 99:103293. [PMID: 31542521 DOI: 10.1016/j.jbi.2019.103293] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/14/2019] [Revised: 08/26/2019] [Accepted: 09/19/2019] [Indexed: 11/21/2022]
Abstract
BACKGROUND Implementation of phenotype algorithms requires phenotype engineers to interpret human-readable algorithms and translate the description (text and flowcharts) into computable phenotypes - a process that can be labor intensive and error prone. To address the critical need for reducing the implementation efforts, it is important to develop portable algorithms. METHODS We conducted a retrospective analysis of phenotype algorithms developed in the Electronic Medical Records and Genomics (eMERGE) network and identified common customization tasks required for implementation. A novel scoring system was developed to quantify portability from three aspects: Knowledge conversion, clause Interpretation, and Programming (KIP). Tasks were grouped into twenty representative categories. Experienced phenotype engineers were asked to estimate the average time spent on each category and evaluate time saving enabled by a common data model (CDM), specifically the Observational Medical Outcomes Partnership (OMOP) model, for each category. RESULTS A total of 485 distinct clauses (phenotype criteria) were identified from 55 phenotype algorithms, corresponding to 1153 customization tasks. In addition to 25 non-phenotype-specific tasks, 46 tasks are related to interpretation, 613 tasks are related to knowledge conversion, and 469 tasks are related to programming. A score between 0 and 2 (0 for easy, 1 for moderate, and 2 for difficult portability) is assigned for each aspect, yielding a total KIP score range of 0 to 6. The average clause-wise KIP score to reflect portability is 1.37 ± 1.38. Specifically, the average knowledge (K) score is 0.64 ± 0.66, interpretation (I) score is 0.33 ± 0.55, and programming (P) score is 0.40 ± 0.64. 5% of the categories can be completed within one hour (median). 70% of the categories take from days to months to complete. The OMOP model can assist with vocabulary mapping tasks. 
CONCLUSION This study presents firsthand knowledge of the substantial implementation efforts in phenotyping and introduces a novel metric (KIP) to measure portability of phenotype algorithms for quantifying such efforts across the eMERGE Network. Phenotype developers are encouraged to analyze and optimize the portability in regards to knowledge, interpretation and programming. CDMs can be used to improve the portability for some 'knowledge-oriented' tasks.
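The KIP scoring described above (each aspect scored 0, 1, or 2, summed to a clause-level total of 0 to 6, then averaged across clauses) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class name, field names, and the three example clauses are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

VALID_SCORES = (0, 1, 2)  # 0 = easy, 1 = moderate, 2 = difficult portability


@dataclass(frozen=True)
class ClauseScore:
    """Portability scores for one phenotype clause (hypothetical structure)."""

    knowledge: int
    interpretation: int
    programming: int

    def __post_init__(self) -> None:
        for name in ("knowledge", "interpretation", "programming"):
            value = getattr(self, name)
            if value not in VALID_SCORES:
                raise ValueError(f"{name} score must be 0, 1, or 2, got {value!r}")

    @property
    def total(self) -> int:
        """Clause-wise KIP score, ranging from 0 to 6."""
        return self.knowledge + self.interpretation + self.programming


# Made-up clauses for illustration only.
clauses = [
    ClauseScore(knowledge=1, interpretation=0, programming=0),
    ClauseScore(knowledge=2, interpretation=1, programming=2),
    ClauseScore(knowledge=0, interpretation=0, programming=1),
]

mean_kip = mean(c.total for c in clauses)
mean_k = mean(c.knowledge for c in clauses)
print(f"mean KIP = {mean_kip:.2f}, mean K = {mean_k:.2f}")
```

Because the total is a plain sum, the mean clause-wise KIP score equals the sum of the three per-aspect means, which matches the reported figures (0.64 + 0.33 + 0.40 = 1.37).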
|
21
|
Movaghar A, Page D, Brilliant M, Baker MW, Greenberg J, Hong J, DaWalt LS, Saha K, Kuusisto F, Stewart R, Berry-Kravis E, Mailick MR. Data-driven phenotype discovery of FMR1 premutation carriers in a population-based sample. Sci Adv 2019; 5:eaaw7195. [PMID: 31457090 PMCID: PMC6703870 DOI: 10.1126/sciadv.aaw7195] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/18/2019] [Accepted: 07/15/2019] [Indexed: 05/18/2023]
Abstract
The impact of the FMR1 premutation on human health is the subject of considerable controversy. A fundamental unanswered question is whether carrying the premutation allele is directly correlated with clinical phenotypes. A challenging problem in past genotype-phenotype studies of the FMR1 premutation is ascertainment bias, which could lead to invalid research conclusions and negatively affect clinical practice. Here, we created the first population-based FMR1-informed biobank to find the pattern of health characteristics in premutation carriers. Our extensive phenotyping shows that premutation carriers experience a clinical profile that is significantly different from controls and is evident throughout adulthood. Comprehensive understanding of the clinical risk associated with this genetic variant is critical for premutation carriers, their families, and clinicians and has important implications for public health.
Affiliation(s)
- Arezoo Movaghar
- Waisman Center, University of Wisconsin–Madison, Madison, WI, USA
- Department of Biomedical Engineering, University of Wisconsin–Madison, Madison, WI, USA
| | - David Page
- Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI, USA
| | | | | | - Jan Greenberg
- Waisman Center, University of Wisconsin–Madison, Madison, WI, USA
| | - Jinkuk Hong
- Waisman Center, University of Wisconsin–Madison, Madison, WI, USA
| | | | - Krishanu Saha
- Waisman Center, University of Wisconsin–Madison, Madison, WI, USA
- Department of Biomedical Engineering, University of Wisconsin–Madison, Madison, WI, USA
| | | | - Ron Stewart
- Morgridge Institute for Research, Madison, WI, USA
| | | | - Marsha R. Mailick
- Waisman Center, University of Wisconsin–Madison, Madison, WI, USA
- Corresponding author.
|
22
|
Hripcsak G, Shang N, Peissig PL, Rasmussen LV, Liu C, Benoit B, Carroll RJ, Carrell DS, Denny JC, Dikilitas O, Gainer VS, Howell KM, Klann JG, Kullo IJ, Lingren T, Mentch FD, Murphy SN, Natarajan K, Pacheco JA, Wei WQ, Wiley K, Weng C. Facilitating phenotype transfer using a common data model. J Biomed Inform 2019; 96:103253. [PMID: 31325501 DOI: 10.1016/j.jbi.2019.103253] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 03/26/2019] [Revised: 07/11/2019] [Accepted: 07/16/2019] [Indexed: 11/16/2022]
Abstract
BACKGROUND Implementing clinical phenotypes across a network is labor intensive and potentially error prone. Use of a common data model may facilitate the process. METHODS Electronic Medical Records and Genomics (eMERGE) sites implemented the Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) Common Data Model across their electronic health record (EHR)-linked DNA biobanks. Two previously implemented eMERGE phenotypes were converted to OMOP and implemented across the network. RESULTS It was feasible to implement the common data model across sites, with laboratory data producing the greatest challenge due to local encoding. Sites were then able to execute the OMOP phenotype in less than one day, as opposed to weeks of effort to manually implement an eMERGE phenotype in their bespoke research EHR databases. Of the sites that could compare the current OMOP phenotype implementation with the original eMERGE phenotype implementation, specific agreement ranged from 100% to 43%, with disagreements due to the original phenotype, the OMOP phenotype, changes in data, and issues in the databases. Using the OMOP query as a standard comparison revealed differences in the original implementations despite starting from the same definitions, code lists, flowcharts, and pseudocode. CONCLUSION Using a common data model can dramatically speed phenotype implementation at the cost of having to populate that data model, though this will produce a net benefit as the number of phenotype implementations increases. Inconsistencies among the implementations of the original queries point to a potential benefit of using a common data model so that actual phenotype code and logic can be shared, mitigating human error in reinterpretation of a narrative phenotype definition.
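The comparison of two implementations of the same phenotype can be sketched with a small agreement calculation. The abstract does not define "specific agreement", so the formula below is an assumption: positive specific agreement, 2a / (2a + b + c), where a is the number of patients selected by both implementations and b and c are the numbers selected by only one. The function name and patient identifiers are hypothetical.

```python
def positive_specific_agreement(original: set, omop: set) -> float:
    """Positive specific agreement between two sets of selected patient IDs.

    Assumed definition: 2a / (2a + b + c), with a = selected by both,
    b = selected only by the original implementation, c = only by the OMOP one.
    """
    both = len(original & omop)
    only_original = len(original - omop)
    only_omop = len(omop - original)
    denominator = 2 * both + only_original + only_omop
    if denominator == 0:
        raise ValueError("neither implementation selected any patients")
    return 2 * both / denominator


# Made-up patient IDs for illustration only.
original_cases = {"p1", "p2", "p3", "p4"}
omop_cases = {"p3", "p4", "p5"}
agreement = positive_specific_agreement(original_cases, omop_cases)
print(f"specific agreement = {agreement:.0%}")
```

Patients selected by neither implementation do not enter the formula, which keeps the measure informative when the phenotype is rare in the source population.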
Affiliation(s)
- George Hripcsak
- Department of Biomedical Informatics, Columbia University, New York, NY, United States; Medical Informatics Services, NewYork-Presbyterian Hospital, New York, NY, United States.
| | - Ning Shang
- Department of Biomedical Informatics, Columbia University, New York, NY, United States
| | - Peggy L Peissig
- Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, WI, United States
| | - Luke V Rasmussen
- Northwestern University Feinberg School of Medicine, Chicago, IL, United States
| | - Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY, United States
| | - Barbara Benoit
- Research Information Science and Computing, Partners Healthcare, Boston, MA, United States
| | - Robert J Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
| | - David S Carrell
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, United States
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Ozan Dikilitas
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, United States
| | - Vivian S Gainer
- Research Information Science and Computing, Partners Healthcare, Boston, MA, United States
| | - Kayla Marie Howell
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Jeffrey G Klann
- Research Information Science and Computing, Partners Healthcare, Boston, MA, United States
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, United States
| | - Todd Lingren
- Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
| | - Frank D Mentch
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, United States
| | - Shawn N Murphy
- Research Information Science and Computing, Partners Healthcare, Boston, MA, United States
| | - Karthik Natarajan
- Department of Biomedical Informatics, Columbia University, New York, NY, United States; Medical Informatics Services, NewYork-Presbyterian Hospital, New York, NY, United States
| | - Jennifer A Pacheco
- Northwestern University Feinberg School of Medicine, Chicago, IL, United States
| | - Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Ken Wiley
- National Human Genome Research Institute, NIH, Bethesda, MD, United States
| | - Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, United States
|
23
|
Langlotz CP, Allen B, Erickson BJ, Kalpathy-Cramer J, Bigelow K, Cook TS, Flanders AE, Lungren MP, Mendelson DS, Rudie JD, Wang G, Kandarpa K. A Roadmap for Foundational Research on Artificial Intelligence in Medical Imaging: From the 2018 NIH/RSNA/ACR/The Academy Workshop. Radiology 2019; 291:781-791. [PMID: 30990384 PMCID: PMC6542624 DOI: 10.1148/radiol.2019190613] [Citation(s) in RCA: 175] [Impact Index Per Article: 35.0] [Received: 03/17/2019] [Revised: 03/24/2019] [Accepted: 03/25/2019] [Indexed: 01/08/2023]
Abstract
Imaging research laboratories are rapidly creating machine learning systems that achieve expert human performance using open-source methods and tools. These artificial intelligence systems are being developed to improve medical image reconstruction, noise reduction, quality assurance, triage, segmentation, computer-aided detection, computer-aided classification, and radiogenomics. In August 2018, a meeting was held in Bethesda, Maryland, at the National Institutes of Health to discuss the current state of the art and knowledge gaps and to develop a roadmap for future research initiatives. Key research priorities include: 1, new image reconstruction methods that efficiently produce images suitable for human interpretation from source data; 2, automated image labeling and annotation methods, including information extraction from the imaging report, electronic phenotyping, and prospective structured image reporting; 3, new machine learning methods for clinical imaging data, such as tailored, pretrained model architectures, and federated machine learning methods; 4, machine learning methods that can explain the advice they provide to human users (so-called explainable artificial intelligence); and 5, validated methods for image de-identification and data sharing to facilitate wide availability of clinical imaging data sets. This research roadmap is intended to identify and prioritize these needs for academic research laboratories, funding agencies, professional societies, and industry.
Affiliation(s)
- Curtis P. Langlotz
- From the Department of Radiology, Stanford University, Stanford, CA 94305 (C.P.L., M.P.L.); Department of Radiology, Grandview Medical Center, Birmingham, Ala (B.A.); Department of Radiology, Mayo Clinic, Rochester, Minn (B.J.E.); Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Mass (J.K.C.); GE Healthcare, Chicago, Ill (K.B.); Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia, Pa (T.S.C., J.D.R.); Department of Radiology, Thomas Jefferson University Hospital, Philadelphia, Pa (A.E.F.); Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY (D.S.M.); Biomedical Imaging Center, Rensselaer Polytechnic Institute, Troy, NY (G.W.); and National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Washington, DC (K.K.)
| | - Bibb Allen
| | - Bradley J. Erickson
| | - Jayashree Kalpathy-Cramer
| | - Keith Bigelow
| | - Tessa S. Cook
| | - Adam E. Flanders
| | - Matthew P. Lungren
| | - David S. Mendelson
| | - Jeffrey D. Rudie
| | - Ge Wang
| | - Krishna Kandarpa
|
24
|
Wagholikar KB, Ainsworth L, Vernekar VP, Pathak A, Glynn C, Zelle D, Zagade A, Karipineni N, Herrick CD, McPartlin M, Bui TV, Mendis M, Klann J, Oates M, Gordon W, Cannon C, Patel R, Aronson SJ, MacRae CA, Scirica BM, Murphy SN. Extending i2b2 into a framework for semantic abstraction of EHR to facilitate rapid development and portability of Health IT applications. AMIA Jt Summits Transl Sci Proc 2019; 2019:370-378. [PMID: 31258990 PMCID: PMC6568124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The wide gap between a care provider's conceptualization of the electronic health record (EHR) and the structures for EHR data storage and transmission presents a multitude of obstacles for the development of innovative Health IT applications. While developers model the EHR view of the clinicians at one end, they work with a different data view to construct health IT applications. Although there has been considerable progress in bridging this gap through the evolution of developer-friendly standards and tools for terminology mapping and data warehousing, there is a need for a simplified framework to facilitate development of interoperable applications. To this end, we propose a framework for creating a layer of semantic abstraction on the EHR and describe preliminary work on the implementation of this framework for management of hyperlipidemia and hypertension. Our goal is to facilitate the rapid development and portability of Health IT applications.
Affiliation(s)
- Kavishwar B Wagholikar
- Harvard Medical School, Boston, MA
- Massachusetts General Hospital, Boston, MA
- Partners Healthcare Boston, MA
| | | | | | | | | | | | | | - Neelima Karipineni
- Harvard Medical School, Boston, MA
- Brigham and Women's Hospital, Boston, MA
| | | | - Marian McPartlin
- Harvard Medical School, Boston, MA
- Brigham and Women's Hospital, Boston, MA
- Massachusetts General Hospital, Boston, MA
- Persistent Systems, Pune, India
- Partners Healthcare Boston, MA
| | - Tiffany V Bui
- Harvard Medical School, Boston, MA
- Brigham and Women's Hospital, Boston, MA
- Massachusetts General Hospital, Boston, MA
- Persistent Systems, Pune, India
- Partners Healthcare Boston, MA
| | | | - Jeffery Klann
- Harvard Medical School, Boston, MA
- Massachusetts General Hospital, Boston, MA
- Partners Healthcare Boston, MA
| | | | | | - Christopher Cannon
- Harvard Medical School, Boston, MA
- Brigham and Women's Hospital, Boston, MA
- Massachusetts General Hospital, Boston, MA
- Persistent Systems, Pune, India
- Partners Healthcare Boston, MA
| | | | | | - Calum A MacRae
- Harvard Medical School, Boston, MA
- Brigham and Women's Hospital, Boston, MA
| | - Benjamin M Scirica
- Harvard Medical School, Boston, MA
- Brigham and Women's Hospital, Boston, MA
| | - Shawn N Murphy
- Harvard Medical School, Boston, MA
- Massachusetts General Hospital, Boston, MA
| |
Collapse
|
25
|
Jeong E, Park N, Choi Y, Park RW, Yoon D. Machine learning model combining features from algorithms with different analytical methodologies to detect laboratory-event-related adverse drug reaction signals. PLoS One 2018; 13:e0207749. [PMID: 30462745 PMCID: PMC6248973 DOI: 10.1371/journal.pone.0207749] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 07/14/2018] [Accepted: 11/06/2018] [Indexed: 11/25/2022]
Abstract
Background The importance of identifying and evaluating adverse drug reactions (ADRs) has been widely recognized. Many studies have developed algorithms for ADR signal detection using electronic health record (EHR) data. In this study, we propose a machine learning (ML) model that enables accurate ADR signal detection by integrating features from existing algorithms based on inpatient EHR laboratory results. Materials and methods To construct an ADR reference dataset, we extracted known drug–laboratory event pairs represented by a laboratory test from the EU-SPC and SIDER databases. All possible drug–laboratory event pairs, except known ones, are considered unknown. To detect a known drug–laboratory event pair, three existing algorithms—CERT, CLEAR, and PACE—were applied to 21-year inpatient EHR data. We also constructed ML models (based on random forest, L1 regularized logistic regression, support vector machine, and a neural network) that use the intermediate products of the CERT, CLEAR, and PACE algorithms as inputs and determine whether a drug–laboratory event pair is associated. For performance comparison, we evaluated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-measure, and area under receiver operating characteristic (AUROC). Results All measures of ML models outperformed those of existing algorithms with sensitivity of 0.593–0.793, specificity of 0.619–0.796, NPV of 0.645–0.727, PPV of 0.680–0.777, F1-measure of 0.629–0.709, and AUROC of 0.737–0.816. Features related to change or distribution of shape were considered important for detecting ADR signals. Conclusions Improved performance of ML models indicated that applying our model to EHR data is feasible and promising for detecting more accurate and comprehensive ADR signals.
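The performance measures named in this abstract (sensitivity, specificity, PPV, NPV, and F1-measure) all derive from one confusion matrix. The following is a minimal illustrative sketch, not the authors' code: the `Confusion` class, the `classification_metrics` function, and the example counts are hypothetical and are not taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for confusion-matrix counts."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    # Return NaN instead of raising when a measure is undefined.
    return numerator / denominator if denominator else math.nan


def classification_metrics(c: Confusion) -> dict:
    """Compute the threshold-based measures listed in the abstract."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        # F1 is the harmonic mean of PPV and sensitivity.
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


# Illustrative counts only (assumed, not from the paper).
example = classification_metrics(Confusion(tp=8, fp=2, tn=6, fn=4))
```

AUROC is omitted here because it requires ranked scores rather than a single set of counts.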
Affiliation(s)
- Eugene Jeong
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
- Namgi Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
  - College of Pharmacy, Ewha Womans University, Seoul, Republic of Korea
- Young Choi
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
- Rae Woong Park
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
- Dukyong Yoon
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
26
Pacheco JA, Rasmussen LV, Kiefer RC, Campion TR, Speltz P, Carroll RJ, Stallings SC, Mo H, Ahuja M, Jiang G, LaRose ER, Peissig PL, Shang N, Benoit B, Gainer VS, Borthwick K, Jackson KL, Sharma A, Wu AY, Kho AN, Roden DM, Pathak J, Denny JC, Thompson WK. A case study evaluating the portability of an executable computable phenotype algorithm across multiple institutions and electronic health record environments. J Am Med Inform Assoc 2018; 25:1540-1546. [PMID: 30124903 PMCID: PMC6213083 DOI: 10.1093/jamia/ocy101] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 05/04/2018] [Revised: 06/13/2018] [Accepted: 07/10/2018] [Indexed: 12/12/2022]
Abstract
Electronic health record (EHR) algorithms for defining patient cohorts are commonly shared as free-text descriptions that require human intervention both to interpret and implement. We developed the Phenotype Execution and Modeling Architecture (PhEMA, http://projectphema.org) to author and execute standardized computable phenotype algorithms. With PhEMA, we converted an algorithm for benign prostatic hyperplasia, developed for the electronic Medical Records and Genomics network (eMERGE), into a standards-based computable format. Eight sites (7 within eMERGE) received the computable algorithm, and 6 successfully executed it against local data warehouses and/or i2b2 instances. Blinded random chart review of cases selected by the computable algorithm shows PPV ≥90%, and 3 out of 5 sites had >90% overlap of selected cases when comparing the computable algorithm to their original eMERGE implementation. This case study demonstrates potential use of PhEMA computable representations to automate phenotyping across different EHR systems, but also highlights some ongoing challenges.
Affiliation(s)
- Jennifer A Pacheco
  - Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Luke V Rasmussen
  - Division of Health and Biomedical Informatics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Richard C Kiefer
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Thomas R Campion
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Peter Speltz
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Robert J Carroll
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Sarah C Stallings
  - Meharry-Vanderbilt Alliance, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Huan Mo
  - Department of Pathology, Loma Linda University Health, Loma Linda, California, USA
- Monika Ahuja
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Guoqian Jiang
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Eric R LaRose
  - Department of Biomedical Informatics, Marshfield Clinic Research Institute, Marshfield, Wisconsin, USA
- Peggy L Peissig
  - Department of Biomedical Informatics, Marshfield Clinic Research Institute, Marshfield, Wisconsin, USA
- Ning Shang
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Barbara Benoit
  - Research IS and Computing, Partners HealthCare, Harvard University, Somerville, Massachusetts, USA
- Vivian S Gainer
  - Research IS and Computing, Partners HealthCare, Harvard University, Somerville, Massachusetts, USA
- Kenneth Borthwick
  - Henry Hood Center for Health Research, Geisinger, Danville, Pennsylvania, USA
- Kathryn L Jackson
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Ambrish Sharma
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Andy Yizhou Wu
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Abel N Kho
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Dan M Roden
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jyotishman Pathak
  - Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
- Joshua C Denny
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- William K Thompson
  - Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
27
Hripcsak G, Albers DJ. High-fidelity phenotyping: richness and freedom from bias. J Am Med Inform Assoc 2018; 25:289-294. [PMID: 29040596 PMCID: PMC7282504 DOI: 10.1093/jamia/ocx110] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 06/16/2017] [Revised: 08/07/2017] [Accepted: 09/06/2017] [Indexed: 01/14/2023]
Abstract
Electronic health record phenotyping is the use of raw electronic health record data to assert characterizations about patients. Researchers have been doing it since the beginning of biomedical informatics, under different names. Phenotyping will benefit from an increasing focus on fidelity, both in the sense of increasing richness, such as measured levels, degree or severity, timing, probability, or conceptual relationships, and in the sense of reducing bias. Research agendas should shift from merely improving binary assignment to studying and improving richer representations. The field is actively researching new temporal directions and abstract representations, including deep learning. The field would benefit from research in nonlinear dynamics, in combining mechanistic models with empirical data, including data assimilation, and in topology. The health care process produces substantial bias, and studying that bias explicitly rather than treating it as merely another source of noise would facilitate addressing it.
Affiliation(s)
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
- David J Albers
  - Department of Biomedical Informatics, Columbia University Medical Center, New York, NY, USA
28
Glicksberg BS, Miotto R, Johnson KW, Shameer K, Li L, Chen R, Dudley JT. Automated disease cohort selection using word embeddings from Electronic Health Records. Pacific Symposium on Biocomputing 2018; 23:145-156. [PMID: 29218877 PMCID: PMC5788312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Accurate and robust cohort definition is critical to biomedical discovery using Electronic Health Records (EHR). Similar to prospective study designs, high quality EHR-based research requires rigorous selection criteria to designate case/control status particular to each disease. Electronic phenotyping algorithms, which are manually built and validated per disease, have been successful in filling this need. However, these approaches are time-consuming, leading to only a relatively small amount of algorithms for diseases developed. Methodologies that automatically learn features from EHRs have been used for cohort selection as well. To date, however, there has been no systematic analysis of how these methods perform against current gold standards. Accordingly, this paper compares the performance of a state-of-the-art automated feature learning method to extracting research-grade cohorts for five diseases against their established electronic phenotyping algorithms. In particular, we use word2vec to create unsupervised embeddings of the phenotype space within an EHR system. Using medical concepts as a query, we then rank patients by their proximity in the embedding space and automatically extract putative disease cohorts via a distance threshold. Experimental evaluation shows promising results with average F-score of 0.57 and AUC-ROC of 0.98. However, we noticed that results varied considerably between diseases, thus necessitating further investigation and/or phenotype-specific refinement of the approach before being readily deployed across all diseases.
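The cohort-extraction step described in this abstract (rank patients by proximity to a query in an embedding space, then keep those within a distance threshold) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: cosine distance is an assumed proximity measure, and `select_cohort`, the patient identifiers, and the toy 2-dimensional vectors are hypothetical. A real system would first train the embeddings (the abstract names word2vec).

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def select_cohort(query_vec, patient_vecs, max_distance):
    """Rank patients by distance to the query and keep those within the threshold.

    patient_vecs maps a patient identifier to that patient's embedding vector.
    Returns (patient_id, distance) pairs, nearest first.
    """
    scored = [(pid, cosine_distance(query_vec, vec))
              for pid, vec in patient_vecs.items()]
    scored.sort(key=lambda pair: pair[1])
    return [(pid, dist) for pid, dist in scored if dist <= max_distance]


# Toy vectors for illustration only.
patients = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
cohort = select_cohort([1.0, 0.0], patients, max_distance=0.5)
```

The threshold trades cohort size against purity, which is consistent with the abstract's observation that results varied considerably between diseases.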
Affiliation(s)
- Benjamin S Glicksberg
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York, NY 10065, USA
  - Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl. New York, NY 10065, USA
29
Esteban S, Rodríguez Tablado M, Peper FE, Mahumud YS, Ricci RI, Kopitowski KS, Terrasa SA. Development and validation of various phenotyping algorithms for Diabetes Mellitus using data from electronic health records. Comput Methods Programs Biomed 2017; 152:53-70. [PMID: 29054261 DOI: 10.1016/j.cmpb.2017.09.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/30/2017] [Revised: 08/19/2017] [Accepted: 09/13/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND AND OBJECTIVE Recent progression towards precision medicine has encouraged the use of electronic health records (EHRs) as a source for large amounts of data, which is required for studying the effect of treatments or risk factors in more specific subpopulations. Phenotyping algorithms allow to automatically classify patients according to their particular electronic phenotype thus facilitating the setup of retrospective cohorts. Our objective is to compare the performance of different classification strategies (only using standardized problems, rule-based algorithms, statistical learning algorithms (six learners) and stacked generalization (five versions)), for the categorization of patients according to their diabetic status (diabetics, not diabetics and inconclusive; Diabetes of any type) using information extracted from EHRs. METHODS Patient information was extracted from the EHR at Hospital Italiano de Buenos Aires, Buenos Aires, Argentina. For the derivation and validation datasets, two probabilistic samples of patients from different years (2005: n = 1663; 2015: n = 800) were extracted. The only inclusion criterion was age (≥40 & <80 years). Four researchers manually reviewed all records and classified patients according to their diabetic status (diabetic: diabetes registered as a health problem or fulfilling the ADA criteria; non-diabetic: not fulfilling the ADA criteria and having at least one fasting glycemia below 126 mg/dL; inconclusive: no data regarding their diabetic status or only one abnormal value). The best performing algorithms within each strategy were tested on the validation set. RESULTS The standardized codes algorithm achieved a Kappa coefficient value of 0.59 (95% CI 0.49, 0.59) in the validation set. The Boolean logic algorithm reached 0.82 (95% CI 0.76, 0.88). A slightly higher value was achieved by the Feedforward Neural Network (0.9, 95% CI 0.85, 0.94). 
The best performing learner was the stacked generalization meta-learner that reached a Kappa coefficient value of 0.95 (95% CI 0.91, 0.98). CONCLUSIONS The stacked generalization strategy and the feedforward neural network showed the best classification metrics in the validation set. The implementation of these algorithms enables the exploitation of the data of thousands of patients accurately.
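The Kappa coefficient reported in this abstract measures agreement between an algorithm's labels and the manual review beyond what chance alone would produce. The following is a minimal sketch of unweighted Cohen's kappa; the function name and the example labels are hypothetical, and the study's own computation (including its confidence intervals) is not reproduced here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the two labelings match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each labeling's marginals.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelings use a single identical label
    return (observed - expected) / (1.0 - expected)


# Illustrative labels only (assumed, not from the study).
manual = ["diabetic", "diabetic", "non-diabetic", "non-diabetic"]
algorithm = ["diabetic", "non-diabetic", "non-diabetic", "non-diabetic"]
kappa = cohens_kappa(manual, algorithm)
```

The same function handles the three-category setting described in the abstract (diabetic, non-diabetic, inconclusive), because it sums over whatever labels appear.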
Affiliation(s)
- Santiago Esteban
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
  - Research Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Francisco E Peper
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Yamila S Mahumud
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Ricardo I Ricci
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Karin S Kopitowski
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
  - Research Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
- Sergio A Terrasa
  - Family and Community Division, Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
  - Public Health Department, Instituto Universitario Hospital Italiano de Buenos Aires, Buenos Aires, Argentina
30
Martin-Sanchez FJ, Aguiar-Pulido V, Lopez-Campos GH, Peek N, Sacchi L. Secondary Use and Analysis of Big Data Collected for Patient Care. Yearb Med Inform 2017; 26:28-37. [PMID: 28480474 PMCID: PMC6239231 DOI: 10.15265/iy-2017-008] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 01/29/2023]
Abstract
Objectives: To identify common methodological challenges and review relevant initiatives related to the re-use of patient data collected in routine clinical care, as well as to analyze the economic benefits derived from the secondary use of this data. Through the use of several examples, this article aims to provide a glimpse into the different areas of application, namely clinical research, genomic research, study of environmental factors, and population and health services research. This paper describes some of the informatics methods and Big Data resources developed in this context, such as electronic phenotyping, clinical research networks, biorepositories, screening data banks, and wide association studies. Lastly, some of the potential limitations of these approaches are discussed, focusing on confounding factors and data quality. Methods: A series of literature searches in main bibliographic databases have been conducted in order to assess the extent to which existing patient data has been repurposed for research. This contribution from the IMIA working group on "Data mining and Big Data analytics" focuses on the literature published during the last two years, covering the timeframe since the working group's last survey. Results and Conclusions: Although most of the examples of secondary use of patient data lie in the arena of clinical and health services research, we have started to witness other important applications, particularly in the area of genomic research and the study of health effects of environmental factors. Further research is needed to characterize the economic impact of secondary use across the broad spectrum of translational research.
Affiliation(s)
- F. J. Martin-Sanchez
  - Weill Cornell Medicine, Department of Healthcare Policy and Research, Division of Health Informatics, New York, USA
- V. Aguiar-Pulido
  - Weill Cornell Medicine, Brain and Mind Research Institute, New York, USA
- G. H. Lopez-Campos
  - The University of Melbourne, Health & Biomedical Informatics Centre, Melbourne, Australia
- N. Peek
  - MRC Health e-Research Centre, Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, UK
- L. Sacchi
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
31
Small AM, Kiss DH, Zlatsin Y, Birtwell DL, Williams H, Guerraty MA, Han Y, Anwaruddin S, Holmes JH, Chirinos JA, Wilensky RL, Giri J, Rader DJ. Text mining applied to electronic cardiovascular procedure reports to identify patients with trileaflet aortic stenosis and coronary artery disease. J Biomed Inform 2017. [PMID: 28624641 DOI: 10.1016/j.jbi.2017.06.016] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 01/15/2023]
Abstract
BACKGROUND Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest. METHODS We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes. RESULTS Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. 4346 patients with AS and 6024 patients with CAD were identified by both approaches. A randomly selected sample of 200-250 patients uniquely identified by text mining was compared with 200-250 patients uniquely identified by billing codes for both diseases. We demonstrate that text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 for CAD. 
CONCLUSION These results highlight the superiority of text mining algorithms applied to electronic cardiovascular procedure reports in the identification of phenotypes of interest for cardiovascular research.
Affiliation(s)
- Aeron M Small
  - Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
- Daniel H Kiss
  - Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
- Yevgeny Zlatsin
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- David L Birtwell
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Heather Williams
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Marie A Guerraty
  - Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
- Yuchi Han
  - Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
- Saif Anwaruddin
  - Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
- John H Holmes
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Julio A Chirinos
  - Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
- Robert L Wilensky
  - Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
- Jay Giri
  - Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
- Daniel J Rader
  - Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
  - Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, PA, USA
32
Kirby JC, Speltz P, Rasmussen LV, Basford M, Gottesman O, Peissig PL, Pacheco JA, Tromp G, Pathak J, Carrell DS, Ellis SB, Lingren T, Thompson WK, Savova G, Haines J, Roden DM, Harris PA, Denny JC. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J Am Med Inform Assoc 2016; 23:1046-1052. [PMID: 27026615 PMCID: PMC5070514 DOI: 10.1093/jamia/ocv202] [Citation(s) in RCA: 213] [Impact Index Per Article: 26.6] [Received: 08/19/2015] [Revised: 10/27/2015] [Accepted: 11/25/2015] [Indexed: 01/29/2023]
Abstract
OBJECTIVE Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems. MATERIALS AND METHODS We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites. RESULTS As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%). DISCUSSION These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others. CONCLUSION By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.
Affiliation(s)
- Peter Speltz
  - Vanderbilt University Medical Center, Nashville, TN, USA
- Luke V Rasmussen
  - Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
- Omri Gottesman
  - Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Todd Lingren
  - Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Will K Thompson
  - Northwestern University, Feinberg School of Medicine, Chicago, IL, USA
- Guergana Savova
  - Boston Children's Hospital and Harvard Medical School, Boston, MA, USA
- Dan M Roden
  - Vanderbilt University Medical Center, Nashville, TN, USA
- Paul A Harris
  - Vanderbilt University Medical Center, Nashville, TN, USA
- Joshua C Denny
  - Vanderbilt University Medical Center, Nashville, TN, USA
33
Malas MS, Wish J, Moorthi R, Grannis S, Dexter P, Duke J, Moe S. A comparison between physicians and computer algorithms for form CMS-2728 data reporting. Hemodial Int 2016; 21:117-124. [PMID: 27353890 DOI: 10.1111/hdi.12445] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
Abstract
INTRODUCTION CMS-2728 form (Medical Evidence Report) assesses 23 comorbidities chosen to reflect poor outcomes and increased mortality risk. Previous studies questioned the validity of physician reporting on forms CMS-2728. We hypothesize that reporting of comorbidities by computer algorithms identifies more comorbidities than physician completion, and, therefore, is more reflective of underlying disease burden. METHODS We collected data from CMS-2728 forms for all 296 patients who had incident ESRD diagnosis and received chronic dialysis from 2005 through 2014 at Indiana University outpatient dialysis centers. We analyzed patients' data from electronic medical records systems that collated information from multiple health care sources. Previously utilized algorithms or natural language processing was used to extract data on 10 comorbidities for a period of up to 10 years prior to ESRD incidence. These algorithms incorporate billing codes, prescriptions, and other relevant elements. We compared the presence or unchecked status of these comorbidities on the forms to the presence or absence according to the algorithms. FINDINGS Computer algorithms had higher reporting of comorbidities compared to forms completion by physicians. This remained true when decreasing data span to one year and using only a single health center source. The algorithms determination was well accepted by a physician panel. Importantly, algorithms use significantly increased the expected deaths and lowered the standardized mortality ratios. DISCUSSION Using computer algorithms showed superior identification of comorbidities for form CMS-2728 and altered standardized mortality ratios. Adapting similar algorithms in available EMR systems may offer more thorough evaluation of comorbidities and improve quality reporting.
Affiliation(s)
- Mohammed Said Malas
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
  - Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, USA
- Jay Wish
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Ranjani Moorthi
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Shaun Grannis
  - Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, USA
- Paul Dexter
  - Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, USA
- Jon Duke
  - Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana, USA
- Sharon Moe
  - Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
  - Roudebush Veterans Administration Medical Center, Indianapolis, Indiana, USA
34
Tenenbaum JD. Translational Bioinformatics: Past, Present, and Future. Genomics Proteomics Bioinformatics 2016; 14:31-41. [PMID: 26876718 PMCID: PMC4792852 DOI: 10.1016/j.gpb.2016.01.003] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 12/16/2015] [Accepted: 01/20/2016] [Indexed: 02/04/2023]
Abstract
Though a relatively young discipline, translational bioinformatics (TBI) has become a key component of biomedical research in the era of precision medicine. Development of high-throughput technologies and electronic health records has caused a paradigm shift in both healthcare and biomedical research. Novel tools and methods are required to convert increasingly voluminous datasets into information and actionable knowledge. This review provides a definition and contextualization of the term TBI, describes the discipline’s brief history and past accomplishments, as well as current foci, and concludes with predictions of future directions in the field.
Affiliation(s)
- Jessica D Tenenbaum
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA.
|
35
|
Klann JG, Phillips LC, Turchin A, Weiler S, Mandl KD, Murphy SN. A numerical similarity approach for using retired Current Procedural Terminology (CPT) codes for electronic phenotyping in the Scalable Collaborative Infrastructure for a Learning Health System (SCILHS). BMC Med Inform Decis Mak 2015; 15:104. [PMID: 26655696 PMCID: PMC4676189 DOI: 10.1186/s12911-015-0223-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2015] [Accepted: 11/25/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Interoperable phenotyping algorithms, needed to identify patient cohorts meeting eligibility criteria for observational studies or clinical trials, require medical data in a consistent structured, coded format. Data heterogeneity limits such algorithms' applicability. Existing approaches are often: not widely interoperable; or, have low sensitivity due to reliance on the lowest common denominator (ICD-9 diagnoses). In the Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS) we endeavor to use the widely-available Current Procedural Terminology (CPT) procedure codes with ICD-9. Unfortunately, CPT changes drastically year-to-year - codes are retired/replaced. Longitudinal analysis requires grouping retired and current codes. BioPortal provides a navigable CPT hierarchy, which we imported into the Informatics for Integrating Biology and the Bedside (i2b2) data warehouse and analytics platform. However, this hierarchy does not include retired codes. METHODS We compared BioPortal's 2014AA CPT hierarchy with Partners Healthcare's SCILHS datamart, comprising three-million patients' data over 15 years. 573 CPT codes were not present in 2014AA (6.5 million occurrences). No existing terminology provided hierarchical linkages for these missing codes, so we developed a method that automatically places missing codes in the most specific "grouper" category, using the numerical similarity of CPT codes. Two informaticians reviewed the results. We incorporated the final table into our i2b2 SCILHS/PCORnet ontology, deployed it at seven sites, and performed a gap analysis and an evaluation against several phenotyping algorithms. RESULTS The reviewers found the method placed the code correctly with 97 % precision when considering only miscategorizations ("correctness precision") and 52 % precision using a gold-standard of optimal placement ("optimality precision"). 
High correctness precision meant that codes were placed in a reasonable hierarchical position that a reviewer can quickly validate. Lower optimality precision meant that codes were not often placed in the optimal hierarchical subfolder. The seven sites encountered few occurrences of codes outside our ontology, 93 % of which comprised just four codes. Our hierarchical approach correctly grouped retired and non-retired codes in most cases and extended the temporal reach of several important phenotyping algorithms. CONCLUSIONS We developed a simple, easily-validated, automated method to place retired CPT codes into the BioPortal CPT hierarchy. This complements existing hierarchical terminologies, which do not include retired codes. The approach's utility is confirmed by the high correctness precision and successful grouping of retired with non-retired codes.
Affiliation(s)
- Jeffrey G Klann
- Harvard Medical School, Boston, MA, USA; Partners Healthcare, Boston, MA, USA; Massachusetts General Hospital, Boston, MA, USA
| | | | - Alexander Turchin
- Harvard Medical School, Boston, MA, USA; Partners Healthcare, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA; Harvard Clinical Research Institute, Boston, MA, USA
| | | | - Kenneth D Mandl
- Harvard Medical School, Boston, MA, USA; Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
| | - Shawn N Murphy
- Harvard Medical School, Boston, MA, USA; Partners Healthcare, Boston, MA, USA; Massachusetts General Hospital, Boston, MA, USA
|
36
|
Borthwick KM, Smelser DT, Bock JA, Elmore JR, Ryer EJ, Ye Z, Pacheco JA, Carrell DS, Michalkiewicz M, Thompson WK, Pathak J, Bielinski SJ, Denny JC, Linneman JG, Peissig PL, Kho AN, Gottesman O, Parmar H, Kullo IJ, McCarty CA, Böttinger EP, Larson EB, Jarvik GP, Harley JB, Bajwa T, Franklin DP, Carey DJ, Kuivaniemi H, Tromp G. ePhenotyping for Abdominal Aortic Aneurysm in the Electronic Medical Records and Genomics (eMERGE) Network: Algorithm Development and Konstanz Information Miner Workflow. International Journal of Biomedical Data Mining 2015; 4:113. [PMID: 27054044 PMCID: PMC4820287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
BACKGROUND AND OBJECTIVE We designed an algorithm to identify abdominal aortic aneurysm cases and controls from electronic health records to be shared and executed within the "electronic Medical Records and Genomics" (eMERGE) Network. MATERIALS AND METHODS Structured Query Language was used to script the algorithm utilizing "Current Procedural Terminology" and "International Classification of Diseases" codes, with demographic and encounter data to classify individuals as case, control, or excluded. The algorithm was validated using blinded manual chart review at three eMERGE Network sites and one non-eMERGE Network site. Validation comprised evaluation of an equal number of predicted cases and controls selected at random from the algorithm predictions. After validation at the three eMERGE Network sites, the remaining eMERGE Network sites performed verification only. Finally, the algorithm was implemented as a workflow in the Konstanz Information Miner, which represented the logic graphically while retaining intermediate data for inspection at each node. The algorithm was configured to be independent of specific access to data and was exportable (without data) to other sites. RESULTS The algorithm demonstrated positive predictive values (PPV) of 92.8% (CI: 86.8-96.7) and 100% (CI: 97.0-100) for cases and controls, respectively. It also performed well outside the eMERGE Network. Implementation of the transportable executable algorithm as a Konstanz Information Miner workflow required much less effort than implementation from pseudo code, and ensured that the logic was as intended. DISCUSSION AND CONCLUSION This ePhenotyping algorithm identifies abdominal aortic aneurysm cases and controls from the electronic health record with the high case and control PPV necessary for research purposes, can be disseminated easily, and can be applied to high-throughput genetic and other studies.
Affiliation(s)
- Kenneth M Borthwick
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
| | - Diane T Smelser
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
| | - Jonathan A Bock
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
| | - James R Elmore
- Department of Vascular and Endovascular Surgery, Geisinger Health System, Danville, PA, USA
| | - Evan J Ryer
- Department of Vascular and Endovascular Surgery, Geisinger Health System, Danville, PA, USA
| | - Zi Ye
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
| | - Jennifer A. Pacheco
- Divisions of General Internal Medicine and Preventive Medicine, and the Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - David S. Carrell
- Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
| | - Michael Michalkiewicz
- Patient-Centered Research, Aurora Research Institute™, Aurora Sinai Medical Center, Milwaukee, WI, USA
| | - William K Thompson
- Divisions of General Internal Medicine and Preventive Medicine, and the Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Jyotishman Pathak
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
| | | | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - James G Linneman
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - Peggy L Peissig
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
| | - Abel N Kho
- Divisions of General Internal Medicine and Preventive Medicine, and the Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Omri Gottesman
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Harpreet Parmar
- Patient-Centered Research, Aurora Research Institute™, Aurora Sinai Medical Center, Milwaukee, WI, USA
| | - Iftikhar J Kullo
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
| | | | - Erwin P Böttinger
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eric B Larson
- Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
| | - Gail P Jarvik
- Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington, Seattle, WA, USA
| | - John B Harley
- Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Tanvir Bajwa
- Patient-Centered Research, Aurora Research Institute™, Aurora Sinai Medical Center, Milwaukee, WI, USA
| | - David P Franklin
- Department of Vascular and Endovascular Surgery, Geisinger Health System, Danville, PA, USA
| | - David J Carey
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
| | - Helena Kuivaniemi
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA; Department of Surgery, Temple University School of Medicine, Philadelphia, PA, USA
| | - Gerard Tromp
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA. Corresponding author: Gerard Tromp, The Sigfried and Janet Weis Center for Research, Geisinger Health, Danville, PA, USA; Tel: (570) 271-5592
|
37
|
Mo H, Thompson WK, Rasmussen LV, Pacheco JA, Jiang G, Kiefer R, Zhu Q, Xu J, Montague E, Carrell DS, Lingren T, Mentch FD, Ni Y, Wehbe FH, Peissig PL, Tromp G, Larson EB, Chute CG, Pathak J, Denny JC, Speltz P, Kho AN, Jarvik GP, Bejan CA, Williams MS, Borthwick K, Kitchner TE, Roden DM, Harris PA. Desiderata for computable representations of electronic health records-driven phenotype algorithms. J Am Med Inform Assoc 2015; 22:1220-30. [PMID: 26342218 PMCID: PMC4639716 DOI: 10.1093/jamia/ocv112] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 02/09/2015] [Accepted: 06/24/2015] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). METHODS A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. RESULTS We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. CONCLUSION A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.
Affiliation(s)
- Huan Mo
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - William K Thompson
- Center for Biomedical Research Informatics, NorthShore University HealthSystem, Evanston, IL, USA
| | - Luke V Rasmussen
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Jennifer A Pacheco
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Richard Kiefer
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Qian Zhu
- Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD, USA
| | - Jie Xu
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Enid Montague
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | | | - Todd Lingren
- Division of Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH, USA
| | - Frank D Mentch
- Center for Applied Genomics, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Yizhao Ni
- Division of Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH, USA
| | - Firas H Wehbe
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Peggy L Peissig
- Marshfield Clinic Research Foundation, Marshfield Clinic, Marshfield, WI, USA
| | - Gerard Tromp
- Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, University of Stellenbosch, Cape Town, South Africa
| | | | - Christopher G Chute
- Division of General Internal Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Jyotishman Pathak
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
| | - Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA
| | - Peter Speltz
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - Abel N Kho
- Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Gail P Jarvik
- Department of Medicine (Medical Genetics), University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Cosmin A Bejan
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
| | - Marc S Williams
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Kenneth Borthwick
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
| | - Terrie E Kitchner
- Marshfield Clinic Research Foundation, Marshfield Clinic, Marshfield, WI, USA
| | - Dan M Roden
- Department of Medicine, Vanderbilt University, Nashville, TN, USA; Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Paul A Harris
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
|
38
|
Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts. PLoS One 2015; 10:e0136651. [PMID: 26301417 PMCID: PMC4547801 DOI: 10.1371/journal.pone.0136651] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 09/29/2014] [Accepted: 08/06/2015] [Indexed: 01/06/2023] Open
Abstract
Background Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; (2) to study the impact of adding narrative data extracted using natural language processing (NLP) in the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. Methods and Results We studied 3 established EMR-based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g., ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. Conclusions We developed and validated a CAD algorithm that performed well across diverse patient populations.
The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM.
|
39
|
Liao KP, Cai T, Savova GK, Murphy SN, Karlson EW, Ananthakrishnan AN, Gainer VS, Shaw SY, Xia Z, Szolovits P, Churchill S, Kohane I. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. BMJ 2015; 350:h1885. [PMID: 25911572 PMCID: PMC4707569 DOI: 10.1136/bmj.h1885] [Citation(s) in RCA: 182] [Impact Index Per Article: 20.2] [Indexed: 01/04/2023]
Abstract
Electronic medical records are emerging as a major source of data for clinical and translational research studies, although phenotypes of interest need to be accurately defined first. This article provides an overview of how to develop a phenotype algorithm from electronic medical records, incorporating modern informatics and biostatistics methods.
Affiliation(s)
- Katherine P Liao
- Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston
| | - Tianxi Cai
- Department of Biostatistics, Harvard School of Public Health, Boston
| | | | - Shawn N Murphy
- Department of Neurology, Massachusetts General Hospital, Boston
| | - Elizabeth W Karlson
- Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston
| | - Ashwin N Ananthakrishnan
- Department of Gastroenterology, Massachusetts General Hospital, MGH Crohn's and Colitis Center, Boston
| | - Vivian S Gainer
- Partners Research Computing, Partners HealthCare System, Boston
| | - Stanley Y Shaw
- Harvard Medical School, Boston; Center for Systems Biology, Massachusetts General Hospital, Boston
| | - Zongqi Xia
- Harvard Medical School, Boston; Department of Neurology, Harvard Medical School, Boston
| | - Peter Szolovits
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
| | | | - Isaac Kohane
- Harvard Medical School, Boston; Department of Neurology, Massachusetts General Hospital, Boston
|
40
|
Crowe CL, Tao C. Designing Ontology-based Patterns for the Representation of the Time-Relevant Eligibility Criteria of Clinical Protocols. AMIA Joint Summits on Translational Science Proceedings 2015; 2015:173-7. [PMID: 26306263 PMCID: PMC4525239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
The amount of time and money required to screen patients for clinical trial and guideline eligibility presents the need for an automated screening process to streamline clinical trial enrollment and guideline implementation. This paper introduces an ontology-based approach for defining a set of patterns that can be used to represent various types of time-relevant eligibility criteria that may appear in clinical protocols. With a focus only on temporal requirements, we examined the criteria of 600 protocols and extracted a set of 37 representative time-relevant eligibility criteria. 16 patterns were designed to represent these criteria. Using a test set of an additional 100 protocols, it was found that these 16 patterns could sufficiently represent 98.5% of the time-relevant criteria. Once the time-relevant criteria are modeled by these patterns, it becomes possible to (1) use natural language processing algorithms to automatically extract temporal constraints from criteria; and (2) develop computer rules and queries to automate the processing of the criteria.
Affiliation(s)
| | - Cui Tao
- University of Texas School of Biomedical Informatics, Houston, TX
|
41
|
Rasmussen LV, Kiefer RC, Mo H, Speltz P, Thompson WK, Jiang G, Pacheco JA, Xu J, Zhu Q, Denny JC, Montague E, Pathak J. A Modular Architecture for Electronic Health Record-Driven Phenotyping. AMIA Joint Summits on Translational Science Proceedings 2015; 2015:147-51. [PMID: 26306258 PMCID: PMC4525215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Increasing interest in and experience with electronic health record (EHR)-driven phenotyping has yielded multiple challenges that are at present only partially addressed. Many solutions require the adoption of a single software platform, often with an additional cost of mapping existing patient and phenotypic data to multiple representations. We propose a set of guiding design principles and a modular software architecture to bridge the gap to a standardized phenotype representation, dissemination and execution. Ongoing development leveraging this proposed architecture has shown its ability to address existing limitations.
Affiliation(s)
| | | | - Huan Mo
- Vanderbilt University, Nashville, TN
| | | | | | | | | | - Jie Xu
- Northwestern University, Chicago, IL
| | - Qian Zhu
- University of Maryland Baltimore County, Baltimore, MD
|
42
|
Fu X, Batista-Navarro R, Rak R, Ananiadou S. Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows. J Biomed Semantics 2015; 6:8. [PMID: 25789153 PMCID: PMC4364458 DOI: 10.1186/s13326-015-0004-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 11/11/2014] [Accepted: 02/22/2015] [Indexed: 02/03/2023] Open
Abstract
BACKGROUND Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. METHODS A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. RESULTS When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. CONCLUSIONS We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. 
Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
Affiliation(s)
- Xiao Fu
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, UK
| | - Riza Batista-Navarro
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, UK; Department of Computer Science, University of the Philippines Diliman, Quezon City, 1101 Philippines
| | - Rafal Rak
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, UK
| | - Sophia Ananiadou
- National Centre for Text Mining, School of Computer Science, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, UK
|