Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rasmussen LV, Thompson WK, Pacheco JA, Kho AN, Carrell DS, Pathak J, Peissig PL, Tromp G, Denny JC, Starren JB. Design patterns for the development of electronic health record-driven phenotype extraction algorithms. J Biomed Inform 2014;51:280-6. [PMID: 24960203 DOI: 10.1016/j.jbi.2014.06.007] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 12/23/2013] [Revised: 05/27/2014] [Accepted: 06/16/2014] [Indexed: 12/22/2022]

Cited by Other Article(s)

Carrell DS, Floyd JS, Gruber S, Hazlehurst BL, Heagerty PJ, Nelson JC, Williamson BD, Ball R. A general framework for developing computable clinical phenotype algorithms. J Am Med Inform Assoc 2024;31:1785-1796. [PMID: 38748991 PMCID: PMC11258420 DOI: 10.1093/jamia/ocae121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 05/07/2024] [Accepted: 05/14/2024] [Indexed: 07/20/2024] Open

Huang X, Kleiman R, Page D, Hebbring S. Automated Family Histories Significantly Improve Risk Prediction in an EHR. AMIA Jt Summits Transl Sci Proc 2024;2024:221-229. [PMID: 38827091 PMCID: PMC11141855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]

Deng Y, Pacheco JA, Ghosh A, Chung A, Mao C, Smith JC, Zhao J, Wei WQ, Barnado A, Dorn C, Weng C, Liu C, Cordon A, Yu J, Tedla Y, Kho A, Ramsey-Goldman R, Walunas T, Luo Y. Natural language processing to identify lupus nephritis phenotype in electronic health records. BMC Med Inform Decis Mak 2024;22:348. [PMID: 38433189 PMCID: PMC10910523 DOI: 10.1186/s12911-024-02420-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2021] [Accepted: 01/09/2024] [Indexed: 03/05/2024] Open

Abstract

BACKGROUND

Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major manifestations of SLE contributing to organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing of pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW).

METHODS

We developed five algorithms: a rule-based algorithm using only structured data (baseline algorithm) and four algorithms using different NLP models. The first NLP model applied simple regular expressions for keyword search combined with structured data. The other three NLP models were based on regularized logistic regression and used different sets of features: positive mention of concept unique identifiers (CUIs), number of appearances of CUIs, and a mixture of three components (i.e., a curated list of CUIs, regular expression concepts, and structured data), respectively. The baseline algorithm and the best performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC).

RESULTS

Our best performing NLP model incorporated features from structured data, regular expression concepts, and mapped concept unique identifiers (CUIs). Compared to the baseline lupus nephritis algorithm, it showed an improved F-measure in both the NMEDW (baseline 0.41 vs NLP 0.79) and VUMC (baseline 0.52 vs NLP 0.93) datasets.

CONCLUSION

Our NLP MetaMap mixed model greatly improved the F-measure compared to the structured-data-only algorithm in both internal and external validation datasets. The NLP algorithms can serve as powerful tools to accurately identify the lupus nephritis phenotype in EHRs for clinical research and better targeted therapies.
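
For reference, the F-measure quoted in this abstract combines precision and recall into a single score. The following is a minimal Python sketch, not the authors' code. It assumes the balanced F1 variant (the abstract does not say which weighting was used), and the function name and example counts are hypothetical.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion counts (beta=1 gives the balanced F1)."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only, not taken from the study.
example_f1 = f_measure(tp=8, fp=2, fn=2)
```

With equal precision and recall, as in the hypothetical counts above, the F1 score equals that shared value.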

Affiliation(s)

Yu Deng Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
Jennifer A Pacheco Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA
Anika Ghosh Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
Anh Chung Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA; Department of Medicine/Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, USA
Chengsheng Mao Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
Joshua C Smith Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA
Juan Zhao Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA
Wei-Qi Wei Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA
April Barnado Department of Medicine, Vanderbilt University Medical Center, Nashville, USA
Chad Dorn Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, USA
Chunhua Weng Department of Biomedical Informatics, Columbia University, New York City, USA
Cong Liu Department of Biomedical Informatics, Columbia University, New York City, USA
Adam Cordon Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, USA
Jingzhi Yu Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
Yacob Tedla Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
Abel Kho Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
Rosalind Ramsey-Goldman Department of Medicine/Rheumatology, Feinberg School of Medicine, Northwestern University, Chicago, USA
Theresa Walunas Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA
Yuan Luo Center for Health Information Partnerships, Feinberg School of Medicine, Northwestern University, Chicago, USA

Brandt PS, Kho A, Luo Y, Pacheco JA, Walunas TL, Hakonarson H, Hripcsak G, Liu C, Shang N, Weng C, Walton N, Carrell DS, Crane PK, Larson EB, Chute CG, Kullo IJ, Carroll R, Denny J, Ramirez A, Wei WQ, Pathak J, Wiley LK, Richesson R, Starren JB, Rasmussen LV. Characterizing variability of electronic health record-driven phenotype definitions. J Am Med Inform Assoc 2023;30:427-437. [PMID: 36474423 PMCID: PMC9933077 DOI: 10.1093/jamia/ocac235] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/21/2022] [Revised: 10/19/2022] [Accepted: 11/23/2022] [Indexed: 12/12/2022] Open

Affiliation(s)

Pascal S Brandt Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
Abel Kho Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Yuan Luo Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Jennifer A Pacheco Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Theresa L Walunas Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Hakon Hakonarson Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
George Hripcsak Department of Biomedical Informatics, Columbia University, New York, New York, USA
Cong Liu Department of Biomedical Informatics, Columbia University, New York, New York, USA
Ning Shang Department of Biomedical Informatics, Columbia University, New York, New York, USA
Chunhua Weng Department of Biomedical Informatics, Columbia University, New York, New York, USA
Nephi Walton Intermountain Precision Genomics, Intermountain Healthcare, St George, Utah, USA
David S Carrell Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA
Paul K Crane Department of Medicine, University of Washington, Seattle, Washington, USA
Eric B Larson Department of Medicine, University of Washington, Seattle, Washington, USA; Department of Health Services, University of Washington, Seattle, Washington, USA
Christopher G Chute Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA
Iftikhar J Kullo Department of Cardiovascular Medicine, Mayo Clinic, Rochester, Minnesota, USA
Robert Carroll Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Josh Denny All of Us Research Program, National Institutes of Health, Bethesda, Maryland, USA
Andrea Ramirez Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Wei-Qi Wei Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Jyoti Pathak Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
Laura K Wiley Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
Rachel Richesson Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
Justin B Starren Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Luke V Rasmussen Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA

Performance of EHR classifiers for patient eligibility in a clinical trial of precision screening. Contemp Clin Trials 2022;121:106926. [PMID: 36115637 DOI: 10.1016/j.cct.2022.106926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 09/07/2022] [Accepted: 09/09/2022] [Indexed: 01/27/2023]

Partogi M, Gaviria-Valencia S, Alzate Aguirre M, Pick NJ, Bhopalwala HM, Barry BA, Kaggal VC, Scott CG, Kessler ME, Moore MM, Mitchell JD, Chaudhry R, Bonacci RP, Arruda-Olson AM. Sociotechnical Intervention for Improved Delivery of Preventive Cardiovascular Care to Rural Communities: Participatory Design Approach. J Med Internet Res 2022;24:e27333. [PMID: 35994324 PMCID: PMC9446142 DOI: 10.2196/27333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2021] [Revised: 12/30/2021] [Accepted: 06/27/2022] [Indexed: 11/15/2022] Open

Abstract

Background

Clinical practice guidelines recommend antiplatelet and statin therapies as well as blood pressure control and tobacco cessation for secondary prevention in patients with established atherosclerotic cardiovascular diseases (ASCVDs). However, these strategies for risk modification are underused, especially in rural communities. Moreover, resources to support the delivery of preventive care to rural patients are fewer than those for their urban counterparts. Transformative interventions for the delivery of tailored preventive cardiovascular care to rural patients are needed.

Objective

A multidisciplinary team developed a rural-specific, team-based model of care intervention assisted by clinical decision support (CDS) technology using participatory design in a sociotechnical conceptual framework. The model of care intervention included redesigned workflows and a novel CDS technology for the coordination and delivery of guideline recommendations by primary care teams in a rural clinic.

Methods

The design of the model of care intervention comprised 3 phases: problem identification, experimentation, and testing. Input from team members (n=35) required 150 hours, including observations of clinical encounters, provider workshops, and interviews with patients and health care professionals. The intervention was prototyped, iteratively refined, and tested with user feedback. In a 3-month pilot trial, 369 patients with ASCVDs were randomized into the control or intervention arm.

Results

New workflows and a novel CDS tool were created to identify patients with ASCVDs who had gaps in preventive care and assign the right care team member for delivery of tailored recommendations. During the pilot, the intervention prototype was iteratively refined and tested. The pilot demonstrated feasibility for successful implementation of the sociotechnical intervention as the proportion of patients who had encounters with advanced practice providers (nurse practitioners and physician assistants), pharmacists, or tobacco cessation coaches for the delivery of guideline recommendations in the intervention arm was greater than that in the control arm.

Conclusions

Participatory design and a sociotechnical conceptual framework enabled the development of a rural-specific, team-based model of care intervention assisted by CDS technology for the transformation of preventive health care delivery for ASCVDs.

Movaghar A, Page D, Brilliant M, Mailick M. Advancing artificial intelligence-assisted pre-screening for fragile X syndrome. BMC Med Inform Decis Mak 2022;22:152. [PMID: 35689224 PMCID: PMC9185893 DOI: 10.1186/s12911-022-01896-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/07/2022] [Accepted: 06/01/2022] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Fragile X syndrome (FXS), the most common inherited cause of intellectual disability and autism, is significantly underdiagnosed in the general population. Diagnosing FXS is challenging due to the heterogeneity of the condition, subtle physical characteristics at the time of birth and similarity of phenotypes to other conditions. The medical complexity of FXS underscores an urgent need to develop more efficient and effective screening methods to identify individuals with FXS. In this study, we evaluate the effectiveness of using artificial intelligence (AI) and electronic health records (EHRs) to accelerate FXS diagnosis.

METHODS

The EHRs of 2.1 million patients served by the University of Wisconsin Health System (UW Health) were the main data source for this retrospective study. UW Health includes patients from south central Wisconsin, with approximately 33 years (1988-2021) of digitized health data. We identified all participants who received an International Classification of Diseases (ICD), Ninth or Tenth Revision, code for FXS (ICD9 = 759.83, ICD10 = Q99.2). Only individuals who received the FXS code on at least two occasions ("Rule of 2") were classified as clinically diagnosed cases. To ensure the availability of sufficient data prior to clinical diagnosis to test the model, only individuals who were diagnosed after age 10 were included in the analysis. A supervised random forest classifier was used to create an AI-assisted pre-screening tool that identifies cases with FXS from their medical records 5 years earlier than the time of clinical diagnosis. The area under the receiver operating characteristic curve (AUROC) was reported. The AUROC shows the level of success in identification of cases and controls (AUROC = 1 represents perfect classification).

RESULTS

A total of 52 individuals were identified as target cases and matched with 5200 controls. The AI-assisted pre-screening tool successfully identified cases with FXS 5 years earlier than the time of clinical diagnosis, with an AUROC of 0.717. A separate model trained and tested on UW Health cases achieved an AUROC of 0.798.

CONCLUSIONS

This result shows the potential utility of our tool in accelerating FXS diagnosis in real clinical settings. Earlier diagnosis can lead to more timely intervention and access to services with the goal of improving patients' health outcomes.
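
The "Rule of 2" case definition described in the methods can be illustrated with a short Python sketch. This is not the authors' implementation. The record layout (patient ID, code, encounter date) is hypothetical, and treating an "occasion" as a distinct encounter date is an assumption. Only the two ICD codes come from the abstract.

```python
from collections import defaultdict
from datetime import date

# Codes quoted in the abstract: ICD-9 759.83 and ICD-10 Q99.2.
FXS_CODES = {"759.83", "Q99.2"}


def rule_of_two_cases(records, min_occasions=2):
    """Return IDs of patients with an FXS code on at least `min_occasions` distinct dates.

    `records` is an iterable of (patient_id, code, encounter_date) tuples;
    this layout is an assumption made for the sketch.
    """
    dates_by_patient = defaultdict(set)
    for patient_id, code, encounter_date in records:
        if code in FXS_CODES:
            dates_by_patient[patient_id].add(encounter_date)
    return {pid for pid, dates in dates_by_patient.items() if len(dates) >= min_occasions}


# Hypothetical records for illustration only.
records = [
    ("p1", "Q99.2", date(2015, 3, 1)),
    ("p1", "759.83", date(2012, 6, 9)),
    ("p2", "Q99.2", date(2018, 1, 4)),   # coded only once: not a case
    ("p3", "E11.9", date(2019, 5, 5)),   # unrelated code
    ("p3", "E11.9", date(2020, 5, 5)),
]
cases = rule_of_two_cases(records)
```

In this toy data only patient `p1` meets the definition, because `p2` has a single FXS-coded date and `p3` has no FXS code at all.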

Link NB, Huang S, Cai T, Sun J, Dahal K, Costa L, Cho K, Liao K, Cai T, Hong C. Binary acronym disambiguation in clinical notes from electronic health records with an application in computational phenotyping. Int J Med Inform 2022;162:104753. [PMID: 35405530 DOI: 10.1016/j.ijmedinf.2022.104753] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/18/2022] [Revised: 03/11/2022] [Accepted: 03/27/2022] [Indexed: 01/05/2023]

Abstract

OBJECTIVE

The use of electronic health records (EHR) systems has grown over the past decade, and with it, the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings) and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce a semi-supervised method for binary acronym disambiguation, the task of classifying a target sense for acronyms in the clinical EHR notes.

METHODS

We developed a semi-supervised ensemble machine learning (CASEml) algorithm to automatically identify when an acronym means a target sense by leveraging semantic embeddings, visit-level text and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard semi-supervised method and a baseline metric selecting the most frequent acronym sense. Along with evaluating the performance of these methods for specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis.

RESULTS

CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than a standard baseline metric and (on average) higher than a state-of-the-art semi-supervised method. We also demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis.

CONCLUSION

CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and semi-supervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.
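
The "most frequent acronym sense" baseline mentioned in the methods can be sketched in a few lines of Python. This is an illustration rather than the study's code. The split into reference and evaluation labels, the function name, and the sense labels are all assumptions.

```python
from collections import Counter


def most_frequent_sense_accuracy(reference_senses, evaluation_senses):
    """Predict the most common reference sense everywhere and score it.

    Returns (majority_sense, accuracy on `evaluation_senses`).
    """
    majority_sense, _ = Counter(reference_senses).most_common(1)[0]
    correct = sum(1 for sense in evaluation_senses if sense == majority_sense)
    return majority_sense, correct / len(evaluation_senses)


# Hypothetical labels for the acronym "RA"; not data from the study.
reference = ["rheumatoid arthritis", "rheumatoid arthritis", "right atrium"]
evaluation = [
    "rheumatoid arthritis",
    "right atrium",
    "rheumatoid arthritis",
    "rheumatoid arthritis",
]
majority, accuracy = most_frequent_sense_accuracy(reference, evaluation)
```

A disambiguation method is only useful to the extent that it beats this trivial rule, which is why it serves as a reference point.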

Spilseth B, McKnight CD, Li MD, Park CJ, Fried JG, Yi PH, Brian JM, Lehman CD, Wang XJ, Phalke V, Pakkal M, Baruah D, Khine PP, Fajardo LL. AUR-RRA Review: Logistics of Academic-Industry Partnerships in Artificial Intelligence. Acad Radiol 2022;29:119-128. [PMID: 34561163 DOI: 10.1016/j.acra.2021.08.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/23/2021] [Revised: 07/29/2021] [Accepted: 08/07/2021] [Indexed: 12/27/2022]

Movaghar A, Page D, Brilliant M, Mailick M. Response to Timothé Ménard. Genet Med 2021;24:752-753. [PMID: 34906516 DOI: 10.1016/j.gim.2021.10.023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/22/2021] [Revised: 10/27/2021] [Accepted: 10/28/2021] [Indexed: 10/19/2022] Open

Somani S, Yoffie S, Teng S, Havaldar S, Nadkarni GN, Zhao S, Glicksberg BS. Development and validation of techniques for phenotyping ST-elevation myocardial infarction encounters from electronic health records. JAMIA Open 2021;4:ooab068. [PMID: 34423260 PMCID: PMC8374370 DOI: 10.1093/jamiaopen/ooab068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/25/2021] [Revised: 06/07/2021] [Accepted: 07/29/2021] [Indexed: 11/12/2022] Open

Klann JG, Estiri H, Weber GM, Moal B, Avillach P, Hong C, Tan ALM, Beaulieu-Jones BK, Castro V, Maulhardt T, Geva A, Malovini A, South AM, Visweswaran S, Morris M, Samayamuthu MJ, Omenn GS, Ngiam KY, Mandl KD, Boeker M, Olson KL, Mowery DL, Follett RW, Hanauer DA, Bellazzi R, Moore JH, Loh NHW, Bell DS, Wagholikar KB, Chiovato L, Tibollo V, Rieg S, Li ALLJ, Jouhet V, Schriver E, Xia Z, Hutch M, Luo Y, Kohane IS, Brat GA, Murphy SN. Validation of an internationally derived patient severity phenotype to support COVID-19 analytics from electronic health record data. J Am Med Inform Assoc 2021;28:1411-1420. [PMID: 33566082 PMCID: PMC7928835 DOI: 10.1093/jamia/ocab018] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 10/08/2020] [Revised: 01/14/2021] [Accepted: 01/29/2021] [Indexed: 12/21/2022] Open

Abstract

OBJECTIVE

The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing coronavirus disease 2019 (COVID-19) with federated analyses of electronic health record (EHR) data. We sought to develop and validate a computable phenotype for COVID-19 severity.

MATERIALS AND METHODS

Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of 6 code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of intensive care unit (ICU) admission and/or death. We also piloted an alternative machine learning approach and compared selected predictors of severity with the 4CE phenotype at 1 site.

RESULTS

The full 4CE severity phenotype had a pooled sensitivity of 0.73 and specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity had high variability (up to 0.65 across sites). At one pilot site, the expert-derived phenotype had a mean area under the curve of 0.903 (95% confidence interval, 0.886-0.921), compared with an area under the curve of 0.956 (95% confidence interval, 0.952-0.959) for the machine learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared with chart review.

DISCUSSION

We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly owing to heterogeneous pandemic conditions.

CONCLUSIONS

We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
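
The sensitivity and specificity figures in the results compare a phenotype's flag against an observed outcome. A minimal Python sketch of that calculation follows. It is not the 4CE code: the function name and the example flags are hypothetical, and pooling across sites is not shown.

```python
def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity) from paired boolean sequences.

    `predicted` holds the phenotype flag and `actual` the observed outcome
    (for example, ICU admission and/or death) for the same patients.
    """
    tp = fp = tn = fn = 0
    for flag, outcome in zip(predicted, actual):
        if flag and outcome:
            tp += 1
        elif flag and not outcome:
            fp += 1
        elif not flag and outcome:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical flags for five patients; not data from the study.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
sens, spec = sensitivity_specificity(predicted, actual)
```

Here two of the three patients with the outcome are flagged and one of the two without the outcome is correctly left unflagged.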

Affiliation(s)

Jeffrey G Klann Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
Hossein Estiri Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
Griffin M Weber Department of Biomedical Informatics, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA
Bertrand Moal IAM Unit, Public Health Department, Bordeaux University Hospital, Bordeaux, France
Paul Avillach Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
Chuan Hong Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
Amelia L M Tan Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
Brett K Beaulieu-Jones Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
Victor Castro Research Information Science and Computing, Mass General Brigham, Boston, Massachusetts, USA
Thomas Maulhardt Institute of Medical Biometry and Statistics, Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
Alon Geva Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, Massachusetts, USA; Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
Alberto Malovini Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy
Andrew M South Section of Nephrology, Department of Pediatrics, Brenner Children's Hospital, Wake Forest School of Medicine, Winston Salem, North Carolina, USA
Shyam Visweswaran Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Michele Morris Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Malarkodi J Samayamuthu Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Gilbert S Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
Kee Yuan Ngiam Department of Biomedical Informatics-WisDM, National University Health System, Singapore
Kenneth D Mandl Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
Martin Boeker Institute of Medical Biometry and Statistics, Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
Karen L Olson Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts, USA
Danielle L Mowery Department of Biostatistics, Epidemiology, and Informatics, Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
Robert W Follett Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California, USA
David A Hanauer Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, Michigan, USA
Riccardo Bellazzi Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy; Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy
Jason H Moore Department of Biostatistics, Epidemiology, and Informatics, Institute for Biomedical Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
Ne-Hooi Will Loh Division of Critical Care, National University Health System, Singapore
Douglas S Bell Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California, USA
Kavishwar B Wagholikar Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
Luca Chiovato Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy; Department of Internal Medicine and Medical Therapy, University of Pavia, Pavia, Italy
Valentina Tibollo Laboratory of Informatics and Systems Engineering for Clinical Research, Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy
Siegbert Rieg Division of Infectious Diseases, Department of Medicine II, Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
Anthony L L J Li National Center for Infectious Diseases, Tan Tock Seng Hospital, Singapore
Vianney Jouhet ERIAS-INSERM U1219 BPH, Bordeaux University Hospital, Bordeaux, France
Emily Schriver Data Analytics Center, Penn Medicine, Philadelphia, Pennsylvania, USA
Zongqi Xia Department of Neurology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
Meghan Hutch Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Yuan Luo Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Isaac S Kohane Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
Gabriel A Brat Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
Shawn N Murphy Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA; Research Information Science and Computing, Mass General Brigham, Boston, Massachusetts, USA

Movaghar A, Page D, Scholze D, Hong J, DaWalt LS, Kuusisto F, Stewart R, Brilliant M, Mailick M. Artificial intelligence-assisted phenotype discovery of fragile X syndrome in a population-based sample. Genet Med 2021;23:1273-1280. [PMID: 33772223 PMCID: PMC8257481 DOI: 10.1038/s41436-021-01144-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/09/2020] [Revised: 03/01/2021] [Accepted: 03/02/2021] [Indexed: 11/09/2022] Open

Walunas TL, Ghosh AS, Pacheco JA, Mitrovic V, Wu A, Jackson KL, Schusler R, Chung A, Erickson D, Mancera-Cuevas K, Luo Y, Kho AN, Ramsey-Goldman R. Evaluation of structured data from electronic health records to identify clinical classification criteria attributes for systemic lupus erythematosus. Lupus Sci Med 2021;8:8/1/e000488. [PMID: 33903204 PMCID: PMC8076919 DOI: 10.1136/lupus-2021-000488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2021] [Revised: 04/12/2021] [Accepted: 04/13/2021] [Indexed: 11/10/2022]

Abstract

Objective

Our objective was to develop algorithms to identify lupus clinical classification criteria attributes using structured data found in the electronic health record (EHR) and determine whether they could be used to describe a cohort of people with lupus and discriminate them from a defined healthy control cohort.

Methods

We created gold standard lupus and healthy patient cohorts that were fully adjudicated for the American College of Rheumatology (ACR), Systemic Lupus International Collaborating Clinics (SLICC) and European League Against Rheumatism/ACR (EULAR/ACR) classification criteria and had matched EHR data. We implemented rule-based algorithms using structured data within the EHR system for each attribute of the three classification criteria. Individual criteria attribute and classification criteria algorithms as a whole were assessed over our combined cohorts and the overall performance of the algorithms was measured through sensitivity and specificity.

Results

Individual classification criteria attributes had a wide range of sensitivities, 7% (oral ulcers) to 97% (haematological disorders) and specificities, 56% (haematological disorders) to 98% (photosensitivity), but all could be identified in EHR data. In general, algorithms based on laboratory results performed better than those primarily based on diagnosis codes. All three classification criteria systems effectively distinguished members of our case and control cohorts, but the SLICC criteria-based algorithm had the highest overall performance (76% sensitivity, 99% specificity).

Conclusions

It is possible to characterise disease manifestations in people with lupus using classification criteria-based algorithms that assess structured EHR data. These algorithms may reduce chart review burden and are a foundation for identifying subpopulations of patients with lupus based on disease presentation to support precision medicine applications.

Affiliation(s)

Theresa L Walunas Division of General Internal Medicine and Geriatrics, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA; Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Anika S Ghosh Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Jennifer A Pacheco Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Vesna Mitrovic Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Andy Wu Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Kathryn L Jackson Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Ryan Schusler Division of Rheumatology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Anh Chung Division of Rheumatology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Daniel Erickson Division of Rheumatology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Karen Mancera-Cuevas Division of Rheumatology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Yuan Luo Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Abel N Kho Division of General Internal Medicine and Geriatrics, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA; Center for Health Information Partnerships, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
Rosalind Ramsey-Goldman Division of Rheumatology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA

Shang N, Khan A, Polubriaginof F, Zanoni F, Mehl K, Fasel D, Drawz PE, Carroll RJ, Denny JC, Hathcock MA, Arruda-Olson AM, Peissig PL, Dart RA, Brilliant MH, Larson EB, Carrell DS, Pendergrass S, Verma SS, Ritchie MD, Benoit B, Gainer VS, Karlson EW, Gordon AS, Jarvik GP, Stanaway IB, Crosslin DR, Mohan S, Ionita-Laza I, Tatonetti NP, Gharavi AG, Hripcsak G, Weng C, Kiryluk K. Medical records-based chronic kidney disease phenotype for clinical care and "big data" observational and genetic studies. NPJ Digit Med 2021;4:70. [PMID: 33850243 PMCID: PMC8044136 DOI: 10.1038/s41746-021-00428-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 08/07/2020] [Accepted: 02/25/2021] [Indexed: 12/19/2022]

Affiliation(s)

Ning Shang Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
Atlas Khan Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
Fernanda Polubriaginof Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
Francesca Zanoni Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
Karla Mehl Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
David Fasel Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
Paul E Drawz Department of Medicine, University of Minnesota, Minnesota, MN, USA
Robert J Carroll Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
Joshua C Denny Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Departments of Medicine, Vanderbilt University, Nashville, TN, USA
Matthew A Hathcock Department of Biomedical Informatics, Mayo Clinic, Rochester, MN, USA
Adelaide M Arruda-Olson Department of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
Peggy L Peissig Marshfield Clinic Research Institute, Marshfield, WI, USA
Richard A Dart Marshfield Clinic Research Institute, Marshfield, WI, USA
Murray H Brilliant Marshfield Clinic Research Institute, Marshfield, WI, USA
Eric B Larson Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
David S Carrell Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
Sarah Pendergrass Geisinger Research, Rockville, MD, USA
Shefali Setia Verma University of Pennsylvania, Philadelphia, PA, USA
Marylyn D Ritchie University of Pennsylvania, Philadelphia, PA, USA
Barbara Benoit Partners HealthCare, Somerville, MA, USA
Vivian S Gainer Partners HealthCare, Somerville, MA, USA
Elizabeth W Karlson Harvard Medical School, Harvard University, Cambridge, MA, USA
Adam S Gordon Center for Genetic Medicine, Northwestern University, Chicago, IL, USA
Gail P Jarvik Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Ian B Stanaway Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
David R Crosslin Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
Sumit Mohan Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
Iuliana Ionita-Laza Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
Nicholas P Tatonetti Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
Ali G Gharavi Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
George Hripcsak Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
Chunhua Weng Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
Krzysztof Kiryluk Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA

Floris-Moore M, Edmonds A, Napravnik S, Adimora AA. Computerized Adjudication of Coronary Heart Disease Events Using the Electronic Medical Record in HIV Clinical Research: Possibilities and Challenges Ahead. AIDS Res Hum Retroviruses 2020;36:306-313. [PMID: 31407587 DOI: 10.1089/aid.2019.0036] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]

Abstract

This pilot study assessed the feasibility of computer-assisted electronic medical record (EMR) abstraction to ascertain coronary heart disease (CHD) event hospitalizations. We included a sample of 87 hospitalization records from participants at the University of North Carolina (UNC) site of the Women's Interagency HIV Study (WIHS) and the UNC Center for AIDS Research (CFAR) HIV Clinical Cohort who were hospitalized within the UNC Healthcare System from July 2004 to July 2015. We compared a computer algorithm utilizing diagnosis/procedure codes, medications, and cardiac enzyme levels to adjudicate CHD events [myocardial infarction (MI)/coronary revascularization] from the EMR to standardized manual chart adjudication. Of 87 hospitalizations, 42 were classified as definite, 25 probable, and 20 non-CHD events by manual chart adjudication. A computer algorithm requiring presence of ≥1 CHD-related International Classification of Diseases, 9th Revision (ICD-9)/Current Procedural Terminology (CPT) code correctly identified 24 of 42 definite (57%), 29 of 67 probable/definite CHD (43%), and 95% of non-CHD events; additionally requiring clinically defined cardiac enzyme levels or administration of MI-related medications correctly identified 55%, 42%, and 95% of such events, respectively. Requiring any one of the ICD-9/CPT or cardiac enzyme criteria correctly identified 98% of definite, 97% of probable/definite CHD, and 85% of non-CHD events. Challenges included difficulty matching hospitalization dates, incomplete diagnosis code data, and multiple field names/locations of laboratory/medication data. Computer algorithms comprising only ICD-9/CPT codes failed to identify a sizable proportion of CHD events. Using a less restrictive algorithm yielded fewer missed events but increased the false-positive rate. Despite potential benefits of EMR-based research, there remain several challenges to fully computerized adjudication of CHD events.

Feller DJ, Zucker J, Walk OBD, Yin MT, Gordon P, Elhadad N. Longitudinal analysis of social and behavioral determinants of health in the EHR: exploring the impact of patient trajectories and documentation practices. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2020;2019:399-407. [PMID: 32308833 PMCID: PMC7153098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]

High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nat Protoc 2019;14:3426-3444. [PMID: 31748751 DOI: 10.1038/s41596-019-0227-6] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 10/19/2018] [Accepted: 07/22/2019] [Indexed: 01/12/2023]

Chartash D, Paek H, Dziura JD, Ross BK, Nogee DP, Boccio E, Hines C, Schott AM, Jeffery MM, Patel MD, Platts-Mills TF, Ahmed O, Brandt C, Couturier K, Melnick E. Identifying Opioid Use Disorder in the Emergency Department: Multi-System Electronic Health Record-Based Computable Phenotype Derivation and Validation Study. JMIR Med Inform 2019;7:e15794. [PMID: 31674913 PMCID: PMC6913746 DOI: 10.2196/15794] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/08/2019] [Revised: 09/27/2019] [Accepted: 10/01/2019] [Indexed: 01/10/2023]

Abstract

BACKGROUND

Deploying accurate computable phenotypes in pragmatic trials requires a trade-off between precise and clinically sensical variable selection. In particular, evaluating the medical encounter to assess a pattern leading to clinically significant impairment or distress indicative of disease is a difficult modeling challenge for the emergency department.

OBJECTIVE

This study aimed to derive and validate an electronic health record-based computable phenotype to identify emergency department patients with opioid use disorder using physician chart review as a reference standard.

METHODS

A two-algorithm computable phenotype was developed and evaluated using structured clinical data across 13 emergency departments in two large health care systems. Algorithm 1 combined clinician and billing codes. Algorithm 2 used chief complaint structured data suggestive of opioid use disorder. To evaluate the algorithms in both internal and external validation phases, two emergency medicine physicians, with a third acting as adjudicator, reviewed a pragmatic sample of 231 charts: 125 internal validation (75 positive and 50 negative), 106 external validation (56 positive and 50 negative).

RESULTS

Cohen kappa, measuring agreement between reviewers, for the internal and external validation cohorts was 0.95 and 0.93, respectively. In the internal validation phase, Algorithm 1 had a positive predictive value (PPV) of 0.96 (95% CI 0.863-0.995) and a negative predictive value (NPV) of 0.98 (95% CI 0.893-0.999), and Algorithm 2 had a PPV of 0.8 (95% CI 0.593-0.932) and an NPV of 1.0 (one-sided 97.5% CI 0.863-1). In the external validation phase, the phenotype had a PPV of 0.95 (95% CI 0.851-0.989) and an NPV of 0.92 (95% CI 0.807-0.978).

CONCLUSIONS

This phenotype detected emergency department patients with opioid use disorder with high predictive values and reliability. Its algorithms were transportable across health care systems and have potential value for both clinical and research purposes.

Affiliation(s)

David Chartash Yale Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, United States
Hyung Paek Information Technology Services, Yale New Haven Health, New Haven, CT, United States
James D Dziura Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
Bill K Ross North Carolina Translational and Clinical Sciences Institute, University of North Carolina School of Medicine, Chapel Hill, NC, United States
Daniel P Nogee Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
Eric Boccio Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
Cory Hines Department of Emergency Medicine, University of North Carolina School of Medicine, Chapel Hill, NC, United States
Aaron M Schott Department of Emergency Medicine, University of North Carolina School of Medicine, Chapel Hill, NC, United States
Molly M Jeffery Department of Emergency Medicine, Mayo Clinic, Rochester, MN, United States; Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
Mehul D Patel Department of Emergency Medicine, University of North Carolina School of Medicine, Chapel Hill, NC, United States
Timothy F Platts-Mills Department of Emergency Medicine, University of North Carolina School of Medicine, Chapel Hill, NC, United States
Osama Ahmed Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
Cynthia Brandt Yale Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, United States; Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
Katherine Couturier Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States
Edward Melnick Department of Emergency Medicine, Yale University School of Medicine, New Haven, CT, United States

Making work visible for electronic phenotype implementation: Lessons learned from the eMERGE network. J Biomed Inform 2019;99:103293. [PMID: 31542521 DOI: 10.1016/j.jbi.2019.103293] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/14/2019] [Revised: 08/26/2019] [Accepted: 09/19/2019] [Indexed: 11/21/2022]

Abstract

BACKGROUND

Implementation of phenotype algorithms requires phenotype engineers to interpret human-readable algorithms and translate the description (text and flowcharts) into computable phenotypes - a process that can be labor intensive and error prone. To address the critical need for reducing the implementation efforts, it is important to develop portable algorithms.

METHODS

We conducted a retrospective analysis of phenotype algorithms developed in the Electronic Medical Records and Genomics (eMERGE) network and identified common customization tasks required for implementation. A novel scoring system was developed to quantify portability from three aspects: Knowledge conversion, clause Interpretation, and Programming (KIP). Tasks were grouped into twenty representative categories. Experienced phenotype engineers were asked to estimate the average time spent on each category and evaluate time saving enabled by a common data model (CDM), specifically the Observational Medical Outcomes Partnership (OMOP) model, for each category.

RESULTS

A total of 485 distinct clauses (phenotype criteria) were identified from 55 phenotype algorithms, corresponding to 1153 customization tasks. In addition to 25 non-phenotype-specific tasks, 46 tasks are related to interpretation, 613 tasks are related to knowledge conversion, and 469 tasks are related to programming. A score between 0 and 2 (0 for easy, 1 for moderate, and 2 for difficult portability) is assigned for each aspect, yielding a total KIP score range of 0 to 6. The average clause-wise KIP score to reflect portability is 1.37 ± 1.38. Specifically, the average knowledge (K) score is 0.64 ± 0.66, interpretation (I) score is 0.33 ± 0.55, and programming (P) score is 0.40 ± 0.64. 5% of the categories can be completed within one hour (median). 70% of the categories take from days to months to complete. The OMOP model can assist with vocabulary mapping tasks.

CONCLUSION

This study presents firsthand knowledge of the substantial implementation efforts in phenotyping and introduces a novel metric (KIP) to measure portability of phenotype algorithms for quantifying such efforts across the eMERGE Network. Phenotype developers are encouraged to analyze and optimize the portability in regards to knowledge, interpretation and programming. CDMs can be used to improve the portability for some 'knowledge-oriented' tasks.

Movaghar A, Page D, Brilliant M, Baker MW, Greenberg J, Hong J, DaWalt LS, Saha K, Kuusisto F, Stewart R, Berry-Kravis E, Mailick MR. Data-driven phenotype discovery of FMR1 premutation carriers in a population-based sample. SCIENCE ADVANCES 2019;5:eaaw7195. [PMID: 31457090 PMCID: PMC6703870 DOI: 10.1126/sciadv.aaw7195] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/18/2019] [Accepted: 07/15/2019] [Indexed: 05/18/2023]

Hripcsak G, Shang N, Peissig PL, Rasmussen LV, Liu C, Benoit B, Carroll RJ, Carrell DS, Denny JC, Dikilitas O, Gainer VS, Howell KM, Klann JG, Kullo IJ, Lingren T, Mentch FD, Murphy SN, Natarajan K, Pacheco JA, Wei WQ, Wiley K, Weng C. Facilitating phenotype transfer using a common data model. J Biomed Inform 2019;96:103253. [PMID: 31325501 DOI: 10.1016/j.jbi.2019.103253] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 03/26/2019] [Revised: 07/11/2019] [Accepted: 07/16/2019] [Indexed: 11/16/2022]

Affiliation(s)

George Hripcsak Department of Biomedical Informatics, Columbia University, New York, NY, United States; Medical Informatics Services, NewYork-Presbyterian Hospital, New York, NY, United States
Ning Shang Department of Biomedical Informatics, Columbia University, New York, NY, United States
Peggy L Peissig Center for Precision Medicine Research, Marshfield Clinic Research Institute, Marshfield, WI, United States
Luke V Rasmussen Northwestern University Feinberg School of Medicine, Chicago, IL, United States
Cong Liu Department of Biomedical Informatics, Columbia University, New York, NY, United States
Barbara Benoit Research Information Science and Computing, Partners Healthcare, Boston, MA, United States
Robert J Carroll Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
David S Carrell Kaiser Permanente Washington Health Research Institute, Seattle, WA, United States
Joshua C Denny Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
Ozan Dikilitas Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, United States
Vivian S Gainer Research Information Science and Computing, Partners Healthcare, Boston, MA, United States
Kayla Marie Howell Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, United States
Jeffrey G Klann Research Information Science and Computing, Partners Healthcare, Boston, MA, United States
Iftikhar J Kullo Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, United States
Todd Lingren Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
Frank D Mentch Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, United States
Shawn N Murphy Research Information Science and Computing, Partners Healthcare, Boston, MA, United States
Karthik Natarajan Department of Biomedical Informatics, Columbia University, New York, NY, United States; Medical Informatics Services, NewYork-Presbyterian Hospital, New York, NY, United States
Jennifer A Pacheco Northwestern University Feinberg School of Medicine, Chicago, IL, United States
Wei-Qi Wei Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
Ken Wiley National Human Genome Research Institute, NIH, Bethesda, MD, United States
Chunhua Weng Department of Biomedical Informatics, Columbia University, New York, NY, United States

Langlotz CP, Allen B, Erickson BJ, Kalpathy-Cramer J, Bigelow K, Cook TS, Flanders AE, Lungren MP, Mendelson DS, Rudie JD, Wang G, Kandarpa K. A Roadmap for Foundational Research on Artificial Intelligence in Medical Imaging: From the 2018 NIH/RSNA/ACR/The Academy Workshop. Radiology 2019;291:781-791. [PMID: 30990384 PMCID: PMC6542624 DOI: 10.1148/radiol.2019190613] [Citation(s) in RCA: 175] [Impact Index Per Article: 35.0] [Received: 03/17/2019] [Revised: 03/24/2019] [Accepted: 03/25/2019] [Indexed: 01/08/2023]

Affiliation(s)

Curtis P. Langlotz Department of Radiology, Stanford University, Stanford, CA 94305
Bibb Allen Department of Radiology, Grandview Medical Center, Birmingham, Ala
Bradley J. Erickson Department of Radiology, Mayo Clinic, Rochester, Minn
Jayashree Kalpathy-Cramer Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, Mass
Keith Bigelow GE Healthcare, Chicago, Ill
Tessa S. Cook Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia, Pa
Adam E. Flanders Department of Radiology, Thomas Jefferson University Hospital, Philadelphia, Pa
Matthew P. Lungren Department of Radiology, Stanford University, Stanford, CA 94305
David S. Mendelson Department of Radiology, Icahn School of Medicine at Mount Sinai, New York, NY
Jeffrey D. Rudie Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia, Pa
Ge Wang Biomedical Imaging Center, Rensselaer Polytechnic Institute, Troy, NY
Krishna Kandarpa National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Washington, DC

Wagholikar KB, Ainsworth L, Vernekar VP, Pathak A, Glynn C, Zelle D, Zagade A, Karipineni N, Herrick CD, McPartlin M, Bui TV, Mendis M, Klann J, Oates M, Gordon W, Cannon C, Patel R, Aronson SJ, MacRae CA, Scirica BM, Murphy SN. Extending i2b2 into a framework for semantic abstraction of EHR to facilitate rapid development and portability of Health IT applications. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2019;2019:370-378. [PMID: 31258990 PMCID: PMC6568124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

Jeong E, Park N, Choi Y, Park RW, Yoon D. Machine learning model combining features from algorithms with different analytical methodologies to detect laboratory-event-related adverse drug reaction signals. PLoS One 2018;13:e0207749. [PMID: 30462745 PMCID: PMC6248973 DOI: 10.1371/journal.pone.0207749] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2018] [Accepted: 11/06/2018] [Indexed: 11/25/2022] Open

Abstract

Background

The importance of identifying and evaluating adverse drug reactions (ADRs) has been widely recognized. Many studies have developed algorithms for ADR signal detection using electronic health record (EHR) data. In this study, we propose a machine learning (ML) model that enables accurate ADR signal detection by integrating features from existing algorithms based on inpatient EHR laboratory results.

Materials and methods

To construct an ADR reference dataset, we extracted known drug–laboratory event pairs represented by a laboratory test from the EU-SPC and SIDER databases. All possible drug–laboratory event pairs, except known ones, are considered unknown. To detect a known drug–laboratory event pair, three existing algorithms (CERT, CLEAR, and PACE) were applied to 21 years of inpatient EHR data. We also constructed ML models (based on random forest, L1 regularized logistic regression, support vector machine, and a neural network) that use the intermediate products of the CERT, CLEAR, and PACE algorithms as inputs and determine whether a drug–laboratory event pair is associated. For performance comparison, we evaluated the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), F1-measure, and area under the receiver operating characteristic curve (AUROC).
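The threshold-based measures named in this paragraph can all be derived from a two-by-two confusion table. The following is a minimal Python sketch of the standard definitions; the function name and the example counts are hypothetical and are not taken from the study. AUROC is omitted because it requires ranked scores rather than a single table.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard binary classification measures from confusion counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=70, fp=20, tn=80, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity is 70/100 and specificity is 80/100; the F1-measure equals 2·TP / (2·TP + FP + FN).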

Results

All measures of ML models outperformed those of existing algorithms with sensitivity of 0.593–0.793, specificity of 0.619–0.796, NPV of 0.645–0.727, PPV of 0.680–0.777, F1-measure of 0.629–0.709, and AUROC of 0.737–0.816. Features related to change or distribution of shape were considered important for detecting ADR signals.

Conclusions

Improved performance of ML models indicated that applying our model to EHR data is feasible and promising for detecting more accurate and comprehensive ADR signals.

Pacheco JA, Rasmussen LV, Kiefer RC, Campion TR, Speltz P, Carroll RJ, Stallings SC, Mo H, Ahuja M, Jiang G, LaRose ER, Peissig PL, Shang N, Benoit B, Gainer VS, Borthwick K, Jackson KL, Sharma A, Wu AY, Kho AN, Roden DM, Pathak J, Denny JC, Thompson WK. A case study evaluating the portability of an executable computable phenotype algorithm across multiple institutions and electronic health record environments. J Am Med Inform Assoc 2018;25:1540-1546. [PMID: 30124903 PMCID: PMC6213083 DOI: 10.1093/jamia/ocy101] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2018] [Revised: 06/13/2018] [Accepted: 07/10/2018] [Indexed: 12/12/2022] Open

Affiliation(s)

Jennifer A Pacheco Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
Luke V Rasmussen Division of Health and Biomedical Informatics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
Richard C Kiefer Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
Thomas R Campion Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
Peter Speltz Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Robert J Carroll Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Sarah C Stallings Meharry-Vanderbilt Alliance, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Huan Mo Department of Pathology, Loma Linda University Health, Loma Linda, California, USA
Monika Ahuja Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
Guoqian Jiang Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
Eric R LaRose Department of Biomedical Informatics, Marshfield Clinic Research Institute, Marshfield, Wisconsin, USA
Peggy L Peissig Department of Biomedical Informatics, Marshfield Clinic Research Institute, Marshfield, Wisconsin, USA
Ning Shang Department of Biomedical Informatics, Columbia University, New York, New York, USA
Barbara Benoit Research IS and Computing, Partners HealthCare, Harvard University, Somerville, Massachusetts, USA
Vivian S Gainer Research IS and Computing, Partners HealthCare, Harvard University, Somerville, Massachusetts, USA
Kenneth Borthwick Henry Hood Center for Health Research, Geisinger, Danville, Pennsylvania, USA
Kathryn L Jackson Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
Ambrish Sharma Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
Andy Yizhou Wu Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
Abel N Kho Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
Dan M Roden Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Pharmacology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Jyotishman Pathak Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, New York, USA
Joshua C Denny Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
William K Thompson Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA

Hripcsak G, Albers DJ. High-fidelity phenotyping: richness and freedom from bias. J Am Med Inform Assoc 2018;25:289-294. [PMID: 29040596 PMCID: PMC7282504 DOI: 10.1093/jamia/ocx110] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2017] [Revised: 08/07/2017] [Accepted: 09/06/2017] [Indexed: 01/14/2023] Open

Glicksberg BS, Miotto R, Johnson KW, Shameer K, Li L, Chen R, Dudley JT. Automated disease cohort selection using word embeddings from Electronic Health Records. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2018;23:145-156. [PMID: 29218877 PMCID: PMC5788312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

Esteban S, Rodríguez Tablado M, Peper FE, Mahumud YS, Ricci RI, Kopitowski KS, Terrasa SA. Development and validation of various phenotyping algorithms for Diabetes Mellitus using data from electronic health records. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017;152:53-70. [PMID: 29054261 DOI: 10.1016/j.cmpb.2017.09.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/30/2017] [Revised: 08/19/2017] [Accepted: 09/13/2017] [Indexed: 06/07/2023]

Abstract

BACKGROUND AND OBJECTIVE

Recent progress towards precision medicine has encouraged the use of electronic health records (EHRs) as a source of the large amounts of data required for studying the effect of treatments or risk factors in more specific subpopulations. Phenotyping algorithms make it possible to classify patients automatically according to their particular electronic phenotype, thus facilitating the setup of retrospective cohorts. Our objective is to compare the performance of different classification strategies (only using standardized problems, rule-based algorithms, statistical learning algorithms (six learners) and stacked generalization (five versions)) for the categorization of patients according to their diabetic status (diabetic, non-diabetic, and inconclusive; diabetes of any type) using information extracted from EHRs.

METHODS

Patient information was extracted from the EHR at Hospital Italiano de Buenos Aires, Buenos Aires, Argentina. For the derivation and validation datasets, two probabilistic samples of patients from different years (2005: n = 1663; 2015: n = 800) were extracted. The only inclusion criterion was age (≥40 & <80 years). Four researchers manually reviewed all records and classified patients according to their diabetic status (diabetic: diabetes registered as a health problem or fulfilling the ADA criteria; non-diabetic: not fulfilling the ADA criteria and having at least one fasting glycemia below 126 mg/dL; inconclusive: no data regarding their diabetic status or only one abnormal value). The best performing algorithms within each strategy were tested on the validation set.

RESULTS

The standardized codes algorithm achieved a Kappa coefficient value of 0.59 (95% CI 0.49, 0.59) in the validation set. The Boolean logic algorithm reached 0.82 (95% CI 0.76, 0.88). A slightly higher value was achieved by the Feedforward Neural Network (0.9, 95% CI 0.85, 0.94). The best performing learner was the stacked generalization meta-learner that reached a Kappa coefficient value of 0.95 (95% CI 0.91, 0.98).

CONCLUSIONS

The stacked generalization strategy and the feedforward neural network showed the best classification metrics in the validation set. The implementation of these algorithms enables the exploitation of the data of thousands of patients accurately.

Martin-Sanchez FJ, Aguiar-Pulido V, Lopez-Campos GH, Peek N, Sacchi L. Secondary Use and Analysis of Big Data Collected for Patient Care. Yearb Med Inform 2017;26:28-37. [PMID: 28480474 PMCID: PMC6239231 DOI: 10.15265/iy-2017-008] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2023] Open

Small AM, Kiss DH, Zlatsin Y, Birtwell DL, Williams H, Guerraty MA, Han Y, Anwaruddin S, Holmes JH, Chirinos JA, Wilensky RL, Giri J, Rader DJ. Text mining applied to electronic cardiovascular procedure reports to identify patients with trileaflet aortic stenosis and coronary artery disease. J Biomed Inform 2017. [PMID: 28624641 DOI: 10.1016/j.jbi.2017.06.016] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]

Abstract

BACKGROUND

Interrogation of the electronic health record (EHR) using billing codes as a surrogate for diagnoses of interest has been widely used for clinical research. However, the accuracy of this methodology is variable, as it reflects billing codes rather than severity of disease, and depends on the disease and the accuracy of the coding practitioner. Systematic application of text mining to the EHR has had variable success for the detection of cardiovascular phenotypes. We hypothesize that the application of text mining algorithms to cardiovascular procedure reports may be a superior method to identify patients with cardiovascular conditions of interest.

METHODS

We adapted the Oracle product Endeca, which utilizes text mining to identify terms of interest from a NoSQL-like database, for purposes of searching cardiovascular procedure reports and termed the tool "PennSeek". We imported 282,569 echocardiography reports representing 81,164 individuals and 27,205 cardiac catheterization reports representing 14,567 individuals from non-searchable databases into PennSeek. We then applied clinical criteria to these reports in PennSeek to identify patients with trileaflet aortic stenosis (TAS) and coronary artery disease (CAD). Accuracy of patient identification by text mining through PennSeek was compared with ICD-9 billing codes.

RESULTS

Text mining identified 7115 patients with TAS and 9247 patients with CAD. ICD-9 codes identified 8272 patients with TAS and 6913 patients with CAD. 4346 patients with AS and 6024 patients with CAD were identified by both approaches. A randomly selected sample of 200-250 patients uniquely identified by text mining was compared with 200-250 patients uniquely identified by billing codes for both diseases. We demonstrate that text mining was superior, with a positive predictive value (PPV) of 0.95 compared to 0.53 by ICD-9 for TAS, and a PPV of 0.97 compared to 0.86 for CAD.
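The overlap between the two identification methods follows from the counts reported above by simple set arithmetic. The Python sketch below derives the number of patients found by only one method and by either method. These derived figures are arithmetic consequences of the reported counts, not values stated in the abstract, and the sketch assumes that the overlap count given for "AS" refers to the same TAS cohort. The function name is hypothetical.

```python
def overlap_breakdown(n_text, n_codes, n_both):
    """Split two overlapping cohort counts into exclusive and combined totals."""
    return {
        "text_only": n_text - n_both,          # found by text mining alone
        "codes_only": n_codes - n_both,        # found by ICD-9 codes alone
        "either": n_text + n_codes - n_both,   # inclusion-exclusion union
    }


# Counts as reported in the abstract (the TAS overlap is assumed to be the "AS" figure).
tas = overlap_breakdown(n_text=7115, n_codes=8272, n_both=4346)
cad = overlap_breakdown(n_text=9247, n_codes=6913, n_both=6024)

print("TAS:", tas)
print("CAD:", cad)
```

The validation samples of 200-250 patients described above were drawn from the two "only" groups, which is why the reported PPVs compare uniquely identified patients rather than the full cohorts.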

CONCLUSION

These results highlight the superiority of text mining algorithms applied to electronic cardiovascular procedure reports in the identification of phenotypes of interest for cardiovascular research.

Affiliation(s)

Aeron M Small Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
Daniel H Kiss Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
Yevgeny Zlatsin Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
David L Birtwell Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
Heather Williams Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
Marie A Guerraty Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
Yuchi Han Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
Saif Anwaruddin Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
John H Holmes Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
Julio A Chirinos Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
Robert L Wilensky Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
Jay Giri Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA
Daniel J Rader Department of Medicine and Cardiovascular Institute, University of Pennsylvania Perelman School of Medicine, PA, USA; Institute for Translational Medicine and Therapeutics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA; Department of Genetics, University of Pennsylvania Perelman School of Medicine, PA, USA.

Kirby JC, Speltz P, Rasmussen LV, Basford M, Gottesman O, Peissig PL, Pacheco JA, Tromp G, Pathak J, Carrell DS, Ellis SB, Lingren T, Thompson WK, Savova G, Haines J, Roden DM, Harris PA, Denny JC. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J Am Med Inform Assoc 2016;23:1046-1052. [PMID: 27026615 PMCID: PMC5070514 DOI: 10.1093/jamia/ocv202] [Citation(s) in RCA: 213] [Impact Index Per Article: 26.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2015] [Revised: 10/27/2015] [Accepted: 11/25/2015] [Indexed: 01/29/2023] Open

Abstract

OBJECTIVE

Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems.

MATERIALS AND METHODS

We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites.

RESULTS

As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%).

DISCUSSION

These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others.

CONCLUSION

By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data.

Malas MS, Wish J, Moorthi R, Grannis S, Dexter P, Duke J, Moe S. A comparison between physicians and computer algorithms for form CMS-2728 data reporting. Hemodial Int 2016;21:117-124. [PMID: 27353890 DOI: 10.1111/hdi.12445] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Tenenbaum JD. Translational Bioinformatics: Past, Present, and Future. GENOMICS PROTEOMICS & BIOINFORMATICS 2016;14:31-41. [PMID: 26876718 PMCID: PMC4792852 DOI: 10.1016/j.gpb.2016.01.003] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/16/2015] [Accepted: 01/20/2016] [Indexed: 02/04/2023]

Klann JG, Phillips LC, Turchin A, Weiler S, Mandl KD, Murphy SN. A numerical similarity approach for using retired Current Procedural Terminology (CPT) codes for electronic phenotyping in the Scalable Collaborative Infrastructure for a Learning Health System (SCILHS). BMC Med Inform Decis Mak 2015;15:104. [PMID: 26655696 PMCID: PMC4676189 DOI: 10.1186/s12911-015-0223-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2015] [Accepted: 11/25/2015] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Interoperable phenotyping algorithms, needed to identify patient cohorts meeting eligibility criteria for observational studies or clinical trials, require medical data in a consistent structured, coded format. Data heterogeneity limits such algorithms' applicability. Existing approaches are often either not widely interoperable or have low sensitivity because they rely on the lowest common denominator (ICD-9 diagnoses). In the Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS) we endeavor to use the widely-available Current Procedural Terminology (CPT) procedure codes with ICD-9. Unfortunately, CPT changes drastically from year to year as codes are retired or replaced. Longitudinal analysis requires grouping retired and current codes. BioPortal provides a navigable CPT hierarchy, which we imported into the Informatics for Integrating Biology and the Bedside (i2b2) data warehouse and analytics platform. However, this hierarchy does not include retired codes.

METHODS

We compared BioPortal's 2014AA CPT hierarchy with Partners Healthcare's SCILHS datamart, comprising three-million patients' data over 15 years. 573 CPT codes were not present in 2014AA (6.5 million occurrences). No existing terminology provided hierarchical linkages for these missing codes, so we developed a method that automatically places missing codes in the most specific "grouper" category, using the numerical similarity of CPT codes. Two informaticians reviewed the results. We incorporated the final table into our i2b2 SCILHS/PCORnet ontology, deployed it at seven sites, and performed a gap analysis and an evaluation against several phenotyping algorithms.
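The abstract does not spell out how numerical similarity is computed, so the following Python sketch shows only one plausible reading and should not be taken as the authors' method. It assumes that each grouper category covers a contiguous numeric range and places a missing code in the narrowest range that contains it. The function name, the grouper names, and the ranges are all hypothetical.

```python
def place_code(code, groupers):
    """Return the narrowest grouper whose numeric range contains the code, or None."""
    if not code.isdigit():
        return None  # alphanumeric codes are out of scope for this sketch
    value = int(code)
    # Collect (range width, name) for every grouper that contains the code.
    candidates = [
        (high - low, name)
        for name, low, high in groupers
        if low <= value <= high
    ]
    if not candidates:
        return None
    return min(candidates)[1]  # smallest width = most specific grouper


# Hypothetical hierarchy: (name, lowest code, highest code).
GROUPERS = [
    ("Surgery", 10000, 69999),
    ("Cardiovascular surgery", 33000, 37999),
    ("Heart and pericardium", 33000, 33999),
]

print(place_code("33250", GROUPERS))  # falls in all three ranges
print(place_code("36000", GROUPERS))  # falls in the two wider ranges
print(place_code("0001T", GROUPERS))  # not purely numeric
```

Choosing the narrowest containing range mirrors the stated goal of placing each code in the most specific grouper, while still leaving the result easy for a reviewer to check by eye.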

RESULTS

The reviewers found the method placed the code correctly with 97% precision when considering only miscategorizations ("correctness precision") and 52% precision using a gold-standard of optimal placement ("optimality precision"). High correctness precision meant that codes were placed in a reasonable hierarchical position that a reviewer can quickly validate. Lower optimality precision meant that codes were not often placed in the optimal hierarchical subfolder. The seven sites encountered few occurrences of codes outside our ontology, 93% of which comprised just four codes. Our hierarchical approach correctly grouped retired and non-retired codes in most cases and extended the temporal reach of several important phenotyping algorithms.

CONCLUSIONS

We developed a simple, easily-validated, automated method to place retired CPT codes into the BioPortal CPT hierarchy. This complements existing hierarchical terminologies, which do not include retired codes. The approach's utility is confirmed by the high correctness precision and successful grouping of retired with non-retired codes.

Borthwick KM, Smelser DT, Bock JA, Elmore JR, Ryer EJ, Ye Z, Pacheco JA, Carrell DS, Michalkiewicz M, Thompson WK, Pathak J, Bielinski SJ, Denny JC, Linneman JG, Peissig PL, Kho AN, Gottesman O, Parmar H, Kullo IJ, McCarty CA, Böttinger EP, Larson EB, Jarvik GP, Harley JB, Bajwa T, Franklin DP, Carey DJ, Kuivaniemi H, Tromp G. ePhenotyping for Abdominal Aortic Aneurysm in the Electronic Medical Records and Genomics (eMERGE) Network: Algorithm Development and Konstanz Information Miner Workflow. INTERNATIONAL JOURNAL OF BIOMEDICAL DATA MINING 2015;4:113. [PMID: 27054044 PMCID: PMC4820287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Abstract

BACKGROUND AND OBJECTIVE

We designed an algorithm to identify abdominal aortic aneurysm cases and controls from electronic health records to be shared and executed within the "electronic Medical Records and Genomics" (eMERGE) Network.

MATERIALS AND METHODS

Structured Query Language was used to script the algorithm utilizing "Current Procedural Terminology" and "International Classification of Diseases" codes, with demographic and encounter data to classify individuals as case, control, or excluded. The algorithm was validated using blinded manual chart review at three eMERGE Network sites and one non-eMERGE Network site. Validation comprised evaluation of an equal number of predicted cases and controls selected at random from the algorithm predictions. After validation at the three eMERGE Network sites, the remaining eMERGE Network sites performed verification only. Finally, the algorithm was implemented as a workflow in the Konstanz Information Miner, which represented the logic graphically while retaining intermediate data for inspection at each node. The algorithm was configured to be independent of specific access to data and was exportable (without data) to other sites.

RESULTS

The algorithm demonstrated positive predictive values (PPV) of 92.8% (CI: 86.8-96.7) and 100% (CI: 97.0-100) for cases and controls, respectively. It performed well also outside the eMERGE Network. Implementation of the transportable executable algorithm as a Konstanz Information Miner workflow required much less effort than implementation from pseudo code, and ensured that the logic was as intended.

DISCUSSION AND CONCLUSION

This ePhenotyping algorithm identifies abdominal aortic aneurysm cases and controls from the electronic health record with high case and control PPV necessary for research purposes, can be disseminated easily, and applied to high-throughput genetic and other studies.

Affiliation(s)

Kenneth M Borthwick The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
Diane T Smelser The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
Jonathan A Bock The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
James R Elmore Department of Vascular and Endovascular Surgery, Geisinger Health System, Danville, PA, USA
Evan J Ryer Department of Vascular and Endovascular Surgery, Geisinger Health System, Danville, PA, USA
Zi Ye Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
Jennifer A. Pacheco Divisions of General Internal Medicine and Preventive Medicine, and the Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
David S. Carrell Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
Michael Michalkiewicz Patient-Centered Research, Aurora Research Institute™, Aurora Sinai Medical Center, Milwaukee, WI, USA
William K Thompson Divisions of General Internal Medicine and Preventive Medicine, and the Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Jyotishman Pathak Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
Suzette J Bielinski Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Joshua C Denny Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
James G Linneman Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
Peggy L Peissig Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA
Abel N Kho Divisions of General Internal Medicine and Preventive Medicine, and the Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Omri Gottesman The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Harpreet Parmar Patient-Centered Research, Aurora Research Institute™, Aurora Sinai Medical Center, Milwaukee, WI, USA
Iftikhar J Kullo Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
Catherine A McCarty Essentia Institute of Rural Health, Duluth, MN, USA
Erwin P Böttinger The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Eric B Larson Group Health Research Institute, Group Health Cooperative, Seattle, WA, USA
Gail P Jarvik Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington, Seattle, WA, USA
John B Harley Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
Tanvir Bajwa Patient-Centered Research, Aurora Research Institute™, Aurora Sinai Medical Center, Milwaukee, WI, USA
David P Franklin Department of Vascular and Endovascular Surgery, Geisinger Health System, Danville, PA, USA
David J Carey The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
Helena Kuivaniemi The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA; Department of Surgery, Temple University School of Medicine, Philadelphia, PA, USA
Gerard Tromp The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA (corresponding author; Tel: (570) 271-5592)

Mo H, Thompson WK, Rasmussen LV, Pacheco JA, Jiang G, Kiefer R, Zhu Q, Xu J, Montague E, Carrell DS, Lingren T, Mentch FD, Ni Y, Wehbe FH, Peissig PL, Tromp G, Larson EB, Chute CG, Pathak J, Denny JC, Speltz P, Kho AN, Jarvik GP, Bejan CA, Williams MS, Borthwick K, Kitchner TE, Roden DM, Harris PA. Desiderata for computable representations of electronic health records-driven phenotype algorithms. J Am Med Inform Assoc 2015;22:1220-30. [PMID: 26342218 PMCID: PMC4639716 DOI: 10.1093/jamia/ocv112] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2015] [Accepted: 06/24/2015] [Indexed: 11/12/2022] Open

Abstract

BACKGROUND

Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM).

METHODS

A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms.

RESULTS

We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility.
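Desiderata (4) and (6) can be illustrated with a small Python sketch that combines set operations with a temporal constraint. This is not a PheRM implementation; the patient identifiers, dates, exclusion list, and the 90-day window are all hypothetical values chosen only to show the idea.

```python
from datetime import date, timedelta

# Hypothetical structured data: patient ID -> date of first qualifying event.
diagnosed = {"p1": date(2014, 1, 10), "p2": date(2014, 3, 5), "p3": date(2014, 6, 1)}
medicated = {"p1": date(2014, 2, 1), "p2": date(2015, 1, 1), "p4": date(2014, 2, 2)}
excluded = {"p3"}  # hypothetical exclusion criterion

# Set operations: patients with both a diagnosis and a medication, minus exclusions.
candidates = (diagnosed.keys() & medicated.keys()) - excluded

# Temporal relation: medication must start within 90 days after the diagnosis.
window = timedelta(days=90)
cases = {
    pid
    for pid in candidates
    if timedelta(0) <= medicated[pid] - diagnosed[pid] <= window
}

print(sorted(candidates))
print(sorted(cases))
```

Keeping the set logic and the temporal rule as separate steps makes each criterion inspectable on its own, which is in the spirit of supporting both human-readable and computable representations.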

CONCLUSION

A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.

Affiliation(s)

Huan Mo Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
William K Thompson Center for Biomedical Research Informatics, NorthShore University HealthSystem, Evanston, IL, USA
Luke V Rasmussen Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Jennifer A Pacheco Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Guoqian Jiang Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Richard Kiefer Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Qian Zhu Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD, USA
Jie Xu Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Enid Montague Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
David S Carrell Group Health Research Institute, Seattle, WA, USA
Todd Lingren Division of Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH, USA
Frank D Mentch Center for Applied Genomics, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
Yizhao Ni Division of Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH, USA
Firas H Wehbe Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Peggy L Peissig Marshfield Clinic Research Foundation, Marshfield Clinic, Marshfield, WI, USA
Gerard Tromp Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, University of Stellenbosch, Cape Town, South Africa
Eric B Larson Group Health Research Institute, Seattle, WA, USA
Christopher G Chute Division of General Internal Medicine, Johns Hopkins University, Baltimore, MD, USA
Jyotishman Pathak Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA; Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
Joshua C Denny Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA
Peter Speltz Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
Abel N Kho Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Gail P Jarvik Department of Medicine (Medical Genetics), University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA
Cosmin A Bejan Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
Marc S Williams Department of Genome Sciences, University of Washington, Seattle, WA, USA
Kenneth Borthwick The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
Terrie E Kitchner Marshfield Clinic Research Foundation, Marshfield Clinic, Marshfield, WI, USA
Dan M Roden Department of Medicine, Vanderbilt University, Nashville, TN, USA; Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
Paul A Harris Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA

Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts. PLoS One 2015;10:e0136651. [PMID: 26301417 PMCID: PMC4547801 DOI: 10.1371/journal.pone.0136651] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 09/29/2014] [Accepted: 08/06/2015] [Indexed: 01/06/2023]

Abstract

Background

Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses that allow the study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; and (2) to study the impact of adding narrative data extracted using natural language processing (NLP) to the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study.

Methods and Results

We studied 3 established EMR-based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g., ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining a PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors.

Conclusions

We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM.

Liao KP, Cai T, Savova GK, Murphy SN, Karlson EW, Ananthakrishnan AN, Gainer VS, Shaw SY, Xia Z, Szolovits P, Churchill S, Kohane I. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. BMJ 2015;350:h1885. [PMID: 25911572 PMCID: PMC4707569 DOI: 10.1136/bmj.h1885] [Citation(s) in RCA: 182] [Impact Index Per Article: 20.2] [Indexed: 01/04/2023]

Crowe CL, Tao C. Designing Ontology-based Patterns for the Representation of the Time-Relevant Eligibility Criteria of Clinical Protocols. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2015;2015:173-7. [PMID: 26306263 PMCID: PMC4525239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

Rasmussen LV, Kiefer RC, Mo H, Speltz P, Thompson WK, Jiang G, Pacheco JA, Xu J, Zhu Q, Denny JC, Montague E, Pathak J. A Modular Architecture for Electronic Health Record-Driven Phenotyping. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2015;2015:147-51. [PMID: 26306258 PMCID: PMC4525215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]

Fu X, Batista-Navarro R, Rak R, Ananiadou S. Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows. J Biomed Semantics 2015;6:8. [PMID: 25789153 PMCID: PMC4364458 DOI: 10.1186/s13326-015-0004-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 11/11/2014] [Accepted: 02/22/2015] [Indexed: 02/03/2023]

Abstract

BACKGROUND

Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients.

METHODS

A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents.

RESULTS

When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors.
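
For readers unfamiliar with the metric, the sketch below shows how a micro-averaged F-score is typically computed: true positives, false positives, and false negatives are pooled across all annotation types before precision and recall are taken. The function name `micro_f_score`, the concept-type labels, and all counts are hypothetical and do not come from the paper.

```python
def micro_f_score(counts):
    """Micro-averaged F1 from per-type (tp, fp, fn) counts.

    `counts` maps an annotation type to a (true positives, false positives,
    false negatives) tuple. Counts are pooled across types first, so frequent
    types weigh more than rare ones.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0  # no correct predictions: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two made-up annotation types.
example_counts = {
    "TypeA": (40, 30, 50),
    "TypeB": (10, 20, 10),
}
score = micro_f_score(example_counts)
print(f"micro-averaged F-score: {score:.4f}")
```

With pooled counts the result equals 2·TP / (2·TP + FP + FN), which gives a quick way to check the value by hand.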

CONCLUSIONS

We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
