1
Tang H, Guo J, Shaaban CE, Feng Z, Wu Y, Magoc T, Hu X, Donahoo WT, DeKosky ST, Bian J. Heterogeneous treatment effects of metformin on risk of dementia in patients with type 2 diabetes: A longitudinal observational study. Alzheimers Dement 2024; 20:975-985. [PMID: 37830443 PMCID: PMC10917005 DOI: 10.1002/alz.13480] [Received: 02/08/2023] [Revised: 08/14/2023] [Accepted: 08/20/2023] [Indexed: 10/14/2023]
Abstract
INTRODUCTION Little is known about the heterogeneous treatment effects of metformin on dementia risk in people with type 2 diabetes (T2D). METHODS Participants (≥ 50 years) with T2D and normal cognition at baseline were identified from the National Alzheimer's Coordinating Center database (2005-2021). We applied a doubly robust learning approach to estimate risk differences (RD) with a 95% confidence interval (CI) for dementia risk between metformin use and no use in the overall population and subgroups identified through a decision tree model. RESULTS Among 1393 participants, 104 developed dementia over a 4-year median follow-up. Metformin was significantly associated with a lower risk of dementia in the overall population (RD, -3.2%; 95% CI, -6.2% to -0.2%). We identified four subgroups with varied risks for dementia, defined by neuropsychiatric disorders, non-steroidal anti-inflammatory drugs, and antidepressant use. DISCUSSION Metformin use was significantly associated with a lower risk of dementia in individuals with T2D, with significant variability among subgroups.
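The abstract does not say which doubly robust estimator was used. As a minimal sketch, assuming the common augmented inverse-probability-weighted (AIPW) form, a risk difference and a normal-approximation 95% CI could be computed as follows. The function name and its inputs (propensity scores `e` and outcome-model predictions `mu1`, `mu0`) are hypothetical and would come from separately fitted nuisance models.

```python
from math import sqrt


def aipw_risk_difference(y, t, e, mu1, mu0):
    """Hypothetical AIPW estimate of a risk difference with a 95% CI.

    y   : observed binary outcomes (1 = developed the outcome)
    t   : treatment indicators (1 = treated, 0 = untreated)
    e   : estimated propensity scores P(T=1 | X), strictly between 0 and 1
    mu1 : predicted outcome risk under treatment for each participant
    mu0 : predicted outcome risk under no treatment for each participant
    """
    n = len(y)
    # Per-participant doubly robust score: outcome-model contrast plus
    # inverse-probability-weighted residual corrections.
    scores = [
        m1 - m0 + ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
        for yi, ti, ei, m1, m0 in zip(y, t, e, mu1, mu0)
    ]
    rd = sum(scores) / n
    # Sample variance of the scores gives a normal-approximation CI.
    var = sum((s - rd) ** 2 for s in scores) / (n - 1)
    half_width = 1.96 * sqrt(var / n)
    return rd, (rd - half_width, rd + half_width)
```

The estimate stays consistent if either the propensity model or the outcome model is correctly specified, which is the sense in which such estimators are called doubly robust.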
Affiliation(s)
- Huilin Tang
  - Department of Pharmaceutical Outcomes and Policy, University of Florida College of Pharmacy, Gainesville, Florida, USA
- Jingchuan Guo
  - Department of Pharmaceutical Outcomes and Policy, University of Florida College of Pharmacy, Gainesville, Florida, USA
  - Center for Drug Evaluation and Safety, University of Florida, Gainesville, Florida, USA
- C. Elizabeth Shaaban
  - Department of Epidemiology, School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
  - Alzheimer's Disease Research Center, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Zheng Feng
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Tanja Magoc
  - Clinical and Translational Science Institute, University of Florida, Gainesville, Florida, USA
- Xia Hu
  - DATA Lab, Department of Computer Science, Rice University, Houston, Texas, USA
- William T Donahoo
  - Department of Medicine, College of Medicine, University of Florida, Gainesville, Florida, USA
- Steven T. DeKosky
  - Department of Neurology and McKnight Brain Institute, College of Medicine, University of Florida, Gainesville, Florida, USA
  - Florida Alzheimer's Disease Research Center (ADRC), University of Florida, Gainesville, Florida, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
2
Solberg LM, Duckworth LJ, Dunn EM, Dickinson T, Magoc T, Snigurska UA, Ser SE, Celso B, Bailey M, Bowen C, Radhakrishnan N, Patel CR, Lucero R, Bjarnadottir RI. Use of a Data Repository to Identify Delirium as a Presenting Symptom of COVID-19 Infection in Hospitalized Adults: Cross-Sectional Cohort Pilot Study. JMIR Aging 2023; 6:e43185. [PMID: 37910448 PMCID: PMC10722366 DOI: 10.2196/43185] [Received: 10/03/2022] [Revised: 07/06/2023] [Accepted: 09/28/2023] [Indexed: 11/03/2023]
Abstract
BACKGROUND Delirium, an acute confusional state characterized by inattention, has been reported to occur in 10% to 50% of patients with COVID-19. People hospitalized with COVID-19 have been noted to present with or develop delirium and neurocognitive disorders. Caring for patients with delirium is associated with a greater burden for nurses, clinicians, and caregivers. Using information in electronic health record data to recognize delirium and possibly COVID-19 could lead to earlier treatment of the underlying viral infection and improve clinical outcomes and health care system costs per patient. Clinical data repositories can further support rapid discovery through cohort identification tools, such as the Informatics for Integrating Biology and the Bedside tool. OBJECTIVE The specific aim of this research was to investigate delirium in hospitalized older adults as a possible presenting symptom in COVID-19 using a data repository to identify neurocognitive disorders with a novel group of International Classification of Diseases, Tenth Revision (ICD-10) codes. METHODS We analyzed data from 2 catchment areas with different demographics. The first catchment area (7 counties in North-Central Florida) is predominantly rural, while the second (1 county in North Florida) is predominantly urban. The Informatics for Integrating Biology and the Bedside data repository was queried for patients with COVID-19 admitted to inpatient units via the emergency department (ED) within the health center between April 1, 2020, and April 1, 2022. Patients with COVID-19 were identified by having a positive COVID-19 laboratory test or a diagnosis code of U07.1. We identified neurocognitive disorders as delirium or encephalopathy, using ICD-10 codes. RESULTS Less than one-third (1437/4828, 29.8%) of patients with COVID-19 were diagnosed with a co-occurring neurocognitive disorder. A neurocognitive disorder was present on admission for 15.8% (762/4828) of all patients with COVID-19 admitted through the ED. 
Among patients with both COVID-19 and a neurocognitive disorder, 56.9% (817/1437) were aged ≥65 years, a significantly higher proportion than those with no neurocognitive disorder (P<.001). The proportion of patients aged <65 years was significantly higher among patients diagnosed with encephalopathy only than among patients diagnosed with delirium only or with both delirium and encephalopathy (P<.001). More than one-quarter (1272/4828, 26.3%) of patients with COVID-19 admitted through the ED during our study period were admitted during the Delta variant peak. CONCLUSIONS The data collected demonstrated that an increased number of older patients with neurocognitive disorder present on admission were infected with COVID-19. Knowing that delirium increases the staffing, nursing care needs, hospital resources used, and the length of stay as previously noted, identifying delirium early may benefit hospital administration when planning for newly anticipated COVID-19 surges. A robust and accessible data repository, such as the one used in this study, can provide invaluable support to clinicians and clinical administrators in such resource reallocation and clinical decision-making.
Affiliation(s)
- Laurence M Solberg
  - Geriatrics Research, Education, and Clinical Center, North Florida/South Georgia Veterans Health System, Veterans Health Administration, Gainesville, FL, United States
  - College of Nursing, University of Florida, Gainesville, FL, United States
- Laurie J Duckworth
  - College of Nursing, University of Florida, Gainesville, FL, United States
  - Shands Hospital, UF Health, Gainesville, FL, United States
- Tanja Magoc
  - Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, United States
- Sarah E Ser
  - Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, United States
  - College of Medicine, University of Florida, Gainesville, FL, United States
- Brian Celso
  - College of Medicine, University of Florida, Jacksonville, FL, United States
- Meghan Bailey
  - Shands Hospital, UF Health, Gainesville, FL, United States
- Courtney Bowen
  - Shands Hospital, UF Health, Gainesville, FL, United States
- Nila Radhakrishnan
  - College of Medicine, University of Florida, Gainesville, FL, United States
- Chirag R Patel
  - College of Medicine, University of Florida, Jacksonville, FL, United States
- Robert Lucero
  - College of Nursing, University of Florida, Gainesville, FL, United States
  - School of Nursing, University of California Los Angeles, Los Angeles, CA, United States
3
Peng C, Yang X, Chen A, Smith KE, PourNejatian N, Costa AB, Martin C, Flores MG, Zhang Y, Magoc T, Lipori G, Mitchell DA, Ospina NS, Ahmed MM, Hogan WR, Shenkman EA, Guo Y, Bian J, Wu Y. A study of generative large language model for medical research and healthcare. NPJ Digit Med 2023; 6:210. [PMID: 37973919 PMCID: PMC10654385 DOI: 10.1038/s41746-023-00958-w] [Received: 06/05/2023] [Accepted: 11/01/2023] [Indexed: 11/19/2023]
Abstract
There is enormous enthusiasm for, and concern about, applying large language models (LLMs) to healthcare. Yet current assumptions are based on general-purpose LLMs such as ChatGPT, which are not developed for medical use. This study develops a generative clinical LLM, GatorTronGPT, using 277 billion words of text including (1) 82 billion words of clinical text from 126 clinical departments and approximately 2 million patients at the University of Florida Health and (2) 195 billion words of diverse general English text. We train GatorTronGPT using a GPT-3 architecture with up to 20 billion parameters and evaluate its utility for biomedical natural language processing (NLP) and healthcare text generation. GatorTronGPT improves biomedical NLP. We apply GatorTronGPT to generate 20 billion words of synthetic text. NLP models trained using synthetic text generated by GatorTronGPT outperform models trained using real-world clinical text. A physicians' Turing test using a scale of 1 (worst) to 9 (best) shows that there are no significant differences in linguistic readability (p = 0.22; 6.57 for GatorTronGPT compared with 6.93 for human) and clinical relevance (p = 0.91; 7.0 for GatorTronGPT compared with 6.97 for human) and that physicians cannot differentiate them (p < 0.001). This study provides insights into the opportunities and challenges of LLMs for medical research and healthcare.
Affiliation(s)
- Cheng Peng
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Xi Yang
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Aokun Chen
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Ying Zhang
  - Research Computing, University of Florida, Gainesville, FL, USA
- Tanja Magoc
  - Integrated Data Repository Research Services, University of Florida, Gainesville, FL, USA
- Gloria Lipori
  - Integrated Data Repository Research Services, University of Florida, Gainesville, FL, USA
  - Lillian S. Wells Department of Neurosurgery, Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
- Duane A Mitchell
  - Lillian S. Wells Department of Neurosurgery, Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
- Naykky S Ospina
  - Division of Endocrinology, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA
- Mustafa M Ahmed
  - Division of Cardiovascular Medicine, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA
- William R Hogan
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Elizabeth A Shenkman
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Yi Guo
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
4
Ser SE, Shear K, Snigurska UA, Prosperi M, Wu Y, Magoc T, Bjarnadottir RI, Lucero RJ. Clinical Prediction Models for Hospital-Induced Delirium Using Structured and Unstructured Electronic Health Record Data: Protocol for a Development and Validation Study. JMIR Res Protoc 2023; 12:e48521. [PMID: 37943599 PMCID: PMC10667972 DOI: 10.2196/48521] [Received: 04/26/2023] [Revised: 09/01/2023] [Accepted: 09/05/2023] [Indexed: 11/10/2023]
Abstract
BACKGROUND Hospital-induced delirium is one of the most common and costly iatrogenic conditions, and its incidence is predicted to increase as the population of the United States ages. An academic and clinical interdisciplinary systems approach is needed to reduce the frequency and impact of hospital-induced delirium. OBJECTIVE The long-term goal of our research is to enhance the safety of hospitalized older adults by reducing iatrogenic conditions through an effective learning health system. In this study, we will develop models for predicting hospital-induced delirium. To accomplish this objective, we will create a computable phenotype for our outcome (hospital-induced delirium), design an expert-based traditional logistic regression model, leverage machine learning techniques to generate a model using structured data, and use machine learning and natural language processing to produce an integrated model with components from both structured data and text data. METHODS This study will explore text-based data, such as nursing notes, to improve the predictive capability of prognostic models for hospital-induced delirium. By using supervised and unsupervised text mining in addition to structured data, we will examine multiple types of information in electronic health record data to predict medical-surgical patient risk of developing delirium. Development and validation will be compliant with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. RESULTS Work on this project will take place through March 2024. For this study, we will use data from approximately 332,230 encounters that occurred between January 2012 and May 2021. Findings from this project will be disseminated at scientific conferences and in peer-reviewed journals. 
CONCLUSIONS Success in this study will yield a durable, high-performing research-data infrastructure that will process, extract, and analyze clinical text data in near real time. This model has the potential to be integrated into the electronic health record and provide point-of-care decision support to prevent harm and improve quality of care. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/48521.
Affiliation(s)
- Sarah E Ser
  - Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, United States
- Kristen Shear
  - Department of Family, Community, and Health Systems Science, College of Nursing, University of Florida, Gainesville, FL, United States
- Urszula A Snigurska
  - Department of Family, Community, and Health Systems Science, College of Nursing, University of Florida, Gainesville, FL, United States
- Mattia Prosperi
  - Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, United States
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Tanja Magoc
  - Integrated Data Repository Research Services, University of Florida, Gainesville, FL, United States
- Ragnhildur I Bjarnadottir
  - Department of Family, Community, and Health Systems Science, College of Nursing, University of Florida, Gainesville, FL, United States
- Robert J Lucero
  - Department of Family, Community, and Health Systems Science, College of Nursing, University of Florida, Gainesville, FL, United States
  - School of Nursing, University of California Los Angeles, Los Angeles, CA, United States
5
Zapata RD, Huang S, Morris E, Wang C, Harle C, Magoc T, Mardini M, Loftus T, Modave F. Machine learning-based prediction models for home discharge in patients with COVID-19: Development and evaluation using electronic health records. PLoS One 2023; 18:e0292888. [PMID: 37862334 PMCID: PMC10588875 DOI: 10.1371/journal.pone.0292888] [Received: 03/17/2023] [Accepted: 09/30/2023] [Indexed: 10/22/2023]
Abstract
OBJECTIVE This study aimed to develop and validate predictive models using electronic health records (EHR) data to determine whether hospitalized COVID-19-positive patients would be admitted to alternative medical care or discharged home. METHODS We conducted a retrospective cohort study using deidentified data from the University of Florida Health Integrated Data Repository. The study included 1,578 adult patients (≥18 years) who tested positive for COVID-19 while hospitalized, comprising 960 (60.8%) female patients with a mean (SD) age of 51.86 (18.49) years and 618 (39.2%) male patients with a mean (SD) age of 54.35 (18.48) years. Machine learning (ML) model training involved cross-validation to assess their performance in predicting patient disposition. RESULTS We developed and validated six supervised ML-based prediction models (logistic regression, Gaussian Naïve Bayes, k-nearest neighbors, decision trees, random forest, and support vector machine classifier) to predict patient discharge status. The models were evaluated based on the area under the receiver operating characteristic curve (ROC-AUC), precision, accuracy, F1 score, and Brier score. The random forest classifier exhibited the highest performance, achieving an accuracy of 0.84 and an AUC of 0.72. Logistic regression (accuracy: 0.85, AUC: 0.71), k-nearest neighbor (accuracy: 0.84, AUC: 0.63), decision tree (accuracy: 0.84, AUC: 0.61), Gaussian Naïve Bayes (accuracy: 0.84, AUC: 0.66), and support vector machine classifier (accuracy: 0.84, AUC: 0.67) also demonstrated valuable predictive capabilities. SIGNIFICANCE This study's findings are crucial for efficiently allocating healthcare resources during pandemics like COVID-19. By harnessing ML techniques and EHR data, we can create predictive tools to identify patients at greater risk of severe symptoms based on their medical histories. 
The models developed here serve as a foundation for expanding the toolkit available to healthcare professionals and organizations. Additionally, explainable ML methods, such as Shapley Additive Explanations, aid in uncovering underlying data features that inform healthcare decision-making processes.
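Two of the evaluation measures named in this abstract, the Brier score and ROC-AUC, can be illustrated with a short sketch. The function names and the toy labels and probabilities in the usage note are hypothetical; none of it is taken from the study's data or code.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def roc_auc(y_true, p_pred):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))
```

With hypothetical labels `[1, 0, 1, 0]` and predicted probabilities `[0.9, 0.1, 0.4, 0.6]`, three of the four positive-negative pairs are ordered correctly. Accuracy and AUC can diverge when classes are imbalanced, which is one reason to report both.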
Affiliation(s)
- Ruben D. Zapata
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, United States of America
- Shu Huang
  - Department of Pharmaceutical Outcomes and Policy, University of Florida College of Pharmacy, Gainesville, FL, United States of America
- Earl Morris
  - Department of Pharmaceutical Outcomes and Policy, University of Florida College of Pharmacy, Gainesville, FL, United States of America
- Chang Wang
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, United States of America
- Christopher Harle
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, United States of America
  - Clinical and Translational Science Institute, University of Florida, Gainesville, FL, United States of America
- Tanja Magoc
  - Clinical and Translational Science Institute, University of Florida, Gainesville, FL, United States of America
- Mamoun Mardini
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, United States of America
- Tyler Loftus
  - Department of Surgery, University of Florida College of Medicine, Gainesville, FL, United States of America
- François Modave
  - Department of Health Outcomes and Biomedical Informatics, University of Florida College of Medicine, Gainesville, FL, United States of America
  - Department of Anesthesiology, University of Florida College of Medicine, Gainesville, FL, United States of America
6
Snigurska UA, Ser SE, Solberg LM, Prosperi M, Magoc T, Chen Z, Bian J, Bjarnadottir RI, Lucero RJ. Application of a practice-based approach in variable selection for a prediction model development study of hospital-induced delirium. BMC Med Inform Decis Mak 2023; 23:181. [PMID: 37704994 PMCID: PMC10500854 DOI: 10.1186/s12911-023-02278-1] [Received: 05/08/2023] [Accepted: 08/30/2023] [Indexed: 09/15/2023]
Abstract
BACKGROUND Prognostic models of hospital-induced delirium that include potential predisposing and precipitating factors may be used to identify vulnerable patients and inform the implementation of tailored preventive interventions. It is recommended that, in prediction model development studies, candidate predictors are selected on the basis of existing knowledge, including knowledge from clinical practice. The purpose of this article is to describe the process of identifying and operationalizing candidate predictors of hospital-induced delirium for application in a prediction model development study using a practice-based approach. METHODS This study is part of a larger, retrospective cohort study that is developing prognostic models of hospital-induced delirium for medical-surgical older adult patients using structured data from administrative and electronic health records. First, we conducted a review of the literature to identify clinical concepts that had been used as candidate predictors in prognostic model development-and-validation studies of hospital-induced delirium. Then, we consulted a multidisciplinary task force of nine members who independently judged whether each clinical concept was associated with hospital-induced delirium. Finally, we mapped the clinical concepts to the administrative and electronic health records and operationalized our candidate predictors. RESULTS In the review of 34 studies, we identified 504 unique clinical concepts. Two-thirds of the clinical concepts (337/504) were used as candidate predictors only once. The most common clinical concepts included age (31/34), sex (29/34), and alcohol use (22/34). Of the clinical concepts, 96% (484/504) were judged to be associated with the development of hospital-induced delirium by at least two members of the task force. All of the task force members agreed that 47 (9%) of the 504 clinical concepts were associated with hospital-induced delirium. 
CONCLUSIONS Heterogeneity among candidate predictors of hospital-induced delirium in the literature suggests a still-evolving list of factors that contribute to the development of this complex phenomenon. We demonstrated a practice-based approach to variable selection for our model development study of hospital-induced delirium. Expert judgement of variables enabled us to categorize the variables based on the amount of agreement among the experts and plan for the development of different models, including an expert model and a data-driven model.
Affiliation(s)
- Urszula A Snigurska
  - College of Nursing, Department of Family, Community, and Health Systems Science, University of Florida, 1225 Center Drive, PO Box 100197, Gainesville, FL, 32610, United States of America
- Sarah E Ser
  - College of Public Health and Health Professions & College of Medicine, Department of Epidemiology, University of Florida, 2004 Mowry Rd, Gainesville, FL, 32610, United States of America
- Laurence M Solberg
  - College of Nursing, Department of Family, Community, and Health Systems Science, University of Florida, 1225 Center Drive, PO Box 100197, Gainesville, FL, 32610, United States of America
  - Geriatrics Research, Education, and Clinical Center (GRECC), North Florida/South Georgia Veterans Health System, 1601 SW Archer Rd, Gainesville, FL, 32608, United States of America
  - College of Medicine, University of Central Florida, 6850 Lake Nona Blvd, Orlando, FL, 32827, United States of America
- Mattia Prosperi
  - College of Public Health and Health Professions & College of Medicine, Department of Epidemiology, University of Florida, 2004 Mowry Rd, Gainesville, FL, 32610, United States of America
- Tanja Magoc
  - Clinical and Translational Science Institute (CTSI), Integrated Data Repository Research Services, University of Florida, 3300 SW Williston Rd, Gainesville, FL, 32608, United States of America
- Zhaoyi Chen
  - College of Medicine, Department of Health Outcomes & Biomedical Informatics, University of Florida, 2004 Mowry Rd, Gainesville, FL, 32610, United States of America
- Jiang Bian
  - College of Medicine, Department of Health Outcomes & Biomedical Informatics, University of Florida, 2004 Mowry Rd, Gainesville, FL, 32610, United States of America
- Ragnhildur I Bjarnadottir
  - College of Nursing, Department of Family, Community, and Health Systems Science, University of Florida, 1225 Center Drive, PO Box 100197, Gainesville, FL, 32610, United States of America
- Robert J Lucero
  - College of Nursing, Department of Family, Community, and Health Systems Science, University of Florida, 1225 Center Drive, PO Box 100197, Gainesville, FL, 32610, United States of America
  - School of Nursing, University of California Los Angeles, 700 Tiverton Ave, Los Angeles, CA, 90095, United States of America
7
Magoc T, Allen KS, McDonnell C, Russo JP, Cummins J, Vest JR, Harle CA. Generalizability and portability of natural language processing system to extract individual social risk factors. Int J Med Inform 2023; 177:105115. [PMID: 37302362 DOI: 10.1016/j.ijmedinf.2023.105115] [Received: 04/08/2023] [Revised: 05/15/2023] [Accepted: 05/30/2023] [Indexed: 06/13/2023]
Abstract
OBJECTIVE The objective of this study is to validate and report on the portability and generalizability of a Natural Language Processing (NLP) method, originally developed at a different institution, for extracting individual social factors from clinical notes. MATERIALS AND METHODS A rule-based deterministic state machine NLP model was developed to extract financial insecurity and housing instability using notes from one institution and was applied to all notes written during 6 months at another institution. Ten percent of the notes classified as positive by the NLP model and the same number of notes classified as negative were manually annotated. The NLP model was adjusted to accommodate notes at the new site. Accuracy, positive predictive value, sensitivity, and specificity were calculated. RESULTS More than 6 million notes were processed at the receiving site by the NLP model, which resulted in about 13,000 and 19,000 notes classified as positive for financial insecurity and housing instability, respectively. The NLP model showed excellent performance on the validation dataset, with all measures over 0.87 for both social factors. DISCUSSION Our study illustrated the need to accommodate institution-specific note-writing templates as well as clinical terminology of emergent diseases when applying an NLP model for social factors. A state machine is relatively simple to port effectively across institutions. Our study showed superior performance to similar generalizability studies for extracting social factors. CONCLUSION A rule-based NLP model for extracting social factors from clinical notes showed strong portability and generalizability across organizationally and geographically distinct institutions. With only relatively simple modifications, we obtained promising performance from an NLP-based model.
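The four validation measures named in the methods follow directly from the counts of a manually annotated confusion matrix. A minimal sketch, assuming hypothetical counts (the abstract reports only that all measures exceeded 0.87, not the underlying counts):

```python
def validation_metrics(tp, fp, tn, fn):
    """Compute validation measures from confusion-matrix counts.

    tp / fp : notes the NLP model flagged positive that annotators
              judged positive / negative
    tn / fn : notes the NLP model flagged negative that annotators
              judged negative / positive
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),           # positive predictive value
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }
```

Because the annotated sample here is balanced by design (equal numbers of positively and negatively classified notes), a positive predictive value computed this way reflects the sample rather than the prevalence in the full note corpus.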
Affiliation(s)
- Tanja Magoc
  - College of Medicine, University of Florida, Gainesville, FL, USA
- Katie S Allen
  - Regenstrief Institute, Inc., Indianapolis, IN, USA; Richard M. Fairbanks School of Public Health, IUPUI, Indianapolis, IN, USA
- Cara McDonnell
  - College of Medicine, University of Florida, Gainesville, FL, USA
- Jean-Paul Russo
  - College of Medicine, University of Florida, Gainesville, FL, USA; Miller School of Medicine, University of Miami, Miami, FL, USA
- Joshua R Vest
  - Regenstrief Institute, Inc., Indianapolis, IN, USA; Richard M. Fairbanks School of Public Health, IUPUI, Indianapolis, IN, USA
- Christopher A Harle
  - Regenstrief Institute, Inc., Indianapolis, IN, USA; Richard M. Fairbanks School of Public Health, IUPUI, Indianapolis, IN, USA
8
Magoc T, Everson R, Harle CA. Enhancing an enterprise data warehouse for research with data extracted using natural language processing. J Clin Transl Sci 2023; 7:e149. [PMID: 37456264 PMCID: PMC10346024 DOI: 10.1017/cts.2023.575] [Received: 03/11/2023] [Revised: 05/14/2023] [Accepted: 05/31/2023] [Indexed: 07/18/2023]
Abstract
OBJECTIVE This study aims to develop a generalizable architecture for enhancing an enterprise data warehouse for research (EDW4R) with results from a natural language processing (NLP) model, which allows discrete data derived from clinical notes to be made broadly available for research use without the need for NLP expertise. The study also quantifies the additional value that information extracted from clinical narratives brings to the EDW4R. MATERIALS AND METHODS Clinical notes written during one month at an academic health center were used to evaluate the performance of an existing NLP model and to quantify its value added to the structured data. Manual review was used for performance analysis. The architecture for enhancing the EDW4R is described in detail to enable reproducibility. RESULTS Two weeks were needed to enhance the EDW4R with data from 250 million clinical notes. NLP generated 16% and 39% increases in data availability for two variables. DISCUSSION Our architecture is highly generalizable to a new NLP model. The positive predictive value obtained by an independent team showed only slightly lower NLP performance than the values reported by the NLP developers. The NLP showed significant value added to data already available in structured format. CONCLUSION Given the value added by data extracted using NLP, it is important to enhance the EDW4R with these data so that research teams without NLP expertise can benefit from NLP models.
Affiliation(s)
- Tanja Magoc
  - College of Medicine, University of Florida, Gainesville, FL, USA
9
Yang X, Chen A, PourNejatian N, Shin HC, Smith KE, Parisien C, Compas C, Martin C, Costa AB, Flores MG, Zhang Y, Magoc T, Harle CA, Lipori G, Mitchell DA, Hogan WR, Shenkman EA, Bian J, Wu Y. A large language model for electronic health records. NPJ Digit Med 2022; 5:194. [PMID: 36572766 PMCID: PMC9792464 DOI: 10.1038/s41746-022-00742-2] [Citation(s) in RCA: 83] [Impact Index Per Article: 41.5] [Received: 06/21/2022] [Accepted: 12/13/2022] [Indexed: 12/27/2022]
Abstract
There is an increasing interest in developing artificial intelligence (AI) systems to process and interpret electronic health records (EHRs). Natural language processing (NLP) powered by pretrained language models is the key technology for medical AI systems utilizing clinical narratives. However, there are few clinical language models, the largest of which trained in the clinical domain is comparatively small at 110 million parameters (compared with billions of parameters in the general domain). It is not clear how large clinical language models with billions of parameters can help medical AI systems utilize unstructured EHRs. In this study, we develop from scratch a large clinical language model, GatorTron, using >90 billion words of text (including >82 billion words of de-identified clinical text) and systematically evaluate it on five clinical NLP tasks including clinical concept extraction, medical relation extraction, semantic textual similarity, natural language inference (NLI), and medical question answering (MQA). We examine how (1) scaling up the number of parameters and (2) scaling up the size of the training data could benefit these NLP tasks. GatorTron models scale up the clinical language model from 110 million to 8.9 billion parameters and improve five clinical NLP tasks (e.g., 9.6% and 9.5% improvement in accuracy for NLI and MQA), which can be applied to medical AI systems to improve healthcare delivery. The GatorTron models are publicly available at: https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/models/gatortron_og .
Affiliation(s)
- Xi Yang
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics and eHealth core, University of Florida Health Cancer Center, Gainesville, FL, USA
- Aokun Chen
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics and eHealth core, University of Florida Health Cancer Center, Gainesville, FL, USA
- Ying Zhang
  - Research Computing, University of Florida, Gainesville, FL, USA
- Tanja Magoc
  - Integrated Data Repository Research Services, University of Florida, Gainesville, FL, USA
- Christopher A Harle
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Integrated Data Repository Research Services, University of Florida, Gainesville, FL, USA
- Gloria Lipori
  - Integrated Data Repository Research Services, University of Florida, Gainesville, FL, USA
  - Lillian S. Wells Department of Neurosurgery, UF Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
- Duane A Mitchell
  - Lillian S. Wells Department of Neurosurgery, UF Clinical and Translational Science Institute, University of Florida, Gainesville, FL, USA
- William R Hogan
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Elizabeth A Shenkman
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics and eHealth core, University of Florida Health Cancer Center, Gainesville, FL, USA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics and eHealth core, University of Florida Health Cancer Center, Gainesville, FL, USA
10
Peng L, Luo G, Walker A, Zaiman Z, Jones EK, Gupta H, Kersten K, Burns JL, Harle CA, Magoc T, Shickel B, Steenburg SD, Loftus T, Melton GB, Gichoya JW, Sun J, Tignanelli CJ. Evaluation of federated learning variations for COVID-19 diagnosis using chest radiographs from 42 US and European hospitals. J Am Med Inform Assoc 2022; 30:54-63. [PMID: 36214629 PMCID: PMC9619688 DOI: 10.1093/jamia/ocac188] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/25/2022] [Revised: 08/31/2022] [Accepted: 10/07/2022] [Indexed: 12/31/2022]
Abstract
OBJECTIVE Federated learning (FL) allows multiple distributed data holders to collaboratively learn a shared model without data sharing. However, individual health system data are heterogeneous. "Personalized" FL variations have been developed to counter data heterogeneity, but few have been evaluated using real-world healthcare data. The purpose of this study is to investigate the performance of a single-site versus a 3-client federated model using a previously described Coronavirus Disease 19 (COVID-19) diagnostic model. Additionally, to investigate the effect of system heterogeneity, we evaluate the performance of 4 FL variations. MATERIALS AND METHODS We leverage a FL healthcare collaborative including data from 5 international healthcare systems (US and Europe) encompassing 42 hospitals. We implemented a COVID-19 computer vision diagnosis system using the Federated Averaging (FedAvg) algorithm implemented on Clara Train SDK 4.0. To study the effect of data heterogeneity, training data was pooled from 3 systems locally and federation was simulated. We compared a centralized/pooled model, versus FedAvg, and 3 personalized FL variations (FedProx, FedBN, and FedAMP). RESULTS We observed comparable model performance with respect to internal validation (local model: AUROC 0.94 vs FedAvg: 0.95, P = .5) and improved model generalizability with the FedAvg model (P < .05). When investigating the effects of model heterogeneity, we observed poor performance with FedAvg on internal validation as compared to personalized FL algorithms. FedAvg did have improved generalizability compared to personalized FL algorithms. On average, FedBN had the best rank performance on internal and external validation. CONCLUSION FedAvg can significantly improve the generalization of the model compared to other personalization FL algorithms; however, at the cost of poor internal validity. Personalized FL may offer an opportunity to develop both internal and externally validated algorithms.
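For readers unfamiliar with the Federated Averaging (FedAvg) step named in this abstract, the following is a minimal sketch of the aggregation idea only: each client's parameters are averaged, weighted by that client's number of training samples. It is not the authors' Clara Train SDK implementation; the function name `fed_avg`, the dictionary-of-lists weight layout, and the sample counts are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average each parameter across clients, weighted by client sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            # Weighted sum of the i-th value of this parameter over all clients.
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# Illustrative values only: two clients holding 1 and 3 samples.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
global_weights = fed_avg(clients, [1, 3])
```

In a full training loop this aggregation would be repeated every round after local training on each client; the personalized variants named in the abstract (FedProx, FedBN, FedAMP) change either the local objective or which parameters are shared, and are not shown here.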
Affiliation(s)
- Le Peng
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA
- Gaoxiang Luo
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA
- Andrew Walker
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA
- Zachary Zaiman
  - Department of Computer Science, Emory University, Atlanta, Georgia, USA
- Emma K Jones
  - Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA
- Hemant Gupta
  - Fairview Health Services, Minneapolis, Minnesota, USA
- John L Burns
  - The School of Medicine, Indiana University, Indianapolis, Indiana, USA
- Christopher A Harle
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Tanja Magoc
  - University of Florida College of Medicine, Gainesville, Florida, USA
- Benjamin Shickel
  - Department of Medicine, University of Florida, Gainesville, Florida, USA
  - Intelligent Critical Care Center, University of Florida, Gainesville, Florida, USA
- Scott D Steenburg
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Tyler Loftus
  - Intelligent Critical Care Center, University of Florida, Gainesville, Florida, USA
  - Department of Surgery, University of Florida, Gainesville, Florida, USA
- Genevieve B Melton
  - Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA
  - Fairview Health Services, Minneapolis, Minnesota, USA
  - Center for Learning Health System Sciences, University of Minnesota, Minneapolis, Minnesota, USA
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Ju Sun
  - Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, USA
- Christopher J Tignanelli
  - Department of Surgery, University of Minnesota, Minneapolis, Minnesota, USA
  - Center for Learning Health System Sciences, University of Minnesota, Minneapolis, Minnesota, USA
  - Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
11
Kostka K, Duarte-Salles T, Prats-Uribe A, Sena AG, Pistillo A, Khalid S, Lai LYH, Golozar A, Alshammari TM, Dawoud DM, Nyberg F, Wilcox AB, Andryc A, Williams A, Ostropolets A, Areia C, Jung CY, Harle CA, Reich CG, Blacketer C, Morales DR, Dorr DA, Burn E, Roel E, Tan EH, Minty E, DeFalco F, de Maeztu G, Lipori G, Alghoul H, Zhu H, Thomas JA, Bian J, Park J, Martínez Roldán J, Posada JD, Banda JM, Horcajada JP, Kohler J, Shah K, Natarajan K, Lynch KE, Liu L, Schilling LM, Recalde M, Spotnitz M, Gong M, Matheny ME, Valveny N, Weiskopf NG, Shah N, Alser O, Casajust P, Park RW, Schuff R, Seager S, DuVall SL, You SC, Song S, Fernández-Bertolín S, Fortin S, Magoc T, Falconer T, Subbian V, Huser V, Ahmed WUR, Carter W, Guan Y, Galvan Y, He X, Rijnbeek PR, Hripcsak G, Ryan PB, Suchard MA, Prieto-Alhambra D. Unraveling COVID-19: A Large-Scale Characterization of 4.5 Million COVID-19 Cases Using CHARYBDIS. Clin Epidemiol 2022; 14:369-384. [PMID: 35345821 PMCID: PMC8957305 DOI: 10.2147/clep.s323292] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/18/2021] [Accepted: 01/27/2022] [Indexed: 01/20/2023]
Abstract
Purpose Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) Characterizing Health Associated Risks and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD. Patients and Methods We conducted a descriptive retrospective database study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11th June 2020 and are iteratively updated via GitHub. We identified three non-mutually exclusive cohorts of 4,537,153 individuals with a clinical COVID-19 diagnosis or positive test, 886,193 hospitalized with COVID-19, and 113,627 hospitalized with COVID-19 requiring intensive services. Results We aggregated over 22,000 unique characteristics describing patients with COVID-19. All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts and are readily available online. Globally, we observed similarities in the USA and Europe: more women diagnosed than men but more men hospitalized than women, most diagnosed cases between 25 and 60 years of age versus most hospitalized cases between 60 and 80 years of age. South Korea differed with more women than men hospitalized. Common comorbidities included type 2 diabetes, hypertension, chronic kidney disease and heart disease. Common presenting symptoms were dyspnea, cough and fever. Symptom data availability was more common in hospitalized cohorts than diagnosed. Conclusion We constructed a global, multi-centre view to describe trends in COVID-19 progression, management and evolution over time. 
By characterising baseline variability in patients and geography, our work provides critical context that may otherwise be misconstrued as data quality issues. This is important as we perform studies on adverse events of special interest in COVID-19 vaccine surveillance.
Affiliation(s)
- Kristin Kostka
  - IQVIA, Cambridge, MA, USA
  - OHDSI Center at The Roux Institute, Northeastern University, Portland, ME, USA
- Talita Duarte-Salles
  - Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
- Albert Prats-Uribe
  - Centre for Statistics in Medicine, NDORMS, University of Oxford, Oxford, UK
- Anthony G Sena
  - Janssen Research & Development, Titusville, NJ, USA
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Andrea Pistillo
  - Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
- Sara Khalid
  - Centre for Statistics in Medicine, NDORMS, University of Oxford, Oxford, UK
- Lana Y H Lai
  - School of Medical Sciences, University of Manchester, Manchester, UK
- Asieh Golozar
  - Regeneron Pharmaceuticals, Tarrytown, NY, USA
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Dalia M Dawoud
  - National Institute for Health and Care Excellence, London, UK
- Fredrik Nyberg
  - School of Public Health and Community Medicine, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Adam B Wilcox
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
  - University of Washington Medicine, Seattle, WA, USA
- Alan Andryc
  - Janssen Research & Development, Titusville, NJ, USA
- Andrew Williams
  - Tufts Institute for Clinical Research and Health Policy Studies, Boston, MA, USA
- Anna Ostropolets
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Carlos Areia
  - Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK
- Chi Young Jung
  - Division of Respiratory and Critical Care Medicine, Department of Internal Medicine, Daegu Catholic University Medical Center, Daegu, South Korea
- Christian G Reich
  - IQVIA, Cambridge, MA, USA
  - OHDSI Center at The Roux Institute, Northeastern University, Portland, ME, USA
- Clair Blacketer
  - Janssen Research & Development, Titusville, NJ, USA
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Daniel R Morales
  - Division of Population Health and Genomics, University of Dundee, Dundee, UK
- David A Dorr
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Edward Burn
  - Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
  - Centre for Statistics in Medicine, NDORMS, University of Oxford, Oxford, UK
- Elena Roel
  - Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
  - Universitat Autònoma de Barcelona, Barcelona, Spain
- Eng Hooi Tan
  - Centre for Statistics in Medicine, NDORMS, University of Oxford, Oxford, UK
- Evan Minty
  - O’Brien Institute for Public Health, Faculty of Medicine, University of Calgary, Calgary, Canada
- Gigi Lipori
  - University of Florida Health, Gainesville, FL, USA
- Hiba Alghoul
  - Faculty of Medicine, Islamic University of Gaza, Gaza, Palestine
- Hong Zhu
  - Nanfang Hospital, Southern Medical University, Guangzhou, People’s Republic of China
- Jason A Thomas
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Jiang Bian
  - University of Florida Health, Gainesville, FL, USA
- Jimyung Park
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, South Korea
- Jordi Martínez Roldán
  - Director of Innovation and Digital Transformation, Hospital del Mar, Barcelona, Spain
- Jose D Posada
  - Department of Medicine, School of Medicine, Stanford University, Redwood City, CA, USA
- Juan M Banda
  - Georgia State University, Department of Computer Science, Atlanta, GA, USA
- Juan P Horcajada
  - Department of Infectious Diseases, Hospital del Mar, Institut Hospital del Mar d’Investigació Mèdica (IMIM), Universitat Autònoma de Barcelona, Universitat Pompeu Fabra, Barcelona, Spain
- Julianna Kohler
  - United States Agency for International Development, Washington, DC, USA
- Karishma Shah
  - Botnar Research Centre, NDORMS, University of Oxford, Oxford, UK
- Karthik Natarajan
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
  - New York-Presbyterian Hospital, New York, NY, USA
- Kristine E Lynch
  - VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT, USA
  - Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, USA
- Li Liu
  - Biomedical Big Data Center, Nanfang Hospital, Southern Medical University, Guangzhou, People’s Republic of China
- Lisa M Schilling
  - Data Science to Patient Value Program, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Martina Recalde
  - Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
  - Universitat Autònoma de Barcelona, Barcelona, Spain
- Mengchun Gong
  - Institute of Health Management, Southern Medical University, Guangzhou, People’s Republic of China
- Michael E Matheny
  - Tennessee Valley Healthcare System, Veterans Affairs Medical Center, Nashville, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Nicole G Weiskopf
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Nigam Shah
  - Department of Medicine, School of Medicine, Stanford University, Redwood City, CA, USA
- Osaid Alser
  - Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Rae Woong Park
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, South Korea
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea
- Robert Schuff
  - Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
- Scott L DuVall
  - VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT, USA
  - Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, USA
- Seng Chan You
  - Department of Preventive Medicine, Yonsei University College of Medicine, Seoul, South Korea
- Seokyoung Song
  - Department of Anesthesiology and Pain Medicine, Catholic University of Daegu, School of Medicine, Daegu, South Korea
- Sergio Fernández-Bertolín
  - Fundació Institut Universitari per a la recerca a l’Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
- Tanja Magoc
  - University of Florida Health, Gainesville, FL, USA
- Thomas Falconer
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Vignesh Subbian
  - College of Engineering, The University of Arizona, Tucson, AZ, USA
- Vojtech Huser
  - National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Waheed-Ul-Rahman Ahmed
  - Botnar Research Centre, NDORMS, University of Oxford, Oxford, UK
  - College of Medicine and Health, University of Exeter, St Luke’s Campus, Exeter, UK
- William Carter
  - Data Science to Patient Value Program, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Yin Guan
  - DHC Technologies Co. Ltd., Beijing, People’s Republic of China
- Xing He
  - University of Florida Health, Gainesville, FL, USA
- Peter R Rijnbeek
  - Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
- George Hripcsak
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
  - New York-Presbyterian Hospital, New York, NY, USA
- Patrick B Ryan
  - Janssen Research & Development, Titusville, NJ, USA
  - Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA
- Marc A Suchard
  - Departments of Biostatistics, Computational Medicine, and Human Genetics, University of California, Los Angeles, CA, USA
12
Guerrier C, McDonnell C, Magoc T, Fishe JN, Harle CA. Understanding Health Care Administrators’ Data and Information Needs for Decision Making during the COVID-19 Pandemic: A Qualitative Study at an Academic Health System. MDM Policy Pract 2022; 7:23814683221089844. [PMID: 35368410 PMCID: PMC8972941 DOI: 10.1177/23814683221089844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Accepted: 03/02/2022] [Indexed: 11/23/2022]
Abstract
Objective. The COVID-19 pandemic created an unprecedented strain on the health care system, and administrators had to make many critical decisions to respond appropriately. This study sought to understand how health care administrators used data and information for decision making during the first 6 mo of the COVID-19 pandemic. Materials and Methods. We conducted semistructured interviews with administrators across University of Florida (UF) Health. We performed an inductive thematic analysis of the transcripts. Results. Four themes emerged from the interviews: 1) common types of health systems or hospital operations data; 2) public health and other external data sources; 3) data interaction, integration, and exchange; and 4) novelty and evolution in data, information, or tools used over time. Participants illustrated the organizational, public health, and regional information they considered essential (e.g., hospital census, community positivity rate, etc.). Participants named specific challenges they faced due to data quality and timeliness. Participants elaborated on the necessity of data integration, validation, and coordination across different boundaries (e.g., different hospital systems in the same metro areas, public health agencies at the local, state, and federal level, etc.). Participants indicated that even within the first 6 mo of the COVID-19 pandemic, the data and tools used for making critical decisions changed. Discussion. While existing medical informatics infrastructure can facilitate decision making in pandemic response, data may not always be readily available in a usable format. Interoperable infrastructure and data standardization across multiple health systems would help provide more reliable and timely information for decision making. Conclusion. Our findings contribute to future discussions of improving data infrastructure and developing harmonized data standards needed to facilitate critical decisions at multiple health care system levels.
Affiliation(s)
- Christina Guerrier
  - Center for Data Solutions, University of Florida Health Science Center, Jacksonville, Florida, USA
- Cara McDonnell
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Tanja Magoc
  - Integrated Data Repository, University of Florida, Gainesville, Florida, USA
- Jennifer N. Fishe
  - Center for Data Solutions, University of Florida Health Science Center, Jacksonville, Florida, USA
- Christopher A. Harle
  - Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida, USA
13
Nguyen OT, Turner K, Apathy NC, Magoc T, Hanna K, Merlo LJ, Harle CA, Thompson LA, Berner ES, Feldman SS. Primary care physicians' electronic health record proficiency and efficiency behaviors and time interacting with electronic health records: a quantile regression analysis. J Am Med Inform Assoc 2021; 29:461-471. [PMID: 34897493 PMCID: PMC8800512 DOI: 10.1093/jamia/ocab272] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/28/2021] [Revised: 11/05/2021] [Accepted: 11/23/2021] [Indexed: 12/15/2022]
Abstract
OBJECTIVE This study aimed to understand the association between primary care physician (PCP) proficiency with the electronic health record (EHR) system and time spent interacting with the EHR. MATERIALS AND METHODS We examined the use of EHR proficiency tools among PCPs at one large academic health system using EHR-derived measures of clinician EHR proficiency and efficiency. Our main predictors were the use of EHR proficiency tools and our outcomes focused on 4 measures assessing time spent in the EHR: (1) total time spent interacting with the EHR, (2) time spent outside scheduled clinical hours, (3) time spent documenting, and (4) time spent on inbox management. We conducted multivariable quantile regression models with fixed effects for physician-level factors and time in order to identify factors that were independently associated with time spent in the EHR. RESULTS Across 441 primary care physicians, we found mixed associations between certain EHR proficiency behaviors and time spent in the EHR. Across EHR activities studied, QuickActions, SmartPhrases, and documentation length were positively associated with increased time spent in the EHR. Models also showed a greater amount of help from team members in note writing was associated with less time spent in the EHR and documenting. DISCUSSION Examining the prevalence of EHR proficiency behaviors may suggest targeted areas for initial and ongoing EHR training. Although documentation behaviors are key areas for training, team-based models for documentation and inbox management require further study. CONCLUSIONS A nuanced association exists between physician EHR proficiency and time spent in the EHR.
Affiliation(s)
- Oliver T Nguyen
  - Corresponding Author: Oliver T. Nguyen, MSHI, Department of Community Health and Family Medicine, University of Florida, College of Medicine, PO Box 100211, Gainesville, FL 32610, USA
- Kea Turner
  - Department of Health Outcomes and Behavior, Moffitt Cancer Center, Tampa, Florida, USA
  - Department of Oncological Sciences, University of South Florida, Tampa, Florida, USA
- Nate C Apathy
  - Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Tanja Magoc
  - Clinical and Translational Science Institute, University of Florida, Gainesville, Florida, USA
- Karim Hanna
  - Department of Family Medicine, Morsani College of Medicine, University of South Florida, Tampa, Florida, USA
- Lisa J Merlo
  - Department of Psychiatry, University of Florida, Gainesville, Florida, USA
- Christopher A Harle
  - Clinical and Translational Science Institute, University of Florida, Gainesville, Florida, USA
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Lindsay A Thompson
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
  - Department of Pediatrics, University of Florida, Gainesville, Florida, USA
- Eta S Berner
  - Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Sue S Feldman
  - Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, Alabama, USA
14
Hogan WR, Shenkman EA, Robinson T, Carasquillo O, Robinson PS, Essner RZ, Bian J, Lipori G, Harle C, Magoc T, Manini L, Mendoza T, White S, Loiacono A, Hall J, Nelson D. The OneFlorida Data Trust: a centralized, translational research data infrastructure of statewide scope. J Am Med Inform Assoc 2021; 29:686-693. [PMID: 34664656 PMCID: PMC8922180 DOI: 10.1093/jamia/ocab221] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/15/2021] [Revised: 09/03/2021] [Accepted: 09/29/2021] [Indexed: 01/22/2023]
Abstract
The OneFlorida Data Trust is a centralized research patient data repository created and managed by the OneFlorida Clinical Research Consortium ("OneFlorida"). It comprises structured electronic health record (EHR), administrative claims, tumor registry, death, and other data on 17.2 million individuals who received healthcare in Florida between January 2012 and the present. Ten healthcare systems in Miami, Orlando, Tampa, Jacksonville, Tallahassee, Gainesville, and rural areas of Florida contribute EHR data, covering the major metropolitan regions in Florida. Deduplication of patients is accomplished via privacy-preserving entity resolution (precision 0.97-0.99, recall 0.75), thereby linking patients' EHR, claims, and death data. Another unique feature is the establishment of mother-baby relationships via Florida vital statistics data. Research usage has been significant, including major studies launched in the National Patient-Centered Clinical Research Network ("PCORnet"), where OneFlorida is 1 of 9 clinical research networks. The Data Trust's robust, centralized, statewide data are a valuable and relatively unique research resource.
Affiliation(s)
- William R Hogan
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA,Corresponding Author: William R. Hogan, MD, MS, FACMI, Clinical & Translational Research Building, 2004 Mowry Road, PO Box 100219, Gainesville, FL 32610, USA;
| | - Elizabeth A Shenkman
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA
| | | | | | | | | | - Jiang Bian
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA
| | | | - Christopher Harle
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA,UF Health, Gainesville, Florida, USA
| | | | - Lizabeth Manini
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA
| | - Tona Mendoza
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA
| | - Sonya White
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA
| | - Alex Loiacono
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA
| | - Jackie Hall
- Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, Florida, USA
|
15
|
Prieto-Alhambra D, Kostka K, Duarte-Salles T, Prats-Uribe A, Sena A, Pistillo A, Khalid S, Lai L, Golozar A, Alshammari TM, Dawoud D, Nyberg F, Wilcox A, Andryc A, Williams A, Ostropolets A, Areia C, Jung CY, Harle C, Reich C, Blacketer C, Morales D, Dorr DA, Burn E, Roel E, Tan EH, Minty E, DeFalco F, de Maeztu G, Lipori G, Alghoul H, Zhu H, Thomas J, Bian J, Park J, Roldán JM, Posada J, Banda JM, Horcajada JP, Kohler J, Shah K, Natarajan K, Lynch K, Liu L, Schilling L, Recalde M, Spotnitz M, Gong M, Matheny M, Valveny N, Weiskopf N, Shah N, Alser O, Casajust P, Park RW, Schuff R, Seager S, DuVall S, You SC, Song S, Fernández-Bertolín S, Fortin S, Magoc T, Falconer T, Subbian V, Huser V, Ahmed WUR, Carter W, Guan Y, Galvan Y, He X, Rijnbeek P, Hripcsak G, Ryan P, Suchard M. Unraveling COVID-19: a large-scale characterization of 4.5 million COVID-19 cases using CHARYBDIS. Res Sq 2021:rs.3.rs-279400. [PMID: 33688639 PMCID: PMC7941629 DOI: 10.21203/rs.3.rs-279400/v1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/21/2023]
Abstract
Background: Routinely collected real world data (RWD) have great utility in aiding the novel coronavirus disease (COVID-19) pandemic response [1,2]. Here we present the international Observational Health Data Sciences and Informatics (OHDSI) [3] Characterizing Health Associated Risks, and Your Baseline Disease In SARS-COV-2 (CHARYBDIS) framework for standardisation and analysis of COVID-19 RWD. Methods: We conducted a descriptive cohort study using a federated network of data partners in the United States, Europe (the Netherlands, Spain, the UK, Germany, France and Italy) and Asia (South Korea and China). The study protocol and analytical package were released on 11th June 2020 and are iteratively updated via GitHub [4]. Findings: We identified three non-mutually exclusive cohorts: 4,537,153 individuals with a clinical COVID-19 diagnosis or positive test, 886,193 hospitalized with COVID-19, and 113,627 hospitalized with COVID-19 requiring intensive services. All comorbidities, symptoms, medications, and outcomes are described by cohort in aggregate counts and are available on an interactive website: https://data.ohdsi.org/Covid19CharacterizationCharybdis/. Interpretation: CHARYBDIS findings provide benchmarks that contribute to our understanding of COVID-19 progression, management and evolution over time. This can enable timely assessment of real-world outcomes of preventative and therapeutic options as they are introduced in clinical practice.
Affiliation(s)
- Daniel Prieto-Alhambra
- Centre for Statistics in Medicine (CSM), Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), University of Oxford, UK
| | | | - Talita Duarte-Salles
- Fundació Institut Universitari per a la recerca a l'Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
| | | | - Anthony Sena
- Janssen R&D, Titusville, NJ, USA; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Andrea Pistillo
- Fundació Institut Universitari per a la recerca a l'Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
| | - Sara Khalid
- Centre for Statistics in Medicine, NDORMS, University of Oxford, UK
| | - Lana Lai
- Division of Cancer Sciences, School of Medical Sciences, University of Manchester, UK
| | - Asieh Golozar
- Regeneron Pharmaceuticals, NY, USA; Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, MD, USA
| | - Thamir M Alshammari
- Medication Safety Research Chair, King Saud University, Riyadh, Saudi Arabia
| | - Dalia Dawoud
- National Institute for Health and Care Excellence, London, UK
| | - Fredrik Nyberg
- School of Public Health and Community Medicine, Institute of Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
| | - Adam Wilcox
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA; UW Medicine, Seattle, WA, USA
| | | | - Andrew Williams
- Tufts Institute for Clinical Research and Health Policy Studies, US
| | - Anna Ostropolets
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Carlos Areia
- Nuffield Department of Clinical Neurosciences, University of Oxford, UK
| | - Chi Young Jung
- Division of Respiratory and Critical Care Medicine, Department of Internal Medicine, Daegu Catholic University Medical Center, Daegu, Korea
| | | | | | - Clair Blacketer
- Janssen R&D, Titusville, NJ, USA; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Daniel Morales
- Division of Population Health and Genomics, University of Dundee, UK
| | - David A Dorr
- Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
| | - Edward Burn
- Fundació Institut Universitari per a la recerca a l'Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
| | | | - Eng Hooi Tan
- Centre for Statistics in Medicine, NDORMS, University of Oxford, UK
| | - Evan Minty
- O'Brien Institute for Public Health, Faculty of Medicine, University of Calgary, Canada
| | | | | | | | - Heba Alghoul
- Faculty of Medicine, Islamic University of Gaza, Palestine
| | - Hong Zhu
- Nanfang Hospital, Southern Medical University, Guangzhou, China
| | - Jason Thomas
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
| | | | - Jimyung Park
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Korea
| | - Jordi Martínez Roldán
- Director of Innovation and Digital Transformation, Hospital del Mar, Barcelona, Spain
| | - Jose Posada
- Stanford University School of Medicine, Stanford, California, USA
| | - Juan M Banda
- Georgia State University, Department of Computer Science, Atlanta, GA, USA
| | - Juan P Horcajada
- Department of Infectious Diseases, Hospital del Mar, Institut Hospital del Mar d'Investigació Mèdica (IMIM), Universitat Autònoma de Barcelona. Universitat Pompeu Fabra, Barcelo
| | - Julianna Kohler
- United States Agency for International Development, Washington, DC, USA
| | - Karishma Shah
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Karthik Natarajan
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, USA; New York-Presbyterian Hospital, 622 W 168 St, PH20 New York, NY 10032 USA
| | - Kristine Lynch
- VINCI, VA Salt Lake City Health Care System, Salt Lake City, VA, & Division of Epidemiology, University of Utah, Salt Lake City, UT
| | - Li Liu
- Biomedical Big Data Center, Nanfang Hospital, Southern Medical University, Guangzhou, China
| | - Lisa Schilling
- Data Science to Patient Value Program, University of Colorado Anschutz Medical Campus
| | - Martina Recalde
- Fundació Institut Universitari per a la recerca a l'Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
| | - Matthew Spotnitz
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | | | - Michael Matheny
- VINCI, Tennessee Valley Healthcare System VA, Nashville, TN & Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
| | | | - Nicole Weiskopf
- Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
| | | | - Osaid Alser
- Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
| | | | - Rae Woong Park
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Korea
| | - Robert Schuff
- Knight Cancer Institute, Oregon Health & Science University
| | | | - Scott DuVall
- VA Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, UT, USA
| | - Seng Chan You
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea
| | - Seokyoung Song
- Department of Anesthesiology and Pain Medicine, Catholic University of Daegu, School of Medicine, Daegu, Korea
| | - Sergio Fernández-Bertolín
- Fundació Institut Universitari per a la recerca a l'Atenció Primària de Salut Jordi Gol i Gurina (IDIAPJGol), Barcelona, Spain
| | - Stephen Fortin
- Observational Health Data Analytics, Janssen Research and Development, Raritan, NJ, USA
| | | | - Thomas Falconer
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, USA
| | - Vignesh Subbian
- College of Engineering, The University of Arizona, Tucson, Arizona, USA
| | - Vojtech Huser
- National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Waheed-Ul-Rahman Ahmed
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK; College of Medicine and Health, University of Exeter, St Luke's Campus, E
| | - William Carter
- Data Science to Patient Value Program, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Yin Guan
- DHC Technologies Co. Ltd, Beijing, China
| | | | - Xing He
- University of Florida Health
| | - Peter Rijnbeek
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - George Hripcsak
- Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY 10032, USA; New York-Presbyterian Hospital, 622 W 168 St, PH20 New York, NY 10032 USA
| | | | - Marc Suchard
- Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles
|
16
|
Kapheim KM, Pan H, Li C, Salzberg SL, Puiu D, Magoc T, Robertson HM, Hudson ME, Venkat A, Fischman BJ, Hernandez A, Yandell M, Ence D, Holt C, Yocum GD, Kemp WP, Bosch J, Waterhouse RM, Zdobnov EM, Stolle E, Kraus FB, Helbing S, Moritz RFA, Glastad KM, Hunt BG, Goodisman MAD, Hauser F, Grimmelikhuijzen CJP, Pinheiro DG, Nunes FMF, Soares MPM, Tanaka ÉD, Simões ZLP, Hartfelder K, Evans JD, Barribeau SM, Johnson RM, Massey JH, Southey BR, Hasselmann M, Hamacher D, Biewer M, Kent CF, Zayed A, Blatti C, Sinha S, Johnston JS, Hanrahan SJ, Kocher SD, Wang J, Robinson GE, Zhang G. Social evolution. Genomic signatures of evolutionary transitions from solitary to group living. Science 2015; 348:1139-43. [PMID: 25977371 DOI: 10.1126/science.aaa4788] [Citation(s) in RCA: 234] [Impact Index Per Article: 26.0] [Received: 12/19/2014] [Accepted: 05/06/2015] [Indexed: 12/14/2022]
Abstract
The evolution of eusociality is one of the major transitions in evolution, but the underlying genomic changes are unknown. We compared the genomes of 10 bee species that vary in social complexity, representing multiple independent transitions in social evolution, and report three major findings. First, many important genes show evidence of neutral evolution as a consequence of relaxed selection with increasing social complexity. Second, there is no single road map to eusociality; independent evolutionary transitions in sociality have independent genetic underpinnings. Third, though clearly independent in detail, these transitions do have similar general features, including an increase in constrained protein evolution accompanied by increases in the potential for gene regulation and decreases in diversity and abundance of transposable elements. Eusociality may arise through different mechanisms each time, but would likely always involve an increase in the complexity of gene networks.
Affiliation(s)
- Karen M Kapheim
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Department of Biology, Utah State University, Logan, UT 84322, USA.
| | - Hailin Pan
- China National GeneBank, BGI-Shenzhen, Shenzhen, 518083, China
| | - Cai Li
- China National GeneBank, BGI-Shenzhen, Shenzhen, 518083, China. Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, 1350, Denmark
| | - Steven L Salzberg
- Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD 21218, USA. Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Daniela Puiu
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Tanja Magoc
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Hugh M Robertson
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Matthew E Hudson
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Aarti Venkat
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
| | - Brielle J Fischman
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Program in Ecology and Evolutionary Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Department of Biology, Hobart and William Smith Colleges, Geneva, NY 14456, USA
| | - Alvaro Hernandez
- Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Mark Yandell
- Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA. USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA
| | - Daniel Ence
- Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA
| | - Carson Holt
- Department of Human Genetics, Eccles Institute of Human Genetics, University of Utah, Salt Lake City, UT 84112, USA. USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, UT 84112, USA
| | - George D Yocum
- U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS) Red River Valley Agricultural Research Center, Biosciences Research Laboratory, Fargo, ND 58102, USA
| | - William P Kemp
- U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS) Red River Valley Agricultural Research Center, Biosciences Research Laboratory, Fargo, ND 58102, USA
| | - Jordi Bosch
- Center for Ecological Research and Forestry Applications (CREAF), Universitat Autonoma de Barcelona, 08193 Bellaterra, Spain
| | - Robert M Waterhouse
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland. Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA. The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Evgeny M Zdobnov
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland. Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
| | - Eckart Stolle
- Institute of Biology, Department Zoology, Martin-Luther-University Halle-Wittenberg, Hoher Weg 4, D-06099 Halle (Saale), Germany. Queen Mary University of London, School of Biological and Chemical Sciences Organismal Biology Research Group, London E1 4NS, UK
| | - F Bernhard Kraus
- Institute of Biology, Department Zoology, Martin-Luther-University Halle-Wittenberg, Hoher Weg 4, D-06099 Halle (Saale), Germany. Department of Laboratory Medicine, University Hospital Halle, Ernst Grube Strasse 40, D-06120 Halle (Saale), Germany
| | - Sophie Helbing
- Institute of Biology, Department Zoology, Martin-Luther-University Halle-Wittenberg, Hoher Weg 4, D-06099 Halle (Saale), Germany
| | - Robin F A Moritz
- Institute of Biology, Department Zoology, Martin-Luther-University Halle-Wittenberg, Hoher Weg 4, D-06099 Halle (Saale), Germany. German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, 04103 Leipzig, Germany
| | - Karl M Glastad
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Brendan G Hunt
- Department of Entomology, University of Georgia, Griffin, GA 30223, USA
| | | | - Frank Hauser
- Center for Functional and Comparative Insect Genomics, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Cornelis J P Grimmelikhuijzen
- Center for Functional and Comparative Insect Genomics, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Daniel Guariz Pinheiro
- Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil. Departamento de Tecnologia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista (UNESP), 14884-900 Jaboticabal, SP, Brazil
| | - Francis Morais Franco Nunes
- Departamento de Genética e Evolução, Centro de Ciências Biológicas e da Saúde, Universidade Federal de São Carlos, 13565-905 São Carlos, SP, Brazil
| | - Michelle Prioli Miranda Soares
- Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil
| | - Érica Donato Tanaka
- Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, 14049-900 Ribeirão Preto, SP, Brazil
| | - Zilá Luz Paulino Simões
- Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, 14040-901 Ribeirão Preto, SP, Brazil
| | - Klaus Hartfelder
- Departamento de Biologia Celular e Molecular e Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, 14049-900 Ribeirão Preto, SP, Brazil
| | - Jay D Evans
- USDA-ARS Bee Research Lab, Beltsville, MD 20705 USA
| | - Seth M Barribeau
- Department of Biology, East Carolina University, Greenville, NC 27858, USA
| | - Reed M Johnson
- Department of Entomology, Ohio Agricultural Research and Development Center, Ohio State University, Wooster, OH 44691, USA
| | - Jonathan H Massey
- Department of Entomology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Bruce R Southey
- Department of Animal Sciences, University of Illinois, Urbana, IL 61801, USA
| | - Martin Hasselmann
- Department of Population Genomics, Institute of Animal Husbandry and Animal Breeding, University of Hohenheim, Germany
| | - Daniel Hamacher
- Department of Population Genomics, Institute of Animal Husbandry and Animal Breeding, University of Hohenheim, Germany
| | - Matthias Biewer
- Department of Population Genomics, Institute of Animal Husbandry and Animal Breeding, University of Hohenheim, Germany
| | - Clement F Kent
- Department of Biology, York University, Toronto, ON M3J 1P3, Canada. Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, VA 20147, USA
| | - Amro Zayed
- Department of Biology, York University, Toronto, ON M3J 1P3, Canada
| | - Charles Blatti
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Saurabh Sinha
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - J Spencer Johnston
- Department of Entomology, Texas A&M University, College Station, TX 77843, USA
| | - Shawn J Hanrahan
- Department of Entomology, Texas A&M University, College Station, TX 77843, USA
| | - Sarah D Kocher
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, USA
| | - Jun Wang
- China National GeneBank, BGI-Shenzhen, Shenzhen, 518083, China. Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark. Princess Al Jawhara Center of Excellence in the Research of Hereditary Disorders, King Abdulaziz University, Jeddah 21589, Saudi Arabia. Macau University of Science and Technology, Avenida Wai long, Taipa, Macau 999078, China. Department of Medicine, University of Hong Kong, Hong Kong.
| | - Gene E Robinson
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. Center for Advanced Study Professor in Entomology and Neuroscience, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
| | - Guojie Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen, 518083, China. Centre for Social Evolution, Department of Biology, Universitetsparken 15, University of Copenhagen, DK-2100 Copenhagen, Denmark.
|
17
|
Abstract
MOTIVATION A large and rapidly growing number of bacterial organisms have been sequenced by the newest sequencing technologies. Cheaper and faster sequencing technologies make it easy to generate very high coverage of bacterial genomes, but these advances mean that DNA preparation costs can exceed the cost of sequencing for small genomes. The need to contain costs often results in the creation of only a single sequencing library, which in turn introduces new challenges for genome assembly methods. RESULTS We evaluated the ability of multiple genome assembly programs to assemble bacterial genomes from a single, deep-coverage library. For our comparison, we chose bacterial species spanning a wide range of GC content and measured the contiguity and accuracy of the resulting assemblies. We compared the assemblies produced by this very high-coverage, one-library strategy to the best assemblies created by two-library sequencing, and we found that remarkably good bacterial assemblies are possible with just one library. We also measured the effect of read length and depth of coverage on assembly quality and determined the values that provide the best results with current algorithms. CONTACT salzberg@jhu.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tanja Magoc
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
|
18
|
Abstract
Background The expression levels of bacterial genes can be measured directly using next-generation sequencing (NGS) methods, offering much greater sensitivity and accuracy than earlier, microarray-based methods. Most bioinformatics software for estimating levels of gene expression from NGS data has been designed for eukaryotic genomes, with algorithms focusing particularly on detection of splicing patterns. These methods do not perform well on bacterial genomes. Results Here we describe the first software system designed explicitly for quantifying the degree of gene expression in bacteria and other prokaryotes. EDGE-pro (Estimated Degree of Gene Expression in PROkaryotes) processes the raw data from an RNA-seq experiment on a bacterial or archaeal species and produces estimates of the expression levels for each gene in these gene-dense genomes. Software The EDGE-pro tool is implemented as a pipeline of C++ and Perl programs and is freely available as open-source code at http://www.genomics.jhu.edu/software/EDGE/index.shtml.
Affiliation(s)
- Tanja Magoc
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
|
19
|
Salzberg SL, Phillippy AM, Zimin A, Puiu D, Magoc T, Koren S, Treangen TJ, Schatz MC, Delcher AL, Roberts M, Marçais G, Pop M, Yorke JA. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res 2012; 22:557-67. [PMID: 22147368 DOI: 10.1101/gr.131383.111] [Citation(s) in RCA: 410] [Impact Index Per Article: 34.2] [Indexed: 12/31/2022]
Abstract
New sequencing technology has dramatically altered the landscape of whole-genome sequencing, allowing scientists to initiate numerous projects to decode the genomes of previously unsequenced organisms. The lowest-cost technology can generate deep coverage of most species, including mammals, in just a few days. The sequence data generated by one of these projects consist of millions or billions of short DNA sequences (reads) that range from 50 to 150 nt in length. These sequences must then be assembled de novo before most genome analyses can begin. Unfortunately, genome assembly remains a very difficult problem, made more difficult by shorter reads and unreliable long-range linking information. In this study, we evaluated several of the leading de novo assembly algorithms on four different short-read data sets, all generated by Illumina sequencers. Our results describe the relative performance of the different assemblers as well as other significant differences in assembly difficulty that appear to be inherent in the genomes themselves. Three overarching conclusions are apparent: first, that data quality, rather than the assembler itself, has a dramatic effect on the quality of an assembled genome; second, that the degree of contiguity of an assembly varies enormously among different assemblers and different genomes; and third, that the correctness of an assembly also varies widely and is not well correlated with statistics on contiguity. To enable others to replicate our results, all of our data and methods are freely available, as are all assemblers used in this study.
Affiliation(s)
- Steven L Salzberg
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
|
20
|
Guinn DE, Summers JB, Heyman HR, Conway RG, Rhein DA, Albert DH, Magoc T, Carter GW. Synthesis and structure-activity relationships of a series of novel benzopyran-containing platelet activating factor antagonists. J Med Chem 1992; 35:2055-61. [PMID: 1317924 DOI: 10.1021/jm00089a016] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.6] [Indexed: 12/26/2022]
Abstract
A class of N-substituted tetrahydrobenzopyrano[3,4-c]pyridines, I, have been identified as antagonists of platelet activating factor (PAF). The structural features essential for PAF binding were determined by systematic modification of three sites in the molecule. While O-alkyl analogues had little effect on binding potency, N-alkyl analogues exhibited a wide range of activity. Structural changes in the core ring system generally resulted in a loss of binding activity. Optimization of the N- and O-substituents resulted in the analogues 25-27 which exhibited Ki values ranging between 131 and 167 nM in a [3H]PAF binding assay. Compound 23 was also active in a model of PAF-induced shock in the mouse following intravenous administration.
Affiliation(s)
- D E Guinn
- Immunosciences Research Area, Abbott Laboratories, Abbott Park, Illinois 60064
|
21
|
Kocka FE, Magoc T, Searcy RL. Evaluation of rapid tests for staphylococci characterization. Am J Med Technol 1973; 39:269-71. [PMID: 4578189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
|