1. Ekpo RH, Osamor VC, Azeta AA, Ikeakanam E, Amos BO. Machine learning classification approach for asthma prediction models in children. Health and Technology 2023. [DOI: 10.1007/s12553-023-00732-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
2. Bae WD, Alkobaisi S, Horak M, Park CS, Kim S, Davidson J. Predicting Health Risks of Adult Asthmatics Susceptible to Indoor Air Quality Using Improved Logistic and Quantile Regression Models. Life (Basel) 2022; 12:life12101631. [PMID: 36295066 PMCID: PMC9604638 DOI: 10.3390/life12101631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 10/03/2022] [Accepted: 10/04/2022] [Indexed: 11/28/2022]
Abstract
The rising global trend in asthma and its associated fiscal burden on healthcare systems demand a change to healthcare processes and to the way asthma risks are managed. Patient-centered health care systems equipped with advanced sensing technologies can empower patients to participate actively in their health risk control, which improves health outcomes. Although data analytics is gradually emerging in health care, the path to well-established and successful data-driven health care services has some limitations. The low accuracy of existing predictive models causes misclassification and needs improvement. In addition, a lack of guidance and of explanation of the reasons for a prediction leads to unsuccessful interventions. This paper proposes a modeling framework for an asthma risk management system whose contributions are threefold. First, the framework uses a deep learning technique to improve the performance of logistic regression classification models. Second, it implements a variable sliding window method that considers the spatio-temporal properties of the data, which improves the quality of quantile regression models. Lastly, it provides guidance on how to use the outcomes of the two predictive models in practice. To promote the application of predictive modeling, we present a use case that illustrates the life cycle of the proposed framework. The performance of the proposed framework was extensively evaluated using real datasets; the results showed an improvement in classification accuracy of approximately 11.5–18.4% for the improved logistic regression classification model and confirmed low relative errors, ranging from 0.018 to 0.160, for the quantile regression model.
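The abstract does not state which loss the quantile regression models minimize. As a hedged illustration only: quantile regression is conventionally fitted by minimizing the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically. The Python sketch below shows that loss. The function name `pinball_loss` and the toy readings are assumptions made for illustration and are not taken from the paper.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for a quantile level q in (0, 1).

    Under-predictions (y > prediction) are weighted by q and
    over-predictions by (1 - q), so a high q pushes the fitted
    value toward the upper tail of the target distribution.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


# Hypothetical readings: at q = 0.9, predicting 2 units too low
# costs nine times as much as predicting 2 units too high.
under = pinball_loss([10.0], [8.0], 0.9)
over = pinball_loss([8.0], [10.0], 0.9)
print(under, over)
```

A high quantile level is a natural fit for risk-oriented targets, since it makes missed peaks more expensive than false alarms; whether the paper makes that choice is not stated in the abstract.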
Affiliation(s)
- Wan D. Bae
  - Department of Computer Science, Seattle University, Seattle, WA 98122, USA
- Shayma Alkobaisi (corresponding author)
  - College of Information Technology, United Arab Emirates University, Al Ain 15551, United Arab Emirates
- Matthew Horak
  - Lockheed Martin Space Systems, Denver, CO 80221, USA
- Choon-Sik Park
  - Department of Internal Medicine, Soonchunhyang Bucheon Hospital, Bucheon 420-767, Korea
- Sungroul Kim
  - Department of ICT Environmental Health System, Graduate School, Department of Environmental Sciences, Soonchunhyang University, Asan 336-745, Korea
- Joel Davidson
  - Department of Computer Science, Seattle University, Seattle, WA 98122, USA
3. Luo G. A Roadmap for Boosting Model Generalizability for Predicting Hospital Encounters for Asthma. JMIR Med Inform 2022; 10:e33044. [PMID: 35230246 PMCID: PMC8924785 DOI: 10.2196/33044] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/23/2021] [Accepted: 01/08/2022] [Indexed: 11/29/2022]
Abstract
In the United States, ~9% of people have asthma. Each year, asthma incurs high health care cost and many hospital encounters covering 1.8 million emergency room visits and 439,000 hospitalizations. A small percentage of patients with asthma use most health care resources. To improve outcomes and cut resource use, many health care systems use predictive models to prospectively find high-risk patients and enroll them in care management for preventive care. For maximal benefit from costly care management with limited service capacity, only patients at the highest risk should be enrolled. However, prior models built by others miss >50% of true highest-risk patients and mislabel many low-risk patients as high risk, leading to suboptimal care and wasted resources. To address this issue, 3 site-specific models were recently built to predict hospital encounters for asthma, gaining up to >11% better performance. However, these models do not generalize well across sites and patient subgroups, creating 2 gaps before translating these models into clinical use. This paper points out these 2 gaps and outlines 2 corresponding solutions: (1) a new machine learning technique to create cross-site generalizable predictive models to accurately find high-risk patients and (2) a new machine learning technique to automatically raise model performance for poorly performing subgroups while maintaining model performance on other subgroups. This gives a roadmap for future research.
Affiliation(s)
- Gang Luo
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
4. Meng Z, Chen H, Deng C, Meng S. Potential cellular endocrinology mechanisms underlying the effects of Chinese herbal medicine therapy on asthma. Front Endocrinol (Lausanne) 2022; 13:916328. [PMID: 36051395 PMCID: PMC9424672 DOI: 10.3389/fendo.2022.916328] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/09/2022] [Accepted: 07/04/2022] [Indexed: 11/20/2022]
Abstract
Asthma is a complex syndrome with polygenetic tendency and multiple phenotypes, characterized by variable expiratory airflow limitation and respiratory symptoms that vary over time and in intensity. In recent years, continuous industrial development has seriously impacted the climate and air quality at a global scale. It has been verified that climate change can induce asthma in predisposed individuals and that atmospheric pollution can exacerbate asthma severity. At present, a subset of patients is resistant to drug therapy for asthma. Hence, new ideas for asthma prevention and treatment are urgently needed. In this review, we discuss the prescription, composition, formulation, and mechanism of traditional Chinese medicine monomers, traditional Chinese medicine monomer complexes, single herbs, and traditional Chinese patent medicines in the treatment of asthma. We also discuss the effects of Chinese herbal medicine on asthma from the perspective of cellular endocrinology over the past decade, emphasizing the roles as intracellular and extracellular messengers of three kinds of substances (hormones, substances secreted by pulmonary neuroendocrine cells, and neuroendocrine-related signaling proteins), which provide the theoretical basis for clinical application and new drug development.
Affiliation(s)
- Zeyu Meng
  - The Second Clinical Medical College, Heilongjiang University of Chinese Medicine, Harbin, China
- Huize Chen
  - Department of Traditional Chinese Medicine, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, Shanghai, China
- Chujun Deng
  - Department of Traditional Chinese Medicine, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, Shanghai, China
- Shengxi Meng (corresponding author)
  - Department of Traditional Chinese Medicine, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, Shanghai, China
5. Luo G, Stone BL, Sheng X, He S, Koebnick C, Nkoy FL. Using Computational Methods to Improve Integrated Disease Management for Asthma and Chronic Obstructive Pulmonary Disease: Protocol for a Secondary Analysis. JMIR Res Protoc 2021; 10:e27065. [PMID: 34003134 PMCID: PMC8170556 DOI: 10.2196/27065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2021] [Revised: 04/12/2021] [Accepted: 04/19/2021] [Indexed: 12/05/2022]
Abstract
Background: Asthma and chronic obstructive pulmonary disease (COPD) impose a heavy burden on health care. Approximately one-fourth of patients with asthma and patients with COPD are prone to exacerbations, which can be greatly reduced by preventive care via integrated disease management that has a limited service capacity. To do this well, a predictive model for proneness to exacerbation is required, but no such model exists. It would be suboptimal to build such models using the current model building approach for asthma and COPD, which has 2 gaps due to rarely factoring in temporal features showing early health changes and general directions. First, existing models for other asthma and COPD outcomes rarely use more advanced temporal features, such as the slope of the number of days to albuterol refill, and are inaccurate. Second, existing models seldom show the reason a patient is deemed high risk and the potential interventions to reduce the risk, making already occupied clinicians expend more time on chart review and overlook suitable interventions. Regular automatic explanation methods cannot deal with temporal data and address this issue well.
Objective: To enable more patients with asthma and patients with COPD to obtain suitable and timely care to avoid exacerbations, we aim to implement comprehensible computational methods to accurately predict proneness to exacerbation and recommend customized interventions.
Methods: We will use temporal features to accurately predict proneness to exacerbation, automatically find modifiable temporal risk factors for every high-risk patient, and assess the impact of actionable warnings on clinicians’ decisions to use integrated disease management to prevent proneness to exacerbation.
Results: We have obtained most of the clinical and administrative data of patients with asthma from 3 prominent American health care systems. We are retrieving other clinical and administrative data, mostly of patients with COPD, needed for the study. We intend to complete the study in 6 years.
Conclusions: Our results will help make asthma and COPD care more proactive, effective, and efficient, improving outcomes and saving resources.
International Registered Report Identifier (IRRID): PRR1-10.2196/27065
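The abstract names "the slope of the number of days to albuterol refill" as an example of an advanced temporal feature but does not say how it would be computed. One plausible reading, sketched below in Python under that assumption, is an ordinary least-squares slope fitted to the successive gaps between refill dates. The function names and the refill dates are hypothetical and are not taken from the protocol.

```python
from datetime import date


def least_squares_slope(values):
    """Ordinary least-squares slope of values against their index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def refill_gap_slope(refill_dates):
    """Slope of the day gaps between consecutive refills (days per refill)."""
    ordered = sorted(refill_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return least_squares_slope(gaps)


# Hypothetical patient whose refill gaps shrink from 30 to 28 to 25 days.
refills = [date(2020, 1, 1), date(2020, 1, 31), date(2020, 2, 28), date(2020, 3, 24)]
print(refill_gap_slope(refills))
```

Under this reading, a negative slope means the patient is refilling rescue medication at shorter and shorter intervals, which is the kind of early trend a single count of refills per year would hide.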
Affiliation(s)
- Gang Luo
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Bryan L Stone
  - Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
- Xiaoming Sheng
  - College of Nursing, University of Utah, Salt Lake City, UT, United States
- Shan He
  - Care Transformation and Information Systems, Intermountain Healthcare, West Valley City, UT, United States
- Corinna Koebnick
  - Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
- Flory L Nkoy
  - Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
6. Rankin D, Black M, Bond R, Wallace J, Mulvenna M, Epelde G. Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing. JMIR Med Inform 2020; 8:e18910. [PMID: 32501278 PMCID: PMC7400044 DOI: 10.2196/18910] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 03/26/2020] [Revised: 04/24/2020] [Accepted: 06/04/2020] [Indexed: 12/16/2022]
Abstract
Background: The exploitation of synthetic data in health care is at an early stage. Synthetic data could unlock the potential within health care datasets that are too sensitive for release. Several synthetic data generators have been developed to date; however, studies evaluating their efficacy and generalizability are scarce.
Objective: This work sets out to understand the difference in performance of supervised machine learning models trained on synthetic data compared with those trained on real data.
Methods: A total of 19 open health datasets were selected for experimental work. Synthetic data were generated using three synthetic data generators that apply classification and regression trees, parametric, and Bayesian network approaches. Real and synthetic data were used (separately) to train five supervised machine learning models: stochastic gradient descent, decision tree, k-nearest neighbors, random forest, and support vector machine. Models were tested only on real data to determine whether a model developed by training on synthetic data can be used to accurately classify new, real examples. The impact of statistical disclosure control on model performance was also assessed.
Results: A total of 92% of models trained on synthetic data have lower accuracy than those trained on real data. Tree-based models trained on synthetic data have deviations in accuracy from models trained on real data of 0.177 (18%) to 0.193 (19%), while other models have lower deviations of 0.058 (6%) to 0.072 (7%). The winning classifier when trained and tested on real data versus models trained on synthetic data and tested on real data is the same in 26% (5/19) of cases for classification and regression tree and parametric synthetic data and in 21% (4/19) of cases for Bayesian network-generated synthetic data. Tree-based models perform best with real data and are the winning classifier in 95% (18/19) of cases. This is not the case for models trained on synthetic data. When tree-based models are not considered, the winning classifier for real and synthetic data is matched in 74% (14/19), 53% (10/19), and 68% (13/19) of cases for classification and regression tree, parametric, and Bayesian network synthetic data, respectively. Statistical disclosure control methods did not have a notable impact on data utility.
Conclusions: The results of this study are promising with small decreases in accuracy observed in models trained with synthetic data compared with models trained with real data, where both are tested on real data. Such deviations are expected and manageable. Tree-based classifiers have some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must ensure individual privacy and data utility are preserved in order to instill confidence in health care departments when using such data to inform policy decision-making.
Affiliation(s)
- Debbie Rankin
  - School of Computing, Engineering and Intelligent Systems, Ulster University, Derry~Londonderry, United Kingdom
- Michaela Black
  - School of Computing, Engineering and Intelligent Systems, Ulster University, Derry~Londonderry, United Kingdom
- Raymond Bond
  - School of Computing, Ulster University, Jordanstown, United Kingdom
- Jonathan Wallace
  - School of Computing, Ulster University, Jordanstown, United Kingdom
- Maurice Mulvenna
  - School of Computing, Ulster University, Jordanstown, United Kingdom
- Gorka Epelde
  - Vicomtech Foundation, Basque Research and Technology Alliance, Donostia-San Sebastián, Spain
  - Biodonostia Health Research Institute, eHealth Group, Donostia-San Sebastián, Spain
7. Roe KD, Jawa V, Zhang X, Chute CG, Epstein JA, Matelsky J, Shpitser I, Taylor CO. Feature engineering with clinical expert knowledge: A case study assessment of machine learning model complexity and performance. PLoS One 2020; 15:e0231300. [PMID: 32324754 PMCID: PMC7179831 DOI: 10.1371/journal.pone.0231300] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/25/2018] [Accepted: 03/20/2020] [Indexed: 11/19/2022]
Abstract
Incorporating expert knowledge at the time machine learning models are trained holds promise for producing models that are easier to interpret. The main objectives of this study were to use a feature engineering approach to incorporate clinical expert knowledge prior to applying machine learning techniques, and to assess the impact of the approach on model complexity and performance. Four machine learning models were trained to predict mortality with a severe asthma case study. Experiments to select fewer input features based on a discriminative score showed low to moderate precision for discovering clinically meaningful triplets, indicating that discriminative score alone cannot replace clinical input. When compared to baseline machine learning models, we found a decrease in model complexity with use of fewer features informed by discriminative score and filtering of laboratory features with clinical input. We also found a small difference in performance for the mortality prediction task when comparing baseline ML models to models that used filtered features. Encoding demographic and triplet information in ML models with filtered features appeared to show performance improvements from the baseline. These findings indicated that the use of filtered features may reduce model complexity, and with little impact on performance.
Affiliation(s)
- Kenneth D. Roe
  - Johns Hopkins Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States of America
  - The Institute of Clinical and Translational Research, Johns Hopkins University, Baltimore, MD, United States of America
- Vibhu Jawa
  - Johns Hopkins Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States of America
  - Department of Computer Science, Johns Hopkins University Whiting School of Engineering, Baltimore, MD, United States of America
- Xiaohan Zhang
  - Division of Health Sciences Informatics, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Christopher G. Chute
  - Johns Hopkins Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States of America
  - The Institute of Clinical and Translational Research, Johns Hopkins University, Baltimore, MD, United States of America
  - Division of Health Sciences Informatics, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
  - Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Jeremy A. Epstein
  - Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Jordan Matelsky
  - Johns Hopkins University Applied Physics Laboratory, Laurel, MD, United States of America
- Ilya Shpitser
  - Johns Hopkins Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States of America
  - Department of Computer Science, Johns Hopkins University Whiting School of Engineering, Baltimore, MD, United States of America
- Casey Overby Taylor (corresponding author)
  - Johns Hopkins Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States of America
  - The Institute of Clinical and Translational Research, Johns Hopkins University, Baltimore, MD, United States of America
  - Division of Health Sciences Informatics, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
  - Division of General Internal Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
8. Kennedy K, Allenbrand R, Bowles E. The Role of Home Environments in Allergic Disease. Clin Rev Allergy Immunol 2020; 57:364-390. [PMID: 30684120 DOI: 10.1007/s12016-018-8724-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/27/2022]
Abstract
Allergic diseases are surprisingly common, chronic health conditions. The primary location where the vast majority of people are exposed to allergens and other substances is their home. This means it is important to understand home environments and how a home's systems function and interact, and that how we occupy these spaces plays a crucial role in both environmental exposure and management of allergic disease. This review provides an overview of what is understood about home environmental exposure and its impact on our health, and proposes a systematic process for using a patient's environmental history to develop individualized, manageable, and cost-effective recommendations. Once occupant-related information has been gathered, a home environmental exposure assessment should be performed, focused on identifying the relationships between any identified sources of contaminants and the housing systems and conditions that may be contributing to exposure. The results and recommendations from this assessment can then be used to guide exposure-reduction efforts by patients and/or their caregivers in an effort to improve disease management. In this review, we discuss three different types of home interventions: active interventions, which must be routinely performed by the patient and/or caregiver; passive interventions, which work without routine, direct interaction from the homeowner; and behavioral changes in how the home environment is cleaned and maintained for long-term reduction of allergens. In this review, and others evaluated for this discussion, a significant number of home environmental assessment and intervention programs were shown to be cost effective, with the majority of programs showing a net positive return on investment. It is important to recognize that, to be cost effective, the level and intensity of services offered through home visit programs need to be stratified based on the estimated health risks of the patient, in order to tailor the assessment and target the interventions to a patient's needs while maximizing cost effectiveness.
Affiliation(s)
- Kevin Kennedy
  - Section of Toxicology and Environmental Health, Children's Mercy Kansas City, Kansas City, USA
- Ryan Allenbrand
  - Section of Toxicology and Environmental Health, Children's Mercy Kansas City, Kansas City, USA
- Eric Bowles
  - Section of Toxicology and Environmental Health, Children's Mercy Kansas City, Kansas City, USA
9. Luo G, He S, Stone BL, Nkoy FL, Johnson MD. Developing a Model to Predict Hospital Encounters for Asthma in Asthmatic Patients: Secondary Analysis. JMIR Med Inform 2020; 8:e16080. [PMID: 31961332 PMCID: PMC7001050 DOI: 10.2196/16080] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 09/03/2019] [Revised: 11/01/2019] [Accepted: 12/01/2019] [Indexed: 12/12/2022]
Abstract
Background: As a major chronic disease, asthma causes many emergency department (ED) visits and hospitalizations each year. Predictive modeling is a key technology to prospectively identify high-risk asthmatic patients and enroll them in care management for preventive care to reduce future hospital encounters, including inpatient stays and ED visits. However, existing models for predicting hospital encounters in asthmatic patients are inaccurate. Usually, they miss over half of the patients who will incur future hospital encounters and incorrectly classify many others who will not. This makes it difficult to match the limited resources of care management to the patients who will incur future hospital encounters, increasing health care costs and degrading patient outcomes.
Objective: The goal of this study was to develop a more accurate model for predicting hospital encounters in asthmatic patients.
Methods: Secondary analysis of 334,564 data instances from Intermountain Healthcare from 2005 to 2018 was conducted to build a machine learning classification model to predict the hospital encounters for asthma in the following year in asthmatic patients. The patient cohort included all asthmatic patients who resided in Utah or Idaho and visited Intermountain Healthcare facilities during 2005 to 2018. A total of 235 candidate features were considered for model building.
Results: The model achieved an area under the receiver operating characteristic curve of 0.859 (95% CI 0.846-0.871). When the cutoff threshold for conducting binary classification was set at the top 10.00% (1926/19,256) of asthmatic patients with the highest predicted risk, the model reached an accuracy of 90.31% (17,391/19,256; 95% CI 89.86-90.70), a sensitivity of 53.7% (436/812; 95% CI 50.12-57.18), and a specificity of 91.93% (16,955/18,444; 95% CI 91.54-92.31). To steer future research on this topic, we pinpointed several potential improvements to our model.
Conclusions: Our model improves the state of the art for predicting hospital encounters for asthma in asthmatic patients. After further refinement, the model could be integrated into a decision support tool to guide asthma care management allocation.
International Registered Report Identifier (IRRID): RR2-10.2196/resprot.5039
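The accuracy, sensitivity, and specificity reported in this abstract follow directly from the counts given in parentheses. As a quick arithmetic check, the Python sketch below recomputes them. The function name `classification_metrics` is an assumption made for illustration; the false-negative and false-positive counts are derived here from the reported fractions (812 - 436 and 18,444 - 16,955) rather than stated in the abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases left unflagged
    }


# Counts taken from the abstract's fractions; fn and fp are derived.
tp = 436            # numerator of sensitivity, 436/812
fn = 812 - 436      # patients with an encounter who were not flagged
tn = 16955          # numerator of specificity, 16,955/18,444
fp = 18444 - 16955  # patients without an encounter who were flagged

metrics = classification_metrics(tp, fp, tn, fn)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

The check also makes the trade-off visible: because only 812 of 19,256 patients had an encounter, accuracy is dominated by the specificity term, so a model can exceed 90% accuracy while still missing almost half of the patients who go on to have a hospital encounter.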
Affiliation(s)
- Gang Luo
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Shan He
  - Care Transformation, Intermountain Healthcare, Salt Lake City, UT, United States
- Bryan L Stone
  - Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
- Flory L Nkoy
  - Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
- Michael D Johnson
  - Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
10. Luo G, Stone BL, Koebnick C, He S, Au DH, Sheng X, Murtaugh MA, Sward KA, Schatz M, Zeiger RS, Davidson GH, Nkoy FL. Using Temporal Features to Provide Data-Driven Clinical Early Warnings for Chronic Obstructive Pulmonary Disease and Asthma Care Management: Protocol for a Secondary Analysis. JMIR Res Protoc 2019; 8:e13783. [PMID: 31199308 PMCID: PMC6592592 DOI: 10.2196/13783] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 02/23/2019] [Revised: 05/13/2019] [Accepted: 05/14/2019] [Indexed: 01/19/2023]
Abstract
Background: Both chronic obstructive pulmonary disease (COPD) and asthma incur heavy health care burdens. To support tailored preventive care for these 2 diseases, predictive modeling is widely used to give warnings and to identify patients for care management. However, 3 gaps exist in current modeling methods owing to rarely factoring in temporal aspects showing trends and early health change: (1) existing models seldom use temporal features and often give late warnings, making care reactive. A health risk is often found at a relatively late stage of declining health, when the risk of a poor outcome is high and resolving the issue is difficult and costly. A typical model predicts patient outcomes in the next 12 months. This often does not warn early enough. If a patient will actually be hospitalized for COPD next week, intervening now could be too late to avoid the hospitalization. If temporal features were used, this patient could potentially be identified a few weeks earlier to institute preventive therapy; (2) existing models often miss many temporal features with high predictive power and have low accuracy. This makes care management enroll many patients not needing it and overlook over half of the patients needing it the most; (3) existing models often give no information on why a patient is at high risk nor about possible interventions to mitigate risk, causing busy care managers to spend more time reviewing charts and to miss suited interventions. Typical automatic explanation methods cannot handle longitudinal attributes and fully address these issues.
Objective: To fill these gaps so that more COPD and asthma patients will receive more appropriate and timely care, we will develop comprehensible data-driven methods to provide accurate early warnings of poor outcomes and to suggest tailored interventions, making care more proactive, efficient, and effective.
Methods: By conducting a secondary data analysis and surveys, the study will: (1) use temporal features to provide accurate early warnings of poor outcomes and assess the potential impact on prediction accuracy, risk warning timeliness, and outcomes; (2) automatically identify actionable temporal risk factors for each patient at high risk for future hospital use and assess the impact on prediction accuracy and outcomes; and (3) assess the impact of actionable information on clinicians’ acceptance of early warnings and on perceived care plan quality.
Results: We are obtaining clinical and administrative datasets from 3 leading health care systems’ enterprise data warehouses. We plan to start data analysis in 2020 and finish our study in 2025.
Conclusions: Techniques to be developed in this study can boost risk warning timeliness, model accuracy, and generalizability; improve patient finding for preventive care; help form tailored care plans; advance machine learning for many clinical applications; and be generalized for many other chronic diseases.
International Registered Report Identifier (IRRID): PRR1-10.2196/13783
Affiliation(s)
- Gang Luo
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
- Bryan L Stone
  - Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
- Corinna Koebnick
  - Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
- Shan He
  - Care Transformation, Intermountain Healthcare, Salt Lake City, UT, United States
- David H Au
  - Center of Innovation for Veteran-Centered & Value-Driven Care, VA Puget Sound Health Care System, Seattle, WA, United States
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Washington, Seattle, WA, United States
- Xiaoming Sheng
  - College of Nursing, University of Utah, Salt Lake City, UT, United States
- Maureen A Murtaugh
  - Department of Family and Preventive Medicine, University of Utah, Salt Lake City, UT, United States
- Katherine A Sward
  - College of Nursing, University of Utah, Salt Lake City, UT, United States
- Michael Schatz
  - Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
  - Department of Allergy, Kaiser Permanente Southern California, San Diego, CA, United States
- Robert S Zeiger
  - Department of Research & Evaluation, Kaiser Permanente Southern California, Pasadena, CA, United States
  - Department of Allergy, Kaiser Permanente Southern California, San Diego, CA, United States
- Giana H Davidson
  - Department of Surgery, University of Washington, Seattle, WA, United States
- Flory L Nkoy
  - Department of Pediatrics, University of Utah, Salt Lake City, UT, United States
11. Luo G. A roadmap for semi-automatically extracting predictive and clinically meaningful temporal features from medical data for predictive modeling. Global Transitions 2019; 1:61-82. [PMID: 31032483 PMCID: PMC6482973 DOI: 10.1016/j.glt.2018.11.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 05/27/2023]
Abstract
Predictive modeling based on machine learning with medical data has great potential to improve healthcare and reduce costs. However, two hurdles, among others, impede its widespread adoption in healthcare. First, medical data are by nature longitudinal. Pre-processing them, particularly for feature engineering, is labor intensive and often takes 50-80% of the model building effort. Predictive temporal features are the basis of building accurate models, but are difficult to identify. This is problematic. Healthcare systems have limited resources for model building, while inaccurate models produce sub-optimal outcomes and are often useless. Second, most machine learning models provide no explanation of their prediction results. However, offering such explanations is essential for a model to be used in usual clinical practice. To address these two hurdles, this paper outlines: 1) a data-driven method for semi-automatically extracting predictive and clinically meaningful temporal features from medical data for predictive modeling; and 2) a method of using these features to automatically explain machine learning prediction results and suggest tailored interventions. This provides a roadmap for future research.
Affiliation(s)
- Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA, 98109, USA
| |
|
12
|
Zeng X, Luo G. Progressive sampling-based Bayesian optimization for efficient and automatic machine learning model selection. Health Inf Sci Syst 2017; 5:2. [PMID: 29038732 PMCID: PMC5617811 DOI: 10.1007/s13755-017-0023-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 04/02/2017] [Accepted: 09/20/2017] [Indexed: 12/11/2022] Open
Abstract
PURPOSE: Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. METHODS: To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. RESULTS: We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. CONCLUSIONS: This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
Affiliation(s)
- Xueqiang Zeng
- Computer Center, Nanchang University, 999 Xuefu Road, Nanchang, 330031 Jiangxi, People’s Republic of China
| | - Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA 98109 USA
| |
|