1. Sadeghi P, Karimi H, Lavafian A, Rashedi R, Samieefar N, Shafiekhani S, Rezaei N. Machine learning and artificial intelligence within pediatric autoimmune diseases: applications, challenges, future perspective. Expert Rev Clin Immunol 2024:1-18. [PMID: 38771915 DOI: 10.1080/1744666x.2024.2359019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Accepted: 05/20/2024] [Indexed: 05/23/2024]
Abstract
INTRODUCTION Autoimmune disorders affect 4.5% to 9.4% of children, significantly reducing their quality of life. The diagnosis and prognosis of autoimmune diseases are uncertain because of the variety of onset and development. Machine learning can identify clinically relevant patterns from vast amounts of data. Hence, its introduction has been beneficial in the diagnosis and management of patients. AREAS COVERED This narrative review was conducted through searching various electronic databases, including PubMed, Scopus, and Web of Science. This study thoroughly explores the current knowledge and identifies the remaining gaps in the applications of machine learning specifically in the context of pediatric autoimmune and related diseases. EXPERT OPINION Machine learning algorithms have the potential to completely change how pediatric autoimmune disorders are identified, treated, and managed. Machine learning can assist physicians in making more precise and fast judgments, identifying new biomarkers and therapeutic targets, and personalizing treatment strategies for each patient by utilizing massive datasets and powerful analytics.
Affiliation(s)
- Parniyan Sadeghi
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - Student Research Committee, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Hanie Karimi
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
- Atiye Lavafian
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - School of Medicine, Semnan University of Medical Science, Semnan, Iran
- Ronak Rashedi
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - USERN Office, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Noosha Samieefar
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - USERN Office, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Sajad Shafiekhani
  - Department of Biomedical Engineering, Buein Zahra Technical University, Qazvin, Iran
- Nima Rezaei
  - Network of Interdisciplinarity in Neonates and Infants (NINI), Universal Scientific Education and Research Network (USERN), Tehran, Iran
  - Research Center for Immunodeficiencies, Children's Medical Center, Tehran University of Medical Sciences, Tehran, Iran
  - Department of Immunology, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
2. Labib SM. Greenness, air pollution, and temperature exposure effects in predicting premature mortality and morbidity: A small-area study using spatial random forest model. The Science of the Total Environment 2024; 928:172387. [PMID: 38608883 DOI: 10.1016/j.scitotenv.2024.172387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 04/08/2024] [Accepted: 04/08/2024] [Indexed: 04/14/2024]
Abstract
BACKGROUND Although studies have reported negative impacts of air pollution and heat or cold exposure on mortality and morbidity, and positive effects of increased greenness in reducing them, few studies have explored the combined and synergetic effects of these exposures in predicting these health outcomes, and most have ignored spatial autocorrelation in analyzing their health effects. This study aims to investigate the health effects of air pollution, greenness, and temperature exposure on premature mortality and morbidity within a spatial machine-learning modeling framework. METHODS Years of potential life lost, reflecting premature mortality, and the comparative illness and disability ratio, reflecting chronic morbidity, were obtained for 1673 small areas covering Greater Manchester for the years 2008-2013. Average annual levels of NO2 concentration, normalized difference vegetation index (NDVI) representing greenness, and annual average air temperature were utilized to assess exposure in each area. These exposures were linked to health outcomes using non-spatial and spatial random forest (RF) models while accounting for spatial autocorrelation. RESULTS Spatial-RF models provided the best predictive accuracy when spatial autocorrelation was accounted for. Among the exposures considered, air pollution emerged as the most influential in predicting mortality and morbidity, followed by NDVI and temperature exposure. Nonlinear exposure-response relations were observed, and interactions between exposures illustrated specific ranges or sweet and sour spots of exposure thresholds where combined effects either exacerbate or moderate health conditions. CONCLUSION Air pollution exposure had a greater negative impact on health compared to greenness and temperature exposure. Combined exposure effects may indicate the highest influence on premature mortality and morbidity burden.
Affiliation(s)
- S M Labib
  - Department of Human Geography and Spatial Planning, Faculty of Geosciences, Utrecht University, the Netherlands.
3. Ghazi L, Farhat K, Hoenig MP, Durant TJS, El-Khoury JM. Biomarkers vs Machines: The Race to Predict Acute Kidney Injury. Clin Chem 2024; 70:805-819. [PMID: 38299927 DOI: 10.1093/clinchem/hvad217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 10/20/2023] [Indexed: 02/02/2024]
Abstract
BACKGROUND Acute kidney injury (AKI) is a serious complication affecting up to 15% of hospitalized patients. Early diagnosis is critical to prevent irreversible kidney damage that could otherwise lead to significant morbidity and mortality. However, AKI is a clinically silent syndrome, and current detection primarily relies on measuring a rise in serum creatinine, an imperfect marker that can be slow to react to developing AKI. Over the past decade, new innovations have emerged in the form of biomarkers and artificial intelligence tools to aid in the early diagnosis and prediction of imminent AKI. CONTENT This review summarizes and critically evaluates the latest developments in AKI detection and prediction by emerging biomarkers and artificial intelligence. Main guidelines and studies discussed herein include those evaluating clinical utility of alternate filtration markers such as cystatin C and structural injury markers such as neutrophil gelatinase-associated lipocalin and tissue inhibitor of metalloprotease 2 with insulin-like growth factor binding protein 7 and machine learning algorithms for the detection and prediction of AKI in adult and pediatric populations. Recommendations for clinical practices considering the adoption of these new tools are also provided. SUMMARY The race to detect AKI is heating up. Regulatory approval of select biomarkers for clinical use and the emergence of machine learning algorithms that can predict imminent AKI with high accuracy are all promising developments. But the race is far from being won. Future research focusing on clinical outcome studies that demonstrate the utility and validity of implementing these new tools into clinical practice is needed.
Affiliation(s)
- Lama Ghazi
  - Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL 35294, United States
- Kassem Farhat
  - Faculty of Medicine, American University of Beirut, Beirut, Lebanon
- Melanie P Hoenig
  - Renal Division, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA 02215, United States
- Thomas J S Durant
  - Department of Laboratory Medicine, Yale School of Medicine, New Haven, CT 06510, United States
  - Computational Biology and Bioinformatics, Yale University, New Haven, CT 06510, United States
- Joe M El-Khoury
  - Department of Laboratory Medicine, Yale School of Medicine, New Haven, CT 06510, United States
4. Kikuchi T, Hanaoka S, Nakao T, Takenaga T, Nomura Y, Mori H, Yoshikawa T. Synthesis of Hybrid Data Consisting of Chest Radiographs and Tabular Clinical Records Using Dual Generative Models for COVID-19 Positive Cases. Journal of Imaging Informatics in Medicine 2024; 37:1217-1227. [PMID: 38351224 DOI: 10.1007/s10278-024-01015-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 12/21/2023] [Accepted: 12/22/2023] [Indexed: 06/13/2024]
Abstract
This study aimed to generate synthetic medical data incorporating image-tabular hybrid data by merging an image encoding/decoding model with a table-compatible generative model and to assess their utility. We used 1342 cases from the Stony Brook University COVID-19-positive cases, comprising chest X-ray radiographs (CXRs) and tabular clinical data as a private dataset (pDS). We generated a synthetic dataset (sDS) through the following steps: (I) dimensionally reducing CXRs in the pDS using a pretrained encoder of the auto-encoding generative adversarial networks (αGAN) and integrating them with the corresponding tabular clinical data; (II) training the conditional tabular GAN (CTGAN) on this combined data to generate synthetic records, encompassing encoded image features and clinical data; and (III) reconstructing synthetic images from these encoded image features in the sDS using a pretrained decoder of the αGAN. The utility of sDS was assessed by the performance of the prediction models for patient outcomes (deceased or discharged). For the pDS test set, the area under the receiver operating characteristic curve (AUC) was calculated to compare the performance of prediction models trained separately with pDS, sDS, or a combination of both. We created an sDS comprising CXRs with a resolution of 256 × 256 pixels and tabular data containing 13 variables. The AUC for the outcome was 0.83 when the model was trained with the pDS, 0.74 with the sDS, and 0.87 when combining pDS and sDS for training. Our method is effective for generating synthetic records consisting of both images and tabular clinical data.
Affiliation(s)
- Tomohiro Kikuchi
  - Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan.
  - Department of Radiology, School of Medicine, Jichi Medical University, 3311-1 Yakushiji, Shimotsuke, Tochigi, 329-0498, Japan.
- Shouhei Hanaoka
  - Department of Radiology, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
- Takahiro Nakao
  - Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
- Tomomi Takenaga
  - Department of Radiology, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
- Yukihiro Nomura
  - Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
  - Center for Frontier Medical Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba, 263-8522, Japan
- Harushi Mori
  - Department of Radiology, School of Medicine, Jichi Medical University, 3311-1 Yakushiji, Shimotsuke, Tochigi, 329-0498, Japan
- Takeharu Yoshikawa
  - Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8655, Japan
5. Sloan RA. Estimated Cardiorespiratory Fitness and Metabolic Risks. International Journal of Environmental Research and Public Health 2024; 21:635. [PMID: 38791849 PMCID: PMC11120962 DOI: 10.3390/ijerph21050635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 05/14/2024] [Accepted: 05/14/2024] [Indexed: 05/26/2024]
Abstract
This review focuses on the emerging evidence for the association between non-exercise fitness testing, estimated cardiorespiratory fitness (eCRF), and metabolic risk factors. Given the challenges associated with directly measuring cardiorespiratory fitness (CRF) in large populations, eCRF presents a practical alternative for predicting metabolic health risks. A literature search identified seven relevant cohort studies from 2020 to 2024 that investigated the association of eCRF with hypertension, hyperglycemia, dyslipidemia, and obesity. This review consistently demonstrates an inverse relationship between higher eCRF and a lower incidence of metabolic risks, which is in line with CRF cohort studies. It highlights the importance of low eCRF as a primordial indicator for metabolic risks and underscores the potential for broader application. Future research directions should include exploring eCRF's predictive ability across diverse populations and health outcomes and testing its real-world applicability in healthcare and public health settings.
Affiliation(s)
- Robert A Sloan
  - Division of Social and Behavioral Medicine, Kagoshima University Graduate Medical School, Kagoshima 890-8520, Japan
6. Rafiee M, Jahangiri-Rad M, Mohseni-Bandpei A, Razmi E. Impacts of socioeconomic and environmental factors on neoplasms incidence rates using machine learning and GIS: a cross-sectional study in Iran. Sci Rep 2024; 14:10604. [PMID: 38719879 PMCID: PMC11078954 DOI: 10.1038/s41598-024-61397-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 05/06/2024] [Indexed: 05/12/2024] Open
Abstract
Neoplasm is an umbrella term used to describe either benign or malignant conditions. The correlations between socioeconomic and environmental factors and the occurrence of new-onset neoplasms have already been demonstrated in a body of research. Nevertheless, few studies have specifically dealt with the nature of the relationship, the significance of risk factors, and their geographic variation, particularly in low- and middle-income communities. This study, thus, set out to (1) analyze spatiotemporal variations of the age-adjusted incidence rate (AAIR) of neoplasms in Iran throughout five time periods, (2) investigate relationships between a collection of environmental and socioeconomic indicators and the AAIR of neoplasms all over the country, and (3) evaluate geographical alterations in their relative importance. Our cross-sectional study design was based on county-level data from 2010 to 2020. AAIR of neoplasms data was acquired from the Institute for Health Metrics and Evaluation (IHME). HotSpot analyses and Anselin Local Moran's I indices were deployed to precisely identify AAIR of neoplasms high- and low-risk clusters. Multiscale geographically weighted regression (MGWR) analysis was performed to evaluate the association between each explanatory variable and the AAIR of neoplasms. Utilizing random forests (RF), we also examined the relationships between environmental (e.g., UV index and PM2.5 concentration) and socioeconomic (e.g., Gini coefficient and literacy rate) factors and AAIR of neoplasms. AAIR of neoplasms displayed a significant increasing trend over the study period. According to the MGWR, the only factor that significantly varied spatially and was associated with the AAIR of neoplasms in Iran was the UV index. A good accuracy RF model was confirmed for both training and testing data with correlation coefficients R² greater than 0.91 and 0.92, respectively. UV index and Gini coefficient ranked the highest variables in the prediction of AAIR of neoplasms, based on the relative influence of each variable. More research using machine learning approaches, taking advantage of considering all possible determinants, is required to assess health strategies outcomes and properly formulate policy planning.
Affiliation(s)
- Mohammad Rafiee
  - Air Quality and Climate Change Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Department of Environmental Health Engineering, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mahsa Jahangiri-Rad
  - Department of Environmental Health Engineering, School of Health, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran.
  - Water Purification Research Center, Islamic Azad University, Tehran, Iran.
- Anoushiravan Mohseni-Bandpei
  - Air Quality and Climate Change Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Department of Environmental Health Engineering, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Elham Razmi
  - Department of Environmental Health Engineering, School of Public Health, Iran University of Medical Sciences, Tehran, Iran
7. Nawrin SS, Inada H, Momma H, Nagatomi R. Twenty-four-hour physical activity patterns associated with depressive symptoms: a cross-sectional study using big data-machine learning approach. BMC Public Health 2024; 24:1254. [PMID: 38714982 PMCID: PMC11075341 DOI: 10.1186/s12889-024-18759-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 05/02/2024] [Indexed: 05/12/2024] Open
Abstract
BACKGROUND Depression is a global burden with profound personal and economic consequences. Previous studies have reported that the amount of physical activity is associated with depression. However, the relationship between the temporal patterns of physical activity and depressive symptoms is poorly understood. In this exploratory study, we hypothesize that a particular temporal pattern of daily physical activity could be associated with depressive symptoms and might be a better marker than the total amount of physical activity. METHODS To address the hypothesis, we investigated the association between depressive symptoms and daily dominant activity behaviors based on 24-h temporal patterns of physical activity. We conducted a cross-sectional study on NHANES 2011-2012 data collected from the noninstitutionalized civilian resident population of the United States. The number of participants who had the whole set of physical activity data collected by the accelerometer was 6613. Among these 6613 participants, 4242 had complete demographic data and a complete Patient Health Questionnaire-9 (PHQ-9), a tool to quantify depressive symptoms. The association between activity-count behaviors and depressive symptoms was analyzed using multivariable logistic regression to adjust for confounding factors in sequential models. RESULTS We identified four physical activity-count behaviors based on five physical activity-counting patterns classified by unsupervised machine learning. Regarding PHQ-9 scores, we found that evening dominant behavior was positively associated with depressive symptoms compared to morning dominant behavior as the control group. CONCLUSIONS Our results might contribute to monitoring and identifying individuals with latent depressive symptoms, emphasizing the importance of nuanced activity patterns and their probability of assessing depressive symptoms effectively.
Affiliation(s)
- Saida Salima Nawrin
  - Laboratory of Health and Sports Sciences, Tohoku University Graduate School of Biomedical Engineering, Sendai, Miyagi, Japan
- Hitoshi Inada
  - Laboratory of Health and Sports Sciences, Tohoku University Graduate School of Biomedical Engineering, Sendai, Miyagi, Japan.
  - Present Address: Department of Biochemistry & Cellular Biology, National Center of Neurology and Psychiatry, Kodaira, Tokyo, Japan.
- Haruki Momma
  - Department of Medicine and Science in Sports and Exercise, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Ryoichi Nagatomi
  - Laboratory of Health and Sports Sciences, Tohoku University Graduate School of Biomedical Engineering, Sendai, Miyagi, Japan.
  - Department of Medicine and Science in Sports and Exercise, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan.
8. Kapoor S, Cantrell EM, Peng K, Pham TH, Bail CA, Gundersen OE, Hofman JM, Hullman J, Lones MA, Malik MM, Nanayakkara P, Poldrack RA, Raji ID, Roberts M, Salganik MJ, Serra-Garcia M, Stewart BM, Vandewiele G, Narayanan A. REFORMS: Consensus-based Recommendations for Machine-learning-based Science. Science Advances 2024; 10:eadk3452. [PMID: 38691601 PMCID: PMC11092361 DOI: 10.1126/sciadv.adk3452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 03/29/2024] [Indexed: 05/03/2024]
Abstract
Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear recommendations for conducting and reporting ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist (recommendations for machine-learning-based science). It consists of 32 questions and a paired set of guidelines. REFORMS was developed on the basis of a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.
Affiliation(s)
- Sayash Kapoor
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Emily M. Cantrell
  - Department of Sociology, Princeton University, Princeton, NJ 08544, USA
  - School of Public and International Affairs, Princeton University, Princeton, NJ 08544, USA
- Kenny Peng
  - Department of Computer Science, Cornell University, Ithaca, NY 14850, USA
- Thanh Hien Pham
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
- Christopher A. Bail
  - Department of Sociology, Duke University, Durham, NC 27708, USA
  - Department of Political Science, Duke University, Durham, NC 27708, USA
  - Sanford School of Public Policy, Duke University, Durham, NC 27708, USA
- Odd Erik Gundersen
  - Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
  - Aneo AS, Trondheim, Norway
- Jessica Hullman
  - Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
- Michael A. Lones
  - School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK
- Momin M. Malik
  - Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA
  - School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Institute in Critical Quantitative, Computational, & Mixed Methodologies, Johns Hopkins University, Baltimore, MD 21218, USA
- Priyanka Nanayakkara
  - Department of Computer Science, Northwestern University, Evanston, IL 60208, USA
  - Department of Communication Studies, Northwestern University, Evanston, IL 60208, USA
- Inioluwa Deborah Raji
  - Department of Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA
- Michael Roberts
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
- Matthew J. Salganik
  - Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
  - Department of Sociology, Princeton University, Princeton, NJ 08544, USA
  - Office of Population Research, Princeton University, Princeton, NJ 08544, USA
- Marta Serra-Garcia
  - Rady School of Management, University of California, San Diego, La Jolla, CA 92093, USA
- Brandon M. Stewart
  - Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
  - Department of Sociology, Princeton University, Princeton, NJ 08544, USA
  - Office of Population Research, Princeton University, Princeton, NJ 08544, USA
  - Department of Politics, Princeton University, Princeton, NJ 08544, USA
- Gilles Vandewiele
  - Department of Information Technology, Ghent University, Ghent, Belgium
- Arvind Narayanan
  - Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
  - Center for Information Technology Policy, Princeton University, Princeton, NJ 08544, USA
9. Nimmal Haribabu G, Basu B. Implementing Machine Learning approaches for accelerated prediction of bone strain in acetabulum of a hip joint. J Mech Behav Biomed Mater 2024; 153:106495. [PMID: 38460455 DOI: 10.1016/j.jmbbm.2024.106495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 02/10/2024] [Accepted: 03/01/2024] [Indexed: 03/11/2024]
Abstract
Finite Element (FE) methods for biomechanical analysis involving implant design and subject parameters for musculoskeletal applications are extensively reported in the literature. Such an approach is manually intensive and computationally expensive, with long simulation times. Although Artificial Intelligence (AI) based approaches have been implemented to a limited extent in biomechanics, such approaches to predicting bone strain in the acetabulum of a hip joint are hardly explored. In this context, the primary objective of this paper is to evaluate machine learning (ML) models in tandem with high-fidelity FEA data for the accelerated prediction of the biomechanical response in the acetabulum of the human hip joint during the walking gait. The parameters used in the FEA study included the subject weight, number and distribution of fins on the periphery of the acetabular shell, bone condition and phases of the gait cycle. The biomechanical response has also been evaluated using three different acetabular liners, including pre-clinically validated HDPE-20% HA-20% Al2O3, highly-crosslinked ultrahigh molecular weight polyethylene (HC-UHMWPE) and ZrO2-toughened Al2O3 (ZTA). Such parametric variation in FEA analysis, involving 26 variables and a full factorial design, resulted in 10,752 datasets for spatially varying bone strains. The bone condition, as opposed to subject weight, was found to play a statistically significant role in determining the strain response in the periprosthetic bone of the acetabulum. While utilising hyperparameter tuning, K-fold cross validation and statistical learning approaches, a number of ML models were trained on the FEA dataset, and the Random Forest model performed the best with a coefficient of determination (R²) value of 0.99/0.97 and Root Mean Square Error (RMSE) of 0.02/0.01 on the training/test dataset. Taken together, this study establishes the potential of the ML approach as a fast surrogate of FEA for implant biomechanics analysis, in less than a minute.
Affiliation(s)
- Gowtham Nimmal Haribabu
  - Laboratory for Biomaterials Science and Translational Research, Materials Research Centre, Indian Institute of Science, Bangalore, 560012, India
- Bikramjit Basu
  - Laboratory for Biomaterials Science and Translational Research, Materials Research Centre, Indian Institute of Science, Bangalore, 560012, India.
10. Saingam P, Jain T, Woicik A, Li B, Candry P, Redcorn R, Wang S, Himmelfarb J, Bryan A, Winkler MKH, Gattuso M. Integrating socio-economic vulnerability factors improves neighborhood-scale wastewater-based epidemiology for public health applications. Water Research 2024; 254:121415. [PMID: 38479175 DOI: 10.1016/j.watres.2024.121415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 02/28/2024] [Accepted: 03/03/2024] [Indexed: 04/06/2024]
Abstract
Wastewater Based Epidemiology (WBE) of COVID-19 is a low-cost, non-invasive, and inclusive early warning tool for disease spread. Previously studied WBE focused on sampling at wastewater treatment plant scale, limiting the level at which demographic and geographic variations in disease dynamics can be incorporated into the analysis of certain neighborhoods. This study demonstrates the integration of demographic mapping to improve the WBE of COVID-19 and associated post-COVID disease prediction (here kidney disease) at the neighborhood level using machine learning. WBE was conducted at six neighborhoods in Seattle from October 2020 to February 2022. Wastewater processing and RT-qPCR were performed to obtain SARS-CoV-2 RNA concentration. Census data, clinical data of COVID-19, as well as patient data of acute kidney injury (AKI) cases reported during the study period were collected, and the distribution across the city was studied using Geographic Information System (GIS) mapping. Further, we analyzed the data set to better understand socioeconomic impacts on disease prevalence of COVID-19 and AKI per neighborhood. The heterogeneity of eleven demographic factors (such as education and age among others) was observed within neighborhoods across the city of Seattle. Dynamics of COVID-19 clinical cases and wastewater SARS-CoV-2 varied across neighborhoods with different levels of demographics. Machine learning models trained with data from the earlier stages of the pandemic were able to predict both COVID-19 and AKI incidence in the later stages of the pandemic (Spearman correlation coefficient of 0.546 to 0.904), with the most predictive model trained on the combination of wastewater data and demographics. The integration of demographics strengthened machine learning models' capabilities to predict prevalence of COVID-19, and of AKI as a marker for post-COVID sequelae. Demographic-based WBE presents an effective tool to monitor and manage public health beyond COVID-19 at the neighborhood level.
Affiliation(s)
- Prakit Saingam
- Department of Civil and Environmental Engineering, University of Washington, Seattle, WA, United States.
| | - Tanisha Jain
- Department of Civil and Environmental Engineering, University of Washington, Seattle, WA, United States
| | - Addie Woicik
- Department of Computer Science & Engineering, University of Washington, Seattle, WA, United States
| | - Bo Li
- Department of Civil and Environmental Engineering, University of Washington, Seattle, WA, United States
| | - Pieter Candry
- Department of Civil and Environmental Engineering, University of Washington, Seattle, WA, United States
| | - Raymond Redcorn
- Department of Civil and Environmental Engineering, University of Washington, Seattle, WA, United States
| | - Sheng Wang
- Department of Computer Science & Engineering, University of Washington, Seattle, WA, United States
| | - Jonathan Himmelfarb
- Kidney Research Institute, University of Washington, Seattle, WA, United States; Center for Dialysis Innovation, University of Washington, Seattle, WA, United States
| | - Andrew Bryan
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, United States
| | - Mari K H Winkler
- Department of Civil and Environmental Engineering, University of Washington, Seattle, WA, United States
| | - Meghan Gattuso
- Seattle Public Utilities, Project Delivery and Engineering, 700 5th Ave, Seattle, WA 98104, United States
|
11
|
Santana JEG, Oliveira-Tintino CDDM, Alencar GG, Siqueira GM, Almeida-Bezerra JW, Viana Rodrigues JP, Pinheiro Gonçalves VB, Nicolete R, Tintino SR, Coutinho HDM, Silva TGD. Liposomal nanoformulations with trans-caryophyllene and caryophyllene oxide: do they have an inhibitory action on the efflux pumps NorA, Tet(K), MsrA, and MepA? Chem Biol Interact 2024; 393:110945. [PMID: 38460934 DOI: 10.1016/j.cbi.2024.110945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/09/2024] [Accepted: 03/06/2024] [Indexed: 03/11/2024]
Abstract
This study aimed to evaluate the antibacterial activity of the sesquiterpenes trans-caryophyllene and caryophyllene oxide, both isolated and encapsulated in liposomes, and their inhibitory action on the NorA, Tet(K), MsrA and MepA efflux pumps in S. aureus strains. Both activities were evaluated through the serial microdilution test in 96-well microplates. Each sesquiterpene and liposome/sesquiterpene was combined with antibiotics and ethidium bromide (EtBr). The antibiotics used were norfloxacin, tetracycline and erythromycin. The S. aureus strains 1199 B, IS-58, RN4220 and K2068, carrying NorA, Tet(K), MsrA and MepA, respectively, were tested. In the fluorescence measurement test, S. aureus K2068 was incubated with the sesquiterpenes and EtBr, and the fluorescence emission by EtBr was measured. The tested substances did not show direct antibacterial activity, with MIC >1024 μg/mL. Nonetheless, isolated trans-caryophyllene and caryophyllene oxide reduced the MIC of antibiotics and EtBr, indicating inhibition of NorA, Tet(K) and MsrA. In the fluorescence test, these same sesquiterpenes increased fluorescence emission, indicating inhibition of MepA. Therefore, trans-caryophyllene and caryophyllene oxide did not show direct antibacterial action; however, in their isolated form, they showed possible inhibitory action on the NorA, Tet(K), MsrA and MepA efflux pumps. They may also act in antibiotic potentiation. Further studies are needed to identify the mechanisms involved in antibiotic potentiation and efflux pump inhibitory action.
Affiliation(s)
| | | | - Gabriel Gonçalves Alencar
- Department of Biological Chemistry, Universidade Regional Do Cariri (URCA), Crato, 63105-010, Ceará, Brazil
| | - Gustavo Miguel Siqueira
- Department of Biological Chemistry, Universidade Regional Do Cariri (URCA), Crato, 63105-010, Ceará, Brazil
| | | | | | | | - Roberto Nicolete
- Oswaldo Cruz Foundation (Fiocruz Ceará), Eusebio, 61773-270, Ceará, Brazil
| | - Saulo Relison Tintino
- Department of Biological Chemistry, Universidade Regional Do Cariri (URCA), Crato, 63105-010, Ceará, Brazil
|
12
|
Ran W, Yu Q. Data-driven clustering approach to identify novel clusters of high cognitive impairment risk among Chinese community-dwelling elderly people with normal cognition: A national cohort study. J Glob Health 2024; 14:04088. [PMID: 38638099 PMCID: PMC11026990 DOI: 10.7189/jogh.14.04088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/20/2024] Open
Abstract
Background Cognitive impairment is a highly heterogeneous disorder that necessitates further investigation into the distinct characteristics of populations at varying risk levels of cognitive impairment. Using a large-scale registry cohort of elderly individuals, we applied a data-driven approach to identify novel clusters based on diverse sociodemographic features. Methods A prospective cohort of 6398 elderly people from the Chinese Longitudinal Healthy Longevity Survey, followed from 2008 to 2014, was used to develop and validate the model. Participants who were aged ≥60 years, were community-dwelling, and had a score ≥18 on the Chinese version of the Mini-Mental State Examination (MMSE) were included. Sixty-nine sociodemographic features were included in the analysis. The total population was divided into two-thirds for the derivation cohort (n = 4265) and one-third for the validation cohort (n = 2133). In the derivation cohort, an unsupervised Gaussian mixture model was applied to categorise participants into distinct clusters. A classifier was developed based on the 10 most important factors and was applied to categorise participants into their corresponding clusters in the validation cohort. The difference in the three-year risk of cognitive impairment was compared across the clusters. Results We identified four clusters with distinct features in the derivation cohort. Cluster 1 was associated with the worst life independence, longest sleep duration, and the oldest age. Cluster 2 demonstrated the highest loneliness, characterised by non-marital status and living alone. Cluster 3 was characterised by the lowest sense of loneliness and the highest proportions in marital status and family co-residence. Cluster 4 demonstrated heightened engagement in exercise and leisure activity, along with independent decision-making, hygiene, and a diverse diet. 
In comparison to Cluster 4, Cluster 1 exhibited the highest three-year cognitive impairment risk (adjusted odds ratio (aOR) = 3.31; 95% confidence interval (CI) = 1.81-6.05), followed by Cluster 2 and Cluster 3 after adjustment for baseline MMSE, residence, sex, age, years of education, drinking, smoking, hypertension, diabetes, heart disease and stroke or cardiovascular diseases. Conclusions A data-driven approach can be instrumental in identifying individuals at high risk of cognitive impairment among cognitively normal elderly populations. Based on various sociodemographic features, these clusters can suggest individualised intervention plans.
Affiliation(s)
- Wang Ran
- Zhejiang Provincial People’s Hospital, People’s Hospital of Hangzhou Medical College, Hangzhou, China
| | - Qiutong Yu
- Medical Education Department, Zhejiang Provincial People’s Hospital, People’s Hospital of Hangzhou Medical College, Hangzhou, China
|
13
|
Eijsbroek VC, Kjell K, Schwartz HA, Boehnke JR, Fried EI, Klein DN, Gustafsson P, Augenstein I, Bossuyt PMM, Kjell O. The LEADING Guideline: Reporting Standards for Expert Panel, Best-Estimate Diagnosis, and Longitudinal Expert All Data (LEAD) Studies. medRxiv 2024:2024.03.19.24304526. [PMID: 38699296 PMCID: PMC11065032 DOI: 10.1101/2024.03.19.24304526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
Accurate assessments of symptoms and diagnoses are essential for health research and clinical practice but face many challenges. The absence of a single error-free measure is currently addressed by assessment methods in which experts review several sources of information to achieve a more accurate or best-estimate assessment. Three bodies of work spanning medicine, psychiatry, and psychology propose similar assessment methods: the Expert Panel, the Best-Estimate Diagnosis, and the Longitudinal Expert All Data (LEAD). However, the quality of such best-estimate assessments is typically very difficult to evaluate because the assessment methods are often poorly reported, and when they are reported, the reporting quality varies substantially. Here we tackle this gap by developing reporting guidelines for such studies, using a four-stage approach: 1) drafting reporting standards accompanied by rationales and empirical evidence, which were further developed with a patient organization for depression, 2) incorporating expert feedback through a two-round Delphi procedure, 3) refining the guideline based on an expert consensus meeting, and 4) testing the guideline by i) having two researchers test it and ii) using it to examine the extent to which previously published articles report the standards. The last step also demonstrates the need for the guideline: 18 to 58% (Mean = 33%) of the standards were not reported across fifteen randomly selected studies. The LEADING guideline comprises 20 reporting standards related to four groups: the Longitudinal design; the Appropriate data; the Evaluation (experts, materials, and procedures); and the Validity group. We hope that the LEADING guideline will be useful in assisting researchers in planning, reporting, and evaluating research aiming to achieve best-estimate assessments.
Affiliation(s)
| | | | - H Andrew Schwartz
- Department of Computer Science, Stony Brook University, New York, the United States
| | - Jan R Boehnke
- School of Health Sciences, University of Dundee, Dundee, Scotland
| | - Eiko I Fried
- Institute of Psychology, Leiden University, Leiden, the Netherlands
| | - Daniel N Klein
- Department of Psychology, Stony Brook University, New York, the United States
| | | | - Isabelle Augenstein
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
| | - Patrick M M Bossuyt
- Department of Epidemiology and Data Science, Amsterdam University Medical Centers, Amsterdam, the Netherlands
| | - Oscar Kjell
- Department of Psychology, Lund University, Lund, Sweden
|
14
|
Yan Y, Schillemans T, Skantze V, Brunius C. Adjusting for covariates and assessing modeling fitness in machine learning using MUVR2. Bioinformatics Advances 2024; 4:vbae051. [PMID: 38645717 PMCID: PMC11031361 DOI: 10.1093/bioadv/vbae051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 03/05/2024] [Accepted: 04/03/2024] [Indexed: 04/23/2024]
Abstract
Motivation Machine learning (ML) methods are frequently used in omics research to examine associations between molecular data and, for example, exposures and health conditions. ML is also used for feature selection to facilitate biological interpretation. Our previous MUVR algorithm was shown to generate predictions and variable selections at state-of-the-art performance. However, a general framework for assessing modeling fitness is still lacking. In addition, the ability to adjust for covariates is a highly desired but largely lacking feature in ML. We aimed to address these issues in the new MUVR2 framework. Results The MUVR2 algorithm was developed to include the regularized regression framework elastic net in addition to partial least squares and random forest modeling. Compared with other cross-validation strategies, MUVR2 consistently showed state-of-the-art performance, including variable selection, while minimizing overfitting. Testing on simulated and real-world data, we also showed that MUVR2 allows for adjustment for covariates using elastic net modeling, but not using partial least squares or random forest. Availability and implementation Algorithms, data, scripts, and a tutorial are open source under the GPL-3 license and available in the MUVR2 R package at https://github.com/MetaboComp/MUVR2.
Affiliation(s)
- Yingxiao Yan
- Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden
| | - Tessa Schillemans
- Cardiovascular and Nutritional Epidemiology, Institute of Environmental Medicine, Karolinska Institute, Stockholm, Sweden
| | - Viktor Skantze
- Fraunhofer-Chalmers Research Centre for Industrial Mathematics, Gothenburg, Sweden
| | - Carl Brunius
- Department of Life Sciences, Chalmers University of Technology, Gothenburg, Sweden
- Chalmers Mass Spectrometry Infrastructure, Chalmers University of Technology, Gothenburg SE-41296, Sweden
|
15
|
Choo SM, Sartori D, Lee SC, Yang HC, Syed-Abdul S. Data-Driven Identification of Factors That Influence the Quality of Adverse Event Reports: 15-Year Interpretable Machine Learning and Time-Series Analyses of VigiBase and QUEST. JMIR Med Inform 2024; 12:e49643. [PMID: 38568722 PMCID: PMC11024759 DOI: 10.2196/49643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 10/10/2023] [Accepted: 02/24/2024] [Indexed: 04/20/2024] Open
Abstract
BACKGROUND The completeness of adverse event (AE) reports, crucial for assessing putative causal relationships, is measured using the vigiGrade completeness score in VigiBase, the World Health Organization global database of reported potential AEs. Malaysian reports have surpassed the global average score (approximately 0.44), achieving a 5-year average of 0.79 (SD 0.23) as of 2019 and approaching the benchmark for well-documented reports (0.80). However, the contributing factors to this relatively high report completeness score remain unexplored. OBJECTIVE This study aims to explore the main drivers influencing the completeness of Malaysian AE reports in VigiBase over a 15-year period using vigiGrade. A secondary objective was to understand the strategic measures taken by the Malaysian authorities leading to enhanced report completeness across different time frames. METHODS We analyzed 132,738 Malaysian reports (2005-2019) recorded in VigiBase up to February 2021 split into historical International Drug Information System (INTDIS; n=63,943, 48.17% in 2005-2016) and newer E2B (n=68,795, 51.83% in 2015-2019) format subsets. For machine learning analyses, we performed a 2-stage feature selection followed by a random forest classifier to identify the top features predicting well-documented reports. We subsequently applied tree Shapley additive explanations to examine the magnitude, prevalence, and direction of feature effects. In addition, we conducted time-series analyses to evaluate chronological trends and potential influences of key interventions on reporting quality. RESULTS Among the analyzed reports, 42.84% (56,877/132,738) were well documented, with an increase of 65.37% (53,929/82,497) since 2015. Over two-thirds (46,186/68,795, 67.14%) of the Malaysian E2B reports were well documented compared to INTDIS reports at 16.72% (10,691/63,943). 
For INTDIS reports, higher pharmacovigilance center staffing was the primary feature positively associated with being well documented. In recent E2B reports, the top positive features included reaction abated upon drug dechallenge, reaction onset or drug use duration of <1 week, dosing interval of <1 day, reports from public specialist hospitals, reports by pharmacists, and reaction duration between 1 and 6 days. In contrast, reports from product registration holders and other health care professionals and reactions involving product substitution issues negatively affected the quality of E2B reports. Multifaceted strategies and interventions comprising policy changes, continuity of education, and human resource development laid the groundwork for AE reporting in Malaysia, whereas advancements in technological infrastructure, pharmacovigilance databases, and reporting tools concurred with increases in both the quantity and quality of AE reports. CONCLUSIONS Through interpretable machine learning and time-series analyses, this study identified key features that positively or negatively influence the completeness of Malaysian AE reports and unveiled how Malaysia has developed its pharmacovigilance capacity via multifaceted strategies and interventions. These findings will guide future work in enhancing pharmacovigilance and public health.
Affiliation(s)
- Sim Mei Choo
- Centre of Compliance & Quality Control, National Pharmaceutical Regulatory Agency, Petaling Jaya, Malaysia
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan
| | | | - Sing Chet Lee
- Centre of Compliance & Quality Control, National Pharmaceutical Regulatory Agency, Petaling Jaya, Malaysia
| | - Hsuan-Chia Yang
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan
- International Center for Health Information Technology, Taipei Medical University, Taipei, Taiwan
- Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei Medical University, Taipei, Taiwan
- Research Center of Big Data and Meta-Analysis, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
| | - Shabbir Syed-Abdul
- Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei, Taiwan
- International Center for Health Information Technology, Taipei Medical University, Taipei, Taiwan
- School of Gerontology and Long-Term Care, Taipei Medical University, Taipei, Taiwan
|
16
|
Sun C, Fang R, Salemi M, Prosperi M, Rife Magalis B. DeepDynaForecast: Phylogenetic-informed graph deep learning for epidemic transmission dynamic prediction. PLoS Comput Biol 2024; 20:e1011351. [PMID: 38598563 PMCID: PMC11034642 DOI: 10.1371/journal.pcbi.1011351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 04/22/2024] [Accepted: 03/11/2024] [Indexed: 04/12/2024] Open
Abstract
In the midst of an outbreak or sustained epidemic, reliable prediction of transmission risks and patterns of spread is critical to inform public health programs. Projections of transmission growth or decline among specific risk groups can aid in optimizing interventions, particularly when resources are limited. Phylogenetic trees have been widely used in the detection of transmission chains and high-risk populations. Moreover, tree topology and the incorporation of population parameters (phylodynamics) can be useful in reconstructing the evolutionary dynamics of an epidemic across space and time among individuals. We now demonstrate the utility of phylodynamic trees for transmission modeling and forecasting, developing a phylogeny-based deep learning system, referred to as DeepDynaForecast. Our approach leverages a primal-dual graph learning structure with shortcut multi-layer aggregation, which is suited for the early identification and prediction of transmission dynamics in emerging high-risk groups. We demonstrate the accuracy of DeepDynaForecast using simulated outbreak data and the utility of the learned model using empirical, large-scale data from the human immunodeficiency virus epidemic in Florida between 2012 and 2020. Our framework is available as open-source software (MIT license) at github.com/lab-smile/DeepDynaForcast.
Affiliation(s)
- Chaoyue Sun
- Department of Electrical and Computer Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
| | - Ruogu Fang
- Department of Electrical and Computer Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- J. Crayton Pruitt Family Department of Biomedical Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida, United States of America
- Center for Cognitive Aging and Memory, McKnight Brain Institute, University of Florida, Gainesville, Florida, United States of America
| | - Marco Salemi
- Department of Pathology, Immunology, and Laboratory Medicine, University of Florida, Gainesville, Florida, United States of America
- Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America
| | - Mattia Prosperi
- Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America
- Department of Epidemiology, University of Florida, Gainesville, Florida, United States of America
| | - Brittany Rife Magalis
- Department of Pathology, Immunology, and Laboratory Medicine, University of Florida, Gainesville, Florida, United States of America
- Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America
|
17
|
Wójcik Z, Dimitrova V, Warrington L, Velikova G, Absolom K. Using Machine Learning to Predict Unplanned Hospital Utilization and Chemotherapy Management From Patient-Reported Outcome Measures. JCO Clin Cancer Inform 2024; 8:e2300264. [PMID: 38669610 PMCID: PMC11161248 DOI: 10.1200/cci.23.00264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 02/14/2024] [Accepted: 03/01/2024] [Indexed: 04/28/2024] Open
Abstract
PURPOSE Adverse effects of chemotherapy often require hospital admissions or treatment management. Identifying factors contributing to unplanned hospital utilization may improve health care quality and patients' well-being. This study aimed to assess if patient-reported outcome measures (PROMs) improve performance of machine learning (ML) models predicting hospital admissions, triage events (contacting helpline or attending hospital), and changes to chemotherapy. MATERIALS AND METHODS Clinical trial data were used and contained responses to three PROMs (European Organisation for Research and Treatment of Cancer Core Quality of Life Questionnaire [QLQ-C30], EuroQol Five-Dimensional Visual Analogue Scale [EQ-5D], and Functional Assessment of Cancer Therapy-General [FACT-G]) and clinical information on 508 participants undergoing chemotherapy. Six feature sets (with following variables: [1] all available; [2] clinical; [3] PROMs; [4] clinical and QLQ-C30; [5] clinical and EQ-5D; [6] clinical and FACT-G) were applied in six ML models (logistic regression [LR], decision tree, adaptive boosting, random forest [RF], support vector machines [SVMs], and neural network) to predict admissions, triage events, and chemotherapy changes. RESULTS The comprehensive analysis of predictive performances of the six ML models for each feature set in three different methods for handling class imbalance indicated that PROMs improved predictions of all outcomes. RF and SVMs had the highest performance for predicting admissions and changes to chemotherapy in balanced data sets, and LR in imbalanced data set. Balancing data led to the best performance compared with imbalanced data set or data set with balanced train set only. CONCLUSION These results endorsed the view that ML can be applied on PROM data to predict hospital utilization and chemotherapy management. If further explored, this study may contribute to health care planning and treatment personalization. 
Rigorous comparison of model performance affected by different imbalanced data handling methods shows best practice in ML research.
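One of the three imbalance-handling setups the abstract compares is a data set in which only the training split is balanced. A minimal sketch of that setup follows, assuming random oversampling of the minority class (the abstract does not name the balancing technique, so this choice, the function name, and the toy rows are all illustrative assumptions):

```python
import random
from collections import Counter


def split_then_oversample(rows, label_of, test_fraction=0.25, seed=0):
    """Hold out a test split first, then balance only the training rows.

    Random oversampling is an assumption here; the abstract does not name
    the balancing technique that was used.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Group training rows by class label.
    by_class = {}
    for row in train:
        by_class.setdefault(label_of(row), []).append(row)

    # Duplicate randomly chosen rows until every class matches the largest one.
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced, test


# Hypothetical data: 100 (patient_id, admitted) rows with 10% positives.
rows = [(i, 1 if i < 10 else 0) for i in range(100)]
train_balanced, test = split_then_oversample(rows, label_of=lambda r: r[1])
train_counts = Counter(label for _, label in train_balanced)
test_counts = Counter(label for _, label in test)
```

Splitting before resampling keeps duplicated rows out of the held-out set, so the test split retains the original class proportions. Balancing the test data as well, as in the abstract's fully balanced setup, changes what the reported metric measures, which is one reason the three setups can rank models differently.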
Affiliation(s)
- Zuzanna Wójcik
- UKRI Centre for Doctoral Training in Artificial Intelligence for Medical Diagnosis and Care, University of Leeds, Leeds, United Kingdom
| | - Vania Dimitrova
- School of Computing, University of Leeds, Leeds, United Kingdom
| | - Lorraine Warrington
- Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, United Kingdom
| | - Galina Velikova
- Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, United Kingdom
- Leeds Cancer Centre, Leeds Teaching Hospitals NHS Trust, Leeds, United Kingdom
| | - Kate Absolom
- Leeds Institute of Medical Research, University of Leeds, St James's University Hospital, Leeds, United Kingdom
- Leeds Institute of Health Sciences, University of Leeds, Leeds, United Kingdom
|
18
|
Zhu J, Wu Y, Lin S, Duan S, Wang X, Fang Y. Identifying and predicting physical limitation and cognitive decline trajectory group of older adults in China: A data-driven machine learning analysis. J Affect Disord 2024; 350:590-599. [PMID: 38218258 DOI: 10.1016/j.jad.2024.01.095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 11/24/2023] [Accepted: 01/07/2024] [Indexed: 01/15/2024]
Abstract
OBJECTIVE This study aimed to use data-driven machine learning methods to identify and predict potential physical and cognitive function trajectory groups of older adults and to determine the crucial factors for promoting active ageing in China. METHODS Longitudinal data on 3026 older adults from the Chinese Longitudinal Healthy Longevity and Happy Family Survey were used to identify potential physical and cognitive function trajectory groups using a group-based multi-trajectory model (GBMTM). Predictors were selected from sociodemographic characteristics, lifestyle factors, and physical and mental conditions. The trajectory groups were predicted using data-driven machine learning models and a dynamic nomogram. Model performance was evaluated by the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (PRAUC), and the confusion matrix. RESULTS Two physical and cognitive function trajectory groups were determined: a trajectory group with physical limitation and cognitive decline (14.18%) and a normal trajectory group (85.82%). Logistic regression performed well in predicting trajectory groups (AUROC = 0.881, PRAUC = 0.649). Older adults with a lower baseline score for activities of daily living, older age, less frequent housework, and fewer actual teeth were more likely to belong to the physical limitation and cognitive decline trajectory group. LIMITATION This study did not carry out external validation. CONCLUSIONS This study shows that GBMTM and machine learning models effectively identify and predict the physical limitation and cognitive decline trajectory group. The identified predictors might be essential for developing targeted interventions to promote healthy ageing.
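Two of the evaluation measures named in this abstract, AUROC and the confusion matrix, can be illustrated with a short sketch. It is not the study's code: the function names, the 0.5 threshold, and the labels and scores are hypothetical. AUROC is computed here from its pairwise definition, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case:

```python
def auroc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def confusion_matrix(labels, scores, threshold=0.5):
    """Count TP, FP, FN, TN when scores at or above the threshold predict class 1."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


# Hypothetical labels (1 = decline trajectory group) and model scores.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.8, 0.1, 0.35]
area = auroc(labels, scores)
matrix = confusion_matrix(labels, scores)
```

For these invented values, 12.5 of the 15 positive/negative pairs are ordered correctly, giving an AUROC of about 0.833, and the 0.5 threshold yields 2 true positives, 1 false positive, 1 false negative, and 4 true negatives. The pairwise loop is quadratic in the number of cases, so a rank-based formulation is preferable for large cohorts.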
Affiliation(s)
- Junmin Zhu
- School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen, Fujian, China
| | - Yafei Wu
- School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen, Fujian, China
| | - Shaowu Lin
- School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen, Fujian, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
| | - Siyu Duan
- School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen, Fujian, China
| | - Xing Wang
- School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen, Fujian, China
| | - Ya Fang
- School of Public Health, Xiamen University, Xiamen, Fujian, China; Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen, Fujian, China; National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China.
|
19
|
Brooks JM, Chapman CG, Chen BK, Floyd SB, Hikmet N. Assessing the properties of patient-specific treatment effect estimates from causal forest algorithms under essential heterogeneity. BMC Med Res Methodol 2024; 24:66. [PMID: 38481139 PMCID: PMC10935905 DOI: 10.1186/s12874-024-02187-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 02/21/2024] [Indexed: 03/17/2024] Open
Abstract
BACKGROUND Treatment variation from observational data has been used to estimate patient-specific treatment effects. Causal Forest Algorithms (CFAs) developed for this task have unknown properties when treatment effect heterogeneity from unmeasured patient factors influences treatment choice - essential heterogeneity. METHODS We simulated eleven populations with identical treatment effect distributions based on patient factors. The populations varied in the extent that treatment effect heterogeneity influenced treatment choice. We used the generalized random forest application (CFA-GRF) to estimate patient-specific treatment effects for each population. Average differences between true and estimated effects for patient subsets were evaluated. RESULTS CFA-GRF performed well across the population when treatment effect heterogeneity did not influence treatment choice. Under essential heterogeneity, however, CFA-GRF yielded treatment effect estimates that reflected true treatment effects only for treated patients and were on average greater than true treatment effects for untreated patients. CONCLUSIONS Patient-specific estimates produced by CFAs are sensitive to why patients in real-world practice make different treatment choices. Researchers using CFAs should develop conceptual frameworks of treatment choice prior to estimation to guide estimate interpretation ex post.
Affiliation(s)
- John M Brooks
- Center for Effectiveness Research in Orthopaedics - Arnold School of Public Health Greenville, 915 Greene Street #302D, Columbia, SC, 29208-0001, USA.
- University of South Carolina Arnold School of Public Health, Health Services Policy & Management, Columbia, SC, USA.
| | - Cole G Chapman
- Department of Pharmacy Practice and Science Iowa City, University of Iowa, Iowa, USA
- Center for Effectiveness Research in Orthopaedics, Greenville, SC, USA
| | - Brian K Chen
- University of South Carolina Arnold School of Public Health, Health Services Policy & Management, Columbia, SC, USA
- Center for Effectiveness Research in Orthopaedics, Greenville, SC, USA
| | - Sarah B Floyd
- Center for Effectiveness Research in Orthopaedics, Greenville, SC, USA
- Clemson University College of Behavioral Social and Health Sciences, Public Health Sciences, Clemson, South Carolina, USA
| | - Neset Hikmet
- Center for Effectiveness Research in Orthopaedics, Greenville, SC, USA
- Department of Integrated Information Technology, Innovation Think Tank Lab @ USC, University of South Carolina College of Engineering and Computing, Columbia, SC, USA
|
20
|
Zafar F, Fakhare Alam L, Vivas RR, Wang J, Whei SJ, Mehmood S, Sadeghzadegan A, Lakkimsetti M, Nazir Z. The Role of Artificial Intelligence in Identifying Depression and Anxiety: A Comprehensive Literature Review. Cureus 2024; 16:e56472. [PMID: 38638735 PMCID: PMC11025697 DOI: 10.7759/cureus.56472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/18/2024] [Indexed: 04/20/2024] Open
Abstract
This narrative literature review examines the development of artificial intelligence (AI)-powered tools for depression and anxiety detection, from the underlying algorithms to practical applications. Delivering essential mental health care services is now a significant public health priority. In recent years, AI has become a game-changer in the early identification of, and intervention in, these pervasive mental health disorders. AI tools can potentially empower behavioral healthcare services by helping psychiatrists collect objective data on patients' progress and tasks. This study emphasizes the current understanding of AI, the different types of AI, its current use in multiple mental health disorders, and its advantages, disadvantages, and future potential. As technology develops and digitalization increases, the application of artificial intelligence in psychiatry will rise; therefore, a comprehensive understanding will be needed. We searched PubMed, Google Scholar, and ScienceDirect using relevant keywords. A recent review of studies using electronic health records (EHR) with AI and machine learning techniques for diagnosing all clinical conditions found roughly 99 publications. Of these, 35 studies addressed mental health disorders in all age groups, and among them, six studies utilized EHR data sources. By critically analyzing prominent scholarly works, we aim to illuminate the current state of this technology, exploring its successes, limitations, and future directions. In doing so, we hope to contribute to a nuanced understanding of AI's potential to revolutionize mental health diagnostics and pave the way for further research and development in this critically important domain.
Affiliation(s)
- Fabeha Zafar
- Internal Medicine, Dow University of Health Sciences (DUHS), Karachi, PAK
| | | | - Rafael R Vivas
- Nutrition, Food and Exercise Sciences, Florida State University College of Human Sciences, Tallahassee, USA
| | - Jada Wang
- Medicine, St. George's University, Brooklyn, USA
| | - See Jia Whei
- Internal Medicine, Sriwijaya University, Palembang, IDN
| | | | | | | | - Zahra Nazir
- Internal Medicine, Combined Military Hospital, Quetta, Quetta, PAK
|
21
|
Lee H, Hanson HA, Logan J, Maguire D, Kapadia A, Dewji S, Agasthya G. Evaluating county-level lung cancer incidence from environmental radiation exposure, PM2.5, and other exposures with regression and machine learning models. Environmental Geochemistry and Health 2024; 46:82. [PMID: 38367080 PMCID: PMC10874317 DOI: 10.1007/s10653-023-01820-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 11/27/2023] [Indexed: 02/19/2024]
Abstract
Characterizing the interplay between exposures shaping the human exposome is vital for uncovering the etiology of complex diseases. For example, cancer risk is modified by a range of multifactorial external environmental exposures. Environmental, socioeconomic, and lifestyle factors all shape lung cancer risk. However, epidemiological studies of radon aimed at identifying populations at high risk for lung cancer often fail to consider multiple exposures simultaneously. For example, moderating factors, such as PM2.5, may affect the transport of radon progeny to lung tissue. This ecological analysis leveraged a population-level dataset from the National Cancer Institute's Surveillance, Epidemiology, and End-Results data (2013-17) to simultaneously investigate the effect of multiple sources of low-dose radiation (gross [Formula: see text] activity and indoor radon) and PM2.5 on lung cancer incidence rates in the USA. County-level factors (environmental, sociodemographic, lifestyle) were controlled for, and Poisson regression and random forest models were used to assess the association between radon exposure and lung and bronchus cancer incidence rates. The tree-based machine learning (ML) method performed better than traditional regression. Poisson regression: 6.29/7.13 (mean absolute percentage error, MAPE), 12.70/12.77 (root mean square error, RMSE); Poisson random forest regression: 1.22/1.16 (MAPE), 8.01/8.15 (RMSE). The effect of PM2.5 increased with the concentration of environmental radon, thereby confirming findings from previous studies that investigated the possible synergistic effect of radon and PM2.5 on health outcomes.
In summary, the results demonstrated (1) a need to consider multiple environmental exposures when assessing radon exposure's association with lung cancer risk, thereby highlighting the importance of an exposomics framework, and (2) that employing ML models may capture the complex interplay between environmental exposures and health, as in the case of indoor radon exposure and lung cancer incidence.
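The two error metrics used in this abstract to compare the models, MAPE and RMSE, follow their standard definitions and can be sketched as below. The incidence rates and predictions are hypothetical values chosen for illustration. They are not data from the study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same units as the inputs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical county-level incidence rates per 100,000 (illustrative only).
observed = [50.0, 60.0, 80.0, 100.0]
predicted = [55.0, 54.0, 88.0, 90.0]

print(f"MAPE: {mape(observed, predicted):.2f}%")
print(f"RMSE: {rmse(observed, predicted):.2f}")
```

MAPE is scale-free, so it weights a 5-unit miss in a low-incidence county as heavily as a 10-unit miss in a county with twice the incidence, whereas RMSE penalizes large absolute errors most. That difference is one reason the two metrics can rank models differently.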
Affiliation(s)
- Heechan Lee
- Nuclear and Radiological Engineering and Medical Physics Programs, George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, 770 State Street, Atlanta, GA, 30332, USA
- Advanced Computing for Health Sciences Section, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37830, USA
| | - Heidi A Hanson
- Advanced Computing for Health Sciences Section, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37830, USA
| | - Jeremy Logan
- Data Engineering Group, Data and AI Section, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37830, USA
| | - Dakotah Maguire
- Advanced Computing for Health Sciences Section, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37830, USA
| | - Anuj Kapadia
- Advanced Computing for Health Sciences Section, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37830, USA
| | - Shaheen Dewji
- Nuclear and Radiological Engineering and Medical Physics Programs, George W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, 770 State Street, Atlanta, GA, 30332, USA
| | - Greeshma Agasthya
- Advanced Computing for Health Sciences Section, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37830, USA
|
22
|
Alkhamis MA, Al Jarallah M, Attur S, Zubaid M. Interpretable machine learning models for predicting in-hospital and 30 days adverse events in acute coronary syndrome patients in Kuwait. Sci Rep 2024; 14:1243. [PMID: 38216605 PMCID: PMC10786865 DOI: 10.1038/s41598-024-51604-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 01/07/2024] [Indexed: 01/14/2024] Open
Abstract
The relationships between adverse events in acute coronary syndromes (ACS) and the associated risk factors are typically complicated and nonlinear, which poses significant challenges to clinicians' attempts at risk stratification. Here, we aim to explore the implementation of modern risk stratification tools to untangle how these complex factors shape the risk of adverse events in patients with ACS. We used an interpretable multi-algorithm machine learning (ML) approach and clinical features to fit predictive models to 1,976 patients with ACS in Kuwait. We demonstrated that the random forest (RF) and extreme gradient boosting (XGB) algorithms markedly outperformed a traditional logistic regression model (AUCs = 0.84 and 0.79 for RF and XGB, respectively). Our in-hospital adverse events model identified left ventricular ejection fraction as the most important predictor with the highest interaction strength with other factors. However, using the 30-day adverse events model, we found that performing an urgent coronary artery bypass graft was the most important predictor, with creatinine levels having the strongest overall interaction with other related factors. Our ML models not only untangled the non-linear relationships that shape the clinical epidemiology of ACS adverse events but also elucidated their risk in individual patients based on their unique features.
Affiliation(s)
- Moh A Alkhamis
- Department of Epidemiology and Biostatistics, Health Sciences Center, College of Public Health, Kuwait University, Kuwait City, Kuwait.
| | - Mohammad Al Jarallah
- Department of Cardiology, Sabah Al Ahmed Cardiac Center, Ministry of Health, Kuwait City, Kuwait
| | - Sreeja Attur
- Department of Medicine, Health Sciences Center, Faculty of Medicine, Kuwait University, Kuwait City, Kuwait
| | - Mohammad Zubaid
- Department of Medicine, Health Sciences Center, Faculty of Medicine, Kuwait University, Kuwait City, Kuwait
|
23
|
Bednorz A, Mak JKL, Jylhävä J, Religa D. Use of Electronic Medical Records (EMR) in Gerontology: Benefits, Considerations and a Promising Future. Clin Interv Aging 2023; 18:2171-2183. [PMID: 38152074 PMCID: PMC10752027 DOI: 10.2147/cia.s400887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Accepted: 11/05/2023] [Indexed: 12/29/2023] Open
Abstract
Electronic medical records (EMRs) have many benefits in clinical research in gerontology, enabling data analysis, development of prognostic tools and disease risk prediction. EMRs also offer a range of advantages in clinical practice, such as comprehensive medical records, streamlined communication with healthcare providers, remote data access, and rapid retrieval of test results, ultimately leading to increased efficiency, enhanced patient safety, and improved quality of care in gerontology, which includes benefits like reduced medication use and better patient history taking and physical examination assessments. The use of artificial intelligence (AI) and machine learning (ML) approaches on EMRs can further improve disease diagnosis, symptom classification, and support clinical decision-making. However, there are also challenges related to data quality, data entry errors, as well as the ethics and safety of using AI in healthcare. This article discusses the future of EMRs in gerontology and the application of AI and ML in clinical research. Ethical and legal issues surrounding data sharing and the need for healthcare professionals to critically evaluate and integrate these technologies are also emphasized. The article concludes by discussing the challenges related to the use of EMRs in research as well as in their primary intended use, the daily clinical practice.
Affiliation(s)
- Adam Bednorz
- John Paul II Geriatric Hospital, Katowice, Poland
- Institute of Psychology, Humanitas Academy, Sosnowiec, Poland
| | - Jonathan K L Mak
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Juulia Jylhävä
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Faculty of Social Sciences (Health Sciences) and Gerontology Research Center (GEREC), University of Tampere, Tampere, Finland
| | - Dorota Religa
- Division of Clinical Geriatrics, Department of Neurobiology, Care sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Theme Inflammation and Aging, Karolinska University Hospital, Huddinge, Sweden
|
24
|
Ma Q, Cheng C, Chen Y, Wang Q, Li B, Wang P. Effect and prediction of physical exercise and diet on blood pressure control in patients with hypertension. Medicine (Baltimore) 2023; 102:e36612. [PMID: 38115342 PMCID: PMC10727525 DOI: 10.1097/md.0000000000036612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 08/07/2023] [Accepted: 11/21/2023] [Indexed: 12/21/2023] Open
Abstract
The study aims to explore the current status of hypertension control and its predictors in patients with hypertension in China and to provide evidence for preventing and controlling hypertension. A questionnaire survey was conducted among 300 hypertensive patients who visited the Second Affiliated Hospital of Anhui Medical University from February 20, 2023 to March 11, 2023. The patients were divided into a well-controlled group and an untargeted-control group according to their hypertension control status. A total of 294 subjects, including 83 in the well-controlled group and 211 in the untargeted-control group, were included in the analysis. Multivariate logistic regression analysis showed that high BMI and a family history of hypertension were risk factors for poor hypertension control. Being married was a protective factor for hypertension control. The support vector machine (SVM) model was optimized with γ = 0.001 and a penalty factor C = 0.001. The prediction accuracy of the final model was 80.9%. The findings indicated that BMI, family history of hypertension, and marital status were independent predictors of blood pressure control. Further studies are warranted to illustrate potential mechanisms for improving hypertensive patients' blood pressure control.
Affiliation(s)
- Qiang Ma
- Department of Police Physical Skills Training, Anhui Vocational College of Police Officers, Hefei, China
| | - Cheng Cheng
- Department of Cardiovascular Medicine, The Second Affiliated Hospital of Anhui Medical University, Hefei, China
| | - Yuenan Chen
- School of Pharmacy, Anhui Medical University, Hefei, China
| | - Qianya Wang
- School of Clinical Medicine, Anhui Medical University, Hefei, China
| | - Baozhu Li
- School of Public Health, Anhui Medical University, Hefei, China
| | - Ping Wang
- School of Innovation and Entrepreneurship, Anhui Medical University, Hefei, China
|
25
|
Gharbi-Meliani A, Husson F, Vandendriessche H, Bayen E, Yaffe K, Bachoud-Lévi AC, Cleret de Langavant L. Identification of high likelihood of dementia in population-based surveys using unsupervised clustering: a longitudinal analysis. Alzheimers Res Ther 2023; 15:209. [PMID: 38031083 PMCID: PMC10688099 DOI: 10.1186/s13195-023-01357-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/23/2023] [Accepted: 11/21/2023] [Indexed: 12/01/2023]
Abstract
BACKGROUND Dementia is defined as a cognitive decline that affects functional status. Longitudinal ageing surveys often lack a clinical diagnosis of dementia but do measure cognition and daily function over time. We used unsupervised machine learning and longitudinal data to identify transition to probable dementia. METHODS Multiple Factor Analysis was applied to longitudinal function and cognitive data of 15,278 baseline participants (aged 50 years or older) from the Survey of Health, Ageing, and Retirement in Europe (SHARE) (waves 1, 2 and 4-7, between 2004 and 2017). Hierarchical Clustering on Principal Components discriminated three clusters at each wave. We estimated probable or "Likely Dementia" prevalence by sex and age, and assessed whether dementia risk factors increased the risk of being assigned probable dementia status using multistate models. Next, we compared the "Likely Dementia" cluster with self-reported dementia status and replicated our findings in the English Longitudinal Study of Ageing (ELSA) cohort (waves 1-9, between 2002 and 2019, 7840 participants at baseline). RESULTS Our algorithm identified a higher number of probable dementia cases compared with self-reported cases and showed good discriminative power across all waves (AUC ranged from 0.754 [0.722-0.787] to 0.830 [0.800-0.861]). "Likely Dementia" status was more prevalent in older people, displayed a 2:1 female/male ratio, and was associated with nine factors that increased risk of transition to dementia: low education, hearing loss, hypertension, drinking, smoking, depression, social isolation, physical inactivity, diabetes, and obesity. Results were replicated in the ELSA cohort with good accuracy. CONCLUSIONS Machine learning clustering can be used to study dementia determinants and outcomes in longitudinal population ageing surveys in which a clinical diagnosis of dementia is lacking.
Affiliation(s)
- Amin Gharbi-Meliani
- Neuropsychologie Interventionnelle, U955 E01, Institut Mondor de Recherche Biomédicale & Département d'études Cognitives, INSERM, Ecole Normale Supérieure, Université PSL, Université Paris-Est Créteil, Creteil, 94000, France
| | - François Husson
- Institut Agro, Univ Rennes1, CNRS, IRMAR, Rennes, 35000, France
| | - Henri Vandendriessche
- Laboratoire de Neurosciences Cognitives et Computationnelles, Département d'études Cognitives, Ecole Normale Supérieure, Université PSL, INSERM, Paris, 75005, France
| | - Eleonore Bayen
- Département de Rééducation Neurologique, Sorbonne Université, Hôpital Pitié-Salpêtrière-Assistance Publique Hôpitaux de Paris, Paris, 75013, France
- Global Brain Health Institute, University of California, San Francisco, CA, 94143, USA
| | - Kristine Yaffe
- Global Brain Health Institute, University of California, San Francisco, CA, 94143, USA
- Departments of Psychiatry, Neurology and Epidemiology and Biostatistics, University of California, San Francisco, CA, 94143, USA
| | - Anne-Catherine Bachoud-Lévi
- Neuropsychologie Interventionnelle, U955 E01, Institut Mondor de Recherche Biomédicale & Département d'études Cognitives, INSERM, Ecole Normale Supérieure, Université PSL, Université Paris-Est Créteil, Creteil, 94000, France
- Service de Neurologie, Centre de référence maladie de Huntington, Hôpital Henri Mondor, Assistance Publique Hôpitaux de Paris, 1 rue Gustave Eiffel, Creteil, 94000, France
| | - Laurent Cleret de Langavant
- Neuropsychologie Interventionnelle, U955 E01, Institut Mondor de Recherche Biomédicale & Département d'études Cognitives, INSERM, Ecole Normale Supérieure, Université PSL, Université Paris-Est Créteil, Creteil, 94000, France.
- Global Brain Health Institute, University of California, San Francisco, CA, 94143, USA.
- Service de Neurologie, Centre de référence maladie de Huntington, Hôpital Henri Mondor, Assistance Publique Hôpitaux de Paris, 1 rue Gustave Eiffel, Creteil, 94000, France.
|
26
|
Li Q, Zheng JX, Jia TW, Feng XY, Lv C, Zhang LJ, Yang GJ, Xu J, Zhou XN. Optimized strategy for schistosomiasis elimination: results from marginal benefit modeling. Parasit Vectors 2023; 16:419. [PMID: 37968661 PMCID: PMC10652544 DOI: 10.1186/s13071-023-06001-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 10/06/2023] [Indexed: 11/17/2023] Open
Abstract
BACKGROUND Poverty contributes to the transmission of schistosomiasis via multiple pathways, with the insufficiency of appropriate interventions being a crucial factor. The aim of this article is to provide more economical and feasible intervention measures for endemic areas with varying levels of poverty. METHODS We collected and analyzed the prevalence patterns along with the cost of control measures in 11 counties over the last 20 years in China. Seven machine learning models, including XGBoost, support vector machine, generalized linear model, regression tree, random forest, gradient boosting machine and neural network, were used to develop models and calculate marginal benefits. RESULTS The XGBoost model had the highest prediction accuracy, with an R² of 0.7308. Results showed that risk surveillance, snail control with molluscicides and treatment were the most effective interventions in controlling schistosomiasis prevalence. The best combination of interventions interlaced seven interventions: risk surveillance, treatment, toilet construction, health education, snail control with molluscicides, cattle slaughter and animal chemotherapy. Risk surveillance had the highest marginal benefit among the nine interventions, and this benefit was influenced by the prevalence of schistosomiasis and cost. CONCLUSIONS In the elimination phase of the national schistosomiasis program, emphasizing risk surveillance holds significant importance in terms of cost-saving.
Affiliation(s)
- Qin Li
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
| | - Jin-Xin Zheng
- Ruijin Hospital Affiliated to The Shanghai Jiao Tong University Medical School, Shanghai, 200025, China
| | - Tie-Wu Jia
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
| | - Xin-Yu Feng
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
| | - Chao Lv
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
- School of Global Health, Chinese Center for Tropical Diseases Research and Shanghai Jiao Tong University School of Medicine, One Health Center, Shanghai Jiao Tong University and The Edinburgh University, Shanghai, 200025, China
| | - Li-Juan Zhang
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
| | - Guo-Jing Yang
- School of Tropical Medicine, Hainan Medical University, Haikou, 571199, China
| | - Jing Xu
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China
| | - Xiao-Nong Zhou
- National Institute of Parasitic Diseases, Chinese Center for Disease Control and Prevention (Chinese Center for Tropical Diseases Research), National Health Commission Key Laboratory of Parasite and Vector Biology, WHO Collaborating Centre for Tropical Diseases, National Center for International Research on Tropical Diseases, Shanghai, 200025, China.
- School of Global Health, Chinese Center for Tropical Diseases Research and Shanghai Jiao Tong University School of Medicine, One Health Center, Shanghai Jiao Tong University and The Edinburgh University, Shanghai, 200025, China.
|
27
|
Breeze F, Hossain RR, Mayo M, McKelvie J. Predicting ophthalmic clinic non-attendance using machine learning: Development and validation of models using nationwide data. Clin Exp Ophthalmol 2023; 51:764-774. [PMID: 37885379 DOI: 10.1111/ceo.14310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 09/04/2023] [Accepted: 10/08/2023] [Indexed: 10/28/2023]
Abstract
BACKGROUND Ophthalmic clinic non-attendance in New Zealand is associated with poorer health outcomes and marked inequities, and costs NZD$30 million per annum. Initiatives to improve attendance typically involve expensive and ineffective brute-force strategies. The aim was to develop machine learning models to accurately predict ophthalmic clinic non-attendance. METHODS This multicentre, retrospective observational study developed and validated predictive models of clinic non-attendance. Attendance data for 3.1 million appointments from all New Zealand government-funded ophthalmology clinics from 2009 to 2018 were aggregated for analysis. Repeated ten-fold cross validation was used to train and optimise XGBoost and logistic regression models on several demographic and clinic-related variables. Models developed using the entire training set were compared with those restricted to regional subsets of the data. RESULTS In the testing data set from 2019, there were 407 574 appointments (median [range] age, 66 [0-105] years; 210 365 [51.6%] female) with a non-attendance rate of 5.7% (n = 23 309 missed appointments). XGBoost models trained on each region's data achieved the highest mean AUROC of 0.764 (SD 0.058) and mean AUPRC of 0.157 (SD 0.072). XGBoost performed better than logistic regression (mean AUROC = 0.756, p = 0.002). Training individual XGBoost models for each region led to better performance than training a single model on the complete nationwide dataset (mean AUROC = 0.754, p = 0.04). CONCLUSION Machine learning algorithms can predict ophthalmic clinic non-attendance with relatively basic demographic and clinic data. These findings suggest further research examining implementation of such algorithms in scheduling systems or public health interventions may be useful.
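With a rare outcome such as a 5.7% non-attendance rate, an AUPRC is easier to read against its no-skill reference level, which is the outcome prevalence (the expected precision of flagging appointments at random). The short calculation below uses only the counts and the mean AUPRC reported in the abstract. The comparison itself is an interpretive aid and is not part of the study's analysis.

```python
# Counts reported in the abstract for the 2019 testing data set.
appointments = 407_574
missed = 23_309

# Expected precision of a classifier that flags appointments at random,
# which serves as the usual no-skill reference level for AUPRC.
prevalence = missed / appointments

reported_auprc = 0.157  # mean AUPRC of the regional XGBoost models
lift_over_no_skill = reported_auprc / prevalence

print(f"non-attendance rate: {prevalence:.1%}")
print(f"AUPRC relative to no-skill baseline: {lift_over_no_skill:.1f}x")
```

AUROC, by contrast, has a no-skill level of 0.5 regardless of prevalence, which is why the two metrics sit on such different scales for the same models.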
Affiliation(s)
- Finley Breeze
- Department of Ophthalmology, University of Auckland, Auckland, New Zealand
| | - Ruhella R Hossain
- Department of Ophthalmology, University of Auckland, Auckland, New Zealand
- Department of Ophthalmology, Waikato Hospital, Hamilton, New Zealand
| | - Michael Mayo
- Department of Computer Science, University of Waikato, Hamilton, New Zealand
| | - James McKelvie
- Department of Ophthalmology, University of Auckland, Auckland, New Zealand
- Department of Ophthalmology, Waikato Hospital, Hamilton, New Zealand
|
28
|
Ma X, Mo C, Li Y, Chen X, Gui C. Prediction of the development of contrast-induced nephropathy following percutaneous coronary artery intervention by machine learning. Acta Cardiol 2023; 78:912-921. [PMID: 37052397 DOI: 10.1080/00015385.2023.2198937] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/03/2022] [Accepted: 03/30/2023] [Indexed: 04/14/2023]
Abstract
Contrast-induced nephropathy (CIN) is associated with increased mortality and morbidity in patients with coronary artery disease undergoing elective percutaneous coronary intervention (PCI). We developed a machine learning-based risk stratification model to predict contrast-induced nephropathy after PCI. We retrospectively enrolled 240 patients eligible for PCI from December 2017 to May 2020. CIN was defined as a rise in serum creatinine levels ≥0.5 mg/dL or ≥25% from baseline within 72 h after surgery. Eight machine learning methods were applied to clinical variables. Shapley Additive exPlanation (SHAP) values were also used to interpret the best-performing prediction models. CIN developed in 37 patients (16.5%) after PCI. There were 11 significant predictors of CIN: uric acid, peripheral vascular disease, cystatin C, creatine kinase-MB, haemoglobin, N-terminal pro-brain natriuretic peptide, age, diabetes, systemic immune-inflammatory index, total protein, and low-density lipoprotein. Among the machine learning models, the support vector machine (SVM) achieved the highest AUC, 0.784. SHAP and radar plots were used to illustrate the positive and negative effects of the 11 features attributed to the SVM. Machine learning models have the potential to identify the risk of CIN for elective PCI patients.
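The outcome definition in this abstract is a simple two-part threshold rule, sketched below. The function name and its arguments are hypothetical, and the sketch assumes the caller has already selected the peak creatinine value measured within 72 h after the procedure.

```python
def meets_cin_definition(baseline_mg_dl, peak_72h_mg_dl):
    """Apply the abstract's definition of contrast-induced nephropathy.

    CIN: serum creatinine rises by at least 0.5 mg/dL, or by at least 25%
    of baseline, within 72 h. Both arguments are in mg/dL, and
    peak_72h_mg_dl is assumed to be the highest value in that window.
    """
    if baseline_mg_dl <= 0:
        raise ValueError("baseline creatinine must be positive")
    absolute_rise = peak_72h_mg_dl - baseline_mg_dl
    relative_rise = absolute_rise / baseline_mg_dl
    return absolute_rise >= 0.5 or relative_rise >= 0.25


# Hypothetical patients (illustrative values only).
print(meets_cin_definition(1.0, 1.25))  # 25% rise
print(meets_cin_definition(3.0, 3.5))   # 0.5 mg/dL rise, under 25%
print(meets_cin_definition(1.0, 1.2))   # 20% rise, 0.2 mg/dL
```

The relative criterion is the one that triggers for patients with low baseline creatinine, while the absolute criterion matters for patients whose baseline is already elevated.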
Affiliation(s)
- Xiao Ma
- Department of Cardiology, The First Affiliated Hospital of Guangxi Medical University, Nanning, P. R. China
- Guangxi Key Laboratory Base of Precision Medicine in Cardiocerebrovascular Diseases Control and Prevention, Nanning, P. R. China
- Guangxi Clinical Research Center for Cardiocerebrovascular Diseases, Nanning, P. R. China
| | - Changhua Mo
- Department of Cardiology, The First Affiliated Hospital of Guangxi Medical University, Nanning, P. R. China
- Guangxi Key Laboratory Base of Precision Medicine in Cardiocerebrovascular Diseases Control and Prevention, Nanning, P. R. China
- Guangxi Clinical Research Center for Cardiocerebrovascular Diseases, Nanning, P. R. China
| | - Yujuan Li
- Department of Cardiology, The First Affiliated Hospital of Guangxi Medical University, Nanning, P. R. China
- Guangxi Key Laboratory Base of Precision Medicine in Cardiocerebrovascular Diseases Control and Prevention, Nanning, P. R. China
- Guangxi Clinical Research Center for Cardiocerebrovascular Diseases, Nanning, P. R. China
| | - Xinyuan Chen
- Department of Cardiology, The First Affiliated Hospital of Guangxi Medical University, Nanning, P. R. China
| | - Chun Gui
- Department of Cardiology, The First Affiliated Hospital of Guangxi Medical University, Nanning, P. R. China
- Guangxi Key Laboratory Base of Precision Medicine in Cardiocerebrovascular Diseases Control and Prevention, Nanning, P. R. China
- Guangxi Clinical Research Center for Cardiocerebrovascular Diseases, Nanning, P. R. China
|
29
|
Lotfata A, Moosazadeh M, Helbich M, Hoseini B. Socioeconomic and environmental determinants of asthma prevalence: a cross-sectional study at the U.S. County level using geographically weighted random forests. Int J Health Geogr 2023; 22:18. [PMID: 37563691 PMCID: PMC10413687 DOI: 10.1186/s12942-023-00343-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023] Open
Abstract
BACKGROUND Some studies have established associations between the prevalence of new-onset asthma and asthma exacerbation and socioeconomic and environmental determinants. However, research remains limited concerning the shape of these associations, the importance of the risk factors, and how these factors vary geographically. OBJECTIVE We aimed (1) to examine ecological associations between asthma prevalence and multiple socio-physical determinants in the United States; and (2) to assess geographic variations in their relative importance. METHODS Our cross-sectional study used county-level data for 2020 across the United States. We obtained self-reported asthma prevalence data of adults aged 18 years or older for each county. We applied conventional and geographically weighted random forest (GWRF) models to investigate the associations between asthma prevalence and socioeconomic (e.g., poverty) and environmental determinants (e.g., air pollution and green space). To enhance the interpretability of the GWRF, we (1) assessed the shape of the associations through partial dependence plots, (2) ranked the determinants according to their global importance scores, and (3) mapped the local variable importance spatially. RESULTS Across the 3059 counties, the average asthma prevalence was 9.9 (standard deviation ± 0.99). The GWRF outperformed the conventional random forest. We found an indication, for example, that temperature was inversely associated with asthma prevalence, while poverty showed positive associations. The partial dependence plots showed that these associations had a non-linear shape. Ranking the socio-physical environmental factors by their global importance showed that smoking prevalence and depression prevalence were most relevant, while green space and limited language were of minor relevance. The local variable importance measures showed striking geographical differences.
CONCLUSION Our findings strengthen the evidence that socio-physical environments play a role in explaining asthma prevalence, but their relevance seems to vary geographically. The results are vital for implementing future asthma prevention programs that should be tailor-made for specific areas.
Affiliation(s)
- Aynaz Lotfata
  - Department of Pathology, Microbiology, and Immunology, School of Veterinary Medicine, University of California, Davis, CA, USA
- Mohammad Moosazadeh
  - Integrated Engineering, Department of Environmental Science and Engineering, College of Engineering, KyungHee University, Yongin, 446-701, Republic of Korea
- Marco Helbich
  - Department of Human Geography and Spatial Planning, Faculty of Geosciences, University Utrecht, Utrecht, The Netherlands
- Benyamin Hoseini
  - Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
30
Hamidi F, Gilani N, Arabi Belaghi R, Yaghoobi H, Babaei E, Sarbakhsh P, Malakouti J. Identifying potential circulating miRNA biomarkers for the diagnosis and prediction of ovarian cancer using machine-learning approach: application of Boruta. Front Digit Health 2023; 5:1187578. [PMID: 37621964 PMCID: PMC10445490 DOI: 10.3389/fdgth.2023.1187578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 07/20/2023] [Indexed: 08/26/2023]
Abstract
Introduction In gynecologic oncology, ovarian cancer is a great clinical challenge. Because of the lack of typical symptoms and effective biomarkers for noninvasive screening, most patients develop advanced-stage ovarian cancer by the time of diagnosis. MicroRNAs (miRNAs) are a type of non-coding RNA molecule that has been linked to human cancers. Specifying diagnostic biomarkers to determine non-cancer and cancer samples is difficult. Methods By using Boruta, a novel random forest-based feature selection in the machine-learning techniques, we aimed to identify biomarkers associated with ovarian cancer using cancerous and non-cancer samples from the Gene Expression Omnibus (GEO) database: GSE106817. In this study, we used two independent GEO data sets as external validation, including GSE113486 and GSE113740. We utilized five state-of-the-art machine-learning algorithms for classification: logistic regression, random forest, decision trees, artificial neural networks, and XGBoost. Results Four models discovered in GSE113486 had an AUC of 100%, three in GSE113740 with AUC of over 94%, and four in GSE113486 with AUC of over 94%. We identified 10 miRNAs to distinguish ovarian cancer cases from normal controls: hsa-miR-1290, hsa-miR-1233-5p, hsa-miR-1914-5p, hsa-miR-1469, hsa-miR-4675, hsa-miR-1228-5p, hsa-miR-3184-5p, hsa-miR-6784-5p, hsa-miR-6800-5p, and hsa-miR-5100. Our findings suggest that miRNAs could be used as possible biomarkers for ovarian cancer screening, for possible intervention.
Affiliation(s)
- Farzaneh Hamidi
  - Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Neda Gilani
  - Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
  - Road Traffic Injury Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Arabi Belaghi
  - Department of Mathematics, Applied Mathematics and Statistics, Uppsala University, Uppsala, Sweden
  - Department of Statistics, Faculty of Mathematical Science, University of Tabriz, Tabriz, Iran
  - Department of Energy and Technology, Swedish Agricultural University, Uppsala, Sweden
- Hanif Yaghoobi
  - Department of Biological Sciences, School of Natural Sciences, University of Tabriz, Tabriz, Iran
- Esmaeil Babaei
  - Department of Biological Sciences, School of Natural Sciences, University of Tabriz, Tabriz, Iran
  - Interfaculty Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Parvin Sarbakhsh
  - Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Jamileh Malakouti
  - Department of Midwifery, Faculty of Nursing and Midwifery, Tabriz University of Medical Science, Tabriz, Iran
31
Fan P, Miranda O, Qi X, Kofler J, Sweet RA, Wang L. Unveiling the Enigma: Exploring Risk Factors and Mechanisms for Psychotic Symptoms in Alzheimer's Disease through Electronic Medical Records with Deep Learning Models. Pharmaceuticals (Basel) 2023; 16:911. [PMID: 37513822 PMCID: PMC10385983 DOI: 10.3390/ph16070911] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/03/2023] [Revised: 06/14/2023] [Accepted: 06/16/2023] [Indexed: 07/30/2023]
Abstract
Around 50% of patients with Alzheimer's disease (AD) may experience psychotic symptoms after onset, resulting in a subtype of AD known as psychosis in AD (AD + P). This subtype is characterized by more rapid cognitive decline compared to AD patients without psychosis. Therefore, there is a great need to identify risk factors for the development of AD + P and explore potential treatment options. In this study, we enhanced our deep learning model, DeepBiomarker, to predict the onset of psychosis in AD utilizing data from electronic medical records (EMRs). The model demonstrated superior predictive capacity with an AUC (area under curve) of 0.907, significantly surpassing conventional risk prediction models. Utilizing a perturbation-based method, we identified key features from multiple medications, comorbidities, and abnormal laboratory tests, which notably influenced the prediction outcomes. Our findings demonstrated substantial agreement with existing studies, underscoring the vital role of metabolic syndrome, inflammation, and liver function pathways in AD + P. Importantly, the DeepBiomarker model not only offers a precise prediction of AD + P onset but also provides mechanistic understanding, potentially informing the development of innovative treatments. With additional validation, this approach could significantly contribute to early detection and prevention strategies for AD + P, thereby improving patient outcomes and quality of life.
Affiliation(s)
- Peihao Fan
  - Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Oshin Miranda
  - Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Xiguang Qi
  - Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Julia Kofler
  - Department of Pathology, Division of Neuropathology, UPMC Presbyterian Hospital, Pittsburgh, PA 15213, USA
- Robert A Sweet
  - Department of Psychiatry, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Department of Neurology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Lirong Wang
  - Computational Chemical Genomics Screening Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15213, USA
32
Brinch ML, Hald T, Wainaina L, Merlotti A, Remondini D, Henri C, Njage PMK. Comparison of Source Attribution Methodologies for Human Campylobacteriosis. Pathogens 2023; 12:786. [PMID: 37375476 DOI: 10.3390/pathogens12060786] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/22/2023] [Revised: 05/09/2023] [Accepted: 05/10/2023] [Indexed: 06/29/2023]
Abstract
Campylobacter spp. are the most common cause of bacterial gastrointestinal infection in humans both in Denmark and worldwide. Studies have found microbial subtyping to be a powerful tool for source attribution, but comparisons of different methodologies are limited. In this study, we compare three source attribution approaches (Machine Learning, Network Analysis, and Bayesian modeling) using three types of whole genome sequences (WGS) data inputs (cgMLST, 5-Mers and 7-Mers). We predicted and compared the sources of human campylobacteriosis cases in Denmark. Using 7mer as an input feature provided the best model performance. The network analysis algorithm had a CSC value of 78.99% and an F1-score value of 67%, while the machine-learning algorithm showed the highest accuracy (98%). The models attributed between 965 and all of the 1224 human cases to a source (network applying 5mer and machine learning applying 7mer, respectively). Chicken from Denmark was the primary source of human campylobacteriosis with an average percentage probability of attribution of 45.8% to 65.4%, representing Bayesian with 7mer and machine learning with cgMLST, respectively. Our results indicate that the different source attribution methodologies based on WGS have great potential for the surveillance and source tracking of Campylobacter. The results of such models may support decision makers to prioritize and target interventions.
Affiliation(s)
- Maja Lykke Brinch
  - Research Group for Foodborne Pathogens and Epidemiology, National Food Institute, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
- Tine Hald
  - Research Group for Foodborne Pathogens and Epidemiology, National Food Institute, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
- Lynda Wainaina
  - Department of Mathematics, University of Padova, 35121 Padova, Italy
- Alessandra Merlotti
  - Department of Physics and Astronomy, University of Bologna, 40126 Bologna, Italy
- Daniel Remondini
  - Department of Physics and Astronomy, University of Bologna, 40126 Bologna, Italy
- Clementine Henri
  - Research Group for Foodborne Pathogens and Epidemiology, National Food Institute, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
- Patrick Murigu Kamau Njage
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, 2800 Kongens Lyngby, Denmark
33
Ross RK, Keil AP, Cole SR, Edwards JK, Stringer JSA. A warning about using predicted values to estimate descriptive measures. Am J Epidemiol 2023; 192:840-843. [PMID: 36708231 PMCID: PMC10893853 DOI: 10.1093/aje/kwad020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 01/11/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023]
Affiliation(s)
- Rachael K Ross
  - Correspondence to Rachael Ross, Department of Epidemiology, Gillings School of Global Public Health, Campus Box 7435m, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-6435
34
Liu Y, Zhuang Y, Yu L, Li Q, Zhao C, Meng R, Zhu J, Guo X. A Machine Learning Framework Based on Extreme Gradient Boosting to Predict the Occurrence and Development of Infectious Diseases in Laying Hen Farms, Taking H9N2 as an Example. Animals (Basel) 2023; 13:1494. [PMID: 37174531 PMCID: PMC10177545 DOI: 10.3390/ani13091494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Revised: 04/26/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
The H9N2 avian influenza virus has become one of the dominant subtypes of avian influenza virus in poultry and has been significantly harmful to chickens in China, causing great economic losses through reduced egg production or high mortality from co-infection with other pathogens. An accurate prediction of H9N2 status based on easily available production data would be essential for preventing and controlling H9N2 outbreaks in advance. This study developed a machine learning framework based on the XGBoost classification algorithm using 3 months of laying rates and mortalities collected from three H9N2-infected laying hen houses with complete onset cycles. The framework automatically predicts the H9N2 status of an individual house for the next 3 days (H9N2 status + 0, H9N2 status + 1, H9N2 status + 2) with five time frames (day + 0, day - 1, day - 2, day - 3, day - 4). The prediction models achieved an accuracy rate > 90%, a recall rate > 90%, a precision rate > 80%, and an area under the receiver operating characteristic curve ≥ 0.85. Models with day + 0 and day - 1 were highly recommended to predict H9N2 status + 0 and H9N2 status + 1 for the direct or auxiliary monitoring of its occurrence and development. Such a framework could provide new insights into predicting H9N2 outbreaks, and it has considerable potential for other practical applications in disease monitoring.
Affiliation(s)
- Yu Liu
  - Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  - National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Yanrong Zhuang
  - College of Water Resources and Civil Engineering, China Agricultural University, Beijing 100083, China
- Ligen Yu
  - Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  - National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Qifeng Li
  - Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  - National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Chunjiang Zhao
  - Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  - National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Rui Meng
  - Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  - National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Jun Zhu
  - Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  - National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
- Xiaoli Guo
  - Research Center of Information Technology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  - National Innovation Center of Digital Technology in Animal Husbandry, Beijing 100097, China
35
Data driven contagion risk management in low-income countries using machine learning applications with COVID-19 in South Asia. Sci Rep 2023; 13:3732. [PMID: 36878910 PMCID: PMC9987367 DOI: 10.1038/s41598-023-30348-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 02/21/2023] [Indexed: 03/08/2023]
Abstract
In the absence of real-time surveillance data, it is difficult to derive an early warning system and potential outbreak locations with the existing epidemiological models, especially in resource-constrained countries. We proposed a contagion risk index (CR-Index), based on publicly available national statistics and founded on communicable disease spreadability vectors. Utilizing the daily COVID-19 data (positive cases and deaths) from 2020 to 2022, we developed country-specific and sub-national CR-Index for South Asia (India, Pakistan, and Bangladesh) and identified potential infection hotspots, aiding policymakers with efficient mitigation planning. Across the study period, the week-by-week and fixed-effects regression estimates demonstrate a strong correlation between the proposed CR-Index and sub-national (district-level) COVID-19 statistics. We validated the CR-Index using machine learning methods by evaluating the out-of-sample predictive performance. Machine learning driven validation showed that the CR-Index can correctly predict districts with a high incidence of COVID-19 cases and deaths more than 85% of the time. This proposed CR-Index is a simple, replicable, and easily interpretable tool that can help low-income countries prioritize resource mobilization to contain the disease spread and associated crisis management with global relevance and applicability. This index can also help to contain future pandemics (and epidemics) and manage their far-reaching adverse consequences.
36
Gharbi-Meliani A, Husson F, Vandendriessche H, Eleonore Bayen F, Yaffe K, Bachoud-Lévi AC, de Langavant LC. Identification of High Likelihood of Dementia in Population-Based Surveys using Unsupervised Clustering: a Longitudinal Analysis. medRxiv 2023:2023.02.17.23286078. [PMID: 36865284 PMCID: PMC9980227 DOI: 10.1101/2023.02.17.23286078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Abstract
Background Dementia is defined by cognitive decline that affects functional status. Longitudinal ageing surveys often lack a clinical diagnosis of dementia, though they measure cognition and function over time. We used unsupervised machine learning and longitudinal data to identify transition to probable dementia. Methods Multiple Factor Analysis was applied to longitudinal function and cognitive data of 15,278 baseline participants (aged 50 years and more) from the Survey of Health, Ageing, and Retirement in Europe (SHARE) (waves 1, 2 and 4-7, between 2004 and 2017). Hierarchical Clustering on Principal Components discriminated three clusters at each wave. We estimated probable or "Likely Dementia" prevalence by sex and age, and assessed whether dementia risk factors increased the risk of being assigned probable dementia status using multistate models. Next, we compared the "Likely Dementia" cluster with self-reported dementia status and replicated our findings in the English Longitudinal Study of Ageing (ELSA) cohort (waves 1-9, between 2002 and 2019, 7,840 participants at baseline). Findings Our algorithm identified a higher number of probable dementia cases compared with self-reported cases and showed good discriminative power across all waves (AUC ranged from 0.754 [0.722-0.787] to 0.830 [0.800-0.861]). "Likely Dementia" status was more prevalent in older people, displayed a 2:1 female/male ratio and was associated with nine factors that increased risk of transition to dementia: low education, hearing loss, hypertension, drinking, smoking, depression, social isolation, physical inactivity, diabetes, and obesity. Results were replicated in the ELSA cohort with good accuracy. Interpretation Machine learning clustering can be used to study dementia determinants and outcomes in longitudinal population ageing surveys in which dementia clinical diagnosis is lacking.
Affiliation(s)
- Amin Gharbi-Meliani
  - Equipe neuropsychologie interventionnelle, Institut Mondor de Recherche Biomédicale, Département d'études cognitives, Ecole normale supérieure, Université PSL, Université Paris-Est Créteil, AP-HP Hôpital Henri Mondor-Albert Chenevier, Centre de référence Maladie de Huntington et Service de Neurologie, INSERM, 75005 Paris [ou 94000 Créteil], France
- François Husson
  - Institut Agro, Univ Rennes1, CNRS, IRMAR, 35000, Rennes, France
- Henri Vandendriessche
  - Laboratoire de Neurosciences Cognitives et Computationnelles, Département d'études cognitives, Ecole normale supérieure, Université PSL, INSERM, 75005 Paris, France
- France Eleonore Bayen
  - Global Brain Health Institute, University of California, San Francisco, CA, United States; Sorbonne Université, Hôpital Pitié-Salpêtrière-Assistance Publique Hôpitaux de Paris, Département de Rééducation Neurologique, Paris, France
- Kristine Yaffe
  - Global Brain Health Institute, University of California, San Francisco, CA, United States; Departments of Psychiatry, Neurology and Epidemiology and Biostatistics, University of California, San Francisco
- Anne-Catherine Bachoud-Lévi
  - Equipe neuropsychologie interventionnelle, Institut Mondor de Recherche Biomédicale, Département d'études cognitives, Ecole normale supérieure, Université PSL, Université Paris-Est Créteil, AP-HP Hôpital Henri Mondor-Albert Chenevier, Centre de référence Maladie de Huntington et Service de Neurologie, INSERM, 75005 Paris [ou 94000 Créteil], France
- Laurent Cleret de Langavant
  - Equipe neuropsychologie interventionnelle, Institut Mondor de Recherche Biomédicale, Département d'études cognitives, Ecole normale supérieure, Université PSL, Université Paris-Est Créteil, AP-HP Hôpital Henri Mondor-Albert Chenevier, Centre de référence Maladie de Huntington et Service de Neurologie, INSERM, 75005 Paris [ou 94000 Créteil], France; Global Brain Health Institute, University of California, San Francisco, CA, United States
37
Parhofer KG, Anastassopoulou A, Calver H, Becker C, Rathore AS, Dave R, Zamfir C. Estimating Prevalence and Characteristics of Statin Intolerance among High and Very High Cardiovascular Risk Patients in Germany (2017 to 2020). J Clin Med 2023; 12:705. [PMID: 36675634 PMCID: PMC9864390 DOI: 10.3390/jcm12020705] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/30/2022] [Revised: 01/06/2023] [Accepted: 01/08/2023] [Indexed: 01/18/2023]
Abstract
Statin intolerance (SI) (partial and absolute) could lead to suboptimal lipid management. The lack of a widely accepted definition of SI results in a poor understanding of patient profiles and characteristics. This study aims to estimate SI and better understand patient characteristics, as reflected in clinical practice in Germany, using supervised machine learning (ML) techniques. This retrospective cohort study utilized patient records from an outpatient setting in Germany in the IQVIA™ Disease Analyzer. Patients with a high cardiovascular risk, atherosclerotic cardiovascular disease, or hypercholesterolemia, and those on lipid-lowering therapies between 2017 and 2020 were included, and categorized as having “absolute” or “partial” SI. ML techniques were applied to calibrate prevalence estimates, derived from different rules and levels of confidence (high and low). The study included 292,603 patients; 6.4% and 2.8% had absolute and partial SI, respectively, with high confidence. After deploying ML, SI prevalence increased by approximately 27% and 57% (p < 0.00001) in absolute and partial SI, respectively, eliciting a maximum estimate of 12.5% SI with high confidence. The use of advanced analytics to provide a complementary perspective to current prevalence estimates may inform the identification, optimal treatment, and pragmatic, patient-centered management of SI in Germany.
Affiliation(s)
- Klaus G. Parhofer
  - Ludwig Maximilians University, Medical Clinic IV, Großhadern, 81377 Munich, Germany
- Christian Becker
  - Daiichi Sankyo Germany GmbH, Zielstattstraße 48, 81379 Munich, Germany
38
Barboza LA, Chou-Chen SW, Vásquez P, García YE, Calvo JG, Hidalgo HG, Sanchez F. Assessing dengue fever risk in Costa Rica by using climate variables and machine learning techniques. PLoS Negl Trop Dis 2023; 17:e0011047. [PMID: 36638136 PMCID: PMC9879398 DOI: 10.1371/journal.pntd.0011047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Revised: 01/26/2023] [Accepted: 12/20/2022] [Indexed: 01/14/2023]
Abstract
Dengue fever is a vector-borne disease affecting millions yearly, mostly in tropical and subtropical countries. Driven mainly by social and environmental factors, dengue incidence and geographical expansion have increased in recent decades. Therefore, understanding how climate variables drive dengue outbreaks is challenging and a problem of interest for decision-makers that could aid in improving surveillance and resource allocation. Here, we explore the effect of climate variables on relative dengue risk in 32 cantons of interest for public health authorities in Costa Rica. Relative dengue risk is forecast using a Generalized Additive Model for location, scale, and shape and a Random Forest approach. Models use a training period from 2000 to 2020 and predicted climatic variables obtained with a vector auto-regressive model. Results show reliable projections, and climate variables predictions allow for a prospective instead of a retrospective study.
Affiliation(s)
- Luis A. Barboza
  - Centro de Investigación en Matemática Pura y Aplicada - Escuela de Matemática, Universidad de Costa Rica, San José, Costa Rica
- Shu-Wei Chou-Chen
  - Centro de Investigación en Matemática Pura y Aplicada - Escuela de Estadística, Universidad de Costa Rica, San José, Costa Rica
- Paola Vásquez
  - Centro de Investigación en Matemática Pura y Aplicada, Universidad de Costa Rica, San José, Costa Rica
- Yury E. García
  - Centro de Investigación en Matemática Pura y Aplicada, Universidad de Costa Rica, San José, Costa Rica
  - Department of Public Health Sciences, University of California Davis, California, United States of America
- Juan G. Calvo
  - Centro de Investigación en Matemática Pura y Aplicada - Escuela de Matemática, Universidad de Costa Rica, San José, Costa Rica
- Hugo G. Hidalgo
  - Centro de Investigaciones Geofísicas and Escuela de Física, Universidad de Costa Rica, San José, Costa Rica
- Fabio Sanchez
  - Centro de Investigación en Matemática Pura y Aplicada - Escuela de Matemática, Universidad de Costa Rica, San José, Costa Rica
39
Improving the Accuracy of Diabetes Diagnosis Applications through a Hybrid Feature Selection Algorithm. Neural Process Lett 2023; 55:153-169. [PMID: 33814965 PMCID: PMC7997791 DOI: 10.1007/s11063-021-10491-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 03/09/2021] [Indexed: 01/20/2023]
Abstract
Artificial intelligence is a valuable future tool for early disease recognition and support in patient condition monitoring. It can increase the reliability of the cure and decision making by developing useful systems and algorithms. Healthcare workers, especially nurses and physicians, are overworked due to a massive and unexpected increase in the number of patients during the coronavirus pandemic. In such situations, artificial intelligence techniques could be used to diagnose a patient with life-threatening illnesses. In particular, diseases that increase the risk of hospitalization and death in coronavirus patients, such as high blood pressure, heart disease and diabetes, should be diagnosed at an early stage. This article focuses on diagnosing a diabetic patient through data mining techniques. If we are able to diagnose diabetes in the early stages of the disease, we can force patients to stay home and care for their health, so the risk of being infected with the coronavirus would be reduced. The proposed method has three steps: preprocessing, feature selection and classification. Several combinations of the harmony search algorithm, genetic algorithm, and particle swarm optimization algorithm are examined with K-means for feature selection. These combinations have not been examined before for diabetes diagnosis applications. K-nearest neighbor is used for classification of the diabetes dataset. Sensitivity, specificity, and accuracy have been measured to evaluate the results. The results achieved indicate that the proposed method, with an accuracy of 91.65%, outperformed the earlier methods examined in this article.
40
Hayakawa T, Nagashima T, Akimoto H, Minagawa K, Takahashi Y, Asai S. Benzodiazepine-related dementia risks and protopathic biases revealed by multiple-kernel learning with electronic medical records. Digit Health 2023; 9:20552076231178577. [PMID: 37312937 PMCID: PMC10259140 DOI: 10.1177/20552076231178577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Accepted: 05/06/2023] [Indexed: 06/15/2023]
Abstract
Objectives To simultaneously estimate how the risk of incident dementia nonlinearly varies with the administration period and cumulative dose of benzodiazepines, the duration of disorders with an indication for benzodiazepines, and other potential confounders, with the goal of settling the controversy over the role of benzodiazepines in the development of dementia. Methods The classical hazard model was extended using the techniques of multiple-kernel learning. Regularised maximum-likelihood estimation, including determination of hyperparameter values with 10-fold cross-validation, bootstrap goodness-of-fit test, and bootstrap estimation of confidence intervals, was applied to cohorts retrospectively extracted from electronic medical records of our university hospitals between 1 November 2004 and 31 July 2020. The analysis was mainly focused on 8160 patients aged 40 or older with new onset of insomnia, affective disorders, or anxiety disorders, who were followed up for 4.10 ± 3.47 years. Results Besides previously reported risk associations, we detected significant nonlinear risk variations over 2-4 years attributable to the duration of insomnia and anxiety disorders, and to the administration period of short-acting benzodiazepines. After nonlinear adjustment for potential confounders, we observed no significant risk associations with long-term use of benzodiazepines. Conclusions The pattern of the detected nonlinear risk variations suggested reverse causation and confounding. Their putative bias effects over 2-4 years suggested similar biases in previously reported results. These results, together with the lack of significant risk associations with long-term use of benzodiazepines, suggested the need to reconsider previous results and methods for future analysis.
Affiliation(s)
- Takashi Hayakawa
  - Division of Pharmacology, Department of Biomedical Sciences, Nihon University School of Medicine, Tokyo, Japan
  - Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
- Takuya Nagashima
  - Division of Pharmacology, Department of Biomedical Sciences, Nihon University School of Medicine, Tokyo, Japan
  - Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
- Hayato Akimoto
  - Division of Pharmacology, Department of Biomedical Sciences, Nihon University School of Medicine, Tokyo, Japan
  - Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
- Kimino Minagawa
  - Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
- Yasuo Takahashi
  - Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
- Satoshi Asai
  - Division of Pharmacology, Department of Biomedical Sciences, Nihon University School of Medicine, Tokyo, Japan
  - Division of Genomic Epidemiology and Clinical Trials, Clinical Trials Research Center, Nihon University School of Medicine, Tokyo, Japan
41
Ikram M, Shaikh NF, Vishwanatha JK, Sambamoorthi U. Leading Predictors of COVID-19-Related Poor Mental Health in Adult Asian Indians: An Application of Extreme Gradient Boosting and Shapley Additive Explanations. Int J Environ Res Public Health 2022; 20:775. [PMID: 36613095 PMCID: PMC9819341 DOI: 10.3390/ijerph20010775] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/01/2022] [Revised: 12/22/2022] [Accepted: 12/27/2022] [Indexed: 06/17/2023]
Abstract
During the COVID-19 pandemic, an increase in poor mental health among Asian Indians was observed in the United States. However, the leading predictors of poor mental health during the COVID-19 pandemic in Asian Indians remained unknown. A cross-sectional online survey was administered to self-identified Asian Indians aged 18 and older (N = 289). The survey collected information on demographic and socio-economic characteristics and the COVID-19 burden. Two novel machine learning techniques, eXtreme Gradient Boosting and Shapley Additive exPlanations (SHAP), were used to identify the leading predictors and explain their associations with poor mental health. A majority of the study participants were female (65.1%), below 50 years of age (73.3%), and had income ≥ $75,000 (81.0%). The six leading predictors of poor mental health among Asian Indians were sleep disturbance, age, general health, income, wearing a mask, and self-reported discrimination. SHAP plots indicated that higher age, wearing a mask, and maintaining social distancing all the time were negatively associated with poor mental health, while having sleep disturbance and imputed income levels were positively associated with poor mental health. The model performance metrics indicated high accuracy (0.77), precision (0.78), F1 score (0.77), recall (0.77), and AUROC (0.87). Nearly one in two adults reported poor mental health, and one in five reported sleep disturbance. Findings from our study suggest a paradoxical relationship between income and poor mental health; further studies are needed to confirm our study findings. Sleep disturbance and perceived discrimination can be targeted through tailored intervention to reduce the risk of poor mental health in Asian Indians.
Affiliation(s)
- Mohammad Ikram
  - Department of Pharmaceutical Systems and Policy, School of Pharmacy, West Virginia University, Robert C. Byrd Health Sciences Center [North], P.O. Box 9510, Morgantown, WV 26506-9510, USA
- Nazneen Fatima Shaikh
  - Department of Pharmaceutical Systems and Policy, School of Pharmacy, West Virginia University, Robert C. Byrd Health Sciences Center [North], P.O. Box 9510, Morgantown, WV 26506-9510, USA
- Jamboor K. Vishwanatha
  - Department of Microbiology, Immunology and Genetics, University of North Texas Health Science Center, Fort Worth, TX 76107, USA
- Usha Sambamoorthi
  - Department of Pharmacotherapy, University of North Texas Health Science Center, Fort Worth, TX 76107, USA
42
Kirk D, Kok E, Tufano M, Tekinerdogan B, Feskens EJM, Camps G. Machine Learning in Nutrition Research. Adv Nutr 2022; 13:2573-2589. [PMID: 36166846 PMCID: PMC9776646 DOI: 10.1093/advances/nmac103] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/11/2022] [Revised: 08/02/2022] [Accepted: 09/22/2022] [Indexed: 01/29/2023]
Abstract
Data currently generated in the field of nutrition are becoming increasingly complex and high-dimensional, bringing with them new methods of data analysis. The characteristics of machine learning (ML) make it suitable for such analysis, and it thus lends itself as an alternative tool for data of this nature. ML has already been applied in important problem areas in nutrition, such as obesity, metabolic health, and malnutrition. Despite this, experts in nutrition often lack an understanding of ML, which limits its application and therefore its potential to solve currently open questions. The current article aims to bridge this knowledge gap by supplying nutrition researchers with a resource to facilitate the use of ML in their research. ML is first explained and distinguished from existing solutions, with key examples of applications in the nutrition literature provided. Two case studies of domains in which ML is particularly applicable, precision nutrition and metabolomics, are then presented. Finally, a framework is outlined to guide interested researchers in integrating ML into their work. By acting as a resource to which researchers can refer, we hope to support the integration of ML in the field of nutrition to facilitate modern research.
Affiliation(s)
- Daniel Kirk
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Esther Kok
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Michele Tufano
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Bedir Tekinerdogan
  - Information Technology Group, Wageningen University and Research, Wageningen, The Netherlands
- Edith J M Feskens
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Guido Camps
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
  - OnePlanet Research Center, Wageningen, The Netherlands
43
Wang J. Mathematical Models for Cholera Dynamics: A Review. Microorganisms 2022; 10:2358. [PMID: 36557611 PMCID: PMC9783556 DOI: 10.3390/microorganisms10122358] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/30/2022] [Revised: 11/27/2022] [Accepted: 11/28/2022] [Indexed: 11/30/2022]
Abstract
Cholera remains a significant public health burden in many countries and regions of the world, highlighting the need for a deeper understanding of the mechanisms associated with its transmission, spread, and control. Mathematical modeling offers a valuable research tool to investigate cholera dynamics and explore effective intervention strategies. In this article, we provide a review of the current state in the modeling studies of cholera. Starting from an introduction of basic cholera transmission models and their applications, we survey model extensions in several directions that include spatial and temporal heterogeneities, effects of disease control, impacts of human behavior, and multi-scale infection dynamics. We discuss some challenges and opportunities for future modeling efforts on cholera dynamics, and emphasize the importance of collaborations between different modeling groups and different disciplines in advancing this research area.
Affiliation(s)
- Jin Wang
  - Department of Mathematics, University of Tennessee at Chattanooga, Chattanooga, TN 37403, USA
44
Wu Y, Jia M, Xiang C, Fang Y. Latent trajectories of frailty and risk prediction models among geriatric community dwellers: an interpretable machine learning perspective. BMC Geriatr 2022; 22:900. [PMID: 36434518 PMCID: PMC9700973 DOI: 10.1186/s12877-022-03576-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/21/2022] [Accepted: 11/01/2022] [Indexed: 11/27/2022]
Abstract
BACKGROUND This study aimed to identify long-term frailty trajectories among older adults (≥65), construct interpretable prediction models to assess the risk of developing an abnormal frailty trajectory, and examine significant factors related to the progression of frailty. METHODS This study retrospectively collected data from the Chinese Longitudinal Healthy Longevity and Happy Family Study between 2002 and 2018 (N = 4083). Frailty was defined by the frailty index. The study consisted of two phases. First, group-based trajectory modeling was used to identify frailty trajectories. Second, easy-to-access epidemiological data were used to construct machine learning algorithms, including naïve Bayes, logistic regression, decision tree, support vector machine, random forest, artificial neural network, and extreme gradient boosting, to predict the risk of long-term frailty trajectories. Further, Shapley additive explanations were employed to identify feature importance and open up the black box model of machine learning to strengthen decision makers' trust in the model. RESULTS Two distinct frailty trajectories (stable-growth: 82.54%, rapid-growth: 17.46%) were identified. Compared with other algorithms, random forest performed relatively better in distinguishing the stable-growth and rapid-growth groups. Physical function, including activities of daily living and instrumental activities of daily living, marital status, weight, and cognitive function were the top five predictors. CONCLUSIONS Interpretable machine learning can achieve the primary goal of risk stratification and make individual predictions more transparent, which benefits primary screening and tailored prevention.
Affiliation(s)
- Yafei Wu
  - School of Public Health, Xiamen University, Xiang’an Nan Road, Xiang’an District, Xiamen, 361102 Fujian, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
  - Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen, Fujian, China
- Maoni Jia
  - School of Public Health, Xiamen University, Xiang’an Nan Road, Xiang’an District, Xiamen, 361102 Fujian, China
  - Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen, Fujian, China
- Chaoyi Xiang
  - School of Public Health, Xiamen University, Xiang’an Nan Road, Xiang’an District, Xiamen, 361102 Fujian, China
  - Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen, Fujian, China
- Ya Fang
  - School of Public Health, Xiamen University, Xiang’an Nan Road, Xiang’an District, Xiamen, 361102 Fujian, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, China
  - Key Laboratory of Health Technology Assessment of Fujian Province, School of Public Health, Xiamen University, Xiamen, Fujian, China
45
Leist AK, Klee M, Kim JH, Rehkopf DH, Bordas SPA, Muniz-Terrera G, Wade S. Mapping of machine learning approaches for description, prediction, and causal inference in the social and health sciences. Sci Adv 2022; 8:eabk1942. [PMID: 36260666 PMCID: PMC9581488 DOI: 10.1126/sciadv.abk1942] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/28/2021] [Accepted: 09/01/2022] [Indexed: 05/20/2023]
Abstract
Machine learning (ML) methodology used in the social and health sciences needs to fit the intended research purposes of description, prediction, or causal inference. This paper provides a comprehensive, systematic meta-mapping of research questions in the social and health sciences to appropriate ML approaches by incorporating the necessary requirements to statistical analysis in these disciplines. We map the established classification into description, prediction, counterfactual prediction, and causal structural learning to common research goals, such as estimating prevalence of adverse social or health outcomes, predicting the risk of an event, and identifying risk factors or causes of adverse outcomes, and explain common ML performance metrics. Such mapping may help to fully exploit the benefits of ML while considering domain-specific aspects relevant to the social and health sciences and hopefully contribute to the acceleration of the uptake of ML applications to advance both basic and applied social and health sciences research.
Affiliation(s)
- Anja K. Leist
  - Department of Social Sciences, Institute for Research on Socio-Economic Inequality (IRSEI), University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - Corresponding author.
- Matthias Klee
  - Department of Social Sciences, Institute for Research on Socio-Economic Inequality (IRSEI), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Jung Hyun Kim
  - Department of Social Sciences, Institute for Research on Socio-Economic Inequality (IRSEI), University of Luxembourg, Esch-sur-Alzette, Luxembourg
- David H. Rehkopf
  - Department of Epidemiology and Population Health, Stanford University, Palo Alto, CA, USA
- Graciela Muniz-Terrera
  - Centre for Dementia Prevention, University of Edinburgh, Edinburgh, UK
  - Ohio University, Athens, OH, USA
- Sara Wade
  - School of Mathematics, University of Edinburgh, Edinburgh, UK
46
Zheng D, Hao X, Khan M, Wang L, Li F, Xiang N, Kang F, Hamalainen T, Cong F, Song K, Qiao C. Comparison of machine learning and logistic regression as predictive models for adverse maternal and neonatal outcomes of preeclampsia: A retrospective study. Front Cardiovasc Med 2022; 9:959649. [PMID: 36312231 PMCID: PMC9596815 DOI: 10.3389/fcvm.2022.959649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 09/12/2022] [Indexed: 12/05/2022]
Abstract
Introduction Preeclampsia, one of the leading causes of maternal and fetal morbidity and mortality, demands accurate predictive models given the lack of effective treatment. Predictive models based on machine learning algorithms demonstrate promising potential, but it remains controversial whether machine learning methods should be preferred over traditional statistical models. Methods We employed both logistic regression and six machine learning methods as binary predictive models for a dataset containing 733 women diagnosed with preeclampsia. Participants were grouped by four different pregnancy outcomes. After the imputation of missing values, statistical description and comparison were conducted to explore the characteristics of the 73 documented variables. Subsequently, correlation analysis and feature selection were performed as preprocessing steps to filter contributing variables for developing models. The models were evaluated by multiple criteria. Results First, we found that the influential variables screened by the preprocessing steps did not overlap with those determined by statistical differences. Second, the most accurate imputation method was K-Nearest Neighbor, and the imputation process did not affect the performance of the developed models much. Finally, the performance of the models was investigated. The random forest classifier, multi-layer perceptron, and support vector machine demonstrated better discriminative power for prediction, as evaluated by the area under the receiver operating characteristic curve, while the decision tree classifier, random forest, and logistic regression yielded better calibration ability, as verified by the calibration curve. Conclusion Machine learning algorithms can accomplish prediction modeling and demonstrate superior discrimination, while logistic regression can be calibrated well. Statistical analysis and machine learning are two scientific domains sharing similar themes. The predictive abilities of such developed models vary according to the characteristics of datasets, which still need larger sample sizes and more influential predictors to accumulate evidence.
Affiliation(s)
- Dongying Zheng
  - State Key Laboratory of Fine Chemicals, Dalian R&D Center for Stem Cell and Tissue Engineering, Dalian University of Technology, Dalian, China
  - Department of Obstetrics and Gynecology, Second Affiliated Hospital of Dalian Medical University, Dalian, China
  - Faculty of Information Technology, University of Jyvaskyla, Jyväskylä, Finland
- Xinyu Hao
  - Faculty of Information Technology, University of Jyvaskyla, Jyväskylä, Finland
  - School of Biomedical Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Muhanmmad Khan
  - Institute of Zoology, University of Punjab, Lahore, Pakistan
- Lixia Wang
  - Department of Obstetrics and Gynecology, Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Fan Li
  - Department of Obstetrics and Gynecology, Shengjing Hospital, China Medical University, Shenyang, China
- Ning Xiang
  - Department of Obstetrics and Gynecology, Jingzhou Hospital Affiliated to Yangtze University, Jingzhou, China
- Fuli Kang
  - Department of Obstetrics and Gynecology, Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Timo Hamalainen
  - Faculty of Information Technology, University of Jyvaskyla, Jyväskylä, Finland
- Fengyu Cong
  - Faculty of Information Technology, University of Jyvaskyla, Jyväskylä, Finland
  - School of Biomedical Engineering, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
  - School of Artificial Intelligence, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
  - Key Laboratory of Integrated Circuit and Biomedical Electronic System, Dalian University of Technology, Dalian, China
- Kedong Song
  - State Key Laboratory of Fine Chemicals, Dalian R&D Center for Stem Cell and Tissue Engineering, Dalian University of Technology, Dalian, China
  - Corresponding author.
- Chong Qiao
  - Department of Obstetrics and Gynecology, Shengjing Hospital, China Medical University, Shenyang, China
  - Corresponding author.
47
Yoshihara A, Yoshimura Noh J, Inoue K, Taguchi J, Hata K, Aizawa T, Taira Arai Y, Watanabe N, Fukushita M, Matsumoto M, Suzuki N, Hoshiyama A, Suzuki A, Mitsumatsu T, Kinoshita A, Mikura K, Yoshimura R, Sugino K, Ito K. Prediction model of Graves' disease in general clinical practice based on complete blood count and biochemistry profile. Endocr J 2022; 69:1091-1100. [PMID: 35387949 DOI: 10.1507/endocrj.ej21-0741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Although untreated Graves' disease (GD) is associated with a higher risk of cardiac complications and mortality, there is no well-established way to predict the onset of thyrotoxicosis in clinical practice. The aim of this study was to identify important variables that will make it possible to predict GD and thyrotoxicosis (GD + painless thyroiditis (PT)) by using a machine-learning-based model based on complete blood count and standard biochemistry profile data. We identified 19,335 newly diagnosed GD patients, 3,267 PT patients, and 4,159 subjects without any thyroid disease. We built a GD prediction model based on information obtained from subjects regarding sex, age, a complete blood count, and a standard biochemistry profile. We built the model in the training set and evaluated the performance of the model in the test set by using the artificial intelligence software Prediction One. Our machine learning-based model showed high discriminative ability to predict GD in the test set (area under the curve [AUC] 0.99). The main contributing factors to predict GD included age and serum creatinine, total cholesterol, alkaline phosphatase, and total protein levels. We still found high discriminative ability even when we restricted the variables in our prediction model to these five most contributory factors (AUC 0.97). The prediction model built using artificial intelligence software showed high GD prediction ability based on information regarding only five factors.
Affiliation(s)
- Kosuke Inoue
  - Ito Hospital, Tokyo 150-8308, Japan
  - Department of Social Epidemiology, Graduate School of Medicine, Kyoto University, Kyoto 606-8501, Japan
- Keisuke Hata
  - Nihonbashi Muromachi Mitsui Tower Midtown Clinic, Tokyo 103-0022, Japan
- Toru Aizawa
  - Nihonbashi Muromachi Mitsui Tower Midtown Clinic, Tokyo 103-0022, Japan
48
Ru B, Kujawski S, Lee Afanador N, Baumgartner R, Pawaskar M, Das A. Predicting Measles Outbreaks in the United States: Evaluation of Machine Learning Approaches (Preprint). JMIR Form Res 2022; 7:e42832. [PMID: 37014694 PMCID: PMC10131820 DOI: 10.2196/42832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 01/24/2023] [Accepted: 02/07/2023] [Indexed: 02/10/2023]
Abstract
BACKGROUND Measles, a highly contagious viral infection, is resurging in the United States, driven by international importation and declining domestic vaccination coverage. Despite this resurgence, measles outbreaks are still rare events that are difficult to predict. Improved methods to predict outbreaks at the county level would facilitate the optimal allocation of public health resources. OBJECTIVE We aimed to validate and compare extreme gradient boosting (XGBoost) and logistic regression, 2 supervised learning approaches, to predict the US counties most likely to experience measles cases. We also aimed to assess the performance of hybrid versions of these models that incorporated additional predictors generated by 2 clustering algorithms, hierarchical density-based spatial clustering of applications with noise (HDBSCAN) and unsupervised random forest (uRF). METHODS We constructed a supervised machine learning model based on XGBoost and unsupervised models based on HDBSCAN and uRF. The unsupervised models were used to investigate clustering patterns among counties with measles outbreaks; these clustering data were also incorporated into hybrid XGBoost models as additional input variables. The machine learning models were then compared to logistic regression models with and without input from the unsupervised models. RESULTS Both HDBSCAN and uRF identified clusters that included a high percentage of counties with measles outbreaks. XGBoost and XGBoost hybrid models outperformed logistic regression and logistic regression hybrid models, with area under the receiver operating characteristic curve values of 0.920-0.926 versus 0.900-0.908, area under the precision-recall curve values of 0.522-0.532 versus 0.485-0.513, and F2 scores of 0.595-0.601 versus 0.385-0.426. Logistic regression or logistic regression hybrid models had higher sensitivity than XGBoost or XGBoost hybrid models (0.837-0.857 vs 0.704-0.735) but a lower positive predictive value (0.122-0.141 vs 0.340-0.367) and specificity (0.793-0.821 vs 0.952-0.958). The hybrid versions of the logistic regression and XGBoost models had slightly higher areas under the precision-recall curve, specificity, and positive predictive values than the respective models that did not include any unsupervised features. CONCLUSIONS XGBoost provided more accurate predictions of measles cases at the county level compared with logistic regression. The threshold of prediction in this model can be adjusted to align with each county's resources, priorities, and risk for measles. While clustering pattern data from unsupervised machine learning approaches improved some aspects of model performance in this imbalanced data set, the optimal approach for the integration of such approaches with supervised machine learning models requires further investigation.
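The F2 scores in this abstract can be related to the reported sensitivity and positive predictive value (PPV) through the general F-beta formula, F_beta = (1 + beta^2) * PPV * sensitivity / (beta^2 * PPV + sensitivity), which weights sensitivity more heavily when beta = 2. The sketch below is illustrative only and is not the authors' code. Pairing the lower endpoints with each other, and the upper endpoints with each other, is an assumption made here; the abstract does not state which values belong to the same model.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score: weighted harmonic mean of precision (PPV) and recall (sensitivity)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# (PPV, sensitivity) endpoints copied from the abstract.
# ASSUMPTION: low is paired with low and high with high; the abstract does not say so.
reported = {
    "xgboost": [(0.340, 0.704), (0.367, 0.735)],
    "logistic_regression": [(0.122, 0.837), (0.141, 0.857)],
}

# F2 for each assumed (PPV, sensitivity) pair, rounded to three decimals.
f2_scores = {
    name: [round(f_beta(ppv, sens), 3) for ppv, sens in pairs]
    for name, pairs in reported.items()
}

if __name__ == "__main__":
    for name, scores in f2_scores.items():
        print(name, scores)
```

Under that pairing, the computed values fall at roughly 0.58 to 0.61 for XGBoost and 0.39 to 0.43 for logistic regression, broadly in line with the reported F2 ranges of 0.595-0.601 and 0.385-0.426.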
Affiliation(s)
- Boshu Ru
  - Merck & Co, Inc, West Point, PA, United States
- Amar Das
  - Merck & Co, Inc, Rahway, NJ, United States
49
Russo V, Lallo E, Munnia A, Spedicato M, Messerini L, D’Aurizio R, Ceroni EG, Brunelli G, Galvano A, Russo A, Landini I, Nobili S, Ceppi M, Bruzzone M, Cianchi F, Staderini F, Roselli M, Riondino S, Ferroni P, Guadagni F, Mini E, Peluso M. Artificial Intelligence Predictive Models of Response to Cytotoxic Chemotherapy Alone or Combined to Targeted Therapy for Metastatic Colorectal Cancer Patients: A Systematic Review and Meta-Analysis. Cancers (Basel) 2022; 14:4012. [PMID: 36011003 PMCID: PMC9406544 DOI: 10.3390/cancers14164012] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/28/2022] [Revised: 07/26/2022] [Accepted: 08/12/2022] [Indexed: 12/24/2022]
Abstract
Tailored treatments for metastatic colorectal cancer (mCRC) have not yet completely evolved due to the variety in response to drugs. Therefore, artificial intelligence has recently been used to develop prognostic and predictive models of treatment response (either activity/efficacy or toxicity) to aid in clinical decision making. In this systematic review, we have examined the ability of learning methods to predict response to chemotherapy alone or combined with targeted therapy in mCRC patients by targeting specific narrative publications in Medline up to April 2022 to identify appropriate original scientific articles. After the literature search, 26 original articles met the inclusion and exclusion criteria and were included in the study. Our results show that all investigations conducted in this field have provided generally promising results in predicting the response to therapy or toxic side-effects. By a meta-analytic approach, we found that the overall weighted means of the area under the receiver operating characteristic (ROC) curve (AUC) were 0.90 (95% C.I. 0.80-0.95) and 0.83 (95% C.I. 0.74-0.89) in training and validation sets, respectively, indicating a good classification performance in discriminating response vs. non-response. The calculation of the overall HR indicates that learning models have a strong ability to predict improved survival. Lastly, the delta-radiomics and the 74 gene signatures were able to discriminate response vs. non-response by correctly identifying up to 99% of mCRC patients who were responders and up to 100% of patients who were non-responders. Specifically, when we evaluated the predictive models with tests reaching 80% sensitivity (SE) and 90% specificity (SP), the delta-radiomics showed an SE of 99% and an SP of 94% in the training set and an SE of 85% and an SP of 92% in the test set, whereas for the 74 gene signatures the SE was 97.6% and the SP 100% in the training set.
Affiliation(s)
- Valentina Russo
  - Research and Development Branch, Regional Cancer Prevention Laboratory, ISPRO-Study, Prevention and Oncology Network Institute, 50139 Florence, Italy
- Eleonora Lallo
  - Research and Development Branch, Regional Cancer Prevention Laboratory, ISPRO-Study, Prevention and Oncology Network Institute, 50139 Florence, Italy
- Armelle Munnia
  - Research and Development Branch, Regional Cancer Prevention Laboratory, ISPRO-Study, Prevention and Oncology Network Institute, 50139 Florence, Italy
- Miriana Spedicato
  - Research and Development Branch, Regional Cancer Prevention Laboratory, ISPRO-Study, Prevention and Oncology Network Institute, 50139 Florence, Italy
- Luca Messerini
  - Department of Experimental and Clinical Medicine, University of Florence, 50134 Florence, Italy
- Romina D’Aurizio
  - Institute of Informatics and Telematics, National Research Council, 56124 Pisa, Italy
- Elia Giuseppe Ceroni
  - Institute of Informatics and Telematics, National Research Council, 56124 Pisa, Italy
- Giulia Brunelli
  - Institute of Informatics and Telematics, National Research Council, 56124 Pisa, Italy
- Antonio Galvano
  - Department of Surgical, Oncological and Oral Sciences, University of Palermo, 90127 Palermo, Italy
- Antonio Russo
  - Department of Surgical, Oncological and Oral Sciences, University of Palermo, 90127 Palermo, Italy
- Ida Landini
  - Department of Health Sciences, University of Florence, 50139 Florence, Italy
- Stefania Nobili
  - Department of Neurosciences, Imaging and Clinical Sciences, “G. D’Annunzio” Chieti-Pescara, 66100 Chieti, Italy
- Marcello Ceppi
  - Clinical Epidemiology Unit, IRCCS-Ospedale Policlinico San Martino, 16131 Genova, Italy
- Marco Bruzzone
  - Clinical Epidemiology Unit, IRCCS-Ospedale Policlinico San Martino, 16131 Genova, Italy
- Fabio Cianchi
  - Department of Experimental and Clinical Medicine, University of Florence, 50134 Florence, Italy
- Fabio Staderini
  - Department of Experimental and Clinical Medicine, University of Florence, 50134 Florence, Italy
- Mario Roselli
  - Medical Oncology Unit, Department of Systems Medicine, Tor Vergata University, 00133 Rome, Italy
- Silvia Riondino
  - Medical Oncology Unit, Department of Systems Medicine, Tor Vergata University, 00133 Rome, Italy
- Patrizia Ferroni
  - BioBIM (InterInstitutional Multidisciplinary Biobank), IRCCS San Raffaele Roma, 00166 Rome, Italy
  - Department of Human Sciences & Quality of Life Promotion, San Raffaele Roma Open University, 00166 Rome, Italy
- Fiorella Guadagni
  - BioBIM (InterInstitutional Multidisciplinary Biobank), IRCCS San Raffaele Roma, 00166 Rome, Italy
  - Department of Human Sciences & Quality of Life Promotion, San Raffaele Roma Open University, 00166 Rome, Italy
- Enrico Mini
  - Department of Health Sciences, University of Florence, 50139 Florence, Italy
- Marco Peluso
  - Research and Development Branch, Regional Cancer Prevention Laboratory, ISPRO-Study, Prevention and Oncology Network Institute, 50139 Florence, Italy
50
Wang S, Wang W, Li X, Liu Y, Wei J, Zheng J, Wang Y, Ye B, Zhao R, Huang Y, Peng S, Zheng Y, Zeng Y. Using machine learning algorithms for predicting cognitive impairment and identifying modifiable factors among Chinese elderly people. Front Aging Neurosci 2022; 14:977034. [PMID: 36034140 PMCID: PMC9407018 DOI: 10.3389/fnagi.2022.977034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Accepted: 07/19/2022] [Indexed: 11/18/2022]
Abstract
Objectives: This study first aimed to predict cognitive impairment at an early stage using a large population-based longitudinal survey of elderly Chinese people. The second aim was to identify reversible factors that may help slow the rate of decline in cognitive function over 3 years in the community. Methods: We included 12,280 elderly people from four waves of the Chinese Longitudinal Healthy Longevity Survey (CLHLS), followed from 2002 to 2014. The Chinese version of the Mini-Mental State Examination (MMSE) was used to examine cognitive function. Six machine learning algorithms (including a neural network model) and an ensemble method were trained on data split 2/3 for training and 1/3 for testing. Parameters were explored in the training data using 3-fold cross-validation, and models were evaluated on the test data. Model performance was measured by area under the curve (AUC), sensitivity, and specificity. In addition, because of its better interpretability, logistic regression (LR) was used to assess the association of life behavior and its change with cognitive impairment after 3 years. Results: Support vector machine (SVM) and multi-layer perceptron were the best-performing algorithms, with AUCs of 0.8267 and 0.8256, respectively. Fusing the results of all six single models further improved the AUC to 0.8269. Playing more Mahjong or cards (OR = 0.49, 95% CI: 0.38-0.64), doing more garden work (OR = 0.54, 95% CI: 0.43-0.68), and watching TV or listening to the radio more (OR = 0.67, 95% CI: 0.59-0.77) were associated with decreased risk of cognitive impairment after 3 years. Conclusions: Machine learning algorithms, especially the SVM and the ensemble model, can be leveraged to identify elderly people at risk of cognitive impairment. Doing more leisure activities, doing more gardening work, and engaging in more activities combined were associated with decreased risk of cognitive impairment.
Affiliation(s)
| | - Jingming Wei
- Institute of Mental Health, Peking University, Beijing, China
| | - Yan Wang
- Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
| | - Yu Huang
- Tencent Jarvis Lab, Shenzhen, China
| | - Yanbing Zeng
- School of Public Health, Capital Medical University, Beijing, China
|