1
Umesh C, Mahendra M, Bej S, Wolkenhauer O, Wolfien M. Challenges and applications in generative AI for clinical tabular data in physiology. Pflugers Arch 2025; 477:531-542. [PMID: 39417878 PMCID: PMC11958401 DOI: 10.1007/s00424-024-03024-w] [Received: 07/03/2024] [Revised: 09/17/2024] [Accepted: 09/23/2024] [Indexed: 10/19/2024]
Abstract
Recent advancements in generative approaches in AI have opened up the prospect of synthetic tabular clinical data generation. From filling in missing values in real-world data, these approaches have now advanced to creating complex multi-tables. This review explores the development of techniques capable of synthesizing patient data and modeling multiple tables. We highlight the challenges and opportunities of these methods for analyzing patient data in physiology. Additionally, we discuss the challenges and potential of these approaches in improving clinical research, personalized medicine, and healthcare policy. The integration of these generative models into physiological settings may represent both a theoretical advancement and a practical tool that has the potential to improve mechanistic understanding and patient care. By providing a reliable source of synthetic data, these models can also help mitigate privacy concerns and facilitate large-scale data sharing.
Affiliation(s)
- Chaithra Umesh
  - Institute of Computer Science, Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Manjunath Mahendra
  - Institute of Computer Science, Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
- Saptarshi Bej
  - School of Data Science, Indian Institute of Science Education and Research (IISER), Thiruvananthapuram, India
- Olaf Wolkenhauer
  - Institute of Computer Science, Department of Systems Biology and Bioinformatics, University of Rostock, Rostock, Germany
  - Leibniz-Institute for Food Systems Biology, Technical University of Munich, Freising, Germany
- Markus Wolfien
  - Faculty of Medicine Carl Gustav Carus, Institute for Medical Informatics and Biometry, TUD Dresden University of Technology, Dresden, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden, Germany
2
Telkmann K, Gudi-Mindermann H, Bogers R, Ahrens J, Tönnies J, van Kamp I, Vrijkotte T, Bolte G. Identification of exposome clusters based on societal, social, built and natural environment - results of the ABCD cohort study. Environment International 2025; 197:109335. [PMID: 39983415 DOI: 10.1016/j.envint.2025.109335] [Received: 10/28/2024] [Revised: 01/17/2025] [Accepted: 02/14/2025] [Indexed: 02/23/2025]
Abstract
Exposome research has seen a recent increase. The conceptual framework of the Social Exposome extends initial concepts by considering the entirety of societal, social, built and natural environmental exposures which are assumed to holistically impact development and health across the lifecourse. The aim of this study is the identification and characterisation of exposome clusters. Additionally, their relevance for mental health is investigated. To this end, 2,850 participants aged 11-12 of the Amsterdam Born Children and their Development (ABCD) Cohort Study were analysed. The exposome was characterized by 60 variables representing the societal, social, built and natural environment. Uniform manifold approximation and projection (UMAP) was applied for dimensionality reduction, and subsequently clustering was performed on the retrieved low-dimensional embedding. Mental health symptoms and behaviour-related outcomes were assessed by the Strengths and Difficulties Questionnaire (SDQ) as well as the Substance Use Risk Profile Scale (SURPS). The results suggest that exposome clusters are mainly driven by contextual socioeconomic and physical characteristics such as neighborhood income and deprivation rather than social characteristics at the individual level. Moreover, prevalence of children's mental health problems was more prominent within exposome clusters characterized at the contextual level by more deprived neighborhoods and at the individual level by higher prevalence of maternal mental health problems. This exploratory exposome cluster identification emphasized the relevance of socioeconomic neighborhood characteristics, thus structural inequalities.
Affiliation(s)
- Klaus Telkmann
  - University of Bremen, Institute of Public Health and Nursing Research, Department of Social Epidemiology, 28359 Bremen, Germany; University of Bremen, Health Sciences Bremen, 28359 Bremen, Germany
- Helene Gudi-Mindermann
  - University of Bremen, Institute of Public Health and Nursing Research, Department of Social Epidemiology, 28359 Bremen, Germany; University of Bremen, Health Sciences Bremen, 28359 Bremen, Germany
- Rik Bogers
  - National Institute for Public Health and the Environment, Centre for Sustainability, Environment and Health, 3720 BA Bilthoven, the Netherlands
- Jenny Ahrens
  - University of Bremen, Institute of Public Health and Nursing Research, Department of Social Epidemiology, 28359 Bremen, Germany; University of Bremen, Health Sciences Bremen, 28359 Bremen, Germany
- Justus Tönnies
  - University of Bremen, Institute of Public Health and Nursing Research, Department of Social Epidemiology, 28359 Bremen, Germany; University of Bremen, Health Sciences Bremen, 28359 Bremen, Germany
- Irene van Kamp
  - National Institute for Public Health and the Environment, Centre for Sustainability, Environment and Health, 3720 BA Bilthoven, the Netherlands
- Tanja Vrijkotte
  - University of Amsterdam, Amsterdam UMC, Amsterdam Public Health Research Institute, Department of Public and Occupational Health, 1081 BT Amsterdam, the Netherlands
- Gabriele Bolte
  - University of Bremen, Institute of Public Health and Nursing Research, Department of Social Epidemiology, 28359 Bremen, Germany; University of Bremen, Health Sciences Bremen, 28359 Bremen, Germany
3
Tian Z, Zhang J, Fan Y, Sun X, Wang D, Liu X, Lu G, Wang H. Diabetic peripheral neuropathy detection of type 2 diabetes using machine learning from TCM features: a cross-sectional study. BMC Med Inform Decis Mak 2025; 25:90. [PMID: 39966886 PMCID: PMC11837659 DOI: 10.1186/s12911-025-02932-w] [Received: 12/01/2024] [Accepted: 02/11/2025] [Indexed: 02/20/2025] Open
Abstract
AIMS: Diabetic peripheral neuropathy (DPN) is the most common complication of diabetes mellitus. Early identification of individuals at high risk of DPN is essential for successful early intervention. Traditional Chinese medicine (TCM) tongue diagnosis, one of the four diagnostic methods, lacks specific algorithms for TCM symptoms and tongue features. This study aims to develop machine learning (ML) models based on TCM to predict the risk of DPN in patients with type 2 diabetes mellitus (T2DM). METHODS: A total of 4723 patients were included in the analysis (4430 with T2DM and 293 with DPN). TFDA-1 was used to obtain tongue images during a questionnaire survey. LASSO (least absolute shrinkage and selection operator) logistic regression model with fivefold cross-validation was used to select imaging features, which were then screened using best subset selection. The synthetic minority oversampling technique (SMOTE) algorithm was applied to address the class imbalance and eliminate possible bias. The area under the receiver operating characteristic curve (AUC) was used to evaluate the model's performance. Four ML algorithms, namely logistic regression (LR), random forest (RF), support vector classifier (SVC), and light gradient boosting machine (LGBM), were used to build predictive models for DPN. The importance of covariates in DPN was ranked using classifiers with better performance. RESULTS: The RF model performed the best, with an accuracy of 0.767, precision of 0.718, recall of 0.874, F1 score of 0.789, and AUC of 0.77. With a value of 0.879, the LGBM model appeared to be the best regarding recall. Age, sweating, dark red tongue, insomnia, and smoking were the five most significant RF features. Age, yellow coating, loose teeth, smoking, and insomnia were the five most significant features of the LGBM model. CONCLUSIONS: This cross-sectional study demonstrates that the RF and LGBM models can screen for high-risk DPN in T2DM patients using TCM symptoms and tongue features. The identified key TCM-related features, such as age, tongue coating, and other symptoms, may be advantageous in developing preventative measures for T2DM patients.
Affiliation(s)
- Zhikui Tian
  - School of Rehabilitation Medicine, Qilu Medical University, Shandong, 255300, China
- JiZhong Zhang
  - School of Rehabilitation Medicine, Qilu Medical University, Shandong, 255300, China
- Yadong Fan
  - Medical College of Yangzhou University, YangZhou, 225000, China
- Xuan Sun
  - College of Traditional Chinese Medicine, Binzhou Medical University, Shandong, China
- Dongjun Wang
  - College of Traditional Chinese Medicine, North China University of Science and Technology, Tangshan, 063000, China
- XiaoFei Liu
  - School of Rehabilitation Medicine, Qilu Medical University, Shandong, 255300, China
- GuoHui Lu
  - School of Rehabilitation Medicine, Qilu Medical University, Shandong, 255300, China
- Hongwu Wang
  - School of Health Sciences and Engineering, Tianjin University of Traditional Chinese Medicine, Tianjin, 301617, China
4
Quistberg DA, Mooney SJ, Tasdizen T, Arbelaez P, Nguyen QC. Invited commentary: deep learning-methods to amplify epidemiologic data collection and analyses. Am J Epidemiol 2025; 194:322-326. [PMID: 39013794 PMCID: PMC11815488 DOI: 10.1093/aje/kwae215] [Received: 02/01/2023] [Revised: 06/18/2024] [Accepted: 07/12/2024] [Indexed: 07/18/2024] Open
Abstract
Deep learning is a subfield of artificial intelligence and machine learning, based mostly on neural networks and often combined with attention algorithms, that has been used to detect and identify objects in text, audio, images, and video. Serghiou and Rough (Am J Epidemiol. 2023;192(11):1904-1916) presented a primer for epidemiologists on deep learning models. These models provide substantial opportunities for epidemiologists to expand and amplify their research in both data collection and analyses by increasing the geographic reach of studies, including more research subjects, and working with large or high-dimensional data. The tools for implementing deep learning methods are not as straightforward or ubiquitous for epidemiologists as traditional regression methods found in standard statistical software, but there are exciting opportunities for interdisciplinary collaboration with deep learning experts, just as epidemiologists have with statisticians, health care providers, urban planners, and other professionals. Despite the novelty of these methods, epidemiologic principles of assessing bias, study design, interpretation, and others still apply when implementing deep learning methods or assessing the findings of studies that have used them.
Affiliation(s)
- D Alex Quistberg
  - Urban Health Collaborative, Dornsife School of Public Health, Drexel University, Philadelphia, PA 19104, United States
  - Department of Environmental and Occupational Health, Dornsife School of Public Health, Drexel University, Philadelphia, PA 19104, United States
- Stephen J Mooney
  - Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA 98195, United States
- Tolga Tasdizen
  - Department of Electrical and Computer Engineering, College of Engineering, University of Utah, Salt Lake City, UT 84112, United States
  - The Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT 84112, United States
- Pablo Arbelaez
  - Department of Biomedical Engineering, Universidad de los Andes, Bogota 111711, Colombia
  - Centro de Investigacion y Formacion en Inteligencia Artificial (CinfonIA), Universidad de los Andes, Bogota 111711, Colombia
- Quynh C Nguyen
  - Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, MD 20742, United States
5
Rischmüller K, Caton V, Wolfien M, Ehlers L, van Welzen M, Brauer D, Sautter LF, Meyer F, Valentini L, Wiese ML, Aghdassi AA, Jaster R, Wolkenhauer O, Lamprecht G, Bej S. Identification of key factors for malnutrition diagnosis in chronic gastrointestinal diseases using machine learning underscores the importance of GLIM criteria as well as additional parameters. Front Nutr 2024; 11:1479501. [PMID: 39726873 PMCID: PMC11670747 DOI: 10.3389/fnut.2024.1479501] [Received: 08/12/2024] [Accepted: 11/26/2024] [Indexed: 12/28/2024] Open
Abstract
Introduction: Disease-related malnutrition is common but often underdiagnosed in patients with chronic gastrointestinal diseases, such as liver cirrhosis, short bowel and intestinal insufficiency, and chronic pancreatitis. To improve malnutrition diagnosis in these patients, an evaluation of the current Global Leadership Initiative on Malnutrition (GLIM) diagnostic criteria, and possibly the implementation of additional criteria, is needed. Aim: This study aimed to identify previously unknown and potentially specific features of malnutrition in patients with different chronic gastrointestinal diseases and to validate the relevance of the GLIM criteria for clinical practice using machine learning (ML). Methods: Between 10/2018 and 09/2021, n = 314 patients and controls were prospectively enrolled in a cross-sectional study. A total of n = 230 features (anthropometric data, body composition, handgrip strength, gait speed, laboratory values, dietary habits, physical activity, mental health) were recorded. After data preprocessing (cleaning, feature exploration, imputation of missing data), n = 135 features were included in the ML analyses. Supervised ML models were used to classify malnutrition, and key features were identified using SHapley Additive exPlanations (SHAP). Results: Supervised ML effectively classified malnourished versus non-malnourished patients and controls. Excluding the existing GLIM criteria and malnutrition risk reduced model performance (sensitivity -19%, specificity -8%, F1-score -10%), highlighting their significance. Besides some GLIM criteria (weight loss, reduced food intake, disease/inflammation), additional anthropometric (hip and upper arm circumference), body composition (phase angle, SMMI), and laboratory markers (albumin, pseudocholinesterase, prealbumin) were key features for malnutrition classification. Conclusion: ML analysis confirmed the clinical applicability of the current GLIM criteria and identified additional features that may improve malnutrition diagnosis and understanding of the pathophysiology of malnutrition in chronic gastrointestinal diseases.
Affiliation(s)
- Karen Rischmüller
  - Division of Gastroenterology and Endocrinology, Department of Internal Medicine II, Rostock University Medical Center, Rostock, Germany
- Vanessa Caton
  - Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany
- Markus Wolfien
  - Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany
  - Faculty of Medicine Carl Gustav Carus, Institute for Medical Informatics and Biometry, TUD Dresden University of Technology, Dresden, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Dresden, Germany
- Luise Ehlers
  - Division of Gastroenterology and Endocrinology, Department of Internal Medicine II, Rostock University Medical Center, Rostock, Germany
- Matti van Welzen
  - Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany
- David Brauer
  - Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany
- Lea F. Sautter
  - Division of Gastroenterology and Endocrinology, Department of Internal Medicine II, Rostock University Medical Center, Rostock, Germany
- Fatuma Meyer
  - Department of Agriculture and Food Sciences, Neubrandenburg Institute of Evidence-Based Nutrition (NIED), University of Applied Sciences Neubrandenburg, Neubrandenburg, Germany
- Luzia Valentini
  - Department of Agriculture and Food Sciences, Neubrandenburg Institute of Evidence-Based Nutrition (NIED), University of Applied Sciences Neubrandenburg, Neubrandenburg, Germany
- Mats L. Wiese
  - Department of Medicine A, University Medicine Greifswald, Greifswald, Germany
- Ali A. Aghdassi
  - Department of Medicine A, University Medicine Greifswald, Greifswald, Germany
- Robert Jaster
  - Division of Gastroenterology and Endocrinology, Department of Internal Medicine II, Rostock University Medical Center, Rostock, Germany
- Olaf Wolkenhauer
  - Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany
  - Leibniz-Institute for Food Systems Biology, Technical University of Munich, Freising, Germany
- Georg Lamprecht
  - Division of Gastroenterology and Endocrinology, Department of Internal Medicine II, Rostock University Medical Center, Rostock, Germany
- Saptarshi Bej
  - Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany
  - Indian Institute of Science Education and Research, Thiruvananthapuram, India
6
Shen Y, Fei X, Xu J, Yang R, Ge Q, Wang Z. Performance analysis of markers for prostate cell typing in single-cell data. Genes Dis 2024; 11:101157. [PMID: 39100200 PMCID: PMC11295451 DOI: 10.1016/j.gendis.2023.101157] [Received: 07/05/2023] [Accepted: 10/10/2023] [Indexed: 08/06/2024] Open
Affiliation(s)
- Yanting Shen
  - Department of Urology, Shanghai Ninth People's Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200011, China
  - Department of Urology and Andrology, Gongli Hospital, The Second Military Medical University, Shanghai 200135, China
- Xiawei Fei
  - Department of Urology, Qingpu Branch of Zhongshan Hospital Affiliated to Fudan University, Shanghai 201799, China
- Junyan Xu
  - University of Shanghai for Science and Technology, Shanghai 200093, China
- Rui Yang
  - Wuxi Maternal and Child Health Hospital, Wuxi School of Medicine, Jiangnan University, Jiangsu 214002, China
- Qinyu Ge
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu 210096, China
- Zhong Wang
  - Department of Urology and Andrology, Gongli Hospital, The Second Military Medical University, Shanghai 200135, China
7
Srivastava P, Steuer A, Ferri F, Nicoli A, Schultz K, Bej S, Di Pizio A, Wolkenhauer O. Bitter peptide prediction using graph neural networks. J Cheminform 2024; 16:111. [PMID: 39375808 PMCID: PMC11459932 DOI: 10.1186/s13321-024-00909-x] [Received: 10/31/2023] [Accepted: 09/22/2024] [Indexed: 10/09/2024] Open
Abstract
Bitter taste is an unpleasant taste modality that affects food consumption. Bitter peptides are generated during enzymatic processes that produce functional, bioactive protein hydrolysates or during the aging process of fermented products such as cheese, soybean protein, and wine. Understanding the underlying peptide sequences responsible for bitter taste can pave the way for more efficient identification of these peptides. This paper presents BitterPep-GCN, a feature-agnostic graph convolution network for bitter peptide prediction. The graph-based model learns the embedding of amino acids in the bitter peptide sequences and uses mixed pooling for bitter classification. BitterPep-GCN was benchmarked using BTP640, a publicly available bitter peptide dataset. The latent peptide embeddings generated by the trained model were used to analyze the activity of sequence motifs responsible for the bitter taste of the peptides. In particular, we calculated the activity for individual amino acids and dipeptide, tripeptide, and tetrapeptide sequence motifs present in the peptides. Our analyses pinpoint specific amino acids, such as F, G, P, and R, as well as sequence motifs, notably tripeptide and tetrapeptide motifs containing FF, as key bitter signatures in peptides. This work not only provides a new predictor of bitter taste for a more efficient identification of bitter peptides in various food products but also hints at the molecular basis of bitterness. Scientific Contribution: Our work provides the first application of Graph Neural Networks for the prediction of peptide bitter taste. The best-developed model, BitterPep-GCN, learns the embedding of amino acids in the bitter peptide sequences and uses mixed pooling for bitter classification. The embeddings were used to analyze the sequence motifs responsible for the bitter taste.
Affiliation(s)
- Prashant Srivastava
  - Institute of Computer Science, University of Rostock, 18051, Rostock, Germany
- Alexandra Steuer
  - Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
  - Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Francesco Ferri
  - Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
  - Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Alessandro Nicoli
  - Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
  - Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Kristian Schultz
  - Institute of Computer Science, University of Rostock, 18051, Rostock, Germany
- Saptarshi Bej
  - Indian Institute of Science Education and Research Thiruvananthapuram, Maruthamala P. O, Vithura, 695551, Kerala, India
- Antonella Di Pizio
  - Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
  - Professorship for Chemoinformatics and Protein Modelling, TUM School of Life Sciences, Technical University of Munich, 85354, Freising, Germany
- Olaf Wolkenhauer
  - Institute of Computer Science, University of Rostock, 18051, Rostock, Germany
  - Section III In Silico Biology & Machine Learning, Leibniz Institute for Food Systems Biology at the Technical University of Munich, 85354, Freising, Germany
8
Wang Z, Whipp AM, Heinonen-Guzejev M, Foraster M, Júlvez J, Kaprio J. The association between urban land use and depressive symptoms in young adulthood: a FinnTwin12 cohort study. Journal of Exposure Science & Environmental Epidemiology 2024; 34:770-779. [PMID: 38081942 PMCID: PMC11446816 DOI: 10.1038/s41370-023-00619-w] [Received: 04/27/2023] [Revised: 11/20/2023] [Accepted: 11/22/2023] [Indexed: 10/04/2024]
Abstract
BACKGROUND: Depressive symptoms lead to a serious public health burden and are considerably affected by the environment. Land use, describing the urban living environment, influences mental health, but assessment of complex relationships is rare. OBJECTIVE: We aimed to examine the complicated association between urban land use and depressive symptoms among young adults with differential land use environments, by applying multiple models. METHODS: We included 1804 individual twins from the FinnTwin12 cohort, living in urban areas in 2012. There were eight types of land use exposures in three buffer radii. The depressive symptoms were assessed through the General Behavior Inventory (GBI) in young adulthood (mean age: 24.1). First, K-means clustering was performed to distinguish participants with differential land use environments. Then, linear elastic net penalized regression and eXtreme Gradient Boosting (XGBoost) were used to reduce dimensions or prioritize for importance and examine the linear and nonlinear relationships. RESULTS: Two clusters were identified: one is more typical of city centers and another of suburban areas. A heterogeneous pattern in results was detected from the linear elastic net penalized regression model among the overall sample and the two separated clusters. Agricultural residential land use in a 100 m buffer contributed to GBI most (coefficient: 0.097) in the "suburban" cluster among 11 selected exposures after adjustment with demographic covariates. In the "city center" cluster, none of the land use exposures was associated with GBI, even after further adjustment with social indicators. From the XGBoost models, we observed that ranks of the importance of land use exposures on GBI and their nonlinear relationships are also heterogeneous in the two clusters. IMPACT: This study examined the complex relationship between urban land use and depressive symptoms among young adults in Finland. Based on the FinnTwin12 cohort, we first identified two distinct clusters of participants with different urban land use environments. We then employed two pluralistic models, elastic net penalized regression and XGBoost, and revealed both linear and nonlinear relationships between urban land use and depressive symptoms, which also varied in the two clusters. The findings suggest that future analyses involving land use and the broader environmental profile should consider aspects such as population heterogeneity and linearity for comprehensive assessment.
Affiliation(s)
- Zhiyang Wang
  - Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Alyce M Whipp
  - Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
  - Department of Public Health, University of Helsinki, Helsinki, Finland
- Maria Foraster
  - PHAGEX Research Group, Blanquerna School of Health Science, Universitat Ramon Llull (URL), Barcelona, Spain
  - ISGlobal-Instituto de Salud Global de Barcelona Campus MAR, Parc de Recerca Biomèdica de Barcelona (PRBB), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - CIBER Epidemiología y Salud Pública (CIBEREsp), Madrid, Spain
- Jordi Júlvez
  - ISGlobal-Instituto de Salud Global de Barcelona Campus MAR, Parc de Recerca Biomèdica de Barcelona (PRBB), Barcelona, Spain
  - Clinical and Epidemiological Neuroscience (NeuroÈpia), Institut d'Investigació Sanitària Pere Virgili (IISPV), Reus, Spain
- Jaakko Kaprio
  - Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
  - Department of Public Health, University of Helsinki, Helsinki, Finland
9
Shao H, Liu X, Zong D, Song Q. Optimization of diabetes prediction methods based on combinatorial balancing algorithm. Nutr Diabetes 2024; 14:63. [PMID: 39143066 PMCID: PMC11324958 DOI: 10.1038/s41387-024-00324-z] [Received: 10/13/2023] [Revised: 07/22/2024] [Accepted: 07/26/2024] [Indexed: 08/16/2024] Open
Abstract
BACKGROUND: Diabetes, as a significant disease affecting public health, requires early detection for effective management and intervention. However, imbalanced datasets pose a challenge to accurate diabetes prediction. This imbalance often results in models performing poorly in predicting minority classes, affecting overall diagnostic performance. OBJECTIVES: To address this issue, this study employs a combination of Synthetic Minority Over-sampling Technique (SMOTE) and Random Under-Sampling (RUS) for data balancing and uses Optuna for hyperparameter optimization of machine learning models. This approach aims to fill the gap in current research concerning data balancing and model optimization, thereby improving prediction accuracy and computational efficiency. METHODS: First, the study uses SMOTE and RUS methods to process the imbalanced diabetes dataset, balancing the data distribution. Then, Optuna is utilized to optimize the hyperparameters of the LightGBM model to enhance its performance. During the experiment, the effectiveness of the proposed methods is evaluated by comparing the training results of the dataset before and after balancing. RESULTS: The experimental results show that the enhanced LightGBM-Optuna model improves the accuracy from 97.07% to 97.11%, and the precision from 97.17% to 98.99%. The time required for a single search is only 2.5 seconds. These results demonstrate the superiority of the proposed method in handling imbalanced datasets and optimizing model performance. CONCLUSIONS: The study indicates that combining SMOTE and RUS data balancing algorithms with Optuna for hyperparameter optimization can effectively enhance machine learning models, especially in dealing with imbalanced datasets for diabetes prediction.
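The combined balancing idea named in this abstract (oversample the minority class with SMOTE-style interpolation, undersample the majority class at random) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation, which the abstract does not describe in detail; the function names `smote_like`, `random_under_sample`, and `balance`, the `target` class size, and the default `k=3` are all assumptions made for illustration.

```python
import random


def smote_like(minority, n_new, k, rng):
    """Return n_new synthetic points, each interpolated between a random
    minority point and one of its k nearest minority neighbours."""
    if n_new > 0 and len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # Rank the remaining minority points by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def random_under_sample(majority, n_keep, rng):
    """Keep a random subset of the majority class, without replacement."""
    return rng.sample(majority, min(n_keep, len(majority)))


def balance(majority, minority, target, k=3, seed=0):
    """Bring both classes toward `target` samples (hypothetical helper)."""
    rng = random.Random(seed)
    kept = random_under_sample(majority, target, rng)
    extra = smote_like(minority, max(0, target - len(minority)), k, rng)
    return kept, list(minority) + extra
```

Calling `balance(majority, minority, target=10)` on a 20-to-4 split would return 10 majority points and 10 minority points, six of them synthetic. Because every synthetic point lies on a segment between two real minority points, it stays inside the region the minority class already occupies.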
Affiliation(s)
- HuiZhi Shao
  - Jinan Engineering Polytechnic, Ji-Nan, Shandong, China
  - College of Intelligent Equipment, Shandong University of Science & Technology, Tai-an, Shandong, China
- Xiang Liu
  - College of Intelligent Equipment, Shandong University of Science & Technology, Tai-an, Shandong, China
- DaShuai Zong
  - College of Intelligent Equipment, Shandong University of Science & Technology, Tai-an, Shandong, China
- QingJun Song
  - College of Intelligent Equipment, Shandong University of Science & Technology, Tai-an, Shandong, China
10
Li B, Yang Z, Liu Y, Zhou X, Wang W, Gao Z, Yan L, Qin G, Tang X, Wan Q, Chen L, Luo Z, Ning G, Gu W, Mu Y. Clinical characteristics and complication risks in data-driven clusters among Chinese community diabetes populations. J Diabetes 2024; 16:e13596. [PMID: 39136497 PMCID: PMC11320751 DOI: 10.1111/1753-0407.13596] [Received: 09/05/2023] [Revised: 04/23/2024] [Accepted: 06/02/2024] [Indexed: 08/16/2024] Open
Abstract
BACKGROUND Novel diabetes phenotypes were proposed by European researchers through cluster analysis, but Chinese community diabetes populations might exhibit different characteristics. This study aims to explore the clinical characteristics of novel diabetes subgroups under data-driven analysis in Chinese community diabetes populations. METHODS We used K-means cluster analysis in 6369 newly diagnosed diabetic patients from eight centers of the REACTION (Risk Evaluation of cAncers in Chinese diabeTic Individuals) study. The cluster analysis was performed based on age, body mass index, glycosylated hemoglobin, homeostatic modeled insulin resistance index, and homeostatic modeled pancreatic β-cell functionality index. The clinical features were evaluated with the analysis of variance (ANOVA) and chi-square test. Logistic regression analysis was done to compare chronic kidney disease and cardiovascular disease risks between subgroups. RESULTS Overall, 2063 (32.39%), 658 (10.33%), 1769 (27.78%), and 1879 (29.50%) participants were assigned to the severe obesity-related and insulin-resistant diabetes (SOIRD), severe insulin-deficient diabetes (SIDD), mild age-associated diabetes mellitus (MARD), and mild insulin-deficient diabetes (MIDD) subgroups, respectively. Individuals in the MIDD subgroup had a low risk burden equivalent to prediabetes, but with reduced insulin secretion. Individuals in the SOIRD subgroup were obese, had insulin resistance, and had a high prevalence of fatty liver, tumors, and family history of diabetes. Individuals in the SIDD subgroup had severe insulin deficiency, the poorest glycemic control, and the highest prevalence of dyslipidemia and diabetic nephropathy. Individuals in the MARD subgroup were the oldest, had moderate metabolic dysregulation, and had the highest risk of cardiovascular disease. CONCLUSION The data-driven approach to differentiating the status of new-onset diabetes in the Chinese community was feasible. Patients in different clusters presented different characteristics and risks of complications.
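The K-means step named in this abstract can be sketched with the standard library alone. The z-score standardization, the farthest-first initialization (chosen here so the toy run is deterministic), the three-feature toy rows, and all function names are assumptions for illustration; the abstract does not state how the study scaled its five variables or initialized its clusters.

```python
def standardize(rows):
    """Z-score each column so no single variable dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        (sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((x - m) / s for x, m, s in zip(r, means, sds)) for r in rows]


def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic initialization: start at the first point, then keep
    adding the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [
            min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points
        ]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Invented toy rows: (age in years, BMI in kg/m^2, HbA1c in %).
patients = [
    (40, 32, 7.0), (42, 33, 7.2), (41, 31, 6.9),
    (68, 24, 6.5), (70, 23, 6.4), (69, 25, 6.6),
]
labels, centroids = kmeans(standardize(patients), k=2)
print(labels)
```

In a real analysis the cluster count (four in the study) and the clinical labelling of each cluster come from inspecting the centroids, not from the algorithm itself.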
Affiliation(s)
- Binqi Li
- School of Medicine, Nankai University, Tianjin, China
- Department of Endocrinology, The First Medical Center of PLA General Hospital, Beijing, China
| | | | - Yang Liu
- Department of Endocrinology, The First Medical Center of PLA General Hospital, Beijing, China
- Department of Endocrinology, The Eighth Medical Center of PLA General Hospital, Beijing, China
| | - Xin Zhou
- Graduate School, Chinese PLA General Hospital, Beijing, China
- Department of Medical Oncology, The Fifth Medical Center of Chinese PLA General Hospital, Beijing, China
- Department of Geriatrics, The Second Medical Center of Chinese PLA General Hospital, Beijing, China
| | - Weiqing Wang
- Department of Endocrinology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Zhengnan Gao
- Department of Endocrinology, Dalian Central Hospital, Dalian, China
| | - Li Yan
- Department of Endocrinology, Zhongshan University Sun Yat‐sen Memorial Hospital, Guangzhou, China
| | - Guijun Qin
- Department of Endocrinology, First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Xulei Tang
- Department of Endocrinology, First Hospital of Lanzhou University, Lanzhou, China
| | - Qin Wan
- Department of Endocrinology, Southwest Medical University Affiliated Hospital, Luzhou, China
| | - Lulu Chen
- Department of Endocrinology, Wuhan Union Hospital, Huazhong University of Science and Technology, Wuhan, China
| | - Zuojie Luo
- Department of Endocrinology, First Affiliated Hospital of Guangxi Medical University, Nanning, China
| | - Guang Ning
- Department of Endocrinology, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| | - Weijun Gu
- Department of Endocrinology, The First Medical Center of PLA General Hospital, Beijing, China
| | - Yiming Mu
- School of Medicine, Nankai University, Tianjin, China
- Department of Endocrinology, The First Medical Center of PLA General Hospital, Beijing, China
- Department of Endocrinology, The Eighth Medical Center of PLA General Hospital, Beijing, China
|
11
|
Li L, Momma H, Chen H, Nawrin SS, Xu Y, Inada H, Nagatomi R. Dietary patterns associated with the incidence of hypertension among adult Japanese males: application of machine learning to a cohort study. Eur J Nutr 2024; 63:1293-1314. [PMID: 38403812 PMCID: PMC11139695 DOI: 10.1007/s00394-024-03342-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 01/30/2024] [Indexed: 02/27/2024]
Abstract
PURPOSE The previous studies that examined the effectiveness of unsupervised machine learning methods versus traditional methods in assessing dietary patterns and their association with incident hypertension showed contradictory results. Consequently, our aim is to explore the correlation between the incidence of hypertension and overall dietary patterns that were extracted using unsupervised machine learning techniques. METHODS Data were obtained from Japanese male participants enrolled in a prospective cohort study between August 2008 and August 2010. A final dataset of 447 male participants was used for analysis. Dimension reduction using uniform manifold approximation and projection (UMAP) and subsequent K-means clustering was used to derive dietary patterns. In addition, multivariable logistic regression was used to evaluate the association between dietary patterns and the incidence of hypertension. RESULTS We identified four dietary patterns: 'Low-protein/fiber High-sugar,' 'Dairy/vegetable-based,' 'Meat-based,' and 'Seafood and Alcohol.' Compared with 'Seafood and Alcohol' as a reference, the protective dietary patterns for hypertension were 'Dairy/vegetable-based' (OR 0.39, 95% CI 0.19-0.80, P = 0.013) and the 'Meat-based' (OR 0.37, 95% CI 0.16-0.86, P = 0.022) after adjusting for potential confounding factors, including age, body mass index, smoking, education, physical activity, dyslipidemia, and diabetes. An age-matched sensitivity analysis confirmed this finding. CONCLUSION This study finds that relative to the 'Seafood and Alcohol' pattern, the 'Dairy/vegetable-based' and 'Meat-based' dietary patterns are associated with a lower risk of hypertension among men.
Affiliation(s)
- Longfei Li
- School of Physical Education and Health, Heze University, 2269 University Road, Mudan District, Heze, 274-015, Shandong, China
- Department of Medicine and Science in Sports and Exercise, Tohoku University Graduate School of Medicine, 2-1 Seiryo-Machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
| | - Haruki Momma
- Department of Medicine and Science in Sports and Exercise, Tohoku University Graduate School of Medicine, 2-1 Seiryo-Machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
| | - Haili Chen
- Department of Medicine and Science in Sports and Exercise, Tohoku University Graduate School of Medicine, 2-1 Seiryo-Machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
| | - Saida Salima Nawrin
- Division of Biomedical Engineering for Health & Welfare, Tohoku University Graduate School of Biomedical Engineering, 6-6-12, Aramaki Aza Aoba Aoba-ku, Sendai, Miyagi, 980-8579, Japan
| | - Yidan Xu
- Department of Medicine and Science in Sports and Exercise, Tohoku University Graduate School of Medicine, 2-1 Seiryo-Machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan
| | - Hitoshi Inada
- Department of Developmental Neuroscience, Tohoku University Graduate School of Medicine, 2-1 Seiryo-Machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan.
- Department of Biochemistry and Cellular Biology, National Center of Neurology and Psychiatry, 4-1-1 Ogawa-Higashi, Kodaira, Tokyo, 187-8502, Japan.
| | - Ryoichi Nagatomi
- Department of Medicine and Science in Sports and Exercise, Tohoku University Graduate School of Medicine, 2-1 Seiryo-Machi, Aoba-ku, Sendai, Miyagi, 980-8575, Japan.
- Division of Biomedical Engineering for Health & Welfare, Tohoku University Graduate School of Biomedical Engineering, 6-6-12, Aramaki Aza Aoba Aoba-ku, Sendai, Miyagi, 980-8579, Japan.
|
12
|
Romero-Rosales JA, Aragones DG, Escribano-Serrano J, Borrachero MG, Doña AM, Macías López FJ, Santos Mata MA, Jiménez IN, Casamitjana Zamora MJ, Serrano H, Belmonte-Beitia J, Durán MR, Calvo GF. Integrated modeling of labile and glycated hemoglobin with glucose for enhanced diabetes detection and short-term monitoring. iScience 2024; 27:109369. [PMID: 38500833 PMCID: PMC10946329 DOI: 10.1016/j.isci.2024.109369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 02/16/2024] [Accepted: 02/26/2024] [Indexed: 03/20/2024]
Abstract
Metabolic biomarkers, particularly glycated hemoglobin and fasting plasma glucose, are pivotal in the diagnosis and control of diabetes mellitus. Despite their importance, they exhibit limitations in assessing short-term glucose variations. In this study, we propose labile hemoglobin as an additional biomarker, providing insightful perspectives into these fluctuations. By utilizing datasets from 40,652 retrospective general participants and conducting glucose tolerance tests on 60 prospective pediatric subjects, we explored the relationship between plasma glucose and labile hemoglobin. A mathematical model was developed to encapsulate short-term glucose kinetics in the pediatric group. Applying dimensionality reduction techniques, we successfully identified participant subclusters, facilitating the differentiation between diabetic and non-diabetic individuals. Intriguingly, by integrating labile hemoglobin measurements with plasma glucose values, we were able to predict the likelihood of diabetes in pediatric subjects, underscoring the potential of labile hemoglobin as a significant glycemic biomarker for diabetes research.
Affiliation(s)
- José Antonio Romero-Rosales
- Department of Mathematics, Mathematical Oncology Laboratory (MOLAB), University of Castilla-La Mancha, Ciudad Real, Spain
| | - David G. Aragones
- Department of Mathematics, Mathematical Oncology Laboratory (MOLAB), University of Castilla-La Mancha, Ciudad Real, Spain
| | | | | | - Alfredo Michán Doña
- UGC Internal Medicine, University Hospital of Jerez and Department of Medicine, University of Cádiz, Cádiz, Spain
- Biomedical Research and Innovation Institute of Cadiz (INiBICA), Hospital Universitario Puerta del Mar, Cádiz, Spain
| | | | | | | | | | - Hélia Serrano
- Department of Mathematics, Faculty of Chemical Sciences and Technologies, University of Castilla-La Mancha, Ciudad Real, Spain
| | - Juan Belmonte-Beitia
- Department of Mathematics, Mathematical Oncology Laboratory (MOLAB), University of Castilla-La Mancha, Ciudad Real, Spain
| | - María Rosa Durán
- Biomedical Research and Innovation Institute of Cadiz (INiBICA), Hospital Universitario Puerta del Mar, Cádiz, Spain
- Department of Mathematics, University of Cádiz, Puerto Real, Cádiz, Spain
| | - Gabriel F. Calvo
- Department of Mathematics, Mathematical Oncology Laboratory (MOLAB), University of Castilla-La Mancha, Ciudad Real, Spain
|
13
|
Zhou M, Li Y. Spatial distribution and source identification of potentially toxic elements in Yellow River Delta soils, China: An interpretable machine-learning approach. Sci Total Environ 2024; 912:169092. [PMID: 38056655 DOI: 10.1016/j.scitotenv.2023.169092] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/29/2023] [Revised: 11/15/2023] [Accepted: 12/02/2023] [Indexed: 12/08/2023]
Abstract
Identifying the driving factors and quantifying the sources of potentially toxic elements (PTEs) are essential for protecting the ecological environment of the Yellow River Delta. In this study, data from 201 surface soil samples and 16 environmental variables were collected, and the random forest (RF) and Shapley additive explanations (SHAP) methods were then combined to explore the key factors affecting soil PTEs. An innovative t-distributed stochastic neighbor embedding-RF-SHAP model was then constructed, based on the absolute principal component score and multivariate linear regression model, to quantitatively determine PTE sources. Although average PTE concentrations did not exceed the risk control values, PTE distributions exhibited significant differences. Sodium, soil organic matter, and phosphorus contents were the three most important factors affecting PTEs, and human activities and natural environmental factors both influenced PTE contents by altering the soil properties. The proposed model successfully determined PTE sources in the soil, outperforming the original linear regression model with a significantly lower RMSE. Source analysis revealed that the parent material was the main contributor to soil PTEs, accounting for more than half of the total PTE content. Industrial and agricultural activities also contributed to an increase in soil PTEs, with average contributions of 19.91% and 17.44%, respectively. Unknown sources accounted for 10.83% of the total PTE content. Thus, the proposed model provides innovative perspectives on source parsing. These findings provide valuable scientific insights for policymakers seeking to develop effective environmental protection measures and improve the quality of saline-alkali land in the Yellow River Delta.
Affiliation(s)
- Mengge Zhou
- Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Yonghua Li
- Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China.
|
14
|
Yoshida M, Murakami T, Kawai K, Nishikawa K, Ishihara K, Mori Y, Tsujikawa A. Inference of Capillary Nonperfusion Progression on Widefield OCT Angiography in Diabetic Retinopathy. Invest Ophthalmol Vis Sci 2023; 64:24. [PMID: 37847225 PMCID: PMC10584022 DOI: 10.1167/iovs.64.13.24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/12/2023] [Accepted: 09/26/2023] [Indexed: 10/18/2023]
Abstract
Purpose The purpose of this study was to explore the spatial patterns of the nonperfusion areas (NPAs) on widefield optical coherence tomography angiography (OCTA) images in diabetic retinopathy (DR) and to investigate their associations with NPA progression and DR severity. Methods We prospectively enrolled 201 eyes from 158 patients with DR. Widefield images were obtained using a swept-source OCTA device (Xephilio OCT-S1), followed by the creation of 20-mm (1614 pixels) en face images. Nonperfusion squares (NPSs) were defined as 10 × 10-pixel squares without retinal vessels. Eyes with high-dimensional spatial data were mapped onto a two-dimensional space using the uniform manifold approximation and projection algorithm and divided by clustering. The patterns of NPA distribution were statistically compared between clusters. Results All eyes were mapped onto a two-dimensional space and divided into six clusters based on the similarity of NPA distribution. Eyes in clusters 1 and 2 had minimal and small NPAs, respectively. Eyes in clusters 3 and 4 exhibited NPAs in the temporal and inferotemporal regions, respectively. Eyes in cluster 5 displayed NPAs in both superonasal and inferonasal areas. The unique NPA distributions in each cluster encouraged us to propose eight possible pathways of NPA progression. DR severity was not equal between clusters (P < 0.001), for example, 8 (15.7%) of 51 eyes and 15 (65.2%) of 23 eyes had PDR in clusters 1 and 5, respectively. Conclusions Dimensionality reduction and subsequent clustering based on the NPA distribution on widefield OCTA enabled the inference of possible NPA progression in DR.
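The nonperfusion-square definition in this abstract (10 × 10-pixel squares containing no retinal vessels) can be made concrete with a small sketch. It assumes a binary vessel mask (1 = vessel pixel) and a non-overlapping tiling that drops any partial border squares; the abstract does not specify either detail, and the function name and toy mask are hypothetical.

```python
def nonperfusion_squares(mask, size=10):
    """Tile a binary vessel mask into size x size squares and flag each
    square that contains no vessel pixel. Partial squares at the right
    and bottom borders are ignored (an assumption of this sketch)."""
    rows, cols = len(mask), len(mask[0])
    grid = []
    for top in range(0, rows - size + 1, size):
        grid_row = []
        for left in range(0, cols - size + 1, size):
            has_vessel = any(
                mask[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size)
            )
            grid_row.append(not has_vessel)
        grid.append(grid_row)
    return grid


# Toy 20 x 20 mask with two vessel pixels, invented for illustration.
mask = [[0] * 20 for _ in range(20)]
mask[3][4] = 1    # falls in the top-left square
mask[15][12] = 1  # falls in the bottom-right square
nps = nonperfusion_squares(mask)
print(nps)  # [[False, True], [True, False]]
```

Flattening such a grid gives one binary feature vector per eye, which is the kind of high-dimensional spatial input that a projection method followed by clustering could then operate on.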
Affiliation(s)
- Miyo Yoshida
- Department of Ophthalmology and Visual Sciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Tomoaki Murakami
- Department of Ophthalmology and Visual Sciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Kentaro Kawai
- Department of Ophthalmology and Visual Sciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Keiichi Nishikawa
- Department of Ophthalmology and Visual Sciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Kenji Ishihara
- Department of Ophthalmology and Visual Sciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Yuki Mori
- Department of Ophthalmology and Visual Sciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
| | - Akitaka Tsujikawa
- Department of Ophthalmology and Visual Sciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
|
15
|
Liu P, Wang Z, Liu N, Peres MA. A scoping review of the clinical application of machine learning in data-driven population segmentation analysis. J Am Med Inform Assoc 2023; 30:1573-1582. [PMID: 37369006 PMCID: PMC10436153 DOI: 10.1093/jamia/ocad111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 06/08/2023] [Accepted: 06/16/2023] [Indexed: 06/29/2023]
Abstract
OBJECTIVE Data-driven population segmentation is commonly used in clinical settings to separate a heterogeneous population into multiple relatively homogeneous groups with similar healthcare features. In recent years, machine learning (ML) based segmentation algorithms have garnered interest for their potential to speed up and improve algorithm development across many phenotypes and healthcare situations. This study evaluates ML-based segmentation with respect to (1) the populations applied, (2) the segmentation details, and (3) the outcome evaluations. MATERIALS AND METHODS MEDLINE, Embase, Web of Science, and Scopus were searched following the PRISMA-ScR criteria. Peer-reviewed studies in the English language that used data-driven population segmentation analysis on structured data from January 2000 to October 2022 were included. RESULTS We identified 6077 articles and included 79 for the final analysis. Data-driven population segmentation analysis was employed in various clinical settings. K-means clustering was the most prevalent unsupervised ML paradigm. The most common settings were healthcare institutions. The most common targeted population was the general population. DISCUSSION Although all the studies did internal validation, only 11 papers (13.9%) did external validation, and 23 papers (29.1%) conducted a methods comparison. The existing papers rarely discussed validating the robustness of ML modeling. CONCLUSION Existing ML applications in population segmentation need more evaluation of whether they deliver tailored, efficient, integrated healthcare solutions compared with traditional segmentation analysis. Future ML applications in the field should emphasize methods comparison and external validation and investigate approaches to evaluate individual consistency using different methods.
Affiliation(s)
- Pinyan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
| | - Ziwen Wang
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
| | - Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- Institute of Data Science, National University of Singapore, Singapore, Singapore
| | - Marco Aurélio Peres
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore, Singapore
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore, Singapore
|
16
|
Deniz-Garcia A, Fabelo H, Rodriguez-Almeida AJ, Zamora-Zamorano G, Castro-Fernandez M, Alberiche Ruano MDP, Solvoll T, Granja C, Schopf TR, Callico GM, Soguero-Ruiz C, Wägner AM. Quality, Usability, and Effectiveness of mHealth Apps and the Role of Artificial Intelligence: Current Scenario and Challenges. J Med Internet Res 2023; 25:e44030. [PMID: 37140973 PMCID: PMC10196903 DOI: 10.2196/44030] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 11/09/2022] [Revised: 02/19/2023] [Accepted: 03/10/2023] [Indexed: 03/12/2023]
Abstract
The use of artificial intelligence (AI) and big data in medicine has increased in recent years. Indeed, the use of AI in mobile health (mHealth) apps could considerably assist both individuals and health care professionals in the prevention and management of chronic diseases, in a person-centered manner. Nonetheless, there are several challenges that must be overcome to provide high-quality, usable, and effective mHealth apps. Here, we review the rationale and guidelines for the implementation of mHealth apps and the challenges regarding quality, usability, and user engagement and behavior change, with a special focus on the prevention and management of noncommunicable diseases. We suggest that a cocreation-based framework is the best method to address these challenges. Finally, we describe the current and future roles of AI in improving personalized medicine and provide recommendations for developing AI-based mHealth apps. We conclude that the implementation of AI and mHealth apps for routine clinical practice and remote health care will not be feasible until we overcome the main challenges regarding data privacy and security, quality assessment, and the reproducibility and uncertainty of AI results. Moreover, there is a lack of both standardized methods to measure the clinical outcomes of mHealth apps and techniques to encourage user engagement and behavior changes in the long term. We expect that in the near future, these obstacles will be overcome and that the ongoing European project, Watching the risk factors (WARIFA), will provide considerable advances in the implementation of AI-based mHealth apps for disease prevention and health promotion.
Affiliation(s)
- Alejandro Deniz-Garcia
- Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain
| | - Himar Fabelo
- Complejo Hospitalario Universitario Insular - Materno Infantil, Fundación Canaria Instituto de Investigación Sanitaria de Canarias, Las Palmas de Gran Canaria, Spain
- Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
| | - Antonio J Rodriguez-Almeida
- Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
| | - Garlene Zamora-Zamorano
- Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain
- Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
| | - Maria Castro-Fernandez
- Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
| | - Maria Del Pino Alberiche Ruano
- Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain
- Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
| | - Terje Solvoll
- Norwegian Centre for E-health Research, University Hospital of North-Norway, Tromsø, Norway
- Faculty of Nursing and Health Sciences, Nord University, Bodø, Norway
| | - Conceição Granja
- Norwegian Centre for E-health Research, University Hospital of North-Norway, Tromsø, Norway
- Faculty of Nursing and Health Sciences, Nord University, Bodø, Norway
| | - Thomas Roger Schopf
- Norwegian Centre for E-health Research, University Hospital of North-Norway, Tromsø, Norway
| | - Gustavo M Callico
- Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
| | - Cristina Soguero-Ruiz
- Departamento de Teoría de la Señal y Comunicaciones y Sistemas Telemáticos y Computación, Universidad Rey Juan Carlos, Madrid, Spain
| | - Ana M Wägner
- Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain
- Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
|
17
|
Liu H, Feng C, Yang T, Zhang Z, Wei X, Sun Y, Zhang L, Li W, Yu D. Combined metabolomics and gut microbiome to investigate the effects and mechanisms of Yuquan Pill on type 2 diabetes in rats. J Chromatogr B Analyt Technol Biomed Life Sci 2023; 1222:123713. [PMID: 37059008 DOI: 10.1016/j.jchromb.2023.123713] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/07/2022] [Revised: 03/16/2023] [Accepted: 04/03/2023] [Indexed: 04/09/2023]
Abstract
Yuquan Pill (YQP) is a traditional Chinese medicine (TCM) that has been used for the treatment of type 2 diabetes (T2DM) in China for many years and has a beneficial clinical effect. In this study, the antidiabetic mechanism of YQP was investigated for the first time from the perspective of metabolomics and intestinal microbiota. After 28 days of high-fat feeding, rats were injected intraperitoneally with streptozotocin (STZ, 35 mg/kg) followed by a single oral administration of YQP 2.16 g/kg and metformin 200 mg/kg for 5 weeks. The results showed that YQP effectively improved insulin resistance and alleviated hyperglycemia and hyperlipidemia associated with T2DM. YQP was found to regulate metabolism and gut microbiota in T2DM rats using untargeted metabolomics and gut microbiota integration. Forty-one metabolites and five metabolic pathways were identified, including ascorbate and aldarate metabolism, nicotinate and nicotinamide metabolism, galactose metabolism, the pentose phosphate pathway, and tyrosine metabolism. YQP can regulate T2DM-induced dysbacteriosis by modulating the abundance of Firmicutes, Bacteroidetes, Ruminococcus, and Lactobacillus. The restorative effects of YQP in rats with T2DM have been confirmed and provide a scientific basis for the clinical treatment of diabetic patients.
|
18
|
Hahn W, Schütte K, Schultz K, Wolkenhauer O, Sedlmayr M, Schuler U, Eichler M, Bej S, Wolfien M. Contribution of Synthetic Data Generation towards an Improved Patient Stratification in Palliative Care. J Pers Med 2022; 12:1278. [PMID: 36013227 PMCID: PMC9409663 DOI: 10.3390/jpm12081278] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/04/2022] [Revised: 07/29/2022] [Accepted: 08/01/2022] [Indexed: 11/23/2022]
Abstract
AI model development for synthetic data generation to improve Machine Learning (ML) methodologies is an integral part of research in Computer Science and is currently being transferred to related medical fields, such as Systems Medicine and Medical Informatics. In general, the idea of personalized decision-making support based on patient data has driven the motivation of researchers in the medical domain for more than a decade, but the overall sparsity and scarcity of data are still major limitations. This is in contrast to currently applied technology that allows us to generate and analyze patient data in diverse forms, such as tabular data on health records, medical images, genomics data, or even audio and video. One solution arising to overcome these data limitations in relation to medical records is the synthetic generation of tabular data based on real world data. Consequently, ML-assisted decision-support can be interpreted more conveniently, using more relevant patient data at hand. At a methodological level, several state-of-the-art ML algorithms generate and derive decisions from such data. However, there remain key issues that hinder a broad practical implementation in real-life clinical settings. In this review, we will give for the first time insights towards current perspectives and potential impacts of using synthetic data generation in palliative care screening because it is a challenging prime example of highly individualized, sparsely available patient information. Taken together, the reader will obtain initial starting points and suitable solutions relevant for generating and using synthetic data for ML-based screenings in palliative care and beyond.
Affiliation(s)
- Waldemar Hahn
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
| | - Katharina Schütte
- University Palliative Center, University Hospital Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
| | - Kristian Schultz
- Department of Systems Biology and Bioinformatics, University of Rostock, Universitätsplatz 1, 18051 Rostock, Germany
| | - Olaf Wolkenhauer
- Department of Systems Biology and Bioinformatics, University of Rostock, Universitätsplatz 1, 18051 Rostock, Germany
- Leibniz-Institute for Food Systems Biology, Technical University Munich, 85354 Freising, Germany
- Stellenbosch Institute of Advanced Study, Wallenberg Research Centre, Stellenbosch University, Stellenbosch 7602, South Africa
| | - Martin Sedlmayr
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
| | - Ulrich Schuler
- University Palliative Center, University Hospital Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
| | - Martin Eichler
- National Center for Tumor Diseases Dresden (NCT/UCC), Fetscherstraße 74, 01307 Dresden, Germany
- German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
- Faculty of Medicine, University Hospital Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
- Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Bautzner Landstraße 400, 01328 Dresden, Germany
| | - Saptarshi Bej
- Department of Systems Biology and Bioinformatics, University of Rostock, Universitätsplatz 1, 18051 Rostock, Germany
- Leibniz-Institute for Food Systems Biology, Technical University Munich, 85354 Freising, Germany
| | - Markus Wolfien
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, 01307 Dresden, Germany
|