1. Priyadharshini M, Murugesh V, Samkumar GV, Chowdhury S, Panigrahi A, Pati A, Sahu B. A population based optimization of convolutional neural networks for chronic kidney disease prediction. Sci Rep 2025; 15:14500. [PMID: 40281257 PMCID: PMC12032355 DOI: 10.1038/s41598-025-99270-8] [Received: 12/07/2024] [Accepted: 04/18/2025] [Indexed: 04/29/2025]
Abstract
Chronic kidney disease (CKD) is a global public health concern, and timely detection of the disease is invaluable. Most classical machine learning models have major drawbacks: they lack sophistication, robustness, and accuracy. This work therefore introduces OptiNet-CKD, a deep neural network (DNN) framework integrated with a population optimization algorithm (POA) to optimize CKD prediction. Unlike gradient-based optimization methods, POA initializes a population of networks and perturbs their weights, enabling broader exploration of the solution space. Because this approach helps avoid the local minima that trap gradient-based optimizers, the model is more robust, less likely to overfit, and likely to produce more accurate predictions. A CKD dataset of 400 records with numerical and categorical features was preprocessed for DNN training by imputing missing values and scaling features. The model was evaluated using accuracy, precision, recall, F1-score, and ROC AUC. OptiNet-CKD achieved 100% accuracy, 1.0 precision, 1.0 recall, 1.0 F1-score, and 1.0 ROC AUC, outperforming traditional models (logistic regression, decision trees) and even basic deep neural networks. The results show that OptiNet-CKD is a reliable and robust prediction method for CKD, with stronger generalization and performance than existing methods. Combining DNN and POA is a promising approach for medical data analysis, especially for CKD diagnosis. POA widens the explored solution space, helping keep the model from falling into local minima and improving its ability to generalize on complex medical data.
Given the simplicity of the algorithm, its structured formulation, and the feature extraction performed during preprocessing, this framework can be extended to other medical conditions with similar data complexity, providing a potent tool for improving diagnostic accuracy in healthcare.
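The abstract describes POA only at a high level: a population of networks whose weights are perturbed instead of being updated by gradients. As a rough illustration, the sketch below runs a generic elitist population search on a single logistic neuron. It is not the OptiNet-CKD algorithm. The population size, perturbation scale `sigma`, elite count, generation count, and the toy dataset are all assumptions chosen for demonstration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, x):
    # Single logistic neuron: weights[:-1] are feature weights, weights[-1] is the bias.
    z = sum(w * xi for w, xi in zip(weights[:-1], x)) + weights[-1]
    return sigmoid(z)


def loss(weights, data):
    """Mean binary cross-entropy over (features, label) pairs."""
    total = 0.0
    for x, y in data:
        p = min(max(predict_proba(weights, x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def population_search(data, n_features, pop_size=20, generations=150,
                      sigma=0.3, elite=5, seed=0):
    """Elitist population search: perturb weight vectors instead of following gradients."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_features + 1)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the lowest-loss candidates unchanged (elitism).
        population.sort(key=lambda w: loss(w, data))
        parents = population[:elite]
        # Refill the population with Gaussian perturbations of randomly chosen parents.
        children = [[w + rng.gauss(0.0, sigma) for w in rng.choice(parents)]
                    for _ in range(pop_size - elite)]
        population = parents + children
    return min(population, key=lambda w: loss(w, data))


# Hypothetical toy data: the label is 1 when x0 + x1 > 0.
data_rng = random.Random(1)
data = []
for _ in range(100):
    x = [data_rng.uniform(-2, 2), data_rng.uniform(-2, 2)]
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

best = population_search(data, n_features=2)
accuracy = sum((predict_proba(best, x) >= 0.5) == (y == 1) for x, y in data) / len(data)
print(f"loss={loss(best, data):.3f} accuracy={accuracy:.2f}")
```

Because the elite is carried over unchanged, the best loss never increases between generations. The trade-off is that this loop needs many more loss evaluations than a gradient step would.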
Affiliation(s)
- M Priyadharshini
  - Department of Computer Science and Engineering, Faculty of Science and Technology (IcfaiTech), The ICFAI Foundation for Higher Education, Hyderabad, Telangana, 501203, India
- V Murugesh
  - Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
- G V Samkumar
  - Department of CSE, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India
- Subrata Chowdhury
  - Department of Computer Science and Engineering, Sreenivasa Institute of Technology and Management Studies, Chittoor, Andhra Pradesh, India
- Amrutanshu Panigrahi
  - Department of CSE, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
- Abhilash Pati
  - Department of CSE, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
- Bibhuprasad Sahu
  - Department of Information Technology, Vardhaman College of Engineering (Autonomous), Hyderabad, Telangana, India

2. Oneto L, Chicco D. Eight quick tips for biologically and medically informed machine learning. PLoS Comput Biol 2025; 21:e1012711. [PMID: 39787089 PMCID: PMC11717244 DOI: 10.1371/journal.pcbi.1012711] [Indexed: 01/12/2025]
Abstract
Machine learning has become a powerful tool for computational analysis in the biomedical sciences, and integrating domain-specific knowledge significantly enhances its effectiveness. This integration has given rise to informed machine learning, in contrast to studies that lack domain knowledge and treat all variables equally (uninformed machine learning). While applying informed machine learning to bioinformatics and health informatics datasets has become more seamless, the likelihood of errors has also increased. To address this drawback, we present eight guidelines outlining best practices for employing informed machine learning methods in the biomedical sciences. These quick tips offer recommendations on various aspects of informed machine learning analysis, aiming to help researchers generate more robust, explainable, and dependable results. Although we originally crafted these eight simple suggestions for novices, we believe they are relevant to expert computational researchers as well.
Affiliation(s)
- Luca Oneto
  - Dipartimento di Informatica Bioingegneria Robotica e Ingegneria dei Sistemi, Università di Genova, Genoa, Italy
- Davide Chicco
  - Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada

3. Bahrami P, Tanbakuchi D, Afzalaghaee M, Ghayour-Mobarhan M, Esmaily H. Development of risk models for early detection and prediction of chronic kidney disease in clinical settings. Sci Rep 2024; 14:32136. [PMID: 39739001 PMCID: PMC11685774 DOI: 10.1038/s41598-024-83973-5] [Received: 07/01/2024] [Accepted: 12/18/2024] [Indexed: 01/02/2025]
Abstract
Chronic kidney disease (CKD) imposes a high burden, with high mortality and morbidity rates. Early detection of CKD is imperative for preventing the adverse outcomes of its later stages. Therefore, this study applies machine learning techniques to predict CKD at early stages, using data obtained from a large longitudinal cohort study. The features include patients' sociodemographic, anthropometric, and laboratory variables that are most strongly associated with CKD according to national and international studies. Missing data and outliers were removed using listwise deletion and the interquartile range method, respectively. The data were initially left imbalanced to investigate how well the models handle imbalanced datasets. Stratified K-fold cross-validation was then applied to improve the data splitting; it is a robust approach that preserves class proportions in each fold and performs well on imbalanced data. Interestingly, an interaction between age and gender produced contrasting data, so gender-specific algorithms were developed to avoid this interaction. Using the preprocessed data of 6855 participants, four main algorithms and four algorithms using stratified K-fold cross-validation were developed, consisting of gender-specific random forest (RF) and feedforward neural network (NN) models. The RF model for women exhibited the highest AUC (0.90), followed closely by the NN model for women (0.89). Both models constructed for men yielded an AUC of 0.88. Sensitivity scores were higher in men than in women. The models showed subpar specificity. However, their high precision and F1 scores make them highly valuable in clinical settings for accurately identifying CKD cases while minimizing false-positive diagnoses. Moreover, the stratified K-fold cross-validation results indicated that the NN models were more sensitive to the imbalanced dataset. These models showed a marked increase in performance, particularly in specificity, after this approach was applied.
These data offer valuable insights for the development of future risk stratification models for CKD.
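Stratified K-fold splitting, as used above, can be sketched in plain Python as follows. This is a generic illustration, not the study's code. The fold count, seed, and the 20/80 toy label split are assumptions. In practice, scikit-learn's `StratifiedKFold` (in `sklearn.model_selection`) provides this behavior.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve the class proportions of `labels`."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        # Deal each class round-robin across folds; the running offset keeps fold sizes balanced.
        for i, idx in enumerate(idxs):
            folds[(offset + i) % k].append(idx)
        offset += len(idxs)
    return folds


def stratified_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_kfold(labels, k, seed)
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(test)


# Hypothetical imbalanced labels: 20 CKD cases (1) and 80 non-cases (0).
labels = [1] * 20 + [0] * 80
for train, test in stratified_splits(labels, k=5):
    positives = sum(labels[i] for i in test)
    print(len(train), len(test), positives)
```

With 20 positives among 100 samples, each test fold receives 4 positives and 16 negatives. The 20% case rate therefore holds in every fold, which a plain random split does not guarantee.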
Affiliation(s)
- Pegah Bahrami
  - School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Davoud Tanbakuchi
  - School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Monavar Afzalaghaee
  - Department of Statistics and Epidemiology, Faculty of Health, Mashhad University of Medical Sciences, Mashhad, Iran
  - Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Majid Ghayour-Mobarhan
  - International UNESCO Center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
- Habibollah Esmaily
  - Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
  - Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran

4. Abhadiomhen SE, Nzeakor EO, Oyibo K. Health Risk Assessment Using Machine Learning: Systematic Review. Electronics 2024; 13:4405. [DOI: 10.3390/electronics13224405] [Indexed: 12/04/2024]
Abstract
According to the World Health Organization, chronic illnesses account for over 70% of deaths globally, underscoring the need for effective health risk assessment (HRA). While machine learning (ML) has shown potential for improving HRA, no systematic review has explored its application in general health risk assessments; existing reviews typically focus on specific conditions. This paper reviews published articles that use ML for HRA and aims to identify their model development methods. A systematic review following Tranfield et al.'s three-stage approach was conducted in adherence to the PRISMA protocol. The literature was sourced from five databases, including PubMed. Of the included articles, 42% (11/26) addressed general health risks. Secondary data sources were most common (14/26, 53.85%), while primary data were used in eleven studies, with nine (81.82%) using data from a specific population. Random forest was the most popular algorithm, used in nine studies (34.62%). Notably, twelve studies implemented multiple algorithms, while seven studies incorporated model interpretability techniques. Although these studies have shown promise in addressing digital health inequities, more research is needed to include diverse sample populations, particularly from underserved communities, to enhance the generalizability of existing models. Furthermore, model interpretability should be prioritized to ensure transparent, trustworthy, and broadly applicable healthcare solutions.
Affiliation(s)
- Stanley Ebhohimhen Abhadiomhen
  - Department of Electrical Engineering and Computer Science, York University, 4700 Keele Street, Toronto, ON M3J 1P3, Canada
  - Department of Computer Science, University of Nigeria, Nsukka 400241, Nigeria
- Emmanuel Onyekachukwu Nzeakor
  - Department of Electrical Engineering and Computer Science, York University, 4700 Keele Street, Toronto, ON M3J 1P3, Canada
- Kiemute Oyibo
  - Department of Electrical Engineering and Computer Science, York University, 4700 Keele Street, Toronto, ON M3J 1P3, Canada

5. Chang CY, Peng CH, Chen FY, Huang LY, Kuo CH, Chu TW, Liang YJ. The risk factors determined by four machine learning methods for the change of difference of bone mineral density in post-menopausal women after three years follow-up. Sci Rep 2024; 14:23234. [PMID: 39369003 PMCID: PMC11455928 DOI: 10.1038/s41598-024-73799-6] [Received: 09/01/2023] [Accepted: 09/20/2024] [Indexed: 10/07/2024]
Abstract
The prevalence of osteoporosis has increased drastically in recent years. It is not only the most frequent bone disease but also a major global public health problem due to its high morbidity. Many risk factors associated with osteoporosis have been identified. However, most studies have used traditional multiple linear regression (MLR) to explore these relationships. Recently, machine learning (Mach-L) has become a new modality for data analysis. It enables machines to learn from past data or experience without being explicitly programmed, and it can better capture nonlinear relationships. These methods have the potential to outperform conventional MLR in disease prediction. In the present study, we enrolled a Chinese postmenopausal cohort followed up for 4 years. The difference in T-score (δ-T score) was the dependent variable, and demographic, biochemical, and lifestyle information served as the independent variables. Our goals were: (1) to compare the prediction accuracy of Mach-L and traditional MLR for δ-T score; and (2) to rank the importance of the risk factors (independent variables) for predicting δ-T score. In total, 1698 postmenopausal women were enrolled from the MJ Health Database. Four Mach-L methods were used to construct models predicting δ-BMD after four years of follow-up: random forest (RF), eXtreme Gradient Boosting (XGBoost), Naïve Bayes (NB), and stochastic gradient boosting (SGB). The dataset was randomly divided into an 80% training set for model building and a 20% testing set for model testing. Ten-fold cross-validation was used for hyperparameter tuning. For each Mach-L method, the model with the lowest root mean square error (RMSE) on the validation data was selected as the best model. The averaged metrics of the RF, SGB, NB, and XGBoost models were compared with those of the benchmark MLR model, which used the same training and testing datasets.
Within each model, risk factors were ranked from 1 (most critical) to 22 (last selected). In Pearson correlation analysis, age, education, BMI, HDL-C, and TSH were positively correlated with δ-T score, while plasma calcium level and baseline T-score were negatively correlated with it. All four Mach-L methods yielded lower prediction errors than MLR and were all convincing models. Our results suggest that education level is the most important factor for δ-T score, followed by DBP, smoking, SBP, UA, age, and LDL-C. Using Mach-L, the six most important risk factors were selected; from most to least important, they were DBP, SBP, UA, education level, TG, and sleeping hours. In postmenopausal Chinese women, δ-T score was positively related to SBP, education level, UA, and TG, and negatively related to DBP and sleeping hours.
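The split and model-selection rule described above can be illustrated with a minimal sketch: an 80/20 split, then keeping the candidate with the lowest RMSE. The model names and numeric values are hypothetical. The 10-fold cross-validation loop used for tuning is omitted for brevity.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def train_test_split(n, test_frac=0.2, seed=0):
    """Randomly split indices 0..n-1 into training (80% by default) and testing sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * (1 - test_frac))
    return idx[:cut], idx[cut:]


def select_best(predictions, y_true):
    """Return (model name, RMSE) for the candidate with the lowest RMSE."""
    scores = {name: rmse(y_true, preds) for name, preds in predictions.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical validation targets and predictions from two candidate models.
y_val = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [1.5, 2.5, 2.5, 4.5],
}
print(select_best(candidates, y_val))
```

The same `rmse` function can score the MLR benchmark on the held-out 20%, so that every method is compared on identical data.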
Affiliation(s)
- Ching-Yao Chang
  - Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
- Chung-Hsin Peng
  - Department of Urology, Cardinal Tien Hospital, School of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan, ROC
- Fang-Yu Chen
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei City, Taiwan, ROC
- Li-Ying Huang
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
- Chun-Heng Kuo
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
- Ta-Wei Chu
  - Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Chief Executive Officer's Office, MJ Health Research Foundation, Taipei, 114, Taiwan
- Yao-Jen Liang
  - Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
  - Department and Institute of Life Science, Fu Jen Catholic University, New Taipei City, Taiwan, ROC

6. Alhuwaydi AM. Exploring the Role of Artificial Intelligence in Mental Healthcare: Current Trends and Future Directions - A Narrative Review for a Comprehensive Insight. Risk Manag Healthc Policy 2024; 17:1339-1348. [PMID: 38799612 PMCID: PMC11127648 DOI: 10.2147/rmhp.s461562] [Received: 01/29/2024] [Accepted: 05/10/2024] [Indexed: 05/29/2024]
Abstract
Mental health is an essential component of the health and well-being of individuals and communities. It is critical for the individual, for society, and for the socio-economic development of any country. Mental healthcare is currently in an era of health sector transformation, with emerging technologies such as artificial intelligence (AI) reshaping the screening, diagnosis, and treatment of psychiatric illnesses. This narrative review discusses the current landscape and role of AI in mental healthcare, including screening, diagnosis, and treatment. It also highlights the key challenges, limitations, and prospects of AI in providing mental healthcare, based on the existing literature. Literature was searched in PubMed, the Saudi Digital Library (SDL), Google Scholar, Web of Science, and IEEE Xplore, and only English-language articles published in the last five years were included. The following keywords were used in combination with the Boolean operators "AND" and "OR": "Artificial intelligence", "Machine learning", "Deep learning", "Early diagnosis", "Treatment", "interventions", "ethical consideration", and "mental Healthcare". Our review revealed that, with its predictive analytics capabilities, AI can improve treatment planning by predicting an individual's response to various interventions. Predictive analytics uses historical data to formulate preventive interventions, which aligns with the move toward individualized and preventive mental healthcare. In screening and diagnosis, subsets of AI such as machine learning and deep learning have been shown to analyze various mental health datasets and predict the patterns associated with various mental health problems.
However, few studies have evaluated collaboration between healthcare professionals and AI in delivering mental healthcare, even though these sensitive problems require empathy, human connection, and holistic, personalized, and multidisciplinary approaches. Ethical issues, cybersecurity, a lack of diversity in data analytics, cultural sensitivity, and language barriers remain concerns for implementing this futuristic approach in mental healthcare. Given these requirements, it is imperative to explore these aspects. Therefore, future comparative trials with larger sample sizes and datasets are warranted to evaluate different AI models used in mental healthcare across regions and to fill the existing knowledge gaps.
Affiliation(s)
- Ahmed M Alhuwaydi
  - Department of Internal Medicine, Division of Psychiatry, College of Medicine, Jouf University, Sakaka, Saudi Arabia

7. Chen MS, Liu TC, Jhou MJ, Yang CT, Lu CJ. Analyzing Longitudinal Health Screening Data with Feature Ensemble and Machine Learning Techniques: Investigating Diagnostic Risk Factors of Metabolic Syndrome for Chronic Kidney Disease Stages 3a to 3b. Diagnostics (Basel) 2024; 14:825. [PMID: 38667472 PMCID: PMC11048899 DOI: 10.3390/diagnostics14080825] [Received: 02/27/2024] [Revised: 04/12/2024] [Accepted: 04/13/2024] [Indexed: 04/28/2024]
Abstract
Longitudinal data, while often limited, contain valuable insights into features affecting clinical outcomes. Data are scarce for predicting chronic kidney disease (CKD) progression in patients with metabolic syndrome, particularly those transitioning from stage 3a to 3b. In this setting, feature ensemble techniques can be advantageous because they can effectively identify crucial risk factors influencing CKD progression, thereby enhancing model performance. Machine learning (ML) methods have gained popularity because they can perform feature selection and handle complex feature interactions more effectively than traditional approaches. However, different ML methods yield different feature importance information. This study proposes a multiphase hybrid risk factor evaluation scheme that considers the diverse feature information generated by ML methods. The scheme incorporates variable ensemble rules (VERs) to combine feature importance information, aiding the identification of important features influencing CKD progression and supporting clinical decision making. The proposed scheme employs six ML models (Lasso, RF, MARS, LightGBM, XGBoost, and CatBoost), each known for its distinct feature selection mechanism and widespread use in clinical studies. Applying the scheme identified thirteen features affecting CKD progression, and a model constructed with them achieved a promising AUC of 0.883.
Affiliation(s)
- Ming-Shu Chen
  - Department of Healthcare Administration, College of Healthcare & Management, Asia Eastern University of Science and Technology, New Taipei City 220, Taiwan
- Tzu-Chi Liu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Mao-Jhen Jhou
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Chih-Te Yang
  - Department of Business Administration, Tamkang University, New Taipei City 251, Taiwan
- Chi-Jie Lu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan
  - Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242, Taiwan
  - Department of Information Management, Fu Jen Catholic University, New Taipei City 242, Taiwan

8. Saito H, Yoshimura H, Tanaka K, Kimura H, Watanabe K, Tsubokura M, Ejiri H, Zhao T, Ozaki A, Kazama S, Shimabukuro M, Asahi K, Watanabe T, Kazama JJ. Predicting CKD progression using time-series clustering and light gradient boosting machines. Sci Rep 2024; 14:1723. [PMID: 38242985 PMCID: PMC10798962 DOI: 10.1038/s41598-024-52251-9] [Received: 07/06/2023] [Accepted: 01/16/2024] [Indexed: 01/21/2024]
Abstract
Predicting changes in kidney function in chronic kidney disease is difficult because specific symptoms are lacking and often overlooked, and progression is driven by complicating factors. In this study, we applied time-series cluster analysis and a light gradient boosting machine to predict kidney function trajectories. The study population was non-dialysis-dependent chronic kidney disease patients with baseline estimated glomerular filtration rate (GFR) ≥ 45 mL/min/1.73 m². Cluster analysis based on 5-year changes in estimated GFR stratified participants into groups with similar trajectories. Next, we applied the light gradient boosting machine algorithm and Shapley additive explanations (SHAP) to develop a prediction model for the clusters and identify important predictive parameters. Data from 780 participants were available for analysis. Participants were classified into five classes (Class 1: n = 78, mean [± standard deviation] estimated GFR 100 ± 19.3 mL/min/1.73 m²; Class 2: n = 176, 76.0 ± 9.3 mL/min/1.73 m²; Class 3: n = 191, 59.8 ± 5.9 mL/min/1.73 m²; Class 4: n = 261, 52.7 ± 4.6 mL/min/1.73 m²; and Class 5: n = 74, 53.5 ± 12.0 mL/min/1.73 m²). Over the 5-year period, estimated GFR declined by 8.9% in Class 1, 12.2% in Class 2, 4.9% in Class 3, 12.0% in Class 4, and 45.1% in Class 5. The prediction accuracy was 0.675, and the three highest SHAP values were 1.61 for baseline estimated GFR, 0.12 for hemoglobin, and 0.11 for body mass index. In patients with chronic kidney disease and preserved kidney function, the estimated GFR transition depended mostly on baseline estimated GFR, and the borderline for estimated GFR trajectories was approximately 50 mL/min/1.73 m².
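The abstract does not state which time-series clustering algorithm or distance was used. Purely as an assumption, the sketch below clusters equal-length annual eGFR trajectories with plain k-means (Euclidean distance). It then reports each cluster's relative decline from the first to the last visit. The trajectories are hypothetical, and the LightGBM and SHAP steps are not shown.

```python
import math
import random


def percent_decline(trajectory):
    """Relative eGFR decline from the first to the last visit, in percent."""
    return 100.0 * (trajectory[0] - trajectory[-1]) / trajectory[0]


def kmeans(series, k, iters=50, seed=0):
    """Plain k-means on equal-length trajectories with Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(series, k)]
    labels = [0] * len(series)
    for _ in range(iters):
        # Assignment step: each trajectory joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(s, centroids[c])) for s in series]
        # Update step: each centroid becomes the mean trajectory of its members.
        for c in range(k):
            members = [s for s, lab in zip(series, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical annual eGFR values (mL/min/1.73 m²) over five years for four patients.
series = [
    [90, 89, 88, 88, 87],
    [92, 91, 90, 89, 89],
    [60, 55, 50, 45, 40],
    [58, 52, 47, 41, 36],
]
labels, centroids = kmeans(series, k=2)
for c, centroid in enumerate(centroids):
    print(c, [round(v, 1) for v in centroid], f"{percent_decline(centroid):.1f}% decline")
```

Plain k-means requires every trajectory to have the same number of visits. Real cohort data with irregular visits would need resampling or a trajectory-specific clustering method.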
Affiliation(s)
- Hirotaka Saito
  - Department of Nephrology and Hypertension, Fukushima Medical University, 1 Hikariga-Oka, Fukushima City, Fukushima, 960-1295, Japan
- Hiroki Yoshimura
  - Department of Radiation Health Management, Fukushima Medical University, Fukushima, Japan
- Kenichi Tanaka
  - Department of Nephrology and Hypertension, Fukushima Medical University, 1 Hikariga-Oka, Fukushima City, Fukushima, 960-1295, Japan
  - Division of Advanced Community Based Care for Lifestyle Related Diseases, Fukushima Medical University, Fukushima, Japan
- Hiroshi Kimura
  - Department of Nephrology and Hypertension, Fukushima Medical University, 1 Hikariga-Oka, Fukushima City, Fukushima, 960-1295, Japan
  - Division of Advanced Community Based Care for Lifestyle Related Diseases, Fukushima Medical University, Fukushima, Japan
- Kimio Watanabe
  - Department of Nephrology and Hypertension, Fukushima Medical University, 1 Hikariga-Oka, Fukushima City, Fukushima, 960-1295, Japan
- Masaharu Tsubokura
  - Department of Radiation Health Management, Fukushima Medical University, Fukushima, Japan
- Hiroki Ejiri
  - Department of Nephrology and Hypertension, Fukushima Medical University, 1 Hikariga-Oka, Fukushima City, Fukushima, 960-1295, Japan
- Tianchen Zhao
  - Department of Radiation Health Management, Fukushima Medical University, Fukushima, Japan
- Akihiko Ozaki
  - Department of Thyroid and Endocrinology, Fukushima Medical University, Fukushima, Japan
- Sakumi Kazama
  - Division of Advanced Community Based Care for Lifestyle Related Diseases, Fukushima Medical University, Fukushima, Japan
- Michio Shimabukuro
  - Division of Advanced Community Based Care for Lifestyle Related Diseases, Fukushima Medical University, Fukushima, Japan
  - Department of Diabetes, Endocrinology, and Metabolism, Fukushima Medical University, Fukushima, Japan
- Koichi Asahi
  - Division of Advanced Community Based Care for Lifestyle Related Diseases, Fukushima Medical University, Fukushima, Japan
  - Division of Nephrology and Hypertension, Iwate Medical University, Yahaba, Japan
- Tsuyoshi Watanabe
  - Division of Advanced Community Based Care for Lifestyle Related Diseases, Fukushima Medical University, Fukushima, Japan
- Junichiro J Kazama
  - Department of Nephrology and Hypertension, Fukushima Medical University, 1 Hikariga-Oka, Fukushima City, Fukushima, 960-1295, Japan
  - Division of Advanced Community Based Care for Lifestyle Related Diseases, Fukushima Medical University, Fukushima, Japan

9. Wang CK, Chang CY, Chu TW, Liang YJ. Using Machine Learning to Identify the Relationships between Demographic, Biochemical, and Lifestyle Parameters and Plasma Vitamin D Concentration in Healthy Premenopausal Chinese Women. Life (Basel) 2023; 13:2257. [PMID: 38137858 PMCID: PMC10744461 DOI: 10.3390/life13122257] [Received: 10/23/2023] [Revised: 11/15/2023] [Accepted: 11/22/2023] [Indexed: 12/24/2023]
Abstract
INTRODUCTION Vitamin D plays a vital role in maintaining homeostasis and enhancing the absorption of calcium, an essential component for strengthening bones and preventing osteoporosis. Many factors are known to relate to plasma vitamin D concentration (PVDC); however, most of these studies were performed with traditional statistical methods. Machine learning methods (Mach-L) have recently become new tools in medical research. In the present study, we used four Mach-L methods to explore the relationships between PVDC and demographic, biochemical, and lifestyle factors in a group of healthy premenopausal Chinese women. Our goals were: (1) to evaluate and compare the predictive accuracy of Mach-L and multiple linear regression (MLR), and (2) to establish a hierarchy of the importance of these factors in relation to PVDC. METHODS Five hundred ninety-three healthy Chinese women were enrolled. In total, 35 variables were recorded, including demographic, biochemical, and lifestyle information. The dependent variable was 25-OH vitamin D (PVDC), and all other variables were independent variables. MLR served as the benchmark for comparison. Four Mach-L methods were applied: random forest (RF), stochastic gradient boosting (SGB), extreme gradient boosting (XGBoost), and elastic net. Each method produced several estimation error metrics, with smaller errors indicating a better model. RESULTS In Pearson's correlation analysis, age, glycated hemoglobin, HDL-cholesterol, LDL-cholesterol, and hemoglobin were positively correlated with PVDC, whereas eGFR was negatively correlated with PVDC. The Mach-L methods yielded smaller estimation errors for all five error parameters, indicating that they performed better than the MLR model. Averaging the importance percentages from the four Mach-L methods yielded an importance ranking. Age was the most important factor, followed by plasma insulin level, TSH, spouse status, LDH, and ALP.
CONCLUSIONS In a healthy Chinese premenopausal cohort, four different Mach-L methods identified age as the most important factor related to PVDC, followed by plasma insulin level, TSH, spouse status, LDH, and ALP.
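The averaging step described above can be sketched as follows. Each model's importances are converted to percentages, and the percentages are then averaged across models to obtain a ranking. The function names, the treatment of unscored features as 0%, and the example scores are assumptions for illustration, not the study's data.

```python
def to_percent(scores):
    """Rescale one model's raw importance scores so they sum to 100."""
    total = sum(scores.values())
    return {feature: 100.0 * value / total for feature, value in scores.items()}


def rank_by_mean_importance(importances):
    """Average per-model importance percentages and rank features, highest first.

    Features that a model did not score are treated as 0% for that model (an assumption).
    """
    percents = [to_percent(scores) for scores in importances.values()]
    features = set().union(*percents)
    means = {f: sum(p.get(f, 0.0) for p in percents) / len(percents) for f in features}
    # Sort by descending mean percentage; break ties alphabetically.
    return sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical raw importance scores from two models (not the study's values).
importances = {
    "RF": {"age": 3.0, "insulin": 2.0, "TSH": 1.0},
    "SGB": {"age": 0.5, "insulin": 0.3, "TSH": 0.2},
}
for feature, mean_pct in rank_by_mean_importance(importances):
    print(f"{feature}: {mean_pct:.1f}%")
```

Converting to percentages first matters because different algorithms report importance on different scales. Averaging raw scores would let the model with the largest scale dominate the ranking.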
Affiliation(s)
- Chun-Kai Wang
  - Department of Obstetrics and Gynecology, Zuoying Branch of Kaohsiung Armed Forces General Hospital, Kaohsiung 813, Taiwan
- Ching-Yao Chang
  - Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Ta-Wei Chu
  - Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Chief Executive Officer’s Office, MJ Health Research Foundation, Taipei 114, Taiwan
- Yao-Jen Liang
  - Graduate Institute of Applied Science and Engineering, Fu Jen Catholic University, New Taipei City 242, Taiwan

10. Yang CC, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Hsia TL, Lin CY. Comparison between multiple logistic regression and machine learning methods in prediction of abnormal thallium scans in type 2 diabetes. World J Clin Cases 2023; 11:7951-7964. [DOI: 10.12998/wjcc.v11.i33.7951] [Received: 08/14/2023] [Revised: 10/23/2023] [Accepted: 11/13/2023] [Indexed: 11/24/2023]
Abstract
BACKGROUND The prevalence of type 2 diabetes (T2D) has increased dramatically in recent decades, and 47.5% of patients with T2D die of cardiovascular disease. The thallium-201 myocardial perfusion scan (MPS) is a precise and non-invasive method for detecting coronary artery disease (CAD). Most previous studies used traditional logistic regression (LGR) to evaluate risk factors for CAD. Rapidly developing machine learning (Mach-L) techniques could potentially outperform LGR in capturing non-linear relationships.
AIM The aims were to: (1) compare the accuracy of Mach-L methods and LGR; and (2) identify the most important factors for an abnormal thallium MPS (TMPS).
METHODS A total of 556 patients with T2D were enrolled (287 men and 269 women). Demographic and biochemistry data were used as independent variables, and the summed stress score derived from the MPS was the dependent variable. Subjects with an MPS score ≥ 9 were defined as abnormal. In addition to traditional LGR, classification and regression tree (CART), random forest, Naïve Bayes, and eXtreme gradient boosting were applied. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve were used to evaluate the respective accuracy of LGR and the Mach-L methods.
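The labelling rule and the AUC criterion described in METHODS can be sketched as follows. This is a minimal illustration, not the authors' code: the cut-off of 9 comes from the abstract, while the function names and toy scores are hypothetical. AUC is computed in its Mann-Whitney form, i.e. the probability that a randomly chosen abnormal case receives a higher predicted risk than a randomly chosen normal one.

```python
def label_abnormal(summed_stress_score: float, threshold: float = 9.0) -> int:
    """Return 1 (abnormal MPS) if the score meets the cut-off of >= 9, else 0."""
    return 1 if summed_stress_score >= threshold else 0


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative
    (Mann-Whitney U formulation); tied scores count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: observed summed stress scores and a model's predicted risks.
observed_scores = [2, 12, 9, 4, 15, 0]
predicted_risk = [0.10, 0.80, 0.40, 0.45, 0.90, 0.05]
y = [label_abnormal(s) for s in observed_scores]  # [0, 1, 1, 0, 1, 0]
print(y, round(roc_auc(y, predicted_risk), 3))
```

The pairwise loop costs O(n*m) comparisons, which is acceptable for a cohort of a few hundred patients; larger datasets would usually switch to a rank-based formula.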
RESULTS Except for CART, the other Mach-L methods outperformed LGR, with gender, body mass index, age, low-density lipoprotein cholesterol, glycated hemoglobin and smoking emerging as the most important factors to predict abnormal MPS.
CONCLUSION Four Mach-L methods are found to outperform LGR in predicting abnormal TMPS in Chinese T2D, with the most important risk factors being gender, body mass index, age, low-density lipoprotein cholesterol, glycated hemoglobin and smoking.
Affiliation(s)
- Chung-Chi Yang
- Division of Cardiovascular Medicine, Taoyuan Armed Forces General Hospital, Taoyuan City 32551, Taiwan
- Division of Cardiovascular, Tri-service General Hospital, Taipei City 114202, Taiwan
- Chung-Hsin Peng
- Department of Urology, Cardinal Tien Hospital, New Taipei City 23148, Taiwan
- School of Medicine, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Li-Ying Huang
- Department of Internal Medicine, Department of Medical Education, School of Medicine, Fu Jen Catholic University Hospital, New Taipei City 243, Taiwan
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 243, Taiwan
- Fang Yu Chen
- Department of Endocrinology, Fu Jen Catholic University Hospital, New Taipei City 243, Taiwan
- Chun-Heng Kuo
- School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 243, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei City 243, Taiwan
- Chung-Ze Wu
- Division of Endocrinology, Shuang Ho Hospital, New Taipei City 23561, Taiwan
- School of Medicine, Taipei Medical University, Taipei City 11031, Taiwan
- Te-Lin Hsia
- Department of Internal Medicine, Cardinal Tien Hospital, New Taipei City 23148, Taiwan
- Chung-Yu Lin
- Department of Cardiology, Fu Jen Catholic University Hospital, New Taipei City 24352, Taiwan
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
11
Tzou SJ, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Chu TW. Comparison between linear regression and four different machine learning methods in selecting risk factors for osteoporosis in a Chinese female aged cohort. J Chin Med Assoc 2023; 86:1028-1036. [PMID: 37729604 DOI: 10.1097/jcma.0000000000000999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
BACKGROUND Population aging is an increasingly acute challenge for countries around the world, and one manifestation of it is the impact of osteoporosis on individuals and national health systems. Previous studies of risk factors for osteoporosis used traditional statistical methods, while more recent efforts have turned to machine learning. Most such efforts, however, treat the target variable (bone mineral density [BMD] or fracture rate) as categorical, which discards quantitative information. The present study uses five machine learning methods to analyze the risk factors for the BMD T-score, seeking to (1) compare the prediction accuracy of the machine learning methods with that of traditional multiple linear regression (MLR) and (2) rank the importance of 25 risk factors.
METHODS The study sample included 24,412 women older than 55 years with 25 related variables. Traditional MLR and five machine learning methods were applied: classification and regression tree, Naïve Bayes, random forest, stochastic gradient boosting, and eXtreme gradient boosting. Model performance was compared using the symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error.
RESULTS The machine learning approaches outperformed MLR on all four error metrics. The average importance ranking generated by the machine learning methods indicates that age is the most important factor determining the T-score, followed by estimated glomerular filtration rate (eGFR), body mass index (BMI), uric acid (UA), and education level.
CONCLUSION In women older than 55 years, machine learning methods provided superior performance in estimating the T-score, with age the most important factor, followed by eGFR, BMI, UA, and education level.
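The "average importance ranking" across several models can be illustrated as follows. This is a sketch under stated assumptions: each model is assumed to output one importance score per variable, each model's scores are converted to ranks (1 = most important), and ranks are averaged with equal weight. The model names and numbers are hypothetical and are not the study's results.

```python
from statistics import mean


def rank_by_importance(importances: dict[str, float]) -> dict[str, int]:
    """Convert one model's importance scores into ranks (1 = most important)."""
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def average_rank(per_model: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Average each variable's rank across models; a lower mean rank means more important.
    Assumes every model scores the same set of variables."""
    ranks = [rank_by_importance(scores) for scores in per_model.values()]
    variables = ranks[0].keys()
    averaged = {v: mean(r[v] for r in ranks) for v in variables}
    return sorted(averaged.items(), key=lambda item: item[1])


# Hypothetical importance scores from three fitted models.
per_model = {
    "random_forest": {"age": 0.40, "eGFR": 0.25, "BMI": 0.20, "UA": 0.15},
    "xgboost": {"age": 0.35, "eGFR": 0.25, "BMI": 0.30, "UA": 0.10},
    "cart": {"age": 0.50, "eGFR": 0.30, "BMI": 0.10, "UA": 0.05},
}
print([name for name, _ in average_rank(per_model)])
```

Averaging ranks rather than raw scores puts models with differently scaled importance measures (for example, impurity decrease versus gain) on a common footing.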
Affiliation(s)
- Shiow-Jyu Tzou
- Teaching and Researching Center, Kaohsiung Armed Forces General Hospital, Kaohsiung, Taiwan, ROC
- Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung, Taiwan, ROC
- Chung-Hsin Peng
- Department of Urology, Cardinal Tien Hospital, School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan, ROC
- Li-Ying Huang
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
- Fang-Yu Chen
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
- Chun-Heng Kuo
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, New Taipei, Taiwan
- Chung-Ze Wu
- Department of Internal Medicine, Shuang Ho Hospital, New Taipei City, Division of Endocrinology and Metabolism, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan, ROC
- Ta-Wei Chu
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan, ROC
- MJ Health Research Foundation, Taipei, Taiwan, ROC
12
Chen CH, Wang CK, Wang CY, Chang CF, Chu TW. Roles of biochemistry data, lifestyle, and inflammation in identifying abnormal renal function in old Chinese. World J Clin Cases 2023; 11:7004-7016. [PMID: 37946770 PMCID: PMC10631406 DOI: 10.12998/wjcc.v11.i29.7004] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/05/2023] [Revised: 08/01/2023] [Accepted: 09/11/2023] [Indexed: 10/13/2023]
Abstract
BACKGROUND The incidence of chronic kidney disease (CKD) has increased dramatically in recent years, with significant impacts on patient mortality. Previous studies have identified multiple risk factors for CKD, but they mostly relied on traditional statistical methods such as logistic regression and focused on only a few risk factors.
AIM To determine factors that can identify subjects with a low estimated glomerular filtration rate (L-eGFR, eGFR < 60 mL/min per 1.73 m2) in a cohort of 1236 Chinese people aged over 65.
METHODS Twenty risk factors were divided into three models. Model 1 consisted of demographic and biochemistry data, Model 2 added lifestyle data to Model 1, and Model 3 added inflammatory markers to Model 2. Five machine learning methods were used: multivariate adaptive regression splines, eXtreme Gradient Boosting, stochastic gradient boosting, Light Gradient Boosting Machine, and Categorical Features + Gradient Boosting. Evaluation criteria included accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), F1 score, and balanced accuracy.
RESULTS The AUC of each method increased from Model 1 to Model 3, and this trend was statistically significant. Model 3 selected uric acid (UA) as the most important risk factor, followed by age, hemoglobin (Hb), body mass index (BMI), sport hours, and systolic blood pressure (SBP).
CONCLUSION Among all the risk factors, including demographic, biochemistry, and lifestyle factors and inflammation markers, UA is the most important for identifying L-eGFR, followed by age, Hb, BMI, sport hours, and SBP, in a cohort of elderly Chinese people.
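Apart from AUC, the evaluation criteria listed in METHODS all derive from the binary confusion matrix. A minimal sketch follows; the eGFR < 60 labelling rule is taken from the abstract, while the function names, eGFR values, and predictions are hypothetical.

```python
def is_low_egfr(egfr: float) -> int:
    """1 if eGFR < 60 mL/min per 1.73 m^2 (the abstract's L-eGFR definition), else 0."""
    return 1 if egfr < 60 else 0


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Confusion-matrix metrics for a binary classifier (1 = L-eGFR)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on L-eGFR cases
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        # Balanced accuracy averages sensitivity and specificity, so a model cannot
        # score well simply by predicting the majority (normal eGFR) class.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


egfr_values = [45.0, 72.0, 58.5, 90.0, 61.0, 30.0]
y_true = [is_low_egfr(v) for v in egfr_values]  # [1, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1]                     # hypothetical model predictions
print(classification_metrics(y_true, y_pred))
```

Balanced accuracy matters here because low eGFR is typically the minority class, so plain accuracy can look good even when sensitivity is poor.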
Affiliation(s)
- Chao-Hung Chen
- Division of Urology, Department of Surgery, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan
- Division of Urology, Department of Surgery, Chang Gung Memorial Hospital, Keelung 204, Taiwan
- Chun-Kai Wang
- Department of Obstetrics and Gynecology, Zuoying Branch of Kaohsiung Armed Forces General Hospital, Kaohsiung 813, Taiwan
- Chen-Yu Wang
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Chun-Feng Chang
- Division of Urology, Department of Surgery, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan
- Division of Urology, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Ta-Wei Chu
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- Chief Executive Officer's Office, MJ Health Research Foundation, Taipei 114, Taiwan
13
Tsai MH, Jhou MJ, Liu TC, Fang YW, Lu CJ. An integrated machine learning predictive scheme for longitudinal laboratory data to evaluate the factors determining renal function changes in patients with different chronic kidney disease stages. Front Med (Lausanne) 2023; 10:1155426. [PMID: 37859858 PMCID: PMC10582636 DOI: 10.3389/fmed.2023.1155426] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/03/2023] [Accepted: 09/19/2023] [Indexed: 10/21/2023]
Abstract
Background and objectives Chronic kidney disease (CKD) is a global health concern. This study aims to identify key factors associated with renal function changes by applying the proposed machine learning and important variable selection (ML&IVS) scheme to longitudinal laboratory data, with the goal of predicting changes in the estimated glomerular filtration rate (eGFR) in a cohort of patients with CKD stages 3-5.
Design A retrospective cohort study.
Setting and participants A total of 710 outpatients with stable nondialysis-dependent CKD stages 3-5 at the Shin-Kong Wu Ho-Su Memorial Hospital Medical Center from 2016 to 2021.
Methods This study analyzed trimonthly laboratory data covering 47 indicators. The proposed scheme used stochastic gradient boosting, multivariate adaptive regression splines, random forest, eXtreme gradient boosting, and light gradient boosting machine algorithms to evaluate the important factors for predicting the result of the fourth eGFR examination, especially in patients with CKD stage 3 and those with CKD stages 4-5, with or without diabetes mellitus (DM).
Main outcome measurement Subsequent eGFR level after three consecutive laboratory data assessments.
Results Our ML&IVS scheme demonstrated superior predictive capabilities and identified significant factors contributing to renal function changes in various CKD groups. The latest levels of eGFR, blood urea nitrogen (BUN), proteinuria, sodium, and systolic blood pressure, as well as the mean levels of eGFR, BUN, proteinuria, and triglyceride, were among the top 10 important factors for predicting the subsequent eGFR level in patients with CKD stages 3-5. In individuals with DM, the latest levels of BUN and proteinuria, the mean levels of phosphate and proteinuria, and variations in diastolic blood pressure emerged as important factors for predicting the decline of renal function. In individuals without DM, all phosphate patterns and the latest albumin levels were key factors in the advanced CKD group. Moreover, proteinuria was identified as an important factor in the CKD stage 3 group without DM and the CKD stages 4-5 group with DM.
Conclusion The proposed scheme highlighted factors associated with renal function changes in different CKD conditions, offering physicians valuable insights for raising awareness of renal function changes.
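The scheme describes each laboratory indicator by its "latest", "mean", and "variation" values over three consecutive assessments, with the fourth eGFR as the prediction target. A minimal sketch of that feature construction follows. Using the sample standard deviation as the variation measure is an assumption (the abstract does not specify it), and the helper names and patient values are hypothetical.

```python
from statistics import mean, stdev


def summarize_series(values: list[float]) -> dict[str, float]:
    """Summarize one indicator measured at consecutive visits (oldest first)."""
    if len(values) < 2:
        raise ValueError("need at least two measurements to estimate variation")
    return {
        "latest": values[-1],
        "mean": mean(values),
        # Assumption: variation measured as the sample standard deviation.
        "variation": stdev(values),
    }


def build_features(history: dict[str, list[float]]) -> dict[str, float]:
    """Flatten per-indicator summaries into one feature row, e.g. 'eGFR_latest'."""
    row: dict[str, float] = {}
    for indicator, values in history.items():
        for stat, value in summarize_series(values).items():
            row[f"{indicator}_{stat}"] = value
    return row


# Hypothetical patient: three quarterly results; the target would be the 4th eGFR.
patient = {"eGFR": [42.0, 39.0, 36.0], "BUN": [28.0, 31.0, 34.0]}
features = build_features(patient)
print(features["eGFR_latest"], features["eGFR_mean"], features["eGFR_variation"])
```

Keeping the three summaries as separate features lets a tree-based model decide whether the current level, the average level, or the instability of an indicator carries the predictive signal.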
Affiliation(s)
- Ming-Hsien Tsai
- Division of Nephrology, Department of Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan
- Department of Medicine, School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
- Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Tzu-Chi Liu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Yu-Wei Fang
- Division of Nephrology, Department of Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan
- Department of Medicine, School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City, Taiwan
14
Huang HH, Hsieh SJ, Chen MS, Jhou MJ, Liu TC, Shen HL, Yang CT, Hung CC, Yu YY, Lu CJ. Machine Learning Predictive Models for Evaluating Risk Factors Affecting Sperm Count: Predictions Based on Health Screening Indicators. J Clin Med 2023; 12:1220. [PMID: 36769868 PMCID: PMC9917545 DOI: 10.3390/jcm12031220] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/20/2022] [Revised: 01/13/2023] [Accepted: 02/01/2023] [Indexed: 02/05/2023]
Abstract
In many countries, especially developed nations, the fertility rate and birth rate have continually declined. Taiwan's fertility rate has followed this trend and reached its nadir in 2022. The government has therefore adopted many strategies to encourage married couples to have more children. However, couples who marry at an older age may have declining physical status, hypertension and other metabolic syndrome symptoms, or excess body weight, factors that have been studied for their influence on male and female gamete quality. Many previous studies were based on infertile people and are not truly representative of the general population. This study proposed a framework using five machine learning (ML) predictive algorithms (random forest, stochastic gradient boosting, least absolute shrinkage and selection operator regression, ridge regression, and extreme gradient boosting) to identify the major risk factors affecting male sperm count, based on a major health screening database in Taiwan. Unlike traditional multiple linear regression, ML algorithms do not require its statistical assumptions and can capture non-linear relationships or complex interactions between dependent and independent variables, yielding promising performance. We analyzed annual health screening data of 1375 males from 2010 to 2017, including health screening indicators, sourced from the MJ Group, a major health screening center in Taiwan. The symmetric mean absolute percentage error, relative absolute error, root relative squared error, and root mean squared error were used as performance evaluation metrics. Our results show that sleep time (ST), alpha-fetoprotein (AFP), body fat (BF), systolic blood pressure (SBP), and blood urea nitrogen (BUN) are the top five risk factors associated with sperm count. ST is a known risk factor influencing reproductive hormone balance, which can affect spermatogenesis and final sperm count.
BF and SBP are risk factors associated with metabolic syndrome, another known risk factor for altered male reproductive hormone systems. However, AFP has not been the focus of previous studies on male fertility or semen quality. BUN, an index of kidney function, was also identified as a risk factor by our ML model. Our results support previous findings that metabolic syndrome has negative impacts on sperm count and semen quality. Sleep duration also affects sperm generation in the testes. AFP and BUN are two novel risk factors linked to sperm counts. These findings could help healthcare personnel and lawmakers develop strategies that create environments to increase the country's fertility rate. This study should also be of value to follow-up research.
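The four regression error metrics named in this abstract can be computed as below. This is a sketch using common textbook definitions, not the authors' code. SMAPE has several variants in the literature; the one used here (mean of |y - ŷ| divided by (|y| + |ŷ|)/2) is an assumption. RAE and RRSE compare the model with a naive predictor that always outputs the mean of the actual values. The data are hypothetical.

```python
from math import sqrt


def regression_errors(actual: list[float], predicted: list[float]) -> dict[str, float]:
    """SMAPE, RAE, RRSE and RMSE for paired actual/predicted values."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    y_bar = sum(actual) / n
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    sq_err = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    # Errors of the naive "always predict the mean" baseline.
    baseline_abs = sum(abs(a - y_bar) for a in actual)
    baseline_sq = sum((a - y_bar) ** 2 for a in actual)
    if baseline_abs == 0:
        raise ValueError("RAE and RRSE are undefined when all actual values are equal")
    # SMAPE (one common variant): absolute error relative to the mean magnitude.
    smape = sum(
        e / ((abs(a) + abs(p)) / 2) if (abs(a) + abs(p)) else 0.0
        for e, a, p in zip(abs_err, actual, predicted)
    ) / n
    return {
        "SMAPE": smape,
        "RAE": sum(abs_err) / baseline_abs,
        "RRSE": sqrt(sum(sq_err) / baseline_sq),
        "RMSE": sqrt(sum(sq_err) / n),
    }


# Hypothetical observed values and model predictions.
actual = [40.0, 60.0, 80.0, 100.0]
predicted = [50.0, 60.0, 70.0, 100.0]
print(regression_errors(actual, predicted))
```

RMSE is in the units of the target, whereas SMAPE, RAE, and RRSE are scale-free, which is why studies often report both kinds: an RAE or RRSE below 1 means the model beats the mean-only baseline.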
Affiliation(s)
- Hung-Hsiang Huang
- Division of Urology, Department of Surgery, Far Eastern Memorial Hospital, New Taipei City 220, Taiwan
- Shang-Ju Hsieh
- Division of Urology, Department of Surgery, Far Eastern Memorial Hospital, New Taipei City 220, Taiwan
- Ming-Shu Chen
- Department of Healthcare Administration, College of Healthcare & Management, Asia Eastern University of Science and Technology, New Taipei City 220, Taiwan
- Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Tzu-Chi Liu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Hsiang-Li Shen
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Chih-Te Yang
- Department of Business Administration, Tamkang University, New Taipei City 251, Taiwan
- Chung-Chih Hung
- Department of Laboratory Medicine, Taipei Hospital, Ministry of Health and Welfare, New Taipei City 242, Taiwan
- Ya-Yen Yu
- Department of Medical Laboratory, Chang-Hua Hospital, Ministry of Health and Welfare, Chang Hua County 513, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City 242, Taiwan
15
Huang YC, Cheng YC, Jhou MJ, Chen M, Lu CJ. Integrated Machine Learning Decision Tree Model for Risk Evaluation in Patients with Non-Valvular Atrial Fibrillation When Taking Different Doses of Dabigatran. Int J Environ Res Public Health 2023; 20:2359. [PMID: 36767726 PMCID: PMC9915180 DOI: 10.3390/ijerph20032359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 01/24/2023] [Accepted: 01/24/2023] [Indexed: 06/18/2023]
Abstract
The new generation of non-vitamin K antagonist oral anticoagulants is broadly applied for stroke prevention owing to its notable efficacy and safety. Our study aimed to develop suggestive rules for dabigatran use through an integrated machine learning (ML) decision-tree model. Participants taking different doses of dabigatran in the Randomized Evaluation of Long-Term Anticoagulant Therapy trial were included in our analysis and defined as the 110 mg and 150 mg groups. The proposed scheme integrated four ML methods, namely naive Bayes, random forest (RF), classification and regression tree (CART), and extreme gradient boosting (XGBoost), to identify the essential variables for predicting vascular events in the 110 mg group and bleeding in the 150 mg group. RF (0.764 for 110 mg; 0.747 for 150 mg) and XGBoost (0.708 for 110 mg; 0.761 for 150 mg) had better area under the receiver operating characteristic curve (AUC) values than logistic regression (benchmark model; 0.683 for 110 mg; 0.739 for 150 mg). We then selected the top ten important variables as internal nodes of the CART decision tree. The two best CART models with ten important variables output tree-shaped rules for predicting vascular events in the 110 mg group and bleeding in the 150 mg group. Our model can provide more visual and interpretable suggestive rules to clinicians managing patients with non-valvular atrial fibrillation (NVAF) who are taking dabigatran.
Affiliation(s)
- Yung-Chuan Huang
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Yu-Chen Cheng
- Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Mingchih Chen
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City 242062, Taiwan
16
A Hybrid Risk Factor Evaluation Scheme for Metabolic Syndrome and Stage 3 Chronic Kidney Disease Based on Multiple Machine Learning Techniques. Healthcare (Basel) 2022; 10:healthcare10122496. [PMID: 36554020 PMCID: PMC9778302 DOI: 10.3390/healthcare10122496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 11/28/2022] [Accepted: 12/08/2022] [Indexed: 12/14/2022]
Abstract
With the rapid development of medicine and technology, machine learning (ML) techniques are extensively applied in medical informatics and the suboptimal health field to identify critical predictor variables and risk factors. Metabolic syndrome (MetS) and chronic kidney disease (CKD) are important risk factors for many comorbidities and complications. Existing studies that use statistical or ML algorithms to analyze CKD data mostly examine early-stage subjects directly, and few have discussed predictive models and important risk factors for the high-risk health screening population with stage 3 CKD. The middle stages of CKD, 3a and 3b, indicate moderately reduced renal function. This study aims to construct an effective hybrid risk factor evaluation scheme, based on ML predictive models, for subjects with MetS and stage 3 CKD. Six well-known ML techniques were used in the proposed scheme: random forest (RF), logistic regression (LGR), multivariate adaptive regression splines (MARS), extreme gradient boosting (XGBoost), gradient boosting with categorical features support (CatBoost), and light gradient boosting machine (LightGBM). The data were sourced from the Taiwan health examination indicators and the questionnaire responses of 71,108 members between 2005 and 2017. In total, 375 patients with stage 3a CKD and 50 patients with stage 3b CKD were enrolled, and 33 variables were used to evaluate potential risk factors. The top five important variables, namely blood urea nitrogen (BUN), systolic blood pressure (SBP), right intraocular pressure (R-IOP), red blood cells (RBCs), and the total cholesterol/HDL cholesterol ratio (T-Cho/HDL-C, C/H), were identified as significant variables for evaluating subjects with MetS and CKD stage 3a or 3b.
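Stages 3a and 3b are conventionally defined by the KDIGO eGFR categories G3a (45-59) and G3b (30-44 mL/min per 1.73 m^2). The abstract does not state how stages were assigned, so the following is a sketch assuming the standard KDIGO GFR categories; the function names are hypothetical. A clinical CKD diagnosis also requires the abnormality to persist for more than 3 months, which this single-value helper ignores.

```python
def kdigo_gfr_category(egfr: float) -> str:
    """Map an eGFR (mL/min per 1.73 m^2) to its KDIGO GFR category (G1-G5)."""
    if egfr < 0:
        raise ValueError("eGFR cannot be negative")
    # Lower bounds of each category, checked from the highest band down.
    bands = [(90, "G1"), (60, "G2"), (45, "G3a"), (30, "G3b"), (15, "G4")]
    for lower_bound, category in bands:
        if egfr >= lower_bound:
            return category
    return "G5"


def stage3_subgroups(egfr_values: list[float]) -> dict[str, list[float]]:
    """Split measurements into the stage 3a / 3b groups studied in the abstract."""
    groups: dict[str, list[float]] = {"G3a": [], "G3b": []}
    for value in egfr_values:
        category = kdigo_gfr_category(value)
        if category in groups:
            groups[category].append(value)
    return groups


print(stage3_subgroups([95.0, 58.0, 44.9, 30.0, 20.0]))
```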
17
Kitchen C, Chang HY, Weiner JP, Kharrazi H. Assessing the Added Value of Vital Signs Extracted from Electronic Health Records in Healthcare Risk Adjustment Models. Risk Manag Healthc Policy 2022; 15:1671-1682. [PMID: 36092549 PMCID: PMC9462838 DOI: 10.2147/rmhp.s356080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2022] [Accepted: 03/26/2022] [Indexed: 11/24/2022]
Abstract
Purpose Patient vital signs are related to specific health risks and outcomes but are underutilized in predicting health-care utilization and cost. This study aimed to measure the added value of body mass index (BMI) and blood pressure (BP) values extracted from the electronic health record (EHR) in improving healthcare risk and utilization predictions.
Patients and Methods A sample of 12,820 adult outpatients from the Johns Hopkins Health System (JHHS) with high data quality and recorded BMI and BP values was identified between 2016 and 2017. We evaluated the added value of BMI and BP in predicting health-care utilization and cost through a retrospective cohort design. BMI, mean arterial pressure (MAP), and systolic and diastolic BPs were summarized as annual aggregated values. Concurrent annual BMI and MAP changes were quantified as the difference between the maximum and minimum recorded values. Model performance was estimated with repeated 10-fold cross-validation and compared with the point estimates of base models built on demographic and coded diagnostic events: (1) patient age and sex; (2) age, sex, and the Charlson weighted index; and (3) age, sex, and the Johns Hopkins ACG system's DxPM risk score.
Results Both categorical BMI and BP were progressively indicative of disease comorbidity but were not uniformly related to health-care utilization or cost. Annual changes in BMI and MAP improved predictions for most concurrent-year outcomes compared with the base models.
Conclusion When a healthcare system lacks relevant diagnostic or risk assessment information for a patient, vital signs may be useful for a simple estimation of disease risk, cost, and utilization.
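The annual vital-sign features can be sketched as follows. Two points are assumptions not stated in the abstract: MAP is computed with the common clinical approximation DBP + (SBP - DBP)/3, and the "annual aggregated value" is taken to be the mean. The max-minus-min definition of annual change comes from the abstract. Function names and readings are hypothetical.

```python
from statistics import mean


def mean_arterial_pressure(systolic: float, diastolic: float) -> float:
    """Assumed approximation: MAP = DBP + (SBP - DBP) / 3, in mmHg."""
    return diastolic + (systolic - diastolic) / 3


def annual_summary(readings: list[tuple[float, float, float]]) -> dict[str, float]:
    """Aggregate one patient-year of (BMI, SBP, DBP) readings.

    The mean is an assumed aggregation; 'change' is max minus min, as in the abstract.
    """
    if not readings:
        raise ValueError("need at least one reading")
    bmis = [bmi for bmi, _, _ in readings]
    maps = [mean_arterial_pressure(sbp, dbp) for _, sbp, dbp in readings]
    return {
        "bmi_mean": mean(bmis),
        "map_mean": mean(maps),
        "bmi_change": max(bmis) - min(bmis),
        "map_change": max(maps) - min(maps),
    }


# Hypothetical visits in one year: (BMI kg/m^2, SBP mmHg, DBP mmHg).
visits = [(31.0, 120.0, 75.0), (32.5, 150.0, 90.0), (30.5, 126.0, 84.0)]
print(annual_summary(visits))
```

Note that max minus min captures the within-year range regardless of order, so it measures instability rather than the direction of change.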
Affiliation(s)
- Christopher Kitchen
- Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Hsien-Yen Chang
- Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Jonathan P Weiner
- Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Hadi Kharrazi
- Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Division of Health Sciences Informatics, Johns Hopkins School of Medicine, Baltimore, MD, USA
18
Integrating Health Data-Driven Machine Learning Algorithms to Evaluate Risk Factors of Early Stage Hypertension at Different Levels of HDL and LDL Cholesterol. Diagnostics (Basel) 2022; 12:diagnostics12081965. [PMID: 36010315 PMCID: PMC9407063 DOI: 10.3390/diagnostics12081965] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/08/2022] [Accepted: 08/11/2022] [Indexed: 11/26/2022]
Abstract
Purpose: Cardiovascular disease (CVD) is a major worldwide health burden. Hypertension and hyperlipidemia are the most frequently mentioned risk factors for CVD, and early stage hypertension in the population with dyslipidemia is an important public health hazard. This study applied data-driven machine learning (ML), which can capture complex relationships between risk factors and outcomes and offers promising predictive performance on large volumes of medical data, to investigate the association between dyslipidemia and the incidence of early stage hypertension in a large cohort with normal blood pressure at baseline. Methods: This study analyzed annual health screening data for 71,108 people from 2005 to 2017, including 27 risk-related indicators, sourced from the MJ Group, a major health screening center in Taiwan. We used five ML methods, namely stochastic gradient boosting (SGB), multivariate adaptive regression splines (MARS), least absolute shrinkage and selection operator regression (Lasso), ridge regression (Ridge), and gradient boosting with categorical features support (CatBoost), to develop a multi-stage ML algorithm-based prediction scheme and then evaluate important risk factors at the early stage of hypertension, especially for groups with high-density lipoprotein cholesterol (HDL-C) and low-density lipoprotein cholesterol (LDL-C) levels within or outside the reference range. Results: Age, body mass index, waist circumference, waist-to-hip ratio, fasting plasma glucose, and C-reactive protein (CRP) were associated with hypertension. The hemoglobin level was also a positive contributor to blood pressure elevation and appeared among the top three important risk factors in all LDL-C/HDL-C groups; these variables may therefore be important in affecting blood pressure in the early stage of hypertension. A residual contribution to blood pressure elevation was found in groups with increased LDL-C.
This suggests that LDL-C levels are associated with CRP levels and that the LDL-C level may be an important factor for predicting the development of hypertension. Conclusion: The five prediction models provided similar classifications of risk factors. The results of this study show that an increase in LDL-C is more important than the onset of a drop in HDL-C in the health screening of sub-healthy adults. The findings should be of value for raising awareness of hypertension and for further discussion and follow-up research.
19
Sun CK, Tang YX, Liu TC, Lu CJ. An Integrated Machine Learning Scheme for Predicting Mammographic Anomalies in High-Risk Individuals Using Questionnaire-Based Predictors. Int J Environ Res Public Health 2022; 19:ijerph19159756. [PMID: 35955112 PMCID: PMC9368335 DOI: 10.3390/ijerph19159756] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2022] [Revised: 08/02/2022] [Accepted: 08/06/2022] [Indexed: 05/09/2023]
Abstract
This study aimed to identify important predictors of positive mammographic findings from questionnaire-based demographic and obstetric/gynecological parameters using the proposed integrated machine learning (ML) scheme. The scheme combines the benefits of two well-known ML algorithms, least absolute shrinkage and selection operator (Lasso) logistic regression and extreme gradient boosting (XGB), to provide adequate prediction of mammographic anomalies in high-risk individuals and to identify significant risk factors. We collected questionnaire data on 18 breast-cancer-related risk factors from women who participated in a national mammographic screening program between January 2017 and December 2020 at a single tertiary referral hospital and correlated them with their mammographic findings. The acquired data were retrospectively analyzed using the proposed integrated ML scheme. Based on data from 21,107 valid questionnaires, the Lasso logistic regression models with variable combinations generated by XGB provided more effective predictions. The top five significant predictors of positive mammography results were younger age, breast self-examination, older age at first childbirth, nulliparity, and history of mammography within 2 years, suggesting a need for timely mammographic screening for women with these risk factors.
Affiliation(s)
- Cheuk-Kay Sun
- Division of Hepatology and Gastroenterology, Department of Internal Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei 11101, Taiwan
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- School of Medicine, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- School of Medicine, Taipei Medical University, Taipei 11031, Taiwan
- Yun-Xuan Tang
- Department of Radiology, Shin Kong Wu Ho-Su Memorial Hospital, Taipei 11101, Taiwan
- Department of Medical Imaging and Radiological Technology, Yuanpei University of Medical Technology, Hsinchu 30015, Taiwan
- Tzu-Chi Liu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City 24205, Taiwan
20
Comparison between Machine Learning and Multiple Linear Regression to Identify Abnormal Thallium Myocardial Perfusion Scan in Chinese Type 2 Diabetes. Diagnostics (Basel) 2022; 12:diagnostics12071619. [PMID: 35885524 PMCID: PMC9324130 DOI: 10.3390/diagnostics12071619] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2022] [Revised: 06/27/2022] [Accepted: 06/30/2022] [Indexed: 11/17/2022]
Abstract
Patients with type 2 diabetes mellitus (T2DM) have a high risk of coronary artery disease (CAD). The thallium-201 myocardial perfusion scan (Th-201 scan) is a non-invasive and widely used tool for recognizing CAD in clinical settings. In this study, we compared the accuracy of traditional multiple linear regression (MLR) with that of four machine learning (ML) methods in identifying abnormal Th-201 scans. This allowed us to determine whether ML surpasses traditional MLR, to rank the clinical variables by importance, and to compare this ranking with previous reports. In total, 796 patients with T2DM, including 368 men and 528 women, were enrolled. In addition to traditional MLR, classification and regression tree (CART), random forest (RF), stochastic gradient boosting (SGB), and eXtreme gradient boosting (XGBoost) were used to analyze abnormal Th-201 scans, with the stress sum score as the endpoint (dependent variable). The root mean square errors of all four ML methods were smaller than that of MLR, implying that ML is more precise than MLR in identifying abnormal Th-201 scans from clinical parameters. The seven most important factors, in descending order of importance, were body mass index, hemoglobin, age, glycated hemoglobin, creatinine, and systolic and diastolic blood pressure. In conclusion, ML is not inferior to traditional MLR in predicting abnormal Th-201 scans, and the most important factors are body mass index, hemoglobin, age, glycated hemoglobin, creatinine, and systolic and diastolic blood pressure. ML methods are superior in these kinds of studies.
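The comparison in this abstract rests on the root mean square error (RMSE) of predicted versus observed stress sum scores. The sketch below is a minimal illustration, not the authors' pipeline: it contrasts MLR fitted by ordinary least squares with a single CART-style regression tree on synthetic data in which the endpoint changes in a step rather than along a straight line. RF, SGB, and XGBoost (ensembles of such trees) are not reproduced, and the function names, tree depth, minimum leaf size, and data are all assumptions.

```python
import math
import random


def solve(M, v):
    """Gauss-Jordan elimination with partial pivoting for M @ x = v."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= f * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_mlr(X, y):
    """Multiple linear regression with an intercept, via the normal equations."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    xtx = [[sum(a[i] * a[k] for a in A) for k in range(p)] for i in range(p)]
    xty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    return solve(xtx, xty)


def predict_mlr(beta, x):
    return beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], x))


def sse(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=2, min_leaf=5):
    """CART-style regression tree: greedy binary splits that minimize the
    summed squared error; leaves hold the mean response. No pruning."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return sum(y) / len(y)
    best = None  # (cost, feature index, threshold)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:
        return sum(y) / len(y)
    _, j, t = best
    mask = [row[j] <= t for row in X]
    left_X = [row for row, m in zip(X, mask) if m]
    left_y = [v for v, m in zip(y, mask) if m]
    right_X = [row for row, m in zip(X, mask) if not m]
    right_y = [v for v, m in zip(y, mask) if not m]
    return (j, t,
            build_tree(left_X, left_y, depth + 1, max_depth, min_leaf),
            build_tree(right_X, right_y, depth + 1, max_depth, min_leaf))


def predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right); leaves are floats.
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


# Synthetic demo: the endpoint jumps at x0 = 0.5, which a straight line cannot follow.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(250)]
y = [(3.0 if x0 > 0.5 else 0.0) + rng.gauss(0, 0.1) for x0, _ in X]
X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]

beta = fit_mlr(X_tr, y_tr)
tree = build_tree(X_tr, y_tr)
rmse_mlr = rmse([predict_mlr(beta, x) for x in X_te], y_te)
rmse_tree = rmse([predict_tree(tree, x) for x in X_te], y_te)
print(f"MLR RMSE {rmse_mlr:.3f}  tree RMSE {rmse_tree:.3f}")
```

On step-shaped data like this the tree's held-out RMSE should come out well below the linear model's, because one split captures the jump that a straight line can only average over; on data with genuinely linear structure the ordering can reverse.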
21
Huang LY, Chen FY, Jhou MJ, Kuo CH, Wu CZ, Lu CH, Chen YL, Pei D, Cheng YF, Lu CJ. Comparing Multiple Linear Regression and Machine Learning in Predicting Diabetic Urine Albumin-Creatinine Ratio in a 4-Year Follow-Up Study. J Clin Med 2022; 11:3661. [PMID: 35806944 PMCID: PMC9267784 DOI: 10.3390/jcm11133661] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/29/2022] [Revised: 06/19/2022] [Accepted: 06/22/2022] [Indexed: 02/07/2023]
Abstract
The urine albumin-creatinine ratio (uACR) is a warning sign of deteriorating renal function in type 2 diabetes (T2D), so the early detection of an abnormal uACR has become an important issue. Multiple linear regression (MLR) has traditionally been used to explore the relationships between risk factors and endpoints, while machine learning (ML) methods have recently been widely applied in medicine. In the present study, four ML methods were used to predict the uACR in a T2D cohort. We hypothesized that (1) ML outperforms traditional MLR and (2) a different ranking of risk-factor importance would be obtained. A total of 1147 patients with T2D were followed up for four years. MLR, classification and regression tree, random forest, stochastic gradient boosting, and eXtreme gradient boosting methods were used. The prediction errors of the ML methods were smaller than those of MLR, indicating that ML is more accurate. The first six most important factors were baseline creatinine level, systolic and diastolic blood pressure, glycated hemoglobin, and fasting plasma glucose. In conclusion, ML might predict uACR in a T2D cohort more accurately than traditional MLR, and in Chinese patients with T2D the baseline creatinine level is the most important predictor, followed by systolic and diastolic blood pressure, glycated hemoglobin, and fasting plasma glucose.
Affiliation(s)
- Li-Ying Huang
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Fang-Yu Chen
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Mao-Jhen Jhou
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Chun-Heng Kuo
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Chung-Ze Wu
- Division of Endocrinology, Department of Internal Medicine, Shuang Ho Hospital, New Taipei City 23561, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan
- Chieh-Hua Lu
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Tri-Service General Hospital, School of Medicine, National Defense Medical Center, Taipei 11490, Taiwan
- Yen-Lin Chen
- Department of Pathology, Tri-Service General Hospital, National Defense Medical Center, Taipei 11490, Taiwan
- Dee Pei
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Department of Medical Education, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Yu-Fang Cheng
- Department of Endocrinology and Metabolism, Changhua Christian Hospital, Changhua 50051, Taiwan
- Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City 242062, Taiwan