1
M'hamdi O, Takács S, Palotás G, Ilahy R, Helyes L, Pék Z. A Comparative Analysis of XGBoost and Neural Network Models for Predicting Some Tomato Fruit Quality Traits from Environmental and Meteorological Data. Plants (Basel) 2024; 13:746. [PMID: 38475592 DOI: 10.3390/plants13050746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 03/01/2024] [Accepted: 03/04/2024] [Indexed: 03/14/2024]
Abstract
The tomato as a raw material for processing is globally important and is pivotal in dietary and agronomic research due to its nutritional, economic, and health significance. This study explored the potential of machine learning (ML) for predicting tomato quality, utilizing data from 48 cultivars and 28 locations in Hungary over 5 seasons. It focused on °Brix, lycopene content, and colour (a/b ratio) using extreme gradient boosting (XGBoost) and artificial neural network (ANN) models. The results revealed that XGBoost consistently outperformed ANN, achieving high accuracy in predicting °Brix (R² = 0.98, RMSE = 0.07) and lycopene content (R² = 0.87, RMSE = 0.61), and excelling in colour prediction (a/b ratio) with an R² of 0.93 and RMSE of 0.03. ANN lagged behind, particularly in colour prediction, showing a negative R² value of -0.35. Shapley additive explanations (SHAP) summary plot analysis indicated that both models are effective in predicting °Brix and lycopene content in tomatoes, highlighting different aspects of the data. SHAP analysis highlighted the models' efficiency (especially in °Brix and lycopene predictions) and underscored the significant influence of cultivar choice and environmental factors like climate and soil. These findings emphasize the importance of selecting and fine-tuning the appropriate ML model for enhancing precision agriculture, underlining XGBoost's superiority in handling complex agronomic data for quality assessment.
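For readers comparing the reported metrics, the sketch below shows how R² and RMSE are conventionally computed, and why R² can fall below zero (as reported for the ANN colour model) when predictions are worse than simply predicting the mean. The a/b values and the function names are hypothetical and are not taken from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical a/b colour ratios (illustration only, not study data).
observed = [0.5, 0.6, 0.7, 0.8]
good = [0.52, 0.58, 0.71, 0.79]   # close to the observations
poor = [0.9, 0.4, 0.9, 0.5]       # worse than predicting the mean

print("good model:", r_squared(observed, good), rmse(observed, good))
print("poor model:", r_squared(observed, poor), rmse(observed, poor))
```

A negative R² therefore does not indicate a calculation error; it indicates that the residual sum of squares exceeds the total sum of squares around the mean.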
Affiliation(s)
- Oussama M'hamdi
  - Institute of Horticultural Sciences, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllö, Hungary
  - Doctoral School of Plant Science, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllö, Hungary
- Sándor Takács
  - Institute of Horticultural Sciences, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllö, Hungary
- Gábor Palotás
  - Univer Product Zrt, Szolnoki út 35, 6000 Kecskemét, Hungary
- Riadh Ilahy
  - Laboratory of Horticulture, National Agricultural Research Institute of Tunisia (INRAT), University of Carthage, Ariana 1004, Tunisia
- Lajos Helyes
  - Institute of Horticultural Sciences, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllö, Hungary
- Zoltán Pék
  - Institute of Horticultural Sciences, Hungarian University of Agriculture and Life Sciences, Páter K. Str. 1, 2100 Gödöllö, Hungary
2
Suryawanshi A, Behera N. Prediction of wear of dental composite materials using machine learning algorithms. Comput Methods Biomech Biomed Engin 2024; 27:400-410. [PMID: 36920276 DOI: 10.1080/10255842.2023.2187671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 02/21/2023] [Accepted: 03/01/2023] [Indexed: 03/16/2023]
Abstract
Dental materials wear down over time and eventually need to be replaced. Resin composites are frequently employed as dental restorative materials. Using the in-vitro test findings of a pin-on-disc tribometer [ASTM G99-04], this study evaluates the capability of three different machine learning (ML) models to analyze the wear of dental composite materials immersed in chewable tobacco solution. Four distinct dental composite material samples are used in this investigation; after being dipped in a chewing tobacco solution for a few days, the samples are taken out and subjected to a wear test. Three different ML models (MLP, KNN, XGBoost) were chosen for predicting the wear of dental composite specimens. The XGBoost model yields an R² value of 0.9996 and performs noticeably better than the other approaches.
3
Pavlov M, Barić D, Novak A, Manola Š, Jurin I. From statistical inference to machine learning: A paradigm shift in contemporary cardiovascular pharmacotherapy. Br J Clin Pharmacol 2024; 90:691-699. [PMID: 37845041 DOI: 10.1111/bcp.15927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 10/02/2023] [Indexed: 10/18/2023] Open
Abstract
AIMS Heart failure with reduced ejection fraction (HFrEF) poses significant challenges for clinicians and researchers, owing to its multifaceted aetiology and complex treatment regimens. In light of this, artificial intelligence methods offer an innovative approach to identifying relationships within complex clinical datasets. Our study aims to explore the potential for machine learning algorithms to provide deeper insights into datasets of HFrEF patients. METHODS To this end, we analysed a cohort of 386 HFrEF patients who had been initiated on sodium-glucose co-transporter-2 inhibitor treatment and had completed a minimum of a 6-month follow-up. RESULTS In traditional frequentist statistical analyses, patients receiving the highest doses of beta-blockers (BBs) (chi-square test, P = .036) and those newly initiated on sacubitril-valsartan (chi-square test, P = .023) showed better outcomes. However, none of these pharmacological features stood out as independent predictors of improved outcomes in the Cox proportional hazards model. In contrast, when employing eXtreme Gradient Boosting (XGBoost) algorithms in conjunction with the data using Shapley additive explanations (SHAP), we identified several models with significant predictive power. The XGBoost algorithm inherently accommodates non-linear distribution, multicollinearity and confounding. Within this framework, pharmacological categories like 'newly initiated treatment with sacubitril/valsartan' and 'BB dose escalation' emerged as strong predictors of long-term outcomes. CONCLUSIONS In this manuscript, we not only emphasize the strengths of this machine learning approach but also discuss its potential limitations and the risk of identifying statistically significant yet clinically irrelevant predictors.
Affiliation(s)
- Marin Pavlov
  - Department of Cardiology, Dubrava University Hospital, Zagreb, Croatia
- Domjan Barić
  - Department of Physics, Faculty of Science, University of Zagreb, Zagreb, Croatia
- Andrej Novak
  - Department of Cardiology, Dubrava University Hospital, Zagreb, Croatia
  - Department of Physics, Faculty of Science, University of Zagreb, Zagreb, Croatia
- Šime Manola
  - Department of Cardiology, Dubrava University Hospital, Zagreb, Croatia
- Ivana Jurin
  - Department of Cardiology, Dubrava University Hospital, Zagreb, Croatia
4
Dianati-Nasab M, Salimifard K, Mohammadi R, Saadatmand S, Fararouei M, Hosseini KS, Jiavid-Sharifi B, Chaussalet T, Dehdar S. Machine learning algorithms to uncover risk factors of breast cancer: insights from a large case-control study. Front Oncol 2024; 13:1276232. [PMID: 38425674 PMCID: PMC10903343 DOI: 10.3389/fonc.2023.1276232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 12/27/2023] [Indexed: 03/02/2024] Open
Abstract
Introduction This large case-control study explored the application of machine learning models to identify risk factors for primary invasive incident breast cancer (BC) in the Iranian population. This study serves as a bridge toward improved BC prevention, early detection, and management through the identification of modifiable and unmodifiable risk factors. Methods The dataset includes 1,009 cases and 1,009 controls, with comprehensive data on lifestyle, health-behavior, reproductive and sociodemographic factors. Different machine learning models, namely Random Forest (RF), Neural Networks (NN), Bootstrap Aggregating Classification and Regression Trees (Bagged CART), and Extreme Gradient Boosting Tree (XGBoost), were employed to analyze the data. Results The findings highlight the significance of a chest X-ray history, deliberate weight loss, abortion history, and post-menopausal status as predictors. Factors such as second-hand smoking, lower education, menarche age (>14), occupation (employed), first delivery age (18-23), and breastfeeding duration (>42 months) were also identified as important predictors in multiple models. The RF model exhibited the highest Area Under the Curve (AUC) value of 0.9, as indicated by the Receiver Operating Characteristic (ROC) curve. Following closely was the Bagged CART model with an AUC of 0.89, while the XGBoost model achieved a lower AUC of 0.78. The NN model demonstrated the lowest AUC of 0.74. The RF model achieved an accuracy of 83.9% and a Kappa coefficient of 67.8%, while the XGBoost model achieved a lower accuracy of 82.5% and a lower Kappa coefficient of 0.6. Conclusion This study could be beneficial for targeted preventive measures according to the main risk factors for BC among high-risk women.
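The accuracy and Kappa figures above are reported on mixed scales (percent and fraction). As a rough consistency check, the sketch below computes both from a 2×2 confusion matrix. The counts are hypothetical, chosen only to give 83.9% accuracy on a balanced set, and the function name is an assumption; the study's actual test split is not given in the abstract.

```python
def accuracy_and_kappa(tp, fn, fp, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix."""
    total = tp + fn + fp + tn
    p_observed = (tp + tn) / total
    # Chance agreement: product of actual and predicted marginals per class.
    p_pos = ((tp + fn) / total) * ((tp + fp) / total)
    p_neg = ((fp + tn) / total) * ((fn + tn) / total)
    p_expected = p_pos + p_neg
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa

# Hypothetical counts: 500 cases and 500 controls (illustration only).
acc, kappa = accuracy_and_kappa(tp=420, fn=80, fp=81, tn=419)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

With balanced classes the chance agreement is 0.5, so kappa reduces to 2 × accuracy − 1. Under that assumption, 83.9% accuracy corresponds to a kappa of about 0.678, which is consistent with the 67.8% reported for the RF model.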
Affiliation(s)
- Mostafa Dianati-Nasab
  - School of Medical and Life Sciences, Sunway University, Sunway City, Malaysia
  - Department of Epidemiology, School of Public Health, Shiraz University of Medical Sciences, Shiraz, Iran
- Khodakaram Salimifard
  - Computational Intelligence & Intelligent Optimization Research Group, Business & Economics School, Persian Gulf University, Bushehr, Iran
- Reza Mohammadi
  - Department of Operation Management, Amsterdam Business School, University of Amsterdam, Amsterdam, Netherlands
- Sara Saadatmand
  - Computational Intelligence & Intelligent Optimization Research Group, Business & Economics School, Persian Gulf University, Bushehr, Iran
- Mohammad Fararouei
  - School of Medical and Life Sciences, Sunway University, Sunway City, Malaysia
- Kosar S. Hosseini
  - Department of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Thierry Chaussalet
  - Computer Science and Engineering, University of Westminster, London, United Kingdom
- Samira Dehdar
  - Computational Intelligence & Intelligent Optimization Research Group, Business & Economics School, Persian Gulf University, Bushehr, Iran
5
Lehtonen E, Kujala I, Tamminen J, Maaniitty T, Saraste A, Teuho J, Knuuti J, Klén R. Incremental prognostic value of downstream positron emission tomography perfusion imaging after coronary computed tomography angiography: a study using machine learning. Eur Heart J Cardiovasc Imaging 2024; 25:285-292. [PMID: 37774503 PMCID: PMC10824480 DOI: 10.1093/ehjci/jead246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 09/07/2023] [Accepted: 09/22/2023] [Indexed: 10/01/2023] Open
Abstract
AIMS To evaluate the incremental value of positron emission tomography (PET) myocardial perfusion imaging (MPI) over coronary computed tomography angiography (CCTA) in predicting short- and long-term outcome using machine learning (ML) approaches. METHODS AND RESULTS A total of 2411 patients with clinically suspected coronary artery disease (CAD) underwent CCTA, out of whom 891 patients were admitted to downstream PET MPI for haemodynamic evaluation of obstructive coronary stenosis. Two sets of Extreme Gradient Boosting (XGBoost) ML models were trained, one with all the clinical and imaging variables (including PET) and the other with only clinical and CCTA-based variables. Difference in the performance of the two sets was analysed by means of area under the receiver operating characteristic curve (AUC). After the removal of incomplete data entries, 2284 patients remained for further analysis. During the 8-year follow-up, 210 adverse events occurred including 59 myocardial infarctions, 35 unstable angina pectoris, and 116 deaths. The PET MPI data improved the outcome prediction over CCTA during the first 4 years of the observation time and the highest AUC was at the observation time of Year 1 (0.82, 95% confidence interval 0.804-0.827). After that, there was no significant incremental prognostic value by PET MPI. CONCLUSION PET MPI variables improve the prediction of adverse events beyond CCTA imaging alone for the first 4 years of follow-up. This illustrates the complementary nature of anatomic and functional information in predicting the outcome of patients with suspected CAD.
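The comparison between the two model sets rests on AUC. A minimal sketch of its pairwise definition follows: AUC is the probability that a randomly chosen event case receives a higher risk score than a randomly chosen non-event case. The labels, scores, and function name are hypothetical; the study's own evaluation pipeline is not described in the abstract.

```python
def auc(labels, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Ties between an event score and a non-event score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes (1 = adverse event) and risk scores from two model sets.
labels = [1, 1, 0, 0, 0]
scores_with_pet = [0.9, 0.6, 0.7, 0.2, 0.1]     # assumed: clinical + CCTA + PET
scores_without_pet = [0.8, 0.3, 0.7, 0.4, 0.1]  # assumed: clinical + CCTA only

auc_full = auc(labels, scores_with_pet)
auc_reduced = auc(labels, scores_without_pet)
print(auc_full, auc_reduced)
```

In this toy case the difference between the two AUC values plays the role of the incremental value; a real analysis would also need confidence intervals, as the abstract reports.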
Affiliation(s)
- Eero Lehtonen
  - Turku PET Centre, Turku University Hospital and University of Turku, Turku, Finland
- Iida Kujala
  - Turku PET Centre, Turku University Hospital and University of Turku, Turku, Finland
- Jonne Tamminen
  - Turku PET Centre, Turku University Hospital and University of Turku, Turku, Finland
- Teemu Maaniitty
  - Turku PET Centre, Turku University Hospital and University of Turku, Turku, Finland
  - Department of Clinical Physiology, Nuclear Medicine and PET, Turku University Hospital, Turku, Finland
- Antti Saraste
  - Turku PET Centre, Turku University Hospital and University of Turku, Turku, Finland
  - Heart Center, Turku University Hospital and University of Turku, Turku, Finland
- Jarmo Teuho
  - Turku PET Centre, Turku University Hospital and University of Turku, Turku, Finland
- Juhani Knuuti
  - Turku PET Centre, Turku University Hospital and University of Turku, Turku, Finland
  - Department of Clinical Physiology, Nuclear Medicine and PET, Turku University Hospital, Turku, Finland
- Riku Klén
  - Turku PET Centre, Turku University Hospital and University of Turku, Turku, Finland
6
Lombard MA, Brown EE, Saftner DM, Arienzo MM, Fuller-Thomson E, Brown CJ, Ayotte JD. Estimating Lithium Concentrations in Groundwater Used as Drinking Water for the Conterminous United States. Environ Sci Technol 2024; 58:1255-1264. [PMID: 38164924 PMCID: PMC10795177 DOI: 10.1021/acs.est.3c03315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 11/28/2023] [Accepted: 12/19/2023] [Indexed: 01/03/2024]
Abstract
Lithium (Li) concentrations in drinking-water supplies are not regulated in the United States; however, Li is included in the 2022 U.S. Environmental Protection Agency list of unregulated contaminants for monitoring by public water systems. Li is used pharmaceutically to treat bipolar disorder, and studies have linked its occurrence in drinking water to human-health outcomes. An extreme gradient boosting model was developed to estimate geogenic Li in drinking-water supply wells throughout the conterminous United States. The model was trained using Li measurements from ∼13,500 wells and predictor variables related to its natural occurrence in groundwater. The model predicts the probability of Li in four concentration classifications, ≤4 μg/L, >4 to ≤10 μg/L, >10 to ≤30 μg/L, and >30 μg/L. Model predictions were evaluated using wells held out from model training and with new data and have an accuracy of 47-65%. Important predictor variables include average annual precipitation, well depth, and soil geochemistry. Model predictions were mapped at a spatial resolution of 1 km² and represent well depths associated with public- and private-supply wells. This model was developed by hydrologists and public-health researchers to estimate Li exposure from drinking water and compare to national-scale human-health data for a better understanding of dose-response to low (<30 μg/L) concentrations of Li.
Affiliation(s)
- Melissa A. Lombard
  - New England Water Science Center, U.S. Geological Survey, 331 Commerce Way, Pembroke, New Hampshire 03275, United States
- Eric E. Brown
  - Centre for Addiction and Mental Health, University of Toronto, 80 Workman Way, Toronto, Ontario, Canada M6J 1H4
- Daniel M. Saftner
  - Desert Research Institute, 2215 Raggio Parkway, Reno, Nevada 89512, United States
- Monica M. Arienzo
  - Desert Research Institute, 2215 Raggio Parkway, Reno, Nevada 89512, United States
- Esme Fuller-Thomson
  - Institute for Life Course and Aging, University of Toronto, 246 Bloor Street West, Toronto, Ontario, Canada M5S 1V4
- Craig J. Brown
  - New England Water Science Center, U.S. Geological Survey, 339 Main Street, East Hartford, Connecticut 06108, United States
- Joseph D. Ayotte
  - New England Water Science Center, U.S. Geological Survey, 331 Commerce Way, Pembroke, New Hampshire 03275, United States
7
Jana T, Sarkar D, Ganguli D, Mukherjee SK, Mandal RS, Das S. ABDpred: Prediction of active antimicrobial compounds using supervised machine learning techniques. Indian J Med Res 2024; 159:78-90. [PMID: 38345040 PMCID: PMC10954100 DOI: 10.4103/ijmr.ijmr_1832_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Indexed: 03/06/2024] Open
Abstract
BACKGROUND & OBJECTIVES Discovery of new antibiotics is the need of the hour to treat infectious diseases. An ever-increasing repertoire of multidrug-resistant pathogens poses an imminent threat to human lives across the globe. However, the low success rate of the existing approaches and technologies for antibiotic discovery remains a major bottleneck. In silico methods like machine learning (ML) appear more promising for meeting these challenges than conventional experimental approaches. The goal of this study was to create ML models that may be used to successfully predict new antimicrobial compounds. METHODS In this article, we employed eight different ML algorithms, namely extreme gradient boosting, random forest, gradient boosting classifier, deep neural network, support vector machine, multilayer perceptron, decision tree, and logistic regression. These models were trained using a dataset comprising 312 antibiotic drugs and a negative set of 936 non-antibiotic drugs in a five-fold cross-validation approach. RESULTS The top four ML classifiers (extreme gradient boosting, random forest, gradient boosting classifier and deep neural network) were able to achieve an accuracy of 80 per cent and above during the evaluation of testing and blind datasets. INTERPRETATION & CONCLUSIONS We aggregated the four top-performing models through a soft-voting technique to develop an ensemble-based ML method and incorporated it into a freely accessible online prediction server named ABDpred ( http://clinicalmedicinessd.com.in/abdpred/ ).
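Soft voting, as mentioned above, generally means averaging the class probabilities produced by each model and thresholding the average, rather than counting hard votes. The sketch below illustrates the idea with equal weights; the probabilities, the 0.5 threshold, and the function name are assumptions for illustration and do not come from ABDpred.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Average per-model positive-class probabilities and threshold the mean.

    prob_lists: one list of probabilities per model, aligned by compound.
    Returns (averaged probabilities, predicted labels).
    """
    n_models = len(prob_lists)
    averaged = [sum(column) / n_models for column in zip(*prob_lists)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels

# Hypothetical "antibiotic" probabilities from four models for three compounds.
model_probs = [
    [0.90, 0.40, 0.20],
    [0.80, 0.60, 0.10],
    [0.70, 0.45, 0.30],
    [0.60, 0.35, 0.20],
]
averaged, labels = soft_vote(model_probs)
print(averaged, labels)
```

For the second compound, one model alone would predict the positive class (0.60), but the averaged probability is below the threshold, so the ensemble predicts negative. That is the practical difference from relying on any single model.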
Affiliation(s)
- Tanmoy Jana
  - Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Debasree Sarkar
  - Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Debayan Ganguli
  - Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Sandip Kumar Mukherjee
  - Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Rahul Shubhra Mandal
  - Department of Cancer Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Santasabuj Das
  - Division of Clinical Medicine, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
  - ICMR-National Institute of Occupational Health, Ahmedabad, India
8
de la Fuente J, Llorente-González S, Fernandez-Robredo P, Hernandez M, García-Layana A, Ochoa I, Recalde S. Suitability of machine learning for atrophy and fibrosis development in neovascular age-related macular degeneration. Acta Ophthalmol 2023. [PMID: 38131161 DOI: 10.1111/aos.16616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 11/20/2023] [Accepted: 12/08/2023] [Indexed: 12/23/2023]
Abstract
PURPOSE To assess the suitability of machine learning (ML) techniques in predicting the development of fibrosis and atrophy in patients with neovascular age-related macular degeneration (nAMD) receiving anti-VEGF treatment over a 36-month period. METHODS An extensive analysis was conducted on the use of ML to predict fibrosis and atrophy development in nAMD patients at 36 months from the start of anti-VEGF treatment, using only data from the first 12 months. We used data collected according to real-world practice, which includes clinical and genetic factors. RESULTS The ML analysis consistently identified ETDRS as a relevant factor for predicting the development of atrophy and fibrosis, confirming previous statistical analyses. Genetic variables did not demonstrate statistical relevance in the prediction. Despite the complexity of predicting macular degeneration, our model obtained a balanced accuracy of 63% and an AUC of 0.72 when predicting the development of atrophy or fibrosis at 36 months. CONCLUSION This study demonstrates the potential of ML techniques in predicting the development of fibrosis and atrophy in nAMD patients receiving long-term anti-VEGF treatment. The findings highlight the importance of clinical factors, particularly the ETDRS (early treatment diabetic retinopathy study) visual acuity test, in predicting these outcomes. The lessons learned from this research can guide future ML-based prediction tasks in the field of ophthalmology and contribute to the design of data collection processes.
Affiliation(s)
- Jesus de la Fuente
  - Department of Electrical and Electronics Engineering, School of Engineering (Tecnun), University of Navarra, Pamplona, Spain
  - Center for Data Science, New York University, New York City, New York, USA
- Sara Llorente-González
  - Retinal Pathologies and New Therapies Group, Experimental Ophthalmology Laboratory, Department of Ophthalmology, Clinica Universidad de Navarra, Pamplona, Spain
  - Navarra Institute for Health Research, IdiSNA, Pamplona, Spain
  - Thematic Network of Cooperative Health Research in Eye Diseases (Oftared), Health Institute Carlos III (ISCIII), Department of Ophthalmology, Clinica Universidad de Navarra, Pamplona, Spain
- Patricia Fernandez-Robredo
  - Retinal Pathologies and New Therapies Group, Experimental Ophthalmology Laboratory, Department of Ophthalmology, Clinica Universidad de Navarra, Pamplona, Spain
  - Navarra Institute for Health Research, IdiSNA, Pamplona, Spain
  - Thematic Network of Cooperative Health Research in Eye Diseases (Oftared), Health Institute Carlos III (ISCIII), Department of Ophthalmology, Clinica Universidad de Navarra, Pamplona, Spain
- María Hernandez
  - Retinal Pathologies and New Therapies Group, Experimental Ophthalmology Laboratory, Department of Ophthalmology, Clinica Universidad de Navarra, Pamplona, Spain
  - Navarra Institute for Health Research, IdiSNA, Pamplona, Spain
  - Thematic Network of Cooperative Health Research in Eye Diseases (Oftared), Health Institute Carlos III (ISCIII), Department of Ophthalmology, Clinica Universidad de Navarra, Pamplona, Spain
- Alfredo García-Layana
  - Retinal Pathologies and New Therapies Group, Experimental Ophthalmology Laboratory, Department of Ophthalmology, Clinica Universidad de Navarra, Pamplona, Spain
  - Navarra Institute for Health Research, IdiSNA, Pamplona, Spain
  - Thematic Network of Cooperative Health Research in Eye Diseases (Oftared), Health Institute Carlos III (ISCIII), Department of Ophthalmology, Clinica Universidad de Navarra, Pamplona, Spain
- Idoia Ochoa
  - Department of Electrical and Electronics Engineering, School of Engineering (Tecnun), University of Navarra, Pamplona, Spain
  - Institute for Data Science and Artificial Intelligence (DATAI), University of Navarra, Pamplona, Spain
- Sergio Recalde
  - Retinal Pathologies and New Therapies Group, Experimental Ophthalmology Laboratory, Department of Ophthalmology, Clinica Universidad de Navarra, Pamplona, Spain
  - Navarra Institute for Health Research, IdiSNA, Pamplona, Spain
  - Thematic Network of Cooperative Health Research in Eye Diseases (Oftared), Health Institute Carlos III (ISCIII), Department of Ophthalmology, Clinica Universidad de Navarra, Pamplona, Spain
9
Jovic O, Mouras R. Extreme Gradient Boosting Combined with Conformal Predictors for Informative Solubility Estimation. Molecules 2023; 29:19. [PMID: 38202602 PMCID: PMC10779886 DOI: 10.3390/molecules29010019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 12/15/2023] [Accepted: 12/17/2023] [Indexed: 01/12/2024] Open
Abstract
We used the extreme gradient boosting (XGB) algorithm to predict the experimental solubility of chemical compounds in water and organic solvents and to select significant molecular descriptors. In the first step, the prediction accuracy of our forward stepwise top-importance XGB (FSTI-XGB) on curated solubility data sets, in terms of RMSE, was found to be 0.59-0.76 Log(S) for two water data sets, while for organic solvent data sets it was 0.69-0.79 Log(S) for the Methanol data set, 0.65-0.79 for the Ethanol data set, and 0.62-0.70 Log(S) for the Acetone data set. In the second step, we used uncurated and curated AquaSolDB data sets for applicability domain (AD) tests of the Drugbank, PubChem, and COCONUT databases and determined that more than 95% of the ca. 500,000 compounds studied were within the AD. In the third step, we applied conformal prediction to obtain narrow prediction intervals, and we successfully validated them using the test sets' true solubility values. In the fourth and final step, we used the resulting prediction intervals to estimate individual error margins and the accuracy class of the solubility prediction for molecules within the AD of three public databases. All of this was possible without knowledge of the experimental solubilities in those databases. We find these four steps novel because solubility-related works usually study only the first step or the first two steps.
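The abstract does not state which conformal variant was used. As one common possibility, the sketch below shows split (inductive) conformal prediction with absolute residuals as the nonconformity score: a held-out calibration set yields a residual quantile, which is then added to and subtracted from each new point prediction. All Log(S) values and function names are hypothetical.

```python
import math

def conformal_quantile(cal_true, cal_pred, alpha=0.2):
    """Residual quantile giving roughly (1 - alpha) coverage on new data.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute calibration
    residual, the standard finite-sample correction for split conformal.
    """
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]

def prediction_interval(point_prediction, q):
    """Symmetric interval around a point prediction."""
    return point_prediction - q, point_prediction + q

# Hypothetical calibration data in Log(S) units (illustration only).
cal_true = [-3.0, -1.5, -2.2, -4.1, -0.8, -2.9, -3.6, -1.1, -2.5, -3.3]
cal_pred = [-2.9, -1.7, -1.9, -4.5, -0.3, -3.5, -2.9, -1.9, -1.6, -4.3]

q = conformal_quantile(cal_true, cal_pred, alpha=0.2)
low, high = prediction_interval(-2.0, q)
print(q, low, high)
```

With ten calibration points and alpha = 0.2, the ninth smallest residual is used, so the interval half-width here is about 0.9 Log(S). This simple variant gives the same width for every molecule; the individual error margins described in the abstract would require a normalized or otherwise adaptive score.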
Affiliation(s)
- Rabah Mouras
  - Pharmaceutical Manufacturing Technology Centre, Bernal Institute, Department of Chemical Sciences, University of Limerick, V94 T9PX Limerick, Ireland
10
Wu J, Zhang C, He F, Wang Y, Zeng L, Liu W, Zhao D, Mao J, Gao F. Factors Affecting Intention to Leave Among ICU Healthcare Professionals in China: Insights from a Cross-Sectional Survey and XGBoost Analysis. Risk Manag Healthc Policy 2023; 16:2543-2553. [PMID: 38024488 PMCID: PMC10676671 DOI: 10.2147/rmhp.s432847] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/23/2023] [Accepted: 11/02/2023] [Indexed: 12/01/2023] Open
Abstract
Background The intention to leave among intensive care unit (ICU) healthcare professionals in China has become a concerning issue. Therefore, understanding the factors influencing the intention to leave and implementing appropriate measures have become urgent needs for maintaining a stable healthcare workforce. Objective This study aims to investigate the current status of intention to leave among ICU healthcare professionals in China, explore the relevant factors affecting this intention, and provide targeted recommendations to reduce the intention to leave among healthcare professionals. Methods A cross-sectional survey was conducted, involving ICU healthcare professionals from 3-A hospitals of the 34 provinces in China. The survey encompassed 22 indicators, including demographic information (marital status, children, income), work-related factors (weekly working hours, night shift frequency, hospital environment), and psychological assessment (using Symptom Checklist-90 (SCL-90)). The data from a sample population of 3653 individuals were analyzed using the extreme gradient boosting (XGBoost) method to predict intention to leave. Results The survey results revealed that 62.09% (2268 individuals) of the surveyed ICU healthcare professionals expressed an intention to leave. The XGBoost model achieved a predictive accuracy of 75.38% and an Area Under the Curve (AUC) of 0.77. Conclusion Satisfaction with income was found to be the strongest predictor of intention to leave among ICU healthcare professionals. Additionally, factors such as years of experience, night shift frequency, and pride in hospital work were found to play significant roles in influencing the intention to leave.
Affiliation(s)
- Jiangnan Wu
  - Department of Artificial Intelligence, Tianjin University of Technology, Tianjin, People’s Republic of China
- Chao Zhang
  - Sixth Department of Oncology, Hebei General Hospital, Shijiazhuang, People’s Republic of China
- Feng He
  - The Second Hospital of Hebei Medical University, Shijiazhuang, People’s Republic of China
- Yuan Wang
  - Department of Neurosurgery, Tangshan Gongren Hospital, Tangshan, People’s Republic of China
- Liangnan Zeng
  - Department of Nursing, Chengdu Fifth People’s Hospital, The Fifth People’s Hospital Affiliated to Chengdu University of Traditional Chinese Medicine, Chengdu, People’s Republic of China
- Wei Liu
  - Hebei Psychological Counselor Association, Shijiazhuang, People’s Republic of China
- Di Zhao
  - Department of Neurosurgery, The Fourth Hospital of Hebei Medical University, Shijiazhuang, People’s Republic of China
- Jingkun Mao
  - Department of Artificial Intelligence, Tianjin University of Technology, Tianjin, People’s Republic of China
- Fei Gao
  - Hebei General Hospital, Shijiazhuang, People’s Republic of China
11
Emaminejad SA, Sparks J, Cusick RD. Integrating Bio-Electrochemical Sensors and Machine Learning to Predict the Efficacy of Biological Nutrient Removal Processes at Water Resource Recovery Facilities. Environ Sci Technol 2023; 57:18372-18381. [PMID: 37386725 DOI: 10.1021/acs.est.3c00352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
Monitoring biological nutrient removal (BNR) processes at water resource recovery facilities (WRRFs) with data-driven models is currently limited by the data limitations associated with the variability of bioavailable carbon (C) in wastewater. This study focuses on leveraging the amperometric response of a bio-electrochemical sensor (BES) to wastewater C variability, to predict influent shock loading events and NO₃⁻ removal in the first-stage anoxic zone (ANX1) of a five-stage Bardenpho BNR process using machine learning (ML) methods. Shock loading prediction with BES signal processing successfully detected 86.9% of the influent industrial slug and rain events of the plant during the study period. Extreme gradient boosting (XGBoost) and artificial neural network (ANN) models developed using the BES signal and other recorded variables provided a good prediction performance for NO₃⁻ removal in the ANX1, particularly within the normal operating range of WRRFs. A sensitivity analysis of the XGBoost model using SHapley Additive exPlanations indicated that the BES signal had the strongest impact on the model output and that current approaches to methanol dosing that neglect C availability can negatively impact nitrogen (N) removal due to cascading impacts of overdosing on nitrification efficacy.
Affiliation(s)
- Seyed Aryan Emaminejad
  - Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Jeff Sparks
  - Hampton Roads Sanitation District Nansemond Treatment Plant, Virginia Beach, Virginia 23455, United States
- Roland D Cusick
  - Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
|
12
|
Arturi K, Hollender J. Machine Learning-Based Hazard-Driven Prioritization of Features in Nontarget Screening of Environmental High-Resolution Mass Spectrometry Data. Environ Sci Technol 2023; 57:18067-18079. [PMID: 37279189 PMCID: PMC10666537 DOI: 10.1021/acs.est.3c00304] [Received: 01/12/2023] [Revised: 05/15/2023] [Accepted: 05/15/2023] [Indexed: 06/08/2023]
Abstract
Nontarget high-resolution mass spectrometry screening (NTS HRMS/MS) can detect thousands of organic substances in environmental samples. However, new strategies are needed to focus time-intensive identification efforts on features with the highest potential to cause adverse effects instead of the most abundant ones. To address this challenge, we developed MLinvitroTox, a machine learning framework that uses molecular fingerprints derived from fragmentation spectra (MS2) for a rapid classification of thousands of unidentified HRMS/MS features as toxic/nontoxic based on nearly 400 target-specific and over 100 cytotoxic endpoints from ToxCast/Tox21. Model development results demonstrated that using customized molecular fingerprints and models, over a quarter of toxic endpoints and the majority of the associated mechanistic targets could be accurately predicted with sensitivities exceeding 0.95. Notably, SIRIUS molecular fingerprints and XGBoost (Extreme Gradient Boosting) models with SMOTE (Synthetic Minority Oversampling Technique) for handling data imbalance were a universally successful and robust modeling configuration. Validation of MLinvitroTox on MassBank spectra showed that toxicity could be predicted from molecular fingerprints derived from MS2 with an average balanced accuracy of 0.75. By applying MLinvitroTox to environmental HRMS/MS data, we confirmed the experimental results obtained with target analysis and narrowed the analytical focus from tens of thousands of detected signals to 783 features linked to potential toxicity, including 109 spectral matches and 30 compounds with confirmed toxic activity.
Affiliation(s)
- Katarzyna Arturi
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Ueberlandstrasse 133, 8600 Dübendorf, Switzerland
- Juliane Hollender
  - Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Ueberlandstrasse 133, 8600 Dübendorf, Switzerland
  - Institute of Biogeochemistry and Pollution Dynamics, Eidgenössische Technische Hochschule Zürich (ETH Zurich), Rämistrasse 101, 8092 Zürich, Switzerland
|
13
|
Sun S, Yao W, Wang Y, Yue P, Guo F, Deng X, Zhang Y. Development and validation of machine-learning models for the difficulty of retroperitoneal laparoscopic adrenalectomy based on radiomics. Front Endocrinol (Lausanne) 2023; 14:1265790. [PMID: 38034013 PMCID: PMC10687448 DOI: 10.3389/fendo.2023.1265790] [Received: 07/23/2023] [Accepted: 11/03/2023] [Indexed: 12/02/2023]
Abstract
Objective The aim is to construct machine learning (ML) prediction models for the difficulty of retroperitoneal laparoscopic adrenalectomy (RPLA) based on clinical and radiomic characteristics and to validate the models. Methods Patients who had undergone RPLA at Shanxi Bethune Hospital between August 2014 and December 2020 were retrospectively gathered. They were then randomly split into a training set and a validation set, maintaining a ratio of 7:3. The model was constructed using the training set and validated using the validation set. Furthermore, a total of 117 patients were gathered between January and December 2021 to form a prospective set for validation. Radiomic features were extracted by drawing the region of interest using the 3D Slicer image computing platform and Python. Key features were selected through LASSO, and the radiomics score (Rad-score) was calculated. Various ML models were constructed by combining Rad-score with clinical characteristics. The optimal models were selected based on precision, recall, the area under the curve, F1 score, calibration curve, receiver operating characteristic curve, and decision curve analysis in the training, validation, and prospective sets. Shapley Additive exPlanations (SHAP) was used to demonstrate the impact of each variable in the respective models. Results After comparing the performance of 7 ML models in the training, validation, and prospective sets, it was found that the RF model had a more stable predictive performance, while XGBoost can significantly benefit patients. According to SHAP, the variable importance of the two models is similar, and both can reflect that the Rad-score has the most significant impact. At the same time, clinical characteristics such as hemoglobin, age, body mass index, gender, and diabetes mellitus also influenced the difficulty. Conclusion This study constructed ML models for predicting the difficulty of RPLA by combining clinical and radiomic characteristics. The models can help surgeons evaluate surgical difficulty, reduce risks, and improve patient benefits.
Affiliation(s)
- Shiwei Sun
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
- Wei Yao
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
- Yue Wang
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
- Peng Yue
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
- Fuyu Guo
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
- Xiaoqian Deng
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
- Yangang Zhang
  - Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, China
  - Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Third Hospital of Shanxi Medical University, Taiyuan, China
  - Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
|
14
|
Hedhoud Y, Mekhaznia T, Amroune M. An improvement of the CNN-XGboost model for pneumonia disease classification. Pol J Radiol 2023; 88:e483-e493. [PMID: 38020497 PMCID: PMC10660141 DOI: 10.5114/pjr.2023.132533] [Received: 07/05/2023] [Accepted: 09/14/2023] [Indexed: 12/01/2023]
Abstract
Purpose X-ray images are viewed as a vital component in emergency diagnosis. They are often used by deep learning applications for disease prediction, especially for thoracic pathologies. Pneumonia, a fatal thoracic disease induced by bacteria or viruses, generates a pleural effusion where fluids are accumulated inside lungs, leading to breathing difficulty. The utilization of X-ray imaging for pneumonia detection offers several advantages over other modalities such as computed tomography scans or magnetic resonance imaging. X-rays provide a cost-effective and easily accessible method for screening and diagnosing pneumonia, allowing for quicker assessment and timely intervention. However, interpretation of chest X-ray images depends on the radiologist's competency. Within this study, we aim to suggest new elements leading to good interpretation of chest X-ray images for pneumonia detection, especially for distinguishing between viral and bacterial pneumonia. Material and methods We proposed an interpretation model based on convolutional neural networks (CNNs) and extreme gradient boosting (XGboost) for pneumonia classification. The experimental study is processed through various scenarios, using Python as a programming language and a public database obtained from Guangzhou Women and Children's Medical Centre. Results The results demonstrate an acceptable accuracy of 87% within a mere 7 seconds, thereby endorsing its effectiveness compared to similar existing works. Conclusions Our study provides a model based on CNN and XGboost to classify images of viral and bacterial pneumonia. The work is a challenging task due to the lack of appropriate data. The experimental process allows a better accuracy of 87%, a specificity of 89%, and a sensitivity of 85%.
Affiliation(s)
- Tahar Mekhaznia
  - Tebessi University, Tebessa, Algeria
  - LAMIS Laboratory, Cheikh Larbi Tebessi University, Tebessa, Algeria
- Mohamed Amroune
  - LAMIS Laboratory, Cheikh Larbi Tebessi University, Tebessa, Algeria
|
15
|
Majumder S, Bhattacharya S, Debnath P, Ganguly B, Chanda M. Identification and classification of arrhythmic heartbeats from electrocardiogram signals using feature induced optimal extreme gradient boosting algorithm. Comput Methods Biomech Biomed Engin 2023:1-14. [PMID: 37807947 DOI: 10.1080/10255842.2023.2265009] [Indexed: 10/10/2023]
Abstract
Arrhythmic heartbeat classification has gained a lot of attention as a way to accelerate the detection of cardiovascular diseases and mitigate a potential cause of one-third of deaths worldwide. In this article, a computer-aided diagnostic (CAD) approach is proposed for the automated identification and classification of arrhythmic heartbeats from electrocardiogram (ECG) signals using a multiple-feature-aided supervised learning model. For proper diagnosis of arrhythmic heartbeats, the MIT-BIH Arrhythmia database has been used to train and test the proposed approach. The ECG signals, extracted from sensor leads, have undergone pre-processing via discrete wavelet transform. Three sets of features, i.e. statistical, temporal, and spectral, are extracted from the processed ECG signals, followed by a random forest aided recursive feature elimination strategy to select the prominent features for proper classification of arrhythmic heartbeats by the proposed optimal extreme gradient boosting (O-XGBoost) classifier. Hyperparameters such as learning rate, tree-specific parameters, and regularization parameters have been optimized to improve the performance of the XGBoost classifier. Moreover, the synthetic minority over-sampling technique has been employed for balancing the dataset in order to improve the classification performance. Quantitative results reveal remarkable performance over state-of-the-art methods. The proposed model can be implemented in any computer-aided diagnostic system with similar topological structures.
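The balancing step mentioned in the abstract, the synthetic minority over-sampling technique (SMOTE), creates new minority-class samples by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The following is a minimal stdlib-only sketch of that core idea, not the authors' pipeline; the function name `smote_sketch`, the feature values, and the choice of `k` are all hypothetical.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical two-feature minority class (for example, a rare beat type).
minority = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by that class. In practice, oversampling is usually applied to the training split only, so that evaluation data contain no synthetic samples.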
Affiliation(s)
- S Majumder
  - Electronics and Communication Engineering Department, Meghnad Saha Institute of Technology, Kolkata, India
- S Bhattacharya
  - Electronics and Communication Engineering Department, Meghnad Saha Institute of Technology, Kolkata, India
- P Debnath
  - Department of Basic Sciences & Humanities, Techno International New Town, Kolkata, India
- B Ganguly
  - Department of Electrical Engineering, Meghnad Saha Institute of Technology, Kolkata, India
- M Chanda
  - Electronics and Communication Engineering Department, Meghnad Saha Institute of Technology, Kolkata, India
|
16
|
Chang CC, Liu TC, Lu CJ, Chiu HC, Lin WN. Machine learning strategy for identifying altered gut microbiomes for diagnostic screening in myasthenia gravis. Front Microbiol 2023; 14:1227300. [PMID: 37829445 PMCID: PMC10565662 DOI: 10.3389/fmicb.2023.1227300] [Received: 05/23/2023] [Accepted: 09/06/2023] [Indexed: 10/14/2023]
Abstract
Myasthenia gravis (MG) is a neuromuscular junction disease with a complex pathophysiology and clinical variation for which no clear biomarker has been discovered. We hypothesized that because changes in gut microbiome composition often occur in autoimmune diseases, the gut microbiome structures of patients with MG would differ from those of individuals without MG, and that a supervised machine learning (ML) analysis strategy could be trained using gut microbiota data for diagnostic screening of MG. Genomic DNA was collected from the stool samples of individuals with and without MG, and a sequencing library was established by constructing amplicon sequence variants (ASVs) and completing taxonomic classification of each representative DNA sequence. Four ML methods, namely least absolute shrinkage and selection operator, extreme gradient boosting (XGBoost), random forest, and classification and regression trees, with nested leave-one-out cross-validation, were trained using ASV taxon-based data and full ASV-based data to identify key ASVs in each data set. The results revealed XGBoost to have the best predictive performance. Overlapping key features extracted when XGBoost was trained using the full ASV-based and ASV taxon-based data were identified, and 31 high-importance ASVs (HIASVs) were obtained, assigned importance scores, and ranked. The most significant difference observed was in the abundance of bacteria in the Lachnospiraceae and Ruminococcaceae families. The 31 HIASVs were used to train the XGBoost algorithm to differentiate individuals with and without MG. The model had high diagnostic classification power and could accurately predict and identify patients with MG. In addition, the abundance of Lachnospiraceae was associated with limb weakness severity. In this study, we discovered that the composition of gut microbiomes differed between MG and non-MG subjects. In addition, the proposed XGBoost model trained using 31 HIASVs had the most favorable performance with respect to analyzing gut microbiomes. These HIASVs selected by the ML model may serve as biomarkers for clinical use and mechanistic study in the future. Our proposed ML model can identify several taxonomic markers and effectively discriminate patients with MG from those without with high accuracy; the ML strategy can be applied as a benchmark to conduct noninvasive screening of MG.
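Nested leave-one-out cross-validation, as named in the abstract, uses an outer loop that holds out one sample for testing and an inner loop that tunes hyperparameters using only the remaining samples, so the held-out sample never influences model selection. The sketch below illustrates that structure with a deliberately simple 1-D k-nearest-neighbour classifier standing in for the study's models; the data, the `k_grid`, and all function names are hypothetical.

```python
def knn_predict(train, x, k):
    """Majority vote of the k nearest 1-D training points; train = [(value, label)]."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0

def loo_accuracy(samples, k):
    """Plain leave-one-out accuracy for a fixed k."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        hits += knn_predict(train, x, k) == y
    return hits / len(samples)

def nested_loo_accuracy(samples, k_grid=(1, 3)):
    """Outer loop estimates performance; inner loop selects k."""
    hits = 0
    for i, (x, y) in enumerate(samples):
        outer_train = samples[:i] + samples[i + 1:]
        # Inner loop: choose k using only the outer-training samples.
        best_k = max(k_grid, key=lambda k: loo_accuracy(outer_train, k))
        hits += knn_predict(outer_train, x, best_k) == y
    return hits / len(samples)

# Hypothetical 1-D feature with binary labels (0 = control, 1 = case).
samples = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0),
           (10.0, 1), (11.0, 1), (12.0, 1), (13.0, 1)]
accuracy = nested_loo_accuracy(samples)
```

The cost grows roughly with the square of the sample count, since each outer fold runs a full inner leave-one-out pass per candidate setting, which is one reason the approach is typically reserved for small cohorts.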
Affiliation(s)
- Che-Cheng Chang
  - PhD Program in Nutrition and Food Science, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan
  - Graduate Institute of Biomedical and Pharmaceutical Science, Fu Jen Catholic University, New Taipei City, Taiwan
- Tzu-Chi Liu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Chi-Jie Lu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
  - Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Information Management, Fu Jen Catholic University, New Taipei City, Taiwan
- Hou-Chang Chiu
  - School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Neurology, Taipei Medical University, Shuang-Ho Hospital, New Taipei City, Taiwan
- Wei-Ning Lin
  - Graduate Institute of Biomedical and Pharmaceutical Science, Fu Jen Catholic University, New Taipei City, Taiwan
|
17
|
Kozanecki D, Kowalczyk I, Krasoń S, Rabenda M, Domagalski Ł, Wirowski A. The Machine Learning Methods in Non-Destructive Testing of Dynamic Properties of Vacuum Insulated Glazing Type Composite Panels. Materials (Basel) 2023; 16:5055. [PMID: 37512328 PMCID: PMC10386526 DOI: 10.3390/ma16145055] [Received: 06/14/2023] [Revised: 07/06/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023]
Abstract
The VIG (Vacuum Insulated Glazing) unit, composite glazing in which the space between glass panes is filled with vacuum, is one of the most advanced technologies. The key elements of the construction of VIG plates are the support pillars. Therefore, an important issue is the analysis of their mechanical properties, such as Young's modulus and their variability over a long period of time. Machine learning (ML) methods are undergoing tremendous development these days. Among the many different techniques included in AI, neural networks (NN) and extreme gradient boosting (XGB) algorithms deserve special attention. In this study, to train selected methods of machine learning, numerical data developed in the VIG plate modelling process using Abaqus program were used. The test method proposed in this article is based on the VIG plate subjected to forced vibrations of specific frequencies and then the reading of the dynamic response of the composite plate. Such collected and pre-developed experimental data were used to obtain the mechanical parameters of the steel elements located inside the analysed vacuum glazing. In the future, the proposed research methods can be used to analyse the mechanical properties of other types of composite panels.
Affiliation(s)
- Damian Kozanecki
  - Department of Structural Mechanics, Lodz University of Technology, Politechniki 6, 93-590 Lodz, Poland
- Izabela Kowalczyk
  - Department of Structural Mechanics, Lodz University of Technology, Politechniki 6, 93-590 Lodz, Poland
- Sylwia Krasoń
  - Department of Structural Mechanics, Lodz University of Technology, Politechniki 6, 93-590 Lodz, Poland
- Martyna Rabenda
  - Department of Concrete Structures, Lodz University of Technology, Politechniki 6, 93-590 Lodz, Poland
- Łukasz Domagalski
  - Department of Structural Mechanics, Lodz University of Technology, Politechniki 6, 93-590 Lodz, Poland
- Artur Wirowski
  - Department of Structural Mechanics, Lodz University of Technology, Politechniki 6, 93-590 Lodz, Poland
|
18
|
Jovanovic G, Perisic M, Bacanin N, Zivkovic M, Stanisic S, Strumberger I, Alimpic F, Stojic A. Potential of Coupling Metaheuristics-Optimized-XGBoost and SHAP in Revealing PAHs Environmental Fate. Toxics 2023; 11:394. [PMID: 37112620 PMCID: PMC10142005 DOI: 10.3390/toxics11040394] [Received: 02/25/2023] [Revised: 04/17/2023] [Accepted: 04/19/2023] [Indexed: 06/19/2023]
Abstract
Polycyclic aromatic hydrocarbons (PAHs) refer to a group of several hundred compounds, among which 16 are identified as priority pollutants, due to their adverse health effects, frequency of occurrence, and potential for human exposure. This study is focused on benzo(a)pyrene, being considered an indicator of exposure to a PAH carcinogenic mixture. For this purpose, we have applied the XGBoost model to a two-year database of pollutant concentrations and meteorological parameters, with the aim to identify the factors which were mostly associated with the observed benzo(a)pyrene concentrations and to describe types of environments that supported the interactions between benzo(a)pyrene and other polluting species. The pollutant data were collected at the energy industry center in Serbia, in the vicinity of coal mining areas and power stations, where the observed benzo(a)pyrene maximum concentration for the study period reached 43.7 ng m⁻³. The metaheuristics algorithm has been used to optimize the XGBoost hyperparameters, and the results have been compared to the results of XGBoost models tuned by eight other cutting-edge metaheuristics algorithms. The best-produced model was later on interpreted by applying Shapley Additive exPlanations (SHAP). As indicated by mean absolute SHAP values, the temperature at the surface, arsenic, PM10, and total nitrogen oxide (NOx) concentrations appear to be the major factors affecting benzo(a)pyrene concentrations and its environmental fate.
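Ranking features by mean absolute SHAP value, as the abstract describes, means averaging the magnitude of each feature's per-sample attribution and sorting the features by that average. A minimal sketch follows, assuming a matrix of SHAP values has already been computed by an explainer; the numbers below are invented purely for illustration and are not values from the study.

```python
# Feature names follow the abstract; the SHAP values are made-up numbers
# for illustration only (one row per sample, one column per feature).
feature_names = ["temperature", "arsenic", "PM10", "NOx"]
shap_values = [
    [-0.8, 0.3, 0.2, -0.1],
    [0.6, -0.5, 0.1, 0.2],
    [-0.7, 0.4, -0.3, 0.0],
]

def mean_abs_shap(values):
    """Average |SHAP| per feature across all samples."""
    n_samples = len(values)
    n_features = len(values[0])
    return [sum(abs(row[j]) for row in values) / n_samples for j in range(n_features)]

importance = mean_abs_shap(shap_values)
# Sort features from most to least influential.
ranking = sorted(zip(feature_names, importance), key=lambda t: t[1], reverse=True)
```

Taking the absolute value matters: a feature that pushes predictions strongly up for some samples and strongly down for others would average to nearly zero with signed values, yet it is clearly influential.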
Affiliation(s)
- Gordana Jovanovic
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Mirjana Perisic
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Nebojsa Bacanin
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Miodrag Zivkovic
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Svetlana Stanisic
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Ivana Strumberger
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Filip Alimpic
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
- Andreja Stojic
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
|
19
|
Faqih M, Omar MB, Ibrahim R. Prediction of Dry-Low Emission Gas Turbine Operating Range from Emission Concentration Using Semi-Supervised Learning. Sensors (Basel) 2023; 23:3863. [PMID: 37112203 PMCID: PMC10145957 DOI: 10.3390/s23083863] [Received: 03/07/2023] [Revised: 03/27/2023] [Accepted: 04/03/2023] [Indexed: 06/19/2023]
Abstract
Dry-Low Emission (DLE) technology significantly reduces the emissions from the gas turbine process by implementing the principle of lean pre-mixed combustion. The pre-mix ensures low nitrogen oxides (NOx) and carbon monoxide (CO) production by operating at a particular range using a tight control strategy. However, sudden disturbances and improper load planning may lead to frequent tripping due to frequency deviation and combustion instability. Therefore, this paper proposed a semi-supervised technique to predict the suitable operating range as a tripping prevention strategy and a guide for efficient load planning. The prediction technique is developed by hybridizing Extreme Gradient Boosting and K-Means algorithm using actual plant data. Based on the result, the proposed model can predict the combustion temperature, nitrogen oxides, and carbon monoxide concentration with R² values of 0.9999, 0.9309, and 0.7109, respectively, which outperforms other algorithms such as decision tree, linear regression, support vector machine, and multilayer perceptron. Further, the model can identify DLE gas turbine operation regions and determine the optimum range the turbine can safely operate while maintaining lower emission production. The range in which a typical DLE gas turbine can operate safely is found to be 744.68 °C to 829.64 °C. The proposed technique can be used as a preventive maintenance strategy in many applications involving tight operating range control in mitigating tripping issues. Furthermore, the findings significantly contribute to power generation fields for better control strategies to ensure the reliable operation of DLE gas turbines.
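The accuracy figures in this abstract are coefficients of determination, which compare a model's squared error with that of always predicting the mean of the observations. A minimal sketch of the standard definition follows; the temperature values are hypothetical and are not plant data from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical combustion temperatures in °C (illustration only).
y_true = [750.0, 770.0, 790.0, 810.0]
y_pred = [752.0, 768.0, 791.0, 809.0]
r2 = r_squared(y_true, y_pred)
```

A value of 1 means perfect prediction, 0 means the model is no better than predicting the mean, and negative values are possible when the model does worse than that baseline.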
Affiliation(s)
- Mochammad Faqih
  - Department of Chemical Engineering, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
- Madiah Binti Omar
  - Department of Chemical Engineering, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
- Rosdiazli Ibrahim
  - Department of Electrical and Electronics Engineering, Universiti Teknologi PETRONAS, Seri Iskandar 32610, Malaysia
|
20
|
Hauptman A, Balasubramaniam GM, Arnon S. Machine Learning Diffuse Optical Tomography Using Extreme Gradient Boosting and Genetic Programming. Bioengineering (Basel) 2023; 10:382. [PMID: 36978773 PMCID: PMC10045273 DOI: 10.3390/bioengineering10030382] [Received: 02/20/2023] [Revised: 03/18/2023] [Accepted: 03/20/2023] [Indexed: 03/30/2023]
Abstract
Diffuse optical tomography (DOT) is a non-invasive method for detecting breast cancer; however, it struggles to produce high-quality images due to the complexity of scattered light and the limitations of traditional image reconstruction algorithms. These algorithms can be affected by boundary conditions and have a low imaging accuracy, a shallow imaging depth, a long computation time, and a high signal-to-noise ratio. However, machine learning can potentially improve the performance of DOT by being better equipped to solve inverse problems, perform regression, classify medical images, and reconstruct biomedical images. In this study, we utilized a machine learning model called "XGBoost" to detect tumors in inhomogeneous breasts and applied a post-processing technique based on genetic programming to improve accuracy. The proposed algorithm was tested using simulated DOT measurements from complex inhomogeneous breasts and evaluated using the cosine similarity metrics and root mean square error loss. The results showed that the use of XGBoost and genetic programming in DOT could lead to more accurate and non-invasive detection of tumors in inhomogeneous breasts compared to traditional methods, with the reconstructed breasts having an average cosine similarity of more than 0.97 ± 0.07 and average root mean square error of around 0.1270 ± 0.0031 compared to the ground truth.
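The two evaluation metrics named in the abstract, cosine similarity and root mean square error, can each be computed in a few lines once the reconstructed and ground-truth images are flattened to vectors. A minimal sketch follows; the four-pixel "images" are hypothetical and serve only to show the calculation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rmse(a, b):
    """Root mean square error between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Hypothetical flattened images (illustration only).
ground_truth = [0.0, 0.0, 1.0, 1.0]
reconstruction = [0.0, 0.1, 0.9, 1.0]

similarity = cosine_similarity(ground_truth, reconstruction)
error = rmse(ground_truth, reconstruction)
```

The two metrics are complementary: cosine similarity is insensitive to a uniform rescaling of the reconstruction, whereas RMSE penalizes any difference in absolute pixel values.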
Affiliation(s)
- Ami Hauptman
  - Department of Computer Science, Sapir Academic College, Sderot 7915600, Israel
- Ganesh M Balasubramaniam
  - Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be'er Sheva 8441405, Israel
- Shlomi Arnon
  - Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Be'er Sheva 8441405, Israel
|
21
|
Liu Y, Lyu X, Yang B, Fang Z, Hu D, Shi L, Wu B, Tian Y, Zhang E, Yang Y. Early Triage of Critically Ill Adult Patients With Mushroom Poisoning: Machine Learning Approach. JMIR Form Res 2023; 7:e44666. [PMID: 36943366 PMCID: PMC10131621 DOI: 10.2196/44666] [Received: 11/30/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 03/23/2023]
Abstract
BACKGROUND Early triage of patients with mushroom poisoning is essential for administering precise treatment and reducing mortality. To our knowledge, there has been no established method to triage patients with mushroom poisoning based on clinical data. OBJECTIVE The purpose of this work was to construct a triage system to identify patients with mushroom poisoning based on clinical indicators using several machine learning approaches and to assess the prediction accuracy of these strategies. METHODS In all, 567 patients were collected from 5 primary care hospitals and facilities in Enshi, Hubei Province, China, and divided into 2 groups; 322 patients from 2 hospitals were used as the training cohort, and 245 patients from 3 hospitals were used as the test cohort. Four machine learning algorithms were used to construct the triage model for patients with mushroom poisoning. Performance was assessed using the area under the receiver operating characteristic curve (AUC), decision curve, sensitivity, specificity, and other representative statistics. Feature contributions were evaluated using Shapley additive explanations. RESULTS Among several machine learning algorithms, extreme gradient boosting (XGBoost) showed the best discriminative ability in 5-fold cross-validation (AUC=0.83, 95% CI 0.77-0.90) and the test set (AUC=0.90, 95% CI 0.83-0.96). In the test set, the XGBoost model had a sensitivity of 0.93 (95% CI 0.81-0.99) and a specificity of 0.79 (95% CI 0.73-0.85), whereas the physicians' assessment had a sensitivity of 0.86 (95% CI 0.72-0.95) and a specificity of 0.66 (95% CI 0.59-0.73). CONCLUSIONS The 14-factor XGBoost model for the early triage of mushroom poisoning can rapidly and accurately identify critically ill patients and will possibly serve as an important basis for the selection of treatment options and referral of patients, potentially reducing patient mortality and improving clinical outcomes.
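Sensitivity and specificity, the metrics used above to compare the model with physicians' assessment, are both simple ratios from a confusion matrix. The sketch below shows the definitions and checks that the two cohort sizes in the abstract add up to the stated total; the confusion-matrix counts are hypothetical, since the abstract does not report them.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Cohort sizes taken from the abstract above.
n_train, n_test, n_total = 322, 245, 567
cohorts_add_up = (n_train + n_test == n_total)

# Hypothetical counts for illustration only (not from the study).
sens, spec = sensitivity_specificity(tp=9, fn=1, tn=8, fp=2)
```

For triage, sensitivity is usually the more critical of the two, because a false negative means a critically ill patient is not flagged, whereas a false positive typically leads only to closer observation.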
Affiliation(s)
- Yuxuan Liu
  - State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
- Xiaoguang Lyu
  - Department of Gastroenterology, Renmin Hospital of Wuhan University, Wuhan, China
- Bo Yang
  - Department of Internal Medicine, Renmin Hospital of Xianfeng, Enshi, China
- Zhixiang Fang
  - State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, China
- Dejun Hu
  - Department of Internal Medicine, Renmin Hospital of Xianfeng, Enshi, China
- Lei Shi
  - Department of Nephrology, Minda Hospital of Hubei Minzu University, Enshi, China
- Bisheng Wu
  - Department of General Surgery, Renmin Hospital of Xianfeng, Enshi, China
- Yong Tian
  - Department of Internal Medicine, Renmin Hospital of Laifeng, Enshi, China
- Enli Zhang
  - Department of General Surgery, Central Hospital of Hefeng, Enshi, China
- YuanChao Yang
  - Department of Gastroenterology, Renmin Hospital of Xuanen, Enshi, China
|
22
|
Armstrong CEJ, Niimi J, Boss PK, Pagay V, Jeffery DW. Use of Machine Learning with Fused Spectral Data for Prediction of Product Sensory Characteristics: The Case of Grape to Wine. Foods 2023; 12:foods12040757. [PMID: 36832832 PMCID: PMC9955574 DOI: 10.3390/foods12040757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Revised: 01/26/2023] [Accepted: 02/01/2023] [Indexed: 02/12/2023] Open
Abstract
Generations of sensors have been developed for predicting food sensory profiles to circumvent the use of a human sensory panel, but a technology that can rapidly predict a suite of sensory attributes from one spectral measurement remains unavailable. Using spectra from grape extracts, this novel study aimed to address this challenge by exploring the use of a machine learning algorithm, extreme gradient boosting (XGBoost), to predict twenty-two wine sensory attribute scores from five sensory stimuli: aroma, colour, taste, flavour, and mouthfeel. Two datasets were obtained from absorbance-transmission and fluorescence excitation-emission matrix (A-TEEM) spectroscopy with different fusion methods: variable-level data fusion of absorbance and fluorescence spectral fingerprints, and feature-level data fusion of A-TEEM and CIELAB datasets. The results for externally validated models showed slightly better performance using only A-TEEM data, predicting five out of twenty-two wine sensory attributes with R2 values above 0.7 and fifteen with R2 values above 0.5. Considering the complex biotransformation involved in processing grapes to wine, the ability to predict sensory properties based on underlying chemical composition in this way suggests that the approach could be more broadly applicable to the agri-food sector and other transformed foodstuffs to predict a product's sensory characteristics from raw material spectral attributes.
Affiliation(s)
- Claire E. J. Armstrong
  - Australian Research Council Training Centre for Innovative Wine Production, The University of Adelaide, PMB 1, Glen Osmond, SA 5064, Australia
  - School of Agriculture, Food and Wine, and Waite Research Institute, The University of Adelaide, PMB 1, Glen Osmond, SA 5064, Australia
- Jun Niimi
  - School of Agriculture, Food and Wine, and Waite Research Institute, The University of Adelaide, PMB 1, Glen Osmond, SA 5064, Australia
  - CSIRO Agriculture and Food, Locked Bag 2, Glen Osmond, SA 5064, Australia
- Paul K. Boss
  - Australian Research Council Training Centre for Innovative Wine Production, The University of Adelaide, PMB 1, Glen Osmond, SA 5064, Australia
  - CSIRO Agriculture and Food, Locked Bag 2, Glen Osmond, SA 5064, Australia
- Vinay Pagay
  - Australian Research Council Training Centre for Innovative Wine Production, The University of Adelaide, PMB 1, Glen Osmond, SA 5064, Australia
  - School of Agriculture, Food and Wine, and Waite Research Institute, The University of Adelaide, PMB 1, Glen Osmond, SA 5064, Australia
- David W. Jeffery
  - Australian Research Council Training Centre for Innovative Wine Production, The University of Adelaide, PMB 1, Glen Osmond, SA 5064, Australia
  - School of Agriculture, Food and Wine, and Waite Research Institute, The University of Adelaide, PMB 1, Glen Osmond, SA 5064, Australia
  - Correspondence:
|
23
|
Eysenbach G, Chao HJ, Chiang YC, Chen HY. Explainable Machine Learning Techniques To Predict Amiodarone-Induced Thyroid Dysfunction Risk: Multicenter, Retrospective Study With External Validation. J Med Internet Res 2023; 25:e43734. [PMID: 36749620 PMCID: PMC9944157 DOI: 10.2196/43734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 12/25/2022] [Accepted: 01/16/2023] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Machine learning offers new solutions for predicting life-threatening, unpredictable amiodarone-induced thyroid dysfunction. Traditional regression approaches for adverse-effect prediction without time-series consideration of features have yielded suboptimal predictions. Machine learning algorithms with multiple data sets at different time points may generate better performance in predicting adverse effects. OBJECTIVE We aimed to develop and validate machine learning models for forecasting individualized amiodarone-induced thyroid dysfunction risk and to optimize a machine learning-based risk stratification scheme with a resampling method and readjustment of the clinically derived decision thresholds. METHODS This study developed machine learning models using multicenter, delinked electronic health records. It included patients receiving amiodarone from January 2013 to December 2017. The training set was composed of data from Taipei Medical University Hospital and Wan Fang Hospital, while data from Taipei Medical University Shuang Ho Hospital were used as the external test set. The study collected stationary features at baseline and dynamic features at the first, second, third, sixth, ninth, 12th, 15th, 18th, and 21st months after amiodarone initiation. We used 16 machine learning models, including extreme gradient boosting, adaptive boosting, k-nearest neighbor, and logistic regression models, along with an original resampling method and 3 other resampling methods, including oversampling with the borderline-synthesized minority oversampling technique, undersampling-edited nearest neighbor, and over- and undersampling hybrid methods. The model performance was compared based on accuracy, precision, recall, F1-score, geometric mean (G-mean), area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). Feature importance was determined by the best model. The decision threshold was readjusted to identify the best cutoff value and a Kaplan-Meier survival analysis was performed. RESULTS The training set contained 4075 patients from Taipei Medical University Hospital and Wan Fang Hospital, of whom 583 (14.3%) developed amiodarone-induced thyroid dysfunction, while the external test set included 2422 patients from Taipei Medical University Shuang Ho Hospital, of whom 275 (11.4%) developed amiodarone-induced thyroid dysfunction. The extreme gradient boosting oversampling machine learning model demonstrated the best predictive outcomes among all 16 models. The accuracy, precision, recall, F1-score, G-mean, AUPRC, and AUROC were 0.923, 0.632, 0.756, 0.688, 0.845, 0.751, and 0.934, respectively. After readjusting the cutoff, the best value was 0.627, and the F1-score reached 0.699. The best threshold was able to classify 286 of 2422 patients (11.8%) as high-risk subjects, among which 275 were true-positive patients in the testing set. A shorter treatment duration; higher levels of thyroid-stimulating hormone and high-density lipoprotein cholesterol; and lower levels of free thyroxin, alkaline phosphatase, and low-density lipoprotein were the most important features. CONCLUSIONS Machine learning models combined with resampling methods can predict amiodarone-induced thyroid dysfunction and serve as a support tool for individualized risk prediction and clinical decision support.
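The threshold readjustment described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy labels and probabilities, and the candidate cutoffs are all hypothetical, and the G-mean is assumed here to be the geometric mean of sensitivity and specificity, which the abstract does not define.

```python
import math

def classification_metrics(y_true, y_prob, threshold):
    """Confusion-matrix metrics at one probability cutoff (illustrative helper)."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Assumption: G-mean = sqrt(sensitivity * specificity)
    g_mean = math.sqrt(recall * specificity)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1, "g_mean": g_mean}

def best_f1_threshold(y_true, y_prob, candidates):
    """Return the candidate cutoff with the highest F1-score."""
    return max(candidates,
               key=lambda t: classification_metrics(y_true, y_prob, t)["f1"])

# Toy data, invented for illustration only
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
metrics_at_half = classification_metrics(y_true, y_prob, 0.5)
best_cutoff = best_f1_threshold(y_true, y_prob, [0.3, 0.5, 0.75])
```

In practice the cutoff would be chosen on training or validation data rather than on the external test set, so that the reported F1-score is not optimistically biased.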
Affiliation(s)
- Horng-Jiun Chao
  - Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan
- Yi-Chun Chiang
  - Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
- Hsiang-Yin Chen
  - Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
|
24
|
Li S, Dou R, Song X, Lui KY, Xu J, Guo Z, Hu X, Guan X, Cai C. Developing an Interpretable Machine Learning Model to Predict in-Hospital Mortality in Sepsis Patients: A Retrospective Temporal Validation Study. J Clin Med 2023; 12. [PMID: 36769564 DOI: 10.3390/jcm12030915] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/21/2022] [Revised: 01/22/2023] [Accepted: 01/23/2023] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Risk stratification plays an essential role in the decision making for sepsis management, as existing approaches can hardly satisfy the need to assess this heterogeneous population. We aimed to develop and validate a machine learning model to predict in-hospital mortality in critically ill patients with sepsis. METHODS Adult patients fulfilling the definition of Sepsis-3 were included at a large tertiary medical center. Relevant clinical features were extracted within the first 24 h in ICU, re-classified into different genres, and utilized for model development under three strategies: "Basic + Lab", "Basic + Intervention", and "Whole" feature sets. Extreme gradient boosting (XGBoost) was compared with logistic regression (LR) and established severity scores. Temporal validation was conducted using admissions from 2017 to 2019. RESULTS The final cohort included 24,272 patients, of whom 4013 formed the test cohort for temporal validation. The trained and fine-tuned XGBoost model with the whole feature set showed the best discriminatory ability in the test cohort with an AUROC of 0.85, significantly higher than the XGBoost "Basic + Lab" model (0.83), the LR "Whole" model (0.82), SOFA (0.63), SAPS-II (0.73), and LODS score (0.74). The performance in varying subgroups remained robust, and predictors, such as increased urine output and supplemental oxygen therapy, were crucially correlated with improved survival when interpretability was explored. CONCLUSIONS We developed and validated a novel XGBoost-based model and demonstrated significantly improved performance over LR and other scores in predicting the mortality risks of sepsis patients in the hospital using features in the first 24 h.
|
25
|
Song X, Li H, Chen Q, Zhang T, Huang G, Zou L, Du D. Predicting pneumonia during hospitalization in flail chest patients using machine learning approaches. Front Surg 2023; 9:1060691. [PMID: 36684357 PMCID: PMC9852626 DOI: 10.3389/fsurg.2022.1060691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Accepted: 11/14/2022] [Indexed: 01/07/2023] Open
Abstract
Objective Pneumonia is a common pulmonary complication of flail chest, causing high morbidity and mortality rates in affected patients. The existing methods for identifying pneumonia have low accuracy, and their use may delay antimicrobial therapy. However, machine learning can be combined with electronic medical record systems to identify information and assist in quick clinical decision-making. Our study aimed to develop a novel machine-learning model to predict pneumonia risk in flail chest patients. Methods From January 2011 to December 2021, the electronic medical records of 169 adult patients with flail chest at a tertiary teaching hospital in an urban level I Trauma Centre in Chongqing were retrospectively analysed. Then, the patients were randomly divided into training and test sets at a ratio of 7:3. Using the Fisher score, the best subset of variables was chosen. The performance of the seven models was evaluated by computing the area under the receiver operating characteristic curve (AUC). The output of the XGBoost model was shown using the Shapley Additive exPlanation (SHAP) method. Results Of 802 multiple rib fracture patients, 169 flail chest patients were eventually included, and 86 (50.80%) were diagnosed with pneumonia. The XGBoost model performed the best among all seven machine-learning models. The AUC of the XGBoost model was 0.895 (sensitivity: 84.3%; specificity: 80.0%). Pneumonia in flail chest patients was associated with several features: systolic blood pressure, pH value, blood transfusion, and ISS. Conclusion Our study demonstrated that the XGBoost model with 32 variables had high reliability in assessing risk indicators of pneumonia in flail chest patients. The SHAP method can identify vital pneumonia risk factors, making the XGBoost model's output clinically meaningful.
Affiliation(s)
- Xiaolin Song
  - School of Medicine, Chongqing University, Chongqing, China; Department of Traumatology, Chongqing Emergency Medical Center, Chongqing University Central Hospital, Chongqing, China
- Hui Li
  - Department of Traumatology, Chongqing Emergency Medical Center, Chongqing University Central Hospital, Chongqing, China
- Qingsong Chen
  - Department of Traumatology, Chongqing Emergency Medical Center, Chongqing University Central Hospital, Chongqing, China
- Tao Zhang
  - School of Medicine, Chongqing University, Chongqing, China; Department of Traumatology, Chongqing Emergency Medical Center, Chongqing University Central Hospital, Chongqing, China
- Guangbin Huang
  - Department of Traumatology, Chongqing Emergency Medical Center, Chongqing University Central Hospital, Chongqing, China
- Lingyun Zou
  - Clinical Data Research Center, Chongqing Emergency Medical Center, Chongqing University Central Hospital, Chongqing, China; Correspondence: Dingyuan Du, Lingyun Zou
- Dingyuan Du
  - Department of Traumatology, Chongqing Emergency Medical Center, Chongqing University Central Hospital, Chongqing, China; Correspondence: Dingyuan Du, Lingyun Zou
|
26
|
Chen M, Lan Q, Nie S, Hu L, Fang Y, Cui W, Bai X, Liu L, Zhu B. Forensic efficiencies of individual identification, kinship testing and ancestral inference in three Yunnan groups based on a self-developed multiple DIP panel. Front Genet 2023; 13:1057231. [PMID: 36685924 PMCID: PMC9845582 DOI: 10.3389/fgene.2022.1057231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Accepted: 11/25/2022] [Indexed: 01/06/2023] Open
Abstract
Deletion/insertion polymorphism (DIP), as a short insertion/deletion sequence polymorphic genetic marker, has attracted the attention of forensic genetic scientists due to its lack of stutter, short amplicons and abundant ancestral information. In this study, based on a self-developed panel of 43 autosomal deletion/insertion polymorphism (A-DIP) loci, which could meet the forensic application purposes of individual identification, kinship testing and ancestral inference to some extent, we evaluated the forensic efficiencies for these three objectives in Chinese Yi, Hani and Miao groups of Yunnan province. The cumulative match probability (CPM) values of these three groups were 1.11433E-18, 8.24299E-19 and 4.21721E-18, and the combined probability of exclusion (CPE) values were 0.999610217, 0.999629285 and 0.999582084, respectively. On average, 96.65% of full sibling pairs could be distinguished from unrelated individual pairs (likelihood ratios > 1) using this DIP panel, whereas the average false positive rate was 3.69% in the three target Yunnan groups. With the biogeographical ancestry prediction models constructed by extreme gradient boosting (XGBoost) and support vector machine (SVM) algorithms, 0.8239 (95% CI 0.7984, 0.8474) of the unrelated individuals could be correctly classified by continental origin based on the 43 A-DIPs, which showed large frequency distribution differences among different continental populations. The present results of principal component analysis (PCA), multidimensional scaling (MDS), neighbor joining (NJ) and maximum likelihood (ML) phylogenetic trees and STRUCTURE analyses indicated that these three Yunnan groups had relatively close genetic distances with East Asian populations.
Affiliation(s)
- Man Chen
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Qiong Lan
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Shengjie Nie
  - School of Forensic Medicine, Kunming Medical University, Kunming, China
- Liping Hu
  - School of Forensic Medicine, Kunming Medical University, Kunming, China
- Yating Fang
  - School of Basic Medical Sciences, Anhui Medical University, Hefei, China
- Wei Cui
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Xiaole Bai
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Liu Liu
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China
- Bofeng Zhu
  - Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, China; Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China; Key Laboratory of Shaanxi Province for Craniofacial Precision Medicine Research, College of Stomatology, Xi’an Jiaotong University, Xi’an, China; Correspondence: Bofeng Zhu
|
27
|
Dehdar S, Salimifard K, Mohammadi R, Marzban M, Saadatmand S, Fararouei M, Dianati-Nasab M. Applications of different machine learning approaches in prediction of breast cancer diagnosis delay. Front Oncol 2023; 13:1103369. [PMID: 36874113 PMCID: PMC9978377 DOI: 10.3389/fonc.2023.1103369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Accepted: 01/30/2023] [Indexed: 02/18/2023] Open
Abstract
Background The increasing rate of breast cancer (BC) incidence and mortality in Iran has turned this disease into a challenge. A delay in diagnosis leads to more advanced stages of BC and a lower chance of survival, which makes this cancer even more fatal. Objectives The present study was aimed at identifying the predicting factors for delayed BC diagnosis in women in Iran. Methods In this study, four machine learning methods, including extreme gradient boosting (XGBoost), random forest (RF), neural networks (NNs), and logistic regression (LR), were applied to analyze the data of 630 women with confirmed BC. Also, different statistical methods, including chi-square, p-value, sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (AUC), were utilized in different steps of the survey. Results Thirty percent of patients had a delayed BC diagnosis. Of all the patients with delayed diagnoses, 88.5% were married, 72.1% had an urban residency, and 84.8% had health insurance. The top three important factors in the RF model were urban residency (12.04), breast disease history (11.58), and other comorbidities (10.72). In the XGBoost, urban residency (17.54), having other comorbidities (17.14), and age at first childbirth (>30) (13.13) were the top factors; in the LR model, having other comorbidities (49.41), older age at first childbirth (82.57), and being nulliparous (44.19) were the top factors. Finally, in the NN, it was found that being married (50.05), having a marriage age above 30 (18.03), and having other breast disease history (15.83) were the main predicting factors for a delayed BC diagnosis. Conclusion Machine learning techniques suggest that women with an urban residency who got married or had their first child at an age older than 30 and those without children are at a higher risk of diagnosis delay. It is necessary to educate them about BC risk factors, symptoms, and breast self-examination to shorten the delay in diagnosis.
Affiliation(s)
- Samira Dehdar
  - Computational Intelligence & Intelligent Optimization Research Group, Business and Economic School, Persian Gulf University, Bushehr, Iran
- Khodakaram Salimifard
  - Computational Intelligence & Intelligent Optimization Research Group, Business and Economic School, Persian Gulf University, Bushehr, Iran
- Reza Mohammadi
  - Business Analytics Section, Amsterdam Business School, University of Amsterdam, Amsterdam, Netherlands
- Maryam Marzban
  - Department of Public Health, School of Public Health, Bushehr University of Medical Science, Bushehr, Iran
- Sara Saadatmand
  - Computational Intelligence & Intelligent Optimization Research Group, Business and Economic School, Persian Gulf University, Bushehr, Iran
- Mohammad Fararouei
  - Department of Epidemiology, School of Public Health, Shiraz University of Medical Sciences, Shiraz, Iran
- Mostafa Dianati-Nasab
  - Department of Complex Genetics and Epidemiology, School of Nutrition and Translational Research in Metabolism, Maastricht University, Maastricht, Netherlands
|
28
|
Hu X, Hu X, Yu Y, Wang J. Prediction model for gestational diabetes mellitus using the XG Boost machine learning algorithm. Front Endocrinol (Lausanne) 2023; 14:1105062. [PMID: 36967760 PMCID: PMC10034315 DOI: 10.3389/fendo.2023.1105062] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/22/2022] [Accepted: 01/30/2023] [Indexed: 03/29/2023] Open
Abstract
OBJECTIVE To develop the extreme gradient boosting (XG Boost) machine learning (ML) model for predicting gestational diabetes mellitus (GDM) compared with a model using the traditional logistic regression (LR) method. METHODS A case-control study was carried out among pregnant women, who were assigned to either the training set (these women were recruited from August 2019 to November 2019) or the testing set (these women were recruited in August 2020). We applied the XG Boost ML model approach to identify the best set of predictors out of a set of 33 variables. The performance of the prediction model was determined by using the area under the receiver operating characteristic (ROC) curve (AUC) to assess discrimination, and the Hosmer-Lemeshow (HL) test and calibration plots to assess calibration. Decision curve analysis (DCA) was introduced to evaluate the clinical use of each of the models. RESULTS A total of 735 and 190 pregnant women were included in the training and testing sets, respectively. The XG Boost ML model, which included 20 predictors, resulted in an AUC of 0.946 and yielded a predictive accuracy of 0.875, whereas the model using a traditional LR included four predictors and presented an AUC of 0.752 and yielded a predictive accuracy of 0.786. The HL test and calibration plots show that the two models have good calibration. DCA indicated that treating only those women whom the XG Boost ML model predicts are at risk of GDM confers a net benefit compared with treating all women or treating none. CONCLUSIONS The established model using XG Boost ML showed better predictive ability than the traditional LR model in terms of discrimination. The calibration performance of both models was good.
Affiliation(s)
- Xiaoqi Hu
  - Department of Nursing, Yantian District People's Hospital, Shenzhen, Guangdong, China
- Xiaolin Hu
  - School of Basic Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China
- Ya Yu
  - Department of Nursing, Guangzhou First People's Hospital, Guangzhou, Guangdong, China
- Jia Wang
  - Department of Nursing, Shenzhen Hospital of Southern Medical University, Shenzhen, Guangdong, China
|
29
|
Srisongkram T, Weerapreeyakul N. Drug Repurposing against KRAS Mutant G12C: A Machine Learning, Molecular Docking, and Molecular Dynamics Study. Int J Mol Sci 2022; 24:ijms24010669. [PMID: 36614109 PMCID: PMC9821013 DOI: 10.3390/ijms24010669] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/29/2022] [Revised: 12/23/2022] [Accepted: 12/27/2022] [Indexed: 01/03/2023] Open
Abstract
The Kirsten rat sarcoma viral G12C (KRASG12C) protein is one of the most common mutations in non-small-cell lung cancer (NSCLC). KRASG12C inhibitors are promising for NSCLC treatment, but their weaker activity in resistant tumors is their drawback. This study aims to identify new KRASG12C inhibitors from among the FDA-approved covalent drugs by taking advantage of artificial intelligence. The machine learning models were constructed using an extreme gradient boosting (XGBoost) algorithm. The models can predict KRASG12C inhibitors well, with an accuracy score of validation = 0.85 and Q2Ext = 0.76. From 67 FDA-covalent drugs, afatinib, dacomitinib, acalabrutinib, neratinib, zanubrutinib, dutasteride, and finasteride were predicted to be active inhibitors. Afatinib obtained the highest predictive log-inhibitory concentration at 50% (pIC50) value against KRASG12C protein close to the KRASG12C inhibitors. Only afatinib, neratinib, and zanubrutinib covalently bond at the active site like the KRASG12C inhibitors in the KRASG12C protein (PDB ID: 6OIM). Moreover, afatinib, neratinib, and zanubrutinib exhibited a distance deviation between the KRASG12C protein-ligand complex similar to the KRASG12C inhibitors. Therefore, afatinib, neratinib, and zanubrutinib could be used as drug candidates against the KRASG12C protein. This finding unfolds the benefit of artificial intelligence in drug repurposing against KRASG12C protein.
|
30
|
Xiong S, Liu Z, Min C, Shi Y, Zhang S, Liu W. Compressive Strength Prediction of Cemented Backfill Containing Phosphate Tailings Using Extreme Gradient Boosting Optimized by Whale Optimization Algorithm. Materials (Basel) 2022; 16:308. [PMID: 36614647 PMCID: PMC9821812 DOI: 10.3390/ma16010308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2022] [Revised: 12/19/2022] [Accepted: 12/22/2022] [Indexed: 06/17/2023]
Abstract
Unconfined compressive strength (UCS) is the most significant mechanical index for cemented backfill, and it is mainly determined by traditional mechanical tests. This study optimized the extreme gradient boosting (XGBoost) model by utilizing the whale optimization algorithm (WOA) to construct a hybrid model for the UCS prediction of cemented backfill. The PT proportion, the OPC proportion, the FA proportion, the solid concentration, and the curing age were selected as input variables, and the UCS of the cemented PT backfill was selected as the output variable. The original XGBoost model, the XGBoost model optimized by particle swarm optimization (PSO-XGBoost), and the decision tree (DT) model were also constructed for comparison with the WOA-XGBoost model. The results showed that the values of the root mean square error (RMSE), coefficient of determination (R2), and mean absolute error (MAE) obtained from the WOA-XGBoost model, XGBoost model, PSO-XGBoost model, and DT model were equal to (0.241, 0.967, 0.184), (0.426, 0.917, 0.336), (0.316, 0.943, 0.258), and (0.464, 0.852, 0.357), respectively. The results show that the proposed WOA-XGBoost has better prediction accuracy than the other machine learning models, confirming the ability of the WOA to enhance XGBoost in cemented PT backfill strength prediction. The WOA-XGBoost model could be a fast and accurate method for the UCS prediction of cemented PT backfill.
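The three regression metrics reported in this abstract (RMSE, R2, and MAE) follow standard definitions, sketched below in plain Python. The function name and the sample values are hypothetical and are not taken from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, R2 and MAE for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    r2 = 1.0 - ss_res / ss_tot   # undefined when all observations are equal
    return {"rmse": rmse, "r2": r2, "mae": mae}

# Toy strength values, invented for illustration only
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
scores = regression_metrics(observed, predicted)
```

Under this definition R2 becomes negative whenever a model predicts worse than the mean of the observed values, which is why lower RMSE and MAE together with a higher R2 indicate the better model.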
Affiliation(s)
- Ying Shi
  - Correspondence: ; Tel.: +86-18670351208
|
31
|
Toma RN, Gao Y, Piltan F, Im K, Shon D, Yoon TH, Yoo DS, Kim JM. Classification Framework of the Bearing Faults of an Induction Motor Using Wavelet Scattering Transform-Based Features. Sensors (Basel) 2022; 22:s22228958. [PMID: 36433553 PMCID: PMC9696953 DOI: 10.3390/s22228958] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/29/2022] [Revised: 11/08/2022] [Accepted: 11/16/2022] [Indexed: 05/27/2023]
Abstract
In machine learning and data science pipelines, researchers consider feature extraction the most crucial component, and generating a discriminative feature matrix is the most challenging task in achieving high classification accuracy. Generally, the classical feature extraction techniques are sensitive to the noisy component of the signal and need more time for training. To deal with these issues, a comparatively new feature extraction technique, referred to as the wavelet scattering transform (WST), is utilized and incorporated with ML classifiers to design a framework for bearing fault classification in this paper. The WST is a knowledge-based technique, and its structure is similar to that of a convolutional neural network. This technique provides low-variance features of real-valued signals, which are usually necessary for classification tasks. These signals are resistant to signal deformation and preserve information at high frequencies. The current signal data from a publicly available dataset for three different bearing conditions are considered. By combining the scattering path coefficients, the decomposition coefficients from the 0th and 1st layers are considered as features. The experimental results demonstrate that WST-based features, when used with ensemble ML algorithms, could achieve more than 99% classification accuracy. The performance of ANN models with these features is similar. This work shows that utilizing WST coefficients of the motor current signal as features can improve the bearing fault classification accuracy when compared to other feature extraction approaches such as empirical wavelet transform (EWT), information fusion (IF), and wavelet packet decomposition (WPD). Thus, our proposed approach can be considered an effective classification method for the fault diagnosis of rotating machinery.
Affiliation(s)
- Rafia Nishat Toma
- Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 44610, Republic of Korea
| | - Yangde Gao
- Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 44610, Republic of Korea
| | - Farzin Piltan
- Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 44610, Republic of Korea
| | - Kichang Im
- ICT Convergence Safety Research Center, University of Ulsan, Ulsan 44610, Republic of Korea
| | - Dongkoo Shon
- Electronics and Telecommunications Research Institute (ETRI), Daejeon 34129, Republic of Korea
| | - Tae Hyun Yoon
- Electronics and Telecommunications Research Institute (ETRI), Daejeon 34129, Republic of Korea
| | - Dae-Seung Yoo
- Electronics and Telecommunications Research Institute (ETRI), Daejeon 34129, Republic of Korea
| | - Jong-Myon Kim
- Department of Electrical, Electronics and Computer Engineering, University of Ulsan, Ulsan 44610, Republic of Korea
- PD Technologies Cooperation, Ulsan 44610, Republic of Korea
| |
|
32
|
Kim M, Okuyucu O, Ordu E, Ordu S, Arslan Ö, Ko J. Prediction of Undrained Shear Strength by the GMDH-Type Neural Network Using SPT-Value and Soil Physical Properties. Materials (Basel) 2022; 15:6385. [PMID: 36143696 PMCID: PMC9502201 DOI: 10.3390/ma15186385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Revised: 08/31/2022] [Accepted: 09/07/2022] [Indexed: 06/16/2023]
Abstract
This study presents a novel method for predicting the undrained shear strength (cu) using artificial intelligence technology. The cu value is critical in geotechnical applications and difficult to directly determine without laboratory tests. The group method of data handling (GMDH)-type neural network (NN) was utilized for the prediction of cu. The GMDH-type NN models were designed with various combinations of input parameters. In the prediction, the effective stress (σv'), standard penetration test result (NSPT), liquid limit (LL), plastic limit (PL), and plasticity index (PI) were used as input parameters in the design of the prediction models. In addition, the GMDH-type NN models were compared with the most commonly used method (i.e., linear regression) and other regression models such as random forest (RF) and support vector regression (SVR) models as comparative methods. In order to evaluate each model, the correlation coefficient (R2), mean absolute error (MAE), and root mean square error (RMSE) were calculated for different input parameter combinations. The most effective model, the GMDH-type NN with input parameters (e.g., σv', NSPT, LL, PL, PI), had a higher correlation coefficient (R2 = 0.83) and lower error rates (MAE = 14.64 and RMSE = 22.74) than other methods used in the prediction of cu value. Furthermore, the impact of input variables on the model output was investigated using the SHAP (SHApley Additive ExPlanations) technique based on the extreme gradient boosting (XGBoost) ensemble learning algorithm. The results demonstrated that using the GMDH-type NN is an efficient method in obtaining a new empirical mathematical model to provide a reliable prediction of the undrained shear strength of soils.
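The three evaluation measures named in this abstract can be computed directly from paired observations and predictions. The following is a minimal sketch, assuming that R2 denotes the coefficient of determination (the abstract calls it a correlation coefficient, so this reading is an assumption). The function name and the sample values are hypothetical and are not data from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired observations and predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse

# Hypothetical shear-strength values, for illustration only.
measured = [40.0, 55.0, 70.0, 85.0, 100.0]
predicted = [45.0, 50.0, 75.0, 80.0, 95.0]
r2, mae, rmse = regression_metrics(measured, predicted)
print(f"R2={r2:.3f} MAE={mae:.2f} RMSE={rmse:.2f}")
```

MAE and RMSE are in the units of the target variable, so they are only comparable between models evaluated on the same data.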
Affiliation(s)
- Mintae Kim
- School of Civil, Environmental, and Architectural Engineering, Korea University, Seoul 02841, Korea
| | - Osman Okuyucu
- Department of Civil Engineering, Tekirdağ Namık Kemal University, Tekirdağ 59860, Turkey
| | - Ertuğrul Ordu
- Department of Civil Engineering, Tekirdağ Namık Kemal University, Tekirdağ 59860, Turkey
| | - Seyma Ordu
- Department of Environmental Engineering, Tekirdağ Namık Kemal University, Tekirdağ 59860, Turkey
| | - Özkan Arslan
- Department of Electronics and Communication Engineering, Tekirdağ Namık Kemal University, Tekirdağ 59860, Turkey
| | - Junyoung Ko
- Department of Civil Engineering, Chungnam National University, Daejeon 34134, Korea
| |
|
33
|
Sun CK, Tang YX, Liu TC, Lu CJ. An Integrated Machine Learning Scheme for Predicting Mammographic Anomalies in High-Risk Individuals Using Questionnaire-Based Predictors. Int J Environ Res Public Health 2022; 19:ijerph19159756. [PMID: 35955112 PMCID: PMC9368335 DOI: 10.3390/ijerph19159756] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Revised: 08/02/2022] [Accepted: 08/06/2022] [Indexed: 05/09/2023]
Abstract
This study aimed to investigate the important predictors related to predicting positive mammographic findings based on questionnaire-based demographic and obstetric/gynecological parameters using the proposed integrated machine learning (ML) scheme. The scheme combines the benefits of two well-known ML algorithms, namely, least absolute shrinkage and selection operator (Lasso) logistic regression and extreme gradient boosting (XGB), to provide adequate prediction for mammographic anomalies in high-risk individuals and the identification of significant risk factors. We collected questionnaire data on 18 breast-cancer-related risk factors from women who participated in a national mammographic screening program between January 2017 and December 2020 at a single tertiary referral hospital to correlate with their mammographic findings. The acquired data were retrospectively analyzed using the proposed integrated ML scheme. Based on the data from 21,107 valid questionnaires, the results showed that the Lasso logistic regression models with variable combinations generated by XGB could provide more effective prediction results. The top five significant predictors for positive mammography results were younger age, breast self-examination, older age at first childbirth, nulliparity, and history of mammography within 2 years, suggesting a need for timely mammographic screening for women with these risk factors.
Affiliation(s)
- Cheuk-Kay Sun
- Division of Hepatology and Gastroenterology, Department of Internal Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei 11101, Taiwan
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- School of Medicine, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- School of Medicine, Taipei Medical University, Taipei 11031, Taiwan
| | - Yun-Xuan Tang
- Department of Radiology, Shin Kong Wu Ho-Su Memorial Hospital, Taipei 11101, Taiwan
- Department of Medical Imaging and Radiological Technology, Yuanpei University of Medical Technology, Hsinchu 30015, Taiwan
| | - Tzu-Chi Liu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 24205, Taiwan
| | - Chi-Jie Lu
- Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 24205, Taiwan
- Department of Information Management, Fu Jen Catholic University, New Taipei City 24205, Taiwan
| |
|
34
|
Stang M, Krämer B, Nagl C, Schäfers W. From human business to machine learning—methods for automating real estate appraisals and their practical implications. Z Immobilienökonomie 2022. [PMCID: PMC9294847 DOI: 10.1365/s41056-022-00063-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Until recently, in most countries, the use of Automated Valuation Models (AVMs) in the lending process was only allowed for support purposes, and not as the sole value-determining tool. However, this is currently changing, and regulators around the world are actively discussing the approval of AVMs. The discussion is generally limited to AVMs that are based on already established methods, such as an automation of the traditional sales comparison approach or linear regressions. Modern machine learning approaches are almost completely excluded from the debate. Accordingly, this study contributes to the discussion on why AVMs based on machine learning approaches should also be considered. For this purpose, an automation of the sales comparison method using filters and similarity functions, two hedonic price functions (an OLS model and a GAM model), and an XGBoost machine learning approach are applied to a dataset of 1.2 million residential properties across Germany. We find that the machine learning method XGBoost offers the best overall performance regarding the accuracy of estimations. Practical application shows that optimization of the established methods (OLS and GAM) is time-consuming and labor-intensive, and has significant disadvantages when implemented on a national scale. In addition, our results show that different types of methods perform best in different regions and, thus, regulators should not focus on a single method but consider a multitude of them.
Affiliation(s)
- Moritz Stang
- International Real Estate Business School, University of Regensburg, Regensburg, Germany
| | - Bastian Krämer
- International Real Estate Business School, University of Regensburg, Regensburg, Germany
| | - Cathrine Nagl
- International Real Estate Business School, University of Regensburg, Regensburg, Germany
| | - Wolfgang Schäfers
- International Real Estate Business School, University of Regensburg, Regensburg, Germany
| |
|
35
|
Zhou Y, Han F, Shi XL, Zhang JX, Li GY, Yuan CC, Lu GT, Hu LH, Pan JJ, Xiao WM, Yao GH. Prediction of the severity of acute pancreatitis using machine learning models. Postgrad Med 2022. [PMID: 35801388 DOI: 10.1080/00325481.2022.2099193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2022]
Abstract
BACKGROUND Acute pancreatitis (AP) is the most common pancreatic disease. Predicting the severity of AP is critical for making preventive decisions. However, the performance of existing scoring systems in predicting AP severity is not satisfactory. The purpose of this study was to develop predictive models for the severity of AP using machine learning (ML) algorithms and to explore the important predictors that affected the prediction results. METHODS The data of 441 patients in the Department of Gastroenterology in our hospital were analyzed retrospectively. Demographic data, routine blood and blood biochemical indexes, and the CTSI score were collected to develop five different ML models for predicting the severity of AP. The performance of the models was evaluated by the area under the receiver operating characteristic curve (AUC). The important predictors were determined by ranking the feature importance of the predictive factors. RESULTS Compared to the other ML models, the extreme gradient boosting (XGBoost) model showed better performance in predicting severe AP, with an AUC of 0.906, an accuracy of 0.902, a sensitivity of 0.700, a specificity of 0.961, and an F1 score of 0.764. Further analysis showed that the CTSI score, ALB, LDH, and NEUT were the important predictors of the severity of AP. CONCLUSION The results showed that the XGBoost algorithm can accurately predict the severity of AP, which can assist clinicians in identifying severe AP at an early stage.
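Accuracy, sensitivity, specificity, and the F1 score reported above all derive from the same four confusion-matrix counts. The sketch below shows one way to compute them for binary labels; the function name and the toy labels are hypothetical, and it assumes that both classes occur and that at least one positive prediction is made (otherwise a division by zero would occur).

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    sensitivity = tp / (tp + fn)   # recall on the positive (severe) class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical labels (1 = severe, 0 = not severe), for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(labels, predictions)
print(metrics)
```

When the positive class is rare, accuracy and specificity can be high while sensitivity stays modest, which is the pattern in the figures quoted in the abstract.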
|
36
|
Boeckaerts D, Stock M, De Baets B, Briers Y. Identification of Phage Receptor-Binding Protein Sequences with Hidden Markov Models and an Extreme Gradient Boosting Classifier. Viruses 2022; 14:1329. [PMID: 35746800 DOI: 10.3390/v14061329] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/30/2022] [Revised: 06/09/2022] [Accepted: 06/16/2022] [Indexed: 11/30/2022] Open
Abstract
Receptor-binding proteins (RBPs) of bacteriophages initiate the infection of their corresponding bacterial host and act as the primary determinant for host specificity. The ever-increasing amount of sequence data enables the development of predictive models for the automated identification of RBP sequences. However, the development of such models is challenged by the inconsistent or missing annotation of many phage proteins. Recently developed tools have started to bridge this gap but are not specifically focused on RBP sequences, for which many different annotations are available. We have developed two parallel approaches to alleviate the complex identification of RBP sequences in phage genomic data. The first combines known RBP-related hidden Markov models (HMMs) from the Pfam database with custom-built HMMs to identify phage RBPs based on protein domains. The second approach consists of training an extreme gradient boosting classifier that can accurately discriminate between RBPs and other phage proteins. We explained how these complementary approaches can reinforce each other in identifying RBP sequences. In addition, we benchmarked our methods against the recently developed PhANNs tool. Our best performing model reached a precision-recall area-under-the-curve of 93.8% and outperformed PhANNs on an independent test set, reaching an F1-score of 84.0% compared to 69.8%.
|
37
|
Wang Y, Miao X, Xiao G, Huang C, Sun J, Wang Y, Li P, You X. Clinical Prediction of Heart Failure in Hemodialysis Patients: Based on the Extreme Gradient Boosting Method. Front Genet 2022; 13:889378. [PMID: 35559036 PMCID: PMC9086166 DOI: 10.3389/fgene.2022.889378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 03/15/2022] [Indexed: 11/18/2022] Open
Abstract
Background: Heart failure (HF) is the main cause of mortality in hemodialysis (HD) patients. However, it is still a challenge for the prediction of HF in HD patients. Therefore, we aimed to establish and validate a prediction model to predict HF events in HD patients. Methods: A total of 355 maintenance HD patients from two hospitals were included in this retrospective study. A total of 21 variables, including traditional demographic characteristics, medical history, and blood biochemical indicators, were used. Two classification models were established based on the extreme gradient boosting (XGBoost) algorithm and traditional linear logistic regression. The performance of the two models was evaluated based on calibration curves and area under the receiver operating characteristic curves (AUCs). Feature importance and SHapley Additive exPlanation (SHAP) were used to recognize risk factors from the variables. The Kaplan–Meier curve of each risk factor was constructed and compared with the log-rank test. Results: Compared with the traditional linear logistic regression, the XGBoost model had better performance in accuracy (78.5 vs. 74.8%), sensitivity (79.6 vs. 75.6%), specificity (78.1 vs. 74.4%), and AUC (0.814 vs. 0.722). The feature importance and SHAP value of XGBoost indicated that age, hypertension, platelet count (PLT), C-reactive protein (CRP), and white blood cell count (WBC) were risk factors of HF. These results were further confirmed by Kaplan–Meier curves. Conclusions: The HF prediction model based on XGBoost had a satisfactory performance in predicting HF events, which could prove to be a useful tool for the early prediction of HF in HD.
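The Kaplan–Meier curves mentioned in this abstract are built with the product-limit estimator. The following is a minimal sketch of that estimator only; it does not include the log-rank comparison, the function name is hypothetical, and the follow-up times are invented for illustration rather than taken from the study.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up duration for each patient
    events: 1 if the event was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events occurring exactly at time t
        leaving = 0    # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times in months; 0 marks a censored patient.
followup = [2, 3, 3, 5, 8, 8]
observed_event = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(followup, observed_event)
for t, s in km_curve:
    print(t, round(s, 3))
```

Censored patients lower the number at risk without causing a step in the curve, which is why the estimate only changes at observed event times.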
Affiliation(s)
- Yanfeng Wang
- The School of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China
| | - Xisha Miao
- The School of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China
| | - Gang Xiao
- Department of Clinical Laboratory, The Third Affiliated Hospital, Southern Medical University, Guangzhou, China
| | - Chun Huang
- The School of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China
| | - Junwei Sun
- The School of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China
| | - Ying Wang
- Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China
| | - Panlong Li
- The School of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, China
| | - Xu You
- Department of Clinical Laboratory, The Third Affiliated Hospital, Southern Medical University, Guangzhou, China
| |
|
38
|
Li Y, Zou Z, Gao Z, Wang Y, Xiao M, Xu C, Jiang G, Wang H, Jin L, Wang J, Wang HZ, Guo S, Wu J. Prediction of lung cancer risk in Chinese population with genetic-environment factor using extreme gradient boosting. Cancer Med 2022; 11:4469-4478. [PMID: 35499292 PMCID: PMC9741969 DOI: 10.1002/cam4.4800] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/12/2020] [Revised: 04/22/2022] [Accepted: 04/24/2022] [Indexed: 02/03/2023] Open
Abstract
BACKGROUND Detecting early-stage lung cancer is critical to reduce the lung cancer mortality rate; however, existing models based on germline variants perform poorly, and new models are needed. This study aimed to use extreme gradient boosting to develop a predictive model for the early diagnosis of lung cancer in a multicenter case-control study. MATERIALS AND METHODS A total of 974 cases and 1005 controls in Shanghai and Taizhou were recruited, and 61 single nucleotide polymorphisms (SNPs) were genotyped. Multivariate logistic regression was used to calculate the association between signal SNPs and lung cancer risk. Logistic regression (LR) and extreme gradient boosting (XGBoost) algorithms, a large-scale machine learning algorithm, were adopted to build the lung cancer risk model. In both models, 10-fold cross-validation was performed, and model predictive performance was evaluated by the area under the curve (AUC). RESULTS After FDR adjustment, TYMS rs3819102 and BAG6 rs1077393 were significantly associated with lung cancer risk (p < 0.05). For lung cancer risk prediction, the model predicted only with epidemiology attained an AUC of 0.703 for LR and 0.744 for XGBoost. Compared with the LR model predicted only with epidemiology, further adding SNPs and applying XGBoost increased the AUC to 0.759 (p < 0.001) in the XGBoost model. BAG6 rs1077393 was the most important predictor among all SNPs in the lung cancer prediction XGBoost model, followed by TERT rs2735845 and CAMKK1 rs7214723. Further stratification in lung adenocarcinoma (ADC) showed a significantly elevated performance from 0.639 to 0.699 (p = 0.009) when applying XGBoost and adding SNPs to the model, while the best model for lung squamous cell carcinoma (SCC) prediction was the LR model predicted with epidemiology and SNPs (AUC = 0.833), compared with the XGBoost model (AUC = 0.816). 
CONCLUSION Our lung cancer risk prediction models in the Chinese population have a strong predictive ability, especially for SCC. Adding SNPs and applying the XGBoost algorithm to the epidemiologic-based logistic regression risk prediction model significantly improves model performance.
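The 10-fold cross-validation described in the methods partitions the samples into ten folds and holds each one out in turn. The sketch below generates such index splits for the 1979 participants (974 cases plus 1005 controls). It is a plain shuffled split; whether the study stratified folds by case status is not stated, so that choice, the seed, and the function name are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

n_participants = 974 + 1005  # cases + controls reported in the abstract
splits = list(k_fold_indices(n_participants, k=10))
for train, test in splits:
    print(len(train), len(test))
```

A model would be fitted on each training part and scored on the held-out part, with the ten AUC values then averaged.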
Affiliation(s)
- Yutao Li
- School of Life Sciences, Fudan University, Shanghai, China
| | - Zixiu Zou
- School of Life Sciences, Fudan University, Shanghai, China
| | - Zhunyi Gao
- Company 6 of Basic Medical School, Navy Military Medical University, Shanghai, China
| | - Yi Wang
- School of Life Sciences, Fudan University, Shanghai, China
| | - Man Xiao
- Department of Biochemistry and Molecular Biology, Hainan Medical University, Haikou, China
| | - Chang Xu
- Clinical College of Xiangnan University, Chenzhou, China
| | - Gengxi Jiang
- Department of Thoracic Surgery, the First Affiliated Hospital of Naval Medical University (Second Military Medical University), Shanghai, China
| | - Haijian Wang
- School of Life Sciences, Fudan University, Shanghai, China
| | - Li Jin
- School of Life Sciences, Fudan University, Shanghai, China
| | - Jiucun Wang
- School of Life Sciences, Fudan University, Shanghai, China
| | - Huai Zhou Wang
- Department of Laboratory Diagnosis, the First Affiliated Hospital of Naval Medical University (Second Military Medical University), Shanghai, China
| | - Shicheng Guo
- School of Life Sciences, Fudan University, Shanghai, China
| | - Junjie Wu
- School of Life Sciences, Fudan University, Shanghai, China
- Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
- Department of Pulmonary and Critical Care Medicine, Shanghai Geriatric Medical Center, Shanghai, China
| |
|
39
|
Wang R, Wang L, Zhang J, He M, Xu J. XGBoost machine learning algorism performed better than regression models in predicting mortality of moderate to severe traumatic brain injury. World Neurosurg 2022:S1878-8750(22)00492-2. [PMID: 35430400 DOI: 10.1016/j.wneu.2022.04.044] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/03/2021] [Revised: 04/08/2022] [Accepted: 04/09/2022] [Indexed: 02/08/2023]
Abstract
BACKGROUND Traumatic brain injury (TBI) carries a severe risk of mortality and morbidity. Predicting the outcome of these patients is necessary for physicians to choose suitable treatments and improve prognosis. The aim of this study is to develop a mortality prediction approach using XGBoost (extreme gradient boosting) in moderate to severe TBI. METHODS A total of 368 patients hospitalized in West China Hospital for TBI with a GCS below 13 were identified. To construct the XGBoost prediction approach, patients were divided into a training set and a test set at a ratio of 7:3. A logistic regression prediction model was also constructed and compared with the XGBoost model. The area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were calculated to compare the prognostic value of XGBoost and logistic regression. RESULTS A total of 205 patients suffered a poor outcome, with a mortality of 55.7%. Non-survivors had a lower Glasgow Coma Scale (GCS) score (5 vs 7, p<0.001) and a higher Injury Severity Score (ISS) than survivors (25 vs 16, p<0.001). Platelet (p<0.001), albumin (p<0.001), and hemoglobin (p<0.001) were significantly lower in non-survivors, while glucose (p<0.001) and prothrombin time (PT) (p<0.001) were significantly higher in non-survivors. In the XGBoost approach, GCS, PT, and glucose had the highest feature importance. The AUC (0.955 vs 0.805) and accuracy (0.955 vs 0.70) of XGBoost were both higher than those of logistic regression. CONCLUSION Predicting mortality of moderate to severe TBI patients using the XGBoost algorithm is more effective and precise than using logistic regression. The XGBoost prediction approach can help physicians evaluate TBI patients at high risk of poor outcome.
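The AUC used here to compare XGBoost with logistic regression equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The following is a minimal sketch of that pairwise definition; the function name and both sets of scores are hypothetical and do not come from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))

# Hypothetical test-set outcomes (1 = died) and two models' predicted risks.
outcome = [1, 1, 1, 0, 0, 0]
model_a_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
model_b_scores = [0.6, 0.4, 0.5, 0.5, 0.45, 0.2]
auc_a = roc_auc(outcome, model_a_scores)
auc_b = roc_auc(outcome, model_b_scores)
print(round(auc_a, 3), round(auc_b, 3))
```

The pairwise loop is quadratic in the number of samples, which is fine for a test set of roughly a hundred patients; a rank-based formula would be preferable for large data.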
|
40
|
Abdu Gumaei, Walaa N. Ismail, Md. Rafiul Hassan, Mohammad Mehedi Hassan, Ebtsam Mohamed, Abdullah Alelaiwi, Giancarlo Fortino. A Decision-Level Fusion Method for COVID-19 Patient Health Prediction. Big Data Research 2022; 27. [DOI: 10.1016/j.bdr.2021.100287] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/26/2020] [Revised: 08/11/2021] [Accepted: 10/28/2021] [Indexed: 06/16/2023]
Abstract
With the continuous attempts to develop effective machine learning methods, information fusion approaches play an important role in integrating data from multiple sources and improving the performance of these methods. Among the different fusion techniques, decision-level fusion has the unique advantage of fusing the decisions of various classifiers to obtain an effective outcome. In this paper, we propose a decision-level fusion method that combines three well-calibrated ensemble classifiers, namely random forest (RF), gradient boosting (GB), and extreme gradient boosting (XGB). It is used to predict COVID-19 patient health for early monitoring and efficient treatment. A soft voting technique is used to generate the final decision from the predictions of these calibrated classifiers. The method uses the COVID-19 patient's health information, travel and demographic data, and geographical data to predict the possible outcome of the COVID-19 case: recovery or death. Several sets of experiments are conducted on a public novel coronavirus 2019 dataset using different test-set ratios. The experimental results show that, on a 20% test set taken from the dataset, the proposed fusion method achieved an accuracy of 97.24% and an F1-score of 0.97, which is higher than current related work with an accuracy of 94% and an F1-score of 0.86.
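Soft voting, as used in this abstract, averages the class probabilities produced by the individual classifiers and picks the class with the highest mean. The sketch below illustrates that rule with equal weights; the function name, the class ordering, and the probabilities are hypothetical, and equal weighting is an assumption because the abstract does not state the weights.

```python
def soft_vote(probabilities):
    """Average per-classifier class probabilities and return (label, averages).

    probabilities: one list of class probabilities per classifier,
                   all in the same class order.
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    averages = [
        sum(model[c] for model in probabilities) / n_models
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=lambda c: averages[c])
    return label, averages

# Hypothetical calibrated outputs for one patient; index 0 = recovered, 1 = death.
rf_proba = [0.90, 0.10]
gb_proba = [0.45, 0.55]
xgb_proba = [0.40, 0.60]
fused_label, fused_proba = soft_vote([rf_proba, gb_proba, xgb_proba])
print(fused_label, [round(p, 3) for p in fused_proba])
```

In this toy case two of the three classifiers lean towards class 1, yet the averaged probabilities favour class 0 because one classifier is far more confident. That sensitivity to confidence is why soft voting benefits from well-calibrated inputs.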
|
41
|
Sung SF, Hsieh CY, Hu YH. Early Prediction of Functional Outcomes After Acute Ischemic Stroke Using Unstructured Clinical Text: Retrospective Cohort Study. JMIR Med Inform 2022; 10:e29806. [PMID: 35175201 PMCID: PMC8895286 DOI: 10.2196/29806] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/28/2021] [Revised: 07/17/2021] [Accepted: 01/02/2022] [Indexed: 02/06/2023] Open
Abstract
Background Several prognostic scores have been proposed to predict functional outcomes after an acute ischemic stroke (AIS). Most of these scores are based on structured information and have been used to develop prediction models via the logistic regression method. With the increased use of electronic health records and the progress in computational power, data-driven predictive modeling by using machine learning techniques is gaining popularity in clinical decision-making. Objective We aimed to investigate whether machine learning models created by using unstructured text could improve the prediction of functional outcomes at an early stage after AIS. Methods We identified all consecutive patients who were hospitalized for the first time for AIS from October 2007 to December 2019 by using a hospital stroke registry. The study population was randomly split into a training (n=2885) and test set (n=962). Free text in histories of present illness and computed tomography reports was transformed into input variables via natural language processing. Models were trained by using the extreme gradient boosting technique to predict a poor functional outcome at 90 days poststroke. Model performance on the test set was evaluated by using the area under the receiver operating characteristic curve (AUC). Results The AUCs of text-only models ranged from 0.768 to 0.807 and were comparable to that of the model using National Institutes of Health Stroke Scale (NIHSS) scores (0.811). Models using both patient age and text achieved AUCs of 0.823 and 0.825, which were similar to those of the model containing age and NIHSS scores (0.841); the model containing preadmission comorbidities, level of consciousness, age, and neurological deficit (PLAN) scores (0.837); and the model containing Acute Stroke Registry and Analysis of Lausanne (ASTRAL) scores (0.840). 
Adding variables from clinical text improved the predictive performance of the model containing age and NIHSS scores, the model containing PLAN scores, and the model containing ASTRAL scores (the AUC increased from 0.841 to 0.861, from 0.837 to 0.856, and from 0.840 to 0.860, respectively). Conclusions Unstructured clinical text can be used to improve the performance of existing models for predicting poststroke functional outcomes. However, considering the different terminologies that are used across health systems, each individual health system may consider using the proposed methods to develop and validate its own models.
Affiliation(s)
- Sheng-Feng Sung
- Division of Neurology, Department of Internal Medicine, Ditmanson Medical Foundation Chia-Yi Christian Hospital, Chiayi City, Taiwan
- Department of Nursing, Min-Hwei Junior College of Health Care Management, Tainan, Taiwan
| | - Cheng-Yang Hsieh
- Department of Neurology, Tainan Sin Lau Hospital, Tainan, Taiwan
| | - Ya-Han Hu
- Department of Information Management, National Central University, Taoyuan City, Taiwan
| |
|
42
|
Tang M, Gao L, He B, Yang Y. Machine Learning-Based Prognostic Prediction Models of Non-Metastatic Colon Cancer: Analyses Based on Surveillance, Epidemiology and End Results Database and a Chinese Cohort. Cancer Manag Res 2022; 14:25-35. [PMID: 35018119 PMCID: PMC8742582 DOI: 10.2147/cmar.s340739] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/23/2021] [Accepted: 12/01/2021] [Indexed: 12/16/2022] Open
Abstract
Purpose: The present study aimed to develop prognostic prediction models based on machine learning (ML) for non-metastatic colon cancer (CRC), which can provide a precise quantitative risk assessment and serve as an assistive method for treatment strategy development. The possibility of improving prediction accuracy using nonlinear methods compared to linear methods was investigated. Patients and Methods: A cancer-specific survival (CSS) model constructed using logistic regression, extreme gradient boosting (XGBoost), and random forest algorithms was trained on the Surveillance, Epidemiology, and End Results datasets for 15,254 patients with non-metastatic CRC (split into training [70%] and internal validation [30%] datasets) and externally validated with an outpatient cohort of 311 cases from Xiyuan Hospital in China. A Chinese cohort was also used to develop recurrence and metastasis (R&M) models for CRC patients. The experiments for each model were performed 100 times to obtain average scores and 95% confidence intervals. The model performance was evaluated using the area under the receiver operating characteristic curve (AUC) values. Results: The XGBoost approach showed the highest AUC values of 0.86 (0.84-0.88), 0.82 (0.81-0.83), and 0.81 (0.79-0.82) for one-, three-, and five-year CSS cohorts, respectively, along with a relatively high generalization ability. The XGBoost approach also performed best for the R&M model, with the AUC values of 0.71 (0.64-0.79), 0.79 (0.74-0.86), and 0.89 (0.82-0.95) for one-, three-, and five-year R&M cohorts, respectively. The rankings of predictor importance for the CSS and R&M models were different, and the higher model accuracy was associated with more prognostic predictors. Conclusion: Three different ML algorithms for developing prognostic prediction models for non-metastatic CRC were compared. The predictive performance results showed that the nonlinear XGBoost approach performed best, suggesting that it can be used for quantifying the prognostic risk. It was also demonstrated that the model performance can be improved when more prognostic predictors are considered.
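The abstract reports average scores with 95% confidence intervals over 100 repeated experiments but does not say how the intervals were computed. As a minimal sketch, assuming a simple percentile interval over the per-run scores (the function name `summarize_runs` and the example values are hypothetical, not taken from the paper), the summary step could look like this:

```python
import math
import statistics


def summarize_runs(scores, level=0.95):
    """Return (mean, lower, upper) for a list of per-run scores.

    Assumption: the interval is a plain percentile interval over the runs;
    the paper does not state which interval method it used.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    tail = (1.0 - level) / 2.0          # probability mass cut from each tail
    last = len(ordered) - 1
    lower = ordered[math.floor(tail * last)]
    upper = ordered[math.ceil((1.0 - tail) * last)]
    return statistics.mean(ordered), lower, upper


# Hypothetical AUC values standing in for 100 repeated experiments.
example_scores = [0.80 + 0.001 * i for i in range(100)]
mean_auc, ci_low, ci_high = summarize_runs(example_scores)
```

A percentile interval is only one option; a normal-approximation interval on the mean would give a narrower range and answers a different question (uncertainty of the average rather than spread of the runs).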
Affiliation(s)
- Mo Tang
  - Oncology Department, Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
- Lihao Gao
  - Smart City Business Unit, Baidu Inc., Beijing, People's Republic of China
- Bin He
  - Oncology Department, Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
- Yufei Yang
  - Oncology Department, Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
43
Lee S, Son SO, Park J, Park J. Ensemble-Based Methodology to Identify Optimal Personal Mobility Service Areas Using Public Data. KSCE J Civ Eng 2022; 26:3150-3159. [PMCID: PMC9077355 DOI: 10.1007/s12205-022-1356-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/29/2021] [Revised: 02/02/2022] [Accepted: 03/14/2022] [Indexed: 11/14/2023]
Abstract
Public transportation networks are well established in major cities, but in some cities public transportation remains inconvenient: it is less accessible, and the walking distance to reach it is too long. Compared to other cities, Seoul has a higher satisfaction rate with public transportation. There are many cases, however, where short-distance taxis are used because walking to destinations after using public transportation is inconvenient; personal mobility (PM) devices can be used for these short-distance trips instead. This study aims to find the optimal PM service area using geographic information system (GIS)-based analyses of public transportation big data. Variables were generated by collecting socio-economic factors, public transportation data, and geographic data, and extreme gradient boosting and random forest, two representative ensemble methods, were used for evaluation. We divided Seoul into a hexagonal grid and developed the optimal PM location service model by creating hexagonal cell data units and analyzing the areas with the models. We found that residential complexes, parks, and areas near subway stations (all areas with high foot traffic) are best suited for placement. We also determined that deployment should be in areas with gentler slopes. We expect this work to help determine public transportation stop and shared mobility station locations as well as contribute to public transportation demand surveys and accessibility analyses.
Affiliation(s)
- Sangjae Lee
  - Dept. of Transportation and Logistic Engineering, Hanyang University, Ansan, 15588 Korea
- Seung-oh Son
  - Dept. of Smart City Engineering, Hanyang University, Ansan, 15588 Korea
- Juneyoung Park
  - Dept. of Transportation and Logistic Engineering, Hanyang University, Ansan, 15588 Korea
  - Dept. of Smart City Engineering, Hanyang University, Ansan, 15588 Korea
- Jaehong Park
  - Dept. of Highway & Transportation Research, Korea Institute of Civil Engineering & Building Technology, Goyang, 10223 Korea
44
Wang R, Zhang J, Shan B, He M, Xu J. XGBoost Machine Learning Algorithm for Prediction of Outcome in Aneurysmal Subarachnoid Hemorrhage. Neuropsychiatr Dis Treat 2022; 18:659-667. [PMID: 35378822 PMCID: PMC8976557 DOI: 10.2147/ndt.s349956] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 11/25/2021] [Accepted: 03/09/2022] [Indexed: 11/23/2022]
Abstract
BACKGROUND Patients who suffer aneurysmal subarachnoid hemorrhage (aSAH) usually have poor survival and functional outcomes. Identifying aSAH patients at high risk of poor outcome is necessary for clinicians to choose a suitable therapeutic strategy. This study was conducted to develop a prognostic model for aSAH using the XGBoost (extreme gradient boosting) algorithm. METHODS A total of 351 aSAH patients admitted to West China Hospital were identified. Patients were divided into a training set and a test set at a ratio of 7:3 to test the predictive value of the XGBoost-based prognostic model. A logistic regression model was also constructed and compared with the XGBoost-based model. Area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were calculated to evaluate the XGBoost and logistic regression models. RESULTS There were 74 (21.1%) non-survivors and 148 (42.1%) patients with unfavorable functional outcome. Non-survivors were older (p=0.025) and had a lower Glasgow Coma Scale (GCS) score (p<0.001), a higher World Federation of Neurosurgical Societies (WFNS) score (p<0.001), and a higher mFisher score (p<0.001). The incidence of intraventricular hemorrhage (IVH) (p=0.025) and delayed cerebral ischemia (DCI) (p<0.001) was higher in non-survivors than in survivors. The AUCs of the XGBoost model for predicting mortality and unfavorable functional outcome were 0.950 and 0.958, higher than the 0.767 and 0.829 of the logistic regression model. CONCLUSION The XGBoost-based model is more precise than the logistic regression model in predicting the outcome of aSAH patients. Using the XGBoost prognostic model can help clinicians identify high-risk aSAH patients and strengthen their medical care accordingly.
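The three evaluation measures named in this abstract (AUC, sensitivity, specificity) can be illustrated with a short, dependency-free sketch. The labels, scores, and the 0.5 decision threshold below are made-up illustration values, and the function names are hypothetical; nothing here reproduces the study's data or code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes are present; the threshold is an illustrative choice.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up outcomes (1 = event) and predicted risks for six patients.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(labels, scores)
sens, spec = sensitivity_specificity(labels, scores)
```

AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why abstracts such as this one usually report them together.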
Affiliation(s)
- Ruoran Wang
  - Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People's Republic of China
- Jing Zhang
  - Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People's Republic of China
- Baoyin Shan
  - Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People's Republic of China
- Min He
  - Department of Critical care medicine, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People's Republic of China
- Jianguo Xu
  - Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu, Sichuan Province, People's Republic of China
45
Zhou S, Sun W, Zhang P, Li L. Predicting Pseudogene-miRNA Associations Based on Feature Fusion and Graph Auto-Encoder. Front Genet 2021; 12:781277. [PMID: 34966413 PMCID: PMC8710693 DOI: 10.3389/fgene.2021.781277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Accepted: 11/16/2021] [Indexed: 11/13/2022]
Abstract
Pseudogenes were originally regarded as non-functional components scattered in the genome during evolution. Recent studies have shown that pseudogenes can be transcribed into long non-coding RNA and play a key role at multiple functional levels in different physiological and pathological processes. microRNAs (miRNAs) are a type of non-coding RNA, which plays important regulatory roles in cells. Numerous studies have shown that pseudogenes and miRNAs have interactions and form a ceRNA network with mRNA to regulate biological processes and involve diseases. Exploring the associations of pseudogenes and miRNAs will facilitate the clinical diagnosis of some diseases. Here, we propose a prediction model PMGAE (Pseudogene–MiRNA association prediction based on the Graph Auto-Encoder), which incorporates feature fusion, graph auto-encoder (GAE), and eXtreme Gradient Boosting (XGBoost). First, we calculated three types of similarities including Jaccard similarity, cosine similarity, and Pearson similarity between nodes based on the biological characteristics of pseudogenes and miRNAs. Subsequently, we fused the above similarities to construct a similarity profile as the initial representation features for nodes. Then, we aggregated the similarity profiles and associations of nodes to obtain the low-dimensional representation vector of nodes through a GAE. In the last step, we fed these representation vectors into an XGBoost classifier to predict new pseudogene–miRNA associations (PMAs). The results of five-fold cross validation show that PMGAE achieves a mean AUC of 0.8634 and mean AUPR of 0.8966. Case studies further substantiated the reliability of PMGAE for mining PMAs and the study of endogenous RNA networks in relation to diseases.
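The three similarity measures named in this abstract (Jaccard, cosine, Pearson) can be sketched for a pair of feature vectors as follows. The abstract does not say how the similarities are fused, so the plain average in `fused_similarity` is an assumption made only for illustration, and all function names and example vectors are hypothetical.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two binary vectors: |A and B| / |A or B|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson(a, b):
    """Pearson correlation coefficient (0.0 if either vector is constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0


def fused_similarity(a, b):
    """Assumed fusion rule: the unweighted mean of the three similarities."""
    return (jaccard(a, b) + cosine(a, b) + pearson(a, b)) / 3.0


# Hypothetical binary association profiles for two nodes.
node_u = [1, 1, 0, 0]
node_v = [1, 0, 1, 0]
fused = fused_similarity(node_u, node_v)
```

In a pipeline like the one described, such pairwise values would form the rows of a similarity profile that serves as the initial node features before the graph auto-encoder step.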
Affiliation(s)
- Shijia Zhou
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Weicheng Sun
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Ping Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Li Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, China
46
Chen X, Jiang Z. ISFMDA: Learning Interactions of Selected Features-Based Method for Predicting Potential MicroRNA-Disease Associations. J Comput Biol 2021; 28:1219-1227. [PMID: 34847740 DOI: 10.1089/cmb.2021.0149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Prediction of potential microRNA-disease associations is an important task in computational biology. Mining more sophisticated features can improve the performance of prediction methods. This article proposes a novel algorithm (ISFMDA) that can effectively learn low- and high-order interactions among features selected by recursive feature elimination, using extreme gradient boosting, a factorization machine, and a deep neural network. ISFMDA obtains an area under the receiver operating characteristic curve (AUROC) of 0.9342 ± 0.0007 in fivefold cross-validation tests with 51.25% of the original features, which verifies the effectiveness of the method.
Affiliation(s)
- Xuejun Chen
  - School of Computer Science and Technology, East China Normal University, Shanghai, China
- Zhenran Jiang
  - School of Computer Science and Technology, East China Normal University, Shanghai, China
47
Guan X, Zhang B, Fu M, Li M, Yuan X, Zhu Y, Peng J, Guo H, Lu Y. Clinical and inflammatory features based machine learning model for fatal risk prediction of hospitalized COVID-19 patients: results from a retrospective cohort study. Ann Med 2021; 53:257-266. [PMID: 33410720 PMCID: PMC7799376 DOI: 10.1080/07853890.2020.1868564] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Received: 10/05/2020] [Accepted: 12/20/2020] [Indexed: 02/07/2023]
Abstract
OBJECTIVES To appraise effective predictors for COVID-19 mortality in a retrospective cohort study. METHODS A total of 1270 COVID-19 patients, including 984 admitted to the Sino French New City Branch (training and internal validation sets randomly split at a 7:3 ratio) and 286 admitted to the Optical Valley Branch (external validation set) of Wuhan Tongji Hospital, were included in this study. Forty-eight clinical and laboratory features were screened with the LASSO method. A multi-tree extreme gradient boosting (XGBoost) machine learning model was then used to rank the importance of the features selected by LASSO, and a death risk prediction model was subsequently constructed with a simple-tree XGBoost model. Model performance was evaluated by AUC, prediction accuracy, precision, and F1 scores. RESULTS Six features, including disease severity, age, levels of high-sensitivity C-reactive protein (hs-CRP), lactate dehydrogenase (LDH), ferritin, and interleukin-10 (IL-10), were selected as predictors for COVID-19 mortality. The simple-tree XGBoost model built on these features can predict death risk accurately, with >90% precision and >85% sensitivity, as well as F1 scores >0.90 in the training and validation sets. CONCLUSION We proposed disease severity, age, and serum levels of hs-CRP, LDH, ferritin, and IL-10 as significant predictors for death risk of COVID-19, which may help to identify high-risk COVID-19 cases. KEY MESSAGES A machine learning method is used to build a death risk model for COVID-19 patients. Disease severity, age, hs-CRP, LDH, ferritin, and IL-10 are death risk factors. These findings may help to identify high-risk COVID-19 cases.
Affiliation(s)
- Xin Guan
  - Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Department of Occupational and Environmental Health, State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Bo Zhang
  - Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ming Fu
  - Department of Occupational and Environmental Health, State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Mengying Li
  - Department of Occupational and Environmental Health, State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Xu Yuan
  - Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Yaowu Zhu
  - Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Jing Peng
  - Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Huan Guo
  - Department of Occupational and Environmental Health, State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Yanjun Lu
  - Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
48
Wang P, Zhang G, Yu ZG, Huang G. A Deep Learning and XGBoost-Based Method for Predicting Protein-Protein Interaction Sites. Front Genet 2021; 12:752732. [PMID: 34764983 PMCID: PMC8576272 DOI: 10.3389/fgene.2021.752732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 09/20/2021] [Indexed: 11/29/2022]
Abstract
Knowledge about protein-protein interactions is beneficial in understanding cellular mechanisms. Protein-protein interactions are usually determined according to their protein-protein interaction sites. Due to the limitations of current techniques, it is still a challenging task to detect protein-protein interaction sites. In this article, we presented a method based on deep learning and XGBoost (called DeepPPISP-XGB) for predicting protein-protein interaction sites. The deep learning model served as a feature extractor to remove redundant information from protein sequences. The Extreme Gradient Boosting algorithm was used to construct a classifier for predicting protein-protein interaction sites. The DeepPPISP-XGB achieved the following results: area under the receiver operating characteristic curve of 0.681, a recall of 0.624, and area under the precision-recall curve of 0.339, being competitive with the state-of-the-art methods. We also validated the positive role of global features in predicting protein-protein interaction sites.
Affiliation(s)
- Pan Wang
  - School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Guiyang Zhang
  - School of Electrical Engineering, Shaoyang University, Shaoyang, China
- Zu-Guo Yu
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- Guohua Huang
  - School of Electrical Engineering, Shaoyang University, Shaoyang, China
49
Kurten S, Winant D, Beullens K. Mothers Matter: Using Regression Tree Algorithms to Predict Adolescents' Sharing of Drunk References on Social Media. Int J Environ Res Public Health 2021; 18:11338. [PMID: 34769854 PMCID: PMC8583103 DOI: 10.3390/ijerph182111338] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/30/2021] [Revised: 09/28/2021] [Accepted: 10/13/2021] [Indexed: 11/16/2022]
Abstract
Exposure to online drinking on social media is associated with real-life alcohol consumption. Building on the Theory of planned behavior, the current study substantially adds to this line of research by identifying the predictors of sharing drunk references on social media. Based on a cross-sectional survey among 1639 adolescents with a mean age of 15 (59% female), this study compares and discusses multiple regression tree algorithms predicting the sharing of drunk references. More specifically, this paper compares the accuracy of classification and regression tree, bagging, random forest and extreme gradient boosting algorithms. The analysis indicates that four concepts are central to predicting adolescents' sharing of drunk references: (1) exposure to them on social media; (2) the perceived injunctive norms of the mother towards alcohol consumption; (3) the perceived descriptive norms of best friends towards alcohol consumption; and (4) willingness to drink alcohol. The most accurate results were obtained using extreme gradient boosting. This study provides theoretical, practical, and methodological conclusions. It shows that maternal norms toward alcohol consumption are a central predictor for sharing drunk references. Therefore, future media literacy interventions should take an ecological perspective. In addition, this analysis indicates that regression trees are an advantageous method in youth research, combining accurate predictions with straightforward interpretations.
Affiliation(s)
- Sebastian Kurten
  - Faculty of Social Sciences, Leuven School for Mass Communication Research, KU Leuven, Parkstraat 45, 3000 Leuven, Belgium
- David Winant
  - Department of Electrical Engineering, Dynamical Systems, Signal Processing and Data Analytics (STADIUS), KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
- Kathleen Beullens
  - Faculty of Social Sciences, Leuven School for Mass Communication Research, KU Leuven, Parkstraat 45, 3000 Leuven, Belgium
50
Shin SJ, Park J, Lee SH, Yang K, Park RW. Predictability of Mortality in Patients With Myocardial Injury After Noncardiac Surgery Based on Perioperative Factors via Machine Learning: Retrospective Study. JMIR Med Inform 2021; 9:e32771. [PMID: 34647900 PMCID: PMC8554678 DOI: 10.2196/32771] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/09/2021] [Revised: 08/31/2021] [Accepted: 09/20/2021] [Indexed: 11/13/2022]
Abstract
Background: Myocardial injury after noncardiac surgery (MINS) is associated with increased postoperative mortality, but the relevant perioperative factors that contribute to the mortality of patients with MINS have not been fully evaluated. Objective: To establish a comprehensive body of knowledge relating to patients with MINS, we researched the best performing predictive model based on machine learning algorithms. Methods: Using clinical data from 7629 patients with MINS from the clinical data warehouse, we evaluated 8 machine learning algorithms for accuracy, precision, recall, F1 score, area under the receiver operating characteristic (AUROC) curve, and area under the precision-recall curve to investigate the best model for predicting mortality. Feature importance and Shapley Additive Explanations values were analyzed to explain the role of each clinical factor in patients with MINS. Results: Extreme gradient boosting outperformed the other models. The model showed an AUROC of 0.923 (95% CI 0.916-0.930). The AUROC of the model did not decrease in the test data set (0.894, 95% CI 0.86-0.922; P=.06). Antiplatelet drugs prescription, elevated C-reactive protein level, and beta blocker prescription were associated with reduced 30-day mortality. Conclusions: Predicting the mortality of patients with MINS was shown to be feasible using machine learning. By analyzing the impact of predictors, markers that should be cautiously monitored by clinicians may be identified.
Affiliation(s)
- Seo Jeong Shin
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
- Jungchan Park
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
  - Department of Anesthesiology and Pain Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Seung-Hwa Lee
  - Rehabilitation & Prevention Center, Heart Vascular Stroke Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
  - Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, Republic of Korea
- Kwangmo Yang
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
  - Center for Health Promotion, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Rae Woong Park
  - Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Republic of Korea
  - Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Republic of Korea