1
Wei Q, Xu Z, Yin H. Enhanced nitrogen prediction and mechanistic process analysis in high-salinity wastewater treatment using interpretable machine learning approach. Bioresource Technology 2025; 426:132393. [PMID: 40081773 DOI: 10.1016/j.biortech.2025.132393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2025] [Revised: 03/04/2025] [Accepted: 03/10/2025] [Indexed: 03/16/2025]
Abstract
This study introduces an interpretable machine learning framework to predict nitrogen removal in a membrane bioreactor (MBR) treating high-salinity wastewater. By integrating Shapley additive explanations (SHAP) with Categorical Boosting (CatBoost), we address the critical gap in linking predictive accuracy to operational decision-making for saline systems. CatBoost achieved the best performance, with a coefficient of determination (R2) of 0.88 and a root mean square error (RMSE) of 4.27 for the effluent ammonia nitrogen (NH4+-Nout), and an R2 of 0.91 and an RMSE of 4.35 for the effluent total nitrogen (TNout). SHAP analysis uniquely revealed salinity's dual role in inhibiting nitrifying enzymes and disrupting carbon metabolism, with dissolved oxygen, pH and chemical oxygen demand removal efficiency as key regulators. Temperature and carbon-to-nitrogen ratio further modulated total nitrogen dynamics through electron donor availability and microbial activity. The proposed SHAP-CatBoost model for high-salinity MBRs combines predictive modelling with mechanical process control.
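For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how R2 and RMSE are conventionally computed. It uses only the Python standard library; the function names and the sample values are illustrative assumptions and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical effluent concentrations, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

RMSE is expressed in the units of the predicted variable, so the two RMSE values in the abstract are only comparable with each other if NH4+-Nout and TNout are reported in the same units.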
Affiliation(s)
- Qing Wei
- College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; State Key Laboratory of Pollution Control and Resource Reuse, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China
- Zuxin Xu
- College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; State Key Laboratory of Pollution Control and Resource Reuse, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China.
- Hailong Yin
- College of Environmental Science and Engineering, Tongji University, Shanghai 200092, China; State Key Laboratory of Pollution Control and Resource Reuse, Tongji University, Shanghai 200092, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai 200092, China
2
Kim TM, Kim YH, Song SH, Choi IY, Kim DJ, Ko T. Explainability Enhanced Machine Learning Model for Classifying Intellectual Disability and Attention-Deficit/Hyperactivity Disorder With Psychological Test Reports. J Korean Med Sci 2025; 40:e26. [PMID: 40132533 PMCID: PMC11932825 DOI: 10.3346/jkms.2025.40.e26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Accepted: 10/17/2024] [Indexed: 03/27/2025] Open
Abstract
BACKGROUND: Psychological test reports are essential in assessing intellectual functioning, aiding in diagnosing and treating intellectual disability (ID) and attention-deficit/hyperactivity disorder (ADHD). However, these reports can be problematic because they are diverse, unstructured, subjective, and prone to human error. Additionally, physicians often do not read the entire report, and the number of reports is lower than that of diagnoses. METHODS: To address these issues, we developed explainable predictive models for classifying IDs and ADHDs based on written reports. The reports of 1,475 patients with IDs and ADHDs who underwent intelligence tests were used for the models. These models were developed by analyzing reports using natural language processing (NLP) and incorporating the physician's diagnosis for each report. To make the models explainable, we selected n-gram features from the models' results by extracting important features using SHapley Additive exPlanations and permutation importance. An n-gram feature-based original text search system compensated for the lack of human readability caused by NLP and enabled the reconstruction of human-readable texts from the selected n-gram features. RESULTS: The maximum model accuracy was 0.92, and 80 human-readable texts were restored from four models. CONCLUSION: The results showed that the models could accurately classify IDs and ADHDs, even with a few reports. The models were also able to explain their predictions. The explainability-enhanced model can help physicians understand the classification process of IDs and ADHDs and provide evidence-based insights.
Affiliation(s)
- Tong Min Kim
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Young-Hoon Kim
- Department of Pediatrics, Uijeongbu St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Uijeongbu, Korea
- In-Young Choi
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Dai-Jin Kim
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Department of Psychiatry, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Taehoon Ko
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Department of Medical Sciences, College of Medicine, The Catholic University of Korea, Seoul, Korea
- CMC Institute for Basic Medical Science, The Catholic Medical Center of The Catholic University of Korea, Seoul, Korea.
3
Singh RN, Krishnan P, Bharadwaj C, Sah S, Das B. Optimizing chickpea yield prediction under wilt disease through synergistic integration of biophysical and image parameters using machine learning models. Sci Rep 2025; 15:4417. [PMID: 39910102 PMCID: PMC11799175 DOI: 10.1038/s41598-025-87134-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2024] [Accepted: 01/16/2025] [Indexed: 02/07/2025] Open
Abstract
Crop health assessment and early yield prediction under biotic stress conditions are crucial for crop management and market planning by farmers and policy planners. The objective of this study was, therefore, to assess the impact of different levels of wilt disease on the biophysical parameters of chickpea and to develop machine learning (ML) models for early yield prediction. Field experiments were carried out over three years at the Indian Agricultural Research Institute research farm in New Delhi. Thermal and visible images were collected alongside the measurement of crop biophysical parameters, including leaf area index (LAI), photosynthesis, transpiration rate, stomatal conductance, relative leaf water content (RWC), membrane stability index (MSI), and NDVI, for 85 chickpea genotypes with varying levels of wilt resistance. ML models were developed for early yield prediction by combining visible and thermal image indices with biophysical parameters. The results showed that canopy temperatures were directly correlated with increasing levels of wilt severity. Crop photosynthesis, stomatal conductance, transpiration, LAI, RWC, MSI, and NDVI dropped significantly with increasing levels of wilt severity. Yield reductions of 44-69% were observed in susceptible genotypes. The ML models were able to give accurate early yield predictions, and their accuracy increased as the prediction date moved closer to harvest. Ranking of the models' performance indicated that XGB is the best model to predict chickpea yield under wilt conditions. NDVI was identified as the most important variable for yield prediction. The findings of the study quantified the impacts of wilt on important crop biophysical parameters and highlighted the suitability of ML models for early yield prediction under different levels of disease severity.
Affiliation(s)
- R N Singh
- Division of Agricultural Physics, ICAR-Indian Agricultural Research Institute, New Delhi, India
- ICAR-National Institute of Abiotic Stress Management, Pune, Maharashtra, India
- P Krishnan
- Division of Agricultural Physics, ICAR-Indian Agricultural Research Institute, New Delhi, India.
- C Bharadwaj
- Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi, India
- Sonam Sah
- ICAR-National Institute of Abiotic Stress Management, Pune, Maharashtra, India
- B Das
- ICAR-Central Coastal Agricultural Research Institute, Old Goa, Goa, India
4
Wu Z, Cha S, Wang C, Qu T, Zou Z. Salmon Consumption Behavior Prediction Based on Bayesian Optimization and Explainable Artificial Intelligence. Foods 2025; 14:429. [PMID: 39942022 PMCID: PMC11817250 DOI: 10.3390/foods14030429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2024] [Revised: 01/26/2025] [Accepted: 01/27/2025] [Indexed: 02/16/2025] Open
Abstract
Predicting seafood consumption behavior is essential for fishing companies to adjust their production plans and marketing strategies. To achieve accurate predictions, this paper introduces a model for forecasting seafood consumption behavior based on an interpretable machine learning algorithm. Additionally, the Shapley Additive exPlanation (SHAP) model and the Accumulated Local Effects (ALE) plot were integrated to provide a detailed analysis of the factors influencing Shanghai residents' intentions to purchase salmon. In this study, we constructed nine regression prediction models (ANN, Decision Tree, GBDT, Random Forest, AdaBoost, XGBoost, LightGBM, CatBoost, and NGBoost) to predict consumers' intentions to purchase salmon and to compare their predictive performance. In addition, a Bayesian optimization (BO) algorithm was used to tune the hyperparameters of the best-performing regression model to improve its prediction accuracy. Finally, the SHAP model was used to analyze the key factors and interactions affecting consumers' willingness to purchase salmon, and the ALE plot was used to show the specific prediction patterns of different influences on salmon consumption. The results show that salmon farming safety and ease of cooking have significant nonlinear effects on salmon consumption. The BO-CatBoost nonlinear regression prediction model outperforms the benchmark model, with test-set RMSE, MSE, MAE, R2 and TIC values of 0.155, 0.024, 0.097, 0.902, and 0.313, respectively. This study can provide technical support for suppliers in the salmon value chain and help them adjust their production plans and marketing activities.
Affiliation(s)
- Zhan Wu
- School of Economics and Management, Shanghai Ocean University, Shanghai 201306, China
- Sina Cha
- School of Business, The University of Hong Kong, Hong Kong 999077, China
- Chunxiao Wang
- School of Economics and Management, Shanghai Ocean University, Shanghai 201306, China
- Tinghong Qu
- School of Economics and Management, Shanghai Ocean University, Shanghai 201306, China
- Zongfeng Zou
- School of Management, Shanghai University, Shanghai 200444, China
5
Chen L, Yang Y, Wang W. Temporal Autoregressive Matrix Factorization for High-Dimensional Time Series Prediction of OSS. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:13741-13752. [PMID: 37247312 DOI: 10.1109/tnnls.2023.3271327] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/31/2023]
Abstract
Open-source software (OSS) plays an increasingly significant role in modern software development, so accurate prediction of the future development of OSS has become an essential topic. The behavioral data of different OSS projects are closely related to their development prospects. However, most of these behavioral data are high-dimensional time series data streams with noise and missing values. Hence, accurate prediction on such cluttered data requires the model to be highly scalable, which is not a property of traditional time series prediction models. To this end, we propose a temporal autoregressive matrix factorization (TAMF) framework that supports data-driven temporal learning and prediction. Specifically, we first construct a trend and period autoregressive model to extract trend and period features from OSS behavioral data, and then combine the regression model with a graph-based matrix factorization (MF) to complete the missing values by exploiting the correlations among the time series data. Finally, we use the trained regression model to make predictions on the target data. This scheme ensures that TAMF can be applied to different types of high-dimensional time series data and thus has high versatility. We selected ten real developer behavior datasets from GitHub for case analysis. The experimental results show that TAMF has good scalability and prediction accuracy.
6
Sah S, Haldar D, Singh RN, Das B, Nain AS. Rice yield prediction through integration of biophysical parameters with SAR and optical remote sensing data using machine learning models. Sci Rep 2024; 14:21674. [PMID: 39289440 PMCID: PMC11408675 DOI: 10.1038/s41598-024-72624-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Accepted: 09/09/2024] [Indexed: 09/19/2024] Open
Abstract
In an era marked by a growing global population and climate variability, ensuring food security has become a paramount concern. Rice, being a staple crop for billions of people, requires accurate and timely yield prediction to ensure global food security. This study was undertaken across two rice crop seasons in the Udham Singh Nagar district of Uttarakhand state to predict rice yield at 45, 60 and 90 days after transplanting (DAT) through machine learning (ML) models, utilizing a combination of optical and Synthetic Aperture Radar (SAR) data in conjunction with crop biophysical parameters. Results revealed that the ML models were able to provide relatively accurate early yield estimates. For summer rice, eXtreme gradient boosting (XGB) was the best-performing model at all three stages (45, 60, and 90 DAT), while for kharif rice, the best-performing models at 45, 60, and 90 DAT were XGB, Neural network (NNET), and Cubist, respectively. The combined ranking of ML models showed that prediction accuracy improved as the prediction date approached harvest, and the best prediction of yield was observed at 90 DAT for both summer and kharif rice. Overall rankings indicate that for summer rice, the top three models were XGB, NNET, and Support vector regression, while for kharif rice, these were Cubist, NNET, and Random Forest, respectively. The findings of this study offer valuable insights into the potential of combining remote sensing and biophysical parameters in ML models, which can enhance food security planning and resource management by enabling more informed decision-making by stakeholders such as farmers, policy planners and researchers.
Affiliation(s)
- Sonam Sah
- G. B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, India
- ICAR-National Institute of Abiotic Stress Management, Pune, Maharashtra, India
- Dipanwita Haldar
- Indian Institute of Remote Sensing, Dehradun, Uttarakhand, India
- R N Singh
- ICAR-National Institute of Abiotic Stress Management, Pune, Maharashtra, India
- B Das
- ICAR-Central Coastal Agricultural Research Institute, Goa, Old Goa, India
- Ajeet Singh Nain
- G. B. Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, India.
7
Su R, Duan C, Chen B. The shift in the spatiotemporal relationship between supply and demand of ecosystem services and its drivers in China. Journal of Environmental Management 2024; 365:121698. [PMID: 38968890 DOI: 10.1016/j.jenvman.2024.121698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Revised: 06/15/2024] [Accepted: 07/02/2024] [Indexed: 07/07/2024]
Abstract
In China, over 65% of human activities are concentrated in cities, resulting in a conflict between the supply and demand of ecosystem services (ESs). To alleviate this problem, many cities have adopted eco-friendly development modes. However, the effectiveness of these modes in reducing ESs supply-demand conflicts has not been comprehensively reviewed, and the human and natural drivers behind these relationship shifts remain unclear. To bridge this gap, this study analyzed the shifts in the relationships between supply and demand of ESs across China from 2010 to 2020 at the city level and identified the human and natural drivers behind them. First, the InVEST models were integrated with socioeconomic data to evaluate the supply and demand distribution for three pivotal ESs: water yield (WY), habitat quality (HQ), and soil retention (SR). Then, a four-quadrant diagram approach was proposed to enhance the analysis of their spatiotemporal relationships. Furthermore, random forest models were employed to examine the drivers of the shifts in these relationships. The results showed that WY and SR services grew until 2015 and then receded, while HQ saw a modest decline from 2010 to 2020. Spatial synergies in the supply and demand of ESs were primarily observed in the southern cities, with a significant northward extension by 2020. From a temporal perspective, the percentage of cities achieving coordination in WY and SR services increased from 32.6% and 57.3%, respectively, in the 2010-2015 period to 42.4% and 63.3% between 2015 and 2020; meanwhile, HQ service conflicts diminished from 58.7% to 53.5%. The changes in socioeconomic and land use factors contributed to 64.3%, 36.1%, and 33.3% of the shifts in the supply-demand relationship for HQ, WY, and SR services, respectively. Our analysis highlights the potential of human-driven ecological management to enhance the balance of this relationship. It can support the design of city-specific policies that foster a balance between ecological processes and socio-economic development.
Affiliation(s)
- Rui Su
- State Key Joint Laboratory of Environmental Simulation and Pollution Control, School of Environment, Beijing Normal University, Beijing, 100875, PR China
- Cuncun Duan
- College of Water Sciences, Beijing Normal University, Beijing, 100875, PR China
- Bin Chen
- State Key Joint Laboratory of Environmental Simulation and Pollution Control, School of Environment, Beijing Normal University, Beijing, 100875, PR China.
8
Nguyen HV, Byeon H. Prediction of Out-of-Hospital Cardiac Arrest Survival Outcomes Using a Hybrid Agnostic Explanation TabNet Model. Mathematics 2023; 11:2030. [DOI: 10.3390/math11092030] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 03/05/2025]
Abstract
Survival after out-of-hospital cardiac arrest (OHCA) is contingent on time-sensitive interventions taken by onlookers, emergency call operators, first responders, emergency medical services (EMS) personnel, and hospital healthcare staff. Survival outcomes can be improved by building integrated cardiac resuscitation systems of care, measurement systems, and techniques for assuring the correct execution of evidence-based treatments by bystanders, EMS professionals, and hospital employees. To aid in OHCA prognosis and treatment, we develop a hybrid agnostic explanation TabNet (HAE-TabNet) model to predict OHCA patient survival. According to the results, the HAE-TabNet model has an area under the receiver operating characteristic curve (ROC AUC) of 0.9934 (95% confidence interval 0.9933–0.9935), outperforming the machine learning models of the previous study, such as XGBoost, k-nearest neighbors, random forest, decision trees, and logistic regression. To make model predictions explainable to non-experts in artificial intelligence, we combined the HAE-TabNet model with a LIME-based explainable model. This HAE-TabNet model may effectively assist medical professionals in the prognosis and treatment of OHCA patients.
Affiliation(s)
- Hung Viet Nguyen
- Department of Digital Anti-Aging Healthcare (BK21), Inje University, Gimhae 50834, Republic of Korea
- Haewon Byeon
- Department of Digital Anti-Aging Healthcare (BK21), Inje University, Gimhae 50834, Republic of Korea
9
Shield attitude prediction based on Bayesian-LGBM machine learning. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2023]
10
Suleman MT, Khan YD. m1A-pred: Prediction of Modified 1-methyladenosine Sites in RNA Sequences through Artificial Intelligence. Comb Chem High Throughput Screen 2022; 25:2473-2484. [PMID: 35718969 DOI: 10.2174/1386207325666220617152743] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/20/2022] [Revised: 04/06/2022] [Accepted: 04/11/2022] [Indexed: 01/27/2023]
Abstract
BACKGROUND: The process of nucleotide modification, or the addition of methyl groups to nucleotides, is known as post-transcriptional modification (PTM). 1-methyladenosine (m1A) is a type of PTM formed by adding a methyl group to the nitrogen at the 1st position of the adenosine base. Many human disorders are associated with m1A, which is widely found in ribosomal RNA and transfer RNA. OBJECTIVE: Conventional methods such as mass spectrometry and site-directed mutagenesis have proved to be laborious and burdensome. Systematic identification of modified sites from RNA sequences is gaining much attention. Consequently, an extreme gradient boost predictor, m1A-pred, is developed in this study for the prediction of modified m1A sites. METHODS: The current study involves the extraction of position- and composition-based properties within nucleotide sequences. The extracted features are used to build the feature vector. Statistical moments were used for dimensionality reduction of the obtained features. RESULTS: Through a series of experiments using different computational models and evaluation methods, the proposed predictor, m1A-pred, proved to be the most robust and accurate model for the identification of modified sites. AVAILABILITY AND IMPLEMENTATION: To support research on m1A sites, a user-friendly server was also developed, which was the final phase of this research.
Affiliation(s)
- Muhammad Taseer Suleman
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
11
Joseph LP, Joseph EA, Prasad R. Explainable diabetes classification using hybrid Bayesian-optimized TabNet architecture. Comput Biol Med 2022; 151:106178. [PMID: 36306578 DOI: 10.1016/j.compbiomed.2022.106178] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 04/13/2022] [Revised: 09/23/2022] [Accepted: 10/01/2022] [Indexed: 12/27/2022]
Abstract
Diabetes is a deadly chronic disease that occurs when the pancreas is not able to produce ample insulin or when the body cannot use insulin effectively. If undetected, it may lead to a host of health complications. Hence, accurate and explainable early-stage detection of diabetes is essential for the proper administration of treatment options in leading a healthy and productive life. For this, we developed an interpretable TabNet model tuned via Bayesian optimization (BO). To achieve model-specific interpretability, the attention mechanism of the TabNet architecture was used, which offered local and global model explanations of the influence of the attributes on the outcomes. The model was further explained locally and globally using the more robust model-agnostic LIME and SHAP eXplainable Artificial Intelligence (XAI) tools. The proposed model outperformed all benchmarked models, obtaining high accuracies of 92.2% and 99.4% on the Pima Indians diabetes dataset (PIDD) and the early-stage diabetes risk prediction dataset (ESDRPD), respectively. Based on the XAI results, the most influential attributes for diabetes classification using PIDD and ESDRPD were insulin and polyuria, respectively. The feature importance values registered were 0.301 for insulin (PIDD) and 0.206 for polyuria (ESDRPD). The high accuracy and ancillary interpretability of our objective model are expected to increase end users' trust and confidence in early-stage detection of diabetes.
Affiliation(s)
- Lionel P Joseph
- School of Mathematics, Physics, and Computing, University of Southern Queensland, Springfield, QLD, 4300, Australia
- Erica A Joseph
- Umanand Prasad School of Medicine and Health Sciences, The University of Fiji, Saweni, Lautoka, Fiji
- Ramendra Prasad
- Department of Science, School of Science and Technology, The University of Fiji, Saweni, Lautoka, Fiji.
12
Soft sensor for the prediction of oxygen content in boiler flue gas using neural networks and extreme gradient boosting. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07771-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
13
Machine Learning Enabled 3D Body Measurement Estimation Using Hybrid Feature Selection and Bayesian Search. Applied Sciences-Basel 2022. [DOI: 10.3390/app12147253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
3D body scan technology has recently changed the way human bodies are measured and has generated a large volume of body measurements. However, one inherent issue that plagues the use of the resulting database is missing data, usually caused by automatic data extraction from the 3D body scans. Tedious extra effort is needed to manually fill in the missing data for various applications. To tackle this problem, this paper proposes a machine learning (ML)-based approach for 3D body measurement estimation that considers measurement (feature) importance. The proposed approach selects the most critical features to reduce the algorithm input and to improve the ML method performance. In addition, a Bayesian search is used to fine-tune the hyperparameters to minimize the mean square error. Two distinct ML methods, Random Forest and XGBoost, are used and tested on a real-world dataset that contains 3D body scans of 212 participants in the Kansas-Missouri area of the United States. The results show the effectiveness of the proposed methods, with a Mean Absolute Percentage Error of roughly 3% in estimating the missing data. The two ML methods with the proposed hybrid feature selection and the Bayesian search are comprehensively compared. The comparative results suggest that the Random Forest method performs better than its XGBoost counterpart in filling missing 3D body measurements.
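The Mean Absolute Percentage Error quoted in this abstract can be illustrated with a short standard-library sketch. The function name and the sample measurements below are hypothetical and are not drawn from the paper's dataset.

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Assumes every true value is non-zero, which holds for
    physical body measurements such as lengths and girths.
    """
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(y_true)


# Hypothetical body measurements in centimetres, for illustration only.
measured = [100.0, 80.0, 50.0]
estimated = [97.0, 82.0, 51.0]

print(f"MAPE = {mape(measured, estimated):.2f}%")
```

Because MAPE is relative to the true value, an error of a given absolute size weighs more heavily on small measurements than on large ones.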
14
Zhang Y, Lin R, Zhang H, Peng Y. Vibration prediction and analysis of strip rolling mill based on XGBoost and Bayesian optimization. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00795-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
The stable operation of a strip rolling mill is the key factor in ensuring stable product quality. The design capability of existing imported and self-developed domestic strip rolling mills cannot be fully exploited, and the frequent occurrence of mill vibration and operation instability problems seriously restricts equipment capacity and the production of high-end strip products. A vibration prediction and analysis method for hot strip mills based on eXtreme gradient boosting (XGBoost) and Bayesian optimization (BO) is proposed. First, an XGBoost prediction model is developed on a self-built data set to construct a complex functional relationship between process parameters and rolling mill vibration. Second, the important hyperparameters and parameters of XGBoost are optimized using a Bayesian optimization algorithm to improve the prediction accuracy, computational efficiency, and stability of the model. Third, a comprehensive comparison is made between the prediction model in this paper and other well-known machine learning benchmark models. Finally, the prediction results of the model are interpreted using the SHapley Additive exPlanations (SHAP) method. The proposed model outperforms existing models in terms of prediction accuracy, computational speed and stability. The degree of influence of each feature on rolling mill vibration is also obtained.
15
Liu J, Chen W. First satellite-based regional hourly NO2 estimations using a space-time ensemble learning model: A case study for Beijing-Tianjin-Hebei Region, China. The Science of the Total Environment 2022; 820:153289. [PMID: 35066047 DOI: 10.1016/j.scitotenv.2022.153289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/01/2021] [Revised: 01/16/2022] [Accepted: 01/16/2022] [Indexed: 06/14/2023]
Abstract
Surface nitrogen dioxide (NO2) concentrations have been generated from satellite retrievals using multiple statistical algorithms. However, they are often given at coarse frequencies ("snapshot", daily or even longer), limiting their applications in epidemiological studies and in assessing the evolution of NO2 pollution. This study investigated the potential applicability of Himawari-8 derived hourly fine particulate matter concentrations in producing hourly NO2 concentrations by constructing a space-time ensemble model. The Beijing-Tianjin-Hebei (BTH) region, one of the most seriously polluted regions in China, was chosen as the study region. The proposed model performs well in estimating hourly NO2 concentration, with a high cross-validation (CV) coefficient of determination (R2 = 0.81) and low CV root-mean-square error (RMSE = 9.71 μg/m3), mean prediction error (MPE = 6.33 μg/m3), and relative prediction error (RPE = 22.5%). On daily, monthly, seasonal, and annual time scales, CV R2 increases to 0.89, 0.93, 0.97, and 0.99, respectively. The annual mean model-estimated NO2 concentration over the BTH region is 28.2 ± 6.5 μg/m3, with relatively higher NO2 concentrations seen in southern and southeastern BTH. Winter experiences the most severe NO2 concentrations, followed by autumn, spring, and summer. Surface NO2 concentrations are higher in the morning and lower in the afternoon, and tend to decrease gradually with time. The model generally captures the hourly evolution of NO2 concentrations for the severe pollution episode but shows some underestimations. The annual mean NO2 concentrations were 2.8% lower on the weekend than on weekdays. In addition, the weekend effects of NO2 concentrations are larger at rush hour and smaller at noon. The hourly NO2 products derived from the proposed approach are potentially useful for improving our understanding of the source, evolution, and transportation behavior of NO2 pollution episodes and for exposure- and health-related research. The proposed approach also enriches the potential applications of geostationary satellites (e.g., Himawari-8).
Affiliation(s)
- Jianjun Liu
- Environmental Model and Data Optima (EMDO) Laboratory, Laurel, MD 20707, United States.
- Wen Chen
- Department of Atmospheric and Oceanic Science, University of Maryland, College Park, MD 20742, United States
|
16
|
Laiton-Bonadiez C, Branch-Bedoya JW, Zapata-Cortes J, Paipa-Sanabria E, Arango-Serna M. Industry 4.0 Technologies Applied to the Rail Transportation Industry: A Systematic Review. SENSORS (BASEL, SWITZERLAND) 2022; 22:2491. [PMID: 35408111 PMCID: PMC9002761 DOI: 10.3390/s22072491] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/31/2022] [Revised: 03/13/2022] [Accepted: 03/19/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND Industry 4.0 technologies have been widely used in the railway industry, focusing mainly on the maintenance and control tasks required by railway infrastructure. Given the great potential that these technologies offer, the scientific community has come to use them in varied ways to solve a wide range of problems such as train failures, train station security, rail system control and communication in hard-to-reach areas, among others. For this reason, this paper aims to answer the following research questions: what are the main issues in the railway transport industry, what technological strategies are currently being used to solve these issues, and which Industry 4.0 technologies are used in the railway transport industry to solve the aforementioned issues? METHODS This study adopts a systematic literature review approach. We searched the Science Direct and Web of Science databases from January 2017 to November 2021. Studies published in conferences or journals and written in English or Spanish were included for initial evaluation. The initially included papers were analyzed by the authors and selected based on whether they helped answer the proposed research questions. RESULTS Of the 515 articles retrieved, 109 were eligible, from which we could identify three main application domains in the railway industry: monitoring, decision and planning techniques, and communication and security. Regarding Industry 4.0 technologies, we identified 9 different technologies applied in the reviewed studies: Artificial Intelligence (AI), Internet of Things (IoT), Cloud Computing, Big Data, Cybersecurity, Modelling and Simulation, Smart Decision Support Systems (SDSS), Computer Vision and Virtual Reality (VR). CONCLUSIONS This study is, to our knowledge, one of the first to show how Industry 4.0 technologies are currently being used to tackle railway industry problems and to describe current application trends in the scientific community, which is highly useful for the development of future studies and more advanced solutions. FUNDING Colombian national organizations Minciencias and the Mining-Energy Planning Unit.
Affiliation(s)
- Camilo Laiton-Bonadiez
- Facultad de Minas, Universidad Nacional de Colombia Sede Medellín, Medellín 050041, Colombia
- John W. Branch-Bedoya
- Facultad de Minas, Universidad Nacional de Colombia Sede Medellín, Medellín 050041, Colombia
- Martin Arango-Serna
- Facultad de Minas, Universidad Nacional de Colombia Sede Medellín, Medellín 050041, Colombia
|
17
|
Zhang S, Khattak A, Matara CM, Hussain A, Farooq A. Hybrid feature selection-based machine learning Classification system for the prediction of injury severity in single and multiple-vehicle accidents. PLoS One 2022; 17:e0262941. [PMID: 35108288 PMCID: PMC8809572 DOI: 10.1371/journal.pone.0262941] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/09/2021] [Accepted: 01/07/2022] [Indexed: 11/19/2022] Open
Abstract
To undertake a reliable analysis of injury severity in road traffic accidents, a complete understanding of the important attributes is essential. As a result of the shift from traditional statistical parametric procedures to computer-aided methods, machine learning approaches have become an important tool for predicting the severity of road traffic injuries. The paper presents a hybrid feature selection-based machine learning classification approach for detecting significant attributes and predicting injury severity in single and multiple-vehicle accidents. To begin, we employed a Random Forests (RF) classifier in conjunction with an intrinsic wrapper-based feature selection approach called the Boruta Algorithm (BA) to find the relevant attributes that determine injury severity. The influential attributes were then fed into a set of four classifiers (Naive Bayes (NB), K-Nearest Neighbor (K-NN), Binary Logistic Regression (BLR), and Extreme Gradient Boosting (XGBoost)) to predict injury severity. According to the BA results, the vehicle type was the most influential factor, followed by the month of the year, the driver's age, and the alignment of the road segment. The driver's gender, the presence of a median, and the presence of a shoulder were all found to be unimportant. According to the classifier performance measures, XGBoost surpasses the other classifiers in prediction performance. Using the selected attributes, the accuracy, Cohen's Kappa, F1-Measure, and AUC-ROC values of XGBoost were 82.10%, 0.607, 0.776, and 0.880 for single-vehicle accidents and 79.52%, 0.569, 0.752, and 0.86 for multiple-vehicle accidents, respectively.
Affiliation(s)
- Shuguang Zhang
- CCCC Southwest Investment & Development Company Limited, Beijing, China
- Afaq Khattak
- The Key Laboratory of Road and Traffic Engineering, Ministry of Education, Tongji University, Jiading, Shanghai, China
- Arshad Hussain
- NUST Institute of Civil Engineering, National University of Sciences and Technology, Islamabad, Pakistan
- Asim Farooq
- Head of Department at Centre of Excellence in Transportation Engineering, Pak-Austria Fachhochschule, Institute of Applied Sciences, Haripur, Pakistan
|
18
|
Chelgani SC. Estimation of gross calorific value based on coal analysis using an explainable artificial intelligence. MACHINE LEARNING WITH APPLICATIONS 2021. [DOI: 10.1016/j.mlwa.2021.100116] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/26/2022] Open
|