1
Zhao M, Ma C, Zhang H, Li H, Huo S. Long-term water quality simulation and driving factors identification within the watershed scale using machine learning. Journal of Contaminant Hydrology 2025; 273:104604. [PMID: 40393303 DOI: 10.1016/j.jconhyd.2025.104604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2025] [Revised: 04/05/2025] [Accepted: 05/10/2025] [Indexed: 05/22/2025]
Abstract
Understanding long-term trends and analyzing their driving factors are essential to effectively enhance water quality in watersheds. In China, although the overall quality of surface water continues to improve, significant issues remain in certain regions. The Liao River Basin, a critical industrial hub and key agricultural grain base in northeast China, continues to face unstable water quality conditions despite over 20 years of management efforts. This study compared several data-driven models (random forest (RF), support vector machine regression (SVR), K-nearest neighbors (KNN), stacking, long short-term memory (LSTM), and convolutional long short-term memory (CNN-LSTM)) to accurately fill the water quality data gaps (i.e., total nitrogen (TN), ammonia nitrogen (NH3-N), total phosphorus (TP), chemical oxygen demand (CODCr), permanganate index (CODMn), and electrical conductivity (E)) from 1980 to 2022 in the Liao River Basin. In addition, the SHapley Additive exPlanations (SHAP) model was employed to quantitatively assess the driving factors of water quality. The results showed that the RF model exhibited robust predictive capabilities. TN showed a steady increase of approximately 20% from 1980 to 2022, while the other parameters were effectively controlled. Anthropogenic activities, especially in agriculture and urban areas, were found to significantly contribute to water quality deterioration. Additionally, climatic factors such as extreme rainfall, annual average precipitation, and extreme temperatures, along with geographical factors like soil properties and slope, were found to play crucial roles in influencing water quality.
Affiliation(s)
- Mingxuan Zhao
  - Beijing Normal University, Beijing 100875, China; State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China
- Chunzi Ma
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China; School of Water Conservancy and Civil Engineering, Northeast Agricultural University, Harbin 150038, China
- Haisheng Li
  - State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012, China

2
Gholamzadeh M, Safdari R, Asadi Gharabaghi M, Abtahi H. Analysis of the most influential factors affecting outcomes of lung transplant recipients: a multivariate prediction model based on UNOS Data. BMJ Open 2025; 15:e089796. [PMID: 40379311 PMCID: PMC12086922 DOI: 10.1136/bmjopen-2024-089796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Accepted: 04/11/2025] [Indexed: 05/19/2025] [Open Access]
Abstract
OBJECTIVES: In lung transplantation (LTx), a priority is assigned to each candidate on the waiting list. Our primary objective was to identify the key factors that influence the allocation of priorities in LTx using machine learning (ML) techniques to enhance the process of prioritising patients.
DESIGN: Developing a prediction model.
SETTING AND PARTICIPANTS: Our data were retrieved from the United Network for Organ Sharing (UNOS) open-source database of transplant patients between 2005 and 2023.
INTERVENTIONS: After the preprocessing process, a feature engineering technique was employed to select the most relevant features. Then, six ML models with optimised hyperparameters including multiple linear regression, random forest regressor (RF), support vector machine regressor, XGBoost regressor, a multilayer perceptron model and a deep learning model were developed based on the UNOS dataset.
PRIMARY AND SECONDARY OUTCOME MEASURES: The performance of each model was evaluated using R-squared (R2) and other error rate metrics. Next, the Shapley Additive Explanations (SHAP) technique was used to identify the most important features in the prediction.
RESULTS: The raw dataset contains 196 270 records with 545 features in all organs. After preprocessing, 32 966 records with 15 features remain. Among various models, the RF model achieved a high R2 score. Additionally, the RF model exhibited the lowest error values, indicating its superior precision compared with other regression models. The SHAP technique in conjunction with the RF model revealed the 11 most important features for priority allocation. Subsequently, we developed a web-based decision support tool using Python and the Streamlit framework based on the best-fine-tuned model.
CONCLUSION: The deployment of the ML model has the potential to act as an automated tool to aid physicians in assessing the priority of lung transplants and identifying significant factors that play a role in patient survival.
Affiliation(s)
- Marsa Gholamzadeh
  - Health Information Management and Medical Informatics Department, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran (the Islamic Republic of)
- Reza Safdari
  - Health Information Management and Medical Informatics Department, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran (the Islamic Republic of)
- Mehrnaz Asadi Gharabaghi
  - Department of Pulmonary Medicine, Faculty of Medicine, Tehran University of Medical Sciences, Tehran, Iran (the Islamic Republic of)
- Hamidreza Abtahi
  - Pulmonary and Critical Care Medicine Department, Thoracic Research Center, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Tehran, Iran (the Islamic Republic of)

3
Kumari A, Akhtar M, Shah R, Tanveer M. Support matrix machine: A review. Neural Netw 2025; 181:106767. [PMID: 39488110 DOI: 10.1016/j.neunet.2024.106767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 07/31/2024] [Accepted: 09/26/2024] [Indexed: 11/04/2024]
Abstract
Support vector machine (SVM) is one of the most studied paradigms in the realm of machine learning for classification and regression problems. It relies on vectorized input data. However, a significant portion of the real-world data exists in matrix format, which is given as input to SVM by reshaping the matrices into vectors. The process of reshaping disrupts the spatial correlations inherent in the matrix data. Also, converting matrices into vectors results in input data with a high dimensionality, which introduces significant computational complexity. To overcome these issues in classifying matrix input data, support matrix machine (SMM) is proposed. It represents one of the emerging methodologies tailored for handling matrix input data. SMM preserves the structural information of the matrix data by using the spectral elastic net property which is a combination of the nuclear norm and Frobenius norm. This article provides the first in-depth analysis of the development of the SMM model, which can be used as a thorough summary by both novices and experts. We discuss numerous SMM variants, such as robust, sparse, class-imbalance, and multi-class classification models. We also analyze the applications of the SMM and conclude the article by outlining potential future research avenues and possibilities that may motivate researchers to advance the SMM algorithm.
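As a rough illustration of the spectral elastic net mentioned in the abstract above, the sketch below computes a penalty that combines the squared Frobenius norm and the nuclear norm of a weight matrix. The function name `spectral_elastic_net`, the trade-off parameter `tau`, and the 1/2 scaling are assumptions made for illustration; the abstract does not state the exact form used by any particular SMM variant.

```python
import numpy as np


def spectral_elastic_net(W, tau=1.0):
    """Hypothetical helper: 0.5 * ||W||_F^2 + tau * ||W||_*.

    The Frobenius term is the sum of squared entries; the nuclear norm is
    the sum of singular values, which favors low-rank weight matrices.
    """
    W = np.asarray(W, dtype=float)
    frobenius_sq = np.linalg.norm(W, ord="fro") ** 2
    nuclear = np.linalg.norm(W, ord="nuc")
    return 0.5 * frobenius_sq + tau * nuclear


# Toy weight matrix (illustrative only, not taken from the review).
W = np.diag([3.0, 4.0])
print(spectral_elastic_net(W, tau=1.0))
```

Because the penalty is evaluated on the matrix itself, no reshaping into a vector is needed, which is the property the abstract highlights.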
Affiliation(s)
- Anuradha Kumari
  - Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India
- Mushir Akhtar
  - Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India
- Rupal Shah
  - Department of Electrical Engineering, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India
- M Tanveer
  - Department of Mathematics, Indian Institute of Technology Indore, Simrol, Indore, 453552, Madhya Pradesh, India

4
Dehghani MR, Nikravesh H, Aghel M, Kafi M, Kazemzadeh Y, Ranjbar A. Estimation of hydrogen solubility in aqueous solutions using machine learning techniques for hydrogen storage in deep saline aquifers. Sci Rep 2024; 14:25890. [PMID: 39468172 PMCID: PMC11519546 DOI: 10.1038/s41598-024-76850-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Accepted: 10/17/2024] [Indexed: 10/30/2024] [Open Access]
Abstract
The porous underground structures have recently attracted researchers' attention for hydrogen gas storage due to their high storage capacity. One of the challenges in storing hydrogen gas in aqueous solutions is estimating its solubility in water. In this study, after collecting experimental data from previous research and eliminating four outliers, nine machine learning methods were developed to estimate the solubility of hydrogen in water. To optimize the parameters used in model construction, a Bayesian optimization algorithm was employed. By examining error functions and plots, the LSBoost method with R² = 0.9997 and RMSE = 4.18E-03 was identified as the most accurate method. Additionally, artificial neural network, CatBoost, Extra trees, Gaussian process regression, bagged trees, regression trees, support vector machines, and linear regression methods had R² values of 0.9925, 0.9907, 0.9906, 0.9867, 0.9866, 0.9808, 0.9464, and 0.7682 and RMSE values of 2.13E-02, 2.43E-02, 2.44E-02, 2.83E-02, 2.85E-02, 3.40E-02, 5.68E-02, and 1.18E-01, respectively. Subsequently, residual error plots were generated, indicating the accurate performance of the LSBoost model across all ranges. The maximum residual error was -0.0252, and only 4 data points were estimated with an error greater than ±0.01. A kernel density estimation (KDE) plot for residual errors showed no specific bias in the models except for the linear regression model. To investigate the impact of temperature, pressure, and salinity parameters on the model outputs, the Pearson correlation coefficients for the LSBoost model were calculated, showing that pressure, temperature, and salinity had values of 0.8188, 0.1008, and -0.5506, respectively, indicating that pressure had the strongest direct relationship, while salinity had an inverse relationship with hydrogen solubility. Considering the results of this research, the LSBoost method, alongside approaches like state equations, can be applied in real-world scenarios for underground hydrogen storage. The findings of this study can help in a better understanding of hydrogen solubility in aqueous solutions, aiding in the optimization of underground hydrogen storage systems.
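For readers unfamiliar with the metrics quoted in the abstract above, the following minimal sketch shows how R², RMSE, and the Pearson correlation coefficient are commonly computed. The helper names and the toy numbers are assumptions for illustration only; they are not the study's data or code.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error between observations and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy values (hypothetical, chosen only to exercise the functions).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(r_squared(observed, predicted), rmse(observed, predicted))
print(pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

A positive Pearson value indicates a direct relationship and a negative value an inverse one, which is how the abstract interprets the coefficients for pressure and salinity.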
Affiliation(s)
- Mohammad Rasool Dehghani
  - Department of Petroleum Engineering, Faculty of Petroleum, Gas, and Petrochemical Engineering, Persian Gulf University, Bushehr, Iran
- Hamed Nikravesh
  - Department of Petroleum Engineering, Faculty of Petroleum, Gas, and Petrochemical Engineering, Persian Gulf University, Bushehr, Iran
- Maryam Aghel
  - Department of Environmental Health Engineering, Faculty of Health and Nutrition, Bushehr University of Medical Sciences, Bushehr, Iran
- Moein Kafi
  - Department of Petroleum Engineering, Faculty of Petroleum, Gas, and Petrochemical Engineering, Persian Gulf University, Bushehr, Iran
- Yousef Kazemzadeh
  - Department of Petroleum Engineering, Faculty of Petroleum, Gas, and Petrochemical Engineering, Persian Gulf University, Bushehr, Iran
- Ali Ranjbar
  - Department of Petroleum Engineering, Faculty of Petroleum, Gas, and Petrochemical Engineering, Persian Gulf University, Bushehr, Iran

5
Abedi E, Sayadi M, Mousavifard M, Roshanzamir F. A comparative study on bath and horn ultrasound-assisted modification of bentonite and their effects on the bleaching efficiency of soybean and sunflower oil: Machine learning as a new approach for mathematical modeling. Food Sci Nutr 2024; 12:6752-6771. [PMID: 39554347 PMCID: PMC11561808 DOI: 10.1002/fsn3.4300] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/06/2024] [Revised: 05/31/2024] [Accepted: 06/18/2024] [Indexed: 11/19/2024] [Open Access]
Abstract
In this study, the effect of high-power bath and horn ultrasound at different powers on specific surface area (S_BET), total pore volume (V_total), and average pore diameter (D_ave) of bleaching clay was examined. After subjecting the bleaching clay to ultrasonication treatment, the S_BET values increased from 31.4 ± 2.7 m² g⁻¹ to 59.8 ± 3.1 m² g⁻¹ for HU200BC, 143.8 ± 3.9 m² g⁻¹ for HU400BC, 54.4 ± 3.6 m² g⁻¹ for BU400BC, and 137.5 ± 2.8 m² g⁻¹ for BU800BC. The mean pore diameter (D_ave) declined from 29.7 ± 0.14 nm in bleaching clay to 11.3 ± 0.13 nm in HU200BC, 8.3 ± 0.12 nm in HU400BC, 16.7 ± 0.14 nm in BU400BC, and 9.6 ± 0.12 nm in BU800BC. Therefore, horn ultrasound-treated bleaching clay significantly increased S_BET and V_total, indicating improved adsorption capacity. Moreover, to establish the relationship between bleaching parameters, seven multi-output ML regression models, namely Feedforward Neural Network (FNN), Random Forest (RF), Support Vector Regression (SVR), Multi-Task Lasso, Ridge regression, Extreme Gradient Boosting (XGBoost), and Gradient Boosting, were used and compared with response surface methodology (RSM). ML has revolutionized the understanding of complex relationships between ultrasonic parameters, oil color, and pigment degradation, providing insights into how various factors such as temperature, ultrasonic power, and time can influence the bleaching process, ultimately enhancing the efficiency and precision of the treatment. The XGBoost model showed outstanding performance in predicting the target variables, with a training R² up to 1, a test R² up to 0.983, and a minimum mean absolute error (MAE) of 0.498. The lower error between the predicted and experimental values implies the superiority of the XGBoost model over RSM for predicting outcomes. It represents the suitability of bath ultrasound as a mild condition for low-pigmented oil bleaching. Finally, the Bayesian optimization method in conjunction with XGBoost was used to optimize the amount of bleaching clay and energy consumption, and its performance was compared with RSM. It was observed that the consumption of bleaching clay was reduced by approximately 60% for sunflower oil and 30%-35% for soybean oil.
Affiliation(s)
- Elahe Abedi
  - Department of Food Science and Technology, Faculty of Agriculture, Fasa University, Fasa, Iran
- Mehran Sayadi
  - Department of Food Safety and Hygiene, School of Health, Fasa University of Medical Sciences, Fasa, Iran
- Maryam Mousavifard
  - Department of Civil Engineering, Faculty of Engineering, Fasa University, Fasa, Iran
- Farzad Roshanzamir
  - Department of Food Safety and Hygiene, School of Health, Fasa University of Medical Sciences, Fasa, Iran

6
Zhang T, Huo MD, Ma Z, Hu J, Liang Q, Chen H. Prediction model of stock return on investment based on hybrid DNN and TabNet model. PeerJ Comput Sci 2024; 10:e2057. [PMID: 39678265 PMCID: PMC11639136 DOI: 10.7717/peerj-cs.2057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Accepted: 04/23/2024] [Indexed: 12/17/2024]
Abstract
With the development of the social economy, research on stock market prediction is in full swing. However, the fluctuations in stock price and returns are influenced by many factors, including political policies, market environment, investor psychology, and so on. The traditional analysis method, based on subjective experience, requires significant time and effort, and its prediction accuracy is often poor. Now, the application of machine learning algorithms to predict stock returns has become a hot topic among scholars. This article comprehensively analyzes the advantages and disadvantages of support vector machine (SVM), tree-based algorithms, and neural network algorithms in processing tabular data and time series data. It proposes a hybrid model based on the deep neural network (DNN) and TabNet models, combining the strengths of the DNN and tree-based models. In the model training stage, two neural networks are established to accept the inputs of ID features and numerical features, respectively, and multiple fully connected layers are used to complete the construction of the DNN model. The TabNet is implemented based on the attention transformer and feature transformer, and the prediction results of the two models are fused. The proposed model achieves the best Pearson correlation coefficient (PCC) value and the lowest root mean square error (RMSE) value at the same time. Because the hybrid algorithm performs particularly well on large data sets with minimal feature engineering and has strong interpretability, such as quantifying the contribution of different features in the model, it has certain theoretical significance and wide application value.
Affiliation(s)
- Tonghui Zhang
  - Brooks School of Public Policy, Cornell University, Ithaca, NY, United States of America
- Ming Da Huo
  - Jinan University, Guangzhou, Guangdong, China
- Zhaozhao Ma
  - Central South University, Changsha, Hunan, China
- Jiajun Hu
  - National University of Singapore, Singapore, Singapore, Singapore
- Qian Liang
  - Yunnan Normal University, Kunming, Yunnan, China
- Heng Chen
  - Fuzhou Software Vocational and Technical College, Fuzhou, Fujian, China

7
Shulajkovska M, Smerkol M, Dovgan E, Gams M. A machine-learning approach to a mobility policy proposal. Heliyon 2023; 9:e20393. [PMID: 37842632 PMCID: PMC10568339 DOI: 10.1016/j.heliyon.2023.e20393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 09/18/2023] [Accepted: 09/21/2023] [Indexed: 10/17/2023] [Open Access]
Abstract
The objective of the URBANITE project is to design an open-data, open-source, smart-city framework to enhance the decision-making processes in European cities. The framework's basis is a robust and user-friendly simulation tool that is supplemented with several innovative service modules. One of the modules, a multi-output, machine-learning unit, is deployed on the simulation results, enabling city officials to more effectively analyse vast quantities of data, discern patterns and trends, and so facilitate advanced policy decisions. The city's decision makers define potential city scenarios, key performance indicators, and a utility function, while the module assists in identifying the policy that is best aligned with the stipulated constraints and preferences. One of the main improvements is a speeding up of the policy testing for the decision makers, reducing the time needed for one policy verification from 3 hours to around 10 seconds. The system was evaluated for Bilbao's Moyua area, where it suggested strategies that could result in a decrease in emissions of more than 5% CO2, NOx, PM in the selected area and a broader part of the city with a machine-learning accuracy of 91%. The system was therefore able to provide valuable insights into effective policies for restricting private traffic in specific districts and identifying the most advantageous times for these restrictions.
Affiliation(s)
- Maj Smerkol
  - Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia
- Erik Dovgan
  - Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia
- Matjaž Gams
  - Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia

8
Shi T, Chen S. Robust Twin Support Vector Regression with Smooth Truncated Hε Loss Function. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11198-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]

9
Xiao Y, Liu J, Wen K, Liu B, Zhao L, Kong X. A least squares twin support vector machine method with uncertain data. Appl Intell 2022. [DOI: 10.1007/s10489-022-03897-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]