1
Li Y, Xiang B, Wang T, He Y, Liu X, Li Y, Ren S, Wang E, Guo G. Applications of machine learning in potentially toxic elemental contamination in soils: A review. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2025; 295:118110. [PMID: 40188733 DOI: 10.1016/j.ecoenv.2025.118110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2024] [Revised: 02/24/2025] [Accepted: 03/24/2025] [Indexed: 04/21/2025]
Abstract
Soil contamination by potentially toxic elements (PTEs) poses substantial risks to the environment and human health. Traditional investigational methods are often inadequate for large-scale assessments because they are time-consuming and costly and have limited accuracy. Machine learning (ML) techniques have emerged as promising tools in environmental studies because of their strength in processing high-dimensional and unstructured data. However, critical evaluations of contemporary ML applications and methods for PTE content, distribution, and source identification remain scarce. To address this research gap, this study reviews applications of ML to soil PTE contamination, including content prediction, spatial distribution, source identification, and other related tasks. Hyperspectral data combined with ML methods can predict PTE content over large areas at low cost. In addition, ML algorithms that integrate environmental covariates outperform traditional geostatistical methods in spatial prediction. Moreover, ML techniques combined with receptor models provide important advances in the quantitative identification and apportionment of PTE sources, thereby supporting effective environmental management and risk assessment. Based on the frequency with which variables are used, we propose that soil pH, soil organic matter (SOM), industrial activities, soil texture, and other relevant factors are key environmental variables that improve the accuracy of predictions of the spatial distribution and sources of PTEs. These findings show that ML techniques, through their powerful data processing capabilities, provide new perspectives and tools for the efficient assessment and management of soil PTE contamination.
Affiliation(s)
- Yan Li
  - Chinese Research Academy of Environmental Sciences, State Key Laboratory of Environmental Criteria and Risk Assessment, Beijing 100012, China; Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China
- Bao Xiang
  - Chinese Research Academy of Environmental Sciences, State Key Laboratory of Environmental Criteria and Risk Assessment, Beijing 100012, China
- Tianyang Wang
  - Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China
- Yinhai He
  - Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China
- Xiaoyang Liu
  - Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China
- Yancheng Li
  - Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China
- Shichang Ren
  - Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China
- Erdan Wang
  - Chinese Research Academy of Environmental Sciences, State Key Laboratory of Environmental Criteria and Risk Assessment, Beijing 100012, China
- Guanlin Guo
  - Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China
2
Milinovic J, Santos P, Sant'Ovaia H, Futuro A, Pereira CM, Murton BJ, Flores D, Azenha M. Multivariate analysis applied to X-ray fluorescence to assess soil contamination pathways: case studies of mass magnetic susceptibility in soils near abandoned coal and W/Sn mines. ENVIRONMENTAL GEOCHEMISTRY AND HEALTH 2024; 46:202. [PMID: 38696051 PMCID: PMC11065930 DOI: 10.1007/s10653-024-01988-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 04/06/2024] [Indexed: 05/05/2024]
Abstract
Determining the origin and pathways of contaminants in the natural environment is key to informing any mitigation process. The mass magnetic susceptibility of soils provides a rapid method of measuring the concentration of magnetic minerals, which may derive from anthropogenic activities such as mining or industrial processes, e.g., metal smelting (technogenic origin), or from the local bedrock (geogenic origin). This is especially effective when combined with rapid geochemical analyses of soils. Multivariate analysis (MVA) elucidates complex multicomponent relationships between soil geochemistry and magnetic susceptibility. At mining sites, X-ray fluorescence (XRF) spectroscopic data of soils contaminated by mine waste show statistically significant relationships between magnetic susceptibility and some base metal species (e.g., Fe, Pb, Zn). Here, we show how qualitative and quantitative MVA methodologies can be used to assess soil contamination pathways using mass magnetic susceptibility and XRF spectra of soils near abandoned coal and W/Sn mines (NW Portugal). Principal component analysis (PCA) showed that the first two principal components (PC-1 + PC-2) explained 94% of the sample variability and grouped the samples, according to their geochemistry and magnetic susceptibility, into geogenic and technogenic groups. Regression analyses showed a strong positive correlation (R² > 0.95) between soil geochemistry and magnetic properties at the local scale. These parameters provided insight into the multi-element variables that control magnetic susceptibility and indicated that potentially contaminated sites could be assessed efficiently through mass-specific soil magnetism.
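The PCA figure above (PC-1 + PC-2 explaining 94% of variability) rests on the explained-variance ratio: each component's eigenvalue of the covariance matrix divided by the sum of all eigenvalues. The following is a minimal stdlib-only sketch, assuming PCA on the covariance matrix of the raw variables (standardize each column first to obtain correlation-matrix PCA; the abstract does not say which the authors used), with eigenvalues found by cyclic Jacobi rotations. All function names and the toy data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[k] for c in centred) / (n - 1) for k in range(p)]
            for i in range(p)]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]  # work on a copy
    p = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][k] ** 2 for i in range(p) for k in range(p) if i != k)
        if off < tol:
            break
        for i in range(p - 1):
            for k in range(i + 1, p):
                if a[i][k] == 0.0:
                    continue
                # Angle chosen so that the rotated a[i][k] becomes zero
                theta = 0.5 * math.atan2(2.0 * a[i][k], a[k][k] - a[i][i])
                c, s = math.cos(theta), math.sin(theta)
                for r in range(p):  # a <- a @ J (columns i and k)
                    ari, ark = a[r][i], a[r][k]
                    a[r][i] = c * ari - s * ark
                    a[r][k] = s * ari + c * ark
                for col in range(p):  # a <- J.T @ a (rows i and k)
                    aic, akc = a[i][col], a[k][col]
                    a[i][col] = c * aic - s * akc
                    a[k][col] = s * aic + c * akc
    return [a[i][i] for i in range(p)]


def explained_variance_ratio(rows):
    """Share of total variance carried by each principal component, largest first."""
    eigenvalues = sorted(jacobi_eigenvalues(covariance_matrix(rows)), reverse=True)
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


if __name__ == "__main__":
    # Hypothetical toy data (NOT from the study): rows = soil samples,
    # columns = variables such as element concentrations and susceptibility.
    toy = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.9], [3.0, 6.2, 1.6], [4.0, 7.9, 2.1]]
    ratios = explained_variance_ratio(toy)
    print("per-PC share:", [round(r, 3) for r in ratios])
    print("PC-1 + PC-2 share:", round(sum(ratios[:2]), 3))
```

Jacobi rotations are used only to keep the sketch dependency-free; with NumPy available, `numpy.linalg.eigh` on the covariance matrix gives the same eigenvalues.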
Affiliation(s)
- Jelena Milinovic
  - Chemistry and Biochemistry Department, Faculty of Sciences, CIQ-UP, Institute of Molecular Sciences (IMS), University of Porto, Rua do Campo Alegre s/n, 4169-007, Porto, Portugal
- Patrícia Santos
  - Institute of Earth Sciences, Pole of University of Porto, 4169-007, Porto, Portugal
  - Department of Geosciences, Environment and Spatial Planning FCUP, University of Porto, 4169-007, Porto, Portugal
- Helena Sant'Ovaia
  - Institute of Earth Sciences, Pole of University of Porto, 4169-007, Porto, Portugal
  - Department of Geosciences, Environment and Spatial Planning FCUP, University of Porto, 4169-007, Porto, Portugal
- Aurora Futuro
  - CERENA, Faculdade de Engenharia da Universidade do Porto, Rua Dr Roberto Frias s/n, 4200-465, Porto, Portugal
- Carlos M Pereira
  - Chemistry and Biochemistry Department, Faculty of Sciences, CIQ-UP, Institute of Molecular Sciences (IMS), University of Porto, Rua do Campo Alegre s/n, 4169-007, Porto, Portugal
- Bramley J Murton
  - NOC, National Oceanography Centre, European Way, Southampton, SO14 3ZH, UK
- Deolinda Flores
  - Institute of Earth Sciences, Pole of University of Porto, 4169-007, Porto, Portugal
  - Department of Geosciences, Environment and Spatial Planning FCUP, University of Porto, 4169-007, Porto, Portugal
- Manuel Azenha
  - Chemistry and Biochemistry Department, Faculty of Sciences, CIQ-UP, Institute of Molecular Sciences (IMS), University of Porto, Rua do Campo Alegre s/n, 4169-007, Porto, Portugal
3
Zou Z, Wang Q, Wu Q, Li M, Zhen J, Yuan D, Zhou M, Xu C, Wang Y, Zhao Y, Yin S, Xu L. Inversion of heavy metal content in soil using hyperspectral characteristic bands-based machine learning method. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 355:120503. [PMID: 38457894 DOI: 10.1016/j.jenvman.2024.120503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2023] [Revised: 01/16/2024] [Accepted: 02/25/2024] [Indexed: 03/10/2024]
Abstract
Global concern regarding the adverse effects of heavy metal pollution in soil has grown significantly. Accurate prediction of heavy metal content in soil is crucial for environmental protection. This study proposes an inversion method for heavy metals (As, Cd, Cr, Cu, Ni, Pb) in soil based on hyperspectral data and machine learning algorithms, using 21 soil reference materials from multiple provinces in China. On this basis, an ensemble learning model called Stacked RF (base models: XGBoost, LightGBM, and CatBoost; meta-model: RF) was established to perform soil heavy metal inversion. Specifically, three popular algorithms were first employed to preprocess the spectral data; Random Forest (RF) was then used to select the best feature bands and reduce the impact of noise; finally, the Stacking model and four basic machine learning algorithms were used to build inversion models for comparison and analysis. Compared with traditional machine learning methods, the stacking model showed greater stability and higher accuracy. The results indicate that machine learning algorithms, especially ensemble learning models, achieve better inversion of heavy metals in soil. Overall, the MF-RF-Stacking model performed best in the inversion of the six heavy metals. The results offer a new perspective on ensemble learning methods for inverting soil heavy metal content from hyperspectral characteristic bands collected from soil reference materials.
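The stacking mechanism can be sketched in plain Python. This is an illustrative stand-in, not the authors' Stacked RF: the study used XGBoost, LightGBM, and CatBoost as base models and RF as the meta-model (after RF band selection), whereas this sketch swaps in a k-nearest-neighbour regressor and ordinary least squares as base learners and least squares as the meta-model, purely so it needs no third-party packages. The essential idea it shows is that the meta-model is trained on out-of-fold base predictions, so it learns how to combine base learners without seeing predictions they made on their own training data. All function names and the toy data are hypothetical.

```python
import random


def solve(m, b):
    """Solve m @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [m[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]  # raises ZeroDivisionError if m is singular
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def fit_linear(X, y):
    """Least-squares linear model with intercept (normal equations)."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(p)]
    w = solve(ata, aty)
    return lambda row: w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))


def fit_knn(X, y, k=3):
    """k-nearest-neighbour regressor (Euclidean distance, mean of k targets)."""
    data = list(zip(X, y))

    def predict(row):
        nearest = sorted(data, key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], row)))[:k]
        return sum(t for _, t in nearest) / len(nearest)
    return predict


def out_of_fold(fit, X, y, n_folds=5, seed=0):
    """Predict each sample with a model that never saw it during training."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    oof = [0.0] * len(X)
    for f in range(n_folds):
        held_out = idx[f::n_folds]
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i] = model(X[i])
    return oof


def fit_stack(base_fits, meta_fit, X, y, n_folds=5):
    """Two-level stacking: out-of-fold base predictions feed the meta-model."""
    meta_X = [list(z) for z in zip(*(out_of_fold(f, X, y, n_folds) for f in base_fits))]
    meta = meta_fit(meta_X, y)
    bases = [f(X, y) for f in base_fits]  # refit each base learner on all data
    return lambda row: meta([b(row) for b in bases])


if __name__ == "__main__":
    # Hypothetical toy data (NOT the study's spectra): one feature, linear target.
    X = [[float(i)] for i in range(20)]
    y = [2.0 * row[0] + 1.0 for row in X]
    base_fits = [lambda X, y: fit_knn(X, y, k=3), fit_linear]
    stacked = fit_stack(base_fits, fit_linear, X, y)
    print(stacked([25.0]))  # close to 51.0 because the toy target is exactly linear
```

scikit-learn's `StackingRegressor` follows the same pattern, training its `final_estimator` on cross-validated predictions of the base estimators, and is the more practical choice when tree-based base models are needed.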
Affiliation(s)
- Zhiyong Zou
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, 625014, China
- Qianlong Wang
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, 625014, China
- Qingsong Wu
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, 625014, China
- Menghua Li
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, 625014, China
- Jiangbo Zhen
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, 625014, China
- Dongyu Yuan
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, 625014, China
- Man Zhou
  - College of Food Science, Sichuan Agricultural University, Ya'an, 625014, China
- Chong Xu
  - Ruijie Networks Co., Ltd., Chengdu, 610000, China
- Yuchao Wang
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, 625014, China
- Yongpeng Zhao
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, 625014, China
- Shutao Yin
  - Institute of Modern Agricultural Industry, China Agricultural University, Chengdu, Sichuan, 611430, China
- Lijia Xu
  - College of Mechanical and Electrical Engineering, Sichuan Agricultural University, Ya'an, 625014, China
4
Agyeman PC, Borůvka L, Kebonye NM, Khosravi V, John K, Drabek O, Tejnecky V. Prediction of the concentration of cadmium in agricultural soil in the Czech Republic using legacy data, preferential sampling, Sentinel-2, Landsat-8, and ensemble models. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 330:117194. [PMID: 36603265 DOI: 10.1016/j.jenvman.2022.117194] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/01/2022] [Revised: 12/23/2022] [Accepted: 12/30/2022] [Indexed: 06/17/2023]
Abstract
The current study assesses and predicts cadmium (Cd) concentration in agricultural soil using two Cd datasets, namely legacy data (LD) and preferential sampling-legacy data (PS-LD), along with four streams of auxiliary data extracted from Sentinel-2 (S2) and Landsat-8 (L8) bands. The study was divided into two contexts: Cd prediction in agricultural soil using LD, ensemble models, and 10 and 20 m spatial resolution S2 and L8 data (context 1), and Cd prediction in agricultural soil using PS-LD, ensemble models, and 10 and 20 m spatial resolution S2 and L8 data (context 2). In context 1, ensemble 1 with L8 and PS-LD was the overall optimal approach, predicting Cd in agricultural soil with a higher R² of 0.76, root mean square error (RMSE) of 0.66, mean absolute error (MAE) of 0.35, and median absolute error (MdAE) of 0.13. In context 2, ensemble 1 with S2 and PS-LD was the best approach for predicting Cd concentration in agricultural soil, with R² = 0.78, RMSE = 0.63, MAE = 0.34, and MdAE = 0.15. Overall, the predictions from both contexts indicated that ensemble 1 with S2 and PS-LD was the most appropriate model for Cd prediction in agricultural soil. The uncertainty of the modeling approaches in both contexts was assessed using ensemble sequential Gaussian simulation (EnSGS), which revealed that the uncertainty propagated in the study area was within 5% in both contexts. The combination of the PS dataset and the LD with ensemble models and the remote sensing dataset produced promising results. The results also showed that the 20 m spatial resolution band dataset outperformed the 10 m dataset in predicting Cd in agricultural soil. Good results are obtained when PS is combined with LD and both an appropriate modeling approach and a well-correlated remote sensing dataset are used.
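The four accuracy measures reported above can be computed as in the sketch below. It assumes R² is the coefficient of determination, 1 - SS_res/SS_tot; some studies instead report the squared Pearson correlation between observed and predicted values, and the abstract does not say which definition was used. The function name `regression_metrics` and the toy values are hypothetical.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """R2 (1 - SS_res/SS_tot), RMSE, MAE and MdAE for paired observations."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mean_obs = statistics.fmean(observed)
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    abs_res = [abs(r) for r in residuals]
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs_res) / n,
        "MdAE": statistics.median(abs_res),
    }


if __name__ == "__main__":
    # Hypothetical observed/predicted Cd values in mg/kg (NOT the study's data).
    obs = [0.21, 0.35, 0.48, 0.30, 1.90]
    pred = [0.25, 0.31, 0.52, 0.28, 1.20]
    print(regression_metrics(obs, pred))
```

Reporting MdAE alongside MAE is informative because the median of absolute errors ignores a few large residuals, whereas MAE and, more strongly, RMSE are pulled upward by them.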
Affiliation(s)
- Prince Chapman Agyeman
  - Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500, Prague, Czech Republic
- Luboš Borůvka
  - Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500, Prague, Czech Republic
- Ndiye Michael Kebonye
  - Department of Geosciences, Chair of Soil Science and Geomorphology, University of Tübingen, Rümelinstr. 19-23, Tübingen, Germany; DFG Cluster of Excellence "Machine Learning: New Perspectives for Science", University of Tübingen, AI Research Building, Maria-von-Linden-Str. 6, 72076, Tübingen, Germany
- Vahid Khosravi
  - Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500, Prague, Czech Republic
- Kingsley John
  - Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500, Prague, Czech Republic
- Ondrej Drabek
  - Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500, Prague, Czech Republic
- Vaclav Tejnecky
  - Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500, Prague, Czech Republic
5
Agyeman PC, Kebonye NM, Khosravi V, Kingsley J, Borůvka L, Vašát R, Boateng CM. Optimal zinc level and uncertainty quantification in agricultural soils via visible near-infrared reflectance and soil chemical properties. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 326:116701. [PMID: 36395645 DOI: 10.1016/j.jenvman.2022.116701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 10/25/2022] [Accepted: 11/01/2022] [Indexed: 06/16/2023]
Abstract
Zinc (Zn) is a vital element required by all living organisms for optimal health and ecosystem functioning. Therefore, several researchers have modeled and mapped its occurrence and distribution in soils. Nonetheless, improving model predictive performance by coupling information derived from visible near-infrared (Vis-NIR) spectra and soil chemical properties to estimate potentially toxic elements (PTEs) such as Zn in agricultural soils remains largely untapped. This study applies two methods to rapidly monitor Zn concentration in agricultural soil: first, Vis-NIR with machine learning algorithms (MLAs) (context 1), and second, Vis-NIR with soil chemical properties (SCP) and MLAs (context 2). Single and combined pretreatment methods were applied to the Vis-NIR data. The following MLAs were used: conditional inference forest (CIF), partial least squares regression (PLSR), M5 tree model (M5), extreme gradient boosting (EGB), and support vector machine regression (SVMR). For context 1, M5-MSC (M5 tree model with multiplicative scatter correction) was promising, with a coefficient of determination (R²) of 0.72, root mean square error (RMSE) of 21.08 mg/kg, median absolute error (MdAE) of 13.69, and ratio of performance to interquartile range (RPIQ) of 1.63. For context 2, CIF with spectral pretreatment and soil properties [CIF-DWTLOGMSC + SCP (conditional inference forest with discrete wavelet transformation, logarithmic transformation, multiplicative scatter correction, and soil chemical properties)] yielded the best performance, with R² = 0.86, RMSE = 14.52 mg/kg, MdAE = 6.25, and RPIQ = 1.78. Across both contexts, the CIF-DWTLOGMSC + SCP approach (context 2) gave the best Zn model for the agricultural soil. The uncertainty maps revealed a low to high error distribution in context 1 and a low to moderate distribution in context 2 for all models except CIF, which had some patches of high uncertainty.
We conclude that a multiple optimization approach for modeling Zn levels in agricultural soils is invaluable and may provide fast and reliable information needed for area-specific decision-making.
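Two of the components named in this abstract, multiplicative scatter correction (MSC) and RPIQ, are simple enough to sketch. MSC regresses each spectrum on a reference spectrum (assumed here to be the mean spectrum, the usual default) and removes the fitted offset and slope; RPIQ divides the interquartile range of the observed values by the RMSE. Quartile conventions differ between software packages, so `method="inclusive"` below is an assumption, and the DWT and logarithmic pretreatments are not shown. The function names and toy values are hypothetical.

```python
import math
import statistics


def msc(spectra, reference=None):
    """Multiplicative scatter correction of equal-length spectra (lists of floats)."""
    n_bands = len(spectra[0])
    if reference is None:  # default reference: mean spectrum across samples
        reference = [statistics.fmean(s[j] for s in spectra) for j in range(n_bands)]
    ref_mean = statistics.fmean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = statistics.fmean(s)
        # Least-squares fit s ~ offset + slope * reference
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(reference, s)) / ref_ss
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in s])
    return corrected


def rpiq(observed, predicted):
    """Ratio of performance to interquartile range: (Q3 - Q1) / RMSE."""
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    rmse = math.sqrt(statistics.fmean((o - p) ** 2 for o, p in zip(observed, predicted)))
    return (q3 - q1) / rmse


if __name__ == "__main__":
    # Hypothetical spectra: the second is a scaled and shifted copy of the first,
    # so MSC maps both onto the same (mean) reference spectrum.
    spectra = [[0.10, 0.20, 0.30, 0.40], [0.30, 0.50, 0.70, 0.90]]
    print(msc(spectra))
    print(rpiq([12.0, 20.0, 31.0, 45.0, 60.0], [15.0, 18.0, 35.0, 40.0, 66.0]))
```

Because RPIQ uses the interquartile range rather than the standard deviation, it is less affected by skewed concentration distributions, which are common for soil PTEs.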
Affiliation(s)
- Prince Chapman Agyeman
  - Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500 Prague, Czech Republic
- Ndiye Michael Kebonye
  - Department of Geosciences, Chair of Soil Science and Geomorphology, University of Tübingen, Rümelinstr. 19-23, Tübingen, Germany; DFG Cluster of Excellence "Machine Learning", University of Tübingen, AI Research Building, Maria-von-Linden-Str. 6, 72076, Tübingen, Germany
- Vahid Khosravi
  - Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500 Prague, Czech Republic
- John Kingsley
  - Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500 Prague, Czech Republic
- Luboš Borůvka
  - Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500 Prague, Czech Republic
- Radim Vašát
  - Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, 16500 Prague, Czech Republic