1
Mwakapesa DS, Lan X, Mao Y. Landslide susceptibility assessment using deep learning considering unbalanced samples distribution. Heliyon 2024; 10:e30107. [PMID: 38707366 PMCID: PMC11068606 DOI: 10.1016/j.heliyon.2024.e30107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 04/18/2024] [Accepted: 04/19/2024] [Indexed: 05/07/2024]
Abstract
Landslide susceptibility assessment (LSA) is fundamental for managing landslide geological disasters. This study presents a deep learning approach (DNN-MSFM) designed to enhance LSA modeling, particularly addressing limitations caused by the unbalanced distribution of data samples in applied datasets. The DNN-MSFM approach combines a deep neural network (DNN) and a mean squared false misclassification (MSFM) loss function to handle unbalanced samples from the algorithmic perspective. The model's performance was evaluated on an unbalanced dataset containing mapping-unit records of 293 landslide samples and 653 non-landslide samples from the Baota District, China. Its effectiveness was assessed through statistical metrics and compared against basic DNN and support vector machine (SVM) models. The results demonstrated a significant performance enhancement from the DNN-MSFM (overall accuracy = 0.889 and area under the receiver operating characteristic curve (AUC) = 0.84), indicating its effectiveness in learning the underlying landslide susceptibility features and its ability to provide improved predictions even in areas with unbalanced landslide samples. Moreover, the study emphasizes the importance of considering balanced loss functions in training DNNs under various imbalance degrees and contributes to expanding the applicability of DNNs in LSA modeling. This study also builds a foundation for further enhancements of deep learning methods for geological disaster assessments.
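The abstract does not define the MSFM loss, so the following is only a minimal sketch of the general idea behind class-balanced squared-error losses: the error is averaged separately per class before the classes are combined, so the 653 non-landslide samples cannot swamp the 293 landslide samples. The function names and the exact combination rule (sum of squared per-class means) are assumptions for illustration, not the authors' implementation.

```python
def class_balanced_squared_loss(y_true, y_prob):
    """Illustrative balanced loss: square and sum the per-class mean squared errors.

    This is an assumed form, not the paper's exact MSFM definition.
    """
    pos = [(t - p) ** 2 for t, p in zip(y_true, y_prob) if t == 1]
    neg = [(t - p) ** 2 for t, p in zip(y_true, y_prob) if t == 0]
    # Averaging within each class keeps the majority class from dominating.
    pos_error = sum(pos) / len(pos) if pos else 0.0  # error on landslide samples
    neg_error = sum(neg) / len(neg) if neg else 0.0  # error on non-landslide samples
    return pos_error ** 2 + neg_error ** 2


def plain_mse(y_true, y_prob):
    """Ordinary mean squared error over all samples, for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical batch: one landslide sample predicted badly, four
# non-landslide samples predicted well.
y_true = [1, 0, 0, 0, 0]
y_prob = [0.1, 0.1, 0.1, 0.1, 0.1]
balanced = class_balanced_squared_loss(y_true, y_prob)
unbalanced = plain_mse(y_true, y_prob)
```

On this toy batch the balanced loss stays high because the single minority sample is misclassified, whereas plain MSE is diluted by the four easy majority samples.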
Affiliation(s)
- Deborah Simon Mwakapesa
  - School of Civil and Surveying & Mapping Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
- Xiaoji Lan
  - School of Civil and Surveying & Mapping Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
- Yimin Mao
  - School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, China
  - School of Information Engineering, Shaoguan University, Shaoguan 512005, China
2
Gu T, Duan P, Wang M, Li J, Zhang Y. Effects of non-landslide sampling strategies on machine learning models in landslide susceptibility mapping. Sci Rep 2024; 14:7201. [PMID: 38532140 DOI: 10.1038/s41598-024-57964-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Accepted: 03/23/2024] [Indexed: 03/28/2024]
Abstract
This study aims to explore the effects of different non-landslide sampling strategies on machine learning models in landslide susceptibility mapping. Non-landslide samples are inherently uncertain, and the selection of non-landslide samples may suffer from issues such as noisy or insufficient regional representations, which can affect the accuracy of the results. In this study, a positive-unlabeled (PU) bagging semi-supervised learning method was introduced for non-landslide sample selection. In addition, buffer control sampling (BCS) and K-means (KM) clustering were applied for comparative analysis. Based on landslide data from Qiaojia County, Yunnan Province, China, collected in 2014, three machine learning models, namely, random forest, support vector machine, and CatBoost, were used for landslide susceptibility mapping. The results show that the quality of samples selected using different non-landslide sampling strategies varies significantly. Overall, the quality of non-landslide samples selected using the PU bagging method is superior, and this method performs best when combined with CatBoost for predicting (AUC = 0.897) landslides in very high and high susceptibility zones (82.14%). Additionally, the KM results indicated overfitting, displaying high accuracy for validation but poor statistical outcomes for zoning. The BCS results were the worst.
Affiliation(s)
- Tengfei Gu
  - Faculty of Geography, Yunnan Normal University, Kunming, 650500, China
  - Badong National Observation and Research Station of Geohazards, China University of Geosciences (Wuhan), Wuhan, 430074, China
- Ping Duan
  - Faculty of Geography, Yunnan Normal University, Kunming, 650500, China
- Mingguo Wang
  - Yunnan Institute of Geological Surveying and Mapping Co., Ltd., Kunming, 650051, China
- Jia Li
  - Faculty of Geography, Yunnan Normal University, Kunming, 650500, China
- Yanke Zhang
  - Wuhan Tianjihang Information Technology Co., Ltd., Wuhan, 430074, China
3
Gu T, Li J, Wang M, Duan P, Zhang Y, Cheng L. Study on landslide susceptibility mapping with different factor screening methods and random forest models. PLoS One 2023; 18:e0292897. [PMID: 37824559 PMCID: PMC10569556 DOI: 10.1371/journal.pone.0292897] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/27/2023] [Accepted: 09/29/2023] [Indexed: 10/14/2023]
Abstract
The number of input factors affects the prediction accuracy of a model. Factor screening plays an important role as the starting point for data input. The aim of this study is to explore the influence of different factor screening methods on the prediction results. Taking the 2014 landslide inventory of Jingdong County as an example, a landslide database was constructed based on 136 landslide events and 11 selected factors, which were randomly divided into a training dataset and a test dataset according to a ratio of 7:3. Four factor screening methods, namely, the information gain ratio (IGR), GeoDetector, Pearson correlation coefficient and multicollinearity test (MT), were selected to screen the factors. A random forest (RF) model was then used in combination with each factor set for landslide susceptibility mapping (LSM). Finally, accuracy validation was performed using confusion matrices and ROC curves. The results show that factor screening is beneficial in improving the accuracy of the resulting model compared to the original model. Second, the IGR_RF model had the highest AUC value (0.9334), which was higher than that of the MT_RF model without factor screening (0.9194), and the IGR_RF model predicted the most landslides in the very high susceptibility zone (51.22%), indicating the good prediction performance of the IGR_RF model. Finally, the factor weighting analysis revealed that NDVI, elevation and aspect had the greatest influence on landslides in Jingdong County and that curvature had the least influence on landslides. This study can provide a reference for factor screening in LSM.
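As a rough illustration of the information gain ratio (IGR) screening mentioned above, the sketch below computes IGR for one categorical conditioning factor against a binary landslide label: information gain divided by the split information (the entropy of the factor itself). The function names and the toy data are hypothetical; the abstract does not describe how the authors discretised their factors.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain_ratio(factor_classes, labels):
    """IGR = (H(labels) - H(labels | factor)) / H(factor)."""
    n = len(labels)
    groups = {}
    for f, y in zip(factor_classes, labels):
        groups.setdefault(f, []).append(y)
    # Entropy of the labels remaining after splitting on the factor.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(factor_classes)
    return gain / split_info if split_info > 0 else 0.0


# Toy data: 1 = landslide, 0 = non-landslide.
labels = [1, 1, 0, 0]
informative = information_gain_ratio(["steep", "steep", "gentle", "gentle"], labels)
uninformative = information_gain_ratio(["north", "south", "north", "south"], labels)
```

A factor that separates the two classes perfectly scores 1, and one that is independent of the label scores 0; factors with low IGR would be candidates for removal.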
Affiliation(s)
- Tengfei Gu
  - Faculty of Geography, Yunnan Normal University, Kunming, Yunnan Province, China
  - Badong National Observation and Research Station of Geohazards, China University of Geosciences (Wuhan), Wuhan, Hubei Province, China
- Jia Li
  - Faculty of Geography, Yunnan Normal University, Kunming, Yunnan Province, China
- Mingguo Wang
  - Yunnan Institute of Geological Surveying and Mapping Company Limited, Kunming, Yunnan Province, China
- Ping Duan
  - Faculty of Geography, Yunnan Normal University, Kunming, Yunnan Province, China
- Yanke Zhang
  - Wuhan Tianjihang Information Technology Company Limited, Wuhan, Hubei Province, China
- Libo Cheng
  - Faculty of Geography, Yunnan Normal University, Kunming, Yunnan Province, China
4
Chang W, Wang X, Yang J, Qin T. An Improved CatBoost-Based Classification Model for Ecological Suitability of Blueberries. SENSORS (BASEL, SWITZERLAND) 2023; 23:1811. [PMID: 36850409 PMCID: PMC9961688 DOI: 10.3390/s23041811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
Selecting the best planting area for blueberries is an essential issue in agriculture. To better improve the effectiveness of blueberry cultivation, a machine learning-based classification model for blueberry ecological suitability was proposed for the first time and its validation was conducted by using multi-source environmental features data in this paper. The sparrow search algorithm (SSA) was adopted to optimize the CatBoost model and classify the ecological suitability of blueberries based on the selection of data features. Firstly, the Borderline-SMOTE algorithm was used to balance the number of positive and negative samples. The Variance Inflation Factor and information gain methods were applied to filter out the factors affecting the growth of blueberries. Subsequently, the processed data were fed into the CatBoost for training, and the parameters of the CatBoost were optimized to obtain the optimal model using SSA. Finally, the SSA-CatBoost model was adopted to classify the ecological suitability of blueberries and output the suitability types. Taking a study on a blueberry plantation in Majiang County, Guizhou Province, China as an example, the findings demonstrate that the AUC value of the SSA-CatBoost-based blueberry ecological suitability model is 0.921, which is 2.68% higher than that of the CatBoost (AUC = 0.897) and is significantly higher than Logistic Regression (AUC = 0.855), Support Vector Machine (AUC = 0.864), and Random Forest (AUC = 0.875). Furthermore, the ecological suitability of blueberries in Majiang County is mapped according to the classification results of different models. When comparing the actual blueberry cultivation situation in Majiang County, the classification results of the SSA-CatBoost model proposed in this paper matches best with the real blueberry cultivation situation in Majiang County, which is of a high reference value for the selection of blueberry cultivation sites.
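The "2.68% higher" figure is a relative improvement over the CatBoost baseline, not a difference in percentage points. A short check, using only the two AUC values quoted in the abstract (the helper name is ours):

```python
def relative_gain_percent(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# AUC values quoted in the abstract: SSA-CatBoost vs. plain CatBoost.
print(round(relative_gain_percent(0.921, 0.897), 2))  # 2.68
```

The absolute difference is 0.024, that is, 2.4 percentage points of AUC.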
Affiliation(s)
- Wenfeng Chang
  - Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
- Xiao Wang
  - Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
- Jing Yang
  - Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
- Tao Qin
  - Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
5
Huang J, Lv P, Lian Y, Zhang M, Ge X, Li S, Pan Y, Zhao J, Xu Y, Tang H, Li N, Zhang Z. Construction of machine learning tools to predict threatened miscarriage in the first trimester based on AEA, progesterone and β-hCG in China: a multicentre, observational, case-control study. BMC Pregnancy Childbirth 2022; 22:697. [PMID: 36085038 PMCID: PMC9461209 DOI: 10.1186/s12884-022-05025-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Accepted: 09/05/2022] [Indexed: 11/30/2022]
Abstract
Background: Endocannabinoid anandamide (AEA), progesterone (P4) and β-human chorionic gonadotrophin (β-hCG) are associated with threatened miscarriage in the early stage. However, no study has investigated whether combining these three hormones could predict threatened miscarriage. Thus, we aim to establish machine learning models utilizing these three hormones to predict threatened miscarriage risk. Methods: This is a multicentre, observational, case-control study involving 215 pregnant women. We recruited 119 normal pregnant women and 96 pregnant women with threatened miscarriage, including 58 women with ongoing pregnancy and 38 women with inevitable miscarriage. P4 and β-hCG levels were detected by chemiluminescence immunoassay. The level of AEA was tested by ultra-high-performance liquid chromatography-tandem mass spectrometry. Six predictive machine learning models were established and evaluated by the confusion matrix, area under the receiver operating characteristic (ROC) curve (AUC), accuracy and precision. Results: The median concentration of AEA was significantly lower in the healthy pregnant women group than that in the threatened miscarriage group, while the median concentration of P4 was significantly higher in the normal pregnancy group than that in the threatened miscarriage group. Only the median level of P4 was significantly lower in the inevitable miscarriage group than that in the ongoing pregnancy group. Moreover, AEA is strongly positively correlated with threatened miscarriage, while P4 is negatively correlated with both threatened miscarriage and inevitable miscarriage. Interestingly, AEA and P4 are negatively correlated with each other. Among six models, logistic regression (LR), support vector machine (SVM) and multilayer perceptron (MLP) models obtained AUC values of 0.75, 0.70 and 0.70, respectively, and their accuracy and precision were all above 0.60. Among these three models, the LR model showed the highest accuracy (0.65) and precision (0.70) in predicting threatened miscarriage. Conclusions: The LR model showed the highest overall predictive power; thus, machine learning combined with the levels of AEA, P4 and β-hCG might be a new approach to predicting threatened miscarriage risk in the near future. Supplementary Information: The online version contains supplementary material available at 10.1186/s12884-022-05025-y.
6
Ge R, Lv Y, Tao W. A Statistical Prediction Model for Healthcare and Landslide Sensitivity Evaluation in Coal Mining Subsidence Area. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1805689. [PMID: 35607472 PMCID: PMC9124098 DOI: 10.1155/2022/1805689] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/30/2022] [Revised: 04/10/2022] [Accepted: 04/19/2022] [Indexed: 11/26/2022]
Abstract
The purpose of this study is to compare the results of the frequency ratio (FR) model with those of the weight of evidence (WOE) and logistic regression (LR) methods when applied to landslide susceptibility evaluation in coal mining subsidence areas. Key geological disaster prevention and control areas are taken as the research areas. Field investigation is carried out according to the landslide disaster points recorded in the past five years, and 86 landslide disaster points are determined from remote sensing satellite images. Furthermore, 12 factors affecting the occurrence of landslides are selected as landslide sensitivity evaluation factors. Among them, slope degree, curvature, elevation, and slope aspect are derived from the digital elevation model (DEM) at 30 m × 30 m resolution. The DEM datasets are derived from the geospatial data cloud, lithology datasets are derived from geological lithology maps, and the land use type map is derived from the current situation of national land use. The distances to roads and to coal mining subsidence areas are calculated according to field investigation and remote sensing image interpretation results. In addition, the evaluation model includes an annual rainfall distribution map. Finally, the accuracy of the three models is compared by ROC curve analysis. The evaluation results demonstrate that the frequency ratio-logistic regression (FR-LR) model achieves the highest accuracy (0.913), followed by the FR model and the frequency ratio-weight of evidence (FR-WOE) model, respectively. Thus, using the LR method based on the FR model has guiding significance for predicting landslide sensitivity in coal mining areas. This reduces probable risks and disasters that affect human health and, in turn, ensures higher safety from the healthcare perspective in the mining fields.
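For readers unfamiliar with the frequency ratio model, a minimal sketch follows. It uses the usual definition: for each class of a factor, the share of landslide points falling in that class divided by the share of the study area occupied by that class. The class boundaries and counts below are invented for illustration (only the total of 86 landslide points comes from the abstract).

```python
def frequency_ratio(landslides_per_class, cells_per_class):
    """FR per class = (share of landslides in class) / (share of area in class)."""
    total_landslides = sum(landslides_per_class.values())
    total_cells = sum(cells_per_class.values())
    return {
        name: (landslides_per_class.get(name, 0) / total_landslides)
        / (cells_per_class[name] / total_cells)
        for name in cells_per_class
    }


# Hypothetical slope classes (degrees): landslide counts sum to 86,
# cell counts are made-up 30 m x 30 m grid cells.
landslides = {"0-15": 10, "15-30": 46, ">30": 30}
cells = {"0-15": 5000, "15-30": 3000, ">30": 2000}
fr = frequency_ratio(landslides, cells)
```

An FR above 1 marks a class where landslides are over-represented relative to its area; values below 1 mark under-represented classes.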
Affiliation(s)
- Ruoxin Ge
  - School of Mining Engineering, Taiyuan University of Technology, Taiyuan 030024, China
- Yiqing Lv
  - School of Mining Engineering, Taiyuan University of Technology, Taiyuan 030024, China
- Weiheng Tao
  - School of Mining Engineering, Taiyuan University of Technology, Taiyuan 030024, China
7
Ge X, Zhang A, Li L, Sun Q, He J, Wu Y, Tan R, Pan Y, Zhao J, Xu Y, Tang H, Gao Y. Application of machine learning tools: Potential and useful approach for the prediction of type 2 diabetes mellitus based on the gut microbiome profile. Exp Ther Med 2022; 23:305. [PMID: 35340868 PMCID: PMC8931625 DOI: 10.3892/etm.2022.11234] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/11/2021] [Accepted: 02/09/2022] [Indexed: 12/07/2022]
Abstract
The gut microbiota plays an important role in the regulation of the immune system and the metabolism of the host. The aim of the present study was to characterize the gut microbiota of patients with type 2 diabetes mellitus (T2DM). A total of 118 participants with newly diagnosed T2DM and 89 control subjects were recruited in the present study; six clinical parameters were collected and the quantity of 10 different types of bacteria was assessed in the fecal samples using quantitative PCR. Taking into consideration the six clinical variables and the quantity of the 10 different bacteria, 3 predictive models were established in the training set and test set, and evaluated using a confusion matrix, area under the receiver operating characteristic curve (AUC) values, sensitivity (recall), specificity, accuracy, positive predictive value and negative predictive value (NPV). The abundance of Bacteroides, Eubacterium rectale and Roseburia inulinivorans was significantly lower in the T2DM group compared with the control group. However, the abundance of Enterococcus was significantly higher in the T2DM group compared with the control group. In addition, Faecalibacterium prausnitzii, Enterococcus and Roseburia inulinivorans were significantly associated with sex status while Bacteroides, Bifidobacterium, Enterococcus and Roseburia inulinivorans were significantly associated with older age. In the training set, among the three models, support vector machine (SVM) and XGBoost models obtained AUC values of 0.72 and 0.70, respectively. In the test set, only SVM obtained an AUC value of 0.77, and the precision and specificity were both above 0.77, whereas the accuracy, recall and NPV were above 0.60. Furthermore, Bifidobacterium, age and Roseburia inulinivorans played pivotal roles in the model. In conclusion, the SVM model exhibited the highest overall predictive power; thus, the combined use of machine learning tools with gut microbiome profiling may be a promising approach for improving early prediction of T2DM in the near future.
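The evaluation measures listed in this abstract all derive from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts in the example are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier."""
    return {
        "sensitivity": tp / (tp + fn),  # recall: true positives found
        "specificity": tn / (tn + fp),  # true negatives found
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),          # precision
        "npv": tn / (tn + fn),
    }


# Hypothetical test-set counts (not from the paper).
metrics = binary_metrics(tp=20, fp=5, tn=15, fn=10)
```

Precision and positive predictive value are the same quantity, which is why the abstract can report "precision" among the results after listing "positive predictive value" among the measures.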
Affiliation(s)
- Xiaochun Ge
  - Department of Endocrinology, Affiliated Hospital of Chengde Medical University, Chengde, Hebei 067000, P.R. China
- Aimin Zhang
  - Department of Endocrinology, Affiliated Hospital of Chengde Medical University, Chengde, Hebei 067000, P.R. China
- Lihui Li
  - Department of Endocrinology, Affiliated Hospital of Chengde Medical University, Chengde, Hebei 067000, P.R. China
- Qitian Sun
  - Department of Endocrinology, Affiliated Hospital of Chengde Medical University, Chengde, Hebei 067000, P.R. China
- Jianqiu He
  - Department of Endocrinology, Affiliated Hospital of Chengde Medical University, Chengde, Hebei 067000, P.R. China
- Yu Wu
  - Shanghai Biotecan Pharmaceuticals Co., Ltd., Shanghai 201204, P.R. China
- Rundong Tan
  - Shanghai Biotecan Pharmaceuticals Co., Ltd., Shanghai 201204, P.R. China
- Yingxia Pan
  - Shanghai Biotecan Pharmaceuticals Co., Ltd., Shanghai 201204, P.R. China
- Jiangman Zhao
  - Shanghai Biotecan Pharmaceuticals Co., Ltd., Shanghai 201204, P.R. China
- Yue Xu
  - Shanghai Biotecan Pharmaceuticals Co., Ltd., Shanghai 201204, P.R. China
- Hui Tang
  - Shanghai Biotecan Pharmaceuticals Co., Ltd., Shanghai 201204, P.R. China
- Yu Gao
  - Department of Endocrinology, Affiliated Hospital of Chengde Medical University, Chengde, Hebei 067000, P.R. China
8
Ghasemian B, Shahabi H, Shirzadi A, Al-Ansari N, Jaafari A, Kress VR, Geertsema M, Renoud S, Ahmad A. A Robust Deep-Learning Model for Landslide Susceptibility Mapping: A Case Study of Kurdistan Province, Iran. SENSORS (BASEL, SWITZERLAND) 2022; 22:1573. [PMID: 35214473 PMCID: PMC8878333 DOI: 10.3390/s22041573] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/24/2021] [Revised: 01/26/2022] [Accepted: 01/26/2022] [Indexed: 06/14/2023]
Abstract
We mapped landslide susceptibility in Kamyaran city of Kurdistan Province, Iran, using a robust deep-learning (DP) model based on a combination of extreme learning machine (ELM), deep belief network (DBN), back propagation (BP), and genetic algorithm (GA). A total of 118 landslide locations were recorded and divided in the training and testing datasets. We selected 25 conditioning factors, and of these, we specified the most important ones by an information gain ratio (IGR) technique. We assessed the performance of the DP model using statistical measures including sensitivity, specificity, accuracy, F1-measure, and area under-the-receiver operating characteristic curve (AUC). Three benchmark algorithms, i.e., support vector machine (SVM), REPTree, and NBTree, were used to check the applicability of the proposed model. The results by IGR concluded that of the 25 conditioning factors, only 16 factors were important for our modeling procedure, and of these, distance to road, road density, lithology and land use were the four most significant factors. Results based on the testing dataset revealed that the DP model had the highest accuracy (0.926) of the compared algorithms, followed by NBTree (0.917), REPTree (0.903), and SVM (0.894). The landslide susceptibility maps prepared from the DP model with AUC = 0.870 performed the best. We consider the DP model a suitable tool for landslide susceptibility mapping.
Affiliation(s)
- Bahareh Ghasemian
  - Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 6617715175, Iran
- Himan Shahabi
  - Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj 6617715175, Iran
- Ataollah Shirzadi
  - Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj 6617715175, Iran
- Nadhir Al-Ansari
  - Civil, Environmental and Natural Resources Engineering, Lulea University of Technology, 97187 Lulea, Sweden
- Abolfazl Jaafari
  - Research Institute of Forests and Rangelands, Agricultural Research, Education and Extension Organization (AREEO), Tehran 1496813111, Iran
- Victoria R. Kress
  - Department of Ecosystem Science and Management, University of Northern British Columbia, 3333 University Way, Prince George, BC V2N 4Z9, Canada
- Marten Geertsema
  - Research Geomorphologist, Ministry of Forests, Lands, Natural Resource Operations and Rural Development, 499 George Street, Prince George, BC V2L 1R5, Canada
- Somayeh Renoud
  - Data Mining Laboratory, Department of Engineering, College of Farabi, University of Tehran, Tehran 1417935840, Iran
- Anuar Ahmad
  - Department of Geoinformation, Faculty of Built Environment and Surveying, Universiti Teknologi Malaysia (UTM), Johor Bahru 81310, Malaysia
9
Multivariate Analysis and Machine Learning Approach for Mapping the Variability and Vulnerability of Urban Flooding: The Case of Tangier City, Morocco. HYDROLOGY 2021. [DOI: 10.3390/hydrology8040182] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
Urban flooding is a complex natural hazard, driven by the interaction between several parameters related to urban development in a context of climate change, which makes it highly variable in space and time and challenging to predict. In this study, we apply a multivariate analysis method (PCA) and four machine learning algorithms to investigate and map the variability and vulnerability of urban floods in the city of Tangier, northern Morocco. Thirteen parameters that could potentially affect urban flooding were selected and divided into two categories: geo-environmental parameters and socio-economic parameters. PCA processing identified and classified six principal components (PCs), accounting for 73% of the initial information. The scores of the parameters on the PCs and the spatial distribution of the PCs highlight the interconnection between the topographic properties and urban characteristics (population density and building density) as the main source of variability of flooding, followed by the relationship between the drainage (drainage density and distance to channels) and urban properties. All four machine learning algorithms show excellent performance in predicting urban flood vulnerability (area under the ROC curve > 0.9). The Classification and Regression Tree and Support Vector Machine models show the best prediction performance (ACC = 91.6%). Urban flood vulnerability maps highlight, on the one hand, low lands with a high drainage density and recent buildings, and on the other, higher, steep-sloping areas with old buildings and a high population density, as areas of high to very-high vulnerability.
10
Landslide Susceptibility Mapping in the Vrancea-Buzău Seismic Region, Southeast Romania. GEOSCIENCES 2021. [DOI: 10.3390/geosciences11120495] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
This study presents the results of a landslide susceptibility analysis applied to the Vrancea-Buzău seismogenic region in the Carpathian Mountains, Romania. The target area is affected by a large diversity of landslide processes. Slopes are made-up of various types of rocks, climatic conditions can be classified as wet, and the area is a seismically active one. All this contributes to the observed high landslide hazard. The paper analyses the spatial component of the landslide hazard affecting the target area, the regional landslide susceptibility. First, an existing landslide inventory was completed to cover a wider area for the landslide susceptibility analysis. Second, two types of methods are applied, a purely statistical technique, based on correlations between landslide occurrence and local conditions, as well as the simplified spatial process-based Newmark Displacement analysis. Landslide susceptibility maps have been produced by applying both methods, the second one also allowing us to simulate different scenarios, based on various soil saturation rates and seismic inputs. Furthermore, landslide susceptibility was computed both for the landslide source and runout zones—the first providing information about areas where landslides are preferentially triggered and the second indicating where landslides preferentially move along the slope and accumulate. The analysis showed that any of the different methods applied produces reliable maps of landslide susceptibility. However, uncertainties were also outlined as validation is insufficient, especially in the northern area, where only a few landslides could be mapped due to the intense vegetation cover.
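The abstract does not give the equations used in its Newmark Displacement analysis. As a hedged illustration only, the sketch below computes the critical (yield) acceleration in the commonly used simplified form a_c = (FS - 1) · g · sin(α), where FS is the static factor of safety and α the slope angle. Whether the authors used exactly this form, and how they derived FS for each saturation scenario, is an assumption here; the function name and the input values are hypothetical.

```python
import math


def critical_acceleration(factor_of_safety, slope_deg, g=9.81):
    """Simplified Newmark critical acceleration in m/s^2.

    a_c = (FS - 1) * g * sin(slope angle); assumed form, not taken from the paper.
    """
    return (factor_of_safety - 1.0) * g * math.sin(math.radians(slope_deg))


# Hypothetical cell: drier soil (higher FS) vs. more saturated soil (lower FS).
a_c_dry = critical_acceleration(1.5, 30.0)
a_c_wet = critical_acceleration(1.1, 30.0)
```

A lower factor of safety, as expected under higher soil saturation, lowers the critical acceleration, so weaker ground shaking is enough to start displacement. This is the mechanism that lets such an analysis compare saturation and seismic scenarios.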
11
A Meta-Learning Approach of Optimisation for Spatial Prediction of Landslides. REMOTE SENSING 2021. [DOI: 10.3390/rs13224521] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023]
Abstract
Optimisation plays a key role in the application of machine learning in the spatial prediction of landslides. The common practice in optimising landslide prediction models is to search for optimal/suboptimal hyperparameter values in a number of predetermined hyperparameter configurations based on an objective function, e.g., k-fold cross-validation accuracy. However, the overhead of hyperparameter optimisation can be prohibitive, especially for computationally expensive algorithms. This paper introduces an optimisation approach based on meta-learning for the spatial prediction of landslides. The proposed approach is tested in a dense tropical forested area of Cameron Highlands, Malaysia. Instead of optimising prediction models with a large number of hyperparameter configurations, the proposed approach begins with promising configurations based on several basic and statistical meta-features. The proposed meta-learning approach was tested based on Bayesian optimisation as a hyperparameter tuning algorithm and random forest (RF) as a prediction model. The spatial database was established with a total of 63 historical landslides and 15 conditioning factors. Three RF models were constructed based on (1) default parameters as suggested by the sklearn library, (2) parameters suggested by the Bayesian optimisation (BO), and (3) parameters suggested by the proposed meta-learning approach (BO-ML). Based on five-fold cross-validation accuracy, the Bayesian method achieved the best performance for both the training (0.810) and test (0.802) datasets. The meta-learning approach achieved slightly lower accuracies than the Bayesian method for the training (0.769) and test (0.800) datasets. Similarly, based on F1-score and area under the receiver operating characteristic curve (AUROC), the models with optimised parameters either by the Bayesian or meta-learning methods produced more accurate landslide susceptibility assessment than the model with the default parameters. In the present approach, instead of learning from scratch, the meta-learning would begin with hyperparameter configurations optimal for the most similar previous datasets, which can be considerably helpful and time-saving for landslide modelling.
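The "common practice" described at the start of this abstract, scoring each predetermined hyperparameter configuration by its mean k-fold cross-validation score and keeping the best, can be sketched without any library. Everything below is a generic illustration: `evaluate` stands in for training and scoring a real model, and the configuration key `n_trees` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous (train, test) index folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def select_best(configs, evaluate, n_samples, k=5):
    """Return the configuration with the highest mean cross-validation score."""
    def mean_cv_score(config):
        scores = [evaluate(config, train, test)
                  for train, test in k_fold_indices(n_samples, k)]
        return sum(scores) / len(scores)
    return max(configs, key=mean_cv_score)


# Stand-in scorer: pretends accuracy peaks at 100 trees (illustrative only).
def evaluate(config, train, test):
    return -abs(config["n_trees"] - 100)


configs = [{"n_trees": 10}, {"n_trees": 100}, {"n_trees": 500}]
best = select_best(configs, evaluate, n_samples=63, k=5)
```

Every configuration costs k model fits, which is the overhead the abstract refers to; starting from configurations that worked on similar datasets shrinks the list that has to be searched. The folds here are contiguous and unshuffled, so real data would normally be shuffled first.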
12
Jia WJ, Wang MF, Zhou CH, Yang QH. Analysis of the spatial association of geographical detector-based landslides and environmental factors in the southeastern Tibetan Plateau, China. PLoS One 2021; 16:e0251776. [PMID: 34014965 PMCID: PMC8136732 DOI: 10.1371/journal.pone.0251776] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/21/2020] [Accepted: 05/04/2021] [Indexed: 11/25/2022]
Abstract
Steep canyons surrounded by high mountains resulting from large-scale landslides characterize the study area located in the southeastern part of the Tibetan Plateau. A total of 1766 large landslides were identified based on integrated remote sensing interpretations utilizing multisource satellite images and topographic data that were dominated by 3 major regional categories, namely, rockslides, rock falls, and flow-like landslides. The geographical detector method was applied to quantitatively unveil the spatial association between the landslides and 12 environmental factors through computation of the q values based on spatially stratified heterogeneity. Meanwhile, a certainty factor (CF) model was used for comparison. The results indicate that the q values of the 12 influencing factors vary obviously, and the dominant factors are also different for the 3 types of landslides, with annual mean precipitation (AMP) being the dominant factor for rockslide distribution, elevation being the dominant factor for rock fall distribution and lithology being the dominant factor for flow-like distribution. Integrating the results of the factor detector and ecological detector, the AMP, annual mean temperature (AMT), elevation, river density, fault distance and lithology have a stronger influence on the spatial distribution of landslides than other factors. Furthermore, the factor interactions can significantly enhance their interpretability of landslides, and the top 3 dominant interactions were revealed. Based on statistics of landslide discrepancies with respect to diverse stratification of each factor, the high-risk zones were identified for 3 types of landslides, and the results were contrasted with the CF model. In conclusion, our method provides an objective framework for landslide prevention and mitigation through quantitative, spatial and statistical analyses in regions with complex terrain.
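The q value of the geographical detector's factor detector is usually defined as q = 1 - Σ_h N_h σ_h² / (N σ²), that is, one minus the ratio of within-stratum to total sum of squares. The sketch below implements that standard definition; the function name and the toy values are ours, and the abstract gives no details of the authors' stratification.

```python
def q_statistic(values, strata):
    """Factor-detector q: 1 - (within-stratum sum of squares) / (total sum of squares)."""
    n = len(values)
    mean = sum(values) / n
    total_ss = sum((v - mean) ** 2 for v in values)  # equals N * sigma^2
    if total_ss == 0:
        raise ValueError("q is undefined when the values have zero variance")
    groups = {}
    for v, s in zip(values, strata):
        groups.setdefault(s, []).append(v)
    within_ss = 0.0
    for group in groups.values():
        group_mean = sum(group) / len(group)
        within_ss += sum((v - group_mean) ** 2 for v in group)  # N_h * sigma_h^2
    return 1 - within_ss / total_ss


# Toy landslide density per cell, stratified by a hypothetical factor class.
density = [1.0, 1.0, 0.0, 0.0]
q_explains = q_statistic(density, ["wet", "wet", "dry", "dry"])
q_unrelated = q_statistic(density, ["A", "B", "A", "B"])
```

q ranges from 0 (the stratification explains none of the spatial variance) to 1 (it explains all of it), so the "dominant factor" for a landslide type is the one with the largest q.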
Affiliation(s)
- Wei-Jie Jia
- State Key Laboratory of Resource and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
- University of the Chinese Academy of Sciences, Beijing, China
- Department of Geological Remote Sensing, China Aero Geophysical Survey and Remote Sensing Center for Natural Resources, Beijing, China
| | - Meng-Fei Wang
- Department of Geological Remote Sensing, China Aero Geophysical Survey and Remote Sensing Center for Natural Resources, Beijing, China
| | - Cheng-Hu Zhou
- State Key Laboratory of Resource and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, China
| | - Qing-Hua Yang
- Department of Geological Remote Sensing, China Aero Geophysical Survey and Remote Sensing Center for Natural Resources, Beijing, China
| |
|
13
|
Systematic Review of Machine Learning Applications in Mining: Exploration, Exploitation, and Reclamation. MINERALS 2021. [DOI: 10.3390/min11020148] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 12/11/2022]
Abstract
Recent developments in smart mining technology have enabled the production, collection, and sharing of a large amount of data in real time. Therefore, research employing machine learning (ML) that utilizes these data is being actively conducted in the mining industry. In this study, we reviewed 109 research papers, published over the past decade, that discuss ML techniques for mineral exploration, exploitation, and mine reclamation. Research trends, ML models, and evaluation methods primarily discussed in the 109 papers were systematically analyzed. The results demonstrated that ML studies have been actively conducted in the mining industry since 2018, mostly for mineral exploration. Among the ML models, support vector machine was utilized the most, followed by deep learning models. The ML models were evaluated mostly in terms of their root mean square error and coefficient of determination.
|
14
|
Comparison of machine learning tools for the prediction of AMD based on genetic, age, and diabetes-related variables in the Chinese population. Regen Ther 2021; 15:180-186. [PMID: 33426217 PMCID: PMC7770346 DOI: 10.1016/j.reth.2020.09.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/16/2020] [Revised: 09/01/2020] [Accepted: 09/09/2020] [Indexed: 11/23/2022] Open
Abstract
Introduction: Age-related macular degeneration (AMD) is the main cause of visual impairment and the most important cause of blindness in older people. However, there is currently no effective treatment for this disease, so it is necessary to establish a risk model to predict AMD development. Methods: This study included a total of 202 subjects, comprising 82 AMD patients and 120 control subjects. Sixty-six single-nucleotide polymorphisms (SNPs) were identified using the MassArray assay. Considering 14 independent clinical variables as well as SNPs, four predictive models were established in the training set and evaluated using the confusion matrix and the area under the receiver operating characteristic (ROC) curve (AUROC). Differences in the distributions of the 14 independent clinical features between the AMD and control groups were tested using the chi-squared test. Age and diabetes were adjusted for using logistic regression analysis, and the “genomic-control” method was used for multiple testing correction. Results: Three SNPs (rs10490924, OR = 1.686, genomic-control corrected p-value (GC) = 0.030; rs2338104, OR = 1.794, GC = 0.025; and rs1864163, OR = 2.125, GC = 0.038) were significant risk factors for AMD development. In the training set, four models obtained AUROC values above 0.72. Conclusions: We believe machine learning tools will be useful for the early prediction of AMD and for the development of relevant intervention strategies.
|
15
|
Combining Evolutionary Algorithms and Machine Learning Models in Landslide Susceptibility Assessments. REMOTE SENSING 2020. [DOI: 10.3390/rs12233854] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/16/2022]
Abstract
The main objective of the present study is to introduce a novel predictive model that combines evolutionary algorithms and machine learning (ML) models to construct a landslide susceptibility map. Genetic algorithms (GA) are used as a feature selection method, whereas the particle swarm optimization (PSO) method is used to optimize the structural parameters of two ML models, support vector machines (SVM) and artificial neural network (ANN). A well-defined spatial database, which included 335 landslides and twelve landslide-related variables (elevation, slope angle, slope aspect, curvature, plan curvature, profile curvature, topographic wetness index, stream power index, distance to faults, distance to river, lithology, and hydrological cover), is considered for the analysis in the Achaia Regional Unit, located in Northern Peloponnese, Greece. The outcome of the study illustrates that both ML models perform excellently, with the SVM model achieving the highest learning accuracy (0.977 area under the receiver operating characteristic curve value (AUC)), followed by the ANN model (0.969). However, the ANN model shows the highest prediction accuracy (0.800 AUC), followed by the SVM model (0.750 AUC). Overall, the proposed ML models highlight the necessity of feature selection and tuning procedures via evolutionary optimization algorithms and show that such approaches could be successfully used for landslide susceptibility mapping as an alternative investigation tool.
|
16
|
Duan R, Xue W, Wang K, Yin N, Hao H, Chu H, Wang L, Meng P, Diao L. Estimation of the LDL subclasses in ischemic stroke as a risk factor in a Chinese population. BMC Neurol 2020; 20:414. [PMID: 33183255 PMCID: PMC7664065 DOI: 10.1186/s12883-020-01989-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 08/24/2020] [Accepted: 10/30/2020] [Indexed: 01/09/2023] Open
Abstract
Background: Acute ischemic stroke (AIS) is one of the leading causes of mortality and long-term disability worldwide. Our study aims to clarify the role of low-density lipoprotein (LDL) subclasses in the occurrence of AIS and to develop a risk prediction model based on these characteristics to identify high-risk people. Methods: Five hundred and sixty-six patients with AIS and 197 non-AIS controls were included in this study. Serum lipids and other baseline characteristics, including fasting blood glucose (GLU), serum creatinine (Scr), and blood pressure, were investigated in relation to the occurrence of AIS. The LDL subfractions were classified and measured with the Lipoprint System by a polyacrylamide gel electrophoresis technique. Results: Levels of the LDL-3, LDL-4, and LDL-5 subclasses were significantly higher in the AIS group than in the non-AIS group, and a lower level of LDL-1 was prevalent in the AIS patients. Consistently, the Spearman correlation coefficient demonstrated that sd-LDL levels, especially LDL-3 and LDL-4 levels, were significantly positively correlated with AIS. Furthermore, there is a significant positive correlation between small dense LDL (sd-LDL, that is, LDL-3 to 7) levels and serum lipids including total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), and triglyceride (TG). Increased LDL-3 and LDL-4 as well as decreased LDL-1 and LDL-2 were correlated with the occurrence of AIS, even in people with normal LDL-C levels. A new prediction model including 12 variables can accurately predict the AIS risk in Chinese patients (AUC = 0.82 ± 0.04). Conclusions: Levels of LDL subclasses should be considered in addition to serum LDL-C in the assessment and management of AIS. A new prediction model based on clinical variables including LDL subfractions can help clinicians identify a high risk of AIS, even in people with normal LDL-C levels.
Affiliation(s)
- Ruisheng Duan
- Department of Neurology, Hebei General Hospital, Shijiazhuang, 050000, Hebei, China
| | - Wenjun Xue
- Department of Neurology, the First People's Hospital of Pingdingshan, Henan, 467000, Pingdingshan, China
| | - Kunpeng Wang
- Department of Neurosurgery, the Affiliated Hospital of Chengde Medical University, Chengde, 067000, Hebei, China
| | - Nan Yin
- Department of Neurology, Hebei General Hospital, Shijiazhuang, 050000, Hebei, China
| | - Hongyu Hao
- Department of Neurology, Hebei General Hospital, Shijiazhuang, 050000, Hebei, China
| | - Hongshan Chu
- Department of Neurology, Hebei General Hospital, Shijiazhuang, 050000, Hebei, China
| | - Lijun Wang
- Department of Medicine, Shanghai Zhangjiang Institute of Medical Innovation, Biotecan Pharmaceuticals Co., Ltd., Shanghai, 201204, China
| | - Peng Meng
- Department of Medicine, Shanghai Zhangjiang Institute of Medical Innovation, Biotecan Pharmaceuticals Co., Ltd., Shanghai, 201204, China
| | - Le Diao
- Department of Medicine, Shanghai Zhangjiang Institute of Medical Innovation, Biotecan Pharmaceuticals Co., Ltd., Shanghai, 201204, China
| |
|
17
|
Separating Landslide Source and Runout Signatures with Topographic Attributes and Data Mining to Increase the Quality of Landslide Inventory. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10196652] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
Abstract
Landslide sources and runout features of typical natural terrain landslides can be observed from a geotechnical perspective. Landslide sources are the major area of occurrences, whereas runout signatures reveal the subsequent phenomena caused by unstable gravity. Remotely sensed landslide detection generally includes runout areas, unless these results have been excluded manually through detailed comparison with stereo aerial photos and other auxiliary data. Areas detected using remotely sensed landslide detection can be referred to as “landslide-affected” areas. The runout areas should be separated from landslide-affected areas when upgrading landslide detections into a landslide inventory to avoid unreliable results caused by impure samples. A supervised data mining procedure was developed to separate landslide sources and runout areas based on four topographic attributes derived from a 10-m digital elevation model, using a random forest algorithm and cost-sensitive analysis. This approach was compared with commonly used methods, namely support vector machine (SVM) and logistic regression (LR). The Typhoon Morakot event in the Laonong River watershed, southern Taiwan, was modeled. The developed models, constructed using limited training data sets, could separate landslide source and runout signatures, as verified using the polygon and area constraint-based datasets. Furthermore, the developed models outperformed the SVM and LR algorithms, achieving over 80% in overall accuracy, area under the receiver operating characteristic curve, user’s accuracy, and producer’s accuracy in most cases. Agreement between the area sizes of the inventory polygons used for training and those of the predicted targets was also observed when applying the supervised modeling strategy.
|
18
|
A Holistic Analysis for Landslide Susceptibility Mapping Applying Geographic Object-Based Random Forest: A Comparison between Protected and Non-Protected Forests. REMOTE SENSING 2020. [DOI: 10.3390/rs12030434] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/02/2023]
Abstract
Despite recent progress in landslide susceptibility mapping, a holistic method is still needed to integrate and customize influential factors with the focus on forest regions. This study was accomplished to test the performance of geographic object-based random forest in modeling the susceptibility of protected and non-protected forests to landslides in northeast Iran. Moreover, it investigated the influential conditioning and triggering factors that control the susceptibility of these two forest areas to landslides. After surveying the landslide events, segment objects were generated from the Landsat 8 multispectral images and digital elevation model (DEM) data. The features of conditioning factors were derived from the DEM and available thematic layers. Natural triggering factors were derived from the historical events of rainfall, floods, and earthquake. The object-based image analysis was used for deriving anthropogenic-induced forest loss and fragmentation. The layers of logging and mining were obtained from available historical data. Landslide samples were extracted from field observations, satellite images, and available database. A single database was generated including all conditioning and triggering object features, and landslide samples for modeling the susceptibility of two forest areas to landslides using the random forest algorithm. The optimal performance of random forest was obtained after building 500 trees with the area under the receiver operating characteristics (AUROC) values of 86.3 and 81.8% for the protected and non-protected forests, respectively. The top influential factors were the topographic and hydrologic features for mapping landslide susceptibility in the protected forest. However, the scores were loaded evenly among the topographic, hydrologic, natural, and anthropogenic triggers in the non-protected forest. 
The topographic features obtained about 60% of the importance values with the domination of the topographic ruggedness index and slope in the protected forest. Although the importance of topographic features was reduced to 36% in the non-protected forest, anthropogenic and natural triggering factors remarkably gained 33.4% of the importance values in this area. This study confirms that some anthropogenic activities such as forest fragmentation and logging significantly intensified the susceptibility of the non-protected forest to landslides.
|
19
|
Multistage fuzzy comprehensive evaluation of landslide hazards based on a cloud model. PLoS One 2019; 14:e0224312. [PMID: 31689296 PMCID: PMC6830815 DOI: 10.1371/journal.pone.0224312] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/07/2019] [Accepted: 10/11/2019] [Indexed: 11/19/2022] Open
Abstract
To assess the risk of landslide disasters accurately, the environmental conditions that induce landslide disasters are first regarded as a fuzzy system, and the landslide risk factors are organized in a multi-level analysis system to build a multi-level fuzzy evaluation index system. Then, cloud model theory is introduced to improve the importance scale and membership degree involved in the evaluation process, and a multi-level fuzzy comprehensive evaluation method of landslide risk improved by a cloud model is proposed. Thus, a multi-level fuzzy evaluation cloud model for evaluating landslide risk is established. Finally, using the improved cloud model method, a multistage fuzzy comprehensive evaluation of landslide risk is conducted for the K112+210 to K112+630 section of the Long Chuan to Huaiji Highway Project in Guangdong Province. The results show that the improved cloud model can address the uncertainty in the process of landslide preparation and occurrence, greatly improve the effectiveness of landslide evaluation results, and provide an effective reference for landslide disaster prevention.
|