1
Koldasbayeva D, Tregubova P, Gasanov M, Zaytsev A, Petrovskaia A, Burnaev E. Challenges in data-driven geospatial modeling for environmental research and practice. Nat Commun 2024; 15:10700. [PMID: 39702456 DOI: 10.1038/s41467-024-55240-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 12/04/2024] [Indexed: 12/21/2024]
Abstract
Machine learning-based geospatial applications offer unique opportunities for environmental monitoring because they adapt across domains and scales and are computationally efficient. However, the specificity of environmental data introduces biases in straightforward implementations. We identify a streamlined pipeline to enhance model accuracy, addressing issues like imbalanced data, spatial autocorrelation, prediction errors, and the nuances of model generalization and uncertainty estimation. We examine tools and techniques for overcoming these obstacles and provide insights into future geospatial AI developments. A big picture of the field is completed from advances in data processing in general, including the demands of industry-related solutions relevant to outcomes of applied sciences.
Affiliation(s)
- Mikhail Gasanov
  - Skolkovo Institute of Science and Technology, Moscow, Russia
- Alexey Zaytsev
  - Skolkovo Institute of Science and Technology, Moscow, Russia
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications (BIMSA), Beijing, China
- Evgeny Burnaev
  - Skolkovo Institute of Science and Technology, Moscow, Russia
  - Autonomous Non-Profit Organization Artificial Intelligence Research Institute (AIRI), Moscow, Russia
2
Zhou J, Huang J, Sun Z, Yi Q, He A. Machine learning approaches to debris flow susceptibility analyses in the Yunnan section of the Nujiang River Basin. PeerJ 2024; 12:e17352. [PMID: 38784390 PMCID: PMC11114124 DOI: 10.7717/peerj.17352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 04/17/2024] [Indexed: 05/25/2024]
Abstract
Background: The Yunnan section of the Nujiang River (YNR) Basin, in the alpine-valley area, is one of the most critical debris flow areas in China. Methods: We analyzed the applicability of three machine learning algorithms to modeling debris flow susceptibility: Random Forest (RF), the linear kernel support vector machine (Linear SVM), and the radial basis function support vector machine (RBFSVM). We also compared 20 factors to determine the dominant controls on debris flow occurrence in the region. Results: We found that (1) RF outperformed RBFSVM and Linear SVM in terms of accuracy; (2) topographic conditions were prerequisites, while geology, precipitation, vegetation, and anthropogenic influence were critical to forming debris flows, and the relative elevation difference was the most prominent evaluation factor of debris flow susceptibility; and (3) susceptibility maps based on RF's debris flow susceptibility (DFS) showed that zones with very high susceptibility were distributed along the mainstream of the Nujiang River. These findings provide methodological guidance and a reference for improving DFS assessment, and they enrich the content of DFS studies in alpine-valley areas.
Affiliation(s)
- Jingyi Zhou
  - School of Earth Sciences, Yunnan University, Kunming, China
- Jiangcheng Huang
  - Institute of International Rivers and Eco-Security, Yunnan University, Kunming, China
- Zhengbao Sun
  - School of Engineering, Yunnan University, Kunming, China
- Qi Yi
  - School of Earth Sciences, Yunnan University, Kunming, China
- Aoyang He
  - Institute of International Rivers and Eco-Security, Yunnan University, Kunming, China
3
Wu X, Wang J. Application of Bagging, Boosting and Stacking Ensemble and EasyEnsemble Methods for Landslide Susceptibility Mapping in the Three Gorges Reservoir Area of China. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:4977. [PMID: 36981886 PMCID: PMC10049250 DOI: 10.3390/ijerph20064977] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/05/2022] [Revised: 03/06/2023] [Accepted: 03/07/2023] [Indexed: 06/18/2023]
Abstract
Since the impoundment of the Three Gorges Reservoir area in 2003, the potential risks of geological disasters in the reservoir area have increased significantly, among which the hidden dangers of landslides are particularly prominent. To reduce casualties and damage, efficient and precise landslide susceptibility evaluation methods are important. Multiple ensemble models have been used to evaluate the susceptibility of the upper part of Badong County to landslides. In this study, EasyEnsemble technology was used to solve the imbalance between landslide and nonlandslide sample data. The extracted evaluation factors were input into three bagging, boosting, and stacking ensemble models for training, and landslide susceptibility mapping (LSM) was drawn. According to the importance analysis, the important factors affecting the occurrence of landslides are altitude, terrain surface texture (TST), distance to residences, distance to rivers and land use. The influences of different grid sizes on the susceptibility results were compared, and a larger grid was found to lead to the overfitting of the prediction results. Therefore, a 30 m grid was selected as the evaluation unit. The accuracy, area under the curve (AUC), recall rate, test set precision, and kappa coefficient of a multi-grained cascade forest (gcForest) model with the stacking method were 0.958, 0.991, 0.965, 0.946, and 0.91, respectively, which are significantly better than the values produced by the other models.
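The abstract above reports accuracy, recall, precision, and the kappa coefficient. As a rough illustration of how these quantities relate to a binary confusion matrix, the sketch below computes them in plain Python. The function name `binary_metrics` and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)        # share of true landslide cells that were found
    precision = tp / (tp + fp)     # share of predicted landslide cells that are real
    # Agreement expected by chance, from the predicted and actual marginals.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "kappa": kappa}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=45, fp=5, fn=5, tn=45)
print(metrics)
```

Kappa discounts the agreement a classifier would reach by chance, which is why it is often reported next to accuracy when the classes are imbalanced.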
4
Choi JE, Seol DH, Kim CY, Hong SJ. Generative Adversarial Network-Based Fault Detection in Semiconductor Equipment with Class-Imbalanced Data. SENSORS (BASEL, SWITZERLAND) 2023; 23:s23041889. [PMID: 36850488 PMCID: PMC9967967 DOI: 10.3390/s23041889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Revised: 02/05/2023] [Accepted: 02/06/2023] [Indexed: 05/14/2023]
Abstract
This research proposes an application of generative adversarial networks (GANs) to solve the class imbalance problem in the fault detection and classification study of a plasma etching process. Small changes in the equipment part condition of the plasma equipment may cause an equipment fault, resulting in a process anomaly. Thus, fault detection in the semiconductor process is essential for success in advanced process control. Two datasets that assume faults of the mass flow controller (MFC) in equipment components were acquired using optical emission spectroscopy (OES) in the plasma etching process of a silicon trench: The abnormal process changed by the MFC is assumed to be faults, and the minority class of Case 1 is the normal class, and that of Case 2 is the abnormal class. In each case, additional minority class data were generated using GANs to compensate for the degradation of model training due to class-imbalanced data. Comparisons of five existing fault detection algorithms with the augmented datasets showed improved modeling performances. Generating a dataset for the minority group using GANs is beneficial for class imbalance problems of OES datasets in fault detection for the semiconductor plasma equipment.
5
A Recurrent Adaptive Network: Balanced Learning for Road Crack Segmentation with High-Resolution Images. REMOTE SENSING 2022. [DOI: 10.3390/rs14143275] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023]
Abstract
Road crack segmentation based on high-resolution images is an important task in road service maintenance. The undamaged road surface area is much larger than the damaged area on a highway. This imbalanced situation yields poor road crack segmentation performance for convolutional neural networks. In this paper, we first evaluate the mainstream convolutional neural network structure in the road crack segmentation task. Second, inspired by the second law of thermodynamics, an improved method called a recurrent adaptive network for a pixelwise road crack segmentation task is proposed to solve the extreme imbalance between positive and negative samples. We achieved a flow between precision and recall, similar to the conduction of temperature repetition. During the training process, the recurrent adaptive network (1) dynamically evaluates the degree of imbalance, (2) determines the positive and negative sampling rates, and (3) adjusts the loss weights of positive and negative features. By following these steps, we established a channel between precision and recall and kept them balanced as they flow to each other. A dataset of high-resolution road crack images with annotations (named HRRC) was built from a real road inspection scene. The images in HRRC were collected on a mobile vehicle measurement platform by high-resolution industrial cameras and were carefully labeled at the pixel level. Therefore, this dataset has sufficient data complexity to objectively evaluate the real performance of convolutional neural networks in highway patrol scenes. Our main contribution is a new method of solving the data imbalance problem, and the method of guiding model training by analyzing precision and recall is experimentally demonstrated to be effective. The recurrent adaptive network achieves state-of-the-art performance on this dataset.
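The abstract above describes measuring the degree of imbalance and adjusting the loss weights of positive and negative samples. The sketch below shows only the simplest static version of that idea: class weights inversely proportional to class frequency, applied to binary cross-entropy. It is not the authors' recurrent adaptive network, which updates sampling rates and weights dynamically during training. The names `class_weights` and `weighted_bce` are hypothetical.

```python
import math


def class_weights(labels):
    """Weights inversely proportional to class frequency (both classes must occur)."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return n / (2 * n_pos), n / (2 * n_neg)


def weighted_bce(labels, probs, w_pos, w_neg, eps=1e-12):
    """Mean binary cross-entropy with separate weights for the two classes."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p)
    return total / len(labels)


# Toy example: 1 crack pixel among 10, with an uninformative prediction of 0.5.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
w_pos, w_neg = class_weights(labels)
loss = weighted_bce(labels, [0.5] * len(labels), w_pos, w_neg)
print(w_pos, w_neg, loss)
```

With these weights the single positive pixel contributes as much to the loss as the nine negative pixels combined, so a model cannot lower the loss much by predicting "no crack" everywhere.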
6
Graph-Represented Broad Learning System for Landslide Susceptibility Mapping in Alpine-Canyon Region. REMOTE SENSING 2022. [DOI: 10.3390/rs14122773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Zhouqu County is located at the intersection of two active structural belts in the east of the Qinghai-Tibet Plateau, which is a rare, high-incidence area of landslides, debris flow, and earthquakes on a global scale. The complex regional geological background, the fragile ecological environment, and the significant tectonic activities have caused great difficulties for the dynamic susceptibility assessment and prediction of landslides in the study area. Specifically, Zhouqu is a typical alpine-canyon region in geomorphology; currently there is still a lack of a landslide susceptibility assessment study for this particular type of area. Therefore, the development of landslide susceptibility mapping (LSM) in this area is of great significance for quickly grasping the regional landslide situation and formulating disaster reduction strategies. In this article, we propose a graph-represented learning algorithm named GBLS within a broad framework in order to better extract the spatially relevant characteristics of the geographical data and to quickly obtain the change pattern of landslide susceptibility according to the frequent variation (increase or decrease) of the data. Based on the broad structure, we construct a group of graph feature nodes through graph-represented learning to make better use of geometric correlation of data to upgrade the precision. The proposed method maintains the efficiency and effectiveness due to its broad structure, and even better, it is able to take advantage of incremental data to complete fast learning methodology without repeated calculation, thus avoiding time waste and massive computation consumption. Empirical results verify the excellent performance with high efficiency and generalization of GBLS on the 407 landslides in the study area inventoried by remote sensing interpretation and field investigation. 
Then, the landslide susceptibility map is drawn to visualize the landslide susceptibility assessment according to the result of GBLS with the highest AUC (0.982). The four most influential factors were ranked as rainfall, NDVI, aspect, and Terrain Ruggedness Index. Our research provides a selection criterion that can be referenced for future research where GBLS is of great significance in LSM of the alpine-canyon region. It plays an important role in demonstrating and popularizing the research in the same type of landform environment. The LSM would help the government better prevent and confine the risk of landslide hazards in the alpine-canyon region of Zhouqu.
7
Kogut T, Tomczak A, Słowik A, Oberski T. Seabed Modelling by Means of Airborne Laser Bathymetry Data and Imbalanced Learning for Offshore Mapping. SENSORS 2022; 22:s22093121. [PMID: 35590809 PMCID: PMC9100212 DOI: 10.3390/s22093121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/23/2022] [Revised: 04/15/2022] [Accepted: 04/18/2022] [Indexed: 11/16/2022]
Abstract
An important problem associated with the aerial mapping of the seabed is the precise classification of point clouds characterizing the water surface, bottom, and bottom objects. This study aimed to improve the accuracy of classification by addressing the asymmetric amount of data representing these three groups. A total of 53 Synthetic Minority Oversampling Technique (SMOTE) algorithms were adjusted and evaluated to balance the amount of data. The prepared data set was used to train the Multi-Layer Perceptron (MLP) neural network used for classifying the point cloud. Data balancing contributed to significantly increasing the accuracy of classification. The best overall classification accuracy achieved varied from 95.8% to 97.0%, depending on the oversampling algorithm used, and was significantly better than the classification accuracy obtained for unbalanced data and data with downsampling (89.6% and 93.5%, respectively). Some of the algorithms allow for 10% increased detection of points on the objects compared to unbalanced data or data with simple downsampling. The results suggest that the use of selected oversampling algorithms can aid in improving the point cloud classification and making the airborne laser bathymetry technique more appropriate for seabed mapping.
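The abstract above evaluates many SMOTE variants. The core idea they share is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a minimal plain-Python version of that basic step, assuming at least two minority points. It does not reproduce any of the 53 variants used in the study, and the name `smote_like` and its parameters are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(minority, n_new=6)
print(new_points)
```

Because each synthetic point lies on a segment between two real minority points, oversampling fills in the minority region rather than duplicating existing samples.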
Affiliation(s)
- Tomasz Kogut
  - Department of Geodesy and Offshore Survey, Maritime University of Szczecin, Żołnierska 46, 71-250 Szczecin, Poland
- Arkadiusz Tomczak
  - Department of Geodesy and Offshore Survey, Maritime University of Szczecin, Żołnierska 46, 71-250 Szczecin, Poland
- Adam Słowik
  - Department of Computer Engineering, Koszalin University of Technology, Sniadeckich 2, 75-453 Koszalin, Poland
- Tomasz Oberski
  - Department of Geodesy and Geoinformatics, Koszalin University of Technology, Sniadeckich 2, 75-453 Koszalin, Poland
8
Hybrids of Support Vector Regression with Grey Wolf Optimizer and Firefly Algorithm for Spatial Prediction of Landslide Susceptibility. REMOTE SENSING 2021. [DOI: 10.3390/rs13244966] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/08/2023]
Abstract
Landslides are one of the most frequent and important natural disasters in the world. The purpose of this study is to evaluate the landslide susceptibility in Zhenping County using hybrids of support vector regression (SVR) with the grey wolf optimizer (GWO) and the firefly algorithm (FA), with frequency ratio (FR) preprocessing. Therefore, a landslide inventory composed of 140 landslides and 16 landslide conditioning factors is compiled as a landslide database. Among these landslides, 70% (98) were randomly selected as the training dataset of the model, and the other 42 landslides were used to verify the model. The 16 landslide conditioning factors include elevation, slope, aspect, plan curvature, profile curvature, distance to faults, distance to rivers, distance to roads, sediment transport index (STI), stream power index (SPI), topographic wetness index (TWI), normalized difference vegetation index (NDVI), landslide, rainfall, soil and lithology. The conditioning factors selection and spatial correlation analysis were carried out by using the correlation attribute evaluation (CAE) method and the frequency ratio (FR) algorithm. The area under the receiver operating characteristic curve (AUROC) and kappa data of the training dataset and validation dataset are used to evaluate the prediction ability and the relationship between the advantages and disadvantages of landslide susceptibility maps. The results show that the SVR-GWO model (AUROC = 0.854) has the best performance in landslide spatial prediction, followed by the SVR-FA (AUROC = 0.838) and SVR models (AUROC = 0.818). The hybrid models of SVR-GWO and SVR-FA improve the performance of the single SVR model, and all three models have good prospects for regional-scale landslide spatial modeling.
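The abstract above uses the frequency ratio (FR) to preprocess conditioning factors. In its common form, the FR of a factor class is the share of landslides falling in that class divided by the share of the study area that class covers, so values above 1 indicate classes where landslides are over-represented. The sketch below computes it for a single factor. The class names and counts are invented for illustration and are not the study's data, and the name `frequency_ratio` is hypothetical.

```python
def frequency_ratio(class_cells, class_landslides):
    """FR per class = (landslide share in the class) / (area share of the class)."""
    total_cells = sum(class_cells.values())
    total_landslides = sum(class_landslides.values())
    fr = {}
    for name, cells in class_cells.items():
        landslide_share = class_landslides.get(name, 0) / total_landslides
        area_share = cells / total_cells
        fr[name] = landslide_share / area_share
    return fr


# Invented slope classes: grid cells per class and landslide cells per class.
cells = {"0-15 deg": 6000, "15-30 deg": 3000, "30+ deg": 1000}
slides = {"0-15 deg": 10, "15-30 deg": 40, "30+ deg": 50}
fr = frequency_ratio(cells, slides)
print(fr)
```

Replacing each class label with its FR value gives the model a numeric input that already encodes how strongly that class is associated with past landslides.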
9
Automated Building Detection from Airborne LiDAR and Very High-Resolution Aerial Imagery with Deep Neural Network. REMOTE SENSING 2021. [DOI: 10.3390/rs13234803] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
The detection of buildings in the city is essential in several geospatial domains and for decision-making regarding intelligence for city planning, tax collection, project management, revenue generation, and smart cities, among other areas. In the past, the classical approach to building detection relied on imagery and entailed human–computer interaction, which was a daunting proposition. To tackle this task, a novel network based on an end-to-end deep learning framework is proposed to detect and classify building features. The proposed CNN has three parallel stream channels: the first is the high-resolution aerial imagery, while the second stream is the digital surface model (DSM). The third was fixed on extracting deep features using the fusion of channel one and channel two, respectively. Furthermore, the channel has eight group convolution blocks of 2D convolution with three max-pooling layers. The proposed model’s efficiency and dependability were tested on three different categories of complex urban building structures in the study area. Then, morphological operations were applied to the extracted building footprints to increase the uniformity of the building boundaries and produce improved building perimeters. Thus, our approach bridges a significant gap in detecting building objects in diverse environments; the overall accuracy (OA) and kappa coefficient of the proposed method are greater than 80% and 0.605, respectively. The findings support the proposed framework and methodologies’ efficacy and effectiveness at extracting buildings from complex environments.
10
A Meta-Learning Approach of Optimisation for Spatial Prediction of Landslides. REMOTE SENSING 2021. [DOI: 10.3390/rs13224521] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023]
Abstract
Optimisation plays a key role in the application of machine learning in the spatial prediction of landslides. The common practice in optimising landslide prediction models is to search for optimal/suboptimal hyperparameter values in a number of predetermined hyperparameter configurations based on an objective function, i.e., k-fold cross-validation accuracy. However, the overhead of hyperparameter optimisation can be prohibitive, especially for computationally expensive algorithms. This paper introduces an optimisation approach based on meta-learning for the spatial prediction of landslides. The proposed approach is tested in a dense tropical forested area of Cameron Highlands, Malaysia. Instead of optimising prediction models with a large number of hyperparameter configurations, the proposed approach begins with promising configurations based on several basic and statistical meta-features. The proposed meta-learning approach was tested based on Bayesian optimisation as a hyperparameter tuning algorithm and random forest (RF) as a prediction model. The spatial database was established with a total of 63 historical landslides and 15 conditioning factors. Three RF models were constructed based on (1) default parameters as suggested by the sklearn library, (2) parameters suggested by the Bayesian optimisation (BO), and (3) parameters suggested by the proposed meta-learning approach (BO-ML). Based on five-fold cross-validation accuracy, the Bayesian method achieved the best performance for both the training (0.810) and test (0.802) datasets. The meta-learning approach achieved slightly lower accuracies than the Bayesian method for the training (0.769) and test (0.800) datasets. Similarly, based on F1-score and area under the receiver operating characteristic curves (AUROC), the models with optimised parameters either by the Bayesian or meta-learning methods produced more accurate landslide susceptibility assessment than the model with the default parameters. In the present approach, instead of learning from scratch, the meta-learning would begin with hyperparameter configurations optimal for the most similar previous datasets, which can be considerably helpful and time-saving for landslide modelling.
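The abstract above uses k-fold cross-validation accuracy as the objective for hyperparameter search. The sketch below shows how that objective can be computed in plain Python for any model supplied as a pair of `fit` and `predict` callables. It is not the Bayesian or meta-learning procedure from the paper, and the names `k_fold_indices` and `cross_val_accuracy`, as well as the toy majority-class model, are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    indices = list(range(n_samples))
    start = 0
    for fold in range(k):
        # The first (n_samples % k) folds get one extra sample.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit, predict, X, y, k=5):
    """Mean test accuracy over k folds; this is the value a tuner would maximise."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        hits = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


# Toy model: always predict the majority label seen in training.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


X = list(range(10))
y = [0] * 7 + [1] * 3
score = cross_val_accuracy(fit_majority, predict_majority, X, y, k=5)
fold_sizes = [len(test) for _, test in k_fold_indices(63, 5)]
print(score, fold_sizes)
```

In practice the samples would be shuffled or stratified before splitting. For spatial data, folds built from spatially separated blocks are often preferred so that autocorrelation does not inflate the score.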