101. Learning Deep Hierarchical Spatial-Spectral Features for Hyperspectral Image Classification Based on Residual 3D-2D CNN. Sensors 2019; 19:s19235276. [PMID: 31795511] [PMCID: PMC6928880] [DOI: 10.3390/s19235276]
Abstract
Every pixel in a hyperspectral image contains detailed spectral information in hundreds of narrow bands captured by hyperspectral sensors. Pixel-wise classification of a hyperspectral image is the cornerstone of various hyperspectral applications. Nowadays, deep learning models represented by the convolutional neural network (CNN) provide an ideal solution for feature extraction and have made remarkable achievements in supervised hyperspectral classification. However, hyperspectral image annotation is time-consuming and laborious, and available training data are usually limited. Due to this "small-sample problem", CNN-based hyperspectral classification remains challenging. Focusing on hyperspectral classification with limited samples, we designed an 11-layer CNN model called R-HybridSN (Residual-HybridSN) from the perspective of network optimization. With an organic combination of 3D-2D CNN, residual learning, and depthwise separable convolutions, R-HybridSN can better learn deep hierarchical spatial-spectral features from very little training data. The performance of R-HybridSN is evaluated over three publicly available hyperspectral datasets with different amounts of training samples. Using only 5%, 1%, and 1% of the labeled data for training on Indian Pines, Salinas, and University of Pavia, respectively, R-HybridSN achieves classification accuracies of 96.46%, 98.25%, and 96.59%, respectively, which is far better than the comparison models.
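The parameter savings from the depthwise separable convolutions used in models like R-HybridSN can be illustrated with a simple count; this is a generic sketch, and the layer sizes below are illustrative, not taken from the paper:

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

# Example: a 64 -> 128 channel layer with 3 x 3 kernels
standard = conv_params(64, 128, 3)                  # 73728 weights
separable = depthwise_separable_params(64, 128, 3)  # 8768 weights
```

With roughly 8x fewer weights per layer in this example, a deeper model can be fit to very little training data, which is the regime the paper targets.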
102. Differentially Deep Subspace Representation for Unsupervised Change Detection of SAR Images. Remote Sensing 2019. [DOI: 10.3390/rs11232740]
Abstract
Temporal analysis of synthetic aperture radar (SAR) time series is a basic and significant issue in the remote sensing field. Change detection and other interpretation tasks for SAR images always involve non-linear/non-convex problems. Complex (non-linear) change criteria or models have thus been proposed for SAR images, instead of the direct differencing (e.g., change vector analysis) with or without linear transforms (e.g., principal component analysis, slow feature analysis) used in optical image change detection. In this paper, inspired by powerful deep learning techniques, we present a deep autoencoder (AE) based non-linear subspace representation for unsupervised change detection with multi-temporal SAR images. The proposed architecture is built upon an autoencoder-like (AE-like) network, which non-linearly maps the input SAR data into a latent space. Unlike normal AE networks, a self-expressive layer performing like principal component analysis (PCA) is added between the encoder and the decoder, which further transforms the mapped SAR data into mutually orthogonal subspaces. To make the proposed architecture more efficient at change detection tasks, the parameters are trained to minimize the representation difference of unchanged pixels in the deep subspace. The proposed architecture is therefore named the Differentially Deep Subspace Representation (DDSR) network for multi-temporal SAR image change detection. Experimental results on real datasets validate the effectiveness and superiority of the proposed architecture.
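The core idea of a self-expressive layer, reconstructing each latent sample from the other samples, has a closed form under ridge regularization; the sketch below is a simplified stand-in for the trained layer (the regularization weight is arbitrary):

```python
import numpy as np

def self_expression(Z, lam=1e-6):
    """Coefficients C minimizing ||Z - C Z||_F^2 + lam * ||C||_F^2,
    i.e. C = Z Z^T (Z Z^T + lam I)^(-1): each row reconstructs one
    latent sample from all the samples."""
    G = Z @ Z.T
    return G @ np.linalg.inv(G + lam * np.eye(Z.shape[0]))

# Latent samples lying in a common subspace express each other well
Z = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
C = self_expression(Z)
```

In the paper the coefficients are learned jointly with the encoder and decoder rather than computed in closed form; this sketch only shows what "self-expression" means.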
103. Structural constraint deep matrix factorization for sequential data clustering. International Journal of Intelligent Robotics and Applications 2019. [DOI: 10.1007/s41315-019-00106-2]
104. Kang Z, Zhao X, Peng C, Zhu H, Zhou JT, Peng X, Chen W, Xu Z. Partition level multiview subspace clustering. Neural Netw 2019; 122:279-288. [PMID: 31731045] [DOI: 10.1016/j.neunet.2019.10.010]
Abstract
Multiview clustering has gained increasing attention recently due to its ability to deal with data from multiple sources (views) and to explore complementary information between different views. Among various methods, multiview subspace clustering methods provide encouraging performance. They mainly integrate the multiview information in the space where the data points lie; hence, their performance may deteriorate because of noise in each individual view or inconsistency between heterogeneous features. For multiview clustering, the basic premise is that there exists a shared partition among all views. Therefore, the natural space for multiview clustering is the space of all partitions. Orthogonal to existing methods, we propose to fuse multiview information at the partition level, following two intuitive assumptions: (i) each partition is a perturbation of the consensus clustering; and (ii) a partition that is close to the consensus clustering should be assigned a large weight. Finally, we propose a unified multiview subspace clustering model which incorporates graph learning from each view, the generation of basic partitions, and the fusion of the consensus partition. These three components are seamlessly integrated and can be iteratively boosted by each other towards an overall optimal solution. Experiments on four benchmark datasets demonstrate the efficacy of our approach against state-of-the-art techniques.
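Partition-level fusion can be sketched with weighted co-association matrices: each base partition votes on whether two samples share a cluster, and partitions closer to the consensus receive larger weights. This is a toy illustration of the fusion step only, not the paper's unified optimization:

```python
import numpy as np

def coassoc(labels):
    """Co-association matrix of one partition: entry (i, j) is 1
    when samples i and j are in the same cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def weighted_consensus(partitions, weights):
    """Fuse base partitions at the partition level: a weighted average
    of their co-association matrices (weights normalized to sum to 1)."""
    weights = np.asarray(weights, float)
    weights = weights / weights.sum()
    return sum(w * coassoc(p) for w, p in zip(weights, partitions))
```

The fused matrix can then be clustered (e.g. spectrally) to obtain the consensus partition; in the paper the weights and consensus are optimized jointly rather than fixed in advance.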
Affiliation(s)
- Zhao Kang
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Sichuan, 611731, China
- Xinjia Zhao
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Sichuan, 611731, China
- Chong Peng
- College of Computer Science and Technology, Qingdao University, China
- Hongyuan Zhu
- Institute for Infocomm Research, A*STAR, Singapore
- Xi Peng
- College of Computer Science, Sichuan University, China
- Wenyu Chen
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Sichuan, 611731, China
- Zenglin Xu
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Sichuan, 611731, China; Centre for Artificial Intelligence, Peng Cheng Lab, Shenzhen 518055, China
105. Hu P, Peng D, Sang Y, Xiang Y. Multi-View Linear Discriminant Analysis Network. IEEE Transactions on Image Processing 2019; 28:5352-5365. [PMID: 31059440] [DOI: 10.1109/tip.2019.2913511]
Abstract
In many real-world applications, an object can be described from multiple views or styles, giving rise to the emerging field of multi-view analysis. To eliminate the complicated (usually highly nonlinear) view discrepancy for favorable cross-view recognition and retrieval, we propose a Multi-view Linear Discriminant Analysis Network (MvLDAN) that seeks a nonlinear, discriminative, view-invariant representation shared among multiple views. Unlike existing multi-view methods, which directly learn a common space to reduce the view gap, MvLDAN employs multiple feedforward neural networks (one per view) and a novel eigenvalue-based multi-view objective function to encapsulate as much discriminative variance as possible into all available common feature dimensions. With the proposed objective function, MvLDAN produces representations possessing: 1) low variance within the same class regardless of view discrepancy; 2) high variance between different classes regardless of view discrepancy; and 3) high covariance between any two views. In brief, in the learned multi-view space, samples from the same class are as close to each other as possible (even though they come from different views), and samples from different classes are as far from each other as possible (even though they come from the same view). The effectiveness of the proposed method is verified by extensive experiments on five databases, in comparison with 19 state-of-the-art approaches.
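As a single-view analogue of the eigenvalue-based objective, classical Fisher linear discriminant analysis finds directions maximizing between-class over within-class variance via a generalized eigenproblem. This is a sketch of the classical building block, not the multi-view network itself:

```python
import numpy as np

def lda_directions(X, y, reg=1e-3):
    """Fisher discriminant directions from the generalized eigenproblem
    S_b v = lambda S_w v, solved as eig(inv(S_w) S_b); directions are
    returned in order of decreasing discriminative variance."""
    X, y = np.asarray(X, float), np.asarray(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)            # within-class scatter
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)  # between-class scatter
    evals, evecs = np.linalg.eig(np.linalg.inv(Sw + reg * np.eye(d)) @ Sb)
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order]
```

MvLDAN replaces the fixed linear projection with per-view neural networks and extends the scatter objective with cross-view covariance terms.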
106. Wu S, Wu W, Lei S, Lin S, Li R, Yu Z, Wong HS. Semi-Supervised Human Detection via Region Proposal Networks Aided by Verification. IEEE Transactions on Image Processing 2019; 29:1562-1574. [PMID: 31603782] [DOI: 10.1109/tip.2019.2944306]
Abstract
In this paper, we explore how to leverage readily available unlabeled data to improve semi-supervised human detection performance. For this purpose, we specifically modify the region proposal network (RPN) for learning on a partially labeled dataset. Based on commonly observed false positive types, a verification module is developed to assess foreground human objects in the candidate regions to provide an important cue for filtering the RPN's proposals. The remaining proposals with high confidence scores are then used as pseudo annotations for re-training our detection model. To reduce the risk of error propagation in the training process, we adopt a self-paced training strategy to progressively include more pseudo annotations generated by the previous model over multiple training rounds. The resulting detector re-trained on the augmented data can be expected to have better detection performance. The effectiveness of the main components of this framework is verified through extensive experiments, and the proposed approach achieves state-of-the-art detection results on multiple scene-specific human detection benchmarks in the semi-supervised setting.
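The self-paced strategy can be sketched as a confidence threshold on the verification scores that relaxes over training rounds, so that only the most reliable proposals become pseudo annotations early on. The schedule values here are illustrative, not from the paper:

```python
def select_pseudo_labels(confidences, round_idx, start=0.9, step=0.1, floor=0.5):
    """Self-paced selection: keep the proposals whose verification
    confidence exceeds a threshold that relaxes over training rounds."""
    threshold = max(start - step * round_idx, floor)
    return [i for i, c in enumerate(confidences) if c >= threshold]
```

Each round, the detector is re-trained on the selected pseudo annotations before the next, more permissive round, which limits the error propagation the abstract mentions.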
107.

108. Joint correntropy metric weighting and block diagonal regularizer for robust multiple kernel subspace clustering. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.05.063]
109. Hyperspectral Image Denoising Using Global Weighted Tensor Norm Minimum and Nonlocal Low-Rank Approximation. Remote Sensing 2019. [DOI: 10.3390/rs11192281]
Abstract
A hyperspectral image (HSI) contains abundant spatial and spectral information, but it is always corrupted by various kinds of noise, especially Gaussian noise. Global correlation (GC) across the spectral domain and nonlocal self-similarity (NSS) across the spatial domain are two important characteristics of an HSI. To keep the integrity of the global structure and improve the details of the restored HSI, we propose a global and nonlocal weighted tensor norm minimum denoising method which jointly utilizes GC and NSS. The weighted multilinear rank is utilized to capture the GC information. To preserve structural information with NSS, a patch-group-based low-rank tensor approximation (LRTA) model is designed. The LRTA makes use of Tucker decompositions of 4D patches, each composed of a group of similar 3D patches of the HSI. The alternating direction method of multipliers (ADMM) is adopted to solve the proposed models. Experimental results show that the proposed algorithm preserves structural information and outperforms several state-of-the-art denoising methods.
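The Tucker-based low-rank approximation at the heart of the LRTA model can be sketched with a truncated higher-order SVD, which projects each mode of a tensor onto its leading singular vectors. This is a minimal sketch of the multilinear-rank constraint, not the full patch-group pipeline:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the chosen axis first, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def truncated_hosvd(T, ranks):
    """Tucker approximation via truncated HOSVD: keep the leading
    `ranks[m]` left singular vectors of each mode-m unfolding, project
    onto them (core), then project back (approximation)."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    approx = core
    for mode, U in enumerate(factors):
        approx = np.moveaxis(np.tensordot(U, np.moveaxis(approx, mode, 0), axes=1), 0, mode)
    return approx
```

A noisy patch group whose clean signal has low multilinear rank is denoised by choosing small ranks; the paper additionally weights the ranks and solves the model with ADMM.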
110. Enhanced Feature Representation in Detection for Optical Remote Sensing Images. Remote Sensing 2019. [DOI: 10.3390/rs11182095]
Abstract
In recent years, deep learning has led to remarkable breakthroughs in object detection in remote sensing images. In practice, two-stage detectors achieve high detection accuracy but are slow, whereas one-stage detectors simplify the detection pipeline and are faster, but with lower detection accuracy. Enhancing the capability of feature representation is one way to improve the detection accuracy of one-stage detectors. To this end, this paper proposes a novel one-stage detector with enhanced feature representation. The enhancement comes from two proposed structures: a dual top-down module and a dense-connected inception module. The former efficiently utilizes multi-scale features from multiple layers of the backbone network. The latter both widens and deepens the network to enhance feature representation at limited extra computational cost. To evaluate the effectiveness of the proposed structures, we conducted experiments on the horizontal bounding box detection task of the challenging DOTA dataset and achieved 73.49% mean average precision (mAP), a state-of-the-art result. Furthermore, our method runs significantly faster than the best public two-stage detector on the DOTA dataset.
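The top-down fusion such a module builds on can be sketched as repeatedly upsampling a coarse feature map and adding it to the next finer one. This is a minimal FPN-style sketch under the assumption of equal channel counts; the real dual top-down module fuses multiple backbone layers twice:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_merge(features):
    """Top-down pathway: start from the coarsest map, then repeatedly
    upsample and add the next finer map (lateral connection).
    `features` is ordered finest to coarsest, all with equal channels."""
    merged = [features[-1]]
    for f in reversed(features[:-1]):
        merged.append(f + upsample2x(merged[-1]))
    return merged[::-1]  # finest first
```

In practice each lateral connection would also pass through a 1x1 convolution to match channels; that step is omitted here for brevity.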
111. Zhou JT, Fang M, Zhang H, Gong C, Peng X, Cao Z, Goh RSM. Learning With Annotation of Various Degrees. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:2794-2804. [PMID: 30640630] [DOI: 10.1109/tnnls.2018.2885854]
Abstract
In this paper, we study a new problem in the scenario of sequence labeling. Specifically, we consider training data with annotation of various degrees, namely, fully labeled, unlabeled, and partially labeled sequences. Learning with fully labeled or unlabeled sequences refers to the standard setting in traditional supervised or unsupervised learning, while the proposed partial labeling specifies a class to which an element does not belong. Partially labeled data are cheaper to obtain than fully labeled data, though they are less informative, especially when the tasks require substantial domain knowledge. To solve this practical challenge, we propose a novel deep conditional random field (CRF) model which smoothly handles fully labeled, unlabeled, and partially labeled sequences within a unified end-to-end framework. To the best of our knowledge, this is one of the first works to utilize partially labeled instances for sequence labeling, and the proposed algorithm unifies deep learning and CRFs in an end-to-end framework. Extensive experiments show that our method achieves state-of-the-art performance in two sequence labeling tasks on several popular datasets.
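The unified treatment of fully, partially, and unlabeled positions can be sketched with a constrained forward algorithm: each position allows a set of labels (a single label, all labels except an excluded one, or all labels), and the partition function sums only over permitted sequences. This is a linear-chain sketch of the idea, not the paper's deep CRF:

```python
import numpy as np

def logsumexp(v):
    """Numerically stable log(sum(exp(v)))."""
    m = np.max(v)
    if not np.isfinite(m):
        return m
    return m + np.log(np.sum(np.exp(v - m)))

def constrained_logZ(emissions, transitions, allowed):
    """Forward algorithm restricted to the label sets in `allowed`:
    a fully labeled position allows one label, an unlabeled position
    allows all labels, and a partially labeled position allows all
    labels except the excluded one."""
    n, k = emissions.shape
    alpha = np.full(k, -np.inf)
    for j in allowed[0]:
        alpha[j] = emissions[0, j]
    for t in range(1, n):
        nxt = np.full(k, -np.inf)
        for j in allowed[t]:
            nxt[j] = emissions[t, j] + logsumexp(alpha + transitions[:, j])
        alpha = nxt
    return logsumexp(alpha)
```

Training maximizes the ratio of the constrained partition function (sequences consistent with the annotation) to the unconstrained one, so all three annotation degrees share one objective.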
112.

113.
Abstract
Curriculum Learning (CL) is a recently proposed learning paradigm that aims to achieve satisfactory performance by properly organizing the learning sequence from simple curriculum examples to more difficult ones. Up to now, few works have explored CL for data with graph structure. This article therefore proposes a novel CL algorithm to guide Label Propagation (LP) over graphs, whose target is to "learn" the labels of unlabeled examples on the graphs. Specifically, we assume that different unlabeled examples have different levels of difficulty for propagation, and that their label learning should follow a simple-to-difficult sequence with updated curricula. Furthermore, considering that practical data are often characterized by multiple modalities, every modality in our method is associated with a "teacher" that not only evaluates the difficulty of examples from its own viewpoint, but also cooperates with the other teachers to generate the overall simplest curriculum examples for propagation. By taking the curricula suggested by the teachers as a whole, the common preference (i.e., commonality) of the teachers in selecting the simplest examples can be discovered by a row-sparse matrix, and their distinct opinions (i.e., individuality) are captured by a sparse noise matrix. As a result, an accurate curriculum sequence can be established and the propagation quality can thus be improved. Theoretically, we prove that the propagation risk bound is closely related to the examples' difficulty information; empirically, we show that our method achieves higher accuracy than state-of-the-art CL approaches and LP algorithms on various multi-modal tasks.
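The underlying label propagation step that the curriculum guides can be sketched as the classic iterative update F ← αPF + (1 − α)Y over a graph; this is a minimal sketch without the curriculum or the multi-modal teachers:

```python
import numpy as np

def label_propagation(W, Y, alpha=0.9, iters=200):
    """Iterative label propagation F <- alpha * P F + (1 - alpha) * Y,
    where P is the row-normalized affinity matrix and Y holds the
    known label indicators (zero rows for unlabeled nodes)."""
    P = W / W.sum(axis=1, keepdims=True)
    F = Y.astype(float).copy()
    for _ in range(iters):
        F = alpha * (P @ F) + (1 - alpha) * Y
    return F
```

The proposed method would propagate to only the currently "simplest" unlabeled nodes per round, adding harder nodes in later curricula.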
Affiliation(s)
- Chen Gong
- PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Jian Yang
- PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Dacheng Tao
- UBTECH Sydney Artificial Intelligence Centre and the School of Computer Science, Faculty of Engineering and Information Technologies, the University of Sydney, Sydney, Australia
114. Spectral-Spatial Hyperspectral Image Classification via Robust Low-Rank Feature Extraction and Markov Random Field. Remote Sensing 2019. [DOI: 10.3390/rs11131565]
Abstract
In this paper, a new supervised classification algorithm which simultaneously considers the spectral and spatial information of a hyperspectral image (HSI) is proposed. Since an HSI always contains complex noise (such as a mixture of Gaussian and sparse noise), the quality of the extracted features tends to decrease. To tackle this issue, we utilize the low-rank property of local three-dimensional patches and adopt a complex noise strategy to model the noise embedded in each local patch. Specifically, we first use the mixture of Gaussians (MoG) based low-rank matrix factorization (LRMF) method to simultaneously extract features and remove noise from each local matrix unfolded from a local patch. Then, a classification map is obtained by applying a classifier to the extracted low-rank features. Finally, the classification map is processed by a Markov random field (MRF) to further exploit the smoothness of the labels. To ease experimental comparison between different HSI classification methods, we built an open package that makes such comparisons fair and efficient. Using this package, the proposed classification method is verified to obtain better performance than other state-of-the-art methods.
115. A Novel Effectively Optimized One-Stage Network for Object Detection in Remote Sensing Imagery. Remote Sensing 2019. [DOI: 10.3390/rs11111376]
Abstract
With great significance in military and civilian applications, detecting small and densely arranged objects in wide-scale remote sensing imagery remains challenging. To solve this problem, we propose a novel effectively optimized one-stage network (NEOON). As a fully convolutional network, NEOON consists of four parts: feature extraction, feature fusion, feature enhancement, and multi-scale detection. To extract effective features, the first part implements bottom-up and top-down coherent processing by taking successive down-sampling and up-sampling operations in conjunction with residual modules. The second part consolidates high-level and low-level features by adopting concatenation operations with subsequent convolutional operations to explicitly yield strong feature representation and semantic information. The third part constructs a receptive field enhancement (RFE) module and incorporates it into the front of the network, where the information of small objects resides. The final part comprises four parallel detectors with different sensitivities accessing the fused features, enabling the network to make full use of information about objects at different scales. Moreover, the focal loss replaces the standard cross-entropy classification loss to address the severe class imbalance in one-stage methods, and Soft-NMS is introduced to preserve accurate bounding boxes in the post-processing stage, especially for densely arranged objects. A split-and-merge strategy and a multi-scale training strategy are employed in training. Thorough experiments are performed on the ACS dataset constructed by us and on the NWPU VHR-10 dataset to evaluate the performance of NEOON. Specifically, improvements of 4.77% in mAP and 5.50% in recall on the ACS dataset, as compared to YOLOv3, demonstrate that NEOON can effectively improve the detection accuracy of small objects in remote sensing imagery. In addition, extensive experiments and comprehensive evaluations on the NWPU VHR-10 dataset with 10 classes illustrate the superiority of NEOON in extracting spatial information from high-resolution remote sensing images.
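Soft-NMS, mentioned for post-processing, decays the scores of overlapping boxes instead of discarding them, which helps densely arranged objects survive suppression. A minimal Gaussian-penalty sketch (the parameter values are illustrative):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: repeatedly keep the highest-scoring box and
    decay the scores of the remaining boxes by exp(-iou^2 / sigma)
    instead of discarding overlaps outright."""
    boxes, scores = [list(b) for b in boxes], list(scores)
    keep = []
    while boxes:
        i = max(range(len(scores)), key=scores.__getitem__)
        best_box, best_score = boxes.pop(i), scores.pop(i)
        if best_score < score_thresh:
            break
        keep.append((best_box, best_score))
        scores = [s * np.exp(-iou(best_box, b) ** 2 / sigma)
                  for s, b in zip(scores, boxes)]
    return keep
```

A heavily overlapping duplicate survives with a strongly decayed score rather than being deleted, so a nearby true object with moderate overlap is not lost.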
116. Spatial–Spectral Squeeze-and-Excitation Residual Network for Hyperspectral Image Classification. Remote Sensing 2019. [DOI: 10.3390/rs11070884]
Abstract
Jointly using spectral and spatial information has become a mainstream strategy in the field of hyperspectral image (HSI) processing, especially for classification. However, due to the existence of noisy or correlated spectral bands in the spectral domain and inhomogeneous pixels in the spatial neighborhood, HSI classification results are often degraded and unsatisfactory. Motivated by the attention mechanism, this paper proposes a spatial–spectral squeeze-and-excitation (SSSE) module to adaptively learn weights for different spectral bands and for different neighboring pixels. The SSSE structure can suppress or emphasize features at a given position, which effectively resists noise interference and improves the classification results. Furthermore, we embed several SSSE modules into a residual network architecture to obtain an SSSE-based residual network (SSSERN) model for HSI classification. The proposed SSSERN method is compared with several existing deep learning networks on two benchmark hyperspectral datasets. Experimental results demonstrate the effectiveness of our proposed network.
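The squeeze-and-excitation operation that the SSSE module generalizes can be sketched in a few lines: global average pooling squeezes each channel (band) to a scalar, a small gate maps the channel summary to per-channel weights in (0, 1), and the feature map is reweighted. The gate weights here are placeholders; the real module learns them, and SSSE additionally attends over spatial neighbors:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squeeze_excite(x, w1, w2):
    """Squeeze-and-excitation on a (C, H, W) feature map: global average
    pooling (squeeze), a two-layer bottleneck gate (excitation), and
    channel-wise reweighting with gates in (0, 1)."""
    s = x.mean(axis=(1, 2))                     # squeeze: one scalar per channel
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))   # excitation gates
    return x * e[:, None, None]
```

Noisy or redundant bands receive gates near 0 and are suppressed; informative bands receive gates near 1.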
117. A New CNN-Bayesian Model for Extracting Improved Winter Wheat Spatial Distribution from GF-2 Imagery. Remote Sensing 2019. [DOI: 10.3390/rs11060619]
Abstract
When the spatial distribution of winter wheat is extracted from high-resolution remote sensing imagery using convolutional neural networks (CNNs), field edge results are usually rough, lowering the overall accuracy. This study proposes a new per-pixel classification model using CNN and Bayesian models (the CNN-Bayesian model) for improved extraction accuracy. In this model, a feature extractor generates a feature vector for each pixel, an encoder transforms the feature vector of each pixel into a category-code vector, and a two-level classifier uses the difference between elements of the category-probability vector as a confidence value to perform per-pixel classification. The first level determines the category of a pixel with high confidence, and the second level is an improved Bayesian model used to determine the category of low-confidence pixels. The CNN-Bayesian model was trained and tested on Gaofen 2 satellite images. Our approach improved the overall accuracy compared to existing models: the overall accuracies of SegNet, DeepLab, VGG-Ex, and CNN-Bayesian were 0.791, 0.852, 0.892, and 0.946, respectively. Thus, this approach can produce superior results when the winter wheat spatial distribution is extracted from satellite imagery.
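The two-level classification rule can be sketched directly from the abstract: the gap between the two largest category probabilities serves as the confidence value, and low-confidence pixels are deferred to the second-level classifier. The threshold and the fallback below are hypothetical placeholders, not values from the paper:

```python
import numpy as np

def confidence(prob):
    """Confidence value: gap between the two largest category probabilities."""
    top2 = np.sort(prob)[-2:]
    return top2[1] - top2[0]

def two_level_classify(prob, bayes_fallback, threshold=0.3):
    """Level one accepts the argmax for high-confidence pixels; level two
    defers to a Bayesian fallback classifier (a stand-in callable here)."""
    if confidence(prob) >= threshold:
        return int(np.argmax(prob))
    return bayes_fallback(prob)
```

In the paper the fallback is the improved Bayesian model, which is where the rough field edges are refined.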
118.

119. Wang X, Peng D, Hu P, Sang Y. Adversarial correlated autoencoder for unsupervised multi-view representation learning. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.01.017]
120. Infrared Small Target Detection Based on Partial Sum of the Tensor Nuclear Norm. Remote Sensing 2019. [DOI: 10.3390/rs11040382]
Abstract
Excellent performance, real-time operation, and strong robustness are three vital requirements for infrared small target detection. Unfortunately, many current state-of-the-art methods achieve only one of these expectations when coping with highly complex scenes. In fact, a common problem is that real-time processing and strong detection ability are difficult to coordinate. To address this issue, a robust infrared patch-tensor model for detecting infrared small targets is proposed in this paper. On the basis of the infrared patch-tensor (IPT) model, a novel nonconvex low-rank constraint, the partial sum of the tensor nuclear norm (PSTNN) combined with a weighted l1 norm, is employed to efficiently suppress the background and preserve the target. Because the reweighted infrared patch-tensor (RIPT) model tends to over-shrink the target, possibly to the point of its disappearing, an improved local prior map simultaneously encoding target-related and background-related information is introduced into the model. With the help of a reweighted scheme for enhancing sparsity and a high-efficiency version of the tensor singular value decomposition (t-SVD), the overall algorithmic complexity and computation time are reduced dramatically. The decomposition of the target and background is then cast as a tensor robust principal component analysis (TRPCA) problem, which can be efficiently solved by the alternating direction method of multipliers (ADMM). A series of experiments substantiate the superiority of the proposed method over state-of-the-art baselines.
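The matrix analogue of the PSTNN penalty, the partial sum of singular values, leaves the largest N singular values unpenalized and sums the rest, so minimizing it drives the residual spectrum (the background) toward low rank without shrinking the dominant structure. A minimal sketch (N stands for the presumed rank of the background):

```python
import numpy as np

def partial_sum_nuclear(M, N=1):
    """Sum of all singular values except the largest N: the matrix
    analogue of the partial sum of the tensor nuclear norm."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[N:].sum()
```

A genuinely rank-N matrix has a zero penalty, whereas the full nuclear norm would still penalize (and shrink) its dominant singular values.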
121. Chen X, Xu C, Yang X, Song L, Tao D. Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer. IEEE Transactions on Image Processing 2019; 28:546-560. [PMID: 30222565] [DOI: 10.1109/tip.2018.2869695]
Abstract
Style transfer describes the rendering of an image's semantic content in different artistic styles. Recently, generative adversarial networks (GANs) have emerged as an effective approach to style transfer by adversarially training the generator to synthesize convincing counterfeits. However, traditional GANs suffer from mode collapse, which makes training unstable and style transfer quality difficult to guarantee. In addition, a GAN generator is only compatible with one style, so a series of GANs must be trained to offer users more than one style. In this paper, we focus on tackling these challenges and limitations. We propose adversarial gated networks (Gated-GAN) to transfer multiple styles in a single model. The generative network has three modules: an encoder, a gated transformer, and a decoder. Different styles are achieved by passing input images through different branches of the gated transformer. To stabilize training, the encoder and decoder are combined as an autoencoder to reconstruct the input images. The discriminative networks distinguish whether the input image is a stylized or a genuine image, and an auxiliary classifier recognizes the style categories of transferred images, thereby helping the generative network produce images in multiple styles. In addition, Gated-GAN makes it possible to explore a new style by combining styles learned from artists or genres. Our extensive experiments demonstrate the stability and effectiveness of the proposed model for multi-style transfer.
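The gating idea can be sketched as routing encoded features through one of several per-style branches; here each branch is just a linear map and the style names are made up for illustration, whereas the real branches are convolutional blocks trained adversarially:

```python
import numpy as np

def gated_transformer(z, branches, style_id):
    """Select one style branch of the gated transformer and apply it
    to the encoded features z (each branch here is a plain matrix)."""
    return branches[style_id] @ z

z = np.array([1.0, 2.0])
branches = {"style_a": 2.0 * np.eye(2), "style_b": -np.eye(2)}
stylized = gated_transformer(z, branches, "style_a")
```

Because all branches share the same encoder and decoder, adding a style adds only one branch rather than a whole new GAN.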
122. An Efficient Framework for Remote Sensing Parallel Processing: Integrating the Artificial Bee Colony Algorithm and Multiagent Technology. Remote Sensing 2019. [DOI: 10.3390/rs11020152]
Abstract
Remote sensing (RS) image processing can be converted to an optimization problem, which can then be solved by swarm intelligence algorithms, such as the artificial bee colony (ABC) algorithm, to improve the accuracy of the results. However, such optimization algorithms often result in a heavy computational burden. To realize the intrinsic parallel computing ability of ABC to address the computational challenges of RS optimization, an improved multiagent (MA)-based ABC framework with a reduced communication cost among agents is proposed by utilizing MA technology. Two types of agents, massive bee agents and one administration agent, located in multiple computing nodes are designed. Based on the communication and cooperation among agents, RS optimization computing is realized in a distributed and concurrent manner. Using hyperspectral RS clustering and endmember extraction as case studies, experimental results indicate that the proposed MA-based ABC approach can effectively improve the computing efficiency while maintaining optimization accuracy.
123.

124.

125. Kwan C. Remote Sensing Performance Enhancement in Hyperspectral Images. Sensors 2018; 18:s18113598. [PMID: 30360507] [PMCID: PMC6263628] [DOI: 10.3390/s18113598]
Abstract
Hyperspectral images with hundreds of spectral bands have been proven to yield high performance in material classification. However, despite intensive advancement in hardware, the spatial resolution is still somewhat low, as compared to that of color and multispectral (MS) imagers. In this paper, we aim at presenting some ideas that may further enhance the performance of some remote sensing applications such as border monitoring and Mars exploration using hyperspectral images. One popular approach to enhancing the spatial resolution of hyperspectral images is pansharpening. We present a brief review of recent image resolution enhancement algorithms, including single super-resolution and multi-image fusion algorithms, for hyperspectral images. Advantages and limitations of the enhancement algorithms are highlighted. Some limitations in the pansharpening process include the availability of high resolution (HR) panchromatic (pan) and/or MS images, the registration of images from multiple sources, the availability of point spread function (PSF), and reliable and consistent image quality assessment. We suggest some proactive ideas to alleviate the above issues in practice. In the event where hyperspectral images are not available, we suggest the use of band synthesis techniques to generate HR hyperspectral images from low resolution (LR) MS images. Several recent interesting applications in border monitoring and Mars exploration using hyperspectral images are presented. Finally, some future directions in this research area are highlighted.
Affiliation(s)
- Chiman Kwan
- Signal Processing, Inc., Rockville, MD 20850, USA
126. Detecting Building Edges from High Spatial Resolution Remote Sensing Imagery Using Richer Convolution Features Network. Remote Sensing 2018. [DOI: 10.3390/rs10091496]
Abstract
As basic features of buildings, building edges play an important role in many fields such as urbanization monitoring, city planning, surveying, and mapping. Detecting building edges from high spatial resolution remote sensing (HSRRS) imagery is a long-standing problem. Inspired by the recent success of deep-learning-based edge detection, this paper employs a richer convolutional features (RCF) network to detect building edges. First, a dataset for building edge detection is constructed by the proposed most-peripheral-constraint conversion algorithm. Then, the RCF network is retrained on this dataset. Finally, an edge probability map is obtained by the RCF-building model, and a geomorphological concept is introduced to refine the edge probability map according to a geometric morphological analysis of the topographic surface. The experimental results suggest that the RCF-building model can detect building edges accurately and completely, with an edge detection F-measure at least 5% higher than those of three typical building extraction methods. In addition, an ablation experiment proves that the most-peripheral-constraint conversion algorithm generates a better dataset, and the proposed refinement algorithm achieves a higher F-measure and better visual quality than non-maximum suppression.
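The F-measure used to compare the edge detectors is the harmonic mean of precision and recall:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; the "at least 5% higher"
    comparison in the abstract refers to this score."""
    return 2.0 * precision * recall / (precision + recall)
```

Being a harmonic mean, it rewards detectors that balance the two: an edge map with perfect recall but poor precision still scores low.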