1
SAFFNet: Self-Attention-Based Feature Fusion Network for Remote Sensing Few-Shot Scene Classification. REMOTE SENSING 2021. DOI: 10.3390/rs13132532. Cited in RCA: 12.
Abstract
In real applications, it is necessary to classify new, unseen classes that are not available in the training dataset. To solve this problem, few-shot learning methods are usually adopted to recognize new categories from only a few labeled (out-of-bag) samples, together with the known classes available in the (large-scale) training dataset. Unlike common scene classification images obtained by CCD (charge-coupled device) cameras, remote sensing scene classification datasets tend to contain plentiful texture features rather than shape features. It is therefore important to extract more valuable texture semantic features from a limited number of labeled input images. In this paper, a multi-scale feature fusion network for few-shot remote sensing scene classification, denoted SAFFNet, is proposed by integrating a novel self-attention feature selection module. Unlike the pyramidal feature hierarchy used for object detection, informative representations of the images at different receptive fields are automatically selected and re-weighted for feature fusion after a refining network and a global pooling operation. The feature weights can be fine-tuned on the support set of the few-shot learning task. The proposed model is evaluated on three publicly available datasets for few-shot remote sensing scene classification. Experimental results demonstrate that SAFFNet improves few-shot classification accuracy significantly compared with other few-shot methods and a typical multi-scale feature fusion network.
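The select-and-re-weight idea in this abstract can be illustrated with a minimal sketch. This is not the paper's SAFF module: the mean-query attention scoring, the feature dimensionality, and the weighted-sum fusion below are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(scale_features):
    """Re-weight globally pooled multi-scale feature vectors with
    attention scores and sum them into one fused descriptor."""
    F = np.stack(scale_features)        # (num_scales, dim)
    query = F.mean(axis=0)              # assumed query: average representation
    scores = F @ query                  # similarity of each scale to the query
    weights = softmax(scores)           # normalized attention weights
    fused = (weights[:, None] * F).sum(axis=0)
    return fused, weights

rng = np.random.default_rng(0)
scales = [rng.normal(size=64) for _ in range(3)]  # stand-ins for pooled conv features
fused, weights = attention_fuse(scales)
```

In a real few-shot pipeline the weights would be fine-tuned on the support set rather than computed in closed form as here.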
2
Li Y, Cao H, Allen CM, Wang X, Erdelez S, Shyu CR. Computational modeling of human reasoning processes for interpretable visual knowledge: a case study with radiographers. Sci Rep 2020; 10:21620. PMID: 33303770; PMCID: PMC7730148; DOI: 10.1038/s41598-020-77550-9.
Abstract
Visual reasoning is critical in many complex visual tasks in medicine such as radiology or pathology. It is challenging to explicitly explain reasoning processes due to the dynamic nature of real-time human cognition. A deeper understanding of such reasoning processes is necessary for improving diagnostic accuracy and computational tools. Most computational analysis methods for visual attention utilize black-box algorithms which lack explainability and are therefore limited in understanding the visual reasoning processes. In this paper, we propose a computational method to quantify and dissect visual reasoning. The method characterizes spatial and temporal features and identifies common and contrast visual reasoning patterns to extract significant gaze activities. The visual reasoning patterns are explainable and can be compared among different groups to discover strategy differences. Experiments with radiographers of varied levels of expertise on 10 levels of visual tasks were conducted. Our empirical observations show that the method can capture the temporal and spatial features of human visual attention and distinguish expertise level. The extracted patterns are further examined and interpreted to showcase key differences between expertise levels in the visual reasoning processes. By revealing task-related reasoning processes, this method demonstrates potential for explaining human visual understanding.
Affiliation(s)
- Yu Li
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Hongfei Cao
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Carla M Allen
- Department of Clinical and Diagnostic Science, University of Missouri, Columbia, MO, 65211, USA
- Xin Wang
- Department of Information Science, University of Northern Texas, Denton, TX, 76203, USA
- Sanda Erdelez
- School of Library and Information Science, Simmons University, Boston, MA, 02115, USA
- Chi-Ren Shyu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA.
- Institute for Data Science and Informatics, University of Missouri, Columbia, MO, 65211, USA.
3
Liu P, Gou G, Shan X, Tao D, Zhou Q. Global Optimal Structured Embedding Learning for Remote Sensing Image Retrieval. SENSORS (BASEL, SWITZERLAND) 2020; 20:E291. PMID: 31948002; PMCID: PMC6983082; DOI: 10.3390/s20010291. Cited in RCA: 10.
Abstract
A rich line of work focuses on designing elegant loss functions under the deep metric learning (DML) paradigm to learn a discriminative embedding space for remote sensing image retrieval (RSIR). Essentially, such an embedding space should efficiently distinguish deep feature descriptors. So far, most losses used in RSIR are based on triplets, which suffer from local optimization, slow convergence, and insufficient use of the similarity structure within a mini-batch. In this paper, we present a novel DML method, named global optimal structured loss, to address the limitations of triplet loss. Specifically, we use a softmax function rather than a hinge function in the loss to enable global optimization. In addition, the loss globally learns an efficient deep embedding space from mined informative sample pairs, forcing positive pairs within a given limit and pushing negative pairs beyond a given boundary. We conducted extensive experiments on four public remote sensing datasets, and the results show that the proposed global optimal structured loss with the pair-mining scheme achieves state-of-the-art performance compared with the baselines.
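The softmax-for-hinge idea can be sketched as follows. This is a rough illustration of a log-sum-exp (soft max) structured loss over all batch pairs, not the authors' exact formulation; the margin, scale, and mining details are assumptions.

```python
import numpy as np

def global_structured_loss(emb, labels, margin=0.5, scale=10.0):
    """Smooth structured loss over ALL pairs in the mini-batch:
    log-sum-exp replaces the hinge's hard max, so every positive and
    negative pair contributes a gradient (global optimization)."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # cosine space
    sim = emb @ emb.T
    n = len(labels)
    total, anchors = 0.0, 0
    for i in range(n):
        idx = np.arange(n)
        pos = (labels == labels[i]) & (idx != i)
        neg = labels != labels[i]
        if not pos.any() or not neg.any():
            continue
        # soft max over hardest positives (low sim) and negatives (high sim)
        pos_term = np.log(np.exp(-scale * (sim[i, pos] - margin)).sum()) / scale
        neg_term = np.log(np.exp(scale * (sim[i, neg] + margin)).sum()) / scale
        total += max(0.0, pos_term + neg_term)
        anchors += 1
    return total / max(anchors, 1)

rng = np.random.default_rng(1)
emb = rng.normal(size=(8, 16))
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
loss = global_structured_loss(emb, labels)
```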
Affiliation(s)
- Pingping Liu
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; (G.G.); (X.S.)
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- School of Mechanical Science and Engineering, Jilin University, Changchun 130025, China
- Guixia Gou
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; (G.G.); (X.S.)
- Xue Shan
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; (G.G.); (X.S.)
- Dan Tao
- School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing 100044, China;
- Qiuzhan Zhou
- College of Communication Engineering, Jilin University, Changchun 130012, China;
4
Abstract
With the rapid progress of remote sensing (RS) observation technologies, cross-modal RS image-sound retrieval has attracted attention in recent years. However, existing methods perform cross-modal image-sound retrieval using high-dimensional real-valued features, which require more storage than low-dimensional binary features (i.e., hash codes). Moreover, these methods cannot directly encode relative semantic similarity relationships. To tackle these issues, we propose a new deep cross-modal RS image-sound hashing approach, called deep triplet-based hashing (DTBH), which integrates hash code learning and relative semantic similarity learning into an end-to-end network. Specifically, the proposed DTBH method designs a triplet selection strategy to select effective triplets. Moreover, to encode relative semantic similarity relationships, we propose an objective function that ensures anchor images are more similar to positive sounds than to negative sounds. In addition, a triplet regularized loss term leverages the approximate l1-norm between hash-like codes and hash codes, effectively reducing the information loss between them. Extensive experimental results show that DTBH achieves superior performance over state-of-the-art cross-modal image-sound retrieval methods. For the sound-query-RS-image task, the proposed approach achieved a mean average precision (mAP) of up to 60.13% on the UCM dataset, 87.49% on the Sydney dataset, and 22.72% on the RSICD dataset. For the RS-image-query-sound task, it achieved a mAP of 64.27% on the UCM dataset, 92.45% on the Sydney dataset, and 23.46% on the RSICD dataset. Future work will focus on exploiting the balance property of hash codes to further improve image-sound retrieval performance.
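The l1 regularizer between hash-like codes and binarized hash codes can be sketched in a few lines. This is a generic quantization-loss illustration under assumed tanh-style codes in (-1, 1), not the DTBH network's actual loss term.

```python
import numpy as np

def quantization_loss(hash_like):
    """Mean l1 gap between continuous hash-like codes (e.g. tanh
    outputs in (-1, 1)) and their binarized {-1, +1} versions;
    minimizing it reduces the information lost at thresholding time."""
    binary = np.where(hash_like >= 0, 1.0, -1.0)
    return np.abs(hash_like - binary).mean()

near_binary = np.array([[0.95, -0.9, 0.99, -0.97]])  # almost saturated codes
ambiguous   = np.array([[0.10, -0.20, 0.05, -0.10]])  # far from binary
```

Codes that already sit near the corners of the hypercube incur almost no loss, so the regularizer pushes the network toward outputs that survive binarization.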
5
Elakkiya R, Vanitha V. Interactive real time fuzzy class level gesture similarity measure based sign language recognition using artificial neural networks. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. DOI: 10.3233/jifs-190707. Cited in RCA: 1.
Affiliation(s)
- R. Elakkiya
- Centre for Artificial Intelligence and Machine Learning, School of Computing, SASTRA Deemed University, Thanjavur, Tamil Nadu, India
- V. Vanitha
- Computer Science & Engineering, Sriramachandra Institute of Engineering & Technology, SRIHER, Porur, Chennai, Tamil Nadu, India
6
Large-Scale Remote Sensing Image Retrieval Based on Semi-Supervised Adversarial Hashing. REMOTE SENSING 2019. DOI: 10.3390/rs11172055. Cited in RCA: 22.
Abstract
Remote sensing image retrieval (RSIR), a superior content organization technique, plays an important role in the remote sensing (RS) community. As the number of RS images grows explosively, not only retrieval precision but also retrieval efficiency is emphasized in the large-scale RSIR scenario. Therefore, approximate nearest neighbor (ANN) search increasingly attracts researchers' attention. In this paper, we propose a new hash learning method, named semi-supervised deep adversarial hashing (SDAH), to accomplish ANN search for the large-scale RSIR task. Our model assumes the RS images are already represented by proper visual features. First, a residual auto-encoder (RAE) is developed to generate the class variable and hash code. Second, two multi-layer networks are constructed to regularize the obtained latent vectors using a prior distribution. These two modules are integrated under a generative adversarial framework. Through minimax learning, the class variable becomes a one-hot-like vector, while the hash code becomes a binary-like vector. Finally, a specific hashing function is formulated to enhance the quality of the generated hash codes. The effectiveness of the hash codes learned by SDAH is demonstrated by positive experimental results on three public RS image archives. Compared with existing hash learning methods, the proposed method achieves improved performance.
7
Abstract
Remote Sensing Image Retrieval remains a challenging topic due to the special nature of Remote Sensing imagery. Such images contain various different semantic objects, which clearly complicates the retrieval task. In this paper, we present an image retrieval pipeline that uses attentive, local convolutional features and aggregates them using the Vector of Locally Aggregated Descriptors (VLAD) to produce a global descriptor. We study various system parameters such as the multiplicative and additive attention mechanisms and descriptor dimensionality. We propose a query expansion method that requires no external inputs. Experiments demonstrate that even without training, the local convolutional features and global representation outperform other systems. After system tuning, we can achieve state-of-the-art or competitive results. Furthermore, we observe that our query expansion method increases overall system performance by about 3%, using only the top-three retrieved images. Finally, we show how dimensionality reduction produces compact descriptors with increased retrieval performance and fast retrieval computation times, e.g., 50% faster than the current systems.
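The VLAD aggregation step this abstract relies on can be sketched directly. This is the standard VLAD recipe (hard assignment, residual accumulation, signed-square-root and L2 normalization) under assumed toy dimensions, not this system's attentive-feature pipeline.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Vector of Locally Aggregated Descriptors: accumulate, per
    codebook centroid, the residuals of the local descriptors assigned
    to it, then flatten, power-normalize, and L2-normalize."""
    k, d = centroids.shape
    agg = np.zeros((k, d))
    # hard-assign each local descriptor to its nearest centroid
    dists = ((descriptors[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    for i, c in enumerate(nearest):
        agg[c] += descriptors[i] - centroids[c]
    v = agg.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))   # signed-square-root (power) normalization
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

rng = np.random.default_rng(2)
local = rng.normal(size=(100, 8))    # stand-ins for attentive local conv features
codebook = rng.normal(size=(4, 8))   # stand-ins for learned visual words
g = vlad(local, codebook)
```

The resulting k*d global descriptor can then be compared with a dot product, and optionally compressed with PCA as the dimensionality-reduction experiments describe.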
8
Abstract
Due to the specific characteristics and complicated contents of remote sensing (RS) images, remote sensing image retrieval (RSIR) remains an open and challenging research topic in the RS community. RSIR has two basic building blocks: feature learning and similarity matching. In this paper, we focus on developing an effective feature learning method for RSIR. With the help of deep learning techniques, the proposed feature learning method is designed under the bag-of-words (BOW) paradigm; thus, we name the resulting feature deep BOW (DBOW). The learning process consists of two parts: image descriptor learning and feature construction. First, to explore the complex contents within an RS image, we extract image descriptors at the image-patch level rather than over the whole image. In addition, instead of using handcrafted features to describe the patches, we propose a deep convolutional auto-encoder (DCAE) model to learn discriminative descriptors for the RS image. Second, the k-means algorithm is used to generate a codebook from the obtained deep descriptors. The final histogram-based DBOW features are then acquired by counting the frequency of each code word. Once the DBOW features are obtained, similarities between RS images are measured using the L1-norm distance, and retrieval results are ranked accordingly. Encouraging experimental results on four public RS image archives demonstrate that our DBOW feature is effective for the RSIR task. Compared with existing RS image features, DBOW achieves improved performance on RSIR.
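The feature-construction half of this pipeline (codebook assignment, histogram counting, L1 matching) can be sketched as follows. The descriptor and codebook dimensions are illustrative; in the paper the descriptors come from the DCAE and the codebook from k-means.

```python
import numpy as np

def bow_histogram(patch_descriptors, codebook):
    """Quantize each patch descriptor to its nearest code word and
    return the normalized word-frequency histogram (the BOW feature)."""
    dists = ((patch_descriptors[:, None, :] - codebook[None]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def l1_distance(h1, h2):
    """L1-norm distance used to rank retrieval results."""
    return np.abs(h1 - h2).sum()

rng = np.random.default_rng(3)
codebook = rng.normal(size=(16, 32))                  # e.g. k-means centroids
img_a = bow_histogram(rng.normal(size=(50, 32)), codebook)
img_b = bow_histogram(rng.normal(size=(50, 32)), codebook)
```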
9
Durbha SS, Kurte KR, Bhangale U. Semantics and High Performance Computing Driven Approaches for Enhanced Exploitation of Earth Observation (EO) Data: State of the Art. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES INDIA SECTION A-PHYSICAL SCIENCES 2017. DOI: 10.1007/s40010-017-0432-z. Cited in RCA: 3.
10
Scene Semantic Understanding Based on the Spatial Context Relations of Multiple Objects. REMOTE SENSING 2017. DOI: 10.3390/rs9101030. Cited in RCA: 16.
11
Alike Scene Retrieval from Land-Cover Products Based on the Label Co-Occurrence Matrix (LCM). REMOTE SENSING 2017. DOI: 10.3390/rs9090912. Cited in RCA: 2.
12
Tiede D, Baraldi A, Sudmanns M, Belgiu M, Lang S. Architecture and prototypical implementation of a semantic querying system for big Earth observation image bases. EUROPEAN JOURNAL OF REMOTE SENSING 2017; 50:452-463. PMID: 29098143; PMCID: PMC5632919; DOI: 10.1080/22797254.2017.1357432. Cited in RCA: 6.
Abstract
Spatiotemporal analytics of multi-source Earth observation (EO) big data is a precondition for semantic content-based image retrieval (SCBIR). As a proof of concept, an innovative EO semantic querying (EO-SQ) subsystem was designed and prototypically implemented in series with an EO image understanding (EO-IU) subsystem. The EO-IU subsystem automatically generates ESA Level 2 products (scene classification map, up to basic land cover units) from optical satellite data. The EO-SQ subsystem comprises a graphical user interface (GUI) and an array database embedded in a client-server model. In the array database, all EO images are stored as a space-time data cube together with the Level 2 products generated by the EO-IU subsystem. The GUI allows users to (a) develop a conceptual world model based on a graphically supported query pipeline that combines spatial and temporal operators and/or standard algorithms, and (b) create, save, and share within the client-server architecture complex semantic queries/decision rules suitable for SCBIR and/or spatiotemporal EO image analytics, consistent with the conceptual world model.
Affiliation(s)
- Dirk Tiede
- Department of Geoinformatics – Z_GIS, University of Salzburg, Salzburg, Austria
- Andrea Baraldi
- Department of Geoinformatics – Z_GIS, University of Salzburg, Salzburg, Austria
- Department of Agricultural and Food Sciences, University of Naples Federico II, Portici, Italy
- Martin Sudmanns
- Department of Geoinformatics – Z_GIS, University of Salzburg, Salzburg, Austria
- Mariana Belgiu
- Department of Geoinformatics – Z_GIS, University of Salzburg, Salzburg, Austria
- Stefan Lang
- Department of Geoinformatics – Z_GIS, University of Salzburg, Salzburg, Austria
13
Content-Based High-Resolution Remote Sensing Image Retrieval via Unsupervised Feature Learning and Collaborative Affinity Metric Fusion. REMOTE SENSING 2016. DOI: 10.3390/rs8090709. Cited in RCA: 47.
14
Yang C, Liu H, Wang S, Liao S. Scene-Level Geographic Image Classification Based on a Covariance Descriptor Using Supervised Collaborative Kernel Coding. SENSORS 2016; 16:392. PMID: 26999150; PMCID: PMC4813967; DOI: 10.3390/s16030392. Cited in RCA: 5.
Abstract
Scene-level geographic image classification is a very challenging problem and has become a research focus in recent years. This paper develops a supervised collaborative kernel coding method based on a covariance descriptor (covd) for scene-level geographic image classification. First, the covd is introduced in the feature extraction process and then transformed into a Euclidean feature by a supervised collaborative kernel coding model. Furthermore, we develop an iterative optimization framework to solve this model. Comprehensive evaluations on a public high-resolution aerial image dataset and comparisons with state-of-the-art methods show the superiority and effectiveness of our approach.
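The covariance descriptor at the heart of this method can be sketched in a few lines. The choice and number of per-pixel feature channels below are illustrative assumptions, not the paper's exact feature set.

```python
import numpy as np

def covariance_descriptor(feature_maps):
    """Region covariance descriptor: treat each pixel as a d-dimensional
    feature vector (e.g. intensity and gradient channels) and return the
    d x d covariance matrix of those vectors over the region."""
    d = feature_maps.shape[0]
    X = feature_maps.reshape(d, -1)   # d channels x num_pixels
    return np.cov(X)

rng = np.random.default_rng(4)
maps = rng.normal(size=(5, 32, 32))   # 5 assumed per-pixel feature channels
C = covariance_descriptor(maps)
```

Because covariance matrices live on a Riemannian manifold rather than in Euclidean space, the paper's kernel coding step is what maps descriptors like `C` into a Euclidean feature usable by a linear classifier.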
Affiliation(s)
- Chunwei Yang
- High-Tech Institute of Xi'an, Xi'an 710025, China.
- Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
- Huaping Liu
- Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
- Shouyi Liao
- High-Tech Institute of Xi'an, Xi'an 710025, China.
15
Predicting Relevant Change in High Resolution Satellite Imagery. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2014. DOI: 10.3390/ijgi3041491. Cited in RCA: 1.
16
Candemir S, Jaeger S, Palaniappan K, Musco JP, Singh RK, Xue Z, Karargyris A, Antani S, Thoma G, McDonald CJ. Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration. IEEE TRANSACTIONS ON MEDICAL IMAGING 2014; 33:577-90. PMID: 24239990; PMCID: PMC11977575; DOI: 10.1109/tmi.2013.2290491. Cited in RCA: 187.
Abstract
The National Library of Medicine (NLM) is developing a digital chest X-ray (CXR) screening system for deployment in resource-constrained communities and developing countries worldwide, with a focus on early detection of tuberculosis. A critical component in the computer-aided diagnosis of digital CXRs is the automatic detection of the lung regions. In this paper, we present a nonrigid registration-driven robust lung segmentation method using image retrieval-based, patient-specific adaptive lung models that detects lung boundaries, surpassing state-of-the-art performance. The method consists of three main stages: 1) a content-based image retrieval approach for identifying training images (with masks) most similar to the patient CXR, using a partial Radon transform and the Bhattacharyya shape similarity measure; 2) creating the initial patient-specific anatomical model of lung shape using SIFT-flow for deformable registration of training masks to the patient CXR; and 3) extracting refined lung boundaries using a graph cuts optimization approach with a customized energy function. Our average accuracy of 95.4% on the public JSRT database is the highest among published results. A similar degree of accuracy, 94.1% and 91.7% on two new CXR datasets from Montgomery County, MD, USA, and India, respectively, demonstrates the robustness of our lung segmentation approach.
17
Jaeger S, Karargyris A, Candemir S, Folio L, Siegelman J, Callaghan F, Palaniappan K, Singh RK, Antani S, Thoma G, McDonald CJ. Automatic tuberculosis screening using chest radiographs. IEEE TRANSACTIONS ON MEDICAL IMAGING 2014; 33:233-45. PMID: 24108713; DOI: 10.1109/tmi.2013.2284099. Cited in RCA: 180.
Abstract
Tuberculosis is a major health threat in many regions of the world. Opportunistic infections in immunocompromised HIV/AIDS patients and multi-drug-resistant bacterial strains have exacerbated the problem, while diagnosing tuberculosis still remains a challenge. When left undiagnosed and thus untreated, mortality rates of patients with tuberculosis are high. Standard diagnostics still rely on methods developed in the last century. They are slow and often unreliable. In an effort to reduce the burden of the disease, this paper presents our automated approach for detecting tuberculosis in conventional posteroanterior chest radiographs. We first extract the lung region using a graph cut segmentation method. For this lung region, we compute a set of texture and shape features, which enable the X-rays to be classified as normal or abnormal using a binary classifier. We measure the performance of our system on two datasets: a set collected by the tuberculosis control program of our local county's health department in the United States, and a set collected by Shenzhen Hospital, China. The proposed computer-aided diagnostic system for TB screening, which is ready for field deployment, achieves a performance that approaches the performance of human experts. We achieve an area under the ROC curve (AUC) of 87% (78.3% accuracy) for the first set, and an AUC of 90% (84% accuracy) for the second set. For the first set, we compare our system performance with the performance of radiologists. When trying not to miss any positive cases, radiologists achieve an accuracy of about 82% on this set, and their false positive rate is about half of our system's rate.
18
Genetic Optimization for Associative Semantic Ranking Models of Satellite Images by Land Cover. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2013. DOI: 10.3390/ijgi2020531. Cited in RCA: 2.
19
Operational Automatic Remote Sensing Image Understanding Systems: Beyond Geographic Object-Based and Object-Oriented Image Analysis (GEOBIA/GEOOIA). Part 2: Novel System Architecture, Information/Knowledge Representation, Algorithm Design and Implementation. REMOTE SENSING 2012. DOI: 10.3390/rs4092768. Cited in RCA: 15.