1. Shen X, Chen Y, Liu W, Zheng Y, Sun QS, Pan S. Graph Convolutional Multi-Label Hashing for Cross-Modal Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:7997-8009. PMID: 39028597. DOI: 10.1109/tnnls.2024.3421583.
Abstract
Cross-modal hashing encodes different modalities of multimodal data into a low-dimensional Hamming space for fast cross-modal retrieval. In multi-label cross-modal retrieval, multimodal data are often annotated with multiple labels, and some labels, e.g., "ocean" and "cloud," often co-occur. However, existing cross-modal hashing methods overlook label dependency, which is crucial for improving performance. To fill this gap, this article proposes graph convolutional multi-label hashing (GCMLH) for effective multi-label cross-modal retrieval. Specifically, GCMLH first generates a word embedding for each label and develops a label encoder to learn highly correlated label embeddings via a graph convolutional network (GCN). In addition, GCMLH develops a feature encoder for each modality and a feature fusion module to generate highly semantic features via GCN. GCMLH uses a teacher-student learning scheme to transfer knowledge from the teacher modules, i.e., the label encoder and feature fusion module, to the student module, i.e., the feature encoder, so that the learned hash codes can well exploit multi-label dependency and multimodal semantic structure. Extensive empirical results on several benchmarks demonstrate the superiority of the proposed method over existing state-of-the-art methods.
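As a rough illustration of the label-encoder idea summarized above (not the authors' implementation), the sketch below runs one graph-convolution step over a label co-occurrence graph so that embeddings of frequently co-occurring labels such as "ocean" and "cloud" become correlated; all shapes, names, and the toy data are illustrative assumptions.

```python
import numpy as np

def gcn_layer(word_emb, co_occurrence, weight):
    """One GCN propagation step over a label co-occurrence graph.

    word_emb:      (n_labels, d_in) initial word embeddings of the labels
    co_occurrence: (n_labels, n_labels) symmetric label co-occurrence counts
    weight:        (d_in, d_out) learnable projection
    """
    adj = co_occurrence + np.eye(co_occurrence.shape[0])   # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(adj.sum(axis=1)))   # D^{-1/2}
    adj_norm = d_inv_sqrt @ adj @ d_inv_sqrt                # symmetric normalization
    return np.maximum(adj_norm @ word_emb @ weight, 0.0)    # ReLU activation

# Toy example: 3 labels, where labels 0 and 1 co-occur often.
emb = np.random.randn(3, 8)
cooc = np.array([[0., 5., 1.],
                 [5., 0., 1.],
                 [1., 1., 0.]])
w = np.random.randn(8, 4)
label_embedding = gcn_layer(emb, cooc, w)   # (3, 4) correlated label embeddings
```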
2. Tian J, Saddik AE, Xu X, Li D, Cao Z, Shen HT. Intrinsic Consistency Preservation With Adaptively Reliable Samples for Source-Free Domain Adaptation. IEEE Transactions on Neural Networks and Learning Systems 2025; 36:4738-4749. PMID: 38379234. DOI: 10.1109/tnnls.2024.3362948.
Abstract
Unsupervised domain adaptation (UDA) aims to alleviate the domain shift by transferring knowledge learned from a labeled source dataset to an unlabeled target domain. Although UDA has seen promising progress recently, it requires access to data from both domains, making it problematic in source data-absent scenarios. In this article, we investigate a practical task source-free domain adaptation (SFDA) that alleviates the limitations of the widely studied UDA in simultaneously acquiring source and target data. In addition, we further study the imbalanced SFDA (ISFDA) problem, which addresses the intra-domain class imbalance and inter-domain label shift in SFDA. We observe two key issues in SFDA that: 1) target data form clusters in the representation space regardless of whether the target data points are aligned with the source classifier and 2) target samples with higher classification confidence are more reliable and have less variation in their classification confidence during adaptation. Motivated by these observations, we propose a unified method, named intrinsic consistency preservation with adaptively reliable samples (ICPR), to jointly cope with SFDA and ISFDA. Specifically, ICPR first encourages the intrinsic consistency in the predictions of neighbors for unlabeled samples with weak augmentation (standard flip-and-shift), regardless of their reliability. ICPR then generates strongly augmented views specifically for adaptively selected reliable samples and is trained to fix the intrinsic consistency between weakly and strongly augmented views of the same image concerning predictions of neighbors and their own. Additionally, we propose to use a prototype-like classifier to avoid the classification confusion caused by severe intra-domain class imbalance and inter-domain label shift. We demonstrate the effectiveness and general applicability of ICPR on six benchmarks of both SFDA and ISFDA tasks. The reproducible code of our proposed ICPR method is available at https://github.com/CFM-MSG/Code_ICPR.
3. Wang Z, Yang Y, Chen Y, Yuan T, Sermesant M, Delingette H, Wu O. Mutual Information Guided Diffusion for Zero-Shot Cross-Modality Medical Image Translation. IEEE Transactions on Medical Imaging 2024; 43:2825-2838. PMID: 38551825. PMCID: PMC11580158. DOI: 10.1109/tmi.2024.3382043.
Abstract
Cross-modality data translation has attracted great interest in medical image computing. Deep generative models show performance improvement in addressing related challenges. Nevertheless, as a fundamental challenge in image translation, the problem of zero-shot cross-modality image translation with fidelity remains unanswered. To bridge this gap, we propose a novel unsupervised zero-shot learning method called the Mutual Information guided Diffusion model (MIDiffusion), which learns to translate an unseen source image to the target modality by leveraging the inherent statistical consistency of mutual information between different modalities. To overcome the prohibitively high-dimensional mutual information calculation, we propose a differentiable local-wise mutual information layer for conditioning the iterative denoising process. The local-wise mutual information layer captures identical cross-modality features in the statistical domain, offering diffusion guidance without relying on direct mappings between the source and target domains. This advantage allows our method to adapt to changing source domains without the need for retraining, making it highly practical when sufficient labeled source domain data are not available. We demonstrate the superior performance of MIDiffusion in zero-shot cross-modality translation tasks through empirical comparisons with other generative models, including adversarial-based and diffusion-based models. Finally, we showcase the real-world application of MIDiffusion in 3D zero-shot learning-based cross-modality image segmentation tasks.
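The core quantity in this abstract is mutual information between local regions of two modalities. Below is a hedged, histogram-based sketch of estimating the MI of two co-registered patches; it is not the paper's differentiable layer, and the bin count, patch sizes, and toy data are assumptions.

```python
import numpy as np

def mutual_information(patch_a, patch_b, bins=32):
    """Histogram-based MI estimate between two aligned image patches."""
    joint, _, _ = np.histogram2d(patch_a.ravel(), patch_b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)       # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)       # marginal p(y)
    nonzero = pxy > 0                         # avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

# Toy example: an MR-like patch and a noisy monotone transform of it (CT-like).
mr = np.random.rand(64, 64)
ct = 1.0 - mr + 0.05 * np.random.rand(64, 64)
print(mutual_information(mr, ct))             # high MI despite different intensity profiles
```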
4. Liang X, Yang E, Yang Y, Deng C. Multi-Relational Deep Hashing for Cross-Modal Search. IEEE Transactions on Image Processing 2024; 33:3009-3020. PMID: 38625760. DOI: 10.1109/tip.2024.3385656.
Abstract
Deep cross-modal hashing retrieval has recently made significant progress. However, existing methods generally learn hash functions with pairwise or triplet supervisions, which involves learning the relevant information by splicing partial similarity between data pairs; notably, this approach only captures the data similarity locally and incompletely, resulting in sub-optimal retrieval performance. In this paper, we propose a novel Multi-Relational Deep Hashing (MRDH) approach, which can fully bridge the modality gap by comprehensively modeling the similarity relationship between data in different modalities. In more detail, to investigate the inter-modal relationships, we constrain the consistency of cross-modal pairwise similarities to maintain the semantic similarity across modalities. Moreover, to further capture complete similarity information, we design a new similarity metric, which we term cross-modal global similarity, by encouraging hash codes of similar data pairs from different modalities to approach a common center and hash codes for dissimilar pairs to converge to different centers. Adopting this approach enables our model to generate more discriminative hash codes. Extensive experiments on three benchmark datasets demonstrate the superiority of our method on cross-modal hashing retrieval.
5. Bai C, Zeng C, Ma Q, Zhang J. Graph Convolutional Network Discrete Hashing for Cross-Modal Retrieval. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:4756-4767. PMID: 35604998. DOI: 10.1109/tnnls.2022.3174970.
Abstract
With the rapid development of deep neural networks, cross-modal hashing has made great progress. However, the information carried by different types of data is asymmetrical: if the resolution of an image is high enough, it can reproduce a real-world scene almost completely, whereas text usually carries personal emotion and is less objective, so the image modality is generally considered much richer in information than the text modality. Although most existing methods unify the semantic feature extraction and hash function learning modules for end-to-end learning, they ignore this issue and do not use information-rich modalities to support information-poor modalities, leading to suboptimal results. Furthermore, previous methods learn hash functions in a relaxed way that causes nontrivial quantization losses. To address these issues, we propose a new method called graph convolutional network-based discrete hashing (GCDH). This method uses a GCN to bridge the information gap between different types of data. The GCN represents each label as a word embedding, with the embedding regarded as a set of interdependent object classifiers. From these classifiers, we can obtain predicted labels to enhance feature representations across modalities. In addition, we use an efficient discrete optimization strategy to learn the discrete binary codes without relaxation. Extensive experiments conducted on three commonly used datasets demonstrate that GCDH outperforms current state-of-the-art cross-modal hashing methods.
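The abstract criticizes relaxed hash learning for the quantization loss it introduces. As a tiny illustration of that gap (assumed notation, not the GCDH optimizer), the snippet below measures the error between continuous codes and their binarized versions.

```python
import numpy as np

# Continuous (relaxed) codes produced by a network for 4 samples, 8 bits each.
h = np.random.randn(4, 8)
b = np.where(h >= 0, 1.0, -1.0)              # discrete codes via the sign function
quantization_loss = np.mean((h - b) ** 2)    # error introduced when binarizing
print(b, quantization_loss)
```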
6. Zhang M, Li J, Zheng X. Semantic embedding based online cross-modal hashing method. Sci Rep 2024; 14:736. PMID: 38184671. PMCID: PMC10771426. DOI: 10.1038/s41598-023-50242-w.
Abstract
Hashing has been extensively utilized in cross-modal retrieval due to its high efficiency in handling large-scale, high-dimensional data. However, most existing cross-modal hashing methods operate as offline learning models, which learn hash codes in a batch-based manner and prove to be inefficient for streaming data. Recently, several online cross-modal hashing methods have been proposed to address the streaming data scenario. Nevertheless, these methods fail to fully leverage the semantic information and accurately optimize hashing in a discrete fashion. As a result, both the accuracy and efficiency of online cross-modal hashing methods are not ideal. To address these issues, this paper introduces the Semantic Embedding-based Online Cross-modal Hashing (SEOCH) method, which integrates semantic information exploitation and online learning into a unified framework. To exploit the semantic information, we map the semantic labels to a latent semantic space and construct a semantic similarity matrix to preserve the similarity between new data and existing data in the Hamming space. Moreover, we employ a discrete optimization strategy to enhance the efficiency of cross-modal retrieval for online hashing. Through extensive experiments on two publicly available multi-label datasets, we demonstrate the superiority of the SEOCH method.
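As a minimal sketch of the general recipe this abstract describes (a label-derived similarity matrix whose structure the binary codes should preserve in the Hamming space), assuming multi-hot label vectors and ±1 codes; the names and scaling are illustrative and not the SEOCH formulation.

```python
import numpy as np

def label_similarity(labels_new, labels_old):
    """+1 if two samples share at least one label, -1 otherwise."""
    share = (labels_new @ labels_old.T) > 0
    return np.where(share, 1.0, -1.0)

def similarity_preserving_loss(codes_new, codes_old, sim):
    """Encourage scaled code inner products to match the ±1 similarity matrix."""
    n_bits = codes_new.shape[1]
    inner = (codes_new @ codes_old.T) / n_bits    # in [-1, 1] for ±1 codes
    return float(np.mean((inner - sim) ** 2))

# Toy streaming chunk (2 new samples) against an existing database (3 samples).
l_new = np.array([[1, 0, 1], [0, 1, 0]], dtype=float)   # multi-hot labels
l_old = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 1]], dtype=float)
s = label_similarity(l_new, l_old)
b_new = np.sign(np.random.randn(2, 16))
b_old = np.sign(np.random.randn(3, 16))
print(similarity_preserving_loss(b_new, b_old, s))
```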
Affiliation(s)
- Meijia Zhang
- School of Data Science and Computer Science, Shandong Women's University, Jinan, 250300, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, 250022, China
- Junzheng Li
- Network Information Management Center, Shandong Management University, Jinan, 250357, China
- Xiyuan Zheng
- School of Data Science and Computer Science, Shandong Women's University, Jinan, 250300, China
7. Hoang T, Do TT, Nguyen TV, Cheung NM. Multimodal Mutual Information Maximization: A Novel Approach for Unsupervised Deep Cross-Modal Hashing. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:6289-6302. PMID: 34982698. DOI: 10.1109/tnnls.2021.3135420.
Abstract
In this article, we adopt the mutual information (MI) maximization approach to tackle the problem of unsupervised learning of binary hash codes for efficient cross-modal retrieval. We propose a novel method, dubbed cross-modal info-max hashing (CMIMH). First, to learn informative representations that can preserve both intramodal and intermodal similarities, we leverage recent advances in estimating variational lower bounds of MI to maximize the MI between the binary representations and the input features and between the binary representations of different modalities. By jointly maximizing these MIs under the assumption that the binary representations are modeled by multivariate Bernoulli distributions, we can learn binary representations that preserve both intramodal and intermodal similarities effectively, in a mini-batch manner with gradient descent. Furthermore, we find that trying to minimize the modality gap by learning similar binary representations for the same instance from different modalities could result in less informative representations. Hence, balancing between reducing the modality gap and losing modality-private information is important for cross-modal retrieval tasks. Quantitative evaluations on standard benchmark datasets demonstrate that the proposed method consistently outperforms other state-of-the-art cross-modal retrieval methods.
8. Liu F, Liu J, Hong R, Lu H. Question-Guided Erasing-Based Spatiotemporal Attention Learning for Video Question Answering. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:1367-1379. PMID: 34464265. DOI: 10.1109/tnnls.2021.3105280.
Abstract
Spatiotemporal attention learning for video question answering (VideoQA) has always been a challenging task, where existing approaches treat the attention parts and the nonattention parts in isolation. In this work, we propose to enforce the correlation between the attention parts and the nonattention parts as a distance constraint for discriminative spatiotemporal attention learning. Specifically, we first introduce a novel attention-guided erasing mechanism in the traditional spatiotemporal attention to obtain multiple aggregated attention features and nonattention features and then learn to separate the attention and the nonattention features with an appropriate distance. The distance constraint is enforced by a metric learning loss, without increasing the inference complexity. In this way, the model can learn to produce more discriminative spatiotemporal attention distribution on videos, thus enabling more accurate question answering. In order to incorporate the multiscale spatiotemporal information that is beneficial for video understanding, we additionally develop a pyramid variant on basis of the proposed approach. Comprehensive ablation experiments are conducted to validate the effectiveness of our approach, and state-of-the-art performance is achieved on several widely used datasets for VideoQA.
9. Jiang G, Wang H, Peng J, Chen D, Fu X. Learning interpretable shared space via rank constraint for multi-view clustering. Appl Intell 2022. DOI: 10.1007/s10489-022-03778-9.
10. Xu L, Zeng X, Zheng B, Li W. Multi-Manifold Deep Discriminative Cross-Modal Hashing for Medical Image Retrieval. IEEE Transactions on Image Processing 2022; 31:3371-3385. PMID: 35507618. DOI: 10.1109/tip.2022.3171081.
Abstract
Benefiting from low storage cost and high retrieval efficiency, hash learning has become a widely used technology for approximate nearest neighbor retrieval. Within it, cross-modal medical hashing has attracted increasing attention for facilitating efficient clinical decisions. However, two main challenges remain: weak multi-manifold structure preservation across multiple modalities and weak discriminability of hash codes. Specifically, existing cross-modal hashing methods focus on pairwise relations within two modalities and ignore underlying multi-manifold structures across more than two modalities. In addition, there is little consideration of discriminability, i.e., that any pair of hash codes should be different. In this paper, we propose a novel hashing method named multi-manifold deep discriminative cross-modal hashing (MDDCH) for large-scale medical image retrieval. The key point is a multi-modal manifold similarity, which integrates multiple sub-manifolds defined on heterogeneous data to preserve correlation among instances and can be measured by three-step connection on the corresponding hetero-manifold. We then propose a discriminative term that makes each hash code encoded by the hash functions different, which improves the discriminative performance of the hash codes. Besides, we introduce a Gaussian-binary Restricted Boltzmann Machine to directly output hash codes without using any continuous relaxation. Experiments on three benchmark datasets (AIBL, Brain and SPLP) show that our proposed MDDCH achieves comparable performance to recent state-of-the-art hashing methods. Additionally, diagnostic evaluation by professional physicians shows that all the retrieved medical images describe the same object and illness as the queried image.
11. Zhang L, Shang Y, Li P, Luo H, Shao L. Community-Aware Photo Quality Evaluation by Deeply Encoding Human Perception. IEEE Transactions on Cybernetics 2022; 52:3136-3146. PMID: 32735541. DOI: 10.1109/tcyb.2019.2937319.
Abstract
Computational photo quality evaluation is a useful technique in many tasks of computer vision and graphics, for example, photo retargeting, 3-D rendering, and fashion recommendation. The conventional photo quality models are designed by characterizing the pictures from all communities (e.g., "architecture" and "colorful") indiscriminately, wherein community-specific features are not exploited explicitly. In this article, we develop a new community-aware photo quality evaluation framework. It uncovers the latent community-specific topics by a regularized latent topic model (LTM) and captures human visual quality perception by exploring multiple attributes. More specifically, given massive-scale online photographs from multiple communities, a novel ranking algorithm is proposed to measure the visual/semantic attractiveness of regions inside each photograph. Meanwhile, three attributes, namely: 1) photo quality scores; 2) weak semantic tags; and 3) inter-region correlations, are seamlessly and collaboratively incorporated during ranking. Subsequently, we construct the gaze shifting path (GSP) for each photograph by sequentially linking the top-ranking regions from each photograph, and an aggregation-based CNN calculates the deep representation for each GSP. Based on this, an LTM is proposed to model the GSP distribution from multiple communities in the latent space. To mitigate the overfitting problem caused by communities with very few photographs, a regularizer is incorporated into our LTM. Finally, given a test photograph, we obtain its deep GSP representation and its quality score is determined by the posterior probability of the regularized LTM. Comparative studies on four image sets have shown the competitiveness of our method. Besides, the eye-tracking experiments have demonstrated that our ranking-based GSPs are highly consistent with real human gaze movements.
12. Zou X, Wu S, Zhang N, Bakker EM. Multi-label modality enhanced attention based self-supervised deep cross-modal hashing. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2021.107927.
13. Khan A, Hayat S, Ahmad M, Wen J, Farooq MU, Fang M, Jiang W. Cross-modal retrieval based on deep regularized hashing constraints. Int J Intell Syst 2022. DOI: 10.1002/int.22853.
Affiliation(s)
- Asad Khan
- School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
- Sakander Hayat
- School of Mathematics and Information Sciences, Guangzhou University, Guangzhou, China
- Muhammad Ahmad
- Department of Computer Science, National University of Computer and Emerging Sciences (NUCES-FAST), Faisalabad Campus, Chiniot, Pakistan
- Jinyu Wen
- School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
- Muhammad Umar Farooq
- School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
- Meie Fang
- School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou, China
- Wenchao Jiang
- School of Computers, Guangdong University of Technology, Guangzhou, China
14. Ji Z, Wang H, Han J, Pang Y. SMAN: Stacked Multimodal Attention Network for Cross-Modal Image-Text Retrieval. IEEE Transactions on Cybernetics 2022; 52:1086-1097. PMID: 32386178. DOI: 10.1109/tcyb.2020.2985716.
Abstract
This article focuses on tackling the task of the cross-modal image-text retrieval which has been an interdisciplinary topic in both computer vision and natural language processing communities. Existing global representation alignment-based methods fail to pinpoint the semantically meaningful portion of images and texts, while the local representation alignment schemes suffer from the huge computational burden for aggregating the similarity of visual fragments and textual words exhaustively. In this article, we propose a stacked multimodal attention network (SMAN) that makes use of the stacked multimodal attention mechanism to exploit the fine-grained interdependencies between image and text, thereby mapping the aggregation of attentive fragments into a common space for measuring cross-modal similarity. Specifically, we sequentially employ intramodal information and multimodal information as guidance to perform multiple-step attention reasoning so that the fine-grained correlation between image and text can be modeled. As a consequence, we are capable of discovering the semantically meaningful visual regions or words in a sentence which contributes to measuring the cross-modal similarity in a more precise manner. Moreover, we present a novel bidirectional ranking loss that enforces the distance among pairwise multimodal instances to be closer. Doing so allows us to make full use of pairwise supervised information to preserve the manifold structure of heterogeneous pairwise data. Extensive experiments on two benchmark datasets demonstrate that our SMAN consistently yields competitive performance compared to state-of-the-art methods.
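A hedged sketch of a bidirectional (image-to-text and text-to-image) hinge ranking loss of the general kind mentioned in this abstract, using cosine similarities and a margin; the margin value, batch construction, and names are assumptions rather than the SMAN training code.

```python
import numpy as np

def bidirectional_ranking_loss(img_emb, txt_emb, margin=0.2):
    """Hinge ranking loss over a batch of matched image/text pairs (row i <-> row i)."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T                                         # cosine similarity matrix
    pos = np.diag(sim)                                        # matched-pair similarities
    cost_i2t = np.maximum(0.0, margin + sim - pos[:, None])   # image -> negative texts
    cost_t2i = np.maximum(0.0, margin + sim - pos[None, :])   # text -> negative images
    np.fill_diagonal(cost_i2t, 0.0)                           # ignore the positive pair itself
    np.fill_diagonal(cost_t2i, 0.0)
    return float(cost_i2t.sum() + cost_t2i.sum())

print(bidirectional_ranking_loss(np.random.randn(4, 32), np.random.randn(4, 32)))
```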
15.
Abstract
Cross-modal retrieval aims to search samples of one modality via queries of other modalities, which is a hot issue in the community of multimedia. However, two main challenges, i.e., heterogeneity gap and semantic interaction across different modalities, have not been solved efficaciously. Reducing the heterogeneous gap can improve the cross-modal similarity measurement. Meanwhile, modeling cross-modal semantic interaction can capture the semantic correlations more accurately. To this end, this paper presents a novel end-to-end framework, called Dual Attention Generative Adversarial Network (DA-GAN). This technique is an adversarial semantic representation model with a dual attention mechanism, i.e., intra-modal attention and inter-modal attention. Intra-modal attention is used to focus on the important semantic feature within a modality, while inter-modal attention is to explore the semantic interaction between different modalities and then represent the high-level semantic correlation more precisely. A dual adversarial learning strategy is designed to generate modality-invariant representations, which can reduce the cross-modal heterogeneity efficiently. The experiments on three commonly used benchmarks show the better performance of DA-GAN than these competitors.
16. Semantic-guided autoencoder adversarial hashing for large-scale cross-modal retrieval. Complex Intell Syst 2022. DOI: 10.1007/s40747-021-00615-3.
Abstract
With the vigorous development of mobile Internet technology and the popularization of smart devices, the amount of multimedia data has exploded and its forms have become increasingly diversified. People's demand for information is no longer satisfied by single-modal data retrieval, and cross-modal retrieval has become a research hotspot in recent years. Due to the strong feature learning ability of deep learning, cross-modal deep hashing has been extensively studied. However, the similarity between different modalities is difficult to measure directly because of their different distributions and representations. Therefore, it is urgent to eliminate the modality gap and improve retrieval accuracy. Some previous work has introduced GANs into cross-modal hashing to reduce semantic differences between different modalities. However, most existing GAN-based cross-modal hashing methods suffer from issues such as unstable network training and vanishing gradients, which hinder the elimination of modality differences. To solve this issue, this paper proposes a novel Semantic-guided Autoencoder Adversarial Hashing method for cross-modal retrieval (SAAH). First, two kinds of adversarial autoencoder networks, under the guidance of semantic multi-labels, maximize the semantic relevance of instances and maintain cross-modal invariance. Second, under the supervision of semantics, the adversarial module guides the feature learning process and maintains the modality relations. In addition, to maintain the inter-modal correlation of all similar pairs, this paper uses two types of loss functions to maintain the similarity. To verify the effectiveness of the proposed method, extensive experiments were conducted on three widely used cross-modal datasets (MIRFLICKR, NUS-WIDE and MS COCO); compared with several representative advanced cross-modal retrieval methods, SAAH achieved leading retrieval performance.
17. He L, Li H, Chen M, Wang J, Altaye M, Dillman JR, Parikh NA. Deep Multimodal Learning From MRI and Clinical Data for Early Prediction of Neurodevelopmental Deficits in Very Preterm Infants. Front Neurosci 2021; 15:753033. PMID: 34675773. PMCID: PMC8525883. DOI: 10.3389/fnins.2021.753033.
Abstract
The prevalence of disabled survivors of prematurity has increased dramatically in the past 3 decades. These survivors, especially, very preterm infants (VPIs), born ≤ 32 weeks gestational age, are at high risk for neurodevelopmental impairments. Early and clinically effective personalized prediction of outcomes, which forms the basis for early treatment decisions, is urgently needed during the peak neuroplasticity window—the first couple of years after birth—for at-risk infants, when intervention is likely to be most effective. Advances in MRI enable the noninvasive visualization of infants' brains through acquired multimodal images, which are more informative than unimodal MRI data by providing complementary/supplementary depicting of brain tissue characteristics and pathology. Thus, analyzing quantitative multimodal MRI features affords unique opportunities to study early postnatal brain development and neurodevelopmental outcome prediction in VPIs. In this study, we investigated the predictive power of multimodal MRI data, including T2-weighted anatomical MRI, diffusion tensor imaging, resting-state functional MRI, and clinical data for the prediction of neurodevelopmental deficits. We hypothesize that integrating multimodal MRI and clinical data improves the prediction over using each individual data modality. Employing the aforementioned multimodal data, we proposed novel end-to-end deep multimodal models to predict neurodevelopmental (i.e., cognitive, language, and motor) deficits independently at 2 years corrected age. We found that the proposed models can predict cognitive, language, and motor deficits at 2 years corrected age with an accuracy of 88.4, 87.2, and 86.7%, respectively, significantly better than using individual data modalities. This current study can be considered as proof-of-concept. A larger study with external validation is important to validate our approach to further assess its clinical utility and overall generalizability.
Affiliation(s)
- Lili He
- Imaging Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Radiology, University of Cincinnati College of Medicine, Cincinnati, OH, United States
- Hailong Li
- Imaging Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Ming Chen
- Imaging Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Electronic Engineering and Computing Systems, University of Cincinnati, Cincinnati, OH, United States
- Jinghua Wang
- Department of Radiology, University of Cincinnati College of Medicine, Cincinnati, OH, United States
- Mekibib Altaye
- Biostatistics and Epidemiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Jonathan R Dillman
- Imaging Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Radiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Radiology, University of Cincinnati College of Medicine, Cincinnati, OH, United States
- Nehal A Parikh
- The Perinatal Institute, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, United States
18.
19.
20. Li M, Li Q, Tang L, Peng S, Ma Y, Yang D. Deep Unsupervised Hashing for Large-Scale Cross-Modal Retrieval Using Knowledge Distillation Model. Computational Intelligence and Neuroscience 2021; 2021:5107034. PMID: 34326867. PMCID: PMC8310450. DOI: 10.1155/2021/5107034.
Abstract
Cross-modal hashing encodes heterogeneous multimedia data into compact binary code to achieve fast and flexible retrieval across different modalities. Due to its low storage cost and high retrieval efficiency, it has received widespread attention. Supervised deep hashing significantly improves search performance and usually yields more accurate results, but requires a lot of manual annotation of the data. In contrast, unsupervised deep hashing is difficult to achieve satisfactory performance due to the lack of reliable supervisory information. To solve this problem, inspired by knowledge distillation, we propose a novel unsupervised knowledge distillation cross-modal hashing method based on semantic alignment (SAKDH), which can reconstruct the similarity matrix using the hidden correlation information of the pretrained unsupervised teacher model, and the reconstructed similarity matrix can be used to guide the supervised student model. Specifically, firstly, the teacher model adopted an unsupervised semantic alignment hashing method, which can construct a modal fusion similarity matrix. Secondly, under the supervision of teacher model distillation information, the student model can generate more discriminative hash codes. Experimental results on two extensive benchmark datasets (MIRFLICKR-25K and NUS-WIDE) show that compared to several representative unsupervised cross-modal hashing methods, the mean average precision (MAP) of our proposed method has achieved a significant improvement. It fully reflects its effectiveness in large-scale cross-modal data retrieval.
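A rough sketch of the distillation idea summarized above: a similarity matrix built from a pretrained teacher supervises the student's hash codes. The fusion rule, weighting, and names below are assumptions for illustration, not the SAKDH implementation.

```python
import numpy as np

def teacher_similarity(img_feat, txt_feat, alpha=0.5):
    """Fuse cosine similarities of two modalities into one teacher matrix in [-1, 1]."""
    def cos(x):
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        return x @ x.T
    return alpha * cos(img_feat) + (1.0 - alpha) * cos(txt_feat)

def distillation_loss(student_codes, sim_teacher):
    """Student code inner products (scaled to [-1, 1]) should match the teacher matrix."""
    k = student_codes.shape[1]
    return float(np.mean((student_codes @ student_codes.T / k - sim_teacher) ** 2))

feat_img = np.random.randn(5, 128)        # pretrained image features (toy)
feat_txt = np.random.randn(5, 64)         # pretrained text features (toy)
codes = np.sign(np.random.randn(5, 32))   # student hash codes (±1)
print(distillation_loss(codes, teacher_similarity(feat_img, feat_txt)))
```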
Affiliation(s)
- Mingyong Li
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Qiqi Li
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Lirong Tang
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Shuang Peng
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Yan Ma
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
- Degang Yang
- College of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
21. Saha M, Guo X, Sharma A. TilGAN: GAN for Facilitating Tumor-Infiltrating Lymphocyte Pathology Image Synthesis With Improved Image Classification. IEEE Access 2021; 9:79829-79840. PMID: 34178560. PMCID: PMC8224465. DOI: 10.1109/access.2021.3084597.
Abstract
Tumor-infiltrating lymphocytes (TILs) act as immune cells against cancer tissues. The manual assessment of TILs is usually erroneous, tedious, costly and subject to inter- and intraobserver variability. Machine learning approaches can solve these issues, but they require a large amount of labeled data for model training, which is expensive and not readily available. In this study, we present an efficient generative adversarial network, TilGAN, to generate high-quality synthetic pathology images followed by classification of TIL and non-TIL regions. Our proposed architecture is constructed with a generator network and a discriminator network. The novelty exists in the TilGAN architecture, loss functions, and evaluation techniques. Our TilGAN-generated images achieved a higher Inception score than the real images (2.90 vs. 2.32, respectively). They also achieved a lower kernel Inception distance (1.44) and a lower Fréchet Inception distance (0.312). It also passed the Turing test performed by experienced pathologists and clinicians. We further extended our evaluation studies and used almost one million synthetic data, generated by TilGAN, to train a classification model. Our proposed classification model achieved a 97.83% accuracy, a 97.37% F1-score, and a 97% area under the curve. Our extensive experiments and superior outcomes show the efficiency and effectiveness of our proposed TilGAN architecture. This architecture can also be used for other types of images for image synthesis.
Affiliation(s)
- Monjoy Saha
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322, USA
- Xiaoyuan Guo
- Department of Computer Science, Emory University, Atlanta, GA 30332, USA
- Ashish Sharma
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322, USA
22. He J, Zhang T, Zheng Y, Xu M, Zhang Y, Wu F. Consistency Graph Modeling for Semantic Correspondence. IEEE Transactions on Image Processing 2021; 30:4932-4946. PMID: 33961558. DOI: 10.1109/tip.2021.3077138.
Abstract
To establish robust semantic correspondence between images covering different objects belonging to the same category, there are three important types of information including inter-image relationship, intra-image relationship and cycle consistency. Most existing methods only exploit one or two types of the above information and cannot make them enhance and complement each other. Different from existing methods, we propose a novel end-to-end Consistency Graph Modeling Network (CGMNet) for semantic correspondence by modeling inter-image relationship, intra-image relationship and cycle consistency jointly in a unified deep model. The proposed CGMNet enjoys several merits. First, to the best of our knowledge, this is the first work to jointly model the three kinds of information in a deep model for semantic correspondence. Second, our model has designed three effective modules including cross-graph module, intra-graph module and cycle consistency module, which can jointly learn more discriminative feature representations robust to local ambiguities and background clutter for semantic correspondence. Extensive experimental results show that our algorithm performs favorably against state-of-the-art methods on four challenging datasets including PF-PASCAL, PF-WILLOW, Caltech-101 and TSS.
23. Zhang G, Chen K, Xu S, Cho PC, Nan Y, Zhou X, Lv C, Li C, Xie G. Lesion synthesis to improve intracranial hemorrhage detection and classification for CT images. Comput Med Imaging Graph 2021; 90:101929. PMID: 33984782. DOI: 10.1016/j.compmedimag.2021.101929.
Abstract
Computer-aided diagnosis (CAD) for intracranial hemorrhage (ICH) is needed due to its high mortality rate and time sensitivity. Training a stable and robust deep learning-based model usually requires enough training examples, which may be impractical in many real-world scenarios. Lesion synthesis offers a possible solution to this problem, especially for the lack of micro-bleeding examples. In this paper, we propose a novel strategy to generate artificial lesions on non-lesion CT images so as to produce additional labeled training examples. Artificial masks in any location, size, or shape can be generated by an Artificial Mask Generator (AMG) and then converted into hemorrhage lesions by a Lesion Synthesis Network (LSN). Images with and without artificial lesions are combined for training an ICH detection model with a novel Residual Score. We evaluate our method on the auxiliary diagnosis task of ICH. Our experiments demonstrate that the proposed approach can improve the AUC from 84% to 91% in the ICH detection task and from 89% to 96% in the classification task. Moreover, by adding artificial lesions of small size, the sensitivity for micro bleeding is remarkably improved from 49% to 70%. In addition, the proposed method outperforms the other three synthesis approaches by a large margin.
Affiliation(s)
- Guyue Zhang
- Ping An Technology (Shenzhen) Co., Ltd., Shanghai 200000, China; Zhejiang Institute of Standardization, Hangzhou, Zhejiang Province 310007, China.
- Kaixing Chen
- Ping An Technology (Shenzhen) Co., Ltd., Shanghai 200000, China.
- Shangliang Xu
- Ping An Technology (Shenzhen) Co., Ltd., Shanghai 200000, China.
- Po Chuan Cho
- Ping An Technology (Shenzhen) Co., Ltd., Shanghai 200000, China.
- Yang Nan
- Ping An Technology (Shenzhen) Co., Ltd., Shanghai 200000, China; National Heart and Lung Institute, Imperial College London, London, UK.
- Xin Zhou
- Ping An Technology (Shenzhen) Co., Ltd., Shanghai 200000, China.
- Chuanfeng Lv
- Ping An Technology (Shenzhen) Co., Ltd., Shanghai 200000, China.
- Changsheng Li
- School of Computer Science & Technology, Beijing Institute of Technology, Beijing 210023, China.
- Guotong Xie
- Ping An Technology (Shenzhen) Co., Ltd., Shanghai 200000, China.
24. Wang H, Peng J, Jiang G, Xu F, Fu X. Discriminative feature and dictionary learning with part-aware model for vehicle re-identification. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.06.148.
25. Li Y, Wang Q, Zhang J, Hu L, Ouyang W. The theoretical research of generative adversarial networks: an overview. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.12.114.
26.
27.
28. Chen W, Wang W, Liu L, Lew MS. New Ideas and Trends in Deep Multimodal Content Understanding: A Review. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.10.042.
29. Jiang G, Wang H, Peng J, Chen D, Fu X. Graph-based Multi-view Binary Learning for image clustering. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.07.132.
30. Zhao W, Guan Z, Luo H, Peng J, Fan J. Deep Multiple Instance Hashing for Fast Multi-Object Image Search. IEEE Transactions on Image Processing 2021; 30:7995-8007. PMID: 34554911. DOI: 10.1109/tip.2021.3112011.
Abstract
Multi-keyword query is widely supported in text search engines. However, an analogue in image retrieval systems, multi-object query, is rarely studied. Meanwhile, traditional object-based image retrieval methods often involve multiple steps separately. In this work, we propose a weakly-supervised Deep Multiple Instance Hashing (DMIH) approach for multi-object image retrieval. Our DMIH approach, which leverages a popular CNN model to build the end-to-end relation between a raw image and the binary hash codes of its multiple objects, can support multi-object queries effectively and integrate object detection with hashing learning seamlessly. We treat object detection as a binary multiple instance learning (MIL) problem and such instances are automatically extracted from multi-scale convolutional feature maps. We also design a conditional random field (CRF) module to capture both the semantic and spatial relations among different class labels. For hashing training, we sample image pairs to learn their semantic relationships in terms of hash codes of the most probable proposals for owned labels as guided by object predictors. The two objectives benefit each other in a multi-task learning scheme. Finally, a two-level inverted index method is proposed to further speed up the retrieval of multi-object queries. Our DMIH approach outperforms state-of-the-arts on public benchmarks for object-based image retrieval and achieves promising results for multi-object queries.
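To make the retrieval step concrete, here is a minimal single-level inverted-index sketch for multi-object hash queries; the paper proposes a two-level variant, so this simplified structure, the toy codes, and the image ids are illustrative assumptions only.

```python
from collections import defaultdict

# Each database image is described by the hash codes of its detected objects.
database = {
    "img_1": {"01101010", "11100001"},
    "img_2": {"01101010"},
    "img_3": {"11100001", "00011110"},
}

# Inverted index: object hash code -> set of image ids containing that object.
index = defaultdict(set)
for image_id, object_codes in database.items():
    for code in object_codes:
        index[code].add(image_id)

def multi_object_query(query_codes):
    """Return images that contain every queried object (set intersection)."""
    result_sets = [index.get(code, set()) for code in query_codes]
    return set.intersection(*result_sets) if result_sets else set()

print(multi_object_query({"01101010", "11100001"}))   # -> {'img_1'}
```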
31. Zhu S, Feng Y, Zhou M, Qiang B, Fang B, Wei R. Prototype-Based Discriminative Feature Representation for Class-incremental Cross-modal Retrieval. Int J Pattern Recogn 2020. DOI: 10.1142/s021800142150018x.
Abstract
Cross-modal retrieval aims to retrieve related items from various modalities with respect to a query of any type. The key challenge of cross-modal retrieval is to learn more discriminative representations between different categories, as well as to extend to unseen-class retrieval in the open-world retrieval task. To tackle the above problem, in this paper we propose prototype learning-based discriminative feature learning (PLDFL) to learn more discriminative representations in a common space. First, we utilize a prototype learning algorithm to cluster samples labeled with the same semantic class, jointly taking into consideration intra-class compactness and inter-class sparsity without discriminative treatments. Second, we use a weight-sharing strategy to model the correlations of cross-modal samples to narrow the modality gap. Finally, we apply the prototypes to achieve class-incremental learning to demonstrate the robustness of our proposed approach. According to our experimental results, significant retrieval performance in terms of mAP can be achieved on average compared to several state-of-the-art approaches.
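A small sketch of the prototype idea this abstract relies on, assuming one prototype per class in the common space and nearest-prototype assignment; this is a generic illustration with made-up names and data, not the PLDFL algorithm.

```python
import numpy as np

def class_prototypes(features, labels):
    """Prototype of each class = mean of its samples in the common space."""
    classes = np.unique(labels)
    return classes, np.stack([features[labels == c].mean(axis=0) for c in classes])

def assign_to_prototype(query, classes, prototypes):
    """Nearest-prototype classification; a new class only needs a new prototype."""
    dists = np.linalg.norm(prototypes - query, axis=1)
    return classes[np.argmin(dists)]

feats = np.random.randn(10, 16)                    # embeddings in the shared space (toy)
labs = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
cls, protos = class_prototypes(feats, labs)
print(assign_to_prototype(np.random.randn(16), cls, protos))
```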
Affiliation(s)
- Shaoquan Zhu
- College of Computer Science, Chongqing University, Chongqing 400030, P. R. China
- Key Laboratory of Dependable Service, Computing in Cyber Physical Society, Ministry of Education, Chongqing 400030, P. R. China
- Yong Feng
- College of Computer Science, Chongqing University, Chongqing 400030, P. R. China
- Key Laboratory of Dependable Service, Computing in Cyber Physical Society, Ministry of Education, Chongqing 400030, P. R. China
- Mingliang Zhou
- State Key Lab of IoT for Smart City, CIS, University of Macau, Macau SAR 999078, P. R. China
- Baohua Qiang
- Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, P. R. China
- Guangxi Key Laboratory of Optoelectronic Information Processing, Guilin University of Electronic Technology, Guilin 541004, P. R. China
- Bin Fang
- College of Computer Science, Chongqing University, Chongqing 400030, P. R. China
- Key Laboratory of Dependable Service, Computing in Cyber Physical Society, Ministry of Education, Chongqing 400030, P. R. China
- Ran Wei
- Chongqing Medical Data Information Technology Co., Ltd, Building 3, Block B, Administration Centre, Nanan District, Chongqing 401336, P. R. China
32. Meng M, Wang H, Yu J, Chen H, Wu J. Asymmetric Supervised Consistent and Specific Hashing for Cross-Modal Retrieval. IEEE Transactions on Image Processing 2020; 30:986-1000. PMID: 33232233. DOI: 10.1109/tip.2020.3038365.
Abstract
Hashing-based techniques have provided attractive solutions to cross-modal similarity search when addressing vast quantities of multimedia data. However, existing cross-modal hashing (CMH) methods face two critical limitations: 1) there is no previous work that simultaneously exploits the consistent or modality-specific information of multi-modal data; 2) the discriminative capabilities of pairwise similarity is usually neglected due to the computational cost and storage overhead. Moreover, to tackle the discrete constraints, relaxation-based strategy is typically adopted to relax the discrete problem to the continuous one, which severely suffers from large quantization errors and leads to sub-optimal solutions. To overcome the above limitations, in this article, we present a novel supervised CMH method, namely Asymmetric Supervised Consistent and Specific Hashing (ASCSH). Specifically, we explicitly decompose the mapping matrices into the consistent and modality-specific ones to sufficiently exploit the intrinsic correlation between different modalities. Meanwhile, a novel discrete asymmetric framework is proposed to fully explore the supervised information, in which the pairwise similarity and semantic labels are jointly formulated to guide the hash code learning process. Unlike existing asymmetric methods, the discrete asymmetric structure developed is capable of solving the binary constraint problem discretely and efficiently without any relaxation. To validate the effectiveness of the proposed approach, extensive experiments on three widely used datasets are conducted and encouraging results demonstrate the superiority of ASCSH over other state-of-the-art CMH methods.
33. Yang H, Sun J, Carass A, Zhao C, Lee J, Prince JL, Xu Z. Unsupervised MR-to-CT Synthesis Using Structure-Constrained CycleGAN. IEEE Transactions on Medical Imaging 2020; 39:4249-4261. PMID: 32780700. DOI: 10.1109/tmi.2020.3015379.
Abstract
Synthesizing a CT image from an available MR image has recently emerged as a key goal in radiotherapy treatment planning for cancer patients. CycleGANs have achieved promising results on unsupervised MR-to-CT image synthesis; however, because they have no direct constraints between input and synthetic images, cycleGANs do not guarantee structural consistency between these two images. This means that anatomical geometry can be shifted in the synthetic CT images, clearly a highly undesirable outcome in the given application. In this paper, we propose a structure-constrained cycleGAN for unsupervised MR-to-CT synthesis by defining an extra structure-consistency loss based on the modality independent neighborhood descriptor. We also utilize a spectral normalization technique to stabilize the training process and a self-attention module to model the long-range spatial dependencies in the synthetic images. Results on unpaired brain and abdomen MR-to-CT image synthesis show that our method produces better synthetic CT images in both accuracy and visual quality as compared to other unsupervised synthesis methods. We also show that an approximate affine pre-registration for unpaired training data can improve synthesis results.
34. Zhang Y, Zhou W, Wang M, Tian Q, Li H. Deep Relation Embedding for Cross-Modal Retrieval. IEEE Transactions on Image Processing 2020; 30:617-627. PMID: 33232230. DOI: 10.1109/tip.2020.3038354.
Abstract
Cross-modal retrieval aims to identify relevant data across different modalities. In this work, we are dedicated to cross-modal retrieval between images and text sentences, which is formulated into similarity measurement for each image-text pair. To this end, we propose a Cross-modal Relation Guided Network (CRGN) to embed image and text into a latent feature space. The CRGN model uses GRU to extract text feature and ResNet model to learn the globally guided image feature. Based on the global feature guiding and sentence generation learning, the relation between image regions can be modeled. The final image embedding is generated by a relation embedding module with an attention mechanism. With the image embeddings and text embeddings, we conduct cross-modal retrieval based on the cosine similarity. The learned embedding space well captures the inherent relevance between image and text. We evaluate our approach with extensive experiments on two public benchmark datasets, i.e., MS-COCO and Flickr30K. Experimental results demonstrate that our approach achieves better or comparable performance with the state-of-the-art methods with notable efficiency.
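Since the abstract states that retrieval is performed by cosine similarity between the learned image and text embeddings, here is a minimal ranking sketch under that assumption; the embedding dimensions, gallery size, and names are placeholders.

```python
import numpy as np

def rank_by_cosine(query_emb, gallery_embs):
    """Return gallery indices sorted from most to least similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    return np.argsort(-(g @ q))                        # descending cosine similarity

text_query = np.random.randn(256)                      # embedding of a sentence (toy)
image_gallery = np.random.randn(100, 256)              # embeddings of 100 images (toy)
print(rank_by_cosine(text_query, image_gallery)[:5])   # top-5 retrieved image indices
```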
35. He T, Liu Y, Ko TH, Chan KCC, Ong YS. Contextual Correlation Preserving Multiview Featured Graph Clustering. IEEE Transactions on Cybernetics 2020; 50:4318-4331. PMID: 31329151. DOI: 10.1109/tcyb.2019.2926431.
Abstract
Graph clustering, which aims at discovering sets of related vertices in graph-structured data, plays a crucial role in various applications, such as social community detection and biological module discovery. With the huge increase in the volume of data in recent years, graph clustering is used in an increasing number of real-life scenarios. However, the classical and state-of-the-art methods, which consider only single-view features or a single vector concatenating features from different views and neglect the contextual correlation between pairwise features, are insufficient for the task, as features that characterize vertices in a graph are usually from multiple views and the contextual correlation between pairwise features may influence the cluster preference for vertices. To address this challenging problem, we introduce in this paper, a novel graph clustering model, dubbed contextual correlation preserving multiview featured graph clustering (CCPMVFGC) for discovering clusters in graphs with multiview vertex features. Unlike most of the aforementioned approaches, CCPMVFGC is capable of learning a shared latent space from multiview features as the cluster preference for each vertex and making use of this latent space to model the inter-relationship between pairwise vertices. CCPMVFGC uses an effective method to compute the degree of contextual correlation between pairwise vertex features and utilizes view-wise latent space representing the feature-cluster preference to model the computed correlation. Thus, the cluster preference learned by CCPMVFGC is jointly inferred by multiview features, view-wise correlations of pairwise features, and the graph topology. Accordingly, we propose a unified objective function for CCPMVFGC and develop an iterative strategy to solve the formulated optimization problem. We also provide the theoretical analysis of the proposed model, including convergence proof and computational complexity analysis. In our experiments, we extensively compare the proposed CCPMVFGC with both classical and state-of-the-art graph clustering methods on eight standard graph datasets (six multiview and two single-view datasets). The results show that CCPMVFGC achieves competitive performance on all eight datasets, which validates the effectiveness of the proposed model.
36. Hoang T, Do TT, Nguyen TV, Cheung NM. Unsupervised Deep Cross-modality Spectral Hashing. IEEE Transactions on Image Processing 2020; PP:8391-8406. PMID: 32784139. DOI: 10.1109/tip.2020.3014727.
Abstract
This paper presents a novel framework, namely Deep Cross-modality Spectral Hashing (DCSH), to tackle the unsupervised learning problem of binary hash codes for efficient cross-modal retrieval. The framework is a two-step hashing approach that decouples the optimization into (1) binary optimization and (2) hashing function learning. In the first step, we propose a novel spectral embedding-based algorithm to simultaneously learn single-modality and binary cross-modality representations. While the former is capable of well preserving the local structure of each modality, the latter reveals the hidden patterns from all modalities. In the second step, to learn mapping functions from informative data inputs (images and word embeddings) to the binary codes obtained in the first step, we leverage a powerful CNN for images and propose a CNN-based deep architecture to learn the text modality. Quantitative evaluations on three standard benchmark datasets demonstrate that the proposed DCSH method consistently outperforms other state-of-the-art methods.
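The two-step structure described here can be illustrated with a much-simplified sketch: step 1 derives binary codes from a spectral (eigenvector) embedding of an affinity matrix and sign-binarizes them; step 2 fits a per-modality regressor that maps raw features onto those codes. This is not the DCSH algorithm or its CNN encoders, only an assumed minimal analogue; the kernel bandwidth, bit count, and function names are placeholders.

```python
# Step 1: binary codes from a spectral embedding; Step 2: learn feature-to-code mappings.
import numpy as np
from sklearn.linear_model import Ridge

def spectral_binary_codes(affinity, n_bits):
    """Sign-binarize the leading eigenvectors of a symmetric affinity matrix."""
    vals, vecs = np.linalg.eigh(affinity)
    embedding = vecs[:, -n_bits:]                     # top n_bits eigenvectors
    return np.sign(embedding + 1e-12).astype(np.int8)

def fit_hash_function(features, codes):
    """Learn a (here: linear) mapping from input features to the relaxed codes."""
    return Ridge(alpha=1.0).fit(features, codes.astype(float))

def hash_query(model, features):
    return np.sign(model.predict(features)).astype(np.int8)

# Toy usage: affinity built from image features; text features mapped to the shared codes.
rng = np.random.default_rng(0)
img_feats, txt_feats = rng.normal(size=(20, 32)), rng.normal(size=(20, 16))
affinity = np.exp(-np.square(img_feats[:, None] - img_feats[None]).sum(-1) / 32.0)
codes = spectral_binary_codes(affinity, n_bits=8)
txt_hash = fit_hash_function(txt_feats, codes)        # cross-modal: text -> shared codes
query_codes = hash_query(txt_hash, txt_feats[:2])
```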
Collapse
|
37
|
Peng J, Wang H, Xu F, Fu X. Cross domain knowledge learning with dual-branch adversarial network for vehicle re-identification. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.02.112] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
38
|
Meng X, Wang H, Feng L. The similarity-consensus regularized multi-view learning for dimension reduction. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105835] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
39
|
Du C, Yuan J, Dong J, Li L, Chen M, Li T. GPU based parallel optimization for real time panoramic video stitching. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2019.06.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
40
|
|
41
|
Huang C, Luo X, Zhang J, Liao Q, Wang X, Jiang Z, Qi S. Explore instance similarity: An instance correlation based hashing method for multi-label cross-model retrieval. Inf Process Manag 2020. [DOI: 10.1016/j.ipm.2019.102165] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
42
|
AI Radar Sensor: Creating Radar Depth Sounder Images Based on Generative Adversarial Network. SENSORS 2019; 19:s19245479. [PMID: 31842359 PMCID: PMC6960960 DOI: 10.3390/s19245479] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/04/2019] [Revised: 12/04/2019] [Accepted: 12/06/2019] [Indexed: 11/17/2022]
Abstract
Significant resources have been spent collecting and storing large and heterogeneous radar datasets during expensive Arctic and Antarctic fieldwork. The vast majority of the available data is unlabeled, and the labeling process is both time-consuming and expensive. One possible alternative to the labeling process is the use of synthetically generated data produced with artificial intelligence. Instead of labeling real images, we can generate synthetic data based on arbitrary labels. In this way, training data can be quickly augmented with additional images. In this research, we evaluated the performance of synthetically generated radar images based on modified cycle-consistent adversarial networks. We conducted several experiments to test the quality of the generated radar imagery. We also tested the quality of a state-of-the-art contour detection algorithm on synthetic data and on different combinations of real and synthetic data. Our experiments show that synthetic radar images generated by a generative adversarial network (GAN) can be used in combination with real images for data augmentation and training of deep neural networks. However, the synthetic images generated by GANs cannot be used on their own for training a neural network (training on synthetic and testing on real), as they cannot simulate all of the radar characteristics, such as noise or Doppler effects. To the best of our knowledge, this is the first work to create radar sounder imagery based on a generative adversarial network.
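The augmentation strategy reported here, training on a mixture of real and GAN-generated images rather than on synthetic data alone, amounts to concatenating the two data sources before batching. The sketch below shows one hedged way to do that in PyTorch; the tensors stand in for real and synthesized radar images and are not the authors' data pipeline.

```python
# Train on a shuffled mixture of real and GAN-generated radar images.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Placeholders standing in for real and CycleGAN-synthesized radar image tensors.
real_images, real_labels = torch.randn(100, 1, 64, 64), torch.randint(0, 2, (100,))
synth_images, synth_labels = torch.randn(300, 1, 64, 64), torch.randint(0, 2, (300,))

real_ds = TensorDataset(real_images, real_labels)
synth_ds = TensorDataset(synth_images, synth_labels)

# Combining both sources and shuffling mixes real and synthetic samples in every batch.
train_loader = DataLoader(ConcatDataset([real_ds, synth_ds]), batch_size=32, shuffle=True)

for images, labels in train_loader:
    # ... feed the mixed batch to the contour-detection network being trained ...
    break
```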
Collapse
|
43
|
Wu L, Wang Y, Shao L, Wang M. 3-D PersonVLAD: Learning Deep Global Representations for Video-Based Person Reidentification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:3347-3359. [PMID: 30716051 DOI: 10.1109/tnnls.2019.2891244] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
We present global deep video representation learning for video-based person reidentification (re-ID) that aggregates local 3-D features across the entire video extent. Existing methods typically extract frame-wise deep features from 2-D convolutional networks (ConvNets), which are pooled temporally to produce video-level representations. However, 2-D ConvNets lose temporal priors immediately after the convolutions, and a separate temporal pooling is limited in capturing human motion in short sequences. In this paper, we present global video representation learning as a novel layer complementary to 3-D ConvNets, capturing the appearance and motion dynamics in full-length videos. Nevertheless, encoding each video frame in its entirety and computing aggregate global representations across all frames is tremendously challenging due to occlusions and misalignments. To resolve this, our proposed network is further augmented with 3-D part alignment to learn local features through a soft-attention module. These attended features are statistically aggregated to yield identity-discriminative representations. Our global 3-D features are demonstrated to achieve state-of-the-art results on three benchmark data sets: MARS, the Imagery Library for Intelligent Detection Systems-Video Re-identification, and PRID2011.
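The core aggregation idea, attention-weighted pooling of per-frame (or per-part) features into a single video-level descriptor, can be sketched compactly. The module below is a hedged illustration of soft-attention pooling only, with assumed dimensions; it is not the PersonVLAD layer itself.

```python
# Soft-attention pooling: aggregate (batch, time, dim) frame features into (batch, dim).
import torch
import torch.nn as nn

class SoftAttentionPooling(nn.Module):
    """Weights each frame by a learned score, then sums to a video-level feature."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)          # one attention score per frame

    def forward(self, frame_feats):
        weights = torch.softmax(self.score(frame_feats), dim=1)   # (B, T, 1)
        return (weights * frame_feats).sum(dim=1)                 # (B, dim)

# Toy usage: 4 videos, 16 frames each, 256-d frame features.
pool = SoftAttentionPooling(dim=256)
video_repr = pool(torch.randn(4, 16, 256))      # -> (4, 256)
```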
Collapse
|
44
|
Wu L, Wang Y, Yin H, Wang M, Shao L. Few-Shot Deep Adversarial Learning for Video-based Person Re-identification. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2019; 29:1233-1245. [PMID: 31535998 DOI: 10.1109/tip.2019.2940684] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Video-based person re-identification (re-ID) refers to matching people across camera views from arbitrary, unaligned video footage. Existing methods rely on supervision signals to optimise a projected space under which inter-/intra-video distances are maximised/minimised. However, this demands exhaustively labelling people across camera views, rendering such methods unable to scale to large networked cameras. Moreover, learning effective view-invariant video representations is not explicitly addressed, and features otherwise exhibit different distributions across views. Thus, matching videos for person re-ID demands flexible models that capture the dynamics in time-series observations and learn view-invariant representations with access to limited labeled training samples. In this paper, we propose a novel few-shot deep learning approach to video-based person re-ID that learns comparable representations which are discriminative and view-invariant. The proposed method is developed on variational recurrent neural networks (VRNNs) and trained adversarially to produce latent variables with temporal dependencies that are highly discriminative yet view-invariant in matching persons. Through extensive experiments conducted on three benchmark datasets, we empirically show the capability of our method in creating view-invariant temporal features and the state-of-the-art performance it achieves.
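One widely used way to encourage view invariance adversarially, shown here only as an illustration of the idea and not as the authors' VRNN-based method, is a gradient-reversal layer feeding a camera-view discriminator: the discriminator learns to predict the view while reversed gradients push the upstream encoder toward view-invariant features. Dimensions and module names below are assumptions.

```python
# Gradient-reversal view discriminator for adversarially view-invariant features.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) gradients flowing back into the feature encoder.
        return -ctx.lamb * grad_output, None

class ViewDiscriminator(nn.Module):
    """Predicts the camera view of a feature; reversed gradients make the encoder view-invariant."""
    def __init__(self, dim, n_views, lamb=1.0):
        super().__init__()
        self.lamb = lamb
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, n_views))

    def forward(self, feats):
        return self.net(GradReverse.apply(feats, self.lamb))

# Toy usage: sequence-level features from some recurrent encoder, two camera views.
feats = torch.randn(8, 256, requires_grad=True)
logits = ViewDiscriminator(dim=256, n_views=2)(feats)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,)))
loss.backward()
```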
Collapse
|
45
|
Mu N, Xu X, Zhang X. Finding autofocus region in low contrast surveillance images using CNN-based saliency algorithm. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2019.04.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
|
46
|
Fu B, Li Y, Wang XH, Ren YG. Image super-resolution using TV priori guided convolutional network. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2019.06.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
47
|
|
48
|
|
49
|
Zhang C, Lin Y, Zhu L, Liu A, Zhang Z, Huang F. CNN-VWII: An efficient approach for large-scale video retrieval by image queries. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2019.03.015] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
50
|
|