1
Shen W, Zhou M, Luo J, Li Z, Kwong S. Graph-Represented Distribution Similarity Index for Full-Reference Image Quality Assessment. IEEE Transactions on Image Processing 2024; 33:3075-3089. [PMID: 38656839] [DOI: 10.1109/tip.2024.3390565]
Abstract
In this paper, we propose a graph-represented image distribution similarity (GRIDS) index for full-reference (FR) image quality assessment (IQA), which measures the perceptual distance between distorted and reference images by assessing the disparities between their distribution patterns under a graph-based representation. First, we transform the input image into a graph-based representation, which proves to be a versatile and effective choice for capturing visual perception features. This is achieved through the automatic generation of a vision graph from the given image content, leading to holistic perceptual associations for irregular image regions. Second, to reflect the perceived image distribution, we decompose the undirected graph into cliques and then calculate the product of the potential functions of the cliques to obtain the joint probability distribution of the undirected graph. Finally, we compare the distances between the graph feature distributions of the distorted and reference images at different stages; thus, we combine the distortion distribution measurements derived from different graph model depths to determine the perceived quality of the distorted images. Extensive experiments show that the proposed method performs on par with state-of-the-art methods, with high predictive accuracy and consistent, monotonic behaviour in image quality prediction tasks. The source code is publicly available at https://github.com/Land5cape/GRIDS.
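To make the clique-factorisation step above concrete, here is a minimal Python sketch that models the (unnormalised) joint probability of an undirected graph as a product of clique potentials. The exponential-of-negative-variance potential, the toy graph and all names are illustrative assumptions, not the potential functions actually used by GRIDS.

```python
import networkx as nx
import numpy as np

def joint_log_probability(graph: nx.Graph, features: dict) -> float:
    """Unnormalised log joint probability of `graph` as a sum of log clique potentials."""
    log_p = 0.0
    for clique in nx.find_cliques(graph):  # maximal cliques of the undirected graph
        feats = np.stack([features[n] for n in clique])
        # Placeholder potential (assumption): cliques with similar node features score higher.
        potential = np.exp(-feats.var(axis=0).sum())
        log_p += np.log(potential + 1e-12)
    return log_p

# Toy usage: a 4-node vision graph with random node features.
g = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])
rng = np.random.default_rng(0)
node_feats = {n: rng.normal(size=8) for n in g.nodes}
print(joint_log_probability(g, node_feats))
```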
2
Zhu P, Li J, Wang Y, Xiao B, Zhao S, Hu Q. Collaborative Decision-Reinforced Self-Supervision for Attributed Graph Clustering. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:10851-10863. [PMID: 35584075] [DOI: 10.1109/tnnls.2022.3171583]
Abstract
Attributed graph clustering aims to partition nodes of a graph structure into different groups. Recent works usually use variational graph autoencoder (VGAE) to make the node representations obey a specific distribution. Although they have shown promising results, how to introduce supervised information to guide the representation learning of graph nodes and improve clustering performance is still an open problem. In this article, we propose a Collaborative Decision-Reinforced Self-Supervision (CDRS) method to solve the problem, in which a pseudo node classification task collaborates with the clustering task to enhance the representation learning of graph nodes. First, a transformation module is used to enable end-to-end training of existing methods based on VGAE. Second, the pseudo node classification task is introduced into the network through multitask learning to make classification decisions for graph nodes. The graph nodes that have consistent decisions on clustering and pseudo node classification are added to a pseudo-label set, which can provide fruitful self-supervision for subsequent training. This pseudo-label set is gradually augmented during training, thus reinforcing the generalization capability of the network. Finally, we investigate different sorting strategies to further improve the quality of the pseudo-label set. Extensive experiments on multiple datasets show that the proposed method achieves outstanding performance compared with state-of-the-art methods. Our code is available at https://github.com/Jillian555/TNNLS_CDRS.
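The consistent-decision rule above can be illustrated with a short sketch: a node enters the pseudo-label set only when its clustering decision and its pseudo node-classification decision agree, and candidates are ranked by classifier confidence. The confidence-based ranking and every name below are assumptions for illustration; the paper investigates several sorting strategies.

```python
import numpy as np

def build_pseudo_label_set(cluster_probs, class_probs, top_k):
    """cluster_probs, class_probs: (N, K) soft assignments over the same K groups."""
    cluster_ids = cluster_probs.argmax(axis=1)
    class_ids = class_probs.argmax(axis=1)
    agree = np.where(cluster_ids == class_ids)[0]        # nodes with consistent decisions
    confidence = class_probs[agree, class_ids[agree]]    # assumed sorting criterion
    order = agree[np.argsort(-confidence)]               # most confident first
    selected = order[:top_k]
    return selected, class_ids[selected]                 # node indices and their pseudo-labels

rng = np.random.default_rng(0)
cluster_p = rng.dirichlet(np.ones(5), size=100)
class_p = rng.dirichlet(np.ones(5), size=100)
nodes, labels = build_pseudo_label_set(cluster_p, class_p, top_k=20)
print(len(nodes), labels[:5])
```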
3
Peng Z, Liu H, Jia Y, Hou J. EGRC-Net: Embedding-Induced Graph Refinement Clustering Network. IEEE Transactions on Image Processing 2023; 32:6457-6468. [PMID: 37991909] [DOI: 10.1109/tip.2023.3333557]
Abstract
Existing graph clustering networks heavily rely on a predefined yet fixed graph, which can lead to failures when the initial graph fails to accurately capture the data topology structure of the embedding space. In order to address this issue, we propose a novel clustering network called Embedding-Induced Graph Refinement Clustering Network (EGRC-Net), which effectively utilizes the learned embedding to adaptively refine the initial graph and enhance the clustering performance. To begin, we leverage both semantic and topological information by employing a vanilla auto-encoder and a graph convolution network, respectively, to learn a latent feature representation. Subsequently, we utilize the local geometric structure within the feature embedding space to construct an adjacency matrix for the graph. This adjacency matrix is dynamically fused with the initial one using our proposed fusion architecture. To train the network in an unsupervised manner, we minimize the Jeffreys divergence between multiple derived distributions. Additionally, we introduce an improved approximate personalized propagation of neural predictions to replace the standard graph convolution network, enabling EGRC-Net to scale effectively. Through extensive experiments conducted on nine widely-used benchmark datasets, we demonstrate that our proposed methods consistently outperform several state-of-the-art approaches. Notably, EGRC-Net achieves an improvement of more than 11.99% in Adjusted Rand Index (ARI) over the best baseline on the DBLP dataset. Furthermore, our scalable approach exhibits a 10.73% gain in ARI while reducing memory usage by 33.73% and decreasing running time by 19.71%. The code for EGRC-Net will be made publicly available at https://github.com/ZhihaoPENG-CityU/EGRC-Net.
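A minimal sketch of the graph-refinement step above, assuming a cosine-similarity k-nearest-neighbour adjacency built from the embedding and a simple convex combination with the initial adjacency; EGRC-Net's learned fusion architecture is more elaborate:

```python
import numpy as np

def embedding_induced_adjacency(z: np.ndarray, k: int = 5) -> np.ndarray:
    """z: (N, d) node embeddings; returns a symmetric 0/1 k-NN adjacency matrix."""
    z_norm = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    sim = z_norm @ z_norm.T                       # cosine similarity (assumed metric)
    np.fill_diagonal(sim, -np.inf)                # exclude self-loops
    nn_idx = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros_like(sim)
    adj[np.repeat(np.arange(len(z)), k), nn_idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)                 # symmetrise

def fuse_adjacency(a_init, a_embed, alpha=0.5):
    # Placeholder fusion: convex combination of the initial and embedding-induced graphs.
    return alpha * a_init + (1.0 - alpha) * a_embed

rng = np.random.default_rng(0)
z = rng.normal(size=(10, 16))
a0 = (rng.random((10, 10)) > 0.7).astype(float)
a0 = np.maximum(a0, a0.T)
print(fuse_adjacency(a0, embedding_induced_adjacency(z)).shape)
```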
4
Liang W, Jin J, Daly I, Sun H, Wang X, Cichocki A. Novel channel selection model based on graph convolutional network for motor imagery. Cogn Neurodyn 2023; 17:1283-1296. [PMID: 37786654] [PMCID: PMC10542066] [DOI: 10.1007/s11571-022-09892-1]
Abstract
Multi-channel electroencephalography (EEG) captures features associated with motor imagery (MI) in brain-computer interfaces (BCIs) with wide spatial coverage across the scalp. However, redundant EEG channels are not conducive to improving BCI performance, so removing irrelevant channels can help improve the classification performance of BCI systems. We present a new method for identifying relevant EEG channels. Our method is based on the assumption that useful channels share related information, which can be measured by inter-channel connectivity. Specifically, we treat all candidate EEG channels as a graph and define channel selection as a node classification problem on that graph. We then design a graph convolutional neural network (GCN) model for channel classification, and channels are selected based on the model's outputs. We evaluate our proposed GCN-based channel selection (GCN-CS) method on three MI datasets, where it improves performance while reducing the number of channels. Specifically, we achieve classification accuracies of 79.76% on Dataset 1, 89.14% on Dataset 2 and 87.96% on Dataset 3, significantly outperforming competing methods.
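The channel-selection idea can be sketched as follows: channels become graph nodes, edges come from inter-channel connectivity (Pearson correlation is assumed here), a couple of GCN propagation steps score each channel, and the highest-scoring channels are kept. This toy numpy sketch with random weights is only illustrative; the actual GCN-CS model is trained end to end.

```python
import numpy as np

def gcn_layer(adj, x, w):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ x @ w, 0.0)

rng = np.random.default_rng(0)
eeg = rng.normal(size=(22, 1000))                      # 22 channels x 1000 samples (toy data)
adj = (np.abs(np.corrcoef(eeg)) > 0.05).astype(float)  # assumed connectivity: |correlation| threshold
np.fill_diagonal(adj, 0.0)

x = eeg @ rng.normal(size=(1000, 16)) / 1000.0         # crude per-channel features
w1, w2 = rng.normal(size=(16, 8)), rng.normal(size=(8, 1))
scores = gcn_layer(adj, gcn_layer(adj, x, w1), w2).ravel()
selected_channels = np.argsort(-scores)[:8]            # keep the 8 highest-scoring channels
print(selected_channels)
```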
Affiliation(s)
- Wei Liang: Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
- Jing Jin: Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China; Shenzhen Research Institute of East China University of Technology, Shenzhen 518063, China
- Ian Daly: Brain-Computer Interfacing and Neural Engineering Laboratory, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
- Hao Sun: Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
- Xingyu Wang: Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
- Andrzej Cichocki: Skolkovo Institute of Science and Technology, Moscow 143026, Russia; Systems Research Institute of Polish Academy of Science, Warsaw, Poland; Department of Informatics, Nicolaus Copernicus University, Torun, Poland
5
Zafar A, Dad Kallu K, Atif Yaqub M, Ali MU, Hyuk Byun J, Yoon M, Su Kim K. A Hybrid GCN and Filter-Based Framework for Channel and Feature Selection: An fNIRS-BCI Study. Int J Intell Syst 2023. [DOI: 10.1155/2023/8812844]
Abstract
In this study, a channel and feature selection methodology is devised for brain-computer interface (BCI) applications using functional near-infrared spectroscopy (fNIRS). A graph convolutional network (GCN) is employed to select the appropriate, correlated fNIRS channels. Furthermore, in the feature extraction phase, the performance of two filter-based feature selection algorithms, (i) minimum redundancy maximum relevance (mRMR) and (ii) ReliefF, is investigated. The five most commonly used temporal statistical features (i.e., mean, slope, maximum, skewness, and kurtosis) are used, and a conventional support vector machine (SVM) is utilized as the classifier for training and testing. The proposed methodology is validated using an available online dataset of motor imagery (left- and right-hand), mental arithmetic, and baseline tasks. First, the efficacy of the proposed methodology is shown for two-class BCI applications (i.e., left- vs. right-hand motor imagery and mental arithmetic vs. baseline). Second, the proposed framework is applied to four-class BCI applications (i.e., left- vs. right-hand motor imagery vs. mental arithmetic vs. baseline). The results show that the number of channels and features was significantly reduced, resulting in a significant increase in classification accuracy for both two-class and four-class BCI applications. Furthermore, both mRMR (87.8% for motor imagery, 87.1% for mental arithmetic, and 78.7% for four-class) and ReliefF (90.7% for motor imagery, 93.7% for mental arithmetic, and 81.6% for four-class) yielded high average classification accuracies; however, the ReliefF results were more stable and significant.
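The feature-extraction and classification stage described above can be illustrated with a short sketch that computes the five temporal statistical features per channel and trains a conventional SVM. The GCN-based channel selection and the mRMR/ReliefF ranking are not reproduced, and the synthetic trials below are placeholders.

```python
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def temporal_features(trial: np.ndarray) -> np.ndarray:
    """trial: (channels, samples) -> flat vector of 5 statistics per channel."""
    t = np.arange(trial.shape[1])
    slope = np.polyfit(t, trial.T, 1)[0]               # per-channel linear slope
    feats = np.stack([trial.mean(axis=1), slope, trial.max(axis=1),
                      skew(trial, axis=1), kurtosis(trial, axis=1)], axis=1)
    return feats.ravel()

rng = np.random.default_rng(0)
trials = rng.normal(size=(60, 8, 200))                 # 60 synthetic trials, 8 channels (assumption)
labels = np.repeat([0, 1], 30)                         # two-class toy labels
X = np.array([temporal_features(tr) for tr in trials])
print(cross_val_score(SVC(kernel="linear"), X, labels, cv=5).mean())
```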
6
STI-Net: Spatiotemporal Integration Network for Video Saliency Detection. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.106]
7
Li S, Liu F, Jiao L, Chen P, Liu X, Li L. MFNet: A Novel GNN-Based Multi-Level Feature Network With Superpixel Priors. IEEE Transactions on Image Processing 2022; 31:7306-7321. [PMID: 36383578] [DOI: 10.1109/tip.2022.3220057]
Abstract
Since the superpixel segmentation method aggregates pixels based on similarity, the boundaries of some superpixels indicate the outline of the object, and superpixels provide prerequisites for learning structure-aware features. It is therefore worthwhile to investigate how to utilize these superpixel priors effectively. In this work, by constructing a graph within each superpixel and a graph among superpixels, we propose a novel Multi-level Feature Network (MFNet) based on graph neural networks with the above superpixel priors. In our MFNet, we learn three levels of features in a hierarchical way: from pixel-level features to superpixel-level features, and then to image-level features. To address the problem that existing methods cannot represent superpixels well, we propose a superpixel representation method based on a graph neural network, which takes the graph constructed from a single superpixel as input to extract that superpixel's feature. To reflect the versatility of our MFNet, we apply it to an image-level prediction task and a pixel-level prediction task by designing different prediction modules. An attention linear classifier prediction module is proposed for image-level prediction tasks, such as image classification. An FC-based superpixel prediction module and a decoder-based pixel prediction module are proposed for pixel-level prediction tasks, such as salient object detection. Our MFNet achieves competitive results on a number of datasets compared with related methods. Visualizations show that the object boundaries and outlines of the saliency maps predicted by MFNet are more refined and attend more closely to details.
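As a hedged sketch of the "graph among superpixels" described above, the snippet below computes SLIC superpixels, uses the mean colour of each superpixel as its node feature, and connects superpixels that share a boundary. The intra-superpixel graphs and MFNet's GNN layers are not reproduced; the segmentation parameters are arbitrary.

```python
import numpy as np
import networkx as nx
from skimage import data
from skimage.segmentation import slic

img = data.astronaut()
labels = slic(img, n_segments=50, compactness=10)      # superpixel segmentation

g = nx.Graph()
for lab in np.unique(labels):
    g.add_node(int(lab), feature=img[labels == lab].mean(axis=0))  # mean RGB as node feature

# Connect superpixels that touch horizontally or vertically.
h_pairs = np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1)
v_pairs = np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1)
for a, b in np.unique(np.vstack([h_pairs, v_pairs]), axis=0):
    if a != b:
        g.add_edge(int(a), int(b))

print(g.number_of_nodes(), g.number_of_edges())
```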
8
Gao L, Liu B, Fu P, Xu M. Depth-aware Inverted Refinement Network for RGB-D Salient Object Detection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.031]
9
Song M, Song W, Yang G, Chen C. Improving RGB-D Salient Object Detection via Modality-Aware Decoder. IEEE Transactions on Image Processing 2022; 31:6124-6138. [PMID: 36112559] [DOI: 10.1109/tip.2022.3205747]
Abstract
Most existing RGB-D salient object detection (SOD) methods focus primarily on cross-modal and cross-level saliency fusion, which has been proved to be efficient and effective. However, these methods still have a critical limitation: their fusion patterns, typically the combination of selective characteristics and its variations, depend too heavily on the network's non-linear adaptability. In such methods, the balance between RGB and D (depth) is formulated individually for intermediate feature slices, but the relation at the modality level may not be learned properly. The optimal RGB-D combination differs depending on the RGB-D scenario, and the exact complementary status is frequently determined by multiple modality-level factors, such as depth quality, the complexity of the RGB scene, and the degree of harmony between them. Therefore, it may be difficult for existing approaches to achieve further performance breakthroughs, as their methodologies are relatively insensitive to such modality-level factors. To overcome this problem, this paper presents the Modality-aware Decoder (MaD). The key technical innovations include a series of feature embedding, modality reasoning, and feature back-projecting and collecting strategies, all of which upgrade the widely used multi-scale, multi-level decoding process to be modality-aware. Our MaD achieves competitive performance over other state-of-the-art (SOTA) models without using any fancy tricks in the decoder's design. Codes and results will be publicly available at https://github.com/MengkeSong/MaD.
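A heavily hedged sketch of the modality-level (rather than feature-slice-level) rebalancing the abstract argues for: a single scalar gate per modality, predicted from globally pooled RGB and depth features, reweights the two streams before decoding. This only illustrates the idea; MaD's embedding, reasoning and back-projection modules are considerably more involved, and all names below are hypothetical.

```python
import torch
import torch.nn as nn

class ModalityGate(nn.Module):
    """Toy modality-level gate (assumption): one scalar weight per modality."""
    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(2 * channels, 2)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, depth_feat: (B, C, H, W) feature maps from the two streams.
        pooled = torch.cat([rgb_feat.mean(dim=(2, 3)), depth_feat.mean(dim=(2, 3))], dim=1)
        w = torch.softmax(self.fc(pooled), dim=1)                     # (B, 2) modality weights
        return w[:, 0, None, None, None] * rgb_feat + w[:, 1, None, None, None] * depth_feat

rgb = torch.randn(2, 32, 16, 16)
depth = torch.randn(2, 32, 16, 16)
print(ModalityGate(32)(rgb, depth).shape)                             # torch.Size([2, 32, 16, 16])
```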
10
Wang M, Deng W. Adaptive Face Recognition Using Adversarial Information Network. IEEE Transactions on Image Processing 2022; 31:4909-4921. [PMID: 35839179] [DOI: 10.1109/tip.2022.3189830]
Abstract
In many real-world applications, face recognition models degenerate when training data (the source domain) differ from testing data (the target domain). To alleviate this mismatch, caused by factors such as pose and skin tone, the use of pseudo-labels generated by clustering algorithms is an effective approach in unsupervised domain adaptation. However, such pseudo-labels always miss some hard positive samples. Supervision on pseudo-labeled samples attracts them towards their prototypes and causes an intra-domain gap between pseudo-labeled samples and the remaining unlabeled samples within the target domain, which results in a lack of discrimination in face recognition. In this paper, considering the particularity of face recognition, we propose a novel adversarial information network (AIN) to address this issue. First, a novel adversarial mutual information (MI) loss is proposed to alternately minimize MI with respect to the target classifier and maximize MI with respect to the feature extractor. Through this min-max procedure, the positions of target prototypes are adaptively modified so that unlabeled images are clustered more easily and the intra-domain gap can be mitigated. Second, to assist the adversarial MI loss, we utilize a graph convolution network to predict linkage likelihoods between target data and generate pseudo-labels. It leverages valuable information in the context of nodes and achieves more reliable results. The proposed method is evaluated under two scenarios, i.e., domain adaptation across poses and image conditions, and domain adaptation across faces with different skin tones. Extensive experiments show that AIN successfully improves cross-domain generalization and offers a new state of the art on the RFW dataset.
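The alternating min-max on mutual information (MI) described above can be sketched as follows, with MI between target samples and predictions estimated from the classifier's softmax outputs as H(mean p) - mean H(p). The tiny architectures, random data and update schedule are placeholder assumptions, and the GCN-based pseudo-labelling is omitted.

```python
import torch
import torch.nn as nn

def mutual_information(probs: torch.Tensor) -> torch.Tensor:
    """probs: (N, C) softmax outputs; returns the estimate H(E[p]) - E[H(p)]."""
    marginal = probs.mean(dim=0)
    h_marginal = -(marginal * marginal.clamp_min(1e-12).log()).sum()
    h_cond = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1).mean()
    return h_marginal - h_cond

extractor = nn.Sequential(nn.Linear(128, 64), nn.ReLU())   # placeholder feature extractor
classifier = nn.Linear(64, 10)                              # placeholder target classifier
opt_f = torch.optim.SGD(extractor.parameters(), lr=1e-2)
opt_c = torch.optim.SGD(classifier.parameters(), lr=1e-2)

x = torch.randn(256, 128)                                   # unlabeled target-domain inputs (toy)
for _ in range(5):
    # Minimize MI with respect to the target classifier (extractor frozen via detach).
    mi = mutual_information(torch.softmax(classifier(extractor(x).detach()), dim=1))
    opt_c.zero_grad(); mi.backward(); opt_c.step()
    # Maximize MI with respect to the feature extractor.
    mi = mutual_information(torch.softmax(classifier(extractor(x)), dim=1))
    opt_f.zero_grad(); (-mi).backward(); opt_f.step()
print(float(mi))
```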
11
Cao P, Zhu Z, Wang Z, Zhu Y, Niu Q. Applications of graph convolutional networks in computer vision. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07368-1]
12
Dai Y, Yu J, Zhang D, Hu T, Zheng X. RODFormer: High-Precision Design for Rotating Object Detection with Transformers. Sensors 2022; 22:2633. [PMID: 35408247] [PMCID: PMC9003240] [DOI: 10.3390/s22072633]
Abstract
To address Transformers' lack of a local spatial receptive field and the discontinuous boundary loss in rotating object detection, this paper proposes a Transformer-based high-precision rotating object detection model (RODFormer). Firstly, RODFormer uses a structured Transformer architecture to collect feature information at different resolutions, widening the range over which features are gathered. Secondly, a new feed-forward network (spatial-FFN) is constructed. Spatial-FFN fuses the local spatial features of 3 × 3 depthwise separable convolutions with the global channel features of a multilayer perceptron (MLP) to address the deficiencies of the FFN in local spatial modeling. Finally, based on the spatial-FFN architecture, a detection head is built using the CIoU-smooth L1 loss function, which regresses only the horizontal box when the rotated box is close to horizontal, so as to alleviate the loss discontinuity of the rotated box. Ablation experiments on the DOTA dataset show that the Transformer-structured module, the spatial-FFN module and the CIoU-smooth L1 loss function are all effective in improving RODFormer's detection accuracy. Compared with 12 rotating object detection models on the DOTA dataset, RODFormer achieves the highest average detection accuracy (up to 75.60%), making it more competitive in rotating object detection accuracy.
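A hedged PyTorch sketch of the spatial-FFN idea above: one branch captures local spatial context with a 3 × 3 depthwise separable convolution, the other captures global channel context with an MLP, and the two are fused by addition. The layer sizes, the additive fusion and the residual connection are assumptions for illustration, not RODFormer's exact design.

```python
import torch
import torch.nn as nn

class SpatialFFN(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.dw_conv = nn.Sequential(                        # 3x3 depthwise separable convolution
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim),
            nn.Conv2d(dim, dim, kernel_size=1),
        )
        self.mlp = nn.Sequential(                            # channel-wise MLP branch
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (B, N, C) token sequence laid out on an H x W grid with N = H * W."""
        b, n, c = x.shape
        h = w = int(n ** 0.5)
        spatial = self.dw_conv(x.transpose(1, 2).reshape(b, c, h, w))
        spatial = spatial.reshape(b, c, n).transpose(1, 2)
        return x + spatial + self.mlp(x)                     # fuse local and global paths (assumed)

tokens = torch.randn(2, 14 * 14, 64)
print(SpatialFFN(64, 256)(tokens).shape)                     # torch.Size([2, 196, 64])
```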