1
Li Y, Lin Y, Hu P, Peng D, Luo H, Peng X. Single-Cell RNA-Seq Debiased Clustering via Batch Effect Disentanglement. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:11371-11381. [PMID: 37030864] [DOI: 10.1109/tnnls.2023.3260003]
Abstract
A variety of single-cell RNA-seq (scRNA-seq) clustering methods have achieved great success in discovering cellular phenotypes. However, clustering remains challenging when the data are confounded by batch effects introduced by different experimental conditions or technologies; the resulting data partitions become biased toward these nonbiological factors. Meanwhile, batch differences are not always much smaller than true biological variation, which hinders the combination of batch-integration and clustering methods. To overcome this challenge, we propose single-cell RNA-seq debiased clustering (SCDC), an end-to-end clustering method that is debiased with respect to batch effects by disentangling the biological and nonbiological information in scRNA-seq data during data partitioning. In six analyses, SCDC qualitatively and quantitatively outperforms both state-of-the-art clustering and batch-integration methods in handling scRNA-seq data with batch effects. Furthermore, SCDC clusters data with a running time that grows linearly in the number of cells and a fixed graphics processing unit (GPU) memory consumption, making it scalable to large datasets. The code will be released on GitHub.
2
Hu S, Shi Z, Yan X, Lou Z, Ye Y. Multiview Clustering With Propagating Information Bottleneck. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:9915-9929. [PMID: 37022400] [DOI: 10.1109/tnnls.2023.3238041]
Abstract
In many practical applications, massive data are observed from multiple sources, each of which contains multiple cohesive views; such data are called hierarchical multiview (HMV) data, for example, image-text objects with different types of visual and textual features. Naturally, including both source and view relationships offers a comprehensive view of the input HMV data and enables an informative and correct clustering result. However, most existing multiview clustering (MVC) methods can only process single-source data with multiple views or multisource data with a single type of feature, failing to consider all the views across multiple sources. Observing the rich, closely related multivariate (i.e., source and view) information and the potential dynamic information flow interacting among them, in this article a general hierarchical information propagation model is first built to address the above challenging problem. It describes the process from optimal feature subspace learning (OFSL) of each source to final clustering structure learning (CSL). Then, a novel self-guided method named propagating information bottleneck (PIB) is proposed to realize the model. It works in a circulating propagation fashion, so that the clustering structure obtained from the last iteration can "self-guide" the OFSL of each source, and the learned subspaces are in turn used to conduct the subsequent CSL. We theoretically analyze the relationship between the cluster structures learned in the CSL phase and the preservation of relevant information propagated from the OFSL phase. Finally, a two-step alternating optimization method is carefully designed for optimization. Experimental results on various datasets show the superiority of the proposed PIB method over several state-of-the-art methods.
3
Yan X, Jia L, Cao H, Yu Y, Wang T, Zhang F, Guan Q. Multitargets Joint Training Lightweight Model for Object Detection of Substation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:2413-2424. [PMID: 35877791] [DOI: 10.1109/tnnls.2022.3190139]
Abstract
Object detection in substations is key to ensuring their safe and reliable operation. Traditional image detection algorithms rely on texture features tailored to a single object class and cannot easily handle objects of other classes. Deep-network-based object detectors generalize better, but their sizeable, complex backbones limit their application in substation monitoring terminals with weak computing power. This article proposes a multitargets joint training lightweight model. The proposed model uses the feature maps of a complex model together with the object labels in images as the training multitargets. The feature maps carry deeper feature information, and the feature maps of complex networks have higher information entropy than those of lightweight networks. Because of the imbalance between foreground and background, this article also proposes the heat pixels method to strengthen the available object information. The heat pixels method is designed as a kind of reverse network calculation that maps the object's position onto the pixels of the feature maps; the temperature of a pixel indicates the probability that an object exists at that location. Three different lightweight networks use the complex model's feature maps and the traditional labels as training multitargets. The public VOC dataset and a substation equipment dataset are adopted in the experiments. The experimental results demonstrate that the proposed model effectively improves object detection accuracy while reducing inference time and computation.
4
Liao Z, Zhang X, Su W, Zhan K. View-Consistent Heterogeneous Network on Graphs With Few Labeled Nodes. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:5523-5532. [PMID: 35298391] [DOI: 10.1109/tcyb.2022.3157771]
Abstract
Performing transductive learning on graphs with very few labeled data, that is, two or three samples per category, is challenging due to the lack of supervision. In existing work, self-supervised learning via a single-view model is widely adopted to address the problem. However, recent observations show that multiview representations of an object share the same semantic information in a high-level feature space. For each sample, we generate heterogeneous representations and use a view-consistency loss to make their representations consistent with each other. Multiview representation also motivates supervising pseudolabel generation through mutual supervision between views. In this article, we thus propose a view-consistent heterogeneous network (VCHN) to learn better representations by aligning view-agnostic semantics. Specifically, VCHN is constructed by constraining the predictions between two views so that the view pairs can supervise each other. To make the best use of cross-view information, we further propose a novel training strategy to generate more reliable pseudolabels, which in turn enhances the predictions of the VCHN. Extensive experimental results on three benchmark datasets demonstrate that our method achieves superior performance over state-of-the-art methods under very low label rates.
5
Lin Y, Gou Y, Liu X, Bai J, Lv J, Peng X. Dual Contrastive Prediction for Incomplete Multi-View Representation Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2023; 45:4447-4461. [PMID: 35939466] [DOI: 10.1109/tpami.2022.3197238]
Abstract
In this article, we propose a unified framework to solve the following two challenging problems in incomplete multi-view representation learning: i) how to learn a consistent representation unifying different views, and ii) how to recover the missing views. To address the challenges, we provide an information theoretical framework under which the consistency learning and data recovery are treated as a whole. With the theoretical framework, we propose a novel objective function which jointly solves the aforementioned two problems and achieves a provable sufficient and minimal representation. In detail, the consistency learning is performed by maximizing the mutual information of different views through contrastive learning, and the missing views are recovered by minimizing the conditional entropy through dual prediction. To the best of our knowledge, this is one of the first works to theoretically unify the cross-view consistency learning and data recovery for representation learning. Extensive experimental results show that the proposed method remarkably outperforms 20 competitive multi-view learning methods on six datasets in terms of clustering, classification, and human action recognition. The code could be accessed from https://pengxi.me.
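A minimal sketch of the two ingredients named above, under generic assumptions (random latent codes, linear cross-view predictors): an InfoNCE-style contrastive term that maximizes mutual information between the two views' representations, and a dual-prediction term that reduces conditional entropy by regressing each view's latent code from the other. This is an illustration of the idea, not the code released at https://pengxi.me.

import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    # contrastive term: matched view pairs are pulled together, mismatched pairs pushed apart
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def dual_prediction_loss(p12, p21, z1, z2):
    # dual prediction: each view's latent code is regressed from the other view's code
    return F.mse_loss(p12(z1), z2.detach()) + F.mse_loss(p21(z2), z1.detach())

# toy usage with random 64-d latent codes and linear cross-view predictors
n, d = 32, 64
z1, z2 = torch.randn(n, d), torch.randn(n, d)
p12, p21 = torch.nn.Linear(d, d), torch.nn.Linear(d, d)
loss = info_nce(z1, z2) + dual_prediction_loss(p12, p21, z1, z2)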
6
Guo Y, Liu M. Spatial-temporal trajectory anomaly detection based on an improved spectral clustering algorithm. INTELL DATA ANAL 2023. [DOI: 10.3233/ida-216185]
Abstract
With the development of wireless communication technology, as users rely on wireless networks to meet various needs, those networks also record a large amount of users' spatial-temporal trajectory data. To better monitor students' healthy development and promote campus informatization, a spectral clustering algorithm based on multi-scale thresholds and density combined with shared nearest neighbors (MSTDSNN-SC) is proposed. First, it improves the affinity distance function based on the shortest time distance-shortest time distance sub-sequence (STD-STDSS) by adding location popularity and uses this model to construct the initial adjacency matrix. Then it introduces a covariance scale threshold and a spatial scale threshold to perform 0–1 processing on the adjacency matrix and obtain more accurate sample similarities. Next, it constructs an eigenvector space by eigenvalue decomposition of the adjacency matrix. Finally, it applies a DBSCAN clustering algorithm with shared nearest neighbors to avoid manually determining the number of clusters. Taking campus Internet usage data as an example, multiple clustering algorithms are used for anomaly detection and four evaluation metrics are applied to assess the clustering results. The MSTDSNN-SC algorithm shows better clustering performance, and the resulting list of abnormal trajectories is verified to be effective and credible.
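A minimal sketch of the generic pipeline the abstract outlines (thresholded adjacency matrix, eigenvalue decomposition, then density-based clustering on the spectral embedding). The affinity construction, the thresholds, and the use of plain DBSCAN below are illustrative placeholders; the STD-STDSS distance and the shared-nearest-neighbor variant of DBSCAN are not reproduced.

import numpy as np
from sklearn.cluster import DBSCAN

def spectral_embedding_dbscan(A, k=2, eps=0.5, min_samples=3):
    # A: 0-1 adjacency matrix obtained after threshold processing
    eigvals, eigvecs = np.linalg.eigh(A)        # eigenvalue decomposition of the adjacency matrix
    U = eigvecs[:, -k:]                         # k leading eigenvectors form the eigenvector space
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(U)  # no preset cluster number

# toy usage: two obvious groups of trajectories
A = np.zeros((20, 20))
A[:10, :10] = 1
A[10:, 10:] = 1
labels = spectral_embedding_dbscan(A, k=2, eps=0.5, min_samples=3)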
7
Ma Z, Yu J, Wang L, Chen H, Zhao Y, He X, Wang Y, Song Y. Multi-view clustering based on view-attention driven. INT J MACH LEARN CYB 2023. [DOI: 10.1007/s13042-023-01787-9]
8
Shi S, Nie F, Wang R, Li X. Multi-View Clustering via Nonnegative and Orthogonal Graph Reconstruction. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:201-214. [PMID: 34288875] [DOI: 10.1109/tnnls.2021.3093297]
Abstract
The goal of multi-view clustering is to partition samples into different subsets according to their diverse features. Previous multi-view clustering methods mainly take two forms: multi-view spectral clustering and multi-view matrix factorization. Although they have shown excellent performance on many occasions, they still have several disadvantages. For example, multi-view spectral clustering usually needs postprocessing, while multi-view matrix factorization directly decomposes the original data features; when the feature dimensionality is large, decomposing these features thoroughly is computationally expensive. Therefore, we propose a novel multi-view clustering approach. Its main advantages include the following three aspects: 1) it searches for a common joint graph across multiple views, which fully explores the hidden structure information by exploiting the compatibility among views; 2) the introduced nonnegative constraint ensures that the final clustering results can be obtained directly; and 3) directly decomposing the similarity matrix transforms the eigenvalue factorization in spectral clustering, with computational complexity O(n^3), into a singular value decomposition (SVD) with O(nc^2) time cost, where n and c denote the numbers of samples and classes, respectively. Thus, the computational efficiency is improved. Moreover, to learn a better clustering model, we constrain the constructed similarity graph to approximate each view's affinity graph as closely as possible, using the initial affinity matrices as constraints. Furthermore, substantial experiments are conducted, which verify the superiority of the two proposed clustering methods compared with single-view clustering approaches and state-of-the-art multi-view clustering methods.
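To make the complexity claim in point 3) concrete, here is a generic illustration (not the authors' algorithm): when the n-by-n similarity matrix is expressed as S = B Bᵀ with a nonnegative n-by-c factor B, the spectral embedding can be read off a thin SVD of B instead of a full eigen-decomposition of S.

import numpy as np

n, c = 1000, 5
B = np.abs(np.random.rand(n, c))        # nonnegative n-by-c factor (illustrative)
# Full route, O(n^3): eigendecompose the dense n-by-n similarity S = B @ B.T
# Cheap route, O(n c^2): thin SVD of B; if B = U S V^T then B B^T = U S^2 U^T,
# so the columns of U are exactly the leading eigenvectors of the similarity matrix.
U, s, Vt = np.linalg.svd(B, full_matrices=False)
embedding = U                            # n-by-c spectral embedding used for clustering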
9
Yang X, Deng C, Dang Z, Tao D. Deep Multiview Collaborative Clustering. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:516-526. [PMID: 34370671] [DOI: 10.1109/tnnls.2021.3097748]
Abstract
Clustering methods have attracted ever-increasing attention in the machine learning and computer vision communities in recent years. In this article, we focus on real-world applications in which a sample can be represented by multiple views. Traditional methods learn a common latent space for multiview samples without considering the diversity of multiview representations and use K-means to obtain the final results, which is time- and space-consuming. In contrast, we propose a novel end-to-end deep multiview clustering model with collaborative learning that predicts the clustering results directly. Specifically, multiple autoencoder networks are utilized to embed multi-view data into various latent spaces, and a heterogeneous graph learning module is employed to fuse the latent representations adaptively, learning specific weights for different views of each sample. In addition, intra-view collaborative learning is designed to optimize each single-view clustering task and provide more discriminative latent representations. Simultaneously, inter-view collaborative learning is employed to obtain complementary information and promote a consistent cluster structure for a better clustering solution. Experimental results on several datasets show that our method significantly outperforms several state-of-the-art clustering approaches.
10
Wang J, Wang H, Nie F, Li X. Ratio Sum Versus Sum Ratio for Linear Discriminant Analysis. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:10171-10185. [PMID: 34874851] [DOI: 10.1109/tpami.2021.3133351]
Abstract
Dimension reduction is a critical technology for high-dimensional data processing, and Linear Discriminant Analysis (LDA) and its variants are effective supervised methods. However, LDA prefers features with smaller variance, which causes features with weak discriminative ability to be retained. In this paper, we propose a novel Ratio Sum formulation for Linear Discriminant Analysis (RSLDA), which aims at maximizing the discriminative ability of each feature in the subspace. Specifically, it maximizes the sum, over the dimensions of the subspace, of the ratio of the between-class distance to the within-class distance. Since a closed-form solution of the original RSLDA problem is difficult to obtain, an equivalent problem is developed that can be solved by an alternating optimization algorithm. The equivalent problem is transformed into two sub-problems: one can be solved directly, and the other is recast as a convex optimization problem in which singular value decomposition is employed instead of matrix inversion. Consequently, the performance of the algorithm is not affected by whether the covariance matrix is singular. Furthermore, a kernel version, Kernel RSLDA (KRSLDA), is presented to improve the robustness of RSLDA, and the time complexities of RSLDA and KRSLDA are analyzed. Extensive experiments show that RSLDA and KRSLDA outperform other comparison methods on toy datasets and multiple public datasets.
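For reference, the distinction in the title can be written out explicitly. The notation below is generic (S_b and S_w are the between-class and within-class scatter matrices, w_i the i-th projection direction, d the subspace dimension) and is a paraphrase of the standard formulations rather than a quotation from the paper.

% classical (sum-ratio / trace-ratio) LDA: a single global ratio over the whole subspace
\max_{W^\top W = I} \; \frac{\operatorname{tr}\!\left(W^\top S_b W\right)}{\operatorname{tr}\!\left(W^\top S_w W\right)}

% ratio-sum LDA (RSLDA): the ratio is evaluated per projection direction and summed,
% so every dimension of the learned subspace must be individually discriminative
\max_{W^\top W = I} \; \sum_{i=1}^{d} \frac{w_i^\top S_b\, w_i}{w_i^\top S_w\, w_i}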
11
Wang Q, Jiang X, Chen M, Li X. Autoweighted Multiview Feature Selection With Graph Optimization. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:12966-12977. [PMID: 34398782] [DOI: 10.1109/tcyb.2021.3094843]
Abstract
In this article, we focus on the unsupervised multiview feature selection, which tries to handle high-dimensional data in the field of multiview learning. Although some graph-based methods have achieved satisfactory performance, they ignore the underlying data structure across different views. Besides, their predefined Laplacian graphs are sensitive to the noises in the original data space and fail to obtain the optimal neighbor assignment. To address the above problems, we propose a novel unsupervised multiview feature selection model based on graph learning, and the contributions are three-fold: 1) during the feature selection procedure, the consensus similarity graph shared by different views is learned. Therefore, the proposed model can reveal the data relationship from the feature subset; 2) a reasonable rank constraint is added to optimize the similarity matrix to obtain more accurate information; and 3) an autoweighted framework is presented to assign view weights adaptively, and an effective alternative iterative algorithm is proposed to optimize the problem. Experiments on various datasets demonstrate the superiority of the proposed method compared to the state-of-the-art methods.
12
Li G, Song D, Bai W, Han K, Tharmarasa R. Consensus and Complementary Regularized Non-negative Matrix Factorization for Multi-View Image Clustering. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.12.063]
13
Gao J, Gong M, Li X. Congested crowd instance localization with dilated convolutional swin transformer. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.113]
14
Liang N, Yang Z, Li Z, Xie S. Label prediction based constrained non-negative matrix factorization for semi-supervised multi-view classification. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.087]
15
Yeh CH, Lin CH, Kang LW, Huang CH, Lin MH, Chang CY, Wang CC. Lightweight Deep Neural Network for Joint Learning of Underwater Object Detection and Color Conversion. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:6129-6143. [PMID: 33900925] [DOI: 10.1109/tnnls.2021.3072414]
Abstract
Underwater image processing has shown significant potential for exploring underwater environments. It has been applied to a wide variety of fields, such as underwater terrain scanning and applications driven by autonomous underwater vehicles (AUVs), including image-based underwater object detection. However, underwater images often suffer from degradation due to attenuation, color distortion, and noise from artificial lighting sources, as well as the effects of possibly low-end optical imaging devices, so object detection performance is degraded accordingly. To tackle this problem, a lightweight deep underwater object detection network is proposed in this article. The key is a deep model that jointly learns color conversion and object detection for underwater images. The image color conversion module transforms color images into the corresponding grayscale images to mitigate underwater color absorption and enhance object detection performance at lower computational complexity. Experimental results with our implementation on the Raspberry Pi platform justify the effectiveness of the proposed lightweight joint learning model for underwater object detection compared with state-of-the-art approaches.
16
Zhang T, Cong Y, Sun G, Dong J. Visual-Tactile Fused Graph Learning for Object Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:12275-12289. [PMID: 34133303] [DOI: 10.1109/tcyb.2021.3080321]
Abstract
Object clustering has received considerable research attention recently. However, 1) most existing object clustering methods utilize visual information while ignoring the important tactile modality, which inevitably degrades model performance, and 2) simply concatenating visual and tactile information via a multiview clustering method leaves the complementary information under-explored, since vision and touch differ in many ways. To address these issues, we put forward a graph-based visual-tactile fused object clustering framework with two modules: 1) a modality-specific representation learning module MR and 2) a unified affinity graph learning module MU. Specifically, MR focuses on learning modality-specific representations for visual-tactile data, where deep non-negative matrix factorization (NMF) is adopted to extract the hidden information behind each modality. Meanwhile, we employ an autoencoder-like structure to enhance the robustness of the learned representations, and two graphs to improve their compactness. Furthermore, MU mitigates the differences between vision and touch and further maximizes the mutual information, adopting a minimizing-disagreement scheme to guide the modality-specific representations toward a unified affinity graph. To achieve ideal clustering performance, a Laplacian rank constraint is imposed to regularize the learned graph with ideally connected components, where noise that causes wrong connections is removed and clustering labels can be obtained directly. Finally, we propose an efficient alternating iterative minimization updating strategy, followed by a theoretical proof of framework convergence. Comprehensive experiments on five public datasets demonstrate the superiority of the proposed framework.
17
Crowd Density Estimation in Spatial and Temporal Distortion Environment Using Parallel Multi-Size Receptive Fields and Stack Ensemble Meta-Learning. Symmetry (Basel) 2022. [DOI: 10.3390/sym14102159]
Abstract
The estimation of crowd density is crucial for applications such as autonomous driving, visual surveillance, crowd control, public space planning, and warning visually distracted drivers prior to an accident. Crowd density estimation models that possess strong translational, reflective, and scale symmetry yield encouraging results. However, dynamic scenes with perspective distortions and rapidly changing spatial and temporal domains still present obstacles. The main reasons are the dynamic nature of a scene and the difficulty of representing and incorporating the feature space of objects of varying sizes into a prediction model. To overcome these issues, this paper proposes a parallel multi-size receptive field units framework that leverages features from most of the CNN layers, allowing the features of objects of all sizes to be represented and to participate in the model's prediction. The proposed method utilizes features generated from lower to higher layers, so different object scales can be handled at different framework depths and various environmental densities can be estimated. However, including the vast majority of layer features in the prediction model also has negative effects on the prediction. Asymmetric non-local attention and a feature-map channel-weighting module are therefore proposed to handle noise and background details and to re-weight each channel so that it becomes more sensitive to important features while ignoring irrelevant ones. While the output predictions of some layers have high bias and low variance, those of other layers have low bias and high variance. Using stack ensemble meta-learning, we combine the individual predictions made with lower-layer and higher-layer features to improve prediction while balancing the tradeoff between bias and variance. Extensive tests have been conducted on the UCF CC 50 and ShanghaiTech datasets. The experimental results indicate that the proposed method is effective for dense distributions and objects of various sizes.
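A toy sketch of the stack-ensemble step described above: per-branch predictions (some biased, some high-variance) are fused by a meta-learner fitted on held-out data. The ridge meta-learner, the four synthetic branches, and the numbers are illustrative assumptions, not the paper's architecture.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
true_counts = rng.uniform(50, 500, size=200)              # ground-truth crowd counts
branch_preds = np.stack([
    0.8 * true_counts + 20,                               # biased, low-variance branch
    true_counts + rng.normal(0, 40, 200),                 # unbiased, high-variance branch
    0.9 * true_counts + rng.normal(0, 15, 200),
    true_counts + rng.normal(0, 25, 200),
], axis=1)

meta = Ridge(alpha=1.0).fit(branch_preds, true_counts)    # stack-ensemble meta-learner
fused = meta.predict(branch_preds)                        # fused counts balance bias and variance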
18
Su J, Huang J, Qing L, He X, Chen H. A new approach for social group detection based on spatio-temporal interpersonal distance measurement. Heliyon 2022; 8:e11038. [PMID: 36267375] [PMCID: PMC9576905] [DOI: 10.1016/j.heliyon.2022.e11038]
Abstract
Visual-based social group detection aims to cluster pedestrians in crowd scenes according to social interactions and spatio-temporal position relations using surveillance video data. It is a basic technique for crowd behaviour analysis and group-based activity understanding. According to proxemics theory, the interpersonal relationship between individuals determines the extent of their personal space, while spatial distance reflects the closeness of that relationship. In this paper, we propose a new unsupervised approach to interaction recognition and social group detection in public spaces, which removes the need for time-consuming labeling of training data. First, based on pedestrians' spatio-temporal trajectories, the interpersonal distances among individuals are measured from static and dynamic perspectives. Combined with proxemics theory, a social interaction recognition scheme is designed to judge whether there is a social interaction between pedestrians. On this basis, pedestrians are clustered to determine whether they form a social group. Extensive experiments on our pedestrian dataset “SCU-VSD-Social”, annotated with multi-group labels, demonstrate that the proposed method offers outstanding performance in both accuracy and complexity.
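A toy illustration of the proxemics-style rule sketched above: the mean interpersonal distance over synchronized trajectories is thresholded, and pedestrians whose pairwise interactions pass the test are merged into groups as connected components. The 2 m threshold, the mean-distance statistic, and the graph construction are illustrative assumptions, not the paper's measurement scheme.

import numpy as np
import networkx as nx

def detect_groups(trajectories, dist_thresh=2.0):
    # trajectories: array of shape (P, T, 2), P pedestrians over T synchronized frames
    P = trajectories.shape[0]
    G = nx.Graph()
    G.add_nodes_from(range(P))
    for i in range(P):
        for j in range(i + 1, P):
            # mean interpersonal distance over time as a crude closeness measure
            d = np.linalg.norm(trajectories[i] - trajectories[j], axis=1).mean()
            if d < dist_thresh:        # proxemics-style "personal space" threshold
                G.add_edge(i, j)
    return list(nx.connected_components(G))

# toy usage: pedestrians 0 and 1 walk side by side, pedestrian 2 is far away
t = np.arange(10, dtype=float)
traj = np.stack([np.c_[t, np.zeros(10)], np.c_[t, np.ones(10)], np.c_[t, np.full(10, 10.0)]])
print(detect_groups(traj))             # -> [{0, 1}, {2}]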
19
Wang Q, Liu R, Chen M, Li X. Robust Rank-Constrained Sparse Learning: A Graph-Based Framework for Single View and Multiview Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:10228-10239. [PMID: 33872170] [DOI: 10.1109/tcyb.2021.3067137]
Abstract
Graph-based clustering aims to partition data according to a similarity graph and has shown impressive performance on various kinds of tasks. The quality of the similarity graph largely determines the clustering results, but it is difficult to produce a high-quality graph, especially when the data contain noise and outliers. To solve this problem, we propose a robust rank-constrained sparse learning (RRCSL) method in this article. The L2,1-norm is adopted in the objective function of sparse representation to learn the optimal graph with robustness. To preserve the data structure, we construct an initial graph and search for the graph within its neighborhood. By incorporating a rank constraint, the learned graph can be used directly as the cluster indicator, and the final results are obtained without additional postprocessing. In addition, the proposed method can not only be applied to single-view clustering but also be extended to multiview clustering. Extensive experiments on synthetic and real-world datasets have demonstrated the superiority and robustness of the proposed framework.
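For context, the robustness comes from the L2,1-norm, which sums the Euclidean norms of the residual rows so that outlying samples are not squared, and the rank constraint on the graph Laplacian fixes the number of connected components. The objective below is a generic rank-constrained sparse-graph form written in that spirit; it is not the paper's exact formulation.

% \ell_{2,1}-norm of a matrix E with rows e_i
\|E\|_{2,1} = \sum_{i=1}^{n} \|e_i\|_2

% generic robust sparse self-representation with a rank constraint on the Laplacian L_S:
% rank(L_S) = n - c forces the learned graph S to have exactly c connected components
\min_{S \ge 0} \; \|X - X S\|_{2,1} + \lambda \|S\|_1
\quad \text{s.t.} \quad \operatorname{rank}(L_S) = n - c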
20
Co-consensus semi-supervised multi-view learning with orthogonal non-negative matrix factorization. Inf Process Manag 2022. [DOI: 10.1016/j.ipm.2022.103054]
21
Tang Y, Xie Y, Zhang C, Zhang Z, Zhang W. One-Step Multiview Subspace Segmentation via Joint Skinny Tensor Learning and Latent Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:9179-9193. [PMID: 33661745] [DOI: 10.1109/tcyb.2021.3053057]
Abstract
Multiview subspace clustering (MSC) has attracted growing attention due to its extensive value in various applications, such as natural language processing, face recognition, and time-series analysis. In this article, we address two crucial issues in MSC: 1) high computational cost and 2) cumbersome multistage clustering. Existing MSC approaches, including tensor singular value decomposition (t-SVD)-MSC, which has achieved promising performance, generally utilize the dataset itself as the dictionary and regard representation learning and the clustering process as two separate parts, leading to high computational overhead and unsatisfactory clustering performance. To remedy these two issues, we propose a novel MSC model called joint skinny tensor learning and latent clustering (JSTC), which learns high-order skinny tensor representations and the corresponding latent clustering assignments simultaneously. Through such a joint optimization strategy, the multiview complementary information and latent clustering structure can be exploited thoroughly to improve clustering performance. An alternating direction minimization algorithm, which has low computational complexity and can be run in parallel when solving several key subproblems, is carefully designed to optimize the JSTC model. This property makes JSTC an appealing solution for large-scale MSC problems. We conduct extensive experiments on ten popular datasets and compare JSTC with 12 competitors. Five commonly used metrics, including four external measures (NMI, ACC, F-score, and RI) and one internal metric (SI), are adopted to evaluate clustering quality. The experimental results, with the Wilcoxon statistical test, demonstrate the superiority of the proposed method in both clustering performance and operational efficiency.
22
Tang J, Feng H. Robust local-coordinate non-negative matrix factorization with adaptive graph for robust clustering. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.08.023]
23
Arshad MH, Bilal M, Gani A. Human Activity Recognition: Review, Taxonomy and Open Challenges. SENSORS (BASEL, SWITZERLAND) 2022; 22:s22176463. [PMID: 36080922] [PMCID: PMC9460866] [DOI: 10.3390/s22176463]
Abstract
Nowadays, Human Activity Recognition (HAR) is widely used in a variety of domains, and vision- and sensor-based data enable cutting-edge technologies to detect, recognize, and monitor human activities. Several reviews and surveys on HAR have already been published, but due to the constantly growing literature, the state of the HAR literature needs to be updated. Hence, this review aims to provide insight into the literature on HAR published since 2018. The ninety-five articles reviewed in this study are classified to highlight application areas, data sources, techniques, and open research challenges in HAR. The majority of existing research appears to have concentrated on activities of daily living, followed by individual and group-based user activities. However, there is little literature on detecting real-time activities such as suspicious activity, surveillance, and healthcare. A major portion of existing studies has used Closed-Circuit Television (CCTV) videos and mobile sensor data. Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Support Vector Machines (SVM) are the most prominent techniques in the reviewed literature for the task of HAR. Lastly, the limitations and open challenges that need to be addressed are discussed.
Affiliation(s)
- Muhammad Haseeb Arshad: Department of Computer Science, National University of Computer and Emerging Sciences, Chiniot-Faisalabad Campus, Chiniot 35400, Pakistan
- Muhammad Bilal: Department of Software Engineering, National University of Computer and Emerging Sciences, Chiniot-Faisalabad Campus, Chiniot 35400, Pakistan
- Abdullah Gani: Faculty of Computing and Informatics, University Malaysia Sabah, Kota Kinabalu 88400, Sabah, Malaysia
24
Fine-grained multi-view clustering with robust multi-prototypes representation. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03898-2]
25
Wang Q, Han T, Gao J, Yuan Y. Neuron Linear Transformation: Modeling the Domain Shift for Crowd Counting. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:3238-3250. [PMID: 33502985] [DOI: 10.1109/tnnls.2021.3051371]
Abstract
Cross-domain crowd counting (CDCC) is a hot topic due to its importance in public safety. The purpose of CDCC is to alleviate the domain shift between source and target domains. Recently, typical methods have attempted to extract domain-invariant features via image translation and adversarial learning. For specific tasks, we find that the domain shift is reflected in differences between model parameters. To describe the domain gap directly at the parameter level, we propose a neuron linear transformation (NLT) method that exploits domain factors and bias weights to learn the domain shift. Specifically, for each neuron of a source model, NLT exploits a few labeled target data to learn the domain shift parameters, and the target neuron is then generated via a linear transformation. Extensive experiments and analysis on six real-world datasets validate that NLT achieves top performance compared with other domain adaptation methods. An ablation study also shows that NLT is robust and more effective than supervised training and fine-tuning. Code is available at https://github.com/taohan10200/NLT.
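A minimal sketch of the idea as stated in the abstract: each target-domain weight is an affine function of the corresponding frozen source weight, and only the per-neuron domain factor and bias are fitted on the few labeled target samples. The layer shapes, the bias handling, and the optimizer setup are illustrative assumptions, not the released NLT code.

import torch

class NLTLinear(torch.nn.Module):
    # target layer = gamma * frozen source weights + beta, learned per output neuron
    def __init__(self, source_linear):
        super().__init__()
        self.register_buffer("w_src", source_linear.weight.detach())
        self.register_buffer("b_src", source_linear.bias.detach())
        out_dim = self.w_src.shape[0]
        self.gamma = torch.nn.Parameter(torch.ones(out_dim, 1))    # domain factor
        self.beta = torch.nn.Parameter(torch.zeros(out_dim, 1))    # domain bias

    def forward(self, x):
        w = self.gamma * self.w_src + self.beta                    # neuron linear transformation
        b = self.gamma.squeeze(1) * self.b_src                     # bias handling is illustrative
        return torch.nn.functional.linear(x, w, b)

# only gamma and beta are optimized on the few labeled target samples
source_layer = torch.nn.Linear(16, 8)
target_layer = NLTLinear(source_layer)
optimizer = torch.optim.Adam([target_layer.gamma, target_layer.beta], lr=1e-2)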
26
Signals and cues of social groups. Behav Brain Sci 2022; 45:e100. [PMID: 35796370] [DOI: 10.1017/s0140525x21001461]
Abstract
A crucial factor in how we perceive social groups involves the signals and cues emitted by them. Groups signal various properties of their constitution through coordinated behaviors across sensory modalities, influencing receivers' judgments of the group and subsequent interactions. We argue that group communication is a necessary component of a comprehensive computational theory of social groups.
27
Dong X, Wu D, Nie F, Wang R, Li X. Multi-view Clustering with Adaptive Procrustes on Grassmann Manifold. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.07.089]
28
Liang N, Yang Z, Li Z, Han W. Incomplete multi-view clustering with incomplete graph-regularized orthogonal non-negative matrix factorization. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03551-y]
29
Hu S, Shi Z, Ye Y. DMIB: Dual-Correlated Multivariate Information Bottleneck for Multiview Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:4260-4274. [PMID: 33085626] [DOI: 10.1109/tcyb.2020.3025636]
Abstract
Multiview clustering (MVC) has recently been the focus of much attention due to its ability to partition data from multiple views via view correlations. However, most MVC methods only learn either interfeature correlations or intercluster correlations, which may lead to unsatisfactory clustering performance. To address this issue, we propose a novel dual-correlated multivariate information bottleneck (DMIB) method for MVC. DMIB is able to explore both interfeature correlations (the relationship among multiple distinct feature representations from different views) and intercluster correlations (the close agreement among clustering results obtained from individual views). For the former, we integrate both view-shared feature correlations discovered by learning a shared discriminative feature subspace and view-specific feature information to fully explore the interfeature correlation. This allows us to attain multiple reliable local clustering results of different views. Following this, we explore the intercluster correlations by learning the shared mutual information over different local clusterings for an improved global partition. By integrating both correlations, we formulate the problem as a unified information maximization function and further design a two-step method for optimization. Moreover, we theoretically prove the convergence of the proposed algorithm, and discuss the relationships between our method and several existing clustering paradigms. The experimental results on multiple datasets demonstrate the superiority of DMIB compared to several state-of-the-art clustering methods.
30
Wang J, Zhang G, Zhang K, Zhao Y, Wang Q, Li X. Detection of Small Aerial Object Using Random Projection Feature With Region Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:3957-3970. [PMID: 32991300] [DOI: 10.1109/tcyb.2020.3018120]
Abstract
Small aerial object detection plays an important role in numerous computer vision tasks, including remote sensing, early warning systems, and visual tracking. Although existing moving-object detection techniques achieve reasonable results on normal-sized objects, they fail to distinguish small objects from a dynamic background. To cope with this issue, a novel method is proposed for accurate small aerial object detection under different conditions. First, block segmentation is introduced to reduce frame information redundancy. Meanwhile, a random projection feature (RPF) is proposed to characterize blocks as feature vectors. Subsequently, a moving direction estimation based on these feature vectors is presented to measure the motion of blocks and filter out the major directions. Finally, variable search region clustering (VSRC), together with color feature differences, is designed to extract pixelwise targets from the remaining moving-direction blocks. Comprehensive experiments demonstrate that our approach outperforms state-of-the-art methods in preserving the integrity of small aerial objects, especially for dynamic backgrounds and scale-varying targets.
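A small illustration of the random-projection step: image blocks are flattened and mapped through one shared random matrix to fixed-length feature vectors. The block size, projection dimension, and scaling are arbitrary choices here, and the motion-estimation and clustering stages of the paper are not reproduced.

import numpy as np

rng = np.random.default_rng(0)
block_size, proj_dim = 16, 32
# one random projection matrix shared by all blocks of the frame
R = rng.standard_normal((proj_dim, block_size * block_size)) / np.sqrt(proj_dim)

def random_projection_feature(block):
    # map a (16, 16) grayscale block to a 32-dimensional feature vector
    return R @ block.reshape(-1)

frame = rng.random((128, 128))                       # toy grayscale frame
blocks = [frame[i:i + block_size, j:j + block_size]  # non-overlapping block segmentation
          for i in range(0, 128, block_size) for j in range(0, 128, block_size)]
features = np.stack([random_projection_feature(b) for b in blocks])   # shape (64, 32)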
31
Choudhary M, Tiwari V, Venkanna U. Iris Liveness Detection Using Fusion of Domain-Specific Multiple BSIF and DenseNet Features. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:2370-2381. [PMID: 32697732] [DOI: 10.1109/tcyb.2020.3005089]
Abstract
In the past few years, several fusion-based approaches have been proposed to constitute discriminatory features for iris liveness detection. However, many methods exist in the literature for iris feature extraction, and identifying an optimal combination of such features remains a vital challenge. This article also proposes a score-level fusion of two distinct domain-specific features, i.e., multiple binarized statistical image features (BSIF) and DenseNet-based features. Instead of randomly scrutinizing such features, statistical tests are executed on six predominant iris features to identify the optimal feature set to combine. In particular, this work emphasizes textured-lens-based presentation attacks and aims to identify the type of contact lens within the iris samples. The experimental analysis shows that the domain-specific features substantially outperform generic features in discriminating live irises from artifacts. Furthermore, the proposed fusion-based approach is assessed on three iris datasets and the outcomes are compared with various state-of-the-art methods using three validation protocols in terms of equal error rate (EER). The comparative analysis shows that the proposed method obtains a significant performance gain over existing approaches and offers an improved benchmark for both iris liveness detection and contact lens identification.
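Score-level fusion of the kind described can be as simple as a weighted sum of the two detectors' liveness scores; the weight and decision threshold below are placeholders that would normally be tuned on a development set rather than values from the paper.

import numpy as np

def fuse_scores(score_bsif, score_densenet, w=0.5):
    # weighted score-level fusion of two liveness detectors (scores assumed in [0, 1])
    return w * np.asarray(score_bsif) + (1.0 - w) * np.asarray(score_densenet)

fused = fuse_scores([0.82, 0.10], [0.64, 0.25], w=0.4)   # per-sample fused scores
decisions = fused > 0.5                                   # illustrative decision threshold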
32
Bhuiyan MR, Abdullah J, Hashim N, Al Farid F, Ahsanul Haque M, Uddin J, Mohd Isa WN, Husen MN, Abdullah N. A deep crowd density classification model for Hajj pilgrimage using fully convolutional neural network. PeerJ Comput Sci 2022; 8:e895. [PMID: 35494812] [PMCID: PMC9044363] [DOI: 10.7717/peerj-cs.895]
Abstract
This research enhances crowd analysis by focusing on dense crowd analysis and crowd density prediction for the Hajj and Umrah pilgrimages. Crowd analysis usually estimates the number of objects within an image or video frame and is commonly addressed by estimating a density map generated from object location annotations. However, it suffers from low accuracy when the crowd is far from the surveillance camera. This research proposes an approach to overcome the problem of estimating crowd density captured by a distant surveillance camera. The proposed approach employs a fully convolutional neural network (FCNN)-based method for crowd analysis, especially for the classification of crowd density. This study addresses the technological challenges faced in video analysis in a scenario where large numbers of pilgrims move at densities of 7 to 8 people per square meter. To address this challenge, a new dataset based on the Hajj pilgrimage scenario is developed. To validate the proposed method, the proposed model is compared with existing models on existing datasets. The proposed FCNN-based method achieved final accuracies of 100%, 98%, and 98.16% on the proposed dataset, the UCSD dataset, and the JHU-CROWD dataset, respectively. Additionally, the ResNet-based method obtained final accuracies of 97%, 89%, and 97% for the same three datasets. The proposed Hajj-Crowd-2021 crowd analysis dataset and the model outperformed the other state-of-the-art datasets and models in most cases.
Affiliation(s)
- Md Roman Bhuiyan: Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
- Junaidi Abdullah: Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
- Noramiza Hashim: Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
- Fahmid Al Farid: Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, Malaysia
- Mohammad Ahsanul Haque: Data Scientist and Machine Learning Developer, Aalborg University, Aalborg, Denmark
- Jia Uddin: Technology Studies Department, Woosong University, Daejeon, South Korea
- Mohd Nizam Husen: Malaysian Institute of Information Technology, Universiti Kuala Lumpur, Kuala Lumpur, Malaysia
- Norra Abdullah: Computer Science, WSA Venture Australia (M) Sdn Bhd, Cyberjaya, Malaysia
33
Statistical Methods with Applications in Data Mining: A Review of the Most Recent Works. MATHEMATICS 2022. [DOI: 10.3390/math10060993]
Abstract
The importance of statistical methods for finding patterns and trends in otherwise unstructured, complex, and large sets of data has grown over the past decade, as the amount of data produced keeps growing exponentially and the knowledge obtained from understanding data makes it possible to take quick, informed decisions that save time and provide a competitive advantage. For this reason, we have seen considerable advances over the past few years in statistical methods for data mining. This paper is a comprehensive and systematic review of these recent developments in the area of data mining.
34
GCHGAT: pedestrian trajectory prediction using group constrained hierarchical graph attention networks. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02997-w]
35
An efficient Spatial–Temporal model based on gated linear units for trajectory prediction. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.12.051]
36
Qin Y, Wu H, Zhang X, Feng G. Semi-Supervised Structured Subspace Learning for Multi-View Clustering. IEEE TRANSACTIONS ON IMAGE PROCESSING 2021; 31:1-14. [PMID: 34807827] [DOI: 10.1109/tip.2021.3128325]
Abstract
Multi-view clustering aims at simultaneously obtaining a consensus underlying subspace across multiple views and conducting clustering on the learned consensus subspace, and it has attracted considerable interest in image processing. In this paper, we propose the Semi-supervised Structured Subspace Learning algorithm for clustering data points from Multiple sources (SSSL-M). We explicitly extend traditional multi-view clustering in a semi-supervised manner and then build an anti-block-diagonal indicator matrix with a small amount of supervisory information to pursue the block-diagonal structure of the shared affinity matrix. SSSL-M regularizes multiple view-specific affinity matrices into a shared affinity matrix based on reconstruction, through a unified framework consisting of backward encoding networks and a self-expressive mapping. The shared affinity matrix is comprehensive and can flexibly encode complementary information from the multiple view-specific affinity matrices. In this manner, an enhanced structural consistency of the affinity matrices from different views can be achieved, and the intrinsic relationships among them are effectively reflected. Technically, we formulate the proposed model as an optimization problem that can be solved by an alternating optimization scheme. Experimental results on seven benchmark datasets demonstrate that our method obtains better clustering results than state-of-the-art approaches.
37
Zhang B, Wang N, Zhao Z, Abraham A, Liu H. Crowd counting based on attention-guided multi-scale fusion networks. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.04.045]
38
Amirgholipour S, Jia W, Liu L, Fan X, Wang D, He X. PDANet: Pyramid density-aware attention based network for accurate crowd counting. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.04.037]
39
Abhadiomhen SE, Wang Z, Shen X, Fan J. Multiview Common Subspace Clustering via Coupled Low Rank Representation. ACM T INTEL SYST TEC 2021; 12:1-25. [DOI: 10.1145/3465056]
Abstract
Multi-view subspace clustering (MVSC) finds a shared structure in latent low-dimensional subspaces of multi-view data to enhance clustering performance. Nonetheless, we observe that most existing MVSC methods neglect the diversity in multi-view data by considering only the common knowledge to find a shared structure, either directly or by merging different similarity matrices learned for each view. In the presence of noise, this predefined shared structure becomes a biased representation of the different views. Thus, in this article, we propose an MVSC method based on coupled low-rank representation to address the above limitation. Our method first obtains a low-rank representation for each view, constrained to be a linear combination of the view-specific representation and the shared representation, while simultaneously encouraging the sparsity of the view-specific one. Then, it uses the k-block diagonal regularizer to learn a manifold recovery matrix for each view through the respective low-rank matrices, so as to recover more manifold structures from them. In this way, the proposed method can find an ideal similarity matrix by approximating the clustering projection matrices obtained from the recovery structures. Hence, this similarity matrix denotes our clustering structure with exactly k connected components, obtained by applying a rank constraint on the similarity matrix's relaxed Laplacian matrix to avoid spectral post-processing of the low-dimensional embedding matrix. The core of our idea is that we introduce dynamic approximation into the low-rank representation to allow the clustering structure and the shared representation to guide each other toward cleaner low-rank matrices that lead to a better clustering structure. Therefore, our approach is notably different from existing methods in which the local manifold structure of data is captured in advance. Extensive experiments on six benchmark datasets show that our method outperforms 10 similar state-of-the-art methods in six evaluation metrics.
Affiliation(s)
- Stanley Ebhohimhen Abhadiomhen: School of Computer Science and Communication Engineering, Jiangsu University, China, and Department of Computer Science, University of Nigeria, Nsukka, Nigeria
- Zhiyang Wang: School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu, China
- Xiangjun Shen: School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang, Jiangsu, China
40
Chen Z, Cong R, Xu Q, Huang Q. DPANet: Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING 2021; 30:7012-7024. [PMID: 33141667] [DOI: 10.1109/tip.2020.3028289]
Abstract
There are two main issues in RGB-D salient object detection: (1) how to effectively integrate the complementarity of cross-modal RGB-D data, and (2) how to prevent contamination from unreliable depth maps. In fact, these two problems are linked and intertwined, but previous methods tend to focus only on the first and ignore depth-map quality, which may cause the model to fall into a sub-optimal state. In this paper, we address these two issues synergistically in a holistic model and propose a novel network named DPANet to explicitly model the potentiality of the depth map and effectively integrate the cross-modal complementarity. By introducing depth-potentiality perception, the network can assess the potentiality of the depth information in a learning-based manner and guide the fusion of the two modalities to prevent contamination. The gated multi-modality attention module in the fusion process exploits an attention mechanism with a gate controller to capture long-range dependencies from a cross-modal perspective. Experimental results compared with 16 state-of-the-art methods on 8 datasets demonstrate the validity of the proposed approach both quantitatively and qualitatively. https://github.com/JosephChenHub/DPANet.
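A compact sketch of a gated cross-modal fusion step in the spirit of the gate controller described above: a scalar gate predicted from the depth features scales their contribution before fusion with the RGB features. The layer shapes and gating design are illustrative assumptions, not the released DPANet code.

import torch
import torch.nn as nn

class GatedDepthFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(                 # gate controller driven by the depth branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, rgb_feat, depth_feat):
        g = self.gate(depth_feat)                  # (N, 1, 1, 1) depth-reliability score
        gated_depth = g * depth_feat               # down-weight unreliable depth features
        return self.fuse(torch.cat([rgb_feat, gated_depth], dim=1))

fusion = GatedDepthFusion(64)
out = fusion(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))  # fused saliency features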
41
Zhang C, Wang Q, Li X. V-LPDR: Towards a unified framework for license plate detection, tracking, and recognition in real-world traffic videos. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.103]
42
Filipic J, Biagini M, Mas I, Pose CD, Giribet JI, Parisi DR. People counting using visible and infrared images. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.03.089]
|
43
|
Liang C, Shang M, Luo J. Cancer Subtype Identification by Consensus Guided Graph Autoencoders. Bioinformatics 2021; 37:4779-4786. [PMID: 34289034 DOI: 10.1093/bioinformatics/btab535] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2020] [Revised: 06/22/2021] [Accepted: 07/19/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Cancer subtype identification aims to divide cancer patients into subgroups with distinct clinical phenotypes and to facilitate the development of subgroup-specific therapies. The massive amounts of multi-omics data accumulated in public databases provide unprecedented opportunities to fulfill this task. As a result, great computational effort has been devoted to accurately identifying cancer subtypes via integrative analysis of these multi-omics datasets. RESULTS In this paper, we propose a Consensus Guided Graph Autoencoder (CGGA) to effectively identify cancer subtypes. First, we learn a new feature matrix for each omic using graph autoencoders, so that both structure information and node features are effectively incorporated during learning. Second, we learn a set of omic-specific similarity matrices together with a consensus matrix based on the features obtained in the first step. The learned omic-specific similarity matrices are then fed back to the graph autoencoders to guide feature learning. By iterating these two steps, our method obtains a final consensus similarity matrix for cancer subtyping. To comprehensively evaluate prediction performance, we compare CGGA with several approaches ranging from general-purpose multi-view clustering algorithms to multi-omics-specific integrative methods. Experimental results on both generic and cancer datasets confirm the superiority of our method and validate its effectiveness in leveraging multi-omics data to identify cancer subtypes. In addition, we investigate the clinical implications of the obtained clusters for glioblastoma and provide new insights into the treatment of patients with different subtypes. AVAILABILITY The source code of our method is freely available at https://github.com/alcs417/CGGA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
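As a rough, simplified stand-in for the consensus idea described above (not the CGGA code at the repository linked in the abstract), the sketch below builds an omic-specific similarity matrix per data type, averages them into a consensus matrix, and clusters it spectrally; CGGA itself additionally learns features with graph autoencoders and feeds the similarities back to guide that learning.

```python
# Simplified, hypothetical stand-in for consensus-based subtyping: average
# RBF similarities across omics and cluster the consensus matrix spectrally.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import SpectralClustering

def consensus_subtypes(omics, n_subtypes, gamma=0.5):
    """omics: list of (n_patients, n_features_v) arrays, one per omic."""
    sims = [rbf_kernel(X, gamma=gamma) for X in omics]    # omic-specific similarities
    consensus = np.mean(sims, axis=0)                     # naive consensus: average
    model = SpectralClustering(n_clusters=n_subtypes, affinity="precomputed",
                               random_state=0)
    return model.fit_predict(consensus)

# Toy example: two omics measured on 100 patients, three subtypes sought.
rng = np.random.default_rng(0)
labels = consensus_subtypes([rng.normal(size=(100, 50)),
                             rng.normal(size=(100, 30))], n_subtypes=3)
print(labels.shape)                                       # (100,)
```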
Collapse
Affiliation(s)
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
- Mingchao Shang
- School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, China
Collapse
|
44
|
Yang X, Zhu Q, Li P, Chen P, Niu Q. Fine-grained predicting urban crowd flows with adaptive spatio-temporal graph convolutional network. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.02.089] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
45
|
|
46
|
Ouyang P, Zhu J, Fan C, Niu Z, Zhan S. Local Enhancement and Bidirectional Feature Refinement Network for Single-Shot Detector. Cognit Comput 2021. [DOI: 10.1007/s12559-020-09814-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
47
|
Xiong Z, Yuan Y, Wang Q. ASK: Adaptively Selecting Key Local Features for RGB-D Scene Recognition. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2021; 30:2722-2733. [PMID: 33502980 DOI: 10.1109/tip.2021.3053459] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Indoor scene images usually contain scattered objects and various scene layouts, which makes RGB-D scene classification a challenging task. Existing methods still have limitations when classifying scene images with great spatial variability. Thus, how to effectively extract local patch-level features using only image-level labels remains an open problem in RGB-D scene recognition. In this article, we propose an efficient framework for RGB-D scene recognition that adaptively selects important local features to capture the great spatial variability of scene images. Specifically, we design a differentiable local feature selection (DLFS) module, which can extract an appropriate number of key scene-related local features. Discriminative local theme-level and object-level representations can be selected with the DLFS module from the spatially correlated multi-modal RGB-D features. We take advantage of the correlation between the RGB and depth modalities to provide more cues for selecting local features, and we propose a variational mutual-information maximization loss to ensure that discriminative local features are selected. Additionally, the DLFS module can be easily extended to select local features at different scales. By concatenating the local-orderless and global-structured multi-modal features, the proposed framework achieves state-of-the-art performance on public RGB-D scene recognition datasets.
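The sketch below gives a simplified picture of key-local-feature selection in the spirit of this abstract: score each local patch feature, keep the top-k, and pool them into a scene descriptor. The scoring head and pooling here are hypothetical illustrations, not the paper's DLFS module or its mutual-information loss.

```python
# Simplified PyTorch sketch of selecting key local features (hypothetical
# scoring head; not the DLFS module): score patches, keep the top-k, pool.
import torch
import torch.nn as nn

class TopKLocalSelector(nn.Module):
    def __init__(self, dim, k):
        super().__init__()
        self.k = k
        self.score = nn.Linear(dim, 1)           # relevance score per local feature

    def forward(self, local_feats):              # (B, N, D): N local patch features
        scores = self.score(local_feats).squeeze(-1)           # (B, N)
        weights = torch.softmax(scores, dim=1)                 # soft relevance
        topk = torch.topk(weights, self.k, dim=1)              # key patches
        idx = topk.indices.unsqueeze(-1).expand(-1, -1, local_feats.size(-1))
        selected = torch.gather(local_feats, 1, idx)           # (B, k, D)
        # Weight the kept features by renormalized relevance and pool.
        w = topk.values / topk.values.sum(dim=1, keepdim=True)
        return (w.unsqueeze(-1) * selected).sum(dim=1)         # (B, D)

selector = TopKLocalSelector(dim=128, k=8)
print(selector(torch.randn(4, 49, 128)).shape)                 # torch.Size([4, 128])
```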
Collapse
|
48
|
|
49
|
Zhu H, Cheng Y, Peng X, Zhou JT, Kang Z, Lu S, Fang Z, Li L, Lim JH. Single-Image Dehazing via Compositional Adversarial Network. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:829-838. [PMID: 31902791 DOI: 10.1109/tcyb.2019.2955092] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Single-image dehazing has been an important topic given the common image degradation caused by atmospheric aerosols. The key to haze removal is an accurate estimation of the global air-light and the transmission map. Most existing methods estimate these two parameters in separate pipelines, which reduces efficiency and accumulates errors, leading to suboptimal approximations, hurting model interpretability, and degrading performance. To address these issues, this article introduces a novel generative adversarial network (GAN) for single-image dehazing. The network consists of a novel compositional generator and a novel deeply supervised discriminator. The compositional generator is a densely connected network that combines fine-scale and coarse-scale information. Benefiting from the new generator, our method can directly learn the physical parameters from data and recover clean images from hazy ones in an end-to-end manner. The proposed discriminator is deeply supervised, enforcing the generator's output to resemble clean images from low-level details to high-level structures. To the best of our knowledge, this is the first end-to-end generative adversarial model for image dehazing that simultaneously outputs clean images, transmission maps, and air-lights. Extensive experiments show that our method remarkably outperforms the state-of-the-art methods. Furthermore, to facilitate future research, we create the HazeCOCO dataset, currently the largest dataset for single-image dehazing.
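The two physical parameters named in this abstract, the global air-light A and the transmission map t, enter through the standard atmospheric scattering model I(x) = J(x) t(x) + A (1 - t(x)). Assuming A and t have already been estimated (for example, by a network like the one described), the haze-free image J can be recovered by inverting that model, as in this minimal NumPy sketch:

```python
# Invert the atmospheric scattering model I = J * t + A * (1 - t),
# given a hazy image I, an estimated transmission map t, and air-light A.
import numpy as np

def recover_scene(I, t, A, t_min=0.1):
    """I: (H, W, 3) hazy image in [0, 1]; t: (H, W) transmission; A: (3,) air-light."""
    t = np.clip(t, t_min, 1.0)[..., None]        # avoid division by near-zero t
    J = (I - A) / t + A                          # model inversion
    return np.clip(J, 0.0, 1.0)

# Toy example with a constant transmission map and gray air-light.
I = np.random.rand(4, 4, 3)
J = recover_scene(I, t=np.full((4, 4), 0.6), A=np.array([0.8, 0.8, 0.8]))
print(J.shape)                                   # (4, 4, 3)
```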
Collapse
|
50
|
Yuan Y, Li X, Wang Q, Nie F. A semi-supervised learning algorithm via adaptive Laplacian graph. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.069] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|