1. Wang JC, Chen YJ, Zou Q. GRACE: Unveiling Gene Regulatory Networks With Causal Mechanistic Graph Neural Networks in Single-Cell RNA-Sequencing Data. IEEE Transactions on Neural Networks and Learning Systems 2025;36:9005-9017. PMID: 38896510. DOI: 10.1109/tnnls.2024.3412753.
Abstract
Reconstructing gene regulatory networks (GRNs) using single-cell RNA sequencing (scRNA-seq) data holds great promise for unraveling cellular fate development and heterogeneity. While numerous machine-learning methods have been proposed to infer GRNs from scRNA-seq gene expression data, many of them operate solely in a statistical or black-box manner, limiting their capacity for making causal inferences between genes. In this study, we introduce GRN inference with Accuracy and Causal Explanation (GRACE), a novel graph-based causal autoencoder framework that combines a structural causal model (SCM) with graph neural networks (GNNs) to enable GRN inference and gene causal reasoning from scRNA-seq data. By explicitly modeling causal relationships between genes, GRACE facilitates the learning of regulatory context and gene embeddings. With the learned gene signals, our model successfully decodes the causal structures and enables the accurate determination of multiple attributes of gene regulation that are important for determining regulatory levels. Through extensive evaluations on seven benchmarks, we demonstrate that GRACE outperforms 14 state-of-the-art GRN inference methods, with the incorporation of causal mechanisms significantly enhancing the accuracy of GRN and gene causality inference. Furthermore, the application to human peripheral blood mononuclear cell (PBMC) samples reveals cell type-specific regulators in monocyte phagocytosis and immune regulation, validated through network analysis and functional enrichment analysis.
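The abstract names the core mechanism only at a high level: a causal autoencoder that learns a gene-gene graph. As a rough illustration of that general idea (not the authors' implementation), the sketch below pairs a learnable adjacency with a NOTEARS-style acyclicity penalty standing in for the SCM constraint; all class and parameter names (`GRNAutoencoder`, `hidden`, the 0.1 penalty weight) are hypothetical.

```python
# Hypothetical sketch: graph autoencoder with a learnable gene-gene adjacency
# and a NOTEARS-style acyclicity penalty standing in for the SCM constraint.
import torch
import torch.nn as nn

class GRNAutoencoder(nn.Module):
    def __init__(self, n_genes: int, hidden: int = 64):
        super().__init__()
        # Learnable candidate adjacency over genes; edges are read off after
        # training by thresholding its magnitude.
        self.adj = nn.Parameter(torch.zeros(n_genes, n_genes))
        self.encode = nn.Linear(n_genes, hidden)
        self.decode = nn.Linear(hidden, n_genes)

    def forward(self, x):  # x: (cells, genes) expression matrix
        # Propagate each gene's signal through the candidate regulatory graph.
        x_prop = x @ self.adj
        return self.decode(torch.relu(self.encode(x_prop)))

    def acyclicity_penalty(self):
        # h(A) = tr(exp(A * A)) - d is zero iff the weighted graph is acyclic.
        a = self.adj * self.adj
        return torch.matrix_exp(a).trace() - self.adj.shape[0]

model = GRNAutoencoder(n_genes=100)
x = torch.rand(32, 100)                      # toy scRNA-seq batch
loss = nn.functional.mse_loss(model(x), x) + 0.1 * model.acyclicity_penalty()
loss.backward()
```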
2. Tian Y, Xie L, Fang J, Jiao J, Ye Q, Tian Q. Exploring Complicated Search Spaces With Interleaving-Free Sampling. IEEE Transactions on Neural Networks and Learning Systems 2025;36:7764-7771. PMID: 39024083. DOI: 10.1109/tnnls.2024.3408329.
Abstract
Conventional neural architecture search (NAS) algorithms typically work on search spaces with short-distance node connections. We argue that such designs, though safe and stable, are obstacles to exploring more effective network architectures. In this brief, we explore a search algorithm on a complicated search space with long-distance connections and show that existing weight-sharing search algorithms fail due to the existence of interleaved connections (ICs). Based on this observation, we present a simple-yet-effective algorithm, termed interleaving-free neural architecture search (IF-NAS). We further design a periodic sampling strategy to construct subnetworks during the search procedure, preventing ICs from emerging in any of them. In the proposed search space, IF-NAS outperforms both random sampling and previous weight-sharing search algorithms by significant margins. It also generalizes well to microcell-based spaces. This study emphasizes the importance of macrostructure, and we look forward to further efforts in this direction. The code is available at github.com/sunsmarterjie/IFNAS.
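The key trick described here is periodic sampling that keeps interleaved connections out of every sampled subnetwork. Below is a minimal sketch of how such a sampler could look, assuming the edges have already been partitioned so that interleaving edges never share a group; the grouping itself, and all names, are hypothetical.

```python
# Hypothetical sketch of periodic, interleaving-free subnetwork sampling:
# candidate edges are partitioned into groups chosen so that no two edges
# that would interleave share a group, and each step activates one group.
def periodic_subnetworks(edge_groups, num_steps):
    """edge_groups: list of lists of edge ids; yields the active edge set per step."""
    for step in range(num_steps):
        active = edge_groups[step % len(edge_groups)]  # cycle through the groups
        yield set(active)

# Toy usage: long-distance edges (0, 3) and (1, 4) interleave, so they sit in
# different groups and are never sampled together.
groups = [[(0, 1), (0, 3)], [(1, 2), (1, 4)], [(2, 3), (3, 4)]]
for step, edges in enumerate(periodic_subnetworks(groups, num_steps=6)):
    print(step, sorted(edges))
```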
3. Li Y, Xiao Z, Yang L, Meng D, Zhou X, Fan H, Zhang L. AttMOT: Improving Multiple-Object Tracking by Introducing Auxiliary Pedestrian Attributes. IEEE Transactions on Neural Networks and Learning Systems 2025;36:5454-5468. PMID: 38662556. DOI: 10.1109/tnnls.2024.3384446.
Abstract
Multiobject tracking (MOT) is a fundamental problem in computer vision with numerous applications, such as intelligent surveillance and automated driving. Despite the significant progress made in MOT, pedestrian attributes, such as gender, hairstyle, body shape, and clothing features, which contain rich and high-level information, have been less explored. To address this gap, we propose a simple, effective, and generic method to predict pedestrian attributes to support general reidentification (Re-ID) embedding. We first introduce attribute multi-object tracking (AttMOT), a large, highly enriched synthetic dataset for pedestrian tracking, containing over 80k frames and six million pedestrian identities (IDs) across different times, weather conditions, and scenarios. To the best of the authors' knowledge, AttMOT is the first MOT dataset with semantic attributes. Subsequently, we explore different approaches to fuse Re-ID embeddings and pedestrian attributes, including attention mechanisms, which we hope will stimulate the development of attribute-assisted MOT. The proposed attribute-assisted method (AAM) demonstrates its effectiveness and generality on several representative pedestrian MOT benchmarks, including MOT17 and MOT20, through experiments on the AttMOT dataset. When applied to state-of-the-art trackers, AAM achieves consistent improvements in multi-object tracking accuracy (MOTA), higher order tracking accuracy (HOTA), association accuracy (AssA), IDs, and IDF1 scores. For instance, on MOT17, the proposed method yields a +1.1 MOTA, +1.7 HOTA, and +1.8 IDF1 improvement when used with FairMOT. To further encourage related research, we release the data and code at https://github.com/HengLan/AttMOT.
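The abstract mentions fusing Re-ID embeddings with predicted attributes via attention but gives no detail. The following is a minimal hypothetical sketch of one such fusion, a learned per-channel gate between the appearance embedding and an attribute embedding; dimensions and names are illustrative, not from the paper.

```python
# Hypothetical sketch of attribute-assisted Re-ID fusion: predicted attribute
# logits are embedded and blended into the appearance embedding through a
# learned per-channel attention gate.
import torch
import torch.nn as nn

class AttributeFusion(nn.Module):
    def __init__(self, reid_dim=512, n_attrs=20):
        super().__init__()
        self.attr_embed = nn.Linear(n_attrs, reid_dim)
        self.gate = nn.Sequential(nn.Linear(2 * reid_dim, reid_dim), nn.Sigmoid())

    def forward(self, reid_feat, attr_logits):
        a = self.attr_embed(torch.sigmoid(attr_logits))    # attribute embedding
        g = self.gate(torch.cat([reid_feat, a], dim=-1))   # per-channel attention
        return g * reid_feat + (1 - g) * a                 # attribute-aware embedding

fused = AttributeFusion()(torch.randn(8, 512), torch.randn(8, 20))
```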
4. Wang P, Su F, Zhao Z, Zhao Y, Boulgouris NV. GAReID: Grouped and Attentive High-Order Representation Learning for Person Re-Identification. IEEE Transactions on Neural Networks and Learning Systems 2025;36:3990-4004. PMID: 36197859. DOI: 10.1109/tnnls.2022.3209537.
Abstract
As person parts are frequently misaligned between detected human boxes, an image representation that can handle this part misalignment is required. In this work, we propose an effective grouped attentive re-identification (GAReID) framework to learn part-aligned and background robust representations for person re-identification (ReID). Specifically, the GAReID framework consists of grouped high-order pooling (GHOP) and attentive high-order pooling (AHOP) layers, which generate high-order image and foreground features, respectively. In addition, a novel grouped Kronecker product (GKP) is proposed to use both channel group and shuffle strategies for high-order feature compression, while promoting the representational capabilities of compressed high-order features. We show that our method derives from an interpretable motivation and elegantly reduces part misalignments without using landmark detection or feature partition. This article theoretically and experimentally demonstrates the superiority of the GAReID framework, achieving state-of-the-art performance on various person ReID datasets.
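The grouped Kronecker product (GKP) is described as combining channel grouping and shuffling to compress high-order features. Here is a small sketch of what that computation could look like, under the assumption that "grouped Kronecker" means a per-group outer product taken after a channel shuffle; the paper's exact formulation may differ.

```python
# Hypothetical sketch of a grouped Kronecker product (GKP): channels are
# shuffled, split into groups, and an outer product is taken within each
# group, giving high-order statistics at a fraction of the full d*d cost.
import torch

def grouped_kronecker(x, groups=8):
    """x: (batch, dim); returns (batch, dim * dim / groups) high-order features."""
    b, d = x.shape
    assert d % groups == 0
    # Channel shuffle: mix channels across groups before the per-group product.
    x = x.view(b, groups, d // groups).transpose(1, 2).reshape(b, d)
    xg = x.view(b, groups, d // groups)
    outer = torch.einsum('bgi,bgj->bgij', xg, xg)      # per-group outer product
    return outer.reshape(b, -1)

feat = grouped_kronecker(torch.randn(4, 64), groups=8)  # (4, 512) vs (4, 4096) full
```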
5. Zhu A, Wang Z, Xue J, Wan X, Jin J, Wang T, Snoussi H. Improving Text-Based Person Retrieval by Excavating All-Round Information Beyond Color. IEEE Transactions on Neural Networks and Learning Systems 2025;36:5097-5111. PMID: 38416620. DOI: 10.1109/tnnls.2024.3368217.
Abstract
Text-based person retrieval is the task of searching a massive visual resource library for images of a particular pedestrian based on a textual query. Existing approaches often suffer from over-reliance on color (CLR), which can result in suboptimal person retrieval performance by distracting the model from other important visual cues, such as texture and structure information. To handle this problem, we propose a novel framework to Excavate All-round Information Beyond Color for the task of text-based person retrieval, which is therefore termed EAIBC. The EAIBC architecture includes four branches, namely an RGB branch, a grayscale (GRS) branch, a high-frequency (HFQ) branch, and a CLR branch. Furthermore, we introduce a mutual learning (ML) mechanism to facilitate communication and learning among the branches, enabling them to take full advantage of all-round information in an effective and balanced manner. We evaluate the proposed method on three benchmark datasets, including CUHK-PEDES, ICFG-PEDES, and RSTPReid. The experimental results demonstrate that EAIBC significantly outperforms existing methods and achieves state-of-the-art (SOTA) performance in supervised, weakly supervised, and cross-domain settings.
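The mutual learning (ML) mechanism among the four branches is not spelled out; a common realization is pairwise KL terms between branch predictions. Below is a hypothetical sketch along those lines; the loss weighting and detaching choices are assumptions, not the paper's.

```python
# Hypothetical sketch of the mutual-learning (ML) idea: each branch is trained
# on its own task loss plus a KL term pulling it toward the other branches'
# (detached) predictive distributions.
import torch
import torch.nn.functional as F

def mutual_learning_loss(branch_logits):
    """branch_logits: list of (batch, classes) logits, e.g. RGB/GRS/HFQ/CLR."""
    loss = 0.0
    for i, zi in enumerate(branch_logits):
        for j, zj in enumerate(branch_logits):
            if i == j:
                continue
            # KL(peer_j || branch_i); the peer is detached so it only guides.
            loss = loss + F.kl_div(F.log_softmax(zi, dim=-1),
                                   F.softmax(zj.detach(), dim=-1),
                                   reduction='batchmean')
    return loss / (len(branch_logits) * (len(branch_logits) - 1))

logits = [torch.randn(8, 100, requires_grad=True) for _ in range(4)]
mutual_learning_loss(logits).backward()
```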
6. Lu Z, Lin R, Hu H. Disentangling Modality and Posture Factors: Memory-Attention and Orthogonal Decomposition for Visible-Infrared Person Re-Identification. IEEE Transactions on Neural Networks and Learning Systems 2025;36:5494-5508. PMID: 38619964. DOI: 10.1109/tnnls.2024.3384023.
Abstract
Striving to match person identities between visible (VIS) and near-infrared (NIR) images, VIS-NIR reidentification (Re-ID) has attracted increasing attention due to its wide applications in low-light scenes. However, owing to the modality and pose discrepancies exhibited in heterogeneous images, the extracted representations inevitably comprise various modality and posture factors, impacting the matching of cross-modality person identity. To solve this problem, we propose a disentangling modality and posture factors (DMPF) model that disentangles modality and posture factors by fusing information from a feature memory and the pedestrian skeleton. Specifically, DMPF comprises three modules: a three-stream features extraction network (TFENet), modality factor disentanglement (MFD), and posture factor disentanglement (PFD). First, aiming to provide memory and skeleton information for modality and posture factor disentanglement, the TFENet is designed as a three-stream network to extract VIS-NIR image features and skeleton features. Second, to eliminate modality discrepancy across different batches, we maintain memory queues of previous batch features through a momentum updating mechanism and propose MFD to integrate features over the whole training set by memory-attention layers. These layers explore intramodality and intermodality relationships between features from the current batch and the memory queues under the optimization of the optimal transport (OT) method, which encourages heterogeneous features with the same identity to present higher similarity. Third, to decouple the posture factors from representations, we introduce the PFD module to learn posture-unrelated features with the assistance of the skeleton features. Besides, we perform subspace orthogonal decomposition on both image and skeleton features to separate the posture-related and identity-related information. The posture-related features are adopted to disentangle the posture factors from representations by a designed posture-features consistency (PfC) loss, while the identity-related features are concatenated to obtain more discriminative identity representations. The effectiveness of DMPF is validated through comprehensive experiments on two VIS-NIR pedestrian Re-ID datasets.
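The momentum-updated memory queue plus memory-attention is a concrete, reusable pattern. Below is a simplified sketch of that pattern, assuming one EMA slot per identity and plain dot-product attention over the bank; the paper's OT-based optimization is omitted, and all names are hypothetical.

```python
# Hypothetical sketch of a momentum-updated feature memory: each identity slot
# is an exponential moving average of the features seen for it, and the
# current batch attends over the whole bank to borrow cross-batch context.
import torch
import torch.nn.functional as F

class FeatureMemory:
    def __init__(self, n_ids, dim, momentum=0.9):
        self.bank = torch.zeros(n_ids, dim)
        self.m = momentum

    @torch.no_grad()
    def update(self, feats, ids):
        for f, i in zip(feats, ids):
            self.bank[i] = self.m * self.bank[i] + (1 - self.m) * f

    def attend(self, queries):
        # Memory-attention: batch features attend over the stored bank.
        attn = F.softmax(queries @ self.bank.t(), dim=-1)
        return attn @ self.bank

mem = FeatureMemory(n_ids=500, dim=256)
batch = F.normalize(torch.randn(16, 256), dim=-1)
mem.update(batch, ids=torch.randint(0, 500, (16,)))
context = mem.attend(batch)   # cross-batch context for each sample
```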
7. Singh J, Murala S, Kosuru GSR. KL-DNAS: Knowledge Distillation-Based Latency Aware-Differentiable Architecture Search for Video Motion Magnification. IEEE Transactions on Neural Networks and Learning Systems 2025;36:2342-2352. PMID: 38190685. DOI: 10.1109/tnnls.2023.3346169.
Abstract
Video motion magnification is the task of making subtle, minute motions visible. Subtle motions frequently occur that are invisible to the naked eye, e.g., slight deformations in an athlete's muscles, small vibrations in objects, microexpressions, and chest movement while breathing. Magnification of such small motions has enabled various applications, such as posture deformity detection, microexpression recognition, and the study of structural properties. State-of-the-art (SOTA) methods have fixed computational complexity, which makes them less suitable for applications with different time constraints, e.g., real-time respiratory rate measurement and microexpression classification. To solve this problem, we propose a knowledge distillation-based latency aware-differentiable architecture search (KL-DNAS) method for video motion magnification. To reduce memory requirements and improve denoising characteristics, we use a teacher network to search the network by parts using knowledge distillation (KD). Furthermore, the search covers different receptive fields and multifeature connections for individual layers. A novel latency loss is also proposed to jointly optimize the target latency constraint and the output quality. We are able to find a smaller model than the SOTA method with better motion magnification and fewer distortions. https://github.com/jasdeep-singh-007/KL-DNAS.
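The novel latency loss is described only as jointly optimizing a latency target and output quality. One plausible form, sketched below, penalizes a candidate only when its latency exceeds the budget (in a real DNAS setting the latency would be a differentiable expectation over architecture parameters), alongside a distillation term from the teacher; all weights and names are assumptions.

```python
# Hypothetical sketch of a latency-aware search objective: task loss plus a
# knowledge-distillation term from the teacher plus a hinge penalty that only
# activates when the candidate's expected latency exceeds the target budget.
import torch
import torch.nn.functional as F

def kl_dnas_loss(student_out, teacher_out, target, latency_ms,
                 budget_ms=10.0, kd_w=0.5, lat_w=0.1):
    task = F.mse_loss(student_out, target)               # magnification quality
    kd = F.mse_loss(student_out, teacher_out.detach())   # distill from teacher
    latency = lat_w * F.relu(latency_ms - budget_ms)     # penalize over-budget only
    return task + kd_w * kd + latency

out = torch.randn(2, 3, 64, 64, requires_grad=True)
loss = kl_dnas_loss(out, torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64),
                    latency_ms=torch.tensor(12.0))
loss.backward()
```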
8. Lopes V, Alexandre LA. Toward Less Constrained Macro-Neural Architecture Search. IEEE Transactions on Neural Networks and Learning Systems 2025;36:2854-2868. PMID: 37906493. DOI: 10.1109/tnnls.2023.3326648.
Abstract
Networks found with neural architecture search (NAS) achieve state-of-the-art performance in a variety of tasks, outperforming human-designed networks. However, most NAS methods heavily rely on human-defined assumptions that constrain the search: the architecture's outer skeleton, number of layers, parameter heuristics, and search spaces. In addition, common search spaces consist of repeatable modules (cells) instead of fully exploring the architecture space by designing entire architectures (macro-search). Imposing such constraints requires deep human expertise and restricts the search to predefined settings. In this article, we propose less constrained macro-neural architecture search (LCMNAS), a method that pushes NAS to less constrained search spaces by performing macro-search without relying on predefined heuristics or bounded search spaces. LCMNAS introduces three components to the NAS pipeline: 1) a method that leverages information about well-known architectures to autonomously generate complex search spaces based on weighted directed graphs (WDGs) with hidden properties; 2) an evolutionary search strategy that generates complete architectures from scratch; and 3) a mixed-performance estimation approach that combines information about architectures at the initialization stage with lower fidelity estimates to infer their trainability and capacity to model complex functions. We present experiments on 14 different datasets showing that LCMNAS is capable of generating both cell- and macro-based architectures with minimal GPU computation and state-of-the-art results. Moreover, we conduct extensive studies on the importance of different NAS components in both cell- and macro-based settings. The code for reproducibility is publicly available at https://github.com/VascoLopes/LCMNAS.
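Component 1), generating search spaces from weighted directed graphs, suggests sampling architectures by weight-proportional walks over a WDG distilled from known networks. Below is a toy sketch of that sampling step; the graph contents and names are invented for illustration.

```python
# Hypothetical sketch of sampling a layer sequence from a weighted directed
# graph (WDG): edge weights act as transition probabilities in a random walk
# from an input node to an output node.
import random

def sample_architecture(wdg, start='input', end='output', max_len=20):
    """wdg: {node: [(next_node, weight), ...]}; returns a list of layer names."""
    path, node = [start], start
    while node != end and len(path) < max_len:
        nxt, w = zip(*wdg[node])
        node = random.choices(nxt, weights=w, k=1)[0]  # weight-proportional step
        path.append(node)
    return path

wdg = {'input': [('conv3x3', 5), ('conv1x1', 2)],
       'conv3x3': [('conv3x3', 3), ('pool', 2), ('output', 1)],
       'conv1x1': [('conv3x3', 4), ('output', 1)],
       'pool': [('conv1x1', 2), ('output', 2)]}
print(sample_architecture(wdg))
```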
9. Wu Q, Li J, Dai P, Ye Q, Cao L, Wu Y, Ji R. Unsupervised Domain Adaptation on Person Reidentification Via Dual-Level Asymmetric Mutual Learning. IEEE Transactions on Neural Networks and Learning Systems 2025;36:1371-1382. PMID: 37934637. DOI: 10.1109/tnnls.2023.3326477.
Abstract
Unsupervised domain adaptation (UDA) person reidentification (Re-ID) aims to identify pedestrian images within an unlabeled target domain with the help of an auxiliary labeled source-domain dataset. Many existing works attempt to recover reliable identity information by considering multiple homogeneous networks and take the generated labels to train the model in the target domain. However, these homogeneous networks identify people in approximate subspaces and equally exchange their knowledge with each other or their mean net to improve their ability, inevitably limiting the scope of available knowledge and leading them into the same mistakes. This article proposes a dual-level asymmetric mutual learning (DAML) method to learn discriminative representations from a broader knowledge scope with diverse embedding spaces. Specifically, two heterogeneous networks mutually learn knowledge from asymmetric subspaces through pseudo label generation in a hard distillation manner. The knowledge transfer between the two networks is based on an asymmetric mutual learning (AML) manner. The teacher network learns to identify both the target and source domains while adapting to the target domain distribution based on the knowledge of the student. Meanwhile, the student network is trained on the target dataset and employs the ground-truth labels through the knowledge of the teacher. Extensive experiments on the Market-1501, CUHK-SYSU, and MSMT17 public datasets verify the superiority of DAML over state-of-the-art (SOTA) methods.
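The hard-distillation pseudo-label generation can be illustrated compactly: cluster the teacher's target-domain embeddings into pseudo identities and train the student on the resulting hard labels. Here is a hypothetical sketch using k-means as the clustering step; the paper's actual label-generation procedure may differ.

```python
# Hypothetical sketch of hard-distillation pseudo-labeling: the teacher's
# target-domain features are clustered into pseudo identities, and the student
# is supervised with those hard labels.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def pseudo_label_loss(teacher_feats, student_logits, n_ids=50):
    labels = KMeans(n_clusters=n_ids, n_init=10).fit_predict(
        teacher_feats.detach().cpu().numpy())              # hard pseudo-identities
    labels = torch.as_tensor(labels, dtype=torch.long)
    return F.cross_entropy(student_logits, labels)         # student learns them

t_feats = torch.randn(128, 256)                 # teacher embeddings on target data
s_logits = torch.randn(128, 50, requires_grad=True)
pseudo_label_loss(t_feats, s_logits).backward()
```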
10. Zhu K, Guo H, Zhang S, Wang Y, Liu J, Wang J, Tang M. AAformer: Auto-Aligned Transformer for Person Re-Identification. IEEE Transactions on Neural Networks and Learning Systems 2024;35:17307-17317. PMID: 37624720. DOI: 10.1109/tnnls.2023.3301856.
Abstract
In person re-identification (re-ID), extracting part-level features from person images has been verified to be crucial for offering fine-grained information. Most of the existing CNN-based methods only locate the human parts coarsely, or rely on pretrained human parsing models and fail to locate identifiable nonhuman parts (e.g., knapsack). In this article, we introduce an alignment scheme in the transformer architecture for the first time and propose the auto-aligned transformer (AAformer) to automatically locate both the human parts and nonhuman ones at patch level. We introduce the "Part tokens ([PART]s)," which are learnable vectors, to extract part features in the transformer. A [PART] only interacts with a local subset of patches in self-attention and learns to be the part representation. To adaptively group the image patches into different subsets, we design the auto-alignment. Auto-alignment employs a fast variant of the optimal transport (OT) algorithm to cluster the patch embeddings online into several groups with the [PART]s as their prototypes. AAformer integrates part alignment into the self-attention, and the output [PART]s can be directly used as part features for retrieval. Extensive experiments validate the effectiveness of [PART]s and the superiority of AAformer over various state-of-the-art methods.
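The "fast variant of optimal transport" used by auto-alignment is in the Sinkhorn family; below is a compact sketch of Sinkhorn-style balanced assignment of patch embeddings to [PART] prototypes. The iteration count, temperature, and the final pooling step are assumptions, not the paper's settings.

```python
# Hypothetical sketch of OT-based part alignment: a few Sinkhorn iterations
# produce a (nearly) balanced soft assignment of patches to [PART] prototypes,
# which then act as part features.
import torch

def sinkhorn_assign(patches, parts, iters=3, eps=0.05):
    """patches: (n, d), parts: (k, d); returns an (n, k) assignment matrix."""
    logits = patches @ parts.t() / eps
    q = torch.exp(logits - logits.max())          # stabilized similarity kernel
    for _ in range(iters):
        q = q / q.sum(dim=0, keepdim=True)        # balance the parts (columns)
        q = q / q.sum(dim=1, keepdim=True)        # normalize the patches (rows)
    return q

patches, parts = torch.randn(196, 768), torch.randn(6, 768)
assign = sinkhorn_assign(patches, parts)
# Updated part features: assignment-weighted average of their patches.
part_feats = assign.t() @ patches / assign.t().sum(dim=1, keepdim=True)
```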
11. Yan S, Tang H, Zhang L, Tang J. Image-Specific Information Suppression and Implicit Local Alignment for Text-Based Person Search. IEEE Transactions on Neural Networks and Learning Systems 2024;35:17973-17986. PMID: 37713222. DOI: 10.1109/tnnls.2023.3310118.
Abstract
Text-based person search (TBPS) is a challenging task that aims to retrieve pedestrian images with the same identity from an image gallery given a query text. In recent years, TBPS has made remarkable progress, and state-of-the-art (SOTA) methods achieve superior performance by learning local fine-grained correspondence between images and texts. However, most existing methods rely on explicitly generated local parts to model fine-grained correspondence between modalities, which is unreliable due to the lack of contextual information or the potential introduction of noise. Moreover, existing methods seldom consider the information inequality problem between modalities caused by image-specific information. To address these limitations, we propose an efficient joint multilevel alignment network (MANet) for TBPS, which can learn aligned image/text feature representations between modalities at multiple levels and realize fast and effective person search. Specifically, we first design an image-specific information suppression (ISS) module, which suppresses image background and environmental factors by relation-guided localization (RGL) and channel attention filtration (CAF), respectively. This module effectively alleviates the information inequality problem and realizes the alignment of information volume between images and texts. Second, we propose an implicit local alignment (ILA) module to adaptively aggregate all pixel/word features of an image/text to a set of modality-shared semantic topic centers and implicitly learn the local fine-grained correspondence between modalities without additional supervision and cross-modal interactions. Also, a global alignment (GA) module is introduced as a supplement to the local perspective. The cooperation of the global and local alignment modules enables better semantic alignment between modalities. Extensive experiments on multiple databases demonstrate the effectiveness and superiority of our MANet.
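Channel attention filtration (CAF) reads as an SE-style per-channel gate that suppresses image-specific channels. Below is a minimal hypothetical sketch of such a gate; the reduction ratio and names are illustrative only.

```python
# Hypothetical sketch of channel attention filtration (CAF): an SE-style gate
# scores each channel and suppresses those carrying image-specific content
# (background, environmental factors) before cross-modal matching.
import torch
import torch.nn as nn

class ChannelFiltration(nn.Module):
    def __init__(self, channels=2048, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, feat_map):                  # (batch, C, H, W)
        pooled = feat_map.mean(dim=(2, 3))        # global context per channel
        w = self.gate(pooled)                     # per-channel keep/suppress score
        return feat_map * w[:, :, None, None]

filtered = ChannelFiltration()(torch.randn(2, 2048, 24, 8))
```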
12. Dai Y, Wang X, Gao L, Song J, Zheng F, Shen HT. Overcoming Data Deficiency for Multi-Person Pose Estimation. IEEE Transactions on Neural Networks and Learning Systems 2024;35:10857-10868. PMID: 37163399. DOI: 10.1109/tnnls.2023.3244957.
Abstract
Building multi-person pose estimation (MPPE) models that can handle complex foregrounds and uncommon scenes is an important challenge in computer vision. Aside from designing novel models, strengthening training data is a promising direction but remains largely unexplored for the MPPE task. In this article, we systematically identify the key deficiencies of existing pose datasets that prevent the power of well-designed models from being fully exploited and propose the corresponding solutions. Specifically, we find that traditional data augmentation techniques are inadequate in addressing two key deficiencies: imbalanced instance complexity (evaluated by our new metric IC) and insufficient realistic scenes. To overcome these deficiencies, we propose a model-agnostic full-view data generation (Full-DG) method to enrich the training data from the perspectives of both poses and scenes. By hallucinating images with more balanced pose complexity and richer real-world scenes, Full-DG can help improve pose estimators' robustness and generalizability. In addition, we introduce a plug-and-play adaptive category-aware loss (AC-loss) to alleviate the severe pixel-level imbalance between keypoints and backgrounds (i.e., around 1:600). Full-DG together with AC-loss can be readily applied to both bottom-up and top-down models to improve their accuracy. Notably, plugged into the representative estimators HigherHRNet and HRNet, our method achieves substantial performance gains of 1.0%-2.9% AP on the COCO benchmark and 1.0%-5.1% AP on the CrowdPose benchmark.
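The AC-loss targets the roughly 1:600 keypoint-to-background pixel imbalance. The following is a simplified, hypothetical sketch of the underlying idea, reweighting heatmap pixels by foreground/background membership; the adaptive, category-aware weighting of the actual AC-loss is reduced here to a fixed weight.

```python
# Hypothetical sketch of a balance-aware heatmap loss: pixels near keypoints
# (~1/600 of the image) receive a much larger weight than background pixels,
# counteracting the severe pixel-level imbalance the abstract reports.
import torch

def balanced_heatmap_loss(pred, target, fg_thresh=0.1, fg_weight=600.0):
    """pred/target: (batch, keypoints, H, W) Gaussian heatmaps."""
    fg = (target > fg_thresh).float()
    weight = fg * fg_weight + (1 - fg)            # upweight keypoint pixels
    return (weight * (pred - target) ** 2).mean()

pred = torch.rand(2, 17, 128, 128, requires_grad=True)
balanced_heatmap_loss(pred, torch.rand(2, 17, 128, 128)).backward()
```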
13. Zheng Z, Wang X, Zheng N, Yang Y. Parameter-Efficient Person Re-Identification in the 3D Space. IEEE Transactions on Neural Networks and Learning Systems 2024;35:7534-7547. PMID: 36315532. DOI: 10.1109/tnnls.2022.3214834.
Abstract
People live in a 3D world. However, existing works on person re-identification (re-id) mostly consider semantic representation learning in a 2D space, intrinsically limiting the understanding of people. In this work, we address this limitation by exploring the prior knowledge of the 3D body structure. Specifically, we project 2D images to a 3D space and introduce a novel parameter-efficient omni-scale graph network (OG-Net) to learn the pedestrian representation directly from 3D point clouds. OG-Net effectively exploits the local information provided by sparse 3D points and takes advantage of the structure and appearance information in a coherent manner. With the help of 3D geometry information, we can learn a new type of deep re-id feature free from noisy variants, such as scale and viewpoint. To our knowledge, we are among the first attempts to conduct person re-id in the 3D space. We demonstrate through extensive experiments that the proposed method: 1) eases the matching difficulty in the traditional 2D space; 2) exploits the complementary information of 2D appearance and 3D structure; 3) achieves competitive results with limited parameters on four large-scale person re-id datasets; and 4) has good scalability to unseen datasets. Our code, models, and generated 3D human data are publicly available at https://github.com/layumi/person-reid-3d.
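Learning from sparse 3D points with coupled structure and appearance can be illustrated with a k-nearest-neighbor aggregation over a colored point cloud. The sketch below is a generic, hypothetical reduction of that idea, not OG-Net's omni-scale design.

```python
# Hypothetical sketch of point-cloud feature learning for re-id: each 3D point
# carries its RGB appearance, and per-point features are max-pooled over the
# k nearest spatial neighbors, coupling structure with appearance.
import torch
import torch.nn as nn

def knn_aggregate(xyz, feats, k=8):
    """xyz: (n, 3) positions; feats: (n, c); returns (n, c) neighborhood features."""
    dist = torch.cdist(xyz, xyz)                  # pairwise point distances
    idx = dist.topk(k, largest=False).indices     # k nearest neighbors per point
    return feats[idx].max(dim=1).values           # max-pool over each neighborhood

n = 1024
xyz, rgb = torch.rand(n, 3), torch.rand(n, 3)     # toy pedestrian point cloud
feats = nn.Linear(3, 64)(rgb)                     # appearance features per point
local = knn_aggregate(xyz, feats)                 # structure-aware aggregation
```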
14. Peng C, Li Y, Shang R, Jiao L. ReCNAS: Resource-Constrained Neural Architecture Search Based on Differentiable Annealing and Dynamic Pruning. IEEE Transactions on Neural Networks and Learning Systems 2024;35:2805-2819. PMID: 35862327. DOI: 10.1109/tnnls.2022.3192169.
Abstract
The differentiable neural architecture search (NAS) framework has obtained extensive attention and achieved remarkable performance due to its search efficiency. However, most existing differentiable NAS methods still suffer from issues of model collapse, degenerated search-evaluation correlation, and inefficient hardware deployment, which cause the searched architectures to be suboptimal in accuracy and unable to meet different computation resource constraints (e.g., FLOPs and latency). In this article, we propose a novel resource-constrained NAS (ReCNAS) method, which can efficiently search high-performance architectures that satisfy the given constraints and deals with the issues observed in previous differentiable NAS methods from three aspects: search space, search strategy, and resource adaptability. First, we introduce an elastic densely connected layerwise search space, which decouples the architecture depth representation from the search of candidate operations to alleviate the aggregation of skip connections and architecture redundancies. Second, a scheme of group annealing and progressive pruning is proposed to improve the efficiency and bridge the search-evaluation gap, which steadily forces the architecture parameters toward a binary distribution and progressively prunes the inferior operations. Third, we present a novel resource-constrained architecture generation method, which prunes redundant channels throughout the search based on dynamic programming, making the searched architecture scalable to different devices and requirements. Extensive experimental results demonstrate the efficiency and search stability of our ReCNAS, which is capable of discovering high-performance architectures on different datasets and tasks, surpassing other NAS methods while tightly meeting the target resource constraints without any tuning required. Besides, the searched architectures show strong generalizability to other complex vision tasks.
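The dynamic-programming channel pruning admits a compact illustration: treat each layer's width options as knapsack items with a FLOPs cost and an importance score, and select one option per layer under a budget. Below is a toy sketch of that DP; the scores, costs, and budget are invented.

```python
# Hypothetical sketch of constrained channel selection as a knapsack-style DP:
# per layer, each channel-width option has an importance score and a FLOPs
# cost; the DP picks one option per layer maximizing importance within budget.
def dp_channel_select(options, budget):
    """options[l] = list of (flops, importance) per width choice for layer l."""
    best = {0: (0.0, [])}                          # flops_used -> (score, choices)
    for layer in options:
        nxt = {}
        for used, (score, picks) in best.items():
            for choice, (flops, imp) in enumerate(layer):
                u = used + flops
                if u <= budget and (u not in nxt or nxt[u][0] < score + imp):
                    nxt[u] = (score + imp, picks + [choice])
        best = nxt
    return max(best.values())                      # (total importance, per-layer choices)

layers = [[(4, 1.0), (8, 1.8)], [(6, 1.2), (12, 2.0)], [(5, 0.9), (10, 1.6)]]
print(dp_channel_select(layers, budget=25))        # -> (4.7, [1, 1, 0])
```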