1
Jiang J, Liu X, Yan P, Wei S, Cui Y. Localize-diffusion based dual-branch anomaly detection. Neural Netw 2025; 188:107439. [PMID: 40187081 DOI: 10.1016/j.neunet.2025.107439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 02/19/2025] [Accepted: 03/24/2025] [Indexed: 04/07/2025]
Abstract
Due to the scarcity of real anomaly samples in anomaly detection, data augmentation methods are typically employed to generate pseudo anomaly samples that supplement the limited real samples. However, existing data augmentation methods often generate fixed-shape image patches as anomalies in random regions. Such anomalies are unrealistic and lack diversity, so the generated samples have limited practical value. To address this issue, we propose a dual-branch anomaly detection (DBA) technique based on Localize-Diffusion (LD) augmentation. LD infers the approximate position and size of the object to be detected from the samples' color distribution, which effectively prevents patches from being generated outside the target object. LD then applies hard augmentation and continuously propagates irregular patches into the surrounding area, enriching the diversity of the generated samples. To handle the multi-scale characteristics of anomalies, DBA uses two branches, trained on the generated pseudo anomaly samples, for anomaly detection: one focuses on identifying anomaly-specific features from learned anomalies, while the other discriminates between normal and anomaly samples based on residual features in the latent space. Finally, an adaptive scoring module computes a weighted average of the two branches' outputs to produce the final detection result. Extensive experiments show that DBA achieves excellent anomaly detection performance with only 14.2M parameters, notably reaching a detection AUC of 99.6% on the MVTec AD dataset.
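The localize-then-diffuse idea can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a grayscale image stored as a list of rows, approximates "the samples' color distribution" by the mean border intensity (treated as background), grows the patch by probabilistic 4-neighbour spreading, and uses intensity inversion as a stand-in for the hard augmentation. The names `estimate_object_box`, `diffuse_mask`, and `make_pseudo_anomaly` are hypothetical.

```python
import random
from typing import List, Set, Tuple

Image = List[List[float]]        # grayscale rows, intensities in [0, 1]
Box = Tuple[int, int, int, int]  # (y0, x0, y1, x1), inclusive


def estimate_object_box(img: Image, thresh: float = 0.2) -> Box:
    """Assumed localization step: pixels whose intensity differs from the mean
    border (background) intensity by more than `thresh` count as the object;
    their bounding box is returned."""
    h, w = len(img), len(img[0])
    border = img[0] + img[h - 1] + [img[y][0] for y in range(h)] + [img[y][w - 1] for y in range(h)]
    bg = sum(border) / len(border)
    fg = [(y, x) for y in range(h) for x in range(w) if abs(img[y][x] - bg) > thresh]
    if not fg:  # nothing stands out from the background: use the whole image
        return 0, 0, h - 1, w - 1
    ys = [y for y, _ in fg]
    xs = [x for _, x in fg]
    return min(ys), min(xs), max(ys), max(xs)


def diffuse_mask(box: Box, steps: int, rng: random.Random, p_spread: float = 0.6) -> Set[Tuple[int, int]]:
    """Grow an irregular patch from a random seed inside the box: at each step,
    every newly added pixel spreads to each 4-neighbour with probability p_spread."""
    y0, x0, y1, x1 = box
    seed = (rng.randint(y0, y1), rng.randint(x0, x1))  # randint bounds are inclusive
    mask = {seed}
    frontier = [seed]
    for _ in range(steps):
        grown = []
        for y, x in frontier:
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                inside = y0 <= ny <= y1 and x0 <= nx <= x1
                if inside and (ny, nx) not in mask and rng.random() < p_spread:
                    mask.add((ny, nx))
                    grown.append((ny, nx))
        if not grown:  # growth died out early
            break
        frontier = grown
    return mask


def make_pseudo_anomaly(img: Image, steps: int = 4, seed: int = 0) -> Tuple[Image, Set[Tuple[int, int]]]:
    """Localize, diffuse, then perturb only the masked pixels."""
    rng = random.Random(seed)
    mask = diffuse_mask(estimate_object_box(img), steps, rng)
    out = [row[:] for row in img]
    for y, x in mask:
        out[y][x] = 1.0 - out[y][x]  # stand-in for a "hard" augmentation
    return out, mask


# A bright 4x4 "object" on a dark 10x10 background.
img = [[0.8 if 3 <= y <= 6 and 3 <= x <= 6 else 0.0 for x in range(10)] for y in range(10)]
print(estimate_object_box(img))  # (3, 3, 6, 6)
```

Because the mask never leaves the estimated box, the synthetic defect always lands on the object rather than the background, which is the failure mode the abstract attributes to random fixed-shape patches.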
Affiliation(s)
- Jielin Jiang
- School of Software, Nanjing University of Information Science and Technology, Nanjing, 210044, Jiangsu, China; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, Jiangsu, China; Jiangsu Province Engineering Research Center of Advanced Computing and Intelligent Services, Nanjing University of Information Science and Technology, Nanjing, 210044, Jiangsu, China.
- Xiying Liu
- School of Software, Nanjing University of Information Science and Technology, Nanjing, 210044, Jiangsu, China.
- Peiyi Yan
- School of Software, Nanjing University of Information Science and Technology, Nanjing, 210044, Jiangsu, China.
- Shun Wei
- School of Software, Nanjing University of Information Science and Technology, Nanjing, 210044, Jiangsu, China.
- Yan Cui
- College of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing, 210038, Jiangsu, China.
2
Ma X, Wu J, Liu W. SAC-BL: A hypothesis testing framework for unsupervised visual anomaly detection and location. Neural Netw 2025; 185:107147. [PMID: 39892355 DOI: 10.1016/j.neunet.2025.107147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 10/28/2024] [Accepted: 01/10/2025] [Indexed: 02/03/2025]
Abstract
Reconstruction-based methods achieve promising performance for visual anomaly detection (AD), relying on the underlying assumption that anomalies cannot be accurately reconstructed. However, this assumption does not always hold, especially for weakly anomalous (a.k.a. normal-like) examples. More significantly, existing methods primarily focus on obtaining strongly discriminative score functions while neglecting a systematic investigation of the decision rule built on the resulting score function. Unlike previous work, this paper addresses AD starting from the decision rule within a statistical framework, providing a new insight for the AD community. Specifically, we frame the AD task as a multiple hypothesis testing problem. We then propose a novel betting-like (BL) procedure with an embedded strong anomaly constraint network (SACNet), called SAC-BL, to address this testing problem. In SAC-BL, the BL procedure serves as the decision rule, and SACNet is trained to capture critical discriminative information from weak anomalies. Theoretically, SAC-BL controls the false discovery rate (FDR) at a prescribed level. Finally, we conduct extensive experiments to verify the superiority of SAC-BL over previous methods.
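The abstract does not spell out the betting-like procedure, so the sketch below does not reproduce SAC-BL. It only illustrates the general framing it builds on: each test image carries the null hypothesis "this image is normal", and the decision rule should control the FDR. As a swapped-in, standard decision rule, it computes conformal p-values of test anomaly scores against scores of held-out normal images and applies the Benjamini-Hochberg step-up procedure, a combination known to control FDR when normal calibration and test scores are exchangeable. The function names and scores are hypothetical; any score function (for example, a reconstruction error) could supply them.

```python
from typing import List


def conformal_p_values(calib: List[float], test: List[float]) -> List[float]:
    """p-value of each test score against scores of held-out normal images:
    (1 + #calibration scores >= s) / (n + 1). Higher score = more anomalous."""
    n = len(calib)
    return [(1 + sum(c >= s for c in calib)) / (n + 1) for s in test]


def benjamini_hochberg(pvals: List[float], alpha: float) -> List[bool]:
    """Reject the k smallest p-values, where k is the largest rank with
    p_(k) <= k * alpha / m. True means 'flag as anomalous'."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


calib = [i / 20 for i in range(1, 20)]  # scores of 19 held-out normal images
test = [0.99, 0.98, 0.5, 0.2]           # scores of 4 test images
p = conformal_p_values(calib, test)     # approximately [0.05, 0.05, 0.55, 0.85]
print(benjamini_hochberg(p, alpha=0.2))  # [True, True, False, False]
```

The step-up rule is what separates this from thresholding each score independently: the rejection threshold adapts to how many small p-values appear in the batch.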
Affiliation(s)
- Xinsong Ma
- School of Computer Science, Wuhan University, 299 Ba Yi Road, Wuchang District, Wuhan, 430072, Hubei, China.
- Jie Wu
- School of Computer Science, Wuhan University, 299 Ba Yi Road, Wuchang District, Wuhan, 430072, Hubei, China.
- Weiwei Liu
- School of Computer Science, Wuhan University, 299 Ba Yi Road, Wuchang District, Wuhan, 430072, Hubei, China.
3
Han D, Xu L, Zhou M, Wan J, Li M, Li G. Reconsidering learnable fine-grained text prompts for few-shot anomaly detection in visual-language models. Neural Netw 2025; 182:106906. [PMID: 39581046 DOI: 10.1016/j.neunet.2024.106906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 10/25/2024] [Accepted: 11/08/2024] [Indexed: 11/26/2024]
Abstract
Few-Shot Anomaly Detection (FSAD) in industrial images aims to identify abnormalities using only a few normal images, which is crucial for industrial scenarios where training samples are limited. Recent advances in large-scale pre-trained visual-language models have brought significant improvements to FSAD; these methods typically rely on hundreds of text prompts manually crafted through prompt engineering. However, manually designed text prompts cannot accurately match the informative features of different categories across diverse images, and the domain gap between training and test datasets can severely impair the generalization capability of text prompts. To address these issues, we propose a visual-language model based on fine-grained learnable text prompts as a unified general framework for industrial FSAD. First, we design a Fine-grained Text Prompts Adapter (FTPA) and an associated registration loss to enhance the efficiency of text prompts. The manually designed text prompts are refined and optimized by capturing normal and abnormal semantic information in the image, so that the text prompts describe the image semantics at a finer granularity. In addition, we introduce a Dynamic Modulation Mechanism (DMM) to avoid potential errors in the trained text prompts caused by the unknown target domain during cross-dataset detection. This is achieved by explicitly modulating the branch guided by few-shot images and the branch guided by fine-grained text prompts. Extensive experiments demonstrate that the proposed method achieves state-of-the-art few-shot industrial anomaly detection and segmentation performance. In the 4-shot setting, the anomaly classification and anomaly segmentation AUROCs reach 98.3% and 96.3% on MVTec-AD, and 93.8% and 97.9% on VisA, respectively.
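To make the two-branch setup concrete, here is a toy sketch that scores an image with a text-prompt branch and a few-shot visual branch and then fuses them. It is a simplified stand-in, not FTPA or DMM: it assumes precomputed embeddings (plain Python lists rather than real visual-language model features), a two-way softmax over cosine similarities to one "normal" and one "abnormal" text embedding, a nearest-neighbour cosine distance to the few normal support images, and a fixed fusion weight where the paper modulates the branches dynamically. All names and the `temperature` and `weight` values are hypothetical.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def text_branch_score(feat: Vec, normal_txt: Vec, abnormal_txt: Vec, temperature: float = 0.07) -> float:
    """Probability of 'abnormal' from a two-way softmax over cosine similarities."""
    s_n = cosine(feat, normal_txt) / temperature
    s_a = cosine(feat, abnormal_txt) / temperature
    m = max(s_n, s_a)  # subtract the max for numerical stability
    e_n, e_a = math.exp(s_n - m), math.exp(s_a - m)
    return e_a / (e_n + e_a)


def visual_branch_score(feat: Vec, support: List[Vec]) -> float:
    """Few-shot branch: cosine distance to the nearest normal support feature, scaled to [0, 1]."""
    return (1.0 - max(cosine(feat, s) for s in support)) / 2.0


def fused_score(feat: Vec, normal_txt: Vec, abnormal_txt: Vec,
                support: List[Vec], weight: float = 0.5) -> float:
    """Fixed convex combination of the two branch scores."""
    t = text_branch_score(feat, normal_txt, abnormal_txt)
    v = visual_branch_score(feat, support)
    return weight * t + (1.0 - weight) * v


normal_txt, abnormal_txt = [1.0, 0.0], [0.0, 1.0]
support = [[1.0, 0.1], [0.9, 0.0]]  # embeddings of the few normal shots
print(fused_score([1.0, 0.05], normal_txt, abnormal_txt, support))  # low score
print(fused_score([0.1, 1.0], normal_txt, abnormal_txt, support))   # much higher score
```

A fixed `weight` trusts both branches equally on every dataset; the abstract's motivation for DMM is precisely that the text branch may be unreliable on an unseen domain, so a learned, input-dependent weighting would replace this constant.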
Affiliation(s)
- Delong Han
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250353, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, 250014, China.
- Luo Xu
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250353, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, 250014, China.
- Mingle Zhou
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250353, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, 250014, China.
- Jin Wan
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250353, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, 250014, China.
- Min Li
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250353, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, 250014, China.
- Gang Li
- Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250353, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, 250014, China.