1
Wang G, Zhang X, Peng Z, Zhang T, Tang X, Zhou H, Jiao L. Negative Deterministic Information-Based Multiple Instance Learning for Weakly Supervised Object Detection and Segmentation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:6188-6202. [PMID: 38748523 DOI: 10.1109/tnnls.2024.3395751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2025]
Abstract
Weakly supervised object detection (WSOD) and semantic segmentation with image-level annotations have attracted extensive attention due to their high label efficiency. Multiple instance learning (MIL) offers a feasible solution for the two tasks by treating each image as a bag with a series of instances (object regions or pixels) and identifying foreground instances that contribute to bag classification. However, conventional MIL paradigms often suffer from issues such as discriminative instance domination and missing instances. In this article, we observe that negative instances usually contain valuable deterministic information, which is the key to solving the two issues. Motivated by this, we propose a novel MIL paradigm based on negative deterministic information (NDI), termed NDI-MIL, which is based on two core designs with a progressive relation: NDI collection and negative contrastive learning (NCL). In NDI collection, we identify and distill NDI from negative instances online by a dynamic feature bank. The collected NDI is then utilized in an NCL mechanism to locate and punish those discriminative regions, by which the discriminative instance domination and missing instances issues are effectively addressed, leading to improved object- and pixel-level localization accuracy and completeness. In addition, we design an NDI-guided instance selection (NGIS) strategy to further enhance the overall performance. Experimental results on several public benchmarks, including PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO, show that our method achieves satisfactory performance. The code is available at: https://github.com/GC-WSL/NDI.
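The bag-and-instance view described in this abstract can be illustrated with a minimal sketch of generic MIL under max pooling. This is not the NDI-MIL method; the function names and the example scores are hypothetical and chosen only for illustration.

```python
def bag_score(instance_scores):
    """Standard MIL assumption: a bag is as positive as its most positive instance."""
    return max(instance_scores)


def top_instance(instance_scores):
    """Index of the instance that determines the bag score under max pooling."""
    return max(range(len(instance_scores)), key=lambda i: instance_scores[i])


# Hypothetical per-region scores for one class in one image (the "bag").
scores = [0.10, 0.92, 0.35, 0.40]
print(bag_score(scores))     # 0.92
print(top_instance(scores))  # 1
```

Under max pooling only the top-scoring region drives the bag-level prediction, which is one simple way to picture how a single highly discriminative instance can dominate while other foreground instances are missed.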
2
Jiao F, Shang Z, Lu H, Chen P, Chen S, Xiao J, Zhang F, Zhang D, Lv C, Han Y. A weakly supervised deep learning framework for automated PD-L1 expression analysis in lung cancer. Front Immunol 2025; 16:1540087. [PMID: 40230846 PMCID: PMC11994606 DOI: 10.3389/fimmu.2025.1540087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2024] [Accepted: 03/12/2025] [Indexed: 04/16/2025] [Open Access]
Abstract
The growing application of immune checkpoint inhibitors (ICIs) in cancer immunotherapy has underscored the critical need for reliable methods to identify patient populations likely to respond to ICI treatments, particularly in lung cancer treatment. Currently, the tumor proportion score (TPS), a crucial biomarker for patient selection, relies on manual interpretation by pathologists, which often shows substantial variability and inconsistency. To address these challenges, we developed multi-instance learning for TPS (MiLT), an artificial intelligence (AI)-powered tool that predicts TPS from whole slide images. Our approach leverages multiple instance learning (MIL), which significantly reduces the need for labor-intensive cell-level annotations while maintaining high accuracy. In comprehensive validation studies, MiLT demonstrated remarkable consistency with pathologist assessments (intraclass correlation coefficient = 0.960, 95% confidence interval = 0.950-0.971) and robust performance across both internal and external cohorts. This tool not only standardizes TPS evaluation but also adapts to various clinical standards and provides time-efficient predictions, potentially transforming routine pathological practice. By offering a reliable, AI-assisted solution, MiLT could significantly improve patient selection for immunotherapy and reduce inter-observer variability among pathologists. These promising results warrant further exploration in prospective clinical trials and suggest new possibilities for integrating advanced AI in pathological diagnostics. MiLT represents a significant step toward more precise and efficient cancer immunotherapy decision-making.
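For readers unfamiliar with the score itself, the following sketch shows how a TPS is conventionally computed from cell counts and mapped to a category. It is not part of MiLT, which predicts TPS from whole slide images; the function names, the category labels, and the default 1% and 50% cutoffs are assumptions used for illustration, and the cutoffs are exposed as a parameter because clinical standards differ.

```python
def tumor_proportion_score(positive_tumor_cells, total_tumor_cells):
    """TPS as the percentage of viable tumor cells that stain PD-L1 positive."""
    if total_tumor_cells <= 0:
        raise ValueError("total_tumor_cells must be positive")
    return 100.0 * positive_tumor_cells / total_tumor_cells


def tps_category(tps, cutoffs=(1.0, 50.0)):
    """Map a TPS percentage to a category; cutoffs are assumed defaults."""
    low, high = cutoffs
    if tps < low:
        return "negative"
    if tps < high:
        return "low"
    return "high"


tps = tumor_proportion_score(30, 200)  # hypothetical counts
print(tps, tps_category(tps))          # 15.0 low
```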
Affiliation(s)
- Feng Jiao
  - Department of Oncology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Zhanxian Shang
  - Department of Pathology, Shanghai Chest Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
- Hongmin Lu
  - Department of Oncology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Peilin Chen
  - Department of Clinical and Translational Medicine, 3D Medicines Inc., Shanghai, China
- Shiting Chen
  - Department of Clinical and Translational Medicine, 3D Medicines Inc., Shanghai, China
- Jiayi Xiao
  - School of Life Science and Technology, Tongji University, Shanghai, China
- Fuchuang Zhang
  - Department of Clinical and Translational Medicine, 3D Medicines Inc., Shanghai, China
- Dadong Zhang
  - Department of Clinical and Translational Medicine, 3D Medicines Inc., Shanghai, China
- Chunxin Lv
  - Department of Oncology, Shanghai Punan Hospital of Pudong New District, Shanghai, China
- Yuchen Han
  - Department of Pathology, Shanghai Chest Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai, China
3
Urabe A, Adachi M, Sakamoto N, Kojima M, Ishikawa S, Ishii G, Yano T, Sakashita S. Deep learning detected histological differences between invasive and non-invasive areas of early esophageal cancer. Cancer Sci 2025; 116:824-834. [PMID: 39692707 PMCID: PMC11875758 DOI: 10.1111/cas.16426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2024] [Revised: 11/18/2024] [Accepted: 11/25/2024] [Indexed: 12/19/2024] [Open Access]
Abstract
The depth of invasion plays a critical role in predicting the prognosis of early esophageal cancer, but the reasons behind invasion and the changes occurring in invasive areas are still not well understood. This study aimed to explore the morphological differences between invasive and non-invasive areas in early esophageal cancer specimens that have undergone endoscopic submucosal dissection (ESD), using artificial intelligence (AI) to shed light on the underlying mechanisms. In this study, data from 75 patients with esophageal squamous cell carcinoma (ESCC) were analyzed and endoscopic assessments were conducted to determine submucosal (SM) invasion. An AI model, specifically a Clustering-constrained Attention Multiple Instance Learning model (CLAM), was developed to predict the depth of cancer by training on surface histological images taken from both invasive and non-invasive regions. The AI model highlighted specific image portions, or patches, which were further examined to identify morphological differences between the two types of areas. The 256-pixel AI model demonstrated an average area under the receiver operating characteristic curve (AUC) value of 0.869 and an accuracy (ACC) of 0.788. The analysis of the AI-identified patches revealed that regions with invasion (SM) exhibited greater vascularity compared with non-invasive regions (epithelial). The invasive patches were characterized by a significant increase in the number and size of blood vessels, as well as a higher count of red blood cells (all with p-values <0.001). In conclusion, this study demonstrated that AI could identify critical differences in surface histopathology between non-invasive and invasive regions, particularly highlighting a higher number and larger size of blood vessels in invasive areas.
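The idea of an attention-based MIL model highlighting specific patches can be sketched as attention pooling over patch features. This is a generic illustration rather than the CLAM implementation: the attention logits are assumed to come from some learned scoring network that is not shown, and all names and numbers are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, attention_logits):
    """Slide-level feature as the attention-weighted average of patch features."""
    weights = softmax(attention_logits)
    dim = len(patch_features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, patch_features)) for d in range(dim)
    ]
    return pooled, weights


def top_patches(weights, k):
    """Indices of the k patches with the highest attention weight."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k]


# Hypothetical 2-D features for three patches and their attention logits.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(features, [0.2, 2.5, 0.1])
print(top_patches(weights, 2))  # [1, 0]
```

The highest-weighted patches are the ones a pathologist would then inspect, which mirrors the patch-level review described in the abstract.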
Affiliation(s)
- Akiko Urabe
  - Department of Pathology and Clinical Laboratories, National Cancer Center Hospital East, Kashiwa, Chiba, Japan
  - Department of Gastroenterology and Endoscopy, National Cancer Center Hospital East, Kashiwa, Chiba, Japan
- Masahiro Adachi
  - Department of Pathology and Clinical Laboratories, National Cancer Center Hospital East, Kashiwa, Chiba, Japan
- Naoya Sakamoto
  - Division of Pathology, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
- Motohiro Kojima
  - Department of Pathology and Clinical Laboratories, National Cancer Center Hospital East, Kashiwa, Chiba, Japan
- Shumpei Ishikawa
  - Division of Pathology, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
- Genichiro Ishii
  - Department of Pathology and Clinical Laboratories, National Cancer Center Hospital East, Kashiwa, Chiba, Japan
- Tomonori Yano
  - Department of Gastroenterology and Endoscopy, National Cancer Center Hospital East, Kashiwa, Chiba, Japan
- Shingo Sakashita
  - Division of Pathology, Exploratory Oncology Research & Clinical Trial Center, National Cancer Center, Kashiwa, Chiba, Japan
4
Xiao Y, Liu B, Hao Z. Multi-Instance Nonparallel Tube Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:2563-2577. [PMID: 38241095 DOI: 10.1109/tnnls.2023.3347449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2024]
Abstract
In multi-instance nonparallel plane learning (NPL), the training set is comprised of bags of instances and the nonparallel planes are trained to classify the bags. Most of the existing multi-instance NPL methods are proposed based on a twin support vector machine (TWSVM). Similar to TWSVM, they use only a single plane to generalize the data occurrence of one class and do not sufficiently consider the boundary information, which may lead to the limitation of their classification accuracy. In this article, we propose a multi-instance nonparallel tube learning (MINTL) method. Distinguished from the existing multi-instance NPL methods, MINTL embeds the boundary information into the classifier by learning a large-margin-based ε-tube for each class, such that the boundary information can be incorporated into refining the classifier and further improving the performance. Specifically, given a K-class multi-instance dataset, MINTL seeks K ε-tubes, one for each class. In multi-instance learning, each positive bag contains at least one positive instance. To build up the ε-tube of class k, we require that each bag of class k should have at least one instance included in the ε-tube. Moreover, except for one instance included in the ε-tube, the remaining instances in the positive bag may include positive instances or irrelevant instances, and their labels are unavailable. A large margin constraint is presented to assign the remaining instances either inside the ε-tube or outside the ε-tube with a large margin. Substantial experiments on real-world datasets have shown that MINTL obtains significantly better classification accuracy than the existing multi-instance NPL methods.
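The requirement that each bag of a class has at least one instance inside that class's tube can be sketched as a simple membership test. This assumes a linear tube, that is, a hyperplane (w, b) with half-width eps, so that an instance x lies inside when |w·x + b| <= eps. The abstract does not give the actual optimization problem, so the names and the linear form here are assumptions, and the large-margin treatment of the remaining instances is not shown.

```python
def in_tube(x, w, b, eps):
    """True if instance x lies within distance-like margin eps of the plane (w, b)."""
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) <= eps


def bag_has_witness(bag, w, b, eps):
    """Bag-level constraint: at least one instance of the bag is inside the tube."""
    return any(in_tube(x, w, b, eps) for x in bag)


# Hypothetical tube for one class: plane x1 = 0 with half-width 0.5.
w, b, eps = [1.0, 0.0], 0.0, 0.5
print(bag_has_witness([[2.0, 1.0], [0.3, 5.0]], w, b, eps))   # True
print(bag_has_witness([[2.0, 1.0], [-1.0, 0.0]], w, b, eps))  # False
```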
5
Li Z, Wang C. Achieving Sharp Upper Bounds on the Expressive Power of Neural Networks via Tropical Polynomials. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:2931-2945. [PMID: 38315593 DOI: 10.1109/tnnls.2024.3350786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
The expressive power of neural networks describes the ability to represent or approximate complex functions. The number of linear regions is the standard and most natural measure of expressive power. However, a major challenge in utilizing the number of linear regions as a measure of expressive power is the exponential gap between the theoretical upper and lower bounds, which becomes more pronounced as the neural network capacity increases. In this article, we aim to derive a sharp upper bound on piecewise linear neural networks (PLNNs) to bridge this gap. Specifically, we first establish the relationship between tropical polynomials and PLNNs. In the unexpanded tropical polynomial form, we propose that the hyperplanes are not all in general position, thereby reducing the number of intersecting hyperplanes. We propose a rank-based approach and present empirical analysis showing that this approach outperforms previous methods based on Zaslavsky's theorem. In the expanded tropical polynomial form, accounting for limitations in weight initialization and model computational precision, we introduce the concept that the value range of each term is bounded. We propose a precision-based approach that transforms the approximately exponential growth of the number of linear regions into polynomial growth with width, which is effective at larger layer widths. Finally, we compare the number of linear regions that can be represented by each hidden layer in both forms and derive a sharp upper bound for PLNNs. Empirical analysis and experimental results provide compelling evidence for the efficacy and feasibility of this sharp upper bound on both simulated experiments and real datasets.
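As background for the bounds based on Zaslavsky's theorem mentioned in this abstract, the classical count for n hyperplanes in general position in d-dimensional space is the sum of C(n, i) for i from 0 to d. The sketch below computes that classical count only; it is not the sharper rank-based or precision-based bound proposed in the paper, and the function name is hypothetical.

```python
from math import comb


def max_regions(n_hyperplanes, dim):
    """Maximum number of regions cut out of R^dim by n hyperplanes in general position."""
    return sum(comb(n_hyperplanes, i) for i in range(dim + 1))


print(max_regions(3, 2))  # 7: three lines in general position split the plane into 7 regions
print(max_regions(4, 2))  # 11
```

For a single ReLU layer with n units on a d-dimensional input, each unit contributes one hyperplane, so this count is the classical upper bound on that layer's linear regions. When the hyperplanes are not in general position, the true count is smaller, which is the slack the abstract refers to.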
6
Bai J, Ren J, Xiao Z, Chen Z, Gao C, Ali TAA, Jiao L. Localizing From Classification: Self-Directed Weakly Supervised Object Localization for Remote Sensing Images. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:17935-17949. [PMID: 37672374 DOI: 10.1109/tnnls.2023.3309889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
In recent years, object localization and detection methods in remote sensing images (RSIs) have received increasing attention due to their broad applications. However, most previous fully supervised methods require a large number of time-consuming and labor-intensive instance-level annotations. Compared with those fully supervised methods, weakly supervised object localization (WSOL) aims to recognize object instances using only image-level labels, which greatly reduces the labeling cost of RSIs. In this article, we propose a self-directed weakly supervised strategy (SD-WSS) to perform WSOL in RSIs. Specifically, we fully exploit and enhance the spatial feature extraction capability of the RSI classification model to accurately localize the objects of interest. To alleviate the serious discriminative region problem exhibited by previous WSOL methods, the spatial location information implicit in the classification model is carefully extracted by GradCAM++ to guide the learning procedure. Furthermore, to eliminate the interference from complex backgrounds of RSIs, we design a novel self-directed loss to make the model optimize itself and explicitly tell it where to look. Finally, we review and annotate the existing remote sensing scene classification dataset and create two new WSOL benchmarks in RSIs, named C45V2 and PN2. We conduct extensive experiments to evaluate the proposed method and six mainstream WSOL methods with three backbones on C45V2 and PN2. The results demonstrate that our proposed method achieves better performance than state-of-the-art methods.
7
Chen W, Yu Z, Yang C, Lu Y. Abnormal Behavior Recognition Based on 3D Dense Connections. Int J Neural Syst 2024; 34:2450049. [PMID: 39010725 DOI: 10.1142/s0129065724500497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
Abnormal behavior recognition is an important technology used to detect and identify activities or events that deviate from normal behavior patterns. It has wide applications in various fields such as network security, financial fraud detection, and video surveillance. In recent years, Deep Convolution Networks (ConvNets) have been widely applied in abnormal behavior recognition algorithms and have achieved significant results. However, existing abnormal behavior detection algorithms mainly focus on improving the accuracy of the algorithms and have not explored the real-time nature of abnormal behavior recognition. This is crucial to quickly identify abnormal behavior in public places and improve urban public safety. Therefore, this paper proposes an abnormal behavior recognition algorithm based on three-dimensional (3D) dense connections. The proposed algorithm uses a multi-instance learning strategy to classify various types of abnormal behaviors, and employs dense connection modules and soft-threshold attention mechanisms to reduce the model's parameter count and enhance network computational efficiency. Finally, redundant information in the sequence is reduced by attention allocation to mitigate its negative impact on recognition results. Experimental verification shows that our method achieves a recognition accuracy of 95.61% on the UCF-crime dataset. Comparative experiments demonstrate that our model has strong performance in terms of recognition accuracy and speed.
Affiliation(s)
- Wei Chen
  - School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, P. R. China
- Zhanhe Yu
  - School of Information Science and Technology, North China University of Technology, Beijing 100144, P. R. China
- Chaochao Yang
  - School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, P. R. China
- Yuanyao Lu
  - School of Information Science and Technology, North China University of Technology, Beijing 100144, P. R. China
8
Liu M, Bian Y, Liu Q, Wang X, Wang Y. Weakly Supervised Tracklet Association Learning With Video Labels for Person Re-Identification. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:3595-3607. [PMID: 38133978 DOI: 10.1109/tpami.2023.3346168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2023]
Abstract
Supervised person re-identification (re-id) methods incur expensive manual labeling costs. Although unsupervised re-id methods can reduce the need for labeled datasets, their performance is lower than that of the supervised alternatives. Recently, some weakly supervised learning-based person re-id methods have been proposed, which strike a balance between supervised and unsupervised learning. Nevertheless, most of these models require auxiliary fully supervised datasets or ignore the interference of noisy tracklets. To address this problem, in this work, we formulate a weakly supervised tracklet association learning (WS-TAL) model that leverages only the video labels. Specifically, we first propose an intra-bag tracklet discrimination learning (ITDL) term. It can capture the associations between person identities and images by assigning pseudo labels to each person image in a bag. Then, the discriminative feature for each person is learned by utilizing the obtained associations after filtering the noisy tracklets. Based on that, a cross-bag tracklet association learning (CTAL) term is presented to explore the potential tracklet associations between bags by mining reliable positive tracklet pairs and hard negative pairs. Finally, these two complementary terms are jointly optimized to train our re-id model. Extensive experiments on the weakly labeled datasets demonstrate that WS-TAL achieves 88.1% and 90.3% rank-1 accuracy on the MARS and DukeMTMC-VideoReID datasets, respectively. The performance of our model surpasses the state-of-the-art weakly supervised models by a large margin and even outperforms some fully supervised re-id models.
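The rank-1 accuracy reported in this abstract can be illustrated with a simplified sketch: for each query, the nearest gallery entry is retrieved and counted as a hit if its identity matches. The names and the toy distance matrix are hypothetical. Standard re-id evaluation protocols usually also exclude gallery entries from the same camera as the query, and that filtering is deliberately omitted here.

```python
def rank1_accuracy(distances, query_ids, gallery_ids):
    """Fraction of queries whose nearest gallery entry has the same identity.

    distances[q][g] is the distance between query q and gallery entry g.
    """
    hits = 0
    for q, row in enumerate(distances):
        nearest = min(range(len(row)), key=lambda g: row[g])
        if gallery_ids[nearest] == query_ids[q]:
            hits += 1
    return hits / len(distances)


# Hypothetical distances for three queries against a two-entry gallery.
distances = [[0.2, 0.9], [0.8, 0.3], [0.1, 0.7]]
acc = rank1_accuracy(distances, ["a", "b", "b"], ["a", "b"])
print(round(acc, 3))  # 0.667
```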
9
Xu Y, Zhou C, Yu X, Yang Y. Cyclic Self-Training With Proposal Weight Modulation for Cross-Supervised Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2023; 32:1992-2002. [PMID: 37015123 DOI: 10.1109/tip.2023.3261752] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/19/2023]
Abstract
Weakly-supervised object detection (WSOD), which requires only image-level annotations for training detectors, has gained enormous attention. Despite recent rapid advances in WSOD, there remains a large performance gap compared with fully-supervised object detection. To narrow the performance gap, we study cross-supervised object detection (CSOD), where existing classes (base classes) have instance-level annotations while newly added classes (novel classes) only need image-level annotations. To improve localization accuracy, we propose a Cyclic Self-Training (CST) method to introduce instance-level supervision into a commonly used WSOD method, online instance classifier refinement (OICR). Our proposed CST consists of forward pseudo labeling and backward pseudo labeling. Specifically, OICR exploits the forward pseudo labeling to generate pseudo ground-truth bounding-boxes for all classes, thus enabling instance classifier training. Then, the backward pseudo labeling is designed to generate pseudo ground-truth bounding-boxes of higher quality for novel classes by fusing the predictions of the instance classifiers. As a result, both novel and base classes will have bounding-box annotations for training, alleviating the supervision inconsistency between base and novel classes. In the forward pseudo labeling, the generated pseudo ground-truths may be misaligned with objects and thus introduce poor-quality examples for training the instance classifiers (ICs). To reduce the impact of these poor-quality training examples, we propose a Proposal Weight Modulation (PWM) module learned in a class-agnostic and contrastive manner by exploiting bounding-box annotations of base classes. Experiments on PASCAL VOC and MS COCO datasets demonstrate the superiority of our proposed method.
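The forward pseudo labeling step can be pictured with a simplified sketch in the spirit of OICR-style refinement: the top-scoring proposal for a class present in the image is taken as a pseudo ground-truth box, and proposals that overlap it sufficiently are labeled positive. The function names, the single-class setting, and the 0.5 IoU threshold are assumptions for illustration. The backward pseudo labeling and the PWM module from the abstract are not shown.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def forward_pseudo_label(proposals, scores, iou_thresh=0.5):
    """Pick the top-scoring proposal as pseudo ground truth and mark overlapping proposals."""
    seed = max(range(len(scores)), key=lambda i: scores[i])
    positives = [iou(p, proposals[seed]) >= iou_thresh for p in proposals]
    return seed, positives


# Hypothetical proposals and class scores for one image.
proposals = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
seed, positives = forward_pseudo_label(proposals, [0.2, 0.9, 0.1])
print(seed, positives)  # 1 [True, True, False]
```

If the seed box is misaligned with the object, every proposal labeled from it inherits that error, which is the kind of poor-quality training example the abstract's weighting module is meant to down-weight.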
10
Wu Z, Wen J, Xu Y, Yang J, Li X, Zhang D. Enhanced Spatial Feature Learning for Weakly Supervised Object Detection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; PP:961-972. [PMID: 35675239 DOI: 10.1109/tnnls.2022.3178180] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/15/2023]
Abstract
Weakly supervised object detection (WSOD) has become an effective paradigm, which requires only class labels to train object detectors. However, WSOD detectors are prone to learning highly discriminative features corresponding to local object parts rather than complete objects, resulting in imprecise object localization. To address the issue, designing backbones specifically for WSOD is a feasible solution. However, the redesigned backbone generally needs to be pretrained on large-scale ImageNet or trained from scratch, both of which require much more time and computational cost than fine-tuning. In this article, we explore optimizing the backbone without losing the availability of the original pretrained model. Since the pooling layer summarizes neighborhood features, it is crucial to spatial feature learning. In addition, it has no learnable parameters, so modifying it will not change the pretrained model. Based on the above analysis, we further propose enhanced spatial feature learning (ESFL) for WSOD, which first takes full advantage of multiple kernels in a single pooling layer to handle multiscale objects and then enhances above-average activations within the rectangular neighborhood to alleviate the problem of ignoring unsalient object parts. The experimental results on the PASCAL VOC and the MS COCO benchmarks demonstrate that ESFL can bring significant performance improvement for the WSOD method and achieve state-of-the-art results.