1
Yao Y, Wang L, Zhang L, Yang Y, Li P, Zimmermann R, Shao L. Learning Latent Stable Patterns for Image Understanding With Weak and Noisy Labels. IEEE Transactions on Cybernetics 2019; 49:4243-4252. PMID: 30296245. DOI: 10.1109/tcyb.2018.2861419.
Abstract
This paper focuses on weakly supervised image understanding, in which semantic labels are available only at the image level, without the specific object or scene locations within an image. Existing algorithms implicitly assume that image-level labels are error-free, which may be too restrictive. In practice, image labels obtained from pretrained predictors are easily contaminated. To solve this problem, we propose a novel algorithm for weakly supervised segmentation when only noisy image labels are available during training. More specifically, a semantic space is first constructed by encoding image labels through a graphlet (i.e., superpixel cluster) embedding process. We then observe that, in this semantic space, the distribution of graphlets from images with the same label remains stable regardless of the noise in the image labels. We therefore propose a generative model, called latent stability analysis, to discover stable patterns from images with noisy labels. Inferring graphlet semantics from these mid-level stable patterns is far more reliable and accurate than directly transferring noisy image-level labels onto individual regions. Finally, the semantics of each superpixel are determined by maximum majority voting over its correlated graphlets. Comprehensive experimental results show that our algorithm performs impressively whether the image labels are predicted from hand-crafted or deeply learned image descriptors.
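The final labelling step of the abstract above lends itself to a short illustration. Below is a minimal Python sketch of maximum majority voting over correlated graphlets: it assumes the graphlet semantics have already been inferred (e.g., by the paper's latent stability analysis) and only shows how per-superpixel labels would then be obtained; the function name and toy data are hypothetical.

from collections import Counter

def superpixel_labels_by_voting(graphlets, graphlet_labels, n_superpixels):
    # graphlets: list of iterables of superpixel indices (one per graphlet)
    # graphlet_labels: inferred semantic label for each graphlet
    # Returns one label per superpixel, chosen by majority vote over the
    # graphlets that contain it; uncovered superpixels get None.
    votes = [Counter() for _ in range(n_superpixels)]
    for members, label in zip(graphlets, graphlet_labels):
        for sp in members:
            votes[sp][label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

# Toy usage: three graphlets over five superpixels (ties broken arbitrarily).
print(superpixel_labels_by_voting(
    graphlets=[[0, 1, 2], [1, 2, 3], [3, 4]],
    graphlet_labels=["sky", "sky", "road"],
    n_superpixels=5))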
2
Yi S, Wang X, Lu C, Jia J, Li H. $L_0$ Regularized Stationary-Time Estimation for Crowd Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017; 39:981-994. PMID: 28113539. DOI: 10.1109/tpami.2016.2560807.
Abstract
In this paper, we tackle stationary crowd analysis, which is as important as modeling mobile groups in crowd scenes and has many important applications in crowd surveillance. Our key contribution is a robust algorithm for estimating how long a foreground pixel remains stationary. This is much more challenging than background subtraction alone, because a failure at a single frame, caused by local object movement, lighting variation, or occlusion, can lead to large errors in the stationary-time estimate. To achieve robust and accurate estimation, sparse constraints along the spatial and temporal dimensions are jointly imposed through mixed partial derivatives (second-order gradients) to shape a 3D stationary-time map, formulated as an L0 optimization problem. Beyond background subtraction, the method distinguishes among different foreground objects that are close or overlapping in the spatio-temporal volume by using a locally shared foreground codebook. The proposed techniques are further demonstrated through three applications: 1) based on the stationary-time estimates, 12 descriptors are proposed to detect four types of stationary crowd activities; 2) the averaged stationary-time map is used to analyze crowd scene structure; and 3) the stationary-time estimates are used to study the influence of stationary crowd groups on traffic patterns.
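As a point of reference for the abstract above, the fragile per-pixel baseline that the paper's L0 formulation improves on can be sketched in a few lines of Python: it simply counts how many consecutive frames each pixel has stayed foreground, with no spatio-temporal smoothing. The joint L0 sparsity on mixed second-order gradients is deliberately omitted; the function and data names are illustrative only.

import numpy as np

def naive_stationary_time(fg_masks):
    # fg_masks: (T, H, W) boolean array of per-frame foreground masks.
    # Returns a (T, H, W) map counting, at each frame, how many consecutive
    # frames each pixel has remained foreground. A single missed detection
    # resets the count to zero, which is exactly the fragility the paper's
    # joint spatio-temporal L0 formulation addresses (not reproduced here).
    T, H, W = fg_masks.shape
    stat = np.zeros((T, H, W), dtype=np.int32)
    for t in range(T):
        prev = stat[t - 1] if t > 0 else 0
        stat[t] = np.where(fg_masks[t], prev + 1, 0)
    return stat

# Toy usage: one pixel stays foreground for 4 of 5 frames.
masks = np.zeros((5, 1, 1), dtype=bool)
masks[1:, 0, 0] = True
print(naive_stationary_time(masks)[:, 0, 0])   # [0 1 2 3 4]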
3
Yi S, Li H, Wang X. Pedestrian Behavior Modeling From Stationary Crowds With Applications to Intelligent Surveillance. IEEE Transactions on Image Processing 2016; 25:4354-4368. PMID: 27416595. DOI: 10.1109/tip.2016.2590322.
Abstract
Pedestrian behavior modeling and analysis is important for crowd scene understanding and has various applications in video surveillance. Stationary crowd groups are a key factor influencing pedestrian walking patterns, but they have been largely ignored in the literature. They play different roles for different pedestrians in a crowded scene and can change over time. In this paper, a novel model is proposed for pedestrian behavior that incorporates stationary crowd groups as a key component. By inferring the interactions between stationary crowd groups and pedestrians, the model can be used to investigate pedestrian behaviors. The effectiveness of the proposed model is demonstrated through multiple applications, including walking path prediction, destination prediction, personality attribute classification, and abnormal event detection. To evaluate the model, two large pedestrian walking-route datasets are built: the walking routes of around 15,000 pedestrians from two crowd surveillance videos are manually annotated. The datasets will be released to the public to benefit future research on pedestrian behavior analysis and crowd scene understanding.
4
5
Shi Z, Hospedales TM, Xiang T. Bayesian Joint Modelling for Object Localisation in Weakly Labelled Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 2015; 37:1959-1972. PMID: 26340253. DOI: 10.1109/tpami.2015.2392769.
Abstract
We address the problem of localisation of objects as bounding boxes in images and videos with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. In this paper, a novel framework based on Bayesian joint topic modelling is proposed, which differs significantly from the existing ones in that: (1) All foreground object classes are modelled jointly in a single generative model that encodes multiple object co-existence so that "explaining away" inference can resolve ambiguity and lead to better learning and localisation. (2) Image backgrounds are shared across classes to better learn varying surroundings and "push out" objects of interest. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Moreover, the Bayesian formulation enables the exploitation of various types of prior knowledge to compensate for the limited supervision offered by weakly labelled data, as well as Bayesian domain adaptation for transfer learning. Extensive experiments on the PASCAL VOC, ImageNet and YouTube-Object videos datasets demonstrate the effectiveness of our Bayesian joint model for weakly supervised object localisation.
6
Traffic Behavior Recognition Using the Pachinko Allocation Model. Sensors 2015; 15:16040-16059. PMID: 26151213. PMCID: PMC4541867. DOI: 10.3390/s150716040.
Abstract
CCTV-based behavior recognition systems have gained considerable attention in recent years in the transportation surveillance domain for identifying unusual patterns such as traffic jams, accidents, dangerous driving, and other abnormal behaviors. In this paper, a novel approach to traffic behavior modeling is presented for video-based road surveillance. The proposed system combines the pachinko allocation model (PAM) and a support vector machine (SVM) for hierarchical representation and identification of traffic behavior. A background subtraction technique using Gaussian mixture models (GMMs) and an object tracking mechanism based on Kalman filters are used to first construct the object trajectories. The sparse features comprising the locations and directions of the moving objects are then modeled by PAM into traffic topics, namely activities and behaviors. As a key innovation, PAM captures not only the correlations among activities but also those among behaviors, based on an arbitrary directed acyclic graph (DAG). An SVM classifier is then used on top to train on and recognize the traffic activities and behaviors. The proposed model shows more flexibility and greater expressive power than the commonly used latent Dirichlet allocation (LDA) approach, leading to higher recognition accuracy in behavior classification.
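The trajectory-construction front end described above (GMM background subtraction plus Kalman-filter tracking) can be approximated with standard OpenCV components. The Python sketch below tracks only the largest moving blob with a constant-velocity Kalman filter and assumes OpenCV 4 and a placeholder video path; the PAM topic modelling and SVM stages of the paper are not shown.

import cv2
import numpy as np

cap = cv2.VideoCapture("traffic.mp4")                 # placeholder input video
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

kf = cv2.KalmanFilter(4, 2)                           # state: x, y, vx, vy
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1
kf.errorCovPost = np.eye(4, dtype=np.float32)

trajectory = []                                       # predicted (x, y) per frame
while True:
    ok, frame = cap.read()
    if not ok:
        break
    pred = kf.predict()                               # a priori state estimate
    fg = bg.apply(frame)                              # GMM foreground mask
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        c = max(contours, key=cv2.contourArea)        # largest moving blob only
        x, y, w, h = cv2.boundingRect(c)
        kf.correct(np.array([[x + w / 2], [y + h / 2]], np.float32))
    trajectory.append((float(pred[0, 0]), float(pred[1, 0])))
cap.release()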
7
Wang J, Fu W, Lu H, Ma S. Bilayer sparse topic model for scene analysis in imbalanced surveillance videos. IEEE Transactions on Image Processing 2014; 23:5198-5208. PMID: 25330486. DOI: 10.1109/tip.2014.2363408.
Abstract
Dynamic scene analysis has become a popular research area, especially in video surveillance. The goal of this paper is to mine semantic motion patterns and to detect abnormalities that deviate from the normal ones occurring in complex dynamic scenarios. To address this problem, we propose a data-driven, scene-independent approach, the bilayer sparse topic model (BiSTM), in which a surveillance video is represented by a word-document hierarchical generative process. In the BiSTM, motion patterns are treated as latent topics sparsely distributed over low-level motion vectors, while a video clip is sparsely reconstructed from a mixture of topics (motion patterns). In addition, to capture the extreme imbalance between the numerous typical normal activities and the few rare abnormalities in surveillance video data, a one-class constraint is imposed directly on the distribution of documents as a discriminative prior. By jointly learning the topics and the one-class document representation within a discriminative framework, the topic (pattern) space becomes more specific and explicit. An effective alternating iteration algorithm is presented for model learning. Experimental results and comparisons on various public datasets demonstrate the promise of the proposed approach.
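A loose stand-in for the idea above can be assembled from off-the-shelf parts: learn topics over clip histograms of quantised motion words, then model the bulk of normal clips with a one-class SVM in topic space and flag clips that fall outside that support. This is not the paper's jointly learned BiSTM (the topics and the one-class constraint are fitted separately here), and all data below is synthetic.

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
n_words = 50
normal_clips = rng.poisson(lam=3.0, size=(200, n_words))    # synthetic motion-word counts
odd_clip = rng.poisson(lam=3.0, size=(1, n_words))
odd_clip[0, :5] += 40                                        # a burst of unusual motion words

lda = LatentDirichletAllocation(n_components=8, random_state=0)
theta_normal = lda.fit_transform(normal_clips)               # clip -> topic mixture

ocsvm = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(theta_normal)
print(ocsvm.predict(lda.transform(odd_clip)))                # -1 marks a clip judged abnormal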
8
Leach MJ, Sparks E, Robertson NM. Contextual anomaly detection in crowded surveillance scenes. Pattern Recognition Letters 2014. DOI: 10.1016/j.patrec.2013.11.018.
9
Xu J, Denman S, Reddy V, Fookes C, Sridharan S. Real-time video event detection in crowded scenes using MPEG derived features: A multiple instance learning approach. Pattern Recognition Letters 2014. DOI: 10.1016/j.patrec.2013.11.019.
10
11
Kittler J, Christmas W, de Campos T, Windridge D, Yan F, Illingworth J, Osman M. Domain Anomaly Detection in Machine Perception: A System Architecture and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence 2014; 36:845-859. PMID: 26353221. DOI: 10.1109/tpami.2013.209.
Abstract
We address the problem of anomaly detection in machine perception. The concept of domain anomaly is introduced as distinct from the conventional notion of anomaly used in the literature. We propose a unified framework for anomaly detection that exposes the multifaceted nature of anomalies and suggests effective mechanisms for identifying and distinguishing each facet as instruments for domain anomaly detection. The framework draws on the Bayesian probabilistic reasoning apparatus, which clearly defines concepts such as outlier, noise, distribution drift, novelty detection (of objects and object primitives), rare events, and unexpected events. Based on these concepts, we provide a taxonomy of domain anomaly events. One of the mechanisms for pinpointing the nature of an anomaly is the detection of incongruence between contextual and non-contextual interpretations of the sensory data. The proposed methodology has wide applicability and underpins, in a unified way, the anomaly detection applications found in the literature. To illustrate some of its distinguishing features, the domain anomaly detection methodology is applied here to anomaly detection for a video annotation system.
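One of the mechanisms mentioned above, detecting incongruence between contextual and non-contextual interpretations, can be read (under assumptions of our own) as a divergence test between two posteriors over the same label set. The sketch below uses a symmetric KL divergence as the incongruence score; the choice of divergence and of any threshold is illustrative, not taken from the paper.

import numpy as np

def incongruence_score(p_contextual, p_noncontextual, eps=1e-12):
    # Symmetric KL divergence between two posteriors over the same labels.
    # A large value signals that the context-aware and context-free
    # interpretations disagree, i.e. a candidate domain anomaly.
    p = np.asarray(p_contextual, float) + eps
    q = np.asarray(p_noncontextual, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

print(incongruence_score([0.9, 0.05, 0.05], [0.1, 0.2, 0.7]))   # large -> incongruent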
12
Fu Y, Hospedales TM, Xiang T, Gong S. Learning multimodal latent attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence 2014; 36:303-316. PMID: 24356351. DOI: 10.1109/tpami.2013.128.
Abstract
The rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. Attribute learning has emerged as a promising paradigm for bridging the semantic gap and addressing data sparsity by transferring attribute knowledge in object recognition and relatively simple action classification. In this paper, we address the task of attribute learning for understanding multimedia data with sparse and incomplete labels. In particular, we focus on videos of social group activities, which are particularly challenging and topical examples of this task because of their multimodal content and their complex, unstructured nature relative to the density of annotations. To solve this problem, we 1) introduce the concept of a semi-latent attribute space, expressing user-defined and latent attributes in a unified framework, and 2) propose a novel, scalable probabilistic topic model for learning multimodal semi-latent attributes, which dramatically reduces the need for an exhaustive, accurate attribute ontology and expensive annotation effort. We show that our framework is able to exploit latent attributes to outperform contemporary approaches on a variety of realistic multimedia sparse-data learning tasks, including multitask learning, learning with label noise, N-shot transfer learning, and, importantly, zero-shot learning.
Affiliation(s)
- Yanwei Fu, Queen Mary University of London, London
- Tao Xiang, Queen Mary University of London, London
13
Zhang Y, Zhang Y, Swears E, Larios N, Wang Z, Ji Q. Modeling temporal interactions with interval temporal Bayesian networks for complex activity recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2013; 35:2468-2483. PMID: 23969390. DOI: 10.1109/tpami.2013.33.
Abstract
Complex activities typically consist of multiple primitive events happening in parallel or sequentially over a period of time. Understanding such activities requires recognizing not only each individual event but, more importantly, capturing their spatiotemporal dependencies over different time intervals. Most current graphical-model-based approaches have several limitations. First, time-sliced graphical models such as hidden Markov models (HMMs) and dynamic Bayesian networks are typically based on points in time and hence can capture only three temporal relations: precedes, follows, and equals. Second, HMMs are probabilistic finite-state machines that grow exponentially as the number of parallel events increases. Third, other approaches such as syntactic and description-based methods, while rich in modeling temporal relationships, lack the expressive power to capture uncertainty. To address these issues, we introduce the interval temporal Bayesian network (ITBN), a novel graphical model that combines the Bayesian network with interval algebra to explicitly model temporal dependencies over time intervals. Advanced machine learning methods are introduced to learn the ITBN model structure and parameters. Experimental results show that, by reasoning with spatiotemporal dependencies, the proposed model leads to significantly improved performance when modeling and recognizing complex activities involving both parallel and sequential events.
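The interval algebra the ITBN builds on assigns one of Allen's thirteen relations to every pair of event intervals. The helper below is a minimal sketch that only names that relation for two (start, end) intervals; it is the vocabulary the model conditions on, not the model itself.

def allen_relation(a, b):
    # a, b: intervals as (start, end) tuples with start < end.
    # Returns the name of the Allen interval-algebra relation of a with respect to b.
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:  return "precedes"
    if b2 < a1:  return "preceded-by"
    if a2 == b1: return "meets"
    if b2 == a1: return "met-by"
    if a1 == b1 and a2 == b2: return "equals"
    if a1 == b1: return "starts" if a2 < b2 else "started-by"
    if a2 == b2: return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2: return "during"
    if a1 < b1 and b2 < a2: return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"

print(allen_relation((0, 5), (3, 9)))   # overlaps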
Affiliation(s)
- Yongmian Zhang, IT Research Division, Konica Minolta Laboratory U.S.A. Inc., 2855 Campus Dr., San Mateo, CA 94403, USA.
14
Ricci E, Zen G, Sebe N, Messelodi S. A Prototype Learning Framework Using EMD: Application to Complex Scenes Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 2013; 35:513-526. PMID: 22689076. DOI: 10.1109/tpami.2012.131.
Abstract
In recent decades, many efforts have been devoted to developing methods for automatic scene understanding in the context of video surveillance applications. This paper presents a novel non-object-centric approach to complex scene analysis. Similarly to previous methods, we use low-level cues to individuate atomic activities and create clip histograms. Differently from recent works, the task of discovering high-level activity patterns is formulated as a convex prototype learning problem. This problem results in a simple linear program that can be solved efficiently with standard solvers. The main advantage of our approach is that, by using the Earth Mover's Distance (EMD) as the objective function, the similarity among elementary activities is taken into account in the learning phase. To improve scalability, we also consider variants of EMD that adopt L1 as the ground distance for 1D and 2D, linear and circular histograms. In these cases, only the similarity between neighboring atomic activities, corresponding to adjacent histogram bins, is taken into account. We therefore also propose an automatic strategy for sorting atomic activities. Experimental results on publicly available datasets show that our method compares favorably with state-of-the-art approaches, often outperforming them.
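The assignment step implied by the abstract above, matching a clip histogram of atomic activities to its nearest learned prototype under EMD, can be sketched with SciPy's 1D Wasserstein distance, assuming the atomic activities have already been sorted so that adjacent bins are similar (the L1 ground-distance variant discussed above). Learning the prototypes themselves is the paper's linear program and is not reproduced here; the prototypes and histogram below are made up.

import numpy as np
from scipy.stats import wasserstein_distance

def nearest_prototype(clip_hist, prototypes):
    # clip_hist: 1D histogram of atomic-activity counts for one clip.
    # prototypes: list of 1D histograms of the same length.
    # Returns the index of the nearest prototype under 1D EMD and its distance.
    bins = np.arange(len(clip_hist))
    d = [wasserstein_distance(bins, bins, clip_hist, p) for p in prototypes]
    return int(np.argmin(d)), float(min(d))

protos = [np.array([8, 4, 1, 0, 0], float), np.array([0, 0, 1, 4, 8], float)]
print(nearest_prototype(np.array([7, 5, 2, 0, 0], float), protos))   # index 0 is closest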
15
Zhao L, Shang L, Gao Y, Yang Y, Jia X. Video Behavior Analysis Using Topic Models and Rough Sets [Applications Notes]. IEEE Computational Intelligence Magazine 2013. DOI: 10.1109/mci.2012.2228597.
16
Fu Y, Hospedales TM, Xiang T, Gong S. Attribute Learning for Understanding Unstructured Social Activity. In: Computer Vision – ECCV 2012. DOI: 10.1007/978-3-642-33765-9_38.