1
Automatic Number Plate Recognition: A Detailed Survey of Relevant Algorithms. Sensors 2021; 21:3028. [PMID: 33925845] [PMCID: PMC8123416] [DOI: 10.3390/s21093028] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/17/2021] [Revised: 04/16/2021] [Accepted: 04/21/2021] [Indexed: 11/19/2022]
Abstract
Technologies and services for smart vehicles and Intelligent Transportation Systems (ITS) continue to revolutionize many aspects of human life. This paper presents a detailed survey of current techniques and advancements in Automatic Number Plate Recognition (ANPR) systems, with a comprehensive performance comparison of various real-time tested and simulated algorithms, including those involving computer vision (CV). ANPR technology can detect and recognize vehicles by their number plates using recognition techniques. Even with the best algorithms, a successful ANPR deployment may require additional hardware to maximize accuracy. Number-plate condition, non-standardized formats, complex scenes, camera quality, camera mount position, tolerance to distortion, motion blur, contrast problems, reflections, processing and memory limitations, environmental conditions, indoor/outdoor or day/night shots, software tools, and other hardware-based constraints may undermine performance. This inconsistency, along with challenging environments and other complexities, makes ANPR an interesting field for researchers. The Internet of Things is beginning to shape the future of many industries and is paving new ways for ITS. ANPR can be well utilized by integrating it with RFID systems, GPS, Android platforms, and other similar technologies. Deep-learning techniques are widely utilized in the CV field for better detection rates. This research aims to advance the state of knowledge in ITS (ANPR) built on CV algorithms by citing relevant prior work, analyzing and presenting a survey of extraction, segmentation, and recognition techniques, and providing guidelines on future trends in this area.
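Among the segmentation techniques such surveys cover, a common lightweight baseline is to split a binarized plate image into characters by its vertical projection profile. The sketch below is an illustrative simplification; the function name and thresholds are our own, not from the survey:

```python
import numpy as np

def segment_characters(binary_plate, min_width=2):
    """Split a binarized plate image (1 = ink, 0 = background) into
    character regions using a vertical projection profile."""
    # Column-wise ink counts: zero-valued valleys separate characters.
    profile = binary_plate.sum(axis=0)
    segments, start = [], None
    for col, count in enumerate(profile):
        if count > 0 and start is None:
            start = col                       # entering a character run
        elif count == 0 and start is not None:
            if col - start >= min_width:      # ignore narrow specks
                segments.append((start, col))
            start = None
    if start is not None and binary_plate.shape[1] - start >= min_width:
        segments.append((start, binary_plate.shape[1]))
    return segments
```

Real plates need deskewing and adaptive binarization first; the profile trick only works once the plate region is roughly upright.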
2
Liu J, Zhou S, Wu Y, Chen K, Ouyang W, Xu D. Block Proposal Neural Architecture Search. IEEE Transactions on Image Processing 2020; 30:15-25. [PMID: 33035163] [DOI: 10.1109/tip.2020.3028288] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 06/11/2023]
Abstract
Existing neural architecture search (NAS) methods usually restrict the search space to pre-defined types of block for a fixed macro-architecture. However, this strategy limits the search space and affects architecture flexibility if block proposal search (BPS) is not considered for NAS. As a result, block structure search is the bottleneck in many previous NAS works. In this work, we propose a new evolutionary algorithm, referred to as latency EvoNAS (LEvoNAS), for block structure search, and also incorporate it into the NAS framework by developing a novel two-stage framework referred to as Block Proposal NAS (BP-NAS). Comprehensive experimental results on two computer vision tasks demonstrate the superiority of our approach over state-of-the-art lightweight methods. For the classification task on the ImageNet dataset, our BPN-A is better than 1.0-MobileNetV2 with similar latency, and our BPN-B saves 23.7% latency compared with 1.4-MobileNetV2 while achieving higher top-1 accuracy. Furthermore, for the object detection task on the COCO dataset, our method achieves a significant performance improvement over MobileNetV2, which demonstrates the generalization capability of our newly proposed framework.
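The latency-aware evolutionary block search described here can be caricatured with a toy loop: a population of block configurations is scored by a fitness that trades a capacity proxy against a latency budget, then truncated and mutated. Everything below (the search space, the fitness proxies, the hyperparameters) is invented for illustration and is not the LEvoNAS algorithm itself:

```python
import random

# Toy search space: each "block" picks a kernel size and an expansion ratio.
KERNELS = [3, 5, 7]
EXPANSIONS = [1, 3, 6]
NUM_BLOCKS = 4

def random_arch():
    return [(random.choice(KERNELS), random.choice(EXPANSIONS))
            for _ in range(NUM_BLOCKS)]

def mutate(arch):
    arch = list(arch)
    i = random.randrange(len(arch))          # resample one block's config
    arch[i] = (random.choice(KERNELS), random.choice(EXPANSIONS))
    return arch

def fitness(arch, latency_budget=40.0):
    # Stand-in objective: reward capacity, penalize latency over budget.
    accuracy_proxy = sum(k + 2 * e for k, e in arch)
    latency_proxy = sum(k * e for k, e in arch)
    penalty = max(0.0, latency_proxy - latency_budget)
    return accuracy_proxy - penalty

def evolve(generations=30, population_size=16, seed=0):
    random.seed(seed)
    population = [random_arch() for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]   # truncation selection
        population = parents + [mutate(random.choice(parents))
                                for _ in range(population_size - len(parents))]
    return max(population, key=fitness)
```

A real system replaces both proxies with measured latency and trained-network accuracy, which is where nearly all the cost lives.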
3
Qi Y, Zhang S, Jiang F, Zhou H, Tao D, Li X. Siamese Local and Global Networks for Robust Face Tracking. IEEE Transactions on Image Processing 2020; PP:9152-9164. [PMID: 32941139] [DOI: 10.1109/tip.2020.3023621] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/11/2023]
Abstract
Convolutional neural networks (CNNs) have achieved great success in several face-related tasks, such as face detection, alignment, and recognition. As a fundamental problem in computer vision, face tracking plays a crucial role in various applications, such as video surveillance, human emotion detection, and human-computer interaction. However, few CNN-based approaches have been proposed for face (bounding box) tracking. In this paper, we propose a face tracking method based on Siamese CNNs, which takes advantage of the powerful representations of hierarchical CNN features learned from massive face images. The proposed method captures discriminative face information at both local and global levels. At the local level, representations for attribute patches (i.e., eyes, nose, and mouth) are learned to distinguish one face from another; these representations are robust to pose changes and occlusions. At the global level, representations for each whole face are learned, which take into account the spatial relationships among local patches and facial characteristics, such as skin color and nevi. In addition, we build a new large-scale, challenging face tracking dataset to evaluate face tracking methods and to facilitate research in this field. Extensive experiments on the collected dataset demonstrate the effectiveness of our method in comparison to several state-of-the-art visual tracking methods.
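The global-level matching idea, scoring a whole-face template against every candidate window of a frame, can be approximated without any learned network by normalized cross-correlation. This toy stands in for the Siamese similarity only conceptually; the paper's learned features are far more robust:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom else 0.0

def track(template, frame):
    """Exhaustively score every candidate window in `frame` against the
    template and return the top-left corner of the best match."""
    th, tw = template.shape
    best, best_pos = -2.0, (0, 0)
    for y in range(frame.shape[0] - th + 1):
        for x in range(frame.shape[1] - tw + 1):
            score = ncc(template, frame[y:y + th, x:x + tw])
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos
```

Learned Siamese embeddings replace raw pixels precisely because raw correlation fails under pose change and occlusion, the cases the local attribute patches are meant to handle.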
4
Automatic Identification of Tool Wear Based on Convolutional Neural Network in Face Milling Process. Sensors 2019; 19:3817. [PMID: 31487810] [PMCID: PMC6767294] [DOI: 10.3390/s19183817] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/13/2019] [Revised: 09/02/2019] [Accepted: 09/02/2019] [Indexed: 11/16/2022]
Abstract
Monitoring tool wear in the machining process is important for predicting tool life and reducing equipment downtime and tool costs. Traditional visual methods require expert experience and human resources to obtain accurate tool wear information. With the development of charge-coupled device (CCD) image sensors and deep learning algorithms, it has become possible to use a convolutional neural network (CNN) model to automatically identify the wear types of high-temperature alloy tools in the face milling process. In this paper, the CNN model is developed based on our image dataset. A convolutional auto-encoder (CAE) is used to pre-train the network model, and the model parameters are fine-tuned by the back-propagation (BP) algorithm combined with stochastic gradient descent (SGD). The established ToolWearnet network model can identify tool wear types. The experimental results show that the average recognition precision of the model reaches 96.20%. At the same time, the automatic detection algorithm for tool wear value is improved by incorporating the identified tool wear types. To verify the feasibility of the method, an experimental system was built on the machine tool. By matching the frame rate of the industrial camera to the machine-tool spindle speed, wear-image information for all the inserts can be obtained in the machining gap. When the automatic detection of tool wear value is compared with manual detection by a high-precision digital optical microscope, the mean absolute percentage error is 4.76%, which effectively verifies the effectiveness and practicality of the method.
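The reported 4.76% figure is a mean absolute percentage error (MAPE) between automatic and microscope measurements; the metric itself is standard and easy to state:

```python
def mape(actual, predicted):
    """Mean absolute percentage error (in %), as used to compare automatic
    wear measurements against microscope ground truth. Assumes no actual
    value is zero."""
    pairs = list(zip(actual, predicted))
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```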
5
6
Buoncompagni S, Maio D, Maltoni D, Papi S. Saliency-based keypoint selection for fast object detection and matching. Pattern Recognition Letters 2015. [DOI: 10.1016/j.patrec.2015.04.019] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 10/23/2022]
7
Yarlagadda P, Ommer B. Beyond the Sum of Parts: Voting with Groups of Dependent Entities. IEEE Transactions on Pattern Analysis and Machine Intelligence 2015; 37:1134-1147. [PMID: 26357338] [DOI: 10.1109/tpami.2014.2363456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The high complexity of multi-scale, category-level object detection in cluttered scenes is efficiently handled by Hough voting methods. However, the main shortcoming of the approach is that mutually dependent local observations are independently casting their votes for intrinsically global object properties such as object scale. Object hypotheses are then assumed to be a mere sum of their part votes. Popular representation schemes are, however, based on a dense sampling of semi-local image features, which are consequently mutually dependent. We take advantage of part dependencies and incorporate them into probabilistic Hough voting by deriving an objective function that connects three intimately related problems: i) grouping mutually dependent parts, ii) solving the correspondence problem conjointly for dependent parts, and iii) finding concerted object hypotheses using extended groups rather than based on local observations alone. Early commitments are avoided by not restricting parts to only a single vote for a locally best correspondence and we learn a weighting of parts during training to reflect their differing relevance for an object. Experiments successfully demonstrate the benefit of incorporating part dependencies through grouping into Hough voting. The joint optimization of groupings, correspondences, and votes not only improves the detection accuracy over standard Hough voting and a sliding window baseline, but it also reduces the computational complexity by significantly decreasing the number of candidate hypotheses.
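The basic Hough-voting step the paper builds on, before any grouping of dependent parts, has each local part cast a weighted vote for the object centre implied by a learned part-to-centre offset; the accumulator peak becomes the object hypothesis. A minimal sketch, with offsets and weights made up for illustration:

```python
import numpy as np

def hough_vote(part_detections, offsets, shape):
    """Each detected part casts a weighted vote for the object centre it
    implies; the accumulator peak is the object hypothesis.
    part_detections: iterable of (y, x, part_id, weight)."""
    accumulator = np.zeros(shape)
    for y, x, part_id, weight in part_detections:
        dy, dx = offsets[part_id]            # learned part-to-centre offset
        cy, cx = y + dy, x + dx
        if 0 <= cy < shape[0] and 0 <= cx < shape[1]:
            accumulator[cy, cx] += weight
    peak = np.unravel_index(np.argmax(accumulator), shape)
    return peak, accumulator
```

The paper's point is that these votes are not independent: densely sampled parts overlap, so summing them as if they were independent double-counts evidence, which is what the grouping objective corrects.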
8
Mahmoud TM, Abd-El-Hafeez T, Omar A. An Efficient System for Blocking Pornography Websites. Computer Vision and Image Processing in Intelligent Systems and Multimedia Technologies 2014:161-176. [DOI: 10.4018/978-1-4666-6030-4.ch008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
The Internet is a powerful source of information. However, some of the information available on the Internet cannot be shown to every type of audience. For instance, pornography should not be shown to children; it is the most harmful content affecting child safety and causes many destructive side effects. A content filter is one or more pieces of software that work together to prevent users from viewing material found on the Internet. In this chapter, the authors present an efficient content-based software system for detecting and filtering pornography images in Web pages. The proposed system runs online in the background of Internet Explorer (IE) to restrict access to pornography Web pages. Skin and face detection techniques are the main components of the proposed system. Because the proposed filter works online, the authors propose two speed-up techniques that can be used to accelerate the filtering system. The results obtained using the proposed system are compared with four commercial filtering programs. The success rate of the proposed filtering system is better than that of the considered filtering programs.
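Skin detection of the kind used in such filters is often seeded by a simple per-pixel colour rule; the classical RGB daylight rule below (in the style of Peer et al.) is one plausible building block, not the authors' exact detector:

```python
def is_skin(r, g, b):
    """Classical RGB skin rule for daylight illumination: skin tends to be
    bright, red-dominant, and not grey (channels well separated)."""
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and
            abs(r - g) > 15 and r > g and r > b)

def skin_ratio(pixels):
    """Fraction of skin-coloured pixels, a crude page-level image score."""
    hits = sum(1 for r, g, b in pixels if is_skin(r, g, b))
    return hits / len(pixels) if pixels else 0.0
```

Per-pixel rules alone over-flag faces and sand-coloured backgrounds, which is presumably why the system combines skin evidence with face detection.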
9
Krüger N, Janssen P, Kalkan S, Lappe M, Leonardis A, Piater J, Rodríguez-Sánchez AJ, Wiskott L. Deep hierarchies in the primate visual cortex: what can we learn for computer vision? IEEE Transactions on Pattern Analysis and Machine Intelligence 2013; 35:1847-1871. [PMID: 23787340] [DOI: 10.1109/tpami.2012.272] [Citation(s) in RCA: 104] [Impact Index Per Article: 9.5] [Indexed: 06/02/2023]
Abstract
Computational modeling of the primate visual system yields insights of potential relevance to some of the challenges that computer vision is facing, such as object recognition and categorization, motion detection and activity recognition, or vision-based navigation and manipulation. This paper reviews some functional principles and structures that are generally thought to underlie the primate visual cortex, and attempts to extract biological principles that could further advance computer vision research. Organized for a computer vision audience, we present functional principles of the processing hierarchies present in the primate visual system considering recent discoveries in neurophysiology. The hierarchical processing in the primate visual system is characterized by a sequence of different levels of processing (on the order of 10) that constitute a deep hierarchy in contrast to the flat vision architectures predominantly used in today's mainstream computer vision. We hope that the functional description of the deep hierarchies realized in the primate visual system provides valuable insights for the design of computer vision algorithms, fostering increasingly productive interaction between biological and computer vision research.
Affiliation(s)
- Norbert Krüger
- Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Campusvej 55, Odense M 5230, Denmark.
10
Turk M. A New Biased Discriminant Analysis Using Composite Vectors for Eye Detection. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 2012; 42:1095-106. [PMID: 22410345] [DOI: 10.1109/tsmcb.2012.2186798] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
We propose a new biased discriminant analysis (BDA) using composite vectors for eye detection. A composite vector consists of several pixels inside a window on an image. The covariance of composite vectors is obtained from their inner product and can be considered as a generalization of the covariance of pixels. The proposed composite BDA (C-BDA) method is a BDA using the covariance of composite vectors. We construct a hybrid cascade detector for eye detection, using Haar-like features in the earlier stages and composite features obtained from C-BDA in the later stages. The proposed detector runs in real time; its execution time is 5.5 ms on a typical PC. The experimental results for the CMU PIE database and our own real-world data set show that the proposed detector provides robust performance to several kinds of variations such as facial pose, illumination, eyeglasses, and partial occlusion. On the whole, the detection rate per pair of eyes is 98.0% for the 3604 face images of the CMU PIE database and 95.1% for the 2331 face images of the real-world data set. In particular, it provides a 99.7% detection rate for the 2120 CMU PIE images without glasses. Face recognition performance is also investigated using the eye coordinates from the proposed detector. The recognition results for the real-world data set show that the proposed detector gives similar performance to the method using manually located eye coordinates, showing that the accuracy of the proposed eye detector is comparable with that of the ground-truth data.
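The composite-vector idea can be made concrete: gather every small window of an image as one flattened vector and take the covariance of those vectors via inner products; with a 1x1 window this reduces to the ordinary pixel covariance. A schematic sketch, not the paper's C-BDA itself:

```python
import numpy as np

def composite_vectors(image, window=3):
    """Collect every `window` x `window` patch of the image as a flattened
    composite vector (each vector groups neighbouring pixels)."""
    h, w = image.shape
    patches = [image[y:y + window, x:x + window].ravel()
               for y in range(h - window + 1)
               for x in range(w - window + 1)]
    return np.array(patches)

def composite_covariance(image, window=3):
    """Sample covariance of the composite vectors, computed from their
    inner products after mean-centering."""
    X = composite_vectors(image, window)
    X = X - X.mean(axis=0)
    return X.T @ X / (len(X) - 1)
```

C-BDA then applies the biased discriminant criterion on top of this covariance; that step is omitted here.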
11
Discriminant Phase Component for Face Recognition. Journal of Electrical and Computer Engineering 2012. [DOI: 10.1155/2012/718915] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/17/2022]
Abstract
Numerous face recognition techniques have been developed owing to the growing number of real-world applications. Most current face recognition algorithms involve a considerable amount of computation and hence cannot be used on devices constrained by limited speed and memory. In this paper, we propose a novel solution to the face recognition problem for systems that have small memory capacities and demand fast performance. The new technique divides face images into components and automatically finds the discriminant phases of the Fourier transform of these components using the sequential floating forward search method. A thorough study and comprehensive experiments relating time consumption to system performance are applied to benchmark face image databases. Finally, the proposed technique is compared with other known methods and evaluated through recognition rate and computational time, achieving a recognition rate of 98.5% with a computational time of 6.4 minutes for a database of 2360 images.
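The raw ingredient of the method, the phase of the 2-D Fourier transform at selected frequencies, is easy to sketch; the selection step (sequential floating forward search) is omitted here and the index set below is arbitrary, for illustration only:

```python
import numpy as np

def phase_features(image, keep):
    """Phase of the 2-D Fourier transform at a chosen set of frequency
    indices; `keep` plays the role of the components a selection
    procedure would pick."""
    phase = np.angle(np.fft.fft2(image))
    return np.array([phase[idx] for idx in keep])

def phase_distance(f1, f2):
    """Phases are angles, so compare them on the unit circle rather than
    by raw subtraction (avoids the 2*pi wrap-around)."""
    return float(np.mean(1.0 - np.cos(f1 - f2)))
```

Keeping only a few phase components is what makes the memory footprint small: the stored template is a short vector of angles rather than an image.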
12
Mahmoud TM, Abd-El-Hafeez T, Omar A. A Highly Efficient Content Based Approach to Filter Pornography Websites. International Journal of Computer Vision and Image Processing 2012; 2:75-90. [DOI: 10.4018/ijcvip.2012010105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
With the ever-growing Web, Websites with objectionable contents such as pornography, violence, and racism have proliferated in recent years. Among offensive contents, pornography is the most harmful, affecting children's safety and causing many destructive side effects. A content filter is one or more pieces of software that work together to prevent users from viewing material found on the Internet. This paper presents an efficient content-based software system for detecting and filtering pornography images in Web pages. The proposed system runs online in the background of Internet Explorer (IE) to restrict access to pornography Web pages. Skin and face detection techniques are the main components of the proposed system. Because the proposed filter works online, the authors propose two speed-up techniques that can be used to accelerate the filtering system. The results obtained using the proposed system are compared with four commercial filtering programs. The success rate of the proposed filtering system is better than that of the considered filtering programs.
13
Chang LB, Jin Y, Zhang W, Borenstein E, Geman S. Context, Computation, and Optimal ROC Performance in Hierarchical Models. International Journal of Computer Vision 2010. [DOI: 10.1007/s11263-010-0391-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
14
Sznitman R, Jedynak B. Active testing for face detection and localization. IEEE Transactions on Pattern Analysis and Machine Intelligence 2010; 32:1914-1920. [PMID: 20479494] [DOI: 10.1109/tpami.2010.106] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/29/2023]
Abstract
We provide a novel search technique which uses a hierarchical model and a mutual information gain heuristic to efficiently prune the search space when localizing faces in images. We show exponential gains in computation over traditional sliding window approaches, while keeping similar performance levels.
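The mutual-information-gain heuristic amounts to asking, at each step, the question whose answer is expected to shrink the posterior entropy over candidate locations the most. A discrete toy version, not the paper's implementation (here a "question" is just a subset of locations to test):

```python
import math

def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)

def best_question(posterior, questions):
    """Pick the question (a subset of candidate locations) whose answer
    yields the largest expected reduction in posterior entropy."""
    h0 = entropy(posterior)
    best_gain, best_q = -1.0, None
    for q in questions:
        p_yes = sum(posterior[i] for i in q)
        p_no = 1.0 - p_yes
        if p_yes <= 0 or p_no <= 0:
            gain = 0.0                       # answer is certain: no information
        else:
            yes = [posterior[i] / p_yes for i in q]
            no = [posterior[i] / p_no
                  for i in range(len(posterior)) if i not in q]
            gain = h0 - (p_yes * entropy(yes) + p_no * entropy(no))
        if gain > best_gain:
            best_gain, best_q = gain, q
    return best_q
```

With a uniform posterior the halving question wins, which is the binary-search flavour behind the paper's exponential speed-up over sliding windows.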
Affiliation(s)
- Raphael Sznitman
- Department of Computer Science, The Johns Hopkins University, CSEB Room 136, 3400 North Charles Street, Baltimore, MD 21218, USA.
15
White AG, Cipriani PG, Kao HL, Lees B, Geiger D, Sontag E, Gunsalus KC, Piano F. Rapid and accurate developmental stage recognition of C. elegans from high-throughput image data. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2010:3089-3096. [PMID: 22053146] [DOI: 10.1109/cvpr.2010.5540065] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
We present a hierarchical principle for object recognition and its application to automatically classify developmental stages of C. elegans animals from a population of mixed stages. The object recognition machine consists of four hierarchical layers, each composed of units upon which evaluation functions output a label score, followed by a grouping mechanism that resolves ambiguities in the score by imposing local consistency constraints. Each layer then outputs groups of units, from which the units of the next layer are derived. Using this hierarchical principle, the machine builds up successively more sophisticated representations of the objects to be classified. The algorithm segments large and small objects, decomposes objects into parts, extracts features from these parts, and classifies them by SVM. We are using this system to analyze phenotypic data from C. elegans high-throughput genetic screens, and our system overcomes a previous bottleneck in image analysis by achieving near real-time scoring of image data. The system is in current use in a functioning C. elegans laboratory and has processed over two hundred thousand images for lab users.
Affiliation(s)
- Amelia G White
- Center for Genomics and Systems Biology and Department of Biology, New York University, New York, NY, USA
16
Ommer B, Buhmann JM. Learning the compositional nature of visual object categories for recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2010; 32:501-516. [PMID: 20075474] [DOI: 10.1109/tpami.2009.22] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 05/28/2023]
Abstract
Real-world scene understanding requires recognizing object categories in novel visual scenes. This paper describes a composition system that automatically learns structured, hierarchical object representations in an unsupervised manner without requiring manual segmentation or manual object localization. A central concept for learning object models in the challenging, general case of unconstrained scenes, large intraclass variations, large numbers of categories, and lacking supervision information is to exploit the compositional nature of our (visual) world. The compositional nature of visual objects significantly limits their representation complexity and renders learning of structured object models statistically and computationally tractable. We propose a robust descriptor for local image parts and show how characteristic compositions of parts can be learned that are based on an unspecific part vocabulary shared between all categories. Moreover, a Bayesian network is presented that comprises all the compositional constituents together with scene context and object shape. Object recognition is then formulated as a statistical inference problem in this probabilistic model.
Affiliation(s)
- Björn Ommer
- University of California, Berkeley, CA, USA.
17
Chen Y, Zhu LL, Yuille A, Zhang H. Unsupervised learning of probabilistic object models (POMs) for object classification, segmentation, and recognition using knowledge propagation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2009; 31:1747-1761. [PMID: 19696447] [DOI: 10.1109/tpami.2009.95] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/28/2023]
Abstract
We present a method to learn probabilistic object models (POMs) with minimal supervision, which exploit different visual cues and perform tasks such as classification, segmentation, and recognition. We formulate this as a structure induction and learning task, and our strategy is to learn and combine elementary POMs that make use of complementary image cues. We describe a novel structure induction procedure, which uses knowledge propagation to enable POMs to provide information to other POMs and "teach them" (which greatly reduces the amount of supervision required for training and speeds up the inference). In particular, we learn a POM-IP defined on Interest Points using weak supervision [1], [2] and use this to train a POM-mask, defined on regional features, which yields a combined POM that performs segmentation/localization. This combined model can be used to train POM-edgelets, defined on edgelets, which gives a full POM with improved performance on classification. We give detailed experimental analysis on large data sets for classification and segmentation with comparison to other methods. Inference takes five seconds while learning takes approximately four hours. In addition, we show that the full POM is invariant to scale and rotation of the object (for learning and inference) and can learn hybrid object classes (i.e., when there are several objects and the identity of the object in each image is unknown). Finally, we show that POMs can be used to match between different objects of the same category and hence enable object recognition.
Affiliation(s)
- Yuanhao Chen
- University of Science and Technology of China, Hefei, People's Republic of China.
18
Zhu L, Chen Y, Yuille A. Unsupervised learning of Probabilistic Grammar-Markov Models for object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence 2009; 31:114-128. [PMID: 19029550] [DOI: 10.1109/tpami.2008.67] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 05/27/2023]
Abstract
We introduce a Probabilistic Grammar-Markov Model (PGMM) which couples probabilistic context-free grammars and Markov Random Fields. These PGMMs are generative models defined over attributed features and are used to detect and classify objects in natural images. PGMMs are designed so that they can perform rapid inference, parameter learning, and the more difficult task of structure induction. PGMMs can deal with unknown 2D pose (position, orientation, and scale) in both inference and learning, and with different appearances, or aspects, of the model. The PGMMs can be learnt in an unsupervised manner, where the image can contain one of an unknown number of objects of different categories or even pure background. We first study the weakly supervised case, where each image contains an example of the (single) object of interest, and then generalize to less supervised cases. The goal of this paper is theoretical but, to provide proof of concept, we demonstrate results from this approach on a subset of the Caltech dataset (learning on a training set and evaluating on a testing set). Our results are generally comparable with the current state of the art, and our inference is performed in less than five seconds.
Affiliation(s)
- Long Zhu
- Department of Statistics, UCLA, Los Angeles, CA 90095, USA.
19
Destrero A, De Mol C, Odone F, Verri A. A sparsity-enforcing method for learning face features. IEEE Transactions on Image Processing 2009; 18:188-201. [PMID: 19095529] [DOI: 10.1109/tip.2008.2007610] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/27/2023]
Abstract
In this paper, we propose a new trainable system for selecting face features from over-complete dictionaries of image measurements. The starting point is an iterative thresholding algorithm which provides sparse solutions to linear systems of equations. Although the proposed methodology is quite general and could be applied to various image classification tasks, we focus here on the case study of face and eyes detection. For our initial representation, we adopt rectangular features in order to allow straightforward comparisons with existing techniques. For computational efficiency and memory-saving requirements, instead of implementing the full optimization scheme on tens of thousands of features, we propose a three-stage architecture which consists of first finding intermediate solutions to smaller-size optimization problems, then merging the obtained results, and then applying further selection procedures. The devised system requires the solution of a number of independent problems, and hence the necessary computations could be implemented in parallel. Experimental results obtained on both benchmark and newly acquired face and eyes images indicate that our method is a serious competitor to other feature selection schemes recently popularized in computer vision for dealing with problems of real-time object detection. A major advantage of the proposed system is that it performs well even with relatively small training sets.
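The iterative thresholding algorithm referred to here is, in its simplest form, ISTA: a gradient step on the least-squares term followed by soft-thresholding, which drives most coefficients exactly to zero. A generic sketch on synthetic data (step size, penalty, and iteration count are illustrative, not the paper's settings):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm: shrink towards zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam=0.1, iters=500):
    """Iterative soft-thresholding for the sparse linear model y ~ A x,
    minimizing 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of grad
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x + step * A.T @ (y - A @ x), step * lam)
    return x
```

In the feature-selection setting, columns of A are candidate rectangular features and the surviving nonzero entries of x name the selected features.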
Affiliation(s)
- Augusto Destrero
- Department of Computer and Information Sciences, Università di Genova, Genova, Italy.
20
Wu J, Brubaker SC, Mullin MD, Rehg JM. Fast asymmetric learning for cascade face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2008; 30:369-382. [PMID: 18195433] [DOI: 10.1109/tpami.2007.1181] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 05/25/2023]
Abstract
A cascade face detector uses a sequence of node classifiers to distinguish faces from non-faces. This paper presents a new approach to designing node classifiers in the cascade detector. Previous methods used machine learning algorithms that simultaneously select features and form ensemble classifiers. We argue that if these two parts are decoupled, we have the freedom to design a classifier that explicitly addresses the difficulties caused by the asymmetric learning goal. There are three contributions in this paper. The first is a categorization of asymmetries in the learning goal, and why they make face detection hard. The second is the Forward Feature Selection (FFS) algorithm and a fast precomputing strategy for AdaBoost. FFS and the fast AdaBoost can reduce the training time by approximately 100 and 50 times, respectively, in comparison to a naive implementation of the AdaBoost feature selection method. The last contribution is the Linear Asymmetric Classifier (LAC), a classifier that explicitly handles the asymmetric learning goal as a well-defined constrained optimization problem. We demonstrate experimentally that LAC results in improved ensemble classifier performance.
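Forward Feature Selection in the greedy spirit described, repeatedly adding the weak feature that most improves the ensemble's training accuracy, can be sketched as follows; the +/-1 vote encoding is a simplification of real Haar-feature weak classifiers, and this omits the paper's precomputing tricks that make it fast:

```python
import numpy as np

def forward_feature_selection(scores, labels, k):
    """Greedy Forward Feature Selection: repeatedly add the weak feature
    whose vote most improves training accuracy of the ensemble sum.
    scores[i, j] is feature j's +/-1 vote on sample i; labels are +/-1."""
    chosen, ensemble = [], np.zeros(len(labels))
    for _ in range(k):
        best_j, best_acc = None, -1.0
        for j in range(scores.shape[1]):
            if j in chosen:
                continue
            # Accuracy of the ensemble majority vote if feature j is added.
            acc = np.mean(np.sign(ensemble + scores[:, j]) == labels)
            if acc > best_acc:
                best_acc, best_j = acc, j
        chosen.append(best_j)
        ensemble += scores[:, best_j]
    return chosen
```

Decoupling selection (this loop) from the final node classifier is exactly what lets LAC replace the majority vote with an asymmetry-aware linear rule afterwards.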
Affiliation(s)
- Jianxin Wu
- School of Interactive Computing, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332-0760, USA.
21
Dollár P, Babenko B, Belongie S, Perona P, Tu Z. Multiple Component Learning for Object Detection. Lecture Notes in Computer Science 2008. [DOI: 10.1007/978-3-540-88688-4_16] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.4] [Indexed: 01/11/2023]
22
Carneiro G, Jepson AD. Flexible spatial configuration of local image features. IEEE Transactions on Pattern Analysis and Machine Intelligence 2007; 29:2089-2104. [PMID: 17934220] [DOI: 10.1109/tpami.2007.1126] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/25/2023]
Abstract
Local image features have been designed to be informative and repeatable under rigid transformations and illumination deformations. Even though current state-of-the-art local image features present a high degree of repeatability, their local appearance alone usually does not bring enough discriminative power to support a reliable matching, resulting in a relatively high number of mismatches in the correspondence set formed during the data association procedure. As a result, geometric filters, commonly based on global spatial configuration, have been used to reduce this number of mismatches. However, this approach presents a trade-off between the effectiveness to reject mismatches and the robustness to non-rigid deformations. In this paper, we propose two geometric filters, based on semilocal spatial configuration of local features, that are designed to be robust to non-rigid deformations and to rigid transformations, without compromising their efficacy in rejecting mismatches. We compare our methods to the Hough transform, which is an efficient and effective mismatch rejection step based on global spatial configuration of features. In these comparisons, our methods are shown to be more effective in the task of rejecting mismatches for rigid transformations and non-rigid deformations at comparable time complexity figures. Finally, we demonstrate how to integrate these methods in a probabilistic recognition system such that the final verification step uses not only the similarity between features, but also their semi-local configuration.
Affiliation(s)
- Gustavo Carneiro
- Siemens Corporate Research, Integrated Data Systems Department, Princeton, NJ 08540, USA.
23.
Ferencz A, Learned-Miller EG, Malik J. Learning to Locate Informative Features for Visual Identification. Int J Comput Vis 2007. [DOI: 10.1007/s11263-007-0093-5]
24.
25.
26.
Abstract
Pattern recognition systems that are invariant to shape, pose, lighting and texture are never sufficiently selective; they suffer a high rate of "false alarms". How are biological vision systems both invariant and selective? Specifically, how are proper arrangements of sub-patterns distinguished from the chance arrangements that defeat selectivity in artificial systems? The answer may lie in the nonlinear dynamics that characterize complex and other invariant cell types: these cells are temporarily more receptive to some inputs than to others (functional connectivity). One consequence is that pairs of such cells with overlapping receptive fields will possess a related property that might be termed functional common input. Functional common input would induce high correlation exactly when there is a match in the sub-patterns appearing in the overlapping receptive fields. These correlations, possibly expressed as a partial and highly local synchrony, would preserve the selectivity otherwise lost to invariance.
Affiliation(s)
- Stuart Geman
- Division of Applied Mathematics, Brown University Providence, RI 02912, USA.
27.
Chalmond B, Francesconi B, Herbin S. Using hidden scale for salient object detection. IEEE Transactions on Image Processing 2006; 15:2644-56. [PMID: 16948309] [DOI: 10.1109/tip.2006.877380]
Abstract
This paper describes a method for detecting salient regions in remote-sensed images, based on scale and contrast interaction. We consider the focus on salient structures as the first stage of an object detection/recognition algorithm, where the salient regions are those likely to contain objects of interest. Salient objects are modeled as spatially localized and contrasted structures with any kind of shape or size. Their detection exploits a probabilistic mixture model that takes two series of multiscale features as input, one that is more sensitive to contrast information, and one that is able to select scale. The model combines them to classify each pixel as salient or nonsalient, giving a binary segmentation of the image. The few parameters are learned with an EM-type algorithm.
Affiliation(s)
- Bernard Chalmond
- Centre de Mathématiques et de Leurs Applications, CNRS (UMR 8536), Ecole Normale Supérieure de Cachan, 94235 Cachan, France.
28.
Fergus R, Perona P, Zisserman A. Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition. Int J Comput Vis 2006. [DOI: 10.1007/s11263-006-8707-x]
29.
Fei-Fei L, Fergus R, Perona P. One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence 2006; 28:594-611. [PMID: 16566508] [DOI: 10.1109/tpami.2006.79]
Abstract
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
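The prior-to-posterior update at the heart of this approach can be sketched in one dimension with a conjugate Gaussian model, a stand-in for the paper's richer object models; `posterior_mean` and its argument names are illustrative, not the authors' API.

```python
import numpy as np

def posterior_mean(prior_mu, prior_var, obs, obs_var):
    """Conjugate Gaussian update: prior knowledge (here, from previously
    learned categories) is a density over the new category's mean, and even
    a single observed example yields a usable posterior.

    obs : (n_examples, dim) array of observations with known noise obs_var.
    """
    obs = np.atleast_2d(obs)
    n = obs.shape[0]
    # Precision-weighted combination of prior mean and sample evidence.
    post_prec = 1.0 / prior_var + n / obs_var
    post_mu = (prior_mu / prior_var + obs.sum(axis=0) / obs_var) / post_prec
    return post_mu, 1.0 / post_prec

# One example is enough to move the estimate toward the data,
# while the prior keeps it sensible.
mu, var = posterior_mean(prior_mu=0.0, prior_var=1.0,
                         obs=np.array([[2.0]]), obs_var=1.0)
```

With one observation at 2.0 and an equally confident prior at 0.0, the posterior mean lands halfway between, which is the "prior regularizes the one-shot estimate" behavior the abstract describes.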
Affiliation(s)
- Li Fei-Fei
- University of Illinois Urbana-Champaign, 405 N. Mathews Ave., MC 251, Urbana, IL 61801, USA.
30.
31.
32.
Amit Y, Koloydenko A, Niyogi P. Robust acoustic object detection. The Journal of the Acoustical Society of America 2005; 118:2634-48. [PMID: 16266183] [DOI: 10.1121/1.2011411]
Abstract
We consider a novel approach to the problem of detecting phonological objects like phonemes, syllables, or words, directly from the speech signal. We begin by defining local features in the time-frequency plane with built in robustness to intensity variations and time warping. Global templates of phonological objects correspond to the coincidence in time and frequency of patterns of the local features. These global templates are constructed by using the statistics of the local features in a principled way. The templates have clear phonetic interpretability, are easily adaptable, have built in invariances, and display considerable robustness in the face of additive noise and clutter from competing speakers. We provide a detailed evaluation of the performance of some diphone detectors and a word detector based on this approach. We also perform some phonetic classification experiments based on the edge-based features suggested here.
Affiliation(s)
- Yali Amit
- Departments of Computer Science and Statistics, The University of Chicago, Hyde Park, Chicago, Illinois 60637, USA.
33.
Abstract
We present a face detection method using spectral histograms and support vector machines (SVMs). Each image window is represented by its spectral histogram, which is a feature vector consisting of histograms of filtered images. Using statistical sampling, we show systematically that this representation groups face images together; in comparison, commonly used representations often do not exhibit this necessary and desirable property. By using an SVM trained on a set of 4500 face and 8000 nonface images, we obtain a robust classifying function for face and non-face patterns. With an effective illumination-correction algorithm, our system reliably discriminates face and nonface patterns in images under different kinds of conditions. Our method gives the best performance among recent face-detection methods on two commonly used data sets. We attribute the high performance to the desirable properties of the spectral histogram representation and the good generalization of SVMs. Several further improvements in computation time and in performance are discussed.
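A toy version of the spectral-histogram representation can be written directly from the definition above; the tiny filter bank, bin count, and naive correlation loop here are assumptions for illustration, not the authors' configuration.

```python
import numpy as np

def spectral_histogram(window, filters, n_bins=8):
    """Spectral-histogram feature: correlate the window with a bank of
    filters and concatenate the (normalized) histogram of each response.
    A real system would use a chosen bank (intensity, gradient, Laplacian,
    Gabor) and fixed bin edges shared across windows."""
    feats = []
    for f in filters:
        fh, fw = f.shape
        H = window.shape[0] - fh + 1
        W = window.shape[1] - fw + 1
        resp = np.empty((H, W))
        for i in range(H):                  # naive 'valid' 2-D correlation
            for j in range(W):
                resp[i, j] = np.sum(window[i:i + fh, j:j + fw] * f)
        hist, _ = np.histogram(resp, bins=n_bins)
        feats.append(hist / resp.size)      # each histogram sums to 1
    return np.concatenate(feats)
```

The concatenated vector is what would be fed to the SVM; because each per-filter histogram is normalized, windows of different filter-response ranges remain comparable.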
Affiliation(s)
- Christopher A Waring
- Department of Computer Science, The Florida State University, Tallahassee, FL 32306, USA.
34.
35.
Amit Y, Geman D, Fan X. A coarse-to-fine strategy for multiclass shape detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 2004; 26:1606-1621. [PMID: 15573821] [DOI: 10.1109/tpami.2004.111]
Abstract
Multiclass shape detection, in the sense of recognizing and localizing instances from multiple shape classes, is formulated as a two-step process in which local indexing primes global interpretation. During indexing a list of instantiations (shape identities and poses) is compiled, constrained only by no missed detections at the expense of false positives. Global information, such as expected relationships among poses, is incorporated afterward to remove ambiguities. This division is motivated by computational efficiency. In addition, indexing itself is organized as a coarse-to-fine search simultaneously in class and pose. This search can be interpreted as successive approximations to likelihood ratio tests arising from a simple ("naive Bayes") statistical model for the edge maps extracted from the original images. The key to constructing efficient "hypothesis tests" for multiple classes and poses is local ORing; in particular, spread edges provide imprecise but common and locally invariant features. Natural tradeoffs then emerge between discrimination and the pattern of spreading. These are analyzed mathematically within the model-based framework and the whole procedure is illustrated by experiments in reading license plates.
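The "local ORing" of spread edges admits a short sketch: a spread edge is ON wherever an edge of the same orientation occurs within a few pixels, trading discrimination for invariance to small deformations. The function below is an illustrative binary dilation over one orientation channel, not the paper's implementation.

```python
import numpy as np

def spread_edges(edge_map, spread):
    """Local ORing: OR the binary edge map with all of its shifts within
    a (2*spread+1) x (2*spread+1) window, so small pose perturbations
    still fire the same spread-edge feature."""
    out = np.zeros_like(edge_map)
    h, w = edge_map.shape
    for dy in range(-spread, spread + 1):
        for dx in range(-spread, spread + 1):
            shifted = np.zeros_like(edge_map)
            ys, yd = max(0, dy), max(0, -dy)
            xs, xd = max(0, dx), max(0, -dx)
            # Copy the map shifted by (dy, dx), clipped at the border.
            shifted[yd:h - ys, xd:w - xs] = edge_map[ys:h - yd, xs:w - xd]
            out |= shifted
    return out
```

A single edge pixel with `spread=1` turns into a 3x3 block of ON pixels, which is exactly the "imprecise but common and locally invariant" behavior the abstract describes.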
Affiliation(s)
- Yali Amit
- Department of Statistics, University of Chicago, Chicago, IL 60637, USA.
36.
Agarwal S, Awan A, Roth D. Learning to detect objects in images via a sparse, part-based representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2004; 26:1475-1490. [PMID: 15521495] [DOI: 10.1109/tpami.2004.108]
Abstract
We study the problem of detecting objects in still, gray-scale images. Our primary focus is the development of a learning-based approach to the problem that makes use of a sparse, part-based representation. A vocabulary of distinctive object parts is automatically constructed from a set of sample images of the object class of interest; images are then represented using parts from this vocabulary, together with spatial relations observed among the parts. Based on this representation, a learning algorithm is used to automatically learn to detect instances of the object class in new images. The approach can be applied to any object with distinguishable parts in a relatively fixed spatial configuration; it is evaluated here on difficult sets of real-world images containing side views of cars, and is seen to successfully detect objects in varying conditions amidst background clutter and mild occlusion. In evaluating object detection approaches, several important methodological issues arise that have not been satisfactorily addressed in previous work. A secondary focus of this paper is to highlight these issues and to develop rigorous evaluation standards for the object detection problem. A critical evaluation of our approach under the proposed standards is presented.
Affiliation(s)
- Shivani Agarwal
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave., Urbana, IL 61801, USA.
37.
Fergus R, Perona P, Zisserman A. A Visual Category Filter for Google Images. Lecture Notes in Computer Science 2004. [DOI: 10.1007/978-3-540-24670-1_19]
38.
Chi Z, Rauske PL, Margoliash D. Pattern Filtering for Detection of Neural Activity, with Examples from HVc Activity During Sleep in Zebra Finches. Neural Comput 2003; 15:2307-37. [PMID: 14511523] [DOI: 10.1162/089976603322362374]
Abstract
The detection of patterned spiking activity is important in the study of neural coding. A pattern filtering approach is developed for pattern detection under the framework of point processes, which offers flexibility in combining temporal details and firing rates. The detection combines multiple steps of filtering in a coarse-to-fine manner. Under some conditional Poisson assumptions on the spiking activity, each filtering step is equivalent to classifying by likelihood ratios all the data segments as targets or as background sequences. Unlike previous studies, where global surrogate data were used to evaluate the statistical significance of the detected patterns, a localized p-test procedure is developed, which better accounts for firing modulation and nonstationarity in spiking activity. Common temporal structures of patterned activity are learned using an entropy-based alignment procedure, without relying on metrics or pair-wise alignment. Applications of pattern filtering to single, presumptive interneurons recorded in the nucleus HVc of zebra finch are illustrated. These demonstrate a match between the auditory-evoked response to playback of the individual bird's own song and spontaneous activity during sleep. Small temporal compression or expansion, or both, is required for optimal matching of spontaneous patterns to stimulus-evoked activity.
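Under the independent-Poisson simplification, the likelihood-ratio step reduces, per candidate segment, to a simple sum over time bins. The sketch below assumes constant target and background rates per bin, which is a simplification of the paper's conditional model; the function name is illustrative.

```python
import numpy as np

def poisson_llr(counts, rate_target, rate_background):
    """Log-likelihood ratio for classifying a binned spike-count segment
    as target pattern vs. background, assuming independent Poisson bins.
    Positive values favor the target hypothesis. The log k! terms cancel
    between the two hypotheses and are omitted."""
    counts = np.asarray(counts, dtype=float)
    return float(np.sum(counts * (np.log(rate_target) - np.log(rate_background))
                        - (rate_target - rate_background)))
```

Thresholding this statistic over sliding segments is the "classify segments by likelihood ratio" filtering step; repeating it with progressively finer temporal structure gives the coarse-to-fine cascade.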
Affiliation(s)
- Zhiyi Chi
- Department of Statistics, Committee on Computational Neuroscience, University of Chicago, Chicago, IL 60637, USA.
39.
Abstract
We describe an architecture for invariant visual detection and recognition. Learning is performed in a single central module. The architecture makes use of a replica module consisting of copies of retinotopic layers of local features, with a particular design of inputs and outputs, that allows them to be primed either to attend to a particular location, or to attend to a particular object representation. In the former case the data at a selected location can be classified in the central module. In the latter case all instances of the selected object are detected in the field of view. The architecture is used to explain a number of psychophysical and physiological observations: object based attention, the different response time slopes of target detection among distractors, and observed attentional modulation of neuronal responses. We hypothesize that the organization of visual cortex in columns of neurons responding to the same feature at the same location may provide the copying architecture needed for translation invariance.
Affiliation(s)
- Yali Amit
- Department of Statistics, University of Chicago, Chicago, IL 60637, USA.
40.
Grigorescu C, Petkov N. Distance sets for shape filters and shape recognition. IEEE Transactions on Image Processing 2003; 12:1274-1286. [PMID: 18237892] [DOI: 10.1109/tip.2003.816010]
Abstract
We introduce a novel rich local descriptor of an image point, we call the (labeled) distance set, which is determined by the spatial arrangement of image features around that point. We describe a two-dimensional (2D) visual object by the set of (labeled) distance sets associated with the feature points of that object. Based on a dissimilarity measure between (labeled) distance sets and a dissimilarity measure between sets of (labeled) distance sets, we address two problems that are often encountered in object recognition: object segmentation, for which we formulate a distance sets shape filter, and shape matching. The use of the shape filter is illustrated on printed and handwritten character recognition and detection of traffic signs in complex scenes. The shape comparison procedure is illustrated on handwritten character classification, COIL-20 database object recognition and MPEG-7 silhouette database retrieval.
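A minimal unlabeled version of the distance-set descriptor, plus a naive dissimilarity, can be sketched as follows. The labeled variant and the set-to-set measure in the paper are richer; `distance_set` and `dissimilarity` are illustrative names, not the authors' code.

```python
import numpy as np

def distance_set(points, idx, n_nearest):
    """Distance set of feature point `idx`: the sorted distances to its
    n_nearest other feature points (feature labels omitted here)."""
    d = np.linalg.norm(points - points[idx], axis=1)
    d = np.delete(d, idx)                  # exclude the point itself
    return np.sort(d)[:n_nearest]

def dissimilarity(ds_a, ds_b):
    """A simple L1 comparison of two distance sets, truncated to the
    shorter one; a stand-in for the paper's dissimilarity measure."""
    n = min(len(ds_a), len(ds_b))
    return float(np.abs(ds_a[:n] - ds_b[:n]).sum())
```

Describing an object by the collection of such sets over all its feature points is what makes the representation sensitive to spatial arrangement while staying tolerant of moderate deformation.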
41.
Abstract
Shape primitives have long been proposed as components for object models in the visual system, and account for a considerable body of behavioral findings. While a large amount of effort has been devoted to the study of detection of these parts in the scenes, no research has been undertaken simulating the acquisition of these representations. We present a model which suggests how the shape primitives may be learned by experience in a self-organized fashion. This model offers the first successful unsupervised learning of shape primitives which are as complex as object parts and can serve as intermediate representations for various objects. The algorithm uses synthetic gray-level objects, each composed of several parts (primitives or else), and shape primitives emerge as a result of partial matches between several objects. Our algorithm does not use any a priori knowledge about any attributes of the patterns to be learned; and the recurrence of these visual patterns in various objects is the only basis for their emergence as new features.
Affiliation(s)
- Ladan Shams
- Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA.
42.
Abstract
A learning account for the problem of object recognition is developed within the probably approximately correct (PAC) model of learnability. The key assumption underlying this work is that objects can be recognized (or discriminated) using simple representations in terms of syntactically simple relations over the raw image. Although the potential number of these simple relations could be huge, only a few of them are actually present in each observed image, and a fairly small number of those observed are relevant to discriminating an object. We show that these properties can be exploited to yield an efficient learning approach in terms of sample and computational complexity within the PAC model. No assumptions are needed on the distribution of the observed objects, and the learning performance is quantified relative to its experience. Most important, the success of learning an object representation is naturally tied to the ability to represent it as a function of some intermediate representations extracted from the image. We evaluate this approach in a large-scale experimental study in which the SNoW learning architecture is used to learn representations for the 100 objects in the Columbia Object Image Library. Experimental results exhibit good generalization and robustness properties of the SNoW-based method relative to other approaches. SNoW's recognition rate degrades more gracefully when the training data contains fewer views, and it shows similar behavior in some preliminary experiments with partially occluded objects.
Affiliation(s)
- Dan Roth
- Department of Computer Science, Beckman Institute, Urbana, IL 61801, U.S.A.
43.
44.
On Affine Invariant Clustering and Automatic Cast Listing in Movies. Computer Vision - ECCV 2002, 2002. [DOI: 10.1007/3-540-47977-5_20]
45.
46.
Abstract
We describe a system of thousands of binary perceptrons with coarse-oriented edges as input that is able to recognize shapes, even in a context with hundreds of classes. The perceptrons have randomized feedforward connections from the input layer and form a recurrent network among themselves. Each class is represented by a prelearned attractor (serving as an associative hook) in the recurrent net corresponding to a randomly selected subpopulation of the perceptrons. In training, first the attractor of the correct class is activated among the perceptrons; then the visual stimulus is presented at the input layer. The feedforward connections are modified using field-dependent Hebbian learning with positive synapses, which we show to be stable with respect to large variations in feature statistics and coding levels and allows the use of the same threshold on all perceptrons. Recognition is based on only the visual stimuli. These activate the recurrent network, which is then driven by the dynamics to a sustained attractor state, concentrated in the correct class subset and providing a form of working memory. We believe this architecture is more transparent than standard feedforward two-layer networks and has stronger biological analogies.
Affiliation(s)
- Y Amit
- Department of Statistics, University of Chicago, Chicago, IL 60637, USA
47.
Abstract
Understanding how biological visual systems recognize objects is one of the ultimate goals in computational neuroscience. From the computational viewpoint of learning, different recognition tasks, such as categorization and identification, are similar, representing different trade-offs between specificity and invariance. Thus, the different tasks do not require different classes of models. We briefly review some recent trends in computational vision and then focus on feedforward, view-based models that are supported by psychophysical and physiological data.
Affiliation(s)
- M Riesenhuber
- Department of Brain and Cognitive Sciences, McGovern Institute for Brain Research, Center for Biological and Computational Learning and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge 02142, USA
48.
Abstract
This article describes a parallel neural net architecture for efficient and robust visual selection in generic gray-level images. Objects are represented through flexible star-type planar arrangements of binary local features which are in turn star-type planar arrangements of oriented edges. Candidate locations are detected over a range of scales and other deformations, using a generalized Hough transform. The flexibility of the arrangements provides the required invariance. Training involves selecting a small number of stable local features from a predefined pool, which are well localized on registered examples of the object. Training therefore requires only small data sets. The parallel architecture is constructed so that the Hough transform associated with any object can be implemented without creating or modifying any connections. The different object representations are learned and stored in a central module. When one of these representations is evoked, it "primes" the appropriate layers in the network so that the corresponding Hough transform is computed. Analogies between the different layers in the network and those in the visual system are discussed. Furthermore, the model can be used to explain certain experiments on visual selection reported in the literature.
Affiliation(s)
- Y Amit
- Department of Statistics, University of Chicago, IL 60637, USA
49.