3
|
Dollár P, Appel R, Belongie S, Perona P. Fast Feature Pyramids for Object Detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2014; 36:1532-45. [PMID: 26353336 DOI: 10.1109/tpami.2014.2300479] [Citation(s) in RCA: 273] [Impact Index Per Article: 27.3] [Indexed: 05/22/2023]
Abstract
Multi-resolution image features may be approximated via extrapolation from nearby scales, rather than being computed explicitly. This fundamental insight allows us to design object detection algorithms that are as accurate, and considerably faster, than the state-of-the-art. The computational bottleneck of many modern detectors is the computation of features at every scale of a finely-sampled image pyramid. Our key insight is that one may compute finely sampled feature pyramids at a fraction of the cost, without sacrificing performance: for a broad family of features we find that features computed at octave-spaced scale intervals are sufficient to approximate features on a finely-sampled pyramid. Extrapolation is inexpensive as compared to direct feature computation. As a result, our approximation yields considerable speedups with negligible loss in detection accuracy. We modify three diverse visual recognition systems to use fast feature pyramids and show results on both pedestrian detection (measured on the Caltech, INRIA, TUD-Brussels and ETH data sets) and general object detection (measured on the PASCAL VOC). The approach is general and is widely applicable to vision algorithms requiring fine-grained multi-scale analysis. Our approximation is valid for images with broad spectra (most natural images) and fails for images with narrow band-pass spectra (e.g., periodic textures).
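The extrapolation idea in this abstract can be illustrated with a small sketch. The abstract does not state the extrapolation rule, so the code below assumes a power-law relation between feature magnitude and scale; the exponent `lam`, the nearest-neighbour `resample` helper, and all function names are hypothetical choices made for illustration, not the authors' implementation.

```python
import math

def pyramid_scales(n_octaves, per_octave):
    """Scales 2^(-i/per_octave) of a finely sampled pyramid."""
    return [2.0 ** (-i / per_octave) for i in range(n_octaves * per_octave + 1)]

def nearest_octave(scale):
    """Closest octave-spaced scale (1, 1/2, 1/4, ...), measured in log2 space."""
    return 2.0 ** round(math.log2(scale))

def resample(channel, ratio):
    """Nearest-neighbour resampling of a 2-D list (stand-in for a real resampler)."""
    h, w = len(channel), len(channel[0])
    new_h, new_w = max(1, round(h * ratio)), max(1, round(w * ratio))
    return [[channel[min(h - 1, int(y / ratio))][min(w - 1, int(x / ratio))]
             for x in range(new_w)] for y in range(new_h)]

def extrapolate(channel, ratio, lam):
    """Approximate a feature channel at a nearby scale.

    Assumption: features follow a power law, so the resampled channel is
    multiplied by ratio**(-lam) instead of being recomputed from the image.
    """
    factor = ratio ** (-lam)
    return [[value * factor for value in row] for row in resample(channel, ratio)]
```

In this sketch, features would be computed explicitly only at the scales returned by `nearest_octave`, and every intermediate scale `s` would be filled in with `extrapolate(channel_at_octave, s / nearest_octave(s), lam)`.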
|
6
|
Gualdi G, Prati A, Cucchiara R. Multistage particle windows for fast and accurate object detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2012; 34:1589-1604. [PMID: 22184258 DOI: 10.1109/tpami.2011.247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
The common paradigm employed for object detection is the sliding window (SW) search. This approach generates grid-distributed patches, at all possible positions and sizes, which are evaluated by a binary classifier: The tradeoff between computational burden and detection accuracy is the real critical point of sliding windows; several methods have been proposed to speed up the search such as adding complementary features. We propose a paradigm that differs from any previous approach since it casts object detection into a statistical-based search using a Monte Carlo sampling for estimating the likelihood density function with Gaussian kernels. The estimation relies on a multistage strategy where the proposal distribution is progressively refined by taking into account the feedback of the classifiers. The method can be easily plugged into a Bayesian-recursive framework to exploit the temporal coherency of the target objects in videos. Several tests on pedestrian and face detection, both on images and videos, with different types of classifiers (cascade of boosted classifiers, soft cascades, and SVM) and features (covariance matrices, Haar-like features, integral channel features, and histogram of oriented gradients) demonstrate that the proposed method provides higher detection rates and accuracy as well as a lower computational burden w.r.t. sliding window detection.
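A rough sketch of the multistage idea follows: windows are sampled at random, scored, and the proposal is then narrowed around the high-scoring samples. This is a simplified resample-and-perturb scheme over 2-D positions only, not the paper's kernel density estimator; `toy_score` is a hypothetical stand-in for a real classifier response, and the particle count and the `sigmas` schedule are arbitrary assumptions.

```python
import math
import random

def toy_score(x, y, peak=(120.0, 80.0), spread=20.0):
    """Hypothetical classifier response: highest at `peak`, decaying around it."""
    d2 = (x - peak[0]) ** 2 + (y - peak[1]) ** 2
    return math.exp(-d2 / (2.0 * spread ** 2))

def multistage_search(score, width, height, n_particles=500,
                      sigmas=(10.0, 5.0, 2.5), seed=0):
    """Refine window positions over several stages using classifier feedback."""
    rng = random.Random(seed)
    # Initial stage: uniform proposal over the whole image.
    particles = [(rng.uniform(0, width), rng.uniform(0, height))
                 for _ in range(n_particles)]
    for sigma in sigmas:
        # Feedback: weight each particle by its classifier score.
        weights = [score(x, y) for x, y in particles]
        # Resample in proportion to the weights ...
        survivors = rng.choices(particles, weights=weights, k=n_particles)
        # ... and perturb with a Gaussian kernel that shrinks at each stage.
        particles = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
                     for x, y in survivors]
    return particles
```

Unlike a sliding window, the number of classifier evaluations here is fixed by the particle count and the number of stages rather than by the grid density.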
Affiliation(s)
- Giovanni Gualdi
- Department of Information Engineering, University of Modena and Reggio Emilia, Modena, Italy.
|
7
|
Turk M. A New Biased Discriminant Analysis Using Composite Vectors for Eye Detection. ACTA ACUST UNITED AC 2012; 42:1095-106. [PMID: 22410345 DOI: 10.1109/tsmcb.2012.2186798] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
We propose a new biased discriminant analysis (BDA) using composite vectors for eye detection. A composite vector consists of several pixels inside a window on an image. The covariance of composite vectors is obtained from their inner product and can be considered as a generalization of the covariance of pixels. The proposed composite BDA (C-BDA) method is a BDA using the covariance of composite vectors. We construct a hybrid cascade detector for eye detection, using Haar-like features in the earlier stages and composite features obtained from C-BDA in the later stages. The proposed detector runs in real time; its execution time is 5.5 ms on a typical PC. The experimental results for the CMU PIE database and our own real-world data set show that the proposed detector provides robust performance to several kinds of variations such as facial pose, illumination, eyeglasses, and partial occlusion. On the whole, the detection rate per pair of eyes is 98.0% for the 3604 face images of the CMU PIE database and 95.1% for the 2331 face images of the real-world data set. In particular, it provides a 99.7% detection rate for the 2120 CMU PIE images without glasses. Face recognition performance is also investigated using the eye coordinates from the proposed detector. The recognition results for the real-world data set show that the proposed detector gives similar performance to the method using manually located eye coordinates, showing that the accuracy of the proposed eye detector is comparable with that of the ground-truth data.
|
8
|
Dollár P, Appel R, Kienzle W. Crosstalk Cascades for Frame-Rate Pedestrian Detection. COMPUTER VISION – ECCV 2012 2012. [DOI: 10.1007/978-3-642-33709-3_46] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.8] [Indexed: 12/12/2022]
|
9
|
Efficient Scale and Rotation Invariant Object Detection Based on HOGs and Evolutionary Optimization Techniques. ACTA ACUST UNITED AC 2012. [DOI: 10.1007/978-3-642-33179-4_22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
|
10
|
Shen J, Sun C, Yang W, Wang Z, Sun Z. A novel distribution-based feature for rapid object detection. Neurocomputing 2011. [DOI: 10.1016/j.neucom.2011.03.032] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
|
11
|
Descampe A, De Vleeschouwer C, Vandergheynst P, Macq B. Scalable feature extraction for coarse-to-fine JPEG 2000 image classification. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2011; 20:2636-2649. [PMID: 21411407 DOI: 10.1109/tip.2011.2126584] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
In this paper, we address the issues of analyzing and classifying JPEG 2000 code-streams. An original representation, called integral volume, is first proposed to compute local image features progressively from the compressed code-stream, on any spatial image area, regardless of the code-blocks borders. Then, a JPEG 2000 classifier is presented that uses integral volumes to learn an ensemble of randomized trees. Several classification tasks are performed on various JPEG 2000 image databases and results are in the same range as the ones obtained in the literature with noncompressed versions of these databases. Finally, a cascade of such classifiers is considered, in order to specifically address the image retrieval issue, i.e., bi-class problems characterized by a highly skewed distribution. An efficient way to learn and optimize such cascade is proposed. We show that staying in a JPEG 2000 framework, initially seen as a constraint to avoid heavy decoding operations, is actually an advantage as it can benefit from the multiresolution and multilayer paradigms inherently present in this compression standard. In particular, unlike other existing cascaded retrieval systems, the features used along our cascade are increasingly discriminant and lead therefore to a better tradeoff of complexity versus performance.
Affiliation(s)
- Antonin Descampe
- Institute of Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), Universite Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium.
|
13
|
Liu D, Hua G, Chen T. A hierarchical visual model for video object summarization. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2010; 32:2178-2190. [PMID: 20975116 DOI: 10.1109/tpami.2010.31] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/30/2023]
Abstract
We propose a novel method for removing irrelevant frames from a video given user-provided frame-level labeling for a very small number of frames. We first hypothesize a number of windows which possibly contain the object of interest, and then determine which window(s) truly contain the object of interest. Our method enjoys several favorable properties. First, compared to approaches where a single descriptor is used to describe a whole frame, each window's feature descriptor has the chance of genuinely describing the object of interest; hence it is less affected by background clutter. Second, by considering the temporal continuity of a video instead of treating frames as independent, we can hypothesize the location of the windows more accurately. Third, by infusing prior knowledge into the patch-level model, we can precisely follow the trajectory of the object of interest. This allows us to largely reduce the number of windows and hence reduce the chance of overfitting the data during learning. We demonstrate the effectiveness of the method by comparing it to several other semi-supervised learning approaches on challenging video clips.
Affiliation(s)
- David Liu
- Siemens Corporate Research, 755 College Rd. E, Princeton, NJ 08540, USA.
|
14
|
Chang LB, Jin Y, Zhang W, Borenstein E, Geman S. Context, Computation, and Optimal ROC Performance in Hierarchical Models. Int J Comput Vis 2010. [DOI: 10.1007/s11263-010-0391-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
|
15
|
Sznitman R, Jedynak B. Active testing for face detection and localization. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2010; 32:1914-1920. [PMID: 20479494 DOI: 10.1109/tpami.2010.106] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/29/2023]
Abstract
We provide a novel search technique which uses a hierarchical model and a mutual information gain heuristic to efficiently prune the search space when localizing faces in images. We show exponential gains in computation over traditional sliding window approaches, while keeping similar performance levels.
Affiliation(s)
- Raphael Sznitman
- Department of Computer Science, The Johns Hopkins University, CSEB Room 136, 3400 North Charles Street, Baltimore, MD 21218, USA.
|
16
|
Wang CW, Hunter A, Gravill N, Matusiewicz S. Real time pose recognition of covered human for diagnosis of sleep apnoea. Comput Med Imaging Graph 2010; 34:523-33. [DOI: 10.1016/j.compmedimag.2009.11.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/27/2009] [Revised: 08/07/2009] [Accepted: 11/08/2009] [Indexed: 10/20/2022]
|
18
|
Sapp B, Toshev A, Taskar B. Cascaded Models for Articulated Pose Estimation. COMPUTER VISION – ECCV 2010 2010. [DOI: 10.1007/978-3-642-15552-9_30] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Indexed: 12/12/2022]
|
19
|
Lee N, Laine AF, Smith R. Coarse to fine segmentation of Stargardt rings using an expert guided dual ellipse model. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2009; 2008:2250-3. [PMID: 19163147 DOI: 10.1109/iembs.2008.4649644] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
Computer aided diagnosis in the medical image domain requires adaptive knowledge-based models to handle uncertainty, ambiguity, and noise. We propose an expert guided coupled dual ellipse model in a coarse to fine energy minimization framework. In our approach we enforce subspace model constraints by fusing domain knowledge and model information to guide the segmentation process on the fly. We apply our method to the task of retinal Stargardt segmentation, a disease that manifests itself in a ring-like structure around the macula. Quantitative evaluations on synthetic and real data sets show the performance of our framework. Experimental results demonstrate that our framework performs well, with an area under the ROC curve of 0.93.
Affiliation(s)
- Noah Lee
- Biomedical Engineering Department, Columbia University, New York, 10027 USA.
|
20
|
Destrero A, De Mol C, Odone F, Verri A. A sparsity-enforcing method for learning face features. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2009; 18:188-201. [PMID: 19095529 DOI: 10.1109/tip.2008.2007610] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/27/2023]
Abstract
In this paper, we propose a new trainable system for selecting face features from over-complete dictionaries of image measurements. The starting point is an iterative thresholding algorithm which provides sparse solutions to linear systems of equations. Although the proposed methodology is quite general and could be applied to various image classification tasks, we focus here on the case study of face and eyes detection. For our initial representation, we adopt rectangular features in order to allow straightforward comparisons with existing techniques. For computational efficiency and memory saving requirements, instead of implementing the full optimization scheme on tens of thousands of features, we propose a three-stage architecture which consists of finding first intermediate solutions to smaller size optimization problems, then merging the obtained results, and next applying further selection procedures. The devised system requires the solution of a number of independent problems, and, hence, the necessary computations could be implemented in parallel. Experimental results obtained on both benchmark and newly acquired face and eyes images indicate that our method is a serious competitor to other feature selection schemes recently popularized in computer vision for dealing with problems of real-time object detection. A major advantage of the proposed system is that it performs well even with relatively small training sets.
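The abstract does not say which variant of iterative thresholding is used. As an illustration only, the sketch below shows the common soft-thresholding form (often called ISTA), which minimizes 0.5·||Ax − b||² + lam·||x||₁. The function names, the fixed iteration count, and the choice of soft rather than hard thresholding are assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; small values become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def ista(A, b, lam, step, n_iter=100):
    """Iterative soft thresholding for a sparse solution of A x ~ b.

    `step` should not exceed 1 / ||A||^2 (squared spectral norm) for convergence.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        # Residual of the linear system at the current estimate.
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
        # Gradient step on the least-squares term.
        moved = [x[j] + step * sum(A[i][j] * residual[i] for i in range(m))
                 for j in range(n)]
        # Thresholding step: this is what drives coefficients to zero.
        x = [soft_threshold(v, step * lam) for v in moved]
    return x
```

Features whose coefficient ends at exactly zero are discarded, which is how a scheme of this kind performs feature selection.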
Affiliation(s)
- Augusto Destrero
- Department of Computer and Information Sciences, Università di Genova, Genova, Italy.
|
21
|
Cortés L, Amit Y. Efficient annotation of vesicle dynamics video microscopy. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2008; 30:1998-2010. [PMID: 18787247 DOI: 10.1109/tpami.2008.84] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/26/2023]
Abstract
We describe an algorithm for the efficient annotation of events of interest in video microscopy. The specific application involves the detection and tracking of multiple possibly overlapping vesicles in total internal reflection fluorescent microscopy images. A statistical model for the dynamic image data of vesicle configurations allows us to properly weight various hypotheses online. The goal is to find the most likely trajectories given a sequence of images. The computational challenge is addressed by defining a sequence of coarse-to-fine tests, derived from the statistical model, to quickly eliminate most candidate positions at each time frame. The computational load of the tests is initially very low and gradually increases as the false positives become more difficult to eliminate. Only at the last step, state variables are estimated from a complete time-dependent model. Processing time thus mainly depends on the number of vesicles in the image and not on image size.
Affiliation(s)
- Leandro Cortés
- Department of Computer Science, University of Chicago, 1100 E. 58th St., Chicago, IL 60637, USA.
|
23
|
Wu J, Brubaker SC, Mullin MD, Rehg JM. Fast asymmetric learning for cascade face detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2008; 30:369-382. [PMID: 18195433 DOI: 10.1109/tpami.2007.1181] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 05/25/2023]
Abstract
A cascade face detector uses a sequence of node classifiers to distinguish faces from non-faces. This paper presents a new approach to design node classifiers in the cascade detector. Previous methods used machine learning algorithms that simultaneously select features and form ensemble classifiers. We argue that if these two parts are decoupled, we have the freedom to design a classifier that explicitly addresses the difficulties caused by the asymmetric learning goal. There are three contributions in this paper. The first is a categorization of asymmetries in the learning goal, and why they make face detection hard. The second is the Forward Feature Selection (FFS) algorithm and a fast pre-computing strategy for AdaBoost. FFS and the fast AdaBoost can reduce the training time by approximately 100 and 50 times, in comparison to a naive implementation of the AdaBoost feature selection method. The last contribution is Linear Asymmetric Classifier (LAC), a classifier that explicitly handles the asymmetric learning goal as a well-defined constrained optimization problem. We demonstrated experimentally that LAC results in improved ensemble classifier performance.
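The cascade structure described in the first sentence can be sketched in a few lines. This shows only the generic early-rejection control flow, not FFS or LAC; the two node classifiers (`mean_brightness`, `contrast`) and their thresholds are hypothetical placeholders for trained node classifiers.

```python
def cascade_classify(window, stages):
    """Run `window` through a sequence of (score_fn, threshold) node classifiers.

    Returns (accepted, stages_evaluated). A window is rejected at the first
    node whose score falls below its threshold, so most non-faces cost only
    one or two cheap evaluations.
    """
    for index, (score_fn, threshold) in enumerate(stages):
        if score_fn(window) < threshold:
            return False, index + 1
    return True, len(stages)

def mean_brightness(window):
    """Hypothetical cheap first-stage feature."""
    return sum(window) / len(window)

def contrast(window):
    """Hypothetical second-stage feature."""
    return max(window) - min(window)

# Toy cascade: thresholds are arbitrary illustration values.
STAGES = [(mean_brightness, 0.3), (contrast, 0.5)]
```

The asymmetry the abstract refers to shows up here in how each threshold must be set: a node has to pass nearly every face while being allowed to pass a large share of non-faces on to later nodes.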
Affiliation(s)
- Jianxin Wu
- School of Interactive Computing, College of Computing, Georgia Institute of Technology, Atlanta, GA 30332-0760, USA.
|
25
|
Scale Invariant Action Recognition Using Compound Features Mined from Dense Spatio-temporal Corners. LECTURE NOTES IN COMPUTER SCIENCE 2008. [DOI: 10.1007/978-3-540-88682-2_18] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 02/10/2023]
|
26
|
Over EAB, Hooge ITC, Vlaskamp BNS, Erkelens CJ. Coarse-to-fine eye movement strategy in visual search. Vision Res 2007; 47:2272-80. [PMID: 17617434 DOI: 10.1016/j.visres.2007.05.002] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Received: 01/10/2007] [Revised: 04/16/2007] [Indexed: 10/23/2022]
Abstract
Oculomotor behavior contributes importantly to visual search. Saccadic eye movements can direct the fovea to potentially interesting parts of the visual field. Ensuing stable fixations enable the visual system to analyze those parts. The visual system may use fixation duration and saccadic amplitude as optimizers for visual search performance. Here we investigate whether the time courses of fixation duration and saccade amplitude depend on the subject's knowledge of the search stimulus, in particular target conspicuity. We analyzed 65,000 saccades and fixations in a search experiment for (possibly camouflaged) military vehicles of unknown type and size. Mean saccade amplitude decreased and mean fixation duration increased gradually as a function of the ordinal saccade and fixation number. In addition we analyzed 162,000 saccades and fixations recorded during a search experiment in which the location of the target was the only unknown. Whether target conspicuity was constant or varied appeared to have minor influence on the time courses of fixation duration and saccade amplitude. We hypothesize an intrinsic coarse-to-fine strategy for visual search that is even used when such a strategy is not optimal.
Affiliation(s)
- E A B Over
- Physics of Man, Helmholtz Institute, Utrecht University, The Netherlands.
|
27
|
Huang C, Ai H, Li Y, Lao S. High-performance rotation invariant multiview face detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2007; 29:671-86. [PMID: 17299224 DOI: 10.1109/tpami.2007.1011] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Indexed: 05/14/2023]
Abstract
Rotation invariant multiview face detection (MVFD) aims to detect faces with arbitrary rotation-in-plane (RIP) and rotation-off-plane (ROP) angles in still images or video sequences. MVFD is crucial as the first step in automatic face processing for general applications since face images are seldom upright and frontal unless they are taken cooperatively. In this paper, we propose a series of innovative methods to construct a high-performance rotation invariant multiview face detector, including the Width-First-Search (WFS) tree detector structure, the Vector Boosting algorithm for learning vector-output strong classifiers, the domain-partition-based weak learning method, the sparse feature in granular space, and the heuristic search for sparse feature selection. As a result of that, our multiview face detector achieves low computational complexity, broad detection scope, and high detection accuracy on both standard testing sets and real-life images.
Affiliation(s)
- Chang Huang
- Department of Computer Science and Technology, Tsinghua University, Beijing, China.
|
28
|
Lepetit V, Fua P. Keypoint recognition using randomized trees. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2006; 28:1465-79. [PMID: 16929732 DOI: 10.1109/tpami.2006.188] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.2] [Indexed: 05/11/2023]
Abstract
In many 3D object-detection and pose-estimation problems, runtime performance is of critical importance. However, there usually is time to train the system, which we will show to be very useful. Assuming that several registered images of the target object are available, we developed a keypoint-based approach that is effective in this context by formulating wide-baseline matching of keypoints extracted from the input images to those found in the model images as a classification problem. This shifts much of the computational burden to a training phase, without sacrificing recognition performance. As a result, the resulting algorithm is robust, accurate, and fast-enough for frame-rate performance. This reduction in runtime computational complexity is our first contribution. Our second contribution is to show that, in this context, a simple and fast keypoint detector suffices to support detection and tracking even under large perspective and scale variations. While earlier methods require a detector that can be expected to produce very repeatable results, in general, which usually is very time-consuming, we simply find the most repeatable object keypoints for the specific target object during the training phase. We have incorporated these ideas into a real-time system that detects planar, nonplanar, and deformable objects. It then estimates the pose of the rigid ones and the deformations of the others.
Affiliation(s)
- Vincent Lepetit
- Ecole Polytechnique Fédérale de Lausanne, Computer Vision Laboratory, CH-1015 Lausanne, Switzerland.
|
29
|
Chalmond B, Francesconi B, Herbin S. Using hidden scale for salient object detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2006; 15:2644-56. [PMID: 16948309 DOI: 10.1109/tip.2006.877380] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/11/2023]
Abstract
This paper describes a method for detecting salient regions in remote-sensed images, based on scale and contrast interaction. We consider the focus on salient structures as the first stage of an object detection/recognition algorithm, where the salient regions are those likely to contain objects of interest. Salient objects are modeled as spatially localized and contrasted structures with any kind of shape or size. Their detection exploits a probabilistic mixture model that takes two series of multiscale features as input, one that is more sensitive to contrast information, and one that is able to select scale. The model combines them to classify each pixel in salient/nonsalient class, giving a binary segmentation of the image. The few parameters are learned with an EM-type algorithm.
Affiliation(s)
- Bernard Chalmond
- Centre de Mathématiques et de Leurs Applications, CNRS (UMR 8536), Ecole Normale Supérieure de Cachan, 94235 Cachan, France.
|
30
|
Eckes C, Triesch J, von der Malsburg C. Analysis of cluttered scenes using an elastic matching approach for stereo images. Neural Comput 2006; 18:1441-71. [PMID: 16764510 DOI: 10.1162/neco.2006.18.6.1441] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/04/2022]
Abstract
We present a system for the automatic interpretation of cluttered scenes containing multiple partly occluded objects in front of unknown, complex backgrounds. The system is based on an extended elastic graph matching algorithm that allows the explicit modeling of partial occlusions. Our approach extends an earlier system in two ways. First, we use elastic graph matching in stereo image pairs to increase matching robustness and disambiguate occlusion relations. Second, we use richer feature descriptions in the object models by integrating shape and texture with color features. We demonstrate that the combination of both extensions substantially increases recognition performance. The system learns about new objects in a simple one-shot learning approach. Despite the lack of statistical information in the object models and the lack of an explicit background model, our system performs surprisingly well for this very difficult task. Our results underscore the advantages of view-based feature constellation representations for difficult object recognition problems.
Affiliation(s)
- Christian Eckes
- Fraunhofer Institute for Media Communications IMK, D-53754 Sankt Augustin, Germany.
|
32
|
Coarse-to-Fine Textures Retrieval in the JPEG 2000 Compressed Domain for Fast Browsing of Large Image Databases. ACTA ACUST UNITED AC 2006. [DOI: 10.1007/11848035_38] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
|
33
|
Amit Y, Koloydenko A, Niyogi P. Robust acoustic object detection. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2005; 118:2634-48. [PMID: 16266183 DOI: 10.1121/1.2011411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
We consider a novel approach to the problem of detecting phonological objects like phonemes, syllables, or words, directly from the speech signal. We begin by defining local features in the time-frequency plane with built in robustness to intensity variations and time warping. Global templates of phonological objects correspond to the coincidence in time and frequency of patterns of the local features. These global templates are constructed by using the statistics of the local features in a principled way. The templates have clear phonetic interpretability, are easily adaptable, have built in invariances, and display considerable robustness in the face of additive noise and clutter from competing speakers. We provide a detailed evaluation of the performance of some diphone detectors and a word detector based on this approach. We also perform some phonetic classification experiments based on the edge-based features suggested here.
Affiliation(s)
- Yali Amit
- Departments of Computer Science and Statistics, The University of Chicago, Hyde Park, Chicago, Illinois 60637, USA.
|
34
|
Abstract
We present a face detection method using spectral histograms and support vector machines (SVMs). Each image window is represented by its spectral histogram, which is a feature vector consisting of histograms of filtered images. Using statistical sampling, we show systematically that the representation groups face images together; in comparison, commonly used representations often do not exhibit this necessary and desirable property. By using an SVM trained on a set of 4500 face and 8000 nonface images, we obtain a robust classifying function for face and nonface patterns. With an effective illumination-correction algorithm, our system reliably discriminates face and nonface patterns in images under different kinds of conditions. On two commonly used data sets, our method gives the best performance among recent face-detection methods. We attribute the high performance to the desirable properties of the spectral histogram representation and good generalization of SVMs. Several further improvements in computation time and in performance are discussed.
Affiliation(s)
- Christopher A Waring
- Department of Computer Science, The Florida State University, Tallahassee, FL 32306, USA.
|
36
|
Amit Y, Geman D, Fan X. A coarse-to-fine strategy for multiclass shape detection. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2004; 26:1606-1621. [PMID: 15573821 DOI: 10.1109/tpami.2004.111] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 05/24/2023]
Abstract
Multiclass shape detection, in the sense of recognizing and localizing instances from multiple shape classes, is formulated as a two-step process in which local indexing primes global interpretation. During indexing a list of instantiations (shape identities and poses) is compiled, constrained only by no missed detections at the expense of false positives. Global information, such as expected relationships among poses, is incorporated afterward to remove ambiguities. This division is motivated by computational efficiency. In addition, indexing itself is organized as a coarse-to-fine search simultaneously in class and pose. This search can be interpreted as successive approximations to likelihood ratio tests arising from a simple ("naive Bayes") statistical model for the edge maps extracted from the original images. The key to constructing efficient "hypothesis tests" for multiple classes and poses is local ORing; in particular, spread edges provide imprecise but common and locally invariant features. Natural tradeoffs then emerge between discrimination and the pattern of spreading. These are analyzed mathematically within the model-based framework and the whole procedure is illustrated by experiments in reading license plates.
Affiliation(s)
- Yali Amit
- Department of Statistics, University of Chicago, Chicago, IL 60637, USA.
|
37
|
Chi Z, Rauske PL, Margoliash D. Pattern Filtering for Detection of Neural Activity, with Examples from HVc Activity During Sleep in Zebra Finches. Neural Comput 2003; 15:2307-37. [PMID: 14511523 DOI: 10.1162/089976603322362374] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/04/2022]
Abstract
The detection of patterned spiking activity is important in the study of neural coding. A pattern filtering approach is developed for pattern detection under the framework of point processes, which offers flexibility in combining temporal details and firing rates. The detection combines multiple steps of filtering in a coarse-to-fine manner. Under some conditional Poisson assumptions on the spiking activity, each filtering step is equivalent to classifying by likelihood ratios all the data segments as targets or as background sequences. Unlike previous studies, where global surrogate data were used to evaluate the statistical significance of the detected patterns, a localized p-test procedure is developed, which better accounts for firing modulation and nonstationarity in spiking activity. Common temporal structures of patterned activity are learned using an entropy-based alignment procedure, without relying on metrics or pair-wise alignment. Applications of pattern filtering to single, presumptive interneurons recorded in the nucleus HVc of zebra finch are illustrated. These demonstrate a match between the auditory-evoked response to playback of the individual bird's own song and spontaneous activity during sleep. Small temporal compression or expansion, or both, is required for optimal matching of spontaneous patterns to stimulus-evoked activity.
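To make the likelihood-ratio classification step concrete, here is a minimal sketch under a simplifying assumption that is not taken from the paper: spikes are binned, and each bin count is an independent Poisson variable with a known expected count under the target pattern and under the background. The function name, rates, and zero threshold are all hypothetical.

```python
import numpy as np

def poisson_log_likelihood_ratio(counts, target_rates, background_rate):
    """Log of P(counts | target) / P(counts | background) for independent Poisson bins.

    The log-factorial terms of the two Poisson likelihoods cancel, leaving
    sum_i [ n_i * log(lambda_i / mu) - (lambda_i - mu) ].
    """
    counts = np.asarray(counts, dtype=float)
    target_rates = np.asarray(target_rates, dtype=float)
    return float(np.sum(counts * np.log(target_rates / background_rate)
                        - (target_rates - background_rate)))

# Hypothetical expected spike counts per bin (assumption, for illustration only).
target_rates = [4.0, 0.5, 4.0, 0.5]   # burst, pause, burst, pause
background_rate = 1.0                  # flat background

patterned_segment = [4, 0, 5, 1]       # resembles the target pattern
flat_segment = [1, 1, 1, 1]            # resembles the background

llr_patterned = poisson_log_likelihood_ratio(patterned_segment, target_rates, background_rate)
llr_flat = poisson_log_likelihood_ratio(flat_segment, target_rates, background_rate)

# A segment is labeled "target" when its log ratio exceeds a threshold (0 here).
is_target = [llr > 0.0 for llr in (llr_patterned, llr_flat)]
```

In a coarse-to-fine scheme, a cheap test of this kind with coarse bins could discard most segments before a finer test is applied to the survivors; how the paper actually chooses its filters and thresholds is not described in the abstract.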
Affiliation(s)
- Zhiyi Chi
- Department of Statistics, Committee on Computational Neuroscience, University of Chicago, Chicago, IL 60637, USA.
|
38
|
Abstract
We describe an architecture for invariant visual detection and recognition. Learning is performed in a single central module. The architecture makes use of a replica module consisting of copies of retinotopic layers of local features, with a particular design of inputs and outputs, that allows them to be primed either to attend to a particular location, or to attend to a particular object representation. In the former case the data at a selected location can be classified in the central module. In the latter case all instances of the selected object are detected in the field of view. The architecture is used to explain a number of psychophysical and physiological observations: object based attention, the different response time slopes of target detection among distractors, and observed attentional modulation of neuronal responses. We hypothesize that the organization of visual cortex in columns of neurons responding to the same feature at the same location may provide the copying architecture needed for translation invariance.
Affiliation(s)
- Yali Amit
- Department of Statistics, University of Chicago, Chicago, IL 60637, USA.
|
39
|
|
40
|
Coughlan JM, Ferreira SJ. Finding Deformable Shapes Using Loopy Belief Propagation. COMPUTER VISION — ECCV 2002 2002. [DOI: 10.1007/3-540-47977-5_30] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.5] [Indexed: 12/29/2022]
|