nips nips2004 nips2004-192 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shai Avidan, Moshe Butman
Abstract: We give a fast rejection scheme that is based on image segments and demonstrate it on the canonical example of face detection. However, instead of focusing on the detection step we focus on the rejection step and show that our method is simple and fast to be learned, thus making it an excellent pre-processing step to accelerate standard machine learning classifiers, such as neural-networks, Bayes classifiers or SVM. We decompose a collection of face images into regions of pixels with similar behavior over the image set. The relationships between the mean and variance of image segments are used to form a cascade of rejectors that can reject over 99.8% of image patches, thus only a small fraction of the image patches must be passed to a full-scale classifier. Moreover, the training time for our method is much less than an hour, on a standard PC. The shape of the features (i.e. image segments) we use is data-driven, they are very cheap to compute and they form a very low dimensional feature space in which exhaustive search for the best features is tractable.
Reference: text
sentIndex sentText sentNum sentScore
1 We give a fast rejection scheme that is based on image segments and demonstrate it on the canonical example of face detection. [sent-5, score-1.208]
2 However, instead of focusing on the detection step we focus on the rejection step and show that our method is simple and fast to be learned, thus making it an excellent pre-processing step to accelerate standard machine learning classifiers, such as neural-networks, Bayes classifiers or SVM. [sent-6, score-0.567]
3 We decompose a collection of face images into regions of pixels with similar behavior over the image set. [sent-7, score-0.66]
4 The relationships between the mean and variance of image segments are used to form a cascade of rejectors that can reject over 99.8% [sent-8, score-1.573]
5 of image patches, thus only a small fraction of the image patches must be passed to a full-scale classifier. [sent-9, score-0.727]
6 The shape of the features (i.e. image segments) we use is data-driven; they are very cheap to compute, and they form a very low-dimensional feature space in which exhaustive search for the best features is tractable. [sent-13, score-0.428]
7 1 Introduction This work is motivated by recent advances in object detection algorithms that use a cascade of rejectors to quickly detect objects in images. [sent-14, score-0.82]
8 Instead of using a full fledged classifier on every image patch, a sequence of increasingly more complex rejectors is applied. [sent-15, score-0.827]
9 Nonface image patches will be rejected early on in the cascade, while face image patches will survive the entire cascade and will be marked as a face. [sent-16, score-1.319]
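The early-exit behaviour described above can be sketched in a few lines. This is only an illustration under assumptions: the names `survives_cascade` and `toy_cascade` are hypothetical, and the two toy stages (a brightness test and a contrast test) are placeholders, not the rejectors used in the paper.

```python
from typing import Callable, Sequence

import numpy as np

# A rejector returns True when it rejects the patch (hypothetical convention).
Rejector = Callable[[np.ndarray], bool]


def survives_cascade(patch: np.ndarray, rejectors: Sequence[Rejector]) -> bool:
    """Return True only if no rejector in the ordered cascade rejects the patch."""
    for reject in rejectors:
        if reject(patch):
            return False  # early exit: later, costlier stages are never evaluated
    return True


# Hypothetical two-stage cascade: a cheap brightness test, then a contrast test.
toy_cascade = [
    lambda p: p.mean() < 10.0,
    lambda p: p.std() < 1.0,
]
```

Only patches that pass every stage would be handed to the full classifier, so the cost of that classifier is paid on a small fraction of the input.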
10 Common to all these methods is the realization that simple and fast classifiers are enough to reject large portions of the image, leaving more time to use more sophisticated, and time consuming, classifiers on the remaining regions of the image. [sent-19, score-0.314]
11 The first is the feature space in which to work, the second is a fast method to calculate the features from the raw image data, and the third is the feature selection algorithm to use. [sent-21, score-0.599]
12 [4] suggest the maximum rejection criterion, which chooses rejectors that maximize the rejection rate of each classifier. [sent-24, score-1.262]
13 [12], that constructed the full SVM classifier first and then approximated it with a sequence of support vector rejectors that were calculated using non-linear optimization. [sent-28, score-0.655]
14 All of the above-mentioned methods need to “touch” every pixel in an image patch at least once before they can reject the image patch. [sent-29, score-0.793]
15 Viola & Jones [15], on the other hand, construct a huge feature space that consists of combined box regions that can be quickly computed from the raw pixel data using the “integral image” and use a sequential feature selection algorithm for feature selection. [sent-30, score-0.466]
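For readers unfamiliar with the “integral image”, the following is a minimal sketch of the standard summed-area-table idea: after one pass over the image, the sum of any box costs four lookups. The function names `integral_image` and `box_sum` are assumptions made for this illustration and are not taken from [15].

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table with a zero row and a zero column prepended."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def box_sum(ii: np.ndarray, top: int, left: int, bottom: int, right: int) -> float:
    """Sum of img[top:bottom, left:right] using four table lookups."""
    return float(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])
```

The extra zero row and column are a design convenience: they remove the need for boundary checks when a box touches the top or left edge.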
16 The rejectors are combined using a variant of AdaBoost [2]. [sent-31, score-0.535]
17 An important advantage of the huge feature space advocated by Viola & Jones is that now image patches can be rejected with an extremely small number of operations and there is no need to “touch” every pixel in the image patch at least once. [sent-33, score-1.095]
18 Our method offers a way to accelerate “slow” classification methods by using a preprocessing rejection step. [sent-38, score-0.423]
19 Our rejection scheme is fast to train and very effective at rejecting the vast majority of false patterns. [sent-39, score-0.539]
20 On the canonical face detection example, it took our method much less than an hour to train and it was able to reject over 99.8% [sent-40, score-0.461]
21 of the image patches, meaning that we can effectively accelerate standard classifiers by several orders of magnitude, without changing the classifier at all. [sent-41, score-0.331]
22 We take our features to be the approximated mean and variance of image segments, where every image segment consists of pixels that have similar behavior across the entire image set. [sent-43, score-1.435]
23 We use only a small number of representative pixels to calculate the approximated mean and variance, which makes our features very fast to compute during detection (in our experiments we found that our first rejector rejects almost 50% of all image patches, using just 8 pixels). [sent-46, score-1.15]
24 Finally, the number of segments we use is quite small which makes it possible to exhaustively calculate all possible rejectors based on single, pairs and triplets of segments in order to find the best rejectors in every step of the cascade. [sent-47, score-1.917]
25 This is in contrast to methods that construct a huge feature bank and use a greedy feature selection algorithm to choose “good” features from it. [sent-48, score-0.313]
26 In our experiments we train on a database that contains several thousand face images and roughly half a million non-faces in less than an hour on an average PC, and our rejection module runs at several frames per second. [sent-50, score-0.627]
27 This leads to the idea of image segmentation, that breaks an ensemble of images into regions of pixels that exhibit similar temporal behavior. [sent-55, score-0.54]
28 Given the image segmentation we take our features to be the mean and variance of each segment, giving us a very small feature space to work on (we chose to segment the face image into eight segments). [sent-56, score-1.117]
29 Unfortunately, calculating the mean and variance of an image segment requires going over all the pixels in the segment, a time consuming process. [sent-57, score-0.733]
30 However, since the segments represent similar-behaving pixels we found that we can approximate the calculation of the mean and variance of the entire segment using quite a small number of representative pixels. [sent-58, score-0.944]
31 In our experiments, four pixels were enough to adequately represent segments that contain several tens of pixels. [sent-59, score-0.531]
32 Now that we have a very small feature space to work with, and a fast way to extract features from raw pixels data we can exhaustively search for all possible combinations of single, pairs or triplets of features to find the best rejector in every stage. [sent-60, score-0.774]
33 1 Image Segments Image segments were already presented in the past [1] for the problem of classification of objects such as faces or vehicles. [sent-63, score-0.393]
34 If two pixels come from the same region of the face they are likely to have the same intensity values and hence have a strong temporal correlation. [sent-72, score-0.361]
35 We wish to find these correlations and segment the image plane into regions of pixels that have similar temporal behavior. [sent-73, score-0.625]
36 Then Ax is the intensity profile of pixel x (We address pixels with a single number because the images are represented in a scan-line vector form). [sent-76, score-0.314]
37 That is, Ax is an N -dimensional vector (where N is the number of images) that holds the intensity values of pixel x in each image in the ensemble. [sent-77, score-0.357]
38 As a result, the image-plane is segmented into several (possibly non-continuous) segments of temporally correlated pixels. [sent-81, score-0.416]
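One way to picture this step is to cluster the pixels by their intensity profiles. The sketch below uses plain k-means on the columns of the image matrix; k-means, the function name `segment_pixels`, and all of its parameters are assumptions made for illustration, since the clustering procedure is not spelled out in the sentences retained here.

```python
import numpy as np


def segment_pixels(A: np.ndarray, n_segments: int, n_iter: int = 20, seed: int = 0) -> np.ndarray:
    """Cluster pixels by their intensity profiles (the columns of A).

    A has shape (N images, P pixels). Returns a length-P array of segment labels.
    k-means is an assumption of this sketch, not necessarily the authors' choice.
    """
    profiles = A.T.astype(np.float64)                      # one row per pixel
    rng = np.random.default_rng(seed)
    centers = profiles[rng.choice(len(profiles), n_segments, replace=False)]
    labels = np.zeros(len(profiles), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every profile to every center.
        d2 = ((profiles[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for s in range(n_segments):
            members = profiles[labels == s]
            if len(members):                               # keep the old center if a cluster empties
                centers[s] = members.mean(axis=0)
    return labels
```

Because the labels depend only on profile similarity and not on pixel position, the resulting segments can be non-contiguous, as noted above.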
39 2 Finding Representative Pixels Our algorithm works by comparing the mean and variance properties of one or more image segments. [sent-84, score-0.38]
40 Unfortunately this requires touching every pixel in the image segment during test time, thus slowing the classification process considerably. [sent-85, score-0.502]
41 Therefore, during train time we find a set of representative pixels that will be used during test time. [sent-86, score-0.326]
42 Specifically, we approximate every segment in a face image with a small number of representative pixels [Figure 1: Face segmentation and representative pixels. Panel (a) title: Face segments.] [sent-87, score-1.42]
43 The face segmentation was computed using 1400 faces, each segment is marked with a different color and the segments need not be contiguous. [sent-89, score-0.693]
44 The crosses overlaid on the segments mark the representative pixels that were automatically selected by our method. [sent-90, score-0.641]
45 (b) Histogram of the difference between an approximated mean and the exact mean of a particular segment (the light blue segment on the left). [sent-91, score-0.506]
46 The histogram is peaked at zero, meaning that the representative pixels give a good approximation. [sent-92, score-0.304]
47 that approximate the mean and variance of the entire image segment. [sent-93, score-0.408]
48 Define $\mu_i(x_j)$ to be the true mean of segment i of face j, and let $\hat{\mu}_i(x_j)$ be its approximation, defined as $\hat{\mu}_i(x_j) = \frac{1}{k} \sum_{j=1}^{k} x_j$, where $\{x_j\}_{j=1}^{k}$ are a subset of pixels in segment i of pattern j. [sent-94, score-0.746]
49 We use a greedy algorithm that incrementally searches for the next representative pixel that minimizes $\sum_{j=1}^{n} (\hat{\mu}_i(x_j) - \mu_i(x_j))^2$ and adds it to the collection of representative pixels of segment i. [sent-95, score-0.677]
50 In practice we use four representative pixels per segment. [sent-96, score-0.325]
51 The representative pixels computed this way are used for computing both the approximated mean and the approximated variance of every test pattern. [sent-97, score-0.706]
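The greedy search for representative pixels can be sketched as follows. This is a minimal illustration, assuming the pixels of one segment are stored as the columns of a matrix with one row per training pattern; the name `pick_representatives` and its signature are hypothetical.

```python
import numpy as np


def pick_representatives(X: np.ndarray, k: int = 4) -> list[int]:
    """Greedily choose k columns of X whose average best tracks the full row mean.

    X has shape (n patterns, m pixels of one segment). At each step the pixel
    that minimizes the summed squared error between the approximated and the
    true segment mean is added to the chosen set.
    """
    true_mean = X.mean(axis=1)
    chosen: list[int] = []
    running_sum = np.zeros(X.shape[0])
    for _ in range(min(k, X.shape[1])):
        best_err, best_px = np.inf, -1
        for px in range(X.shape[1]):
            if px in chosen:
                continue
            approx = (running_sum + X[:, px]) / (len(chosen) + 1)
            err = float(((approx - true_mean) ** 2).sum())
            if err < best_err:
                best_err, best_px = err, px
        chosen.append(best_px)
        running_sum += X[:, best_px]
    return chosen
```

The search is greedy rather than exhaustive, so the chosen set is not guaranteed to be the best possible subset of size k.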
52 Given the representative pixels, the approximated variance $\hat{\sigma}_i(x_j)$ of segment i of pattern j is given by: $\hat{\sigma}_i(x_j) = \sum_{j=1}^{k} |x_j - \hat{\mu}_i(x_j)|$. [sent-99, score-0.516]
53 2.3 The rejection cascade We construct a rejection cascade that can quickly reject image patches, with minimal computational load. [sent-100, score-1.48]
54 Our feature space consists of the approximated mean and variance of the image segments. [sent-101, score-0.589]
55 This feature space is very fast to compute, as we need only four pixels to calculate the approximate mean and variance of the segment. [sent-103, score-0.511]
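A sketch of the feature computation at test time, assuming a flattened (scan-line) patch and a precomputed list of representative pixel indices per segment. The name `segment_features` is hypothetical, and the "variance" here is the sum of absolute deviations from the approximated mean, which is how the definition in the text reads.

```python
import numpy as np


def segment_features(patch: np.ndarray, rep_pixels: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Approximate per-segment mean and variance from representative pixels only.

    patch is a flattened (scan-line) image patch; rep_pixels[i] holds the
    indices of the representative pixels of segment i.
    """
    means = np.array([patch[idx].mean() for idx in rep_pixels])
    # Sum of absolute deviations from the approximated mean of each segment.
    variances = np.array([np.abs(patch[idx] - m).sum() for idx, m in zip(rep_pixels, means)])
    return means, variances
```

With eight segments and four representative pixels each, the full feature vector needs only 32 pixel reads per patch, and a single-segment rejector needs just four.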
56 In addition this feature space gives enough information to reject texture-less regions without the need to normalize the mean or variance of the entire image patch. [sent-105, score-0.69]
57 1 Feature rejectors Now that we have segmented every image into several segments and approximated every segment with a small number of representative pixels, we can exhaustively search for the best combination of segments that will reject the largest number of non-face images. [sent-109, score-2.204]
58 We repeat this process until the improvement in rejection is negligible. [sent-110, score-0.372]
59 faces) and N negative examples we construct the following linear rejectors and adjust the parameter θ so that they will correctly classify d · P (we use d = 0. [sent-113, score-0.597]
60 For each segment i, find a bound on its approximated mean. [sent-116, score-0.27]
61 For each segment i, find a bound on its approximated variance. [sent-120, score-0.27]
62 For each pair of segments i, j, find a bound on the difference between their approximated means. [sent-124, score-0.457]
63 For each pair of segments i, j, find a bound on the difference between their approximated variances. [sent-128, score-0.457]
64 For each triplet of segments i, j, k find a bound on the difference of the absolute difference of their approximated means. [sent-132, score-0.457]
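As an illustration of one rejector family from the list above, the sketch below fits an upper bound on the difference of approximated means for every segment pair, so that a chosen fraction of the face examples stays within the bound. The names `upper_bound` and `pairwise_mean_rejectors` are hypothetical, and `keep_fraction` stands in for the parameter d, whose value is truncated in the text and is therefore left to the caller.

```python
import math
from itertools import combinations

import numpy as np


def upper_bound(values: np.ndarray, keep_fraction: float) -> float:
    """Smallest observed theta with at least keep_fraction of values <= theta."""
    ordered = np.sort(values)
    idx = max(math.ceil(keep_fraction * len(ordered)) - 1, 0)
    return float(ordered[idx])


def pairwise_mean_rejectors(pos_means: np.ndarray, keep_fraction: float) -> dict:
    """Fit one bound on (mean_i - mean_j) per segment pair from face examples.

    pos_means has shape (P faces, S segments). A patch is rejected by the
    (i, j) rejector when its own difference exceeds the stored bound.
    """
    n_segments = pos_means.shape[1]
    return {
        (i, j): upper_bound(pos_means[:, i] - pos_means[:, j], keep_fraction)
        for i, j in combinations(range(n_segments), 2)
    }
```

The single-segment, variance, and triplet rejectors would follow the same pattern with a different feature expression; lower bounds can be fitted symmetrically by negating the values.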
65 We do not re-train rejectors after selecting a particular rejector. [sent-136, score-0.535]
66 2 Training We form the cascade of rejectors from a large pattern vs. [sent-139, score-0.743]
67 rejector binary table T, where each entry T(i, j) is 1 if rejector j rejects pattern i. [sent-140, score-0.447]
68 Because the table is binary we can store every entry in a single bit, and therefore a table of 513,000 patterns and 664 rejectors can easily fit in memory. [sent-141, score-0.694]
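The memory claim can be checked with a short calculation, followed by a toy demonstration of bit packing with NumPy. The variable names are illustrative only, and `np.packbits` is simply one convenient way to store eight entries per byte, not necessarily what the authors used.

```python
import numpy as np

n_patterns, n_rejectors = 513_000, 664
bits = n_patterns * n_rejectors          # one bit per table entry
packed_bytes = bits // 8                 # 664 is a multiple of 8, so rows pack exactly
print(packed_bytes / 1e6, "MB")

# Small demonstration of the same packing on a toy table.
T = np.array([[1, 0, 0, 1, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 0, 0, 1]], dtype=np.uint8)
packed = np.packbits(T, axis=1)          # 8 entries per byte along each row
restored = np.unpackbits(packed, axis=1)
```

The packed table comes to about 42.6 MB, compared with roughly 340 MB if each entry occupied a full byte.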
69 We then use a greedy algorithm to pick the next rejector with the highest rejection score r. [sent-142, score-0.551]
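The greedy selection over the binary table can be sketched as below. The exact definition of the rejection score r is not retained in the text above, so this sketch assumes it is the number of still-surviving non-face patterns that a rejector removes; the name `greedy_cascade` and the `min_gain` stopping parameter are likewise hypothetical.

```python
import numpy as np


def greedy_cascade(T: np.ndarray, max_stages: int, min_gain: int = 1) -> list[int]:
    """Order rejectors by how many still-surviving patterns each one removes.

    T is a boolean array; T[i, j] is True when rejector j rejects non-face
    pattern i. The score used here (new rejections) is an assumption.
    """
    alive = np.ones(T.shape[0], dtype=bool)
    order: list[int] = []
    for _ in range(max_stages):
        gains = T[alive].sum(axis=0)          # new rejections per rejector
        best = int(gains.argmax())
        if gains[best] < min_gain:            # stop when improvement is negligible
            break
        order.append(best)
        alive &= ~T[:, best]                  # drop patterns the new stage rejects
    return order
```

Because every rejector's decisions are precomputed in the table, each greedy step is a column sum over the surviving rows and no rejector needs to be re-trained.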
70 The idea of creating a rejector pool in advance was independently suggested by [16] to accelerate the Viola-Jones training time. [sent-150, score-0.29]
71 Figure 2a shows the rejection rate of this cascade on a training set of 513,000 images, as well as the number of arithmetic operations it takes. [sent-152, score-0.614]
72 Note that roughly 50% of all patterns are rejected by the first rejector using only 12 operations. [sent-153, score-0.359]
73 The y-axis is the rejection rate on a training set of about half-a-million non-faces and about 1500 faces. [sent-157, score-0.38]
74 The overall rejection rate of the feature rejectors on the training set is 88%; it drops to about 80% on the CMU+MIT database. [sent-159, score-1.004]
75 (b) Rejection rate as a function of image segmentation method. [sent-160, score-0.348]
76 We trained our system using four types of image segmentation and show the rejector. [sent-161, score-0.336]
77 We compare our image segmentation approach against naive segmentation of the image plane into horizontal blocks, vertical blocks or random segmentation. [sent-162, score-0.706]
78 In each case we trained a cascade of 21 rejectors and calculated their accumulative rejection rate on our training set. [sent-163, score-1.09]
79 Clearly working with our image segments gives the best results. [sent-164, score-0.592]
80 We wanted to confirm our intuition that indeed only meaningful regions in the image can produce such results and we therefore performed the following experiment. [sent-165, score-0.369]
81 We segmented the pixels in the image using four different methods. [sent-166, score-0.485]
82 (1) using our image segments, (2) into 8 horizontal blocks, (3) into 8 vertical blocks, and (4) into 8 randomly generated segments. [sent-167, score-0.696]
83 Figure 2b shows that image segments give the best results, by far. [sent-168, score-0.592]
84 4 Texture-less region rejection We found that the feature rejectors defined in the previous section perform poorly at rejecting texture-less regions. [sent-171, score-1.004]
85 This is because we do not perform any sort of variance normalization on the image patch, a step that will slow us down. [sent-172, score-0.367]
86 However, by now we have computed the approximated mean and variance of all the image segments and we can construct rejectors based on all of them to reject texture-less regions. [sent-173, score-1.553]
87 In particular we construct the following two rejectors 1. [sent-174, score-0.57]
88 Reject all image patches where the variance of all 8 approximated means falls below a threshold. [sent-175, score-0.654]
89 Reject all image patches where the variance of all 8 approximated variances falls below a threshold. [sent-182, score-0.654]
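The two texture-less rejectors can be sketched as a single test on the eight approximated means and the eight approximated variances. The function name `is_textureless`, the use of `np.var` as the spread measure, and both threshold parameters are assumptions of this sketch; the thresholds used in the paper are not given in the text above.

```python
import numpy as np


def is_textureless(means: np.ndarray, variances: np.ndarray,
                   mean_spread_min: float, var_spread_min: float) -> bool:
    """Reject when the segment statistics barely differ from one another.

    means and variances hold one approximated value per segment. The two
    thresholds are hypothetical parameters that would be tuned on training data.
    """
    low_mean_spread = np.var(means) < mean_spread_min
    low_var_spread = np.var(variances) < var_spread_min
    return bool(low_mean_spread or low_var_spread)
```

Both tests reuse statistics that the cascade has already computed, so they add only a handful of arithmetic operations per patch.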
90 8% of the image patches in the image, leaving only a handful of image patches to be tested by a “slow”, full scale classifier. [sent-195, score-0.858]
91 3 Experiments We have tested our rejection scheme on the standard CMU+MIT database [13]. [sent-199, score-0.37]
92 We calculate the approximated mean and variance only when they are needed, to save time. [sent-202, score-0.302]
93 8% of the image patches, while correctly detecting 93% of the faces. [sent-204, score-0.282]
94 On average the feature rejectors rejected roughly 80% of all image patches, the textureless region rejectors rejected an additional 10% of the image patches, the linear rejectors rejected an additional 5% and the multi-detection heuristic rejected the remaining image patterns. [sent-205, score-3.017]
95 This is not enough for face detection, as there are roughly 615,000 image patches per image in the CMU+MIT database, and our rejector cascade passes, on average, 870 false positive image patches, per image. [sent-208, score-1.505]
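As a quick consistency check, the two per-image figures quoted above imply a rejection rate that can be computed directly. The variable names are illustrative only.

```python
patches_per_image = 615_000      # average patches per image, as quoted above
survivors_per_image = 870        # average patches passed on by the cascade

rejection_rate = 1.0 - survivors_per_image / patches_per_image
print(f"{rejection_rate:.2%}")
```

This works out to roughly 99.86%, in line with the "over 99.8%" figure reported elsewhere in the paper.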
96 We have also experimented with rescaling the features, instead of rescaling the image, but noted that the number of false positives increased by about 5% for every fixed detection rate we tried (All the results reported here use image pyramids). [sent-212, score-0.507]
97 4 Summary and Conclusions We presented a fast rejection scheme that is based on image segments and demonstrated it on the canonical example of face detection. [sent-213, score-1.208]
98 Image segments are made of regions of pixels with similar behavior over the image set. [sent-214, score-0.812]
99 The shape of the features (i.e. image segments) we use is data-driven and they are very cheap to compute. The relationships between the mean and variance of image segments are used to form a cascade of rejectors that can reject over 99.8% [sent-217, score-1.853]
100 of the image patches, thus only a small fraction of the image patches must be passed to a full-scale classifier. [sent-218, score-0.727]
wordName wordTfidf (topN-words)
[('rejectors', 0.535), ('rejection', 0.347), ('segments', 0.337), ('image', 0.255), ('cascade', 0.175), ('pixels', 0.173), ('rejector', 0.172), ('patches', 0.163), ('segment', 0.15), ('face', 0.146), ('reject', 0.146), ('rejected', 0.134), ('representative', 0.131), ('approximated', 0.12), ('feature', 0.089), ('variance', 0.082), ('accelerate', 0.076), ('fast', 0.074), ('detection', 0.07), ('therefor', 0.067), ('classi', 0.063), ('false', 0.062), ('exhaustively', 0.061), ('segmentation', 0.06), ('pixel', 0.06), ('faces', 0.056), ('passed', 0.054), ('xj', 0.051), ('hour', 0.051), ('cmu', 0.051), ('regions', 0.047), ('ers', 0.046), ('triplets', 0.046), ('rejects', 0.046), ('ax', 0.045), ('temporally', 0.043), ('mean', 0.043), ('intensity', 0.042), ('formally', 0.042), ('viola', 0.041), ('object', 0.04), ('patch', 0.04), ('images', 0.039), ('avidan', 0.038), ('keren', 0.038), ('features', 0.037), ('every', 0.037), ('segmented', 0.036), ('construct', 0.035), ('falls', 0.034), ('rate', 0.033), ('heisele', 0.033), ('rejecting', 0.033), ('sung', 0.033), ('elad', 0.033), ('pattern', 0.033), ('greedy', 0.032), ('jones', 0.032), ('patterns', 0.031), ('operations', 0.031), ('huge', 0.031), ('rectangles', 0.03), ('approaching', 0.03), ('copenhagen', 0.03), ('schneiderman', 0.03), ('consuming', 0.03), ('slow', 0.03), ('er', 0.029), ('calculate', 0.029), ('entire', 0.028), ('save', 0.028), ('arithmetic', 0.028), ('romdhani', 0.028), ('touch', 0.028), ('blocks', 0.028), ('rows', 0.028), ('nd', 0.028), ('correctly', 0.027), ('rowley', 0.027), ('raw', 0.026), ('ensemble', 0.026), ('canonical', 0.026), ('realization', 0.025), ('cheap', 0.025), ('shai', 0.025), ('rescaling', 0.025), ('horizontal', 0.025), ('repeat', 0.025), ('entry', 0.024), ('scheme', 0.023), ('vertical', 0.023), ('et', 0.023), ('vision', 0.022), ('leaving', 0.022), ('search', 0.022), ('roughly', 0.022), ('train', 0.022), ('four', 0.021), ('suggested', 0.021), ('pool', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 192 nips-2004-The power of feature clustering: An application to object detection
2 0.13944407 68 nips-2004-Face Detection --- Efficient and Rank Deficient
Author: Wolf Kienzle, Matthias O. Franz, Bernhard Schölkopf, Gökhan H. Bakir
Abstract: This paper proposes a method for computing fast approximations to support vector decision functions in the field of object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller, so-called reduced set of synthesized input space points. In contrast to the existing method that finds the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic points such that the resulting approximations can be evaluated via separable filters. For applications that require scanning large images, this decreases the computational complexity by a significant amount. Experimental results show that in face detection, rank deficient approximations are 4 to 6 times faster than unconstrained reduced set systems. 1
3 0.13736753 99 nips-2004-Learning Hyper-Features for Visual Identification
Author: Andras D. Ferencz, Erik G. Learned-miller, Jitendra Malik
Abstract: We address the problem of identifying specific instances of a class (cars) from a set of images all belonging to that class. Although we cannot build a model for any particular instance (as we may be provided with only one “training” example of it), we can use information extracted from observing other members of the class. We pose this task as a learning problem, in which the learner is given image pairs, labeled as matching or not, and must discover which image features are most consistent for matching instances and discriminative for mismatches. We explore a patch based representation, where we model the distributions of similarity measurements defined on the patches. Finally, we describe an algorithm that selects the most salient patches based on a mutual information criterion. This algorithm performs identification well for our challenging dataset of car images, after matching only a few, well chosen patches. 1
4 0.12103352 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models
Author: Margarita Osadchy, Matthew L. Miller, Yann L. Cun
Abstract: We describe a novel method for real-time, simultaneous multi-view face detection and facial pose estimation. The method employs a convolutional network to map face images to points on a manifold, parametrized by pose, and non-face images to points far from that manifold. This network is trained by optimizing a loss function of three variables: image, pose, and face/non-face label. We test the resulting system, in a single configuration, on three standard data sets – one for frontal pose, one for rotated faces, and one for profiles – and find that its performance on each set is comparable to previous multi-view face detectors that can only handle one form of pose variation. We also show experimentally that the system’s accuracy on both face detection and pose estimation is improved by training for the two tasks together.
5 0.11590319 162 nips-2004-Semi-Markov Conditional Random Fields for Information Extraction
Author: Sunita Sarawagi, William W. Cohen
Abstract: We describe semi-Markov conditional random fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semiCRF on an input sequence x outputs a “segmentation” of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements xi of x. Importantly, features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian. In spite of this additional power, exact learning and inference algorithms for semi-CRFs are polynomial-time—often only a small constant factor slower than conventional CRFs. In experiments on five named entity recognition problems, semi-CRFs generally outperform conventional CRFs. 1
6 0.11099521 144 nips-2004-Parallel Support Vector Machines: The Cascade SVM
7 0.11029239 121 nips-2004-Modeling Nonlinear Dependencies in Natural Images using Mixture of Laplacian Distribution
8 0.1030404 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images
9 0.098360904 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
10 0.095219523 47 nips-2004-Contextual Models for Object Detection Using Boosted Random Fields
11 0.090097755 160 nips-2004-Seeing through water
12 0.089613594 40 nips-2004-Common-Frame Model for Object Recognition
13 0.087818295 191 nips-2004-The Variational Ising Classifier (VIC) Algorithm for Coherently Contaminated Data
14 0.084801875 44 nips-2004-Conditional Random Fields for Object Recognition
15 0.077690661 85 nips-2004-Instance-Based Relevance Feedback for Image Retrieval
16 0.064418726 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics
17 0.062045772 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters
18 0.058391824 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
19 0.058277108 139 nips-2004-Optimal Aggregation of Classifiers and Boosting Maps in Functional Magnetic Resonance Imaging
20 0.057851598 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
topicId topicWeight
[(0, -0.179), (1, 0.057), (2, -0.084), (3, -0.175), (4, 0.131), (5, 0.079), (6, 0.108), (7, -0.139), (8, -0.05), (9, 0.023), (10, -0.133), (11, 0.013), (12, -0.022), (13, -0.004), (14, -0.088), (15, -0.043), (16, -0.058), (17, 0.168), (18, 0.075), (19, -0.082), (20, 0.114), (21, 0.004), (22, -0.13), (23, -0.074), (24, 0.023), (25, 0.101), (26, 0.016), (27, -0.04), (28, -0.079), (29, -0.014), (30, 0.141), (31, -0.024), (32, -0.003), (33, -0.017), (34, 0.063), (35, -0.136), (36, -0.101), (37, 0.004), (38, -0.024), (39, 0.031), (40, -0.038), (41, -0.022), (42, 0.048), (43, 0.051), (44, 0.02), (45, 0.065), (46, -0.064), (47, -0.046), (48, 0.056), (49, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.96305048 192 nips-2004-The power of feature clustering: An application to object detection
2 0.69026899 68 nips-2004-Face Detection --- Efficient and Rank Deficient
Author: Wolf Kienzle, Matthias O. Franz, Bernhard Schölkopf, Gökhan H. Bakir
Abstract: This paper proposes a method for computing fast approximations to support vector decision functions in the field of object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller, so-called reduced set of synthesized input space points. In contrast to the existing method that finds the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic points such that the resulting approximations can be evaluated via separable filters. For applications that require scanning large images, this decreases the computational complexity by a significant amount. Experimental results show that in face detection, rank deficient approximations are 4 to 6 times faster than unconstrained reduced set systems. 1
3 0.68141538 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models
Author: Margarita Osadchy, Matthew L. Miller, Yann L. Cun
Abstract: We describe a novel method for real-time, simultaneous multi-view face detection and facial pose estimation. The method employs a convolutional network to map face images to points on a manifold, parametrized by pose, and non-face images to points far from that manifold. This network is trained by optimizing a loss function of three variables: image, pose, and face/non-face label. We test the resulting system, in a single configuration, on three standard data sets – one for frontal pose, one for rotated faces, and one for profiles – and find that its performance on each set is comparable to previous multi-view face detectors that can only handle one form of pose variation. We also show experimentally that the system’s accuracy on both face detection and pose estimation is improved by training for the two tasks together.
4 0.67413288 99 nips-2004-Learning Hyper-Features for Visual Identification
Author: Andras D. Ferencz, Erik G. Learned-miller, Jitendra Malik
Abstract: We address the problem of identifying specific instances of a class (cars) from a set of images all belonging to that class. Although we cannot build a model for any particular instance (as we may be provided with only one “training” example of it), we can use information extracted from observing other members of the class. We pose this task as a learning problem, in which the learner is given image pairs, labeled as matching or not, and must discover which image features are most consistent for matching instances and discriminative for mismatches. We explore a patch based representation, where we model the distributions of similarity measurements defined on the patches. Finally, we describe an algorithm that selects the most salient patches based on a mutual information criterion. This algorithm performs identification well for our challenging dataset of car images, after matching only a few, well chosen patches. 1
5 0.66596413 191 nips-2004-The Variational Ising Classifier (VIC) Algorithm for Coherently Contaminated Data
Author: Oliver Williams, Andrew Blake, Roberto Cipolla
Abstract: There has been substantial progress in the past decade in the development of object classifiers for images, for example of faces, humans and vehicles. Here we address the problem of contaminations (e.g. occlusion, shadows) in test images which have not explicitly been encountered in training data. The Variational Ising Classifier (VIC) algorithm models contamination as a mask (a field of binary variables) with a strong spatial coherence prior. Variational inference is used to marginalize over contamination and obtain robust classification. In this way the VIC approach can turn a kernel classifier for clean data into one that can tolerate contamination, without any specific training on contaminated positives. 1
6 0.6113683 199 nips-2004-Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)
7 0.60053545 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images
8 0.52955347 40 nips-2004-Common-Frame Model for Object Recognition
9 0.52262533 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images
10 0.51485831 47 nips-2004-Contextual Models for Object Detection Using Boosted Random Fields
11 0.48035622 85 nips-2004-Instance-Based Relevance Feedback for Image Retrieval
12 0.44812259 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification
13 0.44803786 14 nips-2004-A Topographic Support Vector Machine: Classification Using Local Label Configurations
14 0.44240639 205 nips-2004-Who's In the Picture
15 0.44088846 162 nips-2004-Semi-Markov Conditional Random Fields for Information Extraction
16 0.43010509 44 nips-2004-Conditional Random Fields for Object Recognition
17 0.42096215 121 nips-2004-Modeling Nonlinear Dependencies in Natural Images using Mixture of Laplacian Distribution
18 0.41242459 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
19 0.40297115 53 nips-2004-Discriminant Saliency for Visual Recognition from Cluttered Scenes
20 0.40200359 81 nips-2004-Implicit Wiener Series for Higher-Order Image Analysis
topicId topicWeight
[(13, 0.062), (15, 0.146), (17, 0.047), (26, 0.067), (31, 0.021), (33, 0.17), (35, 0.015), (39, 0.025), (50, 0.036), (59, 0.272), (76, 0.012), (94, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.81489003 192 nips-2004-The power of feature clustering: An application to object detection
Author: Shai Avidan, Moshe Butman
Abstract: We give a fast rejection scheme that is based on image segments and demonstrate it on the canonical example of face detection. However, instead of focusing on the detection step we focus on the rejection step and show that our method is simple and fast to learn, thus making it an excellent pre-processing step to accelerate standard machine learning classifiers, such as neural networks, Bayes classifiers or SVM. We decompose a collection of face images into regions of pixels with similar behavior over the image set. The relationships between the mean and variance of image segments are used to form a cascade of rejectors that can reject over 99.8% of image patches, so only a small fraction of the image patches must be passed to a full-scale classifier. Moreover, the training time for our method is much less than an hour on a standard PC. The shape of the features (i.e. image segments) we use is data-driven; they are very cheap to compute, and they form a very low-dimensional feature space in which exhaustive search for the best features is tractable.
2 0.7709282 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
Author: Saharon Rosset, Robert Tibshirani, Ji Zhu, Trevor J. Hastie
Abstract: In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
3 0.67492467 68 nips-2004-Face Detection --- Efficient and Rank Deficient
Author: Wolf Kienzle, Matthias O. Franz, Bernhard Schölkopf, Gökhan H. Bakir
Abstract: This paper proposes a method for computing fast approximations to support vector decision functions in the field of object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller, so-called reduced set of synthesized input space points. In contrast to the existing method that finds the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic points such that the resulting approximations can be evaluated via separable filters. For applications that require scanning large images, this decreases the computational complexity by a significant amount. Experimental results show that in face detection, rank deficient approximations are 4 to 6 times faster than unconstrained reduced set systems.
4 0.66921794 130 nips-2004-Newscast EM
Author: Wojtek Kowalczyk, Nikos A. Vlassis
Abstract: We propose a gossip-based distributed algorithm for Gaussian mixture learning, Newscast EM. The algorithm operates on network topologies where each node observes a local quantity and can communicate with other nodes in an arbitrary point-to-point fashion. The main difference between Newscast EM and the standard EM algorithm is that the M-step in our case is implemented in a decentralized manner: (random) pairs of nodes repeatedly exchange their local parameter estimates and combine them by (weighted) averaging. We provide theoretical evidence and demonstrate experimentally that, under this protocol, nodes converge exponentially fast to the correct estimates in each M-step of the EM algorithm.
5 0.66139209 70 nips-2004-Following Curved Regularized Optimization Solution Paths
Author: Saharon Rosset
Abstract: Regularization plays a central role in the analysis of modern data, where non-regularized fitting is likely to lead to over-fitted models, useless for both prediction and interpretation. We consider the design of incremental algorithms which follow paths of regularized solutions, as the regularization varies. These approaches often result in methods which are both efficient and highly flexible. We suggest a general path-following algorithm based on second-order approximations, prove that under mild conditions it remains “very close” to the path of optimal solutions and illustrate it with examples.
6 0.65415025 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
7 0.65403134 19 nips-2004-An Application of Boosting to Graph Classification
8 0.65367877 161 nips-2004-Self-Tuning Spectral Clustering
9 0.65330672 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
10 0.6532141 178 nips-2004-Support Vector Classification with Input Data Uncertainty
11 0.65245664 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
12 0.65231097 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform
13 0.65100724 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
14 0.65085655 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
15 0.65029222 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
16 0.65021694 69 nips-2004-Fast Rates to Bayes for Kernel Machines
17 0.64964849 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
18 0.6496042 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill
19 0.64748621 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
20 0.64699668 103 nips-2004-Limits of Spectral Clustering