nips nips2007 nips2007-56 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Duan Tran, David A. Forsyth
Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder.
Reference: text
sentIndex sentText sentNum sentScore
1 Fair discriminative pedestrian finders are now available. [sent-7, score-0.609]
2 In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. [sent-8, score-0.82]
3 We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. [sent-11, score-0.85]
4 We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. [sent-12, score-0.194]
5 We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder. [sent-13, score-0.609]
6 1 Introduction Very accurate pedestrian detectors are an important technical goal; approximately half-a-million pedestrians are killed by cars each year (1997 figures, in [1]). [sent-14, score-0.847]
7 At relatively low resolution, pedestrians tend to have a characteristic appearance. [sent-15, score-0.213]
8 In these cases, one will see either a “lollipop” shape — the torso is wider than the legs, which are together in the stance phase of the walk — or a “scissor” shape — where the legs are swinging in the walk. [sent-17, score-0.212]
9 Their method is based on a comprehensive study of features and their effects on performance for the pedestrian detection problem. [sent-21, score-0.769]
10 Each window is decomposed into overlapping blocks (large spatial domains) of cells (smaller spatial domains). [sent-24, score-0.197]
11 In each block, a histogram of gradient directions (or edge orientations) is computed for each cell with a measure of histogram “energy”. [sent-25, score-0.116]
12 The detection window is tiled with an overlapping grid. [sent-27, score-0.252]
13 Recently, Sabzmeydani and Mori [15] reported improved results by using AdaBoost to select shapelet features (triplets of location, direction and strength of local average gradient responses in different directions). [sent-37, score-0.143]
14 A key difficulty with pedestrian detection is that detectors must work on human configurations not often seen in datasets. [sent-38, score-0.787]
15 There is some evidence (figure 1) that less common configurations present real difficulties for very good current pedestrian detectors (our reimplementation of Dalal and Triggs’ work [9]). [sent-40, score-0.634]
16 Configuration estimates result in our method producing fewer false negatives than our implementation of Dalal and Triggs does. [sent-42, score-0.117]
17 We conjecture that a configuration estimate can avoid problems with occlusion or contrast failure because the configuration estimate reduces noise and the detector can use lower detection thresholds. [sent-44, score-0.291]
18 1 Configuration and Parts Detecting pedestrians with templates most likely works because pedestrians appear in a relatively limited range of configurations and views (e. [sent-46, score-0.478]
19 It appears certain that using the architecture of constructing features for whole image windows and then throwing the result into a classifier could be used to build a person-finder for arbitrary configurations and arbitrary views only with a major engineering effort. [sent-50, score-0.297]
20 In particular, people are made of body segments which individually have a quite simple structure, and these segments are connected into a kinematic structure which is quite well understood. [sent-53, score-0.303]
21 All this suggests finding people by finding the parts and then reasoning about their layout — essentially, building templates with complex internal kinematics. [sent-54, score-0.121]
22 Discriminative approaches use classifiers to detect parts, then reason about configuration [11]. [sent-57, score-0.119]
23 If one has a video sequence, part appearance can itself be learned [19, 20]; more recently, Ramanan has shown knowledge of articulation properties gives an appearance model in a single image [21]. [sent-59, score-0.213]
24 Codebook approaches avoid explicitly modelling body segments, and instead use unsupervised methods to find part decompositions that are good for recognition (rather than disarticulation) [25]. [sent-61, score-0.132]
25 We compute segment features by placing a box around some vertices (as in the head), or pairs of vertices (as in the torso and leg). [sent-68, score-0.309]
26 Histogram features are then computed for base points referred to the box coordinate frame; the histogram is shifted by the orientation of the box axis (section 3) within the rectified box. [sent-69, score-0.157]
27 On the far right, a window showing the color key for our structure learning points; dark green is a foot, green a knee, dark purple the other foot, purple the other knee, etc. [sent-70, score-0.225]
28 Note that structure learning is capable of distinguishing left legs (green points) from right legs (pink points). [sent-71, score-0.284]
29 2 Configuration Estimation and Structure Learning We are presented with a window within which may lie a pedestrian. [sent-73, score-0.141]
30 We would like to be able to estimate the most likely configuration for any pedestrian present. [sent-74, score-0.574]
31 Our research hypothesis is that this estimate will improve pedestrian detector performance by reducing the amount of noise the final detector must cope with; essentially, the segmentation of the pedestrian is improved from a window to a (rectified) figure. [sent-75, score-1.649]
32 We follow convention (established by [26]) and model the configuration of a person as a tree model of segments (figure 2), with a score of segment quality and a score of segment-segment configuration. [sent-76, score-0.204]
33 Our configuration estimation procedure will use dynamic programming to extract the best configuration estimate from a set of scores depending on the location of vertices on the body model. [sent-78, score-0.15]
34 However, we do not know which features are most effective at estimating segment location; this is a well established difficulty in the literature [16]. [sent-79, score-0.165]
35 Structure learning is a method that uses a series of correct examples to estimate appropriate weightings of features relative to one another to produce a score that is effective at estimating configuration [27, 28]. [sent-80, score-0.114]
36 There is a variety of sensible choices of features for identifying body segments, but there is little evidence that a particular choice of features is best; different choices of W may lead to quite different behaviours. [sent-83, score-0.264]
37 In particular, we will collect a wide range of features likely to identify segments well in f , and wish to learn a choice of W that will give good configuration estimates. [sent-84, score-0.151]
38 3 Features There are two sets of features: first, those used for estimating configuration of a person from a window; and second, those used to determine whether a person is present conditioned on the best estimate of configuration. [sent-92, score-0.112]
39 The tree is given by the position of seven points, and encodes the head, torso and legs; arms are excluded because they are small and difficult to identify, and pedestrians can be identified without localizing arms. [sent-95, score-0.299]
40 The feature vector f (I, x; y) contains two types of feature: appearance features encode the appearance of putative segments; and geometric features encode relative and absolute configuration of the body segments. [sent-98, score-0.4]
41 However, histograms involve spatial pooling; this means that one can have many strong vertical orientations that do not join up to form a segment boundary. [sent-107, score-0.204]
42 To counter this effect, we use the local gradient features described by Ke and Sukthankar [12]. [sent-109, score-0.11]
43 To form these features, we concatenate the horizontal and vertical gradients of the patches in the segment coordinate frame, then normalize and apply PCA to reduce the number of dimensions. [sent-110, score-0.134]
44 This feature reveals whether the pattern of a body part appears at that location. [sent-112, score-0.129]
45 Generally, the features that determine configuration should also be good for determining whether a person is present or not. [sent-117, score-0.14]
46 However, a set of HOG features for the whole image window has been shown to be good at pedestrian detection [9]. [sent-118, score-0.987]
47 The support vector machine should be able to distinguish between good and bad features, so it is natural to concatenate the configuration features described above with a set of HOG features. [sent-119, score-0.11]
48 We find that these whole window features help recover from incorrect structure predictions. [sent-121, score-0.257]
49 These combined features are used in training the SVM classifier and in detection as well. [sent-122, score-0.195]
50 4 Results Dataset: We use INRIA Person, consisting of 2416 pedestrian images (1208 images with their left-right reflections) and 1218 background images for training. [sent-123, score-0.754]
51 For testing, there are 1126 pedestrian images (563 images with their left-right reflections) and 453 background images. [sent-124, score-0.694]
52 Training structure learning: we manually annotate 500 selected pedestrian images in the training set examples. [sent-125, score-0.666]
53 We use all 500 annotated examples to build the PCA spaces for each body segment. [sent-126, score-0.126]
54 For comparison, we trained structure learning for 10 rounds and for 20 rounds. [sent-130, score-0.122]
55 A persistent nuisance associated with pictorial structure models of people is the tendency of such models to place legs on top of one another. [sent-132, score-0.251]
56 However, our results suggest that if one uses absolute configuration features as well as different appearance features for left and right legs (implicit in the structure learning procedure), the left and right legs are identified correctly. [sent-134, score-0.52]
57 The conditional independence assumption (which means we cannot use the angle between the legs as a feature) does not appear to cause problems, perhaps because absolute configuration features are sufficient. [sent-135, score-0.21]
58 We use 2146 pedestrian images with 2756 window images extracted from 1218 background images. [sent-137, score-0.835]
59 We then use this classifier to scan over 1218 background images with a step size of 32 pixels and find hard examples (including false positives and true negatives of low confidence, using LibSVM [31] with the probability option). [sent-139, score-0.263]
60 Testing: We test on 1126 positive images and scan 64x128 image windows over 453 negative test images, stepping by 16 pixels, a total of 182,934 negative windows. [sent-142, score-0.33]
61 Scanning rate and comparison: Pedestrian detection systems work by scanning image windows, and presenting each window to a detector. [sent-143, score-0.461]
62 Dalal and Triggs established a methodology for evaluating pedestrian detectors, which is now quite widely used. [sent-144, score-0.574]
63 Their dataset offers a set of positive windows (where pedestrians are centered), and a set of negative images. [sent-145, score-0.349]
64 The negative images produce a pool of negative windows, and the detector is evaluated on detect rate on the positive windows and the false positive per window (FPPW) rate on the negative windows. [sent-146, score-0.753]
65 This strategy — which evaluates the detector, rather than the combination of detection and scanning — is appropriate for comparing systems that scan image windows at approximately the same high rate. [sent-147, score-0.478]
66 However, the important practical parameter for evaluating a system is the false positive per image (FPPI) rate. [sent-149, score-0.124]
67 If one has a detector that does not require a pedestrian to be centered in the image window, then one can obtain the same detect rate while scanning fewer image windows. [sent-150, score-1.222]
68 To date, this issue has not arisen, because pedestrian detectors have required pedestrians to be centered. [sent-152, score-0.847]
69 Left: a comparison of our method with the best detector of Dalal and Triggs, and the detector of Sabzmeydani and Mori, on the basis of FPPW rate. [sent-154, score-0.36]
70 This comparison ignores the fact that we can look at fewer image windows without loss of system sensitivity. [sent-155, score-0.244]
71 With 20 rounds of structure learning, our detector easily outperforms that of Dalal and Triggs. [sent-157, score-0.257]
72 Note that at high specificity, our detector is slightly more sensitive than that of Sabzmeydani and Mori, too. [sent-158, score-0.18]
73 Right: a comparison of our method with the best detector of Dalal and Triggs, and the detector of Sabzmeydani and Mori, on the basis of FPPI rate. [sent-159, score-0.36]
74 This comparison takes into account the fact that we can look at fewer image windows (by a factor of four). [sent-160, score-0.244]
75 The low variance in the detect rate under this procedure shows that our detector is highly insensitive to the configuration of the pedestrian within a window. [sent-163, score-0.908]
76 If one evaluates on the basis of false positives per image — which is likely the most important practical parameter — our system easily outperforms the state of the art. [sent-164, score-0.124]
77 1 The Effect of Configuration Estimates Figure 3 compares our detector with that of Dalal and Triggs, and of Sabzmeydani and Mori on the basis of detect and FPPW rates. [sent-166, score-0.299]
78 We plot detect rate against FPPW rate for the three detectors. [sent-167, score-0.189]
79 We scan images at steps of 16 pixels (rather than 8 pixels for Dalal and Triggs and Sabzmeydani and Mori). [sent-170, score-0.177]
80 This means that we scan four times fewer windows than they do. [sent-171, score-0.224]
81 If we can establish that the detect rate is not significantly affected by big offsets in pedestrian position, then we expect a large advantage in FPPI rate. [sent-172, score-0.728]
82 We evaluate the effect on the detect rate of scanning by large steps by a process of sampling. [sent-173, score-0.251]
83 Each positive example is replaced by a total of 256 replicates, obtained by offsetting the image window by steps in the range -7 to 8 in x and y (figure 4). [sent-174, score-0.218]
84 A tendency of the detector to require centered pedestrians would appear as variance in the reported detect rate. [sent-178, score-0.544]
85 The FPPI rate of the detector is not affected by this procedure, which evaluates only the spatial tuning of the detector. [sent-179, score-0.243]
86 In color, original positive examples from the INRIA test set; next to each are three of the replicates we use to determine the effect on our detection system of scanning relatively few windows, or, equivalently, the effect on our detector of not having a pedestrian centered in the window. [sent-181, score-1.024]
87 Figure 3 compares system performance, combining detect and scanning rates, by plotting detect rate against FPPI rate. [sent-184, score-0.37]
88 We show four evaluation runs for our system; there is no evidence of substantial variance in detect rate. [sent-185, score-0.119]
89 Our system shows a very substantial increase in detect rate at fixed FPPI rate. [sent-186, score-0.154]
90 5 Discussion There is a difficulty with the evaluation methodology for pedestrian detection established by Dalal and Triggs (and widely followed). [sent-187, score-0.685]
91 A pedestrian detector that tests windows cannot find more pedestrians than there are windows. [sent-188, score-1.103]
92 This does not usually affect the interpretation of precision and recall statistics because the windows are closely packed. [sent-189, score-0.136]
93 However, in our method, because a pedestrian need not be centered in the window to be detected, the windows need not be closely packed, and there is a possibility of undercounting pedestrians who stand too close together. [sent-190, score-1.096]
94 We believe that this does not occur in our current method, because our window spacing is narrow relative to the width of a pedestrian. [sent-191, score-0.141]
95 First, they result in a detector that is relatively insensitive to the placement of a pedestrian in an image window, meaning one can look at fewer image windows to obtain the same detect rate, with consequent advantages to the rate at which the system produces false positives. [sent-198, score-1.276]
96 This is most likely because the process of estimating configurations focuses the detector on important image features (rather than pooling information over space). [sent-201, score-0.341]
97 The result would be that, when there is low contrast or a strange body configuration, the detector can use a somewhat lower detection threshold for the same FPPW rate. [sent-202, score-0.413]
98 Figure 1 shows human configurations detected by our method but not by our implementation of Dalal and Triggs; notice the predominance of either strange body configurations or low contrast. [sent-203, score-0.164]
99 Structure learning is an attractive method to determine which features are discriminative in configuration estimation, and it produces good configuration estimates in complex images. [sent-204, score-0.119]
100 Human detection based on a probabilistic assembly of robust part detectors. [sent-359, score-0.111]
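The scanning argument in sentences 79 to 83 above (a 16-pixel step instead of 8, and 256 offset replicates per positive example) can be checked with a small sketch. This is an illustration only, not the authors' code: the function names, the 640x480 image size, and the single-scale scan are assumptions made here; only the 64x128 window, the 8- and 16-pixel steps, and the -7 to 8 offset range come from the text.

```python
from itertools import product


def replicate_offsets(lo=-7, hi=8):
    """All (dx, dy) window offsets with dx and dy in [lo, hi] inclusive."""
    return list(product(range(lo, hi + 1), repeat=2))


def count_windows(img_w, img_h, win_w=64, win_h=128, step=8):
    """Number of win_w x win_h windows that fit in an image at one scale
    when the window is moved by `step` pixels in x and y."""
    if img_w < win_w or img_h < win_h:
        return 0
    nx = (img_w - win_w) // step + 1
    ny = (img_h - win_h) // step + 1
    return nx * ny


offsets = replicate_offsets()
print(len(offsets))  # 16 offsets in x times 16 in y = 256 replicates

# Hypothetical 640x480 image, chosen only to illustrate the ratio.
dense = count_windows(640, 480, step=8)
sparse = count_windows(640, 480, step=16)
print(dense, sparse, dense / sparse)
```

Doubling the step in both directions roughly quarters the number of windows; the ratio is slightly below four on a finite image because of border effects, which is consistent with the "four times fewer windows" statement as an approximation.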
wordName wordTfidf (topN-words)
[('pedestrian', 0.574), ('guration', 0.333), ('pedestrians', 0.213), ('dalal', 0.2), ('detector', 0.18), ('fppw', 0.164), ('con', 0.148), ('triggs', 0.146), ('hog', 0.143), ('window', 0.141), ('windows', 0.136), ('legs', 0.126), ('detect', 0.119), ('fppi', 0.115), ('mori', 0.115), ('detection', 0.111), ('scanning', 0.097), ('body', 0.096), ('torso', 0.086), ('features', 0.084), ('sabzmeydani', 0.082), ('segment', 0.081), ('vision', 0.08), ('image', 0.077), ('gurations', 0.073), ('appearance', 0.068), ('segments', 0.067), ('papageorgiou', 0.066), ('images', 0.06), ('detectors', 0.06), ('scan', 0.057), ('person', 0.056), ('pictorial', 0.052), ('templates', 0.052), ('forsyth', 0.049), ('sabzmaydani', 0.049), ('false', 0.047), ('rounds', 0.045), ('histogram', 0.045), ('ramanan', 0.043), ('human', 0.042), ('people', 0.041), ('negatives', 0.039), ('svm', 0.039), ('oriented', 0.039), ('wt', 0.039), ('ke', 0.036), ('recognition', 0.036), ('head', 0.035), ('orientations', 0.035), ('ii', 0.035), ('discriminative', 0.035), ('rate', 0.035), ('histograms', 0.033), ('pattern', 0.033), ('ections', 0.033), ('extremal', 0.033), ('hierachy', 0.033), ('knee', 0.033), ('mohan', 0.033), ('nders', 0.033), ('shapelet', 0.033), ('shoulders', 0.033), ('sukthankar', 0.033), ('uncommon', 0.033), ('urbana', 0.033), ('yp', 0.033), ('centered', 0.032), ('structure', 0.032), ('structured', 0.032), ('subgradient', 0.031), ('inria', 0.031), ('gure', 0.031), ('fewer', 0.031), ('detecting', 0.03), ('template', 0.03), ('examples', 0.03), ('pixels', 0.03), ('object', 0.029), ('vertices', 0.029), ('belongie', 0.029), ('hip', 0.029), ('mikolajczyk', 0.029), ('nder', 0.029), ('orientation', 0.028), ('spatial', 0.028), ('parts', 0.028), ('descriptors', 0.028), ('vertical', 0.027), ('gradient', 0.026), ('strange', 0.026), ('concatenate', 0.026), ('recti', 0.026), ('transportation', 0.026), ('chamfer', 0.026), ('purple', 0.026), ('felzenszwalb', 0.026), ('pose', 0.026), ('scores', 0.025), ('classi', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
2 0.14844388 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
Author: Cha Zhang, Paul A. Viola
Abstract: Cascade detectors have been shown to operate extremely rapidly, with high accuracy, and have important applications such as face detection. Driven by this success, cascade learning has been an area of active research in recent years. Nevertheless, there are still challenging technical problems during the training process of cascade detectors. In particular, determining the optimal target detection rate for each stage of the cascade remains an unsolved issue. In this paper, we propose the multiple instance pruning (MIP) algorithm for soft cascades. This algorithm computes a set of thresholds which aggressively terminate computation with no reduction in detection rate or increase in false positive rate on the training dataset. The algorithm is based on two key insights: i) examples that are destined to be rejected by the complete classifier can be safely pruned early; ii) face detection is a multiple instance learning problem. The MIP process is fully automatic and requires no assumptions of probability distributions, statistical independence, or ad hoc intermediate rejection targets. Experimental results on the MIT+CMU dataset demonstrate significant performance advantages.
3 0.14096104 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database.
4 0.12878756 113 nips-2007-Learning Visual Attributes
Author: Vittorio Ferrari, Andrew Zisserman
Abstract: We present a probabilistic generative model of visual attributes, together with an efficient learning algorithm. Attributes are visual qualities of objects, such as ‘red’, ‘striped’, or ‘spotted’. The model sees attributes as patterns of image segments, repeatedly sharing some characteristic properties. These can be any combination of appearance, shape, or the layout of segments within the pattern. Moreover, attributes with general appearance are taken into account, such as the pattern of alternation of any two colors which is characteristic for stripes. To enable learning from unsegmented training images, the model is learnt discriminatively, by optimizing a likelihood ratio. As demonstrated in the experimental evaluation, our model can learn in a weakly supervised setting and encompasses a broad range of attributes. We show that attributes can be learnt starting from a text query to Google image search, and can then be used to recognize the attribute and determine its spatial extent in novel real-world images.
5 0.11483984 50 nips-2007-Combined discriminative and generative articulated pose and non-rigid shape estimation
Author: Leonid Sigal, Alexandru Balan, Michael J. Black
Abstract: Estimation of three-dimensional articulated human pose and motion from images is a central problem in computer vision. Much of the previous work has been limited by the use of crude generative models of humans represented as articulated collections of simple parts such as cylinders. Automatic initialization of such models has proved difficult and most approaches assume that the size and shape of the body parts are known a priori. In this paper we propose a method for automatically recovering a detailed parametric model of non-rigid body shape and pose from monocular imagery. Specifically, we represent the body using a parameterized triangulated mesh model that is learned from a database of human range scans. We demonstrate a discriminative method to directly recover the model parameters from monocular images using a conditional mixture of kernel regressors. This predicted pose and shape are used to initialize a generative model for more detailed pose and shape estimation. The resulting approach allows fully automatic pose and shape recovery from monocular and multi-camera imagery. Experimental results show that our method is capable of robustly recovering articulated pose, shape and biometric measurements (e.g. height, weight, etc.) in both calibrated and uncalibrated camera environments.
6 0.096829869 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
7 0.06951087 64 nips-2007-Cooled and Relaxed Survey Propagation for MRFs
8 0.067486726 171 nips-2007-Scan Strategies for Meteorological Radars
9 0.067244165 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
10 0.060378544 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events
11 0.058181696 115 nips-2007-Learning the 2-D Topology of Images
12 0.054948743 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators
13 0.054913614 183 nips-2007-Spatial Latent Dirichlet Allocation
14 0.053262383 187 nips-2007-Structured Learning with Approximate Inference
15 0.052729812 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
16 0.049591318 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines
17 0.049092948 109 nips-2007-Kernels on Attributed Pointsets with Applications
18 0.049030568 193 nips-2007-The Distribution Family of Similarity Distances
19 0.047424603 132 nips-2007-Modeling image patches with a directed hierarchy of Markov random fields
20 0.046971317 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
topicId topicWeight
[(0, -0.155), (1, 0.064), (2, -0.027), (3, -0.045), (4, 0.019), (5, 0.165), (6, 0.043), (7, 0.16), (8, 0.048), (9, -0.013), (10, -0.028), (11, -0.002), (12, -0.054), (13, -0.011), (14, -0.163), (15, 0.065), (16, -0.038), (17, 0.054), (18, -0.019), (19, -0.014), (20, -0.016), (21, 0.087), (22, 0.091), (23, 0.103), (24, -0.143), (25, -0.074), (26, -0.123), (27, -0.056), (28, 0.068), (29, 0.047), (30, -0.158), (31, 0.001), (32, -0.022), (33, 0.15), (34, 0.074), (35, 0.107), (36, -0.144), (37, 0.023), (38, 0.237), (39, -0.008), (40, 0.074), (41, 0.059), (42, 0.067), (43, 0.05), (44, 0.035), (45, 0.06), (46, 0.05), (47, -0.193), (48, 0.056), (49, -0.111)]
simIndex simValue paperId paperTitle
same-paper 1 0.96247464 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
2 0.7311852 113 nips-2007-Learning Visual Attributes
3 0.63613755 137 nips-2007-Multiple-Instance Pruning For Learning Efficient Cascade Detectors
4 0.62327635 143 nips-2007-Object Recognition by Scene Alignment
Author: Bryan Russell, Antonio Torralba, Ce Liu, Rob Fergus, William T. Freeman
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database.
5 0.53492129 50 nips-2007-Combined discriminative and generative articulated pose and non-rigid shape estimation
Author: Leonid Sigal, Alexandru Balan, Michael J. Black
Abstract: Estimation of three-dimensional articulated human pose and motion from images is a central problem in computer vision. Much of the previous work has been limited by the use of crude generative models of humans represented as articulated collections of simple parts such as cylinders. Automatic initialization of such models has proved difficult and most approaches assume that the size and shape of the body parts are known a priori. In this paper we propose a method for automatically recovering a detailed parametric model of non-rigid body shape and pose from monocular imagery. Specifically, we represent the body using a parameterized triangulated mesh model that is learned from a database of human range scans. We demonstrate a discriminative method to directly recover the model parameters from monocular images using a conditional mixture of kernel regressors. The predicted pose and shape are used to initialize a generative model for more detailed pose and shape estimation. The resulting approach allows fully automatic pose and shape recovery from monocular and multi-camera imagery. Experimental results show that our method is capable of robustly recovering articulated pose, shape and biometric measurements (e.g., height and weight) in both calibrated and uncalibrated camera environments.
6 0.53118581 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
7 0.52423549 171 nips-2007-Scan Strategies for Meteorological Radars
8 0.37399325 193 nips-2007-The Distribution Family of Similarity Distances
9 0.35074326 57 nips-2007-Congruence between model and human attention reveals unique signatures of critical visual events
10 0.34497628 196 nips-2007-The Infinite Gamma-Poisson Feature Model
11 0.34181765 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
12 0.33921036 155 nips-2007-Predicting human gaze using low-level saliency combined with face detection
13 0.32248294 81 nips-2007-Estimating disparity with confidence from energy neurons
14 0.3141692 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
15 0.31323832 71 nips-2007-Discriminative Keyword Selection Using Support Vector Machines
16 0.30160546 109 nips-2007-Kernels on Attributed Pointsets with Applications
17 0.28580385 115 nips-2007-Learning the 2-D Topology of Images
18 0.27547088 211 nips-2007-Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data
19 0.27183193 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
20 0.25571036 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
topicId topicWeight
[(4, 0.011), (5, 0.047), (13, 0.04), (16, 0.03), (18, 0.014), (21, 0.078), (26, 0.018), (31, 0.018), (34, 0.022), (35, 0.023), (47, 0.061), (49, 0.026), (83, 0.125), (85, 0.012), (87, 0.061), (90, 0.048), (93, 0.283)]
simIndex simValue paperId paperTitle
same-paper 1 0.77298766 56 nips-2007-Configuration Estimates Improve Pedestrian Finding
Author: Duan Tran, David A. Forsyth
Abstract: Fair discriminative pedestrian finders are now available. In fact, these pedestrian finders make most errors on pedestrians in configurations that are uncommon in the training data, for example, mounting a bicycle. This is undesirable. However, the human configuration can itself be estimated discriminatively using structure learning. We demonstrate a pedestrian finder which first finds the most likely human pose in the window using a discriminative procedure trained with structure learning on a small dataset. We then present features (local histogram of oriented gradient and local PCA of gradient) based on that configuration to an SVM classifier. We show, using the INRIA Person dataset, that estimates of configuration significantly improve the accuracy of a discriminative pedestrian finder.
2 0.54143983 73 nips-2007-Distributed Inference for Latent Dirichlet Allocation
Author: David Newman, Padhraic Smyth, Max Welling, Arthur U. Asuncion
Abstract: We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates; it is simple to implement and can be viewed as an approximation to a single processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across processors; it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors, and speedup experiments of learning topics in a 100-million word corpus using 16 processors.
3 0.53722274 189 nips-2007-Supervised Topic Models
Author: Jon D. Mcauliffe, David M. Blei
Abstract: We introduce supervised latent Dirichlet allocation (sLDA), a statistical model of labelled documents. The model accommodates a variety of response types. We derive a maximum-likelihood procedure for parameter estimation, which relies on variational approximations to handle intractable posterior expectations. Prediction problems motivate this research: we use the fitted model to predict response values for new documents. We test sLDA on two real-world problems: movie ratings predicted from reviews, and web page popularity predicted from text descriptions. We illustrate the benefits of sLDA versus modern regularized regression, as well as versus an unsupervised LDA analysis followed by a separate regression.
4 0.53049004 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images
Author: Bill Triggs, Jakob J. Verbeek
Abstract: Conditional Random Fields (CRFs) are an effective tool for a variety of different data segmentation and labeling tasks including visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign appropriate class labels to each region. For accurate labeling it is important to capture the global context of the image as well as local information. We introduce a CRF based scene labeling model that incorporates both local features and features aggregated over the whole image or large sections of it. Secondly, traditional CRF learning requires fully labeled datasets which can be costly and troublesome to produce. We introduce a method for learning CRFs from datasets with many unlabeled nodes by marginalizing out the unknown labels so that the log-likelihood of the known ones can be maximized by gradient ascent. Loopy Belief Propagation is used to approximate the marginals needed for the gradient and log-likelihood calculations and the Bethe free-energy approximation to the log-likelihood is monitored to control the step size. Our experimental results show that effective models can be learned from fragmentary labelings and that incorporating top-down aggregate features significantly improves the segmentations. The resulting segmentations are compared to the state-of-the-art on three different image datasets.
5 0.530424 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
Author: Kai Yu, Wei Chu
Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity.
6 0.52828127 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
7 0.52746087 18 nips-2007-A probabilistic model for generating realistic lip movements from speech
8 0.52531546 153 nips-2007-People Tracking with the Laplacian Eigenmaps Latent Variable Model
9 0.52514744 138 nips-2007-Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
10 0.52496254 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data
11 0.52440584 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
12 0.52245528 115 nips-2007-Learning the 2-D Topology of Images
13 0.52237219 69 nips-2007-Discriminative Batch Mode Active Learning
14 0.52091199 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)
15 0.52056962 196 nips-2007-The Infinite Gamma-Poisson Feature Model
16 0.52011913 2 nips-2007-A Bayesian LDA-based model for semi-supervised part-of-speech tagging
17 0.51939392 47 nips-2007-Collapsed Variational Inference for HDP
18 0.51911432 63 nips-2007-Convex Relaxations of Latent Variable Training
19 0.51901662 7 nips-2007-A Kernel Statistical Test of Independence
20 0.5186919 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes