nips nips2006 nips2006-122 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Deva Ramanan
Abstract: We consider the machine vision task of pose estimation from static images, specifically for the case of articulated objects. This problem is hard because of the large number of degrees of freedom to be estimated. Following an established line of research, pose estimation is framed as inference in a probabilistic model. In our experience, however, the success of many approaches often lies in the power of the features. Our primary contribution is a novel casting of visual inference as an iterative parsing process, where one sequentially learns better and better features tuned to a particular image. We show quantitative results for human pose estimation on a database of over 300 images that suggest our algorithm is competitive with or surpasses the state-of-the-art. Since our procedure is quite general (it does not rely on face or skin detection), we also use it to estimate the poses of horses in the Weizmann database.
Reference: text
sentIndex sentText sentNum sentScore
1 Learning to parse images of articulated bodies Deva Ramanan Toyota Technological Institute at Chicago Chicago, IL 60637 ramanan@tti-c.org [sent-1, score-0.384]
2 Abstract We consider the machine vision task of pose estimation from static images, specifically for the case of articulated objects. [sent-2, score-0.412]
3 Following an established line of research, pose estimation is framed as inference in a probabilistic model. [sent-4, score-0.382]
4 Our primary contribution is a novel casting of visual inference as an iterative parsing process, where one sequentially learns better and better features tuned to a particular image. [sent-6, score-0.331]
5 We show quantitative results for human pose estimation on a database of over 300 images that suggest our algorithm is competitive with or surpasses the state-of-the-art. [sent-7, score-0.451]
6 Since our procedure is quite general (it does not rely on face or skin detection), we also use it to estimate the poses of horses in the Weizmann database. [sent-8, score-0.393]
7 1 Introduction We consider the machine vision task of pose estimation from static images, specifically for the case of articulated objects. [sent-9, score-0.412]
8 Following an established line of research, pose estimation is framed as inference in a probabilistic model. [sent-11, score-0.382]
9 When reliable features can be extracted (through say, background subtraction or skin detection), approaches tend to do well. [sent-13, score-0.225]
10 Our primary contribution is a novel casting of visual inference as an iterative parsing process, where one sequentially learns better and better features tuned to a particular image. [sent-16, score-0.331]
11 Since our approach is fairly general (we do not use any skin or face detectors), we also apply it to estimate horse poses from the Weizmann dataset [1]. [sent-17, score-0.313]
12 Another practical difficulty, specifically with pose estimation, is that of reporting results. [sent-18, score-0.27]
13 This is because the posterior of body poses is often multimodal, so a single MAP/mode estimate won’t summarize it. [sent-20, score-0.345]
14 We calculate the probability of observing the actual pose under the distribution returned by our algorithm. [sent-22, score-0.27]
15 Related Work: Human pose estimation from static images is a very active research area. [sent-24, score-0.46]
16 Our work relies on the conditional random field (CRF) notion of deformable matching in [9]. [sent-26, score-0.504]
17 Our approach is related to those that simultaneously estimate pose and segment an image [7, 10, 2, 5], since we learn low-level segmentation cues to build part-specific region models. [sent-27, score-0.712]
18 We describe an iterative algorithm for pose estimation that learns a region model for each body part and for the background. [sent-32, score-0.774]
19 1 Overview Assume we are given an image of a person, who happens to be a soccer player wearing a white shirt on a green playing field (Fig. [sent-37, score-0.229]
20 We match an edge-based deformable model to the image to obtain (soft) estimates of body part positions. [sent-42, score-0.897]
21 The algorithm uses the estimated body part positions to build a rough region model for each body part and the background – it might learn that the torso is white-ish and the background is green-ish. [sent-47, score-1.092]
22 The algorithm then builds a region-based deformable model that looks for white torsos. [sent-48, score-0.54]
23 Soft estimates of body position from the new model are then used to build new region models, and the process is repeated. [sent-49, score-0.389]
24 As one might suspect, such an iterative procedure is quite sensitive to its starting point: the edge-based deformable model used for initialization and the region-based deformable model used in the first iteration prove crucial. [sent-50, score-1.25]
25 3), most of this paper deals with smart ways of building the deformable models. [sent-52, score-0.504]
26 2 Edge-based deformable model Our edge-based deformable model is an extension of the one proposed in [9]. [sent-53, score-1.08]
27 Let the location of each part li be param… Figure 2: We build a deformable pose model based on edges. (Figure labels: initial parse, missing arm, torso, head, ru-arm, ll-leg, hallucinated leg.) [sent-55, score-2.066]
28 Given an image I, we use an edge-based deformable model (middle) to compute body part locations P(L|I). [sent-56, score-0.947]
29 This defines an initial parse of the image into several body part regions (right). [sent-57, score-0.616]
30 It is easy to hallucinate extra arms or legs in the negative spaces between actual body parts (the extra leg). [sent-58, score-0.449]
31 When a body part is surrounded by clutter (the right arm), it is hard to localize. [sent-59, score-0.289]
32 The green region in between the legs is a poor leg candidate because of figure/ground cues – it groups better with the background grass. [sent-61, score-0.479]
33 Also, we can find left/right limb pairs by appealing to symmetry – if one limb is visible, we can build a model of its appearance, and use it to find the other one. [sent-62, score-0.326]
34 We operationalize both these notions by our iterative parsing procedure in Fig. [sent-63, score-0.215]
35 We define a parse to be a soft labeling of pixels into a region type (bg,torso,left lower arm, etc. [sent-66, score-0.375]
36 To exploit symmetry in appearance, we learn a single color model for left/right limb pairs. [sent-71, score-0.344]
37 We then use these masks as features for a deformable model that re-computes P(L|I). [sent-73, score-0.587]
38 Figure 4: The result of our procedure. (Panel labels: input, final parse, sample poses, best pose; part labels: torso, head, ru-arm, ll-leg.) [sent-75, score-0.863]
39 Given P(L|I) from the final iteration, we obtain a clean parse for the image. [sent-76, score-0.23]
40 We can write the deformable model as a log-linear model $P(L|I) \propto \exp\Big(\sum_{(i,j)\in E} \psi(l_i - l_j) + \sum_i \phi(l_i)\Big)$ (1). $\psi(l_i - l_j)$ corresponds to a spatial prior on the relative arrangement of part i and j. [sent-84, score-1.025]
41 $\psi(l_i - l_j) = \alpha_i^T \mathrm{bin}(l_i - l_j)$ (2). Doing so allows us to capture more intricate distributions, at the cost of having more parameters to fit. [sent-88, score-0.417]
42 Here αi is a model parameter that favors certain (relative) spatial and angular bins for part i with respect to its parent. [sent-90, score-0.235]
43 Figure 5: We record the spatial configuration of an arm given the torso by placing a grid on the torso, and noting which bin the arm falls into. [sent-91, score-0.596]
44 We center the grid at the average location of arm in the training data. [sent-92, score-0.251]
45 We likewise bin the angular orientations to define a spatial distribution of arms given torsos. [sent-93, score-0.262]
46 $\phi(l_i)$ corresponds to the local image evidence for a part, which we define as $\phi(l_i) = \beta_i^T f_i(I(l_i))$ (3). We write $f_i(I(l_i))$ for the feature vector extracted from the oriented image patch at location $l_i$. [sent-94, score-0.559]
47 Since E is a tree, we first pass “upstream” messages from part i to its parent j. We compute the message from part i to j as $m_i(l_j) \propto \sum_{l_i} \psi(l_i - l_j)\, a_i(l_i)$ (4), with $a_i(l_i) \propto \phi(l_i) \prod_{k \in \mathrm{kids}_i} m_k(l_i)$ (5). Message passing can be performed exhaustively and efficiently with convolutions. [sent-100, score-0.594]
48 If we temporarily ignore orientation and think of li = (xi , yi ), we can represent messages as 2D images. [sent-101, score-0.323]
49 The image ai is obtained by multiplying together response images from the children of part i and from the imaging model φ(li ). [sent-102, score-0.369]
50 φ(li ) can be computed by convolving the edge image with the filter βi . [sent-103, score-0.232]
51 mi (lj ) can be computed by convolving ai with a spatial filter extending over the bins from Fig. [sent-104, score-0.192]
52 This means that in practice, computing φ(li ) is the computational bottleneck, since that requires convolving the edge image repeatedly with rotated versions of filter βi . [sent-109, score-0.293]
53 Starting from the root, we can pass messages downstream from part j to part i (again with convolutions): $P(l_i|I) \propto a_i(l_i) \sum_{l_j} \psi(l_i - l_j)\, P(l_j|I)$ (6). For numerical stability, we normalize images to 1 as they are computed. [sent-110, score-0.695]
54 We label training images with body part locations L, and find the filters that maximize P(L|I) for the training set. [sent-114, score-0.432]
55 3 Building a region model One can use the marginals (for say, the head) to define a soft labeling for the image into head/nonhead pixels. [sent-117, score-0.277]
56 One can do this by repeatedly sampling a head location (according to P(li |I)) and then rendering a head at the given location and orientation. [sent-118, score-0.26]
57 Let the rendered appearance for part i be an image patch si ; we use a simple rectangular mask. [sent-119, score-0.291]
58 In the limit of infinite samples, one will obtain an image $p_i(x, y) = \sum_{x_i, y_i, \theta_i} P(x_i, y_i, \theta_i | I)\, s_i^{\theta_i}(x - x_i, y - y_i)$ (7). We call such an image a parse for part i (the images on the right from Fig. [sent-120, score-0.71]
59 Given the parse image pi , we learn a color histogram model for part i and “its” background. [sent-123, score-0.726]
60 $P(fg_i(k)) \propto \sum_{x,y} p_i(x, y)\, \delta(im(x, y) = k)$ (8), $P(bg_i(k)) \propto \sum_{x,y} (1 - p_i(x, y))\, \delta(im(x, y) = k)$ (9). We use the part-specific histogram models to label each pixel as foreground or background with a likelihood ratio test (as shown in Fig. [sent-124, score-0.252]
61 To enforce symmetry in appearance, we learn a single color model for left/right limb pairs. [sent-126, score-0.344]
62 4 Region-based deformable model After an initial parse, our algorithm has built an initial region model for each part (and its background). [sent-127, score-0.82]
63 We write the oriented patch features extracted from these label images as $f_i^r$ (for “region”-based). [sent-129, score-0.291]
64 We want to use these features to help re-estimate the pose in an image, and we use training data to learn how to do so. [sent-130, score-0.528]
65 We learn model parameters for a region-based deformable model Θr by CRF parameter estimation, as in Sec. [sent-131, score-0.656]
66 When learning Θr from training data, defining $f_i^r$ is tricky: should we use the ground-truth part locations to learn the color histogram models? [sent-133, score-0.422]
67 Doing so might be unrealistic – it assumes at “runtime”, the edge-based deformable model will always correctly estimate part locations. [sent-134, score-0.622]
68 Rather, we run the edge-based model on the training data, and use the resulting parses to learn the color histogram models. [sent-135, score-0.366]
69 When applying the region-based deformable model, we have already computed the edge responses $\phi^e(l_i) = (\beta_i^e)^T f_i^e(I(l_i))$ (to train the region model). [sent-137, score-0.677]
70 If this were the case, one would learn a zero weight for the edge feature when learning $\beta_i^r$ from training data. [sent-140, score-0.184]
71 We learn roughly equal weights for the edge and region features, indicating both cues are complementary rather than redundant. [sent-141, score-0.307]
72 Given the parse from the region-based model, we can re-learn a color model for each part and the background (and re-parse given the new models, and iterate). [sent-142, score-0.514]
73 We have amassed a dataset of 305 images of people in interesting poses (which will be available on the author’s webpage). [sent-147, score-0.376]
74 To our knowledge, it is the largest labeled dataset available for human pose recognition. [sent-149, score-0.352]
75 Evaluation: Given an image, our parsing procedure returns a distribution over poses P(L|I). [sent-151, score-0.321]
76 Ideally, we want the true pose to have a high probability, and all other poses to have a low value. [sent-152, score-0.403]
77 Given a set of T test images each with a labeled ground-truth pose $\hat{L}_t$, we score performance by computing $-\frac{1}{T} \sum_t \log P(\hat{L}_t | I_t)$. [sent-153, score-0.403]
78 Figure 6: We visualize the part models for our deformable templates – light areas correspond to positive βi weights, and dark corresponds to negative. [sent-155, score-0.707]
79 It is crucial to initialize our iterative procedure with a good edge-based deformable model. [sent-156, score-0.594]
80 Given a collection of training images with labeled body parts, one could build an edge template for each part by averaging (left); this is the standard maximum likelihood (ML) solution. [sent-157, score-0.536]
81 3 is also very crucial: we similarly learn region-based part templates $\beta_i^r$ with a CRF (right). [sent-161, score-0.25]
82 These templates focus more on region cues rather than edges. [sent-162, score-0.246]
83 These templates appear more sophisticated than rectangle-based limb detectors [8, 9] – for example, to find upper arms and legs, it seems important to emphasize the edge facing away from the body. [sent-163, score-0.388]
84 For each image, our parsing procedure returns a distribution of poses. [sent-174, score-0.188]
85 We evaluate our algorithm by looking at a perplexity-based score [11] – the negative log probability of the ground truth pose given the estimated distribution, averaged over the test set. [sent-175, score-0.302]
86 On the left, we look at the large datasets of people and horses (each with 300 images). [sent-176, score-0.231]
87 Iter0 corresponds to the distribution computed by the edge-based model, while Iter1 and Iter2 show the results after our iterative parsing with a region-based model. [sent-177, score-0.187]
88 We localize some difficult poses quite well, and furthermore, the estimated posterior P(L|I) oftentimes reflects actual ambiguity in the data (i.e., if multiple people are present). [sent-187, score-0.295]
89 The posterior pose distribution often captures the non-rigid deformations in the body. [sent-192, score-0.336]
90 This suggests we can use the uncertainty in our deformable matching algorithm to recover extra information about the object. [sent-193, score-0.541]
91 Discussion: We have described an iterative parsing approach to pose estimation. [sent-196, score-0.457]
92 Starting with an edge-based detector, we obtain an initial parse and iteratively build better features with which to subsequently parse. [sent-197, score-0.376]
93 In many cases the posterior is ambiguous because the image is (i.e., multiple people are present). [sent-207, score-0.228]
94 In particular, it may be surprising that both members of the pair in the bottom-right are recognized by the region model; this suggests that the inter-region dissimilarity learned by the color histograms is much stronger than the foreground similarity. [sent-208, score-0.269]
95 Posecut: simultaneous segmentation and 3d pose estimation of humans using dynamic graph-cuts. [sent-214, score-0.349]
96 Learning to estimate human pose with data driven belief propagation. [sent-230, score-0.309]
97 6 – the posterior can capture rich non-rigid deformations of body parts. [sent-241, score-0.245]
98 The Weizmann set of horses seems to be easier than our people dataset - we quantify this with a perplexity score in Table 1. [sent-242, score-0.35]
99 Proposal maps driven mcmc for estimating human body pose in static images. [sent-246, score-0.536]
100 Recovering human body configurations using pairwise constraints between parts. [sent-271, score-0.218]
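Illustrative sketch (not the author's code): the upstream message passing summarized in sentences 47 to 51 can be tried out on a toy problem. The sketch below assumes a 1D grid of positions, a star-shaped tree (one root "torso" with two leaf "arms"), and non-negative potentials used directly in the sum-product form of equations (4) and (5). All function names, the kernels and the numbers are hypothetical and chosen only for illustration. It shows that the message is a sliding-window correlation of the child's belief with a small spatial kernel, and that the resulting root marginal matches brute-force enumeration.

```python
from itertools import product


def normalize(values):
    """Scale non-negative numbers so they sum to 1 (messages are normalized for stability)."""
    total = sum(values)
    return [v / total for v in values]


def upstream_message(a_child, psi_kernel):
    """m_i(l_j) ∝ sum over l_i of psi(l_i - l_j) * a_i(l_i).

    psi_kernel[k] stores psi(d) for the offset d = k - R, with R = len(psi_kernel) // 2.
    Sliding the kernel over a_child is a 1D cross-correlation; child positions
    that fall off the grid contribute nothing.
    """
    n, r = len(a_child), len(psi_kernel) // 2
    message = []
    for lj in range(n):
        total = 0.0
        for k, psi_d in enumerate(psi_kernel):
            li = lj + k - r
            if 0 <= li < n:
                total += psi_d * a_child[li]
        message.append(total)
    return normalize(message)


def root_marginal(phi_root, phi_children, psi_kernels):
    """P(l_root | I) ∝ phi_root(l_root) * product of messages from the leaf children."""
    belief = list(phi_root)
    for phi_k, psi_k in zip(phi_children, psi_kernels):
        a_k = normalize(phi_k)  # a leaf has no incoming messages, so a_k ∝ phi_k
        m_k = upstream_message(a_k, psi_k)
        belief = [b * m for b, m in zip(belief, m_k)]
    return normalize(belief)


def brute_force_root_marginal(phi_root, phi_children, psi_kernels):
    """Reference answer: enumerate every joint placement of the root and all children."""
    n = len(phi_root)
    scores = [0.0] * n
    for lj in range(n):
        for placement in product(range(n), repeat=len(phi_children)):
            score = phi_root[lj]
            for li, phi_k, psi_k in zip(placement, phi_children, psi_kernels):
                r = len(psi_k) // 2
                d = li - lj
                score *= phi_k[li] * psi_k[d + r] if abs(d) <= r else 0.0
            scores[lj] += score
    return normalize(scores)


# Hypothetical local evidence on an 8-cell grid (larger means a better match).
phi_torso = [0.2, 0.1, 0.9, 0.4, 0.1, 0.3, 0.2, 0.1]
phi_arm_left = [0.1, 0.8, 0.2, 0.1, 0.1, 0.1, 0.3, 0.1]
phi_arm_right = [0.1, 0.1, 0.2, 0.7, 0.1, 0.2, 0.1, 0.1]

# Hypothetical spatial priors over offsets d = -1, 0, +1 relative to the torso.
psi_left = [1.0, 0.3, 0.05]   # left arm prefers to sit one cell left of the torso
psi_right = [0.05, 0.3, 1.0]  # right arm prefers to sit one cell right of the torso

p_torso = root_marginal(
    phi_torso, [phi_arm_left, phi_arm_right], [psi_left, psi_right]
)
print([round(p, 3) for p in p_torso])
```

In a 2D image with orientations, the inner loop would become a 2D convolution per orientation, which is why the text describes the messages as images. The exhaustive check is only feasible here because the grid is tiny.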
wordName wordTfidf (topN-words)
[('deformable', 0.504), ('pose', 0.27), ('li', 0.25), ('parse', 0.23), ('body', 0.179), ('arm', 0.176), ('lj', 0.167), ('torso', 0.14), ('leg', 0.134), ('poses', 0.133), ('horses', 0.132), ('parsing', 0.125), ('crf', 0.112), ('arms', 0.112), ('region', 0.104), ('images', 0.101), ('iter', 0.101), ('ramanan', 0.101), ('people', 0.099), ('image', 0.096), ('color', 0.096), ('head', 0.09), ('templates', 0.088), ('limb', 0.088), ('legs', 0.084), ('part', 0.082), ('learn', 0.08), ('weizmann', 0.08), ('bin', 0.071), ('appearance', 0.071), ('background', 0.07), ('skin', 0.07), ('build', 0.07), ('cvpr', 0.07), ('edge', 0.069), ('convolving', 0.067), ('fir', 0.066), ('histogram', 0.063), ('iterative', 0.062), ('im', 0.061), ('parses', 0.056), ('ai', 0.054), ('cues', 0.054), ('articulated', 0.053), ('bgi', 0.05), ('edgebased', 0.05), ('fie', 0.05), ('shirt', 0.05), ('wearing', 0.05), ('static', 0.048), ('features', 0.047), ('angular', 0.046), ('edges', 0.045), ('symmetry', 0.044), ('hallucinated', 0.044), ('perplexity', 0.044), ('dataset', 0.043), ('pi', 0.043), ('patch', 0.042), ('messages', 0.042), ('soft', 0.041), ('estimation', 0.041), ('ru', 0.04), ('location', 0.04), ('human', 0.039), ('bins', 0.038), ('tend', 0.038), ('segmentation', 0.038), ('framed', 0.037), ('convolutions', 0.037), ('horse', 0.037), ('extra', 0.037), ('model', 0.036), ('oriented', 0.035), ('casting', 0.035), ('training', 0.035), ('returns', 0.035), ('inference', 0.034), ('deformations', 0.033), ('foreground', 0.033), ('visualize', 0.033), ('rotated', 0.033), ('posterior', 0.033), ('spatial', 0.033), ('green', 0.033), ('score', 0.032), ('experience', 0.032), ('yi', 0.031), ('ren', 0.031), ('detectors', 0.031), ('ie', 0.031), ('fairly', 0.03), ('quite', 0.03), ('initial', 0.029), ('hard', 0.028), ('procedure', 0.028), ('versions', 0.028), ('recovering', 0.028), ('sequentially', 0.028), ('eccv', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 122 nips-2006-Learning to parse images of articulated bodies
Author: Deva Ramanan
Abstract: We consider the machine vision task of pose estimation from static images, specifically for the case of articulated objects. This problem is hard because of the large number of degrees of freedom to be estimated. Following an established line of research, pose estimation is framed as inference in a probabilistic model. In our experience, however, the success of many approaches often lies in the power of the features. Our primary contribution is a novel casting of visual inference as an iterative parsing process, where one sequentially learns better and better features tuned to a particular image. We show quantitative results for human pose estimation on a database of over 300 images that suggest our algorithm is competitive with or surpasses the state-of-the-art. Since our procedure is quite general (it does not rely on face or skin detection), we also use it to estimate the poses of horses in the Weizmann database.
2 0.27256098 66 nips-2006-Detecting Humans via Their Pose
Author: Alessandro Bissacco, Ming-Hsuan Yang, Stefano Soatto
Abstract: We consider the problem of detecting humans and classifying their pose from a single image. Specifically, our goal is to devise a statistical model that simultaneously answers two questions: 1) is there a human in the image? and, if so, 2) what is a low-dimensional representation of her pose? We investigate models that can be learned in an unsupervised manner on unlabeled images of human poses, and provide information that can be used to match the pose of a new image to the ones present in the training set. Starting from a set of descriptors recently proposed for human detection, we apply the Latent Dirichlet Allocation framework to model the statistics of these features, and use the resulting model to answer the above questions. We show how our model can efficiently describe the space of images of humans with their pose, by providing an effective representation of poses for tasks such as classification and matching, while performing remarkably well in human/non human decision problems, thus enabling its use for human detection. We validate the model with extensive quantitative experiments and comparisons with other approaches on human detection and pose matching. 1
3 0.11501876 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
Author: Yuanhao Chen, Long Zhu, Alan L. Yuille
Abstract: We describe an unsupervised method for learning a probabilistic grammar of an object from a set of training examples. Our approach is invariant to the scale and rotation of the objects. We illustrate our approach using thirteen objects from the Caltech 101 database. In addition, we learn the model of a hybrid object class where we do not know the specific object or its position, scale or pose. This is illustrated by learning a hybrid class consisting of faces, motorbikes, and airplanes. The individual objects can be recovered as different aspects of the grammar for the object class. In all cases, we validate our results by learning the probability grammars from training datasets and evaluating them on the test datasets. We compare our method to alternative approaches. The advantages of our approach is the speed of inference (under one second), the parsing of the object, and increased accuracy of performance. Moreover, our approach is very general and can be applied to a large range of objects and structures. 1
4 0.10874978 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
Author: Andrea Frome, Yoram Singer, Jitendra Malik
Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. 1
5 0.094331272 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
Author: Samuel S. Gross, Olga Russakovsky, Chuong B. Do, Serafim Batzoglou
Abstract: We consider the problem of training a conditional random field (CRF) to maximize per-label predictive accuracy on a training set, an approach motivated by the principle of empirical risk minimization. We give a gradient-based procedure for minimizing an arbitrarily accurate approximation of the empirical risk under a Hamming loss function. In experiments with both simulated and real data, our optimization procedure gives significantly better testing performance than several current approaches for CRF training, especially in situations of high label noise. 1
6 0.090969585 172 nips-2006-Scalable Discriminative Learning for Natural Language Parsing and Translation
7 0.085212633 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
8 0.084484823 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
9 0.082118243 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
10 0.07474561 170 nips-2006-Robotic Grasping of Novel Objects
11 0.074037939 34 nips-2006-Approximate Correspondences in High Dimensions
12 0.073009081 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
13 0.070916429 185 nips-2006-Subordinate class recognition using relational object models
14 0.069585957 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo
15 0.068700269 42 nips-2006-Bayesian Image Super-resolution, Continued
16 0.061571393 50 nips-2006-Chained Boosting
17 0.061170399 110 nips-2006-Learning Dense 3D Correspondence
18 0.060292009 54 nips-2006-Comparative Gene Prediction using Conditional Random Fields
19 0.05911788 52 nips-2006-Clustering appearance and shape by learning jigsaws
20 0.057956144 49 nips-2006-Causal inference in sensorimotor integration
topicId topicWeight
[(0, -0.196), (1, 0.016), (2, 0.148), (3, -0.126), (4, 0.066), (5, -0.056), (6, -0.146), (7, -0.108), (8, -0.0), (9, -0.083), (10, 0.051), (11, 0.048), (12, 0.005), (13, 0.081), (14, 0.133), (15, 0.063), (16, -0.063), (17, -0.058), (18, 0.061), (19, -0.058), (20, 0.002), (21, -0.032), (22, -0.007), (23, 0.011), (24, 0.038), (25, -0.064), (26, 0.039), (27, -0.14), (28, -0.007), (29, -0.031), (30, -0.213), (31, -0.0), (32, -0.106), (33, -0.008), (34, -0.064), (35, -0.043), (36, -0.04), (37, -0.164), (38, -0.043), (39, -0.2), (40, -0.004), (41, 0.01), (42, -0.016), (43, 0.054), (44, -0.005), (45, -0.141), (46, -0.03), (47, -0.014), (48, 0.06), (49, -0.059)]
simIndex simValue paperId paperTitle
same-paper 1 0.94595146 122 nips-2006-Learning to parse images of articulated bodies
Author: Deva Ramanan
Abstract: We consider the machine vision task of pose estimation from static images, specifically for the case of articulated objects. This problem is hard because of the large number of degrees of freedom to be estimated. Following an established line of research, pose estimation is framed as inference in a probabilistic model. In our experience, however, the success of many approaches often lies in the power of the features. Our primary contribution is a novel casting of visual inference as an iterative parsing process, where one sequentially learns better and better features tuned to a particular image. We show quantitative results for human pose estimation on a database of over 300 images that suggest our algorithm is competitive with or surpasses the state-of-the-art. Since our procedure is quite general (it does not rely on face or skin detection), we also use it to estimate the poses of horses in the Weizmann database.
2 0.7906723 66 nips-2006-Detecting Humans via Their Pose
Author: Alessandro Bissacco, Ming-Hsuan Yang, Stefano Soatto
Abstract: We consider the problem of detecting humans and classifying their pose from a single image. Specifically, our goal is to devise a statistical model that simultaneously answers two questions: 1) is there a human in the image? and, if so, 2) what is a low-dimensional representation of her pose? We investigate models that can be learned in an unsupervised manner on unlabeled images of human poses, and provide information that can be used to match the pose of a new image to the ones present in the training set. Starting from a set of descriptors recently proposed for human detection, we apply the Latent Dirichlet Allocation framework to model the statistics of these features, and use the resulting model to answer the above questions. We show how our model can efficiently describe the space of images of humans with their pose, by providing an effective representation of poses for tasks such as classification and matching, while performing remarkably well in human/non human decision problems, thus enabling its use for human detection. We validate the model with extensive quantitative experiments and comparisons with other approaches on human detection and pose matching. 1
3 0.64363503 170 nips-2006-Robotic Grasping of Novel Objects
Author: Ashutosh Saxena, Justin Driemeyer, Justin Kearns, Andrew Y. Ng
Abstract: We consider the problem of grasping novel objects, specifically ones that are being seen for the first time through vision. We present a learning algorithm that neither requires, nor tries to build, a 3-d model of the object. Instead it predicts, directly as a function of the images, a point at which to grasp the object. Our algorithm is trained via supervised learning, using synthetic images for the training set. We demonstrate on a robotic manipulation platform that this approach successfully grasps a wide variety of objects, such as wine glasses, duct tape, markers, a translucent box, jugs, knife-cutters, cellphones, keys, screwdrivers, staplers, toothbrushes, a thick coil of wire, a strangely shaped power horn, and others, none of which were seen in the training set. 1
4 0.62649989 52 nips-2006-Clustering appearance and shape by learning jigsaws
Author: Anitha Kannan, John Winn, Carsten Rother
Abstract: Patch-based appearance models are used in a wide range of computer vision applications. To learn such models it has previously been necessary to specify a suitable set of patch sizes and shapes by hand. In the jigsaw model presented here, the shape, size and appearance of patches are learned automatically from the repeated structures in a set of training images. By learning such irregularly shaped ‘jigsaw pieces’, we are able to discover both the shape and the appearance of object parts without supervision. When applied to face images, for example, the learned jigsaw pieces are surprisingly strongly associated with face parts of different shapes and scales such as eyes, noses, eyebrows and cheeks, to name a few. We conclude that learning the shape of the patch not only improves the accuracy of appearance-based part detection but also allows for shape-based part detection. This enables parts of similar appearance but different shapes to be distinguished; for example, while foreheads and cheeks are both skin colored, they have markedly different shapes. 1
5 0.59838653 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
Author: Yuanhao Chen, Long Zhu, Alan L. Yuille
Abstract: We describe an unsupervised method for learning a probabilistic grammar of an object from a set of training examples. Our approach is invariant to the scale and rotation of the objects. We illustrate our approach using thirteen objects from the Caltech 101 database. In addition, we learn the model of a hybrid object class where we do not know the specific object or its position, scale or pose. This is illustrated by learning a hybrid class consisting of faces, motorbikes, and airplanes. The individual objects can be recovered as different aspects of the grammar for the object class. In all cases, we validate our results by learning the probability grammars from training datasets and evaluating them on the test datasets. We compare our method to alternative approaches. The advantages of our approach is the speed of inference (under one second), the parsing of the object, and increased accuracy of performance. Moreover, our approach is very general and can be applied to a large range of objects and structures. 1
6 0.48464769 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
7 0.44828263 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
8 0.41514984 133 nips-2006-Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model
9 0.40988317 101 nips-2006-Isotonic Conditional Random Fields and Local Sentiment Flow
10 0.39877313 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
11 0.39077556 185 nips-2006-Subordinate class recognition using relational object models
12 0.38919368 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
13 0.38203034 174 nips-2006-Similarity by Composition
14 0.37391576 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection
15 0.36195359 54 nips-2006-Comparative Gene Prediction using Conditional Random Fields
16 0.3579922 45 nips-2006-Blind Motion Deblurring Using Image Statistics
17 0.34911165 34 nips-2006-Approximate Correspondences in High Dimensions
18 0.34775665 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
19 0.34670144 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
20 0.34669483 50 nips-2006-Chained Boosting
topicId topicWeight
[(1, 0.055), (3, 0.025), (7, 0.079), (9, 0.026), (12, 0.375), (20, 0.013), (22, 0.042), (44, 0.061), (57, 0.146), (65, 0.048), (69, 0.035), (90, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.83500421 122 nips-2006-Learning to parse images of articulated bodies
Author: Deva Ramanan
Abstract: We consider the machine vision task of pose estimation from static images, specifically for the case of articulated objects. This problem is hard because of the large number of degrees of freedom to be estimated. Following an established line of research, pose estimation is framed as inference in a probabilistic model. In our experience, however, the success of many approaches often lies in the power of the features. Our primary contribution is a novel casting of visual inference as an iterative parsing process, where one sequentially learns better and better features tuned to a particular image. We show quantitative results for human pose estimation on a database of over 300 images that suggest our algorithm is competitive with or surpasses the state-of-the-art. Since our procedure is quite general (it does not rely on face or skin detection), we also use it to estimate the poses of horses in the Weizmann database.
2 0.80105644 39 nips-2006-Balanced Graph Matching
Author: Timothee Cour, Praveen Srinivasan, Jianbo Shi
Abstract: Graph matching is a fundamental problem in Computer Vision and Machine Learning. We present two contributions. First, we give a new spectral relaxation technique for approximate solutions to matching problems that naturally incorporates one-to-one or one-to-many constraints within the relaxation scheme. The second is a normalization procedure for existing graph matching scoring functions that can dramatically improve the matching accuracy. It is based on a reinterpretation of the graph matching compatibility matrix as a bipartite graph on edges for which we seek a bistochastic normalization. We evaluate our two contributions on a comprehensive test set of random graph matching problems, as well as on an image correspondence problem. Our normalization procedure can be used to improve the performance of many existing graph matching algorithms, including spectral matching, graduated assignment and semidefinite programming.
3 0.66384327 33 nips-2006-Analysis of Representations for Domain Adaptation
Author: Shai Ben-David, John Blitzer, Koby Crammer, Fernando Pereira
Abstract: Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. In many situations, though, we have labeled training data for a source domain, and we wish to learn a classifier which performs well on a target domain with a different distribution. Under what conditions can we adapt a classifier trained on the source domain for use in the target domain? Intuitively, a good feature representation is a crucial factor in the success of domain adaptation. We formalize this intuition theoretically with a generalization bound for domain adaptation. Our theory illustrates the tradeoffs inherent in designing a representation for domain adaptation and gives a new justification for a recently proposed model. It also points toward a promising new model for domain adaptation: one which explicitly minimizes the difference between the source and target domains, while at the same time maximizing the margin of the training set.
4 0.49493214 110 nips-2006-Learning Dense 3D Correspondence
Author: Florian Steinke, Volker Blanz, Bernhard Schölkopf
Abstract: Establishing correspondence between distinct objects is an important and nontrivial task: correctness of the correspondence hinges on properties which are difficult to capture in an a priori criterion. While previous work has used a priori criteria which in some cases led to very good results, the present paper explores whether it is possible to learn a combination of features that, for a given training set of aligned human heads, characterizes the notion of correct correspondence. By optimizing this criterion, we are then able to compute correspondence and morphs for novel heads.
5 0.4723002 66 nips-2006-Detecting Humans via Their Pose
Author: Alessandro Bissacco, Ming-Hsuan Yang, Stefano Soatto
Abstract: We consider the problem of detecting humans and classifying their pose from a single image. Specifically, our goal is to devise a statistical model that simultaneously answers two questions: 1) is there a human in the image? and, if so, 2) what is a low-dimensional representation of her pose? We investigate models that can be learned in an unsupervised manner on unlabeled images of human poses, and provide information that can be used to match the pose of a new image to the ones present in the training set. Starting from a set of descriptors recently proposed for human detection, we apply the Latent Dirichlet Allocation framework to model the statistics of these features, and use the resulting model to answer the above questions. We show how our model can efficiently describe the space of images of humans with their pose, by providing an effective representation of poses for tasks such as classification and matching, while performing remarkably well in human/non-human decision problems, thus enabling its use for human detection. We validate the model with extensive quantitative experiments and comparisons with other approaches on human detection and pose matching.
6 0.46351638 80 nips-2006-Fundamental Limitations of Spectral Clustering
7 0.45643604 188 nips-2006-Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks
8 0.45195529 34 nips-2006-Approximate Correspondences in High Dimensions
9 0.45084134 74 nips-2006-Efficient Structure Learning of Markov Networks using $L_1$-Regularization
10 0.44865638 52 nips-2006-Clustering appearance and shape by learning jigsaws
11 0.44541463 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
12 0.44426629 86 nips-2006-Graph-Based Visual Saliency
13 0.43992162 50 nips-2006-Chained Boosting
14 0.43952417 42 nips-2006-Bayesian Image Super-resolution, Continued
15 0.4363687 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
16 0.43424779 136 nips-2006-Multi-Instance Multi-Label Learning with Application to Scene Classification
17 0.43344286 47 nips-2006-Boosting Structured Prediction for Imitation Learning
18 0.43305197 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
19 0.43167424 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation
20 0.4307496 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions