nips nips2006 nips2006-170 knowledge-graph by maker-knowledge-mining

170 nips-2006-Robotic Grasping of Novel Objects


Source: pdf

Author: Ashutosh Saxena, Justin Driemeyer, Justin Kearns, Andrew Y. Ng

Abstract: We consider the problem of grasping novel objects, specifically ones that are being seen for the first time through vision. We present a learning algorithm that neither requires, nor tries to build, a 3-d model of the object. Instead it predicts, directly as a function of the images, a point at which to grasp the object. Our algorithm is trained via supervised learning, using synthetic images for the training set. We demonstrate on a robotic manipulation platform that this approach successfully grasps a wide variety of objects, such as wine glasses, duct tape, markers, a translucent box, jugs, knife-cutters, cellphones, keys, screwdrivers, staplers, toothbrushes, a thick coil of wire, a strangely shaped power horn, and others, none of which were seen in the training set. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We consider the problem of grasping novel objects, specifically ones that are being seen for the first time through vision. [sent-4, score-0.691]

2 Instead it predicts, directly as a function of the images, a point at which to grasp the object. [sent-6, score-0.462]

3 1 Introduction: In this paper, we address the problem of grasping novel objects that a robot is perceiving for the first time through vision. [sent-9, score-0.988]

4 [15] However, autonomously grasping a previously unknown object still remains a challenging problem. [sent-11, score-0.751]

5 This is particularly true if we have only a single camera; for stereo systems, 3-d reconstruction is difficult for objects without texture, and even when stereopsis works well, it would typically reconstruct only the visible portions of the object. [sent-14, score-0.238]

6 Instead it predicts, directly as a function of the images, a point at which to grasp the object. [sent-17, score-0.462]

7 Informally, the algorithm takes two or more pictures of the object, and then tries to identify a point within each 2-d image that corresponds to a good point at which to grasp the object. [sent-18, score-0.621]

8 (For example, if trying to grasp a coffee mug, it might try to identify the mid-point of the handle.) [sent-19, score-0.523]

9 Thus, rather than trying to triangulate every single point within each image in order to estimate depths—as in dense stereo—we only attempt to triangulate one (or at most a small number of) points corresponding to the 3-d point where we will grasp the object. [sent-21, score-0.633]

10 This allows us to grasp an object without ever needing to obtain its full 3-d shape, and applies even to textureless, translucent or reflective objects on which standard stereo 3-d reconstruction fares poorly. [sent-22, score-0.812]

11 To the best of our knowledge, our work represents the first algorithm capable of grasping novel objects (ones where a 3-d model is not available), including ones from novel object classes, that we are perceiving for the first time using vision. [sent-23, score-1.015]

12 Figure 1: Examples of objects on which the grasping algorithm was tested. [sent-24, score-0.809]

13 Piater’s algorithm [9] learned to position single fingers given a top-down view of an object, but considered only very simple objects (specifically, square, triangle and round “blocks”). [sent-29, score-0.225]

14 ) [14], but this seems unlikely to apply directly to grasping objects from novel object classes. [sent-33, score-0.943]

15 To pick up an object, we need to identify the grasping point—more formally, a position for the robot’s end-effector. [sent-34, score-0.794]

16 This paper focuses on the task of grasp identification, and thus we will consider only objects that can be picked up without performing complex manipulation. [sent-35, score-0.612]

17 We will attempt to grasp a number of common office and household objects such as toothbrushes, pens, books, mugs, martini glasses, jugs, keys, duct tape, and markers. [sent-36, score-0.71]

18 In Section 2, we describe our learning approach, as well as our probabilistic model for inferring the grasping point. [sent-40, score-0.654]

19 In Section 3, we describe the motion planning/trajectory planning (on our 5 degree of freedom arm) for moving the manipulator to the grasping point. [sent-41, score-0.695]

20 We propose a learning approach that uses visual features to predict good grasping points across a large range of objects. [sent-46, score-0.654]

21 Given two (or more) images of an object taken from different camera positions, we will predict the 3-d position of a grasping point. [sent-47, score-0.999]

22 In our approach, we will predict the 2-d location of the grasp in each image; more formally, we will try to identify the projection of a good grasping point onto the image plane. [sent-49, score-1.246]

23 If each of these points can be perfectly identified in each image, we can then easily “triangulate” from these images to obtain the 3-d grasping point. [sent-50, score-0.743]
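
As a rough sketch of this triangulation step (an illustration only, with a hypothetical interface; the paper's camera calibration and ray construction are not specified here), the 3-d grasp point can be estimated as the midpoint of the shortest segment connecting the two viewing rays, one ray per image:

```python
import numpy as np

def triangulate_two_rays(o1, d1, o2, d2):
    """Estimate a 3-d point from two viewing rays, each given by a camera
    center o_i and a direction d_i through the predicted 2-d grasp point in
    image i; returns the midpoint of the shortest segment between the rays."""
    o1, d1 = np.asarray(o1, float), np.asarray(d1, float)
    o2, d2 = np.asarray(o2, float), np.asarray(d2, float)
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    w = o1 - o2
    b = np.dot(d1, d2)
    denom = 1.0 - b * b
    if denom < 1e-9:                      # near-parallel rays: degenerate case
        t1, t2 = 0.0, np.dot(d2, w)
    else:
        t1 = (b * np.dot(d2, w) - np.dot(d1, w)) / denom
        t2 = (np.dot(d2, w) - b * np.dot(d1, w)) / denom
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))

# Example: two cameras 10 cm apart whose rays both pass through the origin.
p = triangulate_two_rays([0.0, 0.0, 1.0], [0.0, 0.0, -1.0],
                         [0.1, 0.0, 1.0], [-0.1, 0.0, -1.0])
```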

24 In practice it is difficult to identify the projection of a grasping point into the image plane (and, if there are multiple grasping points, then the correspondence problem—i.e., deciding which grasping point in one image corresponds to which point in another image—must also be solved). [sent-53, score-1.467] [sent-55, score-0.784]

26 Figure 3: Synthetic images of the objects used for training. [sent-58, score-0.244]

27 The classes of objects used for training were martini glasses, mugs, teacups, pencils, whiteboard erasers, and books. [sent-59, score-0.209]

28 To address all of these issues, we develop a probabilistic model over possible grasping points, and apply it to infer a good position at which to grasp an object. [sent-61, score-1.157]

29 1 Features: In our approach, we begin by dividing the image into small rectangular patches, and for each patch predict if it is a projection of a grasping point onto the image plane. [sent-63, score-0.894]

30 To predict if a patch contains a grasping point, local image features centered on the patch are insufficient, and one has to use more global properties of the object. [sent-71, score-0.806]
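
A loose sketch of such a patch representation is given below; the paper's actual 459-dimensional features are not reproduced, and the statistics and window sizes chosen here are illustrative assumptions. The idea shown is simply that each patch's feature vector mixes local appearance with statistics from progressively larger surrounding windows:

```python
import numpy as np

def patch_features(gray, row, col, patch_size=10, context_scales=(1, 3, 9)):
    """Illustrative features for the patch centered at (row, col): mean and
    std of intensity and of gradient magnitude, computed over the patch and
    over larger windows around it (multiples of the patch size)."""
    gray = np.asarray(gray, float)
    gy, gx = np.gradient(gray)
    edge = np.hypot(gx, gy)                        # simple edge-strength map
    feats = []
    for scale in context_scales:
        half = (patch_size * scale) // 2
        r0, r1 = max(0, row - half), min(gray.shape[0], row + half + 1)
        c0, c1 = max(0, col - half), min(gray.shape[1], col + half + 1)
        window, ewin = gray[r0:r1, c0:c1], edge[r0:r1, c0:c1]
        feats.extend([window.mean(), window.std(), ewin.mean(), ewin.std()])
    return np.asarray(feats)
```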

31 2 Synthetic Data for Training: We apply supervised learning to learn to identify patches that contain grasping points. [sent-77, score-0.685]

32 , a set of images of objects labeled with the 2-d location of the grasping point in each image. [sent-80, score-0.927]

33 In detail, we generate synthetic images along with correct grasp labels (Fig. 3) using a computer graphics ray tracer, as this produces more realistic images than other simpler rendering methods. [sent-83, score-0.611] [sent-84, score-0.257]

35 First, once a synthetic model of an object has been created, a large number of training examples can be automatically generated by rendering the object under different (randomly chosen) lighting conditions, camera positions and orientations, etc. [sent-86, score-0.399]

36 We generated 2500 examples from synthetic data, comprising objects from six object classes (see Figure 3). [sent-102, score-0.341]

37 Using synthetic data also allows us to generate perfect labels for the training set with the exact location of a good grasp for each object. [sent-103, score-0.522]
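
The perfect labels are possible because, for every rendered view, the known 3-d grasp point of the object model can be projected into the synthetic image. A minimal sketch of that projection follows; the pinhole camera model and parameter names are assumptions made for illustration, not the paper's rendering pipeline:

```python
import numpy as np

def project_grasp_label(grasp_xyz, R, t, fx, fy, cx, cy):
    """Project a known 3-d grasp point into a rendered view to obtain the
    exact 2-d training label.  R, t: world-to-camera rotation (3x3) and
    translation (3,); fx, fy, cx, cy: pinhole intrinsics of the randomly
    placed synthetic camera."""
    p_cam = np.asarray(R, float) @ np.asarray(grasp_xyz, float) + np.asarray(t, float)
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    return u, v
```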

38 3 Probabilistic Model: On our manipulation platform, we have a camera mounted on the robotic arm (Fig. 6). [sent-106, score-0.361]

39 When asked to grasp an object, we command the arm to move the camera to two or more positions, so as to acquire images of the object from these positions. [sent-108, score-0.879]

40 However, there are inaccuracies in the physical positioning of the arm, and hence some slight uncertainty in the position of the camera when the images are acquired. [sent-109, score-0.289]

41 Formally, let C be the image that would have been taken if the robot had moved exactly to the commanded position and orientation. [sent-111, score-0.249]

42 However, due to robot positioning error, instead an image Ĉ is taken from a slightly different location. [sent-112, score-0.22]

43 Let (u, v) be a 2-d position in image C, and let (û, v̂) be the corresponding image position in Ĉ. [sent-113, score-0.284]

44 For each location (u, v) in an image C, we define the class label to be z(u, v) = 1{(u, v) is the projection of a grasping point onto the image plane}. [sent-116, score-0.854]

45 For a corresponding location (û, v̂) in image Ĉ, we similarly define ẑ(û, v̂) to indicate whether position (û, v̂) represents a grasping point in the image Ĉ. [sent-118, score-0.897]

46 Further, we use logistic regression to model the probability of (û, v̂) in Ĉ being a good grasping point: P(ẑ(û, v̂) = 1 | Ĉ) = P(ẑ(û, v̂) = 1 | x; w) = 1/(1 + e^(−wᵀx)) (3), where x ∈ R^459 are the features for the rectangular patch centered at (û, v̂) in image Ĉ (described in Section 2. [sent-121, score-0.792]
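
A minimal sketch of Equation (3) as stated above; the feature vectors and the trained weight vector w are assumed to be given, and the top-k selection is only meant to mirror the candidate points shown in Figure 5:

```python
import numpy as np

def grasp_probability(x, w):
    """Equation (3): P(z-hat = 1 | x; w) = 1 / (1 + exp(-w^T x)) for the
    feature vector x of one rectangular image patch."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def candidate_grasp_patches(features, w, top_k=20):
    """Score every patch of one image (features: list of per-patch feature
    vectors) and return the indices and probabilities of the top_k patches
    most likely to be projections of a grasping point."""
    probs = np.array([grasp_probability(x, w) for x in features])
    order = np.argsort(probs)[::-1][:top_k]
    return order, probs[order]
```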

47 Given two or more images of a new object from different camera positions, we want to infer the 3-d position of the grasping point. [sent-127, score-0.999]

48 Because logistic regression may have predicted multiple grasping points per image, there is usually ambiguity in the correspondence problem (i.e., which grasping point in one image corresponds to which grasping point in another). [sent-130, score-0.68] [sent-132, score-0.784]

50 To address this while also taking into account the uncertainty in camera position, we propose a probabilistic model over possible grasping points in 3-d space. [sent-133, score-0.743]

51 In detail, we discretize the 3-d work-space of the robotic arm into a regular 3-d grid G ⊂ R3 , and associate with each grid element j a random variable yj = 1{grid cell j is a grasping point}. [sent-134, score-1.086]

52 In image Ci , let the ray passing through (u, v) be denoted Ri (u, v). [sent-139, score-0.213]

53 From our experiments (see Section 4), if we set σ² = 0, the triangulation is highly inaccurate, with average error in predicting the grasping point being 15. [sent-146, score-0.683]

54 Figure 4: (a) Diagram illustrating rays from two images C1 and C2 intersecting at a grasping point (shown in dark blue). [sent-149, score-0.864]

55 (b) Actual plot in 3-d showing multiple rays from 4 images intersecting at the grasping point. [sent-150, score-0.835]

56 We know that if any of the grid-cells rj along the ray represents a grasping point, then its projection is a grasp point. [sent-153, score-1.255]

57 P(yj = 1 | C1, ..., CN) ∝ ∏_{i=1}^{N} P(Ci | yj = 1) (6) ∝ ∏_{i=1}^{N} P(yj = 1 | Ci) P(Ci) / P(yj = 1) ∝ ∏_{i=1}^{N} P(yj = 1 | Ci) (7), where P(yj = 1) is the prior probability of a grid-cell being a grasping point (set to a constant value in our experiments). [sent-179, score-0.683]

58 Using Equations 2, 3, 5, and 7, we can now compute (up to a constant of proportionality that does not depend on the grid-cell) the probability of any grid-cell yj being a valid grasping point, given the images. [sent-180, score-0.654]
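
A small sketch of the aggregation in Equation (7), up to the constant of proportionality mentioned above: for every grid cell, the per-image probabilities P(yj = 1 | Ci) are multiplied, here in log space for numerical stability (the array layout is an assumption for illustration):

```python
import numpy as np

def aggregate_cell_log_probs(per_image_probs):
    """per_image_probs: array of shape (N_images, N_cells) whose entry [i, j]
    holds P(y_j = 1 | C_i) obtained from the 2-d classifier along the rays.
    Returns the unnormalized log of prod_i P(y_j = 1 | C_i) for each cell."""
    clipped = np.clip(per_image_probs, 1e-12, 1.0)   # avoid log(0)
    return np.log(clipped).sum(axis=0)
```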

59 4 MAP Inference: We infer the best grasping point by choosing the 3-d position (grid-cell) that is most likely to be a valid grasping point. [sent-182, score-1.407]

60 A straightforward implementation that explicitly computes the sum above for every single grid-cell would give good grasping performance, but be extremely inefficient (over 110 seconds). [sent-189, score-0.654]

61 For real-time manipulation, we therefore used a more efficient search algorithm in which we explicitly consider only grid-cells yj that at least one ray Ri (u, v) intersects. [sent-190, score-0.229]

62 7 This results in an algorithm that identifies a grasping position in 1. [sent-193, score-0.724]
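
A sketch of that restricted search is shown below; the grid geometry, the ray-marching step size and the dense per-cell score array are illustrative assumptions rather than the paper's implementation. The point is that only grid cells intersected by at least one ray are ever scored:

```python
import numpy as np

def map_grasp_cell(rays, cell_log_prob, grid_min, cell_size, grid_shape,
                   step=0.005, max_range=1.5):
    """rays: list of (origin, direction) pairs in workspace coordinates.
    cell_log_prob: array of shape grid_shape holding the aggregated
    log-probability of each grid cell being a grasping point."""
    grid_min = np.asarray(grid_min, float)
    shape = np.asarray(grid_shape)
    candidates = set()
    for origin, direction in rays:
        o = np.asarray(origin, float)
        d = np.asarray(direction, float)
        d /= np.linalg.norm(d)
        for t in np.arange(0.0, max_range, step):      # march along the ray
            idx = np.floor((o + t * d - grid_min) / cell_size).astype(int)
            if np.all(idx >= 0) and np.all(idx < shape):
                candidates.add(tuple(idx))
    # MAP estimate: score only the cells touched by at least one ray.
    return max(candidates, key=lambda c: cell_log_prob[c])
```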

63 The red points in each image show the most likely locations, predicted to be candidate grasping points by our logistic regression model. [sent-199, score-0.752]

64 Figure 6: The robotic arm picking up various objects: box, screwdriver, duct-tape, wine glass, a solder tool holder, powerhorn, cellphone, and martini glass and cereal bowl from dishwasher. [sent-201, score-0.544]

65 3 Control: Having identified a grasping point, we have to move the end-effector of the robotic arm to it, and pick up the object. [sent-202, score-1.037]

66 In detail, our algorithm plans a trajectory in joint angle space [5] to take the end-effector to an approach position (see footnote 8 below), and then moves the end-effector in a straight line forward towards the grasping point. [sent-203, score-0.654]
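
A sketch of that final motion follows; the offset distance, the number of waypoints and the interface are illustrative assumptions (the actual approach distance is the fixed offset mentioned in footnote 8):

```python
import numpy as np

def straight_line_approach(grasp_xyz, approach_dir, offset=0.10, n_waypoints=20):
    """Return end-effector waypoints from the approach position (offset metres
    behind the grasp point along approach_dir) straight to the grasp point."""
    g = np.asarray(grasp_xyz, float)
    d = np.asarray(approach_dir, float)
    d /= np.linalg.norm(d)
    start = g - offset * d
    return [start + a * (g - start) for a in np.linspace(0.0, 1.0, n_waypoints)]
```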

67 Our robotic arm uses two classes of grasps: downward grasps and outward grasps. [sent-204, score-0.479]

68 These arise as a direct consequence of the shape of the workspace of our 5 dof robotic arm (Fig. [sent-205, score-0.344]

69 A “downward” grasp is used for objects that are close to the base of the arm, which the arm will grasp by reaching in a downward direction. [sent-207, score-1.232]

70 An “outward” grasp is for objects further away from the base, for which the arm is unable to reach in a downward direction. [sent-208, score-0.799]

71 The class is determined based on the position of the object and grasping point. [sent-209, score-0.821]
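
A toy sketch of such a selection rule; the horizontal-distance test and the 0.40 m threshold are assumptions made for illustration, since the real boundary is set by the workspace of the 5-dof arm:

```python
import numpy as np

def choose_grasp_class(grasp_xyz, base_xyz, reach_threshold=0.40):
    """'downward' grasp for grasp points close to the arm base (reached from
    above), 'outward' grasp for points farther away in the horizontal plane."""
    dist = np.linalg.norm(np.asarray(grasp_xyz, float)[:2] -
                          np.asarray(base_xyz, float)[:2])
    return "downward" if dist < reach_threshold else "outward"
```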

72 1 Hardware Setup: Our experiments used a mobile robotic platform called STAIR (STanford AI Robot) on which are mounted a robotic arm, as well as other equipment such as our web-camera, microphones, etc. [sent-211, score-0.433]

73 STAIR was built as part of a project whose long-term goal is to create a robot that can navigate home and office environments, pick up and interact with objects and tools, and intelligently converse with and help people in these environments. [sent-212, score-0.301]

74 Our algorithms for grasping novel objects represent a first step towards achieving some of these goals. [sent-213, score-0.846]

75 The robotic arm we used is the Harmonic Arm made by Neuronics. [sent-214, score-0.344]

76 This is a 4 kg, 5-dof arm equipped with a parallel plate gripper, and has a positioning accuracy of ±1 mm. [sent-215, score-0.212]

77 (Footnote 8) The approach position is set to be a fixed distance away from the predicted grasp point. [sent-217, score-0.503]

78 Table 1: Average absolute error in locating the grasp point for different objects, as well as grasp success rate for picking up the different objects using our robotic arm. [sent-218, score-1.248]

79 (Although training was done on synthetic images, testing was done on the real robotic arm and real objects.) [sent-219, score-0.433]

80 The average accuracy for classifying whether a 2-d image patch is a projection of a grasping point was 94.2% (evaluated on a balanced test set), although the accuracy in predicting 3-d grasping points was higher because the probabilistic model for inferring a 3-d grasping point automatically aggregates data from multiple images, and therefore "fixes" some of the errors from individual classifiers. [sent-240, score-0.822] [sent-241, score-1.337]

82 Recall that the parameters of the vision algorithm were trained from synthetic images of a small set of six object classes, namely books, martini glasses, white-board erasers, coffee mugs, tea cups and pencils. [sent-244, score-0.429]

83 We note that many of these objects are translucent, textureless, and/or reflective, making 3-d reconstruction difficult for standard stereo systems. [sent-248, score-0.214]

84 Despite being tested on images of real (rather than synthetic) objects, including many very different from ones in the training set, it was usually able to identify correct grasp points. [sent-251, score-0.553]

85 We note that test set error (in terms of average absolute error in the predicted position of the grasp point) on the real images was only somewhat higher than the error on synthetic images; this shows that the algorithm trained on synthetic images transfers well to real images. [sent-252, score-0.859]

86 For comparison, neonate humans can grasp simple objects with an average accuracy of 1. [sent-256, score-0.612]

87 [2] Table 1 shows the errors in the predicted grasping points on the test set. [sent-258, score-0.654]

88 , coffee mugs) and those which were very dissimilar to the training objects (e. [sent-261, score-0.214]

89 In addition to reporting errors in grasp positions, we also report the grasp success rate, i.e., the fraction of times the robotic arm was able to physically pick up the object (out of 4 trials). [sent-264, score-0.866] [sent-266, score-0.48]

91 On average, the robot picked up the novel objects 87. [sent-267, score-0.323]

92 However, grasping objects such as mugs or jugs (by the handle) allows only a narrow trajectory of approach—where one “finger” is inserted into the handle—so that even a small error in the grasping point identification causes the arm to hit and move the object, resulting in a failed grasp attempt. [sent-271, score-2.285]

93 Although it may be possible to improve the algorithm’s accuracy, we believe that these problems can best be solved by using a more advanced robotic arm that is capable of haptic (touch) feedback. [sent-272, score-0.344]

94 In many instances, the algorithm was able to pick up completely novel objects (a strangely shaped power-horn, duct-tape, solder tool holder, etc. [sent-273, score-0.348]

95 Videos showing the robot grasping the objects are available at http://ai. [sent-281, score-0.761]

96 Figure 5 demonstrates that the algorithm correctly identifies the grasp on multiple objects even in the presence of clutter and occlusion. [sent-285, score-0.588]

97 5 Conclusions: We proposed an algorithm to enable a robot to grasp an object that it has never seen before. [sent-288, score-0.637]

98 Instead it predicts, directly as a function of the images, a point at which to grasp the object. [sent-290, score-0.462]

99 Acknowledgment: We give warm thanks to Anya Petrovskaya, Morgan Quigley, and Jimmy Zhang for help with the robotic arm control driver software. [sent-292, score-0.344]

100 We also appended some hand-labeled real examples of dishwasher images to the training set to prevent the algorithm from identifying grasping points on background clutter, such as dishwasher prongs. [sent-392, score-0.821]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('grasping', 0.654), ('grasp', 0.433), ('robotic', 0.173), ('arm', 0.171), ('objects', 0.155), ('ray', 0.141), ('mugs', 0.108), ('robot', 0.107), ('object', 0.097), ('grasps', 0.095), ('synthetic', 0.089), ('images', 0.089), ('camera', 0.089), ('yj', 0.088), ('glasses', 0.082), ('jugs', 0.081), ('saxena', 0.081), ('tape', 0.081), ('toothbrushes', 0.081), ('ci', 0.079), ('image', 0.072), ('position', 0.07), ('duct', 0.068), ('keys', 0.068), ('rays', 0.068), ('translucent', 0.068), ('stereo', 0.059), ('coffee', 0.059), ('cellphones', 0.054), ('martini', 0.054), ('mounted', 0.054), ('strangely', 0.054), ('wine', 0.054), ('zi', 0.047), ('manipulation', 0.045), ('cups', 0.041), ('erasers', 0.041), ('icra', 0.041), ('manipulator', 0.041), ('monocular', 0.041), ('opengl', 0.041), ('pens', 0.041), ('positioning', 0.041), ('screwdrivers', 0.041), ('staplers', 0.041), ('wire', 0.041), ('yrk', 0.041), ('downward', 0.04), ('glass', 0.04), ('patch', 0.04), ('pick', 0.039), ('novel', 0.037), ('shaped', 0.036), ('triangulate', 0.035), ('perceiving', 0.035), ('platform', 0.033), ('coil', 0.032), ('color', 0.031), ('identify', 0.031), ('markers', 0.031), ('book', 0.03), ('point', 0.029), ('maxj', 0.028), ('horn', 0.028), ('robots', 0.028), ('cm', 0.028), ('texture', 0.028), ('projection', 0.027), ('tries', 0.027), ('bjects', 0.027), ('dishwasher', 0.027), ('driemeyer', 0.027), ('ean', 0.027), ('ested', 0.027), ('grasped', 0.027), ('holder', 0.027), ('hsiao', 0.027), ('michels', 0.027), ('ngers', 0.027), ('pencils', 0.027), ('rasp', 0.027), ('rror', 0.027), ('solder', 0.027), ('stair', 0.027), ('unloading', 0.027), ('yrj', 0.027), ('books', 0.027), ('rendering', 0.027), ('formally', 0.026), ('box', 0.026), ('thick', 0.026), ('logistic', 0.026), ('ri', 0.026), ('picking', 0.025), ('picked', 0.024), ('neonate', 0.024), ('stereopsis', 0.024), ('intersecting', 0.024), ('appended', 0.024), ('justin', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 170 nips-2006-Robotic Grasping of Novel Objects

Author: Ashutosh Saxena, Justin Driemeyer, Justin Kearns, Andrew Y. Ng

Abstract: We consider the problem of grasping novel objects, specifically ones that are being seen for the first time through vision. We present a learning algorithm that neither requires, nor tries to build, a 3-d model of the object. Instead it predicts, directly as a function of the images, a point at which to grasp the object. Our algorithm is trained via supervised learning, using synthetic images for the training set. We demonstrate on a robotic manipulation platform that this approach successfully grasps a wide variety of objects, such as wine glasses, duct tape, markers, a translucent box, jugs, knife-cutters, cellphones, keys, screwdrivers, staplers, toothbrushes, a thick coil of wire, a strangely shaped power horn, and others, none of which were seen in the training set. 1

2 0.081752911 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation

Author: David B. Grimes, Daniel R. Rashid, Rajesh P. Rao

Abstract: Learning by imitation represents an important mechanism for rapid acquisition of new behaviors in humans and robots. A critical requirement for learning by imitation is the ability to handle uncertainty arising from the observation process as well as the imitator’s own dynamics and interactions with the environment. In this paper, we present a new probabilistic method for inferring imitative actions that takes into account both the observations of the teacher as well as the imitator’s dynamics. Our key contribution is a nonparametric learning method which generalizes to systems with very different dynamics. Rather than relying on a known forward model of the dynamics, our approach learns a nonparametric forward model via exploration. Leveraging advances in approximate inference in graphical models, we show how the learned forward model can be directly used to plan an imitating sequence. We provide experimental results for two systems: a biomechanical model of the human arm and a 25-degrees-of-freedom humanoid robot. We demonstrate that the proposed method can be used to learn appropriate motor inputs to the model arm which imitates the desired movements. A second set of results demonstrates dynamically stable full-body imitation of a human teacher by the humanoid robot. 1

3 0.075853154 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions

Author: Andrea Frome, Yoram Singer, Jitendra Malik

Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. 1

4 0.07474561 122 nips-2006-Learning to parse images of articulated bodies

Author: Deva Ramanan

Abstract: We consider the machine vision task of pose estimation from static images, specifically for the case of articulated objects. This problem is hard because of the large number of degrees of freedom to be estimated. Following a established line of research, pose estimation is framed as inference in a probabilistic model. In our experience however, the success of many approaches often lie in the power of the features. Our primary contribution is a novel casting of visual inference as an iterative parsing process, where one sequentially learns better and better features tuned to a particular image. We show quantitative results for human pose estimation on a database of over 300 images that suggest our algorithm is competitive with or surpasses the state-of-the-art. Since our procedure is quite general (it does not rely on face or skin detection), we also use it to estimate the poses of horses in the Weizmann database. 1

5 0.065434836 185 nips-2006-Subordinate class recognition using relational object models

Author: Aharon B. Hillel, Daphna Weinshall

Abstract: We address the problem of sub-ordinate class recognition, like the distinction between different types of motorcycles. Our approach is motivated by observations from cognitive psychology, which identify parts as the defining component of basic level categories (like motorcycles), while sub-ordinate categories are more often defined by part properties (like ’jagged wheels’). Accordingly, we suggest a two-stage algorithm: First, a relational part based object model is learnt using unsegmented object images from the inclusive class (e.g., motorcycles in general). The model is then used to build a class-specific vector representation for images, where each entry corresponds to a model’s part. In the second stage we train a standard discriminative classifier to classify subclass instances (e.g., cross motorcycles) based on the class-specific vector representation. We describe extensive experimental results with several subclasses. The proposed algorithm typically gives better results than a competing one-step algorithm, or a two stage algorithm where classification is based on a model of the sub-ordinate class. 1

6 0.06517294 110 nips-2006-Learning Dense 3D Correspondence

7 0.063347913 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy

8 0.056806732 148 nips-2006-Nonlinear physically-based models for decoding motor-cortical population activity

9 0.056284484 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests

10 0.054574113 69 nips-2006-Distributed Inference in Dynamical Systems

11 0.052092839 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing

12 0.050902437 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency

13 0.050277311 115 nips-2006-Learning annotated hierarchies from relational data

14 0.049877211 42 nips-2006-Bayesian Image Super-resolution, Continued

15 0.049243707 50 nips-2006-Chained Boosting

16 0.047872748 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation

17 0.047517307 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo

18 0.044796862 47 nips-2006-Boosting Structured Prediction for Imitation Learning

19 0.042790025 103 nips-2006-Kernels on Structured Objects Through Nested Histograms

20 0.040982615 66 nips-2006-Detecting Humans via Their Pose


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.122), (1, 0.011), (2, 0.082), (3, -0.066), (4, 0.014), (5, -0.033), (6, -0.094), (7, -0.069), (8, 0.009), (9, -0.036), (10, 0.029), (11, -0.015), (12, -0.029), (13, 0.031), (14, 0.047), (15, 0.043), (16, 0.001), (17, -0.03), (18, 0.008), (19, 0.051), (20, -0.008), (21, 0.083), (22, -0.066), (23, -0.046), (24, 0.025), (25, 0.042), (26, 0.019), (27, -0.062), (28, 0.071), (29, -0.03), (30, -0.138), (31, 0.022), (32, 0.057), (33, -0.083), (34, -0.014), (35, -0.133), (36, -0.001), (37, -0.035), (38, 0.048), (39, -0.094), (40, 0.15), (41, -0.08), (42, 0.157), (43, 0.0), (44, -0.053), (45, -0.1), (46, -0.069), (47, -0.005), (48, 0.01), (49, -0.125)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91161871 170 nips-2006-Robotic Grasping of Novel Objects

Author: Ashutosh Saxena, Justin Driemeyer, Justin Kearns, Andrew Y. Ng

Abstract: We consider the problem of grasping novel objects, specifically ones that are being seen for the first time through vision. We present a learning algorithm that neither requires, nor tries to build, a 3-d model of the object. Instead it predicts, directly as a function of the images, a point at which to grasp the object. Our algorithm is trained via supervised learning, using synthetic images for the training set. We demonstrate on a robotic manipulation platform that this approach successfully grasps a wide variety of objects, such as wine glasses, duct tape, markers, a translucent box, jugs, knife-cutters, cellphones, keys, screwdrivers, staplers, toothbrushes, a thick coil of wire, a strangely shaped power horn, and others, none of which were seen in the training set. 1

2 0.57085341 122 nips-2006-Learning to parse images of articulated bodies

Author: Deva Ramanan

Abstract: We consider the machine vision task of pose estimation from static images, specifically for the case of articulated objects. This problem is hard because of the large number of degrees of freedom to be estimated. Following a established line of research, pose estimation is framed as inference in a probabilistic model. In our experience however, the success of many approaches often lie in the power of the features. Our primary contribution is a novel casting of visual inference as an iterative parsing process, where one sequentially learns better and better features tuned to a particular image. We show quantitative results for human pose estimation on a database of over 300 images that suggest our algorithm is competitive with or surpasses the state-of-the-art. Since our procedure is quite general (it does not rely on face or skin detection), we also use it to estimate the poses of horses in the Weizmann database. 1

3 0.53087246 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation

Author: David B. Grimes, Daniel R. Rashid, Rajesh P. Rao

Abstract: Learning by imitation represents an important mechanism for rapid acquisition of new behaviors in humans and robots. A critical requirement for learning by imitation is the ability to handle uncertainty arising from the observation process as well as the imitator’s own dynamics and interactions with the environment. In this paper, we present a new probabilistic method for inferring imitative actions that takes into account both the observations of the teacher as well as the imitator’s dynamics. Our key contribution is a nonparametric learning method which generalizes to systems with very different dynamics. Rather than relying on a known forward model of the dynamics, our approach learns a nonparametric forward model via exploration. Leveraging advances in approximate inference in graphical models, we show how the learned forward model can be directly used to plan an imitating sequence. We provide experimental results for two systems: a biomechanical model of the human arm and a 25-degrees-of-freedom humanoid robot. We demonstrate that the proposed method can be used to learn appropriate motor inputs to the model arm which imitates the desired movements. A second set of results demonstrates dynamically stable full-body imitation of a human teacher by the humanoid robot. 1

4 0.50866836 52 nips-2006-Clustering appearance and shape by learning jigsaws

Author: Anitha Kannan, John Winn, Carsten Rother

Abstract: Patch-based appearance models are used in a wide range of computer vision applications. To learn such models it has previously been necessary to specify a suitable set of patch sizes and shapes by hand. In the jigsaw model presented here, the shape, size and appearance of patches are learned automatically from the repeated structures in a set of training images. By learning such irregularly shaped ‘jigsaw pieces’, we are able to discover both the shape and the appearance of object parts without supervision. When applied to face images, for example, the learned jigsaw pieces are surprisingly strongly associated with face parts of different shapes and scales such as eyes, noses, eyebrows and cheeks, to name a few. We conclude that learning the shape of the patch not only improves the accuracy of appearance-based part detection but also allows for shape-based part detection. This enables parts of similar appearance but different shapes to be distinguished; for example, while foreheads and cheeks are both skin colored, they have markedly different shapes. 1

5 0.44187772 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions

Author: Andrea Frome, Yoram Singer, Jitendra Malik

Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. 1

6 0.39593256 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing

7 0.39582655 185 nips-2006-Subordinate class recognition using relational object models

8 0.39229041 148 nips-2006-Nonlinear physically-based models for decoding motor-cortical population activity

9 0.38861632 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection

10 0.37164521 25 nips-2006-An Application of Reinforcement Learning to Aerobatic Helicopter Flight

11 0.36714002 47 nips-2006-Boosting Structured Prediction for Imitation Learning

12 0.35966119 110 nips-2006-Learning Dense 3D Correspondence

13 0.35462686 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy

14 0.34245357 49 nips-2006-Causal inference in sensorimotor integration

15 0.34203905 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests

16 0.34021708 66 nips-2006-Detecting Humans via Their Pose

17 0.3378174 45 nips-2006-Blind Motion Deblurring Using Image Statistics

18 0.33604154 15 nips-2006-A Switched Gaussian Process for Estimating Disparity and Segmentation in Binocular Stereo

19 0.32792929 115 nips-2006-Learning annotated hierarchies from relational data

20 0.30290076 50 nips-2006-Chained Boosting


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.052), (3, 0.023), (7, 0.049), (9, 0.022), (22, 0.031), (44, 0.061), (57, 0.089), (65, 0.525), (69, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.93609565 103 nips-2006-Kernels on Structured Objects Through Nested Histograms

Author: Marco Cuturi, Kenji Fukumizu

Abstract: We propose a family of kernels for structured objects which is based on the bag-ofcomponents paradigm. However, rather than decomposing each complex object into the single histogram of its components, we use for each object a family of nested histograms, where each histogram in this hierarchy describes the object seen from an increasingly granular perspective. We use this hierarchy of histograms to define elementary kernels which can detect coarse and fine similarities between the objects. We compute through an efficient averaging trick a mixture of such specific kernels, to propose a final kernel value which weights efficiently local and global matches. We propose experimental results on an image retrieval experiment which show that this mixture is an effective template procedure to be used with kernels on histograms.

same-paper 2 0.88806152 170 nips-2006-Robotic Grasping of Novel Objects

Author: Ashutosh Saxena, Justin Driemeyer, Justin Kearns, Andrew Y. Ng

Abstract: We consider the problem of grasping novel objects, specifically ones that are being seen for the first time through vision. We present a learning algorithm that neither requires, nor tries to build, a 3-d model of the object. Instead it predicts, directly as a function of the images, a point at which to grasp the object. Our algorithm is trained via supervised learning, using synthetic images for the training set. We demonstrate on a robotic manipulation platform that this approach successfully grasps a wide variety of objects, such as wine glasses, duct tape, markers, a translucent box, jugs, knife-cutters, cellphones, keys, screwdrivers, staplers, toothbrushes, a thick coil of wire, a strangely shaped power horn, and others, none of which were seen in the training set. 1

3 0.8866924 146 nips-2006-No-regret Algorithms for Online Convex Programs

Author: Geoffrey J. Gordon

Abstract: Online convex programming has recently emerged as a powerful primitive for designing machine learning algorithms. For example, OCP can be used for learning a linear classifier, dynamically rebalancing a binary search tree, finding the shortest path in a graph with unknown edge lengths, solving a structured classification problem, or finding a good strategy in an extensive-form game. Several researchers have designed no-regret algorithms for OCP. But, compared to algorithms for special cases of OCP such as learning from expert advice, these algorithms are not very numerous or flexible. In learning from expert advice, one tool which has proved particularly valuable is the correspondence between no-regret algorithms and convex potential functions: by reasoning about these potential functions, researchers have designed algorithms with a wide variety of useful guarantees such as good performance when the target hypothesis is sparse. Until now, there has been no such recipe for the more general OCP problem, and therefore no ability to tune OCP algorithms to take advantage of properties of the problem or data. In this paper we derive a new class of no-regret learning algorithms for OCP. These Lagrangian Hedging algorithms are based on a general class of potential functions, and are a direct generalization of known learning rules like weighted majority and external-regret matching. In addition to proving regret bounds, we demonstrate our algorithms learning to play one-card poker. 1

4 0.87470371 156 nips-2006-Ordinal Regression by Extended Binary Classification

Author: Ling Li, Hsuan-tien Lin

Abstract: We present a reduction framework from ordinal regression to binary classification based on extended examples. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranking rule from the binary classifier. A weighted 0/1 loss of the binary classifier would then bound the mislabeling cost of the ranking rule. Our framework allows not only to design good ordinal regression algorithms based on well-tuned binary classification approaches, but also to derive new generalization bounds for ordinal regression from known bounds for binary classification. In addition, our framework unifies many existing ordinal regression algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms, which demonstrates the usefulness of our framework. 1

5 0.86082685 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm

Author: Robert Jenssen, Torbjørn Eltoft, Mark Girolami, Deniz Erdogmus

Abstract: We propose a new kernel-based data transformation technique. It is founded on the principle of maximum entropy (MaxEnt) preservation, hence named kernel MaxEnt. The key measure is Renyi’s entropy estimated via Parzen windowing. We show that kernel MaxEnt is based on eigenvectors, and is in that sense similar to kernel PCA, but may produce strikingly different transformed data sets. An enhanced spectral clustering algorithm is proposed, by replacing kernel PCA by kernel MaxEnt as an intermediate step. This has a major impact on performance.

6 0.55655712 203 nips-2006-implicit Online Learning with Kernels

7 0.53532511 26 nips-2006-An Approach to Bounded Rationality

8 0.53135425 61 nips-2006-Convex Repeated Games and Fenchel Duality

9 0.53087866 79 nips-2006-Fast Iterative Kernel PCA

10 0.5254637 115 nips-2006-Learning annotated hierarchies from relational data

11 0.48584029 82 nips-2006-Gaussian and Wishart Hyperkernels

12 0.48413235 125 nips-2006-Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning

13 0.48314959 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections

14 0.48142594 117 nips-2006-Learning on Graph with Laplacian Regularization

15 0.47894803 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding

16 0.47732866 65 nips-2006-Denoising and Dimension Reduction in Feature Space

17 0.47162449 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment

18 0.4696466 80 nips-2006-Fundamental Limitations of Spectral Clustering

19 0.46899104 109 nips-2006-Learnability and the doubling dimension

20 0.46765029 112 nips-2006-Learning Nonparametric Models for Probabilistic Imitation