nips nips2008 nips2008-207 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Geremy Heitz, Gal Elidan, Benjamin Packer, Daphne Koller
Abstract: Discriminative tasks, including object categorization and detection, are central components of high-level computer vision. Sometimes, however, we are interested in more refined aspects of the object in an image, such as pose or particular regions. In this paper we develop a method (LOOPS) for learning a shape and image feature model that can be trained on a particular object class, and used to outline instances of the class in novel images. Furthermore, while the training data consists of uncorresponded outlines, the resulting LOOPS model contains a set of landmark points that appear consistently across instances, and can be accurately localized in an image. Our model achieves state-of-the-art results in precisely outlining objects that exhibit large deformations and articulations in cluttered natural images. These localizations can then be used to address a range of tasks, including descriptive classification, search, and clustering. 1
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper we develop a method (LOOPS) for learning a shape and image feature model that can be trained on a particular object class, and used to outline instances of the class in novel images. [sent-7, score-0.637]
2 These localizations can then be used to address a range of tasks, including descriptive classification, search, and clustering. [sent-10, score-0.347]
3 In many cases, we are also interested in more refined descriptive questions regarding an object, such as “What is it doing?” [sent-14, score-0.522]
4 In theory it is possible to convert some descriptive questions into discriminative classification tasks given the appropriate labels. [sent-20, score-0.43]
5 Intuitively, if we have a good model of what objects in a particular class “look like” and the range of variation that they exhibit, we can make these descriptive distinctions more readily, with a small number of training instances. [sent-22, score-0.33]
6 In this paper, we address the goal of finding precise, corresponded localizations of object classes in cluttered images while allowing for large deformations. [sent-24, score-0.542]
7 The Localizing Object Outlines using Probabilistic Shape (LOOPS) method constructs a unified probabilistic model that combines global shape with appearance-based boosted detectors to define a joint distribution over the location of the constituent elements on the object. [sent-25, score-0.319]
8 We can then leverage the object’s shape, an important characteristic that can be used for many descriptive distinctions [9], to address our descriptive tasks. [sent-26, score-0.574]
9 The main challenge is to correspond this model to a novel image while accounting for the possibility of object deformation and articulation. [sent-27, score-0.331]
10 The shape model is depicted via principal components corresponding to the neck and legs, and the ellipse marks one standard deviation from the mean. [sent-30, score-0.279]
11 Some works use geometry as a means toward object classification or detection [11, 2, 17, 21]. [sent-34, score-0.251]
12 Others (e.g., [3, 12]) do attempt to accurately localize objects in photographs, but they only allow for relatively rigid configurations and cannot capture large deformations such as the articulation of the giraffe’s neck. [sent-38, score-0.258]
13 To the best of our knowledge, no work uses the consistent localization of parts for descriptive tasks. [sent-39, score-0.415]
14 Having a representation of the constituent elements of an object should aid in answering descriptive questions. [sent-40, score-0.489]
15 We adopt the AAM-like strategy of representing the shape of an object class via an ordered set of N landmark points that together constitute a piecewise linear contour. [sent-42, score-0.605]
16 Obtaining corresponded training outlines, however, requires painstaking supervision and we would like to be able to use readily available simple outlines such as those in the LabelMe dataset. [sent-43, score-0.559]
17 Therefore, before we begin, we need to automatically augment the simple training outlines with a corresponded labeling. [sent-44, score-0.5]
18 That is, we want to transform arbitrary outlines into useful training instances with consistent elements as depicted in the pipeline of our LOOPS method (Figure 1, first two boxes). [sent-45, score-0.439]
19 Once we have corresponded training outlines, each with N consistent landmarks, we can construct a distribution of the geometry of the objects’ outline as depicted in Figure 1(middle) and augment this with appearance based features to form a LOOPS model, as described in Section 2. [sent-47, score-0.383]
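As an illustration of this step, the sketch below fits a joint Gaussian over corresponded training outlines. The data layout, the simple translation normalization, and the regularization constant are assumptions for illustration only, not the authors' implementation (the paper additionally handles alignment and principal-component analysis of the shape).

import numpy as np

def fit_shape_gaussian(train_landmarks):
    """train_landmarks: (M, N, 2) array of M corresponded outlines, each with N landmarks.
    Returns the mean (2N,) and covariance (2N, 2N) of a joint Gaussian over shape."""
    # Remove translation so the model captures shape rather than absolute image position.
    centered = train_landmarks - train_landmarks.mean(axis=1, keepdims=True)
    X = centered.reshape(centered.shape[0], -1)                  # (M, 2N) flattened outlines
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])    # small ridge for stability
    return mu, cov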
20 Given a model, we face the computational challenge of localizing the landmarks in test images despite clutter, large deformations, and articulations (Figure 1, fourth box). [sent-48, score-0.444]
21 We first consider a tractable global search space, consisting of candidate landmark assignments. [sent-52, score-0.299]
22 This allows a discrete probabilistic inference technique to achieve rough but accurate localization that robustly explores the multimodal set of solutions allowed by our large deformation model. [sent-53, score-0.274]
23 The localization of outlines in test images is described in detail in Section 3. [sent-58, score-0.557]
24 We demonstrate in Section 4 that this localization achieves state-of-the-art results for objects with significant deformation and articulation in natural images. [sent-59, score-0.316]
25 Finally, with the localized outlines in hand, we can readily perform a range of descriptive tasks (classification, ranking, clustering), based on the predicted location of landmarks in test images as well as appearance characteristics in the vicinity of those landmarks. [sent-60, score-1.269]
26 We demonstrate how this is carried out for several descriptive tasks in Section 4. [sent-61, score-0.353]
27 The second axis varies the components that are extracted from the LOOPS outlines for these tasks: we show examples that use the entire object shape, a subcomponent of the object shape, and the appearance of a specific part of the object. [sent-64, score-0.85]
28 2 The LOOPS Model: Given a set of training instances, each with N corresponded landmarks, the LOOPS object class model combines two components: an explicit representation of the object’s shape (2D silhouette) and a set of image-based features. [sent-66, score-0.549]
29 We define the shape of a class of objects via the locations of the N object landmarks, each of which is assigned to one of the image pixels. [sent-67, score-0.516]
30 $P_{Shape}$ encodes the (unnormalized) distribution over the object shape (outline), $F^{det}_i(l_i)$ is a landmark-specific detector feature, and $F^{grad}_{ij}(l_i, l_j; I)$ encodes a preference for aligning outline segments along image edges. [sent-70, score-0.9]
31 Below we describe how the shape model and the detector features are learned. [sent-71, score-0.357]
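A minimal sketch of how these terms can combine into a single log-score for a candidate landmark assignment follows. The function handles (detector_scores, grad_feature), the weight dictionary, and the use of a single joint Gaussian log-density are illustrative assumptions, not the paper's exact parameterization.

import numpy as np
from scipy.stats import multivariate_normal

def loops_log_score(L, shape_mean, shape_cov, detector_scores, grad_feature, w):
    """L: (N, 2) array of landmark pixel coordinates.
    detector_scores: list of N functions mapping (x, y) -> boosted detector score H_i.
    grad_feature: function (p_i, p_j) -> edge-alignment score along the segment.
    w: dict of weights for the three feature families (assumed names)."""
    # Shape term: log-density of the flattened landmark vector under the joint Gaussian.
    shape_term = multivariate_normal.logpdf(L.ravel(), mean=shape_mean, cov=shape_cov)
    # Detector term: sum of per-landmark boosted-detector responses.
    det_term = sum(detector_scores[i](*L[i]) for i in range(len(L)))
    # Gradient term: alignment of each outline segment with image edges (closed contour).
    grad_term = sum(grad_feature(L[i], L[(i + 1) % len(L)]) for i in range(len(L)))
    return w["shape"] * shape_term + w["det"] * det_term + w["grad"] * grad_term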
32 This can naturally be posed as a pairwise feature over landmarks on opposite sides of the object. [sent-75, score-0.3]
33 As we discuss below in Section 3, the procedure to locate the model landmarks in an image first involves discrete global inference using the LOOPS model, followed by a local refinement stage. [sent-85, score-0.4]
34 Thus, during the discrete inference stage, we limit the number of pairwise elements by approximating the shape distribution with a sparse multivariate Gaussian. [sent-88, score-0.285]
35 To obtain the sparsity pattern, we choose a linear number of landmark pairs whose relative locations have the lowest variance across the training instances (and require that neighbor pairs be included), promoting shape stability. [sent-90, score-0.461]
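The pair-selection heuristic can be sketched as below; the array layout, the handling of neighbor pairs, and the budget of extra pairs are assumptions.

import numpy as np

def select_sparse_pairs(train_landmarks, num_extra_pairs):
    """train_landmarks: (M, N, 2) corresponded training outlines.
    Returns a set of landmark index pairs: contour neighbors plus the lowest-variance pairs."""
    M, N, _ = train_landmarks.shape
    neighbors = {tuple(sorted((i, (i + 1) % N))) for i in range(N)}    # always keep neighbors
    scores = []
    for i in range(N):
        for j in range(i + 1, N):
            rel = train_landmarks[:, i, :] - train_landmarks[:, j, :]  # relative offsets
            scores.append((rel.var(axis=0).sum(), (i, j)))             # total offset variance
    scores.sort(key=lambda t: t[0])
    chosen = set(neighbors)
    for _, pair in scores:
        if len(chosen) >= len(neighbors) + num_extra_pairs:
            break
        chosen.add(pair)
    return chosen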
36 To construct detector features $F^{det}$, we build on the success of boosting in state-of-the-art object detection methods [17, 22]. [sent-92, score-0.438]
37 Specifically, we use boosting to learn a strong detector (classifier) $H_i$ for each landmark $i$. [sent-93, score-0.393]
38 We then define the feature value in the conditional MRF for the assignment of landmark $i$ to pixel $l_i$ to be $F^{det}_i(l_i; I) = H_i(l_i)$. [sent-94, score-0.377]
39 For weak detectors we use features that are based on our shape model as well as other features that have proven useful for the task of object detection: shape templates [5], boundary fragments [17], filter response patches [22], and SIFT descriptors [16]. [sent-95, score-0.632]
40 The weak detector $h^t_i(l_i)$ is one of these features, chosen at round $t$ of boosting as the one that best predicts whether landmark $i$ is at a particular pixel $l_i$. [sent-96, score-0.535]
41 The pairwise feature $F^{grad}_{ij}(l_i, l_j; I) = \sum_{r \in \overline{l_i l_j}} |g(r) \cdot n(l_i, l_j)|$ sums over the segment between adjacent landmarks, where $g(r)$ is the image gradient at point $r$ and $n(l_i, l_j)$ is the segment normal. [sent-98, score-0.412]
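A sketch of this edge-alignment feature, sampling the segment at a fixed number of points; the sampling density and the precomputed gradient maps are assumptions.

import numpy as np

def grad_feature(p_i, p_j, grad_x, grad_y, num_samples=20):
    """p_i, p_j: (x, y) landmark positions; grad_x, grad_y: per-pixel image gradient maps."""
    p_i, p_j = np.asarray(p_i, float), np.asarray(p_j, float)
    d = p_j - p_i
    length = np.linalg.norm(d)
    if length < 1e-6:
        return 0.0
    normal = np.array([-d[1], d[0]]) / length               # unit normal to the segment
    total = 0.0
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = np.round(p_i + t * d).astype(int)            # sample point on the segment
        g = np.array([grad_x[y, x], grad_y[y, x]])
        total += abs(g @ normal)                            # gradient projected onto the normal
    return total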
42 Figure 2: Example outlines predicted using (a) the top detection for each landmark independently (Candidate), (b) discrete inference (Discrete), and (c) a continuous refinement of (b) (Refinement). [sent-99, score-0.653]
43 3 Localization of Object Outlines: We now address our central computational challenge: assigning the landmarks of a LOOPS model to test image pixels while allowing for large deformations and articulations. [sent-100, score-0.436]
44 This allows us to outline objects by using probabilistic inference to find the most probable such assignment: $L^* = \arg\max_L P(L \mid I, w)$. Because, in principle, each landmark can be assigned to any pixel, finding $L^*$ is computationally prohibitive. [sent-103, score-0.384]
45 Furthermore, large articulations were not captured even with the “correct” starting point (placing the mean shape in the center of the true location). [sent-110, score-0.248]
46 We cannot directly perform inference over the entire search space of $P^N$ assignments (for $N$ model landmarks and $P$ pixels). [sent-113, score-0.298]
47 To prune this space, we first assume that landmarks will fall on “interesting” points, and consider only candidate pixels (typically 1000-2000 per image) found by the SIFT interest operator [16]. [sent-114, score-0.343]
48 The only pairwise feature functions we use are over neighboring pairs of landmarks (as described in Section 2), which does not add to the density of the MRF construction, thus allowing the inference procedure to be tractable. [sent-117, score-0.329]
49 We perform approximate max-product inference using the Residual Belief Propagation (RBP) algorithm [6] to find the most likely assignment of landmarks to pixels, $L^*$, in the pruned space. [sent-118, score-0.365]
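A rough sketch of the candidate-generation step described above; the OpenCV SIFT call and the per-landmark top-k cut-off are assumptions introduced here for illustration (the paper reports roughly 1000-2000 candidate pixels per image).

import cv2
import numpy as np

def candidate_pixels(image_gray, detector_scores, top_k=100):
    """image_gray: uint8 grayscale image; detector_scores: list of N functions (x, y) -> H_i score.
    Returns, for each landmark, the best-scoring interest-point locations as its candidate set."""
    sift = cv2.SIFT_create()
    keypoints = sift.detect(image_gray, None)
    pts = np.array([kp.pt for kp in keypoints])             # (P, 2) interest-point locations
    candidates = []
    for H_i in detector_scores:
        scores = np.array([H_i(x, y) for x, y in pts])
        order = np.argsort(-scores)[:top_k]                 # keep the top-scoring candidates
        candidates.append(pts[order])
    return candidates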
50 Given the best assignment $L^*$ predicted in the discrete stage, we perform a refinement stage in which we reintroduce the entire pixel domain and use the full shape distribution. [sent-119, score-0.384]
51 Refinement involves a greedy hill-climbing algorithm in which we iterate across each landmark, moving it to the best candidate location using one of two types of moves, while holding the other landmarks fixed. [sent-120, score-0.353]
52 In a local move, each landmark picks the best pixel in a small window around its current location. [sent-121, score-0.253]
53 In a global move, each landmark can move to its mean location given all the other landmark assignments; this location is the mean of the conditional Gaussian $P_{Shape}(l_i \mid L \setminus l_i)$, easily computed from the joint shape Gaussian. [sent-122, score-0.8]
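The global move relies on standard Gaussian conditioning on the joint shape model; a sketch, assuming the joint mean and covariance are stored in a flattened (x0, y0, x1, y1, ...) ordering, is:

import numpy as np

def conditional_mean(i, L, mu, Sigma):
    """Mean of the conditional Gaussian of landmark i given all other landmarks.
    L: (N, 2) current landmark positions; mu: (2N,) joint mean; Sigma: (2N, 2N) joint covariance."""
    N = L.shape[0]
    idx_i = [2 * i, 2 * i + 1]
    idx_rest = [k for k in range(2 * N) if k not in idx_i]
    S_ir = Sigma[np.ix_(idx_i, idx_rest)]
    S_rr = Sigma[np.ix_(idx_rest, idx_rest)]
    diff = L.ravel()[idx_rest] - mu[idx_rest]
    return mu[idx_i] + S_ir @ np.linalg.solve(S_rr, diff)   # standard Gaussian conditioning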
54 4 Experimental Results: Our experimental evaluation is aimed at demonstrating the ability of a single LOOPS model to perform a range of tasks based on corresponded localization of objects. [sent-125, score-0.344]
55 More detailed results, including more object classes and scenes, and an analysis of outline accuracy, appear in [13]. [sent-128, score-0.308]
56 Figure 3: Randomly selected outlines produced by LOOPS and its two competitors (OBJ CUT and the kAS Detector), displaying the variation in the four classes considered in our descriptive classification experiments. [sent-132, score-0.637]
57 We report the RMS of the distance from each point on the outline to the nearest point on the groundtruth (and vice versa), as a percentage of the groundtruth bounding-box diagonal. [sent-146, score-0.283]
58 We first evaluate the ability of our model to produce accurate outlines in which the model’s landmarks are positioned consistently across test images. [sent-148, score-0.653]
59 We compare LOOPS to two state-of-the-art methods that seek to produce accurate object outlines in cluttered images: the OBJ CUT model of Prasad and Fitzgibbon [19] and the kAS Detector of Ferrari et al. [sent-149, score-0.637]
60 To provide a quantitative evaluation of the outlines, we measured the symmetric root mean squared (rms) distance between the produced outlines and the hand-labeled groundtruth. [sent-155, score-0.35]
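A sketch of this evaluation metric; how the forward and backward RMS values are combined into one number is an assumption (here they are simply averaged).

import numpy as np

def symmetric_rms_percent(pred, gt):
    """pred: (K, 2) predicted outline points; gt: (M, 2) groundtruth outline points."""
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=2)    # all pairwise distances
    rms_fwd = np.sqrt((d.min(axis=1) ** 2).mean())    # each predicted point to nearest groundtruth
    rms_bwd = np.sqrt((d.min(axis=0) ** 2).mean())    # each groundtruth point to nearest prediction
    diag = np.linalg.norm(gt.max(axis=0) - gt.min(axis=0))           # bounding-box diagonal
    return 100.0 * 0.5 * (rms_fwd + rms_bwd) / diag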
61 As we can see both qualitatively in Figure 3 and quantitatively in Table 1, LOOPS produces significantly more accurate outlines than its competitors. [sent-156, score-0.384]
62 Figure 3 shows two example test images with the outlines for each of the four classes we considered here. [sent-157, score-0.429]
63 While in some cases the LOOPS outline is not perfect at the pixel level, it usually captures the correct articulation, pose, and shape of the object. [sent-158, score-0.35]
64 Descriptive Classification with LOOPS Outlines Our goal is to use the predicted LOOPS outlines for distinguishing between two configurations of an object. [sent-159, score-0.398]
65 To accomplish this, we first train the joint shape and appearance model and perform inference to localize outlines in the test images, all without knowledge of the classification task or any labels. [sent-160, score-0.742]
66 Representing each instance as a corresponded outline provides information that can be leveraged much more easily than the pixel-based representation. [sent-161, score-0.256]
67 We then incorporate the labels to train a descriptive classifier given a corresponded localization. [sent-162, score-0.437]
68 The LOOPS outlines are then classified based on their mean distance to each training contour. [sent-165, score-0.35]
69 In addition, we include a GROUND measure that uses the landmark coordinates of manually corresponded groundtruth outlines as features in a logistic regression classifier. [sent-166, score-0.747]
70 GROUND uses manually labeled outlines and approximately upper bounds the performance achievable from outlines. [sent-169, score-0.35]
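A minimal sketch of such a landmark-coordinate classifier, usable for either the LOOPS or the GROUND features; the translation normalization and the scikit-learn logistic regression are assumptions. The same feature construction is applied to the localized test outlines before calling clf.predict.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_descriptive_classifier(outlines, labels):
    """outlines: (M, N, 2) corresponded landmark coordinates; labels: (M,) descriptive labels."""
    X = np.array([(o - o.mean(axis=0)).ravel() for o in outlines])   # translation-normalized shapes
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)
    return clf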
71 For both lamp tasks, the same LOOPS, OBJ CUT, and kAS Detector models and localizations are used. [sent-170, score-0.284]
72 The second uses a discriminative approach based on the Centroid detector described above, which is similar to the detector used by [22]; we train the descriptive classifier based on the vector of feature responses at the predicted object centroid. [sent-176, score-0.901]
73 Importantly, by making use of the outline predicted in a cluttered image, we surpass the fully supervised baselines (rightmost on the graphs) with as little as a single supervised instance (leftmost on the graphs). [sent-185, score-0.233]
74 Once we have outlined instances, an important benefit of the LOOPS method is that we can in fact perform multiple descriptive tasks with the same object model. [sent-186, score-0.555]
75 We demonstrate this with a pair of classification tasks for the lamp object class, presented in Figure 4(bottom). [sent-187, score-0.492]
76 The tasks differ in which “part” of the object we consider for classification: triangular vs. [sent-188, score-0.268]
77 We stress that both the learned lamp model and the test localizations predicted by LOOPS are the same for both tasks. [sent-192, score-0.332]
78 The consequences of this result are promising: we can do most of the work once, and then readily perform a range of descriptive classification tasks. [sent-194, score-0.313]
79 Shape Similarity Search The second descriptive application area that we consider is similarity search, which involves the ranking of test instances based on their similarity to a search query. [sent-195, score-0.539]
80 Offline, we train a LOOPS model for the object class and localize corresponded outlines in the test images. [sent-199, score-0.772]
81 Figure 6: (left) Object similarity search using the LOOPS output to determine the location of the lamp landmarks. [sent-203, score-0.378]
82 Users select a query lamp instance, a subset of landmarks (possibly all), and whether to use shape or color. [sent-208, score-0.736]
83 Each instance in the dataset is then ranked based on Euclidean distance to the query in shape PCA space or LAB color space as appropriate. [sent-209, score-0.292]
84 The second row shows the ranking when the user decides to focus on the lampshade landmarks, yielding only triangular lamp shades, and the third row focuses on the lamp base, returning only wide bases. [sent-211, score-0.553]
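The ranking step can be sketched as follows; the PCA dimensionality and the translation normalization of the selected landmark subset are illustrative assumptions.

import numpy as np
from sklearn.decomposition import PCA

def rank_by_shape(outlines, query_idx, landmark_subset, n_components=5):
    """outlines: (M, N, 2) localized outlines; landmark_subset: indices of landmarks to compare.
    Returns dataset indices sorted from most to least similar to the query instance."""
    sub = outlines[:, landmark_subset, :]
    X = np.array([(s - s.mean(axis=0)).ravel() for s in sub])    # normalize translation per instance
    Z = PCA(n_components=n_components).fit_transform(X)          # shape PCA space
    dists = np.linalg.norm(Z - Z[query_idx], axis=1)             # Euclidean distance to the query
    return np.argsort(dists)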
85 In this section, we consider an outline- and appearance-based clustering where the feature vector for each airplane includes the mean color values in the LAB color space for all pixels inside the airplane boundary (or in a region bounded by a user-selected set of landmarks). [sent-218, score-0.565]
86 Figure 5(left) shows 12 examples from one cluster that results from clustering using the entire plane, for a database of 770 images from the Caltech airplanes image set [8]. [sent-220, score-0.279]
87 In order to perform such coherent clustering of airplane tails, one first needs to accurately localize the tail in test images. [sent-224, score-0.287]
88 Even more than the table lamp ranking task presented above, this example highlights the ability of LOOPS to leverage localized appearance, opening the door for many additional shape- and appearance-based descriptive tasks. [sent-225, score-0.913]
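A sketch of the localized appearance feature and clustering used in this example; the OpenCV color conversion, the polygon mask over the landmark-bounded region, and the k-means settings are assumptions.

import cv2
import numpy as np
from sklearn.cluster import KMeans

def mean_lab_inside(image_bgr, landmarks):
    """image_bgr: HxWx3 uint8 image; landmarks: (N, 2) polygon vertices (x, y) bounding the region."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    mask = np.zeros(image_bgr.shape[:2], np.uint8)
    cv2.fillPoly(mask, [landmarks.astype(np.int32)], 255)    # region bounded by the landmarks
    return lab[mask > 0].mean(axis=0)                        # mean LAB color inside the region

def cluster_by_appearance(features, k=5):
    """features: list of per-instance mean LAB vectors; returns a cluster label per instance."""
    return KMeans(n_clusters=k, n_init=10).fit_predict(np.array(features))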
89 5 Discussion and Future Work: In this work we presented the Localizing Object Outlines using Probabilistic Shape (LOOPS) approach for obtaining accurate, corresponded outlines of objects in test images, with the goal of performing a variety of descriptive tasks. [sent-226, score-0.83]
90 Our approach relies on a coherent probabilistic model in which shape is combined with discriminative detectors. [sent-227, score-0.276]
91 We showed how the produced outlines can be used to perform descriptive classification, search, and clustering based on shape and localized appearance, and we evaluated the error of our outlines compared to two state-of-the-art competitors. [sent-228, score-1.231]
92 First, we introduce a model that combines both generative and discriminative elements, allowing us to localize precise outlines of highly articulated objects in cluttered natural images. [sent-231, score-0.515]
93 Third, we demonstrate that precise localization is of value for a range of descriptive tasks, including those that are based on appearance. [sent-233, score-0.415]
94 Several existing methods produce outlines either as a by-product of detection (e. [sent-234, score-0.399]
95 We showed that LOOPS produces far more accurate outlines when dealing with significant object deformation and articulation, and demonstrated that it is able to translate this into superior classification rates for descriptive tasks. [sent-240, score-0.928]
96 No other work that considers object classes in natural images has demonstrated a combination of accurate localization and shape analysis that has solved these problems. [sent-241, score-0.64]
97 , the neck of the giraffe) as a set of landmarks that articulate together, and achieve better localization by estimating a distribution over part articulation (e. [sent-245, score-0.538]
98 Accurate object detection with deformable shape models learnt from images. [sent-324, score-0.448]
99 Incremental learning of object detectors using a visual shape alphabet. [sent-354, score-0.435]
100 Contextual models for object detection using boosted random fields. [sent-379, score-0.289]
wordName wordTfidf (topN-words)
[('loops', 0.396), ('outlines', 0.35), ('descriptive', 0.287), ('landmarks', 0.269), ('lamp', 0.224), ('landmark', 0.206), ('object', 0.202), ('shape', 0.197), ('detector', 0.16), ('corresponded', 0.15), ('kas', 0.137), ('localization', 0.128), ('obj', 0.12), ('outline', 0.106), ('appearance', 0.096), ('li', 0.095), ('articulation', 0.09), ('airplane', 0.09), ('images', 0.079), ('giraffe', 0.075), ('nement', 0.075), ('image', 0.074), ('cvpr', 0.073), ('classi', 0.072), ('localize', 0.07), ('pshape', 0.068), ('tasks', 0.066), ('lj', 0.064), ('localizations', 0.06), ('instances', 0.058), ('search', 0.057), ('deformation', 0.055), ('deformations', 0.055), ('ferrari', 0.055), ('cut', 0.054), ('cluttered', 0.051), ('tails', 0.051), ('aams', 0.051), ('airplanes', 0.051), ('articulations', 0.051), ('fidet', 0.051), ('grad', 0.051), ('neck', 0.051), ('detection', 0.049), ('similarity', 0.049), ('color', 0.049), ('location', 0.048), ('predicted', 0.048), ('clustering', 0.047), ('pixel', 0.047), ('query', 0.046), ('mrf', 0.046), ('localizing', 0.045), ('bending', 0.045), ('shade', 0.045), ('fij', 0.045), ('tail', 0.045), ('discriminative', 0.044), ('objects', 0.043), ('box', 0.041), ('competitors', 0.041), ('legs', 0.041), ('groundtruth', 0.041), ('er', 0.039), ('ranking', 0.039), ('boosted', 0.038), ('pixels', 0.038), ('elidan', 0.036), ('standing', 0.036), ('detectors', 0.036), ('candidate', 0.036), ('stage', 0.035), ('coherent', 0.035), ('clutter', 0.035), ('cheetah', 0.034), ('jets', 0.034), ('accurate', 0.034), ('row', 0.033), ('questions', 0.033), ('supervision', 0.033), ('depicted', 0.031), ('sift', 0.031), ('pairwise', 0.031), ('prasad', 0.03), ('centroid', 0.03), ('cation', 0.03), ('inference', 0.029), ('assignment', 0.029), ('re', 0.028), ('database', 0.028), ('baselines', 0.028), ('discrete', 0.028), ('rms', 0.027), ('pose', 0.027), ('heitz', 0.027), ('cootes', 0.027), ('boosting', 0.027), ('bounding', 0.027), ('readily', 0.026), ('hi', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.000001 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
Author: Geremy Heitz, Gal Elidan, Benjamin Packer, Daphne Koller
Abstract: Discriminative tasks, including object categorization and detection, are central components of high-level computer vision. Sometimes, however, we are interested in more refined aspects of the object in an image, such as pose or particular regions. In this paper we develop a method (LOOPS) for learning a shape and image feature model that can be trained on a particular object class, and used to outline instances of the class in novel images. Furthermore, while the training data consists of uncorresponded outlines, the resulting LOOPS model contains a set of landmark points that appear consistently across instances, and can be accurately localized in an image. Our model achieves state-of-the-art results in precisely outlining objects that exhibit large deformations and articulations in cluttered natural images. These localizations can then be used to address a range of tasks, including descriptive classification, search, and clustering. 1
2 0.19421262 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
Author: Alex J. Smola, Julian J. Mcauley, Tibério S. Caetano
Abstract: Models for near-rigid shape matching are typically based on distance-related features, in order to infer matches that are consistent with the isometric assumption. However, real shapes from image datasets, even when expected to be related by “almost isometric” transformations, are actually subject not only to noise but also, to some limited degree, to variations in appearance and scale. In this paper, we introduce a graphical model that parameterises appearance, distance, and angle features and we learn all of the involved parameters via structured prediction. The outcome is a model for near-rigid shape matching which is robust in the sense that it is able to capture the possibly limited but still important scale and appearance variations. Our experimental results reveal substantial improvements upon recent successful models, while maintaining similar running times. 1
3 0.17598492 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
Author: Abhinav Gupta, Jianbo Shi, Larry S. Davis
Abstract: We present an approach that combines bag-of-words and spatial models to perform semantic and syntactic analysis for recognition of an object based on its internal appearance and its context. We argue that while object recognition requires modeling relative spatial locations of image features within the object, a bag-of-word is sufficient for representing context. Learning such a model from weakly labeled data involves labeling of features into two classes: foreground(object) or “informative” background(context). We present a “shape-aware” model which utilizes contour information for efficient and accurate labeling of features in the image. Our approach iterates between an MCMC-based labeling and contour based labeling of features to integrate co-occurrence of features and shape similarity. 1
4 0.16021216 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
Author: Volkan Cevher, Marco F. Duarte, Chinmay Hegde, Richard Baraniuk
Abstract: Compressive Sensing (CS) combines sampling and compression into a single subNyquist linear measurement process for sparse and compressible signals. In this paper, we extend the theory of CS to include signals that are concisely represented in terms of a graphical model. In particular, we use Markov Random Fields (MRFs) to represent sparse signals whose nonzero coefficients are clustered. Our new model-based recovery algorithm, dubbed Lattice Matching Pursuit (LaMP), stably recovers MRF-modeled signals using many fewer measurements and computations than the current state-of-the-art algorithms.
5 0.1601692 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
Author: Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller
Abstract: One of the original goals of computer vision was to fully understand a natural scene. This requires solving several sub-problems simultaneously, including object detection, region labeling, and geometric reasoning. The last few decades have seen great progress in tackling each of these problems in isolation. Only recently have researchers returned to the difficult task of considering them jointly. In this work, we consider learning a set of related models in such that they both solve their own problem and help each other. We develop a framework called Cascaded Classification Models (CCM), where repeated instantiations of these classifiers are coupled by their input/output variables in a cascade that improves performance at each level. Our method requires only a limited “black box” interface with the models, allowing us to use very sophisticated, state-of-the-art classifiers without having to look under the hood. We demonstrate the effectiveness of our method on a large set of natural images by combining the subtasks of scene categorization, object detection, multiclass image segmentation, and 3d reconstruction. 1
6 0.14881919 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
7 0.11915867 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
8 0.11800595 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
9 0.11666238 95 nips-2008-Grouping Contours Via a Related Image
10 0.094284914 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
11 0.08795289 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
12 0.087413624 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
13 0.086588033 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space
14 0.082240038 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
15 0.078889348 119 nips-2008-Learning a discriminative hidden part model for human action recognition
16 0.073329896 147 nips-2008-Multiscale Random Fields with Application to Contour Grouping
17 0.068768218 231 nips-2008-Temporal Dynamics of Cognitive Control
18 0.06857422 148 nips-2008-Natural Image Denoising with Convolutional Networks
19 0.068166181 23 nips-2008-An ideal observer model of infant object perception
20 0.067908868 36 nips-2008-Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree
topicId topicWeight
[(0, -0.201), (1, -0.125), (2, 0.126), (3, -0.189), (4, 0.004), (5, 0.002), (6, -0.088), (7, -0.213), (8, 0.078), (9, -0.03), (10, -0.065), (11, -0.025), (12, 0.021), (13, -0.065), (14, 0.053), (15, 0.019), (16, 0.038), (17, 0.053), (18, 0.094), (19, 0.062), (20, 0.045), (21, -0.053), (22, 0.001), (23, 0.06), (24, 0.003), (25, 0.108), (26, -0.049), (27, 0.024), (28, -0.041), (29, 0.047), (30, -0.077), (31, -0.018), (32, 0.051), (33, -0.064), (34, 0.091), (35, 0.101), (36, 0.003), (37, 0.034), (38, -0.081), (39, 0.036), (40, -0.005), (41, -0.085), (42, -0.042), (43, 0.008), (44, -0.05), (45, -0.089), (46, -0.016), (47, 0.049), (48, -0.005), (49, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.95016128 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
Author: Geremy Heitz, Gal Elidan, Benjamin Packer, Daphne Koller
Abstract: Discriminative tasks, including object categorization and detection, are central components of high-level computer vision. Sometimes, however, we are interested in more refined aspects of the object in an image, such as pose or particular regions. In this paper we develop a method (LOOPS) for learning a shape and image feature model that can be trained on a particular object class, and used to outline instances of the class in novel images. Furthermore, while the training data consists of uncorresponded outlines, the resulting LOOPS model contains a set of landmark points that appear consistently across instances, and can be accurately localized in an image. Our model achieves state-of-the-art results in precisely outlining objects that exhibit large deformations and articulations in cluttered natural images. These localizations can then be used to address a range of tasks, including descriptive classification, search, and clustering. 1
2 0.78162014 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
Author: Alex J. Smola, Julian J. Mcauley, Tibério S. Caetano
Abstract: Models for near-rigid shape matching are typically based on distance-related features, in order to infer matches that are consistent with the isometric assumption. However, real shapes from image datasets, even when expected to be related by “almost isometric” transformations, are actually subject not only to noise but also, to some limited degree, to variations in appearance and scale. In this paper, we introduce a graphical model that parameterises appearance, distance, and angle features and we learn all of the involved parameters via structured prediction. The outcome is a model for near-rigid shape matching which is robust in the sense that it is able to capture the possibly limited but still important scale and appearance variations. Our experimental results reveal substantial improvements upon recent successful models, while maintaining similar running times. 1
3 0.7343505 95 nips-2008-Grouping Contours Via a Related Image
Author: Praveen Srinivasan, Liming Wang, Jianbo Shi
Abstract: Contours have been established in the biological and computer vision literature as a compact yet descriptive representation of object shape. While individual contours provide structure, they lack the large spatial support of region segments (which lack internal structure). We present a method for further grouping of contours in an image using their relationship to the contours of a second, related image. Stereo, motion, and similarity all provide cues that can aid this task; contours that have similar transformations relating them to their matching contours in the second image likely belong to a single group. To find matches for contours, we rely only on shape, which applies directly to all three modalities without modification, in contrast to the specialized approaches developed for each independently. Visually salient contours are extracted in each image, along with a set of candidate transformations for aligning subsets of them. For each transformation, groups of contours with matching shape across the two images are identified to provide a context for evaluating matches of individual contour points across the images. The resulting contexts of contours are used to perform a final grouping on contours in the original image while simultaneously finding matches in the related image, again by shape matching. We demonstrate grouping results on image pairs consisting of stereo, motion, and similar images. Our method also produces qualitatively better results against a baseline method that does not use the inferred contexts. 1
4 0.73156112 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
Author: Abhinav Gupta, Jianbo Shi, Larry S. Davis
Abstract: We present an approach that combines bag-of-words and spatial models to perform semantic and syntactic analysis for recognition of an object based on its internal appearance and its context. We argue that while object recognition requires modeling relative spatial locations of image features within the object, a bag-of-word is sufficient for representing context. Learning such a model from weakly labeled data involves labeling of features into two classes: foreground(object) or “informative” background(context). We present a “shape-aware” model which utilizes contour information for efficient and accurate labeling of features in the image. Our approach iterates between an MCMC-based labeling and contour based labeling of features to integrate co-occurrence of features and shape similarity. 1
5 0.67529362 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
Author: Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller
Abstract: One of the original goals of computer vision was to fully understand a natural scene. This requires solving several sub-problems simultaneously, including object detection, region labeling, and geometric reasoning. The last few decades have seen great progress in tackling each of these problems in isolation. Only recently have researchers returned to the difficult task of considering them jointly. In this work, we consider learning a set of related models in such that they both solve their own problem and help each other. We develop a framework called Cascaded Classification Models (CCM), where repeated instantiations of these classifiers are coupled by their input/output variables in a cascade that improves performance at each level. Our method requires only a limited “black box” interface with the models, allowing us to use very sophisticated, state-of-the-art classifiers without having to look under the hood. We demonstrate the effectiveness of our method on a large set of natural images by combining the subtasks of scene categorization, object detection, multiclass image segmentation, and 3d reconstruction. 1
6 0.657498 147 nips-2008-Multiscale Random Fields with Application to Contour Grouping
7 0.57365102 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
8 0.55983818 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
9 0.54872513 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
10 0.5402993 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
11 0.51700175 23 nips-2008-An ideal observer model of infant object perception
12 0.45401561 119 nips-2008-Learning a discriminative hidden part model for human action recognition
13 0.4498668 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
14 0.44838229 148 nips-2008-Natural Image Denoising with Convolutional Networks
15 0.44336143 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
16 0.43395406 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
17 0.42937487 36 nips-2008-Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree
18 0.40370989 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
19 0.39362165 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space
20 0.39144892 48 nips-2008-Clustering via LP-based Stabilities
topicId topicWeight
[(6, 0.051), (7, 0.051), (12, 0.443), (16, 0.014), (28, 0.111), (35, 0.019), (57, 0.075), (59, 0.017), (63, 0.012), (71, 0.015), (77, 0.031), (83, 0.077)]
simIndex simValue paperId paperTitle
1 0.93959814 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space
Author: Ijaz Akhter, Yaser Sheikh, Sohaib Khan, Takeo Kanade
Abstract: Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This results in a significant reduction in unknowns, and corresponding stability in estimation. We report empirical performance, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing). 1
2 0.90507716 139 nips-2008-Modeling the effects of memory on human online sentence processing with particle filters
Author: Roger P. Levy, Florencia Reali, Thomas L. Griffiths
Abstract: Language comprehension in humans is significantly constrained by memory, yet rapid, highly incremental, and capable of utilizing a wide range of contextual information to resolve ambiguity and form expectations about future input. In contrast, most of the leading psycholinguistic models and fielded algorithms for natural language parsing are non-incremental, have run time superlinear in input length, and/or enforce structural locality constraints on probabilistic dependencies between events. We present a new limited-memory model of sentence comprehension which involves an adaptation of the particle filter, a sequential Monte Carlo method, to the problem of incremental parsing. We show that this model can reproduce classic results in online sentence comprehension, and that it naturally provides the first rational account of an outstanding problem in psycholinguistics, in which the preferred alternative in a syntactic ambiguity seems to grow more attractive over time even in the absence of strong disambiguating information. 1
same-paper 3 0.85533845 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
Author: Geremy Heitz, Gal Elidan, Benjamin Packer, Daphne Koller
Abstract: Discriminative tasks, including object categorization and detection, are central components of high-level computer vision. Sometimes, however, we are interested in more refined aspects of the object in an image, such as pose or particular regions. In this paper we develop a method (LOOPS) for learning a shape and image feature model that can be trained on a particular object class, and used to outline instances of the class in novel images. Furthermore, while the training data consists of uncorresponded outlines, the resulting LOOPS model contains a set of landmark points that appear consistently across instances, and can be accurately localized in an image. Our model achieves state-of-the-art results in precisely outlining objects that exhibit large deformations and articulations in cluttered natural images. These localizations can then be used to address a range of tasks, including descriptive classification, search, and clustering. 1
4 0.82399881 193 nips-2008-Regularized Co-Clustering with Dual Supervision
Author: Vikas Sindhwani, Jianying Hu, Aleksandra Mojsilovic
Abstract: By attempting to simultaneously partition both the rows (examples) and columns (features) of a data matrix, Co-clustering algorithms often demonstrate surprisingly impressive performance improvements over traditional one-sided row clustering techniques. A good clustering of features may be seen as a combinatorial transformation of the data matrix, effectively enforcing a form of regularization that may lead to a better clustering of examples (and vice-versa). In many applications, partial supervision in the form of a few row labels as well as column labels may be available to potentially assist co-clustering. In this paper, we develop two novel semi-supervised multi-class classification algorithms motivated respectively by spectral bipartite graph partitioning and matrix approximation formulations for co-clustering. These algorithms (i) support dual supervision in the form of labels for both examples and/or features, (ii) provide principled predictive capability on out-of-sample test data, and (iii) arise naturally from the classical Representer theorem applied to regularization problems posed on a collection of Reproducing Kernel Hilbert Spaces. Empirical results demonstrate the effectiveness and utility of our algorithms. 1
5 0.820342 170 nips-2008-Online Optimization in X-Armed Bandits
Author: Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos
Abstract: We consider a generalization of stochastic bandit problems where the set of arms, X , is allowed to be a generic topological space. We constraint the mean-payoff function with a dissimilarity function over X in a way that is more general than Lipschitz. We construct an arm selection policy whose regret improves upon previous result for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally H¨ lder √ a known exponent, then the expected o with regret is bounded up to a logarithmic factor by n, i.e., the rate of the growth of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider. 1 Introduction and motivation Bandit problems arise in many settings, including clinical trials, scheduling, on-line parameter tuning of algorithms or optimization of controllers based on simulations. In the classical bandit problem there are a finite number of arms that the decision maker can select at discrete time steps. Selecting an arm results in a random reward, whose distribution is determined by the identity of the arm selected. The distributions associated with the arms are unknown to the decision maker whose goal is to maximize the expected sum of the rewards received. In many practical situations the arms belong to a large set. This set could be continuous [1; 6; 3; 2; 7], hybrid-continuous, or it could be the space of infinite sequences over a finite alphabet [4]. In this paper we consider stochastic bandit problems where the set of arms, X , is allowed to be an arbitrary topological space. We assume that the decision maker knows a dissimilarity function defined over this space that constraints the shape of the mean-payoff function. In particular, the dissimilarity function is assumed to put a lower bound on the mean-payoff function from below at each maxima. We also assume that the decision maker is able to cover the space of arms in a recursive manner, successively refining the regions in the covering such that the diameters of these sets shrink at a known geometric rate when measured with the dissimilarity. ∗ Csaba Szepesv´ ri is on leave from MTA SZTAKI. He also greatly acknowledges the support received from the a Alberta Ingenuity Fund, iCore and NSERC. 1 Our work generalizes and improves previous works on continuum-armed bandit problems: Kleinberg [6] and Auer et al. [2] focussed on one-dimensional problems. Recently, Kleinberg et al. [7] considered generic metric spaces assuming that the mean-payoff function is Lipschitz with respect to the (known) metric of the space. They proposed an interesting algorithm that achieves essentially the best possible regret in a minimax sense with respect to these environments. The goal of this paper is to further these works in a number of ways: (i) we allow the set of arms to be a generic topological space; (ii) we propose a practical algorithm motivated by the recent very successful tree-based optimization algorithms [8; 5; 4] and show that the algorithm is (iii) able to exploit higher order smoothness. In particular, as we shall argue in Section 7, (i) improves upon the results of Auer et al. [2], while (i), (ii) and (iii) improve upon the work of Kleinberg et al. [7]. Compared to Kleinberg et al. 
[7], our work represents an improvement in the fact that just like Auer et al. [2] we make use of the local properties of the mean-payoff function around the maxima only, and not a global property, such as Lipschitzness in √ the whole space. This allows us to obtain a regret which scales as O( n) 1 when e.g. the space is the unit hypercube and the mean-payoff function is locally H¨ lder with known exponent in the neighborhood of any o maxima (which are in finite number) and bounded away from the maxima outside of these neighborhoods. Thus, we get the desirable property that the rate of growth of the regret is independent of the dimensionality of the input space. We also prove a minimax lower bound that matches our upper bound up to logarithmic factors, showing that the performance of our algorithm is essentially unimprovable in a minimax sense. Besides these theoretical advances the algorithm is anytime and easy to implement. Since it is based on ideas that have proved to be efficient, we expect it to perform well in practice and to make a significant impact on how on-line global optimization is performed. 2 Problem setup, notation We consider a topological space X , whose elements will be referred to as arms. A decision maker “pulls” the arms in X one at a time at discrete time steps. Each pull results in a reward that depends on the arm chosen and which the decision maker learns of. The goal of the decision maker is to choose the arms so as to maximize the sum of the rewards that he receives. In this paper we are concerned with stochastic environments. Such an environment M associates to each arm x ∈ X a distribution Mx on the real line. The support of these distributions is assumed to be uniformly bounded with a known bound. For the sake of simplicity, we assume this bound is 1. We denote by f (x) the expectation of Mx , which is assumed to be measurable (all measurability concepts are with respect to the Borel-algebra over X ). The function f : X → R thus defined is called the mean-payoff function. When in round n the decision maker pulls arm Xn ∈ X , he receives a reward Yn drawn from MXn , independently of the past arm choices and rewards. A pulling strategy of a decision maker is determined by a sequence ϕ = (ϕn )n≥1 of measurable mappings, n−1 where each ϕn maps the history space Hn = X × [0, 1] to the space of probability measures over X . By convention, ϕ1 does not take any argument. A strategy is deterministic if for every n the range of ϕn contains only Dirac distributions. According to the process that was already informally described, a pulling strategy ϕ and an environment M jointly determine a random process (X1 , Y1 , X2 , Y2 , . . .) in the following way: In round one, the decision maker draws an arm X1 at random from ϕ1 and gets a payoff Y1 drawn from MX1 . In round n ≥ 2, first, Xn is drawn at random according to ϕn (X1 , Y1 , . . . , Xn−1 , Yn−1 ), but otherwise independently of the past. Then the decision maker gets a rewards Yn drawn from MXn , independently of all other random variables in the past given Xn . Let f ∗ = supx∈X f (x) be the maximal expected payoff. The cumulative regret of a pulling strategy in n n environment M is Rn = n f ∗ − t=1 Yt , and the cumulative pseudo-regret is Rn = n f ∗ − t=1 f (Xt ). 1 We write un = O(vu ) when un = O(vn ) up to a logarithmic factor. 2 In the sequel, we restrict our attention to the expected regret E [Rn ], which in fact equals E[Rn ], as can be seen by the application of the tower rule. 
3 3.1 The Hierarchical Optimistic Optimization (HOO) strategy Trees of coverings We first introduce the notion of a tree of coverings. Our algorithm will require such a tree as an input. Definition 1 (Tree of coverings). A tree of coverings is a family of measurable subsets (Ph,i )1≤i≤2h , h≥0 of X such that for all fixed integer h ≥ 0, the covering ∪1≤i≤2h Ph,i = X holds. Moreover, the elements of the covering are obtained recursively: each subset Ph,i is covered by the two subsets Ph+1,2i−1 and Ph+1,2i . A tree of coverings can be represented, as the name suggests, by a binary tree T . The whole domain X = P0,1 corresponds to the root of the tree and Ph,i corresponds to the i–th node of depth h, which will be referred to as node (h, i) in the sequel. The fact that each Ph,i is covered by the two subsets Ph+1,2i−1 and Ph+1,2i corresponds to the childhood relationship in the tree. Although the definition allows the childregions of a node to cover a larger part of the space, typically the size of the regions shrinks as depth h increases (cf. Assumption 1). Remark 1. Our algorithm will instantiate the nodes of the tree on an ”as needed” basis, one by one. In fact, at any round n it will only need n nodes connected to the root. 3.2 Statement of the HOO strategy The algorithm picks at each round a node in the infinite tree T as follows. In the first round, it chooses the root node (0, 1). Now, consider round n + 1 with n ≥ 1. Let us denote by Tn the set of nodes that have been picked in previous rounds and by Sn the nodes which are not in Tn but whose parent is. The algorithm picks at round n + 1 a node (Hn+1 , In+1 ) ∈ Sn according to the deterministic rule that will be described below. After selecting the node, the algorithm further chooses an arm Xn+1 ∈ PHn+1 ,In+1 . This selection can be stochastic or deterministic. We do not put any further restriction on it. The algorithm then gets a reward Yn+1 as described above and the procedure goes on: (Hn+1 , In+1 ) is added to Tn to form Tn+1 and the children of (Hn+1 , In+1 ) are added to Sn to give rise to Sn+1 . Let us now turn to how (Hn+1 , In+1 ) is selected. Along with the nodes the algorithm stores what we call B–values. The node (Hn+1 , In+1 ) ∈ Sn to expand at round n + 1 is picked by following a path from the root to a node in Sn , where at each node along the path the child with the larger B–value is selected (ties are broken arbitrarily). In order to define a node’s B–value, we need a few quantities. Let C(h, i) be the set that collects (h, i) and its descendants. We let n Nh,i (n) = I{(Ht ,It )∈C(h,i)} t=1 be the number of times the node (h, i) was visited. A given node (h, i) is always picked at most once, but since its descendants may be picked afterwards, subsequent paths in the tree can go through it. Consequently, 1 ≤ Nh,i (n) ≤ n for all nodes (h, i) ∈ Tn . Let µh,i (n) be the empirical average of the rewards received for the time-points when the path followed by the algorithm went through (h, i): n 1 µh,i (n) = Yt I{(Ht ,It )∈C(h,i)} . Nh,i (n) t=1 The corresponding upper confidence bound is by definition Uh,i (n) = µh,i (n) + 3 2 ln n + ν 1 ρh , Nh,i (n) where 0 < ρ < 1 and ν1 > 0 are parameters of the algorithm (to be chosen later by the decision maker, see Assumption 1). For nodes not in Tn , by convention, Uh,i (n) = +∞. Now, for a node (h, i) in Sn , we define its B–value to be Bh,i (n) = +∞. The B–values for nodes in Tn are given by Bh,i (n) = min Uh,i (n), max Bh+1,2i−1 (n), Bh+1,2i (n) . 
Note that the algorithm is deterministic (apart, maybe, from the arbitrary random choice of Xt in PHt ,It ). Its total space requirement is linear in n while total running time at round n is at most quadratic in n, though we conjecture that it is O(n log n) on average. 4 Assumptions made on the model and statement of the main result We suppose that X is equipped with a dissimilarity , that is a non-negative mapping : X 2 → R satisfying (x, x) = 0. The diameter (with respect to ) of a subset A of X is given by diam A = supx,y∈A (x, y). Given the dissimilarity , the “open” ball with radius ε > 0 and center c ∈ X is B(c, ε) = { x ∈ X : (c, x) < ε } (we do not require the topology induced by to be related to the topology of X .) In what follows when we refer to an (open) ball, we refer to the ball defined with respect to . The dissimilarity will be used to capture the smoothness of the mean-payoff function. The decision maker chooses and the tree of coverings. The following assumption relates this choice to the parameters ρ and ν1 of the algorithm: Assumption 1. There exist ρ < 1 and ν1 , ν2 > 0 such that for all integers h ≥ 0 and all i = 1, . . . , 2h , the diameter of Ph,i is bounded by ν1 ρh , and Ph,i contains an open ball Ph,i of radius ν2 ρh . For a given h, the Ph,i are disjoint for 1 ≤ i ≤ 2h . Remark 2. A typical choice for the coverings in a cubic domain is to let the domains be hyper-rectangles. They can be obtained, e.g., in a dyadic manner, by splitting at each step hyper-rectangles in the middle along their longest side, in an axis parallel manner; if all sides are equal, we split them along the√ axis. In first this example, if X = [0, 1]D and (x, y) = x − y α then we can take ρ = 2−α/D , ν1 = ( D/2)α and ν2 = 1/8α . The next assumption concerns the environment. Definition 2. We say that f is weakly Lipschitz with respect to if for all x, y ∈ X , f ∗ − f (y) ≤ f ∗ − f (x) + max f ∗ − f (x), (x, y) . (1) Note that weak Lipschitzness is satisfied whenever f is 1–Lipschitz, i.e., for all x, y ∈ X , one has |f (x) − f (y)| ≤ (x, y). On the other hand, weak Lipschitzness implies local (one-sided) 1–Lipschitzness at any maxima. Indeed, at an optimal arm x∗ (i.e., such that f (x∗ ) = f ∗ ), (1) rewrites to f (x∗ ) − f (y) ≤ (x∗ , y). However, weak Lipschitzness does not constraint the growth of the loss in the vicinity of other points. Further, weak Lipschitzness, unlike Lipschitzness, does not constraint the local decrease of the loss at any point. Thus, weak-Lipschitzness is a property that lies somewhere between a growth condition on the loss around optimal arms and (one-sided) Lipschitzness. Note that since weak Lipschitzness is defined with respect to a dissimilarity, it can actually capture higher-order smoothness at the optima. For example, f (x) = 1 − x2 is weak Lipschitz with the dissimilarity (x, y) = c(x − y)2 for some appropriate constant c. Assumption 2. The mean-payoff function f is weakly Lipschitz. ∗ ∗ Let fh,i = supx∈Ph,i f (x) and ∆h,i = f ∗ − fh,i be the suboptimality of node (h, i). We say that def a node (h, i) is optimal (respectively, suboptimal) if ∆h,i = 0 (respectively, ∆h,i > 0). Let Xε = { x ∈ X : f (x) ≥ f ∗ − ε } be the set of ε-optimal arms. The following result follows from the definitions; a proof can be found in the appendix. 4 Lemma 1. Let Assumption 1 and 2 hold. If the suboptimality ∆h,i of a region is bounded by cν1 ρh for some c > 0, then all arms in Ph,i are max{2c, c + 1}ν1 ρh -optimal. 
The last assumption is closely related to Assumption 2 of Auer et al. [2], who observed that the regret of a continuum-armed bandit algorithm should depend on how fast the volume of the sets of ε-optimal arms shrinks as ε → 0. Here, we capture this by defining a new notion, the near-optimality dimension of the mean-payoff function. The connection between these concepts, as well as the zooming dimension defined by Kleinberg et al. [7] will be further discussed in Section 7. Define the packing number P(X , , ε) to be the size of the largest packing of X with disjoint open balls of radius ε with respect to the dissimilarity .2 We now define the near-optimality dimension, which characterizes the size of the sets Xε in terms of ε, and then state our main result. Definition 3. For c > 0 and ε0 > 0, the (c, ε0 )–near-optimality dimension of f with respect to equals inf d ∈ [0, +∞) : ∃ C s.t. ∀ε ≤ ε0 , P Xcε , , ε ≤ C ε−d (2) (with the usual convention that inf ∅ = +∞). Theorem 1 (Main result). Let Assumptions 1 and 2 hold and assume that the (4ν1 /ν2 , ν2 )–near-optimality dimension of the considered environment is d < +∞. Then, for any d > d there exists a constant C(d ) such that for all n ≥ 1, ERn ≤ C(d ) n(d +1)/(d +2) ln n 1/(d +2) . Further, if the near-optimality dimension is achieved, i.e., the infimum is achieved in (2), then the result holds also for d = d. Remark 3. We can relax the weak-Lipschitz property by requiring it to hold only locally around the maxima. In fact, at the price of increased constants, the result continues to hold if there exists ε > 0 such that (1) holds for any x, y ∈ Xε . To show this we only need to carefully adapt the steps of the proof below. We omit the details from this extended abstract. 5 Analysis of the regret and proof of the main result We first state three lemmas, whose proofs can be found in the appendix. The proofs of Lemmas 3 and 4 rely on concentration-of-measure techniques, while that of Lemma 2 follows from a simple case study. Let us fix some path (0, 1), (1, i∗ ), . . . , (h, i∗ ), . . . , of optimal nodes, starting from the root. 1 h Lemma 2. Let (h, i) be a suboptimal node. Let k be the largest depth such that (k, i∗ ) is on the path from k the root to (h, i). Then we have n E Nh,i (n) ≤ u+ P Nh,i (t) > u and Uh,i (t) > f ∗ or Us,i∗ ≤ f ∗ for some s ∈ {k+1, . . . , t−1} s t=u+1 Lemma 3. Let Assumptions 1 and 2 hold. 1, P Uh,i (n) ≤ f ∗ ≤ n−3 . Then, for all optimal nodes and for all integers n ≥ Lemma 4. Let Assumptions 1 and 2 hold. Then, for all integers t ≤ n, for all suboptimal nodes (h, i) 8 ln such that ∆h,i > ν1 ρh , and for all integers u ≥ 1 such that u ≥ (∆h,i −νnρh )2 , one has P Uh,i (t) > 1 f ∗ and Nh,i (t) > u ≤ t n−4 . 2 Note that sometimes packing numbers are defined as the largest packing with disjoint open balls of radius ε/2, or, ε-nets. 5 . Taking u as the integer part of (8 ln n)/(∆h,i − ν1 ρh )2 , and combining the results of Lemma 2, 3, and 4 with a union bound leads to the following key result. Lemma 5. Under Assumptions 1 and 2, for all suboptimal nodes (h, i) such that ∆h,i > ν1 ρh , we have, for all n ≥ 1, 8 ln n 2 E[Nh,i (n)] ≤ + . (∆h,i − ν1 ρh )2 n We are now ready to prove Theorem 1. Proof. For the sake of simplicity we assume that the infimum in the definition of near-optimality is achieved. To obtain the result in the general case one only needs to replace d below by d > d in the proof below. First step. For all h = 1, 2, . . 
We are now ready to prove Theorem 1.

Proof. For the sake of simplicity we assume that the infimum in the definition of near-optimality is achieved. To obtain the result in the general case one only needs to replace d below by d′ > d in the proof below.

First step. For all h = 1, 2, . . ., denote by I_h the nodes at depth h that are 2ν_1 ρ^h-optimal, i.e., the nodes (h, i) such that f^*_{h,i} ≥ f^* − 2ν_1 ρ^h. Then, I is the union of these sets of nodes. Further, let J be the set of nodes that are not in I but whose parent is in I. We then denote by J_h the nodes in J that are located at depth h in the tree. Lemma 5 bounds the expected number of times each node (h, i) ∈ J_h is visited. Since ∆_{h,i} > 2ν_1 ρ^h, we get
E[N_{h,i}(n)] ≤ 8 ln n / (ν_1 ρ^h)² + 2/n.

Second step. We bound here the cardinality |I_h|, h > 0. If (h, i) ∈ I_h, then since ∆_{h,i} ≤ 2ν_1 ρ^h, by Lemma 1 we have P_{h,i} ⊂ X_{4ν_1 ρ^h}. Since, by Assumption 1, the sets P_{h,i}, for (h, i) ∈ I_h, contain disjoint balls of radius ν_2 ρ^h, we have that
|I_h| ≤ P( ∪_{(h,i)∈I_h} P_{h,i}, ℓ, ν_2 ρ^h ) ≤ P( X_{(4ν_1/ν_2) ν_2 ρ^h}, ℓ, ν_2 ρ^h ) ≤ C (ν_2 ρ^h)^{−d},
where we used the assumption that d is the (4ν_1/ν_2, ν_2)-near-optimality dimension of f (and C is the constant introduced in the definition of the near-optimality dimension).

Third step. Choose η > 0 and let H be the smallest integer such that ρ^H ≤ η. We partition the infinite tree T into three sets of nodes, T = T_1 ∪ T_2 ∪ T_3: the set T_1 contains the nodes of I_H and their descendants, T_2 = ∪_{0≤h<H} I_h, and T_3 contains the remaining nodes. The regret incurred on each of these sets is bounded using the first two steps; the announced bound then follows by optimizing over ρ^H (the worst value being ρ^H ∼ ((ln n)/n)^{1/(d+2)}).

6 Minimax optimality

The packing dimension of a set X is the smallest d such that there exists a constant k such that for all ε > 0, P(X, ℓ, ε) ≤ k ε^{−d}. For instance, compact subsets of R^d (with non-empty interior) have a packing dimension of d whenever ℓ is a norm. If X has a packing dimension of d, then all environments have a near-optimality dimension less than d. The proof of the main theorem indicates that the constant C(d) only depends on d, k (of the definition of the packing dimension), ν_1, ν_2, and ρ, but not on the environment as long as it is weakly Lipschitz. Hence, we can extract from it a distribution-free bound of the form O(n^{(d+1)/(d+2)}). In fact, this bound can be shown to be optimal, as is illustrated by the theorem below, whose assumptions are satisfied by, e.g., compact subsets of R^d when ℓ is some norm of R^d. The proof can be found in the appendix.

Theorem 2. If X is such that there exists c > 0 with P(X, ℓ, ε) ≥ c ε^{−d} ≥ 2 for all ε ≤ 1/4, then for all n ≥ 4^{d−1} c/ln(4/3), all strategies ϕ are bound to suffer a regret of at least
sup E[R_n(ϕ)] ≥ (1/4) ( (1/4) · c/(4 ln(4/3)) )^{2/(d+2)} n^{(d+1)/(d+2)},
where the supremum is taken over all environments with weakly Lipschitz payoff functions.
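To give a feel for the quantities appearing in Definition 3 and Section 6, the following Python sketch (our illustration, not part of the paper) estimates the packing number P(X_{cε}, ℓ, ε) by greedy packing over a fine grid for the one-dimensional payoff f(x) = 1 − |x − 1/2|^α on X = [0, 1] with ℓ(x, y) = |x − y|^α; the choice c = 1 and the function `packing_number_interval` are ours and purely illustrative. With this matched dissimilarity the packing numbers stay bounded as ε → 0, i.e., the estimated near-optimality dimension is close to 0, in line with the discussion in the next section.

```python
import numpy as np

def packing_number_interval(xs, eps, alpha):
    """Greedy packing of the points `xs` (subset of R) with disjoint open balls of
    radius eps under l(x, y) = |x - y|**alpha. A ball of l-radius eps is a Euclidean
    interval of half-width eps**(1/alpha), so two balls are disjoint iff their
    centers are at least 2 * eps**(1/alpha) apart."""
    xs = np.sort(np.asarray(xs, dtype=float))
    min_gap = 2.0 * eps ** (1.0 / alpha)
    centers = []
    for x in xs:
        if not centers or x - centers[-1] >= min_gap:
            centers.append(x)
    return len(centers)

def near_optimality_counts(alpha, c=1.0, grid=200_001):
    """Estimate P(X_{c*eps}, l, eps) for f(x) = 1 - |x - 0.5|**alpha on X = [0, 1]
    and l(x, y) = |x - y|**alpha, over a range of eps."""
    xs = np.linspace(0.0, 1.0, grid)
    f = 1.0 - np.abs(xs - 0.5) ** alpha
    fstar = f.max()
    for eps in (1e-1, 1e-2, 1e-3):
        near_opt = xs[f >= fstar - c * eps]       # grid approximation of X_{c*eps}
        print(f"alpha={alpha}, eps={eps:g}: packing number "
              f"~ {packing_number_interval(near_opt, eps, alpha)}")

if __name__ == "__main__":
    near_optimality_counts(alpha=2.0)   # stays O(1) as eps -> 0, i.e. dimension ~ 0
```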
7 Discussion

Several works [1; 6; 3; 2; 7] have considered continuum-armed bandits in Euclidean or metric spaces and provided upper and lower bounds on the regret for given classes of environments. Cope [3] derived a regret of O(√n) for compact and convex subsets of R^d and mean-payoff functions with unique minima and second-order smoothness. Kleinberg [6] considered mean-payoff functions f on the real line that are Hölder with degree 0 < α ≤ 1. The derived regret is Θ(n^{(α+1)/(α+2)}). Auer et al. [2] extended the analysis to classes of functions with only a local Hölder assumption around the maximum (with possibly higher smoothness degree α ∈ [0, ∞)), and derived the regret Θ(n^{(1+α−αβ)/(1+2α−αβ)}), where β is such that the Lebesgue measure of ε-optimal states is O(ε^β). Another setting is that of [7], who considered a metric space (X, ℓ) and assumed that f is Lipschitz w.r.t. ℓ. The obtained regret is O(n^{(d+1)/(d+2)}), where d is the zooming dimension (defined similarly to our near-optimality dimension, but using covering numbers instead of packing numbers and the sets X_ε \ X_{ε/2}). When (X, ℓ) is a metric space, covering and packing numbers are equivalent, and we may prove that the zooming dimension and the near-optimality dimension are equal.

Our main contribution compared to [7] is that our weak-Lipschitz assumption, which is substantially weaker than the global Lipschitz assumption of [7], enables our algorithm to work better in some common situations, such as when the mean-payoff function has a local smoothness of order larger than one. In order to relate all these results, let us consider a specific example: let X = [0, 1]^D and assume that the mean-reward function f is locally equivalent to a Hölder function with degree α ∈ [0, ∞) around any maximum x^* of f (the number of maxima is assumed to be finite):
f(x^*) − f(x) = Θ(‖x − x^*‖^α) as x → x^*.   (3)
This means that there exist c_1, c_2, ε_0 > 0 such that for all x with ‖x − x^*‖ ≤ ε_0,
c_1 ‖x − x^*‖^α ≤ f(x^*) − f(x) ≤ c_2 ‖x − x^*‖^α.
Under this assumption, the result of Auer et al. [2] shows that for D = 1, the regret is Θ(√n) (since here β = 1/α). Our result allows us to extend the √n regret rate to any dimension D. Indeed, if we choose our dissimilarity measure to be ℓ_α(x, y) = ‖x − y‖^α, we may prove that f satisfies a locally weak-Lipschitz condition (as defined in Remark 3) and that the near-optimality dimension is 0. Thus our regret is O(√n), i.e., the rate is independent of the dimension D.

In comparison, since Kleinberg et al. [7] have to satisfy a global Lipschitz assumption, they cannot use ℓ_α when α > 1. Indeed, a function that is globally Lipschitz with respect to ℓ_α is essentially constant. Moreover, ℓ_α does not define a metric for α > 1. If one resorts to the Euclidean metric to fulfill their requirement that f be Lipschitz w.r.t. the metric, then the zooming dimension becomes D(α − 1)/α, while the regret becomes O(n^{(D(α−1)+α)/(D(α−1)+2α)}), which is strictly worse than O(√n) and in fact becomes close to the slow rate O(n^{(D+1)/(D+2)}) when α is large. Nevertheless, in the case of α ≤ 1 they get the same regret rate. In contrast, our result shows that under very weak constraints on the mean-payoff function, and if the local behavior of the function around its maximum (or finite number of maxima) is known, then global optimization suffers a regret of order O(√n), independent of the space dimension. As an interesting sidenote, let us also remark that our results allow different smoothness orders along different dimensions, i.e., heterogeneous smoothness spaces.
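To make the comparison above concrete, here is a small Python sketch (ours, not from the paper) that evaluates the exponent (D(α−1)+α)/(D(α−1)+2α) obtained when f is treated as Lipschitz w.r.t. the Euclidean metric, against the exponent 1/2 obtained here with ℓ_α, for a few values of D and α > 1.

```python
def zooming_rate_exponent(D, alpha):
    """Exponent e in the regret O(n**e) when f is treated as Lipschitz w.r.t. the
    Euclidean metric, so that the zooming dimension is D * (alpha - 1) / alpha."""
    d = D * (alpha - 1.0) / alpha
    return (d + 1.0) / (d + 2.0)

if __name__ == "__main__":
    print("D   alpha  Euclidean-Lipschitz exponent   exponent with l_alpha")
    for D in (1, 2, 5, 10):
        for alpha in (1.5, 2.0, 4.0):
            e = zooming_rate_exponent(D, alpha)
            print(f"{D:<4}{alpha:<7}{e:<31.3f}0.5")
```

For large D and α, the first exponent approaches (D + 1)/(D + 2), illustrating the "slow rate" mentioned in the text, while the rate obtained with the matched dissimilarity ℓ_α stays at 1/2.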
References

[1] R. Agrawal. The continuum-armed bandit problem. SIAM Journal on Control and Optimization, 33:1926–1951, 1995.

[2] P. Auer, R. Ortner, and Cs. Szepesvári. Improved rates for the stochastic continuum-armed bandit problem. In 20th Conference on Learning Theory, pages 454–468, 2007.

[3] E. Cope. Regret and convergence bounds for immediate-reward reinforcement learning with continuous action spaces. Preprint, 2004.

[4] P.-A. Coquelin and R. Munos. Bandit algorithms for tree search. In Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, 2007.

[5] S. Gelly, Y. Wang, R. Munos, and O. Teytaud. Modification of UCT with patterns in Monte-Carlo Go. Technical Report RR-6062, INRIA, 2006.

[6] R. Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. In 18th Advances in Neural Information Processing Systems, 2004.

[7] R. Kleinberg, A. Slivkins, and E. Upfal. Multi-armed bandits in metric spaces. In Proceedings of the 40th ACM Symposium on Theory of Computing, 2008.

[8] L. Kocsis and Cs. Szepesvári. Bandit based Monte-Carlo planning. In Proceedings of the 15th European Conference on Machine Learning, pages 282–293, 2006.
6 0.56198359 95 nips-2008-Grouping Contours Via a Related Image
7 0.52456343 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns
8 0.52173781 229 nips-2008-Syntactic Topic Models
9 0.51283574 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
10 0.50472265 119 nips-2008-Learning a discriminative hidden part model for human action recognition
11 0.49788624 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
12 0.49743617 4 nips-2008-A Scalable Hierarchical Distributed Language Model
13 0.49621513 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
14 0.49555025 17 nips-2008-Algorithms for Infinitely Many-Armed Bandits
15 0.49514973 66 nips-2008-Dynamic visual attention: searching for coding length increments
16 0.49324873 164 nips-2008-On the Generalization Ability of Online Strongly Convex Programming Algorithms
17 0.49218094 75 nips-2008-Estimating vector fields using sparse basis field expansions
18 0.48818696 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
19 0.48509505 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
20 0.48212019 217 nips-2008-Sparsity of SVMs that use the epsilon-insensitive loss