nips nips2005 nips2005-23 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: James Diebel, Sebastian Thrun
Abstract: This paper describes a highly successful application of MRFs to the problem of generating high-resolution range images. A new generation of range sensors combines the capture of low-resolution range images with the acquisition of registered high-resolution camera images. The MRF in this paper exploits the fact that discontinuities in range and coloring tend to co-align. This enables it to generate high-resolution, low-noise range images by integrating regular camera images into the range data. We show that by using such an MRF, we can substantially improve over existing range imaging technology. 1
Reference: text
An Application of Markov Random Fields to Range Sensing
James Diebel and Sebastian Thrun
Stanford AI Lab, Stanford University, Stanford, CA 94305
1 Introduction

In recent years, there has been enormous interest in developing technologies for measuring range. The set of commercially available technologies includes passive stereo with two or more cameras, active stereo, triangulating light stripers, millimeter-wavelength radar, and scanning and flash lidar. In the low-cost arena, systems such as the Swiss Ranger and the CanestaVision sensors provide a means to acquire low-res range data along with passive camera images. Both of these devices capture high-res visual images along with lower-res depth information. This is the case for a number of devices at all price ranges, including the highly praised range camera by 3DV Systems.

This paper addresses a single shortcoming that (with the exception of stereo) is shared by most active range acquisition devices: namely, that range is captured at much lower resolution than images. This raises the question of whether we can turn a low-resolution depth imager into a high-resolution one by exploiting conventional camera images. A positive answer to this question would significantly advance the field of depth perception. Yet we lack techniques to fuse high-res conventional images with low-res depth images.

This paper applies graphical models to the problem of fusing low-res depth images with high-res camera images. Specifically, we propose a Markov Random Field (MRF) method for integrating both data sources. The intuition behind the MRF is that depth discontinuities in a scene often co-occur with color or brightness changes in the associated camera image. Since the camera image is commonly available at much higher resolution, this insight can be used to enhance the resolution and accuracy of the depth image. Our approach performs this data integration using a multi-resolution MRF, which ties together image and range data. The mode of the probability distribution defined by the MRF provides us with a high-res depth map.
The density of image pixels is larger than that of the range measurements. The reconstructed range nodes, labeled y, are unobservable, but their density matches that of the image pixels. Auxiliary nodes labeled w and u mediate the information from the image and the depth map, as described in the text.

This approach leads to a high-res depth map within seconds, increasing the resolution of our depth sensor by an order of magnitude while improving local accuracy. To back up this claim, we provide several example results obtained using a low-res laser range finder paired with a conventional point-and-shoot camera. While none of the modeling or inference techniques in this paper are new, we believe that this paper provides a significant application of graphical modeling techniques to a problem that can dramatically alter an entire growing industry.
The input to the MRF occurs at two layers, through the variables labeled x_i and the variables labeled z_i. The variables x_i correspond to the image pixels, and their values are the three-dimensional RGB values of each pixel. The range measurements are sampled much less densely than the image pixels, as indicated in the figure. The key variables in this MRF are the ones labeled y, which model the reconstructed range at the same resolution as the image pixels. Additional nodes labeled u and w leverage the image information into the estimated depth map y.
The depth measurement potential is of the form

    Ψ = k ∑_{i∈L} (y_i − z_i)²    (1)

Here L is the set of indices for which a depth measurement is available, and k is a constant weight placed on the depth measurements. This potential measures the quadratic distance between the estimated range in the high-res grid y and the measured range in the variables z, where available.
A depth smoothness prior is expressed by a potential of the form

    Φ = ∑_i ∑_{j∈N(i)} w_ij (y_i − y_j)²    (2)

Here N(i) is the set of nodes adjacent to i. The weighting factors w_ij are a key element, in that they provide the link to the image layer in the MRF.
Each w_ij is a deterministic function of the corresponding two adjacent image pixels, calculated as follows:

    w_ij = exp(−c u_ij)    (3)
    u_ij = ||x_i − x_j||²    (4)

Here c is a constant that quantifies how unwilling we are to have smoothing occur across edges in the image.
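As a concrete illustration, the per-edge weights of Eqs. (3)-(4) can be computed in a few lines. This is a hedged sketch, not the authors' code: the function name, the 4-neighbor grid layout, and the default value of c are our own assumptions; only the formulas w_ij = exp(−c u_ij) and u_ij = ||x_i − x_j||² come from the text.

```python
import numpy as np

def edge_weights(image, c=0.01):
    """Smoothness weights from Eqs. (3)-(4): w_ij = exp(-c * ||x_i - x_j||^2).

    `image` is an (H, W, 3) float array of RGB pixels; `c` controls how
    strongly color edges suppress smoothing (the default is illustrative).
    Returns weights for the horizontal and vertical 4-neighbor edges.
    """
    x = image.astype(np.float64)
    # u_ij: squared RGB distance between adjacent pixels, Eq. (4)
    u_h = np.sum((x[:, 1:] - x[:, :-1]) ** 2, axis=-1)   # shape (H, W-1)
    u_v = np.sum((x[1:, :] - x[:-1, :]) ** 2, axis=-1)   # shape (H-1, W)
    # w_ij = exp(-c * u_ij), Eq. (3): weights near 1 on flat regions,
    # near 0 across strong color edges
    return np.exp(-c * u_h), np.exp(-c * u_v)
```

On a flat region the weights approach 1 (strong smoothing); across a sharp color edge they decay toward 0, which is exactly what lets depth discontinuities survive at image edges.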
The conditional distribution over the target variables y is given by an expression of the form

    p(y | x, z) = (1/Z) exp(−(Ψ + Φ)/2)    (5)

where Z is a normalizer (partition function).
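To make the model concrete, the unnormalized negative log-posterior Ψ + Φ can be evaluated directly for any candidate depth map y. The sketch below is illustrative, not the authors' implementation; note that it counts each undirected smoothness edge once, whereas the double sum in Eq. (2) visits each edge twice, a constant factor that does not change the location of the mode.

```python
import numpy as np

def energy(y, z, mask, w_h, w_v, k=1.0):
    """Psi + Phi from Eqs. (1)-(2) for a candidate depth map y of shape (H, W).

    `z` holds measured depths where `mask` is True (the index set L);
    `w_h`, `w_v` are the image-derived weights of Eq. (3); `k` weights the
    data term. The posterior of Eq. (5) is p(y | x, z) ∝ exp(-energy / 2).
    """
    # Data term, Eq. (1): quadratic penalty at measured pixels only
    psi = k * np.sum((y[mask] - z[mask]) ** 2)
    # Smoothness term, Eq. (2): weighted squared differences over the
    # 4-neighbor grid (each undirected edge counted once here)
    phi = np.sum(w_h * (y[:, 1:] - y[:, :-1]) ** 2) \
        + np.sum(w_v * (y[1:, :] - y[:-1, :]) ** 2)
    return psi + phi
```

Lower energy means higher posterior probability, so the MAP reconstruction of Section 3 is simply the minimizer of this quantity.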
3 Optimization

Unfortunately, computing the full posterior is impossible for such an MRF, not least because the MRF may possess many millions of nodes; even loopy belief propagation [19] requires enormous time for convergence. Instead, for the depth reconstruction problem we shall be content with computing the mode of the posterior. Finding the mode of the log-posterior is, of course, a least-squares optimization problem, which we solve with the well-known conjugate gradient (CG) algorithm [12]. A typical single-image optimization with 2·10^5 nodes takes about a second to optimize on a modern computer. The details of the CG algorithm are omitted for brevity, but can be found in contemporary texts. The resulting algorithm for probable depth image reconstruction is now remarkably simple: set y^[0] by bilinear interpolation of z, and then iterate the CG update rule. The result is a probable reconstruction of the depth map at the same resolution as the camera image.
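Because the energy of Eqs. (1)-(2) is quadratic in y, its mode solves a sparse symmetric positive-definite linear system, which is exactly the setting CG is built for. The sketch below is our own reading of the method, not the authors' code: the function name is hypothetical, the initialization uses the upsampled measurement grid directly as a stand-in for the bilinear interpolation described above, and SciPy's generic cg routine replaces whatever CG implementation the paper used.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def reconstruct_depth(z_low, mask, w_h, w_v, k=1.0):
    """MAP depth reconstruction: minimize Eq. (1) + Eq. (2) via CG.

    Setting the gradient of the quadratic energy to zero gives the sparse
    system (k*M + L_w) y = k*M z, where M is a 0/1 diagonal marking
    measured pixels and L_w is the weighted graph Laplacian of the
    4-neighbor grid. z_low: (H, W) depths at image resolution (zeros where
    unmeasured); mask: (H, W) bool, True where a measurement exists.
    """
    H, W = z_low.shape
    n = H * W
    idx = np.arange(n).reshape(H, W)

    # Weighted adjacency from horizontal and vertical edge weights
    rows = np.concatenate([idx[:, :-1].ravel(), idx[1:, :].ravel()])
    cols = np.concatenate([idx[:, 1:].ravel(), idx[:-1, :].ravel()])
    wts = np.concatenate([w_h.ravel(), w_v.ravel()])
    Wmat = sp.coo_matrix((wts, (rows, cols)), shape=(n, n))
    Wmat = Wmat + Wmat.T                       # symmetric adjacency
    L = sp.diags(np.asarray(Wmat.sum(axis=1)).ravel()) - Wmat

    M = sp.diags(mask.ravel().astype(float))   # data-term selector
    A = k * M + L
    b = k * (mask.ravel() * z_low.ravel())

    # Stand-in initialization; the paper bilinearly interpolates z
    y0 = z_low.ravel()
    y, _info = cg(A, b, x0=y0, maxiter=500)
    return y.reshape(H, W)
```

Where the image is flat, the large weights pull neighboring depths together; where a color edge drives w_ij toward zero, the reconstructed depth is free to jump, which is the behavior the results section relies on.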
4 Results

Our experiments were performed with a SICK sweeping laser range finder and a Canon consumer digital camera with 5 megapixels per image. This configuration allows us to survey an entire room from a consistent vantage point, with known camera and laser positions at all times. The output of this system is a set of pre-aligned laser range measurements and camera images.
The top row contains several views of the raw measurements, and the bottom row is the output of the MRF. The latter is clearly much sharper and less noisy; many features that are smaller than the resolution of the laser scanner are pulled out by the camera image. Figure 5 shows the same scene from much further back. Here we examine the painted metal door frame to an office. The detailed structure is completely invisible in the raw laser scan but is easily drawn out when the image data is incorporated. It is notable that traditional mesh fairing algorithms would not be able to recover this fine structure, as there is simply insufficient evidence of it in the range data alone.
Panels (a-c) show the raw data: the low-res depth map, a 3D model constructed from this depth map, and the same model with image texture superimposed. The depth map is now high-resolution, as is the 3D model. The 3D rendering is a substantial improvement over the raw sensor data; in fact, many small details are now visible. Our approach clearly recovers those corners, thanks to the use of the camera image. The coarse texture of the wooden surface is correctly inferred, in contrast to the smooth white wall.
This brings up the obvious problem that sharp color gradients frequently occur on smooth surfaces; take, for example, posters. It shows a part of a door frame, for which a coarse depth map and a fine-grained image are available. The renderings labeled (b) show the result of our MRF when color is entirely ignored, for different fixed values of the weights w_ij. The images in (c) are the results of our approach, which clearly retains the sharp corner of the door frame. Clearly, the reconstruction of such depth maps is an ill-posed problem, yet our approach generates a high-res model that is still significantly better than the original data. Notice, however, that the background wall is recovered accurately, and the corner of the room is visually enhanced.
5 Related Work

One of the primary acquisition techniques for depth is stereo. A good survey and comparison of stereo algorithms is due to [14]. Our algorithm does not apply to stereo vision, since by definition the resolution of the image and the inferred depth map are equivalent.
(a) 3D model based on the raw range data, with and without texture. (b) Refined and super-resolved model, generated by our MRF. Figure 4: This example illustrates that the amount of smoothing in the range data depends on the image texture. On the left is a wooden box with an unsmooth surface that causes significant color variations. In the background is a white wall with almost no color variation. Here our approach smooths the mesh significantly; in fact, it enhances the visibility of the room corner.
Passive stereo, in which the sensor does not carry its own light source, is unable to estimate ranges in the absence of texture. We remark that Markov Random Fields have become a defining methodology in stereo reconstruction [15], along with layered EM-style methods [2, 16]; see the comparison in [14]. Similar work due to [20] relies on a different set of image cues to improve stereo shape estimates. In particular, learned regression coefficients are used to predict the band-passed shape of a scene from a band-passed image of that scene.
For range images, surfaces, and point clouds, there exists a large literature on smoothing while preserving features such as edges. This includes work on diffusion processes [6], frequency-domain filtering [17], and anisotropic diffusion [5]; see also [3] and [1]. Most recently, [10] proposed an efficient non-iterative technique for feature-preserving mesh smoothing, and [9] adapted bilateral filtering for application to mesh denoising. None of these techniques, however, integrates high-resolution images to guide the smoothing process. Super-resolution techniques have long been popular in the computer vision field [8] and in aerial photogrammetry [11]. Here, Bayesian techniques are often brought to bear for integrating multiple images into a single image of higher resolution. Finally, multiple range scans are often integrated into a single model [13, 18], yet none of these techniques involve image data.
6 Conclusion

We have presented a Markov Random Field that integrates high-res image data into low-res range data, to recover range data at the same resolution as the image data. This approach is specifically aimed at a new wave of commercially available sensors, which provide range at lower resolution than image data. We have shown that our approach can truly fill the resolution gap between range and images, and use image data to effectively boost the resolution of a range finder. While none of the techniques used here are new (even though CG is usually not applied for inference in MRFs), we believe this is the first application of MRFs to multimodal data integration. A large number of scientific fields would benefit from better range sensing; the present approach provides a solution that endows low-cost range finders with unprecedented resolution and accuracy.
References

In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), pages 434–438, Santa Barbara, CA, 1998.
In Proceedings of the IEEE Conference on Visualization, pages 397–405, 2000.
Spacetime stereo: A unifying framework for depth from triangulation.
Implicit fairing of irregular meshes using diffusion and curvature flow.
A Bayesian method for probable surface reconstruction and decimation.
High resolution terrain mapping using low altitude aerial stereo imagery.
Numerical recipes in C: the art of scientific computing.
A taxonomy and evaluation of dense two-frame stereo correspondence algorithms.
In Proceedings of the British Machine Vision Conference, Vol. 2, pages 314–328, 1999.
Correctness of belief propagation in Gaussian graphical models of arbitrary topology.
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 23 nips-2005-An Application of Markov Random Fields to Range Sensing
2 0.35666099 110 nips-2005-Learning Depth from Single Monocular Images
Author: Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng
Abstract: We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a discriminatively-trained Markov Random Field (MRF) that incorporates multiscale local- and global-image features, and models both depths at individual points as well as the relation between depths at different points. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps. 1
3 0.24764861 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape
Author: Tai-sing Lee, Brian R. Potetz
Abstract: This paper explores the statistical relationship between natural images and their underlying range (depth) images. We look at how this relationship changes over scale, and how this information can be used to enhance low resolution range data using a full resolution intensity image. Based on our findings, we propose an extension to an existing technique known as shape recipes [3], and the success of the two methods are compared using images and laser scans of real scenes. Our extension is shown to provide a two-fold improvement over the current method. Furthermore, we demonstrate that ideal linear shape-from-shading filters, when learned from natural scenes, may derive even more strength from shadow cues than from the traditional linear-Lambertian shading cues. 1
4 0.14296819 143 nips-2005-Off-Road Obstacle Avoidance through End-to-End Learning
Author: Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, Yann L. Cun
Abstract: We describe a vision-based obstacle avoidance system for off-road mobile robots. The system is trained from end to end to map raw input images to steering angles. It is trained in supervised mode to predict the steering angles provided by a human driver during training runs collected in a wide variety of terrains, weather conditions, lighting conditions, and obstacle types. The robot is a 50cm off-road truck, with two forwardpointing wireless color cameras. A remote computer processes the video and controls the robot via radio. The learning system is a large 6-layer convolutional network whose input is a single left/right pair of unprocessed low-resolution images. The robot exhibits an excellent ability to detect obstacles and navigate around them in real time at speeds of 2 m/s.
5 0.1404154 108 nips-2005-Layered Dynamic Textures
Author: Antoni B. Chan, Nuno Vasconcelos
Abstract: A dynamic texture is a video model that treats a video as a sample from a spatio-temporal stochastic process, specifically a linear dynamical system. One problem associated with the dynamic texture is that it cannot model video where there are multiple regions of distinct motion. In this work, we introduce the layered dynamic texture model, which addresses this problem. We also introduce a variant of the model, and present the EM algorithm for learning each of the models. Finally, we demonstrate the efficacy of the proposed model for the tasks of segmentation and synthesis of video.
6 0.11185823 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours
7 0.093809091 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs
8 0.073647387 97 nips-2005-Inferring Motor Programs from Images of Handwritten Digits
9 0.072590694 74 nips-2005-Faster Rates in Regression via Active Learning
10 0.070269898 169 nips-2005-Saliency Based on Information Maximization
11 0.068583719 163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections
12 0.0680314 192 nips-2005-The Information-Form Data Association Filter
13 0.062246379 135 nips-2005-Neuronal Fiber Delineation in Area of Edema from Diffusion Weighted MRI
14 0.061855089 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
15 0.061829045 101 nips-2005-Is Early Vision Optimized for Extracting Higher-order Dependencies?
16 0.060821462 5 nips-2005-A Computational Model of Eye Movements during Object Class Detection
17 0.059708778 131 nips-2005-Multiple Instance Boosting for Object Detection
18 0.056898568 151 nips-2005-Pattern Recognition from One Example by Chopping
19 0.056883227 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels
20 0.055844821 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes
same-paper 1 0.8055529 23 nips-2005-An Application of Markov Random Fields to Range Sensing
Author: James Diebel, Sebastian Thrun
Abstract: This paper describes a highly successful application of MRFs to the problem of generating high-resolution range images. A new generation of range sensors combines the capture of low-resolution range images with the acquisition of registered high-resolution camera images. The MRF in this paper exploits the fact that discontinuities in range and coloring tend to co-align. This enables it to generate high-resolution, low-noise range images by integrating regular camera images into the range data. We show that by using such an MRF, we can substantially improve over existing range imaging technology. 1
2 0.61146855 144 nips-2005-Off-policy Learning with Options and Recognizers
Author: Doina Precup, Cosmin Paduraru, Anna Koop, Richard S. Sutton, Satinder P. Singh
Abstract: We introduce a new algorithm for off-policy temporal-difference learning with function approximation that has lower variance and requires less knowledge of the behavior policy than prior methods. We develop the notion of a recognizer, a filter on actions that distorts the behavior policy to produce a related target policy with low-variance importance-sampling corrections. We also consider target policies that are deviations from the state distribution of the behavior policy, such as potential temporally abstract options, which further reduces variance. This paper introduces recognizers and their potential advantages, then develops a full algorithm for linear function approximation and proves that its updates are in the same direction as on-policy TD updates, which implies asymptotic convergence. Even though our algorithm is based on importance sampling, we prove that it requires absolutely no knowledge of the behavior policy for the case of state-aggregation function approximators. Off-policy learning is learning about one way of behaving while actually behaving in another way. For example, Q-learning is an off- policy learning method because it learns about the optimal policy while taking actions in a more exploratory fashion, e.g., according to an ε-greedy policy. Off-policy learning is of interest because only one way of selecting actions can be used at any time, but we would like to learn about many different ways of behaving from the single resultant stream of experience. For example, the options framework for temporal abstraction involves considering a variety of different ways of selecting actions. For each such option one would like to learn a model of its possible outcomes suitable for planning and other uses. Such option models have been proposed as fundamental building blocks of grounded world knowledge (Sutton, Precup & Singh, 1999; Sutton, Rafols & Koop, 2005). 
Using off-policy learning, one would be able to learn predictive models for many options at the same time from a single stream of experience. Unfortunately, off-policy learning using temporal-difference methods has proven problematic when used in conjunction with function approximation. Function approximation is essential in order to handle the large state spaces that are inherent in many problem domains. Q-learning, for example, has been proven to converge to an optimal policy in the tabular case, but is unsound and may diverge in the case of linear function approximation (Baird, 1995). Precup, Sutton, and Dasgupta (2001) introduced and proved convergence for the first off-policy learning algorithm with linear function approximation. They addressed the problem of learning the expected value of a target policy based on experience generated using a different behavior policy. They used importance sampling techniques to reduce the off-policy case to the on-policy case, where existing convergence theorems apply (Tsitsiklis & Van Roy, 1997; Tadic, 2001). There are two important difficulties with that approach. First, the behavior policy needs to be stationary and known, because it is needed to compute the importance sampling corrections. Second, the importance sampling weights are often ill-conditioned. In the worst case, the variance could be infinite and convergence would not occur. The conditions required to prevent this were somewhat awkward and, even when they applied and asymptotic convergence was assured, the variance could still be high and convergence could be slow. In this paper we address both of these problems in the context of off-policy learning for options. We introduce the notion of a recognizer. Rather than specifying an explicit target policy (for instance, the policy of an option), about which we want to make predictions, a recognizer specifies a condition on the actions that are selected. 
For example, a recognizer for the temporally extended action of picking up a cup would not specify which hand is to be used, or what the motion should be at all different positions of the cup. The recognizer would recognize a whole variety of directions of motion and poses as part of picking up the cup. The advantage of this strategy is not that one might prefer a multitude of different behaviors, but that the behavior may be based on a variety of different strategies, all of which are relevant, and we would like to learn from any of them. In general, a recognizer is a function that recognizes or accepts a space of different ways of behaving and thus can learn from a wider range of data. Recognizers have two advantages over direct specification of a target policy: 1) they are a natural and easy way to specify a target policy for which importance sampling will be well conditioned, and 2) they do not require the behavior policy to be known. The latter is important because in many cases we may have little knowledge of the behavior policy, or a stationary behavior policy may not even exist. We show that for the case of state aggregation, even if the behavior policy is unknown, convergence to a good model is achieved.

1 Non-sequential example

The benefits of using recognizers in off-policy learning can be most easily seen in a non-sequential context with a single continuous action. Suppose you are given a sequence of sample actions a_i ∈ [0, 1], selected i.i.d. according to a probability density b : [0, 1] → ℜ+ (the behavior density). For example, suppose the behavior density is of the oscillatory form shown as a red line in Figure 1. For each action a_i, we observe a corresponding outcome z_i ∈ ℜ, a random variable whose distribution depends only on a_i. Thus the behavior density induces an outcome density. The on-policy problem is to estimate the mean m_b of the outcome density. This problem can be solved simply by averaging the sample outcomes: $\hat m_b = \frac{1}{n}\sum_{i=1}^{n} z_i$. 
The off-policy problem is to use this same data to learn what the mean would be if actions were selected in some way other than b, for example, if the actions were restricted to a designated range, such as between 0.7 and 0.9. There are two natural ways to pose this off-policy problem. The most straightforward way is to be equally interested in all actions within the designated region. One professes to be interested in actions selected according to a target density π : [0, 1] → ℜ+, which in the example would be 5.0 between 0.7 and 0.9, and zero elsewhere, as in the dashed line in Figure 1 (left).

[Figure 1: The left panel shows the behavior policy and the target policies for the formulations of the problem with and without recognizers. The right panel shows empirical estimates of the variances for the two formulations as a function of the number of sample actions (averages of 200 sample variances). The lowest line is for the formulation using empirically-estimated recognition probabilities.]

The importance-sampling estimate of the mean outcome is

$$\hat m_\pi = \frac{1}{n}\sum_{i=1}^{n}\frac{\pi(a_i)}{b(a_i)}\, z_i. \qquad (1)$$

This approach is problematic if there are parts of the region of interest where the behavior density is zero or very nearly so, such as near 0.72 and 0.85 in the example. Here the importance sampling ratios are exceedingly large and the estimate is poorly conditioned (large variance). The upper curve in Figure 1 (right) shows the empirical variance of this estimate as a function of the number of samples. The spikes and uncertain decline of the empirical variance indicate that the distribution is very skewed and that the estimates are very poorly conditioned. The second way to pose the problem uses recognizers. 
One professes to be interested in actions to the extent that they are both selected by b and within the designated region. This leads to the target policy shown in blue in the left panel of Figure 1 (it is taller because it still must sum to 1). For this problem, the variance of (1) is much smaller, as shown in the lower two lines of Figure 1 (right). To make this way of posing the problem clear, we introduce the notion of a recognizer function c : 𝒜 → ℜ+. The action space in the example is 𝒜 = [0, 1] and the recognizer is c(a) = 1 for a between 0.7 and 0.9 and is zero elsewhere. The target policy is defined in general by

$$\pi(a) = \frac{c(a)b(a)}{\mu} = \frac{c(a)b(a)}{\sum_x c(x)b(x)}, \qquad (2)$$

where µ = ∑_x c(x)b(x) is a constant, equal to the probability of recognizing an action from the behavior policy. Given π, $\hat m_\pi$ from (1) can be rewritten in terms of the recognizer as

$$\hat m_\pi = \frac{1}{n}\sum_{i=1}^{n} z_i \frac{\pi(a_i)}{b(a_i)} = \frac{1}{n}\sum_{i=1}^{n} z_i \frac{c(a_i)b(a_i)}{\mu\, b(a_i)} = \frac{1}{n}\sum_{i=1}^{n} z_i \frac{c(a_i)}{\mu}. \qquad (3)$$

Note that the target density does not appear at all in the last expression and that the behavior distribution appears only in µ, which is independent of the sample action. If this constant is known, then this estimator can be computed with no knowledge of π or b. The constant µ can easily be estimated as the fraction of recognized actions in the sample. The lowest line in Figure 1 (right) shows the variance of the estimator using this fraction in place of the recognition probability. Its variance is low, no worse than that of the exact algorithm, and apparently slightly lower. Because this algorithm does not use the behavior density, it can be applied when the behavior density is unknown or does not even exist. For example, suppose actions were selected in some deterministic, systematic way that in the long run produced an empirical distribution like b. This would be problematic for the other algorithms but would require no modification of the recognition-fraction algorithm. 
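The two formulations of the off-policy estimate, (1) and (3), are easy to compare in simulation. The sketch below uses a Beta(2, 2) behavior density rather than the paper's oscillatory one, and an outcome z that is the action plus noise; these choices, and all numeric values, are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Behavior density b: Beta(2, 2), pdf b(a) = 6 a (1 - a) on [0, 1]
# (a stand-in for the paper's oscillatory density).
a = rng.beta(2, 2, n)
z = a + rng.normal(0.0, 0.1, n)        # outcome depends only on the action
b_pdf = 6.0 * a * (1.0 - a)

# Recognizer: accept actions in [0.7, 0.9].
c = ((a >= 0.7) & (a <= 0.9)).astype(float)

# Formulation 1 (eq. 1): explicit uniform target density pi = 5.0 on [0.7, 0.9].
m_pi_uniform = np.mean(np.where(c > 0, 5.0 / b_pdf, 0.0) * z)

# Formulation 2 (eq. 3): recognizer target pi proportional to c*b, with mu
# estimated as the recognized fraction; no knowledge of b is needed.
mu_hat = c.mean()
m_pi_recog = np.mean(z * c / mu_hat)

# The recognizer estimate is exactly the sample mean of recognized outcomes.
assert np.isclose(m_pi_recog, z[c > 0].mean())
```

The two estimates answer slightly different questions (uniform interest over the region versus interest proportional to b within it), which is why their values differ; the recognizer estimator never divides by the behavior density, so it remains well conditioned even where b is small.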
2 Recognizers improve conditioning of off-policy learning

The main use of recognizers is in formulating a target density π about which we can successfully learn predictions, based on the current behavior being followed. Here we formalize this intuition.

Theorem 1 Let A = {a_1, ..., a_k} ⊆ 𝒜 be a subset of all the possible actions. Consider a fixed behavior policy b and let Π_A be the class of policies that only choose actions from A, i.e., if π(a) > 0 then a ∈ A. Then the policy induced by b and the binary recognizer c_A is the policy with minimum-variance one-step importance sampling corrections, among those in Π_A:

$$\pi \text{ as given by (2)} \;=\; \arg\min_{\pi \in \Pi_A} E_b\!\left[\left(\frac{\pi(a_i)}{b(a_i)}\right)^2\right]. \qquad (4)$$

Proof: Denote π(a_i) = π_i, b(a_i) = b_i. Then the variance of the one-step importance sampling corrections is:

$$E_b\!\left[\left(\frac{\pi_i}{b_i}\right)^2\right] - \left(E_b\!\left[\frac{\pi_i}{b_i}\right]\right)^2 = \sum_i b_i \left(\frac{\pi_i}{b_i}\right)^2 - 1 = \sum_i \frac{\pi_i^2}{b_i} - 1,$$

where the summation (here and everywhere below) is such that the action a_i ∈ A. We want to find π_i that minimizes this expression, subject to the constraint that ∑_i π_i = 1. This is a constrained optimization problem. To solve it, we write down the corresponding Lagrangian:

$$L(\pi_i, \beta) = \sum_i \frac{\pi_i^2}{b_i} - 1 + \beta\left(\sum_i \pi_i - 1\right).$$

We take the partial derivatives with respect to π_i and β and set them to 0:

$$\frac{\partial L}{\partial \pi_i} = \frac{2}{b_i}\pi_i + \beta = 0 \;\Rightarrow\; \pi_i = -\frac{\beta b_i}{2}, \qquad (5)$$

$$\frac{\partial L}{\partial \beta} = \sum_i \pi_i - 1 = 0. \qquad (6)$$

By taking (5) and plugging into (6), we get the following expression for β:

$$-\frac{\beta}{2}\sum_i b_i = 1 \;\Rightarrow\; \beta = -\frac{2}{\sum_i b_i}.$$

By substituting β into (5) we obtain:

$$\pi_i = \frac{b_i}{\sum_i b_i}.$$

This is exactly the policy induced by the recognizer defined by c(a_i) = 1 iff a_i ∈ A.

We also note that it is advantageous, from the point of view of minimizing the variance of the updates, to have recognizers that accept a broad range of actions:

Theorem 2 Consider two binary recognizers c_1 and c_2, such that µ_1 > µ_2. Then the importance sampling corrections for c_1 have lower variance than the importance sampling corrections for c_2. 
Proof: From the previous theorem, we have the variance of a recognizer c_A:

$$\mathrm{Var} = \sum_i \frac{\pi_i^2}{b_i} - 1 = \sum_i \frac{1}{b_i}\left(\frac{b_i}{\sum_{j\in A} b_j}\right)^2 - 1 = \frac{1}{\sum_{j\in A} b_j} - 1 = \frac{1}{\mu} - 1.$$

3 Formal framework for sequential problems

We turn now to the full case of learning about sequential decision processes with function approximation. We use the standard framework in which an agent interacts with a stochastic environment. At each time step t, the agent receives a state s_t and chooses an action a_t. We assume for the moment that actions are selected according to a fixed behavior policy, b : S × 𝒜 → [0, 1], where b(s, a) is the probability of selecting action a in state s. The behavior policy is used to generate a sequence of experience (observations, actions and rewards). The goal is to learn, from this data, predictions about different ways of behaving. In this paper we focus on learning predictions about expected returns, but other predictions can be tackled as well (for instance, predictions of transition models for options (Sutton, Precup & Singh, 1999), or predictions specified by a TD-network (Sutton & Tanner, 2005; Sutton, Rafols & Koop, 2006)). We assume that the state space is large or continuous, and function approximation must be used to compute any values of interest. In particular, we assume a space of feature vectors Φ and a mapping φ : S → Φ. We denote by φ_s the feature vector associated with s. An option is defined as a triple o = ⟨I, π, β⟩, where I ⊆ S is the set of states in which the option can be initiated, π is the internal policy of the option, and β : S → [0, 1] is a stochastic termination condition. In the option work (Sutton, Precup & Singh, 1999), each of these elements has to be explicitly specified and fixed in order for an option to be well defined. Here, we will instead define options implicitly, using the notion of a recognizer. A recognizer is defined as a function c : S × 𝒜 → [0, 1], where c(s, a) indicates to what extent the recognizer allows action a in state s. 
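Theorems 1 and 2 can be checked numerically. The sketch below draws a random behavior policy over a small discrete action set (the set sizes and the chosen subset A are our own toy assumptions), verifies the closed form Var = 1/µ − 1, and confirms that random policies supported on A never beat the recognizer-induced policy.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed behavior policy b over 6 discrete actions (toy numbers).
b = rng.dirichlet(np.ones(6))
A = np.array([1, 3, 4])          # actions accepted by the binary recognizer
mu = b[A].sum()                  # recognition probability

def is_variance(pi_A):
    # Variance of the one-step corrections: sum_i pi_i^2 / b_i - 1,
    # with the sum over actions in A (as in the proof of Theorem 1).
    return np.sum(pi_A ** 2 / b[A]) - 1.0

# Recognizer-induced policy: pi_i = b_i / sum_{j in A} b_j.
pi_star = b[A] / mu
v_star = is_variance(pi_star)
assert np.isclose(v_star, 1.0 / mu - 1.0)   # closed form from Theorem 2's proof

# Theorem 1: no other policy supported on A achieves lower variance.
for _ in range(1000):
    pi = rng.dirichlet(np.ones(len(A)))
    assert is_variance(pi) >= v_star - 1e-12
```

Rerunning with a larger subset A (hence larger µ) drives 1/µ − 1 down, which is exactly Theorem 2's point about broad recognizers.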
An important special case, which we treat in this paper, is that of binary recognizers. In this case, c is an indicator function, specifying a subset of actions that are allowed, or recognized, given a particular state. Note that recognizers do not specify policies; instead, they merely give restrictions on the policies that are allowed or recognized. A recognizer c together with a behavior policy b generates a target policy π, where:

$$\pi(s, a) = \frac{b(s, a)c(s, a)}{\mu(s)} = \frac{b(s, a)c(s, a)}{\sum_x b(s, x)c(s, x)}. \qquad (7)$$

The denominator of this fraction, µ(s) = ∑_x b(s, x)c(s, x), is the recognition probability at s, i.e., the probability that an action will be accepted at s when behavior is generated according to b. The policy π is only defined at states for which µ(s) > 0. The numerator gives the probability that action a is produced by the behavior and recognized in s. Note that if the recognizer accepts all state-action pairs, i.e., c(s, a) = 1, ∀s, a, then π is the same as b. Since a recognizer and a behavior policy can specify together a target policy, we can use recognizers as a way to specify policies for options, using (7). An option can only be initiated at a state for which at least one action is recognized, so µ(s) > 0, ∀s ∈ I. Similarly, the termination condition of such an option, β, is defined as β(s) = 1 if µ(s) = 0. In other words, the option must terminate if no actions are recognized at a given state. At all other states, β can be defined between 0 and 1 as desired. We will focus on computing the reward model of an option o, which represents the expected total return. The expected values of different features at the end of the option can be estimated similarly. The quantity that we want to compute is

$$E_o\{R(s)\} = E\{r_1 + r_2 + \ldots + r_T \mid s_0 = s, \pi, \beta\},$$

where s ∈ I, experience is generated according to the policy of the option, π, and T denotes the random variable representing the time step at which the option terminates according to β. 
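The construction in (7) is straightforward to sanity-check at a single state; the sketch below uses 5 actions with our own illustrative numbers.

```python
import numpy as np

# Toy check of eq. (7): target policy induced by a behavior policy b and a
# binary recognizer c at one state with 5 actions (numbers are our own).
rng = np.random.default_rng(3)
b_s = rng.dirichlet(np.ones(5))             # b(s, .)
c_s = np.array([1.0, 1.0, 0.0, 1.0, 0.0])   # c(s, .): recognize actions 0, 1, 3

mu_s = np.sum(b_s * c_s)                    # recognition probability mu(s)
pi_s = b_s * c_s / mu_s                     # target policy pi(s, .)

assert np.isclose(pi_s.sum(), 1.0)          # proper policy when mu(s) > 0
assert np.all(pi_s[c_s == 0] == 0)          # unrecognized actions get zero mass

# If the recognizer accepts everything, pi reduces to b.
c_all = np.ones(5)
pi_all = b_s * c_all / np.sum(b_s * c_all)
assert np.allclose(pi_all, b_s)
```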
We assume that linear function approximation is used to represent these values, i.e.,

$$E_o\{R(s)\} \approx \theta^T \phi_s,$$

where θ is a vector of parameters.

4 Off-policy learning algorithm

In this section we present an adaptation of the off-policy learning algorithm of Precup, Sutton & Dasgupta (2001) to the case of learning about options. Suppose that an option's policy π was used to generate behavior. In this case, learning the reward model of the option is a special case of temporal-difference learning of value functions. The forward view of this algorithm is as follows. Let $\bar R_t^{(n)}$ denote the truncated n-step return starting at time step t and let y_t denote the 0-step truncated return, $\bar R_t^{(0)}$. By the definition of the n-step truncated return, we have:

$$\bar R_t^{(n)} = r_{t+1} + (1 - \beta_{t+1})\,\bar R_{t+1}^{(n-1)}.$$

This is similar to the case of value functions, but it accounts for the possibility of terminating the option at time step t + 1. The λ-return is defined in the usual way:

$$\bar R_t^{\lambda} = (1 - \lambda)\sum_{n=1}^{\infty} \lambda^{n-1} \bar R_t^{(n)}.$$

The parameters of the linear function approximator are updated on every time step proportionally to:

$$\Delta\bar\theta_t = \left(\bar R_t^{\lambda} - y_t\right)\nabla_\theta y_t\,(1 - \beta_1)\cdots(1 - \beta_t).$$

In our case, however, trajectories are generated according to the behavior policy b. The main idea of the algorithm is to use importance sampling corrections in order to account for the difference in the state distribution of the two policies. Let $\rho_t = \frac{\pi(s_t, a_t)}{b(s_t, a_t)}$ be the importance sampling ratio at time step t. The truncated n-step return, $R_t^{(n)}$, satisfies:

$$R_t^{(n)} = \rho_t\left[r_{t+1} + (1 - \beta_{t+1})R_{t+1}^{(n-1)}\right].$$

The update to the parameter vector is proportional to:

$$\Delta\theta_t = \left(R_t^{\lambda} - y_t\right)\nabla_\theta y_t\,\rho_0(1 - \beta_1)\cdots\rho_{t-1}(1 - \beta_t).$$

The following result shows that the expected updates of the on-policy and off-policy algorithms are the same.

Theorem 3 For every time step t ≥ 0 and any initial state s, $E_b[\Delta\theta_t \mid s] = E_\pi[\Delta\bar\theta_t \mid s]$. 
Proof: First we will show by induction that $E_b\{R_t^{(n)} \mid s\} = E_\pi\{\bar R_t^{(n)} \mid s\}$, ∀n (which implies that $E_b\{R_t^{\lambda} \mid s\} = E_\pi\{\bar R_t^{\lambda} \mid s\}$). For n = 0, the statement is trivial. Assuming that it is true for n − 1, we have

$$E_b\left\{R_t^{(n)} \mid s\right\} = \sum_a b(s, a)\,\frac{\pi(s, a)}{b(s, a)}\sum_{s'} P_{ss'}^{a}\left[r_{ss'}^{a} + (1 - \beta(s'))\,E_b\left\{R_{t+1}^{(n-1)} \mid s'\right\}\right]$$
$$= \sum_a \pi(s, a)\sum_{s'} P_{ss'}^{a}\left[r_{ss'}^{a} + (1 - \beta(s'))\,E_\pi\left\{\bar R_{t+1}^{(n-1)} \mid s'\right\}\right] = E_\pi\left\{\bar R_t^{(n)} \mid s\right\}.$$

Now we are ready to prove the theorem's main statement. Defining Ω_t to be the set of all trajectory components up to state s_t, we have:

$$E_b\{\Delta\theta_t \mid s\} = \sum_{\omega \in \Omega_t} P_b(\omega \mid s)\,E_b\left[\left(R_t^{\lambda} - y_t\right)\nabla_\theta y_t \mid \omega\right]\prod_{i=0}^{t-1}\rho_i(1 - \beta_{i+1})$$
$$= \sum_{\omega \in \Omega_t}\left(\prod_{i=0}^{t-1} b_i P_{s_i s_{i+1}}^{a_i}\right)\left[E_b\left\{R_t^{\lambda} \mid s_t\right\} - y_t\right]\nabla_\theta y_t\prod_{i=0}^{t-1}\frac{\pi_i}{b_i}(1 - \beta_{i+1})$$
$$= \sum_{\omega \in \Omega_t}\left(\prod_{i=0}^{t-1} \pi_i P_{s_i s_{i+1}}^{a_i}\right)\left[E_\pi\left\{\bar R_t^{\lambda} \mid s_t\right\} - y_t\right]\nabla_\theta y_t\,(1 - \beta_1)\cdots(1 - \beta_t)$$
$$= \sum_{\omega \in \Omega_t} P_\pi(\omega \mid s)\,E_\pi\left[\left(\bar R_t^{\lambda} - y_t\right)\nabla_\theta y_t \mid \omega\right](1 - \beta_1)\cdots(1 - \beta_t) = E_\pi\left\{\Delta\bar\theta_t \mid s\right\}.$$

Note that we are able to use s_t and ω interchangeably because of the Markov property. Since we have shown that $E_b[\Delta\theta_t \mid s] = E_\pi[\Delta\bar\theta_t \mid s]$ for any state s, it follows that the expected updates will also be equal for any distribution of the initial state s. When learning the model of options with data generated from the behavior policy b, the starting state distribution with respect to which the learning is performed, I_0, is determined by the stationary distribution of the behavior policy, as well as the initiation set of the option I. We note also that the importance sampling corrections only have to be performed for the trajectory since the initiation of the option. No corrections are required for the experience prior to this point. This should generate updates that have significantly lower variance than in the case of learning values of policies (Precup, Sutton & Dasgupta, 2001). Because of the termination condition of the option, β, ∆θ can quickly decay to zero. 
To avoid this problem, we can use a restart function g : S → [0, 1], such that g(s_t) specifies the extent to which the updating episode is considered to start at time t. Adding restarts generates a new forward update:

$$\Delta\theta_t = \left(R_t^{\lambda} - y_t\right)\nabla_\theta y_t \sum_{i=0}^{t} g_i\,\rho_i\cdots\rho_{t-1}(1 - \beta_{i+1})\cdots(1 - \beta_t), \qquad (8)$$

where $R_t^{\lambda}$ is the same as above. With an adaptation of the proof in Precup, Sutton & Dasgupta (2001), we can show that we get the same expected value of updates by applying this algorithm from the original starting distribution as we would by applying the algorithm without restarts from a starting distribution defined by I_0 and g. We can turn this forward algorithm into an incremental, backward-view algorithm in the following way:

• Initialize k_0 = g_0, e_0 = k_0 ∇_θ y_0
• At every time step t:

$$\delta_t = \rho_t\left(r_{t+1} + (1 - \beta_{t+1})\,y_{t+1}\right) - y_t$$
$$\theta_{t+1} = \theta_t + \alpha\,\delta_t\, e_t$$
$$k_{t+1} = \rho_t k_t (1 - \beta_{t+1}) + g_{t+1}$$
$$e_{t+1} = \lambda\,\rho_t (1 - \beta_{t+1})\, e_t + k_{t+1}\nabla_\theta y_{t+1}$$

Using a similar technique to that of Precup, Sutton & Dasgupta (2001) and Sutton & Barto (1998), we can prove that the forward and backward algorithms are equivalent (omitted due to lack of space). This algorithm is guaranteed to converge if the variance of the updates is finite (Precup, Sutton & Dasgupta, 2001). In the case of options, the termination condition β can be used to ensure that this is the case.

5 Learning when the behavior policy is unknown

In this section, we consider the case in which the behavior policy is unknown. This case is generally problematic for importance sampling algorithms, but the use of recognizers will allow us to define importance sampling corrections, as well as a convergent algorithm. Recall that when using a recognizer, the target policy of the option is defined as:

$$\pi(s, a) = \frac{c(s, a)b(s, a)}{\mu(s)},$$

and the importance sampling correction becomes:

$$\rho(s, a) = \frac{\pi(s, a)}{b(s, a)} = \frac{c(s, a)}{\mu(s)}.$$

Of course, µ(s) depends on b. If b is unknown, instead of µ(s) we will use a maximum likelihood estimate µ̂ : S → [0, 1]. 
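The incremental backward-view updates above translate almost line-for-line into code. The sketch below is a minimal implementation with linear values y_t = θ·φ_t (so ∇_θ y_t = φ_t); the function and argument names are ours, not the paper's.

```python
import numpy as np

def recognizer_td_update(theta, phi, phi_next, r_next, rho_t, beta_next,
                         g_next, k, e, alpha, lam):
    """One incremental backward-view step, with linear values y = theta . phi.
    A sketch of the listed updates; names and packaging are our own."""
    y = theta @ phi
    y_next = theta @ phi_next
    # delta_t = rho_t (r_{t+1} + (1 - beta_{t+1}) y_{t+1}) - y_t
    delta = rho_t * (r_next + (1.0 - beta_next) * y_next) - y
    theta = theta + alpha * delta * e                       # theta_{t+1}
    k = rho_t * k * (1.0 - beta_next) + g_next              # restart weight k_{t+1}
    e = lam * rho_t * (1.0 - beta_next) * e + k * phi_next  # trace e_{t+1}
    return theta, k, e
```

As a sanity check: with a single feature φ = 1, ρ_t = 1, g ≡ 1, and an option that terminates every step (β = 1) with reward 1, each step reduces to θ ← θ + α(1 − θ), so θ converges to the expected one-step reward.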
The structure used to compute µ̂ will have to be compatible with the feature space used to represent the reward model. We will make this more precise below. Likewise, the recognizer c(s, a) will have to be defined in terms of the features used to represent the model. We will then define the importance sampling corrections as:

$$\hat\rho(s, a) = \frac{c(s, a)}{\hat\mu(s)}.$$

We consider the case in which the function approximator used to model the option is actually a state aggregator. In this case, we will define recognizers which behave consistently in each partition, i.e., c(s, a) = c(p, a), ∀s ∈ p. This means that an action is either recognized or not recognized in all states of the partition. The recognition probability µ̂ will have one entry for every partition p of the state space. Its value will be:

$$\hat\mu(p) = \frac{N(p, c = 1)}{N(p)},$$

where N(p) is the number of times partition p was visited, and N(p, c = 1) is the number of times the action taken in p was recognized. In the limit, w.p. 1, µ̂ converges to

$$\sum_s d^b(s \mid p) \sum_a c(p, a)\, b(s, a),$$

where d^b(s|p) is the probability of visiting state s from partition p under the stationary distribution of b. At this limit, $\hat\pi(s, a) = \hat\rho(s, a)\,b(s, a)$ will be a well-defined policy (i.e., $\sum_a \hat\pi(s, a) = 1$). Using Theorem 3, off-policy updates using importance sampling corrections $\hat\rho$ will have the same expected value as on-policy updates using $\hat\pi$. Note though that the learning algorithm never uses $\hat\pi$; the only quantities needed are $\hat\rho$, which are learned incrementally from data. For the case of general linear function approximation, we conjecture that a similar idea can be used, where the recognition probability is learned using logistic regression. The development of this part is left for future work.

Acknowledgements

The authors gratefully acknowledge the ideas and encouragement they have received in this work from Eddie Rafols, Mark Ring, Lihong Li and other members of the rlai.net group. 
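The count-based estimate µ̂(p) = N(p, c = 1)/N(p) needs no knowledge of b at all. A minimal sketch of the bookkeeping (our own packaging, with a hypothetical partition label "p0"):

```python
import numpy as np
from collections import defaultdict

# Estimating the recognition probability per partition (state-aggregation
# case) without knowing b: mu_hat(p) = N(p, c=1) / N(p).
counts = defaultdict(lambda: [0, 0])    # partition -> [visits, recognized]

def observe(p, recognized):
    # Record one visit to partition p and whether the taken action was recognized.
    counts[p][0] += 1
    counts[p][1] += int(recognized)

def mu_hat(p):
    n, n_rec = counts[p]
    return n_rec / n if n else 0.0

def rho_hat(p, a, c):
    # Estimated correction rho_hat(s, a) = c(p, a) / mu_hat(p).
    return c(p, a) / mu_hat(p)
```

If behavior in a partition takes a recognized action 30% of the time, µ̂ for that partition converges to 0.3 and the estimated correction for a recognized action approaches 1/0.3, regardless of how the behavior was generated.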
We thank Csaba Szepesvari and the reviewers of the paper for constructive comments. This research was supported in part by iCore, NSERC, Alberta Ingenuity, and CFI.

References

Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of ICML.

Precup, D., Sutton, R. S. and Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. In Proceedings of ICML.

Sutton, R. S., Precup, D. and Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181–211.

Sutton, R. S. and Tanner, B. (2005). Temporal-difference networks. In Proceedings of NIPS-17.

Sutton, R. S., Rafols, E. and Koop, A. (2006). Temporal abstraction in temporal-difference networks. In Proceedings of NIPS-18.

Tadic, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42:241–267.

Tsitsiklis, J. N. and Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42:674–690.
3 0.60321045 177 nips-2005-Size Regularized Cut for Data Clustering
Author: Yixin Chen, Ya Zhang, Xiang Ji
Abstract: We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut. 1
4 0.60156727 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape
Author: Tai-sing Lee, Brian R. Potetz
Abstract: This paper explores the statistical relationship between natural images and their underlying range (depth) images. We look at how this relationship changes over scale, and how this information can be used to enhance low resolution range data using a full resolution intensity image. Based on our findings, we propose an extension to an existing technique known as shape recipes [3], and the success of the two methods are compared using images and laser scans of real scenes. Our extension is shown to provide a two-fold improvement over the current method. Furthermore, we demonstrate that ideal linear shape-from-shading filters, when learned from natural scenes, may derive even more strength from shadow cues than from the traditional linear-Lambertian shading cues. 1
5 0.60030049 78 nips-2005-From Weighted Classification to Policy Search
Author: Doron Blatt, Alfred O. Hero
Abstract: This paper proposes an algorithm to convert a T -stage stochastic decision problem with a continuous state space to a sequence of supervised learning problems. The optimization problem associated with the trajectory tree and random trajectory methods of Kearns, Mansour, and Ng, 2000, is solved using the Gauss-Seidel method. The algorithm breaks a multistage reinforcement learning problem into a sequence of single-stage reinforcement learning subproblems, each of which is solved via an exact reduction to a weighted-classification problem that can be solved using off-the-self methods. Thus the algorithm converts a reinforcement learning problem into simpler supervised learning subproblems. It is shown that the method converges in a finite number of steps to a solution that cannot be further improved by componentwise optimization. The implication of the proposed algorithm is that a plethora of classification methods can be applied to find policies in the reinforcement learning problem. 1
6 0.59756619 45 nips-2005-Conditional Visual Tracking in Kernel Space
7 0.59571052 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
8 0.59555387 110 nips-2005-Learning Depth from Single Monocular Images
9 0.59533107 184 nips-2005-Structured Prediction via the Extragradient Method
10 0.59510583 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
11 0.59398592 30 nips-2005-Assessing Approximations for Gaussian Process Classification
12 0.59275937 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators
13 0.59130245 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
14 0.59088302 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
15 0.59031022 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error
16 0.59012699 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation
17 0.58858556 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
18 0.58801693 142 nips-2005-Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games
19 0.58798736 163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections
20 0.58690584 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs