nips nips2005 nips2005-170 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tai-sing Lee, Brian R. Potetz
Abstract: This paper explores the statistical relationship between natural images and their underlying range (depth) images. We look at how this relationship changes over scale, and how this information can be used to enhance low resolution range data using a full resolution intensity image. Based on our findings, we propose an extension to an existing technique known as shape recipes [3], and the success of the two methods is compared using images and laser scans of real scenes. Our extension is shown to provide a two-fold improvement over the current method. Furthermore, we demonstrate that ideal linear shape-from-shading filters, when learned from natural scenes, may derive even more strength from shadow cues than from the traditional linear-Lambertian shading cues. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: This paper explores the statistical relationship between natural images and their underlying range (depth) images. [sent-5, score-0.386]
2 We look at how this relationship changes over scale, and how this information can be used to enhance low resolution range data using a full resolution intensity image. [sent-6, score-0.81]
3 Based on our findings, we propose an extension to an existing technique known as shape recipes [3], and the success of the two methods is compared using images and laser scans of real scenes. [sent-7, score-0.871]
4 Furthermore, we demonstrate that ideal linear shape-from-shading filters, when learned from natural scenes, may derive even more strength from shadow cues than from the traditional linear-Lambertian shading cues. [sent-9, score-0.307]
5 1 Introduction The inference of depth information from single images is typically performed by devising models of image formation based on the physics of light interaction and then inverting these models to solve for depth. [sent-10, score-0.475]
6 A statistical understanding of the joint distribution of real images and their underlying 3D structure would allow us to replace these assumptions and simplifications with probabilistic priors based on real scenes. [sent-13, score-0.365]
7 Real scenes are affected by many regularities in the environment, such as the natural geometry of objects, the arrangements of objects in space, natural distributions of light, and regularities in the position of the observer. [sent-15, score-0.26]
8 Few current shape inference algorithms make use of these trends. [sent-16, score-0.285]
9 Despite the potential usefulness of statistical models and the growing success of statistical methods in vision, few studies have been made into the statistical relationship between images and range (depth) images. [sent-17, score-0.348]
10 Those studies that have examined this relationship in nature have uncovered meaningful and exploitable statistical trends in real scenes which may be useful for designing new algorithms in surface inference, and also for understanding how humans perceive depth in real scenes [6, 4, 8]. [sent-18, score-0.79]
11 In this paper, we explore some of the properties of the statistical relationship between images and their underlying range (depth) images in real scenes, using images acquired by laser scanner in natural environments. [sent-19, score-0.921]
12 Specifically, we will examine the cross-covariance between images and range images, and how this structure changes over scale. [sent-20, score-0.302]
13 We then illustrate how our statistical findings can be applied to inference problems by analyzing and extending the shape recipe depth inference algorithm. [sent-21, score-0.674]
14 2 Shape recipes We will motivate our statistical study with an application. [sent-22, score-0.273]
15 Often, we may have a high-resolution color image of a scene, but only a low spatial resolution range image (range images record the 3D distance between the scene and the camera for each pixel). [sent-23, score-0.946]
16 This often happens if our range image was acquired by applying a stereo depth inference algorithm. [sent-24, score-0.596]
17 Stereo algorithms rely on smoothness constraints, either explicitly or implicitly, and so the high-frequency components of the resulting range image are not reliable [1, 7]. [sent-25, score-0.332]
18 Low-resolution range data may also be the output of a laser range scanner, if the range scanner is inexpensive, or if the scan must be acquired quickly (range scanners typically acquire each pixel sequentially, taking up to several minutes for a high-resolution scan). [sent-26, score-0.687]
19 It should be possible to improve our estimate of the high spatial frequencies of the range image by using monocular cues from the high-resolution intensity (or color) image. [sent-27, score-0.794]
20 Shape recipes [3, 9] provide one way of doing this. [sent-28, score-0.273]
21 The basic principle of shape recipes is that a relationship between shape and light intensity could be learned from the low resolution image pair, and then extrapolated and applied to the high resolution intensity image to infer the high spatial frequencies of the range image. [sent-29, score-2.248]
22 One advantage of this approach is that hidden variables important to inference from monocular cues, such as illumination direction and material reflectance properties, might be implicitly learned from the low-resolution range and intensity images. [sent-30, score-0.589]
23 However, for this approach to work, we require some model of how the relationship between shape and intensity changes over scale, which we discuss below. [sent-31, score-0.489]
24 For shape recipes, both the high resolution intensity image and the low resolution range image are decomposed into steerable wavelet filter pyramids, linearly breaking the image down according to scale and orientation [2]. [sent-32, score-1.697]
25 Linear regression is then used between the highest frequency band of the available low-resolution range image and the corresponding band of the intensity image, to learn a linear filter that best predicts the range band from the image band. [sent-33, score-1.215]
26 The hypothesis of the model is that this filter can then be used to predict high frequency range bands from the high frequency image bands. [sent-34, score-0.594]
27 Let $i_{m,\phi}$ and $z_{m,\phi}$ be steerable filter pyramid subbands of the intensity and range image respectively, at spatial resolution m and orientation $\phi$ (both are integers). [sent-36, score-1.138]
28 Number the band levels so that m=0 is the highest frequency subband of the intensity image, and m=n is the highest available frequency subband of the low-resolution range image. [sent-37, score-0.743]
29 Shape recipes work by learning a linear filter $k_{n,\phi}$ at level n by minimizing the sum-squared error $(z_{n,\phi} - k_{n,\phi} * i_{n,\phi})^2$, where $*$ denotes convolution. [sent-39, score-0.273]
30 Higher resolution subbands of the range image are inferred by: $\hat{z}_{m,\phi} = \frac{1}{c^{n-m}} (k_{n,\phi} * i_{m,\phi})$ (1), where c = 2. [sent-40, score-0.557]
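To make the two steps above concrete, here is a minimal Python sketch, our own illustration rather than the authors' implementation. It assumes the subbands are already available as 2D NumPy arrays from some steerable-pyramid library; the 11x11 kernel size follows the evaluation section later in the paper, and the function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import correlate

def learn_recipe_kernel(i_band, z_band, ksize=11):
    """Least-squares fit of a ksize x ksize kernel that best predicts the
    range band z_band from the intensity band i_band (the regression step)."""
    r = ksize // 2
    H, W = i_band.shape
    patches, targets = [], []
    for y in range(r, H - r):
        for x in range(r, W - r):
            patches.append(i_band[y - r:y + r + 1, x - r:x + r + 1].ravel())
            targets.append(z_band[y, x])
    k, *_ = np.linalg.lstsq(np.asarray(patches), np.asarray(targets), rcond=None)
    return k.reshape(ksize, ksize)

def infer_band(k_n, i_band_m, n, m, c=2.0):
    """Equation (1): z_hat_{m,phi} = (1 / c**(n-m)) * (k applied to i_{m,phi}).
    The kernel is applied in the same (correlation) orientation it was fit in."""
    return correlate(i_band_m, k_n, mode='nearest') / c ** (n - m)
```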
31 The choice of c = 2 in the shape recipe model is motivated by the linear Lambertian shading model [9]. [sent-41, score-0.531]
32 The underlying assumption of shape recipes is that the convolution kernel $k_{m,\phi}$ should be roughly constant over the four highest resolution bands of the steerable filter pyramid. [sent-43, score-0.989]
33 This is based on the idea that shape recipe kernels should vary slowly over scale. [sent-44, score-0.462]
34 To examine this assumption, we first re-express the shape recipe process in the Fourier domain. [sent-46, score-0.43]
35 The operations of shape recipes (pyramid decomposition, convolution, and image reconstruction) are all linear operations, and so they can be combined into a single linear convolution. [sent-47, score-0.662]
36 In other words, we can think of shape recipes as inferring the high resolution range data $z_{\text{high}}$ via a single convolution: $Z_{\text{high}}(u, v) = I(u, v) \cdot K_{\text{recipe}}(u, v)$ (2), where I is the Fourier transform of the intensity image i. [sent-48, score-1.558]
37 Once $z_{\text{high}}$ (the inverse Fourier transform of $Z_{\text{high}}$) is estimated, it can be combined with the known low-resolution range data simply by adding them together: $z_{\text{recipe}}(x, y) = z_{\text{low}}(x, y) + z_{\text{high}}(x, y)$. [sent-52, score-0.799]
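A short sketch of this Fourier-domain view, assuming $K_{\text{recipe}}$ has already been extrapolated onto the full-resolution frequency grid (function name ours):

```python
import numpy as np

def recipe_reconstruction(i_img, z_low, K_recipe):
    """z_recipe = z_low + IFFT(I * K_recipe), per equation (2) and the
    additive combination above. K_recipe must match i_img in shape."""
    I = np.fft.fft2(i_img)
    z_high = np.real(np.fft.ifft2(I * K_recipe))  # inferred high frequencies
    return z_low + z_high
```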
38 Here the ideal linear kernel is K = ZI/II; II is also known as the power spectrum, and it is the Fourier transform of the autocorrelation of the intensity image. [sent-54, score-0.28]
39 ZI is the Fourier transform of the cross-correlation between the intensity and range images, and it has both real and imaginary parts. [sent-55, score-0.721]
40 Observe that I · K is a perfect reconstruction of the original high resolution range image (as long as II(u, v) ≠ 0). [sent-57, score-0.595]
41 Because we do not have the full-resolution range image, we can only compute the low spatial frequencies of ZI(u, v). [sent-58, score-0.287]
42 Let $K_{\text{low}} = ZI_{\text{low}}/II$, where $ZI_{\text{low}}$ is the Fourier transform of the cross-correlation between the low-resolution range image and a low-resolution version of the intensity image. [sent-59, score-0.455]
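A sketch of estimating $K_{\text{low}}$ from the low-resolution pair. The mean subtraction and the small floor on the denominator are our additions for numerical stability; they are not specified above:

```python
import numpy as np

def estimate_K_low(i_low, z_low, eps=1e-8):
    """K_low = ZI_low / II. ZI is the cross-power spectrum Z * conj(I)
    (the FT of the range/intensity cross-correlation); II is the intensity
    power spectrum."""
    I = np.fft.fft2(i_low - i_low.mean())
    Z = np.fft.fft2(z_low - z_low.mean())
    ZI = Z * np.conj(I)
    II = (I * np.conj(I)).real
    return ZI / np.maximum(II, eps)
```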
43 In the appendix, we show that shape recipes implicitly perform this extrapolation by learning the highest available frequency octave of $K_{\text{low}}$, and duplicating this octave into all successive octaves of $K_{\text{recipe}}$, multiplied by a scale factor. [sent-62, score-0.986]
44 Figure 1a shows a plot of ZI from a scene in our database of ground-truth range data (to be described in section 3). [sent-65, score-0.323]
45 Second and more importantly, octave duplication violates Freeman and Torralba’s assumption that shape recipe kernels should change slowly over scale, which we take to mean over all scales, not just over successive octaves. [sent-67, score-0.627]
46 Even if octave 2 of K is made identical to octave 1, it is mathematically impossible for fractional octaves of K, such as octave 1.5, to also be identical. [sent-68, score-0.371]
47 In the next section, we use laser scans of real scenes to study the joint statistics of range and intensity images in greater detail, and use our results to form a statistically-motivated model of ZI. [sent-71, score-0.936]
48 We believe that a greater understanding of the joint distribution of natural images and their underlying 3D structure will have a broad impact on the development of robust depth inference algorithms, and also on understanding human depth perception. [sent-72, score-0.494]
49 More immediately, our statistical observations lead to a more accurate way to extrapolate Klow , which in turn results in a more accurate shape recipe method. [sent-73, score-0.43]
50 3 Scaling laws in natural scene statistics To study the correlational structures between depth and intensity in natural scenes, we have collected a database of coregistered intensity and high-resolution range images (corresponding pixels of the two images correspond to the same point in space). [sent-74, score-1.247]
51 Scans were collected using the Riegl LMS-Z360 laser range scanner with integrated color photosensor. [sent-75, score-0.305]
52 The shape recipe model was intended for scenes with homogeneous albedo and surface material. [sent-87, score-0.676]
53 To test this algorithm in real scenes of this type, we selected 28 single-texture image sections from our database. [sent-88, score-0.46]
54 No logarithm or other transformation was applied to the intensity or range data (measured in meters), as this would interfere with the Lambertian model that motivates the shape recipe technique. [sent-90, score-0.816]
55 We show a log-log polar plot of |real[ZI(r, θ)]| from one image in our database in figure 1a. [sent-92, score-0.294]
56 We claim that ZI can be reasonably modeled by $B(\theta)/r^{\alpha}$, where r is spatial frequency in polar coordinates, and $B(\theta)$ is a parameter of the model (with both real and imaginary parts) that depends only on polar angle $\theta$. [sent-94, score-0.526]
57 The 1/r drop-off in the imaginary part of K can be explained by the linear Lambertian model of shading, with oblique lighting conditions. [sent-147, score-0.273]
58 This means that each octave of K is half of the octave before it. [sent-153, score-0.33]
59 Our empirical finding that the imaginary part of K obeys a 1/r power-law confirms Freeman and Torralba’s reasoning behind choosing c = 2 for shape recipes. [sent-154, score-0.47]
60 However, the linear Lambertian shading model predicts that only the imaginary part of ZI should obey a power-law. [sent-155, score-0.319]
61 Yet, in our database, the real part of ZI was typically stronger than the imaginary part. [sent-157, score-0.302]
62 The real part of ZI is the Fourier transform of the even-symmetric part of the cross-correlation function, and it includes the direct correlation cov[i, z]. [sent-158, score-0.26]
63 In a previous study of the statistics of natural range images [6], we found that darker pixels in the image tend to be farther away, resulting in a significantly negative cov[i, z]. [sent-159, score-0.576]
64 We attributed this phenomenon to cast shadows in complex scenes: object interiors and concavities are farther away than object exteriors, and these regions are the most likely to be in shadow. [sent-160, score-0.276]
65 We found that the real part of ZI is especially likely to be strongly negative in images of foliage. [sent-163, score-0.282]
66 Psychophysical experiments have demonstrated that in the absence of all other cues, darker image regions appear farther, suggesting that the human visual system makes use of this cue for depth inference (see [6] for a review, also [10]). [sent-166, score-0.381]
67 4 Inference using power-law models Armed with a better understanding of the statistics of real scenes, we are better prepared to develop successful depth inference algorithms. [sent-169, score-0.31]
68 We can simply fit our $B_K(\theta)/r$ power law to $ZI_{\text{low}}/II$, and then use this estimate of K to reconstruct the high frequency range data. [sent-172, score-0.272]
69 Specifically, from the low-resolution range and intensity image, we compute low resolution spectra of ZI and II. [sent-173, score-0.575]
70 From the highest frequency octave of the low-resolution images, we estimate $B_{II}(\theta)$ and $B_{ZI}(\theta)$. [sent-174, score-0.275]
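One plausible way to carry out this estimate, sketched below, is to undo the assumed $1/r^{\alpha}$ falloff inside the fitting octave and average within angular bins; the bin count and binning scheme are our assumptions, since the fitting procedure is not spelled out here:

```python
import numpy as np

def fit_B_theta(S, r, theta, r_lo, r_hi, n_bins=16, alpha=1.0):
    """Fit B(theta) in S(r, theta) ~ B(theta) / r**alpha over one octave.
    S is a spectrum (ZI_low or II) on the FFT grid; r, theta are matching
    polar coordinates; [r_lo, r_hi) selects the fitting octave."""
    sel = (r >= r_lo) & (r < r_hi)
    bins = ((theta[sel] + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    values = S[sel] * r[sel] ** alpha      # undo the power-law falloff
    B = np.zeros(n_bins, dtype=complex)
    for b in range(n_bins):
        hit = bins == b
        if hit.any():
            B[b] = values[hit].mean()
    return B
```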
71 [Figure 2 panel titles: a) Original Intensity Image; b) Low-Resolution Range Data; c) Power-law Shape Recipe; d) $K_{\text{recipe}}$; e) $K_{\text{powerlaw}}$.] Figure 2: a) An example intensity image from our database. [sent-177, score-0.368]
72 b) A Lambertian rendering of the corresponding low resolution range image. [sent-178, score-0.364]
73 We can now estimate the high spatial frequencies of the range image, z. [sent-185, score-0.319]
74 Define $K_{\text{powerlaw}}(r, \theta) = F_{\text{high}}(r) \cdot (B_{ZI}(\theta)/B_{II}(\theta))/r$ (5) and $Z_{\text{powerlaw}} = Z_{\text{low}} + I \cdot K_{\text{powerlaw}}$ (6), where $F_{\text{high}}$ is the high-pass filter associated with the two highest resolution bands of the steerable filter pyramid of the full-resolution image. [sent-186, score-0.556]
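A sketch of equations (5) and (6) on the FFT frequency grid; here B_ratio is a callable that would, in practice, interpolate the binned estimates of $B_{ZI}(\theta)/B_{II}(\theta)$, and F_high is assumed to come from the steerable-pyramid library:

```python
import numpy as np

def build_K_powerlaw(B_ratio, F_high, shape):
    """Equation (5): K(r, theta) = F_high(r) * B_ratio(theta) / r,
    evaluated on the 2D FFT grid; the DC term is left at zero."""
    H, W = shape
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    r = np.hypot(fx, fy)
    theta = np.arctan2(fy, fx)
    K = np.zeros(shape, dtype=complex)
    nz = r > 0
    K[nz] = F_high(r[nz]) * B_ratio(theta[nz]) / r[nz]
    return K

def reconstruct_powerlaw(i_img, z_low, K):
    """Equation (6): Z_powerlaw = Z_low + I * K_powerlaw, back in space."""
    z_high = np.real(np.fft.ifft2(np.fft.fft2(i_img) * K))
    return z_low + z_high
```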
75 5 Empirical evaluation In this section, we compare the performance of shape recipes with our new approach, using our ground-truth database of high-resolution range and intensity image pairs described in section 3. [sent-187, score-1.12]
76 For each range image in our database, a low-resolution (but still full-sized) range image, $z_{\text{low}}$, was generated by setting to zero the top two steerable filter pyramid layers. [sent-188, score-0.905]
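For illustration only: the evaluation zeroes the top two steerable-pyramid levels, which the stand-in below approximates with a plain Gaussian low-pass (a steerable pyramid is not part of NumPy itself, and the choice of sigma is arbitrary):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_z_low(z, sigma=4.0):
    """Full-sized but low-resolution range image: an approximation of
    removing the two highest-frequency pyramid levels via low-pass filtering."""
    return gaussian_filter(z, sigma)
```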
77 Both algorithms accepted as input the low-resolution range image and high-resolution intensity image, and the output was compared with the original high-resolution range image. [sent-189, score-0.718]
78 The high resolution output corresponds to a 4-fold increase in spatial resolution (or a 16-fold increase in total size). [sent-190, score-0.475]
79 Although encouraging enhancements of stereo output were reported by its authors, shape recipes have not previously been evaluated with real, ground-truth high-resolution range data. [sent-191, score-0.942]
80 To maximize its performance, we implemented shape recipes using ridge regression, with the ridge coefficient obtained using cross-validation. [sent-192, score-0.565]
81 Linear kernels were learned (and the output evaluated) over a region of the image at least 21 pixels from the image border. [sent-193, score-0.346]
82 For each high-resolution output, we measured the sum-squared error between the reconstruction ($z_{\text{recipe}}$ or $z_{\text{powerlaw}}$) and the original range image (z). [sent-194, score-0.415]
83 We compared this with the sum-squared error of the low-resolution range image $z_{\text{low}}$ to get the percent reduction in sum-squared error: $\text{error reduction}_{\text{recipe}} = (\text{err}_{\text{low}} - \text{err}_{\text{recipe}})/\text{err}_{\text{low}}$. [sent-195, score-0.558]
84 This measure of error reflects the performance of the method independently of the variance or absolute depth of the range image. [sent-196, score-0.395]
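The metric itself is straightforward; a minimal sketch:

```python
import numpy as np

def error_reduction(z_true, z_low, z_est):
    """(err_low - err_est) / err_low: percent reduction in sum-squared
    error relative to the low-resolution baseline."""
    err_low = np.sum((z_true - z_low) ** 2)
    err_est = np.sum((z_true - z_est) ** 2)
    return (err_low - err_est) / err_low
```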
85 We cannot expect the error reduction values to be very high, partly because our images are highly complex natural scenes, and also because some noise was present in both the range and intensity images. [sent-202, score-0.551]
86 As a comparison, we generated an optimal linear reconstruction, $z_{\text{optlin}}$, by learning 11 × 11 shape recipe kernels for the two high resolution pyramid bands directly from the ground-truth high resolution range image. [sent-204, score-1.254]
87 This reconstruction provides a loose upper bound on the degree of improvement possible by linear shape methods. [sent-205, score-0.339]
88 We then measured the percentage of linearly achievable improvement for each image: $\text{improvement}_{\text{recipe}} = (\text{err}_{\text{low}} - \text{err}_{\text{recipe}})/(\text{err}_{\text{low}} - \text{err}_{\text{optlin}})$. Shape recipes yielded an average improvement of 23%. [sent-206, score-0.567]
89 Our approach achieved an improvement of 44%, nearly a two-fold enhancement over shape recipes. [sent-207, score-0.297]
90 6 The relative strengths of shading and shadow cues Earlier we showed that Lambertian shading alone predicts that the real part of ZI in natural scenes is empty of useful correlations between images and range images. [sent-208, score-1.084]
91 Yet in our database, the real part of ZI, which we believe is related to shadow cues, was often stronger than the imaginary component. [sent-209, score-0.393]
92 Our depth-inference algorithm offers an opportunity to compare the performance of shading cues versus shadow cues. [sent-210, score-0.269]
93 When the database is broken down into categories, the real part of ZI is responsible for 96% of total improvement in foliage scenes, 76% in rocky terrain scenes, and 35% in urban scenes (statue surfaces and building facades). [sent-215, score-0.651]
94 7 Discussion The power-law extension to shape recipes not only offers a substantial improvement in performance, but it also greatly reduces the number of parameters that must be learned. [sent-219, score-0.57]
95 The original shape recipes required one 11×11 kernel, or 121 parameters, for each orientation of the steerable filters. [sent-220, score-0.703]
96 The new algorithm requires only two parameters for each orientation (the real and the imaginary parts of $B_K(\theta)$). [sent-221, score-0.317]
97 While it is encouraging that the power-law algorithm is highly parsimonious, it also means that fewer scene properties are encoded in the shape recipe kernels than was previously hoped [3]. [sent-223, score-0.538]
98 However, when designing these more sophisticated methods, care must be taken to avoid the same pitfall encountered by shape recipes: not all properties of a scene can be scale-invariant simultaneously. [sent-228, score-0.308]
99 8 Appendix Shape recipes infer each high resolution band of the range image using equation 1. [sent-229, score-0.734]
100 If we take the Fourier transform of equation 1, we get $Z_{\text{high}} \cdot F_{m,\phi} = \frac{1}{c^{n-m}} K_{n,\phi}\left(\frac{u}{\sigma}, \frac{v}{\sigma}\right) \cdot (I \cdot F_{m,\phi})$ (7), where $F_{m,\phi}$ is the Fourier transform of the steerable filter at level m and orientation $\phi$, and $Z_{\text{high}}$ denotes the inferred high spatial frequency components of the range image. [sent-231, score-0.673]
wordName wordTfidf (topN-words)
[('recipes', 0.273), ('shape', 0.232), ('zi', 0.224), ('intensity', 0.211), ('recipe', 0.198), ('resolution', 0.189), ('lambertian', 0.185), ('zhigh', 0.185), ('scenes', 0.184), ('range', 0.175), ('krecipe', 0.165), ('octave', 0.165), ('image', 0.157), ('fourier', 0.147), ('imaginary', 0.147), ('steerable', 0.147), ('imag', 0.144), ('zlow', 0.144), ('depth', 0.138), ('images', 0.127), ('real', 0.119), ('bk', 0.113), ('pyramid', 0.107), ('klow', 0.103), ('kpowerlaw', 0.103), ('octant', 0.103), ('shading', 0.101), ('shadow', 0.091), ('shadows', 0.09), ('concavities', 0.082), ('errlow', 0.082), ('cues', 0.077), ('scene', 0.076), ('lter', 0.074), ('database', 0.072), ('transform', 0.069), ('bands', 0.068), ('polar', 0.065), ('laser', 0.065), ('scanner', 0.065), ('band', 0.065), ('improvement', 0.065), ('spatial', 0.065), ('frequency', 0.065), ('albedo', 0.062), ('ectance', 0.062), ('zilow', 0.062), ('cast', 0.058), ('lighting', 0.057), ('scans', 0.055), ('obeys', 0.055), ('inference', 0.053), ('orientation', 0.051), ('torralba', 0.049), ('frequencies', 0.047), ('kn', 0.047), ('relationship', 0.046), ('farther', 0.046), ('highest', 0.045), ('reconstruction', 0.042), ('bzi', 0.041), ('darkness', 0.041), ('errrecipe', 0.041), ('exteriors', 0.041), ('facades', 0.041), ('fhigh', 0.041), ('lowresolution', 0.041), ('octaves', 0.041), ('rocky', 0.041), ('statue', 0.041), ('sunny', 0.041), ('tai', 0.041), ('zpowerlaw', 0.041), ('zrecipe', 0.041), ('illumination', 0.041), ('stereo', 0.041), ('fm', 0.039), ('natural', 0.038), ('freeman', 0.038), ('material', 0.038), ('part', 0.036), ('subband', 0.036), ('subbands', 0.036), ('urban', 0.036), ('predicts', 0.035), ('convolution', 0.035), ('orientations', 0.035), ('surfaces', 0.035), ('structures', 0.034), ('ii', 0.033), ('foliage', 0.033), ('oblique', 0.033), ('darker', 0.033), ('acquired', 0.032), ('high', 0.032), ('kernels', 0.032), ('terrain', 0.03), ('monocular', 0.03), ('ridge', 0.03), ('textures', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape
Author: Tai-sing Lee, Brian R. Potetz
Abstract: This paper explores the statistical relationship between natural images and their underlying range (depth) images. We look at how this relationship changes over scale, and how this information can be used to enhance low resolution range data using a full resolution intensity image. Based on our findings, we propose an extension to an existing technique known as shape recipes [3], and the success of the two methods is compared using images and laser scans of real scenes. Our extension is shown to provide a two-fold improvement over the current method. Furthermore, we demonstrate that ideal linear shape-from-shading filters, when learned from natural scenes, may derive even more strength from shadow cues than from the traditional linear-Lambertian shading cues. 1
2 0.24764861 23 nips-2005-An Application of Markov Random Fields to Range Sensing
Author: James Diebel, Sebastian Thrun
Abstract: This paper describes a highly successful application of MRFs to the problem of generating high-resolution range images. A new generation of range sensors combines the capture of low-resolution range images with the acquisition of registered high-resolution camera images. The MRF in this paper exploits the fact that discontinuities in range and coloring tend to co-align. This enables it to generate high-resolution, low-noise range images by integrating regular camera images into the range data. We show that by using such an MRF, we can substantially improve over existing range imaging technology. 1
3 0.1746687 110 nips-2005-Learning Depth from Single Monocular Images
Author: Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng
Abstract: We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a discriminatively-trained Markov Random Field (MRF) that incorporates multiscale local- and global-image features, and models both depths at individual points as well as the relation between depths at different points. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps. 1
4 0.095770977 109 nips-2005-Learning Cue-Invariant Visual Responses
Author: Jarmo Hurri
Abstract: Multiple visual cues are used by the visual system to analyze a scene; achromatic cues include luminance, texture, contrast and motion. Single-cell recordings have shown that the mammalian visual cortex contains neurons that respond similarly to scene structure (e.g., orientation of a boundary), regardless of the cue type conveying this information. This paper shows that cue-invariant response properties of simple- and complex-type cells can be learned from natural image data in an unsupervised manner. In order to do this, we also extend a previous conceptual model of cue invariance so that it can be applied to model simple- and complex-cell responses. Our results relate cue-invariant response properties to natural image statistics, thereby showing how the statistical modeling approach can be used to model processing beyond the elemental response properties of visual neurons. This work also demonstrates how to learn, from natural image data, more sophisticated feature detectors than those based on changes in mean luminance, thereby paving the way for new data-driven approaches to image processing and computer vision. 1
5 0.095163085 101 nips-2005-Is Early Vision Optimized for Extracting Higher-order Dependencies?
Author: Yan Karklin, Michael S. Lewicki
Abstract: Linear implementations of the efficient coding hypothesis, such as independent component analysis (ICA) and sparse coding models, have provided functional explanations for properties of simple cells in V1 [1, 2]. These models, however, ignore the non-linear behavior of neurons and fail to match individual and population properties of neural receptive fields in subtle but important ways. Hierarchical models, including Gaussian Scale Mixtures [3, 4] and other generative statistical models [5, 6], can capture higher-order regularities in natural images and explain nonlinear aspects of neural processing such as normalization and context effects [6,7]. Previously, it had been assumed that the lower level representation is independent of the hierarchy, and had been fixed when training these models. Here we examine the optimal lower-level representations derived in the context of a hierarchical model and find that the resulting representations are strikingly different from those based on linear models. Unlike the basis functions and filters learned by ICA or sparse coding, these functions individually more closely resemble simple cell receptive fields and collectively span a broad range of spatial scales. Our work unifies several related approaches and observations about natural image structure and suggests that hierarchical models might yield better representations of image structure throughout the hierarchy.
6 0.094731219 184 nips-2005-Structured Prediction via the Extragradient Method
7 0.083068609 151 nips-2005-Pattern Recognition from One Example by Chopping
8 0.079997659 158 nips-2005-Products of ``Edge-perts
9 0.077639975 108 nips-2005-Layered Dynamic Textures
10 0.077637166 55 nips-2005-Describing Visual Scenes using Transformed Dirichlet Processes
11 0.070735551 29 nips-2005-Analyzing Coupled Brain Sources: Distinguishing True from Spurious Interaction
12 0.069571711 97 nips-2005-Inferring Motor Programs from Images of Handwritten Digits
13 0.065133706 11 nips-2005-A Hierarchical Compositional System for Rapid Object Detection
14 0.064334169 143 nips-2005-Off-Road Obstacle Avoidance through End-to-End Learning
15 0.061898485 169 nips-2005-Saliency Based on Information Maximization
16 0.061727047 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
17 0.061486505 79 nips-2005-Fusion of Similarity Data in Clustering
18 0.059982888 5 nips-2005-A Computational Model of Eye Movements during Object Class Detection
19 0.057926401 45 nips-2005-Conditional Visual Tracking in Kernel Space
20 0.054939624 100 nips-2005-Interpolating between types and tokens by estimating power-law generators
topicId topicWeight
[(0, 0.174), (1, -0.019), (2, -0.015), (3, 0.244), (4, -0.07), (5, 0.083), (6, -0.069), (7, 0.115), (8, 0.177), (9, -0.181), (10, 0.035), (11, 0.13), (12, 0.09), (13, -0.087), (14, 0.08), (15, -0.02), (16, 0.189), (17, 0.026), (18, -0.059), (19, -0.108), (20, -0.062), (21, -0.071), (22, -0.002), (23, -0.057), (24, 0.069), (25, -0.069), (26, -0.047), (27, 0.002), (28, -0.021), (29, 0.042), (30, 0.053), (31, -0.027), (32, 0.033), (33, -0.103), (34, -0.043), (35, 0.092), (36, -0.094), (37, -0.0), (38, -0.029), (39, -0.035), (40, 0.027), (41, 0.094), (42, 0.041), (43, -0.064), (44, 0.006), (45, -0.029), (46, 0.057), (47, -0.081), (48, -0.027), (49, -0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.97790772 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape
Author: Tai-sing Lee, Brian R. Potetz
Abstract: This paper explores the statistical relationship between natural images and their underlying range (depth) images. We look at how this relationship changes over scale, and how this information can be used to enhance low resolution range data using a full resolution intensity image. Based on our findings, we propose an extension to an existing technique known as shape recipes [3], and the success of the two methods is compared using images and laser scans of real scenes. Our extension is shown to provide a two-fold improvement over the current method. Furthermore, we demonstrate that ideal linear shape-from-shading filters, when learned from natural scenes, may derive even more strength from shadow cues than from the traditional linear-Lambertian shading cues. 1
2 0.85804898 23 nips-2005-An Application of Markov Random Fields to Range Sensing
Author: James Diebel, Sebastian Thrun
Abstract: This paper describes a highly successful application of MRFs to the problem of generating high-resolution range images. A new generation of range sensors combines the capture of low-resolution range images with the acquisition of registered high-resolution camera images. The MRF in this paper exploits the fact that discontinuities in range and coloring tend to co-align. This enables it to generate high-resolution, low-noise range images by integrating regular camera images into the range data. We show that by using such an MRF, we can substantially improve over existing range imaging technology. 1
3 0.835195 110 nips-2005-Learning Depth from Single Monocular Images
Author: Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng
Abstract: We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a discriminatively-trained Markov Random Field (MRF) that incorporates multiscale local- and global-image features, and models both depths at individual points as well as the relation between depths at different points. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps. 1
4 0.49115366 143 nips-2005-Off-Road Obstacle Avoidance through End-to-End Learning
Author: Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, Yann L. Cun
Abstract: We describe a vision-based obstacle avoidance system for off-road mobile robots. The system is trained from end to end to map raw input images to steering angles. It is trained in supervised mode to predict the steering angles provided by a human driver during training runs collected in a wide variety of terrains, weather conditions, lighting conditions, and obstacle types. The robot is a 50cm off-road truck, with two forwardpointing wireless color cameras. A remote computer processes the video and controls the robot via radio. The learning system is a large 6-layer convolutional network whose input is a single left/right pair of unprocessed low-resolution images. The robot exhibits an excellent ability to detect obstacles and navigate around them in real time at speeds of 2 m/s.
5 0.48796394 158 nips-2005-Products of "Edge-perts"
Author: Max Welling, Peter V. Gehler
Abstract: Images represent an important and abundant source of data. Understanding their statistical structure has important applications such as image compression and restoration. In this paper we propose a particular kind of probabilistic model, dubbed the "products of edge-perts" model, to describe the structure of wavelet transformed images. We develop a practical denoising algorithm based on a single edge-pert and show state-of-the-art denoising performance on benchmark images. 1
6 0.47554934 108 nips-2005-Layered Dynamic Textures
7 0.47284618 101 nips-2005-Is Early Vision Optimized for Extracting Higher-order Dependencies?
8 0.39751178 109 nips-2005-Learning Cue-Invariant Visual Responses
9 0.37257275 169 nips-2005-Saliency Based on Information Maximization
10 0.36448306 151 nips-2005-Pattern Recognition from One Example by Chopping
11 0.36001644 184 nips-2005-Structured Prediction via the Extragradient Method
12 0.35071054 55 nips-2005-Describing Visual Scenes using Transformed Dirichlet Processes
13 0.3221311 11 nips-2005-A Hierarchical Compositional System for Rapid Object Detection
14 0.31184122 45 nips-2005-Conditional Visual Tracking in Kernel Space
15 0.31140375 63 nips-2005-Efficient Unsupervised Learning for Localization and Detection in Object Categories
16 0.30847681 97 nips-2005-Inferring Motor Programs from Images of Handwritten Digits
17 0.30671078 35 nips-2005-Bayesian model learning in human visual perception
18 0.30307084 171 nips-2005-Searching for Character Models
19 0.2814351 203 nips-2005-Visual Encoding with Jittering Eyes
20 0.27328569 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours
topicId topicWeight
[(3, 0.046), (10, 0.072), (27, 0.061), (31, 0.035), (34, 0.084), (39, 0.02), (41, 0.029), (55, 0.024), (69, 0.071), (73, 0.054), (88, 0.081), (91, 0.024), (93, 0.314)]
simIndex simValue paperId paperTitle
1 0.80542016 163 nips-2005-Recovery of Jointly Sparse Signals from Few Random Projections
Author: Michael B. Wakin, Marco F. Duarte, Shriram Sarvotham, Dror Baron, Richard G. Baraniuk
Abstract: Compressed sensing is an emerging field based on the revelation that a small group of linear projections of a sparse signal contains enough information for reconstruction. In this paper we introduce a new theory for distributed compressed sensing (DCS) that enables new distributed coding algorithms for multi-signal ensembles that exploit both intra- and inter-signal correlation structures. The DCS theory rests on a new concept that we term the joint sparsity of a signal ensemble. We study three simple models for jointly sparse signals, propose algorithms for joint recovery of multiple signals from incoherent projections, and characterize theoretically and empirically the number of measurements per sensor required for accurate reconstruction. In some sense DCS is a framework for distributed compression of sources with memory, which has remained a challenging problem in information theory for some time. DCS is immediately applicable to a range of problems in sensor networks and arrays. 1
same-paper 2 0.79472244 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape
Author: Tai-sing Lee, Brian R. Potetz
Abstract: This paper explores the statistical relationship between natural images and their underlying range (depth) images. We look at how this relationship changes over scale, and how this information can be used to enhance low resolution range data using a full resolution intensity image. Based on our findings, we propose an extension to an existing technique known as shape recipes [3], and the success of the two methods is compared using images and laser scans of real scenes. Our extension is shown to provide a two-fold improvement over the current method. Furthermore, we demonstrate that ideal linear shape-from-shading filters, when learned from natural scenes, may derive even more strength from shadow cues than from the traditional linear-Lambertian shading cues. 1
3 0.47387832 177 nips-2005-Size Regularized Cut for Data Clustering
Author: Yixin Chen, Ya Zhang, Xiang Ji
Abstract: We present a novel spectral clustering method that enables users to incorporate prior knowledge of the size of clusters into the clustering process. The cost function, which is named size regularized cut (SRcut), is defined as the sum of the inter-cluster similarity and a regularization term measuring the relative size of two clusters. Finding a partition of the data set to minimize SRcut is proved to be NP-complete. An approximation algorithm is proposed to solve a relaxed version of the optimization problem as an eigenvalue problem. Evaluations over different data sets demonstrate that the method is not sensitive to outliers and performs better than normalized cut. 1
4 0.47280887 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
Author: Amir Navot, Lavi Shpigelman, Naftali Tishby, Eilon Vaadia
Abstract: We present a non-linear, simple, yet effective, feature subset selection method for regression and use it in analyzing cortical neural activity. Our algorithm involves a feature-weighted version of the k-nearest-neighbor algorithm. It is able to capture complex dependency of the target function on its input and makes use of the leave-one-out error as a natural regularization. We explain the characteristics of our algorithm on synthetic problems and use it in the context of predicting hand velocity from spikes recorded in motor cortex of a behaving monkey. By applying feature selection we are able to improve prediction quality and suggest a novel way of exploring neural data.
5 0.46722013 144 nips-2005-Off-policy Learning with Options and Recognizers
Author: Doina Precup, Cosmin Paduraru, Anna Koop, Richard S. Sutton, Satinder P. Singh
Abstract: We introduce a new algorithm for off-policy temporal-difference learning with function approximation that has lower variance and requires less knowledge of the behavior policy than prior methods. We develop the notion of a recognizer, a filter on actions that distorts the behavior policy to produce a related target policy with low-variance importance-sampling corrections. We also consider target policies that are deviations from the state distribution of the behavior policy, such as potential temporally abstract options, which further reduces variance. This paper introduces recognizers and their potential advantages, then develops a full algorithm for linear function approximation and proves that its updates are in the same direction as on-policy TD updates, which implies asymptotic convergence. Even though our algorithm is based on importance sampling, we prove that it requires absolutely no knowledge of the behavior policy for the case of state-aggregation function approximators. Off-policy learning is learning about one way of behaving while actually behaving in another way. For example, Q-learning is an off- policy learning method because it learns about the optimal policy while taking actions in a more exploratory fashion, e.g., according to an ε-greedy policy. Off-policy learning is of interest because only one way of selecting actions can be used at any time, but we would like to learn about many different ways of behaving from the single resultant stream of experience. For example, the options framework for temporal abstraction involves considering a variety of different ways of selecting actions. For each such option one would like to learn a model of its possible outcomes suitable for planning and other uses. Such option models have been proposed as fundamental building blocks of grounded world knowledge (Sutton, Precup & Singh, 1999; Sutton, Rafols & Koop, 2005). Using off-policy learning, one would be able to learn predictive models for many options at the same time from a single stream of experience. Unfortunately, off-policy learning using temporal-difference methods has proven problematic when used in conjunction with function approximation. Function approximation is essential in order to handle the large state spaces that are inherent in many problem do- mains. Q-learning, for example, has been proven to converge to an optimal policy in the tabular case, but is unsound and may diverge in the case of linear function approximation (Baird, 1996). Precup, Sutton, and Dasgupta (2001) introduced and proved convergence for the first off-policy learning algorithm with linear function approximation. They addressed the problem of learning the expected value of a target policy based on experience generated using a different behavior policy. They used importance sampling techniques to reduce the off-policy case to the on-policy case, where existing convergence theorems apply (Tsitsiklis & Van Roy, 1997; Tadic, 2001). There are two important difficulties with that approach. First, the behavior policy needs to be stationary and known, because it is needed to compute the importance sampling corrections. Second, the importance sampling weights are often ill-conditioned. In the worst case, the variance could be infinite and convergence would not occur. The conditions required to prevent this were somewhat awkward and, even when they applied and asymptotic convergence was assured, the variance could still be high and convergence could be slow. 
In this paper we address both of these problems in the context of off-policy learning for options. We introduce the notion of a recognizer. Rather than specifying an explicit target policy (for instance, the policy of an option), about which we want to make predictions, a recognizer specifies a condition on the actions that are selected. For example, a recognizer for the temporally extended action of picking up a cup would not specify which hand is to be used, or what the motion should be at all different positions of the cup. The recognizer would recognize a whole variety of directions of motion and poses as part of picking the cup. The advantage of this strategy is not that one might prefer a multitude of different behaviors, but that the behavior may be based on a variety of different strategies, all of which are relevant, and we would like to learn from any of them. In general, a recognizer is a function that recognizes or accepts a space of different ways of behaving and thus, can learn from a wider range of data. Recognizers have two advantages over direct specification of a target policy: 1) they are a natural and easy way to specify a target policy for which importance sampling will be well conditioned, and 2) they do not require the behavior policy to be known. The latter is important because in many cases we may have little knowledge of the behavior policy, or a stationary behavior policy may not even exist. We show that for the case of state aggregation, even if the behavior policy is unknown, convergence to a good model is achieved. 1 Non-sequential example The benefits of using recognizers in off-policy learning can be most easily seen in a nonsequential context with a single continuous action. Suppose you are given a sequence of sample actions ai ∈ [0, 1], selected i.i.d. according to probability density b : [0, 1] → ℜ+ (the behavior density). For example, suppose the behavior density is of the oscillatory form shown as a red line in Figure 1. For each each action, ai , we observe a corresponding outcome, zi ∈ ℜ, a random variable whose distribution depends only on ai . Thus the behavior density induces an outcome density. The on-policy problem is to estimate the mean mb of the outcome density. This problem can be solved simply by averaging the sample outcomes: mb = (1/n) ∑n zi . The off-policy problem is to use this same data to learn what ˆ i=1 the mean would be if actions were selected in some way other than b, for example, if the actions were restricted to a designated range, such as between 0.7 and 0.9. There are two natural ways to pose this off-policy problem. The most straightforward way is to be equally interested in all actions within the designated region. One professes to be interested in actions selected according to a target density π : [0, 1] → ℜ+ , which in the example would be 5.0 between 0.7 and 0.9, and zero elsewhere, as in the dashed line in 12 Probability density functions 1.5 Target policy with recognizer 1 Target policy w/o recognizer without recognizer .5 Behavior policy 0 0 Action 0.7 Empirical variances (average of 200 sample variances) 0.9 1 0 10 with recognizer 100 200 300 400 500 Number of sample actions Figure 1: The left panel shows the behavior policy and the target policies for the formulations of the problem with and without recognizers. The right panel shows empirical estimates of the variances for the two formulations as a function of the number sample actions. 
The lowest line is for the formulation using empirically-estimated recognition probabilities. Figure 1 (left). The importance- sampling estimate of the mean outcome is 1 n π(ai ) mπ = ∑ ˆ zi . n i=1 b(ai ) (1) This approach is problematic if there are parts of the region of interest where the behavior density is zero or very nearly so, such as near 0.72 and 0.85 in the example. Here the importance sampling ratios are exceedingly large and the estimate is poorly conditioned (large variance). The upper curve in Figure 1 (right) shows the empirical variance of this estimate as a function of the number of samples. The spikes and uncertain decline of the empirical variance indicate that the distribution is very skewed and that the estimates are very poorly conditioned. The second way to pose the problem uses recognizers. One professes to be interested in actions to the extent that they are both selected by b and within the designated region. This leads to the target policy shown in blue in the left panel of Figure 1 (it is taller because it still must sum to 1). For this problem, the variance of (1) is much smaller, as shown in the lower two lines of Figure 1 (right). To make this way of posing the problem clear, we introduce the notion of a recognizer function c : A → ℜ+ . The action space in the example is A = [0, 1] and the recognizer is c(a) = 1 for a between 0.7 and 0.9 and is zero elsewhere. The target policy is defined in general by c(a)b(a) c(a)b(a) = . (2) π(a) = µ ∑x c(x)b(x) where µ = ∑x c(x)b(x) is a constant, equal to the probability of recognizing an action from the behavior policy. Given π, mπ from (1) can be rewritten in terms of the recognizer as ˆ n π(ai ) 1 n c(ai )b(ai ) 1 1 n c(ai ) 1 mπ = ∑ zi ˆ = ∑ zi = ∑ zi (3) n i=1 b(ai ) n i=1 µ b(ai ) n i=1 µ Note that the target density does not appear at all in the last expression and that the behavior distribution appears only in µ, which is independent of the sample action. If this constant is known, then this estimator can be computed with no knowledge of π or b. The constant µ can easily be estimated as the fraction of recognized actions in the sample. The lowest line in Figure 1 (right) shows the variance of the estimator using this fraction in place of the recognition probability. Its variance is low, no worse than that of the exact algorithm, and apparently slightly lower. Because this algorithm does not use the behavior density, it can be applied when the behavior density is unknown or does not even exist. For example, suppose actions were selected in some deterministic, systematic way that in the long run produced an empirical distribution like b. This would be problematic for the other algorithms but would require no modification of the recognition-fraction algorithm. 2 Recognizers improve conditioning of off-policy learning The main use of recognizers is in formulating a target density π about which we can successfully learn predictions, based on the current behavior being followed. Here we formalize this intuition. Theorem 1 Let A = {a1 , . . . ak } ⊆ A be a subset of all the possible actions. Consider a fixed behavior policy b and let πA be the class of policies that only choose actions from A, i.e., if π(a) > 0 then a ∈ A. Then the policy induced by b and the binary recognizer cA is the policy with minimum-variance one-step importance sampling corrections, among those in πA : π(ai ) 2 π as given by (2) = arg min Eb (4) π∈πA b(ai ) Proof: Denote π(ai ) = πi , b(ai ) = bi . 
Then the expected variance of the one-step importance sampling corrections is: Eb πi bi πi bi 2 2 − Eb = ∑ bi i πi bi 2 −1 = ∑ i π2 i − 1, bi where the summation (here and everywhere below) is such that the action ai ∈ A. We want to find πi that minimizes this expression, subject to the constraint that ∑i πi = 1. This is a constrained optimization problem. To solve it, we write down the corresponding Lagrangian: π2 L(πi , β) = ∑ i − 1 + β(∑ πi − 1) i i bi We take the partial derivatives wrt πi and β and set them to 0: βbi ∂L 2 = πi + β = 0 ⇒ πi = − ∂πi bi 2 (5) ∂L = πi − 1 = 0 ∂β ∑ i (6) By taking (5) and plugging into (6), we get the following expression for β: − β 2 bi = 1 ⇒ β = − 2∑ ∑i bi i By substituting β into (5) we obtain: πi = bi ∑i b i This is exactly the policy induced by the recognizer defined by c(ai ) = 1 iff ai ∈ A. We also note that it is advantageous, from the point of view of minimizing the variance of the updates, to have recognizers that accept a broad range of actions: Theorem 2 Consider two binary recognizers c1 and c2 , such that µ1 > µ2 . Then the importance sampling corrections for c1 have lower variance than the importance sampling corrections for c2 . Proof: From the previous theorem, we have the variance of a recognizer cA : Var = ∑ i π2 bi i −1 = ∑ bi ∑ j∈A b j i 2 1 1 1 −1 = −1 = −1 bi µ ∑ j∈A b j 3 Formal framework for sequential problems We turn now to the full case of learning about sequential decision processes with function approximation. We use the standard framework in which an agent interacts with a stochastic environment. At each time step t, the agent receives a state st and chooses an action at . We assume for the moment that actions are selected according to a fixed behavior policy, b : S × A → [0, 1] where b(s, a) is the probability of selecting action a in state s. The behavior policy is used to generate a sequence of experience (observations, actions and rewards). The goal is to learn, from this data, predictions about different ways of behaving. In this paper we focus on learning predictions about expected returns, but other predictions can be tackled as well (for instance, predictions of transition models for options (Sutton, Precup & Singh, 1999), or predictions specified by a TD-network (Sutton & Tanner, 2005; Sutton, Rafols & Koop, 2006)). We assume that the state space is large or continuous, and function approximation must be used to compute any values of interest. In particular, we assume a space of feature vectors Φ and a mapping φ : S → Φ. We denote by φs the feature vector associated with s. An option is defined as a triple o = I, π, β where I ⊆ S is the set of states in which the option can be initiated, π is the internal policy of the option and β : S → [0, 1] is a stochastic termination condition. In the option work (Sutton, Precup & Singh, 1999), each of these elements has to be explicitly specified and fixed in order for an option to be well defined. Here, we will instead define options implicitly, using the notion of a recognizer. A recognizer is defined as a function c : S × A → [0, 1], where c(s, a) indicates to what extent the recognizer allows action a in state s. An important special case, which we treat in this paper, is that of binary recognizers. In this case, c is an indicator function, specifying a subset of actions that are allowed, or recognized, given a particular state. Note that recognizers do not specify policies; instead, they merely give restrictions on the policies that are allowed or recognized. 
A recognizer c together with a behavior policy b generates a target policy π, where: b(s, a)c(s, a) b(s, a)c(s, a) π(s, a) = (7) = µ(s) ∑x b(s, x)c(s, x) The denominator of this fraction, µ(s) = ∑x b(s, x)c(s, x), is the recognition probability at s, i.e., the probability that an action will be accepted at s when behavior is generated according to b. The policy π is only defined at states for which µ(s) > 0. The numerator gives the probability that action a is produced by the behavior and recognized in s. Note that if the recognizer accepts all state-action pairs, i.e. c(s, a) = 1, ∀s, a, then π is the same as b. Since a recognizer and a behavior policy can specify together a target policy, we can use recognizers as a way to specify policies for options, using (7). An option can only be initiated at a state for which at least one action is recognized, so µ(s) > 0, ∀s ∈ I. Similarly, the termination condition of such an option, β, is defined as β(s) = 1 if µ(s) = 0. In other words, the option must terminate if no actions are recognized at a given state. At all other states, β can be defined between 0 and 1 as desired. We will focus on computing the reward model of an option o, which represents the expected total return. The expected values of different features at the end of the option can be estimated similarly. The quantity that we want to compute is Eo {R(s)} = E{r1 + r2 + . . . + rT |s0 = s, π, β} where s ∈ I, experience is generated according to the policy of the option, π, and T denotes the random variable representing the time step at which the option terminates according to β. We assume that linear function approximation is used to represent these values, i.e. Eo {R(s)} ≈ θT φs where θ is a vector of parameters. 4 Off-policy learning algorithm In this section we present an adaptation of the off-policy learning algorithm of Precup, Sutton & Dasgupta (2001) to the case of learning about options. Suppose that an option’s policy π was used to generate behavior. In this case, learning the reward model of the option is a special case of temporal-difference learning of value functions. The forward ¯ (n) view of this algorithm is as follows. Let Rt denote the truncated n-step return starting at ¯ (0) time step t and let yt denote the 0-step truncated return, Rt . By the definition of the n-step truncated return, we have: ¯ (n) ¯ (n−1) Rt = rt+1 + (1 − βt+1 )Rt+1 . This is similar to the case of value functions, but it accounts for the possibility of terminating the option at time step t + 1. The λ-return is defined in the usual way: ∞ ¯ (n) ¯ Rtλ = (1 − λ) ∑ λn−1 Rt . n=1 The parameters of the linear function approximator are updated on every time step proportionally to: ¯ ¯ ∆θt = Rtλ − yt ∇θ yt (1 − β1 ) · · · (1 − βt ). In our case, however, trajectories are generated according to the behavior policy b. The main idea of the algorithm is to use importance sampling corrections in order to account for the difference in the state distribution of the two policies. Let ρt = (n) Rt , π(st ,at ) b(st ,at ) be the importance sampling ratio at time step t. The truncated n-step return, satisfies: (n) (n−1) Rt = ρt [rt+1 + (1 − βt+1 )Rt+1 ]. The update to the parameter vector is proportional to: ∆θt = Rtλ − yt ∇θ yt ρ0 (1 − β1 ) · · · ρt−1 (1 − βt ). The following result shows that the expected updates of the on-policy and off-policy algorithms are the same. Theorem 3 For every time step t ≥ 0 and any initial state s, ¯ Eb [∆θt |s] = Eπ [∆θt |s]. 
(n) (n) ¯ Proof: First we will show by induction that Eb {Rt |s} = Eπ {Rt |s}, ∀n (which implies ¯ that Eb {Rtλ |s} = Eπ (Rtλ |s}). For n = 0, the statement is trivial. Assuming that it is true for n − 1, we have (n) Eb Rt |s = a ∑b(s, a)∑Pss ρ(s, a) a = s ∑∑ a Pss b(s, a) a s = a ∑π(s, a)∑Pss a (n−1) a rss + (1 − β(s ))Eb Rt+1 |s π(s, a) a ¯ (n−1) r + (1 − β(s ))Eπ Rt+1 |s b(s, a) ss a ¯ (n−1) rss + (1 − β(s ))Eπ Rt+1 |s ¯ (n) = Eπ Rt |s . s Now we are ready to prove the theorem’s main statement. Defining Ωt to be the set of all trajectory components up to state st , we have: Eb {∆θt |s} = ∑ ω∈Ωt Pb (ω|s)Eb (Rtλ − yt )∇θ yt |ω t−1 ∏ ρi (1 − βi+1 ) i=0 πi (1 − βi+1 ) i=0 bi t−1 = t−1 ∑ ∏ bi Psaiisi+1 ω∈Ωt Eb Rtλ |st − yt ∇θ yt ∏ i=0 t−1 = ∑ ∏ πi Psaiisi+1 ω∈Ωt = ∑ ω∈Ωt ¯ Eπ Rtλ |st − yt ∇θ yt (1 − β1 )...(1 − βt ) i=0 ¯ ¯ Pπ (ω|s)Eπ (Rtλ − yt )∇θ yt |ω (1 − β1 )...(1 − βt ) = Eπ ∆θt |s . Note that we are able to use st and ω interchangeably because of the Markov property. ¯ Since we have shown that Eb [∆θt |s] = Eπ [∆θt |s] for any state s, it follows that the expected updates will also be equal for any distribution of the initial state s. When learning the model of options with data generated from the behavior policy b, the starting state distribution with respect to which the learning is performed, I0 is determined by the stationary distribution of the behavior policy, as well as the initiation set of the option I. We note also that the importance sampling corrections only have to be performed for the trajectory since the initiation of the updates for the option. No corrections are required for the experience prior to this point. This should generate updates that have significantly lower variance than in the case of learning values of policies (Precup, Sutton & Dasgupta, 2001). Because of the termination condition of the option, β, ∆θ can quickly decay to zero. To avoid this problem, we can use a restart function g : S → [0, 1], such that g(st ) specifies the extent to which the updating episode is considered to start at time t. Adding restarts generates a new forward update: t ∆θt = (Rtλ − yt )∇θ yt ∑ gi ρi ...ρt−1 (1 − βi+1 )...(1 − βt ), (8) i=0 where Rtλ is the same as above. With an adaptation of the proof in Precup, Sutton & Dasgupta (2001), we can show that we get the same expected value of updates by applying this algorithm from the original starting distribution as we would by applying the algorithm without restarts from a starting distribution defined by I0 and g. We can turn this forward algorithm into an incremental, backward view algorithm in the following way: • Initialize k0 = g0 , e0 = k0 ∇θ y0 • At every time step t: δt = θt+1 = kt+1 = et+1 = ρt (rt+1 + (1 − βt+1 )yt+1 ) − yt θt + αδt et ρt kt (1 − βt+1 ) + gt+1 λρt (1 − βt+1 )et + kt+1 ∇θ yt+1 Using a similar technique to that of Precup, Sutton & Dasgupta (2001) and Sutton & Barto (1998), we can prove that the forward and backward algorithm are equivalent (omitted due to lack of space). This algorithm is guaranteed to converge if the variance of the updates is finite (Precup, Sutton & Dasgupta, 2001). In the case of options, the termination condition β can be used to ensure that this is the case. 5 Learning when the behavior policy is unknown In this section, we consider the case in which the behavior policy is unknown. This case is generally problematic for importance sampling algorithms, but the use of recognizers will allow us to define importance sampling corrections, as well as a convergent algorithm. 
5 Learning when the behavior policy is unknown

In this section, we consider the case in which the behavior policy is unknown. This case is generally problematic for importance sampling algorithms, but the use of recognizers will allow us to define importance sampling corrections, as well as a convergent algorithm.

Recall that when using a recognizer, the target policy of the option is defined as:
$$\pi(s,a) = \frac{c(s,a)\,b(s,a)}{\mu(s)}$$
and the importance sampling ratio becomes:
$$\frac{\pi(s,a)}{b(s,a)} = \frac{c(s,a)}{\mu(s)}.$$
Of course, µ(s) depends on b. If b is unknown, instead of µ(s) we will use a maximum likelihood estimate $\hat{\mu} : S \to [0,1]$. The structure used to compute $\hat{\mu}$ will have to be compatible with the feature space used to represent the reward model; we will make this more precise below. Likewise, the recognizer c(s,a) will have to be defined in terms of the features used to represent the model. We then define the importance sampling corrections as:
$$\hat{\rho}(s,a) = \frac{c(s,a)}{\hat{\mu}(s)}.$$

We consider the case in which the function approximator used to model the option is a state aggregator. In this case, we define recognizers which behave consistently within each partition, i.e., c(s,a) = c(p,a) for all s ∈ p. This means that an action is either recognized or not recognized in all states of the partition. The recognition probability estimate $\hat{\mu}$ has one entry for every partition p of the state space. Its value is:
$$\hat{\mu}(p) = \frac{N(p, c=1)}{N(p)}$$
where N(p) is the number of times partition p was visited, and N(p, c=1) is the number of times the action taken in p was recognized. In the limit, with probability 1, $\hat{\mu}(p)$ converges to $\sum_s d^b(s \mid p)\sum_a c(p,a)\,b(s,a)$, where $d^b(s \mid p)$ is the probability of visiting state s from partition p under the stationary distribution of b. At this limit, $\hat{\pi}(s,a) = \hat{\rho}(s,a)\,b(s,a)$ will be a well-defined policy (i.e., $\sum_a \hat{\pi}(s,a) = 1$). Using Theorem 3, off-policy updates using the importance sampling corrections $\hat{\rho}$ will have the same expected value as on-policy updates using $\hat{\pi}$. Note, though, that the learning algorithm never uses $\hat{\pi}$; the only quantities needed are the corrections $\hat{\rho}$, which are learned incrementally from data.
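A minimal sketch of this estimation scheme follows (the class and function names are our own, hypothetical choices): $\hat{\mu}$ is maintained by counting visits and recognized actions per partition, and the corrections $\hat{\rho}(s,a) = c(p,a)/\hat{\mu}(p)$ are read off directly, ready to replace $\rho_t$ in the algorithm of Section 4.

```python
from collections import defaultdict

class RecognitionProbabilityEstimator:
    """Maximum likelihood estimate mu_hat(p) = N(p, c=1) / N(p) under
    state aggregation, where the recognizer is constant per partition."""

    def __init__(self, partition, c):
        self.partition = partition          # state aggregator: s -> partition id p
        self.c = c                          # recognizer: (p, a) -> 0 or 1
        self.visits = defaultdict(int)      # N(p)
        self.recognized = defaultdict(int)  # N(p, c=1)

    def observe(self, s, a):
        """Record one step of behavior (state s, action a taken by b)."""
        p = self.partition(s)
        self.visits[p] += 1
        self.recognized[p] += self.c(p, a)

    def rho_hat(self, s, a):
        """Estimated correction rho_hat(s, a) = c(s, a) / mu_hat(p)."""
        p = self.partition(s)
        if self.recognized[p] == 0:
            return 0.0                      # mu_hat zero or undefined: no update
        mu_hat = self.recognized[p] / self.visits[p]
        return self.c(p, a) / mu_hat
```

Note that b itself never appears: only the recognizer's accept/reject decisions and the visit counts are needed, which is precisely what makes the approach workable when the behavior policy is unknown.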
For the case of general linear function approximation, we conjecture that a similar idea can be used, where the recognition probability is learned using logistic regression. The development of this part is left for future work.

Acknowledgements

The authors gratefully acknowledge the ideas and encouragement they have received in this work from Eddie Rafols, Mark Ring, Lihong Li and other members of the rlai.net group. We thank Csaba Szepesvari and the reviewers of the paper for constructive comments. This research was supported in part by iCore, NSERC, Alberta Ingenuity, and CFI.

References

Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of ICML.

Precup, D., Sutton, R. S. & Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. In Proceedings of ICML.

Sutton, R. S. & Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.

Sutton, R. S., Precup, D. & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, vol. 112, pp. 181–211.

Sutton, R. S. & Tanner, B. (2005). Temporal-difference networks. In Proceedings of NIPS-17.

Sutton, R. S., Rafols, E. & Koop, A. (2006). Temporal abstraction in temporal-difference networks. In Proceedings of NIPS-18.

Tadic, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, vol. 42, pp. 241–267.

Tsitsiklis, J. N. & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, vol. 42, pp. 674–690.

6 0.46226808 45 nips-2005-Conditional Visual Tracking in Kernel Space
7 0.45995867 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
8 0.45797658 102 nips-2005-Kernelized Infomax Clustering
9 0.45791435 23 nips-2005-An Application of Markov Random Fields to Range Sensing
10 0.45682436 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
11 0.45609441 110 nips-2005-Learning Depth from Single Monocular Images
12 0.45257938 184 nips-2005-Structured Prediction via the Extragradient Method
13 0.45214531 48 nips-2005-Context as Filtering
14 0.45077348 198 nips-2005-Using ``epitomes'' to model genetic diversity: Rational design of HIV vaccine cocktails
15 0.45033091 171 nips-2005-Searching for Character Models
16 0.4499115 133 nips-2005-Nested sampling for Potts models
17 0.44938922 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
18 0.44865984 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
19 0.44850272 74 nips-2005-Faster Rates in Regression via Active Learning
20 0.44850048 194 nips-2005-Top-Down Control of Visual Attention: A Rational Account