nips nips2005 nips2005-108 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Antoni B. Chan, Nuno Vasconcelos
Abstract: A dynamic texture is a video model that treats a video as a sample from a spatio-temporal stochastic process, specifically a linear dynamical system. One problem associated with the dynamic texture is that it cannot model video where there are multiple regions of distinct motion. In this work, we introduce the layered dynamic texture model, which addresses this problem. We also introduce a variant of the model, and present the EM algorithm for learning each of the models. Finally, we demonstrate the efficacy of the proposed model for the tasks of segmentation and synthesis of video.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract A dynamic texture is a video model that treats a video as a sample from a spatio-temporal stochastic process, specifically a linear dynamical system. [sent-5, score-1.524]
2 One problem associated with the dynamic texture is that it cannot model video where there are multiple regions of distinct motion. [sent-6, score-1.168]
3 In this work, we introduce the layered dynamic texture model, which addresses this problem. [sent-7, score-1.107]
4 Finally, we demonstrate the efficacy of the proposed model for the tasks of segmentation and synthesis of video. [sent-9, score-0.165]
5 1 Introduction Traditional motion representations, based on optical flow, are inherently local and have significant difficulties when faced with aperture problems and noise. [sent-10, score-0.21]
6 The classical solution to this problem is to regularize the optical flow field [1, 2, 3, 4], but this introduces undesirable smoothing across motion edges or regions where the motion is, by definition, not smooth (e. [sent-11, score-0.33]
7 More recently, there have been various attempts to model video as a superposition of layers subject to homogeneous motion. [sent-14, score-0.464]
8 While layered representations exhibited significant promise in terms of combining the advantages of regularization (use of global cues to determine local motion) with the flexibility of local representations (little undue smoothing), this potential has so far not fully materialized. [sent-15, score-0.471]
9 In fact, layers are usually formulated as “cardboard” models of the world that are warped by such transformations and then stitched to form the frames in a video stream [5]. [sent-17, score-0.494]
10 This severely limits the types of video that can be synthesized: while layers showed most promise as models for scenes composed of ensembles of objects subject to homogeneous motion (e. [sent-18, score-0.606]
11 Recently, there has been more success in modeling complex scenes as dynamic textures or, more precisely, samples from stochastic processes defined over space and time [7, 8, 9, 10]. [sent-21, score-0.472]
12 This work has demonstrated that modeling both the dynamics and appearance of video as stochastic quantities leads to a much more powerful generative model for video than that of a “cardboard” figure subject to parametric motion. [sent-22, score-0.901]
13 In fact, the dynamic texture model has shown a surprising ability to abstract a wide variety of complex patterns of motion and appearance into a simple spatio-temporal model. [sent-23, score-0.857]
14 One major current limitation of the dynamic texture framework, however, is its inability to account for visual processes consisting of multiple, co-occurring, dynamic textures. [sent-24, score-1.015]
15 For example, a flock of birds flying in front of a water fountain, highway traffic moving at different speeds, video containing both trees in the background and people in the foreground, and so forth. [sent-25, score-0.524]
16 In such cases, the existing dynamic texture model is inherently incorrect, since it must represent multiple motion fields with a single dynamic process. [sent-26, score-1.121]
17 In this work, we address this limitation by introducing a new generative model for video, which we denote by the layered dynamic texture (LDT). [sent-27, score-1.16]
18 This consists of augmenting the dynamic texture with a discrete hidden variable, that enables the assignment of different dynamics to different regions of the video. [sent-28, score-0.888]
19 Conditioned on the state of this hidden variable, the video is then modeled as a simple dynamic texture. [sent-29, score-0.781]
20 By introducing a shared dynamic representation for all the pixels in the same region, the new model is a layered representation. [sent-30, score-0.79]
21 When compared with traditional layered models, it replaces the process of layer formation based on “warping of cardboard figures” with one based on sampling from the generative model (for both dynamics and appearance) provided by the dynamic texture. [sent-31, score-1.076]
22 Since each layer is a dynamic texture, the model can also be seen as a multi-state dynamic texture, which is capable of assigning different dynamics and appearance to different image regions. [sent-33, score-0.91]
23 We consider two models for the LDT, that differ in the way they enforce consistency of layer dynamics. [sent-34, score-0.316]
24 One model enforces stronger consistency but has no closed-form solution for parameter estimates (which require sampling), while the second enforces weaker consistency but is simpler to learn. [sent-35, score-0.272]
25 The models are applied to the segmentation and synthesis of sequences that are challenging for traditional vision representations. [sent-36, score-0.209]
26 It is shown that stronger consistency leads to superior performance, demonstrating the benefits of sophisticated layered representations. [sent-37, score-0.532]
27 In Section 2, we introduce the two layered dynamic texture models. [sent-39, score-1.107]
28 2 Layered dynamic textures We start with a brief review of dynamic textures, and then introduce the layered dynamic texture model. [sent-42, score-1.809]
29 1 Dynamic texture A dynamic texture [7] is a generative model for video, based on a linear dynamical system. [sent-44, score-1.198]
30 While the dynamics are represented as a time-evolving state process x t ∈ R , N the appearance of frame yt ∈ R is a linear function of the current state vector, plus some observation noise. [sent-46, score-0.212]
31 One interpretation of the dynamic texture model is that the columns of C are the principal components of the video frames, and the state vectors the PCA coefficients for each video frame. [sent-48, score-1.57]
32 Figure 1: The layered dynamic texture (left), and the approximate layered dynamic texture (right). [sent-50, score-2.235]
33 yi is an observed pixel over time, xj is a hidden state process, and Z is the collection of layer assignment variables zi that assigns each pixels to one of the state processes. [sent-51, score-0.77]
34 An alternative interpretation considers a single pixel as it evolves over time. [sent-52, score-0.101]
35 This is analogous to the discrete Fourier transform in signal processing, where a signal is represented as a weighted sum of complex exponentials although, for the dynamic texture, the trajectories are not necessarily orthogonal. [sent-55, score-0.285]
36 This interpretation illustrates the ability of the dynamic texture to model the same motion under different intensity levels (e. [sent-56, score-0.836]
37 Regardless of interpretation, the simple dynamic texture model has only one state process, which restricts the efficacy of the model to video where the motion is homogeneous. [sent-59, score-1.268]
38 2 Layered dynamic textures We now introduce the layered dynamic texture (LDT), which is shown in Figure 1 (left). [sent-61, score-1.524]
39 The model addresses the limitations of the dynamic texture by relying on a set of state processes X = {x(j) }K to model different video dynamics. [sent-62, score-1.204]
40 The layer assignment variable j=1 zi assigns pixel yi to one of the state processes (layers), and conditioned on the layer assignments, the pixels in the same layer are modeled as a dynamic texture. [sent-63, score-1.464]
41 In addition, the collection of layer assignments Z = {zi }N is modeled as a Markov random field (MRF) i=1 to ensure spatial layer consistency. [sent-64, score-0.582]
42 As a generative model, the layered dynamic texture assumes that the state processes X and the layer assignments Z are independent, i. [sent-66, score-1.527]
43 layer motion is independent of layer location, and vice versa. [sent-68, score-0.537]
44 3 Approximate layered dynamic texture We now consider a different model, the approximate layered dynamic texture (ALDT), shown in Figure 1 (right). [sent-72, score-2.235]
45 Each pixel yi is associated with its own state process xi , and a different dynamic texture is defined for each pixel. [sent-73, score-0.836]
46 However, dynamic textures associated with the same layer share the same set of dynamic parameters, which are assigned by the layer assignment variable zi . [sent-74, score-1.384]
47 Again, the collection of layer assignments Z is modeled as an MRF but, unlike the first model, conditioning on the layer assignments makes all the pixels independent. [sent-75, score-0.737]
48 This model can also be seen as a video extension of the popular image MRF models [11], where class variables for each pixel form an MRF grid and each class (e. [sent-77, score-0.521]
49 pixels in the same segment) has some class-conditional distribution (in our case a linear dynamical system). [sent-79, score-0.108]
50 The main difference between the two proposed models is in the enforcement of consistency of dynamics within a layer. [sent-80, score-0.179]
51 With the LDT, consistency of dynamics is strongly enforced by requiring each pixel in the layer to be associated with the same state process. [sent-81, score-0.516]
52 On the other hand, for the ALDT, consistency within a layer is weakly enforced by allowing the pixels to be associated with many instantiations of the state process (instantiations associated with the same layer sharing the same dynamic parameters). [sent-82, score-1.008]
53 α , zi = K K The potential function ψi defines a prior likelihood for each layer, while ψi,j attributes higher probability to configurations where neighboring pixels are in the same layer. [sent-91, score-0.241]
54 1 EM for the layered dynamic texture The E-step for the layered dynamic texture computes the conditional mean and covari(j) ance of xt given the observed video Y . [sent-97, score-2.624]
55 These expectations are intractable to compute in closed-form since it is not known to which state process each of the pixels y i is assigned, and it is therefore necessary to marginalize over all configurations of Z. [sent-98, score-0.168]
56 This problem also appears for the computation of the posterior layer assignment probability p(z i = j|Y ). [sent-99, score-0.304]
57 Once the expectations are known, the M-step parameter updates are analogous to those required to learn a regular linear dynamical system [15, 16], with a (j) minor modification in the updates if the transformation matrices Ci . [sent-105, score-0.109]
58 2 EM for the approximate layered dynamic texture The ALDT model is similar to the mixture of dynamic textures [14], a video clustering model that treats a collection of videos as a sample from a collection of dynamic textures. [sent-108, score-2.421]
59 Since, for the ALDT model, each pixel is sampled from a set of one-dimensional dynamic textures, the EM algorithm is similar to that of the mixture of dynamic textures. [sent-109, score-0.65]
60 First, the E-step computes the posterior assignment probability p(zi |Y ) given all the observed data, rather than conditioned on a single data point p(z i |yi ). [sent-111, score-0.108]
61 4 Experiments In this section, we show the efficacy of the proposed model for segmentation and synthesis of several videos with multiple regions of distinct motion. [sent-115, score-0.33]
62 Figure 2 shows the three video sequences used in testing. [sent-116, score-0.375]
63 The first (top) is a composite of three distinct video textures of water, smoke, and fire. [sent-117, score-0.61]
64 The second (middle) is of laundry spinning in a dryer. [sent-118, score-0.219]
65 The laundry in the bottom left of the video is spinning in place in a circular motion, and the laundry around the outside is spinning faster. [sent-119, score-0.813]
66 The final video (bottom) is of a highway [17] where the traffic in each lane is traveling at a different speed. [sent-120, score-0.51]
67 All three videos have multiple regions of motion and are therefore properly modeled by the models proposed in this paper, but not by a regular dynamic texture. [sent-122, score-0.595]
68 Four variations of the video models were fit to each of the three videos. [sent-123, score-0.398]
69 The four models were the layered dynamic texture and the approximate layered dynamic texture models (LDT and ALDT), and those two models without the MRF layer assignment (LDT-iid and ALDT-iid). [sent-124, score-2.576]
70 In the latter two cases, the layers assignments zi are distributed as iid multinomials. [sent-125, score-0.402]
71 We first present segmentation results, to show that the models can effectively separate layers with different dynamics, and then discuss results relative to video synthesis from the learned models. [sent-131, score-0.63]
72 1 Segmentation The videos were segmented by assigning each of the pixels to the most probable layer conditioned on the observed video, i. [sent-133, score-0.461]
73 ∗ zi = argmax p(zi = j|Y ) (6) j Another possibility would be to assign the pixels by maximizing the posterior of all the pixels p(Z|Y ). [sent-135, score-0.371]
74 The columns of Figure 3 show the segmentation results obtained with for the four models: LDT and LDT-iid in columns (a) and (b), and ALDT and ALDT-iid in columns (c) and (d). [sent-138, score-0.161]
75 From the segmentations produced by the iid models, it can be concluded that the composite and laundry videos can be reasonably well segmented without the MRF prior. [sent-140, score-0.512]
76 This confirms the intuition that the various video regions contain very distinct dynamics, which can only be modeled with separate state processes. [sent-141, score-0.536]
77 Otherwise, the pixels should be either randomly assigned among the various layers, or uniformly assigned to one of them. [sent-142, score-0.124]
78 The segmentations of the traffic video using the iid models are poor. [sent-143, score-0.55]
79 While the dynamics are different, the differences are significantly more subtle, and segmentation requires stronger enforcement of layer consistency. [sent-144, score-0.429]
80 In general, the segmentations using LDT-iid are better than to those of the ALDT-iid, due to the weaker form of layer consistency imposed by the ALDT model. [sent-145, score-0.385]
81 While this deficiency is offset by the introduction of the MRF prior, the stronger consistency enforced by the LDT model always results in better segmentations. [sent-146, score-0.152]
82 This illustrates the need for the design of sophisticated layered representations when the goal is to model video with subtle inter-layer variations. [sent-147, score-0.868]
83 For example, in the composite sequence all erroneous segments in the water region are removed, and in the traffic sequence, most of the speckled segmentation also disappears. [sent-149, score-0.252]
84 In terms of the overall segmentation quality, both LDT and ALDT are able to segment the composite video perfectly. [sent-150, score-0.576]
85 The segmentation of the laundry video by both models is plausible, as the laundry tumbling around the edge of the dryer moves faster than that spinning in place. [sent-151, score-0.89]
86 The two models also produce reasonable segmentations of the traffic video, with the segments roughly corresponding to the different lanes of traffic. [sent-152, score-0.156]
87 Much of the errors correspond to regions that either contain intermittent motion (e. [sent-153, score-0.144]
88 the region between the lanes) or almost no motion (e. [sent-155, score-0.126]
89 Some of these errors could be eliminated by filtering the video before segmentation, but we have attempted no pre or post-processing. [sent-158, score-0.375]
90 Finally, we note that the laundry and traffic videos are not trivial to segment with standard computer vision techniques, namely methods based on optical flow. [sent-159, score-0.371]
91 This is particularly true in the case of the traffic video where the abundance of straight lines and flat regions makes computing the correct optical flow difficult due to the aperture problem. [sent-160, score-0.518]
92 2 Synthesis The layered dynamic texture is a generative model, and hence a video can be synthesized by drawing a sample from the learned model. [sent-162, score-1.605]
93 A synthesized composite video using the LDT, ALDT, and the normal dynamic texture can be found at [18]. [sent-163, score-1.224]
94 When modeling a video with multiple motions, the regular dynamic texture will average different dynamics. [sent-164, score-1.094]
95 Figure 2: Frames from the test video sequences: (top) composite of water, smoke, and fire video textures; (middle) spinning laundry in a dryer; and (bottom) highway traffic with lanes traveling at different speeds. [sent-165, score-1.218]
96 (a) (b) (c) (d) Figure 3: Segmentation results for each of the test videos using: (a) the layered dynamic texture, and (b) the layered dynamic texture without MRF; (c) the approximate layered dynamic texture, and (d) the approximate LDT without MRF. [sent-166, score-2.627]
97 This is noticeable in the synthesized video, where the fire region does not flicker at the same speed as in the original video. [sent-167, score-0.102]
98 In contrast, the video synthesized from the layered dynamic texture is more realistic, as the fire region flickers at the correct speed, and the different regions follow their own motion patterns. [sent-171, score-1.728]
99 The video synthesized from the ALDT appears noisy because the pixels evolve from different instantiations of the state process. [sent-172, score-0.625]
100 Once again this illustrates the need for sophisticated layered models. [sent-173, score-0.431]
wordName wordTfidf (topN-words)
[('texture', 0.414), ('layered', 0.408), ('video', 0.375), ('dynamic', 0.285), ('layer', 0.221), ('aldt', 0.196), ('ldt', 0.196), ('zi', 0.165), ('laundry', 0.142), ('textures', 0.132), ('mrf', 0.13), ('traf', 0.113), ('segmentation', 0.095), ('motion', 0.095), ('videos', 0.092), ('iid', 0.09), ('pixel', 0.08), ('composite', 0.079), ('assignments', 0.079), ('spinning', 0.077), ('em', 0.076), ('pixels', 0.076), ('consistency', 0.072), ('highway', 0.071), ('lanes', 0.071), ('synthesized', 0.071), ('layers', 0.068), ('optical', 0.068), ('segmentations', 0.062), ('state', 0.057), ('dynamics', 0.056), ('cardboard', 0.053), ('doretto', 0.053), ('chan', 0.053), ('assignment', 0.051), ('synthesis', 0.049), ('regions', 0.049), ('water', 0.047), ('segmented', 0.047), ('instantiations', 0.046), ('vision', 0.042), ('appearance', 0.042), ('ci', 0.041), ('ow', 0.04), ('bvt', 0.036), ('dryer', 0.036), ('lane', 0.036), ('nuno', 0.036), ('ock', 0.036), ('rwt', 0.036), ('smoke', 0.036), ('expectations', 0.035), ('xt', 0.035), ('hidden', 0.033), ('dynamical', 0.032), ('generative', 0.032), ('posterior', 0.032), ('cacy', 0.031), ('modeled', 0.031), ('icker', 0.031), ('birds', 0.031), ('region', 0.031), ('processes', 0.031), ('weaker', 0.03), ('enforced', 0.03), ('collection', 0.03), ('stronger', 0.029), ('traveling', 0.028), ('enforcement', 0.028), ('frames', 0.028), ('segment', 0.027), ('aperture', 0.026), ('truck', 0.026), ('wu', 0.026), ('gibbs', 0.025), ('conditioned', 0.025), ('assigned', 0.024), ('scenes', 0.024), ('distinct', 0.024), ('motions', 0.024), ('enforces', 0.024), ('models', 0.023), ('sophisticated', 0.023), ('smoothing', 0.023), ('grid', 0.022), ('transformation', 0.022), ('columns', 0.022), ('argmax', 0.022), ('treats', 0.022), ('interpretation', 0.021), ('model', 0.021), ('promise', 0.021), ('inherently', 0.021), ('zj', 0.021), ('representations', 0.021), ('approximate', 0.021), ('learned', 0.02), ('subtle', 0.02), ('vt', 0.02), ('regular', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999923 108 nips-2005-Layered Dynamic Textures
Author: Antoni B. Chan, Nuno Vasconcelos
Abstract: A dynamic texture is a video model that treats a video as a sample from a spatio-temporal stochastic process, specifically a linear dynamical system. One problem associated with the dynamic texture is that it cannot model video where there are multiple regions of distinct motion. In this work, we introduce the layered dynamic texture model, which addresses this problem. We also introduce a variant of the model, and present the EM algorithm for learning each of the models. Finally, we demonstrate the efficacy of the proposed model for the tasks of segmentation and synthesis of video.
2 0.1404154 23 nips-2005-An Application of Markov Random Fields to Range Sensing
Author: James Diebel, Sebastian Thrun
Abstract: This paper describes a highly successful application of MRFs to the problem of generating high-resolution range images. A new generation of range sensors combines the capture of low-resolution range images with the acquisition of registered high-resolution camera images. The MRF in this paper exploits the fact that discontinuities in range and coloring tend to co-align. This enables it to generate high-resolution, low-noise range images by integrating regular camera images into the range data. We show that by using such an MRF, we can substantially improve over existing range imaging technology. 1
3 0.10655526 110 nips-2005-Learning Depth from Single Monocular Images
Author: Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng
Abstract: We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a discriminatively-trained Markov Random Field (MRF) that incorporates multiscale local- and global-image features, and models both depths at individual points as well as the relation between depths at different points. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps. 1
4 0.10133816 143 nips-2005-Off-Road Obstacle Avoidance through End-to-End Learning
Author: Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, Yann L. Cun
Abstract: We describe a vision-based obstacle avoidance system for off-road mobile robots. The system is trained from end to end to map raw input images to steering angles. It is trained in supervised mode to predict the steering angles provided by a human driver during training runs collected in a wide variety of terrains, weather conditions, lighting conditions, and obstacle types. The robot is a 50cm off-road truck, with two forwardpointing wireless color cameras. A remote computer processes the video and controls the robot via radio. The learning system is a large 6-layer convolutional network whose input is a single left/right pair of unprocessed low-resolution images. The robot exhibits an excellent ability to detect obstacles and navigate around them in real time at speeds of 2 m/s.
5 0.087608449 7 nips-2005-A Cortically-Plausible Inverse Problem Solving Method Applied to Recognizing Static and Kinematic 3D Objects
Author: David Arathorn
Abstract: Recent neurophysiological evidence suggests the ability to interpret biological motion is facilitated by a neuronal
6 0.085902892 184 nips-2005-Structured Prediction via the Extragradient Method
7 0.0801346 164 nips-2005-Representing Part-Whole Relationships in Recurrent Neural Networks
8 0.077639975 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape
9 0.071865588 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity
10 0.06403593 45 nips-2005-Conditional Visual Tracking in Kernel Space
11 0.062369581 130 nips-2005-Modeling Neuronal Interactivity using Dynamic Bayesian Networks
12 0.061050087 79 nips-2005-Fusion of Similarity Data in Clustering
13 0.059121113 5 nips-2005-A Computational Model of Eye Movements during Object Class Detection
14 0.058167029 80 nips-2005-Gaussian Process Dynamical Models
15 0.057502788 55 nips-2005-Describing Visual Scenes using Transformed Dirichlet Processes
16 0.052308377 34 nips-2005-Bayesian Surprise Attracts Human Attention
17 0.051999215 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs
18 0.051321134 176 nips-2005-Silicon growth cones map silicon retina
19 0.048945367 50 nips-2005-Convex Neural Networks
20 0.048645757 109 nips-2005-Learning Cue-Invariant Visual Responses
topicId topicWeight
[(0, 0.156), (1, -0.023), (2, 0.034), (3, 0.149), (4, -0.024), (5, -0.006), (6, -0.028), (7, 0.109), (8, 0.109), (9, -0.124), (10, 0.016), (11, 0.141), (12, -0.04), (13, -0.058), (14, 0.056), (15, -0.001), (16, 0.218), (17, 0.038), (18, 0.01), (19, 0.007), (20, 0.069), (21, -0.016), (22, 0.009), (23, 0.004), (24, -0.119), (25, 0.093), (26, -0.014), (27, 0.095), (28, -0.062), (29, -0.015), (30, 0.142), (31, -0.009), (32, -0.011), (33, -0.057), (34, 0.069), (35, -0.048), (36, 0.028), (37, -0.052), (38, -0.016), (39, 0.084), (40, 0.014), (41, -0.036), (42, -0.125), (43, 0.084), (44, 0.075), (45, 0.058), (46, 0.003), (47, 0.065), (48, 0.002), (49, -0.002)]
simIndex simValue paperId paperTitle
same-paper 1 0.96273929 108 nips-2005-Layered Dynamic Textures
Author: Antoni B. Chan, Nuno Vasconcelos
Abstract: A dynamic texture is a video model that treats a video as a sample from a spatio-temporal stochastic process, specifically a linear dynamical system. One problem associated with the dynamic texture is that it cannot model video where there are multiple regions of distinct motion. In this work, we introduce the layered dynamic texture model, which addresses this problem. We also introduce a variant of the model, and present the EM algorithm for learning each of the models. Finally, we demonstrate the efficacy of the proposed model for the tasks of segmentation and synthesis of video.
2 0.661448 23 nips-2005-An Application of Markov Random Fields to Range Sensing
Author: James Diebel, Sebastian Thrun
Abstract: This paper describes a highly successful application of MRFs to the problem of generating high-resolution range images. A new generation of range sensors combines the capture of low-resolution range images with the acquisition of registered high-resolution camera images. The MRF in this paper exploits the fact that discontinuities in range and coloring tend to co-align. This enables it to generate high-resolution, low-noise range images by integrating regular camera images into the range data. We show that by using such an MRF, we can substantially improve over existing range imaging technology. 1
3 0.58816433 143 nips-2005-Off-Road Obstacle Avoidance through End-to-End Learning
Author: Urs Muller, Jan Ben, Eric Cosatto, Beat Flepp, Yann L. Cun
Abstract: We describe a vision-based obstacle avoidance system for off-road mobile robots. The system is trained from end to end to map raw input images to steering angles. It is trained in supervised mode to predict the steering angles provided by a human driver during training runs collected in a wide variety of terrains, weather conditions, lighting conditions, and obstacle types. The robot is a 50cm off-road truck, with two forwardpointing wireless color cameras. A remote computer processes the video and controls the robot via radio. The learning system is a large 6-layer convolutional network whose input is a single left/right pair of unprocessed low-resolution images. The robot exhibits an excellent ability to detect obstacles and navigate around them in real time at speeds of 2 m/s.
4 0.58007634 110 nips-2005-Learning Depth from Single Monocular Images
Author: Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng
Abstract: We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a discriminatively-trained Markov Random Field (MRF) that incorporates multiscale local- and global-image features, and models both depths at individual points as well as the relation between depths at different points. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps. 1
Author: David Arathorn
Abstract: Recent neurophysiological evidence suggests the ability to interpret biological motion is facilitated by a neuronal
6 0.48546207 170 nips-2005-Scaling Laws in Natural Scenes and the Inference of 3D Shape
7 0.42648309 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours
8 0.38815296 130 nips-2005-Modeling Neuronal Interactivity using Dynamic Bayesian Networks
9 0.38138509 164 nips-2005-Representing Part-Whole Relationships in Recurrent Neural Networks
10 0.36356792 45 nips-2005-Conditional Visual Tracking in Kernel Space
11 0.34329215 184 nips-2005-Structured Prediction via the Extragradient Method
12 0.32069421 124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks
13 0.30586532 97 nips-2005-Inferring Motor Programs from Images of Handwritten Digits
14 0.2925179 34 nips-2005-Bayesian Surprise Attracts Human Attention
15 0.28613845 129 nips-2005-Modeling Neural Population Spiking Activity with Gibbs Distributions
16 0.27040532 93 nips-2005-Ideal Observers for Detecting Motion: Correspondence Noise
17 0.26595059 169 nips-2005-Saliency Based on Information Maximization
18 0.26264876 55 nips-2005-Describing Visual Scenes using Transformed Dirichlet Processes
19 0.26042959 62 nips-2005-Efficient Estimation of OOMs
20 0.25830755 35 nips-2005-Bayesian model learning in human visual perception
topicId topicWeight
[(3, 0.038), (10, 0.022), (27, 0.023), (31, 0.499), (34, 0.06), (39, 0.017), (55, 0.017), (69, 0.048), (73, 0.025), (88, 0.103), (91, 0.042)]
simIndex simValue paperId paperTitle
1 0.98466808 164 nips-2005-Representing Part-Whole Relationships in Recurrent Neural Networks
Author: Viren Jain, Valentin Zhigulin, H. S. Seung
Abstract: There is little consensus about the computational function of top-down synaptic connections in the visual system. Here we explore the hypothesis that top-down connections, like bottom-up connections, reflect partwhole relationships. We analyze a recurrent network with bidirectional synaptic interactions between a layer of neurons representing parts and a layer of neurons representing wholes. Within each layer, there is lateral inhibition. When the network detects a whole, it can rigorously enforce part-whole relationships by ignoring parts that do not belong. The network can complete the whole by filling in missing parts. The network can refuse to recognize a whole, if the activated parts do not conform to a stored part-whole relationship. Parameter regimes in which these behaviors happen are identified using the theory of permitted and forbidden sets [3, 4]. The network behaviors are illustrated by recreating Rumelhart and McClelland’s “interactive activation” model [7]. In neural network models of visual object recognition [2, 6, 8], patterns of synaptic connectivity often reflect part-whole relationships between the features that are represented by neurons. For example, the connections of Figure 1 reflect the fact that feature B both contains simpler features A1, A2, and A3, and is contained in more complex features C1, C2, and C3. Such connectivity allows neurons to follow the rule that existence of the part is evidence for existence of the whole. By combining synaptic input from multiple sources of evidence for a feature, a neuron can “decide” whether that feature is present. 1 The synapses shown in Figure 1 are purely bottom-up, directed from simple to complex features. However, there are also top-down connections in the visual system, and there is little consensus about their function. One possibility is that top-down connections also reflect part-whole relationships. They allow feature detectors to make decisions using the rule that existence of the whole is evidence for existence of its parts. In this paper, we analyze the dynamics of a recurrent network in which part-whole relationships are stored as bidirectional synaptic interactions, rather than the unidirectional interactions of Figure 1. The network has a number of interesting computational capabilities. When the network detects a whole, it can rigorously enforce part-whole relationships 1 Synaptic connectivity may reflect other relationships besides part-whole. For example, invariances can be implemented by connecting detectors of several instances of the same feature to the same target, which is consequently an invariant detector of the feature. C1 C2 C3 B A1 A2 A3 Figure 1: The synaptic connections (arrows) of neuron B represent part-whole relationships. Feature B both contains simpler features and is contained in more complex features. The synaptic interactions are drawn one-way, as in most models of visual object recognition. Existence of the part is regarded as evidence for existence of the whole. This paper makes the interactions bidirectional, allowing the existence of the whole to be evidence for the existence of its parts. by ignoring parts that do not belong. The network can complete the whole by filling in missing parts. The network can refuse to recognize a whole, if the activated parts do not conform to a stored part-whole relationship. Parameter regimes in which these behaviors happen are identified using the recently developed theory of permitted and forbidden sets [3, 4]. Our model is closely related to the interactive activation model of word recognition, which was proposed by McClelland and Rumelhart to explain the word superiority effect studied by visual psychologists [7]. Here our concern is not to model a psychological effect, but to characterize mathematically how computations involving part-whole relationships can be carried out by a recurrent network. 1 Network model Suppose that we are given a set of part-whole relationships specified by a ξi = 1, if part i is contained in whole a 0, otherwise We assume that every whole contains at least one part, and every part is contained in at least one whole. The stimulus drives a layer of neurons that detect parts. These neurons also interact with a layer of neurons that detect wholes. We will refer to part-detectors as “P-neurons” and whole-detectors as “W-neurons.” The part-whole relationships are directly stored in the synaptic connections between P and a W neurons. If ξi = 1, the ith neuron in the P layer and the ath neuron in the W layer have a an excitatory interaction of strength γ. If ξi = 0, the neurons have an inhibitory interaction of strength σ. Furthermore, the P-neurons inhibit each other with strength β, and the Wneurons inhibit each other with strength α. All of these interactions are symmetric, and all activation functions are the rectification nonlinearity [z]+ = max{z, 0}. Then the dynamics of the network takes the form ˙ Wa + Wa a Pi ξ i − σ = γ i + a (1 − ξi )Pi − α i Wb , + ˙ Pi + Pi (1) b=a a Wa ξ i − σ = γ a a (1 − ξi )Wa − β a Pj + B i . j=i (2) where Bi is the input to the P layer from the stimulus. Figure 2 shows an example of a network with two wholes. Each whole contains two parts. One of the parts is contained in both wholes. -α Wa excitation γ -σ inhibition P1 B1 -β } W layer Wb -σ P2 -β B2 P3 } P layer B3 Figure 2: Model in example configuration: ξ = {(1, 1, 0), (0, 1, 1)}. When a stimulus is presented, it activates some of the P-neurons, which activate some of the W-neurons. The network eventually converges to a stable steady state. We will assume that α > 1. In the Appendix, we prove that this leads to unconditional winner-take-all behavior in the W layer. In other words, no more than one W-neuron can be active at a stable steady state. If a single W-neuron is active, then a whole has been detected. Potentially there are also many P-neurons active, indicating detection of parts. This representation may have different properties, depending on the choice of parameters β, γ, and σ. As discussed below, these include rigorous enforcement of part-whole relationships, completion of wholes by “filling in” missing parts, and non-recognition of parts that do not conform to a whole. 2 Enforcement of part-whole relationships Suppose that a single W-neuron is active at a stable steady state, so that a whole has been detected. Part-whole relationships are said to be enforced if the network always ignores parts that are not contained in the detected whole, despite potentially strong bottom-up evidence for them. It can be shown that enforcement follows from the inequality σ 2 + β 2 + γ 2 + 2σβγ > 1. (3) which guarantees that neuron i in the P layer is inactive, if neuron a in the W layer is a active and ξi = 0. When part-whole relations are enforced, prior knowledge about legal combinations of parts strictly constrains what may be perceived. This result is proven in the Appendix, and only an intuitive explanation is given here. Enforcement is easiest to understand when there is interlayer inhibition (σ > 0). In this case, the active W-neuron directly inhibits the forbidden P-neurons. The case of σ = 0 is more subtle. Then enforcement is mediated by lateral inhibition in the P layer. Excitatory feedback from the W-neuron has the effect of counteracting the lateral inhibition between the P-neurons that belong to the whole. As a result, these P-neurons become strongly activated enough to inhibit the rest of the P layer. 3 Completion of wholes by filling in missing parts If a W-neuron is active, it excites the P-neurons that belong to the whole. As a result, even if one of these P-neurons receives no bottom-up input (Bi = 0), it is still active. We call this phenomenon “completion,” and it is guaranteed to happen when (4) γ> β The network may thus “imagine” parts that are consistent with the recognized whole, but are not actually present in the stimulus. As with enforcement, this condition depends on top-down connections. √ In the special case γ = β, the interlayer excitation between a W-neuron and its P-neurons exactly cancels out the lateral inhibition between the P-neurons at a steady state. So the recurrent connections effectively vanish, letting the activity of the P-neurons be determined by their feedforward inputs. When the interlayer excitation is stronger than this, the inequality (4) holds, and completion occurs. 4 Non-recognition of a whole If there is no interlayer inhibition (σ = 0), then a single W-neuron is always active, assuming that there is some activity in the P layer. To see this, suppose for the sake of contradiction that all the W-neurons are inactive. Then they receive no inhibition to counteract the excitation from the P layer. This means some of them must be active, which contradicts our assumption. This means that the network always recognizes a whole, even if the stimulus is very different from any part-whole combination that is stored in the network. However, if interlayer inhibition is sufficiently strong (large σ), the network may refuse to recognize a whole. Neurons in the P layer are activated, but there is no activity in the W layer. Formal conditions on σ can be derived, but are not given here because of space limitations. In case of non-recognition, constraints on the P-layer are not enforced. It is possible for the network to detect a configuration of parts that is not consistent with any stored whole. 5 Example: Interactive Activation model To illustrate the computational capabilities of our network, we use it to recreate the interactive activation (IA) model of McClelland and Rumelhart. Figure 3 shows numerical simulations of a network containing three layers of neurons representing strokes, letters, and words, respectively. There are 16 possible strokes in each of four letter positions. For each stroke, there are two neurons, one signaling the presence of the stroke and the other signaling its absence. Letter neurons represent each letter of the alphabet in each of four positions. Word neurons represent each of 1200 common four letter words. The letter and word layers correspond to the P and W layers that were introduced previously. There are bidirectional interactions between the letter and word layers, and lateral inhibition within the layers. The letter neurons also receive input from the stroke neurons, but this interaction is unidirectional. Our network differs in two ways from the original IA model. First, all interactions involving letter and word neurons are symmetric. In the original model, the interactions between the letter and word layers were asymmetric. In particular, inhibitory connections only ran from letter neurons to word neurons, and not vice versa. Second, the only nonlinearity in our model is rectification. These two aspects allow us to apply the full machinery of the theory of permitted and forbidden sets. Figure 3 shows the result of presenting the stimulus “MO M” for four different settings of parameters. In each of the four cases, the word layer of the network converges to the same result, detecting the word “MOON”, which is the closest stored word to the stimulus. However, the activity in the letter layer is different in the four cases. input: P layer reconstruction W layer P layer reconstruction W layer completion noncompletion enforcement non-enforcement Figure 3: Simulation of 4 different parameter regimes in a letter- word recognition network. Within each panel, the middle column presents a feature- layer reconstruction based on the letter activity shown in the left column. W layer activity is shown in the right column. The top row shows the network state after 10 iterations of the dynamics. The bottom row shows the steady state. In the left column, the parameters obey the inequality (3), so that part- whole relationships are enforced. The activity of the letter layer is visualized by activating the strokes corresponding to each active letter neuron. The activated letters are part of the word “MOON”. In the top left, the inequality (4) is satisfied, so that the missing “O” in the stimulus is filled in. In the bottom left, completion does not occur. In the simulations of the right column, parameters are such that part- whole relationships are not enforced. Consequently, the word layer is much more active. Bottom- up input provides evidence for several other letters, which is not suppressed. In the top right, the inequality (4) is satisfied, so that the missing “O” in the stimulus is filled in. In the bottom right, the “O” neuron is not activated in the third position, so there is no completion. However, some letter neurons for the third position are activated, due to the input from neurons that indicate the absence of strokes. input: non-recognition event multi-stability Figure 4: Simulation of a non- recognition event and example of multistability. Figure 4 shows simulations for large σ, deep in the enforcement regime where non- recognition is a possibility. From one initial condition, the network converges to a state in which no W neurons are active, a non- recognition. From another initial condition, the network detects the word “NORM”. Deep in the enforcement regime, the top- down feedback can be so strong that the network has multiple stable states, many of which bear little resemblance to the stimulus at all. This is a problematic aspect of this network. It can be prevented by setting parameters at the edge of the enforcement regime. 6 Discussion We have analyzed a recurrent network that performs computations involving part- whole relationships. The network can fill in missing parts and suppress parts that do not belong. These two computations are distinct and can be dissociated from each other, as shown in Figure 3. While these two computations can also be performed by associative memory models, they are not typically dissociable in these models. For example, in the Hopfield model pattern completion and noise suppression are both the result of recall of one of a finite number of stereotyped activity patterns. We believe that our model is more appropriate for perceptual systems, because its behavior is piecewise linear, due its reliance on rectification nonlinearity. Therefore, analog aspects of computation are able to coexist with the part-whole relationships. Furthermore, in our model the stimulus is encoded in maintained synaptic input to the network, rather than as an initial condition of the dynamics. A Appendix: Permitted and forbidden sets Our mathematical results depend on the theory of permitted and forbidden sets [3, 4], which is summarized briefly here. The theory is applicable to neural networks with rectification nonlinearity, of the form xi + xi = [bi + j Wij xj ]+ . Neuron i is said to be active when ˙ xi > 0. For a network of N neurons, there are 2N possible sets of active neurons. For each active set, consider the submatrix of Wij corresponding to the synapses between active neurons. If all eigenvalues of this submatrix have real parts less than or equal to unity, then the active set is said to be permitted. Otherwise the active set is said to be forbidden. A set is permitted if and only if there exists an input vector b such that those neurons are active at a stable steady state. Permitted sets can be regarded as memories stored in the synaptic connections Wij . If Wij is a symmetric matrix, the nesting property holds: every subset of a permitted set is permitted, and every superset of a forbidden set is forbidden. The present model can be seen as a general method for storing permitted sets in a recurrent network. This method introduces a neuron for each permitted set, relying on a unary or “grandmother cell” representation. In contrast, Xie et al.[9] used lateral inhibition in a single layer of neurons to store permitted sets. By introducing extra neurons, the present model achieves superior storage capacity, much as unary models of associative memory [1] surpass distributed models [5]. A.1 Unconditional winner-take-all in the W layer The synapses between two W-neurons have strengths 0 −α −α 0 The eigenvalues of this matrix are ±α. Therefore two W-neurons constitute a forbidden set if α > 1. By the nesting property, it follows more than two W-neurons is also a forbidden set, and that the W layer has the unconditional winner-take-all property. A.2 Part-whole combinations as permitted sets Theorem 1. Suppose that β < 1. If γ 2 < β + (1 − β)/k then any combination of k ≥ 1 parts consistent with a whole corresponds to a permitted set. Proof. Consider k parts belonging to a whole. They are represented by one W-neuron and k P-neurons, with synaptic connections given by the (k + 1) × (k + 1) matrix M= −β(11T − I) γ1 , γ1T 0 (5) where 1 is the k- dimensional vector whose elements are all equal to one. Two eigenvectors of M are of the form (1T c), and have the same eigenvalues as the 2 × 2 matrix −β(k − 1) γk γ 0 This matrix has eigenvalues less than one when γ 2 < β + (1 − β)/k and β(k − 1) + 2 > 0. The other k − 1 eigenvectors are of the form (dT , 0), where dT 1 = 0. These have eigenvalues β. Therefore all eigenvalues of W are less than one if the condition of the theorem is satisfied. A.3 Constraints on combining parts Here, we derive conditions under which the network can enforce the constraint that steady state activity be confined to parts that constitute a whole. Theorem 2. Suppose that β > 0 and σ 2 +β 2 +γ 2 +2σβγ > 1 If a W- neuron is active, then only P- neurons corresponding to parts contained in the relevant whole can be active at a stable steady state. Proof. Consider P- neurons Pi , Pj , and W- neuron Wa . Supa a pose that ξi = 1 but ξj = 0. As shown in Figure 5, the matrix of connections is given by: W = 0 −β γ −β 0 −σ γ −σ 0 (6) Wa γ Pi -σ -β Pj Figure 5: A set of one W- neuron and two P- neurons is forbidden if one part belongs to the whole and the other does not. This set is permitted if all eigenvalues of W − I have negative real parts. The characteristic equation of I − W is λ3 + b1 λ2 + b2 λ + b3 = 0, where b1 = 3, b2 = 3 − σ 2 − β 2 − γ 2 and b3 = 1−2σβγ−σ 2 −β 2 −γ 2 . According to the Routh- Hurwitz theorem, all the eigenvalues have negative real parts if and only if b1 > 0, b3 > 0 and b1 b2 > b3 . Clearly, the first condition is always satisfied. The second condition is more restrictive than the third. It is satisfied only when σ 2 + β 2 + γ 2 + 2σβγ < 1. Hence, one of the eigenvalues has a positive real part when this condition is broken, i.e., when σ 2 +β 2 +γ 2 +2σβγ > 1. By the nesting property, any larger set of P- neurons inconsistent with the W- neuron is also forbidden. A.4 Completion of wholes √ Theorem 3. If γ > β and a single W- neuron a is active at a steady state, then Pi > 0 a for all i such that ξi = 1. Proof. Suppose that the detected whole has k parts. At the steady state Pi = a ξi Bi − (β − γ 2 )Ptot 1−β + where Ptot = Pi = i 1 1 − β + (β − γ 2 )k k a B i ξi i=1 (7) A.5 Preventing runaway If feedback loops cause the network activity to diverge, then the preceding analyses are not relevant. Here we give a sufficient condition guaranteeing that runaway instability does not happen. It is not a necessary condition. Interestingly, the condition implies the condition of Theorem 1. Theorem 4. Suppose that P and W obey the dynamics of Eqs. (1) and (2), and define the objective function E 1−α 2 = − 2 Wa a α + 2 Wa a 1−β + 2 a Pi Wa ξi + σ Bi Pi − γ i 2 ia Pi2 i 2 β + 2 Pi i a (1 − ξi )Pi Wa . (8) ia Then E is a Lyapunov like function that, given β > γ 2 − dynamics to a stable steady state. 1−γ 2 N −1 , ensures convergence of the Proof. (sketch) Differentiation of E with respect to time shows that that E is nonincreasing in the nonnegative orthant and constant only at steady states of the network dynamics. We must also show that E is radially unbounded, which is true if the quadratic part of E is copositive definite. Note that the last term of E is lower-bounded by zero and the previous term is upper bounded by γ ia Pi Wa . We assume α > 1. Thus, we can use Cauchy’s 2 2 inequality, i Pi2 ≥ ( i Pi ) /N , and the fact that a Wa ≤ ( a Wa )2 for Wa ≥ 0, to derive E≥ 1 2 Wa )2 + ( If β > γ 2 − unbounded. a 1−γ 2 N −1 , 1 − β + βN ( N Pi )2 − 2γ( i Wa a Pi ) i − Bi Pi . (9) i the quadratic form in the inequality is positive definite and E is radially References [1] E. B. Baum, J. Moody, and F. Wilczek. Internal representations for associative memory. Biol. Cybern., 59:217–228, 1988. [2] K. Fukushima. Neocognitron: a self organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern, 36(4):193–202, 1980. [3] R.H. Hahnloser, R. Sarpeshkar, M.A. Mahowald, R.J. Douglas, and H.S. Seung. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature, 405(6789):947– 51, Jun 22 2000. [4] R.H. Hahnloser, H.S. Seung, and J.-J. Slotine. Permitted and forbidden sets in symmetric threshold-linear networks. Neural Computation, 15:621–638, 2003. [5] J.J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci U S A, 79(8):2554–8, Apr 1982. [6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Comput., 1:541–551, 1989. [7] J. L. McClelland and D. E. Rumelhart. An interactive activation model of context effects in letter perception: Part i. an account of basic findings. Psychological Review, 88(5):375–407, Sep 1981. [8] M Riesenhuber and T Poggio. Hierarchical models of object recognition in cortex. Nat Neurosci, 2(11):1019–25, Nov 1999. [9] X. Xie, R.H. Hahnloser, and H. S. Seung. Selectively grouping neurons in recurrent networks of lateral inhibition. Neural Computation, 14:2627–2646, 2002.
2 0.95546234 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation
Author: Dmitry Malioutov, Alan S. Willsky, Jason K. Johnson
Abstract: This paper presents a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose correlations between variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlations. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. This perspective leads to a better understanding of Gaussian belief propagation and of its convergence in loopy graphs. 1
same-paper 3 0.94319141 108 nips-2005-Layered Dynamic Textures
Author: Antoni B. Chan, Nuno Vasconcelos
Abstract: A dynamic texture is a video model that treats a video as a sample from a spatio-temporal stochastic process, specifically a linear dynamical system. One problem associated with the dynamic texture is that it cannot model video where there are multiple regions of distinct motion. In this work, we introduce the layered dynamic texture model, which addresses this problem. We also introduce a variant of the model, and present the EM algorithm for learning each of the models. Finally, we demonstrate the efficacy of the proposed model for the tasks of segmentation and synthesis of video.
4 0.92835164 65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting
Author: Martin J. Wainwright
Abstract: Consider the problem of joint parameter estimation and prediction in a Markov random field: i.e., the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working in the computation-limited setting, we analyze a joint method in which the same convex variational relaxation is used to construct an M-estimator for fitting parameters, and to perform approximate marginalization for the prediction step. The key result of this paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the “wrong” model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensate for errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property that holds for a broad class of variational methods. We show that joint estimation/prediction based on the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product. 1 Keywords: Markov random fields; variational method; message-passing algorithms; sum-product; belief propagation; parameter estimation; learning. 1
5 0.91874135 145 nips-2005-On Local Rewards and Scaling Distributed Reinforcement Learning
Author: Drew Bagnell, Andrew Y. Ng
Abstract: We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the the number of agents n. We prove a worstcase lower bound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: They require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n. 1
6 0.6875059 78 nips-2005-From Weighted Classification to Policy Search
7 0.64553094 153 nips-2005-Policy-Gradient Methods for Planning
8 0.60355973 111 nips-2005-Learning Influence among Interacting Markov Chains
9 0.60269654 46 nips-2005-Consensus Propagation
10 0.60049558 142 nips-2005-Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games
11 0.57506061 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
12 0.57305115 187 nips-2005-Temporal Abstraction in Temporal-difference Networks
13 0.56455892 124 nips-2005-Measuring Shared Information and Coordinated Activity in Neuronal Networks
14 0.56263596 36 nips-2005-Bayesian models of human action understanding
15 0.5502187 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
16 0.5468598 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes
17 0.53992474 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
18 0.53535581 181 nips-2005-Spiking Inputs to a Winner-take-all Network
19 0.53366196 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs
20 0.52216053 144 nips-2005-Off-policy Learning with Options and Recognizers