nips nips2008 nips2008-157 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ijaz Akhter, Yaser Sheikh, Sohaib Khan, Takeo Kanade
Abstract: Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This results in a significant reduction in unknowns, and corresponding stability in estimation. We report empirical performance, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing). 1
Reference: text
sentIndex sentText sentNum sentScore
1 Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. [sent-9, score-1.515]
2 In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. [sent-10, score-0.349]
3 The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. [sent-11, score-0.225]
4 We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. [sent-12, score-0.214]
5 We report empirical performance, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing). [sent-14, score-2.256]
6 1 Introduction Nonrigid structure from motion is the process of recovering the time varying 3D coordinates of points on a deforming object from their 2D locations in an image sequence. [sent-15, score-0.737]
7 Factorization approaches, first proposed for recovering rigid structure by Tomasi and Kanade in [1], were extended to handle nonrigidity in the seminal paper by Bregler et al. [sent-16, score-0.324]
8 The key idea in [2] is that observed shapes can be represented as a linear combination of a compact set of basis shapes. [sent-18, score-0.354]
9 Each instantaneous structure, such as the mouth of a smiling actor shown in Figure 1(a), is expressed as a point in the linear space of shapes spanned by the shape basis. [sent-19, score-0.404]
10 A number of approaches that develop the use of shape basis have subsequently been proposed, including [3, 4, 5]. [sent-20, score-0.41]
11 Since the space of spatial deformations is highly object specific, the shape basis needs to be estimated anew for each video sequence. [sent-21, score-0.655]
12 The shape basis of a mouth smiling, for instance, cannot be recycled to compactly represent a person walking. [sent-22, score-0.63]
13 In this paper, we posit that representing nonrigid structure as a combination of basis shapes is one of two ways of looking at the space-time structure induced by P points seen across F frames. [sent-23, score-0.902]
14 Instead of a shape space representation, we propose looking across time, representing the time-varying structure of a nonrigid object as a linear combination of a set of basis trajectories, as illustrated in Figure 1(b). [sent-24, score-0.971]
15 The principal advantage of taking this “lateral” approach arises from the fact that compact representation in trajectory space is better motivated physically than compact representation in shape space. [sent-25, score-0.544]
16 The extent of an object's deformation is limited by the force that can be applied to it. [sent-27, score-0.135]
17 Hence, a tree swaying in the wind or a person walking cannot arbitrarily and randomly deform; the trajectories of their points are a function of the speed of the wind and the flexing of muscles respectively. [sent-28, score-0.259]
18 Figure 1: 3D points on a smiling mouth: a comparison of shape and trajectory space. [sent-31, score-0.582]
19 (a) In approaches that represent the time varying structure in shape space, all 3D points observed at one time instant are projected onto a single point in the shape space. [sent-32, score-0.669]
20 S1 , S2 , · · · , Sk each represent a shape basis vector. [sent-33, score-0.439]
21 (b) In our approach, we represent the time varying structure in trajectory space, where a 3D point’s trajectory over time is projected to a single point in the trajectory space. [sent-34, score-1.068]
22 θ1 , θ2 , · · · , θk each represent a trajectory basis vector. [sent-35, score-0.541]
23 P points observed across F frames are expressed as F projected points in shape space and P points in trajectory space. [sent-36, score-0.693]
24 Since this property is, to a large degree, ubiquitous, a basis can be defined in trajectory space that is object independent. [sent-38, score-0.59]
25 In fact, most previous results consider deformations which have a large rigid component, such as talking-head videos or the motion of a swimming shark. [sent-40, score-0.456]
26 To the best of our knowledge, we are the first to show reasonable reconstructions of highly nonrigid motions from a single video sequence without making object specific assumptions. [sent-41, score-0.607]
27 For all results, we use the same trajectory basis, the Discrete Cosine Transform (DCT) basis, underlining the generic nature of the trajectory space representation. [sent-42, score-0.608]
28 A useful byproduct of this approach is that structure is automatically compressed for compact transmission without the need for post facto compression or the overhead transmission of object specific basis. [sent-43, score-0.203]
29 2 Related work If deformation of a 3D scene is unconstrained, the structure observed in each image would be independent of those in other images. [sent-44, score-0.266]
30 In this case, recovering structure from motion is ill-posed, equivalent to finding 3D structure from a single 2D image at each time instant. [sent-45, score-0.604]
31 To make nonrigid structure recovery tractable, some consistency in the deformation of structure has to be imposed. [sent-46, score-0.703]
32 However, the first general solution to the problem of nonrigid structure recovery was introduced by Bregler et al. [sent-48, score-0.532]
33 in [2], approximating the structure at each time instant as a linear combination of basis shapes. [sent-49, score-0.394]
34 They recovered the structure, the shape basis and the camera rotations simultaneously, by exploiting orthonormality constraints of the rotation matrices. [sent-50, score-0.941]
35 Extensions of this approach include [10], which improved the numerical stability of the estimation process, and [3], which introduced a Gaussian prior on the shape coefficients. [sent-56, score-0.25]
36 Some approaches, such as [11] make explicit use of this fact to initialize rotation matrices, while others favor such sequences for stability in estimation. [sent-58, score-0.152]
37 In contrast to this entire corpus of work, which approximates structure by a shape basis, we propose a new representation of time varying structure as a collection of trajectories. [sent-59, score-0.329]
38 We not only demonstrate that a compact trajectory space can be defined, but also that the basis of this trajectory space can be pre-defined, removing a large number of unknowns from the estimation process altogether. [sent-60, score-0.927]
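To make the reduction in unknowns concrete, here is a small back-of-the-envelope count; the values of F, P, and k are illustrative choices for this sketch, not figures taken from the paper, and the camera rotations that both formulations must estimate are left out of the count.

```python
# Rough count of unknowns: shape-basis model vs. predefined trajectory basis.
# F, P, k are illustrative values, not numbers from the paper; camera
# rotations, needed by both formulations, are omitted from the count.
F, P, k = 300, 40, 12

# Shape-basis model: k basis shapes (each 3 x P) plus one coefficient per
# basis shape per frame must be estimated from the sequence itself.
shape_basis_unknowns = 3 * k * P + F * k          # 1440 + 3600 = 5040

# Trajectory-basis model with a predefined (e.g. DCT) basis: only the
# 3k x P coefficient matrix A remains unknown.
trajectory_unknowns = 3 * k * P                   # 1440

print(shape_basis_unknowns, trajectory_unknowns)
```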
39 Figure 2: As described in Equation 3, each trajectory is represented as a linear combination of k predefined basis trajectories. [sent-66, score-0.61]
40 In this paper, we use DCT basis to compactly represent trajectories. [sent-67, score-0.368]
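A minimal numpy sketch of the idea behind Figure 2: build an orthonormal truncated DCT basis of size F × k and project a smooth one-dimensional trajectory onto it. The trajectory and the sizes below are invented for illustration.

```python
import numpy as np

def dct_basis(F, k):
    """Orthonormal DCT-II basis vectors as the columns of an F x k matrix."""
    t = np.arange(F)
    D = np.cos(np.pi * np.outer(2 * t + 1, np.arange(k)) / (2 * F))
    D[:, 0] *= np.sqrt(1.0 / F)
    D[:, 1:] *= np.sqrt(2.0 / F)
    return D

F, k = 200, 10                       # illustrative sizes
Theta = dct_basis(F, k)

# A made-up smooth trajectory (one coordinate of a moving point over F frames).
s = np.linspace(0.0, 1.0, F)
traj = 0.5 * np.sin(2 * np.pi * s) + 0.2 * s

coeffs = Theta.T @ traj              # k coefficients (orthonormal columns)
approx = Theta @ coeffs              # reconstruction from only k numbers
print("relative error:", np.linalg.norm(traj - approx) / np.linalg.norm(traj))
```

For a smooth trajectory the relative error stays small even though only k of the F possible coefficients are kept, which is the sense in which the generic basis is "compact".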
41 Ours is the first paper to use this dual representation in the structure from motion problem, and to note that a generic basis can be defined in trajectory space which compactly represents most real trajectories. [sent-69, score-1.084]
42 3 Representing Nonrigid Structure The structure at a time instant t can be represented by arranging the 3D locations of the P points in a matrix S(t) ∈ R^{3×P}, with rows [X_{t1} · · · X_{tP}], [Y_{t1} · · · Y_{tP}], and [Z_{t1} · · · Z_{tP}]. [sent-70, score-0.243]
43 The complete time varying structure can be represented by concatenating these instantaneous structures as S_{3F×P} = [S(1)^T S(2)^T · · · S(F)^T]^T. [sent-71, score-0.195]
44 In [2], each instantaneous shape matrix S(t) is approximated as a linear combination of basis shapes, S(t) = Σ_j c_j(t) S^j, (1) where S^j ∈ R^{3×P} is a basis shape and c_j(t) is the coefficient of that basis shape. [sent-72, score-1.193]
45 If the set of observed structures can be compactly expressed in terms of k such basis shapes, S has a rank of at most 3k. [sent-73, score-0.378]
46 Written out elementwise, S_{3F×P} stacks the rows [X_{t1} · · · X_{tP}], [Y_{t1} · · · Y_{tP}], [Z_{t1} · · · Z_{tP}] for t = 1, . . . , F; the row space of this matrix corresponds to the shape space. [sent-93, score-0.247]
47 We call the column space of this matrix the trajectory space and note that it enjoys a dual relationship with the shape space. [sent-95, score-0.534]
48 Specifically, if the time varying shape of an object can be expressed by a minimum of k shape basis vectors, then there exist exactly k trajectory basis vectors that can represent the same time varying shape. [sent-96, score-1.099]
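A quick numerical check of this shape/trajectory duality, using the shape-basis generative model of Equation 1 with arbitrary random data (the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
F, P, k = 120, 30, 4                              # illustrative sizes

# Generate a time varying structure from k basis shapes (Equation 1).
basis_shapes = rng.standard_normal((k, 3, P))     # S^j, j = 1..k
coeffs = rng.standard_normal((F, k))              # c_j(t)
S = np.zeros((3 * F, P))                          # frames stacked as in the text
for t in range(F):
    S[3 * t:3 * t + 3, :] = np.tensordot(coeffs[t], basis_shapes, axes=1)

# Shape-space view: the stacked structure matrix has rank at most 3k.
print("rank(S) =", np.linalg.matrix_rank(S), " (3k =", 3 * k, ")")

# Trajectory-space view: each coordinate's F x P trajectory matrix has rank
# at most k, so k trajectory basis vectors per coordinate are enough.
X = S[0::3, :]                                    # x coordinate trajectories
print("rank(X) =", np.linalg.matrix_rank(X), " (k =", k, ")")
```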
49 The time varying structure matrix can then be factorized into an inverse projection matrix and a coefficient matrix as S_{3F×P} = Θ_{3F×3k} A_{3k×P}, where A = [A_x^T A_y^T A_z^T]^T and Θ collects the truncated basis vectors θ_1^T, . . . , θ_F^T in the block layout of Equation 4. [sent-99, score-0.33]
50 Here θ_i represents a truncated basis for the transformation from coefficient space to the original space. [sent-108, score-0.225]
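A sketch of this factorization in numpy. The block layout chosen here (the 3-row block of Θ for frame t is I_3 ⊗ d_t^T, where d_t is the t-th row of the truncated F × k DCT basis) is one arrangement consistent with the frame-stacked S above and with A = [A_x^T A_y^T A_z^T]^T; it is an assumption of this sketch rather than the paper's verbatim layout.

```python
import numpy as np

def dct_basis(F, k):
    # Orthonormal DCT-II basis vectors as the columns of an F x k matrix.
    t = np.arange(F)
    D = np.cos(np.pi * np.outer(2 * t + 1, np.arange(k)) / (2 * F))
    D[:, 0] *= np.sqrt(1.0 / F)
    D[:, 1:] *= np.sqrt(2.0 / F)
    return D

rng = np.random.default_rng(1)
F, P, k = 120, 30, 6                               # illustrative sizes
D = dct_basis(F, k)                                # F x k

# Assumed layout: the 3-row block of Theta for frame t is kron(I_3, d_t^T),
# so S = Theta @ A with A = [A_x; A_y; A_z] (each block k x P).
Theta = np.zeros((3 * F, 3 * k))
for t in range(F):
    Theta[3 * t:3 * t + 3, :] = np.kron(np.eye(3), D[t][None, :])

A_true = rng.standard_normal((3 * k, P))           # low-frequency structure
S = Theta @ A_true

# Given the (known) basis, the coefficients are a simple least-squares fit.
A_fit, *_ = np.linalg.lstsq(Theta, S, rcond=None)
print("coefficient recovery error:", np.linalg.norm(A_fit - A_true))
```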
51 The principal benefit of the trajectory space representation is that a basis can be pre-defined that can compactly approximate most real trajectories. [sent-109, score-0.626]
52 A number of bases such as the Hadamard Transform basis, the Discrete Fourier Transform basis, and the Discrete Wavelet Transform basis can all compactly represent trajectories in an object independent way. [sent-110, score-0.588]
53 In this paper, we use the Discrete Cosine Transform basis set to generate Θ (shown in Figure 2) for all reconstruction results shown. [sent-111, score-0.321]
54 The efficacy of the DCT basis has been demonstrated for compressing motion capture data, [14], and has been effective in our experiments as well. [sent-112, score-0.56]
55 4 Nonrigid Structure and Motion Factorization The measured 2D trajectories are contained in a 2F × P measurement matrix W, containing the locations of the P image points across the F frames. [sent-113, score-0.227]
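A sketch of how W can be assembled under an orthographic camera; the 2 × 3 truncated-rotation camera model and the row ordering (the u and v rows of frame t stacked consecutively) are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
F, P = 50, 20                              # illustrative sizes
S_frames = rng.standard_normal((F, 3, P))  # 3D points per frame (any motion)

W = np.zeros((2 * F, P))                   # measurement matrix, 2 rows per frame
for t in range(F):
    a = np.deg2rad(2.0 * t)                # e.g. a camera revolving about the z-axis
    R_t = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0]])   # truncated 2x3 rotation
    W[2 * t:2 * t + 2, :] = R_t @ S_frames[t]        # image coordinates of P points

print(W.shape)                             # (2F, P)
```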
56 Therefore, to recover metric structure we need to estimate the rectification matrix Q such that the following equations hold true: Λ̂ = ΛQ and Â = Q^{-1}A. (6) [sent-142, score-0.151]
57 Metric Upgrade The problem of recovering the rotation and structure is reduced to estimating the rectification matrix Q. [sent-143, score-0.287]
59 Figure 3: Effect of increasing camera motion on reconstruction stability (horizontal axis: camera motion per frame, in degrees). [sent-169, score-1.152]
60 Reconstruction stability is measured in terms of condition number of matrix ΛT Λ with different values of k and different values of F . [sent-170, score-0.127]
61 Synthetic rotations were generated by revolving the camera around the z-axis and camera motion was measured in terms of the angle the camera moved per frame. [sent-171, score-1.352]
62 This equation shows that the unknowns in the matrix Q can be found by exploiting the fact that R_i is a truncated rotation matrix (as was done in [1]). [sent-172, score-0.303]
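For intuition, here is the rigid-factorization analogue of that step, in the style of [1]: each frame contributes linear constraints on the symmetric matrix G = QQ^T because the corrected rows must be orthonormal. This is a hedged sketch of the idea only; the paper's trajectory-space upgrade operates on the larger Λ̂ blocks rather than a 2F × 3 factor.

```python
import numpy as np

def metric_upgrade_rigid(M_hat):
    """Tomasi-Kanade-style upgrade: find Q such that the two rows of each
    frame of M_hat @ Q are orthonormal, via linear constraints on G = Q Q^T.
    Illustrative analogue of the step referenced 'as was done in [1]'."""
    F = M_hat.shape[0] // 2

    def con(a, b):
        # coefficients of a^T G b in the 6 unique entries of symmetric G
        return np.array([a[0]*b[0], a[1]*b[1], a[2]*b[2],
                         a[0]*b[1] + a[1]*b[0],
                         a[0]*b[2] + a[2]*b[0],
                         a[1]*b[2] + a[2]*b[1]])

    rows, rhs = [], []
    for i in range(F):
        m, n = M_hat[2 * i], M_hat[2 * i + 1]
        rows += [con(m, m), con(n, n), con(m, n)]
        rhs += [1.0, 1.0, 0.0]
    g, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    G = np.array([[g[0], g[3], g[4]],
                  [g[3], g[1], g[5]],
                  [g[4], g[5], g[2]]])
    w, V = np.linalg.eigh((G + G.T) / 2)          # enforce symmetric PSD
    return V @ np.diag(np.sqrt(np.clip(w, 1e-12, None)))

# Self-check: distort a stack of true truncated rotations by a random 3x3
# matrix and verify that the recovered Q restores per-frame orthonormality.
rng = np.random.default_rng(3)
R_true = np.vstack([np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(30)])
M_hat = R_true @ rng.standard_normal((3, 3))
M_fix = M_hat @ metric_upgrade_rigid(M_hat)
print(np.allclose(M_fix @ M_fix.T, R_true @ R_true.T, atol=1e-6))
```

Note that the constraints only determine G = QQ^T, so Q is recovered up to a rotation, which is exactly the global rotation ambiguity of the reconstruction.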
63 Once R is known, it can be multiplied with the (known) DCT basis matrix Θ3F ×3k to recover the matrix Λ2F ×3k = R2F ×3F Θ3F ×3k . [sent-179, score-0.349]
64 6 Results The proposed algorithm has been validated quantitatively on motion capture data over different actions and qualitatively on video data. [sent-181, score-0.409]
65 We have tested the approach extensively on highly nonrigid human motion like volleyball digs, handstands, karate moves and dancing. [sent-182, score-0.694]
66 As mentioned earlier, we choose DCT as the basis for the trajectory space. [sent-184, score-0.512]
67 In nonrigid structure from motion, the key relationship that determines successful reconstruction is the one between the degree of deformation of the object, measured by the number of basis k required to approximate it and the degree of camera motion. [sent-190, score-1.243]
68 To test the relationship between k, camera motion and reconstruction stability, we constructed Λ matrices using different values of k and synthetic rotations around the z-axis, at various magnitudes of motion per frame. [sent-191, score-1.201]
69 In Figure 3, the reconstruction stability, measured by the condition number of ΛT Λ, is shown as k is varied between 2 and 6, for 200, 400, and 800 frames (at different angular velocities per frame). [sent-192, score-0.167]
70 The plots confirm intuition: the smaller the degree of object deformation and the larger the camera motion, the more stable reconstruction tends to be. [sent-193, score-0.618]
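The experiment can be reproduced in spirit with a few lines of numpy: build Λ = RΘ for a camera revolving about the z-axis and report the condition number of Λ^T Λ. The block layouts reuse the assumptions of the earlier sketches, and the specific k and degrees-per-frame values below are illustrative.

```python
import numpy as np

def dct_basis(F, k):
    t = np.arange(F)
    D = np.cos(np.pi * np.outer(2 * t + 1, np.arange(k)) / (2 * F))
    D[:, 0] *= np.sqrt(1.0 / F)
    D[:, 1:] *= np.sqrt(2.0 / F)
    return D

def lambda_matrix(F, k, deg_per_frame):
    """Lambda = R Theta for a camera revolving about the z-axis (assumed layout)."""
    D = dct_basis(F, k)
    Lam = np.zeros((2 * F, 3 * k))
    for t in range(F):
        a = np.deg2rad(deg_per_frame * t)
        R_t = np.array([[np.cos(a), -np.sin(a), 0.0],
                        [np.sin(a),  np.cos(a), 0.0]])
        Lam[2 * t:2 * t + 2, :] = R_t @ np.kron(np.eye(3), D[t][None, :])
    return Lam

F = 400
for k in (2, 4, 6):
    for deg in (0.5, 1.0, 2.0, 5.0):
        Lam = lambda_matrix(F, k, deg)
        print(f"k={k}  {deg:4.1f} deg/frame  cond(L^T L) = {np.linalg.cond(Lam.T @ Lam):.3g}")
```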
71 For quantitative evaluation of reconstruction accuracy we used the drink, pickup, yoga, stretch, and dance actions from the CMU Mocap database, and the shark dataset of [3]. [sent-194, score-0.204]
72 Multiple rigid body data was generated by simulation of points on rigidly moving cubes. [sent-195, score-0.215]
73 We generated synthetic camera rotations and projected 3D data using these rotations to get image observations. [sent-196, score-0.609]
74 The camera rotation for the Mocap datasets was 5 degrees per frame and 2 degrees per frame for the multi-body dataset. [sent-197, score-0.668]
75 The X-coordinate trajectories for three different points on the actors are shown. [sent-265, score-0.155]
76 Figure 5: The dance sequence from the CMU mocap database. [sent-270, score-0.129]
77 The black dots are the ground truth points while the gray circles are the reconstructions by the three methods respectively. [sent-271, score-0.143]
78 We did not rotate the camera for the dance and shark sequences, since the object itself was rotating in these sequences. [sent-273, score-0.481]
79 In obtaining the results discussed below, k was chosen to provide the best reconstructions, the value varying between 2 and 13 depending on the length of the sequence and the nonrigidity of motion. [sent-274, score-0.118]
80 We normalize the structure, so that the average standard deviation of the structure matrix S becomes equal to unity (to make comparison of error across datasets more meaningful). [sent-275, score-0.151]
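A small sketch of that normalization; exactly which standard deviations are averaged is an assumption here, the point being only to put the error numbers of different datasets on a comparable scale.

```python
import numpy as np

def normalize_structure(S):
    """Rescale the 3F x P structure matrix S so the average standard
    deviation of its rows is one (assumed reading of the normalization)."""
    return S / S.std(axis=1).mean()

S = np.random.default_rng(4).standard_normal((3 * 50, 20)) * 7.3
print(normalize_structure(S).std(axis=1).mean())   # ~1.0
```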
81 Table 1 shows a quantitative comparison of our method with the shape basis approach of Torresani et al. [sent-276, score-0.463]
82 This table shows both the camera rotation estimation error and structure reconstruction error. [sent-278, score-0.551]
83 The estimated structure is valid up to a 3D rotation and translation and the estimated rotations also have a 3D rotation ambiguity. [sent-279, score-0.386]
84 Procrustes analysis was used for aligning camera rotations and the 3D structure. [sent-281, score-0.41]
85 The error measure for camera rotations was the average Frobenius norm difference between the original camera rotation and the estimated camera rotation. [sent-282, score-1.071]
86 For structure evaluation we compute the per frame mean squared error between original 3D points and the estimated 3D points. [sent-283, score-0.243]
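A sketch of this evaluation protocol: align the estimated structure to the ground truth with a single rotation found by orthogonal Procrustes (SVD), then report the Frobenius rotation error and the per-frame mean squared point error. The centering conventions and the way the alignment is applied to the cameras are assumptions of this sketch.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Rotation R (3x3) minimizing ||R @ X - Y||_F for centered 3xN point sets."""
    U, _, Vt = np.linalg.svd(Y @ X.T)
    if np.linalg.det(U @ Vt) < 0:      # keep a proper rotation
        U[:, -1] *= -1
    return U @ Vt

def evaluation_errors(S_est, S_true, R_est, R_true):
    """S_* are per-frame 3xP structures, R_* per-frame 2x3 truncated rotations."""
    S_est_c  = [S - S.mean(axis=1, keepdims=True) for S in S_est]
    S_true_c = [S - S.mean(axis=1, keepdims=True) for S in S_true]
    R_align = procrustes_rotation(np.hstack(S_est_c), np.hstack(S_true_c))

    # E_delta: per-frame mean squared error between aligned and true 3D points.
    e_delta = float(np.mean([np.mean(np.sum((R_align @ Se - St) ** 2, axis=0))
                             for Se, St in zip(S_est_c, S_true_c)]))
    # E_rot: average Frobenius difference between aligned and true rotations.
    e_rot = float(np.mean([np.linalg.norm(Re @ R_align.T - Rt)
                           for Re, Rt in zip(R_est, R_true)]))
    return e_rot, e_delta
```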
87 Finally, to test the proposed approach on real data, we used a face sequence from the PIE dataset, a sequence from the movie “The Matrix”, a sequence capturing two rigidly moving cubes and a sequence of a toy dinosaur moving nonrigidly. [sent-284, score-0.22]
88 We show the resulting reconstructions in Figure 6, and compare against the reconstructions obtained from Torresani et al. [sent-286, score-0.245]
89 The Erot is the average Frobenius difference between original rotations and aligned estimated rotations, and E∆ is the average distance between original 3D points and aligned reconstructed points. [Table 1: per-dataset errors Erot and E∆ for Drink, Pickup, Yoga, Stretch, Multi-Rigid, Dance, and Shark, comparing the trajectory-basis method against the shape-basis approach; the numeric entries are not recoverable from this extraction.] [sent-290, score-0.217]
90 Conclusion We describe an algorithm to reconstruct nonrigid structure of an object from 2D trajectories of points across a video sequence. [sent-326, score-0.723]
91 Unlike earlier approaches that require an object-specific shape basis to be estimated for each new video sequence, we demonstrate that a generic trajectory basis can be defined that can compactly represent the motion of a wide variety of real deformations. [sent-327, score-1.508]
92 Results are shown using the DCT basis to recover structures of piece-wise rigid motion, facial expressions, actors dancing, walking, and doing yoga. [sent-328, score-0.327]
93 Our experiments show that there is a relationship between camera motion, degree of object deformation, and reconstruction stability. [sent-329, score-0.483]
94 We observe that as the motion of the camera increases with respect to the degree of deformation, the reconstruction stability increases. [sent-330, score-0.805]
95 Future directions of research include experimenting with different unitary transform bases to verify that the DCT basis is, in fact, the best generic basis to use, and developing a synergistic approach that uses both shape and trajectory bases concurrently. [sent-331, score-1.136]
96 The motion capture data used in this project was obtained from http://mocap. [sent-339, score-0.335]
97 Shape and motion from image streams under orthography: A factorization method. [sent-346, score-0.436]
98 A closed form solution to non-rigid shape and motion recovery. [sent-364, score-0.52]
99 Nonrigid structure-from-motion: Estimating shape and motion with hierarchical priors. [sent-370, score-0.52]
100 Non-rigid structure from motion using ranklet-based tracking and non-linear optimization. [sent-411, score-0.424]
wordName wordTfidf (topN-words)
[('nonrigid', 0.359), ('motion', 0.335), ('camera', 0.287), ('trajectory', 0.287), ('torresani', 0.254), ('xiao', 0.237), ('basis', 0.225), ('shape', 0.185), ('dct', 0.166), ('deformation', 0.135), ('rotations', 0.123), ('compactly', 0.114), ('reconstructions', 0.096), ('unknowns', 0.092), ('kanade', 0.092), ('structure', 0.089), ('reconstruction', 0.088), ('rotation', 0.087), ('bregler', 0.085), ('erot', 0.085), ('object', 0.078), ('trajectories', 0.076), ('video', 0.074), ('dance', 0.074), ('frame', 0.074), ('rigid', 0.07), ('na', 0.066), ('bases', 0.066), ('stability', 0.065), ('axk', 0.063), ('hertzmann', 0.063), ('nonrigidity', 0.063), ('smiling', 0.063), ('tz', 0.063), ('zf', 0.063), ('matrix', 0.062), ('rf', 0.06), ('factorization', 0.059), ('shapes', 0.058), ('xf', 0.055), ('rigidly', 0.055), ('mocap', 0.055), ('varying', 0.055), ('et', 0.053), ('instantaneous', 0.051), ('deformations', 0.051), ('ty', 0.051), ('recovering', 0.049), ('transform', 0.048), ('mouth', 0.047), ('points', 0.047), ('frames', 0.046), ('ijcv', 0.045), ('instant', 0.045), ('moving', 0.043), ('akhter', 0.042), ('anew', 0.042), ('ayj', 0.042), ('azj', 0.042), ('deforming', 0.042), ('dinosaur', 0.042), ('lahore', 0.042), ('lums', 0.042), ('pakistan', 0.042), ('shark', 0.042), ('sheikh', 0.042), ('sohaib', 0.042), ('tomasi', 0.042), ('uf', 0.042), ('yaser', 0.042), ('yoga', 0.042), ('image', 0.042), ('cvpr', 0.041), ('yf', 0.041), ('degrees', 0.04), ('tx', 0.04), ('rank', 0.039), ('cosine', 0.038), ('duality', 0.038), ('vf', 0.037), ('cubes', 0.037), ('multibody', 0.037), ('axj', 0.037), ('wind', 0.037), ('dancing', 0.037), ('compact', 0.036), ('combination', 0.035), ('generic', 0.034), ('projected', 0.034), ('orthonormality', 0.034), ('per', 0.033), ('actors', 0.032), ('pie', 0.032), ('walking', 0.032), ('factorize', 0.032), ('recovery', 0.031), ('degree', 0.03), ('person', 0.03), ('represent', 0.029), ('recti', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space
Author: Ijaz Akhter, Yaser Sheikh, Sohaib Khan, Takeo Kanade
Abstract: Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This results in a significant reduction in unknowns, and corresponding stability in estimation. We report empirical performance, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing). 1
2 0.22405402 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns
Author: Shuang Wu, Hongjing Lu, Alan L. Yuille
Abstract: Psychophysical experiments show that humans are better at perceiving rotation and expansion than translation. These findings are inconsistent with standard models of motion integration which predict best performance for translation [6]. To explain this discrepancy, our theory formulates motion perception at two levels of inference: we first perform model selection between the competing models (e.g. translation, rotation, and expansion) and then estimate the velocity using the selected model. We define novel prior models for smooth rotation and expansion using techniques similar to those in the slow-and-smooth model [17] (e.g. Green functions of differential operators). The theory gives good agreement with the trends observed in human experiments. 1
3 0.17974588 118 nips-2008-Learning Transformational Invariants from Natural Movies
Author: Charles Cadieu, Bruno A. Olshausen
Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1
4 0.12875852 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries
Author: Silvia Chiappa, Jens Kober, Jan R. Peters
Abstract: Motor primitives or motion templates have become an important concept for both modeling human motor control as well as generating robot behaviors using imitation learning. Recent impressive results range from humanoid robot movement generation to timing models of human motions. The automatic generation of skill libraries containing multiple motion templates is an important step in robot learning. Such a skill learning system needs to cluster similar movements together and represent each resulting motion template as a generative model which is subsequently used for the execution of the behavior by a robot system. In this paper, we show how human trajectories captured as multi-dimensional time-series can be clustered using Bayesian mixtures of linear Gaussian state-space models based on the similarity of their dynamics. The appropriate number of templates is automatically determined by enforcing a parsimonious parametrization. As the resulting model is intractable, we introduce a novel approximation method based on variational Bayes, which is especially designed to enable the use of efficient inference algorithms. On recorded human Balero movements, this method is not only capable of finding reasonable motion templates but also yields a generative model which works well in the execution of this complex task on a simulated anthropomorphic SARCOS arm.
5 0.10635141 119 nips-2008-Learning a discriminative hidden part model for human action recognition
Author: Yang Wang, Greg Mori
Abstract: We present a discriminative part-based approach for human action recognition from video sequences using motion features. Our model is based on the recently proposed hidden conditional random field (hCRF) for object recognition. Similar to hCRF for object recognition, we model a human action by a flexible constellation of parts conditioned on image observations. Different from object recognition, our model combines both large-scale global features and local patch features to distinguish various actions. Our experimental results show that our model is comparable to other state-of-the-art approaches in action recognition. In particular, our experimental results demonstrate that combining large-scale global features and local patch features performs significantly better than directly applying hCRF on local patches alone. 1
6 0.10279786 95 nips-2008-Grouping Contours Via a Related Image
7 0.086588033 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
8 0.084037349 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
9 0.083013579 75 nips-2008-Estimating vector fields using sparse basis field expansions
10 0.080997713 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
11 0.077755623 23 nips-2008-An ideal observer model of infant object perception
12 0.060274046 62 nips-2008-Differentiable Sparse Coding
13 0.057193983 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
14 0.057049461 48 nips-2008-Clustering via LP-based Stabilities
15 0.055690959 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks
16 0.054981958 61 nips-2008-Diffeomorphic Dimensionality Reduction
17 0.052482922 200 nips-2008-Robust Kernel Principal Component Analysis
18 0.051376965 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
19 0.049941141 238 nips-2008-Theory of matching pursuit
20 0.048759189 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
topicId topicWeight
[(0, -0.147), (1, -0.028), (2, 0.109), (3, -0.059), (4, 0.102), (5, -0.001), (6, -0.149), (7, -0.108), (8, 0.098), (9, 0.033), (10, -0.064), (11, -0.069), (12, 0.054), (13, 0.209), (14, 0.083), (15, 0.032), (16, -0.116), (17, 0.147), (18, -0.062), (19, 0.039), (20, -0.168), (21, -0.088), (22, -0.077), (23, -0.224), (24, 0.046), (25, -0.007), (26, -0.08), (27, -0.115), (28, 0.038), (29, 0.029), (30, 0.067), (31, 0.015), (32, 0.08), (33, -0.063), (34, -0.024), (35, 0.083), (36, 0.061), (37, 0.023), (38, -0.011), (39, 0.02), (40, 0.005), (41, -0.036), (42, 0.056), (43, -0.038), (44, 0.009), (45, -0.072), (46, 0.063), (47, -0.077), (48, 0.083), (49, 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 0.96991479 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space
Author: Ijaz Akhter, Yaser Sheikh, Sohaib Khan, Takeo Kanade
Abstract: Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This results in a significant reduction in unknowns, and corresponding stability in estimation. We report empirical performance, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing). 1
2 0.79910856 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns
Author: Shuang Wu, Hongjing Lu, Alan L. Yuille
Abstract: Psychophysical experiments show that humans are better at perceiving rotation and expansion than translation. These findings are inconsistent with standard models of motion integration which predict best performance for translation [6]. To explain this discrepancy, our theory formulates motion perception at two levels of inference: we first perform model selection between the competing models (e.g. translation, rotation, and expansion) and then estimate the velocity using the selected model. We define novel prior models for smooth rotation and expansion using techniques similar to those in the slow-and-smooth model [17] (e.g. Green functions of differential operators). The theory gives good agreement with the trends observed in human experiments. 1
3 0.64115494 118 nips-2008-Learning Transformational Invariants from Natural Movies
Author: Charles Cadieu, Bruno A. Olshausen
Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1
4 0.5639323 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries
Author: Silvia Chiappa, Jens Kober, Jan R. Peters
Abstract: Motor primitives or motion templates have become an important concept for both modeling human motor control as well as generating robot behaviors using imitation learning. Recent impressive results range from humanoid robot movement generation to timing models of human motions. The automatic generation of skill libraries containing multiple motion templates is an important step in robot learning. Such a skill learning system needs to cluster similar movements together and represent each resulting motion template as a generative model which is subsequently used for the execution of the behavior by a robot system. In this paper, we show how human trajectories captured as multi-dimensional time-series can be clustered using Bayesian mixtures of linear Gaussian state-space models based on the similarity of their dynamics. The appropriate number of templates is automatically determined by enforcing a parsimonious parametrization. As the resulting model is intractable, we introduce a novel approximation method based on variational Bayes, which is especially designed to enable the use of efficient inference algorithms. On recorded human Balero movements, this method is not only capable of finding reasonable motion templates but also yields a generative model which works well in the execution of this complex task on a simulated anthropomorphic SARCOS arm.
5 0.53459483 23 nips-2008-An ideal observer model of infant object perception
Author: Charles Kemp, Fei Xu
Abstract: Before the age of 4 months, infants make inductive inferences about the motions of physical objects. Developmental psychologists have provided verbal accounts of the knowledge that supports these inferences, but often these accounts focus on categorical rather than probabilistic principles. We propose that infant object perception is guided in part by probabilistic principles like persistence: things tend to remain the same, and when they change they do so gradually. To illustrate this idea we develop an ideal observer model that incorporates probabilistic principles of rigidity and inertia. Like previous researchers, we suggest that rigid motions are expected from an early age, but we challenge the previous claim that the inertia principle is relatively slow to develop [1]. We support these arguments by modeling several experiments from the developmental literature. Over the past few decades, ingenious experiments [1, 2] have suggested that infants rely on systematic expectations about physical objects when interpreting visual scenes. Looking time studies suggest, for example, that infants expect objects to follow continuous trajectories through time and space, and understand that two objects cannot simultaneously occupy the same location. Many of these studies have been replicated several times, but there is still no consensus about the best way to characterize the knowledge that gives rise to these findings. Two main approaches can be found in the literature. The verbal approach uses natural language to characterize principles of object perception [1, 3]: for example, Spelke [4] proposes that object perception is consistent with principles including continuity (“a moving object traces exactly one connected path over space and time”) and cohesion (“a moving object maintains its connectedness and boundaries”). The mechanistic approach proposes that physical knowledge is better characterized by describing the mechanisms that give rise to behavior, and researchers working in this tradition often develop computational models that support their theoretical proposals [5]. We pursue a third approach—the ideal observer approach [6, 7, 8]—that combines aspects of both previous traditions. Like the verbal approach, our primary goal is to characterize principles that account for infant behavior, and we will not attempt to characterize the mechanisms that produce this behavior. Like the mechanistic approach, we emphasize the importance of formal models, and suggest that these models can capture forms of knowledge that are difficult for verbal accounts to handle. Ideal observer models [6, 9] specify the conclusions that normatively follow given a certain source of information and a body of background knowledge. These models can therefore address questions about the information and the knowledge that support perception. Approaches to the information question characterize the kinds of perceptual information that human observers use. For example, Geisler [9] discusses which components of the information available at the retina contribute to visual perception, and Banks and Shannon [10] use ideal observer models to study the perceptual consequences of immaturities in the retina. Approaches to the knowledge question characterize the background assumptions that are combined with the available input in order to make inductive inferences. For example, Weiss and Adelson [7] describe several empirical phenomena that are consistent with the a priori assumption that motions tend to be slow and smooth. 
There are few previous attempts to develop ideal observer models of infant perception, and most of them focus only on the information question [10]. This paper addresses the knowledge question, and proposes that the ideal observer approach can help to identify the minimal set of principles needed to account for the visual competence of young infants. Most verbal theories of object perception focus on categorical principles [4], or principles that make a single distinction between possible and impossible scenes. We propose that physical knowledge in infancy is also characterized by probabilistic principles, or expectations that make some possible scenes more surprising than others. We demonstrate the importance of probabilistic principles by focusing on two examples: the rigidity principle states that objects usually maintain their shape and size when they move, and the inertia principle states that objects tend to maintain the same pattern of motion over time. Both principles capture important regularities, but exceptions to these regularities are relatively common. Focusing on rigidity and inertia allows us to demonstrate two contributions that probabilistic approaches can make. First, probabilistic approaches can reinforce current proposals about infant perception. Spelke [3] suggests that rigidity is a core principle that guides object perception from a very early age, and we demonstrate how this idea can be captured by a model that also tolerates exceptions, such as non-rigid biological motion. Second, probabilistic approaches can identify places where existing proposals may need to be revised. Spelke [3] argues that the principle of inertia is slow to develop, but we suggest that a probabilistic version of this principle can help to account for inferences made early in development. 1 An ideal observer approach An ideal observer approach to object perception can be formulated in terms of a generative model for scenes. Scenes can be generated in three steps. First we choose the number n of objects that will appear in the scene, and generate the shape, visual appearance, and initial location of each object. We then choose a velocity field for each object which specifies how the object moves and changes shape over time. Finally, we create a visual scene by taking a two-dimensional projection of the moving objects generated in the two previous steps. An ideal observer approach explores the idea that the inferences made by infants approximate the optimal inferences with respect to this generative model. We work within this general framework but make two simplifications. We will not discuss how the shapes and visual appearances of objects are generated, and we make the projection step simple by working with a two-dimensional world. These simplifications allow us to focus on the expectations about velocity fields that guide motion perception in infants. The next two sections present two prior distributions that can be used to generate velocity fields. The first is a baseline prior that does not incorporate probabilistic principles, and the second incorporates probabilistic versions of rigidity and inertia. The two priors capture different kinds of knowledge, and we argue that the second provides the more accurate characterization of the knowledge that infants bring to object perception. 1.1 A baseline prior on velocity fields Our baseline prior is founded on five categorical principles that are closely related to principles discussed by Spelke [3, 4]. 
The principles we consider rely on three basic notions: space, time, and matter. We also refer to particles, which are small pieces of matter that occupy space-time points. Particles satisfy several principles: C1. Temporal continuity. Particles are not created or destroyed. In other words, every particle that exists at time t1 must also exist at time t2 . C2. Spatial continuity. Each particle traces a continuous trajectory through space. C3. Exclusion. No two particles may occupy the same space-time point. An object is a collection of particles, and these collections satisfy two principles: C4. Discreteness. Each particle belongs to exactly one object. C5. Cohesion. At each point in time, the particles belonging to an object occupy a single connected region of space. Suppose that we are interested in a space-time window specified by a bounded region of space and a bounded interval of time. For simplicity, we will assume that space is two-dimensional, and that the space-time window corresponds to the unit cube. Suppose that a velocity field v assigns a velocity (vx , vy ) to each particle in the space-time window, and let vi be the field created by considering only particles that belong to object i. We develop a theory of object perception by defining a prior distribution p(v) on velocity fields. Consider first the distribution p(v1 ) on fields for a single object. Any field that violates one or more of principles C1–C5 is assigned zero probability. For instance, fields where part of an object winks out of existence violate the principle of temporal continuity, and fields where an object splits into two distinct pieces violate the principle of cohesion. Many fields, however, remain, including fields that specify non-rigid motions and jagged trajectories. For now, assume that we are working with a space of fields that is bounded but very large, and that the prior distribution over this space is uniform for all fields consistent with principles C1–C5: p(v1 ) ∝ f (v1 ) = 0 1 if v1 violates C1–C5 otherwise. (1) Consider now the distribution p(v1 , v2 ) on fields for pairs of objects. Principles C1 through C5 rule out some of these fields, but again we must specify a prior distribution on those that remain. Our prior is induced by the following principle: C6. Independence. Velocity fields for multiple objects are independently generated subject to principles C1 through C5. More formally, the independence principle specifies how the prior for the multiple object case is related to the prior p(v1 ) on velocity fields for a single object (Equation 1): p(v1 , . . . , vn ) ∝ f (v1 , . . . , vn ) = 0 if {vi } collectively violate C1–C5 f (v1 ) . . . f (vn ) otherwise. (2) 1.2 A smoothness prior on velocity fields We now develop a prior p(v) that incorporates probabilistic expectations about the motion of physical objects. Consider again the prior p(v1 ) on the velocity field v1 of a single object. Principles C1–C5 make a single cut that distinguishes possible from impossible fields, but we need to consider whether infants have additional knowledge that makes some of the possible fields less surprising than others. One informal idea that seems relevant is the notion of persistence[11]: things tend to remain the same, and when they change they do so gradually. We focus on two versions of this idea that may guide expectations about velocity fields: S1. Spatial smoothness. Velocity fields tend to be smooth in space. S2. Temporal smoothness. Velocity fields tend to be smooth in time. 
A field is “smooth in space” if neighboring particles tend to have similar velocities at any instant of time. The smoothest possible field will be one where all particles have the same velocity at any instant—in other words, where an object moves rigidly. The principle of spatial smoothness therefore captures the idea that objects tend to maintain the same shape and size. A field is “smooth in time” if any particle tends to have similar velocities at nearby instants of time. The smoothest possible field will be one where each particle maintains the same velocity throughout the entire interval of interest. The principle of temporal smoothness therefore captures the idea that objects tend to maintain their initial pattern of motion. For instance, stationary objects tend to remain stationary, moving objects tend to keep moving, and a moving object following a given trajectory tends to continue along that trajectory. Principles S1 and S2 are related to two principles— rigidity and inertia—that have been discussed in the developmental literature. The rigidity principle states that objects “tend to maintain their size and shape over motion”[3], and the inertia principle states that objects move smoothly in the absence of obstacles [4]. Some authors treat these principles rather differently: for instance, Spelke suggests that rigidity is one of the core principles that guides object perception from a very early age [3], but that the principle of inertia is slow to develop and is weak or fragile once acquired. Since principles S1 and S2 seem closely related, the suggestion that one develops much later than the other seems counterintuitive. The rest of this paper explores the idea that both of these principles are needed to characterize infant perception. Our arguments will be supported by formal analyses, and we therefore need formal versions of S1 and S2. There may be different ways to formalize these principles, but we present a simple L1 L2 U b) 200 L1 L2 U 0 log “ p(H1 |v) p(H2 |v) ” a) −200 baseline smoothness Figure 1: (a) Three scenes inspired by the experiments of Spelke and colleagues [12, 13]. Each scene can be interpreted as a single object, or as a small object on top of a larger object. (b) Relative preferences for the one-object and two-object interpretations according to two models. The baseline model prefers the one-object interpretation in all three cases, but the smoothness model prefers the one-object interpretation only for scenes L1 and L2. approach that builds on existing models of motion perception in adults [7, 8]. We define measures of instantaneous roughness that capture how rapidly a velocity field v varies in space and time: Rspace (v, t) = ∂v(x, y, t) ∂x 2 ∂v(x, y, t) ∂t 1 vol(O(t)) 2 + ∂v(x, y, t) ∂y 2 dxdy (3) O(t) Rtime (v, t) = 1 vol(O(t)) dxdy (4) O(t) where O(t) is the set of all points that are occupied by the object at time t, and vol(O(t)) is the volume of the object at time t. Rspace (v, t) will be large if neighboring particles at time t tend to have different velocities, and Rtime (v, t) will be large if many particles are accelerating at time t. We combine our two roughness measures to create a single smoothness function S(·) that measures the smoothness of a velocity field: S(v) = −λspace Rspace (v, t)dt − λtime Rtime (v, t)dt (5) where λspace and λtime are positive weights that capture the importance of spatial smoothness and temporal smoothness. 
For all analyses in this paper we set λspace = 10000 and λtime = 250, which implies that violations of spatial smoothness are penalized more harshly than violations of temporal smoothness. We now replace Equation 1 with a prior on velocity fields that takes smoothness into account: 0 if v1 violates C1–C5 p(v1 ) ∝ f (v1 ) = (6) exp (S(v1 )) otherwise. Combining Equation 6 with Equation 2 specifies a model of object perception that incorporates probabilistic principles of rigidity and inertia. 2 Empirical findings: spatial smoothness There are many experiments where infants aged 4 months and younger appear to make inferences that are consistent with the principle of rigidity. This section suggests that the principle of spatial smoothness can account for these results. We therefore propose that a probabilistic principle (spatial smoothness) can explain all of the findings previously presented in support of a categorical principle (rigidity), and can help in addition to explain how infants perceive non-rigid motion. One set of studies explores inferences about the number of objects in a scene. When a smaller block is resting on top of a larger block (L1 in Figure 1a), 3-month-olds infer that the scene includes a single object [12]. The same result holds when the small and large blocks are both moving in the same direction (L2 in Figure 1a) [13]. When these blocks are moving in opposite directions (U in Figure 1a), however, infants appear to infer that the scene contains two objects [13]. Results like these suggest that infants may have a default expectation that objects tend to move rigidly. We compared the predictions made by two models about the scenes in Figure 1a. The smoothness model uses a prior p(v1 ) that incorporates principles S1 and S2 (Equation 6), and the baseline model is identical except that it sets λspace = λtime = 0. Both models therefore incorporate principles C1– C6, but only the smoothness model captures the principle of spatial smoothness. Given any of the scenes in Figure 1a, an infant must solve two problems: she must compute the velocity field v for the scene and must decide whether this field specifies the motion of one or two objects. Here we focus on the second problem, and assume that the infant’s perceptual system has already computed a veridical velocity field for each scene that we consider. In principle, however, the smoothness prior in Equation 6 can address both problems. Previous authors have shown how smoothness priors can be used to compute velocity fields given raw image data [7, 8]. Let H1 be the hypothesis that a given velocity field corresponds to a single object, and let H2 be the hypothesis that the field specifies the motions of two objects. We assume that the prior probabilities of these hypotheses are equal, and that P (H1 ) = P (H2 ) = 0.5. An ideal observer can use the posterior odds ratio to choose between these hypotheses: P (H1 |v) P (v|H1 ) P (H1 ) = ≈ P (H2 |v) P (v|H2 ) P (H2 ) f (v) f (v1 )dv1 f (v1 , v2 )dv1 dv2 f (vA , vB ) (7) Equation 7 follows from Equations 2 and 6, and from approximating P (v|H2 ) by considering only the two object interpretation (vA , vB ) with maximum posterior probability. For each scene in Figure 1a, the best two object interpretation will specify a field vA for the small upper block, and a field vB for the large lower block. To approximate the posterior odds ratio in Equation 7 we compute rough approximations of f (v1 )dv1 and f (v1 , v2 )dv1 dv2 by summing over a finite space of velocity fields. 
As described in the supporting material, we consider all fields that can be built from objects with 5 possible shapes, 900 possible starting locations, and 10 possible trajectories. For computational tractability, we convert each continuous velocity field to a discrete field defined over a space-time grid with 45 cells along each spatial dimension and 21 cells along the temporal dimension. Our results show that both models prefer the one-object hypothesis H1 when presented with scenes L1 and L2 (Figure 1b). Since there are many more two-object scenes than one-object scenes, any typical two-object interpretation is assigned lower prior probability than a typical one-object interpretation. This preference for simpler interpretations is a consequence of the Bayesian Occam’s razor. The baseline model makes the same kind of inference about scene U, and again prefers the one-object interpretation. Like infants, however, the smoothness model prefers the two-object interpretation of scene U. This model assigns low probability to a one-object interpretation where adjacent points on the object have very different velocities, and this preference for smooth motion is strong enough to overcome the simplicity preference that makes the difference when interpreting the other two scenes. Other experiments from the developmental literature have produced results consistent with the principle of spatial smoothness. For example, 3.5-month olds are surprised when a tall object is fully hidden behind a short screen, 4 month olds are surprised when a large object appears to pass through a small slot, and 4.5-month olds expect a swinging screen to be interrupted when an object is placed in its path [1, 2]. All three inferences appear to rely on the expectation that objects tend not to shrink or to compress like foam rubber. Many of these experiments are consistent with an account that simply rules out non-rigid motion instead of introducing a graded preference for spatial smoothness. Biological motions, however, are typically non-rigid, and experiments suggest that infants can track and make inferences about objects that follow non-rigid trajectories [14]. Findings like these call for a theory like ours that incorporates a preference for rigid motion, but recognizes that non-rigid motions are possible. 3 Empirical findings: temporal smoothness We now turn to the principle of temporal smoothness (S2) and discuss some of the experimental evidence that bears on this principle. Some researchers suggest that a closely related principle (inertia) is slow to develop, but we argue that expectations about temporal smoothness are needed to capture inferences made before the age of 4 months. Baillargeon and DeVos [15] describe one relevant experiment that explores inferences about moving objects and obstacles. During habituation, 3.5-month-old infants saw a car pass behind an occluder and emerge from the other side (habituation stimulus H in Figure 2a). An obstacle was then placed in the direct path of the car (unlikely scenes U1 and U2) or beside this direct path (likely scene L), and the infants again saw the car pass behind the occluder and emerge from the other side. Looking L U1 U2 p(L) p(U 1) ” log “ p(L) p(U 2) ” 400 H “ 600 a) log log “ pH (L) pH (U 1) ” log “ pH (L) pH (U 2) ” b) 200 X X X baseline 0 smoothness Figure 2: (a) Stimuli inspired by the experiments of [15]. The habituation stimulus H shows a block passing behind a barrier and emerging on the other side. 
After habituation, a new block is added either out of the direct path of the first block (L) or directly in the path of the first block (U1 and U2). In U1, the first block leaps over the second block, and in U2 the second block hops so that the first block can pass underneath. (b) Relative probabilities of scenes L, U1 and U2 according to two models. The baseline model finds all three scenes equally likely a priori, and considers L and U2 equally likely after habituation. The smoothness model considers L more likely than the other scenes both before and after habituation. a) H1 H2 L U b) ” log p(L) p(U ) 300 log “ pH1 (L) pH1 (U ) ” 200 c) “ log “ pH2 (L) pH2 (U ) ” 100 0 −100 X X baseline smoothness Figure 3: (a) Stimuli inspired by the experiments of Spelke et al. [16]. (b) Model predictions. After habituation to H1, the smoothness model assigns roughly equal probabilities to L and U. After habituation to H2, the model considers L more likely. (c) A stronger test of the inertia principle. Now the best interpretation of stimulus U involves multiple changes of direction. time measurements suggested that the infants were more surprised to see the car emerge when the obstacle lay within the direct path of the car. This result is consistent with the principle of temporal smoothness, which suggests that infants expected the car to maintain a straight-line trajectory, and the obstacle to remain stationary. We compared the smoothness model and the baseline model on a schematic version of this task. To model this experiment, we again assume that the infant’s perceptual system has recovered a veridical velocity field, but now we must allow for occlusion. An ideal observer approach that treats a two dimensional scene as a projection of a three dimensional world can represent the occluder as an object in its own right. Here, however, we continue to work with a two dimensional world, and treat the occluded parts of the scene as missing data. An ideal observer approach should integrate over all possible values of the missing data, but for computational simplicity we approximate this approach by considering only one or two high-probability interpretations of each occluded scene. We also need to account for habituation, and for cases where the habituation stimulus includes occlusion. We assume that an ideal observer computes a habituation field vH , or the velocity field with maximum posterior probability given the habituation stimulus. In Figure 2a, the inferred habituation field vH specifies a trajectory where the block moves smoothly from the left to the right of the scene. We now assume that the observer expects subsequent velocity fields to be similar to vH . Formally, we use a product-of-experts approach to define a post-habituation distribution on velocity fields: pH (v) ∝ p(v)p(v|vH ) (8) The first expert p(v) uses the prior distribution in Equation 6, and the second expert p(v|vH ) assumes that field v is drawn from a Gaussian distribution centered on vH . Intuitively, after habituation to vH the second expert expects that subsequent velocity fields will be similar to vH . More information about this model of habituation is provided in the supporting material. Given these assumptions, the black and dark gray bars in Figure 2 indicate relative a priori probabilities for scenes L, U1 and U2. The baseline model considers all three scenes equally probable, but the smoothness model prefers L. 
After habituation, the baseline model is still unable to account for the behavioral data, since it considers scenes L and U2 to be equally probable. The smoothness model, however, continues to prefer L. We previously mentioned three consequences of the principle of temporal smoothness: stationary objects tend to remain stationary, moving objects tend to keep moving, and moving objects tend to maintain a steady trajectory. The “car and obstacle” task addresses the first and third of these proposals, but other tasks provide support for the second. Many authors have studied settings where one moving object comes to a stop, and a second object starts to move [17]. Compared to the case where the first object collides with the second, infants appear to be surprised by the “no-contact” case where the two objects never touch. This finding is consistent with the temporal smoothness principle, which predicts that infants expect the first object to continue moving until forced to stop, and expect the second object to remain stationary until forced to start. Other experiments [18] provide support for the principle of temporal smoothness, but there are also studies that appear inconsistent with this principle. In one of these studies [16], infants are initially habituated to a block that moves from one corner of an enclosure to another (H1 in Figure 3a). After habituation, infants see a block that begins from a different corner, and now the occluder is removed to reveal the block in a location consistent with a straight-line trajectory (L) or in a location that matches the final resting place during the habituation phase (U). Looking times suggest that infants aged 4-12 months are no more surprised by the inertia-violating outcome (U) than the inertia-consistent outcome (L). The smoothness model, however, can account for this finding. The outcome in U is contrary to temporal smoothness but consistent with habituation, and the tradeoff between these factors leads the model to assign roughly the same probability to scenes L and U (Figure 3b). Only one of the inertia experiments described by Spelke et al. [16] and Spelke et al. [1] avoids this tradeoff between habituation and smoothness. This experiment considers a case where the habituation stimulus (H2 in Figure 3a) is equally similar to the two test stimuli. The results suggest that 8 month olds are now surprised by the inertia-violating outcome, and the predictions of our model are consistent with this finding (Figure 3b). 4 and 6 month olds, however, continue to look equally at the two outcomes. Note, however, that the trajectories in Figure 3 include at most one inflection point. Experiments that consider trajectories with many inflection points can provide a more powerful way of exploring whether 4 month olds have expectations about temporal smoothness. One possible experiment is sketched in Figure 3c. The task is very similar to the task in Figure 3a, except that a barrier is added after habituation. In order for the block to end up in the same location as before, it must now follow a tortuous path around the barrier (U). Based on the principle of temporal smoothness, we predict that 4-month-olds will be more surprised to see the outcome in stimulus U than the outcome in stimulus L. This experimental design is appealing in part because previous work shows that infants are surprised by a case similar to U where the barrier extends all the way from one wall to the other [16], and our proposed experiment is a minor variant of this task. 
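As a small illustration (ours, with an arbitrary lambda_time) of why the principle of temporal smoothness penalizes the path in Figure 3c, the toy computation below charges a cost for every change in velocity, so a trajectory with many inflection points scores far worse than a straight-line trajectory between the same endpoints.

```python
import numpy as np

# Toy temporal cost for a single object's trajectory: every change in velocity
# is penalized, so the detour around the barrier in stimulus U of Figure 3c is
# much more expensive than the straight-line path in stimulus L.

def temporal_cost(points, lambda_time=1.0):
    # points: (T, 2) positions of one tracked object over time.
    velocities = np.diff(points, axis=0)          # (T-1, 2)
    accelerations = np.diff(velocities, axis=0)   # (T-2, 2): changes of direction/speed
    return lambda_time * float(np.sum(accelerations ** 2))

straight = np.column_stack([np.linspace(0, 10, 11), np.zeros(11)])
zigzag = np.column_stack([np.linspace(0, 10, 11),
                          np.array([0, 2, 0, 2, 0, 2, 0, 2, 0, 2, 0], float)])
print(temporal_cost(straight))   # 0.0: constant velocity, no penalty
print(temporal_cost(zigzag))     # large: repeated changes of direction
```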
Although there is room for debate about the status of temporal smoothness, we presented two reasons to revisit the conclusion that this principle develops relatively late. First, some version of this principle seems necessary to account for experiments like the car and obstacle experiment in Figure 2. Second, most of the inertia experiments that produced null results use a habituation stimulus which may have prevented infants from revealing their default expectations, and the one experiment that escapes this objection considers a relatively minor violation of temporal smoothness. Additional experiments are needed to explore this principle, but we predict that the inertia principle will turn out to be yet another example of knowledge that is available earlier than researchers once thought.

4 Discussion and Conclusion

We argued that characterizations of infant knowledge should include room for probabilistic expectations, and that probabilistic expectations about spatial and temporal smoothness appear to play a role in infant object perception. To support these claims we described an ideal observer model that includes both categorical (C1 through C5) and probabilistic principles (S1 and S2), and demonstrated that the categorical principles alone are insufficient to account for several experimental findings. Our two probabilistic principles are related to principles (rigidity and inertia) that have previously been described as categorical principles. Although rigidity and inertia appear to play a role in some early inferences, formulating these principles as probabilistic expectations helps to explain how infants deal with non-rigid motion and violations of inertia.

Our analysis focused on some of the many existing experiments in the developmental literature, but new experiments will be needed to explore our probabilistic approach in depth. Categorical versions of a given principle (e.g. rigidity) allow room for only two kinds of behavior depending on whether the principle is violated or not. Probabilistic principles can be violated to a greater or lesser extent, and our approach predicts that violations of different magnitude may lead to different behaviors. Future studies of rigidity and inertia can consider violations of these principles that range from mild (Figure 3a) to severe (Figure 3c), and can explore whether infants respond to these violations differently. Future work should also consider whether the categorical principles we described (C1 through C5) are better characterized as probabilistic expectations. In particular, future studies can explore whether young infants consider large violations of cohesion (C5) or spatial continuity (C2) more surprising than smaller violations of these principles.

Although we did not focus on learning, our approach allows us to begin thinking formally about how principles of object perception might be acquired. First, we can explore how parameters like the smoothness parameters in our model (λspace and λtime) might be tuned by experience. Second, we can use statistical model selection to explore transitions between different sets of principles. For instance, if a learner begins with the baseline model we considered (principles C1–C6), we can explore which subsequent observations provide the strongest statistical evidence for smoothness principles S1 and S2, and how much of this evidence is required before an ideal learner would prefer our smoothness model over the baseline model. It is not yet clear which principles of object perception could be learned, but the ideal observer approach can help to resolve this question.
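A minimal sketch of this model-selection idea (ours, not from the paper): accumulate the log-likelihood ratio, a Bayes factor, between the smoothness model and the baseline model as an ideal learner observes velocity fields. The two likelihood functions are placeholders for properly normalized model likelihoods, and the decision threshold is an arbitrary illustrative choice.

```python
import numpy as np

# Track the accumulated evidence for the smoothness model (principles S1, S2)
# over the baseline model as velocity fields are observed, and report how many
# observations are needed before an ideal learner would switch.

def evidence_for_smoothness(observed_fields, loglik_smooth, loglik_baseline,
                            threshold=np.log(100.0)):
    log_bayes_factor = 0.0
    for n, v in enumerate(observed_fields, start=1):
        log_bayes_factor += loglik_smooth(v) - loglik_baseline(v)
        if log_bayes_factor > threshold:
            # Enough evidence to prefer the model with smoothness principles.
            return n, log_bayes_factor
    return None, log_bayes_factor
```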
References

[1] E. S. Spelke, K. Breinlinger, J. Macomber, and K. Jacobson. Origins of knowledge. Psychological Review, 99:605–632, 1992.
[2] R. Baillargeon, L. Kotovsky, and A. Needham. The acquisition of physical knowledge in infancy. In D. Sperber, D. Premack, and A. J. Premack, editors, Causal Cognition: A multidisciplinary debate, pages 79–116. Clarendon Press, Oxford, 1995.
[3] E. S. Spelke. Principles of object perception. Cognitive Science, 14:29–56, 1990.
[4] E. Spelke. Initial knowledge: six suggestions. Cognition, 50:431–445, 1994.
[5] D. Mareschal and S. P. Johnson. Learning to perceive object unity: a connectionist account. Developmental Science, 5:151–172, 2002.
[6] D. Kersten and A. Yuille. Bayesian models of object perception. Current Opinion in Neurobiology, 13:150–158, 2003.
[7] Y. Weiss and E. H. Adelson. Slow and smooth: a Bayesian theory for the combination of local motion signals in human vision. Technical Report A.I. Memo No. 1624, MIT, 1998.
[8] A. L. Yuille and N. M. Grzywacz. A mathematical analysis of the motion coherence theory. International Journal of Computer Vision, 3:155–175, 1989.
[9] W. S. Geisler. Physical limits of acuity and hyperacuity. Journal of the Optical Society of America, 1(7):775–782, 1984.
[10] M. S. Banks and E. Shannon. Spatial and chromatic visual efficiency in human neonates. In Visual perception and cognition in infancy, pages 1–46. Lawrence Erlbaum Associates, Hillsdale, NJ, 1993.
[11] R. Baillargeon. Innate ideas revisited: for a principle of persistence in infants’ physical reasoning. Perspectives on Psychological Science, 3(3):2–13, 2008.
[12] R. Kestenbaum, N. Termine, and E. S. Spelke. Perception of objects and object boundaries by three-month-old infants. British Journal of Developmental Psychology, 5:367–383, 1987.
[13] E. S. Spelke, C. von Hofsten, and R. Kestenbaum. Object perception and object-directed reaching in infancy: interaction of spatial and kinetic information for object boundaries. Developmental Psychology, 25:185–196, 1989.
[14] G. Huntley-Fenner, S. Carey, and A. Solimando. Objects are individuals but stuff doesn’t count: perceived rigidity and cohesiveness influence infants’ representations of small groups of discrete entities. Cognition, 85:203–221, 2002.
[15] R. Baillargeon and J. DeVos. Object permanence in young infants: further evidence. Child Development, 61(6):1227–1246, 1991.
[16] E. S. Spelke, G. Katz, S. E. Purcell, S. M. Ehrlich, and K. Breinlinger. Early knowledge of object motion: continuity and inertia. Cognition, 51:131–176, 1994.
[17] L. Kotovsky and R. Baillargeon. Reasoning about collisions involving inert objects in 7.5-month-old infants. Developmental Science, 3(3):344–359, 2000.
[18] T. Wilcox and A. Schweinle. Infants’ use of speed information to individuate objects in occlusion events. Infant Behavior and Development, 26:253–282, 2003.
6 0.47808227 119 nips-2008-Learning a discriminative hidden part model for human action recognition
7 0.44846457 95 nips-2008-Grouping Contours Via a Related Image
8 0.36150903 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks
9 0.35409826 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
10 0.35290161 61 nips-2008-Diffeomorphic Dimensionality Reduction
11 0.32716179 75 nips-2008-Estimating vector fields using sparse basis field expansions
12 0.31032172 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
13 0.30669919 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
14 0.29491553 66 nips-2008-Dynamic visual attention: searching for coding length increments
15 0.27983907 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
16 0.27473328 90 nips-2008-Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity
17 0.26224807 106 nips-2008-Inferring rankings under constrained sensing
18 0.25503969 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
19 0.24307074 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics
20 0.23737293 147 nips-2008-Multiscale Random Fields with Application to Contour Grouping
topicId topicWeight
[(6, 0.031), (7, 0.048), (12, 0.604), (28, 0.096), (57, 0.041), (59, 0.016), (63, 0.016), (77, 0.026), (83, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.89959103 157 nips-2008-Nonrigid Structure from Motion in Trajectory Space
Author: Ijaz Akhter, Yaser Sheikh, Sohaib Khan, Takeo Kanade
Abstract: Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This results in a significant reduction in unknowns, and corresponding stability in estimation. We report empirical performance, quantitatively using motion capture data, and qualitatively on several video sequences exhibiting nonrigid motions including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing). 1
2 0.83887553 139 nips-2008-Modeling the effects of memory on human online sentence processing with particle filters
Author: Roger P. Levy, Florencia Reali, Thomas L. Griffiths
Abstract: Language comprehension in humans is significantly constrained by memory, yet rapid, highly incremental, and capable of utilizing a wide range of contextual information to resolve ambiguity and form expectations about future input. In contrast, most of the leading psycholinguistic models and fielded algorithms for natural language parsing are non-incremental, have run time superlinear in input length, and/or enforce structural locality constraints on probabilistic dependencies between events. We present a new limited-memory model of sentence comprehension which involves an adaptation of the particle filter, a sequential Monte Carlo method, to the problem of incremental parsing. We show that this model can reproduce classic results in online sentence comprehension, and that it naturally provides the first rational account of an outstanding problem in psycholinguistics, in which the preferred alternative in a syntactic ambiguity seems to grow more attractive over time even in the absence of strong disambiguating information. 1
3 0.76154476 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
Author: Geremy Heitz, Gal Elidan, Benjamin Packer, Daphne Koller
Abstract: Discriminative tasks, including object categorization and detection, are central components of high-level computer vision. Sometimes, however, we are interested in more refined aspects of the object in an image, such as pose or particular regions. In this paper we develop a method (LOOPS) for learning a shape and image feature model that can be trained on a particular object class, and used to outline instances of the class in novel images. Furthermore, while the training data consists of uncorresponded outlines, the resulting LOOPS model contains a set of landmark points that appear consistently across instances, and can be accurately localized in an image. Our model achieves state-of-the-art results in precisely outlining objects that exhibit large deformations and articulations in cluttered natural images. These localizations can then be used to address a range of tasks, including descriptive classification, search, and clustering. 1
4 0.75563163 170 nips-2008-Online Optimization in X-Armed Bandits
Author: Sébastien Bubeck, Gilles Stoltz, Csaba Szepesvári, Rémi Munos
Abstract: We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space. We constrain the mean-payoff function with a dissimilarity function over X in a way that is more general than Lipschitz. We construct an arm selection policy whose regret improves upon previous results for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Hölder with a known exponent, then the expected regret is bounded up to a logarithmic factor by √n, i.e., the rate of the growth of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider.

1 Introduction and motivation

Bandit problems arise in many settings, including clinical trials, scheduling, on-line parameter tuning of algorithms or optimization of controllers based on simulations. In the classical bandit problem there are a finite number of arms that the decision maker can select at discrete time steps. Selecting an arm results in a random reward, whose distribution is determined by the identity of the arm selected. The distributions associated with the arms are unknown to the decision maker whose goal is to maximize the expected sum of the rewards received. In many practical situations the arms belong to a large set. This set could be continuous [1; 6; 3; 2; 7], hybrid-continuous, or it could be the space of infinite sequences over a finite alphabet [4]. In this paper we consider stochastic bandit problems where the set of arms, X, is allowed to be an arbitrary topological space. We assume that the decision maker knows a dissimilarity function defined over this space that constrains the shape of the mean-payoff function. In particular, the dissimilarity function is assumed to put a lower bound on the mean-payoff function from below at each maxima. We also assume that the decision maker is able to cover the space of arms in a recursive manner, successively refining the regions in the covering such that the diameters of these sets shrink at a known geometric rate when measured with the dissimilarity.

* Csaba Szepesvári is on leave from MTA SZTAKI. He also greatly acknowledges the support received from the Alberta Ingenuity Fund, iCore and NSERC.
[7], our work represents an improvement in the fact that just like Auer et al. [2] we make use of the local properties of the mean-payoff function around the maxima only, and not a global property, such as Lipschitzness in √ the whole space. This allows us to obtain a regret which scales as O( n) 1 when e.g. the space is the unit hypercube and the mean-payoff function is locally H¨ lder with known exponent in the neighborhood of any o maxima (which are in finite number) and bounded away from the maxima outside of these neighborhoods. Thus, we get the desirable property that the rate of growth of the regret is independent of the dimensionality of the input space. We also prove a minimax lower bound that matches our upper bound up to logarithmic factors, showing that the performance of our algorithm is essentially unimprovable in a minimax sense. Besides these theoretical advances the algorithm is anytime and easy to implement. Since it is based on ideas that have proved to be efficient, we expect it to perform well in practice and to make a significant impact on how on-line global optimization is performed. 2 Problem setup, notation We consider a topological space X , whose elements will be referred to as arms. A decision maker “pulls” the arms in X one at a time at discrete time steps. Each pull results in a reward that depends on the arm chosen and which the decision maker learns of. The goal of the decision maker is to choose the arms so as to maximize the sum of the rewards that he receives. In this paper we are concerned with stochastic environments. Such an environment M associates to each arm x ∈ X a distribution Mx on the real line. The support of these distributions is assumed to be uniformly bounded with a known bound. For the sake of simplicity, we assume this bound is 1. We denote by f (x) the expectation of Mx , which is assumed to be measurable (all measurability concepts are with respect to the Borel-algebra over X ). The function f : X → R thus defined is called the mean-payoff function. When in round n the decision maker pulls arm Xn ∈ X , he receives a reward Yn drawn from MXn , independently of the past arm choices and rewards. A pulling strategy of a decision maker is determined by a sequence ϕ = (ϕn )n≥1 of measurable mappings, n−1 where each ϕn maps the history space Hn = X × [0, 1] to the space of probability measures over X . By convention, ϕ1 does not take any argument. A strategy is deterministic if for every n the range of ϕn contains only Dirac distributions. According to the process that was already informally described, a pulling strategy ϕ and an environment M jointly determine a random process (X1 , Y1 , X2 , Y2 , . . .) in the following way: In round one, the decision maker draws an arm X1 at random from ϕ1 and gets a payoff Y1 drawn from MX1 . In round n ≥ 2, first, Xn is drawn at random according to ϕn (X1 , Y1 , . . . , Xn−1 , Yn−1 ), but otherwise independently of the past. Then the decision maker gets a rewards Yn drawn from MXn , independently of all other random variables in the past given Xn . Let f ∗ = supx∈X f (x) be the maximal expected payoff. The cumulative regret of a pulling strategy in n n environment M is Rn = n f ∗ − t=1 Yt , and the cumulative pseudo-regret is Rn = n f ∗ − t=1 f (Xt ). 1 We write un = O(vu ) when un = O(vn ) up to a logarithmic factor. 2 In the sequel, we restrict our attention to the expected regret E [Rn ], which in fact equals E[Rn ], as can be seen by the application of the tower rule. 
3 3.1 The Hierarchical Optimistic Optimization (HOO) strategy Trees of coverings We first introduce the notion of a tree of coverings. Our algorithm will require such a tree as an input. Definition 1 (Tree of coverings). A tree of coverings is a family of measurable subsets (Ph,i )1≤i≤2h , h≥0 of X such that for all fixed integer h ≥ 0, the covering ∪1≤i≤2h Ph,i = X holds. Moreover, the elements of the covering are obtained recursively: each subset Ph,i is covered by the two subsets Ph+1,2i−1 and Ph+1,2i . A tree of coverings can be represented, as the name suggests, by a binary tree T . The whole domain X = P0,1 corresponds to the root of the tree and Ph,i corresponds to the i–th node of depth h, which will be referred to as node (h, i) in the sequel. The fact that each Ph,i is covered by the two subsets Ph+1,2i−1 and Ph+1,2i corresponds to the childhood relationship in the tree. Although the definition allows the childregions of a node to cover a larger part of the space, typically the size of the regions shrinks as depth h increases (cf. Assumption 1). Remark 1. Our algorithm will instantiate the nodes of the tree on an ”as needed” basis, one by one. In fact, at any round n it will only need n nodes connected to the root. 3.2 Statement of the HOO strategy The algorithm picks at each round a node in the infinite tree T as follows. In the first round, it chooses the root node (0, 1). Now, consider round n + 1 with n ≥ 1. Let us denote by Tn the set of nodes that have been picked in previous rounds and by Sn the nodes which are not in Tn but whose parent is. The algorithm picks at round n + 1 a node (Hn+1 , In+1 ) ∈ Sn according to the deterministic rule that will be described below. After selecting the node, the algorithm further chooses an arm Xn+1 ∈ PHn+1 ,In+1 . This selection can be stochastic or deterministic. We do not put any further restriction on it. The algorithm then gets a reward Yn+1 as described above and the procedure goes on: (Hn+1 , In+1 ) is added to Tn to form Tn+1 and the children of (Hn+1 , In+1 ) are added to Sn to give rise to Sn+1 . Let us now turn to how (Hn+1 , In+1 ) is selected. Along with the nodes the algorithm stores what we call B–values. The node (Hn+1 , In+1 ) ∈ Sn to expand at round n + 1 is picked by following a path from the root to a node in Sn , where at each node along the path the child with the larger B–value is selected (ties are broken arbitrarily). In order to define a node’s B–value, we need a few quantities. Let C(h, i) be the set that collects (h, i) and its descendants. We let n Nh,i (n) = I{(Ht ,It )∈C(h,i)} t=1 be the number of times the node (h, i) was visited. A given node (h, i) is always picked at most once, but since its descendants may be picked afterwards, subsequent paths in the tree can go through it. Consequently, 1 ≤ Nh,i (n) ≤ n for all nodes (h, i) ∈ Tn . Let µh,i (n) be the empirical average of the rewards received for the time-points when the path followed by the algorithm went through (h, i): n 1 µh,i (n) = Yt I{(Ht ,It )∈C(h,i)} . Nh,i (n) t=1 The corresponding upper confidence bound is by definition Uh,i (n) = µh,i (n) + 3 2 ln n + ν 1 ρh , Nh,i (n) where 0 < ρ < 1 and ν1 > 0 are parameters of the algorithm (to be chosen later by the decision maker, see Assumption 1). For nodes not in Tn , by convention, Uh,i (n) = +∞. Now, for a node (h, i) in Sn , we define its B–value to be Bh,i (n) = +∞. The B–values for nodes in Tn are given by Bh,i (n) = min Uh,i (n), max Bh+1,2i−1 (n), Bh+1,2i (n) . 
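To make the node-selection rule above concrete, here is a compact sketch of the HOO strategy (our reading of the description, not the authors' code), using a simple dyadic covering of X = [0, 1] and arbitrary values for ρ and ν1; the payoff function is supplied by the caller and is assumed to take values in [0, 1].

```python
import math, random

# Sketch of HOO: unexpanded children carry B-value +inf, so the walk from the
# root always ends at a node that has not been played before.

class Node:
    def __init__(self, h, i):
        self.h, self.i = h, i            # depth and index: region P_{h,i}
        self.n, self.mean = 0, 0.0       # visit count and empirical mean reward
        self.children = None             # instantiated lazily, "as needed"

def region(h, i):                        # P_{h,i} for the dyadic tree on [0, 1]
    return (i - 1) / 2 ** h, i / 2 ** h

def b_value(node, t, rho, nu1):
    if node.n == 0:
        return math.inf
    u = node.mean + math.sqrt(2 * math.log(t) / node.n) + nu1 * rho ** node.h
    if node.children is None:
        return u
    return min(u, max(b_value(c, t, rho, nu1) for c in node.children))

def hoo_round(root, t, payoff, rho=0.5, nu1=1.0):
    path, node = [root], root
    while node.n > 0:                    # descend while the node was already picked
        if node.children is None:
            node.children = [Node(node.h + 1, 2 * node.i - 1),
                             Node(node.h + 1, 2 * node.i)]
        node = max(node.children, key=lambda c: b_value(c, t, rho, nu1))
        path.append(node)
    lo, hi = region(node.h, node.i)
    x = random.uniform(lo, hi)           # any arm inside P_{h,i} is allowed
    y = payoff(x)                        # observed reward, assumed to lie in [0, 1]
    for nd in path:                      # update counts and means along the path
        nd.n += 1
        nd.mean += (y - nd.mean) / nd.n
    return x, y

# Example use: root = Node(0, 1); for t in range(1, 1001): hoo_round(root, t,
#   lambda x: min(1.0, max(0.0, 1 - (x - 0.3) ** 2 + random.gauss(0, 0.05))))
```

Each call selects exactly one node that has not been played before, which matches the incremental construction of the tree described above.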
Note that the algorithm is deterministic (apart, maybe, from the arbitrary random choice of Xt in PHt ,It ). Its total space requirement is linear in n while total running time at round n is at most quadratic in n, though we conjecture that it is O(n log n) on average. 4 Assumptions made on the model and statement of the main result We suppose that X is equipped with a dissimilarity , that is a non-negative mapping : X 2 → R satisfying (x, x) = 0. The diameter (with respect to ) of a subset A of X is given by diam A = supx,y∈A (x, y). Given the dissimilarity , the “open” ball with radius ε > 0 and center c ∈ X is B(c, ε) = { x ∈ X : (c, x) < ε } (we do not require the topology induced by to be related to the topology of X .) In what follows when we refer to an (open) ball, we refer to the ball defined with respect to . The dissimilarity will be used to capture the smoothness of the mean-payoff function. The decision maker chooses and the tree of coverings. The following assumption relates this choice to the parameters ρ and ν1 of the algorithm: Assumption 1. There exist ρ < 1 and ν1 , ν2 > 0 such that for all integers h ≥ 0 and all i = 1, . . . , 2h , the diameter of Ph,i is bounded by ν1 ρh , and Ph,i contains an open ball Ph,i of radius ν2 ρh . For a given h, the Ph,i are disjoint for 1 ≤ i ≤ 2h . Remark 2. A typical choice for the coverings in a cubic domain is to let the domains be hyper-rectangles. They can be obtained, e.g., in a dyadic manner, by splitting at each step hyper-rectangles in the middle along their longest side, in an axis parallel manner; if all sides are equal, we split them along the√ axis. In first this example, if X = [0, 1]D and (x, y) = x − y α then we can take ρ = 2−α/D , ν1 = ( D/2)α and ν2 = 1/8α . The next assumption concerns the environment. Definition 2. We say that f is weakly Lipschitz with respect to if for all x, y ∈ X , f ∗ − f (y) ≤ f ∗ − f (x) + max f ∗ − f (x), (x, y) . (1) Note that weak Lipschitzness is satisfied whenever f is 1–Lipschitz, i.e., for all x, y ∈ X , one has |f (x) − f (y)| ≤ (x, y). On the other hand, weak Lipschitzness implies local (one-sided) 1–Lipschitzness at any maxima. Indeed, at an optimal arm x∗ (i.e., such that f (x∗ ) = f ∗ ), (1) rewrites to f (x∗ ) − f (y) ≤ (x∗ , y). However, weak Lipschitzness does not constraint the growth of the loss in the vicinity of other points. Further, weak Lipschitzness, unlike Lipschitzness, does not constraint the local decrease of the loss at any point. Thus, weak-Lipschitzness is a property that lies somewhere between a growth condition on the loss around optimal arms and (one-sided) Lipschitzness. Note that since weak Lipschitzness is defined with respect to a dissimilarity, it can actually capture higher-order smoothness at the optima. For example, f (x) = 1 − x2 is weak Lipschitz with the dissimilarity (x, y) = c(x − y)2 for some appropriate constant c. Assumption 2. The mean-payoff function f is weakly Lipschitz. ∗ ∗ Let fh,i = supx∈Ph,i f (x) and ∆h,i = f ∗ − fh,i be the suboptimality of node (h, i). We say that def a node (h, i) is optimal (respectively, suboptimal) if ∆h,i = 0 (respectively, ∆h,i > 0). Let Xε = { x ∈ X : f (x) ≥ f ∗ − ε } be the set of ε-optimal arms. The following result follows from the definitions; a proof can be found in the appendix. 4 Lemma 1. Let Assumption 1 and 2 hold. If the suboptimality ∆h,i of a region is bounded by cν1 ρh for some c > 0, then all arms in Ph,i are max{2c, c + 1}ν1 ρh -optimal. 
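As a concrete instance of Assumption 1 and Remark 2 (our own sketch, with α and D chosen arbitrarily), the snippet below builds the dyadic hyper-rectangle covering of [0, 1]^D by repeatedly splitting the longest side in the middle, and measures diameters with the dissimilarity ℓ(x, y) = ||x − y||^α; every D splits the diameter shrinks by the factor 2^−α, i.e. geometrically.

```python
import math

# Dyadic covering of [0, 1]^D with the dissimilarity l(x, y) = ||x - y||**alpha.

def dissimilarity(x, y, alpha=2.0):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** (alpha / 2)

def split(cell):
    # cell: list of (lo, hi) intervals, one per dimension; split the longest side.
    d = max(range(len(cell)), key=lambda k: cell[k][1] - cell[k][0])
    lo, hi = cell[d]
    left, right = list(cell), list(cell)
    left[d], right[d] = (lo, (lo + hi) / 2), ((lo + hi) / 2, hi)
    return left, right

def diameter(cell, alpha=2.0):
    return dissimilarity([lo for lo, _ in cell], [hi for _, hi in cell], alpha)

cell = [(0.0, 1.0), (0.0, 1.0)]          # the root region P_{0,1} = [0, 1]^2
for h in range(5):
    print(h, diameter(cell))             # 2.0, 1.25, 0.5, 0.3125, 0.125
    cell, _ = split(cell)
```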
The last assumption is closely related to Assumption 2 of Auer et al. [2], who observed that the regret of a continuum-armed bandit algorithm should depend on how fast the volume of the sets of ε-optimal arms shrinks as ε → 0. Here, we capture this by defining a new notion, the near-optimality dimension of the mean-payoff function. The connection between these concepts, as well as the zooming dimension defined by Kleinberg et al. [7] will be further discussed in Section 7. Define the packing number P(X , , ε) to be the size of the largest packing of X with disjoint open balls of radius ε with respect to the dissimilarity .2 We now define the near-optimality dimension, which characterizes the size of the sets Xε in terms of ε, and then state our main result. Definition 3. For c > 0 and ε0 > 0, the (c, ε0 )–near-optimality dimension of f with respect to equals inf d ∈ [0, +∞) : ∃ C s.t. ∀ε ≤ ε0 , P Xcε , , ε ≤ C ε−d (2) (with the usual convention that inf ∅ = +∞). Theorem 1 (Main result). Let Assumptions 1 and 2 hold and assume that the (4ν1 /ν2 , ν2 )–near-optimality dimension of the considered environment is d < +∞. Then, for any d > d there exists a constant C(d ) such that for all n ≥ 1, ERn ≤ C(d ) n(d +1)/(d +2) ln n 1/(d +2) . Further, if the near-optimality dimension is achieved, i.e., the infimum is achieved in (2), then the result holds also for d = d. Remark 3. We can relax the weak-Lipschitz property by requiring it to hold only locally around the maxima. In fact, at the price of increased constants, the result continues to hold if there exists ε > 0 such that (1) holds for any x, y ∈ Xε . To show this we only need to carefully adapt the steps of the proof below. We omit the details from this extended abstract. 5 Analysis of the regret and proof of the main result We first state three lemmas, whose proofs can be found in the appendix. The proofs of Lemmas 3 and 4 rely on concentration-of-measure techniques, while that of Lemma 2 follows from a simple case study. Let us fix some path (0, 1), (1, i∗ ), . . . , (h, i∗ ), . . . , of optimal nodes, starting from the root. 1 h Lemma 2. Let (h, i) be a suboptimal node. Let k be the largest depth such that (k, i∗ ) is on the path from k the root to (h, i). Then we have n E Nh,i (n) ≤ u+ P Nh,i (t) > u and Uh,i (t) > f ∗ or Us,i∗ ≤ f ∗ for some s ∈ {k+1, . . . , t−1} s t=u+1 Lemma 3. Let Assumptions 1 and 2 hold. 1, P Uh,i (n) ≤ f ∗ ≤ n−3 . Then, for all optimal nodes and for all integers n ≥ Lemma 4. Let Assumptions 1 and 2 hold. Then, for all integers t ≤ n, for all suboptimal nodes (h, i) 8 ln such that ∆h,i > ν1 ρh , and for all integers u ≥ 1 such that u ≥ (∆h,i −νnρh )2 , one has P Uh,i (t) > 1 f ∗ and Nh,i (t) > u ≤ t n−4 . 2 Note that sometimes packing numbers are defined as the largest packing with disjoint open balls of radius ε/2, or, ε-nets. 5 . Taking u as the integer part of (8 ln n)/(∆h,i − ν1 ρh )2 , and combining the results of Lemma 2, 3, and 4 with a union bound leads to the following key result. Lemma 5. Under Assumptions 1 and 2, for all suboptimal nodes (h, i) such that ∆h,i > ν1 ρh , we have, for all n ≥ 1, 8 ln n 2 E[Nh,i (n)] ≤ + . (∆h,i − ν1 ρh )2 n We are now ready to prove Theorem 1. Proof. For the sake of simplicity we assume that the infimum in the definition of near-optimality is achieved. To obtain the result in the general case one only needs to replace d below by d > d in the proof below. First step. For all h = 1, 2, . . 
., denote by Ih the nodes at depth h that are 2ν1 ρh –optimal, i.e., the nodes ∗ (h, i) such that fh,i ≥ f ∗ − 2ν1 ρh . Then, I is the union of these sets of nodes. Further, let J be the set of nodes that are not in I but whose parent is in I. We then denote by Jh the nodes in J that are located at depth h in the tree. Lemma 4 bounds the expected number of times each node (h, i) ∈ Jh is visited. Since ∆h,i > 2ν1 ρh , we get 8 ln n 2 E Nh,i (n) ≤ 2 2h + . ν1 ρ n Second step. We bound here the cardinality |Ih |, h > 0. If (h, i) ∈ Ih then since ∆h,i ≤ 2ν1 ρh , by Lemma 1 Ph,i ⊂ X4ν1 ρh . Since by Assumption 1, the sets (Ph,i ), for (h, i) ∈ Ih , contain disjoint balls of radius ν2 ρh , we have that |Ih | ≤ P ∪(h,i)∈Ih Ph,i , , ν2 ρh ≤ P X(4ν1 /ν2 ) ν2 ρh , , ν2 ρh ≤ C ν2 ρh −d , where we used the assumption that d is the (4ν1 /ν2 , ν2 )–near-optimality dimension of f (and C is the constant introduced in the definition of the near-optimality dimension). Third step. Choose η > 0 and let H be the smallest integer such that ρH ≤ η. We partition the infinite tree T into three sets of nodes, T = T1 ∪ T2 ∪ T3 . The set T1 contains nodes of IH and their descendants, T2 = ∪0≤h < 1 and then, by optimizing over ρH (the worst value being ρH ∼ ( ln n )−1/(d+2) ). 6 Minimax optimality The packing dimension of a set X is the smallest d such that there exists a constant k such that for all ε > 0, P X , , ε ≤ k ε−d . For instance, compact subsets of Rd (with non-empty interior) have a packing dimension of d whenever is a norm. If X has a packing dimension of d, then all environments have a near-optimality dimension less than d. The proof of the main theorem indicates that the constant C(d) only depends on d, k (of the definition of packing dimension), ν1 , ν2 , and ρ, but not on the environment as long as it is weakly Lipschitz. Hence, we can extract from it a distribution-free bound of the form O(n(d+1)/(d+2) ). In fact, this bound can be shown to be optimal as is illustrated by the theorem below, whose assumptions are satisfied by, e.g., compact subsets of Rd and if is some norm of Rd . The proof can be found in the appendix. Theorem 2. If X is such that there exists c > 0 with P(X , , ε) ≥ c ε−d ≥ 2 for all ε ≤ 1/4 then for all n ≥ 4d−1 c/ ln(4/3), all strategies ϕ are bound to suffer a regret of at least 2/(d+2) 1 1 c n(d+1)/(d+2) , 4 4 4 ln(4/3) where the supremum is taken over all environments with weakly Lipschitz payoff functions. sup E Rn (ϕ) ≥ 7 Discussion Several works [1; 6; 3; 2; 7] have considered continuum-armed bandits in Euclidean or metric spaces and provided upper- and lower-bounds on the regret for given classes of environments. Cope [3] derived a regret √ of O( n) for compact and convex subset of Rd and a mean-payoff function with unique minima and second order smoothness. Kleinberg [6] considered mean-payoff functions f on the real line that are H¨ lder with o degree 0 < α ≤ 1. The derived regret is Θ(n(α+1)/(α+2) ). Auer et al. [2] extended the analysis to classes of functions with only a local H¨ lder assumption around maximum (with possibly higher smoothness degree o 1+α−αβ α ∈ [0, ∞)), and derived the regret Θ(n 1+2α−αβ ), where β is such that the Lebesgue measure of ε-optimal 7 states is O(εβ ). Another setting is that of [7] who considered a metric space (X , ) and assumed that f is Lipschitz w.r.t. . 
The obtained regret is O(n(d+1)/(d+2) ) where d is the zooming dimension (defined similarly to our near-optimality dimension, but using covering numbers instead of packing numbers and the sets Xε \ Xε/2 ). When (X , ) is a metric space covering and packing numbers are equivalent and we may prove that the zooming dimension and near-optimality dimensions are equal. Our main contribution compared to [7] is that our weak-Lipschitz assumption, which is substantially weaker than the global Lipschitz assumption assumed in [7], enables our algorithm to work better in some common situations, such as when the mean-payoff function assumes a local smoothness whose order is larger than one. In order to relate all these results, let us consider a specific example: Let X = [0, 1]D and assume that the mean-reward function f is locally equivalent to a H¨ lder function with degree α ∈ [0, ∞) around any o maxima x∗ of f (the number of maxima is assumed to be finite): f (x∗ ) − f (x) = Θ(||x − x∗ ||α ) as x → x∗ . (3) This means that ∃c1 , c2 , ε0 > 0, ∀x, s.t. ||x − x∗ || ≤ ε0 , c1 ||x − x∗ ||α ≤ f (x∗ ) − f (x) ≤ c2 ||x − x∗ ||α . √ Under this assumption, the result of Auer et al. [2] shows that for D = 1, the regret is Θ( n) (since here √ β = 1/α). Our result allows us to extend the n regret rate to any dimension D. Indeed, if we choose our def dissimilarity measure to be α (x, y) = ||x − y||α , we may prove that f satisfies a locally weak-Lipschitz √ condition (as defined in Remark 3) and that the near-optimality dimension is 0. Thus our regret is O( n), i.e., the rate is independent of the dimension D. In comparison, since Kleinberg et al. [7] have to satisfy a global Lipschitz assumption, they can not use α when α > 1. Indeed a function globally Lipschitz with respect to α is essentially constant. Moreover α does not define a metric for α > 1. If one resort to the Euclidean metric to fulfill their requirement that f be Lipschitz w.r.t. the metric then the zooming dimension becomes D(α − 1)/α, while the regret becomes √ O(n(D(α−1)+α)/(D(α−1)+2α) ), which is strictly worse than O( n) and in fact becomes close to the slow rate O(n(D+1)/(D+2) ) when α is larger. Nevertheless, in the case of α ≤ 1 they get the same regret rate. In contrast, our result shows that under very weak constraints on the mean-payoff function and if the local behavior of the function around its maximum (or finite number of maxima) is known then global optimization √ suffers a regret of order O( n), independent of the space dimension. As an interesting sidenote let us also remark that our results allow different smoothness orders along different dimensions, i.e., heterogenous smoothness spaces. References [1] R. Agrawal. The continuum-armed bandit problem. SIAM J. Control and Optimization, 33:1926–1951, 1995. [2] P. Auer, R. Ortner, and Cs. Szepesv´ ri. Improved rates for the stochastic continuum-armed bandit problem. 20th a Conference on Learning Theory, pages 454–468, 2007. [3] E. Cope. Regret and convergence bounds for immediate-reward reinforcement learning with continuous action spaces. Preprint, 2004. [4] P.-A. Coquelin and R. Munos. Bandit algorithms for tree search. In Proceedings of 23rd Conference on Uncertainty in Artificial Intelligence, 2007. [5] S. Gelly, Y. Wang, R. Munos, and O. Teytaud. Modification of UCT with patterns in Monte-Carlo go. Technical Report RR-6062, INRIA, 2006. [6] R. Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. In 18th Advances in Neural Information Processing Systems, 2004. 
[7] R. Kleinberg, A. Slivkins, and E. Upfal. Multi-armed bandits in metric spaces. In Proceedings of the 40th ACM Symposium on Theory of Computing, 2008.
[8] L. Kocsis and Cs. Szepesvári. Bandit based Monte-Carlo planning. In Proceedings of the 15th European Conference on Machine Learning, pages 282–293, 2006.
5 0.73951262 193 nips-2008-Regularized Co-Clustering with Dual Supervision
Author: Vikas Sindhwani, Jianying Hu, Aleksandra Mojsilovic
Abstract: By attempting to simultaneously partition both the rows (examples) and columns (features) of a data matrix, Co-clustering algorithms often demonstrate surprisingly impressive performance improvements over traditional one-sided row clustering techniques. A good clustering of features may be seen as a combinatorial transformation of the data matrix, effectively enforcing a form of regularization that may lead to a better clustering of examples (and vice-versa). In many applications, partial supervision in the form of a few row labels as well as column labels may be available to potentially assist co-clustering. In this paper, we develop two novel semi-supervised multi-class classification algorithms motivated respectively by spectral bipartite graph partitioning and matrix approximation formulations for co-clustering. These algorithms (i) support dual supervision in the form of labels for both examples and/or features, (ii) provide principled predictive capability on out-of-sample test data, and (iii) arise naturally from the classical Representer theorem applied to regularization problems posed on a collection of Reproducing Kernel Hilbert Spaces. Empirical results demonstrate the effectiveness and utility of our algorithms. 1
6 0.41628042 95 nips-2008-Grouping Contours Via a Related Image
7 0.41160122 136 nips-2008-Model selection and velocity estimation using novel priors for motion patterns
8 0.39182052 229 nips-2008-Syntactic Topic Models
9 0.38730624 17 nips-2008-Algorithms for Infinitely Many-Armed Bandits
10 0.38467449 164 nips-2008-On the Generalization Ability of Online Strongly Convex Programming Algorithms
11 0.3765707 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
12 0.37307099 217 nips-2008-Sparsity of SVMs that use the epsilon-insensitive loss
13 0.37272543 4 nips-2008-A Scalable Hierarchical Distributed Language Model
14 0.37093362 119 nips-2008-Learning a discriminative hidden part model for human action recognition
15 0.36562356 66 nips-2008-Dynamic visual attention: searching for coding length increments
16 0.36494443 133 nips-2008-Mind the Duality Gap: Logarithmic regret algorithms for online optimization
17 0.35806555 75 nips-2008-Estimating vector fields using sparse basis field expansions
18 0.35556775 177 nips-2008-Particle Filter-based Policy Gradient in POMDPs
19 0.35444549 44 nips-2008-Characteristic Kernels on Groups and Semigroups
20 0.35114369 99 nips-2008-High-dimensional support union recovery in multivariate regression