nips nips2010 nips2010-187 knowledge-graph by maker-knowledge-mining

187 nips-2010-Occlusion Detection and Motion Estimation with Convex Optimization


Source: pdf

Author: Alper Ayvaci, Michalis Raptis, Stefano Soatto

Abstract: We tackle the problem of simultaneously detecting occlusions and estimating optical flow. We show that, under standard assumptions of Lambertian reflection and static illumination, the task can be posed as a convex minimization problem. Therefore, the solution, computed using efficient algorithms, is guaranteed to be globally optimal, for any number of independently moving objects, and any number of occlusion layers. We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance. 1

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 edu Abstract We tackle the problem of simultaneously detecting occlusions and estimating optical flow. [sent-3, score-0.582]

2 We show that, under standard assumptions of Lambertian reflection and static illumination, the task can be posed as a convex minimization problem. [sent-4, score-0.078]

3 Therefore, the solution, computed using efficient algorithms, is guaranteed to be globally optimal, for any number of independently moving objects, and any number of occlusion layers. [sent-5, score-0.556]

4 We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance. [sent-6, score-0.673]

5 1 Introduction Optical flow refers to the deformation of the domain of an image that results from ego- or scene motion. [sent-7, score-0.204]

6 It is, in general, different from the motion field, that is the projection onto the image plane of the spatial velocity of the scene [28], unless three conditions are satisfied: (a) Lambertian reflection, (b) constant illumination, and (c) constant visibility properties of the scene. [sent-8, score-0.454]

7 Most surfaces with benign reflectance properties (diffuse/specular) can be approximated as Lambertian almost everywhere under sparse illuminants (e. [sent-9, score-0.073]

8 In any case, widespread violation of Lambertian reflection does not enable correspondence [23], so we will embrace (a) as customary. [sent-12, score-0.046]

9 Similarly, (b) constant illumination is a reasonable assumption for ego-motion (the scene is not moving relative to the light source), and even for objects moving (slowly) relative to the light source. [sent-13, score-0.292]

10 1 Assumption (c) is the most critical, as it is needed for the motion field to be defined. [sent-14, score-0.285]

11 2 It is often taken for granted in the optical flow literature, because in the limit where two images are sampled infinitesimally close in time, there are no occluded regions, and one can focus solely on motion discontinuities. [sent-15, score-0.876]

12 Thus, most variational motion estimation approaches provide an estimate of a dense flow field at each location on the image domain, including occluded regions. [sent-16, score-0.735]

13 Alas, in occluded regions, the problem is not that optical flow is discontinuous, or forward-backward inconsistent; it is simply not defined. [sent-17, score-0.591]

14 Motion in occluded regions can be hallucinated; However, whatever motion is assigned to an occluded region cannot be validated from the data. [sent-18, score-0.962]

15 In defense of these methods, it can be argued that, even without taking the limit, for small parallax (slow-enough motion, or far-enough objects, or fast-enough temporal sampling) occluded areas are small. [sent-19, score-0.325]

16 However, small does not mean unimportant, as occlusions are critical to perception [8] and a key for developing representations for recognition [22]. [sent-20, score-0.284]

17 For this reason, we focus on issues of visibility in optical flow computation. [sent-21, score-0.345]

18 We show that forgoing assumption (c) and explicitly representing occlusions is not only conceptually correct, but also algorithmically advantageous, for the resulting optimization problem can be shown to become convex once occlusions are explicitly modeled. [sent-22, score-0.598]

19 Therefore, one can guarantee convergence to a globally 1 Assumption (b) is also made for convenience, as modeling illumination changes would require modeling reflectance, which significantly complicates the picture. [sent-23, score-0.162]

20 2 If the domain of an image portrays a portion of the scene that is not visible in another image, the two cannot be put into correspondence. [sent-24, score-0.21]

21 3), and test the resulting algorithm on benchmark datasets (sect. [sent-28, score-0.04]

22 1 Related Work The most common approach to handling occlusions in the optical flow literature is to define them as regions where forward and backwards motion estimates are inconsistent [19, 1]. [sent-33, score-1.005]

23 Most approaches return estimates of motion in the occluded regions, where they cannot be invalidated: As we have already pointed out, in an occluded region one cannot determine a motion field that maps one image onto another, because the scene is not visible in one of the two. [sent-34, score-1.403]

24 Some approaches [11, 4], while also exploiting motion symmetry, discount occlusions by weighting the data fidelity with a monotonically decreasing function. [sent-35, score-0.569]

25 An alternate approach [15, 14, 25] is to formulate joint motion estimation and occlusion detection in a discrete setting, where it is NPhard. [sent-37, score-0.909]

26 Another class of methods uses the motion estimation residual to classify a location as occluded or visible wither with a direct threshold on the residual [30] or with a more elaborate probabilistic model [24]. [sent-39, score-0.895]

27 2 Evaluation Optical flow estimation is a mature area of computer vision, and benchmark datasets have been developed, e. [sent-42, score-0.077]

28 Unfortunately, no existing benchmark provides ground truth for occluded regions, nor a scoring mechanism to evaluate occlusion detection performance. [sent-45, score-0.997]

29 Motion estimates are scored even in the occluded regions, where the data does not support them. [sent-46, score-0.344]

30 Since our primary goal is to detect occlusions, we have produced a new benchmark by taking a subset of the training data in the Middlebury dataset, and hand-labeled occluded regions. [sent-47, score-0.333]

31 We then use the same evaluation method of the Middlebury for the (ground truth) regions that are co-visible in at least two images. [sent-48, score-0.082]

32 Then, we provide a separate score for occlusion detection, in terms of precision-recall curves. [sent-50, score-0.482]

33 2 Joint Occlusion Detection and Optical Flow Estimation In this section, we show how the assumptions (a)-(b) can be used to formulate occlusion detection and optical flow estimation as a joint optimization problem. [sent-51, score-0.922]

34 Let I : D ⊂ R2 × R+ → R+ ; (x, t) → I(x, t) be a grayscale time-varying image defined on a domain D. [sent-52, score-0.106]

35 Under the assumptions (a)-(b), the relation between two consecutive frames in a video {I(x, t)}T is given by t=0 I(w(x, t), t + dt) + n(x, t), x ∈ D\Ω(t; dt) (1) ρ(x, t), x ∈ Ω(t; dt) . [sent-53, score-0.033]

36 where w : D × R+ → R2 ; x → w(x, t) = x + v(x, t) is the domain deformation mapping I(x, t) onto I(x, t + dt) everywhere except at occluded regions. [sent-54, score-0.425]

37 The occluded region Ω can change over time depending on the temporal sampling interval dt and is not necessarily simply-connected; so even if we call Ω the occluded region (singular), it is understood that it can be made of several disconnected portions. [sent-57, score-0.894]

38 Inside Ω, the image can take any value ρ : Ω × R+ → R+ that is in general unrelated to I(w(x), t + dt)|x∈Ω . [sent-58, score-0.058]

39 , the additive uncertainty is normally distributed in space and time with an isotropic and small variance λ > 0. [sent-64, score-0.027]

40 We define the residual e : D → R on the entire image domain x ∈ D, via n(x, t), x ∈ D\Ω ρ(x, t) − I(w(x, t), t + dt), . [sent-65, score-0.226]

41 e(x, t; dt) = I(x, t) − I(w(x, t), t + dt) = (3) x∈Ω which we can write as the sum of two terms, e1 : D → R and e2 : D → R, also defined on the entire domain D in such a way that . [sent-66, score-0.071]

42 We can then write, for any x ∈ D, I(x, t) = I(w(x, t), t + dt) + e1 (x, t; dt) + e2 (x, t; dt) (5) 4 4 and note that, because of (i) e1 is large but sparse, while because of (ii) e2 is small but dense . [sent-70, score-0.039]

43 We will use this as an inference criterion for w, seeking to optimize a data fidelity term that minimizes the number of nonzero elements of e1 (a proxy of the area of Ω), and the negative log-likelihood of n. [sent-71, score-0.026]

44 So, the data fidelity term depends on w but also on the characteristic function of the occlusion domain Ω. [sent-79, score-0.556]

45 5 For a sufficiently small dt, we can approximate, for any x ∈ D\Ω, I(x, t + dt) = I(x, t) + I(x, t)v(x, t) + n(x, t) (9) where the linearization error has been incorporated into the uncertainty term n(x, t). [sent-80, score-0.072]

46 (10) Since we typically do not know the variance λ of the process n, we will treat it as a tuning parameter, and because ψdata or λψdata yield the same minimizer, we have attributed the multiplier λ to the second term. [sent-82, score-0.028]

47 In a digital image, both domains D and Ω are discretized into a lattice, and dt is fixed. [sent-85, score-0.232]

48 Therefore, spatial and temporal derivative operators are approximated, typically, by first-order differences. [sent-86, score-0.075]

49 TV is desirable in the context of occlusion detection because it does not penalize motion discontinuities significantly. [sent-91, score-0.915]

50 The spatial derivative matrix A is given by A = [diag( x I) diag( y I) − I], where I is the M N × M N identity matrix, and the temporal derivative values {It (x, t)}x∈Λ are stacked into b. [sent-93, score-0.115]

51 For finiteu, u , u 0 = |{ui |ui = 0}| and u T V = dimensional vectors u ∈ RM N , u 2 = ((g1 )i (ui+1 − ui ))2 + ((g2 )i (ui+M − ui ))2 where g1 and g2 are the stacked versions of {g1 (x)}x∈Λ and {g2 (x)}x∈Λ . [sent-94, score-0.175]

52 When W is the identity, (14) becomes a standard convex relaxation of (13) and its globally optimal solution can be reached efficiently [27]. [sent-97, score-0.08]

53 The data term of the standard (unweighted) relaxation of (13) can be interpreted as a Huber norm [10]. [sent-100, score-0.047]

54 The model (9) is valid to the extent in which dt is sufficiently small relative to v (or v sufficiently slow relative to dt), so the linearization error does not alter the statistics of the residual n. [sent-102, score-0.374]

55 When this is not the case, remedies must be enacted to restore proper sampling conditions [22] and therefore differentiate contributions to the residual coming from sampling artifacts (aliasing), rather than occlusions. [sent-103, score-0.12]

56 This can be done by solving (14) in scale-space, as customary, with coarser scales used to initialize v1 , v2 so the increment is properly sampled, and the occlusion term e1 added at the finest ˆ ˆ scale. [sent-104, score-0.508]

57 The residual term e1 in (5) have been characterized in some literature as modeling illumination changes [21, 16, 26, 13]. [sent-105, score-0.279]

58 While sparsity is clearly motivated by (i), for illumination changes to be properly modeled, a reflectance function is necessary, which is absent in all models of the form (5) (see [23]. [sent-107, score-0.161]

59 Compute yk = [v1 , v2 , ek ]T − (1/L) ψ(v1 , v2 , ek ), 1 1 0 0 4. [sent-116, score-0.151]

60 Compute zk = [v1 , v2 , e0 ]T − (1/L) 1 k i=0 i i αi ψ(v1 , v2 , ei ), 1 k k 5. [sent-117, score-0.039]

61 We write ψ(v1 , v2 , e1 ) as ψ(v1 , v2 , e1 ) = ψ1 (v1 , v2 , e1 ) + λψ2 (e1 ) + µψ3 (v1 ) + µψ4 (v2 ), and compute the gradient of each term separately. [sent-121, score-0.049]

62 (18) We also need the Lipschitz constant L to compute the auxiliary variables yk and zk to minimize ψ. [sent-129, score-0.074]

63 4 Experiments To evaluate occlusion detection (Sect. [sent-136, score-0.587]

64 2), we start from [2] and generate occlusion maps as follows: for each training sequence, the residual computed from the given ground truth motion is used as a discriminant to determine ground truth occlusions, fixing obvious errors in the occlusion maps by hand. [sent-138, score-1.523]

65 We therefore restrict the evaluation of motion to the co-visible regions, and evaluate occlusion detection as a standard binary classification task. [sent-139, score-0.897]

66 We compare our algorithm to [29] and [14], the former is an example of robust motion estimation and the latter is a representative of the approaches described in Sect. [sent-140, score-0.322]

67 Note that no occlusion is present in the residual of the motion field computed by TV-L1, and subsequently the motion estimates are less precise around occluding boundaries (top-left corner of the Flower Garden, plane in the left in Venus). [sent-157, score-1.223]

68 ” The first column shows the motion estimates by TV-L1, color-coded as in [29], the second its residual I(x, t) − I(w(x), t + dt); the third shows our motion estimates, and the fourth our residual e1 defined in (14). [sent-159, score-0.861]

69 Other frames of the Flower Garden sequence are shown in Fig. [sent-160, score-0.033]

70 2, where we have regularized the occluded region by minimizing a unilateral energy on e1 with graph-cuts. [sent-161, score-0.327]

71 We have also compared Figure 2: Motion estimates for more frames of the Flower Garden sequence (left), residual e (middle), and occluded region (right). [sent-162, score-0.531]

72 motion estimates obtained with our method and [29] in the co-visible regions for the Middlebury dataset (Table 1). [sent-163, score-0.393]

73 Since occlusions can only be determined at the finest scale absent proper sampling conditions, in this experiment we minimize the same functional of [29] at coarse scales, and switch to (14) at the finest scale. [sent-164, score-0.312]

74 To evaluate occlusion detection performance, we again use the Middlebury, and compare e1 to ground truth occlusions using precision/recall curves (Fig. [sent-165, score-0.948]

75 We also show the improvement in detection performance when we use reweighted- 1 , in Table 2. [sent-167, score-0.105]

76 We have compared our occlusion detection results to [14], using the code provided online by the authors (Table 3). [sent-168, score-0.587]

77 Comparing motion estimates gives an unfair 6 AAE (ours) AAE (L1TV) AEPE (ours) AEPE (L1TV) Venus 4. [sent-169, score-0.336]

78 Average Angular Error (AAE) and Average End Point Error (AEPE) of motion estimates in co-visible regions. [sent-198, score-0.336]

79 5 recall Figure 3: Left to right: Representative samples of motion estimates from the Middlebury dataset, labeled ground-truth occlusions, error term estimate e1 , and precision-recall curves for our occlusion detection. [sent-307, score-0.888]

80 advantage to our algorithm because their approach is based on quantized disparity values, yielding lower accuracy. [sent-308, score-0.031]

81 80 Table 2: Average precision of our approach on Middlebury data with and without re-weighting. [sent-323, score-0.08]

82 provide a binary output, we display our precision at their same recall value. [sent-349, score-0.124]

83 5 Discussion We have presented an algorithm to detect occlusions and establish correspondence between two images. [sent-350, score-0.309]

84 It leverages on a formulation that, starting from standard assumptions (Lambertian reflection, constant diffuse illumination), arrives at a convex optimization problem. [sent-351, score-0.03]

85 Our approach does not assume a rigid scene, nor a single moving object. [sent-352, score-0.045]

86 It also does not assume that the occluded region is simply connected: Occlusions in natural scenes can be very complex (see Fig. [sent-353, score-0.327]

87 ” Note that in [6] the problem can be made convex only in e1 , but not jointly in e1 and v. [sent-356, score-0.03]

88 We focus on inter-frame occlusion detection; temporal consistency of occlusion “layers” was addressed in [12]. [sent-357, score-0.996]

89 Similarly, µ is a parameter that, like in any classification problem, trades off missed detections and false alarms, and therefore no single value is “optimal” in any meaningful sense. [sent-360, score-0.032]

90 These limitations are shared by most variational optical flow estimation algorithms. [sent-361, score-0.358]

91 Symmetrical dense optical flow estimation with a occlusions detection. [sent-368, score-0.658]

92 Variational stereo vision with sharp discontinuities and occlusion handling. [sent-388, score-0.601]

93 Algorithms for finding global minimizers of denoising and segmentation models. [sent-401, score-0.031]

94 Algorithms and software for total variation image reconstruction via first-order methods. [sent-410, score-0.058]

95 Dynamic shape and appearance modeling via moving and deforming layers. [sent-437, score-0.045]

96 Estimation of occlusion and dense motion fields in a bidirectional Bayesian framework. [sent-455, score-0.806]

97 Revised definition of optical flow: Integration of radiometric and geometric cues for dynamic scene analysis. [sent-459, score-0.362]

98 A method for unconstrained convex minimization problem with the rate of convergence O (1/k2). [sent-463, score-0.055]

99 Determination of optical flow and its discontinuities using a non-linear diffusion. [sent-473, score-0.341]

100 A probabilistic approach to large displacement optical flow and occlusion detection. [sent-505, score-0.814]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('occlusion', 0.482), ('optical', 0.298), ('occluded', 0.293), ('motion', 0.285), ('occlusions', 0.284), ('dt', 0.208), ('ow', 0.185), ('middlebury', 0.144), ('flower', 0.124), ('lambertian', 0.124), ('residual', 0.12), ('venus', 0.114), ('illumination', 0.112), ('detection', 0.105), ('garden', 0.103), ('ection', 0.103), ('tv', 0.086), ('hydrangea', 0.082), ('precision', 0.08), ('aepe', 0.07), ('ayvaci', 0.07), ('ui', 0.068), ('nesterov', 0.064), ('scene', 0.064), ('ectance', 0.062), ('aae', 0.062), ('rubberwhale', 0.062), ('image', 0.058), ('ek', 0.058), ('regions', 0.057), ('nest', 0.057), ('soatto', 0.057), ('estimates', 0.051), ('everywhere', 0.05), ('delity', 0.048), ('vision', 0.048), ('domain', 0.048), ('customary', 0.047), ('raptis', 0.047), ('visibility', 0.047), ('yezzi', 0.047), ('linearization', 0.046), ('moving', 0.045), ('lattice', 0.045), ('recall', 0.044), ('discontinuities', 0.043), ('benchmark', 0.04), ('visible', 0.04), ('truth', 0.04), ('dense', 0.039), ('stacked', 0.039), ('zk', 0.039), ('ground', 0.037), ('estimation', 0.037), ('diag', 0.036), ('eld', 0.036), ('unde', 0.035), ('yk', 0.035), ('region', 0.034), ('displacement', 0.034), ('deformation', 0.034), ('reg', 0.034), ('huber', 0.034), ('frames', 0.033), ('discontinuous', 0.032), ('trades', 0.032), ('temporal', 0.032), ('re', 0.031), ('segmentation', 0.031), ('quantized', 0.031), ('convex', 0.03), ('inconsistent', 0.03), ('kolmogorov', 0.03), ('candes', 0.029), ('globally', 0.029), ('multiplier', 0.028), ('absent', 0.028), ('stereo', 0.028), ('visual', 0.028), ('isotropic', 0.027), ('term', 0.026), ('objects', 0.026), ('minimization', 0.025), ('correspondence', 0.025), ('quantization', 0.025), ('evaluation', 0.025), ('iv', 0.025), ('gt', 0.025), ('digital', 0.024), ('approximated', 0.023), ('posed', 0.023), ('write', 0.023), ('variational', 0.023), ('derivative', 0.022), ('enable', 0.021), ('operators', 0.021), ('changes', 0.021), ('relaxation', 0.021), ('rm', 0.021), ('scheme', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 187 nips-2010-Occlusion Detection and Motion Estimation with Convex Optimization

Author: Alper Ayvaci, Michalis Raptis, Stefano Soatto

Abstract: We tackle the problem of simultaneously detecting occlusions and estimating optical flow. We show that, under standard assumptions of Lambertian reflection and static illumination, the task can be posed as a convex minimization problem. Therefore, the solution, computed using efficient algorithms, is guaranteed to be globally optimal, for any number of independently moving objects, and any number of occlusion layers. We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance. 1

2 0.35654545 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering

Author: Deqing Sun, Erik B. Sudderth, Michael J. Black

Abstract: Layered models are a powerful way of describing natural scenes containing smooth surfaces that may overlap and occlude each other. For image motion estimation, such models have a long history but have not achieved the wide use or accuracy of non-layered methods. We present a new probabilistic model of optical flow in layers that addresses many of the shortcomings of previous approaches. In particular, we define a probabilistic graphical model that explicitly captures: 1) occlusions and disocclusions; 2) depth ordering of the layers; 3) temporal consistency of the layer segmentation. Additionally the optical flow in each layer is modeled by a combination of a parametric model and a smooth deviation based on an MRF with a robust spatial prior; the resulting model allows roughness in layers. Finally, a key contribution is the formulation of the layers using an imagedependent hidden field prior based on recent models for static scene segmentation. The method achieves state-of-the-art results on the Middlebury benchmark and produces meaningful scene segmentations as well as detected occlusion regions.

3 0.23476093 20 nips-2010-A unified model of short-range and long-range motion perception

Author: Shuang Wu, Xuming He, Hongjing Lu, Alan L. Yuille

Abstract: The human vision system is able to effortlessly perceive both short-range and long-range motion patterns in complex dynamic scenes. Previous work has assumed that two different mechanisms are involved in processing these two types of motion. In this paper, we propose a hierarchical model as a unified framework for modeling both short-range and long-range motion perception. Our model consists of two key components: a data likelihood that proposes multiple motion hypotheses using nonlinear matching, and a hierarchical prior that imposes slowness and spatial smoothness constraints on the motion field at multiple scales. We tested our model on two types of stimuli, random dot kinematograms and multiple-aperture stimuli, both commonly used in human vision research. We demonstrate that the hierarchical model adequately accounts for human performance in psychophysical experiments.

4 0.21113299 98 nips-2010-Functional form of motion priors in human motion perception

Author: Hongjing Lu, Tungyou Lin, Alan Lee, Luminita Vese, Alan L. Yuille

Abstract: It has been speculated that the human motion system combines noisy measurements with prior expectations in an optimal, or rational, manner. The basic goal of our work is to discover experimentally which prior distribution is used. More specifically, we seek to infer the functional form of the motion prior from the performance of human subjects on motion estimation tasks. We restricted ourselves to priors which combine three terms for motion slowness, first-order smoothness, and second-order smoothness. We focused on two functional forms for prior distributions: L2-norm and L1-norm regularization corresponding to the Gaussian and Laplace distributions respectively. In our first experimental session we estimate the weights of the three terms for each functional form to maximize the fit to human performance. We then measured human performance for motion tasks and found that we obtained better fit for the L1-norm (Laplace) than for the L2-norm (Gaussian). We note that the L1-norm is also a better fit to the statistics of motion in natural environments. In addition, we found large weights for the second-order smoothness term, indicating the importance of high-order smoothness compared to slowness and lower-order smoothness. To validate our results further, we used the best fit models using the L1-norm to predict human performance in a second session with different experimental setups. Our results showed excellent agreement between human performance and model prediction – ranging from 3% to 8% for five human subjects over ten experimental conditions – and give further support that the human visual system uses an L1-norm (Laplace) prior.

5 0.11680168 181 nips-2010-Network Flow Algorithms for Structured Sparsity

Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski

Abstract: We consider a class of learning problems that involve a structured sparsityinducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems. 1

6 0.085484281 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

7 0.083825313 208 nips-2010-Policy gradients in linearly-solvable MDPs

8 0.08069627 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions

9 0.071231008 149 nips-2010-Learning To Count Objects in Images

10 0.067023419 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

11 0.06411615 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models

12 0.062351607 101 nips-2010-Gaussian sampling by local perturbations

13 0.06054344 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

14 0.056439258 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization

15 0.05613796 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence

16 0.054768786 77 nips-2010-Epitome driven 3-D Diffusion Tensor image segmentation: on extracting specific structures

17 0.052883722 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives

18 0.052535459 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces

19 0.052175868 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior

20 0.051537868 245 nips-2010-Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.163), (1, 0.046), (2, -0.142), (3, -0.031), (4, -0.005), (5, -0.162), (6, -0.066), (7, 0.086), (8, 0.006), (9, 0.036), (10, 0.113), (11, -0.32), (12, -0.068), (13, 0.167), (14, 0.05), (15, 0.215), (16, -0.124), (17, 0.055), (18, 0.015), (19, -0.078), (20, -0.075), (21, -0.011), (22, -0.147), (23, -0.032), (24, 0.017), (25, -0.072), (26, 0.036), (27, 0.057), (28, -0.017), (29, 0.04), (30, -0.057), (31, 0.018), (32, 0.011), (33, -0.055), (34, -0.123), (35, 0.033), (36, -0.004), (37, 0.038), (38, 0.008), (39, 0.123), (40, -0.027), (41, -0.015), (42, -0.09), (43, -0.061), (44, -0.041), (45, -0.012), (46, -0.11), (47, 0.034), (48, -0.034), (49, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96461862 187 nips-2010-Occlusion Detection and Motion Estimation with Convex Optimization

Author: Alper Ayvaci, Michalis Raptis, Stefano Soatto

Abstract: We tackle the problem of simultaneously detecting occlusions and estimating optical flow. We show that, under standard assumptions of Lambertian reflection and static illumination, the task can be posed as a convex minimization problem. Therefore, the solution, computed using efficient algorithms, is guaranteed to be globally optimal, for any number of independently moving objects, and any number of occlusion layers. We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance. 1

2 0.9076364 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering

Author: Deqing Sun, Erik B. Sudderth, Michael J. Black

Abstract: Layered models are a powerful way of describing natural scenes containing smooth surfaces that may overlap and occlude each other. For image motion estimation, such models have a long history but have not achieved the wide use or accuracy of non-layered methods. We present a new probabilistic model of optical flow in layers that addresses many of the shortcomings of previous approaches. In particular, we define a probabilistic graphical model that explicitly captures: 1) occlusions and disocclusions; 2) depth ordering of the layers; 3) temporal consistency of the layer segmentation. Additionally the optical flow in each layer is modeled by a combination of a parametric model and a smooth deviation based on an MRF with a robust spatial prior; the resulting model allows roughness in layers. Finally, a key contribution is the formulation of the layers using an imagedependent hidden field prior based on recent models for static scene segmentation. The method achieves state-of-the-art results on the Middlebury benchmark and produces meaningful scene segmentations as well as detected occlusion regions.

3 0.71869397 20 nips-2010-A unified model of short-range and long-range motion perception

Author: Shuang Wu, Xuming He, Hongjing Lu, Alan L. Yuille

Abstract: The human vision system is able to effortlessly perceive both short-range and long-range motion patterns in complex dynamic scenes. Previous work has assumed that two different mechanisms are involved in processing these two types of motion. In this paper, we propose a hierarchical model as a unified framework for modeling both short-range and long-range motion perception. Our model consists of two key components: a data likelihood that proposes multiple motion hypotheses using nonlinear matching, and a hierarchical prior that imposes slowness and spatial smoothness constraints on the motion field at multiple scales. We tested our model on two types of stimuli, random dot kinematograms and multiple-aperture stimuli, both commonly used in human vision research. We demonstrate that the hierarchical model adequately accounts for human performance in psychophysical experiments.

4 0.64831179 98 nips-2010-Functional form of motion priors in human motion perception

Author: Hongjing Lu, Tungyou Lin, Alan Lee, Luminita Vese, Alan L. Yuille

Abstract: It has been speculated that the human motion system combines noisy measurements with prior expectations in an optimal, or rational, manner. The basic goal of our work is to discover experimentally which prior distribution is used. More specifically, we seek to infer the functional form of the motion prior from the performance of human subjects on motion estimation tasks. We restricted ourselves to priors which combine three terms for motion slowness, first-order smoothness, and second-order smoothness. We focused on two functional forms for prior distributions: L2-norm and L1-norm regularization corresponding to the Gaussian and Laplace distributions respectively. In our first experimental session we estimate the weights of the three terms for each functional form to maximize the fit to human performance. We then measured human performance for motion tasks and found that we obtained better fit for the L1-norm (Laplace) than for the L2-norm (Gaussian). We note that the L1-norm is also a better fit to the statistics of motion in natural environments. In addition, we found large weights for the second-order smoothness term, indicating the importance of high-order smoothness compared to slowness and lower-order smoothness. To validate our results further, we used the best fit models using the L1-norm to predict human performance in a second session with different experimental setups. Our results showed excellent agreement between human performance and model prediction – ranging from 3% to 8% for five human subjects over ten experimental conditions – and give further support that the human visual system uses an L1-norm (Laplace) prior.

5 0.48072109 181 nips-2010-Network Flow Algorithms for Structured Sparsity

Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski

Abstract: We consider a class of learning problems that involve a structured sparsityinducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems. 1

6 0.47814697 245 nips-2010-Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake

7 0.35577786 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem

8 0.33943436 77 nips-2010-Epitome driven 3-D Diffusion Tensor image segmentation: on extracting specific structures

9 0.33805948 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs

10 0.32342672 234 nips-2010-Segmentation as Maximum-Weight Independent Set

11 0.32292363 171 nips-2010-Movement extraction by detecting dynamics switches and repetitions

12 0.29730988 208 nips-2010-Policy gradients in linearly-solvable MDPs

13 0.28625646 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives

14 0.28218833 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems

15 0.27439153 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models

16 0.27430639 149 nips-2010-Learning To Count Objects in Images

17 0.27023017 9 nips-2010-A New Probabilistic Model for Rank Aggregation

18 0.25803077 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

19 0.25216639 256 nips-2010-Structural epitome: a way to summarize one’s visual experience

20 0.24703164 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.023), (17, 0.019), (27, 0.039), (30, 0.023), (35, 0.029), (45, 0.69), (50, 0.03), (52, 0.016), (60, 0.012), (77, 0.026), (90, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99907881 187 nips-2010-Occlusion Detection and Motion Estimation with Convex Optimization

Author: Alper Ayvaci, Michalis Raptis, Stefano Soatto

Abstract: We tackle the problem of simultaneously detecting occlusions and estimating optical flow. We show that, under standard assumptions of Lambertian reflection and static illumination, the task can be posed as a convex minimization problem. Therefore, the solution, computed using efficient algorithms, is guaranteed to be globally optimal, for any number of independently moving objects, and any number of occlusion layers. We test the proposed algorithm on benchmark datasets, expanded to enable evaluation of occlusion detection performance. 1

2 0.99863625 235 nips-2010-Self-Paced Learning for Latent Variable Models

Author: M. P. Kumar, Benjamin Packer, Daphne Koller

Abstract: Latent variable models are a powerful tool for addressing several tasks in machine learning. However, the algorithms for learning the parameters of latent variable models are prone to getting stuck in a bad local optimum. To alleviate this problem, we build on the intuition that, rather than considering all samples simultaneously, the algorithm should be presented with the training data in a meaningful order that facilitates learning. The order of the samples is determined by how easy they are. The main challenge is that often we are not provided with a readily computable measure of the easiness of samples. We address this issue by proposing a novel, iterative self-paced learning algorithm where each iteration simultaneously selects easy samples and learns a new parameter vector. The number of samples selected is governed by a weight that is annealed until the entire training data has been considered. We empirically demonstrate that the self-paced learning algorithm outperforms the state of the art method for learning a latent structural SVM on four applications: object localization, noun phrase coreference, motif finding and handwritten digit recognition. 1

3 0.99722803 167 nips-2010-Mixture of time-warped trajectory models for movement decoding

Author: Elaine Corbett, Eric Perreault, Konrad Koerding

Abstract: Applications of Brain-Machine-Interfaces typically estimate user intent based on biological signals that are under voluntary control. For example, we might want to estimate how a patient with a paralyzed arm wants to move based on residual muscle activity. To solve such problems it is necessary to integrate obtained information over time. To do so, state of the art approaches typically use a probabilistic model of how the state, e.g. position and velocity of the arm, evolves over time – a so-called trajectory model. We wanted to further develop this approach using two intuitive insights: (1) At any given point of time there may be a small set of likely movement targets, potentially identified by the location of objects in the workspace or by gaze information from the user. (2) The user may want to produce movements at varying speeds. We thus use a generative model with a trajectory model incorporating these insights. Approximate inference on that generative model is implemented using a mixture of extended Kalman filters. We find that the resulting algorithm allows us to decode arm movements dramatically better than when we use a trajectory model with linear dynamics. 1 In trod u cti on When patients have lost a limb or the ability to communicate with the outside world, brain machine interfaces (BMIs) are often used to enable robotic prostheses or restore communication. To achieve this, the user's intended state of the device must be decoded from biological signals. In the context of Bayesian statistics, two aspects are important for the design of an estimator of a temporally evolving state: the observation model, which describes how measured variables relate to the system’s state and the trajectory model which describes how the state changes over time in a probabilistic manner. Following this logic many recent BMI applications have relied on Bayesian estimation for a wide range of problems including the decoding of intended human [1] and animal [2] movements. In the context of BMIs, Bayesian approaches offer a principled way of formalizing the uncertainty about signals and thus often result in improvements over other signal processing techniques [1]-[3]. Most work on state estimation in dynamical systems has assumed linear dynamics and Gaussian noise. Under these circumstances, efficient algorithms result from belief propagation. The most frequent application uses the Kalman filter (KF), which recursively combines noisy state observations with the probabilistic evolution of state defined by the trajectory model to estimate the marginal distribution over states [4]. Such approaches have been used widely for applications including upper [1] and lower [5] extremity prosthetic 1 devices, functional electric stimulation [6] and human computer interactions [7]. As these algorithms are so commonly used, it seems promising to develop extensions to nonlinear trajectory models that may better describe the probabilistic distribution of movements in everyday life. One salient departure from the standard assumptions is that people tend to produce both slow and fast movements, depending on the situation. Models with linear dynamics only allow such deviation through the noise term, which makes these models poor at describing the natural variation of movement speeds during real world tasks. Explicitly incorporating movement speed into the trajectory model should lead to better movement estimates. Knowledge of the target position should also strongly affect trajectory models. After all , we tend to accelerate our arm early during movement and slow down later on. Target information can be linearly incorporated into the trajectory model, and this has greatly improved predictions [8]-[12]. Alternatively, if there are a small number of potential targets then a mixture of trajectory models approach [13] can be used. Here we are interested in the case where available data provide a prior over potential t argets but where movement targets may be anywhere. We want to incorporate target uncertainty and allow generalization to novel targets. Prior information about potential targets could come from a number of sources but would generally be noisy. For example, activity in the dorsal premotor cortex provides information about intended target location prior to movement and may be used where such recordings are available [14]. Target information may also be found noninvasively by tracking eye movements. However, such data will generally provide non-zero priors for a number of possible target locations as the subject saccades over the scene. While subjects almost always look at a target before reaching for it [15], there may be a delay of up to a second between looking at the target and the reach – a time interval over which up to 3 saccades are typically made. Each of these fixations could be the target. Hence, a probabilistic distribution of targets is appropriate when using either neural recordings or eye tracking to estimate potential reach targets Here we present an algorithm that uses a mixture of extended Kalman Filters (EKFs) to combine our insights related to the variation of movement speed and the availability of probabilistic target knowledge. Each of the mixture component s allows the speed of the movement to vary continuously over time. We tested how well we could use EMGs and eye movements to decode hand position of humans performing a three -dimensional large workspace reaching task. We find that using a trajectory model that allows for probabilistic target information and variation of speed leads to dramatic improvements in decoding quality. 2 Gen e ral Decod i n g S etti n g We wanted to test how well different decoding algorithms can decode human movement, over a wide range of dynamics. While many recent studies have looked at more restrictive, two-dimensional movements, a system to restore arm function should produce a wide range of 3D trajectories. We recorded arm kinematics and EMGs of healthy subjects during unconstrained 3D reaches to targets over a large workspace. Two healthy subjects were asked to reach at slow, normal and fast speeds, as they would in everyday life. Subjects were seated as they reached towards 16 LEDs in blocks of 150s, which were located on two planes positioned such that all targets were just reachable (Fig 1A). The target LED was lit for one second prior to an auditory go cue, at which time the subject would reach to the target at the appropriate speed. Slow, normal and fast reaches were allotted 3 s, 1.5s and 1s respectively; however, subjects determined the speed. An approximate total of 450 reaches were performed per subject. The subjects provided informed consent, and the protocol was approved by the Northwestern University Institutional Review Board. EMG signals were measured from the pectoralis major, and the three deltoid muscles of the shoulder. This represents a small subset of the muscles involved in reaching, and approximates those muscles retaining some voluntary control following mid-level cervical spinal cord injuries. 2 The EMG signals were band-pass filtered between 10 and 1,000 Hz, and subsequently anti aliased filtered. Hand, wrist, shoulder and head positions were tracked using an Optotrak motion analysis system. We simultaneously recorded eye movements with an ASL EYETRAC-6 head mounted eye tracker. Approximately 25% of the reaches were assigned to the test set, and the rest were used for training. Reaches for which either the motion capture data was incomplete, or there was visible motion artifact on the EMG were removed. As the state we used hand positions and joint angles (3 shoulder, 2 elbow, position, velocity and acceleration, 24 dimensions). Joint angles were calculated from the shoulder and wrist marker data using digitized bony landmarks which defined a coordinate system for the upper limb as detailed by Wu et al. [16]. As the motion data were sampled at 60Hz, the mean absolute value o f the EMG in the corresponding 16.7ms windows was used as an observation of the state at each time-step. Algorithm accuracy was quantified by normalizing the root -mean-squared error by the straight line distance between the first and final position of the endpoint for each reach. We compared the algorithms statistically using repeated measures ANOVAs with Tukey post -hoc tests, treating reach and subject as random effects. In the rest of the paper we will ask how well these reaching movements can be decoded from EMG and eye-tracking data. Figure 1: A Experimental setup and B sample kinematics and processed EMGs for one reach 3 Kal man Fi l ters w i th Target i n f ormati on All models that we consider in this paper assume linear observations with Gaussian noise: (1) where x is the state, y is the observation and v is the measurement noise with p(v) ~ N(0,R), and R is the observation covariance matrix. The model fitted the measured EMGs with an average r2 of 0.55. This highlights the need to integrate information over time. The standard approach also assumes linear dynamics and Gaussian process noise: (2) where, x t represents the hand and joint angle positions, w is the process noise with p(w) ~ N(0,Q), and Q is the state covariance matrix. The Kalman filter does optimal inference for this generative model. This model can effectively capture the dynamics of stereotypical reaches to a single target by appropriately tuning its parameters. However, when used to describe reaches to multiple targets, the model cannot describe target dependent aspects of reaching but boils down to a random drift model. Fast velocities are underestimated as they are unlikely under the trajectory model and there is excessive drift close to the target (Fig. 2A). 3 In many decoding applications we may know the subject’s target. A range of recent studies have addressed the issue of incorporating this information into the trajectory model [8, 13], and we might assume the effect of the target on the dynamics to be linear. This naturally suggests adding the target to the state space, which works well in practice [9, 12]. By appending the target to the state vector (KFT), the simple linear format of the KF may be retained: (3) where xTt is the vector of target positions, with dimensionality less than or equal to that of xt. This trajectory model thus allows describing both the rapid acceleration that characterizes the beginning of a reach and the stabilization towards its end. We compared the accuracy of the KF and the KFT to the Single Target Model (STM), a KF trained only on reaches to the target being tested (Fig. 2). The STM represents the best possible prediction that could be obtained with a Kalman filter. Assuming the target is perfectly known, we implemented the KFT by correctly initializing the target state xT at the beginning of the reach. We will relax this assumption below. The initial hand and joint angle positions were also assumed to be known. Figure 2: A Sample reach and predictions and B average accuracies with standard errors for KFT, KF and MTM. Consistent with the recent literature, both methods that incorporated target information produced higher prediction accuracy than the standard KF (both p<0.0001). Interestingly, there was no significant difference between the KFT and the STM (p=0.9). It seems that when we have knowledge of the target, we do not lose much by training a single model over the whole workspace rather than modeling the targets individually. This is encouraging, as we desire a BMI system that can generalize to any target within the workspace, not just specifically to those that are available in the training data. Clearly, adding the target to the state space allows the dynamics of typical movements to be modeled effectively, resulting in dramatic increases in decoding performance. 4 Ti me Warp i n g 4.1 I m p l e m e n t i n g a t i m e - w a r p e d t r a j e c t o r y mo d e l While the KFT above can capture the general reach trajectory profile, it does not allow for natural variability in the speed of movements. Depending on our task objectives, which would not directly be observed by a BMI, we might lazily reach toward a target or move a t maximal speed. We aim to change the trajectory model to explicitly incorporate a warping factor by which the average movement speed is scaled, allowing for such variability. As the movement speed will be positive in all practical cases, we model the logarithm of this factor, 4 and append it to the state vector: (4) We create a time-warped trajectory model by noting that if the average rate of a trajectory is to be scaled by a factor S, the position at time t will equal that of the original trajectory at time St. Differentiating, the velocity will be multiplied by S, and the acceleration by S 2. For simplicity, the trajectory noise is assumed to be additive and Gaussian, and the model is assumed to be stationary: (5) where Ip is the p-dimensional identity matrix and is a p p matrix of zeros. Only the terms used to predict the acceleration states need to be estimated to build the state transition matrix, and they are scaled as a nonlinear function of xs. After adding the variable movement speed to the state space the system is no longer linear. Therefore we need a different solution strategy. Instead of the typical KFT we use the Extended Kalman Filter (EKFT) to implement a nonlinear trajectory model by linearizing the dynamics around the best estimate at each time-step [17]. With this approach we add only small computational overhead to the KFT recursions. 4.2 Tr a i n i n g t h e t i m e w a r p i n g mo d e l The filter parameters were trained using a variant of the Expectation Maximization (EM) algorithm [18]. For extended Kalman filter learning the initialization for the variables may matter. S was initialized with the ground truth average reach speeds for each movement relative to the average speed across all movements. The state transition parameters were estimated using nonlinear least squares regression, while C, Q and R were estimated linearly for the new system, using the maximum likelihood solution [18] (M-step). For the E-step we used a standard extended Kalman smoother. We thus found the expected values for t he states given the current filter parameters. For this computation, and later when testing the algorithm, xs was initialized to its average value across all reaches while the remaining states were initialized to their true values. The smoothed estimate fo r xs was then used, along with the true values for the other states, to re-estimate the filter parameters in the M-step as before. We alternated between the E and M steps until the log likelihood converged (which it did in all cases). Following the training procedure, the diagonal of the state covariance matrix Q corresponding to xs was set to the variance of the smoothed xs over all reaches, according to how much this state should be allowed to change during prediction. This allowed the estimate of xs to develop over the course of the reach due to the evidence provided by the observations, better capturing the dynamics of reaches at different speeds. 4.3 P e r f o r ma n c e o f t h e t i m e - w a r p e d E K F T Incorporating time warping explicitly into the trajectory model pro duced a noticeable increase in decoding performance over the KFT. As the speed state xs is estimated throughout the course of the reach, based on the evidence provided by the observations, the trajectory model has the flexibility to follow the dynamics of the reach more accurately (Fig. 3). While at the normal self-selected speed the difference between the algorithms is small, for the slow and fast speeds, where the dynamics deviate from average, there i s a clear advantage to the time warping model. 5 Figure 3: Hand positions and predictions of the KFT and EKFT for sample reaches at A slow, B normal and C fast speeds. Note the different time scales between reaches. The models were first trained using data from all speeds (Fig. 4A). The EKFT was 1.8% more accurate on average (p<0.01), and the effect was significant at the slow (1.9%, p<0.05) and the fast (2.8%, p<0.01), but not at the normal (p=0.3) speed. We also trained the models from data using only reaches at the self-selected normal speed, as we wanted to see if there was enough variation to effectively train the EKFT (Fig. 4B). Interestingly, the performance of the EKFT was reduced by only 0.6%, and the KFT by 1.1%. The difference in performance between the EKFT and KFT was even more pronounced on aver age (2.3%, p<0.001), and for the slow and fast speeds (3.6 and 4.1%, both p< 0.0001). At the normal speed, the algorithms again were not statistically different (p=0.6). This result demonstrates that the EKFT is a practical option for a real BMI system, as it is not necessary to greatly vary the speeds while collecting training data for the model to be effective over a wide range of intended speeds. Explicitly incorporating speed information into the trajectory model helps decoding, by modeling the natural variation in volitional speed. Figure 4: Mean and standard error of EKFT and KFT accuracy at the different subjectselected speeds. Models were trained on reaches at A all speeds and B just normal speed reaches. Asterisks indicate statistically significant differences between the algorithms. 5 Mi xtu res of Target s So far, we have assumed that the targets of our reaches are perfectly known. In a real-world system, there will be uncertainty about the intended target of the reach. However, in typical applications there are a small number of possible objectives. Here we address this situation. Drawing on the recent literature, we use a mixture model to consider each of the possible targets [11, 13]. We condition the posterior probability for the state on the N possible targets, T: (6) 6 Using Bayes' Rule, this equation becomes: (7) As we are dealing with a mixture model, we perform the Kalman filter recursion for each possible target, xT, and our solution is a weighted sum of the outputs. The weights are proportional to the prior for that target, , and the likelihood of the model given that target . is independent of the target and does not need to be calculated. We tested mixtures of both algorithms, the mKFT and mEKFT, with real uncert ain priors obtained from eye-tracking in the one-second period preceding movement. As the targets were situated on two planes, the three-dimensional location of the eye gaze was found by projecting its direction onto those planes. The first, middle and last eye samples were selected, and all other samples were assigned to a group according to which of the three was closest. The mean and variance of these three groups were used to initialize three Kalman filters in the mixture model. The priors of the three groups were assigned proportional to the number of samples in them. If the subject looks at multiple positions prior to reaching, this method ensures with a high probability that the correct target was accounted for in one of the filters in the mixture. We also compared the MTM approach of Yu et al. [13], where a different KF model was generated for each target, and a mixture is performed over these models. This approach explicitly captures the dynamics of stereotypical reaches to specific targets. Given perfect target information, it would reduce to the STM described above. Priors for the MTM were found by assigning each valid eye sample to its closest two targets, and weighting the models proportional to the number of samples assigned to the corresponding target, divided by its distance from the mean of those samples. We tried other ways of assigning priors and the one presented gave the best results. We calculated the reduction in decoding quality when instead of perfect priors we provide eye-movement based noisy priors (Fig. 5). The accuracies of the mEKFT, the mKFT and the MTM were only degraded by 0.8, 1.9 and 2.1% respectively, compared to the perfect prior situation. The mEKFT was still close to 10% better than the KF. The mixture model framework is effective in accounting for uncertain priors. Figure 5: Mean and standard errors of accuracy for algorithms with perfect priors, and uncertain priors with full and partial training set. The asterisk indicates a statistically significant effects between the two training types, where real priors are used. Here, only reaches at normal speed were used to train the models, as this is a more realistic training set for a BMI application. This accounts for the degraded performance of the MTM with perfect priors relative to the STM from above (Fig. 2). With even more stereotyped training data for each target, the MTM doesn't generalize as well to new speeds. 7 We also wanted to know if the algorithms could generalize to new targets. In a real application, the available training data will generally not span the entire useable worksp ace. We compared the algorithms where reaches to all targets except the one being tested had been used to train the models. The performance of the MTM was significantly de graded unsurprisingly, as it was designed for reaches to a set of known targets. Performance of the mKFT and mEKFT degraded by about 1%, but not significantly (both p>0.7), demonstrating that the continuous approach to target information is preferable when the target could be anywhere in space, not just at locations for which training data is available. 6 Di scu ssi on and concl u si on s The goal of this work was to design a trajectory model that would improve decoding for BMIs with an application to reaching. We incorporated two features that prominently influence the dynamics of natural reach: the movement speed and the target location. Our approach is appropriate where uncertain target information is available. The model generalizes well to new regions of the workspace for which there is no training data, and across a broad range of reaching dynamics to widely spaced targets in three dimensions. The advantages over linear models in decoding precision we report here could be equally obtained using mixtures over many targets and speeds. While mixture models [11, 13] could allow for slow versus fast movements and any number of potential targets, this strategy will generally require many mixture components. Such an approach would require a lot more training data, as we have shown that it does not generalize well. It would also be run-time intensive which is problematic for prosthetic devices that rely on low power controllers. In contrast, the algorithm introduced here only takes a small amount of additional run-time in comparison to the standard KF approach. The EKF is only marginally slower than the standard KF and the algorithm will not generally need to consider more than 3 mixture components assuming the subject fixates the target within the second pre ceding the reach. In this paper we assumed that subjects always would fixate a reach target – along with other non-targets. While this is close to the way humans usually coordinate eyes and reaches [15], there might be cases where people do not fixate a reach target. Our approach could be easily extended to deal with such situations by adding a dummy mixture component that all ows the description of movements to any target. As an alternative to mixture approaches, a system can explicitly estimate the target position in the state vector [9]. This approach, however, would not straightforwardly allow for the rich target information available; we look at the target but also at other locations, strongly suggesting mixture distributions. A combination of the two approaches could further improve decoding quality. We could both estimate speed and target position for the EKFT in a continuous manner while retaining the mixture over target priors. We believe that the issues that we have addressed here are almost universal. Virtually all types of movements are executed at varying speed. A probabilistic distribution for a small number of action candidates may also be expected in most BMI applications – after all there are usually only a small number of actions that make sense in a given environment. While this work is presented in the context of decoding human reaching, it may be applied to a wide range of BMI applications including lower limb prosthetic devices and human computer interactions, as well as different signal sources such as electrode grid recordings and electroencephalograms. The increased user control in conveying their intended movements would significantly improve the functionality of a neuroprosthetic device. A c k n o w l e d g e me n t s T h e a u t h o r s t h a n k T. H a s w e l l , E . K r e p k o v i c h , a n d V. Ravichandran for assistance with experiments. This work was funded by the NSF Program in Cyber-Physical Systems. R e f e re n c e s [1] L.R. Hochberg, M.D. Serruya, G.M. Friehs, J.A. Mukand, M. Saleh, A.H. Caplan, A. Branner, D. 8 [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] Chen, R.D. Penn, and J.P. Donoghue, “Neuronal ensemble control of prosthetic devices by a human with tetraplegia,” Nature, vol. 442, 2006, pp. 164–171. W. Wu, Y. Gao, E. Bienenstock, J.P. Donoghue, and M.J. Black, “Bayesian population decoding of motor cortical activity using a Kalman filter,” Neural Computation, vol. 18, 2006, pp. 80–118. W. Wu, M.J. Black, Y. Gao, E. Bienenstock, M. Serruya, A. Shaikhouni, and J.P. Donoghue, “Neural decoding of cursor motion using a Kalman filter,” Advances in Neural Information Processing Systems 15: Proceedings of the 2002 Conference, 2003, p. 133. R.E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of basic Engineering, vol. 82, 1960, pp. 35–45. G.G. Scandaroli, G.A. Borges, J.Y. Ishihara, M.H. Terra, A.F.D. Rocha, and F.A.D.O. Nascimento, “Estimation of foot orientation with respect to ground for an above knee robotic prosthesis,” Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems, St. Louis, MO, USA: IEEE Press, 2009, pp. 1112-1117. I. Cikajlo, Z. Matjačić, and T. Bajd, “Efficient FES triggering applying Kalman filter during sensory supported treadmill walking,” Journal of Medical Engineering & Technology, vol. 32, 2008, pp. 133144. S. Kim, J.D. Simeral, L.R. Hochberg, J.P. Donoghue, and M.J. Black, “Neural control of computer cursor velocity by decoding motor cortical spiking activity in humans with tetraplegia,” Journal of Neural Engineering, vol. 5, 2008, pp. 455-476. L. Srinivasan, U.T. Eden, A.S. Willsky, and E.N. Brown, “A state-space analysis for reconstruction of goal-directed movements using neural signals,” Neural computation, vol. 18, 2006, pp. 2465–2494. G.H. Mulliken, S. Musallam, and R.A. Andersen, “Decoding trajectories from posterior parietal cortex ensembles,” Journal of Neuroscience, vol. 28, 2008, p. 12913. W. Wu, J.E. Kulkarni, N.G. Hatsopoulos, and L. Paninski, “Neural Decoding of Hand Motion Using a Linear State-Space Model With Hidden States,” IEEE Transactions on neural systems and rehabilitation engineering, vol. 17, 2009, p. 1. J.E. Kulkarni and L. Paninski, “State-space decoding of goal-directed movements,” IEEE Signal Processing Magazine, vol. 25, 2008, p. 78. C. Kemere and T. Meng, “Optimal estimation of feed-forward-controlled linear systems,” IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. Proceedings.(ICASSP'05), 2005. B.M. Yu, C. Kemere, G. Santhanam, A. Afshar, S.I. Ryu, T.H. Meng, M. Sahani, and K.V. Shenoy, “Mixture of trajectory models for neural decoding of goal-directed movements,” Journal of neurophysiology, vol. 97, 2007, p. 3763. N. Hatsopoulos, J. Joshi, and J.G. O'Leary, “Decoding continuous and discrete motor behaviors using motor and premotor cortical ensembles,” Journal of neurophysiology, vol. 92, 2004, p. 1165. R.S. Johansson, G. Westling, A. Backstrom, and J.R. Flanagan, “Eye-hand coordination in object manipulation,” Journal of Neuroscience, vol. 21, 2001, p. 6917. G. Wu, F.C. van der Helm, H.E.J. Veeger, M. Makhsous, P. Van Roy, C. Anglin, J. Nagels, A.R. Karduna, and K. McQuade, “ISB recommendation on definitions of joint coordinate systems of various joints for the reporting of human joint motion–Part II: shoulder, elbow, wrist and hand,” Journal of biomechanics, vol. 38, 2005, pp. 981–992. D. Simon, Optimal state estimation: Kalman, H [infinity] and nonlinear approaches, John Wiley and Sons, 2006. Z. Ghahramani and G.E. Hinton, “Parameter estimation for linear dynamical systems,” University of Toronto technical report CRG-TR-96-2, vol. 6, 1996. 9

4 0.99721104 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs

Author: Nikos Karampatziakis

Abstract: We cast the problem of identifying basic blocks of code in a binary executable as learning a mapping from a byte sequence to a segmentation of the sequence. In general, inference in segmentation models, such as semi-CRFs, can be cubic in the length of the sequence. By taking advantage of the structure of our problem, we derive a linear-time inference algorithm which makes our approach practical, given that even small programs are tens or hundreds of thousands bytes long. Furthermore, we introduce two loss functions which are appropriate for our problem and show how to use structural SVMs to optimize the learned mapping for these losses. Finally, we present experimental results that demonstrate the advantages of our method against a strong baseline. 1

5 0.99644566 32 nips-2010-Approximate Inference by Compilation to Arithmetic Circuits

Author: Daniel Lowd, Pedro Domingos

Abstract: Arithmetic circuits (ACs) exploit context-specific independence and determinism to allow exact inference even in networks with high treewidth. In this paper, we introduce the first ever approximate inference methods using ACs, for domains where exact inference remains intractable. We propose and evaluate a variety of techniques based on exact compilation, forward sampling, AC structure learning, Markov network parameter learning, variational inference, and Gibbs sampling. In experiments on eight challenging real-world domains, we find that the methods based on sampling and learning work best: one such method (AC2 -F) is faster and usually more accurate than loopy belief propagation, mean field, and Gibbs sampling; another (AC2 -G) has a running time similar to Gibbs sampling but is consistently more accurate than all baselines. 1

6 0.99616766 11 nips-2010-A POMDP Extension with Belief-dependent Rewards

7 0.99580246 100 nips-2010-Gaussian Process Preference Elicitation

8 0.99552399 133 nips-2010-Kernel Descriptors for Visual Recognition

9 0.99302238 108 nips-2010-Graph-Valued Regression

10 0.97971261 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

11 0.97547346 83 nips-2010-Evidence-Specific Structures for Rich Tractable CRFs

12 0.97197664 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering

13 0.97170812 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning

14 0.96883237 144 nips-2010-Learning Efficient Markov Networks

15 0.96683031 197 nips-2010-Optimal Bayesian Recommendation Sets and Myopically Optimal Choice Query Sets

16 0.96599954 212 nips-2010-Predictive State Temporal Difference Learning

17 0.96573365 94 nips-2010-Feature Set Embedding for Incomplete Data

18 0.96486509 118 nips-2010-Implicit Differentiation by Perturbation

19 0.96421194 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

20 0.96232951 61 nips-2010-Direct Loss Minimization for Structured Prediction