iccv iccv2013 iccv2013-441 knowledge-graph by maker-knowledge-mining

441 iccv-2013-Video Motion for Every Visible Point


Source: pdf

Author: Susanna Ricco, Carlo Tomasi

Abstract: Dense motion of image points over many video frames can provide important information about the world. However, occlusions and drift make it impossible to compute long motion paths by merely concatenating optical flow vectors between consecutive frames. Instead, we solve for entire paths directly, and flag the frames in which each is visible. As in previous work, we anchor each path to a unique pixel which guarantees an even spatial distribution of paths. Unlike earlier methods, we allow paths to be anchored in any frame. By explicitly requiring that at least one visible path passes within a small neighborhood of every pixel, we guarantee complete coverage of all visible points in all frames. We achieve state-of-the-art results on real sequences including both rigid and non-rigid motions with significant occlusions.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Instead, we solve for entire paths directly, and flag the frames in which each is visible. [sent-4, score-0.731]

2 As in previous work, we anchor each path to a unique pixel which guarantees an even spatial distribution of paths. [sent-5, score-0.646]

3 Unlike earlier methods, we allow paths to be anchored in any frame. [sent-6, score-0.611]

4 By explicitly requiring that at least one visible path passes within a small neighborhood of every pixel, we guarantee complete coverage of all visible points in all frames. [sent-7, score-0.806]

5 Introduction The goal of long-range, high-density motion estimation in video analysis is to compute the life of every point in a dense sampling of the visible surfaces in the scene. [sent-10, score-0.317]

6 The image projection of a scene point moves along a path in the image plane. [sent-11, score-0.425]

7 In a dense motion estimate, at least one path passes through every pixel of the sequence. [sent-13, score-0.578]

8 The computed paths can propagate to multiple frames any annotations or edits made in a single frame, thereby easing video labeling and editing. [sent-15, score-0.614]

9 If visible paths can be extrapolated into regions where they are occluded, the occluding object can be removed from the video by painting the pixels it occupies with the extrapolated colors. [sent-16, score-0.801]

10 Videos can be segmented into separate objects by clustering paths into coherent groups. [sent-17, score-0.568]

11 We find motion in all regions visible in any frame. [sent-26, score-0.248]

12 Lagrangian motion (b) only computes paths for points visible in the first or last frame. [sent-27, score-0.827]

13 To this end, we assume that (i) image paths live in a low-dimensional space, (ii) appearance remains approximately constant along the visible portion of a path, and (iii) exactly one world point is visible at every image point. [sent-33, score-0.95]

14 Like them—and several others—we assume that paths belong to a low-dimensional subspace. [sent-37, score-0.568]

15 We also anchor each path to a single pixel in the sequence, so as to keep paths from bunching up. [sent-38, score-1.214]

16 Similarly to LME, we also describe path visibility with a binary, per-frame flag, and cast motion estimation as energy minimization. [sent-39, score-0.748]

17 First, we do not have fixed “reference frames” to anchor paths into. [sent-41, score-0.802]

18 By default, LME selects the first and last frame as reference frames and estimates paths for only those scene points that are visible in one of those frames. [sent-42, score-0.969]

19 Figure 1 illustrates this limitation; LME misses large regions in an intermediate frame because the surfaces are not visible in either reference frame. [sent-43, score-0.325]

20 To guarantee that all visible surfaces are associated with paths, LME would have to select every frame as a reference frame, an approach that quickly becomes computationally infeasible for long videos. [sent-44, score-0.349]

21 The greater flexibility of our method allows for both anchors in any frame and more realistic regularization functionals leading to more accurate paths. [sent-47, score-0.238]

22 For example, in LME, the regularization terms only enforce consistency between paths anchored in the same reference frame; our method encourages consistency across all frames. [sent-48, score-0.712]

23 Third, we formulate the computation of the visibility flag as a Maximum a Posteriori (MAP) Markov Random Field (MRF) estimation problem, for which an efficient solution method is available. [sent-50, score-0.383]

24 This formulation allows for the explicit enforcement of the constraint that there must be some visible path at every pixel. [sent-51, score-0.566]

25 Their paths start in regions with sufficient texture, but cover more image regions than feature trackers like KLT [14] do. [sent-58, score-0.641]

26 High-cost paths end at suspected occlusions, and new ones are started to fill gaps. [sent-61, score-0.568]

27 Structure-from-motion methods regularize more globally by assuming rigid motion—a restrictive assumption—for which image paths can be proven [20] to lie in a space of low and known dimension. [sent-63, score-0.601]

28 These techniques precompute paths with frame-to-frame trackers, and de-noise them post facto by projection into a low-dimensional subspace. [sent-65, score-0.632]

29 More recent methods apply subspace constraints during path estimation to track points that are hard for a frame-to-frame tracker to follow. [sent-66, score-0.505]

30 Early approaches applied subspace constraints during optical flow estimation to improve estimates in untextured regions of rigid scenes [12] or sampled from a path subspace to improve motion estimates along intensity edges affected by the aperture problem [21]. [sent-67, score-0.841]

31 They compute full-length paths for every point in a selected reference frame. [sent-70, score-0.678]

32 LME finds paths by optimizing a global energy function over the entire video. [sent-73, score-0.594]

33 It models visibility explicitly, and reconnects paths across brief occlusions. [sent-74, score-0.834]

34 As explained earlier, we improve upon LME by removing its reliance on reference frames, handling visibility combinatorially rather than by approximate relaxation, and minimizing energy by direct optimization rather than variational methods. [sent-75, score-0.329]

35 Model Let p be an index into a set of paths xp(t) : T → R2, where T is the (discrete) time domain of the video sequence. [sent-78, score-0.568]

36 A path is said to be visible at time t iff its visibility flag νp(t) : T → {0, 1} is equal to 1 at time t. [sent-79, score-0.798]

37 Both quantities are to be computed for all paths in a given video sequence. [sent-81, score-0.568]

38 To ensure at least one visible path per pixel in every frame, we anchor xp(t) to point up in some frame τp by letting xp(τp) = up and νp(τp) = 1. [sent-82, score-0.959]

39 We require (and automatically select) enough anchor points to have some path pass through every pixel in the video. [sent-83, score-0.733]

40 Paths are assumed to be in the space spanned by a sequence-specific basis of paths {ϕ1, . . . , ϕK}. [sent-85, score-0.613]

41 The motion relative to the anchor point xp(τp) = up is determined by the unknown coefficients cp = (cp1, . . . , cpK). [sent-93, score-0.464]

42 Since paths in a video with F frames have F points, the standard basis over R2F can represent any path exactly. [sent-97, score-1.034]
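
As a concrete illustration of this parameterization, here is a minimal numpy sketch (the names Phi, c_p, u_p, tau_p are hypothetical); subtracting each basis path's value at the anchor frame is one simple way, assumed here, to guarantee xp(τp) = up for any choice of coefficients:

```python
import numpy as np

def reconstruct_path(u_p, tau_p, c_p, Phi):
    """Sketch of x_p(t) = u_p + sum_k c_pk * (phi_k(t) - phi_k(tau_p)).

    u_p : (2,) anchor position in frame tau_p
    c_p : (K,) motion coefficients for this path
    Phi : (K, F, 2) basis paths phi_1..phi_K over F frames
    """
    rel = Phi - Phi[:, tau_p:tau_p + 1, :]        # basis motion, zeroed at the anchor frame
    return u_p[None, :] + np.tensordot(c_p, rel, axes=1)  # (F, 2) positions x_p(t)
```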

43 Given basis paths and anchor points, we find paths and visibility flags by alternating between computing optimal paths given visibility and computing optimal visibility given paths. [sent-100, score-3.007]

44 Section 4 shows how to find the path basis and initial anchors, and Section 5 shows how to adjust the anchors and compute optimal paths and visibility. [sent-102, score-1.127]

45 Optimal paths Given a basis of paths and a set of anchors, we find the best motion coefficients for each path by minimizing an objective function that penalizes changes in appearance along a path (temporal smoothness) and differences between nearby paths (spatial smoothness). [sent-105, score-2.72]

46 The data term of the energy (2) measures the difference between the image intensity I(cp, t) = I(xp(t)) of the path in frame t and that at the anchor up in frame τp. [sent-114, score-0.848]

47 The multiplier αpq couples nearby paths that have similar appearance, and decays exponentially with the appearance difference between them. [sent-121, score-0.621]

48 A small patch (red dashed squares) around each path in every frame is transported along the current path estimates and monitored for consistent appearance. [sent-126, score-1.025]

49 The arm patch (top right) is most consistent, and makes this the controlling path at that point and frame. [sent-127, score-0.502]

50 Points along paths that either coincide with or are substantially parallel to a nearby controlling path have their observed visibility flag νˆp(t) set to 1. [sent-128, score-1.444]

51 Observed flags affect the estimated visibility flags at the nodes of an MRF that enforces spatial and temporal consistency of the flags and ensures that at least one path is visible at every pixel. [sent-130, score-1.572]

52 This coupling is active only if the path p is visible in the anchor frame of path q (that is, if νp(τq) = 1) and passes close enough to the anchor of q (that is, if ||xp(τq) − uq|| < Δ). [sent-131, score-1.493]
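
A toy evaluation of this objective might look as follows. This is a sketch, not the paper's exact functional: the robust penalty rho, the Gaussian form of αpq with width sigma, and the squared spatial term are all assumptions, and intensity(x, t) is a hypothetical sampler of frame t:

```python
import numpy as np

def path_energy(paths, vis, anchors, tau, intensity, rho, lam_S, sigma, Delta):
    """paths: (P, F, 2), vis: (P, F) binary flags, anchors: (P, 2),
    tau: (P,) anchor frames.  Temporal term: appearance change along
    each visible path.  Spatial term: alpha-weighted differences
    between coupled paths."""
    P, F, _ = paths.shape
    E = 0.0
    for p in range(P):
        I_anchor = intensity(anchors[p], tau[p])
        for t in range(F):
            if vis[p, t]:                        # brightness constancy while visible
                E += rho(intensity(paths[p, t], t) - I_anchor)
    for p in range(P):
        for q in range(P):
            if p == q:
                continue
            # couple p to q only if p is visible in q's anchor frame
            # and passes within Delta of q's anchor (as in the text)
            if vis[p, tau[q]] and np.linalg.norm(paths[p, tau[q]] - anchors[q]) < Delta:
                dI = intensity(paths[p, tau[q]], tau[q]) - intensity(anchors[q], tau[q])
                alpha = np.exp(-dI ** 2 / (2 * sigma ** 2))   # assumed Gaussian coupling
                E += lam_S * alpha * np.sum((paths[p] - paths[q]) ** 2)
    return E
```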

53 Optimal visibility The binary visibility flag νp(t) for each path and frame is modeled as an MRF whose structure depends on the current estimates xp(t) of the paths p ∈ P. [sent-135, score-1.732]

54 All points within distance Δ in the same frame form the spatial neighborhood of vp(t); the two points vp(t − 1) and vp(t + 1) that are temporally adjacent to vp(t) along path p form its temporal neighborhood. [sent-144, score-0.439]

55 Each node in the MRF is associated with a binary observed visibility flag νˆp(t) computed from the data as follows. [sent-145, score-0.405]

56 Path points in each frame are scored by their patch consistency, which measures how little a patch around vp(t) changes as it is transported by the current estimates of paths near vp(t) to (i) a few frames before and after time t, and (ii) the anchor frame τp for path p. [sent-146, score-1.601]

57 We use equation (11) from LME [16] to compute patch consistency and declare the controlling path at vp(t) to be the most consistent path through the spatial neighborhood of vp(t). [sent-147, score-0.892]

58 Let d̄pq = (1/F) Σt ||xp(t) − xq(t)|| (6) be the average distance between two paths, and let p∗ be the controlling path at vp(t). [sent-149, score-0.435]

59 Then, the observed visibility νˆp(t) is defined by equation (7) (see also Figure 2). [sent-150, score-0.266]

60 In words, a path p is observed to be visible at vp(t) when it either coincides with the controlling path p∗ at vp(t) (p = p∗, so that d̄pp∗ = 0) or is nearly parallel to it (d̄pp∗ below a small threshold). [sent-152, score-0.951]

61 Because we require that paths must be visible in their anchor frames, we also always set νˆp(τp) = 1. [sent-153, score-0.822]
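
A simplified sketch of this rule (equations (6)-(7)): the neighborhood bookkeeping is collapsed to a precomputed consistency score per path, and the parallelism threshold Delta is an assumed stand-in for the paper's constant:

```python
import numpy as np

def observed_flags(paths, consistency, Delta):
    """paths: (P, F, 2) current path estimates; consistency: (P,)
    patch-consistency scores at the point and frame under
    consideration (higher = more consistent).  Returns the observed
    visibility flags nu_hat at that point."""
    p_star = int(np.argmax(consistency))          # controlling path p*
    # d_bar_pp*: average per-frame distance to the controlling path, eq. (6)
    d_bar = np.linalg.norm(paths - paths[p_star], axis=2).mean(axis=1)
    # (independently of this test, nu_hat_p(tau_p) is always set to 1)
    return d_bar <= Delta   # visible iff coincident (d_bar = 0) or nearly parallel
```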

62 The observed visibility flags νˆp(t) influence the (hidden) visibility flags νp(t) through a data term in the MRF. [sent-154, score-0.984]

63 We define ΔIp(t), an average measure of the intensity change along the visible portion of path p. [sent-155, score-0.584]

64 The data terms λL|νp(t) − νˆp(t)| (9) bias estimated visibility values νp(t) toward observed values νˆp(t). [sent-162, score-0.288]

65 Setting a point to be visible incurs the additional charge ΔIp(t), equal to the change in intensity between anchor and current point. [sent-163, score-0.51]

66 The weights on edges between the random variables of the MRF encourage both temporal and spatial consistency among visibility values. [sent-165, score-0.328]

67 A penalty λT|νp(t) − νp(t + 1)| (10) is added between temporally adjacent neighbors to discourage changes of visibility along a path. [sent-168, score-0.314]

68 In words, ΔIpq(t) from equation (13) measures the difference in appearance between paths in a single frame, and ΔIpq measures a similar difference between anchor points. [sent-177, score-0.802]

69 The combined effect of these two terms is to push discontinuities in visibility closer to intensity boundaries, and the division by d¯pq reduces the spatial discontinuity penalty between unrelated paths. [sent-178, score-0.355]
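
The equations for these edge weights ((11)-(13)) did not survive extraction, so the combination below is only an assumed illustration of the two stated effects: appearance differences soften the disagreement penalty near intensity boundaries, and division by d̄pq weakens coupling between unrelated paths.

```python
import numpy as np

def spatial_edge_weight(lam_S, dI_frame, dI_anchor, d_bar_pq, eps=1e-6):
    """dI_frame: appearance difference between paths p and q in the
    current frame; dI_anchor: the analogous difference between their
    anchor points; d_bar_pq: average distance between the paths."""
    softening = np.exp(-(dI_frame ** 2 + dI_anchor ** 2))  # assumed form
    return lam_S * softening / (d_bar_pq + eps)            # weaker for distant paths
```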

70 To enforce the physical constraint that there must be some visible point at every pixel, we clamp some visibility values to 1 and remove the corresponding nodes from the MRF. [sent-179, score-0.5]

71 Specifically, we require that νp(τp) = 1 and νp∗(t) = 1 with p∗ a controlling path in frame t. [sent-180, score-0.534]
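
Since the flags are binary and the pairwise penalties reward agreement, the MAP labeling is a standard graph-cut problem. Below is a sketch using the PyMaxflow package (the paper does not name its solver, so this choice is an assumption); clamped nodes are handled by making the invisible label prohibitively expensive:

```python
import maxflow  # PyMaxflow: pip install PyMaxflow

def solve_visibility(unary, edges, clamp_visible, INF=1e9):
    """unary: list of (cost_if_invisible, cost_if_visible) per MRF node;
    edges: list of (i, j, w) with penalty w when labels differ;
    clamp_visible: node indices forced to 1 (anchors, controlling paths)."""
    g = maxflow.Graph[float]()
    ids = g.add_nodes(len(unary))
    for i, (c0, c1) in enumerate(unary):
        # capacity to source is paid when the node takes label 1,
        # capacity to sink when it takes label 0
        g.add_tedge(ids[i], c1, c0)
    for i in clamp_visible:
        g.add_tedge(ids[i], 0.0, INF)      # label 0 becomes prohibitively costly
    for i, j, w in edges:
        g.add_edge(ids[i], ids[j], w, w)   # Potts-style disagreement penalty
    g.maxflow()
    return [g.get_segment(ids[i]) for i in range(len(unary))]  # 1 = visible
```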

72 Preliminaries Before we solve for motion and visibility, we select basis paths and an initial set of anchors, paths, and visibility flags. [sent-183, score-0.96]

73 Finding the basis paths Basis paths are obtained by first tracking a sparse set of feature points with a frame-to-frame tracker [14]. [sent-186, score-1.218]

74 This yields several tracks, that is, paths that do not necessarily extend through the entire sequence. [sent-187, score-0.568]

75 We scale path coordinates so that the mean per-path motion between frames is one pixel. [sent-197, score-0.502]
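
One common way to realize such a basis (an assumption here; the paper's exact recipe also handles tracks that do not span the whole sequence) is a truncated SVD of the matrix of complete, rescaled tracks:

```python
import numpy as np

def basis_from_tracks(tracks, K):
    """tracks: (N, F, 2) feature tracks spanning all F frames (this
    sketch assumes complete tracks).  Returns a (K, F, 2) path basis."""
    N, F, _ = tracks.shape
    M = tracks.reshape(N, 2 * F)
    # global rescaling: a simplification of the per-path rescaling
    # described above (mean motion between frames ~ one pixel)
    step = np.linalg.norm(np.diff(tracks, axis=1), axis=2).mean()
    M = (M - M.mean(axis=0, keepdims=True)) / max(step, 1e-9)
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:K].reshape(K, F, 2)          # top-K right singular vectors
```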

76 Initialization To cover every pixel in a video sequence with paths, we need to create a number of paths on the same order as the number of visible points in the sequence. [sent-200, score-0.861]

77 Placing anchor points at every pixel in the first and last frame at worst overestimates the true number of anchors needed by a factor of two. [sent-201, score-0.596]

78 We place additional anchors to cover visible regions that happen to be occluded in these two particular frames by following a procedure inspired by Sundaram et al. [sent-202, score-0.379]

79 We first concatenate optical flow vectors into multiframe tracks, which we break when the optical flow field fails a forward-backward consistency check or when the point is too close to a motion boundary (equations (5) and (6) from [19], respectively). [sent-204, score-0.383]
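
In code, the forward-backward check might look like this sketch (flow_fwd and flow_bwd are hypothetical samplers of the flow fields t→t+1 and t+1→t; the fixed tolerance stands in for the thresholds of equations (5)-(6) in [19]):

```python
import numpy as np

def forward_backward_ok(x, flow_fwd, flow_bwd, tol=0.5):
    """x: (2,) point in frame t.  Advect forward one frame, then back;
    a small round-trip error indicates the flow is trustworthy here."""
    x_fwd = x + flow_fwd(x)            # position in frame t+1
    x_back = x_fwd + flow_bwd(x_fwd)   # back to frame t
    return np.linalg.norm(x_back - x) < tol
```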

80 Track fragments that start in the first frame are converted into paths with anchor points in the first frame. [sent-207, score-1.038]

81 If the fragments are long enough, the initial coefficients for the path are computed by projecting the track onto the path basis. [sent-208, score-0.954]

82 Otherwise, we select coefficients by copying from nearby track fragments, picking the coefficients that create the path with the best brightness constancy measured over a few frames. [sent-209, score-0.62]
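
Projecting a fragment onto the path basis is a small least-squares problem; a sketch with illustrative names, consistent with the anchored parameterization above:

```python
import numpy as np

def project_fragment(frag_xy, frames, u_p, tau_p, Phi):
    """frag_xy: (m, 2) observed positions at the frame indices `frames`
    (an array); solves frag_xy(t) - u_p ~= sum_k c_k (phi_k(t) - phi_k(tau_p))."""
    rel = Phi - Phi[:, tau_p:tau_p + 1, :]              # (K, F, 2), zero at anchor
    A = rel[:, frames, :].reshape(Phi.shape[0], -1).T   # (2m, K) design matrix
    b = (frag_xy - u_p[None, :]).reshape(-1)            # (2m,) residual motion
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c
```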

83 Track fragments that reach the last frame are converted into paths anchored at points in the last frame using the same procedure. [sent-210, score-0.946]

84 The temporal extent of the track fragments provides an initial conservative estimate for the path visibility flags. [sent-211, score-0.814]

85 For all other track fragments, we place a (possible) anchor point in the frame in which they are initialized and convert to a path as before. [sent-212, score-0.795]

86 We iterate through these potential paths and only include the anchor points for those that differ from already included paths by more than an average of 2 pixels per frame. [sent-213, score-1.428]

87 Figure 3a shows the anchor points selected in this way for the marple7 sequence. [sent-214, score-0.271]

88 Colors other than gray are anchors, and similar colors correspond to similar sets of path coefficients. [sent-215, score-0.375]

89 Figure 3b shows the color-coded anchor points after convergence. [sent-218, score-0.271]

90 Optimization Starting with the paths and visibility flags constructed as described in Section 4.2. [sent-220, score-1.06]

91 We interleave two steps during optimization: a combinatorial optimization step finds visibility flags νp(t) for the current path estimates, and a continuous optimization step updates path coefficients cp given the current visibility estimates. [sent-221, score-1.715]
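
A skeleton of the interleaved scheme; the `steps` object bundles the operations described in this section, and all names here are placeholders rather than the authors' code:

```python
def optimize(paths, flags, steps, max_iters=50, tol=1.0):
    """Alternate combinatorial and continuous steps until every path
    moves by less than `tol` pixels in every frame."""
    for _ in range(max_iters):
        flags = steps.solve_visibility(paths)               # combinatorial step (MRF)
        paths = steps.regroup(paths, flags)                 # foreground/background swap
        new_paths = steps.update_coefficients(paths, flags) # continuous step on energy (2)
        steps.manage_anchors(new_paths, flags)              # add/remove anchors for coverage
        converged = steps.max_change(paths, new_paths) < tol
        paths = new_paths
        if converged:                                       # below one pixel everywhere
            break
    return paths, flags
```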

92 In the process, we add anchor points until every pixel in the sequence has at least one path through it, and remove anchors of invisible paths. [sent-222, score-0.895]

93 We stop when the maximum change in every path falls below one pixel in every frame. [sent-223, score-0.512]

94 The initial path estimates are often poor along occlusion boundaries, because visibility is not yet accounted for. [sent-224, score-0.709]

95 Because of this, we heuristically regroup paths between each combinatorial and continuous step to let foreground and background vie for paths between them. [sent-225, score-1.197]

96 We now describe the continuous step, path regrouping, combinatorial step, anchor management, and termination. [sent-226, score-0.67]

97 We update path coefficients by minimizing the energy function (2) via trust-region Newton conjugate gradients optimization [15]. [sent-228, score-0.484]

98 The sparsity pattern of the Hessian H changes during the optimization because the coupling coefficients αpq in equation (5) depend in turn on the path coefficients. [sent-230, score-0.457]

99 When computing successive conjugate gradients, we treat the terms αpq as constants—a good approximation for small path perturbations—and recompute them between full descent steps. [sent-231, score-0.397]
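
Using SciPy's trust-region Newton-CG as a stand-in for the solver of [15] (an assumption), freezing the couplings within each descent round might look like this; energy, grad, hessp, and recompute_alpha are hypothetical callables:

```python
from scipy.optimize import minimize

def continuous_step(c0, energy, grad, hessp, recompute_alpha, rounds=5):
    """energy/grad/hessp take (c, alpha); recompute_alpha(c) refreshes
    the couplings alpha_pq between full descent rounds, which are held
    constant inside each round (a good approximation for small
    path perturbations, as the text notes)."""
    c = c0
    for _ in range(rounds):
        alpha = recompute_alpha(c)          # refresh couplings for the new paths
        res = minimize(lambda x: energy(x, alpha), c, method='trust-ncg',
                       jac=lambda x: grad(x, alpha),
                       hessp=lambda x, v: hessp(x, v, alpha))
        c = res.x
    return c
```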

100 After 40 descent steps, we allow paths to copy their coefficients and visibility flags from one of their neighbors if doing so improves the path’s fit to data. [sent-233, score-1.15]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('paths', 0.568), ('path', 0.375), ('lme', 0.271), ('visibility', 0.266), ('anchor', 0.234), ('flags', 0.226), ('vp', 0.179), ('visible', 0.141), ('anchors', 0.139), ('flag', 0.117), ('xp', 0.108), ('frame', 0.099), ('pq', 0.091), ('motion', 0.081), ('marple', 0.08), ('fragments', 0.079), ('tracks', 0.073), ('ipq', 0.068), ('cp', 0.065), ('track', 0.064), ('mrf', 0.061), ('coefficients', 0.061), ('controlling', 0.06), ('cpk', 0.056), ('every', 0.05), ('flow', 0.05), ('frames', 0.046), ('crate', 0.045), ('miss', 0.045), ('basis', 0.045), ('ip', 0.043), ('anchored', 0.043), ('optical', 0.043), ('lagrangian', 0.042), ('multiframe', 0.041), ('estimates', 0.041), ('intensity', 0.041), ('ricco', 0.04), ('charge', 0.04), ('combinatorial', 0.039), ('points', 0.037), ('facto', 0.037), ('reference', 0.037), ('pixel', 0.037), ('transported', 0.035), ('passes', 0.035), ('extrapolated', 0.033), ('rigid', 0.033), ('consistency', 0.032), ('nearby', 0.031), ('incurs', 0.031), ('vq', 0.031), ('sundaram', 0.031), ('temporal', 0.03), ('xq', 0.029), ('copy', 0.029), ('subspace', 0.029), ('create', 0.028), ('post', 0.027), ('along', 0.027), ('neighborhood', 0.027), ('occluded', 0.027), ('energy', 0.026), ('penalty', 0.026), ('regions', 0.026), ('consecutive', 0.026), ('aperture', 0.025), ('patch', 0.023), ('invisible', 0.023), ('boundaries', 0.023), ('occlusions', 0.023), ('missing', 0.023), ('point', 0.023), ('division', 0.022), ('continuous', 0.022), ('conjugate', 0.022), ('multiplier', 0.022), ('surfaces', 0.022), ('node', 0.022), ('trackers', 0.021), ('changes', 0.021), ('iterate', 0.021), ('converted', 0.021), ('arm', 0.021), ('atot', 0.02), ('softens', 0.02), ('ppqq', 0.02), ('forwardbackward', 0.02), ('abte', 0.02), ('pans', 0.02), ('etim', 0.02), ('teller', 0.02), ('clamp', 0.02), ('interleave', 0.02), ('parlance', 0.02), ('opticalflow', 0.02), ('oma', 0.02), ('cip', 0.02), ('unmodeled', 0.02), ('oitfs', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 441 iccv-2013-Video Motion for Every Visible Point

Author: Susanna Ricco, Carlo Tomasi

Abstract: Dense motion of image points over many video frames can provide important information about the world. However, occlusions and drift make it impossible to compute long motion paths by merely concatenating optical flow vectors between consecutive frames. Instead, we solve for entire paths directly, and flag the frames in which each is visible. As in previous work, we anchor each path to a unique pixel which guarantees an even spatial distribution of paths. Unlike earlier methods, we allow paths to be anchored in any frame. By explicitly requiring that at least one visible path passes within a small neighborhood of every pixel, we guarantee complete coverage of all visible points in all frames. We achieve state-of-the-art results on real sequences including both rigid and non-rigid motions with significant occlusions.

2 0.29169032 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses

Author: Ryan Tokola, Wongun Choi, Silvio Savarese

Abstract: We present an approach to multi-target tracking that has expressive potential beyond the capabilities of chain-shaped hidden Markov models, yet has significantly reduced complexity. Our framework, which we call tracking-by-selection, is similar to tracking-by-detection in that it separates the tasks of detection and tracking, but it shifts temporal reasoning from the tracking stage to the detection stage. The core feature of tracking-by-selection is that it reasons about path hypotheses that traverse the entire video instead of a chain of single-frame object hypotheses. A traditional chain-shaped tracking-by-detection model is only able to promote consistency between one frame and the next. In tracking-by-selection, path hypotheses exist across time, and encouraging long-term temporal consistency is as simple as rewarding path hypotheses with consistent image features. One additional advantage of tracking-by-selection is that it results in a dramatically simplified model that can be solved exactly. We adapt an existing tracking-by-detection model to the tracking-by-selection framework, and show improved performance on a challenging dataset (introduced in [18]).

3 0.23951186 165 iccv-2013-Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies

Author: Min Sun, Wan Huang, Silvio Savarese

Abstract: Many methods have been proposed to solve the image classification problem for a large number of categories. Among them, methods based on tree-based representations achieve good trade-off between accuracy and test time efficiency. While focusing on learning a tree-shaped hierarchy and the corresponding set of classifiers, most of them [11, 2, 14] use a greedy prediction algorithm for test time efficiency. We argue that the dramatic decrease in accuracy at high efficiency is caused by the specific design choice of the learning and greedy prediction algorithms. In this work, we propose a classifier which achieves a better trade-off between efficiency and accuracy with a given tree-shaped hierarchy. First, we cast the classification problem as finding the best path in the hierarchy, and a novel branch-and-bound-like algorithm is introduced to efficiently search for the best path. Second, we jointly train the classifiers using a novel Structured SVM (SSVM) formulation with additional bound constraints. As a result, our method achieves a significant 4.65%, 5.43%, and 4.07% (relative 24.82%, 41.64%, and 109.79%) improvement in accuracy at high efficiency compared to state-of-the-art greedy “tree-based” methods [14] on Caltech-256 [15], SUN [32] and ImageNet 1K [9] dataset, respectively. Finally, we show that our branch-and-bound-like algorithm naturally ranks the paths in the hierarchy (Fig. 8) so that users can further process them.

4 0.16898789 153 iccv-2013-Face Recognition Using Face Patch Networks

Author: Chaochao Lu, Deli Zhao, Xiaoou Tang

Abstract: When face images are taken in the wild, the large variations in facial pose, illumination, and expression make face recognition challenging. The most fundamental problem for face recognition is to measure the similarity between faces. The traditional measurements such as various mathematical norms, Hausdorff distance, and approximate geodesic distance cannot accurately capture the structural information between faces in such complex circumstances. To address this issue, we develop a novel face patch network, based on which we define a new similarity measure called the random path (RP) measure. The RP measure is derived from the collective similarity of paths by performing random walks in the network. It can globally characterize the contextual and curved structures of the face space. To apply the RP measure, we construct two kinds of networks: the in-face network and the out-face network. The in-face network is drawn from any two face images and captures the local structural information. The out-face network is constructed from all the training face patches, thereby modeling the global structures of face space. The two face networks are structurally complementary and can be combined together to improve the recognition performance. Experiments on the Multi-PIE and LFW benchmarks show that the RP measure outperforms most of the state-of-the-art algorithms for face recognition.

5 0.16286606 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction

Author: Andrew Owens, Jianxiong Xiao, Antonio Torralba, William Freeman

Abstract: We present a data-driven method for building dense 3D reconstructions using a combination of recognition and multi-view cues. Our approach is based on the idea that there are image patches that are so distinctive that we can accurately estimate their latent 3D shapes solely using recognition. We call these patches shape anchors, and we use them as the basis of a multi-view reconstruction system that transfers dense, complex geometry between scenes. We “anchor” our 3D interpretation from these patches, using them to predict geometry for parts of the scene that are relatively ambiguous. The resulting algorithm produces dense reconstructions from stereo point clouds that are sparse and noisy, and we demonstrate it on a challenging dataset of real-world, indoor scenes.

6 0.13754618 429 iccv-2013-Tree Shape Priors with Connectivity Constraints Using Convex Relaxation on General Graphs

7 0.13584669 289 iccv-2013-Network Principles for SfM: Disambiguating Repeated Structures with Local Context

8 0.1321189 307 iccv-2013-Parallel Transport of Deformations in Shape Space of Elastic Surfaces

9 0.11695971 317 iccv-2013-Piecewise Rigid Scene Flow

10 0.1095712 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning

11 0.088005848 160 iccv-2013-Fast Object Segmentation in Unconstrained Video

12 0.086427897 450 iccv-2013-What is the Most EfficientWay to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search?

13 0.084878087 282 iccv-2013-Multi-view Object Segmentation in Space and Time

14 0.081451327 389 iccv-2013-Shortest Paths with Curvature and Torsion

15 0.073924087 171 iccv-2013-Fix Structured Learning of 2013 ICCV paper k2opt.pdf

16 0.073710352 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments

17 0.073126405 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations

18 0.073098458 220 iccv-2013-Joint Deep Learning for Pedestrian Detection

19 0.072800457 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation

20 0.072515041 139 iccv-2013-Elastic Fragments for Dense Scene Reconstruction


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.157), (1, -0.077), (2, -0.004), (3, 0.059), (4, 0.016), (5, 0.037), (6, -0.04), (7, 0.048), (8, 0.028), (9, 0.009), (10, -0.063), (11, 0.023), (12, 0.05), (13, 0.082), (14, 0.027), (15, 0.086), (16, -0.036), (17, -0.059), (18, 0.002), (19, 0.045), (20, -0.017), (21, -0.063), (22, 0.083), (23, -0.048), (24, -0.075), (25, -0.05), (26, -0.036), (27, -0.135), (28, 0.003), (29, -0.006), (30, -0.067), (31, -0.1), (32, -0.011), (33, -0.092), (34, 0.013), (35, 0.055), (36, 0.046), (37, 0.056), (38, 0.195), (39, -0.151), (40, -0.123), (41, -0.089), (42, 0.256), (43, 0.187), (44, 0.0), (45, -0.233), (46, 0.119), (47, 0.04), (48, 0.098), (49, -0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96787328 441 iccv-2013-Video Motion for Every Visible Point

Author: Susanna Ricco, Carlo Tomasi

Abstract: Dense motion of image points over many video frames can provide important information about the world. However, occlusions and drift make it impossible to compute long motion paths by merely concatenating optical flow vectors between consecutive frames. Instead, we solve for entire paths directly, and flag the frames in which each is visible. As in previous work, we anchor each path to a unique pixel which guarantees an even spatial distribution of paths. Unlike earlier methods, we allow paths to be anchored in any frame. By explicitly requiring that at least one visible path passes within a small neighborhood of every pixel, we guarantee complete coverage of all visible points in all frames. We achieve state-of-the-art results on real sequences including both rigid and non-rigid motions with significant occlusions.

2 0.65892029 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses

Author: Ryan Tokola, Wongun Choi, Silvio Savarese

Abstract: We present an approach to multi-target tracking that has expressive potential beyond the capabilities of chain-shaped hidden Markov models, yet has significantly reduced complexity. Our framework, which we call tracking-by-selection, is similar to tracking-by-detection in that it separates the tasks of detection and tracking, but it shifts temporal reasoning from the tracking stage to the detection stage. The core feature of tracking-by-selection is that it reasons about path hypotheses that traverse the entire video instead of a chain of single-frame object hypotheses. A traditional chain-shaped tracking-by-detection model is only able to promote consistency between one frame and the next. In tracking-by-selection, path hypotheses exist across time, and encouraging long-term temporal consistency is as simple as rewarding path hypotheses with consistent image features. One additional advantage of tracking-by-selection is that it results in a dramatically simplified model that can be solved exactly. We adapt an existing tracking-by-detection model to the tracking-by-selection framework, and show improved performance on a challenging dataset (introduced in [18]).

3 0.57977223 307 iccv-2013-Parallel Transport of Deformations in Shape Space of Elastic Surfaces

Author: Qian Xie, Sebastian Kurtek, Huiling Le, Anuj Srivastava

Abstract: Statistical shape analysis develops methods for comparisons, deformations, summarizations, and modeling of shapes in given data sets. These tasks require a fundamental tool called parallel transport of tangent vectors along arbitrary paths. This tool is essential for: (1) computation of geodesic paths using either shooting or path-straightening method, (2) transferring deformations across objects, and (3) modeling of statistical variability in shapes. Using the square-root normal field (SRNF) representation of parameterized surfaces, we present a method for transporting deformations along paths in the shape space. This is difficult despite the underlying space being a vector space because the chosen (elastic) Riemannian metric is non-standard. Using a finite-basis for representing SRNFs of shapes, we derive expressions for Christoffel symbols that enable parallel transports. We demonstrate this framework using examples from shape analysis of parameterized spherical surfaces, in the three contexts mentioned above.

4 0.56145471 165 iccv-2013-Find the Best Path: An Efficient and Accurate Classifier for Image Hierarchies

Author: Min Sun, Wan Huang, Silvio Savarese

Abstract: Many methods have been proposed to solve the image classification problem for a large number of categories. Among them, methods based on tree-based representations achieve good trade-off between accuracy and test time efficiency. While focusing on learning a tree-shaped hierarchy and the corresponding set of classifiers, most of them [11, 2, 14] use a greedy prediction algorithm for test time efficiency. We argue that the dramatic decrease in accuracy at high efficiency is caused by the specific design choice of the learning and greedy prediction algorithms. In this work, we propose a classifier which achieves a better trade-off between efficiency and accuracy with a given tree-shaped hierarchy. First, we cast the classification problem as finding the best path in the hierarchy, and a novel branch-and-bound-like algorithm is introduced to efficiently search for the best path. Second, we jointly train the classifiers using a novel Structured SVM (SSVM) formulation with additional bound constraints. As a result, our method achieves a significant 4.65%, 5.43%, and 4.07% (relative 24.82%, 41.64%, and 109.79%) improvement in accuracy at high efficiency compared to state-of-the-art greedy “tree-based” methods [14] on Caltech-256 [15], SUN [32] and ImageNet 1K [9] dataset, respectively. Finally, we show that our branch-and-bound-like algorithm naturally ranks the paths in the hierarchy (Fig. 8) so that users can further process them.

5 0.48596698 429 iccv-2013-Tree Shape Priors with Connectivity Constraints Using Convex Relaxation on General Graphs

Author: Jan Stühmer, Peter Schröder, Daniel Cremers

Abstract: We propose a novel method to include a connectivity prior into image segmentation that is based on a binary labeling of a directed graph, in this case a geodesic shortest path tree. Specifically we make two contributions: First, we construct a geodesic shortest path tree with a distance measure that is related to the image data and the bending energy of each path in the tree. Second, we include a connectivity prior in our segmentation model that allows us to segment not only a single elongated structure, but instead a whole connected branching tree. Because both our segmentation model and the connectivity constraint are convex, a globally optimal solution can be found. To this end, we generalize a recent primal-dual algorithm for continuous convex optimization to an arbitrary graph structure. To validate our method we present results on data from medical imaging in angiography and retinal blood vessel segmentation.

6 0.46575671 289 iccv-2013-Network Principles for SfM: Disambiguating Repeated Structures with Local Context

7 0.45080358 153 iccv-2013-Face Recognition Using Face Patch Networks

8 0.43570331 387 iccv-2013-Shape Anchors for Data-Driven Multi-view Reconstruction

9 0.39698324 171 iccv-2013-Fix Structured Learning of 2013 ICCV paper k2opt.pdf

10 0.36335048 200 iccv-2013-Higher Order Matching for Consistent Multiple Target Tracking

11 0.3613914 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation

12 0.3378267 270 iccv-2013-Modeling Self-Occlusions in Dynamic Shape and Appearance Tracking

13 0.32992518 450 iccv-2013-What is the Most EfficientWay to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search?

14 0.32253441 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking

15 0.31680629 143 iccv-2013-Estimating Human Pose with Flowing Puppets

16 0.31640023 58 iccv-2013-Bayesian 3D Tracking from Monocular Video

17 0.29900143 274 iccv-2013-Monte Carlo Tree Search for Scheduling Activity Recognition

18 0.2933889 301 iccv-2013-Optimal Orthogonal Basis and Image Assimilation: Motion Modeling

19 0.28746384 430 iccv-2013-Two-Point Gait: Decoupling Gait from Body Shape

20 0.27656016 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.042), (26, 0.068), (31, 0.036), (42, 0.072), (64, 0.428), (73, 0.026), (89, 0.183), (98, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95226634 99 iccv-2013-Cross-View Action Recognition over Heterogeneous Feature Spaces

Author: Xinxiao Wu, Han Wang, Cuiwei Liu, Yunde Jia

Abstract: In cross-view action recognition, “what you saw” in one view is different from “what you recognize” in another view. The data distribution and even the feature space can change from one view to another because the appearance and motion of actions vary drastically across different views. In this paper, we address the problem of transferring action models learned in one view (source view) to another different view (target view), where action instances from these two views are represented by heterogeneous features. A novel learning method, called Heterogeneous Transfer Discriminant-analysis of Canonical Correlations (HTDCC), is proposed to learn a discriminative common feature space for linking source and target views to transfer knowledge between them. Two projection matrices that respectively map data from source and target views into the common space are optimized via simultaneously minimizing the canonical correlations of inter-class samples and maximizing the intraclass canonical correlations. Our model is neither restricted to corresponding action instances in the two views nor restricted to the same type of feature, and can handle only a few or even no labeled samples available in the target view. To reduce the data distribution mismatch between the source and target views in the common feature space, a nonparametric criterion is included in the objective function. We additionally propose a joint weight learning method to fuse multiple source-view action classifiers for recognition in the target view. Different combination weights are assigned to different source views, with each weight reflecting how much the corresponding source view contributes to the target view. The proposed method is evaluated on the IXMAS multi-view dataset and achieves promising results.

2 0.9449169 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking

Author: Naiyan Wang, Jingdong Wang, Dit-Yan Yeung

Abstract: This paper studies the visual tracking problem in video sequences and presents a novel robust sparse tracker under the particle filter framework. In particular, we propose an online robust non-negative dictionary learning algorithm for updating the object templates so that each learned template can capture a distinctive aspect of the tracked object. Another appealing property of this approach is that it can automatically detect and reject the occlusion and cluttered background in a principled way. In addition, we propose a new particle representation formulation using the Huber loss function. The advantage is that it can yield robust estimation without using trivial templates adopted by previous sparse trackers, leading to faster computation. We also reveal the equivalence between this new formulation and the previous one which uses trivial templates. The proposed tracker is empirically compared with state-of-the-art trackers on some challenging video sequences. Both quantitative and qualitative comparisons show that our proposed tracker is superior and more stable.

3 0.93941849 88 iccv-2013-Constant Time Weighted Median Filtering for Stereo Matching and Beyond

Author: Ziyang Ma, Kaiming He, Yichen Wei, Jian Sun, Enhua Wu

Abstract: Despite the continuous advances in local stereo matching for years, most efforts are on developing robust cost computation and aggregation methods. Little attention has been seriously paid to the disparity refinement. In this work, we study weighted median filtering for disparity refinement. We discover that with this refinement, even the simple box filter aggregation achieves comparable accuracy with various sophisticated aggregation methods (with the same refinement). This is due to the nice weighted median filtering properties of removing outlier error while respecting edges/structures. This reveals that the previously overlooked refinement can be at least as crucial as aggregation. We also develop the first constant time algorithmfor the previously time-consuming weighted median filter. This makes the simple combination “box aggregation + weighted median ” an attractive solution in practice for both speed and accuracy. As a byproduct, the fast weighted median filtering unleashes its potential in other applications that were hampered by high complexities. We show its superiority in various applications such as depth upsampling, clip-art JPEG artifact removal, and image stylization.

4 0.92559075 242 iccv-2013-Learning People Detectors for Tracking in Crowded Scenes

Author: Siyu Tang, Mykhaylo Andriluka, Anton Milan, Konrad Schindler, Stefan Roth, Bernt Schiele

Abstract: People tracking in crowded real-world scenes is challenging due to frequent and long-term occlusions. Recent tracking methods obtain the image evidence from object (people) detectors, but typically use off-the-shelf detectors and treat them as black box components. In this paper we argue that for best performance one should explicitly train people detectors on failure cases of the overall tracker instead. To that end, we first propose a novel joint people detector that combines a state-of-the-art single person detector with a detector for pairs of people, which explicitly exploits common patterns of person-person occlusions across multiple viewpoints that are a frequent failure case for tracking in crowded scenes. To explicitly address remaining failure modes of the tracker we explore two methods. First, we analyze typical failures of trackers and train a detector explicitly on these cases. And second, we train the detector with the people tracker in the loop, focusing on the most common tracker failures. We show that our joint multi-person detector significantly improves both detection accuracy as well as tracker performance, improving the state-of-the-art on standard benchmarks.

5 0.89631331 166 iccv-2013-Finding Actors and Actions in Movies

Author: P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid, J. Sivic

Abstract: We address the problem of learning a joint model of actors and actions in movies using weak supervision provided by scripts. Specifically, we extract actor/action pairs from the script and use them as constraints in a discriminative clustering framework. The corresponding optimization problem is formulated as a quadratic program under linear constraints. People in video are represented by automatically extracted and tracked faces together with corresponding motion features. First, we apply the proposed framework to the task of learning names of characters in the movie and demonstrate significant improvements over previous methods used for this task. Second, we explore the joint actor/action constraint and show its advantage for weakly supervised action learning. We validate our method in the challenging setting of localizing and recognizing characters and their actions in feature length movies Casablanca and American Beauty.

6 0.89374566 37 iccv-2013-Action Recognition and Localization by Hierarchical Space-Time Segments

same-paper 7 0.87710732 441 iccv-2013-Video Motion for Every Visible Point

8 0.85157347 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation

9 0.80890107 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes

10 0.79176086 303 iccv-2013-Orderless Tracking through Model-Averaged Posterior Estimation

11 0.77555549 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments

12 0.75238061 86 iccv-2013-Concurrent Action Detection with Structural Prediction

13 0.75034791 424 iccv-2013-Tracking Revisited Using RGBD Camera: Unified Benchmark and Baselines

14 0.74384248 240 iccv-2013-Learning Maximum Margin Temporal Warping for Action Recognition

15 0.72132307 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning

16 0.71826541 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation

17 0.70067871 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection

18 0.67224681 338 iccv-2013-Randomized Ensemble Tracking

19 0.66920781 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition

20 0.66890568 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking