iccv iccv2013 iccv2013-420 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jason Chang, John W. Fisher_III
Abstract: We present an integrated probabilistic model for layered object tracking that combines dynamics on implicit shape representations, topological shape constraints, adaptive appearance models, and layered flow. The generative model combines the evolution of appearances and layer shapes with a Gaussian process flow and explicit layer ordering. Efficient MCMC sampling algorithms are developed to enable a particle filtering approach while reasoning about the distribution of object boundaries in video. We demonstrate the utility of the proposed tracking algorithm on a wide variety of video sources while achieving state-of-the-art results on a boundary-accurate tracking dataset.
Reference: text
sentIndex sentText sentNum sentScore
1 We present an integrated probabilistic model for layered object tracking that combines dynamics on implicit shape representations, topological shape constraints, adaptive appearance models, and layered flow. [sent-3, score-0.655]
2 The generative model combines the evolution of appearances and layer shapes with a Gaussian process flow and explicit layer ordering. [sent-4, score-1.052]
3 Efficient MCMC sampling algorithms are developed to enable a particle filtering approach while reasoning about the distribution of object boundaries in video. [sent-5, score-0.247]
4 We demonstrate the utility of the proposed tracking algorithm on a wide variety of video sources while achieving state-of-the-art results on a boundary-accurate tracking dataset. [sent-6, score-0.108]
5 The resulting probabilistic model for layered object tracking combines dynamic appearance and shape models, topology constraints, and Gaussian process flow. [sent-18, score-0.443]
6 Furthermore, we also formulate a novel shape sampler that addresses a flaw in the formulation of [5] and is directly applicable to general segmentation problems. [sent-22, score-0.427]
7 The layered appearance model described herein is closely related to [15, 24], but as described in Section 2, differs explicitly by coupling occlusions and disocclusions with the layer supports. [sent-32, score-0.719]
8 Similarly, Section 3 discusses how our flow model generalizes that of [32] by using Gaussian processes. [sent-33, score-0.191]
9 A notable exception is [23] which uses an unconventional particle filter that propagates particles with a number of gradient descent iterations instead of a randomized proposal. [sent-45, score-0.104]
10 Additionally, the dynamics do not correspond to an underlying object motion, and therefore, do not couple the changes in appearance and shape. [sent-47, score-0.148]
11 To our knowledge, only [25] and [10] (optimizationbased methods) consider topology constraints in video analysis. [sent-48, score-0.094]
12 The first uses digital topology to penalize merges of two connected components (each corresponding to a separate object). [sent-49, score-0.094]
13 Similarly, the second extends binary topology to M-ary topology to restrict objects from merging. [sent-51, score-0.253]
14 This method does not use an appearance or flow model and also does not handle occlusions that may split the visible support of objects. [sent-52, score-0.412]
15 [27, 28] estimate motion and reason about depth ordering in a layered model. [sent-54, score-0.336]
16 While [28] achieves state-of-the-art optical flow results, it does so at considerable computational expense. [sent-58, score-0.191]
17 Furthermore, they do not allow for temporal evolution of layer ordering. [sent-59, score-0.429]
18 Processing times on current standard computer hardware are on the order of hours per frame as compared to approximately one minute per frame for the proposed approach. [sent-60, score-0.095]
19 In-Frame Appearance Scene models comprise M layers, where each object lies in one layer and one layer is the designated background. [sent-67, score-0.412]
20 The binary support variable ℓtm,i = 1 iff pixel i is in the support of layer m at time t. [sent-71, score-0.476]
21 The visible layer at pixel i, denoted vit, can then be expressed as the front-most layer, under the ordering z, among the layers with ℓtm,i = 1 (Equation 1). [sent-73, score-0.527]
22 Associated with each layer is a pixel-wise appearance model, atm ∈ RN×3, where atm,i is the 3-dimensional color (we use the La*b* colorspace) at pixel i. [sent-75, score-0.58]
23 Assuming Gaussian observation noise, the observed image xt is generated by xit | at, ℓt, zt ∼ N(atvit,i , σx2 I). (2) [sent-76, score-0.17]
24 We note that the visible layer index, vit, is implicitly dependent on zt through Equation 1. [sent-78, score-0.205]
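As a concrete illustration of Equations 1 and 2, the sketch below computes the visible layer from binary support masks and a layer order, and evaluates the Gaussian image likelihood. It is a minimal sketch, not the authors' implementation; the array layout, the noise level sigma_x, and the convention that smaller values of z are closer to the camera are all assumptions.

```python
import numpy as np

def visible_layer(supports, z):
    """Equation 1: the visible layer at each pixel is the front-most layer
    (assumed here to be the one with the smallest order value z) among the
    layers whose support contains that pixel."""
    # supports: (M, N) binary, supports[m, i] = 1 if pixel i lies in layer m
    # z:        (M,) layer order; the background layer is assumed to cover all pixels
    depth = np.where(supports.astype(bool), z[:, None], np.inf)
    return np.argmin(depth, axis=0)                      # (N,) visible layer index

def image_log_likelihood(x, appearances, v, sigma_x=0.05):
    """Equation 2: observed colors are the visible layer's appearance plus
    isotropic Gaussian noise with standard deviation sigma_x (assumed value)."""
    # x: (N, 3) observed colors, appearances: (M, N, 3), v: (N,)
    mu = appearances[v, np.arange(x.shape[0])]           # appearance of the visible layer
    resid = x - mu
    return (-0.5 * np.sum(resid ** 2) / sigma_x ** 2
            - x.size * np.log(sigma_x * np.sqrt(2.0 * np.pi)))
```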
25 While [15] and [24] use similar models, they do not address the appearance of occluded or disoccluded pixels which we now discuss. [sent-82, score-0.211]
26 m, am, and fm are the support, appearance, and flow for layer m, and z controls the layer order. [sent-84, score-1.034]
27 Temporal Appearance Dynamics Given the appearance and support of the previous frame, the dynamics of these random variables evolve jointly based on the underlying motion of the layer. [sent-88, score-0.214]
28 We describe the motion model for layer m from frame (t − 1) to t, denoted fmt, in Section 3. [sent-89, score-0.409]
29 Consequently, we define a probability distribution over the evolving appearance model for each of these categories. [sent-93, score-0.156]
30 For example, the disoccluded pixels in Figure 2b are likely to be green or black. [sent-97, score-0.158]
31 Appearances in Rtm, on the other hand, can be any color. [sent-105, score-0.124]
32 They are modeled as a mixture of a uniform distribution and a kernel density estimate of the observed appearances, where the kernel density k(a) is built from the appearances in Ot−1m and enters the mixture with weight αR. [sent-106, score-0.095]
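As a rough illustration of this mixture, the sketch below evaluates a density combining a uniform distribution over the color space with a Gaussian kernel density estimate built from previously observed appearances. The mixture weight alpha_R, the kernel bandwidth, and the color-space volume are illustrative values, not the paper's.

```python
import numpy as np

def revealed_appearance_density(a, observed, alpha_R=0.5, bandwidth=0.1, volume=1.0):
    """Mixture of a uniform density and a KDE over previously observed colors."""
    # a: (3,) candidate color; observed: (K, 3) colors seen for this layer so far
    uniform = 1.0 / volume                                # uniform component over the color space
    diff = observed - a
    sq = np.sum(diff ** 2, axis=1) / bandwidth ** 2
    norm = (2.0 * np.pi * bandwidth ** 2) ** 1.5          # 3-d Gaussian kernel normalizer
    kde = np.mean(np.exp(-0.5 * sq)) / norm               # average of Gaussian kernels
    return (1.0 - alpha_R) * uniform + alpha_R * kde
```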
33 Temporal Support Dynamics The evolution of each layer support is coupled to the evolution of its appearance. [sent-114, score-0.546]
34 However, by introducing disoccluded and revealed pixels, we must also allow the support of each layer to deviate from the flow-aligned previous support. [sent-115, score-0.607]
35 Consequently, layer supports are chosen to evolve according to a distribution proportional to an exponentiated symmetric area difference (SAD) between the new support and the flow-aligned previous support, where SAD(A, B) = |A \ B| + |B \ A|. [sent-117, score-0.617]
36 We additionally impose a curve length penalty on the shape of each layer and enforce each layer to be a single connected component. [sent-127, score-0.787]
37 The resulting temporal dynamics on layer support combine this exponentiated SAD term with the curve-length and connectivity constraints. [sent-128, score-0.584]
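To make the SAD term concrete, the sketch below computes the symmetric area difference between a candidate support and the flow-aligned previous support, and the corresponding unnormalized exponentiated score; the temperature lam is an illustrative parameter, and the curve-length and connectivity terms are omitted.

```python
import numpy as np

def symmetric_area_difference(support_a, support_b):
    """SAD of two binary masks: the number of pixels in exactly one of the two supports."""
    a = support_a.astype(bool)
    b = support_b.astype(bool)
    return int(np.sum(a ^ b))

def support_score(candidate, aligned_prev, lam=1.0):
    """Unnormalized score proportional to exp(-lam * SAD); lam is illustrative."""
    return np.exp(-lam * symmetric_area_difference(candidate, aligned_prev))
```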
38 For example, an object that splits into two will violate the topology prior of the model. [sent-161, score-0.128]
39 Gaussian Process Flow Having detailed the observation model conditioned on the layered flow fields, we describe a flow model comprised of layered Gaussian processes (GPs). [sent-165, score-0.882]
40 A GP can be parametrized with a mean and a covariance function. [sent-166, score-0.092]
41 We restrict the model to zero-mean GPs with stationary covariance kernels which, as shown in Section 5, enables an efficient sampling-based inference method. [sent-169, score-0.244]
42 Figure 3: Samples from different GPs: (a) sparse neighborhood precision; (b) SE kernel with layered composition using the layers of (c). [sent-170, score-0.332]
43 In practice, however, these formulations have been applied to object trajectories, whereas here we consider their application to dense flow between frames. [sent-175, score-0.191]
44 GPs are not typically used to model flow for two reasons: (1) GPs are often smooth everywhere, causing errors across object boundaries; and (2) naïve inference requires inverting covariance matrices that do not scale well with the image size. [sent-176, score-0.36]
45 One particular GP covariance kernel is closely related to the L2 penalty in the Horn-Schunck optical flow formulation [13]. [sent-178, score-0.324]
46 However, that L2 penalty on flow differences results in an improper distribution with a rank deficient covariance matrix. [sent-179, score-0.355]
47 Figure 3a shows a sample from this type of GP, which fails to capture long range correlations in flow fields and, concurrently, overly penalizes discontinuities across object boundaries. [sent-181, score-0.261]
48 Following [32], we instead compose smooth, layered flows using fi = fvi,i ∀i ∈ {1, ··· , N}. [sent-182, score-0.24]
49 (10) Here, fm is the flow for layer m, and f is the composite flow for visible pixels. [sent-183, score-0.919]
50 Similar to optical flow, we assume that the x and y components of flow are independent. [sent-184, score-0.191]
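A minimal sketch of the layered composition in Equation 10: each pixel takes the flow of its visible layer. The array layout is an assumption, and the x and y components are simply the two columns of the flow arrays.

```python
import numpy as np

def compose_layered_flow(layer_flows, v):
    """Equation 10: f_i = f_{v_i, i}, the composite flow at the visible pixels."""
    # layer_flows: (M, N, 2) per-layer flow; v: (N,) visible layer index per pixel
    return layer_flows[v, np.arange(v.shape[0])]          # (N, 2)
```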
51 Unlike [32], which uses layered flows of the form of [13], a GP flow is a valid prior distribution that can capture long-range dependency. [sent-189, score-0.537]
52 The associated graphical structure is fixed and depends on the covariance kernel. [sent-191, score-0.128]
53 The SE kernel often results in overly smoothed flow estimates that do not adequately capture object deformation. [sent-200, score-0.264]
54 We mitigate this issue by adopting a covariance kernel that is the composition of the SE kernel and a delta function. [sent-201, score-0.207]
55 It is easily verified that the resulting GP follows the distribution p(fm) = N(fm ; 0, Σg + σf2I), (11) where Σg is the resulting covariance from the SE kernel, and σf2I is the resulting covariance from the delta kernel. [sent-202, score-0.289]
56 Notation is slightly abused since each component of the flow is not explicitly written. [sent-203, score-0.191]
57 We show in Section 5 how to exploit a decomposition of this flow for efficient inference. [sent-204, score-0.191]
58 In particular, we decompose the flow into the smooth flow (denoted gm) and the remaining independent part with p(gm) = N(gm ; 0, Σg) , p(fm|gm) = N(fm ; gm, σf2I). [sent-205, score-0.416]
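The decomposition just described can be illustrated with a small dense sketch: draw the smooth part gm from a zero-mean GP with a squared-exponential kernel and add independent Gaussian noise to obtain fm. The grid size, length scale, and variances are illustrative, and this dense construction is only feasible for small problems; the paper's sampler avoids it.

```python
import numpy as np

def se_covariance(coords, length_scale=5.0, variance=1.0):
    """Squared-exponential (SE) kernel over pixel coordinates: (N, 2) -> (N, N)."""
    d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def sample_layer_flow(coords, sigma_f=0.2, rng=None):
    """One flow component: g ~ N(0, Sigma_g), f | g ~ N(g, sigma_f^2 I)."""
    rng = np.random.default_rng() if rng is None else rng
    n = coords.shape[0]
    cov = se_covariance(coords) + 1e-8 * np.eye(n)        # jitter for numerical stability
    g = rng.multivariate_normal(np.zeros(n), cov)         # smooth part
    f = g + sigma_f * rng.standard_normal(n)              # add the independent part
    return g, f

# Example on a small grid; the x and y flow components are sampled independently.
ys, xs = np.mgrid[0:16, 0:16]
coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
g, f = sample_layer_flow(coords)
```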
59 Shape Sampling Having developed a dynamic appearance and shape model, we present a Bayesian filtering procedure that reasons about the distribution of the hidden variables conditioned on past and current observations. [sent-207, score-0.232]
60 A typical approach is to use a particle filter [14], where the distribution at any time is represented by a set of weighted samples. [sent-209, score-0.131]
61 Here, we propose a more accurate particle propagation approach by incorporating both the prior and the data likelihood terms. [sent-216, score-0.093]
62 Though more accurate, this approach is typically avoided because sampling from the full conditional distribution can be computationally prohibitive. [sent-219, score-0.119]
63 A key enabler is our extension of [5] to efficiently sample the support of shapes. [sent-220, score-0.104]
64 MCMC samplers construct a transition distribution (often referred to as the proposal distribution) such that the stationary distribution of the chain is the target distribution. [sent-226, score-0.32]
65 The Metropolis-Hastings (MH) algorithm [12] enforces the correct stationary distribution by generating proposals and accepting or rejecting them according to an acceptance probability. [sent-227, score-0.145]
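For reference, a generic Metropolis-Hastings step of the kind referred to here is sketched below; it is a textbook sketch, not the GIMH-specific proposal, and log_target, propose, and log_proposal are placeholders.

```python
import numpy as np

def mh_step(x, log_target, propose, log_proposal, rng):
    """One Metropolis-Hastings step. log_proposal(a, b) returns log q(a | b),
    the log probability of proposing a when the chain is at b."""
    x_new = propose(x, rng)
    # accept with probability min(1, pi(x') q(x | x') / (pi(x) q(x' | x)))
    log_ratio = (log_target(x_new) + log_proposal(x, x_new)
                 - log_target(x) - log_proposal(x_new, x))
    if np.log(rng.random()) < min(0.0, log_ratio):
        return x_new, True
    return x, False
```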
66 The Gibbs-Inspired Metropolis Hastings shape sampler (GIMH) [5] uses the MH-MCMC algorithm. [sent-247, score-0.341]
67 It assumes that the target distribution only depends on the sign of the level set function and can be expressed as a product of a prior and independent likelihoods. [sent-249, score-0.151]
68 By calculating local changes in curve length, GIMH achieves extremely fast burn-in times and can control the topology of the resulting sample. [sent-254, score-0.094]
69 Both of these aspects are directly applicable to the prior we place on layers described in Section 2. [sent-255, score-0.114]
70 GIMH generates a proposal by adding a random constant to the level set function in a random subset of pixels. [sent-256, score-0.103]
71 Because of the ordering implied by the level set function, the number of possible configurations is only linear in the size of the subset (as opposed to the exponential number of total configurations). [sent-257, score-0.125]
72 A proposal distribution is chosen such that Equation 14 evaluates to 1 and every proposal is accepted. [sent-258, score-0.278]
73 However, due to a minor flaw in the formulation, GIMH does not preserve the correct stationary distribution. [sent-259, score-0.159]
74 Permutation-Based GIMH Instead of implicitly ordering pixels with a level set function as in GIMH, we consider explicitly using a random total ordering, o, on all pixels. [sent-263, score-0.166]
75 We define a consistent ordering as an ordering that places all pixels inside the current support before all pixels outside it. [sent-266, score-0.291]
76 In fact, the flaw of [5] is resolved via a simple sampling procedure while preserving the correct stationary distribution. [sent-280, score-0.206]
77 Conditioned on W, a relative ordering is defined to be an ordering on the pixels only in W. [sent-282, score-0.291]
78 We note that a relative ordering derived from a consistent total ordering must also be consistent. [sent-283, score-0.25]
79 Sample a consistent relative ordering of pixels in W uniformly using a Knuth shuffle [17] on each Wl. [sent-293, score-0.166]
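This sampling of a consistent relative ordering can be sketched as a Fisher-Yates (Knuth) shuffle within each group Wl, followed by concatenating the groups in their fixed order; the convention that pixels currently inside the support are ordered before those outside is an assumption based on the description above.

```python
import numpy as np

def knuth_shuffle(items, rng):
    """Fisher-Yates (Knuth) shuffle, uniform over permutations."""
    items = list(items)
    for i in range(len(items) - 1, 0, -1):
        j = rng.integers(0, i + 1)
        items[i], items[j] = items[j], items[i]
    return items

def consistent_relative_ordering(groups, rng):
    """Shuffle each group W_l independently and concatenate the groups in their
    given order, so the relative ordering stays consistent with the grouping."""
    order = []
    for group in groups:
        order.extend(knuth_shuffle(group, rng))
    return order

# Example: pixels assumed to be grouped as inside-the-support then outside.
rng = np.random.default_rng(0)
print(consistent_relative_ordering([[3, 7, 9], [1, 4]], rng))
```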
80 Inference The development of the PGIMH method allows one to quickly sample layer supports from an arbitrary distribution. [sent-304, score-0.461]
81 We now describe a sequential (in time) Gibbs sampler that utilizes PGIMH to perform tracking. [sent-305, score-0.31]
82 As noted, this type of sampler can be viewed as a particle filter without weight updates. [sent-306, score-0.369]
83 Joint inference of flow and layer supports often exhibits better convergence [27]. [sent-307, score-0.657]
84 Our sampler marginalizes out part of the flow and, owing to our particular formulation, infers hidden variables on the order of one minute per frame as compared to [27] which takes hours per frame. [sent-308, score-0.601]
85 Single Layer Sampler The sampler we use iterates over the following steps: gt ∼ p(gt|ft) (16), followed by sampling the layer supports (Equation 17) and the remaining hidden variables, including the layer order zt, conditioned on the others and the current observations (Equations 18–19). [sent-311, score-0.367]
86 Sampling from gt|ft is equivalent to sampling a GP with observations. [sent-318, score-0.104]
87 Using standard GP results (c.f. [22]), this distribution can be expressed as gmt|fmt ∼ N(μg∗, Σg∗), with μg∗ = Σ[Σ + σf2I]−1fmt and Σg∗ = Σ − Σ[Σ + σf2I]−1Σ. (20) [sent-321, score-0.117]
88 By drawing on the work of [26], we show in the supplement that a GP with a stationary covariance kernel, k(x − x′), can be sampled efficiently. [sent-323, score-0.217]
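A direct dense evaluation of Equation 20, usable for small problems, might look like the sketch below; the paper's supplement describes the efficient version for stationary kernels. Here Sigma is the prior covariance of the smooth flow and sigma_f the independent noise level.

```python
import numpy as np

def sample_smooth_flow_posterior(f_obs, Sigma, sigma_f, rng=None):
    """Sample g | f from Equation 20:
    mu*    = Sigma (Sigma + sigma_f^2 I)^{-1} f
    Sigma* = Sigma - Sigma (Sigma + sigma_f^2 I)^{-1} Sigma
    """
    rng = np.random.default_rng() if rng is None else rng
    n = f_obs.shape[0]
    gain = Sigma @ np.linalg.inv(Sigma + sigma_f ** 2 * np.eye(n))
    mu = gain @ f_obs
    cov = Sigma - gain @ Sigma
    cov = 0.5 * (cov + cov.T) + 1e-10 * np.eye(n)   # symmetrize and add jitter
    return rng.multivariate_normal(mu, cov)
```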
89 Thousands of iterations of the PGIMH sampler in Algorithm 1 are used to sample the layer supports of Equation 17 in less than a second. [sent-335, score-0.771]
90 We randomly choose a layer, m, and sample its support from the corresponding proportional distribution (Equation 24). [sent-337, score-0.216]
91 We can marginalize fmt by approximating p(fmt,i|gmt,i) = N(fmt,i ; gmt,i, σf2) with a discrete FIR filter, hf. [sent-348, score-0.2]
92 Changing the mth layer accomplishes one of the following: (1) grow and occlude the currently visible layer; (2) grow behind the visible layer; (3) shrink and disocclude the next layer; (4) shrink behind the visible layer. [sent-356, score-0.674]
93 Furthermore, the Gaussian appearance dynamics and observation models allow for efficient marginalization of previously observed appearances. [sent-358, score-0.148]
94 For disoccluded and revealed appearances, we draw a sample from the distributions in Equations 4–5. [sent-359, score-0.201]
95 When the number of layers is larger, a swap proposal in a Metropolis-Hastings framework can be used. [sent-371, score-0.183]
96 Multiple Layer Sampler The second step of the preceding algorithm samples the support of a single layer at a time, which can exhibit slow convergence in certain situations. [sent-374, score-0.444]
97 For example, if layers m = 1, 2, 3 all have support at pixel i, but the actual visible layer should be the background (m = 3), the sampler must move through an intermediate state before making the background visible. [sent-375, score-0.938]
98 If the intermediate state is unlikely, this type of proposal can get stuck in a local extremum. [sent-376, score-0.103]
99 Consequently, we develop a single-pixel layer support sampler that samples all layers jointly. [sent-377, score-0.834]
100 First, a visible layer is sampled according to a categorical distribution in which Pr(vit = m) is proportional to the data likelihood p(xit|vit = m) times the prior probability of layer m being visible at pixel i. [sent-378, score-0.522]
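The per-pixel step can be sketched as a categorical draw proportional to the data likelihood times a prior weight for each layer; since the exact form of the prior term is not spelled out here, the prior argument is a placeholder.

```python
import numpy as np

def sample_visible_layer(likelihoods, prior, rng):
    """Draw v_i ~ Categorical(p) with p_m proportional to p(x_i | v_i = m) * prior_m."""
    # likelihoods: (M,) per-layer data likelihood at this pixel
    # prior:       (M,) placeholder prior weight for each layer being visible here
    p = likelihoods * prior
    p = p / p.sum()
    return rng.choice(len(p), p=p)
```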
wordName wordTfidf (topN-words)
[('layer', 0.378), ('sampler', 0.31), ('fatm', 0.229), ('gimh', 0.229), ('tm', 0.223), ('layered', 0.211), ('fmt', 0.2), ('flow', 0.191), ('gp', 0.181), ('pgimh', 0.172), ('zt', 0.133), ('ordering', 0.125), ('gps', 0.122), ('atm', 0.117), ('disoccluded', 0.117), ('vit', 0.105), ('proposal', 0.103), ('dynamics', 0.095), ('topology', 0.094), ('covariance', 0.092), ('fm', 0.087), ('flaw', 0.086), ('layers', 0.08), ('ql', 0.078), ('gm', 0.075), ('stationary', 0.073), ('distribution', 0.072), ('visible', 0.072), ('support', 0.066), ('particle', 0.059), ('gt', 0.057), ('csai', 0.057), ('gmt', 0.057), ('hastings', 0.057), ('omt', 0.057), ('otm', 0.057), ('rtm', 0.057), ('sad', 0.057), ('xit', 0.055), ('tracking', 0.054), ('appearances', 0.054), ('portions', 0.053), ('nw', 0.053), ('appearance', 0.053), ('mcmc', 0.052), ('supplement', 0.052), ('equation', 0.051), ('evolution', 0.051), ('metropolis', 0.051), ('optimizationbased', 0.051), ('gmrf', 0.047), ('jason', 0.047), ('disocclusions', 0.047), ('sampling', 0.047), ('revealed', 0.046), ('ft', 0.045), ('particles', 0.045), ('supports', 0.045), ('expressed', 0.045), ('conditioned', 0.044), ('colorspace', 0.044), ('inference', 0.043), ('moves', 0.042), ('kernel', 0.041), ('pixels', 0.041), ('shrink', 0.04), ('proportional', 0.04), ('orderings', 0.039), ('csail', 0.039), ('gaussian', 0.038), ('sample', 0.038), ('xt', 0.037), ('boundaries', 0.037), ('evolve', 0.037), ('trajectories', 0.037), ('graphical', 0.036), ('qs', 0.036), ('owing', 0.036), ('restrict', 0.036), ('smooth', 0.034), ('comprised', 0.034), ('prior', 0.034), ('minute', 0.033), ('gibbs', 0.033), ('delta', 0.033), ('enumerate', 0.033), ('pixel', 0.032), ('overly', 0.032), ('filtering', 0.032), ('shape', 0.031), ('frame', 0.031), ('john', 0.031), ('evolving', 0.031), ('occlusions', 0.03), ('pr', 0.03), ('extends', 0.029), ('batch', 0.029), ('flows', 0.029), ('se', 0.029), ('wl', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
Author: Jason Chang, John W. Fisher_III
Abstract: We present an integrated probabilistic model for layered object tracking that combines dynamics on implicit shape representations, topological shape constraints, adaptive appearance models, and layered flow. The generative model combines the evolution of appearances and layer shapes with a Gaussian process flow and explicit layer ordering. Efficient MCMC sampling algorithms are developed to enable a particle filtering approach while reasoning about the distribution of object boundaries in video. We demonstrate the utility of the proposed tracking algorithm on a wide variety of video sources while achieving state-of-the-art results on a boundary-accurate tracking dataset.
2 0.16048056 317 iccv-2013-Piecewise Rigid Scene Flow
Author: Christoph Vogel, Konrad Schindler, Stefan Roth
Abstract: Estimating dense 3D scene flow from stereo sequences remains a challenging task, despite much progress in both classical disparity and 2D optical flow estimation. To overcome the limitations of existing techniques, we introduce a novel model that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments. Scene flow estimation then amounts to jointly estimating the pixelto-segment assignment, and the 3D position, normal vector, and rigid motion parameters of a plane for each segment. The proposed energy combines an occlusion-sensitive data term with appropriate shape, motion, and segmentation regularizers. Optimization proceeds in two stages: Starting from an initial superpixelization, we estimate the shape and motion parameters of all segments by assigning a proposal from a set of moving planes. Then the pixel-to-segment assignment is updated, while holding the shape and motion parameters of the moving planes fixed. We demonstrate the benefits of our model on different real-world image sets, including the challenging KITTI benchmark. We achieve leading performance levels, exceeding competing 3D scene flow methods, and even yielding better 2D motion estimates than all tested dedicated optical flow techniques.
3 0.12319421 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
Author: Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
Abstract: Many state-of-the-art optical flow estimation algorithms optimize the data and regularization terms to solve ill-posed problems. In this paper, in contrast to the conventional optical flow framework that uses a single or fixed data model, we study a novel framework that employs locally varying data term that adaptively combines different multiple types of data models. The locally adaptive data term greatly reduces the matching ambiguity due to the complementary nature of the multiple data models. The optimal number of complementary data models is learnt by minimizing the redundancy among them under the minimum description length constraint (MDL). From these chosen data models, a new optical flow estimation energy model is designed with the weighted sum of the multiple data models, and a convex optimization-based highly effective and practical solution thatfinds the opticalflow, as well as the weights isproposed. Comparative experimental results on the Middlebury optical flow benchmark show that the proposed method using the complementary data models outperforms the state-ofthe art methods.
4 0.12018893 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal
Author: Yu Li, Michael S. Brown
Abstract: This paper introduces an automatic method for removing reflection interference when imaging a scene behind a glass surface. Our approach exploits the subtle changes in the reflection with respect to the background in a small set of images taken at slightly different view points. Key to this idea is the use of SIFT-flow to align the images such that a pixel-wise comparison can be made across the input set. Gradients with variation across the image set are assumed to belong to the reflected scenes while constant gradients are assumed to belong to the desired background scene. By correctly labelling gradients belonging to reflection or background, the background scene can be separated from the reflection interference. Unlike previous approaches that exploit motion, our approach does not make any assumptions regarding the background or reflected scenes' geometry, nor requires the reflection to be static. This makes our approach practical for use in casual imaging scenarios. Our approach is straight forward and produces good results compared with existing methods.
5 0.1171193 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
Author: Manjunath Narayana, Allen Hanson, Erik Learned-Miller
Abstract: In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
6 0.11253668 143 iccv-2013-Estimating Human Pose with Flowing Puppets
7 0.11107929 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
8 0.10356608 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
9 0.10304172 395 iccv-2013-Slice Sampling Particle Belief Propagation
10 0.10089276 263 iccv-2013-Measuring Flow Complexity in Videos
11 0.099526152 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
12 0.098452441 87 iccv-2013-Conservation Tracking
13 0.09506201 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
14 0.090152629 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
15 0.087495156 375 iccv-2013-Scene Collaging: Analysis and Synthesis of Natural Images with Semantic Layers
16 0.084760927 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
17 0.083593845 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
18 0.083039209 311 iccv-2013-Pedestrian Parsing via Deep Decompositional Network
19 0.081729278 225 iccv-2013-Joint Segmentation and Pose Tracking of Human in Natural Videos
20 0.080351681 298 iccv-2013-Online Robust Non-negative Dictionary Learning for Visual Tracking
topicId topicWeight
[(0, 0.185), (1, -0.064), (2, 0.018), (3, 0.044), (4, 0.036), (5, -0.007), (6, -0.032), (7, 0.083), (8, 0.03), (9, 0.022), (10, -0.072), (11, -0.032), (12, 0.127), (13, 0.017), (14, 0.001), (15, 0.001), (16, -0.052), (17, 0.044), (18, 0.081), (19, 0.045), (20, 0.028), (21, -0.046), (22, 0.08), (23, -0.05), (24, -0.092), (25, -0.031), (26, 0.116), (27, -0.01), (28, -0.017), (29, 0.114), (30, -0.052), (31, 0.031), (32, 0.055), (33, -0.015), (34, -0.021), (35, 0.081), (36, -0.038), (37, -0.022), (38, 0.011), (39, 0.043), (40, -0.015), (41, -0.057), (42, -0.036), (43, -0.076), (44, -0.033), (45, -0.025), (46, -0.025), (47, -0.012), (48, -0.0), (49, -0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.94532341 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
Author: Jason Chang, John W. Fisher_III
Abstract: We present an integrated probabilistic model for layered object tracking that combines dynamics on implicit shape representations, topological shape constraints, adaptive appearance models, and layered flow. The generative model combines the evolution of appearances and layer shapes with a Gaussian process flow and explicit layer ordering. Efficient MCMC sampling algorithms are developed to enable a particle filtering approach while reasoning about the distribution of object boundaries in video. We demonstrate the utility of the proposed tracking algorithm on a wide variety of video sources while achieving state-of-the-art results on a boundary-accurate tracking dataset.
2 0.63826191 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
Author: Tae Hyun Kim, Hee Seok Lee, Kyoung Mu Lee
Abstract: Many state-of-the-art optical flow estimation algorithms optimize the data and regularization terms to solve ill-posed problems. In this paper, in contrast to the conventional optical flow framework that uses a single or fixed data model, we study a novel framework that employs locally varying data term that adaptively combines different multiple types of data models. The locally adaptive data term greatly reduces the matching ambiguity due to the complementary nature of the multiple data models. The optimal number of complementary data models is learnt by minimizing the redundancy among them under the minimum description length constraint (MDL). From these chosen data models, a new optical flow estimation energy model is designed with the weighted sum of the multiple data models, and a convex optimization-based highly effective and practical solution thatfinds the opticalflow, as well as the weights isproposed. Comparative experimental results on the Middlebury optical flow benchmark show that the proposed method using the complementary data models outperforms the state-ofthe art methods.
3 0.59904319 58 iccv-2013-Bayesian 3D Tracking from Monocular Video
Author: Ernesto Brau, Jinyan Guan, Kyle Simek, Luca Del Pero, Colin Reimer Dawson, Kobus Barnard
Abstract: We develop a Bayesian modeling approach for tracking people in 3D from monocular video with unknown cameras. Modeling in 3D provides natural explanations for occlusions and smoothness discontinuities that result from projection, and allows priors on velocity and smoothness to be grounded in physical quantities: meters and seconds vs. pixels and frames. We pose the problem in the context of data association, in which observations are assigned to tracks. A correct application of Bayesian inference to multitarget tracking must address the fact that the model's dimension changes as tracks are added or removed, and thus, posterior densities of different hypotheses are not comparable. We address this by marginalizing out the trajectory parameters so the resulting posterior over data associations has constant dimension. This is made tractable by using (a) Gaussian process priors for smooth trajectories and (b) approximately Gaussian likelihood functions. Our approach provides a principled method for incorporating multiple sources of evidence; we present results using both optical flow and object detector outputs. Results are comparable to recent work on 3D tracking and, unlike others, our method requires no pre-calibrated cameras.
4 0.59659714 317 iccv-2013-Piecewise Rigid Scene Flow
Author: Christoph Vogel, Konrad Schindler, Stefan Roth
Abstract: Estimating dense 3D scene flow from stereo sequences remains a challenging task, despite much progress in both classical disparity and 2D optical flow estimation. To overcome the limitations of existing techniques, we introduce a novel model that represents the dynamic 3D scene by a collection of planar, rigidly moving, local segments. Scene flow estimation then amounts to jointly estimating the pixelto-segment assignment, and the 3D position, normal vector, and rigid motion parameters of a plane for each segment. The proposed energy combines an occlusion-sensitive data term with appropriate shape, motion, and segmentation regularizers. Optimization proceeds in two stages: Starting from an initial superpixelization, we estimate the shape and motion parameters of all segments by assigning a proposal from a set of moving planes. Then the pixel-to-segment assignment is updated, while holding the shape and motion parameters of the moving planes fixed. We demonstrate the benefits of our model on different real-world image sets, including the challenging KITTI benchmark. We achieve leading performance levels, exceeding competing 3D scene flow methods, and even yielding better 2D motion estimates than all tested dedicated optical flow techniques.
5 0.59382421 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal
Author: Yu Li, Michael S. Brown
Abstract: This paper introduces an automatic method for removing reflection interference when imaging a scene behind a glass surface. Our approach exploits the subtle changes in the reflection with respect to the background in a small set of images taken at slightly different view points. Key to this idea is the use of SIFT-flow to align the images such that a pixel-wise comparison can be made across the input set. Gradients with variation across the image set are assumed to belong to the reflected scenes while constant gradients are assumed to belong to the desired background scene. By correctly labelling gradients belonging to reflection or background, the background scene can be separated from the reflection interference. Unlike previous approaches that exploit motion, our approach does not make any assumptions regarding the background or reflected scenes' geometry, nor requires the reflection to be static. This makes our approach practical for use in casual imaging scenarios. Our approach is straight forward and produces good results compared with existing methods.
These methods can handle complex geometry in the reflection layer, but require a long image sequence such that the reflection layer has significant changes in order for a median-based approach [21] to extract the intrinsic image from the sequence as the initial guess for one of the layers. Techniques closer to ours exploit motion between the layers present in multiple images. In particular, when the background is captured from different points of view, the background and the reflection layers undergo different motions due to their different distances to the transparent layer. One issue with changing viewpoint is handling alignment among the images. Szeliski et al. [19] proposed a method that could simultaneously recover the two layers by assuming they were both static scenes and related by parametric transformations (i.e. homographies). Gai et al. [4, 5] proposed a similar approach that aligned the images in the gradient domain using gradient sparsity, again assuming static scenes. Tsin et al. [20] relaxed the planar scene constraint in [19] and used dense stereo correspondence with a stereo matching configuration, which limits the camera motion to unidirectional parallel motion. These approaches produce good results, but the constraint on scene geometry and assumed motion of the camera limit the type of scenes that can be processed. Our Contribution Our proposed method builds on the single-image approach by Levin and Weiss [8], but removes the need for user markup by examining the relative motion in a small set (e.g. 3-5) of images to automatically label gradients as either reflection or background. This is done by first aligning the images using SIFT-flow and then examining the variation in the gradients over the image set. Gradients with more variation are assumed to be from reflection while constant gradients are assumed to be from the desired background. While a simple idea, this approach does not impose any restrictions on the scene or reflection geometry. This allows a more practical imaging setup that is suitable for handheld cameras. The remainder of this paper is organized as follows. Section 2 overviews our approach; section 3 compares our results with prior methods on several examples; the paper is concluded in section 4. (Fig. 2. The separated layers of the first two input images. The layers illustrate that the background image IB has little variation while the reflection layers IRi have notable variation due to the viewpoint change.) 2. Reflection Removal Method 2.1. Imaging Assumption and Procedure The input of our approach is a small set of k images taken of the scene from slightly varying viewpoints. We assume the background dominates in the mixture image and the images are related by a warping, such that the background is registered and the reflection layer is changing. This relationship can be expressed as: Ii = wi(IRi + IB), (2) where Ii is the i-th mixture image and {wi}, i = 1, ..., k, are warping functions caused by the camera viewpoint change with respect to a reference image (in our case I1). Assuming we can estimate the inverse warps wi⁻¹, where w1⁻¹ is the identity, we get the following relationship: wi⁻¹(Ii) = IRi + IB. (3) Even though IB appears static in the mixture image, the problem is still ill-posed given we have more unknowns than the number of input images.
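To make the registration relationship of Eqs. (2)–(3) concrete, here is a minimal numpy/scipy sketch (our own illustration, not the authors' Matlab code; the function names are ours) of applying an inverse warp given as a dense per-pixel flow field — the same kind of warp that is later applied to both the images and their gradient magnitudes:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def apply_inverse_warp(image, flow):
    """image: (H, W) array; flow: (2, H, W) array of (dy, dx) displacements
    mapping reference-frame pixels to sample locations in the source image."""
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    coords = np.stack([yy + flow[0], xx + flow[1]])       # where to sample the source
    return map_coordinates(image, coords, order=1, mode='nearest')

def register(I_i, G_i, flow_i):
    # Warp an input image and its gradient-magnitude map with the same
    # inverse warp, so that the (ideally static) background is aligned to
    # the reference view: w_i^{-1}(I_i) = I_Ri + I_B as in Eq. (3).
    return apply_inverse_warp(I_i, flow_i), apply_inverse_warp(G_i, flow_i)
```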
However, the presence of a static IB in the image set makes it possible to identify gradient edges of the background layer IB and edges of the changing reflection layers IRi. More specifically, edges in IB are assumed to appear every time in the image set, while the edges in the reflection layer IRi are assumed to vary across the set. This reflection-change effect can be seen in Figure 2. This means edges can be labelled based on the frequency of a gradient appearing at a particular pixel across the aligned input images. After labelling edges as either background or reflection, we can reconstruct the two layers using an optimization that imposes the sparsity prior on the separated layers as done by [7, 8]. Figure 3 shows the processing pipeline of our approach. Each step is described in the following sections. (Fig. 3. The pipeline of our approach: 1) warping functions are estimated to align the inputs to a reference view; 2) the edges are labelled as either background or foreground based on gradient frequency; 3) a reconstruction step is used to separate the two layers; 4) all recovered background layers are combined together to get the final recovered background.) 2.2. Warping Our approach begins by estimating warping functions, wi⁻¹, to register the input to the reference image. Previous approaches estimated these warps using global parametric motion (e.g. homographies [4, 5, 19]); however, the planarity constraint often leads to regions in the image with misalignments when the scene is not planar. Traditional dense correspondence methods like optical flow are another option. However, even with our assumption that the background should be more prominent than the reflection layer, optical flow methods (e.g. [2, 18]) that are based on image intensity gave poor performance due to the reflection interference. This led us to try SIFT-flow [10], which is based on more robust image features. SIFT-flow [10] proved to work surprisingly well on our input sequences and provides a dense warp suitable to bring the images into alignment even under moderate interference of reflection. Empirical demonstration of the effectiveness of SIFT-flow in this task as well as the comparison with optical flow are shown in our supplemental materials. Our implementation fixes I1 as the reference, then uses SIFT-flow to estimate the inverse-warping functions {wi⁻¹}, i = 2, ..., k, for each of the input images I2, ..., Ik against I1. We also compute the gradient magnitudes Gi of each input image and then warp the images Ii as well as the gradient magnitudes Gi using the same inverse-warping function wi⁻¹, denoting the warped images and gradient magnitudes as Îi and Ĝi. 2.3. Edge separation Our approach first identifies salient edges using a simple threshold on the gradient magnitudes in Ĝi. The resulting binary edge map is denoted as Ei. After edge detection, the edges need to be separated as either background or foreground in each aligned image Îi. As previously discussed, the edges of the background layer should appear frequently across all the warped images while the edges of the reflection layer should only have sparse presence. To examine the sparsity of the edge occurrence, we use the following measurement: Φ(y) = ‖y‖₂² / ‖y‖₁², (4) where y is a vector containing the gradient magnitudes at a given pixel location. Since all elements in y are non-negative, we can rewrite equation 4 as Φ(y) = Σᵢ₌₁ᵏ yᵢ² / (Σᵢ₌₁ᵏ yᵢ)². This measurement can be considered as an L1-normalized L2 norm.
It measures the sparsity of the vector: it achieves its maximum value of 1 when only one non-zero item exists and its minimum value of 1/k when all items are non-zero and have identical values (i.e. y1 = y2 = ... = yk > 0). This measurement is used to assign two probabilities to each edge pixel as belonging to either background or reflection. We estimate the reflection edge probability by examining the edge occurrence, as follows: PRi(x) = s( Σᵢ₌₁ᵏ Ĝᵢ(x)² / (Σᵢ₌₁ᵏ Ĝᵢ(x))² − 1/k ), (5) where Ĝᵢ(x) is the gradient magnitude at pixel x of Îᵢ. We subtract 1/k to move the smallest value close to zero. The sparsity measurement is further stretched by a sigmoid function s(t) = (1 + e^(−(t−0.05)/0.05))⁻¹ to facilitate the separation. The background edge probability is then estimated by: PBi(x) = s( −( Σᵢ₌₁ᵏ Ĝᵢ(x)² / (Σᵢ₌₁ᵏ Ĝᵢ(x))² − 1/k ) ), (6) where PBi(x) + PRi(x) = 1. These probabilities are defined only at the pixels that are edges in the image. We consider only edge pixels with relatively high probability in either the background edge probability map or the reflection edge probability map. The final edge separation is performed by thresholding the two probability maps: EBi/Ri(x) = 1 if Ei(x) = 1 and PBi/Ri(x) > 0.6, and 0 otherwise. Figure 4 shows the edge separation procedure. (Fig. 4. Edge separation illustration: 1) all gradient maps — in this case we have five input images; 2) the gradient values at two positions across the five images — the top plot is a pixel on a background edge, the bottom plot is a pixel on a reflection edge; 3) the probability map estimated for each layer; 4) the final edge separation after thresholding the probability maps.) 2.4. Layer Reconstruction With the separated edges of the background and the reflection, we can reconstruct the two layers. Levin and Weiss [7, 8] showed that the long-tailed distribution of gradients in natural scenes is an effective prior in this problem. This kind of distribution is well modelled by a Laplacian or hyper-Laplacian distribution (P(t) ∝ e^(−|t|^p / s), with p = 1 for Laplacian and p < 1 for hyper-Laplacian). In our work, we use the Laplacian approximation since the L1 norm converges quickly with good results. For each image Îi, we try to maximize the probability P(IBi, IRi) in order to separate the two layers, which is equivalent to minimizing the cost −log P(IBi, IRi). Following the same deduction as in [7], with the independence assumption on the two layers (i.e. P(IBi, IRi) = P(IBi) · P(IRi)), the objective function becomes: J(IBi) = Σ_{x,n} ( |(IBi ∗ fn)(x)| + |((Îi − IBi) ∗ fn)(x)| ) + λ Σ_{x,n} EBi(x) |((Îi − IBi) ∗ fn)(x)| + λ Σ_{x,n} ERi(x) |(IBi ∗ fn)(x)|, (7) where fn denotes the derivative filters and ∗ is the 2D convolution operator. For fn, we use two orientations and two degrees (first order and second order) of derivative filters. While the first term in the objective function keeps the gradients of the two layers as sparse as possible, the last two terms force the gradients of IBi at edge positions in EBi to agree with the gradients of the input image Îi, and the gradients of IRi at edge positions in ERi to agree with the gradients of Îi. This equation can be rewritten in the form J = ‖Au − b‖₁ and minimized efficiently using iteratively reweighted least squares [11].
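As a concrete illustration of the edge-separation step in Eqs. (4)–(6) above, here is a minimal numpy sketch (our own, not the authors' Matlab implementation; the small eps, the function names, and the stacked-array layout are our assumptions):

```python
import numpy as np

def edge_probabilities(G_hat, eps=1e-8):
    """G_hat: (k, H, W) stack of warped gradient-magnitude maps."""
    k = G_hat.shape[0]
    num = np.sum(G_hat ** 2, axis=0)            # sum_i G_i(x)^2
    den = np.sum(G_hat, axis=0) ** 2 + eps      # (sum_i G_i(x))^2
    phi = num / den - 1.0 / k                   # Eq. (4), shifted so the minimum is ~0
    s = lambda t: 1.0 / (1.0 + np.exp(-(t - 0.05) / 0.05))  # sigmoid stretch
    P_R = s(phi)                                # reflection edge probability, Eq. (5)
    P_B = s(-phi)                               # background edge probability, Eq. (6)
    return P_B, P_R

def separate_edges(E_i, P_B, P_R, tau=0.6):
    """E_i: binary salient-edge map of image i (thresholded gradient magnitude)."""
    E_B = (E_i == 1) & (P_B > tau)              # background edges
    E_R = (E_i == 1) & (P_R > tau)              # reflection edges
    return E_B, E_R
```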
2.5. Combining the Results Our approach processes each image in the input set independently. Due to the reflective glass surface, some of the images may contain saturated regions from specular highlights. When saturation occurs, we cannot fully recover the structure in these saturated regions because the information about the two layers is lost. In addition, sometimes the edges of the reflection in some regions are too weak to be correctly distinguished. This can lead to local regions in the background where the reflection is still present. These erroneous regions are often in different places in each input image due to changes in the reflection. In such cases, it is reasonable to assume that the minimum value across all recovered background layers may be a proper approximation of the true background. As such, the last step of our method is to take the minimum of the pixel values of all reconstructed background images as the final recovered background, as follows: IB(x) = min_i IBi(x). (8) (Fig. 5. Our combination procedure. The recovered background from each single image looks good at first glance but may have reflection remaining in local regions. A simple minimum operator combining all recovered images gives a better result in these regions; the comparison can be seen in the zoomed-in regions.) Based on this, the reflection layer of each input image can be computed by IRi = Îi − IB. The effectiveness of this combination procedure is illustrated in Figure 5. 3. Results In this section, we present the experimental results of our proposed method. Additional results and test cases can be found in the accompanying supplemental materials. The experiments were conducted on an Intel i7 PC (3.4GHz CPU, 8.0GB RAM). The code was implemented in Matlab. We use the SIFT-Flow implementation provided by the authors (http://people.csail.mit.edu/celiu/SIFTflow/SIFTflow.zip). Matlab code and images used in our paper can be downloaded at the author’s webpage (http://www.comp.nus.edu.sg/ liyu1988/). The entire procedure outlined in Figure 3 takes approximately five minutes for a 500 × 400 image sequence containing up to five images. All the data shown are real scenes captured under various lighting conditions (e.g. indoor, outdoor). Input sequences range from three to five images. Figure 6 shows two examples of our edge separation results and the final reconstructed background and reflection layers. (Fig. 6. Examples of edge separation results and recovered background and foreground layers using our method.) Our method provides a clear separation of the edges of the two layers, which is crucial in the reconstruction step. Figure 9 shows more reflection removal results of our method. We also compare our method with those in [8] and [5]. For the method in [8], we use the source code of the author (http://www.wisdom.weizmann.ac.il/ levina/papers/reflections.zip) to generate the results. The comparisons between ours and [8] are not entirely fair since [8] uses a single image to generate the result, while we have the advantage of the entire set. For the results produced by [8], the reference view was used as input. The required user markup is also provided. For the method in [5], we set the layer number to be one, and estimate the motions of the background layer using their method. In the reconstruction phase, we set the remaining reflection layer in the k input mixture images as k different layers, each only appearing once in one mixture. Figure 8 shows the results of two examples. Our results are arguably the best. The results of [8] still exhibited some edges from different layers even with the elaborate user mark-ups. This may be fixed by going back to further refine the user markup.
But in the heavily overlapping edge regions, it is challenging for users to indicate the edges. If the edges are not clearly indicated, the results tend to be over-smoothed in one layer. For the method of [5], since it uses global transformations to align images, local misalignment effects often appear in the final recovered background image. Also, their approach uses all the input images in the optimization to recover the layers. This can leave edges from the reflection layers of different images mixed together, which appears as a ghosting effect in the recovered background image. For heavily saturated regions, neither of the two previous methods can give visually plausible results like ours. 4. Discussion and Conclusion We have presented a method to automatically remove reflection interference due to a glass surface. Our approach works by capturing a set of images of a scene from slightly varying viewpoints. The images are then aligned and edges are labelled as belonging to either background or reflection. This alignment was enabled by SIFT-flow, whose robustness to the reflection interference enabled our method. When using SIFT-flow, we assume that the background layer will be the most prominent and will provide sufficient SIFT features for matching. While we found this to work well in practice, images with very strong reflections can produce poor alignment as SIFT-flow may attempt to align to the foreground, which is changing. This will cause problems in the subsequent layer separation. Figure 7 shows such a case. (Fig. 7. A failure case of our approach due to dominant reflection against the background in some regions (i.e. the upper part of the phonograph). This causes unsatisfactory alignment of the background in the warping procedure, which further leads to failure of our edge separation and final reconstruction, as can be seen in the figure.) While these failures can often be handled by cropping the image or simple user input (see supplemental material), it is a notable issue. Another challenging issue is when the background scene has large homogeneous regions. In such cases there are no edges to be labelled as background. This makes subsequent separation challenging, especially when the reflection interference in these regions is weak but still visually noticeable. While this problem is not unique to our approach, it is an issue to consider. We also found that by combining all the background results of the input images we can overcome local regions with high saturation. While a simple idea, this combination strategy can be incorporated into other techniques to improve their results. Lastly, we believe reflection removal is an application that would be welcomed on many mobile devices; however, the current processing time is still too long for real-world use. Exploring ways to speed up the processing pipeline is an area of interest for future work. Acknowledgement This work was supported by Singapore A*STAR PSF grant 11212100. References [1] A. K. Agrawal, R. Raskar, S. K. Nayar, and Y. Li. Removing photography artifacts using gradient projection and flash-exposure sampling. ToG, 24(3):828–835, 2005. [2] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods. IJCV, 61(3):211–231, 2005. [3] H. Farid and E. H. Adelson.
Separating reflections from images by use of independent component analysis. JOSA A, 16(9):2136–2145, 1999. [4] K. Gai, Z. Shi, and C. Zhang. Blindly separating mixtures of multiple layers with spatial shifts. In CVPR, 2008. [5] K. Gai, Z. Shi, and C. Zhang. Blind separation of superimposed moving images using image statistics. TPAMI, 34(1):19–32, 2012. [6] N. Kong, Y.-W. Tai, and S. Y. Shin. A physically-based approach to reflection separation. In CVPR, 2012. [7] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. In ECCV, 2004. [8] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. TPAMI, 29(9):1647–1654, 2007. [9] A. Levin, A. Zomet, and Y. Weiss. Separating reflections from a single image using local features. In CVPR, 2004. [10] C. Liu, J. Yuen, and A. Torralba. SIFT Flow: Dense correspondence across scenes and its applications. TPAMI, 33(5):978–994, 2011. [11] P. Meer. Robust techniques for computer vision. Emerging Topics in Computer Vision, 2004. [12] N. Ohnishi, K. Kumaki, T. Yamamura, and T. Tanaka. Separating real and virtual objects from their overlapping images. In ECCV, 1996. [13] B. Sarel and M. Irani. Separating transparent layers through layer information exchange. In ECCV, 2004. [14] B. Sarel and M. Irani. Separating transparent layers of repetitive dynamic behaviors. In ICCV, 2005. [15] Y. Y. Schechner, N. Kiryati, and R. Basri. Separation of transparent layers using focus. IJCV, 39(1):25–39, 2000. [16] Y. Y. Schechner, J. Shamir, and N. Kiryati. Polarization-based decorrelation of transparent layers: The inclination angle of an invisible surface. In ICCV, 1999. [17] Y. Y. Schechner, J. Shamir, and N. Kiryati. Polarization and statistical analysis of scenes containing a semireflector. JOSA A, 17(2):276–284, 2000. [18] D. Sun, S. Roth, and M. Black. Secrets of optical flow estimation and their principles. In CVPR, 2010. [19] R. Szeliski, S. Avidan, and P. Anandan. Layer extraction from multiple images containing reflections and transparency. In CVPR, 2000. [20] Y. Tsin, S. B. Kang, and R. Szeliski. Stereo matching with linear superposition of layers. TPAMI, 28(2):290–301, 2006. [21] Y. Weiss. Deriving intrinsic images from image sequences. In ICCV, 2001. (Fig. 8. Two examples of reflection removal results of our method and those in [8] and [5] (user markup for [8] provided in the supplemental material). Our method provides more visually pleasing results. The results of [8] still exhibited remaining edges from reflection and tended to over-smooth some local regions. The results of [5] suffered from misalignment due to their global transformation alignment, which results in a ghosting effect of different layers in the final recovered background image. For the reflection, our results give a very complete and clear recovery of the reflection layer.) (Fig. 9. More results of reflection removal using our method in varying scenes (e.g. art museum, street shop, etc.).)
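To round out this entry, a short numpy fragment (ours, not the authors' Matlab code; the function name is hypothetical) illustrating the combination step of Eq. (8) above and the corresponding reflection recovery IRi = Îi − IB:

```python
import numpy as np

def combine_backgrounds(I_B_list, I_hat_list):
    """I_B_list: per-image recovered backgrounds; I_hat_list: warped inputs."""
    I_B = np.minimum.reduce(I_B_list)                  # Eq. (8): I_B(x) = min_i I_Bi(x)
    I_R_list = [I_hat - I_B for I_hat in I_hat_list]   # I_Ri = Î_i − I_B
    return I_B, I_R_list
```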
6 0.58617324 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
7 0.58425027 311 iccv-2013-Pedestrian Parsing via Deep Decompositional Network
8 0.58164978 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
9 0.56370062 395 iccv-2013-Slice Sampling Particle Belief Propagation
10 0.55672753 430 iccv-2013-Two-Point Gait: Decoupling Gait from Body Shape
11 0.5537172 301 iccv-2013-Optimal Orthogonal Basis and Image Assimilation: Motion Modeling
12 0.54449129 263 iccv-2013-Measuring Flow Complexity in Videos
13 0.54236728 143 iccv-2013-Estimating Human Pose with Flowing Puppets
14 0.53843516 128 iccv-2013-Dynamic Probabilistic Volumetric Models
15 0.52724332 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
16 0.5249418 87 iccv-2013-Conservation Tracking
17 0.52457196 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
18 0.51928079 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
19 0.51698542 331 iccv-2013-Pyramid Coding for Functional Scene Element Recognition in Video Scenes
20 0.50905812 220 iccv-2013-Joint Deep Learning for Pedestrian Detection
topicId topicWeight
[(1, 0.201), (2, 0.054), (7, 0.027), (26, 0.095), (27, 0.013), (31, 0.079), (34, 0.018), (42, 0.092), (64, 0.087), (73, 0.042), (78, 0.01), (89, 0.165)]
simIndex simValue paperId paperTitle
same-paper 1 0.83744752 420 iccv-2013-Topology-Constrained Layered Tracking with Latent Flow
Author: Jason Chang, John W. Fisher_III
Abstract: We present an integrated probabilistic model for layered object tracking that combines dynamics on implicit shape representations, topologicalshape constraints, adaptive appearance models, and layered flow. The generative model combines the evolution of appearances and layer shapes with a Gaussian process flow and explicit layer ordering. Efficient MCMC sampling algorithms are developed to enable a particle filtering approach while reasoning about the distribution of object boundaries in video. We demonstrate the utility oftheproposed tracking algorithm on a wide variety of video sources while achieving state-of-the-art results on a boundary-accurate tracking dataset.
2 0.77003741 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
Author: Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook
Abstract: Detecting visually salient regions in images is one of the fundamental problems in computer vision. We propose a novel method to decompose an image into large scale perceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation. By considering both appearance similarity and spatial distribution of image pixels, the proposed representation abstracts out unnecessary image details, allowing the assignment of comparable saliency values across similar regions, and producing perceptually accurate salient region detection. We evaluate our salient region detection approach on the largest publicly available dataset with pixel accurate annotations. The experimental results show that the proposed method outperforms 18 alternate methods, reducing the mean absolute error by 25.2% compared to the previous best result, while being computationally more efficient.
3 0.75103915 123 iccv-2013-Domain Adaptive Classification
Author: Fatemeh Mirrashed, Mohammad Rastegari
Abstract: We propose an unsupervised domain adaptation method that exploits intrinsic compact structures of categories across different domains using binary attributes. Our method directly optimizes for classification in the target domain. The key insight is finding attributes that are discriminative across categories and predictable across domains. We achieve a performance that significantly exceeds the state-of-the-art results on standard benchmarks. In fact, in many cases, our method reaches the same-domain performance, the upper bound, in unsupervised domain adaptation scenarios.
4 0.74938118 359 iccv-2013-Robust Object Tracking with Online Multi-lifespan Dictionary Learning
Author: Junliang Xing, Jin Gao, Bing Li, Weiming Hu, Shuicheng Yan
Abstract: Recently, sparse representation has been introduced for robust object tracking. By representing the object sparsely, i.e., using only a few templates via ℓ1-norm minimization, these so-called ℓ1-trackers exhibit promising tracking results. In this work, we address the object template building and updating problem in these ℓ1-tracking approaches, which has not been fully studied. We propose to perform template updating, in a new perspective, as an online incremental dictionary learning problem, which is efficiently solved through an online optimization procedure. To guarantee the robustness and adaptability of the tracking algorithm, we also propose to build a multi-lifespan dictionary model. By building target dictionaries of different lifespans, effective object observations can be obtained to deal with the well-known drifting problem in tracking and thus improve the tracking accuracy. We derive effective observation models both generatively and discriminatively based on the online multi-lifespan dictionary learning model and deploy them to the Bayesian sequential estimation framework to perform tracking. The proposed approach has been extensively evaluated on ten challenging video sequences. Experimental results demonstrate the effectiveness of the online learned templates, as well as the state-of-the-art tracking performance of the proposed approach.
5 0.74824154 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
Author: Lukáš Neumann, Jiri Matas
Abstract: An unconstrained end-to-end text localization and recognition method is presented. The method introduces a novel approach for character detection and recognition which combines the advantages of sliding-window and connected component methods. Characters are detected and recognized as image regions which contain strokes of specific orientations in a specific relative position, where the strokes are efficiently detected by convolving the image gradient field with a set of oriented bar filters. Additionally, a novel character representation efficiently calculated from the values obtained in the stroke detection phase is introduced. The representation is robust to shift at the stroke level, which makes it less sensitive to intra-class variations and the noise induced by normalizing character size and positioning. The effectiveness of the representation is demonstrated by the results achieved in the classification of real-world characters using a Euclidean nearest-neighbor classifier trained on synthetic data in a plain form. The method was evaluated on a standard dataset, where it achieves state-of-the-art results in both text localization and recognition.
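As a rough illustration of the stroke-detection idea mentioned in this abstract — convolving the gradient field with oriented bar filters — here is a small numpy/scipy sketch (ours, not the authors' implementation; the kernel size, orientation set, and the use of gradient magnitude rather than oriented gradient components are all assumptions):

```python
import numpy as np
from scipy.ndimage import convolve, rotate, sobel

def bar_kernel(length=9, thickness=1, angle_deg=0.0):
    # An elongated bar, rotated to the desired orientation.
    k = np.zeros((length, length))
    mid = length // 2
    k[mid - thickness // 2: mid + thickness // 2 + 1, :] = 1.0
    k = rotate(k, angle_deg, reshape=False, order=1)
    return k / (k.sum() + 1e-8)

def oriented_stroke_responses(image, angles=(0, 45, 90, 135)):
    # Respond strongly where gradient energy is elongated along a given orientation.
    gx, gy = sobel(image, axis=1), sobel(image, axis=0)
    grad_mag = np.hypot(gx, gy)
    return {a: convolve(grad_mag, bar_kernel(angle_deg=a)) for a in angles}
```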
6 0.74609578 180 iccv-2013-From Where and How to What We See
7 0.74537838 442 iccv-2013-Video Segmentation by Tracking Many Figure-Ground Segments
8 0.7432726 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning
9 0.7423135 425 iccv-2013-Tracking via Robust Multi-task Multi-view Joint Sparse Representation
10 0.74186295 230 iccv-2013-Latent Data Association: Bayesian Model Selection for Multi-target Tracking
11 0.74141312 414 iccv-2013-Temporally Consistent Superpixels
12 0.74101168 150 iccv-2013-Exemplar Cut
13 0.74089682 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition
14 0.74067247 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences
15 0.74002498 379 iccv-2013-Semantic Segmentation without Annotating Segments
16 0.73986197 349 iccv-2013-Regionlets for Generic Object Detection
17 0.73977035 338 iccv-2013-Randomized Ensemble Tracking
18 0.73904753 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests
19 0.73809969 173 iccv-2013-Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging
20 0.73793244 65 iccv-2013-Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses