iccv iccv2013 iccv2013-275 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dingzeyu Li, Qifeng Chen, Chi-Keung Tang
Abstract: This paper demonstrates how the nonlocal principle benefits video matting via the KNN Laplacian, which comes with a straightforward implementation using motionaware K nearest neighbors. In hindsight, the fundamental problem to solve in video matting is to produce spatiotemporally coherent clusters of moving foreground pixels. When used as described, the motion-aware KNN Laplacian is effective in addressing this fundamental problem, as demonstrated by sparse user markups typically on only one frame in a variety of challenging examples featuring ambiguous foreground and background colors, changing topologies with disocclusion, significant illumination changes, fast motion, and motion blur. When working with existing Laplacian-based systems, our Laplacian is expected to benefit them immediately with improved clustering of moving foreground pixels.
Reference: text
sentIndex sentText sentNum sentScore
1 Columbia University, Stanford University, HKUST. Abstract. This paper demonstrates how the nonlocal principle benefits video matting via the KNN Laplacian, which comes with a straightforward implementation using motionaware K nearest neighbors. [sent-6, score-1.214]
2 In hindsight, the fundamental problem to solve in video matting is to produce spatiotemporally coherent clusters of moving foreground pixels. [sent-7, score-0.893]
3 Successful works rely on generating dense trimaps or precise strokes in all frames to ensure good color samples for solving the alpha. [sent-12, score-0.19]
4 On the other hand, if we can produce spatially and temporally coherent clusters of moving foreground pixels, then ideally the user only needs to specify a single pixel in each cluster to drive the automatic algorithm to produce a spatio-temporally coherent video matte. [sent-13, score-0.435]
5 (∗The research was supported by the Hong Kong Research Grant Council under grant number 619313.) [sent-15, score-0.596]
6 Comparison with nonlocal video matting [8] (middle row) on rabbit. [sent-17, score-1.046]
7 KNN video matting (bottom row) produces significantly better results in the presence of ambiguous background and foreground colors. [sent-18, score-0.836]
8 The use of optical flow as motion cue is particularly helpful in disambiguating complex situations where texture information is similar. [sent-19, score-0.228]
9 (See electronic version.) The matting Laplacian has been widely adopted since closed-form matting and spectral matting [16, 17]. [sent-20, score-1.212]
10 Its definition is based on a small local window, and it was shown in nonlocal image matting [14] that the matting Laplacian produces scattered matting components with the use of the local color line model. [sent-21, score-2.179]
11 Later, nonlocal video matting [8] demonstrated good matting results, but its implementation is complicated, involving several steps with specialized data structures and a matte regularization step in order to produce coherent video mattes. [sent-22, score-2.001]
12 This paper contributes to video matting by incorporating motion information in the so-called KNN Laplacian to make it motion-aware. [sent-23, score-0.794]
13 This is the first attempt to empirically show that this simple strategy is effective in producing spatio-temporally coherent clusters of moving pixels. [sent-24, score-0.142]
14 In principle, unlike nonlocal movie denoising [6] which argued against motion information, we utilize optical flow results when computing the motion-aware KNN Laplacian. [sent-25, score-0.591]
15 When used on its own, it allows for sparse user markups and alpha constraints to be incorporated in a closed-form solution to produce competitive matting results, as shown in our qualitative as well as quantitative evaluation. [sent-27, score-0.952]
16 Our motion-aware Laplacian can be incorporated into all existing video matting systems based on the graph Laplacian, thus benefiting them immediately with improved clustering of moving foreground pixels. [sent-31, score-0.84]
17 Related Work. See [29] for a comprehensive survey of image and video matting before 2008. [sent-33, score-0.713]
18 The nonlocal principle was successfully applied to image and video denoising [6], where the authors argued against the use of motion in video denoising. [sent-35, score-0.733]
19 Two recent contributions [7, 14] applied the nonlocal principle in natural image matting. [sent-36, score-0.388]
20 For video matting, the first attempt tapping into the nonlocal principle is [8] which, similar to [6], does not employ explicit motion information. [sent-37, score-0.586]
21 The method uses the multi-frame nonlocal matting Laplacian proposed in [14] defined over a nonlocal neighborhood in the spatio-temporal domain. [sent-38, score-1.262]
22 To produce a video matte, classical video matting [10] requires the user to paint a dense trimap to be propagated to all video frames before single image matting is applied on each of them. [sent-41, score-1.915]
23 Bayesian video matting [1] extends Bayesian image matting [9] by defining proper priors using natural image statistics. [sent-42, score-1.309]
24 Hardware-assisted systems [13, 19] automatically generate and propagate trimaps in all video frames before image matting is applied on each frame. [sent-44, score-0.823]
25 Without using motion information or optical flows, their emphasis is on complete automation rather than temporal consistency of the resulting mattes. [sent-45, score-0.251]
26 Recent work [3] addresses temporally-coherent video matting by adaptive trimap propagation and matte filtering in the temporal domain. [sent-47, score-1.243]
27 Since the matting Laplacian [17] was used, the trimap needs to be precise and dense to cluster relevant but scattered matting components. [sent-48, score-1.465]
28 To maintain spatio-temporal consistency of the object cutout, 3D meanshift was employed in interactive video cut [27] to cluster relevant pixels. [sent-50, score-0.141]
29 The geodesic framework was extended in [2] in spatio-temporal volume for video segmentation. [sent-51, score-0.153]
30 Rather than early commitment to optical flow vectors, which may be inaccurate, multiple candidates were kept in [18] in their graph construction to embed temporal consistency without committing to any motion vectors. [sent-52, score-0.308]
31 Motion vectors were used in [4] to shift local windows/classifiers which does not require highly accurate optical flow information. [sent-55, score-0.147]
32 While we also use optical flows, we embed at each pixel several motion candidates (specifically, K of them) when encoding our affinity matrix. [sent-56, score-0.306]
33 Nonlocal Principle for Video Matting. Rather than sampling reliable albeit unknown foreground/background color pairs, we advocate good pixel clustering for video matting. [sent-58, score-0.18]
34 To produce good clusters we leverage the nonlocal principle in video denoising [6] but argue for the use of motion information in video matting. [sent-60, score-0.78]
35 For completeness, we include a concise summary of the nonlocal principle while highlighting its motion-awareness for video matting. [sent-61, score-0.505]
36 By analogy with (2), the expected value of the alpha matte is E[αi] ≈ Σj w(i, j) αj. [sent-71, score-0.223]
37 In nonlocal image matting [14]: the nonlocal principle applies to α as in (5); and the conditional distribution of α given X satisfies E[αi | X(i) = X(j)] = αj, that is, pixels with the same appearance are expected to share the same alpha value. [sent-74, score-1.56]
38 KNN Laplacian. Applying the nonlocal principle in KNN video matting, we assume the alpha at pixel i is a weighted average of the alphas of its K nearest neighbors in the feature space, which are not necessarily spatially close to each other: E[αi] ≈ Σj∈KNN(i) w(i, j) αj. [sent-77, score-0.893]
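To make the weighted-average view concrete, the following is a minimal sketch of building a KNN affinity and the corresponding graph Laplacian L = D − A; the kernel form, the use of a k-d tree for the nearest-neighbor search, and the default K = 10 are illustrative assumptions, not the authors' implementation.

import numpy as np
from scipy.sparse import coo_matrix, diags
from scipy.spatial import cKDTree

def knn_laplacian(X, K=10):
    # X: (N, d) per-pixel feature vectors (appearance, position, optical flow, ...)
    N = X.shape[0]
    dist, idx = cKDTree(X).query(X, k=K + 1)        # nearest neighbor of each pixel is itself
    dist, idx = dist[:, 1:], idx[:, 1:]
    rows = np.repeat(np.arange(N), K)
    # assumed kernel: affinity decays linearly with feature distance
    vals = np.maximum(1.0 - dist.ravel() / (dist.max() + 1e-12), 0.0)
    A = coo_matrix((vals, (rows, idx.ravel())), shape=(N, N)).tocsr()
    D = diags(np.asarray(A.sum(axis=1)).ravel())
    return D - A                                    # KNN Laplacian L = D - A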
39 Left shows K nearest neighbors (red) of the selected point (green); note the nonlocal distribution of the neighbors; right shows a typical sparse nonlocal two-frame affinity matrix A in KNN video matting. [sent-83, score-0.953]
40 In video matting, similar pixels should have similar appearance and motion, which agrees in principle with classical perceptual grouping or specifically, grouping by common fate [25]. [sent-93, score-0.242]
41 Figure 2 shows that our KNN Laplacian is conducive to good graph clusters when motion information is encoded in feature vector X. [sent-94, score-0.15]
42 We note that for most video matting approaches, optical flow is used almost exclusively in trimap generation. [sent-95, score-1.133]
43 In contrast, as we will shortly see, optical flow is directly used in constructing our Laplacian, making our method fundamentally different because temporal consistency is considered in the matte optimization rather than the trimap generation stage. [sent-96, score-0.694]
44 We prefer to match p to the candidate with consistent (apparent) motion, since it is likely that both of them end up moving to (or remain stationary on) the same background in two consecutive frames, and thus likely to have the same alpha as they already have the same foreground colors. [sent-99, score-0.325]
45 In contrast to nonlocal denoising [6], where the noise to be removed is white noise without temporal consistency, our goal is to pull out a temporally-coherent foreground matte, and so motion information is considered in constructing a motion-aware Laplacian. [Table 1: sequence size x × y × t, λs, λf, λp, and running time per sequence, e.g., kim at 720 × 480.] [sent-100, score-0.774]
46 Parameters and running times in secs for KNN video matting on a machine with an Intel i7 2.6GHz CPU. [sent-111, score-0.738]
47 Despite that, it is not at odds with nonlocal video denoising (see Figure 8 of [6]): both nonlocal methods define proper feature vector and match similar and moving pixels to compute optimal solutions. [sent-116, score-0.873]
48 We first describe the feature vector X, which results in an asymmetric affinity A for embedding temporally bidirectional motion consistency, and a two-frame L for minimizing the Laplacian energy to compute an optimal video matte. [sent-120, score-0.289]
49 KNN video matting has a straightforward implementation and produces comparable or at times better results than state-of-the-art approaches [2, 4, 8]. [sent-121, score-0.768]
50 Feature Vector X Our feature vector should be conducive to grouping similar pixels together, that is, pixels sharing similar appearance and similar motion should have similar α. [sent-124, score-0.186]
51 Thus it is easy to incorporate motion information in constructing the motion-aware KNN Laplacian, rather than limiting it to trimap generation as done by many existing systems [3, 10, 12, 15]. [sent-126, score-0.373]
52 There are three parameters: λs controls the amount of spatial coherence, λf controls the influence of optical flow [22], and λp controls the size of image patch, which is inspired by PatchMatch [5]. [sent-127, score-0.246]
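As a rough sketch of how such a motion-aware feature vector might be assembled per pixel (the exact composition and normalization are assumptions based on the description above: λs weights the spatial coordinates, λf weights the optical flow, and the patch size stands in for λp):

import numpy as np

def feature_vector(rgb, flow, lam_s=0.5, lam_f=1.0, patch=3):
    # rgb: (H, W, 3) frame; flow: (H, W, 2) forward optical flow (u, v); patch size plays the role of lambda_p
    H, W = rgb.shape[:2]
    rgb = np.asarray(rgb, dtype=float) / 255.0            # assume 8-bit input
    ys, xs = np.mgrid[0:H, 0:W]
    pos = np.dstack([xs / W, ys / H]).reshape(-1, 2)       # normalized spatial coordinates
    pad = patch // 2
    padded = np.pad(rgb, ((pad, pad), (pad, pad), (0, 0)), mode='edge')
    patches = np.stack([padded[dy:dy + H, dx:dx + W]       # local color patch (textural term)
                        for dy in range(patch) for dx in range(patch)], axis=2).reshape(H * W, -1)
    return np.hstack([patches, lam_s * pos, lam_f * flow.reshape(-1, 2)])   # (H*W, d) feature matrix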
53 Temporal coherence: quantitative comparison with nonlocal video matting [8] which uses the multi-frame nonlocal Laplacian. [sent-130, score-1.398]
54 KNN video matting uses the two-frame KNN Laplacian; our video mattes not only give smaller error between consecutive α but also show more stable temporal coherence than [8], particularly on kim. [sent-131, score-1.008]
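For reference, the per-frame temporal-coherence error referred to here (error between consecutive α) can be computed as a simple mean absolute difference; the exact metric behind Figure 4 is not specified in this extract, so the following is an assumed stand-in:

import numpy as np

def temporal_coherence_error(alpha_t, alpha_t1):
    # mean absolute change between consecutive alpha mattes; smaller = more temporally coherent
    return float(np.mean(np.abs(np.asarray(alpha_t1, dtype=float) - np.asarray(alpha_t, dtype=float))))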
55 In [8], in order to preserve temporal coherence, three frames are used to construct their affinity matrix. [sent-137, score-0.183]
56 (10) where A11 and A22 are the intra-frame affinity matrices, computed within frame 1 and frame 2 respectively, and A12 and A21 describe the inter-frame affinity information between the two frames under consideration. [sent-143, score-0.303]
57 In general, to enhance temporal coherence by supplying more candidate nonlocal matches, a larger affinity matrix involving n ≥ 2 frames can be defined in a similar manner. [sent-145, score-0.571]
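A minimal sketch of assembling the two-frame block affinity of (10) and the corresponding Laplacian; the row-sum degree normalization and the handling of the (possibly asymmetric) inter-frame blocks are assumptions:

import numpy as np
from scipy.sparse import bmat, diags

def two_frame_laplacian(A11, A12, A21, A22):
    # A11, A22: intra-frame affinities; A12, A21: inter-frame affinities (need not be symmetric)
    A = bmat([[A11, A12], [A21, A22]]).tocsr()      # block affinity over the 2*N pixels of both frames
    D = diags(np.asarray(A.sum(axis=1)).ravel())
    return D - A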
58 When λs is too small, the matte becomes brittle since the affinity matrix is built using color and motion information which are ambiguous for small λs . [sent-154, score-0.415]
59 Alpha Constraint vs. Trimap Propagation. Most recent methods [3, 8, 10, 13, 15, 19] require trimaps for each frame to be explicitly available to compute α in a video sequence, either manually drawn or computed via optical flow. [sent-158, score-0.366]
60 Accurate trimap propagation requires reliable optical flow estimation, because the trimap propagated to the next frame is expected to be error free: wrongly-propagated definite foreground or hard constraints are often detrimental and hard to correct during optimization. [sent-160, score-0.882]
61 However, this accuracy is not guaranteed even with the state-of-the-art optical flow algorithms. [sent-161, score-0.147]
62 On the other hand, one may argue for trajectory estimation, which is more accurate since sophisticated motion models (such as affinity tensors [20]) are considered. [sent-162, score-0.174]
63 However, the estimated trajectories are usually too sparse to be practical for trimap propagation, typically around 200 trajectories in a 100-frame video at 600 × 400 resolution as in [20]. [sent-163, score-0.39]
64 In KNN video matting, we make use of αt as a soft constraint to optimize αt+1, which has the additional advantage of refining αt after the optimization. [sent-164, score-0.142]
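One way to realize such a closed-form solve with soft constraints (a sketch under assumptions; the exact energy and constraint weight are not given in this extract, and L is treated as symmetric here) is to minimize αᵀLα + λ(α − g)ᵀM(α − g), where g collects the user markups and the previous-frame alpha αt, and the diagonal M selects the constrained pixels, giving the sparse linear system (L + λM)α = λMg:

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def solve_alpha(L, constrained, targets, lam=100.0):
    # L: sparse (N, N) Laplacian; constrained: bool (N,) mask of soft-constrained pixels
    # targets: (N,) target alpha values (user markups and/or the previous-frame matte)
    M = diags(constrained.astype(float))
    rhs = lam * (constrained * targets)
    alpha = spsolve((L + lam * M).tocsr(), rhs)
    return np.clip(alpha, 0.0, 1.0)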
65 When erroneous alphas are present, the K nearest neighbors are capable of nonlocally averaging the alpha to ameliorate their effect. [sent-203, score-0.346]
66 Unlike [12] our alpha map on the previous frame is not motion-warped to the current frame where the current alpha is being optimized. [sent-204, score-0.612]
67 For fast and rapid motion, optical flows tend to be inaccurate, thus the incorrectly [Figure 8 panels: surfer and walk, each shown with the optical flow, a too large / too small λf, and a good λf] [sent-206, score-0.16]
68 In videos such as surfer, λf should not be too large because the optical flows are noisy and inaccurate, whereas in challenging example walk, larger λf can extract a clearer alpha matte, since optical flow gives good estimation on the man’s movement. [sent-209, score-0.53]
69 warped alphas may introduce bad constraints, not to mention that alpha warping introduces complication as the mapping is seldom one-to-one. [sent-210, score-0.269]
70 When motion is inaccurate, we can weaken the influence of optical flow (by adjusting λf) so that color/texture information can dominate. [sent-211, score-0.228]
71 Measured on a 2.6GHz CPU; note that the huge Laplacian system is twice as large as the image matting Laplacian. [sent-217, score-0.596]
72 Optical flow computation using [22] is around one minute per frame and is not included in the table. [sent-225, score-0.14]
73 Figure 4 shows the quantitative comparison on temporal coherence with nonlocal video matting which will be explained in the sequel. [sent-226, score-1.176]
74 K is the number of nearest neighbors for nonlocal matching. [sent-231, score-0.41]
75 This shows that K is not critical: the results look similar; a smaller K allows for faster running times, while an overly large K produces irrelevant matches that manifest as unsightly matting artifacts, although the overall quality is still maintained. [sent-234, score-0.658]
76 While a small λs produces a brittle matte and a large λs over-smooths the result, we found that a wide range of λs between the two extremes produces visually good results. [sent-239, score-0.27]
77 However, for hairy objects (top row of Figure 7), a smaller patch can preserve details better, since most hair strands have a width of one pixel or less. [sent-243, score-0.146]
78 Textural information is less useful for hairy foregrounds such as kim and amira, as the local texture at each hair strand is similar, with a lot of color ambiguities. [sent-244, score-0.227]
79 When λf = 0, no motion is considered in the feature vector, and the KNN Laplacian becomes similar to the multi-frame nonlocal Laplacian [8], except in the number of frames used (three in [8]) and in the affinity matrix construction (asymmetric here). [sent-247, score-0.507]
80 Unless otherwise stated, only one trimap on a single frame is given for KNN video matting. [sent-252, score-0.473]
81 Comparison on kim with the latest nonlocal video matting [8] is shown in Figure 9, which demonstrates the benefit of our motion-aware feature. [Figure 9: Frames 41, 42, 43] [sent-254, score-1.064]
82 Based on the same trimaps, we produce more accurate alpha mattes without additional post-processing and matte regularization. [sent-257, score-0.483]
83 When the foreground and background have similar colors thus producing low contrast edges, and in situations where texture falls short of being discriminating, motion cues are useful in extracting good nonlocal matches. [sent-266, score-0.494]
84 In [8], the multi-frame nonlocal Laplacian does not explicitly consider optical flow information, which results in blurry and unclear boundaries. [sent-268, score-0.499]
85 Figure 10 demonstrates that our method can naturally handle disocclusion and changing topology via KNN search for nonlocal neighbors, while video snapcut [4] produces a hard segmentation. [sent-270, score-0.635]
86 Figure 11 compares with geodesic video matting [2] which also only needs sparse scribbles from the user. [sent-272, score-0.749]
87 Figure 12 demonstrates the robustness of our soft alpha constraints and motionaware feature vector in KNN video matting. [sent-276, score-0.437]
88 This example demonstrates that KNN video matting can handle disocclusion and topological changes. [sent-279, score-0.781]
89 Similar to [4], we do not apply any user input on Frame 21 to 27; specifically, the only trimap provided is on the first frame. [sent-280, score-0.302]
90 With only the motion-aware KNN Laplacian, our result is not visually as good as that in [10]; however, no trimap propagation is done in KNN video matting: only several trimaps on the keyframes are supplied. [sent-287, score-0.492]
91 A blurry image/video in general is modeled by image convolution rather than the image compositing equation (1) assumed in alpha matting. [sent-289, score-0.262]
92 Conclusion. We study the nonlocal principle applied to video matting and use motion to disambiguate complex situations where colors and/or texture alone would fail. [sent-292, score-1.2]
93 This allows for less user input (as little as one trimap) and a simple alpha constraint to be incorporated in the closed-form solution to handle significant illumination changes, among other challenging cases. [sent-294, score-0.273]
94 With its simple implementation, we expect that motion-aware KNN Laplacian can be readily incorporated into Laplacian-based video matting systems to benefit them with better moving pixel clustering. [sent-299, score-0.816]
95 A geodesic framework for fast interactive image and video segmentation and matting. [sent-314, score-0.153]
96 Comparison with geodesic matting [2] on talk using sparse strokes. [sent-324, score-0.651]
97 Only strokes on the first frame are given and all the αs are computed using our closed-form solution. [sent-325, score-0.142]
98 Our results (bottom) are robust to stark illumination changes given only a single input trimap (Frame 12). [sent-329, score-0.273]
99 In video snapcut (top), the user needs to supply quite a number of additional strokes to achieve a comparable segmentation, for example, by carefully drawn control points on Frame 15 and 51 as well as blue strokes on the intermediate frames. [sent-331, score-0.326]
100 [Figure panels: Frames 29, 37, 15, 18] Motion-aware KNN Laplacian degrades gracefully in fast and complex motion in front of a background with ambiguous colors (left, jurassic), and in the presence of motion blur (right, waving). [sent-333, score-0.198]
wordName wordTfidf (topN-words)
[('matting', 0.596), ('knn', 0.338), ('nonlocal', 0.333), ('trimap', 0.273), ('laplacian', 0.251), ('alpha', 0.223), ('matte', 0.175), ('video', 0.117), ('affinity', 0.093), ('optical', 0.09), ('frame', 0.083), ('motion', 0.081), ('trimaps', 0.076), ('flows', 0.07), ('mattes', 0.067), ('foreground', 0.062), ('snapcut', 0.062), ('strokes', 0.059), ('flow', 0.057), ('temporal', 0.056), ('coherence', 0.055), ('principle', 0.055), ('neighbors', 0.054), ('hairy', 0.054), ('surfer', 0.054), ('motionaware', 0.054), ('disocclusion', 0.05), ('mf', 0.049), ('alphas', 0.046), ('amira', 0.046), ('hindsight', 0.046), ('markups', 0.046), ('pixel', 0.042), ('conducive', 0.04), ('moving', 0.04), ('produces', 0.037), ('geodesic', 0.036), ('frames', 0.034), ('inaccurate', 0.033), ('controls', 0.033), ('vb', 0.032), ('man', 0.032), ('coherent', 0.031), ('bai', 0.03), ('fgood', 0.03), ('flowtoo', 0.03), ('jurassic', 0.03), ('twoframeinputmatinglaplaciankn', 0.03), ('xtlx', 0.03), ('denoising', 0.03), ('textural', 0.03), ('user', 0.029), ('clusters', 0.029), ('patch', 0.028), ('vf', 0.026), ('propagation', 0.026), ('grouping', 0.025), ('immediately', 0.025), ('soft', 0.025), ('running', 0.025), ('consistency', 0.024), ('ambiguous', 0.024), ('livecut', 0.023), ('courtesy', 0.023), ('arts', 0.023), ('cutout', 0.023), ('nearest', 0.023), ('asymmetric', 0.023), ('salesin', 0.022), ('laplacians', 0.022), ('excessively', 0.022), ('strand', 0.022), ('hair', 0.022), ('pages', 0.022), ('matusik', 0.021), ('color', 0.021), ('incorporated', 0.021), ('uf', 0.021), ('chuang', 0.021), ('brittle', 0.021), ('pixels', 0.02), ('compositing', 0.02), ('waving', 0.02), ('spectral', 0.02), ('talk', 0.019), ('patchmatch', 0.019), ('quantitative', 0.019), ('blurry', 0.019), ('constructing', 0.019), ('pull', 0.018), ('ub', 0.018), ('temporally', 0.018), ('changing', 0.018), ('produce', 0.018), ('colors', 0.018), ('demonstrates', 0.018), ('gracefully', 0.018), ('vlfeat', 0.018), ('implementation', 0.018), ('propagated', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting
Author: Dingzeyu Li, Qifeng Chen, Chi-Keung Tang
Abstract: This paper demonstrates how the nonlocal principle benefits video matting via the KNN Laplacian, which comes with a straightforward implementation using motionaware K nearest neighbors. In hindsight, the fundamental problem to solve in video matting is to produce spatiotemporally coherent clusters of moving foreground pixels. When used as described, the motion-aware KNN Laplacian is effective in addressing this fundamental problem, as demonstrated by sparse user markups typically on only one frame in a variety of challenging examples featuring ambiguous foreground and background colors, changing topologies with disocclusion, significant illumination changes, fast motion, and motion blur. When working with existing Laplacian-based systems, our Laplacian is expected to benefit them immediately with improved clustering of moving foreground pixels.
2 0.45883968 19 iccv-2013-A Learning-Based Approach to Reduce JPEG Artifacts in Image Matting
Author: Inchang Choi, Sunyeong Kim, Michael S. Brown, Yu-Wing Tai
Abstract: Single image matting techniques assume high-quality input images. The vast majority of images on the web and in personal photo collections are encoded using JPEG compression. JPEG images exhibit quantization artifacts that adversely affect the performance of matting algorithms. To address this situation, we propose a learning-based post-processing method to improve the alpha mattes extracted from JPEG images. Our approach learns a set of sparse dictionaries from training examples that are used to transfer details from high-quality alpha mattes to alpha mattes corrupted by JPEG compression. Three different dictionaries are defined to accommodate different object structure (long hair, short hair, and sharp boundaries). A back-projection criteria combined within an MRF framework is used to automatically select the best dictionary to apply on the object’s local boundary. We demonstrate that our method can produces superior results over existing state-of-the-art matting algorithms on a variety of inputs and compression levels.
3 0.11262155 296 iccv-2013-On the Mean Curvature Flow on Graphs with Applications in Image and Manifold Processing
Author: Abdallah El_Chakik, Abderrahim Elmoataz, Ahcene Sadi
Abstract: In this paper, we propose an adaptation and transcription of the mean curvature level set equation on a general discrete domain (weighted graphs with arbitrary topology). We introduce the perimeters on graph using difference operators and define the curvature as the first variation of these perimeters. Our proposed approach of mean curvature unifies both local and non local notions of mean curvature on Euclidean domains. Furthermore, it allows the extension to the processing of manifolds and data which can be represented by graphs.
4 0.10517143 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
Author: Manjunath Narayana, Allen Hanson, Erik Learned-Miller
Abstract: In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
5 0.10460078 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
Author: Anestis Papazoglou, Vittorio Ferrari
Abstract: We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.
6 0.096900657 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification
7 0.086248435 317 iccv-2013-Piecewise Rigid Scene Flow
8 0.079938866 417 iccv-2013-The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection
9 0.079060234 282 iccv-2013-Multi-view Object Segmentation in Space and Time
10 0.077003457 39 iccv-2013-Action Recognition with Improved Trajectories
11 0.076775715 256 iccv-2013-Locally Affine Sparse-to-Dense Matching for Motion and Occlusion Estimation
12 0.075459369 105 iccv-2013-DeepFlow: Large Displacement Optical Flow with Deep Matching
13 0.073855862 300 iccv-2013-Optical Flow via Locally Adaptive Fusion of Complementary Data Costs
14 0.071058601 12 iccv-2013-A General Dense Image Matching Framework Combining Direct and Feature-Based Costs
15 0.070877612 297 iccv-2013-Online Motion Segmentation Using Dynamic Label Propagation
16 0.06816715 143 iccv-2013-Estimating Human Pose with Flowing Puppets
17 0.06559325 82 iccv-2013-Compensating for Motion during Direct-Global Separation
18 0.061820075 108 iccv-2013-Depth from Combining Defocus and Correspondence Using Light-Field Cameras
19 0.061803013 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
20 0.06169815 354 iccv-2013-Robust Dictionary Learning by Error Source Decomposition
topicId topicWeight
[(0, 0.135), (1, -0.043), (2, 0.016), (3, 0.074), (4, -0.033), (5, 0.057), (6, -0.024), (7, 0.026), (8, 0.049), (9, -0.006), (10, -0.022), (11, 0.039), (12, 0.11), (13, 0.0), (14, -0.031), (15, 0.018), (16, -0.064), (17, -0.046), (18, 0.013), (19, 0.033), (20, 0.076), (21, -0.011), (22, 0.023), (23, -0.041), (24, -0.006), (25, 0.063), (26, 0.036), (27, 0.002), (28, 0.077), (29, -0.034), (30, -0.031), (31, -0.042), (32, 0.025), (33, 0.106), (34, -0.054), (35, 0.027), (36, 0.073), (37, 0.031), (38, -0.01), (39, 0.019), (40, -0.048), (41, 0.013), (42, -0.083), (43, -0.133), (44, 0.041), (45, -0.021), (46, 0.287), (47, 0.041), (48, 0.095), (49, 0.139)]
simIndex simValue paperId paperTitle
same-paper 1 0.90238917 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting
Author: Dingzeyu Li, Qifeng Chen, Chi-Keung Tang
Abstract: This paper demonstrates how the nonlocal principle benefits video matting via the KNN Laplacian, which comes with a straightforward implementation using motionaware K nearest neighbors. In hindsight, the fundamental problem to solve in video matting is to produce spatiotemporally coherent clusters of moving foreground pixels. When used as described, the motion-aware KNN Laplacian is effective in addressing this fundamental problem, as demonstrated by sparse user markups typically on only one frame in a variety of challenging examples featuring ambiguous foreground and background colors, changing topologies with disocclusion, significant illumination changes, fast motion, and motion blur. When working with existing Laplacian-based systems, our Laplacian is expected to benefit them immediately with improved clustering of moving foreground pixels.
2 0.78646988 19 iccv-2013-A Learning-Based Approach to Reduce JPEG Artifacts in Image Matting
Author: Inchang Choi, Sunyeong Kim, Michael S. Brown, Yu-Wing Tai
Abstract: Single image matting techniques assume high-quality input images. The vast majority of images on the web and in personal photo collections are encoded using JPEG compression. JPEG images exhibit quantization artifacts that adversely affect the performance of matting algorithms. To address this situation, we propose a learning-based post-processing method to improve the alpha mattes extracted from JPEG images. Our approach learns a set of sparse dictionaries from training examples that are used to transfer details from high-quality alpha mattes to alpha mattes corrupted by JPEG compression. Three different dictionaries are defined to accommodate different object structure (long hair, short hair, and sharp boundaries). A back-projection criteria combined within an MRF framework is used to automatically select the best dictionary to apply on the object’s local boundary. We demonstrate that our method can produces superior results over existing state-of-the-art matting algorithms on a variety of inputs and compression levels.
3 0.48785633 160 iccv-2013-Fast Object Segmentation in Unconstrained Video
Author: Anestis Papazoglou, Vittorio Ferrari
Abstract: We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.
4 0.4634974 145 iccv-2013-Estimating the Material Properties of Fabric from Video
Author: Katherine L. Bouman, Bei Xiao, Peter Battaglia, William T. Freeman
Abstract: Passively estimating the intrinsic material properties of deformable objects moving in a natural environment is essential for scene understanding. We present a framework to automatically analyze videos of fabrics moving under various unknown wind forces, and recover two key material properties of the fabric: stiffness and area weight. We extend features previously developed to compactly represent static image textures to describe video textures, such as fabric motion. A discriminatively trained regression model is then used to predict the physical properties of fabric from these features. The success of our model is demonstrated on a new, publicly available database offabric videos with corresponding measured ground truth material properties. We show that our predictions are well correlated with ground truth measurements of stiffness and density for the fabrics. Our contributions include: (a) a database that can be used for training and testing algorithms for passively predicting fabric properties from video, (b) an algorithm for predicting the material properties of fabric from a video, and (c) a perceptual study of humans’ ability to estimate the material properties of fabric from videos and images.
5 0.46253234 15 iccv-2013-A Generalized Low-Rank Appearance Model for Spatio-temporally Correlated Rain Streaks
Author: Yi-Lei Chen, Chiou-Ting Hsu
Abstract: In this paper, we propose a novel low-rank appearance model for removing rain streaks. Different from previous work, our method needs neither rain pixel detection nor time-consuming dictionary learning stage. Instead, as rain streaks usually reveal similar and repeated patterns on imaging scene, we propose and generalize a low-rank model from matrix to tensor structure in order to capture the spatio-temporally correlated rain streaks. With the appearance model, we thus remove rain streaks from image/video (and also other high-order image structure) in a unified way. Our experimental results demonstrate competitive (or even better) visual quality and efficient run-time in comparison with state of the art.
6 0.45564887 82 iccv-2013-Compensating for Motion during Direct-Global Separation
7 0.45104024 312 iccv-2013-Perceptual Fidelity Aware Mean Squared Error
8 0.44378403 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations
9 0.43659818 351 iccv-2013-Restoring an Image Taken through a Window Covered with Dirt or Rain
10 0.41980791 164 iccv-2013-Fibonacci Exposure Bracketing for High Dynamic Range Imaging
11 0.41107431 23 iccv-2013-A New Image Quality Metric for Image Auto-denoising
12 0.39908949 299 iccv-2013-Online Video SEEDS for Temporal Window Objectness
13 0.38799092 33 iccv-2013-A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis
14 0.37989628 207 iccv-2013-Illuminant Chromaticity from Image Sequences
15 0.37988013 414 iccv-2013-Temporally Consistent Superpixels
16 0.37603822 98 iccv-2013-Cross-Field Joint Image Restoration via Scale Map
17 0.37246561 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition
18 0.36759922 151 iccv-2013-Exploiting Reflection Change for Automatic Reflection Removal
19 0.36362019 101 iccv-2013-DCSH - Matching Patches in RGBD Images
20 0.36031497 170 iccv-2013-Fingerspelling Recognition with Semi-Markov Conditional Random Fields
topicId topicWeight
[(2, 0.061), (26, 0.073), (31, 0.372), (42, 0.07), (64, 0.05), (73, 0.041), (89, 0.166), (98, 0.02)]
simIndex simValue paperId paperTitle
1 0.94818902 345 iccv-2013-Recognizing Text with Perspective Distortion in Natural Scenes
Author: Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan
Abstract: This paper presents an approach to text recognition in natural scene images. Unlike most existing works which assume that texts are horizontal and frontal parallel to the image plane, our method is able to recognize perspective texts of arbitrary orientations. For individual character recognition, we adopt a bag-of-keypoints approach, in which Scale Invariant Feature Transform (SIFT) descriptors are extracted densely and quantized using a pre-trained vocabulary. Following [1, 2], the context information is utilized through lexicons. We formulate word recognition as finding the optimal alignment between the set of characters and the list of lexicon words. Furthermore, we introduce a new dataset called StreetViewText-Perspective, which contains texts in street images with a great variety of viewpoints. Experimental results on public datasets and the proposed dataset show that our method significantly outperforms the state-of-the-art on perspective texts of arbitrary orientations.
2 0.91891187 408 iccv-2013-Super-resolution via Transform-Invariant Group-Sparse Regularization
Author: Carlos Fernandez-Granda, Emmanuel J. Candès
Abstract: We present a framework to super-resolve planar regions found in urban scenes and other man-made environments by taking into account their 3D geometry. Such regions have highly structured straight edges, but this prior is challenging to exploit due to deformations induced by the projection onto the imaging plane. Our method factors out such deformations by using recently developed tools based on convex optimization to learn a transform that maps the image to a domain where its gradient has a simple group-sparse structure. This allows to obtain a novel convex regularizer that enforces global consistency constraints between the edges of the image. Computational experiments with real images show that this data-driven approach to the design of regularizers promoting transform-invariant group sparsity is very effective at high super-resolution factors. We view our approach as complementary to most recent superresolution methods, which tend to focus on hallucinating high-frequency textures.
3 0.90408391 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
Author: Dahua Lin, Jianxiong Xiao
Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes, that is, the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination.
4 0.88488603 357 iccv-2013-Robust Matrix Factorization with Unknown Noise
Author: Deyu Meng, Fernando De_La_Torre
Abstract: Many problems in computer vision can be posed as recovering a low-dimensional subspace from high-dimensional visual data. Factorization approaches to low-rank subspace estimation minimize a loss function between an observed measurement matrix and a bilinear factorization. Most popular loss functions include the L2 and L1 losses. L2 is optimal for Gaussian noise, while L1 is for Laplacian distributed noise. However, real data is often corrupted by an unknown noise distribution, which is unlikely to be purely Gaussian or Laplacian. To address this problem, this paper proposes a low-rank matrix factorization problem with a Mixture of Gaussians (MoG) noise model. The MoG model is a universal approximator for any continuous distribution, and hence is able to model a wider range of noise distributions. The parameters of the MoG model can be estimated with a maximum likelihood method, while the subspace is computed with standard approaches. We illustrate the benefits of our approach in extensive synthetic and real-world experiments including structure from motion, face modeling and background subtraction.
5 0.88394046 38 iccv-2013-Action Recognition with Actons
Author: Jun Zhu, Baoyuan Wang, Xiaokang Yang, Wenjun Zhang, Zhuowen Tu
Abstract: With the improved accessibility to an exploding amount of video data and growing demands in a wide range of video analysis applications, video-based action recognition/classification becomes an increasingly important task in computer vision. In this paper, we propose a two-layer structure for action recognition to automatically exploit a mid-level “acton” representation. The weakly-supervised actons are learned via a new max-margin multi-channel multiple instance learning framework, which can capture multiple mid-level action concepts simultaneously. The learned actons (with no requirement for detailed manual annotations) observe the properties of being compact, informative, discriminative, and easy to scale. The experimental results demonstrate the effectiveness of applying the learned actons in our two-layer structure, and show the state-of-the-art recognition performance on two challenging action datasets, i.e., Youtube and HMDB51.
same-paper 6 0.8154037 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting
7 0.77342778 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
8 0.73080122 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
9 0.71587682 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
10 0.6977877 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
11 0.69240618 210 iccv-2013-Image Retrieval Using Textual Cues
12 0.69002819 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
13 0.66451377 180 iccv-2013-From Where and How to What We See
14 0.66212946 173 iccv-2013-Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging
15 0.66090029 19 iccv-2013-A Learning-Based Approach to Reduce JPEG Artifacts in Image Matting
17 0.64712775 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes
18 0.64268821 287 iccv-2013-Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors
19 0.64110374 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
20 0.63801706 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation