CVPR 2013, cvpr2013-453: knowledge graph by maker-knowledge-mining
Source: pdf
Author: Xiaojie Guo, Xiaochun Cao, Xiaowu Chen, Yi Ma
Abstract: Given an area of interest in a video sequence, one may want to manipulate or edit the area, e.g. remove occlusions from or replace with an advertisement on it. Such a task involves three main challenges including temporal consistency, spatial pose, and visual realism. The proposed method effectively seeks an optimal solution to simultaneously deal with temporal alignment, pose rectification, as well as precise recovery of the occlusion. To make our method applicable to long video sequences, we propose a batch alignment method for automatically aligning and rectifying a small number of initial frames, and then show how to align the remaining frames incrementally to the aligned base images. From the error residual of the robust alignment process, we automatically construct a trimap of the region for each frame, which is used as the input to alpha matting methods to extract the occluding foreground. Experimental results on both simulated and real data demonstrate the accurate and robust performance of our method.
Reference: text
1 Abstract: Given an area of interest in a video sequence, one may want to manipulate or edit the area, e.g.
2 remove occlusions from it or replace it with an advertisement.
3 The proposed method effectively seeks an optimal solution to simultaneously deal with temporal alignment, pose rectification, as well as precise recovery of the occlusion.
4 To make our method applicable to long video sequences, we propose a batch alignment method for automatically aligning and rectifying a small number of initial frames, and then show how to align the remaining frames incrementally to the aligned base images.
5 From the error residual of the robust alignment process, we automatically construct a trimap of the region for each frame, which is used as the input to alpha matting methods to extract the occluding foreground.
6 For instance, if one wants to change the original facade bounded by the green window (Fig.
7 But editing a long sequence of images remains extremely difficult.
8 Since the sequence is usually captured by a hand-held camera, images of the region of interest will appear to be scaled, rotated or deformed throughout the sequence.
9 Left: an original frame, where the facade of a selected building and the occlusion map are shown within the small windows.
10 Right: the result of replacing the facade with a new texture.
11 The three small windows are the new facade (top-left), the trimap (bottom-left), and the mask of occlusion (bottom-right), respectively.
12 To guarantee visual consistency and reduce human interaction, we first need to automatically and precisely align the region of interest across the sequence.
13 [8] and [7] propose to align images by minimizing the sum of entropies of pixel values at each pixel location in the batch of aligned images.
14 Conversely, the least squares congealing procedure of [3], [4] seeks an alignment that minimizes the sum of squared distances between pairs of images.
15 The major drawback of the above approaches is that they do not simultaneously handle large illumination variations and gross pixel corruptions or partial occlusions that usually occur in real images.
16 [12] propose an algorithm named RASL to solve the task of image alignment by low-rank and sparse decomposition.
17 This method takes gross corruptions into account, and thus gives better results in real applications.
18 However, RASL only takes care of temporal alignment, not the 2D deformation of the region being aligned.
19 The rectified versions of the objects (bounded by red dotted windows) are displayed at the top-left.
20 The circular Starbucks logo looks awkward if directly pasted on the deformed facade.
21 One way of improving the visual quality of editing is to transform the logo according to the facade pose.
22 Another way is to rectify the facade as shown in Fig.
23 This can be done by TILT [19], which assumes that the rectified texture has lower rank than its deformed versions.
24 In fact, many natural and man-made objects (of interest for editing) have low-rank appearance in their rectified position.
25 2, in which the rank of each rectified (top-left) image is much lower than that of its original.
26 Hence it is natural to combine them to handle the alignment and rectification problem for video editing.
27 Besides, to make the editing results look natural and real, it is crucial to preserve the occluding foreground.
28 That means the foreground needs to be precisely matted, extracted, and synthesized too.
29 Plenty of literature about alpha matting has been published [9][2][15].
30 As discussed above, temporal alignment, spatial rectification, and occlusion of the area are the three main issues we have to handle properly for the desired video editing task.
31 Considering the limitations of sparse and low-rank decomposition on large data, we first construct the area basis from several images.
32 Every individual area is then parsed into its intrinsic appearance and the corresponding residual, according to the well-constructed basis.
33 We automatically generate the trimap from the area and its residual, then adopt alpha matting methods to extract the foreground.
34 (a) The median of the initial inputs without either alignment or rectification.
35 (c) The median with both alignment and rectification by our method.
36 Initial Batch Frame Alignment. In the simplest case, the areas of interest are clean (without occlusions or gross corruptions) and temporally well-aligned.
37 If we stack the 2D areas as columns of a matrix B ∈ R^{m×n}, where m is the number of pixels of the area, the matrix B is low rank.
38 In the real world, the areas are very likely to have partial occlusions or specular reflections, which may break the low-rankness of the matrix B.
39 Therefore, we have B + E = A, where E is the sparse residual matrix between the corrupted matrix A and the clean (low-rank) area matrix B.
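The decomposition B + E = A above is the classic low-rank-plus-sparse (robust PCA) model. A generic sketch via the inexact augmented Lagrange multiplier method follows; it assumes NumPy and common default parameters, not the authors' actual solver or settings.

```python
import numpy as np

def rpca_ialm(A, lam=None, tol=1e-7, max_iter=500):
    """Split A into low-rank B and sparse E with A = B + E, using the
    inexact ALM method. A generic RPCA sketch, not the paper's solver."""
    m, n = A.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))     # common default weight
    norm_fro = np.linalg.norm(A)
    norm_two = np.linalg.norm(A, 2)
    Y = A / max(norm_two, np.abs(A).max() / lam)   # dual variable init
    mu, rho = 1.25 / norm_two, 1.5
    E = np.zeros_like(A)
    for _ in range(max_iter):
        # B-step: singular value thresholding of (A - E + Y/mu)
        U, s, Vt = np.linalg.svd(A - E + Y / mu, full_matrices=False)
        B = U @ np.diag(np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # E-step: elementwise soft-thresholding (shrinkage)
        G = A - B + Y / mu
        E = np.sign(G) * np.maximum(np.abs(G) - lam / mu, 0.0)
        R = A - B - E
        Y = Y + mu * R                     # dual update
        mu = min(mu * rho, 1e7)            # grow penalty parameter
        if np.linalg.norm(R) / norm_fro < tol:
            break
    return B, E
```

On a synthetic low-rank matrix with a few percent of large sparse corruptions, this recovers both components to high accuracy.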
40 Suppose transformations τ1, . . . , τn ∈ G transform all the misaligned areas to well-aligned A1 ◦ τ1, A2 ◦ τ2, . . . , An ◦ τn.
41 To fix this, we may assume that the region of interest has some spatial structure and that at its "normalized" pose the region, as a 2D matrix, achieves the lowest rank.
42 Based on these assumptions, we can naturally formulate our task as the following problem. (Footnote 1: Please note that the 2D region does not have to be a strictly low-rank texture for this to be helpful for resolving the ambiguity.)
43 Here ‖·‖_0 denotes the ℓ0-norm, ω and λ are the coefficients controlling the weights of the terms for the rank of each individual area in spatial space and for the sparse matrix E, and R(·) represents the linear operator reshaping a vector back to its original 2D form.
44 Alternatively, minimizing the natural convex surrogate for the objective function in (1) can exactly recover the low-rank matrix B, as long as the rank of the matrix B to be recovered is not too high and the number of non-zero elements in E is not too large [1].
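The optimization problem (1) itself did not survive extraction. A hedged reconstruction, consistent with the surrounding symbol definitions (ω, λ, R(·), and the constraint A ◦ τ = B + E), together with its convex surrogate, might read:

```latex
% Hedged reconstruction of problem (1), inferred from the surrounding
% text; the exact weights and constraints may differ in the paper:
\min_{B,\,E,\,\tau_1,\dots,\tau_n}\;
  \operatorname{rank}(B)
  \;+\; \omega \sum_{i=1}^{n} \operatorname{rank}\!\big(\mathcal{R}(b_i)\big)
  \;+\; \lambda \,\|E\|_{0}
\quad\text{s.t.}\quad A \circ \tau = B + E .

% Convex surrogate: replace rank by the nuclear norm and the
% \ell_0-norm by the \ell_1-norm:
\min_{B,\,E,\,\tau}\;
  \|B\|_{*}
  \;+\; \omega \sum_{i=1}^{n} \big\|\mathcal{R}(b_i)\big\|_{*}
  \;+\; \lambda \,\|E\|_{1}
\quad\text{s.t.}\quad A \circ \tau = B + E .
```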
45 Given that a set of n frames is already aligned as above, when a new frame, say a, arrives, we can try to rectify and align it to the basis B obtained from Algorithm 1, subject to some error e.
46 In general, the well-aligned and rectified region can be represented by a linear combination of the columns of the basis, i.e. a ◦ τ = Bx + e.
47 So the formulation of aligning a new frame is: min_{x, e, τ} ‖e‖_1 s.t. a ◦ τ = Bx + e.
48 Here e represents the residual, and τ is the transformation matrix of a. (Algorithm 2: Initialization: initial support Ω = 1_m; while not converged do . . . ; Output: optimal solution x∗ = x_k, τ∗ = τ.)
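With the transformation τ held fixed, the inner step of the formulation above reduces to a sparse-residual fit a = Bx + e. A minimal sketch via iteratively reweighted least squares (IRLS), an assumed stand-in for the paper's actual inner solver (which is likely an ALM/ADM scheme):

```python
import numpy as np

def l1_fit(B, a, n_iter=100, eps=1e-8):
    """Approximately solve min_x ||a - Bx||_1 by IRLS; the returned
    e = a - Bx is the sparse residual. Illustrative sketch only."""
    x = np.linalg.lstsq(B, a, rcond=None)[0]   # least-squares init
    for _ in range(n_iter):
        r = a - B @ x
        w = 1.0 / np.sqrt(r ** 2 + eps)        # smoothed 1/|r| weights
        Bw = B * w[:, None]
        # weighted normal equations: (B^T W B) x = B^T W a
        x = np.linalg.solve(B.T @ Bw, B.T @ (w * a))
    return x, a - B @ x
```

With a handful of large sparse errors among many clean pixels, this recovers the coefficients x almost exactly and concentrates e on the corrupted pixels.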
49 When the region to be aligned is corrupted by sparse occlusions or corruptions, the procedure proposed above can handle it well.
50 In such cases, we gradually remove some of the pixels that have very large reconstruction errors, and run the above robust alignment again on the remaining region.
51 Note that the area does not contain any occlusions or corruptions if the values of the elements in e are all very close to zero.
52 The procedure of aligning a new frame is summarized in Algorithm 2.
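The pixel-pruning step described above (discard the pixels with the largest reconstruction errors, then re-run the alignment on the remaining support) can be sketched as follows; the loop structure, dropped fraction, and round count are assumptions, not values from the paper.

```python
import numpy as np

def align_with_pruning(B, a, drop_frac=0.15, rounds=3):
    """Repeatedly fit frame a against basis B, then discard the pixels
    with the largest reconstruction errors and refit on the remaining
    support. Illustrative version of the occlusion-pruning step."""
    keep = np.ones(a.shape[0], dtype=bool)
    for _ in range(rounds):
        # refit using only the currently trusted pixels
        x = np.linalg.lstsq(B[keep], a[keep], rcond=None)[0]
        r = np.abs(a - B @ x)                        # per-pixel error
        thresh = np.quantile(r, 1.0 - drop_frac)     # drop worst fraction
        keep = r <= thresh
    return x, a - B @ x, keep
```

After a round or two, the occluded pixels fall out of the support and the fit is driven by the clean background pixels only.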
53 Foreground Separation. So far, we have recovered the temporally aligned and spatially rectified area and the associated error residual.
54 If our goal is only to recover a clean texture of the region, remove the occluding foreground, and replace the region with another texture throughout the sequence, then the results from the above two steps would be sufficient.
55 However, if we want to super-impose the occluding foreground back onto the edited region to achieve a more realistic visual effect, we have to precisely segment out the foreground.
56 Notice that the error residual e that we have obtained from the above robust alignment is not the foreground per se: it is the difference between the original images and the background.
57 Nevertheless, the error residual is very informative about where the occluding foreground is. (Footnote 2: In this paper, we discuss the cases where the objects of both background and foreground are opaque.)
58 In general, we get large residual errors around occluding pixels, and small ones around background pixels.
59 Therefore, we can initially assign a pixel to the foreground if the error is above a certain threshold and to the background if the error is small enough.
60 In this way, we obtain an initial trimap for foreground segmentation.
61 We then employ existing alpha matting methods [9][2][15] to obtain a more precise segmentation of the foreground from the image frames.
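The residual-thresholding construction of the initial trimap can be sketched as below; the two threshold values are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def trimap_from_residual(residual, lo=0.05, hi=0.2):
    """Turn per-pixel residual magnitudes into a trimap: small error ->
    background (0), large error -> foreground (255), in between ->
    unknown (128), to be resolved later by alpha matting."""
    r = np.abs(residual)
    trimap = np.full(r.shape, 128, dtype=np.uint8)  # unknown by default
    trimap[r <= lo] = 0       # confident background
    trimap[r >= hi] = 255     # confident foreground
    return trimap
```

The unknown band (value 128) is exactly the region handed to the alpha matting step.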
62 Unless otherwise stated, the two parameters are fixed throughout our experiments: ω = 5/n and λ = 3/√m, where n is the number of images used to construct the area basis and m is the number of pixels in the area of interest.
63 We first verify the ability of the batch frame alignment (Algorithm 1) to cope with varying levels of corruption on a (rectified) checker-board pattern with white, gray and black blocks.
64 Translation, rotation, skew, occlusion rate and the number of images are the factors to be tested.
65 The task is to align and rectify the images to a 100 × 100 pixel canonical window.
66 Qualitative results of the proposed single frame alignment method (Algorithm 2) with respect to translation, rotation, skew and occlusion.
67 Every three pictures of the rest form one group, representing the deformed and polluted version of the reference, the transformed result by our proposed method, and the corresponding residual, respectively.
68 To take the skew factor into account, we fix both x0 and y0 at 5%, and test θ0 and s0 with different values.
69 Our algorithm converges well up to the 0.2 skew level, except for the case of 10° rotation and 0.2 skew.
70 We repeat the above procedure to further verify the performance of the proposed algorithm with respect to different levels of occlusion and different numbers of images.
71 Figure 4 (d) shows the robustness of the proposed algorithm to random occlusion and skew level with x0 = 5%, y0 = 10% and θ0 = 10°.
72 It indicates that our algorithm can converge well up to 15% random occlusion and 0.1 skew.
73 Although the combination of 0.1 skew and 15% occlusion does not succeed in every trial, it has a 90% success rate.
74 The second is the result of aligned and rectified windows by our method without removing occlusions.
75 The third is the recovered clean regions B by our method.
76 Note that due to space limitations, we do not quantitatively analyze the performance of the single frame alignment (Algorithm 2) here, as it is designed in a similar spirit to Algorithm 1.
77 We synthesize deformed and polluted images by controlling parameters (x-translation, y-translation, angle of rotation, skew, occlusion fraction) based on the reference.
78 Figure 7 demonstrates improved robust alignment and recovery results with occlusion detection (as discussed at the end of Section 2).
79 The top row is the recovery result using all the pixels in the region, and the middle row is the result with occlusion detection and masking.
80 Another merit of our method is that it preserves all global illumination changes in the original frames, which can be seen from the comparison with the median facade shown in the middle of the bottom row.
81 This property ensures that the visual realism of the original sequence is maximally preserved.
82 Figure 8 shows an example of foreground separation from the recovered background and error residuals (as in Section 2).
83 Figure 7: Top and middle rows: the recovered results without and with occlusion detection.
84 Bottom row: the left is the result by feature matching and RANSAC, the middle is the median area obtained from the area basis, and the right is their residual.
86 It contains four images (from left to right: the recovered residual, the trimap, the foreground mask, and the foreground cutout, respectively).
87 With the help of the texture cue, we can determine that the pixels in the original image having textures similar to their corresponding pixels in the recovered image are background.
88 Hence, the trimap is automatically constructed and used as the input to alpha matting.
89 The alpha matting result is computed using the technique introduced in [9].
90 Readers can adopt other cues to determine the background and unknown regions, and choose other texture similarity measurements and alpha matting methods for the task of foreground separation.
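The texture cue described above (pixels whose local texture closely matches the recovered background are labeled background) might be sketched as follows; the patch-difference similarity measure, patch size, and threshold are assumptions, not the paper's measure.

```python
import numpy as np

def background_by_texture(orig, recovered, patch=5, thresh=0.02):
    """Mark a pixel as background when its local patch in the original
    image closely matches the corresponding patch in the recovered
    background. Crude stand-in for the texture cue in the text."""
    h, w = orig.shape
    r = patch // 2
    bg = np.zeros((h, w), dtype=bool)
    for i in range(r, h - r):
        for j in range(r, w - r):
            po = orig[i - r:i + r + 1, j - r:j + r + 1]
            pr = recovered[i - r:i + r + 1, j - r:j + r + 1]
            # mean absolute patch difference as a texture distance
            if np.abs(po - pr).mean() < thresh:
                bg[i, j] = True
    return bg
```

Pixels that fail this test (e.g. under an occluder) stay unlabeled and fall into the unknown region of the trimap.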
91 In each of the three cases, the first row displays sample images from the same sequence, and the second gives the edited results by the proposed method.
92 The top case changes the building facade, the middle one changes the monitor background of the laptop, and the bottom one repairs the building facade texture as well as adding a new advertisement banner.
93 The virtual reality achieved by our method is rather striking: not only are the pose and geometry of the edited area consistent throughout the sequence, but all the occlusions in the original sequence are also correctly synthesized on the newly edited regions.
94 Using our system, the only human intervention needed to achieve these tasks is to specify the edited area in the first frame and provide the replacement texture.
95 Conclusion. We have presented a new framework that simultaneously aligns and rectifies image regions in a video sequence, and automatically constructs trimaps and segments foregrounds.
96 Our framework leverages recent advances in the robust recovery of a high-dimensional low-rank matrix despite gross sparse errors.
97 The system can significantly reduce the interaction required from users for editing certain areas in a video.
98 The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices.
99 RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images.
100 Toward a practical face recognition system: Robust alignment and illumination by sparse representation.
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 453 cvpr-2013-Video Editing with Temporal, Spatial and Appearance Consistency
2 0.22055773 216 cvpr-2013-Improving Image Matting Using Comprehensive Sampling Sets
Author: Ehsan Shahrian, Deepu Rajan, Brian Price, Scott Cohen
Abstract: In this paper, we present a new image matting algorithm that achieves state-of-the-art performance on a benchmark dataset of images. This is achieved by solving two major problems encountered by current sampling based algorithms. The first is that the range in which the foreground and background are sampled is often limited to such an extent that the true foreground and background colors are not present. Here, we describe a method by which a more comprehensive and representative set of samples is collected so as not to miss out on the true samples. This is accomplished by expanding the sampling range for pixels farther from the foreground or background boundary and ensuring that samples from each color distribution are included. The second problem is the overlap in color distributions of foreground and background regions. This causes sampling based methods to fail to pick the correct samples for foreground and background. Our design of an objective function forces those foreground and background samples to be picked that are generated from well-separated distributions. Comparison on the dataset and evaluation at www.alphamatting.com show that the proposed method ranks first in terms of the error measures used on the website.
3 0.21128826 211 cvpr-2013-Image Matting with Local and Nonlocal Smooth Priors
Author: Xiaowu Chen, Dongqing Zou, Steven Zhiying Zhou, Qinping Zhao, Ping Tan
Abstract: In this paper we propose a novel alpha matting method with local and nonlocal smooth priors. We observe that the manifold preserving editing propagation [4] essentially introduced a nonlocal smooth prior on the alpha matte. This nonlocal smooth prior and the well known local smooth prior from matting Laplacian complement each other. So we combine them with a simple data term from color sampling in a graph model for natural image matting. Our method has a closed-form solution and can be solved efficiently. Compared with the state-of-the-art methods, our method produces more accurate results according to the evaluation on standard benchmark datasets.
4 0.13371186 399 cvpr-2013-Single-Sample Face Recognition with Image Corruption and Misalignment via Sparse Illumination Transfer
Author: Liansheng Zhuang, Allen Y. Yang, Zihan Zhou, S. Shankar Sastry, Yi Ma
Abstract: Single-sample face recognition is one of the most challenging problems in face recognition. We propose a novel face recognition algorithm to address this problem based on a sparse representation based classification (SRC) framework. The new algorithm is robust to image misalignment and pixel corruption, and is able to reduce required training images to one sample per class. To compensate the missing illumination information typically provided by multiple training images, a sparse illumination transfer (SIT) technique is introduced. The SIT algorithms seek additional illumination examples of face images from one or more additional subject classes, and form an illumination dictionary. By enforcing a sparse representation of the query image, the method can recover and transfer the pose and illumination information from the alignment stage to the recognition stage. Our extensive experiments have demonstrated that the new algorithms significantly outperform the existing algorithms in the single-sample regime and with less restrictions. In particular, the face alignment accuracy is comparable to that of the well-known Deformable SRC algorithm using multiple training images; and the face recognition accuracy exceeds those of the SRC and Extended SRC algorithms using hand-labeled alignment initialization.
5 0.12556089 358 cvpr-2013-Robust Canonical Time Warping for the Alignment of Grossly Corrupted Sequences
Author: Yannis Panagakis, Mihalis A. Nicolaou, Stefanos Zafeiriou, Maja Pantic
Abstract: Temporal alignment of human behaviour from visual data is a very challenging problem for numerous reasons, including possible large temporal scale differences, inter/intra subject variability and, more importantly, the presence of gross errors and outliers. Gross errors are often in abundance due to incorrect localization and tracking, presence of partial occlusion etc. Furthermore, such errors rarely follow a Gaussian distribution, which is the de-facto assumption in machine learning methods. In this paper, building on recent advances in rank minimization and compressive sensing, a novel temporal alignment method robust to gross errors is proposed. While previous approaches combine dynamic time warping (DTW) with low-dimensional projections that maximally correlate two sequences, we aim to learn two underlying projection matrices (one for each sequence), which not only maximally correlate the sequences but, at the same time, efficiently remove the possible corruptions in any datum in the sequences. The projections are obtained by minimizing the weighted sum of nuclear and ℓ1 norms, by solving a sequence of convex optimization problems, while the temporal alignment is found by applying the DTW in an alternating fashion. The superiority of the proposed method against the state-of-the-art time alignment methods, namely the canonical time warping and the generalized time warping, is indicated by the experimental results on both synthetic and real datasets.
6 0.10930958 364 cvpr-2013-Robust Object Co-detection
7 0.097538009 311 cvpr-2013-Occlusion Patterns for Object Class Detection
8 0.08850684 341 cvpr-2013-Procrustean Normal Distribution for Non-rigid Structure from Motion
9 0.08798676 88 cvpr-2013-Compressible Motion Fields
10 0.085810013 235 cvpr-2013-Jointly Aligning and Segmenting Multiple Web Photo Streams for the Inference of Collective Photo Storylines
11 0.084224105 148 cvpr-2013-Ensemble Video Object Cut in Highly Dynamic Scenes
12 0.082539059 357 cvpr-2013-Revisiting Depth Layers from Occlusions
13 0.081426136 154 cvpr-2013-Explicit Occlusion Modeling for 3D Object Class Representations
14 0.080293462 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video
15 0.080285452 33 cvpr-2013-Active Contours with Group Similarity
16 0.080018967 315 cvpr-2013-Online Robust Dictionary Learning
17 0.078531988 352 cvpr-2013-Recovering Stereo Pairs from Anaglyphs
18 0.077982292 158 cvpr-2013-Exploring Weak Stabilization for Motion Feature Extraction
19 0.076605819 170 cvpr-2013-Fast Rigid Motion Segmentation via Incrementally-Complex Local Models
20 0.076059118 46 cvpr-2013-Articulated and Restricted Motion Subspaces and Their Signatures
simIndex simValue paperId paperTitle
same-paper 1 0.93427974 453 cvpr-2013-Video Editing with Temporal, Spatial and Appearance Consistency
2 0.76212472 211 cvpr-2013-Image Matting with Local and Nonlocal Smooth Priors
3 0.74846381 216 cvpr-2013-Improving Image Matting Using Comprehensive Sampling Sets
4 0.68492335 358 cvpr-2013-Robust Canonical Time Warping for the Alignment of Grossly Corrupted Sequences
5 235 cvpr-2013-Jointly Aligning and Segmenting Multiple Web Photo Streams for the Inference of Collective Photo Storylines
Author: Gunhee Kim, Eric P. Xing
Abstract: With the explosion in popularity of online photo sharing, we can trivially collect a huge number of photo streams for any interesting topic, such as scuba diving as an outdoor recreational activity class. Obviously, the retrieved photo streams are neither aligned nor calibrated, since they are taken from different temporal, spatial, and personal perspectives. At the same time, however, they are likely to share common storylines that consist of sequences of events and activities that frequently recur within the topic. In this paper, as a first technical step toward detecting such collective storylines, we propose an approach to jointly aligning and segmenting uncalibrated multiple photo streams. The alignment task discovers the matched images between different photo streams, and the image segmentation task parses each image into multiple meaningful regions to facilitate image understanding. We close a loop between the two tasks so that solving one task helps enhance the performance of the other in a mutually rewarding way. To this end, we design a scalable message-passing based optimization framework to jointly achieve both tasks for the whole input image set at once. With evaluation on a new Flickr dataset of 15 outdoor activities, consisting of 1.5 million images from 13 thousand photo streams, our empirical results show that the proposed algorithms are more successful than other candidate methods for both tasks.
6 0.6012826 55 cvpr-2013-Background Modeling Based on Bidirectional Analysis
7 0.58558702 352 cvpr-2013-Recovering Stereo Pairs from Anaglyphs
8 0.57524049 22 cvpr-2013-A Non-parametric Framework for Document Bleed-through Removal
9 0.54978222 47 cvpr-2013-As-Projective-As-Possible Image Stitching with Moving DLT
10 0.54537785 195 cvpr-2013-HDR Deghosting: How to Deal with Saturation?
11 0.54481781 96 cvpr-2013-Correlation Filters for Object Alignment
12 0.51075864 177 cvpr-2013-FrameBreak: Dramatic Image Extrapolation by Guided Shift-Maps
13 0.51048493 210 cvpr-2013-Illumination Estimation Based on Bilayer Sparse Coding
14 0.48430794 450 cvpr-2013-Unsupervised Joint Object Discovery and Segmentation in Internet Images
15 0.48192951 162 cvpr-2013-FasT-Match: Fast Affine Template Matching
16 0.46837908 330 cvpr-2013-Photometric Ambient Occlusion
17 0.4677698 321 cvpr-2013-PDM-ENLOR: Learning Ensemble of Local PDM-Based Regressions
18 0.46735993 113 cvpr-2013-Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video
19 0.4659076 148 cvpr-2013-Ensemble Video Object Cut in Highly Dynamic Scenes
20 0.46359614 171 cvpr-2013-Fast Trust Region for Segmentation
topicId topicWeight
[(10, 0.128), (16, 0.036), (26, 0.077), (28, 0.016), (33, 0.288), (39, 0.015), (53, 0.213), (67, 0.052), (69, 0.039), (77, 0.018), (87, 0.055)]
simIndex simValue paperId paperTitle
Author: Luping Zhou, Lei Wang, Lingqiao Liu, Philip Ogunbona, Dinggang Shen
Abstract: Analyzing brain networks from neuroimages is becoming a promising approach to identifying novel connectivity-based biomarkers for Alzheimer's disease (AD). In this regard, brain "effective connectivity" analysis, which studies the causal relationships among brain regions, is highly challenging and offers many research opportunities. Most of the existing works in this field use generative methods. Despite their success in data representation and other important merits, generative methods are not necessarily discriminative, which may cause subtle but critical disease-induced changes to be overlooked. In this paper, we propose a learning-based approach that integrates the benefits of generative and discriminative methods to recover effective connectivity. In particular, we employ the Fisher kernel to bridge the generative models of sparse Bayesian networks (SBN) and the discriminative classifiers of SVMs, and convert SBN parameter learning to Fisher kernel learning via minimizing a generalization error bound of SVMs. Our method is able to simultaneously boost the discriminative power of both the generative SBN models and the SBN-induced SVM classifiers via the Fisher kernel. The proposed method is tested on analyzing brain effective connectivity for AD from ADNI data, and demonstrates significant improvements over the state-of-the-art work.
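As generic background for the Fisher-kernel bridge described above: the Fisher score of a sample is the gradient of the log-likelihood with respect to the model parameters, and a Fisher kernel is an inner product over such scores. The sketch below uses a univariate Gaussian and an identity Fisher information matrix for simplicity; it is illustrative background, not the paper's SBN-specific construction:

```python
import numpy as np

def fisher_score(x, mu, sigma):
    """Fisher score of a sample under a univariate Gaussian N(mu, sigma^2):
    the gradient of log p(x | mu, sigma) w.r.t. (mu, sigma)."""
    d_mu = (x - mu) / sigma**2
    d_sigma = (x - mu) ** 2 / sigma**3 - 1.0 / sigma
    return np.array([d_mu, d_sigma])

def fisher_kernel(x, y, mu, sigma):
    """Plain inner product of Fisher scores (identity Fisher information
    assumed; a full treatment would whiten by the information matrix)."""
    return float(fisher_score(x, mu, sigma) @ fisher_score(y, mu, sigma))
```

The paper's contribution lies in tying the generative parameters (here mu, sigma; there the SBN parameters) to an SVM generalization bound, so that learning the kernel also sharpens the generative model.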
same-paper 2 0.88324088 453 cvpr-2013-Video Editing with Temporal, Spatial and Appearance Consistency
Author: Xiaojie Guo, Xiaochun Cao, Xiaowu Chen, Yi Ma
Abstract: Given an area of interest in a video sequence, one may want to manipulate or edit the area, e.g. remove occlusions from or replace with an advertisement on it. Such a task involves three main challenges including temporal consistency, spatial pose, and visual realism. The proposed method effectively seeks an optimal solution to simultaneously deal with temporal alignment, pose rectification, as well as precise recovery of the occlusion. To make our method applicable to long video sequences, we propose a batch alignment method for automatically aligning and rectifying a small number of initial frames, and then show how to align the remaining frames incrementally to the aligned base images. From the error residual of the robust alignment process, we automatically construct a trimap of the region for each frame, which is used as the input to alpha matting methods to extract the occluding foreground. Experimental results on both simulated and real data demonstrate the accurate and robust performance of our method.
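The trimap construction from the alignment residual can be illustrated by a minimal sketch: threshold the per-pixel residual magnitude into foreground and background, and mark the in-between values plus a band around the foreground as unknown. The thresholds, band width, and function name below are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def trimap_from_residual(residual, lo=0.05, hi=0.2, band=2):
    """Build a trimap from a per-pixel alignment error residual.

    Pixels with large residual are marked occluding foreground (255),
    small residual as background (0), and everything in between,
    plus a band around the foreground, as unknown (128).
    """
    r = np.abs(residual)
    trimap = np.full(r.shape, 128, dtype=np.uint8)
    trimap[r < lo] = 0
    trimap[r > hi] = 255
    # Grow an unknown band around the foreground by a simple 4-neighbour dilation.
    fg = trimap == 255
    for _ in range(band):
        grown = fg.copy()
        grown[1:, :] |= fg[:-1, :]
        grown[:-1, :] |= fg[1:, :]
        grown[:, 1:] |= fg[:, :-1]
        grown[:, :-1] |= fg[:, 1:]
        fg = grown
    trimap[fg & (trimap != 255)] = 128
    return trimap
```

The resulting trimap is exactly the input format that standard alpha matting methods expect, which is how the pipeline hands the occlusion over to the matting stage.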
3 0.87262797 288 cvpr-2013-Modeling Mutual Visibility Relationship in Pedestrian Detection
Author: Wanli Ouyang, Xingyu Zeng, Xiaogang Wang
Abstract: Detecting pedestrians in cluttered scenes is a challenging problem in computer vision. The difficulty increases when several pedestrians overlap in images and occlude each other. We observe, however, that the occlusion/visibility statuses of overlapping pedestrians provide a useful mutual relationship for visibility estimation: the visibility estimation of one pedestrian facilitates the visibility estimation of another. In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians. The visibility relationship among pedestrians is learned from the deep model for recognizing co-existing pedestrians. Experimental results show that the mutual visibility deep model effectively improves pedestrian detection results. Compared with existing image-based pedestrian detection approaches, our approach has the lowest average miss rate on the Caltech-Train dataset, the Caltech-Test dataset and the ETH dataset. Including mutual visibility leads to 4%-8% improvements on multiple benchmark datasets.
4 0.86506414 211 cvpr-2013-Image Matting with Local and Nonlocal Smooth Priors
Author: Xiaowu Chen, Dongqing Zou, Steven Zhiying Zhou, Qinping Zhao, Ping Tan
Abstract: In this paper we propose a novel alpha matting method with local and nonlocal smooth priors. We observe that the manifold-preserving editing propagation [4] essentially introduced a nonlocal smooth prior on the alpha matte. This nonlocal smooth prior and the well-known local smooth prior from the matting Laplacian complement each other. So we combine them with a simple data term from color sampling in a graph model for natural image matting. Our method has a closed-form solution and can be solved efficiently. Compared with the state-of-the-art methods, our method produces more accurate results according to the evaluation on standard benchmark datasets.
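Combining two quadratic smoothness priors with a quadratic data term yields a closed-form matte as the solution of one linear system. The toy sketch below assumes the two graph Laplacians are already built and uses a soft constraint on user-marked pixels; the inputs and the weighting are illustrative, not the paper's exact graph model:

```python
import numpy as np

def solve_matte(L_local, L_nonlocal, constrained, values, lam=100.0):
    """Closed-form alpha matte with two combined smoothness priors.

    Minimizes  a^T (L_local + L_nonlocal) a + lam * ||C (a - g)||^2,
    where C is a 0/1 diagonal selecting constrained pixels and g holds
    their known alpha values. Setting the gradient to zero gives the
    linear system (L_local + L_nonlocal + lam C) a = lam C g.
    """
    C = np.diag(constrained.astype(float))
    A = L_local + L_nonlocal + lam * C
    b = lam * C @ values
    return np.linalg.solve(A, b)
```

On a 1-D toy with both priors set to a path-graph Laplacian and the two endpoints pinned to 0 and 1, the solution interpolates smoothly between the constraints, which is the qualitative behaviour a matting Laplacian produces on real images (there the systems are large and sparse, so a sparse solver would be used instead of `np.linalg.solve`).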
5 0.85940236 216 cvpr-2013-Improving Image Matting Using Comprehensive Sampling Sets
Author: Ehsan Shahrian, Deepu Rajan, Brian Price, Scott Cohen
Abstract: In this paper, we present a new image matting algorithm that achieves state-of-the-art performance on a benchmark dataset of images. This is achieved by solving two major problems encountered by current sampling-based algorithms. The first is that the range in which the foreground and background are sampled is often limited to such an extent that the true foreground and background colors are not present. Here, we describe a method by which a more comprehensive and representative set of samples is collected so as not to miss out on the true samples. This is accomplished by expanding the sampling range for pixels farther from the foreground or background boundary and ensuring that samples from each color distribution are included. The second problem is the overlap in the color distributions of foreground and background regions. This causes sampling-based methods to fail to pick the correct samples for foreground and background. Our design of an objective function forces those foreground and background samples to be picked that are generated from well-separated distributions. Comparison on the dataset at www.alphamatting.com and the evaluation performed there show that the proposed method ranks first in terms of the error measures used on the website.
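For context, once a sampling-based method has chosen a foreground/background color pair, the alpha estimate itself is the standard projection of the pixel color onto the line between the two samples; the paper's contribution lies in how the pair is selected, not in this step. A minimal sketch (the clipping and degeneracy tolerance are illustrative choices):

```python
import numpy as np

def estimate_alpha(pixel, fg, bg):
    """Estimate alpha for one pixel from a candidate foreground/background
    color pair, by projecting onto the line between them:

        alpha = (I - B) . (F - B) / ||F - B||^2

    clipped to [0, 1]. This is the standard projection used by
    sampling-based matting methods once samples are chosen.
    """
    fb = fg - bg
    denom = float(fb @ fb)
    if denom < 1e-12:
        return 0.0  # degenerate pair: foreground and background coincide
    alpha = float((pixel - bg) @ fb) / denom
    return min(max(alpha, 0.0), 1.0)
```

The degenerate branch is exactly the failure mode the abstract describes: when foreground and background color distributions overlap, F - B is small and the estimate becomes unreliable, which motivates picking samples from well-separated distributions.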
6 0.85036856 44 cvpr-2013-Area Preserving Brain Mapping
7 0.83724457 104 cvpr-2013-Deep Convolutional Network Cascade for Facial Point Detection
8 0.8343538 311 cvpr-2013-Occlusion Patterns for Object Class Detection
9 0.83267337 360 cvpr-2013-Robust Estimation of Nonrigid Transformation for Point Set Registration
10 0.83134657 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
11 0.83082348 325 cvpr-2013-Part Discovery from Partial Correspondence
12 0.83075857 424 cvpr-2013-Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure
13 0.83068436 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
14 0.83033448 96 cvpr-2013-Correlation Filters for Object Alignment
15 0.82966805 414 cvpr-2013-Structure Preserving Object Tracking
16 0.82902622 194 cvpr-2013-Groupwise Registration via Graph Shrinkage on the Image Manifold
17 0.82894862 30 cvpr-2013-Accurate Localization of 3D Objects from RGB-D Data Using Segmentation Hypotheses
18 0.8287093 285 cvpr-2013-Minimum Uncertainty Gap for Robust Visual Tracking
19 0.82863593 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
20 0.82798588 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection