iccv iccv2013 iccv2013-422 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yuqian Zhang, Cun Mu, Han-Wen Kuo, John Wright
Abstract: Illumination variation remains a central challenge in object detection and recognition. Existing analyses of illumination variation typically pertain to convex, Lambertian objects, and guarantee quality of approximation in an average case sense. We show that it is possible to build models for the set of images across illumination variation with worstcase performance guarantees, for nonconvex Lambertian objects. Namely, a natural verification test based on the distance to the model guarantees to accept any image which can be sufficiently well-approximated by an image of the object under some admissible lighting condition, and guarantees to reject any image that does not have a sufficiently good approximation. These models are generated by sampling illumination directions with sufficient density, which follows from a new perturbation bound for directional illuminated images in the Lambertian model. As the number of such images required for guaranteed verification may be large, we introduce a new formulation for cone preserving dimensionality reduction, which leverages tools from sparse and low-rank decomposition to reduce the complexity, while controlling the approximation error with respect to the original model. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Existing analyses of illumination variation typically pertain to convex, Lambertian objects, and guarantee quality of approximation in an average case sense. [sent-5, score-0.416]
2 We show that it is possible to build models for the set of images across illumination variation with worstcase performance guarantees, for nonconvex Lambertian objects. [sent-6, score-0.364]
3 These models are generated by sampling illumination directions with sufficient density, which follows from a new perturbation bound for directional illuminated images in the Lambertian model. [sent-8, score-0.524]
4 These approaches are often effective in practice, but can break down under extreme illumination. [sent-14, score-0.13]
5 The seminal work [2] argues that images of a given object with fixed pose and varying illumination should lie near a convex cone in the high-dimensional image space. [sent-19, score-1.024]
6 Many subsequent works have attempted to capture the gross structure of this cone using low-dimensional convex cone or linear subspace models [1, 15]. [sent-20, score-1.371]
7 The promise of subspace or cone models, compared to feature-based approaches described above, is that, by reasoning carefully about the image formation process, it might be possible to guarantee to well-approximate all images of the object under clearly delineated conditions. [sent-22, score-0.717]
8 It is worth asking then, what approximation guarantees do current results afford us? [sent-23, score-0.207]
9 For convex, Lambertian objects, by an elegant interpretation of the Lambertian reflectance as spherical convolution [1, 15], people showed that for uniformly random directional lightings, a nine dimensional spherical harmonic approximation captures on average about 98% of the energy [1, 8]. [sent-24, score-0.288]
10 We study these questions in the context of a model problem in object instance verification, in which one is given an object O at a fixed pose, and ask whether tohnee i inspu gtiv viemnag ane oisb an itm Oag aet o af ftihxiesd dob pjoescet, u anndder a some evathliedr illumination condition. [sent-26, score-0.388]
11 We develop rigorous guarantees for this problem, for general (including nonconvex) Lambertian objects. [sent-27, score-0.139]
12 Our results show how to build a model that guarantees to accept every image that can be interpreted as an image of the object under some lighting condition, and to reject every image that is sufficiently dissimilar to all images of the object under valid lighting conditions. [sent-28, score-0.499]
13 Similar to [12, 14, 19], our work approximates the illumination cone with a conic combination of certain images on its boundary. [sent-29, score-0.967]
14 To address 993377 this problem, we start from the goal of producing a sufficiently accurate representation of the illumination cone, and derive, in terms of the properties of the object and the scene, sufficient sampling densities for this goal to be met. [sent-31, score-0.334]
15 Our bounds depend on the level of ambient illumination, and a notion of convexity defect of the object. [sent-32, score-0.361]
16 They make precise the intuitions that: (i) it is more difficult to operate in low-light scenarios and (ii) nonconvex objects are more challenging than convex objects. [sent-33, score-0.254]
17 To address this problem, we introduce a new approach to cone preserving complexity reduction. [sent-35, score-0.711]
18 This approach leverages tools from convex programming in particular, sparse and low-rank decomposition [4, 5] but introduces a new constrained formulation which guarantees that the conic hull of the output will well-approximate the conic hull ofthe input. [sent-36, score-0.611]
19 The low-rank and sparse decomposition leverages our qualitative understanding of the physical properties of images (low-dimensionality, sparsity of cast shadows) [1, 15, 21, 4, 22], while the constraint ensures that the output of this algorithm is always a good approximation to the target cone. [sent-37, score-0.254]
20 Numerical experiments illustrate our bounds and their potential for worst case verification. [sent-40, score-0.13]
21 eM seatth oefm noanti-cally, F is a convex cone: sums of nonnegative, integrable fcuanllcyt,i oFns i sa rae caognavinex xn cononnee:ga stuimves a onfd n ninontengergaabtliev. [sent-50, score-0.193]
22 By linearity of light transport and linearity of the sensor response, the observed image y ∈ Rm is a linear function y[f] of the illuminimataiogne fy: ∈if Rthe object is subjected to the superposition f = f1 + f2 of two illuminations f1 and f2, we observe y[f1 + f2] = y[f1] + y[f2] . [sent-52, score-0.214]
23 Since f resides in the convex cone, the set C0 y[F] ⊂ Rm of possible images is also a convex cone. [sent-53, score-0.25]
24 No=te y, [hFow]e ⊂ve Rr, that the fact that C0 is a convex cone holds under very mild assumptions. [sent-54, score-0.728]
25 and a great deal of subsequent work has been devoted to Cα, for ambient levels α = 0 up to α = 5. [sent-56, score-0.269]
26 In each example fd is an extreme directional illumination. [sent-57, score-0.273]
27 Right: illumination cones Cα with varying ambient level α. [sent-59, score-0.505]
28 The cone C0 can be interpreted as the set of all images of the object under different distant lighting conditions. [sent-63, score-0.768]
29 Intuitively speaking, we expect the problem of representing images y under different illuminations to be more challenging when the light has a stronger directional component. [sent-64, score-0.135]
30 To capture the relative contribution of directional and ambient components of light, we introduce a family of function classes Fα, indexed by parameter α ∈ [0, ∞). [sent-65, score-0.324]
31 In this sense, the choice of α iFnodru acnesy a αtr ≤ade αoff: as α ⊆b Cecomes smaller, Cα becomes more complicated to compute with, but can represent broader illumination conditions. [sent-75, score-0.263]
32 Our complexity bounds in Section 3 will make this intuition precise. [sent-76, score-0.136]
33 Figure 1 shows rendered images of a face under various ambient levels α 0. [sent-77, score-0.333]
34 Our methodology asks the system designer to select a target level of ambient illumination α, and hence choose a target cone C = Cα. [sent-80, score-1.152]
35 The angular detector accepts points based on their angle with the cone C. [sent-104, score-0.703]
36 An approximate angular detector guarantees to accept any point within angle τ of C, and to reject any point with angle greater than (1+ η)τ. [sent-105, score-0.389]
37 This is efficient if the number n of extreme rays of C is small. [sent-112, score-0.218]
38 If O cisi ean convex pnoulmybheerdr non o fw eitxht oemnlye raa fyesw o fve Crtic ise ss,m thaills. [sent-113, score-0.125]
39 However, in general, the number of extreme rays in a V(vertex)-description can be large or even unbounded. [sent-116, score-0.218]
40 Tτh],is n o bu dfefemra zone arell opwlacs us oton work with a surrogate cone with much simpler structure, enabling computationally ef? [sent-127, score-0.631]
41 2For convex, Lambertian objects, in a point sampling model of image formation, the best known bound on the number of extreme rays in a V-representation of C is quadratic in the number of image pixels: n = a VO-(rmepr2e)s [e2nt]. [sent-155, score-0.299]
42 h (1+ η) τ ∈ (0, 2π) and another cone we have ∈ C? [sent-185, score-0.603]
43 We present a detailed procedure ofproducing an approximate cone In Section 3, we will show how to build an ε-approximat? [sent-198, score-0.603]
44 Then in Section 4, via solving a convex optimization problem, we form cone a γ-approximation to C¯, but with much lower complexity. [sent-201, score-0.728]
45 Extreme Rays of C As discussed above, under very general circumstances the set of images y form a convex cone. [sent-215, score-0.125]
46 In this case, the vectors y¯[u] form the extreme rays yof[ Fthe]. [sent-221, score-0.218]
47 Wmaigthe sth oisf iOnte urnpdreertat pioonin, tt ihleprevious lemma simply asserts that any image y[f] under distant, Riemann integrable illumination f can be arbitrarily well approximated using a conic combination of images 993399 y¯[u] under point illuminations. [sent-238, score-0.65]
48 The conic hull of these extreme images is (up to a set of measure zero) the cone C0 of images of O under arbitrary Riemann integrable illumi- noaft i omna. [sent-239, score-0.944]
49 g eWse o wf Oould un nlidkeer a rsbiimtrialrayr e Rxipemreassnionn i nthteagt rwabolrek isl lwumheinthe ambient level is larger than zero – we would like to also approximate the extreme rays of Cα. [sent-240, score-0.501]
50 The following lemma says that we can use images of the form ˘ y[u] = ¯ y[u] +αya, where ya is the image of O under ambient illumination: Lemma 3. [sent-241, score-0.457]
51 ya = (10) Then, we have So, to work with Cα, we can simply work with a modified set of extreme images ˘y [u], which are sums of images under point illumination and the ambient image ya. [sent-254, score-0.762]
52 A natural approach to is to discretize the set of illumination directions, by choosing a finite set u1, . [sent-256, score-0.263]
53 The following lemma asserts that as long as the ¯y [ui] can approximate any point illumination ¯ y[u] in an absolute sense, the cone generated by the finite set and the cone Cα will not differ too much: Lemma 3. [sent-260, score-1.657]
54 Below, we will see that this is possible, even for nonconvex objects, provided the object is Lambertian. [sent-281, score-0.134]
55 ShadowingEdgesχ[u](yelow)underdirectionali u- Gnomon length is a function of illumination direction u. [sent-299, score-0.263]
56 It is defined as the total length of the edges that cast shadows o? [sent-300, score-0.163]
57 For convex object, ˜ ν(x) = 1and χ[u] = 0 always hold for any po? [sent-304, score-0.125]
58 int x ∈ ∂O and any illumination direction u ∈ Sfo2r. [sent-305, score-0.263]
59 Fnoyr p moinotre x xg ∈ene ∂rOal oanbdjec atnsy, itlhlueimr ninoanticoonnv deixreitcyti ownil ul b ∈e phrased in terms of the extreme values of these quantities: Minimum visibility: Max. [sent-306, score-0.13]
60 They will play an important role in building an approximation to the original illumination cone. [sent-320, score-0.336]
61 Under these hypotheses, we obtain rigorous bounds for the error ? [sent-324, score-0.118]
62 Our bounds pertain to triangulated objects, whose boundary is a union of finitely many oriented triangles: 3For a precise definition of χ[u] , please refer to the supplement. [sent-333, score-0.162]
63 g can a bep described in terms of two operators: The direct illumination operator D: F → L2 [∂O] descriTbhees tdhier eocbtje ilcltu’ms rienfalteicotnanc oep earfatetorr rth De: :fir Fst →bou Lnce[∂ oOf ]li dghetfrom illumination function f(u): D[f](x) =? [sent-354, score-0.526]
64 Under our hypothesis, , for any pai√r of illumination directions u, u? [sent-441, score-0.325]
65 3, it gives a guideline for choosing the sampling density that guarantees a representation that works for every illumination f ∈ Fα. [sent-466, score-0.358]
66 erification with ambient illumination level α, it would be enough to ? [sent-480, score-0.505]
67 This is possible due to the very special structure of the extreme rays ¯ y[u] of Cα. [sent-483, score-0.218]
68 In contrast, general cone approximation in Rm requires a number of samples exponential in m [3]. [sent-484, score-0.676]
69 luminations (Figure 6): the low-rank term captures the smooth variations [1], while cast shadows are often sparse [21]. [sent-509, score-0.203]
70 To build a framework for complexity reduction with guaranteed approximation quality, we start with: mL,iSn rank(L) + λ? [sent-518, score-0.214]
71 The nonconvex domain Ω0 can be replaced by a convex subset Ω1 : Lemma 4. [sent-528, score-0.226]
72 , [4]), which aim at statistical estimation, and measure quality of approximation in Frobenius norm, we guarantee approximation in Hausdorff distance δ(·, ·). [sent-567, score-0.181]
73 We consider point illuminations spaced every 36π0 angle in spherical coordinates. [sent-579, score-0.13]
74 We demonstrate the ability of our solution to (19) to reduce the complexity of the representation, while preserving the conic hull. [sent-657, score-0.209]
75 We solve the low-rank alunmd sparse cone a rpepsoroluxitmioanti 4o0n× × pr4o0b. [sent-659, score-0.643]
76 The decomposition reduces the complexity in all cases; the reduction becomes more pronounced as α increases. [sent-665, score-0.138]
77 This is expected, since the cone Cα becomes smaller as α increases. [sent-666, score-0.603]
78 (m+mnn)r+s tage of our cone approximation methodology in verification under poor illumination conditions, we compare the receiver operating characteristic (ROC) curves for 5 detection dictionaries obtained from the same 3D face model under ambient level α = 0. [sent-669, score-1.386]
79 1: convex cone C1 composed of 2592 6In Figure 7, we rescale the theoretical prediction for clearer comparison – our goal is only to show the that the exponent is 1/2. [sent-670, score-0.757]
80 Cone Complexity Reduction for different nonconvex objects under zero ambient level (α = 0) (left) and for face under different ambient illumination levels (right). [sent-687, score-0.922]
81 images, corresponds to the ε-approximate of the original illumination cone; C2 is the γ-approximation (γ = 0. [sent-688, score-0.263]
82 11) of C1 with L + S structure, whose rank r = 25, number of nonzero entries s = 52699 and the complexity redution ratio is 0. [sent-689, score-0.134]
83 0539; C3 is rendered under 19 illumination directions corresponding to subsets 1 and 2 of Yale B [9] (roughly, the setting of [21]); C4 is rendered under all 64 illumination directions considered in [9]. [sent-690, score-0.74]
84 Also, we have two more ROC curves for verification results based on local gradient features, LBP and HOG respectively: the distances are calculated against the descriptor for ambient image, which act as the training dictionary here. [sent-692, score-0.323]
85 ROC Curves for different dictionaries, with test images under uniform random illumination (left) and extreme illumination (right). [sent-706, score-0.656]
86 The dictionaries are C1: ε approximation, C2 : low-rank and sparse approximation, C3-C4: point illuminations distributed similar to [9], S: nine-dimensional linear subspace. [sent-707, score-0.156]
87 Our test data consist of 1000 positive images under 1000 illumination directions and 3000 negative images of 3 other subjects. [sent-708, score-0.325]
88 We consider two distributions for the illumination directions uniform on the sphere (roughly corresponding to the “average case”), and uniform on the set of u ∈ S2 for wtoh thiceh −av0e. [sent-709, score-0.325]
89 We view this result as illustrating a tradeoff in illumination representation: uniformly good performance is possible, if we can afford a more complex repre994433 sentation. [sent-720, score-0.302]
90 The cone preserving low-rank and sparse decomposition gives a way to control the complexity of the representation, while still maintaining this good performance. [sent-721, score-0.782]
91 Although our cone construction scheme guarantees worst case detection, the number of samples required is likely to be very large, in particular for small ε: when ε = 0. [sent-724, score-0.754]
92 01, our theorem requires about 1025 images under ambient level α = 1. [sent-725, score-0.277]
93 It will be important to have very concise models for each pose; the complexity reduction by convex programming is one means of achieving this. [sent-729, score-0.232]
94 Our results could give a way of learning a set of canonical illumination models, which capture effects such as cast shadows, and which could be adapted to each new input subject. [sent-745, score-0.339]
95 What is the set of images of [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] an object under all possible illumination conditions? [sent-755, score-0.296]
96 Accuracy of spherical harmonic approximations for images of lambertian objects under far and near lighting. [sent-800, score-0.275]
97 From few to many: Illumination cone models for face recognition under variable lighting and pose. [sent-809, score-0.717]
98 Sparse representation of cast shadows via l1-regularized least squares. [sent-839, score-0.163]
99 Analytic pca construction for theoretical analysis of lighting variability in images of a lambertian object. [sent-843, score-0.296]
100 Toward a practical face recognition system: Robust alignment and illumination by sparse representation. [sent-871, score-0.349]
wordName wordTfidf (topN-words)
[('cone', 0.603), ('illumination', 0.263), ('ambient', 0.242), ('lambertian', 0.199), ('extreme', 0.13), ('convex', 0.125), ('lemma', 0.117), ('conic', 0.101), ('nonconvex', 0.101), ('ya', 0.098), ('guarantees', 0.095), ('rays', 0.088), ('shadows', 0.087), ('directional', 0.082), ('verification', 0.081), ('cast', 0.076), ('accept', 0.076), ('bounds', 0.074), ('approximation', 0.073), ('angular', 0.07), ('integrable', 0.068), ('lighting', 0.068), ('perturbation', 0.065), ('rm', 0.064), ('complexity', 0.062), ('directions', 0.062), ('fd', 0.061), ('reject', 0.06), ('oisb', 0.059), ('worst', 0.056), ('illuminations', 0.053), ('bound', 0.052), ('sin', 0.051), ('descritbhees', 0.051), ('gnomon', 0.051), ('roximation', 0.051), ('tuio', 0.051), ('spherical', 0.048), ('linearity', 0.046), ('formation', 0.046), ('face', 0.046), ('preserving', 0.046), ('roc', 0.046), ('pertain', 0.045), ('defect', 0.045), ('interreflection', 0.045), ('rendered', 0.045), ('reduction', 0.045), ('rigorous', 0.044), ('methodology', 0.044), ('triangulated', 0.043), ('riemann', 0.042), ('asserts', 0.042), ('hull', 0.042), ('visibility', 0.041), ('un', 0.041), ('ay', 0.041), ('sc', 0.041), ('sparse', 0.04), ('gross', 0.04), ('afford', 0.039), ('sufficiently', 0.038), ('quantities', 0.038), ('du', 0.038), ('polyhedral', 0.038), ('lumbi', 0.038), ('reflectance', 0.037), ('rank', 0.037), ('sensor', 0.036), ('pami', 0.036), ('admissible', 0.036), ('uth', 0.036), ('distant', 0.036), ('ganesh', 0.035), ('fc', 0.035), ('theorem', 0.035), ('nonzero', 0.035), ('guarantee', 0.035), ('guaranteed', 0.034), ('dictionaries', 0.034), ('leverages', 0.034), ('il', 0.034), ('albedo', 0.034), ('quotient', 0.034), ('numerical', 0.033), ('wright', 0.033), ('object', 0.033), ('imaging', 0.031), ('ud', 0.031), ('hausdorff', 0.031), ('decomposition', 0.031), ('oisf', 0.03), ('detector', 0.03), ('theoretical', 0.029), ('point', 0.029), ('surrogate', 0.028), ('objects', 0.028), ('interpreted', 0.028), ('devoted', 0.027), ('ad', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000015 422 iccv-2013-Toward Guaranteed Illumination Models for Non-convex Objects
Author: Yuqian Zhang, Cun Mu, Han-Wen Kuo, John Wright
Abstract: Illumination variation remains a central challenge in object detection and recognition. Existing analyses of illumination variation typically pertain to convex, Lambertian objects, and guarantee quality of approximation in an average case sense. We show that it is possible to build models for the set of images across illumination variation with worstcase performance guarantees, for nonconvex Lambertian objects. Namely, a natural verification test based on the distance to the model guarantees to accept any image which can be sufficiently well-approximated by an image of the object under some admissible lighting condition, and guarantees to reject any image that does not have a sufficiently good approximation. These models are generated by sampling illumination directions with sufficient density, which follows from a new perturbation bound for directional illuminated images in the Lambertian model. As the number of such images required for guaranteed verification may be large, we introduce a new formulation for cone preserving dimensionality reduction, which leverages tools from sparse and low-rank decomposition to reduce the complexity, while controlling the approximation error with respect to the original model. 1
2 0.17512536 405 iccv-2013-Structured Light in Sunlight
Author: Mohit Gupta, Qi Yin, Shree K. Nayar
Abstract: Strong ambient illumination severely degrades the performance of structured light based techniques. This is especially true in outdoor scenarios, where the structured light sources have to compete with sunlight, whose power is often 2-5 orders of magnitude larger than the projected light. In this paper, we propose the concept of light-concentration to overcome strong ambient illumination. Our key observation is that given a fixed light (power) budget, it is always better to allocate it sequentially in several portions of the scene, as compared to spreading it over the entire scene at once. For a desired level of accuracy, we show that by distributing light appropriately, the proposed approach requires 1-2 orders lower acquisition time than existing approaches. Our approach is illumination-adaptive as the optimal light distribution is determined based on a measurement of the ambient illumination level. Since current light sources have a fixed light distribution, we have built a prototype light source that supports flexible light distribution by controlling the scanning speed of a laser scanner. We show several high quality 3D scanning results in a wide range of outdoor scenarios. The proposed approach will benefit 3D vision systems that need to operate outdoors under extreme ambient illumination levels on a limited time and power budget.
3 0.12521562 199 iccv-2013-High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination
Author: Yudeog Han, Joon-Young Lee, In So Kweon
Abstract: We present a novel framework to estimate detailed shape of diffuse objects with uniform albedo from a single RGB-D image. To estimate accurate lighting in natural illumination environment, we introduce a general lighting model consisting oftwo components: global and local models. The global lighting model is estimated from the RGB-D input using the low-dimensional characteristic of a diffuse reflectance model. The local lighting model represents spatially varying illumination and it is estimated by using the smoothlyvarying characteristic of illumination. With both the global and local lighting model, we can estimate complex lighting variations in uncontrolled natural illumination conditions accurately. For high quality shape capture, a shapefrom-shading approach is applied with the estimated lighting model. Since the entire process is done with a single RGB-D input, our method is capable of capturing the high quality shape details of a dynamic object under natural illumination. Experimental results demonstrate the feasibility and effectiveness of our method that dramatically improves shape details of the rough depth input.
4 0.12266467 5 iccv-2013-A Color Constancy Model with Double-Opponency Mechanisms
Author: Shaobing Gao, Kaifu Yang, Chaoyi Li, Yongjie Li
Abstract: The double-opponent color-sensitive cells in the primary visual cortex (V1) of the human visual system (HVS) have long been recognized as the physiological basis of color constancy. We introduce a new color constancy model by imitating the functional properties of the HVS from the retina to the double-opponent cells in V1. The idea behind the model originates from the observation that the color distribution of the responses of double-opponent cells to the input color-biased images coincides well with the light source direction. Then the true illuminant color of a scene is easily estimated by searching for the maxima of the separate RGB channels of the responses of double-opponent cells in the RGB space. Our systematical experimental evaluations on two commonly used image datasets show that the proposed model can produce competitive results in comparison to the complex state-of-the-art approaches, but with a simple implementation and without the need for training.
5 0.10672189 45 iccv-2013-Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications
Author: Yu-Tseh Chi, Mohsen Ali, Muhammad Rushdi, Jeffrey Ho
Abstract: This paper proposes a novel approach for sparse coding that further improves upon the sparse representation-based classification (SRC) framework. The proposed framework, Affine-Constrained Group Sparse Coding (ACGSC), extends the current SRC framework to classification problems with multiple input samples. Geometrically, the affineconstrained group sparse coding essentially searches for the vector in the convex hull spanned by the input vectors that can best be sparse coded using the given dictionary. The resulting objectivefunction is still convex and can be efficiently optimized using iterative block-coordinate descent scheme that is guaranteed to converge. Furthermore, we provide a form of sparse recovery result that guarantees, at least theoretically, that the classification performance of the constrained group sparse coding should be at least as good as the group sparse coding. We have evaluated the proposed approach using three different recognition experiments that involve illumination variation of faces and textures, and face recognition under occlusions. Prelimi- nary experiments have demonstrated the effectiveness of the proposed approach, and in particular, the results from the recognition/occlusion experiment are surprisingly accurate and robust.
6 0.092817351 30 iccv-2013-A Simple Model for Intrinsic Image Decomposition with Depth Cues
7 0.080441952 219 iccv-2013-Internet Based Morphable Model
8 0.079612084 343 iccv-2013-Real-World Normal Map Capture for Nearly Flat Reflective Surfaces
9 0.07874541 157 iccv-2013-Fast Face Detector Training Using Tailored Views
10 0.077416264 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification
11 0.076663442 162 iccv-2013-Fast Subspace Search via Grassmannian Based Hashing
12 0.075597823 392 iccv-2013-Similarity Metric Learning for Face Recognition
13 0.074381091 330 iccv-2013-Proportion Priors for Image Sequence Segmentation
14 0.072725154 310 iccv-2013-Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision
15 0.072579347 82 iccv-2013-Compensating for Motion during Direct-Global Separation
16 0.072226875 421 iccv-2013-Total Variation Regularization for Functions with Values in a Manifold
17 0.071810134 408 iccv-2013-Super-resolution via Transform-Invariant Group-Sparse Regularization
18 0.069800109 281 iccv-2013-Multi-view Normal Field Integration for 3D Reconstruction of Mirroring Objects
19 0.06795764 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
20 0.066153117 100 iccv-2013-Curvature-Aware Regularization on Riemannian Submanifolds
topicId topicWeight
[(0, 0.173), (1, -0.053), (2, -0.075), (3, -0.032), (4, -0.078), (5, -0.002), (6, 0.038), (7, -0.018), (8, 0.021), (9, -0.044), (10, -0.002), (11, 0.001), (12, -0.031), (13, 0.023), (14, 0.017), (15, -0.024), (16, -0.016), (17, 0.029), (18, -0.008), (19, 0.02), (20, 0.029), (21, -0.024), (22, 0.0), (23, -0.13), (24, -0.074), (25, 0.063), (26, 0.035), (27, -0.041), (28, 0.09), (29, -0.07), (30, 0.068), (31, -0.003), (32, -0.006), (33, 0.018), (34, 0.055), (35, 0.031), (36, -0.058), (37, -0.049), (38, -0.096), (39, 0.003), (40, 0.046), (41, 0.023), (42, 0.14), (43, 0.114), (44, 0.019), (45, -0.002), (46, -0.0), (47, -0.025), (48, -0.031), (49, 0.075)]
simIndex simValue paperId paperTitle
same-paper 1 0.93445677 422 iccv-2013-Toward Guaranteed Illumination Models for Non-convex Objects
Author: Yuqian Zhang, Cun Mu, Han-Wen Kuo, John Wright
Abstract: Illumination variation remains a central challenge in object detection and recognition. Existing analyses of illumination variation typically pertain to convex, Lambertian objects, and guarantee quality of approximation in an average case sense. We show that it is possible to build models for the set of images across illumination variation with worstcase performance guarantees, for nonconvex Lambertian objects. Namely, a natural verification test based on the distance to the model guarantees to accept any image which can be sufficiently well-approximated by an image of the object under some admissible lighting condition, and guarantees to reject any image that does not have a sufficiently good approximation. These models are generated by sampling illumination directions with sufficient density, which follows from a new perturbation bound for directional illuminated images in the Lambertian model. As the number of such images required for guaranteed verification may be large, we introduce a new formulation for cone preserving dimensionality reduction, which leverages tools from sparse and low-rank decomposition to reduce the complexity, while controlling the approximation error with respect to the original model. 1
2 0.79624993 30 iccv-2013-A Simple Model for Intrinsic Image Decomposition with Depth Cues
Author: Qifeng Chen, Vladlen Koltun
Abstract: We present a model for intrinsic decomposition of RGB-D images. Our approach analyzes a single RGB-D image and estimates albedo and shading fields that explain the input. To disambiguate the problem, our model estimates a number of components that jointly account for the reconstructed shading. By decomposing the shading field, we can build in assumptions about image formation that help distinguish reflectance variation from shading. These assumptions are expressed as simple nonlocal regularizers. We evaluate the model on real-world images and on a challenging synthetic dataset. The experimental results demonstrate that the presented approach outperforms prior models for intrinsic decomposition of RGB-D images.
3 0.72816968 405 iccv-2013-Structured Light in Sunlight
Author: Mohit Gupta, Qi Yin, Shree K. Nayar
Abstract: Strong ambient illumination severely degrades the performance of structured light based techniques. This is especially true in outdoor scenarios, where the structured light sources have to compete with sunlight, whose power is often 2-5 orders of magnitude larger than the projected light. In this paper, we propose the concept of light-concentration to overcome strong ambient illumination. Our key observation is that given a fixed light (power) budget, it is always better to allocate it sequentially in several portions of the scene, as compared to spreading it over the entire scene at once. For a desired level of accuracy, we show that by distributing light appropriately, the proposed approach requires 1-2 orders lower acquisition time than existing approaches. Our approach is illumination-adaptive as the optimal light distribution is determined based on a measurement of the ambient illumination level. Since current light sources have a fixed light distribution, we have built a prototype light source that supports flexible light distribution by controlling the scanning speed of a laser scanner. We show several high quality 3D scanning results in a wide range of outdoor scenarios. The proposed approach will benefit 3D vision systems that need to operate outdoors under extreme ambient illumination levels on a limited time and power budget.
4 0.72224188 207 iccv-2013-Illuminant Chromaticity from Image Sequences
Author: Veronique Prinet, Dani Lischinski, Michael Werman
Abstract: We estimate illuminant chromaticity from temporal sequences, for scenes illuminated by either one or two dominant illuminants. While there are many methods for illuminant estimation from a single image, few works so far have focused on videos, and even fewer on multiple light sources. Our aim is to leverage information provided by the temporal acquisition, where either the objects or the camera or the light source are/is in motion in order to estimate illuminant color without the need for user interaction or using strong assumptions and heuristics. We introduce a simple physically-based formulation based on the assumption that the incident light chromaticity is constant over a short space-time domain. We show that a deterministic approach is not sufficient for accurate and robust estimation: however, a probabilistic formulation makes it possible to implicitly integrate away hidden factors that have been ignored by the physical model. Experimental results are reported on a dataset of natural video sequences and on the GrayBall benchmark, indicating that we compare favorably with the state-of-the-art.
Author: Ying Fu, Antony Lam, Imari Sato, Takahiro Okabe, Yoichi Sato
Abstract: Hyperspectral imaging is beneficial to many applications but current methods do not consider fluorescent effects which are present in everyday items ranging from paper, to clothing, to even our food. Furthermore, everyday fluorescent items exhibit a mix of reflectance and fluorescence. So proper separation of these components is necessary for analyzing them. In this paper, we demonstrate efficient separation and recovery of reflective and fluorescent emission spectra through the use of high frequency illumination in the spectral domain. With the obtained fluorescent emission spectra from our high frequency illuminants, we then present to our knowledge, the first method for estimating the fluorescent absorption spectrum of a material given its emission spectrum. Conventional bispectral measurement of absorption and emission spectra needs to examine all combinations of incident and observed light wavelengths. In contrast, our method requires only two hyperspectral images. The effectiveness of our proposed methods are then evaluated through a combination of simulation and real experiments. We also demonstrate an application of our method to synthetic relighting of real scenes.
6 0.65773606 5 iccv-2013-A Color Constancy Model with Double-Opponency Mechanisms
7 0.63484585 262 iccv-2013-Matching Dry to Wet Materials
8 0.61547631 199 iccv-2013-High Quality Shape from a Single RGB-D Image under Uncalibrated Natural Illumination
9 0.60014588 364 iccv-2013-SGTD: Structure Gradient and Texture Decorrelating Regularization for Image Decomposition
10 0.5914405 407 iccv-2013-Subpixel Scanning Invariant to Indirect Lighting Using Quadratic Code Length
11 0.5900892 14 iccv-2013-A Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding
12 0.56798756 55 iccv-2013-Automatic Kronecker Product Model Based Detection of Repeated Patterns in 2D Urban Images
13 0.55696553 408 iccv-2013-Super-resolution via Transform-Invariant Group-Sparse Regularization
14 0.55632681 45 iccv-2013-Affine-Constrained Group Sparse Coding and Its Application to Image-Based Classifications
15 0.55421114 292 iccv-2013-Non-convex P-Norm Projection for Robust Sparsity
16 0.54225641 271 iccv-2013-Modeling the Calibration Pipeline of the Lytro Camera for High Quality Light-Field Image Reconstruction
17 0.52661151 141 iccv-2013-Enhanced Continuous Tabu Search for Parameter Estimation in Multiview Geometry
18 0.52142292 82 iccv-2013-Compensating for Motion during Direct-Global Separation
19 0.51988316 310 iccv-2013-Partial Sum Minimization of Singular Values in RPCA for Low-Level Vision
20 0.49733627 135 iccv-2013-Efficient Image Dehazing with Boundary Constraint and Contextual Regularization
topicId topicWeight
[(2, 0.024), (7, 0.015), (26, 0.057), (31, 0.034), (40, 0.012), (42, 0.583), (64, 0.024), (73, 0.022), (89, 0.126), (98, 0.012)]
simIndex simValue paperId paperTitle
1 0.97382879 162 iccv-2013-Fast Subspace Search via Grassmannian Based Hashing
Author: Xu Wang, Stefan Atev, John Wright, Gilad Lerman
Abstract: The problem of efficiently deciding which of a database of models is most similar to a given input query arises throughout modern computer vision. Motivated by applications in recognition, image retrieval and optimization, there has been significant recent interest in the variant of this problem in which the database models are linear subspaces and the input is either a point or a subspace. Current approaches to this problem have poor scaling in high dimensions, and may not guarantee sublinear query complexity. We present a new approach to approximate nearest subspace search, based on a simple, new locality sensitive hash for subspaces. Our approach allows point-tosubspace query for a database of subspaces of arbitrary dimension d, in a time that depends sublinearly on the number of subspaces in the database. The query complexity of our algorithm is linear in the ambient dimension D, allow- ing it to be directly applied to high-dimensional imagery data. Numerical experiments on model problems in image repatching and automatic face recognition confirm the advantages of our algorithm in terms of both speed and accuracy.
same-paper 2 0.96222299 422 iccv-2013-Toward Guaranteed Illumination Models for Non-convex Objects
Author: Yuqian Zhang, Cun Mu, Han-Wen Kuo, John Wright
Abstract: Illumination variation remains a central challenge in object detection and recognition. Existing analyses of illumination variation typically pertain to convex, Lambertian objects, and guarantee quality of approximation in an average case sense. We show that it is possible to build models for the set of images across illumination variation with worstcase performance guarantees, for nonconvex Lambertian objects. Namely, a natural verification test based on the distance to the model guarantees to accept any image which can be sufficiently well-approximated by an image of the object under some admissible lighting condition, and guarantees to reject any image that does not have a sufficiently good approximation. These models are generated by sampling illumination directions with sufficient density, which follows from a new perturbation bound for directional illuminated images in the Lambertian model. As the number of such images required for guaranteed verification may be large, we introduce a new formulation for cone preserving dimensionality reduction, which leverages tools from sparse and low-rank decomposition to reduce the complexity, while controlling the approximation error with respect to the original model. 1
Author: De-An Huang, Yu-Chiang Frank Wang
Abstract: Cross-domain image synthesis and recognition are typically considered as two distinct tasks in the areas of computer vision and pattern recognition. Therefore, it is not clear whether approaches addressing one task can be easily generalized or extended for solving the other. In this paper, we propose a unified model for coupled dictionary and feature space learning. The proposed learning model not only observes a common feature space for associating cross-domain image data for recognition purposes, the derived feature space is able to jointly update the dictionaries in each image domain for improved representation. This is why our method can be applied to both cross-domain image synthesis and recognition problems. Experiments on a variety of synthesis and recognition tasks such as single image super-resolution, cross-view action recognition, and sketchto-photo face recognition would verify the effectiveness of our proposed learning model.
4 0.95641178 70 iccv-2013-Cascaded Shape Space Pruning for Robust Facial Landmark Detection
Author: Xiaowei Zhao, Shiguang Shan, Xiujuan Chai, Xilin Chen
Abstract: In this paper, we propose a novel cascaded face shape space pruning algorithm for robust facial landmark detection. Through progressively excluding the incorrect candidate shapes, our algorithm can accurately and efficiently achieve the globally optimal shape configuration. Specifically, individual landmark detectors are firstly applied to eliminate wrong candidates for each landmark. Then, the candidate shape space is further pruned by jointly removing incorrect shape configurations. To achieve this purpose, a discriminative structure classifier is designed to assess the candidate shape configurations. Based on the learned discriminative structure classifier, an efficient shape space pruning strategy is proposed to quickly reject most incorrect candidate shapes while preserve the true shape. The proposed algorithm is carefully evaluated on a large set of real world face images. In addition, comparison results on the publicly available BioID and LFW face databases demonstrate that our algorithm outperforms some state-of-the-art algorithms.
5 0.94366527 167 iccv-2013-Finding Causal Interactions in Video Sequences
Author: Mustafa Ayazoglu, Burak Yilmaz, Mario Sznaier, Octavia Camps
Abstract: This paper considers the problem of detecting causal interactions in video clips. Specifically, the goal is to detect whether the actions of a given target can be explained in terms of the past actions of a collection of other agents. We propose to solve this problem by recasting it into a directed graph topology identification, where each node corresponds to the observed motion of a given target, and each link indicates the presence of a causal correlation. As shown in the paper, this leads to a block-sparsification problem that can be efficiently solved using a modified Group-Lasso type approach, capable of handling missing data and outliers (due for instance to occlusion and mis-identified correspondences). Moreover, this approach also identifies time instants where the interactions between agents change, thus providing event detection capabilities. These results are illustrated with several examples involving non–trivial interactions amongst several human subjects. 1. Introduction and Motivation The problem of identifying causal interactions amongst targets in a video sequence has been the focus of considerable attention in the past few years. A large portion of the existing body of work in this field uses human annotated video to build a storyline that includes both recognizing the activities involved and the causal relationships between them (see for instance [10] and references therein). While these methods are powerful and work well when suitably annotated data is available, annotating video clips is expensive and parsing relevant actions requires domain knowledge which may not be readily available. Indeed, in many situations, unveiling potentially hidden causal relationships is a first step towards building such knowledge. In this paper we consider the problem of identifying causal interactions amongst targets, not necessarily human, ∗This work was supported by NSF grants IIS–0713003, IIS-1318145, and ECCS–0901433, AFOSR grant FA9559–12–1–0271, and the Alert DHS Center of Excellence under Award Number 2008-ST-061-ED0001 . from unannotated video sequences and without prior domain knowledge. Our approach exploits the concept of “Granger Causality” [9], that formalizes the intuitive idea that ifa time series {x(t)} is causally related to a second one {thya(tt)if}a, ttihmene knowledge }oifs tchaeu past vrealluateesd otfo a{yse}c1to should l{eya(dt t}o, a ebnett kern prediction o thf efu ptuasret vvaalulueess ooff {{yx}}tt+k. In [l1ea4d], Pora ab bheatktearr eprt.e aicl.t successfully vuasleude a frequency domain reformulation of this concept to uncover pairwise interactions in scenarios involving repeating events, such as social games. This technique was later extended in [17] to model causal correlations between human joints and applied to the problem of activity classification. However, since this approach is based upon estimating the crosscovariance density function between events, it cannot handle situations where these events are non repeating, are too rare to provide an accurate estimate, or where these estimates are biased by outliers or missing data. Further, estimating a pairwise measure of causal correlation requires a spectral factorization of the cross-covariance, followed by numerical integration and statistical thresholding, limiting the approach to moderately large problems. To circumvent these problems, in this paper we propose an alternative approach based upon recasting the problem into that of identifying the topology of a sparse (directed) graph, where each node corresponds to the time traces of relevant features of a target, and each link corresponds to a regressor. The situation is illustrated in Fig. 1 using as an example the problem of finding causal relations amongst 4 tennis players, leading to a graph with 4 nodes, and potentially 12 (directed) links. Note that in general, the problem of identifying causal relationships is ill posed (unless one wants to identify the set of all individuals that could possibly have causal connections), due to the existence of secondary interactions. To illustrate this point, consider a very simplistic scenario with three actors A, B, and C, where A copies (with some delay) the actions of B, which in turn mimics C, also with some delay. In this situation, the ac- tions of A can be explained in terms of either those of B delayed one time sample, or those of C delayed by two samples. Thus, an algorithm based upon a statistical analysis 33556758 would identify a causal connection between A and C, even though there is no direct link between them. Further, if the actions of C can be explained by some simple autoregressive model of the form: = C(t) ?aiC(t − i) then it follows that the acti?ons of A can be explained by the same model, e.g. = A(t) ?aiA(t − i) Hence, multiple graphs topologies, some of which include self-loops, can explain the same set of time-series. On the other hand, note that in this situation, the sparsest graph (in the sense of having the fewest links) is the one that correctly captures the causality relations: the most direct cause of A is B and that of B is C, with C potentially being explained by a self-loop. To capture this feature and regularize the problem, in the sequel we will seek to find the sparsest graph, in the sense of having the least number of interconnections, that explains the observed data, reflecting the fact that, when alternative models are possible, often the most parsimonious is the correct one. Our main result shows that the problem of identifying sparse graph structures from observed noisy data can be reduced to a convex optimization problem (via the use of Group Lasso type arguments) that can be efficiently solved. The advantages of the proposed methods are: • • • • Its ability to handle complex scenarios involving nonrepeating events, een cvoimropnlmeexn stcael changes, clvoillnegct nioonnsof targets that do not necessarily split into well defined groups, outliers and missing data. The ability to identify the sparsest interaction structure tThhaet explains th idee nobtifseyr tvheed s dpaartas e(stthu inst avoiding labeling as causal connections those indirect correlations mediated only by an intermediary), together with a sparse “indicator” function whose support set indicates time instants where the interactions between agents change. Since the approach is not based on semantic analysis, iSt can bt hee applied ctoh ti she n moto btiaosne dof o arbitrary targets, sniost, necessarily humans (indeed, it applies to arbitrary time series including for instance economic or genetic data). From a computational standpoint, the resulting optiFmriozmatio an c problems nhaalve s a specific fthoerm re asmuletinnagbl oep ttiobe solved by a class of iterative algorithms [5, 3], that require at each step only a combination of thresholding and least-squares approximations. These algorithms have been shown to substantially outperform conventional convex-optimization solvers both in terms of memory and computation time requirements. The remainder of the paper is organized as follows. In section 2 we provide a formal reformulation of the problem of finding causal relationships between agents as a sparse graph identification problem. In section 3, we show that this problem can be efficiently solved using a re-weighted Group Lasso approach. Moreover, as shown there, the resulting problem can be solved one node at a time using first order methods, which allows for handling situations involving a large number of agents. Finally, the effectiveness of the proposed method is illustrated in section 4 using both simple scenarios (for which ground truth is readily available) and video clips of sports, involving complex, nonrepeating interactions amongst many agents. Figure 1. Finding causal interactions as a graph identification problem. Top: sample frame from a doubles tennis sequence. Bottom: Representation of this sequence as a graph, where each node represents the time series associated with the position of each player and the links are vector regressive models. Causal interactions exist when one of the time series can be explained as a combination of past values of the others. 2. Preliminaries For ease of reference, in this section we summarize the notation used in the paper and give a formal definition of the problem under consideration. 2.1. Notation (M) ?M? ??MM??F ?M?1 ?M?o σi ∗ ◦ ith largest singular value of the matrix M. nuclear norm: ?M? ?i σ?i (M). Fnruocbleeanrio nours norm: ??M?2F? ?i,j Mi2j ?1 norm: ?M? 1 ?i,j |Mij? ?|. ?o quasi-norm: ?M?o number of non-zero ?eleme?nMts i?n M. Hadamard product of matrices: (A ◦ ∗ =.: =. =. =. B)i,j = Ai,jBi,j. 33556769 2.2. Statement of the Problem Next, we formalize the problem under consideration. Consider a scenario with P moving agents, and denote by the 3D homogenous coordinates of the pth individual at time t. Motivated by the idea of Granger Causality, we will say that the actions of this agent depend causally from those in a set Ip (which can possibly contain p itself), if can be written as: Q˜p(t) Q˜p(t) Q˜p(t) ?N = ? ?ajp(n)Q˜j(t − n) +˜ η p(t) +˜ u p(t) (1) j? ?∈Ip ?n=0 Here ajp are unknown coefficients, and ˜η p(t) and up(t) represent measurement noise and a piecewise constant signal that is intended to account for relatively rare events that cannot be explained by the (past) actions of other agents. Examples include interactions of an agent with the environment, for instance to avoid obstacles, or changes in the interactions between agents. Since these events are infrequent, we will model as a signal that has (component-wise) a sparse derivative. Note in passing that since (1) involves homogeneous coordinates, the coefficients aj,p(.) satisfy the following constraint1 u ?N ? ?ajp(n) j? ?∈Ip ?n=0 =1 (2) Our goal is to identify causal relationships using as data 2D measurements qp(t) in F frames of the affine projections of the 3D coordinates Q˜p(t) of the targets. Note that, under the affine camera assumption, the 2D coordinates are related exactly by the same regressor parameters [2]. Thus, (1) holds if and only if: ?N qp(t) = ? ?ajp(n)qj(t − n) + u˜ p(t) + ηp(t) (3) j?∈Ip ?n=0 In this context, the problem can be precisely stated as: Given qp(t) (in F number of frames) and some a-priori bound N on the order of the regressors (that is the “memory” of the interactions), find the sparsest set of equations of the form (3) that explains the data, that is: aj,pm,ηinp,up?nIp (4) subject to? ?(2) and: = ? ?ajp(n)qj(t − n) + ?N qp(t) j? ?∈Ip ?n=0 up(t) + ηp(t) , p = 1 . . . , P and t = 1, ..F 1This follows by considering the third coordinate in (1) (5) where nIp denotes the cardinality of the set Ip. Rewriting (5) in matrixd efnoormtes yields: [xp; yp] = [Bp, I][apTuxTpuyTp]T + ηp (6) where qp(t) up(t) ηp(t) xp yp ap aip uxp uyp Bp Xp = [xp(t)Typ(t)T]T = [uTxp(t)uyTp(t)]T = [ηxp(t)Tηyp(t)T]T = = [xp(F)xp(F − 1)...xp(1)]T = [yp(F)yp(F − 1)...yp(1)]T [aT1p, a2Tp, ..., aTPp]T = [aip(0), aip(1), ..., aip(N)]T = [uxp(F)uxp(F−1)...uxp(1)]T = [uyp(F)uyp(F−1)...uyp(1)]T = = [Xp; Yp] [hankel(x1 , N) , ..., hankel(xP, N)] Yp = [hankel(y1, N), ..., hankel(yP, N)] and where, for a sequence z(t), hankel(z, N) denotes its associated Hankel matrix: hankel(z, N) = Itfolw⎛⎜⎝ sz t(hNzFa(t. +−a)d1 2e)scrzip(tF io(N. n− )o231f)al· t h· einzt(Frac−zti(.o1N.n)s−a)m12o)⎟ ⎞⎠ ngst uηaq= ? ηuqa1 T ,ηqau2 T ,ηaqu3 T ,· ·, ηauqP T ? T (8) Thus,inthBisc=on⎢⎣⎡teBx0t.1,theB0p.r2ob·le.·m·ofB0 i.nPte⎦⎥r ⎤estcanbeforagents (that is the complete graph structure) is captured by a matrix equation of the form: q = [B, I][aTuT]T + η (7) where and malized as finding the block–sparsest solution to the set of linear equations (2) and (7). 33557770 The problem of identifying a graph structure subject to sparsity constraints, has been the subject of intense research in the past few years. For instance, [1] proposed a Lasso type algorithm to identify a sparse network where each link corresponds to a VAR process. The main idea underlying this method is to exploit the fact that penalizing the ?1 norm of the vector of regression coefficients tends to produce sparse solutions. However, enforcing sparsity of the entire vector of regressor coefficients does not necessarily result in a sparse graph structure, since the resulting solution can consist of many links, each with a few coefficients. This difficulty can be circumvented by resorting to group Lasso type approaches [18], which seek to enforce block sparsity by using a combination of ?1 and ?2 norm constraints on the coefficients of the regressor. While this approach was shown to work well with artificial data in [11], exact recovery of the underlying network can be only guaranteed when the data satisfies suitable “incoherence” type conditions [4]. Finally, a different approach was pursued in [13], based on the use of a modified Orthogonal Least Squares algorithm, Cyclic Orthogonal Least Squares. However, this approach requires enforcing an a-priori limit on the number of links allowed to point to a single node, and such information may not be readily available, specially in cases where this number has high variability amongst nodes. To address these difficulties, in the next section we develop a convex optimization based approach to the problem of identifying sparse graph structures from observed noisy data. This method is closest in spirit to that in [11], in the sense that it is also based on a group Lasso type argument. The main differences consist in the ability to handle the unknown inputs up(t), needed to model exogenous disturbances affecting the agents, and in a reformulation of the problem, that allows for using a re-weighted iterative type algorithm, leading to substantially sparser solutions, even when the conditions in [4] fail. 3. Causality Identification Algorithm In this section we present the main result of this paper, an algorithm to search for block-sparse solutions to (7). For each fixed p, the algorithm searches for sparse solutions to (6) by solving (iteratively) the following problem (suggested by the re-weighted heuristic proposed in [7]) ?P ap,muxipn,uypi?=1wja(?aip?2) + λ??diag(wu)[Δuxp;Δuyp]??1 subject to: ?ηp ? ≤ p = 1, . . , P. ∞ ?P ?, ?N ??aip(n) i?= ?1 ?n=0 ?. = 1, p = 1,...,P. (9) where [Δuxp ; Δuyp] represents the first order differences of the exogenous input vector [uxp ; uyp], Wa and Wu are weighting matrices, and λ is a Lagrange multiplier that plays the role of a tuning parameter between graph sparsity and event sensitivity. Intuitively, for a fixed set of weights w, the algorithm attempts to find a block sparse solution to (6) and a set of sparse inp?uts Δuxp ; Δuyp , by exploiting the facts that minimizing ?i ?aip ?2 (the ?2,1 norm of the vector sequence {aip}) te?nds? tao m?aximize block-sparsity [18], while minimizing et?nhed s? 1t norm mmaizxeim blizoceks sparsity [ [1168]]. wOhniclee t mheisnesolutions are found, the weights w are adjusted to penalize those elements of the sequences with small values, so that in the next iteration solutions that set these elements to zero (hence further increasing sparsity) are favored. Note however, that proceeding in this way, requires solving at each iteration a problem with n = P(Pnr + F) variables, where P and F denote the number of agents and frames, respectively, and where nr is a bound on the regressor order. On the other hand, it is easily seen that both the objective function and the constraints in (9) can be partitioned into P groups, with the pth group involving only the variables related to the pth node. It follows then that problem (9) can be solved by solving P smaller problems of the form: ?P ap,muxipn,uypi?=1wja(?aip?2) + λ??diag(wu)[Δuxp;Δuyp]??1 ?P subject to: ?ηp?∞ ?N ≤ ? and ??aip(n) i?= ?1 ?n=0 leading to the algorithm given below: =1 (10) Algorithm 1: REWEIGHTEDCAUSALITYALGORITHM for each p wa = [1, 1, ..., 1] = [1, 1, ..., 1] S > 1(self loop weight) s = [1, 1, ..., S, ..., 1] (p’th element is S) while not converged do 1. solve (9) 2. wja = 1/( ?aip ?2 + δ) 3. wja = wja ◦ s (Penalization self loops) 4. = 1./(abs([Δuxp ; Δuyp]) + δ) end while 5. At this point ajp(.) , Ip and up(t) have been identified end for wu wu It is worth emphasizing that, since the computational complexity of standard interior point methods grows as n3, solving these smaller P problems leads to roughly a O(P2) 33557781 reduction in computational time over solving a single, larger optimization. Thus, this approach can handle moderately large problems using standard, interior-point based, semidefinite optimization solvers. Larger problems can be accommodated by noting that the special form of the objective and constraints allow for using iterative Augmented La- grangian Type Methods (ALM), based upon computing, at each step, the closed form solution to suitable intermediate optimization problems. While a complete derivation of such an algorithm is beyond the scope of this paper, using results from [12] it can be shown that each step requires only a combination of thresholding and least-squares approximations. Moreover, it can be shown that such an algorithm converges Q-superlinearly. 4. Handling Outliers and Missing Data The algorithm outlined above assumes an ideal situation where the data matrix B is perfectly known. However, in practice many of its elements may be outliers (due to misidentified correspondences) or missing (due to occlusion). As we briefly show next, these situations can be efficiently handled by performing a structured robust PCA step [3] to obtain a “clean” data matrix, prior to applying Algorithm 1. From equation (6) it follows that, in the absence of exogenous inputs and noise: ?xy11.. . .yxPP? = ?XY11.. . .YXPP? ?a1...aP? (11) Since xi ∈ {col(Xj)} and yi ∈ {col(Yj }), it follows that the sets {∈co {l(cXoli(X)} a)n}d a n{dco yl(Y∈i) { }c? are self-ex?pressive, or, ?equivalently?, Xthe }ma atnridce {sc oXl( =.) }? aXre1 . . . fX-eNxp? eanssdiv eY, ?Y1 ...YN? are mraantkri cdeesfic Xient. ?Consider no?w the case =.r, w?here some ?elements xi, yi of X and Y are missing. From ?the self-expressive property ooff {Xco aln(Xd Yi)} a raen dm i{scsoinlg(Y. Fi)ro} mit tfhoello swelsf tehxaptr ethsessieve missing eyle omf {encotsl are given by: xi = argmin rank(X) , yi x = argmin rank(Y) (12) y Similarly, in the presence of outliers, X, Y can be decomposed irnlyto, itnhe t sum oesfe a lcoew o fra onkut mlieartsr,ix X (,thYe ccalenan b eda dtae)c oamnda sparse one (the outliers) by solving a problem of the form minrank?YXoo?+ λ????EEYX????os. t.: ?XYoo?+?EEYX?=?YX? From the reasoning? abov?e it follows that in the presence of noise and exogenous outputs, the clean data record can be recovered from the corrupted, partial measurements by solving the following optimization problem: s+muλibn3je? ? ? cYXtM ot ? Y?X:∗◦ +Ξ λYX1? ? ? FM XY◦ E XY? ?1+λ2? ?M YX◦ Δ U YX? ?1 ?YX?=?XYoo?+?EEXY?+?UUYX?+?ΞΞYX? (13) where we have used the standard convex relaxations of rank and cardinality2. Here Ξ and U denote noise and piecewise constant exogenous matrices, ΔU denotes the matrix obtained by taking the difference between consecutive elements in U, and MX (MY) is a “mask” matrix, with mi,j = 0 if the element (i, j) in X ( Y) is missing, mi,j = 1 otherw=i0s e, i tuhseed e etom aenvtoi (di, penalizing )e lisem miesnstisn gin, mE, Ξ, U corresponding to missing data. Problem (13) is a structured robust PCA problem (due to the Hankel structure of X, Y) trhobatu can C bAe efficiently suoelv teod t using tkheel fsitrrsut oturrdeer o mf Xeth,oYd) proposed in [3], slightly modified to handle the terms containing ΔU. 5. Experimental Results In this section we illustrate the effectiveness of the proposed approach using several video clips (provided as supplemental material). The results of the experiments are displayed using graphs embedded on the video frames: An arrow indicates causal correlation between agents, with the point of the arrow indicating the agent whose actions are affected by the agent at its tail. The internal parameters of the algorithm were experimentally tuned, leading to the values ? = 0.1, = 0.05, self loop weights S = 10. The algorithm is fairly insensitive to the value of the regularization parameters and S, which could be adjusted up or down by an order of magnitude without affecting the structure of the resulting graph. Finally, we used regressor order N=2 for the first three examples and N=4 for the last one, a choice that is consistent with the frame rate and the complexity of λ λ the actions taking place in each clip. 5.1. Clips from the UT-Interaction Data Set We considered two video clips from the UT Human Interaction Data Set [15] (sequences 6 and 16). Figures 2 and 5 compare the results obtained applying the proposed algorithm versus Group Lasso (GL) [11] and Group Lasso combined with the reweighted heuristic described in (9) (GLRW). In all cases, the inputs to the algorithm were the (approximate) coordinates of the heads of each of the agents, normalized to the interval [−1, 1], artificially corrupted ,w niothrm m10al%iz eodut tloie trhs.e Notably, [t−he1 proposed algorithm 2As shown in [6, 8] under suitable conditions these relaxations the exact minimum rank solution. 33557792 recover Figure 2. Sample frames from the UT sequence 6 with the identified causal connections superimposed. Top: Proposed Method. Center: Reweighted Group Lasso. Bottom: Group Lasso. Only the proposed method identifies the correct connections. was able to correctly identify the correlations between the agents from this very limited amount of information, while the others failed to do so. Note in passing that in both cases none of the algorithms were directly applicable, due to some of the individuals leaving the field of view or being occluded. As illustrated in Fig. 3, the missing data was recovered by solving an RPCA problem prior to applying Algorithm 1. Finally, Fig. 4 sheds more insight on the key role played by the sparse signal u. As shown there, changes in u correspond exactly to time instants when the behavior of the corresponding agent deviates from the general pattern followed during most of the clip. Figure 3. Time traces of the individual heads in the UT sequence 6, artificially corrupted with 10 % outliers. The outliers were removed and the missing data due to targets leaving the field of view was estimated solving a modified RPCA problem. Frame number Figure 4. Sample (derivative sparse) exogenous signals in the UT sequence 6. The changes correspond to the instants when the second person starts moving towards the first, who remains stationary, and when the two persons merge in an embrace. Figure 5. Sample frames from the UT sequence 16. Top: Correct correlations identified by the Proposed Method. Center and Bottom: Reweighted Group Lasso and Group Lasso (circles indicate self-loops). 5.2. Doubles Tennis Experiment This experiment considers a non-staged real-life scenario. The data consists of 230 frames of a video clip from the Australian Open Tennis Doubles Final games. The goal here is to identify causal relationships between the different players using time traces of the respective centroid positions. Note that in this case the ground truth is not available. Nevertheless, since players from the same team usually look at their opponents and react to their motions, we expect a strong causality connection between members of 33557803 opposite teams. This intuition is matched by the correlations unveiled by the algorithm, shown in Fig. 6. The identified sparse input corresponding to the vertical direction is shown in Fig. 7 (similar results for the horizontal component are omitted due to space reasons.) Figure 6. Sample frames from the tennis sequence. Top: The proposed method correctly identifies interactions between opposite team members. Center: Reweighted Group Lasso misses the interaction between the two rear-most individuals of opposite teams, generating self loops instead (denoted by the disks). Bottom: Group Lasso yields an almost complete graph. Figure 7. Exogenous signal corresponding to the vertical axis for the tennis sequence. The change in one component corresponds to the instant when the leftmost player in the bottom team moves from the line towards the net, remaining closer to it from then on. 5.3. Basketball Game Experiment This experiment considers the interactions amongst players in a basketball game. As in the case ofthe tennis players, since the data comes from a real life scenario, the ground truth is not available. However, contrary to the tennis game, this scenario involves complex interactions amongst many players, and causality is hard to discern by inspection. Nevertheless, the results shown in Fig. 8, obtained using the position of the centroids as inputs to our algorithm, match our intuition. Firstly, one would expect a strong cause/effect connection between the actions of the player with the ball and the two defending opponents facing him. These connections (denoted by the yellow arrows) were indeed successfully identified by the algorithm. The next set of causal correlations is represented by the (blue, light green) and (black, white) arrow pairs showing the defending and the opponent players on the far side of the field and under the hoop. An important, counterintuitive, connection identified by the algorithm is represented by the magenta arrows be- tween the right winger of the white team with two of his teammates: the one holding the ball and the one running behind all players. While at first sight this connection is not as obvious as the others, it becomes apparent towards the end of the sequence, when the right winger player is signaling with a raised arm. Notably, our algorithm was able to unveil this signaling without the need to perform a semantic analysis (a very difficult task here, since this signaling is apparent only in the last few frames). Rather, it used the fact that the causal correlation was encapsulated in the dynamics of the relative motions of these players. 6. Conclusions In this paper we propose a new method for detecting causal interactions between agents using video data. The main idea is to recast this problem into a blind directed graph topology identification, where each node corresponds to the observed motion of a given target, each link indicates the presence of a causal correlation and the unknown inputs account for changes in the interaction patterns. In turn, this problem can be reduced to that of finding block-sparse solutions to a set of linear equations, which can be efficiently accomplished using an iterative re-weighted Group-Lasso approach. The ability of the algorithm to correctly identify causal correlations, even in cases where portions of the data record are missing or corrupted by outliers, and the key role played by the unknown exogenous input were illustrated with several examples involving non–trivial inter- actions amongst several human subjects. Remarkably, the proposed algorithm was able to identify both the correct interactions and the time instants when interactions amongst agents changed, based on minimal motion information: in all cases we used just a single time trace per person. This success indicates that in many scenarios, the dynamic information contained in the motion pattern of a single feature associated with a target is rich enough to enable identifying complex interaction patterns, without the need to track multiple features, perform a semantic analysis or use additional domain knowledge. 33557814 Figure 8. Sample frames from a Basketball game. Top: proposed method. Center: Reweighted Group the signaling player and his teammates. Bottom: Group Lasso yields an almost complete graph. Lasso misses the interaction between References [1] A. Arnold, Y. Liu, and N. Abe. Estimating brain functional connectivity with sparse multivariate autoregression. In Proc. of the 13th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pages 66–75, 2007. 4 [2] M. Ayazoglu, B. Li, C. Dicle, M. Sznaier, and O. Camps. Dynamic subspace-based coordinated multicamera tracking. In 2011 IEEE ICCV, pages 2462–2469, 2011. 3 [3] M. Ayazoglu, M. Sznaier, and O. Camps. Fast algorithms for structured robust principal component analysis. In 2012 IEEE CVPR, pages 1704–171 1, June 2012. 2, 5 [4] A. Bolstad, B. Van Veen, and R. Nowak. Causal network inference via group sparse regularization. IEEE Transactions on Signal Processing, 59(6):2628–2641, 2011. 4 [5] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Dis- [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] tributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122, Jan. 2011. 2 E. Candes, X. Li, Y. Ma, and J.Wright. Robust principal component analysis? J. ACM, (3), 2011. 5 E. J. Candes, M. Wakin, and S. Boyd. Enhancing sparsity by reweighted l1minimization. Journal of Fourier Analysis and Applications, 14(5):877–905, December 2008. 4 V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky. Rank-sparsity incoherence for matrix decomposition. Siam J. Optim., (2):572–596, 2011. 5 C. W. J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, pages 424–438l, 1969. 1 A. Gupta, P. Srinivasan, J. Shi, and L. Davis. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. In 2009 IEEE CVPR, pages 2012–2019, 2009. 1 S. Haufe, G. Nolte, K. R. Muller, and N. Kramer. Sparse causal discovery in multivariate time series. In Neural Information Processing Systems, 2009. 4, 5 G. Liu, Z. Lin, and Y. Yu. Robust subspace segmentation by low-rank representation. In ICML, pages 1663–670, 2010. 5 D. Materassi, G. Innocenti, and L. Giarre. Reduced complexity models in identification of dynamical networks: Links with sparsification problems. In 48th IEEE Conference on Decision and Control, pages 4796–4801, 2009. 4 K. Prabhakar, S. Oh, P. Wang, G. Abowd, and J. Rehg. Temporal causality for the analysis ofvisual events. In IEEE Conf Comp. Vision and Pattern Recog. (CVPR)., pages 1967– 1974, 2010. 1 M. S. Ryoo and J. K. Aggarwal. UT Interaction Dataset, ICPR contest on Semantic Description of Human Activities. http://cvrc.ece.utexas.edu/SDHA2010/Human Interaction.html, 2010. 5 [16] J. Tropp. Just relax: convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3): 1030–1051, 2006. 4 [17] S. Yi and V. Pavlovic. Sparse granger causality graphs for human action classification. In 2012 ICPR, pages 3374–3377. 1 [18] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B, 68(1):49–67, 2006. 4 33557825
6 0.92486107 213 iccv-2013-Implied Feedback: Learning Nuances of User Behavior in Image Search
7 0.90383112 46 iccv-2013-Allocentric Pose Estimation
8 0.88867813 184 iccv-2013-Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion
9 0.86121809 231 iccv-2013-Latent Multitask Learning for View-Invariant Action Recognition
10 0.83833545 93 iccv-2013-Correlation Adaptive Subspace Segmentation by Trace Lasso
11 0.83228868 14 iccv-2013-A Generalized Iterated Shrinkage Algorithm for Non-convex Sparse Coding
12 0.81751025 54 iccv-2013-Attribute Pivots for Guiding Relevance Feedback in Image Search
13 0.80300444 161 iccv-2013-Fast Sparsity-Based Orthogonal Dictionary Learning for Image Restoration
14 0.79279184 259 iccv-2013-Manifold Based Face Synthesis from Sparse Samples
15 0.7854014 398 iccv-2013-Sparse Variation Dictionary Learning for Face Recognition with a Single Training Sample per Person
16 0.77755731 114 iccv-2013-Dictionary Learning and Sparse Coding on Grassmann Manifolds: An Extrinsic Solution
17 0.77252817 52 iccv-2013-Attribute Adaptation for Personalized Image Search
18 0.77207083 154 iccv-2013-Face Recognition via Archetype Hull Ranking
19 0.76825351 335 iccv-2013-Random Faces Guided Sparse Many-to-One Encoder for Pose-Invariant Face Recognition
20 0.76742834 106 iccv-2013-Deep Learning Identity-Preserving Face Space