cvpr cvpr2013 cvpr2013-141 knowledge-graph by maker-knowledge-mining

141 cvpr-2013-Efficient Computation of Shortest Path-Concavity for 3D Meshes


Source: pdf

Author: Henrik Zimmer, Marcel Campen, Leif Kobbelt

Abstract: In the context of shape segmentation and retrieval, object-wide distributions of measures are needed to accurately evaluate and compare local regions of shapes. Lien et al. [16] proposed two point-wise concavity measures in the context of Approximate Convex Decompositions of polygons, measuring the distance from a point to the polygon's convex hull: an accurate Shortest Path-Concavity (SPC) measure and a Straight Line-Concavity (SLC) approximation of the same. While both are practicable on 2D shapes, the exponential cost of SPC in 3D makes it prohibitively expensive for a generalization to meshes [14]. In this paper we propose an efficient and straightforward approximation of the Shortest Path-Concavity measure for 3D meshes. Our approximation is based on discretizing the space between mesh and convex hull, thereby reducing the continuous Shortest Path search to an efficiently solvable graph problem. Our approach works out-of-the-box on complex mesh topologies and requires no complicated handling of genus. Besides presenting a rigorous evaluation of our method on a variety of input meshes, we also define an SPC-based Shape Descriptor and show its superior retrieval and runtime performance compared with the recently presented results on the Convexity Distribution by Lian et al. [12].

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In the context of shape segmentation and retrieval, object-wide distributions of measures are needed to accurately evaluate and compare local regions of shapes. [sent-7, score-0.139]

2 [16] proposed two point-wise concavity measures in the context of Approximate Convex Decompositions of polygons, measuring the distance from a point to the polygon's convex hull: an accurate Shortest Path-Concavity (SPC) measure and a Straight Line-Concavity (SLC) approximation of the same. [sent-9, score-0.434]

3 While both are practicable on 2D shapes, the exponential cost of SPC in 3D makes it prohibitively expensive for a generalization to meshes [14]. [sent-10, score-0.142]

4 Our approximation is based on discretizing the space between mesh and convex hull, thereby reducing the continuous Shortest Path search to an efficiently solvable graph problem. [sent-12, score-0.315]

5 Our approach works out-of-the-box on complex mesh topologies and requires no complicated handling of genus. [sent-13, score-0.154]

6 Furthermore, while global convexity measures define one value per shape, local convexity measures yield distributions of values, e. [sent-23, score-0.432]

7 In the context of local convexity an intuitive measure was recently defined. (Figure 1: from the input mesh (top left) a proper outer hull (yellow, top right) is computed to allow for a stable discretization (bottom right) of the free space.) [sent-26, score-0.658]

8 recently defined based on the shortest path distance from each point on a shape through empty space to its convex hull. [sent-29, score-0.392]

9 This measure is efficiently computable on 2D polygons but an (exact) 3D mesh generalization is prohibitively expensive. [sent-30, score-0.218]

10 A standard measure for the convexity of a 3D (2D) shape S is the ratio between the volume (area) enclosed by the shape and that of its convex hull CS: C(S) = Vol(S) / Vol(CS). [sent-35, score-0.673]
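
This volume-based measure is straightforward to evaluate on a watertight triangle mesh. The following is a minimal sketch (not code from the paper), assuming the mesh is given as vertex and face index arrays and is closed and consistently oriented; the enclosed volume is summed from signed tetrahedra, and the hull volume comes from SciPy:

```python
import numpy as np
from scipy.spatial import ConvexHull

def mesh_volume(vertices, faces):
    """Enclosed volume of a closed, consistently oriented triangle mesh,
    summed from signed tetrahedra against the origin (divergence theorem)."""
    v0, v1, v2 = vertices[faces[:, 0]], vertices[faces[:, 1]], vertices[faces[:, 2]]
    return abs(np.einsum('ij,ij->i', v0, np.cross(v1, v2)).sum()) / 6.0

def volume_convexity(vertices, faces):
    """Classical convexity ratio C(S) = Vol(S) / Vol(CS)."""
    return mesh_volume(vertices, faces) / ConvexHull(vertices).volume
```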

11 While this simple measure captures the visual convexity of a shape, it has been argued against as it is insensitive to deep, thin slits in the shape. [sent-36, score-0.283]

12 A new projection-based global convexity measure, which is able to capture such fine concavities, was introduced by Zunic and Rosin for 2D polygons in 2004 [24] and generalized to 3D meshes by Lian et al. [sent-37, score-0.364]

13 It is based on a view-dependent ratio of areas, namely the area of the projected silhouette of a mesh over the sum of all projected face areas. [sent-39, score-0.154]

14 Local convexity measures have been well investigated for 2D shapes. [sent-44, score-0.216]

15 Our research was motivated by the concavity measures defined by Lien et al. [sent-45, score-0.209]

16 Here the authors use the concept of bridges (edges on the convex hull) and pockets (concave interior regions of the shape bounded by a corresponding bridge) to define the Shortest Path-Concavity (SPC) and its approximation: Straight Line-Concavity (SLC). [sent-47, score-0.207]

17 For each pocket point the concavity is defined as the shortest distance (in the respective metric) to the closest point on the associated bridge. [sent-48, score-0.393]

18 In the context of robot motion planning the NP-hardness of continuous shortest paths was proved by Canny and Reif [5]. [sent-51, score-0.214]

19 Both approximations [6] and exact algorithms [17] have been proposed; however, they have at least superquadratic complexity even for the sub-problem of computing the shortest path between a pair of given points. [sent-52, score-0.221]

20 This projection is a geometrically delicate procedure requiring extra case-handling for certain surface cavities, and the identification of handle loops and reduction of genus on general polygonal meshes [14]. [sent-54, score-0.241]

21 As an alternative to the problematic bridge/pocket projection Mamou and Ghorbel simply use the straight line distance in normal direction to the convex hull [15]. [sent-55, score-0.429]
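
This straight-line variant can be sketched directly from the convex hull's face planes. The snippet below is a rough illustration under stated assumptions (interior query points, unit-length directions such as vertex normals), not the implementation of [15]:

```python
import numpy as np
from scipy.spatial import ConvexHull

def straight_line_concavity(points, directions, hull_points):
    """Distance from interior points to the convex hull along given unit
    directions (e.g. vertex normals), in the spirit of the straight-line variant."""
    eq = ConvexHull(hull_points).equations      # each row [n, b] with n.x + b <= 0 inside
    n, b = eq[:, :-1], eq[:, -1]
    num = -(points @ n.T) - b                   # >= 0 for points inside the hull
    den = directions @ n.T                      # alignment of the ray with each face plane
    t = np.full_like(num, np.inf)
    hit = den > 1e-12                           # only planes the ray moves toward
    t[hit] = num[hit] / den[hit]
    return t.min(axis=1)                        # first exit through the hull boundary
```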

22 In the recent Fast-ACD [7] a localized relative concavity measure is used. (Figure 2: Straight Line-Concavity (SLC) vs. Shortest Path-Concavity (SPC).) [sent-57, score-0.176]

23 The SPC accurately computes the shortest distances to the convex hull without self-intersecting the object. [sent-59, score-0.593]

24 The SLC distances measured straight through the object are a poor convexity description, depending only on the closeness to the convex hull and not on what lies in between. [sent-60, score-0.641]

25 While increasing the efficiency of the ACD and reducing oversegmentation, the concavity measure still deviates from the intuitive one and has not yet been generalized to objects with genus > 0. [sent-63, score-0.241]

26 Convexity/Concavity encodes certain characteristics of shapes and can thus be used to derive shape descriptors for 3D retrieval or matching. [sent-65, score-0.133]

27 Briefly, their experiments showed two things: that using a distribution of several values is superior to using a single valued descriptor and that retrieval performance can be improved by using composite descriptors additionally enriched by convexity. [sent-69, score-0.139]

28 To obtain a descriptor the authors form a histogram with 1024 bins of their view-dependent global convexity measure sampled from 10000 random views. [sent-70, score-0.257]
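
Constructing such a distribution descriptor amounts to sampling and binning. A schematic sketch is shown below; sample_view_convexity is a hypothetical callback standing in for the view-dependent measure of [12], and the bin range [0, 1] is an assumption:

```python
import numpy as np

def convexity_distribution(mesh, sample_view_convexity, n_views=10000, n_bins=1024, seed=0):
    """Histogram descriptor in the spirit of the Convexity Distribution (CD):
    sample a view-dependent global convexity value for many random views and bin it."""
    rng = np.random.default_rng(seed)
    samples = np.array([sample_view_convexity(mesh, rng) for _ in range(n_views)])
    hist, _ = np.histogram(samples, bins=n_bins, range=(0.0, 1.0), density=True)
    return hist
```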

29 Obtaining these distributions is rather costly; even for small meshes with 5k vertices the computation time is in the range of minutes. [sent-71, score-0.212]

30 Instead of using visibility trees [8] and bridge-projections as proposed in [16, 14] we propose to tessellate the space between the mesh M and its convex hull CM and subsequently compute shortest path distances within this discretized space. [sent-76, score-0.318]

31 Our approach requires no extra identification of handle/tunnel loops and works on meshes of arbitrary genus – they only need to be free of self-intersections. [sent-78, score-0.226]

32 (a) shows the input mesh (black) and the convex hull (orange). [sent-84, score-0.533]

33 In (b) an offset, coarser outer hull has been computed to enable stable Constrained Delaunay Tetrahedralization (CDT). [sent-85, score-0.361]

34 The CDT of the free space between the input mesh and the outer hull is shown in (c). [sent-87, score-0.559]

35 To compute the shortest distances a Fast Marching Method (FMM) is used. [sent-88, score-0.214]

36 (d) shows the initialization of the FMM: the CDT boundary vertices (red) and their inner neighbors (green) are initialized with their distance to the convex hull. [sent-89, score-0.198]

37 Note that while all outer boundary vertices have a negative distance, their inner neighbors can have positive distances to the convex hull. [sent-90, score-0.227]

38 We further claim that a per-point concavity measure is more shape descriptive than a (sampled) global measure. [sent-91, score-0.354]

39 In Section 4 we propose a 4-dimensional scale, tessellation and rotation invariant descriptor derived from our SPC concavity values, and show how it generally outperforms the 1024-dimensional descriptor of [12] both in retrieval gain and run-time efficiency. [sent-92, score-0.442]

40 Even for models with 260k vertices less than 3 minutes are needed to compute the distribution of concavity values, more than one order of magnitude faster than computing the above mentioned descriptor. [sent-94, score-0.246]

41 Section 4 presents the SPC-based shape descriptor and shows retrieval results. [sent-96, score-0.155]

42 The goal of the algorithm is to compute a scalar-valued concavity measure C(v) ∈ [0, ∞) for every vertex v ∈ M. [sent-100, score-0.235]

43 It should approximate the shortest path from v through the volume Vol(CM) \ Vol(M) to the convex hull. [sent-102, score-0.128]

44 The algorithm first computes CM, then tessellates the enclosed space by applying a Constrained Delaunay Tetrahedralization, followed by a Fast Marching to get the shortest distances from M to CM. [sent-105, score-0.25]
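
Put together, the pipeline reads roughly as follows. Every callback in this skeleton is a placeholder for a stage described in the text (any convex hull code, the offset outer hull of the following paragraphs, a CDT library such as TetGen, and a fast marching solver); none of the names refer to a concrete API:

```python
def shortest_path_concavity(mesh, mesh_vertex_ids, convex_hull, build_outer_hull,
                            constrained_delaunay, fast_marching):
    """Skeleton of the SPC pipeline; all arguments after the mesh are assumed helpers."""
    hull = convex_hull(mesh)                      # step 1: CM via any convex hull algorithm
    outer = build_outer_hull(hull)                # step 2: coarse, offset proper outer hull
    cdt = constrained_delaunay(mesh, outer)       # step 3: CDT of the enclosed free space
    dist = fast_marching(cdt, hull)               # step 4: distances to CM on all CDT vertices
    return {v: dist[v] for v in mesh_vertex_ids}  # per-vertex concavity C(v)
```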

45 The first step is simply solved by applying any convex hull algorithm. [sent-106, score-0.379]

46 Unfortunately, the requirement on the constraining surfaces to be free of intersections posed by common CDT algorithms rules out a direct tessellation of Vol(CM) \ Vol(M) due to the non-empty intersection. [sent-108, score-0.208]

47 Hence, for the second step, instead of using CM we compute a coarse but tight proper hull (consisting of well-shaped elements at an offset from CM) and adjust the computation of the distances accordingly. [sent-109, score-0.181]

48 The CDT of a mesh M (the constraining surface) interpolates the elements of M. [sent-116, score-0.213]

49 While intersections between constraining surfaces intuitively could be handled by identifying and removing (stitching together) touching regions, this is a very delicate procedure geometrically, which still does not fix the problem of “bad” convex hull elements. [sent-121, score-0.508]

50 A (slightly) offset bounding box could be used as an outer constraining surface instead of the convex hull itself, but this would have a negative impact on performance, as unnecessarily large amounts of free space would generally have to be tessellated between the bounding box and the mesh. [sent-122, score-0.698]

51 We overcome these difficulties and obtain a proper and tight outer mesh by simple geometric operations. [sent-123, score-0.343]

52 Computing a Constrained Delaunay Tetrahedralization of the convex hull directly (top row) can be unstable and leads to an overly fine tessellation. [sent-125, score-0.411]

53 Using a coarse but tight approximation of the convex hull avoids these problems (cf. [sent-126, score-0.473]

54 Our proposed outer hull OC is in spirit similar to the Discrete Oriented Polytopes (k-DOP) typically used in ray-tracing and collision detection (cf. [sent-131, score-0.361]

55 However, while in a ray-tracing setting a fixed set of planes simplifies intersection tests, we can use a set of variable planes derived from the convex hull for a tighter fit and obtain a more object specific outer hull. [sent-133, score-0.561]

56 The k cluster centers then define k planes which we position to have a certain offset distance dist to their closest point on CM . [sent-136, score-0.144]

57 While this could be implemented using plane intersections, the problem can be solved much more efficiently by utilizing the duality between convex hulls and halfspace intersections. [sent-141, score-0.128]

58 The center of gravity of the convex hull is used as origin and dk is the distance of plane k to this point. [sent-146, score-0.379]

59 Now the convex hull of the dual points is computed, and the plane equations of the faces normalized to get points in our primal space again. [sent-147, score-0.413]
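
A compact sketch of this duality construction is given below. It assumes unit-length plane normals, positive offsets d_k, and coordinates already translated so that the hull's center of gravity (the chosen origin) lies strictly inside all halfspaces; the normals themselves could come from clustering the convex hull's face normals as described above:

```python
import numpy as np
from scipy.spatial import ConvexHull

def outer_hull_vertices(normals, offsets):
    """Vertices of the intersection of halfspaces n_k . x <= d_k (d_k > 0, origin
    strictly inside), via duality: dualize each plane to the point n_k / d_k, take
    the convex hull of the dual points, and map its facets back to primal vertices."""
    dual_points = normals / offsets[:, None]
    hull = ConvexHull(dual_points)
    # SciPy stores facets as a . y + b <= 0 for interior points, so on the facet
    # a . y = -b; the primal vertex corresponding to that facet is a / (-b).
    a, b = hull.equations[:, :-1], hull.equations[:, -1]
    return a / (-b)[:, None]
```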

60 The figure compares the CDT of this hull with that of the convex hull. [sent-156, score-0.379]

61 To initialize the Fast Marching, for the outer boundary vertices v ∈ ∂CDT we set tag(v) = DONE and initialize them with their closest (negative) distance to the convex hull: dist(v) = dist(v, CM). [sent-170, score-0.308]

62 For the direct non-boundary neighbors w ∈ N(v), w ∉ ∂CDT, we set tag(w) = CLOSE and also initialize them with their signed distance to the convex hull: dist(w) = dist(w, CM). [sent-172, score-0.128]
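
In code, the seeding step might look like the sketch below; cdt_boundary, inner_neighbors and dist_to_hull are assumed helpers giving the CDT boundary vertex set, vertex adjacency, and the (signed) distance to CM:

```python
def initialize_fast_marching(cdt_boundary, inner_neighbors, dist_to_hull):
    """Seed the propagation: outer boundary vertices are DONE with their (negative)
    distance to the convex hull; their direct non-boundary neighbors are CLOSE with
    their signed distance. All three arguments are assumed helpers."""
    tag, dist = {}, {}
    for v in cdt_boundary:
        tag[v], dist[v] = 'DONE', dist_to_hull(v)
    for v in cdt_boundary:
        for w in inner_neighbors(v):
            if w not in cdt_boundary and tag.get(w) != 'DONE':
                tag[w], dist[w] = 'CLOSE', dist_to_hull(w)
    return tag, dist
```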

63 Until the queue is empty, vertices are popped from the queue and the distances of their neighbors are updated; FAR neighbors are tagged CLOSE and added to the queue. [sent-176, score-0.197]
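
The propagation loop can be illustrated with a Dijkstra-style priority queue over the CDT edge graph; this is the simpler, less accurate alternative mentioned later in the discussion, not the local FMM update actually used, but it follows the same FAR/CLOSE/DONE bookkeeping:

```python
import heapq

def propagate_distances(neighbors, edge_length, init_dist):
    """Dijkstra-style stand-in for the Fast Marching propagation on the CDT vertices.
    neighbors[v] lists adjacent vertices, edge_length(u, v) returns their Euclidean
    distance, and init_dist maps the seeded vertices to their initial distances."""
    dist = dict(init_dist)
    heap = [(d, v) for v, d in init_dist.items()]
    heapq.heapify(heap)
    while heap:                                  # pop the CLOSE vertex with smallest value
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float('inf')):
            continue                             # stale entry; vertex is already DONE
        for w in neighbors[v]:                   # relax FAR/CLOSE neighbors
            nd = d + edge_length(v, w)
            if nd < dist.get(w, float('inf')):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist
```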

64 Even for a large mesh with about 260k vertices the computation takes less than 3 minutes in total. [sent-181, score-0.224]

65 The table shows the timings of the convex hull (CH), the ... [sent-301, score-0.165]

66 The four rows (μ, σ, mskew, mkurt) show the entries of our SPC-based shape descriptor (cf. [sent-304, score-0.128]

67 The CDT time of the right Max Planck head (an irregular triangle mesh) is almost five times that of the left head (a regular remeshing [2] of the same); both meshes have approximately the same number of elements. [sent-308, score-0.265]

68 Figure 5 (left) further shows that quite low SPC errors can be achieved already with rather coarse meshes (compared to a very fine ground truth computation). [sent-312, score-0.169]

69 Note also that the shape, size and tightness of the proper outer hull (cf. [sent-313, score-0.406]

70 The plots show the effect of increasing the resolution of the CDT mesh on the accuracy of the SPC and that of the SPC-based descriptor (cf. [sent-323, score-0.203]

71 (a) a cut through the CDT mesh shows the slit inside; in the closeup the z-fighting artifacts can be seen. [sent-333, score-0.184]

72 SPC-based Shape Retrieval: Based on the computed convexity values C(v) for each vertex v ∈ M we now define a 4-dimensional shape descriptor SPCD ∈ R4 based on central moments: the mean, standard devi- [sent-338, score-0.309]

73 ation, skewness and kurtosis of the concavity values distributed over M: SPCD = [μ, σ, mskew, mkurt] . [sent-340, score-0.176]

74 To achieve tessellation independence the moments are computed by integration over the piecewise linear surface M, by means of a cubature formula per triangle: μ = ... [sent-352, score-0.137]
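
A sketch of such tessellation-independent moments is given below; it uses per-vertex concavity values and a simple one-point (centroid) rule per triangle as a stand-in for whichever cubature formula the authors actually use:

```python
import numpy as np

def spc_descriptor(vertices, faces, concavity):
    """Area-weighted central moments [mu, sigma, skewness, kurtosis] of per-vertex
    concavity values, integrated over the triangles with a centroid rule."""
    tri = vertices[faces]                                  # (m, 3, 3)
    area = 0.5 * np.linalg.norm(np.cross(tri[:, 1] - tri[:, 0],
                                         tri[:, 2] - tri[:, 0]), axis=1)
    c_tri = concavity[faces].mean(axis=1)                  # value at each triangle centroid
    w = area / area.sum()                                  # normalized area weights
    mu = np.sum(w * c_tri)
    sigma = np.sqrt(np.sum(w * (c_tri - mu) ** 2))
    skew = np.sum(w * ((c_tri - mu) / sigma) ** 3)
    kurt = np.sum(w * ((c_tri - mu) / sigma) ** 4)
    return np.array([mu, sigma, skew, kurt])
```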

75 As distance function we use a weighted L2 norm with a convex combination of empirically determined weights: [wμ, wσ, wmskew, wmkurt] = [0. [sent-367, score-0.128]
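
The retrieval distance then reduces to a few lines; the weight values below are placeholders, since the empirically determined weights are truncated in the text above:

```python
import numpy as np

# Placeholder weights: the paper's empirically determined values are not recoverable here.
WEIGHTS = np.array([0.25, 0.25, 0.25, 0.25])

def spcd_distance(d1, d2, w=WEIGHTS):
    """Weighted L2 distance between two 4-dimensional SPCD descriptors."""
    return np.sqrt(np.sum(w * (np.asarray(d1) - np.asarray(d2)) ** 2))
```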

76 Average Precision-Recall curves on the canonical forms (a) and the non-deformed (b) shapes of the first 10 classes of the McGill Shape Benchmark. [sent-378, score-0.175]

77 The average precision (y-axis) is higher on the less diverse classes of the canonical forms than on the non-deformed McGill shapes. [sent-379, score-0.175]

78 The McGill Benchmark consists of 457 meshes divided into 19 classes, the first 10 consisting of articulated objects (e. [sent-383, score-0.208]

79 , ants and hands) the last 9 of moderately articulated objects (e. [sent-385, score-0.162]

80 evaluated their CD shape descriptor on computed canonical forms [13] (non-rigid deformations) of the first 10 classes. [sent-389, score-0.226]

81 We use our reference implementation of [12] to compare the descriptors on these canonical forms and also on the whole McGill dataset. [sent-390, score-0.161]

82 Table 2 shows how our 4-dimensional SPCD descriptor derived from point-wise concavity measures outperforms the 1024-dimensional CD on the canonical forms of the articulated classes. [sent-392, score-0.458]

83 Retrieval performance comparison between SPCD and CD on the canonical forms of the first 10 classes of the McGill Shape Benchmark. [sent-412, score-0.175]

84 The confusion matrices visualize the mis-matches between the classes and reveal classes problematic for both methods. [sent-417, score-0.117]

85 Confusion matrices on the canonical forms of the first 10 classes of the McGill Shape Benchmark showing for a query class (row) the percentage of hits in the different classes (columns) for the proposed SPCD (a) and the CD by Lian et al. [sent-419, score-0.216]

86 We also performed four retrieval tests of both descriptors against different subsets of the non-deformed models of the McGill Shape Benchmark: on all 19 classes, on the subset of articulated classes, on the moderately articulated classes and on a set of selected classes. [sent-427, score-0.31]

87 On the articulated models the SPCD is still better than the CD, however, the result is much worse than on the canonical forms (cf. [sent-430, score-0.2]

88 Furthermore, descriptors only utilizing convexity (local or global) will never be able to perfectly separate similar classes such as birds from airplanes or spiders from ants (cf. [sent-460, score-0.407]

89 The original McGill Shape Benchmark contains more diverse poses within the classes, leading to a worse retrieval result than on the set of canonical forms of the same. [sent-464, score-0.197]

90 Computing the 10000 samples for the CD takes a couple of minutes even for relatively small meshes with 5k vertices. [sent-470, score-0.142]

91 This is about the same time needed to compute SPC on a mesh with 260k vertices (cf. [sent-471, score-0.224]

92 Both methods were evaluated on watertight (closed) meshes, in order to guarantee a consistent and unique concavity measure. [sent-477, score-0.176]

93 This poses the additional constraint that the meshes be free of self-intersections. [sent-480, score-0.186]

94 The FMM provides a very accurate approximation; using simple Dijkstra-like shortest paths, possibly even on simpler regular grids, would be possible but would introduce larger errors and a stronger tessellation dependency. [sent-483, score-0.323]

95 The efficiency of our method is evaluated on a variety of different meshes and using a Fast Marching approach we compute accurate distances already at low CDT resolutions. [sent-491, score-0.171]

96 However, when applied to high-resolution, irregular input meshes, the CDT time will eventually become a bottleneck. [sent-492, score-0.178]

97 The two Max Planck heads in Table 1 demonstrate exemplarily how the CDT timings can be reduced significantly by uniformly remeshing the input, without visually altering the SPC or significantly changing the descriptor values. [sent-493, score-0.141]

98 Linear time algorithms for visibility and shortest path problems inside simple polygons. [sent-550, score-0.221]

99 Computing first-arrival seismic traveltimes on unstructured 3-d tetrahedral grids using the fast marching method. [sent-577, score-0.116]

100 A simple and efficient approach for 3d mesh approximate convex decomposition. [sent-602, score-0.282]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cdt', 0.364), ('spc', 0.356), ('spcd', 0.328), ('hull', 0.251), ('mcgill', 0.194), ('shortest', 0.185), ('convexity', 0.183), ('concavity', 0.176), ('cm', 0.169), ('mesh', 0.154), ('meshes', 0.142), ('cd', 0.129), ('convex', 0.128), ('lian', 0.12), ('marching', 0.116), ('outer', 0.11), ('slc', 0.109), ('tessellation', 0.105), ('vol', 0.103), ('fmm', 0.091), ('oc', 0.083), ('canonical', 0.079), ('tetrahedralization', 0.073), ('vertices', 0.07), ('articulated', 0.066), ('retrieval', 0.063), ('dist', 0.062), ('lien', 0.06), ('meshing', 0.06), ('constraining', 0.059), ('delaunay', 0.056), ('campen', 0.055), ('dcg', 0.055), ('mskew', 0.055), ('remeshing', 0.055), ('scpdc', 0.055), ('socg', 0.055), ('spiders', 0.055), ('forms', 0.055), ('airplanes', 0.052), ('straight', 0.05), ('descriptor', 0.049), ('ants', 0.049), ('moderately', 0.047), ('offset', 0.046), ('proper', 0.045), ('free', 0.044), ('intersections', 0.043), ('shape', 0.043), ('classes', 0.041), ('genus', 0.04), ('polygons', 0.039), ('timings', 0.037), ('buss', 0.036), ('kremer', 0.036), ('mamou', 0.036), ('mkurt', 0.036), ('openflipper', 0.036), ('openvolumemesh', 0.036), ('plank', 0.036), ('pockets', 0.036), ('remeshed', 0.036), ('roundtable', 0.036), ('tessellates', 0.036), ('zunic', 0.036), ('queue', 0.036), ('planes', 0.036), ('irregular', 0.036), ('path', 0.036), ('confusion', 0.035), ('faces', 0.034), ('tight', 0.034), ('vertex', 0.034), ('measures', 0.033), ('approximation', 0.033), ('acd', 0.032), ('godil', 0.032), ('kobbelt', 0.032), ('mtion', 0.032), ('obius', 0.032), ('pocket', 0.032), ('rosin', 0.032), ('slits', 0.032), ('triangle', 0.032), ('visualizes', 0.032), ('overly', 0.032), ('surface', 0.032), ('pages', 0.031), ('octopus', 0.03), ('slit', 0.03), ('distances', 0.029), ('paths', 0.029), ('tessellated', 0.028), ('coarse', 0.027), ('descriptors', 0.027), ('zimmer', 0.027), ('delicate', 0.027), ('narrow', 0.027), ('tagged', 0.026), ('measure', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 141 cvpr-2013-Efficient Computation of Shortest Path-Concavity for 3D Meshes

Author: Henrik Zimmer, Marcel Campen, Leif Kobbelt

Abstract: In the context of shape segmentation and retrieval, object-wide distributions of measures are needed to accurately evaluate and compare local regions of shapes. Lien et al. [16] proposed two point-wise concavity measures in the context of Approximate Convex Decompositions of polygons, measuring the distance from a point to the polygon's convex hull: an accurate Shortest Path-Concavity (SPC) measure and a Straight Line-Concavity (SLC) approximation of the same. While both are practicable on 2D shapes, the exponential cost of SPC in 3D makes it prohibitively expensive for a generalization to meshes [14]. In this paper we propose an efficient and straightforward approximation of the Shortest Path-Concavity measure for 3D meshes. Our approximation is based on discretizing the space between mesh and convex hull, thereby reducing the continuous Shortest Path search to an efficiently solvable graph problem. Our approach works out-of-the-box on complex mesh topologies and requires no complicated handling of genus. Besides presenting a rigorous evaluation of our method on a variety of input meshes, we also define an SPC-based Shape Descriptor and show its superior retrieval and runtime performance compared with the recently presented results on the Convexity Distribution by Lian et al. [12].

2 0.1372513 316 cvpr-2013-Optical Flow Estimation Using Laplacian Mesh Energy

Author: Wenbin Li, Darren Cosker, Matthew Brown, Rui Tang

Abstract: In this paper we present a novel non-rigid optical flow algorithm for dense image correspondence and non-rigid registration. The algorithm uses a unique Laplacian Mesh Energy term to encourage local smoothness whilst simultaneously preserving non-rigid deformation. Laplacian deformation approaches have become popular in graphics research as they enable mesh deformations to preserve local surface shape. In this work we propose a novel Laplacian Mesh Energy formula to ensure such sensible local deformations between image pairs. We express this wholly within the optical flow optimization, and show its application in a novel coarse-to-fine pyramidal approach. Our algorithm achieves the state-of-the-art performance in all trials on the Garg et al. dataset, and top tier performance on the Middlebury evaluation.

3 0.12781735 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes

Author: Julien P.C. Valentin, Sunando Sengupta, Jonathan Warrell, Ali Shahrokni, Philip H.S. Torr

Abstract: Semantic reconstruction of a scene is important for a variety of applications such as 3D modelling, object recognition and autonomous robotic navigation. However, most object labelling methods work in the image domain and fail to capture the information present in 3D space. In this work we propose a principled way to generate object labelling in 3D. Our method builds a triangulated meshed representation of the scene from multiple depth estimates. We then define a CRF over this mesh, which is able to capture the consistency of geometric properties of the objects present in the scene. In this framework, we are able to generate object hypotheses by combining information from multiple sources: geometric properties (from the 3D mesh), and appearance properties (from images). We demonstrate the robustness of our framework in both indoor and outdoor scenes. For indoor scenes we created an augmented version of the NYU indoor scene dataset (RGB-D images) with object labelled meshes for training and evaluation. For outdoor scenes, we created ground truth object labellings for the KITTI odometry dataset (stereo image sequence). We observe a significant speed-up in the inference stage by performing labelling on the mesh, and additionally achieve higher accuracies.

4 0.10762835 97 cvpr-2013-Correspondence-Less Non-rigid Registration of Triangular Surface Meshes

Author: Zsolt Sánta, Zoltan Kato

Abstract: A novel correspondence-less approach is proposed to find a thin plate spline map between a pair of deformable 3D objects represented by triangular surface meshes. The proposed method works without landmark extraction and feature correspondences. The aligning transformation is found simply by solving a system of nonlinear equations. Each equation is generated by integrating a nonlinear function over the object’s domains. We derive recursive formulas for the efficient computation of these integrals. Based on a series of comparative tests on a large synthetic dataset, our triangular mesh-based algorithm outperforms state of the art methods both in terms of computing time and accuracy. The applicability of the proposed approach has been demonstrated on the registration of 3D lung CT volumes.

5 0.10639348 297 cvpr-2013-Multi-resolution Shape Analysis via Non-Euclidean Wavelets: Applications to Mesh Segmentation and Surface Alignment Problems

Author: Won Hwa Kim, Moo K. Chung, Vikas Singh

Abstract: The analysis of 3-D shape meshes is a fundamental problem in computer vision, graphics, and medical imaging. Frequently, the needs of the application require that our analysis take a multi-resolution view of the shape's local and global topology, and that the solution is consistent across multiple scales. Unfortunately, the preferred mathematical construct which offers this behavior in classical image/signal processing, Wavelets, is no longer applicable in this general setting (data with non-uniform topology). In particular, the traditional definition does not allow writing out an expansion for graphs that do not correspond to the uniformly sampled lattice (e.g., images). In this paper, we adapt recent results in harmonic analysis, to derive Non-Euclidean Wavelets based algorithms for a range of shape analysis problems in vision and medical imaging. We show how descriptors derived from the dual domain representation offer native multi-resolution behavior for characterizing local/global topology around vertices. With only minor modifications, the framework yields a method for extracting interest/key points from shapes, a surprisingly simple algorithm for 3-D shape segmentation (competitive with state of the art), and a method for surface alignment (without landmarks). We give an extensive set of comparison results on a large shape segmentation benchmark and derive a uniqueness theorem for the surface alignment problem.

6 0.068839356 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

7 0.058597252 467 cvpr-2013-Wide-Baseline Hair Capture Using Strand-Based Refinement

8 0.058420449 432 cvpr-2013-Three-Dimensional Bilateral Symmetry Plane Estimation in the Phase Domain

9 0.0567054 327 cvpr-2013-Pattern-Driven Colorization of 3D Surfaces

10 0.055914044 41 cvpr-2013-An Iterated L1 Algorithm for Non-smooth Non-convex Optimization in Computer Vision

11 0.053694062 226 cvpr-2013-Intrinsic Characterization of Dynamic Surfaces

12 0.053570069 395 cvpr-2013-Shape from Silhouette Probability Maps: Reconstruction of Thin Objects in the Presence of Silhouette Extraction and Calibration Error

13 0.050016675 218 cvpr-2013-Improving the Visual Comprehension of Point Sets

14 0.048706766 208 cvpr-2013-Hyperbolic Harmonic Mapping for Constrained Brain Surface Registration

15 0.046820313 426 cvpr-2013-Tensor-Based Human Body Modeling

16 0.04648646 44 cvpr-2013-Area Preserving Brain Mapping

17 0.046346255 69 cvpr-2013-Boosting Binary Keypoint Descriptors

18 0.045449693 333 cvpr-2013-Plane-Based Content Preserving Warps for Video Stabilization

19 0.045356937 1 cvpr-2013-3D-Based Reasoning with Blocks, Support, and Stability

20 0.044826098 230 cvpr-2013-Joint 3D Scene Reconstruction and Class Segmentation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.121), (1, 0.055), (2, 0.002), (3, 0.016), (4, 0.035), (5, -0.035), (6, -0.024), (7, -0.024), (8, -0.011), (9, -0.027), (10, 0.018), (11, 0.013), (12, -0.037), (13, -0.022), (14, 0.031), (15, -0.05), (16, 0.033), (17, 0.023), (18, 0.033), (19, 0.071), (20, -0.081), (21, -0.006), (22, 0.028), (23, 0.023), (24, -0.145), (25, 0.022), (26, 0.067), (27, -0.024), (28, -0.048), (29, -0.072), (30, 0.045), (31, 0.063), (32, 0.056), (33, 0.045), (34, -0.037), (35, -0.02), (36, 0.014), (37, -0.06), (38, -0.026), (39, 0.022), (40, -0.02), (41, -0.027), (42, 0.025), (43, 0.02), (44, -0.007), (45, -0.025), (46, 0.012), (47, 0.077), (48, -0.028), (49, 0.105)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9008323 141 cvpr-2013-Efficient Computation of Shortest Path-Concavity for 3D Meshes

Author: Henrik Zimmer, Marcel Campen, Leif Kobbelt

Abstract: In the context of shape segmentation and retrieval, object-wide distributions of measures are needed to accurately evaluate and compare local regions of shapes. Lien et al. [16] proposed two point-wise concavity measures in the context of Approximate Convex Decompositions of polygons, measuring the distance from a point to the polygon's convex hull: an accurate Shortest Path-Concavity (SPC) measure and a Straight Line-Concavity (SLC) approximation of the same. While both are practicable on 2D shapes, the exponential cost of SPC in 3D makes it prohibitively expensive for a generalization to meshes [14]. In this paper we propose an efficient and straightforward approximation of the Shortest Path-Concavity measure for 3D meshes. Our approximation is based on discretizing the space between mesh and convex hull, thereby reducing the continuous Shortest Path search to an efficiently solvable graph problem. Our approach works out-of-the-box on complex mesh topologies and requires no complicated handling of genus. Besides presenting a rigorous evaluation of our method on a variety of input meshes, we also define an SPC-based Shape Descriptor and show its superior retrieval and runtime performance compared with the recently presented results on the Convexity Distribution by Lian et al. [12].

2 0.81189048 297 cvpr-2013-Multi-resolution Shape Analysis via Non-Euclidean Wavelets: Applications to Mesh Segmentation and Surface Alignment Problems

Author: Won Hwa Kim, Moo K. Chung, Vikas Singh

Abstract: The analysis of 3-D shape meshes is a fundamental problem in computer vision, graphics, and medical imaging. Frequently, the needs of the application require that our analysis take a multi-resolution view of the shape's local and global topology, and that the solution is consistent across multiple scales. Unfortunately, the preferred mathematical construct which offers this behavior in classical image/signal processing, Wavelets, is no longer applicable in this general setting (data with non-uniform topology). In particular, the traditional definition does not allow writing out an expansion for graphs that do not correspond to the uniformly sampled lattice (e.g., images). In this paper, we adapt recent results in harmonic analysis, to derive Non-Euclidean Wavelets based algorithms for a range of shape analysis problems in vision and medical imaging. We show how descriptors derived from the dual domain representation offer native multi-resolution behavior for characterizing local/global topology around vertices. With only minor modifications, the framework yields a method for extracting interest/key points from shapes, a surprisingly simple algorithm for 3-D shape segmentation (competitive with state of the art), and a method for surface alignment (without landmarks). We give an extensive set of comparison results on a large shape segmentation benchmark and derive a uniqueness theorem for the surface alignment problem.

3 0.74757963 316 cvpr-2013-Optical Flow Estimation Using Laplacian Mesh Energy

Author: Wenbin Li, Darren Cosker, Matthew Brown, Rui Tang

Abstract: In this paper we present a novel non-rigid optical flow algorithm for dense image correspondence and non-rigid registration. The algorithm uses a unique Laplacian Mesh Energy term to encourage local smoothness whilst simultaneously preserving non-rigid deformation. Laplacian deformation approaches have become popular in graphics research as they enable mesh deformations to preserve local surface shape. In this work we propose a novel Laplacian Mesh Energy formula to ensure such sensible local deformations between image pairs. We express this wholly within the optical flow optimization, and show its application in a novel coarse-to-fine pyramidal approach. Our algorithm achieves the state-of-the-art performance in all trials on the Garg et al. dataset, and top tier performance on the Middlebury evaluation.

4 0.74272323 44 cvpr-2013-Area Preserving Brain Mapping

Author: Zhengyu Su, Wei Zeng, Rui Shi, Yalin Wang, Jian Sun, Xianfeng Gu

Abstract: Brain mapping transforms the brain cortical surface to canonical planar domains, which plays a fundamental role in morphological study. Most existing brain mapping methods are based on angle preserving maps, which may introduce large area distortions. This work proposes an area preserving brain mapping method based on Monge-Brenier theory. The brain mapping is intrinsic to the Riemannian metric, unique, and diffeomorphic. The computation is equivalent to convex energy minimization and power Voronoi diagram construction. Compared to the existing approaches based on Monge-Kantorovich theory, the proposed one greatly reduces the complexity (from n^2 unknowns to n), and improves the simplicity and efficiency. Experimental results on caudate nucleus surface mapping and cortical surface mapping demonstrate the efficacy and efficiency of the proposed method. Conventional methods for caudate nucleus surface mapping may suffer from numerical instability; in contrast, the current method produces diffeomorphic mappings stably. In the study of cortical surface classification for recognition of Alzheimer’s Disease, the proposed method outperforms some other morphometry features.

5 0.69030404 208 cvpr-2013-Hyperbolic Harmonic Mapping for Constrained Brain Surface Registration

Author: Rui Shi, Wei Zeng, Zhengyu Su, Hanna Damasio, Zhonglin Lu, Yalin Wang, Shing-Tung Yau, Xianfeng Gu

Abstract: Automatic computation of surface correspondence via harmonic map is an active research field in computer vision, computer graphics and computational geometry. It may help document and understand physical and biological phenomena and also has broad applications in biometrics, medical imaging and motion capture. Although numerous studies have been devoted to harmonic map research, limited progress has been made to compute a diffeomorphic harmonic map on general topology surfaces with landmark constraints. This work conquers this problem by changing the Riemannian metric on the target surface to a hyperbolic metric, so that the harmonic mapping is guaranteed to be a diffeomorphism under landmark constraints. The computational algorithms are based on the Ricci flow method and the method is general and robust. We apply our algorithm to study the constrained human brain surface registration problem. Experimental results demonstrate that, by changing the Riemannian metric, the registrations are always diffeomorphic, and achieve relatively high performance when evaluated with some popular cortical surface registration evaluation standards.

6 0.67965627 97 cvpr-2013-Correspondence-Less Non-rigid Registration of Triangular Surface Meshes

7 0.65223402 226 cvpr-2013-Intrinsic Characterization of Dynamic Surfaces

8 0.63761014 327 cvpr-2013-Pattern-Driven Colorization of 3D Surfaces

9 0.63202071 298 cvpr-2013-Multi-scale Curve Detection on Surfaces

10 0.61993992 432 cvpr-2013-Three-Dimensional Bilateral Symmetry Plane Estimation in the Phase Domain

11 0.60330975 284 cvpr-2013-Mesh Based Semantic Modelling for Indoor and Outdoor Scenes

12 0.58392268 289 cvpr-2013-Monocular Template-Based 3D Reconstruction of Extensible Surfaces with Local Linear Elasticity

13 0.56761676 435 cvpr-2013-Towards Contactless, Low-Cost and Accurate 3D Fingerprint Identification

14 0.52067792 467 cvpr-2013-Wide-Baseline Hair Capture Using Strand-Based Refinement

15 0.49392572 218 cvpr-2013-Improving the Visual Comprehension of Point Sets

16 0.48986989 369 cvpr-2013-Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination

17 0.47583017 391 cvpr-2013-Sensing and Recognizing Surface Textures Using a GelSight Sensor

18 0.45173305 90 cvpr-2013-Computing Diffeomorphic Paths for Large Motion Interpolation

19 0.45145133 52 cvpr-2013-Axially Symmetric 3D Pots Configuration System Using Axis of Symmetry and Break Curve

20 0.44340941 321 cvpr-2013-PDM-ENLOR: Learning Ensemble of Local PDM-Based Regressions


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.108), (16, 0.013), (26, 0.052), (28, 0.014), (33, 0.175), (57, 0.342), (59, 0.016), (67, 0.037), (69, 0.06), (87, 0.097)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.71439356 141 cvpr-2013-Efficient Computation of Shortest Path-Concavity for 3D Meshes

Author: Henrik Zimmer, Marcel Campen, Leif Kobbelt

Abstract: In the context of shape segmentation and retrieval, object-wide distributions of measures are needed to accurately evaluate and compare local regions of shapes. Lien et al. [16] proposed two point-wise concavity measures in the context of Approximate Convex Decompositions of polygons, measuring the distance from a point to the polygon's convex hull: an accurate Shortest Path-Concavity (SPC) measure and a Straight Line-Concavity (SLC) approximation of the same. While both are practicable on 2D shapes, the exponential cost of SPC in 3D makes it prohibitively expensive for a generalization to meshes [14]. In this paper we propose an efficient and straightforward approximation of the Shortest Path-Concavity measure for 3D meshes. Our approximation is based on discretizing the space between mesh and convex hull, thereby reducing the continuous Shortest Path search to an efficiently solvable graph problem. Our approach works out-of-the-box on complex mesh topologies and requires no complicated handling of genus. Besides presenting a rigorous evaluation of our method on a variety of input meshes, we also define an SPC-based Shape Descriptor and show its superior retrieval and runtime performance compared with the recently presented results on the Convexity Distribution by Lian et al. [12].

2 0.63475883 95 cvpr-2013-Continuous Inference in Graphical Models with Polynomial Energies

Author: Mathieu Salzmann

Abstract: In this paper, we tackle the problem of performing inference in graphical models whose energy is a polynomial function of continuous variables. Our energy minimization method follows a dual decomposition approach, where the global problem is split into subproblems defined over the graph cliques. The optimal solution to these subproblems is obtained by making use of a polynomial system solver. Our algorithm inherits the convergence guarantees of dual decomposition. To speed up optimization, we also introduce a variant of this algorithm based on the augmented Lagrangian method. Our experiments illustrate the diversity of computer vision problems that can be expressed with polynomial energies, and demonstrate the benefits of our approach over existing continuous inference methods.

3 0.62685269 79 cvpr-2013-Cartesian K-Means

Author: Mohammad Norouzi, David J. Fleet

Abstract: A fundamental limitation of quantization techniques like the k-means clustering algorithm is the storage and runtime cost associated with the large numbers of clusters required to keep quantization errors small and model fidelity high. We develop new models with a compositional parameterization of cluster centers, so representational capacity increases super-linearly in the number of parameters. This allows one to effectively quantize data using billions or trillions of centers. We formulate two such models, Orthogonal k-means and Cartesian k-means. They are closely related to one another, to k-means, to methods for binary hash function optimization like ITQ [5], and to Product Quantization for vector quantization [7]. The models are tested on largescale ANN retrieval tasks (1M GIST, 1B SIFT features), and on codebook learning for object recognition (CIFAR-10).
The okmeans algorithm is a generalization of the Iterative Quantization (ITQ) algorithm for finding locality-sensitive binary codes [5]. The ck-means model is an extension of okmeans, and can be viewed as a generalization of PQ.1 We then report empirical results on large-scale ANN search tasks, and on codebook learning for recognition. For ANN search we use datasets of 1M GIST and 1B SIFT features, with both symmetric and asymmetric distance measures on the latent codes. We consider codebook learning for a generic form of recognition on CIFAR-10. 2. k-means Given a dataset of n p-dim points, D ≡ {xj }jn=1, the kmeans algorithm partitions ithme n points in ≡to { kx c}lusters, and 1A very similar generalization of PQ, was developed concurrently by Ge, He, Ke and Sun, and also appears in CVPR 2013 [4]. 333000111755 represents each cluster by a center point. Let C ∈ Rp×k breep a mseantrtsix e wachhos celu sctoelurm byns a comprise tihnet .k Lceltus Cter ∈ centers, i.e., C = [c1, c2 , · · · , ck] . The k-means objective is to minimize the within,-c··lu·s ,tecr squared distances: ?k-means(C) = = x?∈Dmiin?x − ci?22 x?∈Db∈mHin1/k?x − Cb?22 (2) where H1/k ≡ {b | b ∈ {0, 1}k and ?b? = 1}, i.e., b is a binary vee Hctor comprising a ,1-1o}f-ka encoding. Lloyd’s b k- imse aa bni-s algorithm [11] finds a local minimum of (2) by iterative, alternating optimization with respect to C and the b’s. The k-means model is simple and intuitive, using NN search to assign points to centers. The assignment of points to centers can be represented with a log k-bit index per data point. The cost of storing the centers grows linearly with k. 3. Orthogonal k-means with 2m centers With a compositional model one can represent cluster centers more efficiently. One such approach is to re- construct each input with an additive combination of the columns of C. To this end, instead of the 1-of-k encoding in (2), we let b be a general m-bit vector, b ∈ Hm ≡ {0, 1}m, (a2n)d, wCe e∈ l Rt bp× bme. aA gse nsuerchal, meac-bhi tc vluesctteorr c, ebnt ∈er H His th≡e sum o}f a asundbs Cet o∈f Rthe columns of C. There are 2m possible subsets, and therefore k = 2m centers. While the number of parameters in the quantizer is linear in m, the number of centers increases exponentially. While efficient in representing cluster centers, the approach is problematic, because solving bˆ = abrg∈Hmmin?x − Cb?22,(3) is intractable; i.e., the discrete optimization is not submodular. Obviously, for small 2m one could generate all possible centers and then perform NN search to find the optimal solution, but this would not scale well to large values of m. One key observation is that if the columns of C are orthogonal, then optimization (3) becomes tractable. To explain this, without loss of generality, assume the bits belong to {−1, 1} instead of {0, 1}, i.e., b? ∈ Hm? ≡ {−1, 1}m. tToh e{−n, ?x − Cb??22 = xTx + b?TCTCb? − 2xTCb? .(4) For diagonal CTC, the middle quadratic term on the RHS becomes trace(CTC), independent of b?. As a consequence, when C has orthogonal columns, abr?g∈mH?min?x − Cb??22 = sgn(CTx) ,(5) where sgn(·) is the element-wise sign function. Cluster centers are depicted by dots, and cluster boundaries by dashed lines. (Left) Clusters formed by a 2-cube with no rotation, scaling or translation; centers = {b? |b? ∈ H2? }. (Right) Rotation, scaling and translation are used to reduce distances between data points and cluster centers; centers = {μ + RDb? | b? ∈ H2? }. 
To reduce quantization error further we can also introduce an offset, denoted μ, to translate x. Taken together with (5), this leads to the loss function for the model we call orthogonal k-means (ok-means):2 ?ok-means(C,μ) = x?∈Db?m∈iHn?m?x − μ − Cb??22. (6) Clearly, with a change of variables, b? = 2b −1, we can define new versions of μ and C, with ide=ntic 2abl −lo1s,s, w wfoer c wanh dicehthe unknowns are binary, but in {0, 1}m. T uhnek nook-wmnesa anrse quantizer uetn icnod {e0,s 1ea}ch data point as a vertex of a transformed m-dimensional unit hypercube. The transform, via C and μ, maps the hypercube vertices onto the input feature space, ideally as close as possible to the data points. The matrix C has orthogonal columns and can therefore be expressed in terms of rotation and scaling; i.e., C ≡ RD, where R ∈ Rp×m has orthonormal cinoglu;m i.ne.s, ( CRT ≡R = R DIm,) w, ahnerde eD R Ris ∈ diagonal and positive definite. The goal of learning is to find the parameters, R, D, and μ, which minimize quantization error. Fig. 1 depicts a small set of 2D data points (red x’s) and two possible quantizations. Fig. 1 (left) depicts the vertices of a 2-cube with C = I2 and zero translation. The cluster regions are simply the four quadrants of the 2D space. The distances between data points and cluster centers, i.e., the quantization errors, are relatively large. By comparison, Fig. 1 (right) shows how a transformed 2-cube, the full model, can greatly reduce quantization errors. 3.1. Learning ok-means To derive the learning algorithm for ok-means we first rewrite the objective in matrix terms. Given n data points, let X = [x1, x2, · · · , xn] ∈ Rp×n. Let B? ∈ {−1, 1}m×n denote the corresponding ∈clu Rster assignment {co−ef1f,ic1i}ents. Our 2While closely related to ITQ, we use the the relationship to k-means. term ok-means to emphasize 333000111 686 goal is to find the assignment coefficients B? and the transformation parameters, namely, the rotation R, scaling D, and translation μ, that minimize ?ok-means(B?, R, D,μ) = ?X − μ1T − RDB??f2 (7) = ?X? − RDB??f2 (8) where ?·?f denotes the Frobenius norm, X? ≡ X − μ1T, R iws hceornes ?tr·a?ined to have orthonormal columns≡ ≡(R XTR − μ=1 Im), and D is a positive diagonal matrix. Like k-means, coordinate descent is effective for optimizing (8). We first initialize μ and R, and then iteratively optimize ?ok-means with respect to B?, D, R, and μ: • Optimize B? and D: With straightforward algebraic manipulation, one can show that ?ok-means = ?RTX?−DB??f2 + ?R⊥TX??f2 , (9) where columns of R⊥ span the orthogonal complement of the column-space of R (i.e., the block matrix [R R⊥] is orthogonal). It follows that, given X? and R, we can optimize the first term in (9) to solve for B? and D. Here, DB? is the leastsquares approximation to RTX?, where RTX? and DB? × are m n. Further, the ith row of DB? can only contain earleem men×tsn .fr Fomur t{h−erd,i t,h +e dii} where di = Dii. Wherever tehleem corresponding delement} }o fw RheTrXe? d is positive (negative) we should put a positive (negative) value in DB?. The optimal di is determined by the mean absolute value of the elements in the ith row of RTX?: • • B? ← sgn ?RTX?? (10) D ← Diag?(mroewan?(abs(RTX?))) (11) Optimize R: Using the original objective (8), find R that minimizes ?X? −RA?f2 , subject to RTR = Im, and Am n≡i mDizBes?. ?TXhis i−s equivalent to an Orthogonal Procrustes problem [15], and can be solved exactly using SVD. 
In particular, by adding p − m rows of zeros to the bottom poaf rDtic, uDlaBr, bb eyc aodmdeins p p× − n. mT rhoewn sR o ifs z square a tnhde orthogoonf aDl ,a DndB can boem eessti pm ×at end. Twhiethn RSV isD s. qBuuatr es ainncde oDrtBho gisdegenerate we are only interested in the first m columns of R; the remaining columns are unique only up to rotation of the null-space.) Optimize μ: Given R, B? and D, the optimal μ is given by the column average of X −RDB?: μ ← mcoeluamn(X−RDB?) (12) 3.2. Approximate nearest neighbor search One application of scalable quantization is ANN search. Before introducing more advanced quantization techniques, we describe some experimental results with ok-means on Euclidean ANN search. The benefits of ok-means for ANN search are two-fold. Storage costs for the centers is reduced to O(log k), from O(k) with k-means. Second, substantial speedups are possible by exploiting fast methods for NN search on binary codes in Hamming space (e.g., [13]). Generally, in terms of a generic quantizer q(·), there are twoG neanteurraalll ways etrom messti omfa ate g ethneer dicis qtaunanceti z beetrw qe(·e)n, ttwheor vectors, v and u [7]. Using the Symmetric quantizer distance (SQD) ?v−u? is approximated by ?q(v)−q(u) ? . Using the Asymmetric quantizer doixsimtanacteed (A byQ ?Dq()v, only one o Uf sthineg tw thoe vectors is quantized, and ?v−u? is estimated as ?v−q(u) ? . vWechtiloer sS iQs qDu might b, aen slightly f?as itse ers t iom compute, vA−QqD(u i)?n-. curs lower quantization errors, and hence is more accurate. For ANN search, in a pre-processing stage, each database entry, u, is encoded into a binary vector corresponding to the cluster center index to which u is assigned. At test time, the queries may or may not be encoded into indices, depending on whether one uses SQD or AQD. In the ok-means model, the quantization of an input x is straightforwardly shown to be qok(x) = μ + RD sgn(RT(x − μ)) . (13) The corresponding m-bit cluster index is sgn(RT(x − μ)). Given two indices, namely b?1 , b?2 ∈ {−1, +1}m,( txhe − symmetric ok-means quantizer distance∈ ∈is { SQDok(b?1, b?2) = ?μ + RDb?1 − μ − RDb?2 ?22 = ?D(b?1 − b?2)?22 .(14) In effect, SQDok is a weighted Hamming distance. It is the sum of the squared diagonal entries of D corresponding to bits where b?1 and b?2 differ. Interestingly, in our experiments with ok-means, Hamming and weighted Hamming distances yield similar results. Thus, in ok-means experiments we simply report results for Hamming distance, to facilitate comparisons with other techniques. When the scaling in ok-means is constrained to be isotropic (i.e., D = αIm for α ∈ R+), then SQDok becomes a constant multiple off othre α αu ∈sua Rl Hamming distance. As discussed in Sec. 5, this isotropic ok-means is closely related to ITQ [5]. The ok-means AQD between a feature vector x and a cluster index b?, is defined as AQDok(x, b?) = ?x −μ − RDb??22 = ?RTx? − Db??22 + ?R⊥Tx??22 , (15) where x? = x−μ. For ANN search, in comparing distances from query x t−o a .d Faotars AetN oNf binary i,n idni ccoesm, ptharei snegc odnisdta tnecrems on the RHS of (15) is irrelevant, since it does not depend on b?. Without this term, AQDok becomes a form of asymmetric Hamming (AH) distance between RTx? and b?. While previous work [6] discussed various ad hoc AH distance measures for binary hashing, interestingly, the optimal AH distance for ok-means is derived directly in our model. 333000111977 1M SIFT, 64−bit encoding (k = 264) lRc@aeR0 . 
206148 10 RIoPT1Qk K−Q m( AHeDa)Hn )s (AH)10K Figure 2. Euclidean ANN retrieval results for different methods and distance functions on the 1M SIFT dataset. 3.3. Experiments with ok-means Following [7], we report ANN search results on 1M SIFT, a corpus of 128D SIFT descriptors with disjoint sets of 100K training, 1M base, and 10K test vectors. The training set is used to train models. The base set is the database, and the test points are queries. The number of bits m is typically less than p, but no pre-processing is needed for dimensionality reduction. Rather, to initialize learning, R is a random rotation of the first m principal directions of the training data, and μ is the mean of the data. For each query we find R nearest neighbors, and compute Recall@R, the fraction of queries for which the ground-truth Euclidean NN is found in the R retrieved items. Fig. 2 shows the Recall@R plots for ok-means with Hamming (H) ≈ SQDok and asymmetric Hamming (AH) H≡a mAmQiDngok ( Hd)is t≈an SceQ, vs. ITQ [5] and PQ [7]. The PQ ≡met AhoQdD exploits a more complex asymmetric distance func- tion denoted AD ≡ AQDck (defined in Sec. 6. 1). Note first tthioant od ke-nmoeteadns A improves upon ITQ, with both Hamming and asymmetric Hamming distances. This is due to the scaling parameters (i.e., D) in ok-means. If one is interested in Hamming distance efficient retrieval, ok-means is prefered over ITQ. But better results are obtained with the asymmetric distance function. Fig. 2 also shows that PQ achieves superior recall rates. This stems from its joint encoding of multiple feature dimensions. In ok-means, each bit represents a partition ofthe feature space into two clusters, separated by a hyperplane. The intersection of m orthogonal hyperplanes yields 2m regions. Hence we obtain just two clusters per dimension, and each dimension is encoded independently. In PQ, by comparison, multiple dimensions are encoded jointly, with arbitrary numbers of clusters. PQ thereby captures statistical dependencies among different dimensions. We next extend ok-means to jointly encode multiple dimensions. 4. Cartesian k-means In the Cartesian k-means (ck-means) model, each region center is expressed parametrically as an additive combination of multiple subcenters. Let there be m sets of subcen- Fidg2ure31.Ddep4icton5fCartde?1si2nqdu?4ati5z?3aton 4qDck(da)t= ,?wd i?5134t?h the first (last) two dimensions sub-quantized on the left (right). Cartesian k-means quantizer denoted qck, combines the subquantizations in subspaces, and produces a 4D reconstruction. ters, each with h elements.3 Let C(i) be a matrix whose columns comprise the elements of the ith subcenter set; C(i) ∈ Rp×h. Finally, assume that each cluster center, c, is the∈ sum of exactly one element from each subcenter set: = ?C(i)b(i) m c i?= ?1 , (16) where b(i) ∈ H1/h is a 1-of-h encoding. As a conc∈re Hte example (see Fig. 3), suppose we are given 4D inputs, x ∈ R4, and we split each datum into m = 2 parts: z(1) = ?I2 0? x , and Then, suppose w?e z(2) = ?0 I2? x .(17) qu?antize each part, z(?1) and? z(2) , sepa- × rately. As depicted in Fig. 3 (left and middle), we could use h = 5 subcenters for each one. Placing the corresponding subcenters in the columns of 4 5 matrices C(1) and C(2) , C(1)=?d1d20d2×35d4d5?, C(2)=?d?1d?20d2×?35d?4d?5?, we obtain a model (16) that provides 52 possible centers with which to quantize the data. More generally, the total number of model centers is k = hm. 
Each center is a member of the Cartesian product of the subcenter sets, hence the name Cartesian k-means. Importantly, while the number of centers is h^m, the number of subcenters is only mh. The model thus provides a super-linear number of centers with a linear number of parameters.

The learning objective for Cartesian k-means is

ℓ_ck-means(C) = Σ_{x∈D} min_{{b^(i)}_{i=1}^m} || x − Σ_{i=1}^m C^(i) b^(i) ||² , (18)

where b^(i) ∈ H_{1/h}, and C ≡ [C^(1), ···, C^(m)] ∈ ℝ^{p×mh}. If we let b^T ≡ [b^(1)T, ···, b^(m)T], then the second sum in (18) can be expressed succinctly as Cb.

³While here we assume a fixed cardinality for all subcenter sets, the model is easily extended to allow sets with different cardinalities.

The key problem with this formulation is that the minimization of (18) with respect to the b^(i)'s is intractable. Nevertheless, motivated by orthogonal k-means (Sec. 3), encoding can be shown to be both efficient and exact if we impose orthogonality constraints on the sets of subcenters. To that end, assume that all subcenters in different sets are pairwise orthogonal, i.e.,

∀ i, j with i ≠ j : C^(i)T C^(j) = 0_{h×h} . (19)

Each subcenter matrix C^(i) spans a linear subspace of ℝ^p, and the linear subspaces for different subcenter sets do not intersect. Hence, (19) implies that only the subcenters in C^(i) can explain the projection of x onto the C^(i) subspace.

In the example depicted in Fig. 3, the input features are simply partitioned (17), and the subspaces clearly satisfy the orthogonality constraints. It is also clear that C ≡ [C^(1) C^(2)] is block diagonal, with 2×5 blocks denoted D^(1) and D^(2). The quantization error therefore becomes

||x − Cb||² = || [z^(1); z^(2)] − [D^(1) 0; 0 D^(2)] [b^(1); b^(2)] ||² = ||z^(1) − D^(1)b^(1)||² + ||z^(2) − D^(2)b^(2)||² .

In words, the squared quantization error is the sum of the squared errors on the subspaces. One can therefore solve for the binary coefficients of the subcenter sets independently.

In the general case, assuming (19) is satisfied, C can be expressed as a product RD, where R has orthonormal columns and D is block diagonal, i.e.,

C = RD , where R = [R^(1), ···, R^(m)] and D = diag(D^(1), ···, D^(m)) . (20)

With s_i ≡ rank(C^(i)), and hence C^(i) = R^(i)D^(i), it follows that D^(i) ∈ ℝ^{s_i×h} and R^(i) ∈ ℝ^{p×s_i}. Clearly, Σ s_i ≤ p, because of the orthogonality constraints. Replacing C^(i) with R^(i)D^(i) in the RHS of (18), we find

||x − Cb||² = Σ_{i=1}^m ||z^(i) − D^(i)b^(i)||² + ||R_⊥^T x||² , (21)

where z^(i) ≡ R^(i)T x, and R_⊥ is the orthogonal complement of R. This shows that, with the orthogonality constraints (19), the ck-means encoding problem can be split into m independent sub-encoding problems, one for each subcenter set.

To find the b^(i) that minimizes ||z^(i) − D^(i)b^(i)||², we perform NN search with z^(i) against the h s_i-dimensional vectors in D^(i). This entails a cost of O(h s_i). Fortunately, all the elements of b can be found very efficiently, in O(hs), where s ≡ Σ s_i. If we also include the cost of rotating x to obtain each z^(i), the total encoding cost is O(ps + hs), i.e., O(p² + hp). Alternatively, one could perform NN search in the p-dimensional C^(i)'s to find the b^(i)'s, which costs O(mhp).

Table 1. A summary of k-means, ok-means, and ck-means in terms of the number of centers, the number of bits needed for indices (i.e., log #centers), and the storage cost of the representation, which is the same as the encoding cost to convert inputs to indices. The last column shows the costs under the assumption that C has rank s ≥ m.
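A hedged sketch of this general encoding step follows. The parameters R^(i) and D^(i) are random placeholders here (in practice they come from the learning procedure of Sec. 4.1), and the dimensions mirror the SIFT setting used later. The O(ps + hs) cost is visible as one small matrix product plus one size-h search per subcenter set.

```python
import numpy as np

rng = np.random.default_rng(2)
p, m, h = 128, 8, 256
s_i = p // m                                # columns per R^(i); here s = sum(s_i) = p

# Placeholder parameters standing in for the learned model of Sec. 4.1.
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
R_blocks = [Q[:, i * s_i:(i + 1) * s_i] for i in range(m)]      # orthogonal R^(i)
D_blocks = [rng.normal(size=(s_i, h)) for _ in range(m)]        # subcenter projections D^(i)

def encode_ck(x):
    """Return the m subcenter indices (each in 0..h-1) minimizing (21)."""
    codes = []
    for R_i, D_i in zip(R_blocks, D_blocks):
        z_i = R_i.T @ x                                         # projection, cost O(p * s_i)
        dists = np.sum((D_i - z_i[:, None]) ** 2, axis=0)       # h candidates, cost O(h * s_i)
        codes.append(int(np.argmin(dists)))
    return codes

x = rng.normal(size=p)
print(encode_ck(x))     # m indices, stored in m * log2(h) = 64 bits
```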
Table 1 summarizes the models in terms of their number of centers, index size, and cost of storage and encoding.

4.1. Learning ck-means

We can re-write the ck-means objective (18) in matrix form with the Frobenius norm, i.e.,

ℓ_ck-means(R, D, B) = || X − RDB ||²_F , (22)

where the columns of X and B comprise the data points and the subcenter assignment coefficients. The input to the learning algorithm is the training data X, the number of subcenter sets m, the cardinality of the subcenter sets h, and an upper bound on the rank of C, i.e., s. In practice, we also let each rotation matrix R^(i) have the same number of columns, i.e., s_i = s/m. The outputs are the matrices {R^(i)} and {D^(i)} that provide a local minimum of (22). Learning begins with the initialization of R and D, followed by iterative coordinate descent in B, D, and R:

• Optimize B and D: With R fixed, the objective is given by (21), where ||R_⊥^T X||²_F is constant. Given the data projections Z^(i) ≡ R^(i)T X, to optimize for B and D we perform one step of k-means for each subcenter set:
 – Assignment: Perform NN searches for each subcenter set to find the assignment coefficients B^(i): B^(i) ← argmin_{B^(i)} ||Z^(i) − D^(i)B^(i)||²_F
 – Update: D^(i) ← argmin_{D^(i)} ||Z^(i) − D^(i)B^(i)||²_F
• Optimize R: Placing the D^(i)'s along the diagonal of D, as in (20), and concatenating the B^(i)'s as rows of B, i.e., B^T = [B^(1)T, ..., B^(m)T], the optimization of R reduces to the orthogonal Procrustes problem: R ← argmin_R ||X − RDB||²_F .

In the experiments below, R ∈ ℝ^{p×p}, and rank(C) ≤ p is unconstrained. For high-dimensional data where rank(X) ≪ p, it may be useful for efficiency to constrain rank(C).

5. Relations and related work

As mentioned above, there are close mathematical relationships between ok-means, ck-means, ITQ for binary hashing [5], and PQ for vector quantization [7]. It is instructive to specify these relationships in some detail.

Iterative Quantization vs. Orthogonal k-means. ITQ [5] is a variant of locality-sensitive hashing, mapping data to binary codes for fast retrieval. To extract m-bit codes, ITQ first zero-centers the data matrix to obtain X′. PCA is then used for dimensionality reduction, from p down to m dimensions, after which the subspace representation is randomly rotated. The composition of PCA and the random rotation can be expressed as W^T X′, where W ∈ ℝ^{p×m}. ITQ then solves for the m×m rotation matrix R that minimizes

ℓ_ITQ(B̂, R) = || R^T W^T X′ − B̂ ||²_F , s.t. R^T R = I_{m×m} , (23)

where B̂ ∈ {−1, +1}^{m×n}. ITQ rotates the subspace representation of the data to match the binary codes, and then minimizes quantization error within the subspace. By contrast, ok-means maps the binary codes into the original input space, and then considers both the quantization error within the subspace and the out-of-subspace projection error. A key difference is the inclusion of ||R_⊥^T X′||²_F in the ok-means objective (9). This is important since one can often reduce quantization errors by projecting out significant portions of the feature space. Another key difference between ITQ and ok-means is the inclusion of non-uniform scaling in ok-means. This is important when the data are not isotropic, and it contributes to the marked improvement in recall rates in Fig. 2.

Orthogonal k-means vs. Cartesian k-means. We next show that ok-means is a special case of ck-means with h = 2, where each subcenter set has two elements.
To this end, let C^(i) = [c1^(i) c2^(i)] be the i-th subcenter matrix, and let b^(i) = [b1^(i) b2^(i)]^T be the corresponding selection vector. Since b^(i) is a 1-of-2 encoding ((1, 0) or (0, 1)), it follows that

b1^(i) c1^(i) + b2^(i) c2^(i) = (c1^(i) + c2^(i))/2 + b̂_i (c1^(i) − c2^(i))/2 , (24)

where b̂_i ≡ b1^(i) − b2^(i) ∈ {−1, +1}. With the following setting of the ok-means parameters,

μ = Σ_{i=1}^m (c1^(i) + c2^(i))/2 , and C = [ (c1^(1) − c2^(1))/2 , ... , (c1^(m) − c2^(m))/2 ] ,

it should be clear that Σ_i C^(i) b^(i) = μ + Cb̂, where b̂ ∈ {−1, +1}^m and b̂_i is the i-th bit of b̂, as used in (24). Similarly, one can also map ok-means parameters onto corresponding subcenters for ck-means. Thus, there is a 1-to-1 mapping between the parameterizations of cluster centers in ok-means and ck-means for h = 2. The benefits of ok-means are its small number of parameters and its intuitive formulation. The benefit of the ck-means generalization is its joint encoding of multiple dimensions with an arbitrary number of centers, allowing it to capture intrinsic dependence among data dimensions.

Product Quantization vs. Cartesian k-means. PQ first splits the input vectors into m disjoint sub-vectors, and then quantizes each sub-vector separately using a sub-quantizer. Thus PQ is a special case of ck-means where the rotation R is not optimized; rather, R is fixed in both learning and retrieval. This is important because the sub-quantizers act independently, thereby ignoring inter-subspace statistical dependence. Thus the selection of subspaces is critical for PQ [7, 8]. Jégou et al. [8] suggest using PCA, followed by random rotation, before applying PQ to high-dimensional vectors. Finding the rotation that minimizes quantization error is mentioned in [8], but it was considered too difficult to estimate. Here we show that one can find a rotation to minimize the quantization error: the ck-means learning algorithm optimizes the sub-quantizers in the inner loop, but they interact in an outer loop that optimizes the rotation (Sec. 4.1).

6. Experiments

6.1. Euclidean ANN search

Euclidean ANN is a useful task for comparing quantization techniques. Given a database of ck-means indices and a query, we use the Asymmetric and Symmetric ck-means quantizer distances, denoted AQD_ck and SQD_ck. The AQD_ck between a query x and a binary index b ≡ [b^(1)T, ..., b^(m)T]^T, derived in (21), is

AQD_ck(x, b) = Σ_{i=1}^m ||z^(i) − D^(i)b^(i)||² + ||R_⊥^T x||² . (25)

Here, ||z^(i) − D^(i)b^(i)||² is the distance between the i-th projection of x, i.e., z^(i), and a subcenter projection from D^(i) selected by b^(i). Given a query, these distances for each i and all h possible values of b^(i) can be pre-computed and stored in a query-specific h×m lookup table. Once created, the lookup table is used to compare all database points to the query, so computing AQD_ck entails m lookups and additions, somewhat more costly than Hamming distance. The last term on the RHS of (25) is irrelevant for NN search. Since PQ is a special case of ck-means with pre-defined subspaces, the same distances are used for PQ (cf. [7]).

The SQD_ck between binary codes b1 and b2 is given by

SQD_ck(b1, b2) = Σ_{i=1}^m ||D^(i)b1^(i) − D^(i)b2^(i)||² . (26)

Since b1^(i) and b2^(i) are 1-of-h encodings, an m×h×h lookup table can be created to store all pairwise sub-distances.
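As a sketch of how the query-specific h×m table is used, the illustrative code below (our own, reusing the same placeholder R^(i)/D^(i) layout as the earlier encoding sketch) computes the distances ||z^(i) − D^(i)b^(i)||² once per query and then scores the database with m lookups and additions per item.

```python
import numpy as np

rng = np.random.default_rng(3)
p, m, h, n = 128, 8, 256, 10000
s_i = p // m

# Placeholder learned parameters and pretend database codes.
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
R_blocks = [Q[:, i * s_i:(i + 1) * s_i] for i in range(m)]
D_blocks = [rng.normal(size=(s_i, h)) for _ in range(m)]
db_codes = rng.integers(0, h, size=(n, m))      # n items, m subcenter indices each

def aqd_lookup_table(x):
    """h x m table: entry [c, i] = ||z^(i) - D^(i)[:, c]||^2, the first term of (25)."""
    table = np.empty((h, m))
    for i, (R_i, D_i) in enumerate(zip(R_blocks, D_blocks)):
        z_i = R_i.T @ x
        table[:, i] = np.sum((D_i - z_i[:, None]) ** 2, axis=0)
    return table

def rank_database(x, codes):
    table = aqd_lookup_table(x)
    # m lookups + additions per item; the constant ||R_perp^T x||^2 is omitted.
    scores = table[codes, np.arange(m)].sum(axis=1)
    return np.argsort(scores)

x_query = rng.normal(size=p)
print("top-5 items:", rank_database(x_query, db_codes)[:5])
```

A symmetric variant would replace the query-specific table with a pre-computed m×h×h table of pairwise subcenter distances, as in (26).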
Figure 4. Euclidean NN Recall@R (R = number of items retrieved) for different quantizers and corresponding distance functions on the 1M SIFT, 1M GIST, and 1B SIFT datasets, each with 64-bit encoding (k = 2^64). The dashed curves use symmetric distance. (AH ≡ AQD_ok, SD ≡ SQD_ck, AD ≡ AQD_ck)

While the cost of computing SQD_ck is the same as AQD_ck, SQD_ck could also be used to estimate the distance between indexed database entries, for example to diversify the retrieval results or to detect near duplicates.

Datasets. We use the 1M SIFT dataset, as in Sec. 3.3, along with two others, namely 1M GIST (960D features) and 1B SIFT, both comprising disjoint sets of training, base, and test vectors. 1M GIST has 500K training, 1M base, and 1K query vectors. 1B SIFT has 100M training, 1B base, and 10K query points. In each case, recall rates are averaged over the queries in the test set for a database populated from the base set. For expedience, we use only the first 1M training points for the 1B SIFT experiments.

Parameters. In the ANN experiments below, for both ck-means and PQ, we use m = 8 and h = 256. Hence the number of clusters is k = 256^8 = 2^64, so 64 bits are used as database indices. Using h = 256 is particularly attractive because the resulting lookup tables are small, encoding is fast, and each subcenter index fits into one byte. As h increases we expect retrieval results to improve, but encoding and indexing of a query to become slower.

Initialization. To initialize the D^(i)'s for learning, as in k-means, we simply begin with random samples from the set of Z^(i)'s (see Sec. 4.1). To initialize R we consider the different methods that Jégou et al. [7] proposed for splitting the feature dimensions into m sub-vectors for PQ: (1) natural: sub-vectors comprise consecutive dimensions, (2) structured: dimensions with the same index modulo 8 are grouped, and (3) random: random permutations are used. For PQ in the experiments below, we use the orderings that produced the best results in [7], namely the structured ordering for 960D GIST and the natural ordering for 128D SIFT. For learning ck-means, R is initialized to the identity for the SIFT corpora. For 1M GIST, where the PQ ordering is significant, we consider all three orderings to initialize R.

Results. Fig. 4 shows Recall@R plots for ck-means and PQ [7] with symmetric and asymmetric distances (SD ≡ SQD_ck and AD ≡ AQD_ck) on the 3 datasets. The horizontal axis represents the number of retrieved items, R, on a log scale. The results consistently favor ck-means. On the high-dimensional GIST data, ck-means with AD significantly outperforms the other methods; even ck-means with SD performs on par with PQ with AD. On 1M SIFT, the Recall@10 numbers for PQ and ck-means, both using AD, are 59.9% and 63.7%. On 1B SIFT, the Recall@100 numbers are 56.5% and 64.9%. As expected, with increasing dataset size, the difference between methods is more significant.
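For completeness, a small sketch of the Recall@R protocol behind Figs. 2 and 4 follows. This is an assumed implementation, not the authors' evaluation code; the random scores below merely stand in for quantizer distances.

```python
import numpy as np

def recall_at_R(rankings, ground_truth_nn, R):
    """rankings: (num_queries, N) item indices sorted by quantizer distance;
    ground_truth_nn: (num_queries,) index of the true Euclidean NN per query."""
    hits = [gt in ranking[:R] for ranking, gt in zip(rankings, ground_truth_nn)]
    return float(np.mean(hits))

# Toy usage with random scores standing in for AQD/SQD values.
rng = np.random.default_rng(4)
num_queries, N = 100, 5000
scores = rng.normal(size=(num_queries, N))
rankings = np.argsort(scores, axis=1)
ground_truth = rng.integers(0, N, size=num_queries)
for R in (1, 10, 100):
    print(f"Recall@{R} = {recall_at_R(rankings, ground_truth, R):.3f}")
```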
In 1B SIFT, each feature vector is 128 bytes, hence a total of 119 GB. Using any method in Fig. 4 (including ck-means) to index the database into 64 bits, this storage cost reduces to only 7.5 GB. This allows one to work with much larger datasets. In the experiments we use linear scan to find the nearest items according to the quantizer distances. For NN search using 10K SIFT queries on 1B SIFT, this takes about 8 hours for AD and AH, and 4 hours for Hamming distance, on a 2 × 4-core computer. Search can be sped up significantly, e.g., by using a coarse initial quantization and an inverted file structure for AD and AH, as suggested by [7, 1], and by using the multi-index hashing method of [13] for Hamming distance. In this paper we did not implement these efficiencies, as we focus primarily on the quality of quantization.

Figure 5. PQ and ck-means results on 1M GIST with 64-bit encoding (k = 2^64), using the natural (1), structured (2), and random (3) orderings to define the (initial) subspaces.

Fig. 5 compares ck-means to PQ when R in ck-means is initialized using the 3 orderings of [7]. It shows that ck-means is superior in all cases. Also interesting, it shows that despite the non-convexity of the optimization objective, ck-means learning tends to find similarly good encodings under different initial conditions. Finally, Fig. 6 compares the methods under different code dimensionality.

Figure 6. PQ and ck-means results on 1M GIST using 64, 96, and 128 bits for encoding. In all cases asymmetric distance is used.

Table 2. Recognition accuracy rates on CIFAR-10 using different codebook learning algorithms (k-means, ck-means, and PQ).

6.2. Learning visual codebooks

While the ANN search tasks above require too many clusters for k-means, it is interesting to compare k-means and ck-means on a task with a moderate number of clusters. To this end we consider codebook learning for bag-of-words models [3, 10]. We use ck-means with m = 2 and h = √k, and hence k centers. The main advantage of ck-means here is that finding the closest cluster center is done in O(√k) time, much faster than standard NN search with k-means in O(k) (a code sketch of this assignment step appears at the end of this subsection). Alternatives to k-means that improve efficiency include approximate k-means [14] and hierarchical k-means [12]; here we only compare to exact k-means.

CIFAR-10 [9] comprises 50K training and 10K test images (32×32 pixels). Each image belongs to one of 10 classes (airplane, bird, car, cat, deer, dog, frog, horse, ship, and truck). We use publicly available code from Coates et al. [2], with changes to the codebook learning and cluster assignment modules. Codebooks are built on 6×6 whitened color image patches. One histogram is created per image quadrant, and a linear SVM is applied to the 4k-dimensional feature vectors.

Recognition accuracy rates on the test set for different models and k are given in Table 2. Despite having fewer parameters, ck-means performs on par with or better than k-means. This is consistent for different initializations of the algorithms. Although k-means has higher fidelity than ck-means, ck-means, with fewer parameters, may be less susceptible to overfitting. Table 2 also compares with the approach of [19], where PQ without a learned rotation is used for clustering features. As expected, learning the rotation has a significant impact on recognition rates, outperforming all three initializations of PQ.
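To illustrate why assignment is O(√k) in this setting, the following sketch (our own, with placeholder parameters standing in for a learned model) assigns a patch to one of k = h² codewords using two searches of size h = √k each, after rotating the patch by the learned R.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 108                      # e.g. 6x6x3 whitened color patch, flattened
k = 1600
h = int(np.sqrt(k))          # 40 subcenters per set, m = 2 sets

# Placeholder "learned" parameters (Sec. 4.1 would provide these).
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
R1, R2 = Q[:, :p // 2], Q[:, p // 2:]
D1 = rng.normal(size=(p // 2, h))
D2 = rng.normal(size=(p - p // 2, h))

def assign(patch):
    """Codeword index in 0..k-1 via two O(h)-sized searches instead of one O(k) search."""
    i = np.argmin(np.sum((D1 - (R1.T @ patch)[:, None]) ** 2, axis=0))
    j = np.argmin(np.sum((D2 - (R2.T @ patch)[:, None]) ** 2, axis=0))
    return int(i * h + j)

patch = rng.normal(size=p)
print("codeword:", assign(patch))
```

The same routine, applied to every patch of an image, would feed the per-quadrant histograms described above.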
7. Conclusions

We present the Cartesian k-means algorithm, a generalization of k-means with a parameterization of the cluster centers such that the number of centers is super-linear in the number of parameters. The method is also shown to be a generalization of the ITQ algorithm and of Product Quantization. In experiments on large-scale retrieval and codebook learning for recognition the results are impressive, outperforming product quantization by a significant margin. An implementation of the method is available at https://github.com/norouzi/ckmeans.

References
[1] A. Babenko and V. Lempitsky. The inverted multi-index. CVPR, 2012.
[2] A. Coates, H. Lee, and A. Ng. An analysis of single-layer networks in unsupervised feature learning. AISTATS, 2011.
[3] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. ECCV Workshop on Statistical Learning in Computer Vision, 2004.
[4] T. Ge, K. He, Q. Ke, and J. Sun. Optimized product quantization for approximate nearest neighbor search. CVPR, 2013.
[5] Y. Gong and S. Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. CVPR, 2011.
[6] A. Gordo and F. Perronnin. Asymmetric distances for binary embeddings. CVPR, 2011.
[7] H. Jégou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Trans. PAMI, 2011.
[8] H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. CVPR, 2010.
[9] A. Krizhevsky. Learning multiple layers of features from tiny images. MSc Thesis, Univ. Toronto, 2009.
[10] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. CVPR, 2006.
[11] S. P. Lloyd. Least squares quantization in PCM. IEEE Trans. IT, 28(2):129–137, 1982.
[12] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. CVPR, 2006.
[13] M. Norouzi, A. Punjani, and D. J. Fleet. Fast search in Hamming space with multi-index hashing. CVPR, 2012.
[14] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. CVPR, 2007.
[15] P. Schönemann. A generalized solution of the orthogonal Procrustes problem. Psychometrika, 31, 1966.
[16] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. ICCV, 2003.
[17] J. Tenenbaum and W. Freeman. Separating style and content with bilinear models. Neural Comp., 12:1247–1283, 2000.
[18] A. Torralba, K. Murphy, and W. Freeman. Sharing visual features for multiclass and multiview object detection. IEEE Trans. PAMI, 29(5):854–869, 2007.
[19] S. Wei, X. Wu, and D. Xu. Partitioned k-means clustering for fast construction of unbiased visual vocabulary. The Era of Interactive Media, 2012.

4 0.59864187 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition

Author: Peng Wang, Jingdong Wang, Gang Zeng, Weiwei Xu, Hongbin Zha, Shipeng Li

Abstract: In visual recognition tasks, the design of low-level image feature representation is fundamental. The advent of local patch features built from pixel attributes, such as SIFT and LBP, has precipitated dramatic progress. Recently, a kernel view of these features, called kernel descriptors (KDES) [1], generalizes the feature design in an unsupervised fashion and yields impressive results. In this paper, we present a supervised framework to embed the image-level label information into the design of patch-level kernel descriptors, which we call supervised kernel descriptors (SKDES). Specifically, we adopt the broadly applied bag-of-words (BOW) image classification pipeline and a large-margin criterion to learn the low-level patch representation, which makes the patch features much more compact and more discriminative than KDES. With this method, we achieve competitive results on several public datasets compared with state-of-the-art methods.

5 0.59464669 255 cvpr-2013-Learning Separable Filters

Author: Roberto Rigamonti, Amos Sironi, Vincent Lepetit, Pascal Fua

Abstract: Learning filters to produce sparse image representations in terms of overcomplete dictionaries has emerged as a powerful way to create image features for many different purposes. Unfortunately, these filters are usually both numerous and non-separable, making their use computationally expensive. In this paper, we show that such filters can be computed as linear combinations of a smaller number of separable ones, thus greatly reducing the computational complexity at no cost in terms of performance. This makes filter learning approaches practical even for large images or 3D volumes, and we show that we significantly outperform state-of-the-art methods on the linear structure extraction task, in terms of both accuracy and speed. Moreover, our approach is general and can be used on generic filter banks to reduce the complexity of the convolutions.

6 0.5630669 362 cvpr-2013-Robust Monocular Epipolar Flow Estimation

7 0.55925012 298 cvpr-2013-Multi-scale Curve Detection on Surfaces

8 0.55707347 350 cvpr-2013-Reconstructing Loopy Curvilinear Structures Using Integer Programming

9 0.5556283 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities

10 0.5547635 248 cvpr-2013-Learning Collections of Part Models for Object Recognition

11 0.55289543 19 cvpr-2013-A Minimum Error Vanishing Point Detection Approach for Uncalibrated Monocular Images of Man-Made Environments

12 0.55140805 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics

13 0.54976714 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis

14 0.54885316 222 cvpr-2013-Incorporating User Interaction and Topological Constraints within Contour Completion via Discrete Calculus

15 0.54877967 71 cvpr-2013-Boundary Cues for 3D Object Shape Recovery

16 0.54662848 279 cvpr-2013-Manhattan Scene Understanding via XSlit Imaging

17 0.54606253 209 cvpr-2013-Hypergraphs for Joint Multi-view Reconstruction and Multi-object Tracking

18 0.54562569 400 cvpr-2013-Single Image Calibration of Multi-axial Imaging Systems

19 0.54448259 414 cvpr-2013-Structure Preserving Object Tracking

20 0.54438871 393 cvpr-2013-Separating Signal from Noise Using Patch Recurrence across Scales