iccv iccv2013 iccv2013-400 knowledge-graph by maker-knowledge-mining

400 iccv-2013-Stable Hyper-pooling and Query Expansion for Event Detection


Source: pdf

Author: Matthijs Douze, Jérôme Revaud, Cordelia Schmid, Hervé Jégou

Abstract: This paper makes two complementary contributions to event retrieval in large collections of videos. First, we propose hyper-pooling strategies that encode the frame descriptors into a representation of the video sequence in a stable manner. Our best choices compare favorably with regular pooling techniques based on k-means quantization. Second, we introduce a technique to improve the ranking. It can be interpreted either as a query expansion method or as a similarity adaptation based on the local context of the query video descriptor. Experiments on public benchmarks show that our methods are complementary and improve event retrieval results, without sacrificing efficiency.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Stable hyper-pooling and query expansion for event detection. Matthijs Douze (INRIA Grenoble), Jérôme Revaud (INRIA Grenoble). Abstract: This paper makes two complementary contributions to event retrieval in large collections of videos. [sent-1, score-0.8]

2 First, we propose hyper-pooling strategies that encode the frame descriptors into a representation of the video sequence in a stable manner. [sent-2, score-0.516]

3 Our best choices compare favorably with regular pooling techniques based on k-means quantization. [sent-3, score-0.28]

4 It can be interpreted either as a query expansion method or as a similarity adaptation based on the local context of the query video descriptor. [sent-5, score-0.772]

5 Experiments on public benchmarks show that our methods are complementary and improve event retrieval results, without sacrificing efficiency. [sent-6, score-0.227]

6 Introduction: Event retrieval in large collections of videos is an emerging problem. [sent-8, score-0.165]

7 Given a query video, the goal is to retrieve all the videos associated with the same event. [sent-9, score-0.322]

8 Solutions to this problem address the needs of individual and professional users, since the video content is not necessarily annotated with relevant tags, and pure text retrieval is not precise enough. [sent-10, score-0.167]

9 In the supervised case, considered, for example, in the multimedia event detection (MED) task of the TRECVID campaign [13], the system additionally receives a set of annotated videos representing the event. [sent-12, score-0.212]

10 In the unsupervised case, only the query video is provided. [sent-13, score-0.324]

11 Most state-of-the-art systems [5, 12, 18] first extract individual frame descriptors, which are then averaged to produce the video representation. [sent-14, score-0.201]

12 In the context of image representations, pooling strategies are widely used to reduce the loss of information induced by simply summing local descriptors [3, 4]. [sent-15, score-0.465]

13 This paper introduces a hyper-pooling approach for videos, which encodes frame descriptors into a single vector based on a second-layer clustering, as illustrated in Figure 1. [sent-16, score-0.318]

14 Cordelia Schmid (INRIA Grenoble), Hervé Jégou (INRIA Rennes). [...] pooled to produce frame descriptors. [sent-18, score-0.136]

15 We address the stability of the pooling, as techniques used to encode local patch descriptors are not stable enough. [sent-20, score-0.453]

16 Most of the research effort has been devoted to pooling at the image level. [sent-21, score-0.223]

17 [5] proposed scene-aligned pooling that fuses the frame descriptors per scene before computing a kernel based on these scenes. [sent-23, score-0.51]

18 They soft-assign each frame to 16 scene categories (using GIST descriptors) and then sum the frame descriptors using the weights of this soft-assignment. [sent-24, score-0.395]

19 Similarly, a recent method [18] based on circulant matching can be interpreted as a hyper-pooling technique in the frequency domain, where the average frame is complemented with frequency vectors, thereby incorporating temporal information. [sent-25, score-0.186]

20 The method was shown effective for copy detection, but for event retrieval it is only slightly better than simple average pooling, which is also used in several of the top-ranking systems of TRECVID MED 2012 [12]. [sent-26, score-0.227]

21 Hyper-pooling is more challenging than pooling of local descriptors: as noticed in [5], vector quantization of frame descriptors is prone to noise and, therefore, pooling techniques developed for image representation are not appropriate. [sent-27, score-0.813]

22 The first contribution of our paper is to introduce a video hyper-pooling method which relies on a stable and reliable vector quantization technique adapted to high-dimensional frame descriptors. [sent-28, score-0.386]

23 It is particularly adapted to videos comprising many shots, where average pooling tends to become less effective. [sent-29, score-0.310]

24 Our second contribution is a query expansion technique for event retrieval in videos when no training data is provided (unsupervised case). [sent-31, score-0.775]

25 Section 2 introduces the datasets and the frame representation that we use in this paper. [sent-36, score-0.139]

26 Section 3 analyzes the properties of our frame descriptors to design better pooling schemes. [sent-37, score-0.51]

27 EVVE is an event retrieval benchmark: given a single video of an event, it aims at retrieving videos related to the event from a dataset. [sent-48, score-0.48]

28 The baseline pooling and aggregation method for these PCA-reduced SIFTs is the VLAD descriptor [8]. [sent-72, score-0.325]

29 On the EVVE dataset, image descriptors are extracted for every video frame. [sent-74, score-0.244]

30 They are based on dense SIFT descriptors, which are aggregated into multiple VLAD descriptors with distinct vocabularies [7]. [sent-75, score-0.230]

31 Therefore, the first components of the MVLAD correspond to the most significant orientations in the descriptor space, but all components have equal variance after re-normalization. [sent-77, score-0.255]

32 Mean MVLAD is a fixed-size video descriptor averaging the set of MVLADs describing the frames of the video. [sent-79, score-0.176]

33 The cosine similarity is used to compare the query with all the database video descriptors. [sent-80, score-0.357]
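A minimal sketch of this MMV baseline (average the per-frame MVLADs, ℓ2-normalize, rank by inner product), written in Python/NumPy as an illustration; the function names are ours, not the authors':

    import numpy as np

    def mean_mvlad(frame_descriptors):
        # Average the per-frame MVLAD descriptors into one fixed-size video vector (MMV),
        # then L2-normalize so that the inner product equals the cosine similarity.
        v = np.asarray(frame_descriptors).mean(axis=0)
        return v / (np.linalg.norm(v) + 1e-12)

    def rank_videos(query_frames, database_frame_sets):
        # Rank database videos by cosine similarity to the query video.
        q = mean_mvlad(query_frames)
        db = np.stack([mean_mvlad(f) for f in database_frame_sets])
        scores = db @ q
        return np.argsort(-scores), scores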

34 Although this aggregation method is simple and produces a short descriptor (typically 512 dimensions), for event recognition it provides competitive results compared to a more sophisticated method based on Fourier domain comparison [18]. [sent-81, score-0.227]

35 Stable hyper-pooling: The objective of this section is to provide a strategy for encoding the sequence of frame descriptors of a video clip into a single vector, based on a pooling strategy. [sent-83, score-0.575]

36 This pooling strategy consists of two stages: hashing and encoding. [sent-84, score-0.434]

37 Let X denote a set of n d-dimensional vectors, as for example the MVLAD descriptors of Section 2. [sent-87, score-0.203]

38 It encodes a set of local descriptors into a single vector representation as follows. [sent-104, score-0.179]

39 First, each descriptor is assigned to a cell by k-means hashing, as in Equation 1. [sent-105, score-0.167]

40 All residuals associated with the same index j are added to produce a d-dimensional vector v_j = Σ_{t : q(x_t) = j} (x_t − c_j). [sent-107, score-0.160]

41 VLAD uses the k-means centroids both for pooling and encoding the descriptors. [sent-129, score-0.318]
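For reference, a compact sketch of this standard VLAD encoding (k-means hashing followed by per-cell residual sums); the centroids are assumed to be learned beforehand, and the code is an illustration rather than the authors' implementation:

    import numpy as np

    def vlad_encode(local_descriptors, centroids):
        # Assign each descriptor to its nearest k-means centroid (the hashing step),
        # then sum the residuals x_t - c_j within each cell and flatten.
        k, d = centroids.shape
        x = np.asarray(local_descriptors)
        assign = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        v = np.zeros((k, d))
        for j in range(k):
            members = x[assign == j]
            if len(members):
                v[j] = (members - centroids[j]).sum(axis=0)
        v = v.ravel()
        return v / (np.linalg.norm(v) + 1e-12)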

42 We consider a more general formulation, in which we separate the hashing function q(·) [sent-130, score-0.211]

43 from the encoding of the descriptors falling in each of the cells. [sent-131, score-0.209]

44 Then, we evaluate different hashing strategies, including the standard k-means. [sent-134, score-0.211]

45 This leads to a technique relying on stable components. [sent-135, score-0.16]

46 Properties of the MVLAD frame descriptors: The frame descriptors used as input to our pooling scheme have specific properties. [sent-139, score-0.797]

47 As a result, two vectors assigned to the same cell are almost orthogonal, as shown by the average distance between two vectors assigned to the same cell: for k = 32, it is typically close to the √2 distance expected between orthogonal unit vectors. [sent-144, score-0.212]

48 Another consequence is that the expectation of the vectors assigned to a given cell is close to the null vector. [sent-151, score-0.161]

49 This simplifies the computation of the per-cell aggregate in Eqn. 4 to a plain sum of the frame descriptors x_t assigned to that cell. [sent-155, score-0.190]

50 Second, assuming that, before whitening, the frame descriptor is altered by a small, temporally inconsistent isotropic noise, the components of the MVLAD associated with the largest eigenvalues are more stable over time than those associated with small eigenvalues. [sent-167, score-0.428]

51 Evaluation of frame hashing strategies: The previous discussion and variance analysis suggest that the pooling strategy should exploit the most stable (first) components of the MVLAD descriptor. [sent-171, score-0.813]

52 Unlike in bag-of-words, the objective of the hash function is not to find the best approximation of the vector, as the encoding stage will be considered independently. [sent-173, score-0.134]

53 The stability is maximized by a trivial solution, i.e., a constant hash that assigns all vectors to the same cell. [sent-177, score-0.168]

54 Although not directly related to the richness of the representation, entropy reflects how the frame vectors are spread over the different clusters. [sent-184, score-0.244]

55 The stability is measured by the rate of successive frames assigned to the same q(x), see Figure 3. [sent-186, score-0.22]

56 This reflects the temporal stability of the assignment, as most successive frames belong to the same shot and are expected to be hashed similarly. [sent-187, score-0.252]

57 We consider the following choices for the hashing function q(·). [sent-194, score-0.241]

58 We consider the hashing function q : R^d → {0, 1}^m, f ↦ (1[f_1 > 0], …, 1[f_m > 0]), i.e., the sign bits of the m first (most stable) components. [sent-205, score-0.221]

59 The previous choices are motivated by the stability criterion. [sent-222, score-0.154]
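A sketch of the two most stable hashing choices discussed here (SSC and partial k-means), assuming the frame descriptor f is already PCA-rotated so that its first components are the most stable; m = 5 gives 2^5 = 32 cells for SSC, and the PKM codebook is assumed to be trained beforehand on the m first components:

    import numpy as np

    def ssc_hash(f, m=5):
        # Sign of Stable Components: binarize the signs of the m first components
        # and pack the bits into a cell index in {0, ..., 2**m - 1}. No learning needed.
        bits = (np.asarray(f)[:m] > 0).astype(np.uint8)
        return int(bits @ (1 << np.arange(m)))

    def pkm_hash(f, partial_centroids):
        # Partial k-means: quantize only the m first components with a k-means codebook
        # learned on that sub-vector (partial_centroids has shape (k, m)).
        m = partial_centroids.shape[1]
        dists = ((np.asarray(f)[:m][None, :] - partial_centroids) ** 2).sum(-1)
        return int(dists.argmin())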

60 Table 1 compares the four strategies with respect to the stability in time, measured by the probability that the hashing key changes between two frames, and the diversity, measured by the entropy. [sent-226, score-0.376]
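Both measures can be computed directly from the sequence of per-frame cell indices; a small sketch of one reasonable reading of these criteria (our interpretation, not code from the paper):

    import numpy as np

    def temporal_stability(cell_ids):
        # Fraction of successive frames hashed to the same cell (higher = more stable).
        c = np.asarray(cell_ids)
        return float((c[1:] == c[:-1]).mean())

    def assignment_entropy(cell_ids, k):
        # Entropy (in bits) of the empirical cell distribution (higher = more diverse).
        p = np.bincount(cell_ids, minlength=k).astype(float)
        p = p / p.sum()
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())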

61 The k-means, used in pooling schemes such as bag-of-words or VLAD, is less stable over time and also has a lower diversity than its partial counterpart. [sent-227, score-0.375]

62 Evaluation of stability and diversity of different hashing techniques on a sample set of video clips. [sent-229, score-0.456]

63 Partial k-means is the best, but SSC does not require any learning stage if frame descriptors are already reduced by PCA. [sent-239, score-0.333]

64 The stability is illustrated on a video excerpt in Figure 4: PKM and SSC are visually more stable than the kd-tree and k-means quantizers. [sent-240, score-0.312]

65 Encoding: After hashing the frame descriptors, the descriptors are aggregated per quantization cell. [sent-243, score-0.499]

66 The construction of the (d × k)-dimensional video descriptor proceeds as follows. [sent-244, score-0.144]

67 We use the simple sum proposed in Equation 4 to aggregate the frame descriptors within a cell. [sent-246, score-0.211]

68 The resulting vectors are ℓ2-normalized and compared with the inner product, in order to compare the video descriptors with cosine similarity. [sent-252, score-0.323]
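Putting the hashing and encoding stages together, a minimal sketch of the resulting video encoder; multiple assignment is omitted here for brevity:

    import numpy as np

    def hyper_pool(frame_descriptors, hash_fn, k):
        # Hash each frame descriptor to one of k cells and sum the descriptors per cell
        # (a plain sum, following Equation 4), then flatten and L2-normalize so videos
        # can be compared with the cosine similarity.
        x = np.asarray(frame_descriptors)
        video = np.zeros((k, x.shape[1]))
        for f in x:
            video[hash_fn(f)] += f
        video = video.ravel()
        return video / (np.linalg.norm(video) + 1e-12)

For example, hyper_pool(frames, ssc_hash, k=32) uses the SSC hash from the earlier sketch, while hyper_pool(frames, lambda f: pkm_hash(f, C), k=len(C)) binds a learned PKM codebook C.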

69 Discussion: The pooling technique based on stable components is not specific to video, as it plays the same role as the quantizer used in VLAD to dispatch the set of descriptors into several cells. [sent-255, score-0.703]

70 [...] by using PKM and SSC as a pooling scheme for local descriptors transformed by PCA. [sent-259, score-0.402]

71 It is not surprising, as SIFT descriptors have a relatively low intrinsic dimensionality, and their assignment is [...]. (Figure caption fragment: hashing approaches with k = 32 cells, color = hash value.) [sent-266, score-0.313]

72 For pooling local descriptors, it is therefore more important to produce the best approximation of the descriptors. [sent-270, score-0.251]

73 Query expansion: similarity in context. Query expansion (QE) refers to re-ranking procedures that operate in two stages. [sent-275, score-0.189]

74 The first stage is standard: The nearest neighbors of the query vector are retrieved. [sent-276, score-0.332]

75 In the second stage, a few reliable neighbors are fused with the query to produce an expanded query that is submitted in turn in a second retrieval stage. [sent-277, score-0.734]

76 QE is effective for datasets comprising many positives per query, which guarantees that the expanded query is indeed better than the initial one. [sent-278, score-0.337]

77 Existing visual query expansion approaches [2, 6, 11] employ a geometric matching procedure to select the relevant images used in the expansion. [sent-280, score-0.448]

78 For very large sets of videos, it is, however, not reasonable to store all descriptors individually along with their spatial and temporal positions in the different frames. [sent-281, score-0.220]

79 Therefore, we consider only methods that rely on global descriptors, including those obtained by aggregating local descriptors as in [5, 18]. [sent-282, score-0.179]

80 It averages the BoW descriptor describing the query with those of the shortlist of nearest neighbors retrieved by the first stage: q_aqe = (1 / (1 + |N1|)) · (q + Σ_{b ∈ N1} b),  (7) [sent-286, score-0.389]

81 where q is the query vector and N1 is the neighborhood of q in the database. [sent-287, score-0.323]

82 A new nearest-neighbor search in the dataset is performed using q_aqe as the query vector. [sent-288, score-0.312]
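A sketch of this two-stage AQE procedure, assuming the database rows are ℓ2-normalized video descriptors and n1 is the shortlist size (a free parameter here, not a value taken from the paper):

    import numpy as np

    def aqe_rerank(q, database, n1=10):
        # Stage 1: retrieve the n1 nearest neighbors of the query by cosine similarity.
        order = np.argsort(-(database @ q))
        neighbors = database[order[:n1]]
        # Stage 2: average the query with its neighbors (Equation 7) and re-query.
        q_aqe = (q + neighbors.sum(axis=0)) / (1 + n1)
        q_aqe /= np.linalg.norm(q_aqe) + 1e-12
        return np.argsort(-(database @ q_aqe))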

83 Query vector q is averaged with the points in its neighborhood N1, and the average of the larger neighborhood N2 is subtracted from it (Equation 8). [sent-297, score-0.163]
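A sketch following this verbal description of the DoN expansion (average q with its small neighborhood N1, subtract the mean of the larger neighborhood N2); the exact weighting of Equation 8 is not reproduced here, and n1, n2 are illustrative parameters:

    import numpy as np

    def don_rerank(q, database, n1=10, n2=1000):
        # First-stage ranking by cosine similarity.
        order = np.argsort(-(database @ q))
        # Average the query with its n1 nearest neighbors ...
        expanded = np.vstack([q[None, :], database[order[:n1]]]).mean(axis=0)
        # ... and subtract the mean of the larger neighborhood N2 (the local context).
        q_don = expanded - database[order[:n2]].mean(axis=0)
        q_don /= np.linalg.norm(q_don) + 1e-12
        return np.argsort(-(database @ q_don))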

84 Experiments: We evaluate three aspects of our method: (i) the performance of our hyper-pooling; (ii) the improvement due to our query expansion method; and (iii) the behavior of the method when videos are merged with clutter. [sent-306, score-0.511]

85 Hyper-pooling: Table 2 compares the different quantizers mentioned in Section 3 in a retrieval setup, without considering query expansion at this stage. [sent-309, score-0.621]

86 The MVLAD and MMV descriptors we use here are relatively low-dimensional (512-D). [sent-311, score-0.179]

87 The value m is the number of dimensions of the frame descriptor used by the quantizer. [sent-314, score-0.187]

88 MA/k = 4/32 means 32 quantization cells with a multiple assignment of 4; dim is the video descriptor's dimension. [sent-315, score-0.233]

89 We also experimented with Fisher vectors [9] as frame descriptors. [sent-317, score-0.154]

90 With a mixture of 256 Gaussians, we obtain descriptors of 16384 D, but their performance is below that of 512 D MVLAD. [sent-318, score-0.179]

91 Overall, this shows that the performance of averaged frame descriptors based on SIFT saturates, irrespective of the descriptor size. [sent-319, score-0.366]

92 This improves the retrieval performance significantly, but only for the stable pooling methods. [sent-327, score-0.448]

93 In what follows and unless specified otherwise, we use SSC with 32 centroids and multiple assignment to 4 cells. [sent-332, score-0.141]

94 [...2%), a more...] Footnote 4: To perform multiple assignment with SSC, we start from the quantization cell q(f) = c ∈ {0, 1}^m (Eqn. [...]). [sent-335, score-0.185]
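The footnote is truncated above, so the following completion is an assumption: a natural way to perform multiple assignment with SSC is to start from the cell given by the sign bits and additionally visit the cells obtained by flipping the least reliable bits, i.e. those whose components are closest to zero.

    import numpy as np

    def ssc_multiple_assignment(f, m=5, ma=4):
        # Base cell: sign bits of the m first components (as in ssc_hash above).
        comps = np.asarray(f)[:m]
        bits = (comps > 0).astype(np.uint8)
        base = int(bits @ (1 << np.arange(m)))
        # Additional cells: flip the (ma - 1) bits with the smallest absolute component
        # values, one at a time (an assumed heuristic; the truncated text does not confirm it).
        order = np.argsort(np.abs(comps))
        return [base] + [base ^ (1 << int(i)) for i in order[:ma - 1]]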

95 In DQE [2], a discriminative SVM classifier is learned by using the query and the first 3 retrieval results as positives and the last 200 as negatives (we set C = 1). [sent-356, score-0.389]

96 Our DoN query expansion approach outperforms these two approaches, as it improves by +7. [sent-362, score-0.448]

97 The AQE and DoN query expansion techniques are defined in Equations 7 and 8. [sent-369, score-0.475]

98 When applied to the MMV descriptor, the DoN query expansion gives a mAP of 40. [sent-376, score-0.448]

99 SSC hyper-pooling and MMV in combination with DoN query expansion on EVVE. [sent-383, score-0.448]

100 We also evaluate our query expansion method for image retrieval on the Oxford5k dataset, using the classical VLAD descriptors (k = 64). [sent-388, score-0.729]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mvlad', 0.345), ('ssc', 0.306), ('evve', 0.282), ('query', 0.259), ('vlad', 0.225), ('pooling', 0.223), ('pkm', 0.212), ('mmv', 0.212), ('hashing', 0.211), ('expansion', 0.189), ('aqe', 0.188), ('descriptors', 0.179), ('qe', 0.127), ('event', 0.125), ('stability', 0.124), ('stable', 0.123), ('frame', 0.108), ('xt', 0.103), ('retrieval', 0.102), ('xtj', 0.087), ('descriptor', 0.079), ('assignment', 0.076), ('quantizer', 0.072), ('quantizers', 0.071), ('dqe', 0.071), ('components', 0.069), ('video', 0.065), ('centroids', 0.065), ('neighborhood', 0.064), ('videos', 0.063), ('hash', 0.058), ('cell', 0.056), ('grenoble', 0.054), ('mvlads', 0.053), ('qaqe', 0.053), ('rocchio', 0.053), ('quantization', 0.053), ('whitening', 0.053), ('entropy', 0.051), ('inria', 0.047), ('vectors', 0.046), ('stage', 0.046), ('maximized', 0.044), ('pca', 0.044), ('revaud', 0.044), ('strategies', 0.041), ('bn', 0.041), ('temporal', 0.041), ('richness', 0.039), ('cells', 0.039), ('variance', 0.038), ('technique', 0.037), ('subtracted', 0.035), ('cosine', 0.033), ('submitted', 0.033), ('med', 0.032), ('assigned', 0.032), ('sign', 0.032), ('successive', 0.032), ('frames', 0.032), ('introduces', 0.031), ('encoding', 0.03), ('choices', 0.03), ('shots', 0.03), ('diversity', 0.029), ('trecvid', 0.029), ('index', 0.029), ('produce', 0.028), ('subtracting', 0.028), ('positives', 0.028), ('confirmed', 0.028), ('vocabularies', 0.027), ('kmeans', 0.027), ('isotropic', 0.027), ('neighbors', 0.027), ('techniques', 0.027), ('null', 0.027), ('sift', 0.026), ('expanded', 0.026), ('bits', 0.025), ('indexes', 0.025), ('comprising', 0.024), ('findings', 0.024), ('aggregated', 0.024), ('bow', 0.024), ('efor', 0.024), ('shortlist', 0.024), ('flagged', 0.024), ('xtk', 0.024), ('aitv', 0.024), ('campaign', 0.024), ('alev', 0.024), ('sifts', 0.024), ('subvector', 0.024), ('signment', 0.024), ('aggregation', 0.023), ('fm', 0.023), ('shot', 0.023), ('summing', 0.022), ('eigenvalues', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 400 iccv-2013-Stable Hyper-pooling and Query Expansion for Event Detection

Author: Matthijs Douze, Jérôme Revaud, Cordelia Schmid, Hervé Jégou

Abstract: This paper makes two complementary contributions to event retrieval in large collections of videos. First, we propose hyper-pooling strategies that encode the frame descriptors into a representation of the video sequence in a stable manner. Our best choices compare favorably with regular pooling techniques based on k-means quantization. Second, we introduce a technique to improve the ranking. It can be interpreted either as a query expansion method or as a similarity adaptation based on the local context of the query video descriptor. Experiments on public benchmarks show that our methods are complementary and improve event retrieval results, without sacrificing efficiency.

2 0.2509582 266 iccv-2013-Mining Multiple Queries for Image Retrieval: On-the-Fly Learning of an Object-Specific Mid-level Representation

Author: Basura Fernando, Tinne Tuytelaars

Abstract: In this paper we present a new method for object retrieval starting from multiple query images. The use of multiple queries allows for a more expressive formulation of the query object including, e.g., different viewpoints and/or viewing conditions. This, in turn, leads to more diverse and more accurate retrieval results. When no query images are available to the user, they can easily be retrieved from the internet using a standard image search engine. In particular, we propose a new method based on pattern mining. Using the minimal description length principle, we derive the most suitable set of patterns to describe the query object, with patterns corresponding to local feature configurations. This results in a powerful object-specific mid-level image representation. The archive can then be searched efficiently for similar images based on this representation, using a combination of two inverted file systems. Since the patterns already encode local spatial information, good results on several standard image retrieval datasets are obtained even without costly re-ranking based on geometric verification.

3 0.23328507 229 iccv-2013-Large-Scale Video Hashing via Structure Learning

Author: Guangnan Ye, Dong Liu, Jun Wang, Shih-Fu Chang

Abstract: Recently, learning based hashing methods have become popular for indexing large-scale media data. Hashing methods map high-dimensional features to compact binary codes that are efficient to match and robust in preserving original similarity. However, most of the existing hashing methods treat videos as a simple aggregation of independent frames and index each video through combining the indexes of frames. The structure information of videos, e.g., discriminative local visual commonality and temporal consistency, is often neglected in the design of hash functions. In this paper, we propose a supervised method that explores the structure learning techniques to design efficient hash functions. The proposed video hashing method formulates a minimization problem over a structure-regularized empirical loss. In particular, the structure regularization exploits the common local visual patterns occurring in video frames that are associated with the same semantic class, and simultaneously preserves the temporal consistency over successive frames from the same video. We show that the minimization objective can be efficiently solved by an Accelerated Proximal Gradient (APG) method. Extensive experiments on two large video benchmark datasets (up to around 150K video clips with over 12 million frames) show that the proposed method significantly outperforms the state-of-the-art hashing methods.

4 0.19800323 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search

Author: Dror Aiger, Efi Kokiopoulou, Ehud Rivlin

Abstract: We propose two solutions for both nearest neighbors and range search problems. For the nearest neighbors problem, we propose a c-approximate solution for the restricted version of the decision problem with bounded radius which is then reduced to the nearest neighbors by a known reduction. For range searching we propose a scheme that learns the parameters in a learning stage, adapting them to the case of a set of points with low intrinsic dimension that are embedded in high dimensional space (a common scenario for image point descriptors). We compare our algorithms to the best known methods for these problems, i.e. LSH, ANN and FLANN. We show analytically and experimentally that we can do better for moderate approximation factor. Our algorithms are trivial to parallelize. In the experiments conducted, running on a couple of million images, our algorithms show meaningful speed-ups when compared with the above mentioned methods.

5 0.19338256 162 iccv-2013-Fast Subspace Search via Grassmannian Based Hashing

Author: Xu Wang, Stefan Atev, John Wright, Gilad Lerman

Abstract: The problem of efficiently deciding which of a database of models is most similar to a given input query arises throughout modern computer vision. Motivated by applications in recognition, image retrieval and optimization, there has been significant recent interest in the variant of this problem in which the database models are linear subspaces and the input is either a point or a subspace. Current approaches to this problem have poor scaling in high dimensions, and may not guarantee sublinear query complexity. We present a new approach to approximate nearest subspace search, based on a simple, new locality sensitive hash for subspaces. Our approach allows point-to-subspace query for a database of subspaces of arbitrary dimension d, in a time that depends sublinearly on the number of subspaces in the database. The query complexity of our algorithm is linear in the ambient dimension D, allowing it to be directly applied to high-dimensional imagery data. Numerical experiments on model problems in image repatching and automatic face recognition confirm the advantages of our algorithm in terms of both speed and accuracy.

6 0.18036498 83 iccv-2013-Complementary Projection Hashing

7 0.17588505 127 iccv-2013-Dynamic Pooling for Complex Event Recognition

8 0.15151769 232 iccv-2013-Latent Space Sparse Subspace Clustering

9 0.15118751 419 iccv-2013-To Aggregate or Not to aggregate: Selective Match Kernels for Image Search

10 0.14960188 334 iccv-2013-Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval

11 0.1433053 333 iccv-2013-Quantize and Conquer: A Dimensionality-Recursive Solution to Clustering, Vector Quantization, and Image Retrieval

12 0.14127606 13 iccv-2013-A General Two-Step Approach to Learning-Based Hashing

13 0.13159598 378 iccv-2013-Semantic-Aware Co-indexing for Image Retrieval

14 0.12685767 314 iccv-2013-Perspective Motion Segmentation via Collaborative Clustering

15 0.12230952 239 iccv-2013-Learning Hash Codes with Listwise Supervision

16 0.12029663 396 iccv-2013-Space-Time Robust Representation for Action Recognition

17 0.11619718 221 iccv-2013-Joint Inverted Indexing

18 0.11158045 210 iccv-2013-Image Retrieval Using Textual Cues

19 0.10956219 147 iccv-2013-Event Recognition in Photo Collections with a Stopwatch HMM

20 0.10932431 450 iccv-2013-What is the Most EfficientWay to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search?


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.186), (1, 0.106), (2, -0.031), (3, -0.038), (4, -0.018), (5, 0.32), (6, 0.052), (7, 0.006), (8, -0.171), (9, 0.092), (10, 0.005), (11, -0.001), (12, 0.034), (13, 0.09), (14, -0.124), (15, -0.053), (16, 0.099), (17, -0.062), (18, 0.131), (19, -0.008), (20, 0.029), (21, -0.003), (22, -0.025), (23, -0.02), (24, -0.075), (25, 0.008), (26, -0.012), (27, -0.006), (28, 0.015), (29, 0.019), (30, -0.042), (31, 0.025), (32, 0.005), (33, -0.005), (34, -0.004), (35, -0.038), (36, 0.006), (37, -0.045), (38, 0.037), (39, 0.061), (40, -0.051), (41, 0.011), (42, -0.002), (43, 0.042), (44, -0.024), (45, 0.009), (46, 0.028), (47, -0.019), (48, -0.023), (49, -0.003)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9581365 400 iccv-2013-Stable Hyper-pooling and Query Expansion for Event Detection

Author: Matthijs Douze, Jérôme Revaud, Cordelia Schmid, Hervé Jégou

Abstract: This paper makes two complementary contributions to event retrieval in large collections of videos. First, we propose hyper-pooling strategies that encode the frame descriptors into a representation of the video sequence in a stable manner. Our best choices compare favorably with regular pooling techniques based on k-means quantization. Second, we introduce a technique to improve the ranking. It can be interpreted either as a query expansion method or as a similarity adaptation based on the local context of the query video descriptor. Experiments on public benchmarks show that our methods are complementary and improve event retrieval results, without sacrificing efficiency.

2 0.7881642 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search

Author: Dror Aiger, Efi Kokiopoulou, Ehud Rivlin

Abstract: We propose two solutions for both nearest neighbors and range search problems. For the nearest neighbors problem, we propose a c-approximate solution for the restricted version of the decision problem with bounded radius which is then reduced to the nearest neighbors by a known reduction. For range searching we propose a scheme that learns the parameters in a learning stage, adapting them to the case of a set of points with low intrinsic dimension that are embedded in high dimensional space (a common scenario for image point descriptors). We compare our algorithms to the best known methods for these problems, i.e. LSH, ANN and FLANN. We show analytically and experimentally that we can do better for moderate approximation factor. Our algorithms are trivial to parallelize. In the experiments conducted, running on a couple of million images, our algorithms show meaningful speed-ups when compared with the above mentioned methods.

3 0.75762463 162 iccv-2013-Fast Subspace Search via Grassmannian Based Hashing

Author: Xu Wang, Stefan Atev, John Wright, Gilad Lerman

Abstract: The problem of efficiently deciding which of a database of models is most similar to a given input query arises throughout modern computer vision. Motivated by applications in recognition, image retrieval and optimization, there has been significant recent interest in the variant of this problem in which the database models are linear subspaces and the input is either a point or a subspace. Current approaches to this problem have poor scaling in high dimensions, and may not guarantee sublinear query complexity. We present a new approach to approximate nearest subspace search, based on a simple, new locality sensitive hash for subspaces. Our approach allows point-to-subspace query for a database of subspaces of arbitrary dimension d, in a time that depends sublinearly on the number of subspaces in the database. The query complexity of our algorithm is linear in the ambient dimension D, allowing it to be directly applied to high-dimensional imagery data. Numerical experiments on model problems in image repatching and automatic face recognition confirm the advantages of our algorithm in terms of both speed and accuracy.

4 0.75460708 221 iccv-2013-Joint Inverted Indexing

Author: Yan Xia, Kaiming He, Fang Wen, Jian Sun

Abstract: Inverted indexing is a popular non-exhaustive solution to large scale search. An inverted file is built by a quantizer such as k-means or a tree structure. It has been found that multiple inverted files, obtained by multiple independent random quantizers, are able to achieve practically good recall and speed. Instead of computing the multiple quantizers independently, we present a method that creates them jointly. Our method jointly optimizes all codewords in all quantizers. Then it assigns these codewords to the quantizers. In experiments this method shows significant improvement over various existing methods that use multiple independent quantizers. On the one-billion set of SIFT vectors, our method is faster and more accurate than a recent state-of-the-art inverted indexing method.

5 0.75136334 419 iccv-2013-To Aggregate or Not to aggregate: Selective Match Kernels for Image Search

Author: Giorgos Tolias, Yannis Avrithis, Hervé Jégou

Abstract: This paper considers a family of metrics to compare images based on their local descriptors. It encompasses the VLAD descriptor and matching techniques such as Hamming Embedding. Making the bridge between these approaches leads us to propose a match kernel that takes the best of existing techniques by combining an aggregation procedure with a selective match kernel. Finally, the representation underpinning this kernel is approximated, providing a large scale image search both precise and scalable, as shown by our experiments on several benchmarks.

6 0.74753439 450 iccv-2013-What is the Most EfficientWay to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search?

7 0.73045999 334 iccv-2013-Query-Adaptive Asymmetrical Dissimilarities for Visual Object Retrieval

8 0.72576183 266 iccv-2013-Mining Multiple Queries for Image Retrieval: On-the-Fly Learning of an Object-Specific Mid-level Representation

9 0.70975924 333 iccv-2013-Quantize and Conquer: A Dimensionality-Recursive Solution to Clustering, Vector Quantization, and Image Retrieval

10 0.69064832 159 iccv-2013-Fast Neighborhood Graph Search Using Cartesian Concatenation

11 0.66409636 294 iccv-2013-Offline Mobile Instance Retrieval with a Small Memory Footprint

12 0.64298195 378 iccv-2013-Semantic-Aware Co-indexing for Image Retrieval

13 0.63265157 446 iccv-2013-Visual Semantic Complex Network for Web Images

14 0.60811698 229 iccv-2013-Large-Scale Video Hashing via Structure Learning

15 0.55679077 3 iccv-2013-3D Sub-query Expansion for Improving Sketch-Based Multi-view Image Retrieval

16 0.53460467 83 iccv-2013-Complementary Projection Hashing

17 0.51773262 77 iccv-2013-Codemaps - Segment, Classify and Search Objects Locally

18 0.51185179 287 iccv-2013-Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors

19 0.51072884 365 iccv-2013-SIFTpack: A Compact Representation for Efficient SIFT Matching

20 0.49815544 239 iccv-2013-Learning Hash Codes with Listwise Supervision


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.105), (4, 0.014), (7, 0.026), (12, 0.012), (13, 0.023), (26, 0.077), (27, 0.01), (31, 0.036), (34, 0.016), (42, 0.073), (48, 0.011), (52, 0.221), (64, 0.037), (73, 0.025), (76, 0.011), (89, 0.214), (95, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86175394 306 iccv-2013-Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items

Author: Kota Yamaguchi, M. Hadi Kiapour, Tamara L. Berg

Abstract: Clothing recognition is an extremely challenging problem due to wide variation in clothing item appearance, layering, and style. In this paper, we tackle the clothing parsing problem using a retrieval based approach. For a query image, we find similar styles from a large database of tagged fashion images and use these examples to parse the query. Our approach combines parsing from: pre-trained global clothing models, local clothing models learned on the fly from retrieved examples, and transferred parse masks (paper doll item transfer) from retrieved examples. Experimental evaluation shows that our approach significantly outperforms state of the art in parsing accuracy.

same-paper 2 0.82451642 400 iccv-2013-Stable Hyper-pooling and Query Expansion for Event Detection

Author: Matthijs Douze, Jérôme Revaud, Cordelia Schmid, Hervé Jégou

Abstract: This paper makes two complementary contributions to event retrieval in large collections of videos. First, we propose hyper-pooling strategies that encode the frame descriptors into a representation of the video sequence in a stable manner. Our best choices compare favorably with regular pooling techniques based on k-means quantization. Second, we introduce a technique to improve the ranking. It can be interpreted either as a query expansion method or as a similarity adaptation based on the local context of the query video descriptor. Experiments on public benchmarks show that our methods are complementary and improve event retrieval results, without sacrificing efficiency.

3 0.76137823 426 iccv-2013-Training Deformable Part Models with Decorrelated Features

Author: Ross Girshick, Jitendra Malik

Abstract: In this paper, we show how to train a deformable part model (DPM) fast—typically in less than 20 minutes, or four times faster than the current fastest method—while maintaining high average precision on the PASCAL VOC datasets. At the core of our approach is “latent LDA,” a novel generalization of linear discriminant analysis for learning latent variable models. Unlike latent SVM, latent LDA uses efficient closed-form updates and does not require an expensive search for hard negative examples. Our approach also acts as a springboard for a detailed experimental study of DPM training. We isolate and quantify the impact of key training factors for the first time (e.g., How important are discriminative SVM filters? How important is joint parameter estimation? How many negative images are needed for training?). Our findings yield useful insights for researchers working with Markov random fields and partbased models, and have practical implications for speeding up tasks such as model selection.

4 0.76105082 339 iccv-2013-Rank Minimization across Appearance and Shape for AAM Ensemble Fitting

Author: Xin Cheng, Sridha Sridharan, Jason Saragih, Simon Lucey

Abstract: Active Appearance Models (AAMs) employ a paradigm of inverting a synthesis model of how an object can vary in terms of shape and appearance. As a result, the ability of AAMs to register an unseen object image is intrinsically linked to two factors. First, how well the synthesis model can reconstruct the object image. Second, the degrees of freedom in the model. Fewer degrees of freedom yield a higher likelihood of good fitting performance. In this paper we look at how these seemingly contrasting factors can complement one another for the problem of AAM fitting of an ensemble of images stemming from a constrained set (e.g. an ensemble of face images of the same person).

5 0.75955993 238 iccv-2013-Learning Graphs to Match

Author: Minsu Cho, Karteek Alahari, Jean Ponce

Abstract: Many tasks in computer vision are formulated as graph matching problems. Despite the NP-hard nature of the problem, fast and accurate approximations have led to significant progress in a wide range of applications. Learning graph models from observed data, however, still remains a challenging issue. This paper presents an effective scheme to parameterize a graph model, and learn its structural attributes for visual object matching. For this, we propose a graph representation with histogram-based attributes, and optimize them to increase the matching accuracy. Experimental evaluations on synthetic and real image datasets demonstrate the effectiveness of our approach, and show significant improvement in matching accuracy over graphs with pre-defined structures.

6 0.75914621 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies

7 0.75816458 229 iccv-2013-Large-Scale Video Hashing via Structure Learning

8 0.75809991 404 iccv-2013-Structured Forests for Fast Edge Detection

9 0.75766963 448 iccv-2013-Weakly Supervised Learning of Image Partitioning Using Decision Trees with Structured Split Criteria

10 0.75660098 107 iccv-2013-Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction

11 0.75631815 4 iccv-2013-ACTIVE: Activity Concept Transitions in Video Event Classification

12 0.75590396 71 iccv-2013-Category-Independent Object-Level Saliency Detection

13 0.75528163 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization

14 0.75330067 297 iccv-2013-Online Motion Segmentation Using Dynamic Label Propagation

15 0.75302541 437 iccv-2013-Unsupervised Random Forest Manifold Alignment for Lipreading

16 0.75226521 361 iccv-2013-Robust Trajectory Clustering for Motion Segmentation

17 0.7520175 419 iccv-2013-To Aggregate or Not to aggregate: Selective Match Kernels for Image Search

18 0.75192726 29 iccv-2013-A Scalable Unsupervised Feature Merging Approach to Efficient Dimensionality Reduction of High-Dimensional Visual Data

19 0.75186634 396 iccv-2013-Space-Time Robust Representation for Action Recognition

20 0.75162649 265 iccv-2013-Mining Motion Atoms and Phrases for Complex Action Recognition