iccv iccv2013 iccv2013-229 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Guangnan Ye, Dong Liu, Jun Wang, Shih-Fu Chang
Abstract: Recently, learning based hashing methods have become popular for indexing large-scale media data. Hashing methods map high-dimensional features to compact binary codes that are efficient to match and robust in preserving original similarity. However, most of the existing hashing methods treat videos as a simple aggregation of independent frames and index each video through combining the indexes of frames. The structure information of videos, e.g., discriminative local visual commonality and temporal consistency, is often neglected in the design of hash functions. In this paper, we propose a supervised method that explores the structure learning techniques to design efficient hash functions. The proposed video hashing method formulates a minimization problem over a structure-regularized empirical loss. In particular, the structure regularization exploits the common local visual patterns occurring in video frames that are associated with the same semantic class, and simultaneously preserves the temporal consistency over successive frames from the same video. We show that the minimization objective can be efficiently solved by an Accelerated Proximal Gradient (APG) method. Extensive experiments on two large video benchmark datasets (up to around 150K video clips with over 12 million frames) show that the proposed method significantly outperforms the state-of-the-art hashing methods.
Reference: text
sentIndex sentText sentNum sentScore
1 Recently, learning based hashing methods have become popular for indexing large-scale media data. [sent-8, score-0.581]
2 However, most of the existing hashing methods treat videos as a simple aggregation of independent frames and index each video by combining the indexes of its frames. [sent-10, score-0.994]
3 The structure information of videos, e.g., discriminative local visual commonality and temporal consistency, is often neglected in the design of hash functions. [sent-13, score-0.728]
4 In this paper, we propose a supervised method that explores the structure learning techniques to design efficient hash functions. [sent-14, score-0.674]
5 The proposed video hashing method formulates a minimization problem over a structure-regularized empirical loss. [sent-15, score-0.714]
6 In particular, the structure regularization exploits the common local visual patterns occurring in video frames that are associated with the same semantic class, and simultaneously preserves the temporal consistency over successive frames from the same video. [sent-16, score-0.671]
7 Extensive experiments on two large video benchmark datasets (up to around 150K video clips with over 12 million frames) show that the proposed method significantly outperforms the state-of-the-art hashing methods. [sent-18, score-0.895]
8 Besides the well-known issue of the semantic gap, computational cost is another bottleneck for content-based video search, since exhaustive comparisons of low-level visual features are practically prohibitive when handling a large collection of video clips. [sent-21, score-0.364]
9 An illustration of the proposed hash code generation for videos within the event category “feeding an animal”. [sent-24, score-0.827]
10 Temporal consistency is preserved, and successive frames are grouped and put into the same hash bucket. [sent-28, score-0.76]
11 The framework of learning to hash has been well studied, and many new hashing methods have been developed by incorporating various machine learning techniques, ranging from unsupervised to semi-supervised to supervised learning [22, 21, 11, 7, 12, 8, 6]. [sent-29, score-1.172]
12 The key idea for learning-based hashing is to leverage data properties or human supervision to derive compact yet accurate hash codes. [sent-30, score-1.078]
13 Most of the existing hashing methods can be directly applied to index video data, such as the recent multiple feature based video hashing [19] and submodular video hashing [3]. [sent-31, score-2.094]
14 Despite the promising results reported in the literature, the existing video hashing methods cannot explicitly encode the specific structure information in video clips, e.g. [sent-32, score-0.88]
15 the commonly shared local visual patterns among videos associated with the same semantic labels, and the temporal consistency between successive frames. [sent-34, score-0.476]
16 To address the above problems, we propose to explore the structure information to design novel video hashing methods. [sent-35, score-0.76]
17 Notably, although each video clip contains abundant local visual patterns, only a limited number of discriminative local patterns are shared by videos within the same semantic category. [sent-38, score-0.476]
18 Therefore, the hashing method should ensure that the hash codes of successive frames are as similar as possible (Figure 1). [sent-45, score-1.361]
19 To incorporate these two types of structure characteristics of videos, we propose a supervised framework with structure learning to design efficient linear hash functions for video indexing. [sent-46, score-0.902]
20 To capture the common local patterns across all the video frames from the same category, the first regularization term imposes an ℓ2,1-norm over the hash functions so that only a small number of informative feature dimensions are selected. [sent-48, score-0.879]
21 In this way, we obtain consistent patterns across different videos, and improve the discrimination ability of the learned hash functions. [sent-52, score-0.568]
22 The second regularization term uses an ℓ∞-norm on the hash codes of successive frames, which forces successive frames to receive similar hash codes and essentially preserves the temporal consistency in Hamming space. [sent-53, score-1.677]
23 Extensive experiments over two large video benchmark datasets and comparisons with representative learning-based hashing methods demonstrate the superiority of the proposed video hashing method. [sent-55, score-1.391]
24 Recently, the learning-to-hash framework has been extensively investigated, and various machine learning algorithms have been incorporated to design efficient hash functions. [sent-59, score-1.096]
25 In the following, we briefly introduce several representative learning-based hashing methods and their recent applications to video indexing and retrieval. [sent-60, score-0.713]
26 Unsupervised hashing methods often utilize the data properties such as distribution or manifold structure to design effective indexing schemes. [sent-61, score-0.635]
27 For example, spectral hashing assumes that the data are sampled from a uniform distribution and partitions the data along their principal directions with the consideration of spatial frequencies [22]. [sent-62, score-0.555]
28 Graph hashing explores the low-dimensional manifold structure of data to design compact hash codes [13]. [sent-63, score-1.242]
29 Supervised hash function learning can be mainly categorized into pointwise and pairwise methods. [sent-64, score-0.598]
30 Pointwise methods, such as boosted similarity sensitive coding [18] and the deep neural network-based method [20], often treat the design of hash functions as a special classification problem and use samples’ labels in the training procedure. [sent-65, score-0.616]
31 Pairwise methods take into account the pairwise relationship of samples in the hash function learning. [sent-66, score-0.572]
32 As a popular formulation, many recent works, including binary reconstructive embedding [11], complementary hashing [23], supervised hashing with kernels [14], and iterative quantization [7] fall into the category of pairwise methods. [sent-67, score-1.23]
33 Finally, the semi-supervised hashing method strikes a tradeoff between supervised information and data properties to design robust hash functions, aiming to alleviate the defects of overfitting or insufficient training [21]. [sent-68, score-1.14]
34 To the best of our knowledge, there are very few studies on developing specific hash functions to index structured data like videos. [sent-70, score-0.558]
35 [3] proposed a submodular hashing framework to index videos. [sent-74, score-0.572]
36 [19] proposed a multiple feature based hashing for video near-duplicate detection. [sent-76, score-0.684]
37 However, in those video hashing methods, conventional hashing methods like locality sensitive hashing [6] and spectral hashing [22] are often applied to generate binary codes. [sent-77, score-2.332]
38 None of the existing methods really consider the special structure information like visual commonality and temporal consistency of videos to design structure-specific hash functions for video indexing. [sent-78, score-1.157]
39 In contrast, our proposed video hashing method leverages video structure information in a supervised learning paradigm to derive optimal binary codes for large-scale video retrieval. [sent-79, score-1.249]
40 Suppose Xi = [xi,1, . . . , xi,ni] is a video consisting of ni successive frames, and yi ∈ {0, 1} is the label of video Xi. [sent-91, score-0.408]
41 Given a video frame x, we want to learn a K-bit binary code c ∈ {0, 1}K, which requires designing K binary hash functions. [sent-94, score-0.917]
42 In this work, we consider linear hash functions for their simplicity and efficiency. [sent-95, score-0.558]
43 Each hash function hk (k = 1, . . . , K) can be defined as: hk(x) = sgn(wk⊤x + bk), (1) where wk ∈ Rd is a hash hyperplane and bk is the intercept. [sent-99, score-0.618]
44 The resulting hk(x) ∈ {−1, 1}, and the corresponding binary hash bit can be simply calculated as ck(x) = (1 + hk(x))/2. [sent-101, score-0.617]
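For concreteness, the hash function of Eq. (1) and the bit conversion can be sketched in a few lines of NumPy; the array shapes, variable names, and the convention of sending sgn ties at zero to +1 are our assumptions, not the paper's:

```python
import numpy as np

def hash_bits(X, W, b):
    """Eq. (1): h_k(x) = sgn(wk^T x + bk), then bits c_k(x) = (1 + h_k(x)) / 2.

    X: (n, d) frame features; W: (d, K) hash hyperplanes; b: (K,) intercepts.
    Returns an (n, K) array of {0, 1} hash bits.
    """
    h = np.where(X @ W + b >= 0, 1, -1)  # sgn, with ties at zero sent to +1
    return (1 + h) // 2

# Toy usage: 3 frames, 16-dim features, 8-bit codes.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 16))
W = rng.normal(size=(16, 8))
b = rng.normal(size=8)
codes = hash_bits(X, W, b)  # shape (3, 8), entries in {0, 1}
```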
45 The Hamming distance between the hash codes of two frames xia and xjb from two videos Xi and Xj can be defined as: d(xia, xjb) = Σ_{k=1}^K (ck(xia) − ck(xjb))^2 = (1/4) Σ_{k=1}^K (hk(xia) − hk(xjb))^2. (2) [sent-102, score-1.163]
46 Based on the above definition, a straightforward way to define the Hamming distance between videos Xi and Xj is: D(Xi, Xj) = (1/(ni nj)) Σ_{a=1}^{ni} Σ_{b=1}^{nj} d(xia, xjb), (3) which means that the Hamming distance between two videos equals the average Hamming distance over all pairs of frames. [sent-103, score-0.382]
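Eqs. (2) and (3) translate directly into NumPy; the following is a minimal sketch (function and variable names are ours):

```python
import numpy as np

def frame_hamming(ci, cj):
    """Eq. (2): Hamming distance between two {0, 1} hash code vectors."""
    return int(np.sum(ci != cj))

def video_hamming(Ci, Cj):
    """Eq. (3): average Hamming distance over all frame pairs of two videos.

    Ci: (ni, K) hash bits of video Xi; Cj: (nj, K) hash bits of video Xj.
    """
    # Broadcasting computes all ni * nj pairwise frame distances at once.
    pair_dists = np.sum(Ci[:, None, :] != Cj[None, :, :], axis=2)
    return float(pair_dists.mean())
```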
47 The basic assumption behind this definition is that most of the frames within a video should be useful for measuring the distance between videos, and a similar pairwise metric methodology has proven effective in [3]. [sent-104, score-0.297]
48 However, the above function is not tractable due to the discrete nature of the sgn function in each hash function. [sent-105, score-0.56]
49 We formulate the following objective, which learns the hash functions by minimizing a structure-regularized cost function: min_W Σ_{i,j=1}^N ℓ(ŷij, D(Xi, Xj)) + λ1 ‖W‖_{2,1} + λ2 Σ_{i=1}^N Σ_{t=1}^{ni−1} ‖W⊤xi,t − W⊤xi,t+1‖_∞, where λ1 and λ2 are trade-off parameters. [sent-114, score-0.584]
50 Since each row of W multiplies one specific feature dimension, setting a row to zero discards that feature dimension's influence on hash function learning. [sent-123, score-0.526]
51 The selected features correspond to the local common patterns in the video frames related to a certain video category, which convey discriminative information. [sent-125, score-0.474]
52 The minimization of ‖W⊤xi,t − W⊤xi,t+1‖_∞ ensures the hash vectors of two successive frames are as similar as possible [2]. [sent-126, score-0.556]
53 That is, it encourages the maximum absolute value of the entry-wise differences between the two hash vectors to be zero. [sent-128, score-0.526]
54 This accounts for the preservation of the temporal structure in the hash codes generated for all frames of a video. [sent-129, score-0.844]
55 Notably, we apply the ℓ∞-norm here instead of the ℓ1-norm or ℓ2-norm, since the ℓ∞-norm enforces a stronger constraint that the two hash vectors of successive frames stay close to each other. [sent-130, score-0.721]
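The two structure regularizers can be sketched as follows. We evaluate them on the real-valued projections W⊤x rather than on binarized codes; that relaxation, like the function names, is our assumption for illustration:

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm of W: the sum of the l2 norms of its rows.  Penalizing it
    drives entire rows (i.e., feature dimensions) to zero, realizing the
    feature selection described above."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def temporal_consistency(W, X):
    """Sum over successive frames of one video of
    ||W^T x_t - W^T x_{t+1}||_inf, computed on real-valued projections.

    X: (n, d) features of the video's frames, in temporal order."""
    P = X @ W  # (n, K) projections, one row per frame
    return float(np.sum(np.max(np.abs(P[:-1] - P[1:]), axis=1)))
```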
56 ŷij = 1 leads to D(Xi, Xj) ≤ δ, while ŷij = −1 leads to D(Xi, Xj) > δ. In this way, it enforces that videos with the same category label are close to each other while videos with different category labels are far apart. [sent-139, score-0.359]
57 With this loss term, we incorporate the supervision information into the hash function learning, leading to discriminative binary codes. [sent-141, score-0.622]
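One hinge-style, ℓ1-type loss that satisfies exactly this description is sketched below; the paper's exact functional form is not reproduced here, so this is an illustrative stand-in rather than the method itself:

```python
def pairwise_loss(D_ij, y_ij, delta):
    """Zero when y_ij = 1 and D(Xi, Xj) <= delta, or when y_ij = -1 and
    D(Xi, Xj) > delta; grows linearly with the violation otherwise."""
    return max(0.0, y_ij * (D_ij - delta))
```

The full objective would then sum such a loss over the labeled video pairs and add the two regularizers sketched earlier, weighted by the trade-off parameters.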
58 To demonstrate the strength of using both spatial and temporal structure information of videos, we test several variants of the proposed structure learning based hashing methods, and also compare with several representative hashing methods in our experiments. [sent-225, score-1.237]
59 Note that both SH and SPLH treat each video as a composite of independent frames and index the video by combining the [sent-230, score-0.427]
60 hash codes of frames, similar to what is described in [3]. [sent-234, score-0.636]
61 Moreover, SH and SPLH are representative hashing methods with publicly available code, and hence are chosen for a fair study. [sent-235, score-0.53]
62 Although there are some recent methods for video hashing [3, 19], they are built on standard hashing techniques like LSH and require specific settings, such as submodular or multiple-feature representations, and hence do not serve as standard comparable methods in our experiments. [sent-236, score-1.256]
63 Evaluations and Settings. We follow the evaluation protocols used in [21] and adopt the following two criteria: (1) Hamming ranking: All the video clips are ranked according to their Hamming distance to the query video. [sent-239, score-0.287]
64 (2) Hash lookup: A hash lookup table is constructed, and all the samples falling within a Hamming ball of radius r (r = 2 in our setting) around the query sample are returned. [sent-240, score-0.797]
65 Since each query video is represented as a set of binary codes corresponding to the frames in the video, here we adopt the following query strategy to return the nearest neighbor (NN) video clips. [sent-241, score-0.698]
66 For Hamming ranking based evaluation, it is fairly straightforward to return the nearest video clips by exhaustively computing and ranking the video Hamming distance between the query and the database samples, as defined in Eq. [sent-242, score-0.523]
67 For hash lookup based evaluation, we first retrieve the nearest neighbor frames within Hamming radius 2 for each individual hash code vector of the query video frames. [sent-244, score-1.547]
68 Then all the videos whose composite frames are successfully hit by any query frame will be regarded as the candidate NN videos. [sent-245, score-0.423]
69 An intuitive way to rank all these candidate NN videos is by counting the hit frequency of each video (normalized by the total number of frames in that video). [sent-246, score-0.474]
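A brute-force sketch of this lookup-and-rank strategy is given below; a real system would probe buckets in a hash table rather than scanning all database codes, and all names here are ours:

```python
import numpy as np
from collections import defaultdict

def lookup_rank_videos(query_codes, db_codes, db_video_ids, radius=2):
    """Collect database frames within the Hamming radius of any query-frame
    code, then rank candidate videos by hit frequency normalized by their
    total frame count, as described above.

    query_codes:  (m, K) {0, 1} codes of the query video's frames.
    db_codes:     (n, K) {0, 1} codes of all database frames.
    db_video_ids: length-n sequence giving the video id of each frame.
    """
    db_video_ids = np.asarray(db_video_ids)
    n_frames = defaultdict(int)
    for vid in db_video_ids:
        n_frames[vid] += 1
    hits = defaultdict(int)
    for q in query_codes:
        dists = np.sum(db_codes != q, axis=1)  # Hamming distances to q
        for vid in db_video_ids[dists <= radius]:
            hits[vid] += 1
    scores = {vid: h / n_frames[vid] for vid, h in hits.items()}
    return sorted(scores, key=scores.get, reverse=True)
```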
70 Note that these two evaluations focus on different aspects of hashing techniques. [sent-247, score-0.53]
71 Hash lookup emphasizes the search speed since the query complexity is often constant time, but the search quality can suffer when using very long hash codes, resulting in failed queries due to empty returns within Hamming radius r. [sent-251, score-0.802]
72 For the hash lookup criterion, we compute the retrieval precision, which measures the percentage of true neighbors within Hamming radius r [21]. [sent-254, score-0.722]
73 This indicates that our method is able to achieve time complexity comparable to the existing hashing methods. [sent-268, score-0.53]
74 In our experiments, we randomly select 5 videos from each semantic category as labeled data for training, and choose another 25 videos in each category as the query videos for testing hashing performance. [sent-273, score-1.294]
75 This results in 100 training videos and 500 query videos. [sent-274, score-0.267]
76 The key frames are evenly sampled every 2 seconds and each video has at least 30 key frames. [sent-276, score-0.314]
77 In the experiments, we evaluate the performance using hash codes of different lengths, ranging from 12 to 64 bits. [sent-282, score-0.636]
78 This is due to the fact that the former methods take advantage of the structure information of video data (either discriminative local patterns or temporal consistency [sent-285, score-0.49]
79 across successive frames) while the latter one only blindly generates hash codes without accounting for any structure information; (3) Our proposed VHDT clearly beats the conventional hashing methods like SH and SPLH. [sent-287, score-0.524]
80 The reason is that these methods only try to learn hash functions for simple samples such as images and hence are not appropriate for video data; (4) Our VHDT method performs better than VHD and VHT, since the latter ones merely consider one aspect of the structure information in videos. [sent-288, score-0.779]
81 However, the precisions of all methods begin to drop when using longer hash codes. [sent-291, score-0.526]
82 This is because, as the number of hash bits increases, the number of samples falling into a bucket decreases exponentially, resulting in empty returns within Hamming radius 2. [sent-292, score-0.685]
83 The entire dataset has around 150K videos falling into 25 semantic event categories. [sent-299, score-0.287]
84 For each video, we extract key frames every 2 seconds and obtain the final set containing over 12 million video key frames. [sent-301, score-0.314]
85 The first is to evaluate on the 10K videos with ground truth labels while the second is to evaluate on the entire 150K videos based on the top returned videos labeled by ourselves. [sent-304, score-0.604]
86 In this scenario, 25 labeled videos in each category are chosen as the query video clips for testing hashing performance. [sent-308, score-1.049]
87 The performance improvements are consistent as the number of hash bits varies. [sent-311, score-0.554]
88 In contrast, our proposed hashing method focuses on real-time retrieval on large-scale video data. [sent-314, score-0.684]
89 Therefore, we only report the precision within the top 100 returned videos for each method. [sent-319, score-0.275]
90 We randomly select 5 videos with ground truth from each of the 25 categories, and consider the 125 videos as queries. [sent-320, score-0.382]
91 A 64-bit binary code is generated to search videos within the 150K video dataset. [sent-321, score-0.405]
92 Accuracy of top 100 retrieval videos using 64 bits over the 150K video dataset. [sent-332, score-0.373]
93 We retrieve the NN videos within Hamming radius 2 and then pick the top 100 videos ranked by the normalized frame hit frequencies. [sent-333, score-0.556]
94 Finally, the category label of each unlabeled video within the top 100 videos is manually annotated. [sent-334, score-0.413]
95 We calculate the accuracy of the top 100 videos for each query and report the average performance across the 125 queries in Table 1. [sent-335, score-0.297]
96 Figure 6 shows the key frames of some exemplar query videos as well as the top 6 returned key frames, demonstrating that our structure-learning-based video hashing consistently generates better visual retrieval results. [sent-337, score-1.185]
97 The proposed method works in a supervised setting with an ℓ1-norm based empirical loss regularized by the video-structure-related terms. [sent-340, score-0.28]
98 Specifically, we use an ℓ2,1-norm to select certain feature dimensions in the training videos to capture the discriminative local visual patterns, and employ an ℓ∞-norm on the binary codes of successive frames to preserve the temporal consistency in the learned Hamming space. [sent-341, score-0.698]
99 Submodular video hashing: A unified framework towards video pooling and indexing. [sent-362, score-0.308]
100 Multiple feature hashing for real-time large scale near-duplicate video retrieval. [sent-473, score-0.684]
wordName wordTfidf (topN-words)
[('hashing', 0.53), ('hash', 0.526), ('hamming', 0.206), ('videos', 0.191), ('vhdt', 0.173), ('video', 0.154), ('xjb', 0.154), ('ccv', 0.142), ('trecvid', 0.126), ('codes', 0.11), ('med', 0.105), ('splh', 0.102), ('successive', 0.1), ('apg', 0.099), ('frames', 0.095), ('xia', 0.087), ('radius', 0.086), ('query', 0.076), ('temporal', 0.071), ('commonality', 0.068), ('lipschitz', 0.068), ('wk', 0.063), ('vhd', 0.058), ('vht', 0.058), ('lookup', 0.057), ('clips', 0.057), ('columbia', 0.055), ('ix', 0.052), ('supervised', 0.05), ('xj', 0.043), ('ij', 0.043), ('submodular', 0.042), ('patterns', 0.042), ('event', 0.042), ('structure', 0.042), ('ranking', 0.041), ('category', 0.041), ('consistency', 0.039), ('consumer', 0.039), ('rd', 0.037), ('sh', 0.037), ('wi', 0.036), ('kw', 0.035), ('design', 0.034), ('loss', 0.034), ('sgn', 0.034), ('hit', 0.034), ('hk', 0.033), ('xi', 0.033), ('semantic', 0.033), ('binary', 0.033), ('functions', 0.032), ('returned', 0.031), ('differentiable', 0.03), ('dimensions', 0.03), ('minimization', 0.03), ('queries', 0.03), ('lf', 0.03), ('indexing', 0.029), ('discriminative', 0.029), ('bk', 0.029), ('gi', 0.029), ('hi', 0.028), ('bits', 0.028), ('within', 0.027), ('ellis', 0.027), ('pthe', 0.027), ('frame', 0.027), ('xn', 0.027), ('precision', 0.026), ('objective', 0.026), ('bow', 0.026), ('ye', 0.026), ('samples', 0.025), ('pointwise', 0.025), ('spectral', 0.025), ('xt', 0.025), ('calculated', 0.025), ('ck', 0.025), ('reconstructive', 0.025), ('copy', 0.025), ('ibm', 0.025), ('smoothing', 0.024), ('treat', 0.024), ('mwin', 0.023), ('realized', 0.023), ('comparisons', 0.023), ('nn', 0.023), ('chang', 0.023), ('gradient', 0.022), ('approximation', 0.022), ('learning', 0.022), ('feeding', 0.022), ('minimizer', 0.022), ('key', 0.022), ('pairwise', 0.021), ('falling', 0.021), ('ap', 0.021), ('seconds', 0.021), ('herein', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 229 iccv-2013-Large-Scale Video Hashing via Structure Learning
Author: Guangnan Ye, Dong Liu, Jun Wang, Shih-Fu Chang
Abstract: Recently, learning based hashing methods have become popular for indexing large-scale media data. Hashing methods map high-dimensional features to compact binary codes that are efficient to match and robust in preserving original similarity. However, most of the existing hashing methods treat videos as a simple aggregation of independent frames and index each video through combining the indexes of frames. The structure information of videos, e.g., discriminative local visual commonality and temporal consistency, is often neglected in the design of hash functions. In this paper, we propose a supervised method that explores the structure learning techniques to design efficient hash functions. The proposed video hashing method formulates a minimization problem over a structure-regularized empirical loss. In particular, the structure regularization exploits the common local visual patterns occurring in video frames that are associated with the same semantic class, and simultaneously preserves the temporal consistency over successive frames from the same video. We show that the minimization objective can be efficiently solved by an Accelerated Proximal Gradient (APG) method. Extensive experiments on two large video benchmark datasets (up to around 150K video clips with over 12 million frames) show that the proposed method significantly outperforms the state-of-the-art hashing methods.
2 0.65272909 13 iccv-2013-A General Two-Step Approach to Learning-Based Hashing
Author: Guosheng Lin, Chunhua Shen, David Suter, Anton van_den_Hengel
Abstract: Most existing approaches to hashing apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of the method to respond to the data, and can result in complex optimization problems that are difficult to solve. Here we propose a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. This framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problemspecific hashing methods. Our framework decomposes the hashing learning problem into two steps: hash bit learning and hash function learning based on the learned bits. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training standard binary classifiers. Both problems have been extensively studied in the literature. Our extensive experiments demonstrate that the proposed framework is effective, flexible and outperforms the state-of-the-art.
3 0.59320039 83 iccv-2013-Complementary Projection Hashing
Author: Zhongming Jin, Yao Hu, Yue Lin, Debing Zhang, Shiding Lin, Deng Cai, Xuelong Li
Abstract: Recently, hashing techniques have been widely applied to solve the approximate nearest neighbors search problem in many vision applications. Generally, these hashing approaches generate 2^c buckets, where c is the length of the hash code. A good hashing method should satisfy the following two requirements: 1) mapping the nearby data points into the same bucket or nearby (measured by the Hamming distance) buckets. 2) all the data points are evenly distributed among all the buckets. In this paper, we propose a novel algorithm named Complementary Projection Hashing (CPH) to find the optimal hashing functions which explicitly considers the above two requirements. Specifically, CPH aims at sequentially finding a series of hyperplanes (hashing functions) which cross the sparse region of the data. At the same time, the data points are evenly distributed in the hypercubes generated by these hyperplanes. The experiments comparing with the state-of-the-art hashing methods demonstrate the effectiveness of the proposed method.
4 0.54854745 239 iccv-2013-Learning Hash Codes with Listwise Supervision
Author: Jun Wang, Wei Liu, Andy X. Sun, Yu-Gang Jiang
Abstract: Hashing techniques have been intensively investigated in the design of highly efficient search engines for large-scale computer vision applications. Compared with prior approximate nearest neighbor search approaches like tree-based indexing, hashing-based search schemes have prominent advantages in terms of both storage and computational efficiencies. Moreover, the procedure of devising hash functions can be easily incorporated into sophisticated machine learning tools, leading to data-dependent and task-specific compact hash codes. Therefore, a number of learning paradigms, ranging from unsupervised to supervised, have been applied to compose appropriate hash functions. However, most of the existing hash function learning methods either treat hash function design as a classification problem or generate binary codes to satisfy pairwise supervision, and have not yet directly optimized the search accuracy. In this paper, we propose to leverage listwise supervision into a principled hash function learning framework. In particular, the ranking information is represented by a set of rank triplets that can be used to assess the quality of ranking. Simple linear projection-based hash functions are solved efficiently through maximizing the ranking quality over the training data. We carry out experiments on large image datasets with size up to one million and compare with the state-of-the-art hashing techniques. The extensive results corroborate that our learned hash codes via listwise supervision can provide superior search accuracy without incurring heavy computational overhead.
5 0.42523131 409 iccv-2013-Supervised Binary Hash Code Learning with Jensen Shannon Divergence
Author: Lixin Fan
Abstract: This paper proposes to learn binary hash codes within a statistical learning framework, in which an upper bound of the probability of Bayes decision errors is derived for different forms of hash functions and a rigorous proof of the convergence of the upper bound is presented. Consequently, minimizing such an upper bound leads to consistent performance improvements of existing hash code learning algorithms, regardless of whether original algorithms are unsupervised or supervised. This paper also illustrates a fast hash coding method that exploits simple binary tests to achieve orders of magnitude improvement in coding speed as compared to projection based methods.
6 0.23328507 400 iccv-2013-Stable Hyper-pooling and Query Expansion for Event Detection
7 0.17858618 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search
9 0.16594146 162 iccv-2013-Fast Subspace Search via Grassmannian Based Hashing
11 0.14549822 163 iccv-2013-Feature Weighting via Optimal Thresholding for Video Analysis
12 0.11972616 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach
13 0.10150602 159 iccv-2013-Fast Neighborhood Graph Search Using Cartesian Concatenation
14 0.097855963 203 iccv-2013-How Related Exemplars Help Complex Event Detection in Web Videos?
15 0.086047284 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
16 0.085074998 81 iccv-2013-Combining the Right Features for Complex Event Recognition
17 0.084725067 127 iccv-2013-Dynamic Pooling for Complex Event Recognition
18 0.081335127 221 iccv-2013-Joint Inverted Indexing
19 0.080505058 266 iccv-2013-Mining Multiple Queries for Image Retrieval: On-the-Fly Learning of an Object-Specific Mid-level Representation
20 0.080026098 378 iccv-2013-Semantic-Aware Co-indexing for Image Retrieval
topicId topicWeight
[(0, 0.188), (1, 0.133), (2, -0.062), (3, -0.051), (4, -0.056), (5, 0.479), (6, 0.039), (7, -0.011), (8, -0.356), (9, 0.211), (10, -0.321), (11, 0.14), (12, -0.01), (13, -0.141), (14, -0.073), (15, 0.023), (16, -0.149), (17, 0.182), (18, -0.17), (19, 0.075), (20, 0.012), (21, 0.047), (22, 0.019), (23, 0.017), (24, 0.024), (25, -0.002), (26, -0.033), (27, -0.006), (28, -0.002), (29, -0.011), (30, 0.005), (31, -0.008), (32, 0.018), (33, -0.011), (34, -0.016), (35, -0.016), (36, -0.003), (37, 0.015), (38, -0.035), (39, 0.013), (40, -0.013), (41, -0.026), (42, -0.006), (43, -0.005), (44, 0.01), (45, 0.037), (46, 0.022), (47, 0.02), (48, -0.025), (49, -0.01)]
simIndex simValue paperId paperTitle
1 0.95312887 83 iccv-2013-Complementary Projection Hashing
Author: Zhongming Jin, Yao Hu, Yue Lin, Debing Zhang, Shiding Lin, Deng Cai, Xuelong Li
Abstract: Recently, hashing techniques have been widely applied to solve the approximate nearest neighbors search problem in many vision applications. Generally, these hashing approaches generate 2^c buckets, where c is the length of the hash code. A good hashing method should satisfy the following two requirements: 1) mapping the nearby data points into the same bucket or nearby (measured by the Hamming distance) buckets. 2) all the data points are evenly distributed among all the buckets. In this paper, we propose a novel algorithm named Complementary Projection Hashing (CPH) to find the optimal hashing functions which explicitly considers the above two requirements. Specifically, CPH aims at sequentially finding a series of hyperplanes (hashing functions) which cross the sparse region of the data. At the same time, the data points are evenly distributed in the hypercubes generated by these hyperplanes. The experiments comparing with the state-of-the-art hashing methods demonstrate the effectiveness of the proposed method.
same-paper 2 0.94274759 229 iccv-2013-Large-Scale Video Hashing via Structure Learning
Author: Guangnan Ye, Dong Liu, Jun Wang, Shih-Fu Chang
Abstract: Recently, learning based hashing methods have become popular for indexing large-scale media data. Hashing methods map high-dimensional features to compact binary codes that are efficient to match and robust in preserving original similarity. However, most of the existing hashing methods treat videos as a simple aggregation of independent frames and index each video through combining the indexes of frames. The structure information of videos, e.g., discriminative local visual commonality and temporal consistency, is often neglected in the design of hash functions. In this paper, we propose a supervised method that explores the structure learning techniques to design efficient hash functions. The proposed video hashing method formulates a minimization problem over a structure-regularized empirical loss. In particular, the structure regularization exploits the common local visual patterns occurring in video frames that are associated with the same semantic class, and simultaneously preserves the temporal consistency over successive frames from the same video. We show that the minimization objective can be efficiently solved by an Accelerated Proximal Gradient (APG) method. Extensive experiments on two large video benchmark datasets (up to around 150K video clips with over 12 million frames) show that the proposed method significantly outperforms the state-of-the-art hashing methods.
3 0.94111508 239 iccv-2013-Learning Hash Codes with Listwise Supervision
Author: Jun Wang, Wei Liu, Andy X. Sun, Yu-Gang Jiang
Abstract: Hashing techniques have been intensively investigated in the design of highly efficient search engines for large-scale computer vision applications. Compared with prior approximate nearest neighbor search approaches like tree-based indexing, hashing-based search schemes have prominent advantages in terms of both storage and computational efficiencies. Moreover, the procedure of devising hash functions can be easily incorporated into sophisticated machine learning tools, leading to data-dependent and task-specific compact hash codes. Therefore, a number of learning paradigms, ranging from unsupervised to supervised, have been applied to compose appropriate hash functions. However, most of the existing hash function learning methods either treat hash function design as a classification problem or generate binary codes to satisfy pairwise supervision, and have not yet directly optimized the search accuracy. In this paper, we propose to leverage listwise supervision into a principled hash function learning framework. In particular, the ranking information is represented by a set of rank triplets that can be used to assess the quality of ranking. Simple linear projection-based hash functions are solved efficiently through maximizing the ranking quality over the training data. We carry out experiments on large image datasets with size up to one million and compare with the state-of-the-art hashing techniques. The extensive results corroborate that our learned hash codes via listwise supervision can provide superior search accuracy without incurring heavy computational overhead.
4 0.93975586 13 iccv-2013-A General Two-Step Approach to Learning-Based Hashing
Author: Guosheng Lin, Chunhua Shen, David Suter, Anton van_den_Hengel
Abstract: Most existing approaches to hashing apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of the method to respond to the data, and can result in complex optimization problems that are difficult to solve. Here we propose a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. This framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problemspecific hashing methods. Our framework decomposes the hashing learning problem into two steps: hash bit learning and hash function learning based on the learned bits. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training standard binary classifiers. Both problems have been extensively studied in the literature. Our extensive experiments demonstrate that the proposed framework is effective, flexible and outperforms the state-of-the-art.
5 0.88604629 409 iccv-2013-Supervised Binary Hash Code Learning with Jensen Shannon Divergence
Author: Lixin Fan
Abstract: This paper proposes to learn binary hash codes within a statistical learning framework, in which an upper bound of the probability of Bayes decision errors is derived for different forms of hash functions and a rigorous proof of the convergence of the upper bound is presented. Consequently, minimizing such an upper bound leads to consistent performance improvements of existing hash code learning algorithms, regardless of whether original algorithms are unsupervised or supervised. This paper also illustrates a fast hash coding method that exploits simple binary tests to achieve orders of magnitude improvement in coding speed as compared to projection based methods.
8 0.44564745 400 iccv-2013-Stable Hyper-pooling and Query Expansion for Event Detection
9 0.40562871 221 iccv-2013-Joint Inverted Indexing
10 0.33965883 337 iccv-2013-Random Grids: Fast Approximate Nearest Neighbors and Range Searching for Image Search
11 0.30978587 159 iccv-2013-Fast Neighborhood Graph Search Using Cartesian Concatenation
12 0.27243781 163 iccv-2013-Feature Weighting via Optimal Thresholding for Video Analysis
13 0.26623753 162 iccv-2013-Fast Subspace Search via Grassmannian Based Hashing
14 0.26342386 446 iccv-2013-Visual Semantic Complex Network for Web Images
15 0.25551027 191 iccv-2013-Handling Uncertain Tags in Visual Recognition
16 0.24871999 34 iccv-2013-Abnormal Event Detection at 150 FPS in MATLAB
17 0.24771269 443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation
18 0.24605381 287 iccv-2013-Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors
19 0.24185333 171 iccv-2013-Fix Structured Learning of 2013 ICCV paper k2opt.pdf
20 0.23955572 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach
topicId topicWeight
[(2, 0.195), (4, 0.042), (7, 0.171), (12, 0.016), (26, 0.056), (31, 0.043), (42, 0.091), (48, 0.011), (64, 0.049), (73, 0.03), (78, 0.011), (89, 0.168)]
simIndex simValue paperId paperTitle
1 0.93150914 409 iccv-2013-Supervised Binary Hash Code Learning with Jensen Shannon Divergence
Author: Lixin Fan
Abstract: This paper proposes to learn binary hash codes within a statistical learning framework, in which an upper bound of the probability of Bayes decision errors is derived for different forms of hash functions and a rigorous proof of the convergence of the upper bound is presented. Consequently, minimizing such an upper bound leads to consistent performance improvements of existing hash code learning algorithms, regardless of whether original algorithms are unsupervised or supervised. This paper also illustrates a fast hash coding method that exploits simple binary tests to achieve orders of magnitude improvement in coding speed as compared to projection based methods.
Author: Weilin Huang, Zhe Lin, Jianchao Yang, Jue Wang
Abstract: In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and textline levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Secondly, based on the output of SFT, we apply two classifiers, a text component classifier and a text-line classifier, sequentially to extract text regions, eliminating the heuristic procedures that are commonly used in previous approaches. The two classifiers are built upon two novel Text Covariance Descriptors (TCDs) that encode both the heuristic properties and the statistical characteristics of text strokes. Finally, text regions are located by simply thresholding the text-line confidence map. Our method was evaluated on two benchmark datasets: ICDAR 2005 and ICDAR 2011, and the corresponding F-measure values are 0.72 and 0.73, respectively, surpassing previous methods in accuracy by a large margin.
Author: Jiwen Lu, Gang Wang, Pierre Moulin
Abstract: This paper presents a new approach for image set classification, where each training and testing example contains a set of image instances of an object captured from varying viewpoints or under varying illuminations. While a number of image set classification methods have been proposed in recent years, most of them model each image set as a single linear subspace or mixture of linear subspaces, which may lose some discriminative information for classification. To address this, we propose exploring multiple order statistics as features of image sets, and develop a localized multikernel metric learning (LMKML) algorithm to effectively combine different order statistics information for classification. Our method achieves the state-of-the-art performance on four widely used databases including the Honda/UCSD, CMU Mobo, and Youtube face datasets, and the ETH-80 object dataset.
same-paper 4 0.8861109 229 iccv-2013-Large-Scale Video Hashing via Structure Learning
Author: Guangnan Ye, Dong Liu, Jun Wang, Shih-Fu Chang
Abstract: Recently, learning based hashing methods have become popular for indexing large-scale media data. Hashing methods map high-dimensional features to compact binary codes that are efficient to match and robust in preserving original similarity. However, most of the existing hashing methods treat videos as a simple aggregation of independent frames and index each video through combining the indexes of frames. The structure information of videos, e.g., discriminative local visual commonality and temporal consistency, is often neglected in the design of hash functions. In this paper, we propose a supervised method that explores the structure learning techniques to design efficient hash functions. The proposed video hashing method formulates a minimization problem over a structure-regularized empirical loss. In particular, the structure regularization exploits the common local visual patterns occurring in video frames that are associated with the same semantic class, and simultaneously preserves the temporal consistency over successive frames from the same video. We show that the minimization objective can be efficiently solved by an Accelerated Proximal Gradient (APG) method. Extensive experiments on two large video benchmark datasets (up to around 150K video clips with over 12 million frames) show that the proposed method significantly outperforms the state-of-the-art hashing methods.
5 0.88359725 178 iccv-2013-From Semi-supervised to Transfer Counting of Crowds
Author: Chen Change Loy, Shaogang Gong, Tao Xiang
Abstract: Regression-based techniques have shown promising results for people counting in crowded scenes. However, most existing techniques require expensive and laborious data annotation for model training. In this study, we propose to address this problem from three perspectives: (1) Instead of exhaustively annotating every single frame, the most informative frames are selected for annotation automatically and actively. (2) Rather than learning from only labelled data, the abundant unlabelled data are exploited. (3) Labelled data from other scenes are employed to further alleviate the burden for data annotation. All three ideas are implemented in a unified active and semi-supervised regression framework with ability to perform transfer learning, by exploiting the underlying geometric structure of crowd patterns via manifold analysis. Extensive experiments validate the effectiveness of our approach.
7 0.85565954 446 iccv-2013-Visual Semantic Complex Network for Web Images
8 0.85450506 239 iccv-2013-Learning Hash Codes with Listwise Supervision
9 0.85012251 244 iccv-2013-Learning View-Invariant Sparse Representations for Cross-View Action Recognition
10 0.84915769 13 iccv-2013-A General Two-Step Approach to Learning-Based Hashing
11 0.84784025 374 iccv-2013-Salient Region Detection by UFO: Uniqueness, Focusness and Objectness
12 0.84525287 214 iccv-2013-Improving Graph Matching via Density Maximization
13 0.84130394 83 iccv-2013-Complementary Projection Hashing
14 0.84113562 191 iccv-2013-Handling Uncertain Tags in Visual Recognition
15 0.83786166 153 iccv-2013-Face Recognition Using Face Patch Networks
16 0.83753872 294 iccv-2013-Offline Mobile Instance Retrieval with a Small Memory Footprint
17 0.83581841 352 iccv-2013-Revisiting Example Dependent Cost-Sensitive Learning with Decision Trees
18 0.82658684 322 iccv-2013-Pose Estimation and Segmentation of People in 3D Movies
19 0.82479405 323 iccv-2013-Pose Estimation with Unknown Focal Length Using Points, Directions and Lines
20 0.82277524 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences