cvpr cvpr2013 cvpr2013-5 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Go Irie, Dong Liu, Zhenguo Li, Shih-Fu Chang
Abstract: Despite significant progress, most existing visual dictionary learning methods rely on image descriptors alone or together with class labels. However, Web images are often associated with text data which may carry substantial information regarding image semantics, and may be exploited for visual dictionary learning. This paper explores this idea by leveraging relational information between image descriptors and textual words via co-clustering, in addition to information of image descriptors. Existing co-clustering methods are not optimal for this problem because they ignore the structure of image descriptors in the continuous space, which is crucial for capturing visual characteristics of images. We propose a novel Bayesian co-clustering model to jointly estimate the underlying distributions of the continuous image descriptors as well as the relationship between such distributions and the textual words through a unified Bayesian inference. Extensive experiments on image categorization and retrieval have validated the substantial value of the proposed joint modeling in improving visual dictionary learning, where our model shows superior performance over several recent methods.
Reference: text
sentIndex sentText sentNum sentScore
1 Despite significant progress, most existing visual dictionary learning methods rely on image descriptors alone or together with class labels. [sent-5, score-0.628]
2 However, Web images are often associated with text data which may carry substantial information regarding image semantics, and may be exploited for visual dictionary learning. [sent-6, score-0.468]
3 This paper explores this idea by leveraging relational information between image descriptors and textual words via co-clustering, in addition to information of image descriptors. [sent-7, score-1.1]
4 We propose a novel Bayesian co-clustering model to jointly estimate the underlying distributions of the continuous image descriptors as well as the relationship between such distributions and the textual words through a unified Bayesian inference. [sent-9, score-1.213]
5 Extensive experiments on image categorization and retrieval have validated the substantial value of the proposed joint modeling in improving visual dictionary learning, where our model shows superior performance over several recent methods. [sent-10, score-0.479]
6 The process is to first train a visual dictionary based on the extracted descriptors from an image collection and then to encode the descriptors of each image into a histogram based on the learned dictionary. [sent-14, score-0.728]
7 There has been considerable interest in visual dictionary learning which can be classified into two paradigms. [sent-16, score-0.466]
8 The second one is the supervised learning, which incorporates class labels into the visual dictionary [12, 30, 9, 10, 17]. [sent-21, score-0.483]
9 [...] existing visual dictionary learning methods are based on only single-modal information, i. [sent-25, score-0.466]
10 These facts motivate us to consider leveraging textual words for visual dictionary learning. [sent-32, score-1.136]
11 Specifically, the problem can be stated as follows: given a set of images and their associated textual words, how to learn a visual dictionary that incorporates both image and text information. [sent-33, score-1.035]
12 First, there are a large number of local descriptors extracted from the images, whose corresponding relations to the textual words are totally unknown, making it difficult to explore the multimodal correlation. [sent-35, score-1.071]
13 Second, the visual and textual spaces are completely different from each other: image descriptors are generally in a continuous space (e. [sent-36, score-0.819]
14 , SIFT is typically represented as a 128-dimensional real-valued vector) whereas textual words are in a discrete space. [sent-38, score-0.706]
15 Addressing these issues, we propose a novel approach for learning a visual dictionary from both image and text information. [sent-39, score-0.504]
16 The clusters of image descriptors are determined based on their relations with respect to the textual word clusters, which well captures the multimodal correlation. [sent-44, score-1.159]
17 Note that the textual word clusters are important for discovering the significant multimodal correlation, due to the fact that the individual word is noisy and may not convey beneficial information while the clusters of multiple words can reflect the semantic topic of the consti- [sent-45, score-1.527]
18 [...] associated textual words, we first form a relational matrix that represents the relationship between image descriptors (rows) and textual words (columns), where each element is 1 if the corresponding pair is extracted from an identical image and 0 otherwise. [sent-79, score-1.75]
19 (b) Our continuous-discrete Bayesian co-clustering (CD-BCC) jointly estimates distributions of the continuous image descriptors as well as the relationship between the image descriptor distributions and textual words. [sent-80, score-1.26]
20 Each resulting cluster of image descriptors is thus expected to (i) have consistently co-occurring textual words and (ii) be visually different from the other clusters. [sent-82, score-0.897]
21 (c) These clusters form the final visual dictionary and are used to encode an image into a single image representation vector (histogram). [sent-83, score-0.545]
22 Specifically, we investigate how to perform co-clustering along a continuous image descriptor space and a discrete textual word space simultaneously, and propose continuous-discrete Bayesian co-clustering (CD-BCC). [sent-85, score-0.956]
23 A straightforward approach may be first to quantize image descriptors using K-means and then to co-cluster quantized descriptors (visual words) and textual words. [sent-87, score-0.705]
24 Unlike these, our CD-BCC simultaneously estimates the underlying distributions of image descriptors over the continuous space and the relationship between the distributions and textual words via a unified Bayesian inference framework. [sent-89, score-1.237]
25 Consequently, each image descriptor cluster used to construct each dimension of the final image representation vector is ideally consistent with a set of textual words sharing a consistent semantic topic, as well as visually different from the other clusters. [sent-90, score-1.051]
26 Extensive experiments on five different datasets will demonstrate that the proposed multimodal visual dictionary learning approach can achieve significant performance gains when evaluated over various tasks including image classification and content-based image retrieval (CBIR). [sent-91, score-0.671]
27 Related Work We review some recent studies on visual dictionary learning, co-clustering, and multimodal topic modeling. [sent-94, score-0.732]
28 [12] trains a dictionary so as to maximize the mutual information. [sent-97, score-0.363]
29 [30, 9, 10, 17, 16] learn a single visual dictionary by jointly optimizing visual dictionaries and discriminative functions. [sent-99, score-0.557]
30 We aim to leverage weak textual words associated with images, instead of assuming strong class labels. [sent-100, score-0.73]
31 Prior supervised methods assume that class labels are mutually exclusive, which may not be reasonable in our problem because there is often strong correlation among textual words. [sent-101, score-0.647]
32 In contrast, our approach is based on co-clustering and takes into account textual word clusters which may effectively guide visual feature clustering. [sent-102, score-0.883]
33 Information theoretic co-clustering (ITCC) [4] determines clusters so as to minimize the loss of mutual information between a given relational matrix and its co-clustering results. [sent-107, score-0.431]
34 The most relevant approach to ours is Bayesian co-clustering (BCC) [22] which is a generative model of a relational matrix and estimates clusters of rows and columns in a Bayesian inference framework. [sent-109, score-0.469]
35 Unlike these previous methods, our CD-BCC is for a pair of continuous and discrete variables and jointly estimates the distributions of image descriptors over the continuous space and co-clusters of the distributions and textual words. [sent-111, score-1.073]
36 For instance, [29, 32] proposed LSA-based methods to model visual and textual words with an underlying latent topic space. [sent-116, score-0.891]
37 In particular, Li et al. [14] proposed a Bayesian multimodal topic model for visual dictionary learning. [sent-119, score-0.732]
38 Our CD-BCC is also a Bayesian model for visual dictionary learning, but ours is a co-clustering model, not a topic model. [sent-120, score-0.563]
39 Specifically, topic models assume some mixture distributions over visual words as well as textual words. [sent-121, score-1.034]
40 Our CD-BCC also assumes a mixture over image descriptors, but does not assume any distributions over textual words. [sent-122, score-0.731]
41 Instead, we assume a mixture over a relational matrix that encourages the model to identify the significant multimodal correlation from sparse and noisy relational data. [sent-123, score-0.832]
42 Multimodal Visual Dictionary Learning We assume the typical image descriptor extraction process: a set of key points are detected from each training image first, and then an image descriptor (e. [sent-125, score-0.348]
43 Then our problem is: given the initial relational matrix R and the corresponding set of image descriptors X, the goal is to find K clusters of image descriptors X used to form a visual dictionary with the assistance of the relational matrix R. [sent-141, score-1.368]
44 Illustrations of generative processes: (b) image descriptor generation and (c) relational matrix generation. [sent-146, score-0.533]
45 , a joint distribution of image descriptors X, the relational matrix R, and image descriptor clusters. [sent-150, score-0.63]
46 Based on the joint distribution, the image descriptor clusters are estimated based on its Bayesian posterior distributions under given X and R. [sent-151, score-0.453]
47 In the image descriptor generation process, all image descriptors X are generated from a mixture of V descriptor distributions over the continuous image descriptor space. [sent-158, score-0.9]
48 Image descriptor generation: N image descriptors X are generated from a mixture of V descriptor distributions Norm(μv , Σx) (v = 1, . [sent-162, score-0.65]
49 Draw mixture proportion of descriptor distributions π; γ, V ∼ Dir(γ/V). [sent-168, score-0.334]
50 For each image descriptor xi: (a) Draw descriptor distribution assignment ωi; π ∼ Mult(π) (b) Draw image descriptor xi |μ, ωi ; Σx ∼ Norm(μωi , Σx) In Step 1, mean vectors of V descriptor distributions, μ, are generated. [sent-170, score-0.767]
51 In Step 3, for each image descriptor xi, (a) one descriptor distribution ωi ∈ {1, . [sent-172, score-0.375]
52 Relational matrix generation: The relational matrix R is generated from Poisson(θk,l) as the following process: 4. [sent-176, score-0.326]
53 , each pair of image descriptor cluster and word cluster), draw co-occurrence frequency θk,l ; φ ∼ Gamma(β, φ). [sent-179, score-0.411]
54 For each descriptor distribution v and each word j, draw image descriptor cluster assignment zvx |κ ∼ Mult(κ) and word cluster assignment zjw |λ ∼ Mult(λ), respectively. [sent-183, score-0.894]
55 For each pair of descriptor distribution and word (v, j), draw element of relational matrix rv,j |Θ, zvx, zjw ∼ Poisson(θzvx,zjw). [sent-185, score-0.771]
56 Note that rv,j denotes the number of times image descriptors in the v-th descriptor distribution co-occur with the j-th word. [sent-202, score-0.449]
57 2 Visual Dictionary Inference Observing the image descriptors X and the relational matrix R, we compute the posterior distributions to infer the image descriptor clusters and the mean vectors of the descriptor distributions μ used as the visual dictionary. [sent-205, score-1.26]
58 The above generative process determines the joint distribution p(X, R, ω, zx, zw, π, κ, λ, μ, Θ). [sent-206, score-0.726]
59 μ of V descriptor distributions are estimated as μv = ? [sent-215, score-0.76]
60 rs while determines the clusters of these distributions based on their correlation to word clusters. [sent-219, score-0.438]
61 Therefore, they can be used as the visual dictionary to generate image representation for the new images. [sent-220, score-0.43]
62 R0, where R0 is the initial relational matrix between N image descriptors and W words. [sent-224, score-0.429]
63 will explain how to utilize them to generate textual information embedded image representation. [sent-225, score-0.567]
64 In computer vision, dictionary learning and coding are seen as independent processes (e. [sent-235, score-0.484]
65 This can be a clear evidence that our CD-BCC successfully captures discriminative information from textual words via zx zx. [sent-281, score-0.885]
66 Non-parametric Extension We note that we need to manually set the following three parameters: the final dictionary size K, the number of descriptor distributions V and the number of word clusters L. [sent-284, score-0.923]
67 As long as we are interested in only the final dictionary size K, the other two, V and L, are parameters. [sent-285, score-0.363]
68 [...] visual dictionary trained by our CD-BCC on the UIUC-Sport dataset. [sent-288, score-0.363]
69 These datasets are selected for direct comparison to a state-of-the-art Bayesian multimodal topic model for dictionary learning [14]. [sent-323, score-0.701]
70 [Figure panels (a) and (b); axis labels: badm, bocc, croq, polo, rock, rowi, sail, snow] [sent-344, score-0.73]
71 (a-c) Example of image descriptors (red circles) correlated to the word cluster #7 (“horse”) are overlaid on images. [sent-357, score-0.325]
72 For both datasets, we randomly sample 50000 image descriptors (SIFT) extracted from the training data, and use them for visual dictionary learning. [sent-361, score-0.59]
73 This may be because the unified learning of the intermediate descriptor distributions and their correlation to the textual words allows the visual dictionary to capture both visual and textual properties of the training images. [sent-368, score-2.166]
74 the dictionary size K where 50000 training samples are used and (b) the number of training samples in which we fix K = 256. [sent-373, score-0.363]
75 Numbers of (a) word clusters L and (b) descriptor distributions V at each iteration step. [sent-375, score-0.56]
76 Third, most co-clustering methods show higher performance than the state-of-the-art Bayesian dictionary learning [14]. [sent-377, score-0.456]
77 This suggests that co-clustering can be more promising than multimodal topic modeling for multimodal visual dictionary learning. [sent-378, score-0.937]
78 One reason can be that CD-BCC successfully discovers image descriptor clusters related to a specific image category via co-clustered textual words. [sent-390, score-0.856]
79 5(a-c) show that image descriptors correlated to the “horse” word cluster are actually extracted from the parts of horses. [sent-394, score-0.347]
80 We also analyze the performance when varying dictionary size K and the number of training samples (image descriptors). [sent-395, score-0.387]
81 Similar to most existing visual dictionary learning methods, the performance is somewhat sensitive to K. [sent-398, score-0.466]
82 The major reason can be that our CD-BCC trains a visual dictionary based on a statistical relationship between distributions of image descriptors and textual words, which can be stable (robust) against the number of training samples as well as ways to choose them. [sent-403, score-1.298]
83 7 shows estimated V (the number of descriptor distributions) and L (the number of clusters for text words) at each Gibbs sampler iteration step. [sent-405, score-0.354]
84 We select this dataset because this is frequently used to evaluate the performance of visual dictionary learning methods. [sent-410, score-0.466]
85 Caltech101 does not originally include any textual words; we thus directly use the class labels as textual words. [sent-414, score-1.158]
86 We employ sparse coding with SPM [13] and linear SVM to perform image categorization based on the visual dictionary trained by our CD-BCC. [sent-415, score-0.59]
87 We compare ours to ScSPM [27] (sparse coding and SPM with K-means dictionary + linear SVM), two visual dictionary learning methods (KSVD [1] and LLC [25]), and four co-clustering based methods (SCC, ITCC, NMTF, and BCC). [sent-416, score-0.914]
88 Note that our method is also comparable to some other recent supervised visual dictionary learning methods like [12, 30]. [sent-420, score-0.495]
89 The performance may be further improved by combining our visual dictionary with more sophisticated coding approaches like LLC. [sent-421, score-0.515]
90 Results on various dictionary sizes K ∈ {8, 16, 32, 64, 128, 256, 512} are reported. [sent-432, score-0.363]
91 For textual words, we first extracted only noun terms from all the documents, and selected 300 most frequent words (W = 300). [sent-440, score-0.728]
92 These results suggest that (i) leveraging textual words via co-clustering is effective for CBIR, and (ii) our CD-BCC is a promising co-clustering approach for this purpose. [sent-449, score-0.706]
93 Conclusion Focusing on the scenario where images are associated with textual words, we presented a Bayesian approach to multimodal visual dictionary learning. [sent-454, score-1.202]
94 We proposed a novel Bayesian co-clustering, CD-BCC, to learn a single visual dictionary based on the distributions of image descriptors over the continuous space, as well as the relationship between image descriptors and textual words. [sent-455, score-1.483]
95 [...] experiments validated the value of textual words in improving visual dictionary learning, where our model showed superior performance over several recent methods. [sent-462, score-1.16]
96 Sampling Distribution The sampling distributions for ω, zx, and zw are: p(ωi = t|X, R, ω−i, zx, zw) ∝ (mt,−i + γ/V) × ? [sent-465, score-0.407]
97 where mt/mt,−i is the number of image descriptors in the t-th descriptor distribution with/without the i-th image descriptor. [sent-488, score-0.339]
98 mkx,−v (mlw,−j) is the number of descriptor distributions (textual words) in the k-th (l-th) cluster without v? [sent-490, score-0.364]
99 Learning a discriminative dictionary for sparse coding via label consistent k-svd. [sent-566, score-0.474]
100 On the integration of topic modeling and dictionary learning. [sent-596, score-0.46]
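The relational matrices described in the extracted sentences above can be illustrated with a small sketch. It assumes one plausible reading of the text: R0 is the binary descriptor-by-word matrix (1 if the descriptor and the word come from the same image), and R is obtained by summing the rows of R0 that share a descriptor-distribution assignment, so that r[v][j] counts co-occurrences of the v-th distribution with the j-th word. The function name, argument names, and toy data are hypothetical and do not come from the paper.

```python
def build_relational_matrices(descriptor_image_ids, image_words, vocabulary,
                              assignments, num_distributions):
    """Return (R0, R) as nested lists.

    R0[i][j] is 1 if descriptor i and word j come from the same image, else 0.
    R[v][j] counts the descriptors assigned to distribution v that co-occur
    with word j (assumed aggregation; the paper's exact formula is not shown).
    """
    word_index = {word: j for j, word in enumerate(vocabulary)}
    num_words = len(vocabulary)

    # Binary descriptor-by-word matrix.
    r0 = []
    for image_id in descriptor_image_ids:
        row = [0] * num_words
        for word in image_words.get(image_id, ()):
            if word in word_index:
                row[word_index[word]] = 1
        r0.append(row)

    # Sum the rows of R0 that share the same descriptor-distribution index.
    r = [[0] * num_words for _ in range(num_distributions)]
    for row, v in zip(r0, assignments):
        for j, value in enumerate(row):
            r[v][j] += value
    return r0, r


# Toy data (hypothetical): 4 descriptors from 2 images, 3-word vocabulary.
descriptor_image_ids = ["img0", "img0", "img1", "img1"]
image_words = {"img0": ["horse", "grass"], "img1": ["horse", "polo"]}
vocabulary = ["horse", "grass", "polo"]
assignments = [0, 1, 0, 1]  # descriptor -> descriptor-distribution index

R0, R = build_relational_matrices(
    descriptor_image_ids, image_words, vocabulary, assignments, 2)
```

Because every descriptor of an image inherits all of that image's words, R0 is dense within an image and carries no localization; under this reading, the co-clustering step is what has to separate informative co-occurrences from noisy ones.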
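The generative process described in the extracted sentences above (Dirichlet mixture proportions, Gaussian descriptors, Gamma co-occurrence rates, multinomial cluster assignments, Poisson relational entries) can be sketched as ancestral sampling. This is only a sketch under several assumptions the text does not settle: the prior on the means μ is taken to be standard normal, Σx is taken to be isotropic (sigma squared times the identity), κ and λ are drawn from symmetric Dirichlet priors, and Gamma(β, φ) is read as shape β and scale φ. The function names are hypothetical, and the sketch covers sampling only, not the paper's inference procedure.

```python
import math
import random


def sample_dirichlet(rng, alpha, size):
    """Symmetric Dirichlet draw via normalized Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_poisson(rng, rate):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_cd_bcc(rng, N, V, K, L, W, dim,
                  gamma=1.0, beta=1.0, phi=1.0, sigma=0.1):
    # Means of the V descriptor distributions (prior is an assumption).
    mu = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(V)]
    # Mixture proportions: pi ~ Dir(gamma / V).
    pi = sample_dirichlet(rng, gamma / V, V)
    # Per-descriptor assignment omega_i ~ Mult(pi), then x_i ~ Norm(mu, Sigma_x).
    omega = rng.choices(range(V), weights=pi, k=N)
    X = [[rng.gauss(m, sigma) for m in mu[v]] for v in omega]
    # Co-occurrence rate for each (descriptor cluster, word cluster) pair.
    theta = [[rng.gammavariate(beta, phi) for _ in range(L)] for _ in range(K)]
    # Cluster proportions (assumed symmetric Dirichlet) and assignments.
    kappa = sample_dirichlet(rng, 1.0, K)
    lam = sample_dirichlet(rng, 1.0, L)
    zx = rng.choices(range(K), weights=kappa, k=V)
    zw = rng.choices(range(L), weights=lam, k=W)
    # Relational matrix: r[v][j] ~ Poisson(theta[zx[v]][zw[j]]).
    R = [[sample_poisson(rng, theta[zx[v]][zw[j]]) for j in range(W)]
         for v in range(V)]
    return X, R, omega, zx, zw


rng = random.Random(0)
X, R_sampled, omega, zx, zw = sample_cd_bcc(rng, N=20, V=4, K=2, L=3, W=5, dim=2)
```

Only the standard library is used, so the Dirichlet and Poisson draws are written out by hand; with NumPy available, `numpy.random.Generator` offers `dirichlet` and `poisson` directly.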
wordName wordTfidf (topN-words)
[('textual', 0.567), ('dictionary', 0.363), ('relational', 0.256), ('multimodal', 0.205), ('zx', 0.179), ('descriptor', 0.174), ('zvx', 0.152), ('words', 0.139), ('descriptors', 0.138), ('distributions', 0.137), ('word', 0.134), ('bayesian', 0.123), ('clusters', 0.115), ('cbir', 0.114), ('topic', 0.097), ('zjw', 0.095), ('zw', 0.091), ('polo', 0.089), ('coding', 0.085), ('zq', 0.078), ('flickr', 0.076), ('wikipedia', 0.074), ('visual', 0.067), ('mult', 0.063), ('dictionaries', 0.06), ('bcc', 0.057), ('cdbcc', 0.057), ('coclustering', 0.057), ('country', 0.057), ('nmtf', 0.057), ('xq', 0.057), ('snow', 0.054), ('cluster', 0.053), ('labelme', 0.051), ('scc', 0.051), ('draw', 0.05), ('categorization', 0.049), ('badminton', 0.047), ('continuous', 0.047), ('ml', 0.045), ('bovw', 0.044), ('croquet', 0.044), ('dir', 0.042), ('llc', 0.042), ('generative', 0.039), ('rock', 0.039), ('badm', 0.038), ('boarding', 0.038), ('bocc', 0.038), ('croq', 0.038), ('itcc', 0.038), ('mbw', 0.038), ('rowi', 0.038), ('text', 0.038), ('learning', 0.036), ('kdd', 0.036), ('matrix', 0.035), ('continuousdiscrete', 0.034), ('bocce', 0.031), ('sail', 0.031), ('texts', 0.031), ('tags', 0.031), ('generation', 0.029), ('dirichlet', 0.029), ('supervised', 0.029), ('gibbs', 0.028), ('articles', 0.028), ('collapsed', 0.028), ('correlation', 0.027), ('distribution', 0.027), ('posterior', 0.027), ('coast', 0.027), ('highway', 0.027), ('sampler', 0.027), ('categories', 0.027), ('mixture', 0.027), ('sparse', 0.026), ('relationship', 0.026), ('frequencies', 0.025), ('determines', 0.025), ('blei', 0.024), ('periments', 0.024), ('analyze', 0.024), ('horse', 0.024), ('class', 0.024), ('documents', 0.024), ('inference', 0.024), ('gamma', 0.023), ('xue', 0.023), ('proportion', 0.023), ('hyperparameters', 0.022), ('extracted', 0.022), ('xi', 0.022), ('bipartite', 0.022), ('unified', 0.022), ('semantic', 0.021), ('nonparametric', 0.021), ('latent', 0.021), ('distinguishes', 0.021), ('spm', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 5 cvpr-2013-A Bayesian Approach to Multimodal Visual Dictionary Learning
2 0.31167445 296 cvpr-2013-Multi-level Discriminative Dictionary Learning towards Hierarchical Visual Categorization
Author: Li Shen, Shuhui Wang, Gang Sun, Shuqiang Jiang, Qingming Huang
Abstract: For the task of visual categorization, the learning model is expected to be endowed with discriminative visual feature representation and flexibilities in processing many categories. Many existing approaches are designed based on a flat category structure, or rely on a set of pre-computed visual features, hence may not be appreciated for dealing with large numbers of categories. In this paper, we propose a novel dictionary learning method by taking advantage of hierarchical category correlation. For each internode of the hierarchical category structure, a discriminative dictionary and a set of classification models are learnt for visual categorization, and the dictionaries in different layers are learnt to exploit the discriminative visual properties of different granularity. Moreover, the dictionaries in lower levels also inherit the dictionary of ancestor nodes, so that categories in lower levels are described with multi-scale visual information using our dictionary learning approach. Experiments on ImageNet object data subset and SUN397 scene dataset demonstrate that our approach achieves promising performance on data with large numbers of classes compared with some state-of-the-art methods, and is more efficient in processing large numbers of categories.
3 0.26848182 392 cvpr-2013-Separable Dictionary Learning
Author: Simon Hawe, Matthias Seibert, Martin Kleinsteuber
Abstract: Many techniques in computer vision, machine learning, and statistics rely on the fact that a signal of interest admits a sparse representation over some dictionary. Dictionaries are either available analytically, or can be learned from a suitable training set. While analytic dictionaries permit to capture the global structure of a signal and allow a fast implementation, learned dictionaries often perform better in applications as they are more adapted to the considered class of signals. In imagery, unfortunately, the numerical burden for (i) learning a dictionary and for (ii) employing the dictionary for reconstruction tasks only allows to deal with relatively small image patches that only capture local image information. The approach presented in this paper aims at overcoming these drawbacks by allowing a separable structure on the dictionary throughout the learning process. On the one hand, this permits larger patch-sizes for the learning phase, on the other hand, the dictionary is applied efficiently in reconstruction tasks. The learning procedure is based on optimizing over a product of spheres which updates the dictionary as a whole, thus enforces basic dictionary properties, such as mutual coherence, explicitly during the learning procedure. In the special case where no separable structure is enforced, our method competes with state-of-the-art dictionary learning methods like K-SVD.
4 0.26711231 257 cvpr-2013-Learning Structured Low-Rank Representations for Image Classification
Author: Yangmuzi Zhang, Zhuolin Jiang, Larry S. Davis
Abstract: An approach to learn a structured low-rank representation for image classification is presented. We use a supervised learning method to construct a discriminative and reconstructive dictionary. By introducing an ideal regularization term, we perform low-rank matrix recovery for contaminated training data from all categories simultaneously without losing structural information. A discriminative low-rank representation for images with respect to the constructed dictionary is obtained. With semantic structure information and strong identification capability, this representation is good for classification tasks even using a simple linear multi-classifier. Experimental results demonstrate the effectiveness of our approach.
Author: Li He, Hairong Qi, Russell Zaretzki
Abstract: This paper addresses the problem of learning overcomplete dictionaries for the coupled feature spaces, where the learned dictionaries also reflect the relationship between the two spaces. A Bayesian method using a beta process prior is applied to learn the over-complete dictionaries. Compared to previous coupled feature spaces dictionary learning algorithms, our algorithm not only provides dictionaries that are customized to each feature space, but also adds more consistent and accurate mapping between the two feature spaces. This is due to the unique property of the beta process model that the sparse representation can be decomposed to values and dictionary atom indicators. The proposed algorithm is able to learn sparse representations that correspond to the same dictionary atoms with the same sparsity but different values in coupled feature spaces, thus bringing consistent and accurate mapping between coupled feature spaces. Another advantage of the proposed method is that the number of dictionary atoms and their relative importance may be inferred non-parametrically. We compare the proposed approach to several state-of-the-art dictionary learning methods by applying this method to single image super-resolution. The experimental results show that dictionaries learned by our method produce the best super-resolution results compared to other state-of-the-art methods.
6 0.23850198 185 cvpr-2013-Generalized Domain-Adaptive Dictionaries
7 0.23566265 66 cvpr-2013-Block and Group Regularized Sparse Modeling for Dictionary Learning
8 0.23378363 315 cvpr-2013-Online Robust Dictionary Learning
9 0.18002596 422 cvpr-2013-Tag Taxonomy Aware Dictionary Learning for Region Tagging
10 0.16830944 125 cvpr-2013-Dictionary Learning from Ambiguously Labeled Data
11 0.15837173 302 cvpr-2013-Multi-task Sparse Learning with Beta Process Prior for Action Recognition
12 0.14121701 25 cvpr-2013-A Sentence Is Worth a Thousand Pixels
13 0.12381136 53 cvpr-2013-BFO Meets HOG: Feature Extraction Based on Histograms of Oriented p.d.f. Gradients for Image Classification
14 0.12092886 178 cvpr-2013-From Local Similarity to Global Coding: An Application to Image Classification
15 0.11426979 419 cvpr-2013-Subspace Interpolation via Dictionary Learning for Unsupervised Domain Adaptation
16 0.11411607 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
17 0.11342743 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition
18 0.10267235 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
19 0.10193618 434 cvpr-2013-Topical Video Object Discovery from Key Frames by Modeling Word Co-occurrence Prior
20 0.099240869 456 cvpr-2013-Visual Place Recognition with Repetitive Structures
topicId topicWeight
[(0, 0.183), (1, -0.178), (2, -0.21), (3, 0.233), (4, -0.077), (5, -0.082), (6, 0.043), (7, 0.113), (8, -0.079), (9, 0.056), (10, 0.001), (11, 0.045), (12, 0.044), (13, 0.036), (14, 0.046), (15, -0.023), (16, 0.043), (17, 0.033), (18, 0.024), (19, -0.078), (20, 0.082), (21, -0.0), (22, 0.057), (23, 0.043), (24, -0.036), (25, 0.033), (26, -0.004), (27, 0.084), (28, -0.031), (29, -0.041), (30, 0.061), (31, 0.036), (32, -0.045), (33, 0.051), (34, 0.026), (35, -0.043), (36, -0.023), (37, 0.091), (38, 0.022), (39, -0.069), (40, 0.044), (41, -0.039), (42, -0.103), (43, 0.041), (44, -0.009), (45, 0.019), (46, -0.038), (47, 0.01), (48, 0.046), (49, -0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.95337611 5 cvpr-2013-A Bayesian Approach to Multimodal Visual Dictionary Learning
2 0.84163642 296 cvpr-2013-Multi-level Discriminative Dictionary Learning towards Hierarchical Visual Categorization
Author: Li Shen, Shuhui Wang, Gang Sun, Shuqiang Jiang, Qingming Huang
Abstract: For the task of visual categorization, the learning model is expected to be endowed with discriminative visual feature representation and flexibilities in processing many categories. Many existing approaches are designed based on a flat category structure, or rely on a set of pre-computed visual features, hence may not be appreciated for dealing with large numbers of categories. In this paper, we propose a novel dictionary learning method by taking advantage of hierarchical category correlation. For each internode of the hierarchical category structure, a discriminative dictionary and a set of classification models are learnt for visual categorization, and the dictionaries in different layers are learnt to exploit the discriminative visual properties of different granularity. Moreover, the dictionaries in lower levels also inherit the dictionary of ancestor nodes, so that categories in lower levels are described with multi-scale visual information using our dictionary learning approach. Experiments on ImageNet object data subset and SUN397 scene dataset demonstrate that our approach achieves promising performance on data with large numbers of classes compared with some state-of-the-art methods, and is more efficient in processing large numbers of categories.
3 0.8242498 66 cvpr-2013-Block and Group Regularized Sparse Modeling for Dictionary Learning
Author: Yu-Tseh Chi, Mohsen Ali, Ajit Rajwade, Jeffrey Ho
Abstract: This paper proposes a dictionary learning framework that combines the proposed block/group (BGSC) or reconstructed block/group (R-BGSC) sparse coding schemes with the novel Intra-block Coherence Suppression Dictionary Learning (ICS-DL) algorithm. An important and distinguishing feature of the proposed framework is that all dictionary blocks are trained simultaneously with respect to each data group while the intra-block coherence is explicitly minimized as an important objective. We provide both empirical evidence and heuristic support for this feature that can be considered as a direct consequence of incorporating both the group structure for the input data and the block structure for the dictionary in the learning process. The optimization problems for both the dictionary learning and sparse coding can be solved efficiently using block-gradient descent, and the details of the optimization algorithms are presented. We evaluate the proposed methods using well-known datasets, and favorable comparisons with state-of-the-art dictionary learning methods demonstrate the viability and validity of the proposed framework.
Author: Li He, Hairong Qi, Russell Zaretzki
Abstract: This paper addresses the problem of learning overcomplete dictionaries for the coupled feature spaces, where the learned dictionaries also reflect the relationship between the two spaces. A Bayesian method using a beta process prior is applied to learn the over-complete dictionaries. Compared to previous coupled feature space dictionary learning algorithms, our algorithm not only provides dictionaries that are customized to each feature space, but also adds more consistent and accurate mapping between the two feature spaces. This is due to the unique property of the beta process model that the sparse representation can be decomposed to values and dictionary atom indicators. The proposed algorithm is able to learn sparse representations that correspond to the same dictionary atoms with the same sparsity but different values in coupled feature spaces, thus bringing consistent and accurate mapping between coupled feature spaces. Another advantage of the proposed method is that the number of dictionary atoms and their relative importance may be inferred non-parametrically. We compare the proposed approach to several state-of-the-art dictionary learning methods by applying this method to single image super-resolution. The experimental results show that dictionaries learned by our method produce the best super-resolution results compared to other state-of-the-art methods.
5 0.80199504 315 cvpr-2013-Online Robust Dictionary Learning
Author: Cewu Lu, Jiaping Shi, Jiaya Jia
Abstract: Online dictionary learning is particularly useful for processing large-scale and dynamic data in computer vision. It, however, faces the major difficulty of incorporating robust functions, rather than the square data fitting term, to handle outliers in training data. In this paper, we propose a new online framework enabling the use of the ℓ1 sparse data fitting term in robust dictionary learning, notably enhancing the usability and practicality of this important technique. Extensive experiments have been carried out to validate our new framework.
6 0.79887319 392 cvpr-2013-Separable Dictionary Learning
7 0.77384758 257 cvpr-2013-Learning Structured Low-Rank Representations for Image Classification
8 0.72947007 125 cvpr-2013-Dictionary Learning from Ambiguously Labeled Data
9 0.65438342 185 cvpr-2013-Generalized Domain-Adaptive Dictionaries
10 0.6525138 422 cvpr-2013-Tag Taxonomy Aware Dictionary Learning for Region Tagging
11 0.62285066 302 cvpr-2013-Multi-task Sparse Learning with Beta Process Prior for Action Recognition
12 0.59093595 83 cvpr-2013-Classification of Tumor Histology via Morphometric Context
13 0.56570554 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
14 0.54706055 178 cvpr-2013-From Local Similarity to Global Coding: An Application to Image Classification
15 0.54020321 220 cvpr-2013-In Defense of Sparsity Based Face Recognition
16 0.53885859 8 cvpr-2013-A Fast Approximate AIB Algorithm for Distributional Word Clustering
17 0.53780669 275 cvpr-2013-Lp-Norm IDF for Large Scale Image Search
18 0.52528203 53 cvpr-2013-BFO Meets HOG: Feature Extraction Based on Histograms of Oriented p.d.f. Gradients for Image Classification
19 0.50752324 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition
20 0.50170928 200 cvpr-2013-Harvesting Mid-level Visual Concepts from Large-Scale Internet Images
topicId topicWeight
[(2, 0.219), (10, 0.089), (16, 0.017), (26, 0.041), (28, 0.027), (33, 0.279), (39, 0.011), (67, 0.097), (69, 0.05), (76, 0.012), (77, 0.023), (87, 0.048)]
simIndex simValue paperId paperTitle
1 0.86501801 361 cvpr-2013-Robust Feature Matching with Alternate Hough and Inverted Hough Transforms
Author: Hsin-Yi Chen, Yen-Yu Lin, Bing-Yu Chen
Abstract: We present an algorithm that carries out alternate Hough transform and inverted Hough transform to establish feature correspondences, and enhances the quality of matching in both precision and recall. Inspired by the fact that nearby features on the same object share coherent homographies in matching, we cast the task of feature matching as a density estimation problem in the Hough space spanned by the hypotheses of homographies. Specifically, we project all the correspondences into the Hough space, and determine the correctness of the correspondences by their respective densities. In this way, mutual verification of relevant correspondences is activated, and the precision of matching is boosted. On the other hand, we infer the concerted homographies propagated from the locally grouped features, and enrich the correspondence candidates for each feature. The recall is hence increased. The two processes are tightly coupled. Through iterative optimization, plausible enrichments are gradually revealed while more correct correspondences are detected. Promising experimental results on three benchmark datasets manifest the effectiveness of the proposed approach.
same-paper 2 0.85421288 5 cvpr-2013-A Bayesian Approach to Multimodal Visual Dictionary Learning
Author: Go Irie, Dong Liu, Zhenguo Li, Shih-Fu Chang
Abstract: Despite significant progress, most existing visual dictionary learning methods rely on image descriptors alone or together with class labels. However, Web images are often associated with text data which may carry substantial information regarding image semantics, and may be exploited for visual dictionary learning. This paper explores this idea by leveraging relational information between image descriptors and textual words via co-clustering, in addition to information of image descriptors. Existing co-clustering methods are not optimal for this problem because they ignore the structure of image descriptors in the continuous space, which is crucial for capturing visual characteristics of images. We propose a novel Bayesian co-clustering model to jointly estimate the underlying distributions of the continuous image descriptors as well as the relationship between such distributions and the textual words through a unified Bayesian inference. Extensive experiments on image categorization and retrieval have validated the substantial value of the proposed joint modeling in improving visual dictionary learning, where our model shows superior performance over several recent methods.
3 0.85092652 162 cvpr-2013-FasT-Match: Fast Affine Template Matching
Author: Simon Korman, Daniel Reichman, Gilad Tsur, Shai Avidan
Abstract: Fast-Match is a fast algorithm for approximate template matching under 2D affine transformations that minimizes the Sum-of-Absolute-Differences (SAD) error measure. There is a huge number of transformations to consider but we prove that they can be sampled using a density that depends on the smoothness of the image. For each potential transformation, we approximate the SAD error using a sublinear algorithm that randomly examines only a small number of pixels. We further accelerate the algorithm using a branch-and-bound scheme. As images are known to be piecewise smooth, the result is a practical affine template matching algorithm with approximation guarantees, that takes a few seconds to run on a standard machine. We perform several experiments on three different datasets, and report very good results. To the best of our knowledge, this is the first template matching algorithm which is guaranteed to handle arbitrary 2D affine transformations.
4 0.81827587 119 cvpr-2013-Detecting and Aligning Faces by Image Retrieval
Author: Xiaohui Shen, Zhe Lin, Jonathan Brandt, Ying Wu
Abstract: Detecting faces in uncontrolled environments continues to be a challenge to traditional face detection methods [24] due to the large variation in facial appearances, as well as occlusion and clutter. In order to overcome these challenges, we present a novel and robust exemplar-based face detector that integrates image retrieval and discriminative learning. A large database of faces with bounding rectangles and facial landmark locations is collected, and simple discriminative classifiers are learned from each of them. A voting-based method is then proposed to let these classifiers cast votes on the test image through an efficient image retrieval technique. As a result, faces can be very efficiently detected by selecting the modes from the voting maps, without resorting to exhaustive sliding window-style scanning. Moreover, due to the exemplar-based framework, our approach can detect faces under challenging conditions without explicitly modeling their variations. Evaluation on two public benchmark datasets shows that our new face detection approach is accurate and efficient, and achieves the state-of-the-art performance. We further propose to use image retrieval for face validation (in order to remove false positives) and for face alignment/landmark localization. The same methodology can also be easily generalized to other face-related tasks, such as attribute recognition, as well as general object detection.
5 0.81559283 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
Author: Ishani Chakraborty, Hui Cheng, Omar Javed
Abstract: We present a unified framework for detecting and classifying people interactions in unconstrained user generated images. Unlike previous approaches that directly map people/face locations in 2D image space into features for classification, we first estimate camera viewpoint and people positions in 3D space and then extract spatial configuration features from explicit 3D people positions. This approach has several advantages. First, it can accurately estimate relative distances and orientations between people in 3D. Second, it encodes spatial arrangements of people into a richer set of shape descriptors than afforded in 2D. Our 3D shape descriptors are invariant to camera pose variations often seen in web images and videos. The proposed approach also estimates camera pose and uses it to capture the intent of the photo. To achieve accurate 3D people layout estimation, we develop an algorithm that robustly fuses semantic constraints about human interpositions into a linear camera model. This enables our model to handle large variations in people size, heights (e.g. age) and poses. An accurate 3D layout also allows us to construct features informed by Proxemics that improves our semantic classification. To characterize the human interaction space, we introduce visual proxemes; a set of prototypical patterns that represent commonly occurring social interactions in events. We train a discriminative classifier that classifies 3D arrangements of people into visual proxemes and quantitatively evaluate the performance on a large, challenging dataset.
6 0.81550086 254 cvpr-2013-Learning SURF Cascade for Fast and Accurate Object Detection
7 0.81528056 94 cvpr-2013-Context-Aware Modeling and Recognition of Activities in Video
8 0.81348908 122 cvpr-2013-Detection Evolution with Multi-order Contextual Co-occurrence
9 0.81344849 204 cvpr-2013-Histograms of Sparse Codes for Object Detection
10 0.81301975 322 cvpr-2013-PISA: Pixelwise Image Saliency by Aggregating Complementary Appearance Contrast Measures with Spatial Priors
11 0.81251019 339 cvpr-2013-Probabilistic Graphlet Cut: Exploiting Spatial Structure Cue for Weakly Supervised Image Segmentation
12 0.81241429 416 cvpr-2013-Studying Relationships between Human Gaze, Description, and Computer Vision
13 0.8122704 438 cvpr-2013-Towards Pose Robust Face Recognition
14 0.81195438 318 cvpr-2013-Optimized Pedestrian Detection for Multiple and Occluded People
15 0.81142092 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
16 0.81112003 221 cvpr-2013-Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection
17 0.81063992 60 cvpr-2013-Beyond Physical Connections: Tree Models in Human Pose Estimation
18 0.80998832 334 cvpr-2013-Pose from Flow and Flow from Pose
19 0.80987257 202 cvpr-2013-Hierarchical Saliency Detection
20 0.80973238 345 cvpr-2013-Real-Time Model-Based Rigid Object Pose Estimation and Tracking Combining Dense and Sparse Visual Cues