iccv iccv2013 iccv2013-73 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos
Abstract: An extension of the latent Dirichlet allocation (LDA), denoted class-specific-simplex LDA (css-LDA), is proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. The proposed css-LDA is an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. This model can be used for generative classification using the Bayes decision rule or even extended to discriminative classification with support vector machines (SVMs). A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. SVM-based discriminants can be learned for classes in the space of these histograms. The effectiveness of css-LDA model in both generative and discriminative classification frameworks is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform all existing LDA based image classification approaches.
Reference: text
sentIndex sentText sentNum sentScore
1 An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. [sent-8, score-0.524]
2 This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. [sent-9, score-0.449]
3 To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. [sent-10, score-0.408]
4 a single set of topics shared across classes is replaced by multiple class-specific topic sets. [sent-14, score-0.629]
5 A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. [sent-16, score-0.498]
6 We refer to this as the flat model, due to its lack of hierarchical word groupings. [sent-25, score-0.34]
7 Popular examples include hierarchical topic models, such as latent Dirichlet allocation (LDA) [2] and probabilistic latent semantic analysis (pLSA) [9]. [sent-28, score-0.441]
8 Since LDA and pLSA topics are discovered in an unsupervised fashion, these models have limited use for classification. [sent-30, score-0.377]
9 One popular extension is to apply a classifier, such as an SVM, to the topic representation [2, 3, 14]. [sent-32, score-0.375]
10 We refer to these as discriminant extensions, and the combination of SVM with LDA topic vectors as SVM-LDA. [sent-33, score-0.348]
11 Another popular generative extension is to directly equate the topics with the class labels themselves, establishing a one-to-one mapping between topics and class labels, e.g. [sent-44, score-0.946]
12 Theoretical analysis shows that the impact of class information on the topics discovered by cLDA and sLDA is very weak in general, and vanishes for large samples. [sent-50, score-0.481]
13 Experiments demonstrate that the classification accuracies of cLDA and sLDA are not superior to those of unsupervised topic discovery. [sent-51, score-0.405]
14 Topic-supervision establishes a much stronger correlation between the topics and the class labels; nevertheless, these models are unable to outperform the simple flat model. [sent-53, score-0.65]
15 In fact, we show that topic-supervised models are fundamentally no different from the flat model. [sent-54, score-0.594]
16 To combine the labeling strength of topic-supervision with the flexibility of topic-discovery of LDA, we propose a novel classification architecture, denoted class-specific simplex LDA (css-LDA). [sent-55, score-0.298]
17 Inspired by the flat model, css-LDA differs from the existing LDA extensions in that supervision is introduced directly at the level of image features. [sent-56, score-0.31]
18 This induces the discovery of class-specific topic simplices and, consequently, class-specific topic distributions, enabling a much richer modeling of intra-class structure without compromising discrimination ability. [sent-57, score-0.784]
19 This image representation can be described as a set of topic specific word counts, where topics are informed by class labels. [sent-61, score-0.852]
20 In the absence of topic structure and supervision, this vector reduces to the standard BoW histogram [6, 17]. [sent-62, score-0.358]
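A minimal sketch of the reduction just described, assuming visual features have already been quantized into codeword indices in {0, ..., V-1}; the function name `bow_histogram` and the toy indices are illustrative, not from the paper:

```python
import numpy as np

def bow_histogram(word_indices, vocab_size, normalize=True):
    """Count codeword occurrences for a single image."""
    hist = np.bincount(word_indices, minlength=vocab_size).astype(float)
    if normalize:
        hist /= max(hist.sum(), 1.0)  # relative word frequencies
    return hist

# Example: an image with N = 8 quantized features over a 5-word codebook.
print(bow_histogram(np.array([0, 2, 2, 4, 1, 2, 0, 4]), vocab_size=5))
```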
21 In this model, shown in Figure 1(c), a class variable Y is introduced as the parent of the topic prior Π. [sent-114, score-0.475]
22 In this way, each class defines a prior distribution in topic space P_{Π|Y}(π|y; α_y), conditioned on which the topic probability vector π is sampled. [sent-115, score-0.881]
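A hedged sketch of the cLDA generative process as described here: the class y selects a Dirichlet parameter α_y over the topic simplex, a topic vector π is drawn from it, and each word is drawn from its assigned topic. Dimensions, parameter values, and names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 3, 10                                  # topics, vocabulary size
alpha = {0: np.array([4.0, 1.0, 1.0]),        # per-class Dirichlet parameters alpha_y
         1: np.array([1.0, 1.0, 4.0])}
Lambda = rng.dirichlet(np.ones(V), size=K)    # shared topic-word distributions (K x V)

def sample_image(y, n_words=20):
    pi = rng.dirichlet(alpha[y])              # pi ~ P(pi | y; alpha_y)
    z = rng.choice(K, size=n_words, p=pi)     # per-word topic assignments
    w = np.array([rng.choice(V, p=Lambda[k]) for k in z])  # words from topics
    return w

print(sample_image(y=0))
```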
23 As shown in Figure 1(d), the class variable Y is conditioned on the topics Z. [sent-121, score-0.433]
24 Topic-conditional word distributions are learned with supervision and aligned with the class-conditional distributions of the flat model. [sent-139, score-0.593]
25 Topic-Supervised Approaches. Another popular approach to introducing supervision in LDA is to equate topics directly with the class labels. [sent-144, score-0.499]
26 The resulting extension is denoted as topic supervised LDA (tsLDA) [20, 15]. [sent-145, score-0.394]
27 The graphical model of the topic supervised extension of any LDA model is exactly the same as that of the model without topic supervision. [sent-146, score-0.804]
28 The only difference, subtle yet significant, is that the topics are no longer discovered but specified. [sent-147, score-0.281]
29 This makes the topic-conditional distributions identical to the class-conditional distributions of the flat model. [sent-148, score-0.412]
30 Figure 2(a) illustrates the two dimensional simplex of distributions over three words. [sent-160, score-0.367]
31 Each topic in an LDA model defines a probability distribution over words and is represented as a point on the word simplex. [sent-167, score-0.548]
32 Since topic probabilities are mixing probabilities for word distributions, a set of K topics defines a K−1 simplex in the word simplex, here denoted the topic simplex. [sent-168, score-0.749]
33 If the number of topics K is strictly smaller than the number of words |V|, the topic simplex is a low-dimensional subsimplex of the word simplex. [sent-169, score-0.884]
34 The projection of images onto the topic simplex can be thought of as dimensionality reduction. [sent-170, score-0.594]
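A small numeric illustration of this geometry, under assumed toy dimensions: two topic distributions on a three-word simplex span a one-dimensional sub-simplex (a line segment), and mapping an image to its K mixing weights is the dimensionality reduction from |V| word dimensions to K topic dimensions:

```python
import numpy as np

K, V = 2, 3
Lambda = np.array([[0.7, 0.2, 0.1],    # topic 1, a point on the 3-word simplex
                   [0.1, 0.3, 0.6]])   # topic 2
for pi1 in np.linspace(0.0, 1.0, 5):
    pi = np.array([pi1, 1.0 - pi1])    # mixing weights on the K-1 simplex
    p_words = pi @ Lambda              # a point on the segment spanned by the topics
    assert np.isclose(p_words.sum(), 1.0)
    print(pi, "->", p_words)
```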
35 In Figure 2(b), the two topics are represented by Λ1 and Λ2, and span a one-dimensional simplex, shown as a connecting line segment. [sent-171, score-0.281]
36 In cLDA, each class defines a distribution (parameterized by αy) on the topic simplex. [sent-172, score-0.51]
37 Similar to cLDA, sLDA can be represented on the topic simplex, where each class defines a softmax function3. [sent-174, score-0.542]
38 Figure 2(c) shows the schematic of ts-cLDA for a two class problem on a three word simplex. [sent-175, score-0.273]
39 While the topic distributions of cLDA, learned by topic discovery, can be positioned anywhere on the word simplex, those of ts-cLDA are specified, and identical to the class-conditional distributions of the flat model. [sent-178, score-1.206]
40 (Footnote 3) Strictly speaking, the softmax function is defined on the average of the sampled topic assignment labels z. [sent-179, score-0.389]
41 However, when the number of features N is sufficiently large, z is proportional to the topic distribution π. [sent-180, score-0.357]
42 Thus, the softmax function can be thought of as defined on the topic simplex. [sent-181, score-0.389]
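A minimal sketch of this view of the sLDA class posterior: with many features the average topic assignment approaches π, so the posterior is approximately a softmax over the topic proportions. The weight matrix `zeta` is an assumed, illustrative stand-in for the per-class parameters ζ:

```python
import numpy as np

def class_posterior(pi, zeta):
    """pi: (K,) topic proportions; zeta: (C, K) per-class softmax weights."""
    scores = zeta @ pi
    scores -= scores.max()          # subtract max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

print(class_posterior(np.array([0.7, 0.2, 0.1]),
                      np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, 1.0]])))
```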
43 Classification accuracy as a function of the number of topics for sLDA and cLDA, using topics learned with and without class influence and codebooks of size 1024 on N13. [sent-184, score-0.74]
44 Limitations of Existing Models. In this section we present theoretical and experimental evidence that, contrary to popular belief, topics discovered by sLDA and cLDA are not more suitable for discrimination than those of standard LDA. [sent-187, score-0.414]
45 In both sLDA and cLDA the parameters Λ_{1:K} of the topic distributions are obtained via the variational M-step as Λ_kv ∝ ∑_d ∑_n δ(w_n^d, v) ϕ_nk^d, where d indexes the images, ∑_v Λ_kv = 1, δ(·) is a Kronecker delta function and ϕ_nk^d is the parameter of the variational distribution q(z). [sent-189, score-0.494]
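A sketch of this M-step, with assumed array shapes: `words[d]` holds the N_d codeword indices of image d and `phi[d]` the corresponding (N_d × K) variational posteriors over topic assignments:

```python
import numpy as np

def m_step(words, phi, K, V):
    """Variational M-step: Lambda_kv proportional to sum_d sum_n delta(w_nd, v) phi_ndk."""
    Lambda = np.zeros((K, V))
    for w_d, phi_d in zip(words, phi):
        for n, v in enumerate(w_d):
            Lambda[:, v] += phi_d[n]      # accumulate soft counts for word v
    Lambda += 1e-12                        # guard against empty topics
    Lambda /= Lambda.sum(axis=1, keepdims=True)  # enforce sum_v Lambda_kv = 1
    return Lambda
```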
46 The important point is that the class label y_d only influences the topic distributions through (3) for cLDA (where α_{y_d} is used to compute the parameter γ^d) and (6) for sLDA (where the variational parameter ϕ_nk^d depends on the class label y_d through ζ_{y_d k}/N). [sent-192, score-0.851]
47 It follows that the connection between the class label Y and the learned topics Λ_k is extremely weak. [sent-198, score-0.433]
48 In summary, topics learned with either cLDA or sLDA are very unlikely to be informative of semantic regularities of interest for classification, and much more likely to capture generic regularities, common to all classes. [sent-204, score-0.401]
49 To confirm these observations, we performed experiments with topics learned under two approaches. [sent-205, score-0.305]
50 In the second, we severed all connections with the class label variable during topic learning, by reducing the variational E-step (of both cLDA and sLDA) to γ_k^{d*} = ∑_n ϕ_nk^d + α, ϕ_nk^{d*} ∝ Λ_{k w_n^d} exp[ψ(γ_k^d)] (7), with α = 1. [sent-209, score-0.532]
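A sketch of the severed E-step (7) as a fixed-point iteration; scipy.special.digamma implements ψ, and shapes are assumed (Λ is K × V, `w_d` a vector of codeword indices for one image):

```python
import numpy as np
from scipy.special import digamma

def e_step(w_d, Lambda, alpha=1.0, n_iter=50):
    """Alternate the gamma and phi updates of (7) until (approximate) convergence."""
    K = Lambda.shape[0]
    phi = np.full((len(w_d), K), 1.0 / K)   # uniform initialization
    for _ in range(n_iter):
        gamma = phi.sum(axis=0) + alpha                  # gamma_k = sum_n phi_nk + alpha
        phi = Lambda[:, w_d].T * np.exp(digamma(gamma))  # phi_nk ~ Lambda_{k,w_n} exp[psi(gamma_k)]
        phi /= phi.sum(axis=1, keepdims=True)            # normalize over topics
    return gamma, phi
```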
51 These results show that the performance of cLDA and sLDA is similar to that of topic learning without class supervision. [sent-219, score-0.455]
52 In both cases, the class variable has very weak impact on the learning of topic distributions. [sent-220, score-0.455]
53 Limitations of Topic-Supervised Models. In the previous section, we have seen that models such as sLDA or cLDA effectively learn topics without supervision. [sent-223, score-0.281]
54 The simplest solution to address the lack of correlation between class labels and the topics is to force the topics to reflect the semantic regularities of interest, as is done in topic-supervised models. (Footnote 4: This discussion refers to the sLDA formulation of [19], which was proposed specifically for image classification.) [sent-224, score-0.505]
55 For ts-sLDA and ts-cLDA the number of topics is equal to the number of classes. [sent-229, score-0.281]
56 For sLDA and cLDA, results are presented for the number of topics of best performance. [sent-230, score-0.281]
57 Class Specific Simplex Latent Dirichlet Allocation (css-LDA). To overcome the limitations of existing LDA-based image classification models, in this section we introduce a new LDA model for image classification, denoted class-specific simplex LDA. [sent-239, score-0.316]
58 Motivation. The inability of the LDA variants to outperform the flat model is perhaps best understood by returning to Figure 2. [sent-242, score-0.297]
59 Note that both cLDA and ts-cLDA map images from a high dimensional word simplex to a low dimensional topic simplex, which is common to all classes. [sent-243, score-0.745]
60 This restricts the scope of the class models, which are simple Dirichlet distributions over the topic simplex. [sent-244, score-0.549]
61 Since the topic simplex is common, and low dimensional, too few degrees of freedom are available to characterize intra-class structure, preventing a very detailed discrimination of the different classes. [sent-247, score-0.589]
62 In fact, the main conclusion of the previous sections is that the bulk of the modeling power of LDA lies in the selection of the topic simplex, not in the modeling of the data distribution on it. [sent-248, score-0.357]
63 Since, to capture the semantic regularities of the data, the simplex has to be aligned with the class labels, as is done under topic-supervision, there is little room to outperform the flat model. [sent-249, score-0.34]
64 Figure 5 caption: topic-conditional distributions discovered by css-LDA (marked #1 - #10), and the class-conditional distribution of the flat model (marked “flat model”), for (left) “Bocce” (S8) and (right) “Highway” (N13) classes. [sent-252, score-0.496]
65 Also shown are the nearest neighbor images of sample topic conditional distributions. [sent-253, score-0.355]
67 This limitation is common to any model that constrains the class-conditional distributions to lie on a common topic simplex. [sent-255, score-0.439]
68 This is the case whenever the class label Y is connected to either the prior or topic Z variables, as in the graphical models of Figure 1. [sent-256, score-0.504]
69 Since the topic simplex is smaller than the word simplex, it has limited ability to simultaneously model rich intra-class structure and keep the classes separated. [sent-257, score-0.726]
70 For this, it is necessary that the class label Y affects the word distributions directly, freeing these to distribute themselves across the word simplex in the most discriminant manner. [sent-258, score-0.719]
71 This implies that Y must be connected to the word variable W, as in the flat model. [sent-259, score-0.34]
72 The first follows from the fact that it makes the topic conditional distributions dependent on the class. [sent-261, score-0.449]
73 Returning to Figure 2, this implies that the vertices of the topic simplex are class-dependent, as shown in (d). [sent-262, score-0.571]
74 Note that there are two one-dimensional topic simplices, one for each class, defined by the parameters Λ_1^y and Λ_2^y, y ∈ {1, 2}. [sent-263, score-0.455]
75 Hence, each class is endowed with its own topic simplex justifying the denomination of the model as class-specific simplex LDA. [sent-266, score-0.961]
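A hedged sketch of the corresponding generative process: unlike cLDA, the topic-word parameters are indexed by the class, so each class carries its own topic simplex and the class label affects the word variable directly. Toy dimensions and parameter values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
C, K, V = 2, 2, 10
alpha = np.ones(K)                                # shared prior over topics
Lambda = rng.dirichlet(np.ones(V), size=(C, K))   # Lambda[y] is class y's own topic simplex

def sample_image(y, n_words=20):
    pi = rng.dirichlet(alpha)                     # topic proportions
    z = rng.choice(K, size=n_words, p=pi)         # per-word topic assignments
    w = np.array([rng.choice(V, p=Lambda[y, k]) for k in z])  # Y -> W directly
    return w

print(sample_image(y=1))
```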
76 With respect to LDA, because there are multiple topic simplices, the class-conditional distributions can have little overlap in the word simplex even when the topic simplices are low dimensional. [sent-268, score-0.815]
77 Under the flat model, the “Bocce” class is modeled by a single point in the word simplex, the average of all these distributions, as shown in Figure 2 (a). [sent-275, score-0.468]
78 Rather than this, css-LDA devotes a topic simplex to each class, as shown in Figure 2 (d). [sent-277, score-0.455]
79 In the example of Figure 2, while the flat model approximates all the images of each class by a point in the word simplex, css-LDA relies on a line segment. [sent-279, score-0.486]
80 In higher dimensions the difference can be much more substantial, since each topic simplex is a subspace of dimension K − 1 (K the number of topics), while the approximation of the flat model is always a point. [sent-280, score-0.571]
81 Thus css-LDA can account for much more complex class structure than its flat counterpart. [sent-281, score-0.352]
82 The variational updates (similar in form to standard LDA; details in the supplement) are γ_k* = ∑_n ϕ_nk + α_k, ϕ_nk* ∝ Λ^y_{k w_n} exp[ψ(γ_k)] (10). Note that for css-LDA, where each class is associated with a separate topic simplex, (10) differs from standard LDA in that the Λ parameters are class specific. [sent-305, score-0.6]
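A sketch of the updates in (10), assuming Λ has shape (C, K, V): the fixed point is the same as in standard LDA, except that inference uses the class-specific slice Λ^y and is therefore run once per class (for generative classification one would then select the class with the best variational bound, a simplification of the Bayes decision rule described in the paper):

```python
import numpy as np
from scipy.special import digamma

def css_lda_e_step(w_d, Lambda_y, alpha, n_iter=50):
    """Variational E-step (10) for one class, Lambda_y of shape (K, V)."""
    K = Lambda_y.shape[0]
    phi = np.full((len(w_d), K), 1.0 / K)
    for _ in range(n_iter):
        gamma = phi.sum(axis=0) + alpha                    # gamma_k* = sum_n phi_nk + alpha_k
        phi = Lambda_y[:, w_d].T * np.exp(digamma(gamma))  # phi_nk* ~ Lambda^y_{k,w_n} exp[psi(gamma_k)]
        phi /= phi.sum(axis=1, keepdims=True)
    return gamma, phi

def infer_all_classes(w_d, Lambda, alpha):
    """Run inference once per class; Lambda has shape (C, K, V)."""
    return [css_lda_e_step(w_d, Lambda[y], alpha) for y in range(len(Lambda))]
```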
83 An image BoW with N words can be modeled as a flat categorical distribution with parameters Λ_v, where v ∈ {1, . . . , |V|}. [sent-318, score-0.287]
84 The css-LDA model, proposed in this work, instead models image words with distinct class-specific simplices of topics, as shown in Figure 2(d). [sent-331, score-0.508]
85 The gradients of its evidence lower bound therefore produce an even larger histogram, with class- and topic-specific word counts, for a given image. [sent-332, score-0.653]
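A hedged sketch of this descriptor, assuming the per-class posteriors φ have already been inferred (e.g., with the per-class E-step above): soft word counts are accumulated per class and topic, then flattened into one long BoW-like vector for the SVM. Names and shapes are illustrative, not from the paper:

```python
import numpy as np

def css_lda_histogram(w_d, per_class_phi, V):
    """per_class_phi[y] is the (N x K) posterior from class y's inference."""
    C = len(per_class_phi)
    K = per_class_phi[0].shape[1]
    h = np.zeros((C, K, V))
    for y, phi in enumerate(per_class_phi):
        for n, v in enumerate(w_d):
            h[y, :, v] += phi[n]    # class- and topic-specific soft word counts
    return h.reshape(-1)            # one BoW-like vector of size C*K*V for the SVM
```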
86 The css-LDA performance reported is the best across the number of topics, while for ts-sLDA the number of topics is equal to the number of classes. [sent-336, score-0.281]
87 These include the 8-class (N8), 13-class (N13) and 15-class (N15) datasets previously used in [8, 3, 14, 12], the 8-class sports dataset (S8) of [19], and a 50-class dataset (C50) constructed from the Corel image collection used in [7]. [sent-341, score-0.64]
88 Class Specific Topic Discovery in css-LDA. We start with a set of experiments that provide insight on the topics discovered by css-LDA. [sent-346, score-0.353]
89 Figure 5 presents a visualization of the topic-conditional distributions Λ_z^y (marked #1 to #10) discovered for classes “Bocce” (S8, left) and “Highway” (N13, right), using 10 topics per class. [sent-347, score-0.468]
90 Also shown is the class-conditional distribution Λ_y^flat (marked flat model) of the flat model. [sent-348, score-0.634]
91 This shows that, on average, topics discovered by css-LDA represent the class conditional distribution of the flat model. [sent-351, score-0.763]
92 In fact, the KL divergence between the average of the topic conditional distributions of css-LDA and the class conditional distribution of the flat model is very close to zero (0. [sent-352, score-0.877]
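A small sketch of this consistency check on toy inputs: compare the average of a class's topic-conditional distributions with the flat class-conditional via KL divergence (ε guards against log(0); for simplicity the sketch does not renormalize after the ε shift):

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(2)
topic_cond = rng.dirichlet(np.ones(6), size=10)  # 10 topic-conditionals of one class (toy)
flat_cond = rng.dirichlet(np.ones(6))            # flat class-conditional (toy)
print(kl(topic_cond.mean(axis=0), flat_cond))
```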
93 Also shown are the images closest to the topic conditional distribution. [sent-361, score-0.355]
94 Note that the topics discovered by css-LDA capture the visual diversity of each class. [sent-362, score-0.37]
95 For example, “Bocce” topics #9, #7, #8 and #1 capture the diversity of environments in which the sport can be played: indoors, sunny-outdoor, overcast-outdoor, and beach. [sent-363, score-0.298]
96 These variations are averaged out by the flat model, where each class is, in effect, modeled by a single topic. [sent-364, score-0.352]
97 Generative Classification Results. We have previously reported that all the known LDA models are outperformed by their topic-supervised (ts-) extensions. [sent-367, score-0.37]
98 In the discriminative classification framework, the css-LDA based image histogram was shown to be superior to the alternative image representations based on the flat model and unsupervised LDA. [sent-436, score-0.374]
99 Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. [sent-524, score-0.388]
100 MedLDA: maximum margin supervised topic models for regression and classification. [sent-563, score-0.37]
wordName wordTfidf (topN-words)
[('clda', 0.486), ('lda', 0.385), ('slda', 0.353), ('topic', 0.327), ('topics', 0.281), ('simplex', 0.244), ('flat', 0.224), ('class', 0.128), ('word', 0.116), ('distributions', 0.094), ('dirichlet', 0.082), ('bocce', 0.081), ('regularities', 0.073), ('discovered', 0.072), ('simplices', 0.067), ('softmax', 0.062), ('variational', 0.06), ('iq', 0.059), ('generative', 0.055), ('bow', 0.054), ('ndk', 0.054), ('classification', 0.054), ('py', 0.053), ('wn', 0.05), ('csslda', 0.046), ('allocation', 0.045), ('extensions', 0.045), ('supervised', 0.043), ('pw', 0.043), ('supervision', 0.041), ('yd', 0.04), ('bayes', 0.036), ('posterior', 0.036), ('categorical', 0.033), ('kv', 0.032), ('counts', 0.032), ('words', 0.032), ('histogram', 0.031), ('classlda', 0.03), ('discriminants', 0.03), ('dnk', 0.03), ('iksvm', 0.03), ('kwnd', 0.03), ('distribution', 0.03), ('blei', 0.03), ('dimensional', 0.029), ('schematic', 0.029), ('graphical', 0.029), ('nk', 0.028), ('supplement', 0.028), ('conditional', 0.028), ('nuno', 0.027), ('codebooks', 0.026), ('text', 0.026), ('defines', 0.025), ('endow', 0.025), ('equate', 0.025), ('nonmetric', 0.025), ('ucsd', 0.025), ('kd', 0.024), ('codebook', 0.024), ('extension', 0.024), ('popular', 0.024), ('unsupervised', 0.024), ('learned', 0.024), ('conditioned', 0.024), ('plsa', 0.024), ('pa', 0.023), ('semantic', 0.023), ('discriminative', 0.023), ('discovery', 0.023), ('latent', 0.023), ('movie', 0.023), ('highway', 0.022), ('induces', 0.022), ('decision', 0.022), ('acn', 0.021), ('dk', 0.021), ('discriminant', 0.021), ('indoors', 0.021), ('classes', 0.021), ('marked', 0.02), ('prior', 0.02), ('played', 0.019), ('evidence', 0.019), ('returning', 0.019), ('inability', 0.019), ('dn', 0.018), ('model', 0.018), ('vd', 0.018), ('discrimination', 0.018), ('iv', 0.017), ('delta', 0.017), ('kronecker', 0.017), ('outperform', 0.017), ('rule', 0.017), ('parameter', 0.017), ('exp', 0.017), ('day', 0.017), ('diversity', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
Author: Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos
Abstract: An extension of the latent Dirichlet allocation (LDA), denoted class-specific-simplex LDA (css-LDA), is proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. The proposed css-LDA is an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. This model can be used for generative classification using the Bayes decision rule or even extended to discriminative classification with support vector machines (SVMs). A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. SVM-based discriminants can be learned for classes in the space of these histograms. The effectiveness of css-LDA model in both generative and discriminative classification frameworks is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform all existing LDA based image classification approaches.
2 0.28801784 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
Author: Dahua Lin, Jianxiong Xiao
Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes: the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination.
3 0.19130853 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
Author: Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang
Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.
4 0.14436822 426 iccv-2013-Training Deformable Part Models with Decorrelated Features
Author: Ross Girshick, Jitendra Malik
Abstract: In this paper, we show how to train a deformable part model (DPM) fast—typically in less than 20 minutes, or four times faster than the current fastest method—while maintaining high average precision on the PASCAL VOC datasets. At the core of our approach is “latent LDA,” a novel generalization of linear discriminant analysis for learning latent variable models. Unlike latent SVM, latent LDA uses efficient closed-form updates and does not require an expensive search for hard negative examples. Our approach also acts as a springboard for a detailed experimental study of DPM training. We isolate and quantify the impact of key training factors for the first time (e.g., How important are discriminative SVM filters? How important is joint parameter estimation? How many negative images are needed for training?). Our findings yield useful insights for researchers working with Markov random fields and part-based models, and have practical implications for speeding up tasks such as model selection.
5 0.092935324 106 iccv-2013-Deep Learning Identity-Preserving Face Space
Author: Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Abstract: Face recognition with large pose and illumination variations is a challenging problem in computer vision. This paper addresses this challenge by proposing a new learning-based face representation: the face identity-preserving (FIP) features. Unlike conventional face descriptors, the FIP features can significantly reduce intra-identity variances, while maintaining discriminativeness between identities. Moreover, the FIP features extracted from an image under any pose and illumination can be used to reconstruct its face image in the canonical view. This property makes it possible to improve the performance of traditional descriptors, such as LBP [2] and Gabor [31], which can be extracted from our reconstructed images in the canonical view to eliminate variations. In order to learn the FIP features, we carefully design a deep network that combines the feature extraction layers and the reconstruction layer. The former encodes a face image into the FIP features, while the latter transforms them to an image in the canonical view. Extensive experiments on the large MultiPIE face database [7] demonstrate that it significantly outperforms the state-of-the-art face recognition methods.
6 0.072367467 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions
7 0.070184492 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes
8 0.069802538 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences
9 0.068941139 158 iccv-2013-Fast High Dimensional Vector Multiplication Face Recognition
10 0.067380793 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
11 0.056143291 308 iccv-2013-Parsing IKEA Objects: Fine Pose Estimation
12 0.05258058 338 iccv-2013-Randomized Ensemble Tracking
13 0.051903605 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification
14 0.050612044 412 iccv-2013-Synergistic Clustering of Image and Segment Descriptors for Unsupervised Scene Understanding
15 0.049990766 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
16 0.048400585 233 iccv-2013-Latent Task Adaptation with Large-Scale Hierarchies
17 0.046353195 435 iccv-2013-Unsupervised Domain Adaptation by Domain Invariant Projection
18 0.044699948 104 iccv-2013-Decomposing Bag of Words Histograms
19 0.044632819 345 iccv-2013-Recognizing Text with Perspective Distortion in Natural Scenes
20 0.04440378 210 iccv-2013-Image Retrieval Using Textual Cues
topicId topicWeight
[(0, 0.112), (1, 0.057), (2, -0.018), (3, -0.034), (4, 0.022), (5, 0.021), (6, -0.013), (7, -0.004), (8, -0.024), (9, -0.057), (10, 0.057), (11, -0.057), (12, -0.019), (13, -0.012), (14, 0.001), (15, -0.023), (16, -0.044), (17, 0.018), (18, -0.003), (19, 0.007), (20, -0.021), (21, -0.026), (22, 0.033), (23, -0.042), (24, 0.028), (25, -0.015), (26, 0.088), (27, 0.04), (28, 0.069), (29, 0.109), (30, 0.069), (31, -0.02), (32, -0.022), (33, 0.063), (34, -0.072), (35, -0.015), (36, -0.121), (37, 0.01), (38, 0.009), (39, -0.017), (40, 0.051), (41, -0.007), (42, -0.099), (43, -0.127), (44, -0.075), (45, -0.096), (46, -0.021), (47, -0.024), (48, 0.118), (49, -0.077)]
simIndex simValue paperId paperTitle
same-paper 1 0.93997145 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
Author: Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos
Abstract: An extension of the latent Dirichlet allocation (LDA), denoted class-specific-simplex LDA (css-LDA), is proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. The proposed css-LDA is an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. This model can be used for generative classification using the Bayes decision rule or even extended to discriminative classification with support vector machines (SVMs). A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. SVM-based discriminants can be learned for classes in the space of these histograms. The effectiveness of css-LDA model in both generative and discriminative classification frameworks is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform all existing LDA based image classification approaches.
2 0.7795952 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
Author: Dahua Lin, Jianxiong Xiao
Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes: the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination.
3 0.693286 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
Author: Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang
Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.
4 0.49454734 331 iccv-2013-Pyramid Coding for Functional Scene Element Recognition in Video Scenes
Author: Eran Swears, Anthony Hoogs, Kim Boyer
Abstract: Recognizing functional scene elements in video scenes based on the behaviors of moving objects that interact with them is an emerging problem of interest. Existing approaches have a limited ability to characterize elements such as cross-walks, intersections, and buildings that have low activity, are multi-modal, or have indirect evidence. Our approach recognizes the low activity and multi-modal elements (crosswalks/intersections) by introducing a hierarchy of descriptive clusters to form a pyramid of codebooks that is sparse in the number of clusters and dense in content. The incorporation of local behavioral context such as person-enter-building and vehicle-parking nearby enables the detection of elements that do not have direct motion-based evidence, e.g. buildings. These two contributions significantly improve scene element recognition when compared against three state-of-the-art approaches. Results are shown on typical ground level surveillance video and for the first time on the more complex Wide Area Motion Imagery.
5 0.47720703 443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation
Author: Xiatian Zhu, Chen Change Loy, Shaogang Gong
Abstract: Generating coherent synopsis for surveillance video stream remains a formidable challenge due to the ambiguity and uncertainty inherent to visual observations. In contrast to existing video synopsis approaches that rely on visual cues alone, we propose a novel multi-source synopsis framework capable of correlating visual data and independent non-visual auxiliary information to better describe and summarise subtle physical events in complex scenes. Specifically, our unsupervised framework is capable of seamlessly uncovering latent correlations among heterogeneous types of data sources, despite the non-trivial heteroscedasticity and dimensionality discrepancy problems. Additionally, the proposed model is robust to partial or missing non-visual information. We demonstrate the effectiveness of our framework on two crowded public surveillance datasets.
7 0.46036863 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model
8 0.44719073 233 iccv-2013-Latent Task Adaptation with Large-Scale Hierarchies
9 0.43900281 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
10 0.43725899 258 iccv-2013-Low-Rank Sparse Coding for Image Classification
11 0.43214062 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification
12 0.42628452 352 iccv-2013-Revisiting Example Dependent Cost-Sensitive Learning with Decision Trees
13 0.41769379 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes
14 0.41495442 158 iccv-2013-Fast High Dimensional Vector Multiplication Face Recognition
16 0.41120082 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation
17 0.41052535 290 iccv-2013-New Graph Structured Sparsity Model for Multi-label Image Annotations
18 0.40510333 193 iccv-2013-Heterogeneous Auto-similarities of Characteristics (HASC): Exploiting Relational Information for Classification
19 0.40400004 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes
20 0.40253145 142 iccv-2013-Ensemble Projection for Semi-supervised Image Classification
topicId topicWeight
[(2, 0.083), (4, 0.011), (7, 0.017), (13, 0.014), (25, 0.01), (26, 0.095), (31, 0.139), (35, 0.011), (40, 0.01), (42, 0.072), (48, 0.011), (64, 0.04), (69, 0.186), (73, 0.038), (78, 0.018), (89, 0.113), (93, 0.014), (98, 0.01)]
simIndex simValue paperId paperTitle
same-paper 1 0.79800284 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
Author: Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos
Abstract: An extension of the latent Dirichlet allocation (LDA), denoted class-specific-simplex LDA (css-LDA), is proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. The proposed css-LDA is an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. This model can be used for generative classification using the Bayes decision rule or even extended to discriminative classification with support vector machines (SVMs). A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. SVM-based discriminants can be learned for classes in the space of these histograms. The effectiveness of css-LDA model in both generative and discriminative classification frameworks is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform all existing LDA based image classification approaches.
2 0.74413025 38 iccv-2013-Action Recognition with Actons
Author: Jun Zhu, Baoyuan Wang, Xiaokang Yang, Wenjun Zhang, Zhuowen Tu
Abstract: With the improved accessibility to an exploding amount of video data and growing demands in a wide range of video analysis applications, video-based action recognition/classification becomes an increasingly important task in computer vision. In this paper, we propose a two-layer structure for action recognition to automatically exploit a mid-level “acton” representation. The weakly-supervised actons are learned via a new max-margin multi-channel multiple instance learning framework, which can capture multiple mid-level action concepts simultaneously. The learned actons (with no requirement for detailed manual annotations) observe the properties of being compact, informative, discriminative, and easy to scale. The experimental results demonstrate the effectiveness of applying the learned actons in our two-layer structure, and show the state-of-the-art recognition performance on two challenging action datasets, i.e., Youtube and HMDB51.
3 0.73542702 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
Author: Dahua Lin, Jianxiong Xiao
Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes: the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination.
4 0.72640657 357 iccv-2013-Robust Matrix Factorization with Unknown Noise
Author: Deyu Meng, Fernando De_La_Torre
Abstract: Many problems in computer vision can be posed as recovering a low-dimensional subspace from high-dimensional visual data. Factorization approaches to low-rank subspace estimation minimize a loss function between an observed measurement matrix and a bilinear factorization. Most popular loss functions include the L2 and L1 losses. L2 is optimal for Gaussian noise, while L1 is for Laplacian distributed noise. However, real data is often corrupted by an unknown noise distribution, which is unlikely to be purely Gaussian or Laplacian. To address this problem, this paper proposes a low-rank matrix factorization problem with a Mixture of Gaussians (MoG) noise model. The MoG model is a universal approximator for any continuous distribution, and hence is able to model a wider range of noise distributions. The parameters of the MoG model can be estimated with a maximum likelihood method, while the subspace is computed with standard approaches. We illustrate the benefits of our approach in extensive synthetic and real-world experiments including structure from motion, face modeling and background subtraction.
5 0.7244482 345 iccv-2013-Recognizing Text with Perspective Distortion in Natural Scenes
Author: Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan
Abstract: This paper presents an approach to text recognition in natural scene images. Unlike most existing works which assume that texts are horizontal and frontal parallel to the image plane, our method is able to recognize perspective texts of arbitrary orientations. For individual character recognition, we adopt a bag-of-keypoints approach, in which Scale Invariant Feature Transform (SIFT) descriptors are extracted densely and quantized using a pre-trained vocabulary. Following [1, 2], the context information is utilized through lexicons. We formulate word recognition as finding the optimal alignment between the set of characters and the list of lexicon words. Furthermore, we introduce a new dataset called StreetViewText-Perspective, which contains texts in street images with a great variety of viewpoints. Experimental results on public datasets and the proposed dataset show that our method significantly outperforms the state-of-the-art on perspective texts of arbitrary orientations.
6 0.72320396 408 iccv-2013-Super-resolution via Transform-Invariant Group-Sparse Regularization
7 0.70954108 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting
8 0.70064747 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
9 0.69789273 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
10 0.69699633 205 iccv-2013-Human Re-identification by Matching Compositional Template with Cluster Sampling
11 0.69481301 180 iccv-2013-From Where and How to What We See
12 0.68617153 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
13 0.68333209 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
14 0.68109363 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
15 0.6708107 173 iccv-2013-Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging
16 0.66594458 210 iccv-2013-Image Retrieval Using Textual Cues
17 0.66538751 287 iccv-2013-Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors
18 0.66501772 19 iccv-2013-A Learning-Based Approach to Reduce JPEG Artifacts in Image Matting
19 0.6635595 52 iccv-2013-Attribute Adaptation for Personalized Image Search
20 0.66259301 415 iccv-2013-Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors