iccv iccv2013 iccv2013-72 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dahua Lin, Jianxiong Xiao
Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes, i.e., the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called the Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we develop a generative model to describe the layouts of outdoor scenes, i.e., the spatial configuration of regions. [sent-2, score-0.642]
2 Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. [sent-3, score-0.394]
3 At the heart of this model is a novel stochastic process called Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. [sent-4, score-0.744]
4 A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. [sent-5, score-0.355]
5 We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination. [sent-6, score-0.546]
6 As illustrated in Figure 1, layouts convey significant information for both semantic interpretation (e. [sent-9, score-0.385]
7 Therefore, a good model of layouts is of fundamental importance. [sent-14, score-0.319]
8 Our primary goal here is to develop a layout model that can capture the common structures of outdoor scenes while allowing flexible variations. [sent-15, score-0.61]
9 In this work, we develop a generative model of layouts, which can be used in various vision tasks, including scene classification and semantic segmentation. [sent-25, score-0.296]
10 Moreover, leveraging the scene structures captured by the model, one can extrapolate the scenes beyond the visible scope. [sent-26, score-0.402]
11 Instead of modeling layouts explicitly, these models typically utilize spatial relations via potentials that couple semantic labels at different sites. [sent-30, score-0.549]
12 Generative models, unlike discriminative ones, often resort to hierarchical Bayesian models to describe a scene [1, 5, 14, 23]. [sent-33, score-0.144]
13 Taking advantage of the flexibility of graphical models, they are able to express various relations in a complex scene, such as the ones between a scene and its parts [1, 23] and those between concurrent objects [5, 14]. [sent-34, score-0.239]
14 Since the introduction of Latent Dirichlet Allocation [3] to scene categorization [6], topic models have also been widely used in scene understanding [4, 16, 19, 26, 29]. [sent-35, score-0.692]
15 Whereas some of them take spatial relations into account, their treatment is often simplified, focusing only on pairwise relations between objects or making assumptions that ignore important spatial dependencies. [sent-36, score-0.266]
16 Hence, the resultant models are generally not the most appropriate choices for characterizing the layouts of outdoor scenes. [sent-37, score-0.42]
17 Towards the goal of providing an effective layout model, we develop the Spatial Topic Process, a new formulation that builds upon topic models and goes beyond them by allowing distributions of topics to vary continuously across the image plane. [sent-38, score-1.233]
18 Specifically, to capture the statistical dependencies across both spatial locations and visual categories, we introduce a set of Gaussian processes (GPs) to generate a map of topic distributions. [sent-39, score-0.803]
19 These GPs are coupled via a latent representation that encodes the global scene structure. [sent-40, score-0.239]
20 This model provides a rich representation that can express layout variations through pixel-dependent topic distributions, and on the other hand ensures both local coherence and global structural consistency via the use of coupled GPs. [sent-41, score-0.944]
21 This new layout model is useful for a variety of vision problems. [sent-42, score-0.328]
22 We demonstrate its practical utility on three applications: (1) scene classification using the layout representation, (2) semantic segmentation based on spatially varying topic distributions, and (3) layout hallucination, a task trying to extrapolate beyond the visible part of a scene. [sent-43, score-1.681]
23 Related Work This work is related to several models developed in recent years that try to incorporate spatial relations into topic models. [sent-45, score-0.634]
24 Wang and Grimson proposed Spatial LDA [26], where each pixel is assigned a topic chosen from a local document. [sent-46, score-0.47]
25 This model enables spatial variation of topics, but ignores the dependencies between topic assignments by assuming that they are independently chosen. [sent-47, score-0.657]
26 [29] goes one step further by introducing an MRF to encourage coherent topic assignment. [sent-49, score-0.47]
27 [19] proposed a reconfigurable model for scene recognition, which treats a scene as a composite of a fixed number of rectangular regions, each governed by a topic. [sent-52, score-0.267]
28 While allowing flexible topic-region association, it does not take into account the dependencies between topic assignments either. [sent-53, score-0.625]
29 There has been other work that combines latent GPs for spatially coherent segmentation [8, 21, 22]. [sent-54, score-0.185]
30 Sudderth and Jordan [22] proposed a formulation of dependent Pitman-Yor processes (DPY), where spatial dependencies are induced via thresholded GPs. [sent-55, score-0.3]
31 It is, however, important to note that there is a fundamental aspect that distinguishes our work from theirs: we aim to learn a generative model that is able to capture the prior structure of outdoor scenes, such that one can sample new scenes from it or infer missing parts of a scene. [sent-56, score-0.251]
32 Generative Model of Layouts Following the paradigm of topic models, we characterize an image by a set of visual words: S = {(xi, yi, wi)}, i = 1, …, n. [sent-60, score-0.503]
33 Here, xi and yi are the pixel coordinates of the i-th visual word, and wi is the quantized label. [sent-61, score-0.176]
34 We aim to develop a generative model to explain the spatial configuration of S. [sent-62, score-0.186]
35 Given zi, one can draw the visual word wi from the corresponding topic. [sent-67, score-0.226]
36 Therefore, it is desirable to jointly model the distributions of zi over the entire image so as to capture the correlations between them. [sent-71, score-0.232]
37 In particular, we develop a probabilistic model called Spatial Topic Process that can generate a continuous map of topic distributions based on a set of coupled Gaussian processes. [sent-72, score-0.741]
38 (1) Generate a continuous map of topic distributions as θ ∼ p(θ | λ). [sent-77, score-0.645]
39 Here, θ(x, y) is the predicted distribution of topics at (x, y). [sent-78, score-0.156]
40 (3) Draw the topic indicator zi at each location from θi. [sent-82, score-0.546]
41 (4) Draw the visual word wi from the corresponding topic βzi . [sent-84, score-0.696]
42 from a set of coupled Gaussian processes (with parameter λ), where each map corresponds to a topic. [sent-85, score-0.173]
43 Finally, at each sample point (xi, yi), a topic is chosen according to θi, and then a visual word wi is drawn from the word distribution of the corresponding topic, i.e., wi ∼ βzi. [sent-87, score-0.825]
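The four-step generative process described above (draw the topic-distribution map, then topics, then words) can be sketched in a few lines; the coupled-GP draw is replaced here by an i.i.d. Gaussian stand-in, and the softmax mapping and all sizes (K, V, n) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K topics, V-word vocabulary, n sample points.
K, V, n = 3, 50, 10

# Step (1), stand-in: instead of sampling the coupled GPs, draw a
# K-channel map of real values g at the n sample points.
g = rng.normal(size=(n, K))

# Step (2), assumed: map the real values to topic distributions theta_i via
# a softmax, so each location carries a valid distribution over K topics.
theta = np.exp(g) / np.exp(g).sum(axis=1, keepdims=True)

# Per-topic word distributions beta (rows sum to 1), random here.
beta = rng.dirichlet(np.ones(V), size=K)

# Steps (3)-(4): draw a topic indicator z_i from theta_i,
# then draw the visual word w_i from beta_{z_i}.
z = np.array([rng.choice(K, p=theta[i]) for i in range(n)])
w = np.array([rng.choice(V, p=beta[z[i]]) for i in range(n)])
```

Sampling from the fitted model in exactly this top-down order is what later enables layout hallucination: once g is inferred for the visible part, the same steps generate the rest.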
44 We first consider a simpler problem: devising a joint distribution to incorporate correlations between a finite set of discrete distributions θ1, … [sent-93, score-0.222]
45 By further extending the finite dimensional Gaussian distributions in Eq. [sent-112, score-0.191]
46 Within each topic are links between values at neighboring grid points. [sent-128, score-0.604]
47 There are also links (depicted in orange color) between values for different topics at corresponding grid points. [sent-129, score-0.29]
48 However, this is not an appropriate design in the context of scene layout modeling, where both the mean and the variance are location dependent. [sent-134, score-0.439]
49 We first define a Gaussian distribution over a finite grid and then extend it to a Gaussian process via smooth interpolation. [sent-136, score-0.161]
50 Here, wj(v) = exp(−d(v, sj)^2 / σg^2) is a weight value that reflects the influence of the j-th seed on v. [sent-154, score-0.207]
51 (3) and (4) introduce K Gaussian processes, each of which can be characterized by a finite dimensional Gaussian distribution using the grid-based parametrization as above. [sent-162, score-0.304]
52 (5) ensures local coherence, while Gaussian distributions over the grid capture long range spatial relations. [sent-164, score-0.322]
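A minimal sketch of the seed-based interpolation behind the weights wj(v) = exp(−d(v, sj)^2/σg^2); normalizing the weights into a weighted average of the grid values is an assumption, since only the unnormalized weight is given above:

```python
import numpy as np

def interp_topic_value(v, seeds, g_vals, sigma_g):
    """Smoothly interpolate per-seed grid values g_vals to a point v, using
    Gaussian weights w_j(v) = exp(-d(v, s_j)^2 / sigma_g^2).

    seeds:  (m, 2) array of grid-point locations s_j
    g_vals: (m,)   array of values at the seeds (for one topic)
    """
    d2 = ((seeds - v) ** 2).sum(axis=1)        # squared distances d(v, s_j)^2
    w = np.exp(-d2 / sigma_g ** 2)             # influence of each seed on v
    return (w * g_vals).sum() / w.sum()        # assumed: normalized average
```

With a small σg the interpolant reproduces the seed values at the seeds; larger σg yields the smoother, longer-range blending the model relies on for local coherence.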
53 To capture such relations, it is desirable to further couple all GPs; this can be achieved through a joint distribution over the grid values for all topics. [sent-169, score-0.22]
54 Empirical testing showed that a 6-by-6 grid suffices to express most variations in the layout of natural scenes, and regions roughly fall into 20 to 30 categories (e. [sent-171, score-0.489]
55 Here, g is an mK-dimensional vector that contains all values at grid points, which we call the latent layout representation, and g_i^k is the value for the k-th topic at the i-th grid point. [sent-184, score-1.056]
56 As shown in Figure 3, this GMRF comprises two types of links: the ones between values for the same topic at neighboring sites (i ∼ j indicates that i and j are neighbors), and those between values for different topics at the same site. [sent-186, score-0.626]
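The two link types can be encoded as a sparse precision (GMRF) structure over the mK grid values; a hedged sketch, with illustrative coupling strengths a and b rather than the paper's learned parameters:

```python
import numpy as np

def gmrf_precision(grid, K, a=1.0, b=0.5, eps=1e-3):
    """Precision matrix over m*K grid values with the two link types:
    (i)  same topic, neighboring grid sites;
    (ii) different topics, same site.
    Coupling strengths a, b and jitter eps are illustrative only."""
    h, wdt = grid
    m = h * wdt
    idx = lambda i, k: k * m + i              # flatten (site i, topic k)
    J = np.zeros((m * K, m * K))
    for r in range(h):
        for c in range(wdt):
            i = r * wdt + c
            for k in range(K):
                # type (ii): couple all topic pairs at the same site
                for l in range(k + 1, K):
                    J[idx(i, k), idx(i, l)] -= b
                    J[idx(i, l), idx(i, k)] -= b
                # type (i): couple right/down neighbors within topic k
                for dr, dc in ((0, 1), (1, 0)):
                    rr, cc = r + dr, c + dc
                    if rr < h and cc < wdt:
                        j = rr * wdt + cc
                        J[idx(i, k), idx(j, k)] -= a
                        J[idx(j, k), idx(i, k)] -= a
    # diagonal dominance keeps the precision matrix positive definite
    np.fill_diagonal(J, -J.sum(axis=1) + eps)
    return J
```

The diagonal is set to the sum of the absolute off-diagonal couplings plus a small jitter, which guarantees positive definiteness by Gershgorin's theorem; a learned GMRF would instead estimate these entries from data.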
57 Given β (the word distributions) and λ (the parameter of the Spatial Topic Process), the joint probability of these visual words and their associated topic indicators is a product of the following factors. [sent-192, score-0.712]
58 Here, p(g), given by Eq. (6), is the prior of the latent layout representation. [sent-197, score-0.362]
59 p(zj | xj, yj, g) is the prior topic probability at (xj, yj), which is defined by Eq. [sent-198, score-0.61]
60 p(wj | zj; β) is the probability of choosing visual word wj from the topic βzj. [sent-202, score-0.839]
61 Inference and Learning Algorithms This section presents algorithms to infer layouts of images and to learn model parameters. [sent-204, score-0.356]
62 Inferring Layouts Given the model parameters, including λ and β, we can derive the latent layout representation g of a new image as follows. [sent-207, score-0.396]
63 Each word is represented by a triple (xj , yj , wj ), and is associated with a hidden variable zj that assigns it to a topic. [sent-209, score-0.63]
64 Here, p(wj | xj, yj, g) = Σ_{z=1}^{K} p(wj | z) p(z | xj, yj, g). [sent-214, score-0.28]
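This marginalization over the hidden topic is a one-line computation; theta_j below stands for the local topic distribution p(z | xj, yj, g):

```python
import numpy as np

def word_log_likelihood(theta_j, beta, w_j):
    """Marginalize the hidden topic:
    p(w_j | x_j, y_j, g) = sum_{z=1}^{K} p(w_j | z) p(z | x_j, y_j, g),
    with theta_j the local topic distribution (length K) and beta the
    per-topic word distributions (K x V)."""
    return np.log(np.dot(theta_j, beta[:, w_j]))
```

Summing this quantity over all observed words gives the data term that the layout inference maximizes with respect to g.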
65 Learning Model Parameters The goal of learning is to estimate the word distribution βk of each topic, and the GP parameter λ that governs the spatially varying topic distribution. [sent-234, score-0.64]
66 Suppose pixel-wise topic labeling is provided for each training image. [sent-237, score-0.517]
67 Here, (xj , yj) is the coordinate, wj is the word label, and zj is the topic label. [sent-239, score-0.96]
68 both the model parameter λ and the latent layout representations g1, … [sent-247, score-0.396]
69 The basic idea is to treat the topic indicators for such images as hidden variables, and use E-steps to infer the expected probabilities of their values, as in Eq. [sent-272, score-0.547]
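The E-step described here, together with a standard M-step re-estimate of the word distributions, can be sketched as follows; the closed-form M-step with prior count α is an assumption following common topic-model practice (the summary reports α = 10^-4):

```python
import numpy as np

def e_step(theta, beta, w):
    """Responsibilities q(z_j = k) ∝ theta_j[k] * beta[k, w_j] per word.
    theta: (n, K) local topic distributions; beta: (K, V); w: (n,) word ids."""
    r = theta * beta[:, w].T                       # (n, K), unnormalized
    return r / r.sum(axis=1, keepdims=True)

def m_step_beta(r, w, V, alpha=1e-4):
    """Re-estimate word distributions from expected counts, with a small
    prior count alpha added to every cell."""
    K = r.shape[1]
    counts = np.full((K, V), alpha)
    for j, wj in enumerate(w):
        counts[:, wj] += r[j]
    return counts / counts.sum(axis=1, keepdims=True)
```

For images with pixel-wise labels the responsibilities are simply one-hot; for unlabeled images the E-step above supplies the expected topic probabilities.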
70 Applications and Experiments We conducted experiments on three applications: scene classification, semantic segmentation, and layout hallucination, to test the practical utility of the proposed model. [sent-276, score-0.636]
71 In the legend, SPM-Lk refers to spatial pyramid matching with k levels, and STP-k refers to the spatial topic process on a k×k grid. [sent-290, score-0.706]
72 We found empirically that this feature tends to achieve better performance than dense SIFT in outdoor scenes, as significant parts of such scenes are textured regions instead of objects. [sent-294, score-0.17]
73 We learned the layout models from the training set following the procedure described in section 4. [sent-296, score-0.328]
74 Specifically, we set the prior count α to 10^-4 in estimating the word distributions of each topic. [sent-297, score-0.288]
75 We learned the spatial topic processes on three grid sizes, 3×3, 4×4 and 6×6, over a standard image size of 256×256, and set σg to 80, 60 and 40, respectively. [sent-298, score-0.78]
76 Scene Classification Given an image I, one can infer the latent layout representation g using the optimization algorithm presented in section 4. [sent-302, score-0.433]
77 We observe: (1) For the proposed method (STP), the classification accuracy increases when using finer grids, which suggests that local scale variations in the layouts convey useful information for classification. [sent-310, score-0.354]
78 (2) STP outperforms SPM when using a 4×4 or 6×6 grid, which indicates that discriminative information is effectively captured by the layout representation. [sent-311, score-0.36]
79 … the image, the inferred layout (using a 4×4 grid), the result by our method (based on the inferred layout), and the result by SLDA. [sent-315, score-0.388]
80 Particularly, the visualization of the inferred layout is generated by mixing the colors of different topics, using the probabilities θ(x, y) as weights. [sent-316, score-0.484]
81 However, it is interesting to notice that this increase is much faster for STP than for the others: a small subset of visual words is sufficient to estimate the layout reliably. [sent-320, score-0.401]
82 Given an image I, we first oversegment it into super-pixels using SLIC [2], and then obtain a semantic segmentation by assigning a label to each super-pixel.
83 Note that one can derive a continuous map θ of topic distributions from the layout representation g using Eq. [sent-343, score-0.973]
84 We can then combine this prior with the visual words within a super pixel to infer its topic label. [sent-345, score-0.658]
85 Specifically, let zs denote the label of a super-pixel s; then its posterior distribution is given by p(zs | s; θ) ∝ …
86 Here, we use i ∈ s to indicate that the i-th visual word is within the super-pixel s. [sent-351, score-0.162]
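A sketch of this super-pixel labeling rule; since the exact posterior is elided above, a product of the local topic prior and the word likelihoods within s is assumed:

```python
import numpy as np

def superpixel_posterior(theta_s, beta, words_in_s):
    """Posterior over the super-pixel label z_s: combine the topic prior
    theta_s at the super-pixel with the likelihoods of the visual words
    inside it (assumed independent given z_s). Computed in log space for
    numerical stability, then normalized."""
    log_p = np.log(theta_s) + sum(np.log(beta[:, w]) for w in words_in_s)
    log_p -= log_p.max()
    p = np.exp(log_p)
    return p / p.sum()
```

The predicted label is the argmax of this posterior; the layout prior θ is what suppresses locally ambiguous word evidence, which is where the reported gain over SLDA comes from.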
87 For comparison, we also implemented a variant of spatial LDA [26, 29], which incorporates an MRF to enforce coherence between topics allocated to neighboring pixels. [sent-355, score-0.311]
88 Figure 6 shows part of the segmentation results obtained on the SUN dataset, which accurately reflect the scene structures and have very good spatial coherence. [sent-359, score-0.328]
89 As we can see, the inferred layouts capture the spatial structures very well, thus substantially reducing the ambiguities of labeling. [sent-362, score-0.521]
90 Layout Hallucination It is an interesting phenomenon of the human visual system that people often remember seeing a surrounding region of a scene that was not visible in the view. [sent-370, score-0.2]
91 These findings lead us to the belief that a model that effectively captures the visual structures of a scene category should be able to extrapolate beyond the input images. [sent-373, score-0.31]
92 Specifically, we solve for the optimal layout representation g based on a subset of visual words extracted from the visible part, and use it to generate the entire layout. [sent-376, score-0.459]
93 As more regions are revealed, the true layout is gradually recovered. [sent-380, score-0.361]
94 These results demonstrate our model's capability of extrapolating layouts beyond the visible part. [sent-383, score-0.463]
95 Conclusions We presented a novel approach to layout modeling. [sent-385, score-0.328]
96 At the heart of this model is a spatial topic process which uses a set of coupled Gaussian processes to generate topic distributions that vary continuously across the image plane. [sent-386, score-1.417]
97 Using the grid-based parameterization, we further derived a finite dimensional representation of layouts that captures the correlations across both locations and topics. [sent-387, score-0.416]
98 The experiments on both scene classification and semantic segmentation showed that the proposed methods achieve considerable improvement over the state of the art, which is owing to the strong structural prior provided by the layout model. [sent-388, score-0.65]
99 We also performed experiments on layout hallucination, which demonstrates that our model is able to extrapolate scene layouts beyond the visible part. [sent-389, score-0.943]
100 Shared segmentation of natural scenes using dependent Pitman-Yor processes. [sent-542, score-0.143]
wordName wordTfidf (topN-words)
[('topic', 0.47), ('layout', 0.328), ('layouts', 0.319), ('wj', 0.207), ('topics', 0.156), ('zj', 0.154), ('yj', 0.14), ('word', 0.129), ('distributions', 0.125), ('processes', 0.113), ('zs', 0.112), ('scene', 0.111), ('spatial', 0.102), ('xj', 0.098), ('grid', 0.095), ('msrc', 0.093), ('hallucination', 0.09), ('gps', 0.089), ('extrapolate', 0.086), ('dependencies', 0.085), ('stp', 0.082), ('gaussian', 0.077), ('zi', 0.076), ('segmentation', 0.076), ('gi', 0.076), ('outdoor', 0.07), ('latent', 0.068), ('scenes', 0.067), ('semantic', 0.066), ('finite', 0.066), ('wi', 0.064), ('dirichlet', 0.063), ('sudderth', 0.062), ('relations', 0.062), ('coupled', 0.06), ('visible', 0.058), ('eext', 0.058), ('hills', 0.058), ('sun', 0.054), ('coherence', 0.053), ('exp', 0.053), ('continuous', 0.05), ('generative', 0.048), ('eint', 0.048), ('gmrf', 0.048), ('parizi', 0.048), ('labeling', 0.047), ('yi', 0.047), ('logp', 0.046), ('reconfigurable', 0.045), ('capability', 0.045), ('super', 0.044), ('zij', 0.043), ('covariance', 0.042), ('beyond', 0.041), ('spatially', 0.041), ('superparsing', 0.041), ('tighe', 0.041), ('utility', 0.041), ('words', 0.04), ('lda', 0.04), ('continuously', 0.04), ('indicators', 0.04), ('softmax', 0.04), ('links', 0.039), ('structures', 0.039), ('thheer', 0.038), ('qj', 0.038), ('heart', 0.037), ('allowing', 0.037), ('infer', 0.037), ('meanings', 0.036), ('develop', 0.036), ('dg', 0.035), ('classification', 0.035), ('prior', 0.034), ('mrf', 0.034), ('regions', 0.033), ('express', 0.033), ('visual', 0.033), ('flexible', 0.033), ('xi', 0.033), ('slic', 0.033), ('concurrent', 0.033), ('hierarchical', 0.033), ('quantized', 0.032), ('pyramid', 0.032), ('sth', 0.032), ('distinguishes', 0.032), ('devised', 0.032), ('jt', 0.032), ('jordan', 0.032), ('stochastic', 0.031), ('characterizing', 0.031), ('correlations', 0.031), ('holistic', 0.031), ('gp', 0.031), ('seeing', 0.031), ('substantially', 0.031), ('inferred', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
Author: Dahua Lin, Jianxiong Xiao
Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination. –
2 0.28801784 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
Author: Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos
Abstract: An extension of the latent Dirichlet allocation (LDA), denoted class-specific-simplex LDA (css-LDA), is proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. The proposed css-LDA is an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. This model can be used for generative classification using the Bayes decision rule or even extended to discriminative classification with support vector machines (SVMs). A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. SVM-based discriminants can be learned for classes in the space of these histograms. The effectiveness of css-LDA model in both generative and discriminative classification frameworks is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform all existing LDA based image classification approaches.
3 0.26716447 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
Author: Jian Zhang, Chen Kan, Alexander G. Schwing, Raquel Urtasun
Abstract: In this paper we propose an approach to jointly estimate the layout ofrooms as well as the clutterpresent in the scene using RGB-D data. Towards this goal, we propose an effective model that is able to exploit both depth and appearance features, which are complementary. Furthermore, our approach is efficient as we exploit the inherent decomposition of additive potentials. We demonstrate the effectiveness of our approach on the challenging NYU v2 dataset and show that employing depth reduces the layout error by 6% and the clutter estimation by 13%.
4 0.23981112 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
Author: Alexander G. Schwing, Sanja Fidler, Marc Pollefeys, Raquel Urtasun
Abstract: In this paper we propose an approach to jointly infer the room layout as well as the objects present in the scene. Towards this goal, we propose a branch and bound algorithm which is guaranteed to retrieve the global optimum of the joint problem. The main difficulty resides in taking into account occlusion in order to not over-count the evidence. We introduce a new decomposition method, which generalizes integral geometry to triangular shapes, and allows us to bound the different terms in constant time. We exploit both geometric cues and object detectors as image features and show large improvements in 2D and 3D object detection over state-of-the-art deformable part-based models.
5 0.23767075 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
Author: Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang
Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.
6 0.14230481 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
7 0.11864164 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees
8 0.099886641 197 iccv-2013-Hierarchical Joint Max-Margin Learning of Mid and Top Level Representations for Visual Recognition
9 0.099563114 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions
10 0.098194018 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
11 0.08829774 290 iccv-2013-New Graph Structured Sparsity Model for Multi-label Image Annotations
12 0.088199429 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
13 0.085422434 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps
14 0.084581077 187 iccv-2013-Group Norm for Learning Structured SVMs with Unstructured Latent Variables
15 0.083268747 410 iccv-2013-Support Surface Prediction in Indoor Scenes
16 0.082125217 85 iccv-2013-Compositional Models for Video Event Detection: A Multiple Kernel Learning Latent Variable Approach
17 0.081776857 294 iccv-2013-Offline Mobile Instance Retrieval with a Small Memory Footprint
18 0.08145801 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
19 0.080481477 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes
20 0.08016292 236 iccv-2013-Learning Discriminative Part Detectors for Image Classification and Cosegmentation
topicId topicWeight
[(0, 0.223), (1, 0.018), (2, 0.007), (3, -0.027), (4, 0.071), (5, 0.031), (6, -0.065), (7, -0.006), (8, -0.043), (9, -0.129), (10, 0.057), (11, 0.005), (12, -0.073), (13, 0.0), (14, -0.031), (15, -0.049), (16, -0.087), (17, 0.001), (18, -0.04), (19, -0.077), (20, -0.106), (21, -0.067), (22, 0.118), (23, -0.084), (24, 0.117), (25, -0.094), (26, 0.175), (27, 0.026), (28, 0.061), (29, 0.153), (30, 0.027), (31, 0.001), (32, 0.009), (33, 0.174), (34, -0.011), (35, 0.003), (36, -0.17), (37, -0.022), (38, -0.016), (39, 0.039), (40, 0.028), (41, 0.016), (42, -0.056), (43, -0.124), (44, -0.075), (45, -0.071), (46, -0.029), (47, 0.017), (48, 0.115), (49, -0.121)]
simIndex simValue paperId paperTitle
same-paper 1 0.95740163 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
Author: Dahua Lin, Jianxiong Xiao
Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination. –
2 0.81294334 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
Author: Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos
Abstract: An extension of the latent Dirichlet allocation (LDA), denoted class-specific-simplex LDA (css-LDA), is proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. The proposed css-LDA is an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. This model can be used for generative classification using the Bayes decision rule or even extended to discriminative classification with support vector machines (SVMs). A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. SVM-based discriminants can be learned for classes in the space of these histograms. The effectiveness of css-LDA model in both generative and discriminative classification frameworks is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform all existing LDA based image classification approaches.
3 0.71099466 144 iccv-2013-Estimating the 3D Layout of Indoor Scenes and Its Clutter from Depth Sensors
Author: Jian Zhang, Chen Kan, Alexander G. Schwing, Raquel Urtasun
Abstract: In this paper we propose an approach to jointly estimate the layout ofrooms as well as the clutterpresent in the scene using RGB-D data. Towards this goal, we propose an effective model that is able to exploit both depth and appearance features, which are complementary. Furthermore, our approach is efficient as we exploit the inherent decomposition of additive potentials. We demonstrate the effectiveness of our approach on the challenging NYU v2 dataset and show that employing depth reduces the layout error by 6% and the clutter estimation by 13%.
4 0.70319533 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation
Author: Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang
Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.
5 0.69531608 64 iccv-2013-Box in the Box: Joint 3D Layout and Object Reasoning from Single Images
Author: Alexander G. Schwing, Sanja Fidler, Marc Pollefeys, Raquel Urtasun
Abstract: In this paper we propose an approach to jointly infer the room layout as well as the objects present in the scene. Towards this goal, we propose a branch and bound algorithm which is guaranteed to retrieve the global optimum of the joint problem. The main difficulty resides in taking into account occlusion in order to not over-count the evidence. We introduce a new decomposition method, which generalizes integral geometry to triangular shapes, and allows us to bound the different terms in constant time. We exploit both geometric cues and object detectors as image features and show large improvements in 2D and 3D object detection over state-of-the-art deformable part-based models.
6 0.63286084 433 iccv-2013-Understanding High-Level Semantics by Modeling Traffic Patterns
7 0.59750426 201 iccv-2013-Holistic Scene Understanding for 3D Object Detection with RGBD Cameras
8 0.56815076 132 iccv-2013-Efficient 3D Scene Labeling Using Fields of Trees
9 0.56337607 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling
10 0.56295019 1 iccv-2013-3DNN: Viewpoint Invariant 3D Geometry Matching for Scene Understanding
11 0.5405218 2 iccv-2013-3D Scene Understanding by Voxel-CRF
12 0.53760588 331 iccv-2013-Pyramid Coding for Functional Scene Element Recognition in Video Scenes
13 0.5205161 246 iccv-2013-Learning the Visual Interpretation of Sentences
14 0.51689386 42 iccv-2013-Active MAP Inference in CRFs for Efficient Semantic Segmentation
15 0.50515229 216 iccv-2013-Inferring "Dark Matter" and "Dark Energy" from Videos
16 0.47597939 410 iccv-2013-Support Surface Prediction in Indoor Scenes
17 0.47593379 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation
18 0.46451455 412 iccv-2013-Synergistic Clustering of Image and Segment Descriptors for Unsupervised Scene Understanding
19 0.4637627 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection
20 0.458729 234 iccv-2013-Learning CRFs for Image Parsing with Adaptive Subgradient Descent
topicId topicWeight
[(2, 0.073), (7, 0.018), (12, 0.011), (26, 0.081), (31, 0.436), (42, 0.07), (64, 0.033), (73, 0.022), (89, 0.158)]
simIndex simValue paperId paperTitle
1 0.92606843 345 iccv-2013-Recognizing Text with Perspective Distortion in Natural Scenes
Author: Trung Quy Phan, Palaiahnakote Shivakumara, Shangxuan Tian, Chew Lim Tan
Abstract: This paper presents an approach to text recognition in natural scene images. Unlike most existing works which assume that texts are horizontal and frontal parallel to the image plane, our method is able to recognize perspective texts of arbitrary orientations. For individual character recognition, we adopt a bag-of-keypoints approach, in which Scale Invariant Feature Transform (SIFT) descriptors are extracted densely and quantized using a pre-trained vocabulary. Following [1, 2], the context information is utilized through lexicons. We formulate word recognition as finding the optimal alignment between the set of characters and the list of lexicon words. Furthermore, we introduce a new dataset called StreetViewText-Perspective, which contains texts in street images with a great variety of viewpoints. Experimental results on public datasets and the proposed dataset show that our method significantly outperforms the state-of-the-art on perspective texts of arbitrary orientations.
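The lexicon-driven word recognition step can be mimicked with a toy scorer: given per-character class posteriors, score each lexicon word by its mean log-probability and keep the best (a simplified sketch; the alphabet and all scores below are fabricated, and the paper's alignment is more elaborate):

```python
import numpy as np

# Toy per-character posteriors (rows: detected characters in reading order,
# cols: alphabet). A real system would obtain these from a character
# classifier such as the bag-of-keypoints SIFT model described above.
alphabet = "acprst"
char_scores = np.array([
    [0.05, 0.70, 0.05, 0.05, 0.10, 0.05],  # likely 'c'
    [0.80, 0.05, 0.05, 0.02, 0.03, 0.05],  # likely 'a'
    [0.05, 0.05, 0.05, 0.70, 0.10, 0.05],  # likely 'r'
    [0.05, 0.05, 0.05, 0.05, 0.10, 0.70],  # likely 't'
])
lexicon = ["cart", "cast", "part", "carts"]

def word_score(word):
    """Mean per-character log-probability; words whose length does not
    match the number of detected characters are rejected outright."""
    if len(word) != len(char_scores):
        return -np.inf
    idx = [alphabet.index(ch) for ch in word]
    return float(np.mean(np.log(char_scores[np.arange(len(word)), idx])))

best = max(lexicon, key=word_score)
print(best)  # → cart
```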
2 0.89176905 408 iccv-2013-Super-resolution via Transform-Invariant Group-Sparse Regularization
Author: Carlos Fernandez-Granda, Emmanuel J. Candès
Abstract: We present a framework to super-resolve planar regions found in urban scenes and other man-made environments by taking into account their 3D geometry. Such regions have highly structured straight edges, but this prior is challenging to exploit due to deformations induced by the projection onto the imaging plane. Our method factors out such deformations by using recently developed tools based on convex optimization to learn a transform that maps the image to a domain where its gradient has a simple group-sparse structure. This allows us to obtain a novel convex regularizer that enforces global consistency constraints between the edges of the image. Computational experiments with real images show that this data-driven approach to the design of regularizers promoting transform-invariant group sparsity is very effective at high super-resolution factors. We view our approach as complementary to most recent superresolution methods, which tend to focus on hallucinating high-frequency textures.
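The group-sparse gradient structure the abstract refers to can be illustrated on a toy image with an l2,1-style penalty: sum the l2 norms of gradient columns and rows, so that axis-aligned (rectified) edges activate few groups while oblique edges activate many (a hypothetical sketch; the paper's regularizer additionally learns the rectifying transform, which is omitted here):

```python
import numpy as np

def group_sparse_norm(img):
    """l2,1 norm of the image gradient: sum of l2 norms over columns of the
    horizontal gradient and rows of the vertical gradient. Images whose
    edges are axis-aligned concentrate their gradient in few groups and
    therefore score low."""
    gx = np.diff(img, axis=1)
    gy = np.diff(img, axis=0)
    return np.linalg.norm(gx, axis=0).sum() + np.linalg.norm(gy, axis=1).sum()

# A rectified "facade": one vertical and one horizontal edge.
aligned = np.zeros((32, 32))
aligned[:, 16:] = 1.0
aligned[16:, :] += 1.0

# A diagonal step edge spreads its gradient over many rows and columns,
# so the group-sparse norm is larger despite comparable edge energy.
diag = np.fromfunction(lambda i, j: (i + j >= 32).astype(float), (32, 32))

print(group_sparse_norm(aligned), group_sparse_norm(diag))
```

Minimizing such a penalty in the rectified domain favors reconstructions with long, globally consistent straight edges, which is the intuition behind the regularizer described above.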
same-paper 3 0.87660354 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes
Author: Dahua Lin, Jianxiong Xiao
Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes, i.e., the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination.
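The key ingredient above, topic distributions that vary continuously over the image plane via Gaussian processes, can be imitated in a few lines: sample one GP draw per topic on a pixel grid and push the draws through a softmax (a simplification: the paper couples the processes, whereas the draws here are independent, and the kernel and grid size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Coordinates of a small image grid.
h = w = 16
ys, xs = np.mgrid[0:h, 0:w]
pts = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)

# Squared-exponential kernel; the length scale controls how smoothly
# topic preferences vary across the image plane.
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / (2 * 6.0 ** 2)) + 1e-6 * np.eye(h * w)
L = np.linalg.cholesky(K)

# One GP draw per topic, mapped through a softmax to obtain a per-pixel
# distribution over topics; the argmax gives a smooth topic map.
n_topics = 3
f = L @ rng.standard_normal((h * w, n_topics))
p = np.exp(f) / np.exp(f).sum(axis=1, keepdims=True)
topic_map = p.argmax(axis=1).reshape(h, w)
print(np.unique(topic_map))
```

Because the GP draws are smooth, the resulting topic map consists of large contiguous regions rather than independent per-pixel labels, mirroring the layout behavior described in the abstract.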
4 0.8511132 38 iccv-2013-Action Recognition with Actons
Author: Jun Zhu, Baoyuan Wang, Xiaokang Yang, Wenjun Zhang, Zhuowen Tu
Abstract: With the improved accessibility to an exploding amount of video data and growing demands in a wide range of video analysis applications, video-based action recognition/classification becomes an increasingly important task in computer vision. In this paper, we propose a two-layer structure for action recognition to automatically exploit a mid-level "acton" representation. The weakly-supervised actons are learned via a new max-margin multi-channel multiple instance learning framework, which can capture multiple mid-level action concepts simultaneously. The learned actons (with no requirement for detailed manual annotations) observe the properties of being compact, informative, discriminative, and easy to scale. The experimental results demonstrate the effectiveness of applying the learned actons in our two-layer structure, and show the state-of-the-art recognition performance on two challenging action datasets, i.e., Youtube and HMDB51.
5 0.8475821 357 iccv-2013-Robust Matrix Factorization with Unknown Noise
Author: Deyu Meng, Fernando De_La_Torre
Abstract: Many problems in computer vision can be posed as recovering a low-dimensional subspace from highdimensional visual data. Factorization approaches to lowrank subspace estimation minimize a loss function between an observed measurement matrix and a bilinear factorization. Most popular loss functions include the L2 and L1 losses. L2 is optimal for Gaussian noise, while L1 is for Laplacian distributed noise. However, real data is often corrupted by an unknown noise distribution, which is unlikely to be purely Gaussian or Laplacian. To address this problem, this paper proposes a low-rank matrix factorization problem with a Mixture of Gaussians (MoG) noise model. The MoG model is a universal approximator for any continuous distribution, and hence is able to model a wider range of noise distributions. The parameters of the MoG model can be estimated with a maximum likelihood method, while the subspace is computed with standard approaches. We illustrate the benefits of our approach in extensive synthetic and real-world experiments including structure from motion, face modeling and background subtraction.
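The scheme described above, maximum-likelihood MoG noise estimation alternated with a standard subspace update, can be prototyped compactly with EM: the E-step computes per-entry component responsibilities, and the M-step refits the mixture and solves precision-weighted least squares for the factors (a hypothetical sketch of the idea, not the authors' exact algorithm; rank, component count, and initialization are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def mog_factorize(X, rank=2, n_comp=2, iters=30):
    """Low-rank factorization X ~ U @ V.T under a zero-mean mixture-of-
    Gaussians noise model, fit by EM."""
    m, n = X.shape
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((n, rank))
    pi = np.full(n_comp, 1.0 / n_comp)
    var = np.array([0.01, 1.0])[:n_comp] * X.var()
    for _ in range(iters):
        E = X - U @ V.T
        # E-step: responsibility of each mixture component for each residual.
        logp = (np.log(pi)[:, None, None]
                - 0.5 * np.log(2 * np.pi * var)[:, None, None]
                - E[None] ** 2 / (2 * var[:, None, None]))
        logp -= logp.max(axis=0)
        r = np.exp(logp)
        r /= r.sum(axis=0)
        # M-step for the mixture parameters.
        pi = r.mean(axis=(1, 2))
        var = (r * E[None] ** 2).sum(axis=(1, 2)) / (r.sum(axis=(1, 2)) + 1e-12)
        var = np.maximum(var, 1e-8)
        # M-step for the subspace: per-entry precision weights implied by
        # the responsibilities, then row-wise weighted least squares.
        W = (r / var[:, None, None]).sum(axis=0)
        for i in range(m):
            A = (V * W[i][:, None]).T @ V + 1e-8 * np.eye(rank)
            U[i] = np.linalg.solve(A, (V * (W[i] * X[i])[:, None]).sum(axis=0))
        for j in range(n):
            A = (U * W[:, j][:, None]).T @ U + 1e-8 * np.eye(rank)
            V[j] = np.linalg.solve(A, (U * (W[:, j] * X[:, j])[:, None]).sum(axis=0))
    return U, V

# Ground-truth rank-2 matrix with small Gaussian noise plus sparse outliers.
U0 = rng.standard_normal((40, 2))
V0 = rng.standard_normal((30, 2))
X = U0 @ V0.T + 0.01 * rng.standard_normal((40, 30))
mask = rng.random((40, 30)) < 0.1
X[mask] += 5.0 * rng.standard_normal(mask.sum())

U, V = mog_factorize(X)
err = np.abs(U @ V.T - U0 @ V0.T).mean()
print(err)
```

The broad-variance component absorbs the outliers, downweighting their influence on the subspace update, which is the mechanism that lets the MoG model cover noise that is neither purely Gaussian nor purely Laplacian.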
6 0.76749468 275 iccv-2013-Motion-Aware KNN Laplacian for Video Matting
7 0.72223568 269 iccv-2013-Modeling Occlusion by Discriminative AND-OR Structures
8 0.68723631 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification
9 0.66951632 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection
10 0.65136141 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction
11 0.64560443 210 iccv-2013-Image Retrieval Using Textual Cues
12 0.64236152 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions
13 0.61931145 180 iccv-2013-From Where and How to What We See
14 0.61459064 415 iccv-2013-Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors
15 0.60734046 173 iccv-2013-Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging
16 0.59960669 19 iccv-2013-A Learning-Based Approach to Reduce JPEG Artifacts in Image Matting
17 0.59760308 192 iccv-2013-Handwritten Word Spotting with Corrected Attributes
18 0.59436214 287 iccv-2013-Neighbor-to-Neighbor Search for Fast Coding of Feature Vectors
19 0.59182477 156 iccv-2013-Fast Direct Super-Resolution by Simple Functions
20 0.58150744 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation