iccv iccv2013 iccv2013-59 knowledge-graph by maker-knowledge-mining

59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation


Source: pdf

Author: Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang

Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. [sent-6, score-0.521]

2 This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. [sent-7, score-1.082]

3 We propose a novel framework based on Bayesian joint topic modelling. [sent-8, score-0.348]

4 Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. [sent-9, score-0.424]

5 (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. [sent-10, score-0.244]

6 (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. [sent-11, score-0.839]

7 For example, for many vision tasks such as object classification [22], detection [13], and segmentation [19, 18], hundreds or even thousands of object samples must be annotated from images for each object class. [sent-17, score-0.299]

8 Most of them address the problem as a weakly supervised learning problem, particularly as a multi-instance learning (MIL) problem, where images are bags, and potential object locations are instances. [sent-25, score-0.577]
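To make the bag/instance structure concrete, here is a minimal Python sketch of the MIL view described above; the class and field names are ours, chosen for illustration, and are not from the paper.

```python
# Minimal sketch of the multi-instance view: each image is a "bag" carrying
# only image-level labels, and candidate object locations are the unlabelled
# "instances" inside it. All names here are illustrative.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Instance:
    box: Tuple[int, int, int, int]   # candidate location (x1, y1, x2, y2)
    feature: List[float]             # appearance descriptor for this window

@dataclass
class Bag:
    instances: List[Instance]        # candidate windows in one image
    labels: List[str] = field(default_factory=list)  # weak image-level labels

# A weakly labelled image: we know a horse and a person appear somewhere,
# but not which candidate window contains which object.
bag = Bag(
    instances=[Instance((10, 10, 120, 90), [0.2, 0.7]),
               Instance((50, 40, 200, 180), [0.9, 0.1])],
    labels=["horse", "person"],
)
```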

9 These methods are typically discriminative in nature and attempt to localise each class of objects independently from the other classes, even when the weak labels indicate that different types of objects co-exist in the same images (see Fig. [sent-26, score-0.371]

10 However, localising objects of different classes independently rather than jointly brings about a number of limitations: (1) The knowledge that multiple objects co-exist within each image is not exploited. [sent-28, score-0.327]

11 (2) Although different object classes have different appearances, the background appearance is relevant to them all. [sent-31, score-0.303]

12 When different classes are modelled independently, the background appearance must be re-learned repeatedly for each class, when it would be more statistically robust to share this common knowledge between classes. [sent-32, score-0.334]

13 Beyond joint versus independent learning there is the issue of encoding prior knowledge or top-down cues about appearance, which is very important to obtain good WSOL performance [10, 26]. [sent-33, score-0.294]

14 However, existing approaches provide no mechanism for learning from unlabelled data together with weakly labelled data for object localisation (i. [sent-39, score-1.167]

15 This limitation is also related to the lack of joint learning, because for SSL joint learning is important to disambiguate the unlabelled images. [sent-42, score-0.479]

16 In this paper, a novel framework based on Bayesian latent topic models is proposed to overcome the previously mentioned limitations. [sent-43, score-0.327]

17 In our framework, both multiple object classes and different types of backgrounds are modelled jointly in a single generative model as latent topics, in order to explicitly exploit their correlations (see Fig. [sent-44, score-0.408]

18 As bagof-words (BoW) models, conventional latent topic models have no notion of localisation. [sent-46, score-0.419]

19 We overcome this problem by incorporating an explicit notion of object location, alongside the ability to incorporate prior knowledge about object appearance in a fully Bayesian approach. [sent-47, score-0.365]

20 Importantly, as a joint generative model, unlabelled data can now be easily used to compensate for sparse training annotations, simply by allowing the model to also infer both which unknown objects are present in those images and where they are. [sent-48, score-0.407]

21 Related Work Weakly supervised object localisation Weakly supervised learning (WSL) has attracted increasing attention as the volume of data which we are interested in learning from grows much faster than available annotations. [sent-50, score-0.864]

22 Weakly supervised object localisation (WSOL) is of particular interest [10, 28, 27, 24, 26, 22, 8], due to the onerous demands of annotating object location information. [sent-51, score-0.728]

23 However, only relatively recently have localisation models capable of learning from challenging data such as PASCAL VOC 2007 been proposed [10, 28, 27, 24, 26]. [sent-53, score-0.495]

24 One of the first studies to address this was [10] which employed a conditional random field and generic prior object knowledge learned from a fully annotated dataset. [sent-55, score-0.349]

25 In contrast to these studies, which are all based on discriminative models, we introduce a generative topic model based approach which retains the benefits of both intra- and inter-class cues, as well as the potential for exploiting both spatial and appearance priors. [sent-58, score-0.385]

26 Moreover, it uniquely exploits joint multi-label learning of all object classes simultaneously, and enables semi-supervised learning, which allows annotation requirements to be further reduced. [sent-59, score-0.487]

27 Most studies have addressed the simpler tasks of learning classification [31, 7, 20] or annotation [31, 20, 4], rather than localisation, which we are interested in here. [sent-61, score-0.611]

28 This is because conventional topic models have no explicit notion of the spatial location and extent of an object in an image; and because supervised topic models such as CorrLDA [4] and derivatives [31] allow much less direct supervision than we will exploit here. [sent-62, score-0.983]

29 Nevertheless topic models have good potential for this challenge because they can be modified for multi-label weakly supervised learning [14, 17], and can then reason jointly about multiple objects in each image. [sent-64, score-0.789]

30 In this paper we address the limitations of existing topic models for this task by incorporating an explicit notion of object location; and developing a Bayesian model with the ability to incorporate prior knowledge about object appearance (e. [sent-66, score-0.678]

31 Other joint learning approaches An approach similar in spirit to ours in the sense of jointly learning a model for all classes is that of Cabral et al. [sent-69, score-0.341]

32 However we add two key factors of (i) a stronger notion of the spatial location and extent of each object, and (ii) the ability to encode human knowledge or transferred knowledge through Bayesian priors. [sent-72, score-0.256]

33 Our contributions are threefold: (1) We propose the novel concept of joint modelling of all object classes and backgrounds for weakly supervised object localisation. [sent-78, score-0.779]

34 (2) We formulate a novel Bayesian topic model suitable for localising objects and for utilising the various types of prior knowledge available. [sent-79, score-0.929]

35 (3) We provide a solution for exploiting unlabelled data for weakly supervised object localisation. [sent-80, score-0.682]

36 Methods In this section, we introduce our new latent topic model (LTM) [5] approach to the weakly-supervised object localisation task, and the associated learning algorithms. [sent-83, score-0.894]

37 Applied to images, conventional LTMs factor images into combinations of latent topics [25, 29]. [sent-84, score-0.266]

38 We will achieve this in a fully Bayesian LTM framework by applying weak supervision to partially constrain the available topics for each image. [sent-87, score-0.308]

39 If there are C classes of objects to be localised, Kfg = C of these will represent the (foreground) classes, and Kbg = K − Kfg topics will model background data to be explained away. [sent-96, score-0.394]

40 Tfg and Tbg index foreground and background topics respectively. [sent-97, score-0.349]

41 Each topic will encode a distribution over the Nv-sized appearance vocabulary, and over the spatial location of these words within each image. [sent-98, score-0.372]
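The topic layout just described (Kfg = C foreground topics with spatial Gaussians, plus Kbg spatially uniform background topics, each with a distribution over the Nv-word vocabulary) can be sketched as follows; the container shapes and initialisation are our own illustrative assumptions, not the paper's code.

```python
import numpy as np

# Illustrative setup of the topic space: Kfg = C foreground topics, one per
# object class, plus Kbg background topics; every topic carries a
# distribution over the Nv-word appearance vocabulary.
C, Kbg, Nv = 6, 20, 1000          # e.g. VOC07-6x2 classes, 20 bg topics
Kfg = C
K = Kfg + Kbg

Tfg = list(range(Kfg))            # indices of foreground topics
Tbg = list(range(Kfg, K))         # indices of background topics

rng = np.random.default_rng(0)
word_dist = rng.dirichlet(np.ones(Nv), size=K)   # one row per topic

# Foreground topics additionally carry a Gaussian over word locations
# (mean mu_k and precision Lambda_k); background topics are spatially
# uniform, so they need no location parameters.
mu = np.zeros((Kfg, 2))               # 2-D location means (x, y)
Lambda = np.stack([np.eye(2)] * Kfg)  # 2x2 precision matrices
```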

42 For each foreground topic k ∈ Tfg draw a location distribution: {μjk, Λjk} ∼ NW(μk0, Λk0, βk0, vk0) 3. [sent-113, score-0.464]
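A hedged sketch of that Normal-Wishart draw using SciPy: the precision is drawn from a Wishart, then the mean from a Gaussian whose covariance is the scaled inverse precision. The hyperparameter values below are placeholders, not the paper's settings.

```python
import numpy as np
from scipy.stats import wishart, multivariate_normal

# Normal-Wishart draw: Lambda_k ~ W(W0, v0), then
# mu_k | Lambda_k ~ N(mu0, (beta0 * Lambda_k)^-1).
rng = np.random.default_rng(1)
mu0 = np.array([0.5, 0.5])   # e.g. image centre in normalised coordinates
W0 = np.eye(2)               # Wishart scale matrix
beta0, v0 = 1.0, 3.0         # concentration and degrees of freedom (v0 > d-1)

Lambda_k = wishart(df=v0, scale=W0).rvs(random_state=rng)  # 2x2 precision
mu_k = multivariate_normal(mean=mu0,
                           cov=np.linalg.inv(beta0 * Lambda_k)).rvs(random_state=rng)
print(mu_k, Lambda_k)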

43 To learn the model and localise all the weakly annotated objects, we wish to infer the posterior p(H|O, Π) = p({yj}, {μjk, Λjk}, {πk}_{k=1}^{K} | {xj, lj}_{j=1}^{J}, Π). [sent-125, score-0.541]
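The paper infers this posterior with fully Bayesian variational updates; as a deliberately simplified caricature of the alternating structure only, the point-estimate iteration below alternates per-word topic responsibilities with responsibility-weighted parameter updates (the Wishart update for Λ is omitted for brevity, and nothing here reproduces the paper's actual update equations).

```python
import numpy as np

def e_step(x, l, word_dist, mu, Lam, alpha, Kfg):
    """Per-word topic responsibilities: appearance times location likelihood.

    x: (N,) int word ids; l: (N, 2) word locations on the unit square.
    """
    log_r = np.log(word_dist[:, x] + 1e-12)          # (K, N) appearance term
    for k in range(Kfg):                             # Gaussian location term
        d = l - mu[k]
        log_r[k] += (0.5 * np.linalg.slogdet(Lam[k])[1]
                     - 0.5 * np.einsum('ni,ij,nj->n', d, Lam[k], d)
                     - np.log(2 * np.pi))
    # Background topics are spatially uniform on the unit square: log 1 = 0.
    log_r += np.log(alpha)[:, None]                  # per-image topic prior
    log_r -= log_r.max(axis=0)                       # numerical stability
    r = np.exp(log_r)
    return r / r.sum(axis=0)

def m_step(x, l, r, Nv, Kfg):
    """Responsibility-weighted updates (Wishart update for Lam omitted)."""
    K = r.shape[0]
    word_dist = np.ones((K, Nv))                     # Dirichlet(1) smoothing
    np.add.at(word_dist, (np.arange(K)[:, None], x[None, :]), r)
    word_dist /= word_dist.sum(axis=1, keepdims=True)
    mu = (r[:Kfg] @ l) / (r[:Kfg].sum(axis=1, keepdims=True) + 1e-12)
    return word_dist, mu
```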

44 In conventional topic models the α parameter encodes the expected proportion of words for each topic. [sent-143, score-0.345]

45 Eq. (3) has the effect of factoring images into combinations of latent topics, where the Kbg background topics are always available to explain away backgrounds, and the Kfg foreground topics are only available to images with annotated classes. [sent-147, score-0.738]
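As we read this mechanism, weak supervision enters through the per-image Dirichlet parameter: foreground entries are switched on only for the classes named in the image's weak labels, while background entries are always on. A minimal sketch, with our own function name and placeholder values:

```python
import numpy as np

# Foreground topics are available only when the class is weakly annotated;
# background topics are always available to explain the rest of the image.
def make_alpha(image_labels, class_names, Kbg, alpha_fg=1.0, alpha_bg=1.0):
    Kfg = len(class_names)
    alpha = np.full(Kfg + Kbg, 1e-8)       # effectively "off"
    for c, name in enumerate(class_names):
        if name in image_labels:
            alpha[c] = alpha_fg            # foreground topic switched on
    alpha[Kfg:] = alpha_bg                 # background always on
    return alpha

alpha = make_alpha({"horse", "person"}, ["aeroplane", "horse", "person"], Kbg=4)
print(alpha)   # only the horse/person entries and the background entries are on
```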

46 Encoding human or transferred knowledge via Bayesian prior An important capability of our Bayesian approach is that top-down prior knowledge from human expertise, or other transferrable cues can be encoded. [sent-149, score-0.315]

47 This knowledge is encoded via the Gaussian foreground topic spatial distribution and the uniform background topic distribution. [sent-152, score-0.79]

48 (step (c) in the generative process.) Second, aggregated across all images, the background is more dominant than any single object class in terms of size (hence the amount of visual words). [sent-154, score-0.267]

49 This prior knowledge is essentially the fact that foreground objects stand out against background, and thus is related to the notion of saliency, not within an image, but across all images. [sent-177, score-0.311]

50 Apart from these two types of human knowledge, other human or transferrable knowledge extracted from auxiliary labelled data can also be readily integrated into our model via the Bayesian priors. [sent-180, score-0.27]

51 For example, if there is prior knowledge about the appearance of individual classes (e. [sent-181, score-0.274]

52 , by obtaining the opinion of a generic object detector or object saliency model [1] on images labelled with class c), then this can be encoded via the appearance prior by specifying an informative πc0 set to the average statistics of the generic object bounding-boxes. [sent-183, score-0.495]
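A hedged sketch of how such an informative appearance prior πc0 could be assembled: average the statistics of visual words falling inside generic objectness boxes on images weakly labelled with class c, and convert the average into Dirichlet pseudo-counts. The box format, inputs and `strength` knob are illustrative assumptions, not the paper's recipe.

```python
import numpy as np

# Build Dirichlet pseudo-counts pi_c0 from the average bag-of-words
# histogram inside generic objectness/saliency boxes (hypothetical inputs).
def appearance_prior(word_ids, word_locs, objectness_boxes, Nv, strength=10.0):
    counts = np.zeros(Nv)
    for (x1, y1, x2, y2) in objectness_boxes:
        inside = ((word_locs[:, 0] >= x1) & (word_locs[:, 0] <= x2) &
                  (word_locs[:, 1] >= y1) & (word_locs[:, 1] <= y2))
        np.add.at(counts, word_ids[inside], 1.0)
    hist = counts / max(counts.sum(), 1.0)
    return 1.0 + strength * hist      # Dirichlet pseudo-counts pi_c0
```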

53 In summary, our Bayesian joint topic model is flexible and versatile in allowing the use of any knowledge available in addition to the weak labels. [sent-184, score-0.464]

54 Importantly, unknown images can include those from the same pool of classes but without annotation (for which the posterior q(θ) will pick out the present classes), or those from a completely disjoint pool of classes (for which the q(θ) will encode only background). [sent-188, score-0.311]

55 Thereafter, any strategy for heat-map based localisation may be used. [sent-198, score-0.428]
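Two simple heat-map-to-box strategies, sketched under our own assumptions; the paper's Our-Gaussian and Our-Sampling variants (introduced below) differ precisely in this step, but their exact procedures are not reproduced here.

```python
import numpy as np

# `heat` is an (H, W) per-class topic-response map.
def box_from_threshold(heat, frac=0.5):
    """Tightest box around pixels above frac * max response."""
    ys, xs = np.nonzero(heat >= frac * heat.max())
    return xs.min(), ys.min(), xs.max(), ys.max()

def box_from_gaussian(heat, n_std=2.0):
    """Box at response-weighted mean +/- n_std weighted standard deviations."""
    H, W = heat.shape
    ys, xs = np.mgrid[0:H, 0:W]
    w = heat / heat.sum()
    mx, my = (w * xs).sum(), (w * ys).sum()
    sx = np.sqrt((w * (xs - mx) ** 2).sum())
    sy = np.sqrt((w * (ys - my) ** 2).sum())
    return (int(mx - n_std * sx), int(my - n_std * sy),
            int(mx + n_std * sx), int(my + n_std * sy))
```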

56 Experiments After briefly introducing the evaluation datasets, we first compare our model with state-of-the-art on localising objects in weakly annotated images in Sec. [sent-201, score-0.424]

57 Datasets We use the challenging PASCAL VOC 2007 dataset that has become widely used for weakly supervised annotation. [sent-213, score-0.371]

58 Note that VOC07-20 is different to the Pascal07-all defined in [10] which actually contains 14 classes as the other 6 were used as fully annotated auxiliary data. [sent-216, score-0.277]

59 Settings For our model, we set the foreground topic number Nfg to be equal to the number of classes, and Nbg = 20 for background topics. [sent-218, score-0.449]

60 Their performance is compared against two variants of our model, Our-Sampling and Our-Gaussian, which differ only in the final object localisation step (see Sec. [sent-227, score-0.5]

61 These require 10 out of the 20 classes to be fully annotated with bounding boxes and used as auxiliary data. [sent-240, score-0.277]

62 [26] take a transfer learning approach and require a fully annotated auxiliary dataset. [sent-243, score-0.276]

63 Initial localisation Table 1 reports the initial annotation accuracy of our model compared with the state of the art. [sent-247, score-0.502]

64 Refined by detector After the initial annotation of the weakly labelled images, a conventional strong object detector can be trained using these annotations as ground truth. [sent-250, score-0.546]
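Schematically, the refinement step treats the initial annotations as ground truth, trains a detector, and re-annotates; the sketch below uses toy stand-ins for the detector (the paper uses a conventional strong detector, and nothing here is its actual training code).

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]

def train_detector(images: List[str], boxes: Dict[str, Box]):
    return boxes            # toy "model": remembers the pseudo ground truth

def detect(model, image: str) -> Box:
    return model[image]     # toy "detection": returns the remembered box

def refine(images: List[str], initial_boxes: Dict[str, Box], n_rounds: int = 1):
    boxes = dict(initial_boxes)
    for _ in range(n_rounds):
        model = train_detector(images, boxes)             # pseudo-GT training
        boxes = {im: detect(model, im) for im in images}  # re-annotate
    return boxes
```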

65 In particular, even after this costly refinement process, the localisation accuracy of many competitors is inferior to that of our model without the refinement. [sent-254, score-0.516]

66 Semi-supervised Learning One important advantage of our model is the ability to utilise completely unlabelled data, to further reduce the manual annotation requirements. [sent-258, score-0.343]

67 To demonstrate this we randomly select 10% of the VOC07-6×2 data as weakly labelled training data, and then add different unlabelled data. [sent-259, score-0.256]

68 This corresponds to only 5 weakly labelled images per class for the VOC07-6×2 dataset, which is significantly less than any previous method exploits. [sent-262, score-0.428]

69 Finally, two evaluation procedures are considered: (i) Evaluating localisation performance on the initially annotated 10% (standard WSOL task); and (ii) WSOL performance on the held out VOC07-6×2 test set. [sent-265, score-0.511]
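For evaluation we assume the usual PASCAL-style criterion: a predicted box counts as correct if its intersection-over-union with the ground truth is at least 0.5. A minimal sketch:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

# Localisation accuracy: fraction of images whose predicted box overlaps the
# ground truth with IoU >= 0.5 (assumed convention, not stated in this page).
def localisation_accuracy(pred, gt, thresh=0.5):
    hits = [iou(pred[i], gt[i]) >= thresh for i in range(len(gt))]
    return sum(hits) / float(len(hits))
```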

70 This latter procedure corresponds to a scenario where the localisation model is trained on one database and needs to be applied online to localise objects in incoming weakly labelled images. [sent-267, score-0.572]

71 From the results shown in Table 2, our model is clearly capable of exploiting unlabelled data to good effect. [sent-272, score-0.239]

72 More impressively, even if only a third of the provided unlabelled data is at all relevant (10%L+AllU), good performance is still obtained. [sent-275, score-0.239]

73 This result shows that our approach has good promise for effective use in economically realistic scenarios of learning from only few weak annotations and a large volume of only partially relevant unlabelled data. [sent-276, score-0.357]

74 3, where unlabelled data clearly helps to learn a better object model. [sent-278, score-0.311]

75 Finally, the similarly good results on the held-out test set verify that our model is indeed learning a good generalisable localisation mechanism and is not merely overfitting to the training data. [sent-279, score-0.495]

76 Insights into Our Model Object localisation and learned foreground topics Qualitative results are illustrated in Fig. [sent-282, score-0.703]

77 4, including heat maps of the object location showing what has been learned by those object (foreground) topics in our model. [sent-283, score-0.466]

78 These examples show that the foreground topics indeed capture what each object class looks like and can distinguish it from background and between different object classes. [sent-285, score-0.56]

79 4(e) where the single Gaussian assumption is not ideal when the foreground topic has a less compact response. [sent-294, score-0.375]

80 Learned background topics A key ability of our framework is the explicit modelling of non-annotated background data. [sent-298, score-0.389]

81 This allows such irrelevant pixels to be explained, reducing confusion with foreground objects and hence improving localisation accuracy. [sent-299, score-0.569]

82 5 via plots of the background topic response (heat map). [sent-301, score-0.387]

83 It shows that some of the background topics have clear semantic meaning, corresponding to common components such as sky, grass, road and water, despite none of these ever having been annotated. [sent-302, score-0.25]

84 The water topic, for example, gives a strong response to both water and sky. [sent-305, score-0.411]

85 Illustration of the object localisation process and what is learned by the object (foreground) topics, using the heat maps in the bottom row (higher intensity values mean higher model response). [sent-314, score-0.853]
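A heat map like the ones described in this caption can be rendered by splatting each visual word's foreground-topic responsibility at its image location; this rendering choice is ours, not necessarily the paper's exact procedure.

```python
import numpy as np

def heat_map(word_locs, resp_k, shape):
    """word_locs: (N, 2) pixel (x, y); resp_k: (N,) responsibility of topic k."""
    heat = np.zeros(shape)
    xs = np.clip(word_locs[:, 0].astype(int), 0, shape[1] - 1)
    ys = np.clip(word_locs[:, 1].astype(int), 0, shape[0] - 1)
    np.add.at(heat, (ys, xs), resp_k)   # accumulate responsibility per pixel
    return heat
```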

86 Without spatially aware representation (NoSpatial): The Gaussian representation of appearance within each image enforces spatial compactness, and hence helps to disambiguate object appearance from background appearance. [sent-318, score-0.256]

87 Without learning spatial extent, background patches of similar appearance to objects in the feature space cannot be properly disambiguated, leading to poorer learning and reduced localisation accuracy. [sent-319, score-0.77]

88 Alternative joint learning approaches In this experiment we compare other joint multi-instance/weakly-supervised multi-label learning methods, and show that none are effective. (Table residue: a results comparison on VOC07-6×2 and VOC07-20; row entries not recoverable.)

89 Our Bayesian topic inference process not only enables prior knowledge to be used, but also achieves 10-fold improvements in convergence time compared to EM inference used by most conventional topic models with point-estimated Dirichlet topics. [sent-346, score-0.708]

90 For object localisation in training images, direct Gaussian localisation is effectively free and heat-map sampling took around 0. [sent-349, score-0.958]

91 These statistics compare favourably to alternatives: [10] reports 2 hours to train 100 images; while our Matlab implementations of [27], [28] and [4] took 10, 15 and 20 hours respectively to localise objects for all 5,011 images. [sent-351, score-0.241]

92 Our approach surpasses the performance of all prior methods, obtaining state-of-the-art results due to three novel features: joint multi-label learning, a Bayesian formulation, and an explicit spatial model of object location. [sent-354, score-0.261]

93 Spatially coherent latent topic model for concurrent object segmentation and classification. [sent-405, score-0.399]

94 Weakly supervised learning of part-based spatial models for visual object recognition. [sent-410, score-0.254]

95 Identifying rare and subtle behaviors: A weakly supervised joint topic model. [sent-477, score-0.719]

96 Weakly supervised discriminative localization and classification: a joint learning process. [sent-509, score-0.254]

97 Scene recognition and weakly supervised object localization with deformable part-based models. [sent-518, score-0.443]

98 Transfer learning by ranking for weakly supervised object annotation. [sent-530, score-0.51]

99 In defence of negative mining for annotating weakly labelled data. [sent-536, score-0.361]

100 Weakly supervised object detector learning with model drift detection. [sent-541, score-0.254]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('localisation', 0.428), ('topic', 0.276), ('wsol', 0.266), ('weakly', 0.256), ('unlabelled', 0.239), ('jk', 0.206), ('topics', 0.176), ('localise', 0.169), ('miml', 0.121), ('bayesian', 0.12), ('supervised', 0.115), ('heat', 0.105), ('labelled', 0.105), ('classes', 0.102), ('yij', 0.1), ('foreground', 0.099), ('tfg', 0.097), ('competitors', 0.088), ('annotated', 0.083), ('annotation', 0.074), ('horse', 0.074), ('background', 0.074), ('kj', 0.073), ('allu', 0.072), ('kfg', 0.072), ('localised', 0.072), ('siva', 0.072), ('joint', 0.072), ('object', 0.072), ('lij', 0.068), ('voc', 0.068), ('learning', 0.067), ('class', 0.067), ('knowledge', 0.065), ('tbg', 0.064), ('vmp', 0.064), ('yijk', 0.064), ('jj', 0.06), ('boat', 0.06), ('backgrounds', 0.058), ('auxiliary', 0.057), ('kk', 0.056), ('appearance', 0.055), ('generative', 0.054), ('notion', 0.053), ('prior', 0.052), ('latent', 0.051), ('weak', 0.051), ('hospedales', 0.051), ('water', 0.049), ('corrlda', 0.048), ('fjg', 0.048), ('fjgk', 0.048), ('ltm', 0.048), ('zhiyuan', 0.048), ('draw', 0.048), ('blei', 0.048), ('supervision', 0.046), ('away', 0.043), ('localising', 0.043), ('cabral', 0.043), ('kbg', 0.043), ('transferrable', 0.043), ('objects', 0.042), ('studies', 0.042), ('location', 0.041), ('shi', 0.04), ('dirichlet', 0.039), ('conventional', 0.039), ('cues', 0.038), ('modelled', 0.038), ('xij', 0.038), ('poorer', 0.037), ('ssl', 0.037), ('pascal', 0.037), ('response', 0.037), ('nj', 0.036), ('multi', 0.036), ('factoring', 0.036), ('utilising', 0.036), ('fully', 0.035), ('updates', 0.034), ('oen', 0.034), ('transfer', 0.034), ('uniquely', 0.033), ('explicit', 0.033), ('jointly', 0.033), ('posterior', 0.033), ('modelling', 0.032), ('extent', 0.032), ('pandey', 0.032), ('surpasses', 0.032), ('gin', 0.031), ('dir', 0.031), ('whilst', 0.031), ('proportion', 0.03), ('took', 0.03), ('utilise', 0.03), ('lack', 0.029), ('explaining', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation

Author: Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang

Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.

2 0.23767075 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes

Author: Dahua Lin, Jianxiong Xiao

Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes: the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination.

3 0.19130853 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification

Author: Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos

Abstract: An extension of the latent Dirichlet allocation (LDA), denoted class-specific-simplex LDA (css-LDA), is proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. The proposed css-LDA is an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. This model can be used for generative classification using the Bayes decision rule or even extended to discriminative classification with support vector machines (SVMs). A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. SVM-based discriminants can be learned for classes in the space of these histograms. The effectiveness of css-LDA model in both generative and discriminative classification frameworks is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform all existing LDA based image classification approaches.

4 0.12679456 178 iccv-2013-From Semi-supervised to Transfer Counting of Crowds

Author: Chen Change Loy, Shaogang Gong, Tao Xiang

Abstract: Regression-based techniques have shown promising results for people counting in crowded scenes. However, most existing techniques require expensive and laborious data annotation for model training. In this study, we propose to address this problem from three perspectives: (1) Instead of exhaustively annotating every single frame, the most informative frames are selected for annotation automatically and actively. (2) Rather than learning from only labelled data, the abundant unlabelled data are exploited. (3) Labelled data from other scenes are employed to further alleviate the burden for data annotation. All three ideas are implemented in a unified active and semi-supervised regression framework with ability to perform transfer learning, by exploiting the underlying geometric structure of crowd patterns via manifold analysis. Extensive experiments validate the effectiveness of our approach.

5 0.11987584 445 iccv-2013-Visual Reranking through Weakly Supervised Multi-graph Learning

Author: Cheng Deng, Rongrong Ji, Wei Liu, Dacheng Tao, Xinbo Gao

Abstract: Visual reranking has been widely deployed to refine the quality of conventional content-based image retrieval engines. The current trend lies in employing a crowd of retrieved results stemming from multiple feature modalities to boost the overall performance of visual reranking. However, a major challenge pertaining to current reranking methods is how to take full advantage of the complementary property of distinct feature modalities. Given a query image and one feature modality, a regular visual reranking framework treats the top-ranked images as pseudo positive instances which are inevitably noisy, difficult to reveal this complementary property, and thus lead to inferior ranking performance. This paper proposes a novel image reranking approach by introducing a Co-Regularized Multi-Graph Learning (Co-RMGL) framework, in which the intra-graph and inter-graph constraints are simultaneously imposed to encode affinities in a single graph and consistency across different graphs. Moreover, weakly supervised learning driven by image attributes is performed to denoise the pseudo- labeled instances, thereby highlighting the unique strength of individual feature modality. Meanwhile, such learning can yield a few anchors in graphs that vitally enable the alignment and fusion of multiple graphs. As a result, an edge weight matrix learned from the fused graph automatically gives the ordering to the initially retrieved results. We evaluate our approach on four benchmark image retrieval datasets, demonstrating a significant performance gain over the state-of-the-arts.

6 0.11637604 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

7 0.10780986 379 iccv-2013-Semantic Segmentation without Annotating Segments

8 0.10399299 166 iccv-2013-Finding Actors and Actions in Movies

9 0.10361125 380 iccv-2013-Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes

10 0.10226782 104 iccv-2013-Decomposing Bag of Words Histograms

11 0.096793525 26 iccv-2013-A Practical Transfer Learning Algorithm for Face Verification

12 0.092803717 338 iccv-2013-Randomized Ensemble Tracking

13 0.091283754 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization

14 0.091101006 111 iccv-2013-Detecting Dynamic Objects with Multi-view Background Subtraction

15 0.086120516 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps

16 0.084864184 160 iccv-2013-Fast Object Segmentation in Unconstrained Video

17 0.08402244 318 iccv-2013-PixelTrack: A Fast Adaptive Algorithm for Tracking Non-rigid Objects

18 0.082621865 440 iccv-2013-Video Event Understanding Using Natural Language Descriptions

19 0.082092054 233 iccv-2013-Latent Task Adaptation with Large-Scale Hierarchies

20 0.079752907 377 iccv-2013-Segmentation Driven Object Detection with Fisher Vectors


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.203), (1, 0.053), (2, 0.035), (3, -0.04), (4, 0.067), (5, -0.005), (6, -0.057), (7, 0.035), (8, -0.024), (9, -0.042), (10, 0.027), (11, -0.015), (12, -0.051), (13, -0.047), (14, -0.016), (15, -0.05), (16, -0.018), (17, 0.014), (18, -0.031), (19, -0.027), (20, -0.012), (21, -0.02), (22, -0.009), (23, -0.032), (24, 0.067), (25, 0.03), (26, 0.105), (27, -0.002), (28, 0.102), (29, 0.048), (30, 0.06), (31, 0.026), (32, -0.007), (33, 0.062), (34, -0.066), (35, 0.043), (36, -0.073), (37, 0.109), (38, -0.006), (39, 0.017), (40, 0.044), (41, 0.006), (42, -0.15), (43, -0.118), (44, -0.051), (45, -0.067), (46, 0.013), (47, -0.067), (48, 0.117), (49, -0.073)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91858208 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation

Author: Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang

Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.

2 0.84481829 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification

Author: Mandar Dixit, Nikhil Rasiwasia, Nuno Vasconcelos

Abstract: An extension of the latent Dirichlet allocation (LDA), denoted class-specific-simplex LDA (css-LDA), is proposed for image classification. An analysis of the supervised LDA models currently used for this task shows that the impact of class information on the topics discovered by these models is very weak in general. This implies that the discovered topics are driven by general image regularities, rather than the semantic regularities of interest for classification. To address this, we introduce a model that induces supervision in topic discovery, while retaining the original flexibility of LDA to account for unanticipated structures of interest. The proposed css-LDA is an LDA model with class supervision at the level of image features. In css-LDA topics are discovered per class, i.e. a single set of topics shared across classes is replaced by multiple class-specific topic sets. This model can be used for generative classification using the Bayes decision rule or even extended to discriminative classification with support vector machines (SVMs). A css-LDA model can endow an image with a vector of class and topic specific count statistics that are similar to the Bag-of-words (BoW) histogram. SVM-based discriminants can be learned for classes in the space of these histograms. The effectiveness of css-LDA model in both generative and discriminative classification frameworks is demonstrated through an extensive experimental evaluation, involving multiple benchmark datasets, where it is shown to outperform all existing LDA based image classification approaches.

3 0.77233255 72 iccv-2013-Characterizing Layouts of Outdoor Scenes Using Spatial Topic Processes

Author: Dahua Lin, Jianxiong Xiao

Abstract: In this paper, we develop a generative model to describe the layouts of outdoor scenes: the spatial configuration of regions. Specifically, the layout of an image is represented as a composite of regions, each associated with a semantic topic. At the heart of this model is a novel stochastic process called Spatial Topic Process, which generates a spatial map of topics from a set of coupled Gaussian processes, thus allowing the distributions of topics to vary continuously across the image plane. A key aspect that distinguishes this model from previous ones consists in its capability of capturing dependencies across both locations and topics while allowing substantial variations in the layouts. We demonstrate the practical utility of the proposed model by testing it on scene classification, semantic segmentation, and layout hallucination.

4 0.6096195 386 iccv-2013-Sequential Bayesian Model Update under Structured Scene Prior for Semantic Road Scenes Labeling

Author: Evgeny Levinkov, Mario Fritz

Abstract: Semantic road labeling is a key component of systems that aim at assisted or even autonomous driving. Considering that such systems continuously operate in the realworld, unforeseen conditions not represented in any conceivable training procedure are likely to occur on a regular basis. In order to equip systems with the ability to cope with such situations, we would like to enable adaptation to such new situations and conditions at runtime. Existing adaptive methods for image labeling either require labeled data from the new condition or even operate globally on a complete test set. None of this is a desirable mode of operation for a system as described above where new images arrive sequentially and conditions may vary. We study the effect of changing test conditions on scene labeling methods based on a new diverse street scene dataset. We propose a novel approach that can operate in such conditions and is based on a sequential Bayesian model update in order to robustly integrate the arriving images into the adapting procedure.

5 0.59443724 208 iccv-2013-Image Co-segmentation via Consistent Functional Maps

Author: Fan Wang, Qixing Huang, Leonidas J. Guibas

Abstract: Joint segmentation of image sets has great importance for object recognition, image classification, and image retrieval. In this paper, we aim to jointly segment a set of images starting from a small number of labeled images or none at all. To allow the images to share segmentation information with each other, we build a network that contains segmented as well as unsegmented images, and extract functional maps between connected image pairs based on image appearance features. These functional maps act as general property transporters between the images and, in particular, are used to transfer segmentations. We define and operate in a reduced functional space optimized so that the functional maps approximately satisfy cycle-consistency under composition in the network. A joint optimization framework is proposed to simultaneously generate all segmentation functions over the images so that they both align with local segmentation cues in each particular image, and agree with each other under network transportation. This formulation allows us to extract segmentations even with no training data, but can also exploit such data when available. The collective effect of the joint processing using functional maps leads to accurate information sharing among images and yields superior segmentation results, as shown on the iCoseg, MSRC, and PASCAL data sets.

6 0.58546954 248 iccv-2013-Learning to Rank Using Privileged Information

7 0.57088339 443 iccv-2013-Video Synopsis by Heterogeneous Multi-source Correlation

8 0.56864649 178 iccv-2013-From Semi-supervised to Transfer Counting of Crowds

9 0.5519895 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model

10 0.54668647 179 iccv-2013-From Subcategories to Visual Composites: A Multi-level Framework for Object Detection

11 0.53713298 290 iccv-2013-New Graph Structured Sparsity Model for Multi-label Image Annotations

12 0.53685141 215 iccv-2013-Incorporating Cloud Distribution in Sky Representation

13 0.51667291 126 iccv-2013-Dynamic Label Propagation for Semi-supervised Multi-class Multi-label Classification

14 0.51195961 285 iccv-2013-NEIL: Extracting Visual Knowledge from Web Data

15 0.51062763 426 iccv-2013-Training Deformable Part Models with Decorrelated Features

16 0.50720161 411 iccv-2013-Symbiotic Segmentation and Part Localization for Fine-Grained Categorization

17 0.50694335 326 iccv-2013-Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation

18 0.50669861 198 iccv-2013-Hierarchical Part Matching for Fine-Grained Visual Categorization

19 0.50534946 327 iccv-2013-Predicting an Object Location Using a Global Image Representation

20 0.50270462 63 iccv-2013-Bounded Labeling Function for Global Segmentation of Multi-part Objects with Geometric Constraints


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.082), (7, 0.036), (12, 0.018), (13, 0.021), (26, 0.075), (31, 0.084), (35, 0.013), (42, 0.094), (61, 0.182), (64, 0.075), (73, 0.03), (78, 0.024), (89, 0.153), (98, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.82849097 194 iccv-2013-Heterogeneous Image Features Integration via Multi-modal Semi-supervised Learning Model

Author: Xiao Cai, Feiping Nie, Weidong Cai, Heng Huang

Abstract: Automatic image categorization has become increasingly important with the development of Internet and the growth in the size of image databases. Although the image categorization can be formulated as a typical multiclass classification problem, two major challenges have been raised by the real-world images. On one hand, though using more labeled training data may improve the prediction performance, obtaining the image labels is a time consuming as well as biased process. On the other hand, more and more visual descriptors have been proposed to describe objects and scenes appearing in images and different features describe different aspects of the visual characteristics. Therefore, how to integrate heterogeneous visual features to do the semi-supervised learning is crucial for categorizing large-scale image data. In this paper, we propose a novel approach to integrate heterogeneous features by performing multi-modal semi-supervised classification on unlabeled as well as unsegmented images. Considering each type of feature as one modality, taking advantage of the large amoun- t of unlabeled data information, our new adaptive multimodal semi-supervised classification (AMMSS) algorithm learns a commonly shared class indicator matrix and the weights for different modalities (image features) simultaneously.

same-paper 2 0.82662249 59 iccv-2013-Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation

Author: Zhiyuan Shi, Timothy M. Hospedales, Tao Xiang

Abstract: We address the problem of localisation of objects as bounding boxes in images with weak labels. This weakly supervised object localisation problem has been tackled in the past using discriminative models where each object class is localised independently from other classes. We propose a novel framework based on Bayesian joint topic modelling. Our framework has three distinctive advantages over previous works: (1) All object classes and image backgrounds are modelled jointly together in a single generative model so that “explaining away” inference can resolve ambiguity and lead to better learning and localisation. (2) The Bayesian formulation of the model enables easy integration of prior knowledge about object appearance to compensate for limited supervision. (3) Our model can be learned with a mixture of weakly labelled and unlabelled data, allowing the large volume of unlabelled images on the Internet to be exploited for learning. Extensive experiments on the challenging VOC dataset demonstrate that our approach outperforms the state-of-the-art competitors.

3 0.77157772 180 iccv-2013-From Where and How to What We See

Author: S. Karthikeyan, Vignesh Jagadeesh, Renuka Shenoy, Miguel Eckstein, B.S. Manjunath

Abstract: Eye movement studies have confirmed that overt attention is highly biased towards faces and text regions in images. In this paper we explore a novel problem of predicting face and text regions in images using eye tracking data from multiple subjects. The problem is challenging as we aim to predict the semantics (face/text/background) only from eye tracking data without utilizing any image information. The proposed algorithm spatially clusters eye tracking data obtained in an image into different coherent groups and subsequently models the likelihood of the clusters containing faces and text using a fully connected Markov Random Field (MRF). Given the eye tracking data from a test image, it predicts potential face/head (humans, dogs and cats) and text locations reliably. Furthermore, the approach can be used to select regions of interest for further analysis by object detectors for faces and text. The hybrid eye position/object detector approach achieves better detection performance and reduced computation time compared to using only the object detection algorithm. We also present a new eye tracking dataset on 300 images selected from ICDAR, Street-view, Flickr and Oxford-IIIT Pet Dataset from 15 subjects.

4 0.76874083 78 iccv-2013-Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations

Author: Manjunath Narayana, Allen Hanson, Erik Learned-Miller

Abstract: In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.

5 0.76767182 137 iccv-2013-Efficient Salient Region Detection with Soft Image Abstraction

Author: Ming-Ming Cheng, Jonathan Warrell, Wen-Yan Lin, Shuai Zheng, Vibhav Vineet, Nigel Crook

Abstract: Detecting visually salient regions in images is one of the fundamental problems in computer vision. We propose a novel method to decompose an image into large scale perceptually homogeneous elements for efficient salient region detection, using a soft image abstraction representation. By considering both appearance similarity and spatial distribution of image pixels, the proposed representation abstracts out unnecessary image details, allowing the assignment of comparable saliency values across similar regions, and producing perceptually accurate salient region detection. We evaluate our salient region detection approach on the largest publicly available dataset with pixel accurate annotations. The experimental results show that the proposed method outperforms 18 alternate methods, reducing the mean absolute error by 25.2% compared to the previous best result, while being computationally more efficient.

6 0.76639903 376 iccv-2013-Scene Text Localization and Recognition with Oriented Stroke Detection

7 0.76267087 131 iccv-2013-EVSAC: Accelerating Hypotheses Generation by Modeling Matching Scores with Extreme Value Theory

8 0.76152849 73 iccv-2013-Class-Specific Simplex-Latent Dirichlet Allocation for Image Classification

9 0.75858772 328 iccv-2013-Probabilistic Elastic Part Model for Unsupervised Face Detector Adaptation

10 0.75823933 253 iccv-2013-Linear Sequence Discriminant Analysis: A Model-Based Dimensionality Reduction Method for Vector Sequences

11 0.75809604 315 iccv-2013-PhotoOCR: Reading Text in Uncontrolled Conditions

12 0.75792646 38 iccv-2013-Action Recognition with Actons

13 0.75659788 340 iccv-2013-Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests

14 0.75625455 349 iccv-2013-Regionlets for Generic Object Detection

15 0.75361741 188 iccv-2013-Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps

16 0.75298548 173 iccv-2013-Fluttering Pattern Generation Using Modified Legendre Sequence for Coded Exposure Imaging

17 0.7526952 338 iccv-2013-Randomized Ensemble Tracking

18 0.75200963 22 iccv-2013-A New Adaptive Segmental Matching Measure for Human Activity Recognition

19 0.75047505 260 iccv-2013-Manipulation Pattern Discovery: A Nonparametric Bayesian Approach

20 0.75041837 95 iccv-2013-Cosegmentation and Cosketch by Unsupervised Learning