nips nips2009 nips2009-5 knowledge-graph by maker-knowledge-mining

5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation


Source: pdf

Author: Lan Du, Lu Ren, Lawrence Carin, David B. Dunson

Abstract: A non-parametric Bayesian model is proposed for processing multiple images. The analysis employs image features and, when present, the words associated with accompanying annotations. The model clusters the images into classes, and each image is segmented into a set of objects, also allowing the opportunity to assign a word to each object (localized labeling). Each object is assumed to be represented as a heterogeneous mix of components, with this realized via mixture models linking image features to object types. The number of image classes, number of object types, and the characteristics of the object-feature mixture models are inferred nonparametrically. To constitute spatially contiguous objects, a new logistic stick-breaking process is developed. Inference is performed efficiently via variational Bayesian analysis, with example results presented on two image databases.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The analysis employs image features and, when present, the words associated with accompanying annotations. [sent-7, score-0.352]

2 The model clusters the images into classes, and each image is segmented into a set of objects, also allowing the opportunity to assign a word to each object (localized labeling). [sent-8, score-0.649]

3 Each object is assumed to be represented as a heterogeneous mix of components, with this realized via mixture models linking image features to object types. [sent-9, score-0.572]

4 The number of image classes, number of object types, and the characteristics of the object-feature mixture models are inferred nonparametrically. [sent-10, score-0.445]

5 Inference is performed efficiently via variational Bayesian analysis, with example results presented on two image databases. [sent-12, score-0.226]

6 1 Introduction There has recently been much interest in developing statistical models for analyzing and organizing images, based on image features and, when available, auxiliary information, such as words (e. [sent-13, score-0.284]

7 Three important aspects of this problem are: (i) sorting multiple images into scene-level classes, (ii) image annotation, and (iii) segmenting and labeling localized objects within images. [sent-16, score-0.56]

8 Although good classification performance was achieved using this approach, the model is employed in a supervised manner, utilizing scene-labeled images for scene classification. [sent-24, score-0.262]

9 Nevertheless, to improve performance, in [16] some images are required for supervised learning, based on the segmented and labeled objects obtained via the method proposed in [10], with these used to initialize the algorithm. [sent-26, score-0.393]

10 The four main contributions of this paper are: • Each object in an image is represented as a mixture of image-feature model parameters, accounting for the heterogeneous character of individual objects. [sent-29, score-0.427]

11 By contrast, each object is only associated with one image-feature component/atom in the Corr-LDA-like models [6, 16, 23]. [sent-31, score-0.207]

12 • Multiple images are processed jointly; all, none or a subset of the images may be annotated. [sent-32, score-0.363]

13 The model infers the linkage between image-feature parameters and object types, with this linkage used to yield localized labeling of objects within all images. [sent-33, score-0.509]

14 • A novel logistic stick-breaking process (LSBP) is proposed, imposing the belief that proximate portions of an image are more likely to reside within the same segment (object). [sent-35, score-0.347]

15 This spatially constrained prior yields contiguous objects with sharp boundaries, and via the aforementioned mixture models the segmented objects may be composed of heterogeneous building blocks. [sent-36, score-0.61]

16 The number of image classes, number of object types, number of image-feature mixture components per object, and the linkage between words and image model parameters are inferred nonparametrically. [sent-38, score-0.807]

17 1 Bag of image features We jointly process data from M images, and each image is assumed to come from an associated class type (e. [sent-40, score-0.464]

18 The class type associated with image m is denoted by zm ∈ {1, . [sent-44, score-0.395]

, I}, and it is drawn from the mixture model $z_m \sim \sum_{i=1}^{I} u_i \delta_i$, $u \sim \mathrm{Stick}_I(\alpha_u)$ (1), where $\mathrm{Stick}_I(\alpha_u)$ is a stick-breaking process [13] that is truncated to I sticks, with hyper-parameter $\alpha_u > 0$. [sent-47, score-0.212]

20 The symbol δi represents a unit measure at the integer i, and the parameter ui denotes the probability that image type i will be observed across the M images. [sent-48, score-0.214]
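The truncated stick-breaking draw in (1) can be sketched as follows. This is a minimal illustration, assuming the common construction in which break proportions are Beta(1, α) and the last stick takes the remaining mass; the function name and the values of α, I and M are hypothetical, and indices are 0-based rather than 1-based.

```python
import numpy as np

def truncated_stick_breaking(alpha, n_sticks, rng):
    """Draw weights u ~ Stick_I(alpha), truncated to n_sticks sticks."""
    # Assumption: Beta(1, alpha) break proportions (standard construction).
    betas = rng.beta(1.0, alpha, size=n_sticks)
    betas[-1] = 1.0  # the final stick absorbs whatever mass is left
    # remaining[i] = prod_{j < i} (1 - betas[j]), the stick length left before break i
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

rng = np.random.default_rng(0)
u = truncated_stick_breaking(alpha=1.0, n_sticks=10, rng=rng)
# Class indicators z_m for M = 5 images, drawn from sum_i u_i * delta_i.
z = rng.choice(len(u), size=5, p=u)
```

Smaller values of α concentrate the mass on the first few sticks, which is how such a prior can favor a small number of occupied image classes.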

21 The observed data are image feature vectors, each tied to a local region in the image (for example, associated with an over-segmented portion of the image). [sent-49, score-0.432]

The $L_m$ observed image feature vectors associated with image m are $\{x_{ml}\}_{l=1}^{L_m}$, and the lth feature vector is assumed drawn $x_{ml} \sim F(\theta_{ml})$. [sent-50, score-0.714]

23 Each image is assumed to be composed of a set of latent objects. [sent-52, score-0.234]

An indicator variable $\zeta_{ml}$ defines which object type the lth feature vector from image m is associated with, and it is drawn $\zeta_{ml} \sim \sum_{k=1}^{K} w_{z_m k} \delta_k$, $w_i \sim \mathrm{Stick}_K(\alpha_w)$ (2), where index k corresponds to the kth type of object that may reside within an image. [sent-53, score-0.88]

25 The vector wi defines the probability that each of the K object types will occur, conditioned on the image type i ∈ {1, . [sent-54, score-0.385]

26 , I}; the kth component of w zm , wzm k , denotes the probability of observing object type k in image m, when image m was drawn from class zm ∈ {1, . [sent-57, score-0.915]

The image class $z_m$ and corresponding objects $\{\zeta_{ml}\}_{l=1}^{L_m}$ associated with image m are latent variables. [sent-61, score-0.736]

28 Specifically, a separate such mixture model is manifested for each of the K object types, motivated by the idea that each object will in general be composed of a different set of image-feature building blocks. [sent-63, score-0.452]

2 Bag of clustered image features While the model described above is straightforward to understand, it has been found to be ineffective. [sent-69, score-0.209]

from $\sum_{k=1}^{K} w_{z_m k} \delta_k$, and therefore there is nothing in the model that encourages the image features, $x_{ml}$ and $x_{ml'}$, which are associated with the same image-feature atom $\theta^*$, to be assigned to the same object k. [sent-73, score-0.737]

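Specifically, consider the following augmented model: $x_{ml} \sim F(\theta_{ml})$, $\theta_{ml} \sim G_{c_{ml}}$, $c_{ml} \sim \sum_{t=1}^{T} v_{mt} \delta_{\zeta_{mt}}$, $\zeta_{mt} \sim \sum_{k=1}^{K} w_{z_m k} \delta_k$, $z_m \sim \sum_{i=1}^{I} u_i \delta_i$ (4), where $v_m \sim \mathrm{Stick}_T(\alpha_v)$, and $G_k$ is as defined in (3). [sent-75, score-0.861]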

32 3 Linking words with images In the above discussion it was assumed that the only observed data are the image feature vectors {xml }Lm . [sent-78, score-0.448]

33 In this setting we assume that we have a K-dimensional dictionary of words associated with objects in images, and a word is assigned to each of the objects k ∈ {1, . [sent-80, score-0.495]

34 Of the collection of M images, some may be annotated and some not, and all will be processed simultaneously by the joint model; in so doing, annotations will be inferred for the originally non-annotated images. [sent-84, score-0.345]

35 For an image for which no annotation is given, the image is assumed generated via (4). [sent-85, score-0.511]

If image m is in class $z_m$, then we simply set $y_m \sim \mathrm{Mult}(w_{z_m})$, $w_i \sim \mathrm{Stick}_K(\alpha_w)$ (5). Namely, $\phi_m = w_{z_m}$, recalling that $w_i$ defines the probability of observing each object type for image class i. [sent-87, score-0.825]

37 1 Logistic stick-breaking process (LSBP) In (5), note that once the image class zm is drawn for image m, the order/location of the xml within the image may be interchanged, and nothing in the generative process will change. [sent-93, score-0.873]

This is because the indicator variable $c_{ml}$, which defines the object class associated with feature vector l in image m, is drawn i. [sent-94, score-0.611]

39 With each feature vector xml there is an associated spatial location, which we denote sml (this is a two-dimensional vector). [sent-99, score-0.399]

We wish to draw $c_{ml} \sim \sum_{t=1}^{T} v_{mt}(s_{ml}) \delta_{\zeta_{mt}}$, $\zeta_{mt} \sim \sum_{k=1}^{K} w_{z_m k} \delta_k$ (6), where the cluster probabilities $v_{mt}(s_{ml})$ are now a function of position $s_{ml}$ (the $\zeta_{mt}$ ∈ {1, . [sent-100, score-0.897]

41 The challenge, therefore, becomes development of a means of constructing vmt (s) to encourage nearby feature vectors to come from the same object type. [sent-104, score-0.379]

, T − 1 we impose $v_{mt}(s) = \sigma[g_{mt}(s)] \prod_{\tau=1}^{t-1} \{1 - \sigma[g_{m\tau}(s)]\}$ (7) [sent-109, score-0.48]

where $v_{mT}(s) = 1 - \sum_{t=1}^{T-1} v_{mt}(s)$. We define $g_{mt}(s) = \sum_{l=1}^{L_m} W^{(m)}_{tl} K(s, s_{ml}) + W^{(m)}_{t0}$, where $K(s, s_{ml})$ is a kernel, and here we utilize the radial basis function kernel $K(s, s_{ml}) = \exp[-\|s - s_{ml}\|^2 / \varphi_{mt}]$. [sent-110, score-0.63]
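The construction in (7) with the radial basis function kernel can be sketched numerically as below. This is an illustrative sketch for a single image (the image index m is dropped); the array names `W`, `W0`, `phi` and the random values used to exercise the function are assumptions, not quantities from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbf_kernel(s, centers, phi):
    """K(s, s_l) = exp(-||s - s_l||^2 / phi); s: (N, 2), centers: (L, 2)."""
    sq_dist = ((s[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / phi)

def lsbp_probabilities(s, centers, W, W0, phi):
    """Return v of shape (N, T), where v[n, t] is the stick weight v_t(s_n).

    W: (T-1, L) kernel weights, W0: (T-1,) biases, phi: (T-1,) kernel widths.
    """
    n_layers = W.shape[0] + 1
    v = np.zeros((s.shape[0], n_layers))
    remaining = np.ones(s.shape[0])  # prod_{tau < t} (1 - sigma[g_tau(s)])
    for t in range(n_layers - 1):
        g = rbf_kernel(s, centers, phi[t]) @ W[t] + W0[t]
        p = sigmoid(g)
        v[:, t] = p * remaining
        remaining = remaining * (1.0 - p)
    v[:, -1] = remaining  # v_T(s) = 1 - sum_{t < T} v_t(s)
    return v

rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 1.0, size=(6, 2))   # hypothetical superpixel centers
W = rng.normal(size=(3, 6))                     # T - 1 = 3 layers of weights
W0 = rng.normal(size=3)
phi = np.full(3, 0.1)
v = lsbp_probabilities(centers, centers, W, W0, phi)
```

Each row of `v` is a probability vector over the T layers, and rows for nearby locations are similar because the kernel responses vary smoothly with position.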

We desire that a given stick $v_{mt}(s)$ has importance (at most) over a localized region, and therefore we impose sparseness priors on parameters $\{W^{(m)}_{tl}\}^{L_m}$. [sent-113, score-0.316]

For notational convenience, $c_{ml} \sim \sum_{t=1}^{T} v_{mt}(s_{ml}) \delta_{\zeta_{mt}}$ and $\zeta_{mt} \sim \sum_{k=1}^{K} w_{z_m k} \delta_k$ constructed as above is represented as a draw from $\mathrm{LSBP}_T(w_{z_m})$. [sent-117, score-0.528]

46 A particular non-zero Wtl is (via the kernel) associated with the lth local spatial region, with spatial extent defined by φmt . [sent-126, score-0.188]

47 All locations s for which (roughly) gmt (s) ≥ 4 will have – via the “clipping” manifested via the logistic – nearly the same high probability of being associated with model layer t. [sent-129, score-0.275]
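The saturation ("clipping") of the logistic function mentioned here can be checked with a few lines; the sample values of g are chosen only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Once g reaches about 4 the logistic output is already close to 1,
# so larger values of g change the probability very little.
values = {g: sigmoid(g) for g in (0.0, 2.0, 4.0, 8.0)}
for g, p in values.items():
    print(f"g = {g:3.1f}  sigma(g) = {p:.4f}")
```

At g = 4 the output is roughly 0.98, and it rises by less than 0.02 between g = 4 and g = 8, which is why all such locations receive nearly the same probability.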

2 Processing images with no words given If one is given M images, all non-annotated, then the model may be employed on the data $\{x_{ml}\}_{l=1}^{L_m}$, for m = 1, . [sent-140, score-0.293]

, M , from which a posterior distribution is inferred on the image model parameters $\{\theta_j^*\}_{j=1}^{J}$, and on $\{G_k\}_{k=1}^{K}$. [sent-143, score-0.32]

Note that properties of the image classes and of the objects within images are inferred by processing all M images jointly. [sent-144, score-0.787]

51 By placing all images within the context of each other, the model is able to infer which building blocks (classes and objects) are responsible for all of the data. [sent-145, score-0.304]

52 In this sense the simultaneous processing of multiple images is critical: the learning of properties of objects in one image is aided by the properties being learned for objects in all other images, through the inference of inter-relationships and commonalities. [sent-146, score-0.624]

53 After the M images are analyzed in the absence of annotations, one may observe example portions of the M images, to infer the link between actual object characteristics within imagery and the associated latent object indicator to which it was assigned. [sent-147, score-0.688]

54 With this linkage made, one may assign words to all or a subset of the K object types. [sent-148, score-0.292]

55 After words are assigned to previously latent object types, the results of the analysis (with no additional processing) may be used to automatically label regions (objects) in all of the images. [sent-149, score-0.293]

56 This is manifested because each of the cluster indicators cml is associated with a latent localized object type (to which a word may now be assigned). [sent-150, score-0.596]

57 3 Joint processing of images and annotations We may consider problems for which a subset of the images are provided with annotations (but not the explicit location and segmented-out objects); the words are assumed to reside in a prescribed dictionary of object types. [sent-152, score-0.939]

58 The generation of the annotations (and images) is constituted via the model in (5), with the LSBP employed as discussed. [sent-153, score-0.229]

59 We do not require that all images are annotated (the non-annotated images help learn the properties of the image features, and are therefore useful even if they do not provide information about the words). [sent-154, score-0.574]

60 The presence of the same word within the annotations of multiple images encourages the model to infer what objects (represented in terms of image features) are common to the associated images, aiding the learning. [sent-156, score-0.891]

61 Hence, the presence of annotations serves as a learning aid (encourages looking for commonalities between particular images, if words are shared in the associated annotations). [sent-157, score-0.338]

62 Further, the annotations associated with images may disambiguate objects that appear similar in image-feature space (because they will have different annotations). [sent-158, score-0.539]

63 From the above discussion, the model performance will improve as more images are annotated with each word, but presumably this annotation is much easier for the human than requiring one to segment out and localize words within a scene. [sent-159, score-0.559]

64 For the MSRC dataset, 10 categories of images with manual annotations are selected: “tree”, “building”, “cow”, “face”, “car”, “sheep”, “flower”, “sign”, “book” and “chair”. [sent-166, score-0.332]

65 The number of images in the “cow” class is 45, and in the “sheep” class there are 35; there are 30 images in all other classes. [sent-167, score-0.328]

66 From each category, we randomly choose 10 images, and remove the annotations, treating these as non-annotated images within the analysis (to allow quantification of inferred-annotation quality). [sent-168, score-0.192]

67 The following analysis, in which annotated and non-annotated images are processed jointly, is executed as discussed in Section 4. [sent-174, score-0.263]

68 Here we randomly choose 25 images for each category, and each image is resized to a dimension of 240 × 320 or 320 × 240. [sent-177, score-0.346]

After performing this analysis, and upon examining the properties of segmented data associated with each (latent) object class on a small subset of the data, we can infer words associated with some important Gk , and then label portions (objects) within each image via the inferred words. [sent-180, score-0.817]

70 1 Image preprocessing Each image is first segmented into 800 “superpixels”, which are local, coherent and preserve most of the structure necessary for segmentation at the scale of interest [19]. [sent-187, score-0.343]

71 We discretize these features using a codebook of size 64 (other codebook sizes gave similar performance), and then calculate the distribution [1] for each feature within each superpixel as visual words [3, 6, 10, 11, 20, 23, 24]. [sent-198, score-0.194]
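The discretization step can be sketched as nearest-codeword assignment followed by a per-superpixel histogram. This is a minimal illustration under stated assumptions: the source does not say how the codebook is learned or which distance is used, so Euclidean distance and a random codebook are used here purely as placeholders, and all function and variable names are hypothetical.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Index of the nearest codeword (Euclidean, an assumption) per descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def superpixel_histograms(word_ids, superpixel_ids, n_superpixels, n_words):
    """Normalized visual-word histogram for every superpixel."""
    counts = np.zeros((n_superpixels, n_words))
    # Unbuffered accumulation, so repeated (superpixel, word) pairs are all counted.
    np.add.at(counts, (superpixel_ids, word_ids), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(totals, 1.0)  # empty superpixels stay all-zero

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 8))      # placeholder local features
codebook = rng.normal(size=(64, 8))          # codebook of size 64, as in the text
superpixel_ids = rng.integers(0, 20, size=500)
word_ids = quantize(descriptors, codebook)
hist = superpixel_histograms(word_ids, superpixel_ids, n_superpixels=20, n_words=64)
```

Each row of `hist` is then the empirical distribution of visual words inside one superpixel.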

The center of each superpixel is recorded as the location coordinate sml . [sent-201, score-0.193]

73 In addition, based on the learned posterior word distribution wi for each image class i, we can further infer which words/objects are probable for each scene class. [sent-215, score-0.4]

74 Although not shown here for brevity, the analysis on UIUC features correctly inferred the 8 image classes associated with that data (without using annotations). [sent-217, score-0.36]

75 By examining the words and segmented objects extracted with high probability as represented by wi , we may also assign names to each of the 18 image classes across both the MSRC and UIUC data, consistent with the associated class labels provided with the data. [sent-218, score-0.645]

76 , M } we also have a posterior distribution on the associated class indicator zm . [sent-222, score-0.25]

77 We approximate the membership for each image by assigning it to the mixture with largest probability. [sent-223, score-0.228]

78 This “hard” decision is employed to provide scene-level label for each image (the Bayesian analysis can also yield a “soft” decision in terms of a full posterior distribution). [sent-224, score-0.215]
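The difference between the "hard" and "soft" decisions can be illustrated with a toy posterior over three classes for three images; the numbers are made up for illustration.

```python
import numpy as np

# Rows: images, columns: posterior probability of each image class ("soft" decision).
posterior = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.40, 0.45, 0.15],
])

# "Hard" decision: assign each image to the class with largest posterior probability.
hard_labels = posterior.argmax(axis=1)  # → [0, 2, 1]
```

The third image shows what the hard decision discards: its two leading classes are nearly tied, which only the full posterior reveals.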

79 2, and employing results from the processed nonannotated UIUC-Sport data, we examined the properties of segmented data associated with each (latent) object type. [sent-229, score-0.364]

80 We inferred the presence of 12 unique objects, and these objects were assigned the following words: “human”, “horse”, “grass”, “sky”, “tree”, “ground”,“water”, “rock”, “court”, “boat”, “sailboat” and “snow”. [sent-230, score-0.217]

81 Using these words, we annotated each image and re-trained our model in the presence of annotations. [sent-231, score-0.273]

82 The improvement in performance, relative to processing the images without annotations, is attributed to the ability of words to disambiguate distinct objects that have similar properties in image-feature space (e. [sent-235, score-0.405]

Figure 2: Example inferred latent properties associated with the MSRC dataset. [sent-250, score-0.198]

84 Middle and Right: Example probability of objects for a given class, w i (probability of object/words); here we only give the top 5 words for each class. [sent-252, score-0.241]

Figure 3: Comparisons using confusion matrices for all images in each dataset (all of the annotated and nonannotated images in MSRC; all the non-annotated images in UIUC-Sport). [sent-585, score-0.588]

86 3 Image annotation The proposed model infers a posterior distribution for the indicator variables cml (defining the object/word for super-pixel l in image m). [sent-594, score-0.612]

87 Similar to the “hard” image-class assignment discussed above, a “hard” segmentation is employed here to provide object labels for each super-pixel. [sent-595, score-0.21]

88 For the MSRC images for which annotations were held out, we evaluate whether the words associated with objects in a given image were given in the associated annotation (thus, our annotation is defined by the words we have assigned to objects in an image). [sent-596, score-1.426]
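One way to score such held-out annotations is a per-word precision and recall, sketched below. The exact definition used in the paper is not given in this excerpt, so the per-word formulation, the function name, and the toy word sets are all assumptions made for illustration.

```python
def per_word_precision_recall(predicted, truth, word):
    """predicted, truth: lists of sets of words, one set per image."""
    # Images where the word was both predicted and present in the held-out annotation.
    tp = sum(1 for p, t in zip(predicted, truth) if word in p and word in t)
    n_pred = sum(1 for p in predicted if word in p)
    n_true = sum(1 for t in truth if word in t)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    return precision, recall

# Toy example: words assigned to segmented objects vs. held-out annotations.
predicted = [{"cow", "grass"}, {"sheep", "grass"}, {"cow", "tree"}]
truth = [{"cow", "grass"}, {"cow", "grass"}, {"tree", "sky"}]
prec, rec = per_word_precision_recall(predicted, truth, "cow")
```

In the toy data "cow" is predicted for two images and truly present in two, with one image in common, so both precision and recall are 0.5.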

89 Table 1: Comparison of precision and recall values for annotation and segmentation with Corr-LDA [6], our model without LSBP (Simp. [sent-597, score-0.245]

The left part of Table 1 lists detailed annotation results for five objects, as well as the overall scores from all object classes for the MSRC data. [sent-734, score-0.318]

91 For example, for complicated objects the Corr-LDA segmentation results are very sensitive to the feature variance, and an object is generally segmented into many small, detailed parts. [sent-739, score-0.439]

The names of the original images are inferred by scene-level classification via our model. [sent-748, score-0.242]

One VB run of our model with LSBP, for 70 VB iterations, required nearly 7 hours for 320 images from the MSRC dataset. [sent-756, score-0.191]

6 Conclusions A nonparametric Bayesian model has been developed for clustering M images into classes; the images are represented as an aggregation of distinct localized objects, to which words may be assigned. [sent-761, score-0.504]

95 To infer the relationships between image objects and words (labels), we only need to make the association between inferred model parameters and words. [sent-762, score-0.563]

This may be done as a post-processing step if no words are provided, and it may be done in situ if all or a subset of the M images are annotated. [sent-763, score-0.266]

97 Spatially contiguous objects are realized via a new logistic stick-breaking process. [sent-764, score-0.294]

98 Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes. [sent-838, score-0.289]

Towards total scene understanding: classification, annotation and segmentation in an automatic framework. [sent-878, score-0.289]

100 Multi-modal hierarchical Dirichlet process model for predicting image annotation and image-object label correspondence. [sent-937, score-0.356]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lsbp', 0.545), ('msrc', 0.256), ('vmt', 0.24), ('image', 0.182), ('annotations', 0.168), ('images', 0.164), ('cml', 0.16), ('ksbp', 0.16), ('xml', 0.16), ('annotation', 0.147), ('objects', 0.139), ('object', 0.139), ('sml', 0.129), ('wzm', 0.128), ('mt', 0.123), ('zm', 0.113), ('prec', 0.112), ('wtl', 0.112), ('words', 0.102), ('cow', 0.096), ('segmented', 0.09), ('rec', 0.09), ('gmt', 0.08), ('inferred', 0.078), ('lm', 0.078), ('sheep', 0.077), ('contiguous', 0.073), ('scene', 0.071), ('segmentation', 0.071), ('vb', 0.07), ('associated', 0.068), ('annotated', 0.064), ('rowing', 0.064), ('sailing', 0.064), ('superpixel', 0.064), ('ml', 0.06), ('mult', 0.057), ('chair', 0.057), ('latent', 0.052), ('linkage', 0.051), ('manifested', 0.051), ('building', 0.05), ('logistic', 0.049), ('tl', 0.048), ('bocce', 0.048), ('croquet', 0.048), ('polo', 0.048), ('ymk', 0.048), ('localized', 0.047), ('word', 0.047), ('mixture', 0.046), ('grass', 0.046), ('variational', 0.044), ('sky', 0.043), ('spatial', 0.042), ('uiuc', 0.042), ('void', 0.042), ('spatially', 0.04), ('car', 0.039), ('dir', 0.039), ('iccv', 0.037), ('book', 0.036), ('indicator', 0.036), ('blei', 0.036), ('lth', 0.036), ('tree', 0.036), ('infer', 0.035), ('processed', 0.035), ('reside', 0.034), ('constituted', 0.034), ('kernel', 0.034), ('dirichlet', 0.034), ('gk', 0.034), ('truncation', 0.034), ('posterior', 0.033), ('encourages', 0.033), ('realized', 0.033), ('heterogeneous', 0.033), ('classes', 0.032), ('boat', 0.032), ('gcml', 0.032), ('nonannotated', 0.032), ('rockc', 0.032), ('sailboat', 0.032), ('sticki', 0.032), ('stickk', 0.032), ('type', 0.032), ('wi', 0.032), ('stick', 0.029), ('contiguity', 0.028), ('clipping', 0.028), ('dunson', 0.028), ('superpixels', 0.028), ('within', 0.028), ('model', 0.027), ('segment', 0.027), ('infers', 0.027), ('objectives', 0.027), ('portions', 0.027), ('drawn', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

2 0.20246781 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model

Author: Tomoharu Iwata, Takeshi Yamada, Naonori Ueda

Abstract: We propose a probabilistic topic model for analyzing and extracting content-related annotations from noisy annotated discrete data such as web pages stored in social bookmarking services. In these services, since users can attach annotations freely, some annotations do not describe the semantics of the content, thus they are noisy, i.e. not content-related. The extraction of content-related annotations can be used as a preprocessing step in machine learning tasks such as text classification and image recognition, or can improve information retrieval performance. The proposed model is a generative model for content and annotations, in which the annotations are assumed to originate either from topics that generated the content or from a general distribution unrelated to the content. We demonstrate the effectiveness of the proposed method by using synthetic data and real social annotation data for text and images.

3 0.18105902 201 nips-2009-Region-based Segmentation and Object Detection

Author: Stephen Gould, Tianshi Gao, Daphne Koller

Abstract: Object detection and multi-class image segmentation are two closely related tasks that can be greatly improved when solved jointly by feeding information from one task to the other [10, 11]. However, current state-of-the-art models use a separate representation for each task making joint inference clumsy and leaving the classification of many parts of the scene ambiguous. In this work, we propose a hierarchical region-based approach to joint object detection and image segmentation. Our approach simultaneously reasons about pixels, regions and objects in a coherent probabilistic model. Pixel appearance features allow us to perform well on classifying amorphous background classes, while the explicit representation of regions facilitate the computation of more sophisticated features necessary for object detection. Importantly, our model gives a single unified description of the scene—we explain every pixel in the image and enforce global consistency between all random variables in our model. We run experiments on the challenging Street Scene dataset [2] and show significant improvement over state-of-the-art results for object detection accuracy.

4 0.17947574 211 nips-2009-Segmenting Scenes by Matching Image Composites

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

5 0.14254738 96 nips-2009-Filtering Abstract Senses From Image Search Results

Author: Kate Saenko, Trevor Darrell

Abstract: We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name and train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects.

6 0.13745958 133 nips-2009-Learning models of object structure

7 0.10668937 175 nips-2009-Occlusive Components Analysis

8 0.097601339 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

9 0.090731986 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

10 0.085019998 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

11 0.084858291 236 nips-2009-Structured output regression for detection with partial truncation

12 0.084841602 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations

13 0.084039129 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

14 0.08311817 104 nips-2009-Group Sparse Coding

15 0.083000861 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

16 0.081513017 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

17 0.079748534 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process

18 0.076057121 260 nips-2009-Zero-shot Learning with Semantic Output Codes

19 0.07391572 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

20 0.073440664 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.204), (1, -0.179), (2, -0.208), (3, -0.108), (4, 0.025), (5, 0.093), (6, -0.012), (7, 0.04), (8, 0.139), (9, 0.064), (10, -0.01), (11, -0.014), (12, 0.087), (13, -0.002), (14, -0.041), (15, 0.017), (16, -0.075), (17, -0.02), (18, 0.088), (19, 0.017), (20, 0.042), (21, 0.006), (22, -0.053), (23, 0.005), (24, 0.006), (25, -0.063), (26, -0.028), (27, 0.023), (28, 0.026), (29, -0.022), (30, -0.013), (31, -0.002), (32, -0.009), (33, -0.028), (34, -0.011), (35, 0.042), (36, -0.089), (37, 0.085), (38, 0.104), (39, 0.05), (40, 0.058), (41, -0.041), (42, -0.017), (43, 0.047), (44, 0.046), (45, 0.016), (46, 0.013), (47, -0.137), (48, 0.039), (49, 0.059)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94873983 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

2 0.75458175 96 nips-2009-Filtering Abstract Senses From Image Search Results

Author: Kate Saenko, Trevor Darrell

Abstract: We propose an unsupervised method that, given a word, automatically selects non-abstract senses of that word from an online ontology and generates images depicting the corresponding entities. When faced with the task of learning a visual model based only on the name of an object, a common approach is to find images on the web that are associated with the object name and train a visual classifier from the search result. As words are generally polysemous, this approach can lead to relatively noisy models if many examples due to outlier senses are added to the model. We argue that images associated with an abstract word sense should be excluded when training a visual classifier to learn a model of a physical object. While image clustering can group together visually coherent sets of returned images, it can be difficult to distinguish whether an image cluster relates to a desired object or to an abstract sense of the word. We propose a method that uses both image features and the text associated with the images to relate latent topics to particular senses. Our model does not require any human supervision, and takes as input only the name of an object category. We show results of retrieving concrete-sense images in two available multimodal, multi-sense databases, as well as experiment with object classifiers trained on concrete-sense images returned by our method for a set of ten common office objects.

3 0.73652244 211 nips-2009-Segmenting Scenes by Matching Image Composites

Author: Bryan Russell, Alyosha Efros, Josef Sivic, Bill Freeman, Andrew Zisserman

Abstract: In this paper, we investigate how, given an image, similar images sharing the same global description can help with unsupervised scene segmentation. In contrast to recent work in semantic alignment of scenes, we allow an input image to be explained by partial matches of similar scenes. This allows for a better explanation of the input scenes. We perform MRF-based segmentation that optimizes over matches, while respecting boundary information. The recovered segments are then used to re-query a large database of images to retrieve better matches for the target regions. We show improved performance in detecting the principal occluding and contact boundaries for the scene over previous methods on data gathered from the LabelMe database.

4 0.73557323 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

Author: Mario Fritz, Gary Bradski, Sergey Karayev, Trevor Darrell, Michael J. Black

Abstract: Existing methods for visual recognition based on quantized local features can perform poorly when local features exist on transparent surfaces, such as glass or plastic objects. There are characteristic patterns to the local appearance of transparent objects, but they may not be well captured by distances to individual examples or by a local pattern codebook obtained by vector quantization. The appearance of a transparent patch is determined in part by the refraction of a background pattern through a transparent medium: the energy from the background usually dominates the patch appearance. We model transparent local patch appearance using an additive model of latent factors: background factors due to scene content, and factors which capture a local edge energy distribution characteristic of the refraction. We implement our method using a novel LDA-SIFT formulation which performs LDA prior to any vector quantization step; we discover latent topics which are characteristic of particular transparent patches and quantize the SIFT space into transparent visual words according to the latent topic dimensions. No knowledge of the background scene is required at test time; we show examples recognizing transparent glasses in a domestic environment.

5 0.73383445 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images.

6 0.71318549 201 nips-2009-Region-based Segmentation and Object Detection

7 0.70952022 175 nips-2009-Occlusive Components Analysis

8 0.63121271 236 nips-2009-Structured output regression for detection with partial truncation

9 0.62304455 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships

10 0.59345895 153 nips-2009-Modeling Social Annotation Data with Content Relevance using a Topic Model

11 0.59174317 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

12 0.58492833 251 nips-2009-Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

13 0.55218124 258 nips-2009-Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise

14 0.53277367 6 nips-2009-A Biologically Plausible Model for Rapid Natural Scene Identification

15 0.51476753 93 nips-2009-Fast Image Deconvolution using Hyper-Laplacian Priors

16 0.50615901 85 nips-2009-Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model

17 0.50011653 83 nips-2009-Estimating image bases for visual image reconstruction from human brain activity

18 0.49702257 84 nips-2009-Evaluating multi-class learning strategies in a generative hierarchical framework for object detection

19 0.49692637 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

20 0.47800541 104 nips-2009-Group Sparse Coding


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(24, 0.035), (25, 0.096), (31, 0.011), (35, 0.056), (36, 0.089), (39, 0.072), (58, 0.051), (66, 0.012), (71, 0.079), (86, 0.073), (91, 0.014), (98, 0.298)]
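
The topic list above is a sparse vector: each pair gives a topic id and its weight for this paper, and topics that are not listed can be treated as having negligible weight. As a minimal sketch of how two such vectors could be compared, the Python below computes a cosine similarity between sparse topic vectors. The listing does not state which measure produced the simValue column, so cosine similarity is an assumption here, and the names `paper_topics`, `cosine_similarity`, and `dominant_topic` are illustrative rather than taken from the site.

```python
import math

# Sparse LDA topic vector copied from the listing above: (topicId, topicWeight).
paper_topics = [
    (24, 0.035), (25, 0.096), (31, 0.011), (35, 0.056),
    (36, 0.089), (39, 0.072), (58, 0.051), (66, 0.012),
    (71, 0.079), (86, 0.073), (91, 0.014), (98, 0.298),
]


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (id, weight) pairs.

    Assumption: ids missing from a vector have weight 0.
    """
    da, db = dict(a), dict(b)
    dot = sum(w * db.get(t, 0.0) for t, w in da.items())
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Topic carrying the largest weight for this paper.
dominant_topic = max(paper_topics, key=lambda pair: pair[1])[0]

# A vector compared with itself has cosine similarity 1 (up to rounding).
self_similarity = cosine_similarity(paper_topics, paper_topics)

# Vectors with no topic ids in common have similarity 0.
disjoint_similarity = cosine_similarity(paper_topics, [(1, 0.5), (2, 0.5)])
```

Under this measure a paper compared with itself scores 1, whereas the same-paper entry in the list below is reported at about 0.756, so the simValue column evidently comes from a different computation than the one sketched here.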

similar papers list:

simIndex simValue paperId paperTitle

1 0.82965541 206 nips-2009-Riffled Independence for Ranked Data

Author: Jonathan Huang, Carlos Guestrin

Abstract: Representing distributions over permutations can be a daunting task due to the fact that the number of permutations of n objects scales factorially in n. One recent way that has been used to reduce storage complexity has been to exploit probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for performing efficient inference and reducing sample complexity. In riffled independence, one draws two permutations independently, then performs the riffle shuffle, common in card games, to combine the two permutations to form a single permutation. In ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. We provide a formal introduction and present algorithms for using riffled independence within Fourier-theoretic frameworks which have been explored by a number of recent papers.

same-paper 2 0.75578302 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation

Author: Lan Du, Lu Ren, Lawrence Carin, David B. Dunson

Abstract: A non-parametric Bayesian model is proposed for processing multiple images. The analysis employs image features and, when present, the words associated with accompanying annotations. The model clusters the images into classes, and each image is segmented into a set of objects, also allowing the opportunity to assign a word to each object (localized labeling). Each object is assumed to be represented as a heterogeneous mix of components, with this realized via mixture models linking image features to object types. The number of image classes, number of object types, and the characteristics of the object-feature mixture models are inferred nonparametrically. To constitute spatially contiguous objects, a new logistic stick-breaking process is developed. Inference is performed efficiently via variational Bayesian analysis, with example results presented on two image databases.

3 0.72831267 25 nips-2009-Adaptive Design Optimization in Experiments with People

Author: Daniel Cavagnaro, Jay Myung, Mark A. Pitt

Abstract: In cognitive science, empirical data collected from participants are the arbiters in model selection. Model discrimination thus depends on designing maximally informative experiments. It has been shown that adaptive design optimization (ADO) allows one to discriminate models as efficiently as possible in simulation experiments. In this paper we use ADO in a series of experiments with people to discriminate the Power, Exponential, and Hyperbolic models of memory retention, which has been a long-standing problem in cognitive science, providing an ideal setting in which to test the application of ADO for addressing questions about human cognition. Using an optimality criterion based on mutual information, ADO is able to find designs that are maximally likely to increase our certainty about the true model upon observation of the experiment outcomes. Results demonstrate the usefulness of ADO and also reveal some challenges in its implementation.

4 0.64637172 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes

Author: Mladen Kolar, Le Song, Eric P. Xing

Abstract: To estimate the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, which includes learning trajectories of stock prices, or uncovering the topology of an evolving gene network. In this paper, we investigate sparsistent learning of a sub-family of this model: piecewise constant VCVS models. We analyze two main issues in this problem: inferring time points where structural changes occur and estimating model structure (i.e., model selection) on each of the constant segments. We propose a two-stage adaptive procedure, which first identifies jump points of structural changes and then identifies relevant covariates to a response on each of the segments. We provide an asymptotic analysis of the procedure, showing that with the increasing sample size, number of structural changes, and number of variables, the true model can be consistently selected. We demonstrate the performance of the method on synthetic data and apply it to the brain computer interface dataset. We also consider how this applies to structure estimation of time-varying probabilistic graphical models.

5 0.54383963 133 nips-2009-Learning models of object structure

Author: Joseph Schlecht, Kobus Barnard

Abstract: We present an approach for learning stochastic geometric models of object categories from single view images. We focus here on models expressible as a spatially contiguous assemblage of blocks. Model topologies are learned across groups of images, and one or more such topologies is linked to an object category (e.g. chairs). Fitting learned topologies to an image can be used to identify the object class, as well as detail its geometry. The latter goes beyond labeling objects, as it provides the geometric structure of particular instances. We learn the models using joint statistical inference over category parameters, camera parameters, and instance parameters. These produce an image likelihood through a statistical imaging model. We use trans-dimensional sampling to explore topology hypotheses, and alternate between Metropolis-Hastings and stochastic dynamics to explore instance parameters. Experiments on images of furniture objects such as tables and chairs suggest that this is an effective approach for learning models that encode simple representations of category geometry and the statistics thereof, and support inferring both category and geometry on held out single view images.

6 0.53978091 154 nips-2009-Modeling the spacing effect in sequential category learning

7 0.53848153 28 nips-2009-An Additive Latent Feature Model for Transparent Object Recognition

8 0.53816098 226 nips-2009-Spatial Normalized Gamma Processes

9 0.53630292 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

10 0.53499436 112 nips-2009-Human Rademacher Complexity

11 0.53389955 97 nips-2009-Free energy score space

12 0.53367734 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

13 0.53353959 260 nips-2009-Zero-shot Learning with Semantic Output Codes

14 0.53256017 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

15 0.53193533 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

16 0.53133231 211 nips-2009-Segmenting Scenes by Matching Image Composites

17 0.5292238 155 nips-2009-Modelling Relational Data using Bayesian Clustered Tensor Factorization

18 0.52909547 204 nips-2009-Replicated Softmax: an Undirected Topic Model

19 0.52869356 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

20 0.52840233 44 nips-2009-Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships