nips nips2008 nips2008-246 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kate Saenko, Trevor Darrell
Abstract: Polysemy is a problem for methods that exploit image search engines to build object category models. Existing unsupervised approaches do not take word sense into consideration. We propose a new method that uses a dictionary to learn models of visual word sense from a large collection of unlabeled web data. The use of LDA to discover a latent sense space makes the model robust despite the very limited nature of dictionary definitions. The definitions are used to learn a distribution in the latent space that best represents a sense. The algorithm then uses the text surrounding image links to retrieve images with high probability of a particular dictionary sense. An object classifier is trained on the resulting sense-specific images. We evaluate our method on a dataset obtained by searching the web for polysemous words. Category classification experiments show that our dictionary-based approach outperforms baseline methods.
Reference: text
sentIndex sentText sentNum sentScore
1 Polysemy is a problem for methods that exploit image search engines to build object category models. [sent-5, score-0.405]
2 Existing unsupervised approaches do not take word sense into consideration. [sent-6, score-0.411]
3 We propose a new method that uses a dictionary to learn models of visual word sense from a large collection of unlabeled web data. [sent-7, score-1.052]
4 The use of LDA to discover a latent sense space makes the model robust despite the very limited nature of dictionary definitions. [sent-8, score-0.674]
5 The algorithm then uses the text surrounding image links to retrieve images with high probability of a particular dictionary sense. [sent-10, score-1.052]
6 We evaluate our method on a dataset obtained by searching the web for polysemous words. [sent-12, score-0.472]
7 1 Introduction We address the problem of unsupervised learning of object classifiers for visually polysemous words. [sent-14, score-0.342]
8 Visual polysemy means that a word has several dictionary senses that are visually distinct. [sent-15, score-0.821]
9 The drawback of such search-based approaches is that multiple word meanings often lead to mixed results, especially for polysemous words. [sent-18, score-0.369]
10 One approach bootstraps object classifiers from labeled image data [9]; others cluster the unlabeled images into coherent components [6], [2]. [sent-22, score-0.528]
11 The unsupervised approach of [12] bootstraps an SVM from the top-ranked images returned by a search engine, with the assumption that they have higher precision for the category. [sent-24, score-0.495]
12 We propose a fully unsupervised method that specifically takes word sense into account. [sent-26, score-0.411]
13 The only input to our algorithm is a list of words (for example, all English nouns) and their dictionary entries. [sent-27, score-0.453]
14 Our method is multimodal, using both web search images and the text surrounding them in the document in which they are embedded. [sent-28, score-0.781]
15 The key idea is to learn a text model of the word sense, using an electronic dictionary such as Wordnet together with a large amount of unlabeled text. [sent-29, score-0.811]
16 The model is then used to retrieve images of a specific sense from the mixed-sense search results. [sent-30, score-0.604]
17 One application is an image search filter that automatically groups results by word sense for easier navigation for the user. [sent-31, score-0.683]
18 However, our main focus in this paper is on using the re-ranked images to train an object classifier. (Figure 1: Which sense of “mouse”?) [sent-32, score-0.436]
19 The resulting classifier can predict not only the English word that best describes an input image, but also the correct sense of that word. [sent-35, score-0.365]
20 We regard this method as a baseline for our main approach, which overcomes these issues by learning a model of each sense from a large amount of text obtained by searching the web. [sent-44, score-0.491]
21 Web text is more natural and is a closer match to the text surrounding web images than dictionary entries are, which allows us to learn more robust models. [sent-45, score-1.133]
22 Each dictionary sense is represented in the latent space of hidden “topics” learned empirically for the polysemous word. [sent-46, score-0.84]
23 To evaluate our algorithm, we collect a dataset by searching the Yahoo Search engine for five polysemous words: “bass”, “face”, “mouse”, “speaker” and “watch”. [sent-47, score-0.383]
24 Experimental evaluation on this dataset includes both retrieval and classification of unseen images into specific visual senses. [sent-49, score-0.442]
25 2 Model The inspiration for our method comes from the fact that text surrounding web images indexed by a polysemous keyword can be a rich source of information about the sense of that word. [sent-50, score-1.362]
26 The main idea is to learn a probabilistic model of each sense, as defined by entries in a dictionary (in our case, Wordnet), from a large amount of unlabeled text. [sent-51, score-0.489]
27 The use of a dictionary is key because it frees us from needing a labeled set of images to learn the visual sense model. [sent-52, score-0.951]
28 Like standard word sense disambiguation (WSD) methods, we make a one-sense-per-document assumption [14], and rely on words co-occurring with the image in the HTML document to indicate that sense. [sent-54, score-0.688]
29 Our method consists of three steps: 1) discovering latent dimensions in text associated with a keyword, 2) learning probabilistic models of dictionary senses in that latent space, and 3) using the text-based sense models to construct sense-specific image classifiers. [sent-55, score-1.293]
30 2.1 Latent Text Space Unlike the running text commonly used in WSD, image links are not guaranteed to be surrounded by grammatical prose. [sent-58, score-0.461]
31 Finally, for each word token i, we choose a topic z_i from the multinomial θ_d, and then choose a word w_i from the multinomial φ_{z_i}. [sent-78, score-0.438]
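As a concrete illustration of this generative process, the following Python sketch samples a toy document; the vocabulary size, topic count, and Dirichlet hyperparameters here are illustrative placeholders, not settings from the paper.

    import numpy as np

    def generate_document(phi, alpha, doc_len, rng):
        """Sample one document from the LDA generative process described above."""
        K, V = phi.shape
        theta_d = rng.dirichlet(alpha * np.ones(K))  # per-document topic mixture theta_d
        words = []
        for _ in range(doc_len):
            z_i = rng.choice(K, p=theta_d)           # topic z_i ~ Multinomial(theta_d)
            w_i = rng.choice(V, p=phi[z_i])          # word w_i ~ Multinomial(phi_{z_i})
            words.append(w_i)
        return words

    rng = np.random.default_rng(0)
    phi = rng.dirichlet(0.1 * np.ones(50), size=8)   # 8 topics over a toy 50-word vocabulary
    print(generate_document(phi, alpha=0.5, doc_len=20, rng=rng))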
32 However, while the resulting topics were often aligned along sense boundaries, the approach suffered from overfitting due to the irregular quality and low quantity of the data. [sent-83, score-0.344]
33 Often, the only clue to the image’s sense is a short text fragment, such as “fishing with friends” for an image returned for the query “bass”. [sent-84, score-0.684]
34 To alleviate the overfitting problem, we instead create an additional dataset of text-only web pages returned from regular web search. [sent-85, score-0.456]
35 We then learn an LDA model on this dataset and use the resulting distributions to train a model of the dictionary senses, described next. [sent-86, score-0.489]
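The paper does not specify its LDA implementation, so the following is only a plausible sketch using scikit-learn's variational LDA as a stand-in; the placeholder web_pages list and the normalization of components_ into P(w|z) are our assumptions.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Placeholder stand-ins for the text-only pages returned by regular web search.
    web_pages = ["bass guitar amplifier pickup strings", "bass fishing lake boat lure"]

    vectorizer = CountVectorizer(stop_words="english", max_features=20000)
    X = vectorizer.fit_transform(web_pages)

    lda = LatentDirichletAllocation(n_components=8, random_state=0)  # K = 8, the setting used in the experiments
    doc_topics = lda.fit_transform(X)     # per-document topic proportions, approx. P(z|d)
    # Row-normalize the fitted topic-word weights to obtain P(w | z = j).
    p_w_given_z = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    vocab_index = vectorizer.vocabulary_  # maps a word to its column index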
36 2.2 Dictionary Sense Model We use the limited text available in the Wordnet entries to relate each dictionary sense to the topics formed above. [sent-88, score-0.913]
37 We denote the bag-of-words extracted from such a dictionary entry for sense s as e_s = w_1, w_2, ..., w_{E_s}. [sent-95, score-0.583]
38 The model is trained as follows: given a query word with sense s ∈ {1, 2, ..., S}, [sent-99, score-0.42]
39 we define the likelihood of a particular sense given topic j as P(s|z = j) ≡ (1/E_s) Σ_{i=1}^{E_s} P(w_i|z = j), (2) that is, the average likelihood of the words in the definition. [sent-102, score-0.382]
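Given the topic-word distributions from an LDA fit (for example, p_w_given_z and vocab_index from the sketch above), Equation 2 reduces to a single averaging step; the helper below is our sketch, not the authors' code.

    def sense_topic_likelihood(definition_words, p_w_given_z, vocab_index):
        """P(s|z=j) per Equation 2: average P(w_i|z=j) over the E_s definition words.
        Assumes at least one definition word appears in the vocabulary."""
        ids = [vocab_index[w] for w in definition_words if w in vocab_index]
        return p_w_given_z[:, ids].mean(axis=1)  # shape (K,): one value per topic j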
40 For a web image with an associated text document d = w_1, w_2, ... [sent-103, score-0.538]
41 Finally, we define the probability of a particular dictionary sense given the image to be equal to P(s|d). [sent-108, score-0.742]
42 Thus, our model is able to assign sense probabilities to images returned from the search engine, which in turn allows us to group the images according to sense. [sent-109, score-0.855]
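The extraction omits the exact form of Equation 3, so the combination below, which weights the per-topic sense likelihoods by the document's inferred topic proportions P(z=j|d), is our assumption about how P(s|d) is computed; the re-ranking step itself follows the text.

    import numpy as np

    def sense_posterior(doc_topic_probs, sense_topic_liks):
        """Assumed Eq. 3: P(s|d) proportional to sum_j P(s|z=j) P(z=j|d).
        doc_topic_probs: (K,) topic proportions for the image's text context d.
        sense_topic_liks: (S, K) matrix of the Equation-2 values P(s|z=j)."""
        scores = sense_topic_liks @ doc_topic_probs
        return scores / scores.sum()

    def rerank_images(doc_topic_mat, sense_topic_liks, target_sense):
        """Sort image indices by descending probability of the target sense."""
        probs = np.array([sense_posterior(d, sense_topic_liks)[target_sense]
                          for d in doc_topic_mat])
        return np.argsort(-probs)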
43 Table 1: Dataset Description: sizes of the three datasets, and distribution of ground truth sense labels in the keyword dataset. [sent-114, score-0.657]
44 We represent images as histograms of visual words, which are obtained by detecting local interest points and vector-quantizing their descriptors using a fixed visual vocabulary. [sent-117, score-0.379]
45 We compare our model with a simple baseline method that attempts to refine the search by automatically generating search terms from the dictionary entry. [sent-118, score-0.686]
46 Specifically, the terms are generated by appending the polysemous word to its synonyms and first-level hypernyms. [sent-120, score-0.373]
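One plausible realization of this baseline uses NLTK's WordNet interface (requires the WordNet corpus via nltk.download('wordnet')); the term order and the restriction to noun synsets are our reading of the sentence above, not confirmed details of the paper.

    from nltk.corpus import wordnet as wn

    def sense_search_terms(keyword, sense_index):
        """Build sense-specific queries by appending the polysemous keyword to each
        synonym and first-level hypernym lemma of the chosen noun synset."""
        synset = wn.synsets(keyword, pos=wn.NOUN)[sense_index]
        lemmas = {l.name().replace("_", " ") for l in synset.lemmas()}
        for hyper in synset.hypernyms():  # first-level hypernyms only
            lemmas |= {l.name().replace("_", " ") for l in hyper.lemmas()}
        lemmas.discard(keyword)
        return [f"{term} {keyword}" for term in sorted(lemmas)]

    print(sense_search_terms("bass", 0))  # queries built from the first WordNet sense of "bass"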
47 3 Datasets To train and evaluate the outlined algorithms, we use three datasets: image search results using the given keyword, image search results using sense-specific search terms, and text search results using the given keyword. [sent-123, score-1.038]
48 The first dataset was collected automatically by issuing queries to the Yahoo Image Search™ website and downloading the returned images and HTML web pages. [sent-124, score-0.597]
49 The images were labeled by a human annotator with one sense per keyword. [sent-129, score-0.559]
50 The annotator saw only the images, and not the text or the dictionary definitions. [sent-131, score-0.588]
51 For evaluation, we used only images labeled "good" as positive, and grouped "partial" and "unrelated" images into the negative class. [sent-134, score-0.358]
52 The second image search dataset was collected in a similar manner but using the generated sense-specific search terms. [sent-136, score-0.476]
53 The third, text-only dataset was collected via regular web search for the original keywords. [sent-137, score-0.354]
54 4 Features When extracting words from web pages, all HTML tags are removed, and the remaining text is tokenized. [sent-140, score-0.434]
55 To extract text context words for an image, the image link is located automatically in the corresponding HTML page. [sent-143, score-0.499]
56 All word tokens in a 100-token window surrounding the location of the image link are extracted. [sent-144, score-0.394]
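A minimal sketch of this extraction step, assuming BeautifulSoup for HTML parsing (the paper does not name its tooling); the marker trick and the even split of the window around the link are our choices.

    import re
    from bs4 import BeautifulSoup

    MARKER = "imglocationmarker"

    def image_context(html, img_src, window=100):
        """Return the word tokens in a window-token span around an image link."""
        soup = BeautifulSoup(html, "html.parser")
        img = soup.find("img", src=img_src)
        if img is None:
            return []
        img.replace_with(f" {MARKER} ")  # stand-in for the image's location in the text
        tokens = re.findall(r"[a-z0-9]+", soup.get_text().lower())
        i = tokens.index(MARKER)
        half = window // 2
        return tokens[max(0, i - half):i] + tokens[i + 1:i + 1 + half]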
57 The text vocabulary size used for the sense model ranges from 12K to 20K words across keywords. [sent-145, score-0.503]
58 To extract image features, all images are resized to 300 pixels in width and converted to grayscale. [sent-146, score-0.369]
59 A codebook of size 800 is constructed by k-means clustering a randomly chosen subset of the database (300 images per keyword), and all images are converted to histograms over the resulting visual words. [sent-155, score-0.492]
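A sketch of this visual-vocabulary step with scikit-learn's k-means; interest-point detection and descriptor computation are left abstract, and the n_init setting is an arbitrary choice of ours.

    import numpy as np
    from sklearn.cluster import KMeans

    def build_codebook(descriptor_subset, k=800, seed=0):
        """Cluster a random subset of local descriptors into k visual words."""
        return KMeans(n_clusters=k, random_state=seed, n_init=3).fit(descriptor_subset)

    def bow_histogram(image_descriptors, codebook):
        """Quantize one image's descriptors and return a normalized histogram."""
        words = codebook.predict(image_descriptors)
        hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
        return hist / max(hist.sum(), 1.0)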
60 Experiments: Re-ranking Image Search Results. In the first set of experiments, we evaluate how well our text-based sense model can distinguish between images depicting the correct visual sense and all the other senses. [sent-159, score-0.761]
61 We train a separate LDA model for each keyword on the text-only dataset, setting the number of topics K to 8 in each case. [sent-160, score-0.465]
62 We then compute P(s|d) for all text contexts d associated with images in the keyword dataset, using Equation 3, and rank the corresponding images according to the probability of each sense. [sent-173, score-0.916]
63 Since we only have ground truth labels for a single sense per keyword (see Section 3), we evaluate the retrieval performance for that particular ground truth sense. [sent-174, score-0.848]
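The retrieval evaluation can be reproduced with a standard ROC computation; the sketch below assumes binary labels (1 for "good" images of the ground truth sense, 0 otherwise) and the P(s|d) scores from the sense model.

    from sklearn.metrics import roc_curve, auc

    def sense_roc(labels, sense_probs):
        """ROC curve and area for retrieving the ground truth sense."""
        fpr, tpr, _ = roc_curve(labels, sense_probs)
        return fpr, tpr, auc(fpr, tpr)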
64 For example, the first plot shows ROCs obtained by the eight models corresponding to each of the senses of the keyword “bass”. [sent-176, score-0.573]
65 The other thick curves show the dictionary sense models that correspond to the ground truth sense (a fish). [sent-178, score-0.946]
66 The results demonstrate that we are able to learn a useful sense model that retrieves far more positive-class images than the original search engine ordering. [sent-179, score-0.671]
67 Note that, for some keywords, there are multiple dictionary definitions that are difficult to distinguish visually, for example, “human face” and “facial expression”. [sent-181, score-0.357]
68 In interactive applications, the human user can specify the intended sense of the word by providing an extra keyword, such as by saying or typing “bass fish”. [sent-183, score-0.399]
69 The correct dictionary sense can then be selected by evaluating the probability of the extra keyword under each sense model, and choosing the highest-scoring one. [sent-184, score-1.124]
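A small sketch of that selection step, reusing the Equation-2 quantities from earlier; marginalizing the extra keyword's probability over topics is our interpretation of "evaluating the probability of the extra keyword under each sense model".

    import numpy as np

    def select_sense(extra_word, p_w_given_z, sense_topic_liks, vocab_index):
        """Pick the sense whose model scores the extra keyword highest."""
        if extra_word not in vocab_index:
            return None
        w = vocab_index[extra_word]
        # score(s) = sum_j P(w|z=j) P(s|z=j), one assumed way to score the keyword
        scores = sense_topic_liks @ p_w_given_z[:, w]
        return int(np.argmax(scores))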
70 Figure 2: Retrieval of the ground truth sense from keyword search results. [sent-185, score-0.748]
71 Other thick lines are the ROCs obtained by our dictionary model for the true senses, and thin lines are the ROCs obtained for the other senses. [sent-187, score-0.407]
72 We train a classifier for the object corresponding to the ground-truth sense of each polysemous keyword in our data. [sent-190, score-0.84]
73 The classifiers are binary, assigning a positive label to the correct sense and a negative label to incorrect senses and all other objects. [sent-191, score-0.517]
74 The top N unlabeled images ranked by the sense model are selected as positive training images. [sent-192, score-0.49]
75 The unlabeled pool used in our model consists of both the keyword and the sense-term datasets. [sent-193, score-0.369]
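The bootstrapped classifier step might then look like the following; the linear SVM, its C value, and the use of the lowest-ranked images as noisy negatives are our placeholders (the paper draws negatives from incorrect senses and other objects).

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_sense_classifier(histograms, sense_probs, n_positive=200):
        """Train a binary object classifier on the top-N images ranked by P(sense|d).
        histograms: (num_images, 800) visual-word histograms.
        sense_probs: (num_images,) probability of the target sense per image."""
        order = np.argsort(-sense_probs)
        pos, neg = order[:n_positive], order[-n_positive:]
        X = np.vstack([histograms[pos], histograms[neg]])
        y = np.concatenate([np.ones(n_positive), np.zeros(n_positive)])
        return LinearSVC(C=1.0).fit(X, y)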
76 [Figure 3 residue: bar charts of classification accuracy, roughly 55% to 85%, for the keywords bass, face, mouse, speaker, and watch under the terms and dict methods; panels are (a) 1-SENSE test set, (b) MIX-SENSE test set, and (c) 1-SENSE average vs. N.] [sent-197, score-2.059]
77 Figure 3: Classification accuracy for the search-terms baseline (terms) and our dictionary model (dict). [sent-198, score-0.407]
78 Recall that this dataset was collected by simply searching with word combinations extracted from the target sense definition. [sent-199, score-0.476]
79 For example, we test detection of “computer mouse” among other keyword objects as well as “animal mouse”, “Mickey Mouse” and other senses returned by the search, including unrelated images. [sent-205, score-0.786]
80 In both test cases, our dictionary method significantly improves on the baseline algorithm. [sent-211, score-0.407]
81 However, in the other three cases, the term generation fails, while our model is still able to capture the dictionary sense. [sent-214, score-0.357]
82 Yarowsky [14] proposed an unsupervised WSD method, and suggested the use of dictionary definitions as an initial seed. [sent-216, score-0.403]
83 Several approaches to building object models using image search results have been proposed, although none have specifically addressed polysemous words. [sent-217, score-0.546]
84 [12] incorporate text features (such as whether the keyword appears in the URL) and use them to re-rank the images before training the image model. [sent-226, score-0.865]
85 The combination of image and text features is used in some web retrieval methods (e.g., ...). [sent-232, score-0.574]
86 [1] use visual features to help disambiguate word senses in such loosely labeled data. [sent-238, score-0.539]
87 Models of annotated images assume that there is a correspondence between each image region and a word in the caption (e.g., ...). [sent-239, score-0.581]
88 In contrast, our model predicts a category label based on all of the words in the web image’s text context. [sent-243, score-0.47]
89 In general, a text context word does not necessarily have a corresponding visual region, and vice versa. [sent-244, score-0.392]
90 7 Conclusion We introduced a model that uses a dictionary and text contexts of web images to disambiguate image senses. [sent-247, score-1.095]
91 To the best of our knowledge, it is the first use of a dictionary in either web-based image retrieval or classifier learning. [sent-248, score-0.593]
92 Our approach harnesses the large amount of unlabeled text available through keyword search on the web in conjunction with the dictionary entries to learn a generative model of sense. [sent-249, score-1.262]
93 Our sense model is purely unsupervised, and is appropriate for web images. [sent-250, score-0.383]
94 The use of LDA to discover a latent sense space makes the model robust despite the very limited nature of dictionary definitions. [sent-251, score-0.674]
95 The definition text is used to learn a distribution over the empirical text topics that best represents the sense. [sent-252, score-0.527]
96 As a final step, a discriminative classifier that can predict the correct sense for novel images is trained on the re-ranked mixed-sense images. [sent-253, score-0.462]
97 We evaluated our model on a large dataset of over 10,000 images consisting of search results for five polysemous words. [sent-254, score-0.584]
98 Experiments included retrieval of the ground truth sense and classification of unseen images. [sent-255, score-0.42]
99 On the retrieval task, our dictionary model improved on the baseline search engine precision by re-ranking the images according to sense probability. [sent-256, score-1.138]
100 Of course, we would not expect the dictionary senses to always produce accurate visual models, as many senses do not refer to objects (e.g., ...). [sent-260, score-0.983]
wordName wordTfidf (topN-words)
[('dictionary', 0.357), ('keyword', 0.315), ('mouse', 0.272), ('bass', 0.268), ('senses', 0.258), ('sense', 0.226), ('images', 0.21), ('polysemous', 0.201), ('watch', 0.191), ('text', 0.181), ('image', 0.159), ('web', 0.157), ('word', 0.139), ('search', 0.12), ('topics', 0.118), ('lda', 0.105), ('words', 0.096), ('speaker', 0.092), ('face', 0.091), ('returned', 0.089), ('unrelated', 0.086), ('rocs', 0.084), ('wordnet', 0.081), ('retrieval', 0.077), ('dict', 0.077), ('visual', 0.072), ('surrounding', 0.072), ('engine', 0.068), ('wsd', 0.067), ('object', 0.066), ('yahoo', 0.065), ('html', 0.061), ('topic', 0.06), ('classi', 0.058), ('latent', 0.056), ('unlabeled', 0.054), ('dataset', 0.053), ('sh', 0.052), ('annotator', 0.05), ('baseline', 0.05), ('thick', 0.05), ('retrieve', 0.048), ('learn', 0.047), ('unsupervised', 0.046), ('truth', 0.044), ('er', 0.044), ('caption', 0.043), ('ground', 0.043), ('document', 0.041), ('keywords', 0.039), ('labeled', 0.039), ('automatically', 0.039), ('barnard', 0.038), ('mickey', 0.038), ('polysemy', 0.038), ('saenko', 0.038), ('schroff', 0.038), ('objects', 0.038), ('wi', 0.038), ('dirichlet', 0.036), ('category', 0.036), ('discover', 0.035), ('human', 0.034), ('searching', 0.034), ('electronic', 0.033), ('musical', 0.033), ('synonyms', 0.033), ('porter', 0.033), ('negative', 0.033), ('train', 0.032), ('entries', 0.031), ('multinomial', 0.031), ('animal', 0.031), ('device', 0.031), ('disambiguate', 0.031), ('datasets', 0.03), ('annotated', 0.03), ('unseen', 0.03), ('precision', 0.03), ('query', 0.029), ('labels', 0.029), ('meanings', 0.029), ('visually', 0.029), ('evaluate', 0.027), ('english', 0.027), ('iccv', 0.027), ('disambiguation', 0.027), ('blei', 0.027), ('trained', 0.026), ('trevor', 0.026), ('speakers', 0.026), ('links', 0.025), ('queries', 0.025), ('berg', 0.025), ('descriptors', 0.025), ('edge', 0.025), ('fergus', 0.024), ('engines', 0.024), ('collected', 0.024), ('link', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
Author: Kate Saenko, Trevor Darrell
Abstract: Polysemy is a problem for methods that exploit image search engines to build object category models. Existing unsupervised approaches do not take word sense into consideration. We propose a new method that uses a dictionary to learn models of visual word sense from a large collection of unlabeled web data. The use of LDA to discover a latent sense space makes the model robust despite the very limited nature of dictionary definitions. The definitions are used to learn a distribution in the latent space that best represents a sense. The algorithm then uses the text surrounding image links to retrieve images with high probability of a particular dictionary sense. An object classifier is trained on the resulting sense-specific images. We evaluate our method on a dataset obtained by searching the web for polysemous words. Category classification experiments show that our dictionary-based approach outperforms baseline methods.
2 0.22696157 226 nips-2008-Supervised Dictionary Learning
Author: Julien Mairal, Jean Ponce, Guillermo Sapiro, Andrew Zisserman, Francis R. Bach
Abstract: It is now well established that sparse signal models are well suited for restoration tasks and can be effectively learned from audio, image, and video data. Recent research has been aimed at learning discriminative sparse models instead of purely reconstructive ones. This paper proposes a new step in that direction, with a novel sparse representation for signals belonging to different classes in terms of a shared dictionary and discriminative class models. The linear version of the proposed model admits a simple probabilistic interpretation, while its most general variant admits an interpretation in terms of kernels. An optimization framework for learning all the components of the proposed model is presented, along with experimental results on standard handwritten digit and texture classification tasks. 1
3 0.14927398 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1
4 0.14222376 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
Author: Tae-kyun Kim, Roberto Cipolla
Abstract: We present a new co-clustering problem of images and visual features. The problem involves a set of non-object images in addition to a set of object images and features to be co-clustered. Co-clustering is performed in a way that maximises discrimination of object images from non-object images, thus emphasizing discriminative features. This provides a way of obtaining perceptual joint-clusters of object images and features. We tackle the problem by simultaneously boosting multiple strong classifiers which compete for images by their expertise. Each boosting classifier is an aggregation of weak-learners, i.e. simple visual features. The obtained classifiers are useful for object detection tasks which exhibit multimodalities, e.g. multi-category and multi-view object detection tasks. Experiments on a set of pedestrian images and a face data set demonstrate that the method yields intuitive image clusters with associated features and is much superior to conventional boosting classifiers in object detection tasks. 1
5 0.13728862 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
Author: Simon Lacoste-julien, Fei Sha, Michael I. Jordan
Abstract: Probabilistic topic models have become popular as methods for dimensionality reduction in collections of text documents or images. These models are usually treated as generative models and trained using maximum likelihood or Bayesian methods. In this paper, we discuss an alternative: a discriminative framework in which we assume that supervised side information is present, and in which we wish to take that side information into account in finding a reduced dimensionality representation. Specifically, we present DiscLDA, a discriminative variation on Latent Dirichlet Allocation (LDA) in which a class-dependent linear transformation is introduced on the topic mixture proportions. This parameter is estimated by maximizing the conditional likelihood. By using the transformed topic mixture proportions as a new representation of documents, we obtain a supervised dimensionality reduction algorithm that uncovers the latent structure in a document collection while preserving predictive power for the task of classification. We compare the predictive power of the latent structure of DiscLDA with unsupervised LDA on the 20 Newsgroups document classification task and show how our model can identify shared topics across classes as well as class-dependent topics.
6 0.13571751 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
7 0.12847491 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
8 0.12798698 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
9 0.1271304 229 nips-2008-Syntactic Topic Models
10 0.11986013 4 nips-2008-A Scalable Hierarchical Distributed Language Model
11 0.11440451 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
12 0.11303762 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
13 0.11022159 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
14 0.097553924 113 nips-2008-Kernelized Sorting
15 0.095987104 184 nips-2008-Predictive Indexing for Fast Search
16 0.092647314 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
17 0.091113038 148 nips-2008-Natural Image Denoising with Convolutional Networks
18 0.090622231 242 nips-2008-Translated Learning: Transfer Learning across Different Feature Spaces
19 0.087413624 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
20 0.083026022 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
topicId topicWeight
[(0, -0.211), (1, -0.191), (2, 0.113), (3, -0.25), (4, -0.069), (5, 0.034), (6, 0.035), (7, -0.025), (8, -0.111), (9, 0.023), (10, 0.018), (11, -0.056), (12, -0.099), (13, -0.018), (14, -0.035), (15, -0.084), (16, 0.011), (17, -0.034), (18, 0.033), (19, -0.05), (20, 0.051), (21, 0.022), (22, -0.005), (23, 0.01), (24, 0.038), (25, -0.003), (26, 0.016), (27, -0.074), (28, -0.076), (29, -0.061), (30, 0.012), (31, 0.066), (32, 0.071), (33, -0.033), (34, 0.01), (35, 0.03), (36, 0.018), (37, 0.035), (38, 0.031), (39, 0.017), (40, -0.002), (41, -0.002), (42, 0.036), (43, 0.055), (44, 0.016), (45, -0.091), (46, -0.104), (47, 0.082), (48, 0.082), (49, -0.002)]
simIndex simValue paperId paperTitle
same-paper 1 0.96472639 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
Author: Kate Saenko, Trevor Darrell
Abstract: Polysemy is a problem for methods that exploit image search engines to build object category models. Existing unsupervised approaches do not take word sense into consideration. We propose a new method that uses a dictionary to learn models of visual word sense from a large collection of unlabeled web data. The use of LDA to discover a latent sense space makes the model robust despite the very limited nature of dictionary definitions. The definitions are used to learn a distribution in the latent space that best represents a sense. The algorithm then uses the text surrounding image links to retrieve images with high probability of a particular dictionary sense. An object classifier is trained on the resulting sense-specific images. We evaluate our method on a dataset obtained by searching the web for polysemous words. Category classification experiments show that our dictionary-based approach outperforms baseline methods.
2 0.64884907 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
Author: Simon Lacoste-julien, Fei Sha, Michael I. Jordan
Abstract: Probabilistic topic models have become popular as methods for dimensionality reduction in collections of text documents or images. These models are usually treated as generative models and trained using maximum likelihood or Bayesian methods. In this paper, we discuss an alternative: a discriminative framework in which we assume that supervised side information is present, and in which we wish to take that side information into account in finding a reduced dimensionality representation. Specifically, we present DiscLDA, a discriminative variation on Latent Dirichlet Allocation (LDA) in which a class-dependent linear transformation is introduced on the topic mixture proportions. This parameter is estimated by maximizing the conditional likelihood. By using the transformed topic mixture proportions as a new representation of documents, we obtain a supervised dimensionality reduction algorithm that uncovers the latent structure in a document collection while preserving predictive power for the task of classification. We compare the predictive power of the latent structure of DiscLDA with unsupervised LDA on the 20 Newsgroups document classification task and show how our model can identify shared topics across classes as well as class-dependent topics.
3 0.6354813 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
Author: Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller
Abstract: One of the original goals of computer vision was to fully understand a natural scene. This requires solving several sub-problems simultaneously, including object detection, region labeling, and geometric reasoning. The last few decades have seen great progress in tackling each of these problems in isolation. Only recently have researchers returned to the difficult task of considering them jointly. In this work, we consider learning a set of related models in such that they both solve their own problem and help each other. We develop a framework called Cascaded Classification Models (CCM), where repeated instantiations of these classifiers are coupled by their input/output variables in a cascade that improves performance at each level. Our method requires only a limited “black box” interface with the models, allowing us to use very sophisticated, state-of-the-art classifiers without having to look under the hood. We demonstrate the effectiveness of our method on a large set of natural images by combining the subtasks of scene categorization, object detection, multiclass image segmentation, and 3d reconstruction. 1
4 0.58960021 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
Author: Kilian Q. Weinberger, Olivier Chapelle
Abstract: Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. Recent work has significantly improved the state of the art by moving beyond “flat” classification through incorporation of class hierarchies [4]. We present a novel algorithm that goes beyond hierarchical classification and estimates the latent semantic space that underlies the class hierarchy. In this space, each class is represented by a prototype and classification is done with the simple nearest neighbor rule. The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. We show that our optimization is convex and can be solved efficiently for large data sets. Experiments on the OHSUMED medical journal data base yield state-of-the-art results on topic categorization. 1
5 0.58244514 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
Author: Geremy Heitz, Gal Elidan, Benjamin Packer, Daphne Koller
Abstract: Discriminative tasks, including object categorization and detection, are central components of high-level computer vision. Sometimes, however, we are interested in more refined aspects of the object in an image, such as pose or particular regions. In this paper we develop a method (LOOPS) for learning a shape and image feature model that can be trained on a particular object class, and used to outline instances of the class in novel images. Furthermore, while the training data consists of uncorresponded outlines, the resulting LOOPS model contains a set of landmark points that appear consistently across instances, and can be accurately localized in an image. Our model achieves state-of-the-art results in precisely outlining objects that exhibit large deformations and articulations in cluttered natural images. These localizations can then be used to address a range of tasks, including descriptive classification, search, and clustering. 1
6 0.57939315 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
7 0.57935548 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
8 0.57308364 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
9 0.56777221 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
10 0.56550157 226 nips-2008-Supervised Dictionary Learning
11 0.56351602 229 nips-2008-Syntactic Topic Models
12 0.55899352 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
13 0.55696583 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
14 0.5516305 66 nips-2008-Dynamic visual attention: searching for coding length increments
15 0.54694194 148 nips-2008-Natural Image Denoising with Convolutional Networks
16 0.54195225 4 nips-2008-A Scalable Hierarchical Distributed Language Model
17 0.52903855 52 nips-2008-Correlated Bigram LSA for Unsupervised Language Model Adaptation
18 0.52196604 36 nips-2008-Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree
19 0.51680082 62 nips-2008-Differentiable Sparse Coding
20 0.51325721 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
topicId topicWeight
[(6, 0.052), (7, 0.055), (12, 0.058), (28, 0.148), (32, 0.227), (35, 0.01), (57, 0.1), (59, 0.014), (63, 0.014), (71, 0.012), (77, 0.06), (78, 0.012), (81, 0.039), (83, 0.093)]
simIndex simValue paperId paperTitle
1 0.8996588 28 nips-2008-Asynchronous Distributed Learning of Topic Models
Author: Padhraic Smyth, Max Welling, Arthur U. Asuncion
Abstract: Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known unsupervised learning frameworks: Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Processes (HDP). In the proposed approach, the data are distributed across P processors, and processors independently perform Gibbs sampling on their local data and communicate their information in a local asynchronous manner with other processors. We demonstrate that our asynchronous algorithms are able to learn global topic models that are statistically as accurate as those learned by the standard LDA and HDP samplers, but with significant improvements in computation time and memory. We show speedup results on a 730-million-word text corpus using 32 processors, and we provide perplexity results for up to 1500 virtual processors. As a stepping stone in the development of asynchronous HDP, a parallel HDP sampler is also introduced. 1
same-paper 2 0.82667017 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
Author: Kate Saenko, Trevor Darrell
Abstract: Polysemy is a problem for methods that exploit image search engines to build object category models. Existing unsupervised approaches do not take word sense into consideration. We propose a new method that uses a dictionary to learn models of visual word sense from a large collection of unlabeled web data. The use of LDA to discover a latent sense space makes the model robust despite the very limited nature of dictionary definitions. The definitions are used to learn a distribution in the latent space that best represents a sense. The algorithm then uses the text surrounding image links to retrieve images with high probability of a particular dictionary sense. An object classifier is trained on the resulting sense-specific images. We evaluate our method on a dataset obtained by searching the web for polysemous words. Category classification experiments show that our dictionary-based approach outperforms baseline methods.
3 0.78866845 250 nips-2008-Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning
Author: Ali Rahimi, Benjamin Recht
Abstract: Randomized neural networks are immortalized in this AI Koan: In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. “What are you doing?” asked Minsky. “I am training a randomly wired neural net to play tic-tac-toe,” Sussman replied. “Why is the net wired randomly?” asked Minsky. Sussman replied, “I do not want it to have any preconceptions of how to play.” Minsky then shut his eyes. “Why do you close your eyes?” Sussman asked his teacher. “So that the room will be empty,” replied Minsky. At that moment, Sussman was enlightened. We analyze shallow random networks with the help of concentration of measure inequalities. Specifically, we consider architectures that compute a weighted sum of their inputs after passing them through a bank of arbitrary randomized nonlinearities. We identify conditions under which these networks exhibit good classification performance, and bound their test error in terms of the size of the dataset and the number of random nonlinearities. 1
4 0.78760409 103 nips-2008-Implicit Mixtures of Restricted Boltzmann Machines
Author: Vinod Nair, Geoffrey E. Hinton
Abstract: We present a mixture model whose components are Restricted Boltzmann Machines (RBMs). This possibility has not been considered before because computing the partition function of an RBM is intractable, which appears to make learning a mixture of RBMs intractable as well. Surprisingly, when formulated as a third-order Boltzmann machine, such a mixture model can be learned tractably using contrastive divergence. The energy function of the model captures threeway interactions among visible units, hidden units, and a single hidden discrete variable that represents the cluster label. The distinguishing feature of this model is that, unlike other mixture models, the mixing proportions are not explicitly parameterized. Instead, they are defined implicitly via the energy function and depend on all the parameters in the model. We present results for the MNIST and NORB datasets showing that the implicit mixture of RBMs learns clusters that reflect the class structure in the data. 1
5 0.6974805 92 nips-2008-Generative versus discriminative training of RBMs for classification of fMRI images
Author: Tanya Schmah, Geoffrey E. Hinton, Steven L. Small, Stephen Strother, Richard S. Zemel
Abstract: Neuroimaging datasets often have a very large number of voxels and a very small number of training cases, which means that overfitting of models for this data can become a very serious problem. Working with a set of fMRI images from a study on stroke recovery, we consider a classification task for which logistic regression performs poorly, even when L1- or L2- regularized. We show that much better discrimination can be achieved by fitting a generative model to each separate condition and then seeing which model is most likely to have generated the data. We compare discriminative training of exactly the same set of models, and we also consider convex blends of generative and discriminative training. 1
6 0.69069374 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
7 0.68016148 95 nips-2008-Grouping Contours Via a Related Image
8 0.67775095 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
9 0.67596364 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
10 0.6725831 229 nips-2008-Syntactic Topic Models
11 0.67142826 4 nips-2008-A Scalable Hierarchical Distributed Language Model
12 0.67031264 219 nips-2008-Spectral Hashing
13 0.66800535 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
14 0.66791642 194 nips-2008-Regularized Learning with Networks of Features
15 0.6673286 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
16 0.66624552 226 nips-2008-Supervised Dictionary Learning
17 0.66561568 200 nips-2008-Robust Kernel Principal Component Analysis
18 0.66528958 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
19 0.66361701 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
20 0.6632899 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes