nips nips2008 nips2008-116 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure.
Reference: text
sentIndex sentText sentNum sentScore
1 Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. [sent-6, score-0.778]
2 We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. [sent-7, score-1.382]
3 We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. [sent-8, score-0.447]
4 Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. [sent-9, score-0.47]
5 1 Introduction Image annotation, or image labeling, in which the task is to label each pixel or region of an image with a class label, is becoming an increasingly popular problem in the machine learning and machine vision communities [7, 14]. [sent-10, score-0.864]
6 State-of-the-art methods formulate image annotation as a structured prediction problem, and utilize methods such as Conditional Random Fields [8, 4], which output multiple values for each input item. [sent-11, score-0.42]
7 Learning labeling models with such data would help improve segmentation performance and relax the constraint of discriminative labeling methods. [sent-17, score-0.499]
8 A wide range of learning methods have been developed for using partially-labeled image data. [sent-18, score-0.276]
9 One approach adopts a discriminative formulation, and treats the unlabeled regions as missing data [16]. Others take a semi-supervised learning approach by viewing unlabeled image regions as unlabeled data. [sent-19, score-0.561]
10 However, the common assumption about the smoothness of the label distribution with respect to the input data may not be valid in image labeling, due to large intra-class variation of object appearance. [sent-21, score-0.613]
11 Other semi-supervised methods adopt a hybrid approach, combining a generative model of the input data with a discriminative model for image labeling, in which the unlabeled data are used to regularize the learning of a discriminative model [6, 9]. [sent-22, score-0.769]
12 Our approach described in this paper extends the hybrid modeling strategy by incorporating a more flexible generative model for image data. [sent-24, score-0.506]
13 In particular, we introduce a set of latent variables that capture image feature patterns in a hidden feature space, which are used to facilitate the labeling task. [sent-25, score-0.672]
14 First, we extend the Latent Dirichlet Allocation model (LDA) [3] to include not only input features but also label information, capturing co-occurrences within and between image feature patterns and object classes in the data set. [sent-26, score-0.695]
15 Unlike other topic models in image modeling [11, 18], our model integrates a generative model of image appearance and a discriminative model of region labels. [sent-27, score-1.355]
16 Second, the original LDA structure does not impose any spatial smoothness constraint to label prediction, yet incorporating such a spatial prior is important for scene segmentation. [sent-28, score-0.594]
17 Previous approaches have introduced lateral connections between latent topic variables [17, 15]. [sent-29, score-0.438]
18 However, this complicates the model learning, and as a latent representation of image data, the topic variables can be non-smooth over the image plane in general. [sent-30, score-1.021]
19 In this paper, we model the spatial dependency of labels by two different structures: one introduces directed connections between each label variable and its neighboring topic variables, and the other incorporates lateral connections between label variables. [sent-31, score-1.175]
20 We will investigate whether these structures effectively capture the spatial prior, and lead to accurate label predictions. [sent-32, score-0.356]
21 The next section presents the base model, and two different extensions to handle label spatial dependencies. [sent-34, score-0.423]
22 2 Model description The structured prediction problem in image labeling can be formulated as follows. [sent-37, score-0.536]
23 Let an image $x$ be represented as a set of subregions $\{x_i\}_{i=1}^{N_x}$. [sent-38, score-0.276]
24 The aim is to assign each $x_i$ a label $l_i$ from a categorical set $\mathcal{L}$. [sent-39, score-0.617]
25 For instance, subregion xi ’s can be image patches or pixels, and L consists of object classes. [sent-40, score-0.387]
26 We first introduce our base model for capturing individual patterns in image appearance and label space. [sent-44, score-0.925]
27 Assume each subregion xi is represented by two features (ai , ti ), in which ai describes its appearance (including color, texture, etc. [sent-45, score-0.78]
28 ) in some appearance feature space A and ti is its position on the image plane T . [sent-46, score-0.795]
29 We achieve this by extending the latent Dirichlet allocation model to include both label and appearance. [sent-48, score-0.419]
30 More specifically, we assume each observation pair (ai , li ) in image x is generated from a mixture of K hidden ‘topic’ components shared across the whole dataset, given the position information ti . [sent-49, score-0.978]
31 Also, zi is used as an indicator variable to specify from which hidden topic component the pair (ai, li ) is generated. [sent-51, score-0.909]
32 Our model defines a joint distribution of label variables $l$ and appearance feature variables $a$ given the position $t$ as follows, $P_b(l, a \mid t, \alpha) = \int_\theta \Big[ \prod_i \sum_{z_i} P(l_i \mid a_i, t_i, z_i)\, P(a_i \mid z_i)\, P(z_i \mid \theta) \Big] P(\theta \mid \alpha)\, d\theta$ (1) where $P(\theta \mid \alpha)$ is the Dirichlet distribution. [sent-53, score-1.347]
33 We specify the appearance model P (ai |zi ) to be position invariant but the label predictor P (li |ai , ti , zi ) depends on the position information. [sent-54, score-1.077]
34 (a) Label prediction module P (li |ai , ti , zi ). [sent-56, score-0.613]
35 The label predictor P (li |ai , ti , zi ) is modeled by a probabilistic classifier that takes (ai , ti , zi ) as its input and produces a properly normalized distribution for li . [sent-57, score-1.571]
36 We follow the convention of topic models and model the topic conditional distributions of the image appearance using a multinomial distribution with parameters βzi . [sent-62, score-1.046]
37 As the appearance features typically take on real values, we first apply k-means clustering to the image features {ai } to build a visual vocabulary V. [sent-63, score-0.551]
38 Thus a feature ai in the appearance space A can be represented as a visual word v, and we have P (ai = v|zi = k) = βk,v . [sent-64, score-0.527]
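The vocabulary step described above can be sketched as follows. This is a minimal illustration, not the authors' code: `build_vocabulary` is a hypothetical helper implementing plain Lloyd's k-means in NumPy, the toy features are synthetic, and the table `beta` is filled with random Dirichlet draws only to show the shape and normalization of $\beta_{k,v}$.

```python
import numpy as np

def build_vocabulary(features, num_words, num_iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, visual-word index per feature)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from distinct, randomly chosen features.
    centroids = features[rng.choice(len(features), num_words, replace=False)].copy()
    for _ in range(num_iters):
        # Squared Euclidean distance from every feature to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        words = dists.argmin(axis=1)
        for v in range(num_words):
            members = features[words == v]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[v] = members.mean(axis=0)
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

# Toy data (an assumption): two well-separated blobs in a 2-D appearance space.
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                      rng.normal(5.0, 0.1, (50, 2))])
centroids, words = build_vocabulary(features, num_words=2)

# Topic-conditional word table beta[k, v] = P(a_i = v | z_i = k); each row sums to 1.
beta = rng.dirichlet(np.ones(2), size=3)  # hypothetical K = 3 topics, |V| = 2 words
```

After quantization each super-pixel carries a single integer word index, so the appearance likelihood reduces to a table lookup `beta[k, words[i]]`.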
39 While the topic prediction model in Equation 1 is able to capture regularly co-occurring patterns in the joint space of label and appearance, it ignores spatial priors on the label prediction. [sent-65, score-0.971]
40 However, [Figure 1. Left: a graphical representation of the base topic prediction model (Model I).] [sent-66, score-3.412]
41 spatial priors, such as spatial smoothness, are crucial to labeling tasks, as neighboring labels are usually strongly correlated. [sent-71, score-0.525]
42 We introduce a dependency between each label variable and its neighboring topic variables. [sent-74, score-0.556]
43 In this model, each label value is predicted based on the summary information of topics within a neighborhood. [sent-75, score-0.298]
44 More specifically, we change the label prediction model into the following form: $P(l_i \mid a_i, t_i, z_{N(i)}) = P\big(l_i \mid a_i, t_i, \sum_{j \in N(i)} w_j z_j\big)$, (2) where $N(i)$ is a predefined neighborhood for site $i$, and $w_j$ is the weight for the topic variable $z_j$. [sent-76, score-1.428]
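As a small illustration of the neighborhood summary in Equation 2, the sketch below computes $\sum_{j \in N(i)} w_j z_j$ with each $z_j$ encoded as a one-hot vector. The function name, the neighborhood and the weights are all hypothetical; the source does not state how $N(i)$ or $w_j$ are chosen.

```python
import numpy as np

def neighborhood_summary(z, neighbors, weights, num_topics):
    """Return sum_{j in N(i)} w_j * onehot(z_j) for a single site."""
    onehot = np.eye(num_topics)[z[neighbors]]  # shape (|N(i)|, K)
    return weights @ onehot                    # shape (K,)

z = np.array([0, 2, 2, 1])             # topic assignments of four sites
neighbors = np.array([0, 1, 2])        # hypothetical N(i), here including i itself
weights = np.array([0.5, 0.25, 0.25])  # hypothetical weights w_j
v_i = neighborhood_summary(z, neighbors, weights, num_topics=3)
```

The resulting vector replaces the single topic indicator as the classifier input, so a label can draw on the topics of nearby sites.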
45 We add lateral connections between label variables to build a Conditional Random Field of labels. [sent-84, score-0.364]
46 The joint label distribution given the input image is defined as $P(l \mid a, t, \alpha) = \frac{1}{Z} \exp\big\{ \sum_{i} \sum_{j \in N(i)} f(l_i, l_j) + \gamma \sum_i \log P_b(l_i \mid a, t, \alpha) \big\}$, (3) where $Z$ is the partition function. [sent-85, score-0.65]
47 The pairwise potential is $f(l_i, l_j) = \sum_{a,b} u_{ab}\, \delta_{l_i,a}\, \delta_{l_j,b}$, and the unary potential is defined as the log output of the base topic prediction model weighted by $\gamma$. [sent-86, score-0.641]
48 Note that $P_b(l_i \mid a, t, \alpha) = \sum_{z_i} P(l_i \mid a_i, t_i, z_i)\, P(z_i \mid a, t)$. [sent-88, score-0.735]
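To make the roles of the pairwise and unary terms in Equation 3 concrete, here is a hedged sketch that evaluates the unnormalized log-score of a labeling on a toy three-site chain. The function `crf_log_score`, the edge list, the matrix `u` and the unary table are illustrative assumptions. Each undirected edge is counted once here, whereas the double sum in Equation 3 may count each pair twice.

```python
import numpy as np

def crf_log_score(labels, edges, u, log_pb, gamma):
    """Unnormalised log-probability of a labelling: pairwise plus weighted unary terms."""
    # f(l_i, l_j) reduces to a lookup u[l_i, l_j] because of the delta functions.
    pairwise = sum(u[labels[i], labels[j]] for i, j in edges)
    unary = gamma * log_pb[np.arange(len(labels)), labels].sum()
    return pairwise + unary

edges = [(0, 1), (1, 2)]     # hypothetical chain of three sites
u = np.eye(2)                # hypothetical u_ab: reward equal neighbouring labels
log_pb = np.log(np.array([[0.9, 0.1],
                          [0.6, 0.4],
                          [0.2, 0.8]]))  # toy base-model marginals P_b(l_i | a, t)

score_smooth = crf_log_score(np.array([0, 0, 0]), edges, u, log_pb, gamma=1.0)
score_mixed = crf_log_score(np.array([0, 0, 1]), edges, u, log_pb, gamma=1.0)
```

Raising `gamma` shifts the balance toward the base model's per-site evidence, while larger diagonal entries in `u` favor smoother labelings.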
49 Note that the base model (Model I) obtains spatially smooth labels simply through the topics capturing location-dependent co-occurring appearance/label patterns, which tend to be nearby in image space. [sent-90, score-0.564]
50 Model II explicitly predicts a region’s label from the topics in its local neighborhood, so that neighboring labels share similar contexts defined by latent topics. [sent-91, score-0.516]
51 The third model uses a conventional form of spatial dependency by directly incorporating local smoothing in the label field. [sent-93, score-0.526]
52 3 Inference and Label Prediction Given a new image $x = \{a, t\}$ and our topic models, we predict its labeling based on the Maximum Posterior Marginals (MPM) criterion: $l_i^* = \arg\max_{l_i} P(l_i \mid a, t)$. (4) [sent-95, score-1.04]
53 We consider the label inference procedure for three models separately as follows. [sent-96, score-0.649]
54 Models I & II: The marginal label distribution $P(l_i \mid a, t)$ can be computed as: $P(l_i \mid a, t) = \sum_{z_{N(i)}} P\big(l_i \mid a_i, t_i, \sum_{j \in N(i)} w_j z_j\big)\, P(z_{N(i)} \mid a, t)$ (5) The summation here is difficult when $N(i)$ is large. [sent-97, score-0.669]
55 Denote $v_i = \sum_{j \in N(i)} w_j z_j$ and $v_{i,q} = \sum_{j \in N(i)} w_j q(z_j)$, where $q(z_j) = \{P(z_j \mid a, t)\}$ is the vector form of posterior distribution. [sent-99, score-0.424]
56 The marginal label distribution can be written as $P(l_i \mid a, t) = \langle P(l_i \mid a_i, t_i, v_i) \rangle_{P(z_{N(i)} \mid a, t)}$. [sent-101, score-0.533]
57 We take the first-order approximation of $P(l_i \mid a_i, t_i, v_i)$ around $v_{i,q}$ using Taylor expansion: $P(l_i \mid a_i, t_i, v_i) \approx P(l_i \mid a_i, t_i, v_{i,q}) + (v_i - v_{i,q})^T \cdot \nabla_{v_i} P(l_i \mid a_i, t_i, v_i)\big|_{v_{i,q}}$. [sent-102, score-1.176]
58 $P(z_{N(i)} \mid a, t)$ (notice that $\langle v_i \rangle_{P(z_{N(i)} \mid a, t)} = v_{i,q}$), we have the following approximation: $P(l_i \mid a, t) \approx P\big(l_i \mid a_i, t_i, \sum_{j \in N(i)} w_j q(z_j)\big)$. [sent-106, score-0.402]
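The first-order approximation above amounts to feeding the classifier the weighted average of the neighbors' topic posteriors and then taking the most probable label. The sketch below assumes a linear softmax classifier on the topic summary alone, which is a simplification: the model in the text also conditions on $a_i$ and $t_i$. The names `approx_label_marginal`, `W` and `b` are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

def approx_label_marginal(q, neighbors, weights, W, b):
    """Approximate P(l_i | a, t) by evaluating the classifier at v_{i,q}."""
    v_q = weights @ q[neighbors]  # v_{i,q} = sum_j w_j q(z_j)
    return softmax(W @ v_q + b)

q = np.array([[0.8, 0.2],
              [0.6, 0.4],
              [0.1, 0.9]])        # toy topic posteriors P(z_j | a, t), one row per site
neighbors = np.array([0, 1])      # hypothetical N(i)
weights = np.array([0.5, 0.5])    # hypothetical w_j
W, b = 2.0 * np.eye(2), np.zeros(2)  # hypothetical classifier parameters

p = approx_label_marginal(q, neighbors, weights, W, b)
label = int(p.argmax())           # MPM decision for this site
```

This replaces a sum over all joint topic configurations in the neighborhood with a single classifier evaluation per site.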
59 Model III: We first compute the unary potential of the CRF model from the base topic prediction model, i. [sent-107, score-0.439]
60 , $P_b(l_i \mid a, t) = \sum_{z_i} P(l_i \mid a_i, t_i, z_i)\, P(z_i \mid a, t)$. [sent-109, score-0.735]
61 Then the label marginals in Equation 4 are computed by applying loopy belief propagation to the conditional random field. [sent-110, score-0.309]
62 In both situations, we need the conditional distribution of the hidden topic variables z given observed data components to compute the label prediction. [sent-111, score-0.584]
63 From Equation 1, we can derive the posterior of each topic variable $z_i$ given other variables, which is required by Gibbs sampling: $P(z_i = k \mid z_{-i}, a_i) \propto P(a_i \mid z_i) \big( \alpha_k + \sum_{m \in S \setminus i} \delta_{z_m,k} \big)$ (7) where $z_{-i}$ denotes all the topic variables in $z$ except $z_i$, and $S$ is the set of all sites. [sent-113, score-1.3]
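A single Gibbs sweep following Equation 7 might look like the sketch below. It is an illustration under stated assumptions: appearance features are already quantized to visual-word indices, `beta[k, v]` holds $P(a_i = v \mid z_i = k)$, and a scalar `alpha` stands in for a symmetric Dirichlet parameter. The function name and toy inputs are hypothetical.

```python
import numpy as np

def gibbs_sweep(z, words, beta, alpha, rng):
    """One in-place sweep over all sites of a single image."""
    num_topics = beta.shape[0]
    counts = np.bincount(z, minlength=num_topics).astype(float)
    for i in range(len(z)):
        counts[z[i]] -= 1.0                       # remove site i from the counts
        p = beta[:, words[i]] * (alpha + counts)  # unnormalised P(z_i = k | z_-i, a_i)
        z[i] = rng.choice(num_topics, p=p / p.sum())
        counts[z[i]] += 1.0                       # add site i back with its new topic
    return z

rng = np.random.default_rng(0)
beta = np.array([[0.9, 0.1],
                 [0.1, 0.9]])          # toy word table: K = 2 topics, 2 words
words = np.array([0] * 20 + [1] * 20)  # toy visual words for 40 sites
z = rng.integers(0, 2, size=len(words))
for _ in range(50):
    gibbs_sweep(z, words, beta, alpha=0.5, rng=rng)
```

Averaging indicator vectors of `z` over sweeps after a burn-in period would give an estimate of the per-site posteriors used for label prediction.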
64 4 Learning with partially labeled data Here we consider estimating the parameters of both extended models from a partially labeled image set D = {xn , ln }. [sent-115, score-0.643]
65 For an image $x^n$, its label $l^n = (l^n_o, l^n_h)$, in which $l^n_o$ denotes the observed labels, and $l^n_h$ are missing. [sent-116, score-0.799]
66 The lower bound of the likelihood can be written as $Q = \sum_n \Big\langle \sum_{i \in o} \log P(l_i^n \mid a_i^n, t_i^n, z_{N(i)}) + \sum_i \log P(a_i^n \mid z_i^n) + \log P(z) \Big\rangle_{P(z^n \mid l_o^n, a^n)}$ (9) In the E step, the posterior distributions of the topic variables are estimated by a Gibbs sampling procedure similar to Equation 7. [sent-122, score-0.458]
67 It uses the following conditional probability: $P(z_i = k \mid z_{-i}, a_i, l, t) \propto \prod_{j \in N(i) \cap o} P(l_j \mid a_j, t_j, z_{N(j)})\, P(a_i \mid z_i) \big( \alpha_k + \sum_{m \in S \setminus i} \delta_{z_m,k} \big)$ (10) Note that any label variable is marginalized out if it is missing. [sent-123, score-0.567]
68 Denote the posterior distribution of $z$ as $q(\cdot)$. The updating equation for parameters of the appearance module $P(a \mid z)$ can be derived from the stationary point of $Q$: $\beta^*_{k,v} \propto \sum_{n,i} q(z_i^n = k)\, \delta(a_i^n, v)$. (11) [sent-125, score-0.316]
69 The classifier in the label prediction module is learned by maximizing the following log likelihood, $L_c = \sum_{n, i \in o} \Big\langle \log P\big(l_i^n \mid a_i^n, t_i^n, \sum_{j \in N(i)} w_j z_j\big) \Big\rangle_{q(z_{N(i)})} \approx \sum_{n, i \in o} \log P\big(l_i^n \mid a_i^n, t_i^n, \sum_{j \in N(i)} w_j q(z_j)\big)$. [sent-126, score-0.938]
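The appearance update of Equation 11 is a soft count of how often each visual word co-occurs with each topic, followed by row normalization. A minimal sketch follows; `update_beta` is a hypothetical helper, and the small `eps` added before normalizing is an assumption that is not in the source, included only to avoid division by zero for unused topics.

```python
import numpy as np

def update_beta(q, words, vocab_size, eps=1e-12):
    """q: (num_sites, K) topic posteriors; words: (num_sites,) visual-word ids."""
    num_topics = q.shape[1]
    counts = np.zeros((vocab_size, num_topics))
    # Accumulate q(z_i = k) into row a_i; np.add.at handles repeated word ids.
    np.add.at(counts, words, q)
    beta = counts.T + eps  # shape (K, |V|); eps is an added assumption
    return beta / beta.sum(axis=1, keepdims=True)

q = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])    # toy posteriors for three sites
words = np.array([0, 0, 1])   # toy visual words
beta = update_beta(q, words, vocab_size=2)
```

With posteriors pooled over all images in the training set, the same function gives the dataset-level update.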
70 The parameters of the base topic prediction model are learned using the same procedure as in Models I&II. [sent-132, score-0.396]
71 More specifically, we set N(i) = i and estimate the parameters of the appearance module and label classifier based on Maximum Likelihood. [sent-133, score-0.525]
72 Given the base topic prediction model, we compute the marginal label probability Pb (li |a, t) and plug in the unary potential function in the CRF model (see Equation 3). [sent-135, score-0.678]
73 (13) where $Z_i^n = \sum_{l_i} \exp\big\{ \sum_{j \in N(i)} \sum_{a,b} u_{ab}\, \delta_{l_i,a}\, \delta_{l_j,b} + \gamma \log P_b(l_i \mid a, t) \big\}$ is the normalizing constant. [sent-137, score-0.483]
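The local normalizer $Z_i$ makes each site's conditional cheap to evaluate, which is what a pseudo-likelihood objective relies on. The sketch below computes $\log P(l_i \mid l_{N(i)}, a, t)$ for one site. It is an illustration only: the function name, neighborhood structure, `u` and the unary table are assumptions, and the training objective itself (Equation 13) is not reproduced in the source text.

```python
import numpy as np

def local_log_conditional(i, labels, neighbors, u, log_pb, gamma):
    """Log-probability of the observed label at site i given its neighbours' labels."""
    # Score every candidate value of l_i against the fixed neighbouring labels.
    scores = gamma * log_pb[i] + sum(u[:, labels[j]] for j in neighbors[i])
    log_z = scores.max() + np.log(np.exp(scores - scores.max()).sum())  # log Z_i
    return scores[labels[i]] - log_z

labels = np.array([0, 0, 1])
neighbors = {0: [1], 1: [0, 2], 2: [1]}  # hypothetical chain neighbourhoods
u = np.eye(2)                            # hypothetical pairwise parameters u_ab
log_pb = np.log(np.full((3, 2), 0.5))    # toy uniform unary marginals

pseudo_ll = sum(local_log_conditional(i, labels, neighbors, u, log_pb, 1.0)
                for i in range(len(labels)))
```

Summing these local terms over the labeled sites gives a quantity that can be maximized with respect to `u` and `gamma` without computing the global partition function.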
74 This subset includes 240 images and 9 different label classes. [sent-142, score-0.288]
75 The second set is the full MSRC image dataset, including 591 images and 21 object classes. [sent-143, score-0.369]
76 We use the normalized cut segmentation algorithm [13] to build a super-pixel representation of the images, in which the segmentation algorithm is tuned to generate approximately 1000 segments for each image on average. [sent-146, score-0.38]
77 We extract a set of basic image features, including color, edge and texture information, from each pixel site. [sent-147, score-0.383]
78 We also augment each feature by a SIFT descriptor extracted from a 30 × 30 image patch centered at the super-pixel. [sent-154, score-0.317]
79 The image position of a super-pixel is the average position of its pixels. [sent-155, score-0.376]
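The position feature described above can be computed directly from a segmentation map. The following is a small sketch under the assumption that super-pixel ids are contiguous integers starting at 0 and that every id occurs at least once; `superpixel_positions` is a hypothetical helper name.

```python
import numpy as np

def superpixel_positions(segments):
    """segments: (H, W) int array of super-pixel ids -> (num_segments, 2) mean (row, col)."""
    rows, cols = np.indices(segments.shape)
    ids = segments.ravel()
    sizes = np.bincount(ids)  # number of pixels in each super-pixel
    mean_rows = np.bincount(ids, weights=rows.ravel()) / sizes
    mean_cols = np.bincount(ids, weights=cols.ravel()) / sizes
    return np.stack([mean_rows, mean_cols], axis=1)

segments = np.array([[0, 0, 1],
                     [0, 0, 1]])  # toy 2 x 3 segmentation with two super-pixels
positions = superpixel_positions(segments)
```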
80 To compute the vocabulary of visual words in the topic model, we apply k-means to group the super-pixel descriptors into clusters. [sent-156, score-0.315]
81 In the basic CRF, the conditional distribution of the labels of an image is defined as: $P(l \mid a, t) \propto \exp\big\{ \sum_{i,j} \sum_{u,v} \sigma_{u,v}\, \delta_{l_i,u}\, \delta_{l_j,v} + \gamma \sum_i h(l_i \mid a_i, t_i) \big\}$ (14) where $h(\cdot)$ is the log output from the super-pixel classifier. [sent-163, score-0.647]
82 We train the CRF model by maximizing its conditional pseudo-likelihood, and label the image based on the marginal distribution of each label variable, computed by the loopy belief propagation algorithm. [sent-164, score-0.878]
83 The classifiers for label prediction have 15 hidden units. [sent-233, score-0.351]
84 The appearance model for topics and the classifier are initialized randomly. [sent-234, score-0.32]
85 Also, Model II and III improve the accuracy further by incorporating the label spatial priors. [sent-241, score-0.453]
86 We notice that the lateral connections between label variables are more effective than integrating information from neighboring latent topic variables. [sent-242, score-0.727]
87 In order to test the robustness of the latent feature representation, we evaluate our models using data with different amounts of labeling information. [sent-245, score-0.295]
88 We use an image dilation operator on the image regions labeled as ‘void’, and control the proportion of labeled data by varying the diameters of the dilation operator (see [16] for similar processing). [sent-246, score-0.962]
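The dilation procedure for reducing the labeled proportion can be sketched as below. This is an illustrative version, not the authors' processing: it uses a square structuring element of side `2 * radius + 1` (the text speaks of a diameter, so a disc may have been used), and the `VOID` marker value is a hypothetical choice.

```python
import numpy as np

def dilate(mask, radius):
    """Binary dilation of a boolean mask with a (2*radius+1)-square element."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask within the square window.
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            out |= padded[dr:dr + h, dc:dc + w]
    return out

VOID = -1                          # hypothetical marker for unlabeled pixels
labels = np.zeros((7, 7), dtype=int)
labels[3, 3] = VOID                # a single void pixel in the original labeling
grown = dilate(labels == VOID, radius=1)
partially_labeled = np.where(grown, VOID, labels)
labeled_fraction = (partially_labeled != VOID).mean()
```

Increasing `radius` grows the void regions and lowers `labeled_fraction`, which is how the proportion of labeled data can be swept in such an experiment.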
89 The model setting is the same as in MSRC-9 except that we use an MLP with 20 hidden units for label prediction. [sent-277, score-0.348]
90 For the full MSRC set, the two extended versions of our model achieve performance similar to that in [14], and we can see that the latent topic representation [Figure legend: S_Class, Model-I, Model-II, Model-III] [sent-281, score-0.395]
91 Right bottom: Examples of original labeling and labeling after dilation (the ratio is 36. [sent-297, score-0.394]
92 Also, our models have the same accuracy as reported in [5] on the Corel-B dataset, while we have a simpler label random field and use a smaller training set. [sent-300, score-0.301]
93 It is interesting to note that the topics and spatial smoothness play a lesser role in the labeling performance on CorelB. [sent-301, score-0.398]
94 We can see that our models handle the extended regions better than those fine object structures, due to the tendency of (over)smoothing caused by super-pixelization and the two spatial dependency structures. [sent-303, score-0.284]
95 6 Discussion In this paper, we presented a hybrid framework for image labeling, which combines a generative topic model with discriminative label prediction models. [sent-304, score-1.046]
96 The generative model extends latent Dirichlet allocation to capture joint patterns in the label and appearance space of images. [sent-305, score-0.726]
97 This latent representation of an image then provides an additional input to the label predictor. [sent-306, score-0.638]
98 We also incorporated the spatial dependency into the model structure in two different ways, both imposing a prior of spatial smoothness for labeling on the image plane. [sent-307, score-0.835]
99 The results of applying our methods to three different image datasets suggest that this integrated approach may extend to a variety of image databases with only partial labeling available. [sent-308, score-0.72]
100 Using a stronger appearance model may help us understand the role of different visual cues, as well as construct a more powerful generative model. [sent-318, score-0.347]
wordName wordTfidf (topN-words)
[('li', 0.378), ('ai', 0.287), ('image', 0.276), ('zi', 0.258), ('label', 0.239), ('ti', 0.219), ('topic', 0.218), ('appearance', 0.207), ('labeling', 0.168), ('msrc', 0.155), ('zn', 0.138), ('spatial', 0.117), ('wj', 0.108), ('pb', 0.105), ('zj', 0.103), ('labeled', 0.101), ('crf', 0.098), ('lj', 0.097), ('latent', 0.095), ('discriminative', 0.093), ('module', 0.079), ('mlp', 0.078), ('dirichlet', 0.075), ('vi', 0.075), ('labels', 0.073), ('texture', 0.072), ('ln', 0.071), ('classi', 0.07), ('er', 0.068), ('base', 0.067), ('incorporating', 0.067), ('subregion', 0.067), ('uab', 0.067), ('xuming', 0.067), ('tn', 0.065), ('topics', 0.059), ('dilation', 0.058), ('prediction', 0.057), ('hybrid', 0.056), ('color', 0.055), ('hidden', 0.055), ('model', 0.054), ('smoothness', 0.054), ('generative', 0.053), ('annotation', 0.052), ('lateral', 0.052), ('zemel', 0.052), ('iii', 0.052), ('position', 0.05), ('neighboring', 0.05), ('proportion', 0.05), ('dependency', 0.049), ('images', 0.049), ('sky', 0.047), ('patterns', 0.047), ('jakob', 0.044), ('orig', 0.044), ('lda', 0.044), ('object', 0.044), ('plane', 0.043), ('unary', 0.043), ('connections', 0.042), ('regions', 0.042), ('descriptor', 0.041), ('conditional', 0.041), ('pixels', 0.041), ('dataset', 0.041), ('panel', 0.039), ('void', 0.039), ('bike', 0.039), ('bill', 0.039), ('segmentation', 0.038), ('region', 0.038), ('log', 0.038), ('ascent', 0.037), ('unlabeled', 0.036), ('richard', 0.036), ('nx', 0.036), ('pixel', 0.035), ('variants', 0.035), ('structured', 0.035), ('vocabulary', 0.035), ('andrew', 0.035), ('cvpr', 0.035), ('capturing', 0.035), ('cow', 0.033), ('grass', 0.033), ('visual', 0.033), ('ii', 0.033), ('models', 0.032), ('variables', 0.031), ('partially', 0.031), ('allocation', 0.031), ('posterior', 0.03), ('david', 0.03), ('accuracy', 0.03), ('verbeek', 0.029), ('loopy', 0.029), ('descriptors', 0.029), ('representation', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1
2 0.24662219 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
Author: Abhinav Gupta, Jianbo Shi, Larry S. Davis
Abstract: We present an approach that combines bag-of-words and spatial models to perform semantic and syntactic analysis for recognition of an object based on its internal appearance and its context. We argue that while object recognition requires modeling relative spatial locations of image features within the object, a bag-of-word is sufficient for representing context. Learning such a model from weakly labeled data involves labeling of features into two classes: foreground(object) or “informative” background(context). We present a “shape-aware” model which utilizes contour information for efficient and accurate labeling of features in the image. Our approach iterates between an MCMC-based labeling and contour based labeling of features to integrate co-occurrence of features and shape similarity. 1
3 0.22506131 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
Author: Leo Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan L. Yuille
Abstract: Language and image understanding are two major goals of artificial intelligence which can both be conceptually formulated in terms of parsing the input signal into a hierarchical representation. Natural language researchers have made great progress by exploiting the 1D structure of language to design efficient polynomialtime parsing algorithms. By contrast, the two-dimensional nature of images makes it much harder to design efficient image parsers and the form of the hierarchical representations is also unclear. Attempts to adapt representations and algorithms from natural language have only been partially successful. In this paper, we propose a Hierarchical Image Model (HIM) for 2D image parsing which outputs image segmentation and object recognition. This HIM is represented by recursive segmentation and recognition templates in multiple layers and has advantages for representation, inference, and learning. Firstly, the HIM has a coarse-to-fine representation which is capable of capturing long-range dependency and exploiting different levels of contextual information. Secondly, the structure of the HIM allows us to design a rapid inference algorithm, based on dynamic programming, which enables us to parse the image rapidly in polynomial time. Thirdly, we can learn the HIM efficiently in a discriminative manner from a labeled dataset. We demonstrate that HIM outperforms other state-of-the-art methods by evaluation on the challenging public MSRC image dataset. Finally, we sketch how the HIM architecture can be extended to model more complex image phenomena. 1
4 0.21536122 118 nips-2008-Learning Transformational Invariants from Natural Movies
Author: Charles Cadieu, Bruno A. Olshausen
Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1
5 0.20439334 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers
Author: Mohak Shah
Abstract: We derive risk bounds for the randomized classifiers in Sample Compression setting where the classifier-specification utilizes two sources of information viz. the compression set and the message string. By extending the recently proposed Occam’s Hammer principle to the data-dependent settings, we derive point-wise versions of the bounds on the stochastic sample compressed classifiers and also recover the corresponding classical PAC-Bayes bound. We further show how these compare favorably to the existing results.
6 0.18675013 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
7 0.17030019 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
8 0.15683903 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
9 0.15666133 229 nips-2008-Syntactic Topic Models
10 0.14937286 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
11 0.14927398 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
12 0.14881919 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
13 0.13685879 119 nips-2008-Learning a discriminative hidden part model for human action recognition
14 0.12736462 148 nips-2008-Natural Image Denoising with Convolutional Networks
15 0.1269992 216 nips-2008-Sparse probabilistic projections
16 0.12248254 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
17 0.11996259 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
18 0.1166169 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning
19 0.11125603 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
20 0.10880589 95 nips-2008-Grouping Contours Via a Related Image
topicId topicWeight
[(0, -0.315), (1, -0.231), (2, 0.131), (3, -0.283), (4, -0.084), (5, 0.007), (6, -0.068), (7, -0.09), (8, -0.11), (9, 0.027), (10, -0.04), (11, 0.035), (12, 0.169), (13, -0.041), (14, 0.044), (15, -0.058), (16, 0.021), (17, -0.004), (18, -0.081), (19, -0.162), (20, 0.104), (21, -0.058), (22, -0.063), (23, 0.033), (24, 0.011), (25, -0.129), (26, 0.013), (27, 0.08), (28, 0.149), (29, -0.042), (30, 0.054), (31, 0.02), (32, 0.025), (33, 0.019), (34, 0.089), (35, -0.098), (36, -0.107), (37, -0.082), (38, 0.054), (39, -0.015), (40, 0.002), (41, 0.068), (42, -0.017), (43, -0.02), (44, 0.005), (45, 0.061), (46, -0.005), (47, 0.044), (48, -0.092), (49, -0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.97885549 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure. 1
2 0.66935062 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
Author: Abhinav Gupta, Jianbo Shi, Larry S. Davis
Abstract: We present an approach that combines bag-of-words and spatial models to perform semantic and syntactic analysis for recognition of an object based on its internal appearance and its context. We argue that while object recognition requires modeling relative spatial locations of image features within the object, a bag-of-word is sufficient for representing context. Learning such a model from weakly labeled data involves labeling of features into two classes: foreground(object) or “informative” background(context). We present a “shape-aware” model which utilizes contour information for efficient and accurate labeling of features in the image. Our approach iterates between an MCMC-based labeling and contour based labeling of features to integrate co-occurrence of features and shape similarity. 1
3 0.64703286 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
Author: Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller
Abstract: One of the original goals of computer vision was to fully understand a natural scene. This requires solving several sub-problems simultaneously, including object detection, region labeling, and geometric reasoning. The last few decades have seen great progress in tackling each of these problems in isolation. Only recently have researchers returned to the difficult task of considering them jointly. In this work, we consider learning a set of related models in such that they both solve their own problem and help each other. We develop a framework called Cascaded Classification Models (CCM), where repeated instantiations of these classifiers are coupled by their input/output variables in a cascade that improves performance at each level. Our method requires only a limited “black box” interface with the models, allowing us to use very sophisticated, state-of-the-art classifiers without having to look under the hood. We demonstrate the effectiveness of our method on a large set of natural images by combining the subtasks of scene categorization, object detection, multiclass image segmentation, and 3d reconstruction. 1
4 0.62821794 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers
Author: Mohak Shah
Abstract: We derive risk bounds for the randomized classifiers in Sample Compression setting where the classifier-specification utilizes two sources of information viz. the compression set and the message string. By extending the recently proposed Occam’s Hammer principle to the data-dependent settings, we derive point-wise versions of the bounds on the stochastic sample compressed classifiers and also recover the corresponding classical PAC-Bayes bound. We further show how these compare favorably to the existing results.
5 0.61995041 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
Author: Erik B. Sudderth, Michael I. Jordan
Abstract: We develop a statistical framework for the simultaneous, unsupervised segmentation and discovery of visual object categories from image databases. Examining a large set of manually segmented scenes, we show that object frequencies and segment sizes both follow power law distributions, which are well modeled by the Pitman–Yor (PY) process. This nonparametric prior distribution leads to learning algorithms which discover an unknown set of objects, and segmentation methods which automatically adapt their resolution to each image. Generalizing previous applications of PY processes, we use Gaussian processes to discover spatially contiguous segments which respect image boundaries. Using a novel family of variational approximations, our approach produces segmentations which compare favorably to state-of-the-art methods, while simultaneously discovering categories shared among natural scenes. 1
6 0.61254156 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
7 0.60181338 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
8 0.54837126 118 nips-2008-Learning Transformational Invariants from Natural Movies
9 0.54161638 148 nips-2008-Natural Image Denoising with Convolutional Networks
10 0.52161127 119 nips-2008-Learning a discriminative hidden part model for human action recognition
11 0.51268351 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
12 0.50674659 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
13 0.50381029 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
14 0.50133014 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
15 0.49632117 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
16 0.47849324 52 nips-2008-Correlated Bigram LSA for Unsupervised Language Model Adaptation
17 0.46511379 33 nips-2008-Bayesian Model of Behaviour in Economic Games
18 0.46040404 147 nips-2008-Multiscale Random Fields with Application to Contour Grouping
19 0.44879389 229 nips-2008-Syntactic Topic Models
20 0.44709077 95 nips-2008-Grouping Contours Via a Related Image
topicId topicWeight
[(6, 0.084), (7, 0.062), (12, 0.084), (24, 0.11), (28, 0.13), (35, 0.016), (57, 0.156), (59, 0.032), (63, 0.016), (77, 0.047), (78, 0.018), (81, 0.021), (83, 0.138)]
simIndex simValue paperId paperTitle
same-paper 1 0.87905443 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
Author: Xuming He, Richard S. Zemel
Abstract: Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on three real image datasets demonstrate the effectiveness of incorporating the latent structure.
2 0.84867561 95 nips-2008-Grouping Contours Via a Related Image
Author: Praveen Srinivasan, Liming Wang, Jianbo Shi
Abstract: Contours have been established in the biological and computer vision literature as a compact yet descriptive representation of object shape. While individual contours provide structure, they lack the large spatial support of region segments (which lack internal structure). We present a method for further grouping of contours in an image using their relationship to the contours of a second, related image. Stereo, motion, and similarity all provide cues that can aid this task; contours that have similar transformations relating them to their matching contours in the second image likely belong to a single group. To find matches for contours, we rely only on shape, which applies directly to all three modalities without modification, in contrast to the specialized approaches developed for each independently. Visually salient contours are extracted in each image, along with a set of candidate transformations for aligning subsets of them. For each transformation, groups of contours with matching shape across the two images are identified to provide a context for evaluating matches of individual contour points across the images. The resulting contexts of contours are used to perform a final grouping on contours in the original image while simultaneously finding matches in the related image, again by shape matching. We demonstrate grouping results on image pairs consisting of stereo, motion, and similar images. Our method also produces qualitatively better results than a baseline method that does not use the inferred contexts.
3 0.84732479 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
Author: Geremy Heitz, Stephen Gould, Ashutosh Saxena, Daphne Koller
Abstract: One of the original goals of computer vision was to fully understand a natural scene. This requires solving several sub-problems simultaneously, including object detection, region labeling, and geometric reasoning. The last few decades have seen great progress in tackling each of these problems in isolation. Only recently have researchers returned to the difficult task of considering them jointly. In this work, we consider learning a set of related models such that they both solve their own problem and help each other. We develop a framework called Cascaded Classification Models (CCM), where repeated instantiations of these classifiers are coupled by their input/output variables in a cascade that improves performance at each level. Our method requires only a limited “black box” interface with the models, allowing us to use very sophisticated, state-of-the-art classifiers without having to look under the hood. We demonstrate the effectiveness of our method on a large set of natural images by combining the subtasks of scene categorization, object detection, multiclass image segmentation, and 3D reconstruction.
4 0.83003759 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
Author: Mehmet K. Muezzinoglu, Alexander Vergara, Ramon Huerta, Thomas Nowotny, Nikolai Rulkov, Henry Abarbanel, Allen Selverston, Mikhail Rabinovich
Abstract: The odor transduction process has a large time constant and is susceptible to various types of noise. Therefore, the olfactory code at the sensor/receptor level is in general a slow and highly variable indicator of the input odor in both natural and artificial situations. Insects overcome this problem by using a neuronal device in their Antennal Lobe (AL), which transforms the identity code of olfactory receptors to a spatio-temporal code. This transformation improves the decision of the Mushroom Bodies (MBs), the subsequent classifier, in both speed and accuracy. Here we propose a rate model based on two intrinsic mechanisms in the insect AL, namely integration and inhibition. Then we present an MB classifier model that resembles the sparse and random structure of the insect MB. A local Hebbian learning procedure governs the plasticity in the model. These formulations not only help to understand the signal conditioning and classification methods of insect olfactory systems, but also can be leveraged in synthetic problems. Among them, we consider here the discrimination of odor mixtures from pure odors. We show on a set of records from metal-oxide gas sensors that the cascade of these two new models facilitates fast and accurate discrimination of even highly imbalanced mixtures from pure odors.
5 0.82629144 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
Author: Erik B. Sudderth, Michael I. Jordan
Abstract: We develop a statistical framework for the simultaneous, unsupervised segmentation and discovery of visual object categories from image databases. Examining a large set of manually segmented scenes, we show that object frequencies and segment sizes both follow power law distributions, which are well modeled by the Pitman–Yor (PY) process. This nonparametric prior distribution leads to learning algorithms which discover an unknown set of objects, and segmentation methods which automatically adapt their resolution to each image. Generalizing previous applications of PY processes, we use Gaussian processes to discover spatially contiguous segments which respect image boundaries. Using a novel family of variational approximations, our approach produces segmentations which compare favorably to state-of-the-art methods, while simultaneously discovering categories shared among natural scenes.
6 0.82456988 179 nips-2008-Phase transitions for high-dimensional joint support recovery
7 0.80631995 194 nips-2008-Regularized Learning with Networks of Features
8 0.80265313 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words
9 0.80140716 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
10 0.80087113 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction
11 0.79930043 207 nips-2008-Shape-Based Object Localization for Descriptive Classification
12 0.79882002 32 nips-2008-Bayesian Kernel Shaping for Learning Control
13 0.79622686 120 nips-2008-Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
14 0.79495466 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
15 0.79450989 148 nips-2008-Natural Image Denoising with Convolutional Networks
16 0.79275912 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data: Towards Better Text Categorization
17 0.79080665 75 nips-2008-Estimating vector fields using sparse basis field expansions
18 0.79075795 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
19 0.79059774 200 nips-2008-Robust Kernel Principal Component Analysis
20 0.7897656 229 nips-2008-Syntactic Topic Models