nips nips2010 nips2010-137 knowledge-graph by maker-knowledge-mining

137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models


Source: pdf

Author: Jun Zhu, Li-jia Li, Li Fei-fei, Eric P. Xing

Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 † School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213; ‡ Department of Computer Science, Stanford University, Stanford, CA 94305. Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. [sent-6, score-0.876]

2 However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. [sent-7, score-1.222]

3 This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. [sent-8, score-1.401]

4 We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization. [sent-10, score-1.527]

5 The standard unsupervised LDA ignores the commonly available supervision information, and thus can discover a sub-optimal topic representation for prediction tasks. [sent-15, score-0.274]

6 Extensions to supervised topic models which can explore side information for discovering predictive topic representations have been proposed, such as the sLDA [4, 25] and MedLDA [27]. [sent-16, score-0.407]

7 Another type of supervised topic model is the so-called upstream model, in which the response variables directly or indirectly generate the latent topic variables. [sent-18, score-0.768]

8 In contrast to downstream supervised topic models (dSTM), which are mainly designed by machine learning researchers, upstream supervised topic models (uSTM) are well-motivated from human vision and psychology research [18, 10] and have been widely used for scene understanding tasks. [sent-19, score-1.52]

9 For example, in the recently developed scene understanding models [23, 13, 14, 8], complex scene images are modeled as a hierarchy of semantic concepts where the topmost level corresponds to a scene, which can be represented as a set of latent objects likely to be found in a given scene. [sent-20, score-1.527]

10 To learn an upstream scene model, maximum likelihood estimation (MLE) is the most common choice. [sent-21, score-0.969]

11 However, MLE can make the prediction model estimation independent of latent topic discovery and result in an imbalanced prediction rule for scene classification, as we explain in Section 3. [sent-22, score-1.2]

12 In this paper, our goal is to address the weakness of MLE for learning upstream supervised topic models. [sent-23, score-0.543]

13 In such downstream models, latent topic assignments are sufficient statistics for the prediction model and it is easy to define the max-margin constraints based on existing max-margin methods. [sent-26, score-0.4]

14 However, for upstream supervised topic models, the discriminant function for prediction involves an intractable computation of posterior distributions, which makes the max-margin training more delicate. [sent-29, score-0.644]

15 Specifically, we present a joint max-margin and max-likelihood estimation method for learning upstream scene understanding models. [sent-30, score-1.038]

16 Given supervised side information (e.g., scene categories), our max-margin learning approach iterates between posterior probabilistic inference and max-margin parameter learning. [sent-33, score-0.669]

17 The parameter learning solves an online loss-augmented SVM, which closely couples the prediction model estimation and latent topic discovery, and this close interplay results in a well-balanced prediction rule for scene categorization. [sent-34, score-1.18]

18 Finally, we demonstrate the advantages of our max-margin approach on both the 8-category sports [13] and the 67-class MIT indoor scene [20] datasets. [sent-35, score-0.846]

19 Empirical results show that max-margin learning can significantly improve the scene classification accuracy. [sent-36, score-0.633]

20 Section 2 presents a generic scene understanding model we will work on. [sent-39, score-0.731]

21 Section 3 discusses the weakness of MLE in learning upstream models. [sent-41, score-0.325]

22 2 Joint Scene and Object Model: a Generic Running Example. In this section, we present a generic joint scene categorization and object annotation model, which will be used to demonstrate the large-margin learning of upstream scene understanding models. [sent-47, score-1.964]

23 2.1 Image Representation. How should we represent a scene image? [sent-49, score-0.633]

24 While individual objects contribute to the recognition of visual scenes, human vision researchers Navon [18] and Biederman [2] also showed that people perform rapid global scene analysis before conducting more detailed local object analysis when recognizing scene images. [sent-51, score-1.529]

25 To obtain a generic model, we represent a scene using both its global scene features and the objects within it. [sent-52, score-1.496]

26 To model the global scene representation, we extract a set of global features G [19]. [sent-60, score-0.858]

27 S is the scene random variable, taking values from a finite set S = {s_1, ..., s_{M_s}}. [sent-65, score-0.633]

28 For an image, the distribution over scene categories depends on its global representation features G. [sent-66, score-0.811]

29 Each scene is represented as a mixture over latent objects O and the mixing weights are defined with a generalized linear model (GLM) parameterized by ψ. [sent-67, score-0.806]

30 By using a normal prior on ψ, the scene model can capture the mutual correlations between different objects, similar to the correlated topic models (CTMs) [3]. [sent-68, score-0.845]
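
As a rough illustration, these CTM-style mixing weights can be drawn by sampling ψ from its normal prior and mapping it to the simplex; the softmax link and all parameter names below are assumptions, since the summary does not pin them down:

    import numpy as np

    def sample_object_mixing_weights(mu, Sigma, rng=np.random.default_rng(0)):
        """Draw mixing weights over latent objects, CTM-style (hypothetical names).

        The normal prior on psi lets the model capture correlations between
        objects; the softmax link is one natural GLM choice, not confirmed here.
        """
        psi = rng.multivariate_normal(mu, Sigma)  # correlated Gaussian draw
        w = np.exp(psi - psi.max())               # numerically stable exponentiation
        return w / w.sum()                        # point on the simplex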

31 Sample a scene category from the conditional scene model p(s|g, θ). [sent-71, score-1.323]
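
A sketch of this sampling step, assuming p(s|g, θ) is a softmax GLM over the global features g (the exact link function is not stated in this summary):

    import numpy as np

    def sample_scene_category(g, theta, rng=np.random.default_rng(0)):
        """Sample s ~ p(s|g, theta); theta holds one weight row per scene class."""
        logits = theta @ g                  # linear score for each scene class
        p = np.exp(logits - logits.max())   # softmax numerator, stabilized
        p /= p.sum()
        return rng.choice(len(p), p=p)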

32 From the joint distribution, we can make two types of predictions, namely scene classification and object annotation. [sent-94, score-0.792]

33 For scene classification, we infer the maximum a posteriori prediction ŝ = arg max_s p(s|g, r, x) = arg max_s log p(s, r, x|g). (1) [sent-95, score-0.725]

34 For object annotation, we can use the inferred latent representation of regions based on p(o|g, r, x) and build a classifier to categorize regions into object classes, when some training examples with manually annotated objects are provided. [sent-96, score-0.41]

35 Since collecting fully labeled images with annotated objects is difficult, upstream scene models are usually learned with partially labeled images for scene categorization, where only scene categories are provided and objects are treated as latent topics or themes [9]. [sent-97, score-2.607]

36 Some empirical results on object annotation will be reported when labeled objects are available. [sent-99, score-0.272]

37 We use this joint model as a running example to demonstrate the basic principle of performing max-margin learning for the widely applied upstream scene understanding models because it is well-motivated, very generic, and covers many other existing scene understanding models. [sent-100, score-1.794]

38 For example, if we do not incorporate the global scene representation G, the joint model reduces to a model similar to [14, 6, 23]. [sent-101, score-0.793]

39 Moreover, the generic joint model provides a good framework for studying the relative contributions of local object modeling and global scene representation, which has been shown to be useful for scene classification [20] and object detection [17] tasks. [sent-102, score-1.663]

40 3 Weak Coupling of MLE in Learning Upstream Scene Models. To learn an upstream scene model, the most commonly used method is maximum likelihood estimation (MLE), as in [23, 6, 14]. [sent-103, score-0.969]

41 In this section, we discuss the weakness of MLE for learning upstream scene models and motivate the max-margin approach. [sent-104, score-0.983]

42 Since L_{s,−θ} does not depend on θ, the MLE estimation of the conditional scene model is to solve max_θ Σ_d log p(s_d|g_d, θ), (3) which does not depend on the latent object model. [sent-108, score-0.922]
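
Because Eq. (3) involves only the pairs (s_d, g_d), fitting θ by MLE reduces to a standalone multinomial logistic regression on the global features. A sketch of the objective under an assumed softmax conditional scene model:

    import numpy as np

    def mle_scene_objective(theta, G, S):
        """Eq. (3): sum_d log p(s_d | g_d, theta) for a softmax scene model.

        G: (D, F) global features; S: (D,) integer scene labels. Note the
        decoupling: no latent-object quantity appears in this objective.
        """
        logits = G @ theta.T                          # (D, num_scenes) scores
        log_Z = np.logaddexp.reduce(logits, axis=1)   # per-image log normalizer
        return np.sum(logits[np.arange(len(S)), S] - log_Z)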

43 This is inconsistent with the prediction rule (1), which depends on both the conditional scene model (i.e., p(s|g, θ)) and the latent object model. [sent-109, score-0.816]

44 This decoupling will result in an imbalanced combination between the conditional scene and object models for prediction, as we explain below. [sent-113, score-0.834]

45 By introducing a variational distribution q_s(ψ, o) to approximate the posterior p(ψ, o|s, r, x, Θ) and using Jensen's inequality, we can derive a lower bound L_{s,−θ} ≥ E_{q_s}[log p(ψ, o, r, x|s, Θ)] + H(q_s) ≜ L_{−θ}(q_s, Θ), (4) where H(q) = −E_q[log q] is the entropy. [sent-117, score-0.351]
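
Reconstructed in LaTeX, the bound in Eq. (4) is one application of Jensen's inequality to the marginal log-likelihood (sums over the discrete o are folded into the integral for brevity):

    \mathcal{L}_{s,-\theta}
      = \log \int q_s(\psi,o)\,\frac{p(\psi,o,r,x \mid s,\Theta)}{q_s(\psi,o)}\,d\psi\,do
      \;\ge\; \mathbb{E}_{q_s}\!\big[\log p(\psi,o,r,x \mid s,\Theta)\big] + H(q_s)
      \;\triangleq\; \mathcal{L}_{-\theta}(q_s,\Theta),
    \qquad H(q) = -\mathbb{E}_q[\log q].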

46 Then, the intractable prediction rule (1) can be approximated with the variational prediction rule ŝ = arg max_s ( log p(s|g, θ) + L_{−θ}(q_s, Θ) ). (5) [sent-118, score-0.36]
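
Operationally, rule (5) scores each candidate scene by the conditional scene model plus that scene's variational lower bound. A sketch in which fit_q and elbo are hypothetical helpers standing in for the Appendix's inference routines:

    import numpy as np

    def predict_scene(g, r, x, theta, Theta, num_scenes, fit_q, elbo):
        """Rule (5): argmax_s [ log p(s|g, theta) + L_{-theta}(q_s, Theta) ]."""
        logits = theta @ g
        log_p_s = logits - np.logaddexp.reduce(logits)   # log p(s|g, theta)
        scores = [log_p_s[s] + elbo(fit_q(s, r, x, Theta), Theta)
                  for s in range(num_scenes)]
        return int(np.argmax(scores))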

47 See the Appendix for the inference of q_s as involved in the prediction rule (5) and the estimation of Θ_{−θ}. [sent-120, score-0.42]

48 Now, we examine the effects of the conditional scene model p(s|g, θ) in making a prediction via the prediction rule (5). [sent-121, score-0.886]

49 We can see that in MLE the conditional scene model plays a very weak role in making a prediction when it is combined with the object model. [sent-124, score-0.865]

50 As shown in Fig. 1 (b-right), in the max-margin approach to be presented, the conditional scene model plays a much more influential role in making a prediction via the rule (5). [sent-135, score-0.816]

51 This results in a better balanced combination between the scene and the object models. [sent-136, score-0.75]

52 All our discussions are concentrated on learning upstream supervised topic models, as generically represented by the model in Fig. 1. [sent-139, score-0.526]

53 4 Max-Margin Training. Now, we present the max-margin method for learning upstream scene understanding models. [sent-141, score-0.967]

54 Here ∆ℓ_d(s) is a loss function (e.g., 0/1 loss), and ∆F_d(s; Θ) = F(s_d, g_d, r_d, x_d; Θ) − F(s, g_d, r_d, x_d; Θ) is the margin favored by the true category s_d over any other category s. [sent-148, score-0.444]

55 As in MLE, we use a variational distribution qs to approximate it. [sent-150, score-0.315]

56 Using Eq. (7) and applying the principle of regularized empirical risk minimization, we define the max-margin learning of the joint scene and object model as solving min_Θ Ω(Θ) + λ Σ_d ( −max_{q_{s_d}} L_{−θ}(q_{s_d}) ) + C R_hinge(Θ), (8) where Ω(Θ) is a regularizer of the parameters. [sent-158, score-1.066]
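
A sketch of evaluating the objective in problem (8); the hinge term uses the standard multiclass form implied by ∆ℓ_d and ∆F_d, and every helper passed in (Omega, F, delta_loss, neg_best_elbo) is hypothetical:

    def max_margin_objective(Theta, data, num_scenes, Omega, F, delta_loss,
                             neg_best_elbo, lam, C):
        """Problem (8): Omega(Theta) + lam * sum_d(-max_q L) + C * R_hinge(Theta)."""
        elbo_term, hinge = 0.0, 0.0
        for (s_d, g_d, r_d, x_d) in data:
            elbo_term += neg_best_elbo(s_d, g_d, r_d, x_d, Theta)
            # R_hinge: worst violation of the margin dF_d(s) >= delta_loss(s, s_d)
            f_true = F(s_d, g_d, r_d, x_d, Theta)
            viol = max(delta_loss(s, s_d) - (f_true - F(s, g_d, r_d, x_d, Theta))
                       for s in range(num_scenes) if s != s_d)
            hinge += max(0.0, viol)
        return Omega(Theta) + lam * elbo_term + C * hinge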

57 When λ → ∞, the problem (8) reduces to the standard MLE of the joint scene model with a fixed uniform prior on scene classes. [sent-162, score-1.331]

58 Here, we minimize a hinge loss, which is defined on the joint prediction rule, while MLE minimizes the log-likelihood loss log p(s_d|g_d, θ), which does not depend on the latent object model. [sent-164, score-0.327]

59 Therefore, our approach can be expected to achieve a closer dependence between the conditional scene model and the latent object model. [sent-165, score-0.871]

60 First, we infer the optimal variational posterior q_s^⋆ = arg max_{q_s} L_{−θ}(q_s) for each s and each training image. [sent-177, score-0.344]

61 The first step of inferring q_s^⋆ is to compute the discriminant function F under the current model. [sent-181, score-0.288]

62 This term indicates that the estimation of the scene classification model is influenced by the topic discovery procedure, which finds an optimum posterior distribution q^⋆. [sent-202, score-0.915]

63 If ∆L_d^⋆(s) < 0 for some s ≠ s_d, which means it is very likely that a wrong scene s explains the image content better than the true scene s_d, then the term ∆L_d^⋆(s) acts to augment the linear decision boundary θ so as to make a correct prediction on this image by using the prediction rule (5). [sent-203, score-1.774]

64 If ∆L_d^⋆(s) > 0, which means the true scene can explain the image content better than s, then the linear decision boundary can be slightly relaxed. [sent-204, score-0.683]

65 Solving the loss-augmented SVM will result in an amplified influence of the scene classification model in the joint predictive rule (5), as shown in Fig. 1 (b). [sent-207, score-0.766]
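
Putting the steps together, training alternates posterior inference with an online loss-augmented SVM update. A schematic of the variational EM loop; fit_q, elbo, loss_augmented_svm_step, and update_topic_params are hypothetical stand-ins for the paper's subroutines:

    def max_margin_em(images, labels, num_scenes, Theta, fit_q, elbo,
                      loss_augmented_svm_step, update_topic_params, iters=50):
        """Schematic variational EM for max-margin scene-model training."""
        for _ in range(iters):
            # Step 1: infer q_s* = argmax_q L_{-theta}(q) per scene, per image
            Q = [[fit_q(s, img, Theta) for s in range(num_scenes)]
                 for img in images]
            # dL*_d(s) couples topic discovery to the classifier: negative
            # values augment the decision boundary, positive values relax it
            dL = [[elbo(Q[d][y], Theta) - elbo(Q[d][s], Theta)
                   for s in range(num_scenes)]
                  for d, y in enumerate(labels)]
            # Step 2: update theta via the online loss-augmented SVM over dL
            theta = loss_augmented_svm_step(Theta, images, labels, dL)
            # Update the remaining parameters Theta_{-theta} as in MLE
            Theta = update_topic_params(Theta, Q, theta)
        return Theta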

66 5 Experiments. Now, we present an empirical evaluation of our approach on the sports [13] and MIT indoor scene [20] datasets. [sent-209, score-0.821]

67 Our goal is to demonstrate the advantages of the max-margin method over the MLE for learning upstream scene models with or without global features. [sent-210, score-1.04]

68 Although the joint model in Fig. 1 can also be used for object annotation, we report the performance on scene categorization only, which is our main focus in this paper. [sent-212, score-0.82]

69 5.1 Datasets and Features. The sports data contain 1574 diverse scene images from 8 categories (bocce, croquet, polo, rowing, sailing, badminton, rock climbing, snowboarding). [sent-215, score-0.757]

70 The indoor scene dataset [20] contains 15620 scene images from 67 categories as listed in Table 2. [sent-217, score-1.476]

71 5.2 Models. For the upstream scene model as in Fig. 1, [sent-224, score-0.941]

72 we compare the max-margin learning with the MLE method, and we denote the scene models trained with max-margin training and MLE by MM-Scene and MLE-Scene, respectively. [sent-225, score-0.658]

73 For both methods, we evaluate the effectiveness of global features, and we denote the scene models without global features by MM-Scene-NG and MLE-Scene-NG, respectively. [sent-226, score-0.86]

74 Since our main goal in this paper is to demonstrate the advantages of max-margin learning in upstream supervised topic models, rather than dominance of such models over all others, we just compare with one example of downstream models–the multi-class sLDA (Multi-sLDA) [25]. [sent-227, score-0.62]

75 For the downstream Multi-sLDA, the image-wise scene category variable S is generated from latent object variables O via a softmax function. [sent-229, score-0.928]

76 Finally, to show the usefulness of the object model in scene categorization, we also compare with the margin-based multi-class SVM [7] and likelihood-based logistic regression for scene classification based on the global features. [sent-231, score-1.507]

77 The average overall accuracy of scene categorization … [figure panel: Scene Classification Accuracy; axis ticks removed]. [sent-244, score-0.741]

78 [Figure panel] … confusion matrix of the max-margin scene model with 100 latent topics. [sent-253, score-0.732]

79 [Over]all, the max-margin scene model with global features … [axis values removed]. [sent-258, score-0.786]

80 Interestingly, although we provide only scene categories as supervised information during training, our best performance with … (interleaved caption, Figure 3: Classification accuracy of different models with respect to the number of topics; x-axis: # Topics.) [sent-261, score-0.798]

81 The outstanding performance of the max-margin method for scene classification can be understood from the following aspects. [sent-263, score-0.633]

82 Global features: from the comparison between the scene models with and without global features, we can see that using the gist features can significantly (about 8 percent) improve the scene categorization accuracy in both MLE and max-margin training. [sent-266, score-1.597]

83 Specifically, the max-margin scene model achieves an accuracy of about 0.83 in scene classification, [sent-269, score-0.694]

84 and the likelihood-based model obtains an accuracy of about 0. [sent-270, score-0.694]

85 The comparison with SVM and LR (Fig. 1 (c)), which use global features only, indicates that modeling objects can facilitate scene categorization. [sent-273, score-0.837]

86 This is because the scene classification model is influenced by the latent object modeling through the term ∆L_d^⋆(s), which can improve the decision boundary of a standard linear SVM for those images that have negative scores of ∆L_d^⋆(s), as we have discussed in the online loss-augmented SVM. [sent-274, score-0.909]

87 However, object modeling does not improve the classification accuracy and sometimes it can even be harmful when the scene model is learned with the standard MLE. [sent-275, score-0.811]

88 We also compare with the theme model [9], which is for scene categorization only. [sent-287, score-0.762]

89 [Confusion-matrix figure residue: category labels bocce, croquet, polo, rowing, sailing, badminton, rock climbing, snowboarding.] [sent-293, score-0.337]

90 Finally, we examine the influence of the loss function ∆ℓ_d(s) (0/1 vs. 0/5) on the performance of the max-margin scene model. [sent-426, score-0.633]
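
If the legend labels 0/1 and 0/5 denote constant misclassification costs of 1 and 5 (a reading of the figure, not stated explicitly here), the loss is simply:

    def delta_loss(s, s_true, cost=1):
        """0/1 loss with cost=1; 0/5 loss with cost=5 (interpretation assumed)."""
        return 0 if s == s_true else cost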

91 5.4 Scene Categorization on the 67-Class MIT Indoor Scene Dataset. The MIT indoor dataset [20] contains complex scene images. [sent-442, score-0.758]

92 We compare the joint scene model with … [figure legend residue: MLE-Scene, MM-Scene]. [sent-449, score-0.698]

93 We can see that the joint scene model (both MLE-Scene and MM-Scene) significantly outperforms SVM and LR that use global features only. [sent-462, score-0.828]

94 By using max-margin training, the joint scene model (i.e., MM-Scene) achieves further improvements. [sent-464, score-0.698]

95 6 Conclusions. In this paper, we address the weak coupling problem of the commonly used maximum likelihood estimation in learning upstream scene understanding models by presenting a joint maximum margin and maximum likelihood learning method. [sent-468, score-1.159]

96 The proposed approach achieves a close interplay between the prediction model estimation and latent topic discovery, and thereby a well-balanced prediction rule. [sent-469, score-0.456]

97 Finally, we demonstrate the advantages of max-margin training and the effectiveness of using global features in scene understanding on both an 8-category sports dataset and the 67-class MIT indoor scene data. [sent-471, score-1.682]

98 Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes. [sent-512, score-0.361]

99 A Bayesian hierarchical model for learning natural scene categories. [sent-529, score-0.656]

100 Towards total scene understanding: Classification, annotation and segmentation in an automatic framework. [sent-560, score-0.738]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('scene', 0.633), ('mle', 0.325), ('upstream', 0.285), ('qs', 0.253), ('qsd', 0.251), ('topic', 0.164), ('object', 0.117), ('indoor', 0.101), ('sd', 0.1), ('sports', 0.087), ('gd', 0.084), ('annotation', 0.081), ('fd', 0.081), ('latent', 0.076), ('objects', 0.074), ('global', 0.072), ('svm', 0.071), ('prediction', 0.07), ('categorization', 0.07), ('rule', 0.068), ('gist', 0.068), ('downstream', 0.067), ('variational', 0.062), ('bocce', 0.059), ('croquet', 0.059), ('polo', 0.059), ('rowing', 0.059), ('rockclimbing', 0.059), ('features', 0.058), ('sailing', 0.056), ('supervised', 0.054), ('classi', 0.053), ('medlda', 0.052), ('snowboarding', 0.052), ('image', 0.05), ('bar', 0.049), ('understanding', 0.049), ('categories', 0.048), ('badminton', 0.045), ('joint', 0.042), ('room', 0.04), ('weakness', 0.04), ('supervision', 0.04), ('xd', 0.039), ('sift', 0.039), ('cvpr', 0.038), ('accuracy', 0.038), ('imbalanced', 0.037), ('images', 0.037), ('mx', 0.036), ('slda', 0.036), ('theme', 0.036), ('posterior', 0.036), ('discriminant', 0.035), ('mm', 0.035), ('category', 0.035), ('multi', 0.033), ('lr', 0.033), ('blei', 0.031), ('classification', 0.031), ('discovery', 0.03), ('roi', 0.03), ('ctms', 0.029), ('dstm', 0.029), ('lsd', 0.029), ('maxmargin', 0.029), ('maxqs', 0.029), ('maxqsd', 0.029), ('rnm', 0.029), ('subway', 0.029), ('estimation', 0.029), ('logistic', 0.029), ('restaurant', 0.028), ('margin', 0.028), ('texture', 0.027), ('topics', 0.026), ('mr', 0.026), ('generic', 0.026), ('annotated', 0.026), ('disclda', 0.026), ('rhinge', 0.026), ('xnm', 0.026), ('green', 0.026), ('ls', 0.026), ('models', 0.025), ('advantages', 0.025), ('region', 0.024), ('segmentation', 0.024), ('coupling', 0.024), ('improvements', 0.024), ('dataset', 0.024), ('interplay', 0.024), ('inside', 0.024), ('online', 0.023), ('model', 0.023), ('ng', 0.022), ('kitchen', 0.022), ('likelihood', 0.022), ('log', 0.022), ('conditional', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models

Author: Jun Zhu, Li-jia Li, Li Fei-fei, Eric P. Xing

Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.

2 0.27681178 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing

Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging on the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.

3 0.2661767 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

Author: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, Tsuhan Chen

Abstract: In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many subtasks. It is desirable to have an algorithm that can capture such correlation without requiring to make any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), that maximizes the joint likelihood of the sub-tasks, while requiring only a ‘black-box’ interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers information about what error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in two different domains: (i) scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection, and (ii) robotic grasping, where we consider grasp point detection and object classification. 1

4 0.24662259 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence

Author: Yang Wang, Greg Mori

Abstract: We propose a discriminative latent model for annotating images with unaligned object-level textual annotations. Instead of using the bag-of-words image representation currently popular in the computer vision community, our model explicitly captures more intricate relationships underlying visual and textual information. In particular, we model the mapping that translates image regions to annotations. This mapping allows us to relate image regions to their corresponding annotation terms. We also model the overall scene label as latent information. This allows us to cluster test images. Our training data consist of images and their associated annotations. But we do not have access to the ground-truth regionto-annotation mapping or the overall scene label. We develop a novel variant of the latent SVM framework to model them as latent variables. Our experimental results demonstrate the effectiveness of the proposed model compared with other baseline methods.

5 0.23420642 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces

Author: Abhinav Gupta, Martial Hebert, Takeo Kanade, David M. Blei

Abstract: There has been a recent push in extraction of 3D spatial layout of scenes. However, none of these approaches model the 3D interaction between objects and the spatial layout. In this paper, we argue for a parametric representation of objects in 3D, which allows us to incorporate volumetric constraints of the physical world. We show that augmenting current structured prediction techniques with volumetric reasoning significantly improves the performance of the state-of-the-art. 1

6 0.14872916 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

7 0.11742084 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

8 0.11182833 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach

9 0.092660017 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

10 0.086954713 133 nips-2010-Kernel Descriptors for Visual Recognition

11 0.086259753 286 nips-2010-Word Features for Latent Dirichlet Allocation

12 0.086151406 149 nips-2010-Learning To Count Objects in Images

13 0.083245717 194 nips-2010-Online Learning for Latent Dirichlet Allocation

14 0.079795077 228 nips-2010-Reverse Multi-Label Learning

15 0.075820059 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process

16 0.073182032 1 nips-2010-(RF)^2 -- Random Forest Random Field

17 0.070932828 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering

18 0.070847869 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data

19 0.066821381 177 nips-2010-Multitask Learning without Label Correspondences

20 0.066262223 103 nips-2010-Generating more realistic images using gated MRF's


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.186), (1, 0.109), (2, -0.154), (3, -0.296), (4, -0.079), (5, -0.009), (6, -0.029), (7, 0.032), (8, -0.002), (9, 0.046), (10, 0.059), (11, 0.113), (12, -0.102), (13, 0.085), (14, 0.04), (15, 0.077), (16, 0.137), (17, -0.025), (18, 0.205), (19, 0.072), (20, 0.039), (21, -0.068), (22, -0.085), (23, 0.042), (24, -0.046), (25, -0.059), (26, -0.005), (27, 0.131), (28, 0.033), (29, -0.02), (30, -0.014), (31, -0.06), (32, -0.001), (33, -0.014), (34, -0.084), (35, 0.052), (36, 0.001), (37, 0.085), (38, -0.077), (39, 0.024), (40, 0.046), (41, -0.021), (42, 0.063), (43, -0.058), (44, 0.149), (45, -0.036), (46, -0.055), (47, 0.026), (48, -0.047), (49, 0.084)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96438593 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models

Author: Jun Zhu, Li-jia Li, Li Fei-fei, Eric P. Xing

Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.

2 0.84684449 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces

Author: Abhinav Gupta, Martial Hebert, Takeo Kanade, David M. Blei

Abstract: There has been a recent push in extraction of 3D spatial layout of scenes. However, none of these approaches model the 3D interaction between objects and the spatial layout. In this paper, we argue for a parametric representation of objects in 3D, which allows us to incorporate volumetric constraints of the physical world. We show that augmenting current structured prediction techniques with volumetric reasoning significantly improves the performance of the state-of-the-art. 1

3 0.82718974 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

Author: Li-jia Li, Hao Su, Li Fei-fei, Eric P. Xing

Abstract: Robust low-level image features have been proven to be effective representations for a variety of visual recognition tasks such as object recognition and scene classification; but pixels, or even local image patches, carry little semantic meanings. For high level visual tasks, such low-level image representations are potentially not enough. In this paper, we propose a high-level image representation, called the Object Bank, where an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors, blind to the testing dataset or visual task. Leveraging on the Object Bank representation, superior performances on high level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM. Sparsity algorithms make our representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.

4 0.70936382 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

Author: Mario Fritz, Kate Saenko, Trevor Darrell

Abstract: Metric constraints are known to be highly discriminative for many objects, but if training is limited to data captured from a particular 3-D sensor the quantity of training data may be severly limited. In this paper, we show how a crucial aspect of 3-D information–object and feature absolute size–can be added to models learned from commonly available online imagery, without use of any 3-D sensing or reconstruction at training time. Such models can be utilized at test time together with explicit 3-D sensing to perform robust search. Our model uses a “2.1D” local feature, which combines traditional appearance gradient statistics with an estimate of average absolute depth within the local window. We show how category size information can be obtained from online images by exploiting relatively unbiquitous metadata fields specifying camera intrinstics. We develop an efficient metric branch-and-bound algorithm for our search task, imposing 3-D size constraints as part of an optimal search for a set of features which indicate the presence of a category. Experiments on test scenes captured with a traditional stereo rig are shown, exploiting training data from from purely monocular sources with associated EXIF metadata. 1

5 0.69650716 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence

Author: Yang Wang, Greg Mori

Abstract: We propose a discriminative latent model for annotating images with unaligned object-level textual annotations. Instead of using the bag-of-words image representation currently popular in the computer vision community, our model explicitly captures more intricate relationships underlying visual and textual information. In particular, we model the mapping that translates image regions to annotations. This mapping allows us to relate image regions to their corresponding annotation terms. We also model the overall scene label as latent information. This allows us to cluster test images. Our training data consist of images and their associated annotations. But we do not have access to the ground-truth regionto-annotation mapping or the overall scene label. We develop a novel variant of the latent SVM framework to model them as latent variables. Our experimental results demonstrate the effectiveness of the proposed model compared with other baseline methods.

6 0.68252701 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

7 0.56575727 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process

8 0.55491668 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

9 0.55129445 149 nips-2010-Learning To Count Objects in Images

10 0.5348531 245 nips-2010-Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake

11 0.49656615 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach

12 0.4925715 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

13 0.49038211 256 nips-2010-Structural epitome: a way to summarize one’s visual experience

14 0.42375076 1 nips-2010-(RF)^2 -- Random Forest Random Field

15 0.40185666 228 nips-2010-Reverse Multi-Label Learning

16 0.38785458 17 nips-2010-A biologically plausible network for the computation of orientation dominance

17 0.3862285 95 nips-2010-Feature Transitions with Saccadic Search: Size, Color, and Orientation Are Not Alike

18 0.36477888 267 nips-2010-The Multidimensional Wisdom of Crowds

19 0.34725037 235 nips-2010-Self-Paced Learning for Latent Variable Models

20 0.34171143 40 nips-2010-Beyond Actions: Discriminative Models for Contextual Group Activities


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.026), (27, 0.081), (30, 0.052), (35, 0.026), (45, 0.184), (50, 0.042), (52, 0.015), (60, 0.021), (77, 0.024), (90, 0.433)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.87733233 205 nips-2010-Permutation Complexity Bound on Out-Sample Error

Author: Malik Magdon-Ismail

Abstract: We define a data dependent permutation complexity for a hypothesis set H, which is similar to a Rademacher complexity or maximum discrepancy. The permutation complexity is based (like the maximum discrepancy) on dependent sampling. We prove a uniform bound on the generalization error, as well as a concentration result which means that the permutation estimate can be efficiently estimated.

2 0.86425346 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

Author: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, Tsuhan Chen

Abstract: In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many subtasks. It is desirable to have an algorithm that can capture such correlation without requiring to make any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), that maximizes the joint likelihood of the sub-tasks, while requiring only a ‘black-box’ interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers information about what error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in two different domains: (i) scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection, and (ii) robotic grasping, where we consider grasp point detection and object classification. 1

same-paper 3 0.83898842 137 nips-2010-Large Margin Learning of Upstream Scene Understanding Models

Author: Jun Zhu, Li-jia Li, Li Fei-fei, Eric P. Xing

Abstract: Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.

4 0.83172226 250 nips-2010-Spectral Regularization for Support Estimation

Author: Ernesto D. Vito, Lorenzo Rosasco, Alessandro Toigo

Abstract: In this paper we consider the problem of learning from data the support of a probability distribution when the distribution does not have a density (with respect to some reference measure). We propose a new class of regularized spectral estimators based on a new notion of reproducing kernel Hilbert space, which we call “completely regular”. Completely regular kernels allow to capture the relevant geometric and topological properties of an arbitrary probability space. In particular, they are the key ingredient to prove the universal consistency of the spectral estimators and in this respect they are the analogue of universal kernels for supervised problems. Numerical experiments show that spectral estimators compare favorably to state of the art machine learning algorithms for density support estimation.

5 0.74323964 127 nips-2010-Inferring Stimulus Selectivity from the Spatial Structure of Neural Network Dynamics

Author: Kanaka Rajan, L Abbott, Haim Sompolinsky

Abstract: How are the spatial patterns of spontaneous and evoked population responses related? We study the impact of connectivity on the spatial pattern of fluctuations in the input-generated response, by comparing the distribution of evoked and intrinsically generated activity across the different units of a neural network. We develop a complementary approach to principal component analysis in which separate high-variance directions are derived for each input condition. We analyze subspace angles to compute the difference between the shapes of trajectories corresponding to different network states, and the orientation of the low-dimensional subspaces that driven trajectories occupy within the full space of neuronal activity. In addition to revealing how the spatiotemporal structure of spontaneous activity affects input-evoked responses, these methods can be used to infer input selectivity induced by network dynamics from experimentally accessible measures of spontaneous activity (e.g. from voltage- or calcium-sensitive optical imaging experiments). We conclude that the absence of a detailed spatial map of afferent inputs and cortical connectivity does not limit our ability to design spatially extended stimuli that evoke strong responses.

6 0.74225432 178 nips-2010-Multivariate Dyadic Regression Trees for Sparse Learning Problems

7 0.662673 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

8 0.65908372 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers

9 0.61308181 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers

10 0.5972923 173 nips-2010-Multi-View Active Learning in the Non-Realizable Case

11 0.59691733 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression

12 0.5923925 282 nips-2010-Variable margin losses for classifier design

13 0.58894265 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis

14 0.58179814 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces

15 0.58106947 117 nips-2010-Identifying graph-structured activation patterns in networks

16 0.57773149 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

17 0.57227874 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability

18 0.5701859 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs

19 0.56999701 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification

20 0.5584147 281 nips-2010-Using body-anchored priors for identifying actions in single images