nips nips2013 nips2013-190 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Carl Doersch, Abhinav Gupta, Alexei A. Efros
Abstract: Recent work on mid-level visual representations aims to capture information at the level of complexity higher than typical “visual words”, but lower than full-blown semantic objects. Several approaches [5, 6, 12, 23] have been proposed to discover mid-level visual elements that are both 1) representative, i.e., frequently occurring within a visual dataset, and 2) visually discriminative. However, the current approaches are rather ad hoc and difficult to analyze and evaluate. In this work, we pose visual element discovery as discriminative mode seeking, drawing connections to the well-known and well-studied mean-shift algorithm [2, 1, 4, 8]. Given a weakly-labeled image collection, our method discovers visually-coherent patch clusters that are maximally discriminative with respect to the labels. One advantage of our formulation is that it requires only a single pass through the data. We also propose the Purity-Coverage plot as a principled way of experimentally analyzing and evaluating different visual discovery approaches, and compare our method against prior work on the Paris Street View dataset of [5]. We also evaluate our method on the task of scene classification, demonstrating state-of-the-art performance on the MIT Scene-67 dataset.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Recent work on mid-level visual representations aims to capture information at the level of complexity higher than typical “visual words”, but lower than full-blown semantic objects. [sent-8, score-0.249]
2 Several approaches [5, 6, 12, 23] have been proposed to discover mid-level visual elements that are both 1) representative, [sent-9, score-0.249]
3 i.e., frequently occurring within a visual dataset, and 2) visually discriminative. [sent-11, score-0.347]
4 In this work, we pose visual element discovery as discriminative mode seeking, drawing connections to the well-known and well-studied mean-shift algorithm [2, 1, 4, 8]. [sent-13, score-0.604]
5 Given a weakly-labeled image collection, our method discovers visually-coherent patch clusters that are maximally discriminative with respect to the labels. [sent-14, score-0.543]
6 We also propose the Purity-Coverage plot as a principled way of experimentally analyzing and evaluating different visual discovery approaches, and compare our method against prior work on the Paris Street View dataset of [5]. [sent-16, score-0.369]
7 1 Introduction In terms of sheer size, visual data is, by most accounts, the biggest “Big Data” out there. [sent-18, score-0.249]
8 Standard machine learning algorithms (e.g., [13]) are not equipped to handle it directly, at the raw pixel level, making research on finding good visual representations particularly relevant and timely. [sent-21, score-0.249]
9 Currently, the most popular visual representations in machine learning are based on “visual words” [24], which are obtained by unsupervised clustering (k-means) of local features (SIFT) over a large dataset. [sent-22, score-0.32]
10 Recently, several approaches [5, 6, 11, 12, 15, 23, 26, 27] have proposed mining visual data for discriminative mid-level visual elements, i.e. [sent-28, score-0.69]
11 These elements are discovered using weak labels, e.g., scene categories [12] or GPS coordinates [5] (but can also run unsupervised [23]), and have been recently used for tasks including image classification [12, 23, 27], object detection [6], visual data mining [5, 15], action recognition [11], and geometry estimation [7]. [sent-33, score-0.613]
12 But how are informative visual elements to be identified in the weakly-labeled visual dataset? [sent-34, score-0.706]
13 The idea is to search for clusters of image patches that are both 1) representative, i.e., frequently occurring, and 2) visually discriminative. [sent-35, score-0.443]
14 Unfortunately, algorithms for finding patches that fit these criteria remain rather ad-hoc and poorly understood. [sent-38, score-0.279]
15 Figure 1: The distribution of patches in HOG feature space is very non-uniform and absolute distances cannot be trusted. [sent-50, score-0.321]
16 We show two patches with their 5 nearest-neighbors from the Paris Street View dataset [5]; beneath each nearest neighbor is its distance from the query. [sent-51, score-0.347]
17 Although the nearest neighbors on the left are visually much better, their distances are more than twice those on the right, meaning that the actual densities of the two regions will differ by a factor of more than 2^d, where d is the intrinsic dimensionality of patch feature space. [sent-52, score-0.346]
18 We show that the well-known, well-understood mean-shift algorithm can produce visual elements that are more representative and discriminative than those of previous approaches. [sent-54, score-0.655]
19 Mining visual elements from a large dataset is difficult for a number of reasons. [sent-55, score-0.525]
20 First, the search space is huge: a typical dataset for visual data mining has tens of thousands of images, and finding something in an image (e.g., [sent-56, score-0.423]
21 finding matches for a visual template) involves searching across tens of thousands of patches at different positions and scales. [sent-58, score-0.528]
22 To make matters worse, patch descriptors tend to be on the order of thousands of dimensions; not only is the curse of dimensionality a constant problem, but we must sift through terabytes of data. [sent-59, score-0.284]
23 And we are searching for a needle in a haystack: the vast majority of patches are actually uninteresting, either because they are rare (e.g. [sent-60, score-0.279]
24 The goal of mean-shift is to find the local maxima (modes) of a density using a sample from that density. [sent-65, score-0.273]
25 But in our case, we can use the weak labels to divide our data into two different subsets (“positive” (+) and “negative” (−)) and seek visual elements which appear only in the “positive” set and not in the “negative” set. [sent-69, score-0.494]
26 That is, we want to find points in feature space where the density of the positive set is large, and the density of the negative set is small. [sent-70, score-0.358]
27 While a number of algorithms exist for estimating ratios of densities (see [25] for a review), we did not find any that were particularly suitable for finding local maxima of density ratios. [sent-72, score-0.273]
28 Hence, the first contribution of our paper is to propose a discriminative variant of mean-shift for finding visual elements. [sent-73, score-0.39]
29 Similar to the way mean-shift performs gradient ascent on a density estimate, our algorithm performs gradient ascent on the density ratio (section 2). [sent-74, score-0.534]
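To make the gradient-ascent-on-a-ratio idea concrete, here is a minimal numerical sketch; the Gaussian kernel, fixed bandwidth, finite-difference gradient, and normalized step are all illustrative assumptions rather than the authors' algorithm:

```python
import numpy as np

def density(x, data, h):
    """Gaussian kernel density estimate at point x."""
    d2 = np.sum((data - x) ** 2, axis=1)
    return np.mean(np.exp(-d2 / (2 * h ** 2)))

def ratio_ascent(x0, pos, neg, h=1.0, step=0.1, iters=100, eps=1e-12):
    """Hill-climb the density ratio p_pos(x) / p_neg(x) by finite differences."""
    f = lambda y: density(y, pos, h) / (density(y, neg, h) + eps)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        grad = np.zeros_like(x)
        for d in range(x.size):
            e = np.zeros_like(x)
            e[d] = 1e-4
            grad[d] = (f(x + e) - f(x - e)) / 2e-4
        x += step * grad / (np.linalg.norm(grad) + eps)  # normalized ascent step
    return x

# Toy data: positives clustered at (2, 2), negatives spread around the origin.
rng = np.random.default_rng(0)
pos = rng.normal([2.0, 2.0], 0.3, size=(200, 2))
neg = rng.normal([0.0, 0.0], 1.5, size=(800, 2))
print(ratio_ascent(pos[0], pos, neg))  # converges near the positive cluster
```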
30 When we perform gradient ascent separately for each element as in standard mean-shift, however, we find that the most frequently-occurring elements tend to be over-represented. [sent-75, score-0.427]
31 Hence, section 3 describes a modification to our gradient ascent algorithm which uses inter-element communication to approximate common adaptive bandwidth procedures. [sent-76, score-0.358]
32 Finally, in section 4 we demonstrate that our algorithms produce visual elements which are more representative and discriminative than previous methods, and in section 5 we show they significantly improve performance in scene classification. [sent-77, score-0.811]
33 2 Mode Seeking on Density Ratios Our goal is to extract discriminative visual elements by finding the local maxima of the density ratio. [sent-78, score-0.871]
34 Here K is a kernel (e.g., a Gaussian), and h is a globally-shared bandwidth parameter. [sent-81, score-0.236]
35 The bandwidth defines how much the density is smoothed before gradient ascent is performed, meaning these estimators assume a roughly equal distribution of points in all regions of the space. [sent-82, score-0.518]
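For reference, the fixed-bandwidth estimator these two sentences describe presumably has the standard kernel-density form (the exact kernel used in the paper is not preserved in this extraction):

```latex
\hat{p}(x) \;=\; \frac{1}{n}\sum_{i=1}^{n} K\!\left(\frac{d(x_i, x)}{h}\right)
```

Mean-shift then performs gradient ascent on \hat{p}(x); the next sentences argue that a single global h is a poor fit for HOG space.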
36 Unfortunately, absolute distances in HOG feature space cannot be trusted, as shown in Figure 1: any kernel bandwidth which is large enough to work well in the left example will be far too large to work well in the right. [sent-83, score-0.321]
37 One way to deal with the non-uniformity of the feature space is to use an adaptive bandwidth [4]: that is, different bandwidths are used in different regions of the space. [sent-84, score-0.357]
38 We want to maximize the density ratio, so we simply divide the two density estimates. [sent-91, score-0.273]
39 We allow an adaptive bandwidth, but rather than associating a bandwidth with each datapoint, we compute it as a function of w which depends on the data. [sent-92, score-0.236]
40 Hence, we define B(w) as the value of b which satisfies \sum_{i=1}^{n_{neg}} \max(b - d(x_i^-, w), 0) = \beta (3), where \beta is a constant analogous to the bandwidth parameter, except that it directly controls how many negative datapoints are in each cluster. [sent-95, score-0.392]
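Because the left-hand side of (3) is piecewise linear and increasing in b, B(w) can be computed exactly in one pass over sorted distances; note also that, by this construction, the negative-set kernel sum equals \beta for every w, which is presumably what keeps the ratio's denominator constant. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def adaptive_b(dists, beta):
    """Solve sum_i max(b - d_i, 0) = beta for b (Eq. 3), given d_i >= 0.

    The left-hand side is piecewise linear and increasing in b, so we scan
    the sorted distances for the segment that contains the root.
    """
    d = np.sort(np.asarray(dists, dtype=float))
    csum = np.cumsum(d)
    for k in range(1, len(d) + 1):
        b = (beta + csum[k - 1]) / k          # root if exactly k terms are active
        upper = d[k] if k < len(d) else np.inf
        if d[k - 1] <= b <= upper:
            return b
    raise ValueError("no feasible b (beta must be positive)")

# Sanity check: the recovered b reproduces beta.
rng = np.random.default_rng(1)
d = rng.uniform(0.0, 5.0, size=50)
b = adaptive_b(d, beta=3.0)
print(np.sum(np.maximum(b - d, 0.0)))         # prints ~3.0
```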
41 This approach makes the implicit assumption that the distribution of the negatives captures the overall density of the patch space. [sent-98, score-0.317]
42 We can further rewrite the above equation as finding the local maxima of \sum_{i=1}^{n_{pos}} \max(w^\top x_i^+ - b, 0) - \lambda \|w\|^2, subject to \sum_{i=1}^{n_{neg}} \max(w^\top x_i^- - b, 0) = \beta. [sent-107, score-0.307]
43 We run this as an online algorithm by breaking the dataset into chunks and then mining, one chunk at a time, for patches where w^\top x - b > \epsilon for some small \epsilon, akin to “hard mining” for SVMs. [sent-121, score-0.409]
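A sketch of what such a chunked, single-pass loop could look like; only the chunking and the \epsilon threshold come from the text, while the gradient step and the re-balancing of b are our own simplifications:

```python
import numpy as np

def mine_online(chunks, w, b, eps=0.0, lr=1e-3, lam=0.01, beta=3.0):
    """Single pass over the data in chunks, keeping only "hard" patches.

    `chunks` yields (X_pos, X_neg) feature matrices one chunk at a time;
    patches with w.x - b <= eps are discarded, akin to hard mining for SVMs.
    """
    for X_pos, X_neg in chunks:
        hard_pos = X_pos[X_pos @ w - b > eps]
        hard_neg = X_neg[X_neg @ w - b > eps]
        if len(hard_pos) == 0:
            continue
        # Ascend sum_i max(w.x_i - b, 0) - lam * ||w||^2 over the hard positives.
        w = w + lr * (hard_pos.sum(axis=0) - 2.0 * lam * w)
        b = b - lr * len(hard_pos)
        # Crude substitute for the constraint: nudge b so the negative
        # hinge mass stays near beta.
        if len(hard_neg) > 0:
            b += 0.1 * (np.maximum(hard_neg @ w - b, 0).sum() - beta)
    return w, b
```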
44 One way to deal with this is to assign smaller bandwidths to patches in dense regions of the space [4], e. [sent-128, score-0.358]
45 , the window railing on row 1 of Figure 2 (middle) would hopefully have a smaller bandwidth and hence not match to the sidewalk barrier. [sent-130, score-0.363]
46 However, estimating a bandwidth for every datapoint in our setting is not practical, so we seek an approach which only requires one pass through the data. [sent-131, score-0.328]
47 Since patches in regions of the feature space with high density ratio will be members of many clusters, we want a mechanism that will reduce their bandwidth. [sent-132, score-0.535]
48 Specifically, we control how a single patch can contribute to multiple clusters by introducing a sharing weight \alpha_{i,j} for each patch i that is contained in a cluster j, akin to soft-assignment in EM GMM fitting. [sent-134, score-0.573]
49 Returning to our formulation, we maximize (again with respect to the w's and b's) \sum_{i=1}^{n_{pos}} \sum_{j=1}^{m} \alpha_{i,j} \max(w_j^\top x_i^+ - b_j, 0) - \lambda \sum_{j=1}^{m} \|w_j\|^2 [sent-135, score-0.235]
50 subject to \forall j: \sum_{i=1}^{n_{neg}} \max(w_j^\top x_i^- - b_j, 0) = \beta (6), where each \alpha_{i,j} is chosen such that any patch which is a member of multiple clusters gets a lower weight. [sent-137, score-0.428]
51 (6) also has a natural interpretation in terms of maximizing the “representativeness” of the set of clusters: clusters are rewarded for representing patches that are not represented by other clusters. [sent-138, score-0.388]
52 However, since w is roughly proportional to the density of the positive data, the bandwidth is only reduced when the density of positive data is high. [sent-141, score-0.542]
53 In each plot, purity measures the accuracy of the element detectors, whereas coverage captures how often they fire. [sent-174, score-0.579]
54 However, this goes against our mean-shift intuition: if two patches are really instances of the same element, then clusters initialized from those two points should converge to the same mode and not “compete” with one another. [sent-179, score-0.453]
55 Then we set \alpha_{i,j} = \max(w_j^\top x_i^+ - b_j, 0) \,/\, \big( \max(w_j^\top x_i^+ - b_j, 0) + \sum_{k=1}^{m} I(C_k \neq C_j) \max(w_k^\top x_i^+ - b_k, 0) \big) (7). In this way, any “competition” from elements that are too similar to each other is ignored. [sent-182, score-0.411]
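Equation (7) is straightforward to vectorize. The sketch below assumes scores[i, j] stores \max(w_j^\top x_i^+ - b_j, 0) and groups[j] stores the cluster id C_j from the agglomerative step described in the next sentence:

```python
import numpy as np

def sharing_weights(scores, groups):
    """Compute the alpha[i, j] of Eq. (7).

    scores[i, j] = max(w_j . x_i - b_j, 0) for positive patch i, element j;
    groups[j] = agglomerative-cluster id C_j of element j. Competition from
    elements in the same group as j is ignored, per the text.
    """
    n, m = scores.shape
    alpha = np.zeros_like(scores, dtype=float)
    for j in range(m):
        rivals = groups != groups[j]               # I(C_k != C_j)
        denom = scores[:, j] + scores[:, rivals].sum(axis=1)
        safe = np.where(denom > 0, denom, 1.0)     # avoid 0/0
        alpha[:, j] = np.where(denom > 0, scores[:, j] / safe, 0.0)
    return alpha

# Two elements in group 0 do not compete with each other, only with group 1.
s = np.array([[0.9, 0.8, 0.4]])
print(sharing_weights(s, groups=np.array([0, 0, 1])))  # [[0.692, 0.667, 0.190]]
```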
56 To obtain the clusters, we perform agglomerative (UPGMA) clustering on the set of element clusters, using the negative of the number of overlapping cluster members as a “distance” metric. [sent-183, score-0.242]
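One possible rendering of this step with SciPy's average-linkage (UPGMA) routine, assuming each element's top detections are available as a set of patch ids (the membership sets below are hypothetical):

```python
import numpy as np
from scipy.cluster.hierarchy import average, fcluster
from scipy.spatial.distance import squareform

def group_elements(membership, n_groups):
    """Cluster elements by member overlap, as described above.

    membership[j] is the set of patch ids among element j's top detections;
    distance between elements = -(number of shared members).
    """
    m = len(membership)
    D = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            D[a, b] = D[b, a] = -len(membership[a] & membership[b])
    D -= D.min()                                 # shift so distances are non-negative
    Z = average(squareform(D, checks=False))     # UPGMA = average linkage
    return fcluster(Z, t=n_groups, criterion="maxclust")

print(group_elements([{1, 2, 3}, {2, 3, 4}, {7, 8}], n_groups=2))  # [1 1 2]
```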
57 In practice, however, it is extremely rare that the exact same patch is a member of two different clusters; instead, clusters will have member patches that merely overlap with each other. [sent-184, score-0.661]
58 Then we compute \alpha_{i,j} for a given patch by averaging \alpha_{i,j,p} over all pixels p in the patch. [sent-186, score-0.261]
59 It is admittedly difficult to analyze how well these heuristics approximate the adaptive bandwidth approach of [4], and even there the setting of the bandwidth for each datapoint has heuristic aspects. [sent-188, score-0.609]
60 4 Evaluation via Purity-Coverage Plot Our aim is to discover visual elements that are maximally representative and discriminative. [sent-190, score-0.553]
61 To measure this, we define two quantities for a set of visual elements: coverage (which captures representativeness) and purity (which captures discriminativeness). [sent-191, score-0.731]
62 Given a held-out test set, visual elements will generate a set of patch detections. [sent-192, score-0.656]
63 We define the coverage of this set of patches to be the fraction of the pixels from the positive images claimed by at least one patch. [sent-193, score-0.675]
64 We define the purity of a set as the percentage of the patches that share the same label. [sent-194, score-0.555]
65 For an individual visual element, of course, there is an inherent trade-off between purity and coverage: if we lower the detection threshold, we cover more pixels but also increase the likelihood of making mistakes. [sent-195, score-0.653]
66 We could perform this analysis on any dataset containing positive and negative images, but [5] presents a dataset which is particularly suitable. [sent-197, score-0.216]
67 The goal is to mine visual elements which define the look and feel of a geographical locale, with a training set of 2,000 Paris Street View images and 8,000 non-Paris images. [sent-198, score-0.551]
68 Figure 4: Coverage versus the number of elements used in the representation (two panels: purity of 100% and purity of 90%; x-axis: number of elements, 100–500). [sent-212, score-0.208]
69 On the right, we lower the detection threshold until the elements are 90% pure. [sent-216, score-0.274]
70 Note: this is the same purity and coverage measure for the same elements as Figure 3, just plotted differently. [sent-217, score-0.69]
71 To plot the curve for a given value of purity p, we rank all patches by w^\top x - b independently for every element, and select, for a given element, all patches up until the last point where the element has the desired purity. [sent-220, score-0.931]
72 We then compute the coverage as the union of patches selected for every element. [sent-221, score-0.485]
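Putting the last two sentences together, a small sketch of the purity-coverage computation; the detection tuples are hypothetical stand-ins for real detector output:

```python
def coverage_at_purity(detections, purity_target):
    """Compute the covered pixel set at a fixed purity, per the text above.

    detections[j] = list of (score, is_positive, pixels) for element j, where
    `pixels` is the set of positive-image pixels the detection claims (empty
    for detections on negative images). All names here are hypothetical.
    """
    covered = set()
    for dets in detections:
        dets = sorted(dets, key=lambda d: -d[0])  # rank by w.x - b
        keep_until, n_pos = -1, 0
        for idx, (_, is_pos, _) in enumerate(dets):
            n_pos += is_pos
            if n_pos / (idx + 1) >= purity_target:
                keep_until = idx                  # last point at the desired purity
        for _, _, pixels in dets[:keep_until + 1]:
            covered |= pixels
    return covered

dets = [[(0.9, True, {1, 2}), (0.7, False, set()), (0.5, True, {4})]]
print(len(coverage_at_purity(dets, purity_target=0.6)))  # 3 pixels covered
```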
73 Because we are taking a union of patches, adding more elements can only increase coverage, but in practice we prefer concise representations, both for interpretability and for computational reasons. [sent-222, score-0.208]
74 Hence, to compare two element discovery methods, we must select exactly the same number of elements for both of them. [sent-223, score-0.393]
75 Hence, we select elements in the same way for all algorithms, which approximates an “ideal” selection for our measure. [sent-225, score-0.244]
76 Specifically, we first fix a level of purity (95%) and greedily select elements to maximize coverage (on the testing data) for that level of purity. [sent-226, score-0.726]
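This greedy step is ordinary set-cover maximization. A minimal sketch, assuming each candidate element's covered pixels at the fixed purity have been precomputed:

```python
def greedy_select(candidate_pixels, n_elements):
    """Greedy coverage maximization over elements at a fixed purity level.

    candidate_pixels[j] = set of pixels element j covers at that purity.
    """
    covered, chosen = set(), []
    for _ in range(n_elements):
        gains = [len(p - covered) for p in candidate_pixels]
        best = max(range(len(gains)), key=gains.__getitem__)
        if gains[best] == 0:
            break                                 # nothing left to cover
        chosen.append(best)
        covered |= candidate_pixels[best]
    return chosen, covered

chosen, covered = greedy_select([{1, 2, 3}, {3, 4}, {2, 3}], n_elements=2)
print(chosen, len(covered))                       # [0, 1] covering 4 pixels
```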
77 Hence, this ranking serves as an oracle to choose the “best” set of elements for covering the dataset at that level of purity. [sent-227, score-0.276]
78 While this ranking has a bias toward large elements (which inherently cover more pixels per detection), we believe that it provides a valuable comparison between algorithms. [sent-228, score-0.27]
79 We can also slice the same data differently, fixing a level of purity for all elements and varying the number of elements, as shown in Figure 4. [sent-230, score-0.484]
80 We initially train 20,000 visual elements for all the baselines, and select the top elements using the method above. [sent-233, score-0.701]
81 Each cluster is represented by a hyperplane which maximally separates a single seed patch from the negative dataset, learned via LDA, i.e., using a Gaussian model of the negatives. [sent-235, score-0.417]
82 To show the effects of re-clustering, “LDA Retrained” takes the top 5 positive-set patches retrieved in Exemplar LDA (including the initial patch itself), and repeats LDA, separating those 5 from the negative Gaussian. [sent-238, score-0.523]
83 Finally, “LDA Retrained 5 times” begins with elements initialized via the LDA retraining method, and retrains the LDA classifier, each time throwing out the previous top 5 used to train the previous LDA, and selecting a new top 5 from held-out data. [sent-240, score-0.245]
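A compact sketch of these LDA baselines, assuming the negative mean and covariance (mu_neg, cov_neg) are precomputed; the retraining loop is our paraphrase of the description above, not the authors' code:

```python
import numpy as np

def exemplar_lda(seed, mu_neg, cov_neg):
    """Hyperplane separating one seed patch from a Gaussian negative model."""
    return np.linalg.solve(cov_neg, seed - mu_neg)

def lda_retrained(seed, X_pos, mu_neg, cov_neg, rounds=1, top_k=5):
    """"LDA Retrained (x rounds)": refit on the current top-k positives,
    discarding previously used patches each round, per the description."""
    w = exemplar_lda(seed, mu_neg, cov_neg)
    used = np.zeros(len(X_pos), dtype=bool)
    for _ in range(rounds):
        scores = X_pos @ w
        scores[used] = -np.inf                    # throw out the previous top-k
        top = np.argsort(-scores)[:top_k]
        used[top] = True
        w = np.linalg.solve(cov_neg, X_pos[top].mean(axis=0) - mu_neg)
    return w
```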
84 Implementation details: We use the same patch descriptors described in [5] and whiten them following [10]. [sent-244, score-0.241]
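For completeness, descriptor whitening in the spirit of [10] can be sketched as follows; the regularization constant is our assumption:

```python
import numpy as np

def whiten(X, mu, cov, reg=1e-2):
    """Whiten descriptors with generic-patch statistics (mu, cov), loosely
    following [10]; the regularizer keeps the covariance invertible."""
    vals, vecs = np.linalg.eigh(cov + reg * np.eye(cov.shape[0]))
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return (X - mu) @ inv_sqrt
```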
85 We mine elements using the online version of our algorithm, with a chunk size of 1000 (200 Paris, 800 non-Paris per batch). [sent-245, score-0.309]
86 We set \beta = t/500, where t is the iteration number, such that the bandwidth increases in proportion to the number of samples. [sent-246, score-0.236]
87 We train the elements for about 200 iterations. Figure 5: For each correctly classified image (left), we show four elements (center) and a heatmap of the locations (right) that contributed most to the classification. [sent-247, score-0.471]
88 To compute \alpha_{i,j} for patch i and detector j, we actually use scale-space voxels rather than pixels, since a large detection can completely cover a small detection but not vice versa. [sent-267, score-0.375]
89 Finally, to reduce the impact of highly redundant textures, we divide \alpha_{i,j} by the total number of detections for element j in the image containing i. [sent-273, score-0.259]
90 5 Scene Classification Finally, we evaluate whether our visual element representation is useful for scene classification. [sent-275, score-0.502]
91 For instance, it may not be obvious why a corridor would be classified as a staircase, but we can see (top right) that the algorithm has identified the railings as a key staircase element, and has found no other staircase elements in the image. [sent-277, score-0.437]
92 For indoor scenes, objects within the scene are often more useful features than global scene statistics [12]: for instance, shoe shops are similar to other stores in global layout, but they mostly contain shoes. [sent-279, score-0.312]
93 We also used smaller descriptors: 6-by-6 HOG cells, corresponding to 64-by-64 patches and 1188-dimensional descriptors. [sent-284, score-0.279]
94 We again select elements by fixing purity and greedily selecting elements to maximize coverage, as above. [sent-285, score-0.728]
95 We even outperform the Improved Fisher Vector of [12], as well as IFV combined with discriminative patches (IFV+BoP). [sent-295, score-0.42]
96 6 Conclusion We developed an extension of the classic mean-shift algorithm to density ratio estimation, showing that the resulting algorithm could be used for element discovery, and demonstrating state-of-the-art results for scene classification. [sent-299, score-0.425]
97 Also, our elements are detected based only on individual patches, but images often contain global structures beyond patches. [sent-304, score-0.263]
98 The variable bandwidth mean shift and data-driven scale selection. [sent-333, score-0.28]
99 Object bank: A high-level image representation for scene classification and semantic feature sparsification. [sent-403, score-0.253]
100 Learning discriminative part detectors for image classification and cosegmentation. [sent-481, score-0.238]
wordName wordTfidf (topN-words)
[('patches', 0.279), ('purity', 0.276), ('visual', 0.249), ('bandwidth', 0.236), ('elements', 0.208), ('lda', 0.207), ('coverage', 0.206), ('patch', 0.199), ('retrained', 0.165), ('scene', 0.156), ('discriminative', 0.141), ('ifv', 0.126), ('maxima', 0.118), ('density', 0.118), ('clusters', 0.109), ('doersch', 0.101), ('element', 0.097), ('datapoint', 0.092), ('staircase', 0.089), ('bj', 0.083), ('cvpr', 0.081), ('guess', 0.08), ('paris', 0.079), ('ascent', 0.077), ('wj', 0.077), ('bop', 0.076), ('nneg', 0.076), ('npos', 0.076), ('sidewalk', 0.076), ('gupta', 0.075), ('street', 0.073), ('detections', 0.07), ('hog', 0.07), ('dataset', 0.068), ('exemplar', 0.068), ('gt', 0.066), ('cluster', 0.066), ('detection', 0.066), ('mode', 0.065), ('visually', 0.063), ('pixels', 0.062), ('hariharan', 0.062), ('chunk', 0.062), ('ini', 0.058), ('seeking', 0.057), ('representative', 0.057), ('sivic', 0.055), ('images', 0.055), ('image', 0.055), ('denominator', 0.055), ('ratio', 0.054), ('efros', 0.053), ('svm', 0.052), ('discovery', 0.052), ('classi', 0.052), ('mining', 0.051), ('abhinav', 0.051), ('centrist', 0.051), ('corridor', 0.051), ('noverlap', 0.051), ('pnneg', 0.051), ('railing', 0.051), ('ramesh', 0.051), ('modes', 0.048), ('max', 0.048), ('iccv', 0.048), ('gradient', 0.045), ('negative', 0.045), ('itera', 0.045), ('representativeness', 0.045), ('admittedly', 0.045), ('comaniciu', 0.045), ('shift', 0.044), ('detector', 0.044), ('kernel', 0.043), ('sift', 0.043), ('feature', 0.042), ('regions', 0.042), ('descriptors', 0.042), ('detectors', 0.042), ('cj', 0.039), ('maximally', 0.039), ('xing', 0.039), ('mine', 0.039), ('pm', 0.039), ('fraction', 0.038), ('divide', 0.037), ('member', 0.037), ('local', 0.037), ('bandwidths', 0.037), ('retraining', 0.037), ('centroid', 0.037), ('iarpa', 0.037), ('bk', 0.037), ('object', 0.036), ('select', 0.036), ('occurring', 0.035), ('datapoints', 0.035), ('positive', 0.035), ('clustering', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
Author: Carl Doersch, Abhinav Gupta, Alexei A. Efros
Abstract: Recent work on mid-level visual representations aims to capture information at the level of complexity higher than typical “visual words”, but lower than full-blown semantic objects. Several approaches [5, 6, 12, 23] have been proposed to discover mid-level visual elements that are both 1) representative, i.e., frequently occurring within a visual dataset, and 2) visually discriminative. However, the current approaches are rather ad hoc and difficult to analyze and evaluate. In this work, we pose visual element discovery as discriminative mode seeking, drawing connections to the well-known and well-studied mean-shift algorithm [2, 1, 4, 8]. Given a weakly-labeled image collection, our method discovers visually-coherent patch clusters that are maximally discriminative with respect to the labels. One advantage of our formulation is that it requires only a single pass through the data. We also propose the Purity-Coverage plot as a principled way of experimentally analyzing and evaluating different visual discovery approaches, and compare our method against prior work on the Paris Street View dataset of [5]. We also evaluate our method on the task of scene classification, demonstrating state-of-the-art performance on the MIT Scene-67 dataset.
2 0.1803944 5 nips-2013-A Deep Architecture for Matching Short Texts
Author: Zhengdong Lu, Hang Li
Abstract: Many machine learning problems can be interpreted as learning for matching two types of objects (e.g., images and captions, users and products, queries and documents, etc.). The matching level of two objects is usually measured as the inner product in a certain feature space, while the modeling effort focuses on mapping of objects from the original space to the feature space. This schema, although proven successful on a range of matching tasks, is insufficient for capturing the rich structure in the matching process of more complicated objects. In this paper, we propose a new deep architecture to more effectively model the complicated matching relations between two objects from heterogeneous domains. More specifically, we apply this model to matching tasks in natural language, e.g., finding sensible responses for a tweet, or relevant answers to a given question. This new architecture naturally combines the localness and hierarchy intrinsic to the natural language problems, and therefore greatly improves upon the state-of-the-art models. 1
3 0.13431792 119 nips-2013-Fast Template Evaluation with Vector Quantization
Author: Mohammad Amin Sadeghi, David Forsyth
Abstract: Applying linear templates is an integral part of many object detection systems and accounts for a significant portion of computation time. We describe a method that achieves a substantial end-to-end speedup over the best current methods, without loss of accuracy. Our method is a combination of approximating scores by vector quantizing feature windows and a number of speedup techniques including cascade. Our procedure allows speed and accuracy to be traded off in two ways: by choosing the number of Vector Quantization levels, and by choosing to rescore windows or not. Our method can be directly plugged into any recognition system that relies on linear templates. We demonstrate our method to speed up the original Exemplar SVM detector [1] by an order of magnitude and Deformable Part models [2] by two orders of magnitude with no loss of accuracy. 1
4 0.13235329 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
Author: Yangqing Jia, Joshua T. Abbott, Joseph Austerweil, Thomas Griffiths, Trevor Darrell
Abstract: Learning a visual concept from a small number of positive examples is a significant challenge for machine learning algorithms. Current methods typically fail to find the appropriate level of generalization in a concept hierarchy for a given set of visual examples. Recent work in cognitive science on Bayesian models of generalization addresses this challenge, but prior results assumed that objects were perfectly recognized. We present an algorithm for learning visual concepts directly from images, using probabilistic predictions generated by visual classifiers as the input to a Bayesian generalization model. As no existing challenge data tests this paradigm, we collect and make available a new, large-scale dataset for visual concept learning using the ImageNet hierarchy as the source of possible concepts, with human annotators to provide ground truth labels as to whether a new image is an instance of each concept using a paradigm similar to that used in experiments studying word learning in children. We compare the performance of our system to several baseline algorithms, and show a significant advantage results from combining visual classifiers with the ability to identify an appropriate level of abstraction using Bayesian generalization. 1
5 0.13128249 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach
Author: Zhenwen Dai, Georgios Exarchakis, Jörg Lücke
Abstract: We study optimal image encoding based on a generative approach with non-linear feature combinations and explicit position encoding. By far most approaches to unsupervised learning of visual features, such as sparse coding or ICA, account for translations by representing the same features at different positions. Some earlier models used a separate encoding of features and their positions to facilitate invariant data encoding and recognition. All probabilistic generative models with explicit position encoding have so far assumed a linear superposition of components to encode image patches. Here, we for the first time apply a model with non-linear feature superposition and explicit position encoding for patches. By avoiding linear superpositions, the studied model represents a closer match to component occlusions which are ubiquitous in natural images. In order to account for occlusions, the non-linear model encodes patches qualitatively very different from linear models by using component representations separated into mask and feature parameters. We first investigated encodings learned by the model using artificial data with mutually occluding components. We find that the model extracts the components, and that it can correctly identify the occlusive components with the hidden variables of the model. On natural image patches, the model learns component masks and features for typical image components. By using reverse correlation, we estimate the receptive fields associated with the model’s hidden units. We find many Gabor-like or globular receptive fields as well as fields sensitive to more complex structures. Our results show that probabilistic models that capture occlusions and invariances can be trained efficiently on image patches, and that the resulting encoding represents an alternative model for the neural encoding of images in the primary visual cortex. 1
6 0.11740539 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model
7 0.11351682 167 nips-2013-Learning the Local Statistics of Optical Flow
8 0.10491367 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
9 0.099962004 195 nips-2013-Modeling Clutter Perception using Parametric Proto-object Partitioning
10 0.098885849 31 nips-2013-Adaptivity to Local Smoothness and Dimension in Kernel Regression
11 0.093453743 84 nips-2013-Deep Neural Networks for Object Detection
12 0.091681518 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification
13 0.090640977 356 nips-2013-Zero-Shot Learning Through Cross-Modal Transfer
14 0.090340376 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
15 0.088986903 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
16 0.086460687 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
17 0.086445853 251 nips-2013-Predicting Parameters in Deep Learning
18 0.083472513 329 nips-2013-Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections
19 0.083183348 148 nips-2013-Latent Maximum Margin Clustering
20 0.082108483 276 nips-2013-Reshaping Visual Datasets for Domain Adaptation
topicId topicWeight
[(0, 0.219), (1, 0.098), (2, -0.124), (3, -0.091), (4, 0.131), (5, -0.042), (6, -0.032), (7, 0.018), (8, -0.027), (9, 0.037), (10, -0.148), (11, 0.044), (12, 0.001), (13, 0.014), (14, 0.03), (15, 0.034), (16, 0.024), (17, -0.173), (18, -0.054), (19, -0.002), (20, 0.011), (21, 0.055), (22, 0.007), (23, 0.01), (24, -0.12), (25, -0.048), (26, 0.035), (27, -0.019), (28, -0.027), (29, -0.066), (30, 0.035), (31, 0.046), (32, 0.094), (33, -0.018), (34, -0.054), (35, 0.023), (36, 0.031), (37, 0.032), (38, -0.065), (39, 0.02), (40, 0.054), (41, -0.077), (42, -0.045), (43, -0.034), (44, -0.075), (45, -0.09), (46, -0.013), (47, 0.056), (48, 0.031), (49, -0.005)]
simIndex simValue paperId paperTitle
same-paper 1 0.96031713 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
Author: Carl Doersch, Abhinav Gupta, Alexei A. Efros
Abstract: Recent work on mid-level visual representations aims to capture information at the level of complexity higher than typical “visual words”, but lower than full-blown semantic objects. Several approaches [5, 6, 12, 23] have been proposed to discover mid-level visual elements that are both 1) representative, i.e., frequently occurring within a visual dataset, and 2) visually discriminative. However, the current approaches are rather ad hoc and difficult to analyze and evaluate. In this work, we pose visual element discovery as discriminative mode seeking, drawing connections to the well-known and well-studied mean-shift algorithm [2, 1, 4, 8]. Given a weakly-labeled image collection, our method discovers visually-coherent patch clusters that are maximally discriminative with respect to the labels. One advantage of our formulation is that it requires only a single pass through the data. We also propose the Purity-Coverage plot as a principled way of experimentally analyzing and evaluating different visual discovery approaches, and compare our method against prior work on the Paris Street View dataset of [5]. We also evaluate our method on the task of scene classification, demonstrating state-of-the-art performance on the MIT Scene-67 dataset.
2 0.79187441 195 nips-2013-Modeling Clutter Perception using Parametric Proto-object Partitioning
Author: Chen-Ping Yu, Wen-Yu Hua, Dimitris Samaras, Greg Zelinsky
Abstract: Visual clutter, the perception of an image as being crowded and disordered, affects aspects of our lives ranging from object detection to aesthetics, yet relatively little effort has been made to model this important and ubiquitous percept. Our approach models clutter as the number of proto-objects segmented from an image, with proto-objects defined as groupings of superpixels that are similar in intensity, color, and gradient orientation features. We introduce a novel parametric method of clustering superpixels by modeling mixture of Weibulls on Earth Mover’s Distance statistics, then taking the normalized number of proto-objects following partitioning as our estimate of clutter perception. We validated this model using a new 90-image dataset of real world scenes rank ordered by human raters for clutter, and showed that our method not only predicted clutter extremely well (Spearman’s ρ = 0.8038, p < 0.001), but also outperformed all existing clutter perception models and even a behavioral object segmentation ground truth. We conclude that the number of proto-objects in an image affects clutter perception more than the number of objects or features. 1
3 0.77361554 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach
Author: Zhenwen Dai, Georgios Exarchakis, Jörg Lücke
Abstract: We study optimal image encoding based on a generative approach with non-linear feature combinations and explicit position encoding. By far most approaches to unsupervised learning of visual features, such as sparse coding or ICA, account for translations by representing the same features at different positions. Some earlier models used a separate encoding of features and their positions to facilitate invariant data encoding and recognition. All probabilistic generative models with explicit position encoding have so far assumed a linear superposition of components to encode image patches. Here, we for the first time apply a model with non-linear feature superposition and explicit position encoding for patches. By avoiding linear superpositions, the studied model represents a closer match to component occlusions which are ubiquitous in natural images. In order to account for occlusions, the non-linear model encodes patches qualitatively very different from linear models by using component representations separated into mask and feature parameters. We first investigated encodings learned by the model using artificial data with mutually occluding components. We find that the model extracts the components, and that it can correctly identify the occlusive components with the hidden variables of the model. On natural image patches, the model learns component masks and features for typical image components. By using reverse correlation, we estimate the receptive fields associated with the model’s hidden units. We find many Gabor-like or globular receptive fields as well as fields sensitive to more complex structures. Our results show that probabilistic models that capture occlusions and invariances can be trained efficiently on image patches, and that the resulting encoding represents an alternative model for the neural encoding of images in the primary visual cortex. 1
4 0.72693175 119 nips-2013-Fast Template Evaluation with Vector Quantization
Author: Mohammad Amin Sadeghi, David Forsyth
Abstract: Applying linear templates is an integral part of many object detection systems and accounts for a significant portion of computation time. We describe a method that achieves a substantial end-to-end speedup over the best current methods, without loss of accuracy. Our method is a combination of approximating scores by vector quantizing feature windows and a number of speedup techniques including cascade. Our procedure allows speed and accuracy to be traded off in two ways: by choosing the number of Vector Quantization levels, and by choosing to rescore windows or not. Our method can be directly plugged into any recognition system that relies on linear templates. We demonstrate our method to speed up the original Exemplar SVM detector [1] by an order of magnitude and Deformable Part models [2] by two orders of magnitude with no loss of accuracy. 1
5 0.71124256 166 nips-2013-Learning invariant representations and applications to face verification
Author: Qianli Liao, Joel Z. Leibo, Tomaso Poggio
Abstract: One approach to computer object recognition and modeling the brain’s ventral stream involves unsupervised learning of representations that are invariant to common transformations. However, applications of these ideas have usually been limited to 2D affine transformations, e.g., translation and scaling, since they are easiest to solve via convolution. In accord with a recent theory of transformationinvariance [1], we propose a model that, while capturing other common convolutional networks as special cases, can also be used with arbitrary identitypreserving transformations. The model’s wiring can be learned from videos of transforming objects—or any other grouping of images into sets by their depicted object. Through a series of successively more complex empirical tests, we study the invariance/discriminability properties of this model with respect to different transformations. First, we empirically confirm theoretical predictions (from [1]) for the case of 2D affine transformations. Next, we apply the model to non-affine transformations; as expected, it performs well on face verification tasks requiring invariance to the relatively smooth transformations of 3D rotation-in-depth and changes in illumination direction. Surprisingly, it can also tolerate clutter “transformations” which map an image of a face on one background to an image of the same face on a different background. Motivated by these empirical findings, we tested the same model on face verification benchmark tasks from the computer vision literature: Labeled Faces in the Wild, PubFig [2, 3, 4] and a new dataset we gathered—achieving strong performance in these highly unconstrained cases as well. 1
6 0.70424241 167 nips-2013-Learning the Local Statistics of Optical Flow
7 0.69150394 84 nips-2013-Deep Neural Networks for Object Detection
8 0.68071997 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
9 0.65693998 329 nips-2013-Third-Order Edge Statistics: Contour Continuation, Curvature, and Cortical Connections
10 0.65052682 37 nips-2013-Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs
11 0.64916557 138 nips-2013-Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation
12 0.64776963 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
13 0.63184965 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking
14 0.61340153 226 nips-2013-One-shot learning by inverting a compositional causal process
15 0.61077029 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator
17 0.55256802 5 nips-2013-A Deep Architecture for Matching Short Texts
18 0.53638995 261 nips-2013-Rapid Distance-Based Outlier Detection via Sampling
19 0.53638077 357 nips-2013-k-Prototype Learning for 3D Rigid Structures
20 0.53335875 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification
topicId topicWeight
[(2, 0.017), (16, 0.037), (18, 0.176), (19, 0.016), (33, 0.186), (34, 0.125), (41, 0.02), (49, 0.052), (56, 0.1), (70, 0.056), (85, 0.041), (89, 0.037), (93, 0.068), (95, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.88002539 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
Author: Carl Doersch, Abhinav Gupta, Alexei A. Efros
Abstract: Recent work on mid-level visual representations aims to capture information at the level of complexity higher than typical “visual words”, but lower than full-blown semantic objects. Several approaches [5, 6, 12, 23] have been proposed to discover mid-level visual elements that are both 1) representative, i.e., frequently occurring within a visual dataset, and 2) visually discriminative. However, the current approaches are rather ad hoc and difficult to analyze and evaluate. In this work, we pose visual element discovery as discriminative mode seeking, drawing connections to the well-known and well-studied mean-shift algorithm [2, 1, 4, 8]. Given a weakly-labeled image collection, our method discovers visually-coherent patch clusters that are maximally discriminative with respect to the labels. One advantage of our formulation is that it requires only a single pass through the data. We also propose the Purity-Coverage plot as a principled way of experimentally analyzing and evaluating different visual discovery approaches, and compare our method against prior work on the Paris Street View dataset of [5]. We also evaluate our method on the task of scene classification, demonstrating state-of-the-art performance on the MIT Scene-67 dataset.
2 0.82578927 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
Author: Nataliya Shapovalova, Michalis Raptis, Leonid Sigal, Greg Mori
Abstract: We propose a weakly-supervised structured learning approach for recognition and spatio-temporal localization of actions in video. As part of the proposed approach, we develop a generalization of the Max-Path search algorithm which allows us to efficiently search over a structured space of multiple spatio-temporal paths while also incorporating context information into the model. Instead of using spatial annotations in the form of bounding boxes to guide the latent model during training, we utilize human gaze data in the form of a weak supervisory signal. This is achieved by incorporating eye gaze, along with the classification, into the structured loss within the latent SVM learning framework. Experiments on a challenging benchmark dataset, UCF-Sports, show that our model is more accurate, in terms of classification, and achieves state-of-the-art results in localization. In addition, our model can produce top-down saliency maps conditioned on the classification label and localized latent paths. 1
3 0.82366562 201 nips-2013-Multi-Task Bayesian Optimization
Author: Kevin Swersky, Jasper Snoek, Ryan P. Adams
Abstract: Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost. 1
4 0.82298529 64 nips-2013-Compete to Compute
Author: Rupesh K. Srivastava, Jonathan Masci, Sohrob Kazerounian, Faustino Gomez, Jürgen Schmidhuber
Abstract: Local competition among neighboring neurons is common in biological neural networks (NNs). In this paper, we apply the concept to gradient-based, backprop-trained artificial multilayer NNs. NNs with competing linear units tend to outperform those with non-competing nonlinear units, and avoid catastrophic forgetting when training sets change over time. 1
5 0.82291001 114 nips-2013-Extracting regions of interest from biological images with convolutional sparse block coding
Author: Marius Pachitariu, Adam M. Packer, Noah Pettit, Henry Dalgleish, Michael Hausser, Maneesh Sahani
Abstract: Biological tissue is often composed of cells with similar morphologies replicated throughout large volumes and many biological applications rely on the accurate identification of these cells and their locations from image data. Here we develop a generative model that captures the regularities present in images composed of repeating elements of a few different types. Formally, the model can be described as convolutional sparse block coding. For inference we use a variant of convolutional matching pursuit adapted to block-based representations. We extend the KSVD learning algorithm to subspaces by retaining several principal vectors from the SVD decomposition instead of just one. Good models with little cross-talk between subspaces can be obtained by learning the blocks incrementally. We perform extensive experiments on simulated images and the inference algorithm consistently recovers a large proportion of the cells with a small number of false positives. We fit the convolutional model to noisy GCaMP6 two-photon images of spiking neurons and to Nissl-stained slices of cortical tissue and show that it recovers cell body locations without supervision. The flexibility of the block-based representation is reflected in the variability of the recovered cell shapes. 1
6 0.82132757 251 nips-2013-Predicting Parameters in Deep Learning
7 0.81901801 173 nips-2013-Least Informative Dimensions
8 0.81851155 331 nips-2013-Top-Down Regularization of Deep Belief Networks
9 0.81694031 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
10 0.81686729 5 nips-2013-A Deep Architecture for Matching Short Texts
11 0.81686372 183 nips-2013-Mapping paradigm ontologies to and from the brain
12 0.81528199 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
13 0.81499207 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
14 0.81471449 301 nips-2013-Sparse Additive Text Models with Low Rank Background
15 0.81393123 294 nips-2013-Similarity Component Analysis
16 0.81375688 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning
17 0.81361556 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
18 0.81298023 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
19 0.81240374 30 nips-2013-Adaptive dropout for training deep neural networks
20 0.81134444 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables