cvpr cvpr2013 cvpr2013-320 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Paul Wohlhart, Martin Köstinger, Michael Donoser, Peter M. Roth, Horst Bischof
Abstract: The development of complex, powerful classifiers and their constant improvement have contributed much to the progress in many fields of computer vision. However, the trend towards large scale datasets revived the interest in simpler classifiers to reduce runtime. Simple nearest neighbor classifiers have several beneficial properties, such as low complexity and inherent multi-class handling, however, they have a runtime linear in the size of the database. Recent related work represents data samples by assigning them to a set of prototypes that partition the input feature space and afterwards applies linear classifiers on top of this representation to approximate decision boundaries locally linear. In this paper, we go a step beyond these approaches and purely focus on 1-nearest prototype classification, where we propose a novel algorithm for deriving optimal prototypes in a discriminative manner from the training samples. Our method is implicitly multi-class capable, parameter free, avoids noise overfitting and, since during testing only comparisons to the derived prototypes are required, highly efficient. Experiments demonstrate that we are able to outperform related locally linear methods, while even getting close to the results of more complex classifiers.
Reference: text
sentIndex sentText sentNum sentScore
1 Simple nearest neighbor classifiers have several beneficial properties, such as low complexity and inherent multi-class handling, however, they have a runtime linear in the size of the database. [sent-6, score-0.294]
2 Recent related work represents data samples by assigning them to a set of prototypes that partition the input feature space and afterwards applies linear classifiers on top of this representation to approximate decision boundaries locally linear. [sent-7, score-1.063]
3 In this paper, we go a step beyond these approaches and purely focus on 1-nearest prototype classification, where we propose a novel algorithm for deriving optimal prototypes in a discriminative manner from the training samples. [sent-8, score-1.239]
4 Our method is implicitly multi-class capable, parameter free, avoids noise overfitting and, since during testing only comparisons to the derived prototypes are required, highly efficient. [sent-9, score-0.889]
5 Introduction Over the last decade we have seen impressive progress in important fields of computer vision like image classification and object detection, which is mainly a result of developments in supervised machine learning and the steadily growing amount of available training data. [sent-12, score-0.17]
6 Large-scale datasets for effectively training classifiers have already proven to be quite beneficial in several contexts. [sent-13, score-0.125]
7 Furthermore, optimal classification decision boundaries do not necessarily exhibit the structure as defined by a kernel. [sent-22, score-0.127]
8 For example, in [14] a stochastic gradient descent approach for solving the optimization problem of linear SVMs was introduced, where the total runtime does not directly depend on the size of the training set. [sent-24, score-0.151]
9 These prototypes (or anchor points) reside in the same space as the input data points and the classifier is then trained and evaluated in the context of a local neighborhood around them. [sent-29, score-0.948]
10 A recent example for such a classifier is the Locally Linear SVM (LL-SVM) proposed in [9], where the non-linear decision boundary is approximated with many linear SVMs. [sent-30, score-0.111]
11 The core idea is to embed data points into a new feature space, again defined by soft assignments to prototypes, and to learn a linear max-margin classifier on this new representation. [sent-34, score-0.11]
12 An iterative method is introduced that alternately optimizes the prototypes and the classifier parameters. [sent-35, score-0.919]
13 The necessity for large-scale classification has in general led to a comeback of non-parametric classifiers like the nearest neighbor algorithm. [sent-37, score-0.318]
14 For example, in [1] it was shown that for Bag-of-words based image classification, simple adaptations in the visual word quantization and in the image-to-image distance calculation are sufficient to achieve state-of-the-art results using a simple nearest neighbor method as subsequent classifier. [sent-39, score-0.229]
15–16 Nearest neighbor classification especially shows promising performance in combination with metric learning (e.g., [17]), where the assignment to the nearest neighbors is based on a Mahalanobis distance metric instead of the common Euclidean distance. [sent-40, score-0.198] [sent-42, score-0.131]
17 In general, such methods aim at improving the nearest neighbor classification, e. [sent-43, score-0.187]
18 In this way, these methods are a counterpart to max-margin classifiers like SVMs, where nearest neighbor assignments replace the weighted sum of (kernelized) distances to support vectors. [sent-46, score-0.321]
19 An interesting way to further reduce the complexity of nearest neighbor classifiers is to constrain the analysis to selected prototypes, instead of considering all training data samples during testing. [sent-47, score-0.354]
20 Finding optimal prototypes from labeled data has a long history, where methods are in general referred to as Learning Vector Quantization (LVQ). [sent-49, score-0.867]
21 This field was initiated by the seminal work of [8], where a heuristic was used to move prototypes close to training samples of the same class and away from other classes. [sent-50, score-1.002]
22 The method was later adapted in [13], where the problem was formulated as an optimization problem maximizing the likelihood ratio between the probability of correct classification and the total probability in a Gaussian Mixture Model (each prototype is represented as a Gaussian). [sent-51, score-0.325]
23 Remarkably, the authors demonstrate that prototype based classification is even able to outperform nearest neighbor classification using all training data due to improved generalization properties caused by a reduction of over-fitting to noise. [sent-53, score-0.623]
24 In this paper, we address large-scale classification by a prototype based nearest neighbor approach. [sent-54, score-0.512]
25 We aim to go beyond the most related methods of [9, 18], where local codings based on prototypes are used in a subsequently applied classifier, by completely focusing on the most simple variant: a 1-nearest neighbor (NN) algorithm. [sent-55, score-0.976]
26 Instead of considering all N training samples in the 1-NN algorithm as done in [8, 13, 3], we aim at discriminatively deriving a small, optimal set of P prototypes with P ≪ N. [sent-56, score-1.021]
27 A test sample is then assigned the single label of the nearest neighbor within this prototype set. [sent-58, score-0.444]
28 To reach this goal, we present a differentiable probabilistic relaxation of 1-NN classification, based on soft assignments of samples to prototypes, that can be optimized using gradient descent. [sent-59, score-0.167]
29 Thus, in contrast to related methods, we can provide discriminative prototypes directly optimized for subsequent pure 1-NN classification, by gradually decreasing the softness of the assignment while optimizing the prototypes. [sent-62, score-1.002]
30 In thorough experiments, we demonstrate that our discriminatively learned prototypes consistently outperform the k-means baseline, even with very low numbers of prototypes. [sent-64, score-0.956]
31 Furthermore, we show that already a very small number of prototypes is sufficient to obtain reasonable classification accuracy on several datasets. [sent-65, score-0.916]
32 However, we show that by learning the prototypes in the proposed discriminative manner, learning Mahalanobis metrics becomes less important. [sent-67, score-0.892]
33 We first show in Section 2 how to relax 1-nearest neighbor classification to a soft-max variant, which then allows finding the prototypes using gradient descent by minimizing the empirical risk of misclassification over the entire training dataset. [sent-69, score-1.169]
34 We derive formulations for the exponential and the hinge loss, and discuss possible influences on the results. [sent-70, score-0.186]
35 Section 3 gives illustrative results on toy examples, as well as a thorough comparison to state-of-the-art on three standard classification datasets. [sent-71, score-0.197]
36 Method: Given a training set T with labeled samples (xi, yi), where xi ∈ Rd and yi ∈ {−1, 1}, the goal is to define a classifier f(x) = y that correctly classifies test samples from the same distribution. [sent-73, score-0.219]
37 As discussed before, to reduce the model complexity and the test time, our proposed classifier is a prototype based 1-nearest neighbor algorithm. [sent-74, score-0.437]
38 We aim at deriving discriminative, labeled prototypes (pj, θj) ∈ P, pj ∈ Rd, θj ∈ {−1, 1}, that are optimized for simple 1-nearest neighbor classification: f(x) = θk, where k = argmin_j ||x − pj|| . (1) [sent-75, score-1.038]
39 To get an optimal classifier, the prototypes pj must be arranged so as to minimize the empirical risk of misclassification. [sent-76, score-0.922]
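As a concrete illustration of Eq. (1), a minimal sketch of the 1-nearest-prototype decision rule could look as follows; the array layout, the use of squared Euclidean distances, and all variable names are our assumptions, not code from the paper.

```python
import numpy as np

def nn_prototype_classify(x, prototypes, proto_labels):
    """Assign x the label of its nearest prototype (cf. Eq. (1)).

    prototypes   -- (P, d) array of prototype locations p_j
    proto_labels -- (P,)   array of prototype labels theta_j
    """
    dists = np.sum((prototypes - x) ** 2, axis=1)  # squared Euclidean distance to every p_j
    k = np.argmin(dists)                           # index of the nearest prototype
    return proto_labels[k]
```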
40 Thus, we formulate the optimization of the prototypes as a gradient descent on a loss function over the training examples. [sent-77, score-1.078]
41 Since Eq. (1) is not differentiable, we relax the hard assignment of samples to prototypes to a soft-max formulation over sample similarities, i.e., f(x) = Σ_{j=1..P} wj(x) θj . [sent-79, score-1.01]
42 The weights are defined by a soft-max over the similarities between input sample x and prototypes pj: wj(x) = s(x, pj)^γ / Σ_{l=1..P} s(x, pl)^γ . [sent-91, score-0.943]
43 Here, s(x1, x2) denotes the similarity of samples x1 and x2, and γ controls the degree of softness in the assignment, which is discussed below. [sent-93, score-0.118]
44 The similarity of two samples is defined as the negative exponential of the distance between the samples: s(x1, x2) = exp(−d(x1, x2)) . [sent-101, score-0.124]
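The sketch below shows one way to implement this relaxed soft-max assignment and classifier as reconstructed above; since s(x, p) = exp(−d(x, p)), the weights reduce to a soft-max over −γ d(x, pj). The squared Euclidean distance, the log-domain stabilization, and the function names are assumptions on our part.

```python
import numpy as np

def soft_assignment(x, prototypes, gamma):
    """Soft-max weights w_j(x) over prototype similarities.

    With s(x, p) = exp(-d(x, p)) we have s^gamma = exp(-gamma * d), so a larger
    gamma yields harder, more 1-NN-like assignments.
    """
    d = np.sum((prototypes - x) ** 2, axis=1)   # distances to all prototypes
    logits = -gamma * d                         # log of s(x, p_j)^gamma
    logits -= logits.max()                      # numerical stabilization of the soft-max
    w = np.exp(logits)
    return w / w.sum()

def soft_classify(x, prototypes, proto_labels, gamma):
    """Relaxed classifier output f(x) = sum_j w_j(x) * theta_j."""
    return np.dot(soft_assignment(x, prototypes, gamma), proto_labels)
```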
45 Inspired by recent developments in metric learning, we also consider learned Mahalanobis distance metrics M to define the distance between samples: dmetric(x1, x2) = (x1 − x2)^T M (x1 − x2) . [sent-105, score-0.13]
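A learned metric can be plugged into the same similarity by simply replacing the squared Euclidean distance. A small sketch, assuming M is a precomputed positive semi-definite matrix (e.g., from LMNN):

```python
import numpy as np

def mahalanobis_dist(x1, x2, M):
    """d_metric(x1, x2) = (x1 - x2)^T M (x1 - x2) for a learned metric M."""
    diff = x1 - x2
    return float(diff @ M @ diff)

# The Euclidean case is recovered with M = I:
# mahalanobis_dist(x1, x2, np.eye(x1.shape[0])) == np.sum((x1 - x2) ** 2)
```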
46 With lower values of γ more weight is given to the next closest prototypes and more distant interactions are established, until for γ = 0 all locality vanishes and the result of a classification is just the prior distribution of the training labels. [sent-114, score-0.959]
47 The discriminatively derived prototypes are then supposed to be used in 1-NN classification, as in Eq. [sent-117, score-0.872]
48 If we were sacrificing the benefit of the simple 1-NN classification and looking for an optimal soft-max classifier as defined in Eq. [sent-119, score-0.158]
49 In this way, we allow more than one prototype to have influence on a sample, and thus vice versa, let each sample have influence on more than one prototype. [sent-122, score-0.278]
50 For the current γ-value, we derive optimal prototypes using a gradient descent approach as outlined in the next section. [sent-123, score-0.95]
51 In practice, we initialize γ such that the difference between the highest and the second highest prototype weight is less than 0. [sent-126, score-0.257]
52 We then increase γ until, for every training sample, there is significant weight on only a single prototype (i.e., the soft assignment effectively becomes a hard 1-NN assignment). [sent-128, score-0.321]
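A possible implementation of this annealing schedule is sketched below; it reuses soft_assignment from the earlier sketch, and the starting value, growth factor, and gap threshold are assumptions, since the exact numbers are truncated in the extracted text.

```python
import numpy as np

def initial_gamma(X, prototypes, gamma=1.0, shrink=2.0, max_gap=0.05):
    """Shrink gamma until, for every training sample, the gap between the largest
    and second-largest prototype weight is below max_gap (soft start)."""
    while True:
        gaps = [np.diff(np.sort(soft_assignment(x, prototypes, gamma))[-2:])[0] for x in X]
        if max(gaps) < max_gap:
            return gamma
        gamma /= shrink

def gamma_schedule(gamma_start, grow=2.0, n_stages=10):
    """Increasing gamma values used over the iterations, ending near hard 1-NN."""
    return [gamma_start * grow ** i for i in range(n_stages)]
```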
53 Empirical loss over training data: In order to optimize the prototypes with gradient descent, we need to design a differentiable loss function measuring the quality of the output of the current probabilistic classifier setup. [sent-134, score-1.219]
54 Given the definition of the classifier, the loss over all training samples is defined by L(T, Δ) = Σ_{(xi, yi) ∈ T} Δ(f(xi), yi) . [sent-135, score-0.208]
55 Here, Δ is a function measuring the loss of the classification outcome. [sent-141, score-0.172]
56 In particular, we apply two types of loss functions, namely the exponential loss Δexp(f(xi), yi) = exp(−yi f(xi)) (8) and the hinge loss Δhinge(f(xi), yi) = max(0, δ − yi f(xi)) (9), where the hinge loss is more robust to outliers and δ adjusts the margin. [sent-142, score-0.529]
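In code, the two per-sample losses and the empirical loss over the training set might look like this; the sketch builds on soft_classify from the earlier snippet, and the default margin value is an assumption.

```python
import numpy as np

def exp_loss(f_x, y):
    """Exponential loss, cf. Eq. (8)."""
    return np.exp(-y * f_x)

def hinge_loss(f_x, y, delta=1.0):
    """Hinge loss, cf. Eq. (9); delta adjusts the margin."""
    return np.maximum(0.0, delta - y * f_x)

def empirical_loss(X, Y, prototypes, proto_labels, gamma, loss=exp_loss):
    """L(T, Delta): sum of per-sample losses over the whole training set."""
    return sum(loss(soft_classify(x, prototypes, proto_labels, gamma), y)
               for x, y in zip(X, Y))
```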
57 Derivatives w.r.t. the prototypes: In order to perform the gradient descent, we need the derivatives of the loss w.r.t. the prototype locations. [sent-151, score-0.9]
58 In the case of multiclass classification tasks with C classes we adopt a lifted formulation and extend training and prototype labels to vectors yi, θj ∈ {−1, 1}^C (i.e., +1 at the position of the respective class and −1 elsewhere). [sent-159, score-0.447]
59–60 Partial derivatives for each individual part of the classification loss w.r.t. prototype locations, as needed for the optimization with gradient descent. [sent-168, score-0.193] [sent-170, score-0.288]
61 The multi-class loss is then defined as the sum over the losses for each individual class: L(T, Δ) = Σ_{c=1..C} Σ_{(xi, yi) ∈ T} Δ(fc(xi), yi,c) . [sent-171, score-0.123]
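The lifted multi-class formulation just described could be realized roughly as follows; the one-vs-all encoding and the shape conventions are assumptions consistent with the text, and soft_assignment and exp_loss are the helpers from the sketches above.

```python
import numpy as np

def lift_label(y, n_classes):
    """Map an integer class label to a vector in {-1, +1}^C (+1 at the true class)."""
    t = -np.ones(n_classes)
    t[y] = 1.0
    return t

def multiclass_loss(X, Y_lifted, prototypes, proto_label_vecs, gamma, loss=exp_loss):
    """Sum over classes and samples of the binary loss on f_c(x)."""
    total = 0.0
    for x, y_vec in zip(X, Y_lifted):
        w = soft_assignment(x, prototypes, gamma)   # (P,) soft weights
        f = proto_label_vecs.T @ w                  # (C,) per-class outputs f_c(x)
        total += np.sum(loss(f, y_vec))
    return total
```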
62 However, although this formulation yields meaningful results on toy examples, it consistently produced considerably worse results than the formulation in Eq. [sent-191, score-0.159]
63 Experiments We first outline characteristics of our proposed method for selecting optimal prototypes for 1-NN classification on two toy examples in Section 3. [sent-194, score-1.012]
64 In all experiments, we initialize our method using prototypes defined by applying k-means clustering for each class individually. [sent-198, score-0.88]
65 Starting from the k-means initialization, we apply our iterative prototype updating as described in the previous section using the non-linear conjugate gradient descent method (ncg) from the Poblano Toolbox [4]. [sent-202, score-0.34]
66 Our final classifier is then defined by using the obtained prototypes in a 1-NN assignment strategy, as defined in Eq. [sent-203, score-0.951]
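Putting the pieces together, the overall optimization could be organized as in the sketch below. It uses per-class k-means for initialization as described; the finite-difference gradient step is only a stand-in for the analytic gradients and the Poblano non-linear conjugate gradient solver used in the paper, so the learning rate, step counts, and the use of scikit-learn's KMeans are all assumptions. It reuses lift_label, multiclass_loss, and soft_assignment from the earlier sketches.

```python
import numpy as np
from sklearn.cluster import KMeans

def init_prototypes(X, Y, k, n_classes):
    """k-means per class to obtain the initial prototypes and their labels."""
    centers, labels = [], []
    for c in range(n_classes):
        km = KMeans(n_clusters=k, n_init=10).fit(X[Y == c])
        centers.append(km.cluster_centers_)
        labels.extend([c] * k)
    return np.vstack(centers), np.array(labels)

def numerical_gradient(f, P, eps=1e-4):
    """Finite-difference gradient of a scalar function f w.r.t. the prototype matrix."""
    g = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        old = P[idx]
        P[idx] = old + eps; f_plus = f(P)
        P[idx] = old - eps; f_minus = f(P)
        P[idx] = old
        g[idx] = (f_plus - f_minus) / (2.0 * eps)
    return g

def train_npc(X, Y, k, n_classes, gammas, lr=0.1, n_steps=20):
    """Alternate between increasing gamma and gradient steps on the prototypes."""
    P, theta = init_prototypes(X, Y, k, n_classes)
    theta_vec = np.stack([lift_label(t, n_classes) for t in theta])
    Y_lifted = np.stack([lift_label(y, n_classes) for y in Y])
    for gamma in gammas:                              # annealing schedule (Sec. 2)
        obj = lambda P_: multiclass_loss(X, Y_lifted, P_, theta_vec, gamma)
        for _ in range(n_steps):
            P = P - lr * numerical_gradient(obj, P)   # only prototype locations move
    return P, theta                                   # use with 1-NN at test time (Eq. 1)
```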
67 Illustrative toy examples We illustrate how prototypes evolve during our optimization for two 2D toy examples, namely a dataset consisting of three classes with multimodal distributions and a dataset of 10 classes with regular structure. [sent-207, score-1.048]
68 Figures 2 and 3 show prototypes and the consequently emerging decision surfaces for 1-nearest neighbor classification using either the exponential or the hinge loss during our iterative prototype update method. [sent-208, score-1.593]
69 In the accuracy plots (Figs. 2(f)-(g) and 3(f)-(g)), the x-axis represents the value of γ used to optimize the prototypes in the current iteration of the algorithm, the y-axis represents values of γ used in the soft-max over similarities, and the height of the mesh reflects the corresponding classification accuracy. [sent-212, score-0.934]
70 Note that the decision surfaces become much smoother during prototype evolution and that overfitting, prevalent for the initial k-means clustering, is drastically reduced. [sent-213, score-0.297]
71 The hinge loss smoothes the decision boundary more than the exponential loss, where for higher values of γ the decision boundaries tend to fit more around single samples that are scattered into areas more densely populated by samples from differing classes. [sent-214, score-0.519]
72 Note also the very different configuration of the finally obtained prototypes for the two different loss functions. [sent-215, score-0.952]
73 Generally, with the hinge loss the prototypes tend to be grouped together to more generic locations, whereas with the exponential loss prototypes distribute more to also capture instances in low density areas. [sent-216, score-2.071]
74 The difference is also reflected in the performance plots, which show steadily increased training accuracy for increasing γ values in both cases, but a slight breakdown in the test accuracy for the exponential loss. [sent-218, score-0.132]
75 Image classification We evaluate our method on three widely used classification datasets, each providing sufficient training samples per class, such that a reduction to a small set of discriminative prototypes really pays off. [sent-222, score-1.115]
76 Our proposed Nearest Prototype Classifier (NPC) with exponential loss is denoted by NPC-e and with hinge loss NPC-h. [sent-232, score-0.375]
77 We show results for different numbers k of prototypes per class. [sent-233, score-0.848]
78–79 Figure 2: Evolution of prototypes during learning on a 2D toy example with 3 classes; (f) train (left) and test (right) accuracy for the exponential loss, (g) train (left) and test (right) accuracy for the hinge loss. [sent-275, score-0.335] [sent-276, score-0.925]
80 In general, using our learned prototypes consistently outperforms the baseline, where prototypes are defined by k-means clustering results. [sent-318, score-1.749]
81 In some settings (e.g., USPS, NPC-e, k=50), our nearest prototype classifier even outperforms 1-NN classification using all training data samples, although only a few discriminative prototypes are considered. [sent-321, score-1.392]
82 This clearly demonstrates the ability of our method to identify powerful prototypes in a discriminative manner, while reducing overfitting to noise and improving generalization properties. [sent-324, score-0.936]
83 As opposed to the simple 2D toy examples, the exponential loss generally performs a bit better than the hinge loss. [sent-325, score-0.348]
84 The effect of using the learned distance metric instead of Euclidean distances on the classification error varies over the different settings. [sent-327, score-0.148]
85 When using all training samples or the initial k-means prototypes for the 1-NN classification, the learned metric performs consistently better. [sent-328, score-1.026]
86 Nevertheless, if using our discriminatively learned prototypes, the plain Euclidean distance shows better performance, especially when the number of learned prototypes is low. [sent-329, score-0.93]
87 Already with only 15 prototypes, it consistently outperforms PNN [18], which uses 40 prototypes per class. [sent-331, score-0.872]
88 If using 50 prototypes, we also improve over the ensemble version EPNN [18], which uses 800 prototypes (40 per class × 20 weak classifiers). [sent-332, score-0.9]
89 For example, if using 50 prototypes per class, Euclidean distances and the exponential loss we reach the same performance as a multiclass SVM with RBF kernel on the USPS dataset. [sent-339, score-1.095]
90 Visualization of learned prototypes Since the input to our classifier is a vector representation of raw pixel values, it is also possible to visualize the learned prototypes in comparison to standard k-means results. [sent-343, score-1.825]
91 Figure 4 visualizes obtained prototypes for the MNIST dataset, where the discriminatively identified prototypes are sorted per class in descending importance, analyzing the sum of weights assigned by the training samples. [sent-344, score-1.832]
92 As can be seen, prototypes obtained by k-means clustering purely capture generative aspects of different modes of the training data, while our learning approach adds discriminative information. [sent-345, score-0.918]
93 Our prototypes reflect this by featuring negative values in those areas, with the purpose of increasing the distance to samples of other classes. [sent-347, score-0.909]
94 Samples of some of the prototypes for MNIST (a) initialized by k-means and (b) learned with our algorithm. [sent-350, score-0.877]
95 In this paper we showed that by a probabilistic relaxation of the 1-NN classification into a soft-max formulation, we are able to derive optimal prototypes by gradient descent in a discriminative manner. [sent-354, score-1.045]
96 Since during testing we only have to evaluate distances to the few prototypes obtained, our proposed method is highly efficient, where a further speedup would be possible using approximate nearest neighbor methods like KD-trees. [sent-355, score-1.065]
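As an example of the test-time speedup mentioned here, the learned prototypes can be indexed once with a KD-tree so that each query only needs a fast nearest-neighbor lookup instead of a linear scan; the sketch below uses SciPy's cKDTree and assumes the prototypes and their labels come from the training stage.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_prototype_index(prototypes):
    """Build a KD-tree over the P learned prototypes once, after training."""
    return cKDTree(prototypes)

def classify_with_index(tree, X_test, proto_labels):
    """1-NN prototype classification; tree.query returns the nearest prototype per row."""
    _, idx = tree.query(X_test, k=1)
    return proto_labels[idx]
```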
97 We further demonstrated in the experiments that, based on our properly learned set of discriminative prototypes, we can achieve state-of-the-art results on challenging datasets, even getting close to the performance of significantly more complex classifiers like non-linear kernel SVMs. [sent-356, score-0.159]
98 Additionally, we have noticed that, when starting with a setting considering more long range interactions, prototypes tend to group together at generic locations. [sent-357, score-0.848]
99 This motivates further research on how to start with a rather high number of prototypes and select the best ones afterwards. [sent-358, score-0.848]
100 Distance metric learning for large margin nearest neighbor classification. [sent-485, score-0.235]
wordName wordTfidf (topN-words)
[('prototypes', 0.848), ('prototype', 0.257), ('neighbor', 0.109), ('usps', 0.108), ('loss', 0.104), ('hinge', 0.104), ('nearest', 0.078), ('toy', 0.077), ('npc', 0.076), ('classifier', 0.071), ('classification', 0.068), ('exponential', 0.063), ('classifiers', 0.063), ('samples', 0.061), ('letter', 0.06), ('lvq', 0.057), ('softness', 0.057), ('descent', 0.052), ('mnist', 0.048), ('mahalanobis', 0.043), ('training', 0.043), ('overfitting', 0.041), ('decision', 0.04), ('dmetric', 0.038), ('epnn', 0.038), ('explos', 0.038), ('hingelos', 0.038), ('poblano', 0.038), ('pj', 0.037), ('pnn', 0.034), ('yif', 0.034), ('developments', 0.033), ('class', 0.032), ('assignment', 0.032), ('austrian', 0.031), ('thorough', 0.031), ('gradient', 0.031), ('distances', 0.03), ('xi', 0.03), ('learned', 0.029), ('formulation', 0.029), ('anchor', 0.029), ('locally', 0.028), ('additionally', 0.028), ('euclidean', 0.028), ('fc', 0.028), ('donoser', 0.027), ('wohlhart', 0.027), ('multiclass', 0.027), ('margin', 0.027), ('discriminative', 0.027), ('lc', 0.027), ('deriving', 0.026), ('steadily', 0.026), ('svms', 0.026), ('runtime', 0.025), ('lmnn', 0.025), ('kernels', 0.024), ('scattered', 0.024), ('yi', 0.024), ('discriminatively', 0.024), ('consistently', 0.024), ('exp', 0.023), ('afterwards', 0.023), ('digits', 0.023), ('calculation', 0.023), ('classes', 0.023), ('kernel', 0.023), ('infinity', 0.022), ('assignments', 0.022), ('differing', 0.022), ('sample', 0.021), ('jp', 0.021), ('derivatives', 0.021), ('metric', 0.021), ('illustrative', 0.021), ('pure', 0.02), ('powerful', 0.02), ('ensemble', 0.02), ('wj', 0.02), ('toolbox', 0.02), ('similarities', 0.019), ('beneficial', 0.019), ('optimal', 0.019), ('quantization', 0.019), ('sum', 0.019), ('go', 0.019), ('influences', 0.019), ('optimized', 0.018), ('differentiable', 0.018), ('risk', 0.018), ('seminal', 0.018), ('nevertheless', 0.018), ('weights', 0.018), ('optimize', 0.018), ('metrics', 0.017), ('soft', 0.017), ('minima', 0.017), ('primal', 0.017), ('getting', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 320 cvpr-2013-Optimizing 1-Nearest Prototype Classifiers
Author: Paul Wohlhart, Martin Köstinger, Michael Donoser, Peter M. Roth, Horst Bischof
Abstract: The development of complex, powerful classifiers and their constant improvement have contributed much to the progress in many fields of computer vision. However, the trend towards large scale datasets revived the interest in simpler classifiers to reduce runtime. Simple nearest neighbor classifiers have several beneficial properties, such as low complexity and inherent multi-class handling, however, they have a runtime linear in the size of the database. Recent related work represents data samples by assigning them to a set of prototypes that partition the input feature space and afterwards applies linear classifiers on top of this representation to approximate decision boundaries locally linear. In this paper, we go a step beyond these approaches and purely focus on 1-nearest prototype classification, where we propose a novel algorithm for deriving optimal prototypes in a discriminative manner from the training samples. Our method is implicitly multi-class capable, parameter free, avoids noise overfitting and, since during testing only comparisons to the derived prototypes are required, highly efficient. Experiments demonstrate that we are able to outperform related locally linear methods, while even getting close to the results of more complex classifiers.
2 0.25621802 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
Author: Xiaoshuai Sun, Xin-Jing Wang, Hongxun Yao, Lei Zhang
Abstract: In this paper, we propose a computational model of visual representativeness by integrating cognitive theories of representativeness heuristics with computer vision and machine learning techniques. Unlike previous models that build their representativeness measure based on the visible data, our model takes the initial inputs as explicit positive reference and extend the measure by exploring the implicit negatives. Given a group of images that contains obvious visual concepts, we create a customized image ontology consisting of both positive and negative instances by mining the most related and confusable neighbors of the positive concept in ontological semantic knowledge bases. The representativeness of a new item is then determined by its likelihoods for both the positive and negative references. To ensure the effectiveness of probability inference as well as the cognitive plausibility, we discover the potential prototypes and treat them as an intermediate representation of semantic concepts. In the experiment, we evaluate the performance of representativeness models based on both human judgements and user-click logs of commercial image search engine. Experimental results on both ImageNet and image sets of general concepts demonstrate the superior performance of our model against the state-of-the-arts.
3 0.10671063 220 cvpr-2013-In Defense of Sparsity Based Face Recognition
Author: Weihong Deng, Jiani Hu, Jun Guo
Abstract: The success of sparse representation based classification (SRC) has largely boosted the research of sparsity based face recognition in recent years. A prevailing view is that the sparsity based face recognition performs well only when the training images have been carefully controlled and the number of samples per class is sufficiently large. This paper challenges the prevailing view by proposing a “prototype plus variation ” representation model for sparsity based face recognition. Based on the new model, a Superposed SRC (SSRC), in which the dictionary is assembled by the class centroids and the sample-to-centroid differences, leads to a substantial improvement on SRC. The experiments results on AR, FERET and FRGC databases validate that, if the proposed prototype plus variation representation model is applied, sparse coding plays a crucial role in face recognition, and performs well even when the dictionary bases are collected under uncontrolled conditions and only a single sample per classes is available.
4 0.073868737 39 cvpr-2013-Alternating Decision Forests
Author: Samuel Schulter, Paul Wohlhart, Christian Leistner, Amir Saffari, Peter M. Roth, Horst Bischof
Abstract: This paper introduces a novel classification method termed Alternating Decision Forests (ADFs), which formulates the training of Random Forests explicitly as a global loss minimization problem. During training, the losses are minimized via keeping an adaptive weight distribution over the training samples, similar to Boosting methods. In order to keep the method as flexible and general as possible, we adopt the principle of employing gradient descent in function space, which allows to minimize arbitrary losses. Contrary to Boosted Trees, in our method the loss minimization is an inherent part of the tree growing process, thus allowing to keep the benefits ofcommon Random Forests, such as, parallel processing. We derive the new classifier and give a discussion and evaluation on standard machine learning data sets. Furthermore, we show how ADFs can be easily integrated into an object detection application. Compared to both, standard Random Forests and Boosted Trees, ADFs give better performance in our experiments, while yielding more compact models in terms of tree depth.
5 0.063105561 262 cvpr-2013-Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets
Author: Aurélien Lucchi, Yunpeng Li, Pascal Fua
Abstract: We propose a working set based approximate subgradient descent algorithm to minimize the margin-sensitive hinge loss arising from the soft constraints in max-margin learning frameworks, such as the structured SVM. We focus on the setting of general graphical models, such as loopy MRFs and CRFs commonly used in image segmentation, where exact inference is intractable and the most violated constraints can only be approximated, voiding the optimality guarantees of the structured SVM’s cutting plane algorithm as well as reducing the robustness of existing subgradient based methods. We show that the proposed method obtains better approximate subgradients through the use of working sets, leading to improved convergence properties and increased reliability. Furthermore, our method allows new constraints to be randomly sampled instead of computed using the more expensive approximate inference techniques such as belief propagation and graph cuts, which can be used to reduce learning time at only a small cost of performance. We demonstrate the strength of our method empirically on the segmentation of a new publicly available electron microscopy dataset as well as the popular MSRC data set and show state-of-the-art results.
6 0.061971154 67 cvpr-2013-Blocks That Shout: Distinctive Parts for Scene Classification
7 0.060494855 143 cvpr-2013-Efficient Large-Scale Structured Learning
8 0.059324097 388 cvpr-2013-Semi-supervised Learning of Feature Hierarchies for Object Detection in a Video
9 0.058447942 389 cvpr-2013-Semi-supervised Learning with Constraints for Person Identification in Multimedia Data
10 0.057012714 134 cvpr-2013-Discriminative Sub-categorization
11 0.056619003 371 cvpr-2013-SCaLE: Supervised and Cascaded Laplacian Eigenmaps for Visual Object Recognition Based on Nearest Neighbors
12 0.053915855 237 cvpr-2013-Kernel Learning for Extrinsic Classification of Manifold Features
13 0.05322846 244 cvpr-2013-Large Displacement Optical Flow from Nearest Neighbor Fields
14 0.052974001 260 cvpr-2013-Learning and Calibrating Per-Location Classifiers for Visual Place Recognition
15 0.051883433 179 cvpr-2013-From N to N+1: Multiclass Transfer Incremental Learning
16 0.051642347 223 cvpr-2013-Inductive Hashing on Manifolds
17 0.051028799 142 cvpr-2013-Efficient Detector Adaptation for Object Detection in a Video
18 0.049533531 407 cvpr-2013-Spatio-temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera
19 0.047453661 319 cvpr-2013-Optimized Product Quantization for Approximate Nearest Neighbor Search
20 0.046251513 421 cvpr-2013-Supervised Kernel Descriptors for Visual Recognition
topicId topicWeight
[(0, 0.125), (1, -0.033), (2, -0.03), (3, 0.019), (4, 0.037), (5, 0.021), (6, -0.024), (7, -0.019), (8, -0.04), (9, -0.02), (10, -0.024), (11, -0.008), (12, -0.015), (13, -0.007), (14, -0.044), (15, -0.017), (16, -0.026), (17, -0.031), (18, 0.023), (19, -0.023), (20, -0.01), (21, -0.008), (22, 0.008), (23, 0.016), (24, 0.006), (25, 0.054), (26, -0.029), (27, 0.045), (28, 0.001), (29, -0.012), (30, -0.026), (31, 0.046), (32, 0.007), (33, -0.006), (34, -0.013), (35, -0.06), (36, -0.066), (37, -0.021), (38, -0.077), (39, -0.069), (40, -0.087), (41, -0.064), (42, -0.016), (43, 0.041), (44, 0.064), (45, 0.059), (46, -0.004), (47, 0.002), (48, 0.086), (49, 0.128)]
simIndex simValue paperId paperTitle
same-paper 1 0.90437526 320 cvpr-2013-Optimizing 1-Nearest Prototype Classifiers
Author: Paul Wohlhart, Martin Köstinger, Michael Donoser, Peter M. Roth, Horst Bischof
Abstract: The development of complex, powerful classifiers and their constant improvement have contributed much to the progress in many fields of computer vision. However, the trend towards large scale datasets revived the interest in simpler classifiers to reduce runtime. Simple nearest neighbor classifiers have several beneficial properties, such as low complexity and inherent multi-class handling, however, they have a runtime linear in the size of the database. Recent related work represents data samples by assigning them to a set of prototypes that partition the input feature space and afterwards applies linear classifiers on top of this representation to approximate decision boundaries locally linear. In this paper, we go a step beyond these approaches and purely focus on 1-nearest prototype classification, where we propose a novel algorithm for deriving optimal prototypes in a discriminative manner from the training samples. Our method is implicitly multi-class capable, parameter free, avoids noise overfitting and, since during testing only comparisons to the derived prototypes are required, highly efficient. Experiments demonstrate that we are able to outperform related locally linear methods, while even getting close to the results of more complex classifiers.
2 0.67464536 157 cvpr-2013-Exploring Implicit Image Statistics for Visual Representativeness Modeling
Author: Xiaoshuai Sun, Xin-Jing Wang, Hongxun Yao, Lei Zhang
Abstract: In this paper, we propose a computational model of visual representativeness by integrating cognitive theories of representativeness heuristics with computer vision and machine learning techniques. Unlike previous models that build their representativeness measure based on the visible data, our model takes the initial inputs as explicit positive reference and extend the measure by exploring the implicit negatives. Given a group of images that contains obvious visual concepts, we create a customized image ontology consisting of both positive and negative instances by mining the most related and confusable neighbors of the positive concept in ontological semantic knowledge bases. The representativeness of a new item is then determined by its likelihoods for both the positive and negative references. To ensure the effectiveness of probability inference as well as the cognitive plausibility, we discover the potential prototypes and treat them as an intermediate representation of semantic concepts. In the experiment, we evaluate the performance of representativeness models based on both human judgements and user-click logs of commercial image search engine. Experimental results on both ImageNet and image sets of general concepts demonstrate the superior performance of our model against the state-of-the-arts.
3 0.67134511 134 cvpr-2013-Discriminative Sub-categorization
Author: Minh Hoai, Andrew Zisserman
Abstract: The objective of this work is to learn sub-categories. Rather than casting this as a problem of unsupervised clustering, we investigate a weakly supervised approach using both positive and negative samples of the category. We make the following contributions: (i) we introduce a new model for discriminative sub-categorization which determines cluster membership for positive samples whilst simultaneously learning a max-margin classifier to separate each cluster from the negative samples; (ii) we show that this model does not suffer from the degenerate cluster problem that afflicts several competing methods (e.g., Latent SVM and Max-Margin Clustering); (iii) we show that the method is able to discover interpretable sub-categories in various datasets. The model is evaluated experimentally over various datasets, and itsperformance advantages over k-means and Latent SVM are demonstrated. We also stress test the model and show its resilience in discovering sub-categories as the parameters are varied.
4 0.64894682 39 cvpr-2013-Alternating Decision Forests
Author: Samuel Schulter, Paul Wohlhart, Christian Leistner, Amir Saffari, Peter M. Roth, Horst Bischof
Abstract: This paper introduces a novel classification method termed Alternating Decision Forests (ADFs), which formulates the training of Random Forests explicitly as a global loss minimization problem. During training, the losses are minimized via keeping an adaptive weight distribution over the training samples, similar to Boosting methods. In order to keep the method as flexible and general as possible, we adopt the principle of employing gradient descent in function space, which allows to minimize arbitrary losses. Contrary to Boosted Trees, in our method the loss minimization is an inherent part of the tree growing process, thus allowing to keep the benefits ofcommon Random Forests, such as, parallel processing. We derive the new classifier and give a discussion and evaluation on standard machine learning data sets. Furthermore, we show how ADFs can be easily integrated into an object detection application. Compared to both, standard Random Forests and Boosted Trees, ADFs give better performance in our experiments, while yielding more compact models in terms of tree depth.
5 0.59746665 403 cvpr-2013-Sparse Output Coding for Large-Scale Visual Recognition
Author: Bin Zhao, Eric P. Xing
Abstract: Many vision tasks require a multi-class classifier to discriminate multiple categories, on the order of hundreds or thousands. In this paper, we propose sparse output coding, a principled way for large-scale multi-class classification, by turning high-cardinality multi-class categorization into a bit-by-bit decoding problem. Specifically, sparse output coding is composed of two steps: efficient coding matrix learning with scalability to thousands of classes, and probabilistic decoding. Empirical results on object recognition and scene classification demonstrate the effectiveness ofour proposed approach.
6 0.59530139 15 cvpr-2013-A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration
7 0.59504825 239 cvpr-2013-Kernel Null Space Methods for Novelty Detection
8 0.59282804 143 cvpr-2013-Efficient Large-Scale Structured Learning
9 0.58864921 34 cvpr-2013-Adaptive Active Learning for Image Classification
10 0.58470595 261 cvpr-2013-Learning by Associating Ambiguously Labeled Images
11 0.57963234 168 cvpr-2013-Fast Object Detection with Entropy-Driven Evaluation
12 0.57555467 8 cvpr-2013-A Fast Approximate AIB Algorithm for Distributional Word Clustering
13 0.56987214 377 cvpr-2013-Sample-Specific Late Fusion for Visual Category Recognition
14 0.56859523 390 cvpr-2013-Semi-supervised Node Splitting for Random Forest Construction
15 0.56190985 201 cvpr-2013-Heterogeneous Visual Features Fusion via Sparse Multimodal Machine
17 0.54547399 41 cvpr-2013-An Iterated L1 Algorithm for Non-smooth Non-convex Optimization in Computer Vision
18 0.52826065 173 cvpr-2013-Finding Things: Image Parsing with Regions and Per-Exemplar Detectors
19 0.52483094 406 cvpr-2013-Spatial Inference Machines
20 0.52224648 262 cvpr-2013-Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets
topicId topicWeight
[(10, 0.121), (16, 0.034), (26, 0.052), (28, 0.032), (33, 0.266), (41, 0.198), (67, 0.05), (69, 0.061), (87, 0.095)]
simIndex simValue paperId paperTitle
same-paper 1 0.87003374 320 cvpr-2013-Optimizing 1-Nearest Prototype Classifiers
Author: Paul Wohlhart, Martin Köstinger, Michael Donoser, Peter M. Roth, Horst Bischof
Abstract: The development of complex, powerful classifiers and their constant improvement have contributed much to the progress in many fields of computer vision. However, the trend towards large scale datasets revived the interest in simpler classifiers to reduce runtime. Simple nearest neighbor classifiers have several beneficial properties, such as low complexity and inherent multi-class handling, however, they have a runtime linear in the size of the database. Recent related work represents data samples by assigning them to a set of prototypes that partition the input feature space and afterwards applies linear classifiers on top of this representation to approximate decision boundaries locally linear. In this paper, we go a step beyond these approaches and purely focus on 1-nearest prototype classification, where we propose a novel algorithm for deriving optimal prototypes in a discriminative manner from the training samples. Our method is implicitly multi-class capable, parameter free, avoids noise overfitting and, since during testing only comparisons to the derived prototypes are required, highly efficient. Experiments demonstrate that we are able to outperform related locally linear methods, while even getting close to the results of more complex classifiers.
2 0.86175084 81 cvpr-2013-City-Scale Change Detection in Cadastral 3D Models Using Images
Author: Aparna Taneja, Luca Ballan, Marc Pollefeys
Abstract: In this paper, we propose a method to detect changes in the geometry of a city using panoramic images captured by a car driving around the city. We designed our approach to account for all the challenges involved in a large scale application of change detection, such as, inaccuracies in the input geometry, errors in the geo-location data of the images, as well as, the limited amount of information due to sparse imagery. We evaluated our approach on an area of 6 square kilometers inside a city, using 3420 images downloaded from Google StreetView. These images besides being publicly available, are also a good example of panoramic images captured with a driving vehicle, and hence demonstrating all the possible challenges resulting from such an acquisition. We also quantitatively compared the performance of our approach with respect to a ground truth, as well as to prior work. This evaluation shows that our approach outperforms the current state of the art.
3 0.85921884 220 cvpr-2013-In Defense of Sparsity Based Face Recognition
Author: Weihong Deng, Jiani Hu, Jun Guo
Abstract: The success of sparse representation based classification (SRC) has largely boosted the research of sparsity based face recognition in recent years. A prevailing view is that the sparsity based face recognition performs well only when the training images have been carefully controlled and the number of samples per class is sufficiently large. This paper challenges the prevailing view by proposing a “prototype plus variation ” representation model for sparsity based face recognition. Based on the new model, a Superposed SRC (SSRC), in which the dictionary is assembled by the class centroids and the sample-to-centroid differences, leads to a substantial improvement on SRC. The experiments results on AR, FERET and FRGC databases validate that, if the proposed prototype plus variation representation model is applied, sparse coding plays a crucial role in face recognition, and performs well even when the dictionary bases are collected under uncontrolled conditions and only a single sample per classes is available.
4 0.83962971 365 cvpr-2013-Robust Real-Time Tracking of Multiple Objects by Volumetric Mass Densities
Author: Horst Possegger, Sabine Sternig, Thomas Mauthner, Peter M. Roth, Horst Bischof
Abstract: Combining foreground images from multiple views by projecting them onto a common ground-plane has been recently applied within many multi-object tracking approaches. These planar projections introduce severe artifacts and constrain most approaches to objects moving on a common 2D ground-plane. To overcome these limitations, we introduce the concept of an occupancy volume exploiting the full geometry and the objects ’ center of mass and develop an efficient algorithm for 3D object tracking. Individual objects are tracked using the local mass density scores within a particle filter based approach, constrained by a Voronoi partitioning between nearby trackers. Our method benefits from the geometric knowledge given by the occupancy volume to robustly extract features and train classifiers on-demand, when volumetric information becomes unreliable. We evaluate our approach on several challenging real-world scenarios including the public APIDIS dataset. Experimental evaluations demonstrate significant improvements compared to state-of-theart methods, while achieving real-time performance. – –
5 0.83616388 61 cvpr-2013-Beyond Point Clouds: Scene Understanding by Reasoning Geometry and Physics
Author: Bo Zheng, Yibiao Zhao, Joey C. Yu, Katsushi Ikeuchi, Song-Chun Zhu
Abstract: In this paper, we present an approach for scene understanding by reasoning physical stability of objects from point cloud. We utilize a simple observation that, by human design, objects in static scenes should be stable with respect to gravity. This assumption is applicable to all scene categories and poses useful constraints for the plausible interpretations (parses) in scene understanding. Our method consists of two major steps: 1) geometric reasoning: recovering solid 3D volumetric primitives from defective point cloud; and 2) physical reasoning: grouping the unstable primitives to physically stable objects by optimizing the stability and the scene prior. We propose to use a novel disconnectivity graph (DG) to represent the energy landscape and use a Swendsen-Wang Cut (MCMC) method for optimization. In experiments, we demonstrate that the algorithm achieves substantially better performance for i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) better parsing result for scene understanding in comparison to state-of-the-art methods in both public dataset and our own new dataset.
6 0.83591688 248 cvpr-2013-Learning Collections of Part Models for Object Recognition
8 0.83296579 242 cvpr-2013-Label Propagation from ImageNet to 3D Point Clouds
9 0.83248615 98 cvpr-2013-Cross-View Action Recognition via a Continuous Virtual Path
10 0.83179069 227 cvpr-2013-Intrinsic Scene Properties from a Single RGB-D Image
11 0.83150291 331 cvpr-2013-Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
12 0.83135742 446 cvpr-2013-Understanding Indoor Scenes Using 3D Geometric Phrases
13 0.83113503 4 cvpr-2013-3D Visual Proxemics: Recognizing Human Interactions in 3D from a Single Image
14 0.83100641 225 cvpr-2013-Integrating Grammar and Segmentation for Human Pose Estimation
15 0.83045268 303 cvpr-2013-Multi-view Photometric Stereo with Spatially Varying Isotropic Materials
16 0.83037049 155 cvpr-2013-Exploiting the Power of Stereo Confidences
17 0.82996523 372 cvpr-2013-SLAM++: Simultaneous Localisation and Mapping at the Level of Objects
18 0.82987058 445 cvpr-2013-Understanding Bayesian Rooms Using Composite 3D Object Models
19 0.82945347 143 cvpr-2013-Efficient Large-Scale Structured Learning
20 0.82936889 71 cvpr-2013-Boundary Cues for 3D Object Shape Recovery