nips nips2010 nips2010-132 knowledge-graph by maker-knowledge-mining

132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers


Source: pdf

Author: Leonidas Lefakis, Francois Fleuret

Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. [sent-5, score-0.815]

2 The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. [sent-6, score-0.149]

3 We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. [sent-7, score-0.942]

4 We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. [sent-10, score-0.261]

5 1 Introduction Object detection remains one of the core objectives of computer vision, either as an objective per se, for instance for automatic focusing on faces in digital cameras, or as a means to get a high-level understanding of natural scenes for robotics and image retrieval. [sent-11, score-0.209]

6 The computational cost of such approaches is traditionally controlled with a cascade, that is, a succession of classifiers, each one being evaluated only if the previous ones in the sequence have not already rejected the candidate location. [sent-14, score-0.102]

7 In its original form, this approach constructs classifiers one after another during training, each one from examples which have not been rejected by the previous ones. [sent-16, score-0.108]

8 Finally, the third drawback is the inability of a standard cascade to properly exploit the trade-off between the different levels. [sent-21, score-0.686]

9 A response marginally below threshold at a certain level is enough to reject a sample, even if classifiers at other levels have strong responses. [sent-22, score-0.128]

10 At a more conceptual level, standard training for cascades does not allow the classifiers to exploit their joint modeling: Each classifier is trained as if it has to do the job alone, without having the opportunity to properly balance its own modeling effort and that of the other classifiers. [sent-23, score-0.221]

11 We interpret the individual responses of the classifiers as probabilities of responding positively, and define the overall response of the cascade as the probability of all the classifiers responding positively under an assumption of independence. [sent-25, score-0.863]

12 This noisy-AND model leads to a very simple criterion for a new Boosting procedure, which improves all the classifiers symmetrically on the positive samples, and focuses on improving the classifier with the best response on every negative sample. [sent-27, score-0.217]

13 We demonstrate the efficiency of this technique for face and pedestrian detection. [sent-28, score-0.178]

14 Experiments show that this joint cascade learning requires far fewer negative training examples, and achieves better performance than standard cascades without the need for intensive bootstrapping. [sent-29, score-0.922]

15 The idea common to these approaches is to rely on a form of adaptive testing: only candidates which cannot be trivially rejected as not being the object of interest will require heavy computation. [sent-32, score-0.147]

16 1 Reducing object detection computational cost Heisele et al. [sent-35, score-0.168]

17 [1] propose a hierarchy of linear Support Vector Machines, each trained on images of increasing resolution, to weed out background patches, followed by a final computationally intensive polynomial SVM. [sent-36, score-0.137]

18 Fleuret and Geman [5] introduce a hierarchy of classifiers dedicated to positive populations with geometrical poses of decreasing randomness. [sent-40, score-0.1]

19 This approach generalizes the cascade to more complex pose spaces, but as for cascades, trains the classifiers separately. [sent-41, score-0.667]

20 In [6] a branch and bound approach is utilized during scanning, while in [7] a divide and conquer approach is proposed, wherein regions in the image are either accepted or rejected as a whole or split and further processed. [sent-43, score-0.124]

21 The most popular approach, however, for both its conceptual simplicity and practical efficiency, is the attentional cascade proposed by Viola and Jones [10]. [sent-45, score-0.715]

22 2 Improving attentional cascades In recent years approaches have been proposed that address some of the issues we list in the introduction. [sent-48, score-0.121]

23 In [14] the authors train a cascade with a global performance criterion and a single set of parameters common to all stages. [sent-49, score-0.729]

24 In [15] the authors address the asymmetric nature of the stage goals via a biased minimax probability machine, while in [16] the authors formulate the stage goals as a constrained optimization problem. [sent-50, score-0.236]

25 In [17] an alternate boosting method dubbed FloatBoost is proposed. [sent-51, score-0.103]

26 During training, fk^t(x) stands for that response after t steps of Boosting. [sent-59, score-0.17]

27 pk(x) = 1 / (1 + exp(−fk(x))) is the probability of classifier k to respond positively on x. [sent-60, score-0.181]

28 During training, pk^t(x) stands for the same value after t steps of Boosting, computed from fk^t(x). [sent-61, score-0.23]

29 During training, p^t(x) is that value after only t steps of Boosting, computed from the pk^t(x). [sent-63, score-0.298]

30 Sochman and Matas [18] presented a Boosting algorithm based on sequential probability ratio tests, minimizing the average evaluation time subject to upper bounds on the false negative and false positive rates. [sent-64, score-0.182]

31 A general framework for probabilistic boosting trees (of which cascades are a degenerated case) was proposed in [19]. [sent-65, score-0.2]

32 In all these methods however, a set of free parameters concerning detection and false alarm performances must be set during training. [sent-66, score-0.167]

33 The authors in [20] use the output of each stage as an initial weak classifier of the boosting classifier in the next stage. [sent-68, score-0.229]

34 This allows the cascade to retain information between stages. [sent-69, score-0.667]

35 No information concerning the future performance of the cascade is available to each stage. [sent-71, score-0.716]

36 In [21] sample traces are utilized to keep track of the performance of the cascade on the training data, and thresholds are picked after the cascade training is finished. [sent-72, score-1.498]

37 However besides a validation set, a large number of negative examples must also be bootstrapped not only during the training phase, but also during the post-processing step of threshold and order calibration. [sent-74, score-0.176]

38 In [22] the authors attempt to jointly optimize a cascade of SVMs. [sent-77, score-0.727]

39 As can be seen, a cascade effectively performs an AND operation over the data, enforcing that a positive example pass all stages and that a negative example be rejected by at least one stage. [sent-78, score-0.842]

40 In order to simulate this behavior, the authors attempt to minimize the maximum hinge loss over the SVMs for the positive examples, and to minimize the product of the hinge losses for the negative examples. [sent-79, score-0.134]
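For concreteness, a minimal sketch of the criterion described in [22], assuming per-stage margins y·fk(x) are given as a list; the function name and representation are illustrative, not taken from that paper's code:

```python
def joint_svm_cascade_loss(stage_margins, label):
    # stage_margins: list of margins y * f_k(x), one per SVM stage.
    hinges = [max(0.0, 1.0 - m) for m in stage_margins]
    if label == 1:
        # A positive example must pass every stage: penalize the worst stage.
        return max(hinges)
    # A negative example only needs one stage to reject it:
    # the product vanishes as soon as any single hinge loss reaches zero.
    prod = 1.0
    for h in hinges:
        prod *= h
    return prod
```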

41 In [23] the authors present a method similar to ours, jointly optimizing a cascade using the product of the output of individual logistic regression base classifiers. [sent-81, score-0.727]

42 As is the case with the work in [22], the authors consider the ordering of the stages a priori fixed. [sent-83, score-0.134]

43 3 Method Our approach can be interpreted as a noisy-AND: The classifiers in the cascade produce stochastic Boolean predictions, conditionally independent given the signal to classify. [sent-84, score-0.667]

44 We define the global response of the cascade as the probability that all these predictions are positive. [sent-85, score-0.779]

45 This can be interpreted as if we were first computing from the signal x, for each classifier in the cascade, a probability pk(x), and defining the response of the cascade as the probability that K independent Bernoulli variables of parameters p1(x), ..., pK(x) are all equal to 1. [sent-86, score-0.816]

46 However, their approach aims at decomposing a complex population into a collection of homogeneous populations, while our objective is to speed up the computation for the detection of a homogeneous population. [sent-92, score-0.108]

47 1 Formalization Let fk (x) stand for the non-thresholded response of the classifier at level k of the cascade. [sent-95, score-0.17]

48 From that, we define the final output of the cascade as the probability that all classifiers make positive predictions, under the assumption that they are conditionally independent given x: p(x) = ∏_{k=1}^{K} pk(x). [sent-97, score-0.755]
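A minimal sketch of this noisy-AND response, assuming the per-stage non-thresholded scores fk(x) are available as a list; function and variable names are illustrative:

```python
import math

def stage_probability(f_k):
    # Sigmoid of the non-thresholded stage response f_k(x).
    return 1.0 / (1.0 + math.exp(-f_k))

def cascade_response(stage_scores):
    # Noisy-AND: probability that every stage predicts positive,
    # under conditional independence given the input x.
    p = 1.0
    for f_k in stage_scores:
        p *= stage_probability(f_k)
    return p
```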

49 Conversely the example will be classified as negative if pk (x) = 0 for at least one k. [sent-99, score-0.125]

50 In order to train our cascade we consider the maximization of the joint likelihood of the data: ∏_n p(xn)^{yn} (1 − p(xn))^{1−yn}. [sent-107, score-0.712]
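A sketch of the corresponding training criterion written as a negative log-likelihood; the epsilon guard is an assumption added for numerical safety, not something stated in the paper:

```python
import math

def negative_log_likelihood(cascade_probs, labels, eps=1e-12):
    # cascade_probs[n] = p(x_n); labels[n] is 1 for positive, 0 for negative.
    loss = 0.0
    for p, y in zip(cascade_probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss
```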

51 It should be noted that in this formulation the weights wn^{k,t} (equation (6)), obtained as the derivatives of the log-likelihood with respect to fk(xn), are signed, and those assigned to negative examples are negative. [sent-110, score-0.317]

52 In the case of a positive example xn this simplifies to wn^{k,t} = 1 − pk^t(xn), and thus this criterion pushes every classifier in the cascade to maximize the response on positive samples, irrespective of the performance of the overall cascade. [sent-111, score-1.203]

53 In the case of a negative example however, the weight update rule becomes wn^{k,t} = −(p^t(xn) / (1 − p^t(xn))) (1 − pk^t(xn)); each classifier in the cascade is then passed information regarding the overall performance via the term −p^t(xn) / (1 − p^t(xn)). [sent-112, score-0.985]

54 If the cascade is already rejecting the negative example, then this term becomes 0 and the classifier ignores its performance on the specific example. [sent-113, score-0.732]

55 On the other hand, if the cascade is performing poorly, then the term becomes increasingly large and the classifiers put large weights on that example. [sent-114, score-0.667]

56 Furthermore, due to the term 1 − pk^t(xn), each classifier puts larger weight on negative examples that it is already performing well on, effectively partitioning the space of negative examples. [sent-115, score-0.305]
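Putting the two update rules together, a hedged sketch of the per-stage weights for one example, assuming the stage probabilities pk^t(xn) have already been computed; names are illustrative, and the sign convention follows the text, so negative-example weights come out negative:

```python
def boosting_weights(stage_probs, label, eps=1e-12):
    # stage_probs: [p_1^t(x_n), ..., p_K^t(x_n)] for one example x_n.
    p_total = 1.0
    for p_k in stage_probs:
        p_total *= p_k
    p_total = min(p_total, 1.0 - eps)  # guard the division below
    weights = []
    for p_k in stage_probs:
        if label == 1:
            # Positive example: push every stage up, independently of the others.
            weights.append(1.0 - p_k)
        else:
            # Negative example: scaled by how poorly the whole cascade does
            # (-p/(1-p)) and by how well this stage already does (1 - p_k).
            weights.append(-p_total / (1.0 - p_total) * (1.0 - p_k))
    return weights
```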

57 1 Implementation Details We comparatively evaluate the proposed cascade framework on two data-sets. [sent-122, score-0.667]

58 In [10] the authors present an initial comparison between their cascade framework and an AdaBoost classifier on the CMU-MIT data-set. [sent-123, score-0.708]

59 They train the monolithic classifier for 200 rounds and compare it against a simple cascade containing ten stages, each with 20 weak learners. [sent-124, score-0.766]

60 As cascade architecture plays an important role in the final performance of the cascade, and in order to avoid any issues in the comparison pertaining to architectural designs, we keep this structure and evaluate both the proposed cascade and the Viola and Jones cascade, using this architecture. [sent-125, score-1.334]

61 During the training, the thresholds for each stage in the Viola and Jones cascade are set to achieve a 99. [sent-127, score-0.755]

62 We experimented with bootstrapping a fixed number M of negative examples at fixed intervals, similar to [21] and attained higher performance than the one presented here. [sent-130, score-0.143]

63 JointCascade Augmented is the same, but is trained with as many negative examples as the total number used by the Viola and Jones cascade, and JointCascade Exponential uses the same number of negative samples as the basic setting, but uses the exponential version of the loss described in § 3. [sent-133, score-0.225]

64 1 Data-Sets Pedestrians For pedestrian detection we use the INRIA pedestrian data-set [25], which contains pedestrian images of various poses with high variance concerning background and lighting. [sent-138, score-0.564]

65 The training set consists of 1239 images of pedestrians as positive examples, and 12180 negative examples, mined from 1218 pedestrian-free images. [sent-139, score-0.308]

66 Of these we keep 900 images for training (together with their mirror images, for a total of 1800) and 9000 negative examples. [sent-140, score-0.209]

67 The remaining images in the original training set are put aside to be used as a validation set by the Viola and Jones cascade. [sent-141, score-0.149]

68 The trained classifiers are then tested on a test set composed of 1126 images of pedestrians and 18120 non-pedestrian images. [sent-144, score-0.158]

69 For training we use the same data-set as that used by Viola and Jones consisting of 4916 images of faces. [sent-149, score-0.112]

70 Of these we use 4000 (plus their mirror images) for training and set apart a further 916 (plus mirror images) for use as the validation set needed by the classical cascade approach. [sent-150, score-0.836]

71 The negative portion of the training set is comprised of 10000 non-face images, mined randomly from non-face containing images. [sent-151, score-0.15]

72 In order to test the trained classifiers, we extract the 507 faces in the data-set and scale-normalize them to 24x24 images; a further 12700 non-face image patches are extracted from the background of the images in the data-set. [sent-152, score-0.191]

73 3 Bootstrap Images As, during training, the Viola and Jones cascade needs to bootstrap false positive examples after each stage, we randomly mine a data-set of approximately 7000 images from the web. [sent-156, score-0.844]

74 These images have been manually inspected to ensure that they do not contain either faces or pedestrians. [sent-157, score-0.14]

75 These images are used for bootstrapping in both sets of experiments. [sent-158, score-0.116]

76 3 Error rate The evaluation on the face data-set can be seen in Figure 1. [sent-160, score-0.11]

77 The ROC curves for the pedestrian detection task can be seen in Figure 2. [sent-164, score-0.197]

78 4 Optimization of the evaluation order As stated, one of the main motivations for using cascades is speed. [sent-171, score-0.116]

79 We compare the average number of stages visited per negative example for the various methods presented. [sent-172, score-0.135]

80 Typically in cascade training, the thresholds and orders of the various stages must be determined during training, either by setting them in an ad hoc manner or by using one of the many optimization schemes proposed. [sent-173, score-0.786]

81 In our case however, any decision concerning the thresholds as well as the ordering of the stages can be postponed till after training. [sent-174, score-0.191]

82 It is easy to derive, for any given detection goal, a relevant threshold θ on the overall cascade response. [sent-175, score-0.777]

83 Subsequently the image patch will be rejected if the product of any subset of strong classifiers has a value smaller than θ. [sent-177, score-0.105]
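As an illustration of this rejection rule, a sketch of the resulting lazy evaluation at test time: because each pk(x) is at most 1, the running product can only decrease, so a patch can be rejected as soon as the prefix product drops below θ. The function names are assumptions, not the authors' implementation:

```python
def evaluate_cascade(stage_prob_fns, x, theta):
    # Lazily multiply the per-stage probabilities and stop as soon as
    # the running product falls below the global threshold theta.
    p = 1.0
    for prob_fn in stage_prob_fns:
        p *= prob_fn(x)
        if p < theta:
            return False  # rejected without evaluating the remaining stages
    return True           # accepted: p(x) >= theta
```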

84 Based on this we use a greedy method to evaluate, using the original training set, the optimal order of classifiers as follows: originally, we chose as the first stage in our cascade the classifier whose response falls below θ for the largest number of negative examples. [sent-178, score-0.114]

[Figures 1 and 2: ROC curves, true-positive rate vs. false-positive rate, on the face and pedestrian data-sets for Non-cascade AdaBoost, the VJ cascade, JointCascade, JointCascade Augmented, and JointCascade Exponential.]

88 The JointCascade variants require marginally more operations at a fixed rate on the pedestrian population, and marginally less on the faces except at very conservative rates. [sent-207, score-0.315]

89 We then iteratively add to the order of the cascade that classifier which leads to a response smaller than θ for the most negative examples when multiplied with the aggregated response of the stages already ordered in the cascade. [sent-250, score-0.313]

90 As stated, this ordering of the cascade stages is computed using the training set. [sent-251, score-0.808]
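A sketch of this greedy ordering under the stated assumptions, with per-stage probabilities precomputed on the training negatives and a fixed global threshold θ; names and data layout are illustrative:

```python
def greedy_stage_order(neg_probs, theta):
    # neg_probs[k][n] = p_k(x_n) for stage k on training negative example n.
    num_stages = len(neg_probs)
    num_negs = len(neg_probs[0])
    remaining = set(range(num_stages))
    running = [1.0] * num_negs        # aggregated response of stages ordered so far
    order = []
    while remaining:
        # Pick the stage that, once multiplied in, pushes the most
        # negatives below theta.
        best_k = max(
            remaining,
            key=lambda k: sum(running[n] * neg_probs[k][n] < theta
                              for n in range(num_negs)),
        )
        order.append(best_k)
        remaining.remove(best_k)
        running = [running[n] * neg_probs[best_k][n] for n in range(num_negs)]
    return order
```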

91 We then measure the speed of our ordered cascade on the same test sets as above, as shown on Table 2. [sent-252, score-0.692]

92 As can be seen, in the case of the face dataset, in almost all cases our approach is actually faster during scanning than the classical Viola and Jones approach. [sent-253, score-0.129]

93 The speed of our JointCascade approach on the pedestrian data-set is marginally worse than that of Viola and Jones, which is due to the lower false-positive rates. [sent-255, score-0.178]

94 5 Conclusion We have presented a new criterion to train a cascade of classifiers in a joint manner. [sent-256, score-0.747]

95 This approach has a clear probabilistic interpretation as a noisy-AND, and leads to a global decision criterion which avoids thresholding classifiers individually, and can exploit independence in the classifier response amplitudes. [sent-257, score-0.124]

96 Hierarchical classification and feature reduction for fast face detection with support vector machines. [sent-274, score-0.147]

97 Rapid object detection using a boosted cascade of simple features. [sent-314, score-0.851]

98 Fast human detection using a cascade of histograms of oriented gradients. [sent-321, score-0.777]

99 On the design of cascades of boosted ensembles for face detection. [sent-330, score-0.197]

100 MCBoost: Multiple classifier boosting for perceptual co-clustering of images and visual features. [sent-374, score-0.167]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('cascade', 0.667), ('jointcascade', 0.387), ('viola', 0.198), ('jones', 0.17), ('ers', 0.162), ('pt', 0.149), ('classi', 0.139), ('pedestrian', 0.114), ('er', 0.104), ('xn', 0.103), ('boosting', 0.103), ('cascades', 0.097), ('response', 0.089), ('detection', 0.083), ('adaboost', 0.082), ('rejected', 0.082), ('fk', 0.081), ('wn', 0.077), ('faces', 0.076), ('cascaded', 0.07), ('stages', 0.07), ('pedestrians', 0.066), ('vision', 0.065), ('negative', 0.065), ('object', 0.065), ('face', 0.064), ('images', 0.064), ('pk', 0.06), ('fleuret', 0.055), ('bootstrapping', 0.052), ('thresholds', 0.049), ('concerning', 0.049), ('training', 0.048), ('recognition', 0.047), ('augmented', 0.047), ('weak', 0.046), ('scanning', 0.045), ('authors', 0.041), ('exponential', 0.041), ('stage', 0.039), ('marginally', 0.039), ('validation', 0.037), ('pattern', 0.037), ('heisele', 0.037), ('martigny', 0.037), ('mined', 0.037), ('sochman', 0.037), ('boosted', 0.036), ('criterion', 0.035), ('false', 0.035), ('floatboost', 0.032), ('brubaker', 0.032), ('idiap', 0.032), ('monolithic', 0.032), ('mullin', 0.032), ('positively', 0.032), ('mirror', 0.032), ('vj', 0.031), ('cyclic', 0.03), ('graf', 0.03), ('goals', 0.028), ('positive', 0.028), ('trained', 0.028), ('overall', 0.027), ('rate', 0.027), ('computer', 0.027), ('oriented', 0.027), ('sliding', 0.026), ('subwindow', 0.026), ('examples', 0.026), ('poses', 0.026), ('speed', 0.025), ('boolean', 0.025), ('yn', 0.025), ('attentional', 0.024), ('responding', 0.024), ('bootstrap', 0.024), ('switzerland', 0.024), ('concentrates', 0.024), ('conceptual', 0.024), ('joint', 0.024), ('hierarchy', 0.024), ('predictions', 0.023), ('ordering', 0.023), ('pages', 0.023), ('image', 0.023), ('populations', 0.022), ('par', 0.022), ('intensive', 0.021), ('conference', 0.021), ('train', 0.021), ('nal', 0.02), ('cost', 0.02), ('classical', 0.02), ('conservative', 0.02), ('asymmetric', 0.02), ('evaluation', 0.019), ('drawback', 0.019), ('utilized', 0.019), ('jointly', 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers

Author: Leonidas Lefakis, Francois Fleuret

Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1

2 0.47915101 42 nips-2010-Boosting Classifier Cascades

Author: Nuno Vasconcelos, Mohammad J. Saberian

Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1

3 0.29392388 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades

Author: David Weiss, Benjamin Sapp, Ben Taskar

Abstract: For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high-treewidth and large state-spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem. 1

4 0.2146039 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

Author: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, Tsuhan Chen

Abstract: In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many subtasks. It is desirable to have an algorithm that can capture such correlation without requiring to make any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), that maximizes the joint likelihood of the sub-tasks, while requiring only a ‘black-box’ interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers information about what error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in two different domains: (i) scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection, and (ii) robotic grasping, where we consider grasp point detection and object classification. 1

5 0.12679236 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

Author: Matthew Blaschko, Andrea Vedaldi, Andrew Zisserman

Abstract: A standard approach to learning object category detectors is to provide strong supervision in the form of a region of interest (ROI) specifying each instance of the object in the training images [17]. In this work are goal is to learn from heterogeneous labels, in which some images are only weakly supervised, specifying only the presence or absence of the object or a weak indication of object location, whilst others are fully annotated. To this end we develop a discriminative learning approach and make two contributions: (i) we propose a structured output formulation for weakly annotated images where full annotations are treated as latent variables; and (ii) we propose to optimize a ranking objective function, allowing our method to more effectively use negatively labeled images to improve detection average precision performance. The method is demonstrated on the benchmark INRIA pedestrian detection dataset of Dalal and Triggs [14] and the PASCAL VOC dataset [17], and it is shown that for a significant proportion of weakly supervised images the performance achieved is very similar to the fully supervised (state of the art) results. 1

6 0.12593226 15 nips-2010-A Theory of Multiclass Boosting

7 0.10471314 190 nips-2010-On the Convexity of Latent Social Network Inference

8 0.10027369 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

9 0.090492226 282 nips-2010-Variable margin losses for classifier design

10 0.090209231 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition

11 0.081411563 192 nips-2010-Online Classification with Specificity Constraints

12 0.079685077 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

13 0.078404136 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition

14 0.069426507 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata

15 0.068382859 228 nips-2010-Reverse Multi-Label Learning

16 0.067646869 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers

17 0.061558966 149 nips-2010-Learning To Count Objects in Images

18 0.055621423 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models

19 0.054876905 94 nips-2010-Feature Set Embedding for Incomplete Data

20 0.05429725 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.176), (1, 0.074), (2, -0.065), (3, -0.193), (4, 0.046), (5, 0.066), (6, -0.173), (7, -0.012), (8, 0.038), (9, 0.037), (10, -0.14), (11, 0.087), (12, 0.057), (13, 0.141), (14, -0.027), (15, 0.074), (16, 0.195), (17, 0.311), (18, -0.458), (19, 0.2), (20, -0.042), (21, -0.011), (22, -0.132), (23, 0.026), (24, -0.107), (25, 0.01), (26, -0.073), (27, -0.078), (28, -0.126), (29, -0.059), (30, -0.079), (31, -0.015), (32, 0.034), (33, 0.071), (34, -0.011), (35, -0.078), (36, -0.017), (37, 0.007), (38, 0.014), (39, 0.022), (40, 0.052), (41, -0.013), (42, -0.002), (43, -0.016), (44, -0.005), (45, 0.024), (46, -0.036), (47, 0.013), (48, 0.041), (49, 0.019)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94957185 42 nips-2010-Boosting Classifier Cascades

Author: Nuno Vasconcelos, Mohammad J. Saberian

Abstract: The problem of optimal and automatic design of a detector cascade is considered. A novel mathematical model is introduced for a cascaded detector. This model is analytically tractable, leads to recursive computation, and accounts for both classification and complexity. A boosting algorithm, FCBoost, is proposed for fully automated cascade design. It exploits the new cascade model, minimizes a Lagrangian cost that accounts for both classification risk and complexity. It searches the space of cascade configurations to automatically determine the optimal number of stages and their predictors, and is compatible with bootstrapping of negative examples and cost sensitive learning. Experiments show that the resulting cascades have state-of-the-art performance in various computer vision problems. 1

same-paper 2 0.92333984 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers

Author: Leonidas Lefakis, Francois Fleuret

Abstract: The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1

3 0.62458116 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades

Author: David Weiss, Benjamin Sapp, Ben Taskar

Abstract: For many structured prediction problems, complex models often require adopting approximate inference techniques such as variational methods or sampling, which generally provide no satisfactory accuracy guarantees. In this work, we propose sidestepping intractable inference altogether by learning ensembles of tractable sub-models as part of a structured prediction cascade. We focus in particular on problems with high-treewidth and large state-spaces, which occur in many computer vision tasks. Unlike other variational methods, our ensembles do not enforce agreement between sub-models, but filter the space of possible outputs by simply adding and thresholding the max-marginals of each constituent model. Our framework jointly estimates parameters for all models in the ensemble for each level of the cascade by minimizing a novel, convex loss function, yet requires only a linear increase in computation over learning or inference in a single tractable sub-model. We provide a generalization bound on the filtering loss of the ensemble as a theoretical justification of our approach, and we evaluate our method on both synthetic data and the task of estimating articulated human pose from challenging videos. We find that our approach significantly outperforms loopy belief propagation on the synthetic data and a state-of-the-art model on the pose estimation/tracking problem. 1

4 0.52244341 15 nips-2010-A Theory of Multiclass Boosting

Author: Indraneel Mukherjee, Robert E. Schapire

Abstract: Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the “correct” requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements. 1

5 0.47424823 272 nips-2010-Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models

Author: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, Tsuhan Chen

Abstract: In many machine learning domains (such as scene understanding), several related sub-tasks (such as scene categorization, depth estimation, object detection) operate on the same raw data and provide correlated outputs. Each of these tasks is often notoriously hard, and state-of-the-art classifiers already exist for many subtasks. It is desirable to have an algorithm that can capture such correlation without requiring to make any changes to the inner workings of any classifier. We propose Feedback Enabled Cascaded Classification Models (FE-CCM), that maximizes the joint likelihood of the sub-tasks, while requiring only a ‘black-box’ interface to the original classifier for each sub-task. We use a two-layer cascade of classifiers, which are repeated instantiations of the original ones, with the output of the first layer fed into the second layer as input. Our training method involves a feedback step that allows later classifiers to provide earlier classifiers information about what error modes to focus on. We show that our method significantly improves performance in all the sub-tasks in two different domains: (i) scene understanding, where we consider depth estimation, scene categorization, event categorization, object detection, geometric labeling and saliency detection, and (ii) robotic grasping, where we consider grasp point detection and object classification. 1

6 0.36481723 190 nips-2010-On the Convexity of Latent Social Network Inference

7 0.34065333 192 nips-2010-Online Classification with Specificity Constraints

8 0.33287105 282 nips-2010-Variable margin losses for classifier design

9 0.32364446 24 nips-2010-Active Learning Applied to Patient-Adaptive Heartbeat Classification

10 0.31917381 175 nips-2010-Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers

11 0.31752378 228 nips-2010-Reverse Multi-Label Learning

12 0.28207192 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision

13 0.26775807 28 nips-2010-An Alternative to Low-level-Sychrony-Based Methods for Speech Detection

14 0.2553024 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach

15 0.24930969 2 nips-2010-A Bayesian Approach to Concept Drift

16 0.23732646 186 nips-2010-Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification

17 0.22462375 143 nips-2010-Learning Convolutional Feature Hierarchies for Visual Recognition

18 0.22339834 62 nips-2010-Discriminative Clustering by Regularized Information Maximization

19 0.21690392 281 nips-2010-Using body-anchored priors for identifying actions in single images

20 0.21652183 209 nips-2010-Pose-Sensitive Embedding by Nonlinear NCA Regression


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.036), (17, 0.021), (27, 0.053), (30, 0.049), (35, 0.014), (45, 0.189), (50, 0.133), (52, 0.044), (60, 0.039), (77, 0.044), (78, 0.218), (90, 0.078)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91226882 74 nips-2010-Empirical Bernstein Inequalities for U-Statistics

Author: Thomas Peel, Sandrine Anthoine, Liva Ralaivola

Abstract: We present original empirical Bernstein inequalities for U-statistics with bounded symmetric kernels q. They are expressed with respect to empirical estimates of either the variance of q or the conditional variance that appears in the Bernsteintype inequality for U-statistics derived by Arcones [2]. Our result subsumes other existing empirical Bernstein inequalities, as it reduces to them when U-statistics of order 1 are considered. In addition, it is based on a rather direct argument using two applications of the same (non-empirical) Bernstein inequality for U-statistics. We discuss potential applications of our new inequalities, especially in the realm of learning ranking/scoring functions. In the process, we exhibit an efficient procedure to compute the variance estimates for the special case of bipartite ranking that rests on a sorting argument. We also argue that our results may provide test set bounds and particularly interesting empirical racing algorithms for the problem of online learning of scoring functions. 1

2 0.89523673 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors

Author: Alessandro Chiuso, Gianluigi Pillonetto

Abstract: We introduce a new Bayesian nonparametric approach to identification of sparse dynamic linear systems. The impulse responses are modeled as Gaussian processes whose autocovariances encode the BIBO stability constraint, as defined by the recently introduced “Stable Spline kernel”. Sparse solutions are obtained by placing exponential hyperpriors on the scale factors of such kernels. Numerical experiments regarding estimation of ARMAX models show that this technique provides a definite advantage over a group LAR algorithm and state-of-the-art parametric identification techniques based on prediction error minimization. 1

3 0.85796726 151 nips-2010-Learning from Candidate Labeling Sets

Author: Jie Luo, Francesco Orabona

Abstract: In many real world applications we do not have access to fully-labeled training data, but only to a list of possible labels. This is the case, e.g., when learning visual classifiers from images downloaded from the web, using just their text captions or tags as learning oracles. In general, these problems can be very difficult. However most of the time there exist different implicit sources of information, coming from the relations between instances and labels, which are usually dismissed. In this paper, we propose a semi-supervised framework to model this kind of problems. Each training sample is a bag containing multi-instances, associated with a set of candidate labeling vectors. Each labeling vector encodes the possible labels for the instances in the bag, with only one being fully correct. The use of the labeling vectors provides a principled way not to exclude any information. We propose a large margin discriminative formulation, and an efficient algorithm to solve it. Experiments conducted on artificial datasets and a real-world images and captions dataset show that our approach achieves performance comparable to an SVM trained with the ground-truth labels, and outperforms other baselines.

4 0.82943851 112 nips-2010-Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning

Author: Prateek Jain, Sudheendra Vijayanarasimhan, Kristen Grauman

Abstract: We consider the problem of retrieving the database points nearest to a given hyperplane query without exhaustively scanning the database. We propose two hashingbased solutions. Our first approach maps the data to two-bit binary keys that are locality-sensitive for the angle between the hyperplane normal and a database point. Our second approach embeds the data into a vector space where the Euclidean norm reflects the desired distance between the original points and hyperplane query. Both use hashing to retrieve near points in sub-linear time. Our first method’s preprocessing stage is more efficient, while the second has stronger accuracy guarantees. We apply both to pool-based active learning: taking the current hyperplane classifier as a query, our algorithm identifies those points (approximately) satisfying the well-known minimal distance-to-hyperplane selection criterion. We empirically demonstrate our methods’ tradeoffs, and show that they make it practical to perform active selection with millions of unlabeled points. 1

5 0.82316232 222 nips-2010-Random Walk Approach to Regret Minimization

Author: Hariharan Narayanan, Alexander Rakhlin

Abstract: We propose a computationally efficient random walk on a convex body which rapidly mixes to a time-varying Gibbs distribution. In the setting of online convex optimization and repeated games, the algorithm yields low regret and presents a novel efficient method for implementing mixture forecasting strategies. 1

same-paper 6 0.82300961 132 nips-2010-Joint Cascade Optimization Using A Product Of Boosted Classifiers

7 0.76743853 42 nips-2010-Boosting Classifier Cascades

8 0.76392007 23 nips-2010-Active Instance Sampling via Matrix Partition

9 0.75957954 36 nips-2010-Avoiding False Positive in Multi-Instance Learning

10 0.75529462 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average

11 0.74684262 282 nips-2010-Variable margin losses for classifier design

12 0.74502951 25 nips-2010-Active Learning by Querying Informative and Representative Examples

13 0.74001312 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices

14 0.73927814 22 nips-2010-Active Estimation of F-Measures

15 0.73923057 52 nips-2010-Convex Multiple-Instance Learning by Estimating Likelihood Ratio

16 0.73588341 243 nips-2010-Smoothness, Low Noise and Fast Rates

17 0.73540622 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes

18 0.73268116 63 nips-2010-Distributed Dual Averaging In Networks

19 0.7319631 174 nips-2010-Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition

20 0.73127538 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes